Skip to content

fix(scraper): issue on video size, reproduce the bug#67

Merged
vvatelot merged 5 commits intomainfrom
fix/scraper/analyze-video
Mar 22, 2024
Merged

fix(scraper): issue on video size, reproduce the bug#67
vvatelot merged 5 commits intomainfrom
fix/scraper/analyze-video

Conversation

@vvatelot
Copy link
Copy Markdown
Member

@vvatelot vvatelot commented Mar 21, 2024

Following #58, it seems that on certain analysis the result is wrong with video resources. When using haralyzer, we can confirm that on a given page the results from the har analysis seems to be wrong:

https://www.graphic-sud.com/
         type  count_ecoindex  size_ecoindex  count_haralyzer  size_haralyzer
0       audio               0          0.000              0.0           0.000
1         css               5         24.403              5.0          24.403
2        font               2         72.424              NaN             NaN
3        html               1         30.474              1.0             NaN
4       image               8        585.199              8.0         585.199
5  javascript               7         84.564              7.0          84.564
6       other               0          0.000              6.0          54.877
7       video               2      15317.080              2.0          56.414
8       total              25      16114.144             25.0         853.478

Proposition:

  • Use haralyzer to extracts metrics instead of custom development
  • Update documentation of ecoindex website to explain that we simulate a keystroke arrow down during the analysis

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 21, 2024

Coverage PR 67

Branch coverage •
FileStmtsMissCoverMissing
components/ecoindex/scraper
   scrap.py915143%45, 48–49, 51, 60, 63, 66–68, 73–75, 77–81, 84–87, 89, 91, 98–101, 108–110, 112–121, 130–131, 134–135, 137, 146–147, 152–153, 157–158
TOTAL69523066% 

@PaulPHPE
Copy link
Copy Markdown
Collaborator

PaulPHPE commented Mar 21, 2024

Hello @vvatelot !

This difference between the two results is due to the fact that when we have a _transferSize = -1 then we opt to retrieve the content-length of the response, a step not performed by haranalyzer.
This enables us to take into account video and audio requests. However, we do not differentiate between videos in autoplay mode and those with autoplay disabled.

In response, I propose the following for the PR:
By default, refrain from retrieving the content-length. If _transferSize = -1, set size = 0.
Then, verify if the video is set to autoplay. If affirmative, obtain the value of the content-length. However, implementing this may pose challenges as it requires parsing the HTML to locate the autoplay option.

@github-actions github-actions Bot added the tests label Mar 21, 2024
@vvatelot vvatelot marked this pull request as ready for review March 22, 2024 08:32
@vvatelot vvatelot merged commit 88eadb8 into main Mar 22, 2024
@vvatelot vvatelot deleted the fix/scraper/analyze-video branch March 22, 2024 08:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants