This repository contains JavaScript code extracted semi-automatically from highly ranked webpages.
This data is published for two reasons:
- research;
- web compatibility tests.
The data is not owned by Mozilla. All of these files were made publicly available on third-party websites. Mozilla has merely compiled them from around the web to make life easier for researchers and developers working on web compatibility.
The data was collected as follows:
- Establish the list of pages to visit, using the Alexa top 50 webpages at the time of the visit.
- Install the extension https://github.com/binast/js-scrapper, which automatically saves to disk some of the content sent by each website.
- Visit each of the pages. Some pages were skipped because they required an account and did not support anonymous accounts.
- Interact arbitrarily with each page: click on arbitrary links, buttons, and videos, move the mouse randomly around the page, and scroll randomly.
- Wait a few minutes before closing the page.
- Once the browsing session is complete, run https://crates.io/crates/dedup on the result to remove duplicate files.
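To illustrate the idea behind the last step, here is a minimal Python sketch of content-based deduplication. It is not the `dedup` crate and makes no claim about how that tool works internally; it assumes that "duplicate" means byte-identical content, and the function name `remove_duplicate_files` is hypothetical.

```python
import hashlib
from pathlib import Path


def remove_duplicate_files(root: str) -> list[Path]:
    """Delete files under `root` whose bytes duplicate an earlier file.

    Assumption: two files are duplicates when their SHA-256 digests match.
    Returns the list of paths that were removed.
    """
    seen: dict[str, Path] = {}
    removed: list[Path] = []
    # Sort the paths so that the copy which is kept is deterministic.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            # Same content as a file we already kept: drop this copy.
            path.unlink()
            removed.append(path)
        else:
            seen[digest] = path
    return removed
```

Calling `remove_duplicate_files("scraped")` on a hypothetical `scraped` directory would keep the first file (in sorted path order) for each distinct content and delete the rest. Since it deletes files in place, it is safer to try it on a copy of the data first.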
We intend to update the repository from time to time to reflect the evolution of the web.