Scan top one million webpages #65

Open
ChargingBulle opened this issue Feb 9, 2019 · 3 comments
Comments

@ChargingBulle

In #64 I proposed offering statistics on how private the internet currently is. These statistics would of course become better as more pages are scanned.

Would it be cool for you to scan the top one million webpages in the world?
A list of the top million webpages by traffic can be downloaded for free here: https://majestic.com/reports/majestic-million

Scanning all of these pages within a year would result in about 2700 new queries per day. I assume these could be put into the queue automatically during times of low usage?
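For reference, the daily figure can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name `scans_per_day` is made up for illustration and is not part of the project's code.

```python
import math


def scans_per_day(total_sites: int, days: int) -> int:
    """Scans needed per day to cover total_sites within the given number of days.

    Rounded up so the whole list is finished by the deadline.
    Hypothetical helper, not taken from the scanner's codebase.
    """
    return math.ceil(total_sites / days)


# One million sites spread evenly over one year.
print(scans_per_day(1_000_000, 365))  # prints 2740
```

So "about 2700" is in the right range; spreading the list evenly over 365 days gives 2740 scans per day.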

@ChargingBulle changed the title from "Scan top one million wegpages" to "Scan top one million webpages" on Feb 9, 2019
@hprid
Member

hprid commented Feb 11, 2019

Thank you for your suggestion. We are currently rewriting our scanning engine, since the old one has several stability and scalability issues. The new engine has already been used for research and is capable of scanning one million pages in about 4-5 days on one of our servers (without TLS checks; those take a little longer). Until the new engine is up and running, we would like to avoid larger scans. We hope to deploy it within the next one or two months and then start scanning larger numbers of sites.
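To put the 4-5 day figure into perspective, here is a rough throughput estimate. It assumes the scan runs continuously at a constant rate on a single server, which the comment above does not state; `pages_per_second` is a hypothetical helper used only for this calculation.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def pages_per_second(total_pages: int, days: float) -> float:
    """Average scan rate, assuming a constant rate over the whole period.

    Hypothetical helper for illustration; not part of the scanning engine.
    """
    return total_pages / (days * SECONDS_PER_DAY)


# One million pages in 4 days (fast case) and 5 days (slow case).
fast = pages_per_second(1_000_000, 4)
slow = pages_per_second(1_000_000, 5)
print(f"{slow:.1f} to {fast:.1f} pages per second")
```

Under these assumptions the rate works out to roughly 2.3 to 2.9 pages per second, without TLS checks.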

@ChargingBulle
Author

that's very cool

@ChargingBulle
Author

What's the status?
