Impossible to scrape this website #147
Comments
@simjoeweb, try https://bot.sannysoft.com/ to see whether your browser is set up properly; things such as the viewport can affect the result. Otherwise, check whether you're running on a VPS: most sites block well-known providers like DO or AWS. To check this, run the scraper from your home IP address or a mobile connection; it's a good way to see whether those VPS IPs are blocked.
On top of what @Rainbowhat mentioned, I would also recommend sending a more realistic user agent; you can use the user-agents library for that. Also try sending headers such as Accept, Accept-Encoding, and Accept-Language. For example:
await page.setExtraHTTPHeaders({
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
  'Accept-Encoding': 'gzip, deflate, br',
  'Accept-Language': 'en-US,en;q=0.9',
  'Cache-Control': 'no-cache',
})
You can inspect what the browser is sending and choose
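The header advice above can be sketched as a small helper. This is only an illustration, assuming a Puppeteer-style `page` object that exposes `setUserAgent` and `setExtraHTTPHeaders`; the function names `buildBrowserLikeHeaders` and `applyBrowserIdentity` are hypothetical, and the header values are placeholders that you would replace with what your own browser sends.

```javascript
// Hypothetical helper: build headers resembling what a desktop browser sends.
// The values are illustrative; copy the real ones from your browser's DevTools.
function buildBrowserLikeHeaders(languages = ['en-US', 'en']) {
  // The first language has an implicit weight of 1; later ones get
  // decreasing q-values (0.9, 0.8, ...), never going below 0.1.
  const acceptLanguage = languages
    .map((lang, i) => {
      if (i === 0) return lang;
      const q = Math.max(0.1, 1 - i * 0.1).toFixed(1);
      return `${lang};q=${q}`;
    })
    .join(',');

  return {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': acceptLanguage,
  };
}

// Hypothetical helper: apply a user agent string and the headers above to a
// Puppeteer-style page. Call it before page.goto() so the first request
// already carries them.
async function applyBrowserIdentity(page, userAgent, languages) {
  await page.setUserAgent(userAgent);
  await page.setExtraHTTPHeaders(buildBrowserLikeHeaders(languages));
}
```

The user agent string is passed in rather than hard-coded, so it can come from the user-agents library mentioned above or be copied from a real browser. Note that extra headers set this way are sent with every request the page makes, not only the top-level navigation.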
https://datadome.co/fr/bot-management-protection/une-detection-efficace-cote-client-est-essentielle/ They are talking about you.
Considering that your bot is being detected, I would use proxies and clear cookies on each run (if you don't close the browser between runs). There's work being done to help avoid detection, but it's impossible to guarantee that. Closing for now.
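A minimal sketch of the proxy and cookie suggestion, again assuming a Puppeteer-style API: Chromium's `--proxy-server` switch is passed through the launch `args`, and cookies are removed with `page.cookies()` and `page.deleteCookie()`. The names `buildLaunchOptions` and `clearCookies` and the proxy address are hypothetical.

```javascript
// Hypothetical helper: launch options that route browser traffic through a
// proxy. Pass the result to puppeteer.launch(). Without a proxy URL, no
// extra arguments are added.
function buildLaunchOptions(proxyUrl) {
  return { args: proxyUrl ? [`--proxy-server=${proxyUrl}`] : [] };
}

// Hypothetical helper: delete the cookies visible to the page's current URL.
// Call it between runs when the browser stays open. Returns how many cookies
// were removed.
async function clearCookies(page) {
  const cookies = await page.cookies();
  if (cookies.length > 0) {
    await page.deleteCookie(...cookies);
  }
  return cookies.length;
}

// Example usage (requires the puppeteer package; the proxy address is a placeholder):
//   const browser = await puppeteer.launch(buildLaunchOptions('http://127.0.0.1:8080'));
//   const page = await browser.newPage();
//   await page.goto('https://example.com');
//   await clearCookies(page);
```

Because `page.cookies()` only returns cookies for the page's current URL, cookies set by other domains would survive this; closing the browser or using a fresh browser context between runs is the more thorough option.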
I get blocked on the first or second page. If I complete the initial verification then it works fine after that. I tried using proxies but I still get blocked after 1 page. Any ideas what it could be?