NewEgg Ban #25

Closed
Anon546 opened this issue Nov 30, 2020 · 17 comments

Comments

@Anon546

Anon546 commented Nov 30, 2020

I ran the script with the NewEgg file configurations and was eventually banned from visiting NewEgg's site for using an automatic process. Not sure if anything can be done for this programmatically but maybe worth mentioning in the ReadMe.

[Image: screenshot of the message NewEgg displayed]

@Instah

Instah commented Nov 30, 2020

How long did you run the program before they banned you?

@Anon546
Author

Anon546 commented Nov 30, 2020

> How long did you run the program before they banned you?

At least a few hours. I'm not sure of the exact time; I only noticed after I stopped and removed the previous Docker container while adding another email address. The new container was getting 403 errors, and when I then went to their site I received the message posted above.

@Wood578Guy

Wow, that's interesting. I've had it running for about 72 hours on my server now and still haven't been banned. Maybe it's just a matter of time.

@Ronux25

Ronux25 commented Nov 30, 2020

I have had it running for over 24 hours with no issues just yet. Perhaps the refresh timer was too quick? I have mine set to 8 seconds so I don't flood the sites I'm tracking. I guess you could always use a VPN as well to get around that if needed.

@Anon546
Author

Anon546 commented Nov 30, 2020

> I have had it running for over 24 hours with no issues just yet. Perhaps the refresh timer was too quick? I have mine set to 8 seconds so I don't flood the sites I'm tracking. I guess you could always use a VPN as well to get around that if needed.

This could be it, I kept it on the default 2 seconds

@pjneder

pjneder commented Nov 30, 2020

Wow, I've been running mine at the default 2s for several days and I can still log into NE and browse from the same public IP.

@alexgraham

alexgraham commented Nov 30, 2020

I've been running it for around 7 days with 5 second interval with around 12 Newegg URLs. Have had no issues with Newegg, only Microcenter and BHPhoto have had issues. 2 seconds is probably too short to safely run over a longer period I would think.

@realMestizo

I too have been banned from NewEgg. I have been running it for a little less than 24 hours and using the default config yaml.

@pjneder

pjneder commented Nov 30, 2020

Hmmm...I just restarted mine with 5s interval.

Could it be related to how many or which URLs we have in the yaml? I only have 8 URLs. I don't want to get banned either.

@Instah

Instah commented Nov 30, 2020

I’ve been running for about 7-8 hours on Newegg and Bestbuy, both at 2 seconds, and haven’t had an issue with a ban. How could one tell if they’ve been banned? Will it be clear in the logs or somewhere else?

@alexgraham

@pjneder could be, it depends on how they have their server set up. I'm a web dev, not a server admin, but I figured having more URLs would be safer and look less like a bot than having 1 URL and hammering it every 2 seconds.

I always remember the average page view time on the Internet is around 3 seconds so I chose a number > 3 to try and decrease chances of getting banned.

@Anon546
Author

Anon546 commented Nov 30, 2020

> I’ve been running for about 7-8 hours on Newegg and Bestbuy both at 2 seconds and haven’t had an issue with ban. How could one tell if they’ve been banned? Will it be clear in the logs or somewhere else?

Per the replies above, I may have just been unlucky. I could tell from the 403 errors in the logs and from being unable to access Newegg from the same IP.

@realMestizo

> I’ve been running for about 7-8 hours on Newegg and Bestbuy both at 2 seconds and haven’t had an issue with ban. How could one tell if they’ve been banned? Will it be clear in the logs or somewhere else?

I've been met with the same page that @Anon546 posted above. The logs are also showing an HTTP 403 error code.
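
Since both reports above spotted the ban through repeated HTTP 403 responses in the logs, here is a minimal sketch of how that check could be automated. It is not the detection logic in inventory-hunter; the function name `looks_blocked` and the threshold of three consecutive 403s are assumptions made for illustration.

```python
from typing import Iterable


def looks_blocked(status_codes: Iterable[int], threshold: int = 3) -> bool:
    """Return True once `threshold` consecutive responses were HTTP 403.

    Hypothetical helper: a single 403 may be a transient hiccup, so only
    an unbroken run of them is treated as a likely block.
    """
    streak = 0
    for code in status_codes:
        # Extend the run on a 403, reset it on any other status.
        streak = streak + 1 if code == 403 else 0
        if streak >= threshold:
            return True
    return False


if __name__ == "__main__":
    # Status codes as they might be collected from recent scrape attempts.
    print(looks_blocked([200, 200, 403, 403, 403]))
    print(looks_blocked([403, 200, 403, 200]))
```

A run of 403s only suggests a block. Opening the site in a browser from the same IP, as described above, is still the more direct confirmation.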

@alexgraham

alexgraham commented Nov 30, 2020

@realMestizo with what interval time and how many Newegg URLs? How long had you been scraping for?

@realMestizo

> @realMestizo with what interval time and how many newegg URL's? How long had you been scraping for?

All the default settings in the yaml config files pulled from this repo, so that's 46 URLs at 2-second intervals. I want to say I've been running this for ~24 hours or so, probably a little less than that.

@f7el

f7el commented Dec 1, 2020

I got banned from Newegg using the 2s default and 18 URLs.
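
To compare the setups reported in this thread, a rough request-rate estimate may help. The sketch below is plain arithmetic, and it rests on an assumption the thread does not confirm: whether the interval applies to each URL separately or to the scraper as a whole. Both readings are shown, and the function name `requests_per_second` is made up for this example.

```python
def requests_per_second(num_urls: int, interval_s: float, per_url: bool = True) -> float:
    """Estimate the request rate sent to one site.

    per_url=True  assumes every URL is fetched once per interval.
    per_url=False assumes one request in total per interval (round-robin).
    Which of these matches the tool is NOT confirmed in this thread.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return num_urls / interval_s if per_url else 1.0 / interval_s


if __name__ == "__main__":
    # (URL count, interval in seconds) as reported by commenters above.
    setups = {
        "46 URLs @ 2s": (46, 2.0),
        "18 URLs @ 2s": (18, 2.0),
        "12 URLs @ 5s": (12, 5.0),
        "8 URLs @ 5s": (8, 5.0),
    }
    for label, (urls, interval) in setups.items():
        each = requests_per_second(urls, interval, per_url=True)
        total = requests_per_second(urls, interval, per_url=False)
        print(f"{label}: {each:.1f} req/s if per URL, {total:.1f} req/s if round-robin")
```

Under the per-URL reading, the default 46-URL config would send far more traffic than a 12-URL config at 5 seconds, which would be consistent with who reported bans. Under the round-robin reading, the URL count would not change the rate at all.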

@EricJMarti
Owner

In my experience, Newegg is sensitive to frequent requests. If you run multiple Newegg scrapers in parallel, or actively browse the site while a scraper runs in the background, you risk setting off Newegg's velocity control. inventory-hunter has logic built in to detect this condition, but Newegg must have updated their website since I implemented it (see the "are you a human" code in hunter.py). I would try reducing the refresh rate in the config, that is, using a longer interval (I settled on 2 seconds by trial and error).
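
One common way to stay under a rate limit like the velocity control described above is to lengthen the wait after each blocked response and add some random jitter. The sketch below illustrates that general technique only; it is not what hunter.py does, and `next_delay`, the 300-second cap, and the 25% jitter are all assumptions chosen for the example.

```python
import random
from typing import Optional


def next_delay(
    base_s: float,
    consecutive_403s: int,
    cap_s: float = 300.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Return how long to wait before the next request, in seconds.

    Hypothetical helper: the base interval doubles for every consecutive
    403 seen, up to `cap_s`, and up to 25% random jitter is added so the
    requests do not arrive on a perfectly regular beat.
    """
    rng = rng or random.Random()
    backoff = min(cap_s, base_s * (2 ** consecutive_403s))
    return backoff + rng.uniform(0.0, 0.25 * backoff)


if __name__ == "__main__":
    # With a 2-second base interval, show the delay growing as 403s pile up.
    for blocked in range(5):
        print(blocked, round(next_delay(2.0, blocked), 2))
```

With no 403s the delay stays close to the configured interval, so normal polling is barely affected. The wait only grows while the site keeps refusing requests.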
