NewEgg Ban #25
Comments
How long did you run the program before they banned you? |
At least a few hours. Not sure of exact time but I only noticed when I stopped and removed the previous docker container when attempting to add another email address. On the new container I was receiving 403 errors, then when I went to their site I received the message posted above. |
Wow, that's interesting. I've had it running for about 72 hours on my server now and still haven't been banned. Maybe it's just a matter of time. |
I have had it running for over 24 hours with no issues just yet. Perhaps the refresh timer was too quick? I have mine set to 8 seconds so I don't flood the sites I'm tracking. I guess you could always use a VPN as well to get around that if needed. |
This could be it, I kept it on the default 2 seconds |
Wow, I've been running mine at the default 2s for several days and I can still log into NE and browse from same public IP. |
I've been running it for around 7 days at a 5 second interval with around 12 Newegg URLs. I've had no issues with Newegg; only Microcenter and BHPhoto have given me trouble. I would think 2 seconds is too short to run safely over a longer period. |
I too have been banned from NewEgg. I have been running it for a little less than 24 hours and using the default config yaml. |
Hmmm...I just restarted mine with 5s interval. Could it be related to how many or which URLs we have in the yaml? I only have 8 URLs. I don't want to get banned either. |
I’ve been running for about 7-8 hours on Newegg and Bestbuy, both at 2 seconds, and haven’t had an issue with a ban. How can you tell if you’ve been banned? Will it be clear in the logs or somewhere else? |
@pjneder could be, depends on how they have their server set up. I'm a web dev, not a server admin, but I figured having more URLs would be safer and look less like a bot than having 1 URL and hammering it every 2 seconds. I always remember the average page view time on the Internet being around 3 seconds, so I chose a number > 3 to try to decrease the chances of getting banned. |
Per the replies above, I may have just been unlucky. I was able to tell from the 403 errors in the logs and from being unable to access Newegg from the same IP. |
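For anyone who wants to spot this from the status codes rather than by visiting the site, here is a minimal sketch. It assumes you already collect the HTTP status of each request somewhere; `looks_banned` and its `threshold` parameter are hypothetical names for illustration and are not part of inventory-hunter.

```python
from typing import Sequence

BLOCK_STATUS = 403  # the status code reported in this thread


def looks_banned(recent_statuses: Sequence[int], threshold: int = 3) -> bool:
    """Return True if the last `threshold` responses were all HTTP 403.

    A single 403 can be a fluke, so this only flags a run of them.
    The threshold of 3 is an arbitrary assumption, not a measured value.
    """
    if threshold <= 0 or len(recent_statuses) < threshold:
        return False
    return all(code == BLOCK_STATUS for code in recent_statuses[-threshold:])
```

Something like `looks_banned([200, 403, 403, 403])` would flag the situation described above, while an isolated 403 followed by a 200 would not.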
I've been met with the same page that @Anon546 posted above. My logs are also showing an HTTP 403 error code. |
@realMestizo with what interval time and how many Newegg URLs? How long had you been scraping for? |
All the default settings in the yaml config files pulled from this repo, so that's 46 URLs at 2 sec intervals. I wanna say I've been running this for ~24 hours or so, probably a little less than that. |
I got banned from newegg using 2s default and 18 urls. |
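To compare the setups reported so far, here is a rough back-of-the-envelope sketch of the request rate each one implies. The thread does not say whether the interval applies to each URL or to the whole list, so both readings are modelled as assumptions via the hypothetical `per_url` flag.

```python
def requests_per_minute(num_urls: int, interval_s: float, per_url: bool) -> float:
    """Estimate how many requests per minute a config sends to one site.

    per_url=True  assumes every URL is fetched once per interval.
    per_url=False assumes one URL is fetched per interval (round-robin).
    Which of these matches the tool's real scheduling is not confirmed here.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    fetches_per_tick = num_urls if per_url else 1
    return fetches_per_tick * 60.0 / interval_s


# Configurations mentioned in this thread: (URL count, interval in seconds)
for urls, interval in [(46, 2), (18, 2), (12, 5)]:
    print(urls, interval,
          requests_per_minute(urls, interval, per_url=True),
          requests_per_minute(urls, interval, per_url=False))
```

Under the per-URL reading the URL count matters a lot; under the round-robin reading only the interval does, which would fit the reports that bans showed up at 2 seconds regardless of how many URLs were configured.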
In my experience, Newegg is sensitive to frequent requests. If you are running multiple Newegg scrapers in parallel, or actively browsing the site while a scraper is running in the background, you risk setting off Newegg's velocity control. There is logic built into inventory-hunter to detect this condition, but Newegg must have updated their website since I implemented it (see the "are you a human" code in hunter.py). I would try increasing the refresh interval in the config so it polls less often (I settled on 2 seconds by trial and error). |
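One way to make the interval adapt instead of hand-tuning it is exponential backoff with jitter: wait longer after each blocked response and add some randomness so requests are not perfectly periodic. This is only a sketch under my own assumptions; `next_delay` and all of its parameters are hypothetical and do not come from hunter.py.

```python
import random


def next_delay(base_interval: float,
               consecutive_blocks: int,
               max_delay: float = 300.0,
               jitter: float = 0.25,
               rng=random.random) -> float:
    """Return how many seconds to wait before the next request.

    base_interval:      normal refresh interval in seconds (e.g. 5.0)
    consecutive_blocks: how many blocked responses (e.g. 403) in a row
    max_delay:          upper bound on the backoff before jitter is added
    jitter:             fraction of the delay added as random extra wait
    rng:                returns a float in [0, 1); injectable for testing
    """
    # Cap the exponent so very long block streaks cannot overflow a float.
    exponent = min(max(consecutive_blocks, 0), 16)
    backoff = min(base_interval * (2 ** exponent), max_delay)
    return backoff + backoff * jitter * rng()
```

With a base of 5 seconds, two blocked responses in a row would push the wait to at least 20 seconds, and the cap keeps it from growing past five minutes plus jitter. Whether any of this is enough to avoid Newegg's detection is untested.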
I ran the script with the NewEgg config files and was eventually banned from visiting NewEgg's site for using an automated process. Not sure if anything can be done about this programmatically, but it may be worth mentioning in the README.