Read timed out #8

Closed
Kavan72 opened this issue Aug 24, 2020 · 5 comments

Comments

Kavan72 commented Aug 24, 2020

Full traceback: https://pastebin.com/1c61vqAS

Is there any way to handle this error in the iter object?

wumpus (Member) commented Aug 24, 2020

The code already retries 10 times for 30 seconds each.

Common Crawl's index server has been having problems with someone who's fetching data from thousands of AWS EC2 nodes. That person is making the CC server more unreliable than usual.

If you'd like to restart your computation roughly where it stopped, you can use timestamps to do that.
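One way to restart from timestamps, sketched under assumptions rather than taken from this project: `make_iter` is a hypothetical factory that returns an iterator of index records starting at a given timestamp, records are dicts with `timestamp` and `url` keys, and they arrive in ascending timestamp order. The wrapper remembers the last timestamp it yielded and reopens the iteration there after a failure, skipping records it already returned.

```python
def resumable_iter(make_iter, max_restarts=5, retry_on=(OSError,)):
    """Yield records from make_iter(from_ts), restarting at the last
    timestamp seen whenever one of the retry_on exceptions is raised.

    make_iter is a hypothetical callable: make_iter(None) starts from the
    beginning, make_iter(ts) starts at timestamp ts (inclusive).
    """
    last_ts = None           # timestamp of the most recent record yielded
    seen_at_last_ts = set()  # (timestamp, url) pairs already yielded at last_ts
    restarts = 0
    while True:
        try:
            for record in make_iter(last_ts):
                key = (record["timestamp"], record["url"])
                if record["timestamp"] == last_ts and key in seen_at_last_ts:
                    continue  # replayed after a restart, already yielded
                if record["timestamp"] != last_ts:
                    last_ts = record["timestamp"]
                    seen_at_last_ts = set()
                seen_at_last_ts.add(key)
                yield record
            return  # the underlying iterator finished normally
        except retry_on:
            restarts += 1
            if restarts > max_restarts:
                raise  # give up and let the caller see the error
```

With a real client, `make_iter` would wrap whatever call accepts a start timestamp, and `retry_on` would name the exception that client raises on a read timeout.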

If you'd like to make sure your code doesn't get one of these errors in the middle of the iteration, you can raise the maximum in the line `if connect_errors > 10` in `myrequests.py`.

Kavan72 (Author) commented Aug 25, 2020

Okay, got it. Can you add support for increasing this limit on iter via kwargs?

wumpus (Member) commented Aug 25, 2020

Yeah, I recall thinking about that potentially being a problem when I wrote the code :-) Really, I should raise the limits until it always works for the typical cdx server. I just need to convince it to complain visibly after enough failures (e.g. a --source hostname that resolves but no one's there) without making the test suite run longer (it's already a little annoying).

wumpus (Member) commented Aug 26, 2020

I just pushed a new version which makes 10x as many attempts to connect, while failing fast if the DNS lookup for the hostname fails. That ought to be enough to battle through CC's occasional problems.
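The idea of retrying many times while failing fast on DNS errors can be sketched roughly as follows. This is not the code from `myrequests.py`; the function name, its parameters, and the choice of exceptions to retry are all assumptions made for illustration. The resolver and sleep function are injectable so the logic can be exercised without a network.

```python
import socket
import time


def call_with_retries(hostname, attempt, max_connect_errors=100,
                      retry_delay=1.0, resolve=socket.getaddrinfo,
                      sleep=time.sleep):
    """Call attempt() until it succeeds or the error limit is exceeded.

    Hypothetical helper: attempt is any zero-argument callable that
    performs the request and raises ConnectionError or TimeoutError
    on a transient failure.
    """
    # Fail fast: if the name does not resolve, retrying will not help.
    try:
        resolve(hostname, None)
    except socket.gaierror as exc:
        raise ValueError(f"hostname {hostname!r} does not resolve") from exc

    connect_errors = 0
    while True:
        try:
            return attempt()
        except (ConnectionError, TimeoutError):
            connect_errors += 1
            if connect_errors > max_connect_errors:
                raise  # complain visibly after enough failures
            sleep(retry_delay)
```

Exposing `max_connect_errors` as a keyword argument is one way a caller could raise the limit without editing the source.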

wumpus closed this as completed Aug 26, 2020

Kavan72 (Author) commented Aug 26, 2020

thanks
