Official Twitter bug affects twarc users too (error 404) #75

Closed
remagio opened this issue Jul 22, 2015 · 7 comments

Comments

@remagio

remagio commented Jul 22, 2015

There has been an official Twitter bug for a couple of weeks: Twitter API calls on all major platforms (and Twitter's own apps too) frequently get a 404 for no known reason. It's also discussed here:
https://twittercommunity.com/t/intermittent-404-responses-from-rest-api/46712
Despite an apparent resolution by Twitter, it has started happening again.
Twarc users have probably hit it recently too (requests.exceptions.HTTPError: 404 Client Error: Not Found). It looks like this:

2015-07-18 16:48:49,156 INFO archived 622089779308593152
Traceback (most recent call last):
  File "../../twarc/utils/archive.py", line 141, in <module>
    main()
  File "../../twarc/utils/archive.py", line 107, in main
    for tweet in tweets:
  File "build/bdist.macosx-10.6-intel/egg/twarc.py", line 235, in search
  File "build/bdist.macosx-10.6-intel/egg/twarc.py", line 188, in new_f
    return None
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/models.py", line 851, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found

If it happens, retry your search until it completes without errors. It's not a permanent failure but an intermittent one on Twitter's side.
@edsu proposed introducing a catch-and-retry. I'm sharing this because I think it will keep happening in the future too, since they killed the Whale.
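
For anyone hitting this in the meantime, a catch-and-retry along those lines could look roughly like the sketch below. It is only an illustration, not the code in twarc or its catch-404s branch: the name `fetch_with_404_retry` and its parameters are made up here, and it assumes a requests-style response object with `status_code` and `raise_for_status()`.

```python
import time


def fetch_with_404_retry(fetch, max_tries=5, wait_seconds=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-404 response or tries run out.

    fetch is a hypothetical zero-argument callable that returns a
    requests-style response (has .status_code and .raise_for_status()).
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    for attempt in range(1, max_tries + 1):
        response = fetch()
        if response.status_code != 404:
            break
        if attempt < max_tries:
            # Wait a little longer after each intermittent 404.
            sleep(wait_seconds * attempt)
    # Raises if the last response is still an error (including a
    # persistent 404); does nothing on success.
    response.raise_for_status()
    return response
```

With requests, `fetch` could be something like `lambda: requests.get(url, params=params)`, where `url` and `params` are whatever the search call already uses. A 404 that persists after `max_tries` attempts still raises, so a genuinely missing resource is not hidden.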

@edsu
Member

edsu commented Jul 22, 2015

@remagio do you want to give the catch-404s branch a try and see if it helps? I've merged the pyopenssl branch into master, so the catch-404s branch should include it too.

@remagio
Author

remagio commented Jul 22, 2015

Yes, I have it ready for the next scheduled jobs, and I'll keep the log traces from https://github.com/edsu/twarc/compare/catch-404s#diff-22aabc32493075291fbea730860bc3dbR340

@remagio
Author

remagio commented Jul 24, 2015

Update @edsu: I'm still running two VMs, one with catch-404s and one with the current version. Both have worked fine over the last 24h on the same searches with archive.py, without getting any 404s. It looks like Twitter is applying workarounds until they fix their infrastructure.

@edsu
Member

edsu commented Jul 24, 2015

Ok. I guess I'd rather not add this 404 catching unless it's necessary. So I'm going to leave it in a branch for now.

@remagio
Author

remagio commented Jul 24, 2015

I agree, it's better to wait for official feedback about how they'll handle the same or new return status codes.

@edsu edsu mentioned this issue Jul 28, 2015
@remagio
Author

remagio commented Aug 2, 2015

FYI @edsu: the 404 came up again, though obviously not with the catch-404s branch.

@edsu
Member

edsu commented Aug 3, 2015

Ok, I'll merge it in.
