CLI: Allow to differentiate between 404 and connection timeout #681

Open
antibot4navalny opened this issue Dec 29, 2022 · 4 comments

antibot4navalny commented Dec 29, 2022

While working around API connection issues with the twarc1 CLI, I need some way to tell apart two kinds of unsuccessful completion: "Read timed out" and "404, no users found from the list specified".

My app deals with accounts that last from a few hours to a few weeks, so "No users found" for users/show is a perfectly legitimate outcome that does not need to be retried or handled in any special way.

In the timeout scenario, by contrast, I need to retry the twarc invocation until it succeeds.

The twarc1 CLI exits with code 1 both for a 404 and for a connection timeout.

Is there a way either to make twarc treat a 404 as a non-error outcome, or to differentiate between a 404 and a connection timeout when using the CLI utility?

@igorbrigadir
Contributor

igorbrigadir commented Dec 30, 2022

On another look, this may require some further digging. Technically, we should already be handling this: "twarc1" and "twarc2" should both retry on timeouts and not retry on 404s. But it seems that 404s aren't being caught properly in twarc1 right now, so there may be a bug there:

twarc users acthatnotexist1

This should output a handled error, but instead it prints:

Traceback (most recent call last):
  File "/python/bin/twarc", line 11, in <module>
    load_entry_point('twarc', 'console_scripts', 'twarc')()
  File "/python/twarc/twarc/command.py", line 293, in main
    for thing in things:
  File "/python/twarc/twarc/client.py", line 372, in user_lookup
    for u in do_lookup():
  File "/python/twarc/twarc/client.py", line 361, in do_lookup
    raise e
  File "/python/twarc/twarc/client.py", line 357, in do_lookup
    resp = self.get(url, params=params, allow_404=True)
  File "/python/twarc/twarc/decorators.py", line 88, in new_f
    resp.raise_for_status()
  File "/python/lib/python3.7/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://api.twitter.com/1.1/users/lookup.json?screen_name=acthatnotexist1&tweet_mode=extended

That is, an unhandled stack trace rather than a handled error. I haven't fully replicated timeouts yet.

As an alternative - does twarc2 work better here? When users are not found, it returns an error message you can parse instead of 404 errors:

twarc2 user name acthatnotexist1
[
  {
    "errors": [
      {
        "value": "acthatnotexist1",
        "detail": "Could not find user with usernames: [acthatnotexist1].",
        "title": "Not Found Error",
        "resource_type": "user",
        "parameter": "usernames",
        "resource_id": "acthatnotexist1",
        "type": "https://api.twitter.com/2/problems/resource-not-found"
      }
    ],
    "__twarc": {...
    }
  }
]
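To illustrate how output like the above could be parsed, here is a minimal sketch in Python. It assumes the output has already been collected into a list of response objects shaped like the example; the function name `missing_users` and the sample data are illustrative only and not part of twarc, and the elided `__twarc` field is simply left out of the sample.

```python
import json

# Problem type copied from the example response above.
NOT_FOUND_TYPE = "https://api.twitter.com/2/problems/resource-not-found"


def missing_users(pages):
    """Return usernames reported as not found in twarc2-style response pages.

    `pages` is a list of dicts, each optionally carrying an "errors" list.
    (Hypothetical helper for illustration; not a twarc function.)
    """
    missing = []
    for page in pages:
        for error in page.get("errors", []):
            is_user = error.get("resource_type") == "user"
            if is_user and error.get("type") == NOT_FOUND_TYPE:
                missing.append(error.get("value"))
    return missing


# Sample taken from the response shown above, minus the elided "__twarc" field.
sample = json.loads("""
[
  {
    "errors": [
      {
        "value": "acthatnotexist1",
        "detail": "Could not find user with usernames: [acthatnotexist1].",
        "title": "Not Found Error",
        "resource_type": "user",
        "parameter": "usernames",
        "resource_id": "acthatnotexist1",
        "type": "https://api.twitter.com/2/problems/resource-not-found"
      }
    ]
  }
]
""")

print(missing_users(sample))  # ['acthatnotexist1']
```

With this shape, a missing account shows up as data rather than as a failed process, so the caller does not need to inspect the exit code for the not-found case.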

@antibot4navalny
Author

If twarc2 requires API v2 access, that's a separate issue I haven't found a way to resolve: I don't have API v2 access, and I'm afraid that applying for it would put my existing API v1.1 access at risk. Until then, I need to stick with v1.1.

I can provide a stack trace and any further diagnostics for the request timeout cases if that would help us make progress.

@igorbrigadir
Contributor

Oh, I see what you mean. I don't know about losing v1.1 access, but as far as I know, brand new API accounts all get v2 access. I'm keeping this open for the v1.1 errors bug though, and I'll double check things there for now.

@antibot4navalny
Author

There is a similar problem with twarc {friends|followers} when the account is protected: the exit code for requests.exceptions.HTTPError: 401 is 1, the same as for requests.exceptions.ReadTimeout.
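Until the exit codes differ, one possible workaround is a small wrapper that inspects the traceback twarc writes to stderr. The following is only a sketch: the exit code values, the function names `classify_stderr` and `run_twarc`, and the retry parameters are assumptions made for illustration, and the string matching relies on the exception text quoted in this thread staying the same, which is fragile.

```python
import subprocess
import sys
import time

# Exit codes chosen for this sketch only; twarc itself does not define them.
EXIT_OK = 0
EXIT_OTHER = 1
EXIT_NOT_FOUND = 2
EXIT_UNAUTHORIZED = 3
EXIT_TIMEOUT = 4


def classify_stderr(stderr_text):
    """Map the text of a twarc traceback to one of the exit codes above."""
    if "ReadTimeout" in stderr_text or "Read timed out" in stderr_text:
        return EXIT_TIMEOUT
    if "HTTPError: 404" in stderr_text:
        return EXIT_NOT_FOUND
    if "HTTPError: 401" in stderr_text:
        return EXIT_UNAUTHORIZED
    return EXIT_OTHER


def run_twarc(args, max_attempts=5, delay_seconds=10):
    """Run `twarc <args>`, retrying only when the failure looks like a timeout."""
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(["twarc", *args], capture_output=True, text=True)
        if result.returncode == 0:
            sys.stdout.write(result.stdout)
            return EXIT_OK
        code = classify_stderr(result.stderr)
        if code != EXIT_TIMEOUT:
            # 404, 401 and unknown failures are reported once, without retrying.
            sys.stderr.write(result.stderr)
            return code
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    return EXIT_TIMEOUT


# Classifying the last line of the 404 traceback quoted earlier in this thread.
example = "requests.exceptions.HTTPError: 404 Client Error: Not Found for url: ..."
print(classify_stderr(example))  # 2
```

A script could then call `sys.exit(run_twarc(sys.argv[1:]))` so that the calling shell sees a distinct code per outcome. Note that this sketch discards any partial stdout from a failed attempt.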
