Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to configure max retries for v1.1 server errors #670

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

betsybookwyrm
Copy link
Contributor

Given the increasing instability of the v1.1 API, it's helpful to be able to tune the number of retries for server errors.

  • Adds max_server_error_retries param to the v1.1 client retaining old value of 30 as default value
  • Pass max_server_error_retries through the rate_limit decorator
  • Pass max_server_error_retries through uses of Twarc.get() in client.py

Given I haven't consulted with anyone on this change, please let me know what you think and I'm very happy to alter it!

Some questions I have for you all:

  • Do you think the client initialisation is an appropriate place to expose the parameter?
  • Should the parameter be exposed in the CLI as well as in the library?
  • I passed the parameter through all usages of Twarc.get() in client.py, but not in the usages in utils/deletes.py as those are a bit different as they initialise their own client. What are your thoughts on how they should handle the parameter?
  • Should I pass the parameter through Twarc.post() as well so they are also configurable? Should they possibly be a separate parameter for posts rather than gets as they're for different purposes so people may want different values?

For further context, I'm suggesting this change because we're seeing an increasing number of 500 errors returned from the v1.1 timeline endpoint, which is not surprising given everything that's going on at Twitter. For my usage of the endpoint, I'd much rather skip over that request earlier and move on to my next request - we are finding some requests do reach the 30 retries! I'd imagine that other people may want to retry for longer to increase their chances of getting the data they want.

Given the increasing instability of the v1.1 API, it's helpful to be
able to tune the number of retries for server errors.

- Adds max_server_error_retries param to the v1.1 client retaining old
  value of 30 as default value
- Pass max_server_error_retries through the rate_limit decorator
- Pass max_server_error_retries through uses of Twarc.get() in client.py
@igorbrigadir
Copy link
Contributor

This looks good to me! Thanks! Definitely useful to have, even though v1.1 is slowly being deprecated - i think because of staff cuts at twitter v1.1 may live for a while much longer.

The places where post is called generally don't need to retry often, if they do, there are other bigger problems and you shouldn't be retrying anyway, so i think that's fine.

@SamHames
Copy link
Contributor

👍

For your questions:

Do you think the client initialisation is an appropriate place to expose the parameter?

Yes, this is consistent with the others http and connection error configurable settings. The implementation is fine, but it would be a little cleaner to inject the self.max_server_error_retries from the client into the kwargs of the get method itself rather than modifying all the calling sites (similar to https://github.com/DocNow/twarc/blob/main/twarc/client.py#L830)

Should the parameter be exposed in the CLI as well as in the library?

I'd be happy to merge without that, I suspect most adhoc CLI usage has already migrated to V2.

I passed the parameter through all usages of Twarc.get() in client.py, but not in the usages in utils/deletes.py as those are a bit different as they initialise their own client. What are your thoughts on how they should handle the parameter?

utils can stay as they are with the defaults, I don't think it's worth trying to customise anything supporting v1.1 right now.

@betsybookwyrm
Copy link
Contributor Author

Yes, this is consistent with the others http and connection error configurable settings. The implementation is fine, but it would be a little cleaner to inject the self.max_server_error_retries from the client into the kwargs of the get method itself rather than modifying all the calling sites (similar to https://github.com/DocNow/twarc/blob/main/twarc/client.py#L830)

Hmm. I did look at ways of getting it directly in rather than passing it through, but as it's more the rate_limit decorator that needs it and rate_limit doesn't know anything about the Twarc object in the end I thought it was simpler and clearer to pass it through the calls to the function. But I can keep playing with it if you like - it is definitely a bit of a pain to have to add it to all the calls of the functions that use the rate_limit decorator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants