Add ability to configure max retries for v1.1 server errors #670

betsybookwyrm · 2022-11-29T01:51:05Z

Given the increasing instability of the v1.1 API, it's helpful to be able to tune the number of retries for server errors.

- Adds a max_server_error_retries param to the v1.1 client, retaining the old value of 30 as the default
- Passes max_server_error_retries through the rate_limit decorator
- Passes max_server_error_retries through uses of Twarc.get() in client.py
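
As a rough illustration of the pass-through described above, here is a minimal sketch of a retry decorator that takes its limit from a keyword argument. It is not the actual twarc code: FakeResponse, TooManyServerErrors and always_failing_get are hypothetical names invented for the example, and the back-off sleep a real client would perform between attempts is left out.

```python
import functools

DEFAULT_MAX_SERVER_ERROR_RETRIES = 30  # the old hard-coded value, per this PR


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; only status_code matters."""

    def __init__(self, status_code):
        self.status_code = status_code


class TooManyServerErrors(Exception):
    """Hypothetical exception raised once the retry budget is spent."""


def rate_limit(f):
    """Sketch of a decorator that retries on 5xx responses (assumed behaviour)."""

    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        # Pop the setting so the wrapped function never sees it.
        max_retries = kwargs.pop(
            "max_server_error_retries", DEFAULT_MAX_SERVER_ERROR_RETRIES
        )
        errors = 0
        while True:
            resp = f(*args, **kwargs)
            if resp.status_code < 500:
                return resp
            errors += 1
            if errors > max_retries:
                raise TooManyServerErrors(f"gave up after {errors} server errors")
            # A real client would sleep here before the next attempt.

    return wrapper


calls = []


@rate_limit
def always_failing_get(url):
    """Hypothetical request function that always returns a 500."""
    calls.append(url)
    return FakeResponse(500)


try:
    always_failing_get("https://example.invalid/timeline", max_server_error_retries=3)
except TooManyServerErrors:
    pass  # one initial attempt plus three retries were made
```

With this shape, every caller of a decorated function has to supply the keyword argument, which is the cost of the pass-through approach.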

Given I haven't consulted with anyone on this change, please let me know what you think and I'm very happy to alter it!

Some questions I have for you all:

Do you think the client initialisation is an appropriate place to expose the parameter?
Should the parameter be exposed in the CLI as well as in the library?
I passed the parameter through all usages of Twarc.get() in client.py, but not through the usages in utils/deletes.py, which are a bit different because they initialise their own client. What are your thoughts on how they should handle the parameter?
Should I pass the parameter through Twarc.post() as well so posts are also configurable? Should there possibly be a separate parameter for posts rather than gets, since they serve different purposes and people may want different values?

For further context, I'm suggesting this change because we're seeing an increasing number of 500 errors returned from the v1.1 timeline endpoint, which is not surprising given everything that's going on at Twitter. For my usage of the endpoint, I'd much rather skip over that request earlier and move on to my next request - we are finding some requests do reach the 30 retries! I'd imagine that other people may want to retry for longer to increase their chances of getting the data they want.

igorbrigadir · 2022-11-29T21:08:35Z

This looks good to me! Thanks! Definitely useful to have, even though v1.1 is slowly being deprecated. I think that, because of staff cuts at Twitter, v1.1 may live on for a good while longer.

The places where post is called generally don't need to retry often. If they do, there are other, bigger problems and you shouldn't be retrying anyway, so I think that's fine.

SamHames · 2022-11-30T05:31:45Z

👍

For your questions:

> Do you think the client initialisation is an appropriate place to expose the parameter?

Yes, this is consistent with the other configurable HTTP and connection error settings. The implementation is fine, but it would be a little cleaner to inject self.max_server_error_retries from the client into the kwargs of the get method itself, rather than modifying all the call sites (similar to https://github.com/DocNow/twarc/blob/main/twarc/client.py#L830).

> Should the parameter be exposed in the CLI as well as in the library?

I'd be happy to merge without that. I suspect most ad hoc CLI usage has already migrated to v2.

> I passed the parameter through all usages of Twarc.get() in client.py, but not through the usages in utils/deletes.py, which are a bit different because they initialise their own client. What are your thoughts on how they should handle the parameter?

The utils can stay as they are with the defaults. I don't think it's worth trying to customise anything supporting v1.1 right now.

betsybookwyrm · 2022-12-02T05:09:12Z

> Yes, this is consistent with the other configurable HTTP and connection error settings. The implementation is fine, but it would be a little cleaner to inject self.max_server_error_retries from the client into the kwargs of the get method itself, rather than modifying all the call sites (similar to https://github.com/DocNow/twarc/blob/main/twarc/client.py#L830).

Hmm. I did look at ways of getting it in directly rather than passing it through, but it's really the rate_limit decorator that needs it, and rate_limit doesn't know anything about the Twarc object. In the end I thought it was simpler and clearer to pass it through the calls to the function. But I can keep playing with it if you like; it is definitely a bit of a pain to have to add it to all the calls of the functions that use the rate_limit decorator.
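
One possible middle ground, sketched here purely as an assumption and not as what the PR or twarc actually does: when a decorator wraps a method, the wrapper receives the instance as its first positional argument, so it could read the setting from there and fall back to 30. SketchClient and FakeResponse are hypothetical names, and the back-off sleep between attempts is omitted.

```python
import functools


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; only status_code matters."""

    def __init__(self, status_code):
        self.status_code = status_code


def rate_limit(f):
    """Sketch: retry on 5xx, taking the limit from the bound instance."""

    @functools.wraps(f)
    def wrapper(self, *args, **kwargs):
        # An explicit keyword wins; otherwise use the instance setting,
        # and fall back to 30 if the instance does not define one.
        max_retries = kwargs.pop(
            "max_server_error_retries",
            getattr(self, "max_server_error_retries", 30),
        )
        errors = 0
        while True:
            resp = f(self, *args, **kwargs)
            if resp.status_code < 500:
                return resp
            errors += 1
            if errors > max_retries:
                raise RuntimeError(f"gave up after {errors} server errors")
            # A real client would sleep here before the next attempt.

    return wrapper


class SketchClient:
    """Hypothetical client; only the retry plumbing is modelled."""

    def __init__(self, max_server_error_retries=30):
        self.max_server_error_retries = max_server_error_retries
        self.attempts = 0

    @rate_limit
    def get(self, url):
        # Always fails, so the retry limit is what ends the loop.
        self.attempts += 1
        return FakeResponse(500)


client = SketchClient(max_server_error_retries=2)
try:
    client.get("https://example.invalid/timeline")  # call site is unchanged
except RuntimeError:
    pass  # one initial attempt plus two retries were made
```

The trade-off is that the decorator now assumes it only ever wraps methods, which couples it to the client more tightly than the pass-through version.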

betsybookwyrm added the enhancement label Nov 29, 2022

igorbrigadir approved these changes Nov 29, 2022
