Retry Deepcell in case of bad gateway (or similar) response errors #605
Conversation
@ngreenwald whoops, I forgot to request the review earlier. I wasn't able to replicate the bad gateway/…
@ngreenwald since we make similar API calls later to our Redis database, I can implement the same…
Just tried it, looks like the error isn't being caught.
@ngreenwald in that case, it seems likely that the main endpoint doesn't actually throw a 502 error, but simply returns an HTML object indicating it encountered one (which can't be processed as JSON). I've explicitly added a check for…
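A minimal sketch of what a check like that could look like. The names here (`session` for the retry-mounted session, the URL, and the payload) are hypothetical stand-ins, not code taken from this PR:

```python
import requests

session = requests.Session()  # the retry-mounted session from this PR

# Hypothetical stand-ins for the real Deepcell endpoint and form data
deepcell_url = "https://deepcell.example/api/upload"
payload = {"file": "fov_batch.zip"}

response = session.post(deepcell_url, data=payload)

# The gateway can hand back an HTML "502 Bad Gateway" page rather than
# raising an HTTP error, so check before decoding the body as JSON.
if "text/html" in response.headers.get("Content-Type", ""):
    print("Deepcell returned an HTML error page instead of JSON")
else:
    try:
        upload_info = response.json()
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        print("Could not decode Deepcell's response as JSON")
```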
Looks like this did the trick. Now we need to deal with zip files that never got processed. There should be a status update that gets printed for the user with the name of the zip file that wasn't processed if the retry times out.
I think having the 5 retries in place should do the trick, but if it doesn't, we'll need a record of what happened so that we can add functionality to retry for those zip files that failed.
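As a rough illustration of that bookkeeping, here is a sketch; the `fov_zips` list and the `upload_zip` helper are hypothetical names for illustration, not functions from this repo:

```python
def upload_zip(zip_path):
    """Hypothetical wrapper around the retrying POST; returns True on success."""
    # the real version would POST the zip via the retry-mounted session
    return True

fov_zips = ["fov_batch_1.zip", "fov_batch_2.zip"]

# Keep a record of every zip that failed all retries
failed_zips = [z for z in fov_zips if not upload_zip(z)]

# Status update for the user, naming each zip that never got processed,
# so a later pass knows exactly which files to retry.
for zip_path in failed_zips:
    print(f"{zip_path} could not be processed after the retries timed out")
```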
I also added some print statements for debugging; you can remove those.
Here's how I'm thinking the process will work:
I can also add another status variable that will prevent this from getting stuck in a long cycle in case of a catastrophic error on Deepcell's end, but based on your findings using the retry module, I doubt this loop will need to run more than a few times at most.
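A sketch of how such a guard might look; the `get_status` callable and the cap value are illustrative assumptions, not code from this PR:

```python
import time

MAX_POLLS = 5  # hypothetical cap, mirroring the 5 HTTP retries

def poll_until_done(get_status, fov_name):
    """Poll a hypothetical `get_status` callable until the job finishes,
    bailing out after MAX_POLLS so a catastrophic Deepcell failure
    cannot trap us in a long cycle."""
    for attempt in range(MAX_POLLS):
        status = get_status(fov_name)
        if status == "done":
            return True
        if status == "failed":
            print(f"Deepcell reported a failure for {fov_name}")
            return False
        time.sleep(2 ** attempt)  # back off between polls
    print(f"Gave up on {fov_name} after {MAX_POLLS} status checks")
    return False
```

Whether the cap lives in a separate status variable or a loop bound is a detail; the point is a hard upper limit on the cycle.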
Yeah, I agree that's the long-term solution. However, for now I think we should just print a message when this error occurs. Switching the default batch_size to 5 and retrying on failures seems to have resolved all the problems for me. Once other users report that they're running into the same issue, we can make the change, but it will require some refactoring of how the function works, since right now the output zip file isn't uniquely named. So we'll need to make the output zip name match the input zip name, and also make sure that when the process is restarted the old zip files don't give a false positive for which file needs to get uploaded. Can you open an issue describing it?
Sounds good, I've just opened the issue. I think the now-modified print statements we have do a pretty good job of notifying the user; I'll also include an additional printout of the FOVs that couldn't be processed with normal retry logic under the check for…
What is the purpose of this PR?
Closes #602, which reports an error decoding the JSON when the Deepcell API returns a response indicating an error.
How did you implement your changes
Utilize the `Retry` class (from `urllib3`, exposed through `requests`) and mount it to a `requests.Session`, which we use to invoke the `post` method to Deepcell. We also add a `try`/`except` block around this statement in case none of the retries work and the resulting JSON cannot be decoded anyway.
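For reference, a minimal sketch of that setup. The parameter values are illustrative rather than copied from the PR diff, and `allowed_methods` assumes urllib3 >= 1.26 (older versions call it `method_whitelist`):

```python
from requests import Session
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry the POST a handful of times when the gateway misbehaves
retry_strategy = Retry(
    total=5,
    backoff_factor=1,
    status_forcelist=[502, 503, 504],
    allowed_methods=["POST"],  # Retry skips POST by default
)

session = Session()
session.mount("https://", HTTPAdapter(max_retries=retry_strategy))

# Placeholder URL; the real Deepcell endpoint lives in the project config
response = session.post("https://deepcell.example/api/upload", data={})

# If every retry fails, the body may not be JSON at all, so guard the decode
try:
    upload_info = response.json()
except ValueError:  # json.JSONDecodeError subclasses ValueError
    print("Deepcell response could not be decoded as JSON")
```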
Remaining issues
I'm a bit in the dark here because I've not encountered this error personally. @ngreenwald let me know if this resolves the issue, or if it will need additional work.