Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry forever to upload job chunks #898

Merged
merged 3 commits into from
Jan 18, 2019
Merged

Retry forever to upload job chunks #898

merged 3 commits into from
Jan 18, 2019

Conversation

keithpitt
Copy link
Member

Currently we wait a max of 50 seconds on each chunk before trying to upload it. If chunks are failing to upload, the main reason would be a problem with Buildkite. Presumably if they're failing to upload, finishing a job and starting a new one will also have the same problem (at which point we're just throwing chunk data away, to get more, and probably throw them away too).

This change tweaks the retry rules so we don't give up on chunks. It does handle cases that BK returns a 422, which "is" a successful chunk upload (whether or not we actually stored it is another thing), but a 422 does mean communications were successful, and Buildkite did "the right thing"

@keithpitt keithpitt requested review from sj26, lox and a team January 16, 2019 02:05
@sj26
Copy link
Member

sj26 commented Jan 16, 2019

image

We send 400 and 401 responses sometimes. I can't find any 422 responses.

400 Bad Request is sent for "Missing chunk" and "Job can't have chunks" at least. Both are unrecoverable and would halt an agent here.

401 is sent when the agent authorization fails, and is likely unrecoverable.

404 might be sent if a job disappears while chunks are being submitted, and is likely unrecoverable.

Perhaps 4XX codes should not infinitely retry, but 5XX should?

@keithpitt
Copy link
Member Author

@sj26 excellent point, changed!

Copy link
Member

@sj26 sj26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👍

@ticky
Copy link
Contributor

ticky commented Jan 16, 2019

Good sleuthing, @sj26! I need to remember how ridiculously powerful Insights is!

Copy link
Contributor

@toolmantim toolmantim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s merge and release? 🎉

Copy link
Contributor

@lox lox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems super sensible!

@lox lox merged commit c883fcb into master Jan 18, 2019
@lox lox deleted the chunk-retry-forever branch January 18, 2019 03:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants