Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor retry machinery to allow the use of exponential backoff #1588

Merged
merged 3 commits into from
May 3, 2022

Conversation

moskyb
Copy link
Contributor

@moskyb moskyb commented Mar 21, 2022

🤔 Problem: The retry util we currently use has a couple of issues:

  • There aren't any tests
  • Retry configuration errors happen when you try to use the retrier, rather than when you configure it
  • It only supports constant backoff - exponential backoff is really handy a lot of the time.

✅ Solution: This PR replaces the existing machinery (encapsulated in the retry.Do function and the retry.Config struct), with a struct called retry.Retrier. Retrier intentionally has a similar user interface to retry.Do, but it adds:

  • Configurable backoff strategies - currently constant and exponential, but we could theoretically do some freaky stuff with it if we really wanted to
  • Constructor panics when coding errors are made - eg, if a user puts a negative interval in to a constant strategy, that can never be valid and the program should complain loudly if it happens
  • Tests! Hooray!

💬 Things to discuss:

  • the math i use in the retry.Exponential function means that i can't have a base lower than 1 second, as i convert the time.Duration into an integer number of seconds, but if i don't do that, then the math gets all wack because time.Durations are integer numbers of nanoseconds - eg, math.Pow(2*time.Second, 2) != math.Pow(2, 2) * time.Second. is there a happy middle ground here? i'm embarrassed to say that my arithmetic skills are failing me here.
  • How do we feel about the functional constructor? i feel like i was overdoing it by adding it, but... i kinda like it
  • I've changed the artifact batch creation to use exponential backoff, but should we change other operations to use exponential? if so, which ones? This is no longer true, i'll change things to use expo backoff in another PR.

This PR will be most easily reviewed commit-by-commit

Closes PIP-234

@moskyb moskyb requested a review from a team March 21, 2022 03:15
@moskyb moskyb force-pushed the refactor-retry-config-to-be-pluggable branch from 70e0cf9 to b2fb634 Compare March 21, 2022 03:24
@pda
Copy link
Member

pda commented Mar 21, 2022

Ooh nice, I like the look of it.
(I haven't fully reviewed it yet.)

Copy link
Contributor

@tessereth tessereth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, YAY exponential backoff and jitter. I'm all for this.

I've only really checked out the retrier.go code in detail. I like it. But there's an annoying number of pointy bits in this kind of code. Have you explored pulling in a library for this rather than us rolling our own?

retry/retrier.go Outdated Show resolved Hide resolved
retry/retrier.go Outdated Show resolved Hide resolved
retry/retrier.go Outdated Show resolved Hide resolved
@moskyb
Copy link
Contributor Author

moskyb commented Apr 11, 2022

@tessereth

Have you explored pulling in a library for this rather than us rolling our own?

yeah, i looked into this a bit - there weren't any libraries that i really loved, and none as nice as eg. python's tenacity or ruby's retryable. i suspect that it's the kind of thing that gets reimplemented privately a lot, because the libraries out there aren't awesome - QED, that's exactly what i did 😅

@moskyb moskyb force-pushed the refactor-retry-config-to-be-pluggable branch 2 times, most recently from de5ee5a to e3719c8 Compare April 21, 2022 04:26
@moskyb moskyb requested review from tessereth and pda April 21, 2022 04:28
@moskyb moskyb force-pushed the refactor-retry-config-to-be-pluggable branch 2 times, most recently from 430e9e3 to cb442dd Compare April 21, 2022 04:57
// WithJitter enables jitter on the retrier, which will cause all of the retries to wait a random amount of time < 1 second
// The idea here is to avoid thundering herds - retries that are in parallel will happen at slightly different times when
// jitter is enabled, whereas if jitter is disabled, all the retries might happen at the same time, causing further load
// on the system that we're tryung to do something with

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: 'trying'

Copy link

@waferbaby waferbaby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me!

Copy link
Member

@sj26 sj26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it

@moskyb moskyb force-pushed the refactor-retry-config-to-be-pluggable branch from cb442dd to 17c1acc Compare April 27, 2022 04:14
@moskyb moskyb merged commit de27049 into main May 3, 2022
@moskyb moskyb deleted the refactor-retry-config-to-be-pluggable branch May 3, 2022 01:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants