Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential errors when computing exponential backoff and linear backoff #122

Open
sk- opened this issue Aug 19, 2020 · 8 comments
Open

Potential errors when computing exponential backoff and linear backoff #122

sk- opened this issue Aug 19, 2020 · 8 comments

Comments

@sk-
Copy link

sk- commented Aug 19, 2020

The code to compute the linear backoff and exponential backoff has some issues, as it does not wait in the first attempt, this is because currentRetryAttempt is initially set to 0 instead of 1. See

retry-axios/src/index.ts

Lines 183 to 188 in c284310

if (config.backoffType === 'linear') {
delay = config.currentRetryAttempt! * 1000;
} else if (config.backoffType === 'static') {
delay = config.retryDelay!;
} else {
delay = ((Math.pow(2, config.currentRetryAttempt!) - 1) / 2) * 1000;
.

It also does not consider the retryDelay, which according to the readme:

Milliseconds to delay at first. Defaults to 100.

Additionally, the formula for computing the exponential backoff should be something like:

Math.min((Math.pow(2, config.currentRetryAttempt) + Math.random()), MAX_DELAY) * 1000

See https://cloud.google.com/iot/docs/how-tos/exponential-backoff#example_algorithm

That would give retry times like:

1.234
2.314
4.012

Compare that to the times obtained with the current algorithm:

0
0.5
1.5
@sk-
Copy link
Author

sk- commented Aug 24, 2020

@JustinBeckwith would you accept PRs addressing this and other issues?.

@JustinBeckwith
Copy link
Owner

👋 absolutely! Would love PRs, especially ones that come with tests so I can really understand what was going wrong :)

@jgehrcke
Copy link
Contributor

jgehrcke commented Dec 16, 2020

@JustinBeckwith

about

so I can really understand what was going wrong :)

Have you read this? :)

it does not wait in the first attempt, this is because currentRetryAttempt is initially set to 0 instead of 1

Between the first attempt (attempt 0) and the first retry (attempt 1) there should be a non-zero delay. With the current code there isn't: there is no delay. That's one of two problems that @sk- was pointing out :).
((Math.pow(2, config.currentRetryAttempt!) - 1) / 2) * 1000; is 0 for currentRetryAttem being 0.

Same for the linear one: delay = config.currentRetryAttempt! * 1000; is 0 for ...

@JustinBeckwith
Copy link
Owner

👋 Saying "have you read this" is kind of being a turd - please don't be a turd. There were some changes here recently which may have resolved the issue. If folks are interested, I'd be happy to take a PR (as mentioned above), but I'm unlikely to dig in here soon.

@jgehrcke
Copy link
Contributor

jgehrcke commented Dec 16, 2020

Saying "have you read this" is kind of being a turd

Thanks for the feedback. I am sorry man, really didn't want to come across like that.

My addition to this ticket was meant in a neutral-friendly way, therefore also the ":)". It certainly was meant to be a productive contribution: it really appeared to me (based on the communication in here so far) as if you might have missed a specific problem description -- the one I have quoted.

Missing something happens to all of us. All I wanted is to ask and make sure that it's not just a simple misunderstanding. I got the impression that you missed it because you didn't comment on it and also suggested that you may not see/understand the specific problem(s) reported. Based on your "so I can really understand". Again: no blame, no stress -- all of this easily happens in a ticket like this (also because this bug report mixes two issues and does not have a precise title); and I wanted to help us align on a problem and/or acknowledge a problem description (which is one of the most important parts in my opinion for inviting contributors: define a problem to be solved rather well -- together).

@JustinBeckwith
Copy link
Owner

Loud and clear. Sorry if I was crass - I catch a lot of flack in issue trackers, and appreciate the clarification! I completely understand what you mean now, and apologize for being short.

On the issue itself - totally understand what folks are saying. What I was trying to get across is that I don't believe there are tests which dig into the specific timing of the retries. I'd like to avoid having a patch floated that "fixes" the issue without having a fairly in-depth suite of tests that specifically cover backoff expectations. If someone submitted a fix today for this, it's likely all the tests we have in place would just pass with no changes. After that, it's very likely that the next patch breaks it (or you know, I accidentally break it).

Thanks for bearing with my being grumpy.

@jgehrcke
Copy link
Contributor

Thank you for the kind words, Justin! Thanks for being vulnerable. I totally see where you're coming from with "I catch a lot of flack in issue trackers" -- and I am glad that we figured this out w/o feeling bad after all. Feels good that we talked through that; thanks for taking a moment to type up this response. I love open source also because of interactions like this.

jgehrcke added a commit to jgehrcke/retry-axios that referenced this issue Aug 16, 2021
See issue JustinBeckwith#122.

This is to confirm that the delay
between the first (actual) attempt
and the first retry is not zero,
but the configured delay.

tests: prettier format (noop)

tests: prettier format (noop)
jgehrcke added a commit to jgehrcke/retry-axios that referenced this issue Aug 16, 2021
jgehrcke added a commit to jgehrcke/retry-axios that referenced this issue Aug 16, 2021
The mutation did not affect the code below.
This is for issue JustinBeckwith#122.

calc delay: prettier (format)
@jgehrcke
Copy link
Contributor

Between the first attempt (attempt 0) and the first retry (attempt 1) there should be a non-zero delay. With the current code there isn't: there is no delay.

attempt to fix that in #163

JustinBeckwith pushed a commit that referenced this issue Aug 23, 2021
…and exp strategy (#163)

* tests: static backoff: check delay

See issue #122.

This is to confirm that the delay
between the first (actual) attempt
and the first retry is not zero,
but the configured delay.

tests: prettier format (noop)

tests: prettier format (noop)

* fix delay calc: non-zero between first two attempts

See issue #122.

* calc delay: fix currentRetryAttempt mutation

The mutation did not affect the code below.
This is for issue #122.

calc delay: prettier (format)

* tests: test delay for linear, exp strategies

* calc delay: mutate cfg on orig err obj

Found that state is not retained across
retries when mutating
`config.currentRetryAttempt` here.

* calc delay: use expressive variable name

* package.json: require nodejs 10.7.0 or newer

This is to address the `gts fix` error

    The 'process.hrtime.bigint' is not supported
    until Node.js 10.7.0. The configured version
    range is '>=10.0.0'

This function is only used in the test suite
so maybe it's not the best solution to change
the NodeJS version requirement for the
entire package. However, whoever runs this on
NodeJS older than 10.7.0 (released on
2018-07-18) might want to be encouraged to
use a more recent version.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants