Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(executor): Remove IsTransientErr check for ExponentialBackoff. Fixes #4144 #4149

Merged
merged 2 commits into from
Oct 2, 2020

Conversation

anggao
Copy link
Contributor

@anggao anggao commented Sep 28, 2020

fix(executor): Remove IsTransientErr check for ExponentialBackoff #4144

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this is a chore.
  • The title of the PR is (a) conventional, (b) states what changed, and (c) suffixes the related issues number. E.g. "fix(controller): Updates such and such. Fixes #1234".
  • I've signed the CLA.
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My builds are green. Try syncing with master if they are not.
  • My organization is added to USERS.md.

util/util.go Outdated Show resolved Hide resolved
@anggao
Copy link
Contributor Author

anggao commented Sep 28, 2020

I am wondering shall we tweak this config

var ExecutorRetry = wait.Backoff{
	Steps:    8,
	Duration: 1 * time.Second,
	Factor:   1.0,
	Jitter:   0.1,
}

e.g. to make factor > 1.0 ? otherwise the retry will have the same timeout ?

@alexec
Copy link
Contributor

alexec commented Sep 29, 2020

Re-running failed jobs.

@alexec
Copy link
Contributor

alexec commented Sep 29, 2020

I am wondering shall we tweak this config

var ExecutorRetry = wait.Backoff{
	Steps:    8,
	Duration: 1 * time.Second,
	Factor:   1.0,
	Jitter:   0.1,
}

e.g. to make factor > 1.0 ? otherwise the retry will have the same timeout ?

I am wondering shall we tweak this config

var ExecutorRetry = wait.Backoff{
	Steps:    8,
	Duration: 1 * time.Second,
	Factor:   1.0,
	Jitter:   0.1,
}

e.g. to make factor > 1.0 ? otherwise the retry will have the same timeout ?

Good idea.

Currently http://backoffcalculator.com/?attempts=8&rate=1

Retry Seconds Timestamp
1 1.00 2020-09-29 12:53:50
2 2.00 2020-09-29 12:53:51
3 3.00 2020-09-29 12:53:52
4 4.00 2020-09-29 12:53:53
5 5.00 2020-09-29 12:53:54
6 6.00 2020-09-29 12:53:55
7 7.00 2020-09-29 12:53:56
8 8.00 2020-09-29 12:53:57

Maybe? http://backoffcalculator.com/?attempts=5&rate=2&interval=0.5

Retry Seconds Timestamp
1 0.50 2020-09-29 12:53:49
2 1.50 2020-09-29 12:53:50
3 3.50 2020-09-29 12:53:52
4 7.50 2020-09-29 12:53:56
5 15.50 2020-09-29 12:54:04

@jessesuen ?

@anggao
Copy link
Contributor Author

anggao commented Oct 2, 2020

According to https://pkg.go.dev/k8s.io/apimachinery/pkg/util/wait#Backoff

	// The initial duration.
	Duration time.Duration
	// Duration is multiplied by factor each iteration, if factor is not zero
	// and the limits imposed by Steps and Cap have not been reached.
	// Should not be negative.
	// The jitter does not contribute to the updates to the duration parameter.
	Factor float64
	// The sleep at each iteration is the duration plus an additional
	// amount chosen uniformly at random from the interval between
	// zero and `jitter*duration`.
	Jitter float64

It seems with Factor: 1.0, the duration will always around 1 second ?

@alexec
Copy link
Contributor

alexec commented Oct 2, 2020

I think your read is correct

@anggao
Copy link
Contributor Author

anggao commented Oct 2, 2020

@alexec what about this config, it will retry 5 times, with maximum wait around 10s

var ExecutorRetry = wait.Backoff{
	Steps:    5,
	Duration: 1 * time.Second,
	Factor:   1.6,
	Jitter:   0.5,
}

@alexec
Copy link
Contributor

alexec commented Oct 2, 2020

I get a different plan?

http://backoffcalculator.com/?attempts=5&interval=1&rate=1.6

Retry Seconds Timestamp
1 1.00 2020-10-02 10:43:50
2 2.60 2020-10-02 10:43:52
3 5.16 2020-10-02 10:43:54
4 9.26 2020-10-02 10:43:58
5 15.81 2020-10-02 10:44:05

@anggao
Copy link
Contributor Author

anggao commented Oct 2, 2020

I feel that UI page maybe not implement the same way as the go module ?
e.g. I thought it should be 1 * 1.6 ^ 5 = 10.48576

@alexec
Copy link
Contributor

alexec commented Oct 2, 2020

I feel that UI page maybe not implement the same way as the go module ?
e.g. I thought it should be 1 * 1.6 ^ 5 = 10.48576

I think you're correct, I just used anther one and got: http://exponentialbackoffcalculator.com/

un Seconds Timestamp
0 0.000 02/10/2020, 18:17:34:173 UTC
1 1.000 02/10/2020, 18:17:35:173 UTC
2 2.600 02/10/2020, 18:17:36:773 UTC
3 5.160 02/10/2020, 18:17:39:333 UTC
4 9.256 02/10/2020, 18:17:43:429 UTC

Really!?!? That is also incorrect - even though the state the correct formula on the page.

@alexec
Copy link
Contributor

alexec commented Oct 2, 2020

@nggao to move this PR forward, do the backoff part in another PR?

@anggao
Copy link
Contributor Author

anggao commented Oct 2, 2020

@alexec Yeah make sense let's do it in another PR

@anggao
Copy link
Contributor Author

anggao commented Oct 2, 2020

#4196

@alexec
Copy link
Contributor

alexec commented Oct 2, 2020

Can you sync with master to force a rebuild?

@anggao
Copy link
Contributor Author

anggao commented Oct 2, 2020

Done, did a rebase

@alexec alexec merged commit 2b12762 into argoproj:master Oct 2, 2020
@anggao
Copy link
Contributor Author

anggao commented Oct 19, 2020

@alexec Do you know when this fix will be released, it seems not included in v2.11.5 ?

@alexec
Copy link
Contributor

alexec commented Oct 19, 2020

I've cherry-picked to v2.11.6.

@alexec alexec modified the milestones: v2.12, v2.11 Oct 19, 2020
@anggao
Copy link
Contributor Author

anggao commented Oct 19, 2020

@alexec thanks!

alexcapras pushed a commit to alexcapras/argo that referenced this pull request Nov 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants