fix: nodeAntiAffinity is not working as expected when boundaryID is empty. Fixes: #9193 #12701
Merged
terrytangyuan
merged 9 commits into
argoproj:main
from
shuangkun:fix/TakeEffectNodeAntiAffinity
May 17, 2024
shuangkun
added
area/controller
Controller issues, panics
area/retry-manual
Manual workflow "Retry" Action (API/CLI/UI). See retryStrategy for template-level retries
and removed
area/controller
Controller issues, panics
labels
Feb 26, 2024
shuangkun
force-pushed
the
fix/TakeEffectNodeAntiAffinity
branch
2 times, most recently
from
February 27, 2024 03:28
78a0641
to
3b5265c
Compare
github-merge-queue bot
pushed a commit
to linz/topo-workflows
that referenced
this pull request
Apr 11, 2024
#### Motivation

We want every task within our workflows to retry (twice) automatically, on a different node/host, when it fails (network issues).

#### Modification

Configure the `retryStrategy` at the `workflowDefaults` level.

Notes:

- The `nodeAntiAffinity` setting, which should prioritize retrying on a different node/host, is not working due to an issue in Argo Workflows. A [PR is opened](argoproj/argo-workflows#12701) in the Argo repo.
- This change makes every task retry on failure. Some of our tasks (for example, `stac-validate`, `tileindex-validate`) are expected to fail if the system invalidates something. They will still retry twice in that case. This will be handled in a separate PR.

#### Checklist

- [ ] Tests updated NA
- [x] Docs updated
- [x] Issue linked in Title
shuangkun
force-pushed
the
fix/TakeEffectNodeAntiAffinity
branch
from
April 12, 2024 01:31
f0ddd52
to
a8bd167
Compare
shuangkun
force-pushed
the
fix/TakeEffectNodeAntiAffinity
branch
from
April 12, 2024 03:04
b1c6679
to
f016cba
Compare
github-merge-queue bot
pushed a commit
to linz/topo-workflows
that referenced
this pull request
Apr 18, 2024
…545)

#### Motivation

We've noticed that karpenter has been erroring since #506, and the errors suggest the two could be related. Since the retry on a new node using `nodeAntiAffinity` [is not working](argoproj/argo-workflows#12701), it's a good idea to remove it to see whether that solves the `karpenter` issues.

#### Modification

Remove the `affinity` in the `retryStrategy`.

#### Checklist

- [ ] Tests updated
- [x] Docs updated
- [x] Issue linked in Title
agilgur5
added
area/retryStrategy
Template-level retryStrategy
and removed
area/retry-manual
Manual workflow "Retry" Action (API/CLI/UI). See retryStrategy for template-level retries
labels
Apr 25, 2024
tczhao
reviewed
May 1, 2024
shuangkun
force-pushed
the
fix/TakeEffectNodeAntiAffinity
branch
5 times, most recently
from
May 8, 2024 11:44
bf88ddf
to
66c932d
Compare
shuangkun
force-pushed
the
fix/TakeEffectNodeAntiAffinity
branch
2 times, most recently
from
May 9, 2024 08:07
efb2600
to
0e4a9d6
Compare
Signed-off-by: shuangkun <tsk2013uestc@163.com>
shuangkun
force-pushed
the
fix/TakeEffectNodeAntiAffinity
branch
from
May 9, 2024 08:08
0e4a9d6
to
db43c36
Compare
terrytangyuan
approved these changes
May 17, 2024
Labels
area/controller
Controller issues, panics
area/retryStrategy
Template-level retryStrategy
prioritized-review
For members of the Sustainability Effort
Fixes #9193
Motivation
The root cause is that the workflow's boundaryID is empty, so the retry node was not found and its HostNodeName could not be fetched.
This change finds the retry node and fetches the HostNodeName to use in the new pod's nodeAntiAffinity.
Modifications
Find the correct retry node when boundaryID is empty.
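As a rough illustration of the idea (not the code in this PR), the lookup can be sketched with simplified stand-in types. Everything below is hypothetical: `NodeStatus`, `findRetryParent`, and `hostsToAvoid` are illustrative names, and the sketch assumes the new pod node is already recorded as a child of its Retry node. The Retry node is located through the child relationship rather than through boundaryID, so the lookup still works when boundaryID is empty.

```go
package main

// NodeType and NodeStatus are simplified, hypothetical stand-ins for the
// workflow node status types; they are not the real Argo Workflows structs.
type NodeType string

const (
	NodeTypePod   NodeType = "Pod"
	NodeTypeRetry NodeType = "Retry"
)

type NodeStatus struct {
	ID           string
	Type         NodeType
	BoundaryID   string // may be empty, which is the case this sketch targets
	Children     []string
	HostNodeName string
}

// findRetryParent returns the Retry node that lists childID among its
// children. It does not consult BoundaryID at all.
func findRetryParent(nodes map[string]NodeStatus, childID string) (NodeStatus, bool) {
	for _, n := range nodes {
		if n.Type != NodeTypeRetry {
			continue
		}
		for _, c := range n.Children {
			if c == childID {
				return n, true
			}
		}
	}
	return NodeStatus{}, false
}

// hostsToAvoid collects the distinct host names used by the other attempts
// under the same Retry node, in attempt order. These are the hosts a new
// pod's node anti-affinity would exclude.
func hostsToAvoid(nodes map[string]NodeStatus, podNodeID string) []string {
	retry, ok := findRetryParent(nodes, podNodeID)
	if !ok {
		return nil
	}
	seen := map[string]bool{}
	var hosts []string
	for _, c := range retry.Children {
		if c == podNodeID {
			continue // skip the attempt being scheduled
		}
		child, found := nodes[c]
		if !found || child.HostNodeName == "" || seen[child.HostNodeName] {
			continue
		}
		seen[child.HostNodeName] = true
		hosts = append(hosts, child.HostNodeName)
	}
	return hosts
}
```

For a Retry node with children `a0` (which ran on `host-a`) and `a1` (the new attempt), `hostsToAvoid(nodes, "a1")` yields `host-a` even though every node's `BoundaryID` is empty.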
Verification
e2e tests and unit tests