Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Re-introduce 1 second sleep to reconcile informer #3684

Merged
merged 1 commit into from Aug 6, 2020

Conversation

simster7
Copy link
Member

@simster7 simster7 commented Aug 6, 2020

Fixes: #3665

This line was removed in v2.9.0, but issues arose with the controller picking up outdated versions of workflows, operating on them, then attempting to update the resources on the cluster. Since the workflows were outdated, conflicts arose when attempting to update, causing our conflict resolution code to be invoked. The conflict resolution code was not perfect and workflow information was not correctly persisted, causing undefined behavior.

Testing procedure:

  1. Make a build without this line.
  2. Run multiple workflows
  3. See undefined behavior (hanging pending pods, hanging running nodes, etc.); see the following log:
    Error updating workflow: Operation cannot be fulfilled on workflows.argoproj.io \""WF_NAME\"": the object has been modified; please apply your changes to the latest version and try again Conflict"
    
  4. Make a build containing this line.
  5. Run multiple workflows
  6. See no undefined behavior; don't see the problematic log.

Copy link
Contributor

@alexec alexec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the right solution, we should choose reliability over speed every time.

Backport to v2.9 and v2.10?

@alexec
Copy link
Contributor

alexec commented Aug 6, 2020

I think we should look at alternative methods to do this, but not today. Today, lets fix the bug with a know working solution.

@simster7 simster7 marked this pull request as ready for review August 6, 2020 19:25
@simster7 simster7 merged commit 41ebbe8 into argoproj:master Aug 6, 2020
@alexec
Copy link
Contributor

alexec commented Aug 6, 2020

@alexec
Copy link
Contributor

alexec commented Aug 6, 2020

Available in v2.9.5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Race conditions when persisting Workflows causing undefined behavior
2 participants