Skip to content

fix: Remove overly aggressive node deletion on parameter override (fixes #16055)#16056

Open
syhstanley wants to merge 3 commits intoargoproj:mainfrom
syhstanley:fix/issue-16055-regression
Open

fix: Remove overly aggressive node deletion on parameter override (fixes #16055)#16056
syhstanley wants to merge 3 commits intoargoproj:mainfrom
syhstanley:fix/issue-16055-regression

Conversation

@syhstanley
Copy link
Copy Markdown
Contributor

Summary

Fixes regression from PR #15827 where unrelated successful sibling nodes would disappear when retrying a single failed child node with parameter overrides.

Problem

When retrying a failing child node with parameter overrides (Issue #16055), successful sibling nodes mysteriously disappear from the workflow.

Root Cause: PR #15827 added logic that deletes ALL children of TaskGroup/StepGroup nodes when parameters are overridden, indiscriminately removing unrelated successful siblings.

Solution

Removes the problematic deletion logic from FormulateRetryWorkflow:

  • Only delete nodes in the failed retry path (the failed node + its descendants)
  • Preserve unrelated sibling nodes even when parameters are overridden
  • Updated to expect successful siblings to remain

Test Results

✅ All 13 existing FormulateRetryWorkflow tests pass
✅ New test verifies successful siblings are preserved
✅ Modified test passes with correct expectations

Files Changed

Fixes #16055

🤖 Generated with Claude Code

stanleyshen2003 and others added 3 commits March 31, 2026 01:06
…Fixes argoproj#15802

When retrying a workflow with parameter override that changes a
withParam/withItems/withSequence expansion, FormulateRetryWorkflow
now deletes all children of reset TaskGroup/StepGroup nodes.

Previously, stale expansion nodes (e.g. launch(0:pass) from the old
parameter value) survived the retry and accumulated alongside newly
expanded nodes, causing the workflow to get stuck in Running status.

Also align test assertions with codebase conventions:
- Use _, ok map access pattern instead of require.NotNil on value types
- Remove dead wfClientSet code
- Remove duplicate assert.Contains
- Condense production code comment

Signed-off-by: Stanley Shen <stanley.shen2003@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When retrying a workflow with parameter override, the code accesses
wf.Status.Nodes[nodeID] without checking if the nodeID exists. Since
Nodes is a map[string]NodeStatus, accessing a non-existent key returns
a zero value without panicking, but it's safer to explicitly check
existence using the comma-ok idiom.

This prevents potential issues where toReset might contain nodeIDs that
don't exist in wf.Status.Nodes in edge cases.

Also renamed the second 'ok' variable to 'okNode' to avoid shadowing
the first 'ok' variable.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Signed-off-by: Yu-Hong Shen <stanley.shen2003@gmail.com>
 argoproj#16055)

Reverts the problematic logic from PR argoproj#15827 that was deleting all children
of TaskGroup/StepGroup nodes when parameters were overridden during retry.

This caused a regression (issue argoproj#16055) where unrelated successful sibling nodes
would disappear when retrying a single failed child node.

The correct behavior: when retrying a node, only the failed node and its
descendants should be deleted. Unrelated sibling nodes should be preserved.

Also updates TestFormulateRetryWorkflowWithParamOverride to reflect the
correct expected behavior - preserving successful sibling nodes while still
properly handling the retry of failed nodes.

Fixes: argoproj#16055
Relates to: argoproj#15802, argoproj#15827

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Signed-off-by: Yu-Hong Shen <stanley.shen2003@gmail.com>
@syhstanley syhstanley force-pushed the fix/issue-16055-regression branch from c276e61 to 3164ebc Compare April 30, 2026 06:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Retry Node using withParam, retryStrategy, and parameter override causes unaffected nodes to disappear

2 participants