fix: preserve Failed status in determineStepCompletion #5941

Merged
EronWright merged 3 commits into main from EronWright/fix-fail-step-status on Mar 18, 2026

@EronWright EronWright commented Mar 17, 2026

Summary

determineStepCompletion had two issues that caused a step's Failed status to be silently overridden by Errored:

  1. Failed + IsTerminal — the single IsTerminal case applied "an unrecoverable error occurred: ..." and set Errored regardless of whether the step had signaled Failed. A step like fail that intentionally signals a business-logic failure ended up with Errored phase and a noisy wrapper message.

  2. Error threshold — when a non-terminal error exhausted retries, the status was unconditionally forced to Errored, discarding any Failed status the step had set on its last attempt.

Changes

  • Split the IsTerminal switch case into two explicit cases:
    • Failed && IsTerminal → preserve Failed, use err.Error() directly (no boilerplate prefix)
    • IsTerminal (non-Failed) → set Errored explicitly, keep "an unrecoverable error occurred: %s" prefix (technical failure deserves the annotation)
  • At error threshold: preserve meta.Status (Failed or Errored) instead of forcing Errored
  • Tests updated to assert the correct statuses/messages; new cases cover both Failed-preservation paths

Example (UI)

Before:

Screenshot 2026-03-17 at 1 56 06 PM

After:

image

Example (API)

Before:

apiVersion: kargo.akuity.io/v1alpha1
kind: Promotion
metadata:
  annotations:
    kargo.akuity.io/create-actor: admin
  creationTimestamp: "2026-03-17T20:55:55Z"
  generation: 1
  name: test.01kkysaw76xh3hxrmjegvcwpvs.d150bc0
  namespace: failer
  resourceVersion: "1636209"
  uid: 0ae0c5a1-19ba-434b-a28c-efaa6eee5e97
spec:
  freight: d150bc0d8c32dbb1b1e83bf7c9458b517c8c23a2
  stage: test
  steps:
  - as: failer
    config:
      message: 'fail #1'
    uses: fail
status:
  finishedAt: "2026-03-17T20:55:55Z"
  message: 'an unrecoverable error occurred: error running step "failer": failed with
    message: fail #1'
  phase: Errored
  startedAt: "2026-03-17T20:55:55Z"
  state: {}
  stepExecutionMetadata:
  - alias: failer
    finishedAt: "2026-03-17T20:55:55Z"
    message: 'an unrecoverable error occurred: error running step "failer": failed
      with message: fail #1'
    startedAt: "2026-03-17T20:55:55Z"
    status: Errored

After:

apiVersion: kargo.akuity.io/v1alpha1
kind: Promotion
metadata:
  annotations:
    kargo.akuity.io/create-actor: admin
  creationTimestamp: "2026-03-17T20:50:40Z"
  generation: 1
  name: test.01kkys18xnk114z13qxkckfma9.a4a7a40
  namespace: failer
  resourceVersion: "1635437"
  uid: 2a8e0641-effc-4403-944c-4a3b1066b4c2
spec:
  freight: a4a7a4040c7bc25f1918a1781e1acac9899356be
  stage: test
  steps:
  - as: failer
    config:
      message: 'fail #1'
    uses: fail
status:
  finishedAt: "2026-03-17T20:50:40Z"
  message: 'step "failer": failed: fail #1'
  phase: Failed
  startedAt: "2026-03-17T20:50:40Z"
  state: {}
  stepExecutionMetadata:
  - alias: failer
    finishedAt: "2026-03-17T20:50:40Z"
    message: 'step "failer": failed: fail #1'
    startedAt: "2026-03-17T20:50:40Z"
    status: Failed

Test plan

  • go test ./pkg/promotion/ -run TestLocalOrchestrator -v — all cases pass, including the two new ones
  • Manual: promote with a fail step → promotion phase is Failed (not Errored), message is the raw step message without "an unrecoverable error occurred:" prefix

@EronWright EronWright requested a review from a team as a code owner March 17, 2026 21:00
netlify bot commented Mar 17, 2026

Deploy Preview for docs-kargo-io ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | db7d9f4 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/docs-kargo-io/deploys/69b9ee0413d15200086e738a |
| 😎 Deploy Preview | https://deploy-preview-5941.docs.kargo.io |
@jessesuen jessesuen left a comment

Nice!

codecov bot commented Mar 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.39%. Comparing base (1cc28ce) to head (db7d9f4).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5941      +/-   ##
==========================================
+ Coverage   56.38%   56.39%   +0.01%     
==========================================
  Files         457      457              
  Lines       38236    38245       +9     
==========================================
+ Hits        21558    21567       +9     
  Misses      15401    15401              
  Partials     1277     1277              

@EronWright EronWright requested a review from jessesuen March 17, 2026 22:24
When a step signals Failed via a TerminalError, the promotion phase
and step status were overridden to Errored with a noisy
"an unrecoverable error occurred: ..." wrapper. This commit fixes two
related issues in determineStepCompletion:

1. Failed + IsTerminal: split into its own case so the Failed status
   is preserved and the message is the raw error text, without the
   boilerplate prefix reserved for technical failures.

2. Error threshold: preserve meta.Status (Failed or Errored) instead
   of unconditionally setting Errored.

Tests are updated to assert the correct statuses and messages, and new
cases cover the Failed-preservation paths.

Signed-off-by: Eron Wright <eron.wright@akuity.io>

Avoid wrapping intentional failure messages (e.g. from the fail step)
with the "error running step" prefix; only Errored (technical) results
get that prefix.

Signed-off-by: Eron Wright <eron.wright@akuity.io>

For Failed status, use a shorter "step %q:" prefix instead of
"error running step %q:" to identify which step produced the error
without implying a technical failure.

Signed-off-by: Eron Wright <eron.wright@akuity.io>

EronWright commented Mar 18, 2026

Updated test results (with fail step from #5918)

| Scenario | Phase | Message |
| --- | --- | --- |
| Config error (before) | Errored | an unrecoverable error occurred: error running step "bad-config": invalid git-push config: (root): Additional property bogus is not allowed |
| Config error (after) | Failed | step "bad-config": invalid git-push config: (root): Additional property bogus is not allowed |
| Intentional fail (before) | Errored | an unrecoverable error occurred: error running step "failer": failed with message: fail #1 |
| Intentional fail (after) | Failed | step "failer": failed: fail #1 |

The third commit (db7d9f4) adds a step "<alias>": prefix to non-technical errors (notably configuration errors) so the originating step is identifiable in the top-level promotion message, without the heavier error running step phrasing reserved for technical errors.
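As a rough illustration of the prefix choice described above, the following sketch picks the prefix from the step status. The `stepMessage` helper and the `Status` type are hypothetical names used only for this example; the format strings are the ones quoted in the commit messages.

```go
package main

import "fmt"

// Status is a hypothetical stand-in for the step status type.
type Status string

const (
	StatusFailed  Status = "Failed"
	StatusErrored Status = "Errored"
)

// stepMessage prefixes a step's error text with the step alias.
// Failed (non-technical) results get the short "step %q:" form;
// anything else gets the heavier "error running step %q:" form.
func stepMessage(alias string, status Status, cause string) string {
	if status == StatusFailed {
		return fmt.Sprintf("step %q: %s", alias, cause)
	}
	return fmt.Sprintf("error running step %q: %s", alias, cause)
}
```

With these assumptions, `stepMessage("failer", StatusFailed, "failed: fail #1")` yields `step "failer": failed: fail #1`, which matches the "Intentional fail (after)" message in the results above.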

@EronWright EronWright force-pushed the EronWright/fix-fail-step-status branch from f057334 to db7d9f4 Compare March 18, 2026 00:12
@EronWright EronWright enabled auto-merge March 18, 2026 00:15
@EronWright EronWright added this pull request to the merge queue Mar 18, 2026
Merged via the queue into main with commit 61d03fc Mar 18, 2026
22 checks passed
@EronWright EronWright deleted the EronWright/fix-fail-step-status branch March 18, 2026 00:45