Skip to content

fix(argo-workflows): handle Error phase webhook events#1112

Merged
jsbroks merged 3 commits intomainfrom
mleonidas/argo-workflow-webhook-missing-transition
May 7, 2026
Merged

fix(argo-workflows): handle Error phase webhook events#1112
jsbroks merged 3 commits intomainfrom
mleonidas/argo-workflow-webhook-missing-transition

Conversation

@mleonidas
Copy link
Copy Markdown
Collaborator

@mleonidas mleonidas commented May 7, 2026

Argo Workflows finalize as Error (controller/infra failure) or Failed (user-code exit). The webhook handler only mapped Failed; Error events silently dropped at the unmapped-phase guard, leaving jobs stuck in_progress.

Map Error -> JobStatus.Failure. Drop the workflowName fallback in getJobId since job.id is a UUID and workflowName never matches; surface missing labels as a warn log instead of a guaranteed no-op UPDATE. Add a notInArray guard on the UPDATE so the ~13 near-simultaneous sensor fires per workflow can't regress a terminal job. Add structured logs at every drop/update path so future silent failures stay loud.

Summary by CodeRabbit

  • Bug Fixes

    • Prevents completed jobs from being reverted by delayed webhook events through improved idempotent database update logic.
    • Validates presence of required job identifiers in webhook requests to prevent silent processing failures.
  • Tests

    • Added comprehensive test coverage for workflow phase-to-status mapping functionality with various scenarios.
  • Chores

    • Enhanced contextual logging for incoming webhook requests and job status update operations.

Argo Workflows finalize as `Error` (controller/infra failure) or `Failed`
(user-code exit). The webhook handler only mapped `Failed`; `Error` events
silently dropped at the unmapped-phase guard, leaving jobs stuck in_progress.

Map `Error` -> JobStatus.Failure. Drop the workflowName fallback in getJobId
since job.id is a UUID and workflowName never matches; surface missing labels
as a warn log instead of a guaranteed no-op UPDATE. Add a notInArray guard on
the UPDATE so the ~13 near-simultaneous sensor fires per workflow can't regress
a terminal job. Add structured logs at every drop/update path so future silent
failures stay loud.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 7, 2026

Review Change Stack

Warning

Rate limit exceeded

@mleonidas has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 27 minutes and 51 seconds before requesting another review.

To continue reviewing without waiting, purchase usage credits in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e831509d-b53b-405e-9b1e-f8900c38e2c2

📥 Commits

Reviewing files that changed from the base of the PR and between 6bb539e and e5a458f.

📒 Files selected for processing (3)
  • apps/api/Dockerfile
  • apps/api/src/routes/argoworkflow/index.ts
  • apps/api/src/routes/argoworkflow/workflow.ts
📝 Walkthrough

Walkthrough

This PR enhances Argo workflow webhook handling with explicit status mapping, strict input validation, and idempotent database updates. It adds structured logging at the webhook entry point and throughout job status updates to improve observability while preventing race conditions where late events could overwrite terminal job states.

Changes

Argo Workflow Status & Idempotency

Layer / File(s) Summary
Logger Imports
apps/api/src/routes/argoworkflow/workflow.ts, apps/api/src/routes/argoworkflow/index.ts
Adds logger import from @ctrlplane/logger to both the webhook route handler and workflow processing logic.
Status Mapping Contract
apps/api/src/routes/argoworkflow/workflow.ts
Introduces statusMap object mapping Argo phase strings (Pending, Running, Succeeded, Failed, Error) to JobStatus enum values, and exports `mapTriggerToStatus(trigger: string): JobStatus
Webhook Request Logging
apps/api/src/routes/argoworkflow/index.ts
Logs receipt of Argo webhook with structured context fields (jobAgentId, workflowName, uid, phase, jobId, eventType) before delegating to handler.
Workflow Processing Validation
apps/api/src/routes/argoworkflow/workflow.ts
Changes getJobId to return only payload.jobId (no fallback to workflowName). Updates handleArgoWorkflow to validate jobId and phase presence with early returns and warning logs for missing or unmapped values.
Idempotent Job Update
apps/api/src/routes/argoworkflow/workflow.ts
Constrains job status update with .where(and(eq(job.id, jobId), notInArray(job.status, exitedStatus))) to prevent late non-terminal events from overwriting terminal job states.
Update Result Logging & Branching
apps/api/src/routes/argoworkflow/workflow.ts
Adds conditional logging: returns early with info log when update affects zero rows (indicating duplicate/late event), logs info on successful update.
Status Mapping Tests
apps/api/src/routes/argoworkflow/__tests__/workflow.test.ts
Adds Vitest test suite for mapTriggerToStatus verifying correct mapping of known Argo phases to JobStatus and null return for unknown/case-mismatched phases.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A rabbit hops through phases bright,
From Pending through to Success's light,
No late events shall turn back time,
The job stays done—idempotent and fine! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically identifies the main change: handling Error phase webhook events in Argo Workflows, which aligns with the PR's primary objective to map the Error phase to JobStatus.Failure.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch mleonidas/argo-workflow-webhook-missing-transition

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
apps/api/src/routes/argoworkflow/index.ts (1)

51-59: ⚡ Quick win

Add an explicit payload type before logging/handling.

req.body is used as an untyped object in this path; adding a concrete payload type (or runtime-validated parse result) here would make Line 53–57 field access and the handleArgoWorkflow call safer.

As per coding guidelines, "**/*.{ts,tsx}: Use explicit types in TypeScript code".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/api/src/routes/argoworkflow/index.ts` around lines 51 - 59, The code
uses req.body as an untyped payload when calling logger.info and
handleArgoWorkflow; declare or import a concrete payload type (e.g.,
ArgoWorkflowPayload) and replace the untyped payload with a typed/validated
variable before logging and calling handleArgoWorkflow—either by typing the
parsed body (const payload: ArgoWorkflowPayload = req.body) or by
runtime-validating and casting (e.g., with zod/validator) to produce a
typedPayload; then pass that typedPayload to logger.info and handleArgoWorkflow
so field access and the function call are statically/clinically safe.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@apps/api/src/routes/argoworkflow/index.ts`:
- Around line 51-59: The code uses req.body as an untyped payload when calling
logger.info and handleArgoWorkflow; declare or import a concrete payload type
(e.g., ArgoWorkflowPayload) and replace the untyped payload with a
typed/validated variable before logging and calling handleArgoWorkflow—either by
typing the parsed body (const payload: ArgoWorkflowPayload = req.body) or by
runtime-validating and casting (e.g., with zod/validator) to produce a
typedPayload; then pass that typedPayload to logger.info and handleArgoWorkflow
so field access and the function call are statically/clinically safe.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cf10fc61-3d86-416f-89a9-590d30f17aea

📥 Commits

Reviewing files that changed from the base of the PR and between 359044f and 6bb539e.

📒 Files selected for processing (3)
  • apps/api/src/routes/argoworkflow/__tests__/workflow.test.ts
  • apps/api/src/routes/argoworkflow/index.ts
  • apps/api/src/routes/argoworkflow/workflow.ts

mleonidas added 2 commits May 7, 2026 10:18
pnpm 11 places global binaries in $PNPM_HOME/bin, not $PNPM_HOME directly.
The previous PATH entry pointed at /pnpm so pnpm refused to install globals
with "configured global bin directory /pnpm/bin is not in PATH". Surfaced
when corepack@latest pulled pnpm 11.0.8; older pnpm tolerated the mismatch.
Export ArgoWorkflowPayload and cast req.body to it in the route handler so
field access in the entry log and the call into handleArgoWorkflow is checked
by tsc instead of relying on Express's `any`-typed body. Drop the now-redundant
optional chaining on payload fields (lint flagged them as unnecessary). No
runtime validation added — webhook stays auth-gated by shared secret, matching
the TFE and GitHub webhook handlers.
@jsbroks jsbroks merged commit c3e738f into main May 7, 2026
10 checks passed
@jsbroks jsbroks deleted the mleonidas/argo-workflow-webhook-missing-transition branch May 7, 2026 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants