fix(workflow-executor): reject duplicate triggers of in-flight runs (PRD-468)#1634
fe27c82 to
431cb08
Compare
|
Nice work: the boundary error + the
Description ↔ code mismatch
The description says the in-flight check is done twice (early + late) and that "the late check is the correctness guard … removing it in favor of the early one alone would reintroduce double execution". But the diff actually removes the late check (
Why this matters: I checked the orchestrator side.
UPDATE "workflowRuns" SET "lockedAt"=NOW(), "runState"='loading'
WHERE id IN ( SELECT wr.id ... WHERE wr."runState"='pending' AND wr.id=:runId ...
FOR UPDATE SKIP LOCKED )
RETURNING *
→ two concurrent calls: one claims, the other gets
But for a
The test gives false confidence here:
Suggested resolution (either):
Minor (non-blocking): |
…PRD-468)

A duplicate trigger for a run already executing on this instance was silently accepted (200) after a wasted orchestrator round-trip: the inFlightRuns check sat after getAvailableRun and only `return`ed.

triggerPoll now checks inFlightRuns up front via assertRunNotInFlight, before the orchestrator round-trip, throwing the new boundary RunAlreadyInFlightError (mapped to 400 by handleTrigger, so the front knows the trigger was rejected, not silently accepted).

This is a best-effort local optimization, NOT a concurrency guard: inFlightRuns is per-instance and a deployment can run several executors, so genuine duplicate-execution prevention is the orchestrator's job. It atomically claims a pending run, and concurrent triggers (on this or another executor) get nothing back.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
431cb08 to
1e0f685
Compare
|
Thanks for the deep read. You're right that the commit message was stale (it described a late check that the diff removed); I've fixed the description and the commit message.
On the substance: I'm intentionally not restoring the late check. So the up-front
Re 400 vs 409: agreed, 409 reads better; keeping 400 per the ticket.

What
A duplicate trigger for a run already executing on this instance was silently accepted (`200`) after a wasted orchestrator round-trip: the `inFlightRuns` check sat after `getAvailableRun` and only `return`ed. `triggerPoll` now checks `inFlightRuns` up front (`assertRunNotInFlight`), before the orchestrator round-trip, and throws the boundary `RunAlreadyInFlightError` → `handleTrigger` maps it to `400` so the front knows the trigger was rejected, not silently accepted.
Important: this is an optimization, not a concurrency guard
`inFlightRuns` is per-instance and a deployment can run several executors, so an executor-side check can never be the duplicate-execution guard. Genuine concurrency dedup is the orchestrator's job: it atomically claims a `pending` run (`UPDATE … WHERE runState='pending' … FOR UPDATE SKIP LOCKED`), so concurrent triggers, on this executor or another, get nothing back.
The up-front check is purely a best-effort local short-circuit to skip a useless orchestrator round-trip when this instance already knows it's running the run.
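To make the distinction concrete, here is a minimal in-memory sketch of two executors sharing one orchestrator. It is not the project's code: `claimPendingRun`, `ExecutorSketch`, `TriggerOutcome`, and `demo` are hypothetical names, and the `Map` merely stands in for the `workflowRuns` table. In the real system the atomicity comes from the database claim, not from single-threaded JavaScript.

```typescript
// Illustrative model only. Apart from inFlightRuns and the
// 'pending'/'loading' run states, every name here is hypothetical.
type RunState = "pending" | "loading";

interface WorkflowRun {
  id: string;
  runState: RunState;
}

type TriggerOutcome = "rejected-locally" | "claimed" | "nothing-to-claim";

// Stand-in for the orchestrator's atomic claim: only a pending run can be
// claimed, and claiming flips it to loading so a second claim gets null.
function claimPendingRun(
  runs: Map<string, WorkflowRun>,
  runId: string,
): WorkflowRun | null {
  const run = runs.get(runId);
  if (run === undefined || run.runState !== "pending") {
    return null;
  }
  run.runState = "loading";
  return run;
}

class ExecutorSketch {
  // Per-instance state: other executors cannot see this set.
  readonly inFlightRuns = new Set<string>();

  trigger(runs: Map<string, WorkflowRun>, runId: string): TriggerOutcome {
    if (this.inFlightRuns.has(runId)) {
      // Local short-circuit: skips the orchestrator round-trip entirely.
      return "rejected-locally";
    }
    const run = claimPendingRun(runs, runId);
    if (run === null) {
      // The orchestrator is what actually prevents double execution.
      return "nothing-to-claim";
    }
    this.inFlightRuns.add(run.id);
    return "claimed";
  }
}

function demo(): TriggerOutcome[] {
  const runs = new Map<string, WorkflowRun>([
    ["run-1", { id: "run-1", runState: "pending" }],
  ]);
  const executorA = new ExecutorSketch();
  const executorB = new ExecutorSketch();
  return [
    executorA.trigger(runs, "run-1"), // A claims the run
    executorB.trigger(runs, "run-1"), // B's local set is empty; the claim fails
    executorA.trigger(runs, "run-1"), // A rejects without a round-trip
  ];
}
```

In this model executor B never consults executor A's set, yet it still cannot start the run, which is why the local check can only ever save a round-trip.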
Changes
- `assertRunNotInFlight(runId)` early in `triggerPoll` (+ explanatory comment on the intent)
- `RunAlreadyInFlightError`, mapped to `400` in `handleTrigger`

fixes PRD-468
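For readers without the diff open, the change could look roughly like the sketch below. Only the names `triggerPoll`, `assertRunNotInFlight`, `inFlightRuns`, `getAvailableRun`, and `RunAlreadyInFlightError` come from this PR. The `Set`, the `OrchestratorClient` and `ClaimedRun` interfaces, the `execute` callback, and the error message are assumptions made for illustration.

```typescript
// Boundary error named in the PR; the message and runId field are assumed.
class RunAlreadyInFlightError extends Error {
  readonly runId: string;

  constructor(runId: string) {
    super(`Run ${runId} is already in flight on this executor`);
    this.name = "RunAlreadyInFlightError";
    this.runId = runId;
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, RunAlreadyInFlightError.prototype);
  }
}

// Hypothetical shapes for the orchestrator client and a claimed run.
interface ClaimedRun {
  id: string;
}

interface OrchestratorClient {
  getAvailableRun(runId: string): Promise<ClaimedRun | null>;
}

class Runner {
  // Assumed to be a Set of run IDs; it is per-instance by construction.
  private readonly inFlightRuns = new Set<string>();
  private readonly orchestrator: OrchestratorClient;
  private readonly execute: (run: ClaimedRun) => Promise<void>;

  constructor(
    orchestrator: OrchestratorClient,
    execute: (run: ClaimedRun) => Promise<void>,
  ) {
    this.orchestrator = orchestrator;
    this.execute = execute;
  }

  // Best-effort local check: throws if this instance is already running the run.
  assertRunNotInFlight(runId: string): void {
    if (this.inFlightRuns.has(runId)) {
      throw new RunAlreadyInFlightError(runId);
    }
  }

  async triggerPoll(runId: string): Promise<void> {
    // Checked before the orchestrator round-trip, so a known duplicate
    // costs nothing and surfaces as a rejected promise.
    this.assertRunNotInFlight(runId);

    const run = await this.orchestrator.getAvailableRun(runId);
    if (run === null) {
      return; // nothing claimable: the orchestrator handed the run elsewhere
    }

    this.inFlightRuns.add(run.id);
    try {
      await this.execute(run);
    } finally {
      this.inFlightRuns.delete(run.id);
    }
  }
}
```

Note that in this sketch two triggers arriving on the same instance before the first `getAvailableRun` resolves would both pass the local check; only the orchestrator's claim separates them, which matches the "optimization, not a guard" framing.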
🤖 Generated with Claude Code
Note
Reject duplicate triggers of in-flight workflow runs with a 400 error
- `Runner.triggerPoll` previously silently skipped duplicate triggers; it now throws `RunAlreadyInFlightError` via a new `assertRunNotInFlight` helper when the run ID is already being processed.
- `POST /runs/:runId/trigger` endpoint in `executor-http-server.ts` catches `RunAlreadyInFlightError` and returns a 400 response with the error message.

Macroscope summarized 1e0f685.
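The error-to-status mapping summarized above might be sketched as follows. This is framework-agnostic on purpose, since the HTTP library behind `executor-http-server.ts` is not shown here. The `TriggerResponse` shape, the `mapTriggerError` helper, the `handleTrigger` signature, the response body, and the error message are all assumptions.

```typescript
// Boundary error named in the PR; the message format is assumed.
class RunAlreadyInFlightError extends Error {
  constructor(runId: string) {
    super(`Run ${runId} is already in flight`);
    this.name = "RunAlreadyInFlightError";
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, RunAlreadyInFlightError.prototype);
  }
}

// Hypothetical, framework-independent response shape.
interface TriggerResponse {
  status: number;
  body: { error?: string };
}

// Maps known boundary errors to a response; returns null for anything else
// so unexpected failures are not mislabeled as client errors.
function mapTriggerError(err: unknown): TriggerResponse | null {
  if (err instanceof RunAlreadyInFlightError) {
    return { status: 400, body: { error: err.message } };
  }
  return null;
}

// Sketch of the trigger handler: 200 on success, 400 for a duplicate
// trigger, and a rethrow for errors this layer does not recognize.
async function handleTrigger(
  runId: string,
  triggerPoll: (runId: string) => Promise<void>,
): Promise<TriggerResponse> {
  try {
    await triggerPoll(runId);
    return { status: 200, body: {} };
  } catch (err) {
    const mapped = mapTriggerError(err);
    if (mapped !== null) {
      return mapped;
    }
    throw err;
  }
}
```

Keeping the mapping in a small pure function makes the 400 (or a future 409, as discussed in the review) a one-line change that can be tested without starting a server.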