
feat(workflow-executor): add graceful shutdown with in-flight step drain #1512

Merged
matthv merged 8 commits into feat/prd-214-setup-workflow-executor-package from feature/prd-241-graceful-shutdown-executor on Mar 27, 2026


matthv commented Mar 25, 2026

Summary

  • Graceful drain: stop() now waits for in-flight steps to complete before closing resources
  • State getter: runner.state exposes the lifecycle (idle → running → draining → stopped)
  • Configurable timeout: stopTimeoutMs (default 30s) prevents stop() from hanging on stuck steps
  • HTTP server stays up during drain, so the frontend can still query run status while steps finish
  • Logger.info: optional method added to the Logger interface for drain status messages
  • Signal handling is the consumer's responsibility (Runner is a library class)

How it works

SIGTERM received
  → stop() called
    → state = 'draining', polling stopped
    → await in-flight steps (with timeout)
    → close HTTP server, AI client, RunStore
    → state = 'stopped'
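
The sequence above can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not the package's actual Runner: RunnerSketch, trackStep, and the closeResources callback are hypothetical names, polling is omitted, and the 30 s default mirrors stopTimeoutMs. It shows the state transitions, an in-flight Map of step promises, and a drain that races Promise.allSettled against a timeout.

```typescript
type RunnerState = 'idle' | 'running' | 'draining' | 'stopped';

interface Logger {
  error(message: string): void;
  info?(message: string): void; // optional, as on the PR's Logger port
}

// Hypothetical class illustrating the drain; not the package's Runner.
class RunnerSketch {
  private current: RunnerState = 'idle';
  // A Map (not a Set) keeps each step's promise so stop() can await it.
  private readonly inFlightSteps = new Map<string, Promise<void>>();

  constructor(
    private readonly logger: Logger,
    private readonly closeResources: () => Promise<void>, // hypothetical: AI client, RunStore, ...
    private readonly stopTimeoutMs = 30_000,
  ) {}

  get state(): RunnerState {
    return this.current;
  }

  start(): void {
    // No restart after stop, and no start while draining.
    if (this.current !== 'idle') {
      throw new Error(`Cannot start a runner in state '${this.current}'`);
    }
    this.current = 'running';
  }

  // Track a step until it settles; the entry is removed either way.
  trackStep(stepId: string, work: Promise<void>): void {
    const tracked = work.finally(() => this.inFlightSteps.delete(stepId));
    tracked.catch(() => undefined); // avoid an unhandled rejection; stop() inspects results it waits on
    this.inFlightSteps.set(stepId, tracked);
  }

  async stop(): Promise<void> {
    if (this.current !== 'running') return; // re-entrancy guard: idle/draining/stopped → no-op
    this.current = 'draining'; // a real runner would also stop polling here

    let drainTimer: ReturnType<typeof setTimeout> | undefined;
    try {
      this.logger.info?.(`Draining ${this.inFlightSteps.size} in-flight step(s)`);
      const drained = Promise.allSettled([...this.inFlightSteps.values()]).then((results) => {
        for (const result of results) {
          if (result.status === 'rejected') {
            this.logger.error(`Step failed during drain: ${String(result.reason)}`);
          }
        }
        return true;
      });
      const timedOut = new Promise<boolean>((resolve) => {
        drainTimer = setTimeout(() => resolve(false), this.stopTimeoutMs);
      });

      if (await Promise.race([drained, timedOut])) {
        this.logger.info?.('Drain complete');
      } else {
        this.logger.error(`Drain timed out after ${this.stopTimeoutMs}ms`);
      }
    } finally {
      clearTimeout(drainTimer); // cancels the timeout when the drain wins; no-op if it already fired
      try {
        await this.closeResources();
      } finally {
        this.current = 'stopped'; // reached even if closing a resource throws
      }
    }
  }
}
```

Racing against a timer rather than awaiting the steps directly is what keeps stop() from hanging on a stuck step; clearing the timer in finally means a fast drain leaves no pending timeout behind.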

Test plan

  • State transitions: idle → running → draining → stopped
  • State resets to idle on start failure
  • stop() waits for in-flight steps before resolving
  • stop() resolves after timeout when step is stuck (logs error)
  • stop() resolves immediately when no steps in flight
  • HTTP server closed after drain, not before
  • Drain info/completion logged via Logger.info
  • 49 runner tests pass, lint clean

fixes PRD-241

🤖 Generated with Claude Code

Note

Add graceful shutdown with in-flight step drain to workflow executor

  • Runner gains a RunnerState (idle → running → draining → stopped) and stop() now waits for all in-flight steps to complete before shutting down, with a configurable stopTimeoutMs (default 30s).
  • buildInMemoryExecutor and buildDatabaseExecutor now return a WorkflowExecutor facade with start(), stop(), and state; signal handlers for SIGTERM/SIGINT are registered on start and removed on stop.
  • A new unauthenticated GET /health endpoint on ExecutorHttpServer returns 200 when state is running or draining, and 503 otherwise.
  • ConsoleLogger and the Logger port gain an optional info() method used to log drain progress.
  • Behavioral Change: Runner can no longer be restarted after reaching stopped or draining state, and start() no longer manages an HTTP server directly.

Macroscope summarized ba45db9.


linear bot commented Mar 25, 2026


qltysh bot commented Mar 25, 2026

Qlty

Coverage Impact

Unable to calculate total coverage change because base branch coverage was not found.

Modified Files with Diff Coverage (4)

| Rating | File | % Diff | Uncovered Line #s |
|---|---|---|---|
| A (new file) | packages/workflow-executor/src/adapters/console-logger.ts | 100.0% | |
| A (new file) | packages/workflow-executor/src/runner.ts | 100.0% | |
| B (new file) | packages/workflow-executor/src/build-workflow-executor.ts | 74.4% | 97-114 |
| A (new file) | packages/workflow-executor/src/http/executor-http-server.ts | 100.0% | |
| Total | | 87.0% | |


qltysh bot commented Mar 25, 2026

3 new issues

| Tool | Category | Rule | Count |
|---|---|---|---|
| qlty | Structure | Function with high complexity (count = 14): createWorkflowExecutor | 3 |

matthv force-pushed the feature/prd-241-graceful-shutdown-executor branch from 5394fc7 to 17e709a on March 26, 2026 14:44
matthv and others added 2 commits March 26, 2026 18:39
- stop() now drains in-flight steps before closing resources
- Add Runner.state getter: idle → running → draining → stopped
- Add stopTimeoutMs config (default 30s) to prevent hanging on stuck steps
- Convert inFlightSteps from Set to Map to track step promises
- HTTP server stays up during drain for frontend access
- Add Logger.info optional method for drain status messages
- 7 new tests: drain, timeout, state transitions, log messages

fixes PRD-241

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Public endpoint (no JWT required) that returns the Runner state:
- 200 { state: 'running' } or { state: 'draining' }
- 503 { state: 'stopped' } or { state: 'idle' }

Usable as k8s readiness probe, ECS health check, or manual curl.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
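
The status mapping in this commit is small enough to show as a pure function. healthResponse and HealthResponse are hypothetical names; how ExecutorHttpServer actually routes GET /health is not shown in this PR description, so the HTTP wiring is left out of this sketch.

```typescript
type RunnerState = 'idle' | 'running' | 'draining' | 'stopped';

interface HealthResponse {
  status: 200 | 503;
  body: { state: RunnerState };
}

// Hypothetical helper mirroring the mapping in the commit message above.
function healthResponse(state: RunnerState): HealthResponse {
  // 'draining' still counts as healthy: the server keeps answering
  // run-status queries while in-flight steps finish.
  const healthy = state === 'running' || state === 'draining';
  return { status: healthy ? 200 : 503, body: { state } };
}
```

Keeping the mapping separate from the server makes it easy to unit-test every state without starting an HTTP listener.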
Scra3 force-pushed the feature/prd-241-graceful-shutdown-executor branch from bf5ca08 to 371befc on March 26, 2026 17:40
alban bertolini and others added 2 commits March 27, 2026 09:20
Register process signal handlers in start(), remove them in stop().
On SIGTERM or SIGINT, the runner drains in-flight steps then exits.
If stop() fails, exits with code 1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
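
A sketch of the register-in-start, remove-in-stop pattern. It assumes a minimal SignalTarget interface so the handlers can be exercised without sending real signals (in production you would pass Node's process). SignalAwareExecutor and the injected setExitCode callback are hypothetical; a later commit in this PR switches to setting process.exitCode rather than calling process.exit().

```typescript
type ShutdownSignal = 'SIGTERM' | 'SIGINT';
const SHUTDOWN_SIGNALS: readonly ShutdownSignal[] = ['SIGTERM', 'SIGINT'];

// The slice of Node's `process` this sketch relies on; pass `process` in real use.
interface SignalTarget {
  on(signal: ShutdownSignal, listener: () => void): unknown;
  off(signal: ShutdownSignal, listener: () => void): unknown;
}

// Hypothetical wrapper: handlers are added in start() and removed in stop().
class SignalAwareExecutor {
  // One stable function reference, so off() removes exactly what on() added.
  private readonly onSignal = (): void => {
    this.stop().catch(() => this.setExitCode(1)); // failed shutdown → exit code 1
  };

  constructor(
    private readonly target: SignalTarget,
    private readonly startRunner: () => Promise<void>,
    private readonly stopRunner: () => Promise<void>,
    private readonly setExitCode: (code: number) => void,
  ) {}

  async start(): Promise<void> {
    await this.startRunner();
    for (const signal of SHUTDOWN_SIGNALS) this.target.on(signal, this.onSignal);
  }

  async stop(): Promise<void> {
    for (const signal of SHUTDOWN_SIGNALS) this.target.off(signal, this.onSignal);
    await this.stopRunner();
  }
}
```

Holding the listener in a field matters: passing a fresh arrow function to off() would not match the one given to on(), and the handlers would leak.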
…dlers

- Runner is now a pure library: no HTTP server, no process signals
- Factory functions (buildInMemoryExecutor/buildDatabaseExecutor) compose
  Runner + ExecutorHttpServer + SIGTERM/SIGINT handlers
- Fix review issues:
  - Re-entrancy guard in stop() (idle/stopped/draining → return)
  - Clear drain timer when drain succeeds
  - Inspect Promise.allSettled results and log failures
  - process.exitCode instead of process.exit()
  - HTTP server stop wrapped in try-catch
- WorkflowExecutor interface now exposes state getter

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- httpPort now required in ExecutorOptions (executor always needs HTTP)
- start() rejects on stopped Runner (cannot restart after stop)
- stop() uses finally block to guarantee state reaches 'stopped'
- logger.info called with ?. (info is optional on Logger interface)
- drainTimer initialized as undefined, cleared on success
- Shared shutdown promise (concurrent callers await the same shutdown)
- Safety net: force exit after 5s if event loop doesn't drain (.unref())
- HTTP server stop wrapped in try-catch (failure doesn't block drain)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
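
Three of the points above (shared shutdown promise, HTTP stop wrapped in try-catch, 5 s .unref() safety net) fit in one small function. This is a hedged sketch: createShutdown, Stoppable, and the injected forceExit callback are hypothetical, and it closes the HTTP server after the runner drains, following the summary's statement that the server stays up during drain.

```typescript
interface Stoppable {
  stop(): Promise<void>;
}

interface Logger {
  error(message: string): void;
}

// Hypothetical helper; returns a shutdown function that is safe to call repeatedly.
function createShutdown(
  runner: Stoppable,
  httpServer: Stoppable,
  logger: Logger,
  forceExit: () => void,
  safetyNetMs = 5_000,
): () => Promise<void> {
  let shutdown: Promise<void> | undefined;

  const run = async (): Promise<void> => {
    try {
      await runner.stop(); // drain first; the HTTP server keeps answering meanwhile
    } finally {
      try {
        await httpServer.stop();
      } catch (error) {
        // Logged, not rethrown: a failed HTTP close must not mask the runner outcome.
        logger.error(`HTTP server stop failed: ${String(error)}`);
      }
      // Safety net: if some handle still keeps the process alive, force an exit later.
      // unref() (available on Node timers) stops this timer from keeping it alive itself.
      const timer = setTimeout(forceExit, safetyNetMs);
      (timer as unknown as { unref?: () => void }).unref?.();
    }
  };

  return () => {
    // Concurrent callers (e.g. SIGTERM followed by SIGINT) share one shutdown.
    if (shutdown === undefined) shutdown = run();
    return shutdown;
  };
}
```

Because .unref() keeps the safety-net timer from holding the event loop open, a clean shutdown lets the process exit as soon as its work is done, and forceExit only runs if some other handle is still keeping it alive after safetyNetMs.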
alban bertolini and others added 3 commits March 27, 2026 10:21
Prevents calling start() while stop() is draining, which would
reinitialize the store and resume polling against closing resources.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…cycle

Test the composer created by buildInMemoryExecutor:
- start() calls runner.start + registers SIGTERM/SIGINT
- stop() removes signals + calls runner.stop
- Concurrent stop() calls share the same promise
- HTTP server close failure doesn't prevent runner.stop
- state getter returns runner state

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ogger

- Runner: test start() after stop() throws, resource cleanup failure log
- DatabaseStore: test migration failure logging, close() error handling
- build-run-store: test both factory functions (SQLite + in-memory)
- ConsoleLogger: test info() method

Coverage improvements:
  console-logger   67% → 100%
  build-run-store  33% → 100%
  database-store   85% → 96%
  runner           97% → 99%

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
matthv merged commit 7142e6c into feat/prd-214-setup-workflow-executor-package on Mar 27, 2026
29 of 30 checks passed
matthv deleted the feature/prd-241-graceful-shutdown-executor branch on March 27, 2026 11:05

2 participants