v4.0.0
Major Changes
- e0c14ad: Stop hiding consequential behavior behind defaults.

  Two defaults silently took actions the developer didn't ask for. Both now hand the decision back:

  - `StepOpts.retries` defaults to `0` (was `3`). A step runs once and its failure is terminal unless you opt in with `retries: N`. Previously every step silently ran up to 4× (the initial attempt plus 3 retries) with exponential backoff; you had to write `retries: 0` to get a single run. (Steps re-run on crash recovery regardless, so side-effecting bodies should already be idempotent.)
  - `engine.listRuns({ limit })` throws when `limit > 500` instead of silently clamping to 500. Asking for more than the max now surfaces an error rather than truncating the page without a signal.

  Migration: if you relied on automatic step retries, add `retries: 3` (or your preferred count) to those `ctx.step(...)` / `.step(...)` calls. If you passed `listRuns({ limit })` above 500, lower it to ≤ 500.

- e0c14ad: Group `EngineOpts` into descriptive config blocks.

  The flat options bag is replaced with four nested groups so related settings live together and each group's defaults are documented on hover. Switchable subsystems (`reconciler`, `retention`) take `false | { … }`; always-on tuning (`worker`, `limits`) takes `{ … }`. New: `reconciler.schedule` lets you change the sweep cadence (previously hardcoded to every minute).

  Migration:

  | Before (v3)               | After (v4)                    |
  | ------------------------- | ----------------------------- |
  | `workerSchema`            | `worker.schema`               |
  | `concurrency`             | `worker.concurrency`          |
  | `pollInterval`            | `worker.pollInterval`         |
  | `enqueue`                 | `worker.enqueue`              |
  | `disableReconciler: true` | `reconciler: false`           |
  | `reconcilerGraceMs`       | `reconciler.graceMs`          |
  | `runningStuckMs`          | `reconciler.runningStuckMs`   |
  | `maxRunAttempts`          | `limits.maxRunAttempts`       |
  | `defaultStepTimeoutMs`    | `limits.defaultStepTimeoutMs` |

  `retention` and `limits` (size caps) keep their fields; `limits` now also holds `maxRunAttempts` and `defaultStepTimeoutMs`.

  ```ts
  // before
  createEngine({
    db,
    pool,
    workerSchema: "gw",
    concurrency: 10,
    disableReconciler: true,
    maxRunAttempts: 50,
  });

  // after
  createEngine({
    db,
    pool,
    worker: { schema: "gw", concurrency: 10 },
    reconciler: false,
    limits: { maxRunAttempts: 50 },
  });
  ```
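The migrations in the two Major Changes entries above can be sketched as small helpers. This is an illustrative sketch, not part of the library: `V3EngineOpts`, `V4EngineOpts`, `StepOptsLike`, `migrateEngineOpts`, `withV3Retries`, `capListLimit`, and `MAX_LIST_RUNS_LIMIT` are hypothetical names, and the types are trimmed stand-ins for the real `EngineOpts` and `StepOpts`. Only the field mapping, the old retry default of 3, and the limit of 500 come from the notes above.

```typescript
// Trimmed, hypothetical stand-ins for the v3 and v4 option shapes.
interface V3EngineOpts {
  workerSchema?: string;
  concurrency?: number;
  disableReconciler?: boolean;
  reconcilerGraceMs?: number;
  runningStuckMs?: number;
  maxRunAttempts?: number;
  defaultStepTimeoutMs?: number;
}

interface V4EngineOpts {
  worker: { schema?: string; concurrency?: number };
  reconciler: false | { graceMs?: number; runningStuckMs?: number };
  limits: { maxRunAttempts?: number; defaultStepTimeoutMs?: number };
}

// Apply the Before/After table: each flat v3 field moves into its v4 group.
function migrateEngineOpts(v3: V3EngineOpts): V4EngineOpts {
  return {
    worker: { schema: v3.workerSchema, concurrency: v3.concurrency },
    // disableReconciler: true becomes reconciler: false.
    reconciler: v3.disableReconciler
      ? false
      : { graceMs: v3.reconcilerGraceMs, runningStuckMs: v3.runningStuckMs },
    limits: {
      maxRunAttempts: v3.maxRunAttempts,
      defaultStepTimeoutMs: v3.defaultStepTimeoutMs,
    },
  };
}

// Hypothetical stand-in for the part of StepOpts that matters here.
interface StepOptsLike {
  retries?: number;
}

// Restore the v3 retry default explicitly; an explicit value (including 0) wins.
function withV3Retries<T extends StepOptsLike>(opts: T): T & { retries: number } {
  return { ...opts, retries: opts.retries ?? 3 };
}

// The maximum stated in the notes; the constant name is an assumption.
const MAX_LIST_RUNS_LIMIT = 500;

// Re-create the old clamp at the call site instead of relying on the engine.
function capListLimit(requested: number): number {
  return Math.min(requested, MAX_LIST_RUNS_LIMIT);
}
```

Passing `capListLimit(n)` as the `limit` keeps a call under the maximum, but it also brings back the silent truncation that the new error is meant to expose, so it is best reserved for call sites where a partial page is acceptable.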
Minor Changes
- e0c14ad: Export `isSuspend` and `FlowSuspend` from the public API.

  `ctx.sleep` / `ctx.signal` / `ctx.invoke` park a run by throwing `FlowSuspend`. Because it extends `Error`, a `try`/`catch` around a `ctx.*` call silently swallows the suspend and the run never parks. These were `@internal`, so consumers had no way to guard. The correct pattern is now expressible:

  ```ts
  try {
    await ctx.signal("approval", { timeout: "24h" });
  } catch (err) {
    if (isSuspend(err)) throw err; // let the run park
    // ...handle real errors
  }
  ```
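To see why the guard matters, the following self-contained sketch reproduces the failure mode with local stand-ins. The `FlowSuspend`, `isSuspend`, and `fakeSignal` defined here are simplified re-implementations written for illustration, not the library's exports; the real `isSuspend` may detect suspends differently.

```typescript
// Local stand-in for FlowSuspend: an Error subclass used as a control-flow signal.
class FlowSuspend extends Error {
  constructor(reason: string) {
    super(`suspend: ${reason}`);
    this.name = "FlowSuspend";
    // Keeps instanceof working if the code is compiled to an ES5 target.
    Object.setPrototypeOf(this, FlowSuspend.prototype);
  }
}

// Stand-in type guard; assumed to be a plain instanceof check.
function isSuspend(err: unknown): err is FlowSuspend {
  return err instanceof FlowSuspend;
}

// Stand-in for ctx.signal: it always "parks" by throwing FlowSuspend.
async function fakeSignal(name: string): Promise<void> {
  throw new FlowSuspend(name);
}

// Broad catch: the suspend is handled like any other error and never
// propagates to the caller, so the run would not park.
async function swallowing(): Promise<string> {
  try {
    await fakeSignal("approval");
    return "resumed";
  } catch (_err) {
    return "swallowed";
  }
}

// Guarded catch: the suspend is rethrown, everything else is handled locally.
async function guarded(): Promise<string> {
  try {
    await fakeSignal("approval");
    return "resumed";
  } catch (err) {
    if (isSuspend(err)) throw err; // let the run park
    return "handled real error";
  }
}
```

In this sketch `swallowing()` resolves normally even though the signal tried to suspend, while `guarded()` rejects with the `FlowSuspend` so that the caller (the engine, in the real library) can observe it.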
Patch Changes
- e0c14ad: Reap orphaned `cron:*` jobs on worker startup.

  When a cron is removed from code, graphile-worker stops scheduling it, but already-enqueued `cron:<name>` jobs linger with no task handler: they sit forever, erroring across deploy cutovers. `startGraphileWorker` now runs a best-effort purge after `run()`, completing any `cron:*` job whose task is no longer registered. It never throws, so a reap failure can't block worker startup.

  The cron policy (jitter, overlap, reaping) now lives in its own `cron` module that the graphile adapter drives.

- e0c14ad: Recover runs whose worker crashed mid-execution.

  A run that died while `status = running` could never resume: the reconciler re-enqueued it but left the status `running`, and `claimRun` rejects `running` as "lost", so the re-enqueued job was skipped forever and the run hung permanently. The reconciler now resets a stuck `running` run to `retrying` before re-enqueuing, so the next claim succeeds. Guarded by the existing `reconciler.runningStuckMs` threshold (default 10 min).
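The crash-recovery fix in the last entry can be modeled with a small in-memory sketch. Everything here is a simplified stand-in: `Run`, `updatedAtMs`, `claimRun`, and `reconcile` are hypothetical reductions of the behavior described above, and the `resetStuck` flag exists only to contrast the old and new sweeps. The real engine works against the database and job queue, not an array.

```typescript
// Only the two statuses named in the notes are modeled.
type RunStatus = "running" | "retrying";

interface Run {
  id: string;
  status: RunStatus;
  updatedAtMs: number; // hypothetical field: when the run last made progress
}

// Default threshold from the notes (10 min), expressed in milliseconds.
const RUNNING_STUCK_MS = 10 * 60 * 1000;

// Per the notes, a claim on a run that is still `running` is rejected as "lost".
function claimRun(run: Run): boolean {
  if (run.status === "running") return false;
  run.status = "running";
  return true;
}

// One reconciler sweep. Returns the ids it re-enqueued.
// resetStuck = false models the old behavior, true models the fix.
function reconcile(runs: Run[], nowMs: number, resetStuck: boolean): string[] {
  const reenqueued: string[] = [];
  for (const run of runs) {
    const stuck =
      run.status === "running" && nowMs - run.updatedAtMs >= RUNNING_STUCK_MS;
    if (!stuck) continue;
    if (resetStuck) run.status = "retrying"; // reset before re-enqueuing
    reenqueued.push(run.id);
  }
  return reenqueued;
}
```

With `resetStuck` set to `false`, a stuck run is re-enqueued but `claimRun` keeps rejecting it, which is the permanent hang described above. With `true`, the same run is claimable on the next attempt, and runs younger than the threshold are left alone.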