feat: auto-cleanup step data and rename retry to retrigger#91
Conversation
- Add `cleanupSteps` option (default: true) to delete step output data when runs reach terminal state, preventing unbounded DB growth - Add `storage.deleteSteps(runId)` for step data cleanup - Rename `retry()` to `retrigger()` — creates a fresh run instead of resetting the original, preserving run history - Remove `run:retry` event; `retrigger()` now emits `run:trigger` - Set `completedAt` on cancelled runs and clean up their step data Closes #87 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename retry to retrigger in client hooks, SSE subscriber, and events - Remove run:retry event handling from subscription reducer - Fix browser useJobRun to show 'pending' status when runId is set - Update all example dashboards to use handleRetrigger Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename retry→retrigger across all API docs, guides, and examples - Remove run:retry event and RunRetryEvent from event docs - Add cleanupSteps option to createDurably docs - Update HTTP endpoint /retry→/retrigger - Update useRunActions docs (retrigger returns Promise<string>) - Regenerate llms.txt Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove deleteSteps from cancel() — let worker's finally block handle it consistently, avoiding double-delete on cancelled running jobs - Wrap deleteSteps in try-catch in worker finally block to prevent cleanup errors from blocking worker teardown - Remove unnecessary getRun() DB read in finally — worker always reaches terminal state after try/catch Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughWalkthroughRenames "retry" to "retrigger" across code and docs, changes retrigger to create a fresh run (returns new runId) and removes the run:retry event; adds a cleanupSteps option to delete persisted step outputs when runs reach terminal states. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as React Client
participant API as Durably API
participant Storage as Storage Layer
participant DB as Database
Client->>API: retrigger(runId)
activate API
API->>Storage: getRun(runId)
activate Storage
Storage->>DB: SELECT durably_runs WHERE id=runId
DB-->>Storage: return run (terminal)
Storage-->>API: return run input/options
deactivate Storage
API->>Storage: createRun(input, options, labels)
activate Storage
Storage->>DB: INSERT INTO durably_runs (...)
DB-->>Storage: return new run record
Storage-->>API: return newRun
deactivate Storage
API->>API: emit event (type: 'run:trigger', runId: newRun.id, input)
API-->>Client: resolve Promise with newRun.id
deactivate API
Client->>Client: update UI with new runId / subscribe to run
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
- Fix "Retry" → "Retrigger" button text in all 3 example dashboards - Clean up step data when cancelling pending runs (no worker to handle it) - Respect cleanupSteps config flag in cancel path - Add cleanupSteps to DurablyState for access in createDurablyInstance Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
packages/durably-react/tests/browser/use-job-run.test.tsx (1)
242-242:⚠️ Potential issue | 🟡 MinorTest name inconsistent with retrigger rename.
The test name still uses "retried" terminology while the implementation and PR objectives use "retrigger". Other tests in this file (lines 292, 356) have been correctly renamed.
Proposed fix
- it('resets status when run is retried', async () => { + it('resets status when run is retriggered', async () => {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably-react/tests/browser/use-job-run.test.tsx` at line 242, The test description string is inconsistent: change the "it" block titled "resets status when run is retried" to use the new terminology (e.g., "resets status when run is retriggered" or "resets status when run is retrigger") so it matches other tests in use-job-run.test.tsx; locate the offending "it" block (the test that currently starts with the string "resets status when run is retried") and update that description to the canonical "retrigger"/"retriggered" wording used by the tests at lines 292 and 356.examples/spa-vite-react/src/components/dashboard.tsx (1)
206-212:⚠️ Potential issue | 🟡 MinorUpdate the button copy to
Retrigger.The handler and API were renamed, but the visible label still says
Retry. That leaves the example UI out of sync with the rest of the docs and examples.✏️ Suggested fix
- Retry + Retrigger🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/spa-vite-react/src/components/dashboard.tsx` around lines 206 - 212, The button label is out of sync: update the visible text from "Retry" to "Retrigger" where the button invoking handleRetrigger(run.id) is rendered; locate the JSX that defines the button (the onClick calling handleRetrigger and className shown) and change the inner text node to "Retrigger" so the UI matches the renamed handler/API.packages/durably/tests/shared/server.shared.ts (1)
824-849:⚠️ Potential issue | 🟠 MajorReplace the fixed delay with a state-based wait.
Line 827 assumes the first run is terminal after 200 ms, but
retrigger()is rejected for pending runs and this can still race on slower CI nodes. Wait ford.getRun(run.id)to becomefailedhere, like the other retrigger tests already do.Proposed test hardening
const run = await d.jobs.job.trigger({}) d.start() - - await new Promise((r) => setTimeout(r, 200)) + await vi.waitFor( + async () => { + const updated = await d.getRun(run.id) + expect(updated?.status).toBe('failed') + }, + { timeout: 1000 }, + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/tests/shared/server.shared.ts` around lines 824 - 849, The fixed 200ms delay is flaky; instead poll the run's state until it's terminal (failed) before proceeding: after triggering the run with d.jobs.job.trigger() and calling d.start(), replace the setTimeout-based wait with a loop that repeatedly calls d.getRun(run.id) (or an awaited helper) and awaits until the returned run.status === 'failed' (with a short sleep/backoff between attempts and a sensible timeout to avoid endless loops), then continue to open the subscribe request and call d.retrigger(run.id).
🧹 Nitpick comments (2)
examples/spa-react-router/app/routes/_index/dashboard.tsx (1)
209-215: Consider: Button label "Retry" vs API name "retrigger".The button text still reads "Retry" (line 214) while the underlying action is now
retrigger. This could be intentional UX—"Retry" is more user-friendly than "Retrigger"—but creates a terminology mismatch with the API.If this is intentional for better UX, no change needed. If you prefer consistency with the API rename, consider updating the label.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/spa-react-router/app/routes/_index/dashboard.tsx` around lines 209 - 215, The button label currently reads "Retry" while it invokes handleRetrigger (which calls the re-trigger API); decide on consistent terminology and update the UI or code: either change the button text to "Retrigger" to match the API/handler name (update the JSX button text where handleRetrigger(run.id) is used) or rename the handler/function (e.g., handleRetry -> handleRetrigger or vice versa) so the label and function name match; ensure any related usages of handleRetrigger and references to re-trigger/retry are updated accordingly.website/public/llms.txt (1)
706-706: LGTM: React hooks correctly updated to use retrigger.All references to the retry API have been correctly renamed to retrigger, including hook destructuring, button handlers, and UI labels. The code examples are accurate and consistent with the API changes.
📝 Optional: Document return type for useRunActions.retrigger
While the current example correctly demonstrates usage, consider adding an example that shows the return value to improve completeness:
const handleRetrigger = async (runId: string) => { const newRunId = await retrigger(runId) console.log('New run:', newRunId) }Per the PR summary,
useRunActions().retrigger(runId)returnsPromise<string>(the new run ID), which differs fromdurably.retrigger(runId)that returnsPromise<Run>.Also applies to: 880-895
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@website/public/llms.txt` at line 706, Add an example and brief note showing the return value of useRunActions().retrigger to clarify its type: update the example around useRunActions / retrigger (and optionally the similar examples at lines 880-895) to await retrigger(runId) and indicate it returns Promise<string> (the new run ID), and mention that this differs from durably.retrigger(runId) which returns Promise<Run>; reference the hook name useRunActions and the function retrigger in the text so readers can find and update the examples and docs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CLAUDE.md`:
- Line 31: Clarify that retrigger() does not mutate the original run but instead
creates and returns a brand-new Run with a new run ID: update the line to
explicitly state that retrigger() returns a new Run (new run ID) and the
original run remains unchanged so callers must treat it as a different entity;
reference the retrigger() API and the Run/run ID in the sentence to make the
breaking-change explicit.
- Line 47: The phrasing about `cleanupSteps` is ambiguous: change the line for
`cleanupSteps` to explicitly state that step outputs are persisted only until
the run reaches a terminal state when `cleanupSteps: true` (i.e., outputs are
temporary and removed at terminal state), and update or add a clarifying note
near the earlier “step return value is persisted” text to avoid contradiction by
explaining that persistence is temporary unless `cleanupSteps` is disabled;
reference `cleanupSteps`, “step return value is persisted”, and “terminal state”
when making the edits.
In `@packages/durably/src/durably.ts`:
- Around line 396-423: The retrigger method currently creates a run directly via
storage.createRun and emits run:trigger without checking jobRegistry; change
retrigger to first look up the job in jobRegistry (e.g., jobRegistry.get or has
for run.jobName) and throw an error if the job is not registered, then reuse the
registered job handle's trigger(...) path (call its trigger method so
input/label validation and normal run creation are applied) instead of calling
storage.createRun and eventEmitter.emit directly; keep the existing status
checks in retrigger and only delegate to the job handle's trigger to produce the
new Run.
In `@packages/durably/src/worker.ts`:
- Around line 214-220: The cleanup branch currently calls
storage.deleteSteps(run.id) unconditionally in the finally block; change it to
first verify the run is persisted in a terminal state (e.g., query the persisted
run status via the same storage/read method used elsewhere) and only call
storage.deleteSteps(run.id) if that persisted status is terminal; also catch and
surface any cleanup errors by emitting a worker:error (or using the existing
worker event emitter) instead of swallowing them silently, and keep the
config.cleanupSteps feature flag check around this guarded delete; look for the
finally block that references config.cleanupSteps and storage.deleteSteps to
make these changes.
In `@packages/durably/tests/shared/server.shared.ts`:
- Around line 540-543: The test currently asserts a transient 'pending' status
for the retriggered run (expect(updated?.status).toBe('pending')), which is
flaky because the worker may have already moved the run to another state; change
this by removing that exact pending assertion and instead wait for a
stable/terminal state for body.runId: poll d.getRun(body.runId) (or use an
existing helper like waitForRunToFinish if present) until its status is in a
terminal set (e.g., 'completed' or 'failed') and then assert the expected
terminal status or at least that the run exists and its id differs from run.id
(expect(body.runId).not.toBe(run.id) remains).
In `@packages/durably/tests/shared/step.shared.ts`:
- Around line 107-115: The test currently waits only for the run status to
become 'completed' (via vi.waitFor calling d.jobs.job.getRun) but then
immediately asserts that d.storage.getSteps(run.id) has length 0, causing a
race; modify the vi.waitFor callback to also check that (await
d.storage.getSteps(run.id)).length === 0 so the empty-steps assertion is retried
until cleanup finishes (i.e., keep both the status check and the expect(await
d.storage.getSteps(run.id)).toHaveLength(0) inside the vi.waitFor block
referencing vi.waitFor, d.jobs.job.getRun, d.storage.getSteps and run.id).
In `@website/guide/quick-start.md`:
- Around line 105-106: The example currently misrepresents resumability because
retrigger() starts a new run and cleanupSteps: true removes step outputs; update
the quick-start example to either rename the section to indicate "retrigger
starts a fresh run" or modify the example to demonstrate actual resume behavior
by disabling cleanup (set cleanupSteps: false) or using a non‑terminal
interruption so that retrigger() can resume existing step outputs; reference the
retrigger() call and the cleanupSteps configuration in the text so readers see
the corrected behavior.
---
Outside diff comments:
In `@examples/spa-vite-react/src/components/dashboard.tsx`:
- Around line 206-212: The button label is out of sync: update the visible text
from "Retry" to "Retrigger" where the button invoking handleRetrigger(run.id) is
rendered; locate the JSX that defines the button (the onClick calling
handleRetrigger and className shown) and change the inner text node to
"Retrigger" so the UI matches the renamed handler/API.
In `@packages/durably-react/tests/browser/use-job-run.test.tsx`:
- Line 242: The test description string is inconsistent: change the "it" block
titled "resets status when run is retried" to use the new terminology (e.g.,
"resets status when run is retriggered" or "resets status when run is
retrigger") so it matches other tests in use-job-run.test.tsx; locate the
offending "it" block (the test that currently starts with the string "resets
status when run is retried") and update that description to the canonical
"retrigger"/"retriggered" wording used by the tests at lines 292 and 356.
In `@packages/durably/tests/shared/server.shared.ts`:
- Around line 824-849: The fixed 200ms delay is flaky; instead poll the run's
state until it's terminal (failed) before proceeding: after triggering the run
with d.jobs.job.trigger() and calling d.start(), replace the setTimeout-based
wait with a loop that repeatedly calls d.getRun(run.id) (or an awaited helper)
and awaits until the returned run.status === 'failed' (with a short
sleep/backoff between attempts and a sensible timeout to avoid endless loops),
then continue to open the subscribe request and call d.retrigger(run.id).
---
Nitpick comments:
In `@examples/spa-react-router/app/routes/_index/dashboard.tsx`:
- Around line 209-215: The button label currently reads "Retry" while it invokes
handleRetrigger (which calls the re-trigger API); decide on consistent
terminology and update the UI or code: either change the button text to
"Retrigger" to match the API/handler name (update the JSX button text where
handleRetrigger(run.id) is used) or rename the handler/function (e.g.,
handleRetry -> handleRetrigger or vice versa) so the label and function name
match; ensure any related usages of handleRetrigger and references to
re-trigger/retry are updated accordingly.
In `@website/public/llms.txt`:
- Line 706: Add an example and brief note showing the return value of
useRunActions().retrigger to clarify its type: update the example around
useRunActions / retrigger (and optionally the similar examples at lines 880-895)
to await retrigger(runId) and indicate it returns Promise<string> (the new run
ID), and mention that this differs from durably.retrigger(runId) which returns
Promise<Run>; reference the hook name useRunActions and the function retrigger
in the text so readers can find and update the examples and docs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 46b0261e-9fad-440f-bbf4-30170003976a
📒 Files selected for processing (47)
CLAUDE.mdexamples/fullstack-react-router/app/routes/_index/dashboard.tsxexamples/fullstack-react-router/app/routes/api.durably.$.tsexamples/spa-react-router/README.mdexamples/spa-react-router/app/routes/_index/dashboard.tsxexamples/spa-vite-react/src/components/dashboard.tsxpackages/durably-react/docs/llms.mdpackages/durably-react/src/client/create-durably.tspackages/durably-react/src/client/use-run-actions.tspackages/durably-react/src/client/use-runs.tspackages/durably-react/src/hooks/use-job-run.tspackages/durably-react/src/hooks/use-job-subscription.tspackages/durably-react/src/hooks/use-runs.tspackages/durably-react/src/shared/durably-event-subscriber.tspackages/durably-react/src/shared/event-subscriber.tspackages/durably-react/src/shared/sse-event-subscriber.tspackages/durably-react/src/shared/subscription-reducer.tspackages/durably-react/src/shared/use-subscription.tspackages/durably-react/src/types.tspackages/durably-react/tests/browser/use-job-run.test.tsxpackages/durably-react/tests/client/use-job-run.test.tsxpackages/durably-react/tests/client/use-run-actions.test.tsxpackages/durably/docs/llms.mdpackages/durably/src/durably.tspackages/durably/src/events.tspackages/durably/src/index.tspackages/durably/src/server.tspackages/durably/src/storage.tspackages/durably/src/worker.tspackages/durably/tests/shared/recovery.shared.tspackages/durably/tests/shared/server.shared.tspackages/durably/tests/shared/step.shared.tswebsite/.vitepress/config.tswebsite/api/create-durably.mdwebsite/api/durably-react/fullstack.mdwebsite/api/durably-react/index.mdwebsite/api/durably-react/spa.mdwebsite/api/events.mdwebsite/api/http-handler.mdwebsite/api/index.mdwebsite/guide/auth.mdwebsite/guide/error-handling.mdwebsite/guide/fullstack-mode.mdwebsite/guide/multi-tenant.mdwebsite/guide/quick-start.mdwebsite/guide/server-mode.mdwebsite/public/llms.txt
💤 Files with no reviewable changes (11)
- website/api/events.md
- packages/durably-react/src/shared/subscription-reducer.ts
- packages/durably-react/src/shared/durably-event-subscriber.ts
- packages/durably-react/src/hooks/use-runs.ts
- packages/durably-react/src/shared/event-subscriber.ts
- packages/durably-react/src/hooks/use-job-subscription.ts
- packages/durably/src/index.ts
- packages/durably-react/src/shared/use-subscription.ts
- packages/durably-react/src/shared/sse-event-subscriber.ts
- packages/durably-react/src/types.ts
- packages/durably/src/events.ts
- Validate job exists in jobRegistry before retrigger (fail fast) - Clarify CLAUDE.md: retrigger returns new run ID, step persistence is temporary with cleanupSteps - Fix flaky server test: wait for terminal state instead of pending - Fix step cleanup test race: move getSteps assertion into waitFor - Clarify quick-start: retrigger creates a new run (new ID) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/durably/tests/shared/server.shared.ts (1)
818-853:⚠️ Potential issue | 🟠 MajorWait for the original run to fail before retriggering.
The fixed delay in this block can still leave the run in
pendingorrunning, sod.retrigger(run.id)intermittently throws on slower dialects and CI. Replace the sleep withvi.waitForond.getRun(run.id)?.status === 'failed'.🛠️ Proposed test hardening
const run = await d.jobs.job.trigger({}) d.start() - await new Promise((r) => setTimeout(r, 200)) + await vi.waitFor( + async () => { + const updated = await d.getRun(run.id) + expect(updated?.status).toBe('failed') + }, + { timeout: 1000 }, + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/durably/tests/shared/server.shared.ts` around lines 818 - 853, The test currently uses a fixed sleep before calling d.retrigger(run.id), which can allow the original run to still be pending/running causing intermittent failures; replace the sleep with a wait that polls until the original run has failed by using vi.waitFor(() => d.getRun(run.id)?.status === 'failed') (or equivalent test-wait helper) before invoking d.retrigger(run.id), keeping the rest of the test (request/reader logic) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@packages/durably/tests/shared/server.shared.ts`:
- Around line 818-853: The test currently uses a fixed sleep before calling
d.retrigger(run.id), which can allow the original run to still be
pending/running causing intermittent failures; replace the sleep with a wait
that polls until the original run has failed by using vi.waitFor(() =>
d.getRun(run.id)?.status === 'failed') (or equivalent test-wait helper) before
invoking d.retrigger(run.id), keeping the rest of the test (request/reader
logic) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 912f18fe-4678-4fb6-b664-f0a62b64c7d2
📒 Files selected for processing (8)
CLAUDE.mdexamples/fullstack-react-router/app/routes/_index/dashboard.tsxexamples/spa-react-router/app/routes/_index/dashboard.tsxexamples/spa-vite-react/src/components/dashboard.tsxpackages/durably/src/durably.tspackages/durably/tests/shared/server.shared.tspackages/durably/tests/shared/step.shared.tswebsite/guide/quick-start.md
🚧 Files skipped from review as they are similar to previous changes (2)
- website/guide/quick-start.md
- examples/spa-vite-react/src/components/dashboard.tsx
Summary
Closes #87
cleanupStepsoption (default:true) — automatically deletes step output data when runs reach terminal state (completed/failed/cancelled), preventing unbounded DB growthretry()→retrigger()— creates a fresh run (new ID) instead of resetting the original, preserving run history. ReturnsPromise<Run>with the new runrun:retryevent removed —retrigger()emitsrun:triggerinsteadfinallyblock — wrapped in try-catch, avoids extra DB read, handles cancel consistentlyBreaking Changes
durably.retry(runId)→durably.retrigger(runId)(returns newRuninstead ofvoid)run:retryevent removed (userun:triggerfrom retrigger)RunRetryEventtype removedRunOperationtype:'retry'→'retrigger'POST /retry→POST /retriggeruseRunActions:retry→retrigger(returnsPromise<string>new run ID)cleanupSteps: false)Test plan
pnpm validatepasses (format, lint, typecheck, tests)/retriggerendpoint, SSE emitsrun:triggeruseRunActions.retriggerreturns new run IDretry/run:retry/RunRetryEventreferences🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Breaking Changes