Hi — sharing this as a proposal. Tried to open it as a draft PR first but external PRs are off, which is fair. Happy for this to get reshaped, dropped, or pointed at a pattern I missed.
The thing I ran into
Building some agents on the SDK, I noticed that retried @callable calls re-execute the method body. Retries come from ordinary stuff (a tab waking from sleep, a user double-clicking a spinner, a WebSocket reconnect after hibernation). For methods with side effects — paying via agents/x402, sending an email, opening a PR — re-running on a retry is a bug.
What I tried first
Before drafting anything I tried to solve it with what's already in the SDK:
- startFiber({ idempotencyKey }) works, but the return type changes from Promise<Receipt> to Promise<FiberInspection>, so both the method signature and the caller have to change. That felt heavier than the problem. I might be holding it wrong.
- schedule({ idempotent: true }) is the wrong shape, because I want the value back from the call itself.
- runFiber / Workflows both feel built for "fire and recover later," whereas I wanted "run it now, give me the value." I may be missing the intended idiom.
- Workers RATE_LIMITER rate-limits the request but doesn't give the retried caller the prior result.
- AI Gateway is great for LLM-response caching, but it doesn't cover sendEmail / bookMeeting / refundOrder.
- Rolling a this.sql dedup table is what I ended up doing across a few agents. I got two things wrong on the first attempts: caching failures (so retries never re-execute) and the concurrent-pending case (both in-flight calls execute). Those are the parts I'd most like an SDK pattern for.
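For reference, this is roughly the shape a hand-rolled version needs in order to avoid both of those mistakes. It's a minimal in-memory sketch, not the code from the fork: a Map stands in for the this.sql table, and IdempotencyCache and run() are hypothetical names. Successes are stored with an expiry, in-flight calls are tracked as shared promises, and a thrown error never reaches the cache.

```typescript
// Hypothetical in-memory stand-in for a per-agent this.sql dedup table.
type CachedResult = { value: unknown; expiresAt: number };

class IdempotencyCache {
  // Successful results, kept until expiresAt (ms, as reported by `now`).
  private readonly results = new Map<string, CachedResult>();
  // Executions still in flight, so overlapping callers share one run.
  private readonly pending = new Map<string, Promise<unknown>>();
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  async run<T>(key: string, ttlMs: number, fn: () => Promise<T>): Promise<T> {
    const cached = this.results.get(key);
    if (cached && cached.expiresAt > this.now()) return cached.value as T;

    // Pitfall 2: a second call arriving while the first is still running
    // waits on the same promise instead of executing the body again.
    const inFlight = this.pending.get(key);
    if (inFlight) return inFlight as Promise<T>;

    const p: Promise<T> = Promise.resolve()
      .then(fn) // deferred one microtask, so `pending` is set before fn starts
      .then(
        (value) => {
          // Pitfall 1: only successes reach this branch, so a failure is
          // never replayed and the next retry re-executes.
          this.results.set(key, { value, expiresAt: this.now() + ttlMs });
          this.pending.delete(key);
          return value;
        },
        (err: unknown) => {
          this.pending.delete(key);
          throw err;
        },
      );
    this.pending.set(key, p);
    return p;
  }
}
```

One behavior here is the sketch's own choice rather than something the proposal specifies: callers that overlap a failing first call all receive that same failure, and only a later retry re-executes.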
If startFiber is actually the intended answer, a docs note pointing at it would already help a lot.
The proposal
Optional idempotencyKey and idempotencyTtlMs on @callable:
```typescript
@callable({
  idempotencyKey: ([cartId]) => cartId,
  idempotencyTtlMs: 60_000,
})
async checkout(cartId: string): Promise<Receipt> { ... }
```
The first call executes and caches its result. Concurrent calls with the same key wait on that in-flight call rather than executing again. Subsequent calls within the TTL return the cached value without re-executing. Thrown exceptions aren't cached, so a retry after a failure runs the method again. Combining streaming with idempotency is rejected at dispatch (caching a stream is a separate problem). The return type stays the same. It also adds two new events, rpc:idempotent_hit and rpc:idempotent_miss, on the existing rpc channel.
The biggest design choice I'd push back on myself: function-as-key rather than string-as-key like startFiber, because RPC args vary per call. Might be the wrong call.
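To make the key-function trade-off concrete: decorator options are evaluated once, when the class is defined, so a literal string there would be identical for every call, while a function can derive the key from each call's arguments. Below is a minimal sketch assuming the option shape from the checkout example, plus two choices the proposal doesn't specify and this sketch makes on its own: keys are namespaced by method name, and an empty key is rejected. IdempotencyOptions and deriveKey are hypothetical names.

```typescript
// Hypothetical option shape mirroring the proposal's @callable options.
interface IdempotencyOptions<Args extends unknown[]> {
  // Called with the RPC arguments tuple on every call.
  idempotencyKey: (args: Args) => string;
  idempotencyTtlMs?: number;
}

// Assumptions: keys are namespaced by method name so that checkout("c1") and
// refundOrder("c1") never share a cache entry, and an empty key is an error.
function deriveKey<Args extends unknown[]>(
  method: string,
  opts: IdempotencyOptions<Args>,
  args: Args,
): string {
  const key = opts.idempotencyKey(args);
  if (typeof key !== "string" || key.length === 0) {
    throw new TypeError(`idempotencyKey for ${method} must return a non-empty string`);
  }
  return `${method}:${key}`;
}

// Same shape as the checkout example: the key comes from the first argument.
const checkoutOpts: IdempotencyOptions<[string]> = {
  idempotencyKey: ([cartId]) => cartId,
  idempotencyTtlMs: 60_000,
};
```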
Working diff
I have a fully implemented and tested version of this on a personal fork: https://github.com/mconroy-cf/agents/tree/mconroy/callable-idempotency
It's a +837/-6 diff across 9 files (new SQLite table, schema 8→9, private _dispatchCallableWithIdempotency helper, one-line swap at the existing dispatcher's non-streaming call site, two observability events, 15 new tests, DDL snapshot + destroy cleanup updated, changeset added).
npm run check and nx run agents:test:workers both pass cleanly: 1157 passed / 7 skipped / 0 failed, with no regressions against the 1142-test baseline.
If you'd like to review it that way, the branch is there. If you'd rather see an explicit RFC or just keep the conversation here, that's equally fine.
Things I'd flag
- Function-as-key vs string-as-key is the most likely sticking point.
- I capped idempotencyTtlMs at 24h to nudge callers toward thinking about staleness.
- The lazy prune uses a two-step SELECT-then-DELETE because I wasn't sure DO SQLite supports DELETE ... LIMIT.
- Dedup is per DO instance only. Cross-instance dedup is a different problem.
- There's no invalidate(method, key) API in this cut.
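Two of those flags, sketched under stated assumptions. The TTL check reads "capped" as rejecting values over 24h (clamping would be the other reasonable reading). The prune uses a Map in place of the SQLite table: step one collects a bounded batch of expired keys (the SELECT ... LIMIT), step two deletes exactly that batch. validateTtl, pruneExpired, and MAX_IDEMPOTENCY_TTL_MS are illustrative names, not the fork's. For what it's worth, stock SQLite only accepts DELETE ... LIMIT when compiled with SQLITE_ENABLE_UPDATE_DELETE_LIMIT, so a two-step form (or a single DELETE ... WHERE rowid IN (SELECT rowid ... LIMIT ?) on a rowid table) is the portable choice either way.

```typescript
// Illustrative names; "capped" is read here as "reject", not "clamp".
const MAX_IDEMPOTENCY_TTL_MS = 24 * 60 * 60 * 1000; // 24h = 86_400_000 ms

function validateTtl(ttlMs: number): number {
  if (!Number.isFinite(ttlMs) || ttlMs <= 0) {
    throw new RangeError(`idempotencyTtlMs must be a positive number, got ${ttlMs}`);
  }
  if (ttlMs > MAX_IDEMPOTENCY_TTL_MS) {
    throw new RangeError(`idempotencyTtlMs must be at most ${MAX_IDEMPOTENCY_TTL_MS} (24h), got ${ttlMs}`);
  }
  return ttlMs;
}

type Row = { expiresAt: number };

// Lazy, bounded prune in two steps, with a Map standing in for the table.
function pruneExpired(table: Map<string, Row>, now: number, limit: number): number {
  // Step 1, like: SELECT key FROM t WHERE expires_at <= ? LIMIT ?
  const batch: string[] = [];
  for (const [key, row] of table) {
    if (batch.length >= limit) break;
    if (row.expiresAt <= now) batch.push(key);
  }
  // Step 2, like: DELETE FROM t WHERE key IN (...batch)
  for (const key of batch) table.delete(key);
  return batch.length;
}
```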
I'm not married to this landing. Felt better to lead with something concrete than open with a wall of questions, but if you'd rather see an RFC first or already have something in flight that subsumes it, just say.