fix(isr): KVCacheHandler.set() now awaits KV put to prevent perpetual STALE#425
fix(isr): KVCacheHandler.set() now awaits KV put to prevent perpetual STALE#425james-elicx merged 2 commits intomainfrom
Conversation
… STALE Background ISR regen was calling ctx.waitUntil(renderFn()) where renderFn() resolved before the KV put network operation completed. set() returned Promise.resolve() immediately, so await __isrSet() in the regen resolved early, the waitUntil expired, and the Workers runtime killed the in-flight KV write — leaving the cache entry perpetually STALE on every request after the TTL. Fix: replace the fire-and-forget _putInBackground() with _put() which returns the actual kv.put() promise. set() now returns that promise, so await __isrSet() only resolves after the write is fully persisted. ctx.waitUntil() is also still registered as belt-and-suspenders. Adds three regression tests covering the STALE → regen → HIT lifecycle with a delayed KV put mock to prove the set() promise is not an immediately-resolved stub.
|
/bigbonk review |
commit: |
|
There was a problem hiding this comment.
Review: fix(isr): KVCacheHandler.set() now awaits KV put to prevent perpetual STALE
Good fix. The root cause analysis is correct and the change is minimal and targeted. The await chain from ctx.waitUntil(renderFn()) through __isrSet() through handler.set() to kv.put() is now properly connected end-to-end.
Correctness
The fix is sound. Before this change, set() returned Promise.resolve() which meant the renderFn() promise resolved immediately, ctx.waitUntil had nothing left to keep the isolate alive, and the actual kv.put() network operation was silently killed by the Workers runtime. Now set() returns this._put(...) which returns the real kv.put() promise, so the full chain stays alive until the write completes.
The dual registration pattern (return the promise AND ctx.waitUntil(promise)) is fine — waitUntil just keeps the isolate alive, it doesn't interfere with promise resolution semantics. In the regen path, both the outer ctx.waitUntil(renderFnPromise) and the inner ctx.waitUntil(kvPutPromise) point at the same underlying work, which is harmless redundancy.
The Node.js path (no ctx) now correctly depends on the caller awaiting the returned promise, which isrSet() and __isrSet() both do via await handler.set(...). This is actually better than the old behavior where Node.js was also fire-and-forget on put.
The asymmetry with _deleteInBackground (still fire-and-forget) is intentional and correct — deletes are cleanup for corrupted/invalidated entries and don't affect correctness if they're delayed or lost. Cache writes during regen must complete or the entry stays stale forever, which is exactly the bug this fixes.
Tests
The three regression tests are well-structured and prove the right things:
- The basic smoke test is simple but establishes the baseline invariant.
- The latch-controlled test (test 2) is the key one — it definitively proves
set()is coupled tokv.put()completion by showing the promise stays pending while the latch is held. - The regen simulation (test 3) validates the full chain from
renderFnthroughctx.waitUntil, matching the actual pattern inapp-rsc-entry.ts.
Minor nits
Two small things that don't block merge:
-
Dead variable in test 2 (line 504):
putResolvedBeforeSetAwaitedis declared but never read. Looks like it was part of an earlier draft of the test. Can be removed in a follow-up. -
Extra blank line at line 282-283 in
kv-cache-handler.ts— there's a double blank line between_deleteInBackgroundand the_putdoc comment. Cosmetic only.
LGTM — clean fix with good test coverage for a real production bug.
| }); | ||
|
|
||
| it("set() returned promise is the kv.put promise (not an immediately-resolved stub)", async () => { | ||
| // Swap out kv.put with a delayed version so we can verify that the |
There was a problem hiding this comment.
Nit: putResolvedBeforeSetAwaited is declared but never read — leftover from an earlier draft?
|
Review posted with approval and inline comment. Here's a summary: Approved PR #425 — The fix is correct and well-tested. What the PR does: The fix: Renames One nit flagged: Dead variable |
Problem
After a cache entry's TTL expires, every subsequent request returned
X-Vinext-Cache: STALEand never transitioned toHIT. Background regeneration appeared to trigger but the cache was never updated.Root cause:
KVCacheHandler.set()called_putInBackground()and immediately returnedPromise.resolve(). In the background regen path inapp-rsc-entry.ts:```js
await Promise.all([
__isrSet(htmlKey, ...), // resolves immediately — KV put not done
__isrSet(rscKey, ...),
]);
// renderFn() resolves here, before KV write completes
```
ctx.waitUntil(renderFn())expired as soon asrenderFn()resolved. The Cloudflare Workers runtime tore down the isolate while thekv.put()network operation was still in flight, silently killing the write. The entry stayed STALE forever.The MISS path was not affected — it tees the HTML stream and builds an explicit
__cachePromisethat fully awaits the write beforectx.waitUntilcan expire.Fix
Replace the fire-and-forget
_putInBackground()with_put()which returns the actualkv.put()promise.set()now returns that promise instead ofPromise.resolve().```ts
// Before
this._putInBackground(kvKey, json, opts);
return Promise.resolve(); // resolves before KV write
// After
return this._put(kvKey, json, opts); // resolves when KV write completes
```
_put()also registers withctx.waitUntil()as belt-and-suspenders (same as before), so the Workers runtime keeps the isolate alive even if a caller doesn'tawaitthe returned promise.The full regen chain is now properly awaited end-to-end:
ctx.waitUntil(renderFn())→await Promise.all([__isrSet(...)])→await handler.set(...)→return this._put(...)→return kv.put(...)Tests
Adds three regression tests in
tests/kv-cache-handler.test.tsunder "STALE → regen → HIT lifecycle":set()resolves only after the KV put completesset()returned promise is the actualkv.putpromise, not an immediately-resolved stub (uses a latch-controlled mock to prove this)waitUntilcovers the actual KV write with a delayed put — regen promise does not settle until the write completes