
Fix fetch cache key collisions for Request and FormData bodies#332

Open
JaredStowell wants to merge 7 commits into cloudflare:main from JaredStowell:jstowell/fix-cache-key-collisions

Conversation

@JaredStowell

Summary

Fix fetch cache key generation so cached requests are keyed by the actual effective request body, including when the body is provided on a Request object.

This also fixes ambiguous FormData serialization that could cause distinct payloads to collapse into the same cache entry.

Problem

Vinext’s fetch cache key generation did not fully account for the request body in all supported call shapes.

Request bodies were ignored

The cache key logic correctly merged headers from both input and init, but it only serialized init.body. If the body lived on a Request object, vinext effectively treated the request as body-less for cache-key purposes.

Example:

await fetch(
  new Request("https://api.example.com/search", {
    method: "POST",
    body: JSON.stringify({ query: "alpha" }),
    headers: { "content-type": "application/json" },
  }),
  { next: { revalidate: 60 } },
)

await fetch(
  new Request("https://api.example.com/search", {
    method: "POST",
    body: JSON.stringify({ query: "bravo" }),
    headers: { "content-type": "application/json" },
  }),
  { next: { revalidate: 60 } },
)

Before this change, both requests could produce the same cache key even though the payloads were different.

That makes cached POST-style fetches unsafe: a response generated for one payload can be reused for another payload.

Concrete failure modes include:

  • a search endpoint returning results for the wrong query
  • a filtered API response being reused for the wrong filter set
  • application data keyed by request body being served from the wrong cached entry

Multi-value FormData could collide

FormData values were serialized by joining them with commas, which is ambiguous.

Example:

const formA = new FormData()
formA.append("name", "a,b")
formA.append("name", "c")
// serialized as: "name=a,b,c"

const formB = new FormData()
formB.append("name", "a")
formB.append("name", "b,c")
// also serialized as: "name=a,b,c"

These are different payloads, but they produced the same cache-key fragment.

Root Cause

The issue was in the fetch cache key builder, not in cache storage itself.

  • collectHeaders() already handled Request inputs correctly
  • buildFetchCacheKey() used serializeBody(init)
  • serializeBody() only looked at init.body
  • Request bodies were therefore omitted unless duplicated in init
  • FormData entries were flattened with comma-joining, which is not injective

In other words, the cache key did not always represent the true effective request.
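The missing step can be sketched as follows. `effectiveBodyForKey` is a hypothetical helper for illustration only (the PR's actual `serializeBody` handles more body types); the point is that cloning the Request yields the body for keying while leaving the original stream consumable for the real network fetch:

```typescript
// Hypothetical sketch, not the PR's actual code: resolve the effective
// request body for cache keying from either init.body or a cloned
// Request, without disturbing the original body stream.
async function effectiveBodyForKey(
  input: Request | string | URL,
  init?: RequestInit,
): Promise<string | undefined> {
  // Only the string-body case is shown here; FormData and streams
  // need separate handling.
  if (typeof init?.body === "string") return init.body;
  if (input instanceof Request && input.body !== null) {
    // clone() throws on an already-consumed body; callers may
    // prefer to skip caching in that case.
    return await input.clone().text();
  }
  return undefined;
}
```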

What Changed

Fetch cache keying

  • Include Request object bodies in cache key generation
  • Support body extraction from Request inputs without mutating the original fetch behavior
  • Preserve the original request body for the underlying network fetch

FormData serialization

  • Replace ambiguous comma-joined serialization with structured serialization
  • Serialize per-key value lists in a format that preserves boundaries and ordering semantics
  • Keep existing oversized-body protections in place
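A minimal sketch of an injective serialization in the spirit of the list above (`serializeFormDataKey` is illustrative; the PR's exact wire format may differ):

```typescript
// Hypothetical sketch: JSON-encode each (key, values) pair so value
// boundaries survive. Comma-joining mapped ["a,b","c"] and ["a","b,c"]
// to the same string; JSON.stringify keeps them distinct.
function serializeFormDataKey(form: FormData): string {
  const parts: string[] = [];
  // new Set() dedupes keys; getAll() preserves per-key value order.
  for (const key of new Set(form.keys())) {
    const values = form.getAll(key).map((v) =>
      typeof v === "string"
        ? { kind: "string", value: v }
        : { kind: "file", name: v.name, type: v.type },
    );
    parts.push(JSON.stringify([key, values]));
  }
  return parts.join("\n");
}
```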

Tests

Add regression coverage for:

  • different Request bodies producing distinct cache entries
  • identical Request bodies reusing the same cache entry
  • ambiguous comma-containing multi-value FormData payloads not colliding
  • Request bodies still being forwarded intact after cache-key generation
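The first two cases above can be illustrated with a toy key builder (`cacheKeyFor` is hypothetical; the real regression tests exercise the PR's own key generation):

```typescript
// Hypothetical sketch of body-aware keying: different payloads get
// different keys, identical payloads share one. clone() keeps the
// original request body consumable for the underlying fetch.
async function cacheKeyFor(req: Request): Promise<string> {
  const body = req.body === null ? "" : await req.clone().text();
  return JSON.stringify([req.method, req.url, body]);
}
```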

Examples

Example 1: POST search requests

Before:

await fetch(
  new Request("https://api.example.com/search", {
    method: "POST",
    body: JSON.stringify({ query: "alpha" }),
    headers: { "content-type": "application/json" },
  }),
  { next: { revalidate: 60 } },
)

await fetch(
  new Request("https://api.example.com/search", {
    method: "POST",
    body: JSON.stringify({ query: "bravo" }),
    headers: { "content-type": "application/json" },
  }),
  { next: { revalidate: 60 } },
)

These could collide and reuse the wrong cached response.

After:

  • the request body is included in the cache key
  • each distinct payload gets its own cache entry

Example 2: Multi-value form submissions

Before:

const formA = new FormData()
formA.append("name", "a,b")
formA.append("name", "c")

const formB = new FormData()
formB.append("name", "a")
formB.append("name", "b,c")

These serialized to the same cache-key fragment.

After:

  • FormData values are serialized in a structured format
  • these payloads no longer collide

Why This Approach

This change fixes the root cause while keeping the existing caching model intact.

The principle is simple:

  • semantically different requests must not share a cache entry
  • semantically identical requests should still reuse the same cache entry

The patch stays within the current fetch cache architecture and only changes key generation and regression coverage.

Files Changed

  • packages/vinext/src/shims/fetch-cache.ts
  • tests/fetch-cache.test.ts

Test Plan

Ran targeted regression coverage for the affected surface:

pnpm dlx vitest run tests/fetch-cache.test.ts


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 13731560dc


@james-elicx
Collaborator

/bigbonk review

@pkg-pr-new

pkg-pr-new bot commented Mar 7, 2026


npm i https://pkg.pr.new/vinext@332

commit: abfb62d

Contributor

@ask-bonk ask-bonk bot left a comment

Review Summary

The problem this PR addresses is real and the fix direction is correct: Request object bodies were silently ignored in cache key generation, and the old FormData serialization was ambiguous. Good test coverage for the new cases.

I found a few issues — one correctness concern, one memory safety gap, and two housekeeping items.


Issues Found

1. (Bug) Confusing double-clone and silent fallthrough for FormData parsing failure — see inline comments on lines 213 and 224.

2. (Memory) Full body buffered before size check for Request bodies — same as Codex flagged. See inline comment on line 224.

3. (Housekeeping) Cache key version not bumped

The FormData serialization format changed from key=val1,val2 to JSON.stringify([key, [{kind,value},...]]). The CACHE_KEY_PREFIX on line 40 should be bumped to "v3" to make the format boundary explicit and prevent ambiguity during rollbacks or gradual deployments. Not strictly required for correctness (old entries won't match the new format), but it's good hygiene.

4. (Minor) FormData.keys() iteration order

The extracted serializeFormData helper (line 95) iterates FormData keys via new Set(formData.keys()). Iteration order is insertion-order in all major engines but not spec-mandated. Two FormData objects with the same entries appended in different orders will produce different cache keys. This is arguably correct (different wire order = different payload), just worth being aware of. The new Set() dedup combined with getAll() is the right approach.

5. (Test gap) Already-consumed Request body

There's no test for the case where a Request body has already been consumed (locked/disturbed stream). Currently input.clone() inside serializeBody would throw. It might be worth adding a test documenting this behavior — either it throws and the caller skips caching, or it should be handled gracefully.

const call = fetchMock.mock.calls[0];
const forwardedRequest = call[0] as Request;
expect(forwardedRequest).toBeInstanceOf(Request);
expect(await forwardedRequest.text()).toBe("request-body-content");
Contributor

Good test. One edge case worth considering for future coverage: what happens when the Request body has already been consumed (i.e., input.body is a locked/disturbed stream)? Currently input.clone() inside serializeBody would throw. A test documenting whether that throws or gracefully skips caching would be helpful.

@ask-bonk
Contributor

ask-bonk bot commented Mar 7, 2026

Review posted successfully. Here's a summary of the feedback:

Key findings:

  1. Redundant clone + silent fallthrough (bug): Line 213 creates input.clone() when requestClone already exists. More importantly, if .formData() parsing fails, the error is swallowed and execution falls through to raw arrayBuffer() keying — which produces unstable cache keys for multipart bodies (boundary strings differ between identical logical payloads). Suggested either using requestClone and not falling through, or skipping caching entirely on parse failure.

  2. Memory: full body buffered before size check (same issue Codex flagged): Line 224 reads the entire body via arrayBuffer() before checking the 1 MiB limit. Suggested adding a Content-Length pre-check as a fast-path rejection, and ideally streaming the body with incremental size guards for full parity with the ReadableStream path.

  3. Cache key version not bumped: The FormData serialization format changed substantially but CACHE_KEY_PREFIX is still "v2". Should be bumped to "v3".

  4. FormData.keys() iteration order: Noted as an acceptable design choice — insertion order in practice, not spec-mandated.

  5. Missing test for already-consumed Request body: input.clone() would throw on a disturbed stream. Worth documenting this behavior with a test.
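The streaming size guard suggested in item 2 could look roughly like this (a sketch assuming the 1 MiB limit from the discussion; `readBodyWithinLimit` is illustrative, not the PR's `readRequestBodyChunksWithinLimit`):

```typescript
const MAX_CACHE_KEY_BODY_BYTES = 1024 * 1024; // 1 MiB, per the discussion

// Hypothetical sketch: read chunk by chunk and abort as soon as the
// running total exceeds the limit, instead of buffering the whole body
// with arrayBuffer() first. Returns null when the body is too large.
async function readBodyWithinLimit(
  stream: ReadableStream<Uint8Array>,
  limit = MAX_CACHE_KEY_BODY_BYTES,
): Promise<Uint8Array[] | null> {
  const reader = stream.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  try {
    for (;;) {
      const { done, value } = await reader.read();
      if (done || !value) return chunks;
      total += value.byteLength;
      if (total > limit) return null; // over limit: caller skips caching
      chunks.push(value);
    }
  } finally {
    reader.releaseLock();
  }
}
```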


@southpolesteve
Collaborator

/bigbonk re-review please

Contributor

@ask-bonk ask-bonk bot left a comment

Review Summary

This PR fixes two real correctness bugs in fetch cache key generation:

  1. Request object bodies were silently ignored in cache key computation
  2. FormData comma-joined serialization was ambiguous/non-injective

The approach is sound — the fix stays within the existing architecture and only changes key generation. Good test coverage (16 new tests covering the matrix of Request bodies, FormData collisions, multipart boundary normalization, malformed form bodies, and body passthrough). All 71 tests pass.

I see the previous review from ask-bonk already flagged several items, and I can confirm that some have been addressed in subsequent commits (cache key version bump to v3, Content-Length pre-check, SkipCacheKeyGenerationError for malformed bodies). A few remaining observations below.

Overall this is solid work. The issues below are minor-to-medium.

const boundedRequest = new Request(input.url, {
method: input.method,
headers: contentType ? { "content-type": contentType } : undefined,
body: new Blob(chunks.map((chunk) => chunk.slice())),
Contributor

Unnecessary .slice() on chunks.

The chunks array was just read from a cloned request's stream in readRequestBodyChunksWithinLimit. These Uint8Array buffers are already owned by this function — nobody else holds a reference to them. The .slice() here creates a defensive copy of each chunk, but it's not needed since the buffers were freshly read from requestClone.body (which itself was cloned from input). The original input request body is untouched.

This is a minor allocation concern, not a bug.

Suggested change
body: new Blob(chunks.map((chunk) => chunk.slice())),
body: new Blob(chunks),

pushBodyChunk(JSON.stringify([key, { kind: "string", value: val }]));
continue;
}
if (val.size > MAX_CACHE_KEY_BODY_BYTES || getTotalBodyBytes() + val.size > MAX_CACHE_KEY_BODY_BYTES) {
Contributor

File size check doesn't account for JSON serialization overhead.

The size check uses val.size (raw file bytes), but pushBodyChunk then calls JSON.stringify(...) which adds the key name, {kind: "file", name: ..., type: ..., value: ...} wrapper, and JSON escaping overhead. For a file right at the size limit, the serialized JSON string will be larger than val.size, potentially exceeding the limit without being caught by this pre-check.

In practice this is unlikely to matter (the pushBodyChunk function itself also checks totalBodyBytes against the limit), but the pre-check here is misleadingly precise. Not blocking — just noting the discrepancy.

}
}

for (const chunk of chunks) {
Contributor

Non-form Request bodies don't go through pushBodyChunk for size checking.

When a Request body has a non-form content type, the chunks are decoded and pushed directly into bodyChunks (line 315) without going through pushBodyChunk, which means totalBodyBytes is not updated and the per-chunk size limit in pushBodyChunk is never hit. Only the final trailing chunk on line 319 goes through pushBodyChunk.

This is partially mitigated by the streaming size check in readRequestBodyChunksWithinLimit (which enforces the 1 MiB limit during reading), so this won't cause an actual over-limit key. But it means totalBodyBytes is inaccurate for the rest of the function, which could matter if future code depends on it.

Suggested change
for (const chunk of chunks) {
for (const chunk of chunks) {
pushBodyChunk(decoder.decode(chunk, { stream: true }));
}

});
const formData = await boundedRequest.formData();
await serializeFormData(formData, pushBodyChunk, getTotalBodyBytes);
canonicalizedContentType = formContentType === "multipart/form-data" && contentType
Contributor

Boundary stripping only applies to multipart/form-data, not application/x-www-form-urlencoded.

This is correct behavior (urlencoded doesn't have boundaries), but the conditional reads a bit awkwardly. Consider adding a brief comment explaining the asymmetry, since a reader might wonder why only multipart gets canonicalizedContentType treatment.

Also: when formContentType is "application/x-www-form-urlencoded", canonicalizedContentType stays undefined, so the original content-type header (including any charset=... params) remains in the cache key via collectHeaders. The test on line 525 ("urlencoded Request bodies with different charset headers get separate cache entries") confirms this works correctly. Good.

const call = fetchMock.mock.calls[0];
const forwardedRequest = call[0] as Request;
expect(forwardedRequest).toBeInstanceOf(Request);
expect(await forwardedRequest.text()).toBe("request-body-content");
Contributor

Good test, but it's testing body passthrough for cache misses only.

This verifies that input (the Request) still has a consumable body after cache key generation on a cache miss. Worth noting: on a cache hit, the input Request is never forwarded to originalFetch, so the body state doesn't matter. And on the stale-while-revalidate path (line 558 in fetch-cache.ts), input is forwarded to originalFetch for the background refetch — so the body also needs to be intact there. This test implicitly covers that since it's the same input reference, but an explicit test for the SWR path with a Request body would be stronger.

Not blocking — the current coverage is adequate for the fix.

expect(fetchMock).toHaveBeenCalledTimes(1);
});

it("malformed multipart Request bodies bypass cache instead of hashing raw bytes", async () => {
Contributor

Good test for malformed multipart fallback behavior.

This confirms that malformed multipart bodies (where the boundary in the header doesn't match the body) bypass cache entirely (via SkipCacheKeyGenerationError) rather than falling through to raw-byte keying. This directly addresses the concern from the previous review about unstable cache keys from boundary-dependent raw bytes.

@ask-bonk
Contributor

ask-bonk bot commented Mar 7, 2026

Review submitted successfully. Here's a summary of my feedback on PR #332:

Overall assessment: The PR fixes two real correctness bugs and the approach is sound. All 71 tests pass. The previous review's main concerns (cache key version bump, Content-Length pre-check, malformed body fallthrough) have been addressed in subsequent commits.

Remaining observations:

  1. Unnecessary .slice() on chunks (line 298) — The Uint8Array buffers in chunks are already freshly read from a cloned request body. The defensive copy via .slice() is not needed and wastes allocations.

  2. Non-form Request bodies bypass pushBodyChunk size tracking (lines 314-315) — When a Request has a non-form content type, chunks are pushed directly to bodyChunks without updating totalBodyBytes. This is mitigated by the streaming size check in readRequestBodyChunksWithinLimit, but leaves totalBodyBytes inaccurate. Suggested using pushBodyChunk instead.

  3. File size pre-check doesn't account for JSON serialization overhead (line 106) — Minor discrepancy: the pre-check uses raw val.size but the actual serialized JSON is larger. Not a real bug since pushBodyChunk also enforces the limit.

  4. Positive notes: The malformed multipart test (line 501) correctly validates that broken boundaries skip cache entirely rather than falling through to raw-byte keying. The boundary-stripping logic for multipart vs urlencoded is handled correctly, confirmed by the charset test.

