perf(prerender): reuse embedded RSC payload#1097
perf(prerender): reuse embedded RSC payload#1097NathanDrake2406 wants to merge 8 commits intocloudflare:mainfrom
Conversation
commit: |
|
/bigbonk review |
There was a problem hiding this comment.
Review: perf(prerender): reuse embedded RSC payload
Good optimization. The approach of extracting the RSC payload from the already-rendered HTML avoids a redundant second RSC: 1 handler invocation during prerender. The parser is carefully written, the error handling is strict, and the test coverage is solid.
What looks good
- Strict-or-fallback design. Recognized markers trigger strict parsing (malformed = throw), while unrecognized HTML falls back gracefully to the old separate-request path. This is the right call — silent corruption of
.rscfiles would be much worse than a redundant request. safeJsonStringifyround-trip safety. The parser usesJSON.parseto reverse thesafeJsonStringifyencoding, which correctly handles the\u003c/\u003e/\u0026escapes. The test at line 73 ("</script><script>alert(1)</script>") explicitly covers the XSS breakout case.- Integration test with rscRequestCount sentinel. The
prerenderApp — RSC extractiontest (line 142) uses a mock server that returns 500 on any RSC request, assertingrscRequestCount === 0. This proves the optimization actually avoids the second request. - Legacy payload support. Keeping
__VINEXT_RSC__support means existing prerendered content and older build outputs don't break.
Comments below
| } | ||
|
|
||
| export function extractRscPayloadFromPrerenderedHtml(html: string): ExtractRscPayloadResult { | ||
| const scriptPattern = /<script(?:\s[^>]*)?>([\s\S]*?)<\/script>/gi; |
There was a problem hiding this comment.
The regex /<script(?:\s[^>]*)?>[\s\S]*?<\/script>/gi is safe here because safeJsonStringify escapes all < and > in embedded JSON payloads, so no false </script> can appear inside a script body. Worth noting for future maintainers though — if the emitter (app-ssr-stream.ts) ever stops using safeJsonStringify, this parser would silently truncate chunks. A comment here linking to the safety invariant would be helpful:
| const scriptPattern = /<script(?:\s[^>]*)?>([\s\S]*?)<\/script>/gi; | |
| // Safe because safeJsonStringify (used by createRscEmbedTransform) escapes | |
| // all '<' and '>' in embedded JSON, preventing false </script> matches. | |
| const scriptPattern = /<script(?:\s[^>]*)?>[\s\S]*?<\/script>/gi; |
|
|
||
| function parseEmbeddedJsonString(script: string, start: number): { value: string; next: number } { | ||
| let index = start; | ||
| while (/\s/.test(script[index] ?? "")) index++; |
There was a problem hiding this comment.
Minor: calling new RegExp (via /\s/) on every character in a tight loop is not ideal for performance on large HTML documents. For hot-path parsing, a simple space/tab/newline check would be faster and clearer:
| while (/\s/.test(script[index] ?? "")) index++; | |
| while (isWhitespace(script[index] ?? "")) index++; |
with a helper like:
function isWhitespace(ch: string): boolean {
return ch === " " || ch === "\t" || ch === "\n" || ch === "\r";
}Not blocking — the script contents are typically short, so this is a micro-optimization. But since this runs during build for every prerendered page, it adds up.
|
|
||
| if (trimmedScript.startsWith(RSC_CHUNK_SCRIPT_PREFIX)) { | ||
| sawRscMarker = true; | ||
| const markerIndex = trimmedScript.indexOf(RSC_CHUNK_PUSH_MARKER); |
There was a problem hiding this comment.
The indexOf search for RSC_CHUNK_PUSH_MARKER ("__VINEXT_RSC_CHUNKS__.push(") scans the full trimmed script content from position 0, but we already know the script starts with RSC_CHUNK_SCRIPT_PREFIX. Searching from the prefix length would be slightly more precise:
| const markerIndex = trimmedScript.indexOf(RSC_CHUNK_PUSH_MARKER); | |
| const markerIndex = trimmedScript.indexOf(RSC_CHUNK_PUSH_MARKER, RSC_CHUNK_SCRIPT_PREFIX.length); |
This also guards against a hypothetical edge case where the push marker text appears inside the prefix itself (currently impossible, but defensive).
| } | ||
|
|
||
| function extractLegacyRscPayload(script: string): string | null { | ||
| const trimmedScript = script.trim().replace(/;$/, ""); |
There was a problem hiding this comment.
JSON.parse can throw on malformed input. Currently, a parse error here would bubble as an unhandled exception with a generic message. Consider wrapping it to provide the same clear [vinext] Malformed legacy prerender RSC embed: prefix:
let parsed: unknown;
try {
parsed = JSON.parse(jsonSource);
} catch {
throw new Error("[vinext] Malformed legacy prerender RSC embed: invalid JSON");
}Same applies to the JSON.parse in parseEmbeddedJsonString at line 211.
| }); | ||
|
|
||
| describe("prerenderApp — RSC extraction", () => { | ||
| it("writes the .rsc file from rendered HTML without a second RSC request", async () => { |
There was a problem hiding this comment.
Great test — using a mock server that fails RSC requests is a clean way to prove the optimization works.
One gap: there's no test for the fallback path where the HTML has no RSC markers and the separate RSC request is actually made. Consider adding a companion test where the mock server returns plain HTML (no __VINEXT_RSC_CHUNKS__ markers), and verifying that rscRequestCount === 1 and the .rsc file is still written correctly. This would ensure the fallback path doesn't regress.
|
|
||
| while ((match = scriptPattern.exec(html)) !== null) { | ||
| const script = match[1] ?? ""; | ||
| const trimmedScript = script.trim().replace(/;$/, ""); |
There was a problem hiding this comment.
The trimmedScript strips a trailing semicolon with .replace(/;$/, ""), but the done marker emitted by createInlineScriptTag is self.__VINEXT_RSC_DONE__=true (no trailing semicolon). This works fine. However, note that trimmedScript is also used for the chunk scripts, but the chunk content can end with ; (from the emitter). The assertOnlyTrailingSemicolon function handles that separately, so the .replace(/;$/, "") on line 272 could cause a subtle issue if a chunk script only consists of RSC_CHUNK_SCRIPT_PREFIX followed by the push call followed by ; — the semicolon would be stripped before the startsWith check on line 288, which is actually correct. Just wanted to trace through this to confirm the logic is sound — it is.
|
Review posted successfully. Here's a summary of the feedback: Overall: Solid optimization with good test coverage. The strict-or-fallback error handling approach is well-designed. Key comments:
|
|
Bonk: the test change in |
|
/bigbonk review |
There was a problem hiding this comment.
Review: perf(prerender): reuse embedded RSC payload
Good optimization — the approach is sound and the implementation is careful. The strict-or-fallback error model is well-designed, the parser is tightly scoped to the known emitter protocol, and the test coverage is strong (both unit and integration). The previous review's suggestions around isWhitespace, indexOf offset, and JSON.parse wrapping have all been addressed in the current diff.
Two substantive items below, plus a few smaller observations.
Key concern: fixFlightHints semantic difference
The embedded RSC chunks have fixFlightHints applied (app-ssr-stream.ts:49 — rewrites as="stylesheet" → as="style" in HL hint lines), while the raw RSC stream stored via getRawBuffer() does not. The old code path fetched RSC data via a separate RSC: 1 request, which would go through the full RSC entry — the comment at app-ssr-stream.ts:46-48 says the RSC entry also fixes HL hints at the source, so both paths should produce equivalent output.
However, this equivalence depends on the RSC entry always applying fixFlightHints at the source. If that ever changes, the extracted payload would have the fix applied while a fresh RSC request would not (or vice versa). This is not a bug today, but it's a coupling worth documenting — a brief comment near the extraction call site noting that the embedded chunks include fixFlightHints transforms would help future maintainers.
app-elements.test.ts change
The author's explanation in the PR comments is correct — this is a test fixture update to match the artifactCompatibility envelope from #1062. The field is already used in ~20 other places in the same test file. Not a concern.
| continue; | ||
| } | ||
|
|
||
| if (legacyPayload === null) { |
There was a problem hiding this comment.
Mixed protocols: legacy detection should be order-independent
The guard if (legacyPayload === null) means that if a page somehow contains both a legacy __VINEXT_RSC__ embed and modern chunk scripts, the legacy payload takes precedence (sets sawRscMarker = true early). The modern chunks would still be collected into chunks[] via the startsWith(RSC_CHUNK_SCRIPT_PREFIX) block below, and since chunks.length > 0 is checked first at line 326, the modern chunks would actually win in the return value.
This is accidentally correct — modern chunks win when both are present. But the intent isn't obvious. Consider adding a brief comment like // Legacy check runs once; modern chunks take priority if both are present.
| return { status: "ok", payload: legacyPayload }; | ||
| } | ||
|
|
||
| if (sawRscMarker) { |
There was a problem hiding this comment.
Unreachable state?
This branch fires when sawRscMarker is true but chunks is empty and legacyPayload is null. The only way sawRscMarker can be true without producing chunks or a legacy payload is if:
- A
__VINEXT_RSC_DONE__script was found with zero chunk scripts, OR - A script matched the
RSC_CHUNK_SCRIPT_PREFIXbut had no push marker (which already throws at line 311).
So this only catches case 1: a done marker with no data scripts. That's a valid malformed state. The error message is slightly misleading though — "no RSC chunks were found" could be read as "we didn't see any embed markers at all" rather than "we found the protocol but it was empty." Consider:
| if (sawRscMarker) { | |
| throw new Error("[vinext] Malformed prerender RSC embed: RSC protocol markers present but no data chunks found"); |
| rscHandler(rscRequest), | ||
| ); | ||
| const rscData = rscRes.ok ? await rscRes.text() : null; | ||
| const extractedRsc = extractRscPayloadFromPrerenderedHtml(html); |
There was a problem hiding this comment.
Worth noting (for future maintainers) that the payload extracted here includes fixFlightHints transforms from the emitter (app-ssr-stream.ts:49), which rewrites as="stylesheet" → as="style" in HL hint lines. The old separate-request path produced equivalent output because the RSC entry also applies the fix at the source. If the emitter or RSC entry ever diverge on this, the .rsc files would differ depending on which path was taken.
A comment would be helpful:
| const extractedRsc = extractRscPayloadFromPrerenderedHtml(html); | |
| // NOTE: The embedded RSC chunks include fixFlightHints transforms | |
| // (applied by createRscEmbedTransform in app-ssr-stream.ts). The RSC | |
| // entry applies the same fix at the source, so both paths produce | |
| // equivalent .rsc output. | |
| const extractedRsc = extractRscPayloadFromPrerenderedHtml(html); |
| const chunks = [ | ||
| '0:D{"name":"layout"}\n', | ||
| '1:["$","div",null,{"children":"hello ) world"}]\n', | ||
| '2:["$","span",null,{"children":"</script><script>alert(1)</script>"}]\n', |
There was a problem hiding this comment.
Nice — this chunk explicitly covers the </script> breakout vector and the ) character that could confuse the push-call parser. Good defensive test case.
| } | ||
| }); | ||
|
|
||
| it("falls back to a separate RSC request when rendered HTML has no RSC markers", async () => { |
There was a problem hiding this comment.
Good — this addresses the previous review's gap about missing fallback path coverage. The mock server returns plain HTML (no RSC markers), verifies rscRequestCount === 1, and confirms the .rsc file is still written correctly.
|
Review posted. Here's a summary of what I found: Overall: Solid optimization, well-implemented. The parser is tightly scoped to the emitter protocol, error handling is strict, and both the previous review's feedback (isWhitespace helper, indexOf offset, JSON.parse wrapping) and the missing fallback test have been addressed. Key comments:
|
|
/bigbonk review |
There was a problem hiding this comment.
Review: perf(prerender): reuse embedded RSC payload
This is a well-executed optimization. Two prior reviews have already flagged the substantive issues and the author has addressed them — JSON.parse wrapping, isWhitespace helper, indexOf offset, fallback path test. The code as-is is clean and correct.
What looks good
-
Parser is tightly coupled to the emitter by design.
extractRscPayloadFromPrerenderedHtmlmirrors the exact protocol fromcreateRscEmbedTransforminapp-ssr-stream.ts. The string constants (RSC_CHUNK_PUSH_MARKER,RSC_CHUNK_SCRIPT_PREFIX,RSC_DONE_MARKER) match the emitter's output character-for-character. This is fragile in the abstract, but the right call here — the parser only needs to handle vinext's own output, and the tests will break if either side changes. -
Strict-or-fallback error model is correct. Recognized markers trigger strict parsing (malformed = throw), unrecognized HTML falls through to the old separate-request path. This prevents silent
.rscfile corruption, which would be far worse than a redundant request. -
Test coverage is thorough. Unit tests cover: happy path, missing done marker, marker text inside payload data, trailing code injection, invalid JSON, legacy format, empty markers, and the XSS breakout vector (
</script>in chunk data). Integration tests prove both the optimization path (rscRequestCount === 0) and the fallback path (rscRequestCount === 1). -
fixFlightHintsequivalence is documented. The comment at line 1248 correctly explains why the extracted payload (which includesfixFlightHintsfrom the emitter) matches what the separate RSC request would return (RSC entry applies the same fix at source). The rewrite is idempotent, so this is safe even if both paths change independently. -
app-elements.test.tschange is correct. Just a stale test fixture update to includeartifactCompatibilityfrom #1062.
One minor nit below, not blocking.
| } catch { | ||
| throw new Error("[vinext] Malformed prerender RSC embed: invalid chunk JSON"); | ||
| } | ||
| if (typeof parsed !== "string") { |
There was a problem hiding this comment.
Nit: the typeof parsed !== "string" check after JSON.parse on a "..."-delimited slice is technically unreachable — JSON.parse('"hello"') always produces a string. The only way to get a non-string would be if the manual quote-matching logic above found a false closing quote, but that can't happen because the escape tracking is correct.
Not suggesting you remove it — defensive checks in parsers are fine — just noting it's dead code in practice. If you wanted to be explicit about this, a comment like // Defensive: should always be a string given the quote-delimited extraction above would clarify intent.
|
Review posted successfully. Here's a summary: Approved. The optimization is well-implemented and the previous two review rounds have already driven the important fixes (JSON.parse wrapping, isWhitespace helper, indexOf offset, fallback path test coverage). The code is clean and correct. Key observations:
One minor nit: the |
Cleanups on top of cloudflare#1097: 1. Drop the legacy `__VINEXT_RSC__` extractor. The browser entry still recognizes that format for in-the-wild cached HTML, but prerender parses HTML it just emitted in this build, where the writer always emits the chunked form. The legacy branch is unreachable here. 2. Remove the silent fallback to a second `RSC: 1` invocation. The writer is in-tree, so a missing-marker case is a vinext-internal regression we want to surface loudly rather than mask by double-rendering. `extractRscPayloadFromPrerenderedHtml` now returns `string` and throws on any anomaly. 3. Replace the manual JSON-string tokenizer (parseEmbeddedJsonString + assertOnlyTrailingSemicolon + isWhitespace, ~80 lines) with a ~12-line slice + JSON.parse. Relies on the same safeJsonStringify invariant the original parser relied on; tampering with trailing code now surfaces as 'invalid chunk JSON' instead of a separate diagnostic, which is fine for an internal protocol. 4. Remove the obsolete TODO marker ("Extract RSC payload from the first response instead of invoking the handler twice") — the PR resolves it. 7. Tighten the chunk-presence assertion: empty HTML or HTML with only the done marker now throw 'no chunk scripts found in HTML' instead of returning the fallback sentinel. Test updates mirror these: - Drop the two legacy-format tests. - Drop the "falls back to a separate RSC request" integration test (the fallback no longer exists). - Update error-message assertions where the diagnostic text changed.
|
/bigbonk re-review for issues |
|
UnknownError: "Overloaded" |
|
@james-elicx Bonk workflow failed. Check the logs for details. View workflow run · To retry, trigger Bonk again. |
|
/bigbonk re-review for issues |
|
UnknownError: "Overloaded" |
|
/bigbonk re-review for issues |
|
UnknownError: "Overloaded" |
|
/bigbonk re-review for issues |
|
UnknownError: "Overloaded" |
|
@codex review |
|
/bigbonk re-review for issues |
|
UnknownError: "Overloaded" |
|
/bigbonk re-review for issues |
💡 Codex Reviewvinext/packages/vinext/src/build/prerender.ts Line 1154 in 3574450 Routes that short-circuit in middleware by returning a direct HTML ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Re-review: perf(prerender): reuse embedded RSC payloadLooked through the implementation end-to-end (emitter coupling, error handling paths, call site context, test coverage). The optimization is correct and well-implemented. Prior review rounds have driven the important fixes. Two items worth noting: 1. Regex matches all script tags, not just RSC embed scripts (prerender.ts:200)
However, a user inline script that happens to start with the exact A tighter regex anchoring on the known prefix would be more robust and slightly faster for pages with many non-RSC scripts. Not blocking. 2. PR description is stale (prerender.ts:219)The PR description says:
Neither of these exists in the code. There is no legacy What looks correct
|
|
Review posted. Here's a summary: Overall: The optimization is correct and well-implemented. Two items worth noting:
Everything else checks out: error handling at the call site is correct, route filtering excludes non-page routes, the parser validates the exact shape, nonce handling works, |
…sponses The previous cleanup made `extractRscPayloadFromPrerenderedHtml` throw on any HTML lacking RSC embed chunks. That's wrong for one legitimate case: middleware that short-circuits the App Router pipeline with a custom 200 HTML response — that HTML never went through createRscEmbedTransform, so it has no chunks and no done marker. Distinguish two cases at the extractor level: - No chunks AND no done marker → middleware short-circuit. Return null so the caller falls back to a second invocation with `RSC: 1`. - Done marker present without chunks → real bug, throw. - Chunks present without done marker → real bug, throw. - Malformed/tampered chunk JSON → real bug, throw. The driver checks for null and recovers by issuing a second handler invocation with RSC headers, writing whatever that returns as the .rsc file. Adds an integration test alongside the existing dedup test: - Middleware short-circuit: server returns 200 HTML without chunks for page requests, valid Flight for RSC: 1 → driver writes the fallback payload as .rsc, with exactly one of each request kind.
|
/bigbonk review |
Review: perf(prerender): reuse embedded RSC payloadThe optimization is well-motivated — extracting the RSC payload from the already-rendered HTML avoids a redundant second handler invocation during prerender. The parser is carefully written, error handling is strict, and test coverage is solid. One behavioral change and a few small items worth discussing. 1. Behavioral change: empty
|
|
Review posted successfully. Here's a summary of the feedback: Overall: Good optimization with solid test coverage. The parser is carefully written and the strict-or-fallback error model is well-designed. One substantive issue found:
|
What this changes
App Router prerender now reconstructs the canonical
.rscpayload from the streamed RSC chunk scripts already embedded in the prerendered HTML response. When extraction succeeds, prerender avoids the previous secondRSC: 1handler invocation.Why
The prerender pipeline already renders HTML first, and that HTML includes streamed Flight chunks for hydration. Reusing those chunks avoids duplicate App Router work during static output generation.
Approach
self.__VINEXT_RSC_CHUNKS__...push(<safeJsonStringify(chunk)>)plusself.__VINEXT_RSC_DONE__=true.createRscEmbedTransform.RSC: 1request only when HTML has no RSC chunk scripts at all, which covers middleware/custom HTML responses that bypass the App Router HTML stream..rscartifact.Validation
vp test run tests/prerender.test.ts -t "RSC extraction|middleware short-circuit fallback RSC request fails|falls back to a second RSC"vp check tests/prerender.test.ts packages/vinext/src/build/prerender.tsvp check --fixplusknip --no-progressNotes
tests/app-elements.test.tswas updated to includeartifactCompatibilityafter merging currentmainso CI matches the current AppElements metadata contract.Refs #563