feat: add arx codec with domain dictionary and multi-tier Unicode encoding #4
Introduce a novel 'arx' compression codec optimized for fragment-based
artifact transport. The codec pipelines three stages — domain-dictionary
substitution, brotli compression (quality 11), and adaptive encoding —
to dramatically reduce fragment sizes compared to the existing deflate
codec.
Compression pipeline:
1. Domain-dictionary substitution: A shared dictionary
(public/arx-dictionary.json, 2857 bytes) maps 25 single-byte slots
(JSON envelope boilerplate, markdown syntax) and 147 extended slots
(programming keywords, React patterns, common English words) to
control-byte sequences, reducing redundancy before compression.
2. Brotli compression via brotli-wasm at quality 11, achieving ~20%
better ratios than zlib deflate on typical payloads.
3. Adaptive encoding with three tiers, auto-selecting the shortest:
- base76: 77 URL-fragment-safe ASCII characters (~6% denser than
base64url). Best for strict ASCII-only contexts.
- base1k: 1774 Unicode code points (U+00A1–U+07FF), ~10.79
bits/char. ~42% fewer characters than base76.
- baseBMP: ~62k safe BMP code points (U+00A1–U+FFEF), ~15.92
bits/char. ~70% fewer characters than deflate+base64url.
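The quoted per-tier densities follow directly from the alphabet sizes (bits per character = log2 of the alphabet size). A quick sanity check using the sizes stated above:

```javascript
// Bits per character for each tier is log2(alphabet size).
// Sizes come from the tier descriptions above; baseBMP is approximate.
const tiers = {
  base76: 77,     // URL-fragment-safe ASCII characters
  base1k: 1774,   // Unicode code points U+00A1–U+07FF, minus exclusions
  baseBMP: 62000, // ~62k safe BMP code points
};

for (const [name, size] of Object.entries(tiers)) {
  console.log(`${name}: ${Math.log2(size).toFixed(2)} bits/char`);
}
// base1k vs base76: 1 - 6.27/10.79 ≈ 42% fewer characters, as stated above.
```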
Benchmark results (8000-char payloads):
AGENTS.md (markdown): 8000 → 1307 chars (6.12x ratio, -69.5% vs deflate)
fragment.ts (code): 8000 → 946 chars (8.46x ratio, -77.8% vs deflate)
README.md (mixed): 3714 → 1014 chars (3.66x ratio, -55.0% vs deflate)
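As a sanity check, the quoted ratios are simply original/compressed character counts from the rows above:

```javascript
// Reproduce the reported compression ratios from the benchmark sizes.
const results = [
  { name: "AGENTS.md", original: 8000, compressed: 1307 },  // 6.12x
  { name: "fragment.ts", original: 8000, compressed: 946 }, // 8.46x
  { name: "README.md", original: 3714, compressed: 1014 },  // 3.66x
];

for (const r of results) {
  console.log(`${r.name}: ${(r.original / r.compressed).toFixed(2)}x`);
}
```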
Shared dictionary:
- Served as a static endpoint at /arx-dictionary.json (+ .br variant
at 929 bytes) so agents can fetch it for local compression.
- The viewer loads the dictionary on startup; a built-in fallback
dictionary preserves backward compatibility if the fetch fails.
- scripts/compress-dictionary.mjs minifies and brotli-compresses the
dictionary file for deployment.
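The load-with-fallback behavior described above can be sketched roughly as follows. This is a hypothetical helper, not the viewer's actual code; `fetchJson` and `FALLBACK_DICTIONARY` are illustrative names, and the fetcher is injected so the fallback path is easy to exercise:

```javascript
// Hypothetical sketch: load the shared dictionary, falling back to a
// built-in copy if the fetch fails (preserving backward compatibility).
const FALLBACK_DICTIONARY = { version: 1, slots: {} }; // illustrative shape

async function loadDictionaryWithFallback(fetchJson, url = "/arx-dictionary.json") {
  try {
    return await fetchJson(url);
  } catch {
    // Network or parse failure: fall back to the built-in dictionary.
    return FALLBACK_DICTIONARY;
  }
}
```

A stub `fetchJson` that throws exercises the fallback path without any network.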
Protocol changes:
- Add 'arx' to the supported codecs union (plain | lz | deflate | arx)
- Add encodeEnvelopeAsync / decodeFragmentAsync for async brotli-wasm
support; sync encode/decode preserved for plain/lz/deflate
- baseBMP payloads use U+FFF0 marker prefix for auto-detection
- base1k payloads detected by high-Unicode character presence
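The auto-detection rules above can be sketched as follows. This is a simplified illustration; `detectArxTier` is a hypothetical name, and the real decoder may apply additional checks:

```javascript
const BMP_MARKER = "\uFFF0"; // marker prefix named in the protocol notes

// Hypothetical sketch of tier auto-detection: marker prefix => baseBMP,
// any non-ASCII character => base1k, otherwise base76 (pure ASCII).
function detectArxTier(payload) {
  if (payload.startsWith(BMP_MARKER)) return "baseBMP";
  for (const ch of payload) {
    if (ch.codePointAt(0) > 0x7f) return "base1k";
  }
  return "base76";
}
```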
Other notes:
- Recursive compression tested and confirmed to provide no benefit
(brotli output is incompressible at depth 2+)
- baseBMP is the fastest ARX encoding variant (~9ms avg for 8k payload)
- brotli-wasm added as a runtime dependency for browser-side compression
Files changed:
- src/lib/payload/arx-codec.ts (new): full codec implementation
- src/lib/payload/schema.ts: add 'arx' codec type
- src/lib/payload/fragment.ts: async encode/decode, three-tier selection
- src/lib/payload/link-creator.ts: use async encoding for best compression
- src/components/viewer-shell.tsx: async decode + dictionary loading
- src/components/home/link-creator.tsx: async encode integration
- public/arx-dictionary.json (new): shared substitution dictionary
- public/arx-dictionary.json.br (new): pre-compressed dictionary
- scripts/compress-dictionary.mjs (new): dictionary compression script
- scripts/codec-poc.mjs: updated benchmark with all encoding tiers
- tests/arx-codec.test.ts (new): 24 tests covering all codec paths
- vitest.config.ts: alias for Node-compatible brotli-wasm loading
- AGENTS.md, README.md, docs/*, skills/*: documentation updates
Co-authored-by: Junie <junie@jetbrains.com>
Ignore the top-level .junie directory by adding '/.junie' to .gitignore to prevent local/tool-specific files from being committed to the repository.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
📝 Walkthrough

Adds a new ARX codec and async payload flows: dictionary-driven substitutions, Brotli compression, three base encodings (base76, base1k, baseBMP), runtime dictionary loading with fallback, async encode/decode APIs, tooling/tests/docs for ARX, plus a minor .gitignore entry.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Client
    participant Dict as ARX Dictionary Service
    participant Brotli as Brotli (wasm)
    participant ARX as ARX Encoder/Decoder
    participant BaseEnc as Base Encoder/Chooser
    Client->>ARX: submit JSON payload
    ARX->>Dict: fetch/verify dictionary (async)
    Dict-->>ARX: substitution table
    ARX->>ARX: apply substitutions -> bytes
    ARX->>Brotli: compress bytes
    Brotli-->>ARX: compressed bytes
    ARX->>BaseEnc: encode (baseBMP/base1k/base76)
    BaseEnc-->>ARX: encoded fragment
    ARX-->>Client: return encoded fragment
    Client->>ARX: provide encoded fragment for decode
    ARX->>BaseEnc: detect & decode encoding
    BaseEnc-->>ARX: compressed bytes
    ARX->>Brotli: decompress
    Brotli-->>ARX: substituted bytes
    ARX->>Dict: reverse substitutions
    Dict-->>ARX: original JSON
    ARX-->>Client: return JSON
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Code Review Summary

Status: Issues Previously Flagged | Recommendation: Address existing comments before merge

Overview: The PR introduces the new arx codec.

Existing Issues to Address: the following issues have been previously flagged in inline comments.

Files Reviewed: 12 files.

Note: Security analysis shows proper input validation, size limits, and safe encoding practices. No new security vulnerabilities identified.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c44740ca79
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
src/lib/payload/fragment.ts (outdated)

```js
if (codec === "arx") {
  decodedJson = await arxDecompress(encoded);
```
Decode percent-escaped arx payload before decompressing
Browsers percent-encode non-ASCII fragment characters (for example, setting #...é becomes #...%C3%A9 in location.hash), so base1k/baseBMP arx links arrive here as % sequences rather than raw Unicode code points. Passing encoded directly to arxDecompress causes the decoder to take the wrong path and fail as invalid-json, which breaks copied/opened arx links in normal browser navigation. Normalize the payload (e.g., decodeURIComponent) before arx decoding.
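A minimal normalization along the lines suggested (hypothetical helper; the real fix would live in fragment.ts):

```javascript
// Undo browser percent-encoding of non-ASCII fragment characters before
// handing the payload to the arx decoder. Hypothetical sketch.
function normalizeFragmentPayload(encoded) {
  if (!encoded.includes("%")) return encoded;
  try {
    return decodeURIComponent(encoded);
  } catch {
    // A literal '%' that is not a valid escape: leave the payload as-is.
    return encoded;
  }
}
```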
src/lib/payload/fragment.ts (outdated)

```js
  arxCompressBMP(json),
]);
return [
  { value: `${PAYLOAD_FRAGMENT_KEY}=v1.arx.${ascii}`, codec: "arx", packed },
```
Include dictionary version in arx fragment format
This wire format stores arx data as v1.arx.<payload> with no dictionary identifier, but the active substitution table is mutable at runtime via external dictionary loading. When encoder and decoder use different dictionaries (fetch failure, version changes, or fallback vs loaded table), the same payload decodes with wrong substitutions and cannot be parsed. Make the fragment self-describing with a dict version (or hard-pin the same table everywhere) to keep arx links portable.
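One way to make the fragment self-describing, as suggested. The `v1.arx` tag comes from the PR; the dict-version segment and helper names are hypothetical, and a real design would also need to disambiguate legacy payloads that happen to start with digits:

```javascript
// Hypothetical: embed a dictionary version between the codec tag and the
// payload, e.g. "v1.arx.2.<payload>", so decoders can pick the matching table.
function buildArxFragment(dictVersion, payload) {
  return `v1.arx.${dictVersion}.${payload}`;
}

function parseArxFragment(fragment) {
  const match = /^v1\.arx\.(\d+)\.(.*)$/s.exec(fragment);
  if (!match) return null; // not a versioned arx fragment
  return { dictVersion: Number(match[1]), payload: match[2] };
}
```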
```js
const lenHigh = Math.floor(bytes.length / ALPHABET.length);
const lenLow = bytes.length % ALPHABET.length;
return ALPHABET[lenHigh] + ALPHABET[lenLow] + chars.join("");
```
Enforce base76 length-prefix bounds
The base76 encoder writes byte length into exactly two base-77 characters, which only represents up to 77^2 - 1 bytes. For 5929+ byte inputs, ALPHABET[lenHigh] is out of range and serializes as undefined, producing corrupt output that decodeBase76 reconstructs with the wrong length. This silently breaks larger arxCompress ASCII outputs; add a bound check or a variable-length header.
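A minimal bound check along the lines suggested (a sketch; the alphabet length of 77 comes from the codec, while the helper name is hypothetical):

```javascript
const ALPHABET_LEN = 77;
// Two base-77 digits can represent byte lengths 0..77^2-1 (0..5928).
const MAX_TWO_DIGIT_LEN = ALPHABET_LEN * ALPHABET_LEN - 1;

// Hypothetical guard: fail loudly instead of serializing `undefined`.
function checkBase76Length(byteLen) {
  if (byteLen > MAX_TWO_DIGIT_LEN) {
    throw new RangeError(
      `base76 length prefix overflow: ${byteLen} > ${MAX_TWO_DIGIT_LEN}`,
    );
  }
  return [Math.floor(byteLen / ALPHABET_LEN), byteLen % ALPHABET_LEN];
}
```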
Actionable comments posted: 6
🧹 Nitpick comments (8)
.gitignore (1)

142-142: Add a clarifying comment explaining what .junie is.

No references to .junie exist elsewhere in the codebase or documentation, making its purpose unclear. If .junie is a legitimate team-wide artifact or tool, add a brief inline comment explaining what creates it and why it should be in the shared .gitignore.

Suggested improvement:

```diff
 .cursor/*
+
+# Junie: [description of tool/artifact]
 /.junie
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.gitignore at line 142: add a short inline comment above the "/.junie" entry in .gitignore explaining what ".junie" is, who or which tool creates it, and why it should be ignored (e.g., "temporary IDE/workspace files created by the Junie tool, shared-team cache, etc."), so future contributors understand its purpose; update the .gitignore by inserting that explanatory comment immediately before the "/.junie" line.

src/lib/payload/link-creator.ts (1)
140-171: Extract the shared link-finalization flow before sync/async drift.

This is almost a line-for-line copy of createGeneratedArtifactLink; only the encoder call changes. Keeping the fragment-length check and baseUrl assembly in one helper will make future fixes much less likely to land in only one path.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/lib/payload/link-creator.ts` around lines 140-171: createGeneratedArtifactLinkAsync duplicates the post-encoding "finalize link" logic present in the synchronous createGeneratedArtifactLink; extract that shared flow into a helper (e.g. finalizeGeneratedArtifactLink or buildGeneratedArtifactLink) that accepts the validated envelope, computed hash string, and optional baseUrl, performs the fragmentLength computation (hash.length - 1), the MAX_FRAGMENT_LENGTH check, and baseUrl URL/hash assembly, and returns the GeneratedArtifactLink shape (envelope, artifact, hash, url, fragmentLength); then have createGeneratedArtifactLinkAsync call encodeEnvelopeAsync(normalized.envelope) to build the hash and pass the envelope+hash to the new helper, and have createGeneratedArtifactLink use the same helper after its encodeEnvelope call so both paths share identical finalization behavior.

tests/arx-codec.test.ts (1)
35-40: Consider seeding random data for reproducibility.

Using Math.random() for test data generation can make test failures harder to reproduce. However, for validating URL-fragment-safe character output, this is an acceptable tradeoff since the test verifies a property that should hold for all inputs.
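A deterministic replacement along those lines (an xorshift32 sketch; the seed and helper name are arbitrary):

```javascript
// Deterministic pseudo-random bytes via xorshift32, so a failing test can be
// reproduced exactly. Seed value and function name are illustrative.
function seededBytes(n, seed = 0x12345678) {
  let s = seed >>> 0;
  const out = new Uint8Array(n);
  for (let i = 0; i < n; i++) {
    s ^= s << 13; s >>>= 0;
    s ^= s >>> 17;
    s ^= s << 5;  s >>>= 0;
    out[i] = s & 0xff;
  }
  return out;
}
```

The test could then run the same URL-fragment-safe regex assertion against `encodeBase76(seededBytes(100))` with repeatable input.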
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/arx-codec.test.ts` around lines 35-40: the test uses Math.random() to generate bytes, which makes failures non-deterministic; update the test (the it block that calls encodeBase76) to seed the random data for reproducibility by replacing the Math.random() loop with a deterministic pseudo-random generator or a fixed seed sequence (e.g., a simple xorshift or a predefined byte array) so the generated Uint8Array is repeatable across runs while still exercising encodeBase76; ensure the replacement produces 100 bytes and preserves the same assertion that the encoded string matches the URL-fragment-safe regex.

src/lib/payload/fragment.ts (2)
319-381: Significant code duplication with decodeFragment.

decodeFragmentAsync duplicates most of the validation logic from decodeFragment (lines 194-290). Consider extracting the common validation into a shared helper to reduce maintenance burden and ensure consistency.

♻️ Suggested approach: extract the common fragment parsing and validation into a helper:

```ts
type FragmentParseResult =
  | { ok: false; error: ParsedPayload }
  | { ok: true; codec: PayloadCodec; encoded: string; fragmentLength: number };

function parseFragmentHeader(hash: string): FragmentParseResult {
  const fragment = hash.startsWith("#") ? hash.slice(1) : hash;
  // ... common validation logic ...
  return { ok: true, codec, encoded, fragmentLength: fragment.length };
}
```

Then both decodeFragment and decodeFragmentAsync can use this shared helper.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/payload/fragment.ts` around lines 319 - 381, decodeFragmentAsync duplicates the validation/parsing logic already present in decodeFragment; extract the shared header parsing into a helper (e.g., parseFragmentHeader) that takes the raw hash and returns either an error ParsedPayload or a success object containing codec: PayloadCodec, encoded: string, and fragmentLength: number; update decodeFragment and decodeFragmentAsync to call parseFragmentHeader, short-circuit on the error result, and then proceed with codec-specific decoding (arxDecompress or decodePayload) and JSON parsing in their respective functions (preserve resolveEnvelope call in decodeFragmentAsync with fragmentLength).
132-145: Consider adding error handling for individual encoding failures.

Promise.all will reject immediately if any of the three compression functions fails (e.g., brotli-wasm load failure). This causes the entire candidate generation to fail even though other candidates (deflate, lz, plain) might succeed. Consider wrapping with Promise.allSettled or individual try-catch to gracefully handle partial failures.

♻️ Proposed fix using Promise.allSettled:

```diff
 async function buildArxCandidates(envelope: PayloadEnvelope, packed: boolean): Promise<CandidateFragment[]> {
   const payloadEnvelope = { ...envelope, codec: "arx" as PayloadCodec };
   const json = JSON.stringify(packed ? packEnvelope(payloadEnvelope) : payloadEnvelope);
-  const [ascii, unicode, bmp] = await Promise.all([
+  const results = await Promise.allSettled([
     arxCompress(json),
     arxCompressUnicode(json),
     arxCompressBMP(json),
   ]);
-  return [
-    { value: `${PAYLOAD_FRAGMENT_KEY}=v1.arx.${ascii}`, codec: "arx", packed },
-    { value: `${PAYLOAD_FRAGMENT_KEY}=v1.arx.${unicode}`, codec: "arx", packed },
-    { value: `${PAYLOAD_FRAGMENT_KEY}=v1.arx.${bmp}`, codec: "arx", packed },
-  ];
+  const candidates: CandidateFragment[] = [];
+  for (const result of results) {
+    if (result.status === "fulfilled") {
+      candidates.push({ value: `${PAYLOAD_FRAGMENT_KEY}=v1.arx.${result.value}`, codec: "arx", packed });
+    }
+  }
+  return candidates;
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/lib/payload/fragment.ts` around lines 132-145: the buildArxCandidates function currently uses Promise.all for arxCompress, arxCompressUnicode, and arxCompressBMP, so a single failure aborts all candidates; change it to use Promise.allSettled (or individual try/catch) to collect successes and skip failures so other payload candidates (e.g., deflate, lz, plain) can proceed. Specifically, update buildArxCandidates to call Promise.allSettled on [arxCompress(json), arxCompressUnicode(json), arxCompressBMP(json)], map only the fulfilled results to CandidateFragment entries using PAYLOAD_FRAGMENT_KEY and the same v1.arx.{value} naming (preserve codec="arx" and packed), and handle the case where all three fail by returning an empty array (or otherwise preserving existing behavior).

src/lib/payload/arx-codec.ts (3)
307-325: Silent data corruption on invalid Base1k input.

In decodeBase1k, unknown characters default to index 0 via the ?? 0 fallback (lines 310, 311, 316). This silently produces incorrect output rather than signaling an error. For a codec used in payload transport, consider throwing on invalid input to fail fast.

💡 Optional: throw on unknown characters:

```diff
 export function decodeBase1k(str: string): Uint8Array {
   if (str.length < 2) return new Uint8Array(0);
-  const lenHigh = UNICODE_CHAR_TO_INDEX.get(str[0]) ?? 0;
-  const lenLow = UNICODE_CHAR_TO_INDEX.get(str[1]) ?? 0;
+  const lenHigh = UNICODE_CHAR_TO_INDEX.get(str[0]);
+  const lenLow = UNICODE_CHAR_TO_INDEX.get(str[1]);
+  if (lenHigh === undefined || lenLow === undefined) {
+    throw new Error("Invalid base1k length prefix");
+  }
   const byteLen = lenHigh * UNICODE_ALPHABET.length + lenLow;
   let num = BIGINT_0;
   for (let i = 2; i < str.length; i++) {
-    num = num * UBASE + BigInt(UNICODE_CHAR_TO_INDEX.get(str[i]) ?? 0);
+    const idx = UNICODE_CHAR_TO_INDEX.get(str[i]);
+    if (idx === undefined) {
+      throw new Error(`Invalid base1k character at position ${i}`);
+    }
+    num = num * UBASE + BigInt(idx);
   }
```

The same consideration applies to decodeBase76 (line 250) and decodeBaseBMP (lines 547-553).
Verify each finding against the current code and only fix it if needed. In `@src/lib/payload/arx-codec.ts` around lines 307 - 325, In decodeBase1k (and likewise decodeBase76 and decodeBaseBMP) do not silently map unknown characters via the "?? 0" fallback; instead validate lookups from UNICODE_CHAR_TO_INDEX for lenHigh, lenLow and each loop character (use explicit undefined checks on UNICODE_CHAR_TO_INDEX.get(str[i])) and throw a descriptive Error that includes the invalid character and its index when a lookup fails; ensure the length calculation (lenHigh/lenLow -> byteLen) only proceeds after validation so invalid inputs fail fast rather than producing corrupted output.
200-211: Minor documentation inaccuracy: alphabet has 77 characters, not 76.

The comments reference "Base76" and "77 total", but the actual encoding uses ALPHABET.length, which is 77. The naming is slightly inconsistent with the implementation.

📝 Consider renaming or clarifying: either rename to base77 throughout, or update the comment at line 203 to clarify:

```diff
-// Alphabet: A-Za-z0-9 plus 15 fragment-safe punctuation chars (77 total).
+// Alphabet: A-Za-z0-9 plus 15 fragment-safe punctuation chars (77 total, hence "base77").
```
Verify each finding against the current code and only fix it if needed. In `@src/lib/payload/arx-codec.ts` around lines 200 - 211, The header comment incorrectly labels the encoding "Base76" while ALPHABET and BASE use 77 characters; update the comment to consistently refer to "Base77" (or otherwise state "77 total characters") and adjust the descriptive lines around the alphabet and packing density to match ALPHABET and BASE so the documentation matches the implementation (refer to ALPHABET and BASE in arx-codec.ts).
578-585: Prevent redundant concurrent brotli-wasm imports with a promise-based pattern.

The BrotliModule type correctly matches the brotli-wasm API. The double cast is a necessary workaround since brotli-wasm's default export is a Promise. However, concurrent calls to getBrotli() before the first completes will trigger redundant dynamic imports. While the ES module system caches these, storing the import promise avoids wasted concurrent operations.

Suggested refactoring:

```diff
 let brotliModule: BrotliModule | null = null;
+let brotliPromise: Promise<BrotliModule> | null = null;

 async function getBrotli(): Promise<BrotliModule> {
   if (brotliModule) return brotliModule;
-  const mod = await import("brotli-wasm");
-  brotliModule = (await mod.default) as unknown as BrotliModule;
-  return brotliModule;
+  if (!brotliPromise) {
+    brotliPromise = import("brotli-wasm").then(async (mod) => {
+      brotliModule = (await mod.default) as unknown as BrotliModule;
+      return brotliModule;
+    });
+  }
+  return brotliPromise;
 }
```
Verify each finding against the current code and only fix it if needed. In `@src/lib/payload/arx-codec.ts` around lines 578 - 585, Concurrent calls to getBrotli() can trigger redundant dynamic imports; introduce a brotliModulePromise variable in addition to brotliModule and use it to deduplicate in-flight imports: in getBrotli(), if brotliModule exists return it; if brotliModulePromise exists await and return its resolved value; otherwise set brotliModulePromise = import("brotli-wasm").then(m => (m.default as unknown) as BrotliModule), await it, assign the result to brotliModule, clear brotliModulePromise, and return brotliModule (reference symbols: brotliModule, brotliModulePromise, getBrotli, BrotliModule).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/codec-poc.mjs`:
- Around line 7-209: The benchmark in scripts/codec-poc.mjs is testing a
standalone reimplementation of the codec (functions like encBase76, encBase1k,
encBaseBMP, dictEncode, brotli) against plain file text instead of exercising
the real fragment path used in production (encodeEnvelopeAsync applied to a
normalized PayloadEnvelope), so replace the POC harness to call the shipped
encoder path: construct a real PayloadEnvelope like the app does, invoke
encodeEnvelopeAsync (or the exported encode function the app uses) to produce
fragments, and run your size/time measurements on those generated fragments
rather than on the parallel implementations and raw file bytes; keep the
reimplementation only for cross-checking if needed but ensure README/PR
benchmarks use encodeEnvelopeAsync and the same envelope packing logic as
production.
In `@scripts/compress-dictionary.mjs`:
- Around line 21-27: The current flow writes the minified JSON file before
brotli.compress runs, risking out-of-sync artifacts if compression fails; change
the sequence in the script so you first compute minified (const minified =
JSON.stringify(src)), run brotli.compress(Buffer.from(minified, "utf8"), {
quality: 11 }) and only after the compress call succeeds write both outputs with
writeFileSync("public/arx-dictionary.json.br", compressed) and
writeFileSync("public/arx-dictionary.json", minified, "utf8"); wrap the compress
call in a try/catch so you never write the JSON if compression throws and ensure
you reference the same local variables (minified, compressed) and writeFileSync
calls as shown.
In `@src/components/viewer-shell.tsx`:
- Around line 311-313: The call to encodeEnvelopeAsync(...) lacks a .catch()
handler so rejections from encodeEnvelopeAsync (which can happen when
buildCandidatesAsync or brotli-wasm loading fails) cause unhandled promise
rejections; add a .catch(...) on the promise returned by encodeEnvelopeAsync in
the component where setFragmentHash(...) is used (reference encodeEnvelopeAsync
and setFragmentHash) to handle errors: log the error (or surface it to the
UI/state), and ensure you clear or leave fragment state in a consistent way on
failure (e.g., do not call setFragmentHash on error and optionally set an error
flag/state for the UI).
- Around line 248-252: The effect unconditionally sets dictReady true even when
loadArxDictionary() returns -1 and also can re-run in StrictMode; change the
useEffect to call loadArxDictionary(), inspect its return value and only call
setDictReady(true) on success, otherwise set an error state (e.g., setDictError
or setDictReady(false)) and log the failure including the returned error code;
additionally guard against double fetches in React StrictMode by using a ref
flag (e.g., hasLoadedRef) or an AbortController so loadArxDictionary() runs only
once per mount; reference loadArxDictionary, dictReady, setDictReady, and the
useEffect wrapper when making the changes.
In `@src/lib/payload/arx-codec.ts`:
- Around line 564-567: The doc comment on isBaseBMPEncoded incorrectly states
the marker is U+FFEE while the actual constant BMP_MARKER is "\uFFF0"; update
the comment to reference U+FFF0 (or phrase it to read from BMP_MARKER) so the
documented marker matches the BMP_MARKER constant used by isBaseBMPEncoded.
- Around line 127-149: The function loadArxDictionary currently skips schema
validation when source is an object; adjust it so that when source is an object
you call isArxDictionary(source) and handle failure exactly like the fetched
path (return -1) before assigning dict; only pass the validated dict to
buildSubstitutions and set activeDictVersion/ dictionaryLoaded; keep the
existing error handling behavior on exceptions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c6d47df5-65ae-49a3-9637-0bca763eced6
⛔ Files ignored due to path filters (1)

- package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (19)

- .gitignore
- AGENTS.md
- README.md
- docs/architecture.md
- docs/payload-format.md
- package.json
- public/arx-dictionary.json
- public/arx-dictionary.json.br
- scripts/codec-poc.mjs
- scripts/compress-dictionary.mjs
- skills/agent-render-linking/SKILL.md
- src/components/home/link-creator.tsx
- src/components/viewer-shell.tsx
- src/lib/payload/arx-codec.ts
- src/lib/payload/fragment.ts
- src/lib/payload/link-creator.ts
- src/lib/payload/schema.ts
- tests/arx-codec.test.ts
- vitest.config.ts
| // --- Encoding helpers (mirrors arx-codec.ts) --- | ||
| const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~!$*()',;:@/"; | ||
| const BASE = BigInt(ALPHABET.length); | ||
| const B0 = 0n, B8 = 8n; | ||
|
|
||
| function decodePayload(encoded, codec) { | ||
| switch (codec) { | ||
| case "plain": | ||
| return strFromU8(fromBase64UrlBytes(encoded)); | ||
| case "lz": | ||
| return decompressFromEncodedURIComponent(encoded); | ||
| case "deflate": | ||
| return strFromU8(inflateSync(fromBase64UrlBytes(encoded))); | ||
| default: | ||
| throw new Error(`Unsupported codec: ${codec}`); | ||
| } | ||
| function encBase76(bytes) { | ||
| if (!bytes.length) return ""; | ||
| let num = B0; for (const b of bytes) num = (num << B8) | BigInt(b); | ||
| const c = []; while (num > B0) { c.push(ALPHABET[Number(num % BASE)]); num /= BASE; } c.reverse(); | ||
| return ALPHABET[Math.floor(bytes.length / ALPHABET.length)] + ALPHABET[bytes.length % ALPHABET.length] + c.join(""); | ||
| } | ||
|
|
||
```js
function packEnvelope(envelope) {
  return {
    p: 1,
    v: envelope.v,
    c: envelope.codec,
    t: envelope.title,
    a: envelope.activeArtifactId,
    r: envelope.artifacts.map((artifact) => {
      if (artifact.kind === "code") {
        return {
          i: artifact.id,
          k: artifact.kind,
          t: artifact.title,
          f: artifact.filename,
          c: artifact.content,
          l: artifact.language,
        };
      }

      if (artifact.kind === "diff") {
        return {
          i: artifact.id,
          k: artifact.kind,
          t: artifact.title,
          f: artifact.filename,
          p: artifact.patch,
          o: artifact.oldContent,
          n: artifact.newContent,
          l: artifact.language,
          w: artifact.view,
        };
      }

      return {
        i: artifact.id,
        k: artifact.kind,
        t: artifact.title,
        f: artifact.filename,
        c: artifact.content,
      };
    }),
  };
}
```
```js
const UA = [];
for (let cp = 0xA1; cp <= 0x7FF; cp++) { if (cp === 0xAD) continue; if (cp >= 0x300 && cp <= 0x36F) continue; UA.push(String.fromCodePoint(cp)); }
const UB = BigInt(UA.length);

function encBase1k(bytes) {
  if (!bytes.length) return "";
  let num = B0; for (const b of bytes) num = (num << B8) | BigInt(b);
  const c = []; while (num > B0) { c.push(UA[Number(num % UB)]); num /= UB; } c.reverse();
  return UA[Math.floor(bytes.length / UA.length)] + UA[bytes.length % UA.length] + c.join("");
}
```
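The figures quoted in the PR description can be checked directly from this loop: U+00A1–U+07FF spans 1887 code points, minus the soft hyphen and the 112 combining diacriticals, leaving the claimed 1774-slot alphabet:

```javascript
// Rebuild the base1k alphabet exactly as above and count it.
const alphabet = [];
for (let cp = 0xA1; cp <= 0x7FF; cp++) {
  if (cp === 0xAD) continue;                 // soft hyphen
  if (cp >= 0x300 && cp <= 0x36F) continue;  // combining diacritical marks
  alphabet.push(String.fromCodePoint(cp));
}
console.log(alphabet.length);            // 1774
console.log(Math.log2(alphabet.length)); // ≈ 10.79 bits per character
```

log2(1774) ≈ 10.79, which is where the "~10.79 bits/char" density figure in the PR description comes from.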
```js
function unpackEnvelope(value) {
  if (!value || value.p !== 1 || !Array.isArray(value.r)) {
    return value;
  }

  return {
    v: value.v,
    codec: value.c,
    title: value.t,
    activeArtifactId: value.a,
    artifacts: value.r.map((artifact) => {
      if (artifact.k === "code") {
        return {
          id: artifact.i,
          kind: "code",
          title: artifact.t,
          filename: artifact.f,
          content: artifact.c,
          language: artifact.l,
        };
      }

      if (artifact.k === "diff") {
        return {
          id: artifact.i,
          kind: "diff",
          title: artifact.t,
          filename: artifact.f,
          patch: artifact.p,
          oldContent: artifact.o,
          newContent: artifact.n,
          language: artifact.l,
          view: artifact.w,
        };
      }

      return {
        id: artifact.i,
        kind: artifact.k,
        title: artifact.t,
        filename: artifact.f,
        content: artifact.c,
      };
    }),
  };
}
```
```js
// baseBMP alphabet (simplified — same exclusions as arx-codec.ts)
const BMP = [];
for (let cp = 0xA1; cp <= 0xFFEF; cp++) {
  if (cp >= 0xD800 && cp <= 0xDFFF) continue;
  if (cp === 0xAD) continue;
  if (cp >= 0x300 && cp <= 0x36F) continue;
  if (cp >= 0x483 && cp <= 0x489) continue;
  if (cp >= 0x591 && cp <= 0x5BD) continue;
  if (cp === 0x5BF || (cp >= 0x5C1 && cp <= 0x5C2)) continue;
  if (cp >= 0x5C4 && cp <= 0x5C5) continue;
  if (cp === 0x5C7) continue;
  if (cp >= 0x610 && cp <= 0x61A) continue;
  if (cp >= 0x64B && cp <= 0x65F) continue;
  if (cp === 0x670) continue;
  if (cp >= 0x6D6 && cp <= 0x6DC) continue;
  if (cp >= 0x6DF && cp <= 0x6E4) continue;
  if (cp >= 0x6E7 && cp <= 0x6E8) continue;
  if (cp >= 0x6EA && cp <= 0x6ED) continue;
  if (cp === 0x711) continue;
  if (cp >= 0x730 && cp <= 0x74A) continue;
  if (cp >= 0x7A6 && cp <= 0x7B0) continue;
  if (cp >= 0x7EB && cp <= 0x7F3) continue;
  if (cp === 0x7FD) continue;
  if (cp >= 0x816 && cp <= 0x819) continue;
  if (cp >= 0x81B && cp <= 0x823) continue;
  if (cp >= 0x825 && cp <= 0x827) continue;
  if (cp >= 0x829 && cp <= 0x82D) continue;
  if (cp >= 0x859 && cp <= 0x85B) continue;
  if (cp >= 0x898 && cp <= 0x89F) continue;
  if (cp >= 0x8CA && cp <= 0x903) continue;
  if (cp >= 0x93A && cp <= 0x94F) continue;
  if (cp >= 0x951 && cp <= 0x957) continue;
  if (cp >= 0x962 && cp <= 0x963) continue;
  if (cp >= 0x981 && cp <= 0x983) continue;
  if (cp === 0x9BC) continue;
  if (cp >= 0x9BE && cp <= 0x9C4) continue;
  if (cp >= 0x9C7 && cp <= 0x9C8) continue;
  if (cp >= 0x9CB && cp <= 0x9CD) continue;
  if (cp === 0x9D7) continue;
  if (cp >= 0x9E2 && cp <= 0x9E3) continue;
  if (cp === 0x9FE) continue;
  if (cp >= 0xA01 && cp <= 0xA03) continue;
  if (cp >= 0xA3C && cp <= 0xA51) continue;
  if (cp >= 0xA70 && cp <= 0xA71) continue;
  if (cp === 0xA75) continue;
  if (cp >= 0xA81 && cp <= 0xA83) continue;
  if (cp >= 0xABC && cp <= 0xACD) continue;
  if (cp >= 0xAE2 && cp <= 0xAE3) continue;
  if (cp >= 0xAFA && cp <= 0xAFF) continue;
  if (cp >= 0xB01 && cp <= 0xB03) continue;
  if (cp >= 0xB3C && cp <= 0xB57) continue;
  if (cp >= 0xB62 && cp <= 0xB63) continue;
  if (cp >= 0xB82 && cp <= 0xB83) continue;
  if (cp >= 0xBBE && cp <= 0xBCD) continue;
  if (cp === 0xBD7) continue;
  if (cp >= 0xC00 && cp <= 0xC04) continue;
  if (cp >= 0xC3C && cp <= 0xC56) continue;
  if (cp >= 0xC62 && cp <= 0xC63) continue;
  if (cp >= 0xC81 && cp <= 0xC83) continue;
  if (cp >= 0xCBC && cp <= 0xCD6) continue;
  if (cp >= 0xCE2 && cp <= 0xCE3) continue;
  if (cp === 0xCF3) continue;
  if (cp >= 0xD00 && cp <= 0xD03) continue;
  if (cp >= 0xD3B && cp <= 0xD57) continue;
  if (cp >= 0xD62 && cp <= 0xD63) continue;
  if (cp >= 0xD81 && cp <= 0xD83) continue;
  if (cp >= 0xDCA && cp <= 0xDDF) continue;
  if (cp >= 0xDF2 && cp <= 0xDF3) continue;
  if (cp === 0xE31) continue;
  if (cp >= 0xE34 && cp <= 0xE3A) continue;
  if (cp >= 0xE47 && cp <= 0xE4E) continue;
  if (cp === 0xEB1) continue;
  if (cp >= 0xEB4 && cp <= 0xEBC) continue;
  if (cp >= 0xEC8 && cp <= 0xECE) continue;
  if (cp >= 0xF18 && cp <= 0xF19) continue;
  if (cp === 0xF35 || cp === 0xF37 || cp === 0xF39) continue;
  if (cp >= 0xF3E && cp <= 0xF3F) continue;
  if (cp >= 0xF71 && cp <= 0xF84) continue;
  if (cp >= 0xF86 && cp <= 0xF87) continue;
  if (cp >= 0xF8D && cp <= 0xFBC) continue;
  if (cp === 0xFC6) continue;
  if (cp >= 0x102B && cp <= 0x103E) continue;
  if (cp >= 0x1056 && cp <= 0x1059) continue;
  if (cp >= 0x105E && cp <= 0x1060) continue;
  if (cp >= 0x1062 && cp <= 0x1064) continue;
  if (cp >= 0x1067 && cp <= 0x106D) continue;
  if (cp >= 0x1071 && cp <= 0x1074) continue;
  if (cp >= 0x1082 && cp <= 0x108D) continue;
  if (cp === 0x108F) continue;
  if (cp >= 0x109A && cp <= 0x109D) continue;
  if (cp >= 0x135D && cp <= 0x135F) continue;
  if (cp >= 0x1712 && cp <= 0x1715) continue;
  if (cp >= 0x1732 && cp <= 0x1734) continue;
  if (cp >= 0x1752 && cp <= 0x1753) continue;
  if (cp >= 0x1772 && cp <= 0x1773) continue;
  if (cp >= 0x17B4 && cp <= 0x17D3) continue;
  if (cp === 0x17DD) continue;
  if (cp >= 0x180B && cp <= 0x180F) continue;
  if (cp >= 0x1885 && cp <= 0x1886) continue;
  if (cp === 0x18A9) continue;
  if (cp >= 0x1920 && cp <= 0x192B) continue;
  if (cp >= 0x1930 && cp <= 0x193B) continue;
  if (cp >= 0x1A17 && cp <= 0x1A1B) continue;
  if (cp >= 0x1A55 && cp <= 0x1A5E) continue;
  if (cp >= 0x1A60 && cp <= 0x1A7C) continue;
  if (cp === 0x1A7F) continue;
  if (cp >= 0x1AB0 && cp <= 0x1ACE) continue;
  if (cp >= 0x1B00 && cp <= 0x1B04) continue;
  if (cp >= 0x1B34 && cp <= 0x1B44) continue;
  if (cp >= 0x1B6B && cp <= 0x1B73) continue;
  if (cp >= 0x1B80 && cp <= 0x1B82) continue;
  if (cp >= 0x1BA1 && cp <= 0x1BAD) continue;
  if (cp >= 0x1BE6 && cp <= 0x1BF3) continue;
  if (cp >= 0x1C24 && cp <= 0x1C37) continue;
  if (cp >= 0x1CD0 && cp <= 0x1CF9) continue;
  if (cp >= 0x1DC0 && cp <= 0x1DFF) continue;
  if (cp >= 0x200B && cp <= 0x200F) continue;
  if (cp >= 0x2028 && cp <= 0x202E) continue;
  if (cp >= 0x2060 && cp <= 0x2069) continue;
  if (cp >= 0x20D0 && cp <= 0x20F0) continue;
  if (cp >= 0x2CEF && cp <= 0x2CF1) continue;
  if (cp === 0x2D7F) continue;
  if (cp >= 0x2DE0 && cp <= 0x2DFF) continue;
  if (cp >= 0x302A && cp <= 0x302F) continue;
  if (cp >= 0x3099 && cp <= 0x309A) continue;
  if (cp >= 0xA66F && cp <= 0xA672) continue;
  if (cp >= 0xA674 && cp <= 0xA67D) continue;
  if (cp >= 0xA69E && cp <= 0xA69F) continue;
  if (cp >= 0xA6F0 && cp <= 0xA6F1) continue;
  if (cp === 0xA802 || cp === 0xA806 || cp === 0xA80B) continue;
  if (cp >= 0xA823 && cp <= 0xA827) continue;
  if (cp === 0xA82C) continue;
  if (cp >= 0xA880 && cp <= 0xA881) continue;
  if (cp >= 0xA8B4 && cp <= 0xA8C5) continue;
  if (cp >= 0xA8E0 && cp <= 0xA8F1) continue;
  if (cp === 0xA8FF) continue;
  if (cp >= 0xA926 && cp <= 0xA92D) continue;
  if (cp >= 0xA947 && cp <= 0xA953) continue;
  if (cp >= 0xA980 && cp <= 0xA983) continue;
  if (cp >= 0xA9B3 && cp <= 0xA9C0) continue;
  if (cp === 0xA9E5) continue;
  if (cp >= 0xAA29 && cp <= 0xAA36) continue;
  if (cp === 0xAA43) continue;
  if (cp >= 0xAA4C && cp <= 0xAA4D) continue;
  if (cp >= 0xAA7B && cp <= 0xAA7D) continue;
  if (cp === 0xAAB0) continue;
  if (cp >= 0xAAB2 && cp <= 0xAAB4) continue;
  if (cp >= 0xAAB7 && cp <= 0xAAB8) continue;
  if (cp >= 0xAABE && cp <= 0xAABF) continue;
  if (cp === 0xAAC1) continue;
  if (cp >= 0xAAEB && cp <= 0xAAEF) continue;
  if (cp >= 0xAAF5 && cp <= 0xAAF6) continue;
  if (cp >= 0xABE3 && cp <= 0xABEA) continue;
  if (cp >= 0xABEC && cp <= 0xABED) continue;
  if (cp === 0xFB1E) continue;
  if (cp >= 0xFE00 && cp <= 0xFE0F) continue;
  if (cp >= 0xFE20 && cp <= 0xFE2F) continue;
  if (cp === 0xFEFF) continue;
  if (cp >= 0xFFF0 && cp <= 0xFFFF) continue;
  BMP.push(String.fromCodePoint(cp));
}
const BB = BigInt(BMP.length);

function encBaseBMP(bytes) {
  if (!bytes.length) return "";
  let num = B0; for (const b of bytes) num = (num << B8) | BigInt(b);
  const c = []; while (num > B0) { c.push(BMP[Number(num % BB)]); num /= BB; } c.reverse();
  return "\uFFF0" + BMP[Math.floor(bytes.length / BMP.length)] + BMP[bytes.length % BMP.length] + c.join("");
}
```
```js
function buildFragment(envelope, codec, packed) {
  const json = JSON.stringify(packed ? packEnvelope({ ...envelope, codec }) : { ...envelope, codec });
  return `${FRAGMENT_KEY}=v1.${codec}.${encodePayload(json, codec)}`;
}

// --- Dictionary substitution ---
const SBC = [1,2,3,4,5,6,7,8,0x0b,0x0e,0x0f,0x10,0x11,0x12,0x13,0x14,0x15,0x16,0x17,0x18,0x19,0x1a,0x1b,0x1c,0x1d];
const subs = [];
for (let i = 0; i < dict.singleByteSlots.length && i < SBC.length; i++)
  subs.push([dict.singleByteSlots[i], String.fromCharCode(SBC[i])]);
for (let i = 0; i < dict.extendedSlots.length; i++)
  subs.push([dict.extendedSlots[i], "\x00" + String.fromCharCode(i + 1)]);
function dictEncode(t) { let r = t; for (const [f,to] of subs) r = r.split(f).join(to); return r; }
```
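Decoding reverses the substitution table. A self-contained sketch with an illustrative four-slot dictionary (the values below are made up; the real table ships in public/arx-dictionary.json) — note the reverse pass must expand the two-byte `\x00`-prefixed extended codes before the single-byte codes, since the second byte of an extended pair can equal a single-byte control code:

```javascript
// Illustrative dictionary; slot strings here are assumptions, not the shipped ones.
const dict = {
  singleByteSlots: ['","kind":"', '","content":"'],
  extendedSlots: ["function ", "return "],
};
const SBC = [1, 2, 3, 4, 5, 6, 7, 8, 0x0b, 0x0e, 0x0f, 0x10, 0x11, 0x12];
const subs = [];
for (let i = 0; i < dict.singleByteSlots.length && i < SBC.length; i++)
  subs.push([dict.singleByteSlots[i], String.fromCharCode(SBC[i])]);
for (let i = 0; i < dict.extendedSlots.length; i++)
  subs.push([dict.extendedSlots[i], "\x00" + String.fromCharCode(i + 1)]);

function dictEncode(t) { let r = t; for (const [f, to] of subs) r = r.split(f).join(to); return r; }

// Iterate the table in reverse so "\x00"-prefixed pairs expand first and a
// bare control byte is never misread out of the middle of an extended pair.
function dictDecode(t) { let r = t; for (const [f, to] of [...subs].reverse()) r = r.split(to).join(f); return r; }
```

With this ordering, `dictDecode(dictEncode(s))` round-trips any input that contains no raw control bytes, which the JSON envelope never does.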
```js
function brotli(buf) {
  return zlib.brotliCompressSync(Buffer.from(buf), { params: { [zlib.constants.BROTLI_PARAM_QUALITY]: 11 } });
}
```
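The helper can be round-tripped with Node's built-in zlib bindings (the same API the POC uses); decompression needs no quality parameter because the Brotli stream is self-describing:

```javascript
import zlib from "node:zlib";

function brotli(buf) {
  return zlib.brotliCompressSync(Buffer.from(buf), {
    params: { [zlib.constants.BROTLI_PARAM_QUALITY]: 11 },
  });
}

// Inverse helper: Brotli streams carry their own parameters, so no options needed.
function unbrotli(buf) {
  return zlib.brotliDecompressSync(buf).toString("utf8");
}

const text = '{"kind":"markdown","content":"# hello"}'.repeat(20);
const packed = brotli(text);
console.log(packed.length < text.length); // true: repetitive input shrinks
```

Quality 11 is the maximum; it trades encode time for ratio, which is the right trade-off here since fragments are encoded once and decoded many times.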
Benchmark the shipped fragment path, not a parallel implementation.
The app generates links from a normalized PayloadEnvelope via encodeEnvelopeAsync, but this POC reimplements the codec and runs it over bare file text. That makes it easy for the published size and timing numbers to drift from real fragments as envelope packing or codec internals evolve, so they are a shaky basis for the README/PR benchmarks.
Also applies to: 211-295
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/codec-poc.mjs` around lines 7 - 209, The benchmark in
scripts/codec-poc.mjs is testing a standalone reimplementation of the codec
(functions like encBase76, encBase1k, encBaseBMP, dictEncode, brotli) against
plain file text instead of exercising the real fragment path used in production
(encodeEnvelopeAsync applied to a normalized PayloadEnvelope), so replace the
POC harness to call the shipped encoder path: construct a real PayloadEnvelope
like the app does, invoke encodeEnvelopeAsync (or the exported encode function
the app uses) to produce fragments, and run your size/time measurements on those
generated fragments rather than on the parallel implementations and raw file
bytes; keep the reimplementation only for cross-checking if needed but ensure
README/PR benchmarks use encodeEnvelopeAsync and the same envelope packing logic
as production.
```js
// Re-serialize minified (no whitespace)
const minified = JSON.stringify(src);
writeFileSync("public/arx-dictionary.json", minified, "utf8");

// Brotli-compress the minified JSON
const compressed = brotli.compress(Buffer.from(minified, "utf8"), { quality: 11 });
writeFileSync("public/arx-dictionary.json.br", compressed);
```
Write both artifacts only after Brotli succeeds.
Line 23 updates public/arx-dictionary.json before the compressed buffer exists. If Brotli throws, the source JSON and .br variant drift out of sync.
Proposed fix

```diff
 // Re-serialize minified (no whitespace)
 const minified = JSON.stringify(src);
-writeFileSync("public/arx-dictionary.json", minified, "utf8");

 // Brotli-compress the minified JSON
 const compressed = brotli.compress(Buffer.from(minified, "utf8"), { quality: 11 });
+writeFileSync("public/arx-dictionary.json", minified, "utf8");
 writeFileSync("public/arx-dictionary.json.br", compressed);
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```js
// Re-serialize minified (no whitespace)
const minified = JSON.stringify(src);

// Brotli-compress the minified JSON
const compressed = brotli.compress(Buffer.from(minified, "utf8"), { quality: 11 });
writeFileSync("public/arx-dictionary.json", minified, "utf8");
writeFileSync("public/arx-dictionary.json.br", compressed);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/compress-dictionary.mjs` around lines 21 - 27, The current flow
writes the minified JSON file before brotli.compress runs, risking out-of-sync
artifacts if compression fails; change the sequence in the script so you first
compute minified (const minified = JSON.stringify(src)), run
brotli.compress(Buffer.from(minified, "utf8"), { quality: 11 }) and only after
the compress call succeeds write both outputs with
writeFileSync("public/arx-dictionary.json.br", compressed) and
writeFileSync("public/arx-dictionary.json", minified, "utf8"); wrap the compress
call in a try/catch so you never write the JSON if compression throws and ensure
you reference the same local variables (minified, compressed) and writeFileSync
calls as shown.
```tsx
const [dictReady, setDictReady] = useState(false);

useEffect(() => {
  loadArxDictionary().then(() => setDictReady(true));
}, []);
```
Missing error handling for dictionary load failure.
loadArxDictionary() can return -1 on failure (network error, invalid JSON), but dictReady is set to true regardless. This masks load failures and may cause silent fallback to the built-in dictionary without any user indication. Additionally, in React StrictMode (dev), this effect runs twice, causing redundant fetches.
🛡️ Proposed fix to handle dictionary load result

```diff
 useEffect(() => {
-  loadArxDictionary().then(() => setDictReady(true));
+  let cancelled = false;
+  loadArxDictionary().then((version) => {
+    if (!cancelled) {
+      // version === -1 means load failed, but fallback is used
+      setDictReady(true);
+    }
+  });
+  return () => { cancelled = true; };
 }, []);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/components/viewer-shell.tsx` around lines 248 - 252, The effect
unconditionally sets dictReady true even when loadArxDictionary() returns -1 and
also can re-run in StrictMode; change the useEffect to call loadArxDictionary(),
inspect its return value and only call setDictReady(true) on success, otherwise
set an error state (e.g., setDictError or setDictReady(false)) and log the
failure including the returned error code; additionally guard against double
fetches in React StrictMode by using a ref flag (e.g., hasLoadedRef) or an
AbortController so loadArxDictionary() runs only once per mount; reference
loadArxDictionary, dictReady, setDictReady, and the useEffect wrapper when
making the changes.
```tsx
encodeEnvelopeAsync({ ...envelope, activeArtifactId: artifactId }, { codec: envelope.codec }).then((encoded) => {
  setFragmentHash(`#${encoded}`);
});
```
Missing .catch() handler on encodeEnvelopeAsync promise.
Per context snippet at src/lib/payload/fragment.ts:188-192, encodeEnvelopeAsync can reject if buildCandidatesAsync fails (e.g., brotli-wasm module load failure). Without a .catch() handler, unhandled promise rejections will occur, potentially leaving the UI in an inconsistent state.
🐛 Proposed fix to add error handling

```diff
 encodeEnvelopeAsync({ ...envelope, activeArtifactId: artifactId }, { codec: envelope.codec }).then((encoded) => {
   setFragmentHash(`#${encoded}`);
-});
+}).catch((err) => {
+  console.error("Failed to encode envelope:", err);
+  // Optionally: fall back to sync encoding for non-arx codecs
+});
```

📝 Committable suggestion

```tsx
encodeEnvelopeAsync({ ...envelope, activeArtifactId: artifactId }, { codec: envelope.codec }).then((encoded) => {
  setFragmentHash(`#${encoded}`);
}).catch((err) => {
  console.error("Failed to encode envelope:", err);
  // Optionally: fall back to sync encoding for non-arx codecs
});
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/components/viewer-shell.tsx` around lines 311 - 313, The call to
encodeEnvelopeAsync(...) lacks a .catch() handler so rejections from
encodeEnvelopeAsync (which can happen when buildCandidatesAsync or brotli-wasm
loading fails) cause unhandled promise rejections; add a .catch(...) on the
promise returned by encodeEnvelopeAsync in the component where
setFragmentHash(...) is used (reference encodeEnvelopeAsync and setFragmentHash)
to handle errors: log the error (or surface it to the UI/state), and ensure you
clear or leave fragment state in a consistent way on failure (e.g., do not call
setFragmentHash on error and optionally set an error flag/state for the UI).
```ts
export async function loadArxDictionary(source?: string | ArxDictionary): Promise<number> {
  try {
    let dict: ArxDictionary;

    if (source && typeof source === "object") {
      dict = source;
    } else {
      const url = typeof source === "string" ? source : resolveDefaultDictionaryUrl();
      const response = await fetch(url);
      if (!response.ok) return -1;
      const json: unknown = await response.json();
      if (!isArxDictionary(json)) return -1;
      dict = json;
    }

    activeSubs = buildSubstitutions(dict);
    activeDictVersion = dict.version;
    dictionaryLoaded = true;
    return dict.version;
  } catch {
    return -1;
  }
}
```
Missing validation when source is an object.
When source is passed as an object (lines 131-132), it bypasses the isArxDictionary validation that's applied to fetched JSON (line 138). A malformed dictionary object could corrupt activeSubs.
🛡️ Proposed fix to add validation

```diff
 if (source && typeof source === "object") {
+  if (!isArxDictionary(source)) return -1;
   dict = source;
 } else {
```

📝 Committable suggestion

```ts
export async function loadArxDictionary(source?: string | ArxDictionary): Promise<number> {
  try {
    let dict: ArxDictionary;

    if (source && typeof source === "object") {
      if (!isArxDictionary(source)) return -1;
      dict = source;
    } else {
      const url = typeof source === "string" ? source : resolveDefaultDictionaryUrl();
      const response = await fetch(url);
      if (!response.ok) return -1;
      const json: unknown = await response.json();
      if (!isArxDictionary(json)) return -1;
      dict = json;
    }

    activeSubs = buildSubstitutions(dict);
    activeDictVersion = dict.version;
    dictionaryLoaded = true;
    return dict.version;
  } catch {
    return -1;
  }
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/lib/payload/arx-codec.ts` around lines 127 - 149, The function
loadArxDictionary currently skips schema validation when source is an object;
adjust it so that when source is an object you call isArxDictionary(source) and
handle failure exactly like the fetched path (return -1) before assigning dict;
only pass the validated dict to buildSubstitutions and set activeDictVersion/
dictionaryLoaded; keep the existing error handling behavior on exceptions.
```ts
/** Returns true if the encoded string uses baseBMP encoding (starts with U+FFEE marker). */
export function isBaseBMPEncoded(str: string): boolean {
  return str.startsWith(BMP_MARKER);
}
```
Documentation inconsistency: comment says U+FFEE but marker is U+FFF0.
Line 564 states "U+FFEE marker" but BMP_MARKER is defined as "\uFFF0" (line 520).
📝 Fix the comment

```diff
-/** Returns true if the encoded string uses baseBMP encoding (starts with U+FFEE marker). */
+/** Returns true if the encoded string uses baseBMP encoding (starts with U+FFF0 marker). */
 export function isBaseBMPEncoded(str: string): boolean {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/lib/payload/arx-codec.ts` around lines 564 - 567, The doc comment on
isBaseBMPEncoded incorrectly states the marker is U+FFEE while the actual
constant BMP_MARKER is "\uFFF0"; update the comment to reference U+FFF0 (or
phrase it to read from BMP_MARKER) so the documented marker matches the
BMP_MARKER constant used by isBaseBMPEncoded.
Deploying agent-render with

| | |
| --- | --- |
| Latest commit: | 38e13a8 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://c024da87.agent-render.pages.dev |
| Branch Preview URL: | https://experimental-compression.agent-render.pages.dev |
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@skills/agent-render-linking/SKILL.md`:
- Line 197: Change the lowercase "markdown" in the JSON envelope example string
'","kind":"markdown","content":"' to capitalized "Markdown" so the user-facing
sentence reads '","kind":"Markdown","content":"'; update the surrounding text
where the word "markdown" is used in that sentence to "Markdown" to keep the
proper noun capitalized.
In `@src/lib/payload/fragment.ts`:
- Around line 97-99: The current getSyncCandidateCodecs function unconditionally
removes "arx" which causes a misleading generic error later in encodeEnvelope;
instead, change getSyncCandidateCodecs to first collect candidates via
getCandidateCodecs(options) and if the only candidate is "arx" return ["arx"]
(preserving the explicit sync-path ARX error), otherwise return the candidate
list with "arx" filtered out; update references to getSyncCandidateCodecs,
getCandidateCodecs, and ensure encodeEnvelope still receives the preserved "arx"
candidate so encodeEnvelopeAsync guidance is produced.
- Around line 132-145: The fragment encoder (buildArxCandidates) currently
embeds getActiveDictVersion into the fragment string
(PAYLOAD_FRAGMENT_KEY=v1.arx.${dictVersion}.…), but the decoder ignores that
version and calls arxDecompress without it; update the decoder path that parses
the numeric version to pass that extracted version into a version-aware resolver
or into arxDecompress (or call a new resolver like
resolveArxDictVersion(version) before decompression), and if resolution fails
fall back to using the full encoded remainder for decompression so older
fragments decompress with their original dictionary when available.
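The getSyncCandidateCodecs change described in the prompt above can be sketched roughly as follows. This is a sketch only: the function and codec names come from the review prompt, and the real signatures in fragment.ts may differ.

```typescript
type Codec = "plain" | "lz" | "deflate" | "arx";

// Hypothetical filter for the sync encode path: drop "arx" (async-only),
// but preserve a lone "arx" so the sync encoder can raise its explicit
// "use encodeEnvelopeAsync" error instead of a generic "no codec" failure.
function filterSyncCandidates(candidates: Codec[]): Codec[] {
  if (candidates.length === 1 && candidates[0] === "arx") return candidates;
  return candidates.filter((c) => c !== "arx");
}

console.log(filterSyncCandidates(["arx", "deflate", "lz", "plain"])); // ["deflate", "lz", "plain"]
console.log(filterSyncCandidates(["arx"])); // ["arx"] — lets the ARX-specific error fire
```

The special case matters for error quality: silently filtering "arx" everywhere would turn an explicit "this codec needs the async path" message into a confusing empty-candidates failure.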
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 32db5e48-7c7a-4e38-be8b-168d9609cce1
📒 Files selected for processing (5)
AGENTS.md, docs/payload-format.md, skills/agent-render-linking/SKILL.md, src/lib/payload/fragment.ts, tests/arx-codec.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- AGENTS.md
Actionable comments posted: 1
🧹 Nitpick comments (2)
skills/agent-render-linking/SKILL.md (1)
39-42: Documentation may mislead agents about sync codec priority. The documented codec priority `arx -> deflate -> lz -> plain` only applies to async encoding (`encodeEnvelopeAsync`). The sync path (`encodeEnvelope`) defaults to `["deflate", "lz", "plain"]` and explicitly excludes ARX. Agents following this guidance for sync operations will mispredict the actual codec selection order. Consider clarifying that ARX requires async encoding and that sync encoding uses a different default priority.
arx -> deflate -> lz -> plainonly applies to async encoding (encodeEnvelopeAsync). The sync path (encodeEnvelope) defaults to["deflate", "lz", "plain"]and explicitly excludes ARX. Agents following this guidance for sync operations will mispredict the actual codec selection order.Consider clarifying that ARX requires async encoding and that sync encoding uses a different default priority.
📝 Suggested clarification
Prefer: 1. shortest valid fragment for the target surface -2. codec priority `arx -> deflate -> lz -> plain` unless explicitly overridden +2. codec priority `arx -> deflate -> lz -> plain` for async encoding (recommended) + - sync encoding defaults to `deflate -> lz -> plain` (arx requires async) 3. packed wire mode when available🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/agent-render-linking/SKILL.md` around lines 39-42: update the SKILL.md documentation to clarify codec priority differences: explicitly state that ARX is only available for async encoding (`encodeEnvelopeAsync`) and is excluded from the sync encoder (`encodeEnvelope`), and that the sync default codec priority is `["deflate","lz","plain"]` while async defaults to `["arx","deflate","lz","plain"]` unless overridden; also note packed wire mode preference and the rule to prefer the shortest valid fragment for the target surface so agents don't assume the async priority for sync calls.

src/lib/payload/fragment.ts (1)
335-341: Dictionary version compatibility is handled via decode fallback, not version-aware loading.
`resolveArxDictVersion` only checks if the fragment's version matches the currently active dictionary. If they don't match, the code falls back to trying different payload parsing strategies (lines 394-396) rather than loading the appropriate dictionary version. This is a reasonable v1 approach, but if the dictionary schema changes significantly in future versions, payloads encoded with older dictionaries may fail to decode even with the fallback attempts.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/payload/fragment.ts` around lines 335 - 341, resolveArxDictVersion currently only compares the fragment version to getActiveDictVersion and returns false on mismatch, relying on decode fallbacks; change it to actively attempt to load/switch to the fragment's dictionary version when version !== null and different from getActiveDictVersion by calling the existing dictionary loader/switch function (e.g., loadDictVersion or switchActiveDict) and return true only if the load/switch succeeds; keep the original null-handling behavior (null -> true) and ensure resolveArxDictVersion returns false if the requested dictionary version cannot be loaded so higher-level code can still try fallback parsing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/payload/fragment.ts`:
- Around line 415-417: In decodeFragmentAsync, the else-branch incorrectly
passes a non-existent variable "encoded" to decodePayload; replace that with the
correct variable "remainder" (the non-ARX payload defined earlier) so that
decodePayload(remainder, codec) is called and decodedJson is set correctly in
the decodeFragmentAsync function.
---
Nitpick comments:
In `@skills/agent-render-linking/SKILL.md`:
- Around line 39-42: Update the SKILL.md documentation to clarify codec priority
differences: explicitly state that ARX is only available for async encoding
(encodeEnvelopeAsync) and is excluded from the sync encoder (encodeEnvelope),
and that the sync default codec priority is ["deflate","lz","plain"] while async
defaults to ["arx","deflate","lz","plain"] unless overridden; also note packed
wire mode preference and the rule to prefer the shortest valid fragment for the
target surface so agents don't assume the async priority for sync calls.
In `@src/lib/payload/fragment.ts`:
- Around line 335-341: resolveArxDictVersion currently only compares the
fragment version to getActiveDictVersion and returns false on mismatch, relying
on decode fallbacks; change it to actively attempt to load/switch to the
fragment's dictionary version when version !== null and different from
getActiveDictVersion by calling the existing dictionary loader/switch function
(e.g., loadDictVersion or switchActiveDict) and return true only if the
load/switch succeeds; keep the original null-handling behavior (null -> true)
and ensure resolveArxDictVersion returns false if the requested dictionary
version cannot be loaded so higher-level code can still try fallback parsing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8689721c-58da-46f9-9afa-c9677bf8f11b
📒 Files selected for processing (3)
skills/agent-render-linking/SKILL.md, src/lib/payload/fragment.ts, tests/fragment.test.ts
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/lib/payload/fragment.ts (1)
141-149: ⚠️ Potential issue | 🟠 Major — Resolve the stamped ARX dictionary version to a real table.
`buildArxCandidates()` now writes `getActiveDictVersion()` into the fragment, but `resolveArxDictVersion()` only compares that number to the current global state, and `arxDecompress()` still uses whichever dictionary is active in `src/lib/payload/arx-codec.ts`. Fragments created before the shared dictionary finishes loading, or fragments from an older dictionary version, can still be decoded with the wrong substitution table. Please resolve the parsed version to an immutable dictionary snapshot, or fail with an explicit version-mismatch error instead of decoding against the current table. Based on learnings:
**/lib/payload/*.ts: Treat every payload as untrusted input and fail clearly on malformed or oversized payloads before renderer mount when possible.Also applies to: 335-340, 393-403
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/payload/fragment.ts` around lines 141 - 149, The fragment writer (buildArxCandidates) stamps getActiveDictVersion into the fragment but the decoder (resolveArxDictVersion / arxDecompress) still uses the global active table; change resolveArxDictVersion to map the parsed version to an immutable dictionary snapshot (e.g., fetch or look up a frozen substitution table keyed by that version) and make arxDecompress accept and use that snapshot instead of the global active table; if the stamped version is missing or incompatible, throw a clear version-mismatch error and fail decoding early (also add the same snapshot-resolution and explicit failure checks where similar ARX decoding occurs around the other ranges noted).
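The snapshot resolution this comment asks for might look roughly like the following; `DictSnapshot`, `dictRegistry`, and the sample entries are illustrative only, not the project's actual tables.

```typescript
// Illustrative sketch: an immutable registry of dictionary snapshots keyed by
// version, so decoding never depends on the mutable "active" table.
type DictSnapshot = ReadonlyMap<string, string>;

const dictRegistry = new Map<number, DictSnapshot>([
  [1, new Map([["\x01", '{"type":"']])],
  [2, new Map([["\x01", '{"kind":"']])],
]);

function resolveDictSnapshot(version: number): DictSnapshot {
  const snapshot = dictRegistry.get(version);
  if (!snapshot) {
    // Fail decoding early with an explicit error rather than silently
    // substituting with whichever dictionary happens to be active.
    throw new Error(`arx: unknown dictionary version ${version}`);
  }
  return snapshot;
}

// arxDecompress would then take the snapshot as a parameter instead of
// reading module-level state; shown here for the substitution step only.
function arxSubstituteBack(input: string, dict: DictSnapshot): string {
  let out = input;
  for (const [slot, phrase] of dict) out = out.split(slot).join(phrase);
  return out;
}
```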
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/payload/fragment.ts`:
- Around line 138-150: buildArxCandidates currently ranks ARX fragments by raw
string length which miscounts non-ASCII outputs; update buildArxCandidates (and
the parallel candidates at lines ~195-198) to compute and store a
transportLength for each candidate using the same percent-encoding/serialization
used by decodeArxEncodedPayload (i.e., percent-encode non-ASCII bytes the same
way the URL fragment will be serialized) and use transportLength for both
ranking and the 8000-character fragment budget check; ensure CandidateFragment
objects include this transportLength field and that downstream budget checks
reference it instead of value.length; keep symbols: buildArxCandidates,
arxCompress, arxCompressUnicode, arxCompressBMP, PAYLOAD_FRAGMENT_KEY,
decodeArxEncodedPayload, and getActiveDictVersion.
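A sketch of the transport-length computation described above, assuming every ASCII output character is fragment-safe (true for the base76 alphabet) and each non-ASCII UTF-8 byte serializes as a three-character %XX escape; the real `computeTransportLength` may differ in detail.

```typescript
// UTF-8 byte count for a single code point.
function utf8ByteLength(codePoint: number): number {
  if (codePoint < 0x80) return 1;
  if (codePoint < 0x800) return 2;
  if (codePoint < 0x10000) return 3;
  return 4;
}

// Serialized length of a candidate in the URL fragment: ASCII passes
// through 1:1, each non-ASCII UTF-8 byte percent-encodes to "%XX" (3 chars).
function computeTransportLength(value: string): number {
  let length = 0;
  for (const ch of value) {
    const cp = ch.codePointAt(0)!;
    length += cp < 0x80 ? 1 : utf8ByteLength(cp) * 3;
  }
  return length;
}
```

Under this model a 2-byte base1k character costs 6 transport chars and a 3-byte BMP character costs 9, which is why ranking candidates by raw `value.length` overstates how competitive the Unicode tiers are.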
---
Duplicate comments:
In `@src/lib/payload/fragment.ts`:
- Around line 141-149: The fragment writer (buildArxCandidates) stamps
getActiveDictVersion into the fragment but the decoder (resolveArxDictVersion /
arxDecompress) still uses the global active table; change resolveArxDictVersion
to map the parsed version to an immutable dictionary snapshot (e.g., fetch or
look up a frozen substitution table keyed by that version) and make
arxDecompress accept and use that snapshot instead of the global active table;
if the stamped version is missing or incompatible, throw a clear
version-mismatch error and fail decoding early (also add the same
snapshot-resolution and explicit failure checks where similar ARX decoding
occurs around the other ranges noted).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a2daacd2-5071-49f0-a6ae-86dd13f10130
📒 Files selected for processing (1)
src/lib/payload/fragment.ts
…length
arxCompressUnicode and arxCompressBMP emit non-ASCII Unicode characters. Browsers percent-encode these in URL fragments (each UTF-8 byte → %XX, 3 chars per byte), so a baseBMP candidate with raw length N can serialize to up to 9N chars. Ranking by raw value.length caused Unicode/BMP fragments to appear shorter than they actually are in the URL, potentially exceeding the 8000-char budget.
Add transportLength to CandidateFragment that computes the serialized (percent-encoded) length. Update selectCandidate to sort and budget-check by transportLength. buildArxCandidates uses computeTransportLength() for non-ASCII payloads; non-ARX codecs produce ASCII-only output, so their transportLength equals value.length.
Co-authored-by: Aanish Bhirud <baanish@users.noreply.github.com>
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/lib/payload/fragment.ts (1)
219-315: Consider extracting shared fragment parsing/validation to one helper.
`decodeFragment` and `decodeFragmentAsync` duplicate key parsing/version/codec checks. A small shared parser would reduce drift risk as codecs evolve.
Also applies to: 361-451
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/payload/fragment.ts` around lines 219 - 315, Duplicate fragment header parsing exists in decodeFragment and decodeFragmentAsync; extract a single helper (e.g., parseFragmentHeader or parseFragmentMetadata) to centralize: take the raw hash string, strip leading "#", validate empty and MAX_FRAGMENT_LENGTH, split key/value and verify PAYLOAD_FRAGMENT_KEY, locate dot positions and verify format, validate version === "v1" and codec ∈ codecs, and return a uniform result object containing {ok:boolean, errorResponse?:ParsedPayload, fragment:string, version, codec, encoded, fragmentLength}. Replace the duplicated blocks in decodeFragment and decodeFragmentAsync to call this helper, short-circuit on helper.errorResponse, and then continue with codec-specific decoding (including the arx async check) and JSON parsing; keep using existing constants (PAYLOAD_FRAGMENT_KEY, MAX_FRAGMENT_LENGTH, MAX_DECODED_PAYLOAD_LENGTH, codecs) and preserve existing error codes/messages so behavior/tests remain unchanged.
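The suggested helper could take a shape like this; the constants and error strings below are placeholders, since the module's real values and error codes must be preserved exactly.

```typescript
// Placeholder constants; the real module's values take precedence.
const PAYLOAD_FRAGMENT_KEY = "payload";
const MAX_FRAGMENT_LENGTH = 8000;
const codecs = ["plain", "lz", "deflate", "arx"] as const;
type Codec = (typeof codecs)[number];

interface ParsedHeader {
  ok: boolean;
  error?: string;   // stand-in for the module's errorResponse shape
  version?: string;
  codec?: Codec;
  encoded?: string;
}

function parseFragmentHeader(rawHash: string): ParsedHeader {
  const fragment = rawHash.startsWith("#") ? rawHash.slice(1) : rawHash;
  if (fragment.length === 0) return { ok: false, error: "empty-fragment" };
  if (fragment.length > MAX_FRAGMENT_LENGTH)
    return { ok: false, error: "fragment-too-long" };
  const eq = fragment.indexOf("=");
  if (eq < 0 || fragment.slice(0, eq) !== PAYLOAD_FRAGMENT_KEY)
    return { ok: false, error: "wrong-key" };
  // Format: v1.<codec>.<encoded>; the encoded part may itself contain dots.
  const [version, codec, ...rest] = fragment.slice(eq + 1).split(".");
  if (version !== "v1") return { ok: false, error: "bad-version" };
  if (!codecs.includes(codec as Codec)) return { ok: false, error: "bad-codec" };
  if (rest.length === 0) return { ok: false, error: "missing-payload" };
  return { ok: true, version, codec: codec as Codec, encoded: rest.join(".") };
}
```

Both `decodeFragment` and `decodeFragmentAsync` would call this once, short-circuit on the error result, and then diverge only for codec-specific decoding.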
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/payload/fragment.ts`:
- Around line 404-415: The code currently treats any payload starting with
"<digits>." as versioned and, when useVersionedPayload is true, only attempts
decodeArxEncodedPayload(versionedPayload), which can false-negative legacy
payloads that merely begin with a numeric prefix; change the decode attempt
ordering so we always try the full remainder (fallbackPayload) first and then
the stripped versionedPayload when parsedDictVersion !== null and
useVersionedPayload is true. Concretely: keep parsedDictVersion,
versionedPayload, fallbackPayload and useVersionedPayload as-is, but build
decodeAttempts so that if useVersionedPayload and parsedDictVersion !== null you
push decodeArxEncodedPayload(fallbackPayload) first then
decodeArxEncodedPayload(versionedPayload); otherwise preserve existing
fallback-only or fallback-then-versioned behavior; ensure resolveArxDictVersion
still gates whether the versioned form is attempted.
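The reordered attempt list might be built like this; `buildDecodeAttempts` is a hypothetical stand-alone version of the logic, with the decoder passed in, and the `resolveArxDictVersion` gate that decides whether the stripped form is attempted at all is elided for brevity.

```typescript
// When a payload starts with "<digits>.", try the full remainder first and
// the version-stripped form second, so a legacy payload that merely begins
// with digits is never false-negatived.
function buildDecodeAttempts(
  payload: string,
  decode: (p: string) => string | null
): Array<() => string | null> {
  const match = /^(\d+)\.(.*)$/s.exec(payload);
  const parsedDictVersion = match ? Number(match[1]) : null;
  const versionedPayload = match ? match[2] : payload; // version prefix stripped
  const fallbackPayload = payload;                     // full remainder
  const attempts: Array<() => string | null> = [];
  attempts.push(() => decode(fallbackPayload));        // always try full first
  if (parsedDictVersion !== null) {
    attempts.push(() => decode(versionedPayload));     // then the stripped form
  }
  return attempts;
}
```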
---
Nitpick comments:
In `@src/lib/payload/fragment.ts`:
- Around line 219-315: Duplicate fragment header parsing exists in
decodeFragment and decodeFragmentAsync; extract a single helper (e.g.,
parseFragmentHeader or parseFragmentMetadata) to centralize: take the raw hash
string, strip leading "#", validate empty and MAX_FRAGMENT_LENGTH, split
key/value and verify PAYLOAD_FRAGMENT_KEY, locate dot positions and verify
format, validate version === "v1" and codec ∈ codecs, and return a uniform
result object containing {ok:boolean, errorResponse?:ParsedPayload,
fragment:string, version, codec, encoded, fragmentLength}. Replace the
duplicated blocks in decodeFragment and decodeFragmentAsync to call this helper,
short-circuit on helper.errorResponse, and then continue with codec-specific
decoding (including the arx async check) and JSON parsing; keep using existing
constants (PAYLOAD_FRAGMENT_KEY, MAX_FRAGMENT_LENGTH,
MAX_DECODED_PAYLOAD_LENGTH, codecs) and preserve existing error codes/messages
so behavior/tests remain unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f92adb8f-b000-4504-815d-ef9ab3989194
📒 Files selected for processing (1)
src/lib/payload/fragment.ts
…r helper
- When useVersionedPayload && parsedDictVersion !== null, try fallbackPayload (full remainder) first before versionedPayload (stripped), preventing false negatives for legacy payloads whose content coincidentally starts with a numeric prefix.
- Extract parseFragmentHeader() to deduplicate the shared fragment header parsing logic (strip "#", validate length/key, locate dots, check version and codec) that was duplicated between decodeFragment and decodeFragmentAsync; both functions now call the helper and short-circuit on errorResponse.
https://claude.ai/code/session_01KPxKh7zY5iNbNqfR1KpqK8
Introduce a novel 'arx' compression codec optimized for fragment-based
artifact transport. The codec pipelines three stages — domain-dictionary
substitution, brotli compression (quality 11), and adaptive encoding —
to dramatically reduce fragment sizes compared to the existing deflate
codec.
Compression pipeline:
1. Domain-dictionary substitution: A shared dictionary
(public/arx-dictionary.json, 2857 bytes) maps 25 single-byte slots
(JSON envelope boilerplate, markdown syntax) and 147 extended slots
(programming keywords, React patterns, common English words) to
control-byte sequences, reducing redundancy before compression.
2. Brotli compression via brotli-wasm at quality 11, achieving ~20%
better ratios than zlib deflate on typical payloads.
3. Adaptive encoding with three tiers, auto-selecting the shortest:
- base76: 77 URL-fragment-safe ASCII characters (~6% denser than
base64url). Best for strict ASCII-only contexts.
- base1k: 1774 Unicode code points (U+00A1–U+07FF), ~10.79
bits/char. ~42% fewer characters than base76.
- baseBMP: ~62k safe BMP code points (U+00A1–U+FFEF), ~15.92
bits/char. ~70% fewer characters than deflate+base64url.
Benchmark results (8000-char payloads):
AGENTS.md (markdown): 8000 → 1307 chars (6.12x ratio, -69.5% vs deflate)
fragment.ts (code): 8000 → 946 chars (8.46x ratio, -77.8% vs deflate)
README.md (mixed): 3714 → 1014 chars (3.66x ratio, -55.0% vs deflate)
Shared dictionary:
- Served as a static endpoint at /arx-dictionary.json (+ .br variant
at 929 bytes) so agents can fetch it for local compression.
- The viewer loads the dictionary on startup; a built-in fallback
dictionary preserves backward compatibility if the fetch fails.
- scripts/compress-dictionary.mjs minifies and brotli-compresses the
dictionary file for deployment.
Protocol changes:
- Add 'arx' to the supported codecs union (plain | lz | deflate | arx)
support; sync encode/decode preserved for plain/lz/deflate
Other notes:
(brotli output is incompressible at depth 2+)
Files changed:
Co-authored-by: Junie <junie@jetbrains.com>
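As a sanity check, the bits-per-character figures quoted for the encoding tiers follow directly from alphabet size: each tier packs log2 of its usable code-point count into one output character.

```typescript
// Bits carried per output character for an alphabet of the given size.
const bitsPerChar = (alphabetSize: number): number => Math.log2(alphabetSize);

const base76Bits = bitsPerChar(77);     // ≈ 6.27 bits/char
const base1kBits = bitsPerChar(1774);   // ≈ 10.79 bits/char
const baseBMPBits = bitsPerChar(62000); // ≈ 15.92 bits/char
```

These match the ~10.79 and ~15.92 bits/char stated in the commit message for base1k and baseBMP.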