
feat: add arx codec with domain dictionary and multi-tier Unicode encoding#4

Merged
baanish merged 8 commits into main from experimental-compression
Mar 13, 2026
Conversation


baanish (Owner) commented Mar 12, 2026

Introduce a novel 'arx' compression codec optimized for fragment-based
artifact transport. The codec pipelines three stages — domain-dictionary
substitution, brotli compression (quality 11), and adaptive encoding —
to dramatically reduce fragment sizes compared to the existing deflate
codec.

Compression pipeline:

  1. Domain-dictionary substitution: A shared dictionary
    (public/arx-dictionary.json, 2857 bytes) maps 25 single-byte slots
    (JSON envelope boilerplate, markdown syntax) and 147 extended slots
    (programming keywords, React patterns, common English words) to
    control-byte sequences, reducing redundancy before compression.
  2. Brotli compression via brotli-wasm at quality 11, achieving ~20%
    better ratios than zlib deflate on typical payloads.
  3. Adaptive encoding with three tiers, auto-selecting the shortest:
    • base76: 77 URL-fragment-safe ASCII characters (~6% denser than
      base64url). Best for strict ASCII-only contexts.
    • base1k: 1774 Unicode code points (U+00A1–U+07FF), ~10.79
      bits/char. ~42% fewer characters than base76.
    • baseBMP: ~62k safe BMP code points (U+00A1–U+FFEF), ~15.92
      bits/char. ~70% fewer characters than deflate+base64url.
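These density figures can be sanity-checked: a base-N encoding carries log2(N) bits per character. A quick TypeScript check (bitsPerChar is an illustrative helper, not part of the codec):

```typescript
// Each character of a base-N encoding carries log2(N) bits of payload.
// bitsPerChar is an illustrative helper, not part of the arx codec API.
function bitsPerChar(alphabetSize: number): number {
  return Math.log2(alphabetSize);
}

const base76 = bitsPerChar(77);     // ~6.27 bits/char (77-symbol alphabet)
const base1k = bitsPerChar(1774);   // ~10.79 bits/char
const baseBMP = bitsPerChar(62000); // ~15.92 bits/char

// base1k needs roughly 42% fewer characters than base76 for the same bytes:
const savingsVsBase76 = 1 - base76 / base1k; // ~0.42
```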

Benchmark results (8000-char payloads):
AGENTS.md (markdown): 8000 → 1307 chars (6.12x ratio, -69.5% vs deflate)
fragment.ts (code): 8000 → 946 chars (8.46x ratio, -77.8% vs deflate)
README.md (mixed): 3714 → 1014 chars (3.66x ratio, -55.0% vs deflate)
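The ratios follow directly from the character counts (ratio is an illustrative helper for checking the arithmetic, not part of the benchmark script):

```typescript
// Compression ratio as reported above: input chars / output chars.
// ratio() is just an arithmetic check, not part of the codec.
function ratio(inputChars: number, outputChars: number): number {
  return inputChars / outputChars;
}

const agentsMd = ratio(8000, 1307);  // ~6.12x
const fragmentTs = ratio(8000, 946); // ~8.46x
const readmeMd = ratio(3714, 1014);  // ~3.66x
```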

Shared dictionary:

  • Served as a static endpoint at /arx-dictionary.json (+ .br variant
    at 929 bytes) so agents can fetch it for local compression.
  • The viewer loads the dictionary on startup; a built-in fallback
    dictionary preserves backward compatibility if the fetch fails.
  • scripts/compress-dictionary.mjs minifies and brotli-compresses the
    dictionary file for deployment.
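A minimal sketch of the load-with-fallback behavior described above; the names loadArxDictionary, isArxDictionary, and FALLBACK_DICTIONARY mirror identifiers mentioned elsewhere in this PR, but the bodies here are illustrative, not the shipped implementation:

```typescript
// Illustrative sketch only; FALLBACK_DICTIONARY and isArxDictionary stand
// in for the real module's built-in table and validator.
type ArxDictionary = { version: number; slots: Record<string, string> };

const FALLBACK_DICTIONARY: ArxDictionary = { version: 1, slots: {} };

function isArxDictionary(value: unknown): value is ArxDictionary {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as ArxDictionary).version === "number"
  );
}

async function loadArxDictionary(): Promise<ArxDictionary> {
  try {
    const res = await fetch("/arx-dictionary.json");
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    const data: unknown = await res.json();
    if (!isArxDictionary(data)) throw new Error("invalid dictionary shape");
    return data;
  } catch {
    // Built-in fallback keeps decoding working if the fetch fails.
    return FALLBACK_DICTIONARY;
  }
}
```

On the viewer side this is what preserves backward compatibility: decoding still works against the built-in table when the endpoint is unreachable.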

Protocol changes:

  • Add 'arx' to the supported codecs union (plain | lz | deflate | arx)
  • Add encodeEnvelopeAsync / decodeFragmentAsync for async brotli-wasm
    support; sync encode/decode preserved for plain/lz/deflate
  • baseBMP payloads use U+FFF0 marker prefix for auto-detection
  • base1k payloads detected by high-Unicode character presence
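Taken together, the two detection rules above can be sketched as a small classifier (detectArxTier is an illustrative name, not the codec's real API):

```typescript
// U+FFF0 prefix marks baseBMP; any code point above U+00A0 without the
// marker implies base1k; otherwise the payload is plain-ASCII base76.
// detectArxTier is an illustrative name, not the codec's real API.
const BMP_MARKER = "\uFFF0";

function detectArxTier(payload: string): "baseBMP" | "base1k" | "base76" {
  if (payload.startsWith(BMP_MARKER)) return "baseBMP";
  for (const ch of payload) {
    if (ch.codePointAt(0)! > 0xa0) return "base1k";
  }
  return "base76";
}
```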

Other notes:

  • Recursive compression tested and confirmed to provide no benefit
    (brotli output is incompressible at depth 2+)
  • baseBMP is the fastest ARX encoding variant (~9ms avg for 8k payload)
  • brotli-wasm added as a runtime dependency for browser-side compression

Files changed:

  • src/lib/payload/arx-codec.ts (new): full codec implementation
  • src/lib/payload/schema.ts: add 'arx' codec type
  • src/lib/payload/fragment.ts: async encode/decode, three-tier selection
  • src/lib/payload/link-creator.ts: use async encoding for best compression
  • src/components/viewer-shell.tsx: async decode + dictionary loading
  • src/components/home/link-creator.tsx: async encode integration
  • public/arx-dictionary.json (new): shared substitution dictionary
  • public/arx-dictionary.json.br (new): pre-compressed dictionary
  • scripts/compress-dictionary.mjs (new): dictionary compression script
  • scripts/codec-poc.mjs: updated benchmark with all encoding tiers
  • tests/arx-codec.test.ts (new): 24 tests covering all codec paths
  • vitest.config.ts: alias for Node-compatible brotli-wasm loading
  • AGENTS.md, README.md, docs/, skills/: documentation updates

Co-authored-by: Junie <junie@jetbrains.com>

Summary by CodeRabbit

  • New Features

    • ARX codec for smaller fragments (dictionary-driven substitution + Brotli + multi-encoding tiers).
    • Async link generation and async decode flow to support ARX.
    • Public ARX dictionary endpoint with a pre-compressed distribution and a script to generate it.
  • Documentation

    • Updated architecture and payload-format docs to describe ARX behavior, codec priorities, and fragment formats.
  • Tests

    • Added comprehensive ARX encoding/decoding and fragment round-trip tests.
  • Chores

    • Added brotli dependency and new ignore rule for /.junie.

baanish and others added 2 commits March 12, 2026 16:20
…oding

Ignore the top-level .junie directory by adding '/.junie' to .gitignore to prevent local/tool-specific files from being committed to the repository.

coderabbitai bot commented Mar 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a new ARX codec and async payload flows: dictionary-driven substitutions, Brotli compression, three base encodings (base76, base1k, baseBMP), runtime dictionary loading with fallback, async encode/decode APIs, tooling/tests/docs for ARX, plus a minor .gitignore entry.

Changes

  • ARX Codec Core (src/lib/payload/arx-codec.ts, src/lib/payload/schema.ts, tests/arx-codec.test.ts): New arx implementation with dictionary types/state, substitution encode/decode, base76/base1k/baseBMP encoders, lazy brotli integration, arxCompress/arxDecompress APIs, dictionary load helpers, and comprehensive tests.
  • Payload Async Integration (src/lib/payload/fragment.ts, src/lib/payload/link-creator.ts, src/lib/payload/..., src/components/home/link-creator.tsx, src/components/viewer-shell.tsx): Adds async encode/decode APIs (encodeEnvelopeAsync, decodeFragmentAsync, createGeneratedArtifactLinkAsync), gates ARX to async flows, updates components to load the dictionary and use async paths, and adds error/cancellation handling.
  • Dictionary & Assets (public/arx-dictionary.json, public/arx-dictionary.json.br): Adds the versioned ARX substitution dictionary and a pre-compressed Brotli artifact (served endpoint for local encoding).
  • Tooling & Scripts (scripts/compress-dictionary.mjs, scripts/codec-poc.mjs, package.json, vitest.config.ts): Adds a compressor script to Brotli-minify the dictionary, a POC benchmark rewritten for ARX variants, the brotli-wasm dependency, and a Vitest alias.
  • Docs & Guidance (AGENTS.md, README.md, docs/architecture.md, docs/payload-format.md, skills/agent-render-linking/SKILL.md): Documentation updates describing the ARX pipeline, fragment shape v1.arx.<dictVersion>.<payload>, encoding tiers, the dictionary endpoint, and codec selection order (async-first: arx -> deflate -> lz -> plain).
  • Tests & Config (tests/fragment.test.ts, vitest.config.ts): Adds a test ensuring sync ARX encode throws (guiding callers to the async API) and a test-config alias for the brotli module.
  • Miscellaneous (.gitignore): Added ignore rule for root path /.junie.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client
    participant Dict as ARX Dictionary Service
    participant Brotli as Brotli (wasm)
    participant ARX as ARX Encoder/Decoder
    participant BaseEnc as Base Encoder/Chooser

    Client->>ARX: submit JSON payload
    ARX->>Dict: fetch/verify dictionary (async)
    Dict-->>ARX: substitution table
    ARX->>ARX: apply substitutions -> bytes
    ARX->>Brotli: compress bytes
    Brotli-->>ARX: compressed bytes
    ARX->>BaseEnc: encode (baseBMP/base1k/base76)
    BaseEnc-->>ARX: encoded fragment
    ARX-->>Client: return encoded fragment

    Client->>ARX: provide encoded fragment for decode
    ARX->>BaseEnc: detect & decode encoding
    BaseEnc-->>ARX: compressed bytes
    ARX->>Brotli: decompress
    Brotli-->>ARX: substituted bytes
    ARX->>Dict: reverse substitutions
    Dict-->>ARX: original JSON
    ARX-->>Client: return JSON

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 I nibbled through tokens, dictionary bright,
Brotli hums low beneath each tiny bite,
BaseBMP, Base1k, Base76 in a row,
Substitutions tuck long strings nice and low,
Hop, compress, return — the JSON glows.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 24.44%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The PR title 'feat: add arx codec with domain dictionary and multi-tier Unicode encoding' accurately describes the main change: introducing a new ARX compression codec with dictionary-based substitution and multiple Unicode encoding tiers.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



kilo-code-bot bot commented Mar 12, 2026

Code Review Summary

Status: Issues Previously Flagged | Recommendation: Address existing comments before merge

Overview

  • CRITICAL: 1
  • WARNING: 2
  • SUGGESTION: 0

The PR introduces the new arx codec with substantial compression improvements (~70% smaller fragments than deflate). The existing inline comments already identify the key issues that should be addressed.

Key Observations

  1. Async/Sync Separation - The code properly separates async (arx) and sync (plain/lz/deflate) codecs with distinct entry points. The sync path throws clear errors when arx is explicitly requested.

  2. Dictionary Loading - The arx dictionary is loaded from /arx-dictionary.json with proper validation via isArxDictionary(). Falls back to built-in dictionary on failure.

  3. Size Limits - Both fragment size and decoded payload size limits are properly enforced before UI rendering.

  4. Codec Priority - The async codec selection now defaults to arx -> deflate -> lz -> plain which should produce the smallest fragments.

Existing Issues to Address

The following issues have been previously flagged in inline comments:

  • src/lib/payload/fragment.ts, line 396: CRITICAL (open)
  • src/lib/payload/fragment.ts, line 376: WARNING (open)
  • src/lib/payload/arx-codec.ts, line 238: WARNING (open)

Files Reviewed (12 files)

  • .gitignore - Added local dev file
  • AGENTS.md - Documentation update
  • README.md - Documentation update
  • docs/architecture.md - Documentation update
  • docs/payload-format.md - Documentation update
  • package-lock.json - Added brotli-wasm dependency
  • src/lib/payload/arx-codec.ts - New codec implementation
  • src/lib/payload/fragment.ts - Async encoding/decoding
  • src/lib/payload/link-creator.ts - Async link creation
  • src/lib/payload/schema.ts - Added arx to codecs
  • tests/arx-codec.test.ts - New test file
  • tests/fragment.test.ts - Added arx test
  • vitest.config.ts - Test configuration

Note: Security analysis shows proper input validation, size limits, and safe encoding practices. No new security vulnerabilities identified.


chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c44740ca79


Comment on lines +361 to +362
if (codec === "arx") {
decodedJson = await arxDecompress(encoded);


P0: Decode percent-escaped arx payload before decompressing

Browsers percent-encode non-ASCII fragment characters (for example, setting #...é becomes #...%C3%A9 in location.hash), so base1k/baseBMP arx links arrive here as % sequences rather than raw Unicode code points. Passing encoded directly to arxDecompress causes the decoder to take the wrong path and fail as invalid-json, which breaks copied/opened arx links in normal browser navigation. Normalize the payload (e.g., decodeURIComponent) before arx decoding.
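A minimal sketch of the suggested normalization; normalizeFragmentPayload is a hypothetical helper, and whether a bare decodeURIComponent is always safe depends on the exact base76 alphabet in use:

```typescript
// Browsers percent-encode non-ASCII characters in location.hash, so an
// arx payload containing "é" may arrive as "%C3%A9". Normalize before
// decoding. normalizeFragmentPayload is a hypothetical helper; the real
// parser may need extra care if the base76 alphabet ever includes "%".
function normalizeFragmentPayload(encoded: string): string {
  if (!encoded.includes("%")) return encoded;
  try {
    return decodeURIComponent(encoded);
  } catch {
    return encoded; // malformed escape sequence: leave the payload untouched
  }
}
```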


arxCompressBMP(json),
]);
return [
{ value: `${PAYLOAD_FRAGMENT_KEY}=v1.arx.${ascii}`, codec: "arx", packed },


P1: Include dictionary version in arx fragment format

This wire format stores arx data as v1.arx.<payload> with no dictionary identifier, but the active substitution table is mutable at runtime via external dictionary loading. When encoder and decoder use different dictionaries (fetch failure, version changes, or fallback vs loaded table), the same payload decodes with wrong substitutions and cannot be parsed. Make the fragment self-describing with a dict version (or hard-pin the same table everywhere) to keep arx links portable.


Comment on lines +236 to +238
const lenHigh = Math.floor(bytes.length / ALPHABET.length);
const lenLow = bytes.length % ALPHABET.length;
return ALPHABET[lenHigh] + ALPHABET[lenLow] + chars.join("");


P2: Enforce base76 length-prefix bounds

The base76 encoder writes byte length into exactly two base-77 characters, which only represents up to 77^2 - 1 bytes. For 5929+ byte inputs, ALPHABET[lenHigh] is out of range and serializes as undefined, producing corrupt output that decodeBase76 reconstructs with the wrong length. This silently breaks larger arxCompress ASCII outputs; add a bound check or a variable-length header.
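The bound is easy to verify: two base-77 digits represent at most 77^2 - 1 = 5928 bytes. A hedged sketch of the guard (encodeLengthPrefix is illustrative; the real encoder might instead adopt a variable-length header):

```typescript
// Two base-77 digits encode byte lengths 0..77^2-1, i.e. 0..5928.
// encodeLengthPrefix and MAX_BASE76_LEN are illustrative names; the real
// fix could instead be a variable-length header.
const ALPHABET_SIZE = 77;
const MAX_BASE76_LEN = ALPHABET_SIZE * ALPHABET_SIZE - 1; // 5928

function encodeLengthPrefix(byteLength: number): [number, number] {
  if (byteLength > MAX_BASE76_LEN) {
    throw new RangeError(
      `base76 payload of ${byteLength} bytes exceeds the ` +
        `${MAX_BASE76_LEN}-byte two-digit limit`,
    );
  }
  return [Math.floor(byteLength / ALPHABET_SIZE), byteLength % ALPHABET_SIZE];
}
```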



coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (8)
.gitignore (1)

142-142: Add a clarifying comment explaining what .junie is.

No references to .junie exist elsewhere in the codebase or documentation, making its purpose unclear. If .junie is a legitimate team-wide artifact or tool, add a brief inline comment explaining what creates it and why it should be in the shared .gitignore.

Suggested improvement
 .cursor/*
+
+# Junie: [description of tool/artifact]
 /.junie
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gitignore at line 142, Add a short inline comment above the "/.junie" entry
in .gitignore explaining what ".junie" is, who or which tool creates it, and why
it should be ignored (e.g., "temporary IDE/workspace files created by JUNIE
tool, shared-team cache, etc."), so future contributors understand its purpose;
update the .gitignore by inserting that explanatory comment immediately before
the "/.junie" line.
src/lib/payload/link-creator.ts (1)

140-171: Extract the shared link-finalization flow before sync/async drift.

This is almost a line-for-line copy of createGeneratedArtifactLink; only the encoder call changes. Keeping the fragment-length check and baseUrl assembly in one helper will make future fixes much less likely to land in only one path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/payload/link-creator.ts` around lines 140 - 171,
createGeneratedArtifactLinkAsync duplicates the post-encoding "finalize link"
logic present in the synchronous createGeneratedArtifactLink; extract that
shared flow into a helper (e.g. finalizeGeneratedArtifactLink or
buildGeneratedArtifactLink) that accepts the validated envelope, computed hash
string, and optional baseUrl and performs fragmentLength computation
(hash.length - 1), MAX_FRAGMENT_LENGTH check, baseUrl URL/hash assembly, and
returns the GeneratedArtifactLink shape (envelope, artifact, hash, url,
fragmentLength); then have createGeneratedArtifactLinkAsync call
encodeEnvelopeAsync(normalized.envelope) to build the hash and pass the
envelope+hash to the new helper, and have createGeneratedArtifactLink use the
same helper after its encodeEnvelope call so both paths share identical
finalization behavior.
tests/arx-codec.test.ts (1)

35-40: Consider seeding random data for reproducibility.

Using Math.random() for test data generation can make test failures harder to reproduce. However, for validating URL-fragment-safe character output, this is an acceptable tradeoff since the test verifies a property that should hold for all inputs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/arx-codec.test.ts` around lines 35 - 40, The test uses Math.random() to
generate bytes which makes failures non-deterministic; update the test in
tests/arx-codec.test.ts (the it block that calls encodeBase76) to seed the
random data for reproducibility by replacing the Math.random() loop with a
deterministic pseudo-random generator or a fixed seed sequence (e.g., a simple
xorshift or a predefined byte array) so the generated Uint8Array is repeatable
across runs while still exercising encodeBase76; ensure the replacement produces
100 bytes and preserves the same assertion that the encoded string matches the
URL-fragment-safe regex.
src/lib/payload/fragment.ts (2)

319-381: Significant code duplication with decodeFragment.

decodeFragmentAsync duplicates most of the validation logic from decodeFragment (lines 194-290). Consider extracting the common validation into a shared helper to reduce maintenance burden and ensure consistency.

♻️ Suggested approach

Extract the common fragment parsing and validation into a helper:

type FragmentParseResult = 
  | { ok: false; error: ParsedPayload }
  | { ok: true; codec: PayloadCodec; encoded: string; fragmentLength: number };

function parseFragmentHeader(hash: string): FragmentParseResult {
  const fragment = hash.startsWith("#") ? hash.slice(1) : hash;
  // ... common validation logic ...
  return { ok: true, codec, encoded, fragmentLength: fragment.length };
}

Then both decodeFragment and decodeFragmentAsync can use this shared helper.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/payload/fragment.ts` around lines 319 - 381, decodeFragmentAsync
duplicates the validation/parsing logic already present in decodeFragment;
extract the shared header parsing into a helper (e.g., parseFragmentHeader) that
takes the raw hash and returns either an error ParsedPayload or a success object
containing codec: PayloadCodec, encoded: string, and fragmentLength: number;
update decodeFragment and decodeFragmentAsync to call parseFragmentHeader,
short-circuit on the error result, and then proceed with codec-specific decoding
(arxDecompress or decodePayload) and JSON parsing in their respective functions
(preserve resolveEnvelope call in decodeFragmentAsync with fragmentLength).

132-145: Consider adding error handling for individual encoding failures.

Promise.all will reject immediately if any of the three compression functions fails (e.g., brotli-wasm load failure). This causes the entire candidate generation to fail even though other candidates (deflate, lz, plain) might succeed. Consider wrapping with Promise.allSettled or individual try-catch to gracefully handle partial failures.

♻️ Proposed fix using Promise.allSettled
 async function buildArxCandidates(envelope: PayloadEnvelope, packed: boolean): Promise<CandidateFragment[]> {
   const payloadEnvelope = { ...envelope, codec: "arx" as PayloadCodec };
   const json = JSON.stringify(packed ? packEnvelope(payloadEnvelope) : payloadEnvelope);
-  const [ascii, unicode, bmp] = await Promise.all([
+  const results = await Promise.allSettled([
     arxCompress(json),
     arxCompressUnicode(json),
     arxCompressBMP(json),
   ]);
-  return [
-    { value: `${PAYLOAD_FRAGMENT_KEY}=v1.arx.${ascii}`, codec: "arx", packed },
-    { value: `${PAYLOAD_FRAGMENT_KEY}=v1.arx.${unicode}`, codec: "arx", packed },
-    { value: `${PAYLOAD_FRAGMENT_KEY}=v1.arx.${bmp}`, codec: "arx", packed },
-  ];
+  const candidates: CandidateFragment[] = [];
+  for (const result of results) {
+    if (result.status === "fulfilled") {
+      candidates.push({ value: `${PAYLOAD_FRAGMENT_KEY}=v1.arx.${result.value}`, codec: "arx", packed });
+    }
+  }
+  return candidates;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/payload/fragment.ts` around lines 132 - 145, The buildArxCandidates
function currently uses Promise.all for arxCompress, arxCompressUnicode, and
arxCompressBMP so a single failure aborts all candidates; change it to use
Promise.allSettled (or individual try/catch) to collect successes and skip
failures so other payload candidates (e.g., deflate, lz, plain) can proceed.
Specifically, update buildArxCandidates to call Promise.allSettled on
[arxCompress(json), arxCompressUnicode(json), arxCompressBMP(json)], map only
the fulfilled results to CandidateFragment entries using PAYLOAD_FRAGMENT_KEY
and the same v1.arx.{value} naming (preserve codec="arx" and packed), and handle
the case where all three fail by returning an empty array (or otherwise
preserving existing behavior).
src/lib/payload/arx-codec.ts (3)

307-325: Silent data corruption on invalid Base1k input.

In decodeBase1k, unknown characters default to index 0 via the ?? 0 fallback (lines 310, 311, 316). This silently produces incorrect output rather than signaling an error. For a codec used in payload transport, consider throwing on invalid input to fail fast.

💡 Optional: throw on unknown characters
 export function decodeBase1k(str: string): Uint8Array {
   if (str.length < 2) return new Uint8Array(0);

-  const lenHigh = UNICODE_CHAR_TO_INDEX.get(str[0]) ?? 0;
-  const lenLow = UNICODE_CHAR_TO_INDEX.get(str[1]) ?? 0;
+  const lenHigh = UNICODE_CHAR_TO_INDEX.get(str[0]);
+  const lenLow = UNICODE_CHAR_TO_INDEX.get(str[1]);
+  if (lenHigh === undefined || lenLow === undefined) {
+    throw new Error("Invalid base1k length prefix");
+  }
   const byteLen = lenHigh * UNICODE_ALPHABET.length + lenLow;

   let num = BIGINT_0;
   for (let i = 2; i < str.length; i++) {
-    num = num * UBASE + BigInt(UNICODE_CHAR_TO_INDEX.get(str[i]) ?? 0);
+    const idx = UNICODE_CHAR_TO_INDEX.get(str[i]);
+    if (idx === undefined) {
+      throw new Error(`Invalid base1k character at position ${i}`);
+    }
+    num = num * UBASE + BigInt(idx);
   }

The same consideration applies to decodeBase76 (line 250) and decodeBaseBMP (lines 547-553).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/payload/arx-codec.ts` around lines 307 - 325, In decodeBase1k (and
likewise decodeBase76 and decodeBaseBMP) do not silently map unknown characters
via the "?? 0" fallback; instead validate lookups from UNICODE_CHAR_TO_INDEX for
lenHigh, lenLow and each loop character (use explicit undefined checks on
UNICODE_CHAR_TO_INDEX.get(str[i])) and throw a descriptive Error that includes
the invalid character and its index when a lookup fails; ensure the length
calculation (lenHigh/lenLow -> byteLen) only proceeds after validation so
invalid inputs fail fast rather than producing corrupted output.

200-211: Minor documentation inaccuracy: alphabet has 77 characters, not 76.

The comments reference "Base76" and "77 total" but the actual encoding uses ALPHABET.length which is 77. The naming is slightly inconsistent with the implementation.

📝 Consider renaming or clarifying

Either rename to base77 throughout, or update the comment at line 203 to clarify:

-// Alphabet: A-Za-z0-9 plus 15 fragment-safe punctuation chars (77 total).
+// Alphabet: A-Za-z0-9 plus 15 fragment-safe punctuation chars (77 total, hence "base77").
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/payload/arx-codec.ts` around lines 200 - 211, The header comment
incorrectly labels the encoding "Base76" while ALPHABET and BASE use 77
characters; update the comment to consistently refer to "Base77" (or otherwise
state "77 total characters") and adjust the descriptive lines around the
alphabet and packing density to match ALPHABET and BASE so the documentation
matches the implementation (refer to ALPHABET and BASE in arx-codec.ts).

578-585: Prevent redundant concurrent brotli-wasm imports with promise-based pattern.

The BrotliModule type correctly matches the brotli-wasm API. The double cast is a necessary workaround since brotli-wasm's default export is a Promise. However, concurrent calls to getBrotli() before the first completes will trigger redundant dynamic imports. While the ES module system caches these, storing the import promise avoids wasted concurrent operations:

Suggested refactoring
 let brotliModule: BrotliModule | null = null;
+let brotliPromise: Promise<BrotliModule> | null = null;

 async function getBrotli(): Promise<BrotliModule> {
   if (brotliModule) return brotliModule;
-  const mod = await import("brotli-wasm");
-  brotliModule = (await mod.default) as unknown as BrotliModule;
-  return brotliModule;
+  if (!brotliPromise) {
+    brotliPromise = import("brotli-wasm").then(async (mod) => {
+      brotliModule = (await mod.default) as unknown as BrotliModule;
+      return brotliModule;
+    });
+  }
+  return brotliPromise;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/payload/arx-codec.ts` around lines 578 - 585, Concurrent calls to
getBrotli() can trigger redundant dynamic imports; introduce a
brotliModulePromise variable in addition to brotliModule and use it to
deduplicate in-flight imports: in getBrotli(), if brotliModule exists return it;
if brotliModulePromise exists await and return its resolved value; otherwise set
brotliModulePromise = import("brotli-wasm").then(m => (m.default as unknown) as
BrotliModule), await it, assign the result to brotliModule, clear
brotliModulePromise, and return brotliModule (reference symbols: brotliModule,
brotliModulePromise, getBrotli, BrotliModule).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/codec-poc.mjs`:
- Around line 7-209: The benchmark in scripts/codec-poc.mjs is testing a
standalone reimplementation of the codec (functions like encBase76, encBase1k,
encBaseBMP, dictEncode, brotli) against plain file text instead of exercising
the real fragment path used in production (encodeEnvelopeAsync applied to a
normalized PayloadEnvelope), so replace the POC harness to call the shipped
encoder path: construct a real PayloadEnvelope like the app does, invoke
encodeEnvelopeAsync (or the exported encode function the app uses) to produce
fragments, and run your size/time measurements on those generated fragments
rather than on the parallel implementations and raw file bytes; keep the
reimplementation only for cross-checking if needed but ensure README/PR
benchmarks use encodeEnvelopeAsync and the same envelope packing logic as
production.

In `@scripts/compress-dictionary.mjs`:
- Around line 21-27: The current flow writes the minified JSON file before
brotli.compress runs, risking out-of-sync artifacts if compression fails; change
the sequence in the script so you first compute minified (const minified =
JSON.stringify(src)), run brotli.compress(Buffer.from(minified, "utf8"), {
quality: 11 }) and only after the compress call succeeds write both outputs with
writeFileSync("public/arx-dictionary.json.br", compressed) and
writeFileSync("public/arx-dictionary.json", minified, "utf8"); wrap the compress
call in a try/catch so you never write the JSON if compression throws and ensure
you reference the same local variables (minified, compressed) and writeFileSync
calls as shown.

In `@src/components/viewer-shell.tsx`:
- Around line 311-313: The call to encodeEnvelopeAsync(...) lacks a .catch()
handler so rejections from encodeEnvelopeAsync (which can happen when
buildCandidatesAsync or brotli-wasm loading fails) cause unhandled promise
rejections; add a .catch(...) on the promise returned by encodeEnvelopeAsync in
the component where setFragmentHash(...) is used (reference encodeEnvelopeAsync
and setFragmentHash) to handle errors: log the error (or surface it to the
UI/state), and ensure you clear or leave fragment state in a consistent way on
failure (e.g., do not call setFragmentHash on error and optionally set an error
flag/state for the UI).
- Around line 248-252: The effect unconditionally sets dictReady true even when
loadArxDictionary() returns -1 and also can re-run in StrictMode; change the
useEffect to call loadArxDictionary(), inspect its return value and only call
setDictReady(true) on success, otherwise set an error state (e.g., setDictError
or setDictReady(false)) and log the failure including the returned error code;
additionally guard against double fetches in React StrictMode by using a ref
flag (e.g., hasLoadedRef) or an AbortController so loadArxDictionary() runs only
once per mount; reference loadArxDictionary, dictReady, setDictReady, and the
useEffect wrapper when making the changes.

In `@src/lib/payload/arx-codec.ts`:
- Around line 564-567: The doc comment on isBaseBMPEncoded incorrectly states
the marker is U+FFEE while the actual constant BMP_MARKER is "\uFFF0"; update
the comment to reference U+FFF0 (or phrase it to read from BMP_MARKER) so the
documented marker matches the BMP_MARKER constant used by isBaseBMPEncoded.
- Around line 127-149: The function loadArxDictionary currently skips schema
validation when source is an object; adjust it so that when source is an object
you call isArxDictionary(source) and handle failure exactly like the fetched
path (return -1) before assigning dict; only pass the validated dict to
buildSubstitutions and set activeDictVersion/ dictionaryLoaded; keep the
existing error handling behavior on exceptions.

---

Nitpick comments:
In @.gitignore:
- Line 142: Add a short inline comment above the "/.junie" entry in .gitignore
explaining what ".junie" is, who or which tool creates it, and why it should be
ignored (e.g., "temporary IDE/workspace files created by JUNIE tool, shared-team
cache, etc."), so future contributors understand its purpose; update the
.gitignore by inserting that explanatory comment immediately before the
"/.junie" line.

In `@src/lib/payload/arx-codec.ts`:
- Around line 307-325: In decodeBase1k (and likewise decodeBase76 and
decodeBaseBMP) do not silently map unknown characters via the "?? 0" fallback;
instead validate lookups from UNICODE_CHAR_TO_INDEX for lenHigh, lenLow and each
loop character (use explicit undefined checks on
UNICODE_CHAR_TO_INDEX.get(str[i])) and throw a descriptive Error that includes
the invalid character and its index when a lookup fails; ensure the length
calculation (lenHigh/lenLow -> byteLen) only proceeds after validation so
invalid inputs fail fast rather than producing corrupted output.
- Around line 200-211: The header comment incorrectly labels the encoding
"Base76" while ALPHABET and BASE use 77 characters; update the comment to
consistently refer to "Base77" (or otherwise state "77 total characters") and
adjust the descriptive lines around the alphabet and packing density to match
ALPHABET and BASE so the documentation matches the implementation (refer to
ALPHABET and BASE in arx-codec.ts).
- Around line 578-585: Concurrent calls to getBrotli() can trigger redundant
dynamic imports; introduce a brotliModulePromise variable in addition to
brotliModule and use it to deduplicate in-flight imports: in getBrotli(), if
brotliModule exists return it; if brotliModulePromise exists await and return
its resolved value; otherwise set brotliModulePromise =
import("brotli-wasm").then(m => (m.default as unknown) as BrotliModule), await
it, assign the result to brotliModule, clear brotliModulePromise, and return
brotliModule (reference symbols: brotliModule, brotliModulePromise, getBrotli,
BrotliModule).
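The fail-fast lookup the first nitpick describes can be sketched like this, using a tiny stand-in alphabet instead of the real `UNICODE_CHAR_TO_INDEX` table from arx-codec.ts:

```typescript
// Hypothetical stand-in alphabet; the real table is UNICODE_CHAR_TO_INDEX.
const ALPHABET = ["α", "β", "γ", "δ"];
const CHAR_TO_INDEX = new Map<string, number>();
ALPHABET.forEach((ch, i) => CHAR_TO_INDEX.set(ch, i));

// Validate the lookup instead of silently falling back via "?? 0".
function indexOrThrow(str: string, i: number): number {
  const idx = CHAR_TO_INDEX.get(str[i]);
  if (idx === undefined) {
    throw new Error(`Invalid encoded character "${str[i]}" at index ${i}`);
  }
  return idx;
}

// The first two characters carry the byte length (high and low digits); the
// length computation proceeds only after both lookups have been validated.
function decodeLength(str: string): number {
  const lenHigh = indexOrThrow(str, 0);
  const lenLow = indexOrThrow(str, 1);
  return lenHigh * ALPHABET.length + lenLow;
}
```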

In `@src/lib/payload/fragment.ts`:
- Around line 319-381: decodeFragmentAsync duplicates the validation/parsing
logic already present in decodeFragment; extract the shared header parsing into
a helper (e.g., parseFragmentHeader) that takes the raw hash and returns either
an error ParsedPayload or a success object containing codec: PayloadCodec,
encoded: string, and fragmentLength: number; update decodeFragment and
decodeFragmentAsync to call parseFragmentHeader, short-circuit on the error
result, and then proceed with codec-specific decoding (arxDecompress or
decodePayload) and JSON parsing in their respective functions (preserve
resolveEnvelope call in decodeFragmentAsync with fragmentLength).
- Around line 132-145: The buildArxCandidates function currently uses
Promise.all for arxCompress, arxCompressUnicode, and arxCompressBMP so a single
failure aborts all candidates; change it to use Promise.allSettled (or
individual try/catch) to collect successes and skip failures so other payload
candidates (e.g., deflate, lz, plain) can proceed. Specifically, update
buildArxCandidates to call Promise.allSettled on [arxCompress(json),
arxCompressUnicode(json), arxCompressBMP(json)], map only the fulfilled results
to CandidateFragment entries using PAYLOAD_FRAGMENT_KEY and the same
v1.arx.{value} naming (preserve codec="arx" and packed), and handle the case
where all three fail by returning an empty array (or otherwise preserving
existing behavior).
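The `Promise.allSettled` collection described above can be sketched as follows. `CandidateFragment` and the `v1.arx.` naming follow fragment.ts, but the compressor functions here are simplified stand-ins for `arxCompress` / `arxCompressUnicode` / `arxCompressBMP`.

```typescript
// Sketch: collect codec candidates so one failing compressor is skipped
// rather than aborting the whole batch.
interface CandidateFragment {
  codec: string;
  value: string;
}

async function buildArxCandidates(
  json: string,
  compressors: Array<(input: string) => Promise<string>>,
): Promise<CandidateFragment[]> {
  const settled = await Promise.allSettled(compressors.map((fn) => fn(json)));
  // Keep only fulfilled results: a single failure no longer rejects the
  // batch, and if every compressor fails this simply returns an empty array.
  return settled
    .filter((r): r is PromiseFulfilledResult<string> => r.status === "fulfilled")
    .map((r) => ({ codec: "arx", value: `v1.arx.${r.value}` }));
}
```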

In `@src/lib/payload/link-creator.ts`:
- Around line 140-171: createGeneratedArtifactLinkAsync duplicates the
post-encoding "finalize link" logic present in the synchronous
createGeneratedArtifactLink; extract that shared flow into a helper (e.g.
finalizeGeneratedArtifactLink or buildGeneratedArtifactLink) that accepts the
validated envelope, computed hash string, and optional baseUrl and performs
fragmentLength computation (hash.length - 1), MAX_FRAGMENT_LENGTH check, baseUrl
URL/hash assembly, and returns the GeneratedArtifactLink shape (envelope,
artifact, hash, url, fragmentLength); then have createGeneratedArtifactLinkAsync
call encodeEnvelopeAsync(normalized.envelope) to build the hash and pass the
envelope+hash to the new helper, and have createGeneratedArtifactLink use the
same helper after its encodeEnvelope call so both paths share identical
finalization behavior.
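A hypothetical shape for the shared finalizer, under the assumption (per the comment's `hash.length - 1`) that `hash` includes the leading `#`; the real `MAX_FRAGMENT_LENGTH`, envelope, and artifact types live in link-creator.ts and are reduced to stand-ins here.

```typescript
// Hypothetical helper; the name mirrors the review comment, the limit is a stand-in.
const MAX_FRAGMENT_LENGTH = 32000;

interface GeneratedArtifactLink {
  hash: string;
  url: string | null;
  fragmentLength: number;
}

function finalizeGeneratedArtifactLink(hash: string, baseUrl?: string): GeneratedArtifactLink {
  const fragmentLength = hash.length - 1; // hash includes the leading "#"
  if (fragmentLength > MAX_FRAGMENT_LENGTH) {
    throw new Error(`Fragment too long: ${fragmentLength} > ${MAX_FRAGMENT_LENGTH}`);
  }
  let url: string | null = null;
  if (baseUrl) {
    const parsed = new URL(baseUrl);
    parsed.hash = hash; // URL assembly shared by sync and async paths
    url = parsed.toString();
  }
  return { hash, url, fragmentLength };
}
```

Both `createGeneratedArtifactLink` and its async variant would then differ only in how they produce `hash`.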

In `@tests/arx-codec.test.ts`:
- Around line 35-40: The test uses Math.random() to generate bytes which makes
failures non-deterministic; update the test in tests/arx-codec.test.ts (the it
block that calls encodeBase76) to seed the random data for reproducibility by
replacing the Math.random() loop with a deterministic pseudo-random generator or
a fixed seed sequence (e.g., a simple xorshift or a predefined byte array) so
the generated Uint8Array is repeatable across runs while still exercising
encodeBase76; ensure the replacement produces 100 bytes and preserves the same
assertion that the encoded string matches the URL-fragment-safe regex.
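One way to seed the test data, per the suggestion above: a plain xorshift32 sketch (the seed value is arbitrary, and the real test would feed the bytes into `encodeBase76`).

```typescript
// Deterministic replacement for the Math.random() loop in the test:
// a 32-bit xorshift PRNG with a fixed seed yields the same bytes every run.
function xorshift32(seed: number): () => number {
  let state = seed >>> 0 || 1; // avoid the all-zero fixed point
  return () => {
    state ^= state << 13;
    state ^= state >>> 17;
    state ^= state << 5;
    state >>>= 0;
    return state;
  };
}

function deterministicBytes(length: number, seed = 0xc0ffee): Uint8Array {
  const next = xorshift32(seed);
  const bytes = new Uint8Array(length);
  for (let i = 0; i < length; i++) bytes[i] = next() & 0xff; // low byte of each state
  return bytes;
}
```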

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c6d47df5-65ae-49a3-9637-0bca763eced6

📥 Commits

Reviewing files that changed from the base of the PR and between a086df1 and c44740c.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (19)
  • .gitignore
  • AGENTS.md
  • README.md
  • docs/architecture.md
  • docs/payload-format.md
  • package.json
  • public/arx-dictionary.json
  • public/arx-dictionary.json.br
  • scripts/codec-poc.mjs
  • scripts/compress-dictionary.mjs
  • skills/agent-render-linking/SKILL.md
  • src/components/home/link-creator.tsx
  • src/components/viewer-shell.tsx
  • src/lib/payload/arx-codec.ts
  • src/lib/payload/fragment.ts
  • src/lib/payload/link-creator.ts
  • src/lib/payload/schema.ts
  • tests/arx-codec.test.ts
  • vitest.config.ts

Comment on lines +7 to 209
// --- Encoding helpers (mirrors arx-codec.ts) ---
const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~!$*()',;:@/";
const BASE = BigInt(ALPHABET.length);
const B0 = 0n, B8 = 8n;

function decodePayload(encoded, codec) {
  switch (codec) {
    case "plain":
      return strFromU8(fromBase64UrlBytes(encoded));
    case "lz":
      return decompressFromEncodedURIComponent(encoded);
    case "deflate":
      return strFromU8(inflateSync(fromBase64UrlBytes(encoded)));
    default:
      throw new Error(`Unsupported codec: ${codec}`);
  }
}
function encBase76(bytes) {
  if (!bytes.length) return "";
  let num = B0;
  for (const b of bytes) num = (num << B8) | BigInt(b);
  const c = [];
  while (num > B0) {
    c.push(ALPHABET[Number(num % BASE)]);
    num /= BASE;
  }
  c.reverse();
  return ALPHABET[Math.floor(bytes.length / ALPHABET.length)] + ALPHABET[bytes.length % ALPHABET.length] + c.join("");
}

function packEnvelope(envelope) {
  return {
    p: 1,
    v: envelope.v,
    c: envelope.codec,
    t: envelope.title,
    a: envelope.activeArtifactId,
    r: envelope.artifacts.map((artifact) => {
      if (artifact.kind === "code") {
        return {
          i: artifact.id,
          k: artifact.kind,
          t: artifact.title,
          f: artifact.filename,
          c: artifact.content,
          l: artifact.language,
        };
      }

      if (artifact.kind === "diff") {
        return {
          i: artifact.id,
          k: artifact.kind,
          t: artifact.title,
          f: artifact.filename,
          p: artifact.patch,
          o: artifact.oldContent,
          n: artifact.newContent,
          l: artifact.language,
          w: artifact.view,
        };
      }

      return {
        i: artifact.id,
        k: artifact.kind,
        t: artifact.title,
        f: artifact.filename,
        c: artifact.content,
      };
    }),
  };
}
const UA = [];
for (let cp = 0xA1; cp <= 0x7FF; cp++) {
  if (cp === 0xAD) continue;
  if (cp >= 0x300 && cp <= 0x36F) continue;
  UA.push(String.fromCodePoint(cp));
}
const UB = BigInt(UA.length);

function encBase1k(bytes) {
  if (!bytes.length) return "";
  let num = B0;
  for (const b of bytes) num = (num << B8) | BigInt(b);
  const c = [];
  while (num > B0) {
    c.push(UA[Number(num % UB)]);
    num /= UB;
  }
  c.reverse();
  return UA[Math.floor(bytes.length / UA.length)] + UA[bytes.length % UA.length] + c.join("");
}

function unpackEnvelope(value) {
  if (!value || value.p !== 1 || !Array.isArray(value.r)) {
    return value;
  }

  return {
    v: value.v,
    codec: value.c,
    title: value.t,
    activeArtifactId: value.a,
    artifacts: value.r.map((artifact) => {
      if (artifact.k === "code") {
        return {
          id: artifact.i,
          kind: "code",
          title: artifact.t,
          filename: artifact.f,
          content: artifact.c,
          language: artifact.l,
        };
      }

      if (artifact.k === "diff") {
        return {
          id: artifact.i,
          kind: "diff",
          title: artifact.t,
          filename: artifact.f,
          patch: artifact.p,
          oldContent: artifact.o,
          newContent: artifact.n,
          language: artifact.l,
          view: artifact.w,
        };
      }

      return {
        id: artifact.i,
        kind: artifact.k,
        title: artifact.t,
        filename: artifact.f,
        content: artifact.c,
      };
    }),
  };
}
// baseBMP alphabet (simplified — same exclusions as arx-codec.ts)
const BMP = [];
for (let cp = 0xA1; cp <= 0xFFEF; cp++) {
if (cp >= 0xD800 && cp <= 0xDFFF) continue;
if (cp === 0xAD) continue;
if (cp >= 0x300 && cp <= 0x36F) continue;
if (cp >= 0x483 && cp <= 0x489) continue;
if (cp >= 0x591 && cp <= 0x5BD) continue;
if (cp === 0x5BF || (cp >= 0x5C1 && cp <= 0x5C2)) continue;
if (cp >= 0x5C4 && cp <= 0x5C5) continue;
if (cp === 0x5C7) continue;
if (cp >= 0x610 && cp <= 0x61A) continue;
if (cp >= 0x64B && cp <= 0x65F) continue;
if (cp === 0x670) continue;
if (cp >= 0x6D6 && cp <= 0x6DC) continue;
if (cp >= 0x6DF && cp <= 0x6E4) continue;
if (cp >= 0x6E7 && cp <= 0x6E8) continue;
if (cp >= 0x6EA && cp <= 0x6ED) continue;
if (cp === 0x711) continue;
if (cp >= 0x730 && cp <= 0x74A) continue;
if (cp >= 0x7A6 && cp <= 0x7B0) continue;
if (cp >= 0x7EB && cp <= 0x7F3) continue;
if (cp === 0x7FD) continue;
if (cp >= 0x816 && cp <= 0x819) continue;
if (cp >= 0x81B && cp <= 0x823) continue;
if (cp >= 0x825 && cp <= 0x827) continue;
if (cp >= 0x829 && cp <= 0x82D) continue;
if (cp >= 0x859 && cp <= 0x85B) continue;
if (cp >= 0x898 && cp <= 0x89F) continue;
if (cp >= 0x8CA && cp <= 0x903) continue;
if (cp >= 0x93A && cp <= 0x94F) continue;
if (cp >= 0x951 && cp <= 0x957) continue;
if (cp >= 0x962 && cp <= 0x963) continue;
if (cp >= 0x981 && cp <= 0x983) continue;
if (cp === 0x9BC) continue;
if (cp >= 0x9BE && cp <= 0x9C4) continue;
if (cp >= 0x9C7 && cp <= 0x9C8) continue;
if (cp >= 0x9CB && cp <= 0x9CD) continue;
if (cp === 0x9D7) continue;
if (cp >= 0x9E2 && cp <= 0x9E3) continue;
if (cp === 0x9FE) continue;
if (cp >= 0xA01 && cp <= 0xA03) continue;
if (cp >= 0xA3C && cp <= 0xA51) continue;
if (cp >= 0xA70 && cp <= 0xA71) continue;
if (cp === 0xA75) continue;
if (cp >= 0xA81 && cp <= 0xA83) continue;
if (cp >= 0xABC && cp <= 0xACD) continue;
if (cp >= 0xAE2 && cp <= 0xAE3) continue;
if (cp >= 0xAFA && cp <= 0xAFF) continue;
if (cp >= 0xB01 && cp <= 0xB03) continue;
if (cp >= 0xB3C && cp <= 0xB57) continue;
if (cp >= 0xB62 && cp <= 0xB63) continue;
if (cp >= 0xB82 && cp <= 0xB83) continue;
if (cp >= 0xBBE && cp <= 0xBCD) continue;
if (cp === 0xBD7) continue;
if (cp >= 0xC00 && cp <= 0xC04) continue;
if (cp >= 0xC3C && cp <= 0xC56) continue;
if (cp >= 0xC62 && cp <= 0xC63) continue;
if (cp >= 0xC81 && cp <= 0xC83) continue;
if (cp >= 0xCBC && cp <= 0xCD6) continue;
if (cp >= 0xCE2 && cp <= 0xCE3) continue;
if (cp === 0xCF3) continue;
if (cp >= 0xD00 && cp <= 0xD03) continue;
if (cp >= 0xD3B && cp <= 0xD57) continue;
if (cp >= 0xD62 && cp <= 0xD63) continue;
if (cp >= 0xD81 && cp <= 0xD83) continue;
if (cp >= 0xDCA && cp <= 0xDDF) continue;
if (cp >= 0xDF2 && cp <= 0xDF3) continue;
if (cp === 0xE31) continue;
if (cp >= 0xE34 && cp <= 0xE3A) continue;
if (cp >= 0xE47 && cp <= 0xE4E) continue;
if (cp === 0xEB1) continue;
if (cp >= 0xEB4 && cp <= 0xEBC) continue;
if (cp >= 0xEC8 && cp <= 0xECE) continue;
if (cp >= 0xF18 && cp <= 0xF19) continue;
if (cp === 0xF35 || cp === 0xF37 || cp === 0xF39) continue;
if (cp >= 0xF3E && cp <= 0xF3F) continue;
if (cp >= 0xF71 && cp <= 0xF84) continue;
if (cp >= 0xF86 && cp <= 0xF87) continue;
if (cp >= 0xF8D && cp <= 0xFBC) continue;
if (cp === 0xFC6) continue;
if (cp >= 0x102B && cp <= 0x103E) continue;
if (cp >= 0x1056 && cp <= 0x1059) continue;
if (cp >= 0x105E && cp <= 0x1060) continue;
if (cp >= 0x1062 && cp <= 0x1064) continue;
if (cp >= 0x1067 && cp <= 0x106D) continue;
if (cp >= 0x1071 && cp <= 0x1074) continue;
if (cp >= 0x1082 && cp <= 0x108D) continue;
if (cp === 0x108F) continue;
if (cp >= 0x109A && cp <= 0x109D) continue;
if (cp >= 0x135D && cp <= 0x135F) continue;
if (cp >= 0x1712 && cp <= 0x1715) continue;
if (cp >= 0x1732 && cp <= 0x1734) continue;
if (cp >= 0x1752 && cp <= 0x1753) continue;
if (cp >= 0x1772 && cp <= 0x1773) continue;
if (cp >= 0x17B4 && cp <= 0x17D3) continue;
if (cp === 0x17DD) continue;
if (cp >= 0x180B && cp <= 0x180F) continue;
if (cp >= 0x1885 && cp <= 0x1886) continue;
if (cp === 0x18A9) continue;
if (cp >= 0x1920 && cp <= 0x192B) continue;
if (cp >= 0x1930 && cp <= 0x193B) continue;
if (cp >= 0x1A17 && cp <= 0x1A1B) continue;
if (cp >= 0x1A55 && cp <= 0x1A5E) continue;
if (cp >= 0x1A60 && cp <= 0x1A7C) continue;
if (cp === 0x1A7F) continue;
if (cp >= 0x1AB0 && cp <= 0x1ACE) continue;
if (cp >= 0x1B00 && cp <= 0x1B04) continue;
if (cp >= 0x1B34 && cp <= 0x1B44) continue;
if (cp >= 0x1B6B && cp <= 0x1B73) continue;
if (cp >= 0x1B80 && cp <= 0x1B82) continue;
if (cp >= 0x1BA1 && cp <= 0x1BAD) continue;
if (cp >= 0x1BE6 && cp <= 0x1BF3) continue;
if (cp >= 0x1C24 && cp <= 0x1C37) continue;
if (cp >= 0x1CD0 && cp <= 0x1CF9) continue;
if (cp >= 0x1DC0 && cp <= 0x1DFF) continue;
if (cp >= 0x200B && cp <= 0x200F) continue;
if (cp >= 0x2028 && cp <= 0x202E) continue;
if (cp >= 0x2060 && cp <= 0x2069) continue;
if (cp >= 0x20D0 && cp <= 0x20F0) continue;
if (cp >= 0x2CEF && cp <= 0x2CF1) continue;
if (cp === 0x2D7F) continue;
if (cp >= 0x2DE0 && cp <= 0x2DFF) continue;
if (cp >= 0x302A && cp <= 0x302F) continue;
if (cp >= 0x3099 && cp <= 0x309A) continue;
if (cp >= 0xA66F && cp <= 0xA672) continue;
if (cp >= 0xA674 && cp <= 0xA67D) continue;
if (cp >= 0xA69E && cp <= 0xA69F) continue;
if (cp >= 0xA6F0 && cp <= 0xA6F1) continue;
if (cp === 0xA802 || cp === 0xA806 || cp === 0xA80B) continue;
if (cp >= 0xA823 && cp <= 0xA827) continue;
if (cp === 0xA82C) continue;
if (cp >= 0xA880 && cp <= 0xA881) continue;
if (cp >= 0xA8B4 && cp <= 0xA8C5) continue;
if (cp >= 0xA8E0 && cp <= 0xA8F1) continue;
if (cp === 0xA8FF) continue;
if (cp >= 0xA926 && cp <= 0xA92D) continue;
if (cp >= 0xA947 && cp <= 0xA953) continue;
if (cp >= 0xA980 && cp <= 0xA983) continue;
if (cp >= 0xA9B3 && cp <= 0xA9C0) continue;
if (cp === 0xA9E5) continue;
if (cp >= 0xAA29 && cp <= 0xAA36) continue;
if (cp === 0xAA43) continue;
if (cp >= 0xAA4C && cp <= 0xAA4D) continue;
if (cp >= 0xAA7B && cp <= 0xAA7D) continue;
if (cp === 0xAAB0) continue;
if (cp >= 0xAAB2 && cp <= 0xAAB4) continue;
if (cp >= 0xAAB7 && cp <= 0xAAB8) continue;
if (cp >= 0xAABE && cp <= 0xAABF) continue;
if (cp === 0xAAC1) continue;
if (cp >= 0xAAEB && cp <= 0xAAEF) continue;
if (cp >= 0xAAF5 && cp <= 0xAAF6) continue;
if (cp >= 0xABE3 && cp <= 0xABEA) continue;
if (cp >= 0xABEC && cp <= 0xABED) continue;
if (cp === 0xFB1E) continue;
if (cp >= 0xFE00 && cp <= 0xFE0F) continue;
if (cp >= 0xFE20 && cp <= 0xFE2F) continue;
if (cp === 0xFEFF) continue;
if (cp >= 0xFFF0 && cp <= 0xFFFF) continue;
BMP.push(String.fromCodePoint(cp));
}
const BB = BigInt(BMP.length);
function encBaseBMP(bytes) {
  if (!bytes.length) return "";
  let num = B0;
  for (const b of bytes) num = (num << B8) | BigInt(b);
  const c = [];
  while (num > B0) {
    c.push(BMP[Number(num % BB)]);
    num /= BB;
  }
  c.reverse();
  return "\uFFF0" + BMP[Math.floor(bytes.length / BMP.length)] + BMP[bytes.length % BMP.length] + c.join("");
}

function buildFragment(envelope, codec, packed) {
  const json = JSON.stringify(packed ? packEnvelope({ ...envelope, codec }) : { ...envelope, codec });
  return `${FRAGMENT_KEY}=v1.${codec}.${encodePayload(json, codec)}`;
}
// --- Dictionary substitution ---
const SBC = [1, 2, 3, 4, 5, 6, 7, 8, 0x0b, 0x0e, 0x0f, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d];
const subs = [];
for (let i = 0; i < dict.singleByteSlots.length && i < SBC.length; i++) {
  subs.push([dict.singleByteSlots[i], String.fromCharCode(SBC[i])]);
}
for (let i = 0; i < dict.extendedSlots.length; i++) {
  subs.push([dict.extendedSlots[i], "\x00" + String.fromCharCode(i + 1)]);
}

function dictEncode(t) {
  let r = t;
  for (const [f, to] of subs) r = r.split(f).join(to);
  return r;
}

function brotli(buf) {
  return zlib.brotliCompressSync(Buffer.from(buf), { params: { [zlib.constants.BROTLI_PARAM_QUALITY]: 11 } });
}

⚠️ Potential issue | 🟠 Major

Benchmark the shipped fragment path, not a parallel implementation.

The app generates links from a normalized PayloadEnvelope via encodeEnvelopeAsync, but this POC reimplements the codec and runs it over bare file text. That makes the published size and timing numbers easy to drift from real fragments as envelope packing or codec internals evolve, so they are a shaky basis for the README/PR benchmarks.

Also applies to: 211-295

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/codec-poc.mjs` around lines 7 - 209, The benchmark in
scripts/codec-poc.mjs is testing a standalone reimplementation of the codec
(functions like encBase76, encBase1k, encBaseBMP, dictEncode, brotli) against
plain file text instead of exercising the real fragment path used in production
(encodeEnvelopeAsync applied to a normalized PayloadEnvelope), so replace the
POC harness to call the shipped encoder path: construct a real PayloadEnvelope
like the app does, invoke encodeEnvelopeAsync (or the exported encode function
the app uses) to produce fragments, and run your size/time measurements on those
generated fragments rather than on the parallel implementations and raw file
bytes; keep the reimplementation only for cross-checking if needed but ensure
README/PR benchmarks use encodeEnvelopeAsync and the same envelope packing logic
as production.

Comment on lines +21 to +27
// Re-serialize minified (no whitespace)
const minified = JSON.stringify(src);
writeFileSync("public/arx-dictionary.json", minified, "utf8");

// Brotli-compress the minified JSON
const compressed = brotli.compress(Buffer.from(minified, "utf8"), { quality: 11 });
writeFileSync("public/arx-dictionary.json.br", compressed);

⚠️ Potential issue | 🟡 Minor

Write both artifacts only after Brotli succeeds.

Line 23 updates public/arx-dictionary.json before the compressed buffer exists. If Brotli throws, the source JSON and .br variant drift out of sync.

Proposed fix
 // Re-serialize minified (no whitespace)
 const minified = JSON.stringify(src);
-writeFileSync("public/arx-dictionary.json", minified, "utf8");
 
 // Brotli-compress the minified JSON
 const compressed = brotli.compress(Buffer.from(minified, "utf8"), { quality: 11 });
+writeFileSync("public/arx-dictionary.json", minified, "utf8");
 writeFileSync("public/arx-dictionary.json.br", compressed);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:
// Re-serialize minified (no whitespace)
const minified = JSON.stringify(src);
writeFileSync("public/arx-dictionary.json", minified, "utf8");

// Brotli-compress the minified JSON
const compressed = brotli.compress(Buffer.from(minified, "utf8"), { quality: 11 });
writeFileSync("public/arx-dictionary.json.br", compressed);

After:
// Re-serialize minified (no whitespace)
const minified = JSON.stringify(src);

// Brotli-compress the minified JSON
const compressed = brotli.compress(Buffer.from(minified, "utf8"), { quality: 11 });
writeFileSync("public/arx-dictionary.json", minified, "utf8");
writeFileSync("public/arx-dictionary.json.br", compressed);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/compress-dictionary.mjs` around lines 21 - 27, The current flow
writes the minified JSON file before brotli.compress runs, risking out-of-sync
artifacts if compression fails; change the sequence in the script so you first
compute minified (const minified = JSON.stringify(src)), run
brotli.compress(Buffer.from(minified, "utf8"), { quality: 11 }) and only after
the compress call succeeds write both outputs with
writeFileSync("public/arx-dictionary.json.br", compressed) and
writeFileSync("public/arx-dictionary.json", minified, "utf8"); wrap the compress
call in a try/catch so you never write the JSON if compression throws and ensure
you reference the same local variables (minified, compressed) and writeFileSync
calls as shown.

Comment on lines +248 to +252
const [dictReady, setDictReady] = useState(false);

useEffect(() => {
  loadArxDictionary().then(() => setDictReady(true));
}, []);

⚠️ Potential issue | 🟡 Minor

Missing error handling for dictionary load failure.

loadArxDictionary() can return -1 on failure (network error, invalid JSON), but dictReady is set to true regardless. This masks load failures and may cause silent fallback to the built-in dictionary without any user indication. Additionally, in React StrictMode (dev), this effect runs twice, causing redundant fetches.

🛡️ Proposed fix to handle dictionary load result
 useEffect(() => {
-  loadArxDictionary().then(() => setDictReady(true));
+  let cancelled = false;
+  loadArxDictionary().then((version) => {
+    if (!cancelled) {
+      // version === -1 means load failed, but fallback is used
+      setDictReady(true);
+    }
+  });
+  return () => { cancelled = true; };
 }, []);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/components/viewer-shell.tsx` around lines 248 - 252, The effect
unconditionally sets dictReady true even when loadArxDictionary() returns -1 and
also can re-run in StrictMode; change the useEffect to call loadArxDictionary(),
inspect its return value and only call setDictReady(true) on success, otherwise
set an error state (e.g., setDictError or setDictReady(false)) and log the
failure including the returned error code; additionally guard against double
fetches in React StrictMode by using a ref flag (e.g., hasLoadedRef) or an
AbortController so loadArxDictionary() runs only once per mount; reference
loadArxDictionary, dictReady, setDictReady, and the useEffect wrapper when
making the changes.

Comment on lines +311 to +313
encodeEnvelopeAsync({ ...envelope, activeArtifactId: artifactId }, { codec: envelope.codec }).then((encoded) => {
  setFragmentHash(`#${encoded}`);
});

⚠️ Potential issue | 🟠 Major

Missing .catch() handler on encodeEnvelopeAsync promise.

Per context snippet at src/lib/payload/fragment.ts:188-192, encodeEnvelopeAsync can reject if buildCandidatesAsync fails (e.g., brotli-wasm module load failure). Without a .catch() handler, unhandled promise rejections will occur, potentially leaving the UI in an inconsistent state.

🐛 Proposed fix to add error handling
     encodeEnvelopeAsync({ ...envelope, activeArtifactId: artifactId }, { codec: envelope.codec }).then((encoded) => {
       setFragmentHash(`#${encoded}`);
-    });
+    }).catch((err) => {
+      console.error("Failed to encode envelope:", err);
+      // Optionally: fall back to sync encoding for non-arx codecs
+    });
📝 Committable suggestion


Suggested change

Before:
encodeEnvelopeAsync({ ...envelope, activeArtifactId: artifactId }, { codec: envelope.codec }).then((encoded) => {
  setFragmentHash(`#${encoded}`);
});

After:
encodeEnvelopeAsync({ ...envelope, activeArtifactId: artifactId }, { codec: envelope.codec }).then((encoded) => {
  setFragmentHash(`#${encoded}`);
}).catch((err) => {
  console.error("Failed to encode envelope:", err);
  // Optionally: fall back to sync encoding for non-arx codecs
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/components/viewer-shell.tsx` around lines 311 - 313, The call to
encodeEnvelopeAsync(...) lacks a .catch() handler so rejections from
encodeEnvelopeAsync (which can happen when buildCandidatesAsync or brotli-wasm
loading fails) cause unhandled promise rejections; add a .catch(...) on the
promise returned by encodeEnvelopeAsync in the component where
setFragmentHash(...) is used (reference encodeEnvelopeAsync and setFragmentHash)
to handle errors: log the error (or surface it to the UI/state), and ensure you
clear or leave fragment state in a consistent way on failure (e.g., do not call
setFragmentHash on error and optionally set an error flag/state for the UI).

Comment on lines +127 to +149
export async function loadArxDictionary(source?: string | ArxDictionary): Promise<number> {
  try {
    let dict: ArxDictionary;

    if (source && typeof source === "object") {
      dict = source;
    } else {
      const url = typeof source === "string" ? source : resolveDefaultDictionaryUrl();
      const response = await fetch(url);
      if (!response.ok) return -1;
      const json: unknown = await response.json();
      if (!isArxDictionary(json)) return -1;
      dict = json;
    }

    activeSubs = buildSubstitutions(dict);
    activeDictVersion = dict.version;
    dictionaryLoaded = true;
    return dict.version;
  } catch {
    return -1;
  }
}

⚠️ Potential issue | 🟡 Minor

Missing validation when source is an object.

When source is passed as an object (lines 131-132), it bypasses the isArxDictionary validation that's applied to fetched JSON (line 138). A malformed dictionary object could corrupt activeSubs.

🛡️ Proposed fix to add validation
     if (source && typeof source === "object") {
+      if (!isArxDictionary(source)) return -1;
       dict = source;
     } else {
📝 Committable suggestion


Suggested change

Before:
export async function loadArxDictionary(source?: string | ArxDictionary): Promise<number> {
  try {
    let dict: ArxDictionary;

    if (source && typeof source === "object") {
      dict = source;
    } else {
      const url = typeof source === "string" ? source : resolveDefaultDictionaryUrl();
      const response = await fetch(url);
      if (!response.ok) return -1;
      const json: unknown = await response.json();
      if (!isArxDictionary(json)) return -1;
      dict = json;
    }

    activeSubs = buildSubstitutions(dict);
    activeDictVersion = dict.version;
    dictionaryLoaded = true;
    return dict.version;
  } catch {
    return -1;
  }
}

After:
export async function loadArxDictionary(source?: string | ArxDictionary): Promise<number> {
  try {
    let dict: ArxDictionary;

    if (source && typeof source === "object") {
      if (!isArxDictionary(source)) return -1;
      dict = source;
    } else {
      const url = typeof source === "string" ? source : resolveDefaultDictionaryUrl();
      const response = await fetch(url);
      if (!response.ok) return -1;
      const json: unknown = await response.json();
      if (!isArxDictionary(json)) return -1;
      dict = json;
    }

    activeSubs = buildSubstitutions(dict);
    activeDictVersion = dict.version;
    dictionaryLoaded = true;
    return dict.version;
  } catch {
    return -1;
  }
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/payload/arx-codec.ts` around lines 127 - 149, The function
loadArxDictionary currently skips schema validation when source is an object;
adjust it so that when source is an object you call isArxDictionary(source) and
handle failure exactly like the fetched path (return -1) before assigning dict;
only pass the validated dict to buildSubstitutions and set activeDictVersion/
dictionaryLoaded; keep the existing error handling behavior on exceptions.

Comment on lines +564 to +567
/** Returns true if the encoded string uses baseBMP encoding (starts with U+FFEE marker). */
export function isBaseBMPEncoded(str: string): boolean {
  return str.startsWith(BMP_MARKER);
}

⚠️ Potential issue | 🟡 Minor

Documentation inconsistency: comment says U+FFEE but marker is U+FFF0.

Line 564 states "U+FFEE marker" but BMP_MARKER is defined as "\uFFF0" (line 520).

📝 Fix the comment
-/** Returns true if the encoded string uses baseBMP encoding (starts with U+FFEE marker). */
+/** Returns true if the encoded string uses baseBMP encoding (starts with U+FFF0 marker). */
 export function isBaseBMPEncoded(str: string): boolean {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/payload/arx-codec.ts` around lines 564 - 567, The doc comment on
isBaseBMPEncoded incorrectly states the marker is U+FFEE while the actual
constant BMP_MARKER is "\uFFF0"; update the comment to reference U+FFF0 (or
phrase it to read from BMP_MARKER) so the documented marker matches the
BMP_MARKER constant used by isBaseBMPEncoded.

@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Mar 12, 2026

Deploying agent-render with Cloudflare Pages

Latest commit: 38e13a8
Status: ✅  Deploy successful!
Preview URL: https://c024da87.agent-render.pages.dev
Branch Preview URL: https://experimental-compression.agent-render.pages.dev


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@skills/agent-render-linking/SKILL.md`:
- Line 197: Change the lowercase "markdown" in the JSON envelope example string
'","kind":"markdown","content":"' to capitalized "Markdown" so the user-facing
sentence reads '","kind":"Markdown","content":"'; update the surrounding text
where the word "markdown" is used in that sentence to "Markdown" to keep the
proper noun capitalized.

In `@src/lib/payload/fragment.ts`:
- Around line 97-99: The current getSyncCandidateCodecs function unconditionally
removes "arx" which causes a misleading generic error later in encodeEnvelope;
instead, change getSyncCandidateCodecs to first collect candidates via
getCandidateCodecs(options) and if the only candidate is "arx" return ["arx"]
(preserving the explicit sync-path ARX error), otherwise return the candidate
list with "arx" filtered out; update references to getSyncCandidateCodecs,
getCandidateCodecs, and ensure encodeEnvelope still receives the preserved "arx"
candidate so encodeEnvelopeAsync guidance is produced.
- Around line 132-145: The fragment encoder (buildArxCandidates) currently
embeds getActiveDictVersion into the fragment string
(PAYLOAD_FRAGMENT_KEY=v1.arx.${dictVersion}.…), but the decoder ignores that
version and calls arxDecompress without it; update the decoder path that parses
the numeric version to pass that extracted version into a version-aware resolver
or into arxDecompress (or call a new resolver like
resolveArxDictVersion(version) before decompression), and if resolution fails
fall back to using the full encoded remainder for decompression so older
fragments decompress with their original dictionary when available.
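The getSyncCandidateCodecs fix suggested above can be sketched as follows; `Codec`, `EncodeOptions`, and `getCandidateCodecs` here are simplified stand-ins for the real definitions in src/lib/payload/fragment.ts:

```typescript
type Codec = "arx" | "deflate" | "lz" | "plain";

interface EncodeOptions {
  codecs?: Codec[];
}

// Stand-in for the real resolver: honor an explicit override, else use
// the async-default priority order described in this PR.
function getCandidateCodecs(options?: EncodeOptions): Codec[] {
  return options?.codecs ?? ["arx", "deflate", "lz", "plain"];
}

// Preserve a lone explicit "arx" so encodeEnvelope can raise its
// dedicated "arx requires async" error instead of a generic one;
// otherwise filter "arx" out of the sync candidate list.
function getSyncCandidateCodecs(options?: EncodeOptions): Codec[] {
  const candidates = getCandidateCodecs(options);
  if (candidates.length === 1 && candidates[0] === "arx") return candidates;
  return candidates.filter((codec) => codec !== "arx");
}
```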

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 32db5e48-7c7a-4e38-be8b-168d9609cce1

📥 Commits

Reviewing files that changed from the base of the PR and between c44740c and 05c5797.

📒 Files selected for processing (5)
  • AGENTS.md
  • docs/payload-format.md
  • skills/agent-render-linking/SKILL.md
  • src/lib/payload/fragment.ts
  • tests/arx-codec.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • AGENTS.md

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
skills/agent-render-linking/SKILL.md (1)

39-42: Documentation may mislead agents about sync codec priority.

The documented codec priority arx -> deflate -> lz -> plain only applies to async encoding (encodeEnvelopeAsync). The sync path (encodeEnvelope) defaults to ["deflate", "lz", "plain"] and explicitly excludes ARX. Agents following this guidance for sync operations will mispredict the actual codec selection order.

Consider clarifying that ARX requires async encoding and that sync encoding uses a different default priority.

📝 Suggested clarification
 Prefer:
 1. shortest valid fragment for the target surface
-2. codec priority `arx -> deflate -> lz -> plain` unless explicitly overridden
+2. codec priority `arx -> deflate -> lz -> plain` for async encoding (recommended)
+   - sync encoding defaults to `deflate -> lz -> plain` (arx requires async)
 3. packed wire mode when available
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/agent-render-linking/SKILL.md` around lines 39 - 42, Update the
SKILL.md documentation to clarify codec priority differences: explicitly state
that ARX is only available for async encoding (encodeEnvelopeAsync) and is
excluded from the sync encoder (encodeEnvelope), and that the sync default codec
priority is ["deflate","lz","plain"] while async defaults to
["arx","deflate","lz","plain"] unless overridden; also note packed wire mode
preference and the rule to prefer the shortest valid fragment for the target
surface so agents don't assume the async priority for sync calls.
src/lib/payload/fragment.ts (1)

335-341: Dictionary version compatibility is handled via decode fallback, not version-aware loading.

resolveArxDictVersion only checks if the fragment's version matches the currently active dictionary. If they don't match, the code falls back to trying different payload parsing strategies (lines 394-396) rather than loading the appropriate dictionary version.

This is a reasonable v1 approach, but if the dictionary schema changes significantly in future versions, payloads encoded with older dictionaries may fail to decode even with the fallback attempts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/payload/fragment.ts` around lines 335 - 341, resolveArxDictVersion
currently only compares the fragment version to getActiveDictVersion and returns
false on mismatch, relying on decode fallbacks; change it to actively attempt to
load/switch to the fragment's dictionary version when version !== null and
different from getActiveDictVersion by calling the existing dictionary
loader/switch function (e.g., loadDictVersion or switchActiveDict) and return
true only if the load/switch succeeds; keep the original null-handling behavior
(null -> true) and ensure resolveArxDictVersion returns false if the requested
dictionary version cannot be loaded so higher-level code can still try fallback
parsing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/payload/fragment.ts`:
- Around line 415-417: In decodeFragmentAsync, the else-branch incorrectly
passes a non-existent variable "encoded" to decodePayload; replace that with the
correct variable "remainder" (the non-ARX payload defined earlier) so that
decodePayload(remainder, codec) is called and decodedJson is set correctly in
the decodeFragmentAsync function.

---

Nitpick comments:
In `@skills/agent-render-linking/SKILL.md`:
- Around line 39-42: Update the SKILL.md documentation to clarify codec priority
differences: explicitly state that ARX is only available for async encoding
(encodeEnvelopeAsync) and is excluded from the sync encoder (encodeEnvelope),
and that the sync default codec priority is ["deflate","lz","plain"] while async
defaults to ["arx","deflate","lz","plain"] unless overridden; also note packed
wire mode preference and the rule to prefer the shortest valid fragment for the
target surface so agents don't assume the async priority for sync calls.

In `@src/lib/payload/fragment.ts`:
- Around line 335-341: resolveArxDictVersion currently only compares the
fragment version to getActiveDictVersion and returns false on mismatch, relying
on decode fallbacks; change it to actively attempt to load/switch to the
fragment's dictionary version when version !== null and different from
getActiveDictVersion by calling the existing dictionary loader/switch function
(e.g., loadDictVersion or switchActiveDict) and return true only if the
load/switch succeeds; keep the original null-handling behavior (null -> true)
and ensure resolveArxDictVersion returns false if the requested dictionary
version cannot be loaded so higher-level code can still try fallback parsing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8689721c-58da-46f9-9afa-c9677bf8f11b

📥 Commits

Reviewing files that changed from the base of the PR and between 05c5797 and fdfc90d.

📒 Files selected for processing (3)
  • skills/agent-render-linking/SKILL.md
  • src/lib/payload/fragment.ts
  • tests/fragment.test.ts

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/lib/payload/fragment.ts (1)

141-149: ⚠️ Potential issue | 🟠 Major

Resolve the stamped ARX dictionary version to a real table.

buildArxCandidates() now writes getActiveDictVersion() into the fragment, but resolveArxDictVersion() only compares that number to the current global state and arxDecompress() still uses whichever dictionary is active in src/lib/payload/arx-codec.ts. Fragments created before the shared dictionary finishes loading, or fragments from an older dictionary version, can still be decoded with the wrong substitution table. Please resolve the parsed version to an immutable dictionary snapshot, or fail with an explicit version-mismatch error instead of decoding against the current table.

Based on learnings, **/lib/payload/*.ts: Treat every payload as untrusted input and fail clearly on malformed or oversized payloads before renderer mount when possible.

Also applies to: 335-340, 393-403

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/payload/fragment.ts` around lines 141 - 149, The fragment writer
(buildArxCandidates) stamps getActiveDictVersion into the fragment but the
decoder (resolveArxDictVersion / arxDecompress) still uses the global active
table; change resolveArxDictVersion to map the parsed version to an immutable
dictionary snapshot (e.g., fetch or look up a frozen substitution table keyed by
that version) and make arxDecompress accept and use that snapshot instead of the
global active table; if the stamped version is missing or incompatible, throw a
clear version-mismatch error and fail decoding early (also add the same
snapshot-resolution and explicit failure checks where similar ARX decoding
occurs around the other ranges noted).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/payload/fragment.ts`:
- Around line 138-150: buildArxCandidates currently ranks ARX fragments by raw
string length which miscounts non-ASCII outputs; update buildArxCandidates (and
the parallel candidates at lines ~195-198) to compute and store a
transportLength for each candidate using the same percent-encoding/serialization
used by decodeArxEncodedPayload (i.e., percent-encode non-ASCII bytes the same
way the URL fragment will be serialized) and use transportLength for both
ranking and the 8000-character fragment budget check; ensure CandidateFragment
objects include this transportLength field and that downstream budget checks
reference it instead of value.length; keep symbols: buildArxCandidates,
arxCompress, arxCompressUnicode, arxCompressBMP, PAYLOAD_FRAGMENT_KEY,
decodeArxEncodedPayload, and getActiveDictVersion.

---

Duplicate comments:
In `@src/lib/payload/fragment.ts`:
- Around line 141-149: The fragment writer (buildArxCandidates) stamps
getActiveDictVersion into the fragment but the decoder (resolveArxDictVersion /
arxDecompress) still uses the global active table; change resolveArxDictVersion
to map the parsed version to an immutable dictionary snapshot (e.g., fetch or
look up a frozen substitution table keyed by that version) and make
arxDecompress accept and use that snapshot instead of the global active table;
if the stamped version is missing or incompatible, throw a clear
version-mismatch error and fail decoding early (also add the same
snapshot-resolution and explicit failure checks where similar ARX decoding
occurs around the other ranges noted).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a2daacd2-5071-49f0-a6ae-86dd13f10130

📥 Commits

Reviewing files that changed from the base of the PR and between fdfc90d and 236f6f6.

📒 Files selected for processing (1)
  • src/lib/payload/fragment.ts

…length

arxCompressUnicode and arxCompressBMP emit non-ASCII Unicode characters.
Browsers percent-encode these in URL fragments (each UTF-8 byte → %XX,
3 chars per byte), so a baseBMP candidate with raw length N can serialize
to up to 9N chars. Ranking by raw value.length caused Unicode/BMP fragments
to appear shorter than they actually are in the URL, potentially exceeding
the 8000-char budget.

Add transportLength to CandidateFragment that computes the serialized
(percent-encoded) length. Update selectCandidate to sort and budget-check
by transportLength. buildArxCandidates uses computeTransportLength() for
non-ASCII payloads; non-ARX codecs produce ASCII-only output so their
transportLength equals value.length.

Co-authored-by: Aanish Bhirud <baanish@users.noreply.github.com>
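The percent-encoding math in this commit can be sketched as follows. This is an illustrative stand-in rather than the project's actual computeTransportLength: it assumes fragment-safe ASCII output survives serialization unchanged, while every non-ASCII character expands to 3 chars per UTF-8 byte:

```typescript
// Serialized (percent-encoded) length of a candidate fragment value.
// ASCII codec output is fragment-safe by construction and serializes
// as-is; each non-ASCII character costs 3 chars per UTF-8 byte (%XX).
function computeTransportLength(value: string): number {
  const utf8 = new TextEncoder();
  let length = 0;
  for (const char of value) {
    if (char.charCodeAt(0) < 0x80) {
      length += 1;
    } else {
      length += utf8.encode(char).length * 3;
    }
  }
  return length;
}
```

This is why a baseBMP character in U+0800–U+FFEF (3 UTF-8 bytes) can cost 9 serialized chars, so ranking by raw `value.length` undercounts the URL budget.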
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/lib/payload/fragment.ts (1)

219-315: Consider extracting shared fragment parsing/validation to one helper.

decodeFragment and decodeFragmentAsync duplicate key parsing/version/codec checks. A small shared parser would reduce drift risk as codecs evolve.

Also applies to: 361-451

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/payload/fragment.ts` around lines 219 - 315, Duplicate fragment
header parsing exists in decodeFragment and decodeFragmentAsync; extract a
single helper (e.g., parseFragmentHeader or parseFragmentMetadata) to
centralize: take the raw hash string, strip leading "#", validate empty and
MAX_FRAGMENT_LENGTH, split key/value and verify PAYLOAD_FRAGMENT_KEY, locate dot
positions and verify format, validate version === "v1" and codec ∈ codecs, and
return a uniform result object containing {ok:boolean,
errorResponse?:ParsedPayload, fragment:string, version, codec, encoded,
fragmentLength}. Replace the duplicated blocks in decodeFragment and
decodeFragmentAsync to call this helper, short-circuit on helper.errorResponse,
and then continue with codec-specific decoding (including the arx async check)
and JSON parsing; keep using existing constants (PAYLOAD_FRAGMENT_KEY,
MAX_FRAGMENT_LENGTH, MAX_DECODED_PAYLOAD_LENGTH, codecs) and preserve existing
error codes/messages so behavior/tests remain unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/payload/fragment.ts`:
- Around line 404-415: The code currently treats any payload starting with
"<digits>." as versioned and, when useVersionedPayload is true, only attempts
decodeArxEncodedPayload(versionedPayload), which can false-negative legacy
payloads that merely begin with a numeric prefix; change the decode attempt
ordering so we always try the full remainder (fallbackPayload) first and then
the stripped versionedPayload when parsedDictVersion !== null and
useVersionedPayload is true. Concretely: keep parsedDictVersion,
versionedPayload, fallbackPayload and useVersionedPayload as-is, but build
decodeAttempts so that if useVersionedPayload and parsedDictVersion !== null you
push decodeArxEncodedPayload(fallbackPayload) first then
decodeArxEncodedPayload(versionedPayload); otherwise preserve existing
fallback-only or fallback-then-versioned behavior; ensure resolveArxDictVersion
still gates whether the versioned form is attempted.
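The attempt ordering suggested above can be sketched as a small helper. `buildDecodeAttempts` is a hypothetical name, the version prefix is assumed to have the form `<digits>.`, and the real code would feed each returned string to decodeArxEncodedPayload in order:

```typescript
// Build the ordered list of payload strings to try decoding.
// When a numeric version prefix is present, try the full remainder
// first so legacy payloads that merely start with "<digits>." are not
// misparsed, then fall back to the stripped (versioned) form.
function buildDecodeAttempts(remainder: string): string[] {
  const match = remainder.match(/^(\d+)\.(.*)$/s);
  const parsedDictVersion = match ? Number(match[1]) : null;
  const versionedPayload = match ? match[2] : remainder;
  if (parsedDictVersion !== null) {
    return [remainder, versionedPayload];
  }
  return [remainder];
}
```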

---

Nitpick comments:
In `@src/lib/payload/fragment.ts`:
- Around line 219-315: Duplicate fragment header parsing exists in
decodeFragment and decodeFragmentAsync; extract a single helper (e.g.,
parseFragmentHeader or parseFragmentMetadata) to centralize: take the raw hash
string, strip leading "#", validate empty and MAX_FRAGMENT_LENGTH, split
key/value and verify PAYLOAD_FRAGMENT_KEY, locate dot positions and verify
format, validate version === "v1" and codec ∈ codecs, and return a uniform
result object containing {ok:boolean, errorResponse?:ParsedPayload,
fragment:string, version, codec, encoded, fragmentLength}. Replace the
duplicated blocks in decodeFragment and decodeFragmentAsync to call this helper,
short-circuit on helper.errorResponse, and then continue with codec-specific
decoding (including the arx async check) and JSON parsing; keep using existing
constants (PAYLOAD_FRAGMENT_KEY, MAX_FRAGMENT_LENGTH,
MAX_DECODED_PAYLOAD_LENGTH, codecs) and preserve existing error codes/messages
so behavior/tests remain unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f92adb8f-b000-4504-815d-ef9ab3989194

📥 Commits

Reviewing files that changed from the base of the PR and between 236f6f6 and 61066a1.

📒 Files selected for processing (1)
  • src/lib/payload/fragment.ts

claude and others added 2 commits March 13, 2026 00:17
…r helper

- When useVersionedPayload && parsedDictVersion !== null, try fallbackPayload
  (full remainder) first before versionedPayload (stripped), preventing false
  negatives for legacy payloads whose content coincidentally starts with a
  numeric prefix.
- Extract parseFragmentHeader() to deduplicate the shared fragment header
  parsing logic (strip "#", validate length/key, locate dots, check version
  and codec) that was duplicated between decodeFragment and decodeFragmentAsync;
  both functions now call the helper and short-circuit on errorResponse.

https://claude.ai/code/session_01KPxKh7zY5iNbNqfR1KpqK8
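The shared header parser this commit describes can be sketched as below; the fragment key, error strings, codec set, and return shape are assumptions for illustration, not the project's actual values:

```typescript
interface FragmentHeader {
  ok: boolean;
  error?: string;
  version?: string;
  codec?: string;
  encoded?: string;
}

const PAYLOAD_FRAGMENT_KEY = "payload"; // assumed key name
const MAX_FRAGMENT_LENGTH = 8000;
const CODECS = new Set(["arx", "deflate", "lz", "plain"]);

// Centralized parsing shared by decodeFragment and decodeFragmentAsync:
// strip "#", validate length and key, then split version/codec/payload.
function parseFragmentHeader(rawHash: string): FragmentHeader {
  const fragment = rawHash.startsWith("#") ? rawHash.slice(1) : rawHash;
  if (!fragment) return { ok: false, error: "empty fragment" };
  if (fragment.length > MAX_FRAGMENT_LENGTH) {
    return { ok: false, error: "fragment too long" };
  }
  const eq = fragment.indexOf("=");
  if (eq === -1 || fragment.slice(0, eq) !== PAYLOAD_FRAGMENT_KEY) {
    return { ok: false, error: "unknown fragment key" };
  }
  const [version, codec, ...rest] = fragment.slice(eq + 1).split(".");
  if (version !== "v1") return { ok: false, error: "unsupported version" };
  if (!codec || !CODECS.has(codec)) return { ok: false, error: "unknown codec" };
  // rest is re-joined so arx payloads keep their "<dictVersion>.<data>" form.
  return { ok: true, version, codec, encoded: rest.join(".") };
}
```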