EncodingId as a sealed interface; ArrayNode carries the typed id #193

Merged
dfa1 merged 7 commits into main from sealed-encoding-id on Jul 4, 2026

dfa1 (Owner) commented Jul 4, 2026

What

Two-step refactor making encoding identity typed at the core while staying string-faithful at the wire edge:

  1. ea88a91b: core.model.EncodingId becomes sealed interface EncodingId permits WellKnown, Custom. The 33 spec constants move verbatim into the nested WellKnown enum and are re-exported as WellKnown-typed interface fields, so all ~579 existing EncodingId.VORTEX_FOO call sites compile unchanged. Custom(String) wraps any other wire string; its compact constructor rejects null, blank, and well-known collisions, so the two variants can never alias as map keys. parse is total over non-blank ids: an unknown id yields a typed Custom instead of an empty Optional.
  2. 21810d7e: with the Known/Unknown bit living in the id type, the KnownArrayNode/UnknownArrayNode sealed pair encodes nothing, so ArrayNode collapses to a single record carrying EncodingId. ReadRegistry returns to a single string-keyed map hit; the allowUnknown passthrough and the error messages are byte-identical.
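A minimal sketch of the step-1 shape, assuming Java 17. Only two of the 33 constants are shown; the wire() accessor, the lookup helper, and the "vortex.dict" wire string are illustrative assumptions, not the project's actual API. The permits clause is left implicit because both subtypes are nested in the same compilation unit.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Sketch only: permitted subtypes (WellKnown, Custom) are inferred from this compilation unit.
sealed interface EncodingId extends Serializable {

    /** The raw id exactly as it appears on the wire (hypothetical accessor name). */
    String wire();

    // Re-exported as WellKnown-typed fields so EncodingId.VORTEX_FOO call sites keep compiling.
    WellKnown VORTEX_PRIMITIVE = WellKnown.VORTEX_PRIMITIVE;
    WellKnown VORTEX_DICT = WellKnown.VORTEX_DICT;

    enum WellKnown implements EncodingId {
        VORTEX_PRIMITIVE("vortex.primitive"),
        VORTEX_DICT("vortex.dict"); // illustrative; the real enum holds all 33 spec ids

        private static final Map<String, WellKnown> BY_WIRE = new HashMap<>();
        static {
            for (WellKnown w : values()) {
                BY_WIRE.put(w.wire, w);
            }
        }

        private final String wire;

        WellKnown(String wire) { this.wire = wire; }

        @Override public String wire() { return wire; }

        /** Returns the constant for a wire string, or null if it is not a spec id. */
        static WellKnown lookup(String wire) { return BY_WIRE.get(wire); }
    }

    record Custom(String wire) implements EncodingId {
        Custom {
            Objects.requireNonNull(wire, "wire");
            if (wire.isBlank()) {
                throw new IllegalArgumentException("blank encoding id");
            }
            // A well-known string must only ever be represented by the enum constant.
            if (WellKnown.lookup(wire) != null) {
                throw new IllegalArgumentException("'" + wire + "' is a well-known id");
            }
        }
    }

    /** Total over non-blank ids: spec ids map to WellKnown, anything else to Custom. */
    static EncodingId parse(String wire) {
        Objects.requireNonNull(wire, "wire");
        WellKnown known = WellKnown.lookup(wire);
        return known != null ? known : new Custom(wire);
    }
}
```

Keeping EncodingId free of default methods matters here: a superinterface with default methods is initialized before the implementing enum, which would let the re-exported fields read WellKnown's constants while they are still null.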

Why

  • Achieves "typed at the core, string at the edges" without the objection to the old collapse (a raw String in ArrayNode) and without the sealed node pair's switch dances: the sum type lives in the id, the one place it is consumed.
  • Unlocks a real capability: EncodingDecoder/EncodingEncoder keep EncodingId in their signatures, so third-party codecs can now declare ids outside the spec set — previously impossible with the closed enum despite the registry's pluggability promise.
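A hedged sketch of the step-2 collapse, assuming Java 21 for the exhaustive switch over the sealed id. EncodingId is trimmed to one constant with validation omitted, and the ReadRegistry shape (a Function-based decoder map, the allowUnknown flag, the passthrough string, the exception type) is hypothetical. It only illustrates that a single string-keyed lookup serves both variants and that the known/unknown distinction is read off the id rather than the node.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Trimmed stand-in for the PR's EncodingId (one spec constant, no Custom validation).
sealed interface EncodingId {
    String wire();

    enum WellKnown implements EncodingId {
        VORTEX_PRIMITIVE("vortex.primitive");
        private final String wire;
        WellKnown(String wire) { this.wire = wire; }
        @Override public String wire() { return wire; }
    }

    record Custom(String wire) implements EncodingId {}

    static EncodingId parse(String wire) {
        for (WellKnown w : WellKnown.values()) {
            if (w.wire().equals(wire)) return w;
        }
        return new Custom(wire);
    }
}

// One record instead of a KnownArrayNode/UnknownArrayNode pair.
record ArrayNode(EncodingId encoding, List<ArrayNode> children) {
    ArrayNode { children = List.copyOf(children); }

    // The compiler checks this switch for exhaustiveness over the sealed id.
    String kind() {
        return switch (encoding) {
            case EncodingId.WellKnown w -> "spec";
            case EncodingId.Custom c -> "custom";
        };
    }
}

// Hypothetical registry keyed by wire string: one map hit, whatever the id variant.
final class ReadRegistry {
    private final Map<String, Function<ArrayNode, String>> decoders = new HashMap<>();
    private final boolean allowUnknown;

    ReadRegistry(boolean allowUnknown) { this.allowUnknown = allowUnknown; }

    void register(EncodingId id, Function<ArrayNode, String> decoder) {
        decoders.put(id.wire(), decoder);
    }

    String decode(ArrayNode node) {
        Function<ArrayNode, String> decoder = decoders.get(node.encoding().wire());
        if (decoder != null) return decoder.apply(node);
        if (allowUnknown) return "passthrough(" + node.encoding().wire() + ")";
        throw new IllegalStateException("no decoder for encoding " + node.encoding().wire());
    }
}
```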

Review hardening (adversarial review pass, findings fixed in 21810d7e)

  • Blocker fixed: a crafted file with a zero-length/whitespace encoding-id string reached EncodingId.parse unguarded and escaped as IllegalArgumentException; FlatSegmentDecoder now rejects blank spec entries with VortexException (previously "" could even slide through allowUnknown as a passthrough). Pinned by tests on both layers.
  • WriteRegistry.Builder's TreeMap now orders by wire string; natural ordering would throw ClassCastException on the first Custom-keyed registration.
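A sketch of both hardening fixes, assuming a RuntimeException-style VortexException and a wire() accessor (both stand-ins, as are the specEntry and newWriteMap helpers). The blank check runs on the untrusted footer string before EncodingId.parse can throw IllegalArgumentException, and the write-side map uses an explicit comparator because TreeMap's natural ordering casts each key to Comparable, which Custom does not implement in this sketch.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// Minimal stand-in for the PR's EncodingId: one spec constant plus a Custom wrapper.
sealed interface EncodingId {
    String wire();

    enum WellKnown implements EncodingId {
        VORTEX_PRIMITIVE("vortex.primitive");
        private final String wire;
        WellKnown(String wire) { this.wire = wire; }
        @Override public String wire() { return wire; }
    }

    record Custom(String wire) implements EncodingId {
        Custom {
            if (wire == null || wire.isBlank()) {
                throw new IllegalArgumentException("blank encoding id");
            }
        }
    }

    static EncodingId parse(String wire) {
        for (WellKnown w : WellKnown.values()) {
            if (w.wire().equals(wire)) return w;
        }
        return new Custom(wire); // IllegalArgumentException for null or blank input
    }
}

// Stand-in for the reader's domain exception.
class VortexException extends RuntimeException {
    VortexException(String message) { super(message); }
}

final class Hardening {
    // Read path: reject blank ids from the file before parse() can leak IllegalArgumentException.
    static EncodingId specEntry(int index, String raw) {
        if (raw == null || raw.isBlank()) {
            throw new VortexException("blank encoding id at array spec " + index);
        }
        return EncodingId.parse(raw);
    }

    // Write path: order keys by wire string so WellKnown and Custom keys can coexist.
    static <V> Map<EncodingId, V> newWriteMap() {
        Comparator<EncodingId> byWire = Comparator.comparing(EncodingId::wire);
        return new TreeMap<>(byWire);
    }
}
```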

Verification

  • ./mvnw verify is green across all 15 modules, including the failsafe interop suite, after each step and after the hardening fixes.
  • ./mvnw javadoc:javadoc -pl core produces zero output.

Follow-up candidates (not in this PR)

  • Footer.arraySpecs/layoutSpecs could carry typed ids now that parse is total.
  • ReadRegistry dispatch for Custom-id decoders (currently registrable, but the unknown-node passthrough still governs flat-segment decode).

🤖 Generated with Claude Code

dfa1 and others added 7 commits July 4, 2026 08:56
Move the ExtensionDecoder interface from the reader root into
reader.extension next to its four implementations, mirroring
EncodingDecoder in reader.decode.

Also sync stale docs: ReadRegistry no longer manages extension
decoders (no register(ExtensionDecoder), no ServiceLoader
discovery) — the spec set is closed and dispatched in Chunk.as().

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Reverts 3b25a97. Also restores KnownArrayNode/UnknownArrayNode,
whose deletions were swept into the unrelated dict-lane perf commit
12e1350 instead of the collapse commit, and drops the collapse
entry from the unreleased changelog section.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

MaskedEncodingDecoder and PatchedEncodingDecoder are implemented and
ServiceLoader-registered; the enum constants still claimed otherwise.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

sealed interface EncodingId permits WellKnown, Custom: the spec
constants move verbatim into the nested WellKnown enum and every
wire string now has a typed representation — parse() is total,
returning the WellKnown match or a Custom wrapping the raw id.
Custom's compact constructor rejects null, blank, and ids that
collide with a WellKnown wire string, so the two variants never
alias as map keys.

All constants are re-exported as WellKnown-typed interface fields,
keeping the existing EncodingId.VORTEX_FOO call sites source
compatible. KnownArrayNode narrows its component to WellKnown (a
known node can no longer hold a custom id); ArrayNode.of dispatches
WellKnown/Custom to Known/UnknownArrayNode. EncodingDecoder and
EncodingEncoder keep the interface in their signatures, which for
the first time lets third-party codecs declare ids outside the
spec set. EncodingId extends Serializable to preserve the enum's
implicit serializability for the VortexException field.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Collapse the KnownArrayNode/UnknownArrayNode sealed pair into a
single ArrayNode record: with EncodingId sealed, the Known/Unknown
bit lives in the id type (WellKnown vs Custom), so the parallel
node hierarchy and its switch dances at every dispatch site encode
nothing. ReadRegistry returns to the single string-keyed map hit;
the allowUnknown passthrough and error messages are unchanged for
both former variants. The redundant ArrayNode.of factory is gone —
the canonical constructor takes the typed id.

Hardening from review: a crafted file with a blank encoding id in
the spec table now fails as VortexException in FlatSegmentDecoder
(EncodingId.parse rejects blank with IllegalArgumentException,
which must not escape the untrusted read path — previously "" flowed
to the unknown-id path and could pass through allowUnknown);
WriteRegistry's builder TreeMap orders by wire string instead of
natural ordering, which a Custom-keyed registration would break.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The EncodingId javadoc examples used "vortex.flat", which lives in
the footer's layoutSpecs table (Layout.FLAT), never in arraySpecs;
the canonical flat encoding id is "vortex.primitive".

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

dfa1 merged commit 3c412b4 into main on Jul 4, 2026
6 checks passed
dfa1 deleted the sealed-encoding-id branch on July 4, 2026 09:18