Codecs and Formats

Petrus Pradella edited this page Jun 27, 2026 · 3 revisions

A Codec is the only component that knows a concrete format. Above it, the dynamic API, comment reconciliation and binding are all format-agnostic.

The four codecs

| Codec | formatId | Extensions | Comment fidelity | Notes |
| --- | --- | --- | --- | --- |
| YamlCodec | yaml | yml, yaml | LOSSLESS | block + side + header/footer comments round-trip |
| JsonCodec | json | json | NONE | strict RFC JSON, pretty-printed; comments not emitted |
| TomlCodec | toml | toml | LOSSLESS | [table] sections + # comments; no null |
| JsoncCodec | jsonc | jsonc | LOSSY | JSON with // comments; best-effort positions |

Config yaml  = Config.open(Paths.get("a.yml"),   new YamlCodec());
Config json  = Config.open(Paths.get("a.json"),  new JsonCodec());
Config toml  = Config.open(Paths.get("a.toml"),  new TomlCodec());
Config jsonc = Config.open(Paths.get("a.jsonc"), new JsoncCodec());

Comment fidelity

CommentFidelity is a codec capability, not a global setting:

  • LOSSLESS — comments survive a full round-trip (YAML, TOML).
  • LOSSY — best-effort; comments in positions the path-keyed overlay can address survive (JSONC).
  • NONE — the format has no comment syntax that round-trips; comment writes are accepted in memory but not emitted, and the data is never corrupted (JSON).

Code that calls setComment(...) is portable across all four codecs. A NONE codec drops it on write; a LOSSLESS/LOSSY codec round-trips it. The comment overlay lives on Config, independent of the codec.
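
To make the portability claim concrete, here is a minimal toy model, not the library's implementation. ToyConfig, its set/write methods and the "key = value" output format are all hypothetical stand-ins; only the three fidelity names come from this page. It shows a comment being accepted in memory regardless of codec, then either emitted or dropped at write time while the data stays intact.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model only: these are NOT the library's Config or Codec types. */
final class CommentFidelitySketch {

    enum CommentFidelity { LOSSLESS, LOSSY, NONE }

    /** Hypothetical stand-in for a config: data plus a path-keyed comment overlay. */
    static final class ToyConfig {
        final Map<String, String> values = new LinkedHashMap<>();
        final Map<String, String> comments = new LinkedHashMap<>();

        void set(String path, String value) {
            values.put(path, value);
        }

        // Always accepted in memory, whatever codec is used later.
        void setComment(String path, String comment) {
            comments.put(path, comment);
        }

        /** Writes "key = value" lines; comments appear only if the fidelity allows it. */
        String write(CommentFidelity fidelity) {
            StringBuilder out = new StringBuilder();
            for (Map.Entry<String, String> e : values.entrySet()) {
                String comment = comments.get(e.getKey());
                if (comment != null && fidelity != CommentFidelity.NONE) {
                    out.append("# ").append(comment).append('\n');
                }
                // The data line is identical for every fidelity level.
                out.append(e.getKey()).append(" = ").append(e.getValue()).append('\n');
            }
            return out.toString();
        }
    }
}
```

In this model the caller never branches on the codec: the same setComment call is made every time, and only the write step consults the fidelity.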

The codec SPI

A custom format implements Codec (text-to-tree conversion in both directions, plus format identity) and, to carry comments, CommentAware (the structure emitter and comment parser). ObjectMapperAware exposes the shared mapper for binding.

public interface Codec {
    String formatId();
    String[] fileExtensions();          // lowercase, no dot; first = canonical
    CommentFidelity commentFidelity();
    JsonNode readTree(String text);     // text -> canonical tree (unknown keys survive)
    String writeTreePlain(JsonNode t);  // tree -> text, no structural comments
    <V> V treeToValue(JsonNode n, JavaType t);
    JsonNode valueToTree(Object v);
}

The emitter renders document structure itself and delegates only leaf values to the mapper — it never re-parses the mapper's output, so a custom mapper can restyle a value without breaking layout or comments.
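
The division of labour described above can be sketched as follows. This is an illustration under assumptions, not the real CommentAware emitter: the class name, the flat "key: value" layout and the Function used as a stand-in for the mapper are all hypothetical. The point it demonstrates is that structure and comments are written by the emitter, while the leaf text is taken verbatim from the renderer and never parsed again.

```java
import java.util.Map;
import java.util.function.Function;

/** Toy emitter: structure and comments here, leaf text from a pluggable renderer. */
final class StructureEmitterSketch {

    /**
     * Emits a flat "key: value" document.
     * leafRenderer is a hypothetical stand-in for the shared mapper.
     */
    static String emit(Map<String, Object> tree,
                       Map<String, String> comments,
                       Function<Object, String> leafRenderer) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Object> e : tree.entrySet()) {
            String comment = comments.get(e.getKey());
            if (comment != null) {
                out.append("# ").append(comment).append('\n');
            }
            // The rendered leaf is appended verbatim and never re-parsed.
            out.append(e.getKey()).append(": ")
               .append(leafRenderer.apply(e.getValue())).append('\n');
        }
        return out.toString();
    }
}
```

Swapping the renderer (for example, one that single-quotes strings) changes only the value text; key order, line layout and comment placement are unaffected because they never pass through the renderer.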

Resolving a codec by file name

CodecRegistry registry = CodecRegistry.defaults();        // JSON, YAML, TOML, JSONC
Codec c = registry.forFile("server.toml");                // -> TomlCodec
Codec y = registry.byExtension("yml");                    // -> YamlCodec
registry.register(new MyCustomCodec());                   // last registration for an extension wins

Resolution is by extension only. It never content-sniffs, because YAML and JSON syntax overlap and guessing the format from content would be ambiguous. An unknown extension throws CodecException.
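
The lookup rules above (extension only, last registration wins, unknown extension fails) can be modelled with a plain map. This is a self-contained sketch, not CodecRegistry itself: it maps extensions to format ids rather than Codec instances, throws IllegalArgumentException in place of CodecException, and lowercases the extension on lookup, which is an assumption this page does not confirm.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Toy registry keyed by extension; stores format ids instead of real codecs. */
final class ExtensionRegistrySketch {

    private final Map<String, String> formatByExtension = new HashMap<>();

    /** Registers a format id for its extensions; a later call overwrites earlier entries. */
    void register(String formatId, String... extensions) {
        for (String ext : extensions) {
            formatByExtension.put(ext.toLowerCase(Locale.ROOT), formatId);
        }
    }

    /** Looks up by extension (no dot); file content is never inspected. */
    String byExtension(String extension) {
        // Assumption: lookups are case-insensitive.
        String id = formatByExtension.get(extension.toLowerCase(Locale.ROOT));
        if (id == null) {
            // The real registry throws CodecException; a stdlib exception stands in here.
            throw new IllegalArgumentException("no codec for extension: " + extension);
        }
        return id;
    }

    /** Resolves a file name by the text after its last dot. */
    String forFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1);
        return byExtension(ext);
    }

    /** Mirrors the four default formats listed on this page. */
    static ExtensionRegistrySketch defaults() {
        ExtensionRegistrySketch r = new ExtensionRegistrySketch();
        r.register("json", "json");
        r.register("yaml", "yml", "yaml");
        r.register("toml", "toml");
        r.register("jsonc", "jsonc");
        return r;
    }
}
```

Because register simply overwrites the map entry, registering a custom format for "yml" after the defaults replaces the YAML mapping for that extension only; "yaml" still resolves to the original.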

Format-specific notes

YAML (LOSSLESS)

Block, side, header and footer comments round-trip, as does key order. Long scalars stay on one line (line splitting is disabled), and no leading --- marker is written. The comment parser is line-based and covers common block-style YAML; exotic constructs (# inside a | block scalar, multi-line flow) are not specially handled.

JSON (NONE)

Strict, pretty-printed RFC JSON. Explicit nulls are kept (they are user data). Comments are never emitted.

TOML (LOSSLESS)

Nested objects become [table.path] sections (a table's own scalars are emitted before its sub-tables, as TOML requires); # comments round-trip. TOML has no null — a null value is omitted on write (see FAQ & Gotchas). A very large integer the TOML reader can't round-trip is stored as a quoted string so it still reads back as the same digits.
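
The layout rules above can be illustrated with a small two-pass writer. It is a sketch under stated assumptions, not TomlCodec: it works on nested Maps rather than a JsonNode tree, handles only scalars and tables (no arrays), assumes keys are valid bare keys, and treats "too large" as "outside the signed 64-bit range", which is a guess at the threshold rather than something this page specifies.

```java
import java.math.BigInteger;
import java.util.Map;

/** Toy TOML writer: scalars first, then [table.path] sections; nulls omitted. */
final class TomlTableSketch {

    private static final BigInteger LONG_MIN = BigInteger.valueOf(Long.MIN_VALUE);
    private static final BigInteger LONG_MAX = BigInteger.valueOf(Long.MAX_VALUE);

    static String write(Map<String, Object> root) {
        StringBuilder out = new StringBuilder();
        writeTable("", root, out);
        return out.toString();
    }

    private static void writeTable(String path, Map<String, Object> table, StringBuilder out) {
        // Pass 1: the table's own scalars, before any sub-table header.
        for (Map.Entry<String, Object> e : table.entrySet()) {
            Object v = e.getValue();
            if (v == null || v instanceof Map) {
                continue; // null has no TOML form and is omitted; maps wait for pass 2
            }
            out.append(e.getKey()).append(" = ").append(scalar(v)).append('\n');
        }
        // Pass 2: each nested map becomes a [table.path] section.
        for (Map.Entry<String, Object> e : table.entrySet()) {
            if (e.getValue() instanceof Map) {
                String child = path.isEmpty() ? e.getKey() : path + "." + e.getKey();
                if (out.length() > 0) {
                    out.append('\n'); // blank line between sections
                }
                out.append('[').append(child).append("]\n");
                @SuppressWarnings("unchecked")
                Map<String, Object> sub = (Map<String, Object>) e.getValue();
                writeTable(child, sub, out);
            }
        }
    }

    private static String scalar(Object v) {
        if (v instanceof BigInteger) {
            BigInteger big = (BigInteger) v;
            // Assumption: integers outside signed 64-bit are written as quoted digits.
            boolean fits = big.compareTo(LONG_MIN) >= 0 && big.compareTo(LONG_MAX) <= 0;
            return fits ? big.toString() : quote(big.toString());
        }
        if (v instanceof Number || v instanceof Boolean) {
            return v.toString();
        }
        return quote(v.toString());
    }

    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

The two passes matter because a key written after a [table] header belongs to that table. If a sub-table were emitted before its parent's remaining scalars, those scalars would silently move into the sub-table on the next read.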

JSONC (LOSSY)

On read, the codec accepts JSON plus // and /* */ comments and trailing commas; on write, it emits // block and side comments. Comment positions the overlay can't address (e.g. between array elements) are not preserved, hence LOSSY.
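
For readers unfamiliar with the read-side relaxations, the sketch below reduces JSONC text to strict JSON by removing comments and trailing commas while leaving string contents alone. It is a standalone illustration with hypothetical names and says nothing about how JsoncCodec is implemented. In particular, it throws comments away, whereas the codec described here keeps the ones it can address in the overlay.

```java
/** Toy JSONC-to-JSON reducer; discards comments instead of capturing them. */
final class JsoncReadSketch {

    /** Removes // and block comments that sit outside string literals. */
    static String stripComments(String text) {
        StringBuilder out = new StringBuilder();
        boolean inString = false;
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (inString) {
                out.append(c);
                if (c == '\\' && i + 1 < text.length()) {
                    out.append(text.charAt(i + 1)); // keep the escaped char, e.g. \"
                    i += 2;
                    continue;
                }
                if (c == '"') {
                    inString = false;
                }
                i++;
            } else if (c == '"') {
                inString = true;
                out.append(c);
                i++;
            } else if (text.startsWith("//", i)) {
                int eol = text.indexOf('\n', i);
                i = eol < 0 ? text.length() : eol; // the newline itself is kept
            } else if (text.startsWith("/*", i)) {
                int end = text.indexOf("*/", i + 2);
                i = end < 0 ? text.length() : end + 2;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** Drops a comma whose next non-whitespace character closes an object or array. */
    static String stripTrailingCommas(String text) {
        StringBuilder out = new StringBuilder();
        boolean inString = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inString) {
                out.append(c);
                if (c == '\\' && i + 1 < text.length()) {
                    out.append(text.charAt(++i)); // copy the escaped char as-is
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            if (c == '"') {
                inString = true;
            }
            if (c == ',') {
                int j = i + 1;
                while (j < text.length() && Character.isWhitespace(text.charAt(j))) {
                    j++;
                }
                if (j < text.length() && (text.charAt(j) == '}' || text.charAt(j) == ']')) {
                    continue; // skip the trailing comma
                }
            }
            out.append(c);
        }
        return out.toString();
    }

    /** Comments go first so the comma pass only has whitespace to look past. */
    static String toStrictJson(String jsonc) {
        return stripTrailingCommas(stripComments(jsonc));
    }
}
```

Tracking string state in both passes is what keeps a value such as "http://x/*y*/" intact: the slashes inside the quotes are data, not comment markers.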

→ See also Architecture Overview · FAQ & Gotchas