
Codecs and Formats

Petrus Pradella edited this page Jun 28, 2026 · 3 revisions

A Codec is the only component that knows a concrete format. Above it, the dynamic API, comment reconciliation and binding are all format-agnostic.

The four codecs

Codec       formatId  Extensions  Comment fidelity  Notes
YamlCodec   yaml      yml, yaml   LOSSLESS          block + side + header/footer comments round-trip
JsonCodec   json      json        NONE              strict RFC JSON, pretty-printed; comments not emitted
TomlCodec   toml      toml        LOSSLESS          [table] sections + # comments; no null
JsoncCodec  jsonc     jsonc       LOSSY             JSON with // comments; best-effort positions

Config yaml  = Config.open(Paths.get("a.yml"),   new YamlCodec());
Config json  = Config.open(Paths.get("a.json"),  new JsonCodec());
Config toml  = Config.open(Paths.get("a.toml"),  new TomlCodec());
Config jsonc = Config.open(Paths.get("a.jsonc"), new JsoncCodec());

Comment fidelity

CommentFidelity is a codec capability, not a global setting:

  • LOSSLESS — comments survive a full round-trip (YAML, TOML).
  • LOSSY — best-effort; comments in positions the path-keyed overlay can address survive (JSONC).
  • NONE — the format has no comment syntax that round-trips; comment writes are accepted in memory but not emitted, and the data is never corrupted (JSON).

Code that calls setComment(...) is portable across all four codecs. A NONE codec drops the comment on write, a LOSSLESS codec round-trips it, and a LOSSY codec round-trips it where the overlay can address its position. The comment overlay lives on Config, independent of the codec.
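
The three levels can be pictured as a write-time filter over the path-keyed comment overlay. The sketch below is a self-contained model of that idea, not the library's implementation: `FidelitySketch`, its nested `Fidelity` enum (a stand-in for `CommentFidelity`), `emittedComments`, and the `addressable` predicate are all hypothetical names introduced for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

final class FidelitySketch {
    // Stand-in mirroring the three levels named above; not the library's own type.
    enum Fidelity { LOSSLESS, LOSSY, NONE }

    /**
     * Returns the subset of a path-keyed comment overlay that a codec with the
     * given fidelity would write out. `addressable` is a hypothetical predicate
     * telling whether the format can place a comment at that path.
     */
    static Map<String, String> emittedComments(Map<String, String> overlay,
                                               Fidelity fidelity,
                                               Predicate<String> addressable) {
        Map<String, String> out = new LinkedHashMap<>();
        if (fidelity == Fidelity.NONE) {
            return out;                 // accepted in memory, never emitted
        }
        for (Map.Entry<String, String> e : overlay.entrySet()) {
            // LOSSLESS keeps everything; LOSSY keeps only addressable positions
            if (fidelity == Fidelity.LOSSLESS || addressable.test(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```

In this model the overlay itself is never modified, which matches the statement that the overlay lives on Config rather than on the codec.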

The codec SPI

A custom format implements Codec (text ⇄ tree + identity) and, to carry comments, CommentAware (the structure emitter + comment parser). ObjectMapperAware exposes the shared mapper for binding.

public interface Codec {
    String formatId();
    String[] fileExtensions();          // lowercase, no dot; first = canonical
    CommentFidelity commentFidelity();
    default Charset charset() { return StandardCharsets.UTF_8; }  // bytes <-> text for the back-store
    JsonNode readTree(String text);     // text -> canonical tree (unknown keys survive)
    String writeTreePlain(JsonNode t);  // tree -> text, no structural comments
    <V> V treeToValue(JsonNode n, JavaType t);
    JsonNode valueToTree(Object v);
}

The emitter renders document structure itself and delegates only leaf values to the mapper — it never re-parses the mapper's output, so a custom mapper can restyle a value without breaking layout or comments.
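
To illustrate that division of labor, here is a minimal self-contained sketch in which the emitter owns keys, indentation, and comments, and a pluggable function supplies only the text of each leaf. It is an assumption-laden model: `EmitterSketch`, `emit`, and the `leaf` function are hypothetical, it uses plain `Map` values where the real SPI works on `JsonNode`, and the YAML-like output is only a stand-in layout.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class EmitterSketch {
    /** Renders a nested map as indented "key: value" lines with "#" comments keyed by dotted path. */
    static String emit(Map<String, Object> tree, Map<String, String> comments,
                       Function<Object, String> leaf) {
        StringBuilder sb = new StringBuilder();
        emit(tree, "", "", comments, leaf, sb);
        return sb.toString();
    }

    @SuppressWarnings("unchecked")
    private static void emit(Map<String, Object> node, String prefix, String indent,
                             Map<String, String> comments, Function<Object, String> leaf,
                             StringBuilder sb) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            String comment = comments.get(path);
            if (comment != null) {
                sb.append(indent).append("# ").append(comment).append('\n');
            }
            if (e.getValue() instanceof Map) {
                // structure is written by the emitter itself
                sb.append(indent).append(e.getKey()).append(":\n");
                emit((Map<String, Object>) e.getValue(), path, indent + "  ", comments, leaf, sb);
            } else {
                // only the leaf text comes from the pluggable formatter
                sb.append(indent).append(e.getKey()).append(": ")
                  .append(leaf.apply(e.getValue())).append('\n');
            }
        }
    }
}
```

Swapping the `leaf` function (for example, to quote strings) changes how values look but cannot move a key or a comment, because the formatter's output is appended verbatim and never parsed again.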

Resolving a codec by file name

CodecRegistry registry = CodecRegistry.defaults();        // JSON, YAML, TOML, JSONC
Codec c = registry.forFile("server.toml");                // -> TomlCodec
Codec y = registry.byExtension("yml");                    // -> YamlCodec
registry.register(new MyCustomCodec());                   // last registration for an extension wins

Resolution is by extension only; it never content-sniffs, because most JSON documents are also valid YAML and guessing would be ambiguous. An unknown extension throws CodecException.
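
The lookup rules above (extension only, last registration wins, unknown extension fails) can be modelled in a few lines. This is a sketch under stated assumptions rather than the registry's real code: `ExtensionRegistrySketch` is hypothetical, it maps extensions to format ids instead of codec instances, it throws `IllegalArgumentException` as a stand-in for `CodecException`, and lower-casing the extension on lookup is an assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class ExtensionRegistrySketch {
    private final Map<String, String> formatByExtension = new HashMap<>();

    /** Registers a format for each extension; a later registration overwrites an earlier one. */
    void register(String formatId, String... extensions) {
        for (String ext : extensions) {
            formatByExtension.put(ext.toLowerCase(Locale.ROOT), formatId);
        }
    }

    /** Resolves by the text after the last dot; the file content is never inspected. */
    String forFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String formatId = formatByExtension.get(ext);
        if (formatId == null) {
            // stand-in for the CodecException mentioned above
            throw new IllegalArgumentException("no codec registered for extension '" + ext + "'");
        }
        return formatId;
    }
}
```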

Changing format at runtime & the in-memory codec

A file-backed Config is not pinned to the codec it opened with:

Config cfg = Config.open("server.yml");   // live codec = YamlCodec

cfg.save(new TomlCodec());                 // one-shot: write TOML to the same file; live codec stays YAML
cfg.changeCodec(new JsonCodec());          // switch the format used by every subsequent save()
cfg.save();                                // now emits JSON

Neither call renames the file. Writing a format the extension doesn't imply is your call, but a later extension-inferred Config.open would then resolve the wrong codec. Going from a comment-bearing codec to JSON (NONE) means comments are no longer written from the next save onward.
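
The difference between the one-shot save and changeCodec comes down to which call mutates the live codec. The following toy state model makes that explicit; `LiveCodecSketch` and its string-valued "formats" are hypothetical, and it assumes that changeCodec performs no write of its own.

```java
final class LiveCodecSketch {
    private String liveFormat;     // format used by the no-argument save()
    private String lastWritten;    // format of the most recent write, kept for inspection

    LiveCodecSketch(String openedWith) {
        this.liveFormat = openedWith;
    }

    /** Writes with the live format. */
    void save() {
        lastWritten = liveFormat;
    }

    /** One-shot write with another format; the live format is left untouched. */
    void save(String oneShotFormat) {
        lastWritten = oneShotFormat;
    }

    /** Switches the live format; assumed here to write nothing by itself. */
    void changeCodec(String newFormat) {
        liveFormat = newFormat;
    }

    String liveFormat() { return liveFormat; }

    String lastWritten() { return lastWritten; }
}
```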

InMemoryCodec (formatId "memory") backs Config.inMemory(): it carries the storage-safe Jackson mapper, so the full typed/POJO flow works, but it declares no file extensions and CommentFidelity.NONE, so it is never registered and never chosen by file-name inference. Its readTree/writeTreePlain throw — an in-memory config has no text format until you save(realCodec) or changeCodec(realCodec). See Lifecycle, Reload & Watching for Config.inMemory().

Format-specific notes

YAML (LOSSLESS)

Block, side, header, and footer comments round-trip, as does key order. Long scalars stay on one line (line splitting is disabled), and no leading --- marker is written. The comment parser is line-based and covers common block-style YAML; exotic constructs (# inside a | block scalar, multi-line flow) are not specially handled.

A per-element block comment on a scalar list item round-trips, addressed by the item's dotted index path (list.0, list.2, …). This is YAML only; TOML, JSON, and JSONC do not carry per-list-item comments.

config.setValue("tags", Arrays.asList("alpha", "beta"));
config.setComment("tags.0", "the primary tag");   // block comment above the first item

tags:
  # the primary tag
  - alpha
  - beta

Only a scalar sequence carries per-element comments; an object/nested sequence renders whole. A scalar sequence with no element comment is emitted unchanged, so this never alters an existing layout.
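
A rough model of that rendering rule: each item of a scalar list is written in block style, and a comment stored under the item's dotted index is placed on the line above it. `YamlListCommentSketch` and `render` are hypothetical names, and the sketch skips scalar quoting and nested indentation that a real emitter would need.

```java
import java.util.List;
import java.util.Map;

final class YamlListCommentSketch {
    /** Renders a scalar list in block style, placing any comment stored under "key.index" above that item. */
    static String render(String key, List<String> items, Map<String, String> comments) {
        StringBuilder sb = new StringBuilder(key).append(":\n");
        for (int i = 0; i < items.size(); i++) {
            String comment = comments.get(key + "." + i);
            if (comment != null) {
                sb.append("  # ").append(comment).append('\n');
            }
            sb.append("  - ").append(items.get(i)).append('\n');
        }
        return sb.toString();
    }
}
```

With an empty comment map the output contains only the list items, which corresponds to the guarantee that a list without element comments is emitted unchanged.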

JSON (NONE)

Strict, pretty-printed RFC JSON. Explicit nulls are kept (they are user data). Comments are never emitted.

TOML (LOSSLESS)

Nested objects become [table.path] sections (a table's own scalars are emitted before its sub-tables, as TOML requires); # comments round-trip. TOML has no null — a null value is omitted on write (see FAQ & Gotchas). A very large integer the TOML reader can't round-trip is stored as a quoted string so it still reads back as the same digits.
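
The two scalar rules in this paragraph (omit null, quote integers that are too large) might look like the sketch below. `TomlScalarSketch` and `renderEntry` are hypothetical, string escaping is ignored, and the cutoff is an assumption: the sketch quotes anything outside the signed 64-bit range, which is the range the TOML specification asks parsers to handle losslessly; the library's actual threshold may differ.

```java
import java.math.BigInteger;

final class TomlScalarSketch {
    /** Returns one "key = value" line, or an empty string when the value is null. */
    static String renderEntry(String key, Object value) {
        if (value == null) {
            return "";                                   // TOML has no null: the key is omitted
        }
        if (value instanceof BigInteger) {
            BigInteger n = (BigInteger) value;
            boolean fitsInLong = n.bitLength() <= 63;    // signed 64-bit range (assumed cutoff)
            String text = fitsInLong ? n.toString() : "\"" + n + "\"";
            return key + " = " + text + "\n";
        }
        if (value instanceof String) {
            return key + " = \"" + value + "\"\n";       // no escaping in this sketch
        }
        return key + " = " + value + "\n";
    }
}
```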

A list whose every element is a non-empty object (e.g. a List<POJO>) emits as idiomatic repeated [[path]] array-of-tables blocks; a scalar, empty, or mixed-element array stays inline. The read side and the canonical tree are identical either way — this is purely a write-time rendering choice.

# a scalar array stays inline; root-level keys come before any table
ports = [25565, 25566]

# a List<Server> renders as array-of-tables
[[servers]]
host = "a.example"
port = 25565

[[servers]]
host = "b.example"
port = 25566
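
The write-time choice between [[path]] blocks and an inline array reduces to one predicate over the list. A minimal sketch, assuming objects are represented as `Map` instances; `TomlArrayStyleSketch` and `rendersAsArrayOfTables` are hypothetical names.

```java
import java.util.List;
import java.util.Map;

final class TomlArrayStyleSketch {
    /** True when every element is a non-empty object, i.e. the list qualifies for [[path]] blocks. */
    static boolean rendersAsArrayOfTables(List<?> list) {
        if (list.isEmpty()) {
            return false;                   // an empty array stays inline
        }
        for (Object element : list) {
            if (!(element instanceof Map) || ((Map<?, ?>) element).isEmpty()) {
                return false;               // scalar, empty object, or mixed content
            }
        }
        return true;
    }
}
```

Because the predicate only selects a rendering, both branches must parse back to the same tree, as the paragraph above notes.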

JSONC (LOSSY)

JSON plus // (and /* */) comments and trailing commas on read; // block/side comments on write. Comment positions the overlay can't address (e.g. between array elements) are not preserved — hence LOSSY.

→ See also Architecture Overview · FAQ & Gotchas
