Skip to content
Joshua Shinavier edited this page Jun 4, 2026 · 4 revisions

Design rationale

This page records deliberate design decisions in Hydra — the "why we chose this, and not the obvious alternative" behind aspects of the language that a reader might reasonably second-guess.

It is distinct from Concepts, which describes the model as it is. This page answers why. When a choice here is settled, the relevant Concepts/section links to it rather than repeating the justification inline.

Entries are settled decisions, not open questions. In-flight design exploration belongs in GitHub issues; once a decision is made and stable, summarize the rationale here and link the issue.


Ordered map keys

Hydra requires the key type of a map to be orderable. This is a deliberate design choice, not an inherited limitation: every term in Hydra — including a map term — needs a canonical representation that survives translation into any target language. A total order on keys gives a map a single canonical form (a deterministic key/value sequence) that every host reproduces identically, regardless of how that host implements maps internally.

In practice this excludes almost nothing. Hydra terms have a canonical order for all first-order shapes, so the only key types ruled out are those with no canonical identity at all — chiefly functions, which are excluded for a stronger reason than non-orderability: function equality is undecidable, so a function couldn't serve as a key even in an unordered (hash-based) map.

Two alternatives are sometimes raised — keys ordered by a supplied ordering function (rather than a natural order on the type), or merely-hashable keys paired with a hashing function. Both are perfectly constructible in Hydra today (e.g. a list of key/value pairs plus an ordering or hashing function), but neither is a fundamental constructor in the language, because the natural-order map already provides the canonical, translation-stable representation Hydra needs.

Discussion: issue #230.


Union introduction as inject

The constructor that builds a value of a union type (selecting one variant and supplying its payload) is called inject. This is standard typed-lambda-calculus terminology: inject is the introduction form for a sum type, dual to project (the elimination form for a product/record). Alternatives like variant or construct are sometimes proposed, but inject/project is the established, symmetric pair and is kept deliberately.


Annotations as a separate type and term variant

Hydra makes annotated a distinct variant of Type and Term (a record pairing an inner type/term with an annotation Term/Type carrying its metadata), rather than baking metadata fields into every variant or carrying a parallel metadata map alongside each module.

The annotation field is typed as Term (resp. Type) since #386; the canonical encoding for what host code thinks of as a Map<Name, Term> is a TermMap keyed by TermVariables, bridged by the kernel helpers wrapAnnotationMap and getAnnotationMap. Treating the annotation slot as a Term rather than a fixed Map<Name, Term> lets specialised users substitute a different representation (e.g. a list of pairs, an injection into a richer ADT) without re-shaping every annotated node.

This choice has three load-bearing consequences:

  • Stacking. Multiple annotation layers can be applied to the same inner type or term by nesting annotated nodes; tools that need to reason about a particular annotation key can peel layers from the outside without worrying about which inner type or term they apply to.
  • Small grammar. Every non-annotation variant has exactly the fields its semantic role requires. Adding a new annotation key never modifies the grammar of Type or Term — it just goes into the annotation Term of an enclosing annotated node.
  • Easy traversal of the unannotated core. A traversal that doesn't care about metadata can call a single helper (kernel Annotations.deannotateType / …Term) once at the top and then walk the variants directly. The alternative — fields-on- every-variant — would require every traversal to ignore those fields explicitly, multiplied across every host.

The accepted cost is the "peel through annotations" step that consumers of the annotated form have to perform; in practice this is one helper call per traversal and is mechanical.


Coders bundle encode and decode

A Coder in Hydra pairs an encoder (source → target) with a decoder (target → source) in a single value, rather than letting them be authored as two independent functions. Alternatives are common — many serialization stacks ship an encoder and a decoder as separate modules, sometimes from different authors.

The bundled form is preferred for three reasons:

  • Round-trip identity is a single-source claim. When an author writes a Coder, they are simultaneously asserting the pair is consistent. Round-trip tests are then a property of the Coder rather than a cross-module coincidence: decoding the result of encoding a value should yield the original (modulo a documented lossiness flag).
  • One Coder per format pair. A reader who wants to know how Hydra exchanges with Avro looks at one place, not two. Format-pair-specific invariants (which fields encode where, how nominal types map, what's lossless and what isn't) live together with the code that enforces them.
  • Coder composition is closed. Two Coders composing into a third (Avro ↔ Hydra ↔ Protobuf, for instance) is well-defined because each side is itself bidirectional. A composition of two unidirectional encoders would give you a one-way Avro → Protobuf path with no symmetric way home.

The accepted cost is that authoring a Coder is harder than authoring an encoder alone — the author has to think about the inverse direction even when only one is immediately needed. In practice this cost is paid once per format, not per use.


Adapters as the type-aware layer above Coders

A Coder transforms values between two representations that already share a type schema. An Adapter adds the type-level half: source and target types, a Coder over those types' values, and an isLossy flag. The two frameworks are deliberately kept separate.

The rationale:

  • Value-level transformations don't always need a type-level mapping. Within a single type system, encode/decode is purely value-level; folding type-level fields into every Coder would burden the simpler case.
  • Cross-language coding inherently needs both. Adapting a Hydra record into a Java class involves two different type-level objects bound together with a value-level Coder and a lossiness claim. The Adapter is where both halves meet.
  • Composition has different rules at each level. Coders compose pointwise (Avro ↔ Hydra ↔ Protobuf at the value level). Adapters compose more carefully — both type-level mappings must compose, and the combined lossiness flag is the disjunction of the pair's.

The accepted cost is one more abstraction to learn before writing a cross-language coder.


Coders per target language

Each target language has its own coder, packaged separately: hydra-java, hydra-python, hydra-scala, hydra-lisp, hydra-typescript, hydra-go, hydra-coq, hydra-wasm. There is no global "all targets" coder.

This mirrors the modular package structure already chosen for everything else in Hydra. A change to one coder is bounded to that coder's package and its downstream consumers, not the universe of every coder Hydra knows how to emit. A new target language is added by writing one new coder package, not by extending a central registry. The set of supported target languages can be discovered by listing the coder packages, rather than by inspecting a hardcoded list in a central module.

The per-language packaging also lets Hydra's composability principle operate at the package level: coders compose. The build orchestration in bin/sync.sh is one application — piping Hydra modules through the JSON coder (in hydra-kernel) and then through the per-target coders — but the per-language coder packages are also independently usable by downstream applications that want to compose them with their own coders.


Topological-sort emission

Hydra emits code in topological-sort order over the module-dependency DAG: each target module appears in the output stream only after every module it depends on. This is host-agnostic: Haskell and Java tolerate forward references via the build system, but Lisp dialects, certain Python flavors, and WASM-style emission don't, so topo-sort is the granularity that works in every host Hydra targets.

The downstream consequence is that emission can stream — a host's coder processes one module at a time and never has to look ahead, since every name the current module mentions is either already defined or is a known primitive.


Explicit Either (and Maybe) error handling

Hydra's kernel surfaces failures through explicit return types. Operations that produce a value on success return Either SomeError a; checks that only report a finding return Maybe a, where a is the finding type (e.g. Maybe InvalidTermError to mean "either there is an invalidity to report or there isn't"). The concrete result and error types vary by subsystem. The kernel does not use exceptions, monad-transformer stacks, Result<T, E>-style sugar, or any language-specific error-handling idiom.

The reason is translingual portability. The kernel runs in every host Hydra targets, and the error mechanism has to translate cleanly into all of them with no host-specific runtime support.

  • Exceptions are host-divergent. Each host has its own model (checked vs unchecked, condition restarts in Lisp, panics in Go, etc.) and what catches what depends on the host's call-stack semantics.
  • Monad stacks have no portable representation. A StateT (ExceptT Error) is meaningful in Haskell and unfaithful elsewhere; not every host has Applicative/Monad as a type-level abstraction.
  • Result<T, E> is syntactic sugar over the same sum. Hydra's Either is the language-neutral form; per-host coders render it as Result / Either / Maybe / Option / tagged union as appropriate.

Subsystems that need threaded state — notably inference, via InferenceContext — pass it as an explicit parameter rather than concealing it in a monad. The accepted cost is verbosity at every call site; the gain is a single error-handling pattern that the entire kernel and every coder share.


Module-level dependencies

Each Hydra module carries a dependencies field — the set of other modules it references. Module-level dependency information is used by:

  • Incremental sync. The dirty-set closure walks reverse dependency edges: when a module's source changes, every module that depends on it (transitively) must be re-inferred.
  • Per-host code emission. Each coder consumes the dependency set to generate the host's import / use / require statements at the top of every emitted file.

Both consumers run at sync time. Without persisted module-level dependency information, each consumer would have to scan every module's term and type bodies — an O(n · m) walk per consumer per emission, where n is the number of definitions in the module and m is the average size of a definition's term-or-type tree. With the field, that scan happens once per module and is amortized against every later read.

The same caching argument applies as for inference: term and type analyses are expensive, and we persist the results in the JSON layer rather than recomputing them at every consumer.

Derivation of the field is tracked in #419.

Clone this wiki locally