Summary
Define a versioned alignment-layer JSON contract and ship a thin CLI / service wrapper
so TRACE can be consumed as a swappable adapter by an external platform (e.g. a Postgres /
Django / Node scholarly-editions platform à la qumran-digital) without importing the Python
package. This is the smallest, highest-leverage item for the platform integration use case.
Motivation
TRACE round-trips its own Pydantic result models to JSON, but it is library-only — no CLI, no
service surface. The target platform architecture treats each processing step (OCR, chunking,
alignment, reuse) as an adapter behind a stable, versioned JSON contract over an immutable,
character-addressable base text, with provenance (which adapter + version produced each layer)
as a first-class field. For TRACE to be the alignment adapter, two things are needed:
- a published, versioned alignment-layer schema (standoff: references to base-text
offsets, matches with reason tags, witness/segment identifiers, trace_version +
language_pack_version) — decoupled from internal model churn; and
- a thin invocation surface (CLI reading stdin/files, emitting the contract JSON) so a
non-Python platform can shell out or call a small service.
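To make the standoff idea concrete, here is a minimal sketch of what one alignment-layer document could look like. Every field name (`schema_version`, `provenance`, `base_text`, `matches`, `reason`, and so on), the identifiers, and the version strings are assumptions for illustration, not the published contract. Offsets are taken to be half-open Python string indices, which is also an assumption.

```python
import hashlib
import json

# Hypothetical base text; the platform would store it once and reference it by ID.
BASE_TEXT = "In the beginning God created the heaven and the earth."


def make_layer(base_text: str) -> dict:
    """Build an illustrative standoff alignment layer (all field names are assumptions)."""
    start = base_text.index("God")
    end = start + len("God")  # half-open range [start, end)
    return {
        "schema_version": "0.1.0",  # versioned independently of the package
        "provenance": {
            "adapter": "trace",
            "trace_version": "0.0.0",          # placeholder value
            "language_pack_version": "0.0.0",  # placeholder value
        },
        # The base text is referenced, never embedded; the hash lets a consumer
        # detect that the offsets were computed against a different text.
        "base_text": {
            "id": "base-001",
            "sha256": hashlib.sha256(base_text.encode("utf-8")).hexdigest(),
        },
        "matches": [
            {
                "base": {"start": start, "end": end},
                "witness_id": "witness-A",
                "segment_id": "seg-1",
                "witness": {"start": 0, "end": 3},
                "reason": "exact",  # per-match reason tag
            }
        ],
    }


layer = make_layer(BASE_TEXT)
print(json.dumps(layer, indent=2))
```

A consumer resolves a match by slicing its own copy of the base text with the stored offsets, so the layer stays valid only as long as the base text is immutable.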
Scope
- Schema: a documented JSON Schema for the alignment layer (pairwise and multi-witness),
explicitly standoff (offsets into a referenced base text, not embedded text), carrying
per-match reason tags and full provenance/version stamps. Version the schema independently of
the package version.
- CLI: a tracealign console entry point with, at minimum, align and align-multi subcommands
that read token/text input and emit contract JSON; supports --lang, config flags, and --version.
- Round-trip + validation: loaders that validate input against the schema and fail loudly on
drift.
- Out of scope: a long-running HTTP server (note it as a possible follow-up; CLI is enough to be
a subprocess adapter).
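The CLI and loader items above could be sketched roughly as follows, using only the standard library. This is not TRACE's implementation: the required keys, the `ContractError` class, the major-version compatibility rule, and the stubbed alignment step are all assumptions, and a real loader would more likely validate against the published JSON Schema.

```python
import argparse
import json
import sys
from pathlib import Path

SCHEMA_VERSION = "0.1.0"  # hypothetical contract version
TOOL_VERSION = "0.0.0"    # placeholder package version
REQUIRED_KEYS = {"schema_version", "provenance", "base_text", "matches"}  # assumed


class ContractError(ValueError):
    """Raised when a document drifts from the expected contract."""


def load_layer(raw: str) -> dict:
    """Parse a layer document and fail loudly on missing keys or a major-version mismatch."""
    doc = json.loads(raw)
    if not isinstance(doc, dict):
        raise ContractError("top-level value must be a JSON object")
    missing = REQUIRED_KEYS - doc.keys()
    if missing:
        raise ContractError(f"missing keys: {sorted(missing)}")
    if str(doc["schema_version"]).split(".")[0] != SCHEMA_VERSION.split(".")[0]:
        raise ContractError(f"incompatible schema_version: {doc['schema_version']}")
    return doc


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="tracealign")
    parser.add_argument(
        "--version", action="version",
        version=f"%(prog)s {TOOL_VERSION} (schema {SCHEMA_VERSION})",
    )
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("align", "align-multi"):
        p = sub.add_parser(name)
        p.add_argument("--lang", default=None, help="language pack to use")
        p.add_argument("inputs", nargs="*", help="input files; stdin when omitted")
    return parser


def main(argv=None, stdin=None, stdout=None) -> int:
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    args = build_parser().parse_args(argv)
    # Read each input file, or fall back to stdin when no files are given.
    texts = [Path(p).read_text(encoding="utf-8") for p in args.inputs] or [stdin.read()]
    # The actual alignment call is stubbed out; only the output envelope is shown.
    layer = {
        "schema_version": SCHEMA_VERSION,
        "provenance": {
            "adapter": "trace",
            "trace_version": TOOL_VERSION,
            "language_pack_version": "0.0.0",  # placeholder value
            "command": args.command,
            "lang": args.lang,
            "input_count": len(texts),
        },
        "base_text": {"id": None},  # would reference the platform's base text
        "matches": [],
    }
    json.dump(layer, stdout)
    stdout.write("\n")
    return 0
```

A `[project.scripts]` entry in pyproject.toml would point the tracealign command at `main`; the module path for that entry depends on the package layout and is left open here. Passing `stdin` and `stdout` as parameters keeps the function testable without spawning a subprocess.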
Acceptance criteria
- tracealign CLI entry point in pyproject.toml; tracealign align ... and align-multi ... emit schema-valid JSON.
- flake8 clean.
Roadmap: enabling infrastructure for downstream platform integration (qumran-digital-style);
unblocks TRACE-as-adapter independent of later linguistic stages.