Skip to content

Versioned alignment-layer JSON contract + thin CLI/service wrapper (adapter surface) #19

Description

@bsesic

Summary

Define a versioned alignment-layer JSON contract and ship a thin CLI / service wrapper
so TRACE can be consumed as a swappable adapter by an external platform (e.g. a postgres /
Django / Node scholarly-editions platform à la qumran-digital) without importing the Python
package. Smallest, highest-leverage item for the platform integration use case.

Motivation

TRACE round-trips its own Pydantic result models to JSON, but it is library-only — no CLI, no
service surface. The target platform architecture treats each processing step (OCR, chunking,
alignment, reuse) as an adapter behind a stable, versioned JSON contract over an immutable,
character-addressable base text, with provenance (which adapter + version produced each layer)
as a first-class field. For TRACE to be the alignment adapter, two things are needed:

  1. a published, versioned alignment-layer schema (standoff: references to base-text
    offsets, matches with reason tags, witness/segment identifiers, trace_version +
    language_pack_version) — decoupled from internal model churn; and
  2. a thin invocation surface (CLI reading stdin/files, emitting the contract JSON) so a
    non-Python platform can shell out or call a small service.

Scope

  • Schema: a documented JSON Schema for the alignment layer (pairwise and multi-witness),
    explicitly standoff (offsets into a referenced base text, not embedded text), carrying
    per-match reason tags and full provenance/version stamps. Version the schema independently of
    the package version.
  • CLI: tracealign console entry point — at minimum align and align-multi subcommands
    reading token/text input and emitting contract JSON; --lang, config flags, --version.
  • Round-trip + validation: loaders that validate input against the schema and fail loudly on
    drift.
  • Out of scope: a long-running HTTP server (note it as a possible follow-up; CLI is enough to be
    a subprocess adapter).

Acceptance criteria

  • Published JSON Schema file in-repo, versioned, with docs and at least one golden example per layer.
  • tracealign CLI entry point in pyproject.toml; tracealign align ... and align-multi ... emit schema-valid JSON.
  • Output is standoff (offsets reference a base text; text not duplicated into the alignment layer) and carries adapter/version provenance.
  • Schema-validation tests; CLI smoke tests; existing JSON round-trip still passes.
  • TDD; suite green on 3.10/3.11/3.12; flake8 clean.

Roadmap: enabling infrastructure for downstream platform integration (qumran-digital-style);
unblocks TRACE-as-adapter independent of later linguistic stages.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions