Releases: aestheticfunction/dspack-gen

M2 — eval harness + findings (hosted path blocked; ADR-D1 measured)

@ryandmonk released this 03 Jul 16:54
b312d50

Milestone 2: the eval harness, its findings, and the hardening the live runs forced.

  • Eval harness (PR-10): matrix runner over the same runPipeline as the CLI/demo; per-cell metrics with the two failure classes never blended; ADR-7 repair-template variant; deterministic fake-matrix CI gate; --resume; per-run crash containment with explicit error outcomes.
  • Hardening bought with real failures: typed adapter transport errors (#15), emitter refusal as failed-gate (#16, found live by containment), Ollama timeouts raised past undici's 300s ceiling (#17).
  • Findings (docs/findings.md, evidence: docs/evidence/2026-07-03-eval-rerun/, 216 audit reports): in effect a two-local-model study, because the hosted column produced zero governance observations (all 72 runs failed pre-generation with a 400: compiled grammar too large; the hosted twin of the S0 MLX finding; matched-schema design filed as #20). Local (n=72 each, 0 end-to-end passes): the ADR-D1 projection gap is the entire gate-failure story (60/60 cross-local-model, ordinary prompts) → v0.4 requiredProps headline candidate; repair-shape reading confounded (deconfound owed); the take-2 n=1 repair did not replicate (0/6).
  • ds-mcp gained get-generation-context/validate-ui (PR-11) via this repo's zero-network core, git-pinned.
  • M3 backlog: #19 (pre-generation failed-adapter classification), #20 (matched-schema hosted design), plus the plan's filed M3 agenda.
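
The harness bullet above describes per-run crash containment with explicit error outcomes, and metrics that never blend the two failure classes. A minimal sketch of that idea follows. Every name in it (Outcome, Cell, runCell, summarize) is hypothetical, not taken from the dspack-gen source. It also assumes the two classes are gate failures and adapter/transport errors, which the notes do not spell out, and it is synchronous for brevity.

```typescript
// Hypothetical types for illustration; none of these names come from dspack-gen.
type Outcome =
  | { kind: "pass" }
  | { kind: "failed-gate"; gate: string } // pipeline ran, a gate rejected the output
  | { kind: "error"; message: string };   // run produced no governance observation

interface Cell { model: string; prompt: string }

interface CellMetrics { runs: number; passes: number; gateFailures: number; errors: number }

// Run one matrix cell `runs` times. A throw from `runOnce` is contained and
// recorded as an explicit error outcome instead of aborting the whole matrix.
function runCell(
  cell: Cell,
  runs: number,
  runOnce: (cell: Cell, index: number) => Outcome,
): Outcome[] {
  const outcomes: Outcome[] = [];
  for (let i = 0; i < runs; i++) {
    try {
      outcomes.push(runOnce(cell, i));
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      outcomes.push({ kind: "error", message });
    }
  }
  return outcomes;
}

// Tally with separate counters so gate failures and errors are never summed
// into a single "failed" number (assumed meaning of "never blended").
function summarize(outcomes: Outcome[]): CellMetrics {
  const m: CellMetrics = { runs: outcomes.length, passes: 0, gateFailures: 0, errors: 0 };
  for (const o of outcomes) {
    if (o.kind === "pass") m.passes++;
    else if (o.kind === "failed-gate") m.gateFailures++;
    else m.errors++;
  }
  return m;
}

// Usage with a scripted fake: one pass, one gate failure, one contained crash.
const fakeRun = (_cell: Cell, index: number): Outcome => {
  if (index === 0) return { kind: "pass" };
  if (index === 1) return { kind: "failed-gate", gate: "S1" };
  throw new Error("transport: connection reset");
};
const metrics = summarize(runCell({ model: "fake", prompt: "p" }, 3, fakeRun));
console.log(metrics);
```

Keeping errors in their own counter means a column that never reaches generation, like the hosted one described above, shows up as zero observations rather than as gate failures.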

Companion M2 tags: ds-mcp@m2, dspack@m2, dspack-emit@m2.

M1 — the governed-generation pipeline (vertical slice)

@ryandmonk ryandmonk released this 02 Jul 22:12
1073c27

Milestone 1 of the AF governance layer: the thesis-proving vertical slice, end to end and machine-gated.

  • Pipeline: contract → compiled context (src/core/compiler.ts) → schema-constrained generation (Ollama structured outputs / Anthropic; model ids are configuration) → surface gates S1/S2/S3 (src/core/lint/) → bounded repair (ADR-7: one findings object, two serializations) → A2UI emission via the pinned emitter → gates A1–A3 → audit report v1 (schemas/audit-report.v1.schema.json, guarantees in docs/AUDIT.md).
  • Flagship gate: npm run demo:e2e (Playwright, deterministic scripted adapter) asserts the full on-screen trail: violation with verbatim rationale → exact repair message → verified-clean attempt → A1–A3 green → rendered AlertDialog → schema-valid downloaded report.
  • Evidence (the write-up's raw material): docs/spike-structured-outputs.md (S0 + model refresh; the engine-scoped MLX silent-format finding with minimal repro) and docs/evidence/2026-07-02-flagship-candidates{,-post-example-link}/ — 22 audit reports incl. the clean-negative example-link A/B. Public framing rule: failures are named with the report's own vocabulary (failed-lint-exhausted).
  • Open M2 threads: #7 (contract-copy drift prevention), the json-render emitter (PR-9), eval harness (PR-10, incl. repair-shape matrix design input), ds-mcp tools (PR-11).
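
The generate → lint → bounded-repair portion of the pipeline bullet above can be sketched as a small loop. This is an illustration under stated assumptions, not the project's implementation. All identifiers (Finding, Findings, generateWithRepair, toRepairMessage, toJson) are hypothetical, and the repair bound is left as a parameter because the notes do not state it. The split of ADR-7's "two serializations" into a prose repair message and a JSON record is a guess. Only the failed-lint-exhausted status name comes from the notes.

```typescript
// Hypothetical shapes for illustration; not taken from the dspack-gen source.
interface Finding { rule: string; rationale: string }
interface Findings { attempt: number; items: Finding[] }

type Result =
  | { status: "pass"; attempts: number; output: string }
  | { status: "failed-lint-exhausted"; attempts: number; last: Findings };

// Two serializations of the same findings object (assumed split):
// prose for the repair prompt, JSON for a machine-readable record.
const toRepairMessage = (f: Findings): string =>
  f.items.map((x) => `${x.rule}: ${x.rationale}`).join("\n");
const toJson = (f: Findings): string => JSON.stringify(f);

// Generate, lint, and retry with a repair message at most `maxRepairs` times.
// Assumes maxRepairs >= 0, so at least one attempt is always made.
function generateWithRepair(
  generate: (repairMessage: string | null) => string,
  lint: (output: string) => Finding[],
  maxRepairs: number,
): Result {
  let repairMessage: string | null = null;
  let last: Findings = { attempt: 0, items: [] };
  for (let attempt = 1; attempt <= maxRepairs + 1; attempt++) {
    const output = generate(repairMessage);
    const items = lint(output);
    if (items.length === 0) return { status: "pass", attempts: attempt, output };
    last = { attempt, items };
    repairMessage = toRepairMessage(last);
  }
  return { status: "failed-lint-exhausted", attempts: maxRepairs + 1, last };
}

// Usage with a scripted adapter: the first attempt violates a made-up rule,
// the repaired attempt is clean.
const scripted = (msg: string | null): string => (msg === null ? "bad" : "good");
const lintStub = (out: string): Finding[] =>
  out === "bad" ? [{ rule: "example-rule", rationale: "example rationale" }] : [];

const repaired = generateWithRepair(scripted, lintStub, 1);
console.log(repaired.status, repaired.attempts);

const exhausted = generateWithRepair(scripted, lintStub, 0);
if (exhausted.status === "failed-lint-exhausted") console.log(toJson(exhausted.last));
```

Deriving both serializations from one findings object keeps the message sent to the model and the record kept for audit from drifting apart, which is one plausible reading of the ADR-7 note.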

Companion M1 tags: dspack@m1, dspack-to-a2ui@m1. The governing plan document (revision records 1–3) is maintained by the project owner.