Releases: aestheticfunction/dspack-gen
M2 — eval harness + findings (hosted path blocked; ADR-D1 measured)
Milestone 2: the eval harness, its findings, and the hardening the live runs forced.
- Eval harness (PR-10): matrix runner over the same `runPipeline` as the CLI/demo; per-cell metrics with the two failure classes never blended; ADR-7 repair-template variant; deterministic fake-matrix CI gate; `--resume`; per-run crash containment with explicit `error` outcomes.
- Hardening bought with real failures: typed adapter transport errors (#15), emitter refusal as `failed-gate` (#16, found live by containment), Ollama timeouts raised past undici's 300s ceiling (#17).
- Findings (`docs/findings.md`, evidence: `docs/evidence/2026-07-03-eval-rerun/`, 216 audit reports): effectively a two-local-model study, because the hosted column produced zero governance observations: all 72 runs failed pre-generation with a 400 (compiled grammar too large; the hosted twin of the S0 mlx finding; matched-schema design filed as #20). Local (n=72 each, 0 end-to-end passes): the ADR-D1 projection gap is the entire gate-failure story (60/60 cross-local-model, ordinary prompts) → v0.4 `requiredProps` headline candidate; repair-shape reading confounded (deconfound owed); the take-2 n=1 repair did not replicate (0/6).
- ds-mcp gained `get-generation-context`/`validate-ui` (PR-11) via this repo's zero-network core, git-pinned.
- M3 backlog: #19 (pre-generation failed-adapter classification), #20 (matched-schema hosted design), plus the plan's filed M3 agenda.
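To make the harness bullet concrete, here is a minimal TypeScript sketch of a matrix runner with per-run crash containment and resume. It is an illustration under stated assumptions, not the repo's code: every name (`Outcome`, `RunSpec`, `runMatrix`, `tally`) is hypothetical, and it assumes the two failure classes are gate failures (`failed-gate`) and adapter failures (`failed-adapter`). A thrown exception becomes an explicit `error` outcome instead of aborting the matrix, and previously recorded runs are skipped, which is roughly what `--resume` implies.

```typescript
// Hypothetical sketch: names are illustrative, not dspack-gen's API.

// Assumed outcome vocabulary; "failed-gate" and "failed-adapter" are kept
// as separate kinds so governance and transport failures never blend.
type Outcome =
  | { kind: "passed" }
  | { kind: "failed-gate"; gate: string }
  | { kind: "failed-adapter"; reason: string }
  | { kind: "error"; message: string }; // harness crash, contained per run

interface RunSpec {
  model: string;
  prompt: string;
  variant: string; // e.g. a repair-template variant
  rep: number; // repetition index within the cell
}

type RunFn = (spec: RunSpec) => Promise<Outcome>;

const runKey = (s: RunSpec): string =>
  `${s.model}|${s.prompt}|${s.variant}|${s.rep}`;

// Runs every spec sequentially. Specs already present in `done` are skipped
// (resume); a thrown exception is recorded as an explicit "error" outcome
// instead of aborting the whole matrix.
async function runMatrix(
  specs: RunSpec[],
  run: RunFn,
  done: Map<string, Outcome> = new Map(),
): Promise<Map<string, Outcome>> {
  const results = new Map(done);
  for (const spec of specs) {
    const key = runKey(spec);
    if (results.has(key)) continue;
    try {
      results.set(key, await run(spec));
    } catch (e) {
      results.set(key, {
        kind: "error",
        message: e instanceof Error ? e.message : String(e),
      });
    }
  }
  return results;
}

// Counts outcomes by kind; each failure class keeps its own counter.
function tally(outcomes: Outcome[]): Record<Outcome["kind"], number> {
  const counts: Record<Outcome["kind"], number> = {
    passed: 0,
    "failed-gate": 0,
    "failed-adapter": 0,
    error: 0,
  };
  for (const o of outcomes) counts[o.kind]++;
  return counts;
}
```

Calling `tally` on one cell's runs gives per-cell numbers; keeping a separate counter per kind means a transport failure can never be read as a governance result.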
Companion M2 tags: ds-mcp@m2, dspack@m2, dspack-emit@m2.
M1 — the governed-generation pipeline (vertical slice)
Milestone 1 of the AF governance layer: the thesis-proving vertical slice, end to end and machine-gated.
- Pipeline: contract → compiled context (`src/core/compiler.ts`) → schema-constrained generation (Ollama structured outputs / Anthropic; model ids are configuration) → surface gates S1/S2/S3 (`src/core/lint/`) → bounded repair (ADR-7: one findings object, two serializations) → A2UI emission via the pinned emitter → gates A1–A3 → audit report v1 (`schemas/audit-report.v1.schema.json`, guarantees in `docs/AUDIT.md`).
- Flagship gate: `npm run demo:e2e` (Playwright, deterministic scripted adapter) asserts the full on-screen trail: violation with verbatim rationale → exact repair message → verified-clean attempt → A1–A3 green → rendered AlertDialog → schema-valid downloaded report.
- Evidence (the write-up's raw material): `docs/spike-structured-outputs.md` (S0 + model refresh; the engine-scoped MLX silent-`format` finding with minimal repro) and `docs/evidence/2026-07-02-flagship-candidates{,-post-example-link}/`, 22 audit reports including the clean-negative example-link A/B. Public framing rule: failures are named with the report's own vocabulary (`failed-lint-exhausted`).
- Open M2 threads: #7 (contract-copy drift prevention), the json-render emitter (PR-9), eval harness (PR-10, including repair-shape matrix design input), ds-mcp tools (PR-11).
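The bounded-repair step (ADR-7: one findings object, two serializations) can be sketched as follows. This is a hypothetical TypeScript illustration, not the repo's implementation: all names are invented for the example, it assumes one serialization is the repair message sent back to the model and the other is the JSON kept for the audit trail, and `maxRepairs = 1` is an assumed bound. If the bound is spent with findings still open, the result uses the report vocabulary `failed-lint-exhausted`.

```typescript
// Hypothetical sketch of a bounded repair loop; names are illustrative.

interface Finding {
  rule: string; // e.g. a surface gate id such as "S1"
  path: string; // where in the candidate the violation sits
  rationale: string; // rationale text, carried verbatim
}

// Serialization 1 (assumed): the repair message sent back to the model.
function toRepairMessage(findings: Finding[]): string {
  return findings.map((f) => `- ${f.rule} at ${f.path}: ${f.rationale}`).join("\n");
}

// Serialization 2 (assumed): machine-readable JSON for the audit trail.
function toAuditJson(findings: Finding[]): string {
  return JSON.stringify(findings);
}

type Generate = (repairMessage?: string) => Promise<unknown>;
type Lint = (candidate: unknown) => Finding[];

interface Attempt {
  findings: Finding[];
  audit: string;
}

type RepairResult =
  | { outcome: "clean"; candidate: unknown; attempts: Attempt[] }
  | { outcome: "failed-lint-exhausted"; attempts: Attempt[] };

// One initial generation plus at most `maxRepairs` repair attempts.
// Both serializations are derived from the same `findings` array.
async function generateWithRepair(
  generate: Generate,
  lint: Lint,
  maxRepairs = 1, // assumed bound, not taken from the repo
): Promise<RepairResult> {
  const attempts: Attempt[] = [];
  let repairMessage: string | undefined;
  for (let i = 0; i <= maxRepairs; i++) {
    const candidate = await generate(repairMessage);
    const findings = lint(candidate);
    attempts.push({ findings, audit: toAuditJson(findings) });
    if (findings.length === 0) {
      return { outcome: "clean", candidate, attempts };
    }
    repairMessage = toRepairMessage(findings);
  }
  return { outcome: "failed-lint-exhausted", attempts };
}
```

Deriving both strings from a single `findings` array is one way to keep the model-facing repair message and the audited record from drifting apart.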
Companion M1 tags: dspack@m1, dspack-to-a2ui@m1. The governing plan document (revision records 1–3) is maintained by the project owner.