Releases: ajaysurya1221/dorian
dorian v1.1.1 — the golden path now blocks broken promises
A small, focused follow-up to v1.1.0. No breaking changes — this changes a single
scaffold default; the warrant format, checker grammar, exit codes, and trust semantics are unchanged.
What changed
dorian init's starter claim is now load-bearing. In v1.1.0 the scaffolded starter claim sealed
as non-load-bearing, so when a later change broke it the warrant folded to DEGRADED (exit 3) —
and the scaffolded GitHub Action (fail_on: revoked) does not block on DEGRADED. So a first-time
user following the golden path would break the promise and watch it silently ship. The starter is
now load-bearing: breaking it folds to REVOKED (exit 4) and the default Action blocks the PR.
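The difference between the two scaffolds can be sketched in a few lines of Python. This is a simplified model built only from the states and exit codes named in this note; `Claim`, `fold`, and `action_blocks` are hypothetical names, not dorian's API, and the real fold policy covers more states than shown here.

```python
from dataclasses import dataclass

# Exit codes quoted in this release note; the table itself is illustrative.
EXIT_CODES = {"WARRANTED": 0, "DEGRADED": 3, "REVOKED": 4}


@dataclass
class Claim:
    name: str
    load_bearing: bool
    broken: bool


def fold(claims):
    """Fold per-claim results into one warrant state (simplified)."""
    if any(c.broken and c.load_bearing for c in claims):
        return "REVOKED"
    if any(c.broken for c in claims):
        return "DEGRADED"
    return "WARRANTED"


def action_blocks(state, fail_on="revoked"):
    """Model only the scaffolded default, fail_on: revoked."""
    return fail_on == "revoked" and state == "REVOKED"


old_starter = [Claim("starter", load_bearing=False, broken=True)]  # v1.1.0 scaffold
new_starter = [Claim("starter", load_bearing=True, broken=True)]   # v1.1.1 scaffold

for claims in (old_starter, new_starter):
    state = fold(claims)
    print(state, EXIT_CODES[state], action_blocks(state))
```

Under these assumptions the same broken starter claim passes the default Action when it is not load-bearing and blocks it when it is.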
Why it matters
dorian's whole pitch is "broken promises do not silently ship." The golden path (dorian init) is the
first thing a new user runs, and it should demonstrate the gate — not degrade quietly under the
default Action. This release makes the out-of-the-box experience match the promise. It also resolves an
internal inconsistency: the scaffolded change note already called these "load-bearing facts."
This was caught by the v1.1.0 public-install smoke test (install from PyPI → init → verify → break
→ revalidate), which observed WARRANTED → DEGRADED (exit 3) where the product promise calls for a
block.
Install
pip install -U dorian-vwp # 1.1.1
dorian --version
Quickstart (the golden path, now blocking)
cd your-repo
dorian init # claims.json + change note + workflow
dorian verify dorian-change-note.md --claims claims.json # seals green — exit 0
# ...later, a change breaks the sealed fact...
dorian revalidate --since <base> # REVOKED, exit 4: the Action blocks the PR
GitHub Action
Unchanged. The scaffolded workflow pins ajaysurya1221/dorian/action@v1.1.1 and uses the default
fail_on: revoked, which now blocks when the starter claim is broken.
Security notes
No change to the security boundary. dorian init still writes files only — it never runs a checker,
never executes code, never writes outside the repo root, and never overwrites without --force. The
starter claim remains a read-only C3 checker (config-value:/path:). --deny-exec/--deny-shell
and checker_trust: base are fail-closed policies, not a sandbox.
Known limitations
- dorian verifies explicit, checkable claims — not arbitrary correctness; it is not a sandbox.
- One documented, reproduced real cross-PR catch (encode/httpx), not broad real-world validation.
- A load-bearing starter means that if you keep the example claim and legitimately change the fact it
pins (e.g. rename your package), the next PR is REVOKED until you update the claim or supersede the
warrant — which is the intended "you changed a sealed fact, acknowledge it" behavior. Replace the
starter with your real load-bearing facts.
Tests / gates
Full suite green at 1.1.1 (the 1.1.0 suite plus 2 new dorian init tests pinning the load-bearing
starter and the end-to-end break → REVOKED → exit 4), ruff clean, wheel/sdist build + twine check
pass, and a public-install smoke test from PyPI.
Upgrade notes
pip install -U dorian-vwp. Already-sealed warrants are unaffected. If you want a previously
scaffolded starter claim to block, set its load_bearing to true in claims.json and re-seal with
dorian verify, or re-run dorian init --force to regenerate the scaffold.
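For the manual route, a small script can flip the flag before re-sealing. This is a sketch under an assumption: it treats claims.json as a top-level {"claims": [...]} list of objects carrying a load_bearing field, the shape these notes mention for suggest-claims output. `mark_load_bearing` is a hypothetical helper, and the demo claim omits every other field a real claim would have.

```python
import json
import tempfile
from pathlib import Path


def mark_load_bearing(claims_path, indexes=None):
    """Set load_bearing=True on selected claims; return how many changed.

    indexes: optional set of list positions to update (None means all claims).
    """
    path = Path(claims_path)
    data = json.loads(path.read_text(encoding="utf-8"))
    changed = 0
    for i, claim in enumerate(data.get("claims", [])):
        if indexes is not None and i not in indexes:
            continue
        if claim.get("load_bearing") is not True:
            claim["load_bearing"] = True
            changed += 1
    path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    return changed


# Demo on a throwaway file; the single field shown is the only one assumed.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "claims.json"
    demo.write_text(json.dumps({"claims": [{"load_bearing": False}]}), encoding="utf-8")
    changed = mark_load_bearing(demo)
    result = json.loads(demo.read_text(encoding="utf-8"))

print(changed, result["claims"][0]["load_bearing"])
```

After editing the real file, re-seal with dorian verify as described above so the warrant records the new flag.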
dorian v1.1.0 — CI truth gate: dorian init + clearer PR comment
dorian 1.1.0
A productization release that makes the first run easy. No breaking changes: the warrant
format, checker grammar, exit codes, and trust semantics are unchanged. 1.1.0 adds the missing
golden-path onboarding command, makes the PR-comment output customer-readable, and cleans up a
packaging-hygiene defect — it changes a command surface and output formatting, not verification.
What's new
- dorian init (new command): first-run scaffolding so a new user reaches a sealed warrant in
  minutes instead of hand-writing JSON. It writes three files:
  - a born-verifiable starter claims.json (a config-value: claim about the pyproject package
    name when available, the same checker family that caught encode/httpx#3592; otherwise a
    path: existence claim about a file that is present, so the first dorian verify seals green, not red);
  - dorian-change-note.md, the change note those claims back;
  - .github/workflows/dorian.yml, the GitHub Action workflow, pinned to this package's version.
  It writes files only: it never runs a checker, never executes code, never writes outside the
  repo root, and never overwrites an existing file without --force (re-running is idempotent).
  Supports --dry-run (print the plan, write nothing) and the global --json (machine-readable
  summary). The scaffolded starter checker is always a read-only C3 family, never an executable
  C4/C5, so it is safe by default.
- Customer-readable PR comment: dorian revalidate --format md (the body the GitHub Action
  posts) now leads with an explicit Status: Blocked / Passed / Errored verdict, an aggregate
  trust-change counts table (how many touched warrants this change moved to REVOKED / DEGRADED /
  TRUSTED / UNKNOWN), a sealed in <artifact>.warrant line under each affected artifact, and a
  verdict-keyed What to do: remediation line. The existing per-claim verdict table, fold
  transitions, recall section, and stats footer are unchanged; the comment stays deterministic (no
  timestamps, no absolute paths) and keeps the 160-char content-carryover bound on every detail cell.
Packaging hygiene
- Guards against editor/file-sync duplicate artifacts. A .gitignore rule and a Hatch build
exclude now keep stray … 2.py sync-duplicate files (the kind macOS / cloud sync leaves behind,
e.g. suggestclaims 2.py) from ever being tracked or packaged into a wheel, even from a dirty
working tree. To be precise: these files were never tracked in git, so they were never in a
CI-built wheel or on PyPI; the guards additionally make a local uv build from a dirty tree
provably clean (33 modules, no space-named files).
Tests & gates
Full suite green at 1.1.0 (883 tracked tests: the 1.0.2 suite plus 8 new dorian init tests, and
new assertions pinning the enhanced PR-comment output), ruff clean, wheel/sdist build +
twine check pass. The new dorian init golden path is covered end to end (init → verify
exits 0 and writes a warrant — a tool whose pitch is "don't ship false claims" must not ship a
false scaffold).
The reproducible benchmark suites are not re-run here: 1.1.0 adds a command, output formatting,
and a packaging cleanup, none of which touch the checker, binding, or fold code the suites measure,
so the recorded figures stand unchanged (last executed at 1.0.2; see
docs/BENCHMARK_CURRENT.md).
Honest scope (unchanged)
dorian has one documented, reproduced real cross-PR catch on frozen public SHAs (encode/httpx
requires-python floor; see docs/REAL_CATCH_LOG.md) — not broad
real-world validation. The benchmark suites are reproducibility evidence on frozen fixtures only.
--deny-exec/--deny-shell are fail-closed policies, not sandboxes; checker_trust: base is a
checker-source trust root, not a sandbox. dorian init and suggest-claims scaffold starter claims
for review — existence/value checks, not behavior (a gutted body keeps a symbol: claim green). A
warrant id is content-addressed and tamper-evident, but its body includes the seal timestamp, so a
fresh seal yields a different id — what reproduces is the outcome, not the id.
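The last point, that the outcome reproduces while the id does not, follows directly from content addressing. The sketch below assumes SHA-256 over canonical JSON, which these notes do not specify; the field names and values are toy placeholders, not dorian's warrant schema.

```python
import hashlib
import json


def warrant_id(body):
    """Content-address a warrant body: hash of its canonical JSON bytes."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Toy bodies: only the seal timestamp differs between the first two.
first = {"claims": ["example claim"], "outcome": "WARRANTED", "sealed_at": "2025-01-01T00:00:00Z"}
second = dict(first, sealed_at="2025-01-02T00:00:00Z")
tampered = dict(first, outcome="REVOKED")

print(warrant_id(first) == warrant_id(second))    # ids differ across fresh seals
print(first["outcome"] == second["outcome"])      # the outcome is what reproduces
print(warrant_id(first) == warrant_id(tampered))  # any edit to the body changes the id
```

Because the timestamp is part of the hashed body, comparing two seals means comparing their outcomes, not their ids.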
Install
pip install dorian-vwp # 1.1.0 on PyPI
dorian init # scaffold a starter setup
dorian verify dorian-change-note.md --claims claims.json # seal the warrant, exit 0
dorian v1.0.2
dorian 1.0.2
An announcement-readiness hotfix on top of 1.0.1. No breaking changes: the warrant format,
checker grammar, exit codes, and trust semantics are unchanged. The point of this release is
public-facing coherence — one version across PyPI, README, the Action docs, and the GitHub
release — plus two edge-case bug fixes, a real SCA-scope fix, and CI credential hardening.
It resolves the post-1.0.1 use-and-see validation findings (Codex GPT-5.5 HOTFIX_BEFORE_ANNOUNCE).
Why this release exists
1.0.1 was real on GitHub but pip install dorian-vwp still served 1.0.0, while the README
documented 1.0.1-only commands (suggest-claims, export --in-toto). 1.0.2 is published to
PyPI so the documented install path and command surface agree with the package a new user gets.
Public-trust fixes
- PyPI coherence (FINDING-01): 1.0.2 is published to PyPI via the Trusted-Publisher
workflow, so pip install dorian-vwp provides the documented command surface. README,
release docs, and install examples point to one coherent version.
- Immutable Action ref (FINDING-02): the README Getting-Started snippet pinned
dorian/action@main (a moving target). It and the action docs now use @v1.0.2. A new
version-sync guard fails CI if dorian/action@main reappears in public copy-paste.
- SCA audits the project, not the tool (FINDING-03): security.yml ran uvx pip-audit,
which audited pip-audit's own isolated environment, not dorian's dependencies. It now
exports the resolved project set (uv export --all-extras --dev --no-emit-project) and audits
that, with a step that asserts the project deps (duckdb/anthropic/pytest) are actually present.
- Checkout credential hardening (FINDING-04): every actions/checkout step now sets
persist-credentials: false; none of these workflows perform authenticated git operations
(the release/publish lanes mint short-lived OIDC tokens, not git credentials).
- Release-gate determinism: the release-gate test job disables the uv cache
(enable-cache: false) so release validation runs from a clean resolve.
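A guard of the kind FINDING-02 describes can be approximated with a line scan. This is a minimal sketch, not the project's actual version-sync test; `find_moving_refs` and the sample snippets are hypothetical.

```python
import re

# The moving ref called out in FINDING-02; \b avoids matching e.g. "@mainline".
MOVING_REF = re.compile(r"dorian/action@main\b")


def find_moving_refs(text):
    """Return 1-based line numbers that pin the action to the moving @main ref."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if MOVING_REF.search(line)
    ]


readme_before = "steps:\n  - uses: dorian/action@main\n"
readme_after = "steps:\n  - uses: dorian/action@v1.0.2\n"

print(find_moving_refs(readme_before))
print(find_moving_refs(readme_after))
```

A CI test would run such a scan over the public docs and fail when the returned list is non-empty.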
Bug fixes
- export of an artifact literally named *.warrant (FINDING-05): dorian export
unconditionally stripped a .warrant suffix, so dorian export foo.warrant looked for the
wrong sidecar and failed. It now prefers reading the input as the artifact (so foo.warrant
exports its own foo.warrant.warrant sidecar) and only treats a .warrant-suffixed input as
a sidecar path when the artifact has no sidecar of its own. Regression-tested.
- suggest-claims on PEP 263 (non-UTF8) Python (FINDING-06): the file was read with a
hardcoded UTF-8 decode, so valid Python declaring e.g. # -*- coding: latin-1 -*- was
rejected. It now parses the file's bytes so the encoding cookie is honored; an unknown/declared-wrong
codec surfaces as a clear usage error, not a traceback. Regression-tested.
- symbol_index non-git robustness: pyproject_script_definers reached git ls-files
unguarded, so with a precomputed definer map plus a [project.scripts] table a non-git
checkout raised GitError instead of degrading to {} (breaking the documented "non-git
yields {}" contract of claim_symbol_watch_paths). The git call is now guarded; the symbol
binding is preserved, only the git-dependent script-target resolution degrades. Regression-tested.
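The PEP 263 fix can be illustrated with the standard library alone: decoding as UTF-8 rejects a latin-1 file, while handing the raw bytes to the parser lets it honor the encoding cookie. This is a sketch of the technique, not dorian's code; `parse_python_bytes` and its error message are hypothetical.

```python
import ast

# Valid Python that declares a non-UTF-8 encoding via a PEP 263 cookie.
source_bytes = '# -*- coding: latin-1 -*-\nGREETING = "café"\n'.encode("latin-1")

# A hardcoded UTF-8 decode rejects the file...
try:
    source_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# ...while parsing the raw bytes lets the parser apply the declared codec.
tree = ast.parse(source_bytes)
constant = tree.body[0].value.value


def parse_python_bytes(data):
    """Parse source bytes; report codec or syntax problems as one clear error."""
    try:
        return ast.parse(data)
    except (SyntaxError, LookupError, ValueError) as exc:
        # Hypothetical stand-in for a CLI usage error.
        raise SystemExit(f"usage error: cannot decode or parse source: {exc}")


print(utf8_ok, constant)
```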
Docs / guards
- Attestation-interop example (FINDING-07): the in-toto example pinned a fixed
"dorianVersion": "1.0.0"; it is now version-neutral, with a guard against a hardcoded value.
- Stronger determinism test (FINDING-08): the in-toto determinism test now asserts both
CLI invocations succeed before comparing output bytes.
- Stale-wording guard (FINDING-09): the version-sync guard now also catches "until the PyPI
release" (the narrower variant that slipped past the "first PyPI release" family).
Tests & gates
Full suite green at 1.0.2 (874 tracked tests, +5 new regression tests for the fixes above), ruff
clean, bandit clean, project-scope pip-audit clean, wheel/sdist build + twine check pass.
Both reproducible benchmark suites were re-run at 1.0.2 and reproduce the 1.0.1 figures
exactly (large-mutation P=R=0.93; binding-lifecycle to the same content-derived run_id), so
the hotfix touches no checker numeric behavior. The documented encode/httpx real catch was
independently reproduced on this build (verify exit 0 → revalidate exit 4 → REVOKED,
httpx-python-floor-38 BROKEN).
Honest scope (unchanged)
dorian has one documented, reproduced real cross-PR catch on frozen public SHAs — not broad
real-world validation. The benchmark suites are reproducibility evidence on frozen fixtures only.
--deny-exec/--deny-shell are fail-closed policies, not sandboxes; checker_trust: base
is a checker-source trust root, not a sandbox. suggest-claims checks existence/value, not
behavior (a gutted body keeps a symbol: claim green); the in-toto export is experimental and
not a registered in-toto predicate. A warrant id is content-addressed and tamper-evident, but
its body includes the seal timestamp, so a fresh seal yields a different id — what reproduces is
the outcome, not the id.
Install
pip install dorian-vwp # 1.0.2 on PyPI
dorian v1.0.1
dorian 1.0.1
A hardening, DX, and interop patch on top of 1.0.0. No breaking changes; the warrant format,
checker grammar, exit codes, and trust semantics are unchanged. The headline addition is the
first documented, reproducible cross-PR catch on a public repo.
Proof
docs/REAL_CATCH_LOG.md records one documented catch on encode/httpx
(BSD-3): a load-bearing claim sealed when requires-python was ">=3.8" was flipped
WARRANTED → REVOKED (exit 4) by a real later upstream PR (#3592,
"Drop Python 3.8 support") while httpx's own test suite stayed green and no stateless per-PR
review would have re-opened the original claim. From-scratch reproduction included. This is
one documented catch with honest scope, not a validation claim.
Security
- C4 hardening: a pytest: checker nodeid whose file part is empty or starts with -
(e.g. pytest:-pevil, pytest:--collect-only) is now rejected as ERROR(bad_program)
before any subprocess spawns, so it can no longer reach pytest as an option. Red/green tested.
- C5 sqlite reconcile timeout: a pathological reconcile query (e.g. an infinite recursive
CTE the read-only authorizer permits) is now bounded by a per-query wall-clock deadline and
returns ERROR(query_timeout) instead of hanging the process, closing a DoS that survived
--deny-exec (typed C5 reads are deliberately not exec-gated). Red/green tested.
- Supply chain: every third-party GitHub Action is pinned to an immutable commit SHA (each
verified via git ls-remote); a new security.yml runs pip-audit (SCA) and bandit
(SAST), and Dependabot keeps the pins and deps fresh. bandit excludes only dorian's
documented, policy-gated execution primitives, with a reason per check.
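One way to bound a query by wall-clock time with Python's sqlite3 module is a progress handler that aborts once a deadline passes. The sketch below shows that technique on the infinite recursive CTE case; it is an assumption about how such a deadline could be built, not dorian's implementation, and it does not model the read-only authorizer. `run_with_deadline` is a hypothetical name.

```python
import sqlite3
import time


def run_with_deadline(conn, sql, timeout_s):
    """Run a query, aborting it once a wall-clock deadline passes."""
    deadline = time.monotonic() + timeout_s

    def check_deadline():
        # A non-zero return value tells SQLite to abort the running statement.
        return 1 if time.monotonic() > deadline else 0

    # Invoke the handler roughly every 1000 SQLite VM instructions.
    conn.set_progress_handler(check_deadline, 1000)
    try:
        return ("OK", conn.execute(sql).fetchall())
    except sqlite3.OperationalError as exc:
        if time.monotonic() > deadline:
            return ("ERROR", "query_timeout")
        return ("ERROR", str(exc))
    finally:
        conn.set_progress_handler(None, 1000)  # clear the handler


conn = sqlite3.connect(":memory:")
endless = (
    "WITH RECURSIVE c(x) AS (SELECT 1 UNION ALL SELECT x + 1 FROM c) "
    "SELECT count(*) FROM c"
)

print(run_with_deadline(conn, "SELECT 1 + 1", 0.2))
print(run_with_deadline(conn, endless, 0.2))
```

The endless query returns an error result after about 0.2 seconds instead of hanging the process, which is the behavior the C5 fix describes.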
Performance
dorian verify now builds the whole-repo Python-symbol and config-key indexes once per
run instead of 2×/3×; output is byte-identical (pinned by a call-count spy + the existing
watch/read-set assertions).
Features (additive, opt-in)
- dorian suggest-claims <file.py>: a deterministic, zero-model C3 counterpart to
suggest-data-checks. Proposes symbol: claims for non-private defs/classes and py-const:
claims for literal module constants, runs each, and emits only the passing ones, so the
{"claims": [...]} fragment seals unmodified. load_bearing defaults to false; ambiguous
symbols are skipped. Scaffolding for review (existence/value, not behavior); see
docs/design/SUGGEST_CLAIMS.md.
- dorian export --in-toto <artifact>: project a sealed .warrant into an experimental
in-toto ClaimVerification Statement (deterministic, no signing, no network, zero deps).
Experimental interop; see docs/ATTESTATION_INTEROP.md.
Docs / DX
- The runnable "Try it in 30 seconds" demo is promoted above the fold and the Demo badge points
at it; the illustrative /login story is clearly labeled.
- New: docs/WRITING_GOOD_CLAIMS.md (worked good/bad claim pairs + the gutted-body ceiling),
docs/SECURITY_AND_SAFE_RUNNERS.md (one safe public-fork recipe), a sharpened
docs/USE_WITH_CLAUDE_CODE.md, and the public benchmark protocol reconciled with what shipped.
Honest scope (unchanged from 1.0.0)
The public benchmark is reproducibility evidence on frozen SHAs only, not general real-world
validation. Trigger and truth layers are reported separately, and ERROR is not BROKEN.
--deny-exec/--deny-shell are fail-closed policies, not sandboxes; checker_trust: base
is a checker-source trust root, not a sandbox. suggest-claims checks existence/value, not
behavior (a gutted body keeps a symbol: claim green); the in-toto export is experimental.
A warrant id is content-addressed and tamper-evident, but its body includes the seal
timestamp, so a fresh seal yields a different id — what reproduces is the outcome, not the id.
Install
pip install dorian-vwp
PyPI publishing is a separate step and is not performed by this GitHub Release; pip will
serve 1.0.0 until 1.0.1 is published to PyPI via the Trusted Publisher workflow.
dorian 1.0.0
dorian 1.0.0 includes deterministic, token-free claim revalidation for trusted repositories; structural checkers including py-signature:, py-const:, code:, and config-value:; a public micro-benchmark with machine-derived structural claims reproduced on named repositories pinned at frozen SHAs; and release provenance through GitHub artifact attestations.
The public benchmark is reproducibility evidence on frozen SHAs only, not a general real-world validation claim. Trigger and truth layers are reported separately, and ERROR is not BROKEN. --deny-exec and --deny-shell are fail-closed policies, not sandboxes. trusted-base is a checker-source trust root, not a sandbox.
Artifacts from the successful release gate (build + 3.11/3.12/3.13 test matrix + SHA-256 + Sigstore build-provenance attestation):
- dorian_vwp-1.0.0-py3-none-any.whl
- dorian_vwp-1.0.0.tar.gz
- SHA256SUMS
PyPI publishing is separate and is not performed by the GitHub Release step.
dorian 1.0.0rc1
dorian 1.0.0rc1 — V1 release candidate
Prerelease. A release candidate, not final 1.0.0. dorian is a local-first,
deterministic, token-free verifier of the claims a change makes about its sources.
This RC lands the V1 strengthening program (research-report driven), independently
audited before tagging. All additions are additive and backward-compatible; default
behavior is unchanged unless you opt in.
Highlights
- Python structural checkers: py-signature: and py-const: (C3 subgrammars, AST-based)
close the symbol: existence ceiling and the string:/regex: comment-survival false-pass
for Python signatures and constants. py-const: compares value and type (30 != 30.0,
1 != True).
- Semantic-context search: code: runs a regex over comment/docstring-stripped Python.
- Checker-strength / claim-risk diagnostics: dorian bindings (human + JSON) classifies
each checker's truth strength and flags kind-vs-strength adequacy mismatches; advisory only.
- Multi-index binding: config keys in tracked .toml/.json widen re-check triggers
(TOML/JSON only; YAML excluded to keep zero runtime deps), with provenance and ambiguity skip.
- Trusted-base checker-source mode: revalidate --checker-source base / Action
checker_trust: base: runs only base-approved checker specs for public/fork PRs.
- dorian bench warrant-quality: offline per-claim mutation scoring (trigger vs verdict).
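The value-and-type comparison and the comment-survival point can both be seen in a small AST-based check. This is a sketch of the idea using the standard ast module, not the py-const: checker itself; `const_matches` is a hypothetical name and the type check here is shallow (it does not look inside containers).

```python
import ast


def const_matches(source, name, expected):
    """True if module-level `name` is a literal equal to `expected` in value AND type."""
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign):
            continue
        targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
        if name not in targets:
            continue
        try:
            actual = ast.literal_eval(node.value)
        except ValueError:
            return False  # assigned expression is not a literal
        # Plain == would accept 30 == 30.0 and 1 == True; also require the same type.
        return type(actual) is type(expected) and actual == expected
    return False


print(const_matches("TIMEOUT = 30\n", "TIMEOUT", 30))
print(const_matches("TIMEOUT = 30\n", "TIMEOUT", 30.0))
print(const_matches("RETRIES = 1\n", "RETRIES", True))
# A stale value surviving only in a comment does not satisfy the check.
print(const_matches("# TIMEOUT = 30\nTIMEOUT = 60\n", "TIMEOUT", 30))
```

A plain string or regex search for "TIMEOUT = 30" would still match the last example because of the comment, which is the false-pass the AST-based checkers are described as closing.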
Security
- checker_trust: base is a checker-source trust root, not a sandbox: a base-approved
pytest: checker can still execute PR-head code; for untrusted forks pair it with
deny_exec: true (or external isolation). --deny-exec/--deny-shell are fail-closed, not
sandboxes. No pull_request_target; no secrets required or exposed.
- The trusted-base exploit matrix (tests/test_trusted_base.py, 10 cases) proves PR-added /
PR-modified executable checkers never execute (sentinel-verified) and a missing/tampered base
sidecar fails closed (ERRORED, never BROKEN, never green).
Benchmark scope
Synthetic-suite reproducibility, not broad real-world validation. Numbers
(docs/BENCHMARK_CURRENT.md, measured at commit 33e9eaf): large-mutation 240 pairs P=R=0.93
(11.6× / 10.4× false-positive reduction vs file watchers); binding-lifecycle 808 pairs,
selection recall 0.54 → 1.00, alarm precision/recall 1.00, 0 errored; realworld 5 cases
(2 solved / 1 partial / 2 not_solved). Binding improves selection; it does not prove
semantic behavior (the gutted-body ceiling is shown, not solved). Historical v0.7.0 / 0.9.0
docs are preserved and labeled historical.
Remaining non-goals (post-V1, why this is an RC not final 1.0.0)
Real-repo public micro-benchmark (protocol-only); declarative/route/SQL binding indices;
YAML config binding; audit-event/state single-transaction atomicity; --extract stays
draft/experimental. See docs/V1_SCOPE.md.
Verification (release commit 24ae7c8)
- uv run pytest → 735 passed (incl. slow: wheel build, real pytest subprocess, regex timeout)
- uv run ruff check / ruff format --check → clean
- uv build + clean-venv install → dorian 1.0.0rc1
- benchmarks re-run identical; trusted-base exploit matrix passes
- independently re-audited (6 read-only auditor lenses): 2 release-blocking doc-drift issues
and several should-fixes found and repaired before tagging.
Invariants preserved: ERROR is never BROKEN; checkers are read-only (except C4/C5-shell);
binding selects re-check candidates only; zero runtime dependencies.
v0.11.0 — security hardening: deny-exec + C3 regex ReDoS backstop
Security-hardening release. Opt-in, fail-closed controls over the executable checker families, a real fix for catastrophic-regex stalls, and honest security/validation docs — all backward-compatible (trusted/internal repos are unchanged).
Highlights
- deny-exec / deny-shell execution policy: --deny-exec / --deny-shell (env DORIAN_DENY_EXEC / DORIAN_DENY_SHELL) on seal, verify, revalidate, and rebind. The executable families (C4 pytest:, C5 shell:) ERROR instead of running, gated at the single run_checker choke point. A blocked claim never seals (born-verifiable) and never silently passes revalidate (ERRORED, never VERIFIED/BROKEN). Fail-closed; not a sandbox.
- C3 regex ReDoS backstop: the match runs in a spawned worker killed at spec.timeout_s, so catastrophic backtracking ERRORs (regex_timeout) instead of stalling. No new core runtime dependency.
- Drift guards: test_version_sync (pyproject == __init__ == CLI) and test_cli_docs_sync (every README command resolves).
- Honesty & onboarding docs: SECURITY.md, docs/SECURITY_BOUNDARY.md, validation-honesty / release-checklist / dependency / benchmark-reproducibility / shadow-pilot docs, 6 issue templates, a manual OIDC PyPI publish workflow, and a roadmap backlog with an explicit "do not build" list.
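The ReDoS backstop relies on the fact that a runaway regex match cannot be interrupted from inside the same thread, but a separate process can be killed. The sketch below uses subprocess.run with a timeout as a stand-in for the spawned worker; how dorian actually spawns its worker is not stated here, and `regex_check`, `WORKER`, and the "bad_pattern" label are hypothetical.

```python
import json
import subprocess
import sys

# Child program: read a job from stdin, run the search, print a JSON boolean.
WORKER = """
import json, re, sys
job = json.load(sys.stdin)
print(json.dumps(re.search(job["pattern"], job["text"]) is not None))
"""


def regex_check(pattern, text, timeout_s):
    """Run re.search in a child process that is killed at timeout_s."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", WORKER],
            input=json.dumps({"pattern": pattern, "text": text}),
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return "ERROR(regex_timeout)"
    if proc.returncode != 0:
        return "ERROR(bad_pattern)"  # hypothetical label for a worker failure
    return "MATCH" if json.loads(proc.stdout) else "NO_MATCH"


print(regex_check(r"a+b", "aaab", 5.0))
# Nested quantifiers backtrack exponentially on this input.
print(regex_check(r"(a+)+$", "a" * 40 + "b", 1.0))
```

The trade-off is process startup cost on every match, paid in exchange for a hard upper bound on how long a single checker can stall.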
Adversarial audit
A five-lens review caught a real escape: dorian rebind re-runs checkers but did not receive the policy and had no flag, so it executed code under DORIAN_DENY_EXEC=1. Fixed, with a red-green-verified regression test.
Caveat
deny-exec removes code execution but not the self-attested-verdict problem; the public-fork-PR story remains the deferred trusted-base Action mode (designed, not built). dorian is for trusted/internal repositories, or --deny-exec everywhere else.
Verification
CI green on Python 3.11 / 3.12 / 3.13; 636 tests pass. Core runtime dependencies: none.
v0.10.0 — opt-in weak-binding gate
A pre-release adding an opt-in, seal-time review gate plus Claude Code onboarding. No change to default behavior, trust/claim state, the sidecar schema, fold policy, checker grammar, dependencies, or the GitHub Action.
New: --binding-gate off | warn | fail
On dorian verify and dorian seal (default off):
- warn: seal, then print weak-binding diagnostics (exit 0).
- fail: refuse the seal before writing any sidecar (atomic no-write, exit 4) when a claim carries a high-risk weak-binding flag: unbacked, short-literal, ambiguous-mention, trigger-only-symbol, unwatched-mention. single-file is warn-only, the expected shape of an honest one-checker C3 path:/symbol:/regex: claim.
Weak binding is a false-confidence smell: the gate never marks a claim false and never touches trust or claim state. dorian bindings stays a pure linter.
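The three gate modes reduce to a small decision function. This sketch encodes only what the list above states (the flag names, the warn-only single-file flag, exit 0 and exit 4); `binding_gate` and its return shape are hypothetical, not dorian's internals.

```python
# Flags listed above as high-risk; single-file is deliberately absent (warn-only).
HIGH_RISK = {
    "unbacked",
    "short-literal",
    "ambiguous-mention",
    "trigger-only-symbol",
    "unwatched-mention",
}


def binding_gate(mode, flags_by_claim):
    """Return (write_sidecar, exit_code, diagnostics) for a seal attempt.

    flags_by_claim maps a claim name to the set of weak-binding flags it carries.
    """
    if mode not in ("off", "warn", "fail"):
        raise ValueError(f"unknown gate mode: {mode}")
    if mode == "off":
        return True, 0, []
    diagnostics = [
        f"{claim}: {flag}"
        for claim, flags in sorted(flags_by_claim.items())
        for flag in sorted(flags)
    ]
    blocking = any(flags & HIGH_RISK for flags in flags_by_claim.values())
    if mode == "fail" and blocking:
        return False, 4, diagnostics  # refuse before writing any sidecar
    return True, 0, diagnostics


print(binding_gate("warn", {"c1": {"unbacked"}}))
print(binding_gate("fail", {"c1": {"unbacked"}}))
print(binding_gate("fail", {"c1": {"single-file"}}))
```

Note that the function never changes a claim's verdict: it only decides whether the seal is written, which matches the statement that the gate never touches trust or claim state.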
Also in this release
- Claude Code onboarding: docs/USE_WITH_CLAUDE_CODE.md + a runnable examples/claude-code/ pack.
- docs/PUBLIC_BENCHMARK_PROTOCOL.md (pre-registered, protocol-only) and docs/TRUSTED_BASE_ACTION_DESIGN.md (design only, not implemented).
- docs/START_HERE.md navigation index.
CI green on Python 3.11 / 3.12 / 3.13. Not published to PyPI.
v0.9.1 — evidence wording polish
A patch release — no product behavior change. It makes the v0.9.0 evidence harder to misread.
- README badge/version updated: the stale hardcoded v0.8 badge is replaced with a dynamic GitHub-release badge (reads the latest release, so it can't drift again).
- "proof" softened to "evidence": the throwaway self-host demo line now reads "evidence that the mechanism can catch this kind of checked break on real code," preserving the caveat that it was a throwaway demo, not a committed artifact or benchmark figure.
- "5 limitations closed" clarified: v0.9.0 closed five binding-hardening issues, not all of dorian's strategic limitations. These remain open and visible: ambiguous symbols are skipped unless disambiguated; Python-only definition indexing; revalidate stays sidecar/store-driven; auto-binding is verify-only; trigger coverage is not behavior proof; real-world reproductions are scoped, not universal validation.
- Benchmark counts were re-checked against the machine outputs (summary JSON / generated doc / README) and agree (808 pairs · 63 domains · 122 artifacts · 122 claims · 408 mutations; checker_path recall 0.54 → bound 1.00; verdict precision 1.00, zero false BROKEN). The full benchmark was not rerun (this patch changed docs/version only), so the historical run provenance honestly stays dorian 0.9.0.
- Real-world scope clarified: a cited discovery catalog (12 public problems across 5 categories) was added to the protocol so the 5 report cases (3 hermetic reproductions + 2 documented) read as a selection, not a cherry-pick.
- Trigger-vs-truth framing preserved: binding widens the re-check trigger; the checker still decides truth; a watched file changing never makes a claim BROKEN by itself.
- Added a docs-polish guard test; full gate green (582 passed), ruff clean. The v0.9.0 tag was not modified.
v0.9.0 — binding hardening + trigger-vs-truth evidence
Symbol-binding correctness + honest evidence for it. Binding widens when a claim is re-checked; the checker still decides truth — a watched file changing never makes a claim BROKEN by itself.
Highlights
- Symbol→defining-file binding (5 limitations closed) + 3 TDD-hardened precision nits (C4 nodeid whitespace parity; backticked common-word over-binding guard; ambiguous pyproject-script target rejection).
- Binding-lifecycle benchmark: 808 known-truth (artifact, mutation) pairs over 63 domains, scored in two layers:
  - selection (re-check trigger) recall 0.54 → 1.00 vs a pre-binding checker-path watcher, at 1.00 precision (vs 0.92 for the rejected "any file with the token" shortcut); this is the false-TRUSTED trigger reduction.
  - verdict (BROKEN) precision 1.00, zero false BROKEN; ERRORED reported separately, never an alarm.
  - the gutted-body ceiling is shown, not solved: an existence checker fires the trigger but yields 0 BROKEN; only a behavior checker catches it.
  dorian bench binding-lifecycle · docs/BENCHMARK_BINDING_LIFECYCLE.md
- Offline public-case reproductions of still-open problem classes: solved 2 / partial 1 / not_solved 2; labels derived from dorian's actual behavior.
  dorian bench realworld-usecases · docs/REALWORLD_USECASES.md
- README + roadmap refreshed; CodeRabbit review (1 critical, 1 major, 2 minor) addressed; a CI rmtree-race in the bench teardown fixed.
In-fixture, synthetic results — a reproducible demonstration of the mechanism on these suites, not a claim about any real repository. Full gate green; matrix CI 3.11/3.12/3.13 + CodeRabbit pass.