forked from lucas-maes/le-wm
v0.9 eval/report: run full gate suite and publish claim audit #392
Closed
Labels
area:evaluation (Area: evaluation)
area:release (Area: release)
area:results (Benchmark results, reports, and research evidence)
effort:l (Large multi-file implementation change)
priority:p1 (Required for v1.0 or core follow-through)
spec:rfc-0015 (RFC-0015 v0.7 execution-substrate improvements)
type:docs (Documentation work)
Parent
#385
What to build
Evaluate the verified v0.9 checkpoints across the full gate suite, publish tracked eval artifacts, and write the final report/cards/artifact index. This is the issue that decides whether the v0.9 claim opens or remains diagnostic.
Acceptance criteria
- p_pass, and WS-D rerank evals run for both seeds with required parent manifests.
- docs/benchmark/v0_9/artifacts include manifests, reports, and score rows required to reproduce the report.
- main.
Blocked by
#391.
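The artifact criterion above (manifests, reports, and score rows present for both seeds) could be checked mechanically before publishing. The following is a minimal sketch only: the per-seed directory layout (`seed_<n>/`), the file names in `REQUIRED_FILES`, and the function `missing_artifacts` are all hypothetical and do not come from this issue or the repository.

```python
from pathlib import Path

# Hypothetical file names; the issue does not specify the real artifact layout.
REQUIRED_FILES = ("manifest.json", "report.md", "score_rows.jsonl")


def missing_artifacts(root, seeds, required=REQUIRED_FILES):
    """Return the sorted relative paths of required files that are absent.

    Assumes one subdirectory per seed named `seed_<n>` under `root`
    (an illustrative convention, not the project's actual one).
    """
    root = Path(root)
    missing = []
    for seed in seeds:
        for name in required:
            rel = Path(f"seed_{seed}") / name
            if not (root / rel).is_file():
                missing.append(rel.as_posix())
    return sorted(missing)
```

Under these assumptions, calling `missing_artifacts("docs/benchmark/v0_9/artifacts", seeds=[0, 1])` would return an empty list when every expected file exists, and the list of absent paths otherwise. The seed identifiers `0` and `1` are placeholders for whichever two seeds the evaluation actually uses.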