Publish v0.9 claim audit #400

Merged
AbdelStark merged 1 commit into main from issue-392-v0-9-claim-audit on Jun 7, 2026

Conversation

@AbdelStark
Owner

Summary

Claim Boundary

v0.9 stays claim-closed overall. Both seeds clear HumanEval WS-D reranking, but MBPP-Plus shows zero lift over no-action, the broad semantic-decoy and representation gates remain closed, and p-pass calibration is reported from completion-score baselines rather than from a standalone serialized p_pass key.
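
For readers unfamiliar with the gating idea, here is a minimal sketch of how an overall claim can stay closed even when some gates pass. It is an illustration only, not the codelewm implementation: `GateResult`, `lift_over_no_action`, `overall_claim_open`, `closed_gates`, the gate names, and the pass rates are all hypothetical, and it assumes the overall claim is a simple conjunction of per-gate outcomes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    """One gate outcome. Hypothetical type, not part of codelewm."""
    name: str
    passed: bool


def lift_over_no_action(reranked: float, no_action: float) -> float:
    """Absolute pass-rate lift of reranking over the no-action baseline."""
    return reranked - no_action


def overall_claim_open(gates: list[GateResult]) -> bool:
    """Assumed rule: the claim opens only if at least one gate exists and all pass."""
    return bool(gates) and all(g.passed for g in gates)


def closed_gates(gates: list[GateResult]) -> list[str]:
    """Names of the gates that keep the overall claim closed."""
    return [g.name for g in gates if not g.passed]


# Placeholder pass rates chosen only to reproduce "zero lift"; not measured values.
mbpp_lift = lift_over_no_action(reranked=0.5, no_action=0.5)

# Hypothetical gate names mirroring the outcomes described above.
gates = [
    GateResult("humaneval_ws_d_rerank_seed_a", True),
    GateResult("humaneval_ws_d_rerank_seed_b", True),
    GateResult("mbpp_plus_lift", mbpp_lift > 0.0),
    GateResult("broad_semantic_decoy", False),
    GateResult("representation", False),
]

print(overall_claim_open(gates))
print(closed_gates(gates))
```

Under this assumed rule, a single closed gate is enough to keep the overall claim closed, which matches the boundary stated above: the HumanEval result alone does not open it.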

Validation

  • uv run pytest tests/docs/test_hf_ml_intern_training.py -> 8 passed
  • uv run pytest tests/ -> 967 passed, 8 skipped, 1 torch nested-tensor warning
  • uv run python -m compileall -q -x 'tests/fixtures/codestate/invalid_(before|after)\.py$' codelewm tests
  • uv run codelewm --help
  • uv run codelewm openrouter byok-register --dry-run --json
  • uv run scripts/llm-world-model-demo -> manifest verify ok, artifact/html secret scans ok, claim gate closed
  • parent-aware manifest verification for the v0.9 pack, both training runs, and all 12 checked-in v0.9 eval reports -> ok
  • uv run codelewm secret-scan docs/benchmark/v0_9 --json -> ok, no findings
  • git diff --check

Closes #392.

AbdelStark merged commit c1e1372 into main on Jun 7, 2026 (9 checks passed)
AbdelStark deleted the issue-392-v0-9-claim-audit branch on June 7, 2026 03:18


Development

Linked issue #392: v0.9 eval/report: run full gate suite and publish claim audit