
Improve v0.9 pack build observability #399

Merged
AbdelStark merged 1 commit into main from issue-391-pack-build-observability on Jun 6, 2026
Conversation

@AbdelStark
Owner

Summary

Part of #391 under tracker #385.

  • add structured progress events for long pass/fail pack builds, persisted as reports/passfail_pack_progress.jsonl and mirrored to stderr with the existing CODELEWM_JOB_EVENT prefix
  • extend hf-job-event-status so persisted pack-build logs summarize progress, completion, ETA, and event counts like training job logs
  • update the v0.9 short A10G config with the measured cross-benchmark pack p_pass BCE positive-class weight from the preflight pack report
  • document the pack-build progress/status workflow in the HF operations and scaled-training runbooks
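
The event flow described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: only the `CODELEWM_JOB_EVENT` prefix and the JSONL persistence come from this PR, while the event fields (`event`, `done`, `total`, `elapsed_s`, `eta_s`), the space between prefix and payload, and the helper names `emit_progress` and `summarize` are assumptions made for the example.

```python
import io
import json
import sys
from typing import IO, Iterable

# Prefix named in the PR; the payload layout below is assumed for illustration.
JOB_EVENT_PREFIX = "CODELEWM_JOB_EVENT"


def emit_progress(log: IO[str], done: int, total: int, started_at: float,
                  now: float, mirror: IO[str] = sys.stderr) -> dict:
    """Append one progress event to a JSONL log and mirror it with the prefix."""
    elapsed = now - started_at
    rate = done / elapsed if elapsed > 0 else 0.0
    event = {
        "event": "pack_build_progress",  # hypothetical event name
        "done": done,
        "total": total,
        "elapsed_s": round(elapsed, 3),
        # Linear ETA from the average rate so far; None until a rate exists.
        "eta_s": round((total - done) / rate, 3) if rate > 0 else None,
    }
    line = json.dumps(event, sort_keys=True)
    log.write(line + "\n")                            # persisted JSONL record
    print(f"{JOB_EVENT_PREFIX} {line}", file=mirror)  # stderr mirror
    return event


def summarize(lines: Iterable[str]) -> dict:
    """Summarize a persisted JSONL log: event count, last progress, completion."""
    events = [json.loads(raw) for raw in lines if raw.strip()]
    last = events[-1] if events else None
    return {
        "event_count": len(events),
        "done": last["done"] if last else 0,
        "total": last["total"] if last else 0,
        "complete": bool(last) and last["done"] >= last["total"],
        "eta_s": last["eta_s"] if last else None,
    }


if __name__ == "__main__":
    buffer = io.StringIO()  # stands in for reports/passfail_pack_progress.jsonl
    emit_progress(buffer, done=50, total=200, started_at=0.0, now=10.0)
    emit_progress(buffer, done=200, total=200, started_at=0.0, now=40.0)
    print(summarize(buffer.getvalue().splitlines()))
```

Keeping each event a single self-describing JSON line means the same record can be parsed from the persisted file or recovered from captured stderr by stripping the prefix.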

This PR intentionally does not close #391. The live two-seed HF Jobs run, artifact download, checkpoint inspection, and training health table are still pending after this lands.

Local Validation

  • uv run pytest tests/data/execution_pack/test_passfail_pack.py tests/training/test_job_events.py tests/training/test_execution_train_config.py
  • uv run python -m compileall -q -x 'tests/fixtures/codestate/invalid_(before|after).py$' codelewm tests
  • git diff --check
  • uv run pytest tests/ # 967 passed, 8 skipped, 1 warning
  • uv run scripts/hf-launch-execution-run --config config/train/scaled/codelewm_execution_v0_9_short_a10g.yaml --git-sha dryrun --date 20260606 --runtime-image-digest sha256:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa --require-runtime-image-digest --json

Pack Preflight Evidence

Built the local v0.9 pass/fail pack with progress logging enabled:

  • 384 completion-label rows: HumanEval 282, MBPP-Plus 102
  • 2,214 sandboxed scoring inputs
  • 2,188 produced records: HumanEval 1,882, MBPP-Plus 306
  • pass labels: false 1,107; true 1,081
  • p_pass_bce_pos_weight: 1.0240518038852915
  • split counts: train 1,928; val 57; test 203
  • readiness gates passed, including val/test coverage of both passed true/false labels and of output_magnitude_bucket
  • sandbox rejects: sandbox_timeout 26
  • manifest verify ok: 9 files checked
  • codelewm secret-scan ok: no findings
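
As a sanity check, the figures above can be cross-checked with a few lines of arithmetic. The sketch below assumes the reported `p_pass_bce_pos_weight` is the ratio of negative to positive labels over the whole pack, and that the 26 sandbox timeouts are the only inputs that produced no record; both assumptions are consistent with the numbers listed but are not stated in the PR.

```python
import math

# Figures copied from the preflight summary above.
scoring_inputs = 2214
sandbox_timeouts = 26
records_by_benchmark = {"HumanEval": 1882, "MBPP-Plus": 306}
label_counts = {False: 1107, True: 1081}
split_counts = {"train": 1928, "val": 57, "test": 203}
reported_pos_weight = 1.0240518038852915

produced = sum(records_by_benchmark.values())

# Every breakdown should account for the same 2,188 produced records.
assert produced == scoring_inputs - sandbox_timeouts
assert produced == sum(label_counts.values()) == sum(split_counts.values())

# Assumption: the positive-class weight is the negative/positive count ratio.
pos_weight = label_counts[False] / label_counts[True]
assert math.isclose(pos_weight, reported_pos_weight, rel_tol=1e-12)
```

A weight this close to 1 reflects the near-balanced labels (1,107 false against 1,081 true), so the positive class is only slightly upweighted in the BCE loss.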

@AbdelStark AbdelStark merged commit 69f798a into main Jun 6, 2026
9 checks passed
@AbdelStark AbdelStark deleted the issue-391-pack-build-observability branch June 6, 2026 12:21

Development

Successfully merging this pull request may close these issues.

v0.9 train: guarded 2-seed HF Jobs run after data/eval preflight
