Improve v0.8 HF runtime lifecycle events#383

Merged
AbdelStark merged 1 commit into main from issue-370-runtime-lifecycle-events
Jun 5, 2026

Conversation

@AbdelStark
Owner

Summary

  • emit structured CODELEWM_JOB_EVENT runtime lifecycle events from the v0.8 container entrypoint
  • summarize latest runtime.* phases in scripts/hf-job-event-status alongside training progress
  • document runtime-vs-training observability in the HF Jobs runbooks

Closes part of #370.
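
For readers unfamiliar with the pattern, here is a minimal sketch of how a status tool might pick the latest runtime.* phase out of a mixed job log. This PR does not show the wire format, so the sketch assumes each event is a single log line containing the CODELEWM_JOB_EVENT marker followed by a JSON object with an "event" field. The field name, the phase names in the usage note, and the helper functions parse_event and latest_runtime_phase are all hypothetical. They are not the actual codelewm/training/job_events.py API.

```python
import json
from typing import Iterable, Optional

# Marker named in this PR; the "<marker> <json>" line layout is an assumption.
PREFIX = "CODELEWM_JOB_EVENT"


def parse_event(line: str) -> Optional[dict]:
    """Return the JSON payload of an event line, or None for ordinary log lines."""
    idx = line.find(PREFIX)
    if idx < 0:
        return None
    payload = line[idx + len(PREFIX):].strip()
    try:
        event = json.loads(payload)
    except json.JSONDecodeError:
        # Malformed payloads are skipped rather than aborting the summary.
        return None
    return event if isinstance(event, dict) else None


def latest_runtime_phase(lines: Iterable[str]) -> Optional[dict]:
    """Return the last event whose (assumed) "event" field starts with "runtime."."""
    latest = None
    for line in lines:
        event = parse_event(line)
        if event and str(event.get("event", "")).startswith("runtime."):
            latest = event
    return latest
```

Under these assumptions, a log that interleaves plain output, a hypothetical runtime.setup event, a training.step event, and a later runtime.train_launch event would summarize to the runtime.train_launch entry. Training events are ignored here and would be reported separately as training progress.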

Validation

  • bash -n containers/v0_8/entrypoint.sh
  • uv run pytest tests/training/test_job_events.py tests/containers/test_runtime_image.py
  • uv run pytest tests/docs/test_hf_ml_intern_training.py tests/docs/test_scaled_training_runbook.py
  • uv run python -m compileall -q codelewm/training/job_events.py scripts/hf-job-event-status tests/training/test_job_events.py tests/containers/test_runtime_image.py
  • git diff --check
  • local entrypoint smoke test, with its captured output parsed by uv run scripts/hf-job-event-status --from-file /tmp/codelewm-runtime-smoke.err

@AbdelStark AbdelStark merged commit cd99160 into main Jun 5, 2026
9 checks passed
@AbdelStark AbdelStark deleted the issue-370-runtime-lifecycle-events branch June 5, 2026 18:14