Remove nemo_log_globalrank-N_localrank-M.txt file creation #15626
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
I'll monitor this PR until CI is green. I'll post a plan for any fix and wait for your approval before pushing anything. To cancel, remove the 'Has Babysitter' label.
The PR Babysitter never investigated real CI failures. Check runs produced by GitHub Actions jobs authenticated with the default GITHUB_TOKEN do not fire `check_run` events on downstream workflows (GitHub's recursion guard). As a result, `check-label-for-ci` skipped on every CI failure it was meant to handle. This was verified on PR #15626, where all 31 recent check_run-triggered runs skipped every job despite multiple failing checks (Nemo_CICD_Test, Nemo_Linting_Test, etc.).

Replace the `check_run: [completed]` trigger with an explicit `workflow_run: [completed]` list covering the CI workflows that can fail on a PR (CICD NeMo, PyLint/flake8, wheel build, __init__ check, copyright, CI-Install-Check, CodeQL, secrets detector). Intentionally omitted: "Isort and Black Formatting" (it auto-pushes fixes), the babysitter itself, and the labeler/relabel bots.

Update `check-label-for-ci`, the investigate prompt, and `ping-author-on-failure` to read `github.event.workflow_run.*` instead of `github.event.check_run.*` (conclusion, pull_requests, head_sha, name). Drop the `reformat_with_isort_and_black` name filter; filtering now happens at the trigger level by not listing that workflow.

Semantics change: one investigation per failing workflow (not per failing check). The existing prompt already instructs Claude to look at all failing checks on the PR, so coverage is unchanged and the noise is lower.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
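The gating described in this commit (trigger only on listed workflows, read `workflow_run.*` fields, ignore non-failures and runs without a PR) can be sketched in Python as a pure function over the event payload. This is a hypothetical illustration, not the actual `check-label-for-ci` job: the helper name `pr_to_investigate`, the choice to treat only the `"failure"` conclusion as a failure, and the sample payload values are assumptions. Only the field names (`name`, `conclusion`, `pull_requests`, `head_sha`) come from the commit message.

```python
from typing import Any, Mapping, Optional


def pr_to_investigate(
    event: Mapping[str, Any], monitored: frozenset[str]
) -> Optional[tuple[int, str]]:
    """Return (pr_number, head_sha) if a workflow_run event warrants one
    investigation, else None. Hypothetical helper for illustration only."""
    run = event.get("workflow_run") or {}
    # Trigger-level filtering: unlisted workflows (e.g. auto-formatting) are ignored.
    if run.get("name") not in monitored:
        return None
    # Assumption: only the "failure" conclusion counts; success, cancelled,
    # skipped, etc. do not start an investigation.
    if run.get("conclusion") != "failure":
        return None
    # Runs with no associated PR carry an empty pull_requests list.
    prs = run.get("pull_requests") or []
    if not prs:
        return None
    return prs[0]["number"], run["head_sha"]


# Sample payload (values are made up for illustration).
event = {
    "workflow_run": {
        "name": "CICD NeMo",
        "conclusion": "failure",
        "head_sha": "abc123",
        "pull_requests": [{"number": 15626}],
    }
}
print(pr_to_investigate(event, frozenset({"CICD NeMo"})))  # (15626, 'abc123')
```

Because GitHub emits one `workflow_run` completion per workflow run, calling this once per event naturally yields one investigation per failing workflow rather than one per failing check.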
Babysitter deactivated for this PR.
I wasn't able to investigate the CI failure in
…om/nvidia/nemo into remove-global-local-rank-log-files
[🤖]: Hi @pzelasko 👋, we wanted to let you know that a CICD pipeline for this PR just finished successfully, so it might be time to merge this PR or get some approvals.
@chtruong814 the PR Babysitter is restricted to the NVIDIA-NeMo
Merging: one flaky test failed, and it is already fixed in main.
Important
The "Update branch" button must only be pressed on very rare occasions. An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.
What does this PR do?
Removes the creation of per-rank log files (nemo_log_globalrank-N_localrank-M.txt). The preferred approach is to redirect each process's output to a dedicated log file via the training orchestration layer (e.g., SLURM).
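Per-process log files can be produced outside the training code by whatever launches the processes. SLURM's `srun --output` option, for example, accepts a filename pattern in which `%t` expands to the task rank, yielding one file per process. The Python sketch below imitates that behavior on a single machine so the idea can be tried locally. It is a hypothetical stand-in, not NeMo or SLURM code: the function name, the `RANK`/`LOCAL_RANK` environment variables, and the `rank{N}.log` naming are assumptions.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def launch_with_per_rank_logs(script: str, nprocs: int, log_dir: Path) -> list[int]:
    """Run `nprocs` copies of a Python snippet, each with its own log file.

    Hypothetical stand-in for what an orchestrator such as SLURM does. The
    RANK/LOCAL_RANK variables and the rank{N}.log naming are assumptions.
    """
    log_dir.mkdir(parents=True, exist_ok=True)
    procs = []
    for rank in range(nprocs):
        # Single-node assumption: the local rank equals the global rank.
        env = {**os.environ, "RANK": str(rank), "LOCAL_RANK": str(rank)}
        with open(log_dir / f"rank{rank}.log", "w") as log:
            # The child inherits the file descriptor, so closing our handle
            # after Popen returns does not cut off the child's output.
            procs.append(
                subprocess.Popen(
                    [sys.executable, "-c", script],
                    env=env,
                    stdout=log,
                    stderr=subprocess.STDOUT,  # stderr goes to the same file
                )
            )
    # Wait for every rank and collect exit codes in rank order.
    return [p.wait() for p in procs]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        snippet = "import os; print('hello from rank', os.environ['RANK'])"
        print(launch_with_per_rank_logs(snippet, 2, Path(tmp)))  # [0, 0]
        print((Path(tmp) / "rank1.log").read_text().strip())  # hello from rank 1
```

Keeping the redirection in the launcher means the training code only writes to stdout/stderr, and the file layout is decided by the cluster setup rather than hard-coded per rank.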
Collection: all
Changelog
Usage
# Add a code snippet demonstrating how to use this
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove and re-add the label.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The contributor guidelines list specific people who can review PRs in various areas.
Additional Information