Remove nemo_log_globalrank-N_localrank-M.txt file creation #15626

Merged: pzelasko merged 6 commits into main from remove-global-local-rank-log-files on Apr 23, 2026

@pzelasko
Collaborator

@pzelasko pzelasko commented Apr 21, 2026

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do?

Removes creation of the per-rank `nemo_log_globalrank-N_localrank-M.txt` log files. The preferred approach is to redirect each process's output to a dedicated log file via the training orchestration layer (e.g., SLURM).
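
To illustrate the idea of per-process redirection at the orchestration layer, here is a minimal sketch of a launcher that starts several worker processes and sends each one's stdout and stderr to its own file. It is not part of this PR and not NeMo code: the function name `launch_with_per_rank_logs`, the `rank-N.log` naming scheme, and the `RANK`/`WORLD_SIZE` environment variables are assumptions made for the example. On SLURM the scheduler would normally handle this redirection itself rather than a Python wrapper.

```python
import os
import subprocess
from pathlib import Path


def launch_with_per_rank_logs(cmd, world_size, log_dir):
    """Start `world_size` copies of `cmd`, each writing to its own log file.

    Hypothetical helper: stands in for an orchestration layer that redirects
    every process's output to a dedicated file.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)

    procs = []
    for rank in range(world_size):
        # Assumed naming scheme; any per-rank unique name would work.
        log_file = open(log_dir / f"rank-{rank}.log", "w")
        # RANK / WORLD_SIZE are assumed variable names for this sketch.
        env = dict(os.environ, RANK=str(rank), WORLD_SIZE=str(world_size))
        proc = subprocess.Popen(
            cmd,
            stdout=log_file,
            stderr=subprocess.STDOUT,  # merge stderr into the same file
            env=env,
        )
        procs.append((proc, log_file))

    exit_codes = []
    for proc, log_file in procs:
        exit_codes.append(proc.wait())
        log_file.close()
    return exit_codes
```

Calling `launch_with_per_rank_logs([sys.executable, "train.py"], world_size=2, log_dir="logs")` (with a hypothetical `train.py`) would leave one log file per process in `logs/`, so the training code itself never has to create rank-specific files.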

Collection: all

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: Piotr Żelasko <petezor@gmail.com>
@pzelasko pzelasko requested a review from blisc April 21, 2026 13:02
@github-actions github-actions Bot added core Changes to NeMo Core CI labels Apr 21, 2026
@pzelasko pzelasko force-pushed the remove-global-local-rank-log-files branch from 465fca2 to 8bc5613 Compare April 21, 2026 13:03
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
@pzelasko pzelasko force-pushed the remove-global-local-rank-log-files branch from 8bc5613 to e34926e Compare April 21, 2026 13:04
@pzelasko pzelasko added Run CICD Has Babysitter Claude autonomously takes over the quest to get to a green CI. labels Apr 21, 2026
@github-actions
Contributor

I'll monitor this PR until CI is green. I'll post a plan for any fix and wait for your approval before pushing anything. Ping me by removing 'Has Babysitter' to cancel.

@github-actions github-actions Bot removed the Run CICD label Apr 21, 2026
pzelasko added a commit that referenced this pull request Apr 21, 2026
The PR Babysitter never investigated real CI failures. Check-runs
produced by GHA jobs authenticated with the default GITHUB_TOKEN do
not fire `check_run` events on downstream workflows (GitHub's
recursion guard). As a result, `check-label-for-ci` skipped on every
CI failure it was meant to handle — verified on PR #15626 where all
31 recent check_run-triggered runs skipped every job despite multiple
failing checks (Nemo_CICD_Test, Nemo_Linting_Test, etc.).

Replace the `check_run: [completed]` trigger with an explicit
`workflow_run: [completed]` list covering the CI workflows that can
fail on a PR (CICD NeMo, PyLint/flake8, wheel build, __init__ check,
copyright, CI-Install-Check, CodeQL, secrets detector). Intentionally
omitted: "Isort and Black Formatting" (auto-pushes fixes), the
babysitter itself, and labeler/relabel bots.

Update `check-label-for-ci`, the investigate prompt, and
`ping-author-on-failure` to read `github.event.workflow_run.*`
instead of `github.event.check_run.*` (conclusion, pull_requests,
head_sha, name). Drop the `reformat_with_isort_and_black` name filter
— filtering is now done at the trigger level by not listing that
workflow.

Semantics change: one investigation per failing workflow (not per
failing check). The existing prompt already instructs Claude to look
at all failing checks on the PR, so coverage is unchanged and the
noise is lower.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
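
The commit message above says the jobs now read `conclusion`, `pull_requests`, `head_sha`, and `name` from `github.event.workflow_run.*`. As a rough illustration of what that lookup amounts to, the sketch below pulls those four fields out of an event payload represented as a Python dict. The function name `summarize_failed_run` and the sample payload (including the placeholder SHA) are hypothetical; the actual workflow does this in GitHub Actions expressions, not in Python.

```python
def summarize_failed_run(event):
    """Return a summary of a failed `workflow_run` event, or None otherwise.

    Hypothetical helper: mirrors the fields the commit message lists
    (conclusion, pull_requests, head_sha, name).
    """
    run = event.get("workflow_run") or {}
    if run.get("conclusion") != "failure":
        return None  # only failed runs are of interest here
    return {
        "workflow": run.get("name"),
        "head_sha": run.get("head_sha"),
        "pr_numbers": [pr["number"] for pr in run.get("pull_requests") or []],
    }


# Made-up payload for illustration; only the field names follow the commit message.
sample_event = {
    "workflow_run": {
        "name": "CICD NeMo",
        "conclusion": "failure",
        "head_sha": "abc123",  # placeholder, not a real commit
        "pull_requests": [{"number": 15626}],
    }
}
summary = summarize_failed_run(sample_event)
```

Because the event is per workflow rather than per check, one such summary corresponds to one investigation, which matches the "one investigation per failing workflow" semantics described in the commit message.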
@pzelasko pzelasko removed the Has Babysitter Claude autonomously takes over the quest to get to a green CI. label Apr 21, 2026
@github-actions
Contributor

Babysitter deactivated for this PR.

@pzelasko pzelasko added the Has Babysitter Claude autonomously takes over the quest to get to a green CI. label Apr 21, 2026
@github-actions
Contributor

I'll monitor this PR until CI is green. I'll post a plan for any fix and wait for your approval before pushing anything. Ping me by removing 'Has Babysitter' to cancel.

@github-actions
Contributor

I wasn't able to investigate the CI failure in Build, test, and publish a PyPi wheel (to testpypi).. @pzelasko, could you take a look?

@chtruong814 chtruong814 removed the Has Babysitter Claude autonomously takes over the quest to get to a green CI. label Apr 21, 2026
@github-actions
Contributor

Babysitter deactivated for this PR.

Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Comment thread nemo/utils/exp_manager.py
@github-actions github-actions Bot removed the Run CICD label Apr 21, 2026
@github-actions
Contributor

[🤖]: Hi @pzelasko 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

Signed-off-by: Piotr Żelasko <petezor@gmail.com>
@copy-pr-bot

copy-pr-bot Bot commented Apr 22, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@chtruong814 chtruong814 added the Has Babysitter Claude autonomously takes over the quest to get to a green CI. label Apr 23, 2026
@chtruong814
Collaborator

@chtruong814 the PR Babysitter is restricted to the NVIDIA-NeMo speech_team. Removing the Has Babysitter label.

@chtruong814 chtruong814 removed the Has Babysitter Claude autonomously takes over the quest to get to a green CI. label Apr 23, 2026
@github-actions
Contributor

Babysitter deactivated for this PR.

@chtruong814 chtruong814 added the Has Babysitter Claude autonomously takes over the quest to get to a green CI. label Apr 23, 2026
@github-actions
Contributor

I'll monitor this PR until CI is green. I'll post a plan for any fix and wait for your approval before pushing anything. Ping me by removing 'Has Babysitter' to cancel.

@github-actions
Contributor

I wasn't able to investigate the CI failure in CICD NeMo. @pzelasko, could you take a look?

@chtruong814 chtruong814 removed the Has Babysitter Claude autonomously takes over the quest to get to a green CI. label Apr 23, 2026
@github-actions
Contributor

Babysitter deactivated for this PR.

@github-actions
Contributor

[🤖]: Hi @pzelasko 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

@pzelasko
Collaborator Author

Merging: the one flaky test that failed is already fixed in main.

@pzelasko pzelasko merged commit 7913794 into main Apr 23, 2026
266 of 278 checks passed
@pzelasko pzelasko deleted the remove-global-local-rank-log-files branch April 23, 2026 17:43
Labels

CI core Changes to NeMo Core

4 participants