
ci: restore perf test torchrun logs #4951

Merged: chtruong814 merged 1 commit into NVIDIA:main from chtruong814:chtruong/fix-perf-tests on May 23, 2026

@chtruong814 (Contributor)

Summary

  • Restore torchrun per-rank log emission in the perf test harness.
  • Create {assets_dir}/logs/1 beside {assets_dir}/perf_results so launch_jet_workload.py can find std*.log assets.
  • Fixes the gpt_16b_perf retry loop introduced when PR #4917 ("Perf tests") removed the torchrun log arguments.
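The two harness changes in the summary can be sketched in bash as follows. This is a minimal sketch under stated assumptions, not the contents of run_perf_test.sh: the `prepare_assets_dirs` helper, the `ASSETS_DIR` variable and the `TORCHRUN_LOG_ARGS` array are hypothetical, and using torchrun's `--log-dir` and `--tee` options is an assumption about which "torchrun log arguments" were removed.

```shell
#!/usr/bin/env bash
# Sketch only: helper name, ASSETS_DIR default and the chosen torchrun flags
# are assumptions, not taken from run_perf_test.sh.
set -euo pipefail

# Hypothetical assets root; a temp dir keeps the sketch self-contained.
ASSETS_DIR="${ASSETS_DIR:-$(mktemp -d)}"

# Hypothetical helper: create perf_results and logs/1 side by side.
prepare_assets_dirs() {
    local assets_dir="$1"
    mkdir -p "${assets_dir}/perf_results"
    # Directory intended to hold the per-rank std*.log files.
    mkdir -p "${assets_dir}/logs/1"
}

prepare_assets_dirs "${ASSETS_DIR}"

# Assumed torchrun log arguments: --log-dir sets the root for per-rank logs,
# --tee 3 copies both stdout and stderr of every rank to files and the console.
TORCHRUN_LOG_ARGS=(
    --log-dir "${ASSETS_DIR}/logs/1"
    --tee 3
)

# Print the launch command shape instead of running torchrun.
echo "torchrun ${TORCHRUN_LOG_ARGS[*]} <training script> <args>"
```

In recent PyTorch releases torchrun writes the per-rank `stdout.log` and `stderr.log` files into nested subdirectories (run ID, attempt, local rank) under the `--log-dir` path, so any consumer globbing for `std*.log` would presumably need to search that tree rather than only its top level. Older releases spell the flag `--log_dir`.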

Test Plan

  • bash -n tests/performance_tests/shell_test_utils/run_perf_test.sh
  • git diff --check -- tests/performance_tests/shell_test_utils/run_perf_test.sh

Signed-off-by: Charlie Truong <chtruong@nvidia.com>
@copy-pr-bot (bot) commented May 23, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

chtruong814 marked this pull request as ready for review on May 23, 2026 at 01:48.
svcnvidia-nemo-ci requested a review from a team on May 23, 2026 at 01:48.
@chtruong814 (Contributor, Author)

Fast-merging to resolve an internal testing issue. This script is only used in internal tests.

chtruong814 merged commit f7f584d into NVIDIA:main on May 23, 2026 (28 checks passed).