This repository was archived by the owner on Nov 19, 2025. It is now read-only.

fix: session start fixture correctly handles being launched by mpirun #486

Merged
terrykong merged 1 commit into main from tk/mpi-unit-test-fix on Jan 21, 2025

@terrykong
Collaborator

What does this PR do?

This fix had only landed in dev, so this PR cherry-picks it into main.
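
The PR text does not include the diff, so the following is only a minimal sketch of what "handling being launched by mpirun" can look like in a session-start hook; it is not the code merged here. It assumes Open MPI, whose `mpirun` exports `OMPI_COMM_WORLD_RANK`, `OMPI_COMM_WORLD_SIZE`, and `OMPI_COMM_WORLD_LOCAL_RANK` to each process. The helper names `detect_mpirun_launch` and `apply_mpirun_env`, and the choice to mirror the values into `RANK`, `WORLD_SIZE`, and `LOCAL_RANK`, are hypothetical.

```python
import os

# Variables that Open MPI's mpirun exports to every process it launches.
_OMPI_RANK = "OMPI_COMM_WORLD_RANK"
_OMPI_SIZE = "OMPI_COMM_WORLD_SIZE"
_OMPI_LOCAL_RANK = "OMPI_COMM_WORLD_LOCAL_RANK"


def detect_mpirun_launch(environ=None):
    """Return rank info if the process looks mpirun-launched, else None.

    Hypothetical helper: the returned keys follow the torchrun-style
    names (RANK, WORLD_SIZE, LOCAL_RANK) purely for illustration.
    """
    env = os.environ if environ is None else environ
    if _OMPI_RANK not in env or _OMPI_SIZE not in env:
        return None
    return {
        "RANK": env[_OMPI_RANK],
        "WORLD_SIZE": env[_OMPI_SIZE],
        # Fall back to local rank 0 if the launcher did not export it.
        "LOCAL_RANK": env.get(_OMPI_LOCAL_RANK, "0"),
    }


def apply_mpirun_env(environ=None):
    """Mirror mpirun's rank variables without overwriting existing values.

    Returns True if an mpirun launch was detected, False otherwise.
    """
    env = os.environ if environ is None else environ
    mapping = detect_mpirun_launch(env)
    if mapping is None:
        return False
    for key, value in mapping.items():
        env.setdefault(key, value)
    return True
```

A session-scoped, autouse pytest fixture could call `apply_mpirun_env()` once before any test initializes distributed state; whether the actual fixture works this way is an assumption.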

Changelog

  • Please update the CHANGELOG.md under next version with high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

Checklist when contributing a new algorithm

  • Does the trainer resume and restore all model states?
  • Does the trainer support all parallelism techniques (PP, TP, DP)?
  • Does the trainer support max_steps=-1 and validation?
  • Does the trainer only call APIs defined in alignable_interface.py?
  • Does the trainer have proper logging?

Additional Information

  • Related to # (issue)

Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong requested a review from ko3n1g January 17, 2025 22:25
@terrykong terrykong added the Run CICD Set + un-set to retrigger (add after r*.*.* labels) label Jan 17, 2025
@terrykong terrykong enabled auto-merge (squash) January 17, 2025 22:26
@terrykong terrykong merged commit 9512ee8 into main Jan 21, 2025
@terrykong terrykong deleted the tk/mpi-unit-test-fix branch January 21, 2025 14:50

Labels

Run CICD Set + un-set to retrigger (add after r*.*.* labels)


2 participants