[JAX] Add wait per multi-proc cleanup in L0_jax_distributed_unittest #2979
Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
/te-ci JAX L0
Greptile Summary
This PR adds a
Confidence Score: 5/5
Safe to merge: the one-line addition to each script correctly closes the process-reaping race at script exit.
Both changes are minimal and targeted.
No files require special attention.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Start test loop] --> B[Spawn N GPU processes as background jobs]
B --> C[wait — all test processes finish]
C --> D[Check log for PASS/FAIL/SKIP]
D --> E[wait — before log cleanup]
E --> F[rm log files]
F --> G{More test cases?}
G -- Yes --> A
G -- No --> H[wait — post-loop]
H --> I[cleanup — send SIGTERM + SIGKILL to any lingering PIDs]
I --> J["wait NEW — block until killed processes are reaped"]
J --> K[exit HAS_FAILURE]
K --> L[EXIT trap fires cleanup again — harmless, kill -0 guards no-op]
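The tail of the flow above (cleanup, the new wait, then exit) can be sketched in bash roughly as follows. This is a minimal illustration under stated assumptions, not the actual contents of the test scripts: the names PIDS and cleanup, the sleep commands standing in for per-GPU test processes, and the one-second grace period are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of "cleanup, then wait" at the end of a multi-process test script.
# PIDS, cleanup, and the sleep stand-ins are illustrative names, not taken from the PR.

PIDS=()

cleanup() {
  local pid
  # Ask any still-running background process to stop.
  for pid in "${PIDS[@]}"; do
    if kill -0 "$pid" 2>/dev/null; then
      kill -TERM "$pid" 2>/dev/null
    fi
  done
  sleep 1
  # Force-kill whatever ignored SIGTERM; kill -0 makes this a no-op for dead PIDs.
  for pid in "${PIDS[@]}"; do
    if kill -0 "$pid" 2>/dev/null; then
      kill -KILL "$pid" 2>/dev/null
    fi
  done
}
# Running cleanup a second time at exit is harmless because of the kill -0 guards.
trap cleanup EXIT

# Stand-ins for the per-GPU test processes, started as background jobs.
for i in 0 1; do
  sleep 30 &
  PIDS+=("$!")
done

cleanup
# The added step: block until the signalled children have actually been reaped,
# so nothing started afterwards can overlap with them.
wait

# Count how many of the recorded PIDs still exist after the wait.
alive=0
for pid in "${PIDS[@]}"; do
  if kill -0 "$pid" 2>/dev/null; then
    alive=$((alive + 1))
  fi
done
echo "processes still alive after wait: $alive"
```

Without the wait, the script could reach its exit while the signalled children are still shutting down, which is the race the flowchart labels as closed by the new step.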
Reviews (2): Last reviewed commit: "Merge branch 'main' into cgemm_mprocs_fi..."
NVIDIA#2979) add wait per multi-proc test cleanup Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
Description
Add a wait after each multi-process test's cleanup in L0_jax_distributed_unittest so that a later test's processes cannot start before the previous test's cleanup has finished. This helps prevent the mismatch issues in CGEMM tests reported by QA.
Type of change
Checklist: