@cijohnson

Motivation

Add the RCCL_P2P_BATCH_ENABLE parameter to the multinode RCCL tests.

Technical Details

  • Added nccl_p2p_batch_enable test parameter with values ["0", "1"] to rccl_multinode_cvs.py
  • Updated rccl_lib.py to include the parameter in MPI command execution
  • Added parameter to rccl_config.json configuration file
  • Doubled the number of test combinations so that both the P2P batching enabled and disabled cases are covered.
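
As a rough illustration of the points above, the sketch below shows how adding a two-valued parameter doubles the test matrix and how the value might be exported to every rank. It is not the code from rccl_lib.py or rccl_multinode_cvs.py: the helper names `build_combinations` and `build_env_args` and the collective list are hypothetical, and it assumes an Open MPI style launcher where `-x NAME=value` exports an environment variable.

```python
from itertools import product

# Illustrative list of collectives; only the nccl_p2p_batch_enable values
# ["0", "1"] are taken from this PR.
collectives = ["all_reduce_perf", "all_gather_perf"]
nccl_p2p_batch_enable_values = ["0", "1"]


def build_combinations(collectives, p2p_values):
    """Cross every collective with every P2P batching setting (hypothetical helper)."""
    return list(product(collectives, p2p_values))


def build_env_args(p2p_batch_enable):
    """Return launcher arguments exporting the variable to all ranks.

    Assumes Open MPI, where `-x NAME=value` sets an environment variable.
    """
    return ["-x", f"RCCL_P2P_BATCH_ENABLE={p2p_batch_enable}"]


combos = build_combinations(collectives, nccl_p2p_batch_enable_values)
for collective, p2p in combos:
    print(collective, " ".join(build_env_args(p2p)))
```

With two collectives and two parameter values this produces four combinations, twice as many as without the new parameter.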

Test Plan

Execute all combinations of the RCCL multinode tests.

Test Result

WIP

Submission Checklist

@cijohnson cijohnson requested a review from venksrin09 November 21, 2025 03:28
@cijohnson cijohnson force-pushed the ichristo/add_p2p_batch_test_combo branch from 413a35d to 22a3b6c Compare November 21, 2025 03:39
rccl tests.

- Added nccl_p2p_batch_enable test parameter with values ["0", "1"] to rccl_multinode_cvs.py
- Updated rccl_lib.py to include the parameter in MPI command execution
- Added parameter to rccl_config.json configuration file
- Doubles test combinations to cover P2P batching enable/disable cases.
- RCCL_P2P_BATCH_ENABLE=1 is only tested on clusters with ≤32 nodes

Signed-off-by: Ignatious Johnson <ichristo@amd.com>
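
The ≤32-node restriction mentioned in the commit message could be expressed roughly as follows. This is only a sketch under assumptions: the function name `p2p_batch_values_for` and the constant are hypothetical, and the PR may implement the limit differently (for example by skipping the test at runtime rather than filtering the parameter list).

```python
# Node limit taken from the commit message; the name is hypothetical.
MAX_NODES_FOR_P2P_BATCH = 32


def p2p_batch_values_for(num_nodes, values=("0", "1")):
    """Return the RCCL_P2P_BATCH_ENABLE values to test on a cluster of num_nodes."""
    if num_nodes <= MAX_NODES_FOR_P2P_BATCH:
        return list(values)
    # On larger clusters, drop the enabled case and keep the rest.
    return [v for v in values if v != "1"]


print(p2p_batch_values_for(16))
print(p2p_batch_values_for(64))
```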
@cijohnson cijohnson force-pushed the ichristo/add_p2p_batch_test_combo branch from 22a3b6c to 465873a Compare November 23, 2025 18:50