Open
Description
I tried the MPI code changes described in 03-job-launchers/README.md, but soon realised that the local rank was missing. I see that you added it as a command-line argument, but would it not be better to read it from the OMPI_COMM_WORLD_LOCAL_RANK environment variable?
I made these changes:
- rank = int(os.getenv("RANK", "0"))
- local_rank = rank % torch.cuda.device_count()
- world_size = int(os.getenv("WORLD_SIZE", "1"))
+ rank = int(os.getenv("OMPI_COMM_WORLD_RANK", "0"))
+ local_rank = int(os.getenv("OMPI_COMM_WORLD_LOCAL_RANK", "0"))
+ world_size = int(os.getenv("OMPI_COMM_WORLD_SIZE", "1"))
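A minimal sketch of one way to support both launchers with the same script, assuming Open MPI's `mpirun` exports the `OMPI_COMM_WORLD_*` variables and `torchrun` exports `RANK`, `LOCAL_RANK` and `WORLD_SIZE`. The names `Ranks`, `_first_int` and `resolve_ranks` are hypothetical helpers for illustration and are not part of the repository:

```python
import os
from typing import Mapping, NamedTuple, Optional, Tuple


class Ranks(NamedTuple):
    rank: int
    local_rank: int
    world_size: int


def _first_int(env: Mapping[str, str], names: Tuple[str, ...], default: int) -> int:
    # Return the first variable in `names` that is set, parsed as an int.
    for name in names:
        value = env.get(name)
        if value is not None:
            return int(value)
    return default


def resolve_ranks(env: Optional[Mapping[str, str]] = None) -> Ranks:
    # Hypothetical helper: prefer the Open MPI variables, then fall back to
    # the torchrun-style ones, then to single-process defaults.
    env = os.environ if env is None else env
    return Ranks(
        rank=_first_int(env, ("OMPI_COMM_WORLD_RANK", "RANK"), 0),
        local_rank=_first_int(env, ("OMPI_COMM_WORLD_LOCAL_RANK", "LOCAL_RANK"), 0),
        world_size=_first_int(env, ("OMPI_COMM_WORLD_SIZE", "WORLD_SIZE"), 1),
    )
```

In the training script this would be used as `rank, local_rank, world_size = resolve_ranks()`. Reading the local rank from the launcher avoids deriving it from `rank % torch.cuda.device_count()`, which only matches the launcher's own numbering when ranks are assigned to nodes in contiguous blocks of that size.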