Closed
Labels
ci (Continuous Integration), feature (Is an improvement or enhancement), help wanted (Open to be worked on)
Description
🚀 CI / Tests
About three tests are currently failing on master, with a mix of NCCL errors and "RuntimeError: Address already in use".
We need to find the cause of these failures and adjust CI so that it reports them properly.
FAILED tests/plugins/test_deepspeed_plugin.py::test_deepspeed_multigpu_stage_2_accumulated_grad_batches[True]
FAILED tests/plugins/test_deepspeed_plugin.py::test_deepspeed_multigpu_stage_2_accumulated_grad_batches[False]
FAILED tests/plugins/test_sharded_plugin.py::test_ddp_sharded_plugin_manual_optimization[ddp_sharded_spawn]
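
One possible source of "Address already in use" in distributed tests is two process groups trying to bind the same rendezvous port. This has not been confirmed as the cause here. The sketch below assumes the tests read the standard `MASTER_PORT` environment variable; the helper names `find_free_port` and `set_random_master_port` are hypothetical and are not part of the test suite.

```python
import os
import socket


def find_free_port() -> int:
    """Ask the OS for a TCP port that is unused on localhost right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 lets the OS pick any free port.
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def set_random_master_port() -> int:
    """Hypothetical helper: point MASTER_PORT at a fresh port before a test."""
    port = find_free_port()
    os.environ["MASTER_PORT"] = str(port)
    return port
```

Calling `set_random_master_port()` at the start of each distributed test would reduce the chance of port collisions, though another process can still grab the port between the moment the socket closes and the moment the test binds it.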