torch_musa Release v2.7.1
torch_musa v2.7.1 bug fix release
torch_musa v2.7.1 is now available. This is an enhanced version of v2.7.0, aimed at fixing issues, adding more operators, and optimizing FSDP performance.
Enhancement
Operators:
- Fix error of `BCELoss` with non-contiguous inputs;
- Fix error when a tensor is divided by scalar 1;
- Fix `torch.conj` running into an infinite loop;
- Fix empty `param_group` when the optimizer was initialized with CPU tensors;
- Many additional operators are now supported; check ops_list.md for details;
Features:
- Configurable overlap strategies in `FSDP2`. We've introduced the `TORCH_MUSA_FSDP2_OVERLAP_LEVEL` environment variable to let you control how communication overlaps with computation, enabling explicit trade-offs between memory usage and performance. The available overlap strategies are listed below:
  - 0 (NO_OVERLAP): No overlap; mostly intended for experimental use
  - 1 (OVERLAP_FSDP_COMM_ONLY): Overlap FSDP collectives only, with lower memory usage
  - 2 (OVERLAP_FSDP_COMM_COPY_IN_WITH_COPY_OUT): Overlap communication/input copies with computation
  - 3 (OVERLAP_FSDP_COMM_COPY_IN_WITH_COMM): Overlap of the inter-node all-reduce with computation is disabled
  - 4 (OVERLAP_HSDP_COMM): Maximum communication overlap, which is PyTorch's default setting
- Expose StreamContext in torch_musa;
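
The `TORCH_MUSA_FSDP2_OVERLAP_LEVEL` values listed above can also be set from Python rather than from the shell. The following is a minimal sketch, not part of torch_musa: the `OverlapLevel` enum and the `set_overlap_level` helper are hypothetical names introduced here for illustration, and only the variable name and the level numbers come from these notes. It assumes the variable is read when FSDP2 is set up, so the safest place to call the helper is before importing torch_musa; the release notes do not state when the value is actually read.

```python
import os
from enum import IntEnum

# Environment variable name taken from the release notes.
ENV_VAR = "TORCH_MUSA_FSDP2_OVERLAP_LEVEL"


class OverlapLevel(IntEnum):
    """Hypothetical helper enum mirroring the levels listed in the notes."""

    NO_OVERLAP = 0
    OVERLAP_FSDP_COMM_ONLY = 1
    OVERLAP_FSDP_COMM_COPY_IN_WITH_COPY_OUT = 2
    OVERLAP_FSDP_COMM_COPY_IN_WITH_COMM = 3
    OVERLAP_HSDP_COMM = 4


def set_overlap_level(level):
    """Validate `level` and export it to this process and its children.

    Raises ValueError if `level` is not one of the documented values 0..4.
    """
    level = OverlapLevel(level)
    os.environ[ENV_VAR] = str(int(level))
    return level


if __name__ == "__main__":
    # Assumption: call this before importing torch_musa so the setting
    # is visible whenever the library reads the environment.
    chosen = set_overlap_level(OverlapLevel.OVERLAP_FSDP_COMM_ONLY)
    print(f"{ENV_VAR}={os.environ[ENV_VAR]} ({chosen.name})")
```

Validating against the enum catches a mistyped level before a long training job starts. Exporting the variable in the launch script (for example `export TORCH_MUSA_FSDP2_OVERLAP_LEVEL=1`) achieves the same thing without any Python code.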
Enjoy.