torch_musa v2.7.1 bug fix release

torch_musa v2.7.1 is now available. It is an enhanced version of v2.7.0 that fixes issues, adds more operators, and optimizes FSDP performance.

Enhancements

Operators:

Fix an error in BCELoss with non-contiguous inputs;
Fix an error when a tensor is divided by the scalar 1;
Fix torch.conj running into an infinite loop;
Fix empty param_group when the optimizer is initialized with CPU tensors;
Many more operators are now supported; see ops_list.md for details;

Features:

Configurable overlap strategies in FSDP2. We've introduced the TORCH_MUSA_FSDP2_OVERLAP_LEVEL environment variable to let you control how communication overlaps with computation, enabling explicit trade-offs between memory usage and performance. The available overlap strategies are listed below:
- 0 (NO_OVERLAP): No overlap; intended mostly for experimental use
- 1 (OVERLAP_FSDP_COMM_ONLY): Overlap FSDP collectives only, with lower memory usage
- 2 (OVERLAP_FSDP_COMM_COPY_IN_WITH_COPY_OUT): Overlap communication/input copies with computation
- 3 (OVERLAP_FSDP_COMM_COPY_IN_WITH_COMM): Overlap of the inter-node all-reduce with computation is disabled
- 4 (OVERLAP_HSDP_COMM): Maximum communication overlap, which is PyTorch's default setting
Expose StreamContext in torch_musa;
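
As a rough sketch of how the TORCH_MUSA_FSDP2_OVERLAP_LEVEL setting described above might be applied from Python: the snippet below validates a level against the five documented values and writes it to the environment. The `OverlapLevel` enum and `set_overlap_level` helper are hypothetical names used only for illustration and are not part of torch_musa. It also assumes the variable is read when torch_musa or FSDP2 initializes, so it is set before any import. Exporting the variable in the launch shell should work equally well.

```python
import os
from enum import IntEnum

ENV_VAR = "TORCH_MUSA_FSDP2_OVERLAP_LEVEL"


class OverlapLevel(IntEnum):
    """Hypothetical mirror of the levels listed in the release notes."""
    NO_OVERLAP = 0
    OVERLAP_FSDP_COMM_ONLY = 1
    OVERLAP_FSDP_COMM_COPY_IN_WITH_COPY_OUT = 2
    OVERLAP_FSDP_COMM_COPY_IN_WITH_COMM = 3
    OVERLAP_HSDP_COMM = 4


def set_overlap_level(level):
    """Validate the level and export it for the current process.

    Raises ValueError if the level is not one of the documented values 0..4.
    """
    level = OverlapLevel(level)
    os.environ[ENV_VAR] = str(int(level))
    return level


# Prefer lower memory usage over maximum overlap.
set_overlap_level(OverlapLevel.OVERLAP_FSDP_COMM_ONLY)

# Assumption: import torch / torch_musa only after the variable is set.
# import torch
# import torch_musa
```

Validating up front keeps a typo such as level 5 from being silently passed through to the training job.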

Enjoy.
