torch_musa Release v2.7.1

@fmo-mt fmo-mt released this 19 Jan 12:21
· 8 commits to main since this release
0bc05bf

torch_musa v2.7.1 bug fix release

torch_musa v2.7.1 is now available. This release builds on v2.7.0: it fixes issues, adds more operators, and optimizes FSDP performance.

Enhancements

Operators:

  • Fix BCELoss error with non-contiguous inputs;
  • Fix error when a tensor is divided by the scalar 1;
  • Fix torch.conj running into an infinite loop;
  • Fix empty param_group when the optimizer was initialized with CPU tensors;
  • Many more operators are supported (see ops_list.md for details);

Features:

  • Configurable overlap strategies in FSDP2. We've introduced the TORCH_MUSA_FSDP2_OVERLAP_LEVEL environment variable to let you control how communication overlaps with computation, enabling explicit trade-offs between memory usage and performance. The available overlap strategies are listed below:
    • 0 (NO_OVERLAP): No overlap; mostly for experimental use
    • 1 (OVERLAP_FSDP_COMM_ONLY): Overlap FSDP collectives only, with lower memory usage
    • 2 (OVERLAP_FSDP_COMM_COPY_IN_WITH_COPY_OUT): Overlap communication/input copies with computation
    • 3 (OVERLAP_FSDP_COMM_COPY_IN_WITH_COMM): Overlap of the inter-node all-reduce with computation is disabled
    • 4 (OVERLAP_HSDP_COMM): Maximum communication overlap, which is PyTorch's default setting
  • Expose StreamContext in torch_musa;
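
As a rough illustration of the TORCH_MUSA_FSDP2_OVERLAP_LEVEL setting above, the sketch below validates a level against the five listed strategies and writes it to the environment. The `OverlapLevel` enum and the `set_fsdp2_overlap_level` helper are hypothetical names introduced here and are not part of torch_musa. The notes do not say when the variable is read, so the sketch assumes it must be set before torch_musa is imported and the FSDP2 model is built.

```python
import os
from enum import IntEnum


class OverlapLevel(IntEnum):
    """Overlap levels as listed in the release notes (hypothetical helper enum)."""
    NO_OVERLAP = 0
    OVERLAP_FSDP_COMM_ONLY = 1
    OVERLAP_FSDP_COMM_COPY_IN_WITH_COPY_OUT = 2
    OVERLAP_FSDP_COMM_COPY_IN_WITH_COMM = 3
    OVERLAP_HSDP_COMM = 4


ENV_VAR = "TORCH_MUSA_FSDP2_OVERLAP_LEVEL"


def set_fsdp2_overlap_level(level, env=os.environ):
    """Validate `level` and store it in `env` as a decimal string."""
    # OverlapLevel(...) raises ValueError for anything outside 0..4.
    validated = OverlapLevel(int(level))
    env[ENV_VAR] = str(int(validated))
    return validated


if __name__ == "__main__":
    chosen = set_fsdp2_overlap_level(1)
    print(f"{ENV_VAR}={os.environ[ENV_VAR]} ({chosen.name})")
    # Assumption: import torch / torch_musa and wrap the model with FSDP2
    # only after this point, so the setting is visible when it is read.
```

Exporting the variable in the launching shell should have the same effect; the helper only adds a range check so that an unsupported value fails early.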

Enjoy.