torch_musa Release v2.1.1

@fmo-mt fmo-mt released this 09 Sep 13:19
· 32 commits to main since this release
973ed69

torch_musa v2.1.1 bug fix release

torch_musa v2.1.1 is now available. This release builds on v2.1.0, fixing issues discovered in projects and improving core features. Despite some known issues, the complete functional and integration tests pass on MUSA 4.2.0. The number of natively supported operators has increased to over 948.

New Features

  • Added a musagraphs backend for torch.compile, which reduces host overhead and provides end-to-end acceleration via musa-graph.
  • muSolver has been integrated into the backend of several linalg operators, including lu_factor_ex, lu_solve, solve_ex, cholesky_ex...
  • FusedAdamW/FusedAdam on MUSA now work with DTensor and other Tensor variants that are based on the torch_dispatch mechanism.
  • The benchmark module has been expanded to include more operator cases.
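
As a rough illustration of the linalg operators listed above, the sketch below calls the standard torch.linalg entry points on a small well-conditioned system. It assumes that importing torch_musa registers a "musa" device and exposes torch.musa.is_available(); it falls back to CPU when torch_musa is not installed, so it shows only the calling pattern, not which backend handles each call.

```python
import torch

try:
    import torch_musa  # noqa: F401  (assumed to register the "musa" device)
    device = "musa" if torch.musa.is_available() else "cpu"
except ImportError:
    device = "cpu"  # fallback so the sketch still runs without MUSA

torch.manual_seed(0)

# Build a symmetric positive-definite matrix so every factorization is valid.
A = torch.randn(4, 4, device=device)
A = A @ A.T + 4 * torch.eye(4, device=device)
B = torch.randn(4, 2, device=device)

# LU factorization followed by a solve that reuses the factors.
LU, pivots, lu_info = torch.linalg.lu_factor_ex(A)
X_lu = torch.linalg.lu_solve(LU, pivots, B)

# Direct solve and Cholesky factorization; the *_ex variants return an info
# tensor instead of raising on failure.
X_solve, solve_info = torch.linalg.solve_ex(A, B)
L, chol_info = torch.linalg.cholesky_ex(A)

# Largest absolute residual of A @ X = B, as a quick sanity check.
residual = (A @ X_lu - B).abs().max().item()
```

An info value of 0 from each *_ex call indicates that the corresponding factorization or solve succeeded.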

Enhancements

  • Fixed the occurrence of 0 values in exponential, inspired by Intel MKL vRngExponential(...)
  • Ensured early return for some ops on 0-numel inputs
  • Optimized one-hot by eliminating redundant preprocessing logic
  • Added rrelu_with_noise and nansum; RoPE now supports multi-latent
  • Extended SDPA to accept no-batch inputs; mask-grad is enabled only for the math backend
  • Fixed a scatter_reduce crash and cross-entropy cases with none mode
  • Improved the bandwidth of binary ops in cases where the rhs is not last-contiguous
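
To make the exponential fix in the list above more concrete, here is a minimal pure-Python sketch of one way a sampler can avoid returning exactly 0. It assumes inverse-transform sampling from a uniform draw u in (0, 1], where log(1) = 0 would otherwise produce a zero sample. The helper name exponential_from_uniform and the epsilon-based threshold are illustrative assumptions, not the actual torch_musa kernel or the Intel MKL routine.

```python
import math
import sys


def exponential_from_uniform(u: float, lam: float = 1.0) -> float:
    """Map a uniform sample u in (0, 1] to a strictly positive exponential sample."""
    eps = sys.float_info.epsilon
    # log(1.0) is exactly 0, which would yield a zero sample. For u at (or
    # within rounding of) 1.0, substitute a tiny negative value instead.
    if u >= 1.0 - eps / 2:
        log_u = -eps / 2
    else:
        log_u = math.log(u)
    return -log_u / lam


# The boundary input no longer maps to 0.
boundary_sample = exponential_from_uniform(1.0)
typical_sample = exponential_from_uniform(0.5)
```

The trade-off in this sketch is a very small distortion of the distribution near u = 1 in exchange for the guarantee that every sample is strictly positive, which matters for downstream code that takes the log or reciprocal of the sample.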