torch_musa Release v2.1.1
torch_musa v2.1.1 bug fix release
torch_musa v2.1.1 is now available. It is an enhanced version of v2.1.0 that fixes issues discovered in projects and improves core features. Some known issues remain, but the complete functional and integration tests pass on MUSA 4.2.0. The number of natively supported operators has grown to more than 948.
New Features
- Support the musagraphs backend for torch.compile, which reduces host overhead and provides end-to-end acceleration through musa-graph.
- muSolver has been integrated into the backend of several linalg operators, including lu_factor_ex, lu_solve, solve_ex, cholesky_ex...
- FusedAdamW/FusedAdam on MUSA are now available for DTensor and other Tensor variants that are based on the __torch_dispatch__ mechanism.
- Benchmark module has been expanded to include more operator cases.
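
As a rough illustration of the torch.compile item in the list above, the sketch below selects a backend by name and compiles a small module. The backend string "musagraphs" and the "musa" device string are assumptions read from these notes, not a confirmed API. The helper names select_compile_backend and compile_and_run are hypothetical, and the fallback to the stock "eager" backend exists only so the sketch also runs on machines without torch_musa.

```python
import importlib.util


def select_compile_backend() -> str:
    """Pick a torch.compile backend name.

    "musagraphs" is the backend name as read from the release notes
    (an assumption); "eager" is a stock PyTorch debugging backend that
    serves as a fallback when torch_musa is not installed.
    """
    if importlib.util.find_spec("torch_musa") is not None:
        return "musagraphs"
    return "eager"


def compile_and_run():
    """Compile a tiny module and run it once. Requires PyTorch."""
    import torch

    backend = select_compile_backend()
    device = "cpu"
    if backend == "musagraphs":
        # Assumption: importing torch_musa registers the "musa" device.
        import torch_musa  # noqa: F401
        device = "musa"

    model = torch.nn.Linear(16, 4).to(device)
    compiled = torch.compile(model, backend=backend)
    x = torch.randn(8, 16, device=device)
    # The first call triggers compilation; later calls reuse the result.
    return compiled(x).shape
```

Graph-capture backends tend to pay off most when input shapes stay fixed across calls, so a short warm-up before timing is advisable.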
Enhancements
- Fixed the occurrence of 0 values in exponential, inspired by Intel MKL vRngExponential(...)
- Ensured early return for some ops on 0-numel inputs
- Optimized one-hot by eliminating redundant preprocessing logic
- Added rrelu_with_noise and nansum; RoPE now supports multi-latent
- Extended SDPA to accept no-batch inputs; mask-grad is enabled only for the math backend
- Fixed scatter_reduce crash and cross-entropy with none mode cases
- Improved bandwidth of binary ops when the rhs is not last-dim contiguous
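
To show why an exponential sampler can return exactly 0, the sketch below uses plain inverse-transform sampling in pure Python. It is not the torch_musa kernel and not Intel MKL's algorithm. The function name exponential_from_uniform and the clamping guard are hypothetical, chosen only to illustrate the class of problem the fix in the list above addresses.

```python
import math
import sys


def exponential_from_uniform(u: float, rate: float = 1.0) -> float:
    """Map a uniform sample u in [0, 1) to an Exp(rate) sample.

    Inverse-transform sampling uses x = -log(1 - u) / rate. When the
    uniform generator returns exactly 0.0, x is exactly 0.0, which lies
    outside the open support (0, inf) and breaks callers that later
    take log(x) or divide by x.
    """
    # Hypothetical guard: nudge u up to the smallest positive normal
    # float so the result is strictly positive.
    u = max(u, sys.float_info.min)
    # log1p keeps precision for tiny u, where 1 - u would round to 1.
    return -math.log1p(-u) / rate
```

A GPU kernel would apply the same idea in its own floating-point type. The point is only that the boundary value of the uniform generator has to be handled explicitly.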