v3.0.0

UCHINO-Yuki released this 04 Jun 06:47

Major: Improve GEMM performance and add Level 3 BLAS/mixed-precision support

  • Improve the performance of the existing GEMM implementation:
    • gemmul8::gemm, gemmul8::gemmLt
  • Add support for the following Level 3 BLAS-like matrix operations:
    • SYMM (gemmul8::symm, gemmul8::symmLt)
    • SYRK (gemmul8::syrk, gemmul8::syrkLt)
    • SYR2K (gemmul8::syr2k, gemmul8::syr2kLt)
    • SYRKX (gemmul8::syrkx, gemmul8::syrkxLt)
    • HERK (gemmul8::herk, gemmul8::herkLt)
    • HER2K (gemmul8::her2k, gemmul8::her2kLt)
    • HERKX (gemmul8::herkx, gemmul8::herkxLt)
    • TRMM (gemmul8::trmm, gemmul8::trmmLt)
    • TRSM (gemmul8::trsm, gemmul8::trsmLt)
    • TRTRMM (gemmul8::trtrmm, gemmul8::trtrmmLt): triangular-by-triangular matrix multiplication
  • Add support for mixed-precision execution
  • Add workspace-query support: call a GEMMul8 routine with work == nullptr to query its workspace requirement
  • Extend gemmul8::workSize to support all of the routines listed above except TRSM
  • Add gemmul8::workSizeTrsm for TRSM workspace-size calculation
  • Add TRSM block-size control APIs for the internal blocked algorithm:
    • gemmul8::set_block_size_trsm(int nB)
    • gemmul8::get_block_size_trsm()
  • Add overload (Hook Mode) support for the routines listed above
  • Add overload (Hook Mode) support for _64, 3m, and 3m_64 variants where applicable
  • Change the GEMMul8 routine argument type from unsigned num_moduli to int num_moduli
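The workspace-query item above follows a common two-call idiom: call once with a null workspace to learn the required size, allocate, then call again to do the work. The sketch below illustrates only that idiom on the CPU. `mock_square_matmul` and `multiply_with_query` are hypothetical stand-ins invented for this example; they are not GEMMul8 functions, and the real argument lists and the way GEMMul8 reports the size are not described in these notes and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in, NOT the GEMMul8 API.
// Computes C = A * B for n x n row-major matrices using n*n doubles of
// scratch space. If work == nullptr, nothing is computed and only the
// required workspace size in bytes is returned (query mode).
std::size_t mock_square_matmul(int n, const double* A, const double* B,
                               double* C, void* work) {
    const std::size_t count = static_cast<std::size_t>(n) * n;
    const std::size_t bytes = count * sizeof(double);
    if (work == nullptr) return bytes;  // query mode: report size only

    double* tmp = static_cast<double*>(work);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            double acc = 0.0;
            for (int k = 0; k < n; ++k) acc += A[i * n + k] * B[k * n + j];
            tmp[i * n + j] = acc;  // accumulate into the caller's workspace
        }
    }
    for (std::size_t i = 0; i < count; ++i) C[i] = tmp[i];
    return bytes;
}

// Two-call pattern: query the size, allocate, then run the computation.
std::vector<double> multiply_with_query(int n, const std::vector<double>& A,
                                        const std::vector<double>& B) {
    // First call: work == nullptr, so only the size is returned.
    const std::size_t bytes =
        mock_square_matmul(n, A.data(), B.data(), nullptr, nullptr);

    // Allocate the workspace as doubles so it is suitably aligned.
    std::vector<double> work(bytes / sizeof(double));
    std::vector<double> C(static_cast<std::size_t>(n) * n);

    // Second call: a non-null workspace triggers the actual computation.
    mock_square_matmul(n, A.data(), B.data(), C.data(), work.data());
    return C;
}
```

With this shape the caller never has to duplicate the routine's internal size formula, which is presumably the motivation for pairing the query mode with gemmul8::workSize and gemmul8::workSizeTrsm.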