AMD Optimized BLIS Version 2.2

@pradeeptrgit pradeeptrgit released this 30 Jun 07:07
· 1414 commits to master since this release

Highlights of improvements on AMD EPYC™ processor family CPUs

  • Improved performance for Level-1 BLAS routines for single and double precision.
  • Improved performance of SGEMV and DGEMV for large sizes.
  • Enabled small unpacked (SUP) GEMM kernels for single-precision and double-precision complex (C, Z) GEMM.
  • Enabled multi-threaded small unpacked (SUP) GEMM kernels for (S, D, C, Z) GEMM, improving performance for small and skinny matrices.
  • The GEMM selective packing feature is now multi-thread enabled. Selective packing packs matrix A, matrix B, or both, and can be enabled by setting an environment variable. Refer to the AOCL User Guide at https://developer.amd.com/amd-aocl/ for details.
  • Improved single-threaded and multi-threaded TRSM performance for large and skinny matrices.
  • Debug trace and log feature enabled for debug purposes.
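
For readers unfamiliar with the term, "packing" in the GEMM items above generally means copying a strided operand into a contiguous buffer before the compute kernel runs. The sketch below is a conceptual illustration only, not the BLIS implementation: the function names `pack_colmajor` and `gemm_ref` are hypothetical, the layout is assumed to be column-major, and real kernels use blocked, architecture-specific panel formats. It shows that a GEMM computed from a packed copy of A matches one computed from the original strided A.

```c
#include <stddef.h>

/* Hypothetical helper: copy an m-by-k column-major matrix stored with
   leading dimension lda into a contiguous buffer (leading dimension m). */
static void pack_colmajor(size_t m, size_t k,
                          const double *a, size_t lda,
                          double *packed)
{
    for (size_t j = 0; j < k; ++j)
        for (size_t i = 0; i < m; ++i)
            packed[j * m + i] = a[j * lda + i];
}

/* Hypothetical reference GEMM: C += A * B, all operands column-major.
   A is m-by-k, B is k-by-n, C is m-by-n. */
static void gemm_ref(size_t m, size_t n, size_t k,
                     const double *a, size_t lda,
                     const double *b, size_t ldb,
                     double *c, size_t ldc)
{
    for (size_t j = 0; j < n; ++j)
        for (size_t p = 0; p < k; ++p)
            for (size_t i = 0; i < m; ++i)
                c[j * ldc + i] += a[p * lda + i] * b[j * ldb + p];
}
```

Calling `gemm_ref` once with the original A (leading dimension `lda`) and once with the packed copy (leading dimension `m`) should give identical results. Whether packing pays off depends on matrix shape, which is presumably why the library makes it selective; the actual environment variable that controls it is documented in the AOCL User Guide.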