rocBLAS-2.0.0 for ROCm 2.0

amcamd released this 19 Dec 19:46

Changelist:

  • improved performance of fp16/fp32 rocblas_gemm_ex on gfx906
  • support for i8/i32 rocblas_gemm_ex
  • update vega-10 resnet50 tuning
  • refactor testing to be data driven
  • change rocblas_gemm_ex API solution index type from uint32_t to int32_t
  • disable gemm and gemm_ex chunking
  • fix gemv argument checking
  • add performance script for p1b1 benchmark sizes
  • refactor gemm code to reduce use of macros
  • fix trsm performance regression
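
For readers unfamiliar with the i8/i32 mode listed above, the following is a minimal CPU reference sketch of a GEMM that takes 8-bit integer inputs and accumulates into 32-bit integers. It is not rocBLAS code and does not call the rocBLAS API: the function name `reference_gemm_i8_i32`, the column-major layout, the integer `alpha`/`beta` scalars, and the absence of transpose options are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference, not part of rocBLAS.
// Computes D = alpha * (A * B) + beta * C for column-major matrices:
//   A is m x k (int8), B is k x n (int8), C and D are m x n (int32).
// Products are widened to int32 before accumulation, so individual
// int8 * int8 products cannot overflow the accumulator type.
std::vector<std::int32_t> reference_gemm_i8_i32(
    int m, int n, int k,
    std::int32_t alpha,
    const std::vector<std::int8_t>& a, int lda,
    const std::vector<std::int8_t>& b, int ldb,
    std::int32_t beta,
    const std::vector<std::int32_t>& c, int ldc,
    int ldd)
{
    std::vector<std::int32_t> d(static_cast<std::size_t>(ldd) * n, 0);
    for (int col = 0; col < n; ++col) {
        for (int row = 0; row < m; ++row) {
            std::int32_t acc = 0;
            for (int p = 0; p < k; ++p) {
                // Widen each operand to int32 before multiplying.
                acc += static_cast<std::int32_t>(a[row + p * lda]) *
                       static_cast<std::int32_t>(b[p + col * ldb]);
            }
            d[row + col * ldd] = alpha * acc + beta * c[row + col * ldc];
        }
    }
    return d;
}
```

A reference like this can be useful for checking results from a GPU library on small problem sizes; it makes no attempt at performance.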