Tensile 4.33.0 for ROCm 5.2.0

ROCmMathLibrariesBot released this on 28 Jun 18:42 (commit da90ed3)

Added

  • TensileUpdateLibrary for updating old library logic files
  • Support for TensileRetuneLibrary to use sizes from a separate file
  • ZGEMM DirectToVgpr/DirectToLds/StoreCInUnroll/MIArchVgpr support
  • Tests for denorm correctness
  • Option to write different architectures to different TensileLibrary files
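The denorm items in these notes (the new correctness tests, and the change that forces assembly kernels not to flush denorms) concern float32 values with nonzero magnitude below 2^-126. A flush-to-zero kernel would replace those values with 0. The following is a minimal host-side sketch in Python, assuming a reference check of that kind. It is not Tensile's actual test code, and the helper names (`to_f32`, `is_denorm_f32`, `flush_to_zero_f32`, `check_denorm_preserved`) are hypothetical. It compares a kernel's result against the IEEE float32 product, including when that product is subnormal:

```python
import math
import struct

# Smallest positive normal float32; nonzero magnitudes below this are subnormal.
F32_MIN_NORMAL = 2.0 ** -126


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def is_denorm_f32(x: float) -> bool:
    """True if x, rounded to float32, is a nonzero subnormal."""
    x = to_f32(x)
    return x != 0.0 and abs(x) < F32_MIN_NORMAL


def flush_to_zero_f32(x: float) -> float:
    """Emulate a kernel that flushes float32 subnormals to (signed) zero."""
    x = to_f32(x)
    return math.copysign(0.0, x) if is_denorm_f32(x) else x


def check_denorm_preserved(a: float, b: float, actual: float) -> bool:
    """Hypothetical test helper: `actual` must equal the float32 product a*b.

    For float32 inputs the binary64 product is exact, so one rounding to
    float32 yields the correctly rounded float32 result, subnormals included.
    """
    expected = to_f32(to_f32(a) * to_f32(b))
    return actual == expected
```

For example, with a = b = 2**-70 the exact product 2**-140 is a float32 subnormal. `check_denorm_preserved(a, b, 2.0**-140)` holds, while a flushed result of 0.0 fails the check.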

Optimizations

  • Optimize MessagePackLoadLibraryFile by switching to fread
  • DGEMM tail loop optimization for PrefetchAcrossPersistentMode=1/DirectToVgpr
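The MessagePackLoadLibraryFile change switches the library-file read to fread. These notes do not say how the file was read before. As a hedged illustration of the general technique, the Python sketch below performs one bulk read into a buffer sized from the file length, after which parsing can run from memory. `read_whole_file` and `read_bytewise` are hypothetical names. The byte-at-a-time reader is only a contrasting pattern and does not describe Tensile's previous code:

```python
import os
import tempfile


def read_whole_file(path: str) -> bytes:
    """One bulk read into a buffer sized from the file length.

    Mirrors the C idiom of querying the file size, allocating a buffer,
    then issuing a single fread(buf, 1, size, fp).
    """
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        data = f.read(size)
    if len(data) != size:
        raise OSError(f"short read: expected {size} bytes, got {len(data)}")
    return data


def read_bytewise(path: str) -> bytes:
    """Contrasting pattern: pull one byte per call from an unbuffered stream."""
    out = bytearray()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(1)
            if not chunk:
                break
            out += chunk
    return bytes(out)


if __name__ == "__main__":
    # Write a small stand-in file, then check that both readers agree.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(range(256)) * 16)
        path = tmp.name
    try:
        assert read_whole_file(path) == read_bytewise(path)
        print(len(read_whole_file(path)))  # 4096
    finally:
        os.remove(path)
```

Once the bytes are in memory, a MessagePack parser can walk the buffer directly instead of issuing many small stream reads.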

Changed

  • Alpha/beta datatype remains F32 for HPA (high-precision accumulate) HGEMM
  • Force assembly kernels to not flush denorms
  • Use hipDeviceAttributePhysicalMultiProcessorCount as multiProcessorCount
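In HPA HGEMM the matrices are stored as FP16 while the products are accumulated in F32. Keeping alpha and beta in F32 as well means the scalars are not rounded to FP16 precision. The Python reference below is a minimal sketch under these assumptions: FP16 storage for A, B, C and D, F32 accumulation, and the result rounded back to FP16. `hpa_hgemm_ref`, `to_f16` and `to_f32` are hypothetical helpers, not Tensile APIs, and the exact rounding points inside a real kernel may differ:

```python
import struct


def to_f16(x: float) -> float:
    """Round to the nearest IEEE binary16 value (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x: float) -> float:
    """Round to the nearest IEEE binary32 value (struct format 'f')."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def hpa_hgemm_ref(alpha, A, B, beta, C):
    """Reference D = alpha * (A @ B) + beta * C.

    A is m x k, B is k x n, C is m x n (lists of lists). Inputs and output
    are rounded to FP16; accumulation and the alpha/beta scalars use F32.
    """
    alpha, beta = to_f32(alpha), to_f32(beta)  # scalars stay F32, not FP16
    m, k, n = len(A), len(B), len(B[0])
    D = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = 0.0
            for p in range(k):
                # FP16 operands; the product and running sum are kept in F32.
                prod = to_f32(to_f16(A[i][p]) * to_f16(B[p][j]))
                acc = to_f32(acc + prod)
            d = to_f32(to_f32(alpha * acc) + to_f32(beta * to_f16(C[i][j])))
            row.append(to_f16(d))  # store D back as FP16
        D.append(row)
    return D
```

For example, `to_f16(0.1)` rounds to 0.0999755859375, while `to_f32(0.1)` is accurate to about seven significant digits. An F32 alpha of 0.1 therefore scales the F32 accumulator much more precisely than an FP16 alpha would.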

Fixed

  • Fix segmentation fault when running the i8 datatype with TENSILE_DB=0x80