Math libraries on AMD Rome (20201130)

talk by Sebastian Achilles (JSC): performance of math libraries on AMD Rome (Zen2)
- slides (PDF)
significant performance benefits for BLIS compared to Intel MKL (2020.2) and OpenBLAS
- BLIS: version 2.2 (AMD fork), but same performance as stock BLIS
  - AMD-specific kernels have been backported to upstream
- full node tests (128 cores @ JUSUF system at JSC)
  - large performance gaps for several BLAS functions (dgemm, zgemm, etc.)
- single-threaded performance difference is smaller, but still in favor of BLIS
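
GEMM comparisons like the ones above are usually reported in GFLOP/s. Below is a minimal Python sketch of how such a number can be derived, assuming the conventional operation counts (2·m·n·k for real GEMM, 8·m·n·k for complex GEMM). This is not the benchmark script used in the talk; all function names are hypothetical, and `bench_dgemm` assumes NumPy is installed and linked against the BLAS library being tested.

```python
import time


def gemm_flops(m, n, k, complex_valued=False):
    """Nominal operation count for C = A @ B with A (m x k) and B (k x n).

    Convention assumed here: one real multiply-add counts as 2 flops,
    one complex multiply-add as 8 flops.
    """
    per_multiply_add = 8 if complex_valued else 2
    return per_multiply_add * m * n * k


def gflops(flops, seconds):
    """Convert an operation count and a wall-clock time to GFLOP/s."""
    return flops / seconds / 1e9


def best_time(fn, repeats=5):
    """Run fn() several times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def bench_dgemm(n=2000, repeats=5):
    """Time a square double-precision matrix product via NumPy.

    Assumption: NumPy dispatches float64 `a @ b` to the dgemm routine of
    whatever BLAS it was built against (BLIS, OpenBLAS, Intel MKL, ...).
    """
    import numpy as np  # imported here so the helpers above work without NumPy

    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    seconds = best_time(lambda: a @ b, repeats)
    return gflops(gemm_flops(n, n, n), seconds)
```

To compare single-threaded and full-node runs, the thread count is typically controlled through the library's environment variable (for example `OMP_NUM_THREADS`, `BLIS_NUM_THREADS`, `OPENBLAS_NUM_THREADS` or `MKL_NUM_THREADS`) before calling `bench_dgemm()`.
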
switch from OpenBLAS to BLIS in foss toolchain?
- needs testing on Intel systems as well, compare BLIS with OpenBLAS
- Sebastian will share his benchmark scripts so others can test as well
also compared FFTW 3.3.8 vs patched FFTW 3.3.8 by AMD
- both significantly faster than Intel MKL 2020.2 on AMD Rome
- patched FFTW shows even better performance
should we use AMD-patched FFTW in foss toolchain?
- AMD patches introduce an --enable-amd configuration option, so they may be safe to apply on Intel systems as well
- makes providing optimized FFTW for AMD easier in foss toolchains
- if needed we can pick which FFTW installation to use on AMD systems at installation time
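
Picking an FFTW installation per system could be keyed on the CPU vendor. The Python sketch below is only an illustration, assuming a Linux x86 system where `/proc/cpuinfo` contains a `vendor_id` field; the labels `amd-patched` and `stock` are hypothetical placeholders and do not correspond to actual installation names.

```python
def cpu_vendor(cpuinfo_text):
    """Return the first vendor_id value in /proc/cpuinfo-style text, or None."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vendor_id":
            return value.strip()
    return None


def pick_fftw_variant(cpuinfo_text):
    """Choose a (hypothetical) FFTW variant label based on the CPU vendor."""
    if cpu_vendor(cpuinfo_text) == "AuthenticAMD":
        return "amd-patched"
    return "stock"


def pick_fftw_variant_for_host(path="/proc/cpuinfo"):
    """Convenience wrapper that reads the vendor from the running system."""
    with open(path) as handle:
        return pick_fftw_variant(handle.read())
```
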
notes:
- Intel MKL 2020.2 has some Zen2-specific kernels, but still falls back to Intel Pentium 4 code paths
- some performance improvements in Intel MKL 2020.4, but BLIS is still significantly better
- Intel MKL can be "convinced" to use AVX2-optimized code paths
  - easy to do in imkl 2020.0, just use export MKL_DEBUG_CPU_TYPE=5
  - harder in more recent imkl 2020 versions, requires patching of binaries/libraries (see https://danieldk.eu/Posts/2020-08-31-MKL-Zen.html)
    - and that's against the EULA, and you don't really know what you're getting (or if it works correctly)
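
For the imkl 2020.0 case, the variable only has to be present in the environment of the process that loads MKL. Below is a small Python sketch, assuming a hypothetical benchmark command supplied by the caller; per the notes above, the setting is not expected to have any effect in more recent imkl 2020 versions.

```python
import os
import subprocess


def mkl_avx2_env(base_env=None):
    """Return a copy of the environment with MKL_DEBUG_CPU_TYPE=5 set.

    The input mapping is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MKL_DEBUG_CPU_TYPE"] = "5"
    return env


def run_with_avx2_paths(cmd):
    """Run cmd (a list such as ["./dgemm_bench"], a hypothetical binary)
    with the variable set only for that child process."""
    return subprocess.run(cmd, env=mkl_avx2_env(), check=True)
```

Setting the variable per process rather than globally keeps it from leaking into runs that use other BLAS libraries.
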
