Context
It's been reasonably well documented that on Alder Lake, if, and only if, the Gracemont E-cores are disabled, it becomes possible to enable all of the AVX512 instructions available on the Golden Cove P-cores, the same cores used in Sapphire Rapids.
For relevant workloads, many of which already have AVX512-accelerated code paths in OpenBLAS, this can lead to a significant performance uplift.
As of release 0.3.19, even if the E-cores are disabled and AVX512 is available, the build system will not make use of it automatically.
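Because AVX512 is only exposed when the E-cores are disabled, it can be worth confirming at runtime that the CPU actually reports it before rebuilding with the flags below. A minimal sketch, assuming GCC or Clang on x86-64 (it relies on their __builtin_cpu_init and __builtin_cpu_supports builtins); the function names and the choice of the F/DQ/CD/BW/VL subsets are illustrative, not something OpenBLAS defines:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: true if the running CPU reports the AVX512
 * foundation (F) plus the common DQ, CD, BW and VL subsets.
 * __builtin_cpu_init() must run before __builtin_cpu_supports(). */
bool cpu_has_avx512_core(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f")
        && __builtin_cpu_supports("avx512dq")
        && __builtin_cpu_supports("avx512cd")
        && __builtin_cpu_supports("avx512bw")
        && __builtin_cpu_supports("avx512vl");
}

/* Prints a one-line summary of the check above. */
void report_avx512(void)
{
    printf("AVX512 (F/DQ/CD/BW/VL): %s\n",
           cpu_has_avx512_core() ? "available" : "not available");
}
```

Calling report_avx512() from a small main() prints the result; with the E-cores enabled, the subsets would be expected to show as unavailable.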
Current workaround
The user has the option of either passing:
CFLAGS='-O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16'
CXXFLAGS='-O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16'
FFLAGS='-O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16'
or alternatively passing
CFLAGS='-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect'
CXXFLAGS='-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect'
FFLAGS='-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect'
NOTE: The -march=sapphirerapids option is preferred because GCC, Clang, the LLVM-based ICX, and ICC will then use the cost model tuned for that architecture when weighing AVX512 against other instruction paths during auto-vectorization, instead of generic costs meant as a catch-all across all supported architectures.
NOTE: The AMX tile, AMX-INT8 and AMX-BF16 instructions (-mno-amx-tile, -mno-amx-int8, -mno-amx-bf16) must be disabled because Alder Lake does not include the AMX hardware.
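To confirm that a given set of CFLAGS really enables AVX512 while leaving AMX off, the compiler's predefined ISA macros can be inspected. A minimal sketch, assuming GCC or Clang, which define __AVX512F__, __AVX512BF16__, __AMX_TILE__, __AMX_INT8__ and __AMX_BF16__ only when the matching extension is enabled; the struct and function names are hypothetical:

```c
#include <stdbool.h>

/* Hypothetical record of which ISA extensions this translation unit
 * was compiled with, derived from compiler-predefined macros. */
typedef struct {
    bool avx512f, avx512bf16;
    bool amx_tile, amx_int8, amx_bf16;
} isa_flags;

isa_flags compiled_isa(void)
{
    isa_flags f = {0};
#ifdef __AVX512F__
    f.avx512f = true;
#endif
#ifdef __AVX512BF16__
    f.avx512bf16 = true;
#endif
#ifdef __AMX_TILE__
    f.amx_tile = true;
#endif
#ifdef __AMX_INT8__
    f.amx_int8 = true;
#endif
#ifdef __AMX_BF16__
    f.amx_bf16 = true;
#endif
    return f;
}

/* True when the build targets AVX512 but emits no AMX code, which is
 * what the Alder Lake workaround flags are meant to produce. */
bool flags_ok_for_alderlake_avx512(void)
{
    isa_flags f = compiled_isa();
    return f.avx512f && !f.amx_tile && !f.amx_int8 && !f.amx_bf16;
}
```

Compiled with the -march=sapphirerapids flags above, flags_ok_for_alderlake_avx512() should return true; dropping the -mno-amx-* options should make it return false, since that -march enables AMX.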
Proposed solution/Request
I'd like to propose that, if the user supplies the architecture flag AlderLakeAVX512, it be aliased to enable all Sapphire Rapids features that do not explicitly require the AMX tile.
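The requested alias amounts to a name-to-flags mapping. A minimal sketch of that idea, purely hypothetical: OpenBLAS's build system is Makefile/CMake based and, as of 0.3.19, has no AlderLakeAVX512 target, so the table and alias_cflags() below only illustrate what the alias would expand to, reusing the Sapphire Rapids flags from the workaround:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical alias table; neither the name nor the mechanism exists
 * in OpenBLAS today. */
struct target_alias {
    const char *name;
    const char *cflags;
};

static const struct target_alias aliases[] = {
    /* Sapphire Rapids feature set minus the AMX extensions Alder Lake lacks. */
    { "AlderLakeAVX512",
      "-march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16" },
};

/* Returns the compiler flags for a known alias, or NULL otherwise. */
const char *alias_cflags(const char *target)
{
    for (size_t i = 0; i < sizeof aliases / sizeof aliases[0]; i++) {
        if (strcmp(aliases[i].name, target) == 0)
            return aliases[i].cflags;
    }
    return NULL;
}
```

A table keeps the alias in one place, so the AMX exclusions cannot drift between CFLAGS, CXXFLAGS and FFLAGS.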
Sources on AVX512 support on Alder Lake:
Phoronix article
OpenBenchmarking results of disabling 8 E-cores to enable AVX512 on 8 P-cores
Testing of AVX512 per-instruction cost and pipeline behavior on Alder Lake