Propose optional CPUARCH target AlderLakeAVX512, as an alias of SapphireRapids #3490

@FCLC

Context

It has been reasonably well documented that on Alder Lake, if and only if the Gracemont E-cores are disabled, it becomes possible to enable all of the AVX512 instructions available on the Golden Cove P-cores, the same cores used in Sapphire Rapids.

For relevant workloads, for many of which OpenBLAS has AVX512-accelerated code paths, this can lead to a significant performance uplift.

As of release 0.3.19, even if the E-cores are disabled and AVX512 is available, the build system will not make use of it automatically.

Current workaround

The user has the option of either passing:

CFLAGS='-O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16'
CXXFLAGS='-O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16'
FFLAGS='-O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16'

or alternatively passing

CFLAGS='-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect'
CXXFLAGS='-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect'
FFLAGS='-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect'

NOTE: The -march=sapphirerapids option is preferred, because GCC, Clang, the LLVM-based ICX and ICC will then apply the Sapphire Rapids cost functions when weighing AVX512 against other instruction paths during auto-vectorization, instead of generic costs meant as a catch-all across all supported architectures.
NOTE: The AMX tile, AMX INT8 and AMX BF16 instructions must be disabled, because the additional AMX hardware was not built into Alder Lake.
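
Since the flags above are only valid when AVX512 is actually exposed (E-cores disabled), it may help to check before applying them. The following is a minimal sketch, assuming a Linux system where /proc/cpuinfo lists the avx512f feature flag; has_avx512 and pick_arch_flags are hypothetical helper names, not part of OpenBLAS.

```shell
# Sketch only: helper names are hypothetical and the check is Linux-specific.

# Succeeds if the given cpuinfo file (default /proc/cpuinfo) lists avx512f.
has_avx512() {
    cpuinfo_file="${1:-/proc/cpuinfo}"
    grep -qw 'avx512f' "$cpuinfo_file"
}

# Prints the Sapphire Rapids flag set without AMX when AVX512 is exposed,
# otherwise falls back to plain -march=native.
pick_arch_flags() {
    if has_avx512 "${1:-/proc/cpuinfo}"; then
        echo '-O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16'
    else
        echo '-O3 -march=native'
    fi
}
```

One possible use is to export CFLAGS="$(pick_arch_flags)", and likewise for CXXFLAGS and FFLAGS, before invoking the build.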

Proposed solution/Request

I'd like to propose that, if the user supplies the architecture flag AlderLakeAVX512, it be aliased to enable all features of Sapphire Rapids that do not explicitly require AMX.
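
To illustrate the intended behaviour, here is a hypothetical sketch of how such an alias could resolve to compiler flags. It is not OpenBLAS's actual build logic; the function name arch_flags_for_target and the SapphireRapids branch are assumptions made purely for illustration.

```shell
# Hypothetical illustration of the proposed alias, not real OpenBLAS code.
arch_flags_for_target() {
    case "$1" in
        SapphireRapids)
            echo '-march=sapphirerapids' ;;
        AlderLakeAVX512)
            # Same tuning as Sapphire Rapids, minus the AMX extensions
            # that Alder Lake does not implement.
            echo '-march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16' ;;
        *)
            echo "unknown target: $1" >&2
            return 1 ;;
    esac
}
```

With this mapping, requesting AlderLakeAVX512 would yield the same flag set as the first workaround, without the user having to remember the AMX exclusions.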

Sources on AVX512 support on alder lake:

Anandtech article

Phoronix Article

OpenBenchmarking Results of disabling 8 ecores to enable AVX512 on 8 pcores

Testing of AVX512 per instruction cost and pipeline on Alderlake

Discussion of AVX512 support and performance
