
Clarifying naming of compute kernels prior to submission of PR #3756

@FCLC


I'm currently implementing the FP16 kernels as a follow-up to #3754 and #2767.

I want to be 100% clear on OpenBLAS nomenclature prior to submitting my PR.

Baseline assumptions

Where name is XYgemm_kernel_NxM_UARCH.c

and

X defines the returning data type
Y defines the submitted data type
N defines the column of the matrix, each column being the columnar access offset of a given register/vector
M defines the in-register vector width (so for FP16 in a 512-bit vector, 32 lanes), while also being the row of the matrix

X and Y can be any of the following
s=single precision
d=double
b=bf16
h=fp16
c=complex single
z=complex double
null or no prefix

Also for the sake of reference
ge is general
m is matrix
v is vector
e.g. sdgemv would be an operation that takes a double-precision matrix and vectors and returns the result in single precision

UARCH is normally the first UARCH the compute kernel targets. If a kernel already exists and a newer UARCH adds instructions that allow further optimization, the new kernel takes the name of the newer UARCH instead. In some cases the lake/bridge suffix is dropped (sandy instead of sandy bridge). Sometimes the consumer/workstation name is used instead of the architecture name (skylakex instead of skylake).

This is assumed from the following:

The project has 6 implementations of a x86_64 single precision (fp32) 16x4 kernel.

3 in C with inline asm, 3 in pure asm.

Looking only to the ASM versions we have:
sgemm...sandy.S which is where AVX(1) instructions were introduced.
sgemm...haswell.S which is where AVX2 instructions were introduced.
sgemm...skylakex.S which is where AVX512 instructions were introduced.

Next:

if X is the same as Y then Y is dropped and shortened to Xgemm...
if N is the same as M both are kept for the sake of clarity
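
To make the assumed convention concrete, here is a minimal sketch of a parser for names of the form XYgemm_kernel_NxM_UARCH. It only encodes the assumptions listed in this issue, not a confirmed OpenBLAS rule: the regular expression, the `parse_kernel_name` helper, and the `shgemm_..._example.S` file name are all hypothetical, and the "no prefix" case is not handled.

```python
import re

# Type prefixes as listed in this issue.
TYPE_PREFIXES = {
    "s": "single precision",
    "d": "double precision",
    "b": "bf16",
    "h": "fp16",
    "c": "complex single",
    "z": "complex double",
}

# Hypothetical pattern for XYgemm_kernel_NxM_UARCH.{c,S}; Y is optional.
KERNEL_NAME = re.compile(
    r"^(?P<x>[sdbhcz])(?P<y>[sdbhcz]?)gemm_kernel_"
    r"(?P<n>\d+)x(?P<m>\d+)_(?P<uarch>[a-z0-9]+)\.(?P<ext>c|S)$"
)


def parse_kernel_name(filename):
    """Split a kernel file name into the fields assumed in this issue.

    Returns None when the name does not match the assumed pattern.
    """
    match = KERNEL_NAME.match(filename)
    if match is None:
        return None
    x = match.group("x")
    y = match.group("y") or x  # a dropped Y means Y is the same as X
    return {
        "returns": TYPE_PREFIXES[x],
        "accepts": TYPE_PREFIXES[y],
        "n": int(match.group("n")),
        "m": int(match.group("m")),
        "uarch": match.group("uarch"),
        "language": "C" if match.group("ext") == "c" else "assembly",
    }


# The name proposed later in this issue, with N=32 and M=4 filled in.
print(parse_kernel_name("hgemm_kernel_32x4_goldencove.c"))
# A made-up mixed-precision name: fp16 in, single precision out.
print(parse_kernel_name("shgemm_kernel_16x8_example.S"))
```

If the maintainers confirm a different reading of X, Y, N or M, only the dictionary keys in the return value would need to change.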

Something I'm unclear about is if N and M are expected to set the upper bound of a kernel, or if they're expected to always be a given size.

For example, is the 16x4 kernel also expected to deal with 8x2? Or is each GEMM size expected to be its own file implementation?

From looking at the /kernel/x86_64/ folder I believe it's the latter, but I wanted to confirm.

Next:

Specific to my PR

Following this nomenclature, I believe my first two kernels will be a 32x4 fp16->fp16 kernel in AVX512 using the new ISA, and a kernel leveraging AVX(1) and the F16C ISA extension to provide legacy support.

Kernel1: legacy

Ivy Bridge is the first Intel micro-architecture to offer both F16C and AVX (AVX itself arrived one generation earlier, on Sandy Bridge). F16C only converts between fp16 and fp32, so the computation runs on 32-bit lanes and the result is then converted back to fp16. AVX1 allows YMM registers if we stick to floating-point mode, which we're doing, so all good on that front.

As such we can use YMM registers, but need to treat them as 32-bit lanes. With 256 / 32 = 8 lanes per register, this would mean a maximum of an Nx8 kernel, with a naming convention of:
hgemm_kernel_NxM_ivy(bridge?).c

Where M could be 1,2,4,8, but probably only 8, N probably only 4, 8 and 16.
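
The lane counts behind these sizes are simple arithmetic, sketched below. The `lanes` helper and the two lookup tables are illustrative names, not anything from OpenBLAS.

```python
# Register widths in bits for the x86 vector register files.
REGISTER_BITS = {"xmm": 128, "ymm": 256, "zmm": 512}

# Element widths in bits for the data types discussed here.
ELEMENT_BITS = {"fp16": 16, "bf16": 16, "fp32": 32, "fp64": 64}


def lanes(register, element):
    """Number of elements of the given type that fit in one register."""
    return REGISTER_BITS[register] // ELEMENT_BITS[element]


# F16C path: data is stored as fp16 but computed as fp32 in YMM registers.
print(lanes("ymm", "fp32"))  # 8
# Native AVX512 FP16 path: fp16 values fill a ZMM register directly.
print(lanes("zmm", "fp16"))  # 32
```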

A potential issue is that the F16C spec requires values to be converted to fp32 before computation, computed in fp32, then converted back to fp16. This kernel would therefore mostly be useful for testing/development before moving to modern systems capable of using the AVX512 implementation.
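
As a rough model of that convert, compute, convert-back flow, the sketch below emulates it in pure Python using the standard `struct` module's half-precision (`e`) and single-precision (`f`) formats. It is only meant to show where rounding happens; it is not the kernel, it uses no F16C instructions, and the helper names are made up for illustration.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def to_fp32(value):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def dot_fp16_via_fp32(a, b):
    """Dot product of fp16 inputs, accumulated in fp32, returned as fp16."""
    acc = 0.0
    for x, y in zip(a, b):
        # Widen each fp16 input to fp32, then multiply and add,
        # rounding each intermediate result to fp32.
        product = to_fp32(to_fp32(to_fp16(x)) * to_fp32(to_fp16(y)))
        acc = to_fp32(acc + product)
    # Narrow once at the end, as a kernel would when storing the result.
    return to_fp16(acc)


print(to_fp16(0.1))                                 # 0.0999755859375
print(dot_fp16_via_fp32([1.5, 2.0], [2.0, 0.25]))   # 3.5
```

The first line of output shows the precision lost just by storing 0.1 as fp16; the second uses inputs that are exactly representable, so the result is exact.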

Kernel 2: The modern implementation (read: the fast one)

In the case of the AVX512 implementation things are a little different. The first implementation on the market was unofficial/unsanctioned: Alder Lake (see #3490) shipped it on its performance cores (named Golden Cove) over a year before Sapphire Rapids. It supports native FP16 arithmetic, and therefore 32 values in each 512-bit ZMM register.

The compromise I've come to is to name the AVX512 versions:

`hgemm_kernel_NxM_goldencove.c`

Where the implementations will be
M=4,8,16 and N = 16, 32.
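
For reference, the file names this would produce can be enumerated as below. This assumes one file per NxM size, which is exactly the point I'm asking to have confirmed above; the `kernel_names` helper is hypothetical.

```python
from itertools import product


def kernel_names(prefix, n_values, m_values, uarch, ext="c"):
    """Build XYgemm_kernel_NxM_UARCH file names for every N, M pair."""
    return [
        f"{prefix}gemm_kernel_{n}x{m}_{uarch}.{ext}"
        for n, m in product(n_values, m_values)
    ]


names = kernel_names("h", n_values=(16, 32), m_values=(4, 8, 16), uarch="goldencove")
for name in names:
    print(name)
```

This yields six names, from `hgemm_kernel_16x4_goldencove.c` to `hgemm_kernel_32x16_goldencove.c`.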

Once all of these are done it may be viable/workable to create SH/HS as well as the FP16 complex kernels but that's for later.
