Add Alpha support for SME2.1 #309

Open
wants to merge 1 commit into
base: main

rsandifo-arm
Contributor




Thank you for submitting a pull request!

If this PR is about a bugfix:

Please use the bugfix label and make sure to go through the checklist below.

If this PR is about a proposal:

We are looking forward to evaluating your proposal and, if possible, to
making it part of the Arm C Language Extension (ACLE) specifications.

We would like to encourage you to read through the contribution
guidelines, in particular the section on submitting a proposal.

Please use the proposal label.

As for any pull request, please make sure to go through the below
checklist.

Checklist: (mark with X those which apply)

  • If an issue reporting the bug exists, I have mentioned it in the
    PR (do not bother creating the issue if all you want to do is
    fix the bug yourself).
  • I have added/updated the SPDX-FileCopyrightText lines on top
    of any file I have edited. Format is SPDX-FileCopyrightText: Copyright {year} {entity or name} <{contact informations}>
    (Please update existing copyright lines if applicable. You can
    specify year ranges with a hyphen, as in 2017-2019, and use
    commas to separate gaps, as in 2018-2020, 2022).
  • I have updated the Copyright section of the sources of the
    specification I have edited (this will show up in the text
    rendered in the PDF and other output formats supported). The
    format is the same as described in the previous item.
  • I have run the CI scripts (if applicable, as they might be
    tricky to set up on non-*nix machines). The sequence can be
    found in the contribution guidelines. Don't worry if you cannot
    run these scripts on your machine; your patch will be
    automatically checked in the Actions of the pull request.
  • I have added an item that describes the changes I have
    introduced in this PR in the section Changes for next release
    of the section Change Control/Document history of the document.
    Create Changes for next release if it does not exist. Note that
    changes that do not modify the content and rendering of the
    specifications (both HTML and PDF) do not need to be listed.
  • When modifying content and/or its rendering, I have checked the
    correctness of the result in the PDF output (please refer to the
    instructions on how to build the PDFs locally).
  • The variable draftversion is set to true in the YAML header
    of the sources of the specifications I have modified.
  • Please DO NOT add my GitHub profile to the list of contributors
    in the README page of the project.

@rsandifo-arm
Contributor Author

This will need rebasing once SVE2.1 goes in, but I thought it was worth posting now for comments.

Move and zero ZA tile slice to vector register.

```
// And similarly for u8.
Contributor

Should we also have svreadz_ver?

momchil-velikov added a commit to momchil-velikov/llvm-project that referenced this pull request Apr 9, 2024
According to the specification in
ARM-software/acle#309 this adds the intrinsics

    void svmopa_za16[_f16]_m(uint64_t tile, svbool_t pn, svbool_t pm,
                             svfloat16_t zn, svfloat16_t zm)
        __arm_streaming __arm_inout("za");
    void svmops_za16[_f16]_m(uint64_t tile, svbool_t pn,
                             svbool_t pm, svfloat16_t zn, svfloat16_t zm)
        __arm_streaming __arm_inout("za");

as well as the corresponding `bf16` variants.
momchil-velikov added a commit to momchil-velikov/llvm-project that referenced this pull request Apr 11, 2024
According to the specification in
ARM-software/acle#309 this adds the intrinsics

    void svmopa_za16[_f16]_m(uint64_t tile, svbool_t pn, svbool_t pm,
                             svfloat16_t zn, svfloat16_t zm)
        __arm_streaming __arm_inout("za");
    void svmops_za16[_f16]_m(uint64_t tile, svbool_t pn,
                             svbool_t pm, svfloat16_t zn, svfloat16_t zm)
        __arm_streaming __arm_inout("za");

as well as the corresponding `bf16` variants.
CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Apr 12, 2024
… single

According to the specification in
ARM-software/acle#309 this adds the intrinsics

// And similarly for u8.
svint8_t svreadz_hor_za8_s8(uint64_t tile, uint32_t slice)
__arm_streaming __arm_inout("za");

// And similarly for u16, bf16 and f16.
svint16_t svreadz_hor_za16_s16(uint64_t tile, uint32_t slice)
__arm_streaming __arm_inout("za");

// And similarly for u32 and f32.
svint32_t svreadz_hor_za32_s32(uint64_t tile, uint32_t slice)
__arm_streaming __arm_inout("za");

// And similarly for u64 and f64.
svint64_t svreadz_hor_za64_s64(uint64_t tile, uint32_t slice)
__arm_streaming __arm_inout("za");

// And similarly for s16, s32, s64, u8, u16, u32, u64, bf16, f16, f32, f64
svint8_t svreadz_hor_za128_s8(uint64_t tile, uint32_t slice)
__arm_streaming __arm_inout("za");
CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Apr 15, 2024
According to the specification in
ARM-software/acle#309 this adds the intrinsics

// Variants are also available for _za8_u8, _za16_s16, _za16_u16,
// _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
// _za64_s64, _za64_u64 and _za64_f64
svint8x2_t svreadz_hor_za8_s8_vg2(uint64_t tile, uint32_t slice)
__arm_streaming __arm_inout("za");

// Variants are also available for _za8_u8, _za16_s16, _za16_u16,
// _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
// _za64_s64, _za64_u64 and _za64_f64
svint8x4_t svreadz_hor_za8_s8_vg4(uint64_t tile, uint32_t slice)
__arm_streaming __arm_inout("za");

// Variants are also available for _za8_u8, _za16_s16, _za16_u16,
// _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
// _za64_s64, _za64_u64 and _za64_f64
svint8x2_t svreadz_ver_za8_s8_vg2(uint64_t tile, uint32_t slice)
__arm_streaming __arm_inout("za");

// Variants are also available for _za8_u8, _za16_s16, _za16_u16,
// _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
// _za64_s64, _za64_u64 and _za64_f64
svint8x4_t svreadz_ver_za8_s8_vg4(uint64_t tile, uint32_t slice)
__arm_streaming __arm_inout("za");

``` c
// Variants are also available for:
// _za16[_bf16]_m (only if __ARM_FEATURE_SME_B16B16 != 0)
// _za16[_f16]_m (only if __ARM_FEATURE_SME_F16F16 != 0)
Contributor

I notice that the instructions guarded by __ARM_FEATURE_SME_F16F16 in the ACLE
are gated only on FEAT_SME_F16F16 on developer.arm.com (it looks like they do not require SME2):

FMLA ZA.H[<Wv>, <offs>{, VGx2}], { <Zn1>.H-<Zn2>.H }, <Zm>.H[<index>]
if !IsFeatureImplemented(FEAT_SME_F16F16) then UNDEFINED;

I am not sure if the ACLE should follow that or not.

@@ -1948,6 +1968,20 @@ See [Half-precision brain
floating-point](#half-precision-brain-floating-point) for details
of half-precision brain floating-point types.

#### Non-widening brain 16-bit floating-point support
Contributor

In the Arm ARM March 2024, the feature name is FEAT_SVE_B16B16, which implies FEAT_SME2 or FEAT_SVE2. Differences with the proposed spec are:

  • spelling of the feature name
  • it does not imply SVE2.1 or SME2.1 as the text seems to suggest

Perhaps two separate macros are not needed and equivalent functionality can be obtained by, e.g.

#if defined (__ARM_FEATURE_SME) && defined (__ARM_FEATURE_SVE_B16B16)
...
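
The combined check suggested here can be tried out in a small portable sketch. It maps the two predefined macros to 0/1 constants and evaluates the same conjunction; `__ARM_FEATURE_SVE_B16B16` is the spelling proposed in this comment, and `HAVE_SME`, `HAVE_SVE_B16B16`, `sme_b16b16_available`, `sme_b16b16_compiled_in` and `check_guard_logic` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* 1 if the compiler predefines the macro, 0 otherwise. */
#if defined(__ARM_FEATURE_SME)
#define HAVE_SME 1
#else
#define HAVE_SME 0
#endif

/* Spelling as proposed in the comment above; treated as an assumption. */
#if defined(__ARM_FEATURE_SVE_B16B16)
#define HAVE_SVE_B16B16 1
#else
#define HAVE_SVE_B16B16 0
#endif

/* Hypothetical helper: the same conjunction as the suggested #if. */
static bool sme_b16b16_available(bool have_sme, bool have_sve_b16b16)
{
    return have_sme && have_sve_b16b16;
}

/* The answer for this translation unit, fixed at compile time. */
static bool sme_b16b16_compiled_in(void)
{
    return sme_b16b16_available(HAVE_SME, HAVE_SVE_B16B16);
}

/* Truth table: only both features together enable the guarded path. */
static void check_guard_logic(void)
{
    assert(!sme_b16b16_available(false, false));
    assert(!sme_b16b16_available(true, false));
    assert(!sme_b16b16_available(false, true));
    assert(sme_b16b16_available(true, true));
}
```

On a toolchain that defines neither macro, `sme_b16b16_compiled_in()` simply returns false, so the sketch builds on any host.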

extended in the future.

`__ARM_FEATURE_SME_F16F16` is defined to `1` if there is hardware support
for the SME2.1 half-precision (FEAT_SME_F16F16) instructions and if their
Contributor

Only SME2 is implied by FEAT_SME_F16F16

CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Apr 16, 2024
According to the specification in
ARM-software/acle#309 this adds the intrinsics

Move and zero multiple ZA single-vector groups to vector registers

// Variants are also available for _za8_u8, _za16_s16, _za16_u16,
// _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
// _za64_s64, _za64_u64 and _za64_f64
svint8x2_t svreadz_za8_s8_vg1x2(uint32_t slice)
__arm_streaming __arm_inout("za");

// Variants are also available for _za8_u8, _za16_s16, _za16_u16,
// _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
// _za64_s64, _za64_u64 and _za64_f64
svint8x4_t svreadz_za8_s8_vg1x4(uint32_t slice)
__arm_streaming __arm_inout("za");
@@ -10132,6 +10184,9 @@ Multi-vector convert to/from floating-point.

// Variants are also available for _f32[_u32_x4], _s32[_f32_x4] and _u32[_f32_x4]
svfloat32x4_t svcvt_f32[_s32_x4](svint32x4_t zn) __arm_streaming;

// Only if __ARM_FEATURE_SME_F16F16 != 0
svfloat32x2_t svcvt_f32[_f16_x2](svfloat16_t zn) __arm_streaming;
Lukacma, Apr 18, 2024

Wouldn't naming this svcvt_f32[_x2_f16] be more consistent, since it is the output that has two vectors?

// _single_u8_x2,
// _single_bf16_x2 (only if __ARM_FEATURE_SME_B16B16)
// _single_s16_x2
// _single_u16_x2
Contributor

I believe the SME2 ACLE omitted sv[max|min|maxnm|minnm]_single_f16_x2 and sv[max|min|maxnm|minnm]_f16_x2

momchil-velikov added a commit to momchil-velikov/llvm-project that referenced this pull request Apr 29, 2024
According to the specification in
ARM-software/acle#309 this adds the intrinsics

    void svmopa_za16[_f16]_m(uint64_t tile, svbool_t pn, svbool_t pm,
                             svfloat16_t zn, svfloat16_t zm)
        __arm_streaming __arm_inout("za");
    void svmops_za16[_f16]_m(uint64_t tile, svbool_t pn,
                             svbool_t pm, svfloat16_t zn, svfloat16_t zm)
        __arm_streaming __arm_inout("za");

as well as the corresponding `bf16` variants.
momchil-velikov added a commit to momchil-velikov/llvm-project that referenced this pull request Apr 30, 2024
According to the specification in
ARM-software/acle#309 this adds the intrinsics

    void svmopa_za16[_f16]_m(uint64_t tile, svbool_t pn, svbool_t pm,
                             svfloat16_t zn, svfloat16_t zm)
        __arm_streaming __arm_inout("za");
    void svmops_za16[_f16]_m(uint64_t tile, svbool_t pn,
                             svbool_t pm, svfloat16_t zn, svfloat16_t zm)
        __arm_streaming __arm_inout("za");

as well as the corresponding `bf16` variants.
momchil-velikov added a commit to llvm/llvm-project that referenced this pull request May 9, 2024
…tors (#88266)

According to the specification in
ARM-software/acle#309 this adds the intrinsics

void svadd_za16_vg1x2_f16(uint32_t slice, svfloat16x2_t zn)
__arm_streaming __arm_inout("za");
void svadd_za16_vg1x4_f16(uint32_t slice, svfloat16x4_t zn)
__arm_streaming __arm_inout("za");
void svsub_za16_vg1x2_f16(uint32_t slice, svfloat16x2_t zn)
__arm_streaming __arm_inout("za");
void svsub_za16_vg1x4_f16(uint32_t slice, svfloat16x4_t zn)
__arm_streaming __arm_inout("za");

as well as the corresponding `bf16` variants.
momchil-velikov added a commit to momchil-velikov/llvm-project that referenced this pull request May 9, 2024
…tors (llvm#88266)

According to the specification in
ARM-software/acle#309 this adds the intrinsics

void svadd_za16_vg1x2_f16(uint32_t slice, svfloat16x2_t zn)
__arm_streaming __arm_inout("za");
void svadd_za16_vg1x4_f16(uint32_t slice, svfloat16x4_t zn)
__arm_streaming __arm_inout("za");
void svsub_za16_vg1x2_f16(uint32_t slice, svfloat16x2_t zn)
__arm_streaming __arm_inout("za");
void svsub_za16_vg1x4_f16(uint32_t slice, svfloat16x4_t zn)
__arm_streaming __arm_inout("za");

as well as the corresponding `bf16` variants.
momchil-velikov added a commit to momchil-velikov/llvm-project that referenced this pull request May 9, 2024
According to the specification in
ARM-software/acle#309 this adds the intrinsics

    void svmopa_za16[_f16]_m(uint64_t tile, svbool_t pn, svbool_t pm,
                             svfloat16_t zn, svfloat16_t zm)
        __arm_streaming __arm_inout("za");
    void svmops_za16[_f16]_m(uint64_t tile, svbool_t pn,
                             svbool_t pm, svfloat16_t zn, svfloat16_t zm)
        __arm_streaming __arm_inout("za");

as well as the corresponding `bf16` variants.
momchil-velikov added a commit to llvm/llvm-project that referenced this pull request May 10, 2024
According to the specification in
ARM-software/acle#309
add the following intrinsics

void svmla[_single]_za16[_f16]_vg1x2(uint32_t slice, svfloat16x2_t zn,
svfloat16_t zm)
void svmla[_single]_za16[_f16]_vg1x4(uint32_t slice, svfloat16x4_t zn,
svfloat16_t zm)
void svmls[_single]_za16[_f16]_vg1x2(uint32_t slice, svfloat16x2_t zn,
svfloat16_t zm)
void svmls[_single]_za16[_f16]_vg1x4(uint32_t slice, svfloat16x4_t zn,
svfloat16_t zm)

void svmla_za16[_f16]_vg1x2(uint32_t slice, svfloat16x2_t zn,
svfloat16x2_t zm)
void svmla_za16[_f16]_vg1x4(uint32_t slice, svfloat16x4_t zn,
svfloat16x4_t zm)
void svmls_za16[_f16]_vg1x2(uint32_t slice, svfloat16x2_t zn,
svfloat16x2_t zm)
void svmls_za16[_f16]_vg1x4(uint32_t slice, svfloat16x4_t zn,
svfloat16x4_t zm)

void svmla_lane_za16[_f16]_vg1x2(uint32_t slice, svfloat16x2_t zn,
svfloat16_t zm, uint64_t imm_idx)
void svmla_lane_za16[_f16]_vg1x4(uint32_t slice, svfloat16x4_t zn,
svfloat16_t zm, uint64_t imm_idx)
void svmls_lane_za16[_f16]_vg1x2(uint32_t slice, svfloat16x2_t zn,
svfloat16_t zm, uint64_t imm_idx)
void svmls_lane_za16[_f16]_vg1x4(uint32_t slice, svfloat16x4_t zn,
svfloat16_t zm, uint64_t imm_idx)

as well as the corresponding `_bf16` variants.
momchil-velikov added a commit to llvm/llvm-project that referenced this pull request May 10, 2024
According to the specification in
ARM-software/acle#309 this adds the intrinsics

    void svmopa_za16[_f16]_m(uint64_t tile, svbool_t pn, svbool_t pm,
                             svfloat16_t zn, svfloat16_t zm)
        __arm_streaming __arm_inout("za");
    void svmops_za16[_f16]_m(uint64_t tile, svbool_t pn,
                             svbool_t pm, svfloat16_t zn, svfloat16_t zm)
        __arm_streaming __arm_inout("za");

as well as the corresponding `bf16` variants.
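
For readers following the thread, the predicated outer-product behaviour behind `svmopa` and `svmops` can be sketched as a scalar model. This is an illustration under stated assumptions, not the LLVM or ACLE implementation: `model_mop` and `DIM` are hypothetical names, `float` stands in for `f16`, and the tile dimension is fixed at 4 whereas the real one depends on the streaming vector length.

```c
#include <assert.h>
#include <stdbool.h>

enum { DIM = 4 }; /* assumed tile dimension, for illustration only */

/* Hypothetical scalar model, with float standing in for f16.
 * sign = +1.0f models svmopa (accumulate), -1.0f models svmops (subtract).
 * pn selects the active rows (elements of zn) and pm the active
 * columns (elements of zm); inactive positions keep their old value. */
static void model_mop(float tile[DIM][DIM],
                      const bool pn[DIM], const bool pm[DIM],
                      const float zn[DIM], const float zm[DIM],
                      float sign)
{
    assert(sign == 1.0f || sign == -1.0f);
    for (int row = 0; row < DIM; ++row) {
        for (int col = 0; col < DIM; ++col) {
            if (pn[row] && pm[col]) {
                tile[row][col] += sign * zn[row] * zm[col];
            }
        }
    }
}
```

Calling the model once with `+1.0f` and once with `-1.0f` on the same inputs returns the tile to its starting values, which mirrors the accumulate/subtract pairing of the two intrinsics.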
momchil-velikov added a commit to momchil-velikov/llvm-project that referenced this pull request May 10, 2024
…tors (llvm#88266)

According to the specification in
ARM-software/acle#309 this adds the intrinsics

void svadd_za16_vg1x2_f16(uint32_t slice, svfloat16x2_t zn)
__arm_streaming __arm_inout("za");
void svadd_za16_vg1x4_f16(uint32_t slice, svfloat16x4_t zn)
__arm_streaming __arm_inout("za");
void svsub_za16_vg1x2_f16(uint32_t slice, svfloat16x2_t zn)
__arm_streaming __arm_inout("za");
void svsub_za16_vg1x4_f16(uint32_t slice, svfloat16x4_t zn)
__arm_streaming __arm_inout("za");

as well as the corresponding `bf16` variants.
momchil-velikov added a commit to momchil-velikov/llvm-project that referenced this pull request May 10, 2024
…tors (llvm#88266)

According to the specification in
ARM-software/acle#309 this adds the intrinsics

void svadd_za16_vg1x2_f16(uint32_t slice, svfloat16x2_t zn)
__arm_streaming __arm_inout("za");
void svadd_za16_vg1x4_f16(uint32_t slice, svfloat16x4_t zn)
__arm_streaming __arm_inout("za");
void svsub_za16_vg1x2_f16(uint32_t slice, svfloat16x2_t zn)
__arm_streaming __arm_inout("za");
void svsub_za16_vg1x4_f16(uint32_t slice, svfloat16x4_t zn)
__arm_streaming __arm_inout("za");

as well as the corresponding `bf16` variants.
hassnaaHamdi added a commit to llvm/llvm-project that referenced this pull request May 16, 2024
According to the specification in
[ARM-software/acle/pull/309](ARM-software/acle#309),
add the following intrinsics:

```
// svmax single,multi
svbfloat16x2_t svmax_single_bf16_x2(svbfloat16x2_t zdn, svbfloat16_t zm)
svbfloat16x4_t svmax_single_bf16_x4(svbfloat16x4_t zdn, svbfloat16_t zm)
svbfloat16x2_t svmax_bf16_x2(svbfloat16x2_t zdn, svbfloat16x2_t zm)
svbfloat16x4_t svmax_bf16_x4(svbfloat16x4_t zdn, svbfloat16x4_t zm)
```

```
// svmin single,multi
svbfloat16x2_t svmin_single_bf16_x2(svbfloat16x2_t zdn, svbfloat16_t zm)
svbfloat16x4_t svmin_single_bf16_x4(svbfloat16x4_t zdn, svbfloat16_t zm)
svbfloat16x2_t svmin_bf16_x2(svbfloat16x2_t zdn, svbfloat16x2_t zm)
svbfloat16x4_t svmin_bf16_x4(svbfloat16x4_t zdn, svbfloat16x4_t zm)
```

```
// svmaxnm single,multi
svbfloat16x2_t svmaxnm_single_bf16_x2(svbfloat16x2_t zdn, svbfloat16_t zm)
svbfloat16x4_t svmaxnm_single_bf16_x4(svbfloat16x4_t zdn, svbfloat16_t zm)
svbfloat16x2_t svmaxnm_bf16_x2(svbfloat16x2_t zdn, svbfloat16x2_t zm)
svbfloat16x4_t svmaxnm_bf16_x4(svbfloat16x4_t zdn, svbfloat16x4_t zm)
```

```
// svminnm single,multi
svbfloat16x2_t svminnm_single_bf16_x2(svbfloat16x2_t zdn, svbfloat16_t zm)
svbfloat16x4_t svminnm_single_bf16_x4(svbfloat16x4_t zdn, svbfloat16_t zm)
svbfloat16x2_t svminnm_bf16_x2(svbfloat16x2_t zdn, svbfloat16x2_t zm)
svbfloat16x4_t svminnm_bf16_x4(svbfloat16x4_t zdn, svbfloat16x4_t zm)
```
- Variations other than bfloat16 are already supported.
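
The difference between the `_single` and multi-vector forms listed above can be sketched with a scalar model. This is only an illustration: `model_max_single`, `model_max_multi`, `max_f` and `LANES` are hypothetical names, `float` stands in for `bfloat16`, a tuple is stored as `nvec * LANES` contiguous elements, and NaN handling (which is what separates `svmax` from `svmaxnm`) is not modeled.

```c
#include <assert.h>
#include <stddef.h>

enum { LANES = 4 }; /* assumed elements per vector, for illustration only */

static float max_f(float a, float b) { return a > b ? a : b; }

/* _single form: the one vector zm is applied to every vector of the
 * tuple zdn. zdn holds nvec * LANES elements, zm holds LANES. */
static void model_max_single(float *zdn, size_t nvec, const float *zm)
{
    assert(nvec == 2 || nvec == 4); /* only _x2 and _x4 tuples exist */
    for (size_t v = 0; v < nvec; ++v) {
        for (size_t i = 0; i < LANES; ++i) {
            zdn[v * LANES + i] = max_f(zdn[v * LANES + i], zm[i]);
        }
    }
}

/* Multi form: zm is itself a tuple and is paired vector by vector. */
static void model_max_multi(float *zdn, size_t nvec, const float *zm)
{
    assert(nvec == 2 || nvec == 4);
    for (size_t v = 0; v < nvec; ++v) {
        for (size_t i = 0; i < LANES; ++i) {
            zdn[v * LANES + i] =
                max_f(zdn[v * LANES + i], zm[v * LANES + i]);
        }
    }
}
```

Swapping `max_f` for a minimum gives the corresponding sketch of the `svmin` forms.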
momchil-velikov added a commit to llvm/llvm-project that referenced this pull request May 17, 2024
…tors (#91606)

[Recommit of e88ba6d]

According to the specification in
ARM-software/acle#309 this adds the intrinsics

void svadd_za16_vg1x2_f16(uint32_t slice, svfloat16x2_t zn)
__arm_streaming __arm_inout("za");
void svadd_za16_vg1x4_f16(uint32_t slice, svfloat16x4_t zn)
__arm_streaming __arm_inout("za");
void svsub_za16_vg1x2_f16(uint32_t slice, svfloat16x2_t zn)
__arm_streaming __arm_inout("za");
void svsub_za16_vg1x4_f16(uint32_t slice, svfloat16x4_t zn)
__arm_streaming __arm_inout("za");

as well as the corresponding `bf16` variants.
Lukacma added a commit to llvm/llvm-project that referenced this pull request May 23, 2024
According to the specification in
ARM-software/acle#309 this adds the intrinsics

```
svfloat32x2_t svcvt_f32[_f16_x2](svfloat16_t zn) __arm_streaming;
svfloat32x2_t svcvtl_f32[_f16_x2](svfloat16_t zn) __arm_streaming;

```
These are available only if __ARM_FEATURE_SME_F16F16 is enabled.
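
The thread does not spell out how `svcvt` and `svcvtl` differ. The sketch below assumes, based on the descriptions of the underlying FCVT and FCVTL instructions, that `svcvt` places results in order (low half of the source into the first output vector, high half into the second) while `svcvtl` deinterleaves (even-numbered elements into the first, odd-numbered into the second). That reading is an assumption, as are the hypothetical names `model_cvt_in_order` and `model_cvtl_deinterleaved`; `float` stands in for both the `f16` input and the `f32` output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical models. zn holds n16 source elements; lo and hi are the
 * two output vectors and each receives n16 / 2 elements. */

/* Assumed svcvt behaviour: in-order placement. */
static void model_cvt_in_order(const float *zn, size_t n16,
                               float *lo, float *hi)
{
    assert(n16 % 2 == 0);
    size_t half = n16 / 2;
    for (size_t i = 0; i < half; ++i) {
        lo[i] = zn[i];        /* low half of the source  */
        hi[i] = zn[half + i]; /* high half of the source */
    }
}

/* Assumed svcvtl behaviour: two-way deinterleaved placement. */
static void model_cvtl_deinterleaved(const float *zn, size_t n16,
                                     float *lo, float *hi)
{
    assert(n16 % 2 == 0);
    size_t half = n16 / 2;
    for (size_t i = 0; i < half; ++i) {
        lo[i] = zn[2 * i];     /* even-numbered source elements */
        hi[i] = zn[2 * i + 1]; /* odd-numbered source elements  */
    }
}
```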

---------

Co-authored-by: Caroline Concatto <caroline.concatto@arm.com>
Lukacma added a commit to llvm/llvm-project that referenced this pull request May 23, 2024
According to the specification in
ARM-software/acle#309 this adds the intrinsics:

```
  void svzero_za64_vg1x2(uint32_t slice)
    __arm_streaming __arm_inout("za");

  void svzero_za64_vg1x4(uint32_t slice)
    __arm_streaming __arm_inout("za");

  void svzero_za64_vg2x1(uint32_t slice)
    __arm_streaming __arm_inout("za");

  void svzero_za64_vg2x2(uint32_t slice)
    __arm_streaming __arm_inout("za");

  void svzero_za64_vg2x4(uint32_t slice)
    __arm_streaming __arm_inout("za");

  void svzero_za64_vg4x1(uint32_t slice)
    __arm_streaming __arm_inout("za");

  void svzero_za64_vg4x2(uint32_t slice)
    __arm_streaming __arm_inout("za");

  void svzero_za64_vg4x4(uint32_t slice)
    __arm_streaming __arm_inout("za");
```
Lukacma added a commit to llvm/llvm-project that referenced this pull request May 28, 2024
According to the specification in
ARM-software/acle#309 this adds the intrinsics

```
  svbfloat16x2_t svclamp[_single_bf16_x2](svbfloat16x2_t zd, svbfloat16_t zn,
                                        svbfloat16_t zm)  __arm_streaming;

  svbfloat16x4_t svclamp[_single_bf16_x4](svbfloat16x4_t zd, svbfloat16_t zn,
                                        svbfloat16_t zm)  __arm_streaming;
```
These are available only if __ARM_FEATURE_SME_B16B16 is enabled.
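
As a rough scalar picture of what `svclamp[_single_bf16_x2]` and `_x4` compute: every element of every vector in the tuple `zd` is limited to the range given by the matching elements of `zn` (lower bound) and `zm` (upper bound). The sketch assumes that reading of the operands; `model_clamp_single`, `min_f`, `max_f` and `LANES` are hypothetical names, `float` stands in for `bfloat16`, and NaN handling is not modeled.

```c
#include <assert.h>
#include <stddef.h>

enum { LANES = 4 }; /* assumed elements per vector, for illustration only */

static float min_f(float a, float b) { return a < b ? a : b; }
static float max_f(float a, float b) { return a > b ? a : b; }

/* Hypothetical model: zd is a tuple of nvec vectors stored as
 * nvec * LANES contiguous elements; zn and zm are single vectors
 * holding the per-lane lower and upper bounds. */
static void model_clamp_single(float *zd, size_t nvec,
                               const float *zn, const float *zm)
{
    assert(nvec == 2 || nvec == 4); /* only _x2 and _x4 tuples exist */
    for (size_t v = 0; v < nvec; ++v) {
        for (size_t i = 0; i < LANES; ++i) {
            size_t idx = v * LANES + i;
            zd[idx] = min_f(max_f(zd[idx], zn[i]), zm[i]);
        }
    }
}
```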
Lukacma added a commit to llvm/llvm-project that referenced this pull request May 29, 2024
According to the specification in
ARM-software/acle#309 this adds the intrinsics

```
  svbfloat16x2_t svclamp[_single_bf16_x2](svbfloat16x2_t zd, svbfloat16_t zn,
                                        svbfloat16_t zm)  __arm_streaming;

  svbfloat16x4_t svclamp[_single_bf16_x4](svbfloat16x4_t zd, svbfloat16_t zn,
                                        svbfloat16_t zm)  __arm_streaming;
```
These are available only if __ARM_FEATURE_SME_B16B16 is enabled.
Lukacma added a commit to llvm/llvm-project that referenced this pull request May 29, 2024
According to the specification in
ARM-software/acle#309 this adds the intrinsics
```
svfloat32x2_t svcvt_f32[_f16_x2](svfloat16_t zn) __arm_streaming;
svfloat32x2_t svcvtl_f32[_f16_x2](svfloat16_t zn) __arm_streaming;

```
These are available only if __ARM_FEATURE_SME_F16F16 is enabled.

---------

Co-authored-by: Caroline Concatto <caroline.concatto@arm.com>
Lukacma added a commit to llvm/llvm-project that referenced this pull request May 29, 2024
According to the specification in
ARM-software/acle#309 this adds the intrinsics:

  void svzero_za64_vg1x2(uint32_t slice)
    __arm_streaming __arm_inout("za");

  void svzero_za64_vg1x4(uint32_t slice)
    __arm_streaming __arm_inout("za");

  void svzero_za64_vg2x1(uint32_t slice)
    __arm_streaming __arm_inout("za");

  void svzero_za64_vg2x2(uint32_t slice)
    __arm_streaming __arm_inout("za");

  void svzero_za64_vg2x4(uint32_t slice)
    __arm_streaming __arm_inout("za");

  void svzero_za64_vg4x1(uint32_t slice)
    __arm_streaming __arm_inout("za");

  void svzero_za64_vg4x2(uint32_t slice)
    __arm_streaming __arm_inout("za");

  void svzero_za64_vg4x4(uint32_t slice)
    __arm_streaming __arm_inout("za");
vg0204 pushed a commit to vg0204/llvm-project that referenced this pull request May 29, 2024
According to the specification in
ARM-software/acle#309 this adds the intrinsics

```
  svbfloat16x2_t svclamp[_single_bf16_x2](svbfloat16x2_t zd, svbfloat16_t zn,
                                        svbfloat16_t zm)  __arm_streaming;

  svbfloat16x4_t svclamp[_single_bf16_x4](svbfloat16x4_t zd, svbfloat16_t zn,
                                        svbfloat16_t zm)  __arm_streaming;
```
These are available only if __ARM_FEATURE_SME_B16B16 is enabled.
vg0204 pushed a commit to vg0204/llvm-project that referenced this pull request May 29, 2024
According to the specification in
ARM-software/acle#309 this adds the intrinsics

```
  svbfloat16x2_t svclamp[_single_bf16_x2](svbfloat16x2_t zd, svbfloat16_t zn,
                                        svbfloat16_t zm)  __arm_streaming;

  svbfloat16x4_t svclamp[_single_bf16_x4](svbfloat16x4_t zd, svbfloat16_t zn,
                                        svbfloat16_t zm)  __arm_streaming;
```
These are available only if __ARM_FEATURE_SME_B16B16 is enabled.
vg0204 pushed a commit to vg0204/llvm-project that referenced this pull request May 29, 2024
According to the specification in
ARM-software/acle#309 this adds the intrinsics
```
svfloat32x2_t svcvt_f32[_f16_x2](svfloat16_t zn) __arm_streaming;
svfloat32x2_t svcvtl_f32[_f16_x2](svfloat16_t zn) __arm_streaming;

```
These are available only if __ARM_FEATURE_SME_F16F16 is enabled.

---------

Co-authored-by: Caroline Concatto <caroline.concatto@arm.com>
vg0204 pushed a commit to vg0204/llvm-project that referenced this pull request May 29, 2024
According to the specification in
ARM-software/acle#309 this adds the intrinsics:

  void svzero_za64_vg1x2(uint32_t slice)
    __arm_streaming __arm_inout("za");

  void svzero_za64_vg1x4(uint32_t slice)
    __arm_streaming __arm_inout("za");

  void svzero_za64_vg2x1(uint32_t slice)
    __arm_streaming __arm_inout("za");

  void svzero_za64_vg2x2(uint32_t slice)
    __arm_streaming __arm_inout("za");

  void svzero_za64_vg2x4(uint32_t slice)
    __arm_streaming __arm_inout("za");

  void svzero_za64_vg4x1(uint32_t slice)
    __arm_streaming __arm_inout("za");

  void svzero_za64_vg4x2(uint32_t slice)
    __arm_streaming __arm_inout("za");

  void svzero_za64_vg4x4(uint32_t slice)
    __arm_streaming __arm_inout("za");
CarolineConcatto added a commit to CarolineConcatto/llvm-project that referenced this pull request Jun 4, 2024
According to the specification in
ARM-software/acle#309 this adds the intrinsics

// Variants are also available for _za8_u8, _za16_s16, _za16_u16,
// _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
// _za64_s64, _za64_u64 and _za64_f64
svint8x2_t svreadz_hor_za8_s8_vg2(uint64_t tile, uint32_t slice)
__arm_streaming __arm_inout("za");

// Variants are also available for _za8_u8, _za16_s16, _za16_u16,
// _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
// _za64_s64, _za64_u64 and _za64_f64
svint8x4_t svreadz_hor_za8_s8_vg4(uint64_t tile, uint32_t slice)
__arm_streaming __arm_inout("za");

// Variants are also available for _za8_u8, _za16_s16, _za16_u16,
// _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
// _za64_s64, _za64_u64 and _za64_f64
svint8x2_t svreadz_ver_za8_s8_vg2(uint64_t tile, uint32_t slice)
__arm_streaming __arm_inout("za");

// Variants are also available for _za8_u8, _za16_s16, _za16_u16,
// _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
// _za64_s64, _za64_u64 and _za64_f64
svint8x4_t svreadz_ver_za8_s8_vg4(uint64_t tile, uint32_t slice)
__arm_streaming __arm_inout("za");
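
The "move and zero" behaviour shared by the `svreadz_hor` and `svreadz_ver` intrinsics referenced in this thread can be sketched as a scalar model: the selected slices are copied out and the tile storage they came from is cleared. This is an illustration under stated assumptions, not the ACLE definition: `model_readz_hor`, `model_readz_ver` and `DIM` are hypothetical names, the tile is a fixed 4x4 array of `int8_t` whereas the real dimension depends on the streaming vector length, the `_vg2` and `_vg4` forms are assumed to read consecutive slices, and slice-index wrap-around is not modeled.

```c
#include <assert.h>
#include <stdint.h>

enum { DIM = 4 }; /* assumed tile dimension, for illustration only */

/* Read `count` consecutive horizontal slices (rows) starting at
 * `slice`, then zero them in the tile. count = 1 models the
 * single-vector form, 2 and 4 the _vg2 and _vg4 forms. */
static void model_readz_hor(int8_t tile[DIM][DIM], uint32_t slice,
                            uint32_t count, int8_t out[][DIM])
{
    assert(slice + count <= DIM);
    for (uint32_t v = 0; v < count; ++v) {
        for (uint32_t c = 0; c < DIM; ++c) {
            out[v][c] = tile[slice + v][c];
            tile[slice + v][c] = 0;
        }
    }
}

/* Same operation on vertical slices (columns). */
static void model_readz_ver(int8_t tile[DIM][DIM], uint32_t slice,
                            uint32_t count, int8_t out[][DIM])
{
    assert(slice + count <= DIM);
    for (uint32_t v = 0; v < count; ++v) {
        for (uint32_t r = 0; r < DIM; ++r) {
            out[v][r] = tile[r][slice + v];
            tile[r][slice + v] = 0;
        }
    }
}
```

A vertical read issued after a horizontal one sees zeros wherever the earlier read cleared the tile, which is the observable difference from a plain `svread`.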
4 participants