Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[main] Add Alpha support for SME #188

Merged
merged 6 commits into from
Aug 19, 2022
Merged

Conversation

rsandifo-arm
Copy link
Contributor

@rsandifo-arm rsandifo-arm commented Apr 1, 2022

Checklist: (mark with X those which apply)

  • If an issue reporting the bug exists, I have mentioned it in the
    PR (do not bother creating the issue if all you want to do is
    fixing the bug yourself).
  • I have added/updated the SPDX-FileCopyrightText lines on top
    of any file I have edited. Format is SPDX-FileCopyrightText: Copyright {year} {entity or name} <{contact informations}>
    (Please update existing copyright lines if applicable. You can
    specify year ranges with hyphen , as in 2017-2019, and use
    commas to separate gaps, as in 2018-2020, 2022).
  • I have updated the Copyright section of the sources of the
    specification I have edited (this will show up in the text
    rendered in the PDF and other output format supported). The
    format is the same described in the previous item.
  • I have run the CI scripts (if applicable, as they might be
    tricky to set up on non-*nix machines). The sequence can be
    found in the contribution
    guidelines
    . Don't
    worry if you cannot run these scripts on your machine, your
    patch will be automatically checked in the Actions of the pull
    request.
  • I have added an item that describes the changes I have
    introduced in this PR in the section Changes for next
    release
    of the section Change Control/Document history
    of the document. Create Changes for next release if it does
    not exist. Notice that changes that are not modifying the
    content and rendering of the specifications (both HTML and PDF)
    do not need to be listed.
  • When modifying content and/or its rendering, I have checked the
    correctness of the result in the PDF output (please refer to the
    instructions on how to build the PDFs
    locally
    ).
  • The variable draftversion is set to true in the YAML header
    of the sources of the specifications I have modified.
  • Please DO NOT add my GitHub profile to the list of contributors
    in the README page of the project.

Copy link
Contributor

@georges-arm georges-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only managed a very brief read through, looks good. Found one small typo.

main/acle.md Outdated Show resolved Hide resolved
Copy link
Contributor

@fpetrogalli fpetrogalli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rsandifo-arm - mostly minor things!

Francesco

main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Show resolved Hide resolved
Copy link
Contributor

@fpetrogalli fpetrogalli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @rsandifo-arm!

@fpetrogalli fpetrogalli added this to the 2022Q2 milestone May 4, 2022
The intrinsics in this section have the following properties in common:

* Every argument named `tile`, `slice_offset` or `tile_mask` must
be an integer constant expression.
Copy link

@bryanpkc bryanpkc May 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just came across this ACLE draft recently. What was the rationale for making it the user's responsibility to index into ZA storage explicitly, instead of creating sizeless types analogous to svint32_t, svfloat64_t, etc.? We have done some work in that direction, and extended ACLE to provide types such as smint32_t and smfloat64_t. Did you consider that approach at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Thanks for the pointer, this looks like really nice work. Yeah, we did consider using explicit C/C++ objects to represent the arrays, but in the end, we thought it would be better for the low-level ACLE to maintain a more direct mapping to the instructions. There were several reasons for this:

  • It seemed likely that programmers writing code specifically for SME would be aware of the number of available tiles for a given element type, would know how the tiles are arranged, and would want to use that knowledge to hide accumulation latencies. We therefore wanted to let the programmer specify tile numbers directly if they wanted to. In other words, we thought SME programmers would be thinking of ZA as a whole (unlike SVE programmers, who would think of vectors as individual objects rather than as 1 out of 32 vector registers).
  • Although compilers are improving all the time, there are still cases where they make bad RA decisions for vector code, especially when inputs and outputs need to be tied. The danger with treating matrices like vectors is that the cost of any compiler mistakes would be O(VL) rather than O(1). And it can be quite difficult to work around these issues in the source code. This might be a particular issue when “reinterpreting” 8 64-bit tiles as 1 8-bit tile (for example), since the 8 tiles would need to be in a particular order.
  • If the tiles were normal C/C++ objects, a function wouldn't be able to return a scalable matrix object (in ZA) at the same time as returning a normal object (in GPRs or FPRs) unless we provided some way of putting scalable objects into structures/classes. That would be a good thing long term, but it seems difficult to do, since C++ has a fundamental assumption that sizeof is a constant expression.
  • Complex operations might be split over several subroutines. We thought that, in those cases, the routines would be sharing ZA as a whole, so having a single arm_shared_za attribute seemed more convenient than having to pass multiple tiles around.

However, the above is all about the low-level ACLE interface. I don't think this has to be an “either/or”. It would be useful to have higher-level features too, where the compiler does more of the work. Also, having scalable matrix types in LLVM IR sounds like it should be a good fit for Florian's Clang matrix extensions.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rsandifo-arm, thank you for the thoughtful answer. I agree with you that this doesn't have to be an "either/or"; perhaps it's even possible to implement this ACLE draft on top of our work. You have raised very good points about register allocation decision and reinterpretation; we found that we would need to teach LLVM's register allocator about sub-tiles in SME to make it do a better job, and if we were to support a reinterpretation intrinsic, it would put even more constraints on RA.

About a function's inability to return a scalable matrix object at the same time as returning a normal object, wouldn't the same limitation apply to scalable vector objects as well? Or has that problem already been solved?

Copy link
Contributor Author

@rsandifo-arm rsandifo-arm May 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @bryanpkc : No, you're right, the same restriction applies to scalable vector objects as well. The x2_t, x3_t and x4_t types make it possible to return up to 4 vectors at a time (although all vectors need to be the same type, which would introduce some awkward reinterprets if someone wanted to return, say, a vector of floats and a vector of uint32_ts). But there is currently no way of returning both a scalable vector and a scalar by value, or even a scalable vector and a scalable predicate. It's an unfortunate restriction.

It is of course possible to return things by reference, but it would be better if that wasn't necessary.

(edited to fix a typo: s/time/type/)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi both.

@bryanpkc - thank you for raising your concerns. Please correct if I am wrong, but my understanding is that you are happy with explanations that @rsandifo-arm provided in their answer. If that is the case, are you happy for us to proceed with merging the specs for the SME ACLE as they are in this patch? We will be open to further improvements, if needed.

@rsandifo-arm - your answer is quite useful as it provides a justification of the current shape of the SME ACLE. Do you mind adding it to and SME-specific design document in the design_documents folder?

Kind regards,

Francesco

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fpetrogalli Sorry for the late response. I am happy with the explanations from @rsandifo-arm and have no more comment on this point.

@fpetrogalli fpetrogalli removed this from the 2022Q2 milestone May 11, 2022
main/acle.md Outdated Show resolved Hide resolved
Copy link
Contributor

@sallyarmneale sallyarmneale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor stylistic/formatting changes

main/acle.md Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
main/acle.md Outdated Show resolved Hide resolved
@fpetrogalli
Copy link
Contributor

@all-contributors please add @sdesmalen-arm for review.

@allcontributors
Copy link
Contributor

@fpetrogalli

I've put up a pull request to add @sdesmalen-arm! 🎉

@fpetrogalli
Copy link
Contributor

@sagarkulkarni19, @pthariensflame , @bryanpkc - thank you for the comments. Are you happy for me to add you as a contributor to the list of all-contributors for your review?

@bryanpkc
Copy link

Are you happy for me to add you as a contributor to the list of all-contributors for your review?

Sure.

@sagarkulkarni19
Copy link

@sagarkulkarni19, @pthariensflame , @bryanpkc - thank you for the comments. Are you happy for me to add you as a contributor to the list of all-contributors for your review?

Sure, sounds good.

@pthariensflame
Copy link
Contributor

@sagarkulkarni19, @pthariensflame , @bryanpkc - thank you for the comments. Are you happy for me to add you as a contributor to the list of all-contributors for your review?

I'm happy to be there, yes!

@fpetrogalli
Copy link
Contributor

fpetrogalli commented Jun 30, 2022

@all-contributors please add @bryanpkc for review.

@allcontributors
Copy link
Contributor

@fpetrogalli

I've put up a pull request to add @bryanpkc! 🎉

@fpetrogalli
Copy link
Contributor

@all-contributors please add @pthariensflame for review.

@allcontributors
Copy link
Contributor

@fpetrogalli

I've put up a pull request to add @pthariensflame! 🎉

@fpetrogalli
Copy link
Contributor

@all-contributors please add @sagarkulkarni19 for review.

@allcontributors
Copy link
Contributor

@fpetrogalli

I've put up a pull request to add @sagarkulkarni19! 🎉

@sallyarmneale
Copy link
Contributor

Looks good to me.

@rsandifo-arm
Copy link
Contributor Author

I've just updated the AAPCS64 PR to change the way that streaming-compatible functions are handled. Rather than have a special parameter that contains PSTATE.SM, the proposal is instead to have a utility function called __arm_sme_state that returns the current state. This function also returns whether the thread has access to SME and TPIDR2_EL0.

It doesn't seem appropriate to make __arm_sme_state directly callable from C, since the values of PSTATE.ZA and TPIDR2_EL0 are handled by the compiler and do not have values that C/C++ code could rely on. Also, as far as the ACLE is concerned, PSTATE.SM is a property of the abstract machine, whereas __arm_sme_state would directly return the underlying system register. __arm_in_streaming_mode already provides the C/C++ view of PSTATE.SM.

However, the information about whether the thread has access to SME should be useful to C/C++ code, so I've added an __arm_has_sme intrinsic for that part.

functions do. See [[AAPCS64]](#AAPCS64) for more details about
private-ZA interfaces.

A function definition with this attribute is [ill-formed](#ill-formed)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a caller with "arm_new_za" attribute call a callee with "arm_new_za" attribute, when both the function are part of object code's ABI ? In such a case both the functions will have "private-ZA interface" and with lazy save enabled, the TPIDR2_EL0 will be overwritten and the lazy-save functionality is lost. So shouldn't there be a restriction on such callers and callee?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a call from one function with an arm_new_za attribute to another function with an arm_new_za attribute is fine. Using arm_new_za is an internal implementation choice than doesn't affect the function's ABI: the attribute simply indicates that the function uses ZA to store state (and that that state isn't shared with callers).

When a function F has an arm_new_za attribute, the compiler must make F commit any uncommitted lazy save before F stores new data into ZA.

@rsandifo-arm rsandifo-arm merged commit 634227e into ARM-software:main Aug 19, 2022
@rsandifo-arm rsandifo-arm deleted the sme branch August 19, 2022 14:24
@vhscampos vhscampos added this to the 2022Q4 milestone Nov 14, 2022
Copy link
Contributor

@sallyarmneale sallyarmneale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 very minor suggestions

> double f() { return another_func(1.0, 2, "oranges"); }
> ```
>
> Functions like `some_func` and `another_func` are referred to as
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like -> such as

>
> Functions like `some_func` and `another_func` are referred to as
> (K&R-style) “unprototyped” functions. The first C standard categorized
> them as an obsolescent feature and C18 removed all remaining support
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

them -> these functions

switches should be avoided for performance reasons.

* A function provides a public API that is specific to SME.
Again, callers to such functions would want to avoid the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to should?

@vhscampos
Copy link
Member

Latest comments addressed in #233

sdesmalen-arm added a commit to llvm/llvm-project that referenced this pull request Aug 8, 2023
This patch adds all the language-level function keywords defined in:

  ARM-software/acle#188 (merged)
  ARM-software/acle#261 (update after D148700 landed)

The keywords are used to control PSTATE.ZA and PSTATE.SM, which are
respectively used for enabling the use of the ZA matrix array and Streaming
mode. This information needs to be available on call sites, since the use
of ZA or streaming mode may have to be enabled or disabled around the
call-site (depending on the IR attributes set on the caller and the
callee). For calls to functions from a function pointer, there is no IR
declaration available, so the IR attributes must be added explicitly to the
call-site.

With the exception of '__arm_locally_streaming' and '__arm_new_za' the
information is part of the function's interface, not just the function
definition, and thus needs to be propagated through the
FunctionProtoType::ExtProtoInfo.

This patch adds the defintions of these keywords, as well as codegen and
semantic analysis to ensure conversions between function pointers are valid
and that no conflicting keywords are set. For example, '__arm_streaming'
and '__arm_streaming_compatible' are mutually exclusive.

Differential Revision: https://reviews.llvm.org/D127762
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants