-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[main] Add Alpha support for SME #188
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only managed a very brief read through, looks good. Found one small typo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @rsandifo-arm - mostly minor things!
Francesco
e1e6f52
to
ce878cf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @rsandifo-arm!
The intrinsics in this section have the following properties in common: | ||
|
||
* Every argument named `tile`, `slice_offset` or `tile_mask` must | ||
be an integer constant expression. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just came across this ACLE draft recently. What was the rationale for making it the user's responsibility to index into ZA storage explicitly, instead of creating sizeless types analogous to svint32_t
, svfloat64_t
, etc.? We have done some work in that direction, and extended ACLE to provide types such as smint32_t
and smfloat64_t
. Did you consider that approach at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Thanks for the pointer, this looks like really nice work. Yeah, we did consider using explicit C/C++ objects to represent the arrays, but in the end, we thought it would be better for the low-level ACLE to maintain a more direct mapping to the instructions. There were several reasons for this:
- It seemed likely that programmers writing code specifically for SME would be aware of the number of available tiles for a given element type, would know how the tiles are arranged, and would want to use that knowledge to hide accumulation latencies. We therefore wanted to let the programmer specify tile numbers directly if they wanted to. In other words, we thought SME programmers would be thinking of ZA as a whole (unlike SVE programmers, who would think of vectors as individual objects rather than as 1 out of 32 vector registers).
- Although compilers are improving all the time, there are still cases where they make bad RA decisions for vector code, especially when inputs and outputs need to be tied. The danger with treating matrices like vectors is that the cost of any compiler mistakes would be O(VL) rather than O(1). And it can be quite difficult to work around these issues in the source code. This might be a particular issue when “reinterpreting” 8 64-bit tiles as 1 8-bit tile (for example), since the 8 tiles would need to be in a particular order.
- If the tiles were normal C/C++ objects, a function wouldn't be able to return a scalable matrix object (in ZA) at the same time as returning a normal object (in GPRs or FPRs) unless we provided some way of putting scalable objects into structures/classes. That would be a good thing long term, but it seems difficult to do, since C++ has a fundamental assumption that sizeof is a constant expression.
- Complex operations might be split over several subroutines. We thought that, in those cases, the routines would be sharing ZA as a whole, so having a single
arm_shared_za
attribute seemed more convenient than having to pass multiple tiles around.
However, the above is all about the low-level ACLE interface. I don't think this has to be an “either/or”. It would be useful to have higher-level features too, where the compiler does more of the work. Also, having scalable matrix types in LLVM IR sounds like it should be a good fit for Florian's Clang matrix extensions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @rsandifo-arm, thank you for the thoughtful answer. I agree with you that this doesn't have to be an "either/or"; perhaps it's even possible to implement this ACLE draft on top of our work. You have raised very good points about register allocation decision and reinterpretation; we found that we would need to teach LLVM's register allocator about sub-tiles in SME to make it do a better job, and if we were to support a reinterpretation intrinsic, it would put even more constraints on RA.
About a function's inability to return a scalable matrix object at the same time as returning a normal object, wouldn't the same limitation apply to scalable vector objects as well? Or has that problem already been solved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @bryanpkc : No, you're right, the same restriction applies to scalable vector objects as well. The x2_t
, x3_t
and x4_t
types make it possible to return up to 4 vectors at a time (although all vectors need to be the same type, which would introduce some awkward reinterprets if someone wanted to return, say, a vector of float
s and a vector of uint32_t
s). But there is currently no way of returning both a scalable vector and a scalar by value, or even a scalable vector and a scalable predicate. It's an unfortunate restriction.
It is of course possible to return things by reference, but it would be better if that wasn't necessary.
(edited to fix a typo: s/time/type/)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi both.
@bryanpkc - thank you for raising your concerns. Please correct if I am wrong, but my understanding is that you are happy with explanations that @rsandifo-arm provided in their answer. If that is the case, are you happy for us to proceed with merging the specs for the SME ACLE as they are in this patch? We will be open to further improvements, if needed.
@rsandifo-arm - your answer is quite useful as it provides a justification of the current shape of the SME ACLE. Do you mind adding it to and SME-specific design document in the design_documents
folder?
Kind regards,
Francesco
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fpetrogalli Sorry for the late response. I am happy with the explanations from @rsandifo-arm and have no more comment on this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor stylistic/formatting changes
@all-contributors please add @sdesmalen-arm for review. |
I've put up a pull request to add @sdesmalen-arm! 🎉 |
@sagarkulkarni19, @pthariensflame , @bryanpkc - thank you for the comments. Are you happy for me to add you as a contributor to the list of all-contributors for your review? |
Sure. |
Sure, sounds good. |
I'm happy to be there, yes! |
@all-contributors please add @bryanpkc for review. |
I've put up a pull request to add @bryanpkc! 🎉 |
@all-contributors please add @pthariensflame for review. |
I've put up a pull request to add @pthariensflame! 🎉 |
@all-contributors please add @sagarkulkarni19 for review. |
I've put up a pull request to add @sagarkulkarni19! 🎉 |
Looks good to me. |
I've just updated the AAPCS64 PR to change the way that streaming-compatible functions are handled. Rather than have a special parameter that contains PSTATE.SM, the proposal is instead to have a utility function called It doesn't seem appropriate to make However, the information about whether the thread has access to SME should be useful to C/C++ code, so I've added an |
functions do. See [[AAPCS64]](#AAPCS64) for more details about | ||
private-ZA interfaces. | ||
|
||
A function definition with this attribute is [ill-formed](#ill-formed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can a caller with "arm_new_za" attribute call a callee with "arm_new_za" attribute, when both the function are part of object code's ABI ? In such a case both the functions will have "private-ZA interface" and with lazy save enabled, the TPIDR2_EL0 will be overwritten and the lazy-save functionality is lost. So shouldn't there be a restriction on such callers and callee?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a call from one function with an arm_new_za
attribute to another function with an arm_new_za
attribute is fine. Using arm_new_za
is an internal implementation choice than doesn't affect the function's ABI: the attribute simply indicates that the function uses ZA to store state (and that that state isn't shared with callers).
When a function F has an arm_new_za
attribute, the compiler must make F commit any uncommitted lazy save before F stores new data into ZA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 very minor suggestions
> double f() { return another_func(1.0, 2, "oranges"); } | ||
> ``` | ||
> | ||
> Functions like `some_func` and `another_func` are referred to as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
like -> such as
> | ||
> Functions like `some_func` and `another_func` are referred to as | ||
> (K&R-style) “unprototyped” functions. The first C standard categorized | ||
> them as an obsolescent feature and C18 removed all remaining support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
them -> these functions
switches should be avoided for performance reasons. | ||
|
||
* A function provides a public API that is specific to SME. | ||
Again, callers to such functions would want to avoid the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change to should?
Latest comments addressed in #233 |
This patch adds all the language-level function keywords defined in: ARM-software/acle#188 (merged) ARM-software/acle#261 (update after D148700 landed) The keywords are used to control PSTATE.ZA and PSTATE.SM, which are respectively used for enabling the use of the ZA matrix array and Streaming mode. This information needs to be available on call sites, since the use of ZA or streaming mode may have to be enabled or disabled around the call-site (depending on the IR attributes set on the caller and the callee). For calls to functions from a function pointer, there is no IR declaration available, so the IR attributes must be added explicitly to the call-site. With the exception of '__arm_locally_streaming' and '__arm_new_za' the information is part of the function's interface, not just the function definition, and thus needs to be propagated through the FunctionProtoType::ExtProtoInfo. This patch adds the defintions of these keywords, as well as codegen and semantic analysis to ensure conversions between function pointers are valid and that no conflicting keywords are set. For example, '__arm_streaming' and '__arm_streaming_compatible' are mutually exclusive. Differential Revision: https://reviews.llvm.org/D127762
Checklist: (mark with
X
those which apply)PR (do not bother creating the issue if all you want to do is
fixing the bug yourself).
SPDX-FileCopyrightText
lines on topof any file I have edited. Format is
SPDX-FileCopyrightText: Copyright {year} {entity or name} <{contact informations}>
(Please update existing copyright lines if applicable. You can
specify year ranges with hyphen , as in
2017-2019
, and usecommas to separate gaps, as in
2018-2020, 2022
).Copyright
section of the sources of thespecification I have edited (this will show up in the text
rendered in the PDF and other output format supported). The
format is the same described in the previous item.
tricky to set up on non-*nix machines). The sequence can be
found in the contribution
guidelines. Don't
worry if you cannot run these scripts on your machine, your
patch will be automatically checked in the Actions of the pull
request.
introduced in this PR in the section Changes for next
release of the section Change Control/Document history
of the document. Create Changes for next release if it does
not exist. Notice that changes that are not modifying the
content and rendering of the specifications (both HTML and PDF)
do not need to be listed.
correctness of the result in the PDF output (please refer to the
instructions on how to build the PDFs
locally).
draftversion
is set totrue
in the YAML headerof the sources of the specifications I have modified.
in the README page of the project.