JIT: Avoid mask<->vector optimization for masks used in unhandled ways #110307

Merged
merged 3 commits into dotnet:main from fix-110306
Dec 3, 2024

Conversation

jakobbotsch
Member

@jakobbotsch jakobbotsch commented Dec 2, 2024

When a local is used as a return buffer it is not address exposed, so the address-exposure check was not sufficient. Add checks for LCL_ADDR, LCL_FLD and STORE_LCL_FLD to make sure any use of a mask local that is not converted disqualifies it from participating in the optimization.

Also avoid doing unnecessary work for locals that are not SIMD/mask typed (the common case); previously we would perform superfluous hash table lookups and other bookkeeping for such locals.

Fix #110306
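
For illustration, a minimal sketch of the kind of pattern involved; this is an assumption made up for exposition (the `Produce`/`Mix` names and shapes are invented), not the actual Fuzzlyn repro from #110306:

```csharp
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;

static class MaskLocalSketch
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    static Vector512<int> Produce() => Vector512<int>.Zero;

    static Vector512<int> Mix(Vector512<int> a, Vector512<int> b, bool flag)
    {
        // One definition of 'v' is a comparison, which the JIT can model as a
        // mask converted to a vector; the other is a call whose result may be
        // written into 'v' through a hidden return buffer on xarch, without
        // address-exposing 'v'. The optimization has to see that non-converted
        // definition and leave 'v' as a vector-typed local.
        Vector512<int> v = flag ? Vector512.GreaterThan(a, b) : Produce();
        return v;
    }
}
```

The point is only that retyping `v` as a mask local is safe when every def and use of `v` is one of the recognized conversion shapes; any other access has to disqualify it.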

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 2, 2024
@amanasifkhalid
Member

I haven't looked into them, but I suspect this will also fix the "Assert unreached()" failures from the latest Antigen run.

@jakobbotsch jakobbotsch marked this pull request as ready for review December 2, 2024 16:18
@jakobbotsch
Member Author

cc @dotnet/jit-contrib PTAL @a74nh @tannergooding

No diffs

@tannergooding
Member

Do we not have an outerloop job testing SVE under the AltJit?

Is that why these didn't get caught after the initial PR went in, and only after the x64 support came online?

@kunalspathak
Member

Do we not have an outerloop job testing SVE under the AltJit?

Is that why these didn't get caught after the initial PR went in, and only after the x64 support came online?

The existing pipeline is broken because it was not testing the right thing. I had #107475 open, but I saw some other non-SVE failures that I didn't get time to investigate. Cobalt machines will be added to CI shortly, so going forward this will get caught naturally.

@jakobbotsch
Member Author

Is that why these didn't get caught after the initial PR went in, and only after the x64 support came online?

This doesn't repro on arm64 because SIMD16 is always returned by value there, which the existing checks handled. On xarch we have the possibility of the vector being defined via a return buffer, in which case it doesn't end up address exposed. The checks weren't sufficient for that.
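
As a rough illustration of the difference (assumed shapes; the comments restate the ABI behavior described in this thread rather than anything this PR changes):

```csharp
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;

static class ReturnShapes
{
    // arm64: a Vector128<T> (SIMD16) result comes back by value in a SIMD
    // register, so the caller's local is defined by an ordinary store.
    static Vector128<int> Simd16(Vector128<int> a, Vector128<int> b)
        => Vector128.GreaterThan(a, b);

    // xarch: a larger vector result may instead be written by the callee
    // through a hidden return-buffer pointer, so the caller's local can be
    // defined without ever being address exposed.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static Vector512<int> Simd64(Vector512<int> a, Vector512<int> b)
        => Vector512.GreaterThan(a, b);

    static int Caller(Vector512<int> a, Vector512<int> b)
    {
        // On xarch 'v' may serve as the return buffer for this call.
        Vector512<int> v = Simd64(a, b);
        return Vector512.Sum(v);
    }
}
```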

@a74nh
Contributor

a74nh commented Dec 2, 2024

When a local is used as a return buffer it is not address exposed, so the address-exposure check was not sufficient. Add checks for LCL_ADDR, LCL_FLD and STORE_LCL_FLD to make sure any use of a mask local that is not converted disqualifies it from participating in the optimization.

Could this appear on Arm64? Interesting as I never saw any failures myself on Fuzzlyn Arm64.

@tannergooding
Member

This doesn't repro on arm64 because SIMD16 is always returned by value there

I don't believe we do this for cases like struct S { Vector128<T> field; } or other HVA qualifying structs today; it is a known gap/mismatch in the ABI handling.

@jakobbotsch
Member Author

This doesn't repro on arm64 because SIMD16 is always returned by value there

I don't believe we do this for cases like struct S { Vector128<T> field; } or other HVA qualifying structs today; it is a known gap/mismatch in the ABI handling.

16 byte structs on ARM64 are returned in two registers, not via return buffer.

@jakobbotsch
Member Author

Could this appear on Arm64? Interesting as I never saw any failures myself on Fuzzlyn Arm64.

The LCL_FLD and STORE_LCL_FLD cases could probably occur, but it takes some odd-looking C# to make them appear.
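
One hedged guess at what such odd-looking C# might be (an invented example, not taken from an actual failure): reinterpreting part of a vector local, which the JIT's local-address analysis may turn into field-style (`LCL_FLD`) accesses of that local rather than whole-vector uses.

```csharp
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;

static class OddLooking
{
    static long ReadLowHalf(Vector128<int> a, Vector128<int> b)
    {
        // 'mask' is defined by a comparison (a convertible mask def), but the
        // reinterpreting read below is a partial, differently-typed access of
        // the local, so it should disqualify 'mask' from the mask rewrite.
        Vector128<int> mask = Vector128.GreaterThan(a, b);
        return Unsafe.As<Vector128<int>, long>(ref mask);
    }
}
```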

@tannergooding
Member

16 byte structs on ARM64 are returned in two registers, not via return buffer.

For some classifications, not all of them. Multiple factors influence this, including total size and whether the struct is all integer, all floating-point, all SIMD, or a mix of data types.

The classifications are effectively Composite Type - Known Size, Composite Type - Unknown Size, HFA, and HVA. It is only a Composite Type - Known Size of no more than 16 bytes that may get passed/returned in two general-purpose registers. An HFA/HVA may instead be passed in 1-4 floating-point/SIMD registers, while all other structs are effectively passed via a return buffer. Similarly, other special considerations (such as copy constructors in C++) may change the classification.
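
A hedged C# illustration of those buckets (struct names invented; the comments restate the AAPCS64 classification described above, and, per the earlier comment, the runtime's handling of HVA returns may not fully match this today):

```csharp
using System.Runtime.Intrinsics;

// Composite Type - Known Size, no more than 16 bytes: may be passed/returned
// in up to two general-purpose registers.
struct TwoLongs { public long A, B; }

// HFA: up to four fields of the same floating-point type, so it may be
// passed/returned in floating-point registers instead.
struct Hfa { public float X, Y, Z, W; }

// HVA: up to four fields of the same short-vector type, so it may be
// passed/returned in SIMD registers.
struct Hva { public Vector128<int> V0, V1; }

// Composite type larger than 16 bytes that is neither an HFA nor an HVA:
// effectively passed/returned through a buffer supplied by the caller.
struct Big { public long A, B, C; }
```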

But regardless, my point was that I imagine this general issue should have been reproducible with SVE; it is not unique to x64.

@jakobbotsch
Member Author

But regardless, my point was that I imagine this general issue should've been reproducible on SVE, it is not unique to x64

If we had SIMD types larger than TYP_SIMD16 on arm64, then yes, it would presumably be reproducible. But as things stand, since we only ever end up with TYP_SIMD16, it is never passed via a return buffer, so we never see this there.

Member

@kunalspathak kunalspathak left a comment

LGTM

@amanasifkhalid
Member

/azp run Antigen


Azure Pipelines successfully started running 1 pipeline(s).

@jakobbotsch
Member Author

/ba-g Helix work item was dead-lettered

@jakobbotsch jakobbotsch merged commit 05fa881 into dotnet:main Dec 3, 2024
110 of 122 checks passed
@jakobbotsch jakobbotsch deleted the fix-110306 branch December 3, 2024 08:42
eduardo-vp pushed a commit to eduardo-vp/runtime that referenced this pull request Dec 5, 2024
JIT: Avoid mask<->vector optimization for masks used in unhandled ways (dotnet#110307)

mikelle-rogers pushed a commit to mikelle-rogers/runtime that referenced this pull request Dec 10, 2024
JIT: Avoid mask<->vector optimization for masks used in unhandled ways (dotnet#110307)

Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Development

Successfully merging this pull request may close these issues.

JIT: Assertion failed 'newLclValue.BothDefined()' during 'Do value numbering'
5 participants