Lowering subset of Vector512 methods for avx512. #82953

DeepakRajendrakumaran · 2023-03-03T18:54:47Z

This PR addresses issue #80814. It currently lowers the following methods:

Vector512.Load()
Vector512.LoadUnsafe()
Vector512.LoadAligned()
Vector512.LoadAlignedNonTemporal()

Vector512.Store()
Vector512.StoreUnsafe()
Vector512.StoreAligned()
Vector512.StoreAlignedNonTemporal()
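For readers unfamiliar with the aligned/unaligned split in the list above, here is a minimal portable sketch of the distinction. It is not the JIT's code or the .NET implementation: `Vec512`, `is_aligned_64`, `load_unaligned`, and `store_unaligned` are hypothetical names, and the sketch assumes that the aligned variants expect an address that is a multiple of the 64-byte vector size.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 512-bit vector: 64 raw bytes.
struct Vec512 {
    std::uint8_t bytes[64];
};

// True when the address is a multiple of 64 bytes (assumed requirement
// for the aligned load/store variants of a 512-bit vector).
inline bool is_aligned_64(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 64 == 0;
}

// Unaligned load: copies 64 bytes from any readable address.
inline Vec512 load_unaligned(const void* src) {
    Vec512 v;
    std::memcpy(v.bytes, src, sizeof v.bytes);
    return v;
}

// Unaligned store: copies 64 bytes to any writable address.
inline void store_unaligned(void* dst, const Vec512& v) {
    std::memcpy(dst, v.bytes, sizeof v.bytes);
}
```

A caller could check `is_aligned_64(ptr)` before choosing an aligned path and fall back to the unaligned helpers otherwise.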

ghost · 2023-03-03T18:55:01Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	DeepakRajendrakumaran
Assignees:	-
Labels:	`area-CodeGen-coreclr`, `community-contribution`
Milestone:	-

BruceForstall · 2023-03-06T19:04:08Z

@DeepakRajendrakumaran Does this address a specific GitHub issue (work item) that is listed on #77034?

DeepakRajendrakumaran · 2023-03-06T19:09:39Z

> @DeepakRajendrakumaran Does this address a specific GitHub issue (work item) that is listed on #77034?

This PR addresses part of this - #80814

I plan to add a few more methods here, including the store variants, so there will be further commits this week. I'll move this PR from draft to live once it's ready for review.

src/coreclr/jit/emitxarch.cpp

tannergooding · 2023-03-07T19:14:32Z

Looks to be a small TP regression, likely due to the more case statements and that changing how the jump tables or other dispatch can work.

It's possible some of these would be better handled via flags or something similar long term, but I don't think that's something we strictly have to handle as part of this PR.

If the regression of up to +0.03% is a concern, that might be better done as a separate PR first, with this PR taken afterwards.

src/coreclr/jit/emitxarch.cpp

src/coreclr/jit/hwintrinsiclistxarch.h

tannergooding · 2023-03-08T14:30:45Z

CC. @dotnet/jit-contrib, @dotnet/avx512-contrib for secondary review and merging

BruceForstall · 2023-03-08T21:53:52Z

> Looks to be a small TP regression, likely due to the more case statements and that changing how the jump tables or other dispatch can work.

Maybe because all the if (size == 64) checks come first and are always false / "not taken"?

DeepakRajendrakumaran · 2023-03-08T21:56:29Z

> > Looks to be a small TP regression, likely due to the more case statements and that changing how the jump tables or other dispatch can work.
>
> Maybe because all the if (size == 64) checks come first and are always false / "not taken"?

I can switch this around and check again. I'll have another set of methods to lower pretty soon and this should be easy to check
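To make the branch-order point concrete, here is a minimal sketch of the two orderings being discussed. It is illustrative only and does not reproduce the code in emitxarch.cpp: `LoadKind`, `pick_wide_first`, and `pick_common_first` are hypothetical names, and the sizes are assumed to be vector widths in bytes (16, 32, 64).

```cpp
// Hypothetical encoding choices; not the JIT's real enum.
enum class LoadKind { Vex128, Vex256, Evex512 };

// Ordering A: the 64-byte check comes first. On hardware without
// AVX-512 this comparison is always false, so it is extra work on
// every call before the common cases are reached.
inline LoadKind pick_wide_first(unsigned size) {
    if (size == 64) return LoadKind::Evex512;
    if (size == 32) return LoadKind::Vex256;
    return LoadKind::Vex128;
}

// Ordering B: the condition is flipped so the narrower, more common
// sizes are resolved first and the 64-byte case is the fall-through.
inline LoadKind pick_common_first(unsigned size) {
    if (size != 64) {
        return (size == 32) ? LoadKind::Vex256 : LoadKind::Vex128;
    }
    return LoadKind::Evex512;
}
```

Both functions return the same result for every size; only the order of the comparisons differs. An optimizing compiler may reorder such branches on its own, so any measured difference would need to be confirmed rather than assumed.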

BruceForstall · 2023-03-08T22:05:33Z

Note that our SPMI throughput measurements are based on collections taken on the Helix test machines. I think none of these machines are AVX-512 capable, so they presumably hit the AVX2 paths. That seems like the case to optimize for in general, based on the expected customer installed base.

Also note that TP measurements are done using a native JIT built without PGO. Theoretically, if all the code paths were hit in the training scenarios (unlikely), the native compiler would rearrange the order of branches.

Final note: the TP measurements are instruction counts, not cycle counts. It's as close a proxy to time-based throughput as we can reliably get.

DeepakRajendrakumaran · 2023-03-09T00:11:52Z

> Note that our SPMI throughput measurements are based on collections taken on the Helix test machines. I think none of these machines are AVX-512 capable, so they presumably hit the AVX2 paths. That seems like the case to optimize for in general, based on the expected customer installed base.
>
> Also note that TP measurements are done using a native JIT built without PGO. Theoretically, if all the code paths were hit in the training scenarios (unlikely), the native compiler would rearrange the order of branches.
>
> Final note: the TP measurements are instruction counts, not cycle counts. It's as close a proxy to time-based throughput as we can reliably get.

I flipped the condition. For eg.

ghost added the `community-contribution` label (Indicates that the PR has been added by a community member) on Mar 3, 2023

dotnet-issue-labeler bot added the `area-CodeGen-coreclr` label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) on Mar 3, 2023

DeepakRajendrakumaran force-pushed the LoadLoweringEvex branch from c81ec4a to d084305 on March 6, 2023 17:15

Load(), LoadUnsafe(), LoadAligned(), LoadAlignedNonTemporal()

d084305

DeepakRajendrakumaran changed the title from ~~Lowering Vector512.Load* for avx512~~ to "Lowering subset of Vector512 methods for avx512." on Mar 6, 2023