AVX-512 throughput improvement opportunities #83946

Open
Tracked by #77034
BruceForstall opened this issue Mar 26, 2023 · 5 comments
Labels
area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
avx512: Related to the AVX-512 architecture
Comments

@BruceForstall
Member

The PR to enable EVEX support by default introduced some JIT throughput regressions. The comments in that PR analyzed the cause of these regressions and identified possible follow-up investigations and improvements.

This issue tracks recovering some of the TP regressions by investigating the proposed improvements or mitigations.

For example, LSRA has a number of places with the following loop structure:

for (regNumber reg = REG_FIRST; reg < AVAILABLE_REG_COUNT; reg = REG_NEXT(reg))

and with AVX-512 available, there are an additional 16 SIMD registers and 8 opmask (k) registers, so each of these loops can run up to 24 more iterations.
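
To make the cost concrete, here is a minimal sketch of one common mitigation: walk only the set bits of a candidate register mask instead of every register index. This is not the JIT's code and not necessarily what the follow-up work does. `RegMask`, `kRegCount`, `visitAllRegs`, `visitSetRegs`, and `lowestSetBitIndex` are hypothetical names, and the count of 56 is only illustrative (16 general-purpose + 32 SIMD + 8 opmask registers on x64 with AVX-512).

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; these are not the JIT's real types.
using RegMask = std::uint64_t;
constexpr int kRegCount = 56; // illustrative: 16 GPR + 32 SIMD + 8 opmask

// Baseline shape: visit every register index and test membership.
// The iteration count grows with the size of the register file.
std::vector<int> visitAllRegs(RegMask candidates, int& iterations)
{
    std::vector<int> visited;
    iterations = 0;
    for (int reg = 0; reg < kRegCount; ++reg)
    {
        ++iterations;
        if ((candidates & (RegMask{1} << reg)) != 0)
        {
            visited.push_back(reg);
        }
    }
    return visited;
}

// Portable helper: index of the lowest set bit (mask must be non-zero).
// A production version would use a hardware bit-scan instead of this loop.
int lowestSetBitIndex(RegMask mask)
{
    int index = 0;
    while ((mask & 1) == 0)
    {
        mask >>= 1;
        ++index;
    }
    return index;
}

// Alternative shape: one outer iteration per register actually in the mask.
std::vector<int> visitSetRegs(RegMask candidates, int& iterations)
{
    std::vector<int> visited;
    iterations = 0;
    while (candidates != 0)
    {
        ++iterations;
        visited.push_back(lowestSetBitIndex(candidates));
        candidates &= candidates - 1; // clear the lowest set bit
    }
    return visited;
}
```

Both functions return the same registers in ascending order. The difference is that the outer loop of the second runs once per candidate register, so enlarging the register file does not by itself add iterations.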

@BruceForstall BruceForstall added the area-CodeGen-coreclr and avx512 labels Mar 26, 2023
@BruceForstall BruceForstall added this to the 8.0.0 milestone Mar 26, 2023
@ghost

ghost commented Mar 26, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.


@BruceForstall
Member Author

Link: #83648

@kunalspathak
Member

Related #83109

@JulieLeeMSFT
Member

Assigning to @kunalspathak. Please feel free to reassign.

@kunalspathak
Member

> possible follow-up investigations and improvements

The LSRA TP improvements mentioned in #83648 (comment) and #83648 (comment) target the for loop over registers and are being done in #85744. Other TP improvements need to happen elsewhere, for example in impImportBlockCode(), and I am not sure those will happen in .NET 8. Once #85744 is merged, I will move this to Future.

@kunalspathak kunalspathak modified the milestones: 8.0.0, Future May 6, 2023
@kunalspathak kunalspathak removed their assignment May 6, 2023