Skip to content

Adding additional 8 eGPR. #113988

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Apr 30, 2025
Merged

Conversation

DeepakRajendrakumaran
Copy link
Contributor

@DeepakRajendrakumaran DeepakRajendrakumaran commented Mar 27, 2025

Overview

This PR does the following 2 things

  1. Adds an additional 8 eGPR taking the total number of x86 GPR to 32 and total registers to 72
  2. Adds a way to skip unused registers if iterating through RegRecords.
    on x64 without APX for instance, REG16-REG_31 are unused. We iterate through this processing them currently. This allows us to just skip these

TPDIFF impact

From CI for this PR after last few PRs to improve tpdiff(See here for numbers before)

windows x64

Overall (+1.67% to +2.54%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +2.07%
benchmarks.run.windows.x64.checked.mch +1.67%
benchmarks.run_pgo.windows.x64.checked.mch +1.83%
benchmarks.run_pgo_optrepeat.windows.x64.checked.mch +1.67%
coreclr_tests.run.windows.x64.checked.mch +2.54%
libraries.crossgen2.windows.x64.checked.mch +2.18%
libraries.pmi.windows.x64.checked.mch +1.82%
libraries_tests.run.windows.x64.Release.mch +2.20%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +1.72%
realworld.run.windows.x64.checked.mch +1.71%
smoke_tests.nativeaot.windows.x64.checked.mch +1.75%
MinOpts (+3.14% to +5.71%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +4.34%
benchmarks.run.windows.x64.checked.mch +4.60%
benchmarks.run_pgo.windows.x64.checked.mch +4.28%
benchmarks.run_pgo_optrepeat.windows.x64.checked.mch +4.57%
coreclr_tests.run.windows.x64.checked.mch +3.49%
libraries.crossgen2.windows.x64.checked.mch +4.13%
libraries.pmi.windows.x64.checked.mch +3.14%
libraries_tests.run.windows.x64.Release.mch +4.40%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +3.71%
realworld.run.windows.x64.checked.mch +5.71%
smoke_tests.nativeaot.windows.x64.checked.mch +5.01%
FullOpts (+1.63% to +2.18%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +1.68%
benchmarks.run.windows.x64.checked.mch +1.67%
benchmarks.run_pgo.windows.x64.checked.mch +1.66%
benchmarks.run_pgo_optrepeat.windows.x64.checked.mch +1.67%
coreclr_tests.run.windows.x64.checked.mch +1.84%
libraries.crossgen2.windows.x64.checked.mch +2.18%
libraries.pmi.windows.x64.checked.mch +1.82%
libraries_tests.run.windows.x64.Release.mch +1.63%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +1.68%
realworld.run.windows.x64.checked.mch +1.70%
smoke_tests.nativeaot.windows.x64.checked.mch +1.75%

Alternate test numbers from running superpmi directly and using the 'Total Time':

% Regression = ((Diff Mean - Base Mean)/ Base Mean) * 100)
image

This reverts commit cc6d454.
@DeepakRajendrakumaran DeepakRajendrakumaran changed the title Draft : Reduce TP regression in allocateRegistersMinimal() Adding additional 8 eGPR. Apr 21, 2025
@DeepakRajendrakumaran DeepakRajendrakumaran marked this pull request as ready for review April 21, 2025 14:47
@Copilot Copilot AI review requested due to automatic review settings April 21, 2025 14:47
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot wasn't able to review any files in this pull request.

Files not reviewed (11)
  • src/coreclr/jit/compiler.cpp: Language not supported
  • src/coreclr/jit/emit.h: Language not supported
  • src/coreclr/jit/emitxarch.cpp: Language not supported
  • src/coreclr/jit/emitxarch.h: Language not supported
  • src/coreclr/jit/lsra.cpp: Language not supported
  • src/coreclr/jit/lsra.h: Language not supported
  • src/coreclr/jit/lsrabuild.cpp: Language not supported
  • src/coreclr/jit/register.h: Language not supported
  • src/coreclr/jit/target.h: Language not supported
  • src/coreclr/jit/targetamd64.h: Language not supported
  • src/coreclr/jit/unwindamd64.cpp: Language not supported

@DeepakRajendrakumaran
Copy link
Contributor Author

@BruceForstall @jakobbotsch @kunalspathak This PR is ready for review

@BruceForstall
Copy link
Contributor

Diffs

@kunalspathak
Copy link
Member

Diffs

@DeepakRajendrakumaran - can you double check why TP of arm64 got affected?

image

@DeepakRajendrakumaran
Copy link
Contributor Author

DeepakRajendrakumaran commented Apr 21, 2025

Diffs

@DeepakRajendrakumaran - can you double check why TP of arm64 got affected?

image

This commit affects all targets but should be negligible for non x64 - commit

This is to skip over eGPR for non APX x64 machines when iterating over RegisterRecord. This can be extended to arm64 if there are any registers needed to be skipped over with/without features for arm64. Will need to add something similar - Change for x86

This reverts commit 963c341.
@BruceForstall BruceForstall added the apx Related to the Intel Advanced Performance Extensions (APX) label Apr 28, 2025
@kunalspathak
Copy link
Member

/azp run runtime-coreclr jitstressregs, runtime-coreclr libraries-jitstressregs

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@kunalspathak
Copy link
Member

Just to note down latest TP numbers:

image

@kunalspathak
Copy link
Member

@DeepakRajendrakumaran - I know you have done lot of work to improve TP. I am wondering did you perform analysis of the remaining regressions and can share where the cost is coming from?

@kunalspathak
Copy link
Member

/azp run runtime-coreclr libraries-jitstress2-jitstressregs

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@DeepakRajendrakumaran
Copy link
Contributor Author

DeepakRajendrakumaran commented Apr 30, 2025

@DeepakRajendrakumaran - I know you have done lot of work to improve TP. I am wondering did you perform analysis of the remaining regressions and can share where the cost is coming from?

I ran this for libraries_tests.run as a sample

The tpdiff for libraries_tests.run is as follows locally

Overall (+2.12%)
Collection PDIFF
libraries_tests.run.windows.x64.Release.mch +2.12%
MinOpts (+4.46%)
Collection PDIFF
libraries_tests.run.windows.x64.Release.mch +4.46%
FullOpts (+1.58%)
Collection PDIFF
libraries_tests.run.windows.x64.Release.mch +1.58%

image

I did try some changes with separating out registers to free in allocateRegisters() and allocateRegistersMinimal(). Handling delay registers made it a little tricky. It could be something someone could circle back to later.

@kunalspathak
Copy link
Member

Failures are timeout and #114638

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@kunalspathak kunalspathak merged commit 3c34fc1 into dotnet:main Apr 30, 2025
135 of 145 checks passed
@BruceForstall
Copy link
Contributor

@DeepakRajendrakumaran Thanks for the hard work and persistence to get this in.

@DeepakRajendrakumaran
Copy link
Contributor Author

@DeepakRajendrakumaran Thanks for the hard work and persistence to get this in.

Thanks to you, @kunalspathak and others involved in the discussions for helping me make the necessary tweaks to get this in. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
apx Related to the Intel Advanced Performance Extensions (APX) area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants