llvm/cuda: Enable function inlining in CUDA path #2338

Merged: 1 commit, Mar 11, 2022

Conversation

jvesely
Collaborator

@jvesely jvesely commented Mar 11, 2022

The inlining threshold was determined empirically using the predator-prey model.
Static function call counts at each threshold:
model | before | inline0 | inline4 | inline6 | inline7 | inline8
p-p   | 287    | 195     | 162     | 123     | 121     | 125
s-f   | 413    | 221     | 206     | 156     | 155     | 164
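As a rough cross-check of the table above, the following sketch recomputes which column leaves the fewest static calls and how large the reduction is relative to `before`. It assumes each `inlineN` column corresponds to an inlining threshold of N; the names `CALLS` and `best_threshold` are hypothetical and not part of the patch, and the PR text itself does not state which threshold the commit finally uses.

```python
# Static call counts copied from the table above.
# Assumption: the integer keys are the thresholds named by the "inlineN" columns.
CALLS = {
    "p-p": {"before": 287, 0: 195, 4: 162, 6: 123, 7: 121, 8: 125},
    "s-f": {"before": 413, 0: 221, 4: 206, 6: 156, 7: 155, 8: 164},
}


def best_threshold(counts):
    """Return (threshold, fractional reduction vs. 'before') with the fewest calls."""
    thresholds = {k: v for k, v in counts.items() if k != "before"}
    best = min(thresholds, key=thresholds.get)
    return best, 1 - thresholds[best] / counts["before"]


for model, counts in CALLS.items():
    threshold, reduction = best_threshold(counts)
    print(f"{model}: inline{threshold} leaves {counts[threshold]} calls "
          f"({reduction:.1%} fewer than before)")
```

For both models the minimum falls in the same column, and the count rises again at `inline8`, which is consistent with the threshold having been tuned by measurement rather than simply maximized.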

This significantly reduces GPU stalls caused by instruction fetch (measured on P620):
(pp-MT/pp-Philox): 10.99%/18.21% -> 5.06%/8.46%
(sf-MT/sf-Philox): 36.72%/38.96% -> 19.1%/22.2%
It also reduces the amount of private data read/written.
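To put the instruction-fetch stall figures on a common scale, this small sketch converts each before/after pair into a relative reduction. The numbers are taken directly from the measurements quoted above; `STALLS` and `relative_drop` are illustrative names only.

```python
# Instruction-fetch stall percentages (before, after) as quoted above.
STALLS = {
    "pp-MT": (10.99, 5.06),
    "pp-Philox": (18.21, 8.46),
    "sf-MT": (36.72, 19.1),
    "sf-Philox": (38.96, 22.2),
}


def relative_drop(before, after):
    """Fraction by which the stall share shrank relative to its starting value."""
    return (before - after) / before


for name, (before, after) in STALLS.items():
    print(f"{name}: {before}% -> {after}% "
          f"({relative_drop(before, after):.1%} relative reduction)")
```

In every configuration the stall share drops by more than 40% of its original value, with the pp variants improving somewhat more than the sf variants.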

Combined, these changes give a ~15-20% (pp) improvement in kernel runtime.

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

@jvesely added the compiler (Runtime Compiler) and CUDA (CUDA target for the runtime compiler) labels Mar 11, 2022
@jvesely added this to In progress in LLVM Runtime Compiler via automation Mar 11, 2022
@github-actions

This PR causes the following changes to the html docs (ubuntu-latest-3.7-x64):

No differences!

...

See CI logs for the full diff.

@jvesely jvesely merged commit 1e390ab into PrincetonUniversity:devel Mar 11, 2022
LLVM Runtime Compiler automation moved this from In progress to Done Mar 11, 2022