
llvm/cuda: Optimize memory copies in memory execution #2328

Merged 4 commits into PrincetonUniversity:devel on Feb 21, 2022

Conversation

jvesely (Collaborator) commented Feb 21, 2022

- Restrict the 'has_initializers' parameter to mechanisms. This leaves most functions with parameters that are modified by parameter ports.
- Don't copy base parameters into the private parameter space if all of them will be replaced by parameter port outputs.
- Use the mechanism base parameter structure in places that don't modify parameters via parameter ports (e.g., running input/output ports). Unlike the modified result, which is private per evaluation, the base structure is shared.

This improves cache performance and memory utilization on both CPUs and GPUs.
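A minimal C sketch of the second point, under invented names (`MechParams`, `gain`, `bias` are illustrative, not taken from the codebase): when every field will be overwritten by parameter port outputs, the up-front copy of the base structure is pure overhead and can be skipped.

```c
/* Hypothetical parameter structure; field names are illustrative only. */
typedef struct {
    double gain;
    double bias;
} MechParams;

/* Old scheme: unconditionally copy base params into private storage,
 * then overwrite every field with parameter port outputs. */
static void eval_old(const MechParams *base, MechParams *priv,
                     double port_gain, double port_bias) {
    *priv = *base;          /* this copy is redundant here ...        */
    priv->gain = port_gain; /* ... because every field is replaced    */
    priv->bias = port_bias;
}

/* New scheme: when all fields are replaced, skip the base copy;
 * read-only consumers reference the shared base structure directly. */
static void eval_new(MechParams *priv,
                     double port_gain, double port_bias) {
    priv->gain = port_gain;
    priv->bias = port_bias;
}
```

Both paths yield identical private parameters; the new one simply avoids reading the base structure at all when nothing from it survives.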

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
…echanism

In reality, we only use it in RTM.
Reduces the space needed for read-only parameters:
predator-prey: 7.73kB -> 5.96kB
stability-flexibility: 8.84kB -> 5.75kB

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
…rwritten

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
… parameters

The original approach created a copy of the mechanism parameters to modulate them
(if needed to apply mechanism parameter ports) and then passed this copy to
internal function invocations.
Invoking internal functions would then create more copies (if needed)
to apply the parameter ports of function parameters.

The new approach passes the mechanism base parameters instead,
so the copies of function parameters can be made from the original.

The overall amount of copied data is the same,
but the same shared source is now used for all copies.

This is especially beneficial for GPUs, which place the shared parameters
in high-bandwidth on-chip memories.
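The two copy chains can be sketched in C (a hedged illustration; the structure layout and names are invented, since the real parameter structures are generated LLVM IR types): both schemes copy the same number of bytes, but the new one always reads from the shared base structure, which stays resident in cache or on-chip memory instead of being re-read from each thread's private copy.

```c
#include <string.h>

enum { NFUNC = 4, NMECH = 4 };

/* Invented layout: mechanism params embed per-function params. */
typedef struct {
    double func_params[NFUNC];
    double mech_params[NMECH];
} BaseParams;

/* Old approach: first copy the whole mechanism structure into private
 * memory, then derive function-parameter copies from that private copy,
 * so the second copy reads thread-private memory. */
static void run_old(const BaseParams *base, BaseParams *mech_copy,
                    double func_copy[NFUNC]) {
    memcpy(mech_copy, base, sizeof *mech_copy);
    memcpy(func_copy, mech_copy->func_params, NFUNC * sizeof(double));
}

/* New approach: function-parameter copies are made directly from the
 * shared base, so every copy reads the same shared (cached) source. */
static void run_new(const BaseParams *base, double func_copy[NFUNC]) {
    memcpy(func_copy, base->func_params, NFUNC * sizeof(double));
}
```

The results are identical; only the source of the reads changes, which is what drives the reduction in private-memory traffic reported below.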

The observed effect for the stability-flexibility model is a ~20% reduction
in the total amount of data read from thread-private memory,
resulting in a ~10% improvement in kernel execution time.
Measured on a P620 GPU.

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
@jvesely jvesely added the labels compiler (Runtime Compiler) and CUDA (CUDA target for the runtime compiler) on Feb 21, 2022
@jvesely jvesely added this to In progress in LLVM Runtime Compiler via automation Feb 21, 2022
@github-actions commented
This PR causes the following changes to the html docs (ubuntu-latest-3.7-x64):

No differences!

...

See CI logs for the full diff.

@jvesely jvesely merged commit 31c15ce into PrincetonUniversity:devel Feb 21, 2022
LLVM Runtime Compiler automation moved this from In progress to Done Feb 21, 2022