RRTMGP data transfer optimization #24

Merged: 1 commit, Jul 31, 2024

supreethms1809:

Without data directives in the RRTMGP-CAM interface, the variables on the GPU were created and destroyed multiple times, which caused a performance penalty. This commit adds structured data regions that open above and close below the increment, rte_sw, and rte_lw calls. With the data directives in place, the nsys profile shows only a single data transfer at the beginning, followed by the kernel executions.
SW: 414.055 ms → 77.66 ms
LW: 443.532 ms → 200.637 ms
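
A minimal sketch of the pattern, written in C with OpenACC under the assumption that it mirrors the Fortran interface (which would use the equivalent `!$acc data ... !$acc end data` directives). `increment`, `solve`, and `radiation_step` are hypothetical stand-ins for the increment and rte_sw/rte_lw work, and the arithmetic is a placeholder, not RRTMGP physics. The point is that one structured data region encloses both calls, so the arrays are copied to the device once and stay resident, while the callees declare them `present` instead of issuing their own copies.

```c
/* Hypothetical stand-in for the increment step: the arrays must already
   be on the device, so no per-call transfer is issued here. */
static void increment(double *tau, const double *dtau, int n) {
    #pragma acc parallel loop present(tau[0:n], dtau[0:n])
    for (int i = 0; i < n; ++i)
        tau[i] += dtau[i];
}

/* Hypothetical stand-in for an rte_sw/rte_lw-style solver call.
   The formula is a placeholder only. */
static void solve(double *flux, const double *tau, int n) {
    #pragma acc parallel loop present(flux[0:n], tau[0:n])
    for (int i = 0; i < n; ++i)
        flux[i] = 1.0 / (1.0 + tau[i]);
}

void radiation_step(double *tau, const double *dtau, double *flux, int n) {
    /* One structured data region around both calls: tau and dtau are
       copied in once on entry, tau and flux are copied out once on exit,
       and the device copies stay alive for every kernel in between. */
    #pragma acc data copy(tau[0:n]) copyin(dtau[0:n]) copyout(flux[0:n])
    {
        increment(tau, dtau, n);
        solve(flux, tau, n);
    }
}
```

Compiled without OpenACC support, the pragmas are ignored and the code runs on the host with the same results. Under OpenACC, the `present` clauses cause a run-time error if a kernel is reached outside the enclosing data region, which exposes accidental per-call transfers instead of silently reintroducing them.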

supreethms1809 requested a review from gdicker1 on July 30, 2024 02:19.

gdicker1 left a comment:

Looks good to me. Thanks for all the work on this @supreethms1809!

I also appreciate the timing info you shared here.

gdicker1 merged commit 2aee1e3 into ew-develop on Jul 31, 2024 (1 check passed).

gdicker1 added a commit that referenced this pull request on Aug 1, 2024:
This is a follow-up to PR #24. It further reduces data movement in
the rte_sw() and rte_lw() subroutines. It also eliminates data movement
within the calls to gas_optics() by moving the data transfers outside of
the subroutine call tree.
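
A sketch of the follow-up idea, again in C with OpenACC and with hypothetical names (`band_tau`, `gas_optics_sketch`, `compute_optics`) and a placeholder formula rather than real gas optics. Because the leaf routine is called once per band, a copy clause inside it would move data on every call; here the single data region sits at the top of the call tree, and the leaf only declares the arrays `present`.

```c
/* Hypothetical leaf routine deep in a gas_optics()-style call tree.
   It writes one band's slice of tau (laid out as [nband][ncol]) and
   only asserts that the arrays are already on the device. */
static void band_tau(const double *pres, double *tau, int ncol, int ib) {
    #pragma acc parallel loop present(pres[0:ncol], tau[ib*ncol:ncol])
    for (int i = 0; i < ncol; ++i)
        tau[ib * ncol + i] = pres[i] * (double)(ib + 1); /* placeholder */
}

/* Hypothetical middle layer: calls the leaf once per band, with no data
   clauses of its own. */
static void gas_optics_sketch(const double *pres, double *tau,
                              int ncol, int nband) {
    for (int ib = 0; ib < nband; ++ib)
        band_tau(pres, tau, ncol, ib);
}

/* Top of the call tree: the only place data is moved. pres goes to the
   device once and all nband slices of tau come back once, regardless of
   how many times the leaf routine runs. */
void compute_optics(const double *pres, double *tau, int ncol, int nband) {
    #pragma acc data copyin(pres[0:ncol]) copyout(tau[0:ncol*nband])
    {
        gas_optics_sketch(pres, tau, ncol, nband);
    }
}
```

The design choice is that ownership of device data lives with the outermost caller, so inner routines stay free of transfers and can be called in loops without adding traffic to the profile.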