Improve the performance of buoyancy_gradients #2530
Comments
Update on this: the performance of

The NVTX trace, showing the relative performance hit compared to other high level kernels:

And the flame graph, showing what's being called with more granularity:

So, the

I think, to start with, we should simply break up

We may need to apply some RootSolvers/Thermodynamics optimizations (e.g., CliMA/RootSolvers.jl#51) to reduce the arithmetic intensity of those kernels (they are very expensive). Also, next, I'll add a GPU job and see what the performance of this kernel looks like on the GPU, since that's probably a more important target to optimize.

cc @szy21, @trontrytel, @tapios
That's interesting, thanks! I reached the conclusion that
Sounds good. If there is anything easy to optimize in Thermodynamics, I think we should start with it. We are calling each function 9 times when running with quadrature points. So even small improvements should show up.
Here are some updates (cc @szy21):
Could you post the time-to-solution for the job with and without cloud diagnostics on GPU? Other than that I'm OK with closing this issue. Thanks for all the work!
Yep, from this PR (with 1 P100 GPU):
So, it's actually not bad.
And how about the one without cloud diagnostics on GPU?
Good question, I'll convert the other one, too, in that PR so that we can compare.
Without cloud diagnostics, we have:
cc @szy21
Great, thanks!
@tapios mentioned that this is still an issue, so I'm reopening.
We should get updated numbers.
For cloud diagnostics, the issue is not directly buoyancy gradients, but gradients of moisture/enthalpy. This may be related to the buoyancy gradient issue though. @szy21 knows more.
The latest build has 2.03 SYPD for Held-Suarez and 1.47 SYPD for Held-Suarez with cloud diagnostics per stage, so ~30% difference (assuming they are using the same GPU on central). I don't know whether the slowdown is mostly from
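To make the "~30% difference" concrete, here is a minimal arithmetic sketch using only the two SYPD figures quoted above. The function names are made up for illustration; the point is that a given drop in throughput corresponds to a larger relative increase in wall-clock time.

```python
# SYPD = simulated years per wall-clock day; both figures are quoted in the comment above.
sypd_baseline = 2.03   # Held-Suarez
sypd_with_diag = 1.47  # Held-Suarez with cloud diagnostics per stage


def throughput_drop(baseline, slower):
    """Fractional loss in SYPD relative to the baseline run."""
    return 1.0 - slower / baseline


def walltime_increase(baseline, slower):
    """Fractional increase in wall-clock time per simulated year."""
    return baseline / slower - 1.0


drop = throughput_drop(sypd_baseline, sypd_with_diag)
extra = walltime_increase(sypd_baseline, sypd_with_diag)
print(f"throughput drop:   {drop:.1%}")
print(f"walltime increase: {extra:.1%}")
```

With these inputs the throughput drop is roughly 28%, while the wall-clock time per simulated year grows by roughly 38%, so which number to quote depends on whether the baseline is throughput or time-to-solution.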
The buoyancy gradients computation itself now takes only 547 μs (xref: #2951 (comment)). Closing.
#2456 makes all the dycore simulations slower by a factor of two. When looking at it more closely, it seems most of the slowdown is from buoyancy_gradients. As a temporary solution, #2466 moves the mixing length calculation, and thus the buoyancy gradients calculation, to a callback for runs without EDMF. With EDMF, this is still called at each timestep. It would be good if we could improve the performance of buoyancy_gradients. cc @charleskawczynski
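The callback workaround described above can be pictured with a small generic sketch. It is not ClimaAtmos code: `run`, `cheap_step`, `expensive_diagnostic`, and `diag_every` are hypothetical names, and the assumption is simply that the expensive quantity can be recomputed on a coarser cadence and reused in between.

```python
# Hypothetical sketch; none of these names come from ClimaAtmos.
def run(n_steps, diag_every, cheap_step, expensive_diagnostic):
    """Advance n_steps, recomputing the expensive diagnostic only every diag_every steps."""
    state = 0.0
    cached = None  # last value returned by the expensive diagnostic
    calls = 0      # how many times the expensive diagnostic actually ran
    for step in range(n_steps):
        if step % diag_every == 0:
            cached = expensive_diagnostic(state)
            calls += 1
        state = cheap_step(state, cached)
    return state, calls


# Per-timestep evaluation versus a callback every 10 steps.
_, calls_every_step = run(100, 1, lambda s, c: s + 1.0, lambda s: 2.0 * s)
_, calls_callback = run(100, 10, lambda s, c: s + 1.0, lambda s: 2.0 * s)
print(calls_every_step, calls_callback)
```

The trade-off is that the cached value is stale between callbacks, which is presumably why the EDMF path still evaluates it at every timestep.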