This issue was moved to a discussion.
Profiling MPI and benchmarking strong + weak scaling #1451
Comments
Happy to help.
Thanks @ali-ramadhan for doing this. I wonder if we could modify this script and run it on …
Got some helpful replies on the Julia Discourse: https://discourse.julialang.org/t/how-to-profile-julia-mpi-code/57136/4. The leading suggestion, from @simonbyrne, is to try NVIDIA Nsight, which might let us do both GPU profiling and MPI profiling!
Registration for this is still open: https://portal.xsede.org/course-calendar/-/training-user/class/2310/session/3970. It's free and takes place on Thursday. I'm considering attending myself.
Thanks for the heads up, just signed up!
Thanks, and me too!
@simone-silvestri has done a bit of this. @simone-silvestri, feel free to post your results here. I'm converting this to a discussion.
In PR #590 I added a small, quick strong scaling test, and @francispoulin calculated the scaling efficiency, which wasn't super great.
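For reference, strong scaling efficiency is commonly defined as `E(N) = T(1) / (N * T(N))`, where `T(N)` is the wall-clock time on `N` ranks for a fixed total problem size, so `E = 1` means perfect speedup. A minimal Python sketch of that calculation, assuming timings are collected per rank count; the function name is hypothetical and the timings in the usage block are made-up placeholders, not results from PR #590:

```python
def strong_scaling_efficiency(timings):
    """Strong scaling efficiency E(N) = T(1) / (N * T(N)).

    `timings` maps rank count N -> wall-clock time (s) for a FIXED total
    problem size. The single-rank baseline timings[1] is required.
    """
    if 1 not in timings:
        raise ValueError("need the single-rank baseline timings[1]")
    t1 = timings[1]
    return {n: t1 / (n * t) for n, t in sorted(timings.items())}


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    example = {1: 8.0, 2: 4.4, 4: 2.5}
    for n, eff in strong_scaling_efficiency(example).items():
        print(f"{n} ranks: efficiency = {eff:.2f}")
```

With these placeholder numbers the efficiencies come out to 1.00, 0.91 and 0.80 for 1, 2 and 4 ranks.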
To improve performance, I guess we should do some MPI profiling to find the bottlenecks. We could also benchmark the distributed pressure solve and the halo filling separately to see how each of them scales.
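Benchmarking components separately mostly comes down to timing each phase on its own, discarding the first few calls (which can include one-off costs such as compilation), and reporting a robust statistic like the median. A minimal, language-agnostic sketch in Python; `fake_pressure_solve` and `fake_fill_halos` are hypothetical stand-ins, not Oceananigans functions, and in Julia a package such as TimerOutputs.jl offers similar named-section timing:

```python
import statistics
import time
from typing import Callable, Dict, List


def time_phases(phases: Dict[str, Callable[[], None]],
                repeats: int = 10,
                warmup: int = 2) -> Dict[str, float]:
    """Time each named phase separately; return its median wall time (s).

    The first `warmup` calls of each phase are discarded so one-off costs
    (e.g. JIT compilation or cache warm-up) don't pollute the measurement.
    """
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    results: Dict[str, float] = {}
    for name, fn in phases.items():
        for _ in range(warmup):
            fn()
        samples: List[float] = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            fn()
            samples.append(time.perf_counter() - t0)
        results[name] = statistics.median(samples)
    return results


# Hypothetical stand-ins for the real kernels; replace with calls into the model.
def fake_pressure_solve() -> None:
    sum(i * i for i in range(20_000))


def fake_fill_halos() -> None:
    sum(range(5_000))


if __name__ == "__main__":
    timings = time_phases({"pressure_solve": fake_pressure_solve,
                           "fill_halos": fake_fill_halos})
    for name, t in timings.items():
        print(f"{name}: {t * 1e3:.3f} ms (median)")
```

In an MPI run, each phase's time would typically also be reduced across ranks (e.g. taking the maximum), since the slowest rank sets the pace for everyone else.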
It might also make sense to benchmark scaling with `ShallowWaterModel` to see if it's an `IncompressibleModel` issue. We might need a pretty large domain to see good scaling with a 2D shallow water model?

@tomchor pointed out that the benchmark could be flawed. We should make sure everything is compiled before timing. We could also try different sizes and a weak scaling benchmark in case the 1D/slab decomposition isn't helping.
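Two quantities that may help frame the weak scaling and slab decomposition questions. Weak scaling efficiency is commonly defined as `E(N) = T(1) / T(N)`, with the global problem growing with `N` so that each rank keeps the same local size. The halo-to-interior ratio gives a rough sense of communication versus computation per rank: for an `nx × ny × nz` grid split into `P` slabs along x with halo width `H`, it is about `2HP / nx`, so at fixed global size it grows linearly with `P`, while a 2D (pencil) split grows more slowly. This is a sketch under simplifying assumptions (two neighbours per decomposed direction, corners ignored); the function names and grid sizes are hypothetical, and Oceananigans' actual decomposition and halo width may differ:

```python
def weak_scaling_efficiency(timings):
    """Weak scaling efficiency E(N) = T(1) / T(N).

    `timings` maps rank count N -> wall time (s) when the GLOBAL problem
    grows with N so every rank keeps the same local problem size.
    """
    if 1 not in timings:
        raise ValueError("need the single-rank baseline timings[1]")
    t1 = timings[1]
    return {n: t1 / t for n, t in sorted(timings.items())}


def halo_to_interior_ratio(nx, ny, nz, px, py=1, halo=1):
    """Halo points exchanged per rank divided by interior points per rank.

    Assumes an nx x ny x nz grid split into px ranks along x and py ranks
    along y (py = 1 is a 1D/slab decomposition), that each rank swaps a
    halo of width `halo` with two neighbours in every decomposed
    direction, and ignores corner/edge regions.
    """
    if nx % px or ny % py:
        raise ValueError("grid must divide evenly among ranks")
    lx, ly = nx // px, ny // py  # local extents per rank
    interior = lx * ly * nz
    halo_points = 0
    if px > 1:
        halo_points += 2 * halo * ly * nz  # two y-z faces
    if py > 1:
        halo_points += 2 * halo * lx * nz  # two x-z faces
    return halo_points / interior


if __name__ == "__main__":
    # Hypothetical 256^3 grid on 16 ranks with halo width 1.
    slab = halo_to_interior_ratio(256, 256, 256, px=16)
    pencil = halo_to_interior_ratio(256, 256, 256, px=4, py=4)
    print(f"slab:   {slab:.4f}")    # 0.1250
    print(f"pencil: {pencil:.4f}")  # 0.0625
```

Under these assumptions, a weak scaling run with slabs keeps the local x-extent fixed, so the ratio stays constant as ranks are added; a strong scaling run with slabs makes it grow with every added rank.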
Maybe try a different machine too. I'm not sure if there's a "proper" setup for doing these scaling benchmarks.
Bad scaling efficiency might also be a sign of missing barriers/waits?
@vchuravy, we might ask for your help!