Profiling MPI and benchmarking strong + weak scaling #1451

ali-ramadhan · 2021-03-10T17:09:21Z

In PR #590 I added a small/quick strong scaling test and @francispoulin calculated the scaling efficiency which wasn't super great:

np    efficiency
==    ==========
2     0.96
4     0.71
8     0.62
16    0.56
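
For reference, a minimal sketch of how these numbers relate to speedup. It assumes the efficiencies are measured against a single-rank run of the same problem, which the thread does not state; the function names are made up for illustration.

```python
# Efficiencies reported in the table above, keyed by number of MPI ranks.
reported_efficiency = {2: 0.96, 4: 0.71, 8: 0.62, 16: 0.56}


def strong_scaling_efficiency(t_ref, t_n, n, n_ref=1):
    """Efficiency of an n-rank run relative to an n_ref-rank run.

    Both runs solve the same total problem (strong scaling).
    t_ref and t_n are wall-clock times in the same units.
    """
    return (t_ref * n_ref) / (t_n * n)


def implied_speedup(efficiency, n, n_ref=1):
    """Speedup over the n_ref-rank run implied by a given efficiency."""
    return efficiency * n / n_ref


# ASSUMPTION: the baseline is a 1-rank run (n_ref=1); not confirmed above.
speedups = {n: implied_speedup(e, n) for n, e in reported_efficiency.items()}

for n, s in speedups.items():
    print(f"np={n:2d}  efficiency={reported_efficiency[n]:.2f}  speedup={s:.2f}x")
```

Under that assumption, going from 8 to 16 ranks still buys some speedup, just a good deal less than the ideal 2x.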

I guess to improve performance we should do some MPI profiling to find bottlenecks. Could also benchmark the distributed pressure solve and the halo filling separately to see how they scale as well.
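
A rough sketch of what timing the pieces separately could look like. Everything here is hypothetical: the two stub functions stand in for the real halo fill and pressure solve, and `barrier` is a no-op placeholder that you would swap for a real MPI barrier (for example `MPI.COMM_WORLD.Barrier` in mpi4py) when running on several ranks.

```python
import time


def time_section(fn, barrier=lambda: None, repeats=5):
    """Return the best wall-clock time (seconds) of fn() over `repeats` runs."""
    fn()  # warm-up call so one-time setup/compilation cost is not measured
    best = float("inf")
    for _ in range(repeats):
        barrier()  # line all ranks up before starting the clock
        start = time.perf_counter()
        fn()
        barrier()  # wait for the slowest rank before stopping the clock
        best = min(best, time.perf_counter() - start)
    return best


# HYPOTHETICAL stand-ins for the real kernels; replace with actual calls.
def fill_halos_stub():
    sum(range(10_000))


def pressure_solve_stub():
    sorted(range(10_000), reverse=True)


timings = {
    "halo fill": time_section(fill_halos_stub),
    "pressure solve": time_section(pressure_solve_stub),
}

for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s")
```

Bracketing each section with barriers charges load imbalance to the section where it shows up, which makes it easier to see which piece stops scaling first.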

Might also make sense to benchmark scaling with ShallowWaterModel to see if it's an IncompressibleModel issue. Might need a pretty large domain to see good scaling with a 2D shallow water model?

@tomchor pointed out that the benchmark could be flawed. We should make sure everything is compiled. Could also try different sizes and a weak scaling benchmark in case the 1D/slab decomposition isn't helping.
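
To illustrate why problem size and a weak scaling test matter with a slab decomposition, here is a back-of-the-envelope sketch. It assumes the grid is split along y only, that each rank exchanges two halo faces, and a halo width of 1; the grid sizes are illustrative and not taken from the benchmark.

```python
def slab_halo_ratio(nx, ny, nz, ranks, halo=1):
    """Ratio of halo points exchanged to interior points owned, per rank.

    ASSUMPTIONS: 1D (slab) decomposition along y, two neighbouring faces
    per rank, and ny divisible by the number of ranks.
    """
    if ny % ranks != 0:
        raise ValueError("ny must be divisible by the number of ranks")
    local_ny = ny // ranks
    interior_points = nx * local_ny * nz
    halo_points = 2 * halo * nx * nz  # independent of the number of ranks
    return halo_points / interior_points


# Strong scaling: fixed global grid, so each slab gets thinner as ranks grow.
strong = {p: slab_halo_ratio(256, 256, 256, p) for p in (2, 4, 8, 16)}

# Weak scaling: fixed local slab (256 x 64 x 256), global ny grows with ranks.
weak = {p: slab_halo_ratio(256, 64 * p, 256, p) for p in (2, 4, 8, 16)}

for p in strong:
    print(f"ranks={p:2d}  strong={strong[p]:.4f}  weak={weak[p]:.4f}")
```

With slabs the halo volume per rank stays constant while the interior shrinks, so the communication-to-compute ratio grows linearly with the rank count under strong scaling and stays flat under weak scaling.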

Maybe trying on a different machine too. Not sure if there's a "proper" setup for doing these scaling benchmarks.

Bad scaling efficiency might also be a sign of missing barriers/waits?

@vchuravy We might ask for your help!


vchuravy · 2021-03-10T18:23:45Z

> We might ask for your help!

Happy to help.

francispoulin · 2021-03-10T18:27:24Z

Thanks @ali-ramadhan for doing this. I wonder if we could modify this script and run it on ShallowWaterModel to start doing some strong scaling tests for that model?

ali-ramadhan · 2021-03-16T18:47:05Z

Got some helpful replies from Julia Discourse: https://discourse.julialang.org/t/how-to-profile-julia-mpi-code/57136/4

Leading suggestion by @simonbyrne is to try using NVIDIA Nsight which might allow us to do GPU profiling and MPI profiling!

tomchor · 2021-03-16T20:17:11Z

This registration is still open: https://portal.xsede.org/course-calendar/-/training-user/class/2310/session/3970

It's free and it'll happen on Thursday. I'm considering attending myself.

ali-ramadhan · 2021-03-17T00:49:13Z

Thanks for the heads up, just signed up!

francispoulin · 2021-03-17T01:20:54Z

Thanks, and me too!

glwagner · 2023-03-22T16:13:13Z

@simone-silvestri has done a bit of this. @simone-silvestri feel free to post your results here. I'm converting this to a discussion.

ali-ramadhan added the "performance 🏍️" (So we can get the wrong answer even faster) and "distributed 🕸️" (Our plan for total cluster domination) labels Mar 10, 2021

CliMA locked and limited conversation to collaborators Mar 22, 2023

glwagner converted this issue into discussion #3002 Mar 22, 2023


This issue was moved to a discussion.
