Resolving bug with multiple ranks using IBM #990
Conversation
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:

PR Code Suggestions ✨
Latest suggestions up to ae30de5
Previous suggestions: up to commit f074f32
@sbryngelson macOS CI has a problem building cmake, btw.
Codecov Report
@@ Coverage Diff @@
## master #990 +/- ##
==========================================
- Coverage 40.93% 40.92% -0.02%
==========================================
Files 70 70
Lines 20288 20299 +11
Branches 2517 2521 +4
==========================================
+ Hits 8305 8307 +2
- Misses 10447 10454 +7
- Partials 1536 1538 +2
Please fix.
Removed 'cmake' from the list of installed packages on macOS.
This doesn't run with case optimization on CPU/Phoenix for some reason... @anandrdbz please check the logs.
Actually, this case revealed an important corner case: the IBM boundary and the processor boundary match almost exactly. You actually need a larger buffer size than I thought. I kept a limit of 6 = 2*gp_layers for the image point, but this isn't strictly sufficient; in this particular case, it worked after increasing it to 8. There's no deterministic way of calculating the required size, though, since it depends on the IBM geometry. For now I'll push a fix to get this test to pass, but I'll add a print message in the source code that tells the user to increase buff_size if they run into this issue again.
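A minimal sketch of the kind of check described above, assuming a hypothetical routine name, argument list, and index bounds; only buff_size and the image-point idea come from this thread:

```fortran
! Sketch only: routine name, arguments, and bounds are illustrative assumptions.
! The idea: if an image point needs data beyond the halo that MPI fills,
! stop and tell the user to increase buff_size.
subroutine s_check_image_point_in_buffer(ip_idx, n_interior, buff_size)
    implicit none
    integer, intent(in) :: ip_idx(3)       ! cell index of one image point
    integer, intent(in) :: n_interior(3)   ! interior cells per direction on this rank
    integer, intent(in) :: buff_size       ! halo (buffer) width in cells
    integer :: d

    do d = 1, 3
        if (ip_idx(d) < 1 - buff_size .or. ip_idx(d) > n_interior(d) + buff_size) then
            print *, 'IBM image point lies outside the MPI buffer region.'
            print *, 'Increase buff_size (currently ', buff_size, ') and rerun.'
            error stop
        end if
    end do
end subroutine s_check_image_point_in_buffer
```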
hmm... @anandrdbz given that halo exchanges / ghost cell exchanges are relatively cheap except in extreme strong-scaling cases, can we just put a rather generous upper bound on the number of ghost cells? This is a good find, though, and I don't want to lose it in case it comes up again. I'm not sure how you want to add a warning message, but try to place it carefully/mindfully. Presumably the vast majority of cases with an IB will fail for a reason that has nothing to do with this 😄
Yeah, this will fail uniquely, with a print message that's unique too, so it should be clear (you can check out my latest commit). I have kept a generous upper bound of 10 for IBM cases, but there's no way to calculate a strict upper bound, since it's geometry-dependent.
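For illustration, a sketch of the "generous upper bound" choice, assuming gp_layers = 3 (so that 2*gp_layers = 6 as stated above); the non-IBM branch is a placeholder, not the actual sizing logic:

```fortran
program demo_buff_size
    ! Sketch only: illustrates raising the halo width to a generous fixed bound
    ! for IBM runs, since a strict geometry-dependent bound cannot be computed.
    implicit none
    integer, parameter :: gp_layers = 3     ! ghost-point layers (2*gp_layers = 6)
    logical, parameter :: ib = .true.       ! immersed boundaries enabled
    integer :: buff_size

    if (ib) then
        ! 6 cells can be too few when an IB surface nearly coincides with a
        ! processor boundary, so use a generous upper bound of 10 instead.
        buff_size = max(2*gp_layers, 10)
    else
        buff_size = 2*gp_layers             ! placeholder for the non-IBM sizing
    end if

    print *, 'buff_size =', buff_size
end program demo_buff_size
```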
Also, @sbryngelson, the benchmark case with IBM will eventually fail the check in this latest commit due to the increase in buffer size from 6 to 10 (I checked this for CPUs on Phoenix), with a slowdown in grind time of about 20% (I think the limit is about 5-10%).
User description
Description
Fix a bug with the MPI transfer of IBM markers on multiple ranks. Also adds the missing device/host transfers for the MPI exchange of ib_markers. Addresses issue #875.
Closes #875
Type of change
Please delete options that are not relevant.
Scope
If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes.
Provide instructions so we can reproduce.
Please also list any relevant details for your test configuration.
Test Configuration:
Checklist
- I have made corresponding changes to the documentation (docs/)
- I have added example cases in examples/ that demonstrate my new feature performing as expected. They run to completion and demonstrate "interesting physics"
- I ran ./mfc.sh format before committing my code

If your code changes any code source files (anything in src/simulation):

To make sure the code is performing as expected on GPU devices, I have:
- Added nvtx ranges so that they can be identified in profiles
- Ran ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR
- Ran ./mfc.sh run XXXX --gpu -t simulation --rsys --hip-trace, and have attached the output file and plain text results to this PR

PR Type
Bug fix
Description
- Fix MPI transfer for IBM markers on multiple GPUs
- Add RDMA and non-RDMA communication paths (see the sketch below)
- Implement proper GPU memory management for transfers
- Add NVTX profiling ranges for performance monitoring
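A rough sketch of what the two communication paths could look like for an integer marker buffer; the routine and variable names, the rdma_mpi flag, and the use of plain OpenACC directives (standing in for the GPU_UPDATE/GPU_HOST_DATA macros) are assumptions, not the PR's actual code:

```fortran
! Sketch of a dual-path halo exchange for an integer marker buffer.
subroutine s_exchange_markers(n_buf, dest, src, rdma_mpi, send_buf, recv_buf)
    use mpi
    implicit none
    integer, intent(in) :: n_buf, dest, src
    logical, intent(in) :: rdma_mpi
    integer, intent(inout) :: send_buf(n_buf), recv_buf(n_buf)
    integer :: ierr

    if (rdma_mpi) then
        ! RDMA (GPU-aware MPI) path: pass device addresses directly to MPI.
        !$acc host_data use_device(send_buf, recv_buf)
        call MPI_Sendrecv(send_buf, n_buf, MPI_INTEGER, dest, 0, &
                          recv_buf, n_buf, MPI_INTEGER, src, 0, &
                          MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
        !$acc end host_data
    else
        ! Non-RDMA path: stage through host copies of the buffers.
        !$acc update host(send_buf)
        call MPI_Sendrecv(send_buf, n_buf, MPI_INTEGER, dest, 0, &
                          recv_buf, n_buf, MPI_INTEGER, src, 0, &
                          MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
        !$acc update device(recv_buf)
    end if
end subroutine s_exchange_markers
```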
Diagram Walkthrough
File Walkthrough
m_mpi_proxy.fpp: Implement dual-path MPI communication for IBM markers
src/simulation/m_mpi_proxy.fpp
- GPU_UPDATE directives
- GPU_HOST_DATA wrapper for RDMA transfers (illustrated below)
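For readers unfamiliar with the macro names listed above, a sketch of the directive patterns they plausibly correspond to, written as plain OpenACC; this mapping is an assumption for illustration, not the macros' actual expansion:

```fortran
! Illustrative only: shows the two synchronization patterns by name.
subroutine s_sync_ib_markers(ib_markers, rdma_mpi)
    implicit none
    integer, intent(inout) :: ib_markers(:, :, :)
    logical, intent(in) :: rdma_mpi

    if (.not. rdma_mpi) then
        ! GPU_UPDATE-style: refresh the host copy before a CPU-side MPI
        ! exchange, then push the received halo back to the device.
        !$acc update host(ib_markers)
        ! ... host-side pack / MPI exchange / unpack would go here ...
        !$acc update device(ib_markers)
    else
        ! GPU_HOST_DATA-style: expose the device address of the array so a
        ! GPU-aware MPI library can access it directly (RDMA path).
        !$acc host_data use_device(ib_markers)
        ! ... MPI calls receive the device pointer here ...
        !$acc end host_data
    end if
end subroutine s_sync_ib_markers
```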