Fixes CUDA sanitizer issues#343
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses CUDA sanitizer issues by adding comprehensive memory initialization, proper synchronization with Kokkos fences, and cleanup of resource leaks. The changes ensure that uninitialized memory is not read during halo exchanges and that GPU state is properly synchronized before MPI operations.
Changes:
- Initialize auxiliary variable arrays and ocean state arrays to zero in constructors to prevent uninitialized memory reads during halo exchanges
- Add
Kokkos::fence()calls before halo exchange operations to ensure GPU operations complete before MPI accesses device memory - Fix memory leaks by adding
MPI_Group_free()calls, replacing heap-allocated Clock objects with stack allocation, and adding singleton destruction calls
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| RungeKutta4Stepper.cpp | Initialize ProvisTracers array to zero |
| WindForcingAuxVars.cpp | Initialize wind stress arrays to zero |
| VorticityAuxVars.cpp | Initialize vorticity arrays to zero and add OmegaKokkos.h include |
| VelocityDel2AuxVars.cpp | Initialize Del2 arrays to zero and add OmegaKokkos.h include |
| TracerAuxVars.cpp | Initialize tracer auxiliary arrays to zero and add OmegaKokkos.h include |
| LayerThicknessAuxVars.cpp | Initialize layer thickness arrays to zero and add OmegaKokkos.h include |
| KineticAuxVars.cpp | Initialize kinetic energy arrays to zero and add OmegaKokkos.h include |
| VertCoord.cpp | Initialize min/max layer arrays to zero for edges and vertices |
| Tracers.cpp | Initialize tracer arrays to zero and add fence before halo exchange |
| OceanState.cpp | Initialize ocean state arrays to zero and add fence before halo exchange |
| OceanFinal.cpp | Add IOStream finalization with clock and singleton destruction for Eos and VertMix |
| AuxiliaryState.cpp | Add fence before halo exchange |
| IOStream.cpp | Replace heap-allocated Clock with stack-allocated Clock to fix memory leaks |
| MachEnv.cpp | Add fences before MPI operations, free MPI_Group objects, and set DefaultEnv to nullptr |
| Halo.cpp | Set DefaultHalo to nullptr after clearing |
| Decomp.cpp | Set DefaultDecomp to nullptr after clearing |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
philipwjones
left a comment
There was a problem hiding this comment.
I'm fine with the changes with the caveat that @mwarusz comments should be addressed. In particular, if we always need the fences before halos, I think they should be pushed into the halo routines (maybe we just need to remove the if device buffers?). But will approve while others make the case either way.
01915ee to
75a1973
Compare
|
@mwarusz , @philipwjones , Removed unnecessary Kokkos View initializations. Also removed the Kokkos fences, which may be reconsidered later if an actual issue arises. |
mwarusz
left a comment
There was a problem hiding this comment.
Passes CTests on aurora-cpu and aurora-gpu. I have some minor comments, but it looks almost ready.
b11b385 to
a5623ea
Compare
| // the process, calls the destructors for each | ||
| AllDecomps.clear(); // removes all decomps from the list (map) and in | ||
| // the process, calls the destructors for each | ||
| DefaultDecomp = nullptr; // prevent dangling pointer |
There was a problem hiding this comment.
Should the default pointer for other classes also be set to nullptr in the clear method, i.e. HorzMesh, Tracers, State, etc?
There was a problem hiding this comment.
I agree with this. Basically, I looked at the code that the memory check tool pointed to. I guess the tool limits the output of its checks. I will review other parts of the code with a similar structure and try to nullify them in the same way as with Decomp and Halo. Thanks.
|
Thanks everyone for your feedback. I removed Brian, as there were sufficient reviews and he is reviewing other PRs. |
This merge updates the e3sm_submodules/Omega submodule from [d0b3482](https://github.com/E3SM-Project/Omega/tree/d0b3482) to [74611d548d](https://github.com/E3SM-Project/Omega/tree/74611d548d). This update includes the following MPAS-Ocean and MPAS-Frameworks PRs (check mark indicates bit-for-bit with previous PR in the list): - [ ] (ocn) E3SM-Project/Omega#343 - [ ] (ocn) E3SM-Project/Omega#344 - [ ] (ocn) E3SM-Project/Omega#226 - [ ] (ocn) E3SM-Project/Omega#369 - [ ] (ocn) E3SM-Project/Omega#379
This merge updates the e3sm_submodules/Omega submodule from [d0b3482](https://github.com/E3SM-Project/Omega/tree/d0b3482) to [74611d548d](https://github.com/E3SM-Project/Omega/tree/74611d548d). This update includes the following MPAS-Ocean and MPAS-Frameworks PRs (check mark indicates bit-for-bit with previous PR in the list): - [ ] (ocn) E3SM-Project/Omega#343 - [ ] (ocn) E3SM-Project/Omega#344 - [ ] (ocn) E3SM-Project/Omega#226 - [ ] (ocn) E3SM-Project/Omega#369 - [ ] (ocn) E3SM-Project/Omega#379
Add more initialization, finalization and Kokkos fences to eliminate CUDA meminit and memleak check issues.
DefaultDecomp = nullptr;.MPI_Group_Freeto prevent memory leaks.Clock ModelClockObj;on the stack instead of the heap.Eos::destroyInstance();, in the Omega finalize routine.