@sframba commented Oct 22, 2024

After #2610, a crash occurs in some cases with the wave propagators:

** StackTrace of XX frames **
[...]
Frame 11: chai::ArrayManager::move(chai::PointerRecord*, chai::ExecutionSpace) 
Frame 12: chai::ArrayManager::move(void*, chai::PointerRecord*, chai::ExecutionSpace) 
Frame 13: PATH/lib/libdataRepository.so 
Frame 14: std::enable_if<can_memcpy<float>, int>::type geos::bufferOps::UnpackDataByIndexDevice<float, 1, 0, LvArray::ArrayView<int const, 1, 0, int, LvArray::ChaiBuffer> >(signed char const*&, LvArray::ArrayView<float, 1, 0, int, LvArray::ChaiBuffer> const&, LvArray::ArrayView<int const, 1, 0, int, LvArray::ChaiBuffer> const&, std::vector<camp::resources::v1::Event, std::allocator<camp::resources::v1::Event> >&, ompi_op_t*) 
Frame 15: std::enable_if<can_memcpy<float>, int>::type geos::bufferOps::UnpackByIndexDevice<float, 1, 0, LvArray::ArrayView<int const, 1, 0, int, LvArray::ChaiBuffer> >(signed char const*&, LvArray::ArrayView<float, 1, 0, int, LvArray::ChaiBuffer> const&, LvArray::ArrayView<int const, 1, 0, int, LvArray::ChaiBuffer> const&, std::vector<camp::resources::v1::Event, std::allocator<camp::resources::v1::Event> >&, ompi_op_t*) 
Frame 16: geos::dataRepository::Wrapper<LvArray::Array<float, 1, camp::int_seq<long, 0l>, int, LvArray::ChaiBuffer> >::unpackByIndex(signed char const*&, LvArray::ArrayView<int const, 1, 0, int, LvArray::ChaiBuffer> const&, bool, bool, std::vector<camp::resources::v1::Event, std::allocator<camp::resources::v1::Event> >&, ompi_op_t*) 
Frame 17: geos::ObjectManagerBase::unpack(signed char const*&, LvArray::ArrayView<int, 1, 0, int, LvArray::ChaiBuffer>&, int, bool, std::vector<camp::resources::v1::Event, std::allocator<camp::resources::v1::Event> >&, ompi_op_t*) 
Frame 18: geos::NeighborCommunicator::unpackBufferForSync(geos::FieldIdentifiers const&, geos::MeshLevel&, int, bool, std::vector<camp::resources::v1::Event, std::allocator<camp::resources::v1::Event> >&, ompi_op_t*) 
Frame 19: geos::CommunicationTools::asyncUnpack(geos::MeshLevel&, std::vector<geos::NeighborCommunicator, std::allocator<geos::NeighborCommunicator> >&, geos::MPI_iCommData&, bool, std::vector<camp::resources::v1::Event, std::allocator<camp::resources::v1::Event> >&, ompi_op_t*) 
Frame 20: geos::CommunicationTools::finalizeUnpack(geos::MeshLevel&, std::vector<geos::NeighborCommunicator, std::allocator<geos::NeighborCommunicator> >&, geos::MPI_iCommData&, bool, std::vector<camp::resources::v1::Event, std::allocator<camp::resources::v1::Event> >&, ompi_op_t*) 
Frame 21: geos::CommunicationTools::synchronizeUnpack(geos::MeshLevel&, std::vector<geos::NeighborCommunicator, std::allocator<geos::NeighborCommunicator> >&, geos::MPI_iCommData&, bool) 
Frame 22: geos::CommunicationTools::synchronizeFields(geos::FieldIdentifiers const&, geos::MeshLevel&, std::vector<geos::NeighborCommunicator, std::allocator<geos::NeighborCommunicator> >&, bool) 
Frame 23: geos::AcousticWaveEquationSEM::synchronizeUnknowns(double const&, double const&, int, geos::DomainPartition&, geos::MeshLevel&, LvArray::ArrayView<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, 1, 0, int, LvArray::ChaiBuffer> const&) 
Frame 24: geos::AcousticWaveEquationSEM::explicitStepInternal(double const&, double const&, int, geos::DomainPartition&) 
=====

This is apparently due to overly zealous use of freeOnDevice. In particular, freeing the variables m_sourceCoordinates and m_receiverCoordinates on device seems to trigger the problem. This PR removes the offending freeOnDevice calls, which should fix the issue.

@sframba changed the title from "removed offending free on device" to "fix: Crashes in some cases after some variables are freed on device" Oct 22, 2024
@sframba changed the title from "fix: Crashes in some cases after some variables are freed on device" to "fix: Crashes in some wave propagation cases after some variables are freed on device" Oct 22, 2024
@sframba requested a review from Bubusch October 22, 2024 14:16
@sframba self-assigned this Oct 22, 2024
@sframba added the type: bug label Oct 22, 2024
@sframba marked this pull request as ready for review October 23, 2024 08:33
@sframba requested a review from acitrain as a code owner October 23, 2024 08:33
@sframba added the ci: run CUDA builds, ci: run integrated tests, and ci: run code coverage labels Oct 23, 2024
codecov bot commented Oct 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 56.79%. Comparing base (b43e001) to head (fc3e45d).
Report is 75 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3409      +/-   ##
===========================================
- Coverage    56.79%   56.79%   -0.01%     
===========================================
  Files         1076     1076              
  Lines        95749    95739      -10     
===========================================
- Hits         54383    54375       -8     
+ Misses       41366    41364       -2     


@sframba merged commit 9ee88f7 into develop Oct 25, 2024 (30 checks passed)
@sframba deleted the bugfix/freeondevice branch October 25, 2024 09:36
danielemoretto44 pushed a commit that referenced this pull request Sep 22, 2025
