Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AMReX: Conduit with SoA Particles Broken #5087

Closed
nicolemarsaglia opened this issue Jul 26, 2024 · 4 comments
Closed

AMReX: Conduit with SoA Particles Broken #5087

nicolemarsaglia opened this issue Jul 26, 2024 · 4 comments
Assignees
Labels
bug: affects latest release Bug also exists in latest release version bug Something isn't working component: diagnostics all types of outputs component: third party Changes in WarpX that reflect a change in a third-party library

Comments

@nicolemarsaglia
Copy link

I'm attempting to run WarpX + Ascent on LC's Lassen and I am getting a segfault when trying to convert to blueprint data.

I have Ascent built on Lassen using a script similar to this one: https://github.com/Alpine-DAV/ascent/blob/develop/scripts/build_ascent/build_ascent_cuda_lassen.sh , but updated to use the modules from the WarpX instructions here: https://warpx.readthedocs.io/en/latest/install/hpc/lassen.html

I've attached my Backtrace from the single node run, as well as my inputs, cmake script, and ascent_actions files.

I was able to track it down somewhat, the issue is happening here: https://github.com/AMReX-Codes/amrex/blob/6975ae0ed5ca51ec95268f323bdc705dab4fe571/Src/Extern/Conduit/AMReX_Conduit_Blueprint_ParticlesI.H#L72

Potentially an empty soa? But I'm not familiar enough with the code and data structures to debug this quickly, so I am reaching out for any help I can get. Let me know if there is any other information I can provide, thanks so much!

Currently Loaded Modules:
  1) base-gcc/8.3.1       3) cmake/3.23.1   5) spectrum-mpi/rolling-release   7) boost/1.70.0
  2) StdEnv         (S)   4) gcc/11.2.1     6) cuda/12.0.0                    8) python/3.8.2

Backtrace.txt
inputs.txt
build_warpx.txt
ascent_actions.txt

@ax3l ax3l added the component: diagnostics all types of outputs label Aug 5, 2024
@ax3l
Copy link
Member

ax3l commented Aug 5, 2024

Oh no, thanks for reporting @nicolemarsaglia!

Earlier this year, we transitioned the old data structure layout of AMReX in WarpX from a mix of AoS + SoA to be purely SoA now. I am pretty sure that must have broken it.

Here are the PRs:

Do those look correct to you?
I noticed at the time that CI was missing for Ascent in AMReX and maybe a bug slipped through...

@ax3l
Copy link
Member

ax3l commented Aug 5, 2024

Do you think the specific line that fails on

        n_coords["values/x"].set_external(const_cast<ParticleReal*>(&soa.GetRealData(0)[0]),
                                          num_particles);

needs a guard for zero particles on an MPI rank? That can definitely happen and accessing [0] in that case would be a memory violation...

@ax3l
Copy link
Member

ax3l commented Aug 5, 2024

I think there is another issue there further down the code:
https://github.com/AMReX-Codes/amrex/blob/c09da9977004a32ab4f209fba6114c7889a0a5bd/Src/Extern/Conduit/AMReX_Conduit_Blueprint_ParticlesI.H#L185-L196

We now have a 64bit cpuid attribute to uniquely identify particles. In the linked code, it uses an intermediate version where we split both into 32bit numbers, which was not final. Sorry this slipped through. Do you or @cyrush know how to update this for Conduit?

You need to use soa.GetIdCPUData() for this branch instead of the current logic - and again, there can be zero particles on an MPI rank.

@ax3l ax3l changed the title Segfault converting SOA particles to conduit blueprint AMReX: Conduit with SoA Particles Broken Aug 5, 2024
@ax3l ax3l added bug Something isn't working component: third party Changes in WarpX that reflect a change in a third-party library bug: affects latest release Bug also exists in latest release version labels Aug 5, 2024
atmyers pushed a commit to AMReX-Codes/amrex that referenced this issue Aug 6, 2024
## Summary

Fix the SoA particle structures for Conduit. Handle zero particles on
MPI ranks without UB.

- [ ] @nicolemarsaglia can you test this? 🙏 

## Additional background

Fix #4062
ECP-WarpX/WarpX#5087

cc @cyrush @ChristosT @c-wetterer-nelson 

## Checklist

The proposed changes:
- [x] fix a bug or incorrect behavior in AMReX
- [ ] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX
users
- [ ] include documentation in the code and/or rst files, if appropriate
@ax3l
Copy link
Member

ax3l commented Aug 8, 2024

Fixed via AMReX-Codes/amrex#4065.

This will come in after the WarpX 24.08 release (which depends on AMReX 24.08, which is already tagged), via the next weekly AMReX update.

@ax3l ax3l closed this as completed Aug 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug: affects latest release Bug also exists in latest release version bug Something isn't working component: diagnostics all types of outputs component: third party Changes in WarpX that reflect a change in a third-party library
Projects
None yet
Development

No branches or pull requests

4 participants