-
Notifications
You must be signed in to change notification settings - Fork 351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skip empty particle tiles in operator++ to avoid race condition in constructor. #3951
Skip empty particle tiles in operator++ to avoid race condition in constructor. #3951
Conversation
@atmyers , thanks a lot for working on this! |
I think we can simplify ParIter by getting rid of
(Code not tested.) |
The code snippet I posted has a bug. Cannot call |
The thread sanitizer test seems to still see an issue:
|
I think this is a different issue from the issue in ParIter. |
Re: operator++, this might work
|
Re: the new issue reported by @lucafedeli88, I think the change needs to be made in WarpX like below. The root issue is DefineAndReturnParticleTile is not thread safe.
|
@atmyers My suggestion on operator++ has a flaw. So it will not work. However, the flaw also exist in this PR. Suppose there are four tiles and a single box. Only one tile has particles. We have 4 threads. In the development branch, only one thread (say thread 0) will enter the loop body and the other three threads will fail isValid as expected. But in this PR, because operator++ is run after at least going through one loop iteration, three threads will execute loop body with no particles. I think we need to modify the isValid test too. |
Using this amrex branch https://github.com/WeiqunZhang/amrex/tree/fix_part_race_condition and this WarpX branch https://github.com/WeiqunZhang/WarpX/tree/test_fix_part_race_condition, the thread sanitizer no longer fails with race condition errors. It does fail eventually with MLMG failure below
But this is probably a different issue. Maybe the tolerance needs to be loosened. |
No need to overload isValid. What we need is to call MFIter::opearator++ in the ctor to make sure it advances to the first non-empty particle tile. See my branch. Note that MFIter::operator++ is thread safe. |
Let's not merge this until after the next release. |
The race condition was found by @lucafedeli88 using an automated tool.
Summary
Additional background
Checklist
The proposed changes: