Race condition with recent HPX #2086
Does this happen with other allocators as well?
Also, how can I reproduce that?
Steps to reproduce:
I did bisect this and git tells me:
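Because the failure is intermittent, a single test run per commit can mark a bad commit as good during a bisection. Below is a minimal sketch of a wrapper for `git bisect run`, assuming bash. The function name `repeat_or_fail` and the run count are hypothetical and do not come from this thread. The wrapper maps every failure to exit code 1. This matters because `git bisect run` treats 125 as "skip" and aborts the bisection on codes above 127. A crashing test would otherwise return such a code, for example status 134 after an abort on an invalid free.

```bash
#!/usr/bin/env bash
# Hypothetical helper for `git bisect run`: a race-dependent test may pass
# by chance, so run it several times and report the commit as "bad"
# (status 1) as soon as any single run fails.
# Usage (assumed): repeat_or_fail RUNS command [args...]
repeat_or_fail() {
    local runs=$1
    shift
    local i
    for ((i = 0; i < runs; ++i)); do
        if ! "$@"; then
            # Normalize any failure (including signal-induced statuses
            # such as 134) to 1, which git bisect run reads as "bad".
            echo "run $((i + 1))/$runs failed" >&2
            return 1
        fi
    done
    # Every run passed: report the commit as "good".
    return 0
}
```

For instance, you could pass a script to `git bisect run` that defines this function and ends with `repeat_or_fail 10 ./perf_test`. The binary name there is a placeholder. More runs lower the chance of a false "good". If a bad build passed 30% of runs, ten consecutive passes would occur with probability 0.3^10, roughly 6 in a million.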
I'll check another allocator in a bit.
@K-ballo have you seen this?
@hkaiser I've seen it now. I do not have an explanation for the race condition.
@gentryx A shot from the hip: could you try whether removing the
@hkaiser Changing that line did not fix the race, but it did reduce its probability a bit (the tests would succeed in about 30% of runs). I also tested different allocators: jemalloc fails with the current trunk, while the system allocator succeeds (maybe because it's so slow?).
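A pass rate like the one quoted above comes from rerunning a flaky test many times. Below is a minimal sketch of such a measurement, assuming bash. `success_rate` is a hypothetical helper, not part of HPX or LibGeoDecomp. It counts only exit statuses and discards the test's output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command RUNS times and report how often it
# exited successfully, e.g. to compare builds with different allocators.
# Usage (assumed): success_rate RUNS command [args...]
success_rate() {
    local runs=$1
    shift
    if (( runs < 1 )); then
        echo "success_rate: RUNS must be >= 1" >&2
        return 2
    fi
    local passed=0 i
    for ((i = 0; i < runs; ++i)); do
        # Only the exit status matters; silence the test's own output.
        if "$@" >/dev/null 2>&1; then
            passed=$((passed + 1))
        fi
    done
    # Integer percentage, rounded down.
    echo "$passed/$runs runs succeeded ($((100 * passed / runs))%)"
}
```

For example, you could run `success_rate 20 ./perf_test` against a jemalloc build and a system-allocator build. The binary name is a placeholder. A high pass rate does not prove the race is gone, because a slower allocator may simply shift the timing, as suspected above.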
@gentryx What happens if you use the top of master with the commit you identified reverted? I still think the changes applied by that commit are correct. They do, however, change overall timings and thread execution sequencing, which may make the race visible.
@gentryx I tried to reproduce this on my system yesterday. Unfortunately, everything works as expected :/ I'll keep trying, though.
I'm closing this since the race seems to have been resolved on master. Thanks!
@gentryx The funny thing is that we have not done anything to fix it...
@hkaiser I assume you also didn't do anything to cause it, so apparently this race condition just went back to lurking beneath the surface. I could only reproduce it on one of our test machines anyway.
Reopening as this race resurfaced, this time on more machines. Good thing: I can reproduce it on the Marvin nodes. I'll add a script for bug reproduction momentarily.
```
[01:43:41]:aschafer@deneb01.hermione:/home/aschafer:0:$ cat test.sh
cd $HOME
mkdir test_race_lgd_hpx
mkdir hpx/build
mkdir libgeodecomp/build
for ((i=0;i<10;++i)); do
[01:43:45]:aschafer@deneb01.hermione:/home/aschafer:0:$ sbatch -p marvin --exclusive test.sh
```
@sithhell I can confirm that my issue is not present on the `fix_wait_all` branch. :-)
I'm certain this fixes not only your issue but also a couple of similarly dubious problems we've been seeing over the past few months.
This can be closed now that #2165 has been merged.
I'm seeing an invalid free error when running the LibGeoDecomp performance tests with recent HPX commits. The backtrace below was generated with commit ID 6f79ea9. It didn't occur with the same LibGeoDecomp code and the HPX trunk from approximately 4 weeks ago. The invalid free goes away if I sprinkle the performance test with `printf` calls, so I assume it's a race condition and not a normal invalid free.
I can provide more details if necessary.
Error:
Backtrace: