GC bug on skyline benchmark? #156
Some notes:
Some possible progress on this: I've discovered a race between LGC and scheduler steals. The problem is in the implementation of the ABP concurrent deque. On a steal, the stolen value is read optimistically, before the CAS that confirms the steal. Between the optimistic read and the CAS, a concurrent LGC could relocate the object, leaving the thief holding a stale pointer.

To fix this, I think all down-pointers from the work-stealing deque need to be pinned. So far, our implementation has handled the work-stealing deque specially: its updates are not subjected to the standard write barrier, because that would keep every down-pointer from the deque live forever. Interestingly, if I do subject the deque to the standard write barrier, the bug seems to go away. (At least, I haven't been able to trigger it in that configuration yet.)

So the challenge now is to figure out how to pin deque down-pointers while still allowing them to be unpinned at an appropriate later time. Our current unpin-depth trick won't work: the deques live in the global heap (depth 0), and after scheduler initialization the program never again returns to depth 0.

Proposal: we could use a hybrid remembered-set strategy, delimited by depth. For objects …
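The race can be illustrated with a small deterministic model. This is a Python sketch for readability, not MPL's runtime code: the names `Obj`, `Deque`, `local_gc`, and `steal_with_gc_in_between` are hypothetical, the CAS is a sequential stand-in for an atomic compare-and-swap on `top`, and the toy collector simply copies every unpinned object the deque points to. The model forces the bad interleaving (optimistic read, then LGC, then CAS) and shows that the CAS still succeeds, because relocating the object does not change `top`. Pinning the object, as proposed above, makes the toy collector leave it in place.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Obj:
    """A heap object. `forwarded` is set when the (hypothetical) LGC moves it."""
    payload: str
    pinned: bool = False
    forwarded: Optional["Obj"] = None


@dataclass
class Deque:
    """Toy ABP-style deque: task slots plus a `top` index that thieves CAS."""
    tasks: List[Optional[Obj]]
    top: int = 0
    bottom: int = 0

    def cas_top(self, expected: int, new: int) -> bool:
        # Sequential stand-in for an atomic compare-and-swap on `top`.
        if self.top == expected:
            self.top = new
            return True
        return False


def local_gc(deque: Deque) -> None:
    """Hypothetical LGC: copies every unpinned object referenced by the deque.

    This toy collector fixes up the deque slot itself, but it cannot fix a
    pointer that a thief has already copied into a private variable.
    Pinned objects are never moved.
    """
    for i, obj in enumerate(deque.tasks):
        if obj is not None and not obj.pinned:
            copy = Obj(obj.payload)
            obj.forwarded = copy
            deque.tasks[i] = copy


def steal_with_gc_in_between(deque: Deque) -> Optional[Obj]:
    """One fixed interleaving of a thief's steal with a concurrent LGC."""
    t = deque.top
    if t >= deque.bottom:
        return None                 # nothing to steal
    value = deque.tasks[t]          # 1. optimistic read, before the CAS
    local_gc(deque)                 # 2. LGC runs between the read and the CAS
    if deque.cas_top(t, t + 1):     # 3. CAS succeeds: the move did not touch `top`
        return value
    return None


def is_stale(obj: Optional[Obj]) -> bool:
    """True if the thief holds a copy that the collector has since moved."""
    return obj is not None and obj.forwarded is not None


if __name__ == "__main__":
    unpinned = Deque(tasks=[Obj("task")], bottom=1)
    print(is_stale(steal_with_gc_in_between(unpinned)))  # True: stale pointer

    pinned = Deque(tasks=[Obj("task", pinned=True)], bottom=1)
    print(is_stale(steal_with_gc_in_between(pinned)))    # False: object stayed put
```

The point the model captures is that the CAS only validates the deque index, not the location of the object that was read, so nothing in the steal protocol itself detects a concurrent move.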
Skyline benchmark from mpllang/parallel-ml-bench.
The bug sometimes causes a segfault, although I've also seen it hang. It also appears to occur only on small core counts.
To reproduce:
The bug is still present as of the current commit, b69ca19.
I think the bug is somewhere in CC. (If I disable forkGC in the scheduler, the bug appears to go away. Or, at least, I haven't been able to trigger it after making that change.)