This repository has been archived by the owner. It is now read-only.

Fix an ordering bug in Global::push_bag() #53

Merged
merged 1 commit into from Nov 29, 2017

Conversation

Projects
None yet
3 participants
@jeehoonkang
Copy link
Contributor

jeehoonkang commented Nov 28, 2017

In an RFC (https://github.com/crossbeam-rs/rfcs/blob/master/text/2017-07-23-relaxed-memory.md), we explained why Crossbeam is correct in relaxed memory. However, the pseudocode in the RFC and the implementation differs: in unlink() in RFC, a SeqCst fence is executed and then the global epoch is read. On the other hand, in push_bag() in the implementation, the global epoch is read and then a SeqCst fence is executed.

Unfortunately, the current implementation is incorrect. For example, consider the following scenario:

  • Thread A is pinned at epoch 10, and deferring the deallocation of an object O.
  • In Global::push_bag(), A can read the global epoch 10, and then sleeps for a long time.
  • In the mean time, the global epoch advanced to 11.
  • Thread B is pinned with local epoch 11, and got a reference to O.
  • Thread A woke up and executes the SeqCst fence, and then it is unpinned.
  • The global epoch advances to 12.
  • Thread C actually deallocates O, while B has a reference to it...

This PR fixes it by reordering fence(SeqCst) and a load from the global epoch in Global::push_bag().

Arguably, the performance remains almost the same after this PR. This is an one-time measurement of cargo bench (control = master, variable = this PR):

 name                       control ns/iter  variable ns/iter  diff ns/iter  diff %  speedup
 multi_alloc_defer_free     3,581,149        3,608,989               27,840   0.78%   x 0.99
 multi_defer                1,640,773        1,630,437              -10,336  -0.63%   x 1.01
 multi_flush                7,538,490        7,519,137              -19,353  -0.26%   x 1.00
 multi_pin                  4,725,144        4,674,670              -50,474  -1.07%   x 1.01
 single_alloc_defer_free    49               53                           4   8.16%   x 0.92
 single_default_handle_pin  9                9                            0   0.00%   x 1.00
 single_defer               27               30                           3  11.11%   x 0.90
 single_flush               214              206                         -8  -3.74%   x 1.04
 single_pin                 9                9                            0   0.00%   x 1.00

This PR is blocked by #52: here I just commented out impl<T> Into<Box<T>> for Owned<T>, but it should be properly dealt with in another PR.

@stjepang
Copy link
Member

stjepang left a comment

Good find! :)

@martinhath

This comment has been minimized.

Copy link

martinhath commented Nov 28, 2017

I must say I'm a little confused by this (although it's probably correct :).
Presumably the problem is that A read an old epoch in unlink, and B has read the new epoch, and manages to obtain a reference to O, before As unlinking is visible to B?

Does the fence in unlink guarantee that after executing it, is it not possible for any thread to obtain a reference to O, such that we know if the epoch is incremented to 11, and thread B has observed this, the read by A will also read 11?

I must admit the semantics of fence is really confusing to me; the docs talks about synchronization between operations happening before and after, but here it seems all operations are after the fence.

@jeehoonkang

This comment has been minimized.

Copy link
Contributor Author

jeehoonkang commented Nov 29, 2017

I must admit the semantics of fence is really confusing to me; the docs talks about synchronization between operations happening before and after, but here it seems all operations are after the fence.

The semantics of fence(SeqCst) has long been unclear, and only recently is it clarified.
See: http://sf.snu.ac.kr/promise-concurrency/ and http://plv.mpi-sws.org/scfix/ (disclosure: I'm one of the authors of both papers).

Does the fence in unlink guarantee that after executing it, is it not possible for any thread to obtain a reference to O, such that we know if the epoch is incremented to 11, and thread B has observed this, the read by A will also read 11?

Assuming the semantics presented in those papers are correct, the semantics of fence(SeqCst) is weaker than you expect. Even if A executes fence(SeqCst), it doesn't necessarily mean all the threads should not be able to read O. Only if another thread B executes fence(SeqCst) after A does, B cannot read the unlinked object.

Basically that's why we need to wait for 2 epoch advances. The RFC I mentioned above (https://github.com/crossbeam-rs/rfcs/blob/master/text/2017-07-23-relaxed-memory.md) explains in fuller details of its correctness.

@jeehoonkang jeehoonkang force-pushed the jeehoonkang:push_bag_fix branch from 3e1c6d2 to 4466617 Nov 29, 2017

@stjepang stjepang merged commit 4e2dcd0 into crossbeam-rs:master Nov 29, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@martinhath

This comment has been minimized.

Copy link

martinhath commented Nov 29, 2017

Even if A executes fence(SeqCst), it doesn't necessarily mean all the threads should not be able to read O. Only if another thread B executes fence(SeqCst) after A does, B cannot read the unlinked object.

I see. So in this PR, thread B is the one in push_bag, such that it must fence before reading, which then guarantees that it observes the epoch increment, which is also followed by a fence?

@jeehoonkang

This comment has been minimized.

Copy link
Contributor Author

jeehoonkang commented Dec 8, 2017

@martinhath I'm sorry for the delay.

I'm sorry again that I couldn't fully understand your comment.. I presumed A is the one in push_bag (or unlinked): A's action (removal of O from the memory) should be visible when the epoch advanced enough in try_advance().

Instead of directly answering your question, I'd like to summarize what's going on in the proof in https://github.com/crossbeam-rs/rfcs/blob/master/text/2017-07-23-relaxed-memory.md . In essence, the reasoning depends on the following two properties of SC fences:

  • All the executions of SC fences are linearly ordered.
  • If an execution A of SC fence is before another execution B in the linear order, A's knowledge before its fence should be transferred to B after its fence.

So the proof first (1) somehow deduces that unlinked()'s SC fence should be executed before try_advance()'s SC fence; so (2) O should be invisible after try_advance() successfully advanced the epoch enough.

I hope it helps, at least a little bit. Please feel free to ask any questions.

@martinhath

This comment has been minimized.

Copy link

martinhath commented Dec 11, 2017

Okay, I think I understand now. Thanks for your help! 😄

@jeehoonkang jeehoonkang deleted the jeehoonkang:push_bag_fix branch Oct 31, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.