Fix data race in run loop usage #486

lebdron · 2019-02-18T09:32:07Z

Thread sanitizer detects a data race in this run loop usage example #154
https://gist.github.com/lebdron/ee3e3e74250c27b2595a732bae5a9a22
Since queue can be accessed both by current thread and some other thread which calls schedule(), queue handles have to be protected by a lock.

Signed-off-by: Andrei Lebedev <lebdron@gmail.com>

iam · 2019-02-20T20:16:02Z

This fixes empty() but I think peek() might still be racy?

If another thread mutates the queue concurrently then perhaps the reference could become invalid?

kirkshoop · 2019-02-26T08:33:34Z

Interesting. Here was my logic for these.

I did not care if empty() or peek() are accurate, they are read-only operations and are allowed to be wrong without breaking code as long as callers are from the same thread that calls dispatch() and there is only one thread that calls dispatch(). I do see a use in having some validation of this requirement. I was trying to avoid taking the lock on every iteration of a main loop.

As long as empty() eventually resolves to true after an item is enqueued, then dispatch() will eventually be called. dispatch() uses the lock to protect all access so that it always has a coherent view of the queue contents. an atomic size member could be used to make empty() use an atomic load rather than take a lock. I wanted to avoid the overhead of an atomic load on every iteration of the main loop.

iam · 2019-02-27T00:40:34Z

EDIT: see the next comment for a possible solution, this is just me thinking out loud

Is there a larger context for this, I think I may be missing some detail to better formulate a reply?

Runloop implementations I've seen often have a layer of mutexes/condition variables which are necessary to properly send/receive messages across threads. High throughput runloops could use lockfree queues [but tuning this is highly cpu-count and architecture specific].

Judging on the QT example here (#438), it looks like the QT main loop would occasionally call into the RX runloop through a polling mechanism?

For sparsely scheduled actions, this seems inherently less efficient than an event-based mechanism to feed the run loop?

Also re: thread safety, my under-informed gut reaction seems like perhaps the API could be split? A set of functions that are callable from the loop-owning-thread [perhaps while always under a lock, or while always out of a lock] and another set of functions callable from any thread. This is only a guess without having full context of the above.

Usually the only way to "avoid" all atomics would be to invoke user-callbacks while a lock is held, otherwise there would be races from data manipulation coming from other threads.

iam · 2019-02-27T22:51:31Z

After looking at the API again, I'm not sure if it needs to have peek or empty at all?

I believe it is already assumed that the TimePoint is globally ordered across all threads (for schedule, queue order, etc, so we can take advantage of this assumption.

I'm not exactly sure yet how dispatch works (I filed #489 to clarify) but it seems to be one of the following:

dispatch pops one item from the queue and processes it (with when < now)
dispatch pops all items from the queue and processes all of them (with when < now)

The first use case would look like

   while (queue has more items) {
     rl.dispatch();
   }

The second use case simply looks like:

  rl.dispatch();

Either way, I think dispatch could be changed to return the time of the next item to be scheduled and this would be equally correct?

    std::optional<TimePoint> next = rl.dispatch();
    if (next && *next > rl.now()) {
      user_timer_schedule_at(*next);
    }

From what I understood so far it would obviate the need to call empty(), peek() etc ever again from user code. Of course the functions could still remain, but thread-safe, and be discouraged from using in non-debugging code.

kirkshoop · 2019-03-06T05:31:13Z

I like your proposed change to dispatch(). The one change to your proposal I would ask for, is to use the rxcpp maybe instead of std::optional.

I would like to keep the current changes in this PR for now and add the proposed changes to dispatch in a different PR.

Thanks!

iam · 2019-03-06T23:26:01Z

Perhaps this might need to be reverted? There seems to be some potential issues:

this patch is introducing deadlocks when calling peek or empty from notify_earlier_wakeup (see Potential deadlock in rx-runloop #487)
peek is not actually thread-safe because the reference-into-queue could be deleted by another thread popping from the queue

kirkshoop · 2019-03-09T22:30:07Z

@iam Yes. This is a breaking change. It will create deadlocks if a notify calls empty or peek. the reference returned from peek is indeed unsafe. these methods were only intended to be called from the same thread that calls dispatch. only dispatch would invalidate the head of the queue.

Fix data race in run loop usage

86124ba

Signed-off-by: Andrei Lebedev <lebdron@gmail.com>

lebdron mentioned this pull request Feb 18, 2019

Add rxcpp patch for run loop data race hyperledger-archives/iroha#2105

Closed

Noggog mentioned this pull request Mar 5, 2019

Segfaults in run_loop tetsurom/rxqt#19

Closed

kirkshoop merged commit 7c79c18 into ReactiveX:master Mar 6, 2019

iam mentioned this pull request Mar 6, 2019

Potential deadlock in rx-runloop #487

Open

lebdron deleted the fix/run-loop-data-race branch March 14, 2019 14:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix data race in run loop usage #486

Fix data race in run loop usage #486

lebdron commented Feb 18, 2019 •

edited

iam commented Feb 20, 2019

kirkshoop commented Feb 26, 2019

iam commented Feb 27, 2019 •

edited

iam commented Feb 27, 2019 •

edited

kirkshoop commented Mar 6, 2019

iam commented Mar 6, 2019

kirkshoop commented Mar 9, 2019

Fix data race in run loop usage #486

Fix data race in run loop usage #486

Conversation

lebdron commented Feb 18, 2019 • edited

iam commented Feb 20, 2019

kirkshoop commented Feb 26, 2019

iam commented Feb 27, 2019 • edited

iam commented Feb 27, 2019 • edited

kirkshoop commented Mar 6, 2019

iam commented Mar 6, 2019

kirkshoop commented Mar 9, 2019

lebdron commented Feb 18, 2019 •

edited

iam commented Feb 27, 2019 •

edited

iam commented Feb 27, 2019 •

edited