
Revisit the memory order used in the WaitQueue/Waker/Waiter #831

Open
lrh2000 opened this issue May 11, 2024 · 3 comments · May be fixed by #896

Comments

@lrh2000
Contributor

lrh2000 commented May 11, 2024

pub fn is_empty(&self) -> bool {
    self.num_wakers.load(Ordering::Acquire) == 0
}

Quoted from #828 (comment):

No, I don't think this is correct. If it were correct, why would I write self.count.fetch_add(0, Ordering::Release) == 0, which looks pretty weird at first glance?

The key point is that memory writes before is_empty() must not be reordered after the emptiness check, which means is_empty() needs a Release operation, not an Acquire one.

Quoted from the cppreference you've attached:

* A load operation with this memory order performs the **acquire** operation on the affected memory location: **no reads or writes in the current thread can be reordered before this load**.

* A store operation with this memory order performs the **release** operation: **no reads or writes in the current thread can be reordered after this store**.

If we use fetch_add(0, Ordering::Release) to load the counter, the compiler will know that we need a Release operation here, so it will insert an mfence instruction before the load (check the result of "show assembly" in the Rust Playground):

playground::load_acquire:
	movq	playground::COUNTER0(%rip), %rax
	retq

playground::load_release:
	mfence
	movq	playground::COUNTER0(%rip), %rax
	retq
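
For reference, a minimal sketch of the kind of playground source that could produce the assembly above; the function and static names are taken from the symbols in the listing, while the 64-bit counter type is an assumption:

use std::sync::atomic::{AtomicU64, Ordering};

static COUNTER0: AtomicU64 = AtomicU64::new(0);

// Compiles to a plain `mov` on x86-64: an acquire load needs no fence there.
pub fn load_acquire() -> u64 {
    COUNTER0.load(Ordering::Acquire)
}

// A read-modify-write with Release ordering: earlier writes cannot be
// reordered after it, which is why the compiler emits the `mfence` above.
pub fn load_release() -> u64 {
    COUNTER0.fetch_add(0, Ordering::Release)
}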

The mfence instruction is exactly what we need, and it is exactly what the smp_mb function in the Linux kernel implementation does. Without the mfence instruction, the stores (which update the result of the cond() check) can be reordered after the emptiness check, and we will miss a wake-up event in these scenarios.

To be honest, a wrong memory order may improve performance (by avoiding the overhead of the mfence instruction). Meanwhile, this problem will only become apparent when we run heavy workloads on multiple processors. So we can leave a TODO here and be happy for a really long time, but it's still problematic, isn't it?

I will revisit the memory order used in the WaitQueue/Waker/Waiter when I have some spare time, so you don't need to file another PR to fix this problem. (But of course you can if you want.)

@junyang-zh
Collaborator

junyang-zh commented May 17, 2024

So why not atomic::fence(Ordering::Release) after the load?

Got an explanation, thanks.

@lrh2000
Contributor Author

lrh2000 commented May 17, 2024

This is the general practice (a small Rust sketch of the first scenario follows the three diagrams below):

   CPU 1                              CPU 2

Produces visible side effects

store(Ordering::Release)

                                   load(Ordering::Acquire)

                                   Sees visible side effects

   CPU 1                              CPU 2

Produces side effects
(May be reordered after the store)

store(Ordering::Relaxed)

                                   load(Ordering::Acquire)

                                   May not see side effects

   CPU 1                              CPU 2

Produces side effects

store(Ordering::Release)

                                   load(Ordering::Relaxed)

                                   May not see side effects
                                   (May be reordered before the store)
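
As a concrete illustration of the first (Release/Acquire) scenario, here is a minimal sketch; the names DATA and READY are purely illustrative:

use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};

static DATA: AtomicU32 = AtomicU32::new(0);
static READY: AtomicBool = AtomicBool::new(false);

// CPU 1: produce the side effect, then publish it with a Release store.
fn producer() {
    DATA.store(42, Ordering::Relaxed);
    READY.store(true, Ordering::Release);
}

// CPU 2: if the Acquire load sees `true`, it is guaranteed to also see DATA == 42.
fn consumer() -> Option<u32> {
    if READY.load(Ordering::Acquire) {
        Some(DATA.load(Ordering::Relaxed))
    } else {
        None
    }
}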

This is the example in this PR:

   CPU 1                              CPU 2

Produces side effects

wait_queue.wake_one()

                                   wait_queue.enqueue()

                                   May not see side effects
                                   (May be reordered before the store)

Locks may not help (unless the critical region includes wake_one()):

   CPU 1                              CPU 2

xxx.lock().ready = true

wait_queue.wake_one()

                                   wait_queue.enqueue()

                                   if xxx.lock().ready

So we have to:

     pub fn is_empty(&self) -> bool {
-        self.num_wakers.load(Ordering::Acquire) == 0
+        self.num_wakers.fetch_add(0, Ordering::Release) == 0
     }
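
To illustrate why the full barrier matters end to end, here is a deliberately simplified, self-contained model of the waker/waiter handshake; the names COND, NUM_WAKERS, wake_one and wait are illustrative stand-ins, not the actual Asterinas API:

use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

// Stand-ins for the condition checked by cond() and the waiter counter.
static COND: AtomicBool = AtomicBool::new(false);
static NUM_WAKERS: AtomicUsize = AtomicUsize::new(0);

// Waker side: publish the condition, then skip the wake-up only if no waiter
// is registered. fetch_add(0, Release) keeps the store to COND from being
// reordered after the emptiness check (it shows up as mfence on x86-64).
fn wake_one() {
    COND.store(true, Ordering::Relaxed);
    if NUM_WAKERS.fetch_add(0, Ordering::Release) == 0 {
        return; // fast path: nobody is waiting, no wake-up needed
    }
    // ... otherwise pop a waker from the queue and wake it ...
}

// Waiter side: register first, then re-check the condition before sleeping,
// so that a concurrent waker either sees the counter or we see COND.
fn wait() {
    NUM_WAKERS.fetch_add(1, Ordering::AcqRel);
    while !COND.load(Ordering::Acquire) {
        // ... park the thread here; a later wake_one() is obliged to wake it ...
        std::hint::spin_loop();
    }
    NUM_WAKERS.fetch_sub(1, Ordering::AcqRel);
}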

@lrh2000
Contributor Author

lrh2000 commented May 17, 2024

So why not atomic::fence(Ordering::Release) after the load?

For your and others' reference, see the "Notes" section of std::atomic_thread_fence, which behaves exactly the same way as Rust's atomic::fence:

On x86 (including x86-64), atomic_thread_fence functions issue no CPU instructions and only affect compile-time code motion, except for std::atomic_thread_fence(std::memory_order::seq_cst).
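
In other words, on x86-64 only a SeqCst fence emits an actual instruction; a small sketch for comparison (the function names are illustrative):

use std::sync::atomic::{fence, Ordering};

// On x86-64 this compiles to no instruction at all; it only restricts
// compile-time code motion, so placing it after the load cannot stand in
// for the mfence discussed above.
pub fn release_fence() {
    fence(Ordering::Release);
}

// Only a SeqCst fence actually emits an mfence instruction on x86-64.
pub fn seq_cst_fence() {
    fence(Ordering::SeqCst);
}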

@lrh2000 lrh2000 linked a pull request May 31, 2024 that will close this issue