
Revisit the memory order used in the WaitQueue/Waker/Waiter #831

Open
lrh2000 opened this issue May 11, 2024 · 3 comments · May be fixed by #896

Comments

@lrh2000
Contributor

lrh2000 commented May 11, 2024

pub fn is_empty(&self) -> bool {
    self.num_wakers.load(Ordering::Acquire) == 0
}

Quoted from #828 (comment):

No, I don't think this is correct. If it were correct, why would I write self.count.fetch_add(0, Ordering::Release) == 0, which looks pretty weird at first glance?

The key point is that memory writes before is_empty() must not be reordered after the emptiness check, which means is_empty() needs a Release operation, not an Acquire one.

Quoted from the cppreference you've attached:

* A load operation with this memory order performs the **acquire** operation on the affected memory location: **no reads or writes in the current thread can be reordered before this load**.

* A store operation with this memory order performs the **release** operation: **no reads or writes in the current thread can be reordered after this store**.

If we use fetch_add(0, Ordering::Release) to load the counter, the compiler will know that we need a Release operation here, so it will insert an mfence instruction before the load (check the result of "show assembly" in the Rust Playground):

playground::load_acquire:
	movq	playground::COUNTER0(%rip), %rax
	retq

playground::load_release:
	mfence
	movq	playground::COUNTER0(%rip), %rax
	retq
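
For reference, a minimal sketch of the kind of playground source that could produce the assembly above; the function and static names are taken from the symbols in the listing, while the 64-bit counter type is an assumption:

use std::sync::atomic::{AtomicU64, Ordering};

static COUNTER0: AtomicU64 = AtomicU64::new(0);

// Compiles to a plain `mov` on x86-64: an acquire load needs no fence there.
pub fn load_acquire() -> u64 {
    COUNTER0.load(Ordering::Acquire)
}

// A read-modify-write with Release ordering: earlier writes cannot be
// reordered after it, which is why the compiler emits the `mfence` above.
pub fn load_release() -> u64 {
    COUNTER0.fetch_add(0, Ordering::Release)
}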

The mfence instruction is exactly what we need, and it is exactly what the smp_mb function in the Linux kernel implementation does. Without the mfence instruction, the stores (which update the result of the cond() check) can be reordered after the emptiness check, and we will miss a wake-up event in these scenarios.

To be honest, a wrong memory order may improve performance (by avoiding the overhead of the mfence instruction). Meanwhile, this problem will only become apparent when we run heavy workloads on multiple processors. So we can leave a TODO here and be happy for a really long time, but it's still problematic, isn't it?

I will revisit the memory order used in the WaitQueue/Waker/Waiter when I have some spare time, so you don't need to file another PR to fix this problem. (But of course you can if you want.)

@junyang-zh
Collaborator

junyang-zh commented May 17, 2024

So why not atomic::fence(Ordering::Release) after the load?

Got an explanation, thanks.

@lrh2000
Contributor Author

lrh2000 commented May 17, 2024

This is the general practice (a small Rust sketch of the first scenario follows the three diagrams below):

   CPU 1                              CPU 2

Produces visible side effects

store(Ordering::Release)

                                   load(Ordering::Acquire)

                                   Sees visible side effects

   CPU 1                              CPU 2

Produces side effects
(May be reordered after the store)

store(Ordering::Relaxed)

                                   load(Ordering::Acquire)

                                   May not see side effects

   CPU 1                              CPU 2

Produces side effects

store(Ordering::Release)

                                   load(Ordering::Relaxed)

                                   May not see side effects
                                   (May be reordered before the store)
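
As a concrete illustration of the first (Release/Acquire) scenario, here is a minimal sketch; the names DATA and READY are purely illustrative:

use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};

static DATA: AtomicU32 = AtomicU32::new(0);
static READY: AtomicBool = AtomicBool::new(false);

// CPU 1: produce the side effect, then publish it with a Release store.
fn producer() {
    DATA.store(42, Ordering::Relaxed);
    READY.store(true, Ordering::Release);
}

// CPU 2: if the Acquire load sees `true`, it is guaranteed to also see DATA == 42.
fn consumer() -> Option<u32> {
    if READY.load(Ordering::Acquire) {
        Some(DATA.load(Ordering::Relaxed))
    } else {
        None
    }
}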

This is the example in this PR:

   CPU 1                              CPU 2

Produces side effects

wait_queue.wake_one()

                                   wait_queue.enqueue()

                                   May not see side effects
                                   (May be reordered before the store)

Locks may not help (unless the critical region includes wake_one()):

   CPU 1                              CPU 2

xxx.lock().ready = true

wait_queue.wake_one()

                                   wait_queue.enqueue()

                                   if xxx.lock().ready

So we have to:

     pub fn is_empty(&self) -> bool {
-        self.num_wakers.load(Ordering::Acquire) == 0
+        self.num_wakers.fetch_add(0, Ordering::Release) == 0
     }
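
To illustrate why the full barrier matters end to end, here is a deliberately simplified, self-contained model of the waker/waiter handshake; the names COND, NUM_WAKERS, wake_one and wait are illustrative stand-ins, not the actual Asterinas API:

use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

// Stand-ins for the condition checked by cond() and the waiter counter.
static COND: AtomicBool = AtomicBool::new(false);
static NUM_WAKERS: AtomicUsize = AtomicUsize::new(0);

// Waker side: publish the condition, then skip the wake-up only if no waiter
// is registered. fetch_add(0, Release) keeps the store to COND from being
// reordered after the emptiness check (it shows up as mfence on x86-64).
fn wake_one() {
    COND.store(true, Ordering::Relaxed);
    if NUM_WAKERS.fetch_add(0, Ordering::Release) == 0 {
        return; // fast path: nobody is waiting, no wake-up needed
    }
    // ... otherwise pop a waker from the queue and wake it ...
}

// Waiter side: register first, then re-check the condition before sleeping,
// so that a concurrent waker either sees the counter or we see COND.
fn wait() {
    NUM_WAKERS.fetch_add(1, Ordering::AcqRel);
    while !COND.load(Ordering::Acquire) {
        // ... park the thread here; a later wake_one() is obliged to wake it ...
        std::hint::spin_loop();
    }
    NUM_WAKERS.fetch_sub(1, Ordering::AcqRel);
}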

@lrh2000
Contributor Author

lrh2000 commented May 17, 2024

So why not atomic::fence(Ordering::Release) after the load?

For your and others' reference, see the "Notes" section of std::atomic_thread_fence, which behaves exactly the same way as Rust's atomic::fence:

On x86 (including x86-64), atomic_thread_fence functions issue no CPU instructions and only affect compile-time code motion, except for std::atomic_thread_fence(std::memory_order::seq_cst).
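
In other words, on x86-64 only a SeqCst fence emits an actual instruction; a small sketch for comparison (the function names are illustrative):

use std::sync::atomic::{fence, Ordering};

// On x86-64 this compiles to no instruction at all; it only restricts
// compile-time code motion, so placing it after the load cannot stand in
// for the mfence discussed above.
pub fn release_fence() {
    fence(Ordering::Release);
}

// Only a SeqCst fence actually emits an mfence instruction on x86-64.
pub fn seq_cst_fence() {
    fence(Ordering::SeqCst);
}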

@lrh2000 lrh2000 linked a pull request May 31, 2024 that will close this issue