Reduce foreign wake latency #434

Merged
merged 7 commits into DataDog:master from thirstycrow:notify_on_foreign_wake on Oct 21, 2021

Conversation

thirstycrow (Contributor)

What does this PR do?

Reduce foreign wake latency by sending notifications via the eventfd installed on the latency ring. Notifications are only sent if the task being woken was spawned on a queue with a latency requirement. Below is a comparison, produced by the foreign_wake benchmark, of running with and without notifications; a standalone sketch of the underlying eventfd wakeup mechanism follows the numbers:

$ RUST_LOG=debug RUST_BACKTRACE=1 cargo bench --bench foreign_wake 
    Finished bench [optimized] target(s) in 0.04s
     Running unittests (target/release/deps/foreign_wake-6d33a328eb2cc763)

Latency requirement: NotImportant
Label              Min   Mean    P99  P99.9    Max
glommio->tokio:  0.014  0.285  0.614  0.822  0.902
tokio schedule:  0.000  0.008  0.055  0.123  0.178
tokio process :  0.515  1.962  2.965  3.067  3.161
tokio->glommio: 11.816 96.668 97.983 99.135 99.135

Latency requirement: Matters(100ms)
Label              Min   Mean    P99  P99.9    Max
glommio->tokio:  0.000  0.169  0.399  0.518  0.564
tokio schedule:  0.000  0.018  0.100  0.176  0.213
tokio process :  0.513  1.794  2.415  2.717  2.873
tokio->glommio:  0.004  0.374  0.633  0.677  0.730
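
For readers unfamiliar with the mechanism, here is a minimal standalone sketch of how an eventfd wakes a blocked thread. It uses the libc crate and a blocking read(2) as a stand-in for the reactor; in glommio the read side is the eventfd registered on the latency io_uring, so this illustrates the primitive, not the PR's code.

    use std::thread;
    use std::time::Duration;

    fn main() {
        // Create the eventfd the "reactor" will block on (libc crate assumed).
        let efd = unsafe { libc::eventfd(0, 0) };
        assert!(efd >= 0);

        // Stand-in for the sleeping reactor: blocks until the counter is written.
        let reactor = thread::spawn(move || {
            let mut counter = 0u64;
            let n = unsafe {
                libc::read(efd, &mut counter as *mut u64 as *mut libc::c_void, 8)
            };
            assert_eq!(n, 8);
            println!("reactor woken, counter = {counter}");
        });

        // Stand-in for a foreign thread waking a latency-sensitive task:
        // writing 8 bytes to the eventfd unblocks the reader immediately.
        thread::sleep(Duration::from_millis(100));
        let one = 1u64;
        let n = unsafe {
            libc::write(efd, &one as *const u64 as *const libc::c_void, 8)
        };
        assert_eq!(n, 8);

        reactor.join().unwrap();
        unsafe { libc::close(efd) };
    }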

Motivation

What inspired you to submit this pull request?

Related issues

A list of issues either fixed, containing architectural discussions, or otherwise relevant to this pull request.

Additional Notes

Anything else we should know when reviewing?

Checklist

[] I have added unit tests to the code I am submitting
[] My unit tests cover both failure and success scenarios
[] If applicable, I have discussed my architecture

glommer (Collaborator) left a comment

Great work, and great numbers!

My main concern is that the eventfd handling code has broken a lot in the past. It is tricky and hard to test. Because of that, I am not convinced that using the same eventfd for both kinds of wake-up notification is the way to go. We would need to add more special cases to code that is already full of them.

Have you considered using a separate eventfd?

@@ -1604,14 +1601,16 @@ impl Reactor {
membarrier::heavy();
let events = process_remote_channels() + self.flush_syscall_thread();
if events == 0 {
self.link_rings_and_sleep(&mut main_ring, &self.eventfd_src)
.expect("some error");
if self.eventfd_src.is_installed().unwrap() {
Collaborator

I wonder if we shouldn't register two eventfds for this?

The wakeup code is tricky, and you are effectively adding one more condition.

Contributor Author

It'd be better to check the eventfd before linking the rings, because if the eventfd is not installed, then we're not going to sleep, so there's no need to do the linking at all.

@@ -27,6 +27,9 @@ pub(crate) struct Header {
/// Current state of the task.
pub(crate) state: u8,

/// Latency matters or not
pub(crate) latency_matters: bool,
Collaborator

It may be better to represent it as Option<RawFd> and store Some(eventfd); None would signal that we don't need to write to the eventfd.

@@ -221,6 +227,9 @@ where
dbg_context!(ptr, "foreign", {
let notifier = raw.notifier();
notifier.queue_waker(Waker::from_raw(Self::clone_waker(ptr)));
if (*raw.header).latency_matters {
Collaborator

If we use a separate eventfd, we can then write if let Some(fd) = latency_matters { write(fd) }.
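
A minimal sketch of that representation, assuming a simplified Header with a hypothetical latency_eventfd field and a placeholder in place of the real eventfd write; this illustrates the suggestion, not glommio's actual types:

    use std::os::unix::io::RawFd;

    // Simplified stand-in for the task header discussed above; the field
    // name latency_eventfd is hypothetical.
    struct Header {
        // Some(fd): waking this task should write to the latency ring's
        // eventfd. None: no notification is needed.
        latency_eventfd: Option<RawFd>,
    }

    // Placeholder for the real write to the registered eventfd.
    fn write_eventfd(fd: RawFd) {
        println!("would write to eventfd {fd}");
    }

    // Foreign-wake path: queue the waker as before (elided), then notify
    // only when an eventfd was attached at spawn time.
    fn on_foreign_wake(header: &Header) {
        if let Some(fd) = header.latency_eventfd {
            write_eventfd(fd);
        }
    }

    fn main() {
        on_foreign_wake(&Header { latency_eventfd: Some(42) }); // notifies
        on_foreign_wake(&Header { latency_eventfd: None });     // does not
    }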

thirstycrow (Contributor Author) commented Oct 6, 2021

@glommer There seems to be an issue with the existing code here (https://github.com/DataDog/glommio/blob/master/glommio/src/sys/uring.rs#L997-L1016): if installing the eventfd fails because we ran out of SQEs, then the next time the executor runs here it will go to sleep without installing the eventfd, because the result of eventfd_src was already taken during the previous run. I'm not quite sure about it, maybe I missed something, but if the reasoning is correct, it could be solved by using the same eventfd for both foreign wakes and sleep awakening.

.or_else(Reactor::busy_ok)
.is_err()
{
// What to do here?
Collaborator

if you can't install the eventfd, then sleep should be denied.

Contributor Author

Indeed, and even in the case where EBUSY was returned.

None,
None,
);
assert!(main_ring.install_eventfd(&eventfd_src));

if !eventfd_src.is_installed().unwrap() {
Collaborator

install_eventfd already does the outer check, so why do we need to check again?

Contributor Author

Reactor::wait is called directly in sys::uring::tests::timeout_smoke_test, so the check is here to make that test pass.

glommer (Collaborator) commented Oct 8, 2021

Ok.

It seems you are also moving the eventfd handler from the main ring to the latency ring in all cases. That makes it simpler to unify.

This seems like an OK approach to me. The sleep code is tricky, but it is the kind of tricky where no amount of code reading will convince me it is fine: there are always hard-to-predict corner cases.

@HippoBaro Would you mind taking this for a production spin? Please note that you have to remove spin_before_park from the application, otherwise the sleep side of eventfd is not doing anything and we're not really testing it.

HippoBaro (Member)

> @HippoBaro Would you mind taking this for a production spin? Please note that you have to remove spin_before_park from the application, otherwise the sleep side of eventfd is not doing anything and we're not really testing it.

I'd be happy to. I'll be OOO next week though so it may take a while before I get to that.

github-actions bot commented Oct 9, 2021

Greetings @thirstycrow!

It looks like your PR added a new dependency or changed an existing one, and CI has failed to validate your changes.
Some possible reasons this could happen:

  • One of the dependencies you added uses a restricted license. See deny.toml for a list of licenses we allow;
  • One of the dependencies you added has a known security vulnerability;
  • You added or updated a dependency and didn't update the LICENSE-3rdparty.csv file. To do so, run the following and commit the changes:
$ cargo install cargo-license
$ cargo license --all-features -a -j --no-deps -d | jq -r '(["Component","Origin","License","Copyright"]) as $cols | map(. as $row | ["name", "repository", "license", "authors"] | map($row[.])) as $rows | $cols, $rows[] | @csv' > LICENSE-3rdparty.csv.ci

Thank you!

thirstycrow force-pushed the notify_on_foreign_wake branch 2 times, most recently from 0b4e59b to c09df63 on October 9, 2021 at 03:40
github-actions bot commented Oct 9, 2021

Greetings @thirstycrow!

It looks like your PR added a new dependency or changed an existing one, and CI has failed to validate your changes.
Some possible reasons this could happen:

  • One of the dependencies you added uses a restricted license. See deny.toml for a list of licenses we allow;
  • One of the dependencies you added has a known security vulnerability;
  • You added or updated a dependency and didn't update the LICENSE-3rdparty.csv file. To do so, run the following and commit the changes:
$ cargo install cargo-license
$ cargo license --all-features -a -j --no-deps -d | jq -r '(["Component","Origin","License","Copyright"]) as $cols | map(. as $row | ["name", "repository", "license", "authors"] | map($row[.])) as $rows | $cols, $rows[] | @csv' > LICENSE-3rdparty.csv.ci

Thank you!

thirstycrow force-pushed the notify_on_foreign_wake branch 2 times, most recently from 17acb3e to fcde0e1 on October 9, 2021 at 07:03
thirstycrow (Contributor Author)

@glommer Two more commits were added to avoid unnecessary writes to the eventfd; in the foreign_wake benchmark the number of writes is reduced by 86%.

@@ -278,10 +271,10 @@ impl fmt::Debug for OsError {
pub(crate) struct SleepNotifier {
id: usize,
eventfd: std::fs::File,
memory: AtomicUsize,
is_sleeping: AtomicBool,
notified: AtomicBool,
Collaborator

What's the difference between is_sleeping and notified?

Do you rely on them both being manipulated as part of the same routine? I don't see it, but please clarify.

Usually, when you have two different atomic variables that have to be set in a particular order, that's when Relaxed memory ordering is no longer enough.

0 => None,
x => Some(x as _),
pub(crate) fn notify_if_sleeping(&self) {
if self.is_sleeping.load(Ordering::Relaxed) {
Collaborator

This is very likely wrong.

Notice that this had Acquire semantics before, and you are moving it to Relaxed.
There are two atomic variables now, which in most cases is a dead giveaway that stronger consistency is needed.

Note that if the ordering is Relaxed, two threads can see those updates in arbitrary order. If you rely on the checks of both variables happening in a particular order, the last of them (inside notify_if_needed) should have Acquire/Release semantics.
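
For reference, a minimal sketch of the Acquire/Release pairing being asked for here, on a simplified two-flag notifier. The field names mirror the diff above; the struct and the exact orderings are an illustration of the suggestion, not the code in this PR:

    use std::sync::atomic::{AtomicBool, Ordering};

    struct TwoFlagNotifier {
        is_sleeping: AtomicBool,
        notified: AtomicBool,
    }

    impl TwoFlagNotifier {
        // Reactor side: clear `notified` first, then publish `is_sleeping`
        // with Release, so a notifier that observes is_sleeping == true
        // also observes notified == false.
        fn prepare_to_sleep(&self) {
            self.notified.store(false, Ordering::Relaxed);
            self.is_sleeping.store(true, Ordering::Release);
        }

        // Foreign side: the Acquire load pairs with the Release store
        // above, and the AcqRel swap keeps the "already notified?" check
        // a single read-modify-write rather than two independent Relaxed
        // operations.
        fn notify_if_sleeping(&self) {
            if self.is_sleeping.load(Ordering::Acquire)
                && !self.notified.swap(true, Ordering::AcqRel)
            {
                // write to the eventfd here
            }
        }
    }

    fn main() {
        let n = TwoFlagNotifier {
            is_sleeping: AtomicBool::new(false),
            notified: AtomicBool::new(false),
        };
        n.prepare_to_sleep();
        n.notify_if_sleeping(); // first caller would write the eventfd
        n.notify_if_sleeping(); // notified already set: no second write
    }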

@@ -370,12 +360,11 @@ impl SleepNotifier {
// This will allow this `eventfd` to be notified. This should not happen
// for the placeholder (disconnected) case.
assert_ne!(self.id, usize::MAX);
self.memory
.store(self.eventfd.as_raw_fd() as _, Ordering::SeqCst);
self.is_sleeping.store(true, Ordering::Relaxed);
Collaborator

same here.

You are reducing the memory ordering guarantees while making the code more dependent on ordering by adding another atomic variable.

If you really do believe those should be Relaxed, please try to make a case as to why (but very likely this can't be Relaxed).

Collaborator

@thirstycrow I am still failing to see why we need two atomic variables.

If you swap is_sleeping to false, wouldn't that have the same effect? Sure, the name would be misleading, but we just have to rename the variable.

The sleep code would set this to 1 and then back to 0 at wake_up. The notification code would swap it to 0 and write to the eventfd if it was previously 1. There is a race here in that we can notify twice if we swapped to 0 on the notification side and woke up at the same time. But I think your code has the same race, and it is benign if the eventfd is handled correctly: it will cause an extra exit but won't affect correctness.
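
A minimal sketch of that single-flag scheme, with SeqCst used throughout for simplicity and names chosen for clarity rather than taken from glommio:

    use std::sync::atomic::{AtomicBool, Ordering};

    // Single-flag variant described above: armed == true means "the
    // reactor is (about to be) asleep and has not been notified yet".
    struct SleepNotifier {
        armed: AtomicBool,
    }

    impl SleepNotifier {
        fn new() -> Self {
            Self { armed: AtomicBool::new(false) }
        }

        // Reactor side, just before parking on the latency ring.
        fn prepare_to_sleep(&self) {
            self.armed.store(true, Ordering::SeqCst);
        }

        // Reactor side, right after returning from sleep.
        fn wake_up(&self) {
            self.armed.store(false, Ordering::SeqCst);
        }

        // Foreign side: the swap both checks and clears the flag, so at
        // most one eventfd write normally happens per sleep cycle. As
        // noted in the comment above, a racing wake_up can still let an
        // extra write through, which is benign: one spurious reactor
        // exit, no correctness impact.
        fn notify(&self) {
            if self.armed.swap(false, Ordering::SeqCst) {
                write_eventfd_placeholder();
            }
        }
    }

    // Placeholder for the real write(2) to the registered eventfd.
    fn write_eventfd_placeholder() {
        println!("eventfd written");
    }

    fn main() {
        let n = SleepNotifier::new();
        n.prepare_to_sleep();
        n.notify(); // writes once
        n.notify(); // flag already cleared: no duplicate write
        n.wake_up();
    }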

HippoBaro (Member)

I got around to trying this out on our in-house app, and although I didn't see improvements, I didn't see any perf regression either (I disabled spin_before_park during the test). We try to limit the number of cross-CPU wakes as much as possible, so I'm not very surprised.

The change looks really good to me so pending Glauber's approval, I'll gladly merge this 🙇

glommer (Collaborator) commented Oct 19, 2021

I am happy overall but I think we need to make sure we nail the memory ordering. That's how you end up with Heisenbugs...

Let's spend some more cycles on that, and I want to know why you think Relaxed is okay for those.

thirstycrow (Contributor Author)

is_sleeping and notified are used together in notify_if_sleeping(). It would cause a problem if write_eventfd should have been called but wasn't. That would happen if is_sleeping returned true and the swap on notified also returned true when it should have returned false.
But that seems impossible because of the membarrier right after prepare_to_sleep, where is_sleeping is set to true. Is this correct?

    pub(crate) fn notify_if_sleeping(&self) {
        if self.is_sleeping.load(Ordering::Relaxed) {
            self.notify_if_needed();
        }
    }

    pub(crate) fn notify_if_needed(&self) {
        if !self.notified.swap(true, Ordering::Relaxed) {
            write_eventfd(self.eventfd_fd());
        }
    }

github-actions bot

Greetings @thirstycrow!

It looks like your PR added a new dependency or changed an existing one, and CI has failed to validate your changes.
Some possible reasons this could happen:

  • One of the dependencies you added uses a restricted license. See deny.toml for a list of licenses we allow;
  • One of the dependencies you added has a known security vulnerability;
  • You added or updated a dependency and didn't update the LICENSE-3rdparty.csv file. To do so, run the following and commit the changes:
$ cargo install cargo-license
$ cargo license --all-features -a -j --no-deps -d | jq -r '(["Component","Origin","License","Copyright"]) as $cols | map(. as $row | ["name", "repository", "license", "authors"] | map($row[.])) as $rows | $cols, $rows[] | @csv' > LICENSE-3rdparty.csv

Thank you!

github-actions bot

Greetings @thirstycrow!

It looks like your PR added a new dependency or changed an existing one, and CI has failed to validate your changes.
Some possible reasons this could happen:

  • One of the dependencies you added uses a restricted license. See deny.toml for a list of licenses we allow;
  • One of the dependencies you added has a known security vulnerability;
  • You added or updated a dependency and didn't update the LICENSE-3rdparty.csv file. To do so, run the following and commit the changes:
$ cargo install cargo-license
$ cargo license --all-features -a -j --no-deps -d | jq -r '(["Component","Origin","License","Copyright"]) as $cols | map(. as $row | ["name", "repository", "license", "authors"] | map($row[.])) as $rows | $cols, $rows[] | @csv' > LICENSE-3rdparty.csv

Thank you!

thirstycrow (Contributor Author)

@glommer You're right, one boolean should be enough; the code has been updated to work that way.

glommer (Collaborator) commented Oct 20, 2021

We are still, as a side effect, moving the memory ordering to Relaxed where it was SeqCst before. I think it should be fine, but if you have the chance to run the shared channel torture benchmark, that would be a good test.

Other than that, I'm ready to merge this once it passes all tests.

@HippoBaro I strongly suspect that the dependency bot is broken, though. Maybe let's disable it for now if it keeps breaking.

thirstycrow (Contributor Author) commented Oct 20, 2021

> I think it should be fine, but if you have the chance to run the shared channel torture benchmark, that would be a good test.

The test failed at round 84 of 100 with Too many open files. It seems the eventfd is not closed even after the executor thread finishes.

Two executors are created in each round of the test, and lsof shows 12 more eventfds after they finish. The problem exists on the master branch as well, so it's not introduced by this PR. I'll open another issue for this: #448

HippoBaro (Member)

> @HippoBaro I strongly suspect that the dependency bot is broken, though. Maybe let's disable it for now if it keeps breaking.

The bot is not broken; if you look at the stdout of the job, it fails because chrono 0.4.19 has a known security issue. Unfortunately, this time around it hasn't been fixed yet. I made this CI job (Third-party / check) non-mandatory, so it's not a blocker for merging this.

error[A001]: Potential segfault in `localtime_r` invocations
   ┌─ /github/workspace/Cargo.lock:19:1
   │
19 │ chrono 0.4.19 registry+https://github.com/rust-lang/crates.io-index
   │ ------------------------------------------------------------------- security vulnerability detected
   │
   = ID: RUSTSEC-2020-0159
   = Advisory: https://rustsec.org/advisories/RUSTSEC-2020-0159
   = ### Impact
     
     Unix-like operating systems may segfault due to dereferencing a dangling pointer in specific circumstances. This requires an environment variable to be set in a different thread than the affected functions. This may occur without the user's knowledge, notably in a third-party library.
     
     ### Workarounds
     
     No workarounds are known.
     
     ### References
     
     - [time-rs/time#293](https://github.com/time-rs/time/issues/293)
   = Announcement: https://github.com/chronotope/chrono/issues/499
   = Solution: No safe upgrade is available!
   = chrono v0.4.19
     └── tracing-subscriber v0.2.25
         └── (dev) glommio v0.6.0
             └── (dev) examples v0.0.0

advisories FAILED, bans ok, licenses ok, sources ok

waynexia (Contributor)

@HippoBaro How about having the bot check security problems against the master branch, or only triggering it when Cargo.toml changes? That would catch only newly introduced problems and leave the inspection of existing problems elsewhere.

HippoBaro (Member)

@waynexia This is a great idea. Let me merge this for now (I don't want to block you), and I will take some time to revisit the bot when I get the chance.

HippoBaro merged commit 2fb4fb4 into DataDog:master on Oct 21, 2021
thirstycrow mentioned this pull request on Nov 14, 2021