Reduce foreign wake latency #434

Merged
merged 7 commits into DataDog:master from thirstycrow:notify_on_foreign_wake on Oct 21, 2021

Conversation

thirstycrow (Contributor)

What does this PR do?

Reduce foreign wake latency by sending notifications via the eventfd installed on the latency ring. Notifications are only sent if the task being woken was spawned on a queue with a latency requirement. Below is a comparison, produced by the foreign_wake benchmark, of running with and without notifications; a standalone sketch of the underlying eventfd wakeup mechanism follows the numbers:

$ RUST_LOG=debug RUST_BACKTRACE=1 cargo bench --bench foreign_wake 
    Finished bench [optimized] target(s) in 0.04s
     Running unittests (target/release/deps/foreign_wake-6d33a328eb2cc763)

Latency requirement: NotImportant
Label              Min   Mean    P99  P99.9    Max
glommio->tokio:  0.014  0.285  0.614  0.822  0.902
tokio schedule:  0.000  0.008  0.055  0.123  0.178
tokio process :  0.515  1.962  2.965  3.067  3.161
tokio->glommio: 11.816 96.668 97.983 99.135 99.135

Latency requirement: Matters(100ms)
Label              Min   Mean    P99  P99.9    Max
glommio->tokio:  0.000  0.169  0.399  0.518  0.564
tokio schedule:  0.000  0.018  0.100  0.176  0.213
tokio process :  0.513  1.794  2.415  2.717  2.873
tokio->glommio:  0.004  0.374  0.633  0.677  0.730
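
For readers unfamiliar with the mechanism, here is a minimal standalone sketch of how an eventfd wakes a blocked thread. It uses the libc crate and a blocking read(2) as a stand-in for the reactor; in glommio the read side is the eventfd registered on the latency io_uring, so this illustrates the primitive, not the PR's code.

    use std::thread;
    use std::time::Duration;

    fn main() {
        // Create the eventfd the "reactor" will block on (libc crate assumed).
        let efd = unsafe { libc::eventfd(0, 0) };
        assert!(efd >= 0);

        // Stand-in for the sleeping reactor: blocks until the counter is written.
        let reactor = thread::spawn(move || {
            let mut counter = 0u64;
            let n = unsafe {
                libc::read(efd, &mut counter as *mut u64 as *mut libc::c_void, 8)
            };
            assert_eq!(n, 8);
            println!("reactor woken, counter = {counter}");
        });

        // Stand-in for a foreign thread waking a latency-sensitive task:
        // writing 8 bytes to the eventfd unblocks the reader immediately.
        thread::sleep(Duration::from_millis(100));
        let one = 1u64;
        let n = unsafe {
            libc::write(efd, &one as *const u64 as *const libc::c_void, 8)
        };
        assert_eq!(n, 8);

        reactor.join().unwrap();
        unsafe { libc::close(efd) };
    }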

Motivation

What inspired you to submit this pull request?

Related issues

A list of issues either fixed, containing architectural discussions, or otherwise relevant to this pull request.

Additional Notes

Anything else we should know when reviewing?

Checklist

[] I have added unit tests to the code I am submitting
[] My unit tests cover both failure and success scenarios
[] If applicable, I have discussed my architecture

glommer (Collaborator) left a comment

Great work, and great numbers!

My main concern is that the eventfd handling code has broken a lot in the past. It is tricky and hard to test. Because of that, I am not convinced that using the same eventfd for both kinds of wake-up notification is the way to go. We would need to add more special cases to code that is already full of them.

Have you considered using a separate eventfd?

@@ -1604,14 +1601,16 @@ impl Reactor {
membarrier::heavy();
let events = process_remote_channels() + self.flush_syscall_thread();
if events == 0 {
self.link_rings_and_sleep(&mut main_ring, &self.eventfd_src)
.expect("some error");
if self.eventfd_src.is_installed().unwrap() {
Collaborator

I wonder if we shouldn't register two eventfds for this?

The wakeup code is tricky, and you are effectively adding one more condition.

Contributor Author

It'd be better to check the eventfd before linking the rings, because if the eventfd is not installed, then we're not going to sleep, so there's no need to do the linking at all.

@@ -27,6 +27,9 @@ pub(crate) struct Header {
/// Current state of the task.
pub(crate) state: u8,

/// Latency matters or not
pub(crate) latency_matters: bool,
Collaborator

It may be better to represent it as Option<RawFd> and store Some(eventfd); None would signal that we don't need to write to the eventfd.

@@ -221,6 +227,9 @@ where
dbg_context!(ptr, "foreign", {
let notifier = raw.notifier();
notifier.queue_waker(Waker::from_raw(Self::clone_waker(ptr)));
if (*raw.header).latency_matters {
Collaborator

If we use a separate eventfd, we can then write if let Some(fd) = latency_matters { write(fd) }.
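
A minimal sketch of that representation, assuming a simplified Header with a hypothetical latency_eventfd field and a placeholder in place of the real eventfd write; this illustrates the suggestion, not glommio's actual types:

    use std::os::unix::io::RawFd;

    // Simplified stand-in for the task header discussed above; the field
    // name latency_eventfd is hypothetical.
    struct Header {
        // Some(fd): waking this task should write to the latency ring's
        // eventfd. None: no notification is needed.
        latency_eventfd: Option<RawFd>,
    }

    // Placeholder for the real write to the registered eventfd.
    fn write_eventfd(fd: RawFd) {
        println!("would write to eventfd {fd}");
    }

    // Foreign-wake path: queue the waker as before (elided), then notify
    // only when an eventfd was attached at spawn time.
    fn on_foreign_wake(header: &Header) {
        if let Some(fd) = header.latency_eventfd {
            write_eventfd(fd);
        }
    }

    fn main() {
        on_foreign_wake(&Header { latency_eventfd: Some(42) }); // notifies
        on_foreign_wake(&Header { latency_eventfd: None });     // does not
    }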

thirstycrow (Contributor Author) commented Oct 6, 2021

@glommer There seems to be an issue with the existing code here (https://github.com/DataDog/glommio/blob/master/glommio/src/sys/uring.rs#L997-L1016): if installing the eventfd fails because we ran out of SQEs, then the next time the executor runs here it will go to sleep without installing the eventfd, because the result of eventfd_src was already taken during the previous run. I'm not quite sure about it, maybe I missed something, but if the reasoning is correct, it could be solved by using the same eventfd for both foreign wakes and sleep awakening.

.or_else(Reactor::busy_ok)
.is_err()
{
// What to do here?
Collaborator

if you can't install the eventfd, then sleep should be denied.

Contributor Author

Indeed, and even in the case where EBUSY was returned.

None,
None,
);
assert!(main_ring.install_eventfd(&eventfd_src));

if !eventfd_src.is_installed().unwrap() {
Collaborator

install_eventfd already does the outer check, so why do we need to check again?

Contributor Author

Reactor::wait is called directly in sys::uring::tests::timeout_smoke_test, so the check is here to make that test pass.

glommer (Collaborator) commented Oct 8, 2021

Ok.

It seems you are also moving the eventfd handler from the main ring to the latency ring in all cases. That makes it simpler to unify.

This seems like an OK approach to me. The sleep code is tricky, but it is the kind of tricky where no amount of code reading will convince me it is fine: there are always hard-to-predict corner cases.

@HippoBaro Would you mind taking this for a production spin? Please note that you have to remove spin_before_park from the application, otherwise the sleep side of eventfd is not doing anything and we're not really testing it.

HippoBaro (Member)

> @HippoBaro Would you mind taking this for a production spin? Please note that you have to remove spin_before_park from the application, otherwise the sleep side of eventfd is not doing anything and we're not really testing it.

I'd be happy to. I'll be OOO next week though so it may take a while before I get to that.

github-actions bot commented Oct 9, 2021

Greetings @thirstycrow!

It looks like your PR added a new dependency or changed an existing one, and CI has failed to validate your changes.
Some possible reasons this could happen:

  • One of the dependencies you added uses a restricted license. See deny.toml for a list of licenses we allow;
  • One of the dependencies you added has a known security vulnerability;
  • You added or updated a dependency and didn't update the LICENSE-3rdparty.csv file. To do so, run the following and commit the changes:
$ cargo install cargo-license
$ cargo license --all-features -a -j --no-deps -d | jq -r '(["Component","Origin","License","Copyright"]) as $cols | map(. as $row | ["name", "repository", "license", "authors"] | map($row[.])) as $rows | $cols, $rows[] | @csv' > LICENSE-3rdparty.csv.ci

Thank you!

thirstycrow force-pushed the notify_on_foreign_wake branch 2 times, most recently from 0b4e59b to c09df63 on October 9, 2021 at 03:40
github-actions bot commented Oct 9, 2021

Greetings @thirstycrow!

It looks like your PR added a new dependency or changed an existing one, and CI has failed to validate your changes.
Some possible reasons this could happen:

  • One of the dependencies you added uses a restricted license. See deny.toml for a list of licenses we allow;
  • One of the dependencies you added has a known security vulnerability;
  • You added or updated a dependency and didn't update the LICENSE-3rdparty.csv file. To do so, run the following and commit the changes:
$ cargo install cargo-license
$ cargo license --all-features -a -j --no-deps -d | jq -r '(["Component","Origin","License","Copyright"]) as $cols | map(. as $row | ["name", "repository", "license", "authors"] | map($row[.])) as $rows | $cols, $rows[] | @csv' > LICENSE-3rdparty.csv.ci

Thank you!

thirstycrow force-pushed the notify_on_foreign_wake branch 2 times, most recently from 17acb3e to fcde0e1 on October 9, 2021 at 07:03
thirstycrow (Contributor Author)

@glommer Two more commits were added to avoid unnecessary writes to the eventfd; in the foreign_wake benchmark the number of writes is reduced by 86%.

@@ -278,10 +271,10 @@ impl fmt::Debug for OsError {
pub(crate) struct SleepNotifier {
id: usize,
eventfd: std::fs::File,
memory: AtomicUsize,
is_sleeping: AtomicBool,
notified: AtomicBool,
Collaborator

What's the difference between is_sleeping and notified?

Do you rely on them both being manipulated as part of the same routine? I don't see it, but please clarify.

Usually, when you have two different atomic variables that have to be set in a particular order, that's when Relaxed memory ordering is no longer enough.

0 => None,
x => Some(x as _),
pub(crate) fn notify_if_sleeping(&self) {
if self.is_sleeping.load(Ordering::Relaxed) {
Collaborator

This is very likely wrong.

Notice that this had Acquire semantics before, and you are moving it to Relaxed.
There are two atomic variables now, which in most cases is a dead giveaway that stronger consistency is needed.

Note that if the ordering is Relaxed, two threads can see those updates in arbitrary order. If you rely on the checks of both variables happening in a particular order, the last of them (inside notify_if_needed) should have Acquire/Release semantics.
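
For reference, a minimal sketch of the Acquire/Release pairing being asked for here, on a simplified two-flag notifier. The field names mirror the diff above; the struct and the exact orderings are an illustration of the suggestion, not the code in this PR:

    use std::sync::atomic::{AtomicBool, Ordering};

    struct TwoFlagNotifier {
        is_sleeping: AtomicBool,
        notified: AtomicBool,
    }

    impl TwoFlagNotifier {
        // Reactor side: clear `notified` first, then publish `is_sleeping`
        // with Release, so a notifier that observes is_sleeping == true
        // also observes notified == false.
        fn prepare_to_sleep(&self) {
            self.notified.store(false, Ordering::Relaxed);
            self.is_sleeping.store(true, Ordering::Release);
        }

        // Foreign side: the Acquire load pairs with the Release store
        // above, and the AcqRel swap keeps the "already notified?" check
        // a single read-modify-write rather than two independent Relaxed
        // operations.
        fn notify_if_sleeping(&self) {
            if self.is_sleeping.load(Ordering::Acquire)
                && !self.notified.swap(true, Ordering::AcqRel)
            {
                // write to the eventfd here
            }
        }
    }

    fn main() {
        let n = TwoFlagNotifier {
            is_sleeping: AtomicBool::new(false),
            notified: AtomicBool::new(false),
        };
        n.prepare_to_sleep();
        n.notify_if_sleeping(); // first caller would write the eventfd
        n.notify_if_sleeping(); // notified already set: no second write
    }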

@@ -370,12 +360,11 @@ impl SleepNotifier {
// This will allow this `eventfd` to be notified. This should not happen
// for the placeholder (disconnected) case.
assert_ne!(self.id, usize::MAX);
self.memory
.store(self.eventfd.as_raw_fd() as _, Ordering::SeqCst);
self.is_sleeping.store(true, Ordering::Relaxed);
Collaborator

same here.

You are reducing the memory ordering guarantees while making the code more dependent on ordering by adding another atomic variable.

If you really do believe those should be Relaxed, please try to make a case as to why (but very likely this can't be Relaxed).

Collaborator

@thirstycrow I am still failing to see why we need two atomic variables.

If you swap is_sleeping to false, wouldn't that have the same effect? Sure, the name would be misleading, but we just have to rename the variable.

The sleep code would set this to 1 and then back to 0 at wake_up. The notification code would swap it to 0 and write to the eventfd if it was previously 1. There is a race here in that we can notify twice if we swapped to 0 on the notification side and woke up at the same time. But I think your code has the same race, and it is benign if the eventfd is handled correctly: it will cause an extra exit but won't affect correctness.
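
A minimal sketch of that single-flag scheme, with SeqCst used throughout for simplicity and names chosen for clarity rather than taken from glommio:

    use std::sync::atomic::{AtomicBool, Ordering};

    // Single-flag variant described above: armed == true means "the
    // reactor is (about to be) asleep and has not been notified yet".
    struct SleepNotifier {
        armed: AtomicBool,
    }

    impl SleepNotifier {
        fn new() -> Self {
            Self { armed: AtomicBool::new(false) }
        }

        // Reactor side, just before parking on the latency ring.
        fn prepare_to_sleep(&self) {
            self.armed.store(true, Ordering::SeqCst);
        }

        // Reactor side, right after returning from sleep.
        fn wake_up(&self) {
            self.armed.store(false, Ordering::SeqCst);
        }

        // Foreign side: the swap both checks and clears the flag, so at
        // most one eventfd write normally happens per sleep cycle. As
        // noted in the comment above, a racing wake_up can still let an
        // extra write through, which is benign: one spurious reactor
        // exit, no correctness impact.
        fn notify(&self) {
            if self.armed.swap(false, Ordering::SeqCst) {
                write_eventfd_placeholder();
            }
        }
    }

    // Placeholder for the real write(2) to the registered eventfd.
    fn write_eventfd_placeholder() {
        println!("eventfd written");
    }

    fn main() {
        let n = SleepNotifier::new();
        n.prepare_to_sleep();
        n.notify(); // writes once
        n.notify(); // flag already cleared: no duplicate write
        n.wake_up();
    }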

HippoBaro (Member)

I got around to trying this out on our in-house app, and although I didn't see improvements, I didn't see any perf regression either (I disabled spin_before_park during the test). We try to limit the number of cross-CPU wakes as much as possible, so I'm not very surprised.

The change looks really good to me so pending Glauber's approval, I'll gladly merge this 🙇

glommer (Collaborator) commented Oct 19, 2021

I am happy overall but I think we need to make sure we nail the memory ordering. That's how you end up with Heisenbugs...

Let's spend some more cycles on that, and I want to know why you think Relaxed is okay for those.

thirstycrow (Contributor Author)

is_sleeping and notified are used together in notify_if_sleeping(). It would cause a problem if write_eventfd should have been called but wasn't. That would happen if is_sleeping returned true and the swap on notified also returned true when it should have returned false.
But that seems impossible because of the membarrier right after prepare_to_sleep, where is_sleeping is set to true. Is this correct?

    pub(crate) fn notify_if_sleeping(&self) {
        if self.is_sleeping.load(Ordering::Relaxed) {
            self.notify_if_needed();
        }
    }

    pub(crate) fn notify_if_needed(&self) {
        if !self.notified.swap(true, Ordering::Relaxed) {
            write_eventfd(self.eventfd_fd());
        }
    }

github-actions bot

Greetings @thirstycrow!

It looks like your PR added a new dependency or changed an existing one, and CI has failed to validate your changes.
Some possible reasons this could happen:

  • One of the dependencies you added uses a restricted license. See deny.toml for a list of licenses we allow;
  • One of the dependencies you added has a known security vulnerability;
  • You added or updated a dependency and didn't update the LICENSE-3rdparty.csv file. To do so, run the following and commit the changes:
$ cargo install cargo-license
$ cargo license --all-features -a -j --no-deps -d | jq -r '(["Component","Origin","License","Copyright"]) as $cols | map(. as $row | ["name", "repository", "license", "authors"] | map($row[.])) as $rows | $cols, $rows[] | @csv' > LICENSE-3rdparty.csv

Thank you!

github-actions bot

Greetings @thirstycrow!

It looks like your PR added a new dependency or changed an existing one, and CI has failed to validate your changes.
Some possible reasons this could happen:

  • One of the dependencies you added uses a restricted license. See deny.toml for a list of licenses we allow;
  • One of the dependencies you added has a known security vulnerability;
  • You added or updated a dependency and didn't update the LICENSE-3rdparty.csv file. To do so, run the following and commit the changes:
$ cargo install cargo-license
$ cargo license --all-features -a -j --no-deps -d | jq -r '(["Component","Origin","License","Copyright"]) as $cols | map(. as $row | ["name", "repository", "license", "authors"] | map($row[.])) as $rows | $cols, $rows[] | @csv' > LICENSE-3rdparty.csv

Thank you!

thirstycrow (Contributor Author)

@glommer You're right, one boolean should be enough; the code has been updated to work that way.

glommer (Collaborator) commented Oct 20, 2021

We are still, as a side effect, moving the memory ordering to Relaxed where it was SeqCst before. I think it should be fine, but if you have the chance to run the shared channel torture benchmark, that would be a good test.

Other than that, I'm ready to merge this once it passes all tests.

@HippoBaro I strongly suspect that the dependency bot is broken, though. Maybe let's disable it for now if it keeps breaking.

thirstycrow (Contributor Author) commented Oct 20, 2021

> I think it should be fine, but if you have the chance to run the shared channel torture benchmark, that would be a good test.

The test failed at round 84 of 100 with Too many open files. It seems the eventfd is not closed even after the executor thread finishes.

Two executors are created in each round of the test, and lsof shows 12 more eventfds after they finish. The problem exists on the master branch as well, so it's not introduced by this PR. I'll open another issue for this: #448

HippoBaro (Member)

> @HippoBaro I strongly suspect that the dependency bot is broken, though. Maybe let's disable it for now if it keeps breaking.

The bot is not broken; if you look at the stdout of the job, it fails because chrono 0.4.19 has a known security issue. Unfortunately, this time around it hasn't been fixed yet. I made this CI job (Third-party / check) non-mandatory, so it's not a blocker for merging this.

error[A001]: Potential segfault in `localtime_r` invocations
   ┌─ /github/workspace/Cargo.lock:19:1
   │
19 │ chrono 0.4.19 registry+https://github.com/rust-lang/crates.io-index
   │ ------------------------------------------------------------------- security vulnerability detected
   │
   = ID: RUSTSEC-2020-0159
   = Advisory: https://rustsec.org/advisories/RUSTSEC-2020-0159
   = ### Impact
     
     Unix-like operating systems may segfault due to dereferencing a dangling pointer in specific circumstances. This requires an environment variable to be set in a different thread than the affected functions. This may occur without the user's knowledge, notably in a third-party library.
     
     ### Workarounds
     
     No workarounds are known.
     
     ### References
     
     - [time-rs/time#293](https://github.com/time-rs/time/issues/293)
   = Announcement: https://github.com/chronotope/chrono/issues/499
   = Solution: No safe upgrade is available!
   = chrono v0.4.19
     └── tracing-subscriber v0.2.25
         └── (dev) glommio v0.6.0
             └── (dev) examples v0.0.0

advisories FAILED, bans ok, licenses ok, sources ok

waynexia (Contributor)

@HippoBaro How about having the bot check security problems against the master branch, or only triggering it when Cargo.toml changes? That would catch only newly introduced problems and leave the inspection of existing problems elsewhere.

HippoBaro (Member)

@waynexia This is a great idea. Let me merge this for now (I don't want to block you), and I will take some time to revisit the bot when I get the chance.

HippoBaro merged commit 2fb4fb4 into DataDog:master on Oct 21, 2021
thirstycrow mentioned this pull request on Nov 14, 2021