
Spawn several threads when we fail to enqueue work in the blocki… #181

Merged
2 commits merged into async-rs:master from tyler_spawn_several_threads on Nov 1, 2019

Conversation

@spacejam (Contributor)

This makes the blocking threadpool spin up far more efficiently when encountering bursts.

@vertexclique (Member)

This implementation will be ~2x slower (assuming 4 cores) due to thread congestion in every scenario and batch allocation under pressure (long bursts, short bursts, no bursts, long-running tasks, etc.), which is not desirable for most systems, including embedded systems, if this library is also targeted at the embedded domain in the future.

Missing parts:

This PR doesn't solve the system boundary problem, which was tested and fixed in #108.
The system boundary problem issue can be seen here: #126

@vertexclique (Member)

I've run the benchmarks; here are the results for this implementation:

running 2 tests
test blocking        ... bench: 567,998,519 ns/iter (+/- 381,638,949)
test blocking_single ... bench:      62,606 ns/iter (+/- 19,482)

@spacejam (Contributor, Author) commented Sep 11, 2019

@vertexclique with my measurements it behaves the same as #108, and sometimes with less jitter, because it's not dependent on the single thread that only adjusts every 200ms. The main thing #108 improves over master is that it will spawn more than one thread at a time. However, as we discussed, #108 is more complex than necessary, and in any case would require significant clean-up before it reaches a mergeable state.

Blocking 10k bench, measuring the spawn latency distribution in microseconds per spawn, on a 2018 i7 MacBook Pro with no foreground processes running other than tmux:

master:

Histogram[(0 -> 2) (50 -> 4) (75 -> 4) (90 -> 5) (95 -> 6) (97.5 -> 10) (99 -> 14) (99.9 -> 38) (99.99 -> 241) (100 -> 6973) ]

#108:

Histogram[(0 -> 1) (50 -> 2) (75 -> 2) (90 -> 3) (95 -> 3) (97.5 -> 5) (99 -> 8) (99.9 -> 44) (99.99 -> 79) (100 -> 5883) ]

this:

Histogram[(0 -> 1) (50 -> 1) (75 -> 1) (90 -> 2) (95 -> 3) (97.5 -> 3) (99 -> 5) (99.9 -> 18) (99.99 -> 364) (100 -> 5431) ]

In general with performance-related code, please test using a proper histogram library like historian to get a sense of tail latency for actual requests; I think it will make conversations around performance more productive. Statements like "this is 2x slower" are not specific or actionable.

From my measurements, I'm getting roughly the same performance, with a small fraction of the additional code and almost none of the math complexity or bespoke thread operational considerations.
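
For readers who haven't looked at #108: the sketch below shows the general shape of an EMA-driven sizing loop with a single adjuster thread on a ~200ms cadence, which is the only detail taken from this thread. The smoothing factor, the backlog metric, and the scale-up rule are invented for illustration; this is not the #108 code.

use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use std::time::Duration;

// Purely illustrative EMA sizing loop; see the caveats above.
static QUEUED: AtomicUsize = AtomicUsize::new(0); // tasks waiting for a worker
static WORKERS: AtomicUsize = AtomicUsize::new(0); // live worker threads

fn spawn_worker() {
    WORKERS.fetch_add(1, Ordering::Relaxed);
    thread::spawn(|| {
        // A real worker would pull tasks from the pool's channel here.
    });
}

fn start_scaler() {
    thread::spawn(|| {
        let alpha = 0.3; // smoothing factor (assumed)
        let mut ema = 0.0_f64;
        loop {
            thread::sleep(Duration::from_millis(200));
            // Smooth the observed backlog so short spikes don't thrash the pool.
            let backlog = QUEUED.load(Ordering::Relaxed) as f64;
            ema = alpha * backlog + (1.0 - alpha) * ema;
            // Grow toward the smoothed demand, one decision per 200ms tick,
            // which is why a sudden burst can wait several ticks for threads.
            let want = ema.ceil() as usize;
            for _ in WORKERS.load(Ordering::Relaxed)..want {
                spawn_worker();
            }
        }
    });
}

fn main() {
    start_scaler();
    QUEUED.store(32, Ordering::Relaxed); // pretend a burst of work arrived
    thread::sleep(Duration::from_secs(1));
    println!("workers after ~5 ticks: {}", WORKERS.load(Ordering::Relaxed));
}

The point of contention above is exactly that single 200ms decision cadence versus growing the pool immediately at the moment an enqueue fails.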

@spacejam (Contributor, Author) commented Sep 11, 2019

Curious to see your results on more hardware, here's the general recipe:

diff --git a/Cargo.toml b/Cargo.toml
index ca0f3fc..f9f40b3 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -25,6 +25,7 @@ unstable = []

 [dependencies]
 async-task = "1.0.0"
+historian = "*"
 cfg-if = "0.1.9"
 crossbeam-channel = "0.3.9"
 futures-core-preview = "0.3.0-alpha.18"
diff --git a/src/task/blocking.rs b/src/task/blocking.rs
index cb2b891..6cc5a3b 100644
--- a/src/task/blocking.rs
+++ b/src/task/blocking.rs
@@ -45,6 +45,7 @@ lazy_static! {
         let (sender, receiver) = bounded(0);
         Pool { sender, receiver }
     };
+    pub static ref SPAWN_TIME: historian::Histo = historian::Histo::default();
 }

 // Create up to MAX_THREADS dynamic blocking task worker threads.
@@ -108,7 +109,9 @@ where
     R: Send + 'static,
 {
     let (task, handle) = async_task::spawn(future, schedule, ());
+    let now = std::time::Instant::now();
     task.schedule();
+    SPAWN_TIME.measure(now.elapsed().as_micros() as f64);
     JoinHandle(handle)
 }

diff --git a/src/task/mod.rs b/src/task/mod.rs
index eef7284..2c5d8db 100644
--- a/src/task/mod.rs
+++ b/src/task/mod.rs
@@ -36,4 +36,4 @@ mod pool;
 mod sleep;
 mod task;

-pub(crate) mod blocking;
+pub mod blocking;

@spacejam (Contributor, Author)

oh, actual bench code:

λ cat src/bin/blocking_histos.rs
use async_std::task;
use std::thread;
use std::time::Duration;

fn main() {
    // Queue 10k blocking tasks; the handles are kept but never joined, since
    // only the per-spawn latency histogram matters here.
    let _handles = (0..10_000)
        .map(|_| {
            task::blocking::spawn(async {
                let duration = Duration::from_millis(1);
                thread::sleep(duration);
            })
        })
        .collect::<Vec<_>>();

    task::blocking::SPAWN_TIME.print_percentiles();
}

@vertexclique (Member) commented Sep 11, 2019

I think what you are measuring is the spawn time on the pool. It will always be approximately the same in every implementation.

Also, this histogram will give the exact same numbers in every implementation that spawns onto the thread pool in the error case. The only difference will be caused by how many threads you spawn dynamically, and that will only scale the percentile values up toward the 0th percentile. Comparing that time to the EMA PR is comparing apples to nothing, because the spawn doesn't happen there.

> From my measurements, I'm getting roughly the same performance, with a small fraction of the additional code and almost none of the math complexity or bespoke thread operational considerations.

Total job in/out time is one thing; job-in rate overhead is another. You are measuring the latter.


Since the EMA PR didn't fit well (and I also don't think it was well-read), I am going to close it. It is up to you what you want to use.
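
For context, the two quantities being debated can be measured separately. The sketch below is illustrative only: it assumes the same pub mod blocking change from the diff above (so task::blocking::spawn is reachable from a binary) and that the returned JoinHandle can be awaited.

use std::thread;
use std::time::{Duration, Instant};

use async_std::task;

fn main() {
    let start = Instant::now();

    // Job-in rate overhead: how long it takes just to queue 10k blocking tasks.
    let handles = (0..10_000)
        .map(|_| {
            task::blocking::spawn(async {
                thread::sleep(Duration::from_millis(1));
            })
        })
        .collect::<Vec<_>>();
    let spawn_in = start.elapsed();

    // Total job in/out time: queueing plus waiting for every task to finish.
    task::block_on(async {
        for handle in handles {
            handle.await;
        }
    });
    let end_to_end = start.elapsed();

    println!("spawn-in: {:?}, end-to-end: {:?}", spawn_in, end_to_end);
}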

@naftulikay

Has the work on #108, the exponential moving average thread pool design, been closed and rejected? I'm not seeing an implementation of that here in this PR.

@spacejam (Contributor, Author)

@naftulikay this PR does not use an exponential moving average, it just spins up more than one thread at a time to get pretty much the same effect with far less complexity.
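
Roughly, the idea looks like the sketch below. This is an illustration, not the patch itself: the real pool in src/task/blocking.rs schedules async_task tasks, and MAX_THREADS, BATCH, and the one-second idle timeout here are assumed values rather than the crate's constants.

use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use std::time::Duration;

use crossbeam_channel::{bounded, Receiver, Sender};
use lazy_static::lazy_static;

// Toy stand-in for the pool's work items; the real pool runs async_task tasks.
type Task = Box<dyn FnOnce() + Send + 'static>;

const MAX_THREADS: usize = 10_000; // assumed cap
const BATCH: usize = 4; // workers started per failed enqueue (assumed)

static THREAD_COUNT: AtomicUsize = AtomicUsize::new(0);

lazy_static! {
    // Zero-capacity channel: a send only succeeds if a worker is already
    // parked in recv, so a failed try_send means "no idle workers".
    static ref QUEUE: (Sender<Task>, Receiver<Task>) = bounded(0);
}

fn spawn_worker() {
    THREAD_COUNT.fetch_add(1, Ordering::SeqCst);
    thread::spawn(|| {
        // Serve tasks until this worker sits idle for a while, then exit.
        while let Ok(task) = QUEUE.1.recv_timeout(Duration::from_secs(1)) {
            task();
        }
        THREAD_COUNT.fetch_sub(1, Ordering::SeqCst);
    });
}

pub fn spawn_blocking(task: Task) {
    // Fast path: hand the task to an already-idle worker.
    if let Err(err) = QUEUE.0.try_send(task) {
        // Slow path: every worker is busy. Start several workers at once
        // rather than one, so bursts ramp the pool up quickly, then enqueue.
        let running = THREAD_COUNT.load(Ordering::SeqCst);
        for _ in 0..BATCH.min(MAX_THREADS.saturating_sub(running)) {
            spawn_worker();
        }
        QUEUE.0.send(err.into_inner()).unwrap();
    }
}

fn main() {
    for i in 0..16 {
        spawn_blocking(Box::new(move || println!("blocking task {} done", i)));
    }
    thread::sleep(Duration::from_millis(100)); // let the workers drain
}

In this sketch the zero-capacity channel keeps the backpressure behavior of the bounded(0) channel shown in the diff above; the only difference from a one-at-a-time policy is that a failed enqueue grows the pool by a small batch.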

@gnzlbg commented Sep 22, 2019

Is there a benchmark comparing this with EMA somewhere?

@vertexclique (Member)

@gnzlbg Yes. In the EMA branch there is a benchmark comparing the total time of execution for any implementation of the blocking pool.

You can run it with this branch. I ran and wrote the results in the second comment.

The EMA benchmark result is in the PR description.

@yoshuawuyts (Contributor)

ping @spacejam @stjepang, what would we like to do with this PR?

@ghost force-pushed the tyler_spawn_several_threads branch from 726ad2c to ee50156 on November 1, 2019 at 19:55
@ghost commented Nov 1, 2019

Rebased onto master. Merge if this passes CI :)

@ghost commented Nov 1, 2019

Switched to unbounded channels because that improves our benchmark numbers here by an order of magnitude: https://github.com/jebrosen/async-file-benchmark

Reading a 256K file:

  • tokio: 0.136 sec
  • async_std: 0.086 sec
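
For readers following the channel change, here is a minimal sketch of the semantic difference in crossbeam-channel terms; it illustrates the channel behavior only and is not the actual diff.

use crossbeam_channel::{bounded, unbounded};

fn main() {
    // Rendezvous channel (capacity 0): try_send fails unless a receiver is
    // already blocked in recv, i.e. unless an idle worker is waiting.
    let (tx, _rx) = bounded::<u32>(0);
    assert!(tx.try_send(1).is_err());

    // Unbounded channel: try_send always succeeds and the item is buffered
    // until some worker picks it up, so handing off work never has to wait.
    let (tx, rx) = unbounded::<u32>();
    assert!(tx.try_send(1).is_ok());
    assert_eq!(rx.recv().unwrap(), 1);
}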

@ghost changed the title from "Spawn several threads when we fail to enqueue work in the blocking threadpool" to "Spawn several threads when we fail to enqueue work in the blocki…" on Nov 1, 2019
@ghost merged commit 5adc608 into async-rs:master on Nov 1, 2019
This pull request was closed.