New scheduler resilient to blocking #631

Open
stjepang wants to merge 2 commits into master from stjepang:new-scheduler

Conversation

@stjepang (Member) commented Dec 16, 2019

This PR replaces the current scheduler with a new one. Differences:

  • The executor and reactor are more closely coupled. We don't have a networking driver thread anymore. Instead, executor threads poll mio.
  • The executor creates new threads only when there is too much work to handle. This way the executor automatically becomes single-threaded if one thread can handle all the work.
  • The executor detects blocking tasks and hands off the event loop to a new thread. With this strategy, we don't need spawn_blocking and can eventually remove it. Note that spawn_blocking now simply calls spawn.
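
A minimal sketch of what that forwarding can look like (an assumed shape, not necessarily the PR's exact code):

use async_std::task::{spawn, JoinHandle};

pub fn spawn_blocking<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    // The closure simply runs inside an ordinary task; the new scheduler
    // detects if it blocks and hands the event loop to another thread.
    spawn(async move { f() })
}
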
@yoshuawuyts (Contributor) commented Dec 16, 2019

@stjepang CI seems to be failing due to missing imports. Wondering if something in a rebase maybe went wrong?

@sdroege commented Dec 16, 2019

  • The executor creates new threads only when there is too much work to handle. This way the executor automatically becomes single-threaded if one thread can handle all the work.

OOC, what are the thresholds/rules for when it adds a new thread or removes one, and when adding or removing a thread, how does it distribute the existing tasks and event sources (fds, etc.) among the threads?

@vertexclique commented on 36d24cd Dec 16, 2019

Neato @stjepang ❤️ Much love!

let reactor = &REACTOR;
let mut events = mio::Events::with_capacity(1000);

loop {
    // Block on the poller until at least one new event comes in.

@rafaelcaricio commented Dec 16, 2019

Or until timeout.

@asonix commented Dec 16, 2019

I'd love an overview of how "this executor detects blocking tasks" works. Do tasks need to identify themselves as containing blocking operations or is there something smarter going on?

static YIELD_NOW: Cell<bool> = Cell::new(false);
}

/// Scheduler state.

@yoshuawuyts (Contributor) commented Dec 16, 2019

I would love a brief overview of the terminology here. Words like "machine" and "processor" mean specific things in this context, and understanding them is necessary to make sense of what's going on.

It seems like a "machine" is somewhere between a thread and a task. Is this like the old blocking pool? And a "processor" exists as one per core, so I assume it's just the general worker queue?

@overdrivenpotato commented Dec 16, 2019

If you use this scheduler with a blocking database library like diesel, won't you effectively end up with one thread per connection if latency is over 10ms? Why even bother using async-std instead of just writing a blocking server in that case?

From the blog post:

The new runtime enables you to fearlessly use synchronous libraries like diesel or rayon in any async-std application.

@skade (Collaborator) commented Dec 16, 2019

@overdrivenpotato Because it still gives you evented IO on the outside before you start querying the DB, at which point you usually no longer have a (potentially slow) client still sending.


// If there is a machine that is stuck on a task and not making any progress, steal its
// processor and set up a new machine to take over.
for m in &mut sched.machines {

@lloydmeta commented Dec 16, 2019

Interesting, it looks like the Runtime's run loop works in tandem with each machine's run loop to set-and-detect progress for each machine and spawns things as required?

Specifically, the scheduler's Runtime's run loop invokes make_machines, which sets each existing Machine's progress to false, and each Machine's, while running, constantly sets it back to true. So long as each Machine sets its own progress to true faster than the Runtime's run loop invokes make_machines (dependent on the delay, which is itself dependent on the need for new machines), no new Machines get spawned?
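
A rough sketch of that progress-flag handshake, with assumed names rather than the PR's exact code:

use std::sync::atomic::{AtomicBool, Ordering};

struct Machine {
    /// Set back to true by the machine's own run loop after each unit of work.
    progress: AtomicBool,
}

impl Machine {
    fn mark_progress(&self) {
        self.progress.store(true, Ordering::SeqCst);
    }
}

/// Called periodically by the runtime's run loop (e.g. from make_machines):
/// clear the flag and treat a machine that never set it back as stuck.
fn is_stuck(m: &Machine) -> bool {
    !m.progress.swap(false, Ordering::SeqCst)
}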

@overdrivenpotato commented Dec 17, 2019

@skade I see, thanks for the response. However, this doesn't alleviate the one-thread-per-connection problem. I think an example will clear up what I'm getting at with the stjepang:new-scheduler branch:

I'm running into a panic! when running the following code:

use async_std::task;
use std::{thread, time::Duration};

const THREADS: usize = 10_000;

async fn make_response() {
    // Some long-running synchronous task...
    thread::sleep(Duration::from_millis(5000));
}

async fn async_main() {
    // Simulate 10k *successful* requests - we only need a response.
    for _ in 0..THREADS {
        task::spawn(make_response());
    }

    // Wait for all tasks to spawn & finish.
    thread::sleep(Duration::from_millis(30000));
}

fn main() {
    task::block_on(async_main());
    println!("Done.");
}
thread 'async-std/runtime' panicked at 'cannot start a machine thread: Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" }', src/libcore/result.rs:1165:5
stack backtrace:
   0:        0x109d55495 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h66f38d4abb2e41c1
   1:        0x109d6ddc0 - core::fmt::write::h4f487e714088986d
   2:        0x109d5391b - std::io::Write::write_fmt::h7bd7b6fe8b47c5fb
   3:        0x109d5710a - std::panicking::default_hook::{{closure}}::heee79636c241547c
   4:        0x109d56e15 - std::panicking::default_hook::hfcbd07059d15441e
   5:        0x109d577d7 - std::panicking::rust_panic_with_hook::h0c4b67125f55410a
   6:        0x109d5734d - std::panicking::continue_panic_fmt::h0e74ab2b215a1401
   7:        0x109d572a9 - rust_begin_unwind
   8:        0x109d7208f - core::panicking::panic_fmt::h09741a3213dba543
   9:        0x109d72169 - core::result::unwrap_failed::hf47c31c429b02014
  10:        0x109d3cdf5 - async_std::rt::runtime::Runtime::run::{{closure}}::hc49ce7e9abd42f33
  11:        0x109d3b2f9 - crossbeam_utils::thread::scope::{{closure}}::h4beb1775084ec8f5
  12:        0x109d3b2e9 - <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h4023904186d71ec7
  13:        0x109d40183 - std::panicking::try::do_call::h02701ca8e38cb843
  14:        0x109d5963f - __rust_maybe_catch_panic
  15:        0x109d3d599 - crossbeam_utils::thread::scope::h9004e3edab86f4fb
  16:        0x109d3ed1e - std::sys_common::backtrace::__rust_begin_short_backtrace::h2f0a676e6aea99cd
  17:        0x109d5963f - __rust_maybe_catch_panic
  18:        0x109d40283 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h2f07a838a2089d5b
  19:        0x109d505fe - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::hd1afe4d791e3679f
  20:        0x109d58fee - std::sys::unix::thread::Thread::new::thread_start::h785fcf8d81bd0f52
  21:     0x7fff720d0d76 - _pthread_start

However, when I change the blocking thread::sleep(...) to task::sleep(...).await, there are no panics and the example always succeeds. Is this expected behaviour?

@yoshuawuyts (Contributor) commented Dec 17, 2019

@overdrivenpotato The default stack size for a thread on unix is 2 MB, but it is often set to 8 MB. You're forcing 10,000 threads to be spawned concurrently. This means that unless you have >=20 GB (or >=80 GB) of RAM, spawning that many threads will likely exhaust your memory, making it impossible to spawn any more. This is expected behavior.

Tasks, on the other hand, are stackless, which means they only take up their own allocation's worth of memory by default, making it possible to spawn far higher numbers. That is likely why you're seeing tasks succeed where threads fail.

However, there has been ongoing conversation about introducing ways to place upper limits on the number of threads spawned in scenarios such as these.

@overdrivenpotato commented Dec 17, 2019

@yoshuawuyts So if the new runtime is expected to choke on a bare C10k example, why does the async-std blog post advertise the following?

The new async-std runtime relieves programmers of the burden of isolating blocking code from async code. You simply don’t have to worry about it anymore.

The new runtime enables you to fearlessly use synchronous libraries like diesel or rayon in any async-std application.

The new runtime relieves you of these concerns and allows you to do the blocking operation directly inside the async function

This strategy eliminates the need for separating async and blocking code.

You can see in my example that this PR combined with simple blocking calls is not scalable to C10k. Specifically, you'll run into this same issue when using synchronous libraries like diesel if you don't isolate the blocking calls into something like a thread pool.

Is it not misleading to make the above claims? Is async-std not intended to cater to large-scale servers?

FWIW, the PR as a standalone feature is great - it is a much better fallback than blocking the entire executor. However, I'm confused about the messaging and the ecosystem recommendations.

@skade (Collaborator) commented Dec 17, 2019

@overdrivenpotato But this is not the c10k problem.

The c10k problem is handling 10000 clients at the same time and keeping their sockets open. It is very usual to do async IO on the edges (because your client might be slow, the whole thing) and then spin up a thread for request handling. The library will not spawn 10000 threads for the network sockets. It will even scale down to 1 thread for 10000 sleeping sockets, which no other runtime does.

Again: what you present is the "sleep in 10000 threads problem" and has no relation to c10k. Forcibly exhausting the resource pool of a system is not a very good example of anything; every practical system can be exhausted if you aim at the right spot internally.

@overdrivenpotato commented Dec 17, 2019

@skade I am only using sleeping threads for simplicity; the sleep can be replaced with something like a diesel query to make a real C10k case. We are also on the same page regarding edge async I/O for slow clients; this is why my example does not include requests and only calls make_response. There is no disagreement on async for edge I/O.

With that said, it is entirely realistic to have 10,000 concurrent requests going to and from the database layer. Here is a pseudocode example:

async fn make_response() -> u64 {
    diesel::sql_query("SELECT COUNT(*) FROM users")
        .first(CONNECTION)
        .unwrap()
}

This is a realistic and concrete C10k example. As with the previous example, this one could be expanded - this time with something like a real SQL query for a real application, but that is entirely orthogonal to the problem. The example could also call one of several blocking functions based on the HTTP request route - again, this is not the point of the example. The point is that async-std with this blocking approach will not be able to handle 10k simultaneous connections calling blocking functions, as it will exhaust the available threads.

I can see that the async-std team is frustrated by some of the attacks that the community is sending out - this is not intended to be one of them. I am evaluating this library for production purposes, and so I am genuinely curious: is this just an oversight in the usage of the new runtime? Is there some sort of post planned to clarify that blocking is not worry-free?

Help in understanding this situation is appreciated. I am sure others will also find this thread useful.

@skade (Collaborator) commented Dec 17, 2019

@overdrivenpotato You are still oversimplifying (and I don't want to sound hostile here), and I think that makes this discussion hard. First of all, this problem would exist if you used spawn_blocking as well (which lands you on a thread pool, here we go) - and all systems I have seen using spawn_blocking naively are subject to exactly the same case. So if you run along those edges, you need to isolate anyway. It is completely okay in a c10k setting not to handle 10000 clients in one go, but to stagger them a little in high-load scenarios.

In particular, with our system, you first need to hit the case where all 10000 queries are active at the same time and don't return within a certain time frame.

As a more complex illustration:

async fn handle_request() {
    // some socket
    socket.read_to_string(&mut my_request).await;
    let result = do_database_request(&my_request); // may block
    socket.write_bytes(&result).await;
}

This task mixes one potentially blocking stretch with two non-blocking stretches. The non-blocking stretches lay themselves to sleep in the classic fashion. Assuming that the database request is local and takes ~5ms and the client is a mobile phone which takes ~100ms per request, the chance of 10000 requests being in the database request phase at the same time is very low and needs vastly more than 10000 clients. Plus: a database access can regularly be so fast that it stays below the time at which a task is considered "blocked". So this is already a statistical process.

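A back-of-the-envelope version of that argument, treating the numbers above as given and request arrivals as roughly uniform:

    fraction of a request's lifetime spent in the blocking DB call ≈ 5 ms / (100 ms + 5 ms) ≈ 5%
    expected number of requests inside the DB call with c concurrent clients ≈ 0.05 × c
    0.05 × c ≈ 10 000  =>  c ≈ 200 000 clients before ~10 000 of them block at once
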
Note that blocking for such short times (our current threshold is 10ms) is very common even in ordinary tasks, e.g. for some computation or even memory allocation/reclamation.

Now, consider this approach:

async fn handle_request() {
    // some socket
    socket.read_to_string(&mut my_request).await;
    let result = do_database_request_with_caching(&my_request); //does a db request, statistically, in 33% of all cases
    socket.write_bytes(&result).await;
}

The whole thing becomes even harder to predict. If you were to wrap this in spawn_blocking, you would force a background thread to exist for every concurrent request, even if the cache is hit, making your initial scenario (resource exhaustion) more likely. You could now go and implement do_database_request_with_caching in a scheduler-aware fashion and move the blocking functionality in there, but that gets messy quickly.

And this is the statistical process that makes an adaptive runtime so appealing: whether a task spends too much time or not is runtime behaviour. The problem that many people overlook is that the adaptive behaviour goes both ways: it will not throw additional resources at the problem if a task does not take much time, be it blocking or not.

The ceiling is really high there.

And this is what we mean by "you shouldn't worry about blocking": write your code; the runtime supports you and often does the right thing in a dynamic setting. Figuring out whether it does exactly the right thing requires tracing an actual system, not uniformly spawned tasks that all operate at one edge of the spectrum. When you hit limits, start optimising and potentially use other approaches, e.g. a fixed-size thread pool with a request queue for database request handling.

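A minimal sketch of that last suggestion, a fixed-size pool of worker threads fed by a request queue; the crate choices (crossbeam-channel, futures' oneshot) and all names here are illustrative assumptions, not code from this PR:

use crossbeam_channel::{unbounded, Sender};
use futures::channel::oneshot;
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

struct BlockingPool {
    queue: Sender<Job>,
}

impl BlockingPool {
    fn new(workers: usize) -> Self {
        let (tx, rx) = unbounded::<Job>();
        for _ in 0..workers {
            let rx = rx.clone();
            // Each worker thread pulls queued blocking jobs and runs them one at a time.
            thread::spawn(move || {
                for job in rx.iter() {
                    job();
                }
            });
        }
        BlockingPool { queue: tx }
    }

    // Run a blocking closure on the pool and await its result from async code.
    async fn run<T, F>(&self, f: F) -> T
    where
        F: FnOnce() -> T + Send + 'static,
        T: Send + 'static,
    {
        let (tx, rx) = oneshot::channel();
        self.queue
            .send(Box::new(move || {
                let _ = tx.send(f());
            }))
            .expect("pool worker threads have shut down");
        rx.await.expect("blocking job panicked")
    }
}
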
This does not mean "you shouldn't think about the structure of your system"; in particular, you should think about capacities. But the hyperfocus on blocking clouds our view there, IMHO. As an example: we have tons of non-blocking, fully async libraries that have no mitigations against slowloris.

I stand by that statement, but yes, we intend to write more documentation specifically around the construction of systems (and also to provide the appropriate tracing features to figure out actual problems in the future, if possible).

Given that there is no concrete system at hand and the whole thing depends on hundreds of variables, there's not much we can do except build it and find out whether we have actual problems.

I hope that helps your evaluation; I'd be happy to discuss any more complex case.

Addition: you can also think of more complex cases, where you read from a blocking source (a file) and write to a non-blocking socket (the network) in a loop, and it works fine because the blocking call is, by most measures, very fast.

@overdrivenpotato commented Dec 17, 2019

@skade Thanks for the reply. This specific statement clarifies your intentions with the new runtime for me:

[...] so if you run along those edges, you need to isolate anyway.

I see now that async-std does not recommend blindly eschewing sync isolation in favor of blocking calls. Have you considered softening your message: "You simply don’t have to worry about [blocking] anymore"? This is a fairly literal statement, and I have assumed from the start that it was meant to be true. I would imagine many others are confused by this, possibly explaining the backlash you have received over this PR. Perhaps it would be better to convey that blocking is sometimes better and that this should be measured before assuming one way or the other. In my experience, Rust developers are heavily concerned with correctness, more so than other communities, and potential pitfalls like spurious panics are not worth potential minor speed gains. FWIW, when using tokio, the natural way for me to implement a diesel bridge was a fixed-size thread pool, which resulted in no danger of spurious panics.

Have you also considered drafting an RFC to relax the poll requirements on the Future trait? The runtime contract states:

An implementation of poll should strive to return quickly, and should not block.

According to the API docs, deliberately crafting async functions with blocking characteristics (not necessarily just database calls) breaks the non-blocking requirement. This creates a coupling concern, as libraries built specifically for async-std may not work correctly on another executor like tokio, or even in basic combinators such as those found in futures-rs.

Personally, these two points are important. Rust is a niche language as it stands, and any source of confusion makes it much more difficult to onboard newcomers. It is even harder if the general Rust community reads this feature as making blocking "a thing of the past" based on an official blog post.

@skade (Collaborator) commented Dec 17, 2019

@overdrivenpotato I consider the danger of async-std panicking to be on the level of the allocator failing (which panics or segfaults, depending on the platform). In particular, a library panicking according to its definition is correct - it has reached a state where it cannot continue.

For completeness' sake (as everyone loves to just quote the first part, I wonder why), the full contract of Future is:

An implementation of poll should strive to return quickly, and should not block. Returning quickly prevents unnecessarily clogging up threads or event loops. If it is known ahead of time that a call to poll may end up taking awhile, the work should be offloaded to a thread pool (or something similar) to ensure that poll can return quickly.

I have considered proposing a rephrasing, for 4 reasons:

  • I find the rule weak and useless. It does not define "quickly". I'd say that many async functions break this rule. For example:
async fn image_manipulation() {
    let image = load_image().await;
    transform_image(image)
}

Breaks this contract, depending on what "quickly" means. 1ms? 2ms? 5ms? Below a certain frequency? So this forces spawning. Taken to the extreme, it forces spawning onto a thread pool around every piece of non-awaited code!

  • async/.await was written specifically to allow the code style above
  • What do I do if I don't know ahead of time?
  • In the end, "quickly" depends on how much leeway my scheduler already gives me, so the definition is scheduler-bound. If my scheduling strategy insists on tasks running at a high frequency, 2ms is too much. If not, 10ms might be okay.

And that's one pattern we've seen a lot: except in very constrained environments, where work packages are very predictable, this contract is hard to fulfil or quantify.

We already have the case where Futures are reactor-bound, so I don't quite understand that point. There's one more thing I want to address: there is a difference between application code and library code. If you write a non-blocking library, it should not block. That also makes it compatible with all executors. If you write application-level code, you should buy into your runtime's semantics, use one that fits your application, and not care about general applicability to multiple runtime systems.

This is a problem similar to how database libraries should be abstract over the store, while particular applications may well buy into specific store features.

@overdrivenpotato commented Dec 17, 2019

@skade Sure, panicking is correct, but this is a micro-optimization that can break down quickly. A tokio server can happily serve 10,000 clients on 1 GB of total RAM and a fixed-size database pool; however, with this new runtime, async-std will choke somewhere around 119 clients accessing a blocking database and begin to fail spuriously.

There is really a difference in messaging between these quotes:

often does the right thing in a dynamic setting

[...] so if you run along those edges, you need to isolate anyway.

there is a difference between application code and library code. If you write a non-blocking library, it should not block

And:

The new async-std runtime relieves programmers of the burden of isolating blocking code from async code. You simply don’t have to worry about it anymore.

The new runtime enables you to fearlessly use synchronous libraries like diesel or rayon in any async-std application.

The new runtime relieves you of these concerns and allows you to do the blocking operation directly inside the async function

This strategy eliminates the need for separating async and blocking code.

As for the Future::poll contract, as long as you are specifically recommending blocking only in application code, this is not too relevant to the ecosystem at large, but (correct me if I'm wrong) I see no mention of this anywhere except this thread.

The biggest issue here is the disparity between the first selection of quotes and the second: why insist on such highly opinionated and generalized phrases when it really depends on the application requirements? Sure, panicking on too many threads is a correct thing to do, but this is not the same class of error as memory exhaustion. tokio is a real example that this does not need to happen, while sacrificing arguably minimal real-world performance. I would submit a PR to solve this issue, but that is not really possible when it comes to social media updates and marketing-based blog posts.

@overdrivenpotato commented Dec 17, 2019

@skade Upon further experimentation, I am running into complete event loop blocking on async-std when utilizing simple blocking diesel calls with this new runtime. Here is the scenario:

We have a custom networking library with an entry function signature that looks similar to this:

async fn drive<T: Source>(source: T) -> Result<(), Error> { ... }

We have a custom scheduler that multiplexes multiple sub-futures onto a single Future, namely the one returned by drive. We also have an AsyncStdSource which implements the Source trait; this is how we drive a socket to completion (whether TCP, WebSocket, etc...). The future returned by drive is run via task::block_on(drive(...)).

It is worth noting that this bypasses the async_std::task::spawn utility completely. On tokio, with regular asynchronous code and fixed-size thread pools, this is a non-issue. However, because blocking in this new runtime blocks the entire task, the server grinds to a halt as individual client requests are handled completely sequentially. There seems to be no way around this if you require a sub-scheduler embedded within a single top-level Future.

EDIT: To be clear, I fully expect this behaviour. This is for the sake of a real example that breaks down.

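A rough sketch of that kind of single-task multiplexing, using hypothetical names and futures' FuturesUnordered as a stand-in for the custom scheduler:

use futures::stream::{FuturesUnordered, StreamExt};

// Hypothetical stand-in for one of the library's sub-futures.
async fn handle_connection(id: u32) {
    let _ = id;
}

// Every sub-future is multiplexed onto the single future returned here, which
// is then driven with task::block_on. A blocking call inside any sub-future
// therefore stalls every other sub-future on this one task.
async fn drive_all(ids: Vec<u32>) {
    let mut subtasks: FuturesUnordered<_> = ids.into_iter().map(handle_connection).collect();
    while subtasks.next().await.is_some() {}
}
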
@skade (Collaborator) commented Dec 18, 2019

@overdrivenpotato You can actually submit PRs to our blog; the link is at the bottom ;). It leads to https://github.com/async-rs/async-std; the blog post source is https://github.com/async-rs/async.rs/blob/master/source/blog/2019-12-16-stop-worrying-about-blocking-the-new-async-std-runtime.html.md.

I'm trying to figure out why you multiplex onto a single Future. Futures and tasks are units of concurrency, not of parallelism, so asking for parallel work there is hard. Using block_on there is definitely not the right approach, but that's true with or without the new scheduler.

@overdrivenpotato commented Dec 18, 2019

@skade This is actually necessary to support structured concurrency. Freely spawning can lead to several exhaustion issues. We also have some additional requirements that are not worth getting into right now.

Anyway, thanks for the help, but we will be sticking with tokio, as this is causing too many headaches. I just found out they are also exploring this space currently.

@oblique commented Dec 19, 2019

Please consider this counter-post before deprecating spawn_blocking.

@DemiMarie-parity commented Dec 19, 2019

From what I can tell, the real answer is to use an asynchronous semaphore to limit the number of blocking requests made at any given time.

@cynecx commented Dec 20, 2019

Sorry, I haven't had time to actually read the code, but perhaps to alleviate the issue @overdrivenpotato is bringing up, one could introduce some kind of "threadpool" concept where CPU-consuming tasks are automatically offloaded, with some lower/upper bounds on the thread count, thereby eliminating the issue where we exhaust real OS threads?

Edit: Oh well, I've just realized that this might not be possible because from userspace one cannot "preempt" the running task, so it can't be migrated to the threadpool; basically, this issue is inherent to the nature of detecting long-running/blocking computations entirely from userspace. Not sure how Go solves this, though (I've heard something about compiler-inserted preemption points / library-assisted preemption points / unix signals?).

@pickfire commented on src/rt/runtime.rs in 36d24cd Dec 22, 2019

Would it be better to use only delay instead?

            let mut delay = 0;

            loop {
                // Get a list of new machines to start, if any need to be started.
                for m in self.make_machines() {
                    delay = 0; // XXX I wonder why this needs to be re-assigned multiple times

                    s.builder()
                        .name("async-std/machine".to_string())
                        .spawn(move |_| {
                            abort_on_panic(|| {
                                let _ = MACHINE.with(|machine| machine.set(m.clone()));
                                m.run(self);
                            })
                        })
                        .expect("cannot start a machine thread");
                }

                // Sleep for a bit longer if the scheduler state hasn't changed in a while.
                thread::sleep(Duration::from_micros(if delay > 10_000 {
                    delay *= 2;
                    delay
                } else {
                    delay += 1000;
                    1000
                }));
            }

With the code as written, the delays would be 1000 µs repeated while delay is still at or below 10 000, then 22 000, 44 000, 88 000, ... (doubling each time, until make_machines returns new machines and delay resets to 0).

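One possible reading of "use only delay" (an assumption about the suggestion, not code from this PR): always sleep for the current delay and adjust it in a single place, resetting it whenever a machine gets started. A self-contained sketch:

use std::{thread, time::Duration};

fn main() {
    let mut delay: u64 = 0;
    loop {
        // `delay = 0;` would happen here whenever make_machines() returns work.
        delay = if delay >= 10_000 { delay * 2 } else { delay + 1_000 };
        thread::sleep(Duration::from_micros(delay));
        // Capped loop purely for demonstration purposes.
        if delay > 1_000_000 {
            break;
        }
    }
}
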
@serzhiio commented Jan 10, 2020

Some real-life data here :)))
[screenshot: old-scheduler]
[screenshot: new-scheduler]
Await latency is a kind of indicator for the scheduler; all numbers are in microseconds.
The new scheduler is really better and looks much more efficient.
P.S.: Not the best screenshots, but believe me :)

@dekellum commented Jan 10, 2020

Disclosure: I'm writing my own post related to blocking operations in the rust async runtimes, which I can link here later.

One humble observation, forgive me, without having first-hand experience with this branch of async-std: could end-users use an async-compatible Semaphore (such as the one in the futures-intrusive crate) with a fixed number of permits to help constrain the number of concurrent blocking ops, and therefore the number of concurrent runtime threads, here?

That's an approach that has been suggested with tokio, at least. I don't know enough yet to say whether that is a good or bad suggestion, or how applicable it is here. Thanks.

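A rough sketch of that approach; the Semaphore constructor and acquire shape follow the futures-intrusive API as I recall it, so treat the exact calls, constants, and helper names as assumptions to check against the crate docs:

use futures_intrusive::sync::Semaphore;

const MAX_BLOCKING: usize = 64; // assumed cap on concurrent blocking ops

async fn limited_blocking_query(sem: &Semaphore) -> u64 {
    // Wait for one of the MAX_BLOCKING permits; the releaser returned here
    // hands the permit back when it is dropped at the end of this scope.
    let _permit = sem.acquire(1).await;
    // The blocking call (e.g. a diesel query) runs while holding a permit, so
    // at most MAX_BLOCKING runtime threads can be tied up in it at once.
    run_blocking_query()
}

// Placeholder for the real synchronous work.
fn run_blocking_query() -> u64 {
    42
}

// At startup (shared across tasks, e.g. behind an Arc):
// let sem = Semaphore::new(true, MAX_BLOCKING);
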
@serzhiio commented Jan 13, 2020

I've noticed an allocation problem. I have no such issue with the current scheduler.
[screenshot: new-scheduler-allocating]

@yoshuawuyts (Contributor) commented Jan 14, 2020

@serzhiio Could it be that you have a lot of expensive computation going on, which in turn requires a fair number of threads? Each thread usually comes with about 8 MB of static overhead (its own stack), so if a lot of threads are being spawned, it seems obvious that more memory would be consumed. This is the tradeoff for overall improved latency.

@dekellum commented Jan 14, 2020

@yoshuawuyts Semaphore?

@niklasf commented Jan 14, 2020

@dekellum Adding a Semaphore can potentially lead to deadlocks. It's a nice tool at the application level (for example, when you know you're not going to recursively spawn blocking tasks), but it seems tricky for a general-purpose scheduler.

@dekellum commented Jan 14, 2020

I agree with the suggestion to use it at the application level, @niklasf. That was missing/missed here and in the "Stop worrying about blocking" blog post, though. Just to be clear, my empirical results (linked post above) support the notion of "stop worrying", at least for blocking SSD reads/writes, which I found surprising given the conventional wisdom.

@serzhiio commented Jan 14, 2020

@serzhiio Could it be that you have a lot of expensive computation going on, which in turn requires a fair number of threads? Each thread usually comes with about 8 MB of static overhead (its own stack), so if a lot of threads are being spawned, it seems obvious that more memory would be consumed. This is the tradeoff for overall improved latency.

Hmm, but look at the processes: I have only 3 threads running (timer + runtime + machine). I thought the main idea of the new scheduler was to spawn no more blocking threads than really needed. So who is allocating memory? And as I said, I have no memory issues with the current scheduler.
