
Yield after channel send and move cpu tasks to thread #1163

Merged
merged 3 commits on Feb 2, 2024

Conversation

@konstin (Member) commented Jan 29, 2024

Summary

Previously, we were blocking operations that could run in parallel. We would send requests through our main requests channel but not yield, so the receiver could only start processing requests much later than necessary. We solve this by switching to the async tokio::sync::mpsc::channel, where send is an async function that yields.
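As an illustration, here is a minimal sketch (hypothetical types and capacity, not the actual resolver code) of why the bounded tokio channel helps: send is an async fn, so awaiting it gives the receiver on the same runtime a chance to run interleaved with the producer.

```rust
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    // Bounded channel: `send` is async and yields to the scheduler.
    let (tx, mut rx) = mpsc::channel::<u32>(50);

    let producer = tokio::spawn(async move {
        for request in 0..200u32 {
            // Awaiting the send lets the receiver start processing
            // instead of only running after all sends are done.
            tx.send(request).await.unwrap();
        }
    });

    while let Some(request) = rx.recv().await {
        // Process requests as they arrive.
        let _ = request;
    }

    producer.await.unwrap();
}
```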

Due to the increased parallelism, cache deserialization and the conversion from simple API response to version map became bottlenecks, so I moved them to spawn_blocking. Together these changes result in a 30-60% speedup for larger warm-cache resolutions. Small cases such as black already resolve in 5.7 ms on my machine, so there's no speedup to be gained; refresh and no-cache runs were too noisy to get a signal from.
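A minimal sketch of the spawn_blocking pattern described above; `VersionMap` and `parse_simple_api_response` are hypothetical stand-ins for the real cache-deserialization and conversion code.

```rust
use tokio::task;

// Hypothetical stand-ins for the real types/functions.
struct VersionMap;
fn parse_simple_api_response(_raw: &[u8]) -> VersionMap {
    VersionMap
}

async fn load_version_map(raw: Vec<u8>) -> anyhow::Result<VersionMap> {
    // CPU-heavy deserialization runs on the blocking thread pool, so the
    // async main thread stays free to drive requests and the solver.
    let map = task::spawn_blocking(move || parse_simple_api_response(&raw)).await?;
    Ok(map)
}
```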

Note for the future: Revisit the bounded channel if we also want to produce requests from process_request (this would be good for prefetching), to avoid deadlocks.

Details

We can look at the behavior change through the spans:

RUST_LOG=puffin=info TRACING_DURATIONS_FILE=target/traces/jupyter-warm-branch.ndjson cargo run --features tracing-durations-export --bin puffin-dev --profile profiling -- resolve jupyter 2> /dev/null

Below, you can see how on main we have discrete phases: all (cached) simple API requests in parallel, then all (cached) metadata requests in parallel, repeat until done. The solver is mostly waiting until it has its version map from the simple API query to be able to choose a version. The main thread is blocked by processing requests.

In the PR branch, the simple API requests succeed much earlier, allowing the solver to advance and also to schedule more prefetching. Because of that, parse_cache and from_metadata became bottlenecks, so I moved them off the main thread (green color; their spans can now overlap because they run on multiple threads in parallel). The main thread isn't blocked on process_request anymore; instead it has frequent idle times. The spans are all much shorter, which indicates that on main they could have finished much earlier, but a task didn't yield so they weren't scheduled to finish (though I haven't dug deep enough to understand the exact scheduling between the process-request stream and the solver here).

main: [trace plot: jupyter-warm-main]

PR: [trace plot: jupyter-warm-branch]

Benchmarks

$ hyperfine --warmup 3 "target/profiling/main-dev resolve jupyter" "target/profiling/branch-dev resolve jupyter"
Benchmark 1: target/profiling/main-dev resolve jupyter
  Time (mean ± σ):      29.1 ms ±   0.7 ms    [User: 22.9 ms, System: 11.1 ms]
  Range (min … max):    27.7 ms …  32.2 ms    103 runs
 
Benchmark 2: target/profiling/branch-dev resolve jupyter
  Time (mean ± σ):      18.8 ms ±   1.1 ms    [User: 37.0 ms, System: 22.7 ms]
  Range (min … max):    16.5 ms …  21.9 ms    154 runs
 
Summary
  target/profiling/branch-dev resolve jupyter ran
    1.55 ± 0.10 times faster than target/profiling/main-dev resolve jupyter

$ hyperfine --warmup 3 "target/profiling/main-dev resolve meine_stadt_transparent" "target/profiling/branch-dev resolve meine_stadt_transparent"
Benchmark 1: target/profiling/main-dev resolve meine_stadt_transparent
  Time (mean ± σ):      37.8 ms ±   0.9 ms    [User: 30.7 ms, System: 14.1 ms]
  Range (min … max):    36.6 ms …  41.5 ms    79 runs
 
Benchmark 2: target/profiling/branch-dev resolve meine_stadt_transparent
  Time (mean ± σ):      24.7 ms ±   1.5 ms    [User: 47.0 ms, System: 39.3 ms]
  Range (min … max):    21.5 ms …  28.7 ms    113 runs
 
Summary
  target/profiling/branch-dev resolve meine_stadt_transparent ran
    1.53 ± 0.10 times faster than target/profiling/main-dev resolve meine_stadt_transparent

$ hyperfine --warmup 3 "target/profiling/main pip compile scripts/requirements/home-assistant.in" "target/profiling/branch pip compile scripts/requirements/home-assistant.in"
Benchmark 1: target/profiling/main pip compile scripts/requirements/home-assistant.in
  Time (mean ± σ):     229.0 ms ±   2.8 ms    [User: 197.3 ms, System: 63.7 ms]
  Range (min … max):   225.8 ms … 234.0 ms    13 runs
 
Benchmark 2: target/profiling/branch pip compile scripts/requirements/home-assistant.in
  Time (mean ± σ):      91.4 ms ±   5.3 ms    [User: 289.2 ms, System: 176.9 ms]
  Range (min … max):    81.0 ms … 104.7 ms    32 runs
 
Summary
  target/profiling/branch pip compile scripts/requirements/home-assistant.in ran
    2.50 ± 0.15 times faster than target/profiling/main pip compile scripts/requirements/home-assistant.in

@@ -408,14 +411,16 @@ impl<'a, Provider: ResolverProvider> Resolver<'a, Provider> {
}
}
}
// Yield after sending on a channel to allow the subscribers to continue
tokio::task::yield_now().await;
Member:
Are you able to explain what this is doing / why it works?

Member Author:
The channel send is sync, so my theory is that the task doesn't yield until all the work for a certain section (e.g. (pre-)fetching all version maps) is done, and thereby blocks the receiving end from taking over (the requests and the solver are both on the main thread as far as I can tell). I've confirmed that there is a gap between e.g. a prefetch request being sent and it being received/executed.
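A simplified sketch of the pattern being described (placeholder names, not the actual resolver loop): the futures unbounded send is synchronous, so the explicit yield_now is what lets the receiver run.

```rust
use futures::channel::mpsc::UnboundedSender;

async fn send_prefetches(packages: Vec<String>, tx: UnboundedSender<String>) {
    for package in packages {
        // `unbounded_send` returns immediately and never awaits, so a
        // receiver on the same thread can't run until we hit an `.await`.
        let _ = tx.unbounded_send(package);
        // The fix in this PR: yield explicitly so the receiver gets scheduled.
        tokio::task::yield_now().await;
    }
}
```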

You can see this in the span blocks for warm cache jupyter on main:

[trace plot: main]

We shouldn't have to wait till all the simple api requests are done to start the metadata requests.

In comparison, on this branch everything happens concurrently:

[screenshot: Screenshot_from_2024-01-29_00-08-49.png]

The spans are also shorter, which makes sense when they were previously only missing the chance to complete because another task was blocking the thread.

Member:
Great, thank you!

Member Author:
I've updated the OP with better plots and a proper description of how to read them.

Member:
@konstin Can you move your lovely comment into the source here? Obviously the images won't be able to make it in, but even just the text would be very helpful.

Member:
We can try with bounded channels, seems straightforward?

Member:
@konstin - I defer to you to try this out, but seems like it could be a good solution if it also lets us remove the yields.

Member Author:
@BurntSushi Why is a bounded channel better, and what would a reasonable channel size be? Put differently, what's the advantage of stopping the producer, given that we know our problem is finite and reasonably sized?

Member:
I don't have anything concrete to say in the context of puffin other than the hand-wavy "it provides back-pressure." As for why back-pressure is important, I'll point to the following:

That is, the point of bounded channels isn't really that they "stop," but rather, that they react to things downstream that have gotten clogged. Namely, if a bounded channel stops, that's usually only a symptom of either a capacity that is too small or something downstream that has gotten overloaded. In the former case, we fix it by tuning the capacity. In the latter case, well, it's a feature and not a bug that the bounded channel has stopped sending more input into something that is already overloaded.

As for what a reasonable capacity would be... I'm not sure. It's not that dissimilar from Vec::with_capacity. Usually starting with a small (or even zero) capacity is good enough. But one can experimentally arrive at a different figure. Maybe start with the number of cores? (Or just hard-code 16, since that's probably only a little more than what most folks have, and a number smaller than the number of cores doesn't mean that not all cores will be used.) Just a guess.

Basically, my position is that bounded channels should be the default, because they---hand-wavily---lead to overall better holistic properties of the system. They provide a means for back-pressure. Unbounded channels have no mechanism for back-pressure, and I think they are generally an anti-pattern unless there is some specific motivation for them.

Also, a nice historical anecdote. ("async" channels are unbounded and "sync" channels are bounded.)

Member Author:
Switched to tokio::sync::mpsc::channel with a capacity of 50.

@konstin konstin changed the base branch from main to charlie/wait January 30, 2024 11:38
@konstin (Member Author) commented Jan 30, 2024

Warning: This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. This stack of pull requests is managed by Graphite.

@konstin konstin marked this pull request as ready for review January 30, 2024 12:24
@konstin konstin changed the title from "Yield after channel send, causing waitmap deadlock" to "Yield after channel send and move cpu tasks to thread" Jan 30, 2024
@BurntSushi (Member) left a comment:

The home-assistant improvement is epic.


@@ -408,14 +411,16 @@ impl<'a, Provider: ResolverProvider> Resolver<'a, Provider> {
}
}
}
// Yield after sending on a channel to allow the subscribers to continue
tokio::task::yield_now().await;
Member:

It is still quite weird to me that you need an explicit yield_now() here. Basically, it should be just like a std::thread::yield_now(), which means it should be incredibly rare to use it.

Have you tried using tokio's bounded channels instead? Its send operation is async: https://docs.rs/tokio/latest/tokio/sync/mpsc/struct.Sender.html#method.send

(I would generally rather we use bounded channels with some kind of reasonable capacity everywhere, since they provide back-pressure. But it's hard to point to concrete specifics here, and is kind of hand-wavy.)

@charliermarsh (Member):

Woah, 2.5x? That's insane. I didn't even notice that.

Base automatically changed from charlie/wait to main January 30, 2024 20:25
charliermarsh added a commit that referenced this pull request Jan 30, 2024
## Summary

This is an attempt to #1163 by
removing the `WaitMap` and gaining more granular control over the values
that we hold over `await` boundaries.
@charliermarsh (Member):

@konstin - I see a 1.5x speedup for me, how is your warm home-assistant so fast?

❯ python -m scripts.bench --puffin-path ./target/release/puffin --puffin-path ./target/release/main --benchmark resolve-warm scripts/requirements/home-assistant.in --min-runs 25
Benchmark 1: ./target/release/puffin (resolve-warm)
  Time (mean ± σ):     170.2 ms ±  11.3 ms    [User: 272.1 ms, System: 606.6 ms]
  Range (min … max):   148.5 ms … 195.1 ms    25 runs

Benchmark 2: ./target/release/main (resolve-warm)
  Time (mean ± σ):     243.4 ms ±   4.8 ms    [User: 210.9 ms, System: 206.4 ms]
  Range (min … max):   237.7 ms … 255.0 ms    25 runs

Summary
  './target/release/puffin (resolve-warm)' ran
    1.43 ± 0.10 times faster than './target/release/main (resolve-warm)'

Surprisingly, I don't really see any speedup for the cold benchmarks.

@konstin (Member Author) commented Jan 30, 2024

No idea, but the difference remains after a rebase with more runs. Could this be an OS scheduler or filesystem performance thing?

# From konsti/yield-after-send with unambiguous `gt up`
$ gt down && cargo build --profile profiling && mv target/profiling/puffin-dev target/profiling/main-dev && mv target/profiling/puffin target/profiling/main && gt up && cargo build --profile profiling && mv target/profiling/puffin-dev target/profiling/branch-dev && mv target/profiling/puffin target/profiling/branch
$ hyperfine --warmup 3 --runs 100 "target/profiling/main pip compile scripts/requirements/home-assistant.in" "target/profiling/branch pip compile scripts/requirements/home-assistant.in"
Benchmark 1: target/profiling/main pip compile scripts/requirements/home-assistant.in
  Time (mean ± σ):     225.0 ms ±   2.0 ms    [User: 195.7 ms, System: 61.5 ms]
  Range (min … max):   222.6 ms … 238.0 ms    100 runs
 
Benchmark 2: target/profiling/branch pip compile scripts/requirements/home-assistant.in
  Time (mean ± σ):      92.7 ms ±   5.6 ms    [User: 302.4 ms, System: 174.8 ms]
  Range (min … max):    80.7 ms … 110.9 ms    100 runs
 
Summary
  target/profiling/branch pip compile scripts/requirements/home-assistant.in ran
    2.43 ± 0.15 times faster than target/profiling/main pip compile scripts/requirements/home-assistant.in

I'd be interested in how the differences between main and this PR look for other people!

@konstin (Member Author) commented Jan 30, 2024

Cold runs fluctuate between 1000 ms and 2000 ms; they are far too volatile for me to do any comparative benchmarking. The span structure looks very similar between main and this PR, so I expect no change. You can see that we often have to wait for fetching the metadata of a specific version to start prefetches, with the network idling in between. While the request parallelism seems to be well used in general (this is good!), I also see room for improvement, though with lower gains than the original change.

Proposal: We try to speculate about the versions we would pick for other packages and prefetch as if the solver were deciding on whichever package arrived first (instead of in a deterministic order, as we actually do), speculating that the resolution in arbitrary order is the same as the one in deterministic order. The solver itself remains deterministic; if the guess was wrong we only made an unnecessary request, and we could consider cancelling such requests if they turn out to be wrong. There are different designs, such as sharing the partial solution or forking the pubgrub state with non-deterministic decisions and rebasing it whenever deterministic decisions are made.

An interesting property would be to prefetch packages deep in the dependency tree: Let us define jupyter as a dependency of order 0; jupyter requiring notebook makes notebook an order-1 dependency; then (given I did the graph traversal right) pycparser is an order-6 dep and types-python-dateutil (elsewhere in the tree) an order-7 dep (Fig. 4). The branch and main examples both have to wait on pycparser as the last and second-to-last choose-version cases, at the right-hand end of the plot where we're not doing much except waiting for those packages (Fig. 1 and 2). If we could prefetch in a way that lets those deps get fetched earlier, we could avoid this nearly-idle tail. This becomes even more apparent with --refresh, where the last two, almost isolated requests are pycparser and types-python-dateutil (Fig. 3).
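A sketch of how such a dependency order could be computed (the graph representation here is a hypothetical stand-in, not puffin's actual data structure): a breadth-first traversal from the root requirements, assigning each package the depth at which it is first reached.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Roots are order 0, their direct dependencies order 1, and so on;
/// each package gets the smallest depth at which it is reachable.
fn dependency_orders(
    roots: &[&str],
    edges: &HashMap<&str, Vec<&str>>,
) -> HashMap<String, usize> {
    let mut orders = HashMap::new();
    let mut seen = HashSet::new();
    let mut queue: VecDeque<(&str, usize)> = roots.iter().map(|&root| (root, 0)).collect();

    while let Some((package, depth)) = queue.pop_front() {
        if !seen.insert(package) {
            continue; // already reached at an equal or shallower depth
        }
        orders.insert(package.to_string(), depth);
        for &dep in edges.get(package).into_iter().flatten() {
            queue.push_back((dep, depth + 1));
        }
    }
    orders
}
```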

I find it surprising that we spend way more time waiting for metadata than for versions, even though the average durations of simple API and metadata requests would have led me to expect it the other way round. I guess this is because more version maps are already present by the time we need them?

For --refresh specifically, we can avoid the slow start (Fig. 3, left side) by loading the (possibly stale) cached data, using it to start prefetches, and then discarding it.

An easy option is to review the size of the buffer_unordered; there might be easy wins in there, at least as long as we ask the PyPI people before making 100 parallel requests by default.
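For reference, a small sketch of the buffer_unordered knob mentioned here (the fetch function and the limit of 16 are placeholders): it caps how many requests are in flight at once.

```rust
use futures::stream::{self, StreamExt};

// Hypothetical stand-in for a simple-API or metadata request.
async fn fetch(package: String) -> String {
    package
}

async fn fetch_all(packages: Vec<String>) -> Vec<String> {
    // At most 16 requests in flight at a time; tuning this number is the
    // "review the size of the buffer_unordered" idea from above.
    stream::iter(packages)
        .map(fetch)
        .buffer_unordered(16)
        .collect()
        .await
}
```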

Fig 1: Cold main [trace plot]

Fig 2: Cold branch [trace plot]

Fig 3: Refresh branch [trace plot]

Fig 4: Dependency graph [jupyter_graphviz]

@@ -5,7 +5,9 @@ use std::sync::Arc;

use anyhow::Result;
use dashmap::{DashMap, DashSet};
use futures::channel::mpsc::UnboundedReceiver;
use futures::channel::mpsc::{
UnboundedReceiver as MpscUnboundedReceiver, UnboundedSender as MpscUnboundedSender,
Member:
Could we remove this rename? Either use the names directly, or the fully-qualified names? I just find this a little more confusing as a reader.

Member Author:
I'll make them consistently fully qualified (there are too many slightly different channels).

@BurntSushi (Member):

I'm seeing a 1.5x improvement:

$ hyperfine -w5 --cleanup 'rm -rf /home/andrew/.cache/puffin' "puffin-main pip compile ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null" "puffin-test pip compile ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null"
Benchmark 1: puffin-main pip compile ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
  Time (mean ± σ):     243.7 ms ±   2.0 ms    [User: 227.1 ms, System: 81.9 ms]
  Range (min … max):   240.2 ms … 246.7 ms    12 runs

Benchmark 2: puffin-test pip compile ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
  Time (mean ± σ):     158.4 ms ±  22.0 ms    [User: 407.0 ms, System: 335.3 ms]
  Range (min … max):   127.0 ms … 199.2 ms    18 runs

Summary
  puffin-test pip compile ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null ran
    1.54 ± 0.21 times faster than puffin-main pip compile ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null

Pretty awesome!

I don't have any hypotheses for the discrepancy. @konstin How many CPU cores do you have? If you have a lot, and since this PR is improving parallelism efficiency, that might explain why you're seeing a greater improvement. (Although I kind of have a lot of cores too: I have 16 physical cores and 24 logical cores, i.e., 8 of my physical cores are hyper-threaded.)

@charliermarsh (Member):

The actual raw numbers are kind of crazy to me though, like Konsti's is 92.7ms!

@konstin (Member Author) commented Jan 30, 2024

Thank you @BurntSushi, that's the answer! I'm using an i9-13900K with 8 performance cores, 16 efficiency cores and 32 threads. Limiting the number of cores reduces the speedup (output cropped for readability):

$ taskset -c 0-3 hyperfine --warmup 3 "target/profiling/main pip compile scripts/requirements/home-assistant.in" "target/profiling/branch pip compile scripts/requirements/home-assistant.in"
1: Time (mean ± σ):     229.8 ms ±   1.4 ms    [User: 198.1 ms, System: 60.4 ms]
2: Time (mean ± σ):     170.9 ms ±   6.1 ms    [User: 335.1 ms, System: 127.7 ms]
target/profiling/branch pip compile scripts/requirements/home-assistant.in ran
  1.34 ± 0.05 times faster than target/profiling/main pip compile scripts/requirements/home-assistant.in
$ taskset -c 0-7 hyperfine --warmup 3 "target/profiling/main pip compile scripts/requirements/home-assistant.in" "target/profiling/branch pip compile scripts/requirements/home-assistant.in"
1: Time (mean ± σ):     224.4 ms ±   1.3 ms    [User: 194.0 ms, System: 56.6 ms]
2: Time (mean ± σ):     112.6 ms ±   5.7 ms    [User: 301.4 ms, System: 140.0 ms]
target/profiling/branch pip compile scripts/requirements/home-assistant.in ran
  1.99 ± 0.10 times faster than target/profiling/main pip compile scripts/requirements/home-assistant.in
$ taskset -c 0-15 hyperfine --warmup 3 "target/profiling/main pip compile scripts/requirements/home-assistant.in" "target/profiling/branch pip compile scripts/requirements/home-assistant.in"
1: Time (mean ± σ):     222.4 ms ±   1.5 ms    [User: 191.7 ms, System: 56.8 ms]
2: Time (mean ± σ):      88.6 ms ±   1.9 ms    [User: 276.0 ms, System: 137.6 ms]
target/profiling/branch pip compile scripts/requirements/home-assistant.in ran
  2.51 ± 0.06 times faster than target/profiling/main pip compile scripts/requirements/home-assistant.in
$ taskset -c 0-31 hyperfine --warmup 3 "target/profiling/main pip compile scripts/requirements/home-assistant.in" "target/profiling/branch pip compile scripts/requirements/home-assistant.in"
1: Time (mean ± σ):     226.3 ms ±   1.8 ms    [User: 197.7 ms, System: 59.3 ms]
2: Time (mean ± σ):      90.4 ms ±   5.4 ms    [User: 294.2 ms, System: 174.1 ms]
target/profiling/branch pip compile scripts/requirements/home-assistant.in ran
  2.50 ± 0.15 times faster than target/profiling/main pip compile scripts/requirements/home-assistant.in

@zanieb zanieb added the performance Potential performance improvement label Feb 2, 2024
@konstin konstin merged commit f10f902 into main Feb 2, 2024
6 checks passed
@konstin konstin deleted the konsti/yield-after-send branch February 2, 2024 17:18
ibraheemdev added a commit that referenced this pull request May 17, 2024
## Summary

This PR introduces parallelism to the resolver. Specifically, we can
perform PubGrub resolution on a separate thread, while keeping all I/O
on the tokio thread. We already have the infrastructure set up for this
with the channel and `OnceMap`, which makes this change relatively
simple. The big change needed to make this possible is removing the
lifetimes on some of the types that need to be shared between the
resolver and pubgrub thread.
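
A rough sketch of that split (hand-rolled names and channels, not the actual code): the CPU-bound PubGrub solve runs on its own thread, while I/O stays on the tokio runtime, with the two sides communicating over channels.

```rust
use tokio::sync::mpsc;

enum Request {
    Metadata(String),
}
enum Response {
    Metadata(String, Vec<String>),
}

async fn resolve() {
    let (request_tx, mut request_rx) = mpsc::channel::<Request>(50);
    let (response_tx, response_rx) = std::sync::mpsc::channel::<Response>();

    // CPU-bound resolution runs on a dedicated OS thread, off the runtime.
    let solver = std::thread::spawn(move || {
        // ... drive PubGrub here, issuing requests with
        // `request_tx.blocking_send(...)` and waiting on `response_rx.recv()` ...
        drop((request_tx, response_rx));
    });

    // I/O stays on tokio: answer requests as the solver produces them.
    while let Some(Request::Metadata(name)) = request_rx.recv().await {
        // ... fetch metadata over the network here ...
        let _ = response_tx.send(Response::Metadata(name, Vec::new()));
    }

    solver.join().unwrap();
}
```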

A related PR, #1163, found that
adding `yield_now` calls improved throughput. With optimal scheduling we
might be able to get away with everything on the same thread here.
However, in the ideal pipeline with perfect prefetching, the resolution
and prefetching can run completely in parallel without depending on one
another. While this would be very difficult to achieve, even with our
current prefetching pattern we see a consistent performance improvement
from parallelism.

This does also require reverting a few of the changes from
#3413, but not all of them. The
sharing is isolated to the resolver task.

## Test Plan

On smaller tasks performance is mixed with ~2% improvements/regressions
on both sides. However, on medium-large resolution tasks we see the
benefits of parallelism, with improvements anywhere from 10-50%.

```
./scripts/requirements/jupyter.in
Benchmark 1: ./target/profiling/baseline (resolve-warm)
  Time (mean ± σ):      29.2 ms ±   1.8 ms    [User: 20.3 ms, System: 29.8 ms]
  Range (min … max):    26.4 ms …  36.0 ms    91 runs
 
Benchmark 2: ./target/profiling/parallel (resolve-warm)
  Time (mean ± σ):      25.5 ms ±   1.0 ms    [User: 19.5 ms, System: 25.5 ms]
  Range (min … max):    23.6 ms …  27.8 ms    99 runs
 
Summary
  ./target/profiling/parallel (resolve-warm) ran
    1.15 ± 0.08 times faster than ./target/profiling/baseline (resolve-warm)
```
```
./scripts/requirements/boto3.in   
Benchmark 1: ./target/profiling/baseline (resolve-warm)
  Time (mean ± σ):     487.1 ms ±   6.2 ms    [User: 464.6 ms, System: 61.6 ms]
  Range (min … max):   480.0 ms … 497.3 ms    10 runs
 
Benchmark 2: ./target/profiling/parallel (resolve-warm)
  Time (mean ± σ):     430.8 ms ±   9.3 ms    [User: 529.0 ms, System: 77.2 ms]
  Range (min … max):   417.1 ms … 442.5 ms    10 runs
 
Summary
  ./target/profiling/parallel (resolve-warm) ran
    1.13 ± 0.03 times faster than ./target/profiling/baseline (resolve-warm)
```
```
./scripts/requirements/airflow.in 
Benchmark 1: ./target/profiling/baseline (resolve-warm)
  Time (mean ± σ):     478.1 ms ±  18.8 ms    [User: 482.6 ms, System: 205.0 ms]
  Range (min … max):   454.7 ms … 508.9 ms    10 runs
 
Benchmark 2: ./target/profiling/parallel (resolve-warm)
  Time (mean ± σ):     308.7 ms ±  11.7 ms    [User: 428.5 ms, System: 209.5 ms]
  Range (min … max):   287.8 ms … 323.1 ms    10 runs
 
Summary
  ./target/profiling/parallel (resolve-warm) ran
    1.55 ± 0.08 times faster than ./target/profiling/baseline (resolve-warm)
```