Generators --> Corosensei #204

Merged
sarsko merged 3 commits into awslabs:main from dylanjwolff:corosensei-leak
Oct 23, 2025

Conversation

@dylanjwolff
Contributor

@dylanjwolff dylanjwolff commented Aug 22, 2025

This PR switches Shuttle continuations to use corosensei rather than the generators library. In the context-switch heavy micro-benchmarks (for example, shuttle/benches/counter.rs with the RandomScheduler), this results in 5-20% faster scheduling throughput because of reduced context-switching overhead.

Major points for discussion:

  1. There is an API change with corosensei: we now need to pass an explicit yielder in order to suspend a coroutine. Because continuations are kept in a pool and reused, this requires us to exfiltrate the yielder from the closure passed to Coroutine::with_stack and persist it for the lifetime of the Continuation. Currently the yielder is carried around as a raw pointer on the Task struct, but I suspect there might be a nicer way to do this. Unfortunately, naively embedding it in the Continuation struct is not possible, because the Continuation is already borrowed by its resume method when switch is called.

  2. There is an issue where corosensei expects to catch a ForcedUnwind panic when dropping coroutines. Something in the way Shuttle handles panics is intercepting the ForcedUnwind and not propagating it up to where corosensei expects it, causing it to panic. This only arises on an initial panic, or during cleanup if some Tasks are detached. The current solution is to call force_reset instead of force_unwind in these cases, which does not attempt to unwind the coroutine and therefore may leak resources allocated on the coroutine. In these two scenarios we probably don't care about leaking resources, because the whole test is about to exit. UPDATE: this is fixed upstream by Amanieu/corosensei#57, which addresses the force_unwind panic "the ForcedUnwind panic was caught and not rethrown".

TODOs before this is ready for merging:

  • test on some client projects over a longer period of time
  • update the ignored test (depends on how we handle (1))
  • general cleanup (remove unnecessary pub, delete commented-out println!s, etc.)
  • refactor the ContinuationState enum to contain the coroutine function itself -- I'd like to move this refactor to a future PR so that we can land this PR sooner rather than later

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@sarsko
Contributor

sarsko commented Sep 17, 2025

Sorry for doing an oopsie on your PR history.

Regarding Continuation::drop:
As far as I can tell, force_unwind with a caught panic, as in:

loop {
    let res = std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| self.coroutine.force_unwind())); 
    if res.is_ok() {
        break;
    }
}

seems to work.

A few notes:

  1. In my limited testing the first call is always a panic (becoming an Err) and then the second call is an Ok. We should probably limit the number of calls to 5 or something, and just error log and force_reset if it fails the 5th time.
  2. On an Ok we should be able to reuse the continuation, meaning that PooledContinuation should be such that we call the "try_reset" functionality, then if that succeeds we reuse the continuation.
  3. We should info or debug trace before and after doing continuation cleanups (ie we should instrument ExecutionState::cleanup)
  4. We need to take the panic hook before doing this, if not it will be invoked repeatedly.
  5. We don't wanna do cleanup if we're panicking. Just leak memory and get outta there (ie force_reset)

@sarsko
Contributor

sarsko commented Sep 17, 2025

Okay, never mind on the loop; as per Amanieu/corosensei#57, only the first call matters.

@dylanjwolff dylanjwolff force-pushed the corosensei-leak branch 2 times, most recently from a0e2074 to 40cc2a5, October 13, 2025 19:47
Comment thread shuttle/src/runtime/thread/continuation.rs Outdated
Ready, // has a suspended function in its cell; waiting for input about what to do next
Running, // currently inside a user-provided function
FinishedIteration, // has finished the previous function, can be initialized with a new one
Exited, // the internal coroutine has exited its loop and cannot receive new functions to execute
Contributor

Why do we need FinishedIteration and Exited?

Contributor Author

FinishedIteration is similar to NotReady, except that it has an old function in its cell, instead of no function at all.

Exited indicates that the coroutine has exited the inner loop and cannot be reused or resumed.

I think the new states make the state-machine easier to understand, even if they aren't strictly necessary (obviously they aren't because we got away with only three states before).

Comment thread shuttle/src/lib.rs
pub fn new() -> Self {
    Self {
-       stack_size: 0x8000,
+       stack_size: 0xf000,
Contributor

?

Contributor Author

In my testing, the corosensei coroutines were occasionally running out of stack space. I guess there is some slight constant memory-usage overhead compared to generators.

Contributor Author

Per the comment on the longer benchmarking time with backtraces, this is probably because corosensei produces deeper call stacks than generators.

Comment thread shuttle/src/runtime/thread/continuation.rs Outdated
}

impl Drop for ContinuationPool {
fn drop(&mut self) {
Contributor

Why is it fine to remove this?

Contributor Author

This "cheat" was necessary because of an internal implementation detail of the generators library. As far as I can see, corosensei does not internally use thread-locals in the same way. I think the "cheat" is a bit difficult to reason about (overloading a state, messing with drop behavior), so if it is not necessary I believe it should be removed.

@dylanjwolff dylanjwolff marked this pull request as ready for review October 13, 2025 21:54
@sarsko
Contributor

sarsko commented Oct 14, 2025

Benchmarks take a gargantuan amount of time. Not uncommon for them to take long, but they shouldn't time out?

@dylanjwolff
Contributor Author

dylanjwolff commented Oct 14, 2025

> Benchmarks take a gargantuan amount of time. Not uncommon for them to take long, but they shouldn't time out?

I reran the CI job last night with #213 and it finished successfully in 30m:

dylanjwolff#11

Corosensei is faster across the board.

Must be that capturing a backtrace from corosensei is more expensive (deeper stack).

@sarsko
Contributor

sarsko commented Oct 21, 2025

Could you rebase so that benchmarks get run? (I merged the backtrace fix.)

@sarsko sarsko enabled auto-merge (squash) October 23, 2025 22:00
@sarsko sarsko disabled auto-merge October 23, 2025 22:00
@sarsko sarsko merged commit 8ec3c44 into awslabs:main Oct 23, 2025
4 of 5 checks passed