Merged
14 changes: 14 additions & 0 deletions doc/modules/ROOT/pages/4.coroutines/4g.allocators.adoc
@@ -41,6 +41,20 @@ The "window" is the interval between setting the thread-local allocator and the

After the window closes (at the first suspension), the TLS allocator may be restored to a previous value. The task retains its captured allocator regardless.

== TLS Preservation

Between a coroutine's `await_resume` (which sets TLS to the correct allocator) and the next child coroutine invocation (whose `operator new` reads TLS), arbitrary user code runs. If that code resumes a coroutine from a different chain on the same thread -- by calling `.resume()` directly, pumping a completion queue, or running nested dispatch -- the other coroutine's `await_resume` overwrites TLS with its own allocator. The original coroutine's next child would then allocate from the wrong resource.

To prevent this, any code that calls `.resume()` on a coroutine handle must use `safe_resume` from `<boost/capy/ex/frame_allocator.hpp>`:

[source,cpp]
----
// In your event loop or dispatch path:
capy::safe_resume(h); // saves and restores TLS around h.resume()
----

`safe_resume` saves the current thread-local allocator, calls `h.resume()`, then restores the saved value. This makes TLS behave like a stack: nested resumes cannot spoil the outer value. All of Capy's built-in executors (`thread_pool`, strands, `blocking_context`) use `safe_resume` internally. Custom executor event loops must do the same -- see xref:7.examples/7n.custom-executor.adoc[Custom Executor] for an example.
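
The stack discipline can be demonstrated without coroutines at all. The following standalone sketch is not part of Capy -- `tls_alloc`, `safe_resume_like`, and `outer_value_survives` are illustrative names -- and models `safe_resume` as a save/call/restore around a callback that clobbers a thread-local value, just as a foreign chain's `await_resume` would:

[source,cpp]
----
#include <cassert>

// Standalone model of the protocol: tls_alloc stands in for the
// frame-allocator TLS, the callback stands in for h.resume().
thread_local int tls_alloc = 0;

template <class Resume>
void safe_resume_like(Resume resume)
{
    int saved = tls_alloc;   // save the caller's allocator
    resume();                // the resumed chain may overwrite TLS
    tls_alloc = saved;       // restore on return
}

bool outer_value_survives()
{
    tls_alloc = 1;                            // this chain's allocator
    safe_resume_like([] { tls_alloc = 2; });  // foreign chain clobbers TLS
    return tls_alloc == 1;                    // outer value restored
}
----

After `safe_resume_like` returns, the outer value is back in place, which is the guarantee `capy::safe_resume` provides for the frame-allocator TLS.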

== The FrameAllocator Concept

Custom allocators must satisfy the `FrameAllocator` concept, which is compatible with {cpp} allocator requirements:
5 changes: 4 additions & 1 deletion doc/modules/ROOT/pages/7.examples/7n.custom-executor.adoc
@@ -17,6 +17,7 @@ Implementing the Executor concept with a single-threaded run loop.
[source,cpp]
----
#include <boost/capy.hpp>
#include <boost/capy/ex/frame_allocator.hpp>
#include <iostream>
#include <queue>
#include <thread>
@@ -56,7 +57,7 @@ public:
{
auto h = queue_.front();
queue_.pop();
h.resume();
capy::safe_resume(h);
}
}

@@ -231,6 +232,8 @@ loop.run();

`run_async` enqueues the initial coroutine. `loop.run()` drains the queue, resuming coroutines one by one until all work completes. This is analogous to a GUI event loop or game tick loop.

Note that `run()` uses `capy::safe_resume(h)` instead of `h.resume()`. This saves and restores the thread-local frame allocator around each resumption, preventing coroutines from spoiling each other's allocator. All custom executor event loops must use `safe_resume` -- see xref:../4.coroutines/4g.allocators.adoc#_tls_preservation[TLS Preservation] for details.

== Output

----
33 changes: 33 additions & 0 deletions doc/modules/ROOT/pages/8.design/8k.Executor.adoc
@@ -326,6 +326,39 @@ ex_.post(cont_);

This pattern is identical across all three Corosio backends: epoll (Linux), IOCP (Windows), and select (POSIX fallback). The executor concept and `executor_ref` provide the abstraction that makes this possible. The backend-specific code deals with I/O readiness or completion notification. The executor-specific code deals with coroutine scheduling. The two concerns are cleanly separated.

== Frame Allocator Preservation

Capy propagates frame allocators via thread-local storage (see xref:../4.coroutines/4g.allocators.adoc#_thread_local_propagation[Thread-Local Propagation]). The TLS value is set in `await_resume` when a coroutine resumes and read in `operator new` when a child coroutine is created. Between these two points, the coroutine body executes arbitrary user code.

If that user code resumes a coroutine from a different chain on the same thread -- by calling `.resume()` directly, pumping a dispatch queue, or running nested event loop work -- the other coroutine's `await_resume` overwrites TLS. The original coroutine's next child then allocates from the wrong resource.

=== The Save/Restore Protocol

The fix is to save and restore TLS around every `.resume()` call:

[source,cpp]
----
inline void
safe_resume(std::coroutine_handle<> h) noexcept
{
auto* saved = get_current_frame_allocator();
h.resume();
set_current_frame_allocator(saved);
}
----

This makes TLS behave like a stack. Each nested resume pushes its own allocator; when the coroutine suspends and `.resume()` returns, the previous value is restored. The cost is two TLS accesses (one read, one write) per `.resume()` call -- negligible compared to the cost of resuming a coroutine.
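
The push/pop behavior across nested resumes can be illustrated with a standalone sketch. This has no Capy dependency -- `tls_id`, `safe_resume_like`, and `nesting_unwinds_in_order` are hypothetical names -- and shows that each level of nesting restores the value the level above it had set:

[source,cpp]
----
#include <cassert>

// Standalone model: nested save/restore gives the thread-local value
// push/pop semantics across nested "resumes".
thread_local int tls_id = 0;

template <class Body>
void safe_resume_like(Body body)
{
    int saved = tls_id;  // push
    body();              // nested chain may set its own value
    tls_id = saved;      // pop
}

bool nesting_unwinds_in_order()
{
    tls_id = 1;                                // outermost chain
    bool middle_restored = false;
    safe_resume_like([&] {
        tls_id = 2;                            // middle chain
        safe_resume_like([] { tls_id = 3; });  // innermost chain
        middle_restored = (tls_id == 2);       // inner pop restored 2
    });
    return middle_restored && tls_id == 1;     // outer pop restored 1
}
----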

=== Where It Applies

All executor event loops and strand dispatch loops must use `safe_resume` instead of calling `.resume()` directly. Capy's `thread_pool`, `blocking_context`, and `strand_queue` all use it internally.

Two `.resume()` call sites intentionally do _not_ use `safe_resume`:

* **`symmetric_transfer`** (MSVC workaround). The calling coroutine is about to suspend unconditionally. When it later resumes, `await_resume` restores TLS from the promise's stored environment. Save/restore would add overhead with no benefit.

* **`run_async_wrapper::operator()`**. TLS is already saved in the wrapper's constructor and restored in its destructor, which bracket the entire task lifetime.

== Why Not `std::execution` (P2300)

https://wg21.link/P2300[P2300] defines a sender/receiver model where execution context flows _backward_ from receiver to sender via queries after `connect()`:
4 changes: 3 additions & 1 deletion example/custom-executor/custom_executor.cpp
@@ -16,6 +16,7 @@
//

#include <boost/capy.hpp>
#include <boost/capy/ex/frame_allocator.hpp>
#include <iostream>
#include <queue>
#include <thread>
@@ -56,7 +57,7 @@ class run_loop : public capy::execution_context
{
auto h = queue_.front();
queue_.pop();
h.resume();
capy::safe_resume(h);
}
}

11 changes: 11 additions & 0 deletions include/boost/capy/concept/executor.hpp
@@ -118,6 +118,17 @@ class execution_context;
`post`, the continuation is enqueued and the lifetime
requirement applies.

@par Frame Allocator TLS

The library propagates a frame allocator via thread-local
storage. When a custom executor's event loop calls
`.resume()` to drain its work queue, it must use
`safe_resume()` from `<boost/capy/ex/frame_allocator.hpp>`
instead of calling `h.resume()` directly. This saves and
restores the thread-local frame allocator around the call,
preventing a resumed coroutine from permanently overwriting
the caller's value.

@par Executor Validity

An executor becomes invalid when the first call to
3 changes: 3 additions & 0 deletions include/boost/capy/detail/await_suspend_helper.hpp
@@ -54,6 +54,9 @@ namespace detail {
#if BOOST_CAPY_WORKAROUND(_MSC_VER, >= 1)
inline void symmetric_transfer(std::coroutine_handle<> h) noexcept
{
// safe_resume is not needed here: the calling coroutine is
// about to suspend unconditionally. When it later resumes,
// await_resume restores TLS from the promise's environment.
h.resume();
}
#else
34 changes: 34 additions & 0 deletions include/boost/capy/ex/frame_allocator.hpp
@@ -12,6 +12,7 @@

#include <boost/capy/detail/config.hpp>

#include <coroutine>
#include <memory_resource>

/* Design rationale (pdimov):
@@ -109,6 +110,39 @@ set_current_frame_allocator(
detail::current_frame_allocator_ref() = mr;
}

/** Resume a coroutine handle with frame-allocator TLS protection.

Saves the current thread-local frame allocator before
calling `h.resume()`, then restores it after the call
returns. This prevents a resumed coroutine's
`await_resume` from permanently overwriting the caller's
allocator value.

Between a coroutine's resumption and its next child
invocation, arbitrary user code may run. If that code
resumes a coroutine from a different chain on this
thread, the other coroutine's `await_resume` overwrites
TLS with its own allocator. Without save/restore, the
original coroutine's next child would allocate from
the wrong resource.

Event loops, strand dispatch loops, and any code that
calls `.resume()` on a coroutine handle should use
this function instead of calling `.resume()` directly.
See the @ref Executor concept documentation for details.

@param h The coroutine handle to resume.

@see get_current_frame_allocator, set_current_frame_allocator
*/
inline void
safe_resume(std::coroutine_handle<> h) noexcept
{
auto* saved = get_current_frame_allocator();
h.resume();
set_current_frame_allocator(saved);
}

} // namespace capy
} // namespace boost

4 changes: 3 additions & 1 deletion include/boost/capy/ex/run_async.hpp
@@ -411,7 +411,9 @@ class [[nodiscard]] run_async_wrapper
p.env_ = {p.wg_.executor(), st_, p.get_resource()};
task_promise.set_environment(&p.env_);

// Start task through executor
// Start task through executor.
// safe_resume is not needed here: TLS is already saved in the
// constructor (saved_tls_) and restored in the destructor.
p.task_cont_.h = task_h;
p.wg_.executor().dispatch(p.task_cont_).resume();
}
7 changes: 4 additions & 3 deletions src/ex/detail/strand_queue.hpp
@@ -11,6 +11,7 @@
#define BOOST_CAPY_SRC_EX_DETAIL_STRAND_QUEUE_HPP

#include <boost/capy/detail/config.hpp>
#include <boost/capy/ex/frame_allocator.hpp>

#include <coroutine>
#include <cstddef>
@@ -128,7 +129,7 @@ class strand_queue
std::coroutine_handle<void> target)
{
(void)q;
target.resume();
safe_resume(target);
co_return;
}

@@ -220,7 +221,7 @@ class strand_queue
tail_ = nullptr;

auto h = std::coroutine_handle<promise_type>::from_promise(*p);
h.resume();
safe_resume(h);
h.destroy();
}
}
@@ -265,7 +266,7 @@
batch.head = p->next;

auto h = std::coroutine_handle<promise_type>::from_promise(*p);
h.resume();
safe_resume(h);
// Don't use h.destroy() - it would call operator delete which
// accesses the queue's free_list_ (race with push).
// Instead, manually free the frame without recycling.
3 changes: 2 additions & 1 deletion src/ex/thread_pool.cpp
@@ -10,6 +10,7 @@

#include <boost/capy/ex/thread_pool.hpp>
#include <boost/capy/continuation.hpp>
#include <boost/capy/ex/frame_allocator.hpp>
#include <boost/capy/test/thread_name.hpp>
#include <algorithm>
#include <atomic>
@@ -226,7 +227,7 @@ class thread_pool::impl
c = pop();
}
if(c)
c->h.resume();
safe_resume(c->h);
}
}
};
3 changes: 2 additions & 1 deletion src/test/run_blocking.cpp
@@ -9,6 +9,7 @@

#include <boost/capy/test/run_blocking.hpp>

#include <boost/capy/ex/frame_allocator.hpp>
#include <condition_variable>
#include <mutex>
#include <queue>
@@ -76,7 +77,7 @@ blocking_context::run()
h = impl_->queue.front();
impl_->queue.pop();
}
h.resume();
safe_resume(h);
}
if(impl_->ep)
std::rethrow_exception(impl_->ep);
8 changes: 8 additions & 0 deletions test/unit/ex/frame_allocator.cpp
@@ -78,6 +78,14 @@ TLS restoration on resume:
After awaiting a child, the parent's TLS may have been changed by the child.
transform_awaiter::await_resume restores parent's allocator from its promise.

Event loops must use safe_resume:
Between a coroutine's await_resume (which sets TLS) and the next child
invocation (whose operator new reads TLS), arbitrary user code runs. If
that code resumes a coroutine from a different chain on the same thread,
the other coroutine's await_resume overwrites TLS. Event loops, strand
dispatch loops, and any code that calls .resume() must use safe_resume()
to save and restore TLS around the call.

memory_resource* lifetime:
When passing memory_resource* directly, the user is responsible for ensuring
it outlives all tasks. This matches std::pmr conventions.