From 47f024ca415f9b025511f7e0318636761577297a Mon Sep 17 00:00:00 2001
From: Eric Niebler
Date: Sun, 24 May 2026 15:52:39 -0700
Subject: [PATCH] give stdexec a more professional and useful README

---
 README.md | 397 +++++++++++++++++++++++++++---------------------------
 1 file changed, 201 insertions(+), 196 deletions(-)

diff --git a/README.md b/README.md
index 7d747efb9..aa7cafbc1 100644
--- a/README.md
+++ b/README.md
@@ -1,273 +1,278 @@
-# Senders - A Standard Model for Asynchronous Execution in C++
+# stdexec - Senders for C++

-`stdexec` is an experimental reference implementation of the _Senders_ model of
-asynchronous programming proposed by [**P2300 -
-`std::execution`**](http://wg21.link/p2300) and accepted into the C++26 Standard.
-
-**Purpose of this Repository:**
-
-1. Provide a reference implementation for the C++26 additions to `std::execution`.
-
-2. Get usage experience with the C++26 `std::execution` additions, as well as with
-   not-yet-proposed extensions to `std::execution`.
-
-3. Provide NVIDIA-specific extensions for running code on NVIDIA GPUs using standard C++.
-
-4. Collaborate with those interested in extending `std::execution` for C++29.
-
-## Disclaimer
-
-`stdexec` is experimental in nature and subject to change without warning.
-The authors and NVIDIA do not guarantee that this code is fit for any purpose whatsoever.
+**A reference implementation of `std::execution` ([\[exec\]](https://wg21.link/exec)), the C++26 model for asynchronous and parallel programming.**

 [![CI (CPU)](https://github.com/NVIDIA/stdexec/actions/workflows/ci.cpu.yml/badge.svg)](https://github.com/NVIDIA/stdexec/actions/workflows/ci.cpu.yml)
 [![CI (GPU)](https://github.com/NVIDIA/stdexec/actions/workflows/ci.gpu.yml/badge.svg)](https://github.com/NVIDIA/stdexec/actions/workflows/ci.gpu.yml)
+[![License](https://img.shields.io/badge/license-Apache%202.0%20with%20LLVM--exception-blue.svg)](LICENSE.txt)
+[![C++](https://img.shields.io/badge/C%2B%2B-20%2B-blue.svg)](https://en.cppreference.com/w/cpp/compiler_support)
+[![Try on Godbolt](https://img.shields.io/badge/try-godbolt-orange.svg)](https://godbolt.org/z/zjjvWoWPW)
+[![Documentation](https://img.shields.io/badge/docs-nvidia.github.io%2Fstdexec-blue.svg)](https://nvidia.github.io/stdexec)
+
+`stdexec` lets you express asynchronous work as composable, lazy *sender* pipelines that can run on threads, thread pools, GPUs, or any custom execution context - with structured concurrency guarantees.
+
+> [!WARNING]
+> `stdexec` is experimental and tracks an evolving standard. APIs may change without notice. NVIDIA does not guarantee fitness for any particular purpose.
+
+## Table of contents
+
+- [Example](#example)
+- [Features](#features)
+- [Compiler support](#compiler-support)
+- [Installation](#installation)
+- [Quick start](#quick-start)
+- [GPU support](#gpu-support)
+- [Examples gallery](#examples-gallery)
+- [Documentation](#documentation)
+- [Building tests and examples](#building-tests-and-examples)
+- [IDE support](#ide-support)
+- [Resources](#resources)
+- [Contributing](#contributing)
+- [Citation](#citation)
+- [License](#license)

 ## Example

-Below is a simple program that executes three senders concurrently on a thread pool.
-Try it live on [godbolt!](https://godbolt.org/z/8MqaKEEfT).
+Run three pieces of work concurrently on the system thread pool. Try it live on [godbolt](https://godbolt.org/z/zjjvWoWPW).

 ```c++
 #include
-#include
-
-int main()
-{
-    // Declare a pool of 3 worker threads:
-    exec::static_thread_pool pool(3);
-
-    // Get a handle to the thread pool:
-    auto sched = pool.get_scheduler();
-
-    // Describe some work:
-    // Creates 3 sender pipelines that are executed concurrently by passing to `when_all`
-    // Each sender is scheduled on `sched` using `on` and starts with `just(n)` that creates a
-    // Sender that just forwards `n` to the next sender.
-    // After `just(n)`, we chain `then(fun)` which invokes `fun` using the value provided from `just()`
-    // Note: No work actually happens here. Everything is lazy and `work` is just an object that statically
-    // represents the work to later be executed
-    auto fun = [](int i) { return i*i; };
-    auto work = stdexec::when_all(
-        stdexec::starts_on(sched, stdexec::just(0) | stdexec::then(fun)),
-        stdexec::starts_on(sched, stdexec::just(1) | stdexec::then(fun)),
-        stdexec::starts_on(sched, stdexec::just(2) | stdexec::then(fun))
-    );
-
-    // Launch the work and wait for the result
-    auto [i, j, k] = stdexec::sync_wait(std::move(work)).value();
-
-    // Prints "0 1 4":
-    std::printf("%d %d %d\n", i, j, k);
-}
-```
-
-## Structure
+#include

-This library is header-only, so all the source code can be found in the `include/` directory. The physical and logical structure of the code can be summarized by the following table:
+namespace ex = stdexec;

-| Kind | Path | Namespace |
-|------|------|-----------|
-| Things approved for the C++ standard | `` | `::stdexec` |
-| Generic additions and extensions | `` | `::exec` |
-| NVIDIA-specific extensions and customizations | <nvexec/...> | ::nvexec |
-| | |
+int main() {
+  auto sched = ex::get_parallel_scheduler();
+  auto fun = [](int i) { return i * i; };

-## How to get `stdexec`
+  // Build a lazy pipeline: three squares, computed in parallel.
+  auto work = ex::when_all(ex::on(sched, ex::just(0) | ex::then(fun)),
+                           ex::on(sched, ex::just(1) | ex::then(fun)),
+                           ex::on(sched, ex::just(2) | ex::then(fun)));

-There are a few ways to get `stdexec`:
-1. Clone from GitHub
-   - `git clone https://github.com/NVIDIA/stdexec.git`
-
-2. Download the [NVIDIA HPC SDK starting with
-   22.11](https://developer.nvidia.com/nvidia-hpc-sdk-releases)
-
-3. (Recommended) Use [CMake Package Manager (CPM)](https://github.com/cpm-cmake/CPM.cmake)
-   to automatically pull `stdexec` as part of your CMake project. [See
-   below](#cmake-package-manager-cpm) for more information.
+  // Launch the work and wait for the result.
+  auto [i, j, k] = ex::sync_wait(std::move(work)).value();
+  std::printf("%d %d %d\n", i, j, k); // prints "0 1 4"
+}
+```

-You can also try it directly on [godbolt.org](https://godbolt.org/z/acaE93xq3) where it is
-available as a C++ library or via the nvc++ compiler starting with version 22.11 ([see
-below](#nvhpc-sdk) for more details).
+## Features

-## Using `stdexec`
+- **C++26 reference implementation** of `std::execution` (P2300).
+- **Header-only**, no external dependencies.
+- **Composable algorithms**: `then`, `let_value`, `when_all`, `bulk`, `split`, `transfer`, `upon_*`, ...
+- **Structured concurrency primitives**: `async_scope`, `task`, `finally`, `when_any`, `repeat_n`, ...
+- **Pluggable schedulers**: system parallel scheduler, static thread pool, Linux `io_uring` context, NVIDIA GPU contexts, your own.
+- **GPU offload** via `nvexec` schedulers (`nvc++` compiler).
+- **Coroutine interop**: senders are awaitable; awaitables are senders.
+- **Generic extensions** (``) for primitives not (yet) in the standard.
-### Requirements
+## Compiler support

-`stdexec` requires compiling with C++20 (`-std=c++20`) but otherwise does not have any
-dependencies and only requires a sufficiently new compiler:
+| Compiler | Minimum version | Notes |
+|---|---|---|
+| GCC | 12 | |
+| Clang | 16 | |
+| MSVC | 14.43 | |
+| Xcode (Apple Clang) | 16 | |
+| nvc++ | 25.9 | required for [GPU support](#gpu-support) |

-- gcc 12+
-- clang 16+
-- MSVC 14.43+
-- XCode 16+
-- [nvc++ 25.9+](https://developer.nvidia.com/nvidia-hpc-sdk-releases) (required for [GPU
-  support](#gpu-support)).
+Requires `-std=c++20` or later.

 > [!NOTE]
-> `stdexec` does not yet work with NVIDIA's nvcc compiler.
-
-How you configure your environment to use `stdexec` depends on how you got `stdexec`.
+> `stdexec` does not yet support NVIDIA's `nvcc` compiler.

-### NVHPC SDK
+## Installation

-Starting with the 22.11 release of the [NVHPC
-SDK](https://developer.nvidia.com/nvidia-hpc-sdk-releases), `stdexec` is available as an
-experimental, opt-in feature. Specifying the `--experimental-stdpar` flag to `nvc++` makes
-the `stdexec` headers available on the include path. You can then include any `stdexec`
-header as normal: `#include `, `#include `. See [godbolt
-example](https://godbolt.org/z/qc1h3sqEv).
+Pick whichever fits your project.

-GPU features additionally require specifying `-stdpar=gpu`. For more details, see [GPU
-Support](#gpu-support).
+### CPM (recommended)

-### GitHub
+[CPM](https://github.com/cpm-cmake/CPM.cmake) fetches and configures `stdexec` automatically from your `CMakeLists.txt`:

-As a (mostly) header-only C++ library, technically all one needs to do is add the
-`stdexec` `include/` directory to your include path as `-I/include` in
-addition to specifying any necessary compile options.
+```cmake
+CPMAddPackage(
+  NAME stdexec
+  GITHUB_REPOSITORY NVIDIA/stdexec
+  GIT_TAG main # or a specific tag
+)

-For simplicity, we recommend using the [CMake targets](#cmake) that `stdexec` provides as
-they encapsulate the necessary configuration.
+target_link_libraries(my_target PRIVATE STDEXEC::stdexec)
+```

-#### cmake
+### `add_subdirectory`

-If your project uses CMake, then after cloning `stdexec` simply add the following to your
-`CMakeLists.txt`:
+Clone alongside your project and add it as a subdirectory:

+```bash
+git clone https://github.com/NVIDIA/stdexec.git
 ```
-add_subdirectory()
+
+```cmake
+add_subdirectory(stdexec)
+target_link_libraries(my_target PRIVATE STDEXEC::stdexec)
 ```

-This will make the `STDEXEC::stdexec` target available to link with your project:
+### Conan

-```
-target_link_libraries(my_project PRIVATE STDEXEC::stdexec)
-```
+A [`conanfile.py`](conanfile.py) is provided for use with the [Conan](https://conan.io) package manager.

-This target encapsulates all of the necessary configuration and compiler flags for using
-`stdexec`.
+### NVIDIA HPC SDK

+Starting with [NVHPC SDK 22.11](https://developer.nvidia.com/nvidia-hpc-sdk-releases), `stdexec` is bundled with `nvc++`. Pass `--experimental-stdpar` to put `stdexec` headers on the include path. Add `-stdpar=gpu` for GPU features. See the [godbolt example](https://godbolt.org/z/qc1h3sqEv).

-#### CMake Package Manager (CPM)
+### Manual include path

-To further simplify obtaining and including `stdexec` in your CMake project, we recommend
-using [CMake Package Manager (CPM)](https://github.com/cpm-cmake/CPM.cmake) to fetch and
-configure `stdexec`.
+`stdexec` is header-only, so adding `-I/include` to your compile command is sufficient. Using the CMake target is recommended because it sets the required compile flags.
-Complete example:
+## Quick start

-```
-cmake_minimum_required(VERSION 3.25.0 FATAL_ERROR)
+A minimal `CMakeLists.txt` using CPM:

-project(stdexecExample)
+```cmake
+cmake_minimum_required(VERSION 3.25.0)
+project(stdexec_example LANGUAGES CXX)

-# Get CPM. For more information on how to add CPM to your project, see:
-# https://github.com/cpm-cmake/CPM.cmake#adding-cpm
-include(CPM.cmake)
+include(CPM.cmake) # see https://github.com/cpm-cmake/CPM.cmake#adding-cpm

 CPMAddPackage(
   NAME stdexec
   GITHUB_REPOSITORY NVIDIA/stdexec
-  GIT_TAG main # This will always pull the latest code from the `main` branch.
-               # You may also use a specific release version or tag.
+  GIT_TAG main
 )

-add_executable(main example.cpp)
-
-target_link_libraries(main STDEXEC::stdexec)
+add_executable(example example.cpp)
+target_link_libraries(example PRIVATE STDEXEC::stdexec)
 ```

-### GPU Support
+## GPU support
+
+`stdexec` ships GPU schedulers in [``](include/nvexec/) for use with `nvc++ -stdpar=gpu`:

-`stdexec` provides schedulers that enable execution on NVIDIA GPUs:
+| Scheduler | Header | Description |
+|---|---|---|
+| `nvexec::stream_scheduler` | [``](include/nvexec/stream_context.cuh) | Single-GPU scheduler (device 0). |
+| `nvexec::multi_gpu_stream_scheduler` | [``](include/nvexec/multi_gpu_context.cuh) | Multi-GPU scheduler across all visible devices. |

-- `nvexec::stream_scheduler`
-  - Single GPU scheduler that executes on the first available GPU (device 0)
-  - Defined in
-    [``](https://github.com/NVIDIA/stdexec/blob/main/include/nvexec/stream_context.cuh)
+Live example:

-- `nvexec::multi_gpu_stream_scheduler`
-  - Executes on all visible GPUs
-  - Defined in
-    [``](https://github.com/NVIDIA/stdexec/blob/main/include/nvexec/multi_gpu_context.cuh)
+## Examples gallery

-These schedulers are only supported when using the `nvc++` compiler with `-stdpar=gpu`.
+The [`examples/`](examples/) directory contains runnable programs demonstrating the library.
-Example: https://godbolt.org/z/h7rh5qGhj
+| Example | What it shows |
+|---|---|
+| [`hello_world.cpp`](examples/hello_world.cpp) | The "hello world" of senders. |
+| [`hello_coro.cpp`](examples/hello_coro.cpp) | Awaiting a sender from a coroutine. |
+| [`then.cpp`](examples/then.cpp) | Writing a `then` algorithm from scratch. |
+| [`retry.cpp`](examples/retry.cpp) | Writing a `retry` algorithm from scratch. |
+| [`scope.cpp`](examples/scope.cpp) | Structured concurrency with `async_scope`. |
+| [`io_uring.cpp`](examples/io_uring.cpp) | Async I/O via the Linux `io_uring` context. |
+| [`sudoku.cpp`](examples/sudoku.cpp) | A parallel sudoku solver. |
+| [`server_theme/`](examples/server_theme/) | Server-style patterns (`let_value`, `split`, `bulk`, `transfer`). |
+| [`nvexec/`](examples/nvexec/) | GPU schedulers, including the Maxwell solver. |

-## Building
+## Documentation

-`stdexec` is a header-only library and does not require building anything.
+**📖 Full documentation: **

-This section is only relevant if you wish to build the `stdexec` tests or examples.
+- **User guide**: ([source](docs/source/user/))
+- **Reference**: ([source](docs/source/reference/))
+- **Developer docs**: ([source](docs/source/developer/))
+- **Contributing to docs**: [`docs/CONTRIBUTING-docs.md`](docs/CONTRIBUTING-docs.md)
+- **The proposal**: [`[exec]` - `std::execution`](https://wg21.link/exec)

-The following tools are needed:
+The library is organized into three namespaces:

-* [`CMake`](https://cmake.org/)
-* One of the following supported C++ compilers:
-  * GCC 11+
-  * clang 12+
-  * nvc++ 25.9
+| Namespace | Headers | Contents |
+|---|---|---|
+| `::stdexec` | `` | Things in (or proposed for) the C++ standard. |
+| `::exec` | `` | Generic additions and extensions. |
+| `::nvexec` | `` | NVIDIA-specific schedulers and customizations. |

-Perform the following actions:
+## Building tests and examples
+
+The library itself is header-only - these steps are only needed if you want to build the test suite or the examples.

 ```bash
-# Configure the project
-cmake -S . -B build -G
-# Build the project
+cmake -S . -B build -G Ninja
 cmake --build build
+ctest --test-dir build
 ```

-Here, `` can be `Ninja`, `"Unix Makefiles"`, `XCode`, `"Visual Studio 15 Win64"`, etc.
+To select a specific compiler:

-### Specifying the compiler
+```bash
+cmake -S . -B build/clang -DCMAKE_CXX_COMPILER=$(which clang++)
+cmake --build build/clang
+```

-You can set the C++ compiler via `-D CMAKE_CXX_COMPILER`:
+To use `libc++` with Clang:

 ```bash
-# Use GCC:
-cmake -S . -B build/g++ -DCMAKE_CXX_COMPILER=$(which g++)
-cmake --build build/g++
-
-# Or clang:
-cmake -S . -B build/clang++ -DCMAKE_CXX_COMPILER=$(which clang++)
-cmake --build build/clang++
+cmake -S . -B build/libcxx \
+  -DCMAKE_CXX_COMPILER=$(which clang++) \
+  -DCMAKE_CXX_FLAGS=-stdlib=libc++
+cmake --build build/libcxx
 ```

-### Specifying the stdlib
+## IDE support

-If you want to use `libc++` with clang instead of `libstdc++`, you can specify the standard library as follows:
+A [VSCode extension](https://marketplace.visualstudio.com/items?itemName=ericniebler.erics-build-output-colorizer) is available that colorizes compiler diagnostics from `stdexec`, making the long template error messages much easier to read. Source and configuration: .

-```bash
-# Do the actual build
-cmake -S . -B build/clang++ -G \
-    -DCMAKE_CXX_FLAGS=-stdlib=libc++ \
-    -DCMAKE_CXX_COMPILER=$(which clang++)
+## Resources
+
+### Standards papers
+
+- [P2300 - `std::execution`](https://wg21.link/p2300) - the proposal accepted into C++26.
+
+### Talks
+
+- [Working with Asynchrony Generically: A Tour of Executors](https://www.youtube.com/watch?v=xLboNIf7BTg) ([Part 2](https://www.youtube.com/watch?v=6a0zzUBUNW4)) - comprehensive introduction.
+- [From Zero to Sender/Receiver in ~60 Minutes](https://www.youtube.com/watch?v=xiaqNvqRB2E) - live-coding a toy sender/receiver from scratch.
+- [A Unifying Abstraction for Async in C++](https://www.youtube.com/watch?v=h-ExnuD6jms) - concepts behind P2300.
+- [Structured Concurrency](https://www.youtube.com/watch?v=1Wy5sq3s2rg) - what structured concurrency means and why.
+- [Structured Networking in C++](https://www.youtube.com/watch?v=XaNajUp-sGY) - what a P2300-style networking library could look like.
+
+### Articles and blog posts

-cmake --build build/clang++
+- [What are Senders Good For, Anyway?](https://ericniebler.com/2024/02/04/what-are-senders-good-for-anyway/) - wrapping a C-style async API in a sender.
+- [A Universal Async Abstraction for C++](https://cor3ntin.github.io/posts/executors/) - an introduction to senders.
+- [A Universal I/O Abstraction for C++](https://cor3ntin.github.io/posts/iouring/) - senders meet `io_uring`.
+- [Executors: a Change of Perspective](https://accu.org/journals/overload/29/165/teodorescu/) - on the computational completeness of senders.
+- [Structured Concurrency in C++](https://accu.org/journals/overload/30/168/teodorescu/) - how senders manifest structured concurrency.
+- [HPCWire: New C++ Sender Library Enables Portable Asynchrony](https://www.hpcwire.com/2022/12/05/new-c-sender-library-enables-portable-asynchrony/).
+
+### NVIDIA
+
+- [NVIDIA HPC SDK documentation](https://docs.nvidia.com/hpc-sdk/index.html).
+
+## Contributing
+
+Contributions are welcome. Before opening a PR, please review:
+
+- [`CODE_OF_CONDUCT.md`](CODE_OF_CONDUCT.md)
+- [`MAINTAINERS.md`](MAINTAINERS.md)
+- [`docs/CONTRIBUTING-docs.md`](docs/CONTRIBUTING-docs.md) for documentation contributions.
+
+Bug reports and feature requests belong in [GitHub Issues](https://github.com/NVIDIA/stdexec/issues); design discussion in [GitHub Discussions](https://github.com/NVIDIA/stdexec/discussions).
+
+## Citation
+
+If you reference `stdexec` in academic work, please cite the standards proposal:
+
+```bibtex
+@techreport{P2300,
+  author = {Niebler, Eric and Shoop, Kirk and Baker, Lewis and Dominiak, Michał and
+            Evtushenko, Georgy and Teodorescu, Lucian Radu and Howes, Lee and Garland,
+            Michael and Lelbach, Bryce Adelstein},
+  title = {{P2300R10}: \texttt{std::execution}},
+  institution = {ISO/IEC JTC1/SC22/WG21},
+  year = {2024},
+  url = {https://wg21.link/p2300}
+}
 ```

-## Resources
-- [Working with Asynchrony Generically: A Tour of Executors: Part 1](https://www.youtube.com/watch?v=xLboNIf7BTg) ([Part 2](https://www.youtube.com/watch?v=6a0zzUBUNW4)) (Video): A comprehensive introduction to Senders and structured concurrency
-- [What are Senders Good For, Anyway?](https://ericniebler.com/2024/02/04/what-are-senders-good-for-anyway/) (Blog): Demonstrates the value of a standard async programming model by wrapping a C-style async API in a sender
-- [From Zero to Sender/Receiver in ~60 Minutes](https://www.youtube.com/watch?v=xiaqNvqRB2E) (Video): Live-coding a toy sender/receiver implementation from scratch
-- [A Unifying Abstraction for Async in C++](https://www.youtube.com/watch?v=h-ExnuD6jms) (Video): A simple introduction to the concepts behind P2300
-- [A Universal Async Abstraction for C++](https://cor3ntin.github.io/posts/executors/) (Blog): An introduction to Senders
-- [A Universal I/O Abstraction for C++](https://cor3ntin.github.io/posts/iouring/) (Blog): A look at how the Senders concepts interact with `io_uring` on Linux
-- [Structured Concurrency](https://www.youtube.com/watch?v=1Wy5sq3s2rg) (Video): An explanation of structured concurrency in C++ and its benefits
-- [Executors: a Change of Perspective](https://accu.org/journals/overload/29/165/teodorescu/) (Article): An article about the computational completeness of Senders
-- [Structured Concurrency in C++](https://accu.org/journals/overload/30/168/teodorescu/) (Article): An article about how Senders manifest the principles of structured concurrency
-- [Structured Networking in C++](https://www.youtube.com/watch?v=XaNajUp-sGY) (Video): A look at what a P2300-style networking library could look like
-- [HPCWire Article](https://www.hpcwire.com/2022/12/05/new-c-sender-library-enables-portable-asynchrony/): Provides a high-level overview of the Sender model and its benefits
-- [NVIDIA HPC SDK Documentation](https://docs.nvidia.com/hpc-sdk/index.html): Documentation for the NVIDIA HPC SDK
-- [P2300 - `std::execution`](https://wg21.link/p2300): Senders proposal to C++ Standard
-
-### Tooling
-
-For users of **VSCode**, stdexec provides a
-[VSCode extension](https://marketplace.visualstudio.com/items?itemName=ericniebler.erics-build-output-colorizer)
-that colorizes compiler output. The highlighter recognizes the diagnostics
-generated by the stdexec library, styling them to make them easier to pick
-out. Details about how to configure the extension can be found
-[here](https://github.com/ericniebler/buildoutputcolorizer).
+## License
+
+`stdexec` is licensed under the **Apache License 2.0 with LLVM Exceptions**. See [LICENSE.txt](LICENSE.txt) for the full text.