diff --git a/docs/extended_api/memory_model.md b/docs/extended_api/memory_model.md
index e0e08a54c1..b00f37a860 100644
--- a/docs/extended_api/memory_model.md
+++ b/docs/extended_api/memory_model.md
@@ -12,6 +12,13 @@ It is low across threads within a block, but high across arbitrary threads in th
 To account for non-uniform thread synchronization costs that are not always low, CUDA C++ extends the standard C++ memory model and concurrency facilities in the `cuda::` namespace with **thread scopes**, retaining the syntax and semantics of standard C++ by default.
 
+## Asynchronous operations
+
+[Asynchronous operations], like the copy operations performed by [`memcpy_async`], are performed _as-if_ by new _asynchronous threads_.
+
+[Asynchronous operations]: extended_api/asynchronous_operations.md
+[`memcpy_async`]: extended_api/asynchronous_operations/memcpy_async.md
+
 ## Thread Scopes
 
 A _thread scope_ specifies the kind of threads that can synchronize with each other using synchronization primitive such as [`atomic`] or [`barrier`].
 
@@ -39,6 +46,7 @@ Each program thread is related to each other program thread by one or more threa
 - Each GPU thread is related to each other GPU thread in the same CUDA device by the *device* thread scope: `thread_scope_device`.
 - Each GPU thread is related to each other GPU thread in the same CUDA thread block by the *block* thread scope: `thread_scope_block`.
 - Each thread is related to itself by the `thread` thread scope: `thread_scope_thread`.
+- Each thread is related to each asynchronous thread that it creates by all scopes.
 
 ## Synchronization primitives
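
To illustrate the relationship the patch documents, here is a minimal sketch, assuming the widely documented `cuda::memcpy_async` + block-scoped `cuda::barrier` pattern; the kernel name, tile size, and buffer parameters are illustrative and not part of the patch.

```cuda
#include <cooperative_groups.h>
#include <cuda/barrier>

// Stage a tile of `src` through shared memory with an asynchronous copy.
// The copy is performed as-if by a new asynchronous thread; a block-scoped
// barrier can synchronize with it because the creating thread is related to
// that asynchronous thread by all thread scopes.
// Illustrative kernel: assumes a launch with exactly tile_size threads/block.
__global__ void stage_and_double(int* dst, const int* src)
{
    constexpr int tile_size = 256;                    // illustrative tile size
    __shared__ int tile[tile_size];
    __shared__ cuda::barrier<cuda::thread_scope_block> bar;

    auto block = cooperative_groups::this_thread_block();
    if (block.thread_rank() == 0) {
        init(&bar, block.size());                     // one thread initializes the barrier
    }
    block.sync();                                     // make the barrier visible to the block

    // Collectively launch the asynchronous copy and bind it to the barrier.
    cuda::memcpy_async(block, tile, src, sizeof(tile), bar);

    bar.arrive_and_wait();                            // wait for the asynchronous copy to finish

    // The shared-memory tile is now safe to read.
    dst[block.thread_rank()] = 2 * tile[block.thread_rank()];
}
```

The `thread_scope_block` barrier suffices here precisely because of the added bullet: the asynchronous thread performing the copy is related to its creating thread by all scopes, so no wider scope is needed to observe the copy's completion.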