
Commit e763642

Final small tweaks to anatomy_of_a_backend.md before release v0.5
1 parent 16ac352 commit e763642

1 file changed: +2 -3 lines changed

arrayjit/lib/anatomy_of_a_backend.md

Lines changed: 2 additions & 3 deletions
@@ -42,8 +42,7 @@ The modules and files of `arrayjit` can loosely be divided into three parts.
 - The functor `Raise_backend` converts any backend implementation relying on the `Low_level` representation (all backends currently) to match the user-facing `Backend_intf.Backend` interface (which relies on the high-level `Assignments` representation).
 - The functor `Add_buffer_retrieval_and_syncing` (used by `Raise_backend`) converts (array pointer) `buffer_ptr`-level copying operations to the tensor node level, and adds per-tensor-node stream-to-stream synchronization.
 - Putting the above together with the device-specific implementations, and exposing the resulting modules to the user via backend names.
-- It also exposes a couple of backend-generic functions:
-  - `reinitialize` a backend,
+- It also exposes backend-generic functions, currently just one:
   - `finalize` a context (freeing all of its arrays that don't come from its parent context).
 
 ### Batch compilation; in the future: lazy and cached compilation artifacts
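
To make the functor layering in the hunk above concrete, here is a minimal, self-contained OCaml sketch. The module types, the `compile*` signatures, and the string-based `lower` stand-in are illustrative assumptions, not OCANNL's actual interfaces; only the name `Raise_backend` and the direction of the translation (high-level `Assignments` down to `Low_level`) come from the text.

```ocaml
(* Hypothetical, simplified stand-ins -- not OCANNL's real signatures. *)
module type Lowered_backend = sig
  type context
  (* A backend that only understands the Low_level representation. *)
  val compile_low_level : context -> low_level:string -> context
end

module type User_backend = sig
  type context
  (* The user-facing interface works with the high-level Assignments form. *)
  val compile : context -> assignments:string -> context
end

(* Raise_backend-style functor: lower Assignments once, then delegate. *)
module Raise_backend (B : Lowered_backend) :
  User_backend with type context = B.context = struct
  type context = B.context

  (* Stand-in for the real Assignments-to-Low_level lowering. *)
  let lower assignments = "lowered: " ^ assignments

  let compile ctx ~assignments =
    B.compile_low_level ctx ~low_level:(lower assignments)
end
```

A hypothetical device-specific module would then be exposed as, e.g., `module Cuda = Raise_backend (Cuda_lowered)`, which is the "putting the above together ... via backend names" step.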
@@ -187,7 +186,7 @@ Besides routines, calling `from_host`, `to_host`, `device_to_device` from a back
 
 ### Data transfers
 
-OCANNL supports asynchronous data transfers by embedding them in the scheduling mechanism.
+OCANNL supports asynchronous data transfers -- `from_host`, `to_host`, `device_to_device` -- by embedding them in the scheduling mechanism. The transfers themselves synchronize streams in a non-blocking way -- when it's time for the destination stream to copy a node, it waits for the source stream to finish computing the node.
 
 OCANNL provides explicit _merge buffers_ for performing those tensor node updates, where different versions of a tensor node from two streams feature in the same computation. The `%cd` syntax for using merge buffers is via the `.merge` pseudo-field. For example, the code for merging gradients might be: `[%cd p.grad =+ p.grad.merge]`. In the current design, there's at most one merge buffer per stream, and the memory is reused for merging different nodes. We keep track of the specific tensor node that was scheduled to occupy this buffer in the stream, and the merge node expected by the linked code, so that we can detect mismatches at scheduling time.

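The merge-buffer bookkeeping from the second hunk can be sketched similarly. The record fields and the two functions below are simplified assumptions for illustration, not OCANNL's API; the only behaviour taken from the text is: one merge buffer per stream, reused across nodes, with the occupant recorded at transfer time and checked against the node the linked code expects at scheduling time.

```ocaml
type tensor_node = { id : int; label : string }

(* A stream remembers which node currently occupies its single, reused
   merge buffer. *)
type stream = { mutable merge_buffer_node : tensor_node option }

(* device_to_device-style transfer into [dst]'s merge buffer.  In the real
   design, the destination stream would here wait (non-blockingly) for the
   source stream to finish computing [node] before copying it. *)
let transfer_into_merge_buffer ~(dst : stream) ~(node : tensor_node) =
  dst.merge_buffer_node <- Some node

(* Scheduling a routine whose code reads [expected] via the .merge
   pseudo-field, e.g. [%cd p.grad =+ p.grad.merge]: detect mismatches
   between the scheduled occupant and the expected node. *)
let schedule_with_merge ~(dst : stream) ~(expected : tensor_node) task =
  match dst.merge_buffer_node with
  | Some occupant when occupant.id = expected.id -> task ()
  | Some occupant ->
      invalid_arg
        (Printf.sprintf "merge buffer holds %s but the code expects %s"
           occupant.label expected.label)
  | None -> invalid_arg "merge buffer is empty at scheduling time"
```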