`arrayjit/lib/anatomy_of_a_backend.md` (10 additions & 3 deletions)
In the future, when we introduce program search, `compile` functions will return …

OCANNL classifies tensor nodes according to their memory properties:

```ocaml
(** A possible algorithm for deciding sharing within a single device:

    - If a tensor node is read-only for a context, and not otherwise recorded, it is stored as a
      cross-stream sharing candidate.
    - If a cross-stream sharing candidate is read-only for another context, whose parent does not
      ...

    If a tensor node is shared cross-stream, within-device copying is a NOOP as source and
    destination pointers are in that case identical. *)
type sharing =
  | Unset  (** One of: [Per_stream], [Shared_cross_streams]. *)
  | Per_stream  (** The tensor node has separate arrays for each stream. *)
  | Shared_cross_streams
      (** The tensor node has a single array per device that can appear in multiple contexts,
          except for backends with [Option.is_some use_host_memory] and nodes with memory mode
          already [Hosted (Changed_on_devices Shared_cross_streams)] before first linking on a
          device, where it only has the on-host array. In that case the on-host array is
          registered in the context, to avoid misleading behavior from [device_to_device]. *)

type memory_type =
  | Constant  (** The tensor node does not change after initialization. *)
  ...
```
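To make the candidate-promotion idea above concrete, here is a minimal, hypothetical sketch of the bookkeeping. It deliberately ignores the parent-context condition elided above, and `node_state` and `record_access` are illustrative names, not OCANNL's API:

```ocaml
(* Toy model of the sharing decision, not OCANNL's actual bookkeeping. *)
type sharing = Unset | Per_stream | Shared_cross_streams

type access = Read_only | Updated

type node_state = {
  mutable sharing : sharing;
  mutable candidate : bool;  (* stored as a cross-stream sharing candidate? *)
}

(* Refine a node's sharing mode when a context accesses it. *)
let record_access st ~access =
  match (st.sharing, access) with
  | Unset, Read_only when not st.candidate ->
      (* Read-only and not otherwise recorded: store as a candidate. *)
      st.candidate <- true
  | Unset, Read_only ->
      (* A candidate that is read-only for a second context: share it. *)
      st.sharing <- Shared_cross_streams
  | Unset, Updated ->
      (* Written before any sharing was established: per-stream arrays. *)
      st.sharing <- Per_stream;
      st.candidate <- false
  | (Per_stream | Shared_cross_streams), _ -> ()
```

On this model, within-device copying for a `Shared_cross_streams` node can be skipped, since every context holds the same pointer.
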
A backend can make more refined distinctions, for example a `Local` node in CUDA …

Contexts track (or store) the on-device arrays corresponding to tensor nodes. Contexts form a hierarchy: linking takes a parent context and outputs a child context. Related contexts that use a tensor node must use the same on-device array for that node. If two unrelated contexts on the same device (i.e. contexts that have a common ancestor) both use a tensor node that is not part of their most recent common ancestor, the behavior is undefined.
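
As a rough illustration of that invariant, consider a toy model where a context is just a map from tensor-node ids to device arrays; `link`, `device_array`, and the map representation are assumptions of the sketch, not the actual interface:

```ocaml
module Node_map = Map.Make (Int)

(* Stand-in for a backend's on-device array handle. *)
type device_array = { ptr : nativeint }

type context = { parent : context option; arrays : device_array Node_map.t }

(* Linking outputs a child that inherits the parent's bindings, so a tensor
   node already known to the parent keeps the same on-device array in every
   related context. *)
let link parent ~new_nodes =
  let arrays =
    List.fold_left
      (fun acc (node_id, arr) ->
        if Node_map.mem node_id acc then acc else Node_map.add node_id arr acc)
      parent.arrays new_nodes
  in
  { parent = Some parent; arrays }
```

Unrelated contexts never exchange bindings in this model, which is why two of them using a node absent from their most recent common ancestor can disagree about its array.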

To avoid misleading behavior of `device_to_device` data movement, non-constant materialized tensor nodes are represented in the contexts making use of them, even when the underlying array is on the host. This way the logic remains the same regardless of whether a backend shares memory with the host. We are careful not to accidentally call `free_buffer` on hosted arrays.
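
A minimal sketch of that precaution, assuming a context buffer is either backend-allocated or a view of the host array; the `buffer` type and `release` function are stand-ins, not the actual interface:

```ocaml
(* A context entry's storage: owned by the backend, or a view of host memory
   (for backends with [Option.is_some use_host_memory]). *)
type buffer =
  | Device_owned of nativeint  (* opaque device pointer *)
  | Host_view  (* backed by the on-host array *)

(* Release a context entry; [free] is the backend's deallocator. *)
let release ~(free : nativeint -> unit) = function
  | Device_owned ptr -> free ptr
  | Host_view -> ()  (* never free storage owned by the host-side array *)
```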
## Typical details of a backend implementation
During the compilation process, the old context is not available when `compile` runs. Currently, all backends generate context- and device-independent kernels that refer to context arrays via parameters.
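
For intuition, here is a hypothetical shape of that arrangement: a compiled kernel abstracted as a function over its parameter arrays, and linking as the step that binds a context's arrays to those parameters. The `compiled` record, `link`, and the association-list context are illustrative only:

```ocaml
(* A kernel compiled without a context: it names its parameters but does not
   know which arrays (or device) will supply them. *)
type compiled = { params : string list; run : float array list -> unit }

(* Linking resolves each parameter against the context's arrays, yielding a
   routine that launches without further lookups. *)
let link (ctx : (string * float array) list) (c : compiled) : unit -> unit =
  let args = List.map (fun p -> List.assoc p ctx) c.params in
  fun () -> c.run args

let () =
  (* A trivial "kernel" writing to its only parameter. *)
  let c =
    { params = [ "x" ]; run = (function [ x ] -> x.(0) <- 42.0 | _ -> assert false) }
  in
  let x = [| 0.0 |] in
  let routine = link [ ("x", x) ] c in
  routine ();
  assert (x.(0) = 42.0)
```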