- [Lifting of the applications of ~config arguments: if it's an error, refactor your code](#lifting-of-the-applications-of-config-arguments-if-its-an-error-refactor-your-code)
- [Intricacies of the syntax extension %cd](#implementation-extension-cd)
- In a nutshell
- Syntax extension `%cd` stands for "code", to express assignments: `Assignments.t`.
- Syntax extension `%op` stands for "operation", to express tensors: `Tensor.t`.
## Using OCANNL's generalized einsum notation
As we mentioned above, in the `%cd` syntax you can set up an arbitrary assignment with projections derived from a generalized einsum specification, by passing the specification as a string with the `~logic` label. However, both the `%cd` and `%op` syntaxes support built-in operators that take an einsum specification: `*+` binding to `NTDSL.einsum` resp. `TDSL.einsum`, and `++` binding to `NTDSL.einsum1` resp. `TDSL.einsum1`. `*+` is a "ternary" operator, binary wrt. tensor arguments, and `++` is a binary operator, unary postfix wrt. tensor arguments. The einsum specification string should directly follow `*+` and `++`.
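For instance, a matrix multiplication can be expressed with `*+`. The following is only a hedged sketch: the tensor names `a` and `b` and the spec string `"ik;kj=>ij"` are illustrative assumptions rather than snippets from the OCANNL sources.

```ocaml
(* Multiply along the shared axis k and keep axes i and j: a conventional
   matrix multiplication, assuming a and b are previously defined tensors. *)
let%op mm = a *+ "ik;kj=>ij" b
```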
Both `*+` and `++` use addition as the accumulation operation; `*+` uses multiplication as the binary operation combining its tensor arguments. You can verify this by looking at the `Operation.einsum` and `Operation.einsum1` definitions. You can find examples of `*+` and `++` behavior in the test suite [einsum_trivia.ml](test/einsum_trivia.ml). A frequent use-case for `++` is to sum out all axes of a tensor:
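For example (a hedged sketch: `t` and `batch_size` are stand-in names, and the spec `"...|... => 0"` is assumed to reduce all axes into a single cell):

```ocaml
(* Sum out all axes of t into a single cell, then scale the result by a
   constant tensor built from the integer batch_size. *)
let%op scaled_total = (t ++ "...|... => 0") *. !..batch_size
```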
where `(!..)` converts an integer into a constant tensor.
## Further features of the syntax extension `%cd` {#features-of-syntax-cd}
### Referencing arrays: tensor value, tensor gradient, merge buffer of a tensor node
### Block comments
The `%cd` syntax uses the prefix operator `(~~)` in a semicolon sequence to introduce block comments:
```ocaml
type Assignments.t =
  ...
  | Block_comment of string * t
  ...
```
Schematic example: `~~("space" "separated" "comment" "tensor p debug_name:" p); <scope of the comment>`. The content of the comment uses application syntax and must be composed of strings, `<tensor>`, `<tensor>.value` (equivalent to `<tensor>`), and `<tensor>.grad` components, where `<tensor>` is any tensor expression or tensor identifier.
## Further features of the syntax extension `%op` {#features-of-syntax-op}
### Name from binding
Note that we do not include `config.label`, even if `config` is available, because the input argument that is actually applied will typically carry more specific information.
In the `%op` syntax, when a tuple follows an inline declaration of a tensor (i.e. a string literal), the tuple is passed to specify the output axes in the tensor definition (via the `~output_dims` argument).
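For instance (a hedged sketch; the param name `"w"` and the dimensions are illustrative):

```ocaml
(* The tuple (3, 4) specifies two output axes, of sizes 3 and 4, for the
   inline-declared "w" (it is passed along as the ~output_dims argument). *)
let%op apply_w x = "w" (3, 4) * x
```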
When an integer, an identifier, or a record field dereference follows an inline declaration, this expression specifies the single output axis in the tensor definition. You can see an example above in this document: `let%op mlp_layer ~config x = !/ ("w" * x + "b" config.hid_dim)`.
If a list expression follows an inline declaration, the expression is parsed as an [N-dimensional array constant](#numeric-and-n-dimensional-array-literals) and used to initialize the value tensor node of the defined tensor. A very simple example from [micrograd_demo: Micrograd README basic example](test/micrograd_demo.ml):
```ocaml
let%op c = "a" [ -4 ] + "b" [ 2 ] in
...
```
### Lifting of the applications of `~config` arguments: if it's an error, refactor your code
If you recall, inline-declared param tensors get lifted out of enclosing functions, except for the `fun ~config ->` function, where they get defined. Our example `let%op mlp_layer ~config x = !/ ("w" * x + "b" config.hid_dim)` translates as:
The translate function returns a record. The meaning of the `expr` field (the filler expression) depends on `typ` (the filler type): for `Code`, it is an `Assignments.t` expression. For `Unknown` and `Tensor`, it is a `Tensor.t` expression. For `Array` and `Merge_value`, it is a non-optional `Tnode.t` expression, and for `Grad_of_tensor` and `Merge_grad`, it is an optional `Tnode.t` expression.
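Schematically, the translation result could be pictured as follows. This is only a hedged sketch reconstructed from the description above; the actual definitions in the `%cd` translator may differ in names and in the payloads of the constructors.

```ocaml
open Ppxlib

(* The "filler types" mentioned in the text. *)
type filler_typ =
  | Code
  | Unknown
  | Tensor
  | Array
  | Merge_value
  | Grad_of_tensor
  | Merge_grad

(* The translation result: the OCaml type that [expr] evaluates to depends on
   [typ], as described above (Assignments.t, Tensor.t, Tnode.t, or Tnode.t option). *)
type translation = { typ : filler_typ; expr : expression }
```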
Next, `setup_array ~is_lhs:true` converts the filler expression into a `Tnode.t option` expression, and `setup_array ~is_lhs:false` converts the filler into an `Assignments.buffer option` expression according to `filler_typ`.
OCANNL has a built-in numerical binary operation to-power-of: `Ops.ToPowOf`. As part of assignments, the corresponding operator is `**`. Here is the full definition of the to-power-of tensor operation from [Operation](lib/operation.ml):
On the `Tensor` level, this is implemented as a binary tensor operation, but it is exposed as a unary tensor operation! To avoid the complexities of propagating the gradient into the exponent, `Operation.pointpow` is implemented as a function of only one tensor; the exponent is a plain number. We hard-code the pointwise-power-of operator `NTDSL.O.( **. )`, resp. `TDSL.O.( **. )`, in the `%cd` and `%op` syntaxes, to pass the numeric value to `pointpow` (the second argument of `**.`) without converting it to a tensor first.
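As a usage illustration (a hedged sketch; `x` is a stand-in tensor):

```ocaml
(* The exponent 2 stays a plain number: it is handed to pointpow directly
   instead of being converted into a constant tensor first. *)
let%op squared = x **. 2
```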
### Intricacies of the syntax extension `%cd` {#implementation-extension-cd}
The `%cd` syntax translator needs to accomplish more than a context-free conversion of a concise notation into the `Assignments.t` data type.
- It needs to keep track of whether `~projections` is in scope, and it needs to collect information about an assignment to properly transform the projections from the scope into the projections valid for the particular assignment.
- Whenever the parsed notation uses tensors whose value nodes have not been computed yet, the translator needs to include the "forward" code of those tensors among the generated assignments. Typically this is required for embedded tensor expressions, which create new tensors. The translator puts the forward code in sequence just prior to the assignment that makes use of the created tensor. The translator includes the forward code of tensors that are "forward roots" at the time the assignments are constructed (using `Tensor.is_fwd_root`).
- For inline declarations of tensors, the translator needs to pick the right other tensor, if any, to enrich the label information of the created tensor. Mechanisms:
  - Prefer tensors from identifiers (or field dereferences), since labels of tensor expressions (creating new tensors) will typically be overly verbose.
  - Filter out escaping variables (identifiers coming from nested function parameters).
  - When one inline declaration uses another inline declaration on its right-hand side, recall the other declaration's label-enriching tensor and use it directly.
- The argument slots in `Assignments.Accum_binop` and `Assignments.Accum_unop` can be either regular tensor nodes or merge buffers of tensor nodes. The translator needs to determine which.
- When a tensor expression is used to create a new tensor, the translator lifts the expression into a let-binding, to be able to refer to the (same) tensor more than once. The created tensor is referred to at least twice: at its use site, and to include its forward code among the assignments.