
Commit 5c04afc

committed
Yay, the completed syntax extensions documentation!
1 parent daef2bf commit 5c04afc

File tree

1 file changed

+73
-17
lines changed


lib/syntax_extensions.md

Lines changed: 73 additions & 17 deletions
@@ -14,10 +14,11 @@
1414
- [Further features of the syntax extension %op](#features-of-syntax-op)
1515
- [Name from binding](#name-from-binding)
1616
- [Label from function argument](#label-from-function-argument)
17+
- [Configuring inline declarations: inline output dimensions, initial values](#configuring-inline-declarations-inline-output-dimensions-initial-values)
1718
- [Lifting of the applications of ~config arguments: if it's an error, refactor your code](#lifting-of-the-applications-of-config-arguments-if-its-an-error-refactor-your-code)
1819
- [Implementation details](#implementation-details)
19-
- [Syntax extension %cd](#implementation-extension-cd)
20-
- [Syntax extension %op](#implementation-extension-op)
20+
- [The hard-coded to-the-power-of operator](#the-hard-coded-to-the-power-of-operator)
21+
- [Intricacies of the syntax extension %cd](#implementation-extension-cd)
2122
- In a nutshell
2223
- Syntax extension `%cd` stands for "code", to express assignments: `Assignments.t`.
2324
- Syntax extension `%op` stands for "operation", to express tensors: `Tensor.t`.
@@ -165,6 +166,17 @@ let%op mlp_layer ~config x = !/ ("w" * x + "b" config.hid_dim)
165166

166167
## Using OCANNL's generalized einsum notation
167168

169+
As we mentioned above, in the `%cd` syntax you can set up an arbitrary assignment with projections derived from a generalized einsum specification by passing the specification as a string with the `~logic` label. In addition, both the `%cd` and `%op` syntaxes support built-in operators that take an einsum specification: `*+` binds to `NTDSL.einsum` (resp. `TDSL.einsum`), and `++` binds to `NTDSL.einsum1` (resp. `TDSL.einsum1`). `*+` is a "ternary" operator, binary with respect to tensor arguments, and `++` is a binary operator, unary postfix with respect to tensor arguments. The einsum specification string should directly follow `*+` or `++`.
170+
171+
Both `*+` and `++` use addition as the accumulation operation; `*+` additionally uses multiplication to combine its two tensor arguments. You can verify this by looking at the `Operation.einsum` and `Operation.einsum1` definitions. For examples of `*+` and `++` behavior, see the test suite [einsum_trivia.ml](test/einsum_trivia.ml). A frequent use case for `++` is to sum out all axes of a tensor:
172+
173+
```ocaml
174+
let%op scalar_loss = (margin_loss ++ "...|... => 0") /. !..batch_size in
175+
...
176+
```
177+
178+
where `(!..)` converts an integer into a constant tensor.
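To make the accumulation semantics concrete, here is a toy model in plain OCaml, not OCANNL code: nested float arrays stand in for tensors, and the helper names `sum_all` and `contract` are hypothetical. `sum_all` mirrors what summing out all axes with `++` computes, and `contract` mirrors `*+` under an einsum spec that contracts one shared axis (the exact spec string for such a contraction is not shown here).

```ocaml
(* Toy model, not OCANNL code: nested float arrays stand in for tensors. *)

(* Mirrors what ++ "...|... => 0" computes: every axis is reduced with
   addition, leaving a single scalar. *)
let sum_all (t : float array array) : float =
  Array.fold_left (fun acc row -> Array.fold_left ( +. ) acc row) 0.0 t

(* Mirrors the semantics of *+ with a spec that contracts one shared axis:
   the two arguments are combined with multiplication, and the products are
   accumulated with addition over the shared axis. *)
let contract (a : float array array) (b : float array array) :
    float array array =
  let rows = Array.length a in
  let shared = Array.length b in
  let cols = Array.length b.(0) in
  Array.init rows (fun i ->
      Array.init cols (fun j ->
          let acc = ref 0.0 in
          for k = 0 to shared - 1 do
            acc := !acc +. (a.(i).(k) *. b.(k).(j))
          done;
          !acc))

let () =
  let margin_loss = [| [| 0.5; 1.5 |]; [| 2.0; 0.0 |] |] in
  let batch_size = 2 in
  (* Analogue of: (margin_loss ++ "...|... => 0") /. !..batch_size *)
  let scalar_loss = sum_all margin_loss /. float_of_int batch_size in
  Printf.printf "scalar_loss = %g\n" scalar_loss
```

Running this prints `scalar_loss = 2`: the four entries sum to 4.0, which is then divided by the batch size.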
179+
168180
## Further features of the syntax extension `%cd` {#features-of-syntax-cd}
169181

170182
### Referencing arrays: tensor value, tensor gradient, merge buffer of a tensor node
@@ -181,6 +193,17 @@ For example, in a data-parallel computation, gradients of the same param `p` can
181193

182194
### Block comments
183195

196+
The `%cd` syntax uses the prefix operator `(~~)` in a semicolon sequence to introduce block comments:
197+
198+
```ocaml
199+
type Assignments.t =
200+
...
201+
| Block_comment of string * t
202+
...
203+
```
204+
205+
Schematic example: `~~("space" "separated" "comment" "tensor p debug_name:" p); <scope of the comment>`. The content of the comment uses application syntax and must be composed of strings, `<tensor>`, `<tensor>.value` (equivalent to `<tensor>`), and `<tensor>.grad` components, where `<tensor>` is any tensor expression or tensor identifier.
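The following is a toy model of how a `Block_comment` wraps the code in its scope. It is not OCANNL's actual `Assignments.t`: the type below keeps only the constructors needed for illustration, `Assign` and `component` are hypothetical, and joining the components with spaces and rendering a tensor by its debug name are assumptions based on the schematic example above.

```ocaml
(* Toy model, not OCANNL's Assignments.t. *)
type t =
  | Noop
  | Seq of t * t
  | Block_comment of string * t
  | Assign of string (* hypothetical stand-in for a real assignment *)

(* Hypothetical comment components: literal strings, or a tensor rendered
   by its debug name (an assumption). *)
type component = Str of string | Tensor_name of string

(* Assumption: components are joined with single spaces. *)
let comment_text comps =
  comps
  |> List.map (function Str s -> s | Tensor_name n -> n)
  |> String.concat " "

(* Render the code, indenting everything inside a comment's scope. *)
let rec render indent = function
  | Noop -> ""
  | Seq (a, b) -> render indent a ^ render indent b
  | Assign s -> indent ^ s ^ "\n"
  | Block_comment (c, body) ->
      indent ^ "# " ^ c ^ "\n" ^ render (indent ^ "  ") body

let () =
  let comment = comment_text [ Str "update"; Str "of"; Tensor_name "p" ] in
  let code = Block_comment (comment, Seq (Assign "assign p", Noop)) in
  print_string (render "" code)
```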
206+
184207
## Further features of the syntax extension `%op` {#features-of-syntax-op}
185208

186209
### Name from binding
@@ -193,6 +216,19 @@ The resulting (primary) tensor's label will also have incorporated the label of
193216

194217
Note that we do not include `config.label`, even if `config` is available, because the actually applied input argument will typically have more specific information.
195218

219+
### Configuring inline declarations: inline output dimensions, initial values
220+
221+
In the `%op` syntax, when a tuple follows an inline declaration of a tensor (i.e. a string literal), the tuple is passed to specify the output axes in the tensor definition (via the `~output_dims` argument).
222+
223+
When an integer, an identifier, or a record field dereference follows an inline declaration, this expression specifies the single output axis in the tensor definition. You can see an example earlier in this document: `let%op mlp_layer ~config x = !/ ("w" * x + "b" config.hid_dim)`.
224+
225+
If a list expression follows an inline declaration, the expression is parsed as an [N-dimensional array constant](#numeric-and-n-dimensional-array-literals) and is used to initialize the value node of the defined tensor. A very simple example from [micrograd_demo: Micrograd README basic example](test/micrograd_demo.ml):
226+
227+
```ocaml
228+
let%op c = "a" [ -4 ] + "b" [ 2 ] in
229+
...
230+
```
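The three rules above amount to a dispatch on what follows the string literal. The sketch below is a toy model, not the translator itself (which works on OCaml AST nodes): the names `follower`, `decl`, and `declare` are hypothetical, and leaving `output_dims` unset for a list literal (letting the literal determine the shape) is an assumption.

```ocaml
(* Toy model of how %op interprets what follows an inline declaration. *)
type follower =
  | Tuple of int list (* e.g. "w" (3, 4) *)
  | Single of int (* e.g. "b" config.hid_dim *)
  | List_literal of float list (* e.g. "a" [ -4 ] *)
  | Nothing (* a bare "w": shape is left to inference *)

type decl = {
  label : string;
  output_dims : int list option; (* would be passed as ~output_dims *)
  init_values : float list option; (* initializes the value node *)
}

let declare label follower =
  match follower with
  | Tuple dims -> { label; output_dims = Some dims; init_values = None }
  | Single d -> { label; output_dims = Some [ d ]; init_values = None }
  (* Assumption: the literal's own shape determines the dims. *)
  | List_literal vs -> { label; output_dims = None; init_values = Some vs }
  | Nothing -> { label; output_dims = None; init_values = None }

let () =
  let b = declare "b" (Single 8) in
  let a = declare "a" (List_literal [ -4. ]) in
  Printf.printf "%s has %d output axis; %s has initial values\n" b.label
    (match b.output_dims with Some d -> List.length d | None -> 0)
    a.label
```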
231+
196232
### Lifting of the applications of `~config` arguments: if it's an error, refactor your code
197233

198234
If you recall, inline declared param tensors get lifted out of functions except for the function `fun ~config ->`, where they get defined. Our example `let%op mlp_layer ~config x = !/ ("w" * x + "b" config.hid_dim)` translates as:
@@ -253,23 +289,43 @@ Unfortunately, we need to be mindful to introduce params at the right times.
253289

254290
## Implementation details
255291

256-
### Syntax extension `%cd` {#implementation-extension-cd}
292+
### The hard-coded to-the-power-of operator
257293

258-
The translate function returns an record. The `expr` field (filler expression) meaning depends on `typ` (filler type): for `Code`, this is an `Assignments.t` expression. For `Unknown` and `Tensor`, this is a `Tensor.t` expression. For `Array` and `Merge_value`, this is a non-optional `Tnode.t` expression, and for `Grad_of_tensor` and `Merge_grad`, it's an optional `Tnode.t` expresssion.
259-
260-
Next, `setup_array ~is_lhs:true` converts the filler expression into a `Tnode.t option` expression, and `setup_array ~is_lhs:false` converts the filler into an `Assignments.buffer option` expression according to `filler_typ`.
294+
OCANNL has a built-in numerical binary operation to-power-of: `Ops.ToPowOf`. As part of assignments, the corresponding operator is `**`. Here is the full definition of the to-power-of tensor operation from [Operation](lib/operation.ml):
261295

262296
```ocaml
263-
type expr_type =
264-
| Code
265-
| Array
266-
| Grad_of_tensor of expression
267-
| Tensor
268-
| Unknown
269-
| Merge_value
270-
| Merge_grad of expression
271-
272-
type projections_slot = LHS | RHS1 | RHS2 | Nonslot | Undet
297+
let rec pointpow ?(label : string list = []) ~grad_spec p t1 : Tensor.t =
298+
let module NTDSL = struct
299+
include Initial_NTDSL
300+
301+
module O = struct
302+
include NDO_without_pow
303+
304+
let ( **. ) ?label base exp = pointpow ?label ~grad_spec:Tensor.Prohibit_grad exp base
305+
end
306+
end in
307+
let p_t = NTDSL.number p in
308+
let%cd op_asn ~v ~t1 ~t2 ~projections = v =: v1 ** v2 ~projections in
309+
let%cd grad_asn =
310+
if Tensor.is_prohibit_grad grad_spec then fun ~v:_ ~g:_ ~t1:_ ~t2:_ ~projections:_ -> Asgns.Noop
311+
else if Float.equal p 2.0 then fun ~v:_ ~g ~t1 ~t2:_ ~projections -> g1 =+ p_t *. t1 * g
312+
else if Float.equal p 1.0 then fun ~v:_ ~g ~t1 ~t2:_ ~projections -> g1 =+ g
313+
else fun ~v:_ ~g ~t1 ~t2:_ ~projections -> g1 =+ p_t *. (t1 **. (p -. 1.)) * g
314+
in
315+
Tensor.binop ~label:("**." :: label) ~compose_op:Pointwise_bin ~op_asn ~grad_asn ~grad_spec t1 p_t
273316
```
274317

275-
### Syntax extension `%op` {#implementation-extension-op}
318+
On the `Tensor` level, this is implemented as a binary tensor operation, but it is exposed as a unary tensor operation! To avoid the complexities of propagating the gradient into the exponent, `Operation.pointpow` is implemented as a function of only one tensor; the exponent is a number. We hard-code the pointwise-power-of operator `NTDSL.O.( **. )` (resp. `TDSL.O.( **. )`) in the `%cd` and `%op` syntaxes, so that the numeric value (the second argument of `**.`) is passed to `pointpow` without first being converted to a tensor.
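As a scalar sanity check of the `grad_asn` cases in `pointpow`, here is a sketch with plain floats standing in for tensor nodes (not OCANNL code; `pointpow_grad` is a hypothetical name). For `y = t1 ** p` it accumulates `p * t1^(p-1) * g`, with the same special cases for `p = 2` and `p = 1`. Because `p` is a plain number, no gradient is computed for the exponent, which is the point of the design described above.

```ocaml
(* Scalar sketch of the backprop rule for y = t1 ** p, where p is a plain
   float: dL/dt1 contribution = p * t1^(p-1) * g. *)
let pointpow_grad ~p ~t1 ~g =
  if Float.equal p 2.0 then p *. t1 *. g (* d/dx x^2 = 2x *)
  else if Float.equal p 1.0 then g (* d/dx x = 1 *)
  else p *. (t1 ** (p -. 1.)) *. g

let () =
  (* d/dx x^3 at x = 2 is 3 * 2^2 = 12 *)
  Printf.printf "%g\n" (pointpow_grad ~p:3.0 ~t1:2.0 ~g:1.0)
```

This prints `12`.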
319+
320+
### Intricacies of the syntax extension `%cd` {#implementation-extension-cd}
321+
322+
The `%cd` syntax translator needs to accomplish more than a context-free conversion of a concise notation to the `Assignments.t` data type.
323+
324+
- It needs to keep track of whether `~projections` is in scope, and it needs to collect information about an assignment to properly transform the projections from the scope into the projections valid for that particular assignment.
325+
- Whenever the parsed notation uses tensors whose value nodes have not been computed yet, the translator needs to include the "forward" code of these tensors among the generated assignments. This is typically required for embedded tensor expressions, which create new tensors. The translator puts the forward code in sequence just before the assignment that uses the created tensor. It includes the forward code of tensors that are "forward roots" at the time the assignments are constructed (using `Tensor.is_fwd_root`).
326+
- For inline declarations of tensors, the translator needs to pick the right other tensor, if any, to enrich the label information of the created tensor. Mechanisms:
327+
- Prefer tensors from identifiers (or field dereferences), since labels of tensor expressions (creating new tensors) will typically be overly verbose.
328+
- Filter out escaping variables (identifiers coming from nested function parameters).
329+
  - When one inline declaration uses another inline declaration on its right-hand side, recall the other declaration's label-enriching tensor and use it directly.
330+
- The argument slots in `Assignments.Accum_binop` and `Assignments.Accum_unop` can be either regular tensor nodes or merge buffers of tensor nodes. The translator needs to determine which.
331+
- When a tensor expression is used to create a new tensor, the translator lifts the expression into a let-binding, to be able to refer to the (same) tensor more than once. The created tensor is referred to at least twice: at its use site, and to include its forward code among the assignments.
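Why the let-binding lift matters can be seen in a toy model (not OCANNL code): each evaluation of a tensor expression creates a fresh tensor, modeled here by a counter. The names `make_tensor`, `translate_lifted`, and `translate_naive` are hypothetical, and the output strings only imitate the shape of generated code. Without lifting, the forward code and the use site would each evaluate the expression and refer to two different tensors.

```ocaml
(* Toy model: each call creates a fresh tensor with a new id, as evaluating
   a tensor expression does. *)
let next_id = ref 0

type tensor = { id : int; fwd_code : string }

let make_tensor name =
  incr next_id;
  { id = !next_id; fwd_code = Printf.sprintf "compute %s#%d" name !next_id }

(* Lifted: evaluate the expression once, then refer to the same tensor in
   its forward code and at its use site. *)
let translate_lifted () =
  let t = make_tensor "tmp" in
  [ t.fwd_code; Printf.sprintf "x =+ tmp#%d" t.id ]

(* Naive: the expression is evaluated twice, so the forward code computes
   a different tensor than the one that is used. *)
let translate_naive () =
  let fwd = (make_tensor "tmp").fwd_code in
  let use = Printf.sprintf "x =+ tmp#%d" (make_tensor "tmp").id in
  [ fwd; use ]

let () =
  next_id := 0;
  List.iter print_endline (translate_lifted ());
  next_id := 0;
  List.iter print_endline (translate_naive ())
```

The lifted version pairs `compute tmp#1` with `x =+ tmp#1`, while the naive version pairs `compute tmp#1` with `x =+ tmp#2`.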
