# AGENTS.md
## Coding Style & Naming Conventions
- OCaml formatting enforced by `.ocamlformat` (margin 100, parse/wrap docstrings). Run `dune fmt` before pushing.
- Overall preference for snake_case (e.g. files `my_module.ml`); OCaml enforces capitalized modules and constructors (`My_module`, `My_variant`).
- Prefer small, composable functions; avoid needless global state. PPX usage (`%op`, `%cd`) is described in `docs/syntax_extensions.md`.
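An illustrative (hypothetical) sketch of these conventions, not taken from the codebase:

```ocaml
(* file: my_module.ml -- snake_case file name; OCaml exposes it as the
   capitalized module My_module. Constructors are likewise capitalized. *)
type my_variant = My_variant of int | Other_case

(* Small, composable functions; no global mutable state. *)
let double x = 2 * x
let double_all xs = List.map double xs
```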
## Testing Guidelines
- Frameworks: `ppx_expect` for inline `%expect` tests, and Dune `test` stanzas for tests with output targets in `.expected` files. Tests live under `test/<area>/*.ml` with paired `*.expected` where applicable.
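A minimal inline expect-test sketch (assuming the usual `ppx_expect` preprocessing and an `inline_tests` stanza in the relevant `dune` file; the test name and body are hypothetical):

```ocaml
(* Runs the body, captures stdout, and diffs it against the [%expect] block. *)
let%expect_test "addition" =
  Printf.printf "%d" (2 + 3);
  [%expect {| 5 |}]
```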
2. Get a basic grasp of the aims and design of the project by reading or skimming files in [test/](test/).
3. Read the syntax extensions documentation [docs/syntax_extensions.md](docs/syntax_extensions.md).
4. Read the introductory part of the shape inference documentation [docs/shape_inference.md](docs/shape_inference.md).
5. Read the configuration documentation [ocannl_config.example](ocannl_config.example).
6. Improve your understanding by reading or skimming: [lib/shape.mli](lib/shape.mli), [lib/tensor.mli](lib/tensor.mli), [lib/operation.ml](lib/operation.ml), [arrayjit/lib/backend_intf.ml](arrayjit/lib/backend_intf.ml), [lib/train.ml](lib/train.ml), and [lib/nn_blocks.ml](lib/nn_blocks.ml).
2. Backend-independent optimizations [docs/lowering_and_inlining.md](arrayjit/lib/lowering_and_inlining.md) -- _lowering_ means translating (compiling) from the high-level representation (as assignments) to the low-level representation.
3. More documentation to come.
### Using the tracing debugger with CUDA computations
Let's write a user-centered introduction to how shapes work in OCANNL. Put the slides in docs/slides-shapes_and_einsum.md, and write them using slipshow navigation metadata as described in docs/CLAUDE.md. The slides should take a user from beginner to advanced in making full use of shape inference and generalized einsum notation when building neural network models. By the end, users should be aware of how projections work, and know how to lean on shape inference or row variables / ellipsis notation to avoid committing unnecessarily to dimension sizes or, for example, to the number of batch axes. They should learn when to use the dedicated einsum operators `++` and `+*` (these operators are translated by the syntax extensions to `einsum1` and `einsum`, respectively). They should be able to use what they learned to construct a max pooling layer operation, and to tackle any other challenges they encounter in NN modeling.

Consider these sources of information: docs/syntax_extensions.md, docs/shape_inference.md, docs/shape.mli, selected parts of lib/operation.ml, and selected parts of docs/slides-basics_backprop_training_codegen.md. Some points that might not be stated sufficiently explicitly in the other documentation:

1. The split of axes into kinds does not enforce semantics, because the generalized einsum notation can make arbitrary use of the axes. However, it offers expressivity gains:
   - Outside of einsum specs, there is a shape logic specification with the syntax `~logic:"@"`, where all input axes of the first tensor are reduced against all output axes of the second tensor, generalizing matrix multiplication to tensor multiplication. With an einsum spec, any two kinds of axes can be picked to reduce together, but this would not be possible without distinct kinds.
   - Having multiple kinds, and therefore the opportunity for multiple row variables per tensor, allows more patterns of reorganizing and reducing axes while staying agnostic to the total number of axes. For example, one could build multihead-attention transformer code that is agnostic to whether there is one batch axis or two batch+microbatch axes, and simultaneously agnostic to whether attention is over a single regular 1D axis or over two axes as in 2D axial attention, while handling the head-number axis as needed.
2. Stress the syntactic difference from NumPy: since we use `->` to separate input axes from output axes, it cannot also separate the argument tensor(s) from the result tensor -- so `=>` is used to the left of the result tensor.
3. Remember to use kind separators where distinct axis kinds are intended, e.g. `|` after batch axes.
4. To trigger multichar mode there must be a comma in the spec; it can be a trailing comma, e.g. `"input->output, => output->input"`.
5. A reminder that, as defined in lib/operation.ml, `*` stands for tensor multiplication and `*.` stands for pointwise multiplication when working with tensor expressions (rather than low-level assignments in the `%cd` syntax).
6. Users can define operations analogous to `einsum1` and `einsum` from lib/operation.ml, for example with max as the accumulation operator -- this is not so scary: operations can easily be added by users even outside lib/operation.ml.
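To anchor the notation, here is a hedged sketch (not taken verbatim from the sources above) of how the einsum operators read under the `%op` syntax extension; the names `matmul` and `row_sums` and the tensors `a` and `b` are illustrative assumptions:

```ocaml
(* Hedged sketch, assuming the %op syntax extension is enabled and that
   [a] and [b] are tensors already in scope. Commas in the specs trigger
   multichar mode (point 4 above). *)

(* Binary einsum via +* (translated to [einsum]): matrix multiply,
   reducing the shared axis [j]. *)
let%op matmul a b = a +* "i, j; j, k => i, k" b

(* Unary einsum via ++ (translated to [einsum1]): sum out [j], keeping [i]. *)
let%op row_sums a = a ++ "i, j => i"
```

Per point 1 above, row variables / ellipsis notation can make such code agnostic to the number of (say) batch axes, rather than committing to a fixed count.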
| Operation | PyTorch | OCANNL (single-char mode) | OCANNL (multichar mode) |
|---|---|---|---|
| Matrix multiply | `torch.einsum('ij,jk->ik', a, b)` | `a +* "i j; j k => i k" b` | `a +* "i, j; j, k => i, k" b` |
| Batch matmul | `torch.einsum('bij,bjk->bik', a, b)` | `a +* "b i j; b j k => b i k" b` | `a +* "batch, i -> j; batch, j -> k => batch, i -> k" b` |
| Attention scores | `torch.einsum('bqhd,bkhd->bhqk', q, k)` | `q +* "b q \| h d; b k \| h d => b \| q k -> h" k` | `q +* "b, q \| h, d; b, k \| h, d => b \| q, k -> h" k` |
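As a usage sketch (assuming `%op` and tensors `q` and `k` in scope), the attention-scores row above would appear in code with the pipes unescaped:

```ocaml
(* Hedged sketch: batch axis [b], query/key sequence axes [q]/[k],
   heads [h], reducing the per-head feature axis [d]. *)
let%op attention_scores q k =
  q +* "b, q | h, d; b, k | h, d => b | q, k -> h" k
```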