Commit 6505db0

Move Markdown files under docs/, add pandoc rendering for non-presentation docs, currently migration_guide.md and syntax_extensions.md
1 parent 062ae64 commit 6505db0

15 files changed: +857 -20 lines changed

.github/workflows/gh-pages-docs.yml

Lines changed: 11 additions & 0 deletions
@@ -34,6 +34,17 @@ jobs:
 
       - name: RL slides
         run: opam exec -- slipshow compile docs/slides-RL-REINFORCE.md -o docs/html/RL-REINFORCE.html
+
+      - name: Setup pandoc
+        uses: pandoc/actions/setup@main
+        with:
+          version: 'latest'
+
+      - name: Migration guide
+        run: pandoc --toc -s --embed-resources --css=docs/html/style.css docs/migration_guide.md -o docs/html/migration_guide.html
+
+      - name: Syntax extensions
+        run: pandoc --toc -s --embed-resources --css=docs/html/style.css docs/syntax_extensions.md -o docs/html/syntax_extensions.html
 
       - name: Deploy
         uses: peaceiris/actions-gh-pages@v4

AGENTS.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@
 ## Coding Style & Naming Conventions
 - OCaml formatting enforced by `.ocamlformat` (margin 100, parse/wrap docstrings). Run `dune fmt` before pushing.
 - Overall preference for snake_case (e.g. files `my_module.ml`); OCaml enforces capitalized modules and constructors (`My_module`, `My_variant`).
-- Prefer small, composable functions; avoid needless global state. PPX usage (`%op`, `%cd`) is described in `lib/syntax_extensions.md`.
+- Prefer small, composable functions; avoid needless global state. PPX usage (`%op`, `%cd`) is described in `docs/syntax_extensions.md`.
 
 ## Testing Guidelines
 - Frameworks: `ppx_expect` for inline `%expect` tests, and Dune `test` stanzas for tests with output targets in `.expected` files. Tests live under `test/<area>/*.ml` with paired `*.expected` where applicable.

CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -179,7 +179,7 @@ opam install cudajit # for CUDA backend
 **Key differences between %op and %cd**:
 - `%op` allows initialization expressions (`{ x = uniform () }`), used for model parameters
 - `%cd` is self-referential only (`{ x }`), used in computation graphs where tensors are defined by operations
-- See `lib/syntax_extensions.md` for comprehensive documentation
+- See `docs/syntax_extensions.md` for comprehensive documentation
 
 **Record syntax features**:
 - OCaml punning: `{ x }` expands to default initialization (uniform() for parameters in %op)
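For orientation, a minimal sketch of the record syntax described above (the layer name, the `()`-then-input calling convention, and the absence of extra labels are assumptions here; see docs/syntax_extensions.md for the authoritative syntax):

    (* Sketch only: a tiny affine layer in %op. Punned { w } and { b } get the
       default (uniform) initialization for parameters; `*` is tensor
       multiplication. An explicit initializer would look like { b = 0. } --
       whether that exact form type-checks is also an assumption. *)
    let%op dense () x = ({ w } * x) + { b }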

README.md

Lines changed: 5 additions & 5 deletions
@@ -44,14 +44,14 @@ A possible route to learning OCANNL:
 
 1. Read [the introductory slides](https://ahrefs.github.io/ocannl/docs/basics_backprop_training_codegen.html).
 2. Get some basic grasp of the aims and design of the project by reading or skimming files in [test/](test/).
-3. Read the syntax extensions documentation [lib/syntax_extensions.md](lib/syntax_extensions.md).
-4. Read the introductory part of the shape inference documentation [lib/shape_inference.md](lib/shape_inference.md).
+3. Read the syntax extensions documentation [docs/syntax_extensions.md](docs/syntax_extensions.md).
+4. Read the introductory part of the shape inference documentation [docs/shape_inference.md](docs/shape_inference.md).
 5. Read the configuration documentation [ocannl_config.example](ocannl_config.example).
 6. Improve your understanding by reading or skimming: [lib/shape.mli](lib/shape.mli), [lib/tensor.mli](lib/tensor.mli), [lib/operation.ml](lib/operation.ml), [arrayjit/lib/backend_intf.ml](arrayjit/lib/backend_intf.ml), [lib/train.ml](lib/train.ml), and [lib/nn_blocks.ml](lib/nn_blocks.ml).
-7. Read [arrayjit/lib/anatomy_of_a_backend.md](arrayjit/lib/anatomy_of_a_backend.md).
+7. Read [docs/anatomy_of_a_backend.md](arrayjit/lib/anatomy_of_a_backend.md).
 8. Read the implementation overview:
-   1. Shape inference details [lib/shape_inference.md](lib/shape_inference.md).
-   2. Backend-independent optimizations [arrayjit/lib/lowering_and_inlining.md](arrayjit/lib/lowering_and_inlining.md) -- _lowering_ means translating (compiling) from the high-level representation (as assignments) to the low-level representation.
+   1. Shape inference details [docs/shape_inference.md](docs/shape_inference.md).
+   2. Backend-independent optimizations [docs/lowering_and_inlining.md](arrayjit/lib/lowering_and_inlining.md) -- _lowering_ means translating (compiling) from the high-level representation (as assignments) to the low-level representation.
    3. More documentation to come.
 
 ### Using the tracing debugger with CUDA computations
File renamed without changes.

docs/einsum-slides-request.txt

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Let's write a user-centered introduction to how shapes work in OCANNL. Let's put the slides in docs/slides-shapes_and_einsum.md, and write them using slipshow navigation metadata as described in docs/CLAUDE.md. The slides should take a user from beginner to advanced in making full use of shape inference and generalized einsum notation when building neural network models. They should end up aware of how projections work, and of how to lean on shape inference or row variables / ellipsis notation to avoid committing unnecessarily to dimension sizes or, for example, the number of batch axes. They should learn when to use the dedicated einsum operators `++` and `+*` (these operators are translated by syntax extensions to `einsum1` and `einsum`). They should be able to use what they learned to construct a max pooling layer operation, and to handle any other challenges they encounter in NN modeling. Consider these sources of information: files docs/syntax_extensions.md, docs/shape_inference.md, docs/shape.mli, selected parts of lib/operation.ml, selected parts of docs/slides-basics_backprop_training_codegen.md. Let me also provide some points that might not be stated sufficiently explicitly in other documentation. (1) The split of axes into kinds does not enforce semantics, because the generalized einsum notation can make arbitrary use of the axes. However, it offers expressivity gains: (a) outside of einsum specs, there is a shape logic specification with syntax `~logic:"@"`, where all input axes of the first tensor are reduced with all output axes of the second tensor, generalizing matrix multiplication to tensor multiplication -- with an einsum spec, any two kinds of axes can be picked to reduce together, but this would not be possible without having distinct kinds; (b) having multiple kinds, and thus the opportunity for multiple row variables per tensor, allows more patterns of reorganizing and reducing axes while staying agnostic to the total number of axes -- for example, one could build code for a multi-head attention transformer that is agnostic as to whether one uses one batch axis or two batch+microbatch axes, and simultaneously agnostic as to whether one uses single-axis regular 1D attention or two-axis 2D axial attention, while handling the head-number axis as needed. (2) It's important to stress the syntactic difference with NumPy: since we use `->` to separate input and output axes, it cannot mean separating the argument tensor(s) from the result tensor -- thus `=>` is used to the left of the result tensor. (3) Remember to use kind separators where you intend to use the distinct axis kinds, e.g. use `|` after batch axes. (4) To trigger multichar mode there must be a comma in the spec; it can be a trailing comma, e.g. "input->output, => output->input". (5) A reminder that, as defined in lib/operation.ml, `*` stands for tensor multiplication and `*.` stands for pointwise multiplication when working with tensor expressions (rather than low-level assignments in the `%cd` syntax). (6) The user can define operations analogous to the `einsum1` and `einsum` operations in lib/operation.ml, for example with the max operator as the accumulation operator -- this is not so scary, operations can be easily added by users even if not inside lib/operation.ml.
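As a quick illustration of points (3) and (4), a sketch only: the operator usages and spec strings are taken from this request and from docs/migration_guide.md, while the let-bindings and names are hypothetical.

    (* Sketch: `+*` is the binary einsum operator (it expands to `einsum`);
       the `|` separates batch axes from the remaining axes, per point (3).
       The spec is the attention-scores example from docs/migration_guide.md. *)
    let attention_scores q k = q +* "b, q | h, d; b, k | h, d => b | q, k -> h" k

    (* Sketch: per point (4), a trailing comma is enough to switch the spec
       into multi-character mode, so `input`/`output` parse as whole axis names. *)
    let transpose_io t = t ++ "input->output, => output->input"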

docs/html/index.html

Lines changed: 10 additions & 0 deletions
@@ -200,6 +200,16 @@ <h1>🌐 OCANNL Directory</h1>
         title: 'Introduction to Reinforcement Learning',
         description: 'Introduction to Reinforcement Learning, the REINFORCE and GRPO algorithms'
       },
+      {
+        name: 'migration_guide.html',
+        title: 'Migration Guide',
+        description: 'Migration Guide: PyTorch/TensorFlow to OCANNL'
+      },
+      {
+        name: 'syntax_extensions.html',
+        title: 'Syntax Extensions',
+        description: 'Syntax Extensions: %op and %cd'
+      },
       {
         name: '../dev/neural_nets_lib/Ocannl/index.html',
         title: 'OCANNL Frontend API Documentation',

docs/html/style.css

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+body {
+  max-width: 1000px;
+  margin: 0 auto;
+  padding: 20px;
+}
File renamed without changes.

docs/migration_guide.md

Lines changed: 14 additions & 13 deletions
@@ -22,19 +22,19 @@ This is why pooling needs a dummy constant kernel - to carry shape info between
 ## Common Operations Mapping
 
 | PyTorch/TensorFlow | OCANNL | Notes |
-|-------------------|---------|--------|
-| `x.view(-1, d)` or `x.reshape(-1, d)` | Not directly supported | Use manual dimension setting on constant tensor as workaround |
-| `x.flatten()` | Not supported | Future syntax might be: `"x,y => x&y"` |
+|-----|------|----|
+| `x.view(-1, d)` or `x.reshape(-1, d)` | Not supported yet | Use shape inference and let tensors have the shape they want |
+| `x.flatten()` | Not supported yet | Future syntax might be: `"x,y => x&y"` |
 | `nn.Conv2d(in_c, out_c, kernel_size=k)` | `conv2d ~kernel_size:k () x` | Channels inferred or use row vars |
 | `F.max_pool2d(x, kernel_size=k)` | `max_pool2d ~window_size:k () x` | Uses `(0.5 + 0.5)` trick internally |
 | `F.avg_pool2d(x, kernel_size=k)` | `avg_pool2d ~window_size:k () x` | Normalized by window size |
 | `nn.BatchNorm2d(channels)` | `batch_norm2d () ~train_step x` | Channels inferred |
 | `F.dropout(x, p=0.5)` | `dropout ~rate:0.5 () ~train_step x` | Needs train_step for PRNG |
 | `F.relu(x)` | `relu x` | Direct function application |
-| `F.softmax(x, dim=-1)` | `softmax ~spec:"... \| ... -> ... d" () x` | Specify axes explicitly |
+| `F.softmax(x, dim=-1)` | `softmax ~spec:"... | ... -> ... d" () x` | Specify axes explicitly |
 | `torch.matmul(a, b)` | `a * b` or `a +* "...; ... => ..." b` | Einsum for complex cases |
-| `x.mean(dim=[1,2])` | `x ++ "... \| h, w, c => ... \| 0, 0, c" ["h"; "w"] /. (dim h *. dim w)` | Sum then divide |
-| `x.sum(dim=-1)` | `x ++ "... \| ... d => ... \| 0"` | Reduce by summing |
+| `x.mean(dim=[1,2])` | `x ++ "... | h, w, c => ... | 0, 0, c" ["h"; "w"] /. (dim h *. dim w)` | Sum then divide |
+| `x.sum(dim=-1)` | `x ++ "... | ... d => ... | 0"` | Reduce by summing |
 
 ## Tensor Creation Patterns
 
@@ -138,16 +138,17 @@ OCANNL's einsum has two syntax modes:
 
 2. **Multi-character mode**:
    - Triggered by ANY comma in the spec
+   - Trailing commas ignored
    - Identifiers can be multi-character (e.g., `height`, `width`)
    - Must be separated by non-alphanumeric: `,` `|` `->` `;` `=>`
-   - Enables convolution syntax: `stride*out+kernel`
+   - Makes convolution syntax less confusing: `stride*out+kernel`
 
 | Operation | PyTorch einsum | OCANNL single-char | OCANNL multi-char |
-|-----------|---------------|-------------------|-------------------|
+|--------|------------------|-------------------|-------------------|
 | Matrix multiply | `torch.einsum('ij,jk->ik', a, b)` | `a +* "i j; j k => i k" b` | `a +* "i, j; j, k => i, k" b` |
 | Batch matmul | `torch.einsum('bij,bjk->bik', a, b)` | `a +* "b i j; b j k => b i k" b` | `a +* "batch, i -> j; batch, j -> k => batch, i -> k" b` |
-| Attention scores | `torch.einsum('bqhd,bkhd->bhqk', q, k)` | `q +* "b q \| h d; b k \| h d => b \| q k -> h" k` | `q +* "b, q \| h, d; b, k \| h, d => b \| q, k -> h" k` |
-| Convolution | N/A | N/A (needs multi-char) | `x +* "... \| stride*oh+kh, stride*ow+kw, ic; kh, kw, ic -> oc => ... \| oh, ow, oc" kernel` |
+| Attention scores | `torch.einsum('bqhd,bkhd->bhqk', q, k)` | `q +* "bq|hd; bk|hd => b|qk->h" k` | `q +* "b, q | h, d; b, k | h, d => b | q, k -> h" k` |
+| Convolution | N/A | better use multi-char | `x +* "... | stride*oh+kh, stride*ow+kw, ic; kh, kw, ic -> oc => ... | oh, ow, oc" kernel` |
 
 ### Row Variables
 - `...` context-dependent ellipsis: expands to `..batch..` in batch position, `..input..` before `->`, `..output..` after `->`
@@ -288,7 +289,7 @@ dropout ~rate:0.5 () ~train_step x
 
 ## Further Resources
 
-- [Shape Inference Documentation](../lib/shape.mli) - Detailed einsum notation spec
-- [Syntax Extensions Guide](../lib/syntax_extensions.md) - `%op` and `%cd` details
-- [Neural Network Blocks](../lib/nn_blocks.ml) - Example implementations
+- [Shape Inference Documentation](../dev/neural_nets_lib/Ocannl/Shape/index.html) - Detailed einsum notation spec
+- [Syntax Extensions Guide](syntax_extensions.html) - `%op` and `%cd` details
+- [Neural Network Blocks](https://github.com/ahrefs/ocannl/blob/master/lib/nn_blocks.ml) - Example implementations
 - [GitHub Discussions](https://github.com/ahrefs/ocannl/discussions) - Community Q&A
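To make the mapping table above concrete, a short sketch (the helper names are assumed; the operators and spec strings are the ones shown in the table):

    (* Sketch: torch.matmul on batched operands via the einsum operator `+*`
       (single-char spec from the table above). *)
    let batch_matmul a b = a +* "b i j; b j k => b i k" b

    (* Sketch: x.sum(dim=-1) -- reduce the last output axis by summing with `++`. *)
    let sum_last_axis x = x ++ "... | ... d => ... | 0"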
