Commit 6505db0

Move Markdown files under docs/, add pandoc rendering for non-presentation docs, currently migration_guide.md and syntax_extensions.md
1 parent 062ae64 commit 6505db0

15 files changed: +857 -20 lines changed

.github/workflows/gh-pages-docs.yml

Lines changed: 11 additions & 0 deletions
@@ -34,6 +34,17 @@ jobs:
 
       - name: RL slides
         run: opam exec -- slipshow compile docs/slides-RL-REINFORCE.md -o docs/html/RL-REINFORCE.html
+
+      - name: Setup pandoc
+        uses: pandoc/actions/setup@main
+        with:
+          version: 'latest'
+
+      - name: Migration guide
+        run: pandoc --toc -s --embed-resources --css=docs/html/style.css docs/migration_guide.md -o docs/html/migration_guide.html
+
+      - name: Syntax extensions
+        run: pandoc --toc -s --embed-resources --css=docs/html/style.css docs/syntax_extensions.md -o docs/html/syntax_extensions.html
 
       - name: Deploy
         uses: peaceiris/actions-gh-pages@v4

AGENTS.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@
 ## Coding Style & Naming Conventions
 - OCaml formatting enforced by `.ocamlformat` (margin 100, parse/wrap docstrings). Run `dune fmt` before pushing.
 - Overall preference for snake_case (e.g. files `my_module.ml`); OCaml enforces capitalized modules and constructors (`My_module`, `My_variant`).
-- Prefer small, composable functions; avoid needless global state. PPX usage (`%op`, `%cd`) is described in `lib/syntax_extensions.md`.
+- Prefer small, composable functions; avoid needless global state. PPX usage (`%op`, `%cd`) is described in `docs/syntax_extensions.md`.
 
 ## Testing Guidelines
 - Frameworks: `ppx_expect` for inline `%expect` tests, and Dune `test` stanzas for tests with output targets in `.expected` files. Tests live under `test/<area>/*.ml` with paired `*.expected` where applicable.

CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -179,7 +179,7 @@ opam install cudajit # for CUDA backend
 **Key differences between %op and %cd**:
 - `%op` allows initialization expressions (`{ x = uniform () }`), used for model parameters
 - `%cd` is self-referential only (`{ x }`), used in computation graphs where tensors are defined by operations
-- See `lib/syntax_extensions.md` for comprehensive documentation
+- See `docs/syntax_extensions.md` for comprehensive documentation
 
 **Record syntax features**:
 - OCaml punning: `{ x }` expands to default initialization (uniform() for parameters in %op)
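For orientation, a minimal sketch of the record syntax described above (the layer name, the `()`-then-input calling convention, and the absence of extra labels are assumptions here; see docs/syntax_extensions.md for the authoritative syntax):

    (* Sketch only: a tiny affine layer in %op. Punned { w } and { b } get the
       default (uniform) initialization for parameters; `*` is tensor
       multiplication. An explicit initializer would look like { b = 0. } --
       whether that exact form type-checks is also an assumption. *)
    let%op dense () x = ({ w } * x) + { b }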

README.md

Lines changed: 5 additions & 5 deletions
@@ -44,14 +44,14 @@ A possible route to learning OCANNL:
 
 1. Read [the introductory slides](https://ahrefs.github.io/ocannl/docs/basics_backprop_training_codegen.html).
 2. Get some basic grasp of the aims and design of the project by reading or skimming files in [test/](test/).
-3. Read the syntax extensions documentation [lib/syntax_extensions.md](lib/syntax_extensions.md).
-4. Read the introductory part of the shape inference documentation [lib/shape_inference.md](lib/shape_inference.md).
+3. Read the syntax extensions documentation [docs/syntax_extensions.md](docs/syntax_extensions.md).
+4. Read the introductory part of the shape inference documentation [docs/shape_inference.md](docs/shape_inference.md).
 5. Read the configuration documentation [ocannl_config.example](ocannl_config.example).
 6. Improve your understanding by reading or skimming: [lib/shape.mli](lib/shape.mli), [lib/tensor.mli](lib/tensor.mli), [lib/operation.ml](lib/operation.ml), [arrayjit/lib/backend_intf.ml](arrayjit/lib/backend_intf.ml), [lib/train.ml](lib/train.ml), and [lib/nn_blocks.ml](lib/nn_blocks.ml).
-7. Read [arrayjit/lib/anatomy_of_a_backend.md](arrayjit/lib/anatomy_of_a_backend.md).
+7. Read [docs/anatomy_of_a_backend.md](arrayjit/lib/anatomy_of_a_backend.md).
 8. Read the implementation overview:
-   1. Shape inference details [lib/shape_inference.md](lib/shape_inference.md).
-   2. Backend-independent optimizations [arrayjit/lib/lowering_and_inlining.md](arrayjit/lib/lowering_and_inlining.md) -- _lowering_ means translating (compiling) from the high-level representation (as assignments) to the low-level representation.
+   1. Shape inference details [docs/shape_inference.md](docs/shape_inference.md).
+   2. Backend-independent optimizations [docs/lowering_and_inlining.md](arrayjit/lib/lowering_and_inlining.md) -- _lowering_ means translating (compiling) from the high-level representation (as assignments) to the low-level representation.
    3. More documentation to come.
 
 ### Using the tracing debugger with CUDA computations
File renamed without changes.

docs/einsum-slides-request.txt

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Let's write a user-centered introduction to how shapes work in OCANNL. Let's put the slides in docs/slides-shapes_and_einsum.md, and write them using slipshow navigation metadata as described in docs/CLAUDE.md. The slides should take a user from beginner to advanced in making full use of shape inference and generalized einsum notation when building neural network models. They should end up aware of how projections work, and of how to lean on shape inference or row variables / ellipsis notation to avoid committing unnecessarily to dimension sizes or, for example, the number of batch axes. They should learn when to use the dedicated einsum operators `++` and `+*` (these operators are translated by syntax extensions to `einsum1` and `einsum`). They should be able to use what they learned to construct a max pooling layer operation, and to handle any other challenges they encounter in NN modeling. Consider these sources of information: files docs/syntax_extensions.md, docs/shape_inference.md, docs/shape.mli, selected parts of lib/operation.ml, selected parts of docs/slides-basics_backprop_training_codegen.md. Let me also provide some points that might not be stated sufficiently explicitly in other documentation. (1) The split of axes into kinds does not enforce semantics, because the generalized einsum notation can make arbitrary use of the axes. However, it offers expressivity gains: (a) outside of einsum specs, there is a shape logic specification with syntax `~logic:"@"`, where all input axes of the first tensor are reduced with all output axes of the second tensor, generalizing matrix multiplication to tensor multiplication -- with an einsum spec, any two kinds of axes can be picked to reduce together, but this would not be possible without having distinct kinds; (b) having multiple kinds, and thus the opportunity for multiple row variables per tensor, allows more patterns of reorganizing and reducing axes while staying agnostic to the total number of axes -- for example, one could build code for a multi-head attention transformer that is agnostic as to whether one uses one batch axis or two batch+microbatch axes, and simultaneously agnostic as to whether one uses single-axis regular 1D attention or two-axis 2D axial attention, while handling the head-number axis as needed. (2) It's important to stress the syntactic difference with NumPy: since we use `->` to separate input and output axes, it cannot mean separating the argument tensor(s) from the result tensor -- thus `=>` is used to the left of the result tensor. (3) Remember to use kind separators where you intend to use the distinct axis kinds, e.g. use `|` after batch axes. (4) To trigger multichar mode there must be a comma in the spec; it can be a trailing comma, e.g. "input->output, => output->input". (5) A reminder that, as defined in lib/operation.ml, `*` stands for tensor multiplication and `*.` stands for pointwise multiplication when working with tensor expressions (rather than low-level assignments in the `%cd` syntax). (6) The user can define operations analogous to the `einsum1` and `einsum` operations in lib/operation.ml, for example with the max operator as the accumulation operator -- this is not so scary, operations can be easily added by users even if not inside lib/operation.ml.
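As a quick illustration of points (3) and (4), a sketch only: the operator usages and spec strings are taken from this request and from docs/migration_guide.md, while the let-bindings and names are hypothetical.

    (* Sketch: `+*` is the binary einsum operator (it expands to `einsum`);
       the `|` separates batch axes from the remaining axes, per point (3).
       The spec is the attention-scores example from docs/migration_guide.md. *)
    let attention_scores q k = q +* "b, q | h, d; b, k | h, d => b | q, k -> h" k

    (* Sketch: per point (4), a trailing comma is enough to switch the spec
       into multi-character mode, so `input`/`output` parse as whole axis names. *)
    let transpose_io t = t ++ "input->output, => output->input"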

docs/html/index.html

Lines changed: 10 additions & 0 deletions
@@ -200,6 +200,16 @@ <h1>🌐 OCANNL Directory</h1>
         title: 'Introduction to Reinforcement Learning',
         description: 'Introduction to Reinforcement Learning, the REINFORCE and GRPO algorithms'
       },
+      {
+        name: 'migration_guide.html',
+        title: 'Migration Guide',
+        description: 'Migration Guide: PyTorch/TensorFlow to OCANNL'
+      },
+      {
+        name: 'syntax_extensions.html',
+        title: 'Syntax Extensions',
+        description: 'Syntax Extensions: %op and %cd'
+      },
       {
         name: '../dev/neural_nets_lib/Ocannl/index.html',
         title: 'OCANNL Frontend API Documentation',

docs/html/style.css

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+body {
+  max-width: 1000px;
+  margin: 0 auto;
+  padding: 20px;
+}
File renamed without changes.

docs/migration_guide.md

Lines changed: 14 additions & 13 deletions
@@ -22,19 +22,19 @@ This is why pooling needs a dummy constant kernel - to carry shape info between
 ## Common Operations Mapping
 
 | PyTorch/TensorFlow | OCANNL | Notes |
-|-------------------|---------|--------|
-| `x.view(-1, d)` or `x.reshape(-1, d)` | Not directly supported | Use manual dimension setting on constant tensor as workaround |
-| `x.flatten()` | Not supported | Future syntax might be: `"x,y => x&y"` |
+|-----|------|----|
+| `x.view(-1, d)` or `x.reshape(-1, d)` | Not supported yet | Use shape inference and let tensors have the shape they want |
+| `x.flatten()` | Not supported yet | Future syntax might be: `"x,y => x&y"` |
 | `nn.Conv2d(in_c, out_c, kernel_size=k)` | `conv2d ~kernel_size:k () x` | Channels inferred or use row vars |
 | `F.max_pool2d(x, kernel_size=k)` | `max_pool2d ~window_size:k () x` | Uses `(0.5 + 0.5)` trick internally |
 | `F.avg_pool2d(x, kernel_size=k)` | `avg_pool2d ~window_size:k () x` | Normalized by window size |
 | `nn.BatchNorm2d(channels)` | `batch_norm2d () ~train_step x` | Channels inferred |
 | `F.dropout(x, p=0.5)` | `dropout ~rate:0.5 () ~train_step x` | Needs train_step for PRNG |
 | `F.relu(x)` | `relu x` | Direct function application |
-| `F.softmax(x, dim=-1)` | `softmax ~spec:"... \| ... -> ... d" () x` | Specify axes explicitly |
+| `F.softmax(x, dim=-1)` | `softmax ~spec:"... | ... -> ... d" () x` | Specify axes explicitly |
 | `torch.matmul(a, b)` | `a * b` or `a +* "...; ... => ..." b` | Einsum for complex cases |
-| `x.mean(dim=[1,2])` | `x ++ "... \| h, w, c => ... \| 0, 0, c" ["h"; "w"] /. (dim h *. dim w)` | Sum then divide |
-| `x.sum(dim=-1)` | `x ++ "... \| ... d => ... \| 0"` | Reduce by summing |
+| `x.mean(dim=[1,2])` | `x ++ "... | h, w, c => ... | 0, 0, c" ["h"; "w"] /. (dim h *. dim w)` | Sum then divide |
+| `x.sum(dim=-1)` | `x ++ "... | ... d => ... | 0"` | Reduce by summing |
 
 ## Tensor Creation Patterns
 
@@ -138,16 +138,17 @@ OCANNL's einsum has two syntax modes:
 
 2. **Multi-character mode**:
    - Triggered by ANY comma in the spec
+   - Trailing commas ignored
    - Identifiers can be multi-character (e.g., `height`, `width`)
    - Must be separated by non-alphanumeric: `,` `|` `->` `;` `=>`
-   - Enables convolution syntax: `stride*out+kernel`
+   - Makes convolution syntax less confusing: `stride*out+kernel`
 
 | Operation | PyTorch einsum | OCANNL single-char | OCANNL multi-char |
-|-----------|---------------|-------------------|-------------------|
+|--------|------------------|-------------------|-------------------|
 | Matrix multiply | `torch.einsum('ij,jk->ik', a, b)` | `a +* "i j; j k => i k" b` | `a +* "i, j; j, k => i, k" b` |
 | Batch matmul | `torch.einsum('bij,bjk->bik', a, b)` | `a +* "b i j; b j k => b i k" b` | `a +* "batch, i -> j; batch, j -> k => batch, i -> k" b` |
-| Attention scores | `torch.einsum('bqhd,bkhd->bhqk', q, k)` | `q +* "b q \| h d; b k \| h d => b \| q k -> h" k` | `q +* "b, q \| h, d; b, k \| h, d => b \| q, k -> h" k` |
-| Convolution | N/A | N/A (needs multi-char) | `x +* "... \| stride*oh+kh, stride*ow+kw, ic; kh, kw, ic -> oc => ... \| oh, ow, oc" kernel` |
+| Attention scores | `torch.einsum('bqhd,bkhd->bhqk', q, k)` | `q +* "bq|hd; bk|hd => b|qk->h" k` | `q +* "b, q | h, d; b, k | h, d => b | q, k -> h" k` |
+| Convolution | N/A | better use multi-char | `x +* "... | stride*oh+kh, stride*ow+kw, ic; kh, kw, ic -> oc => ... | oh, ow, oc" kernel` |
 
 ### Row Variables
 - `...` context-dependent ellipsis: expands to `..batch..` in batch position, `..input..` before `->`, `..output..` after `->`
@@ -288,7 +289,7 @@ dropout ~rate:0.5 () ~train_step x
 
 ## Further Resources
 
-- [Shape Inference Documentation](../lib/shape.mli) - Detailed einsum notation spec
-- [Syntax Extensions Guide](../lib/syntax_extensions.md) - `%op` and `%cd` details
-- [Neural Network Blocks](../lib/nn_blocks.ml) - Example implementations
+- [Shape Inference Documentation](../dev/neural_nets_lib/Ocannl/Shape/index.html) - Detailed einsum notation spec
+- [Syntax Extensions Guide](syntax_extensions.html) - `%op` and `%cd` details
+- [Neural Network Blocks](https://github.com/ahrefs/ocannl/blob/master/lib/nn_blocks.ml) - Example implementations
 - [GitHub Discussions](https://github.com/ahrefs/ocannl/discussions) - Community Q&A
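To make the mapping table above concrete, a short sketch (the helper names are assumed; the operators and spec strings are the ones shown in the table):

    (* Sketch: torch.matmul on batched operands via the einsum operator `+*`
       (single-char spec from the table above). *)
    let batch_matmul a b = a +* "b i j; b j k => b i k" b

    (* Sketch: x.sum(dim=-1) -- reduce the last output axis by summing with `++`. *)
    let sum_last_axis x = x ++ "... | ... d => ... | 0"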
