
Commit b933682

slides-shapes_and_einsum.md, collab with Claude Opus
This was driven by Claude, but I corrected things and wrote examples 1 and 2 in `Advanced: Building Custom Operations`; after some back-and-forth, Claude wrote example 3.

Prompt: Let's write a user-centered introduction to how shapes work in OCANNL. Let's put the slides in docs/slides-shapes_and_einsum.md, and write them using slipshow navigation metadata as described in docs/CLAUDE.md. The slides should take a user from beginner to advanced in making full use of shape inference and generalized einsum notation when building neural network models. They should end up aware of how projections work, and of how to lean on shape inference or row variables / ellipsis notation so as not to commit to dimension sizes, or for example to the number of batch axes, unnecessarily. They should learn when to use the dedicated einsum operators `++`, `+*`, `@^^`, `@^+` (these operators are translated by syntax extensions to `einsum1`, `einsum`, `einmax1`, `tropical`). They should be able to use what they learned to construct a new model not yet in lib/nn_blocks.ml. Consider these sources of information: docs/syntax_extensions.md, docs/shape_inference.md, lib/nn_blocks.ml, lib/shape.mli, selected parts of lib/operation.ml, and selected parts of docs/slides-basics_backprop_training_codegen.md. Let me also provide some points that might not be stated sufficiently explicitly in other documentation:

1. The split of axes into kinds does not enforce semantics, because the generalized einsum notation can make arbitrary use of the axes. However, it offers expressivity gains.
2. It's important to stress the syntactic difference from NumPy: since we use `->` to separate input and output axes, it cannot also separate the argument tensor(s) from the result tensor -- thus `=>` is used to the left of the result tensor; similarly, we use `;` to separate arguments and (in multi-char mode) `,` to separate identifiers within a row.
3. Remember to use kind separators where you intend to use the distinct axis kinds, e.g. use `|` after batch axes.
4. To trigger multi-char mode there must be a comma in the spec; it can be a trailing comma, e.g. "input->output, => output->input".
5. A reminder that, as defined in lib/operation.ml, `*` stands for tensor multiplication and `*.` stands for pointwise multiplication when working with tensor expressions (rather than low-level assignments in the `%cd` syntax).
6. The user can define operations analogous to the `einsum1` and `einsum` operations in lib/operation.ml, for example with the max operator as the accumulation operator -- this is not so scary: operations can easily be added by users even outside lib/operation.ml.
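(For illustration here, not part of the commit: a minimal OCaml sketch of points 2, 4, and 5 above, assuming the `%op` convention from the OCANNL README where string literals introduce learnable parameters; the function names are hypothetical.)

```ocaml
(* Point 4: the trailing comma switches the spec to multi-char mode, and
   [=>] (not [->]) introduces the result row.  [++] is einsum1. *)
let%op transpose t = t ++ "input->output, => output->input"

(* Point 5: [*] is tensor multiplication (contracting input axes against
   output axes), while [*.] is pointwise multiplication.  The parameters
   "w" and "b" get their shapes from shape inference. *)
let%op affine x = ("w" * x) + "b"
let%op gate a b = a *. b
```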
Summary 1 by Claude: I've created comprehensive slides on shapes and einsum notation in OCANNL. The slides progress from beginner to advanced concepts.

Structure:
1. Introduction - Why shape inference matters
2. Basics - NumPy differences, axis kinds
3. Row variables - Flexible broadcasting with ... and ..var..
4. Multi-character mode - When to use commas
5. Einsum operators - ++, +*, @^^, @^+
6. Convolutions - Using the + syntax
7. Dimension capture - Extracting sizes for computation
8. Custom operations - Building beyond built-ins
9. Practical patterns - Real-world examples
10. Shape inference - How it works under the hood
11. Projections - Loop generation details
12. Tips and exercise - Best practices with hands-on example

Key features of the slides:
- Uses slipshow navigation metadata (pauses, up/down, IDs)
- Progresses from simple to complex concepts
- Includes practical examples from nn_blocks.ml
- Emphasizes the "principle of least commitment"
- Covers all the points you mentioned (axis kinds, row variables, syntactic differences, etc.)
- Ends with an exercise building custom attention

The slides teach users to leverage OCANNL's powerful shape inference rather than over-specifying dimensions, and to use row variables for maximum flexibility when building neural network models.

Prompt 2: We also have the documentation docs/migration_guide.md. Review your work in light of that file. Note that a user is likely to read both files, although I am not sure which order to recommend -- I'm leaning toward recommending the migration guide after the slides we are currently working on. Note that you made some mistakes that I corrected. Apply any updates you deem are still helpful.

Summary 2 by Claude: Corrections applied:
1. Fixed the ellipsis examples to properly show axis kinds (batch | input -> output)
2. Corrected *. for pointwise multiplication (not *, which is tensor multiplication)
3. Added clarification of single-char vs multi-char row variable syntax (..v.. vs ..var..)
4. Fixed the matrix multiplication examples to properly show axis kinds

Enhancements from the migration guide:
1. Added a "Common Pitfalls" section highlighting operator differences
2. Included the 0.5 + 0.5 trick for creating shape-inferred constants
3. Added a connection to the migration guide in "Next Steps"
4. Included "The OCANNL Way" philosophy section emphasizing the difference from PyTorch/TF

Recommendation for reading order: I suggest users read these slides first, then the migration guide. The slides provide conceptual understanding of shapes and einsum from first principles, while the migration guide offers practical translation patterns for those coming from other frameworks. The slides build intuition that makes the migration guide's examples clearer.

The slides now accurately represent OCANNL's shape system with:
- Proper understanding of the three axis kinds
- Clear examples of single-char vs multi-char einsum modes
- Practical patterns from actual usage in nn_blocks.ml
- Warnings about common mistakes
- Connections to further resources

Summary 3 by Claude: I've replaced the problematic "Custom Max-Pool" example with the "Reduce Last N Dimensions" example. This new example:
1. Shows genuine programmatic spec building - you can't write a literal spec when n is a runtime parameter
2. Has a real use case - reducing an arbitrary number of trailing dimensions is useful for various pooling and aggregation operations
3. Is easy to understand - the example comment shows exactly what spec gets generated for n=3
4. Uses single-char mode appropriately - generating sequential letters (a, b, c, ...) for the dimension variables

The example clearly demonstrates when and why you'd need to build specs programmatically rather than using literal strings!
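(For illustration, not quoted from the slides: the kind of programmatic spec building Summary 3 describes can be sketched in plain OCaml as below; the exact spec format the slides generate is not shown in this commit message, so the string produced here is an assumption.)

```ocaml
(* Build an einsum spec that reduces the last [n] trailing output axes,
   generating sequential single-char dimension variables a, b, c, ...
   For n = 3 this yields "... | ... abc => ... | ...": the axes named
   a, b, c appear only to the left of [=>], so they are reduced. *)
let reduce_last_n_spec n =
  let letters = String.init n (fun i -> Char.chr (Char.code 'a' + i)) in
  Printf.sprintf "... | ... %s => ... | ..." letters
```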
Summary 4: I've updated the "Principle of least commitment" section to accurately reflect how OCANNL handles the different axis kinds:
1. Made it clear that "...|...->..." is needed for truly arbitrary shapes (all three axis kinds)
2. Added context for when simpler patterns work:
   - "...->..." for parameters (which typically don't have batch axes)
   - "...|..." for data tensors (which typically don't have input axes)
3. Fixed the binary op pattern to show the full specification needed for complete generality
4. Clarified that "... | ..d.. => ... | 0" specifically reduces output axes

This now accurately represents OCANNL's requirement that missing axis kinds are assumed absent, not arbitrary -- an important distinction for users to understand when writing maximally flexible code.
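(Again for illustration only: a hedged sketch of the reduction pattern from point 4, assuming `++` accepts the spec string directly after the tensor; the function name is hypothetical.)

```ocaml
(* Per Summary 4, this spec reduces all output axes (captured by the row
   variable ..d..) while keeping the batch axes (the plain ...) intact.
   Both rows must be spelled out, since missing axis kinds are assumed
   absent, not arbitrary. *)
let%op sum_outputs t = t ++ "... | ..d.. => ... | 0"
```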
1 parent 53331e4 commit b933682

File tree

3 files changed: +489 −3 lines changed


.github/workflows/gh-pages-docs.yml

Lines changed: 3 additions & 0 deletions
@@ -38,6 +38,9 @@ jobs:
       - name: Syntax extensions
         run: pandoc --toc -s --embed-resources --css=docs/html/style.css docs/syntax_extensions.md -o docs/html/syntax_extensions.html
 
+      - name: Shapes and Einsum
+        run: pandoc --toc -s --embed-resources --css=docs/html/style.css docs/slides-shapes_and_einsum.md -o docs/html/shapes_and_einsum.html
+
       - name: Deploy
         uses: peaceiris/actions-gh-pages@v4
         with:

docs/html/index.html

Lines changed: 8 additions & 3 deletions
@@ -196,15 +196,20 @@ <h1>🌐 OCANNL Directory</h1>
         description: 'Introduction to OCANNL: Supervised Learning, backpropagation, training loop, code generation'
       },
       {
-        name: 'RL-REINFORCE.html',
-        title: 'Introduction to Reinforcement Learning',
-        description: 'Introduction to Reinforcement Learning, the REINFORCE and GRPO algorithms'
+        name: 'shapes_and_einsum.html',
+        title: 'Shapes and Einsum in OCANNL: From Basics to Advanced',
+        description: 'Shapes and the generalized einsum notation in OCANNL: with explanations and examples'
       },
       {
         name: 'migration_guide.html',
         title: 'Migration Guide',
         description: 'Migration Guide: PyTorch/TensorFlow to OCANNL'
       },
+      {
+        name: 'RL-REINFORCE.html',
+        title: 'Introduction to Reinforcement Learning',
+        description: 'Introduction to Reinforcement Learning, the REINFORCE and GRPO algorithms'
+      },
       {
         name: 'syntax_extensions.html',
         title: 'Syntax Extensions',
