The bulk of the projections inference happens alongside shape inference, with the projections-relevant information stored in auxiliary fields -- this prevents subtle bugs where projection semantics deviate from shape semantics, and will simplify adding new shape/projection inference features. Shape inference happens during `propagate_shapes` calls, and then again in a `finish_inference` call, which is triggered whenever the dimensions or projections are required (typically by jitting). Finally, the projections are reconstructed in `derive_projections`. It would seem that `derive_projections` could reuse the already-computed constraint solutions, but we face a problem: we must prevent contaminating projections across different operations. To illustrate: suppose we conclude that the dimensions of two axes are the same because they are reduced together in another operation -- this should not force the axes to share a projection in the operation currently being processed. To prevent the contamination, in each `derive_projections` call we freshen the projection ids in the (inferred) shapes, and regenerate and re-solve the constraints with the fresh projection ids.
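
To make the freshening step concrete, here is a minimal, hypothetical OCaml sketch (not OCANNL's actual code; the names and types are invented for illustration): fresh projection ids are allocated per call, so id sharing is preserved within one `derive_projections`-style call but can never leak across operations.

```ocaml
(* Hypothetical sketch: freshen projection ids so they are shared only within one call. *)
let fresh_proj_id =
  let counter = ref 0 in
  fun () ->
    incr counter;
    !counter

(* [axes] pairs a dimension with a (possibly shared) projection id; rewrite the ids,
   preserving the sharing structure among this call's axes only. *)
let freshen_proj_ids (axes : (int * int) list) : (int * int) list =
  let table = Hashtbl.create 8 in
  List.map
    (fun (dim, old_id) ->
      let new_id =
        match Hashtbl.find_opt table old_id with
        | Some id -> id
        | None ->
            let id = fresh_proj_id () in
            Hashtbl.add table old_id id;
            id
      in
      (dim, new_id))
    axes
```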
The shape system in OCANNL is currently monomorphic: both row and dimension variables are interpreted existentially. It could in principle be made polymorphic: by abstracting over the remaining fresh variables when forming a tensor-producing function, and by replacing the universally bound variables with fresh variables when applying such a function. However, this is non-trivial and would depend on introducing namespaces for tensor nodes. Then, we could perform "abstract interpretation" (also known as tracing, as in e.g. JAX) by computing an OCaml function under an abstract tensor node namespace. Applying the function would not execute the OCaml code again; instead, it would copy the tensors generated by the "abstract interpretation"-stage execution, with appropriately freshened shape variables, into a concrete tensor node namespace. There is a natural context for introducing such abstraction: the special `~config` labeled functions as processed by the `%op` syntax extension -- see [syntax extensions](./syntax_extensions.md). Exploring this is left as potential future work (no earlier than OCANNL v2).
## Representing shapes and constraints
A tensor shape in OCANNL is composed of three rows of axes: batch, input and output. These are ordered input-last (`batch @ output @ input`) in the underlying n-dimensional array implementation of tensors (at least when hosted; backends can reorder axes via a stride mechanism -- NOTE: NOT IMPLEMENTED YET). A (fully inferred) tensor shape must have non-empty output axes; we do not use the convention where empty axes mean the tensor is a scalar -- scalars are 1-D output-only tensors. For printing and einsum-notation-like specifications, we use the syntax `batch|input->output` (or `input->output`, `batch|output`, `output`), where `batch`, `input`, `output` are axis entries separated by whitespace, commas or parentheses; if no separators are used, each individual character is an axis entry (except when the row consists of digits only).
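
As an illustration of the separator rule just described, here is a minimal, hypothetical sketch of splitting such a specification into its three rows of entries; this is not OCANNL's actual parser, and it deliberately glosses over how a digits-only row is interpreted.

```ocaml
(* Hypothetical sketch, not OCANNL's parser: split "batch|input->output" into rows of entries. *)
let split_entries s =
  let s = String.trim s in
  let has_separators =
    String.exists (fun c -> c = ' ' || c = ',' || c = '(' || c = ')') s
  in
  if s = "" then []
  else if has_separators then
    (* Entries are separated by whitespace, commas or parentheses. *)
    String.map (function ' ' | ',' | '(' | ')' -> ' ' | c -> c) s
    |> String.split_on_char ' '
    |> List.filter (fun e -> e <> "")
  else if String.for_all (fun c -> c >= '0' && c <= '9') s then
    [ s ] (* digits-only row: handled specially, not modeled here *)
  else
    (* No separators: each individual character is an axis entry. *)
    List.init (String.length s) (fun i -> String.make 1 s.[i])

let split_spec spec =
  let batch, rest =
    match String.index_opt spec '|' with
    | Some i -> (String.sub spec 0 i, String.sub spec (i + 1) (String.length spec - i - 1))
    | None -> ("", spec)
  in
  let input, output =
    match String.index_opt rest '-' with
    | Some i when i + 1 < String.length rest && rest.[i + 1] = '>' ->
        (String.sub rest 0 i, String.sub rest (i + 2) (String.length rest - i - 2))
    | _ -> ("", rest)
  in
  (split_entries batch, split_entries input, split_entries output)

(* For example, [split_spec "b|i->o"] gives (["b"], ["i"], ["o"]), and
   [split_spec "b1 b2|i->o1, o2"] gives (["b1"; "b2"], ["i"], ["o1"; "o2"]). *)
```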
A row is a sequence of axes of a single kind: batch, input, or output. The shape type incorporates information relevant to inference, in particular shape variables: both for individual axes (`dim` variables) and for extending a row with more axes (`row` variables). Currently, all rows are (independently) broadcastable: they can be broadcast to a larger number of axes. However, in OCANNL the broadcasting can happen "in the middle", with not only the given trailing axes fixed, but also the given leading axes fixed. (TODO: clarify here the precise logic as it is implemented, I'm not sure this description is correct.)
```ocaml
type solved_dim = { d : int; label : string option; proj_id : proj_id option }

(* ... *)

type deduce_within_shape = Not_constrained | Input_equals_output

type compose_type =
  | Pointwise_bin
      (** NumPy-style broadcast matching batch, input and output axes, e.g. as in [s1 + s2]. *)
  | Compose
      (** Compose the outputs of the second shape with the inputs of the first shape, i.e. the
          shape of [fun x -> s1(s2(x))], or [s1 * s2] where [*] is the inner product (e.g. matrix
          multiply). *)
  | Einsum of string
      (** The binary "einsum" syntax: RHS1;RHS2=>LHS, where RHSi, LHS are labels specifications.
          Since OCANNL's extended einsum notation supports both axis variables and row variables,
          it makes other compose types redundant. The [axis_labels] use pseudo-labels local to the
          notation, to line up the axes and row variables. The symmetric difference / disjunctive
          union of RHS1 and RHS2's pseudo-labels should be equal to LHS pseudo-labels.

          Note: The "right-hand-side" is on the left! I.e. the syntax is "rhs=>lhs",
          "rhs1;rhs2=>lhs". *)

type transpose_type =
  | Transpose  (** Swaps inputs and outputs of a shape, preserves batch axes. *)
  | Pointwise_un  (** Preserves the shape. *)
  | Permute of string  (** The unary "einsum" syntax: RHS1=>LHS. *)
  | Batch_slice of Ir.Indexing.static_symbol  (** Removes the leftmost batch axis. *)
  | Uint4x32_to_prec of Ir.Ops.prec Lazy.t
      (** Converts precision in a bit-efficient way, with a corresponding conversion in the total
          number of elements. Currently, assumes the incoming tensor (RHS) has just a single axis,
          to not force unnecessary minimum sizes on output axes. *)

type logic =
  | Broadcast of compose_type * shape * shape
      (** Matches the shapes for a binary operation.

          For [Broadcast (Einsum spec, s1, s2)], the labels of [s1] and [s2] must match according
          to the RHS1 and RHS2 lineups of [spec], and the resulting shape inherits the labels
          according to the LHS lineup. *)
  | Transpose of transpose_type * shape
      (** Permutes the axes of a shape. One case of [Transpose] is to swap inputs with outputs of
          [s1], hence the name. *)
  | Terminal of [ `Data of Ir.Assignments.init_data | `Fetch of Ir.Assignments.fetch_op ]
      (** Extracts any available shape information from the initialization, e.g. the number of
          elements. *)

(* ... *)

  (** If you miss expressivity here, leave a note on ... *)
  (* ... *)
  (** The rows, inclusive of the further row spec, have this many elements. *)
  | Exact of dim list  (** The concatenated rows have these axes. *)
```
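
For concreteness, here is a hedged illustration of values of the above types. The spec strings are made-up examples in the extended einsum notation, relying on the reversed convention that the right-hand sides appear to the left of `=>`:

```ocaml
(* Hypothetical example values, assuming the constructors above are in scope. *)
let matmul : compose_type = Einsum "ij;jk=>ik" (* [j] is the pseudo-label shared by the two RHS rows *)
let swap_axes : transpose_type = Permute "ij=>ji" (* unary einsum: swap the two axes *)
let add_like : compose_type = Pointwise_bin (* NumPy-style broadcast, as in [s1 + s2] *)
```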
During the solution process, the constraints are incorporated, or propagated, into the `constr` fields of environment entries, and into further `constraint_` constraints, as needed. This provides sufficient scaffolding to implement other complex constraints as the need arises.
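
As a rough illustration of what incorporating a constraint into the environment can look like, here is a minimal, hypothetical sketch; the types and names below are invented and far simpler than OCANNL's actual environment and `constraint_` types. Solving a dimension-equality constraint either extends the variable bindings or is detected as inconsistent.

```ocaml
(* Hypothetical sketch of propagating constraints into an environment of bindings. *)
type dim = Dim of int | Var of string

type constraint_ = Dim_eq of dim * dim

(* Resolve a dim through the current bindings. *)
let rec subst env d =
  match d with
  | Var v -> (match List.assoc_opt v env with Some d' -> subst env d' | None -> Var v)
  | Dim _ -> d

(* Incorporate one constraint: extend the environment or report an inconsistency. *)
let solve env (Dim_eq (a, b)) =
  match (subst env a, subst env b) with
  | a', b' when a' = b' -> env (* already satisfied *)
  | Var v, d | d, Var v -> (v, d) :: env (* bind the variable *)
  | Dim m, Dim n -> failwith (Printf.sprintf "inconsistent dimensions: %d vs %d" m n)

(* Solving [Dim_eq (Var "a", Dim 3)] and then [Dim_eq (Var "a", Dim 4)] against the same
   environment triggers the inconsistency error. *)
```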