
Commit 64f1ec1

lukstafi and claude committed

Doc: clarify that einsum operations use equations, not inequalities

Einsum operations (both binary Einsum and unary Permute) generate Row_eq and Dim_eq constraints, not Row_ineq and Dim_ineq. This means they do NOT permit broadcasting, unlike Pointwise_bin, Pointwise_un, and Compose operations which use inequalities.

Updated docs/shape_inference.md and tensor/shape.mli to:

- Remove claim that einsum "makes other compose types redundant"
- Clarify einsum is more restrictive (no broadcasting) but more precise
- Update get_inequalities description to reflect equations for einsum

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent 3659fa5 commit 64f1ec1
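The equations-versus-inequalities distinction the commit message describes can be illustrated with a minimal sketch. The types below are hypothetical stand-ins that loosely mirror the `Dim_eq` / `Dim_ineq` constructor names touched by this commit, not OCANNL's actual solver:

```ocaml
(* Toy model: dimensions are plain ints here; in OCANNL they may be variables. *)
type constraint_ =
  | Dim_eq of int * int                    (* einsum / Permute: exact match required *)
  | Dim_ineq of { cur : int; subr : int }  (* pointwise / compose: subr may broadcast *)

let satisfiable = function
  | Dim_eq (d1, d2) -> d1 = d2
  | Dim_ineq { cur; subr } -> subr = 1 || subr = cur

let () =
  (* A dimension-1 operand broadcasts under an inequality constraint... *)
  assert (satisfiable (Dim_ineq { cur = 8; subr = 1 }));
  (* ...but the same pairing fails under the equation an einsum generates. *)
  assert (not (satisfiable (Dim_eq (8, 1))));
  print_endline "ok"
```

Under an inequality, a dimension-1 sub-shape is allowed to broadcast up to the current shape's dimension; under an equation the two dimensions must unify exactly, which is why einsum rejects shapes that pointwise operations would accept.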

File tree

3 files changed: +43 −27 lines


docs/shape_inference.md

Lines changed: 13 additions & 5 deletions

```diff
@@ -143,10 +143,14 @@ type compose_type =
         multiply). *)
   | Einsum of string * Ir.Indexing.variable_ref list
       (** The binary "einsum" syntax: RHS1;RHS2=>LHS, where RHSi, LHS are labels specifications.
-          Since OCANNL's extended einsum notation supports both axis variables and row variables, it
-          makes other compose types redundant. The [axis_labels] use pseudo-labels local to the
-          notation, to line up the axes and row variables. The symmetric difference / disjunctive
-          union of RHS1 and RHS2's pseudo-labels should be equal to LHS pseudo-labels.
+          OCANNL's extended einsum notation supports both axis variables and row variables.
+          The [axis_labels] use pseudo-labels local to the notation, to line up the axes and row
+          variables. The symmetric difference / disjunctive union of RHS1 and RHS2's pseudo-labels
+          should be equal to LHS pseudo-labels.
+
+          Unlike [Pointwise_bin] and [Compose], einsum operations use equations only (not
+          inequalities), so they do NOT permit broadcasting. This makes einsum more restrictive
+          but also more precise for operations where exact shape matching is required.
 
           The optional {!Ir.Indexing.variable_ref}s will capture the solutions of the dimensions
           corresponding to the specification labels equal to [ref_label] of a reference.
@@ -161,6 +165,10 @@ type transpose_type =
   | Permute of string * Ir.Indexing.variable_ref list
       (** The unary "einsum" syntax: RHS1=>LHS.
+
+          Unlike [Pointwise_un], permute operations use equations only (not inequalities), so they
+          do NOT permit broadcasting. This makes permute more restrictive but also more precise
+          for operations where exact shape matching is required.
 
           The optional {!Ir.Indexing.variable_ref}s will capture the solutions of the dimensions
           corresponding to the specification labels equal to [ref_label] of a reference. *)
   | Batch_slice of Ir.Indexing.static_symbol (** Removes the leftmost batch axis. *)
@@ -270,7 +278,7 @@ There is an important and intentional difference between `dims` in the `arrayjit
 Other important functions in the `Shape` module.
 
 * `einsum_slot_spec_to_dims_bio` parses an einsum spec for a single shape, returns the three rows and a mapping from axis (`dim`) variables to indices where the einsum specifies fixed indexing.
-* `get_inequalities` builds row inequalities by pairing the rows of the current shape (as `cur`) with the rows of sub-shapes (as `subr`). It also derives a batch row constraint for terminals initialized with `Constant_fill values`. For `Batch_slice` (the `@|` operation) it waits till the batch row variables (if any) are solved, and derives row equations (not inequalities) between the current shape and the sub-shape, with `cur_sh.batch.dims` expanded to account for the slicing / indexing. For einsum specs, it derives inequalities, roughly: _current shape ≤ lhs spec shape_, and _rhs spec shape ≥ sub-shape_.
+* `get_inequalities` builds row inequalities by pairing the rows of the current shape (as `cur`) with the rows of sub-shapes (as `subr`). It also derives a batch row constraint for terminals initialized with `Constant_fill values`. For `Batch_slice` (the `@|` operation) it waits till the batch row variables (if any) are solved, and derives row equations (not inequalities) between the current shape and the sub-shape, with `cur_sh.batch.dims` expanded to account for the slicing / indexing. For einsum specs, it derives equations (not inequalities), equating the current shape with the lhs spec shape, and the rhs spec shapes with the sub-shapes. This means einsum operations do NOT permit broadcasting, unlike pointwise and compose operations which use inequalities.
 * `propagate_shapes` gets and then solves the inequalities, using a global state for the environment. It updates the shapes in-place with the partial solution. It is invoked twice for each `update_step`: first during the bottom-up process of building tensors, and then in reverse order from `finish_inference`.
 * `finish_inference` is called right before some projections or array dimensions are required (typically, because of jitting). It performs a second round of `propagate_shapes`, and then once again attempts to solve any remaining constraints that `propagate_shapes` didn't solve. Then it "closes the shapes": substitutes out remaining shape variables by their LUBs if any, or dimension-1 / `Broadcastable` (no-more-axes). Then it resets the environment state, since the shapes are now guaranteed to not have variables.
 * `derive_projections` starts by freshening the `proj_id`s in the `update_step`. Then it generates and solves shape inequalities, and then generates and solves projection equations, and constructs the `projections` record.
```
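The updated `get_inequalities` behavior can be sketched as follows. This is a hypothetical simplification (row is a stand-in type, and the real function handles batch/input/output rows, origins, and projections), not OCANNL's actual implementation:

```ocaml
(* Hypothetical sketch of the pairing described above: einsum specs yield
   equations between the current shape's rows and the spec's rows, while
   pointwise/compose logic yields inequalities that permit broadcasting. *)
type row = string list  (* stand-in for OCANNL's row type *)

type constraint_ =
  | Row_eq of { r1 : row; r2 : row }        (* exact: no broadcasting *)
  | Row_ineq of { cur : row; subr : row }   (* sub-shape may broadcast *)

let constraints_for ~einsum ~cur_rows ~sub_rows =
  List.map2
    (fun cur subr ->
      if einsum then Row_eq { r1 = cur; r2 = subr }
      else Row_ineq { cur; subr })
    cur_rows sub_rows
```

The field renames in the diff below (`cur`/`subr` becoming `r1`/`r2`) follow from this switch: an equation relates two rows symmetrically, so the asymmetric current/sub-shape field names no longer apply.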

tensor/shape.ml

Lines changed: 18 additions & 18 deletions

```diff
@@ -984,10 +984,10 @@ let%debug4_sexp get_inequalities ({ shape = cur_sh; logic; id = _ } as _upd : up
     ( proj_env,
       extras_dim_refs @ extras_rhs @ extras_lhs
       @ [
-          Row_ineq
+          Row_eq
             {
-              cur = cur_sh.batch;
-              subr = b_lhs;
+              r1 = cur_sh.batch;
+              r2 = b_lhs;
               origin =
                 [
                   {
@@ -1014,10 +1014,10 @@ let%debug4_sexp get_inequalities ({ shape = cur_sh; logic; id = _ } as _upd : up
                   };
                 ];
             };
-          Row_ineq
+          Row_eq
             {
-              cur = cur_sh.input;
-              subr = i_lhs;
+              r1 = cur_sh.input;
+              r2 = i_lhs;
               origin =
                 [
                   {
@@ -1044,10 +1044,10 @@ let%debug4_sexp get_inequalities ({ shape = cur_sh; logic; id = _ } as _upd : up
                   };
                 ];
             };
-          Row_ineq
+          Row_eq
             {
-              cur = cur_sh.output;
-              subr = o_lhs;
+              r1 = cur_sh.output;
+              r2 = o_lhs;
               origin =
                 [
                   {
@@ -1210,10 +1210,10 @@ let%debug4_sexp get_inequalities ({ shape = cur_sh; logic; id = _ } as _upd : up
     ( proj_env,
       extras_dim_refs @ extras_rhs1 @ extras_rhs2 @ extras_lhs
       @ [
-          Row_ineq
+          Row_eq
            {
-              cur = cur_sh.batch;
-              subr = b_lhs;
+              r1 = cur_sh.batch;
+              r2 = b_lhs;
               origin =
                 [
                   {
@@ -1255,10 +1255,10 @@ let%debug4_sexp get_inequalities ({ shape = cur_sh; logic; id = _ } as _upd : up
                   };
                 ];
             };
-          Row_ineq
+          Row_eq
             {
-              cur = cur_sh.input;
-              subr = i_lhs;
+              r1 = cur_sh.input;
+              r2 = i_lhs;
               origin =
                 [
                   {
@@ -1300,10 +1300,10 @@ let%debug4_sexp get_inequalities ({ shape = cur_sh; logic; id = _ } as _upd : up
                   };
                 ];
             };
-          Row_ineq
+          Row_eq
             {
-              cur = cur_sh.output;
-              subr = o_lhs;
+              r1 = cur_sh.output;
+              r2 = o_lhs;
               origin =
                 [
                   {
```

tensor/shape.mli

Lines changed: 12 additions & 4 deletions

```diff
@@ -113,10 +113,14 @@ type compose_type =
         multiply). *)
   | Einsum of string * delayed_var_ref list
       (** The binary "einsum" syntax: RHS1;RHS2=>LHS, where RHSi, LHS are labels specifications.
-          Since OCANNL's extended einsum notation supports both axis variables and row variables, it
-          makes other compose types redundant. The [axis_labels] use pseudo-labels local to the
-          notation, to line up the axes and row variables. The symmetric difference / disjunctive
-          union of RHS1 and RHS2's pseudo-labels should be equal to LHS pseudo-labels.
+          OCANNL's extended einsum notation supports both axis variables and row variables.
+          The [axis_labels] use pseudo-labels local to the notation, to line up the axes and row
+          variables. The symmetric difference / disjunctive union of RHS1 and RHS2's pseudo-labels
+          should be equal to LHS pseudo-labels.
+
+          Unlike [Pointwise_bin] and [Compose], einsum operations use equations only (not
+          inequalities), so they do NOT permit broadcasting. This makes einsum more restrictive
+          but also more precise for operations where exact shape matching is required.
 
           The optional {!Ir.Indexing.variable_ref}s will capture the solutions of the dimensions
           corresponding to the specification labels equal to [ref_label] of a reference.
@@ -131,6 +135,10 @@ type transpose_type =
   | Permute of string * delayed_var_ref list
       (** The unary "einsum" syntax: RHS1=>LHS.
+
+          Unlike [Pointwise_un], permute operations use equations only (not inequalities), so they
+          do NOT permit broadcasting. This makes permute more restrictive but also more precise
+          for operations where exact shape matching is required.
 
           The optional {!Ir.Indexing.variable_ref}s will capture the solutions of the dimensions
           corresponding to the specification labels equal to [ref_label] of a reference. *)
   | Batch_slice of Ir.Indexing.static_symbol (** Removes the leftmost batch axis. *)
```
