
Commit 75e8fbc

Fixes #410; in progress: refine shape inference to treat dim-1 with a label the same as dim>1; only dim-1 without a label is treated differently (more general)
1 parent c60cff6 · commit 75e8fbc

File tree

4 files changed: +98, −85 lines

docs/shape_inference.md

Lines changed: 4 additions & 4 deletions
````diff
@@ -2,7 +2,7 @@
 
 To separate concerns, OCANNL is split into the `arrayjit` library, responsible for compilation of high-level n-D array operation sequences (`Assignments.comp`) via backends such as sync_cc, metal and cuda, and the main `ocannl` library, responsible for deriving the operations computing the forward propagation and backpropagation from tensor expressions. In particular, `arrayjit` contains `Indexing`, which represents complex indexing into arrays, and the main library `ocannl` has `Row` and `Shape` modules, which do the most "heavy-lifting" in the translation from concise tensor expressions to sequences of assignments.
 
-Shape inference broadly speaking consists in OCANNL of inferring the `Shape.t` record -- shape inference proper, and inferring the `Indexing.projections` record -- projections inference. `Shape.t` records are mutable, so that the partially inferred shapes can be observed by the user. Shape and projections inference is intended to be declarative -- independent of the order in which constraints are added. There is one aspect that is not declarative: when tensor expressions are compiled to assignments, i.e. jitted, still-unsolved shape variables in terminal nodes are substituted by their least upper bounds if any, or by dimension-1 / no-more-axes.
+Shape inference broadly speaking consists in OCANNL of inferring the `Shape.t` record -- shape inference proper, and inferring the `Indexing.projections` record -- projections inference. `Shape.t` records are mutable, so that the partially inferred shapes can be observed by the user. Shape and projections inference is intended to be declarative -- independent of the order in which constraints are added. There is one aspect that is not declarative: when tensor expressions are compiled to assignments, i.e. jitted, still-unsolved shape variables in terminal nodes are substituted by their least upper bounds if any, or by dimension-1 (no label) / no-more-axes.
 
 The bulk of the projections inference happens alongside shape inference, with the projections-relevant information stored in auxiliary fields -- this prevents subtle bugs where projection semantics deviates from shape semantics, and will simplify adding new shape/projection inference features. Shape inference happens during `propagate_shapes` calls, and then again in a `finish_inference` call, which is triggered whenever the dimensions or projections are required (i.e. typically by jitting). Finally, the projections are reconstructed in `derive_projections`. It would seem `derive_projections` could reuse the already-computed solutions constraints. But we face a problem: we must prevent contaminating projections across different operations. To illustrate: we conclude the dimensions of two axes are the same because they are reduced together in another operation -- this should not force the axes to share a projection in the processed operation. To prevent the contamination, in each `derive_projections` call, we freshen the projection ids in the (inferred) shapes, and regenerate and re-solve the constraints with the fresh projection ids.
 
@@ -63,7 +63,7 @@ Shape inference does not maintain padding for axes of individual tensor nodes, t
 
 ### Preventing Premature Guessing with Total_elems Constraints
 
-A critical aspect of shape inference is avoiding premature "guessing" of dimension variables to minimal values (dimension-1 or no-further-axes for rows) when such guessing would make pending constraints unsatisfiable. This is particularly important for `Total_elems` constraints of the form:
+A critical aspect of shape inference is avoiding premature "guessing" of dimension variables to minimal values (dimension-1-no-label or no-further-axes for rows) when such guessing would make pending constraints unsatisfiable. This is particularly important for `Total_elems` constraints of the form:
 
 ```ocaml
 Total_elems { numerator = Strided_var { coeff; var; denom }; divided_by }
@@ -89,7 +89,7 @@ This mechanism ensures that `Total_elems` constraints with stride-based numerato
 
 ### Inference strategy
 
-The actual shape inference combines row polymorphism with (nominal) subtyping, as known in the type inference literature. The subtyping stems merely from the fact that a dimension-1 axis can be used in the context of any dimension due to per-axis broadcasting. Row polymorphism stems from broadcasting to more axes: for example, when unifying an unknown (shape) row with a known one, we cannot assume that the unknown row will have just the axes of the known one, because maybe the known row is meant to be broadcasted here to more axes. The combination of row polymorphism with nominal subtyping means that the constraints we are solving are inequalities, both inequalities between rows (the `Row.t` type, i.e. the `row` type above), and between axes/dimensions (the `Row.dim` type). We maintain the inequality ordering between variables in the environment to compute the transitive closure during simplification. We also maintain a least upper bound on the solution.
+The actual shape inference combines row polymorphism with (nominal) subtyping, as known in the type inference literature. The subtyping stems merely from the fact that a dimension-1-no-label axis can be used in the context of any dimension due to per-axis broadcasting. Row polymorphism stems from broadcasting to more axes: for example, when unifying an unknown (shape) row with a known one, we cannot assume that the unknown row will have just the axes of the known one, because maybe the known row is meant to be broadcasted here to more axes. The combination of row polymorphism with nominal subtyping means that the constraints we are solving are inequalities, both inequalities between rows (the `Row.t` type, i.e. the `row` type above), and between axes/dimensions (the `Row.dim` type). We maintain the inequality ordering between variables in the environment to compute the transitive closure during simplification. We also maintain a least upper bound on the solution.
 
 ```ocaml
 type dim_entry =
@@ -205,7 +205,7 @@ During the solution process, the constraints are incorporated, or propagated, in
 
 ## Solving the constraints
 
-The constraints are solved by: unification of the equation constraints, unification-like simplification of the inequality constraints, propagation of the complex constraints. The inequalities are like in type systems combining parametric polymorphism with structural and nominal subtyping, where the nominal subtyping relation states that dimension-1 axis is smaller than all axes, and axes of other dimensions are incomparable. For rows, the subtyping is suffix-wise (shorter is smaller) and axis-wise.
+The constraints are solved by: unification of the equation constraints, unification-like simplification of the inequality constraints, propagation of the complex constraints. The inequalities are like in type systems combining parametric polymorphism with structural and nominal subtyping, where the nominal subtyping relation states that a dimension-1 axis without a label is smaller than all axes, and axes of other mismatching dimensions or mismatching labels are incomparable. For rows, the subtyping is suffix-wise (shorter is smaller) and axis-wise.
 
 Simplification of an inequality, and constraint propagation, can generate more constraints, so we need to be careful to keep it terminating. The solution proceeds in stages. Currently there are 8 stages, with a fractional stage coming from splitting an earlier design.
````
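
The revised subtyping relation from the doc can be made concrete with a small sketch. This is not the repository's `Row.dim` type (which also carries projection ids, variables, and convolution cases); the record fields below are simplified assumptions, but the ordering matches the prose: only an unlabeled dim-1 axis sits below every other axis.

```ocaml
(* Minimal sketch of the per-axis subtyping described above; the real
   [Row.dim] also has [Var] and [Conv_input] cases, omitted here. *)
type dim = { d : int; label : string option }

(* [fits_into sub sup]: can an axis [sub] be used where [sup] is expected?
   Unlabeled dim-1 broadcasts to anything; otherwise both the size and the
   optional label must agree. *)
let fits_into sub sup =
  match (sub, sup) with
  | { d = 1; label = None }, _ -> true
  | { d = d1; label = l1 }, { d = d2; label = l2 } ->
      d1 = d2 && Option.equal String.equal l1 l2

let () =
  assert (fits_into { d = 1; label = None } { d = 7; label = Some "ch" });
  (* After this commit, a labeled dim-1 behaves like dim > 1: no broadcast. *)
  assert (not (fits_into { d = 1; label = Some "ch" } { d = 7; label = None }))
```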

tensor/row.ml

Lines changed: 11 additions & 4 deletions
```diff
@@ -1816,8 +1816,8 @@ let%track5_sexp solve_dim_ineq ~(stage : stage) origin ~(cur : dim) ~(subr : dim
        @@ Shape_error
             ("dimension comparison for axis: different labels", [ Dim_mismatch [ cur; subr ] ])
   | Dim { d = d1; _ }, Dim { d = d2; _ } when d1 = d2 -> ([], env)
-  | _, Dim { d = 1; _ } -> ([], env)
-  | (Dim { d = 1; _ } as cur), _ -> ([ Dim_eq { d1 = subr; d2 = cur; origin } ], env)
+  | _, Dim { d = 1; label = None; _ } -> ([], env)
+  | (Dim { d = 1; label = None; _ } as cur), _ -> ([ Dim_eq { d1 = subr; d2 = cur; origin } ], env)
   | Conv_input _, _ | _, Conv_input _ -> ([ Dim_eq { d1 = subr; d2 = cur; origin } ], env)
   | Var cur_v, Var subr_v -> (
       match (find_dim env.dim_env cur_v, find_dim env.dim_env subr_v) with
@@ -2488,9 +2488,16 @@ let%debug5_sexp solve_row_ineq ~(stage : stage) origin ~(cur : t) ~(subr : t) en
         List.map2_exn (take_from_end r_cur.dims lub_len) (take_from_end lub2.dims lub_len)
           ~f:(fun d1 d2 ->
             match (d1, d2) with
-            | Dim { d = 1; _ }, _ -> d1
-            | _, Dim { d = 1; _ } -> d2
+            (* Prefer dimensions without labels (more general), then prefer d=1 (more general
+               size) *)
+            | Dim { d = 1; label = None; _ }, _ -> d1
+            | _, Dim { d = 1; label = None; _ } -> d2
+            | Dim { d = 1; label = Some _; _ }, Dim { label = None; _ } -> d2
+            | Dim { label = None; _ }, Dim { d = 1; label = Some _; _ } -> d1
             | Dim { d = d1; _ }, Dim { d = d2; _ } when d1 <> d2 -> get_dim ~d:1 ~proj_id:48 ()
+            | Dim { label = Some l1; _ }, Dim { label = Some l2; _ }
+              when not (String.equal l1 l2) ->
+                get_dim ~d:1 ~proj_id:63 ()
             | Conv_input { stride; output = Dim s; _ }, Dim s'
             | Dim s', Conv_input { stride; output = Dim s; _ }
               when !use_padding && stride * s.d <> s'.d ->
```
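
Read on its own, the new least-upper-bound merge in `solve_row_ineq` implements a preference order. The sketch below restates it over the simplified `dim` record from the earlier example (field names are assumptions, not the repository's types), with `fresh_dim1` standing in for the `get_dim ~d:1 ~proj_id:...` calls that mint a fresh dimension-1 axis.

```ocaml
(* Per-axis LUB merge: keep the more general bound when possible, otherwise
   fall back to a fresh unlabeled dim-1. *)
let lub_merge ~fresh_dim1 d1 d2 =
  match (d1, d2) with
  (* Unlabeled dim-1 is the most general bound: keep it. *)
  | { d = 1; label = None }, _ -> d1
  | _, { d = 1; label = None } -> d2
  (* Prefer an unlabeled axis over a labeled dim-1. *)
  | { d = 1; label = Some _ }, { label = None; _ } -> d2
  | { label = None; _ }, { d = 1; label = Some _ } -> d1
  (* Mismatching sizes or labels: no informative bound, use a fresh dim-1. *)
  | { d = s1; _ }, { d = s2; _ } when s1 <> s2 -> fresh_dim1 ()
  | { label = Some l1; _ }, { label = Some l2; _ } when not (String.equal l1 l2) ->
      fresh_dim1 ()
  | _ -> d1
```

Note that incomparable axes collapse to the bottom element rather than raising an error: in this code path the LUB is only a hint used for defaulting, not a hard constraint.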

tensor/shape.ml

Lines changed: 36 additions & 30 deletions
```diff
@@ -328,7 +328,7 @@ let axis_map_to_dims_bio (type a) ?(default : a option) (idcs : a axis_map) =
   let back_axes, front_axes =
     Map.to_alist axes
     |> List.partition_map ~f:(fun ({ AxisKey.from_end; pos = i; _ }, v) ->
-          if from_end then Either.First (i, v) else Second (i, v))
+           if from_end then Either.First (i, v) else Second (i, v))
   in
   let back_size = List.fold back_axes ~init:0 ~f:(fun accu (i, _) -> max i accu) in
   let front_size = List.fold front_axes ~init:0 ~f:(fun accu (i, _) -> max i accu) in
@@ -888,21 +888,21 @@ let%debug4_sexp get_inequalities ({ shape = cur_sh; logic; id = _ } as _upd : up
     ( proj_axis_env,
       (Option.to_list static_range
       |> List.map ~f:(fun range ->
-            Dim_eq
-              {
-                d1 = get_dim ~d:range ();
-                d2 = slice_var;
-                origin =
-                  [
-                    {
-                      lhs_name = sh.debug_name;
-                      lhs_kind = `Batch;
-                      rhs_name = Idx.symbol_ident static_symbol;
-                      rhs_kind = `Batch;
-                      operation = Some "Slice";
-                    };
-                  ];
-              }))
+             Dim_eq
+               {
+                 d1 = get_dim ~d:range ();
+                 d2 = slice_var;
+                 origin =
+                   [
+                     {
+                       lhs_name = sh.debug_name;
+                       lhs_kind = `Batch;
+                       rhs_name = Idx.symbol_ident static_symbol;
+                       rhs_kind = `Batch;
+                       operation = Some "Slice";
+                     };
+                   ];
+               }))
       @ [
           Row_eq { r1 = expanded_batch; r2 = sh.batch; origin = get_origin `Batch };
           Row_eq { r1 = cur_sh.input; r2 = sh.input; origin = get_origin `Input };
@@ -1832,12 +1832,21 @@ let%debug4_sexp derive_projections (update_step : update_step) : Idx.projections
 let make ?batch_dims ?input_dims ?output_dims ?batch_axes ?input_axes ?output_axes
     ?(deduced = Not_constrained) ~debug_name ~id () =
   let open Row in
+  let known_no_batch =
+    match (batch_dims, batch_axes) with Some [], None -> true | None, Some [] -> true | _ -> false
+  in
+  let num_dim1_output = Option.to_list output_dims |> List.join |> List.count ~f:(fun d -> d = 1) in
+  let f kind d =
+    match kind with
+    | `Batch | `Input -> get_dim ~d ()
+    | `Output ->
+        if not known_no_batch && num_dim1_output = 1 && d = 1 then
+          let label = debug_name ^ "_output" in
+          get_dim ~d ~label ()
+        else get_dim ~d ()
+  in
   let make_dims kind ds =
-    {
-      dims = List.map ~f:(fun d -> get_dim ~d ()) ds;
-      bcast = Broadcastable;
-      prov = provenance ~sh_id:id ~kind;
-    }
+    { dims = List.map ~f:(f kind) ds; bcast = Broadcastable; prov = provenance ~sh_id:id ~kind }
   in
   let make_axes kind ds =
     {
@@ -1987,15 +1996,12 @@ let to_string_hum ?(style = Row.Axis_size) (sh : t) =
     let dims = (row_of_kind kind sh).dims in
     String.concat ~sep:","
     @@ List.mapi dims ~f:(fun i d ->
-        let num =
-          match kind with
-          | `Input -> n_batch + n_outputs + i
-          | `Output -> n_batch + i
-          | `Batch -> i
-        in
-        match style with
-        | Row.Only_labels | Axis_size | Projection_and_size -> Row.dim_to_string style d
-        | Axis_number_and_size -> Int.to_string num ^ ":" ^ Row.dim_to_string style d)
+           let num =
+             match kind with `Input -> n_batch + n_outputs + i | `Output -> n_batch + i | `Batch -> i
+           in
+           match style with
+           | Row.Only_labels | Axis_size | Projection_and_size -> Row.dim_to_string style d
+           | Axis_number_and_size -> Int.to_string num ^ ":" ^ Row.dim_to_string style d)
   in
   let batch_dims = dims_to_string `Batch in
   let input_dims = dims_to_string `Input in
```
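
The effect of the new heuristic in `make` can be illustrated with hypothetical calls (argument values below are made up; only the parameter names come from the diff above). When the batch row is left open and the output row has exactly one size-1 axis, that axis now receives a synthetic label derived from `debug_name`, so under the refined rules it no longer broadcasts away silently; with an explicitly empty batch row it stays unlabeled.

```ocaml
(* Hypothetical usage sketches for [make] as changed above. *)

(* Batch axes unspecified and a single dim-1 output axis: [known_no_batch]
   is false and [num_dim1_output] is 1, so the axis gets the label
   "x_output" and behaves like a dim > 1 axis during inference. *)
let x = make ~output_dims:[ 1 ] ~debug_name:"x" ~id:1 ()

(* Explicitly empty batch row: [known_no_batch] is true, so the dim-1
   output axis stays unlabeled and keeps its broadcasting behavior. *)
let y = make ~batch_dims:[] ~output_dims:[ 1 ] ~debug_name:"y" ~id:2 ()
```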
