NDArray phases 1, 2 & 3: core type, factories, indexing + views, element-wise ops by Copilot · Pull Request #70 · Quafadas/vecxt

Copilot · 2026-04-07T07:40:02Z

Adds the foundational NDArray[A] type to vecxt, a full set of indexing and view operations, and element-wise arithmetic/comparison operations on NDArray[Double] — cross-platform, N-dimensional array with configurable strides, offset, and column-major default layout.

New files

vecxt/src/ndarray.scala — NDArray[A] class + companion
- @publicInBinary() private constructor; no @specialized — Array[A] for primitives is already unboxed at the JVM level, and @specialized in Scala 3 can silently de-specialize with inline and extension methods
- Lazy properties: ndim, numel, isColMajor, isRowMajor, isContiguous, layout
- Factories: apply (full strides / column-major convenience), fromArray, zeros, ones, fill
- Private helpers: colMajorStrides, shapeProduct, mkNDArray (package-private unchecked constructor for view operations)
vecxt/src/NDArrayCheck.scala — inline bounds checks (erasable via BoundsCheck)
- strideNDArrayCheck — rank consistency, positive dims, offset bounds, corner-index range
- dimNDArrayCheck — shape product vs data length
- shapeCheck — non-empty, all-positive shape
- indexNDArrayCheck — rank and per-axis bounds for element access
- InvalidNDArray exception
vecxt/src/ndarrayOps.scala — extension methods on NDArray[A] for indexing and views
- Element read — apply overloads for 1D/2D/3D/4D (transparent inline, zero allocation) + N-D Array[Int] variant
- Element write — update overloads matching all apply variants (enables arr(i,j) = value syntax)
- slice(dim, start, end) — zero-copy view; adjusts offset and shrinks one shape dimension
- T — 2D transpose shorthand (zero-copy, validates ndim=2)
- transpose(perm) — N-D axis permutation with full permutation validation (zero-copy)
- reshape(newShape) — zero-copy view when isColMajor; copies via toArray otherwise
- squeeze / squeeze(dim) — remove all or a specific size-1 dimension (zero-copy)
- unsqueeze(dim) / expandDims(dim) — insert a size-1 dimension (zero-copy)
- flatten — 1D view if contiguous; copies to col-major order otherwise
- toArray — fast data.clone() when isColMajor; col-major odometer iteration otherwise
vecxt/src/broadcast.scala — explicit broadcasting (no implicit broadcast in binary ops)
- broadcastTo(targetShape) — inline zero-copy view with stride-0 expansion for broadcast dimensions
- broadcastPair(a, b) — broadcasts both operands to their common shape
- broadcastShape, broadcastStrides, sameShape — helpers
- BroadcastException, ShapeMismatchException — error types
vecxt/src/ndarrayDoubleOps.scala — element-wise operations on NDArray[Double]
- Binary — +, -, *, / (same shape required; use broadcastTo/broadcastPair to align shapes first)
- Scalar — +, -, *, / (both ndarray op scalar and scalar op ndarray)
- Unary — neg, abs, exp, log, sqrt, tanh, sigmoid
- In-place binary — +=, -=, *=, /= (col-major fast path + general stride kernel)
- In-place scalar — +=, -=, *=, /=
- Comparison — >, <, >=, <=, =:=, !:= (array and scalar variants, return NDArray[Boolean])
- Two dispatch paths: flat while-loop fast path for contiguous col-major arrays reusing existing platform-specific Array[Double] ops; general stride-kernel for non-col-major/broadcast views

Modified

vecxt/src/all.scala — adds export vecxt.ndarray.*, export vecxt.ndarrayOps.*, export vecxt.NDArrayDoubleOps.*, export vecxt.broadcast.*

Key invariants

Broadcasting is explicit. Binary ops require same-shape operands. Use broadcastTo / broadcastPair to align shapes before arithmetic — consistent with vecxt's existing Array[Double] ops and making broadcast sites visible in code.
View operations (slice, transpose, squeeze, unsqueeze, broadcastTo) always share arr.data — mutation through one is visible through the other (NumPy semantics).
Copy operations (toArray, flatten on non-contiguous, reshape on non-col-major, binary ops on non-col-major) return a fresh col-major Array.
Stride formula everywhere: offset + Σ indices(k) * strides(k).
All entry points respect BoundsCheck — inline, erasable at call site with DoBoundsCheck.no.
All code in vecxt/src/ (cross-platform shared only — no JVM/JS/Native forks).

Usage

import vecxt.all.*

// column-major 2×3 from flat array
val a = NDArray(Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0), Array(2, 3))
// strides: [1, 2], ndim: 2, numel: 6, isColMajor: true

// element access
a(0, 1)          // 3.0
a.update(1, 0, 99.0)

// views (zero-copy)
val col1 = a.slice(1, 1, 2)   // column 1
val at   = a.T                 // transposed 3×2 view
val flat = a.flatten           // 1D view of 6 elements

// reshape / squeeze
val z = NDArray.zeros[Double](Array(1, 3, 1))
z.squeeze          // shape [3]
z.unsqueeze(0)     // shape [1, 1, 3, 1]

// materialise
a.toArray          // Array[Double] in col-major order

// element-wise arithmetic (same-shape required)
val b = NDArray.fill(Array(2, 3), 2.0)
val c = a + b      // element-wise add
a *= 3.0           // in-place scalar multiply

// explicit broadcasting
val row = NDArray(Array(1.0, 2.0, 3.0), Array(1, 3))
val (a2, b2) = broadcastPair(row, NDArray.fill(Array(4, 3), 1.0))
val result = a2 + b2   // shape [4, 3]

// comparison
val mask = a > 3.0     // NDArray[Boolean]

Strides follow the same column-major convention as Matrix (rowStride=1, colStride=rows): for shape [d₀, d₁, …], strides = [1, d₀, d₀·d₁, …].

Agent-Logs-Url: https://github.com/Quafadas/vecxt/sessions/7d847bd1-604b-4b29-a265-f3cb0abf602d Co-authored-by: Quafadas <24899792+Quafadas@users.noreply.github.com>

Quafadas · 2026-04-07T07:59:33Z

@copilot I've updated the design doc to roll back the use of specialised. Can you review your implementation to check consistency with that new principle?

…imitives Agent-Logs-Url: https://github.com/Quafadas/vecxt/sessions/9b13c6c2-31a2-45c9-a1e2-aa70b9a9f636 Co-authored-by: Quafadas <24899792+Quafadas@users.noreply.github.com>

Copilot · 2026-04-07T08:06:44Z

@copilot I've updated the design doc to roll back the use of specialised. Can you review your implementation to check consistency with that new principle?

Done in bc60626. Removed all @specialized(Double, Int, Float, Boolean) annotations from the NDArray class, all factory methods (apply, fromArray, zeros, ones, fill), and the extension — consistent with the design doc's reasoning that Array[A] for primitives is already unboxed at the JVM level, and @specialized in Scala 3 can silently de-specialize when combined with inline and extension methods.

Quafadas · 2026-04-07T08:31:31Z

@copilot Write out a detailed implementation plan for Milestone 2.

…e-type-factories' into copilot/ndarray-core-type-factories

Copilot · 2026-04-07T08:43:37Z

@copilot Write out a detailed implementation plan for Milestone 2.

Here is the detailed implementation plan for Milestone 2.

Milestone 2: Indexing + Views — Detailed Plan

Goal: Read/write individual elements, and create views of an NDArray without copying data.

All code lives in vecxt/src/ (shared, cross-platform). No platform-specific forks.

Files

File	Role
`vecxt/src/ndarrayOps.scala`	New file: all extension methods on `NDArray[A]` for indexing, update, and views
`vecxt/src/NDArrayCheck.scala`	Extend with `indexNDArrayCheck` for element-access bounds
`vecxt/test/src/ndarray.test.scala`	Extend existing test suite with M2 tests

1. Element read — `apply`

Following the design doc recommendation (open question #5): specific overloads for 1–4D, varargs for N>4 to avoid unnecessary Int* allocation for the common cases.

// in extension [A](arr: NDArray[A])

// 1D
transparent inline def apply(i0: Int)(using inline bc: BoundsCheck): A =
  indexNDArrayCheck(arr, Array(i0))
  arr.data(arr.offset + i0 * arr.strides(0))

// 2D
transparent inline def apply(i0: Int, i1: Int)(using inline bc: BoundsCheck): A =
  indexNDArrayCheck(arr, Array(i0, i1))
  arr.data(arr.offset + i0 * arr.strides(0) + i1 * arr.strides(1))

// 3D, 4D analogously...

// N-D general case
inline def apply(indices: Array[Int])(using inline bc: BoundsCheck): A =
  indexNDArrayCheck(arr, indices)
  var pos = arr.offset
  var k = 0
  while k < indices.length do
    pos += indices(k) * arr.strides(k)
    k += 1
  arr.data(pos)

indexNDArrayCheck validates: rank matches arr.ndim, each indices(k) is in [0, arr.shape(k)).

2. Element write — `update`

Mirror of apply. Enables arr(i, j) = value syntax via Scala's update convention.

// 1D
inline def update(i0: Int, value: A)(using inline bc: BoundsCheck): Unit = ...

// 2D
inline def update(i0: Int, i1: Int, value: A)(using inline bc: BoundsCheck): Unit = ...

// N-D
inline def update(indices: Array[Int], value: A)(using inline bc: BoundsCheck): Unit = ...

3. Slice / view — `slice`

Returns a new NDArray sharing the backing data. No allocation of new data. Adjusts offset and shrinks the dimension.

// arr.slice(dim, start, end)  →  NDArray[A] (view, no copy)
// Preconditions: 0 <= dim < ndim, 0 <= start < end <= shape(dim)
inline def slice(dim: Int, start: Int, end: Int)(using inline bc: BoundsCheck): NDArray[A] =
  // new offset: offset + start * strides(dim)
  // new shape(dim): end - start
  // strides unchanged
  val newShape = arr.shape.clone()
  newShape(dim) = end - start
  val newOffset = arr.offset + start * arr.strides(dim)
  new NDArray(arr.data, newShape, arr.strides.clone(), newOffset)

This mirrors how Matrix.submatrix produces a zero-copy view when rows/cols are contiguous.

4. Transpose — `transpose` / `T`

2D shorthand:

inline def T: NDArray[A] = arr.transpose(Array(1, 0))

General N-D permutation (swap strides and shape by the permutation):

inline def transpose(perm: Array[Int])(using inline bc: BoundsCheck): NDArray[A] =
  // validate perm is a valid permutation of 0..ndim-1
  val newShape   = Array.tabulate(arr.ndim)(k => arr.shape(perm(k)))
  val newStrides = Array.tabulate(arr.ndim)(k => arr.strides(perm(k)))
  new NDArray(arr.data, newShape, newStrides, arr.offset)

No data copy — same data reference, same offset, only shape and strides reordered. Mirrors Matrix.transpose.

5. Reshape

Only valid for contiguous arrays (no stride gaps). If contiguous, create new strides for newShape and reuse data. If non-contiguous, throw or copy to a contiguous array first.

inline def reshape(newShape: Array[Int])(using inline bc: BoundsCheck, ct: ClassTag[A]): NDArray[A] =
  // validate: product(newShape) == numel
  // if isContiguous: new NDArray(arr.data, newShape, colMajorStrides(newShape), 0)
  // else: toArray then create fresh NDArray

6. Squeeze / Unsqueeze

Squeeze — remove all dimensions of size 1 (or a specific dim):

inline def squeeze: NDArray[A] = ...                      // remove all size-1 dims
inline def squeeze(dim: Int): NDArray[A] = ...            // remove specific size-1 dim (error if shape(dim) != 1)

Unsqueeze (= expandDims) — insert a size-1 dimension at position dim:

inline def unsqueeze(dim: Int): NDArray[A] = ...
inline def expandDims(dim: Int): NDArray[A] = arr.unsqueeze(dim)   // alias

Both are zero-copy (just adjust shape/strides arrays).

7. Flatten

Returns a 1D view if contiguous; otherwise materialises a contiguous copy first.

inline def flatten(using ct: ClassTag[A]): NDArray[A] =
  if arr.isContiguous then
    new NDArray(arr.data, Array(arr.numel), Array(1), arr.offset)
  else
    val out = arr.toArray
    new NDArray(out, Array(arr.numel), Array(1), 0)

8. `toArray` — materialise to contiguous `Array[A]`

Iterates via the general stride formula to fill a fresh array in column-major order.

def toArray(using ct: ClassTag[A]): Array[A] =
  // If already dense col-major with offset==0, just return data.clone()
  // Otherwise iterate all multi-indices and fill in order

9. New bounds check

Add to NDArrayCheck.scala:

object indexNDArrayCheck:
  inline def apply[A](arr: NDArray[A], indices: Array[Int])(using inline bc: BoundsCheck): Unit =
    inline if bc then
      if indices.length != arr.ndim then throw InvalidNDArray(...)
      var k = 0
      while k < indices.length do
        if indices(k) < 0 || indices(k) >= arr.shape(k) then
          throw IndexOutOfBoundsException(...)
        k += 1

10. Tests

All tests in vecxt/test/src/ndarray.test.scala, cross-platform (shared src/).

Group	Tests
Element read	1D, 2D, 3D correct value; col-major layout reads correct element; strided/offset view reads correctly
Element write	1D, 2D mutation; mutation through a view is visible on the original data
Slice	Correct shape/offset/strides; data aliasing (mutation visible); slice of slice
Transpose 2D	Shape/strides swapped; `T.T` identity; element equivalence
Transpose N-D	Arbitrary permutation; invalid permutation rejected
Reshape	Contiguous → new shape, correct strides; non-contiguous triggers copy; wrong numel rejected
Squeeze	All-1 dims removed; single dim removed; non-1 dim rejected
Unsqueeze	Shape gains new size-1 dim; `unsqueeze.squeeze` round-trip
Flatten	Contiguous returns view (same data reference); non-contiguous returns fresh array
toArray	Dense col-major returns clone; strided view materialises correctly
Bounds check	Out-of-range index throws; wrong rank throws; `no` skips check

Key invariants throughout

View operations (slice, transpose, squeeze, unsqueeze) always share arr.data — mutation through one is visible through the other (NumPy semantics, documented).
Copy operations (toArray, flatten on non-contiguous, reshape on non-contiguous) return a fresh Array.
Stride formula everywhere: offset + Σ indices(k) * strides(k).
All entry points respect BoundsCheck — inline, erasable at call site with DoBoundsCheck.no.
All code in vecxt/src/ (cross-platform shared only — no JVM/JS/Native forks in M2).

Quafadas · 2026-04-07T08:46:28Z

@copilot implement it

Quafadas · 2026-04-07T10:16:29Z

@copilot continue with the implementation. Implement milestone 2 in code, with tests etc as per the plan you've written out above.

…en, toArray Agent-Logs-Url: https://github.com/Quafadas/vecxt/sessions/12022e30-8b23-45cf-9d8f-c6ccf26801ca Co-authored-by: Quafadas <24899792+Quafadas@users.noreply.github.com>

Copilot · 2026-04-07T10:53:58Z

@copilot continue with the implementation. Implement milestone 2 in code, with tests etc as per the plan you've written out above.

Done in e17a842. Implemented all of Milestone 2:

New file vecxt/src/ndarrayOps.scala (exported via all.*):

apply — element read, 1D/2D/3D/4D overloads + N-D Array[Int] variant
update — element write, same variants
slice(dim, start, end) — zero-copy view adjusting offset/shape
T — 2D transpose shorthand (zero-copy)
transpose(perm) — N-D axis permutation with permutation validation (zero-copy)
reshape(newShape) — view when already col-major (offset=0 guaranteed), otherwise copies via toArray
squeeze / squeeze(dim) — remove size-1 dimensions (zero-copy)
unsqueeze(dim) / expandDims(dim) — insert size-1 dimension (zero-copy)
flatten — 1D view if contiguous, otherwise materialises in col-major order
toArray — col-major odometer iteration; fast clone path when isColMajor

NDArrayCheck.scala — added indexNDArrayCheck for N-D element-access bounds validation.

ndarray.scala — added private[vecxt] def mkNDArray (unchecked construction for view operations).

All 140 tests pass.

Quafadas · 2026-04-07T15:54:21Z

@copilot I've pushed an update to the design doc which changes the approach to broadcasting. We want it to be explicit rather than implicit. Please check that diff. Once done, implement milestone 3 in accordance with the plan below.

Milestone 3 — Detailed Design: Element-wise Operations (Double)

Overview

Milestone 3 introduces arithmetic, comparison, unary, and in-place operations on NDArray[Double].
The core deliverable is a working, correct, cross-platform implementation for all ranks and memory
layouts, with a fast path for contiguous same-shape arrays that leverages existing platform-specific
Array[Double] operations (SIMD on JVM, while loops on JS/Native).

The operations do not require M4 (reductions) or M5 (Matrix bridge). The only prerequisite is
that M2 indexing and views work correctly, since the general-path iterator relies on element access.

Design principles for this milestone

Correctness above all else. Every op must produce results matching NumPy to within ~1 ULP for
equivalent inputs on all three platforms.
Broadcasting is explicit. Binary ops require same-shape operands — consistent with vecxt's
existing Array[Double] ops. Broadcasting is a separate, zero-copy broadcastTo operation that
the user invokes before arithmetic. This makes broadcast sites visible in code, produces better
error messages, and creates clean first-class nodes for AD computation graphs.
Two dispatch paths, not two implementations. The fast path (contiguous, same shape) reuses
the existing platform-specific Array[Double] ops. The slow path (general case) is a single
cross-platform N-dimensional iterator with no platform-specific code.
No new platform-specific files for this milestone. The fast path delegates to already-existing
platform routines in vecxt.arrays (JVM SIMD) and vecxt.JsNativeDoubleArrays (JS/Native).
In-place ops are aliasing-safe. An in-place a += b is valid as long as a is contiguous
and a and b have the same shape. Non-contiguous in-place is not supported in M3 (throws).

Broadcasting algorithm

Broadcasting is the mechanism behind broadcastTo. Binary ops themselves do not broadcast —
they require same-shape operands. The user explicitly calls broadcastTo (or broadcastPair) to
expand shapes before arithmetic. The algorithm follows NumPy semantics:

Given a source shape and a target shape:

Step 1 — Shape alignment. Right-align the shapes, padding the shorter one with ones on the left:

A: [   3, 4]   →   [1, 3, 4]
B: [2, 1, 4]   →   [2, 1, 4]

Step 2 — Dimension compatibility. For each aligned dimension pair (a, b):

If a == b: output dimension = a
If a == 1: output dimension = b (A is broadcast along this axis)
If b == 1: output dimension = a (B is broadcast along this axis)
Otherwise: throw BroadcastException(s"Cannot broadcast shapes ...")

Result shape above: [2, 3, 4].

Step 3 — Broadcast strides. For each input, compute effective strides in the output rank:

Prepend zeros for the dimensions that were padded (they had size 1 implicitly)
For a real dimension where the input had size 1 (broadcast), set stride to 0

A stride of 0 means "always read the same element along this axis" — no copy, no expansion.

A original:  shape=[3,4],   strides=[1,3]   (col-major, 2D)
A broadcast: shape=[2,3,4], strides=[0,1,3] (size-1 dimension padded at front → stride 0)

B original:  shape=[2,1,4], strides=[1,2,2] (col-major; middle dim has size 1)
B broadcast: shape=[2,3,4], strides=[1,0,2] (middle dim was 1 → stride 0)

Element (i₀, i₁, i₂) in A is read from A.offset + i₀·0 + i₁·1 + i₂·3.
Element (i₀, i₁, i₂) in B is read from B.offset + i₀·1 + i₁·0 + i₂·2.

Implementation utilities (to be placed in vecxt/src/broadcast.scala):

package vecxt

import vecxt.ndarray.NDArray

object broadcast:

  /** Compute the output shape for broadcasting two shapes. Throws BroadcastException on incompatibility. */
  def broadcastShape(a: Array[Int], b: Array[Int]): Array[Int] = ...

  /** Compute broadcast-extended strides for `arr` into `outShape`.
   *  Pads with 0 for prepended dimensions; sets 0 for original dimensions of size 1.
   */
  def broadcastStrides(arr: NDArray[?], outShape: Array[Int]): Array[Int] = ...

  /** True if two shapes are identical (no broadcasting needed). */
  def sameShape(a: Array[Int], b: Array[Int]): Boolean = ...

  extension [A](arr: NDArray[A])
    /** Return a zero-copy view of this NDArray broadcast to `targetShape`.
     *  Dimensions of size 1 are expanded via stride-0; prepended dimensions get stride 0.
     *  Throws BroadcastException if shapes are incompatible.
     */
    def broadcastTo(targetShape: Array[Int]): NDArray[A] = ...

  /** Broadcast both operands to their common shape. Convenience for explicit broadcasting.
   *  Returns (a’, b’) where both have shape == broadcastShape(a.shape, b.shape).
   */
  def broadcastPair[A](a: NDArray[A], b: NDArray[A]): (NDArray[A], NDArray[A]) =
    val outShape = broadcastShape(a.shape, b.shape)
    (a.broadcastTo(outShape), b.broadcastTo(outShape))

BroadcastException is a new exception type, alongside the existing InvalidNDArray.

N-dimensional iteration kernel

All general-case (non-fast-path) binary ops share one iteration function. This kernel handles
the case where both operands have the same shape but may have non-contiguous strides (e.g.
transposed or sliced views, or views created by broadcastTo).

The key insight is that iterating over a column-major output in linear order (0 to numel-1)
and decomposing the flat index back into per-dimension coordinates is straightforward:

flat index j in [0, numel):
  coord[0] = j % shape[0]
  coord[1] = (j / shape[0]) % shape[1]
  coord[2] = (j / (shape[0]*shape[1])) % shape[2]
  ...
  coord[k] = (j / cumulativeProduct(k)) % shape[k]

Where the cumulative products are precomputed once. This is O(ndim) per element, which is
acceptable for small ndim (typically 1–4). For very large ndim the overhead is still dominated
by the actual arithmetic.

Concrete kernel (cross-platform, lives in vecxt/src/ndarrayDoubleOps.scala):

private def binaryOpGeneral(
    a: NDArray[Double],
    b: NDArray[Double],
    outShape: Array[Int],
    aStrides: Array[Int],   // broadcast strides for a into outShape
    bStrides: Array[Int],   // broadcast strides for b into outShape
    f: (Double, Double) => Double
): NDArray[Double] =
  val n = shapeProduct(outShape)
  val out = new Array[Double](n)
  // Precompute cumulative products for coordinate decomposition
  val ndim = outShape.length
  val cumProd = new Array[Int](ndim)
  cumProd(0) = 1
  var d = 1
  while d < ndim do
    cumProd(d) = cumProd(d - 1) * outShape(d - 1)
    d += 1
  end while
  var j = 0
  while j < n do
    var posA = a.offset
    var posB = b.offset
    var k = 0
    while k < ndim do
      val coord = (j / cumProd(k)) % outShape(k)
      posA += coord * aStrides(k)
      posB += coord * bStrides(k)
      k += 1
    end while
    out(j) = f(a.data(posA), b.data(posB))
    j += 1
  end while
  new NDArray(out, outShape, colMajorStrides(outShape), 0)
end binaryOpGeneral

For the common 1D and 2D cases (where the inner coordinate loop is 1 or 2 iterations), the JVM JIT
typically unrolls this. For strictly performance-critical 2D convolution-style workloads, a
dedicated 2D specialisation can be added in M3 if benchmarks warrant it — but not by default.

File layout

File	Purpose
`vecxt/src/broadcast.scala`	Broadcasting utilities: `broadcastTo`, `broadcastPair`, `broadcastShape`, `broadcastStrides`, `sameShape`, `BroadcastException`
`vecxt/src/ndarrayDoubleOps.scala`	Extension methods on `NDArray[Double]`: all binary (same-shape), scalar, unary, in-place, and comparison ops. Contains both fast-path dispatch and slow-path (general kernel). No platform-specific code.
`vecxt/test/src/ndarrayElemWise.test.scala`	Cross-platform test suite for all ops defined in this milestone

No new platform-specific files are needed. The fast path delegates to existing vecxt.arrays.*
flat Array[Double] operations which are already SIMD (JVM) / while-loop (JS/Native).

API reference

All extension methods live in object NDArrayDoubleOps and are exported from all.scala.

The BoundsCheck context is not threaded through element-wise ops. Shape mismatches in
binary ops are always reported (they are programming errors, not performance-tunable assertions).
Broadcasting errors are reported by broadcastTo / broadcastPair, not by arithmetic ops.

Binary ops (element-wise, same-shape required)

extension (a: NDArray[Double])
  def +(b: NDArray[Double]): NDArray[Double]
  def -(b: NDArray[Double]): NDArray[Double]
  def *(b: NDArray[Double]): NDArray[Double]
  def /(b: NDArray[Double]): NDArray[Double]

Throws ShapeMismatchException if a.shape != b.shape. Return a new contiguous NDArray[Double].

To operate on differently-shaped arrays, broadcast explicitly first:

val (a2, b2) = NDArray.broadcastPair(a, b)
val c = a2 + b2

// or:
val c = a + b.broadcastTo(a.shape)

Scalar ops

extension (a: NDArray[Double])
  def +(s: Double): NDArray[Double]
  def -(s: Double): NDArray[Double]
  def *(s: Double): NDArray[Double]
  def /(s: Double): NDArray[Double]

extension (s: Double)
  def +(a: NDArray[Double]): NDArray[Double]
  def -(a: NDArray[Double]): NDArray[Double]
  def *(a: NDArray[Double]): NDArray[Double]
  def /(a: NDArray[Double]): NDArray[Double]

Scalar ops are a special case of array-vs-broadcast(scalar-as-0D). Implemented directly as a flat
loop over the output data for simplicity (a 0D NDArray would work but is unnecessarily indirect).

Unary ops

extension (a: NDArray[Double])
  def neg: NDArray[Double]          // element-wise negation
  def abs: NDArray[Double]          // element-wise |x|
  def exp: NDArray[Double]          // element-wise e^x
  def log: NDArray[Double]          // element-wise ln(x)
  def sqrt: NDArray[Double]         // element-wise √x
  def tanh: NDArray[Double]         // element-wise tanh(x)
  def sigmoid: NDArray[Double]      // element-wise 1 / (1 + e^{-x})

Unary ops are always free of broadcasting complexity. The fast path for contiguous arrays delegates
to Array[Double] scalar ops (e.g. arr.data.map(math.exp) is fine — or a while-loop equivalent).

For exp, log, tanh, and sigmoid in particular, no SIMD path exists yet in vecxt for
Array[Double]. M3 implements these as simple while-loops on the backing array. A future benchmark
milestone can add SVML/intrinsic alternatives if profiling shows them as bottlenecks.

In-place binary ops (mutating `a`)

extension (a: NDArray[Double])
  def +=(b: NDArray[Double]): Unit
  def -=(b: NDArray[Double]): Unit
  def *=(b: NDArray[Double]): Unit
  def /=(b: NDArray[Double]): Unit

Precondition: a must be contiguous. If a.isContiguous is false, throw
UnsupportedOperationException("In-place ops require a contiguous NDArray").

Shape requirement: a.shape must equal b.shape. No implicit broadcasting. To accumulate
a broadcast value in-place, broadcast explicitly first:

val bias = NDArray(Array(1.0, 2.0, 3.0), Array(3))       // shape [3]
val batch = NDArray.zeros[Double](Array(4, 3))            // shape [4, 3]
batch += bias.broadcastTo(Array(4, 3))                    // explicit

This prevents a common class of gradient-accumulation bugs in AD code where the user silently
broadcasts in the wrong direction.

In-place scalar ops

extension (a: NDArray[Double])
  def +=(s: Double): Unit
  def -=(s: Double): Unit
  def *=(s: Double): Unit
  def /=(s: Double): Unit

Requires a.isContiguous. Delegates to existing Array[Double] in-place ops in vecxt.arrays.

Comparison ops (return `NDArray[Boolean]`)

extension (a: NDArray[Double])
  def >(b: NDArray[Double]): NDArray[Boolean]
  def <(b: NDArray[Double]): NDArray[Boolean]
  def >=(b: NDArray[Double]): NDArray[Boolean]
  def <=(b: NDArray[Double]): NDArray[Boolean]
  def =:=(b: NDArray[Double]): NDArray[Boolean]   // element-wise equality (mirrors existing Array[Double] naming)
  def !:=(b: NDArray[Double]): NDArray[Boolean]   // element-wise inequality

Scalar variants:

extension (a: NDArray[Double])
  def >(s: Double): NDArray[Boolean]
  def <(s: Double): NDArray[Boolean]
  def >=(s: Double): NDArray[Boolean]
  def <=(s: Double): NDArray[Boolean]
  def =:=(s: Double): NDArray[Boolean]
  def !:=(s: Double): NDArray[Boolean]

Comparison ops always produce a fresh NDArray[Boolean] whose backing array is a Array[Boolean].
They follow the same fast-path / slow-path dispatch as arithmetic ops. Array-vs-array comparisons
require same shape (broadcast explicitly first if needed).

Dispatch logic (fast path vs slow path)

Since binary ops require same shape, the dispatch is simpler than the implicit-broadcast alternative:

a.shape == b.shape?  (else throw ShapeMismatchException)
  a.isContiguous && b.isContiguous?
    ├─ YES → fastPathFlat(a.data, b.data, op)   ← delegates to vecxt.arrays (SIMD on JVM)
    └─ NO  → generalKernel(a, b, a.shape, a.strides, b.strides, op)

Note: broadcast views created by broadcastTo have stride-0 dimensions, so they are NOT
contiguous. A broadcastTo + + combination always takes the general kernel path. This is
by design — the general kernel handles stride-0 correctly, and the explicit broadcast makes
the performance characteristic visible to the user.

fastPathFlat for + looks like:

// JVM: vecxt.arrays.+(a.data)(b.data) — uses DoubleVector SIMD
// JS/Native: JsNativeDoubleArrays equivalent — uses while loop
// Both paths are already implemented. This is a pure delegation.
private inline def fastPathFlat(aData: Array[Double], bData: Array[Double]): Array[Double] =
  vecxt.arrays.+(aData)(bData)  // exported by arrays.* in all.scala

For -, *, /, the same pattern applies. The existing vecxt.arrays object already has
-, *, / extension methods on Array[Double] with JVM SIMD implementations.

Implementation sketch for `+`

// vecxt/src/ndarrayDoubleOps.scala

package vecxt

import vecxt.ndarray.*
import vecxt.broadcast.*

object NDArrayDoubleOps:

  extension (a: NDArray[Double])

    def +(b: NDArray[Double]): NDArray[Double] =
      if !sameShape(a.shape, b.shape) then
        throw ShapeMismatchException(
          s"Binary op requires same shape: [${a.shape.mkString(",")}] vs [${b.shape.mkString(",")}]. " +
          s"Use broadcastTo or NDArray.broadcastPair to align shapes first."
        )
      if a.isContiguous && b.isContiguous then
        val rawOut = vecxt.arrays.+(a.data)(b.data)  // SIMD on JVM, while-loop on JS/Native
        new NDArray(rawOut, a.shape.clone(), colMajorStrides(a.shape), 0)
      else
        binaryOpGeneral(a, b, a.shape, a.strides, b.strides, _ + _)

    def +=(b: NDArray[Double]): Unit =
      if !a.isContiguous then
        throw UnsupportedOperationException("In-place ops require a contiguous NDArray")
      if !sameShape(a.shape, b.shape) then
        throw ShapeMismatchException(
          s"In-place op requires same shape: [${a.shape.mkString(",")}] vs [${b.shape.mkString(",")}]. " +
          s"Use broadcastTo to align shapes first."
        )
      if b.isContiguous then
        vecxt.arrays.+=(a.data)(b.data)   // existing in-place SIMD op
      else
        // general in-place: iterate output coordinates, add into a.data
        ...
    end +=

Note the use of new NDArray(...) directly (bypassing bounds check) in the fast path —
we have already validated via sameShape that the input arrays are consistent.

Usage with broadcasting:

// Explicit broadcast before arithmetic
val bias = NDArray(Array(1.0, 2.0, 3.0), Array(3))     // shape [3]
val batch = NDArray.zeros[Double](Array(4, 3))          // shape [4, 3]

// Option 1: broadcastPair
val (a2, b2) = NDArray.broadcastPair(batch, bias)       // both shape [4, 3]
val result = a2 + b2

// Option 2: broadcastTo
val result = batch + bias.broadcastTo(Array(4, 3))

// Option 3: in-place with explicit broadcast
batch += bias.broadcastTo(Array(4, 3))

Scalar op implementation sketch

Scalar ops don't need broadcasting infrastructure — they operate on the flat backing array directly
when contiguous, or iterate manually when non-contiguous:

def +(s: Double): NDArray[Double] =
  if a.isContiguous then
    val rawOut = vecxt.arrays.+(a.data)(s)     // existing scalar-add from vecxt.arrays
    new NDArray(rawOut, a.shape.clone(), colMajorStrides(a.shape), 0)
  else
    val out = new Array[Double](a.numel)
    val outStrides = colMajorStrides(a.shape)
    // iterate coordinates, copy + add
    ...
    new NDArray(out, a.shape.clone(), outStrides, 0)

Unary op implementation sketch

def exp: NDArray[Double] =
  val out = new Array[Double](a.numel)
  if a.isContiguous then
    var i = 0
    while i < a.numel do
      out(i) = math.exp(a.data(a.offset + i))
      i += 1
    end while
  else
    // general coordinate iteration
    ...
  end if
  new NDArray(out, a.shape.clone(), colMajorStrides(a.shape), 0)
end exp

sigmoid is implemented as 1.0 / (1.0 + math.exp(-x)) — no special-casing required.

Open question: `arrays.+` signature

The existing vecxt.arrays extension on Array[Double] uses curried application:

// src-jvm/arrays.scala
extension (vec: Array[Double])
  inline def +(other: Array[Double]): Array[Double] = ...

This is directly usable as a.data + b.data or vecxt.arrays.+(a.data)(b.data). The NDArray
fast path should use whichever spelling compiles cleanly once both arrays.* and NDArrayDoubleOps
are in scope. If ambiguity arises, qualify explicitly: vecxt.arrays.+(a.data)(b.data).

Test suite outline

File: vecxt/test/src/ndarrayElemWise.test.scala

All tests must pass on JVM, JS, and Native. Use assertEqualsDouble(actual, expected, delta) from
munit for floating-point comparisons. The tolerance is 1e-9 unless otherwise noted.

Binary ops — same-shape contiguous (fast path exercise)

test("1D + 1D element-wise") {
  a = [1.0, 2.0, 3.0]  (shape [3])
  b = [4.0, 5.0, 6.0]  (shape [3])
  a + b == [5.0, 7.0, 9.0]
}

test("2D + 2D element-wise, col-major") {
  a = [[1.0, 3.0], [2.0, 4.0]]  (shape [2,2], col-major data = [1,2,3,4])
  b = [[5.0, 7.0], [6.0, 8.0]]  (shape [2,2], col-major data = [5,6,7,8])
  a + b == [[6,10],[8,12]]
}

test("3D + 3D element-wise") {
  a = NDArray.fill(Array(2,3,4), 1.0)
  b = NDArray.fill(Array(2,3,4), 2.0)
  (a + b).data.forall(_ == 3.0)
}

test("1D - 1D") { [5,3,1] - [1,2,3] == [4,1,-2] }
test("1D * 1D") { [2,3,4] * [5,6,7] == [10,18,28] }
test("1D / 1D") { [6,4,2] / [2,1,1] == [3,4,2] }

Scalar ops

test("scalar add: ndarray + scalar") { NDArray([1,2,3]) + 10.0 == [11,12,13] }
test("scalar sub: ndarray - scalar") { NDArray([5,4,3]) - 1.0 == [4,3,2] }
test("scalar mul: scalar * ndarray") { 3.0 * NDArray([1,2,3]) == [3,6,9] }
test("scalar div: ndarray / scalar") { NDArray([6,4,2]) / 2.0 == [3,2,1] }
test("scalar div: scalar / ndarray") { 12.0 / NDArray([2,3,4]) == [6,4,3] }

Broadcasting (explicit `broadcastTo` / `broadcastPair`)

test("broadcastTo: [3] to [2,3]") {
  val a = NDArray(Array(1.0, 2.0, 3.0), Array(3))
  val b = a.broadcastTo(Array(2, 3))
  assertEquals(b.shape.toSeq, Seq(2, 3))
  // row 0 and row 1 both read from same data via stride 0
  assertEquals(b(0, 0), 1.0)
  assertEquals(b(1, 0), 1.0)
  assertEquals(b(0, 2), 3.0)
  assertEquals(b(1, 2), 3.0)
}

test("broadcastTo: [2,1] to [2,3]") {
  val a = NDArray(Array(10.0, 20.0), Array(2, 1))
  val b = a.broadcastTo(Array(2, 3))
  assertEquals(b.shape.toSeq, Seq(2, 3))
  assertEquals(b(0, 0), 10.0)
  assertEquals(b(0, 1), 10.0)
  assertEquals(b(0, 2), 10.0)
  assertEquals(b(1, 0), 20.0)
}

test("broadcastTo is zero-copy") {
  val a = NDArray(Array(1.0, 2.0, 3.0), Array(3))
  val b = a.broadcastTo(Array(4, 3))
  assert(a.data eq b.data)  // same backing array
}

test("broadcastPair: [2,1] + [1,3] → both [2,3]") {
  val a = NDArray(Array(1.0, 2.0), Array(2, 1))
  val b = NDArray(Array(10.0, 20.0, 30.0), Array(1, 3))
  val (a2, b2) = NDArray.broadcastPair(a, b)
  assertEquals(a2.shape.toSeq, Seq(2, 3))
  assertEquals(b2.shape.toSeq, Seq(2, 3))
  val result = a2 + b2
  assertEquals(result(0, 0), 11.0)
  assertEquals(result(1, 0), 12.0)
  assertEquals(result(0, 1), 21.0)
  assertEquals(result(1, 1), 22.0)
  assertEquals(result(0, 2), 31.0)
  assertEquals(result(1, 2), 32.0)
}

test("broadcastTo incompatible throws BroadcastException") {
  val a = NDArray.zeros[Double](Array(2, 3))
  intercept[BroadcastException] {
    a.broadcastTo(Array(2, 4))
  }
}

test("broadcastPair incompatible throws BroadcastException") {
  intercept[BroadcastException] {
    NDArray.broadcastPair(
      NDArray.zeros[Double](Array(2, 3)),
      NDArray.zeros[Double](Array(2, 4))
    )
  }
}

test("broadcast + add: [3] + [2,3] via broadcastTo") {
  val bias = NDArray(Array(1.0, 2.0, 3.0), Array(3))
  val batch = NDArray.fill(Array(2, 3), 10.0)
  val result = batch + bias.broadcastTo(Array(2, 3))
  assertEquals(result(0, 0), 11.0)
  assertEquals(result(1, 0), 11.0)
  assertEquals(result(0, 2), 13.0)
}

test("mismatched shapes without broadcastTo throws ShapeMismatchException") {
  val a = NDArray.zeros[Double](Array(2, 3))
  val b = NDArray.zeros[Double](Array(3))
  intercept[ShapeMismatchException] { a + b }
}

test("broadcast 3D: [2,1,4] broadcastTo [2,3,4]") {
  val a = NDArray.fill(Array(2, 1, 4), 5.0)
  val b = a.broadcastTo(Array(2, 3, 4))
  assertEquals(b.shape.toSeq, Seq(2, 3, 4))
  // All elements should be 5.0 since the original was filled with 5.0
  for i <- 0 until 2; j <- 0 until 3; k <- 0 until 4 do
    assertEquals(b(i, j, k), 5.0)
}

Non-contiguous arrays (slow path exercise)

The slow path is exercised by operating on transposed or sliced views from M2:

test("add on transposed 2D NDArray") {
  a = NDArray([[1,3],[2,4]], shape=[2,2])  // col-major
  t = a.transpose                          // strides permuted, not contiguous
  b = NDArray([[10,10],[10,10]], shape=[2,2])
  // t + b should equal the transposed result
  assertEquals(t.isContiguous, false)
  result = t + b
  assertEquals(result(0,0), 11.0)
  assertEquals(result(1,0), 12.0)  // was row 0 col 1 of a = 3, +10 = 13? verify manually
}

test("add on sliced 1D view") {
  raw = NDArray([0,1,2,3,4,5,6,7,8,9], shape=[10])
  view = raw.slice(0, 2 until 7)   // elements [2,3,4,5,6], stride=1, offset=2
  b = NDArray([10,10,10,10,10], shape=[5])
  result = view + b
  assertEquals(result.shape.toSeq, Seq(5))
  assertEquals(result.data.toSeq, Seq(12.0, 13.0, 14.0, 15.0, 16.0))
}

Unary ops

test("neg") { NDArray([-1, 0, 1]).neg == [1, 0, -1] }
test("abs") { NDArray([-3, -1, 0, 2]).abs == [3, 1, 0, 2] }
test("sqrt") { NDArray([0, 1, 4, 9]).sqrt == [0, 1, 2, 3] }
test("exp")  { NDArray([0.0]).exp(0) ≈ 1.0 (tolerance 1e-12) }
test("log")  { NDArray([1.0, math.E]).log == [0.0, 1.0] (tolerance 1e-12) }
test("tanh") { NDArray([0.0]).tanh(0) == 0.0; NDArray([100.0]).tanh(0) ≈ 1.0 }
test("sigmoid") {
  NDArray([0.0]).sigmoid(0) ≈ 0.5
  NDArray([100.0]).sigmoid(0) ≈ 1.0
  NDArray([-100.0]).sigmoid(0) ≈ 0.0
}

test("exp of known values") {
  val a = NDArray(Array(0.0, 1.0, 2.0), Array(3))
  val r = a.exp
  assertEqualsDouble(r(0), 1.0, 1e-12)
  assertEqualsDouble(r(1), math.E, 1e-12)
  assertEqualsDouble(r(2), math.E * math.E, 1e-8)
}

test("log inverse of exp roundtrip") {
  val a = NDArray(Array(0.5, 1.0, 2.0, 10.0), Array(4))
  val roundTrip = a.exp.log
  for i <- 0 until 4 do assertEqualsDouble(roundTrip(i), a(i), 1e-10)
}

In-place ops

test("in-place += same shape") {
  a = NDArray([1.0, 2.0, 3.0])
  b = NDArray([10.0, 20.0, 30.0])
  val dataRef = a.data
  a += b
  a == [11.0, 22.0, 33.0]
  assert(a.data eq dataRef)  // verify mutation, same backing array
}

test("in-place += with broadcast view") {
  // Explicit broadcast before in-place add
  val a = NDArray.fill(Array(2, 3), 1.0)
  val bias = NDArray(Array(10.0, 20.0, 30.0), Array(1, 3))
  a += bias.broadcastTo(Array(2, 3))
  // row 0: [11, 21, 31], row 1: [11, 21, 31]
  assertEquals(a(0, 0), 11.0)
  assertEquals(a(1, 0), 11.0)
  assertEquals(a(0, 2), 31.0)
}

test("in-place += mismatched shape throws ShapeMismatchException") {
  val a = NDArray.zeros[Double](Array(2, 3))
  val b = NDArray.zeros[Double](Array(3))
  intercept[ShapeMismatchException] { a += b }
}

test("in-place += on non-contiguous throws") {
  val raw = NDArray.zeros[Double](Array(4, 4))
  val view = raw.transpose        // non-contiguous
  intercept[UnsupportedOperationException] { view += raw }
}

test("in-place *= scalar") {
  a = NDArray([1.0, 2.0, 3.0])
  a *= 3.0
  a == [3.0, 6.0, 9.0]
}

test("in-place does not modify b") {
  val a = NDArray(Array(1.0, 2.0, 3.0), Array(3))
  val b = NDArray(Array(10.0, 20.0, 30.0), Array(3))
  val bCopy = b.data.clone()
  a += b
  assertEquals(b.data.toSeq, bCopy.toSeq)
}

Comparison ops

test("> scalar") {
  NDArray([1.0, 5.0, 3.0]) > 2.0 == NDArray[Boolean]([false, true, true])
}

test("< element-wise") {
  NDArray([1.0, 2.0, 3.0]) < NDArray([3.0, 2.0, 1.0]) == [true, false, false]
}

test(">= with explicit broadcastTo") {
  val a = NDArray(Array(1.0, 2.0, 3.0), Array(3))
  val b = NDArray(Array(2.0), Array(1))
  val result = a >= b.broadcastTo(Array(3))
  assertEquals(result.data.toSeq, Seq(false, true, true))
  assertEquals(result.shape.toSeq, Seq(3))
}

test(">= mismatched shapes throws ShapeMismatchException") {
  val a = NDArray(Array(1.0, 2.0, 3.0), Array(3))
  val b = NDArray(Array(2.0), Array(1))
  intercept[ShapeMismatchException] { a >= b }
}

test("=:= element-wise equality") {
  NDArray([1.0, 2.0, 3.0]) =:= NDArray([1.0, 0.0, 3.0]) == [true, false, true]
}

test("!:= element-wise inequality") {
  NDArray([1.0, 2.0]) !:= NDArray([1.0, 3.0]) == [false, true]
}

Numerical correctness spot-checks

These tests verify the fast path and slow path give the same result (regression guard against
accidentally diverging implementations):

test("fast path == slow path: 1D add") {
  val a = NDArray(Array(1.0, 2.0, 3.0), Array(3))  // contiguous
  val b = NDArray(Array(4.0, 5.0, 6.0), Array(3))  // contiguous
  // Force slow path by constructing non-contiguous version
  val aNC = nonContiguousViewOf(a)  // test helper: wrap in stride-2 view
  val bNC = nonContiguousViewOf(b)
  val fastResult = a + b
  val slowResult = aNC + bNC
  // Both should give same logical result (element-wise [5,7,9])
  for i <- 0 until 3 do assertEqualsDouble(fastResult(i), slowResult(i), 0.0)
}

A nonContiguousViewOf test helper creates a view with non-default strides but same logical
content (e.g. by creating a 2D array and taking a transposed view back to 1D-logical).

Correctness invariants (for all binary ops)

For any op ∈ {+, -, *, /} and same-shape a: NDArray[Double], b: NDArray[Double]:

a.shape == b.shape is required; mismatched shapes throw ShapeMismatchException
(a op b).shape == a.shape (result has same shape as operands)
(a op b)(i) == a(i) op b(i) for all valid multi-indices i
a op b does not modify a.data or b.data
The result is always contiguous (col-major strides, offset 0)
a + b == b + a (commutativity of + and *)
(a + b) + c == a + (b + c) within floating-point tolerance

For broadcasting:
8. broadcastTo returns a zero-copy view (shared data array)
9. broadcastTo(targetShape) throws BroadcastException if incompatible
10. broadcastPair(a, b) returns views with identical shape

For in-place ops:
11. After a += b, a.data is the same array reference as before (no reallocation)
12. After a += b, b.data is unchanged
13. a += b requires a.shape == b.shape (broadcast first if needed)

Checklist

Agent-Logs-Url: https://github.com/Quafadas/vecxt/sessions/297d8ade-9e29-44b5-8626-4d7f180ca09b Co-authored-by: Quafadas <24899792+Quafadas@users.noreply.github.com>

Copilot · 2026-04-07T16:44:23Z

@copilot I've pushed an update to the design doc which changes the approach to broadcasting. We want it to be explicit rather than implicit. Please check that diff. Once done, implement milestone 3 i...

Done in 4be1e34. Milestone 3 implemented:

vecxt/src/broadcast.scala — broadcastTo (inline, zero-copy stride-0 view), broadcastPair, broadcastShape, broadcastStrides, sameShape, BroadcastException, ShapeMismatchException. Broadcasting is explicit per the updated design doc.
vecxt/src/ndarrayDoubleOps.scala — element-wise +/-/*/÷ (array and scalar), unary neg/abs/exp/log/sqrt/tanh/sigmoid, in-place +=/−=/∗=/÷= (array and scalar), comparison >/</≥/≤/=:=/!:= (array and scalar), left-scalar ops. Fast path for col-major pairs (flat while loops); general stride-kernel for non-col-major/broadcast views.
vecxt/test/src/ndarrayElemWise.test.scala — 43 tests covering all ops, fast and general paths, broadcasting, error cases.
vecxt/src/all.scala — added export vecxt.NDArrayDoubleOps.* and export vecxt.broadcast.*.

All 140 JVM tests pass.

Agent-Logs-Url: https://github.com/Quafadas/vecxt/sessions/297d8ade-9e29-44b5-8626-4d7f180ca09b Co-authored-by: Quafadas <24899792+Quafadas@users.noreply.github.com>

Initial plan

fad6cd4

Copilot AI assigned Copilot and Quafadas Apr 7, 2026

Copilot started work on behalf of Quafadas April 7, 2026 07:40 View session

Copilot AI linked an issue Apr 7, 2026 that may be closed by this pull request

NDArray phase 1 #69

Closed

Implement NDArray phase 1: core type, factories, checks, and tests

6c5849a

Agent-Logs-Url: https://github.com/Quafadas/vecxt/sessions/7d847bd1-604b-4b29-a265-f3cb0abf602d Co-authored-by: Quafadas <24899792+Quafadas@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Implement NDArray core type and factories~~ NDArray phase 1: core type + factories Apr 7, 2026

Copilot AI requested a review from Quafadas April 7, 2026 07:54

Copilot finished work on behalf of Quafadas April 7, 2026 07:54

Merge branch 'main' into copilot/ndarray-core-type-factories

1d57783

Copilot started work on behalf of Quafadas April 7, 2026 07:59 View session

Remove @specialized from NDArray — Array[A] is already unboxed for pr…

bc60626

…imitives Agent-Logs-Url: https://github.com/Quafadas/vecxt/sessions/9b13c6c2-31a2-45c9-a1e2-aa70b9a9f636 Co-authored-by: Quafadas <24899792+Quafadas@users.noreply.github.com>

Copilot finished work on behalf of Quafadas April 7, 2026 08:08

Copilot started work on behalf of Quafadas April 7, 2026 08:31 View session

Quafadas added 2 commits April 7, 2026 10:32

.

899aa89

Merge remote-tracking branch 'refs/remotes/origin/copilot/ndarray-cor…

6212415

…e-type-factories' into copilot/ndarray-core-type-factories

Copilot finished work on behalf of Quafadas April 7, 2026 08:45

Copilot started work on behalf of Quafadas April 7, 2026 08:46 View session

Copilot finished work on behalf of Quafadas April 7, 2026 09:45

Copilot started work on behalf of Quafadas April 7, 2026 10:16 View session

feat: NDArray M2 - indexing, views, reshape, squeeze/unsqueeze, flatt…

e17a842

…en, toArray Agent-Logs-Url: https://github.com/Quafadas/vecxt/sessions/12022e30-8b23-45cf-9d8f-c6ccf26801ca Co-authored-by: Quafadas <24899792+Quafadas@users.noreply.github.com>

Copilot AI changed the title ~~NDArray phase 1: core type + factories~~ NDArray phases 1 & 2: core type, factories, indexing + views Apr 7, 2026

Copilot finished work on behalf of Quafadas April 7, 2026 10:56

.

3f115c6

Copilot started work on behalf of Quafadas April 7, 2026 15:54 View session

M3: broadcast.scala, ndarrayDoubleOps.scala, tests, all.scala exports

4be1e34

Agent-Logs-Url: https://github.com/Quafadas/vecxt/sessions/297d8ade-9e29-44b5-8626-4d7f180ca09b Co-authored-by: Quafadas <24899792+Quafadas@users.noreply.github.com>

M3: add col-major fast path to in-place /= operator

8f89d80

Agent-Logs-Url: https://github.com/Quafadas/vecxt/sessions/297d8ade-9e29-44b5-8626-4d7f180ca09b Co-authored-by: Quafadas <24899792+Quafadas@users.noreply.github.com>

Copilot AI changed the title ~~NDArray phases 1 & 2: core type, factories, indexing + views~~ NDArray phases 1, 2 & 3: core type, factories, indexing + views, element-wise ops Apr 7, 2026

Copilot finished work on behalf of Quafadas April 7, 2026 16:47

Quafadas added 3 commits April 8, 2026 08:36

.

bf05968

.

9f5ddfd

.

6a93d5a

Quafadas marked this pull request as ready for review April 8, 2026 06:54

Quafadas merged commit 2b17b9a into main Apr 8, 2026
7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

NDArray phases 1, 2 & 3: core type, factories, indexing + views, element-wise ops#70

NDArray phases 1, 2 & 3: core type, factories, indexing + views, element-wise ops#70
Quafadas merged 13 commits intomainfrom
copilot/ndarray-core-type-factories

Copilot AI commented Apr 7, 2026 •

edited

Loading

Uh oh!

Quafadas commented Apr 7, 2026

Uh oh!

Copilot AI commented Apr 7, 2026

Uh oh!

Quafadas commented Apr 7, 2026

Uh oh!

Copilot AI commented Apr 7, 2026

Uh oh!

Quafadas commented Apr 7, 2026

Uh oh!

Quafadas commented Apr 7, 2026

Uh oh!

Copilot AI commented Apr 7, 2026

Uh oh!

Quafadas commented Apr 7, 2026

Uh oh!

Copilot AI commented Apr 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

Copilot AI commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

New files

Modified

Key invariants

Usage

Uh oh!

Quafadas commented Apr 7, 2026

Uh oh!

Copilot AI commented Apr 7, 2026

Uh oh!

Quafadas commented Apr 7, 2026

Uh oh!

Copilot AI commented Apr 7, 2026

Milestone 2: Indexing + Views — Detailed Plan

Files

1. Element read — apply

2. Element write — update

3. Slice / view — slice

4. Transpose — transpose / T

5. Reshape

6. Squeeze / Unsqueeze

7. Flatten

8. toArray — materialise to contiguous Array[A]

9. New bounds check

10. Tests

Key invariants throughout

Uh oh!

Quafadas commented Apr 7, 2026

Uh oh!

Quafadas commented Apr 7, 2026

Uh oh!

Copilot AI commented Apr 7, 2026

Uh oh!

Quafadas commented Apr 7, 2026

Milestone 3 — Detailed Design: Element-wise Operations (Double)

Overview

Design principles for this milestone

Broadcasting algorithm

N-dimensional iteration kernel

File layout

API reference

Binary ops (element-wise, same-shape required)

Scalar ops

Unary ops

In-place binary ops (mutating a)

In-place scalar ops

Comparison ops (return NDArray[Boolean])

Dispatch logic (fast path vs slow path)

Implementation sketch for +

Scalar op implementation sketch

Unary op implementation sketch

Open question: arrays.+ signature

Test suite outline

Binary ops — same-shape contiguous (fast path exercise)

Scalar ops

Broadcasting (explicit broadcastTo / broadcastPair)

Non-contiguous arrays (slow path exercise)

Unary ops

In-place ops

Comparison ops

Numerical correctness spot-checks

Correctness invariants (for all binary ops)

Checklist

Uh oh!

Copilot AI commented Apr 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Copilot AI commented Apr 7, 2026 •

edited

Loading

1. Element read — `apply`

2. Element write — `update`

3. Slice / view — `slice`

4. Transpose — `transpose` / `T`

8. `toArray` — materialise to contiguous `Array[A]`

In-place binary ops (mutating `a`)

Comparison ops (return `NDArray[Boolean]`)

Implementation sketch for `+`

Open question: `arrays.+` signature

Broadcasting (explicit `broadcastTo` / `broadcastPair`)