refactor(deps): drop ndarray-stats, use AdaWorldAPI ndarray fork directly#10
Conversation
…es.io)
Per maintainer: 'ndarray is NEVER crates.io / always our fork at
adaworldapi'. The workspace's `ndarray = "0.17.2"` was resolving
to the crates.io 0.17.2 release (Cargo.lock had THREE ndarray
entries: 0.16.1 transitive, 0.17.2 crates.io, and 0.17.2 path-dep
via the kv-lance ndarray-hpc rename).
Changes:
1. workspace Cargo.toml:
- ndarray = { git = 'https://github.com/AdaWorldAPI/ndarray.git',
default-features = false,
features = ['std', 'hpc-extras'] }
- hpc-extras pulls in p64 + fractal + blake3 (transitive); needed
for the hpc/* modules used by vector-hpc.
2. surrealdb/core/Cargo.toml:
- Drops the `ndarray-hpc = { package = 'ndarray', path = ... }`
rename. The workspace ndarray IS the fork now; no rename needed.
- vector-hpc feature is now a pure cfg flip (no extra dep), since
ndarray is already on every build.
3. surrealdb/core/src/idx/trees/vector.rs and
surrealdb/core/benches/vector_distance.rs:
- `ndarray_hpc::hpc::heel_f64x8::cosine_f64_simd` →
`ndarray::hpc::heel_f64x8::cosine_f64_simd`
- `ndarray_hpc::simd::F64x8` → `ndarray::simd::F64x8`
- All 17 import + call sites swapped via sed.
This makes the dependency graph honest: the fork is mandatory,
versioned via git ref (Cargo.lock pins the SHA), and the same
`ndarray::Array1` used by the existing scalar paths is the same
crate that hosts `ndarray::hpc::*` for SIMD.
The prior commit broke the build because ndarray-stats's DeviationExt
(l1_dist / l2_dist) is impl'd against crates.io's ndarray. Replacing
the workspace ndarray with a git dep made ndarray-stats see crates.io
ndarray while surrealdb-core saw our fork — diamond dep, type
mismatch, l2_dist not found.
Fix: revert the workspace ndarray declaration to the simpler
`ndarray = { version = '0.17.2', default-features = false,
features = ['std', 'hpc-extras'] }` syntax and add a
[patch.crates-io] entry at the workspace root pointing crates.io's
ndarray to the AdaWorldAPI fork:
[patch.crates-io]
ndarray = { git = 'https://github.com/AdaWorldAPI/ndarray.git' }
This redirects EVERY dep in the graph (ndarray-stats and any other
transitive consumer) to the fork. Cargo resolves them to the same
crate instance, so DeviationExt impls apply uniformly.
The fork's hpc/* and simd/* modules remain reachable as
ndarray::hpc::* and ndarray::simd::*.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b771a38b41
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| mimalloc = "0.1.48" | ||
| native-tls = "0.2.14" | ||
| ndarray = "0.17.2" | ||
| ndarray = { version = "0.17.2", default-features = false, features = ["std", "hpc-extras"] } |
There was a problem hiding this comment.
Keep published ndarray dependencies resolvable
When surrealdb-core or surrealml-core is built as a published crate or as a dependency outside this workspace, the root [patch.crates-io] is not applied transitively, so this line asks the normal crates.io ndarray 0.17.2 for the fork-only hpc-extras feature. Those consumers fail dependency resolution before vector-hpc is even involved; keep fork-only features out of the published workspace dependency, or make the fork requirement explicit in a way consumers can resolve.
Useful? React with 👍 / 👎.
Cleaner alternative to the prior commit's [patch.crates-io] hammer.
The diamond dep was: surrealdb-core wanted the AdaWorldAPI ndarray
fork, ndarray-stats wanted crates.io ndarray 0.17 (different crate
to Cargo). The DeviationExt trait was impl'd against the crates.io
type → l1_dist/l2_dist/linf_dist not found on the fork's Array1.
Two fixes:
- patch.crates-io: route ALL ndarray to the fork transitively.
- THIS PR: drop ndarray-stats entirely. surrealdb-core only used 3
of its methods; inline them.
What changed:
1. Cargo.toml (workspace):
- ndarray = direct git dep on AdaWorldAPI/ndarray.git
- ndarray-stats workspace pin REMOVED
- [patch.crates-io] block REMOVED
2. surrealdb/core/Cargo.toml:
- ndarray-stats.workspace dropped
- vector-hpc = [] (was ["dep:ndarray-hpc"]) — workspace ndarray
IS the fork now
3. surrealdb/core/src/idx/trees/vector.rs:
- drop use ndarray_stats::DeviationExt
- add 3 inline helpers: inline_l1_dist / inline_l2_dist /
inline_linf_dist — generic over T: ToFloat + Copy, do
Zip::from(a).and(b).fold(...) over to_float() differences,
return f64::INFINITY on length mismatch
- replace ~20 call sites: a.l[12]/linf_dist(b).unwrap_or(...) →
Self::inline_l*_dist(a, b)
- F32/I64/I32 sites that previously did .map(|r| r as f64)
collapse because ToFloat::to_float already returns f64
Net effect: no diamond, no patch, fewer crates in the graph,
identical semantics on the Vector::distance() API surface.
|
Local verification post-patch-removal Confirmed clean compile after dropping the Inline distance helpers ( Generated by Claude Code |
…line
Before this change Cargo.lock listed THREE ndarray entries:
- 0.16.1 crates.io ← via lance-index 4.0 (unavoidable: major mismatch)
- 0.17.2 crates.io ← via ort (gated by optional 'ml' feature)
- 0.17.2 git/fork ← surrealdb-core's workspace dep
The 0.17.2 crates.io entry means an --features ml build would link
TWO distinct ndarray crates with distinct TypeIds — our fork (used
by surrealdb-core for SIMD distance kernels and Array1 everywhere)
and crates.io's 0.17.2 (used by ort, the ONNX Runtime binding).
Values couldn't flow between them without a conversion.
The narrow patch at the workspace root forces ANY transitive
consumer asking for ndarray 0.17.x from crates.io to resolve to
the AdaWorldAPI fork instead:
[patch.crates-io]
ndarray = { git = 'https://github.com/AdaWorldAPI/ndarray.git' }
After regenerating Cargo.lock, the entries collapse to:
- 0.16.1 crates.io ← lance-index (still there; separate major version)
- 0.17.2 git/fork ← everything else, including ort transitively
Caveat: lance-index 4.0 pins ndarray = '0.16', a different major
version than our fork (0.17.2). Cargo can patch 0.17 → 0.17 but
cannot redirect 0.16 → 0.17 (semver violation). Eliminating the
last crates.io ndarray would require either lance-index bumping to
ndarray 0.17 upstream, or building a ndarray-0.16-compat shim
inside the AdaWorldAPI fork — both out of scope for this PR.
Why narrow patch vs. the broader one we had before PR #10:
The PR #10 patch came with ndarray-stats in the graph, which created
a real diamond dep that needed structural resolution (we dropped
ndarray-stats and inlined l1/l2/linf in surrealdb-core). With that
diamond gone, this narrow patch is just an idempotent redirect —
no compatibility risk, no API surface to maintain. It's a pure
'everything-0.17 collapses to the fork' rule.
Summary
Wire
surrealdb-coreto use the AdaWorldAPIndarrayfork directly (no crates.io fallback, no patch hammer). The clean path: drop the only transitive consumer that produced a diamond dep (ndarray-stats) by inlining the threeDeviationExtmethods we used (l1_dist,l2_dist,linf_dist).What changed
Cargo.toml(workspace)ndarrayis now a direct git dep pointing athttps://github.com/AdaWorldAPI/ndarray.git, withdefault-features = false, features = ["std", "hpc-extras"].ndarray-statsworkspace pin removed.[patch.crates-io]block — not needed once the diamond dep is gone.surrealdb/core/Cargo.tomlndarray-stats.workspace = true.ndarray-hpc = { package = "ndarray", path = ... }rename —ndarray.workspaceIS the fork now, so the rename is redundant.vector-hpcfeature is now a pure cfg flip (vector-hpc = []) since the fork is on every build.surrealdb/core/src/idx/trees/vector.rsuse ndarray_stats::DeviationExt;.impl Vectorhelpers —inline_l1_dist,inline_l2_dist,inline_linf_dist— generic overT: ToFloat + Copy. Each does the obviousZip::from(a).and(b).fold(...)overto_float()differences, returningf64::INFINITYon length mismatch (matches the prior.unwrap_or(f64::INFINITY)pattern).a.l1_dist(b),a.l2_dist(b),a.linf_dist(b)call sites swapped toSelf::inline_l*_dist(a, b). F32/I64/I32 sites that previously did.map(|r| r as f64)collapse cleanly becauseToFloat::to_floatalready returnsf64.surrealdb/core/benches/vector_distance.rsndarray_hpc::→ndarray::(rename drop).Why this is cleaner than
[patch.crates-io]The diamond was:
ndarray-stats'sDeviationExtis impl'd against the crates.ioArray1<T>type, not the fork's. With two source-distinctndarraycrates in the lockfile, the trait didn't apply to our types — link-time mismatch.Two ways to fix:
[patch.crates-io]: redirect crates.io'sndarrayto the fork everywhere transitively. Works, but it's a workspace-wide hammer that applies to every crate that asks forndarray = "0.17"for any reason.ndarray-stats(this PR): surrealdb-core only used 3 methods from it (l1_dist,l2_dist,linf_dist); inlining them is ~25 LOC. No diamond, no patch, no transitive-source-mismatch surface to maintain.Option 2 is what this PR does.
Verification
The kv-lance integration test suite + Sprint U concurrent property tests are unchanged in scope; behavior preserved.
Earlier signal (carries over from the original PR)
While the patch approach was being prepared,
SURREAL_TEST_KV=lance cargo test --test selectran clean:selectA wider batch will be aggregated under
SURREAL_TEST_KV=lancein a follow-up sprint.Test plan
--features vector-hpcbuilds (nondarray-statsin the graph).ndarrayentry, sourced from the AdaWorldAPI git fork.[patch.crates-io]anywhere in the workspace.