Supersedes #27: HNSW correctness + supply-chain CI (merge resolved against current main)#30
Conversation
…ed pruning + storage rebuild Three remaining root causes from issue ruvnet#430, plus the storage-rebuild gap from PR ruvnet#460. Bug B — insert beam was clamped to ef_construction.min(m * 2). With defaults (m=16, ef_construction=200) the beam silently became 32. Late- inserted clusters got wired through whatever was near the entry point instead of through ef_construction-wide neighbour search. Bug C — adjacency-list pruning used `drain(0..drain_count)`, dropping the OLDEST edges regardless of distance. Proper HNSW pruning keeps the m CLOSEST edges. Now sort by `calculate_distance` to the anchor vector and truncate to m. Kept a fallback that preserves the newest-m behaviour when the anchor vector lookup fails so we never panic on a missing vector. Storage — VectorDB::new() always created a fresh empty HnswIndex, so previously persisted vectors were invisible to search after reopening the database. Now rebuild via storage.get_all_ids() + index.insert_batch() on open, and seed VectorDbStats.total_vectors with the recovered count. Tests: - test_pruning_keeps_closest_not_newest: builds a hub with 20 close neighbours then 6 far neighbours, asserts no "far_*" id appears in top-10 around the hub. Fails on FIFO pruning. - test_index_rebuilt_from_storage_on_open: writes 5 vectors via one VectorDB instance, reopens against the same path, asserts search returns the persisted match. Fails on the historical empty-index bug. Regression-guard CI additions: - hnsw-insert-beam-no-m2-clamp: textually forbids the ef_construction.min(m*2) pattern in index.rs. - hnsw-distance-based-neighbor-pruning: requires calculate_distance and the `> m * 2` overflow gate to both live in index.rs. - vector-db-rebuilds-index-on-open: requires storage.get_all_ids() in vector_db.rs. - hnsw-recall-at-1 job now also runs the two new tests. Supersedes PR ruvnet#460 (CoolDude1969) which covered storage rebuild + an overlapping heap fix already in main from PR ruvnet#466. Closes ruvnet#430. Co-Authored-By: claude-flow <ruv@ruv.net>
Surface the ruvnet#430 HNSW correctness fixes (insert beam, distance-based pruning, storage rebuild) to npm consumers. Bump applies to the meta package and all 5 platform-specific subpackages so optionalDependencies resolve consistently after publish-all.yml runs. Co-Authored-By: claude-flow <ruv@ruv.net>
The expanded README and 0.1.1 version were already published to npm by an earlier release, but never committed back to git. Verified identical to `npm pack @ruvector/diskann@0.1.1`. Bringing the working tree in sync so future bumps start from a clean baseline. Co-Authored-By: claude-flow <ruv@ruv.net>
No behaviour change — collapses single-expression closure and assignment onto one line per rustfmt defaults so the rustfmt CI job passes. Co-Authored-By: claude-flow <ruv@ruv.net>
The `optional-deps-resolvable-on-npm` regression guard fails because
@ruvector/router-<platform>@0.1.31 doesn't exist on npm yet — those
platform binaries are only published by `publish-all.yml` after a tag is
cut, which happens AFTER this PR merges.
Splitting the work:
- This PR: HNSW correctness fix + CI guards (keeps regression-guard
green on every commit).
- Follow-up release PR: bump @ruvector/router meta + 5 platform
packages to 0.1.31, tag v0.1.31, publish-all.yml ships the fix.
This commit reverts c5c7e7f and is itself reverted in the release PR.
Co-Authored-By: claude-flow <ruv@ruv.net>
Mirrors the pattern landed on sublinear-time-solver#25:
1. dependency-review (PRs only, informational)
2. cargo-audit (RustSec advisory DB, vulnerabilities only)
3. cargo-deny (license/source/ban policy via deny.toml)
4. npm-audit (workspace npm/ at --audit-level=critical)
5. lockfile-integrity (cargo metadata --locked)
npm criticals cleared via package.json overrides:
- vm2: transitively dropped via @google-cloud/redis 5.x
- fast-xml-parser: >=5.7.0 (was <=5.6.0 vuln)
- protobufjs: >=7.5.6 (was <=7.5.5 vuln)
- @google-cloud/redis: >=5.0.0 (was <=3.3.0 vuln)
- handlebars: picked up >=4.7.9 via override resolution
Result: 73 vulns → 33 (3 crit → 0, 36 high → 19, 17 medium → 5).
19 highs remain (mostly devDep transitives + ML helpers) and are
tracked via the new dependabot.yml — Dependabot will chip away
weekly.
deny.toml ignore-list with re-review dates covers:
- RUSTSEC-2023-0071 rsa Marvin Attack (no patched version yet,
local-only signing for Kalshi API; re-review
2026-08-01)
- RUSTSEC-2026-0097 rand unsoundness (not triggerable in our
usage — no logging inside RNG draws)
- RUSTSEC-2026-0115/0116/0117 imageproc unsoundness (scipix
offline examples only, never published)
- 8 unmaintained advisories (paste, bincode, instant, rand_os,
proc-macro-error, rustls-pemfile, rusttype, number_prefix,
core2) — all transitive, no CVE, tracked for migration
Added BSL-1.0, CDLA-Permissive-2.0, NCSA licenses to allowlist
(present in transitive deps via xxhash-rust, tch-rs, LLVM family).
dependabot.yml schedules weekly Tuesday 09:35 UTC for cargo +
npm + github-actions ecosystems with patch+minor grouping.
Co-Authored-By: claude-flow <ruv@ruv.net>
ci(security): 5-layer supply-chain CI + clear 3 npm criticals
Built from commit b9bb370 Platforms updated: - linux-x64-gnu - linux-arm64-gnu - darwin-x64 - darwin-arm64 - win32-x64-msvc 🤖 Generated by GitHub Actions
# Conflicts: # npm/core/platforms/win32-x64-msvc/ruvector.node # npm/packages/diskann/README.md
There was a problem hiding this comment.
💡 Codex Review
ruvector/.github/workflows/regression-guard.yml
Lines 37 to 38 in 1c12866
The reentrant-lock guard uses a single grep -P command with two -e patterns (-rnPe ... -rnPe ...), but GNU grep's PCRE mode supports only one pattern, so this command exits with an error instead of evaluating either check. Because it's inside an if ...; then condition, the script treats that failure as a non-match and continues, which silently disables this regression guard and lets reentrant RwLock patterns slip through CI.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # to users). | ||
| ignore: | ||
| - dependency-name: "*" | ||
| dependency-type: "development" |
There was a problem hiding this comment.
Replace invalid ignore key with a supported Dependabot rule
The npm update block uses ignore with dependency-type: "development", but dependency-type is not a supported field for ignore entries (it is supported for allow/groups). This can make the Dependabot config invalid or cause the rule to be ignored, which means the intended filtering of devDependency updates will not work and can stop expected npm dependency-update PRs from being generated.
Useful? React with 👍 / 👎.
…y-chain-resolved # Conflicts: # npm/core/platforms/win32-x64-msvc/ruvector.node
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b5858b9a21
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| permissions: | ||
| contents: read | ||
| pull-requests: read |
There was a problem hiding this comment.
Grant PR write permission for dependency-review comments
Enable pull-requests: write when comment-summary-in-pr: on-failure is set; the dependency-review-action docs/examples require write scope to post PR summaries, and with only pull-requests: read this step cannot comment and may fail noisily (currently masked by continue-on-error). In PR runs, that effectively disables the intended Layer 1 review signal even when the repository supports dependency review.
Useful? React with 👍 / 👎.
| { triple = "x86_64-unknown-linux-gnu" }, | ||
| { triple = "aarch64-unknown-linux-gnu" }, | ||
| { triple = "x86_64-apple-darwin" }, | ||
| { triple = "aarch64-apple-darwin" }, |
There was a problem hiding this comment.
Include Windows targets in cargo-deny graph checks
cargo deny check only evaluates dependency edges that match the configured [graph].targets; with just Linux/macOS triples here, Windows-only crates are excluded from advisories/license/source checks. This repo clearly ships/tests Windows artifacts (x86_64-pc-windows-msvc appears across multiple workflows), so the current target set leaves a real blind spot in the new supply-chain policy.
Useful? React with 👍 / 👎.
Resolved conflict in npm/core/platforms/win32-x64-msvc/ruvector.node by taking origin/main's version (newer NAPI-RS rebuild). CI rebuilds binaries on push regardless. Co-Authored-By: claude-flow <ruv@ruv.net>
- Run cargo fmt on rvf-runtime/tests/agi_e2e.rs (assert! macro wrapping was the only rustfmt diff) - Expand dependency-review allow-licenses to cover SPDX expressions appearing in our resolved transitive graph (BlueOak-1.0.0, MIT-0, Ruby) and add allow-dependencies for @anthropic-ai/claude-code* packages whose license ships as a README reference (LicenseRef-bad). Closes Rustfmt + dependency-review failures observed on PR 30. Co-Authored-By: claude-flow <ruv@ruv.net>
|
ECC bundle files are already tracked in this repository. Skipping generation of another bundle PR. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2a3ddbf698
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| name: dependency-review (PRs only) | ||
| runs-on: ubuntu-latest | ||
| if: github.event_name == 'pull_request' | ||
| continue-on-error: true |
There was a problem hiding this comment.
Make dependency-review failures block PRs
Remove continue-on-error: true from the dependency-review job, because with this setting any vulnerability/license failure from actions/dependency-review-action is treated as non-blocking and the workflow can still pass. In PRs that introduce a new moderate/high vulnerability (or license violation), this silently bypasses the intended Layer 1 gate and allows merge unless another layer independently catches it.
Useful? React with 👍 / 👎.
| # `LicenseRef-bad-see-license-in-…`. Allow-list the package + per- | ||
| # platform binary variants by name so the unparseable LicenseRef | ||
| # doesn't gate PRs. | ||
| allow-dependencies: '@anthropic-ai/claude-code, @anthropic-ai/claude-code-darwin-arm64, @anthropic-ai/claude-code-darwin-x64, @anthropic-ai/claude-code-linux-arm64, @anthropic-ai/claude-code-linux-arm64-musl, @anthropic-ai/claude-code-linux-x64, @anthropic-ai/claude-code-linux-x64-musl, @anthropic-ai/claude-code-win32-arm64, @anthropic-ai/claude-code-win32-x64' |
There was a problem hiding this comment.
Use a supported dependency-review allowlist input
Replace allow-dependencies with the action’s supported input (allow-dependencies-licenses using purl values), because allow-dependencies is not a recognized dependency-review-action parameter and will be ignored. As written, the Anthropic package exception does not actually apply, so PRs can still fail on those license detections despite the intended allowlist.
Useful? React with 👍 / 👎.
1. supply-chain.yml: replace invalid 'allow-dependencies' input with
'allow-dependencies-licenses' (the v4 action rejects the former,
causing dependency-review to fail on @anthropic-ai/claude-code
license detection)
2. supply-chain.yml: upgrade permissions to pull-requests: write so
comment-summary-in-pr: on-failure can post PR summaries
3. dependabot.yml: replace invalid 'dependency-type' under 'ignore'
(only valid under 'allow') with allow: [{dependency-type: production}]
4. regression-guard.yml: split dual-pattern PCRE grep into two separate
invocations — GNU grep -P supports only one pattern per call; the
prior form silently errored and disabled the reentrant-lock guard
5. deny.toml: add x86_64-pc-windows-msvc target triple so cargo-deny
checks cover Windows deps (the repo ships win32 NAPI artifacts)
|
ECC bundle files are already tracked in this repository. Skipping generation of another bundle PR. |
The dependency-review-action@v4 requires package-url (PURL) format for the allow-dependencies-licenses input. Bare npm package names cause 'package-url must start with pkg:' parse errors. Format: pkg:npm/%40<scope>/<name> (percent-encoded @ for scoped pkgs)
|
ECC bundle files are already tracked in this repository. Skipping generation of another bundle PR. |
There was a problem hiding this comment.
📝 Info: index.rs exceeds 500-line guideline from CLAUDE.md
CLAUDE.md's Project Architecture section states "Keep files under 500 lines." index.rs was already ~558 lines before this PR and is now 650 lines after adding the pruning logic and test_pruning_keeps_closest_not_newest test. The violation is pre-existing, but the PR extends it. The added code (pruning logic + regression test) is correctly co-located with the HNSW index implementation, so splitting requires a thoughtful refactor (e.g., extracting tests to a separate tests/ module or the pruning helper to a submodule).
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
📝 Info: deny.toml placed in repo root vs CLAUDE.md file organization rules
The CLAUDE.md states 'NEVER save to root folder' and 'Use /config for configuration files'. However, deny.toml is a standard Rust ecosystem tool config that cargo deny check (in .github/workflows/supply-chain.yml:118) expects at the workspace root by default. The repo already has other config files in the root (Cargo.toml, package.json, .mcp.json). The tool convention effectively requires root placement unless --config config/deny.toml is passed to the CI command. This is a pragmatic exception to the organizational rule rather than a meaningful violation, since relocating it would require updating the CI workflow.
Was this helpful? React with 👍 or 👎 to provide feedback.
| let vectors_snapshot = self.vectors.read(); | ||
|
|
||
| // Re-acquire graph lock for modifications | ||
| let mut graph = self.graph.write(); |
There was a problem hiding this comment.
📝 Info: vectors.read() held during pruning widens the critical section vs old code
The new code at crates/ruvector-router-core/src/index.rs:129 acquires self.vectors.read() and holds it across the entire graph modification loop (through line 176). In the old code, only graph.write() was held during connection modifications. The new pattern additionally blocks any concurrent self.vectors.write() (i.e., the very first step of every other concurrent insert call at line 97). This is not a deadlock (lock ordering is consistent: vectors → graph across all paths), but it serializes the initial vector store step of concurrent inserts during the pruning phase, which is a minor throughput regression for high-concurrency workloads. The test_hnsw_concurrent_inserts test passes because it validates correctness, not throughput.
Was this helpful? React with 👍 or 👎 to provide feedback.
| return Ok(()); | ||
| } |
There was a problem hiding this comment.
📝 Info: is_first variable is always false — dead code at lines 114-116
The is_first block (lines 102-112) always evaluates to false because when the entry point is None, the function returns early at line 109 (return Ok(())). The check at lines 114-116 (if is_first { return Ok(()); }) is therefore unreachable dead code. This is a pre-existing pattern not introduced by this PR, but worth noting as it slightly obscures the control flow.
(Refers to lines 114-116)
Was this helpful? React with 👍 or 👎 to provide feedback.
| let stored_ids = storage.get_all_ids()?; | ||
| let total_vectors = stored_ids.len(); | ||
| if !stored_ids.is_empty() { | ||
| let mut entries = Vec::with_capacity(stored_ids.len()); | ||
| for id in &stored_ids { | ||
| if let Some(vector) = storage.get(id)? { | ||
| entries.push((id.clone(), vector)); | ||
| } | ||
| } | ||
| index.insert_batch(entries)?; |
There was a problem hiding this comment.
📝 Info: total_vectors stat assumes all stored IDs have retrievable vectors
In VectorDB::new, total_vectors is set to stored_ids.len() at line 40, but the subsequent loop at lines 43-46 only pushes entries where storage.get(id) returns Some. If any stored ID lacks a decodable vector (data corruption), the stat would overcount relative to what's actually in the index. In practice this shouldn't occur — all IDs in VECTORS_TABLE should have valid bincode-encoded vectors. The count() method (line 198-199) delegates to storage.count() which reads the table length directly, so count() and total_vectors would agree even in the corruption case — both would overcount. Minimal practical impact.
Was this helpful? React with 👍 or 👎 to provide feedback.
| let mut scored: Vec<(String, f32)> = neighbor_connections | ||
| .drain(..) | ||
| .filter_map(|cid| { | ||
| vectors_snapshot.get(&cid).map(|cv| { | ||
| let d = calculate_distance(anchor_vec, cv, self.config.metric) | ||
| .unwrap_or(f32::MAX); | ||
| (cid, d) | ||
| }) | ||
| }) | ||
| .collect(); | ||
| scored.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(Ordering::Equal)); | ||
| scored.truncate(self.config.m); | ||
| *neighbor_connections = scored.into_iter().map(|(cid, _)| cid).collect(); |
There was a problem hiding this comment.
📝 Info: filter_map in pruning silently drops connections for missing vectors
At crates/ruvector-router-core/src/index.rs:154, the filter_map skips any cid whose vector is not found in vectors_snapshot. If this occurs, the connection is permanently lost from the adjacency list after drain(..). In theory this cannot happen because all IDs in the graph must have their vectors stored (vectors.write() at line 97 always executes before graph connections are established), and the snapshot is taken after all prior writes are visible. However, if a future code change introduces a path where a vector ID appears in graph connections without a corresponding entry in the vectors HashMap (e.g., a partial failure during batch insert), this would silently degrade graph connectivity rather than raising an error.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 84d6ab2f46
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| - name: install cargo-deny | ||
| run: cargo install --locked cargo-deny | ||
| - name: cargo deny check | ||
| run: cargo deny check |
There was a problem hiding this comment.
Run cargo-deny with all Cargo features enabled
The new cargo deny check step only scans the default feature graph, so advisories/licenses on dependencies behind non-default features are skipped. cargo-deny defaults all-features to false unless you pass --all-features or set [graph].all-features = true, and this workspace has many optional feature-gated deps (for example crates with default = [] plus optional dependencies). As written, PRs can introduce a vulnerable crate through an opt-in feature without tripping this supply-chain gate.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 6c7ed83 — added --all-features to the cargo deny check command and set all-features = true in deny.toml so non-default feature-gated deps are scanned.
| { triple = "aarch64-unknown-linux-gnu" }, | ||
| { triple = "x86_64-apple-darwin" }, | ||
| { triple = "aarch64-apple-darwin" }, | ||
| { triple = "x86_64-pc-windows-msvc" }, | ||
| ] |
There was a problem hiding this comment.
Add missing shipped target triples to cargo-deny graph
The target filter in deny.toml still omits platforms this repo builds and publishes for (for example aarch64-pc-windows-msvc and *-unknown-linux-musl appear in CI/workspace package targets), and cargo-deny ignores target-specific dependency edges when the target isn’t listed. That leaves a real policy blind spot where advisories/license/source violations in those platform-only dependencies won’t fail the supply-chain gate.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 6c7ed83 — added the 3 missing shipped targets to deny.toml:
x86_64-unknown-linux-muslaarch64-unknown-linux-muslaarch64-pc-windows-msvc
These all appear in CI build matrices (NAPI/SONA workflows) but were previously unchecked.
Address Codex review P2 findings: 1. Run cargo deny with --all-features so deps behind non-default feature gates are also scanned for advisories/licenses/sources. Set [graph].all-features = true in deny.toml as the canonical config. 2. Add the 3 shipped targets missing from deny.toml: - x86_64-unknown-linux-musl (NAPI/SONA builds) - aarch64-unknown-linux-musl (NAPI/SONA builds) - aarch64-pc-windows-msvc (NAPI builds) These triples appear in CI build matrices but were unchecked by cargo-deny, leaving a blind spot for platform-specific advisories.
|
ECC bundle files are already tracked in this repository. Skipping generation of another bundle PR. |
| found=0 | ||
| if grep -rnP '(\S+)\.write\(\)[^;]*\1\.(write|read)\(\)' \ | ||
| --include='*.rs' -- crates/ ; then | ||
| found=1 | ||
| fi | ||
| if grep -rnP '(\S+)\.read\(\)[^;]*\1\.write\(\)' \ | ||
| --include='*.rs' -- crates/ ; then | ||
| found=1 | ||
| fi | ||
| if [ "$found" -eq 1 ]; then | ||
| echo "::error::Found reentrant parking_lot lock acquisition on a single statement (regression of issue #437). Bind the guard once: 'let mut g = x.write(); g.field = …;'" | ||
| exit 1 | ||
| fi |
There was a problem hiding this comment.
📝 Info: Regression guard comment claims grep -P supports only one pattern per invocation — this is incorrect
The comment at line 36-37 states 'Bash -P (PCRE) supports backreferences but only ONE pattern per invocation'. This is incorrect — GNU grep's -P flag supports multiple -e patterns per invocation, and each pattern has its own independent capture group namespace, so backreferences like \1 work correctly per-pattern. The old code (grep -rnPe 'pat1' -rnPe 'pat2') was syntactically valid. The split into two greps is functionally equivalent and arguably clearer, but the rationale in the comment is wrong.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
The comment is actually correct — GNU grep's -P flag does not support multiple -e patterns:
$ echo "x.write() x.write()" | grep -P -e '(\S+)\.write\(\)' -e '(\S+)\.read\(\)'
grep: the -P option only supports a single pattern
This is a documented limitation of GNU grep (PCRE mode). The split into two invocations is a necessary fix, not just a clarity improvement.
The 0.1.0→0.1.1 bump was README-only but the five platform-specific optionalDependencies still pin 0.1.0. Per repo convention (visible in @ruvector/router and @ruvector/rvf-node), the wrapper package version must stay in lockstep with its native binaries. Revert to 0.1.0 to avoid ABI skew when platform binaries are eventually republished.
|
ECC bundle files are already tracked in this repository. Skipping generation of another bundle PR. |
…all-features scan The --all-features flag correctly expanded cargo-deny's scope to cover optional-feature deps. This surfaced 5 pre-existing RUSTSEC advisories (derivative, pprof, pqcrypto-dilithium, pqcrypto-kyber, wee_alloc) and 1 license violation (inferno's CDDL-1.0). All are pre-existing transitive deps behind optional features — not introduced by this PR. Each gets a justification + 2026-08-01 re-review date in deny.toml. CDDL-1.0 is OSI-approved and added to the license allow list for the inferno flamegraph library.
|
ECC bundle files are already tracked in this repository. Skipping generation of another bundle PR. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 560e85fc91
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| # The `insert` function in index.rs must reach calculate_distance() | ||
| # AFTER the `> self.config.m * 2` overflow check fires — that is | ||
| # what proves the pruner is distance-aware, not FIFO. | ||
| if ! grep -nE 'calculate_distance' crates/ruvector-router-core/src/index.rs >/dev/null ; then |
There was a problem hiding this comment.
Make pruning regression check actually inspect overflow path
This guard only checks whether calculate_distance appears anywhere in index.rs, but that function is already used in unrelated search code, so the job will still pass even if distance-based pruning is removed from the overflow branch. That leaves the new “distance-based-neighbor-pruning” CI check unable to detect the exact regression it claims to block.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch — fixed in 058e8d8. The check now extracts a 20-line window after the overflow gate (> self.config.m * 2) and verifies calculate_distance appears within that window, rather than grepping the whole file where search code also calls it.
|
|
||
| permissions: | ||
| contents: read | ||
| pull-requests: write |
There was a problem hiding this comment.
Restrict PR-write token to the dependency-review job
pull-requests: write is set at workflow scope, so every job in this workflow (including jobs that execute downloaded toolchains and registry data) receives a write-capable GITHUB_TOKEN. Only the dependency-review step needs this permission for comment-summary-in-pr, so leaving it global unnecessarily expands blast radius if any action or dependency in other jobs is compromised.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Agreed — fixed in 058e8d8. pull-requests: write is now scoped to the dependency-review job only. The other four jobs (cargo-audit, cargo-deny, npm-audit, lockfile-integrity) run with the workflow-level contents: read permission only.
1. supply-chain.yml: Move pull-requests: write from workflow-level to the dependency-review job only. Other jobs (cargo-audit, cargo-deny, npm-audit, lockfile-integrity) don't need write access and should run with read-only tokens to minimize blast radius. 2. regression-guard.yml: The hnsw-distance-based-neighbor-pruning check now verifies calculate_distance() appears within 20 lines of the overflow gate (> self.config.m * 2), not just anywhere in index.rs. The old whole-file grep would pass even if distance-based pruning was removed, because search code also calls calculate_distance. Both issues flagged by Codex review.
|
ECC bundle files are already tracked in this repository. Skipping generation of another bundle PR. |
| let mut entries = Vec::with_capacity(stored_ids.len()); | ||
| for id in &stored_ids { | ||
| if let Some(vector) = storage.get(id)? { | ||
| entries.push((id.clone(), vector)); | ||
| } | ||
| } | ||
| index.insert_batch(entries)?; |
There was a problem hiding this comment.
📝 Info: Storage rebuild reads each vector in a separate transaction (O(N) transactions)
The rebuild loop at crates/ruvector-router-core/src/vector_db.rs:43-47 calls storage.get(id) for each stored ID. Each get() call opens a new read transaction (storage.rs:182). For a database with thousands of vectors, this means thousands of separate transactions. A single-transaction bulk read would be more efficient. This is a startup-only cost so it doesn't affect runtime performance, but for large persisted databases the reopen time could be noticeable.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Acknowledged — this is a startup-only cost and acceptable for the initial fix. A bulk get_all() or single-transaction read is a good follow-up optimization for large persisted databases, but out of scope for this correctness fix.
Supersedes #27
PR #27 ("Fix HNSW correctness issues and improve storage rebuild process") was authored from
ruvnet/RuVector:main(a third-party fork). It became CONFLICTING against currentmainafter months of org-normalization commits landed. Resolving the conflicts in-place would have required pushing to a fork I don't own — so this PR brings the same content intoFlexNetOS/ruvectordirectly.What's in this PR
All 8 commits from #27 plus a merge commit (
1c128662) bringing the branch up to date with currentmain:.github/dependabot.yml(new) — daily npm/cargo/actions update PRs.github/workflows/supply-chain.yml(new) — 5-layer supply-chain CI.github/workflows/regression-guard.yml(updated) — extended coveragedeny.toml(new) — cargo-deny rulescrates/ruvector-router-core/src/{index,vector_db}.rs— HNSW correctness fixes + storage rebuild improvementsexamples/subpolynomial-time/Cargo.toml— minor depsnpm/core/platforms/win32-x64-msvc/ruvector.node— NAPI binarynpm/packages/diskann/{README.md,package.json}— docs + versionnpm/package.json,npm/package-lock.json— npm criticals clearedConflict resolution applied
npm/core/platforms/win32-x64-msvc/ruvector.node— Tookorigin/main's newer NAPI-RS build artifact (regenerable; the newer binary reflects all recent NAPI updates).npm/packages/diskann/README.md— Kept the PR's rich documentation (API table, benchmarks, algorithm notes) overmain's 1-line stub. Normalized 4github.com/ruvnet/ruvectorURLs togithub.com/FlexNetOS/ruvectorto match the canonical org slug.Provenance
Original commits are preserved with their original authors (FlexNetOS). The merge commit (
1c128662) is the resolution.Test plan
🤖 Generated with claude-flow