Skip to content

Supersedes #27: HNSW correctness + supply-chain CI (merge resolved against current main)#30

Merged
FlexNetOS merged 18 commits into
mainfrom
merge/pr27-hnsw-supply-chain-resolved
May 21, 2026
Merged

Supersedes #27: HNSW correctness + supply-chain CI (merge resolved against current main)#30
FlexNetOS merged 18 commits into
mainfrom
merge/pr27-hnsw-supply-chain-resolved

Conversation

@FlexNetOS
Copy link
Copy Markdown
Owner

Supersedes #27

PR #27 ("Fix HNSW correctness issues and improve storage rebuild process") was authored from ruvnet/RuVector:main (a third-party fork). It became CONFLICTING against current main after months of org-normalization commits landed. Resolving the conflicts in-place would have required pushing to a fork I don't own — so this PR brings the same content into FlexNetOS/ruvector directly.

What's in this PR

All 8 commits from #27 plus a merge commit (1c128662) bringing the branch up to date with current main:

  • .github/dependabot.yml (new) — daily npm/cargo/actions update PRs
  • .github/workflows/supply-chain.yml (new) — 5-layer supply-chain CI
  • .github/workflows/regression-guard.yml (updated) — extended coverage
  • deny.toml (new) — cargo-deny rules
  • crates/ruvector-router-core/src/{index,vector_db}.rs — HNSW correctness fixes + storage rebuild improvements
  • examples/subpolynomial-time/Cargo.toml — minor deps
  • npm/core/platforms/win32-x64-msvc/ruvector.node — NAPI binary
  • npm/packages/diskann/{README.md,package.json} — docs + version
  • npm/package.json, npm/package-lock.json — npm criticals cleared

Conflict resolution applied

  1. npm/core/platforms/win32-x64-msvc/ruvector.node — Took origin/main's newer NAPI-RS build artifact (regenerable; the newer binary reflects all recent NAPI updates).
  2. npm/packages/diskann/README.md — Kept the PR's rich documentation (API table, benchmarks, algorithm notes) over main's 1-line stub. Normalized 4 github.com/ruvnet/ruvector URLs to github.com/FlexNetOS/ruvector to match the canonical org slug.

Provenance

Original commits are preserved with their original authors (FlexNetOS). The merge commit (1c128662) is the resolution.

Test plan

🤖 Generated with claude-flow

ruvnet and others added 9 commits May 18, 2026 16:30
…ed pruning + storage rebuild

Three remaining root causes from issue ruvnet#430, plus the storage-rebuild gap from PR ruvnet#460.

  Bug B — insert beam was clamped to ef_construction.min(m * 2). With defaults
          (m=16, ef_construction=200) the beam silently became 32. Late-
          inserted clusters got wired through whatever was near the entry
          point instead of through ef_construction-wide neighbour search.

  Bug C — adjacency-list pruning used `drain(0..drain_count)`, dropping the
          OLDEST edges regardless of distance. Proper HNSW pruning keeps the
          m CLOSEST edges. Now sort by `calculate_distance` to the anchor
          vector and truncate to m. Kept a fallback that preserves the
          newest-m behaviour when the anchor vector lookup fails so we
          never panic on a missing vector.

  Storage — VectorDB::new() always created a fresh empty HnswIndex, so
            previously persisted vectors were invisible to search after
            reopening the database. Now rebuild via storage.get_all_ids()
            + index.insert_batch() on open, and seed VectorDbStats.total_vectors
            with the recovered count.

Tests:
  - test_pruning_keeps_closest_not_newest: builds a hub with 20 close
    neighbours then 6 far neighbours, asserts no "far_*" id appears in
    top-10 around the hub. Fails on FIFO pruning.
  - test_index_rebuilt_from_storage_on_open: writes 5 vectors via one
    VectorDB instance, reopens against the same path, asserts search
    returns the persisted match. Fails on the historical empty-index bug.

Regression-guard CI additions:
  - hnsw-insert-beam-no-m2-clamp: textually forbids the ef_construction.min(m*2)
    pattern in index.rs.
  - hnsw-distance-based-neighbor-pruning: requires calculate_distance and the
    `> m * 2` overflow gate to both live in index.rs.
  - vector-db-rebuilds-index-on-open: requires storage.get_all_ids() in
    vector_db.rs.
  - hnsw-recall-at-1 job now also runs the two new tests.

Supersedes PR ruvnet#460 (CoolDude1969) which covered storage rebuild + an
overlapping heap fix already in main from PR ruvnet#466.

Closes ruvnet#430.

Co-Authored-By: claude-flow <ruv@ruv.net>
Surface the ruvnet#430 HNSW correctness fixes (insert beam, distance-based
pruning, storage rebuild) to npm consumers. Bump applies to the meta
package and all 5 platform-specific subpackages so optionalDependencies
resolve consistently after publish-all.yml runs.

Co-Authored-By: claude-flow <ruv@ruv.net>
The expanded README and 0.1.1 version were already published to npm by
an earlier release, but never committed back to git. Verified identical
to `npm pack @ruvector/diskann@0.1.1`. Bringing the working tree in sync
so future bumps start from a clean baseline.

Co-Authored-By: claude-flow <ruv@ruv.net>
No behaviour change — collapses single-expression closure and assignment
onto one line per rustfmt defaults so the rustfmt CI job passes.

Co-Authored-By: claude-flow <ruv@ruv.net>
The `optional-deps-resolvable-on-npm` regression guard fails because
@ruvector/router-<platform>@0.1.31 doesn't exist on npm yet — those
platform binaries are only published by `publish-all.yml` after a tag is
cut, which happens AFTER this PR merges.

Splitting the work:
  - This PR: HNSW correctness fix + CI guards (keeps regression-guard
    green on every commit).
  - Follow-up release PR: bump @ruvector/router meta + 5 platform
    packages to 0.1.31, tag v0.1.31, publish-all.yml ships the fix.

This commit reverts c5c7e7f and is itself reverted in the release PR.

Co-Authored-By: claude-flow <ruv@ruv.net>
Mirrors the pattern landed on sublinear-time-solver#25:
  1. dependency-review  (PRs only, informational)
  2. cargo-audit        (RustSec advisory DB, vulnerabilities only)
  3. cargo-deny         (license/source/ban policy via deny.toml)
  4. npm-audit          (workspace npm/ at --audit-level=critical)
  5. lockfile-integrity (cargo metadata --locked)

npm criticals cleared via package.json overrides:
  - vm2:                 transitively dropped via @google-cloud/redis 5.x
  - fast-xml-parser:     >=5.7.0 (was <=5.6.0 vuln)
  - protobufjs:          >=7.5.6 (was <=7.5.5 vuln)
  - @google-cloud/redis: >=5.0.0 (was <=3.3.0 vuln)
  - handlebars:          picked up >=4.7.9 via override resolution

Result: 73 vulns → 33 (3 crit → 0, 36 high → 19, 17 medium → 5).
19 highs remain (mostly devDep transitives + ML helpers) and are
tracked via the new dependabot.yml — Dependabot will chip away
weekly.

deny.toml ignore-list with re-review dates covers:
  - RUSTSEC-2023-0071  rsa Marvin Attack (no patched version yet,
                       local-only signing for Kalshi API; re-review
                       2026-08-01)
  - RUSTSEC-2026-0097  rand unsoundness (not triggerable in our
                       usage — no logging inside RNG draws)
  - RUSTSEC-2026-0115/0116/0117  imageproc unsoundness (scipix
                       offline examples only, never published)
  - 8 unmaintained advisories (paste, bincode, instant, rand_os,
    proc-macro-error, rustls-pemfile, rusttype, number_prefix,
    core2) — all transitive, no CVE, tracked for migration

Added BSL-1.0, CDLA-Permissive-2.0, NCSA licenses to allowlist
(present in transitive deps via xxhash-rust, tch-rs, LLVM family).

dependabot.yml schedules weekly Tuesday 09:35 UTC for cargo +
npm + github-actions ecosystems with patch+minor grouping.

Co-Authored-By: claude-flow <ruv@ruv.net>
ci(security): 5-layer supply-chain CI + clear 3 npm criticals
  Built from commit b9bb370

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
# Conflicts:
#	npm/core/platforms/win32-x64-msvc/ruvector.node
#	npm/packages/diskann/README.md
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

if grep -rnPe '(\S+)\.write\(\)[^;]*\1\.(write|read)\(\)' \
-rnPe '(\S+)\.read\(\)[^;]*\1\.write\(\)' \

P2 Badge Fix PCRE grep invocation in deadlock guard job

The reentrant-lock guard uses a single grep -P command with two -e patterns (-rnPe ... -rnPe ...), but GNU grep's PCRE mode supports only one pattern, so this command exits with an error instead of evaluating either check. Because it's inside an if ...; then condition, the script treats that failure as a non-match and continues, which silently disables this regression guard and lets reentrant RwLock patterns slip through CI.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread .github/dependabot.yml Outdated
# to users).
ignore:
- dependency-name: "*"
dependency-type: "development"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Replace invalid ignore key with a supported Dependabot rule

The npm update block uses ignore with dependency-type: "development", but dependency-type is not a supported field for ignore entries (it is supported for allow/groups). This can make the Dependabot config invalid or cause the rule to be ignored, which means the intended filtering of devDependency updates will not work and can stop expected npm dependency-update PRs from being generated.

Useful? React with 👍 / 👎.

…y-chain-resolved

# Conflicts:
#	npm/core/platforms/win32-x64-msvc/ruvector.node
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b5858b9a21

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread .github/workflows/supply-chain.yml Outdated

permissions:
contents: read
pull-requests: read
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Grant PR write permission for dependency-review comments

Enable pull-requests: write when comment-summary-in-pr: on-failure is set; the dependency-review-action docs/examples require write scope to post PR summaries, and with only pull-requests: read this step cannot comment and may fail noisily (currently masked by continue-on-error). In PR runs, that effectively disables the intended Layer 1 review signal even when the repository supports dependency review.

Useful? React with 👍 / 👎.

Comment thread deny.toml
Comment on lines +19 to +22
{ triple = "x86_64-unknown-linux-gnu" },
{ triple = "aarch64-unknown-linux-gnu" },
{ triple = "x86_64-apple-darwin" },
{ triple = "aarch64-apple-darwin" },
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Include Windows targets in cargo-deny graph checks

cargo deny check only evaluates dependency edges that match the configured [graph].targets; with just Linux/macOS triples here, Windows-only crates are excluded from advisories/license/source checks. This repo clearly ships/tests Windows artifacts (x86_64-pc-windows-msvc appears across multiple workflows), so the current target set leaves a real blind spot in the new supply-chain policy.

Useful? React with 👍 / 👎.

FlexNetOS and others added 2 commits May 21, 2026 03:04
Resolved conflict in npm/core/platforms/win32-x64-msvc/ruvector.node
by taking origin/main's version (newer NAPI-RS rebuild).
CI rebuilds binaries on push regardless.

Co-Authored-By: claude-flow <ruv@ruv.net>
- Run cargo fmt on rvf-runtime/tests/agi_e2e.rs (assert! macro
  wrapping was the only rustfmt diff)
- Expand dependency-review allow-licenses to cover SPDX expressions
  appearing in our resolved transitive graph (BlueOak-1.0.0, MIT-0,
  Ruby) and add allow-dependencies for @anthropic-ai/claude-code*
  packages whose license ships as a README reference (LicenseRef-bad).

Closes Rustfmt + dependency-review failures observed on PR 30.

Co-Authored-By: claude-flow <ruv@ruv.net>
@ecc-tools
Copy link
Copy Markdown

ecc-tools Bot commented May 21, 2026

ECC bundle files are already tracked in this repository. Skipping generation of another bundle PR.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2a3ddbf698

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

name: dependency-review (PRs only)
runs-on: ubuntu-latest
if: github.event_name == 'pull_request'
continue-on-error: true
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Make dependency-review failures block PRs

Remove continue-on-error: true from the dependency-review job, because with this setting any vulnerability/license failure from actions/dependency-review-action is treated as non-blocking and the workflow can still pass. In PRs that introduce a new moderate/high vulnerability (or license violation), this silently bypasses the intended Layer 1 gate and allows merge unless another layer independently catches it.

Useful? React with 👍 / 👎.

Comment thread .github/workflows/supply-chain.yml Outdated
# `LicenseRef-bad-see-license-in-…`. Allow-list the package + per-
# platform binary variants by name so the unparseable LicenseRef
# doesn't gate PRs.
allow-dependencies: '@anthropic-ai/claude-code, @anthropic-ai/claude-code-darwin-arm64, @anthropic-ai/claude-code-darwin-x64, @anthropic-ai/claude-code-linux-arm64, @anthropic-ai/claude-code-linux-arm64-musl, @anthropic-ai/claude-code-linux-x64, @anthropic-ai/claude-code-linux-x64-musl, @anthropic-ai/claude-code-win32-arm64, @anthropic-ai/claude-code-win32-x64'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use a supported dependency-review allowlist input

Replace allow-dependencies with the action’s supported input (allow-dependencies-licenses using purl values), because allow-dependencies is not a recognized dependency-review-action parameter and will be ignored. As written, the Anthropic package exception does not actually apply, so PRs can still fail on those license detections despite the intended allowlist.

Useful? React with 👍 / 👎.

1. supply-chain.yml: replace invalid 'allow-dependencies' input with
   'allow-dependencies-licenses' (the v4 action rejects the former,
   causing dependency-review to fail on @anthropic-ai/claude-code
   license detection)

2. supply-chain.yml: upgrade permissions to pull-requests: write so
   comment-summary-in-pr: on-failure can post PR summaries

3. dependabot.yml: replace invalid 'dependency-type' under 'ignore'
   (only valid under 'allow') with allow: [{dependency-type: production}]

4. regression-guard.yml: split dual-pattern PCRE grep into two separate
   invocations — GNU grep -P supports only one pattern per call; the
   prior form silently errored and disabled the reentrant-lock guard

5. deny.toml: add x86_64-pc-windows-msvc target triple so cargo-deny
   checks cover Windows deps (the repo ships win32 NAPI artifacts)
@ecc-tools
Copy link
Copy Markdown

ecc-tools Bot commented May 21, 2026

ECC bundle files are already tracked in this repository. Skipping generation of another bundle PR.

The dependency-review-action@v4 requires package-url (PURL) format
for the allow-dependencies-licenses input. Bare npm package names
cause 'package-url must start with pkg:' parse errors.

Format: pkg:npm/%40<scope>/<name> (percent-encoded @ for scoped pkgs)
@ecc-tools
Copy link
Copy Markdown

ecc-tools Bot commented May 21, 2026

ECC bundle files are already tracked in this repository. Skipping generation of another bundle PR.

Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 8 new potential issues.

Open in Devin Review

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Info: index.rs exceeds 500-line guideline from CLAUDE.md

CLAUDE.md's Project Architecture section states "Keep files under 500 lines." index.rs was already ~558 lines before this PR and is now 650 lines after adding the pruning logic and test_pruning_keeps_closest_not_newest test. The violation is pre-existing, but the PR extends it. The added code (pruning logic + regression test) is correctly co-located with the HNSW index implementation, so splitting requires a thoughtful refactor (e.g., extracting tests to a separate tests/ module or the pruning helper to a submodule).

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment thread deny.toml
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Info: deny.toml placed in repo root vs CLAUDE.md file organization rules

The CLAUDE.md states 'NEVER save to root folder' and 'Use /config for configuration files'. However, deny.toml is a standard Rust ecosystem tool config that cargo deny check (in .github/workflows/supply-chain.yml:118) expects at the workspace root by default. The repo already has other config files in the root (Cargo.toml, package.json, .mcp.json). The tool convention effectively requires root placement unless --config config/deny.toml is passed to the CI command. This is a pragmatic exception to the organizational rule rather than a meaningful violation, since relocating it would require updating the CI workflow.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +129 to 132
let vectors_snapshot = self.vectors.read();

// Re-acquire graph lock for modifications
let mut graph = self.graph.write();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Info: vectors.read() held during pruning widens the critical section vs old code

The new code at crates/ruvector-router-core/src/index.rs:129 acquires self.vectors.read() and holds it across the entire graph modification loop (through line 176). In the old code, only graph.write() was held during connection modifications. The new pattern additionally blocks any concurrent self.vectors.write() (i.e., the very first step of every other concurrent insert call at line 97). This is not a deadlock (lock ordering is consistent: vectors → graph across all paths), but it serializes the initial vector store step of concurrent inserts during the pruning phase, which is a minor throughput regression for high-concurrency workloads. The test_hnsw_concurrent_inserts test passes because it validates correctness, not throughput.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines 115 to 116
return Ok(());
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Info: is_first variable is always false — dead code at lines 114-116

The is_first block (lines 102-112) always evaluates to false because when the entry point is None, the function returns early at line 109 (return Ok(())). The check at lines 114-116 (if is_first { return Ok(()); }) is therefore unreachable dead code. This is a pre-existing pattern not introduced by this PR, but worth noting as it slightly obscures the control flow.

(Refers to lines 114-116)

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +39 to +48
let stored_ids = storage.get_all_ids()?;
let total_vectors = stored_ids.len();
if !stored_ids.is_empty() {
let mut entries = Vec::with_capacity(stored_ids.len());
for id in &stored_ids {
if let Some(vector) = storage.get(id)? {
entries.push((id.clone(), vector));
}
}
index.insert_batch(entries)?;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Info: total_vectors stat assumes all stored IDs have retrievable vectors

In VectorDB::new, total_vectors is set to stored_ids.len() at line 40, but the subsequent loop at lines 43-46 only pushes entries where storage.get(id) returns Some. If any stored ID lacks a decodable vector (data corruption), the stat would overcount relative to what's actually in the index. In practice this shouldn't occur — all IDs in VECTORS_TABLE should have valid bincode-encoded vectors. The count() method (line 198-199) delegates to storage.count() which reads the table length directly, so count() and total_vectors would agree even in the corruption case — both would overcount. Minimal practical impact.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment thread .github/workflows/regression-guard.yml Outdated
Comment on lines +152 to +164
let mut scored: Vec<(String, f32)> = neighbor_connections
.drain(..)
.filter_map(|cid| {
vectors_snapshot.get(&cid).map(|cv| {
let d = calculate_distance(anchor_vec, cv, self.config.metric)
.unwrap_or(f32::MAX);
(cid, d)
})
})
.collect();
scored.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(Ordering::Equal));
scored.truncate(self.config.m);
*neighbor_connections = scored.into_iter().map(|(cid, _)| cid).collect();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Info: filter_map in pruning silently drops connections for missing vectors

At crates/ruvector-router-core/src/index.rs:154, the filter_map skips any cid whose vector is not found in vectors_snapshot. If this occurs, the connection is permanently lost from the adjacency list after drain(..). In theory this cannot happen because all IDs in the graph must have their vectors stored (vectors.write() at line 97 always executes before graph connections are established), and the snapshot is taken after all prior writes are visible. However, if a future code change introduces a path where a vector ID appears in graph connections without a corresponding entry in the vectors HashMap (e.g., a partial failure during batch insert), this would silently degrade graph connectivity rather than raising an error.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment thread npm/packages/diskann/package.json Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 84d6ab2f46

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread .github/workflows/supply-chain.yml Outdated
- name: install cargo-deny
run: cargo install --locked cargo-deny
- name: cargo deny check
run: cargo deny check
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Run cargo-deny with all Cargo features enabled

The new cargo deny check step only scans the default feature graph, so advisories/licenses on dependencies behind non-default features are skipped. cargo-deny defaults all-features to false unless you pass --all-features or set [graph].all-features = true, and this workspace has many optional feature-gated deps (for example crates with default = [] plus optional dependencies). As written, PRs can introduce a vulnerable crate through an opt-in feature without tripping this supply-chain gate.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6c7ed83 — added --all-features to the cargo deny check command and set all-features = true in deny.toml so non-default feature-gated deps are scanned.

Comment thread deny.toml
Comment on lines +20 to +24
{ triple = "aarch64-unknown-linux-gnu" },
{ triple = "x86_64-apple-darwin" },
{ triple = "aarch64-apple-darwin" },
{ triple = "x86_64-pc-windows-msvc" },
]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Add missing shipped target triples to cargo-deny graph

The target filter in deny.toml still omits platforms this repo builds and publishes for (for example aarch64-pc-windows-msvc and *-unknown-linux-musl appear in CI/workspace package targets), and cargo-deny ignores target-specific dependency edges when the target isn’t listed. That leaves a real policy blind spot where advisories/license/source violations in those platform-only dependencies won’t fail the supply-chain gate.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6c7ed83 — added the 3 missing shipped targets to deny.toml:

  • x86_64-unknown-linux-musl
  • aarch64-unknown-linux-musl
  • aarch64-pc-windows-msvc

These all appear in CI build matrices (NAPI/SONA workflows) but were previously unchecked.

Address Codex review P2 findings:

1. Run cargo deny with --all-features so deps behind non-default
   feature gates are also scanned for advisories/licenses/sources.
   Set [graph].all-features = true in deny.toml as the canonical config.

2. Add the 3 shipped targets missing from deny.toml:
   - x86_64-unknown-linux-musl  (NAPI/SONA builds)
   - aarch64-unknown-linux-musl (NAPI/SONA builds)
   - aarch64-pc-windows-msvc    (NAPI builds)
   These triples appear in CI build matrices but were unchecked by
   cargo-deny, leaving a blind spot for platform-specific advisories.
@ecc-tools
Copy link
Copy Markdown

ecc-tools Bot commented May 21, 2026

ECC bundle files are already tracked in this repository. Skipping generation of another bundle PR.

Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 new potential issues.

Open in Devin Review

Comment thread npm/packages/diskann/package.json Outdated
Comment on lines +38 to 50
found=0
if grep -rnP '(\S+)\.write\(\)[^;]*\1\.(write|read)\(\)' \
--include='*.rs' -- crates/ ; then
found=1
fi
if grep -rnP '(\S+)\.read\(\)[^;]*\1\.write\(\)' \
--include='*.rs' -- crates/ ; then
found=1
fi
if [ "$found" -eq 1 ]; then
echo "::error::Found reentrant parking_lot lock acquisition on a single statement (regression of issue #437). Bind the guard once: 'let mut g = x.write(); g.field = …;'"
exit 1
fi
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot May 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Info: Regression guard comment claims grep -P supports only one pattern per invocation — this is incorrect

The comment at line 36-37 states 'Bash -P (PCRE) supports backreferences but only ONE pattern per invocation'. This is incorrect — GNU grep's -P flag supports multiple -e patterns per invocation, and each pattern has its own independent capture group namespace, so backreferences like \1 work correctly per-pattern. The old code (grep -rnPe 'pat1' -rnPe 'pat2') was syntactically valid. The split into two greps is functionally equivalent and arguably clearer, but the rationale in the comment is wrong.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is actually correct — GNU grep's -P flag does not support multiple -e patterns:

$ echo "x.write() x.write()" | grep -P -e '(\S+)\.write\(\)' -e '(\S+)\.read\(\)'
grep: the -P option only supports a single pattern

This is a documented limitation of GNU grep (PCRE mode). The split into two invocations is a necessary fix, not just a clarity improvement.

The 0.1.0→0.1.1 bump was README-only but the five platform-specific
optionalDependencies still pin 0.1.0. Per repo convention (visible in
@ruvector/router and @ruvector/rvf-node), the wrapper package version
must stay in lockstep with its native binaries. Revert to 0.1.0 to
avoid ABI skew when platform binaries are eventually republished.
@ecc-tools
Copy link
Copy Markdown

ecc-tools Bot commented May 21, 2026

ECC bundle files are already tracked in this repository. Skipping generation of another bundle PR.

…all-features scan

The --all-features flag correctly expanded cargo-deny's scope to cover
optional-feature deps. This surfaced 5 pre-existing RUSTSEC advisories
(derivative, pprof, pqcrypto-dilithium, pqcrypto-kyber, wee_alloc) and
1 license violation (inferno's CDDL-1.0).

All are pre-existing transitive deps behind optional features — not
introduced by this PR. Each gets a justification + 2026-08-01 re-review
date in deny.toml. CDDL-1.0 is OSI-approved and added to the license
allow list for the inferno flamegraph library.
@ecc-tools
Copy link
Copy Markdown

ecc-tools Bot commented May 21, 2026

ECC bundle files are already tracked in this repository. Skipping generation of another bundle PR.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 560e85fc91

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread .github/workflows/regression-guard.yml Outdated
# The `insert` function in index.rs must reach calculate_distance()
# AFTER the `> self.config.m * 2` overflow check fires — that is
# what proves the pruner is distance-aware, not FIFO.
if ! grep -nE 'calculate_distance' crates/ruvector-router-core/src/index.rs >/dev/null ; then
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Make pruning regression check actually inspect overflow path

This guard only checks whether calculate_distance appears anywhere in index.rs, but that function is already used in unrelated search code, so the job will still pass even if distance-based pruning is removed from the overflow branch. That leaves the new “distance-based-neighbor-pruning” CI check unable to detect the exact regression it claims to block.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in 058e8d8. The check now extracts a 20-line window after the overflow gate (> self.config.m * 2) and verifies calculate_distance appears within that window, rather than grepping the whole file where search code also calls it.

Comment thread .github/workflows/supply-chain.yml Outdated

permissions:
contents: read
pull-requests: write
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Restrict PR-write token to the dependency-review job

pull-requests: write is set at workflow scope, so every job in this workflow (including jobs that execute downloaded toolchains and registry data) receives a write-capable GITHUB_TOKEN. Only the dependency-review step needs this permission for comment-summary-in-pr, so leaving it global unnecessarily expands blast radius if any action or dependency in other jobs is compromised.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — fixed in 058e8d8. pull-requests: write is now scoped to the dependency-review job only. The other four jobs (cargo-audit, cargo-deny, npm-audit, lockfile-integrity) run with the workflow-level contents: read permission only.

1. supply-chain.yml: Move pull-requests: write from workflow-level to
   the dependency-review job only. Other jobs (cargo-audit, cargo-deny,
   npm-audit, lockfile-integrity) don't need write access and should
   run with read-only tokens to minimize blast radius.

2. regression-guard.yml: The hnsw-distance-based-neighbor-pruning check
   now verifies calculate_distance() appears within 20 lines of the
   overflow gate (> self.config.m * 2), not just anywhere in index.rs.
   The old whole-file grep would pass even if distance-based pruning
   was removed, because search code also calls calculate_distance.

Both issues flagged by Codex review.
@ecc-tools
Copy link
Copy Markdown

ecc-tools Bot commented May 21, 2026

ECC bundle files are already tracked in this repository. Skipping generation of another bundle PR.

Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

Open in Devin Review

Comment on lines +42 to +48
let mut entries = Vec::with_capacity(stored_ids.len());
for id in &stored_ids {
if let Some(vector) = storage.get(id)? {
entries.push((id.clone(), vector));
}
}
index.insert_batch(entries)?;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Info: Storage rebuild reads each vector in a separate transaction (O(N) transactions)

The rebuild loop at crates/ruvector-router-core/src/vector_db.rs:43-47 calls storage.get(id) for each stored ID. Each get() call opens a new read transaction (storage.rs:182). For a database with thousands of vectors, this means thousands of separate transactions. A single-transaction bulk read would be more efficient. This is a startup-only cost so it doesn't affect runtime performance, but for large persisted databases the reopen time could be noticeable.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged — this is a startup-only cost and acceptable for the initial fix. A bulk get_all() or single-transaction read is a good follow-up optimization for large persisted databases, but out of scope for this correctness fix.

@FlexNetOS FlexNetOS merged commit 8fdc94c into main May 21, 2026
69 of 70 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants