MOD-15578 Track shared SVS thread pool memory & expose it through public API #972
dor-forer wants to merge 10 commits into
🛡️ Jit Security Scan Results
✅ No security findings were detected in this PR
Security scan by Jit
Pull request overview
This PR improves VecSim memory accounting by routing the shared SVS thread pool singleton allocations through a tracked allocator and exposing that process-wide (non-index) memory via a new public C API and additional VECSIM_INFO fields.
Changes:
- Added `VecSim_GetGlobalMemory()` and appended `GLOBAL_MEMORY` to the top-level debug info iterator returned by `VecSimIndex_DebugInfoIterator`.
- Tracked shared SVS thread pool allocations and exposed them via `SHARED_SVS_THREADPOOL_MEMORY` in SVS debug info.
- Updated unit-test comparators/field-order expectations and added a test to enforce the invariant between `GLOBAL_MEMORY`, `SHARED_SVS_THREADPOOL_MEMORY`, and `VecSim_GetGlobalMemory()`.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/unit/unit_test_utils.h | Extends debug-info comparator signatures to optionally expect GLOBAL_MEMORY. |
| tests/unit/unit_test_utils.cpp | Updates debug-info comparators and expected field orders to include GLOBAL_MEMORY / SHARED_SVS_THREADPOOL_MEMORY. |
| tests/unit/test_svs.cpp | Adds a typed test validating global-memory vs shared-threadpool-memory invariants. |
| tests/unit/test_svs_tiered.cpp | Updates SVS threadpool wrapper construction to pass a VecSimAllocator. |
| tests/unit/test_svs_threadpool.cpp | Updates threadpool wrapper tests to pass an allocator and resets shared pool state in setup. |
| src/VecSim/vec_sim.h | Adds the new public API declaration VecSim_GetGlobalMemory(). |
| src/VecSim/vec_sim.cpp | Implements VecSim_GetGlobalMemory() and appends GLOBAL_MEMORY in the C API debug iterator wrapper. |
| src/VecSim/vec_sim_common.h | Clarifies that per-index memory excludes process-wide/global allocations. |
| src/VecSim/utils/vec_utils.h | Adds new debug field name constants for GLOBAL_MEMORY / SHARED_SVS_THREADPOOL_MEMORY. |
| src/VecSim/utils/vec_utils.cpp | Defines the new debug field name strings. |
| src/VecSim/index_factories/svs_factory.cpp | Accounts for per-index threadpool wrapper allocation in SVS initial size estimation. |
| src/VecSim/algorithms/svs/svs.h | Constructs the per-index SVS threadpool wrapper with the index allocator and adds SHARED_SVS_THREADPOOL_MEMORY to debug info. |
| src/VecSim/algorithms/svs/svs_utils.h | Introduces a tracked allocator for the shared pool, tracks allocations, and changes wrapper ctor to allocate parallelism via index allocator. |
introduce VecSim_GlobalStatsInfo
31c0e0a to
0adca87
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 0adca87.
The doc string claimed the field is exposed only in SVS tiered indexes, but `SVSIndex::debugInfoIterator()` emits it for flat SVS indexes too (and inside the BACKEND_INDEX section of tiered SVS).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files
Coverage Diff
| | main | #972 | +/- |
|---|---|---|---|
| Coverage | 96.99% | 96.97% | -0.03% |
| Files | 130 | 130 | |
| Lines | 7793 | 7829 | +36 |
| Hits | 7559 | 7592 | +33 |
| Misses | 234 | 237 | +3 |
* Drop the unconditional `#include "VecSim/algorithms/svs/svs_utils.h"` at the top of `unit_test_utils.cpp` (added in 3c2023a). The header pulls in `<svs/core/distance.h>`, `<svs/index/vamana/dynamic_index.h>`, etc., none of which are available when `USE_SVS=OFF`. The HAVE_SVS-guarded include lower in the file already provides the symbols `validateSVSIndexAttributesInfo` needs.
* Wrap `compareSVSIndexInfoToIterator` (definition + declaration in the header) and its call site inside `chooseCompareIndexInfoToIterator` with `#if HAVE_SVS`. The function references `VecSimSVSThreadPool`, which is undefined when SVS is disabled.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
0adca87 to
9d082f0
- svs_factory.cpp: mark unused 'p' variable with [[maybe_unused]] to silence -Wunused-variable under -Werror (Copilot review).
- vec_utils.h: rewrite SHARED_SVS_THREADPOOL_MEMORY_STRING doc comment to match actual emission behavior: emitted by SVSIndex::debugInfoIterator() for both non-tiered (top-level) and tiered (BACKEND_INDEX) SVS responses (Copilot review).

The shared `VecSimSVSThreadPool` singleton was previously created via raw `new` with the default allocator, so its slot vector and per-slot `ThreadSlot` objects bypassed `VecSimAllocator` and were invisible to any memory accounting downstream (`FT.INFO`, `INFO MODULES`, etc.).

This PR:
- Routes the shared pool's allocations through a dedicated `VecSimAllocator` (using `VecsimSTLAllocator` for the slot vector and `std::allocate_shared` for thread objects).
- Adds `size_t VecSim_GetGlobalMemory(void)`, returning the total bytes of process-wide VecSim allocations not tied to any single index; today this equals the shared SVS thread pool's tracked allocation size.
- Exposes it in `VECSIM_INFO` via two new fields:
  - `GLOBAL_MEMORY`: appended at the top level of every algorithm's debug response by the C API wrapper `VecSimIndex_DebugInfoIterator`. Always present (value may be 0).
  - `SHARED_SVS_THREADPOOL_MEMORY`: appended at the end of any SVS algorithm section by `SVSIndex::debugInfoIterator()`. Present at the top level of a non-tiered SVS response, or inside `BACKEND_INDEX` of a tiered SVS response.

Public API change
Before
// (no API to query process-wide VecSim memory)
After
size_t VecSim_GetGlobalMemory(void);
Callers (e.g. RediSearch) can fold this into per-spec or process-wide memory metrics without depending on which algorithm contributes.
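As a rough illustration of the caller side, the sketch below sums per-index memory values and adds the process-wide figure exactly once, which is how double-counting is avoided when aggregating indexes. The names `VecSim_GetGlobalMemory_stub` and `totalVecSimMemory` are hypothetical, and the stub returns a made-up constant so the example compiles without the library; a real caller would use the value returned by `VecSim_GetGlobalMemory()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for VecSim_GetGlobalMemory(); the value 4096 is
// invented for illustration and does not reflect any real pool size.
inline std::size_t VecSim_GetGlobalMemory_stub() { return 4096; }

// Hypothetical caller-side helper: add up the per-index memory values
// (which exclude global allocations) and add the global figure once.
inline std::size_t totalVecSimMemory(const std::vector<std::size_t> &perIndexMemory,
                                     std::size_t globalMemory) {
    std::size_t total = 0;
    for (std::size_t bytes : perIndexMemory) {
        total += bytes;
    }
    return total + globalMemory;
}
```

Passing the global figure as a parameter, rather than calling the API inside the loop, keeps it from being added once per index.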
`VECSIM_INFO` (`FT.DEBUG`) output change
Common header (every algorithm, unchanged)

FLAT: 11 → 12 fields
<common header × 10>
BLOCK_SIZE
+ GLOBAL_MEMORY

HNSW: 18 → 19 fields
<common header × 10>
BLOCK_SIZE, M, EF_CONSTRUCTION, EF_RUNTIME, MAX_LEVEL, ENTRYPOINT, EPSILON, NUMBER_OF_MARKED_DELETED
+ GLOBAL_MEMORY

SVS (non-tiered): 25 → 27 fields

Tiered HNSW: 16 → 17 fields
<common header × 10> (ALGORITHM = "TIERED")
MANAGEMENT_LAYER_MEMORY, BACKGROUND_INDEXING, TIERED_BUFFER_LIMIT
FRONTEND_INDEX = nested [<FLAT fields, 11>] (no GLOBAL_MEMORY in nested)
BACKEND_INDEX = nested [<HNSW fields, 18>] (no GLOBAL_MEMORY in nested)
TIERED_HNSW_SWAP_JOBS_THRESHOLD
+ GLOBAL_MEMORY

Tiered SVS: 18 → 19 fields
Two emission rules
- `GLOBAL_MEMORY` is appended exactly once, at the outermost iterator level, by the C API wrapper. It never appears inside a nested `FRONTEND_INDEX` / `BACKEND_INDEX`.
- `SHARED_SVS_THREADPOOL_MEMORY` is appended at the end of any SVS algorithm section by `SVSIndex::debugInfoIterator()`. So it shows up at the top level of a non-tiered SVS response, or inside `BACKEND_INDEX` of a tiered SVS response, and is never duplicated.

Stats / API output change
`VecSim_GetGlobalMemory()`
Before: API did not exist. The pool's slot vector and per-slot `ThreadSlot` objects went through the default allocator and were not tracked anywhere.
After:
Per-index `getAllocationSize()` (SVS only)
Before: Did not include any per-index portion of the parallelism slot, since the pool was untracked entirely.
After: Each SVS index's per-index allocator now tracks its own `parallelism` slot (a small fixed-size structure inside the index, allocated through the index's `VecSimAllocator`). The shared pool itself remains process-wide and is reported via `VecSim_GetGlobalMemory()`.

Cross-field invariant
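Since the SVS thread pool is currently the only contributor to global memory, `GLOBAL_MEMORY`, `SHARED_SVS_THREADPOOL_MEMORY`, and `VecSim_GetGlobalMemory()` all report the same number of bytes.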
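The check behind this invariant can be sketched roughly as follows. This is not the PR's gtest: `DebugFields` is a hypothetical flattened (name, value) view of a non-tiered SVS debug response, and both helper names are invented for illustration. Only the two field-name strings come from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flattened view of a debug response: (field name, numeric value).
using DebugFields = std::vector<std::pair<std::string, std::size_t>>;

// True when `name` occurs exactly once and carries the value `expected`.
inline bool fieldOnceWithValue(const DebugFields &fields, const std::string &name,
                               std::size_t expected) {
    std::size_t count = 0;
    bool matches = false;
    for (const auto &field : fields) {
        if (field.first == name) {
            ++count;
            matches = (field.second == expected);
        }
    }
    return count == 1 && matches;
}

// Both new fields must appear once and equal the process-wide figure.
inline bool globalMemoryInvariantHolds(const DebugFields &fields, std::size_t globalMemory) {
    return fieldOnceWithValue(fields, "GLOBAL_MEMORY", globalMemory) &&
           fieldOnceWithValue(fields, "SHARED_SVS_THREADPOOL_MEMORY", globalMemory);
}
```

If a second contributor to global memory were added without a matching breakdown field, `GLOBAL_MEMORY` would exceed `SHARED_SVS_THREADPOOL_MEMORY` and a check of this shape would fail.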
This invariant is enforced by the new gtest `SVSTest.debugInfoGlobalMemoryEqualsSharedSVSThreadPoolMemory`. If a future contributor is added to `VecSim_GetGlobalMemory()` without updating breakdowns, the test will catch the drift.

Tests
- `SVSTest.debugInfoGlobalMemoryEqualsSharedSVSThreadPoolMemory` (typed test, runs per SVS data type): asserts both new fields exist exactly once in a non-tiered SVS response and report the same bytes as `VecSim_GetGlobalMemory()`.
- Updated `compareFlatIndexInfoToIterator`, `compareHNSWIndexInfoToIterator`, `compareSVSIndexInfoToIterator` to take an `expect_global_memory` parameter (default `true`). This is needed because these comparators are called both with the C API iterator (top level, has GLOBAL_MEMORY) and as nested-backend comparators inside `compareTieredIndexInfoToIterator` (no GLOBAL_MEMORY).
- `SVS` field-count constant: 25 → 26.
- `TIERED_SVS` constant is unchanged: the new `SHARED_SVS_THREADPOOL_MEMORY` field lives inside the nested SVS `BACKEND_INDEX` iterator (already counted as one entry at the tiered level), and `GLOBAL_MEMORY` is added via the comparator's `+1` when called with the C API iterator.
- `FLAT` / `HNSW` / `TIERED_HNSW` field-count constants represent the C++ method count (no GLOBAL_MEMORY); the comparators add `+1` when called with the C API iterator.
- `getFlatFields()`, `getHNSWFields()`, `getTieredHNSWFields()`, `getSVSFields()`, `getTieredSVSFields()` updated to append `GLOBAL_MEMORY` (and `SHARED_SVS_THREADPOOL_MEMORY` for SVS) at the new field positions.

Compatibility
- New fields are added to `VECSIM_INFO`. Existing consumers parsing by field name are unaffected; consumers indexing by position must shift their expectations accordingly (covered above).
- `VecSim_GetGlobalMemory()` is purely additive.

Mark if applicable
Supersedes #967; ownership transferred to @dor-forer. Original branch authored by @meiravgri; identical commits cherry-pushed to `dor-forer-MOD-15578-track-svs-thpool-memory` so a non-self reviewer can be assigned.

Note
Medium Risk
Adds public API and VECSIM_INFO fields that downstream metrics must interpret; memory totals change for SVS workloads but behavior is otherwise unchanged.
Overview
This PR makes shared SVS thread pool memory visible to operators and embedders instead of leaving it untracked on the default allocator.
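To make "tracked" concrete, here is a minimal sketch of the general technique: a counting allocator shared by a vector and by `std::allocate_shared`. It is not the library's `VecSimAllocator` or `VecsimSTLAllocator`; `ByteCounter`, `CountingAllocator`, `Slot`, and `trackSlots` are hypothetical names, and `Slot` merely stands in for a per-slot object.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Hypothetical byte counter standing in for an allocator's bookkeeping.
struct ByteCounter {
    std::size_t bytes = 0;
};

// Minimal standard-conforming allocator that records every allocation.
template <typename T>
struct CountingAllocator {
    using value_type = T;
    ByteCounter *counter;

    explicit CountingAllocator(ByteCounter *c) noexcept : counter(c) {}
    template <typename U>
    CountingAllocator(const CountingAllocator<U> &other) noexcept : counter(other.counter) {}

    T *allocate(std::size_t n) {
        counter->bytes += n * sizeof(T);
        return static_cast<T *>(::operator new(n * sizeof(T)));
    }
    void deallocate(T *p, std::size_t n) noexcept {
        counter->bytes -= n * sizeof(T);
        ::operator delete(p);
    }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T> &a, const CountingAllocator<U> &b) noexcept {
    return a.counter == b.counter;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T> &a, const CountingAllocator<U> &b) noexcept {
    return !(a == b);
}

struct Slot {  // hypothetical stand-in for a per-slot object
    int id = 0;
};

struct TrackingResult {
    std::size_t bytesWhileAlive;
    std::size_t bytesAfterRelease;
};

// Builds `n` slots through the counting allocator and reports tracked bytes
// while the slots are alive and after everything has been released.
inline TrackingResult trackSlots(std::size_t n) {
    ByteCounter counter;
    TrackingResult result{0, 0};
    {
        using SlotPtr = std::shared_ptr<Slot>;
        CountingAllocator<SlotPtr> vecAlloc(&counter);
        std::vector<SlotPtr, CountingAllocator<SlotPtr>> slots(vecAlloc);
        slots.reserve(n);
        for (std::size_t i = 0; i < n; ++i) {
            // Object and control block are both allocated via the counter.
            slots.push_back(std::allocate_shared<Slot>(CountingAllocator<Slot>(&counter)));
        }
        result.bytesWhileAlive = counter.bytes;
    }
    result.bytesAfterRelease = counter.bytes;
    return result;
}
```

With a raw `new` on the default allocator, none of these bytes would reach the counter, which is the gap the PR describes.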
The shared `VecSimSVSThreadPoolImpl` singleton now allocates its slot vector and `ThreadSlot` objects through a dedicated `VecSimAllocator` (`VecsimSTLAllocator` / `allocate_shared`). Per-index `VecSimSVSThreadPool` wrappers take the index allocator and track the small `parallelism_` control block there; initial size estimation includes that cost.

`VecSim_GetGlobalMemory()` reports process-wide bytes not attributed to a single index (today: the shared pool). `VecSimIndex_DebugInfoIterator` always appends `GLOBAL_MEMORY` at the outermost level; SVS adds `SHARED_SVS_THREADPOOL_MEMORY` in `debugInfoIterator()`. `VecSimIndexStatsInfo` docs clarify per-index `memory` excludes global pool bytes to avoid double-counting when aggregating indexes.

An `isInitialized` flag lets global memory queries return 0 without lazy-constructing the pool. Tests and debug-info comparators were updated for the new fields and C API vs nested iterator behavior.

Reviewed by Cursor Bugbot for commit fda6a43.
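The `isInitialized` guard mentioned in the overview above can be sketched along these lines. This is an assumption about the general pattern, not the PR's code: `SharedPool`, `globalMemory`, and the constant 1024 are hypothetical, chosen only to show a memory query that does not trigger lazy construction.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical miniature of a lazily constructed, process-wide pool.
class SharedPool {
public:
    static SharedPool &instance() {
        static SharedPool pool;  // constructed on first call
        return pool;
    }
    // Reports whether instance() has ever been called, without calling it.
    static bool isInitialized() { return initializedFlag().load(std::memory_order_acquire); }
    std::size_t trackedBytes() const { return bytes_; }

private:
    // 1024 is an invented placeholder for the pool's tracked allocation size.
    SharedPool() : bytes_(1024) { initializedFlag().store(true, std::memory_order_release); }
    static std::atomic<bool> &initializedFlag() {
        static std::atomic<bool> flag{false};
        return flag;
    }
    std::size_t bytes_;
};

// Memory query that returns 0 rather than constructing the pool as a side effect.
inline std::size_t globalMemory() {
    return SharedPool::isInitialized() ? SharedPool::instance().trackedBytes() : 0;
}
```

Without the guard, merely asking for the memory figure would create the pool and allocate the memory being measured.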