fix: some minor bugs #370
Merged
ihb2032 pushed a commit to ihb2032/zvec that referenced this pull request (Apr 29, 2026)
Asked qoder to do a full round of review before quota expired.
Bug Report: zvec Codebase Audit
Date: April 22, 2026
Scope: Full source audit of the zvec in-process vector database
Files changed: 9 source files edited, 1 file renamed
Bug 1: Race condition in DropIndex() on destroyed_ flag
Severity: High
File: src/db/collection.cc
Description
CollectionImpl::DropIndex() checks the destroyed_ flag before acquiring the schema_handle_mtx_ lock. Every other mutating method in the class (Destroy(), Flush(), CreateIndex(), Optimize(), AddColumn(), DropColumn(), AlterColumn()) acquires the lock first, then checks destroyed_. This creates a TOCTOU (time-of-check-to-time-of-use) race: another thread could call Destroy() between the check and the lock acquisition, allowing DropIndex() to proceed on a destroyed collection, potentially causing use-after-free or null pointer dereferences on released resources.
Before
Fix
Swapped the ordering to match every other method: acquire the lock first, then check destroyed_.
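The lock-then-check ordering can be sketched as follows. This is a minimal illustration and not the zvec code: ToyCollection, the use of std::mutex, and the bool return value are assumptions made for the example. Only the names DropIndex(), Destroy(), destroyed_ and schema_handle_mtx_ come from the report.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical stand-in for CollectionImpl; only the locking order matters.
class ToyCollection {
 public:
  // Returns false if the collection has already been destroyed.
  bool DropIndex(const std::string& column) {
    // Take the lock first, so destroyed_ cannot change between the check
    // and the work that follows it.
    std::lock_guard<std::mutex> lock(schema_handle_mtx_);
    if (destroyed_) {
      return false;
    }
    last_dropped_ = column;  // placeholder for the real index removal
    return true;
  }

  void Destroy() {
    std::lock_guard<std::mutex> lock(schema_handle_mtx_);
    destroyed_ = true;
  }

 private:
  std::mutex schema_handle_mtx_;  // assumed mutex type
  bool destroyed_ = false;
  std::string last_dropped_;
};
```

Because both methods take the same mutex before touching destroyed_, a Destroy() call either completes before DropIndex() reads the flag or waits until DropIndex() has finished.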
Bug 2: Integer underflow in ConcurrentRoaringBitmap32::range_cardinality()
Severity: High
File: src/db/common/concurrent_roaring_bitmap.h
Description
range_cardinality(uint32_t min_doc_id, uint32_t max_doc_id) computes min_doc_id - 1 to calculate the rank at the position just before min_doc_id. When min_doc_id == 0, this arithmetic wraps to UINT32_MAX (4,294,967,295) because min_doc_id is an unsigned 32-bit integer (uint32_t). This causes roaring_bitmap_rank() to return the total cardinality, making the subtraction max_rank - min_rank return 0 (or a bogus value) instead of the correct range count.
The 64-bit counterpart (ConcurrentRoaringBitmap64::range_cardinality) already had a guard for this case (min_doc_id <= 0 ? 0 : ...), but the 32-bit version was missing it.
Before
Fix
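A minimal sketch of such a guard, modeled on the 64-bit version described above. To stay self-contained it does not use CRoaring: a sorted std::vector stands in for the bitmap, and rank_of() is a hypothetical helper that counts the ids less than or equal to x, which is what a bitmap rank query is assumed to return here. The range is assumed to be inclusive on both ends.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of stored ids <= x (ids must be sorted).
uint64_t rank_of(const std::vector<uint32_t>& sorted_ids, uint32_t x) {
  auto it = std::upper_bound(sorted_ids.begin(), sorted_ids.end(), x);
  return static_cast<uint64_t>(it - sorted_ids.begin());
}

// Counts the ids in the inclusive range [min_doc_id, max_doc_id].
uint64_t range_cardinality(const std::vector<uint32_t>& sorted_ids,
                           uint32_t min_doc_id, uint32_t max_doc_id) {
  if (min_doc_id > max_doc_id) {
    return 0;
  }
  uint64_t max_rank = rank_of(sorted_ids, max_doc_id);
  // Guard: min_doc_id - 1 would wrap to UINT32_MAX when min_doc_id == 0,
  // so the rank "just before 0" is defined as 0 instead.
  uint64_t min_rank =
      (min_doc_id == 0) ? 0 : rank_of(sorted_ids, min_doc_id - 1);
  return max_rank - min_rank;
}
```

With ids {0, 3, 7, 10}, the guarded version counts id 0 when the range starts at 0; without the guard, min_rank would equal the total cardinality and the subtraction would go wrong.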
Bug 3: Uninitialized member variables in Doc
Severity: High
File: src/include/zvec/db/doc.h
Description
Doc::doc_id_ (uint64_t) and Doc::op_ (Operator enum) have no default member initializers and are not initialized in the default constructor (Doc() = default). Reading these members before explicit assignment is undefined behavior per the C++ standard. In practice, this means a default-constructed Doc could report a garbage doc_id or operation type, which could silently corrupt data during writes or produce incorrect log output.
All other member variables in the class (pk_, score_, fields_) are properly initialized.
Before
    uint64_t doc_id_;
    Operator op_;
Fix
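A sketch of default member initializers for the two fields. The doc_id_{0} initializer matches the value given in the verification notes. The Operator enumerators and the default chosen for op_ are assumptions, since the report does not list them, and ToyDoc is a hypothetical stand-in for Doc.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enumerators; the real Operator enum in zvec may differ.
enum class Operator { kInsert, kUpdate, kDelete };

// Hypothetical stand-in for Doc, reduced to the two affected members.
class ToyDoc {
 public:
  ToyDoc() = default;  // the initializers below still apply

  uint64_t doc_id() const { return doc_id_; }
  Operator op() const { return op_; }

 private:
  uint64_t doc_id_{0};              // well-defined default
  Operator op_{Operator::kInsert};  // assumed default value
};
```

With the initializers in place, a defaulted constructor leaves both members in a defined state, so reading them right after construction is no longer undefined behavior.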
Bug 4: Source filename typo: rocbsdb_context.cc
Severity: Medium
File: src/db/common/rocbsdb_context.cc
Description
The RocksDB wrapper implementation file was named rocbsdb_context.cc (missing the letter 'k': "rocbs" instead of "rocks"). While the build system (SRCS *.cc glob) compiled it without issues, the typo creates confusion and makes it hard to find the file by searching for "rocksdb". The corresponding header is correctly named rocksdb_context.h.
Fix
Renamed rocbsdb_context.cc to rocksdb_context.cc via git mv.
Bug 5: Wrong error descriptions for InvalidChannelCount and InvalidReplicaCount
Severity: Medium
File: src/db/common/error_code.cc
Description
Error codes 2045 (InvalidChannelCount) and 2046 (InvalidReplicaCount) both had their description string set to "Invalid field name", which is a copy-paste error from the preceding line (error code 2044, InvalidFieldName). This means any error message surfaced to users or logs for channel count or replica count validation failures would misleadingly say "Invalid field name".
Before
Fix
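Copy-paste errors of this kind can be caught mechanically. The sketch below is not the zvec error-code machinery: it uses a plain std::map from code to description, and the two corrected description strings are assumptions (the report only states that the old text was wrong). The helper name codes_with_duplicate_description() is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Codes come from the report; the texts for 2045 and 2046 are assumed.
std::map<int, std::string> fixed_table() {
  return {{2044, "Invalid field name"},
          {2045, "Invalid channel count"},
          {2046, "Invalid replica count"}};
}

// Returns every code whose description repeats that of a lower code.
std::vector<int> codes_with_duplicate_description(
    const std::map<int, std::string>& table) {
  std::set<std::string> seen;
  std::vector<int> duplicates;
  for (const auto& entry : table) {
    // insert().second is false when the text was already present.
    if (!seen.insert(entry.second).second) {
      duplicates.push_back(entry.first);
    }
  }
  return duplicates;
}
```

A check like this could run in a unit test, so that a duplicated description is reported as soon as a new error code is added.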
Bug 6: DirectoryAlreadyExists and DirectoryNotExists declared but never defined
Severity: Medium
File: src/db/common/error_code.h
Description
error_code.h declares two error codes, DirectoryAlreadyExists and DirectoryNotExists, via PROXIMA_ZVEC_ERROR_CODE_DECLARE. However, error_code.cc contains no corresponding PROXIMA_ZVEC_ERROR_CODE_DEFINE for either. These are extern declarations of global const objects that have no definition in any translation unit. If any code ever references them, it would cause a linker error. A codebase search confirmed they are not referenced anywhere.
Fix
Removed both dead declarations from error_code.h.
Bug 7: Incorrect comment: 64 * 1024 * 1024 labeled as "128M"
Severity: Low
File: src/include/zvec/db/options.h
Description
The constant DEFAULT_MAX_BUFFER_SIZE is defined as 64 * 1024 * 1024 (= 67,108,864 bytes = 64 MB), but the comment says // 128M. This is misleading for anyone reading the code to understand the default buffer size.
Before
Fix
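One way to keep such a comment honest is to back it with a compile-time check. In the sketch below the constant's type (uint32_t) and the constexpr qualifier are assumptions; the report only gives the expression and the comment.

```cpp
#include <cassert>
#include <cstdint>

// Assumed declaration; only the expression comes from the report.
constexpr uint32_t DEFAULT_MAX_BUFFER_SIZE = 64 * 1024 * 1024;  // 64M

// Compile-time checks that fail if the value and the comment drift apart.
static_assert(DEFAULT_MAX_BUFFER_SIZE == 67108864u, "64 MB in bytes");
static_assert(DEFAULT_MAX_BUFFER_SIZE / (1024 * 1024) == 64,
              "the constant is 64M, not 128M");
```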
Bug 8: Write lock used for read-only storage_size_in_bytes()
Severity: Low
File: src/db/common/concurrent_roaring_bitmap.h
Description
ConcurrentRoaringBitmap32::storage_size_in_bytes() is a read-only operation (it calls roaring_bitmap_portable_size_in_bytes, which does not mutate the bitmap), but it acquires a std::unique_lock (exclusive/write lock). This unnecessarily blocks all concurrent readers, reducing throughput. Every other read-only method in the class correctly uses std::shared_lock.
Before
Fix
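The reader/writer split can be sketched as below. ToyConcurrentSet is a hypothetical stand-in that stores ids in a std::vector instead of a roaring bitmap, and the use of std::shared_mutex (C++17) is an assumption; the report only names std::unique_lock and std::shared_lock.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical stand-in for ConcurrentRoaringBitmap32.
class ToyConcurrentSet {
 public:
  // Mutating operation: exclusive lock, blocks readers and writers.
  void add(uint32_t id) {
    std::unique_lock<std::shared_mutex> lock(mutex_);
    ids_.push_back(id);
  }

  // Read-only operation: shared lock, so readers can run concurrently.
  std::size_t storage_size_in_bytes() const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    return ids_.size() * sizeof(uint32_t);
  }

 private:
  mutable std::shared_mutex mutex_;  // mutable so const readers can lock
  std::vector<uint32_t> ids_;
};
```

Any number of threads can hold the shared lock at once, while add() still gets exclusive access when it needs to mutate the container.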
Bug 9: static function definition in header causes per-TU duplication
Severity: Low
File: src/db/common/file_helper.h
Description
GetFileName(FileID) is declared static in a header file. In C++, static at file/namespace scope gives the function internal linkage, so each translation unit (.cc file) that includes the header gets its own private copy. Since this header is included across many .cc files, the final binary can end up with duplicate copies of the function, increasing code size. The function should be inline instead, which lets the linker deduplicate identical definitions while still allowing the definition to live in the header.
Before
Fix
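A sketch of the inline form of a header-defined function. The FileID enumerators, the std::string return type, and the returned names are all hypothetical; only the function name GetFileName and its FileID parameter come from the report.

```cpp
#include <cassert>
#include <string>

// Hypothetical enumerators; the real FileID in zvec may differ.
enum class FileID { kIdFile, kDeleteFile };

// inline (rather than static) keeps external linkage: every translation
// unit that includes the header sees the same function, and the linker
// may merge the identical definitions into one.
inline std::string GetFileName(FileID id) {
  switch (id) {
    case FileID::kIdFile:
      return "id_file";
    case FileID::kDeleteFile:
      return "delete_file";
  }
  return "unknown";  // reached only for values outside the enum
}
```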
Bug 10: Missing #pragma once include guard in utils.h
Severity: Low
File: src/db/common/utils.h
Description
utils.h is the only header in src/db/common/ that does not have a #pragma once include guard. Without it, if the header is included multiple times (directly or transitively) in the same translation unit, it causes redefinition errors for std::string indent(int level). While the current include graph may not trigger this, it is fragile and inconsistent with the rest of the codebase.
Fix
Added #pragma once at the top of the file.
Bug 11: Comment indices in is_valid_type_v do not match Value variant ordering
Severity: Info
File: src/include/zvec/db/doc.h
Description
The Value variant and the is_valid_type_v type trait both list the same types, but the comment indices diverge in two places:
In the Value variant, index 13 is std::vector<int64_t> and index 14 is std::vector<uint32_t>. In is_valid_type_v, the comments had these swapped (13 for uint32_t, 14 for int64_t).
In the Value variant, index 20 is std::pair<..., std::vector<float>> and index 21 is std::pair<..., std::vector<float16_t>>. In is_valid_type_v, the comments had these swapped.
While is_valid_type_v is a simple OR-chain (so the ordering does not affect runtime behavior), mismatched index comments are dangerous for anyone writing serialization or deserialization code that relies on variant indices. The wrong comment could lead someone to use the wrong index in a std::get<N>() call.
Fix
Reordered the entries in is_valid_type_v and corrected the comments to match the actual Value variant ordering exactly.
Verification note: Serialization in doc.cc uses std::visit with if constexpr type matching and a separate ValueType enum, not variant indices, so reordering is_valid_type_v entries has zero impact on existing serialized data. The memory_usage() and operator== functions do use .index() on the variant, but those already have the correct index-to-type mapping.
Bug 12: DeleteByFilter() computes get_all_segments() twice (one result unused)
Severity: Medium
File: src/db/collection.cc
Description
CollectionImpl::DeleteByFilter() calls get_all_segments() at line 1552 and stores the result in a local variable segments, but never uses it. Then at line 1560, get_all_segments() is called a second time as an argument to sql_engine_->execute(). This is a copy-paste artifact that wastes work: get_all_segments() iterates and copies all segments each time it is called.
Before
Fix
Removed the unused segments variable. The single remaining get_all_segments() call inside execute() is sufficient:
    VectorQuery query;
    query.filter_ = filter;
    query.topk_ = INT32_MAX;
    query.output_fields_ = std::vector<std::string>{};
    query.include_doc_id_ = true;
    auto ret = sql_engine_->execute(schema_, query, get_all_segments());
Bug 13: Open() allocates sql_engine_ even after create()/recovery() failure
Severity: Medium
File: src/db/collection.cc
Description
CollectionImpl::Open() calls create() or recovery() and stores the status in s, but then unconditionally proceeds to allocate a Profiler and create the sql_engine_ before returning s. If create() or recovery() failed, these resources are allocated for no reason: the callers (Collection::CreateAndOpen() and Collection::Open()) check the status and discard the CollectionImpl, so the engine is immediately destroyed.
Before
Fix
Added an early return via CHECK_RETURN_STATUS(s) before the sql_engine_ allocation.
Bug 14: Typo "ignnored" in options.h comment
Severity: Low
File: src/include/zvec/db/options.h
Description
The comment on enable_mmap_ has a double-n typo: "ignnored" instead of "ignored".
Before
Fix
Verification Notes
All 14 fixes were reviewed for correctness and compatibility:
Doc serialization uses an explicit ValueType enum with std::visit/if constexpr type matching, not variant indices. Reordering is_valid_type_v entries (Bug 11) and initializing doc_id_/op_ (Bug 3) have no impact on existing persisted data.
doc_id_{0} initialization (Bug 3): The value 0 is consistent with Doc::clear(), which already resets doc_id_ to 0. In all code paths, doc_id is reassigned by Segment::internal_insert() or Segment::Update() before any database operation, so the default value is never persisted.