[Bug]: Sparse vector search returns different results based on element insertion order #379

@ccdv-ai

Description

Sparse vector search returns completely different results depending on the order of the key-value pairs in the sparse vector dictionary. The same vector with its elements reordered produces different scores and rankings.

# Same vector, different orderings
sorted_vec   = {1: 0.5, 5: 1.2, 10: 0.8, 1000: 3.0}
reversed_vec = {1000: 3.0, 10: 0.8, 5: 1.2, 1: 0.5}

Possible fixes:
  • Sort sparse vectors by key during insertion and at query time
  • Document the ordering requirement for sparse vectors
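Until the library handles this itself, the first suggestion can be approximated on the caller side. The sketch below is a minimal illustration, assuming sparse vectors are passed as plain Python dicts mapping integer indices to float weights, as in the example above. `sort_sparse_vector` is a hypothetical helper name and not part of the library's API.

```python
def sort_sparse_vector(vec):
    """Return a copy of a sparse vector dict with keys in ascending order.

    Hypothetical helper: Python dicts preserve insertion order, so rebuilding
    the dict from sorted items yields a canonical key order.
    """
    return dict(sorted(vec.items()))


reversed_vec = {1000: 3.0, 10: 0.8, 5: 1.2, 1: 0.5}
normalized = sort_sparse_vector(reversed_vec)
print(list(normalized))  # [1, 5, 10, 1000]
```

Applying the same normalization to every vector before indexing and before querying would make the key order irrelevant from the caller's point of view, whatever the underlying cause turns out to be.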

Steps to Reproduce

1. Index documents with sparse vectors
2. Query with the same query vector
3. Compare results when sparse vectors are:
     - Sorted (default)
     - Reversed (keys in reverse order)
     - Shuffled (random order)
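The three orderings in step 3 can be generated as sketched below. This is a library-agnostic illustration: `make_orderings` and `sparse_dot` are hypothetical helpers, the document vector is made up, and the inner product is only assumed as the scoring function. It builds no index and makes no calls into the library. It only shows that a reference score computed in pure Python is identical for all three orderings, which is the behavior the search is expected to match.

```python
import math
import random


def make_orderings(vec, seed=0):
    """Return the same sparse vector as dicts with different key orders."""
    items = sorted(vec.items())
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    return {
        "sorted": dict(items),
        "reversed": dict(reversed(items)),
        "shuffled": dict(shuffled),
    }


def sparse_dot(query, doc):
    """Reference inner product over shared indices (assumed scoring function).

    math.fsum returns a correctly rounded sum, so the result does not depend
    on the iteration order of the query dict.
    """
    return math.fsum(w * doc[i] for i, w in query.items() if i in doc)


query = {1: 0.5, 5: 1.2, 10: 0.8, 1000: 3.0}
doc = {1: 1.0, 10: 2.0, 1000: 0.5}  # hypothetical indexed document

orderings = make_orderings(query)
scores = {name: sparse_dot(q, doc) for name, q in orderings.items()}

# All orderings describe the same vector, so the reference scores must agree.
assert len(set(scores.values())) == 1
```

Passing each dict from `orderings` to the actual query call and comparing the returned scores against the reference value would show which orderings the search handles correctly.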

Logs / Stack Trace

Operating System

Ubuntu 24.04 LTS

Build & Runtime Environment

Python 3.12

Additional Context

  • I've checked git status — no uncommitted submodule changes
  • I built with CMAKE_BUILD_TYPE=Debug
  • This occurs with or without COVERAGE=ON
  • The issue involves Python ↔ C++ integration (pybind11)

Metadata

Assignees

Labels

bug: Something isn't working

Type

No type

Projects

Status

In progress

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
