pgvector, nmslib, and other projects support sparse vectors.
I took a quick look at how they implement this: they leave the HNSW implementation itself untouched and support sparse vectors only by changing the metric/scoring function.
Q1: It seems we could support sparse vectors with a similar change to ScoreFunction. Is that correct? Could this direction be considered later?
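To make Q1 concrete, here is a minimal sketch of what a sparse scoring function could look like. It is only an illustration under my own assumptions: the vector layout (a list of `(dimension, value)` pairs sorted by dimension) and the name `sparse_dot` are hypothetical and are not taken from this project, pgvector, or nmslib.

```python
from typing import Sequence, Tuple

# Hypothetical layout: (dimension index, value) pairs sorted by dimension index.
SparseVector = Sequence[Tuple[int, float]]


def sparse_dot(a: SparseVector, b: SparseVector) -> float:
    """Inner product of two sparse vectors via a two-pointer merge."""
    i = j = 0
    score = 0.0
    while i < len(a) and j < len(b):
        dim_a, val_a = a[i]
        dim_b, val_b = b[j]
        if dim_a == dim_b:
            # Only dimensions present in both vectors contribute.
            score += val_a * val_b
            i += 1
            j += 1
        elif dim_a < dim_b:
            i += 1
        else:
            j += 1
    return score


query = [(1, 2.0), (5, 1.0), (9, 3.0)]
doc = [(5, 4.0), (9, 0.5), (12, 7.0)]
print(sparse_dot(query, doc))
```

The point is that the graph code only needs a function that scores two vectors, so in principle it does not need to know whether those vectors are dense or sparse.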
Q2: Is a graph index (HNSW/DiskANN) the most appropriate way to index sparse vectors? Vector products such as Elasticsearch (ES), Milvus, and Qdrant basically use inverted indexes for sparse vectors. My thinking is that when the query contains many tokens and only sparse vector retrieval is performed (no filter), a graph index may be relatively fast: only a number of documents on the order of topK has to be visited in the graph, so its efficiency is acceptable. An inverted index has to look up the posting list of every query token and then compute the similarity of each hit document in memory. Without pruning optimizations, if the number of hit documents is very large, scoring every one of them takes a long time.

However, ES uses an inverted index, so I speculate that they weighed the following: an inverted index consumes less CPU at indexing time, and their pruning optimizations are better. Most users also apply filters, and there are many optimizations for merging posting lists, so the final amount of computation is not that large. The ES user scenario is therefore a good fit for inverted indexes.
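For comparison, this is roughly the unpruned inverted-index scoring I have in mind in Q2. It is a simplified sketch under my own assumptions (dict-based sparse vectors, hypothetical names `build_inverted_index` and `search`), not how ES, Milvus, or Qdrant actually implement it; real engines add pruning and filter-aware posting-list merging on top.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

SparseDoc = Dict[int, float]  # hypothetical layout: dimension (token id) -> weight


def build_inverted_index(docs: Dict[int, SparseDoc]) -> Dict[int, List[Tuple[int, float]]]:
    """Map each dimension to its posting list of (doc_id, weight)."""
    index: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    for doc_id, vec in docs.items():
        for dim, weight in vec.items():
            index[dim].append((doc_id, weight))
    return index


def search(index, query: SparseDoc, k: int) -> List[Tuple[int, float]]:
    """Term-at-a-time scoring with no pruning: every hit document is scored."""
    scores: Dict[int, float] = defaultdict(float)
    for dim, q_weight in query.items():
        # Walk the full posting list of each query token.
        for doc_id, d_weight in index.get(dim, ()):
            scores[doc_id] += q_weight * d_weight
    # Keep only the k best of all scored documents.
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])


docs = {1: {0: 1.0, 3: 2.0}, 2: {3: 1.0, 7: 4.0}, 3: {8: 1.0}}
index = build_inverted_index(docs)
print(search(index, {3: 1.0, 7: 0.5}, k=2))
```

The cost grows with the total length of the posting lists touched by the query, which is why long queries over common tokens are expensive without pruning.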