
Distributed vector search #613

Merged: jasonstack merged 3 commits into datastax:cep-vsearch from jasonstack:cep-vsearch-zy on May 15, 2023



jasonstack commented on May 6, 2023

  • query all replicas selected by the consistency level at once, using the full request range from the ReadCommand

  • filter top-K results at the coordinator in QueryPlan#postProcessor using a PriorityQueue, discarding the lowest-scoring rows once the number of rows exceeds the limit

  • fail the vector search if the limit exceeds MAX_TOP_K (default 1000), to avoid overflowing the HNSW data structure or running out of memory in QueryPlan#postProcessor

  • skip short-read protection, read repair and replica filtering protection, because each replica response will be a top-k result: a mismatch between replica responses may be caused by top-k filtering rather than by actual data inconsistency

  • paging will result in lower recall, because the coordinator resumes from the partition of the last returned row
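
The coordinator-side filtering and limit check described in the list can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: ScoredRow, CoordinatorTopK, validateLimit and topK are hypothetical names, and the rows passed in are assumed to be already reconciled across replicas so that each key appears once. Only the MAX_TOP_K default of 1000 and the use of a PriorityQueue come from the description above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical row type: a primary key plus the similarity score computed for it.
record ScoredRow(String key, float score) {}

final class CoordinatorTopK {
    // Default from the PR description; the constant's real name/location is an assumption.
    static final int MAX_TOP_K = 1000;

    static final Comparator<ScoredRow> BY_SCORE = Comparator.comparingDouble(ScoredRow::score);

    // Reject oversized limits up front instead of letting the heap grow unbounded.
    static void validateLimit(int limit) {
        if (limit <= 0 || limit > MAX_TOP_K)
            throw new IllegalArgumentException(
                "Vector search limit " + limit + " must be between 1 and " + MAX_TOP_K);
    }

    // Keeps the `limit` highest-scoring rows from the merged replica responses.
    // Assumes rows are already reconciled, i.e. each key appears at most once.
    static List<ScoredRow> topK(Iterable<ScoredRow> merged, int limit) {
        validateLimit(limit);
        // Min-heap ordered by score: the head is always the current lowest-scoring row.
        PriorityQueue<ScoredRow> heap = new PriorityQueue<>(limit + 1, BY_SCORE);
        for (ScoredRow row : merged) {
            heap.add(row);
            if (heap.size() > limit)
                heap.poll(); // discard the lowest-scoring row once the limit is exceeded
        }
        // Return rows best-first.
        List<ScoredRow> result = new ArrayList<>(heap);
        result.sort(BY_SCORE.reversed());
        return result;
    }
}
```

In this sketch the heap never holds more than `limit + 1` rows, so capping the limit at MAX_TOP_K also caps how much the coordinator keeps while streaming through the replica responses.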

TODO: replicas should return top-k results to the coordinator
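
The TODO would move top-k filtering onto the replicas. A minimal sketch of why that can preserve the coordinator's result, assuming each replica answers from a disjoint slice of the data (Hit and ReplicaTopK are hypothetical names, not part of this PR): a row in the global top-k has at most k-1 rows scoring above it overall, hence at most k-1 on its own replica, so it survives that replica's local top-k. Merging the per-replica top-k lists at the coordinator therefore yields the same k rows while transferring at most (replicas × k) rows.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical result type: a row key and its similarity score.
record Hit(String key, float score) {}

final class ReplicaTopK {
    static final Comparator<Hit> BY_SCORE = Comparator.comparingDouble(Hit::score);

    // Replica side: keep only the k highest-scoring hits, best first.
    static List<Hit> topK(List<Hit> hits, int k) {
        PriorityQueue<Hit> heap = new PriorityQueue<>(BY_SCORE); // min-heap by score
        for (Hit h : hits) {
            heap.add(h);
            if (heap.size() > k)
                heap.poll(); // drop the current lowest score
        }
        List<Hit> out = new ArrayList<>(heap);
        out.sort(BY_SCORE.reversed());
        return out;
    }

    // Coordinator side: each replica already trimmed its answer to k rows,
    // so only the union of those short lists needs a final top-k pass.
    // Assumes replicas hold disjoint data; ties may be ordered differently.
    static List<Hit> mergeReplicaResults(List<List<Hit>> perReplica, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> replicaResult : perReplica)
            all.addAll(replicaResult);
        return topK(all, k);
    }
}
```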
