Retrieval issues using Opensearch k-nn support with high top-k values #3102

@danielbichuetti

Describe the bug
Retrieval quality degrades heavily when using high top-k values with OpensearchDocumentStore. In some situations, you get only 20 quality results when you expected 100 (top_k).

Error message
Low number of top-k responses

Expected behavior
Haystack should return approximately as many quality results as requested in top_k.

Additional context
Today, the ef_search parameter is hard-coded at index creation (it is set to 20 for both faiss and hnsw). With the default top_k of 10 this is fine, but when you request more results, the low ef_search value degrades search quality.
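As a rough illustration of what a configurable value could look like, here is a minimal sketch of an index body builder. It assumes the nmslib engine, where OpenSearch exposes ef_search as the index setting `index.knn.algo_param.ef_search`. The helper name `build_knn_index_body`, the field name `embedding`, and the defaults are hypothetical and are not Haystack's actual code.

```python
from typing import Any, Dict


def build_knn_index_body(
    embedding_dim: int = 768,
    ef_search: int = 20,
    embedding_field: str = "embedding",
) -> Dict[str, Any]:
    """Build an OpenSearch index body for an HNSW k-NN field (hypothetical helper)."""
    if ef_search < 1:
        raise ValueError("ef_search must be a positive integer")
    return {
        "settings": {
            "index": {
                "knn": True,
                # Size of the candidate list explored at query time
                # (assumption: nmslib engine, set as an index setting).
                "knn.algo_param.ef_search": ef_search,
            }
        },
        "mappings": {
            "properties": {
                embedding_field: {
                    "type": "knn_vector",
                    "dimension": embedding_dim,
                    "method": {
                        "name": "hnsw",
                        "engine": "nmslib",
                        "space_type": "l2",
                    },
                }
            }
        },
    }
```

The resulting dictionary could be passed as the body of an index creation request, with ef_search chosen by the user instead of being fixed at 20.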

The ef_search parameter trades result quality against search speed, but that decision should be left to the user. On high-end cloud instances, ef_search values around 500 are not a problem: they still give good search speed and great results.

The ef_search parameter can only be set during index creation, so I would like to propose letting the user set it via a parameter. In addition, when top_k is higher than ef_search, a warning should say that results will be degraded.
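The proposed warning could be sketched roughly as follows. The function name `check_top_k` and the message wording are hypothetical, and where such a check would live in Haystack is left open.

```python
import logging

logger = logging.getLogger(__name__)


def check_top_k(top_k: int, ef_search: int) -> bool:
    """Return True if top_k fits within ef_search; log a warning otherwise."""
    if top_k > ef_search:
        logger.warning(
            "top_k (%d) is higher than ef_search (%d); results may be degraded. "
            "Consider recreating the index with a higher ef_search.",
            top_k,
            ef_search,
        )
        return False
    return True
```

With the current hard-coded value, `check_top_k(100, 20)` would log the warning, while the default `check_top_k(10, 20)` would pass silently.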

Indeed, Malkov (author of the HNSW algorithm and its nmslib implementation) suggests that ef_search should always be higher than top_k.

To Reproduce
Create an OpensearchDocumentStore with default parameters. Call query_by_embedding with top_k set to a high value (for example, 100).

Collect the results and check how many quality results come back.

FAQ Check

System:

  • OS: Linux
  • GPU/CPU: Intel i7
  • Haystack version (commit or version number): 1.7.2
  • DocumentStore: OpensearchDocumentStore

Labels

  • P2: Medium priority, add to the next sprint if no P1 available
  • topic:opensearch
  • type:bug: Something isn't working