Incorrect total value with k-NN search #97807
Comments
Pinging @elastic/es-search (Team:Search)
This is interesting. I tried the same thing on the deprecated Something weird with parsing or serialization. Digging in :)
OK, @devatadecco, I have an answer for you that may raise further questions :)
In your simple example, the number of vectors is small, so it might be surprising that the total hit count for global But let's consider the situation where we have 1M+ vectors. Per-shard, we will only gather up to
Currently, the We can contrast
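The per-shard gathering described above can be modeled in a few lines. This is a simplified sketch, not Elasticsearch's implementation: it assumes each shard returns at most k of its best-scoring documents (by exact dot product), that the coordinating node keeps the global top k, and that the reported total is simply the number of hits kept. It ignores `num_candidates` and the approximate HNSW graph search; `shard_top_k` and `global_knn` are hypothetical names.

```python
import heapq


def dot(a, b):
    # Plain dot product, standing in for the dot_product similarity.
    return sum(x * y for x, y in zip(a, b))


def shard_top_k(shard_docs, query, k):
    # Each shard scores only its own documents and keeps its best k.
    scored = [(dot(vec, query), doc_id) for doc_id, vec in shard_docs.items()]
    return heapq.nlargest(k, scored)


def global_knn(shards, query, k):
    # The coordinator merges every shard's candidates and keeps the global top k.
    candidates = []
    for shard_docs in shards:
        candidates.extend(shard_top_k(shard_docs, query, k))
    top = heapq.nlargest(k, candidates)
    # In this model the total is just the number of hits that survived the merge.
    return {"total": len(top), "hits": [doc_id for _, doc_id in top]}


# Example data: 3 shards with 4 documents each; the score grows with d.
shards = [
    {f"shard{s}-doc{d}": [float(d), 1.0] for d in range(4)}
    for s in range(3)
]
query = [1.0, 0.0]

print(global_knn(shards, query, k=5)["total"])    # 5
print(global_knn(shards, query, k=100)["total"])  # 12
```

With 12 documents spread over 3 shards, the model reports a total of 5 for k=5 and only reaches 12 once k exceeds the document count, which is the pattern described in the report.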
Thank you for responding. The provided example is meant to demonstrate the behavior. My intention is to maintain consistent pagination while we set And if you're asking me, I prefer not to have Let me know if there is a proper way to paginate that I'm missing.
Interesting, I can see that being a potential solution. We will have to think some more. There might be a better solution here that neither of us sees. I do agree that
The tricky part with pagination is that Really, as it is right now, there is no nice way to paginate the global top k. I think some future planned work could make this better. I will try to keep this issue updated!
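To see why paginating a global top k is awkward, here is a small model. It assumes (a simplification) that the search first selects the global top k and that from/size then slice within those k hits; `knn_page` is a hypothetical name and `ranked_ids` stands in for the index sorted by similarity to the query vector.

```python
def knn_page(ranked_ids, k, from_, size):
    # Only the global top k are ever selected.
    top_k = ranked_ids[:k]
    # from/size then slice within those k hits, and the total is capped at k.
    return {"total": len(top_k), "hits": top_k[from_:from_ + size]}


ranked_ids = [f"doc{i}" for i in range(1, 21)]  # 20 documents, best first

# Third page of size 2 (from=4) with two different k values.
print(knn_page(ranked_ids, k=5, from_=4, size=2))   # {'total': 5, 'hits': ['doc5']}
print(knn_page(ranked_ids, k=10, from_=4, size=2))  # {'total': 10, 'hits': ['doc5', 'doc6']}
```

In this model, reaching later pages means re-running the search with a larger k, and the reported total changes along with it, which is the pagination inconsistency discussed in this thread.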
Any update?
I would argue that the "total hit" count is still correct for kNN, since kNN is technically always a "match_all" unless you provide a filter. It just limits to the As for improvements to the kNN experience at query time, check these issues out:
If y'all have requirements for these issues, let us know!
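The "match_all limited to k" argument above can be written as a one-line rule. This is a sketch of that reasoning, not Elasticsearch code: it assumes every document has a vector and that any filter is applied before the top k are selected. `knn_total` and `filter_matches` are hypothetical names.

```python
def knn_total(num_docs, k, filter_matches=None):
    # Without a filter, every document with a vector is a candidate
    # (the "match_all" view); with a filter, only matching documents are.
    candidates = num_docs if filter_matches is None else filter_matches
    # At most k of them are kept, so the total is capped at k.
    return min(k, candidates)


print(knn_total(3, k=2))                             # 2, as in the report
print(knn_total(3, k=10))                            # 3
print(knn_total(1_000_000, k=10, filter_matches=4))  # 4
```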
We reverted to using cosine similarity and script I won't dispute your argument. However, I see an inconsistency between the value of k and the limit. I see that you are addressing this in #97533.
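The exact query behind "cosine similarity and script" is cut off above. Assuming it followed the common `script_score` pattern `cosineSimilarity(params.query_vector, 'title-vector') + 1.0` over a `match_all` query (the +1.0 keeps scores non-negative), the sketch below mimics that exact, brute-force approach in plain Python: every document is scored, so the total covers all matching documents (subject to the usual `track_total_hits` limit) and from/size paginate over all of them. It is not the Painless script itself; `exact_search` is a hypothetical name.

```python
import math


def cosine(a, b):
    # Cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_search(docs, query, from_, size):
    # Brute force: score every document, like script_score over match_all.
    scored = sorted(
        ((cosine(vec, query) + 1.0, doc_id) for doc_id, vec in docs.items()),
        reverse=True,
    )
    hits = [doc_id for _, doc_id in scored[from_:from_ + size]]
    # The total counts every scored document, not just the requested page.
    return {"total": len(scored), "hits": hits}


docs = {
    "1": [1, 0, 0, 0, 0],
    "2": [1, 0, 0, 0, 0],
    "3": [0, 1, 0, 0, 0],
}
print(exact_search(docs, [1, 0, 0, 0, 0], from_=0, size=2))  # total is 3 on every page
```

The trade-off is cost: exact scoring touches every matching document, whereas kNN search approximates and keeps only k.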
Elasticsearch Version
Version: 8.8.0, Build: docker/c01029875a091076ed42cdb3a41c10b1a9a5a20f/2023-05-23T17:16:07.179039820Z, JVM: 20.0.1
Installed Plugins
None
Java Version
bundled
OS Version
Linux 5179425a12d1 5.10.104-linuxkit #1 SMP PREEMPT Thu Mar 17 17:05:54 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
Problem Description
Incorrect total value with k-NN search
When running kNN queries, the total value (hits.total.value) always equals the k parameter in the query, even when there are more than k matching documents. Only when k exceeds the actual number of documents is the correct total shown.
Steps to Reproduce
1 - Create an index with a dense_vector mapping for kNN search
2 - Index more than one document
3 - Run a kNN search with k=1 and a query vector; check the total
4 - Increase k with the same query vector; check the total again
PUT /test
{
  "mappings": {
    "properties": {
      "title-vector": {
        "type": "dense_vector",
        "dims": 5,
        "index": true,
        "similarity": "dot_product"
      }
    }
  }
}
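The mapping above uses "similarity": "dot_product". For float vectors, the Elasticsearch dense_vector documentation requires every vector, document and query alike, to be unit length for this similarity; the reproduction's [1, 0, 0, 0, 0] vectors already satisfy this. A minimal client-side check might look like the following; `is_unit_length` and `normalize` are hypothetical helpers, and the tolerance is an assumption.

```python
import math


def is_unit_length(vec, tol=1e-6):
    # dot_product similarity expects vectors normalized to length 1.
    return abs(math.sqrt(sum(x * x for x in vec)) - 1.0) <= tol


def normalize(vec):
    # Scale a vector to unit length before indexing or querying.
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


print(is_unit_length([1, 0, 0, 0, 0]))  # True
print(normalize([3, 4, 0, 0, 0]))       # [0.6, 0.8, 0.0, 0.0, 0.0]
```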
POST /test/_bulk?refresh=true&pretty
{ "index": { "_id": "1" } }
{ "title-vector": [1, 0, 0, 0, 0] }
{ "index": { "_id": "2" } }
{ "title-vector": [1, 0, 0, 0, 0] }
{ "index": { "_id": "3" } }
{ "title-vector": [1, 0, 0, 0, 0] }
POST test/_search
{
  "knn": {
    "field": "title-vector",
    "query_vector": [1, 0, 0, 0, 0],
    "k": 2,
    "num_candidates": 10000
  }
}
Logs (if relevant)
No response