Function score query omits relevant results on large dataset #298

Closed
yonatanshemeshhuji opened this issue Jul 29, 2021 · 21 comments
Labels: bug (Something isn't working)

@yonatanshemeshhuji

Hi! First of all, let me say I admire your work, it is truly amazing.

When running the following query:

body = {
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "filter": {
                        "term": {"country": "Belgium"}
                    },
                    "boost": 1
                }
            },
            "boost_mode": "replace",
            "functions": [
                {
                    "elastiknn_nearest_neighbors": {
                        "vec": query_vector.flatten().tolist(),
                        "field": "imVec",
                        "similarity": self._similarity,
                        "model": "exact",
                        "candidates": 1000
                    },
                    "weight": 2
                },
            ]
        }
    },
}

I am getting great results. However, I cannot afford 'exact' queries. When trying the following query:

body = {
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "filter": {
                        "term": {"country": "Belgium"}
                    },
                    "boost": 1
                }
            },
            "boost_mode": "replace",
            "functions": [
                {
                    "elastiknn_nearest_neighbors": {
                        "vec": query_vector.flatten().tolist(),
                        "field": "imVec",
                        "similarity": self._similarity,
                        "model": "permutation_lsh",
                        "candidates": 1000
                    },
                    "weight": 2
                },
            ]
        }
    },
}

(note that the only difference is "model": "permutation_lsh" instead of "exact"), I'm getting bad results.
In the examples above, self._similarity == "L2".
I hope I am not missing anything and wasting your time. Can this be supported?

@alexklibisz (Owner)

Hi, thanks for the kind remarks.

Permutation LSH is mostly intended for Cosine similarity. It can be computed with L1 and L2, but it doesn't really make sense to use them together. So if you need L2 similarity, I would recommend the Cosine LSH mapping and query: https://elastiknn.com/api/#cosine-lsh-mapping, https://elastiknn.com/api/#cosine-lsh-query. Those can, of course, also be used with function score queries.
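For reference, a minimal sketch of what that mapping and query could look like, based on the linked docs. The dims and LSH parameters (L, k) below are illustrative placeholders, not values tuned for this dataset:

mapping = {
    "properties": {
        "imVec": {
            "type": "elastiknn_dense_float_vector",
            "elastiknn": {
                "dims": 512,        # set to the actual vector dimensionality
                "model": "lsh",
                "similarity": "cosine",
                "L": 99,            # number of hash tables (tune for recall)
                "k": 1              # hash functions per table
            }
        }
    }
}

query = {
    "elastiknn_nearest_neighbors": {
        "field": "imVec",
        "vec": query_vector.flatten().tolist(),
        "model": "lsh",
        "similarity": "cosine",
        "candidates": 1000
    }
}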

Also, if you are bottlenecked by performance even after filtering, read this section of the docs: https://elastiknn.com/api/#using-stored-fields-for-faster-queries

@yonatanshemeshhuji (Author)

Thanks for the quick reply. I just ran the above query with cosine similarity (the data is currently mapped with permutation_lsh) and got bad results.
However, when I run with no filters using cosine similarity with the following query:

body = {
    "query": {
        "elastiknn_nearest_neighbors": {
            "vec": query_vector.flatten().tolist(),
            "field": "imVec",
            "similarity": "angular",  # I saw you renamed it to cosine; I haven't pulled yet.
            "model": "permutation_lsh",
            "candidates": 100
        }
    }
}

I am getting very good results, so I suspect the issue is related to the combination of function_score and permutation_lsh.
I would be very grateful if you could take a look at that.
I will surely give the cosine_lsh mapping a try and see if the function score behaves nicely with this mapping.
Thanks again for your work.

@alexklibisz (Owner) commented Jul 29, 2021

Just ran the above query with cosine similarity (data is currently mapped with permutation_lsh) and got bad results.

It's possible your data just isn't suited for permutation LSH. It basically assumes there are meaningful differences in the absolute values of your vectors, which is not always a good assumption. The docs explain in more detail how that algorithm works.


I am getting very good results, so I suspect the issue is related to the combination of function_score and permutation_lsh.

Note this caveat from the docs:

When using "model": "lsh", the "candidates" parameter is ignored and vectors are not re-scored with the exact similarity like they are with an elastiknn_nearest_neighbors query. Instead, the score is: max similarity score * proportion of matching hashes. This is a necessary consequence of the fact that score functions take a doc ID and must immediately return a score.


I would be very grateful if you could take a look at that.

I'd need a sample of data and some way to easily reproduce the problem.

@yonatanshemeshhuji (Author)

Oh, this caveat does explain the behavior I was witnessing! Just to make sure I understand: the caveat applies to any kind of "lsh" model (e.g. permutation_lsh), correct?
So in practice, I can manually rescore the top k results myself?
Thanks
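(For illustration, a minimal sketch of such manual client-side rescoring, assuming each hit's vector can be read back from _source under "imVec"; the field name and surrounding code are assumptions from this thread, not a confirmed recipe.)

import numpy as np

# Minimal sketch of client-side rescoring. Assumes each hit's vector is
# available in _source under "imVec" (an assumption, not confirmed here).
def rescore_l2(query_vector, hits, top_k=100):
    def l2_score(vec):
        # Same shape as Elastiknn's L2 similarity: 1 / (1 + l2_distance).
        return 1.0 / (1.0 + np.linalg.norm(query_vector - np.asarray(vec)))
    return sorted(hits, key=lambda h: l2_score(h["_source"]["imVec"]), reverse=True)[:top_k]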

@yonatanshemeshhuji (Author) commented Aug 1, 2021

Hi Alex,
Continuing our discussion, I will describe my experiments here as clearly as possible. Unfortunately, I cannot supply any data or code, as I am working on a corporate project.
My data consists of ~1M images. Each image was passed through a conv net to extract features.
What I've done:

  1. Mapped all 1M image vectors with some metadata alongside each sample using the syntax from here. One of these metadata fields is country.
  2. Ran an ANN query on a sample (that does not live in the dataset) using this syntax with number of candidates = 100. Great success: I found relevant neighbors. Here, I used cosine and L2 similarity; both yielded great results. Exactly two of the results among the top 100 candidates had result.country == Belgium. I will call them B1 and B2 for clarity.
  3. Ran a function score query on the same sample, this time filtering for samples from Belgium, using this syntax. Here, I did not get good results.
  4. After reading your latest reply yesterday, I did part (3) again and stored the top 100 results that came back. None of them were B1 or B2. This contradicts the explanation I got from you as I understood it: given that I did part (2) with candidates = 100, I would expect B1 and B2 to appear in Belgium's top 100 results (since Belgium samples are only a subset of the original 1M dataset I used in parts (1-2)). Only when I searched the top 3,000 Belgium-filtered results did B1 appear, and when increasing to the top 4,000 results, B2 appeared.

I hope this info is sufficient for you to check if there is maybe a bug concerning the permutation_lsh + function score combination. I am planning to run this experiment again with model = lsh, and will update the results here soon.
Thanks, Yonatan.

@alexklibisz (Owner)

Hey, thanks for all the detail. I'll try to find some time to review your results this week.

@alexklibisz (Owner)

Ok, some followup questions:

Are you specifically using the function_score_query, documented here?

Just as a sanity check, when you did the function score query, did all of the returned docs have country = Belgium?

How many shards do you have in the index? Does the behavior change if you use a different number of shards?

I agree this is a strange behavior. I have a guess for what's happening but it could be completely wrong based on the answers to the questions above.

@alexklibisz added the Q&A label on Aug 9, 2021
@yonatanshemeshhuji (Author)

Hi!

Are you specifically using the function_score_query, documented here?

Yes.

Just as a sanity check, when you did the function score query, did all of the returned docs have country = Belgium?

Yes, when running the second "body" from the original post.

How many shards do you have in the index? Does the behavior change if you use a different number of shards?

At the time of trying, we had number of shards = 1. I will retry with shards = 5, but only by the end of the week, and will keep you updated on the results.

Thanks a lot, Yonatan.

@alexklibisz (Owner)

Thanks. I'm wondering if the problem is that the query is matching the first 1k/2k/3k documents on the provided filter (Belgium) and then only scoring and re-ranking those first 1k/2k/3k documents that it matched. I don't see this behavior documented in the ES docs though. Hmm.

@alexklibisz (Owner)

I'll try to find some time this week to reproduce this pattern. If you can, it would be interesting to see if you get similar results using the query rescorer.
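For concreteness, a hedged sketch of what the rescorer approach might look like. This wraps an exact Elastiknn query in Elasticsearch's standard rescore syntax; the window_size, weights, and the assumption that function_score_query is defined above are illustrative, and this is not a confirmed fix:

# Sketch of the query-rescorer idea, not a confirmed fix: run the
# approximate function_score query first, then re-score the top
# window_size hits per shard with an exact Elastiknn query.
body = {
    "query": function_score_query,  # the filtered function_score query from above (assumed defined)
    "rescore": {
        "window_size": 1000,
        "query": {
            "rescore_query": {
                "elastiknn_nearest_neighbors": {
                    "vec": query_vector.flatten().tolist(),
                    "field": "imVec",
                    "similarity": "l2",
                    "model": "exact"
                }
            },
            "query_weight": 0,          # ignore the approximate score
            "rescore_query_weight": 1   # keep only the exact score
        }
    }
}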

@yonatanshemeshhuji (Author)

Follow-up:
I have now run two experiments, both with number of shards = 5.
Both used permutation_lsh; however, one had similarity = cosine and one had similarity = L2.
In both experiments I've seen better results!

However, I also did the following experiment that confused me a bit:
I ran a query with no filters, setting k == candidates == 1000. 72 of the results had country = Poland. I will call this the post-query filtered list.

Then I ran a filtered query on the same vector, this time filtering country == Poland, using the syntax from above and k == candidates == 1000:

body = {
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "filter": {
                        "term": {"country": "Poland"}
                    },
                    "boost": 1
                }
            },
            "boost_mode": "replace",
            "functions": [
                {
                    "elastiknn_nearest_neighbors": {
                        "vec": query_vector.flatten().tolist(),
                        "field": "imVec",
                        "similarity": self._similarity,
                        "model": "permutation_lsh",
                        "candidates": 1000  # ignored.
                    },
                    "weight": 2
                },
            ]
        }
    },
}

I obtained a list of 1000 events from Poland, which I will call the pre-query filtered list.
As I said above, when I used shards = 1 the results were very bad, and here the pre-query filtered list results were much better.

Nevertheless, I expected all 72 results from the post-query filtered list to appear in the pre-query filtered list, since k == candidates, but that was not the case. As a matter of fact, none of them were there.

Possibly related question:
Is it possible that the number of shards increases the number of unique scores when running a function_score query?

@alexklibisz (Owner)

Thanks, this is useful information. It sounds like it could be enough to reproduce with synthetic data. If I understand your description, it sounds like the following is happening (quoting from above):

I'm wondering if the problem is that the query is matching the first 1k/2k/3k documents on the provided filter (Belgium) and then only scoring and re-ranking those first 1k/2k/3k documents that it matched. I don't see this behavior documented in the ES docs though.

By trying it with more shards, you've distributed the relevant documents, so more of them will show up in the first candidates of each shard. I.e., if you have 10k docs matching the term, one shard, and candidates = 1000, then it will only rescore 1000 of the docs matching the term. If you have 5 shards and candidates = 1000, then it will match and rescore about 5000 of the docs matching the term.
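(A back-of-the-envelope sketch of that arithmetic, assuming matching docs are spread evenly across shards:)

# Back-of-the-envelope sketch of the shard arithmetic above, assuming
# matching docs are spread evenly across shards.
matching_docs = 10_000
candidates = 1_000
for shards in (1, 5):
    rescored = min(matching_docs, shards * candidates)
    print(f"{shards} shard(s): ~{rescored} of {matching_docs} matching docs rescored")
# 1 shard(s): ~1000 of 10000 matching docs rescored
# 5 shard(s): ~5000 of 10000 matching docs rescored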

I'll try to reproduce this one day this week with some synthetic data in an integration test.

As a final sanity check, could you see what happens when you set "model": "exact"?

@alexklibisz changed the title from "support for term filter + permutation_lsh hash" to "Function score query omits relevant results on large dataset" on Aug 16, 2021
@alexklibisz added the bug label on Aug 16, 2021
@alexklibisz (Owner)

I changed the title to something that seems to describe the exact issue. I also added the bug tag because this seems like a genuine bug, or at the very least, some strange Elasticsearch behavior that should be documented.

@yonatanshemeshhuji (Author)

Hi Alex, sorry for the big delay.

As a final sanity check, could you see what happens when you set "model": "exact"?
The problem does not appear, i.e., I do pre-filtering and get good results.
To sum it all up:

  1. when using model=permutation_lsh, number of shards = 1 ----> bad results.
  2. when using model=permutation_lsh, number of shards = 5 ----> better results, but still far from good.
  3. when using model=exact, number of shards = 1 or 5 ----> great results; however, as I mentioned, I cannot afford exact queries.

I'm wondering if the problem is that the query is matching the first 1k/2k/3k documents on the provided filter (Belgium) and then only scoring and re-ranking those first 1k/2k/3k documents that it matched. I don't see this behavior documented in the ES docs though.

I think the fact that the results get better as the number of shards increases indicates something like what you describe...

Please update if there are any new discoveries on the subject,
Yonatan.

@avihe commented Sep 5, 2021

Hi @alexklibisz,
I have the exact same problem. Did you figure out what causes this?
Thanks!

@alexklibisz (Owner)

Not yet. I will hopefully have some time to look into it this week. There's also a development guide in the repo if anyone wants to look into it in the meantime.

@alexklibisz (Owner) commented Sep 8, 2021

I was able to reproduce the issue in #306, but so far no fix.

It seems my guess was correct. It looks like Elasticsearch does this:

  1. Run the standard query.
  2. Apply the function to the first size docs returned from that query.
  3. Return those docs, even if there are other docs that would have produced a higher function score.

Whereas the behavior we want is this:

  1. Run the standard query.
  2. Apply the function to all docs returned from the query.
  3. Re-rank the docs based on function score and return the top size docs.
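In rough pseudo-Python, the difference looks like this (an illustration only, not actual Elasticsearch code):

# Observed: the function only sees the first `size` matches of the standard query.
def observed_behavior(matches, score_fn, size):
    return sorted(matches[:size], key=score_fn, reverse=True)

# Desired: the function sees every match, and only then are the top `size` kept.
def desired_behavior(matches, score_fn, size):
    return sorted(matches, key=score_fn, reverse=True)[:size]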

I looked through the docs and the FunctionScoreQuery implementation. I still don't see any option to apply the function to all documents that match the standard query.

For now my best advice is the following (see the sketch after this list):

  1. Increase the size parameter. Then the standard query will match and apply the function to more results.
  2. Increase the standard query specificity. The max size value is 10k, so ideally the standard query returns fewer than 10k results.
  3. Continue increasing the number of shards until each shard has fewer than 10k matches.
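A sketch of applying suggestions 1 and 3 with the Python Elasticsearch client; the index name, shard count, and the `body` variable (the function_score query from above) are assumptions for illustration:

from elasticsearch import Elasticsearch

es = Elasticsearch()

# Suggestion 3: more shards, so each shard holds fewer matching docs
# (must be set at index creation time).
es.indices.create(index="images", body={"settings": {"number_of_shards": 5}})

# Suggestion 1: raise `size` so the function is applied to more matches
# (10k is the default cap).
resp = es.search(index="images", body=body, size=10_000)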

I posted a question on the Elastic forum. I'm curious what they say.

@alexklibisz (Owner)

For some insight: I thought the min_score parameter might help here, i.e., that min_score indicates the minimum overall score after functions are applied and boosted. Unfortunately, it looks like min_score actually just applies to the standard query.

@alexklibisz (Owner)

So far no response on the Elastic discussion board. If there's nothing by the end of the week, I'll most likely just update the docs to reflect this quirk and close this issue. I don't have the time right now to prioritize much more than that.

@yonatanshemeshhuji (Author)

Hi!
Is it possible to get the "raw results" before the exact kNN is applied, i.e., get (candidates * segments * shards) results?
From there I can proceed to do the filtering manually.

@alexklibisz (Owner)

I don't think so. All of that is managed by Elasticsearch, so the best we can do is run the pre-filtering query as a standard query and get back the standard query response.
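A sketch of that two-step fallback; the index name, field names, and the assumption that vectors are readable from _source are illustrative, not confirmed in this thread:

import numpy as np
from elasticsearch import Elasticsearch

es = Elasticsearch()

# Step 1: run the pre-filtering query as a standard query.
hits = es.search(
    index="images",
    body={"query": {"term": {"country": "Belgium"}}},
    size=10_000,
)["hits"]["hits"]

# Step 2: score the returned vectors client-side and keep the top k.
def l2_similarity(q, v):
    # Same shape as Elastiknn's L2 similarity: 1 / (1 + l2_distance).
    return 1.0 / (1.0 + np.linalg.norm(q - np.asarray(v)))

q = query_vector.flatten()  # query_vector as in the queries above
top_100 = sorted(hits, key=lambda h: l2_similarity(q, h["_source"]["imVec"]), reverse=True)[:100]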

Repository owner locked and limited conversation to collaborators on Jul 17, 2022
@alexklibisz converted this issue into a discussion on Jul 17, 2022
Repository owner unlocked this conversation on Mar 27, 2024
This issue was closed.