Function score query omits relevant results on large dataset #298

Closed
yonatanshemeshhuji opened this issue Jul 29, 2021 · 21 comments
Labels: bug (Something isn't working)

@yonatanshemeshhuji

Hi! First of all, let me say I admire your work, it is truly amazing.

When running the following query:

body = {
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "filter": {
                        "term": {"country": "Belgium"}
                    },
                    "boost": 1
                }
            },
            "boost_mode": "replace",
            "functions": [
                {
                    "elastiknn_nearest_neighbors": {
                        "vec": query_vector.flatten().tolist(),
                        "field": "imVec",
                        "similarity": self._similarity,
                        "model": "exact",
                        "candidates": 1000
                    },
                    "weight": 2
                },
            ]
        }
    },
}

I am getting great results. However, I cannot afford 'exact' queries. When trying the following query:

body = {
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "filter": {
                        "term": {"country": "Belgium"}
                    },
                    "boost": 1
                }
            },
            "boost_mode": "replace",
            "functions": [
                {
                    "elastiknn_nearest_neighbors": {
                        "vec": query_vector.flatten().tolist(),
                        "field": "imVec",
                        "similarity": self._similarity,
                        "model": "permutation_lsh",
                        "candidates": 1000
                    },
                    "weight": 2
                },
            ]
        }
    },
}

(note that the only difference is "model": "permutation_lsh" instead of "exact"), I'm getting bad results.
In the examples above, self._similarity == "L2".
I hope I am not missing anything and wasting your time. Can this be supported?

@alexklibisz (Owner)

Hi, thanks for the kind remarks.

Permutation LSH is mostly intended for Cosine similarity. It can be computed with L1 and L2, but it doesn't really make sense to use them together. So if you need L2 similarity, I would recommend the Cosine LSH mapping and query: https://elastiknn.com/api/#cosine-lsh-mapping, https://elastiknn.com/api/#cosine-lsh-query. Those can, of course, also be used with function score queries.
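For reference, a minimal sketch of what that mapping and query could look like, based on the linked docs. The dims and LSH parameters (L, k) below are illustrative placeholders, not values tuned for this dataset:

mapping = {
    "properties": {
        "imVec": {
            "type": "elastiknn_dense_float_vector",
            "elastiknn": {
                "dims": 512,        # set to the actual vector dimensionality
                "model": "lsh",
                "similarity": "cosine",
                "L": 99,            # number of hash tables (tune for recall)
                "k": 1              # hash functions per table
            }
        }
    }
}

query = {
    "elastiknn_nearest_neighbors": {
        "field": "imVec",
        "vec": query_vector.flatten().tolist(),
        "model": "lsh",
        "similarity": "cosine",
        "candidates": 1000
    }
}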

Also, if you are bottlenecked by performance even after filtering, read this section of the docs: https://elastiknn.com/api/#using-stored-fields-for-faster-queries

@yonatanshemeshhuji (Author)

Thanks for the quick reply. I just ran the above query with cosine similarity (the data is currently mapped with permutation_lsh) and got bad results.
However, when I run with no filters using cosine similarity with the following query:

body = {
    "query": {
        "elastiknn_nearest_neighbors": {
            "vec": query_vector.flatten().tolist(),
            "field": "imVec",
            "similarity": "angular",  # I saw you renamed it to cosine; I haven't pulled yet.
            "model": "permutation_lsh",
            "candidates": 100
        }
    }
}

I am getting very good results, so I suspect the issue is related to the combination of function_score and permutation_lsh.
I would be very grateful if you could take a look at that.
I will surely give the cosine_lsh mapping a try and see if the function score behaves nicely with this mapping.
Thanks again for your work.

@alexklibisz (Owner) commented Jul 29, 2021

Just ran the above query with cosine similarity (data is currently mapped with permutation_lsh) and got bad results.

It's possible your data just isn't suited for permutation LSH. It basically assumes there are meaningful differences in the absolute values of your vectors, which is not always a good assumption. The docs explain in more detail how that algorithm works.


I am getting very good results, so I suspect the issue is related to the combination of function_score and permutation_lsh.

Note this caveat from the docs:

When using "model": "lsh", the "candidates" parameter is ignored and vectors are not re-scored with the exact similarity like they are with an elastiknn_nearest_neighbors query. Instead, the score is: max similarity score * proportion of matching hashes. This is a necessary consequence of the fact that score functions take a doc ID and must immediately return a score.


I would be very grateful if you could take a look at that.

I'd need a sample of data and some way to easily reproduce the problem.

@yonatanshemeshhuji (Author)

Oh, this caveat does explain the behavior I was witnessing! Just to make sure I understand: the caveat applies to any kind of "lsh" model (e.g. permutation_lsh), correct?
So in practice, I can manually rescore the top k results myself?
Thanks
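(For illustration, a minimal sketch of such manual client-side rescoring, assuming each hit's vector can be read back from _source under "imVec"; the field name and surrounding code are assumptions from this thread, not a confirmed recipe.)

import numpy as np

# Minimal sketch of client-side rescoring. Assumes each hit's vector is
# available in _source under "imVec" (an assumption, not confirmed here).
def rescore_l2(query_vector, hits, top_k=100):
    def l2_score(vec):
        # Same shape as Elastiknn's L2 similarity: 1 / (1 + l2_distance).
        return 1.0 / (1.0 + np.linalg.norm(query_vector - np.asarray(vec)))
    return sorted(hits, key=lambda h: l2_score(h["_source"]["imVec"]), reverse=True)[:top_k]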

@yonatanshemeshhuji (Author) commented Aug 1, 2021

Hi Alex,
Continuing our discussion, I will describe my experiments here as clearly as possible. Unfortunately, I cannot supply any data or code, as I am working on a corporate project.
My data consists of ~1M images. Each image was passed through a conv net to extract features.
What I've done:

  1. Mapped all 1M image vectors with some metadata alongside each sample using the syntax from here. One of these metadata fields is country.
  2. Ran an ANN query on a sample (that does not live in the dataset) using this syntax with number of candidates = 100. Great success: I found relevant neighbors. Here, I used cosine and L2 similarity; both yielded great results. Exactly two of the results among the top 100 candidates had result.country == Belgium. I will call them B1 and B2 for clarity.
  3. Ran a function score query on the same sample, this time filtering for samples from Belgium, using this syntax. Here, I did not get good results.
  4. After reading your latest reply yesterday, I did part (3) again and stored the top 100 results that came back. None of them were B1 or B2. This contradicts the explanation I got from you as I understood it: given that I did part (2) with candidates = 100, I would expect B1 and B2 to appear in Belgium's top 100 results (since Belgium samples are only a subset of the original 1M dataset I used in parts (1-2)). Only when I searched the top 3,000 Belgium-filtered results did B1 appear, and when increasing to the top 4,000 results, B2 appeared.

I hope this info is sufficient for you to check if there is maybe a bug concerning the permutation_lsh + function score combination. I am planning to run this experiment again with model = lsh, and will update the results here soon.
Thanks, Yonatan.

@alexklibisz (Owner)

Hey, thanks for all the detail. I'll try to find some time to review your results this week.

@alexklibisz (Owner)

Ok, some followup questions:

Are you specifically using the function_score_query, documented here?

Just as a sanity check, when you did the function score query, did all of the returned docs have country = Belgium?

How many shards do you have in the index? Does the behavior change if you use a different number of shards?

I agree this is a strange behavior. I have a guess for what's happening but it could be completely wrong based on the answers to the questions above.

@alexklibisz added the Q&A label on Aug 9, 2021
@yonatanshemeshhuji (Author)

Hi!

Are you specifically using the function_score_query, documented here?

Yes.

Just as a sanity check, when you did the function score query, did all of the returned docs have country = Belgium?

Yes, when running the second "body" from the original post.

How many shards do you have in the index? Does the behavior change if you use a different number of shards?

At the time of trying, we had number of shards = 1. I will retry with shards = 5, but only by the end of the week, and will keep you updated on the results.

Thanks a lot, Yonatan.

@alexklibisz (Owner)

Thanks. I'm wondering if the problem is that the query is matching the first 1k/2k/3k documents on the provided filter (Belgium) and then only scoring and re-ranking those first 1k/2k/3k documents that it matched. I don't see this behavior documented in the ES docs though. Hmm.

@alexklibisz (Owner)

I'll try to find some time this week to reproduce this pattern. If you can, it would be interesting to see if you get similar results using the query rescorer.
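For concreteness, a hedged sketch of what the rescorer approach might look like. This wraps an exact Elastiknn query in Elasticsearch's standard rescore syntax; the window_size, weights, and the assumption that function_score_query is defined above are illustrative, and this is not a confirmed fix:

# Sketch of the query-rescorer idea, not a confirmed fix: run the
# approximate function_score query first, then re-score the top
# window_size hits per shard with an exact Elastiknn query.
body = {
    "query": function_score_query,  # the filtered function_score query from above (assumed defined)
    "rescore": {
        "window_size": 1000,
        "query": {
            "rescore_query": {
                "elastiknn_nearest_neighbors": {
                    "vec": query_vector.flatten().tolist(),
                    "field": "imVec",
                    "similarity": "l2",
                    "model": "exact"
                }
            },
            "query_weight": 0,          # ignore the approximate score
            "rescore_query_weight": 1   # keep only the exact score
        }
    }
}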

@yonatanshemeshhuji (Author)

Follow-up:
I have now run two experiments, both with number of shards = 5.
Both used permutation_lsh; however, one had similarity = cosine and one had similarity = L2.
In both experiments I've seen better results!

However, I also did the following experiment that confused me a bit:
I ran a query with no filters, setting k == candidates == 1000. 72 of the results had country = Poland. I will call this the post-query filtered list.

Then I ran a filtered query on the same vector, this time filtering country == Poland, using the syntax from above and k == candidates == 1000:

body = {
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "filter": {
                        "term": {"country": "Poland"}
                    },
                    "boost": 1
                }
            },
            "boost_mode": "replace",
            "functions": [
                {
                    "elastiknn_nearest_neighbors": {
                        "vec": query_vector.flatten().tolist(),
                        "field": "imVec",
                        "similarity": self._similarity,
                        "model": "permutation_lsh",
                        "candidates": 1000  # ignored.
                    },
                    "weight": 2
                },
            ]
        }
    },
}

I obtained a list of 1000 events from Poland, which I will call the pre-query filtered list.
As I said above, when I used shards = 1 the results were very bad, and here the pre-query filtered list results were much better.

Nevertheless, I expected all 72 results from the post-query filtered list to appear in the pre-query filtered list, since k == candidates, but that was not the case. As a matter of fact, none of them were there.

Possibly related question:
Is it possible that the number of shards increases the number of unique scores when running a function_score query?

@alexklibisz (Owner)

Thanks, this is useful information. It sounds like it could be enough to reproduce with synthetic data. If I understand your description, it sounds like the following is happening (quoting from above):

I'm wondering if the problem is that the query is matching the first 1k/2k/3k documents on the provided filter (Belgium) and then only scoring and re-ranking those first 1k/2k/3k documents that it matched. I don't see this behavior documented in the ES docs though.

By trying it with more shards, you've distributed the relevant documents, so more of them will show up in the first candidates of each shard. I.e., if you have 10k docs matching the term, one shard, and candidates = 1000, then it will only rescore 1000 of the docs matching the term. If you have 5 shards and candidates = 1000, then it will match and rescore about 5000 of the docs matching the term.
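(A back-of-the-envelope sketch of that arithmetic, assuming matching docs are spread evenly across shards:)

# Back-of-the-envelope sketch of the shard arithmetic above, assuming
# matching docs are spread evenly across shards.
matching_docs = 10_000
candidates = 1_000
for shards in (1, 5):
    rescored = min(matching_docs, shards * candidates)
    print(f"{shards} shard(s): ~{rescored} of {matching_docs} matching docs rescored")
# 1 shard(s): ~1000 of 10000 matching docs rescored
# 5 shard(s): ~5000 of 10000 matching docs rescored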

I'll try to reproduce this one day this week with some synthetic data in an integration test.

As a final sanity check, could you see what happens when you set "model": "exact"?

@alexklibisz changed the title from "support for term filter + permutation_lsh hash" to "Function score query omits relevant results on large dataset" on Aug 16, 2021
@alexklibisz added the bug label on Aug 16, 2021
@alexklibisz (Owner)

I changed the title to something that seems to describe the exact issue. I also added the bug tag because this seems like a genuine bug, or at the very least, some strange Elasticsearch behavior that should be documented.

@yonatanshemeshhuji (Author)

Hi Alex, sorry for the big delay.

As a final sanity check, could you see what happens when you set "model": "exact"?
The problem does not appear, i.e., I do pre-filtering and get good results.
To sum it all up:

  1. when using model=permutation_lsh, number of shards = 1 ----> bad results.
  2. when using model=permutation_lsh, number of shards = 5 ----> better results, but still far from good.
  3. when using model=exact, number of shards = 1 or 5 ----> great results; however, as I mentioned, I cannot afford exact queries.

I'm wondering if the problem is that the query is matching the first 1k/2k/3k documents on the provided filter (Belgium) and then only scoring and re-ranking those first 1k/2k/3k documents that it matched. I don't see this behavior documented in the ES docs though.

I think the fact that the results get better as the number of shards increases indicates something like what you describe...

Please update if there are any new discoveries on the subject,
Yonatan.

@avihe commented Sep 5, 2021

Hi @alexklibisz,
I have the exact same problem. Did you figure out what causes this?
Thanks!

@alexklibisz (Owner)

Not yet. I will hopefully have some time to look into it this week. There's also a development guide in the repo if anyone wants to look into it in the meantime.

@alexklibisz (Owner) commented Sep 8, 2021

I was able to reproduce the issue in #306, but so far no fix.

It seems my guess was correct. It looks like Elasticsearch does this:

  1. Run the standard query.
  2. Apply the function to the first size docs returned from that query.
  3. Return those docs, even if there are other docs that would have produced a higher function score.

Whereas the behavior we want is this:

  1. Run the standard query.
  2. Apply the function to all docs returned from the query.
  3. Re-rank the docs based on function score and return the top size docs.
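In rough pseudo-Python, the difference looks like this (an illustration only, not actual Elasticsearch code):

# Observed: the function only sees the first `size` matches of the standard query.
def observed_behavior(matches, score_fn, size):
    return sorted(matches[:size], key=score_fn, reverse=True)

# Desired: the function sees every match, and only then are the top `size` kept.
def desired_behavior(matches, score_fn, size):
    return sorted(matches, key=score_fn, reverse=True)[:size]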

I looked through the docs and the FunctionScoreQuery implementation. I still don't see any option to apply the function to all documents that match the standard query.

For now my best advice is the following (see the sketch after this list):

  1. Increase the size parameter. Then the standard query will match and apply the function to more results.
  2. Increase the standard query specificity. The max size value is 10k, so ideally the standard query returns fewer than 10k results.
  3. Continue increasing the number of shards until each shard has fewer than 10k matches.
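A sketch of applying suggestions 1 and 3 with the Python Elasticsearch client; the index name, shard count, and the `body` variable (the function_score query from above) are assumptions for illustration:

from elasticsearch import Elasticsearch

es = Elasticsearch()

# Suggestion 3: more shards, so each shard holds fewer matching docs
# (must be set at index creation time).
es.indices.create(index="images", body={"settings": {"number_of_shards": 5}})

# Suggestion 1: raise `size` so the function is applied to more matches
# (10k is the default cap).
resp = es.search(index="images", body=body, size=10_000)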

I posted a question on the Elastic forum. I'm curious what they say.

@alexklibisz (Owner)

For some insight: I thought the min_score parameter might help here, i.e., that min_score indicates the minimum overall score after functions are applied and boosted. Unfortunately, it looks like min_score actually just applies to the standard query.

@alexklibisz (Owner)

So far no response on the Elastic discussion board. If there's nothing by the end of the week, I'll most likely just update the docs to reflect this quirk and close this issue. I don't have the time right now to prioritize much more than that.

@yonatanshemeshhuji (Author)

Hi!
Is it possible to get the "raw results" before the exact kNN is applied, i.e., get (candidates * segments * shards) results?
From there I can proceed to do the filtering manually.

@alexklibisz (Owner)

I don't think so. All of that is managed by Elasticsearch, so the best we can do is run the pre-filtering query as a standard query and get back the standard query response.
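A sketch of that two-step fallback; the index name, field names, and the assumption that vectors are readable from _source are illustrative, not confirmed in this thread:

import numpy as np
from elasticsearch import Elasticsearch

es = Elasticsearch()

# Step 1: run the pre-filtering query as a standard query.
hits = es.search(
    index="images",
    body={"query": {"term": {"country": "Belgium"}}},
    size=10_000,
)["hits"]["hits"]

# Step 2: score the returned vectors client-side and keep the top k.
def l2_similarity(q, v):
    # Same shape as Elastiknn's L2 similarity: 1 / (1 + l2_distance).
    return 1.0 / (1.0 + np.linalg.norm(q - np.asarray(v)))

q = query_vector.flatten()  # query_vector as in the queries above
top_100 = sorted(hits, key=lambda h: l2_similarity(q, h["_source"]["imVec"]), reverse=True)[:100]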

Repository owner locked and limited conversation to collaborators on Jul 17, 2022
@alexklibisz converted this issue into a discussion on Jul 17, 2022
Repository owner unlocked this conversation on Mar 27, 2024
This issue was closed.