Minor fixes to the `match` query #8352
This PR mostly applies @clintongormley's recommendations on #6932.
@jpountz I don't think the fuzzy rewrite should default to top_terms - it SHOULD be constant_score (unless the score only depends on edit distance, not IDF?)
@clintongormley The fuzzy query does indeed take the edit distance into account when scoring.
We created BlendedTermQuery to resolve some of these issues in multi-match. In that case it was dealing with the side effects of auto-expanding the set of fields being queried, but it applies equally when auto-expanding the set of field values considered, e.g. when doing fuzzy. The reason we want to keep some notion of IDF rather than dropping it completely is so that, given a search that has multiple clauses, e.g. `John OR Patitucci~`, we still favour a match on the rarer of these two top-level clauses (the variants of Patitucci, all of which are rewarded equally in terms of IDF but not edit distance).
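To make the argument above concrete, here is a minimal sketch of the idea in Python. It is not the actual BlendedTermQuery implementation: the IDF formula, the choice of the maximum document frequency as the shared statistic, and all document frequencies and edit-distance boosts are assumptions made purely for illustration.

```python
import math


def idf(doc_freq, num_docs):
    # Illustrative IDF formula (assumed here): 1 + ln(N / (df + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def blended_clause_scores(variants, num_docs):
    """Score fuzzy variants of one clause with a single shared IDF.

    variants maps term -> (doc_freq, edit_distance_boost).
    All variants get the same IDF, so they differ only by their boost.
    Using the max doc freq as the shared statistic is an assumption.
    """
    shared_df = max(df for df, _ in variants.values())
    shared_idf = idf(shared_df, num_docs)
    return {term: shared_idf * boost for term, (_, boost) in variants.items()}


# Hypothetical numbers: "john" is common, the Patitucci variants are rare.
num_docs = 1000
john_score = idf(200, num_docs)
patitucci_scores = blended_clause_scores(
    {"patitucci": (3, 1.0), "patitucce": (1, 0.8)}, num_docs
)
```

With these made-up numbers, every variant of the rarer clause still outscores the common term, while the variants are ranked among themselves only by their edit-distance boost.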
I agree with @markharwood. The BlendedTermQuery should be used whenever two query terms are synonyms of each other and should be treated as 'one thing'. It tries to adjust statistics independently of the scoring function (which may have no concept of IDF) to deal with the problem. But I think for it to work, it would need per-term boost support? Then we need a rewrite method that can build this instead of BooleanQuery; it would look a lot like the boolean one: https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/MultiTermQuery.java#L140
@rmuir why does it need per-term boost? In order to, e.g., take the edit distance into account?
Also wondering if we should expose the BlendedTermQuery in the DSL for expert use cases?
yes
I assumed it already was, actually, but it looks like it's only exposed as the cross_fields mode for multi match? I agree with you, it would be nice if it were exposed in some way that can be used for query-time synonyms in the same field too.
I've opened a new issue for the BlendedTerms query (#9103) so that this PR can be merged.
@jpountz afaik this can be merged, do you wanna go ahead?
@jpountz pinging in case you've forgotten
458382e to 4a8b0f1
Rebased to a recent master. The default rewrite method for fuzzy queries changed in the meantime, so I had to change it. I'll merge this PR once #12129 is in too.
Fixed documentation since the default rewrite method for fuzzy queries is to select top terms, fixed usage of the fuzzy rewrite method, and removed unused `rewrite` parameter. Close elastic#6932
4a8b0f1 to da5fa6c
Minor fixes to the `match` query.
Fixed documentation since the default rewrite method for fuzzy queries is to select top terms, fixed usage of the fuzzy rewrite method, and removed unused `rewrite` parameter.
Close #6932
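For readers unfamiliar with the parameter being fixed, here is a rough sketch of a `match` query body that sets the fuzzy rewrite method explicitly. The helper function, the field name `name`, and the value `top_terms_10` are illustrative assumptions; the accepted rewrite values depend on the Elasticsearch version, so check the documentation for the release in use.

```python
import json


def fuzzy_match_query(field, text, fuzziness="AUTO", fuzzy_rewrite=None):
    # Hypothetical helper: builds a match query body as a plain dict.
    params = {"query": text, "fuzziness": fuzziness}
    if fuzzy_rewrite is not None:
        # Only send fuzzy_rewrite when the caller overrides the default.
        params["fuzzy_rewrite"] = fuzzy_rewrite
    return {"query": {"match": {field: params}}}


# Field name and rewrite value are placeholders, not taken from the PR.
body = fuzzy_match_query("name", "Patitucci", fuzzy_rewrite="top_terms_10")
print(json.dumps(body, indent=2))
```

Leaving `fuzzy_rewrite` unset keeps whatever default the server applies, which is the behaviour this PR's documentation fix describes.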