Minor fixes to the `match` query #8352

jpountz · 2014-11-05T14:36:27Z

Fixed documentation since the default rewrite method for fuzzy queries is to
select top terms, fixed usage of the fuzzy rewrite method, and removed unused
rewrite parameter.
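
For illustration only (not part of the PR): a minimal sketch of how a `match` query request body with an explicit fuzzy rewrite method could be built. The field name `title`, the query text, and the helper `fuzzy_match_body` are hypothetical; `fuzziness` and `fuzzy_rewrite` are the `match` query parameters this PR is about, and `top_terms_10` is assumed here as one example of a top-terms rewrite value.

```python
import json


def fuzzy_match_body(field, text, fuzziness="AUTO", fuzzy_rewrite=None):
    """Build a `match` query body (hypothetical helper).

    `fuzzy_rewrite` is only included when the caller sets it, so the
    server-side default rewrite method applies otherwise.
    """
    params = {"query": text, "fuzziness": fuzziness}
    if fuzzy_rewrite is not None:
        params["fuzzy_rewrite"] = fuzzy_rewrite
    return {"query": {"match": {field: params}}}


# Hypothetical field and query text; the rewrite value is an assumption.
body = fuzzy_match_body("title", "patitucci", fuzzy_rewrite="top_terms_10")
print(json.dumps(body, indent=2))
```

Leaving `fuzzy_rewrite` unset keeps the request independent of whatever the default rewrite method is in a given version.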

Close #6932

jpountz · 2014-11-05T14:37:04Z

This PR mostly applies @clintongormley's recommendations on #6932

clintongormley · 2014-11-05T14:39:42Z

@jpountz i don't think the fuzzy rewrite should default to top_terms - it SHOULD be constant_score (unless the score only depends on edit distance, not idf?)

jpountz · 2014-11-05T14:57:57Z

@clintongormley The fuzzy query does indeed take the edit distance into account when scoring.

clintongormley · 2014-11-05T15:21:03Z

@jpountz but it still takes IDF into account, which means that misspellings are considered more relevant than the correct spelling.

I think that fuzzy queries across the board should either:

- take just edit distance into account, or
- be constant score

@rmuir @s1monw thoughts?

markharwood · 2014-11-05T15:30:17Z

We created BlendedTermQuery to resolve some of these issues in multi-match. In that case it was dealing with the side effects of auto-expanding the set of fields being queried, but it applies equally when auto-expanding the set of field values considered, e.g. when doing fuzzy.
In any form of auto-expansion (field or terms) it is important to counter-act IDF's tendency to reward the most bizarre interpretation of the original input (the wrong field or the typo). BlendedTermQuery does this by taking all auto-expanded forms of a root clause and giving them the same DF for the purpose of IDF calcs.

The reason we want to keep some notion of IDF rather than dropping it completely is so that, given a search that has multiple clauses, e.g. `John OR Patitucci~`, we still favour a match on the rarer of these 2 top-level clauses (the variants of Patitucci, all of which are rewarded equally in terms of IDF but not edit distance).
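
For illustration only: a rough numeric sketch of the IDF effect described above. It assumes the classic Lucene TF-IDF formula `1 + ln(numDocs / (docFreq + 1))`, made-up document frequencies, and a hypothetical misspelling `patitucii`. Using the maximum DF as the shared value is also an assumption made for this sketch, not a statement of what BlendedTermQuery does internally.

```python
import math


def idf(doc_freq, num_docs):
    """Classic TF-IDF style inverse document frequency."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


NUM_DOCS = 10_000  # hypothetical index size

# Hypothetical document frequencies: the typo is far rarer than the real term.
doc_freqs = {"patitucci": 50, "patitucii": 1}

# Per-term IDF: the rare misspelling gets the larger weight.
per_term = {term: idf(df, NUM_DOCS) for term, df in doc_freqs.items()}

# Blended: every expanded variant is scored with one shared DF
# (assumed here to be the maximum), so IDF no longer favours the typo.
shared_df = max(doc_freqs.values())
blended = {term: idf(shared_df, NUM_DOCS) for term in doc_freqs}

for term in doc_freqs:
    print(f"{term}: per-term idf={per_term[term]:.3f} blended idf={blended[term]:.3f}")
```

With a shared DF the variants tie on IDF, while the clause as a whole still carries an IDF that can be compared against other top-level clauses such as `John`.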

rmuir · 2014-11-05T17:30:07Z

I agree with @markharwood . The BlendedTermQuery should be used whenever two query terms are synonyms of each other and should be treated as 'one thing'. It tries to adjust statistics independently of the scoring function (which may have no concept of IDF) to deal with the problem.

But I think for it to work, it would need per-term boost support? Then we need a rewrite method that can build this instead of BooleanQuery; it would look a lot like the boolean one: https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/MultiTermQuery.java#L140
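
For illustration only: a sketch of what a per-term boost driven by edit distance might look like once all variants share one IDF. The boost formula `1 - distance / len(query_term)`, the shared IDF value, and the misspelling `patitucii` are assumptions for this example; this is not Lucene's rewrite code.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_distance_boost(query_term, candidate):
    """Hypothetical boost: 1.0 for an exact match, lower as distance grows."""
    distance = edit_distance(query_term, candidate)
    return max(0.0, 1.0 - distance / len(query_term))


# Assumed shared IDF for all variants (10,000 docs, shared DF of 50).
shared_idf = 1.0 + math.log(10_000 / 51)

query_term = "patitucci"
scores = {
    candidate: shared_idf * edit_distance_boost(query_term, candidate)
    for candidate in ("patitucci", "patitucii")
}
print(scores)
```

Because the IDF factor is identical for every variant, only the boost separates them, so the exact spelling ranks above the typo.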

clintongormley · 2014-11-06T09:54:57Z

@rmuir why does it need per-term boost? In order to eg take the edit distance into account?

clintongormley · 2014-11-06T09:56:18Z

Also wondering if we should expose the BlendedTermQuery in the DSL for expert use cases?

rmuir · 2014-11-06T10:07:07Z

> @rmuir why does it need per-term boost? In order to eg take the edit distance into account?

yes

rmuir · 2014-11-06T12:08:12Z

> Also wondering if we should expose the BlendedTermQuery in the DSL for expert use cases?

I assumed it already was, actually, but it looks like it's only exposed as the cross_fields mode for multi match?

I agree with you, it would be nice if it were exposed in some way that can be used for query-time synonyms in the same field too.

clintongormley · 2014-12-30T16:32:32Z

I've opened a new issue for the BlendedTerms query (#9103) so that this PR can be merged.

s1monw · 2015-03-20T21:34:09Z

@jpountz afaik this can be merged, do you wanna go ahead?

clintongormley · 2015-05-29T16:59:24Z

@jpountz pinging in case you've forgotten

jpountz · 2015-07-08T14:24:04Z

Rebased to a recent master. The default rewrite method for fuzzy queries changed in the meantime, so I had to change it. I'll merge this PR once #12129 is in too.

jpountz added the review label Nov 5, 2014

jpountz removed the review label Nov 9, 2014

clintongormley added the :Query DSL label Nov 29, 2014

clintongormley mentioned this pull request Nov 29, 2014

Fuzzy query ranks typos over exact matches #3125

Closed

clintongormley mentioned this pull request Dec 30, 2014

Wrap stacked tokens in match query in a BlendedTerms query for better scoring #9103

Closed

drewr force-pushed the master branch from dcc3da0 to 7c20a8a Compare February 20, 2015 16:48

s1monw assigned jpountz Mar 20, 2015

clintongormley added >bug v2.0.0-beta1 labels May 29, 2015

clintongormley changed the title ~~Internal: Minor fixes to the match query~~ Minor fixes to the match query Jun 8, 2015

jpountz force-pushed the fix/match_rewrite branch from 458382e to 4a8b0f1 Compare July 8, 2015 14:22

Minor fixes to the match query.

da5fa6c

Fixed documentation since the default rewrite method for fuzzy queries is to select top terms, fixed usage of the fuzzy rewrite method, and removed unused `rewrite` parameter. Close elastic#6932

jpountz force-pushed the fix/match_rewrite branch from 4a8b0f1 to da5fa6c Compare July 8, 2015 15:03

jpountz added a commit that referenced this pull request Jul 8, 2015

Merge pull request #8352 from jpountz/fix/match_rewrite

acf8e2e

Minor fixes to the `match` query.

jpountz merged commit acf8e2e into elastic:master Jul 8, 2015

jpountz deleted the fix/match_rewrite branch July 8, 2015 15:03

clintongormley added the :Search/Search label and removed the :Query DSL label Feb 14, 2018
