Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor fixes to the match query #8352

Merged
merged 1 commit into from
Jul 8, 2015
Merged

Conversation

jpountz
Copy link
Contributor

@jpountz jpountz commented Nov 5, 2014

Fixed documentation since the default rewrite method for fuzzy queries is to
select top terms, fixed usage of the fuzzy rewrite method, and removed unused
rewrite parameter.

Close #6932

@jpountz jpountz added the review label Nov 5, 2014
@jpountz
Copy link
Contributor Author

jpountz commented Nov 5, 2014

This PR mostly applies @clintongormley 's recommandations on #6932

@clintongormley
Copy link

@jpountz i don't think the fuzzy rewrite should default to top_terms - it SHOULD be constant_score (unless the score only depends on edit distance, not idf?)

@jpountz
Copy link
Contributor Author

jpountz commented Nov 5, 2014

@clintongormley The fuzzy query does indeed take the edit distance into account when scoring.

@clintongormley
Copy link

@jpountz but it still takes IDF into account, which means that misspellings are considered more relevant than the correct spelling.

I think that fuzzy queries across the board should either:

  • take just edit distance into account, or
  • be constant score

@rmuir @s1monw thoughts?

@markharwood
Copy link
Contributor

We created BlendedTermQuery to resolve some of these issues in multi-match. In that case it was dealing with the side-effects of auto-expanding the set of fields being queried but applies equally when auto-expanding the set of field values considered e.g. when doing fuzzy.
In any form of auto-expansion (field or terms) it is important to counter-act IDF's tendency to reward the most bizarre interpretation of the original input (the wrong field or the typo). BlendedTermQuery does this by taking all auto-expanded forms of a root clause and giving them the same DF for the purpose of IDF calcs.

The reason we want to keep some notion of IDF rather than dropping it completely is so that given a search that has multiple clauses e.g. John OR Patitucci~ we still favour a match on the rarer of these 2 top-level clauses (the variants of Patitucci, all of which are rewarded equally in terms of IDF but not edit distance)

@rmuir
Copy link
Contributor

rmuir commented Nov 5, 2014

I agree with @markharwood . The BlendedTermQuery should be used whenever two query terms are synonyms of each other and should be treated as 'one thing'. It tries to adjust statistics independently of the scoring function (which may have no concept of IDF) to deal with the problem.

But I think for it to work, it would need per-term boost support? Then we need a rewrite method that can build this instead of BooleanQuery, it would look a lot like the boolean one: https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/search/MultiTermQuery.java#L140

@clintongormley
Copy link

@rmuir why does it need per-term boost? In order to eg take the edit distance into account?

@clintongormley
Copy link

Also wondering if we should exposed the BlendedTermQuery in the DSL for expert use cases?

@rmuir
Copy link
Contributor

rmuir commented Nov 6, 2014

@rmuir why does it need per-term boost? In order to eg take the edit distance into account?

yes

@rmuir
Copy link
Contributor

rmuir commented Nov 6, 2014

Also wondering if we should exposed the BlendedTermQuery in the DSL for expert use cases?

I assumed it already was actually: but it looks like its only exposed as the cross_fields mode for multi match?

I agree with you, it would be nice if it were exposed in some way that can be used for query-time synonyms in the same field too.

@clintongormley
Copy link

I've opened a new issue for the BlendedTerms query (#9103) so that this PR can be merged.

@s1monw
Copy link
Contributor

s1monw commented Mar 20, 2015

@jpountz afaik this can be merged, do you wanna go ahead?

@clintongormley
Copy link

@jpountz pinging in case you've forgotten

@clintongormley clintongormley changed the title Internal: Minor fixes to the match query Minor fixes to the match query Jun 8, 2015
@jpountz
Copy link
Contributor Author

jpountz commented Jul 8, 2015

Rebased to a recent master. The default rewrite method for fuzzy queries changed in the mean time so I had to change it. I'll merge this PR once #12129 is in too.

Fixed documentation since the default rewrite method for fuzzy queries is to
select top terms, fixed usage of the fuzzy rewrite method, and removed unused
`rewrite` parameter.

Close elastic#6932
jpountz added a commit that referenced this pull request Jul 8, 2015
Minor fixes to the `match` query.
@jpountz jpountz merged commit acf8e2e into elastic:master Jul 8, 2015
@jpountz jpountz deleted the fix/match_rewrite branch July 8, 2015 15:03
@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Query DSL labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search/Search Search-related issues that do not fall into other categories v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Match query rewrite and fuzzy_rewrite not applied
5 participants