Query: Provide an option to analyze wildcard/prefix in query_string / field queries #787

Closed
emedina opened this issue Mar 17, 2011 · 1 comment


emedina commented Mar 17, 2011

Add a flag called analyze_wildcard to both the query_string and field queries. When it is set, a best effort will be made to analyze wildcard and prefix queries as well.
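A request body using the proposed flag might look like the sketch below. It assumes the flag sits alongside the usual query_string options (query, default_field), as it does in later Elasticsearch releases; the field name "text" is hypothetical, and the exact option set accepted in 2011 may differ.

```python
import json

# Sketch of a query_string request body with the proposed flag enabled.
# "text" is a hypothetical field name; "*phone*" is the wildcard term from the example below.
body = {
    "query": {
        "query_string": {
            "default_field": "text",
            "query": "*phone*",
            "analyze_wildcard": True,  # ask the parser to analyze the wildcard term
        }
    }
}

print(json.dumps(body, indent=2))
```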

More details:

When we use an analyzer that stems terms into tokens and later want to search those analyzed terms with a wildcard, the wildcard search terms are not analyzed by default, because analysis could turn a single term into several tokens and the search engine would not know which one to use:

http://www.jguru.com/faq/view.jsp?EID=538312

However, in certain circumstances, when the scope of the search can be constrained in favor of better expected results, it would be useful to tell the search engine to analyze wildcard terms before executing the search, allowing a more precise search (at least, in terms of what users expect).

Here is an example with the Spanish analyzer (which uses the Snowball stemmer):

  • We index the phrase "I have an iPhone"
  • We index the phrase "I love the triad iPad/iPhone/iPod"
  • We index the phrase "I found the perfect combination: iPhone/MP3"

With the current 'query_string' query, a search for "*phone*" returns only the last phrase, because of the way the terms have been analyzed:

"I have an iPhone":

{"tokens":[{"token":"i","start_offset":0,"end_offset":1,"type":"","position":1},{"token":"hav","start_offset":2,"end_offset":6,"type":"","position":2},{"token":"an","start_offset":7,"end_offset":9,"type":"","position":3},{"token":"iphon","start_offset":10,"end_offset":16,"type":"","position":4}]}

"I love the triad iPad/iPhone/iPod":

{"tokens":[{"token":"i","start_offset":0,"end_offset":1,"type":"","position":1},{"token":"lov","start_offset":2,"end_offset":6,"type":"","position":2},{"token":"the","start_offset":7,"end_offset":10,"type":"","position":3},{"token":"tri","start_offset":11,"end_offset":16,"type":"","position":4},{"token":"ipad","start_offset":17,"end_offset":21,"type":"","position":5},{"token":"iphon","start_offset":22,"end_offset":28,"type":"","position":6},{"token":"ipod","start_offset":29,"end_offset":33,"type":"","position":7}]}

"I found the perfect combination: iPhone/MP3":

{"tokens":[{"token":"i","start_offset":0,"end_offset":1,"type":"","position":1},{"token":"found","start_offset":2,"end_offset":7,"type":"","position":2},{"token":"the","start_offset":8,"end_offset":11,"type":"","position":3},{"token":"perfect","start_offset":12,"end_offset":19,"type":"","position":4},{"token":"combination","start_offset":20,"end_offset":31,"type":"","position":5},{"token":"iphone/mp3","start_offset":33,"end_offset":43,"type":"","position":6}]}

Note that the last phrase keeps "iPhone/MP3" as the single token "iphone/mp3", with its 'e' intact. That is why it is the only phrase matching a 'query_string' of "*phone*" (similarly unexpected results occur with just a leading or trailing wildcard).
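The mismatch can be reproduced with a few lines of Python. This sketch approximates Lucene wildcard matching with the standard library's fnmatchcase (both treat * and ? the same way, although fnmatch also supports [...] classes, which are unused here) and uses the "iPhone" terms from the analyzer output above. The pattern is applied as typed, with no analysis.

```python
from fnmatch import fnmatchcase

# Indexed terms for "iPhone" in the three phrases, copied from the analyzer output above.
indexed_terms = ["iphon", "iphon", "iphone/mp3"]

# Unanalyzed wildcard term: "*phone*" still ends in 'e', so the stemmed "iphon" never matches.
matches = [term for term in indexed_terms if fnmatchcase(term, "*phone*")]
print(matches)  # ['iphone/mp3']
```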

This result would disappoint the user, who would expect at least the phrases containing "iPhone" (or even "telephone") to be returned. However, because the Spanish analyzer removes the trailing 'e' from most words, those phrases are not found.

The proposed enhancement is a mechanism, for instance a parameter on 'query_string', that tells the ES query parser to analyze search terms that carry wildcards (either enclosed completely, or with just a leading or trailing wildcard).

In the previous example, a 'query_string' for "*phone*" would then be analyzed by the Spanish analyzer as "*phon*", returning all of the phrases indexed above, which is the expected and reasonable behaviour from a user's perspective. This could have side effects on other searches, but as an opt-in parameter it would be up to the search designer whether to use it.
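A minimal sketch of the "best effort" idea: split the wildcard term on * and ?, run the analyzer on each literal segment, and reassemble the pattern with the wildcards intact. The analyzer here is a hypothetical toy stand-in for the Spanish Snowball analyzer (lowercase, drop one trailing 'e'), matching uses fnmatchcase as an approximation of Lucene wildcard matching, and this is not the actual Elasticsearch implementation.

```python
import re
from fnmatch import fnmatchcase

# Indexed tokens per phrase, copied from the analyzer output above.
INDEXED = {
    "I have an iPhone": ["i", "hav", "an", "iphon"],
    "I love the triad iPad/iPhone/iPod": ["i", "lov", "the", "tri", "ipad", "iphon", "ipod"],
    "I found the perfect combination: iPhone/MP3": [
        "i", "found", "the", "perfect", "combination", "iphone/mp3",
    ],
}


def toy_analyze(text: str) -> str:
    """Hypothetical analyzer: lowercase, then drop a single trailing 'e' ('phone' -> 'phon')."""
    text = text.lower()
    return text[:-1] if text.endswith("e") else text


def analyze_wildcard(pattern: str) -> str:
    """Analyze only the literal segments between wildcards; keep '*' and '?' as-is."""
    parts = re.split(r"([*?])", pattern)  # capturing group keeps the wildcards in the list
    return "".join(p if p in ("*", "?") else toy_analyze(p) for p in parts if p)


def search(pattern: str) -> list[str]:
    """Return the phrases with at least one indexed token matching the pattern."""
    return [
        phrase
        for phrase, tokens in INDEXED.items()
        if any(fnmatchcase(token, pattern) for token in tokens)
    ]


print(analyze_wildcard("*phone*"))          # *phon*
print(search("*phone*"))                    # only the iPhone/MP3 phrase
print(search(analyze_wildcard("*phone*")))  # all three phrases
```

Analyzing segment by segment sidesteps the "several tokens" problem only partly: a segment that the analyzer splits or expands would still need a policy, which is why the flag is described as best effort.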

@kimchy
Member

kimchy commented Mar 17, 2011

Query: Provide an option to analyze wildcard/prefix in query_string / field queries, closed by 25124b0.

@kimchy kimchy closed this as completed Mar 17, 2011
dakrone pushed a commit that referenced this issue Nov 11, 2014
…ring query

The query_string query has an option for analyzing wildcard/prefix terms (#787) using a best-effort approach.

This also adds the `analyze_wildcard` option to simple_query_string.

The default is `false`, so the existing behavior of simple_query_string is unchanged.
mindw pushed a commit to mindw/elasticsearch that referenced this issue Sep 5, 2022
add file-processor repo, use explicit profiles

* add file-processor repo, use explicit profiles

* WS


Approved-by: Can Yildiz