…ring query
The query_string query has an option for analyzing wildcard/prefix (#787) by a best effort approach.
This adds the `analyze_wildcard` option to simple_query_string as well.
The default is `false`, so the existing behavior of simple_query_string is unchanged.
Add a flag called `analyze_wildcard` to both `query_string` and `field` queries. Once set, a best effort will be made to analyze wildcard and prefix queries as well.
More details:
When we use an analyzer that stems terms into tokens and later search against those analyzed terms using a wildcard, the search terms are not analyzed by default, because that analysis could produce several tokens and the search engine would not know which one to use:
http://www.jguru.com/faq/view.jsp?EID=538312
However, in some circumstances, where that risk can be accepted in exchange for results closer to what users expect, it would be useful to tell the search engine to analyze the wildcard terms before executing the search, allowing for a more precise search (or at least a more predictable one).
Here is an example with the Spanish analyzer (which uses the snowball stemmer). We index three phrases: "I have an iPhone", "I love the triad iPad/iPhone/iPod", and "I found the perfect combination: iPhone/MP3".
If we use the current standard 'query_string' and search for `*phone*`, we only get the last phrase, because of the way the terms have been analyzed:
"I have an iPhone":
{"tokens":[{"token":"i","start_offset":0,"end_offset":1,"type":"","position":1},{"token":"hav","start_offset":2,"end_offset":6,"type":"","position":2},{"token":"an","start_offset":7,"end_offset":9,"type":"","position":3},{"token":"iphon","start_offset":10,"end_offset":16,"type":"","position":4}]}
"I love the triad iPad/iPhone/iPod":
{"tokens":[{"token":"i","start_offset":0,"end_offset":1,"type":"","position":1},{"token":"lov","start_offset":2,"end_offset":6,"type":"","position":2},{"token":"the","start_offset":7,"end_offset":10,"type":"","position":3},{"token":"tri","start_offset":11,"end_offset":16,"type":"","position":4},{"token":"ipad","start_offset":17,"end_offset":21,"type":"","position":5},{"token":"iphon","start_offset":22,"end_offset":28,"type":"","position":6},{"token":"ipod","start_offset":29,"end_offset":33,"type":"","position":7}]}
"I found the perfect combination: iPhone/MP3":
{"tokens":[{"token":"i","start_offset":0,"end_offset":1,"type":"","position":1},{"token":"found","start_offset":2,"end_offset":7,"type":"","position":2},{"token":"the","start_offset":8,"end_offset":11,"type":"","position":3},{"token":"perfect","start_offset":12,"end_offset":19,"type":"","position":4},{"token":"combination","start_offset":20,"end_offset":31,"type":"","position":5},{"token":"iphone/mp3","start_offset":33,"end_offset":43,"type":"","position":6}]}
Notice that the last phrase keeps "iPhone/MP3" as the single, unstemmed token "iphone/mp3". Hence it is the only one matching a 'query_string' equal to `*phone*` (similar unexpected results occur with just a leading or a trailing wildcard).
This result would be disappointing for the user, as she'd expect at least something like "iPhone" or even "telephone" to be returned, but because the Spanish analyzer removes the trailing 'e' from most words, the search won't find them.
So the enhancement would be to provide a mechanism, for instance a parameter in the 'query_string', that tells the ES query parser to analyze search terms that carry wildcards (i.e. either enclosed completely, or with just a leading or trailing wildcard).
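As a rough illustration of what such a parameter could look like from the caller's side, here is a minimal Python sketch that builds a search request body with the `analyze_wildcard` flag set. The helper name `build_query_string_body` and the field name `text` are assumptions made for this example only; the sketch just constructs the JSON and does not talk to a cluster.

```python
import json


def build_query_string_body(query, field, analyze_wildcard=True):
    """Build a request body for a query_string search.

    `field` is a hypothetical field name; use whichever field was
    indexed with the stemming analyzer.
    """
    return {
        "query": {
            "query_string": {
                "query": query,
                "default_field": field,
                # Ask the query parser to analyze wildcard/prefix terms too.
                "analyze_wildcard": analyze_wildcard,
            }
        }
    }


body = build_query_string_body("*phone*", "text")
print(json.dumps(body, indent=2))
```

Leaving the flag as an explicit argument keeps the decision with the search designer, which is the point of making it a parameter.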
Following our previous example, a 'query_string' for `*phone*` would actually be analyzed by the Spanish analyzer as `*phon*`, therefore returning all the phrases previously indexed, which would be the expected and reasonable behaviour from a user's perspective. Of course, it could have side effects on other searches, but since it is a parameter, it would be up to the search designer to use it or not.
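The difference can be checked against the token lists shown above with a small simulation. This is only a sketch: it uses Python's `fnmatch` as a stand-in for the engine's wildcard matching (an assumption, not how Lucene implements it), and the stemmed pattern `*phon*` is taken from the example rather than computed by a real analyzer.

```python
from fnmatch import fnmatchcase

# Tokens copied from the analyzer output for each indexed phrase.
TOKENS = {
    "I have an iPhone": ["i", "hav", "an", "iphon"],
    "I love the triad iPad/iPhone/iPod": [
        "i", "lov", "the", "tri", "ipad", "iphon", "ipod",
    ],
    "I found the perfect combination: iPhone/MP3": [
        "i", "found", "the", "perfect", "combination", "iphone/mp3",
    ],
}


def matching_phrases(pattern):
    """Return the phrases that have at least one token matching the pattern."""
    return [
        phrase
        for phrase, tokens in TOKENS.items()
        if any(fnmatchcase(token, pattern) for token in tokens)
    ]


unanalyzed = matching_phrases("*phone*")  # wildcard term used as typed
analyzed = matching_phrases("*phon*")     # wildcard term after stemming

print(unanalyzed)
print(analyzed)
```

With the unanalyzed pattern only the phrase containing the token "iphone/mp3" matches, while the stemmed pattern also reaches the "iphon" tokens of the other two phrases.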