Query: Provide an option to analyze wildcard/prefix in query_string / field queries #787

emedina · 2011-03-17T09:54:50Z

Add a flag called analyze_wildcard to both query_string and field queries. Once it is set, a best effort will be made to analyze wildcard and prefix queries as well.

More details:

When we use an analyzer that stems terms into tokens, and later want to search against those analyzed terms using a wildcard, the search terms are not analyzed by default, because analyzing them could produce several tokens and the search engine would not know which one to use:

http://www.jguru.com/faq/view.jsp?EID=538312

However, in certain circumstances, when some of the search's reliability can be traded for results closer to what users expect, it would be nice to tell the search engine to analyze the wildcard terms before executing the search, allowing for a more precise search (at least, as expected).

Here is an example with the Spanish analyzer (which uses the Snowball stemmer):

We index the phrase "I have an iPhone"
We index the phrase "I love the triad iPad/iPhone/iPod"
We index the phrase "I found the perfect combination: iPhone/MP3"

If we use the current standard 'query_string' and search for "*phone*", we only get the last phrase, because of the way the terms have been analyzed:

"I have an iPhone":

{"tokens":[{"token":"i","start_offset":0,"end_offset":1,"type":"","position":1},{"token":"hav","start_offset":2,"end_offset":6,"type":"","position":2},{"token":"an","start_offset":7,"end_offset":9,"type":"","position":3},{"token":"iphon","start_offset":10,"end_offset":16,"type":"","position":4}]}

"I love the triad iPad/iPhone/iPod":

{"tokens":[{"token":"i","start_offset":0,"end_offset":1,"type":"","position":1},{"token":"lov","start_offset":2,"end_offset":6,"type":"","position":2},{"token":"the","start_offset":7,"end_offset":10,"type":"","position":3},{"token":"tri","start_offset":11,"end_offset":16,"type":"","position":4},{"token":"ipad","start_offset":17,"end_offset":21,"type":"","position":5},{"token":"iphon","start_offset":22,"end_offset":28,"type":"","position":6},{"token":"ipod","start_offset":29,"end_offset":33,"type":"","position":7}]}

"I found the perfect combination: iPhone/MP3":

{"tokens":[{"token":"i","start_offset":0,"end_offset":1,"type":"","position":1},{"token":"found","start_offset":2,"end_offset":7,"type":"","position":2},{"token":"the","start_offset":8,"end_offset":11,"type":"","position":3},{"token":"perfect","start_offset":12,"end_offset":19,"type":"","position":4},{"token":"combination","start_offset":20,"end_offset":31,"type":"","position":5},{"token":"iphone/mp3","start_offset":33,"end_offset":43,"type":"","position":6}]}

See how the last phrase keeps "iPhone/MP3" as the single token "iphone/mp3"? That is why it is the only one matching a 'query_string' equal to "*phone*" (and similar 'unexpected' results occur when using just a leading or a trailing wildcard).
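To make the matching concrete, here is a small Python sketch that reproduces the indexed terms listed above and applies the raw, unanalyzed wildcard pattern to them. It is an illustration under stated assumptions, not Elasticsearch or Lucene code: `fnmatch.fnmatchcase` stands in for Lucene's wildcard matching (it also treats `[...]` as a character class, which Lucene does not), the pattern is only lowercased, and `INDEX` and `search_unanalyzed` are hypothetical names.

```python
from fnmatch import fnmatchcase

# Indexed terms copied from the analyzer output above
# (Spanish analyzer with the Snowball stemmer).
INDEX = {
    "I have an iPhone": ["i", "hav", "an", "iphon"],
    "I love the triad iPad/iPhone/iPod": ["i", "lov", "the", "tri", "ipad", "iphon", "ipod"],
    "I found the perfect combination: iPhone/MP3": [
        "i", "found", "the", "perfect", "combination", "iphone/mp3",
    ],
}


def search_unanalyzed(pattern):
    """Return the documents having at least one term that matches the wildcard pattern.

    Assumption: the pattern is lowercased but never stemmed, mimicking the
    default behaviour described in the issue.
    """
    pattern = pattern.lower()
    return [
        doc
        for doc, terms in INDEX.items()
        if any(fnmatchcase(term, pattern) for term in terms)
    ]


for pattern in ("*phone*", "iphone*", "*phone"):
    print(pattern, "->", search_unanalyzed(pattern))
# *phone* -> ['I found the perfect combination: iPhone/MP3']
# iphone* -> ['I found the perfect combination: iPhone/MP3']
# *phone -> []
```

Only the unstemmed token "iphone/mp3" still contains "phone", so the two stemmed "iphon" documents never match, with either both wildcards or just one.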

This result would be disappointing for the user, as she'd expect at least something like "iPhone" or even "telephone" to be returned, but because the Spanish analyzer removes the trailing 'e' from most words, those are not found.

So the enhancement would be to provide a mechanism, for instance a parameter in 'query_string', that tells the ES query parser to analyze search terms surrounded by wildcards (i.e. either enclosed completely, or with just a leading or trailing wildcard).

Following our previous example, a 'query_string' for "*phone*" would then be analyzed by the Spanish analyzer as "*phon*", returning all of the phrases indexed above, which would be the expected and reasonable behaviour from a user's perspective. Of course, it could have side effects on other searches, but since it is a parameter, it would be up to the search designer whether to use it.
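A minimal sketch of the proposed best-effort analysis, assuming the pattern is split on the wildcard characters `*` and `?`, each literal run is analyzed on its own, and the wildcards are put back in place. `toy_analyze`, `analyze_wildcard` and `search` are hypothetical names; `toy_analyze` only lowercases and drops a trailing 'e', so it is a stand-in for the Snowball stemmer rather than the real Spanish stemmer, and none of this is Elasticsearch's actual implementation of the option.

```python
from fnmatch import fnmatchcase

WILDCARDS = "*?"

# Indexed terms copied from the analyzer output in the issue.
INDEX = {
    "I have an iPhone": ["i", "hav", "an", "iphon"],
    "I love the triad iPad/iPhone/iPod": ["i", "lov", "the", "tri", "ipad", "iphon", "ipod"],
    "I found the perfect combination: iPhone/MP3": [
        "i", "found", "the", "perfect", "combination", "iphone/mp3",
    ],
}


def toy_analyze(chunk):
    """Hypothetical analyzer: lowercase, then drop one trailing 'e' ("phone" -> "phon")."""
    chunk = chunk.lower()
    return chunk[:-1] if chunk.endswith("e") else chunk


def analyze_wildcard(pattern):
    """Analyze each literal run between wildcards and keep the wildcards in place."""
    out, literal = [], []
    for ch in pattern:
        if ch in WILDCARDS:
            if literal:
                out.append(toy_analyze("".join(literal)))
                literal = []
            out.append(ch)
        else:
            literal.append(ch)
    if literal:
        out.append(toy_analyze("".join(literal)))
    return "".join(out)


def search(pattern, analyze=False):
    """Match the (optionally analyzed) pattern against every indexed term."""
    pattern = analyze_wildcard(pattern) if analyze else pattern.lower()
    return [
        doc
        for doc, terms in INDEX.items()
        if any(fnmatchcase(term, pattern) for term in terms)
    ]


print(analyze_wildcard("*phone*"))            # *phon*
print(len(search("*phone*")))                 # 1
print(len(search("*phone*", analyze=True)))   # 3
```

Because each literal run is stemmed even when it is only a word fragment, "phone*" becomes "phon*" and would also match any other term starting with "phon"; that is one of the side effects the issue leaves to the search designer's judgment.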


kimchy · 2011-03-17T20:01:35Z

Query: Provide an option to analyze wildcard/prefix in query_string / field queries, closed by 25124b0.

…ring query The query_string query has an option for analyzing wildcard/prefix (#787) by a best effort approach. This adds `analyze_wildcard` option also to simple_query_string. The default is set to `false` so the existing behavior of simple_query_string is unchanged.


kimchy closed this as completed Mar 17, 2011

jprante mentioned this issue Nov 10, 2014

Add option for analyzing wildcard/prefix to simple_query_strinq #8422

Closed

clintongormley mentioned this issue Aug 29, 2016

Remove the analyze_wildcard option #20209

Open

mindw pushed a commit to mindw/elasticsearch that referenced this issue Sep 5, 2022

Merged in dev/gabi/shared-assets-fixes (pull request elastic#787)

a4eb2ee

add file-processor repo, use explicit profiles * add file-processor repo, use explicit profiles * WS Approved-by: Can Yildiz
