Wildcard search on not_analyzed field behaves inconsistently #9973

Closed

jdutton opened this issue Mar 3, 2015 · 7 comments

Comments

jdutton commented Mar 3, 2015

I am seeing inconsistent behavior with wildcard searches that makes no sense to me. I've created a Play to reproduce the issue here: https://www.found.no/play/gist/e452c1d68d6465540d85

For two simple documents:

name: "7000"

name: "T100"

With a simple not_analyzed mapping:

type:
  properties:
    name:
      type: string
      index: not_analyzed

The query for "name:7*" matches a single document (as it should), but the query for "name:T*" does not match any document. I'm seeing this bug in ES versions 1.3.2 and 1.4.4.

Trying various searches and documents, it appears that a wildcard starting with a numeric-looking string works, but one starting with an alphabetic character (e.g. "T") gets no hits.

jdutton (Author) commented Mar 3, 2015

Sorry for the terrible formatting. The YAML snippets got butchered by markdown.

dakrone (Member) commented Mar 3, 2015

Here's a reproduction (copying it here in case the link stops working someday):

Create an index

DELETE /9973
{}

POST /9973
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "doc": {
      "properties": {
        "name": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

Index docs

POST /9973/doc/1
{"name": "7000"}

POST /9973/doc/2?refresh
{"name": "T100"}

Query

This query matches correctly:

POST /9973/_search?pretty
{
  "query": {
    "query_string": {
      "query": "name:7*"
    }
  }
}

Results:

{
  "took" : 65,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "9973",
      "_type" : "doc",
      "_id" : "1",
      "_score" : 1.0,
      "_source":{"name": "7000"}
    } ]
  }
}

This one does not:

POST /9973/_search?pretty
{
  "query": {
    "query_string": {
      "query": "name:T*"
    }
  }
}

Results:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

dakrone (Member) commented Mar 3, 2015

Interestingly, using an actual wildcard query here does the right thing (as well as using a simple_query_string query), so the issue is only with the query_string query:

POST /9973/_search?pretty
{
  "query": {
    "wildcard": {
      "name": "T*"
    }
  }
}
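
For reference, the simple_query_string variant mentioned above might look roughly like this (a sketch, not part of the original reproduction; the fields parameter is an assumption):

POST /9973/_search?pretty
{
  "query": {
    "simple_query_string": {
      "query": "T*",
      "fields": ["name"]
    }
  }
}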

dakrone (Member) commented Mar 3, 2015

@jdutton okay, @s1monw figured out what was going on here: since the name field is not analyzed, the indexed token is "T100". However, the query_string query has a lowercase_expanded_terms option, which defaults to true and causes it to search for "t*" instead of "T*".

This works as intended for me:

POST /9973/_search?pretty
{
  "query": {
    "query_string": {
      "query": "name:T*",
      "lowercase_expanded_terms": false
    }
  }
}

jdutton (Author) commented Mar 4, 2015

OK, wow, thank you all for the help and the fast response. In my case I was really filtering, and this was surprising and undesirable behavior. Then again, in other cases (e.g. a search bar) lowercasing would be the desired behavior.

I have to do some soul-searching now on how to change my application :-) Thanks again for the help!
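
Since name:T* is effectively a prefix match on a not_analyzed field, one case-preserving way to express the filtering use case is a filtered query with a prefix filter, which does not lowercase its input (a sketch, assuming the prefix form fits the use case; not taken from the thread):

POST /9973/_search?pretty
{
  "query": {
    "filtered": {
      "filter": {
        "prefix": {
          "name": "T"
        }
      }
    }
  }
}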

jdutton closed this as completed Mar 4, 2015
rmuir (Contributor) commented Mar 4, 2015

For the record, I think we should remove this lowercasing option from the parsers entirely, disable it, and let the analysis chain take care of it. For multi-term queries it's a little tricky: which subset of the filters should be used? For example, lowercasing is reasonable, stemming is not.

But Lucene already annotates each analyzer component deemed "reasonable" for wildcards with a marker interface (MultiTermAwareComponent). Things like LowerCaseFilter have it and things like stemmers don't. This is enough to automatically build a chain from the query analyzer that acts reasonably for multi-term queries.

I know we don't use the Lucene factories (ES has its own), but we have a table that maps between them; I know because it's in a test I wrote. So the information is there :)

All query parsers have hooks (e.g. factory methods for prefix/wildcard/range) that make it possible to use this. Solr, for example, does it by default, and as soon as it did, people stopped complaining about confusing behavior, both for the unanalyzed and the analyzed case. It just works.

Sorry for the long explanation.


s1monw (Contributor) commented Mar 4, 2015

@rmuir +1 to removing the option. Can you open an issue?

dakrone added a commit to dakrone/elasticsearch that referenced this issue Mar 13, 2015
The analysis chain should be used instead of relying on this, as it is
confusing when dealing with different per-field analysers.

The `locale` option was only used for `lowercase_expanded_terms`, which,
once removed, is no longer needed, so it was removed as well.

Fixes elastic#9978
Relates to elastic#9973