query_string query with wildcard not working when searching within nested objects #18520

Closed
butsjoh opened this issue May 23, 2016 · 5 comments

@butsjoh

butsjoh commented May 23, 2016

Elasticsearch version: 2.3

JVM version: 1.8.0_25

OS version: OSX 10.11.5

Description of the problem including expected versus actual behavior:

Steps to reproduce:

  1. Define a nested object type in your mapping
  2. Index some documents
  3. Perform a nested query with a query_string on one of the fields of the nested object, using a wildcard

I create the index with a nested type locations and then index the following documents:

curl -X PUT 'http://localhost:9201/test_nested?pretty' -d '{
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 1
    }
  },
  "mappings": {
    "loc": {
      "properties": {
        "id": {
          "type": "string",
          "index": "not_analyzed"
        },
        "locations": {
          "type": "nested",
          "properties": {
            "input": {
              "type": "string",
              "index": "not_analyzed"
            },
            "country_code": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}'

curl -X PUT 'http://localhost:9201/test_nested/loc/location_1' -d '{
    id: "location_1",
    "locations": [
      {
        input: "xxx",
        country_code: "BE"
      },
      {
        input: "yyy",
        country_code: "NL"
      }
    ]
}'

curl -X PUT 'http://localhost:9201/test_nested/loc/location_2' -d '{
    id: "location_2",
    "locations": [
      {
        input: "zzz",
        country_code: "BR"
      },
      {
        input: "vvv",
        country_code: "US"
      }
    ]
}'

When I try to do an exact match using query_string, it works:

curl -X POST 'http://localhost:9201/test_nested/_search' -d '
{
 "query": {
       "nested": {
           "path": "locations",
           "query": {
               "query_string": {
                   "fields": [
                       "locations.country_code"
                   ],
                   "query": "BE"
               }
           }
       }
   }
}
'

RESULT:

{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":2.098612,"hits":[{"_index":"test_nested","_type":"loc","_id":"location_1","_score":2.098612,"_source":{
    id: "location_1",
    "locations": [
      {
        input: "xxx",
        country_code: "BE"
      },
      {
        input: "yyy",
        country_code: "NL"
      }
    ]
}}]}}

When I try a wildcard, it does not return anything:

curl -X POST 'http://localhost:9201/test_nested/_search' -d '
{
 "query": {
       "nested": {
           "path": "locations",
           "query": {
               "query_string": {
                   "fields": [
                       "locations.country_code"
                   ],
                   "query": "B*"
               }
           }
       }
   }
}
'

RESULT:

{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

It seems to fail only for nested datatypes, because a query_string with a wildcard on a property of the main document works as expected:

curl -X POST 'http://localhost:9201/test_nested/_search' -d '
{
   "query": {
       "query_string": {
           "fields": [
               "id"
           ],
           "query": "location_*"
       }
   }
}
'

RESULT:

{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":1.0,"hits":[{"_index":"test_nested","_type":"loc","_id":"location_1","_score":1.0,"_source":{
    id: "location_1",
    "locations": [
      {
        input: "xxx",
        country_code: "BE"
      },
      {
        input: "yyy",
        country_code: "NL"
      }
    ]
}},{"_index":"test_nested","_type":"loc","_id":"location_2","_score":1.0,"_source":{
    id: "location_2",
    "locations": [
      {
        input: "zzz",
        country_code: "BR"
      },
      {
        input: "vvv",
        country_code: "US"
      }
    ]
}}]}}

So is it a limitation of the nested datatype that it does not work with wildcards when using query_string? And if so, where is it documented? The weird thing is that a wildcard query does work:

curl -X POST 'http://localhost:9201/test_nested/_search' -d '
{
 "query": {
       "nested": {
           "path": "locations",
           "query": {
               "wildcard": {
                   "locations.country_code": "B*"
               }
           }
       }
   }
}
'

{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":1.0,"hits":[{"_index":"test_nested","_type":"loc","_id":"location_1","_score":1.0,"_source":{
    id: "location_1",
    "locations": [
      {
        input: "xxx",
        country_code: "BE"
      },
      {
        input: "yyy",
        country_code: "NL"
      }
    ]
}},{"_index":"test_nested","_type":"loc","_id":"location_2","_score":1.0,"_source":{
    id: "location_2",
    "locations": [
      {
        input: "zzz",
        country_code: "BR"
      },
      {
        input: "vvv",
        country_code: "US"
      }
    ]
}}]}}
@clintongormley

Hi @butsjoh

This has nothing to do with nested queries. The field is not analyzed, so the value is indexed as BE, but the lowercase_expanded_terms param of the query string query (which applies to wildcards) defaults to true, so B* is lowercased to b* before matching and no longer matches BE.

Duplicate of #9978
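
To make the explanation above concrete, here is a toy shell model of the matching step. It is not Elasticsearch code. It assumes that a not_analyzed field stores its value verbatim as a single term, and that the wildcard pattern is lowercased first when lowercase_expanded_terms is true. The function name wildcard_matches is made up for this illustration.

```shell
# Toy model: does a wildcard pattern match an indexed term?
#   $1 = term as stored in the index (verbatim for a not_analyzed field)
#   $2 = wildcard pattern from the query string
#   $3 = value of lowercase_expanded_terms ("true" or "false")
wildcard_matches() {
  term=$1
  pattern=$2
  # With lowercase_expanded_terms=true the pattern is lowercased before matching.
  if [ "$3" = "true" ]; then
    pattern=$(printf '%s' "$pattern" | tr '[:upper:]' '[:lower:]')
  fi
  # An unquoted variable in a case pattern is treated as a glob,
  # and the comparison is case-sensitive.
  case "$term" in
    $pattern) echo "match" ;;
    *) echo "no match" ;;
  esac
}

wildcard_matches BE 'B*' true                  # prints "no match": b* vs BE
wildcard_matches BE 'B*' false                 # prints "match"
wildcard_matches location_1 'location_*' true  # prints "match": already lowercase
```

The last call shows why the id field appeared to work: its values were lowercase to begin with, so lowercasing the pattern changed nothing.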

@butsjoh

butsjoh commented May 24, 2016

@clintongormley Sorry, but I do not fully understand your response. How could I change my setup to make it work? The id field in the mapping I posted is also not analyzed, yet there the wildcard (location_*) works, while for a field in the nested object it does not. Do you recommend setting lowercase_expanded_terms to false for the nested query?

@clintongormley

> You have to agree that the id field in the mapping i posted is also not analyzed but there the wildcard (location_*)

Yes, the id field is not_analyzed, but the value you're indexing is already lower case, so it matches. If you change the ID value to LOCATION_FOO and search that field for LOC*, it won't find anything either.

> Do you then recommend setting lowercase_expanded_terms to false for the nested query?

As long as you're only planning on using wildcards on not_analyzed fields, then yes :) If you want to use it on analyzed fields too, then you'll have the same problem but in reverse.
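
As a sketch of that suggestion, the failing search from the original report could be repeated with lowercase_expanded_terms set to false inside the query_string clause. The host, port, and index name are the ones used earlier in this issue. The variable and function names (ES_URL, QUERY_BODY, search_country_codes) are placeholders introduced here.

```shell
# Search endpoint as used in the issue's examples; adjust to your node.
ES_URL='http://localhost:9201/test_nested/_search?pretty'

# Same nested query_string search as in the report, but with lowercasing of
# wildcard terms disabled, so B* is compared against the indexed term as typed.
QUERY_BODY='{
  "query": {
    "nested": {
      "path": "locations",
      "query": {
        "query_string": {
          "fields": ["locations.country_code"],
          "query": "B*",
          "lowercase_expanded_terms": false
        }
      }
    }
  }
}'

# Sends the search; call it once the index and documents from the issue exist.
search_country_codes() {
  curl -s -X POST "$ES_URL" -d "$QUERY_BODY"
}
```

Calling search_country_codes should then behave like the plain wildcard query, since B* is no longer rewritten to b*. The trade-off is the one described above: on an analyzed field whose terms were lowercased at index time, an uppercase pattern would stop matching.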

This is not an easy problem to solve, which is why #9978 is marked as high hanging fruit. It requires a big rewrite of our analysis framework.

@butsjoh

butsjoh commented May 24, 2016

OK, I think I got it. So to avoid setting lowercase_expanded_terms to false, I could analyze the field I am searching on by applying a lowercase filter to it in the mapping. Correct?

@butsjoh

butsjoh commented May 26, 2016

I can indeed confirm that if you apply a lowercase filter to the country_code field (by setting up an analyzer in the mapping), you no longer need to set lowercase_expanded_terms to false, and a query_string search for B* or b* returns both documents.
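
The exact analyzer is not shown in this thread. One possible setup, as a sketch only, is a custom analyzer that combines the keyword tokenizer with the lowercase token filter, so each value stays a single term but is stored in lowercase. The analyzer name lowercase_keyword, the index name test_nested_lc, and the shell names INDEX_URL, INDEX_BODY, and create_index are all assumptions made for this example.

```shell
# Hypothetical new index; the mapping of an existing field cannot be changed in place.
INDEX_URL='http://localhost:9201/test_nested_lc'

# Settings and mapping: country_code is analyzed with keyword tokenizer + lowercase
# filter, so "BE" is indexed as the single term "be".
INDEX_BODY='{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "loc": {
      "properties": {
        "id": { "type": "string", "index": "not_analyzed" },
        "locations": {
          "type": "nested",
          "properties": {
            "input": { "type": "string", "index": "not_analyzed" },
            "country_code": { "type": "string", "analyzer": "lowercase_keyword" }
          }
        }
      }
    }
  }
}'

# Creates the index; documents then need to be indexed into it again.
create_index() {
  curl -s -X PUT "$INDEX_URL" -d "$INDEX_BODY"
}
```

With the terms stored in lowercase, the default lowercasing of wildcard patterns works in the query's favor: both B* and b* end up as b* and can match be and br.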

Thanks for the clarification.
