Cannot aggregate on completion suggester field in elasticsearch #5930

Closed
moqichenle opened this issue Apr 24, 2014 · 7 comments

Comments

@moqichenle

In the mapping, the field named domain (String value) is a completion suggester as below:

"domain": {
   "max_input_length": 50,
   "preserve_separators": true,
   "payloads": false,
   "analyzer": "simple",
   "preserve_position_increments": true,
   "type": "completion"
}

Currently, I would like to aggregate based on the values in the field "domain". Here is an example query, run in Sense against localhost:9200:

POST /ourindex/_search
{
   "query": {
     "match_all": {}
   },
   "aggs": {
      "test": {
         "terms": {
            "field": "domain"
         }
      }
   }
}

Running the query fails with a NullPointerException on every shard. The full response is:

{
   "error": "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[iQfn8u55QoqZSnlXEGGUlw][ourindex][4]: SearchParseException[[ourindex][4]: query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Failed to parse source [{\r\n   \"query\": {\r\n     \"match_all\": {}\r\n   },\r\n   \"aggs\": {\r\n      \"test\": {\r\n         \"terms\": {\r\n            \"field\": \"domain\"\r\n         }\r\n      }\r\n   }\r\n}\n]]]; nested: NullPointerException; }{[iQfn8u55QoqZSnlXEGGUlw][ourindex][2]: SearchParseException[[ourindex][2]: query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Failed to parse source [{\r\n   \"query\": {\r\n     \"match_all\": {}\r\n   },\r\n   \"aggs\": {\r\n      \"test\": {\r\n         \"terms\": {\r\n            \"field\": \"domain\"\r\n         }\r\n      }\r\n   }\r\n}\n]]]; nested: NullPointerException; }{[iQfn8u55QoqZSnlXEGGUlw][ourindex][3]: SearchParseException[[ourindex][3]: query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Failed to parse source [{\r\n   \"query\": {\r\n     \"match_all\": {}\r\n   },\r\n   \"aggs\": {\r\n      \"test\": {\r\n         \"terms\": {\r\n            \"field\": \"domain\"\r\n         }\r\n      }\r\n   }\r\n}\n]]]; nested: NullPointerException; }{[iQfn8u55QoqZSnlXEGGUlw][ourindex][0]: SearchParseException[[ourindex][0]: query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Failed to parse source [{\r\n   \"query\": {\r\n     \"match_all\": {}\r\n   },\r\n   \"aggs\": {\r\n      \"test\": {\r\n         \"terms\": {\r\n            \"field\": \"domain\"\r\n         }\r\n      }\r\n   }\r\n}\n]]]; nested: NullPointerException; }{[iQfn8u55QoqZSnlXEGGUlw][ourindex][1]: SearchParseException[[ourindex][1]: query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Failed to parse source [{\r\n   \"query\": {\r\n     \"match_all\": {}\r\n   },\r\n   \"aggs\": {\r\n      \"test\": {\r\n         \"terms\": {\r\n            \"field\": \"domain\"\r\n         }\r\n      }\r\n   }\r\n}\n]]]; nested: NullPointerException; }]",
   "status": 400
}

Does the completion field type matter here? How can I aggregate on the values of the domain field?

@moqichenle moqichenle changed the title Cannot aggregate completion suggester field in elasticsearch Cannot aggregate on completion suggester field in elasticsearch Apr 24, 2014
@s1monw
Contributor

s1monw commented Apr 24, 2014

Well, I don't think it should throw an NPE, but on the other hand, why would you want to aggregate on a suggest field? You should use a multi-field for this, IMO. @jpountz, can you take a look at this NPE and provide a better error message?

@brwe brwe self-assigned this Apr 28, 2014
@brwe
Contributor

brwe commented Apr 29, 2014

If I understand correctly, in the example above the completion suggester creates a field (called "domain") that contains the content of "input" and is searchable, but one cannot aggregate on it because of the special format of this field type (@jpountz, please correct me if I am wrong). A better error message would be useful.

you should use multi field for this IMO.

It is unclear to me how multi_field can be used with the suggester here. The documentation is not very explicit:

Even though you are losing most of the features of the completion suggest, you can opt in for the shortest form, which even allows you to use inside of multi fields. But keep in mind, that you will not be able to use several inputs, an output, payloads or weights.

I would have expected multi_field to work like this:

{
   "domain": {
      "type": "multi_field",
      "fields": {
         "domain": {
            "max_input_length": 50,
            "preserve_separators": true,
            "payloads": false,
            "analyzer": "simple",
            "preserve_position_increments": true,
            "type": "completion"
         },
         "input": {
            "type": "string"
         }
      }
   }
}

and then I expected that aggregation would work on "domain.input". Is that supposed to work like this? If so, I think it is broken.

In any case, it might be nice to have an option to make all parameter fields searchable and also allow aggregations on them. Currently, the only way to do this seems to be to add the parameters to a field with a different name than the suggestion field. Instead, the suggester could create these fields automatically if the user wants it to. We could have something like:

{
   "domain": {
      "max_input_length": 50,
      "preserve_separators": true,
      "payloads": false,
      "analyzer": "simple",
      "preserve_position_increments": true,
      "type": "completion",
      "input": {
              # Here be whatever the user likes to configure for indexing
      },
      "output": {
             ...
      },
      ...
   }
}

which would create the fields "domain.input", "domain.output", and so on, with user-defined indexing options.

Does that make sense?

@moqichenle
Author

I tried multi_field and it works as I wanted.
The mapping I use is similar to what @brwe mentioned.

{
   "domain": {
      "type": "multi_field",
      "fields": {
         "domain": {
            "max_input_length": 50,
            "preserve_separators": true,
            "payloads": false,
            "analyzer": "simple",
            "preserve_position_increments": true,
            "type": "completion"
         },
         "input": {
            "type": "string"
         }
      }
   }
}

Thank you for the hint, @s1monw.
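As an illustration of the workaround above, here is a minimal Python sketch (stdlib `json` only) that builds the same 1.x-style multi_field mapping and a terms aggregation aimed at the "domain.input" sub-field rather than at the completion field. The helper names, the aggregation name "domain_values", and the `not_analyzed` switch are assumptions for illustration. In this mapping, "input" is a plain string, so it goes through the default analyzer. As a result, the aggregation buckets may hold analyzed tokens rather than the original values. Marking the sub-field `"index": "not_analyzed"`, as in the multi-field example later in this thread, should keep each value whole.

```python
import json

# Minimal sketch: builds request bodies for the multi_field workaround in
# this thread. Helper names and "domain_values" are hypothetical; send the
# bodies with whatever tool you already use (Sense, curl, a client library).


def multi_field_mapping(field="domain", subfield="input", not_analyzed=False):
    """Return a 1.x-style multi_field mapping: a completion field used for
    suggestions plus a string sub-field that aggregations can target."""
    sub = {"type": "string"}
    if not_analyzed:
        # Index each value as one term instead of running it through an analyzer.
        sub["index"] = "not_analyzed"
    return {
        field: {
            "type": "multi_field",
            "fields": {
                field: {
                    "max_input_length": 50,
                    "preserve_separators": True,
                    "payloads": False,
                    "analyzer": "simple",
                    "preserve_position_increments": True,
                    "type": "completion",
                },
                subfield: sub,
            },
        }
    }


def terms_agg_request(field="domain", subfield="input", agg_name="domain_values"):
    """Aggregate on the string sub-field, never on the completion field itself."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "aggs": {agg_name: {"terms": {"field": f"{field}.{subfield}"}}},
    }


print(json.dumps(multi_field_mapping(), indent=3))
print(json.dumps(terms_agg_request(), indent=3))
```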

@jpountz
Contributor

jpountz commented Apr 29, 2014

@brwe I just tried a multi-field setup, and this seems to work fine:

DELETE /test

PUT /test
{
    "mappings": {
        "test": {
            "properties": {
                "my_field": {
                    "type": "completion",
                    "analyzer": "simple",
                    "fields": {
                        "raw": {
                            "type": "string",
                            "index": "not_analyzed"
                        }
                    }
                }
            }
        }
    }
}

PUT /test/test/1
{
    "my_field": "foo bar"
}

PUT /test/test/2
{
    "my_field": "foo bar baz"
}

POST /test/_refresh

GET /test/_suggest
{
    "my-suggest" : {
        "text" : "foo b",
        "completion" : {
            "field" : "my_field"
        }
    }
}

GET /test/_search
{
    "aggs": {
        "my_field_values": {
            "terms": {
                "field": "my_field.raw"
            }
        }
    }
}
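To read the result of the search above, you can collect the buckets from the terms aggregation. The sketch below assumes the usual search response layout, `aggregations.<name>.buckets[]` with `key` and `doc_count` entries. The sample response is hand-written to show that shape; it was not captured from a cluster. Because "my_field.raw" is `not_analyzed`, each bucket key should be a whole indexed value such as "foo bar" rather than a single token. The helper name `bucket_counts` is hypothetical.

```python
import json


def bucket_counts(response, agg_name):
    """Map each bucket key of a terms aggregation to its doc_count.

    Expects the layout response["aggregations"][agg_name]["buckets"],
    where each bucket is {"key": ..., "doc_count": ...}. Missing parts
    yield an empty dict instead of raising.
    """
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    return {b["key"]: b["doc_count"] for b in buckets}


# Illustrative response shape only (hand-written, not real cluster output).
sample = json.loads("""
{
  "hits": {"total": 2, "hits": []},
  "aggregations": {
    "my_field_values": {
      "buckets": [
        {"key": "foo bar", "doc_count": 1},
        {"key": "foo bar baz", "doc_count": 1}
      ]
    }
  }
}
""")

print(bucket_counts(sample, "my_field_values"))
```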

Adding capabilities to store domain.input and domain.output as you describe feels to me like reinventing multi-fields. I think we should rather fix the error message when trying to aggregate on a completion field, and recommend either indexing the field twice (e.g. if you need to specify input and output separately) or using a multi-field as above for the simple cases.
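The first option, indexing the value twice, can be sketched as follows. Assume one `not_analyzed` string field for aggregations and a separate completion field that still accepts the full input/output form. The field name "domain_suggest", the `MAPPING` properties, and the `build_document` helper are hypothetical, and the mapping follows the 1.x syntax used in this thread. Aggregations would then target "domain", and the completion suggester would target "domain_suggest".

```python
import json

# Hypothetical field names: "domain" holds the raw value for aggregations,
# "domain_suggest" is a completion field used only for suggestions.
MAPPING = {
    "properties": {
        "domain": {"type": "string", "index": "not_analyzed"},
        "domain_suggest": {"type": "completion", "analyzer": "simple", "payloads": False},
    }
}


def build_document(domain, inputs=None, output=None):
    """Index the same value twice: once as a plain string, once as a suggestion.

    `inputs` and `output` use the completion field's full form, which the
    short form inside a multi-field does not support.
    """
    # Fall back to the raw value as the only suggestion input.
    suggest = {"input": inputs if inputs else [domain]}
    if output is not None:
        suggest["output"] = output
    return {"domain": domain, "domain_suggest": suggest}


doc = build_document("example.com", inputs=["example.com", "example"], output="example.com")
print(json.dumps(doc, indent=2))
```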

@brwe
Contributor

brwe commented Apr 29, 2014

Ah! Now I get it. I was trying out how this works when you explicitly provide "input" and "output" when indexing. I guess this is what is meant by "opt in for the shortest form".
Thanks for clarifying that this is not what was meant!

I think we should rather fix the error message

OK, I'll just add the more descriptive error message.

@s1monw
Contributor

s1monw commented May 18, 2014

@brwe can you port this to 1.1.2?

@brwe
Contributor

brwe commented May 19, 2014

done

@brwe brwe added the v1.1.2 label May 19, 2014
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015