Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Percentiles aggregations are always keyed and suggestion on non keyed response #5870

Closed
Mpdreamz opened this issue Apr 18, 2014 · 6 comments
Closed

Comments

@Mpdreamz
Copy link
Member

I realize the percentiles aggregations is still experimental which is probably the cause for this:

https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/search/aggregations/metrics/percentiles/PercentilesParser.java#L60

The routine that is currently in place to write the percentiles non_keyed output will write the aggregation like this:

"aggs" : {
  "my_percentiles": [
       { .. }, 
       { .. }
   ]
}

https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/search/aggregations/metrics/percentiles/InternalPercentiles.java#L131

Making it the only aggregation to directly return an array instead of within a wrapped object.

"aggs" : {
    "my_percentiles": { 
        percentiles: [
           { .. }, 
           { .. }
       ]
    }
}

Which makes the response very similar the non keyed range aggregation response:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-range-aggregation.html#search-aggregations-bucket-range-aggregation

@uboness
Copy link
Contributor

uboness commented Apr 18, 2014

@Mpdreamz percentiles are not a bucketing agg. It's a metrics agg, and like all other metrics aggs, it's not keyed (see min/stats/etc...)

@Mpdreamz
Copy link
Member Author

I realise its not a bucketing agg :)

The code path for the percentiles agg always hits the if (keyed)` code path. Making the else routine dead code.

The else routine introduces an array at a position all the other aggregations introduce an object which would make parsing the aggregations generically much much harder.

i.e "name_of_agg" : [ vs "name_of_agg : {

If we can remove the dead code and thus the chance to introduce an array at that position that would be great too.

@uboness
Copy link
Contributor

uboness commented Apr 18, 2014

@Mpdreamz Not sure I'm following you tbh... maybe I'm missing something...

if you send "keyed" : false, the else path is executed, so it's not really a dead code. We decided to make the object form the default (ie. "keyed" : true) as we believe it's probably the form most ppl would like to get back. Like with other aggs, we did leave an option to get the percentiles as an array of values

@Mpdreamz
Copy link
Member Author

Ok my bad since keyed usually defaults to false (i.e range aggregation).

keyed also usually specifies the behaviour of an inner property (i.e buckets property inside a range aggregation) where as with percentiles it controls how the entire aggregation is returned.

More specifically:

"aggregations": {
      "myagg": [
         {
            "key": 1,
            "value": 60.4
         },
         {
            "key": 5,
            "value": 62
         },
         {
            "key": 25,
            "value": 70
         },

For keyed:false responses I would much rather see it return

"aggregations": {
      "myagg": {
         values: [
         {
            "key": 1,
            "value": 60.4
         },
         {
            "key": 5,
            "value": 62
         },
         {
            "key": 25,
            "value": 70
         }
         ]
     }
}

All other aggregations follow the pattern "name_of_agg": <start_of_object> even simple metrics such as min/max

"max" : {
    value: 10
}

The way nonkeyed percentiles are implemented right now feels like this:

"max" : 10

And (as far as I could tell) non keyed percentiles are the only ones breaking the pattern here.

@uboness
Copy link
Contributor

uboness commented Apr 18, 2014

yeah.. agree, I think "values" : [] is more consistent and also makes more sense (as it's future proof)

@uboness
Copy link
Contributor

uboness commented May 7, 2014

We decided to change the response structure and instead of nesting all the percentiles directly under the aggregation name, nest it under an intermediate values object (or when the keyed flag is set false under a values array).

This is a breaking change but we feel it's important to make it while percentiles are still considered experimental. The new format is more future proof as it'll allow us to potentially add additional info under the aggregation in a later phase if there'll be a need for it. The new format is also somewhat more consistent with the other metrics aggs.

uboness added a commit that referenced this issue May 7, 2014
…ow all the percentiles are placed under a `values` object (or `values` array in case the `keyed` flag is set to `false`

Closes #5870
@uboness uboness closed this as completed in fc52db1 May 7, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants