Percentiles aggregation #5323

jpountz · 2014-03-03T16:17:05Z

A percentiles aggregation would make it possible to compute (approximate) values of arbitrary percentiles based on the t-digest algorithm. Computing exact percentiles is not reasonably feasible, as it would require shards to stream all values to the node that coordinates search execution, which could amount to gigabytes on a high-cardinality field. t-digest, on the other hand, trades accuracy for memory by summarizing the set of values that have been accumulated, and it has some interesting properties:

compression is configurable, meaning that you can tune it for better accuracy at the cost of higher memory usage,
accuracy is excellent for extreme percentiles,
percentiles are accurate when only a few values have been accumulated.
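
To make the motivation concrete, here is a small Python sketch of why exact percentiles need every value on the coordinating node. It is only an illustration, not the aggregation's implementation: the shard data is made up, `exact_percentile` is a hypothetical helper, and the nearest-rank definition of a percentile is an assumption chosen for simplicity.

```python
import math

def exact_percentile(values, percent):
    """Nearest-rank percentile (assumed definition); needs all values, sorted."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical per-shard load times; real shards could hold millions of values.
shard_a = [10, 12, 11, 13, 12, 11, 10, 14, 12, 11]
shard_b = [15, 900, 16, 17]

# Exact answer: the coordinating node must receive the union of all shard values.
global_p90 = exact_percentile(shard_a + shard_b, 90)

# Naive shortcut: average the per-shard percentiles. Cheap to ship, but wrong in general.
naive_p90 = (exact_percentile(shard_a, 90) + exact_percentile(shard_b, 90)) / 2

print(global_p90, naive_p90)
```

The two results differ widely on this data, which is why a mergeable summary such as t-digest is attractive: each shard can ship a compact sketch instead of its raw values.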

Example:

{
    "aggs" : {
        "load_time_outlier" : {
            "percentiles" : {
                "field" : "load_time",
                "percents" : [95, 99, 99.9] 
            }
        }
    }
}
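
For anyone building the request programmatically, here is a minimal Python sketch that assembles the same body as the example above. The helper name `build_percentiles_request` is hypothetical and not part of any client library; only the JSON structure is taken from the example.

```python
import json

def build_percentiles_request(agg_name, field, percents):
    """Return a search body with one percentiles aggregation (hypothetical helper)."""
    return {
        "aggs": {
            agg_name: {
                "percentiles": {
                    "field": field,
                    "percents": list(percents),
                }
            }
        }
    }

# Same names and percents as in the example request.
body = build_percentiles_request("load_time_outlier", "load_time", [95, 99, 99.9])
print(json.dumps(body, indent=4))
```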

lukas-vlcek · 2014-03-03T19:40:59Z

Beautiful!

otisg · 2014-03-04T16:30:32Z

Out of curiosity, why did you choose t-digest and not QDigest? Did you do an extensive comparison and conclude that t-digest has a lower memory footprint, better speed, and better accuracy?

jpountz · 2014-03-04T16:54:19Z

The two main reasons why we did not consider q-digest are that it does not work with doubles and that it looked less accurate than t-digest. The t-digest paper also gives interesting explanations of why t-digest performs better than q-digest.

otisg · 2014-03-04T17:06:19Z

Thanks Adrien! Sounds like you didn't actually run comparison tests, right? (not "blaming", just trying to understand). @tdunning may have more speed improvements in t-digest, a little bird told me...

jpountz added feature labels Mar 3, 2014

jpountz self-assigned this Mar 3, 2014

polyfractal closed this as completed in 7b16c58 Mar 3, 2014

jpountz mentioned this issue Mar 3, 2014

Add median to statistical facet #2943

Closed

polyfractal added a commit that referenced this issue Mar 3, 2014

Percentiles aggregation.

91c4c77

A new metric aggregation that can compute approximate values of arbitrary percentiles. Close #5323

clintongormley added the :Analytics/Aggregations Aggregations label Jun 6, 2015
