Use t-digest as a dependency.
Our improvements to t-digest have been pushed upstream and t-digest also got
some additional nice improvements around memory usage and speedups of quantile
estimation. So it makes sense to use it as a dependency now.

This also allows us to remove the test dependency on Apache Mahout.

Close #6142
jpountz committed May 13, 2014
1 parent 889fa6b commit 17fa67d
Showing 10 changed files with 19 additions and 2,268 deletions.
@@ -184,15 +184,15 @@ This balance can be controlled using a `compression` parameter:
The TDigest algorithm uses a number of "nodes" to approximate percentiles -- the
more nodes available, the higher the accuracy (and large memory footprint) proportional
to the volume of data. The `compression` parameter limits the maximum number of
-nodes to `100 * compression`.
+nodes to `20 * compression`.

Therefore, by increasing the compression value, you can increase the accuracy of
your percentiles at the cost of more memory. Larger compression values also
make the algorithm slower since the underlying tree data structure grows in size,
resulting in more expensive operations. The default compression value is
`100`.

-A "node" uses roughly 48 bytes of memory, so under worst-case scenarios (large amount
+A "node" uses roughly 32 bytes of memory, so under worst-case scenarios (large amount
 of data which arrives sorted and in-order) the default settings will produce a
-TDigest roughly 480KB in size. In practice data tends to be more random and
+TDigest roughly 64KB in size. In practice data tends to be more random and
the TDigest will use less memory.
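
The size figures in this hunk follow from multiplying the node limit by the per-node cost. The sketch below only reproduces that arithmetic for the values before and after this commit; the class and method names are hypothetical and are not part of t-digest or Elasticsearch.

```java
// Hypothetical helper: reproduces the worst-case size arithmetic from the docs.
// Nothing here calls the t-digest library; all names are illustrative only.
final class TDigestMemoryEstimate {

    // Maximum node count, documented as (nodesPerCompression * compression).
    static long maxNodes(int compression, int nodesPerCompression) {
        return (long) nodesPerCompression * compression;
    }

    // Worst-case footprint in bytes: every permitted node is allocated.
    static long worstCaseBytes(int compression, int nodesPerCompression, int bytesPerNode) {
        return maxNodes(compression, nodesPerCompression) * bytesPerNode;
    }

    public static void main(String[] args) {
        int compression = 100; // default compression value from the docs

        // Before this commit: 100 * compression nodes at roughly 48 bytes each.
        System.out.println("before: " + worstCaseBytes(compression, 100, 48) + " bytes"); // 480000

        // After this commit: 20 * compression nodes at roughly 32 bytes each.
        System.out.println("after:  " + worstCaseBytes(compression, 20, 32) + " bytes"); // 64000
    }
}
```

With the default compression of `100`, this gives 480,000 bytes (about 480KB) for the old figures and 64,000 bytes (about 64KB) for the new ones, matching the documented values.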
7 changes: 3 additions & 4 deletions pom.xml
@@ -75,10 +75,9 @@
<scope>test</scope>
</dependency>
<dependency>
-<groupId>org.apache.mahout</groupId>
-<artifactId>mahout-core</artifactId>
-<version>0.9</version>
-<scope>test</scope>
+<groupId>com.tdunning</groupId>
+<artifactId>t-digest</artifactId>
+<version>3.0</version>
</dependency>

<dependency>
This file was deleted.