Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use t-digest as a dependency. #6142

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
Expand Up @@ -184,15 +184,15 @@ This balance can be controlled using a `compression` parameter:
The TDigest algorithm uses a number of "nodes" to approximate percentiles -- the
more nodes available, the higher the accuracy (and large memory footprint) proportional
to the volume of data. The `compression` parameter limits the maximum number of
nodes to `100 * compression`.
nodes to `20 * compression`.

Therefore, by increasing the compression value, you can increase the accuracy of
your percentiles at the cost of more memory. Larger compression values also
make the algorithm slower since the underlying tree data structure grows in size,
resulting in more expensive operations. The default compression value is
`100`.

A "node" uses roughly 48 bytes of memory, so under worst-case scenarios (large amount
A "node" uses roughly 32 bytes of memory, so under worst-case scenarios (large amount
of data which arrives sorted and in-order) the default settings will produce a
TDigest roughly 480KB in size. In practice data tends to be more random and
TDigest roughly 64KB in size. In practice data tends to be more random and
the TDigest will use less memory.
7 changes: 3 additions & 4 deletions pom.xml
Expand Up @@ -75,10 +75,9 @@
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.mahout</groupId>
<artifactId>mahout-core</artifactId>
<version>0.9</version>
<scope>test</scope>
<groupId>com.tdunning</groupId>
<artifactId>t-digest</artifactId>
<version>3.0</version>
</dependency>

<dependency>
Expand Down

This file was deleted.