Skip to content

MnO2/zw-fast-quantile

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

63 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

zw-fast-quantile

CI Crates.io

Zhang Wang Fast Approximate Quantiles Algorithm in Rust: link to reference. It's the original algorithm inspired the techniques in Tensorflow Boosted Trees. You can find a quick context on my blog.

Installation

Add this to your Cargo.toml:

[dependencies]
zw-fast-quantile = "0.2"

then you are good to go. If you are using Rust 2015 you have to extern crate zw-fast-quantile to your crate root as well.

Example

There are two implementations in this library: FixedSizeEpsilonSummary is with the size of the stream known before hand, and UnboundEpsilonSummary is for the stream with unknown size. You can adjust the error rate epsilon of your own to trade-off between space and accuracy.

let epsilon = 0.1;
let n = 10;
let mut s = FixedSizeEpsilonSummary::new(n, epsilon);
for i in 1..=n {
    s.update(i);
}

let ans = s.query(0.0);
let expected = 1;
assert!(expected == ans);
let epsilon = 0.1;
let n = 10;
let mut s = UnboundEpsilonSummary::new(epsilon);
for i in 1..=n {
    s.update(i);
}

let ans = s.query(0.0);
let expected = 1;
assert!(expected == ans);

Benchmarks

We benchmark against GK01 implemented in quantiles. To test the UPDATE, we insert 5000 values in order with error rate of 0.01. GK01 is roughly 2.6x slower than ZW.

zw unbound quantile update
                        time:   [60.780 us 60.855 us 60.936 us]
                        change: [-1.4032% -0.9510% -0.5005%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe
gk quantile update      time:   [156.84 us 157.02 us 157.24 us]
                        change: [-0.1907% -0.0503% +0.0969%] (p = 0.50 > 0.05)
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  6 (6.00%) high mild
  5 (5.00%) high severe

As for the query time, we query 11 quantiles from 0.0, 0.1 ... 1.0. GK01 is about 1.5x slower than ZW.

zw unbound quantile query
                        time:   [229.62 ns 230.16 ns 230.77 ns]
                        change: [+1.3422% +1.8105% +2.2504%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) high mild
  8 (8.00%) high severe
gk quantile query       time:   [350.21 ns 350.48 ns 350.76 ns]
                        change: [-0.4638% -0.3109% -0.1670%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  2 (2.00%) high mild
  5 (5.00%) high severe

Related Projects

About

Zhang Wang Fast Approximate Quantiles Algorithm in Rust

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Sponsor this project

Packages

No packages published

Languages