Skip to content

alecmocatta/streaming_algorithms

master
Switch branches/tags
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
src
 
 
 
 
 
 
 
 
 
 

streaming_algorithms

Crates.io MIT / Apache 2.0 licensed Build Status

📖 Docs | 💬 Chat

SIMD-accelerated implementations of various streaming algorithms.

This library is a work in progress. PRs are very welcome! Currently implemented algorithms include:

  • Count–min sketch
  • Top k (Count–min sketch plus a doubly linked hashmap to track heavy hitters / top k keys when ordered by aggregated value)
  • HyperLogLog
  • Reservoir sampling

A goal of this library is to enable composition of these algorithms; for example Top k + HyperLogLog to enable an approximate version of something akin to SELECT key FROM table GROUP BY key ORDER BY COUNT(DISTINCT value) DESC LIMIT k.

Run your application with RUSTFLAGS="-C target-cpu=native" and the nightly feature to benefit from the SIMD-acceleration like so:

RUSTFLAGS="-C target-cpu=native" cargo run --features "streaming_algorithms/nightly" --release

See this gist for a good list of further algorithms to be implemented. Other resources are Probabilistic data structures – Wikipedia, DataSketches – A similar Java library originating at Yahoo, and Algebird – A similar Java library originating at Twitter.

As these implementations are often in hot code paths, unsafe is used, albeit only when necessary to a) achieve the asymptotically optimal algorithm or b) mitigate an observed bottleneck.

License

Licensed under either of

at your option.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

About

Performant implementations of various streaming algorithms, including Count–min sketch, Top k, HyperLogLog, Reservoir sampling.

Topics

Resources

License

Apache-2.0, MIT licenses found

Licenses found

Apache-2.0
LICENSE-APACHE.txt
MIT
LICENSE-MIT.txt

Stars

Watchers

Forks

Packages

No packages published

Languages