An efficient native implementation of the HyperLogLog cardinality estimator for Ruby
Latest commit dc7bedd Nov 16, 2012 @besquared Merge pull request #4 from nearbuy/builder-restore
Added HyperBuilder.load

HyperLogLog for Ruby

HyperLogLog is an algorithm for estimating the cardinality of a set. The HyperLogLog strategy has several nice properties:

  1. Its estimation accuracy is near-optimal for the memory it uses.
  2. It allows coarse tuning of the standard error you are willing to tolerate.
  3. The data structures used for estimation are fast, easily compressed and stored, and can be recombined to estimate both the union and the intersection of multiple sets.
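
As a rough illustration of these properties, here is a minimal pure-Ruby sketch of the general HyperLogLog technique. The class name `SketchHLL`, its methods, and the use of MD5 as the hash function are assumptions made for this example only; it is not the gem's native implementation and does not reflect its API.

```ruby
require 'digest'

# Illustrative pure-Ruby HyperLogLog (hypothetical class, not part of the gem).
class SketchHLL
  attr_reader :registers, :bits

  # bits (b) selects m = 2**b registers; the standard error is
  # roughly 1.04 / sqrt(m), so more registers means a tighter estimate.
  def initialize(bits = 10, registers = nil)
    @bits = bits
    @m = 1 << bits
    @registers = registers || Array.new(@m, 0)
  end

  def offer(value)
    # Use the first 64 bits of an MD5 digest as the hash (an assumption).
    hash = Digest::MD5.digest(value.to_s).unpack1('Q>')
    width = 64 - @bits
    index = hash >> width                   # top b bits pick a register
    rest = hash & ((1 << width) - 1)        # remaining bits
    rank = width - rest.bit_length + 1      # position of the leftmost 1-bit
    @registers[index] = rank if rank > @registers[index]
    self
  end

  def estimate
    alpha = case @m
            when 16 then 0.673
            when 32 then 0.697
            when 64 then 0.709
            else 0.7213 / (1 + 1.079 / @m)
            end
    raw = alpha * @m * @m / @registers.sum { |r| 2.0**(-r) }
    zeros = @registers.count(0)
    # Small-range correction: fall back to linear counting.
    raw = @m * Math.log(@m.to_f / zeros) if raw <= 2.5 * @m && zeros > 0
    raw.round
  end

  # Union: register-wise maximum of two sketches with the same register count.
  def merge(other)
    raise ArgumentError, 'register counts differ' unless other.bits == @bits
    SketchHLL.new(@bits, @registers.zip(other.registers).map(&:max))
  end
end

a = SketchHLL.new(10)
b = SketchHLL.new(10)
0.upto(999) { |i| a.offer("user-#{i}") }
500.upto(1499) { |i| b.offer("user-#{i}") }

union = a.merge(b).estimate
# Inclusion-exclusion gives a (noisier) intersection estimate.
intersection = a.estimate + b.estimate - union
standard_error = 1.04 / Math.sqrt(1 << 10) # about 0.0325 for 1024 registers
```

Each sketch here holds only 1,024 small integers regardless of how many values are offered, which is why the structure compresses and stores well. The intersection figure inherits the error of all three estimates it is computed from, so it is less reliable than the union.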

The API is split into two pieces, the HyperBuilder and the HyperEstimator. The split is for clarity and leaves room for future performance optimizations.


gem install hyperloglog


require 'hyperloglog'

# Build a new estimator
builder =
0.upto(100) { |user_id| builder.offer(user_id.to_s) }

# Read an estimator from bytes on disk
estimator ='bytes.txt'))

# Estimate the union of our two sources
estimate = HyperEstimator.estimate(builder.estimator, estimator)

# puts estimate
# => 147

External Libraries Included