Counting Bloom Filter implemented in Ruby
C Ruby
Pull request Compare This branch is 1 commit ahead, 95 commits behind igrigorik:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
examples
ext
lib
test
.gitignore
README.rdoc
Rakefile
bloomfilter.gemspec

README.rdoc

BloomFilter

Counting Bloom Filter implemented in Ruby.

Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not. For more detail: en.wikipedia.org/wiki/Bloom_filter

Implementation

Instead of using k different hash functions, this implementation seeds the CRC32 hash with k different initial values (0, 1, …, k-1). This may or may not give you a good distribution, it all depends on the data.

Example

require 'bloomfilter'
# M (size of bit array)
# K (number of hash functions)
# R (random seed) 100000000, k=4, random seed=1
#                    M,  K, R
bf = BloomFilter.new(10, 2, 1)
bf.insert("test")
bf.include?("test")
=> true
bf.include?("test2")
=> false
bf.delete("test")
bf.include?("test")
=> false
# Hash with a bloom filter!
bf["test2"] = "bar"
bf["test2"]
=> "bar"
bf["test3"]
=> nil
bf.stats
Number of filter bits (m): 10
Number of filter elements (n): 2
Number of filter hashes (k) : 2
Predicted false positive rate = 10.87%

Configuring Bloom Filter

Performance of the Bloom filter depends on a number of variables:

  • size of the bit array

  • number of hash functions

To figure out the values for these parameters, refer to:

http://www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/

Credits

Tatsuya Mori <valdzone@gmail.com> (Original: vald.x0.com/sb/)