This repository is private.
All pages are served over SSL and all pushing and pulling is done over SSH.
No one may fork, clone, or view it unless they are added as a member.
Every repository with this icon (
) is private.
Every repository with this icon (
This repository is public.
Anyone may fork, clone, or view it.
Every repository with this icon (
) is public.
Every repository with this icon (
| name | age | message | |
|---|---|---|---|
| |
.gitignore | Mon Dec 29 09:54:53 -0800 2008 | |
| |
README.rdoc | Sun Feb 22 19:33:02 -0800 2009 | |
| |
Rakefile | Mon Dec 29 09:54:53 -0800 2008 | |
| |
bloomfilter.gemspec | Sat Apr 04 10:17:23 -0700 2009 | |
| |
examples/ | Sun Feb 22 19:28:17 -0800 2009 | |
| |
ext/ | Thu Jul 30 20:52:06 -0700 2009 | |
| |
lib/ | Wed Feb 18 18:10:15 -0800 2009 | |
| |
test/ | Wed Jun 03 14:56:18 -0700 2009 |
README.rdoc
BloomFilter
Counting Bloom Filter implemented in Ruby.
Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not. For more detail: en.wikipedia.org/wiki/Bloom_filter
Implementation
Instead of using k different hash functions, this implementation seeds the CRC32 hash with k different initial values (0, 1, …, k-1). This may or may not give you a good distribution, it all depends on the data.
Example
require 'bloomfilter'
# M (size of bit array)
# K (number of hash functions)
# R (random seed) 100000000, k=4, random seed=1
# M, K, R
bf = BloomFilter.new(10, 2, 1)
bf.insert("test")
bf.include?("test")
=> true
bf.include?("test2")
=> false
bf.delete("test")
bf.include?("test")
=> false
# Hash with a bloom filter!
bf["test2"] = "bar"
bf["test2"]
=> "bar"
bf["test3"]
=> nil
bf.stats
Number of filter bits (m): 10
Number of filter elements (n): 2
Number of filter hashes (k) : 2
Predicted false positive rate = 10.87%
Configuring Bloom Filter
Performance of the Bloom filter depends on a number of variables:
- size of the bit array
- number of hash functions
To figure out the values for these parameters, refer to:
http://www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/
Credits
Tatsuya Mori <valdzone@gmail.com> (Original: vald.x0.com/sb/)







