github github
  • Home
  • Pricing and Signup
  • Training
  • Gist
  • Blog
  • Login

igrigorik / bloomfilter

  • Admin
  • Watch Unwatch
  • Fork
  • Your Fork
  • Pull Request
  • Download Source
    • 98
    • 4
  • Source
  • Commits
  • Network (4)
  • Issues (0)
  • Graphs
  • Branch: master

click here to add a description

click here to add a homepage

  • Switch Branches (1)
    • master ✓
  • Switch Tags (0)
  • Branch List
Sending Request…

Counting Bloom Filter implemented in Ruby: C version, Redis version (with TTL's) — Read more

  Cancel

http://www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/

  Cancel
  • HTTP
  • Git Read-Only

This URL has Read+Write access

marshall/unmarshall bloomfilter logic (C version) 
igrigorik (author)
Wed Aug 25 20:16:23 -0700 2010
commit  625eace1c64fdf2ccb1d
tree    be6630250b351fdd233d
parent  42810663bcf5e181eb41
bloomfilter /
name age
history
message
file .gitignore Mon Dec 29 09:54:53 -0800 2008 adding a test task and a test [tenderlove]
file README.rdoc Wed Jan 06 09:55:55 -0800 2010 correct link in the readme [igrigorik]
file Rakefile Sat May 22 00:43:27 -0700 2010 using rake-compiler is much easier [joshbuddy]
file VERSION Wed Aug 25 20:16:23 -0700 2010 marshall/unmarshall bloomfilter logic (C version) [igrigorik]
file bloomfilter.gemspec Wed Aug 25 20:16:23 -0700 2010 marshall/unmarshall bloomfilter logic (C version) [igrigorik]
directory examples/ Tue Jan 05 21:40:11 -0800 2010 update readme with redis examples [igrigorik]
directory ext/ Wed Aug 25 20:16:23 -0700 2010 marshall/unmarshall bloomfilter logic (C version) [igrigorik]
directory lib/ Wed Aug 25 20:16:23 -0700 2010 marshall/unmarshall bloomfilter logic (C version) [igrigorik]
directory spec/ Wed Aug 25 20:16:23 -0700 2010 marshall/unmarshall bloomfilter logic (C version) [igrigorik]
README.rdoc

BloomFilter

Counting Bloom Filter implemented in Ruby.

Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not. For more detail: en.wikipedia.org/wiki/Bloom_filter

Performance of the Bloom filter depends on a number of variables:

  • size of the bit array
  • size of the counter bucket
  • number of hash functions

To figure out the values for these parameters, refer to:

  • www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/

To learn about applications and reasons for the time based bloom filters, refer to:

  • www.igvita.com/2010/01/06/flow-analysis-time-based-bloom-filters/

Implementation

Instead of using k different hash functions, this implementation seeds the CRC32 hash with k different initial values (0, 1, …, k-1). This may or may not give you a good distribution, it all depends on the data.

Example

  require 'bloomfilter'

  bf = BloomFilter.new(:size => 100, :hashes => 2, :seed => 1, :bucket => 3, :raise => false)
  bf.insert("test")
  bf.include?("test")
  => true
  bf.include?("test2")
  => false
  bf.delete("test")
  bf.include?("test")
  => false

  # Hash with a bloom filter!
  bf["test2"] = "bar"
  bf["test2"]
  => "bar"
  bf["test3"]
  => nil

  bf.stats
  Number of filter bits (m): 10
  Number of filter elements (n): 2
  Number of filter hashes (k) : 2
  Predicted false positive rate = 10.87%

Redis-backed counting Bloom Filter with TTL’s

  bf = BloomFilter.new(:type => :redis, :ttl => 2, :server => {:host => 'localhost'})

  bf.insert('test')
  bf.include?('test')
  => true

  sleep(2)
  bf.include?('test')
  => false

Credits

Tatsuya Mori <valdzone@gmail.com> (Original C implementation: vald.x0.com/sb/)

Dedicated Server Powered by the Dedicated Servers and
Cloud Computing of Rackspace Hosting®
  • Blog
  • Support
  • Training
  • Job Board
  • Shop
  • Contact
  • API
  • Status
  • © 2010 GitHub Inc. All rights reserved.
  • Terms of Service
  • Privacy
  • Security
  • English
  • Deutsch
  • Français
  • 日本語
  • Português (BR)
  • 中文
  • See all available languages

Your current locale selection: English. Choose another?

  • English
  • Afrikaans
  • Català
  • Čeština
  • Deutsch
  • Español
  • Français
  • Hrvatski
  • Indonesia
  • Italiano
  • 日本語
  • Nederlands
  • Norsk
  • Polski
  • Português (BR)
  • Српски
  • Svenska
  • 中文