This repository is private.
All pages are served over SSL and all pushing and pulling is done over SSH.
No one may fork, clone, or view it unless they are added as a member.
Every repository with this icon (
) is private.
Every repository with this icon (
This repository is public.
Anyone may fork, clone, or view it.
Every repository with this icon (
) is public.
Every repository with this icon (
commit 764f18fc98a43c58c90d3dd41cb149b1deda28e1
tree aff3b38089c49aceb0d485497a31dc257d10c6f2
parent e9f6038890737f43d11e6b73e644dc861cf487d2
tree aff3b38089c49aceb0d485497a31dc257d10c6f2
parent e9f6038890737f43d11e6b73e644dc861cf487d2
| name | age | message | |
|---|---|---|---|
| |
.gitignore | Wed Mar 25 22:05:35 -0700 2009 | |
| |
LICENSE.txt | Thu Mar 26 09:08:56 -0700 2009 | |
| |
MANIFEST.in | ||
| |
README.txt | Wed Mar 25 22:05:35 -0700 2009 | |
| |
ez_setup.py | Wed Mar 25 22:05:35 -0700 2009 | |
| |
pybloom/ | ||
| |
setup.py | Tue Mar 24 10:25:54 -0700 2009 |
README.txt
pybloom is a module that includes a Bloom Filter data structure along with an implmentation of Scalable Bloom Filters as discussed in: P. Almeida, C.Baquero, N. PreguiƧa, D. Hutchison, Scalable Bloom Filters, (GLOBECOM 2007), IEEE, 2007. Bloom filters are great if you understand what amount of bits you need to set aside early to store your entire set. Scalable Bloom Filters allow your bloom filter bits to grow as a function of false positive probability and size. A filter is "full" when at capacity: M * ((ln 2 ^ 2) / abs(ln p)), where M is the number of bits and p is the false positive probability. When capacity is reached a new filter is then created exponentially larger than the last with a tighter probability of false positives and a larger number of hash functions. >>> from pybloom import BloomFilter >>> f = BloomFilter(capacity=1000, error_rate=0.001) >>> [f.add(x) for x in range(10)] [False, False, False, False, False, False, False, False, False, False] >>> all([(x in f) for x in range(10)]) True >>> 10 in f False >>> 5 in f True >>> f = BloomFilter(capacity=1000, error_rate=0.001) >>> for i in xrange(0, f.capacity): ... _ = f.add(i) >>> abs((len(f) / float(f.capacity)) - 1.0) <= f.error_rate True >>> from pybloom import ScalableBloomFilter >>> sbf = ScalableBloomFilter(mode=ScalableBloomFilter.SMALL_SET_GROWTH) >>> count = 10000 >>> for i in xrange(0, count): ... _ = sbf.add(i) ... >>> abs((len(sbf) / float(count)) - 1.0) <= sbf.error_rate True # len(sbf) may not equal the entire input length. 0.006% error is well # below the default 0.1% error threshold








