Fixed bug with optimal-n-and-k; identified limitation with BitSet and BigInteger
rn-superg committed Nov 9, 2010
1 parent b619eec commit 42604916a40f2e80e2575e10cac89591f0538265
Showing with 16 additions and 2 deletions.
  1. +8 −0 README.textile
  2. +1 −1 project.clj
  3. +1 −1 src/com/github/kyleburton/clj_bloom.clj
  4. +6 −0 test/com/github/kyleburton/clj_bloom_test.clj
README.textile
@@ -125,6 +125,14 @@ h2. Hash Functions
The @words.clj@ example shows how to create your own hash function. As part of the definition of a bloom filter, a number of hashes @k@ are executed to determine the bits to set. The interface for this hash function should take a @java.lang.String@ and an @int@. The string being the data to compute the hash for, and the @int@ being the size in bits (@n@) of the bit array. The hash function must return the sequence of bit locations to be set in the filter for the given input string.
+h1. Limitations
+
+h2. @java.util.BitSet@
+
+The current implementation uses a @java.util.BitSet@, which is limited to @2^31 - 1@ bits. This puts some boundaries on the number of elements and the false-positive probability.
+
+Both @java.util.BitSet@ and @java.math.BigInteger@ have this limitation. I will look into implementing a version of the filter that uses @byte@ arrays under the hood so that this limitation can be lifted.
+
h1. Installation
If you're using Leiningen, add the following to your @project.clj@ file's @:dependencies@:
project.clj
@@ -1,4 +1,4 @@
-(defproject com.github.kyleburton/clj-bloom "1.0.1"
+(defproject com.github.kyleburton/clj-bloom "1.0.2"
:description "Bloom Filter implementation in Clojure, see also: http://github.com/kyleburton/clj-bloom"
:warn-on-reflection true
:dependencies
src/com/github/kyleburton/clj_bloom.clj
@@ -91,7 +91,7 @@
(defn optimal-n-and-k [entries prob]
  (let [n (num-bits-for-entries-and-fp-probability entries prob)
        k (num-hash-fns-for-entries-and-bits entries n)]
-    [(int (Math/ceil n)) (int (Math/ceil k))]))
+    [(long (Math/ceil n)) (long (Math/ceil k))]))
;; (optimal-n-and-k 10000 0.01) [ 95851 7]
;; (optimal-n-and-k 10000 0.001) [ 143776 10]
test/com/github/kyleburton/clj_bloom_test.clj
@@ -44,5 +44,11 @@
2)]
(is (not (= (first pair) (second pair))))))))
+(deftest test-optimal-n-and-k
+  ;; This tests a bug where optimal-n-and-k used ints instead of longs...
+  (is (bf/optimal-n-and-k 400000000 0.01)))
+;; (deftest test-make-optimal-filter
+;;   ;; this tests a bug that existed in <=1.0.1 with large filters...
+;;   (is (bf/make-optimal-filter 400000000 0.01)))
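
The @int@ to @long@ change and the @java.util.BitSet@ limitation noted in the README come from the same arithmetic: for 400,000,000 entries at a 1% false-positive probability, the optimal bit count is roughly 3.8 billion, which exceeds @2^31 - 1@. The sketch below illustrates this. It assumes the standard Bloom filter sizing formulas, which reproduce the two results quoted in the comments above but are not confirmed to be the library's exact helper implementations. All @sketch-@ names and @fits-in-bitset?@ are hypothetical and are not part of clj-bloom.

```clojure
;; Assumption: standard Bloom filter sizing formulas, not necessarily
;; the library's own helpers. All names here are hypothetical.

(defn sketch-num-bits
  "Optimal number of bits: n = -entries * ln(prob) / (ln 2)^2."
  [entries prob]
  (/ (* -1.0 entries (Math/log prob))
     (Math/pow (Math/log 2.0) 2)))

(defn sketch-num-hash-fns
  "Optimal number of hash functions: k = (n / entries) * ln 2."
  [entries n]
  (* (/ n entries) (Math/log 2.0)))

(defn sketch-optimal-n-and-k
  "Returns [n k], rounded up and coerced to long so that large n
  does not overflow the 32-bit int range."
  [entries prob]
  (let [n (sketch-num-bits entries prob)
        k (sketch-num-hash-fns entries n)]
    [(long (Math/ceil n)) (long (Math/ceil k))]))

(defn fits-in-bitset?
  "True when n bits can be indexed by a java.util.BitSet,
  whose bit indexes are non-negative ints."
  [n]
  (<= n Integer/MAX_VALUE))

;; (sketch-optimal-n-and-k 10000 0.01)   => [95851 7]
;; (sketch-optimal-n-and-k 10000 0.001)  => [143776 10]
;; (fits-in-bitset? (first (sketch-optimal-n-and-k 400000000 0.01))) => false
```

With @long@ coercion the sizing calculation itself succeeds for large inputs, which is what @test-optimal-n-and-k@ checks. Actually building a filter of that size still runs into the @BitSet@ limit, which is consistent with @test-make-optimal-filter@ being left commented out.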