Switch branches/tags
Nothing to show
Find file History
Pull request Compare This branch is 24 commits behind kyleburton:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.


This has been moved out to it’s own project: JBit


Bit set implementation to support arbitrariliy sized bit arrays natively in Java.

Current Java and JVM Limitations

Java’s java.util.BitSet uses @int@s in its interface and is thus limited to a
maximum of 2^31 - 1 bits. java.math.BigInteger also has this limitation.

To get beyond this, it is possible in 64bit JVMs to allocate a byte array and
use primitve bit operations to emulate a larger bit set. Since native Java
arrays are indexed via @int@s, the maximum byte array we can allocate is
2,147,483,647 which is enough space to represent 17,179,869,176 bits.

If we use an array of byte arrays lifts the limit to
295,147,904,904,474,918,976, which is larger than 2^64
(18,446,744,073,709,551,616) – the maximum amount of space a 64 bit JVM could
possibly address or allocate. Uisng an array of byte arrays gives us a
practical limit.

Hash Function Limitations

The current clj-bloom library includes support for the crc32 and adler32 hasing
functions. Since these functions produce 32 bit values, they are insufficient
for use in Bloom filters with more than 2^32 bits – they are, by definition,
unable to set any bits above 2^32 which will result in a higher than expected
false positive rate for the filter, regardless of the number of hashes or making
the filter wider.

New thoughts…

Very Large (unbounded?) Java Bit Arrays

Instead of going all the way down to using byte arrays – why not use an array
of BitSets or BigIntegers? They already support all of the bit operations
that we want. The only work the wrapper library would have to do then is
bit-shifting work on the values being set.

The Interface / usage might look like:

// NB: for now all I care about is get/set because that’s all I need for the // bloom filter public interface LargeBitSet { public boolean get( BigInteger bit ); public void set( BigInteger bit ); }