Better utilization of allocated memory #15

cmarxer · 2016-10-31T13:11:13Z

UnsafeBitArray always allocates a multiple of 64 bits. For example if numberOfBits is 250 then 256 bits are allocated but 6 bits remain always unused.

Wouldn't it make sense that Bloomfilter.optimalNumberOfBits(..) returns the next higher number which is a multiple of 64? The false positive rate would become little better while all allocated memory is potentially used.

The text was updated successfully, but these errors were encountered:

alexandrnikitin · 2016-10-31T18:52:58Z

I wanted to follow the original paper in the implementation of the BF math. Basically it is based on Google's Guava code. I think the result of that change is negligible. The waste is just 1 byte. In case of common use case BFs takes tens of MBs and more. It may be the case for very small BFs. Do you think it's worth to change that? What's your use case?

cmarxer · 2016-10-31T19:52:37Z

As I see it, the waste can be up to 63 bits (~8 byte) since UnsafeBitArray allocatess/accesses memory Long-wise (8 byte chunks). I think, it would be possible to replace getLong and putLong with getByte and putByte in which case, yes, the waste would be at most 7 bits. More "fine-grained" memory allocation and access than byte-wise (ideally bit-wise) is unfortunately not possible. That is why there might always be superfluous bits if numberOfBits is not a multiple of 8.

My situation is the following: I am working with relatively small bloom filters and need to pack them (among other information) into a single UDP packet. That is why in my case it is necessary to be very carefully with space.

There is also another minor flaw regarding (de)serialization: Restoring a bloom filter should in my opinion accept arbitrary bytes fitting the length of the UnsafeByteArray. If there are superfluous bits (or even bytes) at the end of the allocated memory, it must be guaranteed that these are always 0.

alexandrnikitin · 2017-02-07T13:42:23Z

@cmarxer I've implemented byte based array of bits (instead of long based). Take a look please #21 It seems it's a bit faster.

I'm reluctant to changed the logic that calculates optimal number of bits. I want to keep it close to the original paper. I would suggest to call the BloomFilter constructor directly instead of apply() like this:

    var nb = BloomFilter.optimalNumberOfBits(numberOfItems, falsePositiveRate)
    // round nb to your requirements, eg 8 bit based
    nb = ...
    val nh = BloomFilter.optimalNumberOfHashes(numberOfItems, nb)
    val bf = new BloomFilter[T](nb, nh)

alexandrnikitin mentioned this issue Feb 6, 2017

Byte based UnsafeBitArray #21

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better utilization of allocated memory #15

Better utilization of allocated memory #15

cmarxer commented Oct 31, 2016

alexandrnikitin commented Oct 31, 2016

cmarxer commented Oct 31, 2016

alexandrnikitin commented Feb 7, 2017

Better utilization of allocated memory #15

Better utilization of allocated memory #15

Comments

cmarxer commented Oct 31, 2016

alexandrnikitin commented Oct 31, 2016

cmarxer commented Oct 31, 2016

alexandrnikitin commented Feb 7, 2017