diff --git a/README.md b/README.md
index 352dc8bd6..c4677c04a 100644
--- a/README.md
+++ b/README.md
@@ -278,17 +278,30 @@ The worst-case in-memory size of an LSM-tree is *O*(*n*).

 - The worst-case in-memory size of the Bloom filters is *O*(*n*).

-  The total in-memory size of all Bloom filters depends on the Bloom
+  The total in-memory size of all Bloom filters is the number of bits
+  per physical entry multiplied by the number of physical entries. The
+  required number of bits per physical entry is determined by the Bloom
   filter allocation strategy, which is determined by the
   `confBloomFilterAlloc` field of `TableConfig`.

   `AllocFixed bitsPerPhysicalEntry`
-  The total in-memory size of all Bloom filters is the number of bits
-  per physical entry multiplied by the number of physical entries.
+  The number of bits per physical entry is specified as
+  `bitsPerPhysicalEntry`.

   `AllocRequestFPR requestedFPR`
-  **TODO**: How does one determine the bloom filter size using
-  `AllocRequestFPR`?
+  The number of bits per physical entry is determined by the requested
+  false-positive rate, which is specified as `requestedFPR`.
+
+  The false-positive rate scales exponentially with the number of bits
+  per entry:
+
+  | False-positive rate | Bits per entry |
+  |---------------------|----------------|
+  | 1 in 10 |  ≈ 4.77 |
+  | 1 in 100 |  ≈ 9.85 |
+  | 1 in 1, 000 |  ≈ 15.79 |
+  | 1 in 10, 000 |  ≈ 22.58 |
+  | 1 in 100, 000 |  ≈ 30.22 |

 - The worst-case in-memory size of the indexes is *O*(*n*).

diff --git a/lsm-tree.cabal b/lsm-tree.cabal
index 6d4dec2c7..a40f738ed 100644
--- a/lsm-tree.cabal
+++ b/lsm-tree.cabal
@@ -139,12 +139,29 @@ description:

 * The worst-case in-memory size of the Bloom filters is \(O(n)\).

- The total in-memory size of all Bloom filters depends on the Bloom filter allocation strategy, which is determined by the @confBloomFilterAlloc@ field of @TableConfig@.
+ The total in-memory size of all Bloom filters is the number of bits per physical entry multiplied by the number of physical entries.
+ The required number of bits per physical entry is determined by the Bloom filter allocation strategy, which is determined by the @confBloomFilterAlloc@ field of @TableConfig@.

 [@AllocFixed bitsPerPhysicalEntry@]:
- The total in-memory size of all Bloom filters is the number of bits per physical entry multiplied by the number of physical entries.
+ The number of bits per physical entry is specified as @bitsPerPhysicalEntry@.

 [@AllocRequestFPR requestedFPR@]:
- __TODO__: How does one determine the bloom filter size using @AllocRequestFPR@?
+ The number of bits per physical entry is determined by the requested false-positive rate, which is specified as @requestedFPR@.
+
+ The false-positive rate scales exponentially with the number of bits per entry:
+
+ +---------------------------+---------------------+
+ | False-positive rate       | Bits per entry      |
+ +===========================+=====================+
+ | \(1\text{ in }10\)        | \(\approx 4.77 \)   |
+ +---------------------------+---------------------+
+ | \(1\text{ in }100\)       | \(\approx 9.85 \)   |
+ +---------------------------+---------------------+
+ | \(1\text{ in }1{,}000\)   | \(\approx 15.79 \)  |
+ +---------------------------+---------------------+
+ | \(1\text{ in }10{,}000\)  | \(\approx 22.58 \)  |
+ +---------------------------+---------------------+
+ | \(1\text{ in }100{,}000\) | \(\approx 30.22 \)  |
+ +---------------------------+---------------------+

 * The worst-case in-memory size of the indexes is \(O(n)\).
diff --git a/src/Database/LSMTree/Internal/Config.hs b/src/Database/LSMTree/Internal/Config.hs
index 9e4cf4734..ff0b31746 100644
--- a/src/Database/LSMTree/Internal/Config.hs
+++ b/src/Database/LSMTree/Internal/Config.hs
@@ -78,7 +78,7 @@ instance NFData TableConfig where
 -- | A reasonable default 'TableConfig'.
 --
 -- This uses a write buffer with up to 20,000 elements and a generous amount of
--- memory for Bloom filters (FPR of 2%).
+-- memory for Bloom filters (FPR of 1%).
 --
 defaultTableConfig :: TableConfig
 defaultTableConfig =