# Bloom Filter

> ...

## Code

In [None]:
import hashlib

class BloomFilter:
    def __init__(self, size, hash_count):
        """
        Initialize the Bloom Filter.
        :param size: Size of the bit array.
        :param hash_count: Number of hash functions to use.
        """
        self.size = size
        self.hash_count = hash_count
        self.bit_array = [0] * size  # One list slot per bit, all initially 0 (simple, but not bit-packed)

    def _hash(self, item, seed):
        """
        Hash an item with SHA-256, mixing in a seed so that each seed acts as a separate hash function.
        :param item: The item to hash.
        :param seed: A seed that selects one of the hash functions.
        :return: An index into the bit array, in the range [0, size).
        """
        # Items are hashed by their string form, so 1 and "1" map to the same bits.
        hash_object = hashlib.sha256(str(item).encode() + str(seed).encode())
        return int(hash_object.hexdigest(), 16) % self.size

    def add(self, item):
        """
        Add an item to the Bloom Filter.
        :param item: The item to add.
        """
        for seed in range(self.hash_count):
            index = self._hash(item, seed)
            self.bit_array[index] = 1

    def __contains__(self, item):
        """
        Check if an item is in the Bloom Filter.
        :param item: The item to check.
        :return: True if the item is probably in the set, False if it is definitely not.
        """
        for seed in range(self.hash_count):
            index = self._hash(item, seed)
            if self.bit_array[index] == 0:
                return False
        return True
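
The class above stores each bit as a full Python list element, which keeps the code simple but uses far more memory than one bit per slot. As a minimal sketch of a more compact layout, the hypothetical `BitArray` helper below (not part of the original code) packs eight bits into each byte of a `bytearray`:

```python
class BitArray:
    """Fixed-size array of bits packed eight per byte (hypothetical helper)."""

    def __init__(self, size):
        self.size = size
        # Round up so that every bit has a byte to live in.
        self.data = bytearray((size + 7) // 8)

    def set(self, index):
        # Select the byte, then switch on the bit inside it.
        self.data[index // 8] |= 1 << (index % 8)

    def get(self, index):
        # Shift the wanted bit down to position 0 and mask it out.
        return (self.data[index // 8] >> (index % 8)) & 1


bits = BitArray(20)
bits.set(3)
bits.set(19)
print(bits.get(3), bits.get(4), bits.get(19))  # 1 0 1
print(len(bits.data))  # 3 (three bytes hold all 20 bits)
```

Swapping this in for `self.bit_array` would mean replacing the list indexing in `add` and `__contains__` with `set` and `get` calls.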

## Test

In [None]:
# Create a small Bloom Filter for testing
bloom = BloomFilter(size=20, hash_count=3)

# Add elements to the Bloom Filter
elements = {"a", "b", 1, 2, 12}
for element in elements:
    bloom.add(element)

In [None]:
# Test membership
test_elements = ["a", "b", 1, 2, 12, "c", 3, 13]  # Include some elements not in the set
for element in test_elements:
    if element in bloom:
        print(f"'{element}' is probably in the set.")
    else:
        print(f"'{element}' is definitely not in the set.")

'a' is probably in the set.
'b' is probably in the set.
'1' is probably in the set.
'2' is probably in the set.
'12' is probably in the set.
'c' is definitely not in the set.
'3' is definitely not in the set.
'13' is definitely not in the set.
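
No false positives appear in this run, but with only 20 bits they are fairly likely in general. As a rough estimate, assuming the standard approximation $p \approx (1 - e^{-kn/m})^k$ for a filter of $m$ bits, $k$ hash functions, and $n$ inserted items (the function name below is illustrative):

```python
import math

def false_positive_rate(m, k, n):
    """Approximate false-positive probability: (1 - e^(-k*n/m))^k."""
    return (1 - math.exp(-k * n / m)) ** k

# Parameters of the test above: 20 bits, 3 hash functions, 5 items.
p = false_positive_rate(m=20, k=3, n=5)
print(f"{p:.3f}")  # 0.147
```

So roughly one in seven lookups for an absent item would be expected to report "probably in the set" with this configuration.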


## Performance

**Space Efficiency**:
- Bloom Filters are **space-efficient** because they store only a fixed-size bit array, not the elements themselves.
- The **size of the bit array** $m$ determines the space usage; together with the **number of hash functions** $k$, it also determines the false positive rate.
- For a given number of elements $n$ and a desired false positive probability $p$, the optimal size of the bit array is:
  - $m = -\frac{n \ln p}{(\ln 2)^2}$
- The optimal number of hash functions is:
  - $k = \frac{m}{n} \ln 2$
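
The two formulas above can be turned into a small helper. This is a sketch only: the function name and the rounding choices (round $m$ up, round $k$ to the nearest integer, at least 1) are assumptions, since the formulas give real numbers but a filter needs integers.

```python
import math

def optimal_parameters(n, p):
    """Return (m, k) for n expected items and target false-positive rate p."""
    # m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    # k = (m / n) * ln 2, rounded to the nearest whole number of hash functions
    k = max(1, round((m / n) * math.log(2)))
    return m, k

print(optimal_parameters(n=1000, p=0.01))  # (9586, 7)
```

For 1000 items at a 1% false positive rate, this works out to about 9.6 bits per item.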
  
**Time Complexity**:
- **Insertion**: $O(k)$, where $k$ is the number of hash functions.
  - Each element is hashed $k$ times, and the corresponding bits in the bit array are set to `1`.
- **Lookup**: $O(k)$.
  - Each element is hashed $k$ times, and the corresponding bits in the bit array are checked.
- **Deletion**: Not supported.
  - Bits are shared between elements, so resetting a bit to `0` to remove one element could make the filter report other elements as absent (a false negative).
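
The deletion problem can be seen directly. The sketch below reuses the seeded SHA-256 scheme from the code section in standalone helper functions (`positions` and `contains` are illustrative names), picks two items that share at least one bit, and then naively clears the bits of the first:

```python
import hashlib

SIZE = 16        # deliberately tiny so that bit positions collide often
HASH_COUNT = 3

def positions(item):
    """Set of bit positions for an item, one per seed."""
    return {
        int(hashlib.sha256(str(item).encode() + str(seed).encode()).hexdigest(), 16) % SIZE
        for seed in range(HASH_COUNT)
    }

def contains(bits, item):
    """True if every bit position of the item is set."""
    return all(bits[i] for i in positions(item))

# Search for a second item that shares at least one bit position with the first.
first = "item-0"
second = next(
    f"item-{n}" for n in range(1, 1000)
    if positions(f"item-{n}") & positions(first)
)

# Add both items.
bits = [0] * SIZE
for item in (first, second):
    for i in positions(item):
        bits[i] = 1

# Naive "deletion": clear every bit that belongs to `first`.
for i in positions(first):
    bits[i] = 0

print(contains(bits, first))   # False
print(contains(bits, second))  # False, although `second` was never removed
```

The second `False` is a false negative, which a Bloom Filter is never supposed to produce.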


## References

- [Writing a full-text engine using Bloom filter](https://news.ycombinator.com/item?id=23473365)
- [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter)