# Bloom Filter

A Bloom filter is a bit vector of *m* bits combined with *k* hash functions, sized for the *n* elements expected to be inserted into it.

### What can it do? 

It can tell us that an element is definitely **not** in a given set, or that it **may be** in the set. It never produces false negatives, but it can produce false positives.

### How to build it?

Three parameters define the filter: *m*, *k* and *n*.

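A larger *m* lowers the false-positive rate at the cost of more memory.

A larger *k* makes every insert and lookup slower, because each one computes *k* hashes. It also lowers the false-positive rate, but only up to an optimal value of *k*; beyond it, the bit vector fills up faster and the false-positive rate rises again.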
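Both trade-offs follow from the standard approximation of the false-positive probability after inserting *n* elements, $p \approx (1 - e^{-kn/m})^k$, which assumes the hash functions behave like independent, uniformly random functions. The sketch below evaluates it for a few hypothetical parameter values; the helper name `false_positive_rate` is illustrative, not part of any library.

```python
import math

def false_positive_rate(m: int, k: int, n: int) -> float:
    """Approximate false-positive probability of a Bloom filter with
    m bits, k hash functions and n inserted elements: (1 - e^(-kn/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k

n = 100  # hypothetical number of inserted elements

# For a fixed k, more bits (more memory) means fewer false positives
for m in (500, 1000, 2000):
    print(f"m={m}, k=3: p={false_positive_rate(m, 3, n):.4f}")

# For a fixed m, raising k helps only up to an optimum, then hurts
for k in (1, 4, 7, 15):
    print(f"m=1000, k={k}: p={false_positive_rate(1000, k, n):.4f}")
```

With *m* = 1000 and *n* = 100 the rate drops from *k* = 1 to *k* = 7 and climbs again at *k* = 15.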

Given *n* and *m*, the number of hash functions that minimizes the false-positive rate is $k = \frac{m}{n}\ln 2$, rounded to an integer in practice.
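The formula can be applied directly. The sketch below rounds $\frac{m}{n}\ln 2$ to the nearest positive integer and also inverts the false-positive approximation to size *m* for a target rate *p*, using the standard result $m = -\frac{n \ln p}{(\ln 2)^2}$, which holds when *k* is chosen optimally. The function names `optimal_k` and `bits_for_rate` are illustrative.

```python
import math

def optimal_k(m: int, n: int) -> int:
    """Hash-function count minimizing false positives: round((m/n) * ln 2), at least 1."""
    return max(1, round(m / n * math.log(2)))

def bits_for_rate(n: int, p: float) -> int:
    """Bits needed for n elements at target false-positive rate p, assuming optimal k."""
    return math.ceil(-n * math.log(p) / (math.log(2) ** 2))

print(optimal_k(20, 5))        # m/n = 4, so k is about 2.77, which rounds to 3
m = bits_for_rate(1000, 0.01)  # hypothetical: 1000 items at a 1% false-positive rate
print(m, optimal_k(m, 1000))
```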

### Algorithm

The user creates a set of strings and then tests whether a new string is in the set.

*Input*: a set of strings and a new string.

*Output*: whether the new string is definitely not in the set or may be in the set.
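The algorithm can be sketched as a small self-contained class. To avoid compiled dependencies, this version stores the bits in a `bytearray` and derives its *k* hash functions from SHA-256 by prefixing a seed to the string. That hashing scheme is an assumption made for this sketch, not MurmurHash3, so its bit positions will differ from those printed by the `mmh3` cells in this notebook. The class name `BloomFilter` and its methods are illustrative.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter with m bits and k seeded hash functions (illustrative sketch)."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit for simplicity; all start at 0

    def _positions(self, item: str):
        # Assumed scheme: hash function i is SHA-256 of "i:item", reduced modulo m
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely not in the set"; True means "may be in the set"
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=64, k=3)
for name in ["Mike", "Alex", "Jose", "Daniel", "Aek"]:
    bf.add(name)
print(bf.might_contain("Alex"))  # True: inserted items are always reported
print(bf.might_contain("Rick"))  # usually False, but may be a false positive
```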





### Libraries

```python
!pip install mmh3
!pip install bitarray
from bitarray import bitarray
import mmh3
import numpy as np
```
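`mmh3` and `bitarray` are C extensions, so `pip` has to compile them whenever no prebuilt wheel matches the Python version and platform, which fails on machines without a C compiler. In that case the later cells could run on pure-Python stand-ins like the sketch below. These are assumptions for illustration only: the hypothetical `seeded_hash` uses BLAKE2b from `hashlib` keyed with the seed, not MurmurHash3, so it returns different values than `mmh3.hash`; a zero-filled `bytearray` replaces `bitarray(m)` followed by `setall(0)`.

```python
import hashlib

def seeded_hash(key: str, seed: int) -> int:
    """Hypothetical stand-in for mmh3.hash(key, seed): a signed 32-bit integer
    from BLAKE2b keyed with the seed (NOT MurmurHash3, so values differ)."""
    digest = hashlib.blake2b(key.encode("utf-8"),
                             digest_size=4,
                             key=seed.to_bytes(4, "little")).digest()
    return int.from_bytes(digest, "little", signed=True)

m = 20
array = bytearray(m)              # stand-in for bitarray(m) + setall(0)
pos = seeded_hash("Mike", 7) % m  # Python's % keeps this in range(m) even for negative hashes
array[pos] = 1
```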


### Initialize filter

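```python
m = int(input("Enter size of filter: "))
k = int(input("Enter number of hash functions < 20: "))

# Bit vector of m bits, all initially 0
array = bitarray(m)
array.setall(0)

# k distinct seeds from 1..19, one per hash function; sampling without
# replacement prevents two "different" hash functions from being identical
hash_seeds = [int(s) for s in np.random.choice(np.arange(1, 20), k, replace=False)]
```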




### Read the input set from the user and update the filter

```python
n = int(input("How many items will you insert in the set? "))
for _ in range(n):
    x = input()
    # Set the bit chosen by each of the k seeded MurmurHash3 functions
    for h in range(k):
        temp = mmh3.hash(x, hash_seeds[h]) % m
        print(f"h{h} returns {temp}")
        array[temp] = 1

print("Thanks!")
```

```
How many items will you insert in the set? 5
Mike
h0 returns 17
h1 returns 18
Alex
h0 returns 16
h1 returns 5
Jose
h0 returns 17
h1 returns 11
Daniel
h0 returns 7
h1 returns 12
Aek
h0 returns 5
h1 returns 6
Thanks!
```
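The printed positions are enough to reconstruct which bits this run set. The sketch below copies them from the output above (the dictionary `logged` is that transcript, not new data): five names with two hashes each make 10 writes, but only 8 distinct bits end up set, because Mike and Jose share position 17 and Alex and Aek share position 5.

```python
# Positions printed by h0 and h1 for each inserted name in the run above
logged = {
    "Mike": (17, 18),
    "Alex": (16, 5),
    "Jose": (17, 11),
    "Daniel": (7, 12),
    "Aek": (5, 6),
}

writes = [pos for positions in logged.values() for pos in positions]
set_bits = set(writes)  # writing 1 to an already-set bit changes nothing

print(len(writes), "writes")
print(len(set_bits), "distinct bits set:", sorted(set_bits))
```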


### Check a new string

```python
# Query loop: a string is "not in the set" as soon as any of its k bits is 0
while True:
    string = input("Give me a string to check or say NO to stop: ")
    if string == "NO":
        break
    maybe_in_set = True
    for h in range(k):
        temp = mmh3.hash(string, hash_seeds[h]) % m
        print(f"h{h} returns {temp}")
        if array[temp] == 0:
            maybe_in_set = False
    if maybe_in_set:
        print(" String might be in the set!")
    else:
        print(" String is not in the set!")
```


```
Give me a string to check or say NO to stop: Paok
h0 returns 7
h1 returns 6
 String might be in the set!
Give me a string to check or say NO to stop: Pikatchu
h0 returns 13
h1 returns 7
 String is not in the set!
Give me a string to check or say NO to stop: Rick
h0 returns 18
h1 returns 1
 String is not in the set!
Give me a string to check or say NO to stop: Alex
h0 returns 16
h1 returns 5
 String might be in the set!
```
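Replaying these lookups against the bits set during insertion shows why each answer comes out as printed. Only *Alex* was actually inserted. *Paok* was not, yet both of its positions (7 and 6) had already been set by *Daniel* and *Aek*, so it is a false positive. *Pikatchu* and *Rick* each hit an unset bit (13 and 1 respectively), which proves they were never inserted. All positions below are copied from the two transcripts above.

```python
# Bits set during insertion (from the insertion transcript)
set_bits = {5, 6, 7, 11, 12, 16, 17, 18}
inserted = {"Mike", "Alex", "Jose", "Daniel", "Aek"}

# Positions printed by h0 and h1 for each query
queries = {
    "Paok": (7, 6),
    "Pikatchu": (13, 7),
    "Rick": (18, 1),
    "Alex": (16, 5),
}

verdicts = {}
for name, positions in queries.items():
    if not all(pos in set_bits for pos in positions):
        verdicts[name] = "not in the set"
    elif name in inserted:
        verdicts[name] = "might be in the set (true positive)"
    else:
        verdicts[name] = "might be in the set (false positive)"
    print(f"{name}: {verdicts[name]}")
```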
