# Bloom Filter Name/DOB Scrambling
Demonstrates the core routine used to Bloom-filter encode a plain-text string and describes the functionality of the Python Windows application.

In [4]:
import pandas as pd
import numpy as np
import copy
import re, math
import hashlib
from scipy.special import factorial

## Windows Application Functionality
The purpose of the Windows application is to use the Bloom filter routine below to scramble names and birth dates in a CSV file provided by an agency data partner such as AHS or CPS.
1. The agency partner generates a CSV file with columns containing first name, last name, birth year, birth month and birth day. This is the input CSV file.
1. When the Windows application is launched, a file explorer window appears in which the user selects the input CSV file and clicks OK.
1. A second window prompts the user for an alphanumeric string that serves as an encryption key. This string is passed to the Bloom filter function as its salt argument (`saltStr`). The user clicks OK after entering the key.
1. A third window appears with the column names extracted from the first line of the input CSV file. Each column name should have a checkbox that lets the user specify whether the column will be Bloom filter encrypted. The user clicks OK after making the checkbox selections.
1. A fourth window lets the user specify the name of an output CSV file, which can be located in the same directory as the input file. The user clicks OK after entering the name.
1. The application then reads every column marked for encryption and scrambles each value using the Bloom filter routine. The output of the routine is placed in the corresponding row and column of the output CSV file. Columns not marked for encryption are copied across unchanged.
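
The final step above can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the helper name `encode_csv`, the sample column names and the sample row are all assumptions made up for the example, and in-memory buffers stand in for the real input and output files. The `bloom_filter` function is repeated from the Bloom Filter Routine section so that the block runs on its own.

```python
import csv
import hashlib
import io


def bloom_filter(plnTxt, saltStr, qGrmLen=2, filterLen=32):
    # Same routine as in the "Bloom Filter Routine" section of this notebook.
    plnTxt = f'_{plnTxt}_'
    bloomFilter = 0
    for i in range(len(plnTxt) - qGrmLen + 1):
        byteStr = (saltStr + plnTxt[i:i + qGrmLen]).encode('utf-8', 'replace')
        bloomFilter |= 1 << (int(hashlib.md5(byteStr).hexdigest(), 16) % filterLen)
        bloomFilter |= 1 << (int(hashlib.sha256(byteStr).hexdigest(), 16) % filterLen)
    return bloomFilter


def encode_csv(in_file, out_file, salt, columns_to_encode):
    """Hypothetical helper: copy a CSV, Bloom-filter encoding the selected columns."""
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for col in columns_to_encode:
            # Write the filter as hex with no 0x prefix.
            row[col] = format(bloom_filter(row[col], salt), 'x')
        # Columns that were not selected are written back unchanged.
        writer.writerow(row)


# In-memory stand-ins for the input and output files (made-up sample data).
sample_in = io.StringIO('first_name,last_name,birth_year\nJane,Doe,1980\n')
sample_out = io.StringIO()
encode_csv(sample_in, sample_out, 'dkf13478xxhf10', ['first_name', 'last_name'])
print(sample_out.getvalue())
```

With real files, the buffers would be replaced by file objects opened with `newline=''`, as the `csv` module documentation recommends.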

## Bloom Filter Routine

In [5]:
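def bloom_filter(plnTxt, saltStr, qGrmLen=2, filterLen=32):
    """Encode plnTxt as a filterLen-bit Bloom filter and return it as an int."""
    # Pad with underscores so q-grams also capture the start and end of the string.
    plnTxt = f'_{plnTxt}_'
    bloomFilter = 0

    # Slide a window of qGrmLen characters across the padded string.
    for i in range(len(plnTxt)-qGrmLen+1):
        # Prefix the salt so the bit positions depend on the key.
        byteStr = (saltStr + plnTxt[i:i+qGrmLen]).encode('utf-8','replace')

        # Two hash functions (MD5 and SHA-256) each select one bit position.
        idxMd5 = int( hashlib.md5(byteStr).hexdigest(), 16) % filterLen
        idxSha = int( hashlib.sha256(byteStr).hexdigest(), 16) % filterLen

        # Set both bits in the filter.
        bloomFilter |= 1 << idxMd5
        bloomFilter |= 1 << idxSha

    return bloomFilter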
        

The `salt` variable is equivalent to an encryption key. It is an alphanumeric string of arbitrary length that the user should specify in a pop-up window.
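
To make the role of the salt more concrete, the sketch below lists the salted q-grams that the routine hashes for a sample string. The helper name `salted_qgrams` is hypothetical and exists only for illustration; it mirrors the padding and slicing in `bloom_filter` but does no hashing.

```python
def salted_qgrams(plain_text, salt, q=2):
    """Hypothetical helper: return the salted q-grams that bloom_filter hashes."""
    padded = f'_{plain_text}_'  # same underscore padding as bloom_filter
    return [salt + padded[i:i + q] for i in range(len(padded) - q + 1)]


grams = salted_qgrams('messier', 'dkf13478xxhf10')
print(len(grams))   # 8
print(grams[:3])    # ['dkf13478xxhf10_m', 'dkf13478xxhf10me', 'dkf13478xxhf10es']
```

Because the salt is part of every hashed string, a different salt changes the input to both hash functions for every q-gram.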

In [6]:
salt = 'dkf13478xxhf10'

In [7]:
plainText = 'messier'

In [8]:
result = bloom_filter(plainText,salt)

The Bloom filter result should be written to the output CSV file as a string in hex format with no prefix (i.e., no `0x`).

In [9]:
print(format(result, 'x'))

4925106e
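
As a small sketch of the hex conversion, the helpers below format a filter value without the `0x` prefix and parse it back. The names `to_hex` and `from_hex` are hypothetical, and zero-padding to a fixed width is an assumption not stated above; it keeps every value the same length (8 hex digits for the default 32-bit filter), which may or may not be what the output file requires.

```python
FILTER_LEN = 32              # default filterLen of bloom_filter
HEX_WIDTH = FILTER_LEN // 4  # 4 bits per hex digit


def to_hex(value, width=HEX_WIDTH):
    """Hypothetical helper: hex string with no 0x prefix, zero-padded (assumption)."""
    return format(value, f'0{width}x')


def from_hex(text):
    """Hypothetical helper: parse a prefix-free hex string back into an int."""
    return int(text, 16)


print(to_hex(0x4925106e))   # 4925106e
print(to_hex(0x1f))         # 0000001f
print(from_hex('4925106e') == 0x4925106e)   # True
```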
