# Anonymize

See documentation for more details: https://inforcehub.readthedocs.io/en/latest/modules/anon.html

In [1]:
# Ignore this cell - this is just used to make this example notebook work
import sys
sys.path.append("../") 

## Load demo product data that we will anonymize

In [2]:
import pandas as pd

FILENAME = 'data/product_data.csv'
df = pd.read_csv(FILENAME)
df

Unnamed: 0,Contract number,Product,Status,First name,Last name,Age,DOB,Start date,Distributor code,Sum Assured,In-force premium
0,33463634,Mousetrap,Active,John,McGee,45,2-Nov-1954,1-Jan-2013,3456,100000,150.3
1,223422342,Mousetrap,Active,Jane,Morrison,67,4-Jan-1967,1-Jan-2013,355,200000,204.0
2,4464646,Mousetrap Pro,Active,Bruce,Lane,62,1-Jan-1943,1-Jan-2013,5757,500000,950.0
3,4234335,Mousetrap Pro,Paid-up,Betty,Brown,45,12-Dec-1986,1-Jan-2013,355,550000,0.0
4,33535363,Mousetrap Pro,Lapsed,Arnold,Brown,53,4-Apr-1972,1-Jan-2013,355,400000,0.0


## Import the module and initialize

In [3]:
from inforcehub import Anonymize

Normally, create a new `Anonymize` object with no parameters.
This is more secure because the salt (passphrase) is then generated at random.

In [4]:
anon = Anonymize()

# Check the salt (passphrase) initialized at random
anon.salt

b'$2b$12$X/8yROejinjGCwoXna2lGO'

Alternatively, set a passphrase manually with the `salt` parameter if you want to be able to reproduce exactly the same encryption in the future.

In [5]:
anon_reproduce = Anonymize(salt='A_LONG_PASSWORD_THING')
anon_reproduce.salt

b'A_LONG_PASSWORD_THING'
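
The sketch below illustrates why a fixed salt makes results reproducible. It is not inforcehub's implementation: `pseudonymize` is a hypothetical stand-in that uses SHA-256 from the standard library, and `other_salt` is an invented value for comparison.

```python
import hashlib


def pseudonymize(value, salt):
    """Hypothetical stand-in: hex digest of salt + value (not inforcehub's algorithm)."""
    return hashlib.sha256(salt + str(value).encode("utf-8")).hexdigest()


fixed_salt = b"A_LONG_PASSWORD_THING"
other_salt = b"A_DIFFERENT_PASSWORD"  # invented for this comparison

first_run = pseudonymize("McGee", fixed_salt)
second_run = pseudonymize("McGee", fixed_salt)
different = pseudonymize("McGee", other_salt)

# Same salt and same input give the same token on every run
print(first_run == second_run)
# Changing the salt changes the token for the same input
print(first_run == different)
```

With a randomly generated salt, the tokens cannot be recreated unless the salt itself is saved, which is the security trade-off described above.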

## Anonymize the dataset

First list the columns that need to be encrypted.

In [6]:
to_transform = ['Contract number', 'First name', 'Last name', 'DOB']

Then use the `transform()` method to anonymize the dataframe.

**Note** that `transform()` encrypts the dataframe you pass to it in place; it does not
take a copy, in order to save memory. If you want to keep the original, copy the dataframe
first with `original_df = df.copy(deep=True)`.
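
A minimal sketch of the copy-first pattern, using only pandas. The column assignment is a stand-in for any in-place change such as the one `transform()` makes; the two-row dataframe is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"First name": ["John", "Jane"], "Age": [45, 67]})

# Keep an untouched copy before anything modifies df in place
original_df = df.copy(deep=True)

# Stand-in for an in-place transformation of df
df["First name"] = ["xxxx", "yyyy"]

print(original_df["First name"].tolist())  # original values are preserved
print(df["First name"].tolist())           # df itself has been overwritten
```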

Using the `verbose=True` option prints progress messages while the encryption runs.

In [7]:
anon.transform(df, to_transform, verbose=True)
df

Will convert columns: Contract number, First name, Last name, DOB
Encrypting 5 rows per column ...

Finished encrypting column Contract number
Finished encrypting column First name
Finished encrypting column Last name
Finished encrypting column DOB


Unnamed: 0,Contract number,Product,Status,First name,Last name,Age,DOB,Start date,Distributor code,Sum Assured,In-force premium
0,cfae755a8fbf9435bbe43b82e2ffa364,Mousetrap,Active,7440295ff79e92612e0cf07410d9560b,0c7a86bd41a6f5e5c3babedd4137807b,45,9b50ae74419de74b2ada42b2eafdb4d2,1-Jan-2013,3456,100000,150.3
1,a2061cb8d0e764b7ef3a69a2fd5a0af6,Mousetrap,Active,dc2c7cf671ddc6a823ebecf08ee16bc1,b2edc6804c363ecbf97643800f59f166,67,9c6ac21152ab7caf6108d85c07a83f51,1-Jan-2013,355,200000,204.0
2,37d358515ff5aba70f20cc7aa9d84e47,Mousetrap Pro,Active,d40c93d90f1c9a99143c4e86d369fc2b,fc76b79c382be6e219da0e5f453b2e2d,62,3a62702855fb1084240d1b448c9bdac5,1-Jan-2013,5757,500000,950.0
3,d83dbee2c4abc12d3dbf190077247c91,Mousetrap Pro,Paid-up,dec02ebf1c8a6dc565e8f3ac31efc058,619a6e28fb844cac77529699592a137c,45,7ec73220a65cd76f34e646c68ff93a30,1-Jan-2013,355,550000,0.0
4,8050aa496533cae3c9d6dd41060f7eb3,Mousetrap Pro,Lapsed,f0fae80126b98148181ac694ed9c9c9a,619a6e28fb844cac77529699592a137c,53,1ac7cf470be3fde20794527a96000fbf,1-Jan-2013,355,400000,0.0
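
In the output above, the two rows with the last name Brown share the same encrypted value, so grouping and joining on an anonymized column should still behave as they did on the original. The sketch below rebuilds just two columns of that output by hand (it does not call inforcehub) and groups on the encrypted last name.

```python
import pandas as pd

# Values copied from the anonymized output above
anonymized = pd.DataFrame({
    "Last name": [
        "0c7a86bd41a6f5e5c3babedd4137807b",
        "b2edc6804c363ecbf97643800f59f166",
        "fc76b79c382be6e219da0e5f453b2e2d",
        "619a6e28fb844cac77529699592a137c",
        "619a6e28fb844cac77529699592a137c",
    ],
    "Sum Assured": [100000, 200000, 500000, 550000, 400000],
})

# Rows that shared a last name before anonymization still fall in one group
totals = anonymized.groupby("Last name")["Sum Assured"].sum()
print(totals)
```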


## Pseudo-anonymize the dataset

The same process is used, but this time collect the **lookup keys** returned by the `transform()` method.

In [8]:
# Reset our dataframe from our file
df = pd.read_csv(FILENAME)

In [9]:
# Collect the lookup keys
lookup_df = anon.transform(df, to_transform, verbose=True)

Will convert columns: Contract number, First name, Last name, DOB
Encrypting 5 rows per column ...

Finished encrypting column Contract number
Finished encrypting column First name
Finished encrypting column Last name
Finished encrypting column DOB


The lookup keys are a dataframe of column pairs: each original column holds the unencrypted values, and its partner with a trailing underscore holds the encrypted values.

In [10]:
lookup_df

Unnamed: 0,Contract number,Contract number_,First name,First name_,Last name,Last name_,DOB,DOB_
0,33463634,cfae755a8fbf9435bbe43b82e2ffa364,John,7440295ff79e92612e0cf07410d9560b,McGee,0c7a86bd41a6f5e5c3babedd4137807b,2-Nov-1954,9b50ae74419de74b2ada42b2eafdb4d2
1,223422342,a2061cb8d0e764b7ef3a69a2fd5a0af6,Jane,dc2c7cf671ddc6a823ebecf08ee16bc1,Morrison,b2edc6804c363ecbf97643800f59f166,4-Jan-1967,9c6ac21152ab7caf6108d85c07a83f51
2,4464646,37d358515ff5aba70f20cc7aa9d84e47,Bruce,d40c93d90f1c9a99143c4e86d369fc2b,Lane,fc76b79c382be6e219da0e5f453b2e2d,1-Jan-1943,3a62702855fb1084240d1b448c9bdac5
3,4234335,d83dbee2c4abc12d3dbf190077247c91,Betty,dec02ebf1c8a6dc565e8f3ac31efc058,Brown,619a6e28fb844cac77529699592a137c,12-Dec-1986,7ec73220a65cd76f34e646c68ff93a30
4,33535363,8050aa496533cae3c9d6dd41060f7eb3,Arnold,f0fae80126b98148181ac694ed9c9c9a,Brown,619a6e28fb844cac77529699592a137c,4-Apr-1972,1ac7cf470be3fde20794527a96000fbf
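
One way the lookup keys might be used to restore original values is sketched below with plain pandas. The `anonymized` extract and the `decode` and `restored` names are hypothetical; the column pair and its values are copied from the lookup table above.

```python
import pandas as pd

# One column pair from the lookup table: original values and their encrypted partners
lookup_df = pd.DataFrame({
    "First name": ["John", "Jane"],
    "First name_": [
        "7440295ff79e92612e0cf07410d9560b",
        "dc2c7cf671ddc6a823ebecf08ee16bc1",
    ],
})

# Hypothetical anonymized extract that needs to be re-identified
anonymized = pd.DataFrame({
    "First name": [
        "dc2c7cf671ddc6a823ebecf08ee16bc1",
        "7440295ff79e92612e0cf07410d9560b",
    ],
    "In-force premium": [204.0, 150.3],
})

# Build an encrypted -> original mapping and apply it to the anonymized column
decode = dict(zip(lookup_df["First name_"], lookup_df["First name"]))
restored = anonymized.assign(**{"First name": anonymized["First name"].map(decode)})
print(restored)
```

Because the lookup keys reverse the anonymization, they need the same protection as the original data.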
