# Demo

This demo shows a single iteration of MDM tuning (and the Setup Wizard):
1. The input MDM JSON config file (V1) is loaded.
2. A dataset is constructed from Smile Client Data and FEBRL.
3. An ML algorithm is applied to the input MDM JSON config file and the dataset.
4. The resulting optimized MDM JSON config file (V2) is saved.

After the client generates additional data using the optimized MDM JSON file, that data can be fed back into the optimization system (repeat steps 1-4).

Sections that use dummy functions for the sake of demonstration are marked WIP.

### Import the Classes

In [1]:
from mdmconfig import MDMConfig
from mdmdataset import FebrlDataset
from mdmmodel import MDMModel
from sklearn import metrics

### Load the JSON Config

This will load:
- The Block Filtering
- The MDM algorithms
- The Match Result Map

Each of these is used in later steps.
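
To make the three parts concrete, here is a minimal sketch of a config and a Match Result Map lookup. The key names (`blockFiltering`, `matchFields`, `matchResultMap`), the algorithm names, and the `classify` helper are assumptions made for illustration. They are not taken from the demo config file or from `MDMConfig`, whose actual layout may differ.

```python
import json

# Hypothetical, simplified config; the real file's keys and layout may differ.
CONFIG_TEXT = """
{
  "blockFiltering": ["postcode"],
  "matchFields": [
    {"name": "firstname-jaro", "algorithm": "JARO_WINKLER", "threshold": 0.8},
    {"name": "lastname-caverphone", "algorithm": "CAVERPHONE2"}
  ],
  "matchResultMap": {
    "firstname-jaro,lastname-caverphone": "MATCH",
    "lastname-caverphone": "POSSIBLE_MATCH"
  }
}
"""

config = json.loads(CONFIG_TEXT)


def classify(matched_fields, result_map):
    """Return the outcome whose key lists exactly the fields that matched."""
    wanted = set(matched_fields)
    for key, outcome in result_map.items():
        if set(key.split(",")) == wanted:
            return outcome
    return "NO_MATCH"


result_map = config["matchResultMap"]
outcome_both = classify(["lastname-caverphone", "firstname-jaro"], result_map)
outcome_last = classify(["lastname-caverphone"], result_map)
outcome_none = classify([], result_map)
```

In this sketch the map acts as a lookup from "which fields matched" to an outcome, which is why it can also be read as a small decision tree.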

In [2]:
mdmconfig = MDMConfig("./demo_rules/beta-v0.2+3/mdm_demo_config_voting_v1.json")

block_filtering = mdmconfig.getFilteringRules()
mdmalgos = mdmconfig.getMDMAlgos()

### Load the Data

The dataset takes the block_filtering information as input and applies it to the data.

The dataset can also generate synthetic data.

The dataset class can also load Smile Client Data for use during training.
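
As a rough illustration of why block filtering is applied before training, the sketch below groups records by a blocking key and only forms candidate pairs inside each group. It assumes that "block filtering" here means record-linkage blocking. The records, the `postcode` key, and the `candidate_pairs` helper are hypothetical and are not part of `FebrlDataset`.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records; field names loosely mirror the FEBRL-style features.
records = [
    {"id": 1, "firstname": "anna", "postcode": "2000"},
    {"id": 2, "firstname": "ana", "postcode": "2000"},
    {"id": 3, "firstname": "bob", "postcode": "3000"},
    {"id": 4, "firstname": "rob", "postcode": "3000"},
    {"id": 5, "firstname": "cara", "postcode": "4000"},
]


def candidate_pairs(rows, block_key):
    """Pair up only those records that share the same value for block_key."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[row[block_key]].append(row["id"])
    pairs = []
    for ids in blocks.values():
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


# Without blocking, every record is compared with every other record.
all_pairs = list(combinations([r["id"] for r in records], 2))
# With blocking, only records in the same postcode are compared.
blocked_pairs = candidate_pairs(records, "postcode")
```

With five records, the unblocked comparison produces 10 pairs, while blocking on the assumed `postcode` key leaves 2. The trade-off is that true matches with differing block keys are never compared.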

In [3]:
dataset = FebrlDataset(mdmconfig)
#dir = "./dataset_data/mdm_source_data_synthetic/"
dataset.load_febrl()
#dataset.load_smile(dir + 'source_v1.csv', dir + 'links_v1.csv')
features, X_train, X_test, X_val, y_train, y_test, y_val = dataset.split_()

100%|██████████| 926/926 [00:00<00:00, 1135318.77it/s]
100%|██████████| 926/926 [00:00<00:00, 2989.76it/s]
100%|██████████| 926/926 [00:00<00:00, 4109.18it/s]
100%|██████████| 926/926 [00:00<00:00, 24926.84it/s]
100%|██████████| 926/926 [00:00<00:00, 1243254.00it/s]
100%|██████████| 926/926 [00:00<00:00, 72705.46it/s]
100%|██████████| 926/926 [00:00<00:00, 94315.82it/s]
100%|██████████| 926/926 [00:00<00:00, 94683.70it/s]
100%|██████████| 926/926 [00:00<00:00, 53726.27it/s]


In [4]:
print(features)
print(X_train[0])
print(y_train[0])

Index(['birthday', 'address_1', 'address_2', 'suburb', 'postcode', 'state',
       'firstname-caverphone', 'lastname-caverphone', 'firstname-jaro'],
      dtype='object')
[False 0.47058823529411764 0.08333333333333337 1.0 False 1.0 False False
 0.44761904761904764]
False
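
The printed row mixes booleans and floats: some columns look like exact-match flags, others like similarity scores in [0, 1]. The sketch below shows how such a vector could be built for one record pair. It is an assumption about the shape of the features, not the dataset class's implementation, and `difflib.SequenceMatcher` is used purely as a stand-in for the configured MDM similarity algorithms (it is not Jaro or Caverphone).

```python
from difflib import SequenceMatcher


def exact(a, b):
    """Boolean feature: True only when both values are identical."""
    return a == b


def similarity(a, b):
    """Float feature in [0, 1]; a stand-in for the configured MDM algorithm."""
    return SequenceMatcher(None, a, b).ratio()


def pair_features(rec_a, rec_b):
    """Build one feature row for a pair of records (hypothetical field subset)."""
    return [
        exact(rec_a["birthday"], rec_b["birthday"]),
        similarity(rec_a["address_1"], rec_b["address_1"]),
        exact(rec_a["postcode"], rec_b["postcode"]),
    ]


# Hypothetical records for illustration only.
rec_a = {"birthday": "1980-01-02", "address_1": "12 high street", "postcode": "2000"}
rec_b = {"birthday": "1980-02-01", "address_1": "12 high st", "postcode": "2000"}

row_same = pair_features(rec_a, rec_a)
row_diff = pair_features(rec_a, rec_b)
```

Each row of `X_train` would then correspond to one candidate pair, and the matching entry of `y_train` says whether that pair is a true match.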


### Applying ML

Next, an ML model is defined and given the training data. Examples of ML models include:
- MDM Match Result Map as Decision Tree (as implemented in Smile)
- Logistic Regression
- Decision Tree
- Random Forest 
- etc. (WIP)

Once defined, the ML model can be passed to a trainer class. Available trainers include:
- Basic Trainer
- Ensemble Trainers
    - Sequential
    - Voting/Stacking

The trainer trains the model to produce the ML-suggested changes to apply to the JSON config file.
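
To illustrate the voting idea, here is a minimal sketch of hard majority voting over per-field match decisions. It assumes each field contributes one boolean vote. The `majority_vote` and `vote_rows` helpers and the sample rows are hypothetical, and `MDMModel` may combine fields differently.

```python
def majority_vote(votes):
    """Return True when strictly more than half of the votes are True."""
    return sum(votes) * 2 > len(votes)


def vote_rows(rows, columns):
    """Apply majority voting to the selected feature columns of each row."""
    return [majority_vote([bool(row[c]) for c in columns]) for row in rows]


# Hypothetical per-field decisions for three candidate pairs.
rows = [
    [True, True, False],
    [False, True, False],
    [True, True, True],
]

name_votes = vote_rows(rows, columns=[0, 1, 2])
```

With an even number of voters, a tie counts as no match in this sketch. A stacking ensemble would instead train a second model on the individual votes.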

In [5]:
model = MDMModel(mdmconfig, features)
model.train(X_train, y_train)

The following will add the additional fields to the config file:

### Using and Saving the Optimizations

The trained model can then be used to run inference on test data.

The trained model can also save itself as a new JSON config file for the next iteration.

In [6]:
matches, possible_matches = model.infer(X_test)

# One label per strategy, in the same order as the entries of `matches`.
labels = [
    "Purely MDM Algorithms",
    "Firstname/Lastname MDM Algo + Voting Address",
    "Voting Firstname/Lastname + MDM Algo Addresses",
    "Voting Firstname/Lastname + Voting Addresses",
]
for i, label in enumerate(labels):
    print(label)
    print("accuracy: ", metrics.accuracy_score(y_test, matches[i]))
    print("f1: ", metrics.f1_score(y_test, matches[i]))


model.save("./demo_rules/beta-v0.2+3/mdm_demo_config_voting_v2")

Purely MDM Algorithms
accuracy:  0.6881720430107527
f1:  0.5797101449275363
Firstname/Lastname MDM Algo + Voting Address
accuracy:  0.6881720430107527
f1:  0.5797101449275363
Voting Firstname/Lastname + MDM Algo Addresses
accuracy:  0.8602150537634409
f1:  0.8470588235294119
Voting Firstname/Lastname + Voting Addresses
accuracy:  0.8709677419354839
f1:  0.8604651162790697
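
For readers unfamiliar with the two scores, the following sketch computes accuracy and F1 by hand from confusion counts. The labels and predictions are made up for illustration and are unrelated to the results above.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for boolean labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1
        elif pred:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that agree with the true labels."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1(y_true, y_pred):
    """Harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up labels and predictions.
y_true = [True, True, False, False, True]
y_pred = [True, False, False, True, True]

acc = accuracy(y_true, y_pred)   # 3 of 5 predictions are correct
score = f1(y_true, y_pred)       # TP=2, FP=1, FN=1
```

F1 ignores true negatives, so it can sit well below accuracy when many of the correct predictions are non-matches.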


### The Next Iteration

This optimized MDM JSON config file can be deployed for customers to generate more data.

With additional data, the optimized MDM JSON config file can then be iterated upon in the same pipeline (repeat steps 1-4).
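
The repeated cycle can be sketched as a loop in which each iteration reads one config version and writes the next. Everything here is a placeholder: `run_iteration`, `dummy_tune`, and the `version` field are hypothetical, and the tuning step only bumps a counter where the real pipeline would train on newly collected data.

```python
import json
import tempfile
from pathlib import Path


def run_iteration(config_path, out_path, tune):
    """Load a config, apply a tuning function, and save the result."""
    config = json.loads(Path(config_path).read_text())
    new_config = tune(config)
    Path(out_path).write_text(json.dumps(new_config, indent=2))
    return new_config


def dummy_tune(config):
    """Placeholder optimizer: only bumps a hypothetical version field."""
    updated = dict(config)
    updated["version"] = config.get("version", 0) + 1
    return updated


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config_v1.json"
    path.write_text(json.dumps({"version": 1}))
    # Two iterations: V1 -> V2 -> V3, each reading the previous output.
    for step in range(2):
        next_path = Path(tmp) / f"config_v{step + 2}.json"
        result = run_iteration(path, next_path, dummy_tune)
        path = next_path

final_version = result["version"]
```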
