# Basic Example

First, let's load the libraries and example data:

In [1]:
import pandas as pd
import BlackBoxAuditing as BBA

import pylab
%matplotlib inline

from BlackBoxAuditing.model_factories import SVM, DecisionTree, NeuralNetwork
german_data = BBA.load_data("german")
ricci_data = BBA.load_data("ricci")

Next, we create an `Auditor` object, which reruns the model with individual features obscured in order to check each feature for potential influence. The `Auditor` needs to know how to build a model, so it has a `ModelFactory` attribute. Set it to a model factory; our library provides a few predefined choices (`SVM`, `DecisionTree`, and `NeuralNetwork` are imported above). The section "Auditing your own model" below shows how to write your own factory.
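To build some intuition for what "obscuring a feature" means, here is a deliberately simplified sketch. It is not the library's procedure: it replaces one feature with a constant and measures how much the accuracy of a fixed model drops. The helper names (`accuracy`, `obscure`), the toy rows, and the fill values are all made up for illustration.

```python
def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    hits = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return hits / len(rows)

def obscure(rows, feature, fill_value):
    """Return copies of the rows with one feature overwritten by a constant."""
    return [{**row, feature: fill_value} for row in rows]

# Hypothetical toy data: the label depends on "sat" only.
rows = [
    {"sat": 1200, "shoe_size": 9},
    {"sat": 1000, "shoe_size": 10},
    {"sat": 1150, "shoe_size": 8},
    {"sat": 900, "shoe_size": 11},
]
labels = ["True", "False", "True", "False"]

def model(row):
    # A fixed, hard-coded classifier standing in for a trained model.
    return "True" if row["sat"] > 1100 else "False"

baseline = accuracy(model, rows, labels)
drop = {}
for feature, fill_value in [("sat", 1000), ("shoe_size", 9)]:
    obscured_rows = obscure(rows, feature, fill_value)
    drop[feature] = baseline - accuracy(model, obscured_rows, labels)

print(drop)
```

A feature whose removal hurts accuracy (here `sat`) is one the model relies on; a feature whose removal changes nothing (here `shoe_size`) has no influence in this simplified sense.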

The audit takes a bit of time to run (a few seconds per attribute on our laptop):

In [None]:
auditor = BBA.Auditor()
auditor.ModelFactory = SVM
auditor(german_data, output_dir="german-audit-output")

Training initial model. (15:13:24)
Calculating original model statistics on test data:
	Training Set:
		Conf-Matrix: {'good': {'good': 453, 'bad': 15}, 'bad': {'bad': 123, 'good': 75}}
		accuracy: 0.8648648648648649
		BCR: 0.7945804195804196
	Testing Set:
		Conf-Matrix {'good': {'good': 211, 'bad': 21}, 'bad': {'bad': 37, 'good': 65}}
		accuracy: 0.7425149700598802
		BCR: 0.6361139283299526
Auditing: 'checking_status' (1/20). (15:13:25)
Auditing: 'duration' (2/20). (15:13:27)
Auditing: 'credit_history' (3/20). (15:13:31)
Auditing: 'purpose' (4/20). (15:13:33)
Auditing: 'credit_amount' (5/20). (15:13:36)
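
The two statistics in the log above can be recomputed by hand from the confusion matrices. The sketch below assumes that the outer keys are the true labels and the inner keys are the predicted labels, and that BCR is the mean of the per-class recall; under that reading the numbers in the log are reproduced. The function names are ours, not part of the library.

```python
def accuracy(conf_matrix):
    """Correct predictions divided by all predictions."""
    correct = sum(row.get(label, 0) for label, row in conf_matrix.items())
    total = sum(sum(row.values()) for row in conf_matrix.values())
    return correct / total

def balanced_classification_rate(conf_matrix):
    """Mean over classes of (correct predictions for the class / class size)."""
    rates = [row.get(label, 0) / sum(row.values())
             for label, row in conf_matrix.items()]
    return sum(rates) / len(rates)

# Confusion matrices copied from the log output above.
train = {'good': {'good': 453, 'bad': 15}, 'bad': {'bad': 123, 'good': 75}}
test = {'good': {'good': 211, 'bad': 21}, 'bad': {'bad': 37, 'good': 65}}

for name, matrix in [("train", train), ("test", test)]:
    print(name, accuracy(matrix), balanced_classification_rate(matrix))
```

Because BCR weights both classes equally, it is lower than accuracy here: the model does much better on the larger `good` class than on the smaller `bad` class.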


Our auditing technique always works relative to some existing model and some measure of accuracy. The ranking of features can differ depending on the measure used, and that is sometimes important. Often, however, the rankings tend to correlate fairly strongly:

In [None]:
bcr_data = pd.read_csv("german-audit-output/BCR.png.data")
acc_data = pd.read_csv("german-audit-output/accuracy.png.data")

def compute_influence(dataset):
    # First row minus last row, skipping the first column
    return dataset.iloc[0][1:] - dataset.iloc[-1][1:]

bcr_influence = compute_influence(bcr_data)
acc_influence = compute_influence(acc_data)

In [None]:
pylab.plot(acc_influence, bcr_influence, 'ko')
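If you want a number rather than a scatter plot, the same comparison can be sketched on toy data. The two small frames below are hypothetical stand-ins for the `.data` files (including the `Repair Level` column name and all values); we assume only what `compute_influence` assumes, namely that the first column is skipped and the remaining columns hold one score per feature.

```python
import pandas as pd

# Hypothetical stand-ins for accuracy.png.data and BCR.png.data.
acc_toy = pd.DataFrame({
    "Repair Level": [0.0, 0.5, 1.0],
    "a": [0.80, 0.70, 0.55],
    "b": [0.80, 0.79, 0.78],
    "c": [0.80, 0.75, 0.70],
})
bcr_toy = pd.DataFrame({
    "Repair Level": [0.0, 0.5, 1.0],
    "a": [0.70, 0.60, 0.50],
    "b": [0.70, 0.70, 0.69],
    "c": [0.70, 0.66, 0.62],
})

def compute_influence(dataset):
    # First row minus last row, skipping the first column.
    return (dataset.iloc[0, 1:] - dataset.iloc[-1, 1:]).astype(float)

acc_toy_influence = compute_influence(acc_toy)
bcr_toy_influence = compute_influence(bcr_toy)

# Features ordered from most to least influential under each measure.
acc_ranking = list(acc_toy_influence.sort_values(ascending=False).index)
bcr_ranking = list(bcr_toy_influence.sort_values(ascending=False).index)

# Pearson correlation between the two influence vectors.
correlation = acc_toy_influence.corr(bcr_toy_influence)
print(acc_ranking, bcr_ranking, correlation)
```

When the two rankings agree and the correlation is close to 1, the choice of measure matters little for that dataset; a low correlation is a signal to look at both rankings.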

## Loading your own data

To audit your own data, you will probably need to make a few conversions. Our code requires only minimal metadata: you need to tell it the type of each column and which column holds the value to be predicted.

Let's create some synthetic data:

In [None]:
import numpy as np
import random
import pandas as pd
from BlackBoxAuditing.data import load_from_file

iq = np.random.randn(200) * 20 + 100
gender = [random.choice(["man", "woman"]) for _ in range(200)]
# SAT is derived from IQ, with a 50-point bonus for men
sat = [i * 10 + (50 if g == "man" else 0) for (i, g) in zip(iq, gender)]
# Admission depends on the SAT score only
admit = ["True" if s > 1100 else "False" for s in sat]

df = pd.DataFrame(
    {"admit": admit,
     "gender": gender,
     "iq": iq,
     "sat": sat})
# The column order written here must match the order of correct_types below
df.to_csv("/tmp/test.csv",
          index=False,
          columns=['gender', 'admit', 'iq', 'sat'])
synthetic_data = load_from_file("/tmp/test.csv",
                                correct_types=[str, str, float, float],
                                response_header='admit')
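Since the list of types has to line up with the column order in the file, it can be worth checking that before loading. The helper below is a hypothetical sketch of ours (it is not part of `BlackBoxAuditing`): it reads CSV text with the standard library and reports every value that cannot be converted to the type declared for its column.

```python
import csv
import io

def check_column_types(csv_text, correct_types):
    """Return (header, type name, value) for each value that fails to convert."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader)
    if len(headers) != len(correct_types):
        raise ValueError("expected one type per column")
    problems = []
    for row in reader:
        for header, caster, value in zip(headers, correct_types, row):
            try:
                caster(value)
            except ValueError:
                problems.append((header, caster.__name__, value))
    return problems

# A two-row sample in the same column order as the file written above.
sample = (
    "gender,admit,iq,sat\n"
    "man,True,101.5,1065.0\n"
    "woman,False,98.2,982.0\n"
)

matching = check_column_types(sample, [str, str, float, float])
mismatched = check_column_types(sample, [float, str, float, float])
print(matching)
print(mismatched)
```

With the types in the right order nothing is reported; declaring `gender` as `float` flags both rows.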

And now we can audit this dataset with one of the existing classifiers:

In [None]:
auditor = BBA.Auditor()
auditor.ModelFactory = SVM
auditor(synthetic_data, output_dir="synthetic-audit-output")

## Auditing your own model

But what if you want to audit your own model? Here we show a very simple example of a (hard-coded) classifier.

In [None]:
from BlackBoxAuditing.model_factories.AbstractModelFactory import AbstractModelFactory
from BlackBoxAuditing.model_factories.AbstractModelVisitor import AbstractModelVisitor

class SATPredictor(AbstractModelVisitor):
    def __init__(self):
        pass
    def test(self, test_set, test_name=""):
        # Columns are gender, admit, iq, sat: v[1] is the actual label, v[3] the SAT score.
        # Return one (actual, predicted) pair per row.
        return [(v[1], "True" if v[3] > 1100 else "False")
                for v in test_set]

class SATPredictorBuilder(AbstractModelFactory):
    def __init__(self, *args, **kwargs):
        AbstractModelFactory.__init__(self, *args, **kwargs)
        self.verbose_factory_name = "SATPredictor"
    def build(self, train_set):
        # Nothing to train: the classifier is hard-coded
        return SATPredictor()

auditor = BBA.Auditor()
auditor.ModelFactory = SATPredictorBuilder
auditor(synthetic_data, output_dir="synthetic-audit-output")
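Before handing a custom model to the auditor, it can help to check its `test` output on a few hand-written rows. The sketch below uses a plain class with the same `test` method as above, without the library's base classes, so it runs on its own; the rows and the `PlainSATPredictor` name are made up for illustration, and the last row is deliberately mislabeled to show a disagreement.

```python
class PlainSATPredictor:
    """Stand-alone copy of the hard-coded rule, without the library base class."""
    def test(self, test_set, test_name=""):
        # Columns are gender, admit, iq, sat.
        return [(v[1], "True" if v[3] > 1100 else "False")
                for v in test_set]

# Hypothetical rows in the column order gender, admit, iq, sat.
toy_rows = [
    ["man", "True", 106.0, 1110.0],
    ["woman", "False", 108.0, 1080.0],
    ["man", "False", 105.0, 1100.0],   # exactly 1100 is not above the threshold
    ["woman", "True", 109.0, 1090.0],  # mislabeled on purpose
]

pairs = PlainSATPredictor().test(toy_rows)
agreement = sum(1 for actual, predicted in pairs if actual == predicted) / len(pairs)
print(pairs)
print(agreement)
```

Each pair is (actual label, predicted label), so the fraction of matching pairs is the accuracy of the hard-coded rule on these rows.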