# Binary Classification Example

In this notebook, we walk through how to use the `aisquared` library to train a neural network that mimics a scikit-learn model performing binary classification.

In [1]:
# import all required packages
from aisquared import utils
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import MinMaxScaler
import numpy as np

In [2]:
# Load the data and split into training and testing sets
data, labels = load_breast_cancer(return_X_y=True)

x_train, x_test, y_train, y_test = train_test_split(data, labels)

# Fit the scaler on the training set only, then apply it to both sets
scaler = MinMaxScaler().fit(x_train)
x_train = scaler.transform(x_train)
x_test = scaler.transform(x_test)
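
To make the scaling step concrete, the sketch below reproduces min-max scaling by hand for a single feature column, assuming the default target range of 0 to 1. The helper names `fit_min_max` and `apply_min_max` and the sample values are hypothetical and not part of scikit-learn. It also illustrates why the scaler is fit on the training data only: test values that fall outside the training range end up outside [0, 1].

```python
def fit_min_max(values):
    """Return the (min, max) of a training feature column."""
    return min(values), max(values)


def apply_min_max(values, lo, hi):
    """Scale values with (x - lo) / (hi - lo), using statistics learned from training data.

    Unlike MinMaxScaler, this sketch does not guard against a constant column (hi == lo).
    """
    span = hi - lo
    return [(v - lo) / span for v in values]


# Hypothetical values for one feature
train_col = [2.0, 4.0, 6.0, 10.0]
test_col = [1.0, 6.0, 12.0]

lo, hi = fit_min_max(train_col)
scaled_train = apply_min_max(train_col, lo, hi)
scaled_test = apply_min_max(test_col, lo, hi)

print(scaled_train)  # [0.0, 0.25, 0.5, 1.0]
print(scaled_test)   # [-0.125, 0.5, 1.25]
```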

In [3]:
# Train the scikit-learn classifier
clf = DecisionTreeClassifier(max_depth=4).fit(x_train, y_train)

In [4]:
# Create the neural network to train with
nnet = utils.get_model('fc', x_train.shape[-1], 1, output_activation='sigmoid', size='medium')

In [5]:
# Use the mimic_model function to train the neural network to replicate the original model's performance
nnet = utils.mimic_model(
    clf,
    nnet,
    x_train,
    x_test,
    y_test,
    'classification',
    'binary_crossentropy',
    'accuracy',
    'adam'
)

Epoch 1/100
2023-01-05 14:38:55.187579: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
 1/14 [=>............................] - ETA: 3s - loss: 0.6919 - accuracy: 0.3750
Performance measure set to accuracy
Model performance has not reached pruning threshold for 1 epoch(s)
Epoch 2/100
 1/14 [=>............................] - ETA: 0s - loss: 0.4938 - accuracy: 0.8750
Model performance has not reached pruning threshold for 2 epoch(s)
Epoch 3/100
 1/14 [=>............................] - ETA: 0s - loss: 0.1942 - accuracy: 0.9375
Model performance reached 0.92, sparsifying to 5
Epoch 4/100
 1/14 [=>............................] - ETA: 0s - loss: 0.1415 - accuracy: 0.9375
Model performance reached 0.93, sparsifying to 10
Epoch 5/100
 1/14 [=>............................] - ETA: 0s - loss: 0.2385 - accuracy: 0.9062
Model performance reached 0.96, sparsifying to 15
Epoch 6/100
 1/14 [=>............................] - ETA: 0s - loss: 0.0820 - accuracy: 1.0000
Model performance reached 0.97, sparsifying to 20
Epoch 7/100
 1/14 [=>............................] - ETA: 0s - loss: 0.0131 - a
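
Conceptually, mimicking a model can be thought of as training a second "student" model on the first model's predictions instead of the true labels. The sketch below illustrates that general idea with scikit-learn only. It is an assumption about the concept, not a description of how `utils.mimic_model` is implemented, and `LogisticRegression` is used purely as a stand-in for the neural network. The variable names and the fixed `random_state` are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Prepare the data the same way as in the cells above (fixed seed for repeatability)
data, labels = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(data, labels, random_state=0)
scaler = MinMaxScaler().fit(x_train)
x_train, x_test = scaler.transform(x_train), scaler.transform(x_test)

# Teacher: the model to be mimicked, trained on the true labels
teacher = DecisionTreeClassifier(max_depth=4, random_state=0).fit(x_train, y_train)

# Student: trained on the teacher's predictions, never on y_train
# (LogisticRegression is a stand-in for the neural network)
teacher_train_preds = teacher.predict(x_train)
student = LogisticRegression(max_iter=1000).fit(x_train, teacher_train_preds)

# Fidelity: how often the student agrees with the teacher on unseen data
teacher_test_preds = teacher.predict(x_test)
student_test_preds = student.predict(x_test)
fidelity = (student_test_preds == teacher_test_preds).mean()
print(f'Student agrees with teacher on {fidelity:.1%} of test samples')
```

Fidelity measures agreement with the teacher, which is a different question from accuracy against the true labels: a student can match the teacher closely while inheriting the teacher's mistakes.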

In [6]:
# Get the predictions on test data from the original classifier and the neural network
clf_preds = clf.predict(x_test)
nnet_preds = (nnet.predict(x_test) >= 0.5).astype(int)
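
The thresholding step can be illustrated on a few made-up probabilities. The values below are hypothetical, and the sketch assumes the network returns one sigmoid probability per sample in an array of shape `(n_samples, 1)`, as a model with a single sigmoid output unit typically does.

```python
import numpy as np

# Hypothetical sigmoid outputs for four samples, shape (4, 1)
probs = np.array([[0.08], [0.50], [0.93], [0.49]])

# Probabilities at or above 0.5 become class 1, everything else class 0
preds = (probs >= 0.5).astype(int)
print(preds.shape)     # (4, 1)

# Flattening gives a 1-D label vector, the same shape clf.predict returns
flat_preds = preds.ravel()
print(flat_preds)      # [0 1 1 0]
```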



In [7]:
# Present the confusion matrices for both models

print('Performance on test data for original model:')
print(confusion_matrix(y_test, clf_preds))

print('\n\n')

print('Performance on test data for neural network:')
print(confusion_matrix(y_test, nnet_preds))

Performance on test data for original model:
[[47  6]
 [ 2 88]]



Performance on test data for neural network:
[[46  7]
 [ 5 85]]
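
The confusion matrices above can be reduced to a few summary numbers. The sketch below does the arithmetic by hand, using scikit-learn's layout in which rows are true labels and columns are predicted labels, so for binary labels the entries are `[[TN, FP], [FN, TP]]`. The helper name `summarize` is hypothetical.

```python
def summarize(matrix):
    """Compute accuracy, precision, and recall for class 1 from a 2x2 confusion matrix."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    return {
        'accuracy': (tn + tp) / total,
        'precision': tp / (tp + fp),
        'recall': tp / (tp + fn),
    }


# Matrices copied from the output above
original = summarize([[47, 6], [2, 88]])
network = summarize([[46, 7], [5, 85]])

for name, scores in [('original model', original), ('neural network', network)]:
    formatted = ', '.join(f'{k}={v:.3f}' for k, v in scores.items())
    print(f'{name}: {formatted}')
# original model: accuracy=0.944, precision=0.936, recall=0.978
# neural network: accuracy=0.916, precision=0.924, recall=0.944
```

On this particular split, the neural network is about 2.8 percentage points less accurate than the decision tree it mimics. Because `train_test_split` is called without a fixed `random_state`, rerunning the notebook will produce somewhat different matrices.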
