# Confusion Matrix Demo

In this demo, you will see how to create a confusion matrix to evaluate the predictions of a classification model using scikit-learn's `confusion_matrix()` function. For more information, consult the online [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html).
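As a quick preview, here is a minimal sketch of what `confusion_matrix()` returns. The labels below are invented for illustration and are not taken from the demo's data set. With `labels=[True, False]`, the first row and column correspond to the `True` class, so the matrix reads as `[[TP, FN], [FP, TN]]`.

```python
from sklearn.metrics import confusion_matrix

# Toy labels, invented for illustration only
y_true = [True, True, True, False, False, False, False, False]
y_pred = [True, False, False, False, False, False, True, False]

# Rows are actual classes, columns are predicted classes,
# both ordered as given in `labels`
cm = confusion_matrix(y_true, y_pred, labels=[True, False])

# Flatten row by row: true positives, false negatives,
# false positives, true negatives
tp, fn, fp, tn = cm.ravel()
print(cm)
print(tp, fn, fp, tn)
```

Passing `labels` explicitly fixes the row and column order, which makes the matrix easier to read when the positive class should come first.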

### Import Packages

Before you get started, import a few packages. Run the code cell below. 

In [1]:
import pandas as pd
import numpy as np
import os 
import matplotlib.pyplot as plt
import seaborn as sns


Next, import the scikit-learn `DecisionTreeClassifier`, the `train_test_split()` function for splitting the data into training and test sets, the `accuracy_score()` function for evaluating the model, and the `confusion_matrix()` function for creating a confusion matrix.

In [2]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

## Step 1: Load a 'ready-to-fit' Data Set

We will work with the "cell2celltrain" data set. It has already been preprocessed: formatting issues, outliers, and missing values have been handled, all numerical columns have been scaled to the [0, 1] interval, and all categorical columns have been one-hot encoded. Run the cell below to load the data set into the DataFrame `df`.

In [3]:
filename = os.path.join(os.getcwd(), "data", "cell2celltrain.csv")
df = pd.read_csv(filename, header=0)

## Step 2: Create Training and Test Data Sets

#### Create Labeled Examples

In [5]:
# The label is the 'Churn' column; every other column is a feature
y = df['Churn']
X = df.drop(columns='Churn')
X.head()

| | CustomerID | ChildrenInHH | HandsetRefurbished | HandsetWebCapable | TruckOwner | RVOwner | HomeownershipKnown | BuysViaMailOrder | RespondsToMailOffers | OptOutMailings | ... | Occupation_Crafts | Occupation_Homemaker | Occupation_Other | Occupation_Professional | Occupation_Retired | Occupation_Self | Occupation_Student | Married_False | Married_True | Married_nan |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3000002 | False | False | True | False | False | True | True | True | False | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 1 | 3000010 | True | False | False | False | False | True | True | True | False | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 2 | 3000014 | True | False | False | False | False | False | False | False | False | ... | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 3 | 3000022 | False | False | True | False | False | True | True | True | False | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 4 | 3000026 | False | False | False | False | False | True | True | True | False | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |


#### Split Labeled Examples Into Training and Test Sets

In [6]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.10, random_state=1234)
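To make the effect of `test_size` and `random_state` concrete, here is a small sketch on synthetic stand-in data. The arrays `X_demo` and `y_demo` are hypothetical and only mimic the shape of a feature matrix and a Boolean label. The sketch shows that `test_size=0.10` holds out 10% of the rows and that repeating the call with the same `random_state` reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical; not the cell2cell data set)
X_demo = np.arange(2000).reshape(1000, 2)
y_demo = np.arange(1000) % 2 == 0

# Hold out 10% of the rows as a test set
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.10, random_state=1234
)

# Repeating the call with the same seed yields the same partition
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_demo, y_demo, test_size=0.10, random_state=1234
)
same_split = np.array_equal(X_te, X_te2)

print(X_tr.shape, X_te.shape)
print(same_split)
```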

## Step 3: Fit a Decision Tree Classifier and Make Predictions

In [18]:
from sklearn.model_selection import cross_val_score

accuracy_scores = []

# 1. Create a shallow decision tree classifier
model = DecisionTreeClassifier(max_depth=2, min_samples_leaf=1)

# 2. Perform 5-fold cross-validation on the training set
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# 3. Find the mean of the resulting accuracy scores
acc_mean = np.mean(cv_scores)

# 4. Append the mean score to the list accuracy_scores
accuracy_scores.append(acc_mean)
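For readers unfamiliar with `cross_val_score()`, the following sketch shows what it returns. It uses synthetic data from `make_classification()` rather than the demo's data set, and the names `X_demo`, `y_demo`, and `demo_model` are assumptions introduced only for this illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (hypothetical stand-in)
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)

demo_model = DecisionTreeClassifier(max_depth=2, random_state=0)

# With cv=5, the data is split into 5 folds; each fold is used once
# for validation while the other 4 are used for training
demo_scores = cross_val_score(demo_model, X_demo, y_demo, cv=5)

print(demo_scores)         # one accuracy score per fold
print(demo_scores.mean())  # mean accuracy across the folds
```

Note that `cross_val_score()` fits copies of the estimator, so the estimator passed in remains unfitted afterwards. This is why the model still has to be fitted with `fit()` before it can make predictions.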
    

## Step 4: Check the Accuracy of Your Model

Execute the code cell below to see the accuracy score of your model and the confusion matrix.

In [20]:
# Fit the model on the training set and predict class labels for the test set
model.fit(X_train, y_train)
class_label_predictions = model.predict(X_test)

# Compute and print the model's accuracy score
acc_score = accuracy_score(y_test, class_label_predictions)
print('Accuracy score: ' + str(acc_score))

# Display a confusion matrix (rows are actual classes, columns are predicted classes)
print('Confusion Matrix for the model: ')

pd.DataFrame(
    confusion_matrix(y_test, class_label_predictions, labels=[True, False]),
    columns=['Predicted: Customer Will Leave', 'Predicted: Customer Will Stay'],
    index=['Actual: Customer Will Leave', 'Actual: Customer Will Stay']
)


Accuracy score: 0.715181194906954
Confusion Matrix for the model: 


| | Predicted: Customer Will Leave | Predicted: Customer Will Stay |
|---|---|---|
| Actual: Customer Will Leave | 30 | 1433 |
| Actual: Customer Will Stay | 21 | 3621 |
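The four counts in the confusion matrix above are enough to recompute the accuracy score and to derive other metrics by hand. The sketch below copies the counts from the output, treats "Customer Will Leave" as the positive class, and also computes the accuracy of a hypothetical baseline that always predicts "Customer Will Stay". That baseline is an addition for comparison, not part of the original demo.

```python
# Counts copied from the confusion matrix output above
tp, fn = 30, 1433   # actual leavers: predicted leave / predicted stay
fp, tn = 21, 3621   # actual stayers: predicted leave / predicted stay

total = tp + fn + fp + tn

accuracy = (tp + tn) / total   # fraction of all predictions that are correct
recall = tp / (tp + fn)        # fraction of actual leavers that were caught
precision = tp / (tp + fp)     # fraction of predicted leavers that actually left

# Hypothetical baseline: always predict "stay"
majority_baseline = (fp + tn) / total

print(f"accuracy:  {accuracy:.4f}")
print(f"recall:    {recall:.4f}")
print(f"precision: {precision:.4f}")
print(f"baseline:  {majority_baseline:.4f}")
```

The accuracy matches the reported score of about 0.715, but the recall is only about 2%: the model identifies 30 of the 1,463 customers who actually left. Always predicting "stay" would reach roughly 71.3% accuracy on this test set, which suggests that accuracy alone says little here and that the confusion matrix gives a fuller picture.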
