Open a Jupyter Notebook to implement this exercise and import all the
required elements:

In [1]:
from sklearn.datasets import load_breast_cancer
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import tree
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score

The fourth line imports the tree module from scikit-learn, which is used to
train a decision tree model on the training data in this exercise. The lines
after it import the evaluation metrics that are calculated during this
exercise.

The breast cancer toy dataset contains the final diagnosis (malignant or benign)
of breast masses found in 569 women. In scikit-learn's encoding of the target,
0 stands for malignant and 1 for benign. Load the dataset and create the
features and target as Pandas DataFrames, as follows:

In [2]:
data = load_breast_cancer()

X = pd.DataFrame(data.data)
Y = pd.DataFrame(data.target)

Split the dataset using the conventional split approach, holding out 10% of the
instances as the test set:

In [3]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1, random_state=0)
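
As a quick sanity check on the split, the sketch below reproduces the set sizes by hand. It assumes that train_test_split rounds the fractional test size up to a whole number of samples, which agrees with the 57 predictions counted in the confusion matrix of this exercise. The variable names are illustrative only.

```python
import math

n_samples = 569   # instances in the breast cancer dataset
test_size = 0.1   # fraction held out for testing, as in the call above

# Assumption: the fractional test size is rounded up to whole samples.
n_test = math.ceil(n_samples * test_size)
n_train = n_samples - n_test

print("train:", n_train, "test:", n_test)
```

With so few test instances, each misclassification moves the accuracy by almost two percentage points, so the metrics below should be read as rough estimates.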

First, instantiate the DecisionTreeClassifier class from scikit-learn's
tree module. Next, train a decision tree on the train set. Finally, use the model
to predict the class label on the test set. Use the following code to do this:

In [4]:
model = tree.DecisionTreeClassifier(random_state=0)
model = model.fit(X_train, Y_train)
Y_pred = model.predict(X_test)

## Confusion Matrix

Use scikit-learn to construct a confusion matrix, in which rows correspond to
the true classes and columns to the predicted classes:

In [5]:
confusion_matrix(Y_test, Y_pred)

array([[21,  1],
       [ 6, 29]])
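
To make the four cells easier to read, the following sketch labels them by hand. It treats label 1 (benign in this dataset) as the positive class, matching the default of precision_score and recall_score. The names tn, fp, fn, and tp are illustrative and not part of the original exercise.

```python
# Confusion matrix copied from the output above.
cm = [[21, 1],
      [6, 29]]

# Row = true class, column = predicted class; class 1 is the positive class.
(tn, fp), (fn, tp) = cm

print("true negatives: ", tn)   # class 0 predicted as class 0
print("false positives:", fp)   # class 0 predicted as class 1
print("false negatives:", fn)   # class 1 predicted as class 0
print("true positives: ", tp)   # class 1 predicted as class 1
print("test instances: ", tn + fp + fn + tp)
```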

## Accuracy

Calculate the accuracy, precision, and recall of the model by comparing Y_test
and Y_pred, starting with the accuracy:

In [6]:
accuracy = accuracy_score(Y_test, Y_pred)
print("accuracy:", accuracy)

accuracy: 0.8771929824561403
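
The same value can be recovered from the confusion matrix as the share of correct predictions among all test instances. The sketch below is a hand calculation with the counts copied from the matrix above, not a replacement for accuracy_score; the variable names are illustrative.

```python
# Counts read from the confusion matrix (class 1 as the positive class).
tn, fp, fn, tp = 21, 1, 6, 29

# Accuracy = correct predictions / all predictions.
accuracy_by_hand = (tp + tn) / (tp + tn + fp + fn)

print("accuracy:", accuracy_by_hand)
```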


## Precision

In [7]:
precision = precision_score(Y_test, Y_pred)
print("precision:", precision)

precision: 0.9666666666666667
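
Precision answers the question: of the instances predicted as positive, how many really are positive? By default, precision_score treats label 1 as the positive class. The following sketch repeats the calculation by hand under that assumption, using the counts from the confusion matrix above; the variable names are illustrative.

```python
# Counts read from the confusion matrix (class 1 as the positive class).
fp, tp = 1, 29

# Precision = true positives / all instances predicted as positive.
precision_by_hand = tp / (tp + fp)

print("precision:", precision_by_hand)
```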


## Recall

In [8]:
recall = recall_score(Y_test, Y_pred)
print("recall:", recall)

recall: 0.8285714285714286
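
Recall answers the complementary question: of the instances that really are positive, how many did the model find? The sketch below is again a hand calculation from the confusion matrix counts, with label 1 taken as the positive class and illustrative variable names.

```python
# Counts read from the confusion matrix (class 1 as the positive class).
fn, tp = 6, 29

# Recall = true positives / all instances that are actually positive.
recall_by_hand = tp / (tp + fn)

print("recall:", recall_by_hand)
```

The six false negatives explain why the recall is noticeably lower than the precision, which is reduced by only a single false positive.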
