Classification and Regression Trees // ML Practice
----

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

Training a Basic Classification Tree on Tumor Data
----
Train a classification tree to predict whether a tumor is malignant or benign from two features: radius_mean and concave points_mean. The cells below load the data, encode the diagnosis as 0/1, and split it into a training set (X_train, y_train) and a test set (X_test, y_test) reserved for evaluation. The exercise calls for a fixed random seed (SEED = 1) to ensure reproducibility; note that this notebook instead splits with random_state=21 and creates the classifier without a seed, so results may differ slightly between runs. Fit the decision tree classifier to the training data and prepare it for evaluation on the test set.

Preparing the Dataset

In [2]:
cancer_df = pd.read_csv(r"C:\Users\Emigb\Documents\Data Science\datasets\wbc.csv")
cancer_df.head()

,id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
0,842302,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,
1,842517,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,
2,84300903,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,
3,84348301,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,
4,84358402,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,
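
The CSV path above is specific to the author's machine. As a hedged alternative, scikit-learn bundles a copy of the Wisconsin breast cancer data. The sketch below assumes that copy holds the same measurements as wbc.csv; the column names and the label encoding differ, so both are adapted here. The names `bundled` and `alt_data` are illustrative and do not appear elsewhere in this notebook.

```python
from sklearn.datasets import load_breast_cancer

# Load scikit-learn's bundled Wisconsin breast cancer data as a DataFrame.
# Assumption: this copy holds the same measurements as wbc.csv.
bundled = load_breast_cancer(as_frame=True).frame

# Keep the two features used in this notebook, renamed to the wbc.csv
# column names. scikit-learn encodes malignant as 0 and benign as 1, so the
# target is flipped to give M = 1 for malignant, as in the cells below.
alt_data = (
    bundled[['mean radius', 'mean concave points']]
    .rename(columns={'mean radius': 'radius_mean',
                     'mean concave points': 'concave points_mean'})
    .assign(M=1 - bundled['target'])
)

print(alt_data.shape)
print(alt_data['M'].value_counts())
```

If this route is used, `alt_data` would stand in for `cancer_dec` after the diagnosis column has been dropped.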


In [3]:
cancer_data = cancer_df[['diagnosis', 'radius_mean', 'concave points_mean']]
cancer_data.head()

,diagnosis,radius_mean,concave points_mean
0,M,17.99,0.1471
1,M,20.57,0.07017
2,M,19.69,0.1279
3,M,11.42,0.1052
4,M,20.29,0.1043


In [4]:
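# One-hot encode the diagnosis; drop_first=True drops the 'B' column and keeps a single 0/1 column 'M' (1 = malignant).
cancer_dum = pd.get_dummies(cancer_data['diagnosis'], drop_first=True).astype(int)
# Append the encoded column to the selected features.
cancer_dec = pd.concat([cancer_data, cancer_dum], axis=1)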
cancer_dec.head()

,diagnosis,radius_mean,concave points_mean,M
0,M,17.99,0.1471,1
1,M,20.57,0.07017,1
2,M,19.69,0.1279,1
3,M,11.42,0.1052,1
4,M,20.29,0.1043,1


In [5]:
cancer_dec.drop('diagnosis', axis=1, inplace=True)
cancer_dec.head()

,radius_mean,concave points_mean,M
0,17.99,0.1471,1
1,20.57,0.07017,1
2,19.69,0.1279,1
3,11.42,0.1052,1
4,20.29,0.1043,1


In [6]:
# Feature matrix: every column except the target. Label vector: 1 = malignant, 0 = benign.
X = cancer_dec.drop('M', axis=1).values
y = cancer_dec['M'].values

print(X.shape)
print(y.shape)

(569, 2)
(569,)


In [7]:
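from sklearn.model_selection import train_test_split
# Hold out 30% of the rows for testing; stratify=y keeps the class proportions the same in both subsets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=21, stratify=y)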
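
The split above passes `stratify=y`. A minimal sketch of what that argument does, using hypothetical labels rather than the tumor data (the `*_demo` and `*d_*` names are made up for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: 30 positives and 70 negatives (30% positive overall).
y_demo = np.array([1] * 30 + [0] * 70)
X_demo = np.arange(100).reshape(-1, 1)  # placeholder feature column

# Same call pattern as the notebook: 30% test split, stratified on the labels.
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=21, stratify=y_demo
)

# With stratification, both subsets keep roughly the overall positive rate.
print(f"overall: {y_demo.mean():.2f}")
print(f"train:   {yd_train.mean():.2f}")
print(f"test:    {yd_test.mean():.2f}")
```

Without `stratify`, a random split of an imbalanced dataset can leave the test set with a noticeably different class mix, which makes the accuracy harder to interpret.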

In [10]:
#1. Import DecisionTreeClassifier from sklearn.tree.
from sklearn.tree import DecisionTreeClassifier

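#2. Instantiate a DecisionTreeClassifier dt of maximum depth equal to 6.
# Note: no random_state is passed, so ties between equally good splits may be broken differently on each run; pass random_state=1 (the SEED from the task) to make the tree reproducible.
dt = DecisionTreeClassifier(max_depth=6)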

#3. Fit dt to the training set.
dt.fit(X_train, y_train)

#4. Predict the test set labels and assign the result to y_pred.
y_pred = dt.predict(X_test)
print(y_pred[0:5])

[1 1 1 0 1]
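
The cell above sets `max_depth=6` but no seed. The following sketch, on hypothetical two-feature data generated with `make_classification` (not the tumor data), illustrates two points: `max_depth` is only an upper bound on the depth of the fitted tree, and passing the same `random_state` makes the fit repeatable. The names `tree_a`, `tree_b`, and the placeholder feature names are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

SEED = 1  # seed value named in the exercise text

# Hypothetical two-feature data standing in for the tumor features.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0,
    random_state=SEED,
)

# Two trees built with the same seed and the same data.
tree_a = DecisionTreeClassifier(max_depth=6, random_state=SEED).fit(X_demo, y_demo)
tree_b = DecisionTreeClassifier(max_depth=6, random_state=SEED).fit(X_demo, y_demo)

# max_depth is an upper bound; the tree may stop growing earlier.
print("depth:", tree_a.get_depth())

# Same seed and same data give the same predictions.
same = (tree_a.predict(X_demo) == tree_b.predict(X_demo)).all()
print("identical predictions:", same)

# Text dump of the top levels of the learned split rules.
print(export_text(tree_a, feature_names=["feature_0", "feature_1"], max_depth=2))
```

Applied to this notebook, the same idea would mean constructing `dt` with `random_state=1`; the recorded outputs below were produced without it and might change slightly.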


Task: Evaluating the Classification Tree Model
----
Evaluate the trained decision tree `dt` on the test set. Use the `X_test` feature matrix to generate predictions and compare them with the true labels in `y_test`. Calculate the accuracy score, the proportion of test samples the model classifies correctly. Accuracy on a single held-out split gives a quick, if rough, estimate of how well the model generalizes to unseen data.


In [11]:
#1. Import the function accuracy_score from sklearn.metrics.
from sklearn.metrics import accuracy_score

#2. Predict the test set labels and assign the obtained array to y_pred.
y_pred = dt.predict(X_test)

#3. Evaluate the test set accuracy score of dt by calling accuracy_score() and assign the value to acc.
acc = accuracy_score(y_test, y_pred)
acc

0.9122807017543859
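
Since accuracy is just the share of matching labels, it can be checked by hand. The sketch below uses small hypothetical arrays (not the notebook's `y_test` and `y_pred`) to show that `accuracy_score` agrees with a direct comparison and with the diagonal of the confusion matrix.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels and predictions, for illustration only.
y_true_demo = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred_demo = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Accuracy is the fraction of positions where prediction and truth agree.
manual_acc = (y_true_demo == y_pred_demo).mean()
sk_acc = accuracy_score(y_true_demo, y_pred_demo)
print(manual_acc, sk_acc)  # 0.75 0.75

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true_demo, y_pred_demo)
print(cm)

# Correct predictions sit on the diagonal, so accuracy = trace / total.
print(np.trace(cm) / cm.sum())  # 0.75
```

For the run recorded above, the test set should contain 171 rows (30% of 569, rounded up), so an accuracy of 0.9123 corresponds to 156 correct predictions out of 171.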