# Decision Tree

In this notebook, we will use a **Decision Tree** model to predict customer churn from the pre-processed `customer_churn_processed.csv` dataset.

We will also evaluate the model on accuracy, precision, recall and F1 score, record its training and prediction times, and save the results to a file so they can be compared with other models in later stages of this project phase.

In [19]:
# import dependencies
import time
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score

# input file containing preprocessed data
input_csv = "../../data/customer_churn_processed.csv"
# output file to be saved containing model results
output_csv = "../model_results/decision_tree_results.csv"

## Data

Load and prepare data for training and testing the model.

In [20]:
# Load the data
data = pd.read_csv(input_csv)

# Split the data into X and y
X = data.drop('Churn', axis=1)
y = data['Churn']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
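
The split above is not seeded, so each run of the notebook can produce a different train/test partition. One possible way to make the split repeatable and keep the churn rate the same in both parts is sketched below. It runs on a synthetic stand-in dataset (the `X_demo` and `y_demo` names and the 80/20 class balance are assumptions for illustration, not properties of the churn data).

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn data (assumption: binary target named 'Churn')
features, target = make_classification(
    n_samples=1000, n_features=8, weights=[0.8, 0.2], random_state=0
)
X_demo = pd.DataFrame(features, columns=[f"feature_{i}" for i in range(8)])
y_demo = pd.Series(target, name="Churn")

# random_state fixes the shuffle; stratify keeps the class ratio equal in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

print(f"train churn rate: {y_tr.mean():.3f}, test churn rate: {y_te.mean():.3f}")
```

Applied to this notebook, the same two arguments (`random_state` and `stratify=y`) could be passed to the existing `train_test_split` call.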

We will define a **Decision Tree** model with default parameters, setting only `random_state=42` so that the fitted tree is reproducible across runs. Note that `DecisionTreeClassifier` does not accept an `n_jobs` parameter.

In [21]:
# Initialize the decision tree classifier
model = DecisionTreeClassifier(random_state=42)
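
To see what "default parameters" means in practice, the estimator's settings can be inspected with `get_params()`. The sketch below creates a separate classifier (named `tree` here only for illustration) and prints the settings that most directly control how large the tree can grow.

```python
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(random_state=42)
params = tree.get_params()

# Parameters that most directly control tree size, with their current values
for name in ("criterion", "max_depth", "min_samples_split", "min_samples_leaf"):
    print(f"{name}: {params[name]}")

# DecisionTreeClassifier exposes no n_jobs parameter
print("n_jobs" in params)
```

With `max_depth` left at `None`, the tree keeps splitting until its leaves are pure or too small to split, which is worth keeping in mind when reading the evaluation scores.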

## Model Training & Prediction

Let's now train the model on the training data and make predictions on the test data.

In [22]:
# Record the start time before training the model
start = time.time()

# Train the model
model.fit(X_train, y_train)

# Record the end time after the model has been trained
end = time.time()

# Record the training time
training_time = end - start

# Record the start time before running the model
start = time.time()

# Make predictions
y_pred = model.predict(X_test)

# Record the end time after the model has been run
end = time.time()

# Record the prediction time
prediction_time = end - start
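
The start/end pattern above is repeated for training and prediction. As an alternative sketch, a small helper can wrap any call and return its elapsed time. The helper name `timed` and the synthetic data are assumptions for illustration. It uses `time.perf_counter()`, which is a monotonic, high-resolution clock intended for measuring short durations.

```python
import time

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Synthetic stand-in data (assumption: not the churn dataset)
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=0)
tree = DecisionTreeClassifier(random_state=42)

_, fit_seconds = timed(tree.fit, X_demo, y_demo)
predictions, predict_seconds = timed(tree.predict, X_demo)

print(f"fit: {fit_seconds:.6f}s, predict: {predict_seconds:.6f}s")
```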

## Model Evaluation

Finally, we will evaluate the model on accuracy, precision, recall and F1 score.
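
As a reminder of what these metrics measure, the sketch below computes them by hand from a confusion matrix and checks them against scikit-learn. The labels are hand-made for illustration (1 = churned, 0 = stayed) and are not taken from the churn data.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hand-made labels for illustration only
y_true_demo = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred_demo = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels [0, 1], ravel() yields tn, fp, fn, tp in that order
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

accuracy_manual = (tp + tn) / (tp + tn + fp + fn)   # share of all predictions that are right
precision_manual = tp / (tp + fp)                   # share of predicted churners who churned
recall_manual = tp / (tp + fn)                      # share of actual churners who were caught
f1_manual = 2 * precision_manual * recall_manual / (precision_manual + recall_manual)

print(accuracy_manual, accuracy_score(y_true_demo, y_pred_demo))
print(precision_manual, precision_score(y_true_demo, y_pred_demo))
print(recall_manual, recall_score(y_true_demo, y_pred_demo))
print(f1_manual, f1_score(y_true_demo, y_pred_demo))
```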

In [23]:
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
fscore = f1_score(y_test, y_pred)

evaluation_results = pd.DataFrame({
    'Model': ['Decision Tree'],
    'Accuracy': [accuracy],
    'Precision': [precision],
    'Recall': [recall],
    'F1 score': [fscore],
    'Training Time': [training_time],
    'Prediction Time': [prediction_time]
})

# Display the evaluation metrics
evaluation_results

| | Model | Accuracy | Precision | Recall | F1 score | Training Time | Prediction Time |
|---|---|---|---|---|---|---|---|
| 0 | Decision Tree | 0.996667 | 0.988314 | 0.994958 | 0.991625 | 0.014074 | 0.001288 |


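As the output above shows, the **Decision Tree Classifier** scored above 0.98 on accuracy, precision, recall and F1 on the held-out test set (30% of the data), with training taking about 0.014 seconds and prediction about 0.001 seconds. These figures come from a single random train/test split, so they indicate strong performance on this dataset for this particular split.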
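
One way to check whether scores from a single split are stable is k-fold cross-validation. The sketch below is illustrative only: it uses a synthetic dataset as a stand-in for the churn data, and the choice of 5 folds is an assumption, not something fixed by this project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the churn dataset)
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=8, weights=[0.8, 0.2], random_state=0
)

# Stratified folds keep the class ratio similar in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_validate(
    DecisionTreeClassifier(random_state=42),
    X_demo,
    y_demo,
    cv=cv,
    scoring=["accuracy", "precision", "recall", "f1"],
)

# Report the mean and spread of each metric across the folds
for metric in ("accuracy", "precision", "recall", "f1"):
    values = scores[f"test_{metric}"]
    print(f"{metric}: mean={values.mean():.3f}, std={values.std():.3f}")
```

A small standard deviation across folds would suggest that the single-split scores are not an artifact of one favourable partition.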

## Save Results

We will save the results in a file for comparative analysis of results with other models in later stages of this project phase.

In [24]:
# save results for Decision Tree Classifier model
evaluation_results.to_csv(output_csv, index=False)
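
For the later comparison step, the per-model result files could be read back and stacked into one table. The sketch below assumes every model notebook writes a file ending in `_results.csv` with the same columns. The helper name `load_all_results`, the temporary directory, and the placeholder rows ("Model A", "Model B") are hypothetical and exist only to make the example runnable.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_all_results(results_dir):
    """Concatenate every *_results.csv file in results_dir into one DataFrame."""
    files = sorted(Path(results_dir).glob("*_results.csv"))
    frames = [pd.read_csv(path) for path in files]
    return pd.concat(frames, ignore_index=True)


# Demonstration with placeholder rows written to a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    for model_name, accuracy in [("Model A", 0.90), ("Model B", 0.95)]:
        file_name = model_name.lower().replace(" ", "_") + "_results.csv"
        pd.DataFrame({"Model": [model_name], "Accuracy": [accuracy]}).to_csv(
            Path(tmp_dir) / file_name, index=False
        )
    combined = load_all_results(tmp_dir)

# Rank models by accuracy, best first
print(combined.sort_values("Accuracy", ascending=False))
```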