# Gradient Boosting

In this notebook, we will use a **Gradient Boosting** model to predict customer churn from the preprocessed `customer_churn_processed.csv` dataset.

We will also evaluate the model on accuracy, precision, recall and F1 score, record its training time, and save the results to a file so they can be compared with other models later in this project phase.

In [13]:
# import dependencies
import time
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split

# input file containing preprocessed data
input_csv = "../../data/customer_churn_processed.csv"
# output file to be saved containing model results
output_csv = "../model_results/gradient_boosting_results.csv"

## Data

Load and prepare data for training and testing the model.

In [14]:
# Load the data
data = pd.read_csv(input_csv)

# Separate the features (X) from the target column (y)
X = data.drop('Churn', axis=1)
y = data['Churn']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
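
The split above is random and unseeded, so the reported metrics will vary slightly from run to run. A minimal sketch of a reproducible, class-balanced alternative is shown below. It runs on a small synthetic frame (the column names `feature_a` and `feature_b`, the class ratio and the seed `42` are assumptions for illustration, not properties of the churn dataset).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn data (hypothetical values, not the real dataset)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "feature_a": rng.normal(size=200),
    "feature_b": rng.normal(size=200),
    "Churn": np.repeat([0, 1], [160, 40]),  # 20% positive class
})
X_demo = demo.drop("Churn", axis=1)
y_demo = demo["Churn"]

# random_state fixes the shuffle; stratify keeps the churn ratio equal in both sets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

print("train churn rate:", y_tr.mean())
print("test churn rate:", y_te.mean())
```

Whether stratification matters here depends on how imbalanced the real `Churn` column is, which this notebook does not report.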

## Initialize the Model

Here, we will define a **gradient boosting** model that uses scikit-learn's default hyperparameters.

In [15]:
# Initialize the model
model = GradientBoostingClassifier()
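
To see what "default" means in practice, the estimator's `get_params()` method lists every hyperparameter and its current value. The sketch below prints a few of the most influential ones. The selection of four names is our own choice, and the values may differ between scikit-learn versions.

```python
from sklearn.ensemble import GradientBoostingClassifier

# A fresh estimator with no arguments, as in the cell above
default_model = GradientBoostingClassifier()
params = default_model.get_params()

# Print a hand-picked subset of the defaults
for name in ("n_estimators", "learning_rate", "max_depth", "subsample"):
    print(f"{name} = {params[name]}")
```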

## Model Training & Prediction

Let's now train the model on the training data and make predictions on the test data.

In [16]:
# Record the start time before training the model
# (perf_counter is a monotonic, high-resolution clock intended for timing)
start = time.perf_counter()

# Train the model
model.fit(X_train, y_train)

# Record the end time after the model has been trained
end = time.perf_counter()

# Training time in seconds
training_time = end - start

# Make predictions
y_pred = model.predict(X_test)

## Model Evaluation

Finally, we will evaluate the model on accuracy, precision, recall and F1 score.
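
For reference, accuracy, precision, recall and F1 are all simple ratios of confusion-matrix counts. The sketch below computes them by hand. The helper `binary_metrics` and the counts passed to it are hypothetical and are not taken from the churn data.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all predictions that are correct
    precision = tp / (tp + fp)                  # share of predicted churners who did churn
    recall = tp / (tp + fn)                     # share of actual churners that were caught
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return accuracy, precision, recall, f1


# Made-up counts for illustration only
acc, prec, rec, f1 = binary_metrics(tp=80, fp=20, fn=10, tn=890)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

The scikit-learn functions used in the next cell compute the same quantities directly from `y_test` and `y_pred`.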

In [17]:
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
fscore = f1_score(y_test, y_pred)

# Create DataFrame to store evaluation results and training time
evaluation_results = pd.DataFrame({
    'Model': ['Gradient Boosting'],
    'Accuracy': [accuracy],
    'Precision': [precision],
    'Recall': [recall],
    'F1 score': [fscore],
    'Training Time': [training_time]
})

# Display the evaluation results
evaluation_results

| | Model | Accuracy | Precision | Recall | F1 score | Training Time |
|---|---|---|---|---|---|---|
| 0 | Gradient Boosting | 0.997 | 0.989708 | 0.994828 | 0.992261 | 0.406395 |


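As the results above show, the **Gradient Boosting** model scored above 0.98 on every metric on the held-out test set (accuracy 0.997, precision 0.990, recall 0.995, F1 0.992) and trained in about 0.4 seconds. These figures come from a single random 70/30 split, so they are a good indication, not a guarantee, that the model performs well on our dataset.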
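
Because the scores above come from one train/test split, one way to check that they are stable is k-fold cross-validation. The following is a minimal sketch on synthetic data from `make_classification`. The sample size, the fold count of 5, the reduced `n_estimators=20` and the seeds are assumptions chosen to keep the example fast, and none of this was run on the churn dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic binary classification data standing in for the churn features
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

# Five stratified folds, shuffled with a fixed seed for repeatability
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One F1 score per fold; a small spread suggests the result is not split-dependent
scores = cross_val_score(
    GradientBoostingClassifier(n_estimators=20, random_state=0),
    X_demo,
    y_demo,
    cv=cv,
    scoring="f1",
)
print("F1 per fold:", scores)
print("mean:", scores.mean(), "std:", scores.std())
```

In the notebook itself, `X_demo` and `y_demo` would be replaced by `X` and `y`.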

## Save Results

We save the results to `output_csv` so they can be compared with the results of other models later in this project phase.

In [18]:
# save results to output_csv
evaluation_results.to_csv(output_csv, index=False)
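
For the later comparison step, the per-model result files could be combined into one table. The sketch below is one possible approach, not the project's actual comparison code. It assumes each model writes a file ending in `_results.csv` with the same columns, and it uses a temporary directory with placeholder scores (the second file name and all numbers are invented) so that it runs on its own.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as results_dir:
    # Placeholder result files; the scores are invented for the demo
    pd.DataFrame({"Model": ["Gradient Boosting"], "F1 score": [0.9]}).to_csv(
        os.path.join(results_dir, "gradient_boosting_results.csv"), index=False
    )
    pd.DataFrame({"Model": ["Other Model"], "F1 score": [0.8]}).to_csv(
        os.path.join(results_dir, "other_model_results.csv"), index=False
    )

    # Read every results file and stack the rows into one DataFrame
    paths = sorted(glob.glob(os.path.join(results_dir, "*_results.csv")))
    comparison = pd.concat([pd.read_csv(p) for p in paths], ignore_index=True)

# Rank the models by F1 score, best first
comparison = comparison.sort_values("F1 score", ascending=False).reset_index(drop=True)
print(comparison)
```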