<h1 style="color:#032652 ; font-family:'Times New Roman' ">Customer Churn Prediction and Analysis for Krisp </h1>


<p style=" color:#032652 ; font-family:'Times New Roman'; font-size: 18px ;text-align: justify "> <br> This notebook presents a comprehensive machine learning analysis on a pre-processed dataset obtained from the previous analysis notebook. The dataset has been thoroughly cleaned and prepared for further analysis. In this notebook, we will evaluate several machine learning models and conduct hyperparameter tuning for each model. All the models will be trained on the same dataset, enabling us to compare their performance and identify the best model for our specific use case. </p>

> <p style=" color:#032652 ; font-family:'Times New Roman'; font-size: 18px ;text-align: justify "> Loading libraries and data </p>

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from tabulate import tabulate

from sklearn.preprocessing import LabelEncoder, StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split, cross_val_score, KFold, RepeatedStratifiedKFold, GridSearchCV
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn.metrics import make_scorer, roc_auc_score
from sklearn.metrics import accuracy_score, classification_report, roc_auc_score, roc_curve, precision_score, recall_score, f1_score, confusion_matrix

from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from bayes_opt import BayesianOptimization
from scipy.stats import randint as sp_randint

import warnings
warnings.filterwarnings("ignore")

In [2]:
# Loading the data into a dataframe 
churn_data = pd.read_csv("cleaned_data.csv")
churn_data_lr = pd.read_csv("cleaned_data_lr.csv")
churn_data.head()

Unnamed: 0,domain category,channel grouping,desktop category,overall nc(mins),average monthly nc(mins),tenure,churn,plan interval,payment method,country tier
0,Generic Domain,(Other),Win,196757.7,7027.05,12.0,No,year,stripe,Tier 2
1,Generic Domain,Referral,Win,49593.68,3542.4,13.7,No,year,paypal,Tier 1
2,Generic Domain,Social,Win,226809.67,10309.52,24.0,No,year,stripe,Tier 1
3,Work Domain,(Other),Win,20165.22,1120.28,24.3,No,year,stripe,Tier 1
4,Generic Domain,(Other),Both,46416.12,1856.63,24.0,No,year,stripe,Tier 2


> <p style=" color:#032652 ; font-family:'Times New Roman'; font-size: 18px ;text-align: justify ">  Splitting the dataset into dependent and independent variables and generating training and test datasets </p>

<p style=" color:#032652 ; font-family:'Times New Roman'; font-size: 17px ;text-align: justify "> In order to perform an accurate analysis on the dataset, it is crucial to first separate the dataset into dependent and independent variables. In this specific case, the dependent variable 'Churn' is the target value while the remaining independent variables will be assigned to the X variable. After separating the variables, the dataset will be split into training and testing sets with a standard 80/20 ratio.<br>
<br>Once the dataset is divided into X and y values, we will split the data into training and testing sets using an 80/20 ratio. Before proceeding with modeling, it is important to ensure that the categorical variables are transformed into numerical variables. To prevent any potential data leakage, we will use LabelEncoder to transform categorical columns into numerical ones after splitting the data into training and testing sets.<br>
    <br>These necessary data preprocessing steps will be applied correspondingly to both normal and logistic regression datasets, ensuring that our models are trained on accurately prepared data.</p>

In [3]:
# changing churn categorical into numerical
churn_data['churn'] = churn_data['churn'].map({'No': 0, 'Yes': 1})
churn_data_lr['churn'] = churn_data_lr['churn'].map({'No': 0, 'Yes': 1})

In [4]:
y = churn_data['churn']
X = churn_data.drop('churn', axis= 1)

In [5]:
y_lr = churn_data_lr['churn']
X_lr = churn_data_lr.drop('churn', axis= 1)


In [6]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size= 0.2, stratify= y, random_state= 0)

print("The shape of the X_train dataset: ", X_train.shape)
print("The shape of the y_train dataset: ", y_train.shape)
print("The shape of the X_test dataset: ", X_test.shape)
print("The shape of the y_test dataset: ", y_test.shape)

The shape of the X_train dataset:  (58715, 9)
The shape of the y_train dataset:  (58715,)
The shape of the X_test dataset:  (14679, 9)
The shape of the y_test dataset:  (14679,)


In [7]:
X_train_lr, X_test_lr, y_train_lr, y_test_lr = train_test_split(X_lr, y_lr, test_size= 0.2, stratify= y_lr, random_state= 0)

In [8]:
#identifying categorical columns 
categorical_cols = X_train.select_dtypes(exclude=np.number).columns

#initializing an empty dictionary to store LabelEncoder instances for each categorical column.
encoders = {}

for i in categorical_cols:
    encoders[i] = LabelEncoder()
    X_train[i] = encoders[i].fit_transform(X_train[i])
    
for i in categorical_cols:
    X_test[i] = encoders[i].transform(X_test[i])

In [9]:
categorical_cols_lr = X_train_lr.select_dtypes(exclude=np.number).columns
encoders = {}
for i in categorical_cols_lr:
    encoders[i] = LabelEncoder()
    X_train_lr[i] = encoders[i].fit_transform(X_train_lr[i])
    
for i in categorical_cols_lr:
    X_test_lr[i] = encoders[i].transform(X_test_lr[i])

> <p style=" color:#032652 ; font-family:'Times New Roman'; font-size: 18px ;text-align: justify "> Feature Scaling </p>

<p style=" color:#032652 ; font-family:'Times New Roman'; font-size: 17px ;text-align: justify ">It’s quite important to normalize the variables before conducting any machine learning (classification) algorithms so that all the training and test variables are scaled within a range of 0 to 1.</p>

In [10]:
sc = StandardScaler()

In [11]:
X_train = sc.fit_transform(X_train)
X_test = sc.fit_transform(X_test)

In [12]:
X_train_lr = sc.fit_transform(X_train_lr)
X_test_lr = sc.fit_transform(X_test)

> <p style=" color:#032652 ; font-family:'Times New Roman'; font-size: 18px ;text-align: justify ">Logistic Regression </p>

<p style=" color:#032652 ; font-family:'Times New Roman'; font-size: 17px ;text-align: justify "> Logistic regression  works by modeling the relationship between the input features and the probability of belonging to a particular class. The output is a probability estimate between 0 and 1, which can be thresholded to make binary predictions.<br>
    <br> Grid search hyper parameter tuning is used to find the optimal combination of hyperparameters for a model, which can improve its performance. Logistic regression has several hyperparameters, such as the regularization parameter and the solver algorithm, that can affect its accuracy and generalization ability. By using grid search, we can exhaustively search through a range of hyperparameters and select the best combination that maximizes a performance metrics. This helps to avoid the risk of overfitting to the training set or underfitting to the test set, and can lead to a more robust and accurate model.<p>

In [13]:
model = LogisticRegression()

In [14]:
# Define the hyperparameters to tune
hyperparameters = {'penalty': ['l1', 'l2', 'elasticnet', 'none'],
                   'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000],
                   'solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'],
                   'max_iter': [100, 500, 1000, 5000]}


In [16]:
# Create the GridSearchCV object
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
grid_search = GridSearchCV(model, hyperparameters, cv=cv, verbose=0, scoring='f1')


In [17]:
# Fit the GridSearchCV object to the data
grid_search.fit(X_train_lr, y_train_lr)


In [18]:
# Print the best hyperparameters
print("Best hyperparameters: ", grid_search.best_params_)

Best hyperparameters:  {'C': 1000, 'max_iter': 1000, 'penalty': 'l1', 'solver': 'saga'}


In [19]:
# create a new instance of the model with the best hyperparameters and fit it to the entire training dataset
best_model = LogisticRegression(**grid_search.best_params_)
best_model.fit(X_train, y_train)


In [20]:
# make predictions on the test dataset and evaluate the model's performance
y_pred = best_model.predict(X_test)
print('Accuracy:', accuracy_score(y_test, y_pred))
print('Classification report:', classification_report(y_test, y_pred))
print('ROC AUC score:', roc_auc_score(y_test, y_pred))

Accuracy: 0.7717146944614756
Classification report:               precision    recall  f1-score   support

           0       0.80      0.88      0.84      9767
           1       0.70      0.57      0.62      4912

    accuracy                           0.77     14679
   macro avg       0.75      0.72      0.73     14679
weighted avg       0.77      0.77      0.77     14679

ROC AUC score: 0.7206270412500512


<p style=" color:#032652 ; font-family:'Times New Roman'; font-size: 17px ;text-align: justify ">Precision: The precision of the model for the non-churned class is 0.80, indicating that out of all the positive predictions made by the model for non-churned customers, 80% of them are actually true positive. For the churned class, the precision is 0.70, meaning that when the model predicts a positive outcome for churned customers, it is correct 70% of the time.<br>
    <br>Recall: The recall of the model for the non-churned class is 0.88, which implies that the model correctly identifies 88% of all actual positive instances in the dataset for non-churned customers. However, for churned customers, the recall is only 0.57, indicating that the model is unable to identify 43% of churned customers.<br>
    <br> F1-score: The F1-score for the positive class (non-churned) is 0.84, which is a harmonic mean of precision and recall. For the negative class (churned), the F1-score is 0.62. The F1 score measures the model's balance between precision and recall. The score of 0.84 for non-churned customers is good, but the score of 0.62 for churned customers is not optimal. <br> ROC curves with an AUC ≤0.75 are not considered clinically useful.</p>

> <p style=" color:#032652 ; font-family:'Times New Roman'; font-size: 18px ;text-align: justify ">Random Forest</p>

 <p style=" color:#032652 ; font-family:'Times New Roman'; font-size: 17px ;text-align: justify ">Random Forest is an ensemble learning method that combines multiple decision trees to improve the overall performance of the model.Compared to Grid Search, Randomized Search can search a larger space of hyperparameters in a shorter time. This is particularly useful for Random Forest Classifier, which has many hyperparameters that need to be tuned.</p>

In [15]:
# Create a Random Forest Classifier object
rf = RandomForestClassifier()

In [16]:
# Define the hyperparameter space to search
param_dist = {'n_estimators': sp_randint(50, 500), # Number of trees in random forest
              'max_features': sp_randint(1, 10), # Number of features to consider at every split
              'max_depth': sp_randint(1, 20), # Maximum number of levels in tree
              'min_samples_split': sp_randint(2, 20), # Minimum number of samples required to split a node
              'min_samples_leaf': sp_randint(1, 20), # Minimum number of samples required at each leaf node
             'bootstrap' : [True, False]} # Method of selecting samples for training each tree

In [17]:
# Perform the random search
random_search = RandomizedSearchCV(rf, param_distributions=param_dist, n_iter= 30, n_jobs=-1, cv=5, random_state=42)
random_search.fit(X_train, y_train)

In [18]:
# Print the best hyperparameters found
print('Best hyperparameters:', random_search.best_params_)

Best hyperparameters: {'bootstrap': True, 'max_depth': 15, 'max_features': 8, 'min_samples_leaf': 7, 'min_samples_split': 12, 'n_estimators': 137}


In [19]:
# Fit the model with the best hyperparameters and make predictions on the test set
rf_best = RandomForestClassifier(**random_search.best_params_)
rf_best.fit(X_train, y_train)
y_pred = rf_best.predict(X_test)

In [21]:

# Evaluate the model's performance on the test set
print('Precision:', precision_score(y_test, y_pred))
print('Recall:', recall_score(y_test, y_pred))
print('F1 score:', f1_score(y_test, y_pred))
print('ROC-AUC score:', roc_auc_score(y_test, y_pred))
print(classification_report(y_test, y_pred))


Precision: 0.7874306839186691
Recall: 0.6938110749185668
F1 score: 0.7376623376623376
ROC-AUC score: 0.7998081687687949
              precision    recall  f1-score   support

           0       0.85      0.91      0.88      9767
           1       0.79      0.69      0.74      4912

    accuracy                           0.83     14679
   macro avg       0.82      0.80      0.81     14679
weighted avg       0.83      0.83      0.83     14679



> <p style=" color:#032652 ; font-family:'Times New Roman'; font-size: 18px ;text-align: justify ">Support Vector Classifier</p>

 <p style=" color:#032652 ; font-family:'Times New Roman'; font-size: 17px ;text-align: justify ">Support Vector Classifier (SVC)  tries to find the best possible boundary that can separate the churn and non-churned customers in the dataset by maximizing the margin between the two classes.. By using grid search hyperparameter tuning for SVC, we can fine-tune the model and select the optimal values of hyperparameters such as kernel type, regularization parameter, and gamma to achieve the best possible performance for churn prediction.</p>


In [13]:
# Defining parameter grid for GridSearchCV
param_grid = {'C':  [10, 1.0, 0.1, 0.01], 
              'gamma': [1,0.1,0.01, 0.001],
              'kernel': ['linear','rbf', 'poly', 'sigmoid']}

In [14]:
svc = SVC()

In [None]:
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=1)
# Initializing GridSearchCV object
grid_search = GridSearchCV(svc, param_grid, cv=cv, scoring='f1')
grid_search.fit(X_train, y_train)

In [None]:
# Get the best hyperparameters
best_params = grid_search.best_params_

In [None]:
# Create a new instance of the SVC model with the best hyperparameters
svc_model_best = SVC(**best_params)

In [None]:
# Fit the model on the training data using the best hyperparameters
svc_model_best.fit(X_train, y_train)

In [None]:
# Make predictions on the test data using the fitted model
y_pred = svc_model_best.predict(X_test)

In [None]:
# Evaluate the performance of the model on the test set
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
confusion_matrix = confusion_matrix(y_test, y_pred)
classification_report = classification_report(y_test, y_pred)


print('Precision:', precision)
print('Recall:', recall)
print('F1 Score:', f1)
print('confusion_matrix:', confusion_matrix)
print('classification_report:', classification_report)

> <p style=" color:#032652 ; font-family:'Times New Roman'; font-size: 18px ;text-align: justify ">Decsion Trees</p>

 <p style=" color:#032652 ; font-family:'Times New Roman'; font-size: 17px ;text-align: justify ">Grid search hyper parameter tuning is used with decision tree classifiers to find the optimal values for hyperparameters such as the maximum depth of the tree, the minimum number of samples required to split an internal node, and the minimum number of samples required to be at a leaf node</p>

In [20]:
# Define the decision tree classifier
dt = DecisionTreeClassifier(random_state=42)

In [21]:
from sklearn.model_selection import GridSearchCV

In [22]:
# Create the parameter grid based on the results of random search 
params = {
    'max_depth': [2, 3, 5, 10, 20],
    'min_samples_leaf': [5, 10, 20, 50, 100],
    'criterion': ["gini", "entropy"]
}

In [23]:
grid_search = GridSearchCV(estimator=dt, param_grid=params, cv=5, n_jobs=-1, verbose=1, scoring = "f1")

In [24]:
grid_search.fit(X_train, y_train)

Fitting 5 folds for each of 50 candidates, totalling 250 fits


In [25]:
# Print the best hyperparameters and their corresponding score
print("Best Hyperparameters:", grid_search.best_params_)
print("Best Score:", grid_search.best_score_)

Best Hyperparameters: {'criterion': 'entropy', 'max_depth': 10, 'min_samples_leaf': 20}
Best Score: 0.7308942178549648


In [26]:
# Fit the classifier using the best hyperparameters and predict on test data
dt_best = DecisionTreeClassifier(random_state=42, **grid_search.best_params_)
dt_best.fit(X_train, y_train)
y_pred = dt_best.predict(X_test)

In [27]:
# Print the classification report
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.84      0.90      0.87      9767
           1       0.77      0.65      0.71      4912

    accuracy                           0.82     14679
   macro avg       0.80      0.78      0.79     14679
weighted avg       0.82      0.82      0.81     14679



> <p style=" color:#032652 ; font-family:'Times New Roman'; font-size: 18px ;text-align: justify ">Gradient Boosting</p>

<p style=" color:#032652 ; font-family:'Times New Roman'; font-size: 17px ;text-align: justify ">The Gradient Boosting Classifier is an algorithm that builds a series of decision trees iteratively, correcting the mistakes of the previous ones. It is highly accurate and can handle complex non-linear relationships in data, making it a good choice for churn prediction. It is also robust against overfitting, and can handle missing values and outliers in data. To optimize the performance of the model, we will use Grid Search method for hyper parameter tuning.</p>

In [13]:
# Defining the model
gbm = GradientBoostingClassifier()

In [14]:
# Defining the hyperparameters grid for GridSearchCV
param_grid = {
    'n_estimators': [100, 200, 300],
    'learning_rate': [0.01, 0.05, 0.1],
    'max_depth': [3, 5, 7],
    'min_samples_split': [2, 4, 6],
    'min_samples_leaf': [1, 2, 4]
}


In [15]:
# Creating the GridSearchCV object
gbm_grid = GridSearchCV(gbm, param_grid=param_grid, scoring = "f1", cv=5, n_jobs=-1)


In [None]:
# Fitting the model with the best hyperparameters
gbm_grid.fit(X_train, y_train)

In [None]:
# Getting the best parameters and the best score
print("Best parameters: ", gbm_grid.best_params_)
print("Best score: ", gbm_grid.best_score_)

In [None]:

# Predicting on the test set with the best model
y_pred = gbm_grid.predict(X_test)


In [None]:
# Calculating the accuracy, precision, recall, and F1 score
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Printing the classification report and the confusion matrix
print("Classification Report:\n", classification_report(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Precision: ", precision)
print("Recall: ", recall)
print("F1 Score: ", f1)
