### Telecom Churn Prediction

Customer churn, also known as customer attrition, customer turnover, or customer defection, is the loss of clients or customers.

Problem Statement:

Using boosting methods (AdaBoost and XGBoost), classify whether or not a customer will churn.

Why solve this project?

After completing this project, you will have a better understanding of how to build a boosting model. In this project, you will apply the following concepts:

- Handling missing values in data
- Applying AdaBoost
- Applying XGBoost
- Interpreting evaluation metrics

In [5]:
import warnings
warnings.filterwarnings('ignore')

##### Load data

In [6]:
import pandas as pd
from sklearn.model_selection import train_test_split

# path - location of the churn CSV file (not defined in this notebook; set it before running)

# Code starts here

# Read the file
df = pd.read_csv(path)
print(df.head())

# Extract features (customerID is an identifier, not a predictor)
X = df.drop(columns=['Churn', 'customerID'])

# Extract the target class
y = df['Churn']

# Split data into train (70%) and test (30%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

   customerID  gender  SeniorCitizen Partner Dependents  tenure PhoneService  \
0  7590-VHVEG  Female              0     Yes         No       1           No   
1  5575-GNVDE    Male              0      No         No      34          Yes   
2  3668-QPYBK    Male              0      No         No       2          Yes   
3  7795-CFOCW    Male              0      No         No      45           No   
4  9237-HQITU  Female              0      No         No       2          Yes   

      MultipleLines InternetService OnlineSecurity  ... DeviceProtection  \
0  No phone service             DSL             No  ...               No   
1                No             DSL            Yes  ...              Yes   
2                No             DSL            Yes  ...               No   
3  No phone service             DSL            Yes  ...              Yes   
4                No     Fiber optic             No  ...               No   

  TechSupport StreamingTV StreamingMovies        Contract Pape
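The split above is purely random, while churners are the minority class (the test-set support reported further down is 553 churners against 1560 non-churners). One possible refinement, not used in this notebook, is `train_test_split(..., stratify=y)`, which keeps the class proportions equal in both splits. The sketch below demonstrates this on synthetic labels with a roughly 26% "Yes" rate; the data is made up and is not the churn file.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: about 26% "Yes" labels, one random feature
rng = np.random.default_rng(0)
y = pd.Series(np.where(rng.random(7000) < 0.26, "Yes", "No"), name="Churn")
X = pd.DataFrame({"feature": rng.normal(size=len(y))})

# stratify=y asks train_test_split to preserve the class ratio in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

train_rate = (y_train == "Yes").mean()
test_rate = (y_test == "Yes").mean()
print(f"churn rate  train={train_rate:.3f}  test={test_rate:.3f}")
```

Stratifying would change the exact numbers; all results reported in this notebook come from the unstratified split.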

##### Clean Data

In [7]:
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Code starts here

# Replace blank strings (' ') with NaN in the train and test sets,
# then convert the column to float
X_train['TotalCharges'] = X_train['TotalCharges'].replace(' ', np.nan).astype(float)
X_test['TotalCharges'] = X_test['TotalCharges'].replace(' ', np.nan).astype(float)

# Fill missing values in both sets with the training mean (no test statistics leak in)
train_mean = X_train['TotalCharges'].mean()
X_train['TotalCharges'] = X_train['TotalCharges'].fillna(train_mean)
X_test['TotalCharges'] = X_test['TotalCharges'].fillna(train_mean)

# Check that no missing values remain
print(X_train.isnull().sum())

cat_cols = X_train.select_dtypes(include='O').columns.tolist()

# Label-encode categorical columns: fit on train, reuse the same mapping on test
for x in cat_cols:
    le = LabelEncoder()
    X_train[x] = le.fit_transform(X_train[x])
    X_test[x] = le.transform(X_test[x])

# Encode the target: 'No' -> 0, 'Yes' -> 1
y_train = y_train.map({'No': 0, 'Yes': 1})
y_test = y_test.map({'No': 0, 'Yes': 1})


gender              0
SeniorCitizen       0
Partner             0
Dependents          0
tenure              0
PhoneService        0
MultipleLines       0
InternetService     0
OnlineSecurity      0
OnlineBackup        0
DeviceProtection    0
TechSupport         0
StreamingTV         0
StreamingMovies     0
Contract            0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges        0
dtype: int64
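The cleaning steps above can be checked on a toy example. The frames `train` and `test` below are hypothetical stand-ins for `X_train` and `X_test` with made-up values; the steps mirror the cell: blank `TotalCharges` entries become NaN, both splits are filled with the training mean, and one `LabelEncoder` fitted on the training split is reused on the test split.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-ins for X_train / X_test (values are invented)
train = pd.DataFrame({"TotalCharges": ["29.85", " ", "108.15"],
                      "Contract": ["Month-to-month", "One year", "Two year"]})
test = pd.DataFrame({"TotalCharges": [" ", "42.30"],
                     "Contract": ["Two year", "Month-to-month"]})

# 1) Blank strings become NaN, then the column can be cast to float
for frame in (train, test):
    frame["TotalCharges"] = frame["TotalCharges"].replace(" ", np.nan).astype(float)

# 2) Fill both splits with the training mean
train_mean = train["TotalCharges"].mean()
train["TotalCharges"] = train["TotalCharges"].fillna(train_mean)
test["TotalCharges"] = test["TotalCharges"].fillna(train_mean)

# 3) Fit the encoder on train only; classes are sorted, so the codes are
#    Month-to-month -> 0, One year -> 1, Two year -> 2 in both splits
le = LabelEncoder()
train["Contract"] = le.fit_transform(train["Contract"])
test["Contract"] = le.transform(test["Contract"])

print(train)
print(test)
```

Reusing the fitted encoder guarantees identical codes in both splits; if the test split ever contains a category never seen in training, `transform` raises a `ValueError` instead of silently assigning a different code.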


##### AdaBoost Implementation
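`AdaBoostClassifier` trains a sequence of weak learners, re-weighting the training samples after each round so the next learner concentrates on the previous mistakes, and combines them by a weighted vote. The loop can be sketched from scratch with depth-1 trees (stumps), assuming two classes encoded as -1/+1. `adaboost_fit` and `adaboost_predict` are hypothetical helpers, and the data comes from `make_classification` rather than the churn set. scikit-learn's own implementation also defaults to stumps but differs in details such as its `learning_rate` shrinkage and the scaling of the vote weights.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """Discrete two-class AdaBoost with stumps; y must be encoded as -1/+1."""
    n = len(y)
    w = np.full(n, 1.0 / n)                     # start with uniform sample weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1, random_state=0)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y]) / np.sum(w)  # weighted training error
        err = np.clip(err, 1e-10, 1 - 1e-10)    # avoid log(0) and division by zero
        alpha = 0.5 * np.log((1 - err) / err)   # vote weight of this stump
        w *= np.exp(-alpha * y * pred)          # up-weight misclassified samples
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)

def adaboost_predict(stumps, alphas, X):
    # Sign of the alpha-weighted vote of all stumps
    score = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.where(score >= 0, 1, -1)

X, y01 = make_classification(n_samples=500, n_features=5, random_state=0)
y = np.where(y01 == 1, 1, -1)

stumps, alphas = adaboost_fit(X, y, n_rounds=50)
single = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)
print("one stump :", (single.predict(X) == y).mean())
print("50 rounds :", (adaboost_predict(stumps, alphas, X) == y).mean())
```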

In [8]:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix

# Code starts here

# Initialising AdaBoostClassifier model
ada_model = AdaBoostClassifier(random_state=0)

#Fitting the model on train data
ada_model.fit(X_train,y_train)

#Making prediction on test data
y_pred = ada_model.predict(X_test)

#Finding the accuracy score
ada_score = accuracy_score(y_test,y_pred)
print("Accuracy: ",ada_score)

#Finding the confusion matrix
ada_cm=confusion_matrix(y_test,y_pred)
print('Confusion matrix: \n', ada_cm)

#Finding the classification report
ada_cr=classification_report(y_test,y_pred)
print('Classification report: \n', ada_cr)

Accuracy:  0.795551348793185
Confusion matrix: 
 [[1371  189]
 [ 243  310]]
Classification report: 
               precision    recall  f1-score   support

           0       0.85      0.88      0.86      1560
           1       0.62      0.56      0.59       553

    accuracy                           0.80      2113
   macro avg       0.74      0.72      0.73      2113
weighted avg       0.79      0.80      0.79      2113
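The headline accuracy of about 0.80 hides a much weaker churn class. Recomputing the metrics by hand from the AdaBoost confusion matrix above (scikit-learn's layout: rows are actual classes, columns are predicted classes, class 0 first) makes the trade-off explicit, and also gives the accuracy of always predicting "No" as a baseline:

```python
import numpy as np

# AdaBoost confusion matrix from the cell above
# (rows = actual class 0/1, columns = predicted class 0/1)
cm = np.array([[1371, 189],
               [243, 310]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tn + tp) / cm.sum()
precision = tp / (tp + fp)   # share of predicted churners who actually churned
recall = tp / (tp + fn)      # share of actual churners the model caught
f1 = 2 * precision * recall / (precision + recall)
baseline = cm[0].sum() / cm.sum()  # accuracy of always predicting "No churn"

print(f"accuracy  {accuracy:.3f}  (always-'No' baseline {baseline:.3f})")
print(f"churn precision {precision:.2f}, recall {recall:.2f}, F1 {f1:.2f}")
```

These reproduce the class-1 row of the report. Only about 56% of churners are caught, and the accuracy is roughly 6 points above the always-"No" baseline of about 0.738, so for a churn use case the class-1 recall is arguably the more informative number.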



##### XGBoost Implementation
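XGBoost also adds trees one at a time, but each tree is fitted to the gradient of the loss, and its leaf values come from a second-order (Newton) step with an L2 penalty: for a leaf, w = -G / (H + lambda), where G and H sum the first and second derivatives of the loss over the samples in that leaf. The sketch below applies that idea to binary log loss. It is a teaching simplification, not XGBoost's implementation: split finding is delegated to scikit-learn's squared-error regression trees instead of XGBoost's gain formula, and features such as column subsampling are omitted. `boost_fit` and `boost_predict_proba` are hypothetical helpers, and the data is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeRegressor

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def boost_fit(X, y, n_rounds=30, learning_rate=0.3, max_depth=2, reg_lambda=1.0):
    """Second-order boosting for binary log loss; y must be 0/1."""
    base = np.log(y.mean() / (1 - y.mean()))  # start from the log-odds of the positive rate
    F = np.full(len(y), base)                 # raw scores (log-odds) per sample
    trees = []
    for _ in range(n_rounds):
        p = sigmoid(F)
        g = p - y                 # first derivative of log loss w.r.t. the raw score
        h = p * (1 - p)           # second derivative
        # Simplification: the tree structure comes from a squared-error fit to -g
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0).fit(X, -g)
        leaf = tree.apply(X)      # index of the leaf each sample falls into
        # Regularised Newton step per leaf: w = -sum(g) / (sum(h) + lambda)
        weights = {j: -g[leaf == j].sum() / (h[leaf == j].sum() + reg_lambda)
                   for j in np.unique(leaf)}
        F += learning_rate * np.array([weights[j] for j in leaf])
        trees.append((tree, weights))
    return {"base": base, "learning_rate": learning_rate, "trees": trees}

def boost_predict_proba(model, X):
    F = np.full(X.shape[0], model["base"])
    for tree, weights in model["trees"]:
        F += model["learning_rate"] * np.array([weights[j] for j in tree.apply(X)])
    return sigmoid(F)

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = boost_fit(X, y)
p = boost_predict_proba(model, X)

prior = y.mean()
base_logloss = -np.mean(y * np.log(prior) + (1 - y) * np.log(1 - prior))
logloss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
acc = ((p >= 0.5).astype(int) == y).mean()
print(f"log loss {base_logloss:.3f} (constant) -> {logloss:.3f} (boosted), accuracy {acc:.3f}")
```

The parameter names `learning_rate`, `max_depth` and `reg_lambda` were chosen to echo the corresponding `XGBClassifier` arguments; `learning_rate` and `max_depth` are exactly what the grid search in the next cell tunes.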

In [10]:
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV

# Hyperparameter grid: 5 learning rates x max_depth in {1, 2} (range(1, 3) excludes 3)
parameters={'learning_rate':[0.1,0.15,0.2,0.25,0.3],
            'max_depth':range(1,3)}

# Code starts here

#Initializing the model
xgb_model = XGBClassifier(random_state=0)

#Fitting the model on train data
xgb_model.fit(X_train,y_train)

#Making prediction on test data
y_pred = xgb_model.predict(X_test)

#Finding the accuracy score
xgb_score = accuracy_score(y_test,y_pred)
print("Accuracy: ",xgb_score)

#Finding the confusion matrix
xgb_cm=confusion_matrix(y_test,y_pred)
print('Confusion matrix: \n', xgb_cm)

#Finding the classification report
xgb_cr=classification_report(y_test,y_pred)
print('Classification report: \n', xgb_cr)


# Grid search with cross-validation
print()
print("Applying GridSearcCv")
print()
# Initialising grid search over `parameters` (cross-validated on the training data)
clf = GridSearchCV(xgb_model, parameters)

#Fitting the model on train data
clf.fit(X_train,y_train)

#Making prediction on test data
y_pred = clf.predict(X_test)

#Finding the accuracy score
clf_score = accuracy_score(y_test,y_pred)
print("Accuracy: ",clf_score)

#Finding the confusion matrix
clf_cm=confusion_matrix(y_test,y_pred)
print('Confusion matrix: \n', clf_cm)

#Finding the classification report
clf_cr=classification_report(y_test,y_pred)
print('Classification report: \n', clf_cr)

#Code ends here

Accuracy:  0.79649787032655
Confusion matrix: 
 [[1388  172]
 [ 258  295]]
Classification report: 
               precision    recall  f1-score   support

           0       0.84      0.89      0.87      1560
           1       0.63      0.53      0.58       553

    accuracy                           0.80      2113
   macro avg       0.74      0.71      0.72      2113
weighted avg       0.79      0.80      0.79      2113


Applying GridSearcCv

Accuracy:  0.8017037387600567
Confusion matrix: 
 [[1394  166]
 [ 253  300]]
Classification report: 
               precision    recall  f1-score   support

           0       0.85      0.89      0.87      1560
           1       0.64      0.54      0.59       553

    accuracy                           0.80      2113
   macro avg       0.75      0.72      0.73      2113
weighted avg       0.79      0.80      0.80      2113
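`GridSearchCV` tries every combination in `parameters`. scikit-learn's `ParameterGrid` expands the same dictionary, which makes the size of the search visible: 5 learning rates times `max_depth` in {1, 2} gives 10 candidate settings, each scored by cross-validation on the training data.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as the XGBoost cell
parameters = {'learning_rate': [0.1, 0.15, 0.2, 0.25, 0.3],
              'max_depth': range(1, 3)}

grid = list(ParameterGrid(parameters))
n_folds = 5  # GridSearchCV's default cv in recent scikit-learn releases
print(len(grid), "candidates x", n_folds, "folds =", len(grid) * n_folds, "fits")
for params in grid[:3]:
    print(params)
```

With 5 folds that is 50 fits, plus one refit of the best setting on the whole training set (`refit=True` is the default), and `clf.predict` then uses that refitted model. After fitting, `clf.best_params_` and `clf.best_score_` report the winning setting and its mean cross-validated accuracy; the notebook does not print them, so which setting won is not shown here.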

