## Customer churn model training for customer retention

The following process compares customer churn prediction using a simple neural network (deep learning) against traditional ML approaches. While deep learning is the buzzword in AI circles these days, that does not mean it will always outperform traditional ML methods, particularly on tabular data.

### Load Dataset

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split

In [2]:
churn_df = pd.read_csv('Telco-Customer-Churn.csv')

In [3]:
churn_df.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes


### Data cleaning and test-train split

In [4]:
# Setting up X and y variables from dataset
X = pd.get_dummies(churn_df.drop(['customerID', 'Churn'], axis = 1))
y = churn_df.Churn.apply(lambda x : 1 if x=='Yes' else 0)

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
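
A slightly more defensive version of this preparation step might look like the sketch below, which runs on a small made-up frame rather than the real CSV. It assumes `TotalCharges` may be read as text with blank entries (worth checking with `churn_df.dtypes`), in which case `get_dummies` would one-hot encode every distinct value. The `stratify` and `random_state` arguments, the zero fill for blanks, and all `toy_` names are additions for illustration and are not used in the notebook above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical miniature of the churn table; values are illustrative only.
toy_df = pd.DataFrame({
    "customerID": [f"id-{i}" for i in range(10)],
    "Contract": ["Month-to-month", "One year"] * 5,
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 70.70] * 2,
    "TotalCharges": ["29.85", "1889.5", " ", "1840.75", "151.65"] * 2,
    "Churn": ["No", "No", "Yes", "No", "Yes"] * 2,
})

# Coerce text to numbers; blanks become NaN and are filled with 0 (an assumption).
toy_df["TotalCharges"] = pd.to_numeric(
    toy_df["TotalCharges"], errors="coerce"
).fillna(0.0)

X_toy = pd.get_dummies(toy_df.drop(["customerID", "Churn"], axis=1))
y_toy = (toy_df["Churn"] == "Yes").astype(int)

# stratify keeps the churn ratio similar in both splits;
# random_state makes the split repeatable between runs.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, stratify=y_toy, random_state=42
)
print(X_toy.columns.tolist())
```

With the numeric coercion in place, only `Contract` is expanded into indicator columns, and the two charge columns stay numeric.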

### Model 1 - Simple Neural Network (Deep Learning) Model

In [6]:
# Import Deep Learning Model Dependencies
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense
from sklearn.metrics import accuracy_score

In [22]:
# Build a base model with two hidden layers of 32 and 64 neurons
model = Sequential()
model.add(Dense(units=32, activation='relu', input_dim=len(X_train.columns)))
model.add(Dense(units=64, activation='relu'))
model.add(Dense(units=1, activation='sigmoid'))

In [23]:
# Compile the model
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])

In [24]:
# Fit to the data
model.fit(X_train, y_train, validation_split=0.25, epochs=100, batch_size=32)

Epoch 1/100
...
Epoch 100/100


<tensorflow.python.keras.callbacks.History at 0x17d91a04b50>

### Predict on the test dataset

In [25]:
# Predict on test data
y_hat = model.predict(X_test)
y_hat = [0 if val < 0.5 else 1 for val in y_hat]
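
For a single sigmoid output unit, `model.predict` returns one probability per row as a column array, so the thresholding step can also be written as a vectorised comparison. A small sketch, with made-up probabilities standing in for the model output:

```python
import numpy as np

# Stand-in for model.predict(X_test): shape (n_samples, 1), values are illustrative.
probs = np.array([[0.10], [0.50], [0.73], [0.49]])

# Flatten to 1-D, then label anything at or above 0.5 as churn (1),
# matching the list comprehension's "0 if val < 0.5 else 1" rule.
labels = (probs.ravel() >= 0.5).astype(int)
print(labels)  # [0 1 1 0]
```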

In [26]:
# Get accuracy metric
accuracy_score(y_test, y_hat)

0.7487579843860894

The accuracy score here is 75% for the model's predictions of customer churn on the test data. Note, however, that 0.7488 is exactly the share of non-churners in the test set (1055 of 1409, per the support column in the classification reports below), which is the score a model would get by predicting "no churn" for every customer. The model trains reasonably quickly, but it is a bare-bones base model with only two hidden layers, no hyperparameter tuning, and no data scaling. It could likely be improved by scaling the inputs, tuning hyperparameters, increasing the epochs, or making the network deeper.
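
A useful sanity check for any accuracy figure on imbalanced data is the majority-class baseline. The sketch below rebuilds a label vector with the class counts reported for the test set in this notebook's classification reports (1055 non-churners, 354 churners) and scores scikit-learn's `DummyClassifier`, which always predicts the most frequent class. The feature matrix is a placeholder, since the dummy model ignores it, and the variable names are illustrative.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Labels with the same class counts as the test-set support in the reports.
y_counts = np.array([0] * 1055 + [1] * 354)
X_placeholder = np.zeros((len(y_counts), 1))  # ignored by DummyClassifier

# Always predicts the most frequent class (0, "no churn").
baseline = DummyClassifier(strategy="most_frequent").fit(X_placeholder, y_counts)
baseline_acc = accuracy_score(y_counts, baseline.predict(X_placeholder))
print(round(baseline_acc, 4))  # 0.7488
```

Any model worth keeping should beat this number on the test set, ideally by a clear margin.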

Nonetheless, it gives us a baseline for the comparisons that follow.

Now let's run a more standard LinearSVC and LogisticRegression for comparison.

### LinearSVC and LogisticRegression for comparison

To illustrate the difference in the TF neural network here vs other models, LinearSVC and LogisticRegression have been imported below for comparison.

#### LinearSVC model

In [32]:
from sklearn.svm import LinearSVC

# Create LinearSVC model and train
lsvc = LinearSVC(verbose=0)
lsvc.fit(X_train, y_train)

# Get accuracy on the training data
acc = lsvc.score(X_train, y_train)
print("Accuracy: ", acc)



Accuracy:  0.7547035853745119


Despite failing to converge, a simple LinearSVC classifier reaches a training accuracy of 75.5%, on par with the test accuracy of our base deep learning model. The classification report below shows the same 75% on the test set, although recall for the churn class is only 0.02, so this model also predicts "no churn" for almost every customer. Results could likely be improved by scaling the data beforehand and looking at other methods such as PCA.

This is a good example of how deep learning is not always the best answer: a simple algorithm can perform similarly while producing more explainable results.
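
The scaling suggestion can be sketched with a scikit-learn pipeline. The example below is only an illustration under stated assumptions: it uses synthetic data from `make_classification` in place of the churn features, inflates one column to mimic features on very different scales, and picks `max_iter=10000` arbitrarily. None of these names or settings come from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the churn features, with roughly 25% positives.
X_syn, y_syn = make_classification(
    n_samples=2000, n_features=20, weights=[0.75, 0.25], random_state=0
)
# Inflate one column so the features sit on very different scales.
X_syn[:, 0] *= 1000.0

X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, stratify=y_syn, random_state=0
)

# The scaler is fitted on the training split only and reused at predict time.
scaled_svc = make_pipeline(
    StandardScaler(), LinearSVC(max_iter=10000, random_state=0)
)
scaled_svc.fit(X_tr, y_tr)

print("Train accuracy:", scaled_svc.score(X_tr, y_tr))
print("Test accuracy: ", scaled_svc.score(X_te, y_te))
```

Wrapping the scaler and classifier in one pipeline keeps test-set statistics out of the scaling step, and reporting both scores makes it clear which split each number comes from.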

In [36]:
from sklearn.metrics import classification_report

# Predict on test data
lsvc_y_hat = lsvc.predict(X_test)

# Generate classification report
cr = classification_report(y_test, lsvc_y_hat)
print(cr)

              precision    recall  f1-score   support

           0       0.75      0.99      0.86      1055
           1       0.57      0.02      0.04       354

    accuracy                           0.75      1409
   macro avg       0.66      0.51      0.45      1409
weighted avg       0.71      0.75      0.65      1409



#### LogisticRegression model

In [38]:
from sklearn.linear_model import LogisticRegression
log_reg = LogisticRegression(random_state=0, solver='lbfgs').fit(X_train, y_train)
log_reg_y_hat = log_reg.predict(X_test)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [41]:
# Get accuracy on the training data
log_reg.score(X_train, y_train)

0.8146964856230032

As can be seen here, the base LogisticRegression model, despite failing to converge, has outperformed the base deep learning model, with a training accuracy of 81% that matches the test accuracy in the classification report below. This could again be improved with little effort, but even in its base form it has done a reasonable job and is the best of the three.
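
The convergence warning itself suggests two remedies: raise `max_iter` or scale the data. A minimal sketch of both, assuming synthetic data in place of the churn features and an arbitrary `max_iter=1000`, with a check of how many iterations the solver actually used:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; one column is inflated to mimic unscaled features.
X_syn, y_syn = make_classification(n_samples=1000, n_features=15, random_state=1)
X_syn[:, 0] *= 1000.0

# Standardise the features, then fit with a higher iteration budget.
clf = make_pipeline(
    StandardScaler(), LogisticRegression(max_iter=1000, random_state=0)
)
clf.fit(X_syn, y_syn)

# n_iter_ holds the iterations the solver ran; a value below max_iter
# means it stopped because it converged, not because it ran out of budget.
iterations_used = int(clf.named_steps["logisticregression"].n_iter_[0])
print("lbfgs iterations used:", iterations_used)
```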

In [40]:
# Generate classification report
cr = classification_report(y_test, log_reg_y_hat)
print(cr)

              precision    recall  f1-score   support

           0       0.86      0.89      0.87      1055
           1       0.63      0.57      0.60       354

    accuracy                           0.81      1409
   macro avg       0.75      0.73      0.74      1409
weighted avg       0.80      0.81      0.81      1409



As can be seen, both of these models have matched or outperformed the deep learning model. An ensemble model such as RandomForestClassifier may well do better still. So, as a final test, let's try it!

#### RandomForestClassifier ensemble model

In [42]:
from sklearn.ensemble import RandomForestClassifier

rfcl = RandomForestClassifier()
rfcl.fit(X_train, y_train)


# Get accuracy on the training data
acc = rfcl.score(X_train, y_train)
print("Accuracy: ", acc)

Accuracy:  0.9976925807596734


In [43]:
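# Predict on test data with the random forest (not log_reg)
rfcl_y_hat = rfcl.predict(X_test)

# Generate classification report
cr = classification_report(y_test, rfcl_y_hat)
print(cr)

The ensemble RandomForestClassifier, without any hyperparameter tuning and working on non-scaled data, reaches an accuracy of 99.8% on the training data. That figure mostly reflects how closely a fully grown forest fits the data it was trained on, so it should not be read as test performance or compared directly with the test scores of the other models. No test-set report for the random forest is reproduced here, because the recorded run passed `log_reg` predictions to `classification_report` and so repeated the LogisticRegression results; running the cell above produces the random forest's own test metrics.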
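
The gap between training and test accuracy for a random forest is easy to see on synthetic data. The sketch below is illustrative only: the dataset comes from `make_classification` with deliberately noisy labels (`flip_y=0.2`), and none of the numbers it prints say anything about the churn data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data with label noise, so a perfect test score is not achievable.
X_syn, y_syn = make_classification(
    n_samples=2000, n_features=20, n_informative=5, flip_y=0.2, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=0
)

forest = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Score the same model on the data it saw and on held-out data.
train_acc = forest.score(X_tr, y_tr)
test_acc = forest.score(X_te, y_te)
print(f"train={train_acc:.3f} test={test_acc:.3f}")
```

A large difference between the two numbers is a sign of overfitting, which is why the held-out score is the one to compare across models.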

### Summary

While deep learning is often the go-to response and the buzzword in data science right now, it is not always the optimal choice for the problem at hand. Not only can it be matched or outperformed by standard classifiers, but the results of simpler classifiers such as LogisticRegression are easier to explain than those of a black-box model should you run into a problem.
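
As one illustration of what "explainable" can mean in practice, the coefficients of a fitted LogisticRegression can be read as per-feature weights once the inputs are on a common scale. This is a minimal sketch on synthetic data; the feature names are hypothetical placeholders, not columns from the churn dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data with hypothetical feature names.
X_syn, y_syn = make_classification(
    n_samples=500, n_features=4, n_informative=2, n_redundant=0, random_state=0
)
feature_names = ["feat_a", "feat_b", "feat_c", "feat_d"]

# Standardise so coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X_syn)
log_reg_demo = LogisticRegression(max_iter=1000).fit(X_scaled, y_syn)

# One weight per feature; the sign gives the direction of the effect on
# the log-odds of the positive class, the magnitude gives its strength.
coef = pd.Series(log_reg_demo.coef_[0], index=feature_names)
coef_table = coef.reindex(coef.abs().sort_values(ascending=False).index)
print(coef_table)
```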