
In [1]:
%matplotlib inline
import numpy as np
import pandas as pd

#preprocessing ops
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import preprocessing


In [2]:
# DL ops
import tensorflow
from keras.models import Sequential
# import a few extra layers to showcase them -- we won't use them all
from keras.layers import Dense, Dropout, Activation, Flatten, GaussianNoise, PReLU
from keras.utils import to_categorical

# Context
"Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs." [IBM Sample Data Sets]

# Content
Each row represents a customer; each column contains a customer attribute, as described in the column metadata.

The data set includes information about:

- Customers who left within the last month – the column is called Churn
- Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
- Customer account information – how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges
- Demographic info about customers – gender, age range, and if they have partners and dependents

In [3]:
columns = [
    'state',
    'account length', 
    'area code', 
    'phone number', 
    'international plan', 
    'voice mail plan', 
    'number vmail messages',
    'total day minutes',
    'total day calls',
    'total day charge',
    'total eve minutes',
    'total eve calls',
    'total eve charge',
    'total night minutes',
    'total night calls',
    'total night charge',
    'total intl minutes',
    'total intl calls',
    'total intl charge',
    'number customer service calls',
    'churn']


In [7]:
df = pd.read_csv('churn.data.txt', header=None, names=columns)
df

Unnamed: 0,state,account length,area code,phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,total eve minutes,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,number customer service calls,churn
0,KS,128,415,382-4657,no,yes,25,265.1,110,45.07,197.4,99,16.78,244.7,91,11.01,10.0,3,2.70,1,False.
1,OH,107,415,371-7191,no,yes,26,161.6,123,27.47,195.5,103,16.62,254.4,103,11.45,13.7,3,3.70,1,False.
2,NJ,137,415,358-1921,no,no,0,243.4,114,41.38,121.2,110,10.30,162.6,104,7.32,12.2,5,3.29,0,False.
3,OH,84,408,375-9999,yes,no,0,299.4,71,50.90,61.9,88,5.26,196.9,89,8.86,6.6,7,1.78,2,False.
4,OK,75,415,330-6626,yes,no,0,166.7,113,28.34,148.3,122,12.61,186.9,121,8.41,10.1,3,2.73,3,False.
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3328,AZ,192,415,414-4276,no,yes,36,156.2,77,26.55,215.5,126,18.32,279.1,83,12.56,9.9,6,2.67,2,False.
3329,WV,68,415,370-3271,no,no,0,231.1,57,39.29,153.4,55,13.04,191.3,123,8.61,9.6,4,2.59,3,False.
3330,RI,28,510,328-8230,no,no,0,180.8,109,30.74,288.8,58,24.55,191.9,91,8.64,14.1,6,3.81,2,False.
3331,CT,184,510,364-6381,yes,no,0,213.8,105,36.35,159.6,84,13.57,139.2,137,6.26,5.0,10,1.35,2,False.


In [8]:
#quick preprocessing
mapping = {'no': 0., 'yes':1., 'False.':0., 'True.':1.}
df.replace({'international plan' : mapping, 'voice mail plan' : mapping, 'churn':mapping}, regex=True, inplace=True)

In [9]:
#discard some features
df.drop('phone number', axis=1, inplace=True)
df.drop('area code', axis=1, inplace=True)
df.drop('state', axis=1, inplace=True)

print("Dataset shape" + str(df.shape))

Dataset shape(3333, 18)


What is the **churn** distribution?

In [10]:
df["churn"].value_counts()

0.0    2850
1.0     483
Name: churn, dtype: int64
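The raw counts are easier to judge as proportions. A minimal sketch, assuming only the two counts printed above; the `churn_demo` Series is rebuilt from those counts for illustration and is not loaded from the data file:

```python
import pandas as pd

# Rebuild a label column from the counts shown above (illustrative stand-in for df["churn"])
churn_demo = pd.Series([0.0] * 2850 + [1.0] * 483, name="churn")

# normalize=True returns class proportions instead of raw counts
proportions = churn_demo.value_counts(normalize=True)
print(proportions.round(3))
```

Roughly 85.5% of the rows belong to class 0, so the two classes are far from balanced.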

By the way, is there anything to be worried about here?


In [11]:
d_1 = df[df["churn"]==1] #churners
d_2 = df[df["churn"]==0] #loyal users

# keep all churners plus the first 400 loyal users to roughly balance the classes
df = pd.concat([d_1, d_2[:400]])


In [12]:
df.shape

(883, 18)
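Taking the first 400 loyal rows depends on the order of the file. One possible alternative, sketched here on a toy frame with the same class counts (the `toy` frame and its `feature` column are hypothetical), is to draw the loyal users at random and shuffle the result:

```python
import pandas as pd

# Toy frame with the same class counts as the notebook (values are illustrative)
toy = pd.DataFrame({"churn": [1.0] * 483 + [0.0] * 2850})
toy["feature"] = range(len(toy))

churners = toy[toy["churn"] == 1]
loyal = toy[toy["churn"] == 0]

# Draw a random subset of loyal users instead of the first 400 rows,
# then shuffle so the classes are not stored in two contiguous blocks
balanced = pd.concat([churners, loyal.sample(n=400, random_state=0)])
balanced = balanced.sample(frac=1, random_state=0).reset_index(drop=True)

print(balanced.shape)
print(balanced["churn"].value_counts())
```

Fixing `random_state` keeps the subsample reproducible between runs.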

In [17]:
# split train - test 90% 10%
X = df.drop(['churn'], axis=1)
Y = df['churn']

X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size=.1, random_state=0)
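With a test set of only 89 rows, the class ratio can drift between the two splits. A hedged sketch of a stratified split, using stand-in arrays (`X_demo`, `y_demo` are assumptions, sized like the balanced dataset) rather than the notebook's data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative stand-in data with the notebook's class counts (483 churners, 400 loyal)
X_demo = np.arange(883 * 2, dtype="float32").reshape(883, 2)
y_demo = np.array([1.0] * 483 + [0.0] * 400)

# stratify=y_demo keeps the churn ratio roughly equal in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=0, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
print(y_tr.mean(), y_te.mean())
```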

In [18]:
#some other preprocessing ops

# just as a note -- churn or not churn
nb_classes = 1 

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# scale the inputs for the NN -- last time we used standard scaling, so this time we try min-max scaling to [-1, 1]
scaler = preprocessing.MinMaxScaler((-1,1))
scaler.fit(X)

XX_train = scaler.transform(X_train.values)
XX_test  = scaler.transform(X_test.values) 

YY_train = Y_train.values 
YY_test  = Y_test.values 
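A common convention is to fit the scaler on the training split only, so that statistics from the test split do not leak into preprocessing. A minimal sketch of that convention on toy arrays (`train_demo`, `test_demo` and `demo_scaler` are hypothetical names, not the notebook's variables):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative arrays; in the notebook these would be X_train.values and X_test.values
train_demo = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]], dtype="float32")
test_demo = np.array([[2.5, 40.0]], dtype="float32")

# Fit on the training split only, then reuse the same transform for the test split
demo_scaler = MinMaxScaler(feature_range=(-1, 1))
train_scaled = demo_scaler.fit_transform(train_demo)
test_scaled = demo_scaler.transform(test_demo)

print(train_scaled)
print(test_scaled)  # test values may fall outside [-1, 1]
```

The second test feature (40.0) lies above the training maximum, so it maps to 2.0 rather than staying inside the fitted range.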

In [19]:
print(X_train.shape, YY_train.shape)
print(X_test.shape, YY_test.shape)

(794, 17) (794,)
(89, 17) (89,)


### Building the model

In [30]:
# For a single-input model with 2 classes (binary classification):

model = Sequential()

# FC @ 64, non-linear
model.add(Dense(64, activation="relu", input_shape=(17,)))

# FC @ 32, non linear
model.add(Dense(32,activation='relu'))

# output layer (nb_classes) -- what is the activation function in this case??
model.add(Dense(1))
model.add(Activation("sigmoid"))

# compile: optimizer & losses/metrics
model.compile(optimizer='sgd',
              loss='binary_crossentropy',
              metrics = ["accuracy"])


In [31]:
#get the summary

model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_3 (Dense)              (None, 64)                1152      
_________________________________________________________________
dense_4 (Dense)              (None, 32)                2080      
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 33        
_________________________________________________________________
activation_1 (Activation)    (None, 1)                 0         
Total params: 3,265
Trainable params: 3,265
Non-trainable params: 0
_________________________________________________________________
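The parameter counts in the summary can be checked by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. A short sketch (the helper `dense_params` is introduced here for illustration):

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

layer_sizes = [17, 64, 32, 1]  # input features, then units per Dense layer
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)       # [1152, 2080, 33]
print(sum(per_layer))  # 3265
```

The Activation layer adds no parameters, which is why it shows 0 in the summary.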


In [32]:
#train!
# 50 epochs, weights updated after each mini-batch of 16 samples
n_epochs = 50
batch_size = 16

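# note: X_train / Y_train are the unscaled pandas objects; the scaled arrays are XX_train / YY_train
history = model.fit(X_train, Y_train, epochs=n_epochs, batch_size=batch_size)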

Epoch 1/50
...
Epoch 50/50


In [33]:
print('history dict:', history.history)

history dict: {'loss': [13.46065616607666, 0.6928432583808899, 0.6900278329849243, 0.688410758972168, 0.6864780187606812, 0.6867249011993408, 0.6862202882766724, 0.6856791973114014, 0.6848858594894409, 0.6847025156021118, 0.6837198138237, 0.6833397746086121, 0.6847541332244873, 0.6832312941551208, 0.6853747367858887, 0.6825917959213257, 0.682026207447052, 0.6820873022079468, 0.6824637055397034, 0.6822057962417603, 0.6828603148460388, 0.6813333630561829, 0.6817652583122253, 0.6807335615158081, 0.6825217008590698, 0.6842789649963379, 0.682375431060791, 0.6811506748199463, 0.6814261674880981, 0.6810770034790039, 0.6798630952835083, 0.6810523271560669, 0.6812704205513, 0.680747389793396, 0.6806411743164062, 0.6804722547531128, 0.6805704832077026, 0.6810193061828613, 0.6808373928070068, 0.6796569228172302, 0.6812548041343689, 0.6804344654083252, 0.68083256483078, 0.6806163191795349, 0.6805606484413147, 0.6805204153060913, 0.6802798509597778, 0.680134654045105, 0.6811712384223938, 0.67921531
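For reference, the losses from epoch 2 onward sit just below ln 2 ≈ 0.693, which is the binary cross-entropy of a classifier that outputs 0.5 for every sample. A small worked check (the helper `binary_crossentropy` below is a plain-Python illustration, not the Keras implementation):

```python
import math

def binary_crossentropy(y_true: float, p: float) -> float:
    """Per-sample binary cross-entropy for a predicted probability p."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

# A model that always outputs 0.5 pays ln(2) on every sample, whatever the label
chance_loss = binary_crossentropy(1.0, 0.5)
print(round(chance_loss, 4))  # 0.6931
```

A training loss that plateaus near this value suggests the network has learned little beyond the class prior.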

In [34]:
# quickly get the performance score
score = model.evaluate(XX_test, YY_test, batch_size=batch_size)  # returns [loss, accuracy] on the test set
print('test loss, test acc:', score)

test loss, test acc: [0.7171128988265991, 0.483146071434021]


In [35]:
print("\n%s: %.2f%%" % (model.metrics_names[1], score[1]*100))
print("\n%s: %.2f" % (model.metrics_names[0], score[0]))


accuracy: 48.31%

loss: 0.72


Generate predictions

In [28]:
predictions = model.predict(XX_test)
print('predictions shape:', predictions.shape)
predictions[:3]

predictions shape: (89, 1)


array([[0.63031685],
       [0.5748527 ],
       [0.61258996]], dtype=float32)

## Performance measures

The **sklearn.metrics** module implements functions assessing prediction error for specific purposes. These metrics are detailed in sections on Classification metrics, Multilabel ranking metrics, Regression metrics and Clustering metrics.

Since we are dealing with a classification problem, we are interested in accuracy, precision, recall and F-measure.

![alt text](https://cdn-images-1.medium.com/max/1600/1*pOtBHai4jFd-ujaNXPilRg.png)

(image credits: https://medium.com/@shrutisaxena0617/precision-vs-recall-386cf9f89488)

In [36]:
from sklearn.metrics import accuracy_score

# note: predictions has shape (n_samples, 1), so argmax over the last axis is always 0
y_classes = predictions.argmax(axis=-1)
print(accuracy_score(YY_test, y_classes))

0.5168539325842697
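With a single sigmoid output, class labels are usually obtained by thresholding the probability rather than by `argmax`. A minimal sketch using the three probabilities printed earlier; the 0.5 cut-off is an assumption, not something the notebook fixes:

```python
import numpy as np

# The three probabilities printed by the prediction cell
probs = np.array([[0.63031685], [0.5748527], [0.61258996]], dtype="float32")

# argmax over an axis of length 1 can only ever return index 0
print(probs.argmax(axis=-1))  # [0 0 0]

# Thresholding at 0.5 (an assumed cut-off) turns each probability into a 0/1 label
labels = (probs > 0.5).astype(int).ravel()
print(labels)  # [1 1 1]
```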


### **Question**

What accuracy would we have obtained using the whole dataset? Would it be higher? Why?

In [37]:
from sklearn.metrics import classification_report
# note: the signature is classification_report(y_true, y_pred); the arguments are swapped here
print(classification_report(y_classes, YY_test))

              precision    recall  f1-score   support

         0.0       1.00      0.52      0.68        89
         1.0       0.00      0.00      0.00         0

    accuracy                           0.52        89
   macro avg       0.50      0.26      0.34        89
weighted avg       1.00      0.52      0.68        89



  _warn_prf(average, modifier, msg_start, len(result))
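`classification_report` takes the true labels first and the predictions second, and its `zero_division` argument controls what is reported when a metric is undefined. A hedged sketch on stand-in labels (`y_true_demo` and `y_pred_demo` are assumptions built from the supports `[46 43]` reported by `precision_recall_fscore_support` in this notebook, with every sample predicted as class 0):

```python
from sklearn.metrics import classification_report

# 46 loyal users, 43 churners, and a model that predicts class 0 every time
y_true_demo = [0] * 46 + [1] * 43
y_pred_demo = [0] * 89

# zero_division=0 reports 0 for undefined precision instead of emitting a warning
report = classification_report(
    y_true_demo, y_pred_demo, zero_division=0, output_dict=True
)
print(report["0"]["support"], report["1"]["support"])
print(report["1"]["precision"], report["1"]["recall"])
```

With the arguments in this order, the support column counts true samples per class.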


In [38]:
from sklearn.metrics import precision_recall_fscore_support
precision, recall, f_score, support = precision_recall_fscore_support(YY_test, y_classes)
print(precision)
print(recall)
print(f_score)
print(support)

[0.51685393 0.        ]
[1. 0.]
[0.68148148 0.        ]
[46 43]
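The class-0 numbers above can be reproduced by hand from the confusion counts, assuming (as the supports and the zero recall for class 1 indicate) that all 89 test samples were predicted as class 0:

```python
# Counts for class 0 when every one of the 89 test samples is predicted as 0
tp = 46  # loyal users predicted loyal
fp = 43  # churners predicted loyal
fn = 0   # loyal users predicted churner

precision_0 = tp / (tp + fp)
recall_0 = tp / (tp + fn)
f1_0 = 2 * precision_0 * recall_0 / (precision_0 + recall_0)

print(round(precision_0, 8), recall_0, round(f1_0, 8))
```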


  _warn_prf(average, modifier, msg_start, len(result))


In [39]:
precision, recall, f_score, support = precision_recall_fscore_support(YY_test, y_classes, average = "macro")
print(precision)
print(recall)
print(f_score)

0.25842696629213485
0.5
0.3407407407407408


  _warn_prf(average, modifier, msg_start, len(result))


In [40]:
precision, recall, f_score, support = precision_recall_fscore_support(YY_test, y_classes, average = "weighted")
print(precision)
print(recall)
print(f_score)

0.2671379876278248
0.5168539325842697
0.3522263836870579
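The difference between the `macro` and `weighted` averages comes down to how the per-class scores are combined. A worked sketch for precision, using the per-class values and supports printed earlier:

```python
per_class_precision = [46 / 89, 0.0]  # class 0, class 1 (from the per-class output)
class_support = [46, 43]              # true samples per class

# Macro: plain mean over classes
macro = sum(per_class_precision) / len(per_class_precision)

# Weighted: mean over classes, each weighted by its support
weighted = sum(
    p * s for p, s in zip(per_class_precision, class_support)
) / sum(class_support)

print(macro)
print(weighted)
```

Both values agree with the `macro` and `weighted` precision printed above, up to floating-point rounding.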


  _warn_prf(average, modifier, msg_start, len(result))


## Just FYI -- Optimizer

If you need to, you can further configure your optimizer. A core principle of Keras is to make things reasonably simple, while allowing the user to be fully in control when they need to (the ultimate control being the easy extensibility of the source code).
Here we used <b>SGD</b> (stochastic gradient descent) as an optimization algorithm for our trainable weights.  

See [https://keras.io/optimizers/](https://keras.io/optimizers/)

<img src="http://ruder.io/content/images/2016/09/saddle_point_evaluation_optimizers.gif" width="40%">

Source & Reference: http://sebastianruder.com/content/images/2016/09/saddle_point_evaluation_optimizers.gif
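To make the update rule concrete, here is a minimal sketch of plain gradient descent on a one-dimensional quadratic. It uses the exact gradient, whereas SGD in Keras estimates the gradient from a mini-batch; the function `sgd_step` and the learning rate are illustrative assumptions:

```python
def sgd_step(w: float, grad: float, lr: float) -> float:
    """One plain gradient-descent update: move against the gradient."""
    return w - lr * grad

# Minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w = 0.0
for _ in range(100):
    w = sgd_step(w, grad=2 * (w - 3), lr=0.1)

print(round(w, 4))  # 3.0
```

Each step shrinks the distance to the minimum by a constant factor (0.8 here), which is why the iterate converges to 3.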

In [41]:
model.compile(optimizer='rmsprop', #adadelta, adam, rmsprop 
              loss='binary_crossentropy',
              metrics=['accuracy'])
# note -- we will later see how to instantiate optimizers with custom settings
# note: compiling again does not reset the weights, so training continues from the SGD run

history = model.fit(X_train, Y_train, epochs=n_epochs, batch_size=batch_size)

Epoch 1/50
...
Epoch 50/50
