### INFO 284 – Machine Learning
#### Spring 2021
### Lab week 11 (Mar 15th – Mar 19th)
Neural networks and Colab
For practical purposes it may be useful to work in Colab when developing shared Python code. Besides
being a collaboration tool, it is also a service that gives access to computing resources, which
can be useful when training neural networks.
Check out Colab (https://colab.research.google.com/notebooks/intro.ipynb) and use it for this
week’s lab.
One of the data sets we have worked with is the churn data set:
https://www.kaggle.com/blastchar/telco-customer-churn
We will work with this data set again this week, this time using neural networks.
Tasks:
1. Prepare the churn data for neural network learning (you may already have done much of this in lab 5).
2. Test out different neural network structures using scikit-learn’s MLPClassifier:
    * a. Vary the number of layers
    * b. Vary the number of nodes in each layer
    * c. Test out various activation functions
    * d. Test out regularization (the alpha parameter in MLPClassifier controls L2 regularization of the weights)
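One possible way to organise task 2 is to loop over a handful of candidate settings and record the test accuracy of each. The sketch below is only an illustration: it uses synthetic data from `make_classification` as a stand-in for the prepared churn features, and the entries in `configs` are hypothetical choices, not recommended values.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the prepared churn features (assumption, not the lab data).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Hypothetical candidates: vary depth, width, activation and alpha (L2 penalty).
configs = [
    {"hidden_layer_sizes": (10,), "activation": "relu", "alpha": 1e-4},
    {"hidden_layer_sizes": (10, 10), "activation": "relu", "alpha": 1e-4},
    {"hidden_layer_sizes": (50, 50), "activation": "tanh", "alpha": 1e-2},
    {"hidden_layer_sizes": (50, 50), "activation": "logistic", "alpha": 1.0},
]

results = []
for params in configs:
    clf = MLPClassifier(max_iter=500, random_state=0, **params)
    clf.fit(X_train, y_train)
    results.append((params, clf.score(X_test, y_test)))

for params, acc in results:
    print(params, round(acc, 3))
```

With the real data, the same loop would take `train_X`, `train_Y`, `test_X` and `test_Y` from the cells further down. scikit-learn may print a convergence warning for some settings, which usually means `max_iter` is too low for that configuration.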

In [391]:
import pandas as pd

In [392]:
pd.set_option("display.max_rows", 100)
pd.set_option("display.max_columns", 100)

In [393]:
telco = pd.read_csv('telco.csv')

In [394]:
telco.head(10)

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,Yes,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,No,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,Yes,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,No,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes
5,9305-CDSKC,Female,0,No,No,8,Yes,Yes,Fiber optic,No,No,Yes,No,Yes,Yes,Month-to-month,Yes,Electronic check,99.65,820.5,Yes
6,1452-KIOVK,Male,0,No,Yes,22,Yes,Yes,Fiber optic,No,Yes,No,No,Yes,No,Month-to-month,Yes,Credit card (automatic),89.1,1949.4,No
7,6713-OKOMC,Female,0,No,No,10,No,No phone service,DSL,Yes,No,No,No,No,No,Month-to-month,No,Mailed check,29.75,301.9,No
8,7892-POOKP,Female,0,Yes,No,28,Yes,Yes,Fiber optic,No,No,Yes,Yes,Yes,Yes,Month-to-month,Yes,Electronic check,104.8,3046.05,Yes
9,6388-TABGU,Male,0,No,Yes,62,Yes,No,DSL,Yes,Yes,No,No,No,No,One year,No,Bank transfer (automatic),56.15,3487.95,No


In [395]:
telco.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 


In [396]:
telco.isnull().sum()

customerID          0
gender              0
SeniorCitizen       0
Partner             0
Dependents          0
tenure              0
PhoneService        0
MultipleLines       0
InternetService     0
OnlineSecurity      0
OnlineBackup        0
DeviceProtection    0
TechSupport         0
StreamingTV         0
StreamingMovies     0
Contract            0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges        0
Churn               0
dtype: int64

### Preprocessing

In [397]:
telco.drop("customerID", axis=1, inplace=True)

In [398]:
telco["TotalCharges"] = telco["TotalCharges"].replace(' ', '')

In [399]:
telco["TotalCharges"] = pd.to_numeric(telco["TotalCharges"])

In [400]:
telco = telco.dropna()
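The three cells above turn the blank `TotalCharges` entries into missing values and drop those rows (7043 rows become 7032). A slightly more explicit variant, shown here on a small made-up series rather than the real column, is to let `pd.to_numeric` coerce anything unparseable to NaN:

```python
import pandas as pd

# Toy values only; the single-space string mimics the blank entries in the data.
raw = pd.Series(["29.85", " ", "1889.5"], name="TotalCharges")

# errors="coerce" turns values that cannot be parsed as numbers into NaN.
numeric = pd.to_numeric(raw, errors="coerce")
n_missing = int(numeric.isna().sum())

# Drop the missing rows and renumber the index from 0.
cleaned = numeric.dropna().reset_index(drop=True)
print(n_missing, cleaned.tolist())
```

Note that `errors="coerce"` silently hides any other malformed value as well, so it is worth checking the count of NaNs it produces.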

In [401]:
telco = telco.reset_index(drop=True)

In [402]:
# Encode target labels with value between 0 and n_classes-1.

from sklearn import preprocessing
lb = preprocessing.LabelEncoder()
telco['Churn'] = lb.fit_transform(telco['Churn'])
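To see which integer `LabelEncoder` assigns to which label, it can help to inspect `classes_` after fitting. A small sketch on made-up labels:

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
codes = encoder.fit_transform(["No", "Yes", "No", "Yes", "Yes"])

# classes_ is sorted, so position 0 is "No" and position 1 is "Yes".
classes = list(encoder.classes_)
print(classes, codes.tolist())

# inverse_transform maps the integer codes back to the original labels.
decoded = encoder.inverse_transform([1, 0]).tolist()
print(decoded)
```

For the churn column this means class 1 in the later reports corresponds to customers who churned.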

In [403]:
cat = ["gender", "SeniorCitizen", "Partner", "Dependents", "PhoneService",
       "MultipleLines", "InternetService", "OnlineSecurity",
       "OnlineBackup", "DeviceProtection", "TechSupport",
       "StreamingTV", "StreamingMovies", "Contract",
       "PaperlessBilling", "PaymentMethod"]

num = ["tenure", "MonthlyCharges", "TotalCharges"]

In [404]:
data_X = telco.loc[:, telco.columns != "Churn"]
data_Y = telco[["Churn"]]

In [405]:
enc_df = pd.get_dummies(data_X[cat])
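`pd.get_dummies` only expands string (object) columns by default; numeric columns such as `SeniorCitizen` pass through unchanged, which is why it shows up as a single column in the output further down. A toy illustration with made-up rows:

```python
import pandas as pd

toy = pd.DataFrame({
    "gender": ["Female", "Male", "Male"],
    "SeniorCitizen": [0, 1, 0],          # already numeric, left as is
    "Contract": ["One year", "Two year", "One year"],
})

dummies = pd.get_dummies(toy)
columns = dummies.columns.tolist()
print(columns)
```

Depending on the pandas version, the indicator columns are stored as `uint8` (as in this notebook) or as `bool`; both work as network inputs once converted with `.values`.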

In [406]:
from sklearn.preprocessing import StandardScaler

# standardize the numeric features (zero mean, unit variance)
mms = StandardScaler()
mms_data = mms.fit_transform(data_X[num])
mms_df = pd.DataFrame(mms_data)
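`StandardScaler` subtracts each column's mean and divides by its standard deviation. The sketch below checks this against a manual NumPy computation on a few made-up rows:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy values standing in for two numeric columns.
values = np.array([[1.0, 29.85],
                   [34.0, 56.95],
                   [2.0, 53.85],
                   [45.0, 42.30]])

scaler = StandardScaler()
scaled = scaler.fit_transform(values)

# Same computation by hand: column-wise (x - mean) / std, population std (ddof=0).
manual = (values - values.mean(axis=0)) / values.std(axis=0)
print(np.allclose(scaled, manual))
```

In this notebook the scaler is fitted on the full data set before the train/test split. A stricter setup would fit it on the training rows only and apply `scaler.transform` to the test rows.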

In [407]:
data_X = pd.concat([enc_df, mms_df], axis=1)

In [408]:
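# Note: without drop=True, reset_index() adds the old index as an extra "index" column,
# which then ends up among the features (see the output below).
data_X = data_X.reset_index()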
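The difference between `reset_index()` and `reset_index(drop=True)` can be seen on a tiny made-up frame:

```python
import pandas as pd

frame = pd.DataFrame({"a": [10, 20, 30]}, index=[5, 6, 7])

kept = frame.reset_index()              # old index becomes a column named "index"
dropped = frame.reset_index(drop=True)  # old index is discarded

print(kept.columns.tolist())
print(dropped.columns.tolist())
```

Because the row number carries no information about churn and is not scaled, using the `drop=True` form is probably the safer choice here. Doing so would reduce the feature count from 46 to 45 and change the numbers reported in the rest of this notebook.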

In [409]:
data_X.head()

Unnamed: 0,index,SeniorCitizen,gender_Female,gender_Male,Partner_No,Partner_Yes,Dependents_No,Dependents_Yes,PhoneService_No,PhoneService_Yes,MultipleLines_No,MultipleLines_No phone service,MultipleLines_Yes,InternetService_DSL,InternetService_Fiber optic,InternetService_No,OnlineSecurity_No,OnlineSecurity_No internet service,OnlineSecurity_Yes,OnlineBackup_No,OnlineBackup_No internet service,OnlineBackup_Yes,DeviceProtection_No,DeviceProtection_No internet service,DeviceProtection_Yes,TechSupport_No,TechSupport_No internet service,TechSupport_Yes,StreamingTV_No,StreamingTV_No internet service,StreamingTV_Yes,StreamingMovies_No,StreamingMovies_No internet service,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaperlessBilling_No,PaperlessBilling_Yes,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check,0,1,2
0,0,0,1,0,0,1,1,0,1,0,0,1,0,1,0,0,1,0,0,0,0,1,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,0,-1.280248,-1.161694,-0.994194
1,1,0,0,1,1,0,1,0,0,1,1,0,0,1,0,0,0,0,1,1,0,0,0,0,1,1,0,0,1,0,0,1,0,0,0,1,0,1,0,0,0,0,1,0.064303,-0.260878,-0.17374
2,2,0,0,1,1,0,1,0,0,1,1,0,0,1,0,0,0,0,1,0,0,1,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,0,1,0,0,0,1,-1.239504,-0.363923,-0.959649
3,3,0,0,1,1,0,1,0,1,0,0,1,0,1,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,0,1,0,0,0,1,0,1,0,1,0,0,0,0.512486,-0.74785,-0.195248
4,4,0,1,0,1,0,1,0,0,1,1,0,0,0,1,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,0,-1.239504,0.196178,-0.940457


In [410]:
data_X.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7032 entries, 0 to 7031
Data columns (total 46 columns):
 #   Column                                   Non-Null Count  Dtype  
---  ------                                   --------------  -----  
 0   index                                    7032 non-null   int64  
 1   SeniorCitizen                            7032 non-null   int64  
 2   gender_Female                            7032 non-null   uint8  
 3   gender_Male                              7032 non-null   uint8  
 4   Partner_No                               7032 non-null   uint8  
 5   Partner_Yes                              7032 non-null   uint8  
 6   Dependents_No                            7032 non-null   uint8  
 7   Dependents_Yes                           7032 non-null   uint8  
 8   PhoneService_No                          7032 non-null   uint8  
 9   PhoneService_Yes                         7032 non-null   uint8  
 10  MultipleLines_No                         7032 no

In [411]:
from sklearn.model_selection import train_test_split

train_X, test_X, train_Y, test_Y = train_test_split(data_X, data_Y,
                                                    test_size=0.2,
                                                    shuffle=True,
                                                    stratify=data_Y,
                                                    random_state=0)
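The `stratify` argument keeps the class proportions roughly equal in the training and test parts, which matters for an imbalanced target like churn. A small sketch on made-up labels with 20% positives:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 80 negatives and 20 positives.
y = np.array([0] * 80 + [1] * 20)
X = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Share of positives in each part.
print(y_tr.mean(), y_te.mean())
```

The same effect is visible in the classification report below, where the test set holds 374 churners out of 1407 rows.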

In [412]:
train_X = train_X.values
train_Y = train_Y.values.ravel()
test_X = test_X.values
test_Y = test_Y.values.ravel()

### Neural Network
https://scikit-learn.org/stable/modules/neural_networks_supervised.html

In [426]:
from sklearn.neural_network import MLPClassifier
mlp = MLPClassifier(hidden_layer_sizes=(10, 10, 10), max_iter=1000)
mlp.fit(train_X, train_Y)

MLPClassifier(hidden_layer_sizes=(10, 10, 10), max_iter=1000)

In [427]:
predictions = mlp.predict(test_X)

In [428]:
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(test_Y, predictions))
print(classification_report(test_Y, predictions))

[[1007   26]
 [ 305   69]]
              precision    recall  f1-score   support

           0       0.77      0.97      0.86      1033
           1       0.73      0.18      0.29       374

    accuracy                           0.76      1407
   macro avg       0.75      0.58      0.58      1407
weighted avg       0.76      0.76      0.71      1407
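The numbers in the report for class 1 (churn) can be recomputed by hand from the confusion matrix, whose rows are the true classes and whose columns are the predicted classes:

```python
# Entries of the confusion matrix printed above: [[TN, FP], [FN, TP]].
tn, fp = 1007, 26
fn, tp = 305, 69

precision = tp / (tp + fp)   # share of predicted churners that really churned
recall = tp / (tp + fn)      # share of real churners that were found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"f1={f1:.2f} accuracy={accuracy:.2f}")
# prints: precision=0.73 recall=0.18 f1=0.29 accuracy=0.76
```

A recall of 0.18 means the network finds fewer than one in five churners. The accuracy of 0.76 is only slightly above the 0.73 (1033/1407) that always predicting "no churn" would give, so accuracy alone is a weak measure on this data set.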



### Keras Neural Network

In [429]:
!pip install Keras

Collecting Keras
  Downloading Keras-2.4.3-py2.py3-none-any.whl (36 kB)
Installing collected packages: Keras
Successfully installed Keras-2.4.3


In [431]:
!pip install tensorflow

Collecting tensorflow
  Downloading tensorflow-2.4.1-cp37-cp37m-win_amd64.whl (370.7 MB)
Collecting gast==0.3.3
  Downloading gast-0.3.3-py2.py3-none-any.whl (9.7 kB)
Collecting absl-py~=0.10
  Downloading absl_py-0.12.0-py3-none-any.whl (129 kB)
Collecting astunparse~=1.6.3
  Downloading astunparse-1.6.3-py2.py3-none-any.whl (12 kB)
Collecting flatbuffers~=1.12.0
  Downloading flatbuffers-1.12-py2.py3-none-any.whl (15 kB)
Collecting google-pasta~=0.2
  Downloading google_pasta-0.2.0-py3-none-any.whl (57 kB)
Collecting grpcio~=1.32.0
  Downloading grpcio-1.32.0-cp37-cp37m-win_amd64.whl (2.5 MB)
Collecting keras-preprocessing~=1.1.2
  Downloading Keras_Preprocessing-1.1.2-py2.py3-none-any.whl (42 kB)
Collecting opt-einsum~=3.3.0
  Downloading opt_einsum-3.3.0-py3-none-any.whl (65 kB)
Collecting tensorboard~=2.4
  Downloading tensorboard-2.4.1-py3-none-any.whl (10.6 MB)
Collecting tensorboard-plugin-wit>=1.6.0
  Downloading tensorboard_plugin_wit-1.8.0-py3-none-any.whl (781 kB)
Collectin

In [432]:
from keras.models import Sequential

In [433]:
model_krs = Sequential()

In [434]:
from keras import layers
from keras.layers import Dropout

In [436]:
Input_Shape = train_X.shape[1]

In [437]:
Input_Shape

46

In [438]:
model_krs.add(layers.Dense(1024, input_shape=(Input_Shape,), activation='relu'))
# Dropout randomly zeroes 20% of the activations during training to reduce overfitting
model_krs.add(Dropout(0.2))
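What a dropout layer does during training can be sketched in a few lines of NumPy. This is an illustrative re-implementation of the usual "inverted dropout" idea, not the Keras source code; the function name `dropout` and the all-ones input are made up for the example:

```python
import numpy as np

def dropout(activations, rate, rng):
    """Zero a random fraction `rate` of the activations and rescale the rest."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    # Dividing by keep_prob keeps the expected activation unchanged.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
acts = np.ones((1000, 10))
out = dropout(acts, rate=0.2, rng=rng)

print(np.unique(out))   # surviving units are scaled to 1 / 0.8
print(out.mean())       # stays close to the original mean of 1.0
```

At prediction time the Keras layer does nothing, so no rescaling is needed when the model is evaluated.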

In [439]:
model_krs.add(layers.Dense(1024, activation='relu'))
model_krs.add(Dropout(0.2))

In [440]:
model_krs.add(layers.Dense(1, activation='sigmoid'))

In [441]:
model_krs.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
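The `binary_crossentropy` loss compares the predicted churn probability from the sigmoid output with the 0/1 label. A plain-Python sketch of the formula (the helper name and the `eps` clipping value are choices made for this example):

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)   # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

good = binary_crossentropy([1, 0], [0.9, 0.1])   # confident and correct
bad = binary_crossentropy([1, 0], [0.1, 0.9])    # confident and wrong
print(round(good, 4), round(bad, 4))
```

Confident wrong predictions are penalised far more heavily than confident correct ones are rewarded, which is what pushes the network towards calibrated probabilities.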

In [442]:
model_krs.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 1024)              48128     
_________________________________________________________________
dropout (Dropout)            (None, 1024)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 1024)              1049600   
_________________________________________________________________
dropout_1 (Dropout)          (None, 1024)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 1025      
Total params: 1,098,753
Trainable params: 1,098,753
Non-trainable params: 0
_________________________________________________________________
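The parameter counts in the summary follow from the layer sizes: a dense layer with `n_inputs` inputs and `n_units` units has one weight per input-unit pair plus one bias per unit. A quick check in plain Python:

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus biases (n_units) of a dense layer."""
    return (n_inputs + 1) * n_units

# 46 input features, two hidden layers of 1024 units, one output unit.
layer_sizes = [46, 1024, 1024, 1]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts, sum(counts))
# prints: [48128, 1049600, 1025] 1098753
```

With roughly 1.1 million parameters and about 5600 training rows, this network has far more capacity than the data can constrain, which is one reason the dropout layers are there.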


In [444]:
fit_keras = model_krs.fit(train_X, train_Y,
          epochs=100,
          verbose=True,
          validation_data=(test_X, test_Y),
          batch_size=30)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
 18/188 [=>............................] - ETA: 2s - loss: 0.5580 - accuracy: 0.7510

KeyboardInterrupt: 
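Instead of interrupting training by hand, training can be stopped automatically once the validation loss stops improving. Keras offers `keras.callbacks.EarlyStopping` (with arguments such as `monitor`, `patience` and `restore_best_weights`) for this, passed to `fit` through `callbacks=[...]`. The plain-Python sketch below only illustrates the patience rule; the function name and the loss values are made up:

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-based, for recorded validation losses."""
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # New best validation loss: remember it and keep training.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs in a row: stop here.
            return epoch, best_epoch
    return len(val_losses), best_epoch

# Hypothetical validation losses, one per epoch.
losses = [0.60, 0.50, 0.45, 0.46, 0.47, 0.46, 0.48, 0.49]
stop, best = early_stopping_epoch(losses, patience=3)
print(stop, best)
# prints: 6 3
```

Since `validation_data` in the cell above is the test set, choosing the stopping epoch from it would make the final test score optimistic. Holding out part of the training data for validation would avoid that.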

In [None]:
# evaluate() returns [loss, accuracy] for the loss and metrics given to compile()
loss, accuracy = model_krs.evaluate(train_X, train_Y, verbose=False)
print("Training Loss: {:.4f}".format(loss))
print("Training Accuracy: {:.4f}".format(accuracy))

In [None]:
# evaluate() returns [loss, accuracy] for the loss and metrics given to compile()
loss, accuracy = model_krs.evaluate(test_X, test_Y, verbose=False)
print("Testing Loss: {:.4f}".format(loss))
print("Testing Accuracy: {:.4f}".format(accuracy))