# **Deep Neural Network Model to Predict Customer Churn Rates**

The task is to develop a deep neural network (DNN) model that predicts customer churn for a telecommunications company. The dataset contains features such as customer demographics, usage patterns, and service subscription details. The objective is to implement dropout, layer-wise dropout, and Monte Carlo dropout in the DNN architecture and assess their impact on model performance and generalization.

In [57]:
import pandas as pd
from tensorflow import keras
import tensorflow
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.models import Sequential
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from tensorflow.keras.layers import Dense, Dropout

In [58]:
churnData = pd.read_csv('/content/WA_Fn-UseC_-Telco-Customer-Churn.csv')
churnData

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.30,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.70,151.65,Yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7038,6840-RESVB,Male,0,Yes,Yes,24,Yes,Yes,DSL,Yes,...,Yes,Yes,Yes,Yes,One year,Yes,Mailed check,84.80,1990.5,No
7039,2234-XADUH,Female,0,Yes,Yes,72,Yes,Yes,Fiber optic,No,...,Yes,No,Yes,Yes,One year,Yes,Credit card (automatic),103.20,7362.9,No
7040,4801-JZAZL,Female,0,Yes,Yes,11,No,No phone service,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.60,346.45,No
7041,8361-LTMKD,Male,1,Yes,No,4,Yes,Yes,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Mailed check,74.40,306.6,Yes


In [59]:
churnData.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 


**Inference:** There are no null values in the dataset.

In [60]:
churnData.shape

(7043, 21)

# **Data Preprocessing**

In [61]:
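# drop() returns a new DataFrame; churnData itself keeps customerID unless the result is assigned back
churnData.drop('customerID',axis=1)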

Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,Female,0,Yes,No,1,No,No phone service,DSL,No,Yes,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,Male,0,No,No,34,Yes,No,DSL,Yes,No,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,Male,0,No,No,2,Yes,No,DSL,Yes,Yes,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,Male,0,No,No,45,No,No phone service,DSL,Yes,No,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.30,1840.75,No
4,Female,0,No,No,2,Yes,No,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Electronic check,70.70,151.65,Yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7038,Male,0,Yes,Yes,24,Yes,Yes,DSL,Yes,No,Yes,Yes,Yes,Yes,One year,Yes,Mailed check,84.80,1990.5,No
7039,Female,0,Yes,Yes,72,Yes,Yes,Fiber optic,No,Yes,Yes,No,Yes,Yes,One year,Yes,Credit card (automatic),103.20,7362.9,No
7040,Female,0,Yes,Yes,11,No,No phone service,DSL,Yes,No,No,No,No,No,Month-to-month,Yes,Electronic check,29.60,346.45,No
7041,Male,1,Yes,No,4,Yes,Yes,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Mailed check,74.40,306.6,Yes


Converting object datatype to numeric

In [62]:
cat_cols = churnData.select_dtypes(include=['object']).columns.tolist()
le = LabelEncoder()
for col in cat_cols:
    churnData[col] = le.fit_transform(churnData[col])
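
One caveat with label-encoding every object column: if `TotalCharges` is read as an object column (this can happen when a few rows hold blank strings), the loop above turns it into category codes rather than amounts. A minimal sketch of parsing it as a number first, on a toy frame; the frame `toy`, its values, and the median fill are assumptions for illustration, not the notebook's method:

```python
import pandas as pd

# Toy frame standing in for churnData; the values are made up for illustration.
toy = pd.DataFrame({"TotalCharges": ["29.85", "1889.5", " ", "108.15"]})

# Parse as numbers; entries that cannot be parsed (such as blanks) become NaN.
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")

# Fill the gaps with the column median (one possible choice among several).
toy["TotalCharges"] = toy["TotalCharges"].fillna(toy["TotalCharges"].median())

print(toy["TotalCharges"].dtype)
```

Run before the encoding loop, a step like this keeps the column out of `cat_cols`, because it is no longer of object dtype.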

In [63]:
churnData.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   int64  
 1   gender            7043 non-null   int64  
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   int64  
 4   Dependents        7043 non-null   int64  
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   int64  
 7   MultipleLines     7043 non-null   int64  
 8   InternetService   7043 non-null   int64  
 9   OnlineSecurity    7043 non-null   int64  
 10  OnlineBackup      7043 non-null   int64  
 11  DeviceProtection  7043 non-null   int64  
 12  TechSupport       7043 non-null   int64  
 13  StreamingTV       7043 non-null   int64  
 14  StreamingMovies   7043 non-null   int64  
 15  Contract          7043 non-null   int64  
 16  PaperlessBilling  7043 non-null   int64  


Splitting features and target variable

In [64]:
x = churnData.drop('Churn',axis=1)
y = churnData['Churn']

Splitting the data

In [65]:
X_train,X_test,y_train,y_test = train_test_split(x,y,test_size=0.2,random_state=42)

# **Basic DNN architecture**

In [66]:
model = Sequential([
    Dense(64, activation='relu',input_shape=(X_train.shape[1],)),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid')

])

Compiling the model

In [67]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Training the baseline model

In [68]:
train = model.fit(X_train, y_train, epochs=50, batch_size=64, validation_data=(X_test, y_test), verbose=0)

# **Implementing the dropout model**

In [69]:
dropoutModel = Sequential([
    Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    Dropout(0.5),
    Dense(32, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])
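
To make the `Dropout(0.5)` layers concrete: during training, dropout zeroes each activation with probability `rate` and scales the survivors by `1 / (1 - rate)` so the expected activation is unchanged; at inference the layer passes its input through. A NumPy sketch of that behavior follows. The function `dropout_forward` is hypothetical and only illustrates the idea; it is not the Keras implementation.

```python
import numpy as np

def dropout_forward(x, rate, training, rng):
    """Inverted dropout: zero units with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return x  # inference: identity
    keep_mask = rng.random(x.shape) >= rate   # True where the unit is kept
    return x * keep_mask / (1.0 - rate)       # rescale so the mean is preserved

rng = np.random.default_rng(0)
activations = np.ones((1000, 64))

train_out = dropout_forward(activations, rate=0.5, training=True, rng=rng)
test_out = dropout_forward(activations, rate=0.5, training=False, rng=rng)

print(train_out.mean())                        # close to 1.0 on average
print(np.array_equal(test_out, activations))   # inference leaves inputs untouched
```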

Compiling the model

In [70]:
dropoutModel.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Training the dropout model

In [71]:
# Store the History in its own variable so dropoutModel remains the trained model
dropoutHistory = dropoutModel.fit(X_train, y_train, epochs=50, batch_size=64, validation_data=(X_test, y_test), verbose=0)

# **Implementing the layer-wise dropout model**

In [72]:
layerwiseDropoutModel = Sequential([
    Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    Dropout(0.2),
    Dense(32, activation='relu'),
    Dropout(0.3),
    Dense(1, activation='sigmoid')
])


Compiling the layer-wise dropout model

In [73]:
layerwiseDropoutModel.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Training the layer-wise dropout model

In [74]:
layerwiseDropout = layerwiseDropoutModel.fit(X_train, y_train, epochs=50, batch_size=64, validation_data=(X_test, y_test), verbose=0)

# **Implementing the Monte Carlo dropout model**

In [75]:
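# Dropout layer that forces training=True, so it stays active at prediction time
# and repeated predictions on the same input differ.
class MCDropout(tensorflow.keras.layers.Dropout):
    def call(self, inputs):
        return super().call(inputs, training=True)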

mcDropoutModel = Sequential([
    Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    MCDropout(0.5),
    Dense(32, activation='relu'),
    MCDropout(0.5),
    Dense(1, activation='sigmoid')
])

Compiling the model

In [76]:
mcDropoutModel.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Training the Monte Carlo dropout model

In [77]:
mcDropout = mcDropoutModel.fit(X_train, y_train, epochs=50, batch_size=64, validation_data=(X_test, y_test), verbose=0)
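
Because the `MCDropout` layers stay active at prediction time, a single prediction from this model is one random sample. Monte Carlo dropout is usually applied by averaging several stochastic forward passes and reading the spread as an uncertainty estimate. A minimal sketch of that averaging, assuming a hypothetical helper `mc_predict` and a NumPy stand-in (`noisy_predict`) in place of the trained network:

```python
import numpy as np

def mc_predict(predict_fn, X, n_samples=50):
    """Run n_samples stochastic forward passes; return per-row mean and std."""
    samples = np.stack([np.asarray(predict_fn(X)).ravel() for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)

# Stand-in for a network whose dropout stays active at prediction time.
rng = np.random.default_rng(0)
weights = rng.normal(size=(5, 1))

def noisy_predict(X, rate=0.5):
    mask = rng.random(X.shape) >= rate            # fresh dropout mask on every call
    logits = (X * mask / (1.0 - rate)) @ weights
    return 1.0 / (1.0 + np.exp(-logits))          # sigmoid output in (0, 1)

X_demo = rng.normal(size=(8, 5))
mean_prob, std_prob = mc_predict(noisy_predict, X_demo, n_samples=100)
print(mean_prob.shape, std_prob.shape)
```

With the notebook's model, `predict_fn` could be something like `lambda x: mcDropoutModel.predict(x, verbose=0)`; the number of passes is a trade-off between run time and the stability of the estimate.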

Evaluating the models

In [80]:
def evaluateModel(model, X_test, y_test, threshold=0.5):
    # predict() returns sigmoid probabilities with shape (n, 1); flatten to (n,).
    # predict_classes() is not available on Sequential in current Keras versions.
    y_pred_prob = model.predict(X_test).ravel()
    # Hard labels feed accuracy and F1; ROC AUC uses the probabilities
    y_pred = (y_pred_prob > threshold).astype(int)
    accuracy = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    roc_auc = roc_auc_score(y_test, y_pred_prob)
    return accuracy, f1, roc_auc

In [81]:
baseline_accuracy, baseline_f1, baseline_roc_auc = evaluateModel(model, X_test, y_test)
dropout_accuracy, dropout_f1, dropout_roc_auc = evaluateModel(dropoutModel, X_test, y_test)
layerwise_dropout_accuracy, layerwise_dropout_f1, layerwise_dropout_roc_auc = evaluateModel(layerwiseDropoutModel, X_test, y_test)
mc_dropout_accuracy, mc_dropout_f1, mc_dropout_roc_auc = evaluateModel(mcDropoutModel, X_test, y_test)
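
The three metrics consume different inputs: accuracy and F1 need hard 0/1 labels, while ROC AUC ranks the raw scores. A small worked example on made-up values (the arrays `y_true` and `probs` are illustrative, not results from this notebook) shows the thresholding step and what each metric sees:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Made-up labels and predicted probabilities, for illustration only.
y_true = np.array([0, 0, 1, 1])
probs = np.array([0.1, 0.4, 0.35, 0.8])

# Threshold the probabilities to get hard class labels.
y_hat = (probs > 0.5).astype(int)        # -> [0, 0, 0, 1]

acc = accuracy_score(y_true, y_hat)      # 3 of 4 labels correct
f1 = f1_score(y_true, y_hat)             # precision 1.0, recall 0.5
auc = roc_auc_score(y_true, probs)       # 3 of 4 positive/negative pairs ranked correctly

print(acc, f1, auc)
```

Changing the threshold moves accuracy and F1 but leaves ROC AUC unchanged, since AUC depends only on how the scores are ordered.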