# **INTRODUCTION TO THE PROJECT**

The dataset contains customer information from a telecom company (from the IBM sample datasets).

The aim of this project is to build a system that helps the company detect whether a customer is likely to stop doing business with them, so that they can take customer retention measures.

The primary goal is for the system to correctly identify as many as possible of the customers who are about to leave (at least 98% of them). In other words, the recall on the churn class should be as high as possible.

The secondary goal is to avoid wrongly flagging customers who will not leave, since retention measures spent on them waste resources. (Note: the primary goal, making sure that no customer stops patronizing the company, still takes priority.)
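In metric terms, the primary goal corresponds to recall on the churn class and the secondary goal to precision on that class. A minimal sketch of both, using made-up confusion counts (the numbers are illustrative and are not results from this project):

```python
def recall(true_pos: int, false_neg: int) -> float:
    """Share of customers who actually churn that the model catches."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos: int, false_pos: int) -> float:
    """Share of customers flagged as churners who really churn."""
    return true_pos / (true_pos + false_pos)


# Illustrative counts only: 100 real churners, 98 of them caught,
# and 40 loyal customers flagged by mistake.
tp, fn, fp = 98, 2, 40

churn_recall = recall(tp, fn)        # primary goal: should reach at least 0.98
churn_precision = precision(tp, fp)  # secondary goal: higher means fewer wasted offers

print(f"recall={churn_recall:.2f}, precision={churn_precision:.2f}")
```

The two goals pull against each other: flagging more customers raises recall but usually lowers precision.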

# **IMPORTING AND CLEANING THE DATA**

In [None]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import tensorflow as tf
import seaborn as sns
import math

In [None]:
# Load the dataset
telData = pd.read_csv('/content/WA_Fn-UseC_-Telco-Customer-Churn.csv', encoding = 'utf-8')
telData.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes


Checking the dataset to determine the format each column should be in

In [None]:
telData.shape

(7043, 21)

In [None]:
for i in telData.columns:
  print(f'{i} : {telData[i].unique()}')
  print('')

customerID : ['7590-VHVEG' '5575-GNVDE' '3668-QPYBK' ... '4801-JZAZL' '8361-LTMKD'
 '3186-AJIEK']

gender : ['Female' 'Male']

SeniorCitizen : [0 1]

Partner : ['Yes' 'No']

Dependents : ['No' 'Yes']

tenure : [ 1 34  2 45  8 22 10 28 62 13 16 58 49 25 69 52 71 21 12 30 47 72 17 27
  5 46 11 70 63 43 15 60 18 66  9  3 31 50 64 56  7 42 35 48 29 65 38 68
 32 55 37 36 41  6  4 33 67 23 57 61 14 20 53 40 59 24 44 19 54 51 26  0
 39]

PhoneService : ['No' 'Yes']

MultipleLines : ['No phone service' 'No' 'Yes']

InternetService : ['DSL' 'Fiber optic' 'No']

OnlineSecurity : ['No' 'Yes' 'No internet service']

OnlineBackup : ['Yes' 'No' 'No internet service']

DeviceProtection : ['No' 'Yes' 'No internet service']

TechSupport : ['No' 'Yes' 'No internet service']

StreamingTV : ['No' 'Yes' 'No internet service']

StreamingMovies : ['No' 'Yes' 'No internet service']

Contract : ['Month-to-month' 'One year' 'Two year']

PaperlessBilling : ['Yes' 'No']

PaymentMethod : ['Electronic check' 'Maile

In [None]:
print(telData.dtypes)

customerID           object
gender               object
SeniorCitizen         int64
Partner              object
Dependents           object
tenure                int64
PhoneService         object
MultipleLines        object
InternetService      object
OnlineSecurity       object
OnlineBackup         object
DeviceProtection     object
TechSupport          object
StreamingTV          object
StreamingMovies      object
Contract             object
PaperlessBilling     object
PaymentMethod        object
MonthlyCharges      float64
TotalCharges         object
Churn                object
dtype: object


DATA STANDARD FOR EACH COLUMN

1.   **CustomerID:** This column should be removed.

2.   **SeniorCitizen:** This column should be left as it is (0 for No and 1 for Yes).

3.   **Gender, Partner, Dependents, PhoneService, MultipleLines, OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV, StreamingMovies, PaperlessBilling, Churn:** These columns should be reduced to Yes or No (for those that do not yet have only 2 unique values) and then turned into 0 for No and 1 for Yes. Gender is also binary and is encoded as 0 for Male and 1 for Female.

4.   **MonthlyCharges, TotalCharges and Tenure:** These columns should be in float format.

5.   **InternetService, Contract and PaymentMethod:** These columns should be left as they are and encoded before being used to train the model.



In [None]:
print(telData.isnull().sum())
# No column reports missing values here (hidden empty values in TotalCharges only surface later, when it is converted to numeric)

customerID          0
gender              0
SeniorCitizen       0
Partner             0
Dependents          0
tenure              0
PhoneService        0
MultipleLines       0
InternetService     0
OnlineSecurity      0
OnlineBackup        0
DeviceProtection    0
TechSupport         0
StreamingTV         0
StreamingMovies     0
Contract            0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges        0
Churn               0
dtype: int64


Getting the dataset to conform to the data standard

In [None]:
# Changing some columns to Yes or No, to simplify the dataset.

columns_to_process = ['MultipleLines',
                      'OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
                      'TechSupport', 'StreamingTV', 'StreamingMovies',
                      'PaperlessBilling', 'Churn']

for col in columns_to_process:
  unique_values = telData[col].unique()
  for value in unique_values:
    if value not in ['Yes', 'No']:
      telData[col] = telData[col].replace(value, 'No')

In [None]:
# Changing Gender column to 0(Male) and 1(Female)
telData['gender'] = telData['gender'].map({'Male': 0, 'Female': 1})

In [None]:
# Changing Yes to 1 and No to 0 in some columns

columns_to_replace = ['PhoneService', 'Partner', 'Dependents', 'MultipleLines',
                      'OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
                      'TechSupport', 'StreamingTV', 'StreamingMovies',
                      'PaperlessBilling', 'Churn']

for col in columns_to_replace:
  telData[col] = telData[col].map({'No': 0, 'Yes': 1})

In [None]:
# Changing MonthlyCharges, TotalCharges and Tenure column to float

telData['MonthlyCharges'] = telData['MonthlyCharges'].astype(float)
telData['TotalCharges'] = pd.to_numeric(telData['TotalCharges'], errors='coerce')
telData['tenure'] = telData['tenure'].astype(float)

# Dropping the rows whose hidden empty values became NaN in the conversion above
telData.dropna(inplace=True)
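The `errors='coerce'` argument is what exposes the hidden empty values: entries that cannot be parsed as numbers become `NaN`, which `dropna` then removes. A small sketch on a toy series (the values are made up, and that the real column holds blank strings is an assumption):

```python
import pandas as pd

# Toy stand-in for the TotalCharges column, with one blank entry.
raw = pd.Series(["29.85", " ", "108.15"])

# Entries that cannot be parsed become NaN instead of raising an error.
converted = pd.to_numeric(raw, errors="coerce")

n_hidden = int(converted.isna().sum())
cleaned = converted.dropna()

print(n_hidden, len(cleaned))
```

Counting the `NaN` values right after the conversion, before dropping them, shows how many rows are lost.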

In [None]:
# Checking for duplicates based on customerID
duplicates = telData[telData.duplicated(subset=['customerID'], keep=False)]
print("Duplicate entries based on 'customerID':")
duplicates
#Fortunately there are no duplicates

Duplicate entries based on 'customerID':


Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn


In [None]:
# Removing the customerID column

telData.drop(['customerID'], axis=1, inplace=True)

In [None]:
#Checking out the dataset after cleaning
telData

Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,1,0,1,0,1.0,0,0,DSL,0,1,0,0,0,0,Month-to-month,1,Electronic check,29.85,29.85,0
1,0,0,0,0,34.0,1,0,DSL,1,0,1,0,0,0,One year,0,Mailed check,56.95,1889.50,0
2,0,0,0,0,2.0,1,0,DSL,1,1,0,0,0,0,Month-to-month,1,Mailed check,53.85,108.15,1
3,0,0,0,0,45.0,0,0,DSL,1,0,1,1,0,0,One year,0,Bank transfer (automatic),42.30,1840.75,0
4,1,0,0,0,2.0,1,0,Fiber optic,0,0,0,0,0,0,Month-to-month,1,Electronic check,70.70,151.65,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7038,0,0,1,1,24.0,1,1,DSL,1,0,1,1,1,1,One year,1,Mailed check,84.80,1990.50,0
7039,1,0,1,1,72.0,1,1,Fiber optic,0,1,1,0,1,1,One year,1,Credit card (automatic),103.20,7362.90,0
7040,1,0,1,1,11.0,0,0,DSL,1,0,0,0,0,0,Month-to-month,1,Electronic check,29.60,346.45,0
7041,0,1,1,0,4.0,1,1,Fiber optic,0,0,0,0,0,0,Month-to-month,1,Mailed check,74.40,306.60,1


# **EXPLORATORY DATA ANALYSIS**

See this in the EDA.ipynb file

# **MODEL DEVELOPMENT(Artificial Neural Network) / HANDLING DATA IMBALANCE**

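This section shows the effects of data imbalance and compares different ways of treating it: undersampling, oversampling, and ignoring the imbalance.
(To do: show the confusion matrices in colour and explain them.)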
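One way to see the effect of the imbalance is to score a baseline that predicts "no churn" for every customer. The sketch below uses the class counts reported for the unbalanced test split in this notebook (1033 non-churners, 374 churners); the baseline itself is an illustration, not one of the trained models.

```python
# Class counts of the unbalanced test split (support column of the reports).
n_stay, n_churn = 1033, 374

# Baseline: predict "no churn" (0) for everyone.
true_neg, false_pos = n_stay, 0
false_neg, true_pos = n_churn, 0

# Confusion matrix laid out as rows = actual class, columns = predicted class.
confusion = [
    [true_neg, false_pos],   # actual 0
    [false_neg, true_pos],   # actual 1
]

accuracy = (true_neg + true_pos) / (n_stay + n_churn)
churn_recall = true_pos / (true_pos + false_neg)

print(confusion)
print(f"accuracy={accuracy:.3f}, churn recall={churn_recall:.2f}")
```

Accuracy looks respectable even though not a single churner is caught, which is why the results are better read per class than by accuracy alone.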

## GENERAL PREPROCESSING

In [None]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import tensorflow as tf
import seaborn as sns
import math
from sklearn.preprocessing import StandardScaler
import os

# Set random seeds for reproducibility
SEED = 42
os.environ['PYTHONHASHSEED'] = str(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)

# The following line caused an error in this environment, it has been removed:
# tf.config.experimental.enable_determinism()

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split

In [None]:
encoder = OneHotEncoder(sparse_output=False, handle_unknown='ignore', drop='first')
scaler = StandardScaler()

# Define categorical and numerical columns for consistent preprocessing
categorical_cols = [col for col in telData.columns.to_list() if col not in ['tenure', 'MonthlyCharges', 'TotalCharges', 'Churn']]
numerical_cols = ['tenure', 'MonthlyCharges', 'TotalCharges']

print(f"Categorical columns for encoding: {categorical_cols}")
print(f"Numerical columns for scaling/concatenation: {numerical_cols}")

Categorical columns for encoding: ['gender', 'SeniorCitizen', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod']
Numerical columns for scaling/concatenation: ['tenure', 'MonthlyCharges', 'TotalCharges']


## UNDERSAMPLING TO REMOVE DATA IMBALANCE

In [None]:
# Separate majority and minority classes
df_majority = telData[telData.Churn == 0]
df_minority = telData[telData.Churn == 1]

# Undersample majority class
df_majority_undersampled = df_majority.sample(n=len(df_minority), random_state=42)

# Combine minority class with undersampled majority class
telData_balanced = pd.concat([df_majority_undersampled, df_minority])

# Shuffle the balanced dataframe
telData_balanced = telData_balanced.sample(frac=1, random_state=42).reset_index(drop=True)

# Display new class counts
print("Class counts after undersampling:")
print(telData_balanced.Churn.value_counts())

# Keep a copy of the balanced dataframe for the undersampling experiment
telData_undersample = telData_balanced.copy()

Class counts after undersampling:
Churn
0    1869
1    1869
Name: count, dtype: int64

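Undersampling before the train/test split also balances the test set, so the scores describe a 50/50 population rather than the real customer base. A possible alternative, sketched here on toy data, holds out the test rows first and undersamples only the training rows. The function name `split_then_undersample` and the toy frame are hypothetical and not part of this notebook.

```python
import pandas as pd


def split_then_undersample(df, target="Churn", test_frac=0.2, seed=42):
    # Hold out the test rows first so they keep the natural class ratio.
    test = df.sample(frac=test_frac, random_state=seed)
    train = df.drop(test.index)

    # Shrink every class in the training rows to the size of the smallest one.
    n_min = train[target].value_counts().min()
    parts = [grp.sample(n=n_min, random_state=seed) for _, grp in train.groupby(target)]

    # Shuffle the balanced training rows and reset their index.
    train_balanced = pd.concat(parts).sample(frac=1, random_state=seed)
    return train_balanced.reset_index(drop=True), test


# Toy data: 80 customers who stay, 20 who churn.
toy = pd.DataFrame({"tenure": range(100), "Churn": [0] * 80 + [1] * 20})
train_bal, test = split_then_undersample(toy)

print(train_bal["Churn"].value_counts().to_dict(), len(test))
```

With this ordering the test set has the same class ratio as the oversampling and no-resampling experiments, which makes the three reports directly comparable.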

In [None]:
x = telData_undersample.drop('Churn', axis = 1)
y = telData_undersample['Churn']

xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)

# Separate categorical and numerical features for encoding and scaling
xtrain_categorical = xtrain[categorical_cols]
xtrain_numerical = xtrain[numerical_cols]

xtest_categorical = xtest[categorical_cols]
xtest_numerical = xtest[numerical_cols]

# Encode categorical features
encoder.fit(xtrain_categorical)
xtrain_categorical_encoded = encoder.transform(xtrain_categorical)
xtest_categorical_encoded = encoder.transform(xtest_categorical)

# Scale numerical features
scaler.fit(xtrain_numerical)
xtrain_numerical_scaled = scaler.transform(xtrain_numerical)
xtest_numerical_scaled = scaler.transform(xtest_numerical)

# Concatenate encoded categorical features and scaled numerical features
xtrain = np.hstack((xtrain_categorical_encoded, xtrain_numerical_scaled))
xtest = np.hstack((xtest_categorical_encoded, xtest_numerical_scaled))

print("Shape of xtrain after preprocessing:", xtrain.shape)
print("Shape of xtest after preprocessing:", xtest.shape)

Shape of xtrain after preprocessing: (2990, 23)
Shape of xtest after preprocessing: (748, 23)
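The 23 columns can be accounted for by hand: with `drop='first'`, each categorical column contributes one column fewer than it has categories, and the three scaled numerical columns are appended unchanged. A quick check, with the category counts taken from the unique values in this notebook after the Yes/No simplification (four payment methods, as listed in the deployment code), and assuming every category appears in the training split:

```python
# Categorical columns that have exactly two categories after cleaning.
binary_cols = [
    "gender", "SeniorCitizen", "Partner", "Dependents", "PhoneService",
    "MultipleLines", "OnlineSecurity", "OnlineBackup", "DeviceProtection",
    "TechSupport", "StreamingTV", "StreamingMovies", "PaperlessBilling",
]
n_categories = {col: 2 for col in binary_cols}
n_categories.update({"InternetService": 3, "Contract": 3, "PaymentMethod": 4})

numerical_cols = ["tenure", "MonthlyCharges", "TotalCharges"]

# drop='first' keeps (categories - 1) indicator columns per feature.
encoded_width = sum(k - 1 for k in n_categories.values())
total_width = encoded_width + len(numerical_cols)

print(encoded_width, total_width)
```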


In [None]:
modelUS = Sequential()

modelUS.add(Dense(32, activation='relu', input_shape=(xtrain.shape[1],)))
modelUS.add(Dense(16, activation='relu'))
modelUS.add(Dense(8, activation='relu'))
modelUS.add(Dense(4, activation='relu'))
modelUS.add(Dense(2, activation='relu'))
modelUS.add(Dense(1, activation='sigmoid'))



In [None]:
modelUS.compile( optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In [None]:
history = modelUS.fit(xtrain, ytrain, epochs=20, batch_size=32, validation_split=0.2,verbose=1)

Epoch 1/20
75/75 ━━━━━━━━━━━━━━━━━━━━ 5s 12ms/step - accuracy: 0.5254 - loss: 0.6934 - val_accuracy: 0.6505 - val_loss: 0.6668
Epoch 2/20
75/75 ━━━━━━━━━━━━━━━━━━━━ 1s 10ms/step - accuracy: 0.6874 - loss: 0.6402 - val_accuracy: 0.7341 - val_loss: 0.6117
Epoch 3/20
75/75 ━━━━━━━━━━━━━━━━━━━━ 1s 18ms/step - accuracy: 0.7660 - loss: 0.5969 - val_accuracy: 0.7475 - val_loss: 0.6009
Epoch 4/20
75/75 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.7692 - loss: 0.5836 - val_accuracy: 0.7492 - val_loss: 0.5938
Epoch 5/20
75/75 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step - accuracy: 0.7750 - loss: 0.5733 - val_accuracy: 0.7408 - val_loss: 0.5876
Epoch 6/20
75/75 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step - accuracy: 0.7758 - loss: 0.5648 - val_accuracy: 0.7441 - val_loss: 0.5819
Epoch 7/20
75/75 ━━━━━━

In [None]:
y_prob = modelUS.predict(xtest)
y_pred = y_prob > 0.5

24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step


In [None]:
from sklearn.metrics import classification_report

print("Classification Report for Artificial Neural Network:")
print(classification_report(ytest, y_pred, labels=[0, 1]))

Classification Report for Artificial Neural Network:
              precision    recall  f1-score   support

           0       0.75      0.75      0.75       376
           1       0.75      0.74      0.74       372

    accuracy                           0.75       748
   macro avg       0.75      0.75      0.75       748
weighted avg       0.75      0.75      0.75       748

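With a fixed 0.5 cut-off, the churn recall stays well below the 98% target. One lever, not used in this notebook, is to lower the decision threshold and accept more false alarms. The sketch below works on made-up labels and probabilities; `recall_at` and `precision_at` are hypothetical helper names.

```python
def recall_at(threshold, y_true, y_prob):
    """Recall on class 1 when churn is predicted for probabilities above threshold."""
    caught = sum(1 for t, p in zip(y_true, y_prob) if t == 1 and p > threshold)
    return caught / sum(y_true)


def precision_at(threshold, y_true, y_prob):
    """Precision on class 1 at the same threshold (0.0 if nothing is flagged)."""
    flagged = [t for t, p in zip(y_true, y_prob) if p > threshold]
    return sum(flagged) / len(flagged) if flagged else 0.0


# Made-up labels and predicted churn probabilities, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.2, 0.6, 0.3, 0.3, 0.1, 0.1, 0.05]

for threshold in (0.5, 0.3, 0.15):
    r = recall_at(threshold, y_true, y_prob)
    p = precision_at(threshold, y_true, y_prob)
    print(f"threshold={threshold}: recall={r:.2f}, precision={p:.2f}")
```

In the notebook this would amount to replacing `y_prob > 0.5` with a lower cut-off chosen on validation data, at the cost of more customers flagged by mistake.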


## OVERSAMPLING TO REMOVE DATA IMBALANCE

In [None]:
from imblearn.over_sampling import SMOTE

x = telData.drop('Churn', axis = 1)
y = telData['Churn']

xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)

# Separate categorical and numerical features for encoding and scaling
xtrain_categorical = xtrain[categorical_cols]
xtrain_numerical = xtrain[numerical_cols]

xtest_categorical = xtest[categorical_cols]
xtest_numerical = xtest[numerical_cols]

# Encode categorical features
encoder.fit(xtrain_categorical)
xtrain_categorical_encoded = encoder.transform(xtrain_categorical)
xtest_categorical_encoded = encoder.transform(xtest_categorical)

# Scale numerical features
scaler.fit(xtrain_numerical)
xtrain_numerical_scaled = scaler.transform(xtrain_numerical)
xtest_numerical_scaled = scaler.transform(xtest_numerical)

# Concatenate encoded categorical features and scaled numerical features
xtrain = np.hstack((xtrain_categorical_encoded, xtrain_numerical_scaled))
xtest = np.hstack((xtest_categorical_encoded, xtest_numerical_scaled))

smote = SMOTE(sampling_strategy='minority')
xtrain_os, ytrain_os = smote.fit_resample(xtrain, ytrain)
print("After SMOTE:\n", ytrain_os.value_counts())

After SMOTE:
 Churn
1    4130
0    4130
Name: count, dtype: int64
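SMOTE creates each new minority sample by interpolating between an existing minority sample and one of its nearest minority-class neighbours. The sketch below shows only that interpolation step on two made-up points; it is a simplified illustration, not imblearn's implementation, and `interpolate` is a hypothetical helper.

```python
import random


def interpolate(point, neighbour, rng):
    """Return a synthetic sample on the line segment between two samples."""
    lam = rng.random()  # random position along the segment, in [0, 1)
    return [p + lam * (n - p) for p, n in zip(point, neighbour)]


rng = random.Random(42)

# Two made-up minority samples: [one-hot flag, scaled tenure, scaled charges].
a = [1.0, -0.5, 0.8]
b = [0.0, 0.3, 1.2]

synthetic = interpolate(a, b, rng)
print(synthetic)
```

Because the interpolation is applied to every column, one-hot encoded columns can take fractional values in the synthetic rows, as the first coordinate does here.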


In [None]:
modelOS = Sequential()

modelOS.add(Dense(32, activation='relu', input_shape=(xtrain_os.shape[1],)))
modelOS.add(Dense(16, activation='relu'))
modelOS.add(Dense(8, activation='relu'))
modelOS.add(Dense(4, activation='relu'))
modelOS.add(Dense(2, activation='relu'))
modelOS.add(Dense(1, activation='sigmoid'))



In [None]:
modelOS.compile( optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In [None]:
history = modelOS.fit(xtrain_os, ytrain_os, epochs=20, batch_size=32, validation_split=0.2,verbose=1)

Epoch 1/20
207/207 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6615 - loss: 0.5941 - val_accuracy: 0.6816 - val_loss: 0.8474
Epoch 2/20
207/207 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7688 - loss: 0.4939 - val_accuracy: 0.6925 - val_loss: 0.7790
Epoch 3/20
207/207 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7773 - loss: 0.4762 - val_accuracy: 0.6846 - val_loss: 0.7565
Epoch 4/20
207/207 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7846 - loss: 0.4652 - val_accuracy: 0.6858 - val_loss: 0.7230
Epoch 5/20
207/207 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7883 - loss: 0.4578 - val_accuracy: 0.6780 - val_loss: 0.7129
Epoch 6/20
207/207 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7892 - loss: 0.4519 - val_accuracy: 0.6780 - val_loss: 0.7006
Epoch 7/20
207/207
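The gap between training and validation loss in this run may partly be an artefact of how the validation set is chosen. Keras takes `validation_split` from the last rows of the data, before any shuffling, and the sketch assumes the resampler appends its synthetic rows after the original ones. The counts are approximations derived from the outputs in this notebook (4130 rows per class after SMOTE, of which roughly 1495 churn rows are original), and the list layout is a simplification.

```python
import random

# Approximate layout of the resampled training set: original rows first
# (order simplified), synthetic churn rows appended at the end.
n_original_stay, n_original_churn = 4130, 1495
n_synthetic = n_original_stay - n_original_churn  # rows added to balance the classes

rows = (
    [("original", 0)] * n_original_stay
    + [("original", 1)] * n_original_churn
    + [("synthetic", 1)] * n_synthetic
)


def validation_tail(data, split=0.2):
    """Mimic validation_split: hold out the last `split` share of the rows."""
    cut = int(len(data) * (1 - split))
    return data[cut:]


def synthetic_share(data):
    """Fraction of rows in `data` that are synthetic."""
    return sum(kind == "synthetic" for kind, _ in data) / len(data)


share_unshuffled = synthetic_share(validation_tail(rows))

# Shuffling first gives a validation set that mixes both classes and origins.
shuffled = rows[:]
random.Random(42).shuffle(shuffled)
share_shuffled = synthetic_share(validation_tail(shuffled))

print(share_unshuffled, round(share_shuffled, 2))
```

Under these assumptions the validation rows are all synthetic churners, so `val_accuracy` only measures how well those are recognised. Shuffling `xtrain_os` and `ytrain_os` together before `fit`, or passing an explicit `validation_data` set, would avoid this.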

In [None]:
y_prob = modelOS.predict(xtest)
y_pred = y_prob > 0.5

44/44 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step


In [None]:
from sklearn.metrics import classification_report

print("Classification Report for Artificial Neural Network:")
print(classification_report(ytest, y_pred, labels=[0, 1]))

Classification Report for Artificial Neural Network:
              precision    recall  f1-score   support

           0       0.84      0.87      0.86      1033
           1       0.60      0.54      0.57       374

    accuracy                           0.78      1407
   macro avg       0.72      0.71      0.71      1407
weighted avg       0.78      0.78      0.78      1407



## IGNORING DATA IMBALANCE

In [None]:
x = telData.drop('Churn', axis = 1)
y = telData['Churn']

xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)

# Separate categorical and numerical features for encoding and scaling
xtrain_categorical = xtrain[categorical_cols]
xtrain_numerical = xtrain[numerical_cols]

xtest_categorical = xtest[categorical_cols]
xtest_numerical = xtest[numerical_cols]

# Encode categorical features
encoder.fit(xtrain_categorical)
xtrain_categorical_encoded = encoder.transform(xtrain_categorical)
xtest_categorical_encoded = encoder.transform(xtest_categorical)

# Scale numerical features
scaler.fit(xtrain_numerical)
xtrain_numerical_scaled = scaler.transform(xtrain_numerical)
xtest_numerical_scaled = scaler.transform(xtest_numerical)

# Concatenate encoded categorical features and scaled numerical features
xtrain = np.hstack((xtrain_categorical_encoded, xtrain_numerical_scaled))
xtest = np.hstack((xtest_categorical_encoded, xtest_numerical_scaled))

In [None]:
model = Sequential()

model.add(Dense(32, activation='relu', input_shape=(xtrain.shape[1],)))
model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='sigmoid'))



In [None]:
model.compile( optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In [None]:
history = model.fit(xtrain, ytrain, epochs=20, batch_size=32, validation_split=0.2,verbose=1)

Epoch 1/20
141/141 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7239 - loss: 0.5484 - val_accuracy: 0.8062 - val_loss: 0.4204
Epoch 2/20
141/141 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step - accuracy: 0.7884 - loss: 0.4380 - val_accuracy: 0.8124 - val_loss: 0.4076
Epoch 3/20
141/141 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.7920 - loss: 0.4298 - val_accuracy: 0.8124 - val_loss: 0.4024
Epoch 4/20
141/141 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.7973 - loss: 0.4258 - val_accuracy: 0.8151 - val_loss: 0.4000
Epoch 5/20
141/141 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.8026 - loss: 0.4229 - val_accuracy: 0.8151 - val_loss: 0.3985
Epoch 6/20
141/141 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.8025 - loss: 0.4206 - val_accuracy: 0.8160 - val_loss: 0.3980
Epoch 7/20
141/141

In [None]:
y_prob = model.predict(xtest)
y_pred = y_prob > 0.5

44/44 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step


In [None]:
from sklearn.metrics import classification_report

print("Classification Report for Artificial Neural Network:")
print(classification_report(ytest, y_pred, labels=[0, 1]))

Classification Report for Artificial Neural Network:
              precision    recall  f1-score   support

           0       0.82      0.89      0.86      1033
           1       0.61      0.48      0.54       374

    accuracy                           0.78      1407
   macro avg       0.72      0.68      0.70      1407
weighted avg       0.77      0.78      0.77      1407



# **MODEL DEPLOYMENT**

## SAVING THE MODEL

In [None]:
import pickle

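# Note: `encoder` and `scaler` were last re-fitted in the "IGNORING DATA IMBALANCE"
# section, so the scaler statistics saved here come from a different training split
# than the undersampled one modelUS was trained on. Re-fit both on that split before saving.
bundle = {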
    "CCmodel": modelUS,
    "encoder": encoder,
    "columns": categorical_cols,
    "scaler": scaler
}

with open("CCmodel.pkl", "wb") as f:
    pickle.dump(bundle, f)


## STREAMLIT MODEL DEPLOYMENT CODE

In [None]:
import streamlit as st
import pandas as pd
import pickle
import numpy as np

# Load the model and preprocessing objects
@st.cache_resource
def load_model():
    with open("CCmodel.pkl", "rb") as f:
        bundle = pickle.load(f)
    return bundle["CCmodel"], bundle["encoder"], bundle["columns"], bundle["scaler"]

modelUS, encoder, categorical_cols_for_encoder, scaler = load_model()

def predict(raw_user_inputs):
  input_df = pd.DataFrame([raw_user_inputs])

  # 1. Collapse 'No phone service' / 'No internet service' into 'No'
  columns_to_normalize_no = ['MultipleLines', 'OnlineSecurity', 'OnlineBackup',
                             'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies']
  for col in columns_to_normalize_no:
      if col == 'MultipleLines':
          input_df[col] = input_df[col].replace('No phone service', 'No')
      else:
          input_df[col] = input_df[col].replace('No internet service', 'No')


  # 2. Map Gender to 0 (Male) and 1 (Female)
  input_df['gender'] = input_df['gender'].map({'Male': 0, 'Female': 1})

  # 3. Map 'Yes' to 1 and 'No' to 0 for relevant binary columns
  binary_cols_to_map = ['SeniorCitizen', 'Partner', 'Dependents', 'PhoneService',
                        'MultipleLines', 'OnlineSecurity', 'OnlineBackup',
                        'DeviceProtection', 'TechSupport', 'StreamingTV',
                        'StreamingMovies', 'PaperlessBilling']
  for col in binary_cols_to_map:
      input_df[col] = input_df[col].map({'No': 0, 'Yes': 1})

  # Separate numerical and categorical features
  numerical_cols = ['tenure', 'MonthlyCharges', 'TotalCharges']
  categorical_input_df = input_df[categorical_cols_for_encoder].copy()
  numerical_input_df = input_df[numerical_cols].copy()

  # One-hot encode categorical features
  encoded_categorical_features = encoder.transform(categorical_input_df)

  # Scale numerical features
  scaled_numerical_features = scaler.transform(numerical_input_df)

  # Concatenate encoded categorical features and scaled numerical features
  final_features = np.hstack((encoded_categorical_features, scaled_numerical_features))

  prediction_prob = modelUS.predict(final_features)[0][0]
  prediction_class = (prediction_prob > 0.5).astype(int)

  st.subheader('Prediction Result:')
  if prediction_class == 1:
      st.error(f'This customer is likely to CHURN! (Churn probability: {prediction_prob:.2f})')
  else:
      st.success(f'This customer is likely to stay. (Churn probability: {prediction_prob:.2f})')


# UI
st.title('Customer Churn Prediction')
st.write('Enter customer details to predict churn.')

st.sidebar.header('Customer Details')
raw_user_inputs = {}

gender_display_options = ['Female', 'Male']
yes_no_display_options = ['No', 'Yes']
multiple_lines_display_options = ['No phone service', 'No', 'Yes']
internet_service_display_options = ['DSL', 'Fiber optic', 'No']
online_service_display_options = ['No', 'Yes', 'No internet service']
contract_display_options = ['Month-to-month', 'One year', 'Two year']
payment_method_display_options = ['Electronic check', 'Mailed check', 'Bank transfer (automatic)', 'Credit card (automatic)']


raw_user_inputs['gender'] = st.sidebar.radio("What's your customer's Gender", gender_display_options, key='gender_ui')
raw_user_inputs['SeniorCitizen'] = st.sidebar.radio('Is the customer a Senior Citizen?', yes_no_display_options, key='seniorcitizen_ui')
raw_user_inputs['Partner'] = st.sidebar.radio('Does the customer have a Partner?', yes_no_display_options, key='partner_ui')
raw_user_inputs['Dependents'] = st.sidebar.radio('Does the customer have Dependents?', yes_no_display_options, key='dependents_ui')
raw_user_inputs['PhoneService'] = st.sidebar.radio('Does the customer have a Phone Service subscription?', yes_no_display_options, key='phoneservice_ui')
raw_user_inputs['MultipleLines'] = st.sidebar.selectbox('Does the customer have Multiple Lines?', multiple_lines_display_options, key='multiplelines_ui')
raw_user_inputs['InternetService'] = st.sidebar.selectbox('Does the customer have an Internet Service?', internet_service_display_options, key='internetservice_ui')
raw_user_inputs['OnlineSecurity'] = st.sidebar.selectbox('Does the customer have Online Security subscription?', online_service_display_options, key='onlinesecurity_ui')
raw_user_inputs['OnlineBackup'] = st.sidebar.selectbox('Does the customer have Online Backup?', online_service_display_options, key='onlinebackup_ui')
raw_user_inputs['DeviceProtection'] = st.sidebar.selectbox('Does the customer have Device Protection subscription?', online_service_display_options, key='deviceprotection_ui')
raw_user_inputs['TechSupport'] = st.sidebar.selectbox('Does the customer have Tech Support subscription?', online_service_display_options, key='techsupport_ui')
raw_user_inputs['StreamingTV'] = st.sidebar.selectbox('Does the customer have a Streaming TV subscription?', online_service_display_options, key='streamingtv_ui')
raw_user_inputs['StreamingMovies'] = st.sidebar.selectbox('Does the customer Stream Movies?', online_service_display_options, key='streamingmovies_ui')
raw_user_inputs['Contract'] = st.sidebar.selectbox('What kind of contract does the customer have?', contract_display_options, key='contract_ui')
raw_user_inputs['PaperlessBilling'] = st.sidebar.radio('Does the customer use Paperless Billing method?', yes_no_display_options, key='paperlessbilling_ui')
raw_user_inputs['PaymentMethod'] = st.sidebar.selectbox('What kind of Payment Method does the customer use?', payment_method_display_options, key='paymentmethod_ui')
raw_user_inputs['tenure'] = st.sidebar.slider('Tenure (months)?', min_value=1.0, max_value=72.0, value=32.0, step=1.0, key='tenure_ui')
raw_user_inputs['MonthlyCharges'] = st.sidebar.number_input("What are the customer's Monthly Charges?", min_value=18.25, max_value=118.75, value=65.0, step=0.01, key='monthlycharges_ui')
raw_user_inputs['TotalCharges'] = st.sidebar.number_input("What are the customer's Total Charges?", min_value=18.80, max_value=8684.80, value=2000.0, step=0.01, key='totalcharges_ui')


if st.sidebar.button('Predict Churn'):
    try:
        predict(raw_user_inputs)
    except Exception as e:

        st.error(f"Prediction error: {str(e)}")
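The cleaning steps inside `predict` could also be factored into a plain function that can be tested without Streamlit or the model. A minimal sketch under that assumption; `normalize_inputs` and `BINARY_COLS` are hypothetical names and not part of the app above.

```python
BINARY_COLS = [
    "SeniorCitizen", "Partner", "Dependents", "PhoneService",
    "MultipleLines", "OnlineSecurity", "OnlineBackup", "DeviceProtection",
    "TechSupport", "StreamingTV", "StreamingMovies", "PaperlessBilling",
]


def normalize_inputs(raw):
    """Apply the notebook's cleaning rules to one record of raw UI answers."""
    row = dict(raw)  # work on a copy so the caller's dict is untouched

    for col in BINARY_COLS:
        value = row[col]
        # Collapse the "no service" answers into a plain "No".
        if value in ("No phone service", "No internet service"):
            value = "No"
        # Yes/No answers become 1/0.
        row[col] = {"No": 0, "Yes": 1}[value]

    # Gender: 0 for Male, 1 for Female.
    row["gender"] = {"Male": 0, "Female": 1}[row["gender"]]
    return row


# Made-up record for illustration.
example = {col: "No" for col in BINARY_COLS}
example.update({
    "gender": "Female", "Partner": "Yes",
    "MultipleLines": "No phone service", "OnlineSecurity": "No internet service",
    "InternetService": "No", "Contract": "Month-to-month",
    "PaymentMethod": "Mailed check",
    "tenure": 5.0, "MonthlyCharges": 20.0, "TotalCharges": 100.0,
})

clean = normalize_inputs(example)
print(clean)
```

Only the binary columns are mapped to 0/1; `InternetService`, `Contract` and `PaymentMethod` keep their text values for the one-hot encoder.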