<a href="https://colab.research.google.com/github/AanchalSingh98/TELECOM-CHURN-PREDICTOR/blob/main/Internship_Project_Aanchal.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **MODEL : CHURN PREDICTOR FOR ConnectSphere Telecom**

# Step 1 : Importing Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix
from keras.models import Sequential
from keras.layers import Dense

In this block, I imported the modules needed to build this project.

# Step 2 : Importing The Dataset

In [None]:
df=pd.read_csv("/content/sample_data/TelecomCustomerChurn_Updated.csv")
df.head()

| | customerID | Gender | SeniorCitizen | Partner | Dependents | Tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn | Call Duration | Data Usage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No | DSL | No | ... | No | No | Monthly | Yes | Manual | 29.85 | 29.85 | No | 57 | 325 |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | No | No | One year | No | Manual | 56.95 | 1889.5 | No | 57 | 1324 |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | Monthly | Yes | Manual | 53.85 | 108.15 | Yes | 36 | 4065 |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No | DSL | Yes | ... | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No | 60 | 1064 |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | Monthly | Yes | Manual | 70.7 | 151.65 | Yes | 17 | 709 |


In this cell, I loaded the CSV dataset into a DataFrame named `df`, which all later steps operate on. I then called `head()` to confirm that the file was read correctly and to inspect the first 5 rows.

# Step 3 : Performing EDA

In [None]:
df.columns

Index(['customerID', 'Gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'Tenure', 'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn',
       'Call Duration', 'Data Usage'],
      dtype='object')

Here, I listed the column names to understand the structure of the dataset.

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 23 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   Gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   Tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 


Here, I checked the data type of each column. Many columns have the `object` dtype, which a neural network cannot consume directly, so they need to be converted to numeric form.
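As a minimal sketch of how non-numeric columns can be listed before deciding how to convert them, the snippet below uses a small made-up frame (`toy` and its values are illustrative assumptions, not rows from the real dataset).

```python
import pandas as pd

# Toy frame standing in for the real dataset (values are made up).
toy = pd.DataFrame({
    "Tenure": [1, 34, 2],
    "Contract": ["Monthly", "One year", "Monthly"],
    "TotalCharges": ["29.85", "1889.5", "108.15"],  # numbers stored as text
})

# Everything that is not numeric still needs converting or encoding.
non_numeric_cols = toy.select_dtypes(exclude="number").columns.tolist()
print(non_numeric_cols)

# Distinct values per column help separate true categories from
# identifiers or numbers that were read as text.
print(toy[non_numeric_cols].nunique())
```

A column with nearly as many distinct values as rows is usually an identifier or a disguised numeric column rather than a category.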

In [None]:
df.isnull().sum()

customerID         0
Gender             0
SeniorCitizen      0
Partner            0
Dependents         0
Tenure             0
PhoneService       0
MultipleLines      0
InternetService    0
OnlineSecurity     0


Here, I checked for null values. Every column reports a count of zero, so no rows need to be dropped or imputed at this stage.
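One caveat is that `isnull()` only counts real NaN values, so a blank string in a text column goes unnoticed. The sketch below illustrates this on made-up data; whether the real `TotalCharges` column contains such blanks is an assumption that this example does not verify.

```python
import pandas as pd

# Made-up values: the middle entry is a blank string, not NaN.
toy = pd.DataFrame({"TotalCharges": ["29.85", " ", "108.15"]})

# A blank string is a valid Python object, so isnull() does not flag it.
missing_before = int(toy["TotalCharges"].isnull().sum())

# Coercing to numeric turns unparseable entries into NaN, exposing them.
as_number = pd.to_numeric(toy["TotalCharges"], errors="coerce")
missing_after = int(as_number.isnull().sum())

print(missing_before, missing_after)
```

This is why a null check is worth repeating after any text column has been converted to numbers.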

# Step 4 : Preprocessing

In [None]:
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")  # unparseable entries become NaN
# Assign back rather than calling fillna(inplace=True) on a column selection.
df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].median())

df["Churn"] = df["Churn"].map({"No": 0, "Yes": 1})  # encode the target as 0/1

df = pd.get_dummies(df, drop_first=True)  # one-hot encode the remaining text columns


Here, I resolved the object dtype problem by converting `TotalCharges` to a numeric column, mapping `Churn` to 0/1, and one-hot encoding the remaining text columns as dummy variables in the DataFrame.
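One thing to watch with `pd.get_dummies` is that it encodes every text column, including identifiers. The sketch below, on a made-up four-row frame, compares encoding everything with dropping the identifier first; dropping `customerID` is a suggested variation, not what the cell above does.

```python
import pandas as pd

# Made-up rows; only the column names mirror the real dataset.
toy = pd.DataFrame({
    "customerID": ["7590-VHVEG", "5575-GNVDE", "3668-QPYBK", "7795-CFOCW"],
    "Contract": ["Monthly", "One year", "Monthly", "One year"],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
})

# Encoding everything: each unique customerID becomes its own column.
encoded_all = pd.get_dummies(toy, drop_first=True)

# Dropping the identifier first keeps only the genuine categorical features.
encoded_clean = pd.get_dummies(toy.drop(columns=["customerID"]), drop_first=True)

print(encoded_all.shape[1], encoded_clean.shape[1])
```

On a dataset with thousands of customers, the first variant adds one column per customer, which carries no information that generalizes to unseen customers.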

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Columns: 7067 entries, SeniorCitizen to PaymentMethod_Manual
dtypes: bool(7060), float64(2), int64(5)
memory usage: 47.8 MB


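Here, I verified the result: the only remaining dtypes are float, int, and bool, so no object columns are left. The frame has also grown to 7067 columns, 7060 of them boolean dummies for only 7043 rows, which suggests that the unique `customerID` values were one-hot encoded along with the real categorical features.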

# Step 5 : Splitting For Training And Testing


In [None]:
X = df.drop(columns=["Churn"])
y = df["Churn"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

In this cell, I split the dataset into a training set (80%) and a test set (20%), stratified on `Churn` so that both sets keep the same class balance.
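To illustrate what `stratify` does, here is a small sketch on synthetic data. The labels are made up with about 27% positives, roughly the share of churners in the test report; `X_toy` and `y_toy` are placeholder names.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels (27% positives) and random placeholder features.
rng = np.random.default_rng(0)
y_toy = np.array([1] * 270 + [0] * 730)
X_toy = rng.normal(size=(1000, 3))

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

# With stratify, both splits keep the original share of positives.
print(y_tr.mean(), y_te.mean())
```

Without `stratify`, the share of positives in a small test set can drift from the overall share purely by chance.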

# Step 6 : Feature Scaling


In [None]:
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

Here, I standardized the features for the ANN. The scaler is fitted on the training inputs only and then applied to the test inputs, so that features on very different scales do not hinder training and no test-set statistics leak into the model.
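The effect of `StandardScaler` can be checked on synthetic data. This is a minimal sketch with made-up arrays (`train` and `test` are placeholders), showing that the training data ends up with zero mean and unit variance while the test data reuses the training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
train = rng.normal(loc=50.0, scale=10.0, size=(200, 2))
test = rng.normal(loc=50.0, scale=10.0, size=(50, 2))

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train only
test_scaled = scaler.transform(test)        # reuses the training statistics

# Training columns are centred at 0 with unit standard deviation.
print(train_scaled.mean(axis=0).round(6), train_scaled.std(axis=0).round(6))
```

The scaled test columns will be close to, but not exactly, zero mean and unit variance, which is expected because they were not used to fit the scaler.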

# Step 7 : Building ANN

In [None]:
model = Sequential()
model.add(Dense(40, activation='relu', input_shape=(X.shape[1],)))
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(5, activation='relu'))
model.add(Dense(1, activation='sigmoid'))




In this cell, I built a binary classification model with 5 Dense layers: 4 hidden layers (40, 20, 10, and 5 units) that use the ReLU activation, and 1 output layer with a single unit that uses the sigmoid activation. The input is not a separate layer; its size is declared through `input_shape` on the first Dense layer.
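The size of this network can be worked out by hand. The sketch below counts weights and biases for a stack of Dense layers; `dense_param_count` is a hypothetical helper, and the input width of 7066 is an inference from the 7067 columns reported by `df.info()` minus the `Churn` target.

```python
def dense_param_count(n_inputs, layer_units):
    """Total weights and biases for a stack of fully connected layers."""
    total = 0
    fan_in = n_inputs
    for units in layer_units:
        total += fan_in * units + units  # weight matrix plus one bias per unit
        fan_in = units
    return total

# Assumed input width: 7067 encoded columns minus the Churn target.
n_params = dense_param_count(7066, [40, 20, 10, 5, 1])
first_layer = dense_param_count(7066, [40])
print(n_params, first_layer)
```

Almost all parameters sit in the first layer, so the width of the encoded input drives the capacity of the model far more than the hidden layer sizes do.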

# Step 8 : Training The ANN

In [None]:
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

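Here, I compiled the model with the Adam optimizer, binary cross-entropy as the loss, and accuracy as the reported metric. Training itself happens in the next cell.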
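For intuition about the loss, binary cross-entropy can be computed by hand with NumPy. This is a minimal sketch with made-up labels and probabilities; the clipping constant `eps` is an assumption for numerical safety and not necessarily the value Keras uses.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

y_true = np.array([1, 0, 1, 0])
mostly_right = np.array([0.9, 0.1, 0.8, 0.2])
mostly_wrong = np.array([0.1, 0.9, 0.2, 0.8])

loss_right = binary_cross_entropy(y_true, mostly_right)
loss_wrong = binary_cross_entropy(y_true, mostly_wrong)
print(loss_right, loss_wrong)
```

The loss is small when the predicted probability is close to the true label and grows quickly for confident mistakes.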

In [None]:
model.fit(X_train, y_train,batch_size = 32, epochs = 100)

Epoch 1/100
177/177 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.6946 - loss: 0.5920
Epoch 2/100
177/177 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.8727 - loss: 0.3731
Epoch 3/100
177/177 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.9740 - loss: 0.0769
Epoch 4/100
177/177 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.9963 - loss: 0.0144
Epoch 5/100
177/177 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.9994 - loss: 0.0038
Epoch 6/100
177/177 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.9992 - loss: 0.0026
Epoch 7/100
177/177 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.9996 - loss: 0.0013
Epoch 8/100
177/177 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.9995 - loss: 9.1576e-04
Epoch 9/100
177/177

<keras.src.callbacks.history.History at 0x7ec63074cce0>

In this cell, I fitted the model on the training data for 100 epochs with a batch size of 32.
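The `177/177` counter in the training log is the number of batches per epoch, and it can be reproduced from the dataset size. A short sketch of the arithmetic, assuming scikit-learn rounds the test share up as it does for a fractional `test_size`:

```python
import math

n_rows = 7043      # row count reported by df.info()
test_size = 0.2
batch_size = 32

n_test = math.ceil(n_rows * test_size)  # test share is rounded up
n_train = n_rows - n_test
steps_per_epoch = math.ceil(n_train / batch_size)  # last batch may be partial

print(n_train, n_test, steps_per_epoch)
```

The resulting test size also matches the total support of 1409 in the classification report.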

# Step 9 : Prediction and Evaluation


In [None]:
# Threshold the sigmoid outputs at 0.5 and flatten to a 1-D array of 0/1 labels.
y_pred = (model.predict(X_test) > 0.5).astype(int).ravel()
print(classification_report(y_test, y_pred))

45/45 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step
              precision    recall  f1-score   support

           0       0.74      1.00      0.85      1035
           1       1.00      0.00      0.01       374

    accuracy                           0.74      1409
   macro avg       0.87      0.50      0.43      1409
weighted avg       0.81      0.74      0.62      1409



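Finally, the model made predictions on the test set, and the classification report is shown above. Overall accuracy is 0.74, but recall for the churn class (1) is 0.00, so the model labels almost every test customer as non-churn. Set against a training accuracy above 0.99, this indicates that the model has overfit the training data and does not yet generalize.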
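To see how the numbers in the report relate to each other, the metrics can be recomputed from confusion-matrix counts. The counts below are hypothetical: they are chosen to be consistent with the rounded report and its supports (1035 and 374), since the notebook itself does not print a confusion matrix.

```python
# Hypothetical counts consistent with the rounded classification report;
# they are an inference, not output produced by the notebook.
tn, fp, fn, tp = 1035, 0, 373, 1

precision_1 = tp / (tp + fp)          # of predicted churners, how many churned
recall_1 = tp / (tp + fn)             # of actual churners, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)
precision_0 = tn / (tn + fn)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(round(precision_1, 2), round(recall_1, 2), round(f1_1, 2))
print(round(precision_0, 2), round(accuracy, 2))
```

Under these assumed counts, the accuracy of 0.74 comes almost entirely from the majority class, which is why recall on the churn class is the more informative number here.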