## Customer Churn Prediction Using Artificial Neural Network (ANN)
### Customer churn prediction aims to identify which customers are likely to leave a business and why

![](https://images.squarespace-cdn.com/content/v1/588f9607bebafbc786f8c5f8/1607924812500-Y1JR8L6XP5NKF2YPHDUX/image6.png)

In [None]:
from matplotlib import image
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler
import tensorflow as tf
from tensorflow import keras
%matplotlib inline

### Load the data

In [None]:
df = pd.read_csv("../input/telco-customer-churn/WA_Fn-UseC_-Telco-Customer-Churn.csv")
df.head()

## Drop customerID, since it carries no predictive information

In [None]:
df.drop('customerID',axis='columns',inplace=True)

In [None]:
df.dtypes

### The dtypes show that TotalCharges is stored as object (strings), so we convert it to a numeric type

In [None]:
df.TotalCharges.values

In [None]:
df[pd.to_numeric(df.TotalCharges,errors='coerce').isnull()].shape
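To see why the check above flags the problem rows, here is a minimal sketch on a made-up Series (the values are illustrative, not taken from the dataset). With `errors='coerce'`, any string that cannot be parsed, including a single space, becomes NaN instead of raising an error.

```python
import pandas as pd

# Toy stand-in for the TotalCharges column (values are made up)
charges = pd.Series(["29.85", " ", "108.15", " "])

# Strings that cannot be parsed become NaN instead of raising an error
parsed = pd.to_numeric(charges, errors="coerce")

# Boolean mask of the rows that failed to parse
bad_rows = parsed.isnull()
print(bad_rows.sum())   # number of unparseable rows
print(parsed.dtype)     # numeric dtype after conversion
```

Indexing the original frame with such a mask, as done above, returns exactly the rows that need attention.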

## Remove rows with space in TotalCharges

In [None]:
# Take an explicit copy so later assignments modify df1 itself, not a view of df
df1 = df[df.TotalCharges!=" "].copy()
df1.shape

In [None]:
df1.TotalCharges = pd.to_numeric(df1.TotalCharges)
df1.TotalCharges.dtypes

## Data Visualisation, No. of customers vs Tenure

In [None]:
tenure_churn_no = df1[df1.Churn == 'No'].tenure
tenure_churn_yes = df1[df1.Churn == 'Yes'].tenure

# The label order must match the data order: churned customers first, retained second
plt.hist([tenure_churn_yes,tenure_churn_no],color=['green','red'],label=['Churn="Yes"','Churn="No"'])
plt.legend()
plt.xlabel('Tenure')
plt.ylabel('No. of customers')
plt.title('Customer churn by tenure')

## Let's check the unique values in each categorical column

In [None]:
for column in df1:
  if df1[column].dtypes == object:  
    print(column+" :",df1[column].unique())

#### Some columns contain 'No internet service' or 'No phone service', which effectively mean 'No', so we replace those values with 'No'

In [None]:
df1.replace('No internet service','No',inplace=True)
df1.replace('No phone service','No',inplace=True)

In [None]:
for column in df1:
  if df1[column].dtypes == object:  
    print(column+" :",df1[column].unique())

### Converting all 'Yes' to 1 and all 'No' to 0

In [None]:
yes_no_columns = ['Partner','Dependents','PhoneService','MultipleLines','OnlineSecurity','OnlineBackup','DeviceProtection',
                  'TechSupport', 'StreamingTV', 'StreamingMovies','PaperlessBilling','Churn']
            
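for col in yes_no_columns:
  # Assign the result back; replace(inplace=True) on a column selection
  # may not modify df1 in recent pandas versions
  df1[col] = df1[col].replace({'Yes':1,'No':0})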

In [None]:
for column in df1:
  if df1[column].dtypes == object:  
    print(column+" :",df1[column].unique())

In [None]:
# Assign the result back instead of relying on inplace=True on a column selection
df1['gender'] = df1['gender'].replace({'Female':1,'Male':0})

In [None]:
for column in df1:
  if df1[column].dtypes == object:  
    print(column+" :",df1[column].unique())

### One hot encoding for categorical columns

In [None]:
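# dtype=int yields 0/1 columns; newer pandas versions default to booleans,
# which mix poorly with the float columns when the frame is passed to Keras
df2 = pd.get_dummies(data=df1,columns=['InternetService','Contract','PaymentMethod'],dtype=int)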

In [None]:
df2.columns
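As a small illustration of what one-hot encoding does, the sketch below applies `pd.get_dummies` to a toy frame (the rows are made up). Each category becomes its own 0/1 column named `<column>_<category>`, and every row has exactly one of them set.

```python
import pandas as pd

# Toy frame with one categorical column (values are illustrative)
toy = pd.DataFrame({
    "tenure": [1, 12, 48, 20],
    "Contract": ["Month-to-month", "One year", "Two year", "One year"],
})

# dtype=int requests integer 0/1 columns rather than booleans
encoded = pd.get_dummies(data=toy, columns=["Contract"], dtype=int)
print(encoded.columns.tolist())

# Exactly one indicator is set per row
contract_cols = [c for c in encoded.columns if c.startswith("Contract_")]
print(encoded[contract_cols].sum(axis=1).tolist())
```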

In [None]:
cols = ['tenure','MonthlyCharges','TotalCharges']
scaler = MinMaxScaler()
df2[cols] = scaler.fit_transform(df2[cols])

In [None]:
df2.sample(5)
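Min-max scaling maps each column onto the range [0, 1] using (x - min) / (max - min). The sketch below reproduces that arithmetic with plain NumPy on made-up tenure values, to show what the scaler computes per column.

```python
import numpy as np

# Made-up tenure values in months
tenure = np.array([1.0, 12.0, 48.0, 72.0])

# (x - min) / (max - min): the smallest value maps to 0, the largest to 1
scaled = (tenure - tenure.min()) / (tenure.max() - tenure.min())
print(scaled)
```

Scaling keeps large-valued columns such as TotalCharges from dominating the 0/1 features during training.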

In [None]:
X = df2.drop('Churn',axis='columns')
y = df2['Churn']

### Train Test Split

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=5)
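The split above is purely random. If the two churn classes are imbalanced, one optional variation (not part of the original workflow) is to pass `stratify` so that both splits keep the same class ratio. A minimal sketch on made-up data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 rows with 20% positives (made up, not the Telco data)
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.array([0] * 80 + [1] * 20)

# stratify=y_toy preserves the 80/20 class ratio in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=5, stratify=y_toy)

print(y_tr.mean(), y_te.mean())
```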

In [None]:
X_train.shape

In [None]:
X_test.shape

In [None]:
len(X_train.columns)

### Build a model (ANN) in TensorFlow/Keras

![](https://upload.wikimedia.org/wikipedia/commons/thumb/3/3d/Neural_network.svg/250px-Neural_network.svg.png)

In [None]:
model = keras.Sequential([
        # One input per feature column (26 after one-hot encoding)
        keras.layers.Dense(20, input_shape=(X_train.shape[1],), activation='relu'),
        keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer = 'adam',
              loss = 'binary_crossentropy',
              metrics = ['accuracy']
)

model.fit(X_train,y_train,epochs=100)
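To make the architecture concrete, here is a rough NumPy sketch of what a network of this shape computes: 26 inputs, a 20-unit ReLU hidden layer, and a single sigmoid output, together with the binary cross-entropy loss. The weights are random placeholders and the helper names (`forward`, `binary_crossentropy`) are hypothetical; this is not how Keras implements it internally.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden = 26, 20

# Random placeholder weights; a trained model would hold learned values
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, 1))
b2 = np.zeros(1)

def forward(x):
    hidden = np.maximum(0.0, x @ W1 + b1)   # dense layer followed by ReLU
    logits = hidden @ W2 + b2               # single output unit
    return 1.0 / (1.0 + np.exp(-logits))    # sigmoid turns it into a probability

def binary_crossentropy(y_true, p, eps=1e-7):
    # Flatten both inputs so shapes (n,) and (n, 1) are handled alike
    y_true = np.asarray(y_true, dtype=float).ravel()
    p = np.clip(np.ravel(p), eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

x_batch = rng.random(size=(4, n_features))  # four made-up customers
probs = forward(x_batch)
print(probs.shape)                          # (4, 1)

# Trainable parameters: 26*20 + 20 for the hidden layer, 20 + 1 for the output
n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)                             # 561
```

Training adjusts the weights to reduce this loss over the training set; the Adam optimizer chosen in `compile` decides how each update is made.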

### Training reaches an accuracy of 82%; next we evaluate the accuracy on the test data

In [None]:
model.evaluate(X_test,y_test)

In [None]:
yp = model.predict(X_test)
yp[:5]

In [None]:
y_test[:10]

### To compare the predicted values with the actual values, we treat any predicted value > 0.5 as 1 and all others as 0

In [None]:
y_pred = []
for element in yp:
  if element>0.5:
    y_pred.append(1)
  else:
    y_pred.append(0)
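The same thresholding can be done without an explicit loop. A minimal sketch, using a made-up array in place of the real `model.predict` output:

```python
import numpy as np

# Stand-in for model.predict output: shape (n_samples, 1), values made up
yp_toy = np.array([[0.12], [0.50], [0.73], [0.49], [0.91]])

# Compare every probability to the threshold at once, then flatten to 1-D
y_pred_toy = (yp_toy > 0.5).astype(int).ravel()
print(y_pred_toy)   # [0 0 1 0 1]
```

Note that a probability of exactly 0.5 maps to 0 here, matching the strict `>` used in the loop.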

In [None]:
y_pred[:10]

## Comparing the predicted and actual values gives us the classification metrics
## 1. Confusion Matrix
![](https://glassboxmedicine.files.wordpress.com/2019/02/confusion-matrix.png)

In [None]:
cm = tf.math.confusion_matrix(labels=y_test,predictions=y_pred)
plt.figure(figsize=(10,7))
sns.heatmap(cm,annot=True, fmt='d')
plt.xlabel('Predicted')
plt.ylabel('Truth')
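For intuition, a confusion matrix is just a table of counts indexed by (true label, predicted label). The sketch below builds one by hand for made-up labels, with rows as truth and columns as predictions, the same layout as the heatmap.

```python
import numpy as np

# Made-up labels for eight customers
y_true_toy = np.array([0, 0, 1, 1, 0, 1, 0, 0])
y_pred_toy = np.array([0, 1, 1, 0, 0, 1, 0, 0])

# Rows index the true class, columns the predicted class
cm_toy = np.zeros((2, 2), dtype=int)
for t, p in zip(y_true_toy, y_pred_toy):
    cm_toy[t, p] += 1

print(cm_toy)
# [[4 1]    4 true negatives, 1 false positive
#  [1 2]]   1 false negative, 2 true positives
```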

## 2. Classification Report

![](https://static.packt-cdn.com/products/9781785282287/graphics/B04223_10_02.jpg)

In [None]:
from sklearn.metrics import confusion_matrix, classification_report

# classification_report expects the true labels first, then the predictions
print(classification_report(y_test,y_pred))
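The figures in the report follow directly from the confusion-matrix counts. A worked example with made-up counts for the positive (churn) class:

```python
# Made-up counts: true positives, false positives, false negatives, true negatives
tp, fp, fn, tn = 2, 1, 1, 4

precision = tp / (tp + fp)    # of the predicted churners, how many really churned
recall = tp / (tp + fn)       # of the actual churners, how many were caught
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(round(precision, 3), round(recall, 3), round(f1, 3), accuracy)
```

Because precision and recall swap roles when the two arguments are swapped, the order in which labels and predictions are passed to `classification_report` matters.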
