## Binary Classification using TensorFlow - Predict Customer Churn

## Business Problem
In this project, we use an artificial neural network (ANN) built with TensorFlow to perform binary classification. The dataset covers 10,000 randomly selected customers of a bank and includes several features, along with an indicator of whether each customer stayed with or left the bank in the last 6 months. We will train an ANN on this dataset and see how well it predicts customer churn. Knowing in advance which customers are more likely to leave would let the bank build a strategy to retain them.

## Data Processing

### Import Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

In [2]:
# Check the version of tensorflow
tf.__version__

'2.15.0'

### Import Data

In [3]:
# Read data from the CSV file into a pandas dataframe
df01 = pd.read_csv("/content/sample_data/Churn_Modelling.csv")

In [4]:
# Check first few records of the dataframe
df01.head()

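| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |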


### Exploratory Data Analysis (EDA)

In [6]:
# Shape of the dataframe
print("Number of records: ", df01.shape[0])
print("Number of features: ", df01.shape[1])

Number of records:  10000
Number of features:  14


In [7]:
# Data type of each column
df01.dtypes

RowNumber            int64
CustomerId           int64
Surname             object
CreditScore          int64
Geography           object
Gender              object
Age                  int64
Tenure               int64
Balance            float64
NumOfProducts        int64
HasCrCard            int64
IsActiveMember       int64
EstimatedSalary    float64
Exited               int64
dtype: object

#### Observation:
After reviewing the dataset, we see that the first 3 columns (`RowNumber`, `CustomerId`, and `Surname`) identify a record rather than describe the customer, so they should not help determine whether a customer will stay with or leave the bank. We are going to drop those 3 columns.

In [8]:
# Drop columns and create new dataframe
df02 = df01.drop(columns=['RowNumber', 'CustomerId', 'Surname'])

In [9]:
# Review the new dataframe
df02.head()

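| | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |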


In [10]:
# Review the statistical distribution of data in numerical columns
df02.describe()

| | CreditScore | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|
| count | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 |
| mean | 650.5288 | 38.9218 | 5.0128 | 76485.889288 | 1.5302 | 0.7055 | 0.5151 | 100090.239881 | 0.2037 |
| std | 96.653299 | 10.487806 | 2.892174 | 62397.405202 | 0.581654 | 0.45584 | 0.499797 | 57510.492818 | 0.402769 |
| min | 350.0 | 18.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 11.58 | 0.0 |
| 25% | 584.0 | 32.0 | 3.0 | 0.0 | 1.0 | 0.0 | 0.0 | 51002.11 | 0.0 |
| 50% | 652.0 | 37.0 | 5.0 | 97198.54 | 1.0 | 1.0 | 1.0 | 100193.915 | 0.0 |
| 75% | 718.0 | 44.0 | 7.0 | 127644.24 | 2.0 | 1.0 | 1.0 | 149388.2475 | 0.0 |
| max | 850.0 | 92.0 | 10.0 | 250898.09 | 4.0 | 1.0 | 1.0 | 199992.48 | 1.0 |
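
The mean of `Exited` in the summary is 0.2037, which means roughly 20% of the customers left the bank, so the two classes are not balanced. A minimal sketch of checking the class balance directly follows. Here `sample` is a hypothetical stand-in for `df02` that holds only the `Exited` values of the five rows shown by `df02.head()`, so its numbers differ from the full dataset.

```python
import pandas as pd

# Hypothetical stand-in for df02: the "Exited" values of the five rows in df02.head()
sample = pd.DataFrame({"Exited": [1, 0, 1, 0, 0]})

# Share of each class (0 = stayed, 1 = left the bank)
class_share = sample["Exited"].value_counts(normalize=True).sort_index()

# The mean of a 0/1 column equals the share of the positive class
churn_rate = sample["Exited"].mean()

print(class_share)
print(f"Churn rate: {churn_rate:.2%}")
```

On the real data, the same two lines can be applied to `df02["Exited"]`.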


In [11]:
# Review how records are distributed across each of the two categorical variables
print("Count of records by geography:")
print(df02['Geography'].value_counts())
print("\nCount of records by gender:")
print(df02['Gender'].value_counts())

Count of records by geography:
France     5014
Germany    2509
Spain      2477
Name: Geography, dtype: int64

Count of records by gender:
Male      5457
Female    4543
Name: Gender, dtype: int64


### Handle Missing Data
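
Before modeling, we can count the missing values in each column and decide how to treat them. The sketch below is one possible approach, not necessarily the one this project uses. `sample` is a hypothetical stand-in for `df02`, and the `NaN` in `Balance` is injected purely for illustration. Whether the real dataset has any gaps is not shown here. Filling with the median is an assumed policy.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df02; the NaN is injected for illustration only
sample = pd.DataFrame({
    "CreditScore": [619, 608, 502, 699, 850],
    "Geography": ["France", "Spain", "France", "France", "Spain"],
    "Balance": [0.0, 83807.86, np.nan, 0.0, 125510.82],
})

# Count missing values per column
missing_counts = sample.isnull().sum()
print(missing_counts)

# Assumed policy: fill numeric gaps with the column median
cleaned = sample.fillna({"Balance": sample["Balance"].median()})
print(cleaned)
```

If `df02.isnull().sum()` reports zero for every column, no imputation is needed and this step reduces to the check itself.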

### Encode Categorical Data
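
`Geography` and `Gender` are stored as text, so they need a numeric representation before they can be fed to a neural network. A minimal sketch using one-hot encoding with `pd.get_dummies` is shown below. This is one common choice and an assumption here, not necessarily the encoding this project uses. `sample` is a hypothetical stand-in for `df02` with made-up rows that cover all three countries and both genders.

```python
import pandas as pd

# Hypothetical stand-in for df02 with made-up rows covering every category
sample = pd.DataFrame({
    "CreditScore": [619, 608, 502, 699],
    "Geography": ["France", "Spain", "Germany", "France"],
    "Gender": ["Female", "Male", "Male", "Female"],
})

# One-hot encode both text columns; drop_first removes one redundant
# indicator per variable (France and Female become the baselines)
encoded = pd.get_dummies(
    sample, columns=["Geography", "Gender"], drop_first=True, dtype=int
)
print(encoded)
```

Dropping the first level keeps the information intact, because a row with zeros in both `Geography_Germany` and `Geography_Spain` can only be France.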

### Scale Numeric Data
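
The numeric columns live on very different scales (for example, `Age` ranges from 18 to 92 while `Balance` reaches about 250,000), which can slow down or destabilize neural network training. A minimal sketch of standardizing columns to zero mean and unit variance with plain pandas follows. The choice of standardization and the column list are assumptions, and `sample` is a hypothetical stand-in for `df02` built from the five rows shown by `df02.head()`.

```python
import pandas as pd

# Hypothetical stand-in for df02, using three columns of the rows in df02.head()
sample = pd.DataFrame({
    "CreditScore": [619, 608, 502, 699, 850],
    "Age": [42, 41, 42, 39, 43],
    "Balance": [0.0, 83807.86, 159660.8, 0.0, 125510.82],
})

numeric_cols = ["CreditScore", "Age", "Balance"]

# Column statistics; ddof=0 is the population standard deviation
col_mean = sample[numeric_cols].mean()
col_std = sample[numeric_cols].std(ddof=0)

# Standardize: subtract the mean and divide by the standard deviation
scaled = (sample[numeric_cols] - col_mean) / col_std
print(scaled.round(3))
```

When scaling happens before the train/test split, the statistics also see the test rows. Computing `col_mean` and `col_std` on the training portion alone and reusing them for the test portion is a common way to avoid that leakage. scikit-learn's `StandardScaler` performs the same computation.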

### Split Data for Training and Testing
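
To measure how well the model generalizes, we hold out part of the data for testing. A minimal sketch of a random 80/20 split using NumPy and pandas follows. The 20% test share and the seed are assumptions, and `sample` is a hypothetical stand-in for the processed dataframe, filled with made-up values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the processed dataframe (made-up values)
sample = pd.DataFrame({
    "CreditScore": [600 + 10 * i for i in range(10)],
    "Exited": [1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
})

# Separate the features from the target column
X = sample.drop(columns=["Exited"])
y = sample["Exited"]

# Shuffle row positions reproducibly, then cut off 20% for testing
rng = np.random.default_rng(42)
shuffled = rng.permutation(len(sample))
n_test = int(len(sample) * 0.2)
test_idx, train_idx = shuffled[:n_test], shuffled[n_test:]

X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]

print("Training rows:", len(X_train))
print("Testing rows:", len(X_test))
```

scikit-learn's `train_test_split` does the same in one call, and its `stratify` argument can keep the share of churned customers equal in both parts, which is useful when only about 20% of the records belong to the positive class.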