<h1> Artificial Neural Networks for Classification Problems </h1>

<p> Artificial Neural Networks (ANN), also known as neural networks or simulated neural networks (SNNs), are a subset of machine learning and are at the heart of deep learning algorithms. Their name and structure are inspired by the human brain, mimicking the way that biological neurons signal to one another. </p>

<p> Artificial neural networks (ANNs) consist of layers of nodes: an input layer, one or more hidden layers, and an output layer. Each node, or artificial neuron, connects to nodes in the next layer and has associated weights and a threshold. If the output of a node is above the threshold value, the node is activated and sends data to the next layer of the network. Otherwise, it passes no data along. </p>
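As a rough illustration of the idea above, a single neuron can be sketched in NumPy. The input values, weights, and bias here are made up for the example, and ReLU is used as the activation, so the bias acts as a negative threshold:

```python
import numpy as np

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through ReLU."""
    z = np.dot(inputs, weights) + bias   # the bias acts like a negative threshold
    return max(0.0, z)                   # ReLU: pass a signal only when z is positive

# Hypothetical values, chosen only for illustration
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, 0.1])

active = neuron_output(x, w, bias=-0.05)   # weighted sum 0.1 exceeds 0.05 -> about 0.05
silent = neuron_output(x, w, bias=-0.5)    # weighted sum 0.1 is below 0.5  -> 0.0
print(active, silent)
```

With the smaller threshold the neuron passes a positive value on to the next layer; with the larger one it outputs 0.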

We will be using the following:
* Python
* Jupyter Notebook / Google Colab
* Pandas
* NumPy
* TensorFlow
* Scikit-Learn

For our modeling, we will use the Churn Modelling dataset from Kaggle. You can find it [here](https://www.kaggle.com/datasets/shrutimechlearn/churn-modelling).

The dataset contains the following attributes:
* RowNumber: Row number of the record
* CustomerId: Identifier of the customer
* Surname: Surname of the customer
* CreditScore: Credit score of the customer
* Geography: Country in which the customer lives
* Gender: Gender of the customer
* Age: Age of the customer
* Tenure: Tenure of the customer with the bank
* Balance: Balance held by the customer
* NumOfProducts: Number of bank services used by the customer
* HasCrCard: Whether the customer has a credit card or not
* IsActiveMember: Whether the customer is an active member or not
* EstimatedSalary: Estimated salary of the customer
* Exited: Whether the customer exits the bank or not

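<p> Our main objective is to build an artificial neural network that uses the independent variables to predict whether a customer will leave the bank (Exited is the dependent variable). Of the 13 independent columns, the code below keeps CreditScore through EstimatedSalary and drops the identifier columns RowNumber, CustomerId, and Surname. </p>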


In [3]:
# Importing the necessary libraries 
import numpy as np
import pandas as pd
import tensorflow as tf

In [5]:
# Loading the data
data = pd.read_csv('/content/sample_data/Churn_Modelling.csv')
data.head()

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.0,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.0,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0


In [7]:
# Declaring the independent and dependent variables 
X = data.iloc[:,3:-1].values 
y = data.iloc[:,-1].values

Now that we have declared X and y, we must encode the categorical variables Gender and Geography (the customer's country) as numbers.

In [10]:
# Encoding our categorical variable Gender using LabelEncoder()
from sklearn.preprocessing import LabelEncoder
LE1 = LabelEncoder()
X[:,2] = np.array(LE1.fit_transform(X[:,2]))
X

array([[619, 'France', 0, ..., 1, 1, 101348.88],
       [608, 'Spain', 0, ..., 0, 1, 112542.58],
       [502, 'France', 0, ..., 1, 0, 113931.57],
       ...,
       [709, 'France', 0, ..., 0, 1, 42085.58],
       [772, 'Germany', 1, ..., 1, 0, 92888.52],
       [792, 'France', 0, ..., 1, 0, 38190.78]], dtype=object)
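To make the mapping explicit, here is a minimal sketch of what LabelEncoder does, using a toy list rather than the real Gender column. The classes are sorted alphabetically, which is why Female becomes 0 in the array above:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(["Female", "Male", "Female", "Male"])  # toy values

print(le.classes_.tolist())                      # ['Female', 'Male']
print(codes.tolist())                            # [0, 1, 0, 1]
print(le.inverse_transform([1, 0]).tolist())     # ['Male', 'Female']
```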

In [12]:
# List the unique countries in our dataset
data['Geography'].unique()

array(['France', 'Spain', 'Germany'], dtype=object)

To encode our second categorical variable, Geography, we will use one-hot encoding, since it has 3 unordered categories. In one-hot encoding, each category becomes its own binary column of 0s and 1s. This prevents the machine learning algorithm from assuming that a higher category number is more important.

<br>

We will use ColumnTransformer for this encoding. This estimator transforms different columns or column subsets of the input separately and concatenates the features generated by each transformer into a single feature space. This is useful for heterogeneous or columnar data, where several feature extraction mechanisms or transformations are combined into a single transformer. You can find the documentation [here](https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html).

In [13]:
# Encoding our categorical variable Geography (column 1 of X) with one-hot encoding
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder="passthrough")
X = np.array(ct.fit_transform(X))
X

array([[1.0, 0.0, 0.0, ..., 1, 1, 101348.88],
       [0.0, 0.0, 1.0, ..., 0, 1, 112542.58],
       [1.0, 0.0, 0.0, ..., 1, 0, 113931.57],
       ...,
       [1.0, 0.0, 0.0, ..., 0, 1, 42085.58],
       [0.0, 1.0, 0.0, ..., 1, 0, 92888.52],
       [1.0, 0.0, 0.0, ..., 1, 0, 38190.78]], dtype=object)
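The first three columns of this array are the one-hot columns. A small sketch on a toy country column (not the real data) shows how the encoder orders them: categories are sorted alphabetically, so the columns stand for France, Germany, and Spain.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

countries = np.array([["France"], ["Spain"], ["Germany"], ["France"]])  # toy column
encoder = OneHotEncoder()
one_hot = encoder.fit_transform(countries).toarray()  # default output is sparse

print(encoder.categories_[0].tolist())  # ['France', 'Germany', 'Spain']
print(one_hot.tolist())
# [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
```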

In [15]:
#Splitting dataset into training and testing dataset
from sklearn.model_selection import train_test_split
X_train,X_test,Y_train,Y_test = train_test_split(X,y,test_size=0.2,random_state=42)

Next, we will perform feature scaling. This can be done in one of two ways:
* Normalizing
* Standardizing

Standardization rescales each feature to zero mean and unit variance, so most values end up roughly between -3 and +3, although the range is not bounded. Normalization (min-max scaling) rescales each feature to a fixed range, typically 0 to 1.

Normalization is sensitive to outliers, because a single extreme value sets the minimum or the maximum. Standardization does not force the values into a fixed range and is a common default for neural network inputs. Here we are going to use standardization.
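As a small worked example of the two options, the sketch below scales a made-up column of ages with scikit-learn's StandardScaler (subtract the mean, divide by the standard deviation) and MinMaxScaler (map the minimum to 0 and the maximum to 1). The numbers are hypothetical and serve only to show the resulting ranges:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

ages = np.array([[20.0], [30.0], [40.0], [50.0], [60.0]])  # hypothetical values

standardized = StandardScaler().fit_transform(ages)  # (x - mean) / std
normalized = MinMaxScaler().fit_transform(ages)      # (x - min) / (max - min)

print(standardized.ravel().round(3))  # roughly -1.414, -0.707, 0, 0.707, 1.414
print(normalized.ravel())             # 0, 0.25, 0.5, 0.75, 1
```

The standardized column has mean 0 and standard deviation 1, while the normalized column always spans exactly 0 to 1.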

In [16]:
#Performing Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

In [17]:
# Initializing Artificial Neural Networks
ann = tf.keras.models.Sequential()

Keras, which used to be a separate library, is now integrated with TensorFlow and available as tf.keras. You can find more about TensorFlow [here](https://www.tensorflow.org/).

<br>

Once we have initialized our ANN, we create its layers. Here we are going to build a network with 1 input layer, 2 hidden layers, and 1 output layer. Let's create our first hidden layer.


In [18]:
#Adding First Hidden Layer
ann.add(tf.keras.layers.Dense(units=6,activation="relu")) #Rectified linear unit - activation function

We have created our first hidden layer using the Dense class, which is part of the layers module. Here we pass it 2 arguments:

* units: the number of neurons in the layer
* activation: the activation function to use
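To see what such a layer computes, here is a NumPy sketch of a fully connected layer with 6 units and ReLU. It is not the Keras implementation: the weights are random stand-ins for learned values, and the 12 input features correspond to the columns of X after encoding.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, units = 12, 6                  # 12 encoded input columns, 6 neurons
W = rng.normal(size=(n_features, units))   # random stand-in for learned weights
b = np.zeros(units)                        # one bias per neuron

def dense_relu(x, W, b):
    """Forward pass of a fully connected layer followed by ReLU."""
    return np.maximum(0.0, x @ W + b)

batch = rng.normal(size=(4, n_features))   # 4 hypothetical, already scaled rows
hidden = dense_relu(batch, W, b)

print(hidden.shape)       # (4, 6): one 6-value output per input row
print(W.size + b.size)    # 78 trainable parameters (12 * 6 weights + 6 biases)
```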

Since we want two hidden layers, we repeat the same step to create the second hidden layer.

In [19]:
#Adding Second Hidden Layer
ann.add(tf.keras.layers.Dense(units=6,activation="relu"))

We are now going to create the output layer for our ANN, which produces the final prediction.

In [20]:
#Adding Output Layer
ann.add(tf.keras.layers.Dense(units=1,activation="sigmoid"))

In a binary classification problem (like this one), where the output has only two classes (1 and 0), a single output neuron is enough. For a multiclass classification problem, we need more than one neuron in the output layer. For example, if the output contains 4 categories, we need 4 neurons (one for each category).

For binary classification with a single output neuron, the usual activation function is sigmoid, which maps the output to a probability between 0 and 1. For multiclass classification, the usual choice is softmax, which produces one probability per class.
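Both activation functions are short enough to write out. The sketch below uses the textbook formulas in NumPy, with hypothetical scores for a 4-class problem:

```python
import numpy as np

def sigmoid(z):
    """Squash a single score into a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Turn a vector of scores into probabilities that sum to 1."""
    shifted = z - np.max(z)          # subtracting the max keeps exp() stable
    exp = np.exp(shifted)
    return exp / exp.sum()

print(sigmoid(0.0))                  # 0.5

scores = np.array([2.0, 1.0, 0.1, -1.0])   # hypothetical scores for 4 classes
probs = softmax(scores)
print(probs.argmax())                # 0: the largest score gets the largest probability
```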

<br>

We have now created layers for our neural network. In this step, we are going to compile our ANN.

In [21]:
#Compiling ANN
ann.compile(optimizer="adam",loss="binary_crossentropy",metrics=['accuracy'])

We have used the compile method of our ANN object to compile the network. The compile method accepts the following inputs:

* optimizer: specifies which optimizer to use for updating the weights. Here we use adam, a variant of stochastic gradient descent.

* loss: specifies which loss function to use. For binary classification, the value should be binary_crossentropy. For multiclass classification, it should be categorical_crossentropy (or sparse_categorical_crossentropy when the labels are integers rather than one-hot vectors).

* metrics: the performance metrics to report during training. Here we have used accuracy.
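To give a feel for the loss used above, here is a NumPy sketch of the textbook binary cross-entropy formula. It is an illustration rather than the Keras code, and the labels and probabilities are made up:

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all observations."""
    y_prob = np.clip(y_prob, eps, 1 - eps)   # avoid log(0)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

y_true = np.array([1, 0, 1, 0])                   # hypothetical labels
mostly_right = np.array([0.9, 0.1, 0.8, 0.2])     # hypothetical predictions
mostly_wrong = np.array([0.1, 0.9, 0.2, 0.8])

good = binary_crossentropy(y_true, mostly_right)  # about 0.164
bad = binary_crossentropy(y_true, mostly_wrong)   # about 1.956
print(good, bad)
```

Confident correct predictions give a small loss, and confident wrong predictions are penalized heavily, which is what the optimizer tries to reduce during training.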

<br>

The last step in creating an ANN is to fit our model on the training data.

In [22]:
#Fitting ANN
ann.fit(X_train,Y_train,batch_size=32,epochs = 100)

Epoch 1/100
Epoch 2/100
...
Epoch 77/100
Epoch 78

<keras.callbacks.History at 0x7ff911843310>

Here we have used the fit method to train our ANN. The fit method takes 4 inputs in this case:

* X_train: matrix of features for the training dataset
* Y_train: dependent variable vector for the training dataset
* batch_size: the number of observations in each batch. A common value for this parameter is 32, but we can experiment with other values as well.
* epochs: the number of complete passes over the training data. Here the value that has worked best in my experience is 100.
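The two settings together determine how many weight updates happen during training. As a quick arithmetic sketch, assuming a hypothetical training set of 8,000 rows (the actual row count is not shown above):

```python
import math

n_train = 8000      # hypothetical number of training rows
batch_size = 32
epochs = 100

steps_per_epoch = math.ceil(n_train / batch_size)  # batches needed to see every row once
total_updates = steps_per_epoch * epochs           # weight updates over the whole run

print(steps_per_epoch)  # 250
print(total_updates)    # 25000
```

A smaller batch size means more updates per epoch, and more epochs mean more passes over the same data.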

<br>

During training, the loss decreases and the accuracy increases from epoch to epoch. Our final training accuracy is 86.59%, which is pretty good for a neural network this simple.

<br>

To predict the result for a single observation with custom values, we can use the following code:

In [23]:
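#Predicting result for Single Observation
# Column order: Geography one-hot (France, Germany, Spain), CreditScore, Gender, Age, Tenure,
# Balance, NumOfProducts, HasCrCard, IsActiveMember, EstimatedSalary
print(ann.predict(sc.transform([[1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]])) > 0.5)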

[[False]]


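Here our neural network predicts whether the customer is going to exit based on the values of the independent variables. The output False means the predicted probability of exiting is not above 0.5, so this customer is predicted to stay.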
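The same 0.5 cut-off can be applied to many predictions at once and compared with the true labels. The sketch below uses made-up probabilities and labels instead of real model output, together with scikit-learn's accuracy_score and confusion_matrix:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

y_prob = np.array([0.08, 0.91, 0.35, 0.62, 0.49])  # hypothetical model outputs
y_true = np.array([0, 1, 0, 0, 1])                 # hypothetical true labels

y_pred = (y_prob > 0.5).astype(int)   # same 0.5 cut-off as in the prediction above

print(y_pred.tolist())                             # [0, 1, 0, 1, 0]
print(accuracy_score(y_true, y_pred))              # 0.6
print(confusion_matrix(y_true, y_pred).tolist())   # [[2, 1], [1, 1]]
```

In the confusion matrix, rows are the true classes and columns are the predicted classes, so the off-diagonal entries count the mistakes.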