# Artrficial Neural Networks
---
In the dataset for this project we have a list of customers to a bank.\
The bank has noticed a recent rise customers leaving the bank.\
They are requesting a system that can predict if a customer is likely to leave as a customer.\

**TABLE COLUMNS:**\
RowNumber, CustomerId, Surname, CreditScore, Geography, Gender, Age, Tenure, Balance, NumOfProducts, HasCrCard, IsActiveMember, EstimatedSalary, Exited

### Needed Packages
- Pandas
- NumPy
- TenserFlow
- scikit-learn

```
pip install pandas numpy tenserflow scikit-learn
```

### Library Imports
The only thing to add to the block is a little more detail about the Enoders.\
LableEncoder is a tool from scikit-learn that will give each individual instance of a string an integer value,
the strings in the already existing column will be replaced with these assigned integers.\
OneHotEncoder is also a scikit-learn tool for handeling categorical data.\
As an example, if a column, 'Color', contained the possible strings of ('red','blue','green').\
When used with ColumnTransformer, the 'color' column will be removed, three new column will be inserted, a 1 will be placed in the respective colums,\
otherwise the column will receive a 0.

In [12]:
import pandas as pd # To work with data files like .csv
import numpy as np # For handeling and manipulating data
import tensorflow as tf # To build the learning model
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler # Binary encoding, Multiclass encoding, Feature scaler
from sklearn.compose import ColumnTransformer # Changes Multiclass column into multiple columns & inserts values
from sklearn.model_selection import train_test_split # Splits data into two pairs of train and test sets
from sklearn.metrics import confusion_matrix, accuracy_score # To see model results

## Data Preprocessing
Before we pass the data to the model we need to work the data a bit.\
Right now as it is we have columns in the table that aren't needed.\
We also have columns that contain strings.\
The model can't work with strings, so this will need to be handled.

### Import Dataset
To import the data, we use Pandas.\
Pandas will return what is called a dataframe.\
Panda dataframes are like arrays, but are not arrays.\
They have their prebuilt methods for indexing and parsing data.\
\
We are going to start with importing the data, then parsing out only the data we need.\
Keep in mind we are looking input features and outputs.\
The bank wants to know if someone is likely to "Exit" the bank, so we know the last column are the output we want our model to predict.\
We may also imply that RowNumber, CustomerId, and Surname wouldn't be relevant in a customers decison on leaving the bank.\
So we will collect all the columns, minus the "outputs", and the independent variables "X".\
Then fetch the last column as the dependent variables "y".

In [2]:
df = pd.read_csv('dataset.csv') # Imports Data

X = df.iloc[:,3:-1].values # Parses out [All Rows, 3rd to 2nd last column]
y = df.iloc[:,-1].values # Parses out [All Rows, last column only]

### Encode Categorical Data
Now we need to handle the categorical data, or data that is not represented by an interger or float.\
Fist we'll handle the gender column, which only contains 2 individual strings.\
\
For this we will use the LableEncoder class from scikit-learn.\
We create an instance of the LabelEncoder class.\
Then we use the class instance to encode the column with an index of 2, all rows.

In [3]:
le = LabelEncoder() 
X[:,2] = le.fit_transform(X[:,2])

Now, we handle the "Geography" column.\
This time we will use ColumnTransformer with OneHotEncoder.\
How this affect the dataset was explained above, what we'll cover here are the arguments we are passing the ColumnTransformer.\
\
The first argument transformers, will receive an array of tuples, each tuple being a different encoder.\
This first element in the tuple is the name of the encoder. It can be anything, this is for referencing later.\
The second element is the encoding function we intend to use,\
The third element of the tuple is the column index we wish to encode.\
\
The second argument remainder, is what the ColumnTransformer will do with all the column not being encoded.\
By default it is set to drop, but in this case we want to keep the data so we set it to passthrough,\
this will leave the rest of the column as they are.

In [4]:
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough') # transformer name, encoding func, col to encode
X = np.array(ct.fit_transform(X)) # we convert the data to a numpy array because the next step will be expecting numpy arrays

### Split Data Into Train\Test Sets
Now that we made sure all the data is either an int or float we can split the data into 2 sets.\
One set for training and the second, unseen to the model durning training, to test with.\
\
As we aleardy know, we have two sets of data, the input features and the outputs.\
So it will make sense that they both need to be split into train and test sets.\
When doing so we need to make sure that we keep the right outputs inline with the input features.\
Scikit-Learn provides us a tool for that, train_test_split.\
\
train_test_split returns 4 numpy arrays in this order: *input-train, input-test, output-train, output-test*
For arguments, only the first 2 are required.\
First being the input-feature, independent variables, or "X".\
Second being the output, dependent variables, or "y".\
Train_size is the percent value of the data you want to save for testing, it defaults to 0.2, which is a common starting point.\
\
When selecting data for the test sets, rows are picked at random.\
Setting random_state=0 will mean it the rows selected are the always the same.\
This is for us humans to learn easier, normally this will not be set.

In [5]:
# By setting random_state to 0 we will get the train test split every time, for human learning. 
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.2, random_state=0) # input features, expected outputs, % of data for test  

### Scale Data
The last thing we have to do with the data is scale it.\
Right now some values are small like 0, 1, or 2, while others are larger, 1000's or even 10's of 1000's.\
We don't want these larger values to have any more influence on the models decision than any other column.\
So we will put them in a more comparable scale.\
\
For this we will use the StandaredScaler class.\
This will take all and convert it into much smaller positive and negative values.\
More importantly, these new values assigned are all based on the scale, giving all the columns the same influence on the model.\
\
We will want to make sure we scale both the train and test sets.

In [6]:
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.fit_transform(X_test)

## Construct Artificial Neural Network
With Tensorflow we have access to a sub-package called keras, which we will use to build the neural network.\
We'll start with initiating our model with the keras.Sequential() model.\
The Sequential model is appropriate when each layer will have only one input/output tensor.\
\
Once the model has been initiated, we can start adding layers using Sequential's add method.\
Again we turn to keras to construct the network layers.\
With keras.layers we have access to the Dense layer, which is just a regual densely-connected NN layer.\
When adding a Dense layer to the network, we only need 2 arguments.\
Units which is the number of nodes in that layer.\
Activation, is short for activation function.\
\
When building the NN we need to make sure that units of the first layer match the number features we have in our dataset.\
Then we also want to make sure that last layer macthes the number of outputs.\
Outside of that the number of layers and nodes is based on the neccessity of the problem, and knowing when to start comes with experience.
\
Last thing we do is compile the model, which is configure the model for training.\
All the parameters have preset values, but typically we will set the optimizer and cost fucntion.\
If we want to get any specific metrics, we can set that now too.\
\
The Adam optimization algorithm is a widely used optizer that is an extension to stochastic gradient descent.\
Binary_crossentropy is a cost function good for caluclation errors in binary solutions.\
When we train the model it will automatically print the loss, by adding 'accuracy' to the metrics, we will get that too.\

In [13]:
ann = tf.keras.Sequential()
ann.add(tf.keras.layers.Dense(units=6, activation='relu')) # Input Layer / 6 nodes / rectifier activation func
ann.add(tf.keras.layers.Dense(units=6, activation='relu')) # Hidden Layer / 6 nodes / rectifier activation func
ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid')) # Output Layer / 6 nodes / sigmoid activation func
ann.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

## Train The Neural Network
Simply put, .fit() will "fit" the model the training data.\
So lets cover the arguments.\
\
First will be the training input features("X"), and the second, training outputs("y").\
Batch size is tell the model how many rows to randomly select and pass through the model.\
Epochs is how many iterations of passing batches too, and updating the model will take place.

In [14]:
ann.fit(X_train, y_train, batch_size=32, epochs=40)

Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40
Epoch 12/40
Epoch 13/40
Epoch 14/40
Epoch 15/40
Epoch 16/40
Epoch 17/40
Epoch 18/40
Epoch 19/40
Epoch 20/40
Epoch 21/40
Epoch 22/40
Epoch 23/40
Epoch 24/40
Epoch 25/40
Epoch 26/40
Epoch 27/40
Epoch 28/40
Epoch 29/40
Epoch 30/40
Epoch 31/40
Epoch 32/40
Epoch 33/40
Epoch 34/40
Epoch 35/40
Epoch 36/40
Epoch 37/40
Epoch 38/40
Epoch 39/40
Epoch 40/40


<keras.src.callbacks.History at 0x1f39aced570>

## Test The Neural Network
Once trained, we can use the .predict() method to a single obsevation and an array of observations.\
The model will return a array of arrays, in the nested arrays are the values of the output nodes.\
in our case there is only 1 output.\
\
After passing the test set through the model, we iterate through the returned values, reseting that index to true if over 0.5, otherwise false.

In [15]:
y_predict = ann.predict(X_test)
y_predict = (y_predict > .5)



### Confusion Matrix
Scikit_Learn provides us with tool to visualize the results of our model.\
One way to visualize the results is the Confusion Matrix.\
We will pass both the predicted outputs to the actual outputs, and it will return an with a nested array for each possible outcome.\
In our case True or False, so we will get 2 nested arrays.\
In each array there will be 2 values, 1st being how many times it predicted that outcome correctly, 2nd being incorrect guesses of that outcome.\
\
**Example:**\
['Guessed false correctly (n) amount of times', Guessed false incorrectly (n) amount of times'],\
['Guessed true correctly (n) amount of times', Guessed true incorrectly (n) amount of times']\
\
acurracy will compare the two, predicted and actual, and return a float value representing the accuracy rate.


In [16]:
cm = confusion_matrix(y_test, y_predict)
print(f'Guessed False {cm[0][0]} times correctly and {cm[0][1]} time incorrectly')
print(f'Guessed True {cm[1][0]} times correctly and {cm[1][1]} time incorrectly')
print(f'{round(accuracy_score(y_test, y_predict)*100)}% Accuracy')

Guessed False 6127 times correctly and 237 time incorrectly
Guessed True 1102 times correctly and 534 time incorrectly
83% Accuracy


## Single Prediction

In [18]:
# data  = [[1,0,0,600,1,40,3,60000,2,1,1,50000]]
data  = [[0,0,1,700,1,60,1,0,1,1,1,101000]]

data = sc.transform(data)

print(f'Probabilty of customer leaving: {round(ann.predict(data)[0][0]*100)}%')
print('Customer will leave') if ann.predict(data)[0][0] > .5 else print('Customer will stay')

Probabilty of customer leaving: 33%
Customer will stay
