# Neural Network

In [33]:
import numpy as np
import pandas as pd
import tensorflow as tf 
import math
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense 
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler

In [12]:
filePath = "bank-additional\\bank-additional\\bank-additional-full.csv"
data = pd.read_csv(filePath, sep=";")

## Preprocessing

### Encoding categorical variables
Transforming non-numeric labels into numeric labels.

In [13]:
categorical_label_encoder = LabelEncoder()

for variable in data.select_dtypes(include=["object"]).columns:
    data[variable] = categorical_label_encoder.fit_transform(data[variable])
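
Note that calling `fit_transform` again on the same encoder refits it, so after the loop above only the mapping of the last column is still stored in the encoder. If the original labels are needed later (for example, to decode predictions), one option is to keep one encoder per column. The sketch below uses a small made-up table rather than the bank dataset; `toy_columns`, `encoders` and `encoded` are hypothetical names.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up columns for illustration only (not taken from the bank dataset).
toy_columns = {
    "marital": ["single", "married", "divorced", "married"],
    "y": ["no", "yes", "no", "no"],
}

encoders = {}  # one fitted encoder per column, so every mapping is kept
encoded = {}
for name, values in toy_columns.items():
    encoders[name] = LabelEncoder()
    encoded[name] = encoders[name].fit_transform(values).tolist()

# LabelEncoder assigns integer codes following the sorted order of the labels.
marital_classes = encoders["marital"].classes_.tolist()

# The stored encoder can turn the integer codes back into the original labels.
decoded_y = encoders["y"].inverse_transform(encoded["y"]).tolist()
```

Here `marital_classes` lists the labels in the order of their integer codes, and `decoded_y` recovers the original `y` labels.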

### Separating features and target
The target variable (the output variable, `y`) is separated from the features (the input variables). The model is trained on the features in order to predict the target, so the target must not appear among the inputs: if it did, the model could simply read off the answer instead of learning patterns from the features. The target is kept separately so that it can serve as the label during training and evaluation.

In [14]:
features = data.drop("y", axis=1)
target = data["y"]

### Standardization
Standardization rescales each feature so that it has a mean of 0 and a standard deviation of 1. When features are measured on very different scales, those with larger values can dominate the weight updates; putting all features on a common scale generally helps the gradient-based training of a neural network converge faster and more reliably.

In [15]:
standard_scaler = StandardScaler()
scaled_features = standard_scaler.fit_transform(features)
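
Concretely, `StandardScaler` replaces each value `x` with `z = (x - mean) / std`, using the mean and standard deviation of the column it belongs to. A minimal NumPy sketch of the same computation on made-up numbers (`toy_features` is a hypothetical array, not the bank data):

```python
import numpy as np

# Made-up feature matrix: two columns on very different scales.
toy_features = np.array([
    [25.0, 1000.0],
    [35.0, 3000.0],
    [45.0, 5000.0],
])

# z = (x - mean) / std, computed column by column.
column_means = toy_features.mean(axis=0)
column_stds = toy_features.std(axis=0)  # population standard deviation (ddof=0)
toy_scaled = (toy_features - column_means) / column_stds

print(toy_scaled.mean(axis=0))  # approximately 0 for every column
print(toy_scaled.std(axis=0))   # approximately 1 for every column
```

After scaling, both columns hold the same values even though the raw numbers differed by two orders of magnitude.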

### Train Test Split
The dataset is split into two parts: a training set and a testing set. This is done so that the model can be evaluated on data it has not seen during training; without a held-out test set, it would not be possible to find out whether the model is underfitted or overfitted (the Bias-Variance Tradeoff).

In [16]:
X_train, X_test, y_train, y_test = train_test_split(scaled_features, target, train_size=0.8, test_size=0.2, random_state=42)
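
One thing to be aware of: in the cells above, the scaler is fitted on the full dataset before the split, so the test rows contribute to the mean and standard deviation used for scaling. A common alternative is to split first and fit the scaler on the training rows only. The sketch below illustrates that order on randomly generated data; all `toy_` names are hypothetical and the numbers are not from the bank dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Randomly generated stand-in data (not the bank dataset).
rng = np.random.default_rng(42)
toy_X = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
toy_y = rng.integers(0, 2, size=100)

# Split first, so the test rows stay unseen during preprocessing.
toy_X_train, toy_X_test, toy_y_train, toy_y_test = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=42
)

# Fit on the training rows only, then apply the same transform to both parts.
toy_scaler = StandardScaler()
toy_X_train_scaled = toy_scaler.fit_transform(toy_X_train)
toy_X_test_scaled = toy_scaler.transform(toy_X_test)
```

With this order the training columns have mean 0 and standard deviation 1 exactly, while the test columns are only approximately standardized, which is expected.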

## Creating the Neural Network Model

This model will be using a Multilayer Perceptron with an input layer, a hidden layer and an output layer. 

In [41]:
number_of_features = features.shape[1]
number_of_features

20

### Deciding the Number of Neurons

#### Deciding the Number of Neurons in the Input Layer and the Output Layer
Since there are 20 input features, the input layer has 20 neurons. Since each instance can only be classified into one of two classes, this is a **Binary Classification** problem, so the output layer uses a single neuron.

#### Deciding the Number of Neurons in the Hidden Layer/Layers
Deciding the number of neurons in the hidden layers (or in this case, the single hidden layer) is not as straightforward as for the other layers. The complexity of the dataset (dimensionality, sample size, noise, distribution, etc.) and whether the model underfits or overfits both affect how many neurons the hidden layer should have.

The article [The Number of Hidden Layers](https://www.heatonresearch.com/2017/06/01/hidden-layers.html) also discusses Deep Learning and why multiple hidden layers are used even though the Universal Approximation Theorem shows that a network with a single hidden layer can, given enough neurons, approximate any continuous function on a bounded input range arbitrarily well. According to that article, there are a few rules of thumb for determining the number of neurons in a hidden layer:

1. The average of the number of input and output neurons.
2. Two-thirds of the number of input neurons plus the number of output neurons.
3. Less than twice the number of input neurons.

The first heuristic tends to give a smaller number of hidden neurons. Smaller networks train faster and are less prone to overfitting, so the number of hidden neurons is set to the average of the number of input and output neurons, rounded up.

In [46]:
number_of_input_neurons = number_of_features
number_of_output_neurons = 1
number_of_hidden_neurons = math.ceil((number_of_input_neurons + number_of_output_neurons) / 2)
number_of_hidden_neurons

11
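
For comparison, the three rules of thumb can be evaluated side by side for this network. This is only a sketch: rounding up with `math.ceil` mirrors the cell above and is an assumption, since the rules themselves do not say how to round, and the variable names are hypothetical.

```python
import math

n_inputs = 20   # number of features in this notebook
n_outputs = 1   # single output neuron for binary classification

# Rule 1: average of the input and output neuron counts.
rule_1 = math.ceil((n_inputs + n_outputs) / 2)
# Rule 2: two-thirds of the input neurons plus the output neurons.
rule_2 = math.ceil(2 * n_inputs / 3 + n_outputs)
# Rule 3: an upper bound only, strictly less than twice the input neurons.
rule_3_upper_bound = 2 * n_inputs

print(rule_1, rule_2, rule_3_upper_bound)  # 11 15 40
```

The first rule gives the smallest hidden layer of the three for this dataset.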

### Deciding the Activation Function
ReLU will be used as the activation function of the hidden layer because it is simple and computationally efficient. It is the usual default for hidden layers, with other functions such as sigmoid or Tanh being used in situations where ReLU isn't optimal.
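
To make the comparison concrete, the three activation functions mentioned here can be written in a few lines of NumPy. This is an illustrative sketch, not the Keras implementation; `relu`, `sigmoid` and `sample` are hypothetical names.

```python
import numpy as np

def relu(x):
    # Passes positive values through unchanged and clips negatives to zero.
    return np.maximum(0.0, x)

def sigmoid(x):
    # Squashes any real value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

sample = np.array([-2.0, 0.0, 3.0])
relu_out = relu(sample)        # negatives become 0, positives are unchanged
sigmoid_out = sigmoid(sample)  # an input of 0 maps to 0.5
tanh_out = np.tanh(sample)     # values in (-1, 1), an input of 0 stays 0
```

ReLU needs only a comparison per value, whereas sigmoid and Tanh each require an exponential, which is part of why ReLU is considered cheap to compute.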

### Coding the Model

In [62]:
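# The input layer only declares the input shape; it has no weights of its own.
input_layer = tf.keras.Input(shape=(number_of_features,))
# The single hidden layer uses ReLU, as decided above.
hidden_layer = Dense(number_of_hidden_neurons, activation="relu")
# No activation is set here, so the output is a raw score (logit), not a probability.
output_layer = Dense(number_of_output_neurons)

neural_network_model = Sequential([input_layer, hidden_layer, output_layer])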
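
As a rough sanity check on the size of the model, the number of trainable parameters in a `Dense` layer is one weight per input-neuron pair plus one bias per neuron. The helper `dense_parameter_count` below is a hypothetical name introduced for illustration; the layer sizes are the ones computed earlier in this notebook.

```python
def dense_parameter_count(n_inputs, n_units):
    # One weight per (input, unit) pair, plus one bias per unit.
    return n_inputs * n_units + n_units

# The hidden layer receives 20 values and has 11 neurons.
hidden_parameters = dense_parameter_count(20, 11)
# The output layer receives the 11 hidden activations and has 1 neuron.
output_parameters = dense_parameter_count(11, 1)

print(hidden_parameters, output_parameters)  # 231 12
```

These figures should match what `neural_network_model.summary()` reports for the hidden and output layers.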