# Neural Networks

## Introduction

Artificial Neural Networks are quickly becoming one of the most popular and widely used techniques in Machine Learning and Data Analysis. In recent years, numerous libraries have been developed to equip programmers with tools for modeling and analysing data in order to recognise patterns and make predictions from large data sets. In today's age of [Big Data](https://en.wikipedia.org/wiki/Big_data) it is important to try to make sense of all of the data we have in society, from social media patterns to finance and economic trends. The reality is that today we have more data in existence than ever before, and it is growing at an exponential rate.

Artificial Neural Networks aim to mimic the neurons of a human brain. By combining many simple mathematical functions, they allow us to process and model data in such a way that we can form reasonable predictions about a given data set.

Given the sheer amount of data out there, it is important to note that the data we analyse is often subject to human error and may not always reflect the truth. For the purpose of this example we will take a look at the [Iris Data Set](https://archive.ics.uci.edu/ml/datasets/iris).

Throughout the notebook we aim to build an Artificial Neural Network capable of predicting the species of Iris flowers using [Keras](https://keras.io). Keras is a high-level neural networks API, written in Python and capable of running on top of [TensorFlow](https://www.tensorflow.org/).

## Importing the data set

In [6]:
# imports and preliminaries
import csv
import numpy as np
import keras as kr

# Load the Iris dataset, skipping the header row.
with open('iris-data-set.csv', newline='') as f:
    iris = list(csv.reader(f))[1:]

## Inputs and Outputs
### Data Investigation and Classification

Before trying to create a model for our Neural Network we first need to investigate our data and determine what will be the inputs and what will be our outputs. The CSV file provided contains 5 columns with:

- Sepal Length
- Sepal Width
- Petal Length
- Petal Width
- Species

Because we are trying to predict the species from the measurements, we must split our data set into:

- Inputs - Numerical data values
- Outputs - Classification of Iris Flower species


Now that we have the data set loaded we can extract the data we need into appropriate data sets in preparation for training and testing our Model.

In [7]:
# The inputs are four floats: sepal length, sepal width, petal length, petal width.
inputs  = np.array(iris)[:,:4].astype(float)

# Outputs are initially individual strings: setosa, versicolor or virginica.
outputs = np.array(iris)[:,4]

# Convert the output strings to ints.
outputs_vals, outputs_ints = np.unique(outputs, return_inverse=True)

## Categorical Classification

Here we use the Keras utility `to_categorical()` to turn our output categories into binary class matrices. This is often referred to as "one-hot" encoding. It is the format expected by the `categorical_crossentropy` loss, which we use to classify our species (setosa, versicolor and virginica).

Each species is represented as a binary vector:

- Setosa [1 0 0]
- Versicolor [0 1 0]
- Virginica [0 0 1]
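
As a rough illustration of what this encoding does, the sketch below reproduces it with plain NumPy on a handful of made-up labels. The `labels` array is hypothetical, and indexing `np.eye` is just one common way to build one-hot rows. It is not how `to_categorical()` is implemented internally.

```python
import numpy as np

# Toy labels standing in for the species column (illustrative values only).
labels = np.array(["setosa", "virginica", "versicolor", "setosa"])

# np.unique sorts the distinct labels and, with return_inverse=True,
# also returns the position of each original label in that sorted array.
classes, ints = np.unique(labels, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(len(classes), dtype=int)[ints]

print(classes)
print(ints)
print(one_hot)
```

Because `np.unique` sorts the labels alphabetically, setosa maps to index 0, versicolor to 1 and virginica to 2, which matches the vectors listed above.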

In [8]:
# Encode the category integers as binary categorical variables.
outputs_cats = kr.utils.to_categorical(outputs_ints)

## Divide & Conquer 
### Splitting the data

We can now randomly split the data into two sets for:

- Training
- Testing

In [9]:
# Split the input and output data sets into training and test subsets.
inds = np.random.permutation(len(inputs))
train_inds, test_inds = np.array_split(inds, 2)
inputs_train, outputs_train = inputs[train_inds], outputs_cats[train_inds]
inputs_test,  outputs_test  = inputs[test_inds],  outputs_cats[test_inds]
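
The split above changes every time the notebook runs, because the permutation is not seeded. The following sketch shows the same shuffle-and-split idea on a toy index range with a seeded generator. The seed value and `n_samples` are arbitrary choices for illustration, not part of the original notebook.

```python
import numpy as np

# A seeded generator makes the shuffle repeatable (the seed value is arbitrary).
rng = np.random.default_rng(seed=0)

n_samples = 10  # toy size; the Iris data set has 150 rows
inds = rng.permutation(n_samples)

# Split the shuffled indices into two halves.
train_inds, test_inds = np.array_split(inds, 2)

# The two halves are disjoint and together cover every row exactly once.
print(sorted(train_inds.tolist()), sorted(test_inds.tolist()))
```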

## Creating a Model

Below we can see an example of how a Neural Network can be visualized. Every Neural Network is made up of these three main constituents.

- Input Layer
- $x$ number of Hidden Layers
- Output Layer

![neural_net](img/neural_net.jpeg)

### Keras Models

Keras offers a very useful and high level API to handle creation of Neural Networks. The [Keras Sequential Model](https://keras.io/getting-started/sequential-model-guide/) is defined as a *linear stack of layers*. This is perfect for what we need to create an Artificial Neural Network consisting of Input, Output and Hidden nodes. We define our Model and add the layers to it.

We are trying to create a model that looks somewhat similar to the one below:

![iris_model](img/iris_model.png)

In [10]:
# Create a neural network.
model = kr.models.Sequential()

# Add an initial layer with 4 input nodes, and a hidden layer with 16 nodes.
model.add(kr.layers.Dense(16, input_shape=(4,)))
# Apply the sigmoid activation function to that layer.
model.add(kr.layers.Activation("sigmoid"))
# Add another layer, connected to the layer with 16 nodes, containing three output nodes.
model.add(kr.layers.Dense(3))
# Use the softmax activation function there.
model.add(kr.layers.Activation("softmax"))


## Activation Functions

An [Activation Function](https://en.wikipedia.org/wiki/Activation_function) in a Neural Network defines the output of a given node given its input or set of inputs. Above we apply two activation functions in separate layers.

### Sigmoid
A sigmoid function is a mathematical function having an "S" shaped curve (sigmoid curve). Often, sigmoid function refers to the special case of the logistic function, defined by the formula:


![sigmoid](img/sigmoid.svg)

Below we see a plot of the "S" shaped curve, or "Sigmoid Curve".

![curve](img/Logistic-curve.svg.png)

Its uses in a neural network are:
1. As an activation function, it transforms linear inputs into nonlinear outputs.
2. It bounds the output between 0 and 1, so that it can be interpreted as a probability.
3. Its derivative has a simple form, which keeps the gradient computations during training cheap.
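
To make the points above concrete, here is a minimal sketch of the logistic sigmoid and its derivative using only the standard library. The function names are our own, and the plain `math.exp` form is fine for moderate inputs but would overflow for very large negative `x`.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x: float) -> float:
    """The derivative can be written in terms of the sigmoid itself."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Outputs stay strictly between 0 and 1 and increase with x.
for x in (-4.0, 0.0, 4.0):
    print(x, sigmoid(x), sigmoid_derivative(x))
```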

### Softmax

[Softmax regression](http://ufldl.stanford.edu/tutorial/supervised/SoftmaxRegression/) (or multinomial logistic regression) is a generalization of logistic regression to the case where we want to handle multiple classes.

Softmax regression is defined by the mathematical formula: 

![softmax](img/softmax.svg)

Here we are using Softmax so that, after our data flows through the hidden layer, the output ends up as a probability for each of our defined classes:

- Setosa
- Versicolor
- Virginica
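
As a small illustration, the sketch below computes softmax over three made-up scores with the standard library. The function name and input values are hypothetical. Subtracting the maximum score is a common numerical-stability trick and does not change the result.

```python
import math

def softmax(scores):
    # Subtracting the max leaves the result unchanged but avoids overflow in exp().
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Three raw output scores, one per class (made-up values).
probs = softmax([2.0, 1.0, 0.1])
print(probs)       # the largest score gets the largest probability
print(sum(probs))  # the probabilities sum to 1 (up to rounding)
```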


In [11]:
# Display our Model using the summary function
model.summary()

## Configure the Model for training and fit the training data

We configure the Model using the `compile()` function defined in the [Keras Model API](https://keras.io/models/model/).
We define an Optimizer, a Loss function and an additional metric - accuracy.

Before we can use our Model we must first train it. Using the training data subset which we extracted before, we can now fit it to our Model.

The goal here is for the Optimizer to minimize the Loss.

We fit the model by passing our inputs and our expected outputs, and train it across 100 "epochs", or training cycles. Over the course of training the loss generally falls and the accuracy generally rises, although an individual epoch may be slightly worse than the one before it.
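
To give a feel for the quantity the optimizer is minimizing, here is a minimal sketch of categorical cross-entropy for a single sample, written with the standard library. The function name mirrors the Keras loss string, but this is our own simplified version with an assumed clipping constant `eps`, not the Keras implementation.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions away from 0 so that log() is always defined.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

target = [0, 1, 0]                # one-hot vector for versicolor
confident = [0.05, 0.90, 0.05]    # hypothetical prediction, mostly right
unsure = [0.30, 0.40, 0.30]       # hypothetical prediction, barely right

# The more probability placed on the true class, the lower the loss.
print(categorical_crossentropy(target, confident))
print(categorical_crossentropy(target, unsure))
```

With a one-hot target only the true class contributes, so the loss reduces to the negative log of the probability assigned to that class.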

In [12]:
# Configure the model for training.
# Uses the adam optimizer and categorical cross entropy as the loss function.
# Add in some extra metrics - accuracy being the only one.
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Fit the model using our training data.
model.fit(inputs_train, outputs_train, epochs=100, batch_size=1, verbose=1)

Epoch 1/100
75/75 ━━━━━━━━━━━━━━━━━━━━ 1s 824us/step - accuracy: 0.3188 - loss: 1.0204
Epoch 2/100
75/75 ━━━━━━━━━━━━━━━━━━━━ 0s 676us/step - accuracy: 0.5140 - loss: 0.9621
Epoch 3/100
75/75 ━━━━━━━━━━━━━━━━━━━━ 0s 676us/step - accuracy: 0.6148 - loss: 0.9434
Epoch 4/100
75/75 ━━━━━━━━━━━━━━━━━━━━ 0s 676us/step - accuracy: 0.6833 - loss: 0.8974
Epoch 5/100
75/75 ━━━━━━━━━━━━━━━━━━━━ 0s 716us/step - accuracy: 0.6375 - loss: 0.8682
Epoch 6/100
75/75 ━━━━━━━━━━━━━━━━━━━━ 0s 689us/step - accuracy: 0.7462 - loss: 0.8142
Epoch 7/100
75/75 ━━━━━━━━━━━━━━━━━━━━ 0s 771us/step - accuracy: 0.7122 - loss: 0.7871
Epoch 8/100
75/75 ━━━━━━━━━━━━━━━━━━━━ 0s 762us/step - accuracy: 0.7148 - loss: 0.7379
Epoch 9/100
75/75

<keras.src.callbacks.history.History at 0x2bac1a64b20>

## Evaluate the Loss and Accuracy of the Model

Now that we have trained our Model we can evaluate it using the test data which we extracted before. Using `evaluate()` we get back the loss and accuracy for our Test set.

In [13]:
# Evaluate the model using the test data set.
loss, accuracy = model.evaluate(inputs_test, outputs_test, verbose=1)

# Output the accuracy of the model.
print("\n\nLoss: %6.4f\tAccuracy: %6.4f" % (loss, accuracy))

3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9816 - loss: 0.1231


Loss: 0.1127	Accuracy: 0.9867


## Making predictions using the Model

To make predictions using our Model we must first prepare the input data to be what the model expects. Here we use a couple of NumPy functions: `expand_dims()` adds a batch dimension to the single input, and `around()` rounds the predicted probabilities to 0 or 1.

We can then look up our prediction as a String value from `outputs_vals`, which we defined earlier in the Notebook.

In [16]:
# Predict the class of a single flower.
prediction = np.around(model.predict(np.expand_dims(inputs_test[0], axis=0))).astype(int)[0]

print("Actual: %s\tEstimated: %s" % (outputs_test[0].astype(int), prediction))
print("That means it's a %s" % outputs_vals[prediction.astype(bool)][0])

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step
Actual: [0 1 0]	Estimated: [0 1 0]
That means it's a versicolor
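
Rounding works when one class has a probability above 0.5, but it yields all zeros when the model is less certain. As an alternative, the sketch below picks the most probable class with `argmax`. The `probs` values are a made-up softmax output, and `class_names` stands in for `outputs_vals`.

```python
import numpy as np

# Class names in the sorted order that np.unique returns them.
class_names = np.array(["setosa", "versicolor", "virginica"])

# Hypothetical softmax output for one flower in which no class exceeds 0.5.
probs = np.array([0.15, 0.45, 0.40])

# Rounding gives no class at all for this sample.
print(np.around(probs).astype(int))

# argmax always selects the single most probable class.
predicted = class_names[np.argmax(probs)]
print(predicted)
```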


## Saving and Loading the Model

Keras offers a very simple way to save and load your model.

In [75]:
# Save the model to a file for later use.
model.save("iris_neural_network.h5")

We can easily reload the model in another script using `model = kr.models.load_model("path_to_model.h5")`.

## Convolutional Neural Networks

In [35]:
import random
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Conv2D, MaxPooling2D, ZeroPadding2D, GlobalAveragePooling2D, Flatten
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.utils import to_categorical

In [36]:
# The MNIST data is split into 60,000 28 x 28 pixel training images and 10,000 28 x 28 pixel test images
(X_train, y_train), (X_test, y_test) = mnist.load_data()

print("X_train shape", X_train.shape)
print("y_train shape", y_train.shape)
print("X_test shape", X_test.shape)
print("y_test shape", y_test.shape)

X_train shape (60000, 28, 28)
y_train shape (60000,)
X_test shape (10000, 28, 28)
y_test shape (10000,)


In [37]:
X_train = X_train.reshape(60000, 28, 28, 1) #add an additional dimension to represent the single-channel
X_test = X_test.reshape(10000, 28, 28, 1)

X_train = X_train.astype('float32')         # change integers to 32-bit floating point numbers
X_test = X_test.astype('float32')

X_train /= 255                              # scale each pixel value from [0, 255] to [0, 1]
X_test /= 255

print("Training matrix shape", X_train.shape)
print("Testing matrix shape", X_test.shape)

Training matrix shape (60000, 28, 28, 1)
Testing matrix shape (10000, 28, 28, 1)
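
The same preprocessing steps can be seen on a tiny example. The sketch below uses two made-up 2x2 "images" in place of MNIST, so the array contents and sizes are purely illustrative.

```python
import numpy as np

# Two fake 2x2 grayscale "images" with 8-bit pixel values.
images = np.array([[[0, 128], [255, 64]],
                   [[10, 20], [30, 40]]], dtype=np.uint8)

# Add a trailing channel axis: (batch, height, width) -> (batch, height, width, 1).
images = images.reshape(images.shape[0], 2, 2, 1)

# Convert to float32 and scale from [0, 255] to [0, 1].
scaled = images.astype("float32") / 255

print(scaled.shape)
print(scaled.dtype, scaled.min(), scaled.max())
```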


In [38]:
# one-hot format classes

nb_classes = 10 # number of unique digits

Y_train = to_categorical(y_train, nb_classes)
Y_test = to_categorical(y_test, nb_classes)

In [39]:
model = Sequential()                                 # Linear stacking of layers

# Convolution Layer 1
model.add(Conv2D(32, (3, 3), input_shape=(28,28,1))) # 32 different 3x3 kernels -- so 32 feature maps
model.add(BatchNormalization(axis=-1))               # normalize each feature map before activation
convLayer01 = Activation('relu')                     # activation
model.add(convLayer01)

# Convolution Layer 2
model.add(Conv2D(32, (3, 3)))                        # 32 different 3x3 kernels -- so 32 feature maps
model.add(BatchNormalization(axis=-1))               # normalize each feature map before activation
model.add(Activation('relu'))                        # activation
convLayer02 = MaxPooling2D(pool_size=(2,2))          # Pool the max values over a 2x2 kernel
model.add(convLayer02)

# Convolution Layer 3
model.add(Conv2D(64,(3, 3)))                         # 64 different 3x3 kernels -- so 64 feature maps
model.add(BatchNormalization(axis=-1))               # normalize each feature map before activation
convLayer03 = Activation('relu')                     # activation
model.add(convLayer03)

# Convolution Layer 4
model.add(Conv2D(64, (3, 3)))                        # 64 different 3x3 kernels -- so 64 feature maps
model.add(BatchNormalization(axis=-1))               # normalize each feature map before activation
model.add(Activation('relu'))                        # activation
convLayer04 = MaxPooling2D(pool_size=(2,2))          # Pool the max values over a 2x2 kernel
model.add(convLayer04)
model.add(Flatten())                                 # Flatten final 4x4x64 output matrix into a 1024-length vector

# Fully Connected Layer 5
model.add(Dense(512))                                # 512 FCN nodes
model.add(BatchNormalization())                      # normalization
model.add(Activation('relu'))                        # activation

# Fully Connected Layer 6                       
model.add(Dropout(0.2))                              # 20% dropout of randomly selected nodes
model.add(Dense(10))                                 # final 10 FCN nodes
model.add(Activation('softmax'))                     # softmax activation


In [40]:
model.summary()
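
The spatial sizes reported by the summary can be checked by hand. The sketch below traces the width and height through the layers, assuming the Keras defaults used above: unpadded ("valid") 3x3 convolutions with stride 1, and 2x2 max pooling with stride 2. The helper names are our own.

```python
def conv_out(size, kernel=3):
    # An unpadded convolution with stride 1 shrinks each side by kernel - 1.
    return size - kernel + 1

def pool_out(size, pool=2):
    # Non-overlapping pooling divides each side by the pool size (rounding down).
    return size // pool

size = 28
size = conv_out(size)  # Convolution Layer 1: 28 -> 26
size = conv_out(size)  # Convolution Layer 2: 26 -> 24
size = pool_out(size)  # Max pooling:         24 -> 12
size = conv_out(size)  # Convolution Layer 3: 12 -> 10
size = conv_out(size)  # Convolution Layer 4: 10 -> 8
size = pool_out(size)  # Max pooling:          8 -> 4

# The last convolution has 64 feature maps, so Flatten sees 4 * 4 * 64 values.
flattened = size * size * 64
print(size, flattened)
```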

In [41]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [42]:
# data augmentation helps reduce overfitting by slightly changing the data randomly
# Keras has a built-in feature to do automatic augmentation

gen = ImageDataGenerator(rotation_range=8, width_shift_range=0.08, shear_range=0.3,
                         height_shift_range=0.08, zoom_range=0.08)

test_gen = ImageDataGenerator()

In [43]:
# We can then feed our augmented data in batches.
# The generators yield one batch at a time and apply the random transformations on the fly,
# so the augmented images never have to be held in memory all at once.

train_generator = gen.flow(X_train, Y_train, batch_size=128)
test_generator = test_gen.flow(X_test, Y_test, batch_size=128)
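
With a batch size of 128, neither data set divides evenly into batches. The sketch below compares floor division with rounding up, to show how many samples fall into the final, smaller batch. The variable names are our own, and whether to include that last partial batch in the step count is a design choice rather than a fixed rule.

```python
import math

n_train, n_test, batch_size = 60000, 10000, 128

# Floor division counts only the full batches.
train_full = n_train // batch_size
test_full = n_test // batch_size

# Rounding up also counts the final, smaller batch.
train_all = math.ceil(n_train / batch_size)
test_all = math.ceil(n_test / batch_size)

# Samples left over after the full batches.
train_rest = n_train - train_full * batch_size
test_rest = n_test - test_full * batch_size

print(train_full, train_all, train_rest)
print(test_full, test_all, test_rest)
```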

In [44]:
# We can now train our model which is fed data by our batch loader
# Steps per epoch should always be total size of the set divided by the batch size

# SIGNIFICANT MEMORY SAVINGS (important for larger, deeper networks)

model.fit(train_generator, steps_per_epoch=60000//128, epochs=5, verbose=1, 
                    validation_data=test_generator, validation_steps=10000//128)

Epoch 1/5
468/468 ━━━━━━━━━━━━━━━━━━━━ 49s 98ms/step - accuracy: 0.9088 - loss: 0.2903 - val_accuracy: 0.9805 - val_loss: 0.0695
Epoch 2/5
468/468 ━━━━━━━━━━━━━━━━━━━━ 0s 171us/step - accuracy: 0.9844 - loss: 0.0496 - val_accuracy: 1.0000 - val_loss: 0.0031
Epoch 3/5
468/468 ━━━━━━━━━━━━━━━━━━━━ 52s 110ms/step - accuracy: 0.9838 - loss: 0.0527 - val_accuracy: 0.9907 - val_loss: 0.0286
Epoch 4/5
468/468 ━━━━━━━━━━━━━━━━━━━━ 0s 62us/step - accuracy: 0.9922 - loss: 0.0171 - val_accuracy: 1.0000 - val_loss: 0.0046
Epoch 5/5
468/468 ━━━━━━━━━━━━━━━━━━━━ 52s 110ms/step - accuracy: 0.9878 - loss: 0.0385 - val_accuracy: 0.9921 - val_loss: 0.0253


<keras.src.callbacks.history.History at 0x214112fda20>

In [46]:
score = model.evaluate(X_test, Y_test)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.9908 - loss: 0.0290
Test loss: 0.025290779769420624
Test accuracy: 0.9921000003814697
