# Artificial Neural Networks and the Iris Data

The Iris Flower data set contains 150 examples from three species: Iris Setosa, Iris Virginica and Iris Versicolor.
There are 50 examples of each species, and each example has four measurements (features): Sepal Length, Sepal Width,
Petal Length and Petal Width.
The Iris data is often used as an example for machine learning classifiers, and we are going to build and test an ANN
to classify it: given a set of measurements, which species is it?

![Iris-image](resources/iris_image.png "Iris-image Image")

Real data very rarely comes in a format that is suitable for input to a machine learning algorithm,
so first we need to prepare the data for classification.
It is also often useful to visualise the data, because this can help us choose a suitable kind of classifier
and predict how well it might perform.

We also need to replace the species labels with numbers.
In this case we are going to use ‘one-hot encoding’,
which means each species label is replaced with a set of binary values indicating which of the three
species it is, i.e. 'Iris-setosa' = 1 0 0, 'Iris-versicolor' = 0 1 0 and 'Iris-virginica' = 0 0 1.
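As a small illustration of this encoding, the sketch below builds one-hot vectors by indexing rows of an identity matrix, the same `np.eye(...)[labels]` trick the notebook uses further down. The species list is made-up example input and `species_to_index` is a name chosen just for this sketch.

```python
import numpy as np

# Map each species name to an integer index (same order as the text above)
species_to_index = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
species = ['Iris-versicolor', 'Iris-setosa', 'Iris-virginica', 'Iris-setosa']  # made-up labels

indices = np.array([species_to_index[s] for s in species])

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing np.eye with the label array selects one row per example
one_hot = np.eye(len(species_to_index))[indices]
print(one_hot)

# argmax along each row recovers the integer label again
recovered = np.argmax(one_hot, axis=1)
print(recovered)  # [1 0 2 0]
```

The `argmax` step is exactly how the network's outputs and the one-hot targets are turned back into class indices when accuracy is measured.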

We also need to get all of the features from the relevant columns and split the data into training and test sets.

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
%matplotlib inline

# Use seaborn's default plot style
sns.set()

# Read data from csv
iris_data = pd.read_csv("data/iris.csv")
iris_data.drop("Id", axis=1, inplace=True)

# Plot every pair of features on a 2D graph, coloured by species
iris_data_plt = sns.pairplot(iris_data, hue="Species")

# Replace the species with 0, 1 or 2 (same order as the one-hot encoding described above)
iris_data['Species'] = iris_data['Species'].map({'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2})

# Get labels and encode to one-hot
labels = iris_data['Species'].to_numpy()
labels = np.eye(np.max(labels) + 1)[labels]

# Get Features
feature_names = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']
features = iris_data[feature_names].to_numpy()

# Split data to training and test data, 2/3 for training and 1/3 for testing
train_x, test_x, train_y, test_y = train_test_split(features, labels, test_size=0.33)

# Show the first 5 iris examples
iris_data.head()
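`train_test_split` shuffles the data randomly, so each run of the cell above produces a different split and slightly different results. With only 150 examples it can also help to keep the class proportions equal in both splits. The sketch below shows the `stratify` and `random_state` parameters of `train_test_split` on made-up data with the same shape as the Iris features; the `demo_` names are hypothetical and chosen so they do not overwrite the notebook's variables.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data shaped like the Iris set: 150 examples, 4 features, 50 per class
rng = np.random.default_rng(0)
demo_features = rng.normal(size=(150, 4))
demo_labels = np.repeat([0, 1, 2], 50)

# stratify=demo_labels keeps the class proportions the same in both splits,
# random_state fixes the shuffle so the split is reproducible
tr_x, te_x, tr_y, te_y = train_test_split(
    demo_features, demo_labels, test_size=0.33, stratify=demo_labels, random_state=42)

print(tr_x.shape, te_x.shape)  # (100, 4) (50, 4)
print(np.bincount(te_y))       # test examples per class (16 or 17 each)
```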

### ANN - Implementation

The previous neural network implementation was a little long-winded.
If we had to add a new variable by hand for every weight, it would quickly become unmanageable:
with 4 inputs and 6 hidden nodes, for example, one layer already needs 4 x 6 = 24 weights plus 6 biases, i.e. 30 variables.
Instead we can represent the entire layer as a matrix; in this case the hidden layer weights form a 4x6 matrix.
This also allows us to perform the calculations for the entire layer at once, rather than using loops,
which is much more efficient and easier to code.
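The sketch below illustrates this with made-up inputs: it computes a 4-input, 6-node sigmoid layer once with explicit loops and once as a single matrix product, checks that the two agree, and counts the layer's parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
num_inputs, num_hidden = 4, 6
x = rng.normal(size=(5, num_inputs))                         # 5 made-up examples
W = rng.uniform(-0.5, 0.5, size=(num_inputs, num_hidden))    # 4x6 weight matrix
b = rng.uniform(-0.5, 0.5, size=(1, num_hidden))             # one bias per hidden node


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


# Loop version: one weighted sum per example per hidden node
loop_out = np.zeros((x.shape[0], num_hidden))
for n in range(x.shape[0]):
    for j in range(num_hidden):
        total = b[0, j]
        for i in range(num_inputs):
            total += x[n, i] * W[i, j]
        loop_out[n, j] = sigmoid(total)

# Matrix version: the whole layer for all examples in one expression
matrix_out = sigmoid(x @ W + b)

print(np.allclose(loop_out, matrix_out))  # True
print(W.size + b.size)                    # 30 parameters: 24 weights + 6 biases
```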

This code creates a network with a single hidden layer.
The forward and backward passes through the network have also been split into separate functions, so that they can
be called independently from the train and predict functions.
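For reference, the standard matrix form of a single-hidden-layer sigmoid network trained by gradient descent is summarised below. It assumes a sum-of-squared-errors loss $L = \tfrac{1}{2}\sum (Y - \hat{Y})^2$ and writes $X$ for the batch of inputs, $H$ for the hidden activations, $\hat{Y}$ for the outputs, $\eta$ for the learning rate and $\odot$ for element-wise multiplication. The sigmoid derivative is expressed through its output, $\sigma'(z) = \sigma(z)(1 - \sigma(z))$, and the sums over $n$ run over the examples (rows).

$$
\begin{aligned}
H &= \sigma(X W_1 + b_1), & \hat{Y} &= \sigma(H W_2 + b_2) \\
\delta_2 &= (Y - \hat{Y}) \odot \hat{Y} \odot (1 - \hat{Y}), & \delta_1 &= (\delta_2 W_2^\top) \odot H \odot (1 - H) \\
W_2 &\leftarrow W_2 + \eta\, H^\top \delta_2, & b_2 &\leftarrow b_2 + \eta \textstyle\sum_n \delta_{2,n} \\
W_1 &\leftarrow W_1 + \eta\, X^\top \delta_1, & b_1 &\leftarrow b_1 + \eta \textstyle\sum_n \delta_{1,n}
\end{aligned}
$$

Since $\partial L / \partial (H W_2 + b_2) = -\delta_2$ (and likewise $-\delta_1$ for the hidden layer), each update is a step against the gradient. The bias gradients are the column sums of the deltas $\delta_2$ and $\delta_1$ (`output_layer_delta` and `hidden_layer_delta` in the code), not of the raw errors.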

In [None]:
class NeuralNetwork:
    def __init__(self, num_inputs, num_outputs, num_hidden_nodes):
        # Get the number of inputs, outputs and hidden nodes
        self.num_inputs = num_inputs
        self.num_outputs = num_outputs
        self.num_hidden_nodes = num_hidden_nodes
        
        # Initialise weights in the range -0.5 to 0.5
        # Hidden layer weights with shape (number of input features x number of hidden nodes)
        self.hidden_weights = np.random.uniform(-0.5, 0.5, size=(self.num_inputs, self.num_hidden_nodes))
        self.hidden_bias = np.random.uniform(-0.5, 0.5, size=(1, self.num_hidden_nodes))
        # Output layer weights with shape (number of hidden nodes x number of output classes)
        self.output_weights = np.random.uniform(-0.5, 0.5, size=(self.num_hidden_nodes, self.num_outputs))
        self.output_bias = np.random.uniform(-0.5, 0.5, size=(1, self.num_outputs))

    @staticmethod
    def sigmoid(x):
        return 1 / (1 + np.exp(-x))
    
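    @staticmethod
    def sigmoid_deriv(x):
        # Derivative of the sigmoid expressed in terms of its output:
        # if x = sigmoid(z), then d sigmoid(z)/dz = x * (1 - x)
        return x * (1 - x)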
    
    def forward_pass(self, x):
        """ Forward Pass - propagates input data through the network.
        
        Args:
            x (np.array): The input data to propagate through the network. shape=[num_examples, num_features]
        
        Returns:
            hidden_output (np.array): Output (activation) of hidden layer. shape=[num_examples, num_hidden_nodes]
            output (np.array): Output (activation) of output layer. shape=[num_examples, num_outputs]
        """
        
        # Input layer is just the input data
        input_layer = x
        # Hidden layer: sigmoid(X . W_hidden + b_hidden)
        hidden_output = self.sigmoid(np.dot(input_layer, self.hidden_weights) + self.hidden_bias)
        # Output layer: sigmoid(hidden_output . W_output + b_output)
        output = self.sigmoid(np.dot(hidden_output, self.output_weights) + self.output_bias)

        # Return both layers output
        return hidden_output, output
    
    def backward_pass(self, x, y, hidden_output, output, lr):
        """ Backpropagation - propagates the error backwards through the network.
                
        Args:
            x (np.array): The input data to propagate through the network. shape=[num_examples, num_features]
            y (np.array): The input data target labels. shape=[num_examples, num_outputs]
            hidden_output (np.array): Output (activation) of hidden layer. shape=[num_examples, num_hidden_nodes]
            output (np.array): Output (activation) of output layer. shape=[num_examples, num_outputs]
            lr (float): The learning rate, amount to adjust weights.
        """
        
        # Calculate output layer error
        output_error = y - output
    
        # Calculate the output layer delta (error gradient with respect to the layer's pre-activation)
        # Note: the bias gradient is just the sum of this layer's deltas
        output_layer_delta = output_error * self.sigmoid_deriv(output)
        output_bias_delta = np.sum(output_layer_delta, axis=0)
        
        # Calculate hidden layer errors (from the output layer's weights and deltas)
        hidden_layer_error = output_layer_delta.dot(self.output_weights.T)
        
        # Calculate the hidden layer delta
        # Note: the bias gradient is just the sum of this layer's deltas
        hidden_layer_delta = hidden_layer_error * self.sigmoid_deriv(hidden_output)
        hidden_bias_delta = np.sum(hidden_layer_delta, axis=0)
        
        # Update the weights using the deltas, each layer's input and the learning rate
        # Change in weight = learning rate * layer's input * layer's delta
        self.output_weights += lr * hidden_output.T.dot(output_layer_delta)
        self.output_bias += lr * output_bias_delta
        
        self.hidden_weights += lr * x.T.dot(hidden_layer_delta)
        self.hidden_bias += lr * hidden_bias_delta
        
    def predict(self, x):
        """ Generate predictions on input data.
        
        Args:
            x (np.array): The input data to make predictions on. shape=[num_examples, num_features]
        
        Returns:
            preds (np.array): The predictions for the input data. shape=[num_examples]
        """
        
        # Pass the data through the network and generate outputs
        _, outputs = self.forward_pass(x)
        
        # Prediction is the output node with the highest value
        preds = np.argmax(outputs, axis=1)
        return preds
    
    def train(self, x, y, lr=0.01, epochs=200, eval_epochs=10):
        """ Train the network on the input data.
        
        Args:
            x (np.array): The input data to propagate through the network. shape=[num_examples, num_features]
            y (np.array): The input data target labels. shape=[num_examples, num_outputs]
            lr (float): The learning rate, amount to adjust weights.
            epochs (int): Number of epochs to train the network. Default=200
            eval_epochs (int): Evaluate the network on training data every this many epochs. Default=10
            
        Returns:
            train_errors (array): List of errors recorded during training.
            train_accuracies (array): List of accuracies recorded during training.
        """
        
        # For recording error and accuracy - for graph later
        train_errors, train_accuracies = [], []
        
        # Train for number of epochs
        for epoch in range(epochs + 1):
            # Forward pass
            hidden_output, outputs = self.forward_pass(x)
            # Backward pass/weight update
            self.backward_pass(x, y, hidden_output, outputs, lr)
            
            # Every 'eval_epochs' epochs record the error and accuracy on the training data
            if (epoch % eval_epochs) == 0:
                
                # Mean squared error over all errors this epoch
                error = np.square(y - outputs).mean() 
                train_errors.append(error)
   
                # Get the prediction i.e. the output with the highest value
                predictions = self.predict(x)
                # Get the actual labels
                actual_labels = np.argmax(y, axis=1)

                # If they match the prediction was correct
                correct_predictions = np.sum(predictions == actual_labels)
                accuracy = (100 / len(x)) * correct_predictions
                train_accuracies.append(accuracy)
                
                print("Epoch: " + str(epoch) + " Error: " + str(round(error, 5)) + " Accuracy: " + str(round(accuracy, 3)) + "%")

        return train_errors, train_accuracies
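One way to check that a backward pass like the one above computes correct gradients is a finite-difference gradient check. The sketch below is standalone (it does not use the `NeuralNetwork` class): it builds a tiny random network with the same structure and sigmoid activations, assumes the sum-of-squared-errors loss L = 0.5 * sum((y - y_hat)^2), and compares the backpropagated gradients for the output bias and the hidden weights against numerical estimates. It also compares a bias gradient built from the raw error `y - output` instead of the delta, which does not match.

```python
import numpy as np

# Tiny random network with the same structure as the class above
rng = np.random.default_rng(1)
n_examples, n_inputs, n_hidden, n_outputs = 8, 4, 6, 3
X = rng.normal(size=(n_examples, n_inputs))
Y = np.eye(n_outputs)[rng.integers(0, n_outputs, size=n_examples)]  # random one-hot targets
W1 = rng.uniform(-0.5, 0.5, size=(n_inputs, n_hidden))
b1 = rng.uniform(-0.5, 0.5, size=(1, n_hidden))
W2 = rng.uniform(-0.5, 0.5, size=(n_hidden, n_outputs))
b2 = rng.uniform(-0.5, 0.5, size=(1, n_outputs))


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def loss():
    # Sum-of-squared-errors loss using the current parameter arrays
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)
    return 0.5 * np.sum((Y - y_hat) ** 2)


def numeric_grad(param, eps=1e-6):
    # Central finite differences: nudge one entry of param at a time (in place)
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        original = param[idx]
        param[idx] = original + eps
        loss_plus = loss()
        param[idx] = original - eps
        loss_minus = loss()
        param[idx] = original  # restore the parameter
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad


# Deltas computed as in backward_pass (delta = -dL/d(pre-activation))
h = sigmoid(X @ W1 + b1)
y_hat = sigmoid(h @ W2 + b2)
output_delta = (Y - y_hat) * y_hat * (1 - y_hat)
hidden_delta = (output_delta @ W2.T) * h * (1 - h)

# Gradients of L: negated deltas, multiplied by each layer's input for the weights
grad_b2 = -output_delta.sum(axis=0)
grad_W1 = -X.T @ hidden_delta
# Bias gradient built from the raw error instead of the delta, for comparison
wrong_grad_b2 = -(Y - y_hat).sum(axis=0)

num_b2 = numeric_grad(b2).ravel()
num_W1 = numeric_grad(W1)

print(np.allclose(grad_b2, num_b2, atol=1e-6))        # True
print(np.allclose(grad_W1, num_W1, atol=1e-6))        # True
print(np.allclose(wrong_grad_b2, num_b2, atol=1e-6))  # raw-error bias gradient does not match
```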
    

### Iris - Train

First the network architecture needs to be defined by specifying the number of input, hidden and output nodes.
Then we can call the train function with the learning rate and the number of epochs.

Every 10 epochs the mean squared error and the accuracy of the predictions on the training data are recorded.
You should see the error drop and the accuracy increase smoothly(ish) over time.
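As a worked example of what gets recorded, the sketch below computes the mean squared error for two made-up output vectors, then lists the epochs at which `train` records values when run for 200 epochs with `eval_epochs=10`. Because the loop runs from epoch 0 to epoch 200 inclusive, 21 values are recorded, which is why the plotting cell multiplies each index by 10 to get the epoch.

```python
import numpy as np

# Two made-up examples: one-hot targets and network outputs (3 output nodes each)
targets = np.array([[1, 0, 0],
                    [0, 0, 1]])
outputs = np.array([[0.8, 0.1, 0.1],
                    [0.3, 0.6, 0.1]])

# Squared differences: row 1 -> 0.04 + 0.01 + 0.01, row 2 -> 0.09 + 0.36 + 0.81
# Mean over all 6 values: 1.32 / 6 = 0.22
mse = np.square(targets - outputs).mean()
print(round(float(mse), 4))  # 0.22

# Epochs at which train() records the error and accuracy
epochs, eval_epochs = 200, 10
recorded_epochs = [e for e in range(epochs + 1) if e % eval_epochs == 0]
print(len(recorded_epochs), recorded_epochs[-1])  # 21 200
```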

In [None]:
# Learning rate
learning_rate = 0.01
# Number of training epochs
num_epochs = 200

# Network architecture parameters
num_features = 4
num_classes = 3
num_hidden_nodes = 6

# Build the network
ann = NeuralNetwork(num_features, num_classes, num_hidden_nodes)
# Call the train function
train_errors, train_accuracies = ann.train(train_x, train_y, learning_rate, num_epochs)

# Plot the training accuracy and error
x_range = [i*10 for i in range(len(train_errors))]
figure, ax = plt.subplots(1, 2, figsize=(16, 6))
sns.lineplot(x=x_range, y=train_accuracies, color='b', ax=ax[0])
ax[0].title.set_text("Accuracy")
sns.lineplot(x=x_range, y=train_errors, color='b', ax=ax[1])
ax[1].title.set_text("Error")
plt.show()

### Iris - Test

Now we can test the trained model by making predictions on the test data.
The predict function returns the predicted class index for each example.
To calculate accuracy we just need to compare the predictions with the actual labels and count how many times they match.
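To make the comparison concrete, here it is on a handful of made-up predictions, followed by an optional per-class breakdown (not part of the original notebook) that shows which classes are misclassified most often. The names `preds` and `actual` are hypothetical and chosen so they do not overwrite the notebook's variables.

```python
import numpy as np

# Made-up predicted and actual class indices for 6 test examples
preds = np.array([0, 1, 2, 2, 1, 0])
actual = np.array([0, 1, 1, 2, 1, 2])

matches = preds == actual       # element-wise comparison -> boolean array
num_correct = np.sum(matches)   # each True counts as 1, so 4 here
accuracy = 100 * num_correct / len(actual)
print(round(accuracy, 2))       # 66.67

# Optional per-class accuracy: correct predictions per class / examples per class
correct_per_class = np.bincount(actual[matches], minlength=3)  # [1, 2, 1]
total_per_class = np.bincount(actual, minlength=3)             # [1, 3, 2]
print(correct_per_class / total_per_class)
```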

In [None]:
# Generate predictions
test_predictions = ann.predict(test_x)
# Get the actual labels
actual_labels = np.argmax(test_y, axis=1)

# If they match the prediction was correct
correct_predictions = np.sum(test_predictions == actual_labels)
test_accuracy = (100 / len(test_x)) * correct_predictions
print('Test Accuracy: ' + str(test_accuracy) + '%')

# ANN - Wheat Seeds Data

The Wheat Seeds data set involves predicting the variety of wheat from measurements of its seeds.
It is a 3-class classification problem with 210 examples (70 per class, so the classes are balanced)
and 7 feature variables.

The data is processed in the same way as the Iris data, but you should see from the plots that the different
classes of wheat seeds are much harder to separate.
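The notebook feeds the raw wheat measurements straight into the network. Because these features are on different scales, one common optional preparation step (not used in the original notebook) is to standardise each feature to zero mean and unit variance, using statistics from the training split only so that no information leaks from the test set. Large raw inputs can push sigmoid units into their flat regions, where gradients are small, so scaling can help training. A minimal numpy sketch with made-up values; the `demo_` names are hypothetical.

```python
import numpy as np

# Made-up training and test features on very different scales
# (column 0 in the tens, column 1 just below 1)
rng = np.random.default_rng(0)
demo_train = np.column_stack([rng.uniform(10, 21, size=20), rng.uniform(0.80, 0.92, size=20)])
demo_test = np.column_stack([rng.uniform(10, 21, size=5), rng.uniform(0.80, 0.92, size=5)])

# Per-feature statistics computed on the training data only
mean = demo_train.mean(axis=0)
std = demo_train.std(axis=0)

# Apply the same transformation to both splits
train_scaled = (demo_train - mean) / std
test_scaled = (demo_test - mean) / std

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
```

scikit-learn's `StandardScaler` does the same thing: `fit` it on the training data, then `transform` both splits.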

In [None]:
# Read data from csv
wheat_data = pd.read_csv("data/wheat_seeds.csv")
wheat_data.drop("Id", axis=1, inplace=True)

# Plot every pair of features on a 2D graph, coloured by class
# (bw_method sets the bandwidth of the density plots on the diagonal)
wheat_data_plt = sns.pairplot(wheat_data, hue="Class", diag_kws={'bw_method': 1.0})

# Replace the class with 0, 1 or 2 as appropriate
wheat_data['Class'] = wheat_data['Class'].map({'class-1': 0, 'class-2': 1, 'class-3': 2})

# Get labels and encode to one-hot
labels = wheat_data['Class'].to_numpy()
labels = np.eye(np.max(labels) + 1)[labels]

# Get Features
feature_names = ['Area', 'Perimeter', 'Compactness', 'Length of Kernel', 'Width of Kernel', 'Asymmetry Coefficient', 'Length of Kernel Groove']
features = wheat_data[feature_names].to_numpy()

# Split data to training and test data, 2/3 for training and 1/3 for testing
train_x, test_x, train_y, test_y = train_test_split(features, labels, test_size=0.33)
 
# Show the first 5 examples
wheat_data.head()

### Wheat Seeds - Train

In [None]:
# Learning rate
learning_rate = 0.001
# Number of training epochs
num_epochs = 2000

# Network architecture parameters
num_features = 7
num_classes = 3
num_hidden_nodes = 8

# Build the network
ann = NeuralNetwork(num_features, num_classes, num_hidden_nodes)
# Call the train function
train_errors, train_accuracies = ann.train(train_x, train_y, learning_rate, num_epochs, eval_epochs=100)

# Plot the training accuracy and error (recorded every 100 epochs)
x_range = [i*100 for i in range(len(train_errors))]
figure, ax = plt.subplots(1, 2, figsize=(16, 6))
sns.lineplot(x=x_range, y=train_accuracies, color='b', ax=ax[0])
ax[0].title.set_text("Accuracy")
sns.lineplot(x=x_range, y=train_errors, color='b', ax=ax[1])
ax[1].title.set_text("Error")
plt.show()

### Wheat Seeds - Test

In [None]:
# Generate predictions
test_predictions = ann.predict(test_x)
# Get the actual labels
actual_labels = np.argmax(test_y, axis=1)

# If they match the prediction was correct
correct_predictions = np.sum(test_predictions == actual_labels)
test_accuracy = (100 / len(test_x)) * correct_predictions
print('Test Accuracy: ' + str(test_accuracy) + '%')