# Neural Networks: Simple

## Chapter 7

### Dot Product

In [1]:
from typing import List
from math import exp

def dot_product(xs: List[float], ys: List[float]) -> float:
    """
    Dot product of two lists.
    """
    return sum(x * y for x, y in zip(xs, ys))
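
As a quick check of the function above, the following sketch repeats the definition so that it runs on its own and evaluates it on two small vectors. The input values are illustrative and are not taken from the original notebook.

```python
from typing import List


def dot_product(xs: List[float], ys: List[float]) -> float:
    """Sum of pairwise products, as defined above."""
    return sum(x * y for x, y in zip(xs, ys))


# 1*4 + 2*5 + 3*6 = 32
result = dot_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(result)

# zip stops at the shorter list, so a length mismatch is silently truncated:
# only 1*4 + 2*5 = 14 is computed here
truncated = dot_product([1.0, 2.0, 3.0], [4.0, 5.0])
print(truncated)
```

Because of the silent truncation, it may be worth checking that the input and weight lists have equal length before calling the function.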

### Sigmoid Activation Function

In [2]:
def sigmoid(x: float) -> float:
    """
    Sigmoid activation function.
    """
    return 1 / (1 + exp(-x))

def derivative_sigmoid(x: float) -> float:
    """
    Derivative of sigmoid activation function.
    """
    return sigmoid(x) * (1 - sigmoid(x))
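
One way to gain confidence in `derivative_sigmoid` is to compare it with a numerical approximation. The sketch below repeats both functions so that it is self-contained; `numeric_derivative` is a hypothetical helper introduced only for this check.

```python
from math import exp
from typing import Callable


def sigmoid(x: float) -> float:
    return 1 / (1 + exp(-x))


def derivative_sigmoid(x: float) -> float:
    return sigmoid(x) * (1 - sigmoid(x))


def numeric_derivative(f: Callable[[float], float], x: float, h: float = 1e-5) -> float:
    """Central-difference approximation of f'(x) (hypothetical helper)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# The analytic and numeric columns should agree to several decimal places
for x in [-2.0, 0.0, 3.0]:
    analytic = derivative_sigmoid(x)
    numeric = numeric_derivative(sigmoid, x)
    print(f"x={x:5.1f}  analytic={analytic:.6f}  numeric={numeric:.6f}")
```

Note that `exp(-x)` raises `OverflowError` for very large negative `x`, so inputs far outside the scaled range used in this notebook would need a guarded implementation.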

## Neuron

A neuron must have:

* weights
* delta
* learning_rate
* cache of last output
* activation_function
* derivative of activation_function

In [29]:
from typing import Callable


class Neuron:
    """
    Basic unit for a neural network. Must have weights, activation function, derivative of activation function,
    learning rate, cache of last output, and delta.
    """

    def __init__(
        self,
        weights: List[float],
        learning_rate: float,
        activation_function: Callable[[float], float],
        derivative_activation_function: Callable[[float], float],
    ) -> None:
        self.weights: List[float] = weights
        self.activation_function: Callable[[float], float] = activation_function
        self.derivative_activation_function: Callable[
            [float], float
        ] = derivative_activation_function
        self.learning_rate: float = learning_rate
        self.output_cache: float = 0.0
        self.delta: float = 0.0

    def output(self, inputs: List[float]) -> float:
        """
        Feed forward pass of neuron.
        """
        self.output_cache = dot_product(inputs, self.weights)
        return self.activation_function(self.output_cache)
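
The forward pass of a single neuron can be traced by hand. The sketch below mirrors what `Neuron.output` does without depending on the class; the weights and inputs are made-up values chosen so the arithmetic is easy to follow.

```python
from math import exp
from typing import List


def sigmoid(x: float) -> float:
    return 1 / (1 + exp(-x))


weights: List[float] = [0.5, -0.25]  # illustrative values
inputs: List[float] = [1.0, 2.0]

# Weighted sum: 0.5*1.0 + (-0.25)*2.0 = 0.0
# This pre-activation value is what the neuron keeps in output_cache.
weighted_sum = sum(w * x for w, x in zip(weights, inputs))

# The activation is what output() returns: sigmoid(0.0) = 0.5
activation = sigmoid(weighted_sum)
print(weighted_sum, activation)
```

Keeping the pre-activation value in the cache matters later: backpropagation evaluates the derivative of the activation function at exactly this value.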

## Layer

A layer must have:

* neurons
* output cache (after the activation function is applied to neuron's output)
* previous layer

In [61]:
from __future__ import annotations
from typing import Optional
from random import random


class Layer:
    """
    Base class for a neural network layer. Must know the previous layer, the neurons, and an output cache (after activation function).
    """

    def __init__(
        self,
        previous_layer: Optional[Layer],
        num_neurons: int,
        learning_rate: float,
        activation_function: Callable[[float], float],
        derivative_activation_function: Callable[[float], float],
    ) -> None:
        self.previous_layer: Optional[Layer] = previous_layer
        self.neurons: List[Neuron] = []

        # Add neurons
        for i in range(num_neurons):
            if previous_layer is None:
                random_weights: List[float] = []
            else:
                # Each neuron is connected to every neuron in previous layer
                random_weights = [random() for _ in range(len(previous_layer.neurons))]
            neuron: Neuron = Neuron(
                random_weights,
                learning_rate,
                activation_function,
                derivative_activation_function,
            )
            self.neurons.append(neuron)
        # Initialize empty output cache
        self.output_cache: List[float] = [0.0 for _ in range(num_neurons)]

    def outputs(self, inputs: List[float]) -> List[float]:
        """
        Calculate outputs of all neurons in layer.
        """
        if self.previous_layer is None:  # This is an input layer
            self.output_cache = inputs
        else:
            self.output_cache = [n.output(inputs) for n in self.neurons]
        return self.output_cache

    def calculate_deltas_for_output_layer(self, expected: List[float]) -> None:
        """
        Calculate deltas for output layer. Delta = f'(output_cache) * error
        
        f'(output_cache) is derivative of activation with respect to output cache
        error is expected - actual
        """
        for n in range(len(self.neurons)):
            # Call the derivative of the activation function on the neuron's cached weighted sum
            # and multiply by the error (expected - actual)
            self.neurons[n].delta = self.neurons[n].derivative_activation_function(
                self.neurons[n].output_cache
            ) * (expected[n] - self.output_cache[n])

    def calculate_deltas_for_hidden_layer(self, next_layer: Layer) -> None:
        """
        Calculate deltas for hidden layer. Delta = f'(output_cache) * (next_weights * next_deltas)
        """
        for index, neuron in enumerate(self.neurons):
            next_weights: List[float] = [n.weights[index] for n in next_layer.neurons]
            next_deltas: List[float] = [n.delta for n in next_layer.neurons]
            sum_weights_and_deltas: float = dot_product(next_weights, next_deltas)
            neuron.delta = (
                neuron.derivative_activation_function(neuron.output_cache)
                * sum_weights_and_deltas
            )
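
The two delta formulas can be worked through on a tiny example. This is a sketch with made-up numbers, written without the `Layer` class so that it runs on its own: two output neurons and one hidden neuron feeding them.

```python
from math import exp
from typing import List


def sigmoid(x: float) -> float:
    return 1 / (1 + exp(-x))


def derivative_sigmoid(x: float) -> float:
    return sigmoid(x) * (1 - sigmoid(x))


# Output layer: two neurons whose weighted sums (output_cache) are both 0.0
output_sums: List[float] = [0.0, 0.0]
actual: List[float] = [sigmoid(z) for z in output_sums]  # [0.5, 0.5]
expected: List[float] = [1.0, 0.0]

# Output delta = f'(weighted sum) * (expected - actual)
output_deltas: List[float] = [
    derivative_sigmoid(z) * (e - a)
    for z, e, a in zip(output_sums, expected, actual)
]

# Hidden neuron 0: the weight each output neuron assigns to its activation
next_weights: List[float] = [0.4, -0.2]
hidden_sum = 0.0

# Hidden delta = f'(weighted sum) * sum(next_weight * next_delta)
hidden_delta = derivative_sigmoid(hidden_sum) * sum(
    w * d for w, d in zip(next_weights, output_deltas)
)
print(output_deltas, hidden_delta)
```

With these numbers the output deltas come out to roughly 0.125 and -0.125, and the hidden delta to roughly 0.01875: the hidden neuron's share of the error is weighted by how strongly each output neuron depends on it.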

## Network

In [62]:
from typing import TypeVar, Tuple
from functools import reduce

# Output type of interpretation of neural network
T = TypeVar("T")


class Network:
    """
    Base class for neural network. Must keep state of layers.
    """

    def __init__(
        self,
        layer_structure: List[int],
        learning_rate: float,
        activation_function: Callable[[float], float] = sigmoid,
        derivative_activation_function: Callable[[float], float] = derivative_sigmoid,
    ) -> None:
        if len(layer_structure) < 3:
            raise ValueError(
                "Error: Should be at least 3 layers (input, hidden, output)"
            )

        self.layers: List[Layer] = []
        # Input layer
        input_layer: Layer = Layer(
            previous_layer=None,
            num_neurons=layer_structure[0],
            learning_rate=learning_rate,
            activation_function=activation_function,
            derivative_activation_function=derivative_activation_function,
        )
        self.layers.append(input_layer)

        # Hidden layers and output layer
        for previous, num_neurons in enumerate(layer_structure[1::]):
            next_layer = Layer(
                previous_layer=self.layers[previous],
                num_neurons=num_neurons,
                learning_rate=learning_rate,
                activation_function=activation_function,
                derivative_activation_function=derivative_activation_function,
            )
            self.layers.append(next_layer)

    def outputs(self, input: List[float]) -> List[float]:
        """
        Calculate outputs for entire network.
        """
        # Pushes input data to first layer, then output from first as input to second, output from second
        # as input to third and so on
        return reduce(
            lambda inputs, layer: layer.outputs(inputs),  # Function
            self.layers,  # Sequence
            input,  # Initial
        )

    def backpropagate(self, expected: List[float]) -> None:
        """
        Calculate delta (change) for each neuron in each layer. Move backwards through the network
        from output towards input.
        """
        # Calculate delta for output
        last_layer: int = len(self.layers) - 1
        self.layers[last_layer].calculate_deltas_for_output_layer(expected)

        # Calculate delta for hidden layers moving from end to beginning
        for l in range(last_layer - 1, 0, -1):
            # Pass in the next layer (the one closer to the output)
            self.layers[l].calculate_deltas_for_hidden_layer(self.layers[l + 1])

    def update_weights(self) -> None:
        """
        Apply formula to update the weights of all neurons.
        
        weight = weight + learning_rate * input * neuron_delta
        """
        for layer in self.layers[1:]:
            for neuron in layer.neurons:
                for w in range(len(neuron.weights)):
                    neuron.weights[w] = neuron.weights[w] + (
                        neuron.learning_rate
                        * layer.previous_layer.output_cache[w]
                        * neuron.delta
                    )

    def train(self, inputs: List[List[float]], expecteds: List[List[float]]) -> None:
        """
        Train the network to map from the inputs to the expecteds using the neuron's weights.
        """
        for location, xs in enumerate(inputs):
            ys: List[float] = expecteds[location]
            # Forward pass; its side effect is filling the caches that backpropagation reads
            outs: List[float] = self.outputs(xs)
            # Backpropagate to calculate deltas
            self.backpropagate(ys)
            # Update weights with deltas
            self.update_weights()

    def validate(
        self,
        inputs: List[List[float]],
        expecteds: List[T],
        interpret_output: Callable[[List[float]], T],
    ) -> Tuple[int, int, float]:
        """
        Validate the network's outputs on a dataset. Returns the number of correct trials, the total
        number of trials, and the fraction correct. Only applicable to classification tasks.
        The callable must interpret the outputs in the problem context.
        """
        correct: int = 0
        for input, expected in zip(inputs, expecteds):
            # Calculate output from input
            outs: List[float] = self.outputs(input)
            # Interpret the floats in the data context.
            result: T = interpret_output(outs)

            if result == expected:
                correct += 1

        percentage: float = correct / len(inputs)
        return correct, len(inputs), percentage
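
To see the update rule `weight = weight + learning_rate * input * neuron_delta` in isolation, the following sketch trains a single sigmoid neuron on one example. It is a simplified stand-in for `train`, not the `Network` class itself, and the starting weights, input, and learning rate are arbitrary assumptions.

```python
from math import exp
from typing import List


def sigmoid(x: float) -> float:
    return 1 / (1 + exp(-x))


def derivative_sigmoid(x: float) -> float:
    return sigmoid(x) * (1 - sigmoid(x))


weights: List[float] = [0.1, -0.3]  # illustrative starting weights
inputs: List[float] = [1.0, 0.5]
expected = 1.0
learning_rate = 0.5

errors: List[float] = []
for _ in range(100):
    # Forward pass
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    actual = sigmoid(weighted_sum)
    errors.append(abs(expected - actual))
    # Same delta formula as the output layer
    delta = derivative_sigmoid(weighted_sum) * (expected - actual)
    # weight = weight + learning_rate * input * delta
    weights = [w + learning_rate * x * delta for w, x in zip(weights, inputs)]

print(f"first error: {errors[0]:.3f}, last error: {errors[-1]:.3f}")
```

The error shrinks with every pass because each update nudges the weighted sum toward the value that produces the expected output.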

## Feature Scaling

We will scale inputs to between 0 and 1 for input to the neural network.

In [63]:
def normalize_by_feature_scaling(dataset: List[List[float]]) -> None:
    for col_num in range(len(dataset[0])):
        column: List[float] = [row[col_num] for row in dataset]
        maximum, minimum = max(column), min(column)
        for row_num in range(len(column)):
            dataset[row_num][col_num] = (dataset[row_num][col_num] - minimum) / (maximum - minimum)
            
d = [[2, 4, 3, 8], [4, 6, 2, 7], [3, 5, 2.5, 7.5]]
normalize_by_feature_scaling(d)
d

[[0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
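
The function above divides by `maximum - minimum`, which is zero when every value in a column is identical. The sketch below is a hypothetical variant (the name and the choice to map a constant column to 0.0 are assumptions, not part of the original notebook) that guards against that case.

```python
from typing import List


def normalize_by_feature_scaling_safe(dataset: List[List[float]]) -> None:
    """
    Min-max scale each column in place. Hypothetical variant of the function
    above: a constant column is set to 0.0 rather than dividing by zero.
    """
    for col_num in range(len(dataset[0])):
        column = [row[col_num] for row in dataset]
        maximum, minimum = max(column), min(column)
        spread = maximum - minimum
        for row in dataset:
            row[col_num] = (row[col_num] - minimum) / spread if spread else 0.0


data = [[2.0, 7.0], [4.0, 7.0], [3.0, 7.0]]
normalize_by_feature_scaling_safe(data)
print(data)  # [[0.0, 0.0], [1.0, 0.0], [0.5, 0.0]]
```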

In [64]:
reduce(lambda input, output: input + output, [4, 5, 6], 3)

18
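
The same `reduce` pattern drives `Network.outputs`: the accumulator is the current list of activations, and each step feeds it through the next layer. A minimal sketch, using plain functions as stand-ins for layers (the names `double` and `add_one` are invented for illustration):

```python
from functools import reduce
from typing import Callable, List

# Stand-ins for layers: each "layer" is just a function from list to list
double: Callable[[List[float]], List[float]] = lambda xs: [2 * x for x in xs]
add_one: Callable[[List[float]], List[float]] = lambda xs: [x + 1 for x in xs]

layers = [double, add_one, double]

# Equivalent to double(add_one(double([1.0, 2.0])))
result = reduce(lambda inputs, layer: layer(inputs), layers, [1.0, 2.0])
print(result)  # [6.0, 10.0]
```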

## Testing with Classic Iris Dataset

This dataset serves as a benchmark for classification models. It is simple, yet will let us ascertain if our model is behaving as expected.

In [65]:
import csv
from random import shuffle

def read_iris():
    """
    Read in the iris data file as list of lists.
    """
    iris_parameters: List[List[float]] = []
    iris_classifications: List[List[float]] = []
    iris_species: List[str] = []
        
    with open('iris.csv', mode='r') as iris_file:
        irises: List = list(csv.reader(iris_file))
        shuffle(irises) # Mix randomly
        for iris in irises:
            # The first 4 columns are the features
            parameters: List[float] = [float(n) for n in iris[:4]]
            iris_parameters.append(parameters)
            species: str = iris[4]
            if species == 'Iris-setosa':
                iris_classifications.append([1.0, 0.0, 0.0])
            elif species == 'Iris-versicolor':
                iris_classifications.append([0.0, 1.0, 0.0])
            elif species == 'Iris-virginica':
                iris_classifications.append([0.0, 0.0, 1.0])
            iris_species.append(species)
            
    normalize_by_feature_scaling(iris_parameters)
    
    return iris_parameters, iris_classifications, iris_species

iris_parameters, iris_classifications, iris_species = read_iris()

In [66]:
from collections import Counter
Counter(iris_species)

Counter({'Iris-versicolor': 50, 'Iris-setosa': 50, 'Iris-virginica': 50})

In [67]:
max([n[0] for n in iris_parameters])
min([n[0] for n in iris_parameters])

max([n[1] for n in iris_parameters])
min([n[1] for n in iris_parameters])

1.0

0.0

1.0

0.0

## Instantiate Network

In [68]:
iris_network: Network = Network(layer_structure=[4, 6, 3], learning_rate=0.1)

### Implement Interpretation of Model Output for Dataset

In [114]:
def iris_interpret_output(output: List[float]) -> str:
    """
    Translate float output from network to string species classification.
    """
    if output.index(max(output)) == 0:
        return 'Iris-setosa'
    elif output.index(max(output)) == 1:
        return 'Iris-versicolor'
    elif output.index(max(output)) == 2:
        return 'Iris-virginica'
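
The chain of `if`/`elif` branches above picks the label whose output is largest. An equivalent formulation, sketched here with a hypothetical helper name, looks the label up by the index of the maximum; the label order is assumed to match the one-hot encoding used in `read_iris`.

```python
from typing import List

IRIS_SPECIES: List[str] = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]


def interpret_by_argmax(output: List[float], labels: List[str]) -> str:
    """Return the label at the position of the largest output (hypothetical helper)."""
    return labels[output.index(max(output))]


print(interpret_by_argmax([0.05, 0.80, 0.30], IRIS_SPECIES))  # Iris-versicolor
```

If two outputs tie, `list.index` returns the first position, which matches the behavior of the branch-based version.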

In [115]:
train_size = 120

iris_trainers: List[List[float]] = iris_parameters[:train_size]
iris_trainers_expected: List[List[float]] = iris_classifications[:train_size]

In [116]:
for _ in range(50):
    iris_network.train(inputs=iris_trainers, expecteds=iris_trainers_expected)

In [117]:
iris_testers = iris_parameters[train_size:]
iris_testers_species = iris_species[train_size:]

iris_results = iris_network.validate(
    inputs=iris_testers,
    expecteds=iris_testers_species,
    interpret_output=iris_interpret_output,
)

In [118]:
print(f"{iris_results[0]} correct out of {iris_results[1]} total. {iris_results[2] * 100:.2f}%")

28 correct out of 30 total. 93.33%


In [119]:
iris_network.outputs(input=iris_testers[0])

[0.9883140840958653, 0.17155672732408708, 2.8872456012232333e-05]

In [120]:
iris_testers_expected = iris_classifications[train_size:]
iris_testers_expected[0]

[1.0, 0.0, 0.0]

In [121]:
species_dict = {0: 'Iris-setosa', 1:'Iris-versicolor', 2: 'Iris-virginica'}
out = [iris_network.outputs(iris) for iris in iris_testers]
probs = [max(o) for o in out]
classes = [species_dict[iris.index(max(iris))] for iris in out]

for guess, correct, p in zip(classes, iris_testers_species, probs):
    print(f'{"True" if guess == correct else "False":5} Guess: {guess:18} Actual: {correct:18} Probability: {p:.2f}.')

True  Guess: Iris-setosa        Actual: Iris-setosa        Probability: 0.99.
True  Guess: Iris-setosa        Actual: Iris-setosa        Probability: 0.97.
True  Guess: Iris-setosa        Actual: Iris-setosa        Probability: 0.99.
True  Guess: Iris-setosa        Actual: Iris-setosa        Probability: 0.96.
True  Guess: Iris-virginica     Actual: Iris-virginica     Probability: 0.86.
False Guess: Iris-versicolor    Actual: Iris-virginica     Probability: 0.60.
True  Guess: Iris-setosa        Actual: Iris-setosa        Probability: 0.99.
False Guess: Iris-virginica     Actual: Iris-versicolor    Probability: 0.47.
True  Guess: Iris-virginica     Actual: Iris-virginica     Probability: 0.89.
True  Guess: Iris-virginica     Actual: Iris-virginica     Probability: 0.80.
True  Guess: Iris-versicolor    Actual: Iris-versicolor    Probability: 0.80.
True  Guess: Iris-versicolor    Actual: Iris-versicolor    Probability: 0.67.
True  Guess: Iris-setosa        Actual: 

## Classifying Wine Problem

In [125]:

def read_wine():
    """
    Read in the wine data set and return features, expected outputs, and species names.
    """
    wine_parameters: List[List[float]] = []
    wine_classifications: List[List[float]] = []
    wine_species: List[int] = []
    with open('wine.csv', mode='r') as wine_file:
        wines: List = list(csv.reader(wine_file, quoting=csv.QUOTE_NONNUMERIC))
        shuffle(wines) # get our lines of data in random order
        for wine in wines:
            parameters: List[float] = [float(n) for n in wine[1:14]]
            wine_parameters.append(parameters)
            species: int = int(wine[0])
            if species == 1:
                wine_classifications.append([1.0, 0.0, 0.0])
            elif species == 2:
                wine_classifications.append([0.0, 1.0, 0.0])
            else:
                wine_classifications.append([0.0, 0.0, 1.0])
            wine_species.append(species)
    normalize_by_feature_scaling(wine_parameters)
    return wine_parameters, wine_classifications, wine_species

In [126]:
wine_parameters, wine_classifications, wine_species = read_wine()

def wine_interpret_output(output: List[float]) -> int:
    """
    Interpret float outputs from wine neural network as species classifications.
    """
    if max(output) == output[0]:
        return 1
    elif max(output) == output[1]:
        return 2
    else:
        return 3

There are 13 features in this dataset. We should use a larger, deeper neural network for this problem.

In [127]:
len(wine_parameters[0])

13

There are 178 examples. We'll use the same training size of 120 observations.

In [129]:
len(wine_parameters)

178

We need as many neurons in the input layer as there are features. We need as many neurons in the output layer as there are species. The learning rate is set rather high in this example. Our network has a total of 4 layers, 1 input, 2 hidden, and 1 output. 
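
To get a feel for how the layer structure determines model size, the sketch below counts the weights in a fully connected network of the kind built in this notebook, where each neuron has one weight per neuron in the previous layer and no bias. `count_weights` is a hypothetical helper, not part of the `Network` class.

```python
from typing import List


def count_weights(layer_structure: List[int]) -> int:
    """Number of weights in a fully connected network without biases (hypothetical helper)."""
    return sum(
        previous * current
        for previous, current in zip(layer_structure, layer_structure[1:])
    )


print(count_weights([4, 6, 3]))       # 4*6 + 6*3 = 42
print(count_weights([13, 8, 16, 3]))  # 13*8 + 8*16 + 16*3 = 280
```

The 13-8-16-3 structure used for the wine data therefore has several times as many weights to learn as the 4-6-3 iris network.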

In [147]:
wine_network = Network(layer_structure=[13, 8, 16, 3], learning_rate=0.8)
print(wine_network)

<__main__.Network object at 0x1264b1400>


In [148]:
wine_trainers = wine_parameters[:train_size]
wine_trainers_expected = wine_classifications[:train_size]

for _ in range(50):
    wine_network.train(inputs=wine_trainers, expecteds=wine_trainers_expected)

In [149]:
wine_testers = wine_parameters[train_size:]
wine_testers_species = wine_species[train_size:]

wine_results = wine_network.validate(
    inputs=wine_testers,
    expecteds=wine_testers_species,
    interpret_output=wine_interpret_output,
)

In [150]:
print(f"{wine_results[0]} correct out of {wine_results[1]} total. {wine_results[2] * 100:.2f}%")

53 correct out of 58 total. 91.38%


In [151]:
species_dict = {0: 1, 1:2, 2:3}
out = [wine_network.outputs(wine) for wine in wine_testers]
probs = [max(o) for o in out]
classes = [species_dict[wine.index(max(wine))] for wine in out]

for guess, correct, p in zip(classes, wine_testers_species, probs):
    print(f'{"True" if guess == correct else "False":5} Guess: {guess:2} Actual: {correct:2} Probability: {p:.2f}.')

True  Guess:  3 Actual:  3 Probability: 0.99.
True  Guess:  2 Actual:  2 Probability: 1.00.
True  Guess:  3 Actual:  3 Probability: 0.98.
True  Guess:  1 Actual:  1 Probability: 0.98.
True  Guess:  3 Actual:  3 Probability: 0.98.
True  Guess:  1 Actual:  1 Probability: 0.98.
True  Guess:  1 Actual:  1 Probability: 0.98.
True  Guess:  1 Actual:  1 Probability: 0.98.
True  Guess:  2 Actual:  2 Probability: 1.00.
True  Guess:  2 Actual:  2 Probability: 0.59.
False Guess:  1 Actual:  3 Probability: 0.58.
True  Guess:  3 Actual:  3 Probability: 0.99.
False Guess:  1 Actual:  2 Probability: 0.53.
True  Guess:  3 Actual:  3 Probability: 0.99.
True  Guess:  1 Actual:  1 Probability: 0.96.
True  Guess:  2 Actual:  2 Probability: 0.94.
True  Guess:  1 Actual:  1 Probability: 0.94.
True  Guess:  2 Actual:  2 Probability: 0.98.
True  Guess:  2 Actual:  2 Probability: 1.00.
True  Guess:  2 Actual:  2 Probability: 0.85.
True  Guess:  3 Actual:  3 Probability

## Speeding Up Neural Networks

Dedicated hardware has resulted in significant neural network speed-ups.

* GPU: graphics processing units for highly parallelizable operations
* SIMD: single instruction, multiple data (vector) instructions process multiple pieces of data at once; NumPy can use them
* NumPy automatically chooses optimizations based on the underlying architecture.
* Use optimized implementations when working on actual problems.



# Neural Network Problems

The largest shortcoming may be a lack of interpretability. We know that neural networks achieve high accuracy, but little about why they are effective (this holds across different ML models).

We can look at individual weights in the network, but this does not provide much insight because of the large number of weights and multiple hidden layers. 

Neural networks require massive amounts of data to achieve high accuracy. For more complex applications - such as image classification - a neural network requires orders of magnitude more data than for our simple problem because it must learn many more features. This application - image classification - would also require a significantly larger network to learn the features. 

Neural networks are computationally expensive because of the large number of calculations. SIMD and GPU hardware are essential to training modern neural networks.

Using the network is not as computationally intensive as training. After you train your network, inference - making predictions - does not require significant resources.

# Extensions

There are many other types of neural networks for tasks such as image classification and speech recognition.

* Convolutional neural networks that process neighboring floats simultaneously
* Recurrent neural networks that process data moving from one float to the next, considering the data sequence.
* Bias added to neurons (in addition to weights) provides a constant input, allowing the next layer's output to represent more functions. (Bias is still learned through training.)
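
The effect of a bias term can be illustrated with a single neuron. This is a minimal sketch, not an extension of the `Neuron` class above; `neuron_output` and the numeric values are assumptions made for the example.

```python
from math import exp
from typing import List


def sigmoid(x: float) -> float:
    return 1 / (1 + exp(-x))


def neuron_output(inputs: List[float], weights: List[float], bias: float = 0.0) -> float:
    """Weighted sum plus a bias term, passed through sigmoid (illustrative only)."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


zeros = [0.0, 0.0]
weights = [0.7, -1.3]

# Without a bias, an all-zero input always yields sigmoid(0) = 0.5,
# no matter what the weights are
print(neuron_output(zeros, weights))
# With a bias, the neuron can shift its output for the same input
print(neuron_output(zeros, weights, bias=2.0))
```

In a full implementation the bias would be updated during training alongside the weights, typically by treating it as a weight on a constant input of 1.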

# Applications

Neural networks did not become commonplace until later this decade, thanks to the availability of data and computer hardware (and some algorithm changes). They are one of the highest-potential areas in machine learning because they are effective.

## Practical Voice Recognition

## Image Recognition and Tagging

## Optical Character Recognition

## Recommendation systems

## Outcome of real-world events

## Any problem that can be represented as an unknown function to be learned

The hardest part of working with neural networks may be deciding on the structure of the network. Empirical results are a more effective guide than theory for neural networks.