# Cooking a simple neural network library

## Ingredients

- `numpy`
- [a loss function](#Loss-function)
- [some layers](#Layers)
- [a neural net](#Neural-network)
- [an optimizer](#Optimizer)
- [a batch data provider](#Batch-generator)
- [a training routine](#Training)
- \+ an [application exercise](#Application-exercise)

Hopefully by the end of this tutorial you will have an understanding of the building blocks needed for training (deep) neural networks. 

## Foreword

We will rely purely on numpy for this tutorial. Make sure to import it here.

In [None]:
import numpy as np

#### Object Oriented Python

Object-oriented Python, a.k.a. _classes_, will be used extensively in this tutorial.  
For those not familiar with Python classes: you will only need to fill in code **within** existing class **methods**, and will **not** have to **write any class** from scratch.  

If you want to know more about Python classes, here is a step by step [tutorial](https://aboucaud.github.io/slides/2016/python-classes).

#### Type hints

This notebook uses a Python 3 feature called ***type hints*** or ***type annotations*** (see [PEP 484](https://www.python.org/dev/peps/pep-0484/) for function annotations and [PEP 0526](https://www.python.org/dev/peps/pep-0526/) for variable annotations). They act like optional static typing, since Python will still run if a type does not match, but they have two main advantages IMO:
- they make sure you understand what you're doing
- they act as documentation for an external user

The types for the base Python objects (lists, dicts, iterables) can be found in the [`typing` library](https://docs.python.org/3/library/typing.html).
For instance, here are all the needed imports for this tutorial.

In [None]:
from typing import (Dict, Tuple, Callable, 
                    Sequence, Iterator, NamedTuple)

Any other Python object can serve as a type. We will use `numpy.ndarray` in this tutorial to mock a tensor. We thus create a `Tensor` alias to use as a type hint throughout the code, and a `Func` alias for a function that acts element-wise on a tensor and returns a tensor.

In [None]:
from numpy import ndarray as Tensor

Func = Callable[[Tensor], Tensor]
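
As a small illustration of how these aliases read in practice, here is a sketch with two hypothetical helpers, `apply_twice` and `double`, which are not part of the library built below. It also shows that Python does not enforce the hints at runtime.

```python
from typing import Callable

import numpy as np
from numpy import ndarray as Tensor

Func = Callable[[Tensor], Tensor]


def double(x: Tensor) -> Tensor:
    """Hypothetical element-wise function, used only for this demo."""
    return 2 * x


def apply_twice(f: Func, x: Tensor) -> Tensor:
    """Hypothetical helper: apply an element-wise function two times."""
    return f(f(x))


result = apply_twice(double, np.array([1.0, 2.0]))
print(result)

# The hints are not checked at runtime: passing a plain int still runs,
# even though a static checker would flag this call.
unchecked = apply_twice(double, 3)
print(unchecked)
```

The annotations document the intent (tensors in, tensors out) without changing how the code executes.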

#### Checking the type

Although type hints are optional in Python, they can still be used to check the consistency of the code. A tool called [`mypy`](https://github.com/python/mypy) does exactly that (it is not used in this tutorial).

Check out the [doc](http://mypy-lang.org/) if you are interested.

---

## Loss function

A loss function measures how good our predictions are.
We can use it to adjust the parameters of our network.

Here is a generic loss class. It implements the computation of **the loss** from the true label and the predicted one, as well as **the gradient of the loss** for the backpropagation.

In [None]:
class Loss:
    def loss(self, predicted: Tensor, actual: Tensor) -> float:
        raise NotImplementedError

    def grad(self, predicted: Tensor, actual: Tensor) -> Tensor:
        raise NotImplementedError

### Exercise #1 - mean square error loss

***3 min*** - *Implement the `MeanSquareError` class*

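For info, the loss function is

$$\mathrm{MSE}(y_{true}, y_{pred}) = \sum \left(y_{pred} - y_{true}\right)^2$$

Note that, as written, this is the summed squared error: there is no $1/N$ normalization despite the name.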



In [None]:
class MeanSquareError(Loss):
    def loss(self, predicted: Tensor, actual: Tensor) -> float:
        return ...
    
    def grad(self, predicted: Tensor, actual: Tensor) -> Tensor:
        return ...
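
One way to check a `grad` implementation without looking at a solution is to compare it against finite differences. The sketch below does this for a hypothetical `QuarticError` loss, chosen only so as not to give away the exercise. The helper `numerical_grad` and the step `eps` are assumptions of this example, not part of the original tutorial.

```python
import numpy as np
from numpy import ndarray as Tensor


class QuarticError:
    """Hypothetical loss, used only to demonstrate the gradient check."""

    def loss(self, predicted: Tensor, actual: Tensor) -> float:
        return float(np.sum((predicted - actual) ** 4))

    def grad(self, predicted: Tensor, actual: Tensor) -> Tensor:
        return 4 * (predicted - actual) ** 3


def numerical_grad(loss_fn, predicted: Tensor, actual: Tensor,
                   eps: float = 1e-6) -> Tensor:
    """Central finite differences of the loss w.r.t. each prediction."""
    grad = np.zeros_like(predicted, dtype=float)
    for idx in np.ndindex(predicted.shape):
        shift = np.zeros_like(predicted, dtype=float)
        shift[idx] = eps
        plus = loss_fn.loss(predicted + shift, actual)
        minus = loss_fn.loss(predicted - shift, actual)
        grad[idx] = (plus - minus) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
predicted = rng.normal(size=(3, 2))
actual = rng.normal(size=(3, 2))

loss_fn = QuarticError()
grads_match = np.allclose(loss_fn.grad(predicted, actual),
                          numerical_grad(loss_fn, predicted, actual),
                          atol=1e-4)
print(grads_match)
```

The same check should work on your `MeanSquareError` once both methods are filled in: swap it in for `QuarticError`.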

## Layers

Our neural nets will be made up of layers.
Each layer needs to pass its inputs forward
and propagate gradients backward. For example,
a neural net might look like

`inputs -> Linear -> Tanh -> Linear -> output`

In [None]:
class Layer:
    def __init__(self) -> None:
        self.params: Dict[str, Tensor] = {}
        self.grads: Dict[str, Tensor] = {}

    def forward(self, inputs: Tensor) -> Tensor:
        raise NotImplementedError

    def backward(self, grad: Tensor) -> Tensor:
        raise NotImplementedError

### Exercise #2 - linear layer

***5 min*** - *Implement the `forward` and `backward` methods of the linear layer.*

In [None]:
class Linear(Layer):
    """
    Inputs are of size (batch_size, input_size)
    Outputs are of size (batch_size, output_size)
    """
    def __init__(self, input_size: int, output_size: int) -> None:
        # Call the constructor of the base class Layer
        super().__init__()
        # Initialize the weights and bias with random values
        self.params["w"] = np.random.randn(input_size, output_size)
        self.params["b"] = np.random.randn(output_size)

    def forward(self, inputs: Tensor) -> Tensor:
        """
        inputs shape is (batch_size, input_size)
        """
        self.inputs = inputs
        # Compute here the feed forward pass
        return ... 
        

    def backward(self, grad: Tensor) -> Tensor:
        """
        grad shape is (batch_size, output_size)
        """
        # Compute here the gradients of the layer parameters
        self.grads["w"] = ...
        self.grads["b"] = ...
        # Compute here the backward pass (gradient w.r.t. the inputs)
        return ...

### Activation layers

In [None]:
class Activation(Layer):
    """
    An activation layer just applies a function
    elementwise to its inputs
    """
    def __init__(self, f: Func, f_prime: Func) -> None:
        super().__init__()
        self.f = f
        self.f_prime = f_prime

    def forward(self, inputs: Tensor) -> Tensor:
        self.inputs = inputs
        return self.f(inputs)

    def backward(self, grad: Tensor) -> Tensor:
        """
        if y = f(x) and x = g(z)
        then dy/dz = f'(x) * g'(z)
        """
        return self.f_prime(self.inputs) * grad
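
To make the chain rule in `backward` concrete, here is a sketch of the two computations an activation layer performs, written out by hand for a ReLU. The ReLU is an assumption of this example (it is not used elsewhere in the tutorial) and was picked so as not to give away the next exercise.

```python
import numpy as np
from numpy import ndarray as Tensor


def relu(x: Tensor) -> Tensor:
    """Element-wise max(x, 0)."""
    return np.maximum(x, 0.0)


def relu_prime(x: Tensor) -> Tensor:
    """Derivative of the ReLU: 1 where x > 0, else 0."""
    return (x > 0).astype(float)


x = np.array([[-1.0, 2.0], [3.0, -4.0]])
# Gradient arriving from the next layer, same shape as the outputs
upstream = np.array([[10.0, 20.0], [30.0, 40.0]])

# Forward pass: apply f element-wise
outputs = relu(x)
# Backward pass: multiply the incoming gradient by f'(inputs), element-wise
downstream = relu_prime(x) * upstream

print(outputs)
print(downstream)
```

The gradient passes through unchanged where the input was positive and is zeroed elsewhere, which is exactly `f_prime(inputs) * grad`.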

### Exercise #3 - tanh and sigmoid

***5 min*** - *Implement the hyperbolic tangent and sigmoid layers and their derivatives.*


In [None]:
def tanh(x: Tensor) -> Tensor:
    # Write here the tanh function
    return ...  

def tanh_prime(x: Tensor) -> Tensor:
    # Write here the derivative of the tanh
    return ...  

class Tanh(Activation):
    def __init__(self) -> None:
        super().__init__(tanh, tanh_prime)
        

def sigmoid(x: Tensor) -> Tensor:
    # Write here the sigmoid function
    return ...  

def sigmoid_prime(x: Tensor) -> Tensor:
    # Write here the derivative of the sigmoid
    return ...  

class Sigmoid(Activation):
    def __init__(self) -> None:
        super().__init__(sigmoid, sigmoid_prime)

## Neural network

A neural net is a collection of layers and should behave like one, with a `forward` and a `backward` pass.

In addition, we add a method `params_and_grads` that will be used by the optimizer to update the values of the weights and biases of each layer.

In [None]:
class NeuralNet:
    def __init__(self, layers: Sequence[Layer]) -> None:
        self.layers = layers

    def forward(self, inputs: Tensor) -> Tensor:
        """
        The forward pass takes the layers in order
        """
        for layer in self.layers:
            inputs = layer.forward(inputs)
        return inputs

    def backward(self, grad: Tensor) -> Tensor:
        """
        The backward pass is the other way around
        """
        for layer in reversed(self.layers):
            grad = layer.backward(grad)
        return grad

    def params_and_grads(self) -> Iterator[Tuple[Tensor, Tensor]]:
        for layer in self.layers:
            for name, param in layer.params.items():
                grad = layer.grads[name]
                yield param, grad

## Optimizer

The role of the optimizer is to adjust the network parameters based on the gradients computed during backpropagation.

The main attribute of an optimizer is the _learning rate_ (a.k.a. `lr`), which sets the size of the step taken along the (negative) gradient direction.

In [None]:
class Optimizer:
    def step(self, net: NeuralNet) -> None:
        raise NotImplementedError

### Exercise #4 - Stochastic Gradient Descent

***2 min*** - write the optimizer step

Here we have a very basic implementation of a _Stochastic Gradient Descent_ (a.k.a. `SGD`). 

The step that needs to be written iterates over the parameters of every layer of the neural network and updates them in the direction _opposite_ to the gradient.

In [None]:
class SGD(Optimizer):
    def __init__(self, lr: float = 0.01) -> None:
        self.lr = lr

    def step(self, net: NeuralNet) -> None:
        for param, grad in net.params_and_grads(): 
            # Write here the parameters update
            ...
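
One Python pitfall is worth knowing before writing the update: `params_and_grads` yields references to the arrays stored in each layer, so only an in-place operation changes what the layer holds. The sketch below illustrates the difference on a plain dictionary with a dummy `+ 1.0` operation. The dictionary and the operation are placeholders for this demo, not the actual update rule.

```python
import numpy as np

params = {"w": np.zeros(3), "b": np.zeros(2)}

# Rebinding the loop variable creates new arrays;
# the arrays stored in the dictionary are left untouched.
for param in params.values():
    param = param + 1.0

rebinding_changed = bool(params["w"].any())
print(rebinding_changed)

# An in-place operator modifies the stored arrays themselves.
for param in params.values():
    param += 1.0

inplace_changed = bool(params["w"].all())
print(inplace_changed)
```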

## Batch generator

It can be costly to compute the gradients and update the weights after every entry of the training dataset. In order to minimize such computational cost, the inputs of the network are traditionally fed in batches and the gradients are thus averages over those batches of data.

A batch size of 32 is a common default in many training setups. A [study](https://arxiv.org/abs/1804.07612) reports that small batch sizes (32 or fewer) gave the best training stability and generalization in its experiments.

During an epoch the network will iterate over the whole dataset. Adding some shuffling in the process ensures the batches are not fed exactly in the same order at each epoch.

In [None]:
Batch = NamedTuple("Batch", [("inputs", Tensor), ("targets", Tensor)])


class DataIterator:
    def __call__(self, inputs: Tensor, targets: Tensor) -> Iterator[Batch]:
        raise NotImplementedError

        
class BatchIterator(DataIterator):
    def __init__(self, batch_size: int = 32, shuffle: bool = True) -> None:
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __call__(self, inputs: Tensor, targets: Tensor) -> Iterator[Batch]:
        starts = np.arange(0, len(inputs), self.batch_size)
        if self.shuffle:
            np.random.shuffle(starts)

        for start in starts:
            end = start + self.batch_size
            batch_inputs = inputs[start:end]
            batch_targets = targets[start:end]
            yield Batch(batch_inputs, batch_targets)
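
To see what the slicing in `BatchIterator` produces, here is a standalone sketch of the same logic on a toy array of 10 samples with a batch size of 4. The toy data and the fixed seed are assumptions of this example.

```python
import numpy as np

inputs = np.arange(10).reshape(10, 1)
batch_size = 4

# Start index of each batch: 0, 4 and 8
starts = np.arange(0, len(inputs), batch_size)

# Shuffling reorders the batches, not the samples inside each batch
np.random.seed(0)
np.random.shuffle(starts)

batches = [inputs[start:start + batch_size] for start in starts]
sizes = sorted(len(batch) for batch in batches)
print(sizes)
```

When the dataset size is not a multiple of the batch size, one batch is simply smaller than the others, and every sample is still visited exactly once per epoch.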

## Training

The training routine uses all objects defined above and executes actions **in the right order** to train the neural network.

Since the dataset is usually small compared to the number of free parameters of the neural net, the training needs to go through the dataset multiple times. Each full pass is called an epoch, and the number of epochs has to be specified.

### Exercise #5 - build the training routine

***5 min*** - write the sequential steps needed for training on each batch

_Hints_:
- feed forward
- compute the loss and the gradients
- feed backwards
- update the net

In [None]:
def train(net: NeuralNet, inputs: Tensor, targets: Tensor,
          loss: Loss = MeanSquareError(), 
          optimizer: Optimizer = SGD(),
          iterator: DataIterator = BatchIterator(),
          num_epochs: int = 5000) -> None:
    for epoch in range(num_epochs):
        epoch_loss = 0.0
        for batch in iterator(inputs, targets):
            # Write here the various steps (in order) needed
            # for each batch
            ...
        # Print status every 50 epochs
        if epoch % 50 == 0:
            print(epoch, epoch_loss)

## Application exercise

Now that you have built your own neural network library, let's use it to solve a toy problem and then apply it to a real one.

### XOR

XOR is a canonical problem in ML, as there is no linear way to map the inputs to the output.

```
[0, 0] => 0  
[0, 1] => 1  
[1, 0] => 1  
[1, 1] => 0  
```
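
To see why no linear map works, one can fit the best linear model (in the least-squares sense) directly with `np.linalg.lstsq`. This is a side check that does not use the library built above. The variable names are local to this sketch.

```python
import numpy as np

xor_inputs = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
xor_targets = np.array([0, 1, 1, 0], dtype=float)

# Append a column of ones so the linear model includes a bias term
design = np.hstack([xor_inputs, np.ones((4, 1))])

# Best linear fit in the least-squares sense
coeffs, *_ = np.linalg.lstsq(design, xor_targets, rcond=None)
predictions = design @ coeffs
print(predictions.round(3))
```

All four predictions come out at 0.5 (up to floating-point error): the best a linear model can do is ignore the inputs and predict the mean of the targets.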

In [None]:
inputs = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
targets = np.array([[0], [1], [1], [0]])

def result_xor(net, inputs, targets, epochs=2000):
    train(net, inputs, targets, num_epochs=epochs)
    y_pred = net.forward(inputs)
    print('\nX => y => y_pred')
    for x, y, z in zip(inputs, targets, y_pred):
        print(f'{x} => {y} => {z.round()}')

In [None]:
net1 = NeuralNet([
    Linear(input_size=2, output_size=1),
])

result_xor(net1, inputs, targets)

In [None]:
# Write a deeper neural net and see the results
net2 = NeuralNet([
    ...
])

result_xor(net2, inputs, targets)

### Star-galaxy separation

Use the neural net library you've just created to perform the star-galaxy classification you worked on this morning (cf. [notebook](star-galaxy_classification.ipynb)) using neural networks!

To help you, here is a helper function that will reload the data and perform the preprocessing for you (you're welcome :-)

In [None]:
%load ../code/load_star_gal_data.py

In [None]:
X, y = get_star_gal_data()

Up to you now...

## Acknowledgements

The idea and the code for this tutorial are for the most part inspired by the video ["Deep Learning Madness"](https://youtu.be/o64FV-ez6Gw) by [Joel Grus](https://twitter.com/joelgrus).