# Mandatory assignment 1



IN5400 / IN9400 - Machine Learning for Image Analysis<br>
University of Oslo<br>
Spring 2020<br>
  
  
Handout: 2020.02.10<br>
Delivery deadline: 2020.03.02

## Part 1: Image classification with deep learning and dense neural networks

In this exercise, you will implement a basic neural network for image classification. The network will be densely connected, with an arbitrary number of layers and an arbitrary number of nodes in each layer. We shall implement a ReLU activation function, and use a softmax activation in the last layer. The error will be measured with a cross-entropy loss function, and the cost will be minimized using a stochastic gradient descent optimization routine.


### Evaluation format

You will be guided through the implementation step by step, and you can check your implementation at each step. Each subtask will be judged independently, so it should be possible to do one task even if you have not succeeded in the previous one. Note, however, that every step needs to be correct for the whole method to work at the end.

### Exercise content

- All subtasks that you are to answer are found in this notebook.
- In addition, there is a `dnn` folder that contains a skeleton for the classifier, including a complete way to import data.
- All implementation should be done in the respective files in the `dnn` folder, in functions marked like this illustrative example:

```
def implement_this_function(argument_1):
    """This is an illustrative dummy function"""
    # TODO: Task X.Y
    result = None
    
    return result
```
- Some function headers are already given; these are required, as they are called by the subtasks in this notebook.
- Everything else you feel you need to implement, you can implement as you like.
- When you have implemented everything (correctly), you should be able to run the whole classifier as `python dnn/main.py`.
- Suggestions for the values of different hyperparameters will be given, but you are encouraged to experiment in the final subtask.

### What you should implement

The program skeleton already implements things such as:
- Program setup
- Configurations
- Data import of three datasets: mnist, cifar10, and svhn
- Training framework
- Evaluation framework

You should implement the content in the training framework. All steps will be given as tasks and subtasks below. *You* are supposed to implement the following:
1. Parameter initialization
2. Forward propagation through a network with an *arbitrary number of layers* where each layer has an *arbitrary number of nodes*
  1. ReLU activation function
  2. Softmax function
  3. The rest of the forward propagation
3. Cross Entropy cost function
4. Backward propagation through a network with an *arbitrary number of layers* where each layer has an *arbitrary number of nodes*
  1. Derivative of the ReLU activation function
  2. The rest of the backward propagation
5. Parameter update using Gradient Descent optimization
6. Run the finished method
  1. Reproduce result with default settings
  2. Exceed the default result by experimenting with different hyperparameter configurations.


### Additional notes

Most variables should be self-explanatory, but there are four important dictionaries worth mentioning, as they control the data flow of the entire program:

- `conf`: Contains all configurations of the program. These configurations will be passed around most functions, even though we most often will only need a couple of them; this is so that you are freer to experiment outside the boundaries of the program skeleton. They are set with some default values in the `config()` function in `dnn/main.py`.
- `params`: Contains all trainable parameters, that is, all weight and bias arrays.
- `grads`: Contains the gradients of the respective trainable parameters.
- `features`: Contains input and output data, in addition to linear combination arrays `Z` and activation arrays `A`.

You are *strongly* encouraged to implement vectorized versions of all computations; otherwise, the program will be too slow.
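To illustrate what vectorization means here, the following minimal sketch computes the linear combination of a single dense layer twice: once with explicit Python loops and once as a single matrix product. It uses the weight layout from Task 1.1 ($W^{[l]} \in \mathbb{R}^{n^{[l-1]} \times n^{[l]}}$). The function names `linear_loop` and `linear_vectorized`, and the convention of storing one example per column, are assumptions made for this illustration and are not part of the `dnn` skeleton.

```python
import numpy as np


def linear_loop(W, b, A_prev):
    """Compute Z = W^T A_prev + b with explicit Python loops (slow)."""
    n_prev, n_curr = W.shape
    m = A_prev.shape[1]  # number of examples, one per column (assumption)
    Z = np.zeros((n_curr, m))
    for k in range(n_curr):
        for i in range(m):
            total = b[k, 0]
            for j in range(n_prev):
                total += W[j, k] * A_prev[j, i]
            Z[k, i] = total
    return Z


def linear_vectorized(W, b, A_prev):
    """Same computation as one matrix product; b broadcasts over the columns."""
    return W.T @ A_prev + b


rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))        # shape (n_prev, n_curr)
b = rng.normal(size=(32, 1))         # shape (n_curr, 1)
A_prev = rng.normal(size=(64, 16))   # shape (n_prev, m)

# Both versions give the same result, up to floating-point rounding.
assert np.allclose(linear_loop(W, b, A_prev), linear_vectorized(W, b, A_prev))
```

The looped version performs every multiplication in the Python interpreter, while the vectorized version hands the whole product to NumPy's compiled routines, which is typically much faster for layers of realistic size.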

This should be it. Let us begin.


### Task 1.1: Parameter initialization

We will see in the lecture on training neural networks that the way we initialize the weights will be important for efficient training. The weights should be initialized to small, different random numbers drawn from a Gaussian distribution with zero mean and given variance. The numbers should also be scaled. The scaling factor depends on the activation function used. With ReLU activations, we use He initialization as given below.  

The function you are to implement is `initialization(layer_dimensions)`, located in `dnn/model.py`. The parameters shall have the following shape

$$
\begin{aligned}
    W^{[l]} &\in \mathbb{R}^{n^{[l-1]}\times n^{[l]}} \\
    b^{[l]} &\in \mathbb{R}^{n^{[l]}}
\end{aligned}
$$

and have the following values

$$
\begin{aligned}
    W_{jk}^{[l]} &\sim \mathcal{N}\left(0, \frac{2}{n^{[l-1]}}\right) \\
    b_k^{[l]} &= 0
\end{aligned}
$$

for all $j = 1, \ldots, n^{[l-1]}$, $k = 1, \ldots, n^{[l]}$, $l = 1, \ldots, L$. Here $x \sim \mathcal{N}(\mu, \sigma^2)$ means that $x$ is sampled from a normal (or Gaussian) distribution with mean $\mu$ and variance $\sigma^2$. To sample from a normal distribution in Python, you can use the `numpy.random.normal()` function. Note that its `scale` argument is the standard deviation $\sigma$, not the variance $\sigma^2$.

Note that $b^{[l]}$ should be stored as an array of shape $(n^{[l]}, 1)$, not just $(n^{[l]},)$.

This initialization fits well with ReLU activations, and is proposed in [He et al. (2015)](https://arxiv.org/pdf/1502.01852.pdf). For another common initialization scheme, you can study the paper by [Glorot and Bengio (2010)](http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf).

The purpose of this initialization is to keep the variance of the activations approximately equal across all layers, so that the signal neither vanishes nor explodes as it propagates through the network.
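As a minimal sketch of the sampling described in this task, the snippet below initializes a single layer and compares the empirical variance of the weights with the target $2 / n^{[l-1]}$. The helper name `he_init_layer` and the layer sizes are hypothetical and chosen only for illustration; the function you are asked to implement, `initialization(layer_dimensions)`, must handle all layers and return the parameters in the format the skeleton expects.

```python
import numpy as np


def he_init_layer(n_prev, n_curr):
    """Sample weights and biases for one layer (hypothetical helper)."""
    # numpy's `scale` is the standard deviation, so take the square root
    # of the target variance 2 / n_prev.
    std = np.sqrt(2.0 / n_prev)
    W = np.random.normal(loc=0.0, scale=std, size=(n_prev, n_curr))
    b = np.zeros((n_curr, 1))  # column vector of shape (n_curr, 1)
    return W, b


np.random.seed(0)  # only to make the illustration reproducible
W, b = he_init_layer(784, 128)

print("W shape:", W.shape)
print("b shape:", b.shape)
print("target variance:", 2.0 / 784)
print("sample variance:", W.var())
```

With this many samples, the sample variance should land close to the target, which is a quick sanity check that the standard deviation, and not the variance, was passed as `scale`.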

# Master Project: Predicting Strain Using Machine Learning
