# Exercise #1

Welcome to the first exercise in this course. In this and the following exercises we will implement a (small) neural network that can predict the probability of a single condition being true or false, also known as binary classification.

In this first exercise we will implement the required functions to build a neural network and test their correctness with artificial data. In the next exercises this neural network will be tested and trained with real data.

## Imports

Before we can start we need to include `NumPy`, the main Python library that we are going to use. This library _"adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays."_ ([source](https://en.wikipedia.org/wiki/NumPy)).

To include it, execute the next cell by selecting it and either clicking the `Run` button in the toolbar or pressing `CTRL+ENTER` on your keyboard.

In [None]:
import numpy as np

## Dense

Let's build some basic building blocks that we can use to construct the neural network model.

A neural network consists of multiple layers. A layer first computes an intermediate value `z` using the output of the previous layer (`a_prev`) and the weights (`w`) and bias (`b`) of this layer. Remember that we want to implement the following function.

$$z_n=a_{n-1} w_n + b_n$$

We will call this function `dense` because a layer that is implemented like this is also called a "dense" or "fully connected" layer, i.e. it uses all its input values, in contrast to a "sparse" layer that only uses some of its inputs. The latter is less commonly used.
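As a sketch of the shapes involved, assume that each row of $a_{n-1}$ holds one sample (this matches the test data used further down). The symbols $m$ (number of samples) and $N_n$ (number of units in layer $n$) are introduced here only for illustration:

$$\underbrace{z_n}_{m \times N_n} = \underbrace{a_{n-1}}_{m \times N_{n-1}} \, \underbrace{w_n}_{N_{n-1} \times N_n} + \underbrace{b_n}_{1 \times N_n}$$

Under that assumption the single bias row $b_n$ is added to each of the $m$ rows of the matrix product.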

Implement this function in the code below, between the two comment lines.

**Hint**: For multiplying matrices you should use the [np.matmul](https://docs.scipy.org/doc/numpy/reference/generated/numpy.matmul.html) function instead of the `*` operator. The latter is mapped to the [np.multiply](https://docs.scipy.org/doc/numpy/reference/generated/numpy.multiply.html) function that performs an element-wise multiplication, which is not what we want. Note that the addition of the bias value should be done element-wise.
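To make the difference in the hint concrete, here is a minimal sketch on toy matrices. The names `p`, `q` and `row` are made up for this illustration and are unrelated to the exercise data:

```python
import numpy as np

# Toy matrices, chosen only for illustration.
p = np.array([[1.0, 2.0],
              [3.0, 4.0]])
q = np.array([[5.0, 6.0],
              [7.0, 8.0]])
row = np.array([[10.0, 20.0]])  # shape (1, 2)

matrix_product = np.matmul(p, q)  # rows of p combined with columns of q
elementwise_product = p * q       # same result as np.multiply(p, q)
shifted = p + row                 # row is broadcast over both rows of p

print(matrix_product)       # [[19. 22.] [43. 50.]]
print(elementwise_product)  # [[ 5. 12.] [21. 32.]]
print(shifted)              # [[11. 22.] [13. 24.]]
```

Note how adding a `(1, 2)` array to a `(2, 2)` array works without any explicit loop: NumPy repeats the single row for every row of the other operand.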

In [None]:
def dense(a_prev, w, b):
    #### BEGIN IMPLEMENTATION ####
    z = ...
    #### END IMPLEMENTATION ####
    return z

Let's see if you implemented it correctly by feeding it 2 samples of a previous layer that has 4 output units and a weight/bias combination that results in 3 output units.

In [None]:
a_prev = np.array([[0.7443503, 0.25197198, 0.07746765, -0.04006432], [0.82262378, -0.88750386, -0.36685496, 0.84961117]])
w = np.array([[-0.23438933, -0.20918998,  0.38962773],
              [ 0.05811497, -0.4372891 ,  0.4132518 ],
              [-0.00410555, -0.25582833, -0.0910004 ],
              [ 0.33127268,  0.43464842,  0.40134338]])
b = np.array([[0.56929805, -0.53694105, -0.37223993]])

print(dense(a_prev, w, b))

The output should be equal to 

    [[ 0.39588336 -0.84006859 -0.00122167]
     [ 0.60786566  0.14220411 -0.04411569]]

If this is not the case, then check your implementation.
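Because the printed values are rounded to 8 decimals, comparing with a small tolerance can be more reliable than comparing digits by eye. A minimal sketch using `np.allclose`; the `actual` array below is only a stand-in for your own result, not the output of a real `dense` call, and the tolerance of `1e-6` is an arbitrary choice:

```python
import numpy as np

# Expected values copied from the text above.
expected = np.array([[0.39588336, -0.84006859, -0.00122167],
                     [0.60786566,  0.14220411, -0.04411569]])

# Stand-in for your own result: the expected values plus a tiny,
# rounding-sized error.
actual = expected + 1e-9

matches = np.allclose(actual, expected, atol=1e-6)
mismatch = np.allclose(actual + 0.01, expected, atol=1e-6)

print(matches)   # True
print(mismatch)  # False
```

In your own notebook you could replace `actual` with the value returned by your function.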

## Activations

Next up are the activation functions. For our network we need the functions ReLU and Sigmoid.

The ReLU activation function is defined as:

$$g(z) = \begin{cases} z, & z \gt 0 \\ 0, & z \le 0 \end{cases}$$

And the sigmoid function is defined as:

$$g(z) = \frac{1}{1 + e^{-z}} $$

**Hint**: You might need the functions [np.maximum](https://docs.scipy.org/doc/numpy/reference/generated/numpy.maximum.html) and [np.exp](https://docs.scipy.org/doc/numpy/reference/generated/numpy.exp.html).
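If these two functions are new to you, the following sketch shows what they do on a small toy array. The values and the threshold of `1.0` are arbitrary and deliberately not the ones needed for the exercise:

```python
import numpy as np

v = np.array([-1.0, 0.0, 2.0, 5.0])

# Element-wise maximum of each entry and the scalar 1.0.
at_least_one = np.maximum(v, 1.0)

# Element-wise exponential e**x; e**0 is 1 and e**1 is roughly 2.718.
powers = np.exp(np.array([0.0, 1.0]))

print(at_least_one)  # [1. 1. 2. 5.]
print(powers)
```

Both functions operate element-wise and return an array of the same shape as their array input.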

In [None]:
def relu(z):
    #### BEGIN IMPLEMENTATION ####
    a = ...
    #### END IMPLEMENTATION ####
    return a

def sigmoid(z):
    #### BEGIN IMPLEMENTATION ####
    a = ...
    #### END IMPLEMENTATION ####
    return a

Let's see if you implemented them correctly by feeding them the output of the previous `dense` function.

In [None]:
z = np.array([[0.39588336, -0.84006859, -0.00122167], [0.60786566, 0.14220411, -0.04411569]])
print(sigmoid(z))
print(relu(z))

The output should be equal to

    [[0.59769819 0.30152034 0.49969458]
     [0.64745378 0.53549124 0.48897287]]
    [[0.39588336 0.         0.        ]
     [0.60786566 0.14220411 0.        ]]

If this is not the case, then check your implementation before you continue to the next part.

## Model

With these functions we can now build our neural network model. We will create a model with 3 layers: two "hidden" layers, each with 64 units and the ReLU activation function, and an output layer with 1 unit that uses the Sigmoid activation function.

![model architecture](figures/model.png "Model architecture")

The Sigmoid function in the output layer makes sure the (single) output value always lies between 0 and 1. We interpret this value as the probability that the condition is `true`: a probability >= 0.5 is read as `true`, anything lower as `false`.
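The 0.5 threshold mentioned above can be sketched as follows. The probabilities are made-up example values, not the output of the model:

```python
import numpy as np

# Made-up probabilities, one row per sample, as a sigmoid output
# unit could produce them.
probabilities = np.array([[0.2],
                          [0.5],
                          [0.9]])

# Element-wise comparison yields a boolean array of the same shape.
predicted_true = probabilities >= 0.5

print(predicted_true.tolist())  # [[False], [True], [True]]
```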

The model is built as a class with a constructor to initialize the parameters and a single `predict` method to compute the output value.

#### Constructor
During initialization of the model (i.e. in the constructor) the weights and biases of all layers must be initialized. Each weight matrix must have the right shape and be filled with random values drawn uniformly from the range -0.5 to +0.5. The biases must be initialized with zeros.

You might need the functions [np.zeros](https://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html) and [np.random.uniform](https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.uniform.html). Be aware that `np.zeros` expects the size (i.e. shape) of a multi-dimensional tensor as a single tuple argument, e.g. `np.zeros((2, 3))` rather than `np.zeros(2, 3)`.
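As a minimal sketch of how these two functions are called, here is an example with arbitrary toy sizes and an arbitrary range of -1.0 to 1.0. Neither the sizes nor the range are the ones required for the model:

```python
import numpy as np

rows, cols = 2, 3  # arbitrary toy sizes, not the layer sizes of the model

# The shape is passed as one tuple.
zeros = np.zeros((rows, cols))

# Arguments are low, high and size; samples fall in [low, high).
samples = np.random.uniform(-1.0, 1.0, (rows, cols))

print(zeros.shape, samples.shape)  # (2, 3) (2, 3)
print(bool(samples.min() >= -1.0 and samples.max() < 1.0))  # True
```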

#### Predict

In the `predict` function you should chain all the computations so that they result in a single prediction.

Here you should use the previously implemented `dense`, `relu` and `sigmoid` functions.

In [None]:
class Model(object):
    def __init__(self):
        N0, N1, N2, N3 = 5, 64, 64, 1
        #### BEGIN IMPLEMENTATION ####
        self.w1 = ...
        self.b1 = ...
        self.w2 = ...
        self.b2 = ...
        self.w3 = ...
        self.b3 = ...
        #### END IMPLEMENTATION ####

    def predict(self, x):
        a0 = x
        #### BEGIN IMPLEMENTATION ####
        z1 = ...
        a1 = ...
        z2 = ...
        a2 = ...
        z3 = ...
        a3 = ...
        #### END IMPLEMENTATION ####
        return a3

## Predict

Let's see if you implemented everything correctly by predicting a value for two samples.

In [None]:
from siouxdnn import reset_seed
x = np.array([[-0.64863997, -0.52876784,  0.18748115, -0.8999688 , -0.40311535],
              [-0.58094129, -0.68657316, -0.46113119, -0.34206706,  0.08281399]])
reset_seed()
model = Model()
y_pred = model.predict(x)
print(y_pred)

The output should be equal to

    [[0.28766463]
     [0.42664955]]

Check your implementation if this is not the case.

This concludes the forward propagation part of the model. You have now implemented a neural network that can, given one or more input vectors, predict the probability that a certain condition is true: a binary classifier.