# My first Neural Network in Python


In this lab, you will program your first neural network using Python and the NumPy library for array operations. In the last exercise, you will use all the functions you implemented to initialize your first neural network and run a forward propagation. In the following labs, you will learn how to train a neural network.



**IMPORTANT: This notebook will be graded every time you save it. After saving the notebook, wait 1 second before refreshing the window to see the feedback appended at the end of the notebook. Do not modify the names of any of the functions that will be graded.**

## Tasks

- Implement the ReLu activation function for a layer.
- Implement a softmax layer.
- Initialize a 3-layer neural network.
- Implement forward propagation through a neural network.


# Task 1: Implement the ReLu activation function



The Rectified Linear Unit (ReLU) is a non-linear function commonly used as the activation function of neurons. The function is applied to each neuron independently. The following figure shows that ReLu returns 0 for all negative values of $x$ and leaves non-negative values unchanged.


![ReLu activation function](./images/relu.png)


Your first task is to implement the ReLu activation function for a single value. Then, you will follow the same strategy for an array of values, which represent the outputs of $n$ neurons for $k$ samples of data.

ReLu exercises:

- Exercise 1: Implement the relu function for a single value.
- Exercise 2: Implement the relu function for $k$ samples and $n$ neurons.


In [None]:
import numpy as np

# Exercise 1
def relu_function(x):
    """
    Implement the ReLu function for a scalar.
    HINT: there is a python built-in function that does exactly what the
    figure shows when one of the parameters is 0.
    x: a float
    return: a float
    """
    # Your code here
    o = max(0,x)
    return o



output = relu_function(5)
print(output) # Expected output: 5
output = relu_function(-2)
print(output) # Expected output: 0

Now that you understand that the ReLu function is really a max function, you have to implement it for a layer. The strategy is the same as before: perform a max operation between each element of an array and 0. There are multiple ways to achieve this. Although you could use a loop, there is a function in the *numpy* library that produces the expected result in a single line of code.

In [None]:
# Exercise 2
def relu_function_layer(x):
    """
    Implement the ReLu function for an array.
    HINT: there are multiple ways to achieve it using numpy.
    x: a numpy array of floats
    return: a numpy array of floats
    """
    # Your code here
    o = np.maximum(x,0)
    return o


x = np.array([[5, -2, 3, 4, -5], [-1, 2, 3, 4, 5]])
output = relu_function_layer(x)
print(output) # Expected output: [[5, 0, 3, 4, 0], [0, 2, 3, 4, 5]]
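
The "multiple ways" mentioned above can be compared side by side. The following is a minimal sketch, not the graded solution: it applies three equivalent formulations to the same array and checks that they agree. The variable names `relu_maximum`, `relu_where` and `relu_loop` are only illustrative.

```python
import numpy as np

x = np.array([[5, -2, 3, 4, -5], [-1, 2, 3, 4, 5]])

# Variant 1: element-wise maximum between each entry and 0.
relu_maximum = np.maximum(x, 0)

# Variant 2: keep positive entries and replace the rest with 0.
relu_where = np.where(x > 0, x, 0)

# Variant 3: explicit loop with Python's built-in max (slow, for illustration only).
relu_loop = np.array([[max(0, value) for value in row] for row in x])

print(np.array_equal(relu_maximum, relu_where))  # True
print(np.array_equal(relu_maximum, relu_loop))   # True
```

The vectorized variants avoid Python-level loops, which is why they are usually preferred for whole layers.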

# Task 2: Implement a softmax layer.

The softmax layer is a dense layer that uses the softmax function to convert the logits into probabilities. The logits are the raw predictions of the neurons. The prediction of neuron $i$ is calculated using the following equation:

>$z_i=\mathbf{x} \cdot \mathbf{w}_i + b_i$,

where $\mathbf{x}$ is the input vector of the layer for a single sample, $\mathbf{w}_i$ is the weight vector of neuron $i$ and $b_i$ is a scalar known as the bias. In contrast to the ReLu activation function, which is applied to each neuron independently, the softmax function takes the outputs of all neurons in a layer into consideration to assign a probability to each neuron using the following equation:

>$\sigma(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{N} e^{z_j}}$,

where $z_i$ is the logit of neuron $i$, $z_j$ is the logit of neuron $j$, $N$ is the number of neurons in the layer and $e$ is Euler's number. Since the softmax function converts all logits into probabilities, they sum to one:

> $\sum_{j=1}^{N} \sigma(z_{j}) = 1$.
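
As a small worked example of the softmax equation, the sketch below applies it to three hypothetical logits (the values in `z` are made up for illustration) and confirms that the resulting probabilities add up to one.

```python
import numpy as np

# Hypothetical logits of a layer with three neurons for a single sample.
z = np.array([1.0, 2.0, 3.0])

# Numerator e^{z_i} for every neuron, then divide by the sum over all neurons.
exp_z = np.exp(z)
sigma = exp_z / np.sum(exp_z)

print(sigma)          # approximately [0.090, 0.245, 0.665]
print(np.sum(sigma))  # approximately 1.0
```

Note that the largest logit receives the largest probability, and that the gap between probabilities grows faster than the gap between logits because of the exponential.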


In the following exercise, you will calculate the output of each neuron independently to understand how a neuron works. Then, you will calculate the logits for all neurons in a layer. Finally, you will use the numpy library to implement the softmax function.



In [None]:
# Exercise 3
def calculate_neuron_logit(x, w, b):
    """
    Calculate the logit of a neuron for a single sample by multiplying two vectors and summing the bias.
    x: input vector
    w: weight vector
    b: floating-point number bias
    return: a floating-point number.
    """
    # Your code here
    logit = x@w+b
    return logit

# Generate a dummy input with 10 features.
x = np.random.rand(10)
# Randomly initialize the weight for 10 features.
w = np.random.rand(10)
# Randomly initialize the scalar bias (rand() without arguments returns a single float).
b = np.random.rand()

logit = calculate_neuron_logit(x, w, b)
print(logit)
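
To see what the vector product does, the following sketch computes one logit twice: once term by term and once with NumPy's `@` operator. The values of `x_demo`, `w_demo` and `b_demo` are hypothetical and chosen so the arithmetic is easy to follow by hand.

```python
import numpy as np

# Hypothetical sample with 3 features, its weights and a scalar bias.
x_demo = np.array([1.0, 2.0, 3.0])
w_demo = np.array([0.5, -1.0, 2.0])
b_demo = 0.1

# Dot product written out term by term: 1*0.5 + 2*(-1) + 3*2 = 4.5, plus the bias.
z_manual = sum(x_j * w_j for x_j, w_j in zip(x_demo, w_demo)) + b_demo

# The same computation using the @ operator.
z_numpy = x_demo @ w_demo + b_demo

print(z_manual, z_numpy)  # both approximately 4.6
```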

Now that you have learned to calculate the logit of a neuron for a single sample, you can move on to processing multiple samples in parallel.

A neural network receives $k$ samples at the same time and generates an output per sample. If the samples have $f$ features, the input of the network is really a matrix with $k$ rows and $f$ columns. Since the input is a matrix, we shall refer to it as $\mathbf{X}$. We can get the output of neuron $i$ for all samples by multiplying $\mathbf{X}$ by its weight vector $\mathbf{w}_i$ and adding its bias $b_i$:

$\mathbf{z}_i = \mathbf{X} \mathbf{w}_i + b_i$

Nonetheless, there is a better approach. Imagine we have a weight matrix $\mathbf{W}$ that holds the weights of all neurons in a layer. One dimension would be the number of neurons $n$ and the other would be the number of input features $f$ (the number of weights in each neuron). We could calculate the output of all neurons for all samples by multiplying the input matrix $\mathbf{X}$ by the weight matrix $\mathbf{W}$:

$\mathbf{Z} = \mathbf{X} \mathbf{W} + \mathbf{b}$

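Since the multiplication requires that the number of columns in matrix $\mathbf{X}$ be equal to the number of rows in matrix $\mathbf{W}$, the matrix $\mathbf{W}$ must have $f$ rows and $n$ columns. In other words, all weights associated with neuron $i$ will be in column $i$. The bias vector $\mathbf{b}$ has $n$ entries, one per neuron, and is added to every row of the product, so $\mathbf{Z}$ has $k$ rows and $n$ columns.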

  

In [None]:
# Exercise 4
def calculate_logits_layer(x, w, b):
    """
    Calculate the logits of all neurons in parallel.
    HINT: USE THE MATRIX MULTIPLICATION IMPLEMENTED IN NUMPY.
    x: input matrix
    w: weight matrix
    b: bias vector
    return: a numpy array of logits.
    """
    # Your code here
    o = x@w+b
    return o

# Generate a dummy input with 3 samples and 8 features.
X = np.random.uniform(low=-1.0, high=1.0, size=(3,8))
# Randomly initialize the weight for 8 features and 4 neurons.
W = np.random.uniform(low=-1.0, high=1.0, size=(8, 4))
# Randomly initialize the bias for 4 neurons.
B = np.random.uniform(low=-1.0, high=1.0, size=4)

logits = calculate_logits_layer(X, W, B)
print(logits.shape) # Expected shape is (3, 4)
print(logits)
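
The claim that column $i$ of the weight matrix belongs to neuron $i$ can be checked numerically. This is a minimal sketch under assumed names (`X_demo`, `W_demo`, `b_demo`, `Z_demo` are illustrative and not part of the graded code): it computes all logits at once and compares each column with the per-neuron formula.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
X_demo = rng.uniform(-1.0, 1.0, size=(3, 8))  # 3 samples, 8 features
W_demo = rng.uniform(-1.0, 1.0, size=(8, 4))  # 8 features, 4 neurons
b_demo = rng.uniform(-1.0, 1.0, size=4)       # one bias per neuron

# All neurons at once: the bias vector is broadcast across the 3 rows.
Z_demo = X_demo @ W_demo + b_demo
print(Z_demo.shape)  # (3, 4)

# One neuron at a time: column i of W_demo holds the weights of neuron i.
for i in range(W_demo.shape[1]):
    z_i = X_demo @ W_demo[:, i] + b_demo[i]
    print(i, np.allclose(Z_demo[:, i], z_i))
```

Every comparison should print `True`, which shows that the matrix form is just the per-neuron equation evaluated for all neurons in one operation.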

In [None]:
# Exercise 5
def softmax_layer(x, w, b):
    """
    Implement a softmax layer by calculating the logits of a layer
    and applying the softmax function to the logits.
    HINT: YOU CAN USE exp() AVAILABLE IN NUMPY FOR IMPLEMENTING SOFTMAX.
    x: input matrix
    w: weight matrix
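    b: bias vector
    return: a numpy array with one row of probabilities per sample.
    """
    logits = calculate_logits_layer(x, w, b)

    # Your code here
    # Exponentiate once, then normalize each row (sample) so it sums to 1.
    exp_logits = np.exp(logits)
    probabilities = exp_logits / np.sum(exp_logits, axis=1, keepdims=True)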
    
    
    return probabilities


probabilities = softmax_layer(X, W, B)
print(probabilities)
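
One practical caveat: `np.exp` overflows for large logits. A common remedy, shown here only as an optional sketch and not as part of the graded exercise, is to subtract each row's maximum before exponentiating, which leaves the softmax result unchanged mathematically. The function name `stable_softmax` and the values in `demo_logits` are hypothetical.

```python
import numpy as np

def stable_softmax(logits):
    """Row-wise softmax that subtracts each row's maximum before exponentiating."""
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exp_shifted = np.exp(shifted)
    return exp_shifted / np.sum(exp_shifted, axis=1, keepdims=True)

# Hypothetical logits: the second row would overflow np.exp without the shift.
demo_logits = np.array([[1.0, 2.0, 3.0], [1000.0, 1001.0, 1002.0]])
demo_probabilities = stable_softmax(demo_logits)

print(demo_probabilities)              # both rows are approximately [0.090, 0.245, 0.665]
print(demo_probabilities.sum(axis=1))  # each row sums to approximately 1.0
```

Both rows give the same probabilities because softmax depends only on the differences between logits, not on their absolute values.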



# Task 3: Implement a neural network

For this task, you will create a neural network using the functions you have implemented so far. The neural network must have the following architecture:

1. Dense layer with **ReLu activation function** with **32 neurons**.
2. Dense layer with **ReLu activation function** with **32 neurons**.
3. Dense layer with **softmax activation function** with **4 neurons**.

The weights and biases of the neural network should be randomly initialized. You can use the normal distribution as shown in the next function. **Take into consideration that the number of features per sample is 8**.

In [None]:
# Exercise 6
def weight_initialization(f, n1, n2, n3):
    """
    Randomly initialize the weights of a neural network 
    with 3 layers using a normal distribution.
    HINT: Use numpy to generate random matrices.
    f: number of features
    n1: number of neurons in the first hidden layer
    n2: number of neurons in the second hidden layer
    n3: number of neurons in the output layer
    return: a tuple of weight matrices
    """
    # Your code here
    w1 = np.random.normal(size=(f,n1)) # Replace constants with the corresponding variables.
    w2 = np.random.normal(size=(n1,n2))
    w3 = np.random.normal(size=(n2,n3))
    b1 = np.random.normal(size=(n1))
    b2 = np.random.normal(size=(n2))
    b3 = np.random.normal(size=(n3))
    return w1, w2, w3, b1, b2, b3

# Example of weight initialization for 10 input features and layers with 8, 6 and 4 neurons, respectively.
w1, w2, w3, b1, b2, b3 = weight_initialization(10, 8, 6, 4)
print(w1.shape) # Expected output: (10, 8)
print(w2.shape) # Expected output: (8, 6)
print(w3.shape) # Expected output: (6, 4)
print(b1.shape) # Expected output: (8,)
print(b2.shape) # Expected output: (6,)
print(b3.shape) # Expected output: (4,)

# Create the specified architecture: 8 input features and layers with 32, 32 and 4 neurons.
w1, w2, w3, b1, b2, b3 = weight_initialization(8, 32, 32, 4)

The process of calculating the output of a neural network goes by several names, such as forward propagation and inference. In the following exercise, you will use the functions you implemented to perform forward propagation through the specified network architecture.

In [None]:
def neural_network(x, w1, b1, w2, b2, w3, b3):
    """
    Generate the output for a neural network with 3 layers.
    x: input matrix
    wn: weight matrix of layer n.
    bn: bias vector of layer n.
    return: tuple of numpy arrays for output of first layer, output of second layer, output of the network.
    """
    # Your code here

    o1 = relu_function_layer(calculate_logits_layer(x,w1,b1)) # Output of first layer.
    o2 = relu_function_layer(calculate_logits_layer(o1,w2,b2)) # Output of second layer.
    probabilities = softmax_layer(o2, w3,b3) # Output of softmax layer.
    
    
    return o1, o2, probabilities


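# Generate a dummy input with 3 samples whose feature count matches the first weight matrix.
x = np.random.uniform(low=-1.0, high=1.0, size=(3, w1.shape[0]))
# Pass the input first, then the weights and biases in the order the function expects.
o1, o2, predictions = neural_network(x, w1, b1, w2, b2, w3, b3)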
print(predictions)
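
The forward propagation described above can be sanity-checked by following the shapes through the network. The sketch below is self-contained and uses its own hypothetical helpers (`dense_relu`, `dense_softmax`) and `_demo` variables rather than the graded functions; it assumes 5 samples with 8 features and the 32-32-4 architecture from this task.

```python
import numpy as np

def dense_relu(x, w, b):
    """Dense layer followed by the ReLu activation."""
    return np.maximum(x @ w + b, 0)

def dense_softmax(x, w, b):
    """Dense layer followed by a row-wise softmax."""
    exp_logits = np.exp(x @ w + b)
    return exp_logits / np.sum(exp_logits, axis=1, keepdims=True)

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
k, f, n1, n2, n3 = 5, 8, 32, 32, 4  # samples, features, neurons per layer

x_demo = rng.uniform(-1.0, 1.0, size=(k, f))
w1_demo, b1_demo = rng.normal(size=(f, n1)), rng.normal(size=n1)
w2_demo, b2_demo = rng.normal(size=(n1, n2)), rng.normal(size=n2)
w3_demo, b3_demo = rng.normal(size=(n2, n3)), rng.normal(size=n3)

o1_demo = dense_relu(x_demo, w1_demo, b1_demo)              # shape (k, n1)
o2_demo = dense_relu(o1_demo, w2_demo, b2_demo)             # shape (k, n2)
probabilities_demo = dense_softmax(o2_demo, w3_demo, b3_demo)  # shape (k, n3)

print(o1_demo.shape, o2_demo.shape, probabilities_demo.shape)  # (5, 32) (5, 32) (5, 4)
print(probabilities_demo.sum(axis=1))  # each row sums to approximately 1.0
```

The number of columns of each layer's output matches the number of rows of the next weight matrix, which is why the layers chain together.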

# Results of test

