In [10]:
import numpy as np

# Feed forward

## Data Matrices

For a fully connected feed-forward neural network, there are several important matrices to keep track of. Let's first define the data matrices for the (as yet unspecified) supervised learning problem that our network will be trained on.

$$X = \begin{bmatrix}
    x_{1,1} & \cdots & x_{1,m} \\
    \vdots & \ddots & \vdots \\
    x_{n_x,1} & \cdots & x_{n_x,m}
\end{bmatrix}$$

Here, $n_x$ is the number of features in the input dataset and $m$ is the number of training examples we will feed into the network. So $X$ can be thought of as a collection of column vectors, each of which represents one datapoint.

$\underline{Y}$ is defined similarly: it is the matrix that contains the labels for the training data.

$$\underline{Y} = \begin{bmatrix}
    y_1 & \cdots & y_i & \cdots & y_m
\end{bmatrix}$$

$y_i$ is intentionally left ambiguous here, as it will vary depending on the problem. For regression or binary classification it will be a single number; for multiclass classification it will be a vector.
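
For the multiclass case, one common choice (an assumption here, since the notebook does not fix the encoding) is to make each $y_i$ a one-hot column vector, so that $Y$ has shape $(n_y, m)$. A minimal sketch, using a hypothetical helper `one_hot` and made-up labels:

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: integer labels of shape (m,) -> matrix of shape (num_classes, m)."""
    labels = np.asarray(labels)
    Y = np.zeros((num_classes, labels.shape[0]))
    # Put a 1 in the row given by each label, one column per example
    Y[labels, np.arange(labels.shape[0])] = 1
    return Y

labels = np.array([0, 2, 1, 2])   # made-up labels: 4 examples, 3 classes
Y_multi = one_hot(labels, 3)
print(Y_multi.shape)              # (3, 4)
```

Each column of `Y_multi` then plays the role of one $y_i$, matching the column-per-example layout of $X$.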

In [12]:
# For now, invent meaningless data that is already normalised for development.
X = np.random.randn(10, 1000) # 10 features, 1000 examples
Y = np.random.randint(2, size=(1,1000)) # the training labels for binary classification for 1000 examples
nx = X.shape[0]
m = X.shape[1]
ny = Y.shape[0]
print("Number of features:", nx)
print("Number of training examples:", m)

Number of features: 10
Number of training examples: 1000


## Layer Matrices

### Layer 1

The transformations of the data at each layer are defined by a weight matrix and a bias matrix. Each neuron computes a weighted sum of its inputs plus a bias, much like a linear regression, and the weight and bias matrices stack these computations for every neuron in the layer.

For the first hidden layer, the feature matrix $X$ undergoes the linear transformation 

$$Z^{[1]} = W^{[1]}X + B^{[1]}$$

where $W^{[1]}$ is the weight matrix of the first layer, of dimension $(n_{h}^{[1]}, n_x)$, and $n_{h}^{[1]}$ is the number of neurons in the first hidden layer. This ensures that the product $W^{[1]}X$ has the dimension $(n_{h}^{[1]}, m)$, effectively propagating all $m$ training samples through the network at once.

$B^{[1]}$ is actually a column vector of dimension $(n_{h}^{[1]}, 1)$ that is broadcast to have the dimension $(n_{h}^{[1]}, m)$ to make the calculations easier. 
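
The broadcasting of the bias can be checked directly in NumPy. The following is a small sketch with made-up sizes (the names `X_demo`, `W1`, `b1` and `m_demo` are illustrative, not the notebook's variables): adding an $(n_h^{[1]}, 1)$ column vector gives the same result as adding the fully tiled $(n_h^{[1]}, m)$ matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_h1, m_demo = 3, 4, 5              # illustrative sizes only

X_demo = rng.standard_normal((n_x, m_demo))
W1 = rng.standard_normal((n_h1, n_x))
b1 = rng.standard_normal((n_h1, 1))      # column vector of biases

Z1 = W1 @ X_demo + b1                    # b1 is broadcast across the m_demo columns

# Same result as explicitly tiling the bias into an (n_h1, m_demo) matrix
B1_full = np.tile(b1, (1, m_demo))
same = np.allclose(Z1, W1 @ X_demo + B1_full)
print(Z1.shape)                          # (4, 5)
print(same)                              # True
```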

A non-linear activation function $g^{[1]}(Z^{[1]})$ is applied to the linear transformation $Z^{[1]}$ to give the output of the first layer, 

$$A^{[1]} = g^{[1]}(Z^{[1]}) = g^{[1]}(W^{[1]}X + B^{[1]})$$

$g^{[1]}$ is usually the ReLU function for hidden layers. 

### All Layers

In general, the output of any layer $l$ within the neural network is given by 

$$A^{[l]} = g^{[l]}(Z^{[l]}) = g^{[l]}(W^{[l]}A^{[l-1]} + B^{[l]})$$

where $A^{[l]}$ is of shape $(n_h^{[l]}, m)$ and $W^{[l]}$ is of shape $(n_h^{[l]}, n_h^{[l-1]})$. 
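
These shape rules can be verified mechanically. The sketch below uses made-up layer sizes and applies ReLU at every layer purely to track shapes (it is not the notebook's network); the input is treated as the activation of layer $0$.

```python
import numpy as np

layer_sizes = [6, 5, 3, 1]   # illustrative: 6 input features, then layers of 5, 3 and 1 neurons
m_demo = 8                   # illustrative number of examples

rng = np.random.default_rng(1)
A = rng.standard_normal((layer_sizes[0], m_demo))   # input, treated as layer 0
shapes = []
for l in range(1, len(layer_sizes)):
    W = rng.standard_normal((layer_sizes[l], layer_sizes[l - 1]))
    b = np.zeros((layer_sizes[l], 1))
    Z = W @ A + b
    A = np.maximum(0, Z)     # ReLU everywhere, only to check shapes
    shapes.append(A.shape)
print(shapes)                # [(5, 8), (3, 8), (1, 8)]
```

The second dimension stays at the number of examples throughout, while the first follows the number of neurons in each layer.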

$ l \in [0, L]$ where $l=0$ corresponds to the input layer, so that $A^{[0]} = X$ and $n_h^{[0]} = n_x$, and $l=L$ is the output layer, where $A^{[L]} = \hat{Y}$ are the predictions of the network for the labels $Y$.

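When initialising, the biases $B$ can be set to $0$, but the weight matrices $W$ must not all be $0$ (or any other single constant). If every neuron in a layer starts with the same weights, they all compute the same output and receive the same updates, so the layer behaves like a single neuron. Small random weights break this symmetry so that the network can learn.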
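
The symmetry problem can be seen in a single layer. In this sketch (made-up data, hypothetical helper `pre_activation`), neurons that share the same weights produce identical rows of $Z$, while small random weights give each neuron a different response:

```python
import numpy as np

rng = np.random.default_rng(2)
X_demo = rng.standard_normal((3, 5))            # 3 features, 5 examples (made up)

def pre_activation(W):
    """Hypothetical helper: Z = W X + b with zero biases."""
    b = np.zeros((W.shape[0], 1))
    return W @ X_demo + b

Z_const = pre_activation(np.full((4, 3), 0.5))  # every neuron has the same weights
Z_rand = pre_activation(rng.standard_normal((4, 3)) * 0.01)

# Compare every row (neuron) against the first row
rows_identical_const = bool(np.allclose(Z_const, Z_const[0]))
rows_identical_rand = bool(np.allclose(Z_rand, Z_rand[0]))
print(rows_identical_const)   # True
print(rows_identical_rand)    # False
```

Identical rows of $Z$ give identical activations whatever $g$ is, which is why constant initialisation leaves the neurons in a layer indistinguishable.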

In [23]:
# Let us imagine a 4-layer neural network (3 hidden layers plus the output layer)
layer_dims = [nx, 5, 4, 3, ny]
L = len(layer_dims) - 1

# Initialise the weights and biases, storing in a parameters dictionary
parameters = {}
for l in range(1, L + 1):
    parameters[f"W{l}"] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
    # Bias is a column vector; NumPy broadcasts it across the m examples
    parameters[f"B{l}"] = np.zeros((layer_dims[l], 1))

In [24]:
# Build the needed activation functions
def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

In [32]:
# Now calculate the activations - forward propagation through the network
A_prev = X
# All layers but the last use relu
for l in range(1, L):
    Z = np.dot(parameters[f"W{l}"], A_prev) + parameters[f"B{l}"]
    A = relu(Z)
    A_prev = A

# Do final layer with sigmoid 
Z = np.dot(parameters[f"W{L}"], A_prev) + parameters[f"B{L}"]
A = sigmoid(Z)
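
The same loop can be wrapped in a function so that it can be reused and sanity-checked. This is only a sketch: the name `forward`, the demo sizes and the variables `X_demo`, `params` and `Y_hat` are assumptions for illustration, and the block redefines `relu` and `sigmoid` so that it runs on its own.

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def forward(X, parameters, L):
    """Hypothetical helper: ReLU for layers 1..L-1, sigmoid for layer L."""
    A = X
    for l in range(1, L):
        A = relu(parameters[f"W{l}"] @ A + parameters[f"B{l}"])
    return sigmoid(parameters[f"W{L}"] @ A + parameters[f"B{L}"])

# Small made-up problem so the block is self-contained
rng = np.random.default_rng(0)
dims = [10, 5, 4, 3, 1]
X_demo = rng.standard_normal((dims[0], 20))
params = {}
for l in range(1, len(dims)):
    params[f"W{l}"] = rng.standard_normal((dims[l], dims[l - 1])) * 0.01
    params[f"B{l}"] = np.zeros((dims[l], 1))

Y_hat = forward(X_demo, params, len(dims) - 1)
in_unit_interval = bool(np.all((Y_hat > 0) & (Y_hat < 1)))
print(Y_hat.shape)        # (1, 20)
print(in_unit_interval)   # True
```

The output has one row per label and one column per example, and the sigmoid keeps every prediction strictly between 0 and 1, as expected for binary classification.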