# Forward Propagation
This notebook implements forward propagation for a small neural network, first with an explicit loop over units and then in vectorized form.

In [1]:
import numpy as np

In [2]:
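def dense(a_in, W, b, g):
    # a_in: 1-D input vector (n,), W: weights (n, units), b: biases (units,), g: activation
    units = W.shape[1]
    a_out = np.zeros(units)
    for j in range(units):
        w = W[:, j]                   # weights of unit j
        z = np.dot(w, a_in) + b[j]    # pre-activation of unit j
        a_out[j] = g(z)               # activation of unit j
    return a_out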

The "dense" function takes "a_in", a one-dimensional vector of n values: the features of a single example (for the first layer) or the activations of the previous layer.<br>
W is a matrix of shape n * units, where n is the number of input features (rows) and units is the number of units in the layer (columns).<br>
b is a one-dimensional array of length units that holds one bias per unit.<br>
"g" is the activation function.

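To make these shapes concrete, here is a minimal sketch that calls a loop-based layer on one example. The names `dense_loop` and `linear` are illustrative assumptions, not part of the notebook, and the identity activation is used only so the output equals the pre-activation values.

```python
import numpy as np

def dense_loop(a_in, W, b, g):
    """Loop-based layer: a_in is (n,), W is (n, units), b is (units,)."""
    units = W.shape[1]
    a_out = np.zeros(units)
    for j in range(units):
        a_out[j] = g(np.dot(W[:, j], a_in) + b[j])
    return a_out

def linear(z):
    # Identity activation (an assumption for this sketch): returns z unchanged.
    return z

a_in = np.array([1.0, 2.0])            # one example with n = 2 features
W = np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6]])        # shape (2, 3): 2 features by 3 units
b = np.array([-1.0, 0.0, 1.0])         # shape (3,): one bias per unit

z = dense_loop(a_in, W, b, linear)
print(a_in.shape, W.shape, b.shape, z.shape)  # (2,) (2, 3) (3,) (3,)
```
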
Let's assume we have 2 layers. The first layer contains 3 units and the second layer contains 1 unit. The input matrix $X$ has 3 rows (examples) and 2 columns (features).

In [3]:
X = np.array([[1, 2],
              [3, 4],
              [5, 6]])
W1 = np.array([[0.1, 0.2, 0.3],
               [0.4, 0.5, 0.6]])
b1 = np.array([-1, 0, 1])

W2 = np.array([[0.9],
               [0.1],
               [0]])
b2 = np.array([2])

In [4]:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

In [5]:
def sequential(X, W1, b1, W2, b2):
    a1 = dense(X, W1, b1, sigmoid)
    a2 = dense(a1, W2, b2, sigmoid)
    return a2

In [6]:
sequential(X[0], W1, b1, W2, b2)

array([0.92444769])

The call above passes a single example, X[0], through the network. In general, if the input has $m$ rows with 2 features each and is passed to a layer with $u1$ units, each unit holds 2 weights (one per feature), and processing every row yields a matrix with $m$ rows and $u1$ columns.
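
As a rough check of the shape claim, the sketch below applies a loop-based layer to each row of the input and stacks the results. `dense_loop` is a hypothetical name for the loop-based layer, redefined here so the block runs on its own; the data and weights are the ones used in this notebook.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def dense_loop(a_in, W, b, g):
    # Loop-based layer for a single example (hypothetical name for this sketch).
    units = W.shape[1]
    a_out = np.zeros(units)
    for j in range(units):
        a_out[j] = g(np.dot(W[:, j], a_in) + b[j])
    return a_out

X = np.array([[1, 2],
              [3, 4],
              [5, 6]])                 # m = 3 examples, 2 features
W1 = np.array([[0.1, 0.2, 0.3],
               [0.4, 0.5, 0.6]])       # 2 features by u1 = 3 units
b1 = np.array([-1, 0, 1])

# One call per row; stacking the outputs gives an (m, u1) matrix.
A1 = np.array([dense_loop(x, W1, b1, sigmoid) for x in X])
print(A1.shape)  # (3, 3)
```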

## Vectorized Implementation
We will use matrix multiplication to compute a whole layer for all examples at once, which replaces the loop over units.

In [7]:
def dense(A_in, W, B, g):
    Z = np.matmul(A_in, W) + B    # (m, n) @ (n, u) + (1, u) -> (m, u); B is broadcast over rows
    A_out = g(Z)                  # activation applied element-wise
    return A_out

The function above takes A_in, a matrix with m rows and n columns, where m is the number of examples and n is the number of features.<br>
W is an n * u matrix, where n is the number of features and u is the number of units in that layer.<br>
B is a matrix with 1 row and u columns that holds the biases; NumPy broadcasts it across all m rows.<br>
g is the activation function.
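
The bias addition relies on NumPy broadcasting. The sketch below, which uses the first-layer values from this notebook, compares the broadcast sum with an explicit version in which the bias row is copied once per example with `np.tile`; the variable names are illustrative.

```python
import numpy as np

X = np.array([[1, 2],
              [3, 4],
              [5, 6]])                      # shape (3, 2)
W1 = np.array([[0.1, 0.2, 0.3],
               [0.4, 0.5, 0.6]])            # shape (2, 3)
B1 = np.array([[-1.0, 0.0, 1.0]])           # shape (1, 3)

XW = np.matmul(X, W1)                       # shape (3, 3)

# Broadcasting: the single bias row is added to every row of XW.
Z_broadcast = XW + B1

# Explicit equivalent: repeat the bias row once per example, then add.
Z_explicit = XW + np.tile(B1, (X.shape[0], 1))

print(np.allclose(Z_broadcast, Z_explicit))  # True
```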

In [8]:
X = np.array([[1, 2],
              [3, 4],
              [5, 6]])
W1 = np.array([[0.1, 0.2, 0.3],
               [0.4, 0.5, 0.6]])
B1 = np.array([[-1, 0, 1]])

W2 = np.array([[0.9],
               [0.1],
               [0]])
B2 = np.array([[2]])

In [9]:
a_out = sequential(X, W1, B1, W2, B2)
a_out

array([[0.92444769],
       [0.93894264],
       [0.94690438]])

Let's see the matrix multiplication here.<br>
Let's first take a single example and use vector-matrix multiplication, then use matrix-matrix multiplication on the whole dataset.<br>
$
Z^{[l]} = A\_in \times W^{[l]} + B^{[l]}, \qquad \overrightarrow{a}^{[l]} = g(Z^{[l]})
$

$
\begin{bmatrix}
    1 & 2
\end{bmatrix} 
×
\begin{bmatrix}
    0.1 & 0.2 & 0.3\\
    0.4 & 0.5 & 0.6
\end{bmatrix}
+
\begin{bmatrix}
    -1 & 0 & 1
\end{bmatrix}
=
\begin{bmatrix}
    -0.1 &  1.2 &  2.5
\end{bmatrix}
$

As shown, passing an example with 2 features to a layer with 3 units yields a vector with 3 values, one per unit (before the activation is applied).
The same holds when we pass the whole dataset to the first layer:

$
\begin{bmatrix}
    1 & 2\\
    3 & 4\\
    5 & 6
\end{bmatrix} ×
\begin{bmatrix}
    0.1 & 0.2 & 0.3\\
    0.4 & 0.5 & 0.6
\end{bmatrix} +
\begin{bmatrix}
    -1 & 0 & 1
\end{bmatrix} =
\begin{bmatrix}
    -0.1 &  1.2 & 2.5\\
    0.9 &  2.6 & 4.3\\
    1.9 &  4.0 & 6.1
\end{bmatrix}
$
<br><br>
This matrix is then passed to the sigmoid function, which is applied element-wise and results in:<br><br>
$
    Sigmoid\bigg( 
    \begin{bmatrix}
        -0.1 &  1.2 & 2.5\\
        0.9 &  2.6 & 4.3\\
        1.9 &  4.0 & 6.1
    \end{bmatrix}
     \bigg)=
     \begin{bmatrix}
        0.47502081 & 0.76852478 & 0.92414182\\
        0.7109495 & 0.93086158 & 0.98661308\\
        0.86989153 & 0.98201379 & 0.99776215
    \end{bmatrix}
$

This new matrix is $\overrightarrow{a}^{[1]}$, which is then passed to the second layer, which has only one unit.

$
\begin{bmatrix}
    0.47502081 & 0.76852478 & 0.92414182\\
    0.7109495 & 0.93086158 & 0.98661308\\
    0.86989153 & 0.98201379 & 0.99776215
\end{bmatrix} × 
\begin{bmatrix}
    0.9\\
    0.1\\
    0
\end{bmatrix} +
\begin{bmatrix}
    2
\end{bmatrix} =
\begin{bmatrix}
    2.50437121\\
    2.73294071\\
    2.88110375
\end{bmatrix}
$
<br><br>
$
\overrightarrow{a}^{[2]} = Sigmoid\bigg( 
    \begin{bmatrix}
        2.50437121\\
        2.73294071\\
        2.88110375
    \end{bmatrix} \bigg) =
\begin{bmatrix}
    0.92444769\\
    0.93894264\\
    0.94690438
\end{bmatrix}
$

A matrix with 3 rows is returned as the output of the neural network, one row per input example. Since the last layer has only one unit, the output matrix has only one column.
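
As a sanity check, the sketch below runs both approaches on the same data and confirms that they agree. The names `dense_loop` and `dense_vectorized` are assumptions introduced here so that both versions can coexist in one cell (in the notebook both are called `dense`).

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def dense_loop(a_in, W, b, g):
    # Loop-based layer: one example, one unit at a time.
    units = W.shape[1]
    a_out = np.zeros(units)
    for j in range(units):
        a_out[j] = g(np.dot(W[:, j], a_in) + b[j])
    return a_out

def dense_vectorized(A_in, W, B, g):
    # Vectorized layer: all examples and all units in one matrix product.
    return g(np.matmul(A_in, W) + B)

X = np.array([[1, 2], [3, 4], [5, 6]])
W1 = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
b1 = np.array([-1, 0, 1])
W2 = np.array([[0.9], [0.1], [0.0]])
b2 = np.array([2])

# Loop version: push each example through both layers separately.
loop_out = np.array([
    dense_loop(dense_loop(x, W1, b1, sigmoid), W2, b2, sigmoid) for x in X
])

# Vectorized version: biases reshaped to (1, units) as in the notebook.
A1 = dense_vectorized(X, W1, b1.reshape(1, -1), sigmoid)
vec_out = dense_vectorized(A1, W2, b2.reshape(1, -1), sigmoid)

print(loop_out.shape, vec_out.shape)   # (3, 1) (3, 1)
print(np.allclose(loop_out, vec_out))  # True
```

The loop version makes one Python-level call per example and per unit, while the vectorized version hands the whole computation to NumPy in two matrix products.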

End