# Lab 5. Vector and Matrix Backpropagation
by Domrachev Ivan, B20-Ro-01

In [1]:
from nn_from_scratch.nodes import SoftMax, ReLU
from nn_from_scratch.neurons import Linear
import numpy as np

## Part 1. Vectorized softmax function

> Note: I had already implemented this functionality in the previous lab assignment. Below is a copy of the corresponding chapter from that lab.

All the logic is implemented in the file [backprop.py](./backprop.py). The abstract class `Node` implements all the expected logic; only the function itself and the computation of its Jacobian matrix are left to subclasses.
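To make the division of labour concrete, here is a minimal sketch of such a design. It is not the code from `backprop.py`: the method names `function` and `jacobian` and the class `SoftMaxSketch` are hypothetical, and the only assumption taken from the text is that the base class handles everything except the function and its Jacobian.

```python
from abc import ABC, abstractmethod

import numpy as np


class Node(ABC):
    """Hypothetical base class: caches the input and backpropagates via the Jacobian."""

    def forward(self, x):
        self._x = np.asarray(x, dtype=float)  # cache the input for the backward pass
        return self.function(self._x)

    def backward(self, dL_dy):
        # Chain rule for a vector function: dL/dx = J^T @ dL/dy,
        # where J[i, j] = dy_i / dx_j.
        return self.jacobian(self._x).T @ np.asarray(dL_dy, dtype=float)

    @abstractmethod
    def function(self, x): ...

    @abstractmethod
    def jacobian(self, x): ...


class SoftMaxSketch(Node):
    def function(self, x):
        e = np.exp(x - np.max(x))  # shift by the maximum for numerical stability
        return e / e.sum()

    def jacobian(self, x):
        y = self.function(x)
        # dy_i / dx_j = y_i * (delta_ij - y_j)
        return np.diag(y) - np.outer(y, y)


sm_sketch = SoftMaxSketch()
y_sketch = sm_sketch.forward([0, 1, 2, 3, 4])
dL_dx_sketch = sm_sketch.backward([1, 2, 3, 2, 1])
```

With this layout a new node only has to define two methods, at the cost of materializing the full Jacobian on every backward call.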

In [2]:
sm = SoftMax(5)

# Random input values (x itself, and assumed partial derivative)
x_input = np.array([0, 1, 2, 3, 4])
dL_dy = np.array([1, 2, 3, 2, 1])

# Forward call
y_value = sm.forward(x_input)

# Backpropagation
dL_dx = sm.backward(dL_dy)

y_value, dL_dx

(array([0.01165623, 0.03168492, 0.08612854, 0.23412166, 0.63640865]),
 array([-0.00510617,  0.01780491,  0.1345273 ,  0.13156147, -0.27878751]))

The verification was described in the previous lab. Neither the code nor its outputs have changed, so it is omitted here.
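For readers without the previous lab at hand, one common way to check such a gradient is a central finite-difference comparison. The sketch below is independent of the `SoftMax` class: the helpers `softmax` and `numerical_grad` are hypothetical, and the upstream gradient `dL_dy` is treated as a constant, so the surrogate loss is $L = \text{dL\_dy} \cdot y(x)$.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - np.max(x))  # shift by the maximum for numerical stability
    return e / e.sum()


def numerical_grad(f, x, dL_dy, eps=1e-6):
    """Central-difference estimate of dL/dx for the surrogate loss L = dL_dy . f(x)."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = dL_dy @ (f(x + step) - f(x - step)) / (2 * eps)
    return grad


x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
dL_dy = np.array([1.0, 2.0, 3.0, 2.0, 1.0])

y = softmax(x)
analytic = y * (dL_dy - y @ dL_dy)  # closed form of J^T @ dL_dy for softmax
numeric = numerical_grad(softmax, x, dL_dy)

print(np.allclose(analytic, numeric, atol=1e-6))
```

The same helper can be pointed at any other vector-valued node, as long as the input is a float array.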

## Part 2. ReLU

The ReLU function and its partial derivative were implemented according to the lecture. Note that the derivative $\frac{\partial ReLU}{\partial x} (0)$ is not defined, so a value has to be chosen; in my implementation it is set to $0$.
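As an illustration of that convention, here is a minimal standalone sketch (not the `ReLU` class itself; `relu_forward` and `relu_backward` are hypothetical names). The strict inequality `x > 0` is what assigns the derivative the value $0$ at $x = 0$.

```python
import numpy as np


def relu_forward(x):
    return np.maximum(x, 0)


def relu_backward(x, dL_dy):
    # The derivative is 1 where x > 0 and 0 elsewhere, including at x == 0.
    # The Jacobian is diagonal, so an element-wise product is enough.
    return dL_dy * (x > 0)


x_demo = np.array([-2.0, 0.0, 3.0])
g_demo = np.array([5.0, 7.0, 9.0])

y_demo = relu_forward(x_demo)            # negative and zero entries map to 0
dx_demo = relu_backward(x_demo, g_demo)  # the gradient passes only where x > 0
```

Because both helpers are element-wise, they work unchanged on arrays of any shape.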

In [3]:
relu = ReLU(5)

# Random input values (x itself, and assumed partial derivative)
x_input = np.array([0, -1, 2, -1, 4])
dL_dy = np.array([1, 2, 3, 2, 1])

# Forward call
y_value = relu.forward(x_input)

# Backpropagation
dL_dx = relu.backward(dL_dy)

y_value, dL_dx

(array([0, 0, 2, 0, 4]), array([0, 0, 3, 0, 1]))

It is easy to see from this example that the `ReLU` function works correctly: negative inputs are zeroed, and the gradient passes through only where the input is positive.

`ReLU` also supports matrix input. In this case, it applies the `ReLU` function row-wise.

In [4]:
relu = ReLU((2, 5))

# Random input values (x itself, and assumed partial derivative)
x_input = np.array([
    [0, -1, 2, -1, 4],
    [0, -1, 2, -1, 4]
])
dL_dy = np.array([
    [1, 2, 3, 2, 1], 
    [1, 2, 3, 2, 1]
])

# Forward call
y_value = relu.forward(x_input)

# Backpropagation
dL_dx = relu.backward(dL_dy)

y_value, dL_dx

(array([[0, 0, 2, 0, 4],
        [0, 0, 2, 0, 4]]),
 array([[0, 0, 3, 0, 1],
        [0, 0, 3, 0, 1]]))

## Part 3. Linear layer

> *Note. Due to the structure of the code, the function $Y = XW$ is implemented.*

To implement the linear node $Y = XW$ in my structure, I made the following design decisions:
1. The weight matrix `W` is an internal parameter of the class, which can be accessed via a getter. Its partial derivative is also stored inside the class.
2. Although there are functions to compute the derivatives w.r.t. `W` and `X`, I decided to override the function `forward` with dot products from `numpy`, because this approach is more efficient.

You can explore the implementation in the `backprop.py` file.
> *Note. To compute the Jacobians $\frac{\partial Y}{\partial X}$ and $\frac{\partial Y}{\partial W}$, the key observation is that, since $Y_{i, j} = \sum_l X_{i, l} W_{l, j}$:*
> 1. *$\frac{\partial Y_{i, j}}{\partial X}$ is a zero matrix, except for the $i$-th row, whose entries are:*
> $$\frac{\partial Y_{i, j}}{\partial X_{i, l}} = W_{l, j}$$
> 2. *Similarly, $\frac{\partial Y_{i, j}}{\partial W}$ is a zero matrix, except for the $j$-th column, whose entries are:*
> $$\frac{\partial Y_{i, j}}{\partial W_{l, j}} = X_{i, l}$$
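The structure of these Jacobians can be checked numerically. The sketch below is a standalone illustration, not the code of `Linear`. It assumes a plain $Y = XW$ with no bias term, so `W` has shape `(n_in, n_out)`, and it builds the four-index Jacobians from $\partial Y_{i,j} / \partial X_{k,l} = \delta_{ik} W_{l,j}$ and $\partial Y_{i,j} / \partial W_{k,l} = X_{i,k} \delta_{jl}$, which follow from $Y_{i,j} = \sum_l X_{i,l} W_{l,j}$. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_in, n_out = 3, 2, 5
X = rng.random((n, n_in))
W = rng.random((n_in, n_out))      # assumption: no bias row
dL_dY = rng.random((n, n_out))

# J_X[i, j, k, l] = dY[i, j] / dX[k, l] = delta(i, k) * W[l, j]
J_X = np.einsum("ik,lj->ijkl", np.eye(n), W)
# J_W[i, j, k, l] = dY[i, j] / dW[k, l] = X[i, k] * delta(j, l)
J_W = np.einsum("ik,jl->ijkl", X, np.eye(n_out))

# Contract the upstream gradient with each Jacobian over the output indices (i, j)
dL_dX_jac = np.einsum("ij,ijkl->kl", dL_dY, J_X)
dL_dW_jac = np.einsum("ij,ijkl->kl", dL_dY, J_W)

# Equivalent closed-form matrix products
dL_dX = dL_dY @ W.T
dL_dW = X.T @ dL_dY

print(np.allclose(dL_dX_jac, dL_dX), np.allclose(dL_dW_jac, dL_dW))
```

The Jacobian route stores `n * n_out * n * n_in` and `n * n_out * n_in * n_out` numbers, most of them zero, which is why the matrix-product form is usually preferred in practice.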

In [5]:
input_dim = (3, 2)
output_dim = (3, 5)
linear = Linear(input_dim, output_dim)

# Random input values (x itself, and assumed partial derivative)
x_input = np.random.rand(*input_dim)
dL_dy = np.random.rand(*output_dim)

# Forward call
y_value = linear.forward(x_input)

# Backpropagation
dL_dx = linear.backward(dL_dy)
dL_dw = linear._W_pd

print("Forward output:")
print(y_value)
print("Jacobian dL/dx:")
print(dL_dx)
print("Jacobian dL/dw:")
print(dL_dw)

Forward output:
[[0.80514121 0.978518   0.84292487 0.86549299 1.03282263]
 [0.91727081 1.0731285  0.96631008 0.97080347 1.12494457]
 [0.77990627 0.90017716 0.8277365  0.82073204 0.93491457]]
Jacobian dL/dx:
[[1.68644384 1.77122432]
 [1.26119332 1.4110765 ]
 [0.85105349 0.93005214]]
Jacobian dL/dw:
[[1.37818991 2.31176235 0.98901982 1.64949482 1.93039748]
 [0.51568113 1.19270197 0.286197   0.70855942 0.7288801 ]
 [0.73371861 0.94336793 0.59995179 0.78918571 1.12469091]]


Also note that computing backpropagation via Jacobians leads to the same result:

In [6]:
dL_dx_jac = linear.backward(dL_dy, use_jacobian=True)
dL_dw_jac = linear._W_pd

print("Jacobian dL/dx:")
print(dL_dx_jac)

print("Jacobian dL/dw")
print(dL_dw_jac)

Jacobian dL/dx:
[[1.68644384 1.77122432]
 [1.26119332 1.4110765 ]
 [0.85105349 0.93005214]]
Jacobian dL/dw
[[1.37818991 2.31176235 0.98901982 1.64949482 1.93039748]
 [0.51568113 1.19270197 0.286197   0.70855942 0.7288801 ]
 [0.73371861 0.94336793 0.59995179 0.78918571 1.12469091]]


`Linear` also accepts vector inputs, automatically casting them to a matrix of shape `(1, n_input)`.
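One way such casting could work is sketched below. The helper `linear_forward` is hypothetical and is not the method of `Linear`; it assumes a plain $Y = XW$ without a bias term, promotes a 1-D input to a single-row matrix with `np.atleast_2d`, and returns a vector again when a vector came in.

```python
import numpy as np


def linear_forward(x, W):
    """Hypothetical helper: Y = XW that also accepts a 1-D input."""
    x = np.asarray(x, dtype=float)
    is_vector = x.ndim == 1
    X = np.atleast_2d(x)             # (n_in,) -> (1, n_in); 2-D input is unchanged
    Y = X @ W
    return Y[0] if is_vector else Y  # hand back a vector when a vector came in


W_demo = np.ones((3, 5))
y_vec = linear_forward(np.array([1.0, 2.0, 3.0]), W_demo)  # shape (5,)
y_mat = linear_forward(np.ones((4, 3)), W_demo)            # shape (4, 5)
```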

In [7]:
input_dim = 3
output_dim = 5
linear = Linear(input_dim, output_dim)

# Random input values (x itself, and assumed partial derivative)
x_input = np.random.rand(input_dim)
dL_dy = np.random.rand(output_dim)

# Forward call
y_value = linear.forward(x_input)

# Backpropagation
dL_dx = linear.backward(dL_dy)
dL_dw = linear._W_pd

print("Forward output:")
print(y_value)
print("Jacobian dL/dx:")
print(dL_dx)
print("Jacobian dL/dw:")
print(dL_dw)

Forward output:
[1.67283515 1.88932967 1.69074605 1.82291818 1.8870572 ]
Jacobian dL/dx:
[1.277602   1.38791781 1.44945885]
Jacobian dL/dw:
[[0.39846947 0.6232179  0.92781628 0.2222169  0.52429699]
 [0.39761295 0.62187827 0.92582189 0.22173923 0.52316999]
 [0.34877575 0.54549546 0.81210692 0.19450389 0.45891113]
 [0.27312598 0.42717701 0.63596021 0.15231583 0.35937289]]


Also note that computing backpropagation via Jacobians leads to the same result:

In [8]:
dL_dx_jac = linear.backward(dL_dy, use_jacobian=True)
dL_dw_jac = linear._W_pd

print("Jacobian dL/dx:")
print(dL_dx_jac)

print("Jacobian dL/dw")
print(dL_dw_jac)

Jacobian dL/dx:
[1.277602   1.38791781 1.44945885]
Jacobian dL/dw
[[0.39846947 0.6232179  0.92781628 0.2222169  0.52429699]
 [0.39761295 0.62187827 0.92582189 0.22173923 0.52316999]
 [0.34877575 0.54549546 0.81210692 0.19450389 0.45891113]
 [0.27312598 0.42717701 0.63596021 0.15231583 0.35937289]]
