# Lab 6, part 1: new nodes
by Domrachev Ivan, B20-Ro-01

In [2]:
from nn_from_scratch.nodes import SoftMaxLoss
from nn_from_scratch.neurons import Linear
import numpy as np

## Part 1. Linear Layer remake

Previously, linear layers were implemented as:
$$Linear(X) = WX.$$
However, the current assignment requires an additional bias term $b$:
$$Linear(X) = WX + b.$$

The class `neurons.Linear` was modified to match this structure. Let's check it in the same way as we did for the previous implementation:

In [3]:
input_dim=(2, 3)
output_dim=(5, 3)
# NB: `relu` is only the variable name; the object is a Linear layer
relu = Linear(input_dim, output_dim)

# Random input values (x itself, and assumed partial derivative)
x_input = np.random.rand(*input_dim)
dL_dy = np.random.rand(*output_dim)

# Forward call
y_value = relu.forward(x_input)

# Backpropagation
dL_dx = relu.backward(dL_dy)
dL_dw = relu._W_pd

print("Forward output:")
print(y_value)
print("Jacobian dL/dx:")
print(dL_dx)
print("Jacobian dL/dw:")
print(dL_dw)

Forward output:
[[0.66301283 1.0737497  1.20726502]
 [0.76842175 1.21742733 1.36230741]
 [0.74308348 1.18529264 1.33006013]
 [0.8062085  1.25688708 1.40187946]
 [0.65207598 1.03503492 1.15885404]]
Jacobian dL/dx:
[[1.57724655 1.33246985 1.86702889]
 [1.51963415 1.2850436  1.93761231]]
Jacobian dL/dw:
[[2.36535438 1.42097401 0.99714234]
 [1.98563344 1.24312054 0.86382985]
 [1.67961297 0.93502298 0.65841791]
 [1.59854282 1.22054438 0.85224498]
 [1.83132879 1.34329134 0.94023213]]
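
The gradient formulas behind such a check can be sketched in plain NumPy. This is a minimal sketch, not the code of `neurons.Linear`: it assumes $W \in \mathbb{R}^{5 \times 2}$ and $b \in \mathbb{R}^{5 \times 1}$ are stored as separate arrays (the class may store them differently), and the helper names `affine_forward`, `numeric_grad` and `loss_from_Y` are hypothetical. The analytic gradients are compared against central differences.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shapes follow the example above: X is 2x3, Y is 5x3.
# Assumption: W (5x2) and b (5x1) are kept as separate arrays.
X = rng.random((2, 3))
W = rng.random((5, 2))
b = rng.random((5, 1))
dL_dY = rng.random((5, 3))  # assumed upstream gradient


def affine_forward(W, b, X):
    """Y = W X + b, with b broadcast across the columns of X."""
    return W @ X + b


# Analytic gradients of L with respect to X, W and b
dL_dX = W.T @ dL_dY                        # shape (2, 3)
dL_dW = dL_dY @ X.T                        # shape (5, 2)
dL_db = dL_dY.sum(axis=1, keepdims=True)   # shape (5, 1)


def numeric_grad(f, A, eps=1e-6):
    """Central-difference gradient of the scalar function f at A."""
    G = np.zeros_like(A)
    for idx in np.ndindex(A.shape):
        A_plus, A_minus = A.copy(), A.copy()
        A_plus[idx] += eps
        A_minus[idx] -= eps
        G[idx] = (f(A_plus) - f(A_minus)) / (2 * eps)
    return G


def loss_from_Y(Y):
    """Scalar surrogate loss whose gradient w.r.t. Y is exactly dL_dY."""
    return float(np.sum(Y * dL_dY))


num_dX = numeric_grad(lambda A: loss_from_Y(affine_forward(W, b, A)), X)
num_dW = numeric_grad(lambda A: loss_from_Y(affine_forward(A, b, X)), W)
num_db = numeric_grad(lambda A: loss_from_Y(affine_forward(W, A, X)), b)

print(np.allclose(dL_dX, num_dX),
      np.allclose(dL_dW, num_dW),
      np.allclose(dL_db, num_db))
```

The surrogate loss $\sum_{ij} Y_{ij} \, (\partial L / \partial Y)_{ij}$ is used only because its gradient with respect to $Y$ equals the chosen upstream gradient, which makes the comparison independent of any particular loss node.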


Also note that computing backpropagation via Jacobians leads to the same result:

In [4]:
dL_dx_jac = relu.backward(dL_dy, use_jacobian=True)
dL_dw_jac = relu._W_pd

print("Jacobian dL/dx:")
print(dL_dx_jac)

print("Jacobian dL/dw")
print(dL_dw_jac)

Jacobian dL/dx:
[[1.57724655 1.33246985 1.86702889]
 [1.51963415 1.2850436  1.93761231]]
Jacobian dL/dw
[[2.36535438 1.42097401 0.99714234]
 [1.98563344 1.24312054 0.86382985]
 [1.67961297 0.93502298 0.65841791]
 [1.59854282 1.22054438 0.85224498]
 [1.83132879 1.34329134 0.94023213]]
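
Why the two paths agree can be illustrated with explicit Jacobians. The sketch below is independent of `neurons.Linear` and assumes a plain $Y = WX + b$ with $W \in \mathbb{R}^{5 \times 2}$; it is not a description of what `use_jacobian=True` does internally. With row-major flattening, the Jacobian of $Y$ with respect to $X$ is a Kronecker product, and multiplying its transpose by the flattened upstream gradient gives the same result as the direct matrix formula.

```python
import numpy as np

rng = np.random.default_rng(1)

p, m, n = 2, 5, 3            # X is p x n, Y is m x n
X = rng.random((p, n))
W = rng.random((m, p))       # assumed weight shape for Y = W X + b
dL_dY = rng.random((m, n))

# Jacobian of Y.ravel() with respect to X.ravel() (row-major order).
# Entry [(i, j), (k, l)] equals W[i, k] when j == l and 0 otherwise.
J_x = np.kron(W, np.eye(n))            # shape (m*n, p*n)

# Jacobian of Y.ravel() with respect to W.ravel().
# Entry [(i, j), (a, k)] equals X[k, j] when i == a and 0 otherwise.
J_w = np.kron(np.eye(m), X.T)          # shape (m*n, m*p)

# Chain rule with the explicit Jacobians
dL_dX_jac = (J_x.T @ dL_dY.ravel()).reshape(p, n)
dL_dW_jac = (J_w.T @ dL_dY.ravel()).reshape(m, p)

# Same quantities computed directly, without building the Jacobians
dL_dX_direct = W.T @ dL_dY
dL_dW_direct = dL_dY @ X.T

print(np.allclose(dL_dX_jac, dL_dX_direct))  # True
print(np.allclose(dL_dW_jac, dL_dW_direct))  # True
```

The explicit Jacobians are mostly zeros, which is why the direct formulas are usually preferred for anything but verification.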


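Finally, note that although $X \in \mathbb{R}^{2 \times 3}$ and $Y \in \mathbb{R}^{5 \times 3}$, the weight matrix is $W \in \mathbb{R}^{5 \times 3}$ rather than $\mathbb{R}^{5 \times 2}$, presumably because the bias $b$ is stored inside $W$ as an extra column.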

## Part 2. Softmax Loss function

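A softmax loss function
$$L = -\log \left( \sum_{i=1}^{N} y_i \frac{\exp(x_i - x_{max})}{\sum_{j=1}^{N} \exp(x_j - x_{max})} \right)$$
was also implemented as `nodes.SoftMaxLoss`. Here $y_i \in \{0, 1\}$ are the labels and $x_{max} = \max_j x_j$; subtracting $x_{max}$ does not change the value of $L$ and only improves numerical stability.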
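
As a cross-check of the formula, here is a minimal NumPy sketch of the forward pass. It assumes the loss is the negative log of the total softmax probability assigned to the labelled entries, with the maximum subtracted for stability; the function `softmax_loss` below is a hypothetical stand-in, not the code of `nodes.SoftMaxLoss`. The sample inputs are the (rounded) values printed in the usage example of this section.

```python
import numpy as np


def softmax_loss(x, y):
    """-log of the total softmax probability assigned to labelled entries."""
    shifted = x - np.max(x)            # shift for numerical stability
    probs = np.exp(shifted) / np.sum(np.exp(shifted))
    return -np.log(np.dot(y, probs))


# Inputs copied (rounded) from the printed example in this notebook
x = np.array([0.41104046, 0.08419121, 0.42634531, 0.44916483, 0.56230137])
y = np.array([1, 0, 0, 1, 0])

loss = softmax_loss(x, y)
print(loss)  # approximately 0.88473

# Adding a constant to every input leaves the loss unchanged
print(np.isclose(softmax_loss(x + 100.0, y), loss))  # True
```

Under these assumptions the result agrees with the forward output printed in the example (about 0.8847).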

> Note: the current implementation allows the same base class to be used for both Node and Loss function. The only differences are:
> 1. The labels $y$ are used in the forward and backward computations.
> 2. Backpropagation takes no incoming gradient.
> 3. The output is a scalar, and the partial derivatives are one-dimensional.
>
> This implementation differs from other libraries, such as `torch`, and probably has some disadvantages in scaling, but it works well in my case.

Here is a usage example:

In [5]:
input_dim = 5
softmax_loss = SoftMaxLoss(input_dim)

x_input = np.random.rand(input_dim)
labels = np.random.choice(2, input_dim)
print(x_input, labels)
y_value = softmax_loss.forward(x_input, labels)
print(y_value)
dL_dx = softmax_loss.backward(x_input)

print("Forward output:")
print(y_value)
print("Jacobian dL/dx:")
print(dL_dx)

[0.41104046 0.08419121 0.42634531 0.44916483 0.56230137] [1 0 0 1 0]
0.8847264065266106
Forward output:
0.8847264065266106
Jacobian dL/dx:
[[-0.39116001  0.          0.         -0.40235145  0.        ]]
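
A gradient of this kind can be sanity-checked with central differences. The sketch below assumes the loss $L = -\log\left(\sum_i y_i s_i\right)$ with $s = \mathrm{softmax}(x)$, for which the full gradient is $\partial L / \partial x_k = s_k - y_k s_k / \sum_i y_i s_i$; the functions `softmax`, `loss` and `loss_grad` are hypothetical helpers, not part of `nn_from_scratch`. The printed `backward` output above has zeros at the unlabelled positions, so it appears to follow a different convention than this full gradient, and the sketch is not meant to reproduce it.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - np.max(x))   # shift for numerical stability
    return e / e.sum()


def loss(x, y):
    """L = -log(sum_i y_i * softmax(x)_i)."""
    return -np.log(np.dot(y, softmax(x)))


def loss_grad(x, y):
    """Analytic gradient: s_k - y_k * s_k / (y . s)."""
    s = softmax(x)
    return s - y * s / np.dot(y, s)


# Inputs copied (rounded) from the printed example above
x = np.array([0.41104046, 0.08419121, 0.42634531, 0.44916483, 0.56230137])
y = np.array([1, 0, 0, 1, 0])

# Central differences along each coordinate direction
eps = 1e-6
numeric = np.array([
    (loss(x + eps * e_k, y) - loss(x - eps * e_k, y)) / (2 * eps)
    for e_k in np.eye(x.size)
])

analytic = loss_grad(x, y)
print(np.allclose(analytic, numeric, atol=1e-6))  # True
print(np.isclose(analytic.sum(), 0.0))            # True
```

The components sum to zero because the loss does not change when the same constant is added to every input.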
