# Lab 6, part 1: new nodes
by Domrachev Ivan, B20-Ro-01

In [1]:
from nn_from_scratch.nodes import SoftMaxLoss
from nn_from_scratch.neurons import Linear
import numpy as np

## Part 1. Linear Layer remake

Previously, linear layers were implemented as:
$$Linear(X) = WX.$$
However, the current assignment requires an additional bias term $b$:
$$Linear(X) = WX + b.$$
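
For reference, the formula above can be sketched as a textbook affine layer in NumPy. This is only an illustration under stated assumptions: the class name `AffineSketch`, its attributes, and the column convention $X \in \mathbb{R}^{n_{in} \times batch}$ are hypothetical and are not taken from `neurons.Linear`, whose internals and shapes may differ.

```python
import numpy as np


class AffineSketch:
    """Textbook affine layer Y = W X + b (hypothetical, not neurons.Linear)."""

    def __init__(self, n_in, n_out, rng):
        self.W = rng.standard_normal((n_out, n_in))
        self.b = rng.standard_normal(n_out)

    def forward(self, X):
        # X has shape (n_in, batch); the bias is broadcast over the batch.
        self.X = X
        return self.W @ X + self.b[:, None]

    def backward(self, dL_dY):
        # dL_dY has shape (n_out, batch).
        self.dL_dW = dL_dY @ self.X.T   # shape (n_out, n_in)
        self.dL_db = dL_dY.sum(axis=1)  # shape (n_out,)
        return self.W.T @ dL_dY         # shape (n_in, batch)


rng = np.random.default_rng(0)
layer = AffineSketch(n_in=2, n_out=5, rng=rng)
X = rng.random((2, 3))
Y = layer.forward(X)
dL_dX = layer.backward(np.ones_like(Y))
print(Y.shape, dL_dX.shape, layer.dL_dW.shape, layer.dL_db.shape)
```

In this sketch the bias gradient is simply the incoming gradient summed over the batch, since $b$ is added to every column of $WX$.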

The class `neurons.Linear` was modified accordingly. Let's check it in the same way as we did for the previous implementation:

In [2]:
input_dim=(3, 2)
output_dim=(3, 5)
# Note: despite its name, `relu` holds a Linear layer
relu = Linear(input_dim, output_dim)

# Random input values (x itself, and assumed partial derivative)
x_input = np.random.rand(*input_dim)
dL_dy = np.random.rand(*output_dim)

# Forward call
y_value = relu.forward(x_input)

# Backpropagation
dL_dx = relu.backward(dL_dy)
dL_dw = relu._W_pd

print("Forward output:")
print(y_value)
print("Jacobian dL/dx:")
print(dL_dx)
print("Jacobian dL/dw:")
print(dL_dw)

Forward output:
[[1.18494715 1.41062189 1.11896593 1.14160105 1.23413608]
 [0.70806376 0.84211581 0.62968799 0.65457458 0.69722952]
 [1.0083814  1.11864922 0.900688   0.94300045 0.9918296 ]]
Jacobian dL/dx:
[[0.47750095 0.50584513]
 [1.59659126 1.68018523]
 [1.17571714 1.23450319]]
Jacobian dL/dw:
[[1.61650053 1.68179081 1.02794801 0.92252073 1.20212856]
 [0.42774024 0.36441561 0.26182885 0.11457575 0.28209585]
 [0.94395783 1.1200461  0.40326474 0.76542123 0.47258473]]


Also note that computing backpropagation via Jacobians leads to the same result:

In [3]:
dL_dx_jac = relu.backward(dL_dy, use_jacobian=True)
dL_dw_jac = relu._W_pd

print("Jacobian dL/dx:")
print(dL_dx_jac)

print("Jacobian dL/dw")
print(dL_dw_jac)

Jacobian dL/dx:
[[0.47750095 0.50584513]
 [1.59659126 1.68018523]
 [1.17571714 1.23450319]]
Jacobian dL/dw
[[1.61650053 1.68179081 1.02794801 0.92252073 1.20212856]
 [0.42774024 0.36441561 0.26182885 0.11457575 0.28209585]
 [0.94395783 1.1200461  0.40326474 0.76542123 0.47258473]]
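
Why the two code paths should agree can be illustrated on a standalone example. The sketch below assumes the textbook form $Y = WX + b$ with $X \in \mathbb{R}^{n_{in} \times batch}$ and column-major vectorization. All names are hypothetical, and it does not reproduce what `use_jacobian=True` does inside `neurons.Linear`. It builds the explicit Jacobians of $\mathrm{vec}(Y)$ with Kronecker products and compares the result with the matrix-product shortcut.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_out, batch = 2, 5, 3
W = rng.standard_normal((n_out, n_in))
X = rng.random((n_in, batch))
G = rng.random((n_out, batch))  # stands in for dL/dY

# Shortcut (matrix-product) form of the gradients of Y = W X + b.
dX_short = W.T @ G
dW_short = G @ X.T

# Explicit Jacobians of vec(Y), using column-major vectorization:
# vec(W X) = (I kron W) vec(X) = (X^T kron I) vec(W).
J_x = np.kron(np.eye(batch), W)    # shape (n_out*batch, n_in*batch)
J_w = np.kron(X.T, np.eye(n_out))  # shape (n_out*batch, n_out*n_in)
g = G.flatten(order="F")

# Chain rule: multiply the transposed Jacobian by the incoming gradient.
dX_jac = (J_x.T @ g).reshape((n_in, batch), order="F")
dW_jac = (J_w.T @ g).reshape((n_out, n_in), order="F")

print(np.allclose(dX_short, dX_jac), np.allclose(dW_short, dW_jac))
```

The Jacobian route materializes matrices whose size grows with the product of input and output sizes, which is why the shortcut form is usually preferred in practice.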


Finally, note that although $X \in \mathbb{R}^{2 \times 3}$ and $Y \in \mathbb{R}^{5 \times 3}$, the weight matrix is $W \in \mathbb{R}^{5 \times 3}$.

## Part 2. Softmax Loss function

Also, a softmax loss function
$$L = \frac{1}{N} \sum_{i=1}^{N} -\log \left( y_i \frac{\exp(x_i / x_{\max})}{\sum_{j=1}^{N} \exp(x_j / x_{\max})} \right)$$
was implemented as `nodes.SoftMaxLoss`.
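
For comparison, here is a minimal sketch of the variant of this loss commonly found in textbooks and other libraries. It is an assumption-laden illustration, not a description of `nodes.SoftMaxLoss`: the function name `softmax_cross_entropy` is hypothetical, it expects one integer class label per row rather than 0/1 labels per element, and it stabilizes the exponentials by subtracting the row maximum instead of dividing by it.

```python
import numpy as np


def softmax_cross_entropy(x, labels):
    """Mean cross-entropy of softmax(x) against integer class labels.

    x: (batch, n_classes) scores; labels: (batch,) class indices.
    Returns the scalar loss and dL/dx with the same shape as x.
    """
    # Subtracting the row maximum leaves softmax unchanged but avoids overflow.
    shifted = x - x.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    n = x.shape[0]
    rows = np.arange(n)
    loss = -np.log(probs[rows, labels]).mean()
    # Gradient of the mean loss: (softmax - one_hot) / batch size.
    grad = probs.copy()
    grad[rows, labels] -= 1.0
    return loss, grad / n


x = np.array([[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]])
labels = np.array([2, 0])
loss, grad = softmax_cross_entropy(x, labels)
print(loss)
print(grad)
```

In this variant each row of the gradient sums to zero, because the softmax probabilities sum to one and exactly one entry per row has 1 subtracted.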

> Note: the current implementation allows the same base class to be used for both Node and Loss function. The only differences are:
> 1. The labels $y$ are used in the forward and backward computations.
> 2. Backpropagation takes no incoming gradient.
> 3. The output is a scalar, and the partial derivatives are one-dimensional.
>
> This implementation differs from other libraries, such as `torch`, and probably has some disadvantages in scaling, but it works well in my case.

Here is an example of its usage:

In [4]:
input_dim = 5
softmax_loss = SoftMaxLoss(input_dim)

x_input = np.random.rand(input_dim)
labels = np.random.choice(2, input_dim)
print(x_input, labels)
y_value = softmax_loss.forward(x_input, labels)
print(y_value)
dL_dx = softmax_loss.backward(x_input)

print("Forward output:")
print(y_value)
print("Jacobian dL/dx:")
print(dL_dx)

[0.19118483 0.22269646 0.36330756 0.19139163 0.15870146] [0 1 1 0 0]
[0.73284375]
Forward output:
[0.73284375]
Jacobian dL/dx:
[ 0.         -0.58808293 -0.40633713  0.          0.        ]
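
An analytic gradient like the one printed above can be cross-checked against central finite differences. The helper below is a generic sketch: `numerical_grad` is a hypothetical name, and it is demonstrated on a toy function with a known gradient rather than on `SoftMaxLoss` itself. Applying it to the loss would mean wrapping `forward` in a function of `x` with the labels held fixed, assuming repeated forward calls are safe.

```python
import numpy as np


def numerical_grad(f, x, eps=1e-6):
    """Central-difference estimate of df/dx for a scalar-valued f."""
    grad = np.zeros_like(x, dtype=float)
    for idx in np.ndindex(x.shape):
        # Perturb one entry at a time, in both directions.
        step = np.zeros_like(x, dtype=float)
        step[idx] = eps
        grad[idx] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad


# Toy loss with a known gradient: f(x) = sum(x**2), so df/dx = 2x.
x = np.array([0.2, -0.5, 1.0])
approx = numerical_grad(lambda v: np.sum(v ** 2), x)
print(np.allclose(approx, 2 * x, atol=1e-6))
```

Because the loop uses `np.ndindex`, the same helper works unchanged for matrix-shaped inputs.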


Matrices as input are supported as well:

In [5]:
input_dim = (10, 5)
softmax_loss = SoftMaxLoss(input_dim)

x_input = np.random.rand(*input_dim)
labels = np.random.choice(2, input_dim)
print(x_input, labels)
y_value = softmax_loss.forward(x_input, labels)
dL_dx = softmax_loss.backward(x_input)

print("Forward output:")
print(y_value)
print("Jacobian dL/dx:")
print(dL_dx)

[[0.4098809  0.70290909 0.79803945 0.49825987 0.77122749]
 [0.79044226 0.89593445 0.9434427  0.11767817 0.30124511]
 [0.55392198 0.29637471 0.93043622 0.45001948 0.02593289]
 [0.05164899 0.80490725 0.58662845 0.78830689 0.64627729]
 [0.67014526 0.76241184 0.9911628  0.59353523 0.92045358]
 [0.26689125 0.13970089 0.58903479 0.45844383 0.71752884]
 [0.20094275 0.16369616 0.89409796 0.77364318 0.93933076]
 [0.23557799 0.26885793 0.11884898 0.45365429 0.79282601]
 [0.62282782 0.44568548 0.09219348 0.46343266 0.18359826]
 [0.90403437 0.33314109 0.1249022  0.87229509 0.89545525]] [[1 0 1 0 0]
 [1 1 1 1 0]
 [0 0 0 1 1]
 [0 0 0 1 0]
 [0 0 0 1 0]
 [1 0 0 1 0]
 [0 1 0 1 0]
 [0 1 1 0 1]
 [0 1 1 1 0]
 [1 0 0 1 1]]
Forward output:
[0.88970002]
Jacobian dL/dx:
[[-0.16707145  0.         -0.18882362  0.          0.        ]
 [-1.27812771 -1.37930803 -0.95665661 -0.7209445   0.        ]
 [ 0.          0.          0.         -0.14085896 -0.09694169]
 [ 0.          0.          0.         -0.16585208  0. 