# Fully-Connected (Linear) Layer
In this notebook, we will look into the forward and backward passes of the ```nn.Linear``` layer. We will also manually derive the expressions for the gradient of the loss with respect to the input of this layer, $\frac{\partial L}{\partial I}$, the gradient of the loss with respect to the weights, $\frac{\partial L}{\partial W}$, and the gradient of the loss with respect to the bias, $\frac{\partial L}{\partial b}$.

In [1]:
require 'nn';
n = torch.rand(5)
lin = nn.Linear(5,4)
m = lin:forward(n)

#### Input

In [2]:
n

 0.2647
 0.7477
 0.5382
 0.1914
 0.0834
[torch.DoubleTensor of size 5]



#### Output

In [3]:
m

 0.1242
 0.2382
-0.6489
-0.2896
[torch.DoubleTensor of size 4]



#### Output (manually computed)

In [4]:
m_ = lin.weight*n + lin.bias
print(m_)

 0.1242
 0.2382
-0.6489
-0.2896
[torch.DoubleTensor of size 4]
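
Written out per component, the product in the cell above is the usual affine map. The index names $i$ and $j$ are introduced here for illustration only; the shapes follow from `nn.Linear(5,4)`.

$$
O_i = \sum_{j=1}^{5} W_{ij}\, I_j + b_i, \qquad i = 1, \dots, 4,
$$

with $W \in \mathbb{R}^{4 \times 5}$ (`lin.weight`), $I \in \mathbb{R}^{5}$ (`n`) and $b \in \mathbb{R}^{4}$ (`lin.bias`).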



#### Set the gradient of the loss with respect to the input of the next layer, $\frac{\partial L}{\partial I^{l+1}}$, to random values, and backpropagate it through this linear layer.

In [5]:
nextgrad = torch.rand(4)
lin:zeroGradParameters()  -- backward accumulates into gradWeight and gradBias, so clear them first
lin:backward(n, nextgrad)

#### Gradient of the loss with respect to the input of this layer $\frac{\partial L}{\partial I}$

In [6]:
lin.gradInput

-0.1486
-0.3457
-0.0236
 0.2292
 0.3097
[torch.DoubleTensor of size 5]



#### Relation for calculating the gradient of the loss with respect to the input: $\frac{\partial L}{\partial I^{l}} = \frac{\partial L}{\partial I^{l+1}} \times \frac{\partial O^{l}}{\partial I^{l}}$. Here the output of this layer is the input of the next, $O^{l} = I^{l+1}$, and the Jacobian is $\frac{\partial O^{l}}{\partial I^{l}} = W^{l}$.

In [7]:
nextgrad:reshape(1,4)*lin.weight

-0.1486 -0.3457 -0.0236  0.2292  0.3097
[torch.DoubleTensor of size 1x5]
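
The row-vector product above can be read off component by component. As a sketch, write $g_i = \frac{\partial L}{\partial O_i}$ for the entries of `nextgrad` (the symbol $g$ is shorthand introduced here, not notebook notation) and use $O_i = \sum_j W_{ij} I_j + b_i$:

$$
\frac{\partial L}{\partial I_j} = \sum_{i=1}^{4} \frac{\partial L}{\partial O_i}\,\frac{\partial O_i}{\partial I_j} = \sum_{i=1}^{4} g_i\, W_{ij}, \qquad j = 1, \dots, 5.
$$

In matrix form this is $g^{\top} W$ as a $1 \times 5$ row, or equivalently $W^{\top} g$ as a column, which is why the result has the size of the input.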



#### This layer's gradient of the loss with respect to the weights: $\frac{\partial L}{\partial W^{l}}$

In [8]:
lin.gradWeight

 0.2135  0.6033  0.4343  0.1544  0.0673
 0.0556  0.1572  0.1132  0.0402  0.0175
 0.0850  0.2402  0.1729  0.0615  0.0268
 0.1717  0.4850  0.3491  0.1242  0.0541
[torch.DoubleTensor of size 4x5]



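#### Relation for calculating the gradient of the loss with respect to the weights of this layer: $\frac{\partial L}{\partial W^{l}} = \frac{\partial L}{\partial O} \frac{\partial O}{\partial W^{l}}$, where $\frac{\partial L}{\partial O}$ is the incoming gradient `nextgrad`. <br/>
Let us first calculate $\frac{\partial O}{\partial W^{l}}$, which is a Jacobian of size $4\times20$: 4 outputs by $4 \times 5 = 20$ weights, with $W^{l}$ flattened row by row.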

In [9]:
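-- Jacobian of the 4 outputs w.r.t. the 20 weights (W flattened row by row).
-- Start from zeros: torch.Tensor(4,20) alone leaves the memory uninitialized.
dodw = torch.zeros(4,20)
st = 1
for i = 1, 4 do
    for j = 1, 5 do
        -- dO_i/dW_ij = n[j]; every other entry of row i stays zero
        dodw[i][st] = n[j]
        st = st + 1
    end
end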
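
The loop fills a sparse matrix whose structure follows from the forward map $O_i = \sum_j W_{ij} I_j + b_i$. As a sketch, assume $W$ is flattened row by row so that weight $W_{kj}$ sits in column $5(k-1)+j$ (a convention chosen here to match the loop counter `st`). Output $i$ depends only on row $i$ of $W$:

$$
\frac{\partial O_i}{\partial W_{kj}} = \delta_{ik}\, I_j
\qquad\Longrightarrow\qquad
\frac{\partial O}{\partial W} =
\begin{bmatrix}
I^{\top} & 0 & 0 & 0 \\
0 & I^{\top} & 0 & 0 \\
0 & 0 & I^{\top} & 0 \\
0 & 0 & 0 & I^{\top}
\end{bmatrix} \in \mathbb{R}^{4 \times 20},
$$

where each $I^{\top}$ is the $1 \times 5$ input row and each $0$ is a $1 \times 5$ block of zeros.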

#### Finally, we can calculate $\frac{\partial L}{\partial W^{l}} = \frac{\partial L}{\partial O} \frac{\partial O}{\partial W^{l}}$ and reshape the $1\times20$ result back to the $4\times5$ shape of the weights

In [10]:
(nextgrad:reshape(1,4) * dodw):reshape(4,5)

 0.2135  0.6033  0.4343  0.1544  0.0673
 0.0556  0.1572  0.1132  0.0402  0.0175
 0.0850  0.2402  0.1729  0.0615  0.0268
 0.1717  0.4850  0.3491  0.1242  0.0541
[torch.DoubleTensor of size 4x5]
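
Because each row of the Jacobian holds a single copy of the input, the product above collapses to an outer product: entry $(i,j)$ of the weight gradient is `nextgrad[i] * n[j]`. Below is a minimal sketch of that shortcut, assuming a fresh layer and random tensors. The names `x`, `layer`, `gradOut`, `outer` and `maxDiff` are hypothetical and not part of the notebook above.

```lua
require 'nn'

-- Hypothetical standalone setup mirroring the cells above.
local x = torch.rand(5)          -- input vector
local layer = nn.Linear(5, 4)    -- 5 inputs, 4 outputs
local gradOut = torch.rand(4)    -- stand-in for the gradient from the next layer

layer:zeroGradParameters()       -- gradients accumulate, so start from zero
layer:forward(x)
layer:backward(x, gradOut)

-- Outer product: entry (i, j) is gradOut[i] * x[j].
local outer = torch.ger(gradOut, x)

-- Largest absolute difference from the layer's own weight gradient;
-- expected to be at floating-point noise level.
local maxDiff = (outer - layer.gradWeight):abs():max()
print(maxDiff)
```

This avoids building the $4\times20$ Jacobian at all, which matters once the layer is larger than this toy example.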



#### This layer's gradient of the loss with respect to the bias: $\frac{\partial L}{\partial b^{l}}$

In [11]:
lin.gradBias

 0.8068
 0.2102
 0.3212
 0.6487
[torch.DoubleTensor of size 4]



#### Relation for calculating this layer's gradient of the loss with respect to the bias: $\frac{\partial L}{\partial b^{l}} = \frac{\partial L}{\partial I^{l+1}}$

In [12]:
nextgrad:reshape(1,4):t()

 0.8068
 0.2102
 0.3212
 0.6487
[torch.DoubleTensor of size 4x1]
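
The bias enters each output additively, so its Jacobian is the identity. As a sketch, with $g_i = \frac{\partial L}{\partial O_i}$ denoting the entries of `nextgrad` (shorthand introduced here) and $O_i = \sum_j W_{ij} I_j + b_i$:

$$
\frac{\partial O_i}{\partial b_k} = \delta_{ik}
\qquad\Longrightarrow\qquad
\frac{\partial L}{\partial b_k} = \sum_{i=1}^{4} g_i\, \delta_{ik} = g_k .
$$

For a single input vector, `lin.gradBias` is therefore a copy of `nextgrad`, as the two outputs above show.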

