# PyTorch neural net layers



`torch.nn` provides several layer operations built into `torch`. We can combine them to build more complex neural networks.

- torch.nn.Linear
- torch.nn.Embedding
- torch.nn.LogSoftmax
- torch.nn.Dropout
- torch.nn.ReLU
- torch.nn.GRU


Note that in the PyTorch version used in this notebook (earlier than 0.4), the `forward` method only accepts `torch.autograd.Variable` inputs:
```
n_input = 3
n_output = 2
sample = torch.autograd.Variable(torch.Tensor([5.,4.,3.]))
linear_layer = torch.nn.Linear(n_input, n_output)
linear_layer.forward(sample)
```


Therefore, if we define a sample as a plain `torch.Tensor`, we cannot forward propagate it in that version:
```
n_input = 3
n_output = 2
sample = torch.Tensor([5.,4.,3.])
linear_layer = torch.nn.Linear(n_input, n_output)
linear_layer.forward(sample)  # raises an error on PyTorch < 0.4: a Variable is expected
```
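
For readers on PyTorch 0.4 or later, where `Variable` was merged into `Tensor`, the same forward pass can be sketched as follows. This assumes a recent PyTorch install, which the original cells do not use, and it calls the module directly (`linear_layer(sample)`) instead of `.forward`, the usual convention because calling the module also runs any registered hooks.

```python
import torch

n_input = 3
n_output = 2

# Assumption: PyTorch >= 0.4, where a plain float tensor can be fed to a layer.
sample = torch.tensor([5., 4., 3.])
linear_layer = torch.nn.Linear(n_input, n_output)

# Calling the module invokes forward() plus any registered hooks.
output = linear_layer(sample)
print(output.shape)  # torch.Size([2])
```

The output values depend on the randomly initialized weights, so only the shape is shown.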

In [724]:
import torch
import numpy as np

In [725]:
x = torch.Tensor(np.random.rand(3,1))

#### Linear layer in numpy

In [726]:
W = np.random.rand(2, 3)
x = np.random.rand(3, 1)
W.shape, x.shape

((2, 3), (3, 1))

In [727]:
W

array([[ 0.58366281,  0.56736373,  0.43495495],
       [ 0.79652223,  0.21222979,  0.81476489]])

In [728]:
x

array([[ 0.96248646],
       [ 0.529278  ],
       [ 0.43134643]])

In [731]:
# W * x does not work: * is element-wise, and shapes (2, 3) and (3, 1) cannot be broadcast

In [732]:
np.matmul(W, x)

array([[ 1.04967696],
       [ 1.23041635]])
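
The NumPy cells above can be wrapped into a small function that also adds a bias term, which is what a linear layer computes. This is only a sketch: `linear_forward`, the bias `b`, and the seeded generator are illustrative names and choices, not part of the original notebook.

```python
import numpy as np

def linear_forward(W, x, b):
    """Affine map y = W x + b for a column vector x (hypothetical helper)."""
    return W @ x + b

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
W = rng.random((2, 3))
x = rng.random((3, 1))
b = rng.random((2, 1))

y = linear_forward(W, x, b)
print(y.shape)  # (2, 1)

# Element-wise multiplication fails because (2, 3) and (3, 1) do not broadcast.
try:
    W * x
    broadcast_ok = True
except ValueError:
    broadcast_ok = False
print(broadcast_ok)  # False
```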

#### nn.Linear

In [733]:
n_input = 3
n_output = 2
torch.manual_seed(1000)
sample = torch.autograd.Variable(torch.Tensor([5.,4.,3.]))
torch.manual_seed(1000)
linear_layer = torch.nn.Linear(n_input, n_output)
linear_layer

Linear (3 -> 2)

In [734]:
linear_layer.weight 

Parameter containing:
 0.1773 -0.3400 -0.4446
 0.1275  0.5199 -0.5655
[torch.FloatTensor of size 2x3]

In [735]:
linear_layer.bias

Parameter containing:
-0.0206
-0.1607
[torch.FloatTensor of size 2]

In [736]:
linear_layer.forward(sample)

Variable containing:
-1.8276
 0.8598
[torch.FloatTensor of size 2]

In [737]:
# The weights and bias can be extracted as NumPy arrays to reproduce the forward pass by hand:
W_np = linear_layer.weight.data.numpy()
b_np = linear_layer.bias.data.numpy()
x_np = sample.data.numpy()
W_np @ x_np + b_np

array([-1.82760572,  0.85981667], dtype=float32)

In [738]:
linear_layer.state_dict().keys()

odict_keys(['weight', 'bias'])

#### nn.Embedding

In [739]:
n_input = 5000
n_output = 10
embedding = torch.nn.Embedding(n_input, n_output)
embedding

Embedding(5000, 10)

In [740]:
sample = torch.autograd.Variable(torch.LongTensor([506]))

In [741]:
embedding.forward(sample)

Variable containing:
 0.5854  0.5321  0.1900  1.5388 -2.2879  0.1719  1.0082  0.2139 -1.6939  0.1773
[torch.FloatTensor of size 1x10]

In [742]:
embedding.weight

Parameter containing:
 1.3553e+00 -1.3990e+00  3.8797e-02  ...   3.7456e-01  2.0669e-01  1.9429e+00
 7.1233e-01 -1.0958e+00  4.5843e-01  ...   2.2116e-01  8.5470e-02  1.8853e-01
-8.8053e-02 -2.1680e+00  1.6704e+00  ...  -5.7660e-01  6.7989e-01 -3.1399e-01
                ...                   ⋱                   ...                
 6.0944e-01  1.3832e+00 -4.2927e-01  ...   9.9310e-01  7.9954e-01 -3.1867e-01
 7.3491e-01 -1.7562e-01 -8.5124e-01  ...  -1.0927e+00  3.9260e-01 -3.4336e-01
-7.6331e-01  9.5274e-01  9.3096e-01  ...  -1.7369e-01  6.2901e-01 -1.1303e+00
[torch.FloatTensor of size 5000x10]

In [745]:
embedding.state_dict().keys()

odict_keys(['weight'])
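
An embedding layer is a lookup table: the output for index `i` is row `i` of `embedding.weight`. The sketch below checks this, assuming PyTorch 0.4 or later so that plain tensors can be used; the sizes and the index 506 mirror the cells above.

```python
import torch

embedding = torch.nn.Embedding(5000, 10)
index = torch.tensor([506])          # indices must be an integer (long) tensor

vector = embedding(index)            # shape (1, 10)
row = embedding.weight[506]          # shape (10,)

print(vector.shape)                  # torch.Size([1, 10])
print(torch.equal(vector[0], row))   # True
```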

#### nn.LogSoftmax

In [579]:
torch.nn.LogSoftmax

torch.nn.modules.activation.LogSoftmax
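
The cell above only shows the class. A minimal usage sketch follows, assuming PyTorch 0.4 or later; the `dim=1` argument and the example scores are illustrative choices. `LogSoftmax` returns log-probabilities, so exponentiating each row gives values that sum to 1.

```python
import torch

log_softmax = torch.nn.LogSoftmax(dim=1)  # normalize across each row
scores = torch.tensor([[1.0, 2.0, 3.0],
                       [1.0, 1.0, 1.0]])

log_probs = log_softmax(scores)
probs = log_probs.exp()

print(log_probs.shape)   # torch.Size([2, 3])
print(probs.sum(dim=1))  # each row sums to 1, up to floating-point error
```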

#### nn.Dropout

In [580]:
torch.nn.Dropout

torch.nn.modules.dropout.Dropout
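
The cell above only shows the class. The sketch below illustrates how `Dropout` behaves differently in training and evaluation mode, assuming PyTorch 0.4 or later; `p=0.5`, the seed, and the all-ones input are illustrative choices. In training mode each element is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`; in evaluation mode the layer is the identity.

```python
import torch

torch.manual_seed(0)  # seed chosen only for reproducibility
dropout = torch.nn.Dropout(p=0.5)
x = torch.ones(1000)

dropout.train()
y_train = dropout(x)  # every element is either 0 or 1 / (1 - p) = 2

dropout.eval()
y_eval = dropout(x)   # identity in evaluation mode

print(torch.equal(y_eval, x))  # True
```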

#### nn.ReLU

In [806]:
torch.nn.ReLU

torch.nn.modules.activation.ReLU
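
The cell above only shows the class. As a minimal sketch (assuming PyTorch 0.4 or later, with made-up input values), `ReLU` replaces negative entries with zero and leaves the rest unchanged:

```python
import torch

relu = torch.nn.ReLU()
x = torch.tensor([-2.0, -0.5, 0.0, 1.5])

y = relu(x)  # negative entries become 0, non-negative entries pass through
print(y)
```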

#### nn.GRU


In [807]:
gru = torch.nn.GRU(6,256)
sample = torch.autograd.Variable(torch.Tensor(np.random.rand(6).reshape(1,1,6)))

In [808]:
sample

Variable containing:
(0 ,.,.) = 
  0.4747  0.2536  0.4099  0.2773  0.8200  0.0557
[torch.FloatTensor of size 1x1x6]

In [809]:
type(gru.forward(sample)), len(gru.forward(sample))

(tuple, 2)

In [810]:
a,b = gru.forward(sample)

In [804]:
a.size()

torch.Size([1, 1, 256])

In [805]:
b.size()

torch.Size([1, 1, 256])

In [797]:
gru.state_dict().keys()

odict_keys(['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0'])
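
The two returned values are easier to tell apart with a sequence longer than one step. In the sketch below (assuming PyTorch 0.4 or later; the sequence length of 4 is an illustrative choice), the first value holds the hidden state at every time step and the second holds only the final hidden state of each layer. For a single-layer, unidirectional GRU the last time step of the first value matches the second.

```python
import torch

gru = torch.nn.GRU(input_size=6, hidden_size=256)
sequence = torch.randn(4, 1, 6)  # (seq_len, batch, input_size)

output, h_n = gru(sequence)

print(output.shape)  # torch.Size([4, 1, 256]): one hidden state per time step
print(h_n.shape)     # torch.Size([1, 1, 256]): final hidden state per layer
print(torch.allclose(output[-1], h_n[0]))  # True
```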

#### torch.bmm

In [820]:
batch1 = torch.randn(10, 3, 4)
batch2 = torch.randn(10, 4, 5)
res = torch.bmm(batch1, batch2)
res.size()

torch.Size([10, 3, 5])

In [830]:
attn_weights = torch.randn([1, 1, 10])
encoder_outputs = torch.randn([1, 10, 256])
torch.bmm(attn_weights, encoder_outputs).size()

torch.Size([1, 1, 256])
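
The last cell has the shape pattern of an attention step: one weight per encoder position, multiplied against the encoder outputs. The sketch below checks that `torch.bmm` then computes a weighted sum of the encoder outputs. Applying `softmax` to the weights is an assumption added here for illustration; the original cell uses raw random values.

```python
import torch

# Assumption: weights are normalized with softmax, as is common in attention.
attn_weights = torch.softmax(torch.randn(1, 1, 10), dim=-1)  # (batch, 1, seq_len)
encoder_outputs = torch.randn(1, 10, 256)                     # (batch, seq_len, hidden)

context = torch.bmm(attn_weights, encoder_outputs)            # (batch, 1, hidden)

# Same result as an explicit weighted sum over the 10 positions.
manual = (attn_weights[0, 0].unsqueeze(1) * encoder_outputs[0]).sum(dim=0)

print(context.shape)  # torch.Size([1, 1, 256])
print(torch.allclose(context[0, 0], manual, atol=1e-5))  # True
```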