# PyTorch neural net layers



`torch.nn` provides several layer objects that are already built into `torch`. We can combine them to build more complex neural networks.

- torch.nn.Linear
- torch.nn.Embedding
- torch.nn.LogSoftmax
- torch.nn.Dropout
- torch.nn.ReLU
- torch.nn.GRU


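**Note that on PyTorch versions before 0.4 (the outputs in this notebook come from such a version), the `forward` method of these layers only accepts input wrapped in a `torch.autograd.Variable`. Since PyTorch 0.4, `Variable` has been merged into `Tensor`, so the wrapper is no longer required:**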
```
n_input = 3
n_output = 2
sample = torch.autograd.Variable(torch.Tensor([5.,4.,3.]))
linear_layer = torch.nn.Linear(n_input, n_output)
linear_layer.forward(sample)
```


**Therefore, on those older versions, an example defined as a plain `torch.Tensor` cannot be forward propagated:**

```
n_input = 3
n_output = 2
sample = torch.Tensor([5.,4.,3.])
linear_layer = torch.nn.Linear(n_input, n_output)
linear_layer.forward(sample)  # fails on PyTorch < 0.4; works on 0.4 and later
```
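
On PyTorch 0.4 and later, where `Variable` has been merged into `Tensor`, the same layer can be applied to a plain tensor. The following is a minimal sketch, assuming a recent PyTorch install. It calls the module directly (`linear_layer(sample)`) rather than `.forward`, which is the usual convention because the call also runs any registered hooks.

```python
import torch

n_input = 3
n_output = 2

# A plain tensor; no Variable wrapper is needed on PyTorch >= 0.4.
sample = torch.tensor([5., 4., 3.])
linear_layer = torch.nn.Linear(n_input, n_output)

# Calling the module invokes forward() plus any registered hooks.
out = linear_layer(sample)
print(out.shape)  # torch.Size([2])
```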

## About recurrent neural networks

- `RNNCell` does the forward pass for a single time step of a sequence (especially useful if you want to do "custom" operations at every time step).
- `RNN` applies the `RNNCell` forward pass to every time step of an input sequence (just like the traditional RNN).
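
The relationship between the two can be sketched by copying the parameters of a single-layer `torch.nn.RNN` into a `torch.nn.RNNCell` and stepping through the sequence by hand. The sizes and the names `manual_output` and `steps` are illustrative choices, not part of the original notebook, and the sketch assumes PyTorch 0.4 or later.

```python
import torch

torch.manual_seed(0)
input_size, hidden_size, seq_len, batch = 4, 8, 5, 2

rnn = torch.nn.RNN(input_size, hidden_size)
cell = torch.nn.RNNCell(input_size, hidden_size)

# Copy the single-layer RNN's parameters into the cell
# so that both modules compute the same function.
with torch.no_grad():
    cell.weight_ih.copy_(rnn.weight_ih_l0)
    cell.weight_hh.copy_(rnn.weight_hh_l0)
    cell.bias_ih.copy_(rnn.bias_ih_l0)
    cell.bias_hh.copy_(rnn.bias_hh_l0)

x = torch.randn(seq_len, batch, input_size)  # (seq_len, batch, input_size)
h = torch.zeros(batch, hidden_size)          # initial hidden state

# Manual loop: one RNNCell call per time step.
steps = []
for t in range(seq_len):
    h = cell(x[t], h)
    steps.append(h)
manual_output = torch.stack(steps)           # (seq_len, batch, hidden_size)

# RNN runs the same recurrence over the whole sequence in one call.
rnn_output, h_n = rnn(x)
print(torch.allclose(manual_output, rnn_output, atol=1e-5))
```

The manual loop is the place where a custom per-step operation (for example an attention step) could be inserted.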



In [1]:
import torch
import numpy as np

In [2]:
x = torch.Tensor(np.random.rand(3,1))

### Linear layer in numpy

In [133]:
np.random.seed(1234)
W = np.random.rand(2, 3)
x = np.random.rand(3, 1)
W.shape, x.shape

((2, 3), (3, 1))

In [134]:
W

array([[ 0.19151945,  0.62210877,  0.43772774],
       [ 0.78535858,  0.77997581,  0.27259261]])

In [135]:
x

array([[ 0.27646426],
       [ 0.80187218],
       [ 0.95813935]])

In [136]:
# W * x does not work: `*` is element-wise, and shapes (2, 3) and (3, 1) cannot be broadcast together.

In [137]:
np.matmul(W, x)

array([[ 0.97120417],
       [ 1.10374618]])

In [138]:
W @ x

array([[ 0.97120417],
       [ 1.10374618]])
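
To make the difference between `*` and `@` concrete, the sketch below reuses the same seed and shapes, checks that the element-wise product raises an error, and compares `W @ x` against the row-by-column sums written out by hand. The names `elementwise_failed` and `manual` are illustrative.

```python
import numpy as np

np.random.seed(1234)
W = np.random.rand(2, 3)
x = np.random.rand(3, 1)

# Element-wise product: shapes (2, 3) and (3, 1) cannot be broadcast together.
try:
    W * x
    elementwise_failed = False
except ValueError:
    elementwise_failed = True

# Matrix product written out as explicit row-by-column sums.
manual = np.array([[sum(W[i, k] * x[k, 0] for k in range(3))] for i in range(2)])

print(elementwise_failed)            # True
print(np.allclose(manual, W @ x))    # True
```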

### nn.Linear

In [149]:
n_input = 3
n_output = 2

torch.manual_seed(1000)
sample = torch.autograd.Variable(torch.Tensor([5.,4.,3.]))

linear_layer = torch.nn.Linear(n_input, n_output)
linear_layer

Linear(in_features=3, out_features=2)

In [148]:
linear_layer.weight 

Parameter containing:
-0.1253 -0.0658  0.1821
-0.0853 -0.1777  0.2099
[torch.FloatTensor of size 2x3]


In [141]:
linear_layer.bias

Parameter containing:
-0.0681
-0.1556
[torch.FloatTensor of size 2]


In [142]:
linear_layer.forward(sample)

Variable containing:
-0.7907
-2.1668
[torch.FloatTensor of size 2]


In [29]:
# We can retrieve the weights and biases from the network to numpy as follows:
W_np = linear_layer.weight.data.numpy()
b_np = linear_layer.bias.data.numpy()
x_np = sample.data.numpy()
W_np @ x_np + b_np

array([-0.79068434, -2.16681194], dtype=float32)
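
The same check extends to a batch of inputs: with one example per row, the layer computes `X @ W.T + b`. This is a minimal sketch assuming PyTorch 0.4 or later, where `.detach()` replaces `.data`. The second row of `batch` is an arbitrary made-up example.

```python
import numpy as np
import torch

torch.manual_seed(1000)
linear_layer = torch.nn.Linear(3, 2)

# Two examples, one per row: shape (batch=2, n_input=3).
batch = torch.tensor([[5., 4., 3.],
                      [1., 0., -1.]])

with torch.no_grad():
    out = linear_layer(batch)                 # shape (2, 2)

# Pull the parameters out as NumPy arrays and redo the computation by hand.
W_np = linear_layer.weight.detach().numpy()   # shape (2, 3)
b_np = linear_layer.bias.detach().numpy()     # shape (2,)
manual = batch.numpy() @ W_np.T + b_np

print(np.allclose(out.numpy(), manual, atol=1e-6))
```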

In [30]:
linear_layer.state_dict().keys()

odict_keys(['weight', 'bias'])

### nn.Embedding

In [16]:
n_input = 5000
n_output = 10
embedding = torch.nn.Embedding(n_input, n_output)
embedding

Embedding(5000, 10)

In [17]:
sample = torch.autograd.Variable(torch.LongTensor([506]))

In [18]:
embedding.forward(sample)

Variable containing:
 0.8614 -0.4794 -1.0838  1.6702  0.0033 -0.7552  1.8491 -0.2417  1.6373 -0.3765
[torch.FloatTensor of size 1x10]

In [19]:
embedding.weight

Parameter containing:
 9.2727e-01 -1.7421e+00 -7.6991e-01  ...   8.0227e-01  5.2690e-01  5.7296e-01
 1.3898e-01 -1.1762e+00  8.5172e-02  ...   5.1778e-01 -7.3832e-01 -3.9377e-01
 1.5146e+00  1.4999e+00  1.8176e-01  ...  -1.5661e+00 -8.5200e-01  1.2770e+00
                ...                   ⋱                   ...                
 1.6249e+00 -3.3196e-01  2.0770e+00  ...   4.3277e-01 -2.1889e+00  8.6387e-01
-8.9731e-01 -1.0062e+00 -2.8385e-01  ...   6.0996e-01  8.4379e-01 -4.9468e-01
 1.1973e+00 -8.2004e-01  1.5729e+00  ...   3.3910e-01 -1.8187e+00  5.7958e-01
[torch.FloatTensor of size 5000x10]

In [20]:
embedding.state_dict().keys()

odict_keys(['weight'])
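
An embedding layer is a lookup table: the forward pass selects rows of `embedding.weight` by index. The sketch below illustrates this, assuming PyTorch 0.4 or later. The index values are arbitrary.

```python
import torch

torch.manual_seed(0)
embedding = torch.nn.Embedding(5000, 10)

# Three token indices; index 506 appears twice on purpose.
indices = torch.tensor([506, 3, 506])
vectors = embedding(indices)

print(vectors.shape)                                    # torch.Size([3, 10])
# The output is exactly the selected rows of the weight matrix.
print(torch.equal(vectors, embedding.weight[indices]))  # True
```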

### nn.LogSoftmax

In [21]:
torch.nn.LogSoftmax

torch.nn.modules.activation.LogSoftmax
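
The cell above only shows the class. As a small usage sketch (assuming PyTorch 0.4 or later, with made-up scores), `LogSoftmax` returns log-probabilities along the chosen dimension, so exponentiating them gives values that sum to 1:

```python
import torch

log_softmax = torch.nn.LogSoftmax(dim=1)

# Two rows of unnormalised scores (illustrative values).
scores = torch.tensor([[1.0, 2.0, 3.0],
                       [0.0, 0.0, 0.0]])

log_probs = log_softmax(scores)
probs = log_probs.exp()

# Each row of probabilities sums to 1.
print(probs.sum(dim=1))
```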

### nn.Dropout

In [22]:
torch.nn.Dropout

torch.nn.modules.dropout.Dropout
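
The cell above only shows the class. A brief usage sketch, assuming PyTorch 0.4 or later: in training mode `Dropout` zeroes each element with probability `p` and rescales the survivors by `1 / (1 - p)`, while in evaluation mode it is the identity.

```python
import torch

torch.manual_seed(0)
dropout = torch.nn.Dropout(p=0.5)
x = torch.ones(1000)

dropout.train()
y_train = dropout(x)   # each element is either 0 or 1 / (1 - 0.5) = 2

dropout.eval()
y_eval = dropout(x)    # dropout is disabled in eval mode

print(sorted(y_train.unique().tolist()))
print(torch.equal(y_eval, x))
```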

### nn.ReLU

In [23]:
torch.nn.ReLU

torch.nn.modules.activation.ReLU
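
The cell above only shows the class. As a usage sketch (assuming PyTorch 0.4 or later, with made-up inputs), `ReLU` applies `max(0, x)` element-wise:

```python
import torch

relu = torch.nn.ReLU()
x = torch.tensor([-2.0, -0.5, 0.0, 1.5])

# Negative entries are clamped to zero; non-negative entries pass through.
y = relu(x)
print(y.tolist())  # [0.0, 0.0, 0.0, 1.5]
```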

### nn.GRU


In [106]:
np.random.seed(1234)
gru = torch.nn.GRU(6, 256)
sample = torch.autograd.Variable(torch.Tensor(np.random.rand(6).reshape(1,1,6)))

In [107]:
sample

Variable containing:
(0 ,.,.) = 
  0.1915  0.6221  0.4377  0.7854  0.7800  0.2726
[torch.FloatTensor of size 1x1x6]


In [108]:
type(gru.forward(sample)), len(gru.forward(sample))

(<class 'tuple'>, 2)

In [109]:
a,b = gru.forward(sample)

In [110]:
a.size()

torch.Size([1, 1, 256])

In [111]:
b.size()

torch.Size([1, 1, 256])

In [112]:
import pprint  # not imported earlier in the notebook
pprint.pprint(list(gru.state_dict().keys()))

['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0']
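
With a sequence of length 1, the two returned values `a` and `b` have the same shape, which hides their different roles. The sketch below uses a longer, randomly generated sequence (an illustrative choice) and assumes PyTorch 0.4 or later. The first value holds the hidden state at every time step, the second holds only the final hidden state.

```python
import torch

torch.manual_seed(0)
input_size, hidden_size = 6, 256
gru = torch.nn.GRU(input_size, hidden_size)

x = torch.randn(4, 1, input_size)   # (seq_len=4, batch=1, input_size)
output, h_n = gru(x)

print(output.shape)  # torch.Size([4, 1, 256]): one hidden state per time step
print(h_n.shape)     # torch.Size([1, 1, 256]): final hidden state only

# For a single-layer, unidirectional GRU the last output row is the final hidden state.
print(torch.allclose(output[-1], h_n[0]))

# The weights of the three gates are stacked, hence 3 * hidden_size rows.
print(gru.weight_ih_l0.shape)  # torch.Size([768, 6])
print(gru.weight_hh_l0.shape)  # torch.Size([768, 256])
```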


### nn.GRU bidirectional

If we want a representation that takes into account the sequence read both from left to right and from right to left, we can pass the `bidirectional=True` argument. This doubles the number of parameters in our `gru` network: internally it creates two GRUs, one per direction, and the forward pass returns the concatenation of their outputs.


In [113]:
np.random.seed(1234)
gru = torch.nn.GRU(6, 256, bidirectional=True)
sample = torch.autograd.Variable(torch.Tensor(np.random.rand(6).reshape(1,1,6)))

In [114]:
sample

Variable containing:
(0 ,.,.) = 
  0.1915  0.6221  0.4377  0.7854  0.7800  0.2726
[torch.FloatTensor of size 1x1x6]


In [115]:
type(gru.forward(sample)), len(gru.forward(sample))

(<class 'tuple'>, 2)

In [116]:
a,b = gru.forward(sample)

In [117]:
# Notice that the output has 512 features per time step instead of 256.
# This is because it is the concatenation of two vectors:
# one from the left-to-right GRU and the other from the right-to-left GRU.
a.size()

torch.Size([1, 1, 512])

In [118]:
b.size()

torch.Size([2, 1, 256])

In [119]:
pprint.pprint(list(gru.state_dict().keys()))

['weight_ih_l0',
 'weight_hh_l0',
 'bias_ih_l0',
 'bias_hh_l0',
 'weight_ih_l0_reverse',
 'weight_hh_l0_reverse',
 'bias_ih_l0_reverse',
 'bias_hh_l0_reverse']
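
The concatenation and the doubling of parameters can be checked directly. This is a sketch on a randomly generated sequence of length 4 (an illustrative choice), assuming PyTorch 0.4 or later. The names `forward_out`, `backward_out`, `n_uni` and `n_bi` are introduced here for illustration.

```python
import torch

torch.manual_seed(0)
hidden_size = 256
gru = torch.nn.GRU(6, hidden_size, bidirectional=True)

x = torch.randn(4, 1, 6)            # (seq_len=4, batch=1, input_size=6)
output, h_n = gru(x)                # output: (4, 1, 512), h_n: (2, 1, 256)

# The last dimension of `output` is [forward features | backward features].
forward_out = output[:, :, :hidden_size]
backward_out = output[:, :, hidden_size:]

# The forward GRU finishes at the last time step, the backward GRU at the first.
print(torch.allclose(forward_out[-1], h_n[0]))
print(torch.allclose(backward_out[0], h_n[1]))

# Parameter count compared with a unidirectional GRU of the same size.
uni = torch.nn.GRU(6, hidden_size)
n_uni = sum(p.numel() for p in uni.parameters())
n_bi = sum(p.numel() for p in gru.parameters())
print(n_bi == 2 * n_uni)
```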


### stacked nn.GRU 

In [51]:
rnn = torch.nn.GRU(input_size = 10, hidden_size=20, num_layers=2)
rnn.input_size, rnn.hidden_size, rnn.num_layers

(10, 20, 2)

In [43]:
input = torch.autograd.Variable(torch.randn(5, 3, 10))
h0 = torch.autograd.Variable(torch.randn(2, 3, 20))
output, hn = rnn(input, h0)
output.size(), hn.size()

(torch.Size([5, 3, 20]), torch.Size([2, 3, 20]))
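
In a stacked GRU, `output` contains the hidden states of the top layer only, while `hn` contains the final hidden state of every layer. The following sketch checks this, assuming PyTorch 0.4 or later. It uses a zero initial state and the name `inputs` (to avoid shadowing the built-in `input`), both of which are illustrative choices.

```python
import torch

torch.manual_seed(0)
rnn = torch.nn.GRU(input_size=10, hidden_size=20, num_layers=2)

inputs = torch.randn(5, 3, 10)   # (seq_len=5, batch=3, input_size=10)
h0 = torch.zeros(2, 3, 20)       # (num_layers=2, batch=3, hidden_size=20)
output, hn = rnn(inputs, h0)

# The last time step of `output` equals the final hidden state of the top layer.
print(torch.allclose(output[-1], hn[-1]))

# Layer 0 reads the input (10 features); layer 1 reads layer 0's hidden state (20 features).
print(rnn.weight_ih_l0.shape)  # torch.Size([60, 10])
print(rnn.weight_ih_l1.shape)  # torch.Size([60, 20])
```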

### batched matrix multiplication: torch.bmm

In [33]:
batch1 = torch.randn(10, 3, 4)
batch2 = torch.randn(10, 4, 5)
res = torch.bmm(batch1, batch2)
res.size()

torch.Size([10, 3, 5])

In [34]:
attn_weights = torch.randn([1, 1, 10])
encoder_outputs = torch.randn([1, 10, 256])
torch.bmm(attn_weights, encoder_outputs).size()

torch.Size([1, 1, 256])
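
`torch.bmm` multiplies the matrices of two batches pairwise. The sketch below compares it with an explicit loop, then redoes the second cell as a weighted sum over encoder outputs. Normalising `attn_weights` with a softmax is an assumption added here for illustration; the notebook uses raw random weights.

```python
import torch

torch.manual_seed(0)
batch1 = torch.randn(10, 3, 4)
batch2 = torch.randn(10, 4, 5)

res = torch.bmm(batch1, batch2)                      # (10, 3, 5)
# Same result computed one batch element at a time.
looped = torch.stack([batch1[i] @ batch2[i] for i in range(10)])
print(torch.allclose(res, looped, atol=1e-5))

# Attention-style use: a (1, 1, 10) row of weights times (1, 10, 256) encoder outputs.
attn_weights = torch.softmax(torch.randn(1, 1, 10), dim=-1)  # softmax is an assumption
encoder_outputs = torch.randn(1, 10, 256)
context = torch.bmm(attn_weights, encoder_outputs)   # (1, 1, 256)

# Equivalent weighted sum of the 10 encoder output vectors.
weighted_sum = (attn_weights[0, 0].unsqueeze(1) * encoder_outputs[0]).sum(dim=0)
print(torch.allclose(context[0, 0], weighted_sum, atol=1e-5))
```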