# PyTorch neural net layers



Inside `torch.nn` there are several layer objects already built into PyTorch. We can use them as building blocks for more complex neural networks.

- torch.nn.Linear
- torch.nn.Embedding
- torch.nn.LogSoftmax
- torch.nn.Dropout
- torch.nn.ReLU
- torch.nn.GRU


**Note on `Variable`: in PyTorch versions before 0.4, the input passed to these layers had to be wrapped in `torch.autograd.Variable`:**
```
import torch

n_input = 3
n_output = 2
sample = torch.autograd.Variable(torch.Tensor([5.,4.,3.]))
linear_layer = torch.nn.Linear(n_input, n_output)
linear_layer(sample)  # calling the module runs forward() plus any registered hooks
```


**Since PyTorch 0.4 (including the 1.1.0 version used in this notebook), `Variable` has been merged into `Tensor` and is deprecated, so a plain `torch.Tensor` can be forward propagated directly:**

```
n_input = 3
n_output = 2
sample = torch.Tensor([5.,4.,3.])
linear_layer = torch.nn.Linear(n_input, n_output)
linear_layer(sample)  # works: no Variable wrapper is needed
```
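In current PyTorch versions, gradient tracking (the role `torch.autograd.Variable` used to play) is switched on with the `requires_grad` flag of an ordinary tensor. A minimal sketch with a freshly initialised `Linear(3, 2)` layer, so the printed numbers are illustrative only:

```python
import torch

torch.manual_seed(0)
linear_layer = torch.nn.Linear(3, 2)

# A plain tensor that asks autograd to record the operations applied to it
sample = torch.tensor([5., 4., 3.], requires_grad=True)

out = linear_layer(sample)   # W @ sample + b, shape (2,)
loss = out.sum()             # backward() needs a scalar
loss.backward()

# d(loss)/d(sample) is W summed over its output dimension (dim 0);
# d(loss)/d(W) repeats the input once per output unit
print(sample.grad)
print(linear_layer.weight.grad)
```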

## About recurrent neural networks

- `RNNCell` does the forward pass for a single time step of a sequence (especially useful if you want to do "custom" operations at every time step).
- `RNN` applies that same single-step computation to every time step of an input sequence, looping over the sequence internally (just like a traditional RNN). The same pairing exists for `GRUCell`/`GRU` and `LSTMCell`/`LSTM`.
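To make the difference concrete, here is a sketch that copies the parameters of a one-layer `torch.nn.RNN` into a `torch.nn.RNNCell` and writes the time loop by hand. The sizes are arbitrary choices for illustration; both modules should produce the same hidden states up to floating-point error:

```python
import torch

torch.manual_seed(0)
seq_len, batch, input_size, hidden_size = 5, 2, 4, 3

rnn = torch.nn.RNN(input_size, hidden_size)        # processes a whole sequence
cell = torch.nn.RNNCell(input_size, hidden_size)   # processes one time step

# Share the parameters so both modules compute the same function
with torch.no_grad():
    cell.weight_ih.copy_(rnn.weight_ih_l0)
    cell.weight_hh.copy_(rnn.weight_hh_l0)
    cell.bias_ih.copy_(rnn.bias_ih_l0)
    cell.bias_hh.copy_(rnn.bias_hh_l0)

x = torch.randn(seq_len, batch, input_size)   # (seq_len, batch, features)

# nn.RNN: a single call covers every time step
rnn_out, rnn_h = rnn(x)

# nn.RNNCell: we write the loop over time ourselves
h = torch.zeros(batch, hidden_size)
cell_outputs = []
for t in range(seq_len):
    h = cell(x[t], h)          # custom per-step logic could be added here
    cell_outputs.append(h)
cell_out = torch.stack(cell_outputs)   # (seq_len, batch, hidden_size)

print(torch.allclose(rnn_out, cell_out, atol=1e-6))
```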



In [56]:
import torch
import numpy as np
print("\nNotebook done in pytorch version: ", torch.__version__)


Notebook done in pytorch version:  1.1.0


In [3]:
x = torch.Tensor(np.random.rand(3,1))

### Linear layer in numpy

In [4]:
np.random.seed(1234)
W = np.random.rand(2, 3)
x = np.random.rand(3, 1)
W.shape, x.shape

((2, 3), (3, 1))

In [5]:
W

array([[0.19151945, 0.62210877, 0.43772774],
       [0.78535858, 0.77997581, 0.27259261]])

In [6]:
x

array([[0.27646426],
       [0.80187218],
       [0.95813935]])

In [7]:
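# W * x does not work: `*` is element-wise multiplication with broadcasting, and shapes (2, 3) and (3, 1) cannot be broadcast together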

In [8]:
np.matmul(W, x)

array([[0.97120417],
       [1.10374618]])

In [9]:
W @ x

array([[0.97120417],
       [1.10374618]])
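The sketch below reuses the same seed and shapes to show why `W * x` fails and how broadcasting can still recover the matrix product, y_i = sum_j W_ij x_j, when the vector is transposed first:

```python
import numpy as np

np.random.seed(1234)
W = np.random.rand(2, 3)
x = np.random.rand(3, 1)

# `*` multiplies element-wise after broadcasting; (2, 3) and (3, 1) are incompatible
try:
    W * x
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# Broadcasting against the transposed vector (1, 3) gives W_ij * x_j,
# and summing over j recovers the matrix product
y_manual = (W * x.T).sum(axis=1, keepdims=True)

print(broadcast_failed)               # True
print(np.allclose(y_manual, W @ x))   # True
```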

### nn.Linear

In [10]:
n_input = 3
n_output = 2

torch.manual_seed(1000)
sample = torch.autograd.Variable(torch.Tensor([5.,4.,3.]))

linear_layer = torch.nn.Linear(n_input, n_output)
linear_layer

Linear(in_features=3, out_features=2, bias=True)

In [11]:
linear_layer.weight 

Parameter containing:
tensor([[-0.2091,  0.1311, -0.0672],
        [-0.2794, -0.2628,  0.1456]], requires_grad=True)

In [12]:
linear_layer.bias

Parameter containing:
tensor([-0.0681, -0.1556], requires_grad=True)

In [13]:
linear_layer.forward(sample)

tensor([-0.7907, -2.1668], grad_fn=<AddBackward0>)

In [14]:
# We can retrieve the weights and biases from the network to numpy as follows:
W_np = linear_layer.weight.data.numpy()
b_np = linear_layer.bias.data.numpy()
x_np = sample.data.numpy()
W_np @ x_np + b_np

array([-0.79068434, -2.166812  ], dtype=float32)

In [15]:
linear_layer.state_dict().keys()

odict_keys(['weight', 'bias'])
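The `state_dict` is what gets saved and restored. A minimal sketch (two freshly created layers, so the actual values are illustrative) that copies one `Linear` layer's parameters into another and checks that both now compute the same output:

```python
import torch

torch.manual_seed(1000)
layer_a = torch.nn.Linear(3, 2)
layer_b = torch.nn.Linear(3, 2)   # different random initialisation

# weight is stored as (out_features, in_features), bias as (out_features,)
print(layer_a.state_dict()['weight'].shape)   # torch.Size([2, 3])
print(layer_a.state_dict()['bias'].shape)     # torch.Size([2])

# Copy 'weight' and 'bias' from layer_a into layer_b
layer_b.load_state_dict(layer_a.state_dict())

sample = torch.tensor([5., 4., 3.])
print(torch.equal(layer_a(sample), layer_b(sample)))   # True
```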

### nn.Embedding

In [16]:
n_input = 5000
n_output = 10
embedding = torch.nn.Embedding(n_input, n_output)
embedding

Embedding(5000, 10)

In [17]:
sample = torch.autograd.Variable(torch.LongTensor([506]))

In [18]:
embedding.forward(sample)

tensor([[ 0.0018,  1.7398,  0.0347, -0.1383, -0.0893, -0.4650, -0.1623,  0.1137,
         -0.3421, -0.2518]], grad_fn=<EmbeddingBackward>)

In [19]:
embedding.weight

Parameter containing:
tensor([[-0.3879,  1.2894, -0.9362,  ...,  0.2743, -0.8496,  0.3947],
        [ 0.0848,  0.1864,  0.0859,  ...,  1.0726,  1.0481,  1.0527],
        [-0.6424, -1.2234, -1.0794,  ..., -0.0482,  0.6610, -0.8908],
        ...,
        [-0.4186,  0.0305, -0.7265,  ...,  0.0622, -0.1281,  0.8795],
        [ 0.2722, -0.7068,  0.7342,  ...,  0.8290, -0.4435, -0.0754],
        [-0.4442,  0.8973, -1.2622,  ..., -1.2709, -1.1286,  0.7347]],
       requires_grad=True)

In [20]:
embedding.state_dict().keys()

odict_keys(['weight'])
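An `Embedding` layer is a lookup table: the forward pass returns the rows of `weight` selected by the integer indices, which is equivalent to (but much cheaper than) multiplying one-hot vectors by the weight matrix. A sketch with arbitrary example indices:

```python
import torch

torch.manual_seed(0)
n_input, n_output = 5000, 10
embedding = torch.nn.Embedding(n_input, n_output)

indices = torch.tensor([506, 3, 506])   # token ids (integer tensor)
vectors = embedding(indices)            # shape (3, 10)

# 1) Row lookup into the weight matrix
same_as_rows = torch.equal(vectors, embedding.weight[indices])

# 2) One-hot vectors times the weight matrix give the same rows
one_hot = torch.nn.functional.one_hot(indices, num_classes=n_input).float()
same_as_matmul = torch.allclose(vectors, one_hot @ embedding.weight)

print(same_as_rows, same_as_matmul)
```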

### nn.LogSoftmax

In [21]:
torch.nn.LogSoftmax

torch.nn.modules.activation.LogSoftmax
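`LogSoftmax` turns unnormalised scores into log-probabilities, log_softmax(x)_i = x_i - log(sum_j exp(x_j)). A sketch on random scores for a hypothetical 5-class problem; passing `dim` explicitly avoids relying on PyTorch's deprecated implicit choice. It also checks the common pairing `LogSoftmax` + `NLLLoss`, which matches `CrossEntropyLoss` applied to the raw scores:

```python
import torch

torch.manual_seed(0)
log_softmax = torch.nn.LogSoftmax(dim=1)   # normalise over the class dimension

scores = torch.randn(4, 5)                 # (batch, n_classes) unnormalised scores
targets = torch.tensor([0, 3, 1, 4])       # hypothetical correct class per row
log_probs = log_softmax(scores)

# Definition written out with logsumexp
manual = scores - torch.logsumexp(scores, dim=1, keepdim=True)

# LogSoftmax followed by NLLLoss equals CrossEntropyLoss on the raw scores
nll = torch.nn.NLLLoss()(log_probs, targets)
ce = torch.nn.CrossEntropyLoss()(scores, targets)

print(log_probs.exp().sum(dim=1))          # each row sums to 1
```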

### nn.Dropout

In [22]:
torch.nn.Dropout

torch.nn.modules.dropout.Dropout
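`Dropout` behaves differently in training and evaluation mode. A sketch with `p=0.5` on a vector of ones: in training mode each entry is zeroed with probability `p` and the survivors are scaled by 1/(1-p) = 2, while in evaluation mode the input passes through unchanged:

```python
import torch

torch.manual_seed(0)
dropout = torch.nn.Dropout(p=0.5)
x = torch.ones(1000)

dropout.train()        # training mode (the default for a new module)
y_train = dropout(x)   # roughly half the entries are 0, the rest are 2

dropout.eval()         # evaluation mode: dropout is the identity
y_eval = dropout(x)

print(sorted(set(y_train.tolist())))   # [0.0, 2.0]
print(torch.equal(y_eval, x))          # True
```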

### nn.ReLU

In [23]:
torch.nn.ReLU

torch.nn.modules.activation.ReLU
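`ReLU` computes max(0, x) element-wise. A short sketch on a hand-picked vector, also checking that the functional form and `clamp` give the same result:

```python
import torch

relu = torch.nn.ReLU()
x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])

y = relu(x)   # negatives become 0, non-negative values pass through
print(y)

# Same result with the functional form and with clamp
same_functional = torch.equal(y, torch.nn.functional.relu(x))
same_clamp = torch.equal(y, x.clamp(min=0))
```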

### nn.GRU


In [24]:
np.random.seed(1234)
gru = torch.nn.GRU(6, 256)
sample = torch.autograd.Variable(torch.Tensor(np.random.rand(6).reshape(1,1,6)))

In [25]:
sample

tensor([[[0.1915, 0.6221, 0.4377, 0.7854, 0.7800, 0.2726]]])

In [26]:
type(gru.forward(sample)), len(gru.forward(sample))

(tuple, 2)

In [27]:
a, b = gru.forward(sample)  # a: output at every time step, b: final hidden state

In [28]:
a.size()

torch.Size([1, 1, 256])

In [29]:
b.size()

torch.Size([1, 1, 256])

In [35]:
import pprint
pprint.pprint(list(gru.state_dict().keys()))

['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0']
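A sketch of what these tensors contain for a fresh `GRU(6, 256)`, using an arbitrary 4-step input chosen for illustration. The reset, update and new-gate weights are stacked along the first dimension, hence the factor 3 * 256 = 768:

```python
import torch

torch.manual_seed(0)
input_size, hidden_size = 6, 256
gru = torch.nn.GRU(input_size, hidden_size)   # one layer, one direction

print(gru.weight_ih_l0.shape)   # torch.Size([768, 6])
print(gru.weight_hh_l0.shape)   # torch.Size([768, 256])

x = torch.randn(4, 1, input_size)   # (seq_len=4, batch=1, input_size)
output, h_n = gru(x)
print(output.shape)   # torch.Size([4, 1, 256]): hidden state at every time step
print(h_n.shape)      # torch.Size([1, 1, 256]): hidden state after the last step

# For a single-layer, unidirectional GRU, h_n is the last time step of output
print(torch.allclose(output[-1], h_n[0]))
```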


### nn.GRU bidirectional

If we want to generate a representation that takes into account a sequence read both from left to right and from right to left, we can use the `bidirectional=True` argument. This doubles the number of parameters of our `gru` network: it effectively creates two GRUs, one reading the sequence forwards and one reading it backwards, and at every time step the forward pass returns the concatenation of the outputs of both GRUs.


In [36]:
np.random.seed(1234)
gru = torch.nn.GRU(6, 256, bidirectional=True)
sample = torch.autograd.Variable(torch.Tensor(np.random.rand(6).reshape(1,1,6)))

In [37]:
sample

tensor([[[0.1915, 0.6221, 0.4377, 0.7854, 0.7800, 0.2726]]])

In [38]:
type(gru.forward(sample)), len(gru.forward(sample))

(tuple, 2)

In [39]:
a,b = gru.forward(sample)

In [40]:
# Notice that the forward pass returns a 512-dimensional vector instead of 256.
# This is the concatenation of two 256-dimensional vectors:
# one from the "left_to_right" GRU and the other from the "right_to_left" GRU.
a.size()

torch.Size([1, 1, 512])

In [41]:
b.size()  # final hidden state of each direction, stacked: (num_directions, batch, hidden_size)

torch.Size([2, 1, 256])

In [42]:
pprint.pprint(list(gru.state_dict().keys()))

['weight_ih_l0',
 'weight_hh_l0',
 'bias_ih_l0',
 'bias_hh_l0',
 'weight_ih_l0_reverse',
 'weight_hh_l0_reverse',
 'bias_ih_l0_reverse',
 'bias_hh_l0_reverse']
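A sketch (fresh modules, random 4-step input, sizes matching the cells above) that checks the two claims made here: the bidirectional GRU has exactly twice the parameters, and the two halves of `output` line up with the two entries of `h_n`. The forward direction finishes at the last time step, while the backward direction finishes at the first:

```python
import torch

torch.manual_seed(0)
H = 256
uni = torch.nn.GRU(6, H)
bi = torch.nn.GRU(6, H, bidirectional=True)

n_uni = sum(p.numel() for p in uni.parameters())
n_bi = sum(p.numel() for p in bi.parameters())
print(n_bi == 2 * n_uni)   # True: one full set of weights per direction

x = torch.randn(4, 1, 6)   # (seq_len=4, batch=1, input_size=6)
output, h_n = bi(x)        # output: (4, 1, 2*H), h_n: (2, 1, H)

forward_out = output[:, :, :H]    # left-to-right direction
backward_out = output[:, :, H:]   # right-to-left direction

print(torch.allclose(h_n[0], forward_out[-1]))   # forward ends at the last step
print(torch.allclose(h_n[1], backward_out[0]))   # backward ends at the first step
```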


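### Checking bidirectional GRU

The cells below check that the right-to-left half of a bidirectional GRU is an ordinary GRU, using the `*_reverse` weights, run on the reversed sequence: its outputs appear in reverse order in `bi_output[:, 0, 1]`, and its final hidden state is the second entry of `bi_hidden`.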

In [43]:
from torch.autograd import Variable

torch.manual_seed(1234)
random_input = Variable(torch.FloatTensor(5, 1, 1).normal_(), requires_grad=False)
random_input[:, 0, 0]

tensor([ 0.0461,  0.4024, -1.0115,  0.2167, -0.6123])

In [44]:
bi_grus = torch.nn.GRU(input_size=1, hidden_size=1, num_layers=1, batch_first=False, bidirectional=True)

In [45]:
reverse_gru = torch.nn.GRU(input_size=1, hidden_size=1, num_layers=1, batch_first=False, bidirectional=False)
reverse_gru.weight_ih_l0 = bi_grus.weight_ih_l0_reverse
reverse_gru.weight_hh_l0 = bi_grus.weight_hh_l0_reverse
reverse_gru.bias_ih_l0 = bi_grus.bias_ih_l0_reverse
reverse_gru.bias_hh_l0 = bi_grus.bias_hh_l0_reverse

In [46]:
bi_output, bi_hidden = bi_grus(random_input)

In [47]:
reverse_output, reverse_hidden = reverse_gru(random_input[np.arange(4, -1, -1), :, :])

In [48]:
reverse_output[:, 0, 0]

tensor([0.4095, 0.4667, 0.5444, 0.5134, 0.5124], grad_fn=<SelectBackward>)

In [49]:
bi_output[:, 0, 1]

tensor([0.5124, 0.5134, 0.5444, 0.4667, 0.4095], grad_fn=<SelectBackward>)

In [50]:
reverse_hidden

tensor([[[0.5124]]], grad_fn=<StackBackward>)

In [51]:
bi_hidden

tensor([[[0.4491]],

        [[0.5124]]], grad_fn=<StackBackward>)

### stacked nn.GRU 

In [52]:
rnn = torch.nn.GRU(input_size = 10, hidden_size=20, num_layers=2)
rnn.input_size, rnn.hidden_size, rnn.num_layers

(10, 20, 2)

In [53]:
input = torch.autograd.Variable(torch.randn(5, 3, 10))
h0 = torch.autograd.Variable(torch.randn(2, 3, 20))
output, hn = rnn(input, h0)
output.size(), hn.size()

(torch.Size([5, 3, 20]), torch.Size([2, 3, 20]))
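In a stacked GRU, layer 0 reads the input and each following layer reads the hidden states of the layer below. `output` holds only the top layer's states, while `hn` holds the final state of every layer. A sketch with the same sizes as above (starting from a zero `h0` instead of a random one):

```python
import torch

torch.manual_seed(0)
rnn = torch.nn.GRU(input_size=10, hidden_size=20, num_layers=2)

# Layer 0 reads the 10-dim input; layer 1 reads layer 0's 20-dim hidden states
print(rnn.weight_ih_l0.shape)   # torch.Size([60, 10])
print(rnn.weight_ih_l1.shape)   # torch.Size([60, 20])

x = torch.randn(5, 3, 10)    # (seq_len, batch, input_size)
h0 = torch.zeros(2, 3, 20)   # (num_layers, batch, hidden_size)
output, hn = rnn(x, h0)

# The top layer's last time step is the last entry of hn
print(torch.allclose(output[-1], hn[-1]))
```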

### Batched matrix multiplication: torch.bmm

In [54]:
batch1 = torch.randn(10, 3, 4)
batch2 = torch.randn(10, 4, 5)
res = torch.bmm(batch1, batch2)
res.size()

torch.Size([10, 3, 5])

In [55]:
attn_weights = torch.randn([1, 1, 10])
encoder_outputs = torch.randn([1, 10, 256])
torch.bmm(attn_weights, encoder_outputs).size()

torch.Size([1, 1, 256])
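`torch.bmm` performs one ordinary matrix product per batch element, which can be checked against an explicit loop and against `torch.einsum`. The attention part below assumes, as is typical in attention mechanisms though not shown in the cell above, that the weights are first normalised with softmax; the result is then a weighted average of the encoder outputs:

```python
import torch

torch.manual_seed(0)
batch1 = torch.randn(10, 3, 4)
batch2 = torch.randn(10, 4, 5)
res = torch.bmm(batch1, batch2)   # (10, 3, 5)

# bmm = one matrix product per batch element
looped = torch.stack([batch1[i] @ batch2[i] for i in range(batch1.size(0))])

# The same contraction written with einsum
via_einsum = torch.einsum('bij,bjk->bik', batch1, batch2)

# Attention-style use (assumption: softmax-normalised weights)
attn_weights = torch.softmax(torch.randn(1, 1, 10), dim=-1)   # (batch, 1, seq_len)
encoder_outputs = torch.randn(1, 10, 256)                     # (batch, seq_len, hidden)
context = torch.bmm(attn_weights, encoder_outputs)            # (batch, 1, hidden)

# Equivalent weighted sum over the 10 encoder positions
weighted_sum = (attn_weights[0, 0].unsqueeze(1) * encoder_outputs[0]).sum(dim=0)
print(context.shape)
```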