# PyTorch neural net layers



`torch.nn` provides several layer objects that are already built into `torch`. We can combine them to build more complex neural networks.

- `torch.nn.Linear`
- `torch.nn.Embedding`
- `torch.nn.LogSoftmax`
- `torch.nn.Dropout`
- `torch.nn.ReLU`
- `torch.nn.GRU`


**Note that in PyTorch releases before 0.4 (which the printed outputs in this notebook suggest was used), the `forward` method already implemented in these layers only accepts input wrapped in a `torch.autograd.Variable`:**
```
import torch

n_input = 3
n_output = 2
# Wrap the input tensor in a Variable (required before PyTorch 0.4)
sample = torch.autograd.Variable(torch.Tensor([5.,4.,3.]))
linear_layer = torch.nn.Linear(n_input, n_output)
linear_layer.forward(sample)
```


**Therefore, in PyTorch releases before 0.4, a sample defined as a plain `torch.Tensor` cannot be forward propagated (from 0.4 on, `Tensor` and `Variable` are merged and this call works):**

```
n_input = 3
n_output = 2
sample = torch.Tensor([5.,4.,3.])  # plain tensor, not wrapped in a Variable
linear_layer = torch.nn.Linear(n_input, n_output)
linear_layer.forward(sample)  # fails before PyTorch 0.4
```
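
For comparison, here is a minimal sketch of the same forward pass, assuming PyTorch 0.4 or later, where a plain tensor can be passed to the layer directly. It calls the module itself (`linear_layer(sample)`), which is the usual idiom and runs `forward` internally.

```python
import torch

n_input = 3
n_output = 2

# Assumes PyTorch >= 0.4: a plain tensor is accepted, no Variable wrapper needed.
sample = torch.tensor([5., 4., 3.])
linear_layer = torch.nn.Linear(n_input, n_output)

# Calling the module invokes forward() (plus any registered hooks).
out = linear_layer(sample)
print(out.shape)
```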

## About recurrent neural networks

- `RNNCell` does the forward pass for a single time step of a sequence (especially useful if you want to do "custom" operations at every time step).
- `RNN` applies the `RNNCell` forward pass to every time step of an input sequence (just like a traditional RNN).
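
The relationship between the two can be sketched as follows. This example assumes a recent PyTorch release (plain tensors as inputs); the sizes are arbitrary illustration values. The parameters of an `RNN` are copied into an `RNNCell`, the cell is looped over the time steps by hand, and the result is compared with the `RNN` output.

```python
import torch

torch.manual_seed(0)
seq_len, batch, n_in, n_hidden = 4, 2, 3, 5   # illustrative sizes

rnn = torch.nn.RNN(n_in, n_hidden)        # processes a whole sequence
cell = torch.nn.RNNCell(n_in, n_hidden)   # processes one time step

# Copy the RNN's parameters into the cell so both compute the same function.
with torch.no_grad():
    cell.weight_ih.copy_(rnn.weight_ih_l0)
    cell.weight_hh.copy_(rnn.weight_hh_l0)
    cell.bias_ih.copy_(rnn.bias_ih_l0)
    cell.bias_hh.copy_(rnn.bias_hh_l0)

x = torch.randn(seq_len, batch, n_in)     # (seq_len, batch, input_size)
h = torch.zeros(batch, n_hidden)          # initial hidden state

# Manual loop: one RNNCell call per time step.
steps = []
for t in range(seq_len):
    h = cell(x[t], h)
    steps.append(h)
manual_out = torch.stack(steps)           # (seq_len, batch, hidden_size)

rnn_out, h_n = rnn(x)                     # h0 defaults to zeros
print(torch.allclose(manual_out, rnn_out, atol=1e-6))
```

The manual loop is the place where per-step "custom" operations would go, for example modifying `h` between steps.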



In [1]:
import torch
import numpy as np

In [2]:
x = torch.Tensor(np.random.rand(3,1))

### Linear layer in numpy

In [3]:
W = np.random.rand(2, 3)
x = np.random.rand(3, 1)
W.shape, x.shape

((2, 3), (3, 1))

In [4]:
W

array([[ 0.62161391,  0.10994547,  0.01822854],
       [ 0.26853693,  0.06991268,  0.87765238]])

In [5]:
x

array([[ 0.27355126],
       [ 0.26933617],
       [ 0.02976121]])

In [8]:
# W * x does not work: * is element-wise, and shapes (2, 3) and (3, 1) cannot be broadcast

In [9]:
np.matmul(W, x)

array([[ 0.20019806],
       [ 0.11840863]])
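
A minimal self-contained sketch of a linear layer in NumPy. It shows why the element-wise product fails and adds a bias term to the matrix product. The bias `b` and the random generator `rng` are assumptions introduced for illustration; they do not appear in the cells above.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((2, 3))   # weights: (n_output, n_input)
x = rng.random((3, 1))   # input column vector: (n_input, 1)
b = rng.random((2, 1))   # hypothetical bias, not part of the cells above

# Element-wise product fails: shapes (2, 3) and (3, 1) cannot be broadcast.
try:
    W * x
    elementwise_failed = False
except ValueError:
    elementwise_failed = True

# Matrix product plus bias gives the affine map y = W x + b.
y = W @ x + b
print(elementwise_failed, y.shape)
```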

### nn.Linear

In [10]:
n_input = 3
n_output = 2
torch.manual_seed(1000)
sample = torch.autograd.Variable(torch.Tensor([5.,4.,3.]))
torch.manual_seed(1000)
linear_layer = torch.nn.Linear(n_input, n_output)
linear_layer

Linear(in_features=3, out_features=2)

In [11]:
linear_layer.weight 

Parameter containing:
-0.2091  0.1311 -0.0672
-0.2794 -0.2628  0.1456
[torch.FloatTensor of size 2x3]

In [12]:
linear_layer.bias

Parameter containing:
-0.0681
-0.1556
[torch.FloatTensor of size 2]

In [13]:
linear_layer.forward(sample)

Variable containing:
-0.7907
-2.1668
[torch.FloatTensor of size 2]

In [14]:
# We can retrieve the weights and biases from the network to numpy as follows:
W_np = linear_layer.weight.data.numpy()
b_np = linear_layer.bias.data.numpy()
x_np = sample.data.numpy()
W_np @ x_np + b_np

array([-0.79068434, -2.16681194], dtype=float32)

In [15]:
linear_layer.state_dict().keys()

odict_keys(['weight', 'bias'])
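
The same layer also accepts a batch of samples. The sketch below assumes a recent PyTorch release (plain tensors as inputs) and checks the layer output against the hand-written formula `x @ W^T + b`; the batch size of 4 is an arbitrary choice.

```python
import torch

torch.manual_seed(0)
linear_layer = torch.nn.Linear(3, 2)

# A batch of 4 samples with 3 features each.
batch = torch.randn(4, 3)
out = linear_layer(batch)                      # shape (4, 2)

# Same computation by hand: nn.Linear computes x @ W^T + b.
manual = batch @ linear_layer.weight.t() + linear_layer.bias
print(out.shape, torch.allclose(out, manual, atol=1e-6))
```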

### nn.Embedding

In [16]:
n_input = 5000
n_output = 10
embedding = torch.nn.Embedding(n_input, n_output)
embedding

Embedding(5000, 10)

In [17]:
sample = torch.autograd.Variable(torch.LongTensor([506]))

In [18]:
embedding.forward(sample)

Variable containing:
 0.8614 -0.4794 -1.0838  1.6702  0.0033 -0.7552  1.8491 -0.2417  1.6373 -0.3765
[torch.FloatTensor of size 1x10]

In [19]:
embedding.weight

Parameter containing:
 9.2727e-01 -1.7421e+00 -7.6991e-01  ...   8.0227e-01  5.2690e-01  5.7296e-01
 1.3898e-01 -1.1762e+00  8.5172e-02  ...   5.1778e-01 -7.3832e-01 -3.9377e-01
 1.5146e+00  1.4999e+00  1.8176e-01  ...  -1.5661e+00 -8.5200e-01  1.2770e+00
                ...                   ⋱                   ...                
 1.6249e+00 -3.3196e-01  2.0770e+00  ...   4.3277e-01 -2.1889e+00  8.6387e-01
-8.9731e-01 -1.0062e+00 -2.8385e-01  ...   6.0996e-01  8.4379e-01 -4.9468e-01
 1.1973e+00 -8.2004e-01  1.5729e+00  ...   3.3910e-01 -1.8187e+00  5.7958e-01
[torch.FloatTensor of size 5000x10]

In [20]:
embedding.state_dict().keys()

odict_keys(['weight'])
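
An embedding layer is a lookup table: each integer index selects one row of the weight matrix. A minimal sketch, assuming a recent PyTorch release; the three ids are arbitrary illustration values.

```python
import torch

torch.manual_seed(0)
embedding = torch.nn.Embedding(5000, 10)       # 5000 ids, 10-dimensional vectors

# Indices must be integer (long) tensors; here a batch of three ids.
ids = torch.tensor([506, 0, 4999])
vectors = embedding(ids)                       # shape (3, 10)

# The lookup returns the corresponding rows of the weight matrix.
same = torch.equal(vectors, embedding.weight[ids])
print(vectors.shape, same)
```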

### nn.LogSoftmax

In [21]:
torch.nn.LogSoftmax

torch.nn.modules.activation.LogSoftmax
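
The cell above only shows the class. As a hedged usage sketch (assuming a recent PyTorch release), `LogSoftmax` can be applied along a chosen dimension and checked against its definition `x - log(sum(exp(x)))`; the scores are made-up values.

```python
import torch

log_softmax = torch.nn.LogSoftmax(dim=1)       # normalize across each row

scores = torch.tensor([[1.0, 2.0, 3.0],
                       [0.0, 0.0, 0.0]])
log_probs = log_softmax(scores)

# Exponentiating recovers probabilities that sum to 1 per row.
row_sums = log_probs.exp().sum(dim=1)

# Equivalent definition: x - log(sum(exp(x))).
manual = scores - torch.logsumexp(scores, dim=1, keepdim=True)
print(row_sums, torch.allclose(log_probs, manual))
```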

### nn.Dropout

In [22]:
torch.nn.Dropout

torch.nn.modules.dropout.Dropout
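
The cell above only shows the class. A minimal usage sketch, assuming a recent PyTorch release: in training mode `Dropout` zeroes elements at random and rescales the survivors, while in evaluation mode it passes the input through unchanged. The value `p=0.5` and the input of ones are illustration choices.

```python
import torch

torch.manual_seed(0)
dropout = torch.nn.Dropout(p=0.5)
x = torch.ones(1000)

dropout.train()                 # training mode: elements are zeroed at random
y_train = dropout(x)
# Survivors are scaled by 1 / (1 - p) so the expected value is unchanged.
kept = y_train[y_train != 0]

dropout.eval()                  # evaluation mode: dropout is the identity
y_eval = dropout(x)

print(kept.unique(), torch.equal(y_eval, x))
```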

### nn.ReLU

In [23]:
torch.nn.ReLU

torch.nn.modules.activation.ReLU
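
The cell above only shows the class. A short usage sketch, assuming a recent PyTorch release and made-up input values: `ReLU` replaces negative entries with zero and leaves the rest untouched.

```python
import torch

relu = torch.nn.ReLU()
x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

# Negative entries become 0; non-negative entries pass through unchanged.
y = relu(x)
print(y)
```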

### nn.GRU


In [52]:
gru = torch.nn.GRU(6, 256)
sample = torch.autograd.Variable(torch.Tensor(np.random.rand(6).reshape(1,1,6)))

In [53]:
sample

Variable containing:
(0 ,.,.) = 
  0.3458  0.6440  0.2251  0.9013  0.4930  0.8462
[torch.FloatTensor of size 1x1x6]

In [54]:
type(gru.forward(sample)), len(gru.forward(sample))

(tuple, 2)

In [55]:
a,b = gru.forward(sample)

In [56]:
a.size()

torch.Size([1, 1, 256])

In [60]:
b.size()

torch.Size([1, 1, 256])

In [58]:
gru.state_dict().keys()

odict_keys(['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0'])
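
To make the four parameter names more concrete, the sketch below (assuming a recent PyTorch release) prints their shapes. Each tensor stacks the three GRU gates along the first dimension, so that dimension is `3 * hidden_size`. It also checks that for a one-step sequence the two returned tensors hold the same values, which is why `a` and `b` above have identical sizes.

```python
import torch

torch.manual_seed(0)
gru = torch.nn.GRU(6, 256)      # input_size=6, hidden_size=256, one layer

# Each parameter stacks the three gates (reset, update, new) along dim 0.
shapes = {name: tuple(p.shape) for name, p in gru.state_dict().items()}
for name, shape in shapes.items():
    print(name, shape)

# One time step, batch of one: output and final hidden state coincide.
sample = torch.rand(1, 1, 6)    # (seq_len, batch, input_size)
out, h = gru(sample)
one_step_equal = torch.allclose(out, h)
print(one_step_equal)
```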

### stacked nn.GRU 

In [51]:
rnn = torch.nn.GRU(input_size = 10, hidden_size=20, num_layers=2)
rnn.input_size, rnn.hidden_size, rnn.num_layers

(10, 20, 2)

In [43]:
input = torch.autograd.Variable(torch.randn(5, 3, 10))
h0 = torch.autograd.Variable(torch.randn(2, 3, 20))
output, hn = rnn(input, h0)
output.size(), hn.size()

(torch.Size([5, 3, 20]), torch.Size([2, 3, 20]))
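
The two returned tensors carry different information. A hedged sketch, assuming a recent PyTorch release and the default `batch_first=False` layout: `output` holds the top layer's hidden state at every time step, while `hn` holds every layer's hidden state at the last time step, so the last slice of each should match.

```python
import torch

torch.manual_seed(0)
rnn = torch.nn.GRU(input_size=10, hidden_size=20, num_layers=2)

x = torch.randn(5, 3, 10)        # (seq_len, batch, input_size)
output, hn = rnn(x)              # h0 omitted, so it defaults to zeros

# output: top layer's hidden state at every time step -> (5, 3, 20)
# hn:     every layer's hidden state at the last step -> (2, 3, 20)
top_matches = torch.allclose(output[-1], hn[-1])
print(output.shape, hn.shape, top_matches)
```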

### batched matrix multiplication: torch.bmm

In [33]:
batch1 = torch.randn(10, 3, 4)
batch2 = torch.randn(10, 4, 5)
res = torch.bmm(batch1, batch2)
res.size()

torch.Size([10, 3, 5])

In [34]:
attn_weights = torch.randn([1, 1, 10])
encoder_outputs = torch.randn([1, 10, 256])
torch.bmm(attn_weights, encoder_outputs).size()

torch.Size([1, 1, 256])
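
The second cell has the shape of an attention step: weights over 10 encoder positions are combined with 10 encoder vectors into one context vector. A minimal sketch of that idea, assuming a recent PyTorch release; the softmax normalization and the batch size of 2 are illustration choices not taken from the cells above. The result is compared with an explicit loop over the batch.

```python
import torch

torch.manual_seed(0)
batch, seq_len, hidden = 2, 10, 256

scores = torch.randn(batch, 1, seq_len)
attn_weights = torch.softmax(scores, dim=-1)          # each row sums to 1
encoder_outputs = torch.randn(batch, seq_len, hidden)

# One (1 x seq_len) @ (seq_len x hidden) product per batch element.
context = torch.bmm(attn_weights, encoder_outputs)    # (batch, 1, hidden)

# Same result from an explicit loop over the batch dimension.
looped = torch.stack([attn_weights[i] @ encoder_outputs[i] for i in range(batch)])
print(context.shape, torch.allclose(context, looped, atol=1e-6))
```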