# Building Blocks of Models
- ```nn.Linear```
- Nonlinear Activations
- Loss functions
- Optimizers

In [0]:
!pip3 install torch torchvision

In [0]:
import numpy as np
import pandas as pd
import torch, torchvision
torch.__version__

'0.4.1'

In [0]:
import torch.nn as nn

## 1. nn.Linear
```nn.Linear()``` is one of the basic building blocks of any neural network (NN) model.
  - It applies a linear (or, with a bias, affine) transformation of the form ```Wx + b``` to its input; internally PyTorch computes ```xW^T + b```, with ```W``` stored as an ```(out_features, in_features)``` matrix. In NN terminology, this is a fully connected (dense) layer.
  - Two parameters, ```in_features``` and ```out_features```, must be specified.
  - Documentation: [linear_layers](https://pytorch.org/docs/stable/nn.html#linear-layers)
  
```python
torch.nn.Linear(in_features,      # size of each input sample
                out_features,     # size of each output sample
                bias=True)        # whether a learnable bias (b) is added
```
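As a quick check of the signature above, the sketch below inspects the parameters that ```nn.Linear``` creates. It assumes a layer with 5 inputs and 3 outputs (sizes chosen only for illustration) and shows the weight layout, the bias shape, and what ```bias=False``` changes.

```python
import torch
import torch.nn as nn

# A layer mapping 5 input features to 3 output features (illustrative sizes)
layer = nn.Linear(in_features=5, out_features=3)

# The weight is stored as (out_features, in_features); the bias as (out_features,)
print(layer.weight.shape)  # torch.Size([3, 5])
print(layer.bias.shape)    # torch.Size([3])

# With bias=False, no bias parameter is created
no_bias = nn.Linear(5, 3, bias=False)
print(no_bias.bias)        # None

# Parameter counts: 5 * 3 weights + 3 biases, versus weights only
n_with_bias = sum(p.numel() for p in layer.parameters())
n_without_bias = sum(p.numel() for p in no_bias.parameters())
print(n_with_bias, n_without_bias)  # 18 15
```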

In [0]:
linear = nn.Linear(5, 1)                  # input dim = 5, output dim = 1
x = torch.FloatTensor([1, 2, 3, 4, 5])    # 1d tensor: a single sample with 5 features
print(linear(x))
y = torch.ones(3, 5)                      # 2d tensor: a batch of 3 samples, 5 features each
print(linear(y))

tensor([3.5314], grad_fn=<ThAddBackward>)
tensor([[1.1357],
        [1.1357],
        [1.1357]], grad_fn=<ThAddmmBackward>)
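To make the ```Wx + b``` formula concrete, the sketch below recomputes the layer's output by hand from its ```weight``` and ```bias``` attributes and compares it with ```linear(...)```. The seed and inputs are arbitrary. The three batch outputs in the cell above are identical simply because all three input rows are the same vector of ones.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
linear = nn.Linear(5, 1)

# Single sample: W x + b, with W of shape (1, 5) and x of shape (5,)
x = torch.tensor([1., 2., 3., 4., 5.])
manual = linear.weight @ x + linear.bias
print(torch.allclose(linear(x), manual))  # True

# Batch of samples: each row is one sample, so the layer computes X W^T + b
y = torch.ones(3, 5)
manual_batch = y @ linear.weight.t() + linear.bias   # shape (3, 1)
print(torch.allclose(linear(y), manual_batch))       # True
```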


## 2. Nonlinear activations
PyTorch provides a number of nonlinear activation functions. The most commonly used are:
```python
torch.nn.ReLU()           # rectified linear unit: max(0, x)
torch.nn.Sigmoid()        # sigmoid: 1 / (1 + exp(-x))
torch.nn.Tanh()           # hyperbolic tangent
torch.nn.Softmax(dim=1)   # softmax along dimension `dim` (specify it explicitly)
```
  - Documentation: [nonlinear_activations](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity)
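The definitions behind these modules are short enough to write out directly. The sketch below implements each activation by hand with elementwise tensor operations and checks it against the corresponding ```torch.nn``` module; the sample inputs are arbitrary.

```python
import torch
import torch.nn as nn

x = torch.linspace(-3, 3, steps=7)   # sample inputs: -3, -2, ..., 3

# Hand-written versions of each activation
relu_manual = torch.clamp(x, min=0)                   # max(0, x)
sigmoid_manual = 1 / (1 + torch.exp(-x))              # 1 / (1 + e^(-x))
tanh_manual = (torch.exp(x) - torch.exp(-x)) / (torch.exp(x) + torch.exp(-x))
softmax_manual = torch.exp(x) / torch.exp(x).sum()    # e^(x_i) / sum_j e^(x_j)

checks = {
    "relu": torch.allclose(nn.ReLU()(x), relu_manual),
    "sigmoid": torch.allclose(nn.Sigmoid()(x), sigmoid_manual),
    "tanh": torch.allclose(nn.Tanh()(x), tanh_manual),
    "softmax": torch.allclose(nn.Softmax(dim=0)(x), softmax_manual),
}
print(checks)  # every value should be True
```

In practice, prefer the built-in modules: ```nn.Softmax``` subtracts the maximum before exponentiating, which avoids overflow for large inputs, while the naive formula above does not.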

In [0]:
relu = torch.nn.ReLU()
sigmoid = torch.nn.Sigmoid()
tanh = torch.nn.Tanh()
softmax = torch.nn.Softmax(dim=0)   # softmax normalizes along `dim`; always set it explicitly

x = torch.randn(5)     # five random numbers
print(x)
print(relu(x))       
print(sigmoid(x))
print(tanh(x))
print(softmax(x))

tensor([ 0.9238, -0.9667, -0.4237, -1.7181,  2.3956])
tensor([0.9238, 0.0000, 0.0000, 0.0000, 2.3956])
tensor([0.7158, 0.2755, 0.3956, 0.1521, 0.9165])
tensor([ 0.7277, -0.7472, -0.4000, -0.9376,  0.9835])
tensor([0.1713, 0.0259, 0.0445, 0.0122, 0.7462])
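The ```dim``` argument matters once the input has more than one dimension. The sketch below applies softmax to the same 2d tensor along each dimension. For a batch of logits with shape ```(N, C)```, ```dim=1``` is the usual choice, since it turns each sample's scores into a probability distribution over the classes.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[1.0, 2.0, 3.0],
                       [1.0, 1.0, 1.0]])

col_softmax = nn.Softmax(dim=0)(logits)  # normalizes down each column
row_softmax = nn.Softmax(dim=1)(logits)  # normalizes across each row

print(col_softmax.sum(dim=0))  # each column sums to 1 (up to rounding)
print(row_softmax.sum(dim=1))  # each row sums to 1 (up to rounding)
print(row_softmax[1])          # equal scores give a uniform distribution: 1/3 each
```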


## 3. Loss Functions
PyTorch implements many common loss functions in ```torch.nn```, including:
- ```nn.MSELoss```: mean squared error, commonly used in regression tasks.
- ```nn.CrossEntropyLoss```: cross-entropy loss, commonly used in classification tasks. It combines ```nn.LogSoftmax``` and ```nn.NLLLoss```, so it expects raw, unnormalized scores (logits) rather than probabilities.

In [0]:
a = torch.FloatTensor([2, 4, 5])
b = torch.FloatTensor([1, 3, 2])

mse = nn.MSELoss()
print(mse(a, b))

tensor(3.6667)
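The value above can be reproduced by hand: the squared differences are 1, 1 and 9, and their mean is 11/3. The sketch below computes this directly and also shows the ```reduction``` argument, which in current PyTorch versions selects between averaging (the default, ```'mean'```) and summing.

```python
import torch
import torch.nn as nn

a = torch.tensor([2., 4., 5.])
b = torch.tensor([1., 3., 2.])

manual = ((a - b) ** 2).mean()   # (1 + 1 + 9) / 3 = 11 / 3
print(manual)                    # tensor(3.6667)
print(torch.allclose(nn.MSELoss()(a, b), manual))  # True

# reduction='sum' returns the total squared error instead of the mean
total = nn.MSELoss(reduction='sum')(a, b)
print(total)                     # tensor(11.)
```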


In [0]:
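# note that CrossEntropyLoss expects an input of shape (N, C), where
# N is the batch size and C is the number of classes,
# and a target of shape (N,) holding class indices in [0, C-1].
# The input should contain raw scores (logits), not probabilities.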
a = torch.FloatTensor([[0.5, 0], [4.5, 0], [0, 0.4], [0, 0.1]])   # input
b = torch.LongTensor([1, 1, 1, 0])                                # target

ce = nn.CrossEntropyLoss()
print(ce(a,b))

tensor(1.6856)
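Because ```nn.CrossEntropyLoss``` is log-softmax followed by negative log-likelihood, the same number can be computed step by step: take the log-softmax of each row, pick out the log-probability of the target class, negate, and average over the batch. The sketch below does this with ```torch.nn.functional.log_softmax``` on the same data as the cell above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

a = torch.tensor([[0.5, 0], [4.5, 0], [0, 0.4], [0, 0.1]])   # logits, shape (N=4, C=2)
b = torch.tensor([1, 1, 1, 0])                                # class indices, shape (N,)

log_probs = F.log_softmax(a, dim=1)               # log-probabilities per class
picked = log_probs[torch.arange(a.size(0)), b]    # log-probability of each target class
manual = -picked.mean()                           # average negative log-likelihood

print(manual)                                            # about 1.6856, as above
print(torch.allclose(nn.CrossEntropyLoss()(a, b), manual))  # True
```

The second sample contributes most of the loss: its logits strongly favor class 0 (4.5 vs 0) while the target is class 1.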


## 4. Optimizers
- ```torch.optim``` provides implementations of commonly used optimization algorithms, including:
```python
optim.Adagrad   
optim.Adam
optim.RMSprop
optim.SGD
```
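- Each optimizer is constructed with the parameters to optimize (e.g. ```model.parameters()```) and hyperparameters such as the learning rate ```lr```. Most optimizers provide a default ```lr```, but older versions of ```optim.SGD``` (including 0.4.1) require it.
- Model training process (repeated for every training batch):
  - ```optimizer.zero_grad()```: resets all parameter gradients, since PyTorch accumulates gradients across ```backward()``` calls
  - ```loss.backward()```: back-propagates from the loss tensor (e.g. ```loss = loss_fn(output, target)```), computing the gradient of the loss with respect to every model parameter
  - ```optimizer.step()```: updates the model parameters using the computed gradients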
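The reason for calling ```optimizer.zero_grad()``` every batch is that ```backward()``` adds new gradients to whatever is already stored in each parameter's ```.grad```. The sketch below, using an arbitrary small model and data, calls ```backward()``` twice on the same loss and shows the stored gradient doubling. Depending on the PyTorch version, ```zero_grad()``` then either fills ```.grad``` with zeros or sets it to ```None```.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 1)
x = torch.randn(3)
y = torch.ones(1)
loss_fn = nn.MSELoss()

loss_fn(model(x), y).backward()
first = model.weight.grad.clone()

# A second backward() without zeroing adds to the existing .grad
loss_fn(model(x), y).backward()
doubled = torch.allclose(model.weight.grad, 2 * first)
print(doubled)  # True

# zero_grad() clears the accumulated gradients before the next batch
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer.zero_grad()
print(model.weight.grad)  # zeros or None, depending on the PyTorch version
```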

In [0]:
## how PyTorch models are trained with a loss function and an optimizer (one step)

# input and target data
x = torch.randn(5)
y = torch.ones(1)

model = nn.Linear(5, 1)   # create model
loss_fn = nn.MSELoss()    # define loss function
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.01)   # create optimizer

optimizer.zero_grad()           # reset gradients to zero
loss = loss_fn(model(x), y)     # forward pass and loss computation
loss.backward()                 # back propagation: compute gradients
optimizer.step()                # update parameters using the computed gradients
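The cell above performs a single update. In a real training run the same three calls are repeated many times. The sketch below fits ```nn.Linear``` to hypothetical, noise-free toy data generated from ```y = 3*x1 - 2*x2 + 1```, using full-batch ```optim.SGD``` for 200 steps; the data, learning rate and step count are illustrative assumptions, not tuned values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy data: y = 3*x1 - 2*x2 + 1, no noise
X = torch.randn(100, 2)
true_w = torch.tensor([[3.0], [-2.0]])
Y = X @ true_w + 1.0

model = nn.Linear(2, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(200):
    optimizer.zero_grad()           # clear gradients from the previous step
    loss = loss_fn(model(X), Y)     # forward pass on the whole (toy) dataset
    loss.backward()                 # compute gradients
    optimizer.step()                # gradient descent update

print(loss.item())                  # close to 0
print(model.weight.data, model.bias.data)  # close to [[3, -2]] and [1]
```

With mini-batches, the loop body stays the same; only the inner data source changes (for example, iterating over a ```torch.utils.data.DataLoader```).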