# PyTorch Framework

## Content

1. Installation
2. PyTorch Fundamentals - Tensors and Matrices
    2.1. Numpy and PyTorch
3. PyTorch and Gradients
4. Feedforward Networks in PyTorch
5. Convolutional Neural Networks in PyTorch
6. Recurrent Neural Networks in PyTorch

## 1. Installation

Follow the instructions on the official website, https://pytorch.org/: select your OS, package manager and Python version, and choose between the GPU (CUDA) and the CPU build.

### 1.1 Installation Check

In [1]:
import torch
print("Torch version:", torch.__version__)
print("CUDA is active:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)

Torch version: 0.4.1
CUDA is active: True
CUDA version: 9.0


## 2. PyTorch Fundamentals - Matrices and Tensors

### NumPy vs. PyTorch

In [2]:
#### With values ####
# Python list
arr = [[1,2], [3,4]]

# NumPy
import numpy as np

# Two-dimensional array
numpy_arr = np.array([[1,2], [3,4]])
print("Numpy array:\n", numpy_arr)

# Torch
import torch
arr_t = torch.Tensor(arr)  # Conversion to a torch tensor (float32 by default)
print("Torch array:\n", arr_t)

#### With default values ####
print("Numpy array of ones:\n", np.ones((2,2)))
print("Torch array of ones:\n", torch.ones((2,2)))

#### With random values ####
# rand() draws floats uniformly from [0, 1), not integers
print("Numpy array of random floats:\n", np.random.rand(2,2))
print("Torch array of random floats:\n", torch.rand(2,2))

Numpy array:
 [[1 2]
 [3 4]]
Torch array:
 tensor([[1., 2.],
        [3., 4.]])
Numpy array of ones:
 [[1. 1.]
 [1. 1.]]
Torch array of ones:
 tensor([[1., 1.],
        [1., 1.]])
Numpy array of random floats:
 [[0.22712468 0.1864801 ]
 [0.98675439 0.86452921]]
Torch array of random floats:
 tensor([[0.6608, 0.7621],
        [0.0416, 0.7732]])


### Reproducibility with Seeds

In [3]:
######## Seeds in numpy 
print("Seeds in numpy:\n")
np.random.seed(0)
print(np.random.rand(2,2))

####### Seeds in torch (CPU)
torch.manual_seed(0)
print(torch.rand(2,2))

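###### Seeds in torch (GPU)
# Note: torch.rand(2,2) below still samples on the CPU, so it prints the next
# values of the CPU stream; pass device="cuda" to sample with the GPU generator
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(0)
    print(torch.rand(2,2))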

Seeds in numpy:

[[0.5488135  0.71518937]
 [0.60276338 0.54488318]]
tensor([[0.4963, 0.7682],
        [0.0885, 0.1320]])
tensor([[0.3074, 0.6341],
        [0.4901, 0.8964]])
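
A quick way to confirm that seeding works is to reseed and draw again: both draws should match exactly. The sketch below assumes a hypothetical helper `set_seed` (not part of PyTorch or of the original notebook) that seeds NumPy and PyTorch together.

```python
import numpy as np
import torch

def set_seed(seed):
    """Hypothetical helper: seed NumPy and PyTorch (CPU and, if present, GPU)."""
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

set_seed(0)
first_np, first_torch = np.random.rand(2, 2), torch.rand(2, 2)

set_seed(0)  # reseeding restarts both random streams
second_np, second_torch = np.random.rand(2, 2), torch.rand(2, 2)

print(np.array_equal(first_np, second_np))      # same NumPy draw
print(torch.equal(first_torch, second_torch))   # same PyTorch draw
```

Calling such a helper once at the top of a script is usually enough to make CPU-side random initialisation repeatable.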


### NumPy-PyTorch Bridge

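In the version used here (0.4.1), `torch.from_numpy` supports the following NumPy data types:

| NumPy dtype | Torch dtype |
|---|---|
| `float64` (double) | `torch.float64` |
| `float32` (float) | `torch.float32` |
| `float16` | `torch.float16` |
| `int64` | `torch.int64` |
| `int32` | `torch.int32` |
| `uint8` | `torch.uint8` |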

#### From NumPy to Torch

In [4]:
#Numpy array
np_array = np.ones((2,2))

#Torch tensor
torch_tensor = torch.from_numpy(np_array)
print(torch_tensor)

tensor([[1., 1.],
        [1., 1.]], dtype=torch.float64)


Data types matter in PyTorch: converting an array with an unsupported dtype raises a `TypeError`.

In [5]:
#Data types matter!
np_array2 = np.ones((2,2), dtype=np.int8)
try:
    torch.from_numpy(np_array2)
except TypeError as e:
    print("TypeError:",e)

TypeError: can't convert np.ndarray of type numpy.int8. The only supported types are: double, float, float16, int64, int32, and uint8.


In [6]:
np_array3 = np.ones((2,2), dtype=np.int64)
torch.from_numpy(np_array3)

tensor([[1, 1],
        [1, 1]])
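
Two practical consequences of the bridge are worth a short sketch. An unsupported dtype can be cast with NumPy's `astype` before conversion, and the tensor returned by `torch.from_numpy` shares memory with the source array, so a change to one is visible in the other. The variable names are illustrative.

```python
import numpy as np
import torch

# Cast to a dtype that from_numpy accepts (int64) before converting
np_small = np.ones((2, 2), dtype=np.int8)
tensor_from_cast = torch.from_numpy(np_small.astype(np.int64))
print(tensor_from_cast.dtype)

# from_numpy does not copy: the tensor and the array share one buffer
np_shared = np.ones((2, 2))
tensor_shared = torch.from_numpy(np_shared)
np_shared[0, 0] = 5.0                 # modify the NumPy array...
print(tensor_shared[0, 0].item())     # ...and the tensor sees the change
```

If an independent copy is needed, `torch.from_numpy(np_shared.copy())` avoids the shared buffer.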

#### PyTorch to NumPy

In [7]:
#Torch tensor
torch_tensor2 = torch.ones(2,2)
print(type(torch_tensor2))
torch_to_numpy = torch_tensor2.numpy()
print(type(torch_to_numpy))

<class 'torch.Tensor'>
<class 'numpy.ndarray'>
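
`.numpy()` works the same way in the opposite direction: for a CPU tensor, the returned array shares memory with the tensor. A tensor that tracks gradients has to be detached first. A minimal sketch with illustrative names:

```python
import torch

t = torch.ones(2, 2)
arr_view = t.numpy()      # shares memory with t
t.add_(1)                 # in-place change on the tensor...
print(arr_view)           # ...is visible in the NumPy array

t_grad = torch.ones(2, 2, requires_grad=True)
arr_detached = t_grad.detach().numpy()  # detach() drops gradient tracking first
print(type(arr_detached))
```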


#### Tensors on CPU vs GPU

In [8]:
#CPU
tensor_cpu = torch.ones(2,2)

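In [9]:
#CPU to GPU
# .cuda() returns a copy on the GPU; it does not move tensor_cpu in place
if torch.cuda.is_available():
    tensor_gpu = tensor_cpu.cuda()
    print(tensor_gpu)

tensor([[1., 1.],
        [1., 1.]], device='cuda:0')


In [10]:
#GPU to CPU
if torch.cuda.is_available():
    print(tensor_gpu.cpu())

tensor([[1., 1.],
        [1., 1.]])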
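
A device-agnostic variant of the cells above, as a sketch: `torch.device` and `Tensor.to` let the same code run with or without a GPU. Like `.cuda()` and `.cpu()`, `.to()` returns a tensor on the target device and leaves the original where it was.

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tensor_cpu = torch.ones(2, 2)
tensor_dev = tensor_cpu.to(device)   # tensor on the chosen device
print(tensor_dev.device)

back_on_cpu = tensor_dev.to("cpu")
print(tensor_cpu.device, back_on_cpu.device)  # the original never left the CPU
```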

#### Tensor Dimensions

In [11]:
#### Resizing a tensor
a = torch.ones(2,2)
print("Tensor dimension:")
print("shape variable:", a.shape)
print("size()-method", a.size())

### view()
print("Resized tensor:", a.view(4))
print("Dimension:", a.view(4).size())

Tensor dimension:
shape variable: torch.Size([2, 2])
size()-method torch.Size([2, 2])
Resized tensor: tensor([1., 1., 1., 1.])
Dimension: torch.Size([4])
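
Two properties of `view()` are easy to miss, so here is a small sketch (values chosen for illustration): a `-1` lets PyTorch infer that dimension from the number of elements, and the result shares its data with the original tensor instead of copying it.

```python
import torch

a = torch.arange(6.0)          # six values: 0., 1., 2., 3., 4., 5.
b = a.view(2, 3)               # 2 rows, 3 columns
c = a.view(-1, 2)              # -1 lets PyTorch infer the row count (3)
print(b.shape, c.shape)

# view() does not copy: b and a share the same storage
b[0, 0] = 100.0
print(a[0].item())
```

The same `-1` trick appears later when each image is flattened with `images.view(-1, 28*28)`.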


#### Tensor Element-wise Operations

In [12]:
a = torch.ones(2,2)
b = torch.ones(2,2)

### Addition (element-wise)
add = a+b 
add2 = torch.add(a,b)
print(add)
print(add2)

#In-place addition
print()
add.add_(a)
print(add)

#### Subtraction
print(a-b)
print(a.sub(b))
print("A:\n",a)
print()
#In place
print(a.sub_(b))
print("A:\n",a)


tensor([[2., 2.],
        [2., 2.]])
tensor([[2., 2.],
        [2., 2.]])

tensor([[3., 3.],
        [3., 3.]])
tensor([[0., 0.],
        [0., 0.]])
tensor([[0., 0.],
        [0., 0.]])
A:
 tensor([[1., 1.],
        [1., 1.]])

tensor([[0., 0.],
        [0., 0.]])
A:
 tensor([[0., 0.],
        [0., 0.]])
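
The same pattern extends to the other element-wise operators. As a sketch (values chosen for illustration), multiplication and division follow the `a * b` / `torch.mul(a, b)` convention, and a trailing underscore again marks the in-place variant. Note that `*` is element-wise, not a matrix product.

```python
import torch

a = torch.ones(2, 2) * 6
b = torch.ones(2, 2) * 3

product = a * b          # element-wise product, same as torch.mul(a, b)
quotient = a / b         # element-wise quotient, same as torch.div(a, b)
print(product)
print(quotient)

a.mul_(b)                # trailing underscore: a itself is overwritten
print(a)
```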


### Gradients - Parameter "requires_grad"

In [13]:
### Create a tensor that allows gradients to be accumulated
a_grad = torch.ones((2,2), requires_grad=True)
a_grad

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

In [14]:
# Behaves like a regular tensor
b_grad = torch.ones((2, 2), requires_grad=True)
print(a_grad + b_grad)
print(torch.add(a_grad, b_grad))

tensor([[2., 2.],
        [2., 2.]], grad_fn=<ThAddBackward>)
tensor([[2., 2.],
        [2., 2.]], grad_fn=<ThAddBackward>)


**What exactly is `requires_grad`?**

Define a function:
$$y_i = 5(x_i+1)^2$$

In [15]:
# x value
x = torch.ones(2, requires_grad=True)
print("x:", x)
print(x.requires_grad)

x: tensor([1., 1.], requires_grad=True)
True


In [16]:
y_test = torch.ones(2)*5
print(y_test.requires_grad)
y = 5* (x + 1) ** 2
print("y:", y)
print(y.requires_grad)

False
y: tensor([20., 20.], grad_fn=<MulBackward>)
True


**`backward()` can be called without arguments only on a scalar (i.e. a 1-element tensor); for a non-scalar tensor it needs a `gradient` argument of the same shape**
- Let's reduce `y` to a scalar then...

$$o = \frac{1}{2}\sum_i y_i$$

In [17]:
o = (1/2) * torch.sum(y)
o

tensor(20., grad_fn=<MulBackward>)

<center> **Recap `y` equation**: $y_i = 5(x_i+1)^2$ </center>
<center> **Recap `o` equation**: $o = \frac{1}{2}\sum_i y_i$ </center>
<center> **Substitute `y` into `o` equation**: $o = \frac{1}{2} \sum_i 5(x_i+1)^2$ </center>
$$\frac{\partial o}{\partial x_i} = \frac{1}{2}[10(x_i+1)]$$
$$\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{1}{2}[10(1 + 1)] = \frac{10}{2}(2) = 10$$

In [18]:
# Call backward() (with parentheses) to actually compute the gradients
o.backward()

In [20]:
x.grad

tensor([10., 10.])
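
The derivation can be checked end to end. The sketch below recomputes `o`, calls `backward()`, and compares `x.grad` with the analytic gradient $5(x_i+1)$. It also shows the alternative mentioned earlier: calling `backward` on the non-scalar `y` with an explicit `gradient` argument, here the weights $\frac{1}{2}$ from the definition of `o`. The names `x2` and `y2` are illustrative.

```python
import torch

x = torch.ones(2, requires_grad=True)
y = 5 * (x + 1) ** 2
o = 0.5 * torch.sum(y)

o.backward()                      # fills x.grad with do/dx
print(x.grad)                     # analytic value: 5 * (x + 1) = 10 at x = 1

# Same gradient without reducing to a scalar: pass do/dy as `gradient`
x2 = torch.ones(2, requires_grad=True)
y2 = 5 * (x2 + 1) ** 2
y2.backward(gradient=torch.full_like(y2, 0.5))
print(x2.grad)
```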

## 3. Feedforward Neural Network with PyTorch (MNIST)

In [21]:
import torch
import torch.nn as nn

### Steps
- Step 1: Load Dataset
- Step 2: Make Dataset Iterable
- Step 3: Create Model Class
- Step 4: Instantiate Model Class
- Step 5: Instantiate Loss Class
- Step 6: Instantiate Optimizer Class
- Step 7: Train Model

### Step 1: Loading MNIST Train Dataset

In [22]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets

In [23]:
train_dataset = dsets.MNIST(root='./data', 
                            train=True, 
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data', 
                           train=False, 
                           transform=transforms.ToTensor())

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Processing...
Done!


### Step 2: Make Dataset Iterable

In [24]:
batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)
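
To see where `num_epochs` comes from and what a loader yields, here is a sketch that uses a small synthetic dataset in place of MNIST (the random tensors and `TensorDataset` are stand-ins, not part of the original notebook). With 60,000 training images and a batch size of 100, one epoch is 600 iterations, so 3,000 iterations correspond to 5 epochs.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

batch_size = 100
n_iters = 3000
n_train = 60000                                   # size of the MNIST training set
iters_per_epoch = n_train // batch_size           # 600
num_epochs = n_iters // iters_per_epoch           # 5
print(iters_per_epoch, num_epochs)

# Synthetic stand-in: 250 fake "images" of shape 1x28x28 with labels 0-9
fake_images = torch.rand(250, 1, 28, 28)
fake_labels = torch.randint(0, 10, (250,))
loader = DataLoader(TensorDataset(fake_images, fake_labels),
                    batch_size=batch_size, shuffle=True)

for images, labels in loader:
    print(images.shape, labels.shape)             # last batch holds the remaining 50
```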

### Step 3: Create Model Class

In [25]:
class FeedforwardNeuralNetModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(FeedforwardNeuralNetModel, self).__init__()
        # Linear function
        self.fc1 = nn.Linear(input_dim, hidden_dim) 
        # Non-linearity
        self.sigmoid = nn.Sigmoid()
        # Linear function (readout)
        self.fc2 = nn.Linear(hidden_dim, output_dim)  
    
    def forward(self, x):
        # Linear function
        out = self.fc1(x)
        # Non-linearity
        out = self.sigmoid(out)
        # Linear function (readout)
        out = self.fc2(out)
        return out

### Step 4: Instantiate Model Class
- **Input** dimension: **784** 
    - Size of image
    - $28 \times 28 = 784$
- **Output** dimension: **10**
    - 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
- **Hidden** dimension: **100**
    - Can be any number
    - Equivalent terms
        - Number of neurons
        - Number of non-linear activation functions
        
Our model has one hidden layer with a **sigmoid** activation.

In [26]:
input_dim = 28*28
hidden_dim = 100
output_dim = 10

model = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)
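
A quick sanity check before training is to push a dummy batch through the network and confirm the output shape. The sketch below uses an `nn.Sequential` stand-in with the same layers as `FeedforwardNeuralNetModel` so that it runs on its own; the batch of 3 random inputs is arbitrary.

```python
import torch
import torch.nn as nn

input_dim, hidden_dim, output_dim = 28 * 28, 100, 10

# Stand-in with the same layers: Linear -> Sigmoid -> Linear
stand_in = nn.Sequential(
    nn.Linear(input_dim, hidden_dim),
    nn.Sigmoid(),
    nn.Linear(hidden_dim, output_dim),
)

dummy_batch = torch.rand(3, input_dim)   # 3 flattened 28x28 "images"
logits = stand_in(dummy_batch)
print(logits.shape)                      # one row of 10 class scores per image
```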

### Step 5: Instantiate Loss Class
- Feedforward Neural Network: **Cross Entropy Loss**
    - _Logistic Regression_: **Cross Entropy Loss**
    - _Linear Regression_: **MSE**
   

In [27]:
criterion = nn.CrossEntropyLoss()
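
`nn.CrossEntropyLoss` expects raw scores (logits) and integer class labels; it applies log-softmax internally and then the negative log-likelihood. The sketch below, with made-up logits, checks that equivalence, which is why the model's `forward` ends with a linear layer and no softmax.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])    # 2 samples, 3 classes (made-up values)
labels = torch.tensor([0, 2])               # correct class index per sample

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, labels)

# Manual version: log-softmax followed by negative log-likelihood
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(loss.item(), manual.item())
```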

### Step 6: Instantiate Optimizer Class
- Simplified equation
    - $\theta = \theta - \eta \cdot \nabla_\theta$
        - $\theta$: parameters (our tensors with gradient accumulation capabilities)
        - $\eta$: learning rate (how fast we want to learn)
        - $\nabla_\theta$: gradients of the loss w.r.t. the parameters
- Even simpler equation
    - `parameters = parameters - learning_rate * parameters_gradients`
    - **At every iteration, we update our model's parameters**

In [28]:
learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)  
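
The update rule can be reproduced by hand for a single parameter. In this sketch (the toy loss $(w-3)^2$ and the starting value are assumptions for illustration), one call to `optimizer.step()` gives the same result as `w - learning_rate * w.grad`.

```python
import torch

learning_rate = 0.1
w = torch.tensor([1.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=learning_rate)

loss = ((w - 3) ** 2).sum()      # toy loss with its minimum at w = 3
loss.backward()                  # gradient: 2 * (w - 3) = -4

expected = w.detach().clone() - learning_rate * w.grad   # manual update
optimizer.step()                                         # update done by SGD
print(w.item(), expected.item())
```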

### Parameters In-Depth

In [30]:
print(model.parameters())

print(len(list(model.parameters())))

# FC 1 Parameters 
print(list(model.parameters())[0].size())

# FC 1 Bias Parameters
print(list(model.parameters())[1].size())

# FC 2 Parameters
print(list(model.parameters())[2].size())

# FC 2 Bias Parameters
print(list(model.parameters())[3].size())


<generator object Module.parameters at 0x0000021F25700C78>
4
torch.Size([100, 784])
torch.Size([100])
torch.Size([10, 100])
torch.Size([10])
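
`named_parameters()` yields the same tensors together with their names, which makes the listing easier to read. The sketch uses an `nn.Sequential` stand-in with the same layer sizes as the model above; the total of 79,510 parameters follows from $784 \cdot 100 + 100 + 100 \cdot 10 + 10$.

```python
import torch.nn as nn

# Stand-in with the same layer sizes as the feedforward model
stand_in = nn.Sequential(nn.Linear(784, 100), nn.Sigmoid(), nn.Linear(100, 10))

for name, param in stand_in.named_parameters():
    print(name, tuple(param.size()))     # weight and bias of each Linear layer

total = sum(p.numel() for p in stand_in.parameters())
print(total)
```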


### Step 7: Train Model
- Process 
    1. Convert inputs to tensors with gradient accumulation capabilities
    2. Clear gradient buffers
    3. Get output given inputs 
    4. Get loss
    5. Get gradients w.r.t. parameters
    6. Update parameters using gradients
        - `parameters = parameters - learning_rate * parameters_gradients`
    7. REPEAT
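
Step 2 matters because PyTorch adds new gradients to whatever is already stored in `.grad`. A sketch with a single toy parameter (not part of the model) shows the accumulation and the effect of clearing it:

```python
import torch

w = torch.tensor([2.0], requires_grad=True)

(3 * w).sum().backward()
first = w.grad.clone()           # gradient of 3*w is 3

(3 * w).sum().backward()
accumulated = w.grad.clone()     # not cleared: 3 + 3 = 6

w.grad.zero_()                   # clear the buffer, the job of optimizer.zero_grad()
(3 * w).sum().backward()
after_clear = w.grad.clone()     # back to 3

print(first, accumulated, after_clear)
```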

In [32]:
iteration = 0  # named "iteration" to avoid shadowing the built-in iter()
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Load images with gradient accumulation capabilities
        images = images.view(-1, 28*28).requires_grad_()
        
        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t. parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
        
        iteration += 1
        
        if iteration % 500 == 0:
            # Calculate accuracy on the test set
            correct = 0
            total = 0
            # Gradients are not needed for evaluation
            with torch.no_grad():
                for images, labels in test_loader:
                    # Flatten each image into a vector of 784 values
                    images = images.view(-1, 28*28)
                    
                    # Forward pass only to get logits/output
                    outputs = model(images)
                    
                    # Get predictions from the maximum value
                    _, predicted = torch.max(outputs, 1)
                    
                    # Total number of labels
                    total += labels.size(0)
                    
                    # Total correct predictions (as a Python int)
                    correct += (predicted == labels).sum().item()
            
            # Accuracy as an integer percentage
            accuracy = 100 * correct // total
            
            # Print loss and accuracy
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iteration, loss.item(), accuracy))

Iteration: 500. Loss: 0.595485508441925. Accuracy: 86
Iteration: 1000. Loss: 0.5494940876960754. Accuracy: 89
Iteration: 1500. Loss: 0.45526909828186035. Accuracy: 90
Iteration: 2000. Loss: 0.4022116959095001. Accuracy: 91
Iteration: 2500. Loss: 0.22733356058597565. Accuracy: 91
Iteration: 3000. Loss: 0.27196046710014343. Accuracy: 91


## 4. Convolutional Neural Networks in PyTorch

## Sources

1. Deep Learning Wizard: https://www.deeplearningwizard.com/deep_learning/intro/ (images, explanations, examples..)
2. Siraj Raval, "Pytorch in 5 Minutes": https://www.youtube.com/watch?v=nbJ-2G2GXL0 (very quick start)
3. Udacity, Intro to Deep Learning with PyTorch: https://www.udacity.com/course/deep-learning-pytorch--ud188
