# Notes on how to use PyTorch

---

These notes use the CIFAR-10 dataset, located in the folder `cs231n-Computer-Vision/assignments/2019/assignment2/cs231n/datasets/`.

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.utils.data import sampler

import torchvision.datasets as dset
import torchvision.transforms as T

import numpy as np

Preprocessing the dataset and splitting it into train / val / test:

In [2]:
NUM_TRAIN = 49000  # Number of samples in the training set.

# The torchvision.transforms package provides tools for preprocessing data
# and for performing data augmentation; here we set up a transform to
# preprocess the data by subtracting the mean RGB value and dividing by the
# standard deviation of each RGB value; we've hardcoded the mean and std.
transform = T.Compose([
                T.ToTensor(),
                T.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
            ])

# We set up a Dataset object for each split (train / val / test); Datasets load
# training examples one at a time, so we wrap each Dataset in a DataLoader which
# iterates through the Dataset and forms minibatches. We divide the CIFAR-10
# training set into train and val sets by passing a Sampler object to the
# DataLoader telling how it should sample from the underlying Dataset.
cifar10_train = dset.CIFAR10('../assignments/2019/assignment2/cs231n/datasets/', train=True, download=True,
                             transform=transform)
loader_train = DataLoader(cifar10_train, batch_size=64, 
                          sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN)))

cifar10_val = dset.CIFAR10('../assignments/2019/assignment2/cs231n/datasets/', train=True, download=True,
                           transform=transform)
loader_val = DataLoader(cifar10_val, batch_size=64, 
                        sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN, 50000)))

cifar10_test = dset.CIFAR10('../assignments/2019/assignment2/cs231n/datasets/', train=False, download=True, 
                            transform=transform)
loader_test = DataLoader(cifar10_test, batch_size=64)

Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
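
The train / val split above relies on two samplers that draw from disjoint index ranges of the same underlying Dataset. The following is a minimal sketch of that idea on a tiny synthetic dataset, so it runs without downloading CIFAR-10. The names `toy_dataset`, `NUM_TOTAL` and `NUM_TRAIN_TOY` are hypothetical and not part of the notebook.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data import sampler

# Hypothetical stand-in for CIFAR-10: 100 fake samples whose label equals their index.
NUM_TOTAL, NUM_TRAIN_TOY = 100, 90
data = torch.randn(NUM_TOTAL, 3, 4, 4)
labels = torch.arange(NUM_TOTAL)
toy_dataset = TensorDataset(data, labels)

# Same Dataset object, two samplers over disjoint index ranges.
toy_loader_train = DataLoader(toy_dataset, batch_size=16,
                              sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN_TOY)))
toy_loader_val = DataLoader(toy_dataset, batch_size=16,
                            sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN_TOY, NUM_TOTAL)))

# Collect the labels (= indices) that each loader actually visits in one pass.
train_ids = sorted(int(i) for _, y in toy_loader_train for i in y)
val_ids = sorted(int(i) for _, y in toy_loader_val for i in y)
print(len(train_ids), len(val_ids))  # 90 10
```

Each loader visits its own indices exactly once per pass, in random order, and no validation sample ever shows up in a training minibatch.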


Setting up some useful constants.

In [3]:
# GPU usage:
USE_GPU = True
if USE_GPU and torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

# Numeric type to use:
dtype = torch.float32

# Controls how often (in iterations) the loss value is printed:
print_every = 100

## Example 1: Two-layer fully-connected network

For each of the examples below, we implement a function that receives the input tensor and the network parameters and returns the output (that is, it runs the forward pass).

In this example we build a network with two fully-connected layers. To do so, we write a function `TwoLayerFullyConnected()` that receives a minibatch of samples `x` and a list `params` containing the graph variables that act as parameters. Both the input and the parameters are instances of `torch.Tensor`. The parameters have the flag `requires_grad=True`, so the computational graph is built automatically in the background; the input only needs that flag if we also want gradients with respect to the data. The function uses ready-made functions from `torch.nn.functional`, such as the ReLU activation.

Finally, we test the function with `two_layer_fc_test()`.

In [4]:
import torch.nn.functional as F  # useful stateless functions

def TwoLayerFullyConnected(x, params):
    """
    A fully-connected neural network; the architecture is:
    fully-connected layer -> ReLU -> fully-connected layer.
    Note that this function only defines the forward pass; 
    PyTorch will take care of the backward pass for us.
    
    The input to the network will be a minibatch of data, of shape
    (N, d1, ..., dM) where d1 * ... * dM = D. The hidden layer will have H units,
    and the output layer will produce scores for C classes.
    
    Inputs:
    - x: A PyTorch Tensor of shape (N, d1, ..., dM) giving a minibatch of
      input data.
    - params: A list [w1, w2] of PyTorch Tensors giving weights for the network;
      w1 has shape (D, H) and w2 has shape (H, C).
    
    Returns:
    - scores: A PyTorch Tensor of shape (N, C) giving classification scores for
      the input data x.
    """
    
    # Flatten the input to shape (N, D).
    batch_size = x.shape[0]
    x = x.view(batch_size, -1)

    # Unpack the parameters.
    w1, w2 = params  # w1.requires_grad=True, w2.requires_grad=True

    # Run the forward pass. In the background this builds a computational graph
    # that holds all the information needed for backpropagation.
    x = F.relu(x.mm(w1))  # Matrix multiplication + ReLU
    x = x.mm(w2)  # Matrix multiplication
    return x
    

def two_layer_fc_test():
    hidden_layer_size = 42
    x = torch.zeros((64, 50), dtype=dtype, requires_grad=True)  # minibatch size 64, feature dimension 50
    w1 = torch.zeros((50, hidden_layer_size), dtype=dtype, requires_grad=True)
    w2 = torch.zeros((hidden_layer_size, 10), dtype=dtype, requires_grad=True)
    scores = TwoLayerFullyConnected(x, [w1, w2])
    print(scores.size())  # you should see [64, 10]

two_layer_fc_test()

torch.Size([64, 10])
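
To see what the `requires_grad=True` flag buys us, here is a small sketch of autograd on a toy expression. The values are made up for illustration. Only the tensor flagged with `requires_grad=True` receives a gradient.

```python
import torch

# Toy parameter (hypothetical values), flagged so autograd tracks it.
w = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)
x = torch.tensor([0.5, 1.0, 2.0])  # plain data: no gradient requested

# Forward pass: every operation involving w is recorded in the graph.
loss = (w * x).sum() ** 2

# Backward pass: fills w.grad with d(loss)/dw = 2 * (w . x) * x.
loss.backward()

print(w.grad)  # equals 9 * x, because w . x = 4.5
print(x.grad)  # None: x was not flagged, so no gradient is stored
```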


## Example 2: Three-layer convolutional network

Now we do the same with a more elaborate architecture:

1. A convolutional layer (with bias) with `channel_1` filters, each with shape `KW1 x KH1`, and zero-padding of two
2. ReLU nonlinearity
3. A convolutional layer (with bias) with `channel_2` filters, each with shape `KW2 x KH2`, and zero-padding of one
4. ReLU nonlinearity
5. Fully-connected layer with bias, producing scores for C classes.

We keep using the ready-made functions from `torch.nn.functional`.

As a reminder, these are the hyperparameters of a convolutional layer:

**Summary**. To summarize, the Conv Layer:

* Accepts a volume of size $C_{in} \times W_{in} \times H_{in}$

* Requires four hyperparameters:

    * Number of filters $K$,
    * their spatial extent $F_W \times F_H$,
    * the stride $S_W \times S_H$,
    * the amount of zero padding $P_W \times P_H$.
    
* Produces a volume of size $C_{out} \times W_{out} \times H_{out}$ where:

    * $W_{out}=(W_{in}-F_W+2P_W)/S_W+1$
    * $H_{out}=(H_{in}-F_H+2P_H)/S_H+1$ (i.e. width and height are computed equally by symmetry)
    * $C_{out}=K$

* With parameter sharing, it introduces $F_W \cdot F_H \cdot C_{in}$ weights per filter, for a total of $(F_W \cdot F_H \cdot C_{in}) \cdot K$ weights and $K$ biases.

* In the output volume, the $d$-th depth slice (of size $W_{out} \times H_{out}$) is the result of performing a valid convolution of the $d$-th filter over the input volume with a stride of $S_W \times S_H$, and then offsetting by the $d$-th bias.

A common setting of the hyperparameters is $F_H=F_W=3$, $S_H=S_W=1$, $P_H=P_W=1$. There are common conventions and rules of thumb that motivate these hyperparameters.
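
The output-size and parameter-count formulas above can be checked with a few lines of plain Python. This is a sketch: the helper names `conv_output_size` and `conv_param_count` are hypothetical, and raising an error when the stride does not divide evenly is a choice made here for clarity, not something the formulas prescribe.

```python
def conv_output_size(size_in, kernel, padding, stride=1):
    """Output extent along one spatial axis: (in - F + 2P) / S + 1."""
    span = size_in - kernel + 2 * padding
    if span % stride != 0:
        raise ValueError("hyperparameters do not tile the input evenly")
    return span // stride + 1


def conv_param_count(c_in, kernel_w, kernel_h, num_filters):
    """Weights and biases of one conv layer with parameter sharing."""
    weights = kernel_w * kernel_h * c_in * num_filters
    biases = num_filters
    return weights, biases


# 5x5 kernel with padding 2, then 3x3 kernel with padding 1, both with stride 1.
w1 = conv_output_size(32, kernel=5, padding=2)
w2 = conv_output_size(w1, kernel=3, padding=1)
print(w1, w2)                        # 32 32
print(conv_param_count(3, 5, 5, 6))  # (450, 6)
```

With these settings both layers preserve the 32 x 32 spatial size, which is why the padding values of two and one pair naturally with 5 x 5 and 3 x 3 kernels.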

In [5]:
def ThreeLayerConvNet(x, params):

    # Get the input dimensions.
    batch_size, C, H, W = x.shape

    # Unpack the parameters.
    conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b = params

    # Forward pass.
    x = F.conv2d(x, conv_w1, conv_b1, padding=2)  # convolutional layer
    x = F.relu(x)  # ReLU activation
    x = F.conv2d(x, conv_w2, conv_b2, padding=1)  # convolutional layer
    x = F.relu(x)  # ReLU activation
    x = x.view(batch_size, -1).mm(fc_w) + fc_b  # fully-connected layer

    return x
    
    
def three_layer_convnet_test():
    x = torch.zeros((64, 3, 32, 32), dtype=dtype)  # minibatch size 64, image size [3, 32, 32]

    conv_w1 = torch.zeros((6, 3, 5, 5), dtype=dtype)  # [out_channel, in_channel, kernel_H, kernel_W]
    conv_b1 = torch.zeros((6,))  # out_channel
    conv_w2 = torch.zeros((9, 6, 3, 3), dtype=dtype)  # [out_channel, in_channel, kernel_H, kernel_W]
    conv_b2 = torch.zeros((9,))  # out_channel

    # you must calculate the shape of the tensor after two conv layers, before the fully-connected layer
    fc_w = torch.zeros((9 * 32 * 32, 10))
    fc_b = torch.zeros(10)

    scores = ThreeLayerConvNet(x, [conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b])
    print(scores.size())  # you should see [64, 10]
three_layer_convnet_test()
    

torch.Size([64, 10])
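
The first dimension of `fc_w` has to match the flattened size of the activations after the second convolution. A quick way to find it, sketched here with a small batch of two images and the same filter shapes as in the test above, is to run the two convolutions and inspect the result. Biases are omitted for brevity.

```python
import torch
import torch.nn.functional as F

x = torch.zeros((2, 3, 32, 32))      # small batch of 2 images
conv_w1 = torch.zeros((6, 3, 5, 5))  # [out_channel, in_channel, kernel_H, kernel_W]
conv_w2 = torch.zeros((9, 6, 3, 3))

h = F.conv2d(x, conv_w1, padding=2)
h = F.conv2d(h, conv_w2, padding=1)
flat = h.view(h.shape[0], -1)        # one row per image

print(h.shape)     # torch.Size([2, 9, 32, 32])
print(flat.shape)  # torch.Size([2, 9216]), i.e. 9 * 32 * 32 features per image
```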


## Useful function: check accuracy

We define a function that computes the model's accuracy (the fraction of correct predictions) on a given data split.

In [11]:
def CheckAccuracy(loader, model, params, device='cpu', dtype=torch.float32):
    """
    Check the accuracy of a classification model.
    
    Inputs:
    - loader: A DataLoader for the data split we want to check
    - model: A function that performs the forward pass of the model,
      with the signature scores = model(x, params)
    - params: List of PyTorch Tensors giving parameters of the model
    
    Returns: Nothing, but prints the accuracy of the model
    """
    
    split = 'val' if loader.dataset.train else 'test'
    print('Checking accuracy on the %s set' % split)
    num_correct, num_samples = 0, 0
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.int64)
            scores = model(x, params)
            _, preds = scores.max(1)
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)
        acc = float(num_correct) / num_samples
        print('Got %d / %d correct (%.2f%%)' % (num_correct, num_samples, 100 * acc))
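
The core of `CheckAccuracy` is the `scores.max(1)` call, which returns the index of the largest score in each row. A minimal sketch on hand-made scores (hypothetical values, 4 samples and 3 classes):

```python
import torch

# Hypothetical scores for 4 samples and 3 classes.
scores = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 0.9, 0.1],
                       [0.5, 0.4, 3.0],
                       [1.5, 0.2, 0.1]])
y = torch.tensor([0, 1, 2, 1])  # true labels; the last sample is misclassified

_, preds = scores.max(1)        # index of the largest score in each row
num_correct = (preds == y).sum().item()
acc = num_correct / y.size(0)
print(preds.tolist(), num_correct, acc)  # [0, 1, 2, 0] 3 0.75
```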

## Entrenamiento

In [13]:
def random_weight(shape):
    """
    Create random Tensors for weights; setting requires_grad=True means that we
    want to compute gradients for these Tensors during the backward pass.
    We use Kaiming normalization: sqrt(2 / fan_in)
    """
    if len(shape) == 2:  # FC weight
        fan_in = shape[0]
    else:
        fan_in = np.prod(shape[1:]) # conv weight [out_channel, in_channel, kH, kW]
    # randn is standard normal distribution generator. 
    w = torch.randn(shape, device=device, dtype=dtype) * np.sqrt(2. / fan_in)
    w.requires_grad = True
    return w

def zero_weight(shape):
    return torch.zeros(shape, device=device, dtype=dtype, requires_grad=True)
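
To make the `fan_in` rule in `random_weight` concrete, here is a plain-Python sketch that mirrors the same branching and computes the resulting standard deviation `sqrt(2 / fan_in)`. The helper names `fan_in_of` and `kaiming_std` are hypothetical.

```python
import math


def fan_in_of(shape):
    """fan_in as in random_weight: rows for FC, in_channel * kH * kW for conv."""
    if len(shape) == 2:
        return shape[0]
    return math.prod(shape[1:])


def kaiming_std(shape):
    """Standard deviation used to scale the standard-normal samples."""
    return math.sqrt(2.0 / fan_in_of(shape))


print(fan_in_of((3 * 32 * 32, 4000)))       # 3072
print(fan_in_of((6, 3, 5, 5)))              # 75
print(round(kaiming_std((6, 3, 5, 5)), 4))  # 0.1633
```

Layers with more inputs per unit get smaller initial weights, which keeps the scale of the activations roughly constant from layer to layer.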

In [7]:
def train(model, params, learning_rate, loaders, device='cpu', dtype=torch.float32):
    """
    Train a model on CIFAR-10.
    
    Inputs:
    - model: A Python function that performs the forward pass of the model.
      It should have the signature scores = model(x, params) where x is a
      PyTorch Tensor of image data, params is a list of PyTorch Tensors giving
      model weights, and scores is a PyTorch Tensor of shape (N, C) giving
      scores for the elements in x.
    - params: List of PyTorch Tensors giving weights for the model
    - learning_rate: Python scalar giving the learning rate to use for SGD
    
    Returns: Nothing
    """
    
    loader_train, loader_val = loaders
    
    for t, (x, y) in enumerate(loader_train):
        
        # Move the data to the proper device (GPU or CPU)
        x = x.to(device=device, dtype=dtype)
        y = y.to(device=device, dtype=torch.long)

        # Forward pass: compute scores and loss
        scores = model(x, params)
        loss = F.cross_entropy(scores, y)

        # Backward pass: PyTorch figures out which Tensors in the computational
        # graph have requires_grad=True and uses backpropagation to compute the
        # gradient of the loss with respect to these Tensors, and stores the
        # gradients in the .grad attribute of each Tensor.
        loss.backward()

        # Update parameters. We don't want to backpropagate through the
        # parameter updates, so we scope the updates under a torch.no_grad()
        # context manager to prevent a computational graph from being built.
        with torch.no_grad():
            for w in params:
                w -= learning_rate * w.grad

                # Manually zero the gradients after running the backward pass
                w.grad.zero_()

        if t % print_every == 0:
            print('Iteration %d, loss = %.4f' % (t, loss.item()))
            CheckAccuracy(loader_val, model, params, device=device, dtype=dtype)
            print()
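
The reason `train` zeroes the gradients by hand is that `backward()` accumulates into `.grad` instead of overwriting it. The sketch below uses a toy parameter (hypothetical values) to show the accumulation, followed by one manual SGD step in the same style as the loop above.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# Two backward passes without zeroing: the gradients add up in w.grad.
(w ** 2).sum().backward()
first = w.grad.clone()
(w ** 2).sum().backward()
accumulated = w.grad.clone()
print(first.tolist(), accumulated.tolist())  # [2.0, 4.0] [4.0, 8.0]

# One manual SGD step, then reset the gradient for the next iteration.
learning_rate = 0.1
w.grad.zero_()
(w ** 2).sum().backward()
with torch.no_grad():
    w -= learning_rate * w.grad  # in-place update, not tracked by autograd
    w.grad.zero_()
print(w, w.grad)
```

After the step, `w` is approximately `[0.8, 1.6]` and `w.grad` is back to zero.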

In [12]:
hidden_layer_size = 4000
learning_rate = 1e-2

w1 = random_weight((3 * 32 * 32, hidden_layer_size))
w2 = random_weight((hidden_layer_size, 10))
loaders = [loader_train, loader_val]

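# Pass device and dtype explicitly so the data lands on the same device as the weights.
train(TwoLayerFullyConnected, [w1, w2], learning_rate, loaders, device=device, dtype=dtype)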

Iteration 0, loss = 3.4100
Checking accuracy on the val set
Got 149 / 1000 correct (14.90%)

Iteration 100, loss = 1.8748
Checking accuracy on the val set
Got 371 / 1000 correct (37.10%)

Iteration 200, loss = 2.1303
Checking accuracy on the val set
Got 377 / 1000 correct (37.70%)

Iteration 300, loss = 1.9884
Checking accuracy on the val set
Got 422 / 1000 correct (42.20%)

Iteration 400, loss = 1.7753
Checking accuracy on the val set
Got 412 / 1000 correct (41.20%)

Iteration 500, loss = 1.7512
Checking accuracy on the val set
Got 414 / 1000 correct (41.40%)

Iteration 600, loss = 1.7685
Checking accuracy on the val set
Got 427 / 1000 correct (42.70%)

Iteration 700, loss = 2.3982
Checking accuracy on the val set
Got 439 / 1000 correct (43.90%)



In [None]:
learning_rate = 3e-3

# Kernel sizes 5 and 3 match the padding of 2 and 1 used in ThreeLayerConvNet,
# so both conv layers preserve the 32 x 32 spatial size.
channel_1, KW1, KH1 = 32, 5, 5
channel_2, KW2, KH2 = 16, 3, 3

# Conv weights have shape [out_channel, in_channel, kernel_H, kernel_W].
conv_w1 = random_weight((channel_1, 3, KH1, KW1))
conv_b1 = zero_weight((channel_1,))
conv_w2 = random_weight((channel_2, channel_1, KH2, KW2))
conv_b2 = zero_weight((channel_2,))
fc_w = random_weight((channel_2 * 32 * 32, 10))
fc_b = zero_weight((10,))

params = [conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b]
train(ThreeLayerConvNet, params, learning_rate, loaders, device=device, dtype=dtype)