# CNN for Pose Classification

The goal of this experiment is to evaluate how accurately a CNN can classify human poses.

# 1. Preparation

This block loads the training set. For now the CIFAR-10 dataset is used for testing. It will later need to be replaced with MPII, but to start with you can run the network on CIFAR-10 and make sure everything works.

You will also need to install PyTorch. Installation instructions are here: https://pytorch.org/get-started/locally/#anaconda

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F  # stateless functions, e.g. F.cross_entropy in the training loop
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.utils.data import sampler

import torchvision.datasets as dset
import torchvision.transforms as T

import numpy as np

Loading the dataset. Once you start working with MPII, this code will be replaced with code that loads that dataset.
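When the switch to MPII happens, the replacement will most likely be a custom `Dataset` that returns `(image, label)` pairs, so the rest of the notebook can keep using `DataLoader` unchanged. The sketch below only illustrates that interface: `InMemoryPoseDataset`, the 4 pose classes, and the random tensors are all hypothetical placeholders, not the real MPII format.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class InMemoryPoseDataset(Dataset):
    """Hypothetical stand-in for an MPII-backed dataset (not the real format)."""

    def __init__(self, images, labels, transform=None):
        assert len(images) == len(labels)
        self.images = images
        self.labels = labels
        self.transform = transform

    def __len__(self):
        # DataLoader uses this to know how many examples exist
        return len(self.images)

    def __getitem__(self, idx):
        # Return one (image, label) pair; DataLoader stacks them into batches
        image = self.images[idx]
        if self.transform is not None:
            image = self.transform(image)
        return image, self.labels[idx]


# Synthetic placeholder data: 10 RGB images of 32x32 and 4 made-up pose classes
fake_images = torch.rand(10, 3, 32, 32)
fake_labels = torch.randint(0, 4, (10,))

pose_dataset = InMemoryPoseDataset(fake_images, fake_labels)
pose_loader = DataLoader(pose_dataset, batch_size=4)
x_batch, y_batch = next(iter(pose_loader))
```

A real implementation would read images and annotations from disk inside `__getitem__` instead of holding tensors in memory.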

In [None]:
NUM_TRAIN = 49000

# The torchvision.transforms package provides tools for preprocessing data
# and for performing data augmentation; here we set up a transform to
# preprocess the data by subtracting the mean RGB value and dividing by the
# standard deviation of each RGB value; we've hardcoded the mean and std.
transform = T.Compose([
                T.ToTensor(),
                T.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
            ])

# We set up a Dataset object for each split (train / val / test); Datasets load
# training examples one at a time, so we wrap each Dataset in a DataLoader which
# iterates through the Dataset and forms minibatches. We divide the CIFAR-10
# training set into train and val sets by passing a Sampler object to the
# DataLoader telling how it should sample from the underlying Dataset.
cifar10_train = dset.CIFAR10('./cs231n/datasets', train=True, download=True,
                             transform=transform)
loader_train = DataLoader(cifar10_train, batch_size=64, 
                          sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN)))

cifar10_val = dset.CIFAR10('./cs231n/datasets', train=True, download=True,
                           transform=transform)
loader_val = DataLoader(cifar10_val, batch_size=64, 
                        sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN, 50000)))

cifar10_test = dset.CIFAR10('./cs231n/datasets', train=False, download=True, 
                            transform=transform)
loader_test = DataLoader(cifar10_test, batch_size=64)
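The train/val split above relies on two samplers that draw from disjoint index ranges of the same underlying dataset. A small sketch on a toy dataset (100 examples, each equal to its own index; the sizes are chosen only for illustration) makes it easy to check that the two loaders never share an example:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, sampler

# Toy dataset: each "example" is just its own index, so we can track what was drawn
toy = TensorDataset(torch.arange(100))
num_train = 80

train_loader = DataLoader(toy, batch_size=16,
                          sampler=sampler.SubsetRandomSampler(range(num_train)))
val_loader = DataLoader(toy, batch_size=16,
                        sampler=sampler.SubsetRandomSampler(range(num_train, 100)))

# Collect every index each loader yields during one pass
train_seen = sorted(int(i) for (batch,) in train_loader for i in batch)
val_seen = sorted(int(i) for (batch,) in val_loader for i in batch)
```

Each loader visits its own range exactly once per epoch, in random order.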

You can run the computations on a GPU by setting the `USE_GPU` flag. If the machine does not support CUDA, `torch.cuda.is_available()` returns False and the code falls back to the CPU.

In [None]:
USE_GPU = True

dtype = torch.float32 # we will be using float throughout this tutorial

if USE_GPU and torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

# Constant to control how frequently we print train loss
print_every = 100

print('using device:', device)
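One detail that is easy to trip over when using `device`: `Tensor.to` returns a new tensor, so its result has to be assigned, while `Module.to` moves the module's parameters in place. A minimal sketch (the layer and tensor sizes are arbitrary):

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.zeros(2, 3)
x = x.to(device=device, dtype=torch.float32)  # must reassign: .to() returns a new tensor

layer = torch.nn.Linear(3, 1).to(device)      # parameters now live on `device`
out = layer(x)                                # input and parameters share a device
```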

## Accuracy check

In [None]:
# TODO: replace this accuracy check with one built on PyTorch's own facilities
def check_accuracy_part34(loader, model):
    if loader.dataset.train:
        print('Checking accuracy on validation set')
    else:
        print('Checking accuracy on test set')   
    num_correct = 0
    num_samples = 0
    model.eval()  # set model to evaluation mode
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)
            scores = model(x)
            _, preds = scores.max(1)
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)
        acc = float(num_correct) / num_samples
        print('Got %d / %d correct (%.2f)' % (num_correct, num_samples, 100 * acc))
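As a possible starting point for the TODO above, the per-batch accuracy can be computed with tensor operations alone. This is only a sketch: `batch_accuracy` is a hypothetical helper name, and the scores are made up.

```python
import torch


def batch_accuracy(scores, targets):
    """Fraction of rows whose highest-scoring class matches the target."""
    preds = scores.argmax(dim=1)
    return (preds == targets).float().mean().item()


scores = torch.tensor([[0.1, 0.9],
                       [0.8, 0.2],
                       [0.3, 0.7],
                       [0.6, 0.4]])
targets = torch.tensor([1, 0, 0, 0])

acc = batch_accuracy(scores, targets)  # predictions are [1, 0, 1, 0], so 3 of 4 match
```

When aggregating over a whole loader, summing correct counts (as the function above does) is safer than averaging per-batch accuracies, because the last batch can be smaller than the others.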

## Training the network

In [None]:
# TODO: replace this training loop with one built on PyTorch's own facilities
def train_part34(model, optimizer, epochs=1):
    """
    Train a model on CIFAR-10 using the PyTorch Module API.
    
    Inputs:
    - model: A PyTorch Module giving the model to train.
    - optimizer: An Optimizer object we will use to train the model
    - epochs: (Optional) A Python integer giving the number of epochs to train for
    
    Returns: Nothing, but prints model accuracies during training.
    """
    model = model.to(device=device)  # move the model parameters to CPU/GPU
    for e in range(epochs):
        for t, (x, y) in enumerate(loader_train):
            model.train()  # put model to training mode
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)

            scores = model(x)
            loss = F.cross_entropy(scores, y)

            # Zero out all of the gradients for the variables which the optimizer
            # will update.
            optimizer.zero_grad()

            # This is the backwards pass: compute the gradient of the loss with
            # respect to each  parameter of the model.
            loss.backward()

            # Actually update the parameters of the model using the gradients
            # computed by the backwards pass.
            optimizer.step()

            if t % print_every == 0:
                print('Iteration %d, loss = %.4f' % (t, loss.item()))
                check_accuracy_part34(loader_val, model)
                print()
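The core of the loop above is the forward pass followed by `zero_grad`, `backward`, and `step`. The same sequence can be tried in isolation on synthetic data, with no dataset download. The linear model, the data, and the hyperparameters below are assumptions made for this sketch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)

# Synthetic, linearly separable problem: the class is the sign of the first feature
X = torch.randn(64, 4)
y = (X[:, 0] > 0).long()

model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(100):
    scores = model(X)                  # forward pass
    loss = F.cross_entropy(scores, y)  # scalar loss
    optimizer.zero_grad()              # clear gradients from the previous step
    loss.backward()                    # compute gradients
    optimizer.step()                   # update parameters
    losses.append(loss.item())
```

If `losses` does not go down on a problem this easy, the bug is in the loop rather than in the model or the data.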

## PyTorch Sequential API

I kept the API for sequences of layers so that you can get familiar with it. It can be useful; in particular, it could make the network I implemented more pleasant to read. This is optional, though.

For simple models like a stack of feed forward layers, you still need to go through 3 steps: subclass `nn.Module`, assign layers to class attributes in `__init__`, and call each layer one by one in `forward()`. Is there a more convenient way? 

Fortunately, PyTorch provides a container Module called `nn.Sequential`, which merges the above steps into one. It is not as flexible as `nn.Module`, because you cannot specify more complex topology than a feed-forward stack, but it's good enough for many use cases.
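To make the comparison concrete, here is the same two-layer network written both ways. This is a sketch: `TwoLayerFC` and the hidden size of 50 are illustrative choices, and it uses the built-in `nn.Flatten`, which is available in reasonably recent PyTorch releases.

```python
import torch
import torch.nn as nn


class TwoLayerFC(nn.Module):
    """The three manual steps: subclass, assign layers, call them in forward()."""

    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = x.view(x.shape[0], -1)  # flatten (N, C, H, W) to (N, C*H*W)
        return self.fc2(torch.relu(self.fc1(x)))


module_model = TwoLayerFC(3 * 32 * 32, 50, 10)

# The same architecture as a single nn.Sequential expression
sequential_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 50),
    nn.ReLU(),
    nn.Linear(50, 10),
)

x = torch.zeros(2, 3, 32, 32)
module_scores = module_model(x)
sequential_scores = sequential_model(x)
```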

### Sequential API: Two-Layer Network
Let's see how to implement a two-layer fully connected network with `nn.Sequential`, and train it using the training loop defined above.

You don't need to tune any hyperparameters here, but you should achieve above 40% accuracy after one epoch of training.

In [None]:
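# We need to wrap the flatten operation in a module in order to stack it
# in nn.Sequential. The reshape is done inline because the standalone
# `flatten` helper is only defined in a later cell.
class Flatten(nn.Module):
    def forward(self, x):
        return x.view(x.shape[0], -1)  # (N, C, H, W) -> (N, C*H*W)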

hidden_layer_size = 4000
learning_rate = 1e-2

model = nn.Sequential(
    Flatten(),
    nn.Linear(3 * 32 * 32, hidden_layer_size),
    nn.ReLU(),
    nn.Linear(hidden_layer_size, 10),
)

# you can use Nesterov momentum in optim.SGD
optimizer = optim.SGD(model.parameters(), lr=learning_rate,
                     momentum=0.9, nesterov=True)

train_part34(model, optimizer)

### Sequential API: Three-Layer ConvNet
Here you should use `nn.Sequential` to define and train a three-layer ConvNet with the same architecture we used in Part III:

1. Convolutional layer (with bias) with 32 5x5 filters, with zero-padding of 2
2. ReLU
3. Convolutional layer (with bias) with 16 3x3 filters, with zero-padding of 1
4. ReLU
5. Fully-connected layer (with bias) to compute scores for 10 classes

The original instructions ask you to initialize the weight matrices with a `random_weight` function and the bias vectors with a `zero_weight` function. Neither is defined in this notebook, and the cell below relies on PyTorch's default layer initialization instead.

You should optimize your model using stochastic gradient descent with Nesterov momentum 0.9.

Again, you don't need to tune any hyperparameters but you should see accuracy above 55% after one epoch of training.
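The input size of the final fully-connected layer follows from the standard convolution output-size formula, `(size + 2 * padding - kernel) // stride + 1`. A short check in plain Python (`conv_output_size` is a helper written for this sketch) confirms that both convolutions preserve the 32x32 resolution:

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


h = conv_output_size(32, kernel=5, padding=2)  # first conv: 5x5, padding 2
h = conv_output_size(h, kernel=3, padding=1)   # second conv: 3x3, padding 1

# 16 output channels from the second conv, each h x h
flat_features = 16 * h * h
```

This is why the last layer is `nn.Linear(channel_2 * 32 * 32, 10)`.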

In [None]:
channel_1 = 32
channel_2 = 16
learning_rate = 1e-2

model = None
optimizer = None

model = nn.Sequential(
    nn.Conv2d(3, channel_1, (5, 5), padding=2),
    nn.ReLU(),
    nn.Conv2d(channel_1, channel_2, (3, 3), padding=1),
    nn.ReLU(),
    Flatten(),
    nn.Linear(channel_2 * 32 * 32, 10)
)

optimizer = optim.SGD(model.parameters(), lr=learning_rate,
                     momentum=0.9, nesterov=True)

train_part34(model, optimizer)

# CNN for Pose Classification

I kept this block of explanations because it contains information that is useful for you.

Useful links

* Layers in torch.nn package: http://pytorch.org/docs/stable/nn.html
* Activations: http://pytorch.org/docs/stable/nn.html#non-linear-activations
* Loss functions: http://pytorch.org/docs/stable/nn.html#loss-functions
* Optimizers: http://pytorch.org/docs/stable/optim.html


### Things you might try:
- **Filter size**: Above we used 5x5; would smaller filters be more efficient?
- **Number of filters**: Above we used 32 filters. Do more or fewer do better?
- **Pooling vs Strided Convolution**: Do you use max pooling or just stride convolutions?
- **Batch normalization**: Try adding spatial batch normalization after convolution layers and vanilla batch normalization after affine layers. Do your networks train faster?
- **Network architecture**: The network above has two layers of trainable parameters. Can you do better with a deep network? Good architectures to try include:
    - [conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
    - [conv-relu-conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
    - [batchnorm-relu-conv]xN -> [affine]xM -> [softmax or SVM]
- **Global Average Pooling**: Instead of flattening and then having multiple affine layers, perform convolutions until your image gets small (7x7 or so) and then perform an average pooling operation to get to a 1x1 image picture (1, 1 , Filter#), which is then reshaped into a (Filter#) vector. This is used in [Google's Inception Network](https://arxiv.org/abs/1512.00567) (See Table 1 for their architecture).
- **Regularization**: Add l2 weight regularization, or perhaps use Dropout.
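Of the ideas in the list above, global average pooling is probably the least obvious to write down. One way to sketch it in PyTorch is with `nn.AdaptiveAvgPool2d(1)`; the channel count of 64 and the 7x7 feature map are arbitrary values for illustration:

```python
import torch
import torch.nn as nn

gap_head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),  # (N, C, H, W) -> (N, C, 1, 1)
    nn.Flatten(),             # (N, C, 1, 1) -> (N, C)
    nn.Linear(64, 10),        # class scores straight from the pooled features
)

features = torch.randn(8, 64, 7, 7)  # stand-in for the output of the last conv layer
scores = gap_head(features)

# The pooling step alone is just a mean over the two spatial dimensions
pooled = gap_head[1](gap_head[0](features))
```

Because the pooled vector has one entry per channel, the classifier's size no longer depends on the input resolution.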

### Tips for training
For each network architecture that you try, you should tune the learning rate and other hyperparameters. When doing this there are a couple important things to keep in mind:

- If the parameters are working well, you should see improvement within a few hundred iterations
- Remember the coarse-to-fine approach for hyperparameter tuning: start by testing a large range of hyperparameters for just a few training iterations to find the combinations of parameters that are working at all.
- Once you have found some sets of parameters that seem to work, search more finely around these parameters. You may need to train for more epochs.
- You should use the validation set for hyperparameter search, and save your test set for evaluating your architecture on the best parameters as selected by the validation set.
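The coarse-to-fine procedure can be sketched as two rounds of log-uniform sampling. In the sketch below `toy_validation_score` is a hypothetical stand-in for "train briefly and return validation accuracy", and the ranges are assumptions chosen for illustration:

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Sample so that every order of magnitude in [low, high] is equally likely."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


def toy_validation_score(lr):
    # Hypothetical placeholder: pretends the best learning rate is 1e-2
    return -abs(math.log10(lr) + 2)


rng = random.Random(0)

# Coarse stage: wide range, few iterations per candidate in a real run
coarse = [sample_log_uniform(1e-5, 1e0, rng) for _ in range(20)]
best_coarse = max(coarse, key=toy_validation_score)

# Fine stage: narrow range around the best coarse candidate
fine = [sample_log_uniform(best_coarse / 3, best_coarse * 3, rng) for _ in range(20)]
best_fine = max(fine + [best_coarse], key=toy_validation_score)
```

Sampling on a log scale matters for learning rates: a uniform sample from the same range would almost never try values below 1e-2.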

### Going above and beyond
If you are feeling adventurous there are many other features you can implement to try and improve your performance. You are **not required** to implement any of these, but don't miss the fun if you have time!

- Alternative optimizers: you can try Adam, Adagrad, RMSprop, etc.
- Alternative activation functions such as leaky ReLU, parametric ReLU, ELU, or MaxOut.
- Model ensembles
- Data augmentation
- New Architectures
  - [ResNets](https://arxiv.org/abs/1512.03385) where the input from the previous layer is added to the output.
  - [DenseNets](https://arxiv.org/abs/1608.06993) where inputs into previous layers are concatenated together.
  - [This blog has an in-depth overview](https://chatbotslife.com/resnets-highwaynets-and-densenets-oh-my-9bb15918ee32)
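As an illustration of the ResNet idea from the list above, a basic residual block adds its input back to the output of two convolutions. This is a simplified sketch, not the exact block from the paper: `ResidualBlock` keeps the channel count and resolution fixed so the addition is always shape-compatible.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """conv-bn-relu-conv-bn, then add the input (skip connection) and apply ReLU."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # the skip connection


block = ResidualBlock(16)
x = torch.randn(4, 16, 32, 32)
out = block(x)
```

Swapping the optimizer is a smaller change: for example, `optim.Adam(model.parameters(), lr=1e-3)` can be passed to the training function in place of the SGD optimizer.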

In [None]:
import torch.nn.functional as F  # useful stateless functions

model = None
optimizer = None

# [conv-relu-conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]

N = 2  # number of conv-relu-conv-relu-pool blocks
M = 1  # number of hidden affine layers
learning_rate = 1e-2
conv_channels = [64, 128, 192, 256]  # output channels of the convolutional layers
affine_channels = [128]  # number of features in the hidden affine layers

def flatten(x):
    N = x.shape[0] # read in N, C, H, W
    return x.view(N, -1)  # "flatten" the C * H * W values into a single vector per image

# TODO: move the network class into a separate module
class MarvelousConvNet(nn.Module):
    def __init__(self, N, M, in_channel, conv_channels, affine_channels, num_classes):
        super().__init__()
        
        # Create the layer lists (these layers have trainable parameters)
        self.convs = nn.ModuleList() # convolutional layers
        self.affines = nn.ModuleList() # affine layers
        self.conv_norms = nn.ModuleList() # normalization after the convolutions
        self.affine_norms = nn.ModuleList() # normalization after the affine layers
        
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2d(2, stride=2)
        
        # From here to the end of the constructor: add convolutional and affine layers
        # according to the structure given by in_channel, conv_channels, affine_channels, num_classes
        self.convs.append(nn.Conv2d(in_channel, conv_channels[0], (3, 3), padding=1))
        self.convs.append(nn.Conv2d(conv_channels[0], conv_channels[1], (3, 3), padding=1))
        nn.init.kaiming_normal_(self.convs[0].weight)
        nn.init.kaiming_normal_(self.convs[1].weight)
        
        im_H ,im_W = 32, 32        
        self.conv_norms.append(nn.BatchNorm2d(conv_channels[0]))
        self.conv_norms.append(nn.BatchNorm2d(conv_channels[1]))
        
        for i in range(1, N):
            self.convs.append(nn.Conv2d(conv_channels[i * 2 - 1], conv_channels[i * 2], (3, 3), padding=1))
            self.convs.append(nn.Conv2d(conv_channels[i * 2], conv_channels[i * 2 + 1], (3, 3), padding=1))
            
            im_H, im_W = im_H // 2, im_W // 2
            self.conv_norms.append(nn.BatchNorm2d(conv_channels[i * 2]))
            self.conv_norms.append(nn.BatchNorm2d(conv_channels[i * 2 + 1]))
            
            nn.init.kaiming_normal_(self.convs[i * 2].weight)
            nn.init.kaiming_normal_(self.convs[i * 2 + 1].weight)
            
            
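        # Spatial size after N 2x2 max-pools on a 32x32 input
        d = 32 // 2 ** N
        # The first affine layer sees the flattened output of the last conv layer;
        # each later one sees the output of the previous affine layer
        in_features = conv_channels[2 * N - 1] * d * d
        for i in range(0, M):
            self.affines.append(nn.Linear(in_features, affine_channels[i]))
            self.affine_norms.append(nn.BatchNorm1d(affine_channels[i]))
            
            nn.init.kaiming_normal_(self.affines[i].weight)
            in_features = affine_channels[i]
            
        self.affines.append(nn.Linear(in_features, num_classes))
        nn.init.kaiming_normal_(self.affines[-1].weight)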

    def forward(self, x):
        scores = x
        
        # TODO: replace these clumsy explicit loops with nn.Sequential blocks (lowest priority)
        # Each block is conv-norm-relu twice followed by max pooling;
        # the block count is derived from the layer list, not from a global
        for i in range(len(self.convs) // 2):
            scores = self.relu(self.conv_norms[i * 2](self.convs[i * 2](scores)))
            scores = self.relu(self.conv_norms[i * 2 + 1](self.convs[i * 2 + 1](scores)))
            scores = self.maxpool(scores)
            
        scores = flatten(scores)
        
        # Hidden affine layers: one per entry in affine_norms
        for i in range(len(self.affine_norms)):
            scores = self.relu(self.affine_norms[i](self.affines[i](scores)))
            
        # Final affine layer produces the class scores
        scores = self.affines[-1](scores)

        return scores
    
model = MarvelousConvNet(N, M, 3, conv_channels, affine_channels, 10)
optimizer = optim.SGD(model.parameters(), lr=learning_rate,
                     momentum=0.9, nesterov=True)

# Accuracy should be above 70%
train_part34(model, optimizer, epochs=10)
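The TODO about replacing the explicit loops in `forward` could be approached roughly as follows. This is a sketch under stated assumptions, not a drop-in replacement: `conv_block` is a hypothetical helper, the channel sizes mirror `conv_channels` and `affine_channels` from the cell above, `nn.Flatten` is used in place of the `flatten` function, and the Kaiming initialization is left out for brevity.

```python
import torch
import torch.nn as nn


def conv_block(in_ch, mid_ch, out_ch):
    """One conv-norm-relu-conv-norm-relu-pool block as a single module."""
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, 3, padding=1),
        nn.BatchNorm2d(mid_ch),
        nn.ReLU(),
        nn.Conv2d(mid_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        nn.MaxPool2d(2, stride=2),  # halves the spatial resolution
    )


features = nn.Sequential(
    conv_block(3, 64, 128),     # 32x32 -> 16x16
    conv_block(128, 192, 256),  # 16x16 -> 8x8
)
classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(256 * 8 * 8, 128),
    nn.BatchNorm1d(128),
    nn.ReLU(),
    nn.Linear(128, 10),
)
net = nn.Sequential(features, classifier)

x = torch.randn(4, 3, 32, 32)
feature_maps = features(x)
scores = net(x)
```

With this layout the forward pass needs no index arithmetic, and the structure of the network is visible from the constructor alone.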

## Describe what you did 

In the cell below you should write an explanation of what you did, any additional features that you implemented, and/or any graphs that you made in the process of training and evaluating your network.

TODO: Describe what you did

## Test set -- run this only once

Now that we have a result we are happy with, we evaluate the final model (which you should store in `best_model`) on the test set. Think about how this compares to your validation set accuracy.

In [None]:
best_model = model
check_accuracy_part34(loader_test, best_model)
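The cell above aliases `best_model` to `model`, which means the test run uses whatever weights the final training iteration left behind. If validation accuracy peaked earlier, one option is to snapshot the weights whenever it improves. The sketch below uses made-up validation accuracies and a tiny linear model purely for illustration:

```python
import copy

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
best_val_acc = -1.0
best_state = None

# Hypothetical per-epoch validation accuracies (made up for this sketch)
fake_val_accs = [0.55, 0.71, 0.68]

for epoch, val_acc in enumerate(fake_val_accs):
    with torch.no_grad():
        model.weight.fill_(float(epoch))  # stand-in for one epoch of training
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        # deepcopy is needed: state_dict() returns references to the live tensors
        best_state = copy.deepcopy(model.state_dict())

# Rebuild the best model from the saved snapshot
best_model = nn.Linear(4, 2)
best_model.load_state_dict(best_state)
```

Here the snapshot comes from the second epoch, even though training continued afterwards.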