<img src="https://s8.hostingkartinok.com/uploads/images/2018/08/308b49fcfbc619d629fe4604bceb67ac.jpg" width="500" height="450">
<h3 style="text-align: center;"><b>Phystech School of Applied Mathematics and Informatics (PSAMI) MIPT</b></h3>

---

<h2 style="text-align: center;"><b>Practice: Clothes recognition</b></h2>

---

Welcome to the 6th lesson's practice!  

We assume that you have already gone through our lesson on multilayer neural networks. Here we will learn how to train an MLP (multilayer perceptron) to recognize different types of clothing.

<h2 style="text-align: center;"><b>FashionMNIST</b></h2>

<img src="https://emiliendupont.github.io/imgs/mnist-chicken/mnist-and-fashion-examples.png">

On the right of the picture you can see the dataset we will work with: grayscale images of clothes at 28x28 resolution. On the left is a more common starter dataset, MNIST, which contains handwritten digits. The methods that you will use here are applicable to any low-resolution multiclass grayscale dataset.

<h3 style="text-align: center;"><b>Original dataset: https://github.com/zalandoresearch/fashion-mnist#get-the-data</b></h3> 

<h2 style="text-align: center;"><b>Data</b></h2>

We want to predict the clothes type label `y` given its grayscale image `X`. Let's load the data:

In [0]:
import torch
import torchvision

In [0]:
from torchvision import transforms

BATCH_SIZE = 4

transform = transforms.Compose([transforms.ToTensor()])

trainset = torchvision.datasets.FashionMNIST(root='./data', train=True, 
                                             download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=BATCH_SIZE, 
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.FashionMNIST(root='./data', train=False, 
                                            download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=BATCH_SIZE, 
                                         shuffle=False, num_workers=2)

classes = tuple(str(i) for i in range(10))

Here we downloaded the dataset using the `torchvision.datasets` module and created separate `DataLoader` instances for the train and test sets. We use `transforms.ToTensor()` to convert the input images to `torch.Tensor` and set `batch_size` to `BATCH_SIZE` (4 in the cell above). What is the batch size? Check out the 7th lecture of our course about training neural networks!

---

**A mini-batch (or just "batch")** is a subset of the original dataset. A mini-batch usually has a small number of elements, e.g. 32, 64 or 128 (compared to thousands in the original dataset). To avoid any dependence on the order of the data (time series data being the exception), a batch is usually constructed by random sampling from the original dataset.

The motivation for using batches instead of the full dataset is as follows: the gradient steps of optimization methods such as SGD or Adam are too noisy when computed from only one element and too computationally expensive when computed from the full dataset. That's why we use stochastic optimization with just a portion of the dataset at each step.
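
As a rough illustration of the random sampling described above, the sketch below shuffles the indices of a toy dataset and cuts them into mini-batches using only the Python standard library. The function name `make_batches` and the toy sizes are assumptions made for this example; in this notebook a `DataLoader` with `shuffle=True` plays a similar role.

```python
import random


def make_batches(num_samples, batch_size, seed=0):
    """Split shuffled sample indices into mini-batches (hypothetical helper)."""
    indices = list(range(num_samples))
    # Shuffle so that batches do not depend on the original ordering.
    random.Random(seed).shuffle(indices)
    # Slice the shuffled indices into consecutive chunks of batch_size.
    return [indices[start:start + batch_size]
            for start in range(0, num_samples, batch_size)]


batches = make_batches(num_samples=10, batch_size=4)
print(len(batches))               # number of mini-batches
print([len(b) for b in batches])  # the last batch may be smaller
```

Every index appears in exactly one batch, so one pass over `batches` visits the whole toy dataset once.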

**One iteration** of the optimizer is one update step computed on **one batch**.  
**One epoch** of the optimizer is one pass over **all the batches (the whole training set)**. 

If we have 60000 objects and `batch_size = 64`, then one epoch takes 60000 / 64 = 937.5, rounded up to 938 iterations (the last batch is smaller than the others).
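
The iteration count above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are chosen for this example, and it assumes the last, incomplete batch is still processed (the default `DataLoader` behaviour, `drop_last=False`).

```python
import math

num_objects = 60000
batch_size = 64

# Full batches plus one smaller batch for the remaining objects.
iterations_per_epoch = math.ceil(num_objects / batch_size)
last_batch_size = num_objects - (iterations_per_epoch - 1) * batch_size

print(iterations_per_epoch)  # 938
print(last_batch_size)       # 32
```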

---

In [0]:
trainloader.dataset.data.shape

In [0]:
testloader.dataset.data.shape

Let's see the first image:

In [0]:
# torch.Tensor to numpy array
numpy_img = trainloader.dataset.data[0].numpy()

In [0]:
import matplotlib.pyplot as plt
import numpy as np

plt.imshow(numpy_img);

In [0]:
# play around with this cell to draw a random image
i = np.random.randint(low=0, high=60000)

plt.imshow(trainloader.dataset.data[i].numpy(), cmap='gray');

We can iterate over the dataloader this way:

In [0]:
for data in trainloader:
    print(data)
    break

<h2 style="text-align: center;"><b>PyTorch Neural Network for clothes recognition</b></h2>

Implement the architecture of your neural network:

In [0]:
net = torch.nn.Sequential(
    ???
)

Multiclass classification loss:

In [0]:
# reduction='sum' replaces the deprecated size_average=False (loss is summed over the batch)
loss_fn = torch.nn.CrossEntropyLoss(reduction='sum')
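
To make the loss more concrete, here is a minimal pure-Python sketch of what cross-entropy computes for a batch of raw scores (logits): a softmax followed by the negative log-probability of the correct class, summed over the batch as in the cell above. The helper `cross_entropy_sum` and the toy numbers are assumptions for illustration; this is not how PyTorch implements the loss internally.

```python
import math


def cross_entropy_sum(logits, targets):
    """Summed cross-entropy for a batch of logits (illustrative helper)."""
    total = 0.0
    for scores, target in zip(logits, targets):
        # Subtract the max score for numerical stability of the softmax.
        shift = max(scores)
        log_norm = shift + math.log(sum(math.exp(s - shift) for s in scores))
        # Negative log-probability of the correct class.
        total += log_norm - scores[target]
    return total


# Two toy objects with three classes each.
logits = [[2.0, 0.5, 0.1], [0.0, 0.0, 0.0]]
targets = [0, 2]
loss = cross_entropy_sum(logits, targets)
print(loss)
```

The second object has equal scores for all three classes, so it contributes exactly `log(3)` to the sum, whichever class is correct.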

We are ready to train the network (Hint: check the seminar materials):

In [0]:
NUM_EPOCHS = 100

learning_rate = 1e-4
optimizer = torch.optim.SGD(net.parameters(), lr=learning_rate)

for epoch_num in range(NUM_EPOCHS):
    for X_batch, y_batch in trainloader:
        
        # the batch generator returns images as matrices
        # for an MLP we need the pixels as a row feature vector,
        # so we "unroll" each 28x28 image into a 784-dimensional vector
        # (X_batch.shape[0] also handles a smaller last batch)
        X_batch = X_batch.view(X_batch.shape[0], -1)
            
        # forward pass
        <Your code here>

        # backward pass
        <Your code here>
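
The "unrolling" mentioned in the comments of the training loop can be pictured without PyTorch: a 28x28 image stored as rows is concatenated row by row into one vector of 784 values. The sketch below uses plain Python lists and a made-up image whose pixel values encode their position; the row-by-row order is how `view` reads a contiguous tensor.

```python
HEIGHT, WIDTH = 28, 28

# Toy "image": the pixel at (row, col) stores row * WIDTH + col.
image = [[row * WIDTH + col for col in range(WIDTH)] for row in range(HEIGHT)]

# Concatenate the rows into a single feature vector.
flat = [pixel for row in image for pixel in row]

print(len(flat))                           # 784
print(flat[3 * WIDTH + 5] == image[3][5])  # True
```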

We've trained our network. Let's look at the per-class accuracy on the training set:

In [0]:
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 
           'Sandal', 'Shirt', 'Sneaker','Bag', 'Ankle boot']

with torch.no_grad():
    for X_batch, y_batch in trainloader:
        y_pred = net(X_batch.view(X_batch.shape[0], -1))
        _, predicted = torch.max(y_pred, 1)
        c = (predicted == y_batch).squeeze()
        for i in range(len(y_pred)):
            label = y_batch[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1

for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))

And on the test set:

In [0]:
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 
           'Sandal', 'Shirt', 'Sneaker','Bag', 'Ankle boot']

with torch.no_grad():
    for X_batch, y_batch in testloader:
        y_pred = net(X_batch.view(X_batch.shape[0], -1))
        _, predicted = torch.max(y_pred, 1)
        c = (predicted == y_batch).squeeze()
        for i in range(len(y_pred)):
            label = y_batch[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1

for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))

Try to tune the hyperparameters and the network architecture to reach an accuracy of 0.9 or higher for each class!

#### Optional task

Implement a function that takes the index of an image in the training set and displays the image together with the neural network's prediction for it.

In [0]:
def visualize(index):
    <Your code here>

In [0]:
# Test the function here

---

**Point thresholds for this homework will be published on Canvas.**

<h3 style="text-align: center;"><b>Further reading</b></h3>

*Kaggle kernels for FashionMNIST: https://www.kaggle.com/zalando-research/fashionmnist/kernels*