Course "Deep Learning". Homework 2. Multilayer neural network. I. P. Katorgin.

## Task

Build a model based on fully connected layers to classify Fashion MNIST from the torchvision library (https://docs.pytorch.org/vision/stable/datasets.html).

Achieve a test-set accuracy of at least 88%.

Instructions
1. Download the training and test parts of the Fashion MNIST dataset.
2. Build a model, choosing a starting architecture.
3. Train the model and compare its test accuracy with the required threshold.
4. Modify the model architecture until the test accuracy exceeds the threshold. Architecture variations can include changing the number of layers, changing the number of neurons per layer, and adding regularization. Different optimizers may also be used.

In [1]:
# load libraries
import pandas as pd, numpy as np, matplotlib.pyplot as plt, seaborn as sns, time
%matplotlib inline

In [2]:
import torch
import torchvision as tv
from torch import autograd

In [3]:
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, random_split, DataLoader

Load the dataset

In [4]:
BATCH_SIZE=256

In [5]:
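# load the dataset: the task specifies Fashion MNIST, i.e. tv.datasets.FashionMNIST
# (the outputs shown in this notebook were produced with tv.datasets.MNIST)
train_dataset = tv.datasets.FashionMNIST('.', train=True, transform=tv.transforms.ToTensor(), download=True)
test_dataset = tv.datasets.FashionMNIST('.', train=False, transform=tv.transforms.ToTensor(), download=True)
# reshuffle the training set every epoch; the order of the test set does not matter
train = torch.utils.data.DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
test = torch.utils.data.DataLoader(test_dataset, batch_size=BATCH_SIZE)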

In [6]:
train_dataset[0][0].shape

torch.Size([1, 28, 28])
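Each sample is a `[1, 28, 28]` tensor, so a batch from the loader is `[B, 1, 28, 28]`. `torch.nn.Flatten()` with its default `start_dim=1` keeps the batch dimension and merges the rest, which is why the first linear layer below has `in_features=784`. A minimal check on a dummy batch (random values, not real images):

```python
import torch

# dummy batch shaped like the loader output: 4 images, 1 channel, 28x28 pixels
batch = torch.rand(4, 1, 28, 28)

flat = torch.nn.Flatten()(batch)  # start_dim=1: keep dim 0 (batch), merge 1 * 28 * 28
print(flat.shape)  # torch.Size([4, 784])
```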

Create the model

In [7]:
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10)
)

In [8]:
model

Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=784, out_features=256, bias=True)
  (2): ReLU()
  (3): Linear(in_features=256, out_features=10, bias=True)
)
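A quick way to gauge the size of this baseline is to count its trainable parameters: each `Linear(n_in, n_out)` holds `n_in * n_out` weights plus `n_out` biases, and `Flatten`/`ReLU` have none. A sketch that rebuilds the same architecture and counts them:

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)

# sum the number of elements of every weight and bias tensor
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 203530 = 784*256 + 256 + 256*10 + 10
```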

In [9]:
loss = torch.nn.CrossEntropyLoss()
trainer = torch.optim.SGD(model.parameters(), lr=.01)
num_epochs = 10
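`CrossEntropyLoss` expects raw, unnormalized scores (logits) and integer class labels, which is why the model ends with a plain `Linear` layer and no softmax. It is equivalent to `log_softmax` followed by the negative log-likelihood loss; a small check on random logits:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)               # 5 samples, 10 classes, unnormalized scores
targets = torch.tensor([0, 3, 9, 1, 7])   # integer class indices

ce = torch.nn.CrossEntropyLoss()(logits, targets)           # mean over the batch
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)  # same computation, two steps
print(torch.allclose(ce, manual))  # True
```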

In [10]:
def train_model():
    """Train the global `model` with the global `trainer` and `loss` for `num_epochs` epochs,
    printing the train/test loss and accuracy after every epoch."""
    for ep in range(num_epochs):
        train_iters, train_passed = 0, 0
        train_loss, train_acc = 0., 0.
        start = time.time()

        # training pass: dropout and batchnorm layers behave in training mode
        model.train()
        for X, y in train:
            trainer.zero_grad()
            y_pred = model(X)
            l = loss(y_pred, y)
            l.backward()
            trainer.step()
            train_loss += l.item()
            train_acc += (y_pred.argmax(dim=1) == y).sum().item()
            train_iters += 1
            train_passed += len(X)

        # evaluation pass: inference mode, no gradient tracking on the test set
        test_iters, test_passed = 0, 0
        test_loss, test_acc = 0., 0.
        model.eval()
        with torch.no_grad():
            for X, y in test:
                y_pred = model(X)
                l = loss(y_pred, y)
                test_loss += l.item()
                test_acc += (y_pred.argmax(dim=1) == y).sum().item()
                test_iters += 1
                test_passed += len(X)

        print("ep: {}, took: {:.3f}, train_loss: {}, train_acc: {}, test_loss: {}, test_acc: {}".format(
            ep, time.time() - start, train_loss / train_iters, train_acc / train_passed,
            test_loss / test_iters, test_acc / test_passed)
        )
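`train_model` reads `model`, `trainer`, `loss` and the loaders from global variables, so every experiment below has to overwrite them. A sketch of an evaluation helper that takes them as arguments instead; `evaluate` is a hypothetical name, and the synthetic `TensorDataset` of random images and labels only stands in for the real test loader:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, loss_fn):
    """Return (mean batch loss, accuracy) of `model` on `loader` without tracking gradients."""
    model.eval()  # inference behaviour for dropout / batchnorm
    total_loss, correct, seen, batches = 0.0, 0, 0, 0
    with torch.no_grad():
        for X, y in loader:
            logits = model(X)
            total_loss += loss_fn(logits, y).item()
            correct += (logits.argmax(dim=1) == y).sum().item()
            seen += len(X)
            batches += 1
    return total_loss / batches, correct / seen

# synthetic stand-in data: 64 random "images" with random labels
torch.manual_seed(0)
data = TensorDataset(torch.rand(64, 1, 28, 28), torch.randint(0, 10, (64,)))
loader = DataLoader(data, batch_size=16)
net = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 10))

test_loss, test_acc = evaluate(net, loader, torch.nn.CrossEntropyLoss())
print(test_loss, test_acc)
```

Passing the model explicitly also makes it easy to compare several trained models side by side.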

Train the model

In [11]:
train_model()

ep: 0, taked: 13.670, train_loss: 2.026048896667805, train_acc: 0.5403, test_loss: 1.6527101933956145, test_acc: 0.749
ep: 1, taked: 12.835, train_loss: 1.3075380431844834, train_acc: 0.7852333333333333, test_loss: 0.9819915056228637, test_acc: 0.8271
ep: 2, taked: 13.187, train_loss: 0.8469145682263881, train_acc: 0.8328333333333333, test_loss: 0.6963753223419189, test_acc: 0.8546
ep: 3, taked: 13.332, train_loss: 0.6522842730613465, train_acc: 0.8536166666666667, test_loss: 0.5675402142107486, test_acc: 0.8688
ep: 4, taked: 12.915, train_loss: 0.5549589166615871, train_acc: 0.8661833333333333, test_loss: 0.49591019935905933, test_acc: 0.8787
ep: 5, taked: 12.810, train_loss: 0.49699936534496064, train_acc: 0.87555, test_loss: 0.450462743267417, test_acc: 0.8854
ep: 6, taked: 12.528, train_loss: 0.4585243506634489, train_acc: 0.8817333333333334, test_loss: 0.41913165226578714, test_acc: 0.8901
ep: 7, taked: 13.486, train_loss: 0.43106964959743177, train_acc: 0.8865166666666666, test_l

Replace SGD with Adam and RMSprop

In [12]:
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10)
)

In [13]:
trainer = torch.optim.Adam(model.parameters(), lr=.01)
train_model()

ep: 0, taked: 10.593, train_loss: 0.25133316168918257, train_acc: 0.9229166666666667, test_loss: 0.17128781897481532, test_acc: 0.9423
ep: 1, taked: 11.243, train_loss: 0.1025048160687723, train_acc: 0.9688166666666667, test_loss: 0.14138924330472946, test_acc: 0.9579
ep: 2, taked: 10.449, train_loss: 0.07317300830036402, train_acc: 0.9773333333333334, test_loss: 0.11774192083394155, test_acc: 0.9675
ep: 3, taked: 10.953, train_loss: 0.05703299590683681, train_acc: 0.9813333333333333, test_loss: 0.11238132262624276, test_acc: 0.9706
ep: 4, taked: 12.990, train_loss: 0.05468047473460753, train_acc: 0.9823666666666667, test_loss: 0.1433022439829074, test_acc: 0.9646
ep: 5, taked: 10.009, train_loss: 0.04970085704302851, train_acc: 0.9841333333333333, test_loss: 0.12291293297967058, test_acc: 0.972
ep: 6, taked: 10.832, train_loss: 0.044771621733943515, train_acc: 0.986, test_loss: 0.10972942423977657, test_acc: 0.9748
ep: 7, taked: 10.984, train_loss: 0.03958725530178623, train_acc: 0.98

In [14]:
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10)
)

In [15]:
trainer = torch.optim.RMSprop(model.parameters(), lr=.01)
train_model()

ep: 0, taked: 10.383, train_loss: 0.9400718860803766, train_acc: 0.9010333333333334, test_loss: 0.31108668316155674, test_acc: 0.8977
ep: 1, taked: 10.148, train_loss: 0.14530890255056797, train_acc: 0.9564, test_loss: 0.19338047690689564, test_acc: 0.9386
ep: 2, taked: 10.144, train_loss: 0.10991501729301316, train_acc: 0.9665333333333334, test_loss: 0.23763030068948865, test_acc: 0.9311
ep: 3, taked: 9.770, train_loss: 0.09074426028441558, train_acc: 0.97295, test_loss: 0.1383802484764601, test_acc: 0.9623
ep: 4, taked: 9.670, train_loss: 0.0674053361778088, train_acc: 0.9789666666666667, test_loss: 0.12567196328018326, test_acc: 0.9661
ep: 5, taked: 10.184, train_loss: 0.060507484250999194, train_acc: 0.9821166666666666, test_loss: 0.16240115629771026, test_acc: 0.962
ep: 6, taked: 10.176, train_loss: 0.05011923901261167, train_acc: 0.98425, test_loss: 0.18010700640443247, test_acc: 0.9607
ep: 7, taked: 10.132, train_loss: 0.042668900713126394, train_acc: 0.9861666666666666, test_lo
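The cells above re-create the model before each optimizer so that Adam and RMSprop start from untrained weights, but the random initialization still differs between runs. A sketch that also fixes the seed, so every optimizer starts from identical weights; `make_model` is a hypothetical helper, and a single random batch stands in for the data (only one optimizer step is taken per run):

```python
import torch

def make_model():
    # same architecture as the SGD / Adam / RMSprop experiments
    return torch.nn.Sequential(
        torch.nn.Flatten(),
        torch.nn.Linear(784, 256),
        torch.nn.ReLU(),
        torch.nn.Linear(256, 10),
    )

optimizers = {
    "SGD": lambda params: torch.optim.SGD(params, lr=0.01),
    "Adam": lambda params: torch.optim.Adam(params, lr=0.01),
    "RMSprop": lambda params: torch.optim.RMSprop(params, lr=0.01),
}

X = torch.rand(32, 1, 28, 28)          # one synthetic batch, shared by all runs
y = torch.randint(0, 10, (32,))
loss_fn = torch.nn.CrossEntropyLoss()

initial, loss_after_step = {}, {}
for name, make_opt in optimizers.items():
    torch.manual_seed(42)              # identical initial weights for every optimizer
    model = make_model()
    opt = make_opt(model.parameters())
    initial[name] = model[1].weight.detach().clone()
    opt.zero_grad()
    loss_fn(model(X), y).backward()
    opt.step()
    with torch.no_grad():
        loss_after_step[name] = loss_fn(model(X), y).item()

print(loss_after_step)
```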

Add more layers and use Adam

In [16]:
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10)
)
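Since step 4 of the task is to vary the number of layers and neurons, it can help to generate such networks from a list of layer widths instead of writing each `Sequential` by hand. A sketch with a hypothetical `make_mlp` helper that reproduces the architecture of the cell above:

```python
import torch

def make_mlp(sizes):
    """Build Flatten -> (Linear -> ReLU) * k -> Linear from a list of layer widths."""
    layers = [torch.nn.Flatten()]
    for n_in, n_out in zip(sizes[:-2], sizes[1:-1]):   # hidden layers
        layers += [torch.nn.Linear(n_in, n_out), torch.nn.ReLU()]
    layers.append(torch.nn.Linear(sizes[-2], sizes[-1]))  # output layer: raw logits
    return torch.nn.Sequential(*layers)

model = make_mlp([784, 512, 256, 128, 10])
print(model)
print(sum(p.numel() for p in model.parameters()))  # 567434
```

The deeper network has roughly 2.8 times as many parameters as the 784-256-10 baseline (203530).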

In [17]:
trainer = torch.optim.Adam(model.parameters(), lr=.01)
train_model()

ep: 0, taked: 13.268, train_loss: 0.3182741059426297, train_acc: 0.90145, test_loss: 0.15008386662229894, test_acc: 0.956
ep: 1, taked: 13.191, train_loss: 0.13021310661186247, train_acc: 0.96255, test_loss: 0.15613317450915928, test_acc: 0.9585
ep: 2, taked: 13.430, train_loss: 0.0974827642602093, train_acc: 0.9720333333333333, test_loss: 0.1108994363945385, test_acc: 0.9725
ep: 3, taked: 15.463, train_loss: 0.08338044207107197, train_acc: 0.97635, test_loss: 0.13898571890895256, test_acc: 0.9631
ep: 4, taked: 15.469, train_loss: 0.06939768251229791, train_acc: 0.97935, test_loss: 0.13369896282674745, test_acc: 0.9665
ep: 5, taked: 15.580, train_loss: 0.07496216414010826, train_acc: 0.97865, test_loss: 0.15325484235581827, test_acc: 0.9684
ep: 6, taked: 14.247, train_loss: 0.06522328020915627, train_acc: 0.9818166666666667, test_loss: 0.14284539262007456, test_acc: 0.971
ep: 7, taked: 14.995, train_loss: 0.06057288072943846, train_acc: 0.9839166666666667, test_loss: 0.1426355266536120

Add batch normalization layers, then dropout layers (in a wider network)

In [18]:
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784, 512),
    torch.nn.ReLU(),
    torch.nn.BatchNorm1d(512),
    torch.nn.Linear(512, 256),
    torch.nn.ReLU(),
    torch.nn.BatchNorm1d(256),
    torch.nn.Linear(256, 128),
    torch.nn.ReLU(),
    torch.nn.BatchNorm1d(128),
    torch.nn.Linear(128, 10)
)
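In training mode `BatchNorm1d` normalizes each feature with the mean and variance of the current batch and updates running averages (default `momentum=0.1`); in `eval()` mode it uses those running averages instead. This is one reason `train_model` switches between `model.train()` and `model.eval()`. A small demonstration on random activations:

```python
import torch

torch.manual_seed(0)
bn = torch.nn.BatchNorm1d(3)              # 3 features; affine weight=1, bias=0 at init
x = torch.randn(256, 3) * 5 + 10          # per-feature mean ~10, std ~5

bn.train()
out_train = bn(x)                         # normalized with this batch's statistics
print(out_train.mean(dim=0))              # close to 0 for every feature
print(bn.running_mean)                    # 0.9 * 0 + 0.1 * batch mean, i.e. about 1

bn.eval()
out_eval = bn(x)                          # normalized with running_mean / running_var
print(torch.allclose(out_train, out_eval))  # False: the two modes use different statistics
```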

In [19]:
trainer = torch.optim.Adam(model.parameters(), lr=.01)
train_model()

ep: 0, taked: 13.173, train_loss: 0.1995008583794883, train_acc: 0.9379, test_loss: 0.12515610187547282, test_acc: 0.9602
ep: 1, taked: 13.287, train_loss: 0.10008004195988178, train_acc: 0.9694333333333334, test_loss: 0.12730557382456026, test_acc: 0.9613
ep: 2, taked: 13.342, train_loss: 0.07126081689320346, train_acc: 0.97755, test_loss: 0.1177482103346847, test_acc: 0.9642
ep: 3, taked: 13.535, train_loss: 0.05215304958772786, train_acc: 0.9829333333333333, test_loss: 0.12125980069977231, test_acc: 0.9665
ep: 4, taked: 13.333, train_loss: 0.039636359859178676, train_acc: 0.9870666666666666, test_loss: 0.11033004635755787, test_acc: 0.9715
ep: 5, taked: 13.412, train_loss: 0.034412419186667244, train_acc: 0.9887833333333333, test_loss: 0.13299535576661584, test_acc: 0.9682
ep: 6, taked: 13.354, train_loss: 0.029945388161379132, train_acc: 0.9902833333333333, test_loss: 0.10614263873721938, test_acc: 0.976
ep: 7, taked: 13.584, train_loss: 0.027584979068388804, train_acc: 0.990683333

In [20]:
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784, 2560),
    torch.nn.ReLU(),
    torch.nn.Dropout(0.5),
    torch.nn.Linear(2560, 1280),
    torch.nn.ReLU(),
    torch.nn.Dropout(0.5),
    torch.nn.Linear(1280, 640),
    torch.nn.ReLU(),
    torch.nn.Dropout(0.5),
    torch.nn.Linear(640, 10)
)
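During training, `Dropout(p)` zeroes each element with probability `p` and scales the survivors by `1 / (1 - p)`; in `eval()` mode it is the identity. With `p=0.5` this also explains why, in the run below, `train_acc` (measured with dropout active) stays below `test_acc` (measured without it). A minimal check:

```python
import torch

torch.manual_seed(0)
drop = torch.nn.Dropout(0.5)
x = torch.ones(1000)

drop.train()
y_train = drop(x)        # about half the entries zeroed, survivors scaled by 1 / (1 - 0.5) = 2
print(y_train.unique())  # tensor([0., 2.])

drop.eval()
y_eval = drop(x)         # identity at inference time
print(torch.equal(y_eval, x))  # True
```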

In [21]:
trainer = torch.optim.Adam(model.parameters(), lr=.01)
train_model()

ep: 0, taked: 66.732, train_loss: 0.8713837563991547, train_acc: 0.7376166666666667, test_loss: 0.33000228377059104, test_acc: 0.9134
ep: 1, taked: 65.061, train_loss: 0.6526616034989662, train_acc: 0.8153833333333333, test_loss: 0.32608561287634075, test_acc: 0.9238
ep: 2, taked: 69.815, train_loss: 0.6956393248223244, train_acc: 0.80965, test_loss: 0.3344119225628674, test_acc: 0.9182
ep: 3, taked: 87.071, train_loss: 0.7268584770091037, train_acc: 0.7988333333333333, test_loss: 0.34029333824291824, test_acc: 0.916
ep: 4, taked: 81.369, train_loss: 0.7079244357474307, train_acc: 0.8076833333333333, test_loss: 0.41874344954267145, test_acc: 0.8943
ep: 5, taked: 83.672, train_loss: 0.730344609377232, train_acc: 0.7988666666666666, test_loss: 0.39118291400372984, test_acc: 0.9083
ep: 6, taked: 82.835, train_loss: 0.7877896130084991, train_acc: 0.7883166666666667, test_loss: 0.3731915856711566, test_acc: 0.9046
ep: 7, taked: 84.834, train_loss: 0.782990936776425, train_acc: 0.78695, test

Conclusions:
1. A multilayer neural network was built and trained.
2. The model was improved in several ways (replacing SGD with Adam and RMSprop, adding layers, batch normalization and dropout), reaching a test accuracy above the 88% threshold.