# <font style="color:blue">Table of Contents</font>

- [Launch TensorBoard](#launch)
- [Data Processing Utilities](#utils)
- [Adding Data Embeddings / Projector](#embeds)
- [System Configuration](#sys-config)
- [Training Configuration](#train-config)
- [System Setup](#sys-setup)
- [Add PR Curves to TensorBoard](#pr-curves)
- [Send Wrong Predictions to TensorBoard](#wrong-preds)
- [Training Function](#train-fn)
- [Validation Function](#validate-fn)
- [Add Weight Histograms and Network Graph](#hist)
- [Main Function for Training and Validation](#main)
- [Optimizer and Scheduler](#optim)
- [ResNet Model](#model)
- [Transfer Learning](#tl)
- [Fine-Tuning](#fine-tune)

# <font style="color:blue">Transfer Learning and Fine-Tuning</font>

Learn how to adapt a pretrained model to a different task.

When we train a network from scratch, we face two limitations:

- A huge amount of data is required. Because the network has millions of parameters, finding an optimal set of parameters takes a lot of data.

- High computational power is required. Even with enough data, training generally needs many iterations, which takes a toll on compute resources.

Pretrained models are trained on very large-scale image classification problems. Their convolutional layers act as a feature extractor, and their fully connected layers act as a classifier.

Because these very large models have seen a huge number of images, they tend to learn good, discriminative features. We can either use the convolutional layers purely as a feature extractor and replace the last layer to suit the problem at hand, or we can tune the already trained convolutional layers to fit the problem. The first approach is called **Transfer Learning** and the second **Fine-tuning**.

To fine-tune a network, we only adjust the parameters of an already trained network so that it adapts to the new task. The initial layers of a network learn very general features, but as we move deeper into the network, the layers tend to learn patterns that are more specific to the task they were trained on. For fine-tuning, we therefore freeze the initial layers (leave them untouched) and retrain only the later layers for the new task.

Fine-tuning thus avoids both of the limitations mentioned above:

1. Training does not need much data, because:

 - First, we are not training the entire network.

 - Second, the part that is trained is not trained from scratch.

2. Fewer parameters need to be updated, so less time is needed as well.


In general, when we have a small training set and the model was pretrained on a similar task, we use transfer learning. If we have enough data, however, we fine-tune the convolutional layers so that they learn more robust features relevant to our problem. For a detailed overview of fine-tuning and transfer learning, [**click here**](http://cs231n.github.io/transfer-learning/).
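
To make the distinction concrete, here is a minimal sketch that freezes layers in a tiny stand-in network and counts the trainable parameters. The `backbone` and `head` modules are hypothetical and are not the ResNet used later in this notebook; they only illustrate how `requires_grad` separates the two approaches.

```python
import torch.nn as nn


def count_trainable(model: nn.Module) -> int:
    """Number of parameters that would receive gradient updates."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Hypothetical stand-in for a pretrained convolutional feature extractor.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3), nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3), nn.ReLU(),
)
# New classifier for a 3-class problem.
head = nn.Linear(16, 3)
model = nn.Sequential(backbone, nn.AdaptiveAvgPool2d(1), nn.Flatten(), head)

total = count_trainable(model)

# Transfer learning: freeze the whole feature extractor, train only the head.
for p in backbone.parameters():
    p.requires_grad = False
tl_trainable = count_trainable(model)

# Fine-tuning: additionally unfreeze the later convolutional layer.
for p in backbone[2].parameters():
    p.requires_grad = True
ft_trainable = count_trainable(model)

print(total, tl_trainable, ft_trainable)
```

In this toy setup the frozen variant updates only the classifier, while the fine-tuned variant also updates the later convolution and leaves the first one untouched.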

In [1]:
%matplotlib inline

In [2]:
import matplotlib.pyplot as plt  # one of the most widely used plotting libraries for Python
plt.style.use('ggplot')

In [3]:
import os
import time

from typing import Iterable
from dataclasses import dataclass

import multiprocessing as mp
mp.set_start_method('spawn', force=True)

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
torch.multiprocessing.set_start_method('spawn', force=True)

from torchvision import datasets, transforms, models

from torch.optim import lr_scheduler

from torch.utils.tensorboard import SummaryWriter

# <font style="color:blue">Launch TensorBoard</font><a name="launch"></a>

Once training has started, use the refresh button on the dashboard to view the training metrics in real time.

[Here is a link to the tensorboard.dev logs](https://tensorboard.dev/experiment/gEH87smeTpKbyVOdgRVknA/).

**Note:** At the time these logs were uploaded, tensorboard.dev supported only `scalars`, `graphs`, `histograms`, `distributions`, and `hparams`. The link therefore has no `images`, `pr-curves`, or `projectors`.

In [4]:
!tensorboard --version

2.16.2


In [5]:
!wget https://bin.equinox.io/c/bNyj1mQVY4c/ngrok-v3-stable-linux-amd64.tgz
!tar xf ./ngrok-v3-stable-linux-amd64.tgz -C /usr/local/bin

--2024-10-01 17:16:36--  https://bin.equinox.io/c/bNyj1mQVY4c/ngrok-v3-stable-linux-amd64.tgz
Resolving bin.equinox.io (bin.equinox.io)... 18.205.222.128, 54.161.241.46, 52.202.168.65, ...
Connecting to bin.equinox.io (bin.equinox.io)|18.205.222.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9085090 (8.7M) [application/octet-stream]
Saving to: 'ngrok-v3-stable-linux-amd64.tgz'


2024-10-01 17:16:37 (26.2 MB/s) - 'ngrok-v3-stable-linux-amd64.tgz' saved [9085090/9085090]



***Add to the console:***

```cmd
!ngrok authtoken <authtoken>
```

In [7]:
# Start TensorBoard and the ngrok tunnel as background shell commands
pool = mp.Pool(processes = 10)
results_of_processes = [pool.apply_async(os.system, args=(cmd, ), callback = None )
                        for cmd in [
                        "tensorboard --logdir ./log_resnet18/transfer_learning --load_fast=false --host 0.0.0.0 --port 6006 &",
                        "/usr/local/bin/ngrok http 6006 &"
                        ]]

In [8]:
! curl -s http://localhost:4040/api/tunnels | python3 -c \
    "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"

https://d15f-35-225-140-193.ngrok-free.app


In [9]:
#%load_ext tensorboard
# %reload_ext tensorboard

#%tensorboard --logdir=log_resnet18/transfer_learning

# <font style="color:green">Data Processing Utilities</font><a name="utils"></a>

In [10]:
!wget "https://www.dropbox.com/sh/n5nya3g3airlub6/AACi7vaUjdTA0t2j_iKWgp4Ra?dl=1" -O ./data.zip

--2024-10-01 17:17:00--  https://www.dropbox.com/sh/n5nya3g3airlub6/AACi7vaUjdTA0t2j_iKWgp4Ra?dl=1
Resolving www.dropbox.com (www.dropbox.com)... 162.125.3.18, 2620:100:6018:18::a27d:312
Connecting to www.dropbox.com (www.dropbox.com)|162.125.3.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.dropbox.com/scl/fo/t88xubtku433w1t10t7he/AEmJ4AwgZ3Svjp29IdawVbE?rlkey=9vuxo0sqr57tsoqn8wgk9pjzc&dl=1 [following]
--2024-10-01 17:17:00--  https://www.dropbox.com/scl/fo/t88xubtku433w1t10t7he/AEmJ4AwgZ3Svjp29IdawVbE?rlkey=9vuxo0sqr57tsoqn8wgk9pjzc&dl=1
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://ucef48a559c809a611d80802055d.dl.dropboxusercontent.com/zip_download_get/B_DJK3S6NMiGfyLNFRq-WtIrFiMHcbMvzuLI3pglchVBnug6-qr0VXZ0IhtZmpUcAKZOXmyvlPEiKRgeNu99eJkIqpEuvUtBB-bovxcvPi6xUg# [following]
--2024-10-01 17:17:03--  https://ucef48a559c809a611d80802055d.dl.dropboxusercontent.com/zip_

TensorBoard 2.16.2 at http://0.0.0.0:6006/ (Press CTRL+C to quit)


200 OK
Length: 197683526 (189M) [application/zip]
Saving to: './data.zip'


2024-10-01 17:17:39 (5.55 MB/s) - './data.zip' saved [197683526/197683526]



### <font style="color:green">Extract the Data</font>

In [11]:
!unzip -q ./data.zip

mapname:  conversion of  failed


In [12]:
!apt-get install tree
!tree -d ./cat-dog-panda

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
tree is already the newest version (2.0.2-1).
0 upgraded, 0 newly installed, 0 to remove and 67 not upgraded.
./cat-dog-panda
|-- training
|   |-- cat
|   |-- dog
|   `-- panda
`-- validation
    |-- cat
    |-- dog
    `-- panda

8 directories


In [13]:
def image_preprocess_transforms():
    
    preprocess = transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop(224),
        transforms.ToTensor()
        ])
    
    return preprocess

In [14]:
def image_common_transforms(mean, std):
    preprocess = image_preprocess_transforms()
    
    common_transforms = transforms.Compose([
        preprocess,
        transforms.Normalize(mean, std)
    ])
    
    return common_transforms
    

In [15]:
def data_augmentation_preprocess(mean, std):
    
    # pick exactly one of these geometric augmentations per image
    initial_transform = transforms.RandomChoice([
        transforms.RandomHorizontalFlip(),
        transforms.RandomVerticalFlip(),
        transforms.RandomRotation(90)
        ])
    
    common_transforms = image_common_transforms(mean, std)
                
    aug_transforms = transforms.Compose([
        initial_transform,
        transforms.RandomGrayscale(p=0.1),
        common_transforms
        ])
    
    return aug_transforms
    

In [16]:
def data_loader(data_root, transform, batch_size=16, shuffle=False, num_workers=2, persistent_workers=False):
    dataset = datasets.ImageFolder(root=data_root, transform=transform)
    
    loader = torch.utils.data.DataLoader(dataset, 
                                         batch_size=batch_size,
                                         num_workers=num_workers,
                                         shuffle=shuffle,
                                         persistent_workers=persistent_workers)
    
    return loader

In [17]:
def get_mean_std():
    
    mean = [0.485, 0.456, 0.406] 
    std = [0.229, 0.224, 0.225]
    
    return mean, std
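
The values returned by `get_mean_std` are the per-channel statistics commonly used with ImageNet-pretrained torchvision models. `transforms.Normalize` applies `(x - mean) / std` to each channel, and the plotting code later in this notebook inverts that step before showing images. The following minimal sketch illustrates the round trip on a single hypothetical RGB pixel in plain Python.

```python
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]


def normalize(pixel):
    """Per-channel (x - mean) / std, as transforms.Normalize does."""
    return [(v - m) / s for v, m, s in zip(pixel, mean, std)]


def denormalize(pixel):
    """Inverse step: x * std + mean, used before displaying an image."""
    return [v * s + m for v, m, s in zip(pixel, mean, std)]


pixel = [0.5, 0.5, 0.5]          # hypothetical mid-gray pixel in [0, 1]
normalized = normalize(pixel)
restored = denormalize(normalized)

print(normalized)
print(restored)
```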

In [18]:
def get_data(batch_size, data_root, tb_writer, num_workers=4, data_augmentation=True):
    
    train_data_path = os.path.join(data_root, 'training')
       
    mean, std = get_mean_std()
    
    common_transforms = image_common_transforms(mean, std)
        
   
    # if data_augmentation is true 
    # data augmentation implementation
    if data_augmentation:    
        train_transforms = data_augmentation_preprocess(mean, std)
    # else do common transforms
    else:
        train_transforms = common_transforms
        
        
    # train dataloader
    
    train_loader = data_loader(train_data_path, 
                               train_transforms, 
                               batch_size=batch_size, 
                               shuffle=True, 
                               num_workers=num_workers,
                               persistent_workers=False)
    
    # test dataloader
    
    test_data_path = os.path.join(data_root, 'validation')
    
    test_loader = data_loader(test_data_path, 
                              common_transforms, 
                              batch_size=batch_size, 
                              shuffle=False, 
                              num_workers=num_workers,
                              persistent_workers=False)
    
    # test dataset (without a loader), used for the embedding projector
    
    testdata = datasets.ImageFolder(root=test_data_path, transform=common_transforms)
    
    # add embedding / projector
    
    add_data_embedings(testdata, tb_writer, n=100)
    
    return train_loader, test_loader

# <font style="color:blue">Adding Data Embeddings / Projector</font><a name="embeds"></a>

In [19]:
animal_classes = ['cat', 'dog', 'panda']


In [20]:
def get_random_inputs_labels(inputs, targets, n=100):
    """
    get random inputs and labels
    """

    assert len(inputs) == len(targets)

    rand_indices = torch.randperm(len(targets))
    
    data = inputs[rand_indices][:n]
    
    labels = targets[rand_indices][:n]
    
    class_labels = [animal_classes[lab] for lab in labels]
    
    return data, class_labels

In [21]:
def add_data_embedings(dataset, tb_writer, n=100):
    """
    Add a few inputs and labels to tensorboard. 
    """
    
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=n, num_workers=4, shuffle=True)
    
    images, labels = next(iter(dataloader))
    
    tb_writer.add_embedding(mat = images.view(-1, 3 * 224 * 224), 
                            metadata=labels, 
                            label_img=images)
    
    return

## <font style="color:green">System Configuration</font><a name="sys-config"></a>

In [22]:
@dataclass
class SystemConfiguration:
    '''
    Describes the common system setting needed for reproducible training
    '''
    seed: int = 21  # seed number to set the state of all random number generators
    cudnn_benchmark_enabled: bool = True  # enable CuDNN benchmark for the sake of performance
    cudnn_deterministic: bool = True  # make cudnn deterministic (reproducible training)

## <font style="color:green">Training Configuration</font><a name="train-config"></a>

In [23]:
@dataclass
class TrainingConfiguration:
    '''
    Describes configuration of the training process
    '''
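    batch_size: int = 32  # number of images per mini-batch
    epochs_count: int = 20  # number of passes over the training set
    init_learning_rate: float = 0.001  # initial learning rate for lr scheduler
    decay_rate: float = 0.1  # decay factor used by the lr scheduler
    log_interval: int = 500  # log batch metrics every this many batches
    test_interval: int = 1  # run validation every this many epochs
    data_root: str = "./cat-dog-panda"  # root directory of the dataset
    num_workers: int = 2  # number of worker processes for data loading
    device: str = 'cuda'  # device to train on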
    


## <font style="color:green">System Setup</font><a name="sys-setup"></a>

In [24]:
def setup_system(system_config: SystemConfiguration) -> None:
    torch.manual_seed(system_config.seed)
    if torch.cuda.is_available():
        torch.backends.cudnn.benchmark = system_config.cudnn_benchmark_enabled
        torch.backends.cudnn.deterministic = system_config.cudnn_deterministic

In [25]:
def prediction(model, device, batch_input, max_prob=True):
    """
    get prediction for batch inputs
    """
    
    # send model to cpu/cuda according to your system configuration
    model.to(device)
    
    # it is important to do model.eval() before prediction
    model.eval()

    data = batch_input.to(device)

    output = model(data)

    # get probability score using softmax
    prob = F.softmax(output, dim=1)
    
    if max_prob:
        # get the max probability
        pred_prob = prob.data.max(dim=1)[0]
    else:
        pred_prob = prob.data
    
    # get the index of the max probability
    pred_index = prob.data.max(dim=1)[1]
    
    return pred_index.cpu().numpy(), pred_prob.cpu().numpy()

In [26]:
def get_target_and_prob(model, dataloader, device):
    """
    get targets and prediction probabilities
    """
    
    pred_prob = []
    targets = []
    
    for _, (data, target) in enumerate(dataloader):
        
        _, prob = prediction(model, device, data, max_prob=False)
        
        pred_prob.append(prob)
        
        target = target.numpy()
        targets.append(target)
        
    targets = np.concatenate(targets)
    targets = targets.astype(int)
    pred_prob = np.concatenate(pred_prob, axis=0)
    
    return targets, pred_prob
    
    

# <font style="color:blue">Add PR Curves to TensorBoard</font><a name="pr-curves"></a>

In [27]:
def add_pr_curves_to_tensorboard(model, dataloader, device, tb_writer, epoch, num_classes=3):
    """
    Add precision and recall curve to tensorboard.
    """
    
    targets, pred_prob = get_target_and_prob(model, dataloader, device)
    
    for cls_idx in range(num_classes):
        binary_target = targets == cls_idx
        true_prediction_prob = pred_prob[:, cls_idx]
        
        tb_writer.add_pr_curve(animal_classes[cls_idx], 
                               binary_target, 
                               true_prediction_prob, 
                               global_step=epoch)
        
    return
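
The function above treats each class as a one-vs-rest binary problem: the target is "is this sample of class `cls_idx`", and the score is the predicted probability for that class. TensorBoard then sweeps over thresholds to draw the curve. The following minimal sketch computes precision and recall at a single threshold in plain Python. The labels and probabilities are made up, and returning a precision of 1.0 when nothing is predicted positive is a convention chosen here for illustration.

```python
def precision_recall(binary_target, scores, threshold):
    """Precision and recall for one class at one decision threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and t for p, t in zip(predicted, binary_target))
    fp = sum(p and not t for p, t in zip(predicted, binary_target))
    fn = sum((not p) and t for p, t in zip(predicted, binary_target))
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical ground-truth labels and predicted probabilities for class 0.
targets = [0, 1, 2, 0, 1]
class0_prob = [0.9, 0.2, 0.6, 0.4, 0.1]

# One-vs-rest target, analogous to `targets == cls_idx` in the function above.
binary_target = [t == 0 for t in targets]

at_half = precision_recall(binary_target, class0_prob, threshold=0.5)
at_low = precision_recall(binary_target, class0_prob, threshold=0.3)
print(at_half, at_low)
```

Lowering the threshold in this toy data raises recall, which is the trade-off the PR curve visualizes.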
    

# <font style="color:blue">Send Wrong Predictions to TensorBoard</font><a name="wrong-preds"></a>

In [28]:
def add_wrong_prediction_to_tensorboard(model, dataloader, device, tb_writer, 
                                        epoch, tag='Wrong_Predictions', max_images='all'):
    """
    Add wrong predicted images to tensorboard.
    """
    #number of images in one row
    num_images_per_row = 8
    im_scale = 3
    
    plot_images = []
    wrong_labels = []
    pred_prob = []
    right_label = []
    
    mean, std = get_mean_std()
    
    for _, (data, target) in enumerate(dataloader):
        
        
        images = data.numpy()
        pred, prob = prediction(model, device, data)
        target = target.numpy()
        indices = pred.astype(int) != target.astype(int)
        
        plot_images.append(images[indices])
        wrong_labels.append(pred[indices])
        pred_prob.append(prob[indices])
        right_label.append(target[indices])
        
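    # keep the batch dimension even when only one image is misclassified
    plot_images = np.concatenate(plot_images, axis=0)
    # undo the normalization: move channels last, then multiply by std and add mean
    plot_images = (np.moveaxis(plot_images, 1, -1) * std) + mean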
    wrong_labels = np.concatenate(wrong_labels)
    wrong_labels = wrong_labels.astype(int)
    right_label = np.concatenate(right_label)
    right_label = right_label.astype(int)
    pred_prob = np.concatenate(pred_prob)
    
    
    if max_images == 'all':
        # plot every misclassified image, not just the size of the last batch
        num_images = len(plot_images)
    else:
        num_images = min(len(plot_images), max_images)
        
    fig_width = int(num_images_per_row * im_scale)
    
    if num_images % num_images_per_row == 0:
        num_row = int(num_images/num_images_per_row)
    else:
        num_row = int(num_images/num_images_per_row) + 1
        
    fig_height = int(num_row * im_scale)
        
    plt.style.use('default')
    plt.rcParams["figure.figsize"] = (fig_width, fig_height)
    fig = plt.figure()
    
    for i in range(num_images):
        plt.subplot(num_row, num_images_per_row, i+1, xticks=[], yticks=[])
        plt.imshow((plot_images[i]*255).astype(np.uint8))
        plt.gca().set_title('{0}({1:.2}), {2}'.format(animal_classes[wrong_labels[i]], 
                                                          pred_prob[i], 
                                                          animal_classes[right_label[i]]))
        
    tb_writer.add_figure(tag, fig, global_step=epoch)
    
    return
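
The two bookkeeping steps in the function above, selecting the misclassified samples and working out how many grid rows are needed, can be sketched in plain Python. The helper names `select_wrong` and `grid_rows` are hypothetical and are not part of the notebook.

```python
import math


def select_wrong(preds, targets):
    """Indices where the predicted class differs from the true class."""
    return [i for i, (p, t) in enumerate(zip(preds, targets)) if p != t]


def grid_rows(num_images, per_row=8):
    """Rows needed to lay out num_images with per_row images in each row."""
    return math.ceil(num_images / per_row)


# Hypothetical predictions and ground-truth labels.
preds = [0, 1, 2, 2, 0, 1, 1, 0, 2, 1]
targets = [0, 1, 1, 2, 0, 0, 1, 0, 2, 2]

wrong = select_wrong(preds, targets)
print(wrong, grid_rows(len(wrong)), grid_rows(9))
```

`math.ceil` gives the same result as the modulo branch in the function above.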


## <font style="color:green">Training Function</font><a name="train-fn"></a>

You are already familiar with the training pipeline used in PyTorch.

In [29]:
def train(
    train_config: TrainingConfiguration, model: nn.Module, optimizer: torch.optim.Optimizer,
    train_loader: torch.utils.data.DataLoader, epoch_idx: int, tb_writer: SummaryWriter
) -> tuple:
    
    # change model in training mode
    model.train()
    
    # to get batch loss
    batch_loss = np.array([])
    
    # to get batch accuracy
    batch_acc = np.array([])
        
    for batch_idx, (data, target) in enumerate(train_loader):
        
        # clone target
        indx_target = target.clone()
        # send data to device (it is mandatory if GPU has to be used)
        data = data.to(train_config.device)
        # send target to device
        target = target.to(train_config.device)

        # reset parameters gradient to zero
        optimizer.zero_grad()
        
        # forward pass to the model
        output = model(data)
        
        # cross entropy loss
        loss = F.cross_entropy(output, target)
        
        # find gradients w.r.t training parameters
        loss.backward()
        # Update parameters using gradients
        optimizer.step()
        
        batch_loss = np.append(batch_loss, [loss.item()])
        
        # get probability score using softmax
        prob = F.softmax(output, dim=1)
            
        # get the index of the max probability
        pred = prob.data.max(dim=1)[1]  
                        
        # correct prediction
        correct = pred.cpu().eq(indx_target).sum()
            
        # accuracy
        acc = float(correct) / float(len(data))
        
        batch_acc = np.append(batch_acc, [acc])

        if batch_idx % train_config.log_interval == 0 and batch_idx > 0:
            
            # global step: number of batches processed so far across all epochs
            total_batch = epoch_idx * len(train_loader) + batch_idx
            tb_writer.add_scalar('Loss/train-batch', loss.item(), total_batch)
            tb_writer.add_scalar('Accuracy/train-batch', acc, total_batch)
            
    epoch_loss = batch_loss.mean()
    epoch_acc = batch_acc.mean()
    return epoch_loss, epoch_acc
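
The training function turns logits into probabilities with softmax and then takes the index of the largest probability. Because softmax preserves the ordering of the logits, taking the argmax of the raw logits gives the same prediction. The following minimal sketch shows this in plain Python on made-up logits; the helper names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical logits for a batch of three samples and three classes.
batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0], [1.0, 1.5, 0.0]]
batch_targets = [0, 2, 0]

preds = [argmax(softmax(logits)) for logits in batch_logits]
preds_from_logits = [argmax(logits) for logits in batch_logits]

# Batch accuracy: fraction of samples whose prediction matches the target.
accuracy = sum(p == t for p, t in zip(preds, batch_targets)) / len(batch_targets)
print(preds, preds_from_logits, accuracy)
```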

## <font style="color:green">Validation Function</font><a name="validate-fn"></a>

In [30]:
def validate(
    train_config: TrainingConfiguration,
    model: nn.Module,
    test_loader: torch.utils.data.DataLoader
) -> tuple:
    # switch the model to evaluation mode
    model.eval()
    test_loss = 0
    count_correct_predictions = 0
    
    # gradients are not needed for validation
    with torch.no_grad():
        for data, target in test_loader:
            indx_target = target.clone()
            data = data.to(train_config.device)
            target = target.to(train_config.device)
            
            output = model(data)
            # add loss for each mini batch
            test_loss += F.cross_entropy(output, target).item()
            
            # get probability score using softmax
            prob = F.softmax(output, dim=1)
            
            # get the index of the max probability
            pred = prob.max(dim=1)[1]
            
            # add correct prediction count
            count_correct_predictions += pred.cpu().eq(indx_target).sum().item()

    # average over number of mini-batches
    test_loss = test_loss / len(test_loader)
    
    # fraction of correctly classified samples in the dataset
    accuracy = count_correct_predictions / len(test_loader.dataset)
    
    return test_loss, accuracy
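
Note that the validation loss is averaged over mini-batches, while accuracy is averaged over samples. `F.cross_entropy` returns the mean loss of a batch by default, so when the last batch is smaller than the others, the mean of batch means differs slightly from the per-sample mean. The following minimal sketch, with made-up numbers and hypothetical helper names, shows the difference.

```python
def mean_of_batch_means(batch_losses):
    """Average the per-batch mean losses, as the validation loop does."""
    return sum(batch_losses) / len(batch_losses)


def per_sample_mean(batch_losses, batch_sizes):
    """Weight each batch mean by its size to get the true per-sample mean."""
    total = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


# Hypothetical batch mean losses; the last batch holds a single hard sample.
batch_losses = [0.5, 0.5, 2.0]
batch_sizes = [4, 4, 1]

by_batch = mean_of_batch_means(batch_losses)
by_sample = per_sample_mean(batch_losses, batch_sizes)
print(by_batch, by_sample)
```

For tracking trends across epochs either definition works, as long as the same one is used throughout.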

# <font style="color:blue">Add Weight Histograms</font><a name="hist"></a>

In [31]:
def add_model_weights_as_histogram(model, tb_writer, epoch):
    for name, param in model.named_parameters():
        tb_writer.add_histogram(name.replace('.', '/'), param.data.cpu().abs(), epoch)
    return

# <font style="color:blue">Add the Network Graph</font><a name="graph"></a>

In [32]:
def add_network_graph_tensorboard(model, inputs, tb_writer):
    tb_writer.add_graph(model, inputs)
    return

## <font style="color:green">Main Function for Training and Validation</font><a name="main"></a>

Use the configuration parameters defined above and start training. The important steps in the code below are:

1. Set the system parameters, such as CPU/GPU, number of threads, etc.

2. Load the data using the dataloaders.

3. For each epoch, call the train function. At every test interval, call the validate function.

4. Call `scheduler.step()` to update the learning rate for the next epoch.

5. Set up the variables that track loss and accuracy, and start training.
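
The control flow of these steps can be sketched without any PyTorch code. In the following minimal sketch, `run_epochs` and the stub callables are hypothetical placeholders for the real `train`, `validate`, and `scheduler.step()` calls.

```python
def run_epochs(epochs_count, test_interval, train_fn, validate_fn, scheduler_step=None):
    """Skeleton of the epoch loop: train, periodically validate, step the scheduler."""
    history = {"train": [], "val": []}
    best_loss = float("inf")

    for epoch in range(epochs_count):
        # train for one epoch and record the loss
        history["train"].append(train_fn(epoch))

        # validate only at every test interval
        if epoch % test_interval == 0:
            val_loss = validate_fn(epoch)
            history["val"].append((epoch, val_loss))
            best_loss = min(best_loss, val_loss)

        # update the learning rate for the next epoch
        if scheduler_step is not None:
            scheduler_step()

    return history, best_loss


# Stub losses that simply shrink with the epoch index (illustrative only).
history, best_loss = run_epochs(
    epochs_count=4,
    test_interval=2,
    train_fn=lambda epoch: 1.0 / (epoch + 1),
    validate_fn=lambda epoch: 2.0 / (epoch + 1),
)
print(history, best_loss)
```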


In [33]:
def main(model, optimizer, tb_writer, scheduler=None, system_configuration=SystemConfiguration(), 
         training_configuration=TrainingConfiguration(), data_augmentation=False):
    
    # system configuration
    setup_system(system_configuration)

    # batch size
    batch_size_to_set = training_configuration.batch_size
    # num_workers
    num_workers_to_set = training_configuration.num_workers
    # epochs
    epoch_num_to_set = training_configuration.epochs_count

    # if GPU is available use the training config,
    # else fall back to the CPU with a smaller batch_size and fewer workers
    if torch.cuda.is_available():
        device = "cuda"
    else:
        device = "cpu"
        batch_size_to_set = 16
        num_workers_to_set = 2

    # data loader
    train_loader, test_loader = get_data(
        batch_size=batch_size_to_set,
        data_root=training_configuration.data_root,
        tb_writer=tb_writer,
        num_workers=num_workers_to_set,
        data_augmentation=data_augmentation
    )
    
    
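    # Update training configuration with the device-dependent values,
    # keeping every other setting passed in by the caller
    training_configuration = TrainingConfiguration(
        batch_size=batch_size_to_set,
        epochs_count=epoch_num_to_set,
        init_learning_rate=training_configuration.init_learning_rate,
        decay_rate=training_configuration.decay_rate,
        log_interval=training_configuration.log_interval,
        test_interval=training_configuration.test_interval,
        data_root=training_configuration.data_root,
        num_workers=num_workers_to_set,
        device=device
    )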
        
    # send model to device (GPU/CPU)
    model.to(training_configuration.device)
    
    
    # add network graph with inputs info
    images, labels = next(iter(test_loader))
    images = images.to(training_configuration.device)
    add_network_graph_tensorboard(model, images, tb_writer)

    best_loss = torch.tensor(np.inf)
    
    # epoch train/test loss
    epoch_train_loss = np.array([])
    epoch_test_loss = np.array([])
    
    # epoch train/test accuracy
    epoch_train_acc = np.array([])
    epoch_test_acc = np.array([])
    
    add_wrong_prediction_to_tensorboard(model, test_loader, 
                                                training_configuration.device, 
                                                tb_writer, 0, max_images=300)
    
    
    # training time measurement
    t_begin = time.time()
    for epoch in range(training_configuration.epochs_count):
        
        # Train
        train_loss, train_acc = train(training_configuration, model, optimizer, train_loader, epoch, tb_writer)
        
        epoch_train_loss = np.append(epoch_train_loss, [train_loss])
        
        epoch_train_acc = np.append(epoch_train_acc, [train_acc])
        
        # add scalar (loss/accuracy) to tensorboard
        tb_writer.add_scalar('Loss/Train',train_loss, epoch)
        tb_writer.add_scalar('Accuracy/Train', train_acc, epoch)

        elapsed_time = time.time() - t_begin
        speed_epoch = elapsed_time / (epoch + 1)
        speed_batch = speed_epoch / len(train_loader)
        eta = speed_epoch * training_configuration.epochs_count - elapsed_time
        
        # add time metadata to tensorboard
        tb_writer.add_scalar('Time/elapsed_time', elapsed_time, epoch)
        tb_writer.add_scalar('Time/speed_epoch', speed_epoch, epoch)
        tb_writer.add_scalar('Time/speed_batch', speed_batch, epoch)
        tb_writer.add_scalar('Time/eta', eta, epoch)
        

        # Validate
        if epoch % training_configuration.test_interval == 0:
            current_loss, current_accuracy = validate(training_configuration, model, test_loader)
            
            epoch_test_loss = np.append(epoch_test_loss, [current_loss])
        
            epoch_test_acc = np.append(epoch_test_acc, [current_accuracy])
            
            # add scalar (loss/accuracy) to tensorboard
            tb_writer.add_scalar('Loss/Validation', current_loss, epoch)
            tb_writer.add_scalar('Accuracy/Validation', current_accuracy, epoch)
            
            # add scalars (loss/accuracy) to tensorboard
            tb_writer.add_scalars('Loss/train-val', {'train': train_loss, 
                                           'validation': current_loss}, epoch)
            tb_writer.add_scalars('Accuracy/train-val', {'train': train_acc, 
                                               'validation': current_accuracy}, epoch)
            
            if current_loss < best_loss:
                best_loss = current_loss
                
            # add wrong predicted image to tensorboard
            add_wrong_prediction_to_tensorboard(model, test_loader, 
                                                training_configuration.device, 
                                                tb_writer, epoch, max_images=300)
        
        # scheduler step/ update learning rate
        if scheduler is not None:
            scheduler.step()
            
        # adding model weights to tensorboard as histogram
        add_model_weights_as_histogram(model, tb_writer, epoch)
        
        # add pr curves to tensor board
        add_pr_curves_to_tensorboard(model, test_loader, 
                                     training_configuration.device, 
                                     tb_writer, epoch, num_classes=3)
        
                
    print("Total time: {:.2f}, Best Loss: {:.3f}".format(time.time() - t_begin, best_loss))
    
    
    
    return model, epoch_train_loss, epoch_train_acc, epoch_test_loss, epoch_test_acc
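
The timing scalars logged in the loop above follow from simple arithmetic: the average time per epoch so far is extrapolated to the full run, and the time already spent is subtracted. The following minimal sketch uses a hypothetical helper `eta_seconds`, where `epochs_done` corresponds to `epoch + 1` in the loop.

```python
def eta_seconds(elapsed_time, epochs_done, epochs_total):
    """Estimated remaining time, assuming future epochs take the average time so far."""
    speed_epoch = elapsed_time / epochs_done
    return speed_epoch * epochs_total - elapsed_time


# Hypothetical timings: 60 seconds spent on the first 2 of 20 epochs.
remaining = eta_seconds(60.0, epochs_done=2, epochs_total=20)
finished = eta_seconds(100.0, epochs_done=20, epochs_total=20)
print(remaining, finished)
```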

## <font style="color:green">Optimizer and Scheduler</font><a name="optim"></a>

We wrap the optimizer and scheduler in a function because we use them in every training experiment.

In [34]:
def get_optimizer_and_scheduler(model):
    train_config = TrainingConfiguration()

    init_learning_rate = train_config.init_learning_rate

    # optimizer
    optimizer = optim.SGD(
        model.parameters(),
        lr = init_learning_rate,
        momentum = 0.9
    )

    decay_rate = train_config.decay_rate

    lmbda = lambda epoch: 1/(1 + decay_rate * epoch)

    # Scheduler
    scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lmbda)
    
    return optimizer, scheduler
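
`LambdaLR` multiplies the initial learning rate by the value of `lr_lambda(epoch)`, so the schedule above is an inverse-time decay. The following minimal sketch reproduces the resulting learning rates in plain Python, using the default values from `TrainingConfiguration`; the helper `lr_at` is hypothetical.

```python
init_learning_rate = 0.001
decay_rate = 0.1


def lr_at(epoch):
    """Learning rate after `epoch` scheduler steps: lr0 / (1 + decay_rate * epoch)."""
    return init_learning_rate / (1 + decay_rate * epoch)


schedule = [lr_at(epoch) for epoch in range(20)]

for epoch in (0, 5, 10, 19):
    print(epoch, schedule[epoch])
```

With these defaults the learning rate is halved after 10 epochs and keeps shrinking more slowly afterwards.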
    


# <font style="color:blue">ResNet Model</font><a name="model"></a>

Load the `resnet18` model with its pretrained weights.

The last layer of the network is always replaced with a new one sized for our classes. If you pass the `transfer_learning` flag, all other layers are frozen and only this new layer is trained. Otherwise all layers are retrained, starting from the pretrained weights rather than from scratch.

In [35]:
def pretrained_resnet18(transfer_learning=True, num_class=3):
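    # `pretrained=True` is deprecated in recent torchvision releases; request the weights explicitly
    resnet = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)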
    
    if transfer_learning:
        for param in resnet.parameters():
            param.requires_grad = False
            
    last_layer_in = resnet.fc.in_features
    resnet.fc = nn.Linear(last_layer_in, num_class)
    
    return resnet

# <font style="color:blue">Transfer Learning</font><a name="tl"></a>


In [36]:
model = pretrained_resnet18(transfer_learning=True)
print(model)
# get optimizer and scheduler
optimizer, scheduler = get_optimizer_and_scheduler(model)

# Tensorboard summary writer
transfer_learning_sw = SummaryWriter('log_resnet18/transfer_learning')   

# train and validate
model, train_loss_exp2, train_acc_exp2, val_loss_exp2, val_acc_exp2 = main(model, 
                                                                           optimizer,
                                                                           transfer_learning_sw,
                                                                           scheduler,
                                                                           data_augmentation=True)
transfer_learning_sw.close()

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 174MB/s] 


ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

# <font style="color:blue">Fine-Tuning</font><a name="fine-tune"></a>


In [37]:
model = pretrained_resnet18(transfer_learning=False)
# print(model)
# get optimizer and scheduler
optimizer, scheduler = get_optimizer_and_scheduler(model)

# Tensorboard summary writer
fine_tuning_sw = SummaryWriter('log_resnet18/fine_tuning')   

model, train_loss_exp9, train_acc_exp9, val_loss_exp9, val_acc_exp9 = main(model, 
                                                                           optimizer, 
                                                                           fine_tuning_sw,
                                                                           scheduler,
                                                                           data_augmentation=True)

fine_tuning_sw.close()



Total time: 598.45, Best Loss: 0.027


As noted above, view the data in TensorBoard.

### ***References and Documentation***

[Image Classification Architecture](https://www.youtube.com/watch?v=NnB9Zm5bnok)

[Fine Tuning and Transfer Learning](https://www.youtube.com/watch?v=5xCj0zOyw-g)

In [39]:
!rm /kaggle/working/ngrok-v3-stable-linux-amd64.tgz
!rm /kaggle/working/data.zip

