# Training a Model with TensorBoard Logging, Exercise

#### **Assignment:**
1. Add the CNN layers as images to TensorBoard.
2. Add the post-activation outputs of the CNN layers to see what they have learned.
3. Add the confusion matrix.


In [2]:
%matplotlib inline

In [3]:
import matplotlib.pyplot as plt  # one of the best graphics libraries for Python
plt.style.use('ggplot')

In [4]:
import os
import time
import seaborn as sns

import multiprocessing as mp
mp.set_start_method('spawn', force=True)

from typing import Iterable
from dataclasses import dataclass

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

from torchvision import datasets, transforms

from torch.optim import lr_scheduler

from torch.utils.tensorboard import SummaryWriter

from sklearn.metrics import confusion_matrix

## TensorBoard Dashboard

Let's set up the TensorBoard dashboard.

First we define the parent log directory. All logs will be written to this directory.

In [5]:
logdir = "/kaggle/working/logs_fashion_mnist"
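
As a small illustration of how the parent directory is used: each experiment in this notebook writes to its own subdirectory, and TensorBoard shows every subdirectory as a separate run. The sketch below only builds the paths; the run names are the ones used by the two experiments at the end of the notebook.

```python
import os

# Parent log directory, as defined in the cell above
logdir = "/kaggle/working/logs_fashion_mnist"

# Run names used by the two experiments in this notebook
run_names = ["no_regularization", "regularization"]

# One subdirectory per run; TensorBoard lists each one as a separate run
run_dirs = {name: os.path.join(logdir, name) for name in run_names}

for name, path in run_dirs.items():
    print(name, "->", path)
```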

In [6]:
!tensorboard --version

2.16.2


In [7]:
!wget https://bin.equinox.io/c/bNyj1mQVY4c/ngrok-v3-stable-linux-amd64.tgz
!tar xf ./ngrok-v3-stable-linux-amd64.tgz -C /usr/local/bin

--2024-09-27 16:52:03--  https://bin.equinox.io/c/bNyj1mQVY4c/ngrok-v3-stable-linux-amd64.tgz
Resolving bin.equinox.io (bin.equinox.io)... 18.205.222.128, 54.161.241.46, 54.237.133.81, ...
Connecting to bin.equinox.io (bin.equinox.io)|18.205.222.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9085090 (8.7M) [application/octet-stream]
Saving to: 'ngrok-v3-stable-linux-amd64.tgz'


2024-09-27 16:52:03 (45.8 MB/s) - 'ngrok-v3-stable-linux-amd64.tgz' saved [9085090/9085090]



### **Add to the console:** 
```cmd 
!ngrok authtoken <authtoken>
```

In [9]:
pool = mp.Pool(processes = 10)
results_of_processes = [pool.apply_async(os.system, args=(cmd, ), callback = None )
                        for cmd in [
                        f"tensorboard --logdir ./logs_fashion_mnist --load_fast=false --host 0.0.0.0 --port 6006 &",
                        "/usr/local/bin/ngrok http 6006 &"
                        ]]

In [10]:
! curl -s http://localhost:4040/api/tunnels | python3 -c \
    "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"

https://7934-35-243-215-80.ngrok-free.app


Launch TensorBoard from a notebook cell.

Once we add logs to the log directory, they become visible in the dashboard.

# 2. Training Utilities<a name="utils"></a>

## <font style="color:green">2.1. Get the Fashion MNIST Data</font>

In [11]:
# Fashion MNIST class names
fashion_mnist_classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 
                         'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot']


### Utility for Adding Images as Inputs

In [12]:
def get_random_inputs_labels(inputs, targets, n=100):
    """
    get random inputs and labels
    """

    assert len(inputs) == len(targets)

    rand_indices = torch.randperm(len(targets))
    
    data = inputs[rand_indices][:n]
    
    labels = targets[rand_indices][:n]
    
    class_labels = [fashion_mnist_classes[lab] for lab in labels]
    
    return data, class_labels

### TensorBoard Helper Function


In [13]:
def add_data_embedings(dataset, tb_writer, n=100, global_step=1, tag="embedings"):
    """
    Add a few inputs and labels to tensorboard. 
    """
    
    images, labels = get_random_inputs_labels(inputs=dataset.data, targets=dataset.targets, n=n)
    
    # Add image as embedding to tensorboard
    tb_writer.add_embedding(mat = images.view(-1, 28 * 28), 
                            metadata=labels, 
                            label_img=images.unsqueeze(1),
                            global_step=global_step,
                            tag=tag)
    return

### Function for Getting the Data

In [14]:
def get_data(batch_size, data_root, tb_writer, num_workers=1, data_augmentation=False):
    
    # common transforms
    common_transforms = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.2860, ), (0.3530, ))
    ])
    
    # if data_augmentation is true 
    # data augmentation implementation
    if data_augmentation:
        train_transforms = transforms.Compose([
            transforms.RandomChoice([
                transforms.RandomHorizontalFlip(),
                transforms.RandomVerticalFlip(),
                transforms.RandomRotation(90, fill=(0,)),
                transforms.RandomCrop(28, padding=4, fill=(0,))
            ]),
            transforms.ToTensor(),
            transforms.Normalize((0.2860, ), (0.3530, ))
        ])
    # else do common transforms
    else:
        train_transforms = common_transforms
        
        
    
    # train dataloader
    traindata = datasets.FashionMNIST(root=data_root, train=True, download=True, transform=train_transforms)
    
    train_loader = torch.utils.data.DataLoader(
        traindata,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers
    )
    
    # test dataloader
    testdata = datasets.FashionMNIST(root=data_root, train=False, download=True, transform=common_transforms)
    
    test_loader = torch.utils.data.DataLoader(
        testdata,
        batch_size=batch_size,
        shuffle=False,
        num_workers=num_workers
    )
    
    # add embedding / projector
    
    add_data_embedings(testdata, tb_writer, n=100)
    return train_loader, test_loader
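
To make the `Normalize` step in `get_data` concrete, here is a minimal sketch of the arithmetic it applies to every pixel, assuming the pixel has already been scaled to [0, 1] by `ToTensor`. The helper `normalize` is hypothetical and exists only for this illustration; the constants are the ones passed to `transforms.Normalize` above.

```python
# Mean and standard deviation passed to transforms.Normalize in get_data
MEAN, STD = 0.2860, 0.3530


def normalize(pixel: float) -> float:
    """Apply (x - mean) / std to a pixel that is already in [0, 1]."""
    return (pixel - MEAN) / STD


# Raw 8-bit pixel values: black, mid-gray, white
for raw in (0, 128, 255):
    scaled = raw / 255.0  # what ToTensor does for uint8 images
    print(raw, round(normalize(scaled), 4))
```

A pixel equal to the mean maps to 0, darker pixels become negative and brighter pixels positive.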

## <font style="color:green">2.2. System Configuration</font>

In [15]:
@dataclass
class SystemConfiguration:
    '''
    Describes the common system setting needed for reproducible training
    '''
    seed: int = 21  # seed number to set the state of all random number generators
    cudnn_benchmark_enabled: bool = True  # enable CuDNN benchmark for the sake of performance
    cudnn_deterministic: bool = True  # make cudnn deterministic (reproducible training)

## 2.3. <font>Training Configuration</font>

In [16]:
@dataclass
class TrainingConfiguration:
    '''
    Describes configuration of the training process
    '''
    batch_size: int = 32  
    epochs_count: int = 50  
    init_learning_rate: float = 0.02  # initial learning rate for lr scheduler
    decay_rate: float = 0.1  
    log_interval: int = 500  
    test_interval: int = 1  
    data_root: str = "/kaggle/working/" 
    num_workers: int = 2 
    device: str = 'cuda'  
    


## <font style="color:green">2.4. System Setup</font>

In [17]:
def setup_system(system_config: SystemConfiguration) -> None:
    torch.manual_seed(system_config.seed)
    if torch.cuda.is_available():
        torch.backends.cudnn.benchmark = system_config.cudnn_benchmark_enabled
        torch.backends.cudnn.deterministic = system_config.cudnn_deterministic

## 2.5. <font>Prediction</font>

In [18]:
def prediction(model, device, batch_input, max_prob=True):
    """
    get prediction for batch inputs
    """
    
    # send model to cpu/cuda according to your system configuration
    model.to(device)
    
    # it is important to do model.eval() before prediction
    model.eval()

    data = batch_input.to(device)

    # gradients are not needed for inference
    with torch.no_grad():
        output = model(data)

    # get probability score using softmax
    prob = F.softmax(output, dim=1)
    
    if max_prob:
        # get the max probability
        pred_prob = prob.data.max(dim=1)[0]
    else:
        pred_prob = prob.data
    
    # get the index of the max probability
    pred_index = prob.data.max(dim=1)[1]
    
    return pred_index.cpu().numpy(), pred_prob.cpu().numpy()
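
The step from raw scores to a predicted class and its probability can be sketched without a model. The example below is a NumPy stand-in for `F.softmax(output, dim=1)` followed by `max(dim=1)`; the logits are made-up values and `softmax` is a hypothetical helper written only for this illustration.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax; subtracting the row maximum keeps exp() numerically stable."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Made-up scores for a batch of two samples and three classes
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.0]])

prob = softmax(logits)
pred_index = prob.argmax(axis=1)  # index of the most probable class per sample
pred_prob = prob.max(axis=1)      # probability of that class

print(pred_index, pred_prob)
```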

In [19]:
def get_target_and_prob(model, dataloader, device):
    """
    get targets and prediction probabilities
    """
    
    pred_prob = []
    targets = []
    
    for _, (data, target) in enumerate(dataloader):
        
        _, prob = prediction(model, device, data, max_prob=False)
        
        pred_prob.append(prob)
        
        target = target.numpy()
        targets.append(target)
        
    targets = np.concatenate(targets)
    targets = targets.astype(int)
    pred_prob = np.concatenate(pred_prob, axis=0)
    
    return targets, pred_prob
    
    

### Utility for Adding PR Curves to TensorBoard

In [20]:
def add_pr_curves_to_tensorboard(model, dataloader, device, tb_writer, epoch, num_classes=10):
    """
    Add precision and recall curve to tensorboard.
    """
    
    targets, pred_prob = get_target_and_prob(model, dataloader, device)
    
    for cls_idx in range(num_classes):
        binary_target = targets == cls_idx
        true_prediction_prob = pred_prob[:, cls_idx]
        
        # add PR curve to tensorboard
        tb_writer.add_pr_curve(tag=fashion_mnist_classes[cls_idx], 
                               labels=binary_target, 
                               predictions=true_prediction_prob, 
                               global_step=epoch)
        
    return
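
The one-vs-rest setup used for the PR curves can be illustrated on a tiny example. The sketch below builds the binary target and the class score for one class the same way as the loop above, then computes precision and recall at a single threshold. The data and the threshold of 0.5 are made up; TensorBoard itself sweeps over many thresholds.

```python
import numpy as np

# Made-up ground truth and predicted class probabilities (5 samples, 3 classes)
targets = np.array([0, 1, 2, 1, 0])
pred_prob = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.2, 0.6, 0.2],
                      [0.5, 0.3, 0.2],
                      [0.6, 0.3, 0.1]])

cls_idx = 1
binary_target = targets == cls_idx  # True where the sample belongs to class 1
scores = pred_prob[:, cls_idx]      # predicted probability of class 1

threshold = 0.5                     # assumption: a single illustrative threshold
predicted_positive = scores >= threshold

true_positive = np.sum(predicted_positive & binary_target)
precision = true_positive / predicted_positive.sum()
recall = true_positive / binary_target.sum()

print(precision, recall)
```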
    

### Utility for Adding Images to TensorBoard

In [21]:
def add_wrong_prediction_to_tensorboard(model, dataloader, device, tb_writer, 
                                        epoch, tag='Wrong_Predictions', max_images='all'):
    """
    Add wrong predicted images to tensorboard.
    """
    #number of images in one row
    num_images_per_row = 8
    im_scale = 3
    
    plot_images = []
    wrong_labels = []
    pred_prob = []
    right_label = []
    
    for _, (data, target) in enumerate(dataloader):
        
        
        images = data.numpy()
        pred, prob = prediction(model, device, data)
        target = target.numpy()
        indices = pred.astype(int) != target.astype(int)
        
        plot_images.append(images[indices])
        wrong_labels.append(pred[indices])
        pred_prob.append(prob[indices])
        right_label.append(target[indices])
        
    # drop only the channel axis so a single wrong image keeps its batch dimension
    plot_images = np.concatenate(plot_images, axis=0).squeeze(axis=1)
    wrong_labels = np.concatenate(wrong_labels)
    wrong_labels = wrong_labels.astype(int)
    right_label = np.concatenate(right_label)
    right_label = right_label.astype(int)
    pred_prob = np.concatenate(pred_prob)
    
    
    if max_images == 'all':
        num_images = len(plot_images)
    else:
        num_images = min(len(plot_images), max_images)
        
    fig_width = num_images_per_row * im_scale
    
    # number of rows, rounded up (plt.subplot expects an integer)
    num_row = max(1, (num_images + num_images_per_row - 1) // num_images_per_row)
        
    fig_height = num_row * im_scale
        
    plt.style.use('default')
    plt.rcParams["figure.figsize"] = (fig_width, fig_height)
    fig = plt.figure()
    
    for i in range(num_images):
        plt.subplot(num_row, num_images_per_row, i+1, xticks=[], yticks=[])
        plt.imshow(plot_images[i], cmap='gray')
        plt.gca().set_title('{0}({1:.2}), {2}'.format(fashion_mnist_classes[wrong_labels[i]], 
                                                          pred_prob[i], 
                                                          fashion_mnist_classes[right_label[i]]))
        
    # add figure to tensorboard
    tb_writer.add_figure(tag=tag,
                         figure=fig, 
                         global_step=epoch)
    
    return


## 2.6. <font>Training Function</font>


In [22]:
def train(
    train_config: TrainingConfiguration, model: nn.Module, optimizer: torch.optim.Optimizer,
    train_loader: torch.utils.data.DataLoader, epoch_idx: int, tb_writer: SummaryWriter
) -> tuple:
    
    # change model in training mode
    model.train()
    
    # to get batch loss
    batch_loss = np.array([])
    
    # to get batch accuracy
    batch_acc = np.array([])
        
    for batch_idx, (data, target) in enumerate(train_loader):
        
        # clone target
        indx_target = target.clone()
        # send data to device (it is mandatory if GPU has to be used)
        data = data.to(train_config.device)
        # send target to device
        target = target.to(train_config.device)

        # reset parameters gradient to zero
        optimizer.zero_grad()
        
        # forward pass to the model
        output = model(data)
        
        # cross entropy loss
        loss = F.cross_entropy(output, target)
        
        # find gradients w.r.t training parameters
        loss.backward()
        # Update parameters using gradients
        optimizer.step()
        
        batch_loss = np.append(batch_loss, [loss.item()])
        
        # Score to probability using softmax
        prob = F.softmax(output, dim=1)
            
        # get the index of the max probability
        pred = prob.data.max(dim=1)[1]  
                        
        # correct prediction
        correct = pred.cpu().eq(indx_target).sum()
            
        # accuracy
        acc = float(correct) / float(len(data))
        
        batch_acc = np.append(batch_acc, [acc])

        if batch_idx % train_config.log_interval == 0 and batch_idx > 0:
            
            # global step: number of batches processed so far (an integer)
            total_batch = epoch_idx * len(train_loader) + batch_idx
            # add scalar log to tensorboard
            tb_writer.add_scalar('Loss/train-batch', loss.item(), total_batch)
            tb_writer.add_scalar('Accuracy/train-batch', acc, total_batch)
            
    epoch_loss = batch_loss.mean()
    epoch_acc = batch_acc.mean()
    return epoch_loss, epoch_acc
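
Batch-level scalars need a step that keeps increasing across epochs, otherwise the curves of different epochs overlap in TensorBoard. A minimal sketch of that bookkeeping follows; `global_step` is a hypothetical helper, and the numbers assume the 60,000 Fashion MNIST training images with the default batch size of 32.

```python
def global_step(epoch_idx: int, batch_idx: int, batches_per_epoch: int) -> int:
    """Number of batches processed before the current one, counted over all epochs."""
    return epoch_idx * batches_per_epoch + batch_idx


# 60,000 training images with batch size 32 give 1875 batches per epoch
batches_per_epoch = 60000 // 32

print(global_step(0, 500, batches_per_epoch))  # first logged batch of epoch 0
print(global_step(2, 500, batches_per_epoch))  # same batch index, two epochs later
```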

## <font>2.7. Validation Function</font>

In [23]:
@torch.no_grad()  # gradients are not needed during validation
def validate(
    train_config: TrainingConfiguration,
    model: nn.Module,
    test_loader: torch.utils.data.DataLoader
) -> tuple:
    # switch the model to evaluation mode
    model.eval()
    test_loss = 0
    count_corect_predictions = 0
    for data, target in test_loader:
        indx_target = target.clone()
        data = data.to(train_config.device)
        
        target = target.to(train_config.device)
        
        output = model(data)
        # add loss for each mini batch
        test_loss += F.cross_entropy(output, target).item()
        
        # get probability score using softmax
        prob = F.softmax(output, dim=1)
        
        # get the index of the max probability
        pred = prob.data.max(dim=1)[1] 
        
        # add correct prediction count
        count_corect_predictions += pred.cpu().eq(indx_target).sum()

    # average over number of mini-batches
    test_loss = test_loss / len(test_loader)  
    
    # fraction of correctly classified samples in the whole dataset
    accuracy = float(count_corect_predictions) / len(test_loader.dataset)
    
    return test_loss, accuracy

### Function for Adding Model Weights as Histograms

In [24]:
def add_model_weights_as_histogram(model, tb_writer, epoch):
    """
    Get named parameters and plot as histogram
    """
    for name, param in model.named_parameters():
        # add model weight as histogram to tensorboard
        tb_writer.add_histogram(name.replace('.', '/'), param.data.cpu().abs(), epoch)
    return

### Function for Displaying Convolutional Layer Outputs at Model Initialization

In [25]:
def add_output_conv_start_training(model, input_image, tb_writer, layer_names=None):
    """
    Adds the post-activation outputs of the model's convolutional layers as images to TensorBoard
    """
    # Move the input to the model's device (if not already there)
    x = input_image.to(next(model.parameters()).device)
    conv_layer_count = 0  # Counter of convolutional layers
    
    # If no layer names are provided, generate them automatically (conv_1, conv_2, ...)
    if layer_names is None:
        num_conv = sum(isinstance(layer, nn.Conv2d) for layer in model._body)
        layer_names = [f'conv_{i+1}' for i in range(num_conv)]
    
    # Use eval mode so BatchNorm statistics are not updated and Dropout is off
    was_training = model.training
    model.eval()
    
    with torch.no_grad():
        for layer in model._body:
            # Run every layer so the feature maps match the real forward pass
            x = layer(x)
            
            # Log the feature maps after each activation
            if isinstance(layer, nn.ReLU):
                layer_name = layer_names[conv_layer_count]
                for i in range(x.size(1)):  # For each channel of the feature map
                    feature_map = x[0, i, :, :].unsqueeze(0)  # Extract one channel as (1, H, W)
                    tb_writer.add_image(f'start_feature_maps/{layer_name}/channel_{i}', feature_map)
                
                conv_layer_count += 1  # Move on to the next convolutional layer
    
    # Restore the previous mode
    model.train(was_training)
    return

### Function for Displaying Convolutional Layer Outputs during Training

In [26]:
def add_output_conv(model, input_image, tb_writer, epoch, layer_names=None):
    """
    Logs post-activation feature maps of all convolutional layers to TensorBoard for a specific epoch.
    
    Args:
        model: The neural network model.
        input_image: Input image or batch of images.
        tb_writer: TensorBoard writer.
        epoch: Current epoch number.
        layer_names: List of layer names (optional).
    """
    # Move input to model's device (if not already)
    x = input_image.to(next(model.parameters()).device)
    conv_layer_count = 0  # Counter of convolutional layers
    
    # If no layer names are provided, generate them automatically (conv_1, conv_2, ...)
    if layer_names is None:
        num_conv = sum(isinstance(layer, nn.Conv2d) for layer in model._body)
        layer_names = [f'conv_{i+1}' for i in range(num_conv)]
    
    # Use eval mode so BatchNorm statistics are not updated and Dropout is off
    was_training = model.training
    model.eval()
    
    with torch.no_grad():
        # Iterate through the model's layers
        for layer in model._body:
            # Run every layer so the feature maps match the real forward pass
            x = layer(x)
            
            # Log the feature maps after each activation
            if isinstance(layer, nn.ReLU):
                layer_name = layer_names[conv_layer_count]
                for i in range(x.size(1)):  # For each channel (filter output)
                    feature_map = x[0, i, :, :].unsqueeze(0)  # Single channel as (1, H, W)
                    
                    # Use the epoch as the step so each tag gets an epoch slider
                    tb_writer.add_image(f'feature_maps/{layer_name}/channel_{i}',
                                        feature_map, global_step=epoch)
                
                conv_layer_count += 1  # Move on to the next convolutional layer
    
    # Restore the previous mode
    model.train(was_training)
    return
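
One practical detail when logging feature maps: to the best of my knowledge `add_image` treats float images as values in [0, 1] and clips anything above, while ReLU outputs are unbounded. A minimal sketch of min-max scaling a single feature map before logging it is shown below. It uses NumPy on a made-up array, and `to_unit_range` is a hypothetical helper that is not part of the notebook.

```python
import numpy as np


def to_unit_range(feature_map: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Min-max scale a feature map to [0, 1]; eps avoids division by zero for constant maps."""
    fmin = feature_map.min()
    fmax = feature_map.max()
    return (feature_map - fmin) / (fmax - fmin + eps)


# Made-up 2x2 feature map with values outside [0, 1], as ReLU outputs often are
fmap = np.array([[0.0, 2.0],
                 [4.0, 8.0]], dtype=np.float32)

scaled = to_unit_range(fmap)
print(scaled)
```

Scaling each map independently makes weak activations visible, at the cost of losing the relative magnitude between channels.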


### Function for Adding the Model Graph to TensorBoard

In [27]:
def add_network_graph_tensorboard(model, inputs, tb_writer):
    # add model to tensorboard
    tb_writer.add_graph(model, inputs)
    return

### Function for Generating Predictions in the Correct Format

In [28]:
def get_target_and_classes_cm(model, dataloader, device):
    """
    Get true targets and predicted classes from the model.
    """
    model.eval()
    targets = []
    pred_classes = []
    
    with torch.no_grad():
        for data, target in dataloader:
            data = data.to(device)
            target = target.to(device)
            
            # Get model output (logits or probabilities)
            output = model(data)
            
            # Get predicted classes (use argmax to get the index of the highest probability)
            pred = torch.argmax(output, dim=1)
            
            # Append to lists
            pred_classes.append(pred.cpu().numpy())
            targets.append(target.cpu().numpy())
    
    # Convert lists to numpy arrays
    targets = np.concatenate(targets)
    pred_classes = np.concatenate(pred_classes)
    
    return targets, pred_classes

### Function for Adding the Confusion Matrix

In [29]:
def plot_confusion_matrix_to_tensorboard(model, tb_writer, dataloader, device, class_names, epoch, normalize=True):
    # Get true targets and predicted classes
    targets, pred_classes = get_target_and_classes_cm(model, dataloader, device)
    
    # Compute the confusion matrix
    cm = confusion_matrix(targets, pred_classes, normalize='true' if normalize else None)  
    
    # Create a plot using matplotlib
    fig, ax = plt.subplots(figsize=(8, 8))
    sns.heatmap(cm, annot=True, fmt='.2f', cmap='Blues', xticklabels=class_names, yticklabels=class_names)
    
    ax.set_xlabel('Predicted labels')
    ax.set_ylabel('True labels')
    ax.set_title(f'Confusion Matrix at Epoch {epoch}')
    
    # Add the confusion matrix figure to TensorBoard
    tb_writer.add_figure('Confusion Matrix', fig, global_step=epoch)
    
    # Close the plot to free memory
    plt.close(fig)

    return 
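
To show what `normalize='true'` means in the confusion matrix above, the sketch below counts a small matrix by hand and divides each row by the number of samples of that true class. It is a NumPy illustration on made-up labels, not a replacement for `sklearn.metrics.confusion_matrix`; the function name is hypothetical.

```python
import numpy as np


def confusion_matrix_row_normalized(targets, preds, num_classes):
    """Rows are true classes, columns are predicted classes; each row sums to 1."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(targets, preds):
        cm[t, p] += 1
    row_sums = cm.sum(axis=1, keepdims=True)
    # guard against classes that never occur in targets
    return cm / np.maximum(row_sums, 1)


# Made-up labels for three classes
targets = [0, 0, 1, 1, 1, 2]
preds = [0, 1, 1, 1, 0, 2]

cm_norm = confusion_matrix_row_normalized(targets, preds, num_classes=3)
print(cm_norm)
```

With this normalization the diagonal holds the per-class recall, which is why the heatmap is easy to read even when classes have different sizes.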

## <font>2.8. Main Function for Training and Validation</font>

In [30]:
def main(model, class_names, optimizer, tb_writer, scheduler=None, system_configuration=SystemConfiguration(), 
         training_configuration=TrainingConfiguration(), data_augmentation=False):
    
    # system configuration
    setup_system(system_configuration)

    # batch size
    batch_size_to_set = training_configuration.batch_size
    # num_workers
    num_workers_to_set = training_configuration.num_workers
    # epochs
    epoch_num_to_set = training_configuration.epochs_count

    # if GPU is available use training config, 
    # else lower batch_size and num_workers
    if torch.cuda.is_available():
        device = "cuda"
    else:
        device = "cpu"
        batch_size_to_set = 16
        num_workers_to_set = 2

    # data loader
    train_loader, test_loader = get_data(
        batch_size=batch_size_to_set,
        data_root=training_configuration.data_root,
        tb_writer=tb_writer,
        num_workers=num_workers_to_set,
        data_augmentation=data_augmentation
    )
    
    
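    # Update training configuration, keeping the other settings of the config that was passed in
    training_configuration = TrainingConfiguration(
        device=device,
        batch_size=batch_size_to_set,
        num_workers=num_workers_to_set,
        epochs_count=epoch_num_to_set,
        init_learning_rate=training_configuration.init_learning_rate,
        decay_rate=training_configuration.decay_rate,
        log_interval=training_configuration.log_interval,
        test_interval=training_configuration.test_interval,
        data_root=training_configuration.data_root
    )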
        
    # send model to device (GPU/CPU)
    model.to(training_configuration.device)
    
    
    # take one batch of test images as example input
    images, labels = next(iter(test_loader))
    
    
    # add network graph and initial feature maps to tensorboard
    images = images.to(training_configuration.device)
    add_network_graph_tensorboard(model, images, tb_writer)
    add_output_conv_start_training(model, images, tb_writer)

    best_loss = torch.tensor(np.inf)
    
    # epoch train/test loss
    epoch_train_loss = np.array([])
    epoch_test_loss = np.array([])
    
    # epoch train/test accuracy
    epoch_train_acc = np.array([])
    epoch_test_acc = np.array([])
    
    
    # training time measurement
    t_begin = time.time()
    for epoch in range(training_configuration.epochs_count):
        
        # Train
        train_loss, train_acc = train(training_configuration, model, optimizer, train_loader, epoch, tb_writer)
        
        # Log feature maps for the current epoch
        # (the test loader is not shuffled, so the same image is shown every epoch)
        images, labels = next(iter(test_loader))  # Get a batch of images
        add_output_conv(model, images, tb_writer, epoch)
        
        epoch_train_loss = np.append(epoch_train_loss, [train_loss])
        
        epoch_train_acc = np.append(epoch_train_acc, [train_acc])
        
        # add scalar (loss/accuracy) to tensorboard
        tb_writer.add_scalar('Loss/Train',train_loss, epoch)
        tb_writer.add_scalar('Accuracy/Train', train_acc, epoch)

        elapsed_time = time.time() - t_begin
        speed_epoch = elapsed_time / (epoch + 1)
        speed_batch = speed_epoch / len(train_loader)
        eta = speed_epoch * training_configuration.epochs_count - elapsed_time
        
        # add time metadata to tensorboard
        tb_writer.add_scalar('Time/elapsed_time', elapsed_time, epoch)
        tb_writer.add_scalar('Time/speed_epoch', speed_epoch, epoch)
        tb_writer.add_scalar('Time/speed_batch', speed_batch, epoch)
        tb_writer.add_scalar('Time/eta', eta, epoch)

        # Validate
        if epoch % training_configuration.test_interval == 0:
            current_loss, current_accuracy = validate(training_configuration, model, test_loader)
            
            epoch_test_loss = np.append(epoch_test_loss, [current_loss])
        
            epoch_test_acc = np.append(epoch_test_acc, [current_accuracy])
            
            # add scalar (loss/accuracy) to tensorboard
            tb_writer.add_scalar('Loss/Validation', current_loss, epoch)
            tb_writer.add_scalar('Accuracy/Validation', current_accuracy, epoch)
            
            # add scalars (loss/accuracy) to tensorboard
            tb_writer.add_scalars('Loss/train-val', {'train': train_loss, 
                                           'validation': current_loss}, epoch)
            tb_writer.add_scalars('Accuracy/train-val', {'train': train_acc, 
                                               'validation': current_accuracy}, epoch)
            
            if current_loss < best_loss:
                best_loss = current_loss
                
            # add wrong predicted image to tensorboard
            add_wrong_prediction_to_tensorboard(model, test_loader, 
                                                training_configuration.device, 
                                                tb_writer, epoch, max_images=300)
            
            # add confusion matrix to tensorboard
            plot_confusion_matrix_to_tensorboard(model, tb_writer, test_loader, 
                                                 training_configuration.device, class_names, 
                                                 epoch, normalize=True)
            
        # scheduler step/ update learning rate
        if scheduler is not None:
            scheduler.step()
            
        # adding model weights to tensorboard as histogram
        add_model_weights_as_histogram(model, tb_writer, epoch)
        
        # add pr curves to tensor board
        add_pr_curves_to_tensorboard(model, test_loader, 
                                     training_configuration.device, 
                                     tb_writer, epoch, num_classes=10)
        
                
    print("Total time: {:.2f}, Best Loss: {:.3f}".format(time.time() - t_begin, best_loss))
    
    
    
    return model, epoch_train_loss, epoch_train_acc, epoch_test_loss, epoch_test_acc
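
The timing scalars logged in `main` follow simple arithmetic: the average time per finished epoch is extrapolated to the total number of epochs, and the time already spent is subtracted. A minimal sketch with made-up timings is below; `estimate_eta` is a hypothetical helper that mirrors the expressions used in the loop.

```python
def estimate_eta(elapsed_time: float, epoch: int, epochs_count: int) -> float:
    """Remaining time in seconds, assuming every epoch takes the average time so far."""
    speed_epoch = elapsed_time / (epoch + 1)  # epoch is 0-based, so epoch + 1 epochs are done
    return speed_epoch * epochs_count - elapsed_time


# Made-up example: two epochs (index 0 and 1) took 120 s in total, 50 epochs planned
print(estimate_eta(elapsed_time=120.0, epoch=1, epochs_count=50))
```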

## <font>2.9. Optimizer and Scheduler</font>

In [31]:
def get_optimizer_and_scheduler(model):
    train_config = TrainingConfiguration()

    init_learning_rate = train_config.init_learning_rate

    # optimizer
    optimizer = optim.SGD(
        model.parameters(),
        lr = init_learning_rate,
        momentum = 0.9
    )

    decay_rate = train_config.decay_rate

    lmbda = lambda epoch: 1/(1 + decay_rate * epoch)

    # Scheduler
    scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lmbda)
    
    return optimizer, scheduler
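
`LambdaLR` multiplies the initial learning rate by the value of the lambda for the current epoch, so the schedule above can be tabulated without PyTorch. The sketch below uses the default values from `TrainingConfiguration`; `lr_at` is a hypothetical helper for illustration.

```python
init_learning_rate = 0.02  # TrainingConfiguration.init_learning_rate
decay_rate = 0.1           # TrainingConfiguration.decay_rate


def lr_at(epoch: int) -> float:
    """Learning rate after `epoch` scheduler steps: init_lr * 1 / (1 + decay_rate * epoch)."""
    return init_learning_rate * (1 / (1 + decay_rate * epoch))


for epoch in (0, 1, 10, 49):
    print(epoch, round(lr_at(epoch), 5))
```

With these defaults the learning rate halves after 10 epochs and is roughly six times smaller than the initial value by the last of the 50 epochs.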
    


# 3. Model Definition<a name="model"></a>

In [32]:
class MediumModel(nn.Module):
    def __init__(self, dropout=0.0, batch_norm=False):
        super().__init__()

        # convolution layers
        if batch_norm:
            self._body = nn.Sequential(
                nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5),
                nn.BatchNorm2d(16),
                nn.ReLU(inplace=True),

                nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5),
                nn.BatchNorm2d(32),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=2),

                nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3),
                nn.BatchNorm2d(64),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=2),
                nn.Dropout(dropout)
            )
        else:
             self._body = nn.Sequential(
                nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5),
                nn.ReLU(inplace=True),

                nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=2),

                nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=2),
                nn.Dropout(dropout)
            )
            
        
        # Fully connected layers
        self._head = nn.Sequential(
            
            nn.Linear(in_features=64 * 4 * 4, out_features=512), 
            nn.ReLU(inplace=True),
            nn.Dropout(dropout),
            
            nn.Linear(in_features=512, out_features=128), 
            nn.ReLU(inplace=True),
            nn.Dropout(dropout),
            
            nn.Linear(in_features=128, out_features=10)
        )

    def forward(self, x):
        x = self._body(x)
        x = x.view(x.size()[0], -1)
        x = self._head(x)
        return x
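
The `in_features=64 * 4 * 4` of the first linear layer follows from the spatial size of the feature maps after `_body`. The sketch below traces a 28x28 input through the convolutions (no padding, stride 1) and the 2x2 max-pooling layers. `conv_out` and `pool_out` are hypothetical helpers implementing the standard output-size formulas.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size: int, kernel: int) -> int:
    """Output size of max pooling with stride equal to the kernel size."""
    return size // kernel


size = 28                  # Fashion MNIST images are 28x28
size = conv_out(size, 5)   # Conv2d(1, 16, kernel_size=5)  -> 24
size = conv_out(size, 5)   # Conv2d(16, 32, kernel_size=5) -> 20
size = pool_out(size, 2)   # MaxPool2d(2)                  -> 10
size = conv_out(size, 3)   # Conv2d(32, 64, kernel_size=3) -> 8
size = pool_out(size, 2)   # MaxPool2d(2)                  -> 4

flat_features = 64 * size * size  # 64 channels of 4x4 maps
print(size, flat_features)
```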

## 3.1. Experiment 1: Training without Regularization

In [33]:

model = MediumModel()

# get optimizer and scheduler
optimizer, scheduler = get_optimizer_and_scheduler(model)

# Tensorboard summary writer
no_regularization_sw = SummaryWriter(os.path.join(logdir, 'no_regularization'))   

# train and validate
model, train_loss_exp2, train_acc_exp2, val_loss_exp2, val_acc_exp2 = main(model,
                                                                           fashion_mnist_classes,
                                                                           optimizer,
                                                                           no_regularization_sw,
                                                                           scheduler)
no_regularization_sw.close()

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to /kaggle/working/FashionMNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 26421880/26421880 [00:02<00:00, 12774778.71it/s]


Extracting /kaggle/working/FashionMNIST/raw/train-images-idx3-ubyte.gz to /kaggle/working/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to /kaggle/working/FashionMNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 29515/29515 [00:00<00:00, 297235.17it/s]


Extracting /kaggle/working/FashionMNIST/raw/train-labels-idx1-ubyte.gz to /kaggle/working/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to /kaggle/working/FashionMNIST/raw/t10k-images-idx3-ubyte.gz


  0%|          | 0/4422102 [00:00<?, ?it/s]TensorBoard 2.16.2 at http://0.0.0.0:6006/ (Press CTRL+C to quit)
100%|██████████| 4422102/4422102 [00:05<00:00, 754113.65it/s] 


Extracting /kaggle/working/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to /kaggle/working/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to /kaggle/working/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 5148/5148 [00:00<00:00, 9049571.25it/s]


Extracting /kaggle/working/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to /kaggle/working/FashionMNIST/raw

Total time: 3019.80, Best Loss: 0.251


## 3.2. Experiment 2: Training with Regularization

In [34]:
model = MediumModel(0.25, batch_norm=True)

optimizer, scheduler = get_optimizer_and_scheduler(model)

# Tensorboard summary writer
regularization_sw = SummaryWriter(os.path.join(logdir, 'regularization'))  

model, train_loss_exp9, train_acc_exp9, val_loss_exp9, val_acc_exp9 = main(model,
                                                                           fashion_mnist_classes,
                                                                           optimizer, 
                                                                           regularization_sw,
                                                                           scheduler,
                                                                           data_augmentation=True)

regularization_sw.close()

Total time: 3270.67, Best Loss: 0.201


***Cleaning Up the Output***

In [35]:
rm /kaggle/working/ngrok-v3-stable-linux-amd64.tgz
