# Q&A
* how to determine data split between train, validation and test sets? Since we cannot use the test set for any observation, is it even possible?  
* is there conv3d, to use for the viedo data?
* maxpool vs avgpool?
* why big filters do not improve the performance by much?
* how to manage memory correctly? because i think that my current code does generate some trash, that is noe beeing removes (as from time to time i use all available memory, but it appears to be rather random)
* increasing the batch size seems to improve the performance, but worsen the validation accuracy

# Setup

## Libraries

In [3]:
%matplotlib inline

In [4]:
# !pip install matplotlib torch torchvision numpy pandas scikit-learn wandb

In [5]:
import os
import time

import matplotlib.pyplot as plt
import numpy as np
from IPython.display import clear_output
from tqdm.auto import tqdm

import torch
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torch import nn

import wandb

from io import StringIO
import sys

  from .autonotebook import tqdm as notebook_tqdm


## Config

In [6]:
batch_size = 64

In [7]:
_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {_device}")

Using device: cuda


In [8]:
data_augmentation = True

### Helpers


In [9]:
# https://stackoverflow.com/questions/16571150/how-to-capture-stdout-output-from-a-python-function-call

class Capturing(list):
    def __enter__(self):
        self._stdout = sys.stdout
        sys.stdout = self._stringio = StringIO()
        return self
    def __exit__(self, *args):
        self.extend(self._stringio.getvalue().splitlines())
        del self._stringio    # free up some memory
        sys.stdout = self._stdout


## Import data
**About data:** The dataset consists of 102 flower categories, and each class has between 40 and 258 images. The images have large scale, pose, and light variations. In addition, there are categories that have large variations within the category and several very similar categories.  
The default split of the dataset is 1020, 1020 and 6149 images for training, validation and test sets respectively.
If you can handle the bigger training dataset, you can experiment by taking up to 80% of the test set for training.

### Custom data loader

In [10]:
class InMemDataLoader(object):
    """
    A data loader that keeps all data in CPU or GPU memory.
    """

    __initialized = False

    def __init__(
        self,
        dataset,
        batch_size=1,
        shuffle=False,
        sampler=None,
        batch_sampler=None,
        drop_last=False,
    ):
        """A torch dataloader that fetches data from memory."""
        batches = []
        for i in tqdm(range(len(dataset))):
            batch = [torch.tensor(t) for t in dataset[i]]
            batches.append(batch)
        tensors = [torch.stack(ts) for ts in zip(*batches)]
        dataset = torch.utils.data.TensorDataset(*tensors)
        self.dataset = dataset
        self.batch_size = batch_size
        self.drop_last = drop_last

        if batch_sampler is not None:
            if batch_size > 1 or shuffle or sampler is not None or drop_last:
                raise ValueError(
                    "batch_sampler option is mutually exclusive "
                    "with batch_size, shuffle, sampler, and "
                    "drop_last"
                )
            self.batch_size = None
            self.drop_last = None

        if sampler is not None and shuffle:
            raise ValueError("sampler option is mutually exclusive with " "shuffle")

        if batch_sampler is None:
            if sampler is None:
                if shuffle:
                    sampler = torch.utils.data.RandomSampler(dataset)
                else:
                    sampler = torch.utils.data.SequentialSampler(dataset)
            batch_sampler = torch.utils.data.BatchSampler(
                sampler, batch_size, drop_last
            )

        self.sampler = sampler
        self.batch_sampler = batch_sampler
        self.__initialized = True

    def __setattr__(self, attr, val):
        if self.__initialized and attr in ("batch_size", "sampler", "drop_last"):
            raise ValueError(
                "{} attribute should not be set after {} is "
                "initialized".format(attr, self.__class__.__name__)
            )

        super(InMemDataLoader, self).__setattr__(attr, val)

    def __iter__(self):
        for batch_indices in self.batch_sampler:
            yield self.dataset[batch_indices]

    def __len__(self):
        return len(self.batch_sampler)

    def to(self, device):
        self.dataset.tensors = tuple(t.to(device) for t in self.dataset.tensors)
        return self

### Data loading function

In [11]:
def load_flowers(
    batch_size=64,
    test_train_valid_percent=(0.1, 0.8, 0.1),
    train_transform=None,
    eval_transform=None,
    Loader=torch.utils.data.DataLoader,
):
    """
    Load the flowers dataset with the given batch size and transformation.
    The dataset is split into train, validation, and test sets according to the specified percentages.
    The data is loaded using the specified loader class.
    """

    if train_transform is None:
        train_transform = transforms.Compose([
            transforms.Resize((224, 224)),

            transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
            transforms.RandomRotation(15),
            transforms.RandomHorizontalFlip(),
            transforms.RandomAdjustSharpness(sharpness_factor=2),
            transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),

            transforms.ToTensor(),
            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
        ])
    if eval_transform is None:
        eval_transform = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
        ])


    test_percent, train_percent, valid_percent = test_train_valid_percent

    # TRAIN
    train = torchvision.datasets.Flowers102(
        root='./data', split='train', download=True,
        transform=train_transform if data_augmentation else eval_transform
    )
    train = torch.utils.data.Subset(train, range(int(len(train) * train_percent)))

    # TEST
    test = torchvision.datasets.Flowers102(
        root='./data', split='test', download=True, transform=eval_transform
    )
    test = torch.utils.data.Subset(test, range(int(len(test) * test_percent)))

    # VALID
    valid = torchvision.datasets.Flowers102(
        root='./data', split='val', download=True, transform=eval_transform
    )
    valid = torch.utils.data.Subset(valid, range(int(len(valid) * valid_percent)))

    data_loaders = {
        'train': Loader(train, batch_size=batch_size, shuffle=True),
        'valid': Loader(valid, batch_size=batch_size, shuffle=True),
        'test': Loader(test, batch_size=batch_size, shuffle=True),
    }

    return data_loaders


In [12]:
data_loaders = load_flowers(batch_size, (1, 1, 1), Loader=InMemDataLoader)

  batch = [torch.tensor(t) for t in dataset[i]]
100%|██████████| 1020/1020 [00:06<00:00, 153.12it/s]
100%|██████████| 1020/1020 [00:03<00:00, 262.89it/s]
100%|██████████| 6149/6149 [00:22<00:00, 272.92it/s]


# Solution

## Task 1
* Your task is to implement a convolutional neural network from scratch using PyTorch.
* Your CNN should consist of convolutional layers (Conv2D), pooling layers (MaxPooling2D), activation layers (e.g., ReLU), and fully connected layers (if needed).

### Import data

In [21]:
data_loaders = load_flowers(batch_size, (1, 1, 1), Loader=InMemDataLoader)

  batch = [torch.tensor(t) for t in dataset[i]]
100%|██████████| 1020/1020 [00:07<00:00, 145.32it/s]
100%|██████████| 1020/1020 [00:03<00:00, 261.81it/s]
100%|██████████| 6149/6149 [00:24<00:00, 250.25it/s]


### Model class

In [13]:
class Model1(nn.Module):
    def __init__(self, *args, **kwargs):
        super(Model1, self).__init__()
        self.layers = nn.Sequential(*args, **kwargs)

    def forward(self, x):
        x = self.layers(x)
        return x

In [14]:
def evaluate(model, data_loader):
    loss = 0
    correct = 0
    loss_fn = nn.CrossEntropyLoss(
        reduction='sum',
    )

    with torch.no_grad():
        for inputs, labels in data_loader:
            inputs, labels = inputs.to(_device), labels.to(_device)

            outputs = model(inputs)
            loss += loss_fn(outputs, labels).item()
            pred = outputs.argmax(
                dim=1, keepdim=True
            )
            correct += (
                pred.eq(labels.view_as(pred)).sum().item()
            )

    loss /= len(data_loader.dataset)
    accuracy = correct / len(data_loader.dataset)
    return loss, accuracy

In [15]:
def train_step(model, optimizer, loss_fn):
    for batch_idx, (inputs, labels) in enumerate(data_loaders['train']):
        inputs, labels = inputs.to(_device), labels.to(_device)

        optimizer.zero_grad()   # Zero gradients
        logits = model(inputs)   # Forward pass
        loss = loss_fn(logits, labels)  # Compute loss
        loss.backward() # Backward pass
        optimizer.step()    # Update weights

        wandb.log({
            "loss": loss.item(),
            "batch_idx": batch_idx,
        })

In [16]:
def init_weights(model):
    for layer in model.modules():
        if isinstance(layer, nn.Conv2d) or isinstance(layer, nn.Linear):
            nn.init.kaiming_uniform_(layer.weight, nonlinearity='relu')
            if layer.bias is not None:
                nn.init.zeros_(layer.bias)
        elif isinstance(layer, nn.BatchNorm2d):
            nn.init.ones_(layer.weight)
            if layer.bias is not None:
                nn.init.zeros_(layer.bias)

### Model creation

#### Some random small net
appears to be to small

In [None]:
model_type = "v1"

model = Model1(
    nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),

    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Dropout(p=0.25),

    nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),

    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Dropout(p=0.25),

    nn.Flatten(),
    nn.Linear(128 * 56 * 56, 512),
    nn.ReLU(),

    nn.Dropout(p=0.5),
    nn.Linear(512, 102),
)

#### Simmilar to previous one, but with more layers

In [27]:
# version 2
model_type = "v2"

model = Model1(
    nn.Conv2d(3, 128, kernel_size=11, stride=4, padding=2),
    nn.ReLU(),

    nn.MaxPool2d(kernel_size=5, stride=2),
    nn.Dropout(p=0.25),

    nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),

    nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Dropout(p=0.25),

    nn.Conv2d(256, 256, kernel_size=3),
    nn.ReLU(),

    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Dropout(p=0.25),

    nn.Flatten(),
    nn.Linear(6400, 512),
    nn.ReLU(),

    nn.Dropout(p=0.5),
    nn.Linear(512, 102),
)

#### Lots of small convolutions
learns a lot slower, is able to reach high valid accuracy quickly (without reaching high train accuracy)

In [114]:
# stacked 3x3 convs
model_type = "v3_stack3x3"

model = Model1(
    nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=2),
    nn.BatchNorm2d(64),
    nn.ReLU(),

    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Dropout(p=0.25),

    nn.Conv2d(64, 128, kernel_size=3, stride=2),
    nn.BatchNorm2d(128),
    nn.ReLU(),

    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Dropout(p=0.25),

    nn.Conv2d(128, 256, kernel_size=3),
    nn.BatchNorm2d(256),
    nn.ReLU(),
    
    nn.Conv2d(256, 256, kernel_size=3),
    nn.BatchNorm2d(256),
    nn.ReLU(),

    nn.MaxPool2d(kernel_size=3),
    nn.Dropout(p=0.25),

    nn.Flatten(),
    nn.Linear(2304, 4096),
    nn.ReLU(),

    # nn.Linear(4096, 4096),
    # nn.ReLU(),

    nn.Linear(4096, 512),
    nn.ReLU(),

    nn.Dropout(p=0.5),
    nn.Linear(512, 102),
)

#### Some variation inspired by alexnet

In [17]:
# version 3 = alexnet
model_type = "alexnet"

model = Model1(
    nn.Conv2d(3, 96, kernel_size=11, stride=4),
    nn.BatchNorm2d(96),
    nn.ReLU(),

    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Dropout(p=0.25),

    nn.Conv2d(96, 256, kernel_size=5, padding=2),
    nn.BatchNorm2d(256),
    nn.ReLU(),

    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Dropout(p=0.25),

    nn.Conv2d(256, 384, kernel_size=3, padding=1),
    nn.BatchNorm2d(384),
    nn.ReLU(),
    nn.Dropout(p=0.25),

    nn.Conv2d(384, 384, kernel_size=3, padding=1),
    nn.BatchNorm2d(384),
    nn.ReLU(),
    nn.Dropout(p=0.25),

    nn.Conv2d(384, 256, kernel_size=3, padding=1),
    nn.BatchNorm2d(256),
    nn.ReLU(),

    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Dropout(p=0.25),


    nn.Flatten(),
    nn.Linear(6400, 4096),
    nn.ReLU(),

    nn.Linear(4096, 4096),
    nn.ReLU(),

    nn.Dropout(p=0.5),
    nn.Linear(4096, 102),
    # nn.Softmax(dim=1),
)

### Training loop

In [None]:
model.to(_device)

epochs = 100

learning_rate = 0.0001
momentum = 0.9
weight_decay = 0.0005
betas = (0.9, 0.999)

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=learning_rate,
    betas=betas,
    weight_decay=weight_decay,
)
# optimizer = torch.optim.SGD(
#     model.parameters(),
#     lr=learning_rate,
#     momentum=momentum,
#     weight_decay=weight_decay,
# )

loss_fn = nn.CrossEntropyLoss()

init_weights(model)

run = wandb.init(
    entity = "fejowo5522-",
    project= "NN_list3_OxFlow",
    config = {
        "task": 1,
        "batch_size": batch_size,
        "epochs": epochs,
        "optimizer": "Adam",
        "learning_rate": learning_rate,
        # "momentum": momentum,
        "betas": betas,
        "weight_decay": weight_decay,
        "loss_fn": "cross_entropy",
        "model": model_type,
        "data_augmentation": data_augmentation,
    }
)
run.name = "Task1_" + str(int(time.time()))

model.train()
for epoch in tqdm(range(epochs), desc="Training", leave=False):
    train_step(model, optimizer, loss_fn)

    for loader, split in [
        (data_loaders['train'], 'train'),
        (data_loaders['valid'], 'valid'),
    ]:
        loss, accuracy = evaluate(model, loader)
        wandb.log({
            "epoch": epoch,
            f"{split}_loss": loss,
            f"{split}_accuracy": accuracy,
        })

model.eval()
loss, accuracy = evaluate(model, data_loaders['test'])
wandb.log({
    "test_loss": loss,
    "test_accuracy": accuracy,
})
print(
    "Test set: Average loss: {:8.6f}, Accuracy: ({:4.1f}%)".format(
        loss,
        100.0 * accuracy,
    )
)

run.finish()

[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
[34m[1mwandb[0m: Currently logged in as: [33mfejowo5522[0m ([33mfejowo5522-[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


                                                           

Test set: Average loss: 5.529020, Accuracy: (25.7%)


0,1
batch_idx,▆▅▁▆▄▃▁▆▅█▁▇▆▄█▃▇▃▁▁▇▇▄▁▅▇█▁▇▁▅▅▇▃▅▄▅█▄▁
epoch,▁▁▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▃▄▄▄▄▄▄▄▆▆▆▆▆▆▆▆▇▇▇▇▇██
loss,█▇▇▆▅▄▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,▁▁▂▂▃▃▄▄▆███████████████████████████████
train_loss,█▆▆▆▄▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
valid_accuracy,▁▁▃▄▄▅▆▆▇▇▇▇▇▇▇▇▇▇▇▇█▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇█▇█
valid_loss,▅▄▂▁▁▂▃▄▅▆▆▅▆▆▇▇▆▇▇▇▇▇▇█▇▇█▇▇▇▇▇██▇▇▇██▇

0,1
batch_idx,15.0
epoch,299.0
loss,0.00363
test_accuracy,0.25695
test_loss,5.52902
train_accuracy,0.99804
train_loss,0.00727
valid_accuracy,0.30686
valid_loss,4.92447


## Task 2
* Train your CNN on different training set sized (10%, 20%, 50%, 80%, 100%) and evaluate the performance on the validation set and test set.
    * Report the accuracy and loss on the validation set and test set for each training set size.
* Train your CNN on the full training set plus 20%, 50% and 80% of the test set and evaluate the performance on the validation set and the remaining test set.
    * Report the accuracy and loss on the validation set and remaining test set for each training set size.
* Compare the performance of your CNN on the different training set sizes and analyze the results.


In [31]:
training_sizes = [0.1, 0.2, 0.5, 0.8, 1, 1.2, 1.5, 1.8]
# training_sizes = [1.2, 1.5, 1.8]

In [32]:
def _train_step(model, optimizer, loss_fn, data_loader, max_batch_percent=1, reverse_loop=False):
    data_iter = iter(data_loader)

    if reverse_loop:
        data_iter = reversed(list(data_iter))

    for batch_idx, (inputs, labels) in enumerate(data_iter):
        if batch_idx >= max_batch_percent * len(data_loader):
            break

        inputs, labels = inputs.to(_device), labels.to(_device)

        optimizer.zero_grad()   # Zero gradients
        logits = model(inputs)   # Forward pass
        loss = loss_fn(logits, labels)  # Compute loss
        loss.backward() # Backward pass
        optimizer.step()    # Update weights

        wandb.log({
            "loss": loss.item(),
            "batch_idx": batch_idx,
        })


def train_step(model, optimizer, loss_fn, max_batch_percent=1):
    _train_step(model, optimizer, loss_fn, data_loaders['train'], max_batch_percent)

    if max_batch_percent > 1:
        _train_step(model, optimizer, loss_fn, data_loaders['test'], max_batch_percent - 1, True)

In [33]:
def evaluate(model, data_loader, max_batch_percent=1):
    loss = 0
    correct = 0
    loss_fn = nn.CrossEntropyLoss(
        reduction='sum',
    )

    with torch.no_grad():
        for idx, (inputs, labels) in enumerate(data_loader):


            inputs, labels = inputs.to(_device), labels.to(_device)

            outputs = model(inputs)
            loss += loss_fn(outputs, labels).item()
            pred = outputs.argmax(
                dim=1, keepdim=True
            )
            correct += (
                pred.eq(labels.view_as(pred)).sum().item()
            )

    loss /= len(data_loader.dataset)
    accuracy = correct / len(data_loader.dataset)
    return loss, accuracy

In [34]:
model.to(_device)

epochs = 5

learning_rate = 0.0001
momentum = 0.9
weight_decay = 0.0005
betas = (0.9, 0.999)

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=learning_rate,
    betas=betas,
    weight_decay=weight_decay,
)
# optimizer = torch.optim.SGD(
#     model.parameters(),
#     lr=learning_rate,
#     momentum=momentum,
#     weight_decay=weight_decay,
# )

loss_fn = nn.CrossEntropyLoss()

for training_size in training_sizes:
    print(f"====>   Training size: {training_size}")

    init_weights(model)

    run = wandb.init(
        entity = "fejowo5522-",
        project= "NN_list3_OxFlow",
        config = {
            "task": 2,
            "batch_size": batch_size,
            "epochs": epochs,
            "optimizer": "SGD",
            "learning_rate": learning_rate,
            # "momentum": momentum,
            "betas": betas,
            "weight_decay": weight_decay,
            "loss_fn": "cross_entropy",
            "model": model_type,
            "training_size": training_size,
            "data_augmentation": data_augmentation,
        }
    )
    run.name = "Task2_" + str(int(time.time()))

    model.train()

    for epoch in tqdm(range(epochs), desc="Training", leave=False):
        train_step(model, optimizer, loss_fn, training_size)

        for loader, split in [
            (data_loaders['train'], 'train'),
            (data_loaders['valid'], 'valid'),
        ]:
            loss, accuracy = evaluate(model, loader)
            wandb.log({
                "epoch": epoch,
                f"{split}_loss": loss,
                f"{split}_accuracy": accuracy,
            })

    model.eval()
    loss, accuracy = evaluate(model, data_loaders['test'], 1 if training_size < 1 else 2 - training_size)
    wandb.log({
        "test_loss": loss,
        "test_accuracy": accuracy,
    })
    print(
        "Test set: Average loss: {:8.6f}, Accuracy: ({:4.1f}%)".format(
            loss,
            100.0 * accuracy,
        )
    )

    run.finish()

====>   Training size: 0.1


                                                       

Test set: Average loss: 4.734443, Accuracy: ( 1.8%)


0,1
batch_idx,▁█▁█▁█▁█▁█
epoch,▁▁▃▃▅▅▆▆██
loss,▄█▃▄▅▃▁▄▂▂
test_accuracy,▁
test_loss,▁
train_accuracy,▅▁▂▄█
train_loss,█▆▄▃▁
valid_accuracy,▄▂█▄▁
valid_loss,█▆▄▃▁

0,1
batch_idx,1.0
epoch,4.0
loss,8.8791
test_accuracy,0.01821
test_loss,4.73444
train_accuracy,0.01863
train_loss,7.77216
valid_accuracy,0.0098
valid_loss,8.02943


====>   Training size: 0.2


                                                       

Test set: Average loss: 4.588317, Accuracy: ( 2.9%)


0,1
batch_idx,▁▃▆█▁▃▆█▁▃▆█▁▃▆█▁▃▆█
epoch,▁▁▃▃▅▅▆▆██
loss,█▇▅▅▄▅▄▄▃▃▂▃▃▂▂▂▁▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,▁▄█▄▃
train_loss,█▅▃▂▁
valid_accuracy,▂▇▅█▁
valid_loss,█▆▃▂▁

0,1
batch_idx,3.0
epoch,4.0
loss,5.73706
test_accuracy,0.02944
test_loss,4.58832
train_accuracy,0.01078
train_loss,5.65553
valid_accuracy,0.0098
valid_loss,5.66893


====>   Training size: 0.5


                                                       

Test set: Average loss: 4.544996, Accuracy: ( 1.0%)


0,1
batch_idx,▁▂▃▄▅▆▇█▁▂▃▄▅▆▇█▁▂▃▄▅▆▇█▁▂▃▄▅▆▇█▁▂▃▄▅▆▇█
epoch,▁▁▃▃▅▅▆▆██
loss,█▅▆▅▆▅▄▄▄▄▃▃▄▄▃▃▂▂▂▂▃▃▂▂▁▁▂▂▂▁▁▂▂▁▂▁▁▂▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,▁▃▄▁█
train_loss,█▄▂▂▁
valid_accuracy,▂▁▁█▄
valid_loss,█▄▂▁▁

0,1
batch_idx,7.0
epoch,4.0
loss,5.04862
test_accuracy,0.01025
test_loss,4.545
train_accuracy,0.01765
train_loss,4.96135
valid_accuracy,0.01765
valid_loss,5.01469


====>   Training size: 0.8


                                                       

Test set: Average loss: 4.527538, Accuracy: ( 1.8%)


0,1
batch_idx,▂▃▃▄▅▆▆▇▁▂▄▅▆▆▇█▂▂▃▄▅▆▆▇▇▁▂▂▃▄▅▆▆▇█▃▄▅▅█
epoch,▁▁▃▃▅▅▆▆██
loss,█▆▅▅▄▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▂▁▂▁▁▁▂▁▁▁▁▁▁▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,▁▃▄▇█
train_loss,█▄▂▁▁
valid_accuracy,▃▁▁█▅
valid_loss,█▃▂▂▁

0,1
batch_idx,12.0
epoch,4.0
loss,5.01719
test_accuracy,0.01789
test_loss,4.52754
train_accuracy,0.02451
train_loss,4.70549
valid_accuracy,0.01961
valid_loss,4.76114


====>   Training size: 1


                                                       

Test set: Average loss: 4.391946, Accuracy: ( 3.7%)


0,1
batch_idx,▁▂▃▃▄▅▆▇██▁▂▅▆▇█▁▂▅▅▆▇██▁▃▄▅▆▇██▁▁▂▄▅▆▇█
epoch,▁▁▃▃▅▅▆▆██
loss,██▅▄▃▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▁▂▁▂▁▁▁▂▁▁▁▁▁▂▁▁▁▁▂
test_accuracy,▁
test_loss,▁
train_accuracy,▁▂▅█▅
train_loss,█▄▂▁▁
valid_accuracy,▁▃▄▄█
valid_loss,█▃▂▁▁

0,1
batch_idx,15.0
epoch,4.0
loss,4.80637
test_accuracy,0.03692
test_loss,4.39195
train_accuracy,0.02647
train_loss,4.45508
valid_accuracy,0.02745
valid_loss,4.55347


====>   Training size: 1.2


                                                       

Test set: Average loss: 3.945188, Accuracy: ( 8.8%)


0,1
batch_idx,▁▃▄▇▂▄▆█▁▂▄▆▁▂▄▆▇▃▅▂▄▄▅█▁▁▂▃▃▄▇█▁▂▄▅▇▁▅█
epoch,▁▁▃▃▅▅▆▆██
loss,██▇▅▄▄▃▃▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▁▁▂▂▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,▁▃▆▆█
train_loss,█▅▄▂▁
valid_accuracy,▁▃▄█▇
valid_loss,█▅▄▂▁

0,1
batch_idx,19.0
epoch,4.0
loss,4.30529
test_accuracy,0.08847
test_loss,3.94519
train_accuracy,0.04216
train_loss,4.36202
valid_accuracy,0.04118
valid_loss,4.38088


====>   Training size: 1.5


                                                       

Test set: Average loss: 3.587945, Accuracy: (14.1%)


0,1
batch_idx,▂▃▁▁▃▇█▃▆▆█▁▂▂▂▁▁▁▅▆▆▇▇▂▃▂▃▆▆▆▂▂▂▃▂▃▄▄▅▆
epoch,▁▁▃▃▅▅▆▆██
loss,██▇▅▄▄▃▃▄▃▃▃▃▂▂▃▃▂▂▂▂▂▃▃▃▂▃▂▂▂▃▂▃▂▁▁▂▂▂▁
test_accuracy,▁
test_loss,▁
train_accuracy,▁▃▅▆█
train_loss,█▆▄▂▁
valid_accuracy,▁▄▅▆█
valid_loss,█▆▃▂▁

0,1
batch_idx,48.0
epoch,4.0
loss,3.47943
test_accuracy,0.14149
test_loss,3.58794
train_accuracy,0.07255
train_loss,3.99466
valid_accuracy,0.08922
valid_loss,3.98025


====>   Training size: 1.8


                                                       

Test set: Average loss: 3.212188, Accuracy: (20.0%)


0,1
batch_idx,▁▃▄▄▅▇▁▂▁▂▄▅▆▇▇▁▃▃▄▄▅▅▅██▂▄▅▅▆▇▇██▁▁▄▅▅▆
epoch,▁▁▃▃▅▅▆▆██
loss,██▅▄▄▄▄▄▄▄▃▃▃▃▄▃▃▄▅▂▃▂▃▃▃▃▂▂▂▁▂▂▂▂▃▃▂▁▂▁
test_accuracy,▁
test_loss,▁
train_accuracy,▁▄▅▆█
train_loss,█▆▄▃▁
valid_accuracy,▁▄▅▆█
valid_loss,█▆▄▃▁

0,1
batch_idx,77.0
epoch,4.0
loss,3.08039
test_accuracy,0.20003
test_loss,3.21219
train_accuracy,0.13235
train_loss,3.5716
valid_accuracy,0.13529
valid_loss,3.5493


## Task 3
* Implement a baseline AlexNet model using PyTorch.
* Training AlexNet may take a long time, so try to use GPU acceleration if available.


### Model class

In [35]:
class AlexNetBaseline(nn.Module):
    def __init__(self, num_classes=102):
        super(AlexNetBaseline, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

    def init_weights(self):
        for layer in self.modules():
            if isinstance(layer, nn.Conv2d) or isinstance(layer, nn.Linear):
                nn.init.kaiming_uniform_(layer.weight, nonlinearity='relu')
                if layer.bias is not None:
                    nn.init.zeros_(layer.bias)
            elif isinstance(layer, nn.BatchNorm2d):
                nn.init.ones_(layer.weight)
                if layer.bias is not None:
                    nn.init.zeros_(layer.bias)

    def evaluate(self, data_loader):
        loss = 0
        correct = 0
        loss_fn = nn.CrossEntropyLoss(
            reduction='sum',
        )

        with torch.no_grad():
            for inputs, labels in data_loader:
                inputs, labels = inputs.to(_device), labels.to(_device)

                outputs = self(inputs)
                loss += loss_fn(outputs, labels).item()
                pred = outputs.argmax(
                    dim=1, keepdim=True
                )
                correct += (
                    pred.eq(labels.view_as(pred)).sum().item()
                )

        loss /= len(data_loader.dataset)
        accuracy = correct / len(data_loader.dataset)
        return loss, accuracy

In [36]:
def train_step(model, optimizer, loss_fn, data_loader):
    data_iter = iter(data_loader)

    for batch_idx, (inputs, labels) in enumerate(data_iter):
        inputs, labels = inputs.to(_device), labels.to(_device)

        optimizer.zero_grad()   # Zero gradients
        logits = model(inputs)   # Forward pass
        loss = loss_fn(logits, labels)  # Compute loss
        loss.backward() # Backward pass
        optimizer.step()    # Update weights

        wandb.log({
            "loss": loss.item(),
            "batch_idx": batch_idx,
        })

### Train model

In [39]:
batch_size = 64
data_loaders = load_flowers(batch_size, (1, 1, 1), Loader=torch.utils.data.DataLoader)

In [40]:
# initialize
alexnet_model = AlexNetBaseline(num_classes=102).to(_device)
print(alexnet_model)

AlexNetBaseline(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout

In [41]:
# parameters
epochs = 100
learning_rate = 0.001
weight_decay = 0.0005

# optimizer and loss
optimizer = torch.optim.Adam(alexnet_model.parameters(), lr=learning_rate, weight_decay=weight_decay)
loss_fn = nn.CrossEntropyLoss()

alexnet_model.init_weights()

run = wandb.init(
    entity = "fejowo5522-",
    project= "NN_list3_OxFlow",
    config = {
        "task": 3,
        "batch_size": batch_size,
        "epochs": epochs,
        "optimizer": "Adam",
        "learning_rate": learning_rate,
        "weight_decay": weight_decay,
        "loss_fn": "cross_entropy",
        "model": 'alexnet',
        "data_augmentation": data_augmentation,
    }
)
run.name = "Task3_" + str(int(time.time()))


alexnet_model.train()
for epoch in tqdm(range(epochs), desc="Training", leave=False):
    train_step(alexnet_model, optimizer, loss_fn, data_loaders['train'])

    for loader, split in [
        (data_loaders['train'], 'train'),
        (data_loaders['valid'], 'valid'),
    ]:
        loss, accuracy = alexnet_model.evaluate(loader)
        wandb.log({
            "epoch": epoch,
            f"{split}_loss": loss,
            f"{split}_accuracy": accuracy,
        })

alexnet_model.eval()
loss, accuracy = alexnet_model.evaluate(data_loaders['test'])
wandb.log({
    "test_loss": loss,
    "test_accuracy": accuracy,
})
print(
    "Test set: Average loss: {:8.6f}, Accuracy: ({:4.1f}%)".format(
        loss,
        100.0 * accuracy,
    )
)

run.finish()

                                                           

Test set: Average loss: 4.624848, Accuracy: ( 0.3%)


0,1
batch_idx,▅▃▆▁▃█▁▇▁▆▃▄▇▇▅▅▁▄▁██▃▇▄█▅▇▁▃▇▄▅▂▁▃▅▅▃▁▄
epoch,▁▁▂▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇█████
loss,▇▄█▇▇▇▇▇▆▇▇▆▇▇▇▆▁▇▆▆▇▆▆▆▇▆▇▇▇▆▇▇▆▆▇▇▆▇▇▆
test_accuracy,▁
test_loss,▁
train_accuracy,█▆▁▆▄▆▅▅▆▆▄▆▆▃▆▅▆▆▅▃▃▆▅▅▆▆▅▆▆▆▆▆▆▆▆▆▆▆▆▆
train_loss,████████████████████▁███████████████████
valid_accuracy,▄█▄▄▃▃▃▃▃▁▁▃▃▃▃▃▃▃▃▂▄▂▇▄▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃
valid_loss,█▇▇▇▇▇▇▇▇▇▇▇▁▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇

0,1
batch_idx,15.0
epoch,99.0
loss,4.62633
test_accuracy,0.00342
test_loss,4.62485
train_accuracy,0.0098
train_loss,4.62498
valid_accuracy,0.0098
valid_loss,4.62498


## Task 4
* Input normalization: experiment with different input normalization techniques (e.g., mean subtraction, standardization) and analyze their impact on the model's performance.


In [42]:
def train_step(model, optimizer, loss_fn, data_loader):
    data_iter = iter(data_loader)

    for batch_idx, (inputs, labels) in enumerate(data_iter):
        inputs, labels = inputs.to(_device), labels.to(_device)

        optimizer.zero_grad()   # Zero gradients
        logits = model(inputs)   # Forward pass
        loss = loss_fn(logits, labels)  # Compute loss
        loss.backward() # Backward pass
        optimizer.step()    # Update weights

        wandb.log({
            "loss": loss.item(),
            "batch_idx": batch_idx,
        })

In [43]:
def test_data_loaders(
    normalization_method,
    data_loaders,
    model,
    epochs=10,
    learning_rate=0.001,
    weight_decay = 0.0005,
):
    # optimizer and loss
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
    loss_fn = nn.CrossEntropyLoss()

    model.init_weights()

    run = wandb.init(
        entity = "fejowo5522-",
        project= "NN_list3_OxFlow",
        config = {
            "task": 4,
            "batch_size": batch_size,
            "epochs": epochs,
            "optimizer": "Adam",
            "learning_rate": learning_rate,
            "weight_decay": weight_decay,
            "loss_fn": "cross_entropy",
            "model": 'alexnet',
            "normalization": normalization_method,
            "data_augmentation": data_augmentation,
        }
    )
    run.name = "Task4_" + str(int(time.time()))


    model.train()
    for epoch in tqdm(range(epochs), desc="Training", leave=False):
        train_step(model, optimizer, loss_fn, data_loaders['train'])

        for loader, split in [
            (data_loaders['train'], 'train'),
            (data_loaders['valid'], 'valid'),
        ]:
            loss, accuracy = model.evaluate(loader)
            wandb.log({
                "epoch": epoch,
                f"{split}_loss": loss,
                f"{split}_accuracy": accuracy,
            })

    model.eval()
    loss, accuracy = model.evaluate(data_loaders['test'])
    wandb.log({
        "test_loss": loss,
        "test_accuracy": accuracy,
    })
    print(
        "Test set: Average loss: {:8.6f}, Accuracy: ({:4.1f}%)".format(
            loss,
            100.0 * accuracy,
        )
    )

    run.finish()

In [44]:
normalization_transforms = {
    'mean_subtraction': transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[1, 1, 1]),
    'standardization': transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
}

for name, normalization in normalization_transforms.items():
    data_loaders = load_flowers(
        batch_size,
        (1,1,1),
        train_transform=transforms.Compose([
            transforms.Resize((224, 224)),

            transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
            transforms.RandomRotation(15),
            transforms.RandomHorizontalFlip(),
            transforms.RandomAdjustSharpness(sharpness_factor=2),
            transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),

            transforms.ToTensor(),
            normalization,
        ]),
        eval_transform=transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
            normalization,
        ]),
        Loader=torch.utils.data.DataLoader
    )


    model = AlexNetBaseline(num_classes=102).to(_device)

    test_data_loaders(name, data_loaders, model)



                                                         

Test set: Average loss: 4.624605, Accuracy: ( 1.5%)


0,1
batch_idx,▁▅▆▁▂▇█▁▄▄█▁▁▂▃█▂▃▃▄█▃▄██▁▂▅▇█▄▄▅█▃▁▂▄▅█
epoch,▁▁▂▂▃▃▃▃▄▄▅▅▆▆▆▆▇▇██
loss,█▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,▁▅▅▅▅█▆█▃▆
train_loss,█▄▄▄▁▂▁▃▂▃
valid_accuracy,█▇▇▆█▆▇▇█▁
valid_loss,█▆▇▇▁▆▅▄▃▃

0,1
batch_idx,15.0
epoch,9.0
loss,4.62719
test_accuracy,0.01529
test_loss,4.6246
train_accuracy,0.01078
train_loss,4.62502
valid_accuracy,0.00392
valid_loss,4.62499


                                                         

Test set: Average loss: 4.624555, Accuracy: ( 0.3%)


0,1
batch_idx,▂▄▅▇▄▁▁▃▆▇▅▆▇▇▁▃▅▇▁▂▅▅▆▇▇▃▅▅▇▁▃▃▅▅▁▃▅▅▆█
epoch,▁▁▂▂▃▃▃▃▄▄▅▅▆▆▆▆▇▇██
loss,█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,▅▅▁█▆▅▆▅▄▅
train_loss,▄█▆▄▃▂▆▂▃▁
valid_accuracy,▆█▁▆█▁█▃▆▆
valid_loss,█▆▆▅▃▃▃▂▂▁

0,1
batch_idx,15.0
epoch,9.0
loss,4.62548
test_accuracy,0.00325
test_loss,4.62455
train_accuracy,0.0098
train_loss,4.625
valid_accuracy,0.0098
valid_loss,4.62499


## Zad 5
* Experiment with different hyperparameters such as learning rate, batch size, number of epochs, and optimizer choice (e.g., SGD, Adam).

### Testing function

In [None]:
def random_search_tuner(
    param_ranges,
    model_name,
    model_class,
    optimizer_name='Adam',
    loss_fn_name='cross_entropy',
    trials=10,
    epochs=10,
):
    optimizer = {
        'Adam': torch.optim.Adam,
        'SGD': torch.optim.SGD,
    }[optimizer_name]

    loss_fn = {
        'cross_entropy': nn.CrossEntropyLoss,
        'mse': nn.MSELoss,
    }[loss_fn_name]
    loss_fn = loss_fn()

    for trial in tqdm(range(trials), desc="Trial", leave=False, position=0):
        # randomly sample hyperparameters
        variables = {
            'learning_rate': 0,
            'weight_decay': 0,
            'momentum': 0,
            'batch_size': 0,
        }

        for name, variable in variables.items():
            if name in param_ranges:
                variables[name] = np.random.uniform(param_ranges[name][0], param_ranges[name][1])

        # init model
        model = model_class().to(_device)
        model.init_weights()

        # create optimizer
        if(optimizer_name == 'SGD'):
            _optimizer = optimizer(
                model.parameters(),
                lr=variables['learning_rate'],
                weight_decay=variables['weight_decay'],
                momentum=variables['momentum'],
            )
        elif(optimizer_name == 'Adam'):
            _optimizer = optimizer(
                model.parameters(),
                lr=variables['learning_rate'],
                weight_decay=variables['weight_decay'],
            )

        # create data loaders
        data_percent = (1,1,1)
        data_loaders = load_flowers(
            batch_size,
            data_percent,
        )

        # init wandb
        with Capturing() as output:
            run = wandb.init(
                entity = "fejowo5522-",
                project= "NN_list3_OxFlow",
                config = {
                    "task": 5,
                    "batch_size": variables['batch_size'],
                    "epochs": epochs,
                    "optimizer": optimizer_name,
                    "learning_rate": variables['learning_rate'],
                    "momentum": variables['momentum'],
                    "weight_decay": variables['weight_decay'],
                    "loss_fn": loss_fn_name,
                    "model": model_name,
                    "data_percent": data_percent,
                    "data_augmentation": data_augmentation,
                }
            )
            run.name = "Task5_" + str(int(time.time()))

        # train model
        model.train()
        for epoch in tqdm(range(epochs), desc="Epoch", leave=False, position=1):
            train_step(model, _optimizer, loss_fn, data_loaders['train'])

            for loader, split in [
                (data_loaders['train'], 'train'),
                (data_loaders['valid'], 'valid'),
            ]:
                loss, accuracy = model.evaluate(loader)
                with Capturing() as output:
                    wandb.log({
                        "epoch": epoch,
                        f"{split}_loss": loss,
                        f"{split}_accuracy": accuracy,
                    })

        model.eval()
        loss, accuracy = model.evaluate(data_loaders['test'])
        with Capturing() as output:
            wandb.log({
                "test_loss": loss,
                "test_accuracy": accuracy,
            })
            # print(
            #     "Test set: Average loss: {:8.6f}, Accuracy: ({:4.1f}%)".format(
            #         loss,
            #         100.0 * accuracy,
            #     )
            # )
            run.finish()


### Model class

In [46]:
def train_step(model, optimizer, loss_fn, data_loader):
    data_iter = iter(data_loader)

    for batch_idx, (inputs, labels) in enumerate(data_iter):
        inputs, labels = inputs.to(_device), labels.to(_device)

        optimizer.zero_grad()   # Zero gradients
        logits = model(inputs)   # Forward pass
        loss = loss_fn(logits, labels)  # Compute loss
        loss.backward() # Backward pass
        optimizer.step()    # Update weights

        with Capturing() as output:
            wandb.log({
                "loss": loss.item(),
                "batch_idx": batch_idx,
            })

### Perform test on parameters

In [48]:
random_search_tuner(
    param_ranges={
        'learning_rate': (1e-5, 1e-1),
        'weight_decay': (0.0, 0.1),
    },
    model_name = 'AlexNetBaseline',
    model_class = AlexNetBaseline,
    optimizer_name='Adam',
    loss_fn_name='cross_entropy',
    trials=10,
    epochs=10,
)

Trial:   0%|          | 0/10 [00:00<?, ?it/s]



Test set: Average loss: 4.626618, Accuracy: ( 0.7%)


0,1
batch_idx,▁▁▂▃▃▅▇▁▄▅▂▄▇█▂▆█▃▇▂▄▅▆▇▇▁▂▄▆▆▂▄▅▇▃▄▅▇▂▅
epoch,▁▁▂▂▃▃▃▃▄▄▅▅▆▆▆▆▇▇██
loss,█▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,▁▇▆▄▇▇▆▇█▅
train_loss,█▁▁▁▁▁▁▁▁▁
valid_accuracy,▅▁▄▁▂▃▄▄█▄
valid_loss,█▁▁▁▁▁▁▁▁▁

0,1
batch_idx,15.0
epoch,9.0
loss,4.62958
test_accuracy,0.00748
test_loss,4.62662
train_accuracy,0.00882
train_loss,4.6246
valid_accuracy,0.01078
valid_loss,4.62499


Trial:  10%|█         | 1/10 [03:48<34:14, 228.25s/it]



Test set: Average loss: 4.672667, Accuracy: ( 0.4%)


0,1
batch_idx,▂▃▄▄▂▄▆▇█▁▅▆▁█▃▄▄▅▆▇▁▁▆██▁▂▄▇▁▅▁▂▄▅▆▇▇█▆
epoch,▁▁▂▂▃▃▃▃▄▄▅▅▆▆▆▆▇▇██
loss,█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,▃▃▃▁▃▃█▃▃▃
train_loss,▁▂▁█▂▁▂▂▁▄
valid_accuracy,▁▁▁█▁▁▃▁▃▁
valid_loss,▁▃▁█▂▁▂▂▁▅

0,1
batch_idx,15.0
epoch,9.0
loss,4.5841
test_accuracy,0.00407
test_loss,4.67267
train_accuracy,0.0098
train_loss,4.67572
valid_accuracy,0.0098
valid_loss,4.67169


Trial:  20%|██        | 2/10 [07:35<30:19, 227.42s/it]



Test set: Average loss: 4.627571, Accuracy: ( 0.8%)


0,1
batch_idx,▁▂▃▇▇▂▃▅▇█▂▅▅▇▇▄▅▆▁▄▆▁▃▇▄▆▇█▁▁▆▁▃▄▆██▁▃▅
epoch,▁▁▂▂▃▃▃▃▄▄▅▅▆▆▆▆▇▇██
loss,▁▁▁▁▁█▃▂▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,█▅▃▂▄▃▃▃▁▄
train_loss,█▁▁▁▁▁▁▁▁▁
valid_accuracy,▁▆▆▇▅▆▆▅▁█
valid_loss,█▁▂▁▁▁▁▁▁▁

0,1
batch_idx,15.0
epoch,9.0
loss,4.63346
test_accuracy,0.00764
test_loss,4.62757
train_accuracy,0.01078
train_loss,4.6254
valid_accuracy,0.01275
valid_loss,4.62431


Trial:  30%|███       | 3/10 [11:23<26:33, 227.65s/it]



Test set: Average loss: 4.624457, Accuracy: ( 1.0%)


0,1
batch_idx,▃▄▄▇█▅█▂▃▄▁▃▆▆▂▇▇▃▄▄▇█▁▁▂▄▅▅▂▂▇█▁▂▃▆▆▇▃█
epoch,▁▁▂▂▃▃▃▃▄▄▅▅▆▆▆▆▇▇██
loss,█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,██▁▆▆▆▅▆▆▆
train_loss,▃▁█▁▁▁▃▁▁▁
valid_accuracy,▃▂█▁▃▂▂▂▂▂
valid_loss,▃▁█▁▁▄▃▁▁▁

0,1
batch_idx,15.0
epoch,9.0
loss,7371.14209
test_accuracy,0.01008
test_loss,4.62446
train_accuracy,0.0098
train_loss,304.64776
valid_accuracy,0.0098
valid_loss,98.68562


Trial:  40%|████      | 4/10 [15:09<22:43, 227.32s/it]



Test set: Average loss: 4.628027, Accuracy: ( 0.3%)


0,1
batch_idx,▂▄▆█▂▅▆▇▁▄█▁▆█▁▅▅▇▁▂▄▄▆▇▁▂▃▅▁▅▇██▁▁█▁▃▄█
epoch,▁▁▂▂▃▃▃▃▄▄▅▅▆▆▆▆▇▇██
loss,▁▁▁▁▂▂▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,██▁███████
train_loss,▁▁▆█▁▁▁▁▁▁
valid_accuracy,▃█▃▁▃▃▃▃▃▃
valid_loss,▁▁▆█▁▁▁▁▁▁

0,1
batch_idx,15.0
epoch,9.0
loss,1763265.25
test_accuracy,0.00325
test_loss,4.62803
train_accuracy,0.0098
train_loss,6217056.32939
valid_accuracy,0.0098
valid_loss,7116.0379


Trial:  50%|█████     | 5/10 [19:00<19:02, 228.55s/it]



Test set: Average loss: 4.624874, Accuracy: ( 0.4%)


0,1
batch_idx,▂▃▆▇▂▆▇▁▃▃▆█▂▃▆█▂▃█▁▆█▄▅▅▂▃▄▅▆▇█▁▃█▃▃▄▅█
epoch,▁▁▂▂▃▃▃▃▄▄▅▅▆▆▆▆▇▇██
loss,█▂▃▃▂▃▃▁▁▄▃▂▂▂▂▂▂▂▂▂▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂
test_accuracy,▁
test_loss,▁
train_accuracy,▃▄▃█▄▁▂▆▃▄
train_loss,█▃▂▅▅▁▇▇█▇
valid_accuracy,▄█▅▇▄▄▁▄▄▄
valid_loss,█▁▃▄▃▄▆▆▇▇

0,1
batch_idx,15.0
epoch,9.0
loss,4.62678
test_accuracy,0.00407
test_loss,4.62487
train_accuracy,0.0098
train_loss,4.62498
valid_accuracy,0.0098
valid_loss,4.62497


Trial:  60%|██████    | 6/10 [22:48<15:13, 228.30s/it]



Test set: Average loss: 4.630321, Accuracy: ( 1.6%)


0,1
batch_idx,▁▂▃▃▄█▁▅▅▅▃▅▆▄▇▁▄▅▅▆▇▇█▁▃▇▃▄▆▇▂▅▆▂▃▃▄▅▇▇
epoch,▁▁▂▂▃▃▃▃▄▄▅▅▆▆▆▆▇▇██
loss,▁▁▁▁▁▁▁▁▃▁▁▂▁▂▁▅▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,▁▆▃▃▁▆█▃▃▃
train_loss,█▁▁▂▁▁▁▁▁▁
valid_accuracy,▅█▅▆▅▅▁▆▅▅
valid_loss,█▁▁▂▁▁▁▁▁▁

0,1
batch_idx,15.0
epoch,9.0
loss,4.64729
test_accuracy,0.01626
test_loss,4.63032
train_accuracy,0.0098
train_loss,8695.68614
valid_accuracy,0.0098
valid_loss,327.65067


Trial:  70%|███████   | 7/10 [26:35<11:23, 227.99s/it]



Test set: Average loss: 4.622591, Accuracy: ( 0.6%)


0,1
batch_idx,▃▅▂▃▄▇▇▂▃▅▁▃▄▅▇▃▃▄▆▇▁▃▅▅▇▂▅█▁▄█▁▃▇█▁▃▅▅█
epoch,▁▁▂▂▃▃▃▃▄▄▅▅▆▆▆▆▇▇██
loss,█▁▅▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,▁▁█▁▁▁▁▁▁▁
train_loss,█▁▁▁▁▁▁▁▁▁
valid_accuracy,▁█▆▆▆▆▆▆▆▆
valid_loss,█▁▁▁▁▁▁▁▁▁

0,1
batch_idx,15.0
epoch,9.0
loss,4.64324
test_accuracy,0.00634
test_loss,4.62259
train_accuracy,0.0098
train_loss,4.62523
valid_accuracy,0.0098
valid_loss,4.62523


Trial:  80%|████████  | 8/10 [30:31<07:41, 230.55s/it]



Test set: Average loss: 4.629289, Accuracy: ( 1.1%)


0,1
batch_idx,▂▄▅▆▆█▃▄▅▆▄▅▆▇▆▁▂▄▆█▇▃▅▅▆▇▇█▃▆█▁▄▁▂▃▅▆▆█
epoch,▁▁▂▂▃▃▃▃▄▄▅▅▆▆▆▆▇▇██
loss,█▁▁▁▁▂▁▂▃▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▁▁▁▁▁▁▄▁▁▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,▁▁▁▁▁▁▁▁▁▁
train_loss,▁█▁▅▅▂▁▁▁▁
valid_accuracy,█▁▁▁▁▁▁▁▁▁
valid_loss,▁█▁▂▅▁▁▁▁▁

0,1
batch_idx,15.0
epoch,9.0
loss,4.64594
test_accuracy,0.01057
test_loss,4.62929
train_accuracy,0.0098
train_loss,1305.70387
valid_accuracy,0.0098
valid_loss,2171.545


Trial:  90%|█████████ | 9/10 [34:19<03:49, 229.74s/it]



Test set: Average loss: 4.624595, Accuracy: ( 1.1%)


0,1
batch_idx,▁▆▁▂▃█▁▂▄▅▃▃▆▇█▄▆▇█▁▅▅▁▁▂▆█▁▅▆██▂▃▄█▁▃▃▇
epoch,▁▁▂▂▃▃▃▃▄▄▅▅▆▆▆▆▇▇██
loss,▁▁▁▁▂█▂▁▄▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,▅▃▃▃▁▆▃█▅▃
train_loss,█▁▂▁▁▁▁▁▁▁
valid_accuracy,█▅▅▁▁▅▅▃▃▅
valid_loss,█▁▁▁▁▁▁▁▁▁

0,1
batch_idx,15.0
epoch,9.0
loss,4.64725
test_accuracy,0.0109
test_loss,4.6246
train_accuracy,0.0098
train_loss,4.62798
valid_accuracy,0.0098
valid_loss,4.62652


                                                       

In [49]:
random_search_tuner(
    param_ranges={
        'learning_rate': (1e-5, 1e-1),
        'weight_decay': (0.0, 0.1),
        'momentum': (0.0, 0.01),
    },
    model_name = 'AlexNetBaseline',
    model_class = AlexNetBaseline,
    optimizer_name='SGD',
    loss_fn_name='cross_entropy',
    trials=10,
    epochs=20,
)

Trial:   0%|          | 0/10 [00:00<?, ?it/s]



Test set: Average loss:      nan, Accuracy: ( 0.3%)


0,1
batch_idx,▂▂▄▅▇▄▆▃▄▅▁▂▅▆▂▆▄█▁▃▆█▂▇▁▆▇█▃▁▂▄█▅▆▃▆▁▇█
epoch,▁▁▁▁▂▂▂▂▂▂▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▇▇▇▇▇▇████
loss,▁
test_accuracy,▁
train_accuracy,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
valid_accuracy,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
batch_idx,15.0
epoch,19.0
loss,
test_accuracy,0.00325
test_loss,
train_accuracy,0.0098
train_loss,
valid_accuracy,0.0098
valid_loss,


Trial:  10%|█         | 1/10 [07:05<1:03:53, 426.00s/it]



Test set: Average loss: 3.793891, Accuracy: ( 9.3%)


0,1
batch_idx,▁▁▄▄▅▇█▄▆▇█▅▁▆█▃▁▇█▇▆▇▇█▁▂▄▇▄▅▇▂▇▂▃▅▆▇▂▅
epoch,▁▁▁▁▂▂▂▂▂▂▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▇▇▇▇▇▇████
loss,█▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▂▁▁▁▁▁▁▁▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,▂▁▂▂▂▂▂▃▃▃▄▄▄▅▆▇▇▇██
train_loss,███▇▇▇▆▆▅▅▅▄▄▃▃▃▂▂▁▁
valid_accuracy,▁▁▂▂▂▃▂▃▃▃▃▄▄▅▆▅▅▇█▇
valid_loss,██▇▇▇▆▆▆▅▅▅▄▄▃▃▃▃▂▁▁

0,1
batch_idx,15.0
epoch,19.0
loss,3.90171
test_accuracy,0.0927
test_loss,3.79389
train_accuracy,0.10588
train_loss,3.69677
valid_accuracy,0.07549
valid_loss,3.85274


Trial:  20%|██        | 2/10 [14:22<57:35, 431.95s/it]  



Test set: Average loss:      nan, Accuracy: ( 0.3%)


0,1
batch_idx,▄▂▃▃▆▄▆▆▁▃▅▆▇█▁▆▆▅▆▂▄▆█▁▆▂▃▆▄▇▂▄▂█▂█▁▂▃█
epoch,▁▁▁▁▂▂▂▂▂▂▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▇▇▇▇▇▇████
loss,▁
test_accuracy,▁
train_accuracy,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
valid_accuracy,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
batch_idx,15.0
epoch,19.0
loss,
test_accuracy,0.00325
test_loss,
train_accuracy,0.0098
train_loss,
valid_accuracy,0.0098
valid_loss,


Trial:  30%|███       | 3/10 [21:33<50:21, 431.66s/it]



Test set: Average loss: 3.809682, Accuracy: ( 8.1%)


0,1
batch_idx,▂▆█▁▄▁▁▆█▄▁▁▄▇▅▇▁▃▆▂▄▆▆▇█▁▂▄▄▇█▅▅█▄▄▁▂█▇
epoch,▁▁▁▁▂▂▂▂▂▂▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▇▇▇▇▇▇████
loss,█▆▇▆▇▆▆▆▆▆▆▅▆▅▆▅▅▅▅▅▄▄▄▅▄▃▄▄▃▃▂▂▂▂▂▂▂▂▂▁
test_accuracy,▁
test_loss,▁
train_accuracy,▁▂▂▁▂▂▃▃▃▄▄▅▅▅▆▆▆▆▇█
train_loss,██▇▇▇▇▆▆▆▅▅▄▄▃▃▃▂▂▂▁
valid_accuracy,▁▁▁▁▂▂▂▃▃▃▄▃▄▅▆▅▇▆▆█
valid_loss,█▇▇▇▇▇▆▆▆▅▄▄▄▃▃▃▂▂▂▁

0,1
batch_idx,15.0
epoch,19.0
loss,4.0165
test_accuracy,0.08099
test_loss,3.80968
train_accuracy,0.09706
train_loss,3.7461
valid_accuracy,0.08235
valid_loss,3.90515


Trial:  40%|████      | 4/10 [28:40<42:59, 429.99s/it]



Test set: Average loss: 4.375604, Accuracy: ( 2.9%)


0,1
batch_idx,▆▃▆▄██▁▁▂▅▆█▄▅█▄▁▃▇▁▄▃▃▄▅▂▄█▃▇▁▅▇▇▅▁▃▅▆▂
epoch,▁▁▁▁▂▂▂▂▂▂▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▇▇▇▇▇▇████
loss,▆▆██▆▆▆▆▆▆▆▆▆▆▆▆▇▆▇▆▆▆▅▆▆▆▆▆▅▅▅▇▅▄▄▄▄▃▃▁
test_accuracy,▁
test_loss,▁
train_accuracy,▅▃▄█▄▁▅▅▅▄▄▇▄▇▆▄▂▅▂▆
train_loss,███████████▇▇▇▇▇▆▃▂▁
valid_accuracy,▃▄▆▂▁▃▅▇▅▄▇▅▄▆█▄▅▂▅▇
valid_loss,██████▇▇▇▇▇▇▇▇▇▇▅▃▁▁

0,1
batch_idx,15.0
epoch,19.0
loss,4.33893
test_accuracy,0.02911
test_loss,4.3756
train_accuracy,0.01569
train_loss,4.38175
valid_accuracy,0.01569
valid_loss,4.40671


Trial:  50%|█████     | 5/10 [35:47<35:44, 428.95s/it]



Test set: Average loss:      nan, Accuracy: ( 0.3%)


0,1
batch_idx,▁▂▇▂▃█▁▄▆▂▁▅▇▁▂▆▇█▄▇▃▅▁▃▅▆▇█▁▇▂▃▁▄▅▃▅▃▄▆
epoch,▁▁▁▁▂▂▂▂▂▂▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▇▇▇▇▇▇████
test_accuracy,▁
train_accuracy,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
valid_accuracy,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
batch_idx,15.0
epoch,19.0
loss,
test_accuracy,0.00325
test_loss,
train_accuracy,0.0098
train_loss,
valid_accuracy,0.0098
valid_loss,


Trial:  60%|██████    | 6/10 [42:46<28:21, 425.47s/it]



Test set: Average loss: 4.420977, Accuracy: ( 1.2%)


0,1
batch_idx,▆▁▇█▁██▂▃▆▃▂▆▂█▁▄█▂▅█▁▁▄▅▆█▁▂▄▃▄▆▁▅▅▃▄▆█
epoch,▁▁▁▁▂▂▂▂▂▂▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▇▇▇▇▇▇████
loss,█▄▄▃▃▃▃▃▃▃▃▂▂▂▂▃▂▁▃▂▃▂▂▃▃▂▂▂▂▂▁▃▂▂▃▂▃▂▂▁
test_accuracy,▁
test_loss,▁
train_accuracy,▂▁▃▃▃▄▅▆▄▆▆▇▇▆█▅▇▆▅▅
train_loss,██▇▆▄▃▃▄▃▂▂▂▁▂▄▂▁▁▂▂
valid_accuracy,▄▄▂▁▂▅▄▆▃▅▆▆▇█▇▅▇▅▄▅
valid_loss,█▇▆▅▃▃▂▃▃▂▂▂▁▁▃▃▁▁▁▂

0,1
batch_idx,15.0
epoch,19.0
loss,4.17776
test_accuracy,0.01155
test_loss,4.42098
train_accuracy,0.02059
train_loss,4.37649
valid_accuracy,0.02059
valid_loss,4.4291


Trial:  70%|███████   | 7/10 [49:51<21:15, 425.24s/it]



Test set: Average loss:      nan, Accuracy: ( 0.3%)


0,1
batch_idx,▂▇▄▂▃▁▃▇▁▂▄█▁▄▆▂▂▆▁▂▅▆▇▄▅▇▁▂▃▇▄▇▁█▁█▆▆▁▇
epoch,▁▁▁▁▂▂▂▂▂▂▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▇▇▇▇▇▇████
loss,▁█
test_accuracy,▁
train_accuracy,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
valid_accuracy,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
batch_idx,15.0
epoch,19.0
loss,
test_accuracy,0.00325
test_loss,
train_accuracy,0.0098
train_loss,
valid_accuracy,0.0098
valid_loss,


Trial:  80%|████████  | 8/10 [57:01<14:13, 426.84s/it]



Test set: Average loss:      nan, Accuracy: ( 0.3%)


0,1
batch_idx,▂█▂▅▅▇█▁▂▅▇█▁▅▇▁█▁▃▄▄▁▃▆▁▄▅██▇▄▇▁▄▅▇▂█▁▅
epoch,▁▁▁▁▂▂▂▂▂▂▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▇▇▇▇▇▇████
test_accuracy,▁
train_accuracy,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
valid_accuracy,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
batch_idx,15.0
epoch,19.0
loss,
test_accuracy,0.00325
test_loss,
train_accuracy,0.0098
train_loss,
valid_accuracy,0.0098
valid_loss,


Trial:  90%|█████████ | 9/10 [1:04:07<07:06, 426.50s/it]



Test set: Average loss:      nan, Accuracy: ( 0.3%)


0,1
batch_idx,▂▄▄▇▃▂▅▃▅▇▁█▂▃▄▄▇▇▁▁▆▆▁▄▇█▂▄▇▁▅▅▇▆▁▁▁▆▃▇
epoch,▁▁▁▁▂▂▂▂▂▂▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▇▇▇▇▇▇████
loss,▁
test_accuracy,▁
train_accuracy,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
valid_accuracy,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
batch_idx,15.0
epoch,19.0
loss,
test_accuracy,0.00325
test_loss,
train_accuracy,0.0098
train_loss,
valid_accuracy,0.0098
valid_loss,


                                                         

## Zad 6
* Modify your CNN architecture to include batch normalization and dropout layers.
* Experiment with different dropout rates and analyze their impact on the model's performance.


### Model class

In [50]:
class AlexNetBaseline(nn.Module):
    def __init__(self, dropout_p=0.5, num_classes=102):
        super(AlexNetBaseline, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.Dropout(p=dropout_p),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),

            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.Dropout(p=dropout_p),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),

            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.Dropout(p=dropout_p),
            nn.ReLU(inplace=True),

            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.Dropout(p=dropout_p),
            nn.ReLU(inplace=True),

            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.Dropout(p=dropout_p),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=dropout_p),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),

            nn.Dropout(p=dropout_p),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

    def init_weights(self):
        for layer in self.modules():
            if isinstance(layer, nn.Conv2d) or isinstance(layer, nn.Linear):
                nn.init.kaiming_uniform_(layer.weight, nonlinearity='relu')
                if layer.bias is not None:
                    nn.init.zeros_(layer.bias)
            elif isinstance(layer, nn.BatchNorm2d):
                nn.init.ones_(layer.weight)
                if layer.bias is not None:
                    nn.init.zeros_(layer.bias)

    def evaluate(self, data_loader):
        loss = 0
        correct = 0
        loss_fn = nn.CrossEntropyLoss(
            reduction='sum',
        )

        with torch.no_grad():
            for inputs, labels in data_loader:
                inputs, labels = inputs.to(_device), labels.to(_device)

                outputs = self(inputs)
                loss += loss_fn(outputs, labels).item()
                pred = outputs.argmax(
                    dim=1, keepdim=True
                )
                correct += (
                    pred.eq(labels.view_as(pred)).sum().item()
                )

        loss /= len(data_loader.dataset)
        accuracy = correct / len(data_loader.dataset)
        return loss, accuracy

### Training functions

In [51]:
def train_step(model, optimizer, loss_fn, data_loader):
    data_iter = iter(data_loader)

    for batch_idx, (inputs, labels) in enumerate(data_iter):
        inputs, labels = inputs.to(_device), labels.to(_device)

        optimizer.zero_grad()   # Zero gradients
        logits = model(inputs)   # Forward pass
        loss = loss_fn(logits, labels)  # Compute loss
        loss.backward() # Backward pass
        optimizer.step()    # Update weights

        wandb.log({
            "loss": loss.item(),
            "batch_idx": batch_idx,
        })

In [58]:
def test_dropout(
    dropout,
    model_class,
    epochs=10,
    learning_rate=0.001,
    weight_decay = 0.0005,
):
    model = model_class(dropout).to(_device)
    model.init_weights()

    # optimizer and loss
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
    loss_fn = nn.CrossEntropyLoss()

    with Capturing() as output:
        run = wandb.init(
            entity = "fejowo5522-",
            project= "NN_list3_OxFlow",
            config = {
                "task": 6,
                "batch_size": batch_size,
                "epochs": epochs,
                "data_augmentation": data_augmentation,
                "optimizer": "Adam",
                "loss_fn": "cross_entropy",
                "learning_rate": learning_rate,
                "weight_decay": weight_decay,
                "model": 'alexnet',
                "dropout": dropout,
            }
        )
        run.name = "Task6_" + str(int(time.time()))


    model.train()
    for epoch in tqdm(range(epochs), desc="Training", leave=False, position=1):
        train_step(model, optimizer, loss_fn, data_loaders['train'])

        for loader, split in [
            (data_loaders['train'], 'train'),
            (data_loaders['valid'], 'valid'),
        ]:
            loss, accuracy = model.evaluate(loader)
            wandb.log({
                "epoch": epoch,
                f"{split}_loss": loss,
                f"{split}_accuracy": accuracy,
            })

    model.eval()
    loss, accuracy = model.evaluate(data_loaders['test'])
    with Capturing() as output:
        wandb.log({
            "test_loss": loss,
            "test_accuracy": accuracy,
        })
    print(
        "Test set: Average loss: {:8.6f}, Accuracy: ({:4.1f}%)".format(
            loss,
            100.0 * accuracy,
        )
    )

    with Capturing() as output:
        run.finish()

### Test dropouts

In [59]:
for drop in tqdm(range(0, 100, 10), position=0):
    test_dropout(
        drop/100,
        AlexNetBaseline,
        epochs=10,
        learning_rate=0.001,
        weight_decay = 0.0005,
    )

  0%|          | 0/10 [00:00<?, ?it/s]



Test set: Average loss: 4.625293, Accuracy: ( 0.5%)


0,1
batch_idx,▁▃▃▄▄█▃▃▆▇▃▄▆▆▇▅██▁▅▃▄▅▇▇█▃▄▆█▃▃▄▄▅▇▃▃▄█
epoch,▁▁▂▂▃▃▃▃▄▄▅▅▆▆▆▆▇▇██
loss,█▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,▁▁▁▁▁▁▁▁▁▁
train_loss,▁█▅▄▄▂▂▂▂▂
valid_accuracy,▁▁▁▁▁▁▁▁▁▁
valid_loss,▁█▇▆▆▆▆▅▅▅

0,1
batch_idx,15.0
epoch,9.0
loss,4.62573
test_accuracy,0.00472
test_loss,4.62529
train_accuracy,0.0098
train_loss,4.62501
valid_accuracy,0.0098
valid_loss,4.62501


 10%|█         | 1/10 [03:52<34:54, 232.74s/it]



Test set: Average loss: 4.623948, Accuracy: ( 0.9%)


0,1
batch_idx,▁▄▆▄▇█▆▇█▁▆▁▂▃▅▅▆▆▇█▁▅█▁▁▃▄▅▅▇▁▃▄▆█▁▃▄▅█
epoch,▁▁▂▂▃▃▃▃▄▄▅▅▆▆▆▆▇▇██
loss,█▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,▆▆▅▁▅▆▆▆█▆
train_loss,▇█▁▃▂▂▂▂▁▁
valid_accuracy,▁▅▁█▁▁▁▁▃▁
valid_loss,▁▇█▂▂▁▁▁▁▁

0,1
batch_idx,15.0
epoch,9.0
loss,4.62693
test_accuracy,0.00943
test_loss,4.62395
train_accuracy,0.0098
train_loss,4.62501
valid_accuracy,0.0098
valid_loss,4.62503


 20%|██        | 2/10 [07:40<30:39, 229.99s/it]



Test set: Average loss: 4.625145, Accuracy: ( 1.1%)


0,1
batch_idx,▁▅▅▆▁▂▄▁▂▃▇▄▄▇▇▆█▁▁▂▃▄█▅▆▁▂▅▇▇▁▂▃▄▅█▂▅▅█
epoch,▁▁▂▂▃▃▃▃▄▄▅▅▆▆▆▆▇▇██
loss,▂█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,▁▁▁▁▁▁▁█▁▁
train_loss,█▆▅▃▂▂▂▁▁▁
valid_accuracy,███████▁██
valid_loss,█▄▄▂▂▂▁▁▁▁

0,1
batch_idx,15.0
epoch,9.0
loss,4.62638
test_accuracy,0.01057
test_loss,4.62515
train_accuracy,0.0098
train_loss,4.62502
valid_accuracy,0.0098
valid_loss,4.62501


 30%|███       | 3/10 [11:26<26:37, 228.15s/it]



Test set: Average loss: 4.624293, Accuracy: ( 1.0%)


0,1
batch_idx,▄▅▆▁▄▁▃▃▆▇█▂▅▆▇▄▅▆▂▃▇▇█▂▅▆█▂▃▅██▁▂▃▂▃▄▅▆
epoch,▁▁▂▂▃▃▃▃▄▄▅▅▆▆▆▆▇▇██
loss,▂█▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,▆▅▁▁▃▁▃█▁▁
train_loss,█▂▂▂▁▁▁▁▁▁
valid_accuracy,▆▁▄▄▃▄▄█▄▄
valid_loss,█▂▂▂▁▁▁▁▁▁

0,1
batch_idx,15.0
epoch,9.0
loss,4.62548
test_accuracy,0.01008
test_loss,4.62429
train_accuracy,0.0098
train_loss,4.62498
valid_accuracy,0.0098
valid_loss,4.625


 40%|████      | 4/10 [16:00<24:36, 246.16s/it]



Test set: Average loss: 4.625585, Accuracy: ( 1.6%)


0,1
batch_idx,▂▃▃▁▂▄▅▇█▂▄▅▁▄▄▇█▁▃▆▂▃▅▆█▁▄▆▆▇▂▃▆▇█▅▆▂▂▇
epoch,▁▁▂▂▃▃▃▃▄▄▅▅▆▆▆▆▇▇██
loss,█▄▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,▁▇▅▄▃▄█▆▃▄
train_loss,█▁▁▁▁▁▁▁▁▁
valid_accuracy,█▇▅▆▆▄▇▁▄▆
valid_loss,█▁▁▁▁▁▁▁▁▁

0,1
batch_idx,15.0
epoch,9.0
loss,4.62718
test_accuracy,0.01626
test_loss,4.62559
train_accuracy,0.0098
train_loss,4.62504
valid_accuracy,0.0098
valid_loss,4.62499


 50%|█████     | 5/10 [19:46<19:54, 238.98s/it]



Test set: Average loss: 4.624671, Accuracy: ( 1.5%)


0,1
batch_idx,▂▃▅▆█▂▃▇█▅▆▇█▃▆▃▄▆█▃▇▁▃▄▅█▁▃▅▆█▁▁▂▂▅▂▃▇█
epoch,▁▁▂▂▃▃▃▃▄▄▅▅▆▆▆▆▇▇██
loss,█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,▅▅▃▃▅▁▅▁▆█
train_loss,█▁▁▁▁▁▁▁▁▁
valid_accuracy,▁▄▅▄▄▆▁▄▅█
valid_loss,█▁▁▁▁▁▁▁▁▁

0,1
batch_idx,15.0
epoch,9.0
loss,4.62845
test_accuracy,0.01529
test_loss,4.62467
train_accuracy,0.01176
train_loss,4.62506
valid_accuracy,0.01471
valid_loss,4.62509


 60%|██████    | 6/10 [23:33<15:39, 234.79s/it]



Test set: Average loss: 4.623091, Accuracy: ( 0.9%)


0,1
batch_idx,▂▄▅▆█▃▄▅▇▇▂▄▅▆▆▂▄▅▇█▄▅▇▇▁▃▄▅█▁▅▅█▁▃▁▂▄▅█
epoch,▁▁▂▂▃▃▃▃▄▄▅▅▆▆▆▆▇▇██
loss,█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,▂▁▅▁▆▆▄▅▅█
train_loss,█▁▁▁▁▁▁▁▁▁
valid_accuracy,▁▁▃▄▇▄▄▄██
valid_loss,█▁▁▁▁▁▁▁▁▁

0,1
batch_idx,15.0
epoch,9.0
loss,4.58498
test_accuracy,0.00943
test_loss,4.62309
train_accuracy,0.01961
train_loss,4.60428
valid_accuracy,0.01863
valid_loss,4.59446


 70%|███████   | 7/10 [27:20<11:36, 232.22s/it]



Test set: Average loss: 4.624409, Accuracy: ( 1.5%)


0,1
batch_idx,▁▄█▁▂▆▄▅▇▇▂▃▇▇▁▄▅▆█▂▄▂▃▅▅▂▃▄▅▅▇██▁▂▆██▅▇
epoch,▁▁▂▂▃▃▃▃▄▄▅▅▆▆▆▆▇▇██
loss,█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,▂▁▆▇▂▃█▅▅▄
train_loss,█▁▁▁▁▁▁▁▁▁
valid_accuracy,▁▃▁▇▃▇▄█▄▆
valid_loss,█▁▁▁▁▁▁▁▁▁

0,1
batch_idx,15.0
epoch,9.0
loss,4.57721
test_accuracy,0.01529
test_loss,4.62441
train_accuracy,0.01275
train_loss,4.64198
valid_accuracy,0.01275
valid_loss,4.63484


 80%|████████  | 8/10 [31:06<07:40, 230.24s/it]



Test set: Average loss: 4.623012, Accuracy: ( 0.5%)


0,1
batch_idx,▁▃▃▅█▃▄▅▄▅▆▇▂▃▅▂▃▁▂▃▅▆▇▇█▃▃▄▇█▃▄▅▇▇▃▃▄▅█
epoch,▁▁▂▂▃▃▃▃▄▄▅▅▆▆▆▆▇▇██
loss,█▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,▄▅▅▇▁▅▅▃▃█
train_loss,█▂▁▁▁▁▁▁▁▁
valid_accuracy,▅▁▅▇▁▂█▂▂▁
valid_loss,█▂▁▁▁▁▁▁▁▁

0,1
batch_idx,15.0
epoch,9.0
loss,5.12937
test_accuracy,0.00472
test_loss,4.62301
train_accuracy,0.01373
train_loss,5.60694
valid_accuracy,0.00686
valid_loss,5.53309


 90%|█████████ | 9/10 [34:55<03:49, 229.75s/it]



Test set: Average loss: 4.628084, Accuracy: ( 0.6%)


0,1
batch_idx,▃▅▅▇█▃▅▆▁▅▇█▁▁▄▆▇▇█▁▃▄█▁▁▄▄▆▂▃▁▁▄▅▆▂▄▆▇█
epoch,▁▁▂▂▃▃▃▃▄▄▅▅▆▆▆▆▇▇██
loss,██▄▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
test_accuracy,▁
test_loss,▁
train_accuracy,▃▁▃▇▇▂▃▇█▄
train_loss,█▂▁▁▁▁▁▁▁▁
valid_accuracy,▆▄▃█▇▅▂▄▁▃
valid_loss,█▂▁▁▁▁▁▁▁▁

0,1
batch_idx,15.0
epoch,9.0
loss,83.79082
test_accuracy,0.00585
test_loss,4.62808
train_accuracy,0.00882
train_loss,70.5357
valid_accuracy,0.00784
valid_loss,59.25494


100%|██████████| 10/10 [38:42<00:00, 232.28s/it]


## Zad 7
* Implement data augmentation techniques such as random rotations, shifts, flips, and zooms on the training dataset.
* Train your CNN with augmented data and compare the performance with the baseline model trained on the original data.


implemented already in `load_flowers` data loader and global `data_augmentation` variable or custom `transfrom`'s

## Zad 8
* Implement residual connections in your CNN architecture; see the [ResNet paper](https://arxiv.org/abs/1512.03385) for more details.
* Implement inception modules in your CNN architecture; see the [GoogLeNet paper](https://arxiv.org/abs/1409.4842) for more details.


### Model classes

#### ResNet

In [None]:
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out += identity
        out = self.relu(out)
        return out


class ResNet(nn.Module):
    def __init__(self, num_classes=102):
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.layer1 = self._make_layer(64, 64, 2)
        self.layer2 = self._make_layer(64, 128, 2, stride=2)
        self.layer3 = self._make_layer(128, 256, 2, stride=2)
        self.layer4 = self._make_layer(256, 512, 2, stride=2)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512, num_classes)

    def _make_layer(self, in_channels, out_channels, blocks, stride=1):
        downsample = None
        if stride != 1 or in_channels != out_channels:
            downsample = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )

        layers = [ResidualBlock(in_channels, out_channels, stride, downsample)]
        for _ in range(1, blocks):
            layers.append(ResidualBlock(out_channels, out_channels))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x

#### Inception

In [None]:
class InceptionModule(nn.Module):
    def __init__(self, in_channels, out_1x1, red_3x3, out_3x3, red_5x5, out_5x5, out_pool):
        super(InceptionModule, self).__init__()
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_channels, out_1x1, kernel_size=1),
            nn.ReLU(inplace=True),
        )

        self.branch2 = nn.Sequential(
            nn.Conv2d(in_channels, red_3x3, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(red_3x3, out_3x3, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

        self.branch3 = nn.Sequential(
            nn.Conv2d(in_channels, red_5x5, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(red_5x5, out_5x5, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
        )

        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, out_pool, kernel_size=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        branch1 = self.branch1(x)
        branch2 = self.branch2(x)
        branch3 = self.branch3(x)
        branch4 = self.branch4(x)
        outputs = torch.cat([branch1, branch2, branch3, branch4], dim=1)
        return outputs


class InceptionNet(nn.Module):
    def __init__(self, num_classes=102):
        super(InceptionNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool1 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.inception1 = InceptionModule(64, 64, 96, 128, 16, 32, 32)
        self.inception2 = InceptionModule(256, 128, 128, 192, 32, 96, 64)

        self.maxpool2 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.inception3 = InceptionModule(480, 192, 96, 208, 16, 48, 64)
        self.inception4 = InceptionModule(512, 160, 112, 224, 24, 64, 64)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512, num_classes)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu(x)
        x = self.maxpool1(x)

        x = self.inception1(x)
        x = self.inception2(x)
        x = self.maxpool2(x)

        x = self.inception3(x)
        x = self.inception4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x

### Helpers

In [19]:
def evaluate(model, data_loader):
    loss = 0
    correct = 0
    loss_fn = nn.CrossEntropyLoss(
        reduction='sum',
    )

    with torch.no_grad():
        for inputs, labels in data_loader:
            inputs, labels = inputs.to(_device), labels.to(_device)

            outputs = model(inputs)
            loss += loss_fn(outputs, labels).item()
            pred = outputs.argmax(
                dim=1, keepdim=True
            )
            correct += (
                pred.eq(labels.view_as(pred)).sum().item()
            )

    loss /= len(data_loader.dataset)
    accuracy = correct / len(data_loader.dataset)
    return loss, accuracy

In [13]:
def train_model(model, data_loaders, num_epochs=10, learning_rate=0.001):
    model.to(_device)
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(num_epochs):
        model.train()
        for inputs, labels in data_loaders['train']:
            inputs, labels = inputs.to(_device), labels.to(_device)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = loss_fn(outputs, labels)
            loss.backward()
            optimizer.step()

        model.eval()
        train_loss, train_acc = evaluate(model, data_loaders['train'])
        valid_loss, valid_acc = evaluate(model, data_loaders['valid'])
        print(f"Epoch {epoch+1}/{num_epochs}, Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f}, Valid Loss: {valid_loss:.4f}, Valid Acc: {valid_acc:.4f}")

    test_loss, test_acc = evaluate(model, data_loaders['test'])
    print(f"Test Loss: {test_loss:.4f}, Test Accuracy: {test_acc:.4f}")

### Train

In [17]:
resnet_model = ResNet(num_classes=102)
train_model(resnet_model, data_loaders, num_epochs=10, learning_rate=0.001)

Epoch 1/10, Train Loss: 4.8427, Train Acc: 0.0461, Valid Loss: 4.8738, Valid Acc: 0.0412
Epoch 2/10, Train Loss: 3.9634, Train Acc: 0.0775, Valid Loss: 4.2482, Valid Acc: 0.0520
Epoch 3/10, Train Loss: 3.8106, Train Acc: 0.0902, Valid Loss: 4.1405, Valid Acc: 0.0804
Epoch 4/10, Train Loss: 4.0354, Train Acc: 0.0873, Valid Loss: 4.4489, Valid Acc: 0.0716
Epoch 5/10, Train Loss: 3.3883, Train Acc: 0.1569, Valid Loss: 3.9904, Valid Acc: 0.0971
Epoch 6/10, Train Loss: 3.0000, Train Acc: 0.2088, Valid Loss: 3.5486, Valid Acc: 0.1637
Epoch 7/10, Train Loss: 3.1503, Train Acc: 0.1873, Valid Loss: 3.8723, Valid Acc: 0.1176
Epoch 8/10, Train Loss: 2.8294, Train Acc: 0.2667, Valid Loss: 3.6525, Valid Acc: 0.1755
Epoch 9/10, Train Loss: 2.5828, Train Acc: 0.2971, Valid Loss: 3.4127, Valid Acc: 0.2078
Epoch 10/10, Train Loss: 2.3524, Train Acc: 0.3480, Valid Loss: 3.2429, Valid Acc: 0.2422
Test Loss: 3.3430, Test Accuracy: 0.1947


In [20]:
inception_model = InceptionNet(num_classes=102)
train_model(inception_model, data_loaders, num_epochs=10, learning_rate=0.001)

Epoch 1/10, Train Loss: 4.6251, Train Acc: 0.0098, Valid Loss: 4.6251, Valid Acc: 0.0098
Epoch 2/10, Train Loss: 4.6003, Train Acc: 0.0196, Valid Loss: 4.6027, Valid Acc: 0.0186
Epoch 3/10, Train Loss: 4.2557, Train Acc: 0.0255, Valid Loss: 4.2838, Valid Acc: 0.0235
Epoch 4/10, Train Loss: 4.1342, Train Acc: 0.0275, Valid Loss: 4.2110, Valid Acc: 0.0235
Epoch 5/10, Train Loss: 4.1162, Train Acc: 0.0284, Valid Loss: 4.1860, Valid Acc: 0.0284
Epoch 6/10, Train Loss: 4.0429, Train Acc: 0.0500, Valid Loss: 4.1326, Valid Acc: 0.0500
Epoch 7/10, Train Loss: 3.9405, Train Acc: 0.0451, Valid Loss: 4.0723, Valid Acc: 0.0706
Epoch 8/10, Train Loss: 3.9040, Train Acc: 0.0608, Valid Loss: 4.0412, Valid Acc: 0.0500
Epoch 9/10, Train Loss: 3.8459, Train Acc: 0.0775, Valid Loss: 4.0131, Valid Acc: 0.0696
Epoch 10/10, Train Loss: 3.8168, Train Acc: 0.0676, Valid Loss: 4.0339, Valid Acc: 0.0657
Test Loss: 4.0369, Test Accuracy: 0.0550


## Wandb reports
https://wandb.ai/fejowo5522-/NN_list3_OxFlow/reportlist