## Sztuczne sieci neuronowe - laboratorium 8

### Splotowe sieci neuronowe - cz. 2

Na poprzednich zajęciach poznaliśmy warstwy tworzące **splotową sieć neuronową** i nauczyliśmy się tworzyć modele jako klasy  dziedziczące po `nn.Module`.

Dziś wytrenujemy splotową sieć neuronową do binarnej klasyfikacji obrazu i poznamy dodatkowe techniki stosowane w sieciach neuronowych, m.in. do regularyzacji modeli.

#### Pytania kontrolne

1. Opisz budowę splotowej sieci neuronowej. Wyjaśnij, do czego służą jej poszczególne warstwy.
2. Na czym polega regularyzacja modeli?
3. Jakie znasz metody regularyzacji stosowane w sieciach neuronowych?

### Z poprzednich ćwiczeń

Uruchom kolejne komórki, wykorzystujące kod z poprzednich zajęć, aby przygotować zbiór danych - `cifar2` oraz klasę `Net` definiującą model.

In [29]:
from torchvision import datasets, transforms
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
import torch.optim as optim
from torch.utils.data import DataLoader

In [30]:
class_names = {
    0: "airplane", 
    1: "automobile", 
    2: "bird", 
    3: "cat", 
    4: "deer", 
    5: "dog", 
    6: "frog",
    7: "horse",
    8: "ship",
    9: "truck"
}

In [31]:
tensor_cifar10 = datasets.CIFAR10("data", train=True, download=True, transform=transforms.ToTensor())
tensor_cifar10_val = datasets.CIFAR10("data", train=False, download=True, transform=transforms.ToTensor())

Files already downloaded and verified
Files already downloaded and verified


In [32]:
imgs = torch.stack([img_t for img_t, _ in tensor_cifar10], dim=3)
per_channel_means = imgs.view(3, -1).mean(dim=1)
per_channel_std = imgs.view(3, -1).std(dim=1)

In [33]:
transforms_compose = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(per_channel_means, per_channel_std)
])

transformed_cifar10 = datasets.CIFAR10("data", train=True, download=False, transform=transforms_compose)
transformed_cifar10_val = datasets.CIFAR10("data", train=False, download=False, transform=transforms_compose)

In [34]:
label_map = {0: 0, 2: 1}
new_class_names  = [class_names[i] for i in label_map]

cifar2 = [(img, label_map[label]) for img, label in tensor_cifar10 if label in label_map]
cifar2_val = [(img, label_map[label]) for img, label in tensor_cifar10 if label in label_map]

In [35]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 8, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(8 * 8 * 8, 32)
        self.fc2 = nn.Linear(32, 2)
    
    def forward(self, x):
        out = F.max_pool2d(torch.tanh(self.conv1(x)), 2)
        out = F.max_pool2d(torch.tanh(self.conv2(out)), 2)
        out = out.view(-1, 8 * 8 * 8)
        out = torch.tanh(self.fc1(out))
        out = self.fc2(out)
        return out

### Trening modelu

#### Ćwiczenie
Uzupełnij poniższe komórki, aby wytrenować splotową sieć neuronową do zadania klasyfikacji binarnej.

Przyjmij learning rate o wartości 0.01 i batch size 64. Trenuj przez 100 epok. Użyj optymaliatora SGD.

In [36]:
def training_loop(n_epochs, optimizer, model, loss_fn, train_loader):
    model.train()
    for epoch in range(1, n_epochs + 1):
        loss_train = 0.0
        for imgs, labels in train_loader:
            optimizer.zero_grad()
            outputs = model(imgs)
            loss = loss_fn(outputs, labels.long())
            loss.backward()
            optimizer.step()
            loss_train += loss.item()
            
        if epoch == 1 or epoch % 10 == 0:
            print(f"Epoch {epoch}, Training loss {loss_train}")

AttributeError: 'list' object has no attribute 'shape'

In [10]:
model = Net()
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
train_loader = DataLoader(cifar2, batch_size=64, shuffle=True)

training_loop(
    n_epochs = 100,
    optimizer = optimizer,
    model = model,
    loss_fn = loss_fn,
    train_loader = train_loader
)

Epoch 1, Training loss 104.98927986621857
Epoch 10, Training loss 69.9262126982212
Epoch 20, Training loss 55.61783264577389
Epoch 30, Training loss 51.27168031036854
Epoch 40, Training loss 47.724389150738716
Epoch 50, Training loss 45.0846983268857
Epoch 60, Training loss 42.50561612844467
Epoch 70, Training loss 40.05517219007015
Epoch 80, Training loss 37.757189989089966
Epoch 90, Training loss 35.95438706129789
Epoch 100, Training loss 33.81690967828035


### Walidacja modelu

#### Ćwiczenie

Zaimplementuj funkcję `validate`, która zmierzy dokładność wytrenowanego modelu na dwóch zbiorach - uczącym i walidacyjnym.

Porównaj wyniki z wynikami z laboratorium nr 5 (sieć gęsta).

In [37]:
train_loader = torch.utils.data.DataLoader(cifar2, batch_size=64, shuffle=False)
val_loader = torch.utils.data.DataLoader(cifar2_val, batch_size=64, shuffle=False)

In [38]:
def validate(model, train_loader, val_loader):
    model.eval()
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    for name, loader in [("train", train_loader), ("val", val_loader)]:
        correct = 0
        total = 0
        
        with torch.no_grad():
            for imgs, labels in loader:
                imgs = imgs.to(device=device, dtype=torch.float32)
                labels = labels.to(device=device)
                outputs = model(imgs).to(device=device, dtype=torch.float32)
                _, predicted = torch.max(outputs, 1)
                correct += (predicted == labels).sum().item()
                total += labels.size(0)
                
        print(f"{name} accuracy: {correct/total}")

In [13]:
validate(model, train_loader, val_loader)

train accuracy: 0.9126
val accuracy: 0.9126


### Zapis i odczyt modelu

Wytrenowany model (zwłaszcza tak, którego trening trwa długo) warto zapisać, aby móc go użyć później.

Typowo po każdej epoce treningu sprawdza się działanie modelu na zbiorze walidacyjnym (walidacja).
Zapisu modelu (tzw. "checkpoint") typowo dokonuje się, jeśli wartość danej metryki (np. dokładność lub F1 na zbiorze walidacyjnym) jest lepsza niż najlepsza uzyskana dotychczas.

PyTorch pozwala zapisać wagi (parametry) modelu z użyciem `torch.save` oraz tzw. `state_dict` modelu (https://pytorch.org/tutorials/recipes/recipes/what_is_state_dict.html). Innym sposobem zapisu jest zapis całego modelu (z użyciem `pickle` "pod spodem"):
https://pytorch.org/tutorials/recipes/recipes/saving_and_loading_models_for_inference.html

Następnie, do wczytania zapisanego modelu można użyć metody `load_state_dict` (oraz `torch.load`), jeśli zapisywaliśmy tylko `state_dict` lub tylko `torch.load`, jeśli zapisywaliśmy cały model.

In [14]:
print(model.state_dict())

OrderedDict([('conv1.weight', tensor([[[[ 0.0606, -0.2273, -0.0732],
          [-0.0974, -0.2431, -0.1682],
          [ 0.0541, -0.2974, -0.1833]],

         [[-0.0880,  0.1890, -0.1167],
          [ 0.1557, -0.0647, -0.1885],
          [ 0.1671, -0.0270,  0.2232]],

         [[ 0.0949,  0.2364,  0.1374],
          [-0.1203,  0.2242,  0.1321],
          [ 0.2106, -0.0045,  0.2103]]],


        [[[-0.0051,  0.1984,  0.1453],
          [-0.2703,  0.1930, -0.0929],
          [ 0.0636,  0.2682,  0.2221]],

         [[ 0.0434,  0.0888,  0.0075],
          [-0.2599, -0.1527, -0.2117],
          [ 0.1145, -0.0514,  0.0505]],

         [[-0.0213,  0.0060, -0.1025],
          [-0.1100, -0.1763, -0.1820],
          [ 0.2429,  0.2124,  0.0694]]],


        [[[ 0.1587, -0.1347, -0.2053],
          [ 0.0011, -0.0315, -0.1452],
          [-0.1595,  0.1614,  0.0540]],

         [[-0.1597,  0.0350,  0.1157],
          [-0.2125, -0.1473, -0.2104],
          [-0.1489, -0.0438, -0.0429]],

         [[ 0.

In [15]:
torch.save(model.state_dict(), "data/birds_vs_airplanes.pt")

In [16]:
loaded_model = Net()
loaded_model.load_state_dict(torch.load("data/birds_vs_airplanes.pt"))

<All keys matched successfully>

### Trening na GPU (opcjonalnie)

Aby przyspieszyć trening (zwłaszcza w przypadku głębokich modeli i dużych zbiorów danych), powszechnie stosuje się karty graficzne (GPU). Jeśli mamy dostęp do maszyny z kartą graficzną (najlepiej od NVIDIA, obsługującą CUDA), możemy łatwo "przenieść" trening na GPU.

W tym celu należy przenieść zarówno dane, jak i model, na kartę graficzną, używając metody `.to` (tensora i `nn.Module`) na zdefiniowane urządzenie (patrz poniżej).

In [39]:
device = (torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu'))
print(f"Training on {device}")

Training on cuda


#### Ćwiczenie

Zmodyfikuj napisaną wyżej pętlę treningową oraz inicjalizację modelu, przenosząc odpowiednio dane (obrazki i etykiety) oraz model na `device`.

In [40]:
def training_loop(n_epochs, optimizer, model, loss_fn, train_loader):
    model.train()
    for epoch in range(1, n_epochs + 1):
        loss_train = 0.0
        for imgs, labels in train_loader:
            imgs = imgs.to(device)
            labels = labels.to(device)
            
            optimizer.zero_grad()
            outputs = model(imgs)
            loss = loss_fn(outputs, labels.long())
            loss.backward()
            optimizer.step()
            loss_train += loss.item()
        if epoch == 1 or epoch % 10 == 0:
            print(f"Epoch {epoch}, Training loss {loss_train / len(train_loader)}")

In [19]:
model_gpu = Net().to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model_gpu.parameters(), lr=0.01)
train_loader = DataLoader(cifar2, batch_size=64, shuffle=True)

training_loop(n_epochs=100, optimizer=optimizer, model=model_gpu, loss_fn=loss_fn, train_loader=train_loader)

Epoch 1, Training loss 0.6840003303661468
Epoch 10, Training loss 0.46557374440940325
Epoch 20, Training loss 0.35944290231367587
Epoch 30, Training loss 0.3267110019542609
Epoch 40, Training loss 0.2992238887366216
Epoch 50, Training loss 0.2801251609803765
Epoch 60, Training loss 0.26367165366555473
Epoch 70, Training loss 0.25008184714302134
Epoch 80, Training loss 0.2378904631563053
Epoch 90, Training loss 0.22749763893283856
Epoch 100, Training loss 0.21602441434552716


### Rozbudowa modelu

Możemy "powiększyć" model "na szerokość" (dodać więcej filtrów) lub "na głębokość" (dodać więcej warstw).

#### Ćwiczenie

Zmodyfikuj klasę `Net` i stwórz kolejno:
- `NetWidth` - 2x więcej filtrów w warstwach splotowych (niech liczba filtrów będzie argumentem konstruktora)
- `NetDepth` - dodatkowa warstwa splotowa `conv3`

Pamiętaj o zmodyfikowaniu metody `forward`.

In [41]:
class NetWidth(nn.Module):
    def __init__(self, n_chans1=32):
        super().__init__()
        self.n_chans = n_chans1
        self.conv1 = nn.Conv2d(3, n_chans1, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(n_chans1, n_chans1*2, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(8 * 8 * n_chans1*2, 32)
        self.fc2 = nn.Linear(32, 2)

    def forward(self, x):
        out = nn.functional.max_pool2d(torch.relu(self.conv1(x)), 2)
        out = nn.functional.max_pool2d(torch.relu(self.conv2(out)), 2)
        out = out.view(-1, 8 * 8 * self.n_chans*2)
        out = torch.relu(self.fc1(out))
        out = self.fc2(out)
        return out

In [51]:
class NetDepth(nn.Module):
    def __init__(self, n_chans1=32):
        super().__init__()
        self.n_chans = n_chans1
        self.conv1 = nn.Conv2d(3, n_chans1, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(n_chans1, n_chans1, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(n_chans1, n_chans1, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(4 * 4 * n_chans1, 32)
        self.fc2 = nn.Linear(32, 2)

    def forward(self, x):
        out = nn.functional.max_pool2d(torch.relu(self.conv1(x)), 2)
        out = nn.functional.max_pool2d(torch.relu(self.conv2(out)), 2)
        out = nn.functional.max_pool2d(torch.relu(self.conv3(out)), 2)
        out = out.view(-1, 4 * 4 * self.n_chans)
        out = torch.relu(self.fc1(out))
        out = self.fc2(out)
        return out

### Regularyzacja L2 / weight decay

Regularyzację L2 można zaimplementować samemu (jak niżej). 

Jest ona jednak wbudowana w `torch.optim`, np. https://pytorch.org/docs/stable/optim.html#torch.optim.SGD, gdzie wystarczy podać wartość `weight_decay` tworząc optymalizator.

#### Ćwiczenie

W poniższej pętli treningowej dopisz fragment realizujący regularyzację L2 dla lambda = 0.001.

In [43]:
def training_loop_l2reg(n_epochs, optimizer, model, loss_fn, train_loader):
    for epoch in range(1, n_epochs + 1):
        loss_train = 0.0
        for imgs, labels in train_loader:
            imgs = imgs.to(device=device, dtype=torch.float32)
            labels = labels.to(device=device)
            outputs = model(imgs)
            loss = loss_fn(outputs, labels)
                        
            l2_lambda = 0.001
            for param in model.parameters():
                loss += l2_lambda * torch.norm(param)**2

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            loss_train += loss.item()
            
        if epoch == 1 or epoch % 10 == 0:
            print(f"Epoch {epoch}, Training loss {loss_train}")

In [23]:
train_loader = torch.utils.data.DataLoader(cifar2, batch_size=64, shuffle=True)

model = Net().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

training_loop_l2reg(
    n_epochs = 100,
    optimizer = optimizer,
    model = model,
    loss_fn = loss_fn,
    train_loader = train_loader
)

Epoch 1, Training loss 109.95175862312317
Epoch 10, Training loss 71.605403393507
Epoch 20, Training loss 59.85934379696846
Epoch 30, Training loss 56.33640371263027
Epoch 40, Training loss 53.65350362658501
Epoch 50, Training loss 51.90621429681778
Epoch 60, Training loss 50.32947988808155
Epoch 70, Training loss 48.58898697793484
Epoch 80, Training loss 47.131951063871384
Epoch 90, Training loss 45.31257829070091
Epoch 100, Training loss 43.84178914129734


#### Ćwiczenie

Wywołaj "zwykłą" pętlę treningową, tym razem podająć `weight_decay` optyamlizatora równe 0.001. Zaobserwuj wpływ na funkcję straty.

In [24]:
train_loader = torch.utils.data.DataLoader(cifar2, batch_size=64, shuffle=True)

model = Net().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=0.001)
loss_fn = nn.CrossEntropyLoss()

def training_loop_l2reg_weight_decay(n_epochs, optimizer, model, loss_fn, train_loader):
    for epoch in range(1, n_epochs + 1):
        loss_train = 0.0
        for imgs, labels in train_loader:
            imgs = imgs.to(device=device, dtype=torch.float32)
            labels = labels.to(device=device)
            outputs = model(imgs)
            loss = loss_fn(outputs, labels)
                        
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            loss_train += loss.item()
            
        if epoch == 1 or epoch % 10 == 0:
            print(f"Epoch {epoch}, Training loss {loss_train}")

training_loop_l2reg_weight_decay(
    n_epochs = 100,
    optimizer = optimizer,
    model = model,
    loss_fn = loss_fn,
    train_loader = train_loader
)

Epoch 1, Training loss 106.9676085114479
Epoch 10, Training loss 69.2920361161232
Epoch 20, Training loss 56.69583120942116
Epoch 30, Training loss 53.437835335731506
Epoch 40, Training loss 50.53909331560135
Epoch 50, Training loss 48.41050708293915
Epoch 60, Training loss 45.966328859329224
Epoch 70, Training loss 43.805709198117256
Epoch 80, Training loss 41.343967974185944
Epoch 90, Training loss 39.20707976818085
Epoch 100, Training loss 37.02289388328791


### Dropout

https://pytorch.org/docs/stable/generated/torch.nn.Dropout2d.html

#### Ćwiczenie
Dołóż warstwy `nn.Dropout2d` po warstwach splotowych (po max poolingu) do sieci `Net` i stwórz w ten sposob `NetDropout`.

In [45]:
class NetDropout(nn.Module):
    def __init__(self, n_chans1=32):
        super().__init__()
        self.n_chans = n_chans1
        self.conv1 = nn.Conv2d(3, n_chans1, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(n_chans1, n_chans1, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(8 * 8 * n_chans1, 32)
        self.fc2 = nn.Linear(32, 2)
        self.dropout = nn.Dropout2d(p=0.5)

    def forward(self, x):
        out = nn.functional.max_pool2d(torch.relu(self.conv1(x)), 2)
        out = self.dropout(out)
        out = nn.functional.max_pool2d(torch.relu(self.conv2(out)), 2)
        out = self.dropout(out)
        out = out.view(-1, 8 * 8 * self.n_chans)
        out = torch.relu(self.fc1(out))
        out = self.fc2(out)
        return out

### Batch normalization

https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html

#### Ćwiczenie

Dodaj warstwy `BatchNorm2d` po warstwach splotowych `Net` tworząc w ten sposób `NetBatchNorm`.

In [46]:
class NetBatchNorm(nn.Module):
    def __init__(self, n_chans1=32):
        super().__init__()        
        self.n_chans = n_chans1
        self.conv1 = nn.Conv2d(3, n_chans1, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(n_chans1)
        self.conv2 = nn.Conv2d(n_chans1, n_chans1, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(n_chans1)
        self.fc1 = nn.Linear(8 * 8 * n_chans1, 32)
        self.fc2 = nn.Linear(32, 2)

    def forward(self, x):
        out = nn.functional.max_pool2d(torch.relu(self.bn1(self.conv1(x))), 2)
        out = nn.functional.max_pool2d(torch.relu(self.bn2(self.conv2(out))), 2)
        out = out.view(-1, 8 * 8 * self.n_chans)
        out = torch.relu(self.fc1(out))
        out = self.fc2(out)
        return out

#### Ćwiczenie (zadanie domowe)

Porównaj wyniki uzyskane przez każdą z zaimplementowanych klas na zbiorze uczącym i walidacyjnym.

In [52]:
train_loader = torch.utils.data.DataLoader(cifar2, batch_size=64, shuffle=True)
loss_fn = nn.CrossEntropyLoss()
train_loader = torch.utils.data.DataLoader(cifar2, batch_size=64, shuffle=False)
val_loader = torch.utils.data.DataLoader(cifar2_val, batch_size=64, shuffle=False)

model = Net().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
training_loop_l2reg(
    n_epochs = 100,
    optimizer = optimizer,
    model = model,
    loss_fn = loss_fn,
    train_loader = train_loader
)
print(f"Walidacja modelu: {model.__class__.__name__}")
validate(model, train_loader, val_loader)


model_NetWidth= NetWidth().to(device)
optimizer = torch.optim.SGD(model_NetWidth.parameters(), lr=1e-2)
training_loop_l2reg(
    n_epochs = 100,
    optimizer = optimizer,
    model = model_NetWidth,
    loss_fn = loss_fn,
    train_loader = train_loader
)
print(f"Walidacja modelu: {model_NetWidth.__class__.__name__}")
validate(model_NetWidth, train_loader, val_loader)


model_NetDepth = NetDepth().to(device)
optimizer = torch.optim.SGD(model_NetDepth.parameters(), lr=1e-2)
training_loop_l2reg(
    n_epochs = 100,
    optimizer = optimizer,
    model = model_NetDepth,
    loss_fn = loss_fn,
    train_loader = train_loader
)
print(f"Walidacja modelu: {model_NetDepth.__class__.__name__}")
validate(model_NetDepth, train_loader, val_loader)


model_NetDropout = NetDropout().to(device)
optimizer = torch.optim.SGD(model_NetDropout.parameters(), lr=1e-2)
training_loop_l2reg(
    n_epochs = 100,
    optimizer = optimizer,
    model = model_NetDropout,
    loss_fn = loss_fn,
    train_loader = train_loader
)
print(f"Walidacja modelu: {model_NetDropout.__class__.__name__}")
validate(model_NetDropout, train_loader, val_loader)

model_NetBatchNorm = NetBatchNorm().to(device)
optimizer = torch.optim.SGD(model_NetBatchNorm.parameters(), lr=1e-2)
training_loop_l2reg(
    n_epochs = 100,
    optimizer = optimizer,
    model = model_NetBatchNorm,
    loss_fn = loss_fn,
    train_loader = train_loader
)
print(f"Walidacja modelu: {model_NetBatchNorm.__class__.__name__}")
validate(model_NetBatchNorm, train_loader, val_loader)

Epoch 1, Training loss 109.94240826368332
Epoch 10, Training loss 72.01702234148979
Epoch 20, Training loss 60.403788566589355
Epoch 30, Training loss 57.19509010016918
Epoch 40, Training loss 54.897967368364334
Epoch 50, Training loss 52.99198879301548
Epoch 60, Training loss 51.31756116449833
Epoch 70, Training loss 49.67265437543392
Epoch 80, Training loss 48.047274589538574
Epoch 90, Training loss 46.43214137852192
Epoch 100, Training loss 44.91967725753784
Walidacja modelu: Net
train accuracy: 0.8862
val accuracy: 0.8862
Epoch 1, Training loss 113.74421441555023
Epoch 10, Training loss 77.03097426891327
Epoch 20, Training loss 61.95955556631088
Epoch 30, Training loss 56.88470958173275
Epoch 40, Training loss 53.255750462412834
Epoch 50, Training loss 49.844123497605324
Epoch 60, Training loss 46.520743668079376
Epoch 70, Training loss 43.49854123592377
Epoch 80, Training loss 40.889246091246605
Epoch 90, Training loss 38.62727456539869
Epoch 100, Training loss 36.575046576559544


In [50]:
model_NetDepth = NetDepth().to(device)
optimizer = torch.optim.SGD(model_NetDepth.parameters(), lr=1e-2)
training_loop_l2reg(
    n_epochs = 100,
    optimizer = optimizer,
    model = model_NetDepth,
    loss_fn = loss_fn,
    train_loader = train_loader
)
print(f"Walidacja modelu: {model_NetDepth.__class__.__name__}")
validate(model_NetDepth, train_loader, val_loader)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x512 and 2048x32)