# Classifying birds vs. airplanes with convolutions

In [15]:
import torch
import torch.nn as nn

## Dataset
We continue to use the same CIFAR2 subset that we used for the fully connected model in the previous section.
As before, the images are normalized and the labels are remapped to 0 (airplane) and 1 (bird).

In [3]:
from torchvision import datasets, transforms
# load CIFAR10
data_path = '../data-unversioned/p1ch6/'
cifar10 = datasets.CIFAR10(
    data_path, train=True, download=True,
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4915, 0.4823, 0.4468),
                             (0.2470, 0.2435, 0.2616))
    ]))
cifar10_val = datasets.CIFAR10(
    data_path, train=False, download=True,
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4915, 0.4823, 0.4468),
                             (0.2470, 0.2435, 0.2616))
    ]))
# extract birds and airplanes as CIFAR2
label_map = {0: 0, 2: 1}
class_names = ['airplane', 'bird']
cifar2 = [(img, label_map[label])
          for img, label in cifar10
          if label in [0, 2]]
cifar2_val = [(img, label_map[label])
              for img, label in cifar10_val
              if label in [0, 2]]

Using downloaded and verified file: ../data-unversioned/p1ch6/cifar-10-python.tar.gz
Extracting ../data-unversioned/p1ch6/cifar-10-python.tar.gz to ../data-unversioned/p1ch6/
Files already downloaded and verified


## Convolution
The previous model was fully connected. Because each weight is tied to one specific pixel, the model effectively memorizes where the bird or airplane appears in the training images, so it generalizes poorly when the object shows up elsewhere.

A convolution addresses this and has two benefits:
- The same kernel is applied across the whole image, so a feature is detected regardless of where the target appears.
- A convolution kernel needs far fewer parameters than a fully connected layer.
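
The two benefits listed above can be checked with a short sketch. The 512-unit `nn.Linear` layer is a hypothetical stand-in for the first layer of a fully connected model (its width is an assumption, not taken from the previous section). The helper `count_params` is introduced here only for illustration. Circular padding is used so that the shift comparison is exact at the image borders.

```python
import torch
import torch.nn as nn

def count_params(module):
    # total number of learnable values in a module
    return sum(p.numel() for p in module.parameters())

# Parameter count: a 3x3 convolution vs. a fully connected layer on a 32x32 RGB image.
conv = nn.Conv2d(3, 16, kernel_size=3)   # 16 * 3 * 3 * 3 weights + 16 biases
fc = nn.Linear(3 * 32 * 32, 512)         # hypothetical fully connected layer (assumed width)
conv_params = count_params(conv)
fc_params = count_params(fc)
print(conv_params, fc_params)  # 448 1573376

# Position: shifting the input shifts the response, it does not change it.
torch.manual_seed(0)
shift_conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, padding_mode='circular')
x = torch.randn(1, 1, 8, 8)
with torch.no_grad():
    shifted_then_conv = shift_conv(torch.roll(x, shifts=2, dims=-1))
    conv_then_shifted = torch.roll(shift_conv(x), shifts=2, dims=-1)
same_response = torch.allclose(shifted_then_conv, conv_then_shifted, atol=1e-6)
print(same_response)
```

Strictly speaking, the convolution does not ignore position. Its response moves together with the target, which is what lets later layers recognize the same feature anywhere in the image.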

In [4]:
# create a conv layer; the kernel has the same size (3x3) for every input channel (RGB)
conv = nn.Conv2d(3, 16, kernel_size=3)
img, _ = cifar2[0]
output = conv(img.unsqueeze(0))
img.unsqueeze(0).shape, output.shape

(torch.Size([1, 3, 32, 32]), torch.Size([1, 16, 30, 30]))

The convolution shrinks the tensor: with a kernel size of 3, 3 // 2 = 1 pixel is lost on each side, so 32 becomes 30.
We use padding to keep the output the same size as the input, for two reasons: 1. shrinking sizes make the shapes harder to keep track of; 2. in architectures such as ResNet, keeping the size unchanged is very important.

In [5]:
# padding to keep size.
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
img, _ = cifar2[0]
output = conv(img.unsqueeze(0))
img.unsqueeze(0).shape, output.shape

(torch.Size([1, 3, 32, 32]), torch.Size([1, 16, 32, 32]))
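
The shapes in the two cells above follow from the usual output size rule for a convolution, `floor((size + 2 * padding - kernel_size) / stride) + 1`. The helper `conv_out_size` below is a hypothetical name introduced for this sketch, and it assumes a dilation of 1.

```python
import torch
import torch.nn as nn

def conv_out_size(size, kernel_size, padding=0, stride=1):
    # floor((size + 2 * padding - kernel_size) / stride) + 1, dilation assumed to be 1
    return (size + 2 * padding - kernel_size) // stride + 1

no_pad = conv_out_size(32, kernel_size=3)
with_pad = conv_out_size(32, kernel_size=3, padding=1)
print(no_pad, with_pad)  # 30 32

# cross-check against an actual layer on a dummy 32x32 RGB batch
x = torch.zeros(1, 3, 32, 32)
layer_out = nn.Conv2d(3, 16, kernel_size=3, padding=1)(x)
matches_layer = layer_out.shape[-1] == with_pad
print(matches_layer)  # True
```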

With padding, the tensor keeps its size after the convolution.
Now we can use max pooling to reduce the size of the image. This lets later layers 'see' a larger region of the input, so they can learn more holistic characteristics of the target.

## Max pooling
There are several pooling methods, such as average pooling and max pooling. Max pooling is currently the more common choice, because it keeps the strongest responses from the previous layer.

In [7]:
# max pool
pool = nn.MaxPool2d(2)  # downsample by a factor of 2
output = pool(img.unsqueeze(0))
img.unsqueeze(0).shape, output.shape

(torch.Size([1, 3, 32, 32]), torch.Size([1, 3, 16, 16]))
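
To make the difference between the two pooling methods concrete, here is a minimal sketch on a hand-made 4x4 input (the values are illustrative, not taken from the dataset). Max pooling keeps the largest value in each 2x2 block, while average pooling blends the four values together.

```python
import torch
import torch.nn as nn

# a tiny 4x4 "image" with values 0..15, shaped (batch, channel, height, width)
x = torch.arange(16.0).reshape(1, 1, 4, 4)

max_out = nn.MaxPool2d(2)(x)   # largest value in each 2x2 block
avg_out = nn.AvgPool2d(2)(x)   # mean of each 2x2 block

print(max_out.squeeze().tolist())  # [[5.0, 7.0], [13.0, 15.0]]
print(avg_out.squeeze().tolist())  # [[2.5, 4.5], [10.5, 12.5]]
```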

## Construct a new convolutional model
Now we can use the convolution and pooling layers introduced above to construct a new model. We simply replace the input layer of the fully connected model with them.

In [10]:
model = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.Tanh(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 8, kernel_size=3, padding=1),
            nn.Tanh(),
            nn.MaxPool2d(2),
            # ... convert size
            nn.Linear(8*8*8, 32),
            nn.Tanh(),
            nn.Linear(32, 2),
        )

The model above is missing a step that flattens the output of the last convolution block so that it fits the input of the linear layer. To add it, we move from nn.Sequential to a subclass of nn.Module, where we can write the forward pass ourselves.

In [12]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.act1 = nn.Tanh()
        self.pool1 = nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(16, 8, kernel_size=3, padding=1)
        self.act2 = nn.Tanh()
        self.pool2 = nn.MaxPool2d(2)
        self.fc1 = nn.Linear(8*8*8, 32)
        self.act3 = nn.Tanh()
        self.fc2 = nn.Linear(32, 2)

    def forward(self, x):
        out = self.pool1(self.act1(self.conv1(x)))
        out = self.pool2(self.act2(self.conv2(out)))
        out = out.view(-1, 8*8*8)
        out = self.act3(self.fc1(out))
        out = self.fc2(out)
        return out

Check the network we just defined.

In [13]:
model = Net()
model(img.unsqueeze(0))

tensor([[-0.1087, -0.0359]], grad_fn=<AddmmBackward0>)
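
As an aside, recent PyTorch releases provide `nn.Flatten`, so the missing flattening step can also be filled in directly inside `nn.Sequential`. The sketch below is an alternative to the `Net` class, not a replacement for it. The name `seq_model` and the random `dummy` batch are introduced here for illustration only. It also counts the parameters, which shows how small the network is.

```python
import torch
import torch.nn as nn

seq_model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.Tanh(),
    nn.MaxPool2d(2),              # 32x32 -> 16x16
    nn.Conv2d(16, 8, kernel_size=3, padding=1),
    nn.Tanh(),
    nn.MaxPool2d(2),              # 16x16 -> 8x8
    nn.Flatten(),                 # (N, 8, 8, 8) -> (N, 512)
    nn.Linear(8 * 8 * 8, 32),
    nn.Tanh(),
    nn.Linear(32, 2),
)

dummy = torch.randn(4, 3, 32, 32)  # random stand-in for a batch of 4 images
out = seq_model(dummy)
n_params = sum(p.numel() for p in seq_model.parameters())
print(out.shape, n_params)  # torch.Size([4, 2]) 18090
```

Writing the model as an `nn.Module` subclass is still the more flexible option, since `forward` can contain arbitrary Python.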

## Training model


In [17]:
import datetime

def training_loop(n_epochs, optimizer, model, loss_fn, train_loader):
    for epoch in range(1, n_epochs+1):
        loss_train = 0.0
        for imgs, labels in train_loader:
            outputs = model(imgs)
            loss = loss_fn(outputs, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            loss_train += loss.item()
        if epoch == 1 or epoch % 10 == 0:
            print(f'{datetime.datetime.now()} Epoch {epoch}, Training loss {loss_train/len(train_loader)}')

In [18]:
train_loader = torch.utils.data.DataLoader(cifar2, batch_size=64, shuffle=True)
model = Net()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

training_loop(n_epochs=100, optimizer=opt, model=model, loss_fn=loss_fn, train_loader=train_loader)

2024-02-24 15:23:58.789192 Epoch 1, Training loss 0.6127399350427518
2024-02-24 15:24:20.832954 Epoch 10, Training loss 0.34056056029857346
2024-02-24 15:24:45.468617 Epoch 20, Training loss 0.29784967081182323
2024-02-24 15:25:10.152612 Epoch 30, Training loss 0.27266352139650635
2024-02-24 15:25:34.774068 Epoch 40, Training loss 0.25076683719826354
2024-02-24 15:25:59.407091 Epoch 50, Training loss 0.23074330688472006
2024-02-24 15:26:24.190240 Epoch 60, Training loss 0.21134370423046647
2024-02-24 15:26:48.894683 Epoch 70, Training loss 0.19650284015828637
2024-02-24 15:27:13.608154 Epoch 80, Training loss 0.17896622082420216
2024-02-24 15:27:38.331984 Epoch 90, Training loss 0.16655958785562758
2024-02-24 15:28:03.086416 Epoch 100, Training loss 0.15171305562375456


## Validate model


In [19]:
# validate the model on both the training and validation sets
train_loader = torch.utils.data.DataLoader(cifar2, batch_size=64, shuffle=False)
val_loader = torch.utils.data.DataLoader(cifar2_val, batch_size=64, shuffle=False)

def validate(model, train_loader, val_loader):
    for name, loader in [("train", train_loader), ("val", val_loader)]:
        correct = 0
        total = 0

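        with torch.no_grad():  # no gradients are needed for evaluation
            for imgs, labels in loader:
                outputs = model(imgs)
                _, predicted = torch.max(outputs, dim=1)  # index of the highest score is the predicted class
                total += labels.shape[0]  # number of samples in this batch
                correct += int((predicted == labels).sum())  # number of correct predictions

        print("Accuracy {}: {:.2f}".format(name, correct / total))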

In [20]:
validate(model, train_loader, val_loader)

Accuracy train: 0.90
Accuracy val: 0.84


## Save and load trained model parameters

```
# save
torch.save(model.state_dict(), data_path + 'birds_vs_airplanes.pt')

# load
loaded_model = Net()  # instantiate the same architecture before loading the weights
loaded_model.load_state_dict(torch.load(data_path + 'birds_vs_airplanes.pt'))
```
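
A quick way to confirm that a save and load round trip works is to compare the outputs of the original and the reloaded model on the same input. The sketch below is self-contained, so it uses a tiny hypothetical stand-in (`make_model`) instead of `Net` and writes to a temporary directory instead of `data_path`. The `map_location='cpu'` argument is an optional extra that helps when the weights were saved on a GPU.

```python
import os
import tempfile

import torch
import torch.nn as nn

def make_model():
    # hypothetical stand-in for Net(), kept tiny so the sketch is self-contained
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))

model = make_model()
x = torch.randn(1, 3, 32, 32)  # random stand-in for one image

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'birds_vs_airplanes.pt')
    torch.save(model.state_dict(), path)

    loaded_model = make_model()  # fresh instance with different random weights
    # map_location='cpu' lets weights saved on a GPU load on a CPU-only machine
    loaded_model.load_state_dict(torch.load(path, map_location='cpu'))

loaded_model.eval()
with torch.no_grad():
    outputs_match = torch.equal(model(x), loaded_model(x))
print(outputs_match)
```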