## Image Classification (CIFAR-10)

This notebook explores the [CIFAR-10](https://www.kaggle.com/c/cifar-10) image classification task on Kaggle. For practice, I use only a small subset of the original dataset and a simple ResNet-18 model based on Kaiming He et al.'s ["Deep Residual Learning for Image Recognition"](https://arxiv.org/abs/1512.03385).

In [2]:
import os
import time

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

In [3]:
print("PyTorch Version: ",torch.__version__)

PyTorch Version:  1.1.0


## Load dataset

The dataset is divided into a training set of 50,000 images and a test set of 300,000 images. All images are PNG files of 32×32 pixels with three color channels (RGB). The original dataset's images cover 10 categories: plane, car, bird, cat, deer, dog, frog, horse, ship and truck.

To save time and compute cost, this notebook uses only a mini subset of the original dataset, with 80 training samples and 100 test samples, stored under "train_tiny" and "test_tiny".

### Image Augmentation

In [4]:
data_transform = transforms.Compose([
    transforms.Resize(40),
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32),
    transforms.ToTensor()
])
trainset = torchvision.datasets.ImageFolder(
    root='/home/kesci/input/CIFAR102891/cifar-10/train', transform=data_transform)

In [5]:
trainset[0][0].shape

torch.Size([3, 32, 32])

In [6]:
data = [d[0].numpy() for d in trainset]  # one (3, 32, 32) array per image
np.mean(data)  # a single scalar over all images, channels and pixels

0.46735394

In [7]:
np.std(data)

0.23921667

In [8]:
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),  # pad 4 pixels on each side, then crop a random 32x32 patch
    transforms.RandomHorizontalFlip(),  # flip horizontally with probability 0.5
    transforms.ToTensor(),
    transforms.Normalize((0.4731, 0.4822, 0.4465), (0.2212, 0.1994, 0.2010)),  # per-channel (R, G, B) mean and std
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4731, 0.4822, 0.4465), (0.2212, 0.1994, 0.2010)),
])
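
To make the effect of `Normalize` concrete, here is a minimal sketch in plain Python that applies the same per-channel arithmetic, `(value - mean) / std`, to a single RGB pixel. The helper names `normalize_pixel` and `denormalize_pixel` are hypothetical and not part of torchvision. The sketch assumes pixel values already lie in [0, 1], which is the range `ToTensor` produces for 8-bit images.

```python
# Per-channel statistics copied from the Normalize calls above.
MEAN = (0.4731, 0.4822, 0.4465)
STD = (0.2212, 0.1994, 0.2010)


def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Apply (value - mean) / std to each channel of one RGB pixel (hypothetical helper)."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))


def denormalize_pixel(rgb, mean=MEAN, std=STD):
    """Invert normalize_pixel, e.g. before displaying an image (hypothetical helper)."""
    return tuple(v * s + m for v, m, s in zip(rgb, mean, std))


mid_gray = (0.5, 0.5, 0.5)
normalized = normalize_pixel(mid_gray)
restored = denormalize_pixel(normalized)
print(normalized)  # small positive values: mid-gray is slightly brighter than each channel mean
print(restored)    # back to roughly (0.5, 0.5, 0.5)
```

A pixel equal to the channel means maps to zero in every channel, so the network sees inputs centered near zero.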

In [9]:
train_dir = '/home/kesci/input/CIFAR102891/cifar-10/train'
test_dir = '/home/kesci/input/CIFAR102891/cifar-10/test'

trainset = torchvision.datasets.ImageFolder(root=train_dir, transform=transform_train)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=256, shuffle=True)

testset = torchvision.datasets.ImageFolder(root=test_dir, transform=transform_test)
testloader = torch.utils.data.DataLoader(testset, batch_size=256, shuffle=False)

classes = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
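
With `batch_size=256`, the number of batches per epoch depends only on the dataset size. The sketch below uses hypothetical helpers, `batches_per_epoch` and `sample_range_for`, and assumes the `DataLoader` default of `drop_last=False`. The training log further down advances its iteration counter by 176 per epoch (1, 177, 353, ...), so the second helper shows which dataset sizes would produce that count.

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches a DataLoader yields in one pass over the data."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


def sample_range_for(num_batches, batch_size):
    """Smallest and largest dataset size giving `num_batches` batches (drop_last=False)."""
    return (num_batches - 1) * batch_size + 1, num_batches * batch_size


print(batches_per_epoch(80, 256))      # the 80-sample mini training set
print(batches_per_epoch(50_000, 256))  # the full 50,000-image training set
print(sample_range_for(176, 256))      # dataset sizes consistent with 176 batches per epoch
```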

## Define Model

I use the ResNet-18 model in this project. ResNet, or Residual Network, was introduced in Kaiming He et al.'s "Deep Residual Learning for Image Recognition".


![Image Name](https://cdn.kesci.com/upload/image/q5x9kusfpk.png?imageView2/0/w/960/h/960)
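
The "18" in ResNet-18 counts the weighted layers on the main path. As a rough check, this sketch counts them from the stage layout. `STAGES` and `weighted_layer_count` are illustrative names, and the counting convention is the usual one, which ignores the 1x1 convolutions on projection shortcuts.

```python
# Stage layout of ResNet-18: (output channels, number of residual blocks).
STAGES = [(64, 2), (128, 2), (256, 2), (512, 2)]
CONVS_PER_BLOCK = 2  # each basic residual block stacks two 3x3 convolutions


def weighted_layer_count(stages, convs_per_block=CONVS_PER_BLOCK):
    """Count the stem conv, the main-path convs of every block, and the final FC layer."""
    stem = 1
    block_convs = sum(blocks * convs_per_block for _, blocks in stages)
    classifier = 1
    return stem + block_convs + classifier


print(weighted_layer_count(STAGES))
```

The same function gives 34 for the `[3, 4, 6, 3]` block layout of ResNet-34, which also uses two-convolution basic blocks.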

In [10]:
class ResidualBlock(nn.Module):  # basic residual block: two 3x3 convs plus a shortcut

    def __init__(self, inchannel, outchannel, stride=1):
        super(ResidualBlock, self).__init__()
        
        self.left = nn.Sequential(  # main path, following the structure shown above
            nn.Conv2d(inchannel, outchannel, kernel_size=3, stride=stride, padding=1, bias=False),  # first conv layer
            nn.BatchNorm2d(outchannel),  # batch norm on the first conv output
            nn.ReLU(inplace=True),  # ReLU activation
            nn.Conv2d(outchannel, outchannel, kernel_size=3, stride=1, padding=1, bias=False),  # second conv layer
            nn.BatchNorm2d(outchannel)  # batch norm on the second conv output
        )

        self.shortcut = nn.Sequential()  # an empty Sequential acts as the identity shortcut
        if stride != 1 or inchannel != outchannel:  # self.left(x) would differ in shape from x, so project x
            self.shortcut = nn.Sequential(
                nn.Conv2d(inchannel, outchannel, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(outchannel)
            )

    def forward(self, x):  # add the shortcut to the main path, then apply ReLU
        out = self.left(x)
        out += self.shortcut(x)
        out = F.relu(out)
        return out

class ResNet(nn.Module):
    
    def __init__(self, ResidualBlock, num_classes=10):
        super(ResNet, self).__init__()
        self.inchannel = 64
        self.conv1 = nn.Sequential(  # one 3x3 conv (stride 1) instead of the paper's 7x7 stem, which reduces params
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(),
        ) 
        self.layer1 = self.make_layer(ResidualBlock, 64,  2, stride=1)
        self.layer2 = self.make_layer(ResidualBlock, 128, 2, stride=2)
        self.layer3 = self.make_layer(ResidualBlock, 256, 2, stride=2)
        self.layer4 = self.make_layer(ResidualBlock, 512, 2, stride=2)
        self.fc = nn.Linear(512, num_classes)

    def make_layer(self, block, channels, num_blocks, stride):
        strides = [stride] + [1] * (num_blocks - 1)  # the first block takes the stage's stride
        # the remaining (num_blocks - 1) blocks use stride 1
        layers = []
        for stride in strides:  # append one block per stride
            layers.append(block(self.inchannel, channels, stride))
            self.inchannel = channels
            
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv1(x)
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        return out

def ResNet18():
    return ResNet(ResidualBlock)
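
To see why `nn.Linear(512, num_classes)` matches the flattened output, the sketch below traces the feature-map side length through the network for a 32x32 input. It uses the standard convolution output-size formula. `conv_out` and `trace_resnet18` are hypothetical helper names, and each stage is modeled by its first (strided) 3x3 convolution, because the remaining stride-1, padding-1 convolutions keep the size unchanged.

```python
def conv_out(size, kernel, stride, padding):
    """Output side length of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_resnet18(size=32):
    """Follow the feature-map side length through the layers defined above."""
    trace = [("input", size)]
    size = conv_out(size, kernel=3, stride=1, padding=1)
    trace.append(("conv1", size))
    for name, stride in [("layer1", 1), ("layer2", 2), ("layer3", 2), ("layer4", 2)]:
        # Only the first block of a stage uses the stage stride; later convs keep the size.
        size = conv_out(size, kernel=3, stride=stride, padding=1)
        trace.append((name, size))
    size = conv_out(size, kernel=4, stride=4, padding=0)  # F.avg_pool2d(out, 4)
    trace.append(("avg_pool", size))
    return trace


for name, size in trace_resnet18():
    print(f"{name:>8}: {size}x{size}")
```

The final map is 1x1 with 512 channels, so flattening yields 512 features. Under the same formula a 64x64 input would end at 2x2, giving 2048 features, which would no longer fit the `fc` layer.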

## Training and Testing

In [11]:
# use the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# define hyperparams
EPOCH = 20     # total number of training epochs
pre_epoch = 0  # epoch to start from
LR = 0.1       # learning rate

net = ResNet18().to(device)

# define loss function and optimizer
criterion = nn.CrossEntropyLoss()  # cross entropy for classification
optimizer = optim.SGD(net.parameters(), lr=LR, momentum=0.9, weight_decay=5e-4) 
# use mini-batch momentum-SGD and L2 regularization

# train model
if __name__ == "__main__":
    print("Start Training, Resnet-18!")
    
    num_iters = 0
    for epoch in range(pre_epoch, EPOCH):
        print('\nEpoch: %d' % (epoch + 1))
        net.train()  # put BatchNorm layers in training mode
        sum_loss = 0.0
        correct = 0.0
        total = 0
        for i, data in enumerate(trainloader, 0): 
            num_iters += 1
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()  # reset grad to zero

            # forward + backward
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

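            sum_loss += loss.item() * labels.size(0)  # batch mean loss times batch size
            _, predicted = torch.max(outputs, 1)  # the class with the highest score is the prediction
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

            # print loss and accuracy
            # i/20 == 0 holds only for i == 0, so this logs the first batch of each epoch
            # (i % 20 == 0 would log every 20 batches); dividing sum_loss by total gives the per-sample mean
            if i/20 == 0: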
                print('[epoch:%d, iter:%d] Loss: %.03f | Acc: %.3f%% '
                    % (epoch + 1, num_iters, sum_loss / (i + 1), 100. * correct / total))

        print("Training Finished, TotalEPOCH=%d" % EPOCH)
    
        # print("Waiting Test!")
        # with torch.no_grad():
        #     correct = 0
        #     total = 0
        #     for data in testloader:
        #         net.eval() # tell BatchNorm in eval mode
        #         images, labels = data
        #         images, labels = images.to(device), labels.to(device)
        #         outputs = net(images)
                
        #         _, predicted = torch.max(outputs.data, 1)

Start Training, Resnet-18!

Epoch: 1
[epoch:1, iter:1] Loss: 596.250 | Acc: 9.766% 
Training Finished, TotalEPOCH=20

Epoch: 2
[epoch:2, iter:177] Loss: 344.395 | Acc: 51.562% 
Training Finished, TotalEPOCH=20

Epoch: 3
[epoch:3, iter:353] Loss: 264.417 | Acc: 65.625% 
Training Finished, TotalEPOCH=20

Epoch: 4
[epoch:4, iter:529] Loss: 231.419 | Acc: 69.141% 
Training Finished, TotalEPOCH=20

Epoch: 5
[epoch:5, iter:705] Loss: 197.945 | Acc: 76.953% 
Training Finished, TotalEPOCH=20

Epoch: 6
[epoch:6, iter:881] Loss: 132.735 | Acc: 82.422% 
Training Finished, TotalEPOCH=20

Epoch: 7
[epoch:7, iter:1057] Loss: 141.950 | Acc: 81.250% 
Training Finished, TotalEPOCH=20

Epoch: 8
[epoch:8, iter:1233] Loss: 136.229 | Acc: 81.641% 
Training Finished, TotalEPOCH=20

Epoch: 9
[epoch:9, iter:1409] Loss: 132.046 | Acc: 78.906% 
Training Finished, TotalEPOCH=20

Epoch: 10
[epoch:10, iter:1585] Loss: 97.200 | Acc: 85.938% 
Training Finished, TotalEPOCH=20

Epoch: 11
[epoch:11, iter:1761] Loss: 94
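
The run above is driven by `optim.SGD` with `momentum=0.9` and `weight_decay=5e-4`. As a rough illustration of what one such update does, here is a scalar sketch of SGD with L2 weight decay and plain (non-Nesterov) momentum. `sgd_momentum_step` is a hypothetical helper operating on Python floats, not the PyTorch implementation, and the toy objective is an assumption chosen only to make the arithmetic easy to follow.

```python
def sgd_momentum_step(param, grad, velocity, lr=0.1, momentum=0.9, weight_decay=5e-4):
    """One scalar SGD step; returns the updated parameter and velocity."""
    grad = grad + weight_decay * param     # L2 penalty adds weight_decay * param to the gradient
    velocity = momentum * velocity + grad  # decaying sum of past gradients
    param = param - lr * velocity          # step against the accumulated direction
    return param, velocity


# Toy problem (an assumption for illustration): minimize f(w) = 0.5 * w**2, whose gradient is w.
w, v = 1.0, 0.0
for step in range(3):
    w, v = sgd_momentum_step(w, grad=w, velocity=v)
    print(step, w, v)
```

Because the velocity keeps growing while successive gradients point the same way, each of these first steps moves `w` further than the previous one.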

In [12]:
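## We can clearly see that the loss decreased and the accuracy improved as training ran for more epochs: between epochs 1 and 10 the logged loss fell from 596.250 to 97.200 and the accuracy rose from 9.766% to 85.938%.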
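
The logged loss values look large because the training loop accumulates `loss.item() * labels.size(0)` but divides by the batch count `i + 1`, and `i/20 == 0` is true only for `i == 0`. Each logged value is therefore the first batch's mean loss multiplied by the batch size of 256. The sketch below converts logged values back to a per-sample loss and shows a sample-weighted running mean. `per_sample_loss` and `RunningAverage` are hypothetical helpers, and the meter inputs are made-up numbers for illustration only.

```python
import math

BATCH_SIZE = 256  # batch size passed to the training DataLoader


def per_sample_loss(logged_value, batch_size=BATCH_SIZE):
    """Convert a value logged on the first batch of an epoch into a mean per-sample loss."""
    return logged_value / batch_size


class RunningAverage:
    """Sample-weighted running mean, e.g. of the cross-entropy loss over an epoch."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, batch_mean, batch_size):
        self.total += batch_mean * batch_size
        self.count += batch_size

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0


print(per_sample_loss(596.250))  # first logged value, epoch 1
print(math.log(10))              # cross-entropy of a uniform guess over 10 classes
print(per_sample_loss(97.200))   # logged value at epoch 10

meter = RunningAverage()
meter.update(batch_mean=2.0, batch_size=256)  # illustrative values, not from the log
meter.update(batch_mean=1.0, batch_size=128)  # a smaller final batch counts for less
print(meter.mean)
```

The first converted value lands close to ln(10), which is what an untrained 10-class classifier would be expected to produce.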