# CIFAR-10 Darknet

In [1]:
%matplotlib inline
%reload_ext autoreload
%autoreload 2

## Data Download

CIFAR-10 dataset website: https://www.cs.toronto.edu/~kriz/cifar.html

Direct download link for CIFAR-10 dataset in PNG format (instead of raw Python pickle format): http://files.fast.ai/data/cifar10.tgz

In [37]:
!aria2c --file-allocation=none -c -x 5 -s 5 -d data/ http://files.fast.ai/data/cifar10.tgz

[#380821 144MiB/160MiB(90%) CN:3 DL:22MiB]
07/06 16:19:14 [NOTICE] Download complete: data//cifar10.tgz

Download Results:
gid   |stat|avg speed  |path/URI
380821|OK  |    22MiB/s|data//cifar10.tgz

Status Legend:
(OK):download completed.


In [38]:
!tar -xzf data/cifar10.tgz --directory data/

**Setup directory and file paths**

In [25]:
from fastai.conv_learner import *

PATH = Path('data/cifar10/')
os.makedirs(PATH, exist_ok=True)
# torch.cuda.set_device(0)

**Compute CIFAR-10 dataset stats**

In [43]:
import torchvision
import torchvision.transforms as transforms

# ToTensor() converts an image whose elements are in the range 0-255 to the range 0-1.
# The stats below are computed on the raw uint8 array instead, hence the division by 255.
train_transform = transforms.Compose([transforms.ToTensor()])
train_set = torchvision.datasets.CIFAR10(root='./cifar10', train=True, download=True, transform=train_transform)
# train_set.train_data is a numpy ndarray of shape (50000, 32, 32, 3)
# (newer torchvision releases expose the same array as train_set.data)
# Reducing over axes (0, 1, 2) leaves one value per colour channel.
print(train_set.train_data.mean(axis=(0, 1, 2)) / 255)
print(train_set.train_data.std(axis=(0, 1, 2)) / 255)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./cifar10/cifar-10-python.tar.gz
[0.4914  0.48216 0.44653]
[0.24703 0.24349 0.26159]
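
The reduction over `axis=(0, 1, 2)` can be checked on a small synthetic array. The sketch below is illustrative only: the random `images` array stands in for the real dataset, and the names `channel_mean`, `channel_std`, and `flat` are not part of the notebook.

```python
import numpy as np

# Synthetic stand-in for CIFAR-10: 8 RGB images of 4x4 pixels, uint8 in [0, 255].
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(8, 4, 4, 3), dtype=np.uint8)

# Reduce over the batch, height and width axes; the channel axis (last) is kept.
channel_mean = images.mean(axis=(0, 1, 2)) / 255
channel_std = images.std(axis=(0, 1, 2)) / 255

# Equivalent view: flatten everything except the channel axis, then reduce over rows.
flat = images.reshape(-1, 3) / 255

print(channel_mean, channel_std)
```

Both routes give one mean and one standard deviation per colour channel, which is the shape `tfms_from_stats` expects for `stats`.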


In [45]:
%rm -rf cifar10/

**Build a network from scratch**

In [31]:
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

# these numbers are the averages and standard deviations for each channel in CIFAR10
stats = (np.array([ 0.4914 ,  0.48216,  0.44653]), np.array([ 0.24703,  0.24349,  0.26159]))

num_workers = num_cpus() // 2 # num_cpus() returned 4 on this machine
bs = 256
sz = 32

In [40]:
tfms = tfms_from_stats(stats, sz, aug_tfms=[RandomFlip()], pad=sz // 8)
data = ImageClassifierData.from_paths(PATH, val_name='test', tfms=tfms, bs=bs)
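
To make the transforms concrete, here is a rough NumPy sketch of what padding by `sz // 8`, random cropping back to `sz`, random flipping, and normalising with `stats` amount to. It is not fastai's implementation: the helper names `normalize` and `pad_crop_flip` are hypothetical, and reflection padding is an assumption about the padding mode.

```python
import numpy as np

mean = np.array([0.4914, 0.48216, 0.44653])
std = np.array([0.24703, 0.24349, 0.26159])

def normalize(img):
    # Per-channel standardisation; broadcasting applies over the last (channel) axis.
    return (img - mean) / std

def pad_crop_flip(img, pad, rng):
    # Pad height and width by `pad` pixels on each side (reflection mode is an assumption).
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode='reflect')
    # Take a random crop of the original size from the padded image.
    h, w = img.shape[:2]
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    crop = padded[top:top + h, left:left + w]
    # Flip horizontally half of the time.
    if rng.random() < 0.5:
        crop = crop[:, ::-1]
    return crop

rng = np.random.default_rng(0)
sz = 32
img = rng.random((sz, sz, 3))          # fake image with values in [0, 1)
aug = pad_crop_flip(img, sz // 8, rng)
out = normalize(aug)
print(aug.shape, out.shape)
```

With `sz = 32` the pad is 4 pixels, so each crop is shifted by up to 4 pixels in either direction while the output stays 32x32.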

**Architecture**

In [56]:
def conv_layer(ni, nf, ks=3, stride=1):
    return nn.Sequential(
        nn.Conv2d(in_channels=ni, out_channels=nf, kernel_size=ks, bias=False, stride=stride, padding=ks // 2),
        nn.BatchNorm2d(num_features=nf, momentum=0.01),
        nn.LeakyReLU(negative_slope=0.1, inplace=True)
    )
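
Setting `padding=ks // 2` keeps the spatial size unchanged at stride 1 for odd kernel sizes, and halves it at stride 2. A small check using the standard convolution output-size formula (the helper name `conv_out_size` is illustrative, not part of the notebook):

```python
def conv_out_size(size, ks, stride, padding):
    # Standard output-size formula for a convolution without dilation.
    return (size + 2 * padding - ks) // stride + 1

# 3x3 and 1x1 kernels at stride 1 preserve a 32x32 input.
same_3x3 = conv_out_size(32, ks=3, stride=1, padding=3 // 2)
same_1x1 = conv_out_size(32, ks=1, stride=1, padding=1 // 2)

# A 3x3 kernel at stride 2 halves it.
half_3x3 = conv_out_size(32, ks=3, stride=2, padding=3 // 2)

print(same_3x3, same_1x1, half_3x3)
```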

In [57]:
class ResLayer(nn.Module):
    def __init__(self, ni):
        super().__init__()
        self.conv1 = conv_layer(ni, ni // 2, ks=1)
        self.conv2 = conv_layer(ni // 2, ni, ks=3)
        
    def forward(self, x):
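        # Out-of-place add: an in-place x.add_() overwrites a tensor that autograd
        # still needs for the backward pass, which can fail on recent PyTorch versions.
        return x + self.conv2(self.conv1(x))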
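
The `ResLayer` squeezes the channels to `ni // 2` with a 1x1 convolution before expanding back to `ni` with a 3x3 convolution. One likely motivation is parameter count, which the following sketch compares against a block made of two plain 3x3 convolutions. The helper names are illustrative, and only convolution weights are counted (no bias, no batch-norm parameters).

```python
def conv_weights(ni, nf, ks):
    # Number of weights in a bias-free 2D convolution.
    return ni * nf * ks * ks

def bottleneck_weights(ni):
    # 1x1 conv (ni -> ni // 2) followed by 3x3 conv (ni // 2 -> ni), as in ResLayer.
    return conv_weights(ni, ni // 2, 1) + conv_weights(ni // 2, ni, 3)

def plain_block_weights(ni):
    # Hypothetical alternative: two 3x3 convs that keep ni channels throughout.
    return 2 * conv_weights(ni, ni, 3)

ni = 64
bottleneck = bottleneck_weights(ni)
plain = plain_block_weights(ni)
print(bottleneck, plain, plain / bottleneck)
```

For 64 channels the bottleneck uses 20,480 convolution weights against 73,728 for the plain block, about 3.6 times fewer.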

In [59]:
class Darknet(nn.Module):
    def make_group_layer(self, ch_in, num_blocks, stride=1):
        return [conv_layer(ch_in, ch_in * 2, stride=stride)
               ] + [(ResLayer(ch_in * 2)) for i in range(num_blocks)]
    
    def __init__(self, num_blocks, num_classes, nf=32):
        super().__init__()
        layers = [conv_layer(3, nf, ks=3, stride=1)]
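        for i, nb in enumerate(num_blocks):
            # Every group downsamples with stride 2, except the second one (i == 1), which keeps stride 1
            layers += self.make_group_layer(nf, nb, stride=2 - (i == 1))
            nf *= 2  # each group doubles the number of channels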
        layers += [nn.AdaptiveAvgPool2d(1), Flatten(), nn.Linear(nf, num_classes)]
        self.layers = nn.Sequential(*layers)

    def forward(self, x):
        return self.layers(x)

**Define `Darknet`**

Create 5 group layers containing 1, 2, 4, 6, and 3 ResLayers respectively, starting with 32 filters. Each group layer begins with one convolution that doubles the number of filters, followed by its ResLayers at that doubled width. The very first convolution of the network produces 32 filters, so the first group takes 32 channels in and works at 64; the following groups work at 128, 256, 512, and 1024. Nearly all of the network is made of these groups. Before them there is a single convolutional layer at the very start, and at the very end we apply our standard adaptive average pooling, flatten, and a linear layer that maps the final 1024 features to the number of classes.
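
The channel counts and feature-map sizes can be read directly off the constructor. The sketch below mirrors the loop in `Darknet.__init__` without building any layers; the function name `darknet_shapes` is hypothetical, and a 32x32 input is assumed.

```python
def darknet_shapes(num_blocks, nf=32, size=32):
    # The stem is a 3x3, stride-1 conv with padding 1, so the size is unchanged.
    rows = [('stem', 3, nf, size)]
    for i, nb in enumerate(num_blocks):
        stride = 2 - (i == 1)  # same stride rule as the Darknet constructor
        # 3x3 conv with padding 1 at the start of each group.
        size = (size + 2 * 1 - 3) // stride + 1
        rows.append((f'group {i + 1} ({nb} ResLayers)', nf, nf * 2, size))
        nf *= 2
    return rows, nf

rows, final_nf = darknet_shapes([1, 2, 4, 6, 3])
for name, ch_in, ch_out, size in rows:
    print(f'{name:24s} {ch_in:4d} -> {ch_out:4d} channels, {size}x{size}')
print('features into the linear layer:', final_nf)
```

With `nf=32` the groups output 64, 128, 256, 512, and 1024 channels, the feature map shrinks from 32x32 to 2x2 before pooling, and the final linear layer receives 1024 features.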

In [60]:
m = Darknet([1, 2, 4, 6, 3], num_classes=10, nf=32)
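# Replicate the model across GPUs 1, 2 and 3; this assumes the machine has at least four GPUs.
# Adjust the device ids, or drop this line, on a different setup.
m = nn.DataParallel(m, [1, 2, 3])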