In [2]:
!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()

In [3]:
from fastbook import *

Imagenette is a subset of ImageNet containing images from 10 categories rather than the full 1,000. The smaller dataset enables faster iteration, so you can try out more ideas.

In [4]:
from fastai.vision.all import *
path = untar_data(URLs.IMAGENETTE)

In [5]:
path.ls()

(#2) [Path('/storage/data/imagenette2/val'),Path('/storage/data/imagenette2/train')]

In [6]:
Path.BASE_PATH = path

In [7]:
path.ls()

(#2) [Path('val'),Path('train')]

## Baseline

In [8]:
datablock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                      get_items=get_image_files,
                      get_y=parent_label,
                      item_tfms=Resize(460),
                      batch_tfms=aug_transforms(size=224, min_scale=0.75))
dataloaders = datablock.dataloaders(path, bs=64)

Train a new ResNet model from scratch to serve as our baseline. Note that the `xresnet50` model as constructed here is not pretrained, so we use `fit_one_cycle` rather than the `fine_tune` method.

In [9]:
model = xresnet50(n_out=dataloaders.c)
learner = Learner(dataloaders, model, loss_func=CrossEntropyLossFlat(), metrics=accuracy)
learner.fit_one_cycle(5, 3e-3)

epoch,train_loss,valid_loss,accuracy,time
0,1.55746,2.346827,0.416729,02:19
1,1.212875,2.598495,0.447722,02:17
2,0.942469,1.197255,0.629948,02:17
3,0.718531,0.66855,0.790889,02:18
4,0.57421,0.567909,0.820388,02:17


### Normalization

In [10]:
xb, yb = dataloaders.one_batch()
xb.mean(dim=[0, 2, 3]), xb.std(dim=[0, 2, 3])

(TensorImage([0.4898, 0.4815, 0.4437], device='cuda:0'),
 TensorImage([0.2862, 0.2812, 0.3122], device='cuda:0'))

Use the `Normalize` transform and pass it the mean and standard deviation of your data so that each channel is rescaled to roughly mean 0 and standard deviation 1. fastai stores the standard ImageNet mean and std in `imagenet_stats`, which is fine to use for the Imagenette subset. If you don't pass any statistics to the `Normalize` transform, fastai will automatically calculate them from a single batch of your data.
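To make the arithmetic concrete, here is a minimal, dependency-free sketch of per-channel normalization. It is not fastai's implementation: the helper names `channel_stats` and `normalize` and the tiny two-image batch are made up for illustration, and pixels are stored as plain nested lists indexed `[image][channel][pixel]`.

```python
from statistics import fmean, pstdev

def channel_stats(batch):
    """Return per-channel (means, stds) for a batch indexed [image][channel][pixel]."""
    n_channels = len(batch[0])
    means, stds = [], []
    for c in range(n_channels):
        # Pool every pixel of channel c across all images in the batch.
        values = [p for image in batch for p in image[c]]
        means.append(fmean(values))
        # Population std; note that torch's Tensor.std defaults to the sample estimate.
        stds.append(pstdev(values))
    return means, stds

def normalize(batch, means, stds):
    """Subtract the channel mean and divide by the channel std for every pixel."""
    return [
        [[(p - means[c]) / stds[c] for p in channel] for c, channel in enumerate(image)]
        for image in batch
    ]

# Hypothetical batch: 2 images, 2 channels, 4 pixels each.
batch = [
    [[0.2, 0.4, 0.6, 0.8], [0.1, 0.3, 0.5, 0.7]],
    [[0.3, 0.5, 0.7, 0.9], [0.0, 0.2, 0.4, 0.6]],
]
means, stds = channel_stats(batch)
normalized = normalize(batch, means, stds)
new_means, new_stds = channel_stats(normalized)
```

Because the statistics here come from the batch itself, the normalized channels end up with mean 0 and std 1 up to floating-point error. Normalizing with statistics taken from a different dataset, such as `imagenet_stats`, would only bring them close to those values.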

In [11]:
def get_dataloaders(bs, aug_size):
    db = DataBlock(blocks=(ImageBlock, CategoryBlock), 
                   get_items=get_image_files, 
                   get_y=parent_label, 
                   item_tfms=Resize(460), 
                   batch_tfms=[*aug_transforms(size=aug_size, min_scale=0.75),
                               Normalize.from_stats(*imagenet_stats)]
                  )
    return db.dataloaders(path, bs=bs)

In [12]:
dataloaders = get_dataloaders(64, 224)

In [13]:
xb, yb = dataloaders.one_batch()
xb.mean(dim=[0, 2, 3]), xb.std(dim=[0, 2, 3])

(TensorImage([-0.0057,  0.0638,  0.1516], device='cuda:0'),
 TensorImage([1.2392, 1.2368, 1.3517], device='cuda:0'))

Let's see what effect this had on training.

In [15]:
model = xresnet50(n_out=dataloaders.c)
learner = Learner(dataloaders, model, loss_func=CrossEntropyLossFlat(), metrics=accuracy)
learner.fit_one_cycle(5, 3e-3)

epoch,train_loss,valid_loss,accuracy,time
0,1.571293,2.165836,0.398432,02:18
1,1.255483,1.439283,0.576176,02:18
2,0.970735,0.904502,0.714713,02:18
3,0.717651,0.785193,0.752427,02:18
4,0.599995,0.602664,0.81068,02:18


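The effect of normalization is not very apparent here: the final accuracy (0.811) is about the same as the baseline (0.820). When fine-tuning pretrained models, however, it is important to normalize the new data with the same statistics that were used to train the original model, because the model has only ever seen inputs scaled that way.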
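A toy sketch of why matching statistics matter for a pretrained model. Everything in it is hypothetical: `pretrained_model` is a one-line stand-in for a network whose decision boundary was learned on normalized inputs, and `TRAIN_MEAN` / `TRAIN_STD` are invented values rather than real dataset statistics.

```python
# Hypothetical statistics the "pretrained" model was trained with.
TRAIN_MEAN, TRAIN_STD = 0.45, 0.25

def pretrained_model(x):
    """Toy stand-in for a pretrained model: its boundary sits at 0 in normalized space."""
    return "bright" if x > 0 else "dark"

def normalize(pixel, mean=TRAIN_MEAN, std=TRAIN_STD):
    """Apply the same normalization that was used when the model was trained."""
    return (pixel - mean) / std

raw_pixels = [0.10, 0.30, 0.60, 0.90]

# Inputs normalized with the training statistics land on both sides of the boundary.
with_norm = [pretrained_model(normalize(p)) for p in raw_pixels]

# Raw inputs are all positive, so the model's learned boundary no longer separates them.
without_norm = [pretrained_model(p) for p in raw_pixels]
```

With the training statistics applied, the two darker pixels are classified as `dark` and the two brighter ones as `bright`. Without them, every pixel is classified as `bright`, even though the model itself has not changed.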

### Progressive Resizing

Start by training the model on small, low-resolution images (128 px here). This can be considered pretraining the model before fine-tuning on larger images.

In [None]:
dataloaders = get_dataloaders(128, 128) # Arguments are bs, aug_size
learner = Learner(dataloaders, xresnet50(n_out=dataloaders.c), loss_func=CrossEntropyLossFlat(), metrics=accuracy)
learner.fit_one_cycle(4, 3e-3)

epoch,train_loss,valid_loss,accuracy,time
0,1.622354,2.205114,0.449216,00:55
1,1.263802,2.014621,0.437267,00:54


Now replace the `DataLoaders` inside the `Learner` with one that serves larger (224 px) images, and fine-tune the model that was trained on the small images.

In [18]:
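# Learner keeps its DataLoaders in the `dls` attribute; assigning to
# `learner.dataloaders` would only create a new, unused attribute.
learner.dls = get_dataloaders(64, 224) # Arguments are bs, aug_size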
learner.fine_tune(5, 1e-3)

epoch,train_loss,valid_loss,accuracy,time
0,0.619863,0.769145,0.754668,00:54
1,0.61755,0.736148,0.76699,00:54
2,0.57333,0.652837,0.789022,00:53
3,0.509342,0.567767,0.833831,00:53
4,0.461549,0.545268,0.833831,00:53


You can repeat this with progressively larger image sizes, but you will not get any benefit from using an image size larger than the size of your images on disk.
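One way to plan several resizing stages while respecting that limit is sketched below. The helper `cap_stages`, the stage list, and `disk_size=320` are all assumptions made for illustration; they are not part of fastai or of this notebook.

```python
def cap_stages(stages, disk_size):
    """Cap each stage's image size at disk_size and drop stages that add no new size.

    `stages` is a list of (batch_size, image_size) pairs, ordered smallest size first.
    """
    planned, seen = [], set()
    for bs, size in stages:
        size = min(size, disk_size)   # never train above the on-disk resolution
        if size not in seen:          # skip stages that would repeat a size
            seen.add(size)
            planned.append((bs, size))
    return planned

# Hypothetical schedule; the last stage exceeds the assumed on-disk size of 320 px.
stages = cap_stages([(128, 128), (64, 224), (32, 320), (16, 448)], disk_size=320)
```

Each remaining `(bs, size)` pair would then correspond to one call to `get_dataloaders(bs, size)` followed by further training, as in the two stages shown above. The 448 px stage is dropped because it would only repeat the 320 px stage.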