# Transfer learning

In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
from fastai.vision import *
from fastai.metrics import error_rate
from pathlib import Path

In [3]:
torch.cuda.is_available()

True

# What is transfer learning?

- Transfer learning is one of the techniques we will use to reduce the amount of training data we need.
- Intuitively, we can think of transfer learning as reusing what a network has learned on one dataset to help with a different dataset.
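
To make this intuition concrete, here is a toy, pure-Python sketch; it is not what fastai does internally. A fixed "body" stands in for layers learned on an earlier dataset, and only a small new "head" is trained on a tiny invented task. All names, weights and data here (`BODY_WEIGHTS`, `body`, `train_head`, `xs`, `ys`) are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical "pretrained body": fixed weights standing in for layers
# that were learned on some other dataset. They are never updated below.
BODY_WEIGHTS = ((1.0, -1.0), (-1.0, 1.0))

def body(x):
    """Frozen feature extractor: one linear layer followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in BODY_WEIGHTS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(features, labels, epochs=200, lr=0.1):
    """Fit a new logistic-regression head on the frozen features with plain SGD."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            grad = p - y  # gradient of the log loss w.r.t. the pre-activation
            w = [wi - lr * grad * fi for wi, fi in zip(w, f)]
            b -= lr * grad
    return w, b

def predict(x, w, b):
    """Run the frozen body, then the trained head."""
    z = sum(wi * fi for wi, fi in zip(w, body(x))) + b
    return 1 if z > 0 else 0

# Tiny made-up "new task": the label is 1 when the first value is larger.
xs = [(1.0, 0.0), (0.0, 1.0), (2.0, 0.5), (0.5, 2.0), (3.0, 1.0), (1.0, 3.0)]
ys = [1, 0, 1, 0, 1, 0]

w, b = train_head([body(x) for x in xs], ys)
accuracy = sum(predict(x, w, b) == y for x, y in zip(xs, ys)) / len(xs)
print(accuracy)
```

Only `w` and `b` change during training. The body is reused as-is, which is why so few examples are enough in this toy setting.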

# Does it help?

In [4]:
# Set random seed to aid with reproducibility
np.random.seed(42)
# Batch size
bs = 64
# Image size: images are resized to sz x sz pixels
sz = 128

In [5]:
path = Path('data')
path.ls()

[PosixPath('data/__MACOSX'),
 PosixPath('data/Sample_OS_town_plans_London_1890s_compressed.zip'),
 PosixPath('data/railway_training_compressed'),
 PosixPath('data/.ipynb_checkpoints'),
 PosixPath('data/railway_training_compressed.zip'),
 PosixPath('data/Sample_OS_town_plans_London_1890s_compressed')]

In [6]:
data = (ImageDataBunch.from_folder(path/'railway_training_compressed', train='.', valid_pct=0.2, size=sz, bs=bs, num_workers=4)
        .normalize(imagenet_stats))
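
The `.normalize(imagenet_stats)` call matters for the comparison that follows: a network pre-trained on ImageNet expects inputs scaled the same way as the images it was trained on. Below is a minimal sketch of the per-channel arithmetic, assuming pixel values already scaled to the range 0-1. The helper name `normalize_pixel` is hypothetical; fastai applies the same idea to whole batches of tensors.

```python
# Per-channel ImageNet mean and standard deviation (RGB order),
# the values behind fastai's imagenet_stats.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Normalise one RGB pixel whose values are already in the range 0-1."""
    return tuple((v - m) / s for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))

# A mid-grey pixel ends up close to zero in every channel.
mid_grey = normalize_pixel((0.5, 0.5, 0.5))
print(mid_grey)
```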

In [7]:
data

ImageDataBunch;

Train: LabelList (511 items)
x: ImageList
Image (3, 128, 128),Image (3, 128, 128),Image (3, 128, 128),Image (3, 128, 128),Image (3, 128, 128)
y: CategoryList
02_no_rail,02_no_rail,02_no_rail,02_no_rail,02_no_rail
Path: data/railway_training_compressed;

Valid: LabelList (127 items)
x: ImageList
Image (3, 128, 128),Image (3, 128, 128),Image (3, 128, 128),Image (3, 128, 128),Image (3, 128, 128)
y: CategoryList
02_no_rail,02_no_rail,02_no_rail,01_rail,01_rail
Path: data/railway_training_compressed;

Test: None
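
The 511/127 split shown above follows from `valid_pct=0.2`. As a quick check of the arithmetic, the sketch below assumes the validation count is truncated to a whole number. That rounding behaviour is an assumption, not something the output itself shows.

```python
n_images = 511 + 127   # train + valid counts reported in the output above
valid_pct = 0.2

# Assumption: the validation count is truncated rather than rounded.
n_valid = int(valid_pct * n_images)
n_train = n_images - n_valid
print(n_train, n_valid)
```

With only 127 validation images, each misclassified image moves the error rate by a little under one percentage point, which is worth keeping in mind when comparing runs.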

# Without transfer learning 

In [8]:
learn_no_pretrain = cnn_learner(data, models.resnet18, pretrained=False, metrics=error_rate)

In [9]:
learn_no_pretrain.fit_one_cycle(5)

epoch,train_loss,valid_loss,error_rate,time
0,0.8857,0.727551,0.464567,04:23
1,0.687684,0.772861,0.503937,04:22
2,0.544224,1.278929,0.480315,04:39
3,0.428904,1.229508,0.503937,04:30
4,0.336917,0.931798,0.448819,04:22


# With transfer learning

In [10]:
learn_pre_trained = cnn_learner(data, models.resnet18, pretrained=True, metrics=error_rate)

In [11]:
learn_pre_trained.fit_one_cycle(5)

epoch,train_loss,valid_loss,error_rate,time
0,0.949502,0.687832,0.440945,04:21
1,0.806088,0.731663,0.393701,04:21
2,0.677591,0.639432,0.299213,04:25
3,0.597312,0.605,0.23622,04:23
4,0.537828,0.594873,0.259843,04:23
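
To put the two result tables side by side, the sketch below converts the final-epoch error rates into counts of misclassified validation images. The numbers are copied from the tables above; the variable names are only illustrative.

```python
n_valid = 127  # size of the validation set reported earlier

# Final-epoch (epoch 4) error rates copied from the two tables above.
final_error = {"no_pretrain": 0.448819, "pretrained": 0.259843}

# error_rate is the fraction of validation images classified wrongly.
wrong_counts = {name: round(err * n_valid) for name, err in final_error.items()}

for name, err in final_error.items():
    print(f"{name}: {wrong_counts[name]} of {n_valid} wrong, accuracy {1 - err:.1%}")

relative_reduction = 1 - final_error["pretrained"] / final_error["no_pretrain"]
print(f"relative error reduction: {relative_reduction:.0%}")
```

In these runs, pre-training takes the model from 57 to 33 misclassified images out of 127. This is a single random split of a small dataset, so the exact gap may vary between runs.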


# Transfer learning 

We've looked at the impact of switching `pretrained` on or off, but what does this option actually do?

## ImageNet

- The `pretrained` option controls whether the CNN we create starts from weights that were pre-trained on ImageNet.

### What is ImageNet and what is the network pre-trained to do?
- '[ImageNet](http://www.image-net.org/) is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images.' [Source](http://www.image-net.org/)

- 'Based on statistics about the dataset recorded on the ImageNet homepage, there are a little more than 14 million images in the dataset, a little more than 21 thousand groups or classes (synsets), and a little more than 1 million images that have bounding box annotations (e.g. boxes around identified objects in the images). The photographs were annotated by humans using crowdsourcing platforms such as Amazon’s Mechanical Turk.' [Source](https://machinelearningmastery.com/introduction-to-the-imagenet-large-scale-visual-recognition-challenge-ilsvrc/)

- Our network has been pre-trained to classify images into categories, e.g. dog, cat, train, car, flower...


In [15]:
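# Note: `summary` is referenced without parentheses, so the notebook displays the
# bound method's repr, which includes the model architecture.
# Calling `learn_pre_trained.summary()` returns a layer-by-layer summary instead.
learn_pre_trained.summary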

<bound method model_summary of Learner(data=ImageDataBunch;

Train: LabelList (511 items)
x: ImageList
Image (3, 128, 128),Image (3, 128, 128),Image (3, 128, 128),Image (3, 128, 128),Image (3, 128, 128)
y: CategoryList
02_no_rail,02_no_rail,02_no_rail,02_no_rail,02_no_rail
Path: data/railway_training_compressed;

Valid: LabelList (127 items)
x: ImageList
Image (3, 128, 128),Image (3, 128, 128),Image (3, 128, 128),Image (3, 128, 128),Image (3, 128, 128)
y: CategoryList
02_no_rail,02_no_rail,02_no_rail,01_rail,01_rail
Path: data/railway_training_compressed;

Test: None, model=Sequential(
  (0): Sequential(
    (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace)
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (4): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1,

# What is the network learning?
- It might seem strange that ImageNet can help us find train tracks or buildings in historic OS maps, which the network has never 'seen' before.
- To explain this, we need to develop some intuition about what a network is 'learning'.
- Let's look through a discussion of this [here](https://github.com/hiromis/notes/blob/master/Lesson1.md)

# Benefits of using transfer learning
- The training dataset can be much smaller
- Improved accuracy (in the runs above, the error rate after 5 epochs was about 0.26 with pre-training versus about 0.45 without)
- Reduced training cost
- Faster convergence
- [Reduced environmental costs](https://www.technologyreview.com/s/613630/training-a-single-ai-model-can-emit-as-much-carbon-as-five-cars-in-their-lifetimes/)

# Transfer learning and digital humanities? 
- Quantify the impact of transfer learning for DH
- We might want to think more closely about what impact transfer learning could have in a DH context.

# NLP and transfer learning (sidenote)
- Transfer learning has recently become much more widely used in NLP tasks, particularly with language models.
- Language models trained on a general corpus are fine-tuned on a domain-specific corpus.
- These language models can then be used in downstream tasks such as text classification, NER, etc.
- These approaches mean you can often get good accuracy from a classification model with fewer than 200 labelled examples.