# Fine-tuning a pre-trained Convolutional Network

A practical way to train a convolutional network is to fine-tune an existing pretrained network instead of training from scratch. This works well when the pretrained network's original classification task overlaps reasonably with the new one.

We will show how to fine-tune an existing network trained on the ImageNet classification task so that it instead classifies the "styles" of Flickr images, using a Flickr style dataset. The dataset is already available on the Amazon EC2 instance, but if you're running this code elsewhere, you can download the pretrained CaffeNet weights and assemble the dataset with scripts bundled with Caffe:
```
scripts/download_model_binary.py models/bvlc_reference_caffenet
python examples/finetune_flickr_style/assemble_data.py --workers=-1 --images=2000 --seed=1701 --label=5
```

The dataset consists of Flickr photos, each labeled according to its style:
![](./finetuning_example.png)

In [1]:
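```python
caffe_root = '/home/ubuntu/caffe/'  # path to your Caffe checkout
import sys
sys.path.insert(0, caffe_root + 'python')  # make pycaffe importable
import os
os.chdir(caffe_root)  # the relative model/data paths below assume the Caffe root

import caffe
import numpy as np
from pylab import *  # provides arange, subplot, rcParams used for plotting
%matplotlib inline
```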

Let's look at how the fine-tuning network differs from the original CaffeNet model.

In [3]:
```
!diff models/bvlc_reference_caffenet/train_val.prototxt models/finetune_flickr_style/train_val.prototxt
```

```
1c1
< name: "CaffeNet"
---
> name: "FlickrStyleCaffeNet"
4c4
<   type: "Data"
---
>   type: "ImageData"
15,26c15,19
< # mean pixel / channel-wise mean instead of mean image
< #  transform_param {
< #    crop_size: 227
< #    mean_value: 104
< #    mean_value: 117
< #    mean_value: 123
< #    mirror: true
< #  }
<   data_param {
<     source: "examples/imagenet/ilsvrc12_train_lmdb"
<     batch_size: 256
<     backend: LMDB
---
>   image_data_param {
>     source: "data/flickr_style/train.txt"
>     batch_size: 50
>     new_height: 256
>     new_width: 256
31c24
<   type: "Data"
---
>   type: "ImageData"
42,51c35,36
< # mean pixel / channel-wise mean instead of mean image
< #  transform_param {
< #    crop_size: 227
< #    mean_value: 104
< #    mean_value: 117
< #    mean_value: 123
< #    mirror: true
< #  }
<   data_param {
<     source: "examples/imagenet/ilsvrc12_val_lmdb"
---
>   image_data_param {
>     source: "data/flickr_style/test.t
```
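
The main change in the diff is the data layer: instead of reading an LMDB database, the `ImageData` layer reads images directly from disk using a plain-text list file (the `source`, e.g. `data/flickr_style/train.txt`) with one `<image path> <integer label>` pair per line, and resizes each image to `new_height` by `new_width`. The sketch below writes and parses such a list file; the helpers `write_image_list` and `read_image_list` and the sample paths are hypothetical, and in this example the real list files are produced by `assemble_data.py`.

```python
import os
import tempfile


def write_image_list(path, samples):
    """Write (image_path, label) pairs, one "<image path> <integer label>" per line."""
    with open(path, "w") as f:
        for image_path, label in samples:
            f.write("%s %d\n" % (image_path, label))


def read_image_list(path):
    """Parse a list file back into (image_path, label) tuples."""
    samples = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # The label is the last whitespace-separated field on the line.
            image_path, label = line.rsplit(None, 1)
            samples.append((image_path, int(label)))
    return samples


if __name__ == "__main__":
    # Hypothetical sample entries; real paths come from assemble_data.py.
    samples = [("images/0001.jpg", 0), ("images/0002.jpg", 3)]
    list_path = os.path.join(tempfile.mkdtemp(), "train.txt")
    write_image_list(list_path, samples)
    print(read_image_list(list_path))
```

Splitting on the last whitespace keeps the label parse independent of how deep the image paths are.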

For the record, if you want to train the network using only Caffe's C++ command-line tool, here is the command:

```
build/tools/caffe train \
    -solver models/finetune_flickr_style/solver.prototxt \
    -weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel \
    -gpu 0
```
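
If you would rather launch that same command-line run from a Python session, one option is to assemble the argument list and hand it to `subprocess`. This is a minimal sketch: `build_train_command` and `launch` are hypothetical helpers, it assumes the working directory is the Caffe root, and it uses only the three flags shown in the command above.

```python
import subprocess


def build_train_command(solver, weights, gpu=0, caffe_bin="build/tools/caffe"):
    """Return the argument list for `caffe train`, mirroring the shell command."""
    return [
        caffe_bin, "train",
        "-solver", solver,
        "-weights", weights,  # pretrained weights to fine-tune from
        "-gpu", str(gpu),
    ]


def launch(cmd):
    """Run the command and block until it exits; raises CalledProcessError on failure."""
    subprocess.check_call(cmd)


cmd = build_train_command(
    "models/finetune_flickr_style/solver.prototxt",
    "models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel",
)
print(" ".join(cmd))
# launch(cmd)  # uncomment to start training (run from the Caffe root)
```

Passing a list rather than a single shell string avoids quoting problems with paths.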

However, in this example we will train from Python, using the pycaffe interface.

In [4]:
```python
niter = 400
test_interval = 25
# losses will also be stored in the log
train_loss = np.zeros(niter)
scratch_train_loss = np.zeros(niter)
test_acc = np.zeros(int(np.ceil(niter / float(test_interval))))

caffe.set_device(0)
caffe.set_mode_gpu()
# Create a solver that fine-tunes from the pretrained CaffeNet weights.
solver = caffe.SGDSolver('models/finetune_flickr_style/solver.prototxt')
solver.net.copy_from('models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel')
# For reference, also create a solver that does no fine-tuning (no copy_from).
scratch_solver = caffe.SGDSolver('models/finetune_flickr_style/solver.prototxt')

# Run both solvers for niter iterations and record the training loss.
for it in range(niter):
    solver.step(1)  # one SGD iteration by Caffe
    scratch_solver.step(1)
    # store the train loss
    train_loss[it] = solver.net.blobs['loss'].data
    scratch_train_loss[it] = scratch_solver.net.blobs['loss'].data
    if it % 10 == 0:
        print('iter %d, finetune_loss=%f, scratch_loss=%f'
              % (it, train_loss[it], scratch_train_loss[it]))
    if it % test_interval == 0:
        print('Iteration %d, testing...' % it)
        correct = 0
        for test_it in range(100):
            solver.test_nets[0].forward()
            correct += np.sum(solver.test_nets[0].blobs['fc8_flickr'].data.argmax(1)
                              == solver.test_nets[0].blobs['label'].data)
        # 100 test batches; dividing by 5000 assumes a test batch size of 50
        test_acc[it // test_interval] = correct / 5000.
print('done')
```

```
iter 0, finetune_loss=3.360094, scratch_loss=3.136188
iter 10, finetune_loss=2.672608, scratch_loss=9.736364
iter 20, finetune_loss=2.071996, scratch_loss=2.250404
iter 30, finetune_loss=1.758295, scratch_loss=2.049553
iter 40, finetune_loss=1.533391, scratch_loss=1.941318
iter 50, finetune_loss=1.561658, scratch_loss=1.839706
iter 60, finetune_loss=1.461696, scratch_loss=1.880035
iter 70, finetune_loss=1.267941, scratch_loss=1.719161
iter 80, finetune_loss=1.192778, scratch_loss=1.627453
iter 90, finetune_loss=1.541176, scratch_loss=1.822061
iter 100, finetune_loss=1.029039, scratch_loss=1.654087
iter 110, finetune_loss=1.138547, scratch_loss=1.735837
iter 120, finetune_loss=0.917412, scratch_loss=1.851918
iter 130, finetune_loss=0.971519, scratch_loss=1.801927
iter 140, finetune_loss=0.868252, scratch_loss=1.745545
iter 150, finetune_loss=0.790020, scratch_loss=1.844925
iter 160, finetune_loss=1.092668, scratch_loss=1.695591
iter 170, finetune_loss=1.055344, scratch_loss=1.661715
ite
```
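
Why does the fine-tuned network start so far ahead? `solver.net.copy_from(...)` copies pretrained parameters layer by layer, matching on layer name. Layers present only in the `.caffemodel` are ignored, and layers present only in the new net keep their fresh initialization; presumably that is the case for the new output layer read as `fc8_flickr`, since CaffeNet's own output layer is `fc8`. The scratch solver never calls `copy_from`, so every one of its layers starts from random values. The sketch below mimics that name-matching rule with plain NumPy dictionaries. It illustrates the idea and is not pycaffe's implementation; the `copy_from` function here is hypothetical, and the toy layer names and sizes are made up.

```python
import numpy as np


def copy_from(target, source):
    """Copy parameters from `source` into `target` for layers whose names match.

    Both arguments map layer name -> list of parameter arrays (weights, bias).
    Layers only in `source` are ignored; layers only in `target` keep their
    current (freshly initialized) values. Matching names with mismatched
    shapes are treated as an error.
    """
    copied, ignored = [], []
    for name, params in source.items():
        if name not in target:
            ignored.append(name)
            continue
        for dst, src in zip(target[name], params):
            if dst.shape != src.shape:
                raise ValueError("shape mismatch in layer %r" % name)
            dst[...] = src  # in-place copy into the existing array
        copied.append(name)
    return copied, ignored


rng = np.random.RandomState(0)
# Toy "pretrained" net with a 1000-way output layer named fc8.
pretrained = {"fc7": [np.ones((4, 8)), np.ones(4)],
              "fc8": [np.ones((1000, 4)), np.ones(1000)]}
# Toy fine-tuning net: same fc7, plus a renamed, smaller output layer.
finetune = {"fc7": [rng.randn(4, 8), np.zeros(4)],
            "fc8_flickr": [0.01 * rng.randn(3, 4), np.zeros(3)]}

copied, ignored = copy_from(finetune, pretrained)
print(copied, ignored)
```

Renaming the output layer is what lets it change shape: with the old name, the shapes would clash and the copy would fail.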

In [None]:
```python
# plot training loss and test accuracy
rcParams['figure.figsize'] = (20, 7)
pl1 = subplot(1, 2, 1)
pl1.plot(arange(niter), train_loss)
pl1.set_ylabel('train_loss')
pl1.set_xlabel('iteration')
pl2 = subplot(1, 2, 2)
pl2.plot(test_interval * arange(len(test_acc)), test_acc, 'r')
pl2.set_ylabel('test_accuracy')
pl2.set_xlabel('iteration')
```
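
The test accuracy recorded during training is top-1 accuracy: the fraction of test images whose highest-scoring class (the `argmax` over the `fc8_flickr` outputs) equals the label, accumulated over 100 test batches. The loop divides by a hard-coded 5000, which is only correct if every test batch holds 50 images. Here is a small NumPy sketch that divides by the number of samples actually seen instead; `batch_accuracy` and the toy arrays are hypothetical.

```python
import numpy as np


def batch_accuracy(score_batches, label_batches):
    """Top-1 accuracy over a sequence of (scores, labels) batches.

    scores: (batch, num_classes) array of class scores.
    labels: (batch,) array of class indices (Caffe stores labels as floats).
    """
    correct = 0
    total = 0
    for scores, labels in zip(score_batches, label_batches):
        predictions = scores.argmax(axis=1)  # highest-scoring class per image
        correct += int(np.sum(predictions == labels.astype(int)))
        total += labels.shape[0]
    return correct / float(total)


scores = [np.array([[0.1, 0.9], [0.8, 0.2]]), np.array([[0.3, 0.7]])]
labels = [np.array([1.0, 1.0]), np.array([1.0])]
print(batch_accuracy(scores, labels))
```

Counting samples rather than assuming a batch size keeps the number honest if the test batch size in the prototxt changes.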

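Huzzah! Fine-tuning works: in the training log, the fine-tuned network's loss is roughly 0.8 to 1.1 over iterations 120 to 170, while the network trained from scratch stays at roughly 1.7 to 1.9. Let's take a look at what kind of results we can get with a longer, more complete training run on the style recognition dataset. Note: the URL below may occasionally be down because it is hosted on a research machine.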

http://demo.vislab.berkeleyvision.org/