# Training Networks for Segmentation

This example explains how to train networks that compute image segmentations, i.e. networks that classify each individual pixel of an input image. Although this task differs from (whole-)image classification, similar architectures can be adapted to it.
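Concretely, a segmentation network outputs a score map with one channel per class (for PASCAL VOC, 21 channels: 20 object classes plus background), and the predicted segmentation is the per-pixel argmax over that channel axis. The following is a minimal NumPy sketch of that last step; the helper `scores_to_labels` and the toy 3-class score map are hypothetical and not part of this example's code.

```python
import numpy as np

def scores_to_labels(scores):
    """Turn a (num_classes, H, W) score map into an (H, W) label map.

    Each pixel independently gets the class with the highest score there.
    """
    if scores.ndim != 3:
        raise ValueError("expected a (num_classes, H, W) array")
    return scores.argmax(axis=0)

# Toy score map: 3 classes on a 2x2 image.
scores = np.zeros((3, 2, 2))
scores[0, 0, 0] = 1.0   # top-left pixel: class 0 scores highest
scores[1, 0, 1] = 1.0   # top-right pixel: class 1 scores highest
scores[2, 1, :] = 1.0   # bottom row: class 2 scores highest

labels = scores_to_labels(scores)
print(labels.tolist())  # [[0, 1], [2, 2]]
```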

This example explains how to train the FCN-32s network on the PASCAL VOC dataset. As in the previous example, we start from a pre-trained network, which gives the following outline:

1. Downloading Datasets
2. Convolutional conversion of Pre-Trained Model
3. Constructing and Training Net

In [1]:
caffe_root = '../'  # this file should be run from {caffe_root}/examples (otherwise change this line)

import sys
#sys.path.insert(0, caffe_root + 'python')  # uncomment if pycaffe is not already on your PYTHONPATH
import caffe

import numpy as np
import matplotlib.pyplot as plt

# 1. Downloading Datasets

Networks trained for the PASCAL VOC challenge usually also use the supplementary segmentations of the SDS dataset, so we have to download both. This is done by the `data/pascal/get_voc2011.sh` and `data/sbdd/get_sbdd.sh` scripts.

In [2]:
# run scripts from caffe root
import os
os.chdir(caffe_root)

# Download data
!data/pascal/get_voc2011.sh
!data/sbdd/get_sbdd.sh

# back to examples
os.chdir('examples')

Downloading...
Downloading...


# 2. Convolutional conversion of Pre-Trained Model

When training a segmentation network, we want to use the weights of a pre-trained VGG-16 model. To do segmentation, the (fully connected) inner product layers have to be cast as convolutional layers. This conversion does not happen automatically, so let's first create a model file with only convolutional layers, which we can then import easily. For this we first have to download the model weights.

In [3]:
# We are currently in the examples folder: move to segmentation/ and download
# the original VGG-16 weights (VGG_ILSVRC_16_layers.caffemodel)
import os
os.chdir("segmentation")
!./get_vgg16.sh

Downloading...


In [4]:
#With the weights in place we can now perform the network surgery
from convert_vgg import convert_net
convert_net("VGG_ILSVRC_16_layers.caffemodel","vgg16fc.caffemodel")

Copying shared layer conv1_1
Copying shared layer conv1_2
Copying shared layer conv2_1
Copying shared layer conv2_2
Copying shared layer conv3_1
Copying shared layer conv3_2
Copying shared layer conv3_3
Copying shared layer conv4_1
Copying shared layer conv4_2
Copying shared layer conv4_3
Copying shared layer conv5_1
Copying shared layer conv5_2
Copying shared layer conv5_3
(source) fc6 weights are (4096, 25088) dimensional and biases are (4096,) dimensional
(destn.) fc6 weights are (4096, 512, 7, 7) dimensional and biases are (4096,) dimensional
(source) fc7 weights are (4096, 4096) dimensional and biases are (4096,) dimensional
(destn.) fc7 weights are (4096, 4096, 1, 1) dimensional and biases are (4096,) dimensional
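The printed shapes show that the conversion only reshapes the weights: fc6 goes from (4096, 25088) to (4096, 512, 7, 7), since 512 × 7 × 7 = 25088, and fc7 from (4096, 4096) to (4096, 4096, 1, 1). The NumPy sketch below is not the `convert_vgg` code; it illustrates why a plain reshape is enough, using small stand-in sizes and assuming Caffe's row-major (C-order) blob layout. On an input of exactly the fc layer's size the convolution gives the same result at a single position, and on a larger input it yields a grid of predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small stand-ins for fc6: C input channels, KxK spatial extent, N outputs
# (in VGG-16: C=512, K=7, N=4096).
C, K, N = 4, 3, 5
fc_w = rng.standard_normal((N, C * K * K))   # inner-product weights
fc_b = rng.standard_normal(N)                # biases are unchanged
x = rng.standard_normal((C, K, K))           # one pool5-like input volume

# Fully connected layer: flatten the input in C order, then matrix-multiply.
fc_out = fc_w @ x.reshape(-1) + fc_b

# Convolutional version: the very same numbers, reshaped to (N, C, K, K).
conv_w = fc_w.reshape(N, C, K, K)

def conv_valid(inp, w, b):
    """Naive 'valid' cross-correlation (stride 1, no padding), like Caffe's conv."""
    n, _, k, _ = w.shape
    h = inp.shape[1] - k + 1
    wd = inp.shape[2] - k + 1
    out = np.empty((n, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = inp[:, i:i + k, j:j + k]
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return out

conv_out = conv_valid(x, conv_w, fc_b)
print(conv_out.shape)                          # (5, 1, 1)
print(np.allclose(conv_out[:, 0, 0], fc_out))  # True

# A larger input now produces a spatial map of outputs instead of an error.
big = rng.standard_normal((C, K + 2, K + 2))
print(conv_valid(big, conv_w, fc_b).shape)     # (5, 3, 3)
```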


# 3. Constructing and Training Net

With the convolutional version of VGG-16 in place we can start training, but first we have to create the network definition.

In [5]:
# create the train.prototxt and val.prototxt files, similar to the previous example
!python2 net.py

# make the snapshot directory (-p: no error if it already exists)
!mkdir -p snapshot


This new network contains a deconvolutional layer that performs upsampling to counteract the subsampling performed by the pooling layers, so that the output segmentation has the same size as the input image. In this example the upsampling is by a factor of 32, hence the name FCN-32s. This upsampling layer needs to be initialized with weights that make it perform bilinear upsampling.
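The factor 32 comes from VGG-16's five 2×2 max-pooling layers, which subsample by 2^5 = 32. In this example the bilinear initialization is done by `surgery.interp`; the NumPy sketch below is our own illustration of the idea, not that code. The helper names `bilinear_kernel` and `bilinear_deconv_weights` are hypothetical, and we assume a stride equal to the factor and a kernel size of 2·factor − (factor mod 2), i.e. 64 for factor 32. Only the diagonal entries (channel c to channel c) receive the kernel, so the class scores are not mixed during upsampling.

```python
import numpy as np

def bilinear_kernel(size):
    """2D bilinear interpolation kernel of shape (size, size)."""
    factor = (size + 1) // 2
    # Kernel center: an integer position for odd sizes, between two pixels for even.
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    rows, cols = np.ogrid[:size, :size]
    return ((1 - np.abs(rows - center) / factor) *
            (1 - np.abs(cols - center) / factor))

def bilinear_deconv_weights(num_channels, factor):
    """(C, C, k, k) deconvolution weights doing per-channel bilinear upsampling.

    Assumes the deconvolution uses stride == factor.
    """
    size = 2 * factor - factor % 2
    w = np.zeros((num_channels, num_channels, size, size))
    kernel = bilinear_kernel(size)
    for c in range(num_channels):
        w[c, c] = kernel  # channel c only feeds output channel c
    return w

w = bilinear_deconv_weights(num_channels=21, factor=32)
print(w.shape)  # (21, 21, 64, 64)

# Small case: outer product of [0.25, 0.75, 0.75, 0.25] with itself.
print(bilinear_kernel(4))
```

With stride 32, the overlapping copies of the 64×64 kernel sum to one at every output pixel away from the borders, so a constant score map stays constant after upsampling, as bilinear interpolation should.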

Warning: When training the FCN-16s and FCN-8s networks, initialize them with the weights of a *trained* FCN-32s net, not with the VGG-16 model.

In [6]:
weights = 'vgg16fc.caffemodel'

caffe.set_device(0)
caffe.set_mode_gpu()

solver = caffe.SGDSolver('solver.prototxt')
solver.net.copy_from(weights)

# surgery: initialize the upsampling ('up') layers with bilinear weights
import surgery
interp_layers = [k for k in solver.net.params.keys() if 'up' in k]
surgery.interp(solver.net, interp_layers)

With the network correctly initialized, we can now start training. During training we want to evaluate segmentation performance, which is done with a dedicated scoring function. Scoring in Python means we do not have to set `batch_size` and `test_iter` to match the size of the test set, but it requires running the training loop from Python.
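The `score.seg_tests` helper ships with this example and is not reproduced here. As a hedged illustration of the kind of metric such scoring reports, here is a NumPy sketch of pixel accuracy and mean intersection-over-union (IU) computed from a confusion histogram. The function names are hypothetical, and we assume the PASCAL VOC convention that ground-truth label 255 marks "void" pixels, which are ignored.

```python
import numpy as np

def confusion_histogram(gt, pred, num_classes):
    """(n, n) confusion matrix: rows = ground truth, columns = prediction.

    Ground-truth labels outside [0, num_classes), such as 255 ('void'),
    are ignored.
    """
    valid = (gt >= 0) & (gt < num_classes)
    idx = num_classes * gt[valid].astype(int) + pred[valid].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def segmentation_scores(hist):
    """Overall pixel accuracy and mean IU (classes never seen count as NaN and are skipped)."""
    tp = np.diag(hist)
    pixel_acc = tp.sum() / hist.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        iu = tp / (hist.sum(axis=1) + hist.sum(axis=0) - tp)
    return pixel_acc, np.nanmean(iu)

gt = np.array([[0, 0, 1], [1, 1, 255]])    # 255 = void, ignored
pred = np.array([[0, 1, 1], [1, 1, 0]])
hist = confusion_histogram(gt, pred, num_classes=2)
acc, mean_iu = segmentation_scores(hist)
print(hist.tolist())                                        # [[1, 1], [0, 3]]
print(f"pixel accuracy {acc:.3f}, mean IU {mean_iu:.3f}")   # pixel accuracy 0.800, mean IU 0.625
```

Accumulating the histogram over all validation images before computing the scores, rather than averaging per-image scores, weights every pixel equally.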

In [None]:
# scoring
import score
val = np.loadtxt('../../data/pascal/seg11valid.txt', dtype=str)

for _ in range(25):
    solver.step(4000)
    score.seg_tests(solver, False, val, layer='score')

# Save the trained network weights.
solver.net.save("snapshot/final.caffemodel")

# 4. Further Networks

This example is based on the examples provided [here](http://fcn.berkeleyvision.org). Have a look there for how to train the more accurate FCN-16s and FCN-8s architectures. The Python files used in this example are the same as there; the training code from section 3 corresponds to their `solve.py` files.