# Multilabel classification on PASCAL using the image data layer

As is the case with the `pascal-multilabel-with-datalayer.ipynb` example, this tutorial considers multilabel classification on the PASCAL VOC 2012 dataset.

Multilabel classification is a generalization of multiclass classification in which each instance (image) can belong to several classes at once. For example, an image may belong to both a "beach" category and a "vacation pictures" category. In multiclass classification, on the other hand, each image belongs to exactly one class.
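
To make the distinction concrete, the following minimal sketch contrasts the two target encodings. The three class names are hypothetical and chosen only for illustration; they are not PASCAL classes.

```python
import numpy as np

# Hypothetical three-class label set, used only for illustration.
class_names = ["beach", "vacation", "portrait"]

# Multiclass: exactly one class per image, so the target is one-hot.
multiclass_target = np.array([0, 1, 0])   # "vacation" only

# Multilabel: any subset of classes may be present, so the target is a
# multi-hot vector with one independent 0/1 entry per class.
multilabel_target = np.array([1, 1, 0])   # both "beach" and "vacation"

# Recover the names of the classes that are switched on.
present = [name for name, flag in zip(class_names, multilabel_target) if flag]
print(present)
```

Because every entry is predicted independently, a multilabel network needs one binary decision per class rather than a single choice among all classes.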

Caffe supports multilabel classification through the `SigmoidCrossEntropyLoss` layer. The input to the network is defined by an `ImageDataLayer`, which loads a list of images and corresponding labels from a text file. Data could also be provided through HDF5 or LMDB data layers (or even a Python data layer as in `pascal-multilabel-with-datalayer.ipynb`), but in some respects these layers are not as simple as just providing a list of images. The image data layer also has the advantage that it allows for the shuffling of the input data set after each epoch in training.
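
As a rough illustration of what a sigmoid cross-entropy loss computes, the sketch below evaluates the standard per-label binary cross-entropy on raw scores with NumPy. It is not Caffe's implementation: the function name `sigmoid_cross_entropy` is hypothetical, the normalization (sum over labels, mean over the batch) is an assumption that may differ from the layer's behaviour, and ignored labels are not handled.

```python
import numpy as np

def sigmoid_cross_entropy(scores, targets):
    """Binary cross-entropy on raw scores, summed over labels, averaged over the batch.

    Assumption: this normalization is for illustration only and may not
    match what the Caffe layer does.
    """
    scores = np.asarray(scores, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    # Numerically stable form of -[t*log(p) + (1-t)*log(1-p)], p = sigmoid(x):
    #   max(x, 0) - x*t + log(1 + exp(-|x|))
    per_label = (np.maximum(scores, 0) - scores * targets
                 + np.log1p(np.exp(-np.abs(scores))))
    return per_label.sum() / scores.shape[0]

# One image, three labels: raw network outputs and 0/1 ground truth.
scores = np.array([[2.0, -1.0, 0.0]])
targets = np.array([[1.0, 0.0, 1.0]])
loss = sigmoid_cross_entropy(scores, targets)
print(loss)
```

Each label contributes its own term, which is what allows several labels to be active for the same image.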

### Preliminaries
This example requires that Caffe and PyCaffe are already built.

Download the Pascal VOC2012 data set. A script is provided at `../data/pascal/get_pascal.sh` which downloads the dataset. The example assumes that the dataset has been downloaded and extracted to `../data/pascal/VOC2012`.

In [None]:
!../data/pascal/get_pascal.sh
print("Done")

### Import the required modules

In [None]:
# import some modules
import sys, os
import numpy as np
import os.path as osp
import matplotlib.pyplot as plt

from copy import copy

# Ensure that matplotlib figures are displayed in the notebook and specify their size.
%matplotlib inline
plt.rcParams['figure.figsize'] = (6, 6)

caffe_root = '../'  # this file is expected to be in {caffe_root}/examples
sys.path.append(caffe_root + 'python')
import caffe # If you get "No module named _caffe", either you have not built pycaffe or you have the wrong path.

### Ensure that the starting model (`bvlc_reference_caffenet`) is available
The `download_model_binary.py` script is used to download pre-trained weights for the BVLC reference caffenet.

In [None]:
# Make sure the CaffeNet weights have been downloaded.
if not os.path.isfile(caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'):
    print("Downloading pre-trained CaffeNet model...")
    !../scripts/download_model_binary.py ../models/bvlc_reference_caffenet

### Set the required paths

In [None]:
# Set root directory of the dataset
pascal_root = os.path.join(caffe_root, 'data/pascal/VOC2012')
# Set the directory where the model definitions and output files will be stored.
workdir = "./pascal"

Construct the file lists required by the `ImageDataLayer` used in this example.

In [None]:
print("Generating file lists\n")
!python ../examples/pascal/create_file_list.py --ignore-background --pascal-root ../data/pascal/VOC2012 --output-path ./pascal
print("\nDone")

### Initialise Caffe (using the GPU)

In [None]:
# Initialize caffe for gpu mode
caffe.set_mode_gpu()
caffe.set_device(0)

### Load the network definition
The network has been defined in the `pascal/solver_image_layer_notebook.prototxt` file. We load the network definition here and initialise it from the `bvlc_reference_caffenet` model.

In [None]:
solver = caffe.SGDSolver(osp.join(workdir, 'solver_image_layer_notebook.prototxt'))
solver.net.copy_from(caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel')
solver.test_nets[0].share_with(solver.net)
solver.step(1)

Define the class names as a list. Note that the `ImageDataLayer` allows labels to be ignored: although `__background__` is listed as a class, its label is set to -1 for every image and is therefore ignored. See the lines of the files `trainval.list.txt` and `val.list.txt`.

In [None]:
# String representations of the class labels
classes = np.array(('__background__', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor'))

# The network uses a label value of -1 to indicate that a label should be ignored.
ig_label = -1



### Visualise the input data
We can now view the data loaded by the network (defined by the lines of the file `pascal/trainval.list.txt`). As we are viewing the Caffe blobs directly, the following function is used to convert a Caffe data blob to an image for display purposes. This involves changing the order of the axes as well as the order of the color channels.

In [None]:
def deprocess(im):
    """Make a caffe blob viewable"""
    im = im.transpose(1, 2, 0)
    im = im[:, :, ::-1]  # change to RGB
    return np.uint8(im)
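
To see what the two steps in `deprocess` do, the sketch below applies the same function to a small synthetic blob. The blob shape and the channel values are made up for illustration; real blobs come from the network's `data` blob.

```python
import numpy as np

def deprocess(im):
    """Same conversion as above: channels-first BGR to channels-last RGB."""
    im = im.transpose(1, 2, 0)   # (C, H, W) -> (H, W, C)
    im = im[:, :, ::-1]          # reverse the channel axis: BGR -> RGB
    return np.uint8(im)

# Synthetic 3 x 2 x 4 blob (channels, height, width) with constant channels.
blob = np.zeros((3, 2, 4), dtype=np.float32)
blob[0] = 10   # blue
blob[1] = 20   # green
blob[2] = 30   # red

img = deprocess(blob)
print(img.shape)   # (2, 4, 3)
print(img[0, 0])   # [30 20 10], i.e. red, green, blue
```

The result has the height-by-width-by-channel layout and RGB order that `plt.imshow` expects.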

View the first image loaded by the network as well as its corresponding labels. Note that labels to be ignored are not displayed. To confirm this, try setting `ig_label` to something other than -1 and confirm that the ground truth then contains the string `__background__`.

In [None]:
image_index = 0  # Let's look at the first image in the batch.
plt.imshow(deprocess(copy(solver.net.blobs['data'].data[image_index, ...])))
# np.int was removed in NumPy 1.24; the builtin int is equivalent here.
gtlist = solver.net.blobs['label'].data[image_index, ...].astype(int)
# Show every label that is non-zero and not marked as ignored.
plt.title('GT: {}'.format(classes[np.where((gtlist != 0) & (gtlist != ig_label))]))
plt.axis('off');

**NOTE:** we are reading the image from the data layer, so the resolution is lower than the original PASCAL image.

### Train a network

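Let's train the net. It would be useful to have a means to measure accuracy. The Hamming distance is commonly used in multilabel problems, so we define a function based on it. Despite its name, the function returns the fraction of non-ignored labels on which the ground truth and the estimate agree (one minus the normalized Hamming distance), so higher is better. Labels which are ignored (`ig_label`) do not contribute to the result.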

In [None]:
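def hamming_distance(gt, est):
    # Fraction of non-ignored labels on which gt and est agree (higher is better).
    num_valid = float(len(gt[np.where(gt != ig_label)]))
    return sum([1 for (g, e) in zip(gt, est) if g == e and g != ig_label]) / num_valid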
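
As a worked example of this measure, the sketch below uses a vectorized NumPy equivalent on a hand-made label vector. The name `label_agreement` and the example vectors are hypothetical; the ignore value of -1 follows the convention used in this notebook.

```python
import numpy as np

ig_label = -1  # labels with this value are excluded from the measure

def label_agreement(gt, est):
    """Fraction of non-ignored labels on which gt and est agree."""
    gt = np.asarray(gt)
    est = np.asarray(est)
    valid = gt != ig_label                     # mask out ignored labels
    return np.sum(gt[valid] == est[valid]) / float(np.sum(valid))

gt = np.array([-1, 1, 0, 0, 1])    # first label is ignored
est = np.array([0, 1, 0, 1, 1])    # one mistake among the four valid labels
score = label_agreement(gt, est)
print(score)   # 0.75
```

Three of the four valid labels agree, so the result is 0.75; the ignored first entry affects neither the numerator nor the denominator.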


We also need a simple test loop.

In [None]:
def check_accuracy(net, num_batches, batch_size = 128):
    acc = 0.0
    for t in range(num_batches):
        net.forward()
        gts = net.blobs['label'].data
        ests = net.blobs['score'].data > 0
        for gt, est in zip(gts, ests): #for each ground truth and estimated label vector
            acc += hamming_distance(gt, est)
    return acc / (num_batches * batch_size)
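
The test loop thresholds the scores at 0. Assuming the `score` blob holds raw (pre-sigmoid) outputs, as its use with a sigmoid cross-entropy loss suggests, this is the same as thresholding the sigmoid probability at 0.5. The short check below illustrates the equivalence on made-up scores.

```python
import numpy as np

def sigmoid(x):
    """Logistic function mapping a raw score to a probability."""
    return 1.0 / (1.0 + np.exp(-x))

# Made-up raw scores, including the boundary case 0.
scores = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])

by_score = scores > 0              # decision used in the test loop
by_prob = sigmoid(scores) > 0.5    # equivalent decision on probabilities

print(by_score)
print(np.array_equal(by_score, by_prob))
```

Thresholding the raw score avoids computing the sigmoid at test time.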

Train the network for a while and output the computed accuracy at a number of intervals.

In [None]:
for itt in range(6):
    solver.step(100)
    print('itt:{:3d} accuracy:{:.4f}'.format((itt + 1) * 100, check_accuracy(solver.test_nets[0], 50)))

Great! The accuracy is increasing, and it seems to converge rather quickly. It may seem strange that it starts off so high but it is because the ground truth is sparse. There are 20 classes in PASCAL, and usually only one or two are present. So predicting all zeros yields rather high accuracy. Let's check to make sure.
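
The arithmetic behind this claim can be sketched directly. Assuming all 20 class labels are counted equally and none is ignored, an all-zero prediction is wrong only on the labels that are actually present:

```python
num_classes = 20  # PASCAL VOC object classes (background excluded)

# Agreement of an all-zero prediction when k classes are present in an image.
baseline = {}
for num_present in (1, 2, 3):
    baseline[num_present] = (num_classes - num_present) / float(num_classes)
    print(num_present, baseline[num_present])
```

With one or two classes present, the all-zero guess already agrees on 95% or 90% of the labels, which is why the measured accuracy starts off high.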

In [None]:
def check_baseline_accuracy(net, num_batches, batch_size = 128):
    acc = 0.0
    for t in range(num_batches):
        net.forward()
        gts = net.blobs['label'].data
        # Provide zeros as the estimated labels, one per ground-truth label.
        ests = np.zeros_like(gts)
        for gt, est in zip(gts, ests): #for each ground truth and estimated label vector
            acc += hamming_distance(gt, est)
    return acc / (num_batches * batch_size)

print('Baseline accuracy:{0:.4f}'.format(check_baseline_accuracy(solver.test_nets[0], 5823 // 128)))

### Look at some prediction results
Since the network has now been trained somewhat, we can look at the predictions for further input files. Perform inference for a selection of images from the input set.

In [None]:
test_net = solver.test_nets[0]
for image_index in range(5):
    plt.figure()
    plt.imshow(deprocess(copy(test_net.blobs['data'].data[image_index, ...])))
    gtlist = test_net.blobs['label'].data[image_index, ...].astype(int)  # np.int was removed in NumPy 1.24
    estlist = test_net.blobs['score'].data[image_index, ...] > 0
    plt.title('GT: {} \n EST: {}'.format(classes[np.where((gtlist != 0) & (gtlist != ig_label))], classes[np.where(estlist)]))
    plt.axis('off')