# A Practical Introduction to Deep Learning with Caffe and Python (Copied)  

> see http://adilmoujahid.com/posts/2016/06/introduction-deep-learning-python-caffe/ for the blog;  
> see https://github.com/adilmoujahid/deeplearning-cats-dogs-tutorial for source code;

## 1. Caffe Overview

Caffe is a deep learning framework developed by the Berkeley Vision and Learning Center (BVLC). It is written in C++ and has Python and Matlab bindings.

There are 4 steps in training a CNN using Caffe:

- Step 1 - Data preparation: In this step, we clean the images and store them in a format that can be used by Caffe. We will write a Python script that will handle both image pre-processing and storage.
- Step 2 - Model definition: In this step, we choose a CNN architecture and we define its parameters in a configuration file with extension .prototxt.
- Step 3 - Solver definition: The solver is responsible for model optimization. We define the solver parameters in a configuration file with extension .prototxt.
- Step 4 - Model training: We train the model by executing one Caffe command from the terminal. After training the model, we will get the trained model in a file with extension .caffemodel.

After the training phase, we will use the trained `.caffemodel` file to make predictions on new, unseen data. We will write a Python script to do this.

## 2. Data Preparation

We run `create_lmdb.py`:

```
cd ~/deeplearning-cats-dogs-tutorial/code
python create_lmdb.py
```

The `create_lmdb.py` script does the following:

- Run histogram equalization on all training images. Histogram equalization is a technique for adjusting the contrast of images.
- Resize all training images to a 227x227 format.
- Divide the training data into 2 sets: One for training (5/6 of images) and the other for validation (1/6 of images). The training set is used to train the model, and the validation set is used to calculate the accuracy of the model.
- Store the training and validation sets in 2 LMDB databases: `train_lmdb` for training the model and `validation_lmdb` for model evaluation.

Below is the explanation of the most important parts of the code:
```
def transform_img(img, img_width=IMAGE_WIDTH, img_height=IMAGE_HEIGHT):

    #Histogram Equalization
    img[:, :, 0] = cv2.equalizeHist(img[:, :, 0])
    img[:, :, 1] = cv2.equalizeHist(img[:, :, 1])
    img[:, :, 2] = cv2.equalizeHist(img[:, :, 2])

    #Image Resizing
    img = cv2.resize(img, (img_width, img_height), interpolation = cv2.INTER_CUBIC)
    return img
    
```

See the complete source code in the following:

```python
"""
Title           :create_lmdb.py
Description     :This script divides the training images into 2 sets and stores them in lmdb databases for training and validation.
"""

import os
import glob
import random
import numpy as np

import cv2
import caffe
from caffe.proto import caffe_pb2
import lmdb

#Size of images
IMAGE_WIDTH = 227
IMAGE_HEIGHT = 227

def transform_img(img, img_width=IMAGE_WIDTH, img_height=IMAGE_HEIGHT):

    #Histogram Equalization
    img[:, :, 0] = cv2.equalizeHist(img[:, :, 0])
    img[:, :, 1] = cv2.equalizeHist(img[:, :, 1])
    img[:, :, 2] = cv2.equalizeHist(img[:, :, 2])

    #Image Resizing
    img = cv2.resize(img, (img_width, img_height), interpolation = cv2.INTER_CUBIC)
    return img


def make_datum(img, label):
    #image is numpy.ndarray format. BGR instead of RGB
    return caffe_pb2.Datum(
        channels=3,
        width=IMAGE_WIDTH,
        height=IMAGE_HEIGHT,
        label=label,
        data=np.rollaxis(img, 2).tostring())


data_root = "/media/ccjData2/datasets/kaggle/dogs-vs-cats/"
train_lmdb = data_root + 'train_lmdb'
validation_lmdb = data_root + 'validation_lmdb'
os.system('rm -r ' + train_lmdb)
os.system('rm -r ' + validation_lmdb)
train_data = [img for img in glob.glob( data_root + "train/*jpg")]
test_data = [img for img in glob.glob( data_root + "test1/*jpg")]

#Shuffle train_data
random.shuffle(train_data)

print 'Creating train_lmdb'

in_db = lmdb.open(train_lmdb, map_size=int(1e12))
with in_db.begin(write=True) as in_txn:
    for in_idx, img_path in enumerate(train_data):
        #Every 6th image is reserved for validation
        if in_idx %  6 == 0:
            continue
        img = cv2.imread(img_path, cv2.IMREAD_COLOR)
        img = transform_img(img, img_width=IMAGE_WIDTH, img_height=IMAGE_HEIGHT)
        #Check the file name only: data_root itself contains "cats"
        if 'cat' in os.path.basename(img_path):
            label = 0
        else:
            label = 1
        datum = make_datum(img, label)
        in_txn.put('{:0>5d}'.format(in_idx), datum.SerializeToString())
        #print '{:0>5d}'.format(in_idx) + ':' + img_path
in_db.close()


print '\nCreating validation_lmdb'

in_db = lmdb.open(validation_lmdb, map_size=int(1e12))
with in_db.begin(write=True) as in_txn:
    for in_idx, img_path in enumerate(train_data):
        #Keep only every 6th image for validation
        if in_idx % 6 != 0:
            continue
        img = cv2.imread(img_path, cv2.IMREAD_COLOR)
        img = transform_img(img, img_width=IMAGE_WIDTH, img_height=IMAGE_HEIGHT)
        #Check the file name only: data_root itself contains "cats"
        if 'cat' in os.path.basename(img_path):
            label = 0
        else:
            label = 1
        datum = make_datum(img, label)
        in_txn.put('{:0>5d}'.format(in_idx), datum.SerializeToString())
        #print '{:0>5d}'.format(in_idx) + ':' + img_path
in_db.close()

print '\nFinished processing all images'
```

Output of the script:

```
Creating train_lmdb

Creating validation_lmdb

Finished processing all images
```


`transform_img` takes a color image as input, applies histogram equalization to each of the 3 color channels, and resizes the image.  
![example of image transformations applied to one training image](http://adilmoujahid.com/images/image-transform.jpg)  
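To make the contrast adjustment more concrete, here is a minimal NumPy sketch of textbook histogram equalization on a single 8-bit channel. It is only an illustration of the idea: the helper name `equalize_channel` and the synthetic input are assumptions, and the result is not guaranteed to match `cv2.equalizeHist` bit for bit.

```python
import numpy as np

def equalize_channel(channel):
    """Textbook histogram equalization for one uint8 channel (illustrative helper)."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]      # CDF value at the darkest intensity present
    total = channel.size
    if total == cdf_min:           # constant image: nothing to stretch
        return channel.copy()
    # Map each intensity through the normalized CDF onto the full 0..255 range
    lut = np.round((cdf - cdf_min) / float(total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[channel]

# Synthetic low-contrast channel whose intensities only span 100..149
low_contrast = (100 + np.arange(64 * 64) % 50).astype(np.uint8).reshape(64, 64)
equalized = equalize_channel(low_contrast)

print("before: %d..%d" % (low_contrast.min(), low_contrast.max()))
print("after:  %d..%d" % (equalized.min(), equalized.max()))
```

The narrow input range is stretched over the full intensity range, which is the effect the tutorial relies on before resizing.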

```
def make_datum(img, label):

    return caffe_pb2.Datum(
        channels=3,
        width=IMAGE_WIDTH,
        height=IMAGE_HEIGHT,
        label=label,
        data=np.rollaxis(img, 2).tostring())
```

`make_datum` takes an image and its label and returns a [Datum object](https://github.com/BVLC/caffe/wiki/The-Datum-Object) that contains the image and its label.
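The `np.rollaxis(img, 2)` call is what converts OpenCV's height x width x channels layout into the channels x height x width layout stored in the Datum. The sketch below shows that layout change with NumPy alone; the synthetic image is an assumption, and `np.moveaxis` and `tobytes` are used here as equivalents of `np.rollaxis(img, 2)` and `tostring`.

```python
import numpy as np

# Hypothetical 227x227 BGR image, shaped (height, width, channels) as OpenCV returns it
img = np.zeros((227, 227, 3), dtype=np.uint8)
img[:, :, 0] = 10   # blue channel
img[:, :, 1] = 20   # green channel
img[:, :, 2] = 30   # red channel

# Move the channel axis to the front: (height, width, channels) -> (channels, height, width)
chw = np.moveaxis(img, 2, 0)

# Serialize in C order, so all blue values come first, then green, then red
raw = chw.tobytes()

print("shape: %s" % (chw.shape,))
print("bytes: %d" % len(raw))
```

Each channel ends up as one contiguous plane in the serialized bytes, which is why the Datum also records `channels`, `height`, and `width` separately.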


```
in_db = lmdb.open(train_lmdb, map_size=int(1e12))
with in_db.begin(write=True) as in_txn:
    for in_idx, img_path in enumerate(train_data):
        if in_idx %  6 == 0:
            continue
        img = cv2.imread(img_path, cv2.IMREAD_COLOR)
        img = transform_img(img, img_width=IMAGE_WIDTH, img_height=IMAGE_HEIGHT)
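        if 'cat' in os.path.basename(img_path):  # check the file name only; a parent directory such as dogs-vs-cats would also match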
            label = 0
        else:
            label = 1
        datum = make_datum(img, label)
        in_txn.put('{:0>5d}'.format(in_idx), datum.SerializeToString())
        print '{:0>5d}'.format(in_idx) + ':' + img_path
in_db.close()
```

The code above takes 5/6 of the training images, transforms and stores them in train_lmdb. The code for storing validation data follows the same structure.
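The 5/6 versus 1/6 split comes entirely from the `in_idx % 6` test. A small sketch with hypothetical file names (not the real Kaggle files) shows which indices land in each set and how the zero-padded LMDB keys are formed:

```python
# Hypothetical file names standing in for the shuffled training images
train_data = ['cat.%d.jpg' % i for i in range(6)] + ['dog.%d.jpg' % i for i in range(6)]

# Indices divisible by 6 go to validation, all others to training
train_split = [(idx, path) for idx, path in enumerate(train_data) if idx % 6 != 0]
val_split = [(idx, path) for idx, path in enumerate(train_data) if idx % 6 == 0]

# Keys are zero-padded to 5 digits so that string order matches numeric order
train_keys = ['{:0>5d}'.format(idx) for idx, _ in train_split]

print("train: %d images, validation: %d images" % (len(train_split), len(val_split)))
print("first training keys: %s" % train_keys[:3])
```

Because the list is shuffled before the split, taking every sixth image is a simple way to get a random validation subset without any extra bookkeeping.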

### Generating the mean image of training data:
We execute the command below to generate the mean image of the training data. We will subtract the mean image from each input image so that every input pixel has zero mean across the training set. This is a common preprocessing step in supervised machine learning.

```
cd ~/caffe/build/tools
./compute_image_mean -backend=lmdb /media/ccjData2/datasets/kaggle/dogs-vs-cats/train_lmdb /media/ccjData2/datasets/kaggle/dogs-vs-cats/mean.binaryproto
```

This is the corresponding output:  
![get-mean-binaryproto.png](../files/get-mean-binaryproto.png)
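As a rough illustration of what mean subtraction does, the sketch below computes a per-pixel mean over a tiny random "dataset" and subtracts it. The array sizes and the random data are assumptions made for the example; it is not a reimplementation of `compute_image_mean`.

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical mini dataset: 8 images in channels x height x width layout
images = rng.randint(0, 256, size=(8, 3, 4, 4)).astype(np.float32)

# One mean value per (channel, row, column) position, averaged over all images
mean_image = images.mean(axis=0)

# Broadcasting subtracts the same mean image from every image
centered = images - mean_image

print("mean image shape: %s" % (mean_image.shape,))
print("largest residual mean: %.6f" % np.abs(centered.mean(axis=0)).max())
```

After the subtraction, each pixel position averages to approximately zero over the dataset, which is the property the training step relies on.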

## 3. Model Definition
After deciding on the CNN architecture, we need to define its parameters in a `.prototxt` train_val file. Caffe comes with a few popular CNN models such as AlexNet and GoogLeNet. In this tutorial, we will use the `bvlc_reference_caffenet` model, which is a replication of AlexNet with a few modifications. We call our train_val file `caffenet_train_val_1.prototxt`; a copy is linked below. If you clone the tutorial git repository linked at the top of this page, you should have the same file under `deeplearning-cats-dogs-tutorial/caffe_models/caffe_model_1/`.

We need to make the modifications below to the original bvlc_reference_caffenet prototxt file:
- Change the path for input data and mean image: Lines 24, 40 and 51.
- Change the number of outputs from 1000 to 2: Line 373. The original bvlc_reference_caffenet was designed for a classification problem with 1000 classes.
- See the file in this [gist repository](https://gist.github.com/ccj5351/fce8c81e36fd62a7ac235462b589d8c6), and see the visualization via Netscope [here](https://ethereon.github.io/netscope/#/gist/fce8c81e36fd62a7ac235462b589d8c6).

We can print the model architecture by executing the command below.
```
python ~/caffe/python/draw_net.py ~/seg-depth/src/caffe/caffenet_train_val_1.prototxt ~/Downloads/caffe_model_1.png
```
The model is shown as below:  
![caffe_model_1.png](../files/caffe_model_1.png)

### Netscope:

Netscope is a web-based tool for visualizing neural network architectures (or technically, any directed acyclic graph). It currently supports Caffe's prototxt format. So if this `.prototxt` file is part of a GitHub Gist, we can visualize it by visiting [this URL.](https://ethereon.github.io/netscope/#/gist/fce8c81e36fd62a7ac235462b589d8c6)
> Note the URL format is `http://ethereon.github.io/netscope/#/gist/your-gist-id`. The `Gist ID` is the trailing identifier in the Gist's URL (here `fce8c81e36fd62a7ac235462b589d8c6`).

## 4. Solver Definition
The solver is responsible for model optimization. We define the solver's parameters in a `.prototxt` file. You can find our solver under deeplearning-cats-dogs-tutorial/caffe_models/caffe_model_1/ with name solver_1.prototxt. Below is a copy of the same.

This solver computes the accuracy of the model using the validation set every 1000 iterations. The optimization process will run for a maximum of 40000 iterations and will take a snapshot of the trained model every 5000 iterations.

`base_lr`, `lr_policy`, `gamma`, `momentum` and `weight_decay` are hyperparameters that we need to tune to get a good convergence of the model.

I chose `lr_policy: "step"` with `stepsize: 2500, base_lr: 0.001` and `gamma: 0.1`. In this configuration, we will start with a learning rate of `0.001`, and we will drop the learning rate by a factor of ten every 2500 iterations.
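Caffe's solver documentation describes the `step` policy as `base_lr * gamma ^ floor(iter / stepsize)`. The sketch below evaluates that formula for the values chosen above; the function name `step_lr` is just an illustrative helper, not part of Caffe.

```python
def step_lr(iteration, base_lr=0.001, gamma=0.1, stepsize=2500):
    """Learning rate under the "step" policy: drop by gamma every stepsize iterations."""
    return base_lr * gamma ** (iteration // stepsize)

# Learning rate just before and after the first few drops
for it in (0, 2499, 2500, 5000, 7500):
    print("iteration %5d -> lr %.0e" % (it, step_lr(it)))
```

Note that with `max_iter: 40000` this schedule applies 16 drops in total, so the learning rate becomes vanishingly small long before training ends.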

There are different strategies for the optimization process. For a detailed explanation, I recommend Caffe's [solver documentation](http://caffe.berkeleyvision.org/tutorial/solver.html).

```
net: "/home/ccj/seg-depth/study_caffe/caffenet_train_val_1.prototxt"
test_iter: 1000
test_interval: 1000
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 2500
display: 50
max_iter: 40000
momentum: 0.9
weight_decay: 0.0005
snapshot: 5000
snapshot_prefix: "/home/ccj/seg-depth/logs/caffe_model_1"
solver_mode: GPU
```

## 5. Model Training
After defining the model and the solver, we can start training the model by executing the command below:
```
~/caffe/build/tools/caffe train --solver ~/seg-depth/study_caffe/solver_1.prototxt 2>&1 | tee ~/seg-depth/logs/caffe_model_1/model_1_train.log
```

The training logs will be stored under `~/seg-depth/logs/caffe_model_1/model_1_train.log`.

During the training process, we need to monitor the `loss` and the model `accuracy`. We can stop the process at any time by pressing Ctrl+C. Caffe will take a snapshot of the trained model every 5000 iterations and store it under the caffe_model_1 folder.

The snapshots have the `.caffemodel` extension. For example, the snapshot taken at iteration 10000 will be called `caffe_model_1_iter_10000.caffemodel`.

## 6. Plotting the learning curve
A learning curve is a plot of the training and test losses as a function of the number of iterations. These plots are very useful to visualize the train/validation losses and validation accuracy.

We can see from the learning curve that the model achieved a validation accuracy of 90%, and it stopped improving after 3000 iterations.

```
python ~/seg-depth/study_caffe/plot_learning_curve.py ~/seg-depth/logs/caffe_model_1/model_1_train.log ~/seg-depth/logs/caffe_model_1/caffe_model_1_learning_curve.png
```

The result is shown below  
![caffe_model_1_learning_curve.png](../files/caffe_model_1_learning_curve.png)

## 7. Prediction on New Data
Now that we have a trained model, we can use it to make predictions on new unseen data (images from test1). The Python code for making the predictions is make_predictions_1.py and it's stored under deeplearning-cats-dogs-tutorial/code. The code needs 4 files to run:

- Test images: We will use test1 images.
- Mean image: The mean image that we computed in section 2 above.
- Model architecture file: We'll call this file `caffenet_deploy_1.prototxt`. It's stored under deeplearning-cats-dogs-tutorial/caffe_models/caffe_model_1. It's structured in a similar way to caffenet_train_val_1.prototxt, but with a few modifications. We need to delete the data layers, add an input layer and change the last layer type from `SoftmaxWithLoss` to `Softmax`.  
- Trained model weights: This is the file that we computed in the training phase. We will use `caffe_model_1_iter_10000.caffemodel`.

To run the Python code, we need to execute the command below. The predictions will be stored under deeplearning-cats-dogs-tutorial/caffe_models/caffe_model_1/submission_model_1.csv.

```
cd ~/seg-depth/study_caffe
python make_predictions_1.py
```

Below is the explanation of the most important parts in the code.

```python
#Read mean image
mean_blob = caffe_pb2.BlobProto()
with open('/home/ubuntu/deeplearning-cats-dogs-tutorial/input/mean.binaryproto', 'rb') as f:
    mean_blob.ParseFromString(f.read())
mean_array = np.asarray(mean_blob.data, dtype=np.float32).reshape(
    (mean_blob.channels, mean_blob.height, mean_blob.width))


#Read model architecture and trained model's weights
net = caffe.Net('/home/ubuntu/deeplearning-cats-dogs-tutorial/caffe_models/caffe_model_1/caffenet_deploy_1.prototxt',
                '/home/ubuntu/deeplearning-cats-dogs-tutorial/caffe_models/caffe_model_1/caffe_model_1_iter_10000.caffemodel',
                caffe.TEST)

#Define image transformers
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_mean('data', mean_array)
transformer.set_transpose('data', (2,0,1))
```


The code above stores the mean image under `mean_array`, defines a model called `net` by reading the deploy file and the trained model, and defines the transformations that we need to apply to the test images.

```python
img = cv2.imread(img_path, cv2.IMREAD_COLOR)
img = transform_img(img, img_width=IMAGE_WIDTH, img_height=IMAGE_HEIGHT)

net.blobs['data'].data[...] = transformer.preprocess('data', img)
out = net.forward()
pred_probas = out['prob']
print pred_probas.argmax()

```

The code above reads an image, applies the same image processing steps used in the training phase, calculates each class's probability, and prints the class with the largest probability (0 for cats and 1 for dogs).
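The final `Softmax` layer turns the two raw class scores into probabilities, and `argmax` picks the winning class. A minimal NumPy sketch of that last step follows; the scores are made-up values rather than real network output, and `softmax` and `CLASS_NAMES` are illustrative names.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D array of class scores."""
    shifted = scores - scores.max()   # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()

CLASS_NAMES = {0: 'cat', 1: 'dog'}    # same label convention as create_lmdb.py

scores = np.array([0.3, 2.1])         # hypothetical scores from the last fully connected layer
pred_probas = softmax(scores)
label = int(pred_probas.argmax())

print("%s (p=%.3f)" % (CLASS_NAMES[label], pred_probas[label]))
```

The probabilities always sum to one, so in a two-class problem a prediction of "dog" simply means its probability exceeds 0.5.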

## 8. Building a Cat/Dog Classifier using Transfer Learning
### 8.1 What is Transfer Learning?

Convolutional neural networks require large datasets and a lot of computational time to train. Some networks can take up to 2-3 weeks across multiple GPUs to train. `Transfer learning` is a very useful technique that tries to address both problems. Instead of training the network from scratch, transfer learning takes a model trained on a different dataset and adapts it to the problem that we're trying to solve.

There are 2 strategies for transfer learning:

- Utilize the trained model as a fixed feature extractor: In this strategy, we remove the last fully connected layer from the trained model, we freeze the weights of the remaining layers, and we train a machine learning classifier (e.g., SVM) on the output of the remaining layers.
- Fine-tune the trained model: In this strategy, we fine tune the trained model on the new dataset by `continuing the backpropagation`. We can either fine-tune the whole network or freeze some of its layers.

For a detailed explanation of transfer learning, I recommend reading these [CS231n deep learning course notes.](http://cs231n.github.io/transfer-learning/)

### 8.2 Training the Cat/Dog Classifier using Transfer Learning

Caffe comes with a repository that is used by researchers and machine learning practitioners to share their trained models. This repository is called the `Model Zoo`.

We will utilize the trained `bvlc_reference_caffenet` as a starting point of building our cat/dog classifier using transfer learning. This model was trained on the ImageNet dataset which contains millions of images across 1000 categories.

We will use the fine-tuning strategy for training our model.

#### Download trained bvlc_reference_caffenet model (i.e., the trained model weights):

We can download the trained model by executing the command below.
```
cd ~/caffe/models/bvlc_reference_caffenet
wget http://dl.caffe.berkeleyvision.org/bvlc_reference_caffenet.caffemodel
```

#### Model Definition
The model and solver configuration files are stored under `deeplearning-cats-dogs-tutorial/caffe_models/caffe_model_2`. We need to make the following changes to the original bvlc_reference_caffenet model configuration file.

- Change the path for input data and mean image: Lines 24, 40 and 51.
- Change the name of the last fully connected layer from fc8 to fc8-cats-dogs. Lines 360, 363, 387 and 397.
- Change the number of outputs from 1000 to 2: Line 373. The original bvlc_reference_caffenet was designed for a classification problem with 1000 classes.

Note that if we keep a layer's name unchanged and we pass the trained model's weights to Caffe, it will pick its weights from the trained model. This is why we rename `fc8`: the renamed layer finds no match and is trained from scratch. If we want to `freeze a layer`, we need to set its `lr_mult` parameter to `0`.  
> see this file at [this gist repository.](https://gist.github.com/ccj5351/5a0e9f65ba42f658924295126f4e1f5f)
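The name-matching behaviour can be pictured with a toy model in plain Python. This is only a sketch of the idea described above, not Caffe's implementation: the function `load_matching_weights`, the shortened layer list, and the string placeholders for weights are all assumptions.

```python
def load_matching_weights(new_layers, pretrained):
    """Copy weights for layers whose names match; report the layers left to train from scratch.

    new_layers: layer names in the new network definition
    pretrained: dict mapping layer name -> weights from the trained model
    """
    loaded, fresh = {}, []
    for name in new_layers:
        if name in pretrained:
            loaded[name] = pretrained[name]   # same name: reuse the trained weights
        else:
            fresh.append(name)                # unknown name: keeps its fresh initialization
    return loaded, fresh

# Placeholder weights for a few layers of the pretrained model
pretrained = {'conv1': 'w_conv1', 'fc7': 'w_fc7', 'fc8': 'w_fc8'}

# The new network renames the last layer, so its 1000-class weights are not reused
new_layers = ['conv1', 'fc7', 'fc8-cats-dogs']

loaded, fresh = load_matching_weights(new_layers, pretrained)
print("reused: %s" % sorted(loaded))
print("trained from scratch: %s" % fresh)
```

Renaming the last layer is what allows its output size to change from 1000 to 2 while every other layer starts from the pretrained weights.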

#### Solver Definition
We will use a similar solver to the one used before.
> see this file at [this gist repository.](https://gist.github.com/ccj5351/5a0e9f65ba42f658924295126f4e1f5f)

#### Model Training with Transfer Learning
After defining the model and the solver, we can start training the model by executing the command below. Note that we can pass the trained model's weights by using the argument `--weights`.

```
~/caffe/build/tools/caffe train --solver /home/ccj/seg-depth/study_caffe/solver_2.prototxt --weights /home/ccj/caffe/models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel 2>&1 | tee /home/ccj/seg-depth/logs/caffe_model_2/model_2_train.log
```

#### Plotting the Learning Curve
Similarly to the previous section, we can plot the learning curve by executing the command below. We can see from the learning curve that the model achieved an accuracy of `~97%` after 1000 iterations only. This shows the power of `transfer learning`. We were able to get a higher accuracy with a smaller number of iterations.

```
python ~/seg-depth/study_caffe/plot_learning_curve.py ~/seg-depth/logs/caffe_model_2/model_2_train.log ~/seg-depth/logs/caffe_model_2/caffe_model_2_learning_curve.png
```

The result is shown as below:  
![caffe_model_2_learning_curve.png](../files/caffe_model_2_learning_curve.png)

#### Prediction on New Data
Similarly, we will generate predictions on the test data and upload the results to Kaggle to get the model accuracy. The code for making the predictions is under deeplearning-cats-dogs-tutorial/code/make_predictions_2.py.

The model got an accuracy of 0.97154, which is better than the model that we trained from scratch.

## 9. Conclusion
In this blog post, we covered core concepts of deep learning and convolutional neural networks. We also learned how to build convolutional neural networks using Caffe and Python from scratch and using transfer learning. If you want to learn more about this topic, I highly recommend Stanford's ["Convolutional Neural Networks for Visual Recognition" course](http://cs231n.github.io/).

## 10. References
1. [CS231n - Neural Networks Part 1: Setting up the Architecture](http://cs231n.github.io/neural-networks-1/)
2. [Wikipedia - Convolutional Neural Network](https://en.wikipedia.org/wiki/Convolutional_neural_network)
3. [CS231n - Transfer Learning Notes](http://cs231n.github.io/transfer-learning/)
4. [A Step by Step Backpropagation Example](https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/)
5. [CS231n Convolutional Neural Networks for Visual Recognition](http://cs231n.github.io/)