While this tutorial uses Intel Optimized Caffe, the same general techniques apply to other frameworks such as MXNet, Neon, Theano, Torch, and TensorFlow. [Caffe](http://caffe.berkeleyvision.org/) is a deep learning framework written in C++ and CUDA C++, developed by the Berkeley Vision and Learning Center ([BVLC](http://caffe.berkeleyvision.org/)). Intel Optimized Caffe is kept in sync with BVLC Caffe and therefore has all of its benefits. In addition, Intel Optimized Caffe is integrated with the latest release of the Intel® Math Kernel Library (Intel® MKL) 2017, which is optimized for deep learning primitives on EC2 CPU instances. Intel Optimized Caffe can also be used for distributed multinode training or fine-tuning across several nodes. A tutorial for distributed training on AWS can be found [here](https://software.intel.com/en-us/articles/distributed-training-of-deep-networks-on-amazon-web-services-aws).

In this tutorial we go through the steps of installing Intel Optimized Caffe and BVLC Caffe, comparing the performance of Intel Optimized Caffe against BVLC Caffe, and fine-tuning a model. I explain what fine-tuning is below. A detailed explanation of installing and using Intel Optimized Caffe can be found [here](https://software.intel.com/en-us/articles/training-and-deploying-deep-learning-networks-with-caffe-optimized-for-intel-architecture).

This tutorial has been tested with Ubuntu 14.04 and Ubuntu 16.04 on c4.8xlarge EC2 instances; it will likely work on most instances with more than 3 GB of memory. Before following these steps, launch an Ubuntu 16.04 AMI on a c4.8xlarge instance.

To run this notebook:

- Start a brand new Ubuntu 16.04 instance on a c4.8xlarge
- Update: `sudo apt-get update`
- Download this Jupyter notebook: `wget https://raw.githubusercontent.com/RodriguezAndres/CaffeTutorial/master/IntelOptimizedCaffeTutorial.ipynb -O IntelOptimizedCaffeTutorial.ipynb`
- Download or scp the Dogs vs. Cats dataset (instructions below)
- Download and install [Anaconda](https://www.continuum.io/downloads) and set up Jupyter:
  - `wget https://repo.continuum.io/archive/Anaconda2-4.2.0-Linux-x86_64.sh`
  - `bash Anaconda2-4.2.0-Linux-x86_64.sh`
  - `source ~/.bashrc`
  - `wget https://raw.githubusercontent.com/RodriguezAndres/CaffeTutorial/master/jupyter_setup.sh`
  - `chmod +x jupyter_setup.sh`
  - `./jupyter_setup.sh`
- Run: `jupyter notebook`

In [None]:
%%bash
# get dependencies
sudo apt-get update &&
sudo apt-get install -y git build-essential &&
sudo apt-get install -y libprotobuf-dev libleveldb-dev libsnappy-dev &&
sudo apt-get install -y libopencv-dev libhdf5-serial-dev protobuf-compiler &&
sudo apt-get install -y --no-install-recommends libboost-all-dev &&
sudo apt-get install -y libgflags-dev libgoogle-glog-dev liblmdb-dev &&
sudo apt-get install -y libatlas-base-dev 

In [None]:
%%bash
# not needed if using Ubuntu 14.04
cd /usr/lib/x86_64-linux-gnu
sudo ln -s libhdf5_serial.so.10.1.0 libhdf5.so
sudo ln -s libhdf5_serial_hl.so.10.0.2 libhdf5_hl.so

In [None]:
%%bash
# download Intel Optimized Caffe and BVLC Caffe
cd ~
git clone https://github.com/intel/caffe.git 
mv caffe caffe-intel
cd caffe-intel
cp Makefile.config.example Makefile.config
# modify Makefile to run on CPU and use Anaconda
sed -i -e 's/# CPU_ONLY/CPU_ONLY/g' Makefile.config
sed -i -e 's/PYTHON_INCLUDE := \/usr\/include\/python2.7/# PYTHON_INCLUDE := \/usr\/include\/python2.7/g' Makefile.config
sed -i -e 's/\/usr\/lib\/python2.7\/dist-packages\/numpy\/core\/include/# \/usr\/lib\/python2.7\/dist-packages\/numpy\/core\/include/g' Makefile.config
sed -i -e 's/# ANACONDA_HOME := $(HOME)\/anaconda/ANACONDA_HOME := \/home\/ubuntu\/anaconda2/g' Makefile.config
sed -i -e 's/# PYTHON_INCLUDE := $(ANACONDA_HOME)/PYTHON_INCLUDE := $(ANACONDA_HOME)/g' Makefile.config
sed -i -e 's/# $(ANACONDA_HOME)/$(ANACONDA_HOME)/g' Makefile.config
sed -i -e 's/PYTHON_LIB := \/usr\/lib/# PYTHON_LIB := \/usr\/lib/g' Makefile.config
sed -i -e 's/# PYTHON_LIB := $(ANACONDA_HOME)/PYTHON_LIB := $(ANACONDA_HOME)/g' Makefile.config
sed -i -e 's/INCLUDE_DIRS := $(PYTHON_INCLUDE) \/usr\/local\/include/INCLUDE_DIRS := $(PYTHON_INCLUDE) \/usr\/local\/include \/usr\/include\/hdf5\/serial\//g' Makefile.config
cd ~
git clone https://github.com/BVLC/caffe.git
mv caffe caffe-bvlc
cd caffe-bvlc
cp Makefile.config.example Makefile.config
sed -i -e 's/# CPU_ONLY/CPU_ONLY/g' Makefile.config
sed -i -e 's/INCLUDE_DIRS := $(PYTHON_INCLUDE) \/usr\/local\/include/INCLUDE_DIRS := $(PYTHON_INCLUDE) \/usr\/local\/include \/usr\/include\/hdf5\/serial\//g' Makefile.config
# -y answers conda's confirmation prompt, which cannot be answered inside a notebook cell
conda install -y libgcc

In [None]:
%%bash
# compile BVLC Caffe and Intel Caffe
# use one make job per logical CPU (nproc counts every socket and hyper-thread)
NUMTHREADS=$(nproc)
cd ~/caffe-bvlc
make -j $NUMTHREADS
cd ~/caffe-intel
make -j $NUMTHREADS # downloads MKL DL functions on 1st make

In [None]:
%%bash
# compile Pycaffe
cd ~/caffe-intel
cd python
pip install --upgrade pip
for req in $(cat requirements.txt); do pip install $req; done
cd ~/caffe-intel
make pycaffe
echo "export PYTHONPATH=/home/ubuntu/caffe-intel/python" >> ~/.bashrc
echo 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/home/ubuntu/anaconda2/lib"' >> ~/.bashrc
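# note: sourcing .bashrc only affects this cell's shell; the cells below set
# LD_LIBRARY_PATH and sys.path explicitly
source ~/.bashrc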

Both BVLC Caffe and Intel Optimized Caffe are now installed and compiled. To compare their performance on CPUs, we run CaffeNet for a few iterations on each. CaffeNet is essentially AlexNet with the order of the pooling and normalization layers swapped.
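
To reduce the two timing runs that follow to a single speedup figure, their logs can be parsed in Python. This is a minimal sketch that assumes each log contains a line of the form `Average Forward-Backward: <time> ms.`, which is what BVLC Caffe's `caffe time` tool prints; check that your Intel Optimized Caffe build prints the same line. The helper names are hypothetical.

~~~ python
import re

# Assumption: `caffe time` reports a line like "Average Forward-Backward: 123.4 ms."
AVG_FB = re.compile(r'Average Forward-Backward:\s*([0-9.]+)\s*ms')


def average_forward_backward_ms(log_text):
    """Return the average forward-backward time (ms) found in a `caffe time` log."""
    match = AVG_FB.search(log_text)
    if match is None:
        raise ValueError('no "Average Forward-Backward" line found')
    return float(match.group(1))


def speedup(baseline_log, optimized_log):
    """Baseline time divided by optimized time; > 1 means the optimized build is faster."""
    return average_forward_backward_ms(baseline_log) / average_forward_backward_ms(optimized_log)
~~~

One way to use it is to append `2> bvlc_time.log` (or `2> intel_time.log`) to each `caffe time` command, since the `caffe` tool writes its log to stderr, and then call `speedup(open('bvlc_time.log').read(), open('intel_time.log').read())`. The file names are only examples.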

In [None]:
%%bash
# time BVLC Caffe over 10 iterations
cd ~/caffe-bvlc
build/tools/caffe time --model=/home/ubuntu/caffe-bvlc/models/bvlc_reference_caffenet/deploy.prototxt -iterations 10

In [None]:
%%bash
# time Intel Optimized Caffe over 10 iterations, using the same CaffeNet model definition as above
cd ~/caffe-intel
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/home/ubuntu/anaconda2/lib"
build/tools/caffe time --model=/home/ubuntu/caffe-bvlc/models/bvlc_reference_caffenet/deploy.prototxt -iterations 10

In this tutorial we use the data from the Kaggle Dogs vs. Cats competition to fine-tune a model. To use the data you need to log in to the Kaggle website and download it there, or log in, export your browser's cookies to `cookies.txt`, and download the data as follows:

~~~ bash
cd ~
mkdir dogvscat
wget -x --load-cookies cookies.txt -P dogvscat -nH --cut-dirs=5 http://www.kaggle.com/c/dogs-vs-cats/download/train.zip
~~~

In the remainder of this tutorial, we assume that the data has been downloaded and is already in the `~/dogvscat` folder.

In [None]:
%%bash
# unzip the training data
sudo apt-get -y install unzip
cd /home/ubuntu/dogvscat
unzip train.zip -d .

In [None]:
# Selects 10% of the images (those whose number ends in '2', e.g. cat.12.jpg) for validation
# Prepares train.txt and val.txt with the names and labels of the images (cat = 0, dog = 1)
%cd /home/ubuntu/dogvscat
import os

TRAIN_TEXT_FILE = 'train.txt'
VAL_TEXT_FILE = 'val.txt'
IMAGE_FOLDER = 'train'
LABELS = {'cat': 0, 'dog': 1}

with open(TRAIN_TEXT_FILE, 'w') as fr, open(VAL_TEXT_FILE, 'w') as fv:
    for filename in os.listdir(IMAGE_FOLDER):
        label = LABELS.get(filename[0:3])
        if label is None:
            continue  # skip anything that is not a cat or dog image
        # filename[-5] is the last digit of the image number, e.g. '2' in 'dog.12.jpg'
        out = fv if filename[-5] == '2' else fr
        out.write('%s %d\n' % (filename, label))
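
Before converting the images, it is worth checking the split. The sketch below (helper names are hypothetical) counts the images per label in a Caffe image-list file, where each line has the form `<filename> <label>`.

~~~ python
from collections import Counter


def count_labels(lines):
    """Count entries per label in Caffe image-list lines of the form '<filename> <label>'."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            counts[int(parts[1])] += 1
    return counts


def summarize_split(list_file):
    """Read a list file such as train.txt or val.txt and return its per-label counts."""
    with open(list_file) as f:
        return count_labels(f)
~~~

For example, run `summarize_split('val.txt')` in the notebook. The Kaggle training set has 12,500 cat and 12,500 dog images numbered 0 to 12499, so the "ends in 2" rule should put 1,250 images of each class in `val.txt`.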

In [None]:
%%bash
# convert the dataset to lmdb format
# training set
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/home/ubuntu/anaconda2/lib"
/home/ubuntu/caffe-intel/build/tools/convert_imageset \
  --resize_height=256 \
  --resize_width=256 \
  --shuffle \
  /home/ubuntu/dogvscat/train/ \
  /home/ubuntu/dogvscat/train.txt \
  /home/ubuntu/dogvscat/train_lmdb
# validation set
/home/ubuntu/caffe-intel/build/tools/convert_imageset \
  --resize_height=256 \
  --resize_width=256 \
  --shuffle \
  /home/ubuntu/dogvscat/train/ \
  /home/ubuntu/dogvscat/val.txt \
  /home/ubuntu/dogvscat/val_lmdb

# compute the mean image of the training set
/home/ubuntu/caffe-intel/build/tools/compute_image_mean /home/ubuntu/dogvscat/train_lmdb \
  /home/ubuntu/dogvscat/dogvscat_mean.binaryproto
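
`compute_image_mean` averages each pixel position over all training images, producing a mean image with the same shape as the inputs (3 x 256 x 256 here). The NumPy sketch below reproduces that arithmetic on in-memory arrays; it is an illustration, not Caffe's implementation, and the function name is hypothetical.

~~~ python
import numpy as np


def running_mean_image(images):
    """Per-pixel mean of equally sized images, accumulated one image at a time.

    Each image is an array shaped (channels, height, width), the layout Caffe
    stores in its LMDB records. Summing in float64 avoids uint8 overflow.
    """
    total = None
    count = 0
    for img in images:
        img = np.asarray(img, dtype=np.float64)
        total = img.copy() if total is None else total + img
        count += 1
    if count == 0:
        raise ValueError('no images given')
    return total / count
~~~

The result has the same shape as one input image; calling `.mean(1).mean(1)` on it collapses it to one value per channel, which is the form passed to the classifier later on.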

Reuse the layer definition prototxt file (train_val.prototxt) and make the following two changes:

1. Change the data layer to read the new data (in the TEST-phase data layer, point `source` to `val_lmdb` instead):

~~~
layer {
  name: "data"
  type: "Data"
  data_param {
    source: "train_lmdb" # CHANGED LINE
    ...
  }
  ...
}
~~~

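2. Change the last layer, "fc8". Caffe copies pretrained weights by layer name, so give the layer a new name: it then starts from fresh weights instead of the 1000-class ImageNet weights, whose shape would not match. Set `num_output` to 2 for the two classes (cat and dog). Make the same change in the deploy.prototxt file used for testing:

~~~
layer {
  name: "ip8-ft" # CHANGED LINE
  type: "InnerProduct"
  inner_product_param {
    num_output: 2 # CHANGED LINE
    ...
  }
  ...
}
~~~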

<h4>Fine-tuning guidelines</h4>
- Learn the last layer first (earlier layer weights won't change very much in fine-tuning)
  - Caffe layers have local learning rates: `lr_mult`
  - Freeze all but the last layer for fast optimization by setting `lr_mult: 0` on the other layers
  - Stop if the result is good enough, or continue by fine-tuning other layers
  - Freezing layers also speeds up training
- Alternatively, leave all learning rates as they are and increase those of the last two layers:
  - Last layer by 10x
  - Second to last by 5x
- Reduce the learning rate
  - Drop the initial learning rate (`base_lr` in solver.prototxt) by 10x or 100x relative to the one used to train the original model
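
In the solver, each parameter's effective learning rate is `base_lr` multiplied by its `lr_mult` (before any `lr_policy` decay). The sketch below applies the two guidelines above to CaffeNet's weighted layers; the layer list and helper names are illustrative assumptions, with `ip8-ft` standing in for the renamed last layer. In a real prototxt, freezing a layer means setting `lr_mult: 0` on both its weight and bias `param` blocks.

~~~ python
# Illustrative list of CaffeNet's weighted layers; 'ip8-ft' stands in for the
# renamed last layer (the actual name depends on your finetuning.prototxt).
CAFFENET_LAYERS = ['conv1', 'conv2', 'conv3', 'conv4', 'conv5', 'fc6', 'fc7', 'ip8-ft']


def effective_lrs(base_lr, lr_mults):
    """Per-layer weight learning rate = base_lr * lr_mult (ignoring lr_policy decay).

    Only the weight multiplier is modeled; CaffeNet's prototxt gives the bias
    blob its own lr_mult (twice the weight value).
    """
    return dict((layer, base_lr * mult) for layer, mult in lr_mults.items())


def freeze_all_but_last(layers):
    """Guideline 1: lr_mult 0 for every layer except the last one."""
    mults = dict((layer, 0.0) for layer in layers)
    mults[layers[-1]] = 1.0
    return mults


def boost_last_two(layers, last=10.0, second_to_last=5.0):
    """Guideline 2: keep lr_mult 1, but raise the last layer 10x and the one before it 5x."""
    mults = dict((layer, 1.0) for layer in layers)
    mults[layers[-1]] = last
    mults[layers[-2]] = second_to_last
    return mults
~~~

For example, `effective_lrs(0.001, boost_last_two(CAFFENET_LAYERS))` gives 0.001 for the convolutional layers and fc6, 0.005 for fc7, and 0.01 for ip8-ft.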

<h4>What happens under the hood</h4>
- Creates a new network
- Copies the pretrained weights into the layers whose names match, replacing their initial weights
- Solves the usual way
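
The weight-copying step can be modeled in a few lines of Python. This is a simplified sketch of Caffe's behavior, not its code: weights are copied by layer name, a matching name with a different shape is an error, and layers with new names keep their fresh initialization. That is why the last layer is renamed when `num_output` changes from 1000 to 2. Shapes stand in for the actual weight arrays, and the function name is hypothetical.

~~~ python
def copy_trained_layers(pretrained, new_net):
    """Simplified model of initializing a net from a .caffemodel.

    Both arguments map layer name -> weight shape. Returns sorted lists of
    (copied, fresh) layer names; a shape mismatch on a matching name raises.
    """
    copied, fresh = [], []
    for name in sorted(new_net):
        shape = tuple(new_net[name])
        if name not in pretrained:
            fresh.append(name)  # new name: keeps its random initialization
        elif tuple(pretrained[name]) != shape:
            raise ValueError('shape mismatch for layer %s: %s vs %s'
                             % (name, tuple(pretrained[name]), shape))
        else:
            copied.append(name)  # same name and shape: pretrained weights are reused
    return copied, fresh
~~~

With `pretrained = {'fc7': (4096, 4096), 'fc8': (1000, 4096)}`, keeping the name `fc8` with shape `(2, 4096)` raises the error, while renaming it to `ip8-ft` returns `(['fc7'], ['ip8-ft'])`.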

In [None]:
%%bash
# Download CaffeNet weights trained on ImageNet
/home/ubuntu/caffe-intel/scripts/download_model_binary.py \
/home/ubuntu/caffe-intel/models/bvlc_reference_caffenet

In [None]:
%%bash
# get train_val.prototxt modified for fine-tuning
cd ~/dogvscat
wget https://raw.githubusercontent.com/RodriguezAndres/CaffeTutorial/master/finetuning.prototxt -O finetuning.prototxt
diff finetuning.prototxt \
~/caffe-bvlc/models/bvlc_reference_caffenet/train_val.prototxt

In [None]:
%%bash
# get solver.prototxt modified for fine-tuning
cd ~/dogvscat
wget https://raw.githubusercontent.com/RodriguezAndres/CaffeTutorial/master/solver.prototxt -O solver.prototxt
diff solver.prototxt \
~/caffe-bvlc/models/bvlc_reference_caffenet/solver.prototxt

In [None]:
%%bash
# get deploy.prototxt modified for fine-tuning
cd ~/dogvscat
wget https://raw.githubusercontent.com/RodriguezAndres/CaffeTutorial/master/deploy.prototxt -O deploy.prototxt
diff deploy.prototxt \
~/caffe-bvlc/models/bvlc_reference_caffenet/deploy.prototxt

In [None]:
%%bash
# edit the solver.prototxt file and fine-tune the network with a large learning rate
cd ~/dogvscat
sed -i -e 's/test_interval.*/test_interval: 1/g' solver.prototxt
sed -i -e 's/base_lr.*/base_lr: 0.003/g' solver.prototxt
sed -i -e 's/max_iter.*/max_iter: 10/g' solver.prototxt
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/home/ubuntu/anaconda2/lib"
/home/ubuntu/caffe-intel/build/tools/caffe train -solver solver.prototxt -weights \
/home/ubuntu/caffe-intel/models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel

In [None]:
%%bash
# edit the solver.prototxt file and fine-tune the network with a small learning rate
cd ~/dogvscat
sed -i -e 's/test_interval.*/test_interval: 1/g' solver.prototxt
sed -i -e 's/base_lr.*/base_lr: 0.000001/g' solver.prototxt
sed -i -e 's/max_iter.*/max_iter: 15/g' solver.prototxt
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/home/ubuntu/anaconda2/lib"
/home/ubuntu/caffe-intel/build/tools/caffe train -solver solver.prototxt -weights \
/home/ubuntu/caffe-intel/models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel

In [None]:
%%bash
# edit the solver.prototxt file and fine-tune the network with a good learning rate
cd ~/dogvscat
sed -i -e 's/test_interval.*/test_interval: 50/g' solver.prototxt
sed -i -e 's/base_lr.*/base_lr: 0.001/g' solver.prototxt
sed -i -e 's/max_iter.*/max_iter: 250/g' solver.prototxt
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/home/ubuntu/anaconda2/lib"
/home/ubuntu/caffe-intel/build/tools/caffe train -solver solver.prototxt -weights \
/home/ubuntu/caffe-intel/models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel
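
The three cells above edit `solver.prototxt` with `sed`. If you prefer to make the same edits from Python, a minimal sketch follows; `set_solver_field` is a hypothetical helper that, unlike the `sed` patterns, only matches the key at the start of a line, so comments that mention it are left alone.

~~~ python
import re


def set_solver_field(text, key, value):
    """Return solver.prototxt text with the line 'key: ...' replaced by 'key: value'.

    The match is anchored at the start of a line (after optional spaces or tabs),
    so comments that merely mention the key are not touched.
    """
    pattern = re.compile(r'^([ \t]*)' + re.escape(key) + r'[ \t]*:.*$', re.MULTILINE)
    new_text, count = pattern.subn(
        lambda m: '%s%s: %s' % (m.group(1), key, value), text)
    if count == 0:
        raise KeyError('%s not found in solver text' % key)
    return new_text
~~~

For the last run above, read `solver.prototxt` into `text`, apply `set_solver_field(text, 'test_interval', '50')`, `set_solver_field(text, 'base_lr', '0.001')`, and `set_solver_field(text, 'max_iter', '250')` in turn, and write the result back. Passing values as strings keeps them exactly as written.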

In [None]:
# Convert the training-set mean image from .binaryproto to .npy for the Python classifier
%cd /home/ubuntu/dogvscat
import sys
sys.path.insert(0, '/home/ubuntu/caffe-intel/python')
import caffe
import numpy as np

blob = caffe.proto.caffe_pb2.BlobProto()
with open('dogvscat_mean.binaryproto', 'rb') as f:
    blob.ParseFromString(f.read())
arr = np.array(caffe.io.blobproto_to_array(blob))  # shape (1, 3, 256, 256)
out = arr[0]                                        # drop the batch axis: (3, 256, 256)
np.save('dogvscat_mean.npy', out)

In [None]:
# load trained network
import numpy as np
import sys
sys.path.insert(0, '/home/ubuntu/caffe-intel/python')
import caffe
from IPython.display import Image

MODEL_FILE = '/home/ubuntu/dogvscat/deploy.prototxt' # architecture
PRETRAINED = '/home/ubuntu/dogvscat/dogvscat_iter_250.caffemodel' # weights

# Notes on the preprocessing arguments:
#  mean: per-channel mean of the training images, subtracted from every input
#        (in BGR order, because convert_imageset reads images with OpenCV)
#  channel_swap: caffe.io.load_image returns RGB, so reorder to the BGR order the network was trained on
#  raw_scale: load_image returns values in [0, 1]; multiply by 255 to match the [0, 255] training range
caffe.set_mode_cpu()
net = caffe.Classifier(MODEL_FILE, PRETRAINED,
                       mean=np.load('/home/ubuntu/dogvscat/dogvscat_mean.npy').mean(1).mean(1),
                       channel_swap=(2,1,0),
                       raw_scale=255,
                       image_dims=(256, 256))
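
`caffe.Classifier` hands these arguments to `caffe.io.Transformer`, which applies them in a fixed order: transpose to channels-first, swap channels, multiply by `raw_scale`, then subtract the mean. The NumPy sketch below mirrors that order on a single image (resizing and cropping omitted) so the effect of each argument is visible; the function name is hypothetical.

~~~ python
import numpy as np


def caffe_style_preprocess(image, mean_per_channel, channel_swap=(2, 1, 0), raw_scale=255.0):
    """Sketch of the preprocessing order used by caffe.io.Transformer.

    `image` is H x W x 3 RGB with values in [0, 1] (what caffe.io.load_image returns);
    `mean_per_channel` holds one value per channel, already in BGR order.
    """
    x = np.asarray(image, dtype=np.float32).transpose(2, 0, 1)  # HWC -> CHW
    x = x[list(channel_swap), :, :]                             # RGB -> BGR
    x = x * raw_scale                                           # [0, 1] -> [0, 255]
    mean = np.asarray(mean_per_channel, dtype=np.float32).reshape(-1, 1, 1)
    return x - mean                                             # subtract per-channel mean
~~~

A pure-red RGB pixel ends up in channel 2 after the swap, which is why the mean must be given in BGR order.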

In [None]:
imfile = '/home/ubuntu/dogvscat/train/dog.10.jpg'
Image(imfile)

In [None]:
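# classify an image: the network outputs one score per class, with index 0 = cat
# and index 1 = dog, matching the labels written to train.txt
im = caffe.io.load_image(imfile)
prediction = net.predict([im])
if prediction[0].argmax() == 0:
    print('cat')
else:
    print('dog')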
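
To go beyond a single image, the held-out images listed in `val.txt` (which live in the `train` folder) can be scored in a loop. The sketch below uses hypothetical helper names; `caffe_predictor` wraps any object with a `predict` method, such as the `net` created above, and `load_image` would be `caffe.io.load_image`.

~~~ python
import os


def validation_accuracy(lines, image_dir, predict_label):
    """Fraction of '<filename> <label>' entries that predict_label gets right.

    `lines` is any iterable of lines (an open val.txt works); `predict_label`
    maps an image path to a class index (0 = cat, 1 = dog in this tutorial).
    """
    correct = total = 0
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        filename, label = parts[0], int(parts[1])
        total += 1
        if predict_label(os.path.join(image_dir, filename)) == label:
            correct += 1
    if total == 0:
        raise ValueError('no labelled lines found')
    return float(correct) / total


def caffe_predictor(net, load_image):
    """Turn a classifier with a predict([image]) method into a path -> class-index function."""
    def predict_label(path):
        scores = net.predict([load_image(path)])  # one row of class scores per input image
        return int(scores[0].argmax())
    return predict_label
~~~

In the notebook this could be called as `validation_accuracy(open('/home/ubuntu/dogvscat/val.txt'), '/home/ubuntu/dogvscat/train', caffe_predictor(net, caffe.io.load_image))`. Expect it to take a while on CPU, since `predict` by default averages over ten crops of each image.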