# Common Recipes of Caffe

Despite its popularity, Caffe's documentation is not well centralized. This notebook summarizes common use cases that we have practiced with Caffe. The content is not guaranteed to be accurate and may change in the future, as writing the notebook is itself a learning experience for us.

## Q: How to install Caffe on Ubuntu (with pycaffe)?
***A***: See the [installation guide](http://caffe.berkeleyvision.org/installation.html) and our [cheatsheet](https://github.com/dolaameng/deeplearning-exploration/blob/master/installation.txt).

## Q: How is the Caffe package organized?
***A:*** If you follow the installation guide above, you will get a Caffe root folder somewhere on your disk (e.g., ~/opt/caffe). It contains several subfolders with different purposes.
### 1. subfolders of source code
- **src**: source code for caffe scaffold
- **include**: header files
- **tools**: source code for the main utilities
- **scripts**: auxiliary scripts, e.g., for uploading a model to or downloading a model from a gist
- **cmake**: configuration for compilation of source code
- **matlab**: source for +caffe
- **python**: source for pycaffe

### 2. subfolders of documentation
- **docs**: main documentation, including a tutorial from official website
- **examples**: live code on how to use caffe, including ipython notebooks
- **data**: scripts to download processed data that are used across different tutorial examples

### 3. subfolders of main functionality
- **build/tools**: the main access point for Caffe functionality, including the `caffe` command and tools for common tasks such as `finetune_net`, `compute_image_mean`, `convert_imageset` and `device_query`
- **models**: the default repository for caffe zoo models

The recommended usage pattern is to set `CAFFE_ROOT` as an environment variable or a constant path, and to refer to these relatively stable subfolders from your code.
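
A minimal sketch of this pattern in Python, assuming the `CAFFE_ROOT` environment variable is set and falling back to the `~/opt/caffe` example path otherwise. The helper name `caffe_path` is hypothetical and not part of Caffe.

```python
import os

# Read CAFFE_ROOT from the environment; the fallback is only the example
# location mentioned above, so adjust it to your own installation.
CAFFE_ROOT = os.environ.get("CAFFE_ROOT", os.path.expanduser("~/opt/caffe"))

def caffe_path(*parts, root=CAFFE_ROOT):
    """Join a path relative to the Caffe root folder (hypothetical helper)."""
    return os.path.join(root, *parts)

models_dir = caffe_path("models")
proto_file = caffe_path("src", "caffe", "proto", "caffe.proto")
print(models_dir)
print(proto_file)
```

Keeping every path relative to one root means that moving the installation only requires changing the environment variable.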

-----------

## Q: What are common usages for Caffe?
- train a model with your own data from scratch
- load a model with existing structure and weights into a programming language (python, matlab, c++)
    - modify the structure and fine-tune existing weights
    - fine-tune weights with new data, or fine-tune selected layers
    - evaluate model with new data to extract outputs of different layers for other usage
    - evaluate model on new images to make classification
    
Generally, these tasks can be done either via the command-line interface or via a programming API (e.g., python, matlab). The choice depends on the application.
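
For the command-line route, the sketch below only assembles the argument list for the `caffe` tool; it does not run anything. The `train` action and its `--solver` and `--weights` flags belong to the `caffe` command, while `build_train_command`, the example paths, and the assumption that the binary lives at `build/tools/caffe` under the Caffe root are illustrative.

```python
import os

def build_train_command(caffe_root, solver, weights=None):
    """Assemble (but do not run) a `caffe train` command line."""
    cmd = [os.path.join(caffe_root, "build", "tools", "caffe"), "train",
           "--solver=" + solver]
    if weights is not None:
        # Start from pretrained weights instead of a random initialization.
        cmd.append("--weights=" + weights)
    return cmd

# Hypothetical paths, for illustration only.
cmd = build_train_command("/opt/caffe", "models/mymodel/solver.prototxt")
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.check_call` to actually launch training.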

--------------

## Q: How to find and download a zoo model in Caffe?
**A**: Models are defined and contributed by convention rather than by strict rules. Most caffe models can be found on the [model-zoo page](https://github.com/BVLC/caffe/wiki/Model-Zoo). A contributed model usually includes at least:
- a readme.md file to describe the model, data, and its usage
- to repeat the training: 
    - a `train_val.prototxt` or equivalent to define the structure/parameters of net
    - a `solver.prototxt` to define the optimization algorithm
    - original dataset for the model, with processing scripts if necessary
- to evaluate the model
    - a `*.caffemodel` containing pretrained weights
    - optionally a `deploy.prototxt` specifically for classification. If it is not given, the deployed structure is most of the time the same as the training structure defined in `train_val.prototxt`, or differs only slightly
    - optionally other extra files to interpret the output of the model, e.g., the meaning of labels. These may have been included in the original data
    
You should download those files for a model and put them in a dedicated subfolder (named after the model) under `CAFFE_ROOT/models`. Some models are more complicated to obtain, e.g., [fast-rcnn](https://github.com/rbgirshick/fast-rcnn).
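
To check a downloaded model folder against the conventions above, a small sketch can help. The function `describe_model_dir` is hypothetical, and its file patterns simply mirror the list above; real zoo models may name their files differently.

```python
import glob
import os

def describe_model_dir(model_dir):
    """Report which conventional files are present in a model folder."""
    def has(pattern):
        return bool(glob.glob(os.path.join(model_dir, pattern)))
    return {
        "readme": has("readme.md") or has("README.md"),
        "train_val": has("train_val.prototxt"),
        "solver": has("solver.prototxt"),
        "deploy": has("deploy.prototxt"),
        "weights": has("*.caffemodel"),
    }

# Inspect the current directory as a stand-in for a model folder.
print(describe_model_dir("."))
```

A folder with only `deploy.prototxt` and a `*.caffemodel` file is enough for evaluation, while repeating the training also needs the `train_val` and `solver` entries.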

-------------

## Q: What makes a Caffe model?
**A**: A Caffe model can be viewed from several angles:
- ***Statically***, a caffe **Net** model is a set of **Layers**. Layers can be thought of as the "nodes" of a DAG, and the connections between nodes are the "flow of data", represented as **Blobs** in Caffe. Each layer in the model has a "type" and, optionally, its connected "input" and "output" and its "weights" parameters.
The static model is defined in a ***model prototype (.prototxt)*** file (the text format of Google protobuf). Since the structure and flow (nodes and connections) of the model depend on how it is used (e.g., training, evaluation), there may be multiple model prototype files, e.g., "train_val.prototxt" and "deploy.prototxt".
- ***Dynamically***, there are two computation flows through the model: `forward` and `backward`. Both can run through the whole network or only across several layers. Most of the time, the backward computation requires the model to cache the forward results, which is expensive for a forward-only use case. So the caffe model usually needs to explicitly set the ***force_backward*** parameter to true in the model prototype, which is equivalent to making the DAG of the net *bidirectional*.
- ***Computationally***, you need a solver prototype (usually in ***solver.prototxt***) or a pretrained weight file (***.caffemodel***) to put the model to use.
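
The Net/Layer/Blob vocabulary can be illustrated with a toy in plain Python. This is not Caffe's API: `ToyLayer`, `forward`, and the two example layers are made up for illustration, borrowing only the `bottom`/`top` naming that Caffe uses for a layer's input and output blobs.

```python
class ToyLayer:
    """A node in the DAG: reads `bottom` blobs and writes one `top` blob."""
    def __init__(self, name, bottom, top, fn):
        self.name, self.bottom, self.top, self.fn = name, bottom, top, fn

def forward(layers, blobs):
    """Run layers in definition order, storing every output blob by name."""
    for layer in layers:
        inputs = [blobs[b] for b in layer.bottom]
        blobs[layer.top] = layer.fn(*inputs)
    return blobs

net = [
    ToyLayer("scale", bottom=["data"], top="scaled",
             fn=lambda x: [2.0 * v for v in x]),
    ToyLayer("relu", bottom=["scaled"], top="out",
             fn=lambda x: [max(0.0, v) for v in x]),
]
blobs = forward(net, {"data": [-1.0, 0.5, 3.0]})
print(blobs["out"])  # [0.0, 1.0, 6.0]
```

Every intermediate blob stays in the dictionary after the forward pass, which mirrors why cached forward results make a later backward pass possible but cost memory.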

---------------

## Q: How to understand a model's *train_val.prototxt* or *deploy.prototxt* files?
**A:** Both are text-format prototypes based on the `caffe.proto` definition. They define the structure/parameters of a net model. The first is usually used for the training phase (with validation) and the second for evaluating the model on new data.

The prototxt file itself can be viewed as a set of entries. An entry is either a key:value pair with ":" as the delimiter, or a nested message as defined in `caffe.proto`. Entries are usually not delimited by "," or ";". The minimum scaffold of a model prototype file should include
- ***train_val.prototxt***
```yaml
name: "NameOfModel"
# start with data layer
layer {
    name: "data"
    type: "ImageData"
    ...
}
...
layer {
    name: "accuracy"
    type: "Accuracy"
    ...
}
layer {
    name: "loss"
    type: "SoftmaxWithLoss"
    ...
}
```
- ***deploy.prototxt***
```yaml
name: "NameOfModel"
input: "data"
# input_dim is DEPRECATED; input_shape is recommended instead,
# but input_dim still appears in most existing models
# (Caffe orders the dimensions as batch, channels, height, width)
input_dim: batch_size
input_dim: nchannels
input_dim: img_height
input_dim: img_width
# there is no data layer in deploy.prototxt
...
layer {
    name: "prob"
    type: "Softmax"
    ...
}
```

The intermediate layers should be defined identically in both files. The available layer types are described in the [Caffe layer tutorial](http://caffe.berkeleyvision.org/tutorial/layers.html) and the data layer definition can be found in the [Caffe data layer tutorial](http://caffe.berkeleyvision.org/tutorial/data.html).

Note:
1. the four `input_dim` entries in "deploy.prototxt" are deprecated and replaced by `input_shape`
2. `layers` is deprecated and replaced by `layer`
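
As a sketch of the first note, the four `input_dim` values map onto one `input_shape` block with four `dim` entries, in batch, channel, height, width order. The helper `input_shape_block` is hypothetical and only formats text; the example dimensions are placeholders.

```python
def input_shape_block(batch_size, channels, height, width):
    """Format an `input_shape` entry equivalent to four `input_dim` lines."""
    dims = (batch_size, channels, height, width)
    lines = ["input_shape {"]
    lines += ["  dim: %d" % d for d in dims]
    lines.append("}")
    return "\n".join(lines)

# Placeholder dimensions: 10 images, 3 channels, 227 x 227 pixels.
print(input_shape_block(10, 3, 227, 227))
```

The printed block can replace the four `input_dim` lines that follow `input: "data"` in a `deploy.prototxt`.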

---------------

## Q: How to specify a layer in a model prototype file, e.g., train_val.prototxt and deploy.prototxt?
## TODO
Layer types are described in the [Caffe layer tutorial](http://caffe.berkeleyvision.org/tutorial/layers.html), and the data layer definition can be found in the [Caffe data layer tutorial](http://caffe.berkeleyvision.org/tutorial/data.html).

---------------------

## Q: What is the difference between *train_val.prototxt* and *deploy.prototxt*? Are there any other conventional prototxt files for model structure?
**A:** The main differences between `train_val.prototxt` and `deploy.prototxt` are:
- `train_val.prototxt` specifies the model's data with a ***data layer***, whereas `deploy.prototxt` specifies it with ***NetParameter*** fields, namely *input* and *input_shape* (or four *input_dim* entries in old versions)
- `train_val.prototxt` usually ends with the definition of a ***loss layer*** and optionally an ***accuracy layer***, whereas `deploy.prototxt` usually ends with a ***prob layer*** for class predictions
- `train_val.prototxt` is mainly used with a `solver.prototxt` to train a model, whereas `deploy.prototxt` is mainly used to evaluate the model on new images

--------------

## Q: How to understand a model's *solver.prototxt* file?
**A:** The main purposes of `solver.prototxt` are to (1) train the caffe model, (2) save/restore model weights in `*.caffemodel`, and (3) save/restore solver state in `*.solverstate`.

Common parameters in solver.prototxt and their meanings:

```yaml
## net model and data usage
net: "path/to/train_val.prototxt"
test_iter: 100 # total tested examples = test_iter x test_batch_size
test_interval: 1000 # run a test pass every test_interval training iterations

## learning method
solver_type: SGD # ADAGRAD, NESTEROV
## other learning parameters are detailed in the footnotes below
base_lr: 0.01 # initial value of learning rate
# step policy: decrease learning rate by a factor of gamma, every stepsize iterations
lr_policy: "step" # "fixed", "inv", "multistep", "stepearly" see footnotes for details
gamma: 0.1 # multiply learning rate by 0.1 every stepsize iters
stepsize: 100000 ## training iterations
max_iter: 350000 ## max training iterations
momentum: 0.9 # fixed value for momentum

## snapshots of learned models
snapshot: 10000 # take a snapshot of model every snapshot training iters
snapshot_prefix: "path/to/saved_model"
snapshot_after_train: true # save a snapshot after training

## solver mode
solver_mode: CPU # or GPU
```
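
The iteration counts in the solver interact in simple ways that are easy to check by hand. The sketch below uses the values from the listing above; `test_batch_size = 50` is an assumed value, since the batch size is set in the data layer of `train_val.prototxt`, not in the solver.

```python
test_iter = 100
test_interval = 1000
max_iter = 350000
snapshot = 10000
test_batch_size = 50  # assumption: taken from the test-phase data layer

# Examples evaluated in one test pass.
examples_per_test = test_iter * test_batch_size
# Test passes triggered by test_interval (not counting any initial test).
num_test_passes = max_iter // test_interval
# Periodic snapshots written during training.
num_snapshots = max_iter // snapshot

print(examples_per_test, num_test_passes, num_snapshots)
```

Choosing `test_iter` so that `examples_per_test` matches the size of the validation set makes each test pass cover the data once.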

***Footnotes***
- The `solver_type` parameter specifies the training algorithm. It has three choices, namely Stochastic Gradient Descent (SGD), Adaptive Gradient (ADAGRAD), and Nesterov’s Accelerated Gradient (NESTEROV). In practice, choosing anything other than SGD is uncommon, but ADAGRAD is used in the [mnist example](https://github.com/BVLC/caffe/blob/master/examples/mnist/mnist_autoencoder_solver_adagrad.prototxt)
- the `SGD` solver updates the weights by a linear combination of the negative gradient (weighted by the learning rate $\alpha$) and the previous weight update (weighted by the momentum $\mu$): $$V_{t+1} = \mu V_t - \alpha \nabla L(W_t)$$ $$W_{t+1} = W_t + V_{t+1}$$
    - A good practice for SGD is to initialize the learning rate $\alpha$ to a value around 0.01 and to decrease it by a constant factor (e.g., 10) whenever the loss reaches an apparent "plateau", repeating this several times throughout training.
    - The momentum $\mu$ tends to make SGD both stable and fast, and a common choice is $\mu = 0.9$
    - if learning diverges (e.g., initial loss values are very large or even inf), try reducing `base_lr` to e.g. 0.001
    - the parameter `lr_policy` controls how the learning rate changes with iterations. See this [stackoverflow question](http://stackoverflow.com/questions/30033096/what-is-lr-policy-in-caffe) for details. Basically, `fixed` keeps the learning rate constant; `inv` decays it as `base_lr * (1 + gamma * iter) ^ (-power)`; `step` multiplies it by `gamma` every `stepsize` iterations, which results in a piecewise constant learning rate; `multistep` is like `step` but allows arbitrary intervals, defined by `stepvalue` entries.
- the `AdaGrad` solver ....
- the `NAG` solver ...
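
The SGD update with momentum written above can be sketched on a one-dimensional toy problem. The loss $L(w) = w^2$ and the helper `sgd_momentum_step` are illustrative assumptions, not Caffe code.

```python
def sgd_momentum_step(w, v, grad, lr=0.01, momentum=0.9):
    """One update: V <- momentum * V - lr * grad, then W <- W + V."""
    v_next = momentum * v - lr * grad
    w_next = w + v_next
    return w_next, v_next

w, v = 1.0, 0.0
for _ in range(200):
    grad = 2.0 * w  # gradient of the toy loss L(w) = w**2
    w, v = sgd_momentum_step(w, v, grad)

print(abs(w))  # close to the minimum at w = 0
```

Setting `momentum=0.0` reduces the step to plain gradient descent, which makes the effect of the momentum term easy to compare.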

More details can be found in the [caffe solver tutorial](http://caffe.berkeleyvision.org/tutorial/solver.html).
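
The learning-rate policies mentioned in the footnotes can be written as plain functions of the iteration count. The formulas follow the comments in `caffe.proto`; the function `learning_rate` itself and the example `stepvalues` are illustrative, and `power` is a solver parameter that does not appear in the listing above.

```python
def learning_rate(policy, it, base_lr=0.01, gamma=0.1, stepsize=100000,
                  power=0.75, stepvalues=()):
    """Learning rate at iteration `it` for a few lr_policy values."""
    if policy == "fixed":
        return base_lr
    if policy == "step":
        # Piecewise constant: multiply by gamma every `stepsize` iterations.
        return base_lr * gamma ** (it // stepsize)
    if policy == "inv":
        return base_lr * (1.0 + gamma * it) ** (-power)
    if policy == "multistep":
        # Like "step", but the drops happen at the listed iterations.
        passed = sum(1 for s in stepvalues if it >= s)
        return base_lr * gamma ** passed
    raise ValueError("unknown lr_policy: %s" % policy)

for it in (0, 100000, 200000, 349999):
    print(it, learning_rate("step", it))
```

With the solver values above, the `step` policy drops the rate by a factor of ten at iterations 100000, 200000, and 300000 before training stops at `max_iter`.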

-----------------

## Q: How to take a pre-trained model and fine-tune to new tasks?
## A: [TODO](https://docs.google.com/presentation/d/1UeKXVgRvvxg9OUdh_UiC5G71UMscNPlvArsWER41PsU/edit?pli=1#slide=id.gc2fcdcce7_216_376)
## fine tune last several layers

-------------------------

## Q: Where to find caffe protobuf definition file?
**A:** It is located at `CAFFE_ROOT/src/caffe/proto/caffe.proto`; an online version is available on [github](https://github.com/BVLC/caffe/blob/master/src/caffe/proto/caffe.proto).

-----------------------------------

## Q: How to parse a google prototxt file via programming api?
## Q: How to load a caffe weight file *.caffemodel
## TODO

## References
- [Caffe site](http://caffe.berkeleyvision.org/)
- [google/deepdream](https://github.com/google/deepdream)
- [Stanford CS class CS231n: Convolutional Neural Networks for Visual Recognition](http://cs231n.github.io/)
- [Code Example - Tune Model Parameters](http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/net_surgery.ipynb)
- [Code Example - Classifying New Images with Existing Model](http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/00-classification.ipynb)
- [Code Example - Train a Model via pycaffe](http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/01-learning-lenet.ipynb)
- [Code Example - build a caffe model on iris](http://www.stackoverflow.dluat.com/questions/31385427/how-to-train-a-caffe-model)
- [Caffe notes in Chinese](http://dirlt.com/caffe.html)

### TODO
- ADA and NAG solver description
- example codes reading in official docs
- example code of iris