ImageNet

Yangqing Jia edited this page Oct 21, 2013 · 1 revision

For a complete example of using the network, check out decaf/demos/notebooks/lena_jeffnet. An HTML version with results can be viewed with the IPython nbviewer.

Load the pretrained imagenet model

The pretrained ImageNet model can be downloaded from Yangqing Jia's homepage. There are two files: imagenet.decafnet.epoch90, which stores the pretrained network (for the most part it follows the cuda-convnet format, except for a few layers we implemented ourselves), and imagenet.decafnet.meta, which stores the ImageNet meta information.

The imagenet classifier is wrapped by a decaf.scripts.imagenet.DecafNet class that does most of the dirty work for you. To instantiate it, do

from decaf.scripts.imagenet import DecafNet
net = DecafNet('/path/to/imagenet.decafnet.epoch90', '/path/to/imagenet.decafnet.meta')

You may see some warnings about dropout. That is fine; it is the behavior we want. Loading should take under 1 second, but for speed it is still best to initialize the network once and reuse it for the whole program.
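One way to follow this advice is to keep the network behind a small caching accessor. This is only a sketch: `get_net` and its `loader` argument are hypothetical names, not part of decaf, and in real use the loader would construct a `DecafNet` as shown above.

```python
# Hypothetical helper (not part of decaf): build the network once, then reuse it.
_NET = None


def get_net(loader):
    """Return the cached network, calling loader() only on the first request."""
    global _NET
    if _NET is None:
        _NET = loader()
    return _NET

# In real use the loader would be something like:
#   lambda: DecafNet('/path/to/imagenet.decafnet.epoch90',
#                    '/path/to/imagenet.decafnet.meta')
```

Every later call to `get_net` returns the same object, so the model files are read only once per process.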

Classify an image

Assume that you have an image called img, stored as a numpy array with dtype uint8 and 3 channels. To perform classification and have the code do the reshaping and cropping for you, simply do

scores = net.classify(img)

This will return a numpy vector of size 1000 as the (normalized) scores for the 1000 classes. This classification call uses the four corners and the center of the image (all of size 227x227) and their mirrored versions, and the final score is averaged over the 10 images. If you would like to see the top k predictions as well as their synset names, do

print net.top_k_prediction(scores, 5)
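For intuition, picking the top k entries out of a score vector can be sketched with plain numpy. This is not how `top_k_prediction` is implemented, and it does not look up synset names; `top_k_indices` is a hypothetical helper and the score vector below is made up.

```python
import numpy as np


def top_k_indices(scores, k):
    """Return the indices of the k largest scores, best first."""
    scores = np.asarray(scores)
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    return order[:k]


# Made-up stand-in for the 1000-dimensional output of net.classify(img).
scores = np.zeros(1000, dtype=np.float32)
scores[[3, 42, 7]] = [0.5, 0.3, 0.2]
best = top_k_indices(scores, 3)
```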

To perform prediction using only the center part, do

scores = net.classify(img, center_only=True)

This might be useful if you would like to extract features from images, and only want one feature vector instead of 10 for each image.
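The ten views described above (four corners and the center, plus their mirrored versions) can be sketched in numpy as follows. This is an illustration under assumptions, not the decaf code: `ten_crops` is a hypothetical helper, the 256x256 input size is assumed, and any resizing, mean subtraction, or flipping that `classify` performs is left out.

```python
import numpy as np

CROP = 227  # network input size, as stated on this page


def ten_crops(img, crop=CROP):
    """Return a (10, crop, crop, channels) array: 4 corners + center, then mirrors."""
    h, w = img.shape[:2]
    if h < crop or w < crop:
        raise ValueError("image is smaller than the crop size")
    top, left = (h - crop) // 2, (w - crop) // 2
    offsets = [(0, 0), (0, w - crop), (h - crop, 0), (h - crop, w - crop),
               (top, left)]
    views = [img[y:y + crop, x:x + crop] for y, x in offsets]
    views += [v[:, ::-1] for v in views]  # horizontal mirror of each crop
    return np.stack(views)


img = np.zeros((256, 256, 3), dtype=np.uint8)  # assumed 256x256 input
crops = ten_crops(img)
# Averaging per-crop scores would then be: per_crop_scores.mean(axis=0)
```

With center_only=True, only the view at index 4 of this layout would be used.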

Advanced: classify a pre-cropped image (You probably don't want to manually do this)

Assume that you have a set of images called images that are already pre-cropped and stored as a 4-dimensional numpy array of size (num x 227 x 227 x 3), with dtype float32, C-contiguous, with the mean subtracted, and flipped vertically (because we accidentally trained our network with upside-down images). You can then do

scores = net.classify_direct(images)
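Getting a batch into that layout might look like the sketch below. `prepare_batch` is a hypothetical helper, and the scalar mean is a placeholder only: the real mean belongs to the pretrained model and is not given on this page. If the real mean is a per-pixel image, whether it is subtracted before or after the flip matters, and the order used here is an assumption.

```python
import numpy as np


def prepare_batch(crops, mean):
    """Convert (num, 227, 227, 3) crops to float32, mean-subtracted, flipped."""
    batch = crops.astype(np.float32) - mean  # subtract the mean
    batch = batch[:, ::-1, :, :]             # flip vertically (reverse the rows)
    # Slicing with a negative step yields a non-contiguous view, so copy it.
    return np.ascontiguousarray(batch, dtype=np.float32)


crops = np.random.randint(0, 256, size=(2, 227, 227, 3)).astype(np.uint8)
mean = np.float32(120.0)  # placeholder value, NOT the model's actual mean
images = prepare_batch(crops, mean)
```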

Obtain intermediate features

The detailed network structure can be found on the decaf demo page. You can extract features for each blob, represented as a rounded rectangle in the network structure. To extract a feature for an image, simply call feature() right after classifying it:

scores = net.classify(img, center_only=True)
feature = net.feature('fc6_cudanet_out')

This will get the 4096-dimensional feature for the image. Note that if you do not specify center_only=True, the returned feature will be a 10x4096 matrix with each row corresponding to the feature of one cropped image.

For the record, here is a list of commonly used features from the network:

  • pool5_cudanet_out: the last convolutional layer output, of size 6x6x256.
  • fc6_cudanet_out: the 4096 dimensional feature after the first fully connected layer.
  • fc6_neuron_cudanet_out: similar to the above feature, but after ReLU, so negative values are set to zero.
  • fc7_cudanet_out: the 4096 dimensional feature after the second fully connected layer.
  • fc7_neuron_cudanet_out: similar to the above feature, but after ReLU, so negative values are set to zero. This is the feature that goes into the final logistic regressor.

For all other layers refer to the network graph.
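The relation between a `_cudanet_out` feature and its `_neuron_cudanet_out` counterpart in the list above is just a ReLU, which can be sketched in numpy. The vector here is a toy stand-in, not real network output.

```python
import numpy as np


def relu(x):
    """Set negative entries to zero and leave non-negative entries unchanged."""
    return np.maximum(x, 0)


fc6 = np.array([-1.5, 0.0, 2.0], dtype=np.float32)  # toy stand-in for fc6_cudanet_out
fc6_neuron = relu(fc6)                               # analogous to fc6_neuron_cudanet_out
```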

Notes

The Decaf implementation uses input images of size 227x227, while the cuda-convnet code uses images of size 224x224. We chose 227x227 simply to have a full convolution (with a 224x224 input, the last row/column of filter positions would cover only 8 pixels in height/width instead of the full 11). We believe that cuda-convnet chose 224 for speed, as that size performs well on GPUs. The resulting performance difference should not be big.
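The arithmetic behind the 8-versus-11 remark can be checked with a few lines of Python. The 11-pixel filter size comes from the text above; the stride of 4 and the absence of padding are assumptions not stated on this page, and `last_window` is a hypothetical helper.

```python
def last_window(size, kernel=11, stride=4):
    """Return (number of window positions, pixels covered by the last window)."""
    # Integer ceiling of (size - kernel) / stride, plus the window at offset 0.
    positions = -(-(size - kernel) // stride) + 1
    start = (positions - 1) * stride        # offset of the last window
    covered = min(kernel, size - start)     # part of it that lies inside the image
    return positions, covered


full = last_window(227)     # last window fits entirely inside the image
partial = last_window(224)  # last window hangs over the image edge
```

Under these assumptions both sizes give the same number of window positions, but only 227 lets the last window cover a full 11 pixels.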

Since we trained our network on a GPU and run it on a CPU, we did observe some performance differences between the two. We are not yet sure what causes them (admittedly, it might be a bug in our code).