Convolutional neural networks are now capable of outperforming humans on some computer vision tasks, such as classifying images.

That is, given a photograph of an object, the model answers the question of which of 1,000 specific object categories the photograph shows.

A competition-winning model for this task is the VGG model by researchers at Oxford. What is important about this model, besides its capability of classifying objects in photographs, is that the model weights are freely available and can be loaded and used in your own models and applications.

In this notebook, you will learn:
* About the ImageNet dataset and competition and the VGG winning models.
* How to load the VGG model in Keras and summarize its structure.
* How to use the loaded VGG model to classify objects in ad hoc photographs.

Let’s get started.

## ImageNet
[ImageNet](http://www.image-net.org/) is a research project to develop a large database of images with annotations, e.g. images and their descriptions.

The images and their annotations have been the basis for an image classification challenge called the [ImageNet Large Scale Visual Recognition Challenge](http://www.image-net.org/challenges/LSVRC/) or ILSVRC since 2010. The result is that research organizations battle it out on pre-defined datasets to see who has the best model for classifying the objects in images.

For the classification task, images must be classified into one of 1,000 different categories.

For the last few years very deep convolutional neural network models have been used to win these challenges and results on the tasks have exceeded human performance.

## The Oxford VGG Models
Researchers from the [Oxford Visual Geometry Group](http://www.robots.ox.ac.uk/~vgg/), or VGG for short, participate in the ILSVRC challenge.

In 2014, convolutional neural network (CNN) models developed by the VGG [placed first in the localization task and second in the classification task](http://image-net.org/challenges/LSVRC/2014/results). After the competition, the participants wrote up their findings in the paper: ["Very Deep Convolutional Networks for Large-Scale Image Recognition"](https://arxiv.org/abs/1409.1556).

They also made their models and learned weights available online. This allowed other researchers and developers to use a state-of-the-art image classification model in their own work and programs.

This helped to fuel a rash of transfer learning work where pre-trained models are used with minor modification on wholly new predictive modeling tasks, harnessing the state-of-the-art feature extraction capabilities of proven models.

The VGG models are no longer state-of-the-art, but only by a few percentage points. Nevertheless, they are very powerful models and useful both as image classifiers and as the basis for new models that use image inputs.

In the next section, we will see how we can use the VGG model directly in Keras.

## Load the VGG Model in Keras
The VGG model can be loaded and used in the Keras deep learning library.

Keras provides an Applications interface for loading and using pre-trained models. Using this interface, we can create a VGG model using the pre-trained weights provided by the Oxford group and use it as a starting point in our own model, or use it as a model directly for classifying images.

Keras provides both the 16-layer and 19-layer versions via the VGG16 and VGG19 classes. Let’s focus on the VGG16 model.

The model can be created as follows:

In [1]:
from keras.applications.vgg16 import VGG16
model = VGG16()

Using TensorFlow backend.


Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5


We can use the standard Keras tools for inspecting the model structure.

For example, we can print a summary of the network layers as follows:

In [2]:
# summary() prints the table itself and returns None, so no print() is needed
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
__________

The model is large: the full VGG16 network has roughly 138 million parameters. We can also see that, by default, the model expects images as input with a size of 224 x 224 pixels and 3 channels (i.e. RGB color).
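As a quick sanity check on the summary above, the parameter count of each convolutional layer can be reproduced by hand. This is a minimal sketch, assuming standard 3 x 3 convolutions with one bias per filter; the helper name `conv2d_params` is my own and not part of Keras.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter for a standard 2D convolution."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

# (layer name, input channels, filters) for the conv layers shown in the summary
layers = [
    ("block1_conv1", 3, 64),
    ("block1_conv2", 64, 64),
    ("block2_conv1", 64, 128),
    ("block2_conv2", 128, 128),
]

# assumes every layer uses a 3x3 kernel
param_counts = {name: conv2d_params(3, 3, c_in, f) for name, c_in, f in layers}
for name, count in param_counts.items():
    print(name, count)
```

The printed values should line up with the `Param #` column for the same layers.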

The VGG16() function takes a few arguments:
* **include_top** (True): Whether or not to include the fully connected output layers of the model. We don’t need these if we are fitting the model on our own problem.
* **weights** ('imagenet'): Which weights to load. We can specify None to skip loading pre-trained weights if we want to train the model ourselves from scratch.
* **input_tensor** (None): An optional Keras tensor to use as the model's input, for example if we intend to fit the model on new data of a different size.
* **input_shape** (None): The shape of the images that the model is expected to take if we change the input layer. This only applies when include_top is False.
* **pooling** (None): The type of global pooling ('avg' or 'max') to apply to the output of the last convolutional block when include_top is False, for example when we are training a new set of output layers.
* **classes** (1000): The number of classes (i.e. the size of the output vector) for the model. This only applies when include_top is True and no pre-trained weights are loaded.
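To get a feel for how **include_top** and **input_shape** interact, here is a rough sketch of the feature-map shape that comes out of the convolutional part of the network for a given input size. It assumes the usual VGG16 layout (five blocks, each ending in a 2 x 2 max pool with stride 2, and 512 filters in the last block). The function name `conv_base_output_shape` is hypothetical, not a Keras call.

```python
# Assumed VGG16 layout: five blocks, each halving rows and columns once,
# with 512 filters in the final block.
NUM_POOLS = 5
FINAL_CHANNELS = 512

def conv_base_output_shape(height, width):
    """Shape (rows, cols, channels) after the last pooling layer."""
    for _ in range(NUM_POOLS):
        height //= 2  # a 2x2 pool with stride 2 floors odd sizes
        width //= 2
    return (height, width, FINAL_CHANNELS)

default_shape = conv_base_output_shape(224, 224)
small_shape = conv_base_output_shape(128, 128)
print(default_shape)  # (7, 7, 512)
print(small_shape)    # (4, 4, 512)
```

With **pooling** set to 'avg' or 'max', the rows and columns would be collapsed, leaving a single vector of 512 values per image.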

Next, let’s look at using the loaded VGG model to classify ad hoc photographs.

## Develop a Simple Photo Classifier
From the images on Google Drive provided by Limbik, I chose a few and copied them into the test-images directory. For example, below is an image from Paul Rudd's video.

![paul-rudd](test-images/Paul-Rudd.png)

Since the VGG-16 model has been loaded, I now can load the image as pixel data and prepare it to be presented to the network.

Keras provides some tools to help with this step.

First, I use the *load_img()* function to load the image and resize it to the required size of 224×224 pixels.

In [3]:
from keras.preprocessing.image import load_img
# load the Paul Rudd image
paulrudd = load_img('test-images/Paul-Rudd.png', target_size=(224, 224))

Next, I can convert the pixels to a NumPy array so that I can work with it in Keras. I use the *img_to_array()* function for this.

In [4]:
from keras.preprocessing.image import img_to_array
# convert the image pixels to a numpy array
paulrudd = img_to_array(paulrudd)

The network expects one or more images as input; that means the input array will need to be 4-dimensional: samples, rows, columns, and channels.

I only have one sample (one image). I can reshape the array by calling *reshape()* and adding the extra dimension.

In [5]:
# reshape data for the model
paulrudd = paulrudd.reshape((1, paulrudd.shape[0], paulrudd.shape[1], paulrudd.shape[2]))
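The reshape above only adds a leading "samples" axis. As a small illustration on a dummy array (a stand-in for the real image, not the Paul Rudd pixels), the same result can be obtained with `np.expand_dims()` or `np.newaxis`, which some readers may find easier to read:

```python
import numpy as np

# stand-in for the array returned by img_to_array(): rows, columns, channels
image = np.zeros((224, 224, 3), dtype="float32")

# three equivalent ways to add the leading samples dimension
batch_a = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
batch_b = np.expand_dims(image, axis=0)
batch_c = image[np.newaxis, ...]

print(batch_a.shape, batch_b.shape, batch_c.shape)
```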

Next, the image pixels need to be prepared in the same way as the ImageNet training data was prepared. Keras provides a function called *preprocess_input()* to prepare new input for the network.

In [6]:
from keras.applications.vgg16 import preprocess_input
# prepare the image for the VGG model
paulrudd = preprocess_input(paulrudd)
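For intuition, here is a NumPy sketch of what I understand *preprocess_input()* to do for VGG16: reorder the channels from RGB to BGR and subtract the per-channel ImageNet means, without any scaling. The mean values and the function name `vgg_preprocess_sketch` are stated here as assumptions; in practice the Keras function should be used.

```python
import numpy as np

# Assumed per-channel ImageNet means, in BGR order
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype="float32")

def vgg_preprocess_sketch(batch_rgb):
    """Approximate VGG-style preprocessing on a (samples, rows, cols, 3) RGB batch."""
    # reverse the channel axis (RGB -> BGR); astype() returns a copy,
    # so the caller's array is left untouched
    batch_bgr = batch_rgb[..., ::-1].astype("float32")
    # zero-center each channel; pixel values are not rescaled
    return batch_bgr - IMAGENET_BGR_MEANS

# a tiny all-white dummy batch instead of a real photograph
white = np.full((1, 2, 2, 3), 255.0, dtype="float32")
out = vgg_preprocess_sketch(white)
print(out[0, 0, 0])
```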

I am now ready to make a prediction for the Paul Rudd image. I call the *predict()* function on the model in order to get a prediction of the probability of the image belonging to each of the 1000 known object types.

In [7]:
# predict the probability across all output classes
prob = model.predict(paulrudd)

Nearly there. Now I need to interpret the probabilities.

Keras provides a function to interpret the probabilities called *decode_predictions()*. It can return a list of classes and their probabilities in case I would like to present the top 3 objects that may be in the photo.

In [8]:
from keras.applications.vgg16 import decode_predictions
# decode the results into a list of tuples (class, description, probability)
print('Predicted:', decode_predictions(prob, top=3)[0])

Downloading data from https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json
Predicted: [('n06359193', 'web_site', 0.91855925), ('n03782006', 'monitor', 0.027165547), ('n04404412', 'television', 0.009196592)]


Running the example, I can see that the top 3 objects in the image are classified as:
* "web_site" with a 91.9% likelihood.
* "monitor" with a 2.7% likelihood.
* "television" with a 0.9% likelihood.
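Conceptually, decoding the prediction amounts to sorting the probability vector and looking up the labels of the largest entries. The sketch below shows that idea on a toy 5-class vector. The class names and probabilities are made up for illustration, and `top_k` is a hypothetical helper rather than the Keras implementation.

```python
import numpy as np

# made-up labels and probabilities, standing in for the 1000 ImageNet classes
class_names = ["cat", "dog", "car", "tree", "boat"]
probs = np.array([[0.05, 0.70, 0.02, 0.15, 0.08]])  # shape: (samples, classes)

def top_k(prob_row, names, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    order = np.argsort(prob_row)[::-1][:k]  # indices sorted by descending probability
    return [(names[i], float(prob_row[i])) for i in order]

top3 = top_k(probs[0], class_names, k=3)
print(top3)
```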

Okay, let's try a couple more images. Here's one from the video with a dog named 'Tuggy':

![tuggy](test-images/Tuggy.png)

In [9]:
# load the image from file
tuggy = load_img('test-images/Tuggy.png', target_size=(224, 224))
# convert the image pixels to a numpy array
tuggy = img_to_array(tuggy)
# reshape data for the model
tuggy = tuggy.reshape((1, tuggy.shape[0], tuggy.shape[1], tuggy.shape[2]))
# prepare the image for the VGG model
tuggy = preprocess_input(tuggy)
# predict the probability across all output classes
prob = model.predict(tuggy)
# decode the results into a list of tuples (class, description, probability)
print('Predicted:', decode_predictions(prob, top=3)[0])

Predicted: [('n02093428', 'American_Staffordshire_terrier', 0.7632004), ('n02093256', 'Staffordshire_bullterrier', 0.13892087), ('n02108422', 'bull_mastiff', 0.042401902)]


Running the example, I can see that the top 3 objects in the image are classified as: 
* "American_Staffordshire_terrier" with a 76.3% likelihood.
* "Staffordshire_bullterrier" with a 13.9% likelihood.
* "bull_mastiff" with a 4.2% likelihood.

Here's another image from the video with top 5 animal stories:

![farm](test-images/farm.png)

In [10]:
# load the image from file
farm = load_img('test-images/farm.png', target_size=(224, 224))
# convert the image pixels to a numpy array
farm = img_to_array(farm)
# reshape data for the model
farm = farm.reshape((1, farm.shape[0], farm.shape[1], farm.shape[2]))
# prepare the image for the VGG model
farm = preprocess_input(farm)
# predict the probability across all output classes
prob = model.predict(farm)
# decode the results into a list of tuples (class, description, probability)
print('Predicted:', decode_predictions(prob, top=3)[0])

Predicted: [('n02782093', 'balloon', 0.96484685), ('n02692877', 'airship', 0.019180182), ('n03888257', 'parachute', 0.009387733)]


Running the example, I can see that the top 3 objects in the image are classified as:
* "balloon" with a 96.5% likelihood.
* "airship" with a 1.9% likelihood.
* "parachute" with a 0.9% likelihood.
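The same load, convert, reshape, preprocess, predict and decode steps were repeated for each image above. One possible way to tidy this up is a small helper function. This is only a sketch: `classify_image` is a name I made up, and the imports follow the Keras version used in this notebook (newer releases may expose `load_img` and `img_to_array` elsewhere).

```python
def classify_image(model, path, top=3):
    """Return the top predictions for one image file as (class_id, label, probability) tuples."""
    # imported inside the function so that defining the helper does not itself load Keras
    from keras.preprocessing.image import load_img, img_to_array
    from keras.applications.vgg16 import preprocess_input, decode_predictions

    image = load_img(path, target_size=(224, 224))  # resize to the VGG16 input size
    pixels = img_to_array(image)                    # rows, columns, channels
    batch = pixels.reshape((1,) + pixels.shape)     # add the samples dimension
    batch = preprocess_input(batch)                 # same preparation as the training data
    prob = model.predict(batch)
    return decode_predictions(prob, top=top)[0]

# Example usage with the model loaded earlier and one of the test images:
# print('Predicted:', classify_image(model, 'test-images/Tuggy.png'))
```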