<a href="https://colab.research.google.com/github/Deep-Learning-Challenge/challenge-notebooks/blob/master/2.Convolutional%20Neural%20Networks/2.Guided%20Projects/2.Object%20Recognition%20in%20Photographs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" /></a>

# Object Recognition in Photographs

Object recognition, where a model identifies the objects in an image, is a difficult problem on which traditional neural networks fall short. In this lesson, you will discover how to develop and evaluate deep learning models for object recognition in Keras. After completing this step-by-step tutorial, you will know:

* About the CIFAR-10 object recognition dataset and how to load and use it in Keras.
* How to create a simple Convolutional Neural Network for object recognition.
* How to lift performance by creating deeper Convolutional Neural Networks.

Let's get started.

**Note**: You may want to speed up the computation for this tutorial by using GPU rather than CPU hardware. This is a suggestion, not a requirement. The tutorial will work just fine on the CPU.

## Photograph Object Recognition Dataset

The problem of automatically identifying objects in photographs is difficult because of the near-infinite number of permutations of objects, positions, lighting, and so on. It is a well-studied problem in computer vision and, more recently, an important demonstration of the capability of deep learning. A standard computer vision and deep learning dataset for this problem was developed by the Canadian Institute for Advanced Research (CIFAR).

The [CIFAR-10](http://www.cs.toronto.edu/~kriz/cifar.html) dataset consists of 60,000 photos divided into ten classes (hence the name CIFAR-10). Classes include common objects such as airplanes, automobiles, birds, cats, and so on. The dataset is split in a standard way, where 50,000 images are used for training a model and the remaining 10,000 for evaluating its performance. The photos are in color with red, green, and blue channels but are small, measuring 32 x 32-pixel squares.

State-of-the-art results can be achieved using very large convolutional neural networks. You can learn about state-of-the-art results on CIFAR-10 on Rodrigo Benenson's [webpage](http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html). Model performance is reported as classification accuracy: very good performance is above 90%, human performance on the problem is 94%, and state-of-the-art results were at 96% at the time of writing.

## Loading The CIFAR-10 Dataset in Keras

The CIFAR-10 dataset can easily be loaded in Keras. Keras has the facility to automatically download standard datasets like CIFAR-10 and store them in the `~/.keras/datasets` directory using the `cifar10.load_data()` function. This dataset is large at 163 megabytes, so it may take a few minutes to download. Once downloaded, subsequent calls to the function will load the dataset ready for use. 

The dataset is stored as Python pickled training and test sets, ready for use in Keras. Each image is represented as a three-dimensional array, with dimensions for height, width, and the three color channels (red, green, and blue). We can plot images directly using the Matplotlib Python plotting library.

In [1]:
import tensorflow as tf

# Plot ad hoc CIFAR10 instances
from tensorflow.keras.datasets import cifar10
from matplotlib import pyplot
# from scipy.misc import toimage
from PIL import Image

# load data
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

# create a grid of 3x3 images
for i in range(0, 9):
    pyplot.subplot(330 + 1 + i)
    pyplot.imshow(Image.fromarray(X_train[i]))

# show the plot
pyplot.show()

<Figure size 432x288 with 9 Axes>

Running the code creates a 3 x 3 plot of photographs. The images have been scaled up from their small 32 x 32 size, but you can see trucks, horses, and cars. You can also see some distortion in the images that have been forced to a square aspect ratio.

## Simple CNN for CIFAR-10

The CIFAR-10 problem is best solved using a convolutional neural network (CNN). We can quickly start by importing all of the classes and functions we will need in this example.

In [2]:
# Simple CNN model for CIFAR-10
import numpy
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.constraints import MaxNorm
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras import utils

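# Optional (originally needed on Windows): cap the first GPU at 1000 MB of memory.
# The guard skips this step when no GPU is present, so the cell also runs on the CPU.
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1000)])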


As is good practice, we next initialize the random number seed with a constant to ensure the results are reproducible.

In [3]:
# fix random seeds for reproducibility (NumPy and TensorFlow)
seed = 7
numpy.random.seed(seed)
tf.random.set_seed(seed)

Next, we can load the CIFAR-10 dataset.

In [4]:
# load data
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

The pixel values range from 0 to 255 for each of the red, green, and blue channels. It is good practice to work with normalized data. Because the input values are well understood, we can easily normalize to the range 0 to 1 by dividing each value by the maximum observation, 255. Note that the data is loaded as integers, so we cast it to 32-bit floating-point values before performing the division.

In [5]:
# normalize inputs from 0-255 to 0.0-1.0
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train = X_train / 255.0
X_test = X_test / 255.0
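
As a minimal sketch of why the cast comes first, the snippet below applies the same two steps to a small random array standing in for the images (the array `fake_images` is a hypothetical substitute, not the real dataset). Casting to `float32` before dividing keeps the result in 32-bit precision, whereas dividing the raw 8-bit integers directly would produce a 64-bit array that takes twice the memory.

```python
import numpy as np

# Hypothetical stand-in for a batch of CIFAR-10 images:
# 4 images, 32 x 32 pixels, 3 color channels, 8-bit integers
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# Cast first, then scale, so the result stays float32
scaled = fake_images.astype('float32') / 255.0

# Dividing the integers directly yields float64 instead
unscaled_cast = fake_images / 255.0

print(scaled.dtype, unscaled_cast.dtype)
print(scaled.min() >= 0.0, scaled.max() <= 1.0)
```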

The output variables are defined as a vector of integers from 0 to 9, one class label per image. We can use one-hot encoding to transform them into a binary matrix to best model the classification problem. We know there are ten classes for this problem, so we can expect the binary matrix to have a width of 10.

In [6]:
# one hot encode outputs
y_train = utils.to_categorical(y_train)
y_test = utils.to_categorical(y_test)
num_classes = y_test.shape[1]
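
To make the one-hot transformation concrete, here is a small NumPy sketch of the same idea on four made-up labels. It is an illustration of the encoding, not the Keras implementation: each integer label selects the matching row of a 10 x 10 identity matrix.

```python
import numpy as np

# CIFAR-10 labels arrive as a column of integer class indices, shape (n, 1);
# these four values are made up for illustration
labels = np.array([[6], [9], [0], [3]])
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype='float32')[labels.ravel()]

print(one_hot.shape)            # (4, 10)
print(one_hot.argmax(axis=1))   # [6 9 0 3]
```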

Let's start by defining a simple CNN structure as a baseline and evaluate how well it performs on the problem. We will use a structure with two convolutional layers followed by max-pooling and a flattening out of the network to fully connected layers to make predictions. Our baseline network structure can be summarized as follows:

1. Convolutional input layer, 32 feature maps with a size of 3 x 3, a rectifier activation function, and a weight constraint of max norm set to 3.
2. Dropout set to 20%.
3. Convolutional layer, 32 feature maps with a size of 3 x 3, a rectifier activation function, and a weight constraint of max norm set to 3.
4. Max Pool layer with the size 2 x 2.
5. Flatten layer.
6. Fully connected layer with 512 units and a rectifier activation function.
7. Dropout set to 50%.
8. Fully connected output layer with ten units and a softmax activation function.

A logarithmic loss function (categorical cross-entropy) is used with the stochastic gradient descent optimization algorithm, configured with a large momentum and learning rate decay, starting with a learning rate of 0.01. A visualization of the network structure is provided below.

![Summary of the Convolutional Neural Network Structure](../../images/summary_cnn_structure.png)
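
To get a feel for how the learning rate shrinks during training, the sketch below assumes the time-based decay rule used by the older Keras SGD optimizer, applied once per batch update: `lr = initial_lr / (1 + decay * iteration)`. The helper `decayed_lr` and the steps-per-epoch figure (50,000 training images in batches of 32) are assumptions made for illustration.

```python
import math

def decayed_lr(initial_lr, decay, iteration):
    """Time-based decay: the rate shrinks as 1 / (1 + decay * iteration)."""
    return initial_lr / (1.0 + decay * iteration)

epochs = 25
lrate = 0.01
decay = lrate / epochs                     # same setting as the tutorial
steps_per_epoch = math.ceil(50000 / 32)    # assumed: one update per batch of 32

# Learning rate at the start of training and after 1, 10, and 25 epochs
for epoch in (0, 1, 10, 25):
    iteration = epoch * steps_per_epoch
    print(epoch, round(decayed_lr(lrate, decay, iteration), 6))
```

Under these assumptions the rate falls quickly in the first few epochs and then flattens out, which is the usual motivation for this kind of schedule.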

In [7]:
# Create the model
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3), padding='same', activation='relu', kernel_constraint=MaxNorm(3)))
model.add(Dropout(0.2))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same', kernel_constraint=MaxNorm(3)))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(512, activation='relu', kernel_constraint=MaxNorm(3)))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

# Compile model
epochs = 25
lrate = 0.01
decay = lrate/epochs
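# `learning_rate` replaces the deprecated `lr` argument; note that `decay` is only
# accepted by older (legacy) Keras optimizers
sgd = SGD(learning_rate=lrate, momentum=0.9, decay=decay, nesterov=False)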
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 32, 32, 32)        896       
_________________________________________________________________
dropout (Dropout)            (None, 32, 32, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 32, 32, 32)        9248      
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 16, 16, 32)        0         
_________________________________________________________________
flatten (Flatten)            (None, 8192)              0         
_________________________________________________________________
dense (Dense)                (None, 512)               4194816   
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0
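
The parameter counts in the summary above can be checked by hand. The sketch below uses two hypothetical helper functions (not part of Keras) that count one weight per input connection plus one bias per filter or unit.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # Each filter has kernel_h * kernel_w * in_channels weights plus one bias
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    # Each unit has one weight per input plus one bias
    return (inputs + 1) * units

first_conv = conv2d_params(3, 3, 3, 32)        # RGB input, 32 feature maps
second_conv = conv2d_params(3, 3, 32, 32)      # 32 maps in, 32 maps out
flattened = 16 * 16 * 32                       # output of the 2 x 2 max pooling
hidden_dense = dense_params(flattened, 512)

print(first_conv, second_conv, flattened, hidden_dense)
# 896 9248 8192 4194816
```

Almost all of the weights sit in the first fully connected layer, which is why the flattened feature map size matters so much for model size.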



We fit this model with 25 epochs and a batch size of 32. A small number of epochs was chosen to help keep this tutorial moving. Normally, the number of epochs would be one or two orders of magnitude larger for this problem. Once the model is fit, we evaluate it on the test dataset and print out the classification accuracy.

In [8]:
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=epochs, batch_size=32)

# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

Epoch 1/25
...
Epoch 25/25
Accuracy: 69.50%


The classification accuracy and loss are printed each epoch on both the training and test datasets. The model is evaluated on the test set and, in the run shown above, achieves an accuracy of 69.50%, which is good but not excellent.
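
For reference, classification accuracy is simply the fraction of images whose highest-probability class matches the true class. The following NumPy sketch computes it on made-up softmax outputs and one-hot labels; the arrays `probs` and `y_true` are hypothetical and use three classes to keep the example short.

```python
import numpy as np

# Hypothetical softmax outputs for 4 images over 3 classes (each row sums to 1)
probs = np.array([[0.70, 0.20, 0.10],
                  [0.10, 0.30, 0.60],
                  [0.20, 0.50, 0.30],
                  [0.40, 0.35, 0.25]])

# Hypothetical one-hot encoded true labels
y_true = np.array([[1, 0, 0],
                   [0, 0, 1],
                   [1, 0, 0],
                   [1, 0, 0]])

predicted = probs.argmax(axis=1)    # most probable class per image
actual = y_true.argmax(axis=1)      # index of the 1 in each one-hot row
accuracy = (predicted == actual).mean()

print("Accuracy: %.2f%%" % (accuracy * 100))   # Accuracy: 75.00%
```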

We can improve the accuracy significantly by creating a much deeper network. This is what we will look at in the next section.

## Larger CNN for CIFAR-10

We have seen that a simple CNN leaves considerable room for improvement on this complex problem. In this section, we look at scaling up the size and complexity of our model. Let's design a deeper version of the simple CNN above. We can introduce additional rounds of convolutions with many more feature maps. We will use the same pattern of Convolutional, Dropout, Convolutional, and Max Pooling layers.

This pattern will be repeated three times with 32, 64, and 128 feature maps. The effect is an increasing number of feature maps of smaller and smaller size, given the max-pooling layers. Finally, an additional and larger Dense layer will be used at the output end of the network to better translate the large number of feature maps into class values. We can summarize the new network architecture as follows:

1. A convolutional input layer, 32 feature maps with a size of 3 x 3, and a rectifier activation function.
2. Dropout layer at 20%.
3. Convolutional layer, 32 feature maps with a size of 3 x 3, and a rectifier activation function.
4. Max Pool layer with size 2 x 2.
5. Convolutional layer, 64 feature maps with a size of 3 x 3, and a rectifier activation function.
6. Dropout layer at 20%.
7. Convolutional layer, 64 feature maps with a size of 3 x 3, and a rectifier activation function.
8. Max Pool layer with size 2 x 2.
9. Convolutional layer, 128 feature maps with a size of 3 x 3, and a rectifier activation function.
10. Dropout layer at 20%.
11. Convolutional layer, 128 feature maps with a size of 3 x 3, and a rectifier activation function.
12. Max Pool layer with size 2 x 2.
13. Flatten layer.
14. Dropout layer at 20%.
15. Fully connected layer with 1,024 units and a rectifier activation function.
16. Dropout layer at 20%.
17. Fully connected layer with 512 units and a rectifier activation function.
18. Dropout layer at 20%.
19. Fully connected output layer with ten units and a softmax activation function.
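
The shrinking feature maps in the architecture above can be traced with a few lines of arithmetic. This sketch assumes `'same'` padding for every convolution (so only the pooling layers change the height and width) and 2 x 2 pooling with a stride of 2; the helper `pooled_size` is hypothetical.

```python
def pooled_size(size, pool=2):
    # 2 x 2 max pooling with stride 2 halves each spatial dimension
    return size // pool

size = 32            # CIFAR-10 images are 32 x 32 pixels
shapes = []
for filters in (32, 64, 128):
    # With 'same' padding the two convolutions keep height and width unchanged,
    # so only the pooling layer at the end of each block shrinks the maps
    size = pooled_size(size)
    shapes.append((size, size, filters))

flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]

print(shapes)      # [(16, 16, 32), (8, 8, 64), (4, 4, 128)]
print(flattened)   # 2048
```

Under these assumptions the Flatten layer hands 2,048 values to the first fully connected layer, a quarter of the 8,192 values flattened in the simpler model.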

This is a larger network and a bit unwieldy to visualize. We can fit and evaluate this model using the same procedure above and the same number of epochs but a larger batch size of 64, found through some minor experimentation.

In [9]:
# Large CNN model for the CIFAR-10 Dataset
import numpy
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.constraints import MaxNorm
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras import utils

# fix random seeds for reproducibility (NumPy and TensorFlow)
seed = 7
numpy.random.seed(seed)
tf.random.set_seed(seed)

# load data
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

# normalize inputs from 0-255 to 0.0-1.0
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train = X_train / 255.0
X_test = X_test / 255.0

# one hot encode outputs
y_train = utils.to_categorical(y_train)
y_test = utils.to_categorical(y_test)
num_classes = y_test.shape[1]

# Create the model
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3), activation='relu', padding='same'))
model.add(Dropout(0.2))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D())
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(Dropout(0.2))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D())
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Dropout(0.2))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dropout(0.2))
model.add(Dense(1024, activation='relu', kernel_constraint=MaxNorm(3)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu', kernel_constraint=MaxNorm(3)))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))

# Compile model
epochs = 25
lrate = 0.01
decay = lrate/epochs
# `learning_rate` replaces the deprecated `lr` argument
sgd = SGD(learning_rate=lrate, momentum=0.9, decay=decay, nesterov=False)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
model.summary()

# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=epochs, batch_size=64)

# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_2 (Conv2D)            (None, 32, 32, 32)        896       
_________________________________________________________________
dropout_2 (Dropout)          (None, 32, 32, 32)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 32, 32, 32)        9248      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 16, 16, 32)        0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 16, 16, 64)        18496     
_________________________________________________________________
dropout_3 (Dropout)          (None, 16, 16, 64)        0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 16, 16, 64)       

In [13]:
ti5 = 120.0 + (58/60)  # minutes
print("time on i5 (minutes)", ti5)
ti7d = 2984  # seconds
print("time on i7 with Docker (minutes)", ti7d/60)
ti7 = 2484  # seconds
print("time on i7 (minutes)", ti7/60)
print("savings relative to Docker", 1 - (ti7 / ti7d))
print("savings relative to i5", 1 - (ti7 / (ti5 * 60)))

tw10 = 12 + (31/60)  # minutes
print("time on windows 10, amd 6300 six-core + nvidia gforce ftx9600 gpu (minutes)", tw10)
print("savings relative to i5", 1 - (tw10 / ti5))
print("savings relative to i7", 1 - (tw10 / (ti7 / 60)))

time on i5 (minutes) 120.96666666666667
time on i7 with Docker (minutes) 49.733333333333334
time on i7 (minutes) 41.4
savings relative to Docker 0.1675603217158177
savings relative to i5 0.6577569578396252
time on windows 10, amd 6300 six-core + nvidia gforce ftx9600 gpu (minutes) 12.516666666666667
savings relative to i5 0.8965279691375034
savings relative to i7 0.6976650563607085


Running this example prints the classification accuracy and loss on the training and test datasets each epoch. The estimate of classification accuracy for the final model is 78.28%, which is nearly 9 points better than our simpler model.

## Extensions To Improve Model Performance

We have achieved good results on this very difficult problem, but we are still a good way from achieving world-class results. Below are some ideas that you can try to extend the model and improve its performance.

* **Train for More Epochs**. Each model was trained for a very small number of epochs, 25. It is common to train large convolutional neural networks for hundreds or thousands of epochs. I would expect that performance gains can be achieved by significantly raising the number of training epochs.
* **Image Data Augmentation**. The objects in the image vary in their position. Another boost in model performance can likely be achieved by using some data augmentation. Methods such as standardization and random shifts and horizontal image flips may be beneficial.
* **Deeper Network Topology**. The larger network presented is deep, but larger networks could be designed for the problem. This may involve more feature maps closer to the input and perhaps less aggressive pooling. Additionally, standard convolutional network topologies that have been shown useful may be adopted and evaluated on the problem.

What accuracy can you achieve on this problem?
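
As a starting point for the data augmentation idea listed above, here is a minimal NumPy sketch of a random horizontal flip combined with a random shift. The function `augment` and its `max_shift` parameter are hypothetical, and `np.roll` wraps pixels around the border, which is a simplification of the padded shifts that image augmentation tools normally apply.

```python
import numpy as np

def augment(image, rng, max_shift=4):
    """Randomly flip a (height, width, channels) image left-right and shift it."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]    # horizontal flip
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # np.roll wraps pixels around the border; a simple stand-in for a padded shift
    return np.roll(image, shift=(int(dy), int(dx)), axis=(0, 1))

rng = np.random.default_rng(7)

# Synthetic 32 x 32 RGB image with values in the range 0 to 1
image = np.arange(32 * 32 * 3, dtype='float32').reshape(32, 32, 3) / (32 * 32 * 3)
augmented = augment(image, rng)

print(augmented.shape)
```

Applying a function like this to each training image every epoch would show the network slightly different versions of the same photographs, which is the effect the extension aims for.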

In [11]:
from tensorflow.keras.models import model_from_json
# serialize model to JSON
model_json = model.to_json()
with open("2.xmodel.json", "w") as json_file:
    json_file.write(model_json)

# serialize weights to HDF5
model.save_weights("2.xmodel.h5")
print("Saved model to disk")

Saved model to disk


### Improve Fitting

Using the saved model, we can continue training for more epochs each time the notebook is run.

In [12]:
# TensorBoard: load the notebook extension, then point it at the log directory
%load_ext tensorboard
%tensorboard --logdir logs/fit

In [14]:
#checkpoint
import datetime
checkpoint = tf.keras.callbacks.ModelCheckpoint("2.xmodel_b.h5", monitor='val_accuracy', save_best_only=True, mode='max')
logDir = 'logs/fit/' + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard = tf.keras.callbacks.TensorBoard(log_dir=logDir,histogram_freq=1)
callbacks_list = [checkpoint,tensorboard]

In [15]:
# load the model architecture from JSON
with open('2.xmodel.json', 'r') as json_file:
    loaded_model_json = json_file.read()
loadedModel = tf.keras.models.model_from_json(loaded_model_json)

# load the weights saved earlier
loadedModel.load_weights('2.xmodel.h5')

In [16]:
# Compile model
epochs = 250
lrate = 0.01
decay = lrate/epochs
# `learning_rate` replaces the deprecated `lr` argument
sgd = SGD(learning_rate=lrate, momentum=0.9, decay=decay, nesterov=False)
loadedModel.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
loadedModel.summary()

# Fit the model
loadedModel.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=epochs, batch_size=64, callbacks=callbacks_list)

# Final evaluation of the model
scores = loadedModel.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_2 (Conv2D)            (None, 32, 32, 32)        896       
_________________________________________________________________
dropout_2 (Dropout)          (None, 32, 32, 32)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 32, 32, 32)        9248      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 16, 16, 32)        0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 16, 16, 64)        18496     
_________________________________________________________________
dropout_3 (Dropout)          (None, 16, 16, 64)        0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 16, 16, 64)       

## Summary

In this lesson, you discovered how to create deep learning models in Keras for object recognition in photographs. After working through this tutorial, you learned:

* About the CIFAR-10 dataset and how to load it in Keras and plot ad hoc examples from the dataset.
* How to train and evaluate a simple Convolutional Neural Network on the problem.
* How to expand a simple convolutional neural network into a deep convolutional neural network to boost performance on the difficult problem.
* Which extensions, such as training for more epochs, data augmentation, and deeper network topologies, may give a further boost on the difficult object recognition problem.