# Capstone Project of the Machine Learning Engineer Nanodegree

## Convolutional Neural Networks

## Project: Write an Algorithm for a Logo Detection App

---
### Why We're Here 

In this notebook, we make the first steps towards developing an algorithm that could be used as part of a mobile or web app.  At the end of this project, our code will accept any user-supplied image as input.  If a logo is detected in the image, it will provide an estimate of the brand.  

![Sample Dog Output](images/sample_dog_output.png)

In this real-world setting, we need to piece together a series of models to perform different tasks; for instance, the algorithm that detects logos in an image will be different from the CNN that infers the brand.  

### The Road Ahead

We break the notebook into separate steps.  Feel free to use the links below to navigate the notebook.

* [Step 0](#step0): Import Datasets
* [Step 2](#step2): Detect Logos
* [Step 3](#step3): Create a CNN to Classify Brands (from Scratch)
* [Step 4](#step4): Use a CNN to Classify Brands (using Transfer Learning)
* [Step 5](#step5): Create a CNN to Classify Brands (using Transfer Learning)
* [Step 6](#step6): Write the Algorithm
* [Step 7](#step7): Test the Algorithm

---
<a id='step0'></a>
## Step 0: Import Datasets

### Import Logo in the Wild Dataset

In the code cell below, we import a dataset of logo images using the `load_files` function from the scikit-learn library.  The cells in this step populate the following variables:
- `all_files`, `all_targets` - numpy arrays containing the file paths and one-hot encoded labels of all JPEG images
- `train_files`, `test_files` - numpy arrays containing file paths to images
- `train_targets`, `test_targets` - numpy arrays containing one-hot encoded classification labels
- `brand_names` - list of string-valued brand names for translating labels

In [1]:
from sklearn.datasets import load_files       
from keras.utils import np_utils
import numpy as np
from glob import glob

# define function to load train, test, and validation datasets
def load_jpg_dataset(path):
    data = load_files(path)
    logo_files = np.array(data['filenames'])
    indices_of_jpegs = [i for i, j in enumerate(logo_files) if '.jp' in j]
    logo_targets = np_utils.to_categorical(np.array(data['target']), max(data['target']+1))
    return logo_files[indices_of_jpegs], logo_targets[indices_of_jpegs]

# load train, test, and validation datasets
all_files, all_targets = load_jpg_dataset('LogosInTheWild-v2/data/voc_format')
#valid_files, valid_targets = load_dataset('dogImages/valid')
#test_files, test_targets = load_dataset('dogImages/test')

# load list of brand names
brand_names = [item[23:-1] for item in (glob("LogosInTheWild-v2/data/voc_format/*/"))]

print("brand names: ",brand_names)
# print statistics about the dataset
print('There are %d total brand categories.' % len(brand_names))
print('There are %d JPEG images with logos.' % len(all_files))


Using TensorFlow backend.


brand names:  ['voc_format/FedEx', 'voc_format/budweiser', 'voc_format/aspirin', 'voc_format/azeca', 'voc_format/bello digital', 'voc_format/airhawk', 'voc_format/chanel', 'voc_format/caterpillar', 'voc_format/rolex', 'voc_format/toyota', 'voc_format/athalon', 'voc_format/Samsung', 'voc_format/aquapac', 'voc_format/verizon', 'voc_format/LOreal', 'voc_format/American Express', 'voc_format/BMW', 'voc_format/boeing', 'voc_format/sony', 'voc_format/santander', 'voc_format/McDonalds', 'voc_format/panasonic', 'voc_format/nescafe', 'voc_format/hershey', 'voc_format/gucci', 'voc_format/shell', 'voc_format/porsche', 'voc_format/colgate', 'voc_format/huawei', 'voc_format/chevrolet', 'voc_format/bionade', 'voc_format/nivea', 'voc_format/bosch', 'voc_format/costco', 'voc_format/kia', 'voc_format/honda', 'voc_format/uniqlo', 'voc_format/visa', 'voc_format/ford', 'voc_format/ben sherman', 'voc_format/burger king', 'voc_format/lego', 'voc_format/pizza hut', 'voc_format/bank of america', 'voc_format/f
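
The one-hot targets and the brand-name lookup can be illustrated without Keras. The following is a minimal sketch under stated assumptions: `one_hot` and `brand_from_dir` are hypothetical helpers that are not part of this notebook, and the three directory names are simply taken from the output above. One point worth keeping in mind is that `load_files` assigns label indices to the folder names in sorted order, whereas `glob` makes no ordering guarantee, so a name list built with `glob` may need sorting before it is used to translate labels.

```python
import os
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float array with a single 1 per row."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype='float32')
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

def brand_from_dir(dir_path):
    """Extract the brand folder name from a path such as '.../voc_format/FedEx/'."""
    return os.path.basename(os.path.normpath(dir_path))

# hypothetical directory listing, in arbitrary (glob-like) order
dirs = ['LogosInTheWild-v2/data/voc_format/budweiser/',
        'LogosInTheWild-v2/data/voc_format/FedEx/',
        'LogosInTheWild-v2/data/voc_format/aspirin/']

# sorting mimics the label order that load_files assigns to folder names
names = sorted(brand_from_dir(d) for d in dirs)

# made-up integer labels for three images, converted to one-hot rows
targets = one_hot([0, 2, 1], num_classes=len(names))

print(names)
print(targets.argmax(axis=1))
```

With this approach the name at index `i` corresponds to the label `i`, and the name no longer depends on a hard-coded slice offset.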

In [2]:
from sklearn.model_selection import train_test_split

train_files, test_files, train_targets, test_targets = train_test_split(all_files, all_targets, test_size=0.4, random_state=0)
#train_and_val_files, test_files, train_and_val_targets, test_targets = train_test_split(all_files, all_targets, test_size=0.4, random_state=0)
#train_files, val_files, train_targets, val_targets = train_test_split(train_and_val_files, train_and_val_targets, test_size=0.2, random_state=0)

from sklearn.model_selection import ShuffleSplit
n_samples = train_files.shape[0]
cv = ShuffleSplit(n_splits=5, test_size=0.3, random_state=0)

print(train_files.shape)
print(train_targets.shape)

(5656,)
(5656, 109)
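
The split sizes can be checked with a little arithmetic. The sketch below is a simplified model of the rounding, not the library code itself: `train_test_sizes` and `validation_split_sizes` are hypothetical helpers, and the total of 9428 images is an assumption inferred from the 5656 training images printed above and the 3772 test images that are loaded later in this notebook.

```python
import math

def train_test_sizes(n_samples, test_size):
    """Fractional test_size: the test set is rounded up, the training set gets the rest."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

def validation_split_sizes(n_train, validation_split):
    """Keras-style validation_split: the last fraction of the samples is held out."""
    n_fit = int(n_train * (1.0 - validation_split))
    return n_fit, n_train - n_fit

n_total = 5656 + 3772  # assumption: total inferred from the train and test counts
n_train, n_test = train_test_sizes(n_total, test_size=0.4)
n_fit, n_val = validation_split_sizes(n_train, validation_split=0.3)

print(n_train, n_test)  # 5656 3772
print(n_fit, n_val)     # 3959 1697
```

The second line of output matches the "Train on 3959 samples, validate on 1697 samples" message that Keras reports when `validation_split=0.3` is applied to the training set.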


---
<a id='step2'></a>
## Step 2: Detect Logos

In this section, we use a pre-trained [ResNet-50](http://ethereon.github.io/netscope/#/gist/db945b393d40bfa26006) model to detect logos in images.  Our first line of code downloads the ResNet-50 model, along with weights that have been trained on [ImageNet](http://www.image-net.org/), a very large, very popular dataset used for image classification and other vision tasks.  ImageNet contains over 10 million URLs, each linking to an image containing an object from one of [1000 categories](https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a).  Given an image, this pre-trained ResNet-50 model returns a prediction (derived from the available categories in ImageNet) for the object that is contained in the image.

In [3]:
from keras.applications.resnet50 import ResNet50

# define ResNet50 model
ResNet50_model = ResNet50(weights='imagenet')

### Pre-process the Data

When using TensorFlow as backend, Keras CNNs require a 4D array (which we'll also refer to as a 4D tensor) as input, with shape

$$
(\text{nb_samples}, \text{rows}, \text{columns}, \text{channels}),
$$

where `nb_samples` corresponds to the total number of images (or samples), and `rows`, `columns`, and `channels` correspond to the number of rows, columns, and channels for each image, respectively.  

The `path_to_tensor` function below takes a string-valued file path to a color image as input and returns a 4D tensor suitable for supplying to a Keras CNN.  The function first loads the image and resizes it to a square image that is $224 \times 224$ pixels.  Next, the image is converted to an array, which is then resized to a 4D tensor.  In this case, since we are working with color images, each image has three channels.  Likewise, since we are processing a single image (or sample), the returned tensor will always have shape

$$
(1, 224, 224, 3).
$$

The `paths_to_tensor` function takes a numpy array of string-valued image paths as input and returns a 4D tensor with shape 

$$
(\text{nb_samples}, 224, 224, 3).
$$

Here, `nb_samples` is the number of samples, or number of images, in the supplied array of image paths.  It is best to think of `nb_samples` as the number of 3D tensors (where each 3D tensor corresponds to a different image) in your dataset!

In [4]:
from keras.preprocessing import image                  
from tqdm import tqdm

def path_to_tensor(img_path):
    # loads RGB image as PIL.Image.Image type
    img = image.load_img(img_path, target_size=(224, 224))
    # convert PIL.Image.Image type to 3D tensor with shape (224, 224, 3)
    x = image.img_to_array(img)
    # convert 3D tensor to 4D tensor with shape (1, 224, 224, 3) and return 4D tensor
    return np.expand_dims(x, axis=0)

def paths_to_tensor(img_paths):
    list_of_tensors = [path_to_tensor(img_path) for img_path in tqdm(img_paths)]
    return np.vstack(list_of_tensors)
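
The shape bookkeeping described above can be verified with plain numpy. This is a small sketch that uses random arrays as stand-ins for decoded images, so no image files or Keras calls are needed; the variable names are illustrative only.

```python
import numpy as np

# stand-ins for three decoded 224x224 RGB images (random values, not real data)
fake_images = [np.random.rand(224, 224, 3).astype('float32') for _ in range(3)]

# each image gains a leading batch axis: (224, 224, 3) -> (1, 224, 224, 3)
single_tensors = [np.expand_dims(img, axis=0) for img in fake_images]

# stacking along the first axis yields (nb_samples, 224, 224, 3)
batch = np.vstack(single_tensors)

print(single_tensors[0].shape)  # (1, 224, 224, 3)
print(batch.shape)              # (3, 224, 224, 3)
```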

---
<a id='step3'></a>
## Step 3: Create a CNN to Classify Logos (from Scratch)

Now we want a way to predict logo brands from images.  In this step, we create a CNN that classifies logos.  

We do not add too many trainable layers as more parameters mean longer training and we do not have a GPU to accelerate the training process.  Thankfully, Keras provides a handy estimate of the time that each epoch is likely to take.

We mention that the task of classifying small logos in images is considered exceptionally challenging.  

### Pre-process the Data

We rescale the images by dividing every pixel in every image by 255.

In [5]:
from PIL import ImageFile                            
ImageFile.LOAD_TRUNCATED_IMAGES = True                 

# pre-process the data for Keras
train_tensors = paths_to_tensor(train_files).astype('float32')/255
#val_tensors = paths_to_tensor(val_files).astype('float32')/255
test_tensors = paths_to_tensor(test_files).astype('float32')/255

100%|██████████| 5656/5656 [02:40<00:00, 32.10it/s]
100%|██████████| 3772/3772 [01:57<00:00, 32.08it/s]


### (IMPLEMENTATION) Model Architecture



In [7]:
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D
from keras.layers import Dropout, Flatten, Dense
from keras.models import Sequential

model = Sequential()

### TODO: Define your architecture.
model.add(Conv2D (kernel_size = (2,2), filters = 32, input_shape=train_tensors.shape[1:], activation='relu',data_format="channels_last"))
print(model.input_shape)
print(model.output_shape)
model.add(MaxPooling2D(pool_size=2, strides=2))
print(model.output_shape)
model.add(Conv2D (kernel_size = 2, filters = 64, activation='relu'))
print(model.output_shape)
model.add(MaxPooling2D(pool_size = 2, strides=2))
print(model.output_shape)
model.add(GlobalAveragePooling2D(data_format=None))
model.add(Dense(109, activation = 'softmax'))
 
model.summary()

(None, 224, 224, 3)
(None, 223, 223, 32)
(None, 111, 111, 32)
(None, 110, 110, 64)
(None, 55, 55, 64)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 223, 223, 32)      416       
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 111, 111, 32)      0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 110, 110, 64)      8256      
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 55, 55, 64)        0         
_________________________________________________________________
global_average_pooling2d_1 ( (None, 64)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 109)               7085      
Total params: 15,757
Trainable params: 1
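
The output shapes and parameter counts in the summary can be reproduced by hand. The sketch below assumes stride-1 convolutions without padding and pooling without padding, which is what the printed shapes suggest; the helper names are hypothetical and not part of Keras.

```python
def conv2d_params(kernel, in_channels, filters):
    """One kernel x kernel x in_channels weight block plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(in_units, out_units):
    """One weight per input/output pair plus one bias per output unit."""
    return in_units * out_units + out_units

def conv_out(size, kernel):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1

def pool_out(size, pool, stride):
    """Spatial size after pooling without padding (incomplete windows are dropped)."""
    return (size - pool) // stride + 1

# spatial sizes through the four layers, starting from 224 x 224 inputs
s1 = conv_out(224, 2)     # 223
s2 = pool_out(s1, 2, 2)   # 111
s3 = conv_out(s2, 2)      # 110
s4 = pool_out(s3, 2, 2)   # 55
print(s1, s2, s3, s4)

# parameter counts per layer; pooling layers have none
params = [conv2d_params(2, 3, 32), conv2d_params(2, 32, 64), dense_params(64, 109)]
print(params, sum(params))  # [416, 8256, 7085] 15757
```

Global average pooling reduces each of the 64 feature maps to a single number, which is why the final dense layer sees only 64 inputs regardless of the 55 x 55 spatial size.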

### Compile the Model

In [13]:
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

### (IMPLEMENTATION) Train the Model

Train your model in the code cell below.  Use model checkpointing to save the model that attains the best validation loss.

You are welcome to [augment the training data](https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html), but this is not a requirement. 

In [15]:
from keras.callbacks import ModelCheckpoint  

### TODO: specify the number of epochs that you would like to use to train the model.

epochs = 20

### Do NOT modify the code below this line.

checkpointer = ModelCheckpoint(filepath='saved_keras_models/weights.best.from_scratch.hdf5', 
                               verbose=1, save_best_only=True)

model.fit(train_tensors, train_targets,  validation_split=0.3,
          #validation_data=(valid_tensors, valid_targets),
          epochs=epochs, batch_size=20, callbacks=[checkpointer], verbose=1)

Train on 3959 samples, validate on 1697 samples
Epoch 1/20

Epoch 00001: val_loss improved from inf to 4.29928, saving model to saved_keras_models/weights.best.from_scratch.hdf5
Epoch 2/20

Epoch 00002: val_loss improved from 4.29928 to 4.24890, saving model to saved_keras_models/weights.best.from_scratch.hdf5
Epoch 3/20

Epoch 00003: val_loss improved from 4.24890 to 4.18261, saving model to saved_keras_models/weights.best.from_scratch.hdf5
Epoch 4/20

Epoch 00004: val_loss improved from 4.18261 to 4.15900, saving model to saved_keras_models/weights.best.from_scratch.hdf5
Epoch 5/20

Epoch 00005: val_loss improved from 4.15900 to 4.08941, saving model to saved_keras_models/weights.best.from_scratch.hdf5
Epoch 6/20

Epoch 00006: val_loss improved from 4.08941 to 4.05880, saving model to saved_keras_models/weights.best.from_scratch.hdf5
Epoch 7/20

Epoch 00007: val_loss improved from 4.05880 to 4.02870, saving model to saved_keras_models/weights.best.from_scratch.hdf5
Epoch 8/20

Epoch 

<keras.callbacks.History at 0x7fc1aed3fbe0>
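
The "improved from ... to ..." messages follow a simple rule: a checkpoint is written only when the monitored validation loss drops below the best value seen so far. The sketch below imitates that rule in plain Python; `best_checkpoints` is a hypothetical helper, and the loss values are loosely based on the log above, with the 4.20000 entry invented to show a non-improving epoch.

```python
def best_checkpoints(val_losses):
    """Return (epoch, loss) pairs for the epochs at which a checkpoint would be written."""
    best = float('inf')
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            saved.append((epoch, loss))
    return saved

# illustrative validation losses; the fourth value is invented
losses = [4.29928, 4.24890, 4.18261, 4.20000, 4.15900]

for epoch, loss in best_checkpoints(losses):
    print('epoch %d: saving weights (val_loss=%.5f)' % (epoch, loss))
```

Because only improvements overwrite the file, reloading the checkpoint after training restores the weights from the best epoch rather than the last one.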

In [17]:
epochs = 100

#checkpointer2 = ModelCheckpoint(filepath='saved_keras_models/weights-Copy1.best.from_scratch.hdf5', 
#verbose=1, save_best_only=True)
model.load_weights('saved_keras_models/weights.best.from_scratch.hdf5')
model.fit(train_tensors, train_targets, 
          validation_split=0.3,
          epochs=epochs, batch_size=20, callbacks=[checkpointer], verbose=1)

Train on 3959 samples, validate on 1697 samples
Epoch 1/100

Epoch 00001: val_loss improved from 3.82612 to 3.81926, saving model to saved_keras_models/weights.best.from_scratch.hdf5
Epoch 2/100

Epoch 00002: val_loss improved from 3.81926 to 3.81326, saving model to saved_keras_models/weights.best.from_scratch.hdf5
Epoch 3/100

Epoch 00003: val_loss did not improve from 3.81326
Epoch 4/100

Epoch 00004: val_loss improved from 3.81326 to 3.78730, saving model to saved_keras_models/weights.best.from_scratch.hdf5
Epoch 5/100

Epoch 00005: val_loss did not improve from 3.78730
Epoch 6/100

Epoch 00006: val_loss did not improve from 3.78730
Epoch 7/100

Epoch 00007: val_loss improved from 3.78730 to 3.78148, saving model to saved_keras_models/weights.best.from_scratch.hdf5
Epoch 8/100

Epoch 00008: val_loss improved from 3.78148 to 3.76171, saving model to saved_keras_models/weights.best.from_scratch.hdf5
Epoch 9/100

Epoch 00009: val_loss improved from 3.76171 to 3.76036, saving model to 


Epoch 00037: val_loss improved from 3.63846 to 3.62965, saving model to saved_keras_models/weights.best.from_scratch.hdf5
Epoch 38/100

Epoch 00038: val_loss improved from 3.62965 to 3.62283, saving model to saved_keras_models/weights.best.from_scratch.hdf5
Epoch 39/100

Epoch 00039: val_loss did not improve from 3.62283
Epoch 40/100

Epoch 00040: val_loss improved from 3.62283 to 3.61297, saving model to saved_keras_models/weights.best.from_scratch.hdf5
Epoch 41/100

Epoch 00041: val_loss did not improve from 3.61297
Epoch 42/100

Epoch 00042: val_loss did not improve from 3.61297
Epoch 43/100

Epoch 00043: val_loss did not improve from 3.61297
Epoch 44/100

Epoch 00044: val_loss improved from 3.61297 to 3.60475, saving model to saved_keras_models/weights.best.from_scratch.hdf5
Epoch 45/100

Epoch 00045: val_loss improved from 3.60475 to 3.59527, saving model to saved_keras_models/weights.best.from_scratch.hdf5
Epoch 46/100

Epoch 00046: val_loss did not improve from 3.59527
Epoch 47

<keras.callbacks.History at 0x7fc1aed5d208>

### Load the Model with the Best Validation Loss

In [18]:
model.load_weights('saved_keras_models/weights.best.from_scratch.hdf5')

### Test the Model

Try out your model on the test dataset of logo images.  Ensure that your test accuracy is greater than 1%.

In [19]:
# get index of predicted logo brand for each image in test set
brand_predictions = [np.argmax(model.predict(np.expand_dims(tensor, axis=0))) for tensor in test_tensors]

# report test accuracy
test_accuracy = 100*np.sum(np.array(brand_predictions)==np.argmax(test_targets, axis=1))/len(brand_predictions)
print('Test accuracy: %.4f%%' % test_accuracy)

Test accuracy: 22.9852%
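
The accuracy computation can be written in vectorized form. In the sketch below, `probabilities` stands in for the array that a single `model.predict(test_tensors)` call would return; `accuracy_percent` is a hypothetical helper, and the toy numbers are made up for illustration.

```python
import numpy as np

def accuracy_percent(probabilities, one_hot_targets):
    """Percentage of rows whose most probable class matches the one-hot target."""
    predicted = np.argmax(probabilities, axis=1)
    actual = np.argmax(one_hot_targets, axis=1)
    return 100.0 * np.mean(predicted == actual)

# toy example with 3 classes and 4 samples (made-up numbers)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.5, 0.4, 0.1]])
targets = np.eye(3)[[0, 1, 1, 0]]

print('Test accuracy: %.4f%%' % accuracy_percent(probs, targets))  # 75.0000%

# accuracy of uniform random guessing over 109 brands
print('Chance level: %.4f%%' % (100.0 / 109))  # 0.9174%
```

Uniform guessing over 109 brands would be right about 0.92% of the time, so the 1% threshold is only slightly above chance, while the reported accuracy is well above it.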


---
<a id='step4'></a>
## Step 4: Use a CNN to Classify Logos

### Model Architecture

The model uses the pre-trained VGG-16 model as a fixed feature extractor, where the last convolutional output of VGG-16 is fed as input to our model.  We only add a global average pooling layer and a fully connected layer, where the latter contains one node for each brand category and is equipped with a softmax.
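
What the added head computes can be sketched with numpy alone. The example below is an illustration under assumptions, not the trained model: the features are random stand-ins with the shape that the VGG-16 convolutional base produces for 224 x 224 inputs, and the dense weights are random and untrained.

```python
import numpy as np

rng = np.random.RandomState(0)

# stand-in for VGG-16 features of 2 images: (batch, 7, 7, 512) for 224x224 inputs
features = rng.rand(2, 7, 7, 512).astype('float32')

# global average pooling: average over the two spatial axes -> (batch, 512)
pooled = features.mean(axis=(1, 2))

# fully connected layer with random (untrained) weights: 512 -> 109 brands
weights = rng.normal(scale=0.01, size=(512, 109))
bias = np.zeros(109)
logits = pooled @ weights + bias

# softmax turns the logits into one probability per brand
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

print(pooled.shape, probs.shape)
```

Only the dense layer's weights and biases would be learned in this setup, which keeps the number of trainable parameters small compared with fine-tuning the whole network.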

In [1]:
from keras.applications.vgg16 import VGG16
from keras.layers import Dense, Dropout, GlobalAveragePooling2D
from keras.models import Model

# load the VGG-16 convolutional base (without the fully connected top) with ImageNet weights
VGG16_model = VGG16(include_top=False, weights='imagenet', input_shape=train_tensors.shape[1:])
VGG16_model.summary()

# Getting output tensor of the last VGG layer that we want to include
x = VGG16_model.output

# Stacking a small classification head on top of it
x = GlobalAveragePooling2D()(x)
x = Dropout(0.5)(x)
x = Dense(109, activation='softmax')(x)

# Creating new model. Please note that this is NOT a Sequential() model.
custom_model = Model(inputs=VGG16_model.input, outputs=x)

# Make sure that the pre-trained VGG-16 layers are not trainable
for layer in VGG16_model.layers:
    layer.trainable = False

custom_model.summary()

Using TensorFlow backend.


NameError: name 'train_tensors' is not defined

### Compile the Model

In [40]:
#custom_model.load_weights('saved_models/weights.best.VGG16.hdf5')
VGG16_model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
custom_model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

### Train the Model

In [41]:
from keras.callbacks import ModelCheckpoint  
checkpointer3 = ModelCheckpoint(filepath='saved_models/weights.best.VGG16.hdf5', 
                               verbose=1, save_best_only=True)
checkpointer4 = ModelCheckpoint(filepath='saved_models/weights.best.custom.hdf5', 
                               verbose=1, save_best_only=True)
#custom_model.load_weights('saved_models/weights.best.custom.hdf5')
#VGG16_model.fit(train_tensors, train_targets, validation_split=0.2,epochs=10, batch_size=20, callbacks=[checkpointer3], verbose=1)
custom_model.fit(train_tensors, train_targets, validation_split=0.3,epochs=100, batch_size=20, callbacks=[checkpointer4], verbose=1)


Train on 3889 samples, validate on 1667 samples
Epoch 1/10

Epoch 00001: val_loss improved from inf to 16.05041, saving model to saved_models/weights.best.custom.hdf5
Epoch 2/10

KeyboardInterrupt: 

### Load the Model with the Best Validation Loss

In [24]:
custom_model.load_weights('saved_models/weights.best.custom.hdf5')

### Test the Model

Now, we can use the CNN to test how well it identifies brands within our test dataset of logo images.  We print the test accuracy below.

In [26]:
# get index of predicted logo brand for each image in test set
custom_model_predictions = [np.argmax(custom_model.predict(np.expand_dims(feature, axis=0))) for feature in test_tensors]

# report test accuracy
test_accuracy = 100*np.sum(np.array(custom_model_predictions)==np.argmax(test_targets, axis=1))/len(custom_model_predictions)
print('Test accuracy: %.4f%%' % test_accuracy)

KeyboardInterrupt: 

### Predict Brand with the Model