This notebook is a complete solution for classifying images (logos in this case) using Transfer Learning from a Keras pre-trained network (currently InceptionV3).

**About Data:** The data was collected by a group of students at **Atılım University** from the food courts of several shopping malls in **Ankara, Turkey.** Videos of **5 brand logos** were recorded, and still images were then extracted from these videos. There is also a class called "**None**" for images without any logo. For the user's convenience, the data is already split into two main directories, **Train** and **Test**.


**About Notebook**: We needed to develop an image classification model to help blind people be aware of the logos around them, and we implemented the solution in this notebook. During development, we noticed that there are many resources on Transfer Learning and Image Classification; unfortunately, most of them are either incomplete or lack explanation. We therefore decided to share this notebook for anyone with similar requirements.

**Why study this notebook**: By studying this notebook you will practice:
* How to import **pre-trained** network model and its weights
* How to define and use **Image Data Generators** for train, validation & test data
* How to **augment** training data
* How to **plot images** in Image Data Generators
* How to **create your model** by **Transfer Learning** from InceptionV3 (or any Keras pre-trained model)
* How to set up and use **Callbacks** such as **Checkpoints** and **EarlyStopping** 
* How to **compile** and **train** the model  by setting up **fit_generator**
* How to calculate **steps_per_epoch** and **validation_steps** for fit_generator 
* How to monitor **Training History** using history object
* How to **save** your model, best and last weights
* How to **load** your model, best and last weights from saved files
* How to **evaluate** the model by using evaluate_generator
* How to use your trained model to **predict** the classes of the test images by using **predict_generator**
* How to **decode** the labels of the predicted classes
* How to **save** the **predictions and actual labels** into a CSV file
* How to **measure the success** of your model by using **Accuracy**, **Precision**, and **Recall**
* How to generate **classification_report**
* How to prepare the **confusion matrix** to monitor the classification success
* How to **plot** some sample predictions along with the corresponding true labels

**By the end of this notebook**: We hope that you will have the necessary skills to
* **Understand** Transfer Learning
* **Apply** Image Classification
* **Evaluate** the success of the classification
* **Report** the results

You can leave comments or suggestions to improve this tutorial.
Thanks in advance!
KMK


In [None]:
# Standard data science libraries
import psutil
import humanize
import os
from IPython.display import display_html

# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
print(os.listdir("../input"))
dataDirectory= "../input/logos-bk-kfc-mcdonald-starbucks-subway-none/logos_v3_mini/logos3" 
print(os.listdir(dataDirectory))

# Any results you write to the current directory are saved as output.        

In [None]:
print(os.listdir("../input"))

To apply transfer learning, we first need to download the pre-trained model and its weights. To do so:
* First, click the "**+Add Data**" button in the right toolbar.
* Then search for "**Keras Pretrained Models**" and add it to your notebook.
* Finally, run the code below to copy the necessary files into the "**Keras**" directory, where the notebook will look for them.

In [None]:
!rm -rf ~/.keras
!mkdir -p ~/.keras/models
# not enough space for both
#!cp ../input/keras-pretrained-models/* ~/.keras/models/ 
#!cp ../input/vgg19/* ~/.keras/models
!cp ../input/keras-pretrained-models/*notop* ~/.keras/models/
!cp ../input/keras-pretrained-models/imagenet_class_index.json ~/.keras/models/
#!cp ../input/keras-pretrained-models/resnet50* ~/.keras/models/


# Dependencies

In [None]:
import numpy as np
import keras
from keras import backend as K
from keras.models import Sequential
from keras.models import Model
from keras.layers import Activation
from keras.layers.core import Dense, Flatten
from keras.optimizers import Adam
from keras.metrics import categorical_crossentropy
from keras.preprocessing.image import ImageDataGenerator
from keras.layers.normalization import BatchNormalization
from keras.layers.core import Dropout
from keras.layers.convolutional import *
from keras.callbacks import ModelCheckpoint
from keras.applications.inception_v3 import InceptionV3
from keras.applications.inception_v3 import preprocess_input
from keras.applications.inception_v3 import decode_predictions
from sklearn.metrics import confusion_matrix
from sklearn.metrics import average_precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from keras.models import model_from_json
import itertools
import matplotlib.pyplot as plt
import time
import pandas as pd
%matplotlib inline

# Paths to data

In [None]:
train_path = dataDirectory+'/train'
test_path  = dataDirectory+'/test'
print(os.listdir(train_path))
print(os.listdir(test_path))

# Define Image Data Generators for train, validation & test data
tf.keras.preprocessing.image.ImageDataGenerator:

https://www.tensorflow.org/versions/r1.6/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator

https://keras.io/preprocessing/image/

Generate minibatches of image data with real-time data augmentation.
The data will be looped over (in batches).

## Arguments:

### validation_split: 
Float. Fraction of images reserved for validation (strictly between 0 and 1).

### featurewise_center: 
set input mean to 0 over the dataset.

### samplewise_center: 
set each sample mean to 0.

### featurewise_std_normalization: 
divide inputs by std of the dataset.

### samplewise_std_normalization: 
divide each input by its std.

### zca_whitening: 
apply ZCA whitening.

### zca_epsilon: 
epsilon for ZCA whitening. Default is 1e-6.

### rotation_range: 
degrees (0 to 180).

### width_shift_range: 
fraction of total width, if < 1, or pixels if >= 1.

### height_shift_range: 
fraction of total height, if < 1, or pixels if >= 1.

### shear_range: 
shear intensity (shear angle in degrees).

### zoom_range: 
amount of zoom. if scalar z, zoom will be randomly picked in the range [1-z, 1+z]. A sequence of two can be passed instead to select this range.

etc...
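
The range arguments above are easy to misread, so here is a minimal sketch in plain Python of how two of them can be interpreted. The helper names `zoom_bounds` and `max_shift_in_pixels` are hypothetical and are not part of Keras; they only restate the rules quoted above for a scalar `zoom_range` and for `width_shift_range` / `height_shift_range`.

```python
def zoom_bounds(zoom_range):
    """Return the (lower, upper) zoom factors implied by zoom_range.

    A scalar z stands for the interval [1 - z, 1 + z]; a sequence of two
    numbers is taken as the interval itself.
    """
    if isinstance(zoom_range, (int, float)):
        return (1 - zoom_range, 1 + zoom_range)
    lower, upper = zoom_range
    return (lower, upper)


def max_shift_in_pixels(shift_range, size):
    """Return the largest shift in pixels for an image side of `size` pixels.

    Values below 1 are a fraction of the side length; values of 1 or more
    are already pixels.
    """
    if shift_range < 1:
        return shift_range * size
    return shift_range


# Values used by train_datagen in this notebook, for 224-pixel-wide images
print(zoom_bounds(0.2))
print(max_shift_in_pixels(0.2, 224))
```

With the settings used in this notebook, each augmented image is therefore zoomed by a random factor of roughly 0.8 to 1.2 and shifted by at most about a fifth of its width or height.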



In [None]:
train_datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    fill_mode='nearest',
    validation_split=0.2) # set validation split




# Data generators with flow from directory

flow_from_directory

https://keras.io/preprocessing/image/

Takes the path to a directory & generates batches of augmented data.

## Arguments

### directory: 
Path to the target directory. It should contain ***one subdirectory per class***. Any PNG, JPG, BMP, PPM or TIF images inside each subdirectory's directory tree will be included in the generator.

### target_size: 
Tuple of integers (height, width), default: (256, 256). The dimensions to which all images found will be resized.

### color_mode: 
One of "grayscale", "rgb", "rgba". Default: "rgb". Whether the images will be converted to have 1, 3, or 4 channels.

### classes: 
Optional list of class subdirectories (e.g. ['dogs', 'cats']). Default: None. ***If not provided, the list of classes will be automatically inferred from the subdirectory names/structure under directory, where each subdirectory will be treated as a different class*** (and the order of the classes, which will map to the label indices, will be alphanumeric). The dictionary containing the mapping from class names to class indices can be obtained via the attribute class_indices.

### class_mode: 
One of "categorical", "binary", "sparse", "input", or None. Default: "categorical". Determines the type of label arrays that are returned:
"categorical" will be 2D one-hot encoded labels,
"binary" will be 1D binary labels, "sparse" will be 1D integer labels,
"input" will be images identical to input images (mainly used to work with autoencoders).
If None, no labels are returned (the generator will only yield batches of image data, which is useful to use with model.predict_generator(),  model.evaluate_generator(), etc.). Please note that in case of class_mode None, the data still needs to reside in a subdirectory of directory for it to work correctly.

### batch_size: 
Size of the batches of data (default: 32).

### shuffle: 
Whether to shuffle the data (default: True)

### seed: 
Optional random seed for shuffling and transformations.

### save_to_dir: 
None or str (default: None). This allows you to optionally specify a directory to which to save the augmented pictures being generated (useful for visualizing what you are doing).

### save_prefix: 
Str. Prefix to use for filenames of saved pictures (only relevant if save_to_dir is set).

### save_format: 
One of "png", "jpeg" (only relevant if save_to_dir is set). Default: "png".

### follow_links: 
Whether to follow symlinks inside class subdirectories (default: False).

### subset: 
Subset of data ("training" or "validation") if ***validation_split*** is set in ImageDataGenerator.

### interpolation: 
Interpolation method used to resample the image if the target size is different from that of the loaded image. Supported methods are "nearest", "bilinear", and "bicubic". If PIL version 1.1.3 or newer is installed, "lanczos" is also supported. If PIL version 3.4.0 or newer is installed,  "box" and "hamming" are also supported. By default, "nearest" is used.

### Returns

A DirectoryIterator yielding tuples of (x, y) where x is a numpy array containing a batch of images with shape (batch_size, *target_size, channels) and y is a numpy array of corresponding labels.
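
As a rough illustration of the `classes` and `class_mode` arguments above, the sketch below builds a name-to-index mapping and the two most common label encodings in plain Python. It assumes that inferred class names are ordered with Python's `sorted`, which is how we read "alphanumeric" above, and that an explicit `classes` list keeps its given order. `build_class_indices` and `one_hot` are hypothetical helpers, not Keras functions.

```python
def build_class_indices(class_names, explicit_order=False):
    """Map class names to label indices.

    With explicit_order=True the given order is kept (as assumed for a
    `classes` list); otherwise the names are sorted, mimicking inference
    from subdirectory names.
    """
    names = list(class_names) if explicit_order else sorted(class_names)
    return {name: index for index, name in enumerate(names)}


def one_hot(index, num_classes):
    """Encode one integer label as a one-hot list ("categorical" style)."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


class_indices = build_class_indices(['Subway', 'KFC', 'Other'])
# "sparse" style: one integer per sample
sparse_labels = [class_indices[name] for name in ['KFC', 'Subway']]
# "categorical" style: one one-hot row per sample
categorical_labels = [one_hot(i, len(class_indices)) for i in sparse_labels]

print(class_indices)
print(sparse_labels)
print(categorical_labels)
```

The "categorical" encoding is the one that matches the `categorical_crossentropy` loss used later in this notebook.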


## Load data from directory

### Select classes by name

In [None]:
#['Burger King','HD Iskender','Kahve Dunyasi', 'KFC','McDonalds','Other', 'Ozsut','Popeyes',  'Starbucks', 'Subway', 'Tavuk Dunyasi'] 
selectedClasses = ['Burger King', 'KFC','McDonalds','Other', 'Starbucks', 'Subway'] 

### Load images from the selected directories

In [None]:
batchSize=32


train_generator = train_datagen.flow_from_directory(
    train_path,
    target_size=(224, 224),
    batch_size=batchSize,
    classes=selectedClasses,
    subset='training') # set as training data

validation_generator = train_datagen.flow_from_directory(
    train_path, # same directory as training data
    target_size=(224, 224),
    batch_size=batchSize,
    classes=selectedClasses,
    subset='validation') # set as validation data

test_generator = ImageDataGenerator().flow_from_directory(
    test_path, 
    target_size=(224,224), 
    classes=selectedClasses,
    shuffle= False,
    batch_size = batchSize)# set as test data

### Number of samples of each class in all data generators

In [None]:
print ("In train_generator ")
for cls in range(len (train_generator.class_indices)):
    print(selectedClasses[cls],":\t",list(train_generator.classes).count(cls))
print ("") 

print ("In validation_generator ")
for cls in range(len (validation_generator.class_indices)):
    print(selectedClasses[cls],":\t",list(validation_generator.classes).count(cls))
print ("") 

print ("In test_generator ")
for cls in range(len (test_generator.class_indices)):
    print(selectedClasses[cls],":\t",list(test_generator.classes).count(cls))



# Auxiliary functions for plotting images

In [None]:
# Plots images with labels within a Jupyter notebook
def plots(ims, figsize = (22,22), rows=4, interp=False, titles=None, maxNum = 9):
    # Convert a batch of arrays to uint8 so that imshow displays 0-255 values
    if isinstance(ims[0], np.ndarray):
        ims = np.array(ims).astype(np.uint8)
        # Move channels-first batches to channels-last, as imshow expects
        if(ims.shape[-1] != 3):
            ims = ims.transpose((0,2,3,1))
           
    f = plt.figure(figsize=figsize)
    # Ceiling division so that rows * cols is always >= maxNum
    cols = (maxNum + rows - 1) // rows
    for i in range(maxNum):
        sp = f.add_subplot(rows, cols, i+1)
        sp.axis('Off')
        if titles is not None:
            sp.set_title(titles[i], fontsize=20)
        plt.imshow(ims[i], interpolation = None if interp else 'none')   

# Plot some train data

In [None]:
train_generator.reset()
imgs, labels = train_generator.next()

#print(labels)

labelNames=[]
labelIndices=[np.where(r==1)[0][0] for r in labels]
#print(labelIndices)

for ind in labelIndices:
    for labelName,labelIndex in train_generator.class_indices.items():
        if labelIndex == ind:
            #print (labelName)
            labelNames.append(labelName)

#labels

In [None]:
plots(imgs, rows=4, titles = labelNames, maxNum=8)

# Create model by Transfer Learning from InceptionV3

In [None]:
#InceptionV3

base_model = InceptionV3(weights='imagenet', 
                                include_top=False, 
                                input_shape=(224, 224,3))
base_model.trainable = False

x = base_model.output
x = keras.layers.GlobalAveragePooling2D()(x)
# add dropout for regularization
x = Dropout(0.5)(x)
# and a softmax layer with one output per selected class (6 here)
predictions = Dense(len(selectedClasses), activation='softmax')(x)

# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)


model.summary()



# Usage of callbacks

https://keras.io/callbacks/


A callback is a set of functions to be applied at given stages of the training procedure. You can use callbacks to get a view on internal states and statistics of the model during training. You can pass a list of callbacks (as the keyword argument callbacks) to the .fit() method of the Sequential or Model classes. The relevant methods of the callbacks will then be called at each stage of the training.


## History

keras.callbacks.History()

Callback that records events into a History object.

This callback is automatically applied to every Keras model. 
***The History object gets returned by the fit method of models.***


## ModelCheckpoint

***keras.callbacks.ModelCheckpoint(filepath, monitor='val_loss', verbose=0, save_best_only=False, save_weights_only=False, mode='auto', period=1)***


Save the model after every epoch.


filepath can contain named formatting options, which will be filled with the value of epoch and the keys in logs (passed in on_epoch_end).


For example: if filepath is weights.{epoch:02d}-{val_loss:.2f}.hdf5, then the model checkpoints will be saved with the epoch number and the validation loss in the filename.
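
The placeholders in `filepath` use Python's standard string formatting. The sketch below shows what the example template expands to; the epoch number and the metric values are made up for illustration and are not results from this notebook.

```python
# Template from the example above
filepath_template = "weights.{epoch:02d}-{val_loss:.2f}.hdf5"

# Hypothetical values, as they might appear at the end of an epoch
logs = {"val_loss": 0.4567, "val_acc": 0.8312}
filename = filepath_template.format(epoch=3, **logs)

print(filename)  # weights.03-0.46.hdf5
```

Keys that the template does not mention (here `val_acc`) are simply ignored by `str.format`.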

### Arguments

***filepath:*** string, path to save the model file. 

***monitor:*** quantity to monitor. 

***verbose:*** verbosity mode, 0 or 1. 

***save_best_only:*** if save_best_only=True, the latest best model according to the quantity monitored will not be overwritten. 


***mode:*** one of {auto, min, max}. 

***If save_best_only=True,*** the decision to overwrite the current save file is made based on either the maximization or the minimization of the monitored quantity. For val_acc, this should be max, for val_loss this should be min, etc. 

***In auto mode,*** the direction is automatically inferred from the name of the monitored quantity. 

***save_weights_only:*** if True, then only the model's weights will be saved (model.save_weights(filepath)), else the full model is saved (model.save(filepath)). 

***period:*** Interval (number of epochs) between checkpoints.
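
To make the interaction of `save_best_only` and `mode` concrete, here is a small simulation in plain Python. It is only a sketch of the decision rule described above, not the Keras implementation; `epochs_that_save` is a hypothetical helper and the metric values are invented.

```python
def epochs_that_save(values, mode):
    """Return the 1-based epochs at which a 'best only' checkpoint is written.

    values: monitored quantity at the end of each epoch
    mode:   'max' if larger is better (e.g. accuracy),
            'min' if smaller is better (e.g. loss)
    """
    if mode not in ('min', 'max'):
        raise ValueError("mode must be 'min' or 'max'")
    best = None
    saved = []
    for epoch, value in enumerate(values, start=1):
        improved = (
            best is None
            or (mode == 'max' and value > best)
            or (mode == 'min' and value < best)
        )
        if improved:
            best = value
            saved.append(epoch)
    return saved


val_acc = [0.70, 0.82, 0.80, 0.85]   # invented validation accuracies
val_loss = [0.90, 0.60, 0.65, 0.55]  # invented validation losses

print(epochs_that_save(val_acc, mode='max'))
print(epochs_that_save(val_loss, mode='min'))
```

In both invented runs the third epoch is worse than the second, so no file would be written for it and the earlier best weights stay on disk.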

### Example:

***Automatic rename with epoch number and val accuracy:***

filepath="checkpoints/weights-improvement-epoch-{epoch:02d}-val_acc-{val_acc:.2f}.hdf5"

checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')

callbacks_list = [checkpoint]


## EarlyStopping

***keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=0, verbose=0, mode='auto', baseline=None, restore_best_weights=False)***

Stop training when a monitored quantity has stopped improving.

### Arguments

***monitor:*** quantity to be monitored. 

***min_delta:*** minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than min_delta, will count as no improvement. 

***patience:*** number of epochs with no improvement after which training will be stopped. 

***verbose:*** verbosity mode. 

***mode:*** one of {auto, min, max}. **In min mode,** training will stop when the quantity monitored has stopped decreasing; **in max mode** it will stop when the quantity monitored has stopped increasing; **in auto mode,** the direction is automatically inferred from the name of the monitored quantity. 

***baseline:*** Baseline value for the monitored quantity to reach. Training will stop if the model doesn't show improvement over the baseline. 

***restore_best_weights:*** whether to restore model weights from the epoch with the best value of the monitored quantity. **If False,** the model weights obtained at the last step of training are used.
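
The effect of `patience` and `min_delta` is easiest to see on a short list of numbers. The function below is a simplified model of the 'min' mode described above, written in plain Python under the assumption that an epoch counts as an improvement only when the loss falls below the best value so far by more than `min_delta`. `early_stopping_epoch` is a hypothetical helper and the loss values are invented; it is not the Keras callback itself.

```python
def early_stopping_epoch(losses, patience=0, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None.

    Mimics 'min' mode: an epoch counts as an improvement only if the loss
    drops below the best value so far by more than min_delta.
    """
    best = float('inf')
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


val_losses = [1.0, 0.8, 0.79, 0.81, 0.80, 0.82]  # invented values

print(early_stopping_epoch(val_losses, patience=2))
print(early_stopping_epoch(val_losses, patience=2, min_delta=0.05))
```

With `min_delta=0.05` the small drop from 0.8 to 0.79 no longer counts as progress, so the simulated run stops one epoch earlier.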








In [None]:
#Automatic rename with epoch number and val accuracy:
#filepath="checkpoints/weights-improvement-epoch-{epoch:02d}-val_acc-{val_acc:.2f}.hdf5"


 
modelName= "InceptionTutorial"
#save the best weights over the same file with the model name

#filepath="checkpoints/"+modelName+"_bestweights.hdf5"
filepath=modelName+"_bestweights.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
callbacks_list = [checkpoint]


# Compile the model 

In [None]:
model.compile(Adam(lr=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model

## Set up fit_generator
https://keras.io/models/model/

fit_generator(generator, steps_per_epoch=None, epochs=1, verbose=1, callbacks=None, validation_data=None, validation_steps=None, class_weight=None, max_queue_size=10, workers=1, use_multiprocessing=False, shuffle=True, initial_epoch=0)

Trains the model on data generated batch-by-batch by a Python generator (or an instance of Sequence).

The generator is run in parallel to the model, for efficiency. For instance, this allows you to do real-time data augmentation on images on CPU in parallel to training your model on GPU.

The use of keras.utils.Sequence guarantees the ordering and guarantees the single use of every input per epoch when using use_multiprocessing=True.

### Arguments

#### generator: 
A generator or an instance of Sequence (keras.utils.Sequence) object in order to avoid duplicate data when using multiprocessing. The output of the generator must be either
* a tuple (inputs, targets)
* a tuple (inputs, targets, sample_weights).
This tuple (a single output of the generator) makes a single batch. Therefore, all arrays in this tuple must have the same length (equal to the size of this batch). Different batches may have different sizes. For example, the last batch of the epoch is commonly smaller than the others, if the size of the dataset is not divisible by the batch size. The generator is expected to loop over its data indefinitely. An epoch finishes when steps_per_epoch batches have been seen by the model.

#### steps_per_epoch: 
Integer. Total number of steps (batches of samples) to yield from generator before declaring one epoch finished and starting the next epoch. ***It should typically be equal to the number of samples of your dataset divided by the batch size***. Optional for Sequence: if unspecified, will use the len(generator) as a number of steps.

#### epochs: 
Integer. Number of epochs to train the model. An epoch is an iteration over the entire data provided, as defined by steps_per_epoch. Note that in conjunction with initial_epoch, epochs is to be understood as "final epoch". The model is not trained for a number of iterations given by epochs, but merely until the epoch of index epochs is reached.

#### verbose: 
Integer. 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch.

#### callbacks: 
List of keras.callbacks.Callback instances. List of callbacks to apply during training. See callbacks.

#### validation_data: 
This can be either

* a generator or a Sequence object for the validation data
* tuple (x_val, y_val)
* tuple (x_val, y_val, val_sample_weights)
on which to evaluate the loss and any model metrics at the end of each epoch. The model will not be trained on this data.

#### validation_steps: 
Only relevant if validation_data is a generator. Total number of steps (batches of samples) to yield from validation_data generator before stopping at the end of every epoch. ***It should typically be equal to the number of samples of your validation dataset divided by the batch size.*** Optional for Sequence: if unspecified, will use the len(validation_data) as a number of steps.

#### class_weight: 
Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only). This can be useful to tell the model to "pay more attention" to samples from an under-represented class.
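
One common way to fill such a dictionary is the "balanced" heuristic, which weights each class by `n_samples / (n_classes * count)`; scikit-learn uses the same rule for `class_weight='balanced'`. The sketch below is plain Python with invented counts. This notebook does not pass `class_weight`, so treat it as an optional illustration; `balanced_class_weights` is a hypothetical helper.

```python
def balanced_class_weights(class_counts):
    """Compute weights as n_samples / (n_classes * count) for every class."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        cls: n_samples / (n_classes * count)
        for cls, count in class_counts.items()
    }


# Invented counts: class index -> number of training images
counts = {0: 100, 1: 50, 2: 50}
class_weight = balanced_class_weights(counts)
print(class_weight)
```

The resulting dictionary could then be passed as the `class_weight` argument of `fit_generator`, so that the rarer classes contribute more to the loss.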

#### max_queue_size: 
Integer. Maximum size for the generator queue. If unspecified, max_queue_size will default to 10.

#### workers: 
Integer. Maximum number of processes to spin up when using process-based threading. If unspecified, workers will default to 1. If 0, will execute the generator on the main thread.

#### use_multiprocessing: 
Boolean. If True, use process-based threading. If unspecified, use_multiprocessing will default to False. Note that because this implementation relies on multiprocessing, you should not pass non-picklable arguments to the generator as they can't be passed easily to children processes.

#### shuffle: 
Boolean. Whether to shuffle the order of the batches at the beginning of each epoch. Only used with instances of Sequence (keras.utils.Sequence). Has no effect when steps_per_epoch is not None.

#### initial_epoch: 
Integer. Epoch at which to start training (useful for resuming a previous training run).

### Returns

#### A History object. 
Its History.history attribute is a record of training loss values and metrics values at successive epochs, as well as validation loss values and validation metrics values (if applicable).


### Calculate steps_per_epoch and validation_steps for fit_generator 
steps_per_epoch is the total number of steps (batches of samples) to yield from the generator before declaring one epoch finished and starting the next. ***It should typically be equal to the number of samples of your dataset divided by the batch size***, rounded up so that the final, smaller batch is also counted. validation_steps is computed the same way from the number of validation samples.


In [None]:
stepsPerEpoch= (train_generator.samples+ (batchSize-1)) // batchSize
print("stepsPerEpoch: ", stepsPerEpoch)

validationSteps=(validation_generator.samples+ (batchSize-1)) // batchSize
print("validationSteps: ", validationSteps)


#validationSteps=(test_generator.samples+ (batchSize-1)) // batchSize
#print("validationSteps: ", validationSteps)
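
The expression `(samples + (batchSize-1)) // batchSize` used above is integer ceiling division. The short sketch below checks it against `math.ceil` for a few invented sample counts; `steps_for` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
import math


def steps_for(num_samples, batch_size):
    """Number of batches needed to see every sample once (ceiling division)."""
    return (num_samples + batch_size - 1) // batch_size


# The integer formula agrees with math.ceil for a few invented sample counts
for n in (1, 96, 100, 1000):
    assert steps_for(n, 32) == math.ceil(n / 32)

print(steps_for(100, 32))  # 4: three full batches of 32 plus one batch of 4
```

Plain floor division (`100 // 32`) would give 3 here and silently skip the last four images in every epoch.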




## Train
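Training for more epochs usually improves accuracy, up to the point where the model starts to overfit. The cell below uses 3 epochs to keep the run short; for a better model try, for example: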

**epochs = 30**

In [None]:
train_generator.reset()
validation_generator.reset()

# Fit the model
history = model.fit_generator(
    train_generator, 
    validation_data = validation_generator,
    epochs = 3,
    steps_per_epoch = stepsPerEpoch,
    validation_steps= validationSteps,
    callbacks=callbacks_list,
    verbose=1)




## Show Training History
We can plot the accuracy and loss values for each epoch using the history object as follows.

In [None]:
# list all data in history
print(history.history.keys())
# summarize history for accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

# Save the model and last weights

In [None]:
timestr = time.strftime("%Y%m%d_%H%M%S")

# serialize model to JSON
model_json = model.to_json()
with open(timestr+"_"+modelName+"_MODEL_3"+".json", "w") as json_file:
    json_file.write(model_json)
# serialize weights to HDF5
model.save_weights(timestr+"_"+modelName+"_3_LAST_WEIGHTS_"+".h5")


# Load the model and best weights

In [None]:

# load json and create model
# (replace the file name with the JSON file written by the save cell above)
with open('20190107_220958_InceptionTutorial_MODEL_3.json', 'r') as json_file:
    loaded_model_json = json_file.read()
model = model_from_json(loaded_model_json)



In [None]:
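# load weights into new model
model.load_weights("InceptionTutorial_bestweights.hdf5")
# a model rebuilt from JSON is not compiled, so compile it again before evaluating
model.compile(Adam(lr=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])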

# Evaluate the model

***evaluate_generator(generator, steps=None, max_queue_size=10, workers=1, use_multiprocessing=False, verbose=0)***

Evaluates the model on a data generator.

The generator should return the same kind of data as accepted by test_on_batch.

## Arguments

***generator:*** Generator yielding tuples (inputs, targets) or (inputs, targets, sample_weights) or an instance of Sequence (keras.utils.Sequence) object in order to avoid duplicate data when using multiprocessing.

***steps:*** Total number of steps (batches of samples) to yield from generator before stopping. Optional for Sequence: if unspecified, will use the len(generator) as a number of steps.

***max_queue_size:*** maximum size for the generator queue

***workers:*** Integer. Maximum number of processes to spin up when using process based threading. If unspecified, workers will default to 1. If 0, will execute the generator on the main thread.

***use_multiprocessing:*** if True, use process based threading. Note that because this implementation relies on multiprocessing, you should not pass non picklable arguments to the generator as they can't be passed easily to children processes.

***verbose:*** verbosity mode, 0 or 1.

***Returns***
Scalar test loss (if the model has a single output and no metrics) or list of scalars (if the model has multiple outputs and/or metrics). The attribute model.metrics_names will give you the display labels for the scalar outputs.

In [None]:
validation_generator.reset()
score = model.evaluate_generator(validation_generator, (validation_generator.samples + (batchSize-1)) //batchSize)
print("For validation data set; Loss: ",score[0]," Accuracy: ", score[1])

In [None]:
test_generator.reset()
score = model.evaluate_generator(test_generator, (test_generator.samples + (batchSize-1)) // batchSize)
print("For test data set; Loss: ",score[0]," Accuracy: ", score[1])

# Predict Generator
https://keras.io/models/sequential/

predict_generator(generator, steps=None, max_queue_size=10, workers=1, use_multiprocessing=False, verbose=0)

Generates predictions for the input samples from a data generator.

The generator should return the same kind of data as accepted by predict_on_batch.

## Arguments

### generator: 
Generator yielding batches of input samples or an instance of Sequence (keras.utils.Sequence) object in order to avoid duplicate data when using multiprocessing.

### steps: 
Total number of steps (batches of samples) to yield from generator before stopping. Optional for Sequence: if unspecified, will use the len(generator) as a number of steps.

### max_queue_size: 
Maximum size for the generator queue.

### workers: 
Integer. Maximum number of processes to spin up when using process based threading. If unspecified, workers will default to 1. If 0, will execute the generator on the main thread.

### use_multiprocessing: 
If True, use process based threading. Note that because this implementation relies on multiprocessing, you should not pass non picklable arguments to the generator as they can't be passed easily to children processes.

### verbose: 
verbosity mode, 0 or 1.

## Returns

Numpy array(s) of predictions.

# Make Predictions

You need to **reset** the test_generator every time before you call predict_generator. This is important: if you forget to reset it, the predictions will not line up with the order of the files in the generator.

In [None]:
test_generator.reset()
testStep = (test_generator.samples + (batchSize-1)) // batchSize
print("testStep: ", testStep)
predictions = model.predict_generator(test_generator, steps = testStep ,  verbose = 1)
len(predictions)

In [None]:
len(predictions)


## Decode Labels

Now ***predictions*** has the probabilities for 6 classes for each test case!

We can find the class with the highest probability as the prediction label as follows:


In [None]:
predicted_class_indices=np.argmax(predictions,axis=1)
print(predicted_class_indices)
len(predicted_class_indices)

***predicted_class_indices*** holds the predicted labels, but you can’t simply tell what the predictions are, because all you can see are numbers like 0,1,4,1,0,5…
More importantly, you need to map the predicted labels to the class names, and to unique ids such as filenames, to find out what you predicted for which image.

In [None]:
labels = (test_generator.class_indices)
print(labels)

In [None]:
labels = dict((v,k) for k,v in labels.items())
print(labels)

In [None]:
predictedLables= [labels[k] for k in predicted_class_indices]
print(predictedLables)
len(predictedLables)

***predictedLables*** holds the labels predicted by the model. We need to find ***the actual labels*** for the same test data as follows:

In [None]:
actualLables= [labels[k] for k in test_generator.classes]
print(actualLables)
len(actualLables)

# Evaluate the results

Below, we will see several methods for evaluating a classifier.

More on http://www.cse.chalmers.se/~richajo/dit865/files/Classification%20evaluation%20examples.html

## Accuracy
The most classical evaluation metric for classifiers is the accuracy, which corresponds to the proportion of correctly classified instances. 

In [None]:
accuracy_score(actualLables, predictedLables)
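
For reference, accuracy as defined above can be computed by hand in a few lines. The sketch below uses plain Python and invented labels; `accuracy` is a hypothetical helper shown only to make the definition concrete, and `accuracy_score` remains the function this notebook relies on.

```python
def accuracy(actual, predicted):
    """Fraction of positions where the predicted label equals the actual one."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Invented labels for illustration
actual = ['KFC', 'Subway', 'KFC', 'Other']
predicted = ['KFC', 'Subway', 'Other', 'Other']
print(accuracy(actual, predicted))  # 0.75
```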

## Evaluation metrics based on a confusion matrix

A confusion matrix is such that the cell at row  i  and column  j  is equal to the number of observations known to be in group  i  but predicted to be in group  j .



In [None]:
matrix = confusion_matrix(actualLables, predictedLables)
print(labels)
matrix
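
The row and column convention described above can be reproduced by hand. The following sketch builds a small confusion matrix in plain Python from invented labels; `build_confusion_matrix` is a hypothetical helper, and the real work in this notebook is done by scikit-learn's `confusion_matrix`.

```python
def build_confusion_matrix(actual, predicted, class_names):
    """Return a nested list where cell [i][j] counts samples of class i predicted as class j."""
    index = {name: i for i, name in enumerate(class_names)}
    counts = [[0] * len(class_names) for _ in class_names]
    for a, p in zip(actual, predicted):
        counts[index[a]][index[p]] += 1
    return counts


# Invented labels for illustration
class_names = ['KFC', 'Other', 'Subway']
actual = ['KFC', 'KFC', 'Other', 'Subway', 'Subway']
predicted = ['KFC', 'Other', 'Other', 'Subway', 'KFC']

for row in build_confusion_matrix(actual, predicted, class_names):
    print(row)
```

Correct predictions end up on the diagonal, so a good classifier produces a matrix whose off-diagonal cells are close to zero.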

## The precision and recall metrics
Several metrics can be derived from a confusion matrix. (See the Wikipedia article.) In particular, they tend to be based on the special case of a confusion matrix, where we assign one class to be the "positive" class that is important to us. This is sometimes called a table of confusion. In such a table, we speak of true positives, false positives, false negatives, and true negatives.

The precision and recall metrics are probably the most common metrics derived from such a table.

P  = TP / (TP + FP)

R  = TP / (TP + FN)

For example, what are the precision and recall of 'Burger King' in this case?

The utility function ***classification_report*** prints the precision and recall values for all the categories. (The  F1  score combines the precision and recall values into a single value.)

In [None]:
print(classification_report(actualLables, predictedLables))

In [None]:
recall_score( actualLables, predictedLables,average='weighted') 

In [None]:
precision_score( actualLables, predictedLables,average='weighted') 
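
To connect the P and R formulas above to a confusion matrix, the sketch below reads TP, FP and FN for one class directly from the matrix (rows are actual classes, columns are predicted classes). It is plain Python with an invented 3-class matrix; `precision_recall` is a hypothetical helper, not a scikit-learn function.

```python
def precision_recall(matrix, k):
    """Precision and recall of class k from a confusion matrix (rows = actual)."""
    tp = matrix[k][k]
    fp = sum(matrix[i][k] for i in range(len(matrix))) - tp  # column k minus TP
    fn = sum(matrix[k]) - tp                                  # row k minus TP
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented 3-class confusion matrix
example_matrix = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
p, r = precision_recall(example_matrix, 0)
print(p, r)
```

The same function applied to the `matrix` computed earlier, with the row index of 'Burger King', would answer the question posed above.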

## Plot the confusion matrix

In [None]:
# Code taken from the scikit-learn website; prints and plots the confusion matrix
def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

In [None]:
cm_plot_labels = selectedClasses
plot_confusion_matrix(matrix,cm_plot_labels, normalize=False
                      , title = 'Confusion Matrix')

# Save Predictions
Finally, save the results to a CSV file.

In [None]:
filenames=test_generator.filenames
directory= test_generator.directory
results=pd.DataFrame({"Directory":directory,
                      "Filename":filenames,
                      "Predictions":predictedLables,
                     "Actuals": actualLables })
results.to_csv("results.csv",index=False)

# Show some sample predictions with corresponding true labels


In [None]:
#import glob
#import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline

res = results[260:280]

images = []
#for img_path in glob.glob('images/*.jpg'):
for img_path in "./"+res['Directory']+"/"+res['Filename']:
    images.append(mpimg.imread(img_path))

plt.figure(figsize=(80,80))
columns = 4
for i, image in enumerate(images):
    ax= plt.subplot(len(images) // columns + 1, columns, i + 1)
    ax.set_title(res['Actuals'].iloc[i]+" "+res['Predictions'].iloc[i], fontsize=40)
    plt.imshow(image)
    
