# Introduction

<br></br>
Take me to the [code and Jupyter Notebook](https://github.com/AMoazeni/Machine-Learning-Image-Recognition/blob/master/Jupyter%20Notebook/ML%20-%20Image%20Recognition.ipynb) for Image Recognition!

<br></br>
This article explores a Machine Learning algorithm called the Convolutional Neural Network (CNN), a common Deep Learning technique used for image recognition and classification.

<br></br>
<div align="center">
<img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Image-Recognition/master/Data/single_prediction/cat_or_dog_1.jpg" width=20% alt="Dog">

<img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Image-Recognition/master/Data/single_prediction/cat_or_dog_2.jpg" width=20% alt="Cat">
</div>




<br></br>
You are provided with a dataset consisting of 5,000 Cat images and 5,000 Dog images. We are going to train a Machine Learning model to learn differences between the two categories. The model will predict if a new unseen image is a Cat or Dog. The code architecture is robust and can be used to recognize any number of image categories, if provided with enough data.



<br></br>

# Convolutional Neural Networks (CNN)

<br></br>
Convolutional Neural Networks are good at pattern recognition and feature detection, which is especially useful in image classification. You can improve the performance of a Convolutional Neural Network through hyper-parameter tuning, adding more convolution layers, adding more fully connected layers, or providing more correctly labeled data to the algorithm.


<br></br>
Create a Convolutional Neural Network (CNN) with the following steps:

1. Convolution
2. Max Pooling
3. Flattening
4. Full Connection


<br></br>
Check out [How to implement a neural network](http://peterroelants.github.io/posts/neural_network_implementation_intermezzo02/), and also take a look at [A Friendly Introduction to Cross-Entropy Loss](http://rdipietro.github.io/friendly-intro-to-cross-entropy-loss/).


<br></br>
Convolution is a function derived from two other functions through an integration that expresses how the shape of one is modified by the other.


<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Image-Recognition/master/Jupyter%20Notebook/Images/01%20-%20Convolution%20Equation.png" alt="Convolution-Equation"></div>


<br></br>
For image recognition, we convolve the input image with Feature Detectors (also known as Kernels or Filters) to generate a Feature Map (also known as a Convolved Map or Activation Map). This reveals and preserves patterns in the image, and also compresses the image for easier processing. Each value in a Feature Map is generated by multiplying a patch of the image element-wise with a Feature Detector and adding up the results. A convolution layer applies multiple Feature Detectors, which creates multiple Feature Maps.
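
<br></br>
As a minimal sketch of the multiply-and-add step described above, the following NumPy snippet slides a single Feature Detector over a small binary image with stride 1 and no padding. The image, the diagonal detector, and the `feature_map` helper are made up for illustration; Keras performs the equivalent operation inside `Conv2D`.

```python
import numpy as np

def feature_map(image, detector):
    """Slide `detector` over `image` (stride 1, no padding) and sum the element-wise products.

    Like most Deep Learning libraries, this does not flip the detector,
    so strictly speaking it is a cross-correlation.
    """
    rows = image.shape[0] - detector.shape[0] + 1
    cols = image.shape[1] - detector.shape[1] + 1
    out = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            patch = image[r:r + detector.shape[0], c:c + detector.shape[1]]
            out[r, c] = np.sum(patch * detector)
    return out

# Hypothetical 5x5 binary image
image = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
])

# Hypothetical Feature Detector that responds to diagonal lines
detector = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
])

fmap = feature_map(image, detector)
print(fmap)
# [[2. 1. 0.]
#  [1. 3. 1.]
#  [0. 1. 2.]]
```

The 5x5 image shrinks to a 3x3 Feature Map, and the largest value (3) sits where the diagonal pattern matches the image best.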


<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Image-Recognition/master/Jupyter%20Notebook/Images/02%20-%20CNN%20Example.png" width="500" alt="CNN-Example"></div>


<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Image-Recognition/master/Jupyter%20Notebook/Images/03%20-%20CNN%20Feature%20Map.png" width="500" alt="CNN-Feature"></div>


<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Image-Recognition/master/Jupyter%20Notebook/Images/04%20-%20CNN%20Multi%20Feature%20Map.png" width="500" alt="Feature-Map"></div>


<br></br>
This [Image Convolution Guide](https://docs.gimp.org/en/plug-in-convmatrix.html) allows you to play with various filters applied to an image. Edge Detect is a useful filter in Machine Learning. The algorithm also creates filters that are not recognizable to humans; perhaps we learn with similar techniques in our subconscious. Feature Maps preserve spatial relationships between pixels throughout processing.
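
<br></br>
To get a feel for what an edge-detecting filter does, here is a small sketch that applies a Laplacian-style 3x3 kernel to a synthetic image containing one vertical edge. The kernel values and the image are illustrative assumptions, and `sliding_window_view` needs NumPy 1.20 or newer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Synthetic 6x6 image: dark left half (0), bright right half (1)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Illustrative edge-detect kernel: the four neighbours minus four times the centre pixel
edge_kernel = np.array([
    [0,  1, 0],
    [1, -4, 1],
    [0,  1, 0],
])

# Every 3x3 patch of the image, shape (4, 4, 3, 3)
windows = sliding_window_view(image, edge_kernel.shape)
# Multiply each patch by the kernel and add up the products
edges = (windows * edge_kernel).sum(axis=(-2, -1))
print(edges)
# Every row is [ 0.  1. -1.  0.]
```

Flat regions produce zeros, and only the two columns on either side of the dark-to-bright boundary respond.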


<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Image-Recognition/master/Jupyter%20Notebook/Images/05%20-%20Edge%20Detect%20Filter.png" width="500" alt="Edge-Detect"></div>



<br></br>

# Rectified Linear Units (ReLU)

<br></br>
Rectifier Functions are applied to Convolutional Neural Networks to increase non-linearity (break up linearity). ReLU replaces every negative value in a Feature Map with zero and leaves positive values unchanged. This is an important step for image recognition with CNNs. Images are usually non-linear due to sharp transitions between pixels, different colors, etc. ReLU functions help amplify the non-linearity of images so the ML model has an easier time finding patterns.
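
<br></br>
The rectifier itself is a one-liner. This sketch applies it to a small, made-up Feature Map so the effect on negative values is easy to see; the `relu` helper and the numbers are assumptions for illustration.

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: negative values become 0, positive values pass through."""
    return np.maximum(0, x)

# Hypothetical Feature Map with negative and positive responses
feature_map = np.array([
    [-3.0, 2.0],
    [0.5, -1.0],
])

rectified = relu(feature_map)
print(rectified)
# [[0.  2. ]
#  [0.5 0. ]]
```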

<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Image-Recognition/master/Jupyter%20Notebook/Images/06%20-%20ReLU%20Layer.png" alt="ReLU"></div>


<br></br>
### Before ReLU

<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Image-Recognition/master/Jupyter%20Notebook/Images/07%20-%20Before%20ReLU.png" alt="Before-ReLU"></div>


<br></br>
### After ReLU

<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Image-Recognition/master/Jupyter%20Notebook/Images/08%20-%20After%20ReLU.png" alt="After-ReLU"></div>


<br></br>
In the above example, the ReLU operation removed the Black Pixels, so there are fewer White to Gray to Black transitions. Borders now have more abrupt Pixel changes. Microsoft argues that using their Modified Rectifier Function works better for CNNs.


<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Image-Recognition/master/Jupyter%20Notebook/Images/09%20-%20Modified%20Rectifier.png" alt="Rectifier"></div>



<br></br>

# Max Pooling

<br></br>
Max Pooling finds the largest value in each small grid of the Feature Map, which creates a Pooled Feature Map. Average Pooling (sub-sampling) takes the average value of each small grid instead. Pooling makes sure that your Neural Network has Spatial Invariance (it is able to find learned features in new images that are slightly varied or distorted). Max Pooling provides resilience against shifted or rotated features. It also further distills Feature Maps (reduces their size) while preserving the spatial relationships of pixels. Removing unnecessary information also helps prevent overfitting. For more detail, read the paper 'Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition'. Here is an online [CNN Visualization Tool](http://scs.ryerson.ca/~aharley/vis/conv/flat.html).
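
<br></br>
Here is a rough NumPy sketch of both pooling variants on a made-up 5x5 Feature Map, using non-overlapping 2x2 grids. The `pool2d` helper is hypothetical; it drops the leftover last row and column of an odd-sized map, which is an assumption chosen to match the behaviour of Keras `MaxPooling2D` with its default settings.

```python
import numpy as np

def pool2d(feature_map, size=2, reduce=np.max):
    """Apply `reduce` to each non-overlapping size x size grid of `feature_map`."""
    rows = feature_map.shape[0] // size
    cols = feature_map.shape[1] // size
    # Drop any leftover row/column that does not fill a complete grid
    trimmed = feature_map[:rows * size, :cols * size]
    # Axes 1 and 3 now index the position inside each grid
    blocks = trimmed.reshape(rows, size, cols, size)
    return reduce(blocks, axis=(1, 3))

# Hypothetical 5x5 Feature Map
fm = np.array([
    [1, 0, 2, 3, 9],
    [4, 6, 6, 8, 9],
    [3, 1, 1, 0, 9],
    [1, 2, 2, 4, 9],
    [7, 7, 7, 7, 9],
])

max_pooled = pool2d(fm)                  # Max Pooling
avg_pooled = pool2d(fm, reduce=np.mean)  # Average Pooling

print(max_pooled)
# [[6 8]
#  [3 4]]
print(avg_pooled)
# [[2.75 4.75]
#  [1.75 1.75]]
```

Moving a strong feature by one pixel inside its 2x2 grid leaves the max-pooled output unchanged, which is where the tolerance to small shifts comes from.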


<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Image-Recognition/master/Jupyter%20Notebook/Images/10%20-%20Max%20Pooling.png" alt="Pooling"></div>



<br></br>

# Flattening

<br></br>
Flattening puts values of the pooled Feature Map matrix into a 1-D vector. This makes it easy for the image data to pass through an Artificial Neural Network algorithm.
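
<br></br>
In NumPy terms, flattening is just a reshape. The sketch below uses a tiny made-up pooled output first, then repeats the operation at the 14 x 14 x 32 size that the model in the Code section reports before its `Flatten` layer.

```python
import numpy as np

# Hypothetical pooled output: a 2x2 spatial grid from each of 3 Feature Maps
pooled = np.array([
    [[1, 5, 9], [2, 6, 10]],
    [[3, 7, 11], [4, 8, 12]],
])  # shape (2, 2, 3): height x width x maps

# Put every value into one long 1-D vector
flat = pooled.reshape(-1)
print(flat.shape)  # (12,)

# Same operation at the size used by this model: 14 x 14 x 32 pooled values
model_sized = np.zeros((14, 14, 32)).reshape(-1)
print(model_sized.shape)  # (6272,)
```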


<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Image-Recognition/master/Jupyter%20Notebook/Images/11%20-%20Flattening.png" width="400" alt="Flattening"></div>


<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Image-Recognition/master/Jupyter%20Notebook/Images/12%20-%20Flattening%202.png" width="400" alt="Flattening-2"></div>



<br></br>

# Full Connection

<br></br>
This is when the output of the convolution and pooling layers is flattened and fed through a classic Artificial Neural Network. It's important to note that the hidden layers in this part of a CNN are fully connected, whereas regular ANNs don't necessarily need full connections.


<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Image-Recognition/master/Jupyter%20Notebook/Images/13%20-%20Full%20Connection.png" width="400" alt="Full-Connection"></div>


<br></br>
Back-propagation in a CNN adjusts the weights of the neurons and also adjusts the Feature Detectors that produce the Feature Maps.


<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Image-Recognition/master/Jupyter%20Notebook/Images/14%20-%20CNN%20Backprop.png" width="400" alt="Back-Propagation"></div>


<br></br>
When it's time for the CNN to make a decision between Cat or Dog, the final layer neurons 'vote' on the probability of an image being a Cat or Dog (or any other category you show it). The Neural Network adjusts votes according to the best weights it has determined through back-propagation.
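
<br></br>
The weighted 'vote' can be sketched as a forward pass through one fully connected hidden layer and one output neuron. Everything here is an assumption for illustration: the layer sizes are tiny, and the weights are random rather than learned through back-propagation, so the resulting probability is meaningless beyond showing the mechanics.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical flattened vector coming out of the pooling stage
flat = rng.random(8)

# Untrained, random weights (a real network learns these through back-propagation)
W_hidden = rng.normal(size=(8, 4))
b_hidden = np.zeros(4)
W_out = rng.normal(size=(4, 1))
b_out = np.zeros(1)

# Fully connected hidden layer with ReLU: every input feeds every hidden neuron
hidden = np.maximum(0, flat @ W_hidden + b_hidden)

# Output neuron: a weighted sum of the hidden 'votes', squashed into a probability
prob_dog = sigmoid(hidden @ W_out + b_out)[0]
print(f"Probability of 'dog': {prob_dog:.3f}")
```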


<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Image-Recognition/master/Jupyter%20Notebook/Images/15%20-%20CNN%20Weighted%20Votes.png" alt="Weighted-Votes"></div>


<br></br>
Here is a summary of every step of a CNN. Don't forget about the Rectifier Function that removes linearity in Feature Maps, and remember that the hidden layers are fully connected.


<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Image-Recognition/master/Jupyter%20Notebook/Images/16%20-%20CNN%20Full.png" alt="CNN-Full"></div>



<br></br>

# Pre-Processing (Image Augmentation)

<br></br>
This step modifies images to prevent over-fitting. This data augmentation trick can generate tons more data by applying random modifications to existing data like shearing, stretching, zooming, etc. This makes your dataset and algorithm more robust and generalized.
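
<br></br>
The idea can be sketched with plain NumPy. This simplified, hypothetical `augment` helper only applies a random horizontal flip and a small wrap-around shift; it is a stand-in for illustration, not the shear and zoom transformations that Keras `ImageDataGenerator` applies in the Code section.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly modified copy of `image` (height x width x channels)."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1, :]            # horizontal flip
    shift = rng.integers(-2, 3)          # shift by up to 2 pixels left or right
    out = np.roll(out, shift, axis=1)    # pixels wrap around the edge in this sketch
    return out

rng = np.random.default_rng(42)

# Hypothetical 4x4 RGB image
image = np.arange(4 * 4 * 3).reshape(4, 4, 3)

# One original image turns into five randomly varied training samples
batch = np.stack([augment(image, rng) for _ in range(5)])
print(batch.shape)  # (5, 4, 4, 3)
```

Each variant contains the same pixels in a different arrangement, so the model sees more variety without any new photos being collected.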



<br></br>

# Softmax and Cross-Entropy Cost Function

<br></br>
The Softmax function shown below is used to make sure that the probabilities of the output layer add up to one, which gives us a percentage guess for each category. Watch this Geoffrey Hinton [video about the SoftMax Function](https://www.youtube.com/watch?v=mlaLLQofmR8).
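
<br></br>
A minimal NumPy sketch of Softmax, assuming three made-up output scores. Subtracting the maximum before exponentiating is a common numerical-stability trick that does not change the result.

```python
import numpy as np

def softmax(scores):
    """Turn raw output-layer scores into probabilities that add up to one."""
    shifted = scores - np.max(scores)  # avoids overflow in exp, result is unchanged
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical raw scores for three categories
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)

print(probs.round(3))
print(probs.sum())
```

The largest score always receives the largest probability, and the outputs can be read directly as percentage guesses.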


<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Image-Recognition/master/Jupyter%20Notebook/Images/18%20-%20Softmax%20Function.png" alt="SoftMax"></div>


<br></br>
We had previously used the Mean Squared Error (MSE) Cost Function. For CNNs, it's better to use the Cross-Entropy Function as your Cost Function. We use Cross-Entropy as a Loss Function because it has a 'Log' term which helps amplify small Errors and better guide gradient descent.
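
<br></br>
To illustrate how the 'Log' term amplifies small Errors, this sketch compares both cost functions on two made-up predictions for a dog image. The `mse` and `cross_entropy` helpers and the numbers are assumptions chosen to make the difference visible.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error."""
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(y_true, y_pred):
    """Cross-Entropy for a one-hot target."""
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([1.0, 0.0])              # one-hot target: [dog, cat]

# Two bad predictions: the second gives 'dog' a 1000x higher probability
poor = np.array([1e-6, 1 - 1e-6])
better = np.array([1e-3, 1 - 1e-3])

mse_poor, mse_better = mse(y_true, poor), mse(y_true, better)
ce_poor, ce_better = cross_entropy(y_true, poor), cross_entropy(y_true, better)

print(f"MSE:           {mse_poor:.4f} -> {mse_better:.4f}")
print(f"Cross-Entropy: {ce_poor:.4f} -> {ce_better:.4f}")
```

The Mean Squared Error barely moves between the two predictions, while the Cross-Entropy drops by about half, so gradient descent gets a much clearer signal that the model is improving.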

<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Image-Recognition/master/Jupyter%20Notebook/Images/17%20-%20Log%20Loss%20Function.png" alt="Loss-Function"></div>


<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Image-Recognition/master/Jupyter%20Notebook/Images/19%20-%20Cross%20Entropy%20Function.png" width="200" alt="Cross-Entropy"></div>


<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Image-Recognition/master/Jupyter%20Notebook/Images/20%20-%20Cross%20Entropy%20Plug%20In.png" width="400" alt="Cross-Entropy-2"></div>


<br></br>
<div align="center"><img src="https://raw.githubusercontent.com/AMoazeni/Machine-Learning-Image-Recognition/master/Jupyter%20Notebook/Images/21%20-%20Error%20Comparison.png" alt="Error"></div>



<br></br>

# Code

<br></br>
Download the code and run it with 'Jupyter Notebook' or copy the code into the 'Spyder' IDE found in the [Anaconda Distribution](https://www.anaconda.com/download/). 'Spyder' is similar to MATLAB: it allows you to step through the code and examine the 'Variable Explorer' to see exactly how the data is parsed and analyzed. Jupyter Notebook also offers a [Jupyter Variable Explorer Extension](http://volderette.de/jupyter-notebook-variable-explorer/) which is quite useful for keeping track of variables.


<br></br>
```shell
$ git clone https://github.com/AMoazeni/Machine-Learning-Image-Recognition.git
$ cd Machine-Learning-Image-Recognition
```

<br></br>
<br></br>
<br></br>
<br></br>


```python
# Convolutional Neural Network
# Part 1 - Building CNN Architecture and Import Data

# Importing the Keras libraries and packages
# 'Sequential' model used to initialize the NN as a sequence of layers (alternative to Graph initialization)
from keras.models import Sequential
# 'Conv2D' for 1st step of adding convolution layers to images ('Conv3D' for videos with time as 3rd dimension)
from keras.layers import Conv2D
# 'MaxPooling2D' step 2 for pooling of max values from Convolution Layers
from keras.layers import MaxPooling2D
# 'Flatten' Pooled Layers for step 3
from keras.layers import Flatten
# 'Dense' for fully connected layers that feed into classic ANN
from keras.layers import Dense

# Initializing the CNN
# Calling this object a 'classifier' because that's its job
classifier = Sequential()

# Step 1 - Convolution
# Apply the method 'add' on the object 'classifier'
# Filter = Feature Detector = Feature Kernel
# 'Conv2D' (Number of Filters, (Filter Rows, Filter Columns), input_shape = (image height, image width, color channels))
# Start with 32 filters, work your way up to 64 -> 128 -> 256
# 'input_shape' needs all picture inputs to be the same shape and format (2D array for B&W, 3D for Color images with one 2D array per Red/Green/Blue channel)
# 'input_shape' order matters: (64, 64, 3) is channels-last (the TensorFlow backend default), (3, 64, 64) is channels-first
# 'relu' Rectifier Activation Function used to get rid of -ve pixel values and increase non-linearity
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))

# Step 2 - Pooling
# Reduces the size of the Feature Map by half (e.g. 8x8 turns into 4x4; odd sizes round down, so 29x29 turns into 14x14)
# Preserves Spatial Structure and performance of model while reducing computation time
# 'pool_size' at least needs to be 2x2 to preserve Spatial Structure information (context around individual pixels)
classifier.add(MaxPooling2D(pool_size = (2, 2)))

# Adding a second convolution layer to improve performance
# Only need 'input_shape' for Input Layer
classifier.add(Conv2D(32, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))

# Step 3 - Flattening
# Take all the Pooled Feature Maps and put them into one huge single Vector that will input into a classic NN
classifier.add(Flatten())

# Step 4 - Full connection
# Add some fully connected hidden layers
# Rule of thumb: start with a number of Nodes between the input size (huge) and the output size; 128 is used here
# 'activation' function makes sure relevant Nodes get a stronger vote or no vote
classifier.add(Dense(units = 128, activation = 'relu'))
# Add final Output Layer with binary options
classifier.add(Dense(units = 1, activation = 'sigmoid'))

# Compiling the CNN
# 'adam' Stochastic Gradient Descent optimizer
# 'loss' function. Logarithmic loss: use 'binary_crossentropy' for 2 categories and 'categorical_crossentropy' for more
# 'metrics' is the performance metric
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
```



```python
# Part 2 - Fitting the CNN to the images

from keras.preprocessing.image import ImageDataGenerator

# Create random transformations of the Data to increase the Dataset and prevent overfitting
train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

# The test set is only rescaled, never augmented
test_datagen = ImageDataGenerator(rescale = 1./255)

# 'batch_size' is the number of images that go through the CNN every weight update cycle
# Increase 'target_size' to improve model accuracy (it must match 'input_shape' of the first Conv2D layer)
training_set = train_datagen.flow_from_directory('../Data/training_set',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')

test_set = test_datagen.flow_from_directory('../Data/test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')
```

```
Found 8000 images belonging to 2 classes.
Found 1999 images belonging to 2 classes.
```


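```python
# Train the model
# Increase 'epochs' to boost model performance (takes longer)
# Each step is one batch of 32 images, so 8000 steps cycle through the augmented training set about 32 times per epoch
# Newer Keras releases deprecate 'fit_generator'; 'fit' accepts the same generators there
classifier.fit_generator(training_set,
                         steps_per_epoch = 8000,
                         epochs = 1,
                         validation_data = test_set,
                         validation_steps = 2000)
```

```
Epoch 1/1

<keras.callbacks.History at 0x10fe0fe48>
```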
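
<br></br>
If a single pass over the data per epoch is preferred, the step counts can be derived from the image counts and the batch size reported above. This is a sketch of the arithmetic only; the variable names are illustrative, and whether one pass per epoch trains as well as the original setting is not something this article tests.

```python
import math

train_images = 8000   # "Found 8000 images belonging to 2 classes."
test_images = 1999    # "Found 1999 images belonging to 2 classes."
batch_size = 32

# One step processes one batch, so one full pass needs ceil(images / batch_size) steps
steps_per_epoch = math.ceil(train_images / batch_size)
validation_steps = math.ceil(test_images / batch_size)

print(steps_per_epoch)   # 250
print(validation_steps)  # 63
```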

```python
# Save model to file. The saved file contains:
# - Architecture of the model, allowing to reuse trained models
# - Weights of the model
# - Training configuration (loss, optimizer)
# - State of the optimizer, allowing to resume training exactly where you left off
classifier.save('../Data/saved_model/CNN_Cat_Dog_Model.h5')

# Examine model
classifier.summary()

# Examine Weights
classifier.weights

# Examine Optimizer
classifier.optimizer

# Load saved Model
from keras.models import load_model

model = load_model('../Data/saved_model/CNN_Cat_Dog_Model.h5')
```

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_3 (Conv2D)            (None, 62, 62, 32)        896       
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 31, 31, 32)        0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 29, 29, 32)        9248      
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 14, 14, 32)        0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 6272)              0         
_________________________________________________________________
dense_3 (Dense)              (None, 128)               802944    
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 129       
Total para
```
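
<br></br>
The output shapes and parameter counts in the summary can be reproduced by hand. The helper functions below are hypothetical and assume what the model code uses: 3x3 filters with stride 1 and no padding, 2x2 pooling, and one bias per filter or neuron.

```python
def conv_output(size, kernel=3):
    """Width/height after a convolution with stride 1 and no padding."""
    return size - kernel + 1

def pool_output(size, pool=2):
    """Width/height after non-overlapping pooling (leftover pixels are dropped)."""
    return size // pool

def conv_params(kernel, in_channels, filters):
    """Each filter has kernel x kernel x in_channels weights plus one bias."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    """Each neuron has one weight per input plus one bias."""
    return (inputs + 1) * units

size = 64
size = conv_output(size)   # 62
size = pool_output(size)   # 31
size = conv_output(size)   # 29
size = pool_output(size)   # 14
flat = size * size * 32    # 14 * 14 * 32 = 6272

params = [
    conv_params(3, 3, 32),    # first Conv2D on 3 color channels
    conv_params(3, 32, 32),   # second Conv2D on 32 Feature Maps
    dense_params(flat, 128),  # hidden Dense layer
    dense_params(128, 1),     # output Dense layer
]
print(flat)         # 6272
print(params)       # [896, 9248, 802944, 129]
print(sum(params))  # 813217
```

Almost all of the parameters sit in the first fully connected layer, which is why pooling before flattening matters so much for model size.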

```python
# Part 3 - Making new predictions

# Place a new picture of a cat or dog in 'single_prediction' folder and see if your model works
import numpy as np
from keras.preprocessing import image
test_image = image.load_img('../Data/single_prediction/cat_or_dog_1.jpg', target_size = (64, 64))
# Convert the image into a 3D array (height, width, color channels) to match Model expectation
test_image = image.img_to_array(test_image)
# Rescale pixel values to [0, 1], the same 'rescale' applied to the training and test sets
test_image = test_image / 255.
# Add one more dimension to beginning of image array so 'predict' can receive it (corresponds to Batch, even if only one image)
test_image = np.expand_dims(test_image, axis = 0)
result = classifier.predict(test_image)
# 'class_indices' holds the mapping between 0/1 and cat/dog
# Result is 2D (batch x output), so check the first row, first column value
# The sigmoid output is a probability, so threshold it at 0.5 instead of testing for exactly 1
if result[0][0] >= 0.5:
    prediction = 'dog'
else:
    prediction = 'cat'

# Print result
print("The model class indices are:", training_set.class_indices)
print("\nPrediction: " + prediction)
```

```
The model class indices are: {'cats': 0, 'dogs': 1}

Prediction: dog
```
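
<br></br>
Rather than hard-coding 'dog' and 'cat', the label can be looked up from the class indices printed above. The `label_from_probability` helper below is a hypothetical sketch: it assumes a single sigmoid output and a `class_indices`-style dictionary, and the 0.5 threshold is a conventional choice rather than something tuned for this model.

```python
def label_from_probability(probability, class_indices, threshold=0.5):
    """Map a sigmoid output to a class name using a Keras-style `class_indices` dict."""
    # Invert the mapping: {'cats': 0, 'dogs': 1} becomes {0: 'cats', 1: 'dogs'}
    index_to_label = {index: label for label, index in class_indices.items()}
    return index_to_label[int(probability >= threshold)]

class_indices = {'cats': 0, 'dogs': 1}  # as printed by the notebook

print(label_from_probability(0.93, class_indices))  # dogs
print(label_from_probability(0.12, class_indices))  # cats
```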
