In [None]:
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.

# Transfer Learning in TensorFlow with Inception V3

## Introduction

Transfer learning is the process of taking a pre-trained model (the weights and parameters of a network that somebody else has trained on a large dataset) and “fine-tuning” it with your own dataset. The pre-trained model acts as a feature extractor: you remove the last layer of the network, replace it with your own classifier (suited to your problem space), freeze the weights of all the other layers, and train the network normally. Freezing a layer means that its weights are not changed during gradient descent/optimization.

For this experiment we used Google's pre-trained Inception-v3 model for image classification. This model consists of two parts:

- A feature extraction part, built from a convolutional neural network.
- A classification part, built from fully-connected and softmax layers.

The pre-trained Inception-v3 model achieves state-of-the-art accuracy for recognizing general objects across 1000 classes. The model extracts general features from input images in the first part and classifies them based on those features in the second part.

We will use this pre-trained model and re-train it to classify aerial views of houses that have a swimming pool in their backyard. To do so, we will build a new image dataset with aerial views coming from Google Images.

The following chart shows how the data flows in the Inception v3 model, which is a Convolutional Neural Network with many layers and a complicated structure. 

<img src="../doc/source/images/inception_flowchart.png">

In transfer learning, when you build a new model to classify your original dataset, you reuse the feature extraction part and re-train the classification part with your dataset. Since you don't have to train the feature extraction part (which is the most complex part of the model), you can train the model with less computational resources and training time.

<img src="../doc/source/images/inception_training.png">

The next step is to train the classification part of the model using the preprocessed data. The previous diagram shows the relationship between the preprocessing and the training.
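
The idea of a frozen feature extractor with a trainable classifier on top can be illustrated without any deep learning framework. The following is a minimal sketch under stated assumptions, not the code used in this notebook: `FROZEN_WEIGHTS`, `extract_features`, `predict` and the toy dataset are all hypothetical stand-ins. A small fixed projection plays the role of the pre-trained network, and a logistic-regression "head" plays the role of the new classification layer.

```python
import math
import random

random.seed(0)

# Hypothetical "pre-trained" feature extractor: a fixed projection whose
# weights are frozen (never updated during training).
FROZEN_WEIGHTS = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(4)]

def extract_features(x):
    """Map a raw 2-value input to 4 features using the frozen weights."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in FROZEN_WEIGHTS]

def predict(weights, bias, features):
    """New classifier head: logistic regression on the extracted features."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def mean_loss(weights, bias, data):
    """Average binary cross-entropy over (features, label) pairs."""
    total = 0.0
    for features, label in data:
        p = min(max(predict(weights, bias, features), 1e-12), 1 - 1e-12)
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(data)

# Toy dataset: the label is 1 when the first input value is positive.
raw = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(40)]
data = [(extract_features(x), 1.0 if x[0] > 0 else 0.0) for x in raw]

frozen_before = [row[:] for row in FROZEN_WEIGHTS]
head_weights, head_bias = [0.0] * 4, 0.0
loss_before = mean_loss(head_weights, head_bias, data)

# Gradient descent updates only the head; the extractor is left untouched.
for _ in range(200):
    for features, label in data:
        error = predict(head_weights, head_bias, features) - label
        head_weights = [w - 0.1 * error * f for w, f in zip(head_weights, features)]
        head_bias -= 0.1 * error

loss_after = mean_loss(head_weights, head_bias, data)
print("loss before: %.3f, after: %.3f" % (loss_before, loss_after))
```

Because the features are computed once and never change, they can be cached, which is the same reason the retraining later in this notebook caches the bottleneck values on disk.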

## Disclaimer

This demo could not have been built without the research and work done by **Aniruddha Tapas**<br> and published on GitHub: https://github.com/Aniruddha-Tapas

The content of the following paper: http://ijarcet.org/wp-content/uploads/IJARCET-VOL-5-ISSUE-11-2664-2669.pdf has largely driven the execution of the demo as well as the fork of his github repository at: https://github.com/Aniruddha-Tapas/Transfer-Learning-for-Animal-Classification-in-Tensorflow

The images used in this demo application have been downloaded from publicly available images on the internet, and the copyright belongs to the respective owners. 


## Dataset

For this experiment, we built two small image datasets (~350 images) from the Google Images Web site. One contains aerial views of houses without a swimming pool, and the other contains aerial views of houses with a swimming pool in their backyard.
To extract and load the pictures, we used 'googliser.sh' (https://github.com/teracow/googliser.git). 'googliser.sh' is a BASH script that performs fast image downloads from Google Images based on a user-specified search phrase. It is a web-page scraper that feeds a list of image URLs to Wget, downloads the images in parallel, and then combines them into a gallery image.

Commands to build the two datasets: 

**./googliser.sh -p "house pool aerial view" -f 0 -g -n 500**<br>
**./googliser.sh -p "house aerial view" -f 0 -g -n 500**

After downloading the images, we took an extra step to review them visually and remove the false positives. All the images are then saved in two different directories.

The next step is to resize all the images to 299x299.

Install: **pip install python-resize-image** 



## Data Preprocessing

The images saved in the directories may not be suitable as-is for transfer learning with the Inception model, which expects images of 299 x 299 pixels. We also rename the images for easier manipulation. Copy and run the following code in the image directory.

### Renaming and Resizing Images
Example script used to fix up image files.

```python
#! python2.7
import os
from PIL import Image
from resizeimage import resizeimage

# Files downloaded from the Google Images Web site follow a naming convention that includes '(xxxx)'
# to number the images (e.g., google-image(0240).jpeg). The loop below replaces '(' with '-'
# and drops ')', giving names such as google-image-0240.jpeg.
for f in os.listdir('.'):
    if f.endswith('.jpeg'):
        os.rename(f, f.replace('(', '-').replace(')', ''))

# Resize an image to 299x299 (padding the borders), save it as <name>-resized.jpeg,
# and delete the original file.
def resize_file(in_file):
    # Image files are binary, so open them in 'rb' mode.
    with open(in_file, 'rb') as fd_img:
        img = Image.open(fd_img)
        img = resizeimage.resize_contain(img, [299, 299])
        # JPEG cannot store an alpha channel, so make sure the image is RGB before saving.
        img.convert('RGB').save(in_file.rsplit('.', 1)[0] + '-resized.jpeg', 'JPEG')
    os.remove(in_file)

for f in os.listdir('.'):
    # Skip files that were already resized so the script can be re-run safely.
    if f.endswith('.jpeg') and not f.endswith('-resized.jpeg'):
        resize_file(f)
```

This **Python 2.7** script resizes the images to 299 x 299 by padding the borders with white.

For example, it resizes

<img src="../doc/source/images/google-image-0002.jpeg">

to:

<img src="../doc/source/images/google-image-0002-resized.jpeg">
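
The arithmetic behind this "resize and pad" step can be sketched in a few lines. This is a hedged illustration rather than the library's implementation: `contain_geometry` is a hypothetical helper, and the exact rounding used by `resize_contain`, as well as its handling of images smaller than the target, may differ.

```python
def contain_geometry(width, height, target=299):
    """Return ((new_w, new_h), (left, top)) for fitting an image inside a
    target x target square while preserving its aspect ratio."""
    # The smaller of the two ratios guarantees that both sides fit.
    scale = min(float(target) / width, float(target) / height)
    new_w = max(1, int(round(width * scale)))
    new_h = max(1, int(round(height * scale)))
    # Center the scaled image; the remaining area is the padding.
    left = (target - new_w) // 2
    top = (target - new_h) // 2
    return (new_w, new_h), (left, top)

# A 600x400 landscape image is scaled to fit the width and padded above and below.
print(contain_geometry(600, 400))
```

For a landscape image the padding appears above and below the picture; for a portrait image it appears on the left and right.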

Now that our datasets are ready, we can move on to coding our image classifier.

# Visualize dataset images

## Houses
Image numbers range from **0001** to **0511**. Because some 'false positive' and 'garbage' images have been removed, not every number in the range exists. For example, 0024, 0217 and 0432 exist, but 0047 and 0410 do not, so try different image numbers.<br><br>
To visualize a different image, double-click the displayed image below to reveal the command, then change the image number to display another one.


<img src="../data/images/House-Pool/House/google-image-0001-resized.jpeg">

## Houses + Pools
Image numbers range from **0001** to **0504**. Because some 'false positive' and 'garbage' images have been removed, not every number in the range exists. For example, 0113, 0285 and 0467 exist, but 0105 and 0302 do not, so try different image numbers.<br><br>
To visualize a different image, double-click the displayed image below to reveal the command, then change the image number to display another one.


<img src="../data/images/House-Pool/Pool/google-image-0471-resized.jpeg">

# Retraining

The following scripts demonstrate how to take an Inception v3 architecture model trained on
ImageNet images, and train a new top layer that can recognize other classes of
images.

The top layer receives as input a 2048-dimensional vector for each image. We
train a softmax layer on top of this representation. Assuming the softmax layer
contains N labels, this corresponds to learning N + 2048*N model parameters
corresponding to the learned biases and weights.
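
To make the parameter count concrete, here is a small sketch that evaluates N + 2048*N and shows what a softmax does to a pair of raw scores. The function names `top_layer_param_count` and `softmax` are illustrative only and do not come from `helper.py`.

```python
import math

BOTTLENECK_SIZE = 2048  # length of the feature vector fed to the top layer

def top_layer_param_count(num_labels, bottleneck_size=BOTTLENECK_SIZE):
    """Weights (bottleneck_size * N) plus biases (N)."""
    return bottleneck_size * num_labels + num_labels

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

print(top_layer_param_count(2))  # two labels, as in this experiment
print(softmax([2.0, 0.5]))       # the higher score gets the higher probability
```

With only two labels, the new layer has 4098 trainable parameters, which is tiny compared with the rest of the network and is why a few hundred images can be enough to train it.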

We have a folder called **House-Pool** with two subfolders called **Pool** and **House**, each containing a different set of images (with and without pools).<br> 
The subfolder names are important, since they define what label is applied to each image, but the filenames themselves don't matter. The label for each image is taken from the name of the subfolder it's in. Retraining produces a new model file that can be loaded and run by any TensorFlow program, for example the label_image sample code.
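
The folder-to-label convention can be sketched as follows. This is only an illustration of the idea: `list_images_by_label` is a hypothetical function, not the `create_image_lists` helper used later, and the demo builds a throwaway directory instead of reading the real dataset.

```python
import os
import tempfile

def list_images_by_label(image_dir, extensions=('.jpg', '.jpeg')):
    """Return {label: sorted list of file names}, one label per subfolder."""
    result = {}
    for entry in sorted(os.listdir(image_dir)):
        sub_dir = os.path.join(image_dir, entry)
        if not os.path.isdir(sub_dir):
            continue  # only subfolders define labels
        files = sorted(name for name in os.listdir(sub_dir)
                       if name.lower().endswith(extensions))
        if files:
            result[entry] = files
    return result

# Build a tiny demo tree that mimics House-Pool/House and House-Pool/Pool.
demo_dir = tempfile.mkdtemp()
for label, names in [('House', ['a.jpeg', 'b.jpeg']), ('Pool', ['c.jpeg'])]:
    os.makedirs(os.path.join(demo_dir, label))
    for name in names:
        open(os.path.join(demo_dir, label, name), 'w').close()

print(list_images_by_label(demo_dir))
```

Pointing the same function at `../data/images/House-Pool` would list the real images under the `House` and `Pool` labels.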

## Imports

In [None]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import sys
# sys.path.insert(0, "/opt/DL/tensorflow/lib/python2.7/site-packages/")

from datetime import datetime
import hashlib
import os
import os.path
import random
import re
import struct
import sys
import tarfile

import numpy as np
from six.moves import urllib

import tensorflow as tf
from tensorflow.python.framework import graph_util
from tensorflow.python.framework import tensor_shape
from tensorflow.python.platform import gfile
from tensorflow.python.util import compat

from shutil import copyfile

## Helper Function Definitions
We put a lot of the code in functions in a Python module. This should make the simplified notebook easier to read and make the functions easier to unit test. We don't want to hide the code though. If you are taking a deeper look, be
sure to look into helper.py.

In [None]:
# Import the helper functions from the transferlearning directory
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)
from transferlearning import helper

## Parameters

The following are all parameters that are tied to the particular model architecture
we're using for Inception v3. These include things like tensor names and their
sizes. If you want to adapt this script to work with another model, you will
need to update these to reflect the values in the network you're using.

In [None]:
image_dir = '../data/images/House-Pool'
output_graph = "output_graph.pb"
output_labels = "output_labels.txt"
output_graph_orig = "output_graph_orig.pb"
summaries_dir = "/tmp/retrain_logs"  # directory for the TensorBoard summaries
how_many_training_steps = 500
learning_rate = 0.01
testing_percentage = 10
validation_percentage = 10
eval_step_interval = 10
train_batch_size = 100
test_batch_size = -1
validation_batch_size = 100
print_misclassified_test_images = False
model_dir = os.path.join('inception')
bottleneck_dir = "bottlenecks"
final_tensor_name = "final_result"
flip_left_right = False
random_crop = 0
random_scale = 0
random_brightness = 0
DATA_URL = 'http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz'

# pylint: enable=line-too-long
BOTTLENECK_TENSOR_NAME = 'pool_3/_reshape:0'
BOTTLENECK_TENSOR_SIZE = 2048
MODEL_INPUT_WIDTH = 299
MODEL_INPUT_HEIGHT = 299
MODEL_INPUT_DEPTH = 3
JPEG_DATA_TENSOR_NAME = 'DecodeJpeg/contents:0'
RESIZED_INPUT_TENSOR_NAME = 'ResizeBilinear:0'
MAX_NUM_IMAGES_PER_CLASS = 2 ** 27 - 1  # ~134M
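
The `testing_percentage` and `validation_percentage` parameters above control how the images are divided into training, validation and testing sets. `helper.py` is not shown in this notebook, so the following is only a sketch of one common approach: hashing the file name so that a given image always lands in the same set, even when more images are added later. `assign_split` and the modulo-100 bucketing are assumptions made for illustration, not the helper's actual code.

```python
import hashlib

def assign_split(file_name, testing_percentage=10, validation_percentage=10):
    """Deterministically map a file name to 'training', 'validation' or 'testing'."""
    digest = hashlib.sha1(file_name.encode('utf-8')).hexdigest()
    bucket = int(digest, 16) % 100  # stable value in 0..99
    if bucket < validation_percentage:
        return 'validation'
    if bucket < validation_percentage + testing_percentage:
        return 'testing'
    return 'training'

# Count how 500 hypothetical file names are distributed across the sets.
names = ['google-image-%04d-resized.jpeg' % i for i in range(1, 501)]
counts = {'training': 0, 'validation': 0, 'testing': 0}
for name in names:
    counts[assign_split(name)] += 1
print(counts)
```

With both percentages set to 10, roughly 80% of the images should end up in the training set, although the exact counts depend on the file names.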

## Main function


In [None]:
import os
import shutil
import sys

# do some cleanup
if os.path.isfile(output_graph):
    os.remove(output_graph)

# Force download of inception model:
# if os.path.isdir(model_dir):    
#    shutil.rmtree(model_dir, ignore_errors=False, onerror=None)
    
if os.path.isdir(bottleneck_dir):    
    shutil.rmtree(bottleneck_dir, ignore_errors=False, onerror=None)

tf.reset_default_graph()

# Setup the directory we'll write summaries to for TensorBoard
if tf.gfile.Exists(summaries_dir):
    tf.gfile.DeleteRecursively(summaries_dir)
tf.gfile.MakeDirs(summaries_dir)

# Set up the pre-trained graph.
helper.maybe_download_and_extract()
graph, bottleneck_tensor, jpeg_data_tensor, resized_image_tensor = (
      helper.create_inception_graph())

# Look at the folder structure, and create lists of all the images.
image_lists = helper.create_image_lists(image_dir, testing_percentage,
                                        validation_percentage)
class_count = len(image_lists.keys())
if class_count == 0:
    print('No valid folders of images found at ' + image_dir)
if class_count == 1:
    print('Only one valid folder of images found at ' + image_dir +
          ' - multiple classes are needed for classification.')
# See if the distortion parameters mean we're applying any distortions.
do_distort_images = helper.should_distort_images(
      flip_left_right, random_crop, random_scale,
      random_brightness)
sess = tf.Session()

if do_distort_images:
    # We will be applying distortions, so setup the operations we'll need.
    distorted_jpeg_data_tensor, distorted_image_tensor = helper.add_input_distortions(
        flip_left_right, random_crop, random_scale,
        random_brightness)
else:
    # We'll make sure we've calculated the 'bottleneck' image summaries and
    # cached them on disk.
    helper.cache_bottlenecks(sess, image_lists, image_dir, bottleneck_dir,
                             jpeg_data_tensor, bottleneck_tensor)

# Add the new layer that we'll be training.
(train_step, cross_entropy, bottleneck_input, ground_truth_input,
   final_tensor) = helper.add_final_training_ops(len(image_lists.keys()),
                                                 final_tensor_name,
                                                 bottleneck_tensor)

# Create the operations we need to evaluate the accuracy of our new layer.
evaluation_step, prediction = helper.add_evaluation_step(
      final_tensor, ground_truth_input)

# Merge all the summaries and write them out to /tmp/retrain_logs (by default)
merged = tf.summary.merge_all()
train_writer = tf.summary.FileWriter(summaries_dir + '/train',
                                       sess.graph)
validation_writer = tf.summary.FileWriter(summaries_dir + '/validation')

# Set up all our weights to their initial default values.
init = tf.global_variables_initializer()
sess.run(init)

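# Save a copy of the graph before training, with the new layer still at its
# initial values, to output_graph_orig.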
output_graph_def = graph_util.convert_variables_to_constants(
    sess, graph.as_graph_def(), [final_tensor_name])
with gfile.FastGFile(output_graph_orig, 'wb') as f:
    f.write(output_graph_def.SerializeToString())

# Run the training for as many cycles as set in how_many_training_steps.
for i in range(how_many_training_steps):
    # Get a batch of input bottleneck values, either calculated fresh every time
    # with distortions applied, or from the cache stored on disk.
    if do_distort_images:
        train_bottlenecks, train_ground_truth = helper.get_random_distorted_bottlenecks(
            sess, image_lists, train_batch_size, 'training',
            image_dir, distorted_jpeg_data_tensor,
            distorted_image_tensor, resized_image_tensor, bottleneck_tensor)
    else:
        train_bottlenecks, train_ground_truth, _ = helper.get_random_cached_bottlenecks(
            sess, image_lists, train_batch_size, 'training',
            bottleneck_dir, image_dir, jpeg_data_tensor,
            bottleneck_tensor)
    # Feed the bottlenecks and ground truth into the graph, and run a training
    # step. Capture training summaries for TensorBoard with the `merged` op.
    train_summary, _ = sess.run([merged, train_step],
                                feed_dict={bottleneck_input: train_bottlenecks,
                                           ground_truth_input: train_ground_truth})
    train_writer.add_summary(train_summary, i)

    # Every so often, print out how well the graph is training.
    is_last_step = (i + 1 == how_many_training_steps)
    if (i % eval_step_interval) == 0 or is_last_step:
        train_accuracy, cross_entropy_value = sess.run(
            [evaluation_step, cross_entropy],
            feed_dict={bottleneck_input: train_bottlenecks,
                       ground_truth_input: train_ground_truth})
        print('%s: Step %d: Train accuracy = %.1f%%' % (datetime.now(), i,
                                                      train_accuracy * 100))
        print('%s: Step %d: Cross entropy = %f' % (datetime.now(), i,
                                                 cross_entropy_value))
        validation_bottlenecks, validation_ground_truth, _ = (
            helper.get_random_cached_bottlenecks(
                sess, image_lists, validation_batch_size, 'validation',
                bottleneck_dir, image_dir, jpeg_data_tensor,
                bottleneck_tensor))
        # Run a validation step and capture training summaries for TensorBoard
        # with the `merged` op.
        validation_summary, validation_accuracy = sess.run(
            [merged, evaluation_step],
            feed_dict={bottleneck_input: validation_bottlenecks,
                     ground_truth_input: validation_ground_truth})
        validation_writer.add_summary(validation_summary, i)
        print('%s: Step %d: Validation accuracy = %.1f%% (N=%d)' %
              (datetime.now(), i, validation_accuracy * 100,
               len(validation_bottlenecks)))

# We've completed all our training, so run a final test evaluation on
# some new images we haven't used before.
test_bottlenecks, test_ground_truth, test_filenames = (
    helper.get_random_cached_bottlenecks(sess, image_lists, test_batch_size,
                                         'testing', bottleneck_dir,
                                         image_dir, jpeg_data_tensor,
                                         bottleneck_tensor))
test_accuracy, predictions = sess.run(
    [evaluation_step, prediction],
    feed_dict={bottleneck_input: test_bottlenecks,
               ground_truth_input: test_ground_truth})
print('Final test accuracy = %.1f%% (N=%d)' % (
    test_accuracy * 100, len(test_bottlenecks)))

if print_misclassified_test_images:
    print('=== MISCLASSIFIED TEST IMAGES ===')
    for i, test_filename in enumerate(test_filenames):
        if predictions[i] != test_ground_truth[i].argmax():
            # list(...) keeps this working on Python 3, where keys() is not indexable.
            print('%70s  %s' % (test_filename, list(image_lists.keys())[predictions[i]]))

# Write out the trained graph and labels with the weights stored as constants.
output_graph_def = graph_util.convert_variables_to_constants(
    sess, graph.as_graph_def(), [final_tensor_name])
with gfile.FastGFile(output_graph, 'wb') as f:
    f.write(output_graph_def.SerializeToString())
with gfile.FastGFile(output_labels, 'w') as f:
    f.write('\n'.join(image_lists.keys()) + '\n')
    


The final test accuracy is **~85%** for our two classes, **House** and **House + Pool**, which is substantial given that our training set contained only approximately 300 images for each class. This is where transfer learning really shines. All we did was reuse the trained Inception model, which had already learned basic features such as lines and shapes, along with other features that increase in abstraction as we move towards the final layers of the model. We retrained only the final classification layer on training images of houses and houses with pools, and the model classified them using its pre-learned features.
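
For readers who want to see how an accuracy figure like this is derived, here is a minimal sketch that compares predicted class indices with one-hot ground truth, in the same spirit as the misclassified-image check in the cell above. The label order and the sample values are made up for illustration; they are not results from this notebook.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(predictions, ground_truth):
    """Fraction of predictions that match the one-hot ground truth."""
    correct = sum(1 for p, truth in zip(predictions, ground_truth)
                  if p == argmax(truth))
    return correct / float(len(predictions))

labels = ['House', 'Pool']                        # hypothetical label order
predictions = [0, 1, 1, 0]                        # predicted class indices
ground_truth = [[1, 0], [0, 1], [1, 0], [1, 0]]   # one-hot true labels

print('accuracy = %.1f%%' % (accuracy(predictions, ground_truth) * 100))
for i, p in enumerate(predictions):
    if p != argmax(ground_truth[i]):
        print('sample %d misclassified as %s' % (i, labels[p]))
```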

## Want to give it a try ?

We have a couple of images you can use to test the model, or you can download your own from your favorite web site.

**image-01-resized.jpeg**  <img src="../data/test_images/image-01.jpeg" >
**image-02-resized.jpeg**  <img src="../data/test_images/image-02.jpeg" >
**image-03-resized.jpeg**  <img src="../data/test_images/image-03.jpeg" >
**image-04-resized.jpeg**  <img src="../data/test_images/image-04.jpeg" >
**image-05-resized.jpeg**  <img src="../data/test_images/image-05.jpeg" >
**image-06-resized.jpeg**  <img src="../data/test_images/image-06.jpeg" >
**image-07-resized.jpeg**  <img src="../data/test_images/image-07.jpeg" >


### Run the inference engine
Open a terminal window from the Jupyter Notebook. <br>
The **samples** directory includes some test images. 

<img src="../doc/source/images/terminal-access.png" >

To run the inference engine use the following command: <br><br>
**python test-new.py test_images/image-01.jpeg**
<br><br>
You can also download your own set of images and run them against the inference engine.
<br>

### Run the inference engine against the **original** Inception V3 model

In [None]:
%run ../scripts/test-orig.py ../data/test_images/image-03.jpeg

This is the expected result, as the original Inception V3 model has not been trained on the new image dataset.

### Run the inference engine against new model

In [None]:
%run ../scripts/test-new.py ../data/test_images/image-03.jpeg

Yeah, the new model is able to properly classify the images...

## Conclusion
I hope that you are now able to apply pre-trained models to your own problem statements. Make sure that the pre-trained model you select was trained on a data set similar to the one you want to use it on. People have tried various architectures on different types of data sets, and I strongly encourage you to go through these architectures and apply them to your own problem statements.
