In [None]:
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.

# Transfer Learning in TensorFlow with Inception V3

## Introduction

Transfer learning is the process of taking a pre-trained model (the weights and parameters of a network that somebody else has trained on a large dataset) and “fine-tuning” it with your own dataset. The idea is that the pre-trained model acts as a feature extractor. You remove the last layer of the network and replace it with your own classifier (depending on your problem space). You then freeze the weights of all the other layers and train the network normally. Freezing a layer means that its weights are not changed during gradient descent/optimization.

For this experiment we used Google's pre-trained Inception v3 model for image classification. This model consists of two parts:

- A feature extraction part with a convolutional neural network.
- A classification part with fully-connected and softmax layers.

The pre-trained Inception v3 model achieves high accuracy when recognizing general objects across 1000 classes. The model extracts general features from input images in the first part and classifies them based on those features in the second part.

We will use this pre-trained model and re-train it to classify houses with or without swimming pools.

The following chart shows how the data flows in the Inception v3 model, which is a Convolutional Neural Network with many layers and a complicated structure. 

<img src="../doc/source/images/inception_flowchart.png">

In transfer learning, when you build a new model to classify your original dataset, you reuse the feature extraction part and re-train the classification part with your dataset. Since you don't have to train the feature extraction part (which is the most complex part of the model), you can train the model with less computational resources and training time.

<img src="../doc/source/images/inception_transfer_learning.png">
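
To make the idea concrete, here is a minimal, framework-free sketch of the same pattern: a fixed feature extractor whose output feeds a small trainable softmax classifier. The function `frozen_features` and the toy data are hypothetical stand-ins, not part of Inception or of this notebook; only the classifier's weights and biases are updated.

```python
import math

def frozen_features(x):
    """Stand-in for the pre-trained feature extractor: fixed, never updated."""
    return [x[0] + x[1], x[0] - x[1]]

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logits_for(weights, biases, feats):
    return [sum(w * f for w, f in zip(row, feats)) + b
            for row, b in zip(weights, biases)]

def train_classifier(samples, labels, num_classes, epochs=200, lr=0.5):
    # Features are computed once up front, much like cached bottlenecks.
    feats = [frozen_features(x) for x in samples]
    dim = len(feats[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            probs = softmax(logits_for(weights, biases, f))
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit c is (p_c - 1[c == y]).
                grad = probs[c] - (1.0 if c == y else 0.0)
                biases[c] -= lr * grad
                for j in range(dim):
                    weights[c][j] -= lr * grad * f[j]
    return weights, biases

def predict(weights, biases, x):
    scores = logits_for(weights, biases, frozen_features(x))
    return scores.index(max(scores))

# Toy two-class data (made up for illustration).
samples = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
labels = [0, 0, 1, 1]
weights, biases = train_classifier(samples, labels, num_classes=2)
print([predict(weights, biases, x) for x in samples])
```

Because the extractor never changes, its output can be computed once per sample and reused in every training step, which is why this approach is cheap compared to training the whole network.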

## Dataset

For this experiment, we built two small image datasets (fewer than 600 images) -- one with images of houses without swimming pools and another one with images of houses with swimming pools.

After downloading the images, we took an extra step to visualize the images and remove the false positives. All the images were then saved in two different directories identifying the proper classification.

In the public GitHub repo we only provided a subset of the images, but we also provided bottleneck files to represent the rest of the images in the dataset. It might be worth noting that most of the bottleneck files represent aerial view images. It would not be surprising if we recognize pools better from above.

### Resizing

The raw images need to be resized to 299 x 299. The notebook code will resize the raw images into a working directory. We can also reuse images that have already been resized, like the one below, which is stored in a folder in the repo. If you have your own large dataset, you might want to resize the images once and store the resized copies to use instead of the raw images.

#### For example, running the notebook resizes this image:

<img src="../data/images/house_with_pool/house-429353_960_720.jpg">

#### To this 299 x 299 image: 

<img src="../data/images_resized/house_with_pool/house-429353_960_720.jpg">
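
The arithmetic behind a "contain"-style resize (scale the image to fit inside the box while keeping its aspect ratio, then center it) can be sketched as follows. This is an illustration only: `contain_geometry` is a hypothetical helper, the 960 x 720 input is an assumed size, and the resizing library used later in the notebook may round differently.

```python
def contain_geometry(src_w, src_h, box_w=299, box_h=299):
    """Return (new_w, new_h, offset_x, offset_y) for fitting an image in a box."""
    # Integer cross-multiplication avoids floating point rounding surprises.
    if src_w * box_h >= src_h * box_w:
        # Width is the limiting side.
        new_w = box_w
        new_h = max(1, src_h * box_w // src_w)
    else:
        # Height is the limiting side.
        new_h = box_h
        new_w = max(1, src_w * box_h // src_h)
    # Center the scaled image inside the box.
    offset_x = (box_w - new_w) // 2
    offset_y = (box_h - new_h) // 2
    return new_w, new_h, offset_x, offset_y

print(contain_geometry(960, 720))  # (299, 224, 0, 37)
```

A landscape image therefore fills the full width and is padded above and below, which matches the letterboxed look of the resized example.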

# Install Python packages
The PowerAI TensorFlow environment already includes TensorFlow and PIL, but we need python-resize-image for the image resizing step.
Run this cell at least once. You might need to restart your kernel after the install (use the Kernel menu).
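
If you are unsure whether the install (or a kernel restart) is still needed, a quick check along these lines may help. It is a sketch that assumes Python 3.4 or later; `is_installed` is a hypothetical helper, and the module names are the import names used in the Imports cell.

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None

for name in ("tensorflow", "PIL", "resizeimage"):
    print(name, "available" if is_installed(name) else "missing")
```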

In [None]:
!pip install python-resize-image==1.1.11

# Imports
We put all the imports at the top of the code, because this is what most Python developers would expect.

In [None]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from datetime import datetime
import os.path
import shutil
import sys

from PIL import Image
from resizeimage import resizeimage
import tensorflow as tf
from tensorflow.python.framework import graph_util
from tensorflow.python.platform import gfile

# Import image retraining function definitions

The image_retraining example module from TensorFlow can be used from a notebook by importing it and calling the functions directly. A FLAGS object is used in the module. We just create one and set it in the `Parameters` section.
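
The pattern of supplying a settings object to a module from outside can be sketched without TensorFlow. Everything in this example (`toy_module`, `describe_run`) is a hypothetical stand-in rather than the retraining module's real API; it only shows that functions reading a module-level `FLAGS` pick up whatever object is assigned before they are called.

```python
import types

# Hypothetical stand-in for a module whose functions read a module-level FLAGS.
toy_module = types.ModuleType("toy_retrain")
toy_module.FLAGS = None  # normally filled in by a command-line parser

def describe_run():
    # FLAGS is looked up at call time, so it can be assigned later.
    flags = toy_module.FLAGS
    return "lr=%s steps=%d" % (flags.learning_rate,
                               flags.how_many_training_steps)

toy_module.describe_run = describe_run

# Any object with the expected attributes will do.
toy_module.FLAGS = types.SimpleNamespace(learning_rate=0.01,
                                         how_many_training_steps=500)
print(toy_module.describe_run())  # lr=0.01 steps=500
```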

In [None]:
module_path = os.path.abspath('..')
if module_path not in sys.path:
    sys.path.append(module_path)

from image_retraining import retrain

# Parameters

Many of the parameters can be changed if you choose to experiment with different images and training settings.
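
Two of these parameters, `testing_percentage` and `validation_percentage`, control how the images are divided into training, validation, and testing sets. One common way to make such a split repeatable is to hash each file name, so that a given image always lands in the same set. The sketch below illustrates that idea with a hypothetical `assign_split` helper; the retraining module's own implementation may differ in its details.

```python
import hashlib

def assign_split(file_name, validation_percentage=10, testing_percentage=10):
    """Deterministically assign a file to 'training', 'validation' or 'testing'."""
    digest = hashlib.sha1(file_name.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable value in the range 0..99
    if bucket < validation_percentage:
        return "validation"
    if bucket < validation_percentage + testing_percentage:
        return "testing"
    return "training"

# Hypothetical file names, used only to show the call.
for name in ("pool_001.jpg", "pool_002.jpg", "house_001.jpg"):
    print(name, "->", assign_split(name))
```

Because the assignment depends only on the file name, adding new images later does not move existing images between sets.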

In [None]:
# Set DEBUG to True for more output
DEBUG = False

# Expect image files to always end with one of these
JPEG_EXTENSIONS = ('.jpeg', '.JPEG', '.jpg', '.JPG')

# Raw input images come from this dir in the git repo (or you can customize this to point to a new dir).
# Only JPEG images are used. We will resize these images before using them.
image_dir = '../data/images'

# We kept some images separate for our manual testing at the end.
test_images_dir = '../data/test_images'

# If stored_images_resized is set, images here have already been resized and can be used w/o re-resizing
stored_images_resized = '../data/images_resized'  # set to None to ignore

# If stored_bottlenecks, supplement the image_dir collection with persisted bottlenecks from this dir
stored_bottlenecks = '../data/bottlenecks'  # set to None to ignore

# Working files are in /tmp by default
tmp_dir = '/tmp'
bottleneck_dir = os.path.join(tmp_dir, 'bottlenecks')
images_resized_dir = os.path.join(tmp_dir, 'images_resized')
summaries_dir = os.path.join(tmp_dir, 'retrain_logs')

# Download the original inception model to/from here
model_dir = os.path.join(tmp_dir, 'inception')
inception_url = 'http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz'

# Store the graph before and after training
output_graph_orig = "output_graph_orig.pb"
output_graph = "output_graph.pb"
output_labels = "output_labels.txt"

# Training params
architecture = 'inception_v3'
final_tensor_name = "final_result"
how_many_training_steps = 500
learning_rate = 0.01
testing_percentage = 10
validation_percentage = 10
eval_step_interval = 10
train_batch_size = 100
test_batch_size = -1
validation_batch_size = 100
print_misclassified_test_images = False

# Since we are using persisted bottleneck files, we won't play with distortion.
# Distortion would have limited impact with our small set of image files.
flip_left_right = False
random_crop = 0
random_scale = 0
random_brightness = 0

# Download once and re-use by default
force_inception_download = False

# Create a FLAGS object with these attributes
FLAGS = type('FlagsObject', (object,), {
    'architecture': architecture,
    'model_dir': model_dir,
    'intermediate_store_frequency': 0,
    'summaries_dir': summaries_dir,
    'learning_rate': learning_rate,
    'image_dir': images_resized_dir,
    'testing_percentage': testing_percentage,
    'validation_percentage': validation_percentage,
    'random_scale': random_scale,
    'random_crop': random_crop,
    'flip_left_right': flip_left_right,
    'random_brightness': random_brightness,
    'bottleneck_dir': bottleneck_dir,
    'final_tensor_name': final_tensor_name,
    'how_many_training_steps': how_many_training_steps,
    'train_batch_size': train_batch_size,
    'test_batch_size': test_batch_size,
    'eval_step_interval': eval_step_interval,
    'validation_batch_size': validation_batch_size,
    'print_misclassified_test_images': print_misclassified_test_images,
    'output_graph': output_graph,
    'output_labels': output_labels
})

# Setting the FLAGS in retrain allows us to call the functions directly
retrain.FLAGS = FLAGS

# Download the Inception model

In [None]:
# Download the Inception model once and reuse it.
# Set force_inception_download = True to delete any existing copy and download it again.
if force_inception_download and os.path.isdir(model_dir):    
    shutil.rmtree(model_dir)
retrain.maybe_download_and_extract()

# Prepare the images

The Inception model requires 299 x 299 pixel images.
First copy the files from `stored_images_resized` into `images_resized_dir`.
With these stored images that are already resized, we don't need to repeat the process.
Next copy and resize the remaining raw images from `image_dir` into `images_resized_dir`.

In [None]:
def resize_images(src_dir, dest_dir):
    if not os.path.isdir(src_dir):
        raise Exception(src_dir + " is not a directory")
    if not os.path.exists(dest_dir):
        os.mkdir(dest_dir)

    raw_images = {image for image in os.listdir(src_dir) if image.endswith(
        JPEG_EXTENSIONS)}
    dest_images = {image for image in os.listdir(dest_dir)}

    # Resize the ones that are not already in the dest dir
    for image in raw_images - dest_images:
        if DEBUG:
            print("Resizing " + image)
        resize_image(image, src_dir, dest_dir)


def resize_image(image_file, src_dir, dest_dir):
    in_file = os.path.join(src_dir, image_file)
    with open(in_file, 'rb') as fd_img:
        with Image.open(fd_img) as img:
            resized_image = resizeimage.resize_contain(
                img, [299, 299]).convert("RGB")
            resized_image.save(os.path.join(dest_dir, image_file), img.format)

# Use a fresh working dir for the resized images
if os.path.isdir(images_resized_dir):
    shutil.rmtree(images_resized_dir)
os.mkdir(images_resized_dir)
    
subdirs = ('house_with_pool', 'house_without_pool')

# Copy in the image files
for subdir in subdirs:
    dest_dir = os.path.join(images_resized_dir, subdir)
    if not os.path.isdir(dest_dir):
        os.mkdir(dest_dir)
      
    # Copy the already resized files first, if any, from the repo or a custom dir
    if stored_images_resized:
        source_dir = os.path.join(stored_images_resized, subdir)
        if os.path.isdir(source_dir):
            for f in os.listdir(source_dir):
                path = os.path.join(source_dir, f)
                if (os.path.isfile(path)):
                    shutil.copy(path, dest_dir)
                    
    # Copy/resize the remaining raw images into the images_resized_dir(s)
    resize_images(os.path.join(image_dir, subdir), dest_dir)

# Visualize dataset images

Since Jupyter notebooks are great at showing markdown documentation as well as code and output, we can look at some of the images here.

To visualize a different image, double-click the displayed image below to show its markdown text, then change the image file name to display another one.

Some false positives have been removed from our dataset, but it is still interesting to see which images are harder to classify. Lakes and ponds would be something to look into. Some of the lower confidence numbers seem to come from shapes that resemble pools (and not bodies of water).

Removing false positives can often help the training, but if we also want to classify those images with confidence, we might just need a bigger dataset with a good number of relevant examples to learn from.

<img src="../data/images/house_without_pool/giethoorn-2368494__340.jpg">

# Copy stored bottleneck files
Many previously calculated bottleneck files are stored in `stored_bottlenecks`
to increase our dataset size and reduce processing time. Here we copy them into the working `bottleneck_dir`.
For each bottleneck we also create a placeholder image file so that it is included in our image lists for training, validation,
and testing. The placeholder contents won't be used because the stored bottleneck is used instead.
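
The placeholder step can be tried in isolation with throwaway directories. This sketch assumes, as the notebook code does, that each bottleneck file is named after its image with `.txt` appended; `create_placeholders` and the file name `example.jpg.txt` are hypothetical.

```python
import os
import tempfile

def create_placeholders(bottleneck_subdir, image_subdir, suffix=".txt"):
    """Create an empty image file for every stored bottleneck file."""
    created = []
    for name in sorted(os.listdir(bottleneck_subdir)):
        if not name.endswith(suffix):
            continue
        placeholder = os.path.join(image_subdir, name[:-len(suffix)])
        # Append mode creates the file if missing and keeps existing content.
        with open(placeholder, "a"):
            pass
        created.append(placeholder)
    return created

# Demonstration in temporary directories.
work_dir = tempfile.mkdtemp()
bottlenecks = os.path.join(work_dir, "bottlenecks")
images = os.path.join(work_dir, "images")
os.mkdir(bottlenecks)
os.mkdir(images)
open(os.path.join(bottlenecks, "example.jpg.txt"), "w").close()

created = create_placeholders(bottlenecks, images)
print([os.path.basename(p) for p in created])  # ['example.jpg']
```

Opening in append mode matters when a real image with the same name is already present: the existing file is left untouched.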

In [None]:
# Use a fresh working dir for the bottleneck files  
if os.path.isdir(bottleneck_dir):    
    shutil.rmtree(bottleneck_dir)
os.mkdir(bottleneck_dir)

subdirs = ('house_with_pool', 'house_without_pool')

# Copy in the stored bottleneck files
for subdir in subdirs:
    dest_dir = os.path.join(bottleneck_dir, subdir)
    if not os.path.isdir(dest_dir):
        os.mkdir(dest_dir)

    image_dest_dir = os.path.join(images_resized_dir, subdir)

    if stored_bottlenecks:
        source_dir = os.path.join(stored_bottlenecks, subdir)
        if os.path.isdir(source_dir):
            for f in os.listdir(source_dir):
                path = os.path.join(source_dir, f)
                if (os.path.isfile(path)):
                    # Copy the persisted bottleneck to bottlenecks dir
                    shutil.copy(path, dest_dir)
                    # "touch" the file (w/o the .txt) to create a placeholder image
                    # This image file will only be used to build the lists.
                    if DEBUG:
                        print("Creating placeholder image at %s" % os.path.join(image_dest_dir, f[:-4]))
                    open(os.path.join(image_dest_dir, f[:-4]), 'a').close()
                        

# Retraining

The following code demonstrates how to take an Inception v3 architecture model trained on
ImageNet images, and train a new top layer that can recognize other classes of
images.

The top layer receives as input a 2048-dimensional vector for each image. We
train a softmax layer on top of this representation. Assuming the softmax layer
contains N labels, this corresponds to learning N + 2048*N model parameters:
N biases and 2048*N weights.
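
As a quick check of that count, the arithmetic can be written out directly. The helper name is hypothetical; the 2048 default comes from the bottleneck size stated above.

```python
def softmax_layer_parameter_count(num_labels, bottleneck_size=2048):
    """Count trainable parameters of a softmax layer on top of the bottleneck."""
    weights = bottleneck_size * num_labels  # one weight per input per label
    biases = num_labels                     # one bias per label
    return weights + biases

print(softmax_layer_parameter_count(2))  # 4098
```

For our two labels that is only 4098 trainable parameters, which helps explain why a small dataset can be enough.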

We have a folder with two subfolders called **house_with_pool** and **house_without_pool**.
JPEG images have been selected for training and placed in the proper folder.
The subfolder names are important, since they define what label is applied to each image, but the filenames themselves don't matter. The label for each image is taken from the name of the subfolder it's in. Retraining produces a new model file that can be loaded and run by any TensorFlow program.
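
The folder-to-label convention can be sketched with a small directory walk. This is a simplified illustration with a hypothetical `images_by_label` helper and made-up file names; the retraining module's own list-building function also splits the images into sets and may normalize the label names.

```python
import os
import tempfile

JPEG_EXTENSIONS = ('.jpeg', '.JPEG', '.jpg', '.JPG')

def images_by_label(root_dir):
    """Map each subfolder name (the label) to the sorted JPEG files inside it."""
    result = {}
    for label in sorted(os.listdir(root_dir)):
        label_dir = os.path.join(root_dir, label)
        if not os.path.isdir(label_dir):
            continue
        files = sorted(f for f in os.listdir(label_dir)
                       if f.endswith(JPEG_EXTENSIONS))
        if files:
            result[label] = files
    return result

# Build a throwaway folder tree to demonstrate the mapping.
root = tempfile.mkdtemp()
layout = (("house_with_pool", ["a.jpg", "notes.txt"]),
          ("house_without_pool", ["b.JPG"]))
for label, names in layout:
    os.mkdir(os.path.join(root, label))
    for file_name in names:
        open(os.path.join(root, label, file_name), "w").close()

listing = images_by_label(root)
print(listing)
```

Non-JPEG files such as `notes.txt` are ignored, and the file names themselves play no part in the label.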

In addition to the small sample of images, we have a larger set of bottlenecks. These were captured from
images used in earlier runs and will be used here to increase the size of the dataset.

## Main function

In [None]:
  # Setup the directory we'll write summaries to for TensorBoard
  if tf.gfile.Exists(FLAGS.summaries_dir):
    tf.gfile.DeleteRecursively(FLAGS.summaries_dir)
  tf.gfile.MakeDirs(FLAGS.summaries_dir)

  # Set up the pre-trained graph.
  graph, bottleneck_tensor, jpeg_data_tensor, resized_image_tensor = (
      retrain.create_inception_graph())

  # Look at the folder structure, and create lists of all the images.
  # This is why we use placeholder images when we reuse bottleneck files.
  image_lists = retrain.create_image_lists(FLAGS.image_dir, FLAGS.testing_percentage,
                                   FLAGS.validation_percentage)
  class_count = len(image_lists.keys())
  if class_count == 0:
    raise Exception('No valid folders of images found at ' + FLAGS.image_dir)
  if class_count == 1:
    raise Exception('Only one valid folder of images found at ' + FLAGS.image_dir +
          ' - multiple classes are needed for classification.')

  with tf.Session(graph=graph) as sess:

    # Calculate and cache bottleneck files based on the resized images
    retrain.cache_bottlenecks(sess, image_lists, FLAGS.image_dir,
                    FLAGS.bottleneck_dir, jpeg_data_tensor,
                    bottleneck_tensor)

    # Add the new layer that we'll be training.
    (train_step, cross_entropy, bottleneck_input, ground_truth_input,
     final_tensor) = retrain.add_final_training_ops(len(image_lists.keys()),
                                            FLAGS.final_tensor_name,
                                            bottleneck_tensor)

    # Create the operations we need to evaluate the accuracy of our new layer.
    evaluation_step, prediction = retrain.add_evaluation_step(
        final_tensor, ground_truth_input)

    # Merge all the summaries and write them out to the summaries_dir
    merged = tf.summary.merge_all()
    train_writer = tf.summary.FileWriter(FLAGS.summaries_dir + '/train',
                                         sess.graph)

    validation_writer = tf.summary.FileWriter(
        FLAGS.summaries_dir + '/validation')

    # Set up all our weights to their initial default values.
    init = tf.global_variables_initializer()
    sess.run(init)
    
    # Save the original graph, so we can compare results later!
    output_graph_def = graph_util.convert_variables_to_constants(
        sess, graph.as_graph_def(), [final_tensor_name])
    with gfile.FastGFile(output_graph_orig, 'wb') as f:
        f.write(output_graph_def.SerializeToString())

    # Run the training!
    for i in range(FLAGS.how_many_training_steps):

      (train_bottlenecks, train_ground_truth, _) = retrain.get_random_cached_bottlenecks(
             sess, image_lists, FLAGS.train_batch_size, 'training',
             FLAGS.bottleneck_dir, FLAGS.image_dir, jpeg_data_tensor,
             bottleneck_tensor)
    
      # Feed the bottlenecks and ground truth into the graph, and run a training
      # step. Capture training summaries for TensorBoard with the `merged` op.
      train_summary, _ = sess.run(
          [merged, train_step],
          feed_dict={bottleneck_input: train_bottlenecks,
                     ground_truth_input: train_ground_truth})
      train_writer.add_summary(train_summary, i)

      # Every so often, print out how well the graph is training.
      is_last_step = (i + 1 == FLAGS.how_many_training_steps)
      if (i % FLAGS.eval_step_interval) == 0 or is_last_step:
        train_accuracy, cross_entropy_value = sess.run(
            [evaluation_step, cross_entropy],
            feed_dict={bottleneck_input: train_bottlenecks,
                       ground_truth_input: train_ground_truth})
        print('%s: Step %d: Train accuracy = %.1f%%' % (datetime.now(), i,
                                                        train_accuracy * 100))
        print('%s: Step %d: Cross entropy = %f' % (datetime.now(), i,
                                                   cross_entropy_value))
        validation_bottlenecks, validation_ground_truth, _ = (
            retrain.get_random_cached_bottlenecks(
                sess, image_lists, FLAGS.validation_batch_size, 'validation',
                FLAGS.bottleneck_dir, FLAGS.image_dir, jpeg_data_tensor,
                bottleneck_tensor))
        # Run a validation step and capture validation summaries for TensorBoard
        # with the `merged` op.
        validation_summary, validation_accuracy = sess.run(
            [merged, evaluation_step],
            feed_dict={bottleneck_input: validation_bottlenecks,
                       ground_truth_input: validation_ground_truth})
        validation_writer.add_summary(validation_summary, i)
        print('%s: Step %d: Validation accuracy = %.1f%% (N=%d)' %
              (datetime.now(), i, validation_accuracy * 100,
               len(validation_bottlenecks)))

    # We've completed all our training, so run a final test evaluation on
    # some new images we haven't used before.
    test_bottlenecks, test_ground_truth, test_filenames = (
        retrain.get_random_cached_bottlenecks(sess, image_lists, FLAGS.test_batch_size,
                                      'testing', FLAGS.bottleneck_dir,
                                      FLAGS.image_dir, jpeg_data_tensor,
                                      bottleneck_tensor))
    test_accuracy, predictions = sess.run(
        [evaluation_step, prediction],
        feed_dict={bottleneck_input: test_bottlenecks,
                   ground_truth_input: test_ground_truth})
    print('Final test accuracy = %.1f%% (N=%d)' % (
        test_accuracy * 100, len(test_bottlenecks)))

    if FLAGS.print_misclassified_test_images:
      print('=== MISCLASSIFIED TEST IMAGES ===')
      for i, test_filename in enumerate(test_filenames):
        if predictions[i] != test_ground_truth[i].argmax():
          print('%70s  %s' % (test_filename,
                              list(image_lists.keys())[predictions[i]]))

    # Write out the trained graph and labels with the weights stored as
    # constants.
    output_graph_def = graph_util.convert_variables_to_constants(
        sess, graph.as_graph_def(), [FLAGS.final_tensor_name])
    with gfile.FastGFile(FLAGS.output_graph, 'wb') as f:
      f.write(output_graph_def.SerializeToString())
    with gfile.FastGFile(FLAGS.output_labels, 'w') as f:
      f.write('\n'.join(image_lists.keys()) + '\n')

The final test accuracy is **~85%** for our two classes, **house_with_pool** and **house_without_pool**, which is a strong result given that our training set contained fewer than 600 images. This is where transfer learning really shines. We used the trained Inception model, which had already learned to recognize lines, shapes, and other features that increase in abstraction as we move toward the final layers of the model. We only had to retrain the final layer, using our training images of houses with and without pools.

# Want to give it a try?

We added some test images that you can use to test the model or you can download your own.

### Test images with pools:
<img src="../data/test_images/house_with_pool/home-2008825__340.jpg">
<img src="../data/test_images/house_with_pool/villa-2366288__340.jpg">
### Test images without pools:
<img src="../data/test_images/house_without_pool/holiday-house-177401__340.jpg">
<img src="../data/test_images/house_without_pool/weathered-2139859__340.jpg">

# Before and after

### Run the inference engine with the original graph file and then with the retrained graph

In [None]:
# Test with the test_images subdirs
for graph in (output_graph_orig, output_graph):
    print("\nTesting with graph=%s\n" % graph)
    for subdir in ('house_with_pool', 'house_without_pool'):
        test_dir = os.path.join(test_images_dir, subdir)
        for f in os.listdir(test_dir):
            if f.endswith(JPEG_EXTENSIONS):
                tf.reset_default_graph()
                image = os.path.join(test_dir, f)
                print(image)
                %run ../image_retraining/label_image.py --image=$image --graph=$graph --labels=$output_labels

### Results

The original results are not much better than a coin flip. This is the expected result as the Inception V3 model has not been trained for houses with or without pools.

The new graph classifies the images correctly and with significant confidence.
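
When reading the per-image output, it helps to look at the scores for all labels side by side. The sketch below shows one way to rank labels by score; `rank_labels` is a hypothetical helper, and the labels and scores are made-up values for illustration, not results from this notebook.

```python
def rank_labels(labels, scores):
    """Pair labels with scores and sort from most to least confident."""
    return sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)

# Made-up softmax scores for a single image (they sum to 1).
example_labels = ["house without pool", "house with pool"]
example_scores = [0.07, 0.93]
for label, score in rank_labels(example_labels, example_scores):
    print("%s (score = %.2f)" % (label, score))
```

With two classes, a score near 0.5 corresponds to the "coin flip" case, while a score close to 1.0 indicates a confident prediction.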

# Conclusion
I hope that you are now able to apply pre-trained models to your own problems. Be sure that the pre-trained model you select was trained on a dataset similar to the one you want to use it on. People have tried various architectures on different types of datasets, and I strongly encourage you to explore these architectures and apply them to your own problem statements.
