# Retrain
This tutorial demonstrates transfer learning by building an image classifier that recognizes candies. It uses the pre-trained [MobileNet](https://ai.googleblog.com/2017/06/mobilenets-open-source-models-for.html) convolutional neural network with an input image resolution of 224px and a model 50% the size of the largest MobileNet network. Reusable MobileNet components (modules) are available on [TensorFlow Hub](https://www.tensorflow.org/hub/modules/image), along with modules for other pre-trained image recognition networks such as Inception and ResNet.

This tutorial is heavily inspired by the [TensorFlow for Poets](https://codelabs.developers.google.com/codelabs/tensorflow-for-poets) codelab, and the source is an adaptation of the code provided in the [retrain.py](https://github.com/googlecodelabs/tensorflow-for-poets-2/blob/master/scripts/retrain.py) file of that codelab. A more detailed explanation is available as a [TensorFlow tutorial on Transfer Learning](https://www.tensorflow.org/tutorials/image_retraining).

The source code of this tutorial is available on [GitHub](https://github.com/dan-anghel/datalab-ml).

## Preliminaries
Please run the code in the box below to import all required modules and to create the **gs://[GCP_PROJECT_ID]-image-classifier** bucket in which the trained model will be stored.
<p>
Display the content of the box if you want to have a closer look at the code.

In [4]:
import collections
import hashlib
import numpy as np
import os.path
import random
import re
import sys
import tarfile
import tensorflow as tf

from datetime import datetime
from six.moves import urllib
from tensorflow.python.framework import graph_util
from tensorflow.python.framework import tensor_shape
from tensorflow.python.platform import gfile
from tensorflow.python.util import compat

MAX_NUM_IMAGES_PER_CLASS = 2 ** 27 - 1  # ~134M

tf.logging.set_verbosity(tf.logging.INFO)

project_id = datalab_project_id()
bucket = 'gs://%s-image-classifier' % project_id
!gsutil mb $bucket

Creating gs://intelligent-candy-image-classifier/...
ServiceException: 409 Bucket intelligent-candy-image-classifier already exists.


## Visualize the data
We will train our classifier to recognize images of the following classes of candies: almondjoy, bounty, dove, mars, milkyway, snickers, threemusketeers, twix. Let's have a look at a few samples from the dataset.
<p>
Please run the code in the box below to display 4 random samples of images of each class. Run the code multiple times to get a sense of the dataset.

In [5]:
import IPython
import base64
import pandas as pd

from cStringIO import StringIO
from tensorflow.python.lib.io import file_io as tf_file_io


def resize_image(image_str_tensor):
  """Decodes a JPEG string, resizes it and re-encodes it as JPEG."""
  import tensorflow as tf
  
  # These constants are set by Inception v3's expectations.
  height = 299
  width = 299
  channels = 3

  image = tf.image.decode_jpeg(image_str_tensor, channels=channels)
  # Note resize expects a batch_size, but tf.map_fn suppresses that index,
  # thus we have to expand then squeeze.  Resize returns float32 in the
  # range [0, uint8_max]
  image = tf.expand_dims(image, 0)
  image = tf.image.resize_bilinear(image, [height, width], align_corners=False)
  image = tf.squeeze(image, squeeze_dims=[0])
  image = tf.cast(image, dtype=tf.uint8)
  image = tf.image.encode_jpeg(image, quality=100)
  return image


def display_images(image_files):
  """Displays the given images in an HTML table, four images per row."""
  
  images = []
  for image_file in image_files:
    with tf_file_io.FileIO(image_file, 'r') as ff:
      images.append(ff.read())

  # To resize, run a tf session that maps 'resize_image()' over the raw
  # image strings read above.
  image_str_tensor = tf.placeholder(tf.string, shape=[None])
  image = tf.map_fn(resize_image, image_str_tensor, back_prop=False)
  feed_dict = collections.defaultdict(list)
  feed_dict[image_str_tensor.name] = images
  with tf.Session() as sess:
    images_resized = sess.run(image, feed_dict=feed_dict)

  html = '<table>'
  for i, image in enumerate(images_resized):
    encoded_image = base64.b64encode(image)
    image_html = "<td><img src='data:image/jpg;base64, %s'></td>" % encoded_image
    if i % 4 == 0:
      html = html + '<tr>'
    html = html + image_html
    if i % 4 == 3:
      html = html + '</tr>'
  #IPython.display.display(IPython.display.Image(data=image))
  html = html + '</table>'
  IPython.display.display(IPython.display.HTML(html))


train_dataset = 'gs://candies-ml/dataset/v1/metadata/train_candies560.csv'
%storage read --object $train_dataset --variable text
df = pd.read_csv(StringIO(text), names=['image_uri', 'category'])

images = []
categories = df['category'].drop_duplicates().values
for category in categories:
  image_uris = df.loc[df['category'] == category]
  images.extend(image_uris['image_uri'].sample(n=4, replace=False).values)
display_images(images)

## Setup temporary folders
Please run the code in the box below to make sure all temporary folders required during model training are created, most notably the summaries folder that will store data for TensorBoard visualisation of the accuracy and loss during training.
<p>
Display the content of the box if you want to have a closer look at the code.

In [6]:
def ensure_dir_exists(dir_name):
  """Makes sure the folder exists on disk.

  Args:
    dir_name: Path string to the folder we want to create.
  """
  if not os.path.exists(dir_name):
    os.makedirs(dir_name)


def prepare_file_system(summaries_dir,
                        intermediate_output_graphs_dir,
                        intermediate_store_frequency):
  # Setup the directory we'll write summaries to for TensorBoard
  if tf.gfile.Exists(summaries_dir):
    tf.gfile.DeleteRecursively(summaries_dir)
  tf.gfile.MakeDirs(summaries_dir)
  if intermediate_store_frequency > 0:
    ensure_dir_exists(intermediate_output_graphs_dir)
  return

# Prepare the directories that are used during training
summaries_dir='tf_files/training_summaries/mobilenet_0.50_224'
intermediate_output_graphs_dir='/tmp/intermediate_graph/'
intermediate_store_frequency=0
prepare_file_system(summaries_dir=summaries_dir,
                    intermediate_output_graphs_dir=intermediate_output_graphs_dir,
                    intermediate_store_frequency=intermediate_store_frequency)

## Download the model
Please run the code in the box below to download the MobileNet model and load it as a TensorFlow graph for further processing. As mentioned previously, this tutorial uses a MobileNet model with an input image resolution of 224px and a model size 50% of the size of the largest MobileNet network.
<p>
Display the content of the box if you want to have a closer look at the code.

In [7]:
def create_model_info(architecture):
  """Given the name of a model architecture, returns information about it.

  There are different base image recognition pretrained models that can be
  retrained using transfer learning, and this function translates from the name
  of a model to the attributes that are needed to download and train with it.

  Args:
    architecture: Name of a model architecture.

  Returns:
    Dictionary of information about the model, or None if a MobileNet
    architecture name is malformed.

  Raises:
    ValueError: If the architecture is neither 'inception_v3' nor a
    'mobilenet_*' name.
  """
  architecture = architecture.lower()
  if architecture == 'inception_v3':
    # pylint: disable=line-too-long
    data_url = 'http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz'
    # pylint: enable=line-too-long
    bottleneck_tensor_name = 'pool_3/_reshape:0'
    bottleneck_tensor_size = 2048
    input_width = 299
    input_height = 299
    input_depth = 3
    resized_input_tensor_name = 'Mul:0'
    model_file_name = 'classify_image_graph_def.pb'
    input_mean = 128
    input_std = 128
  elif architecture.startswith('mobilenet_'):
    parts = architecture.split('_')
    if len(parts) != 3 and len(parts) != 4:
      tf.logging.error("Couldn't understand architecture name '%s'",
                       architecture)
      return None
    version_string = parts[1]
    if (version_string != '1.0' and version_string != '0.75' and
        version_string != '0.50' and version_string != '0.25'):
      tf.logging.error(
          """The Mobilenet version should be '1.0', '0.75', '0.50', or '0.25',
  but found '%s' for architecture '%s'""",
          version_string, architecture)
      return None
    size_string = parts[2]
    if (size_string != '224' and size_string != '192' and
        size_string != '160' and size_string != '128'):
      tf.logging.error(
          """The Mobilenet input size should be '224', '192', '160', or '128',
 but found '%s' for architecture '%s'""",
          size_string, architecture)
      return None
    if len(parts) == 3:
      is_quantized = False
    else:
      if parts[3] != 'quantized':
        tf.logging.error(
            "Couldn't understand architecture suffix '%s' for '%s'", parts[3],
            architecture)
        return None
      is_quantized = True
    data_url = 'http://download.tensorflow.org/models/mobilenet_v1_'
    data_url += version_string + '_' + size_string + '_frozen.tgz'
    bottleneck_tensor_name = 'MobilenetV1/Predictions/Reshape:0'
    bottleneck_tensor_size = 1001
    input_width = int(size_string)
    input_height = int(size_string)
    input_depth = 3
    resized_input_tensor_name = 'input:0'
    if is_quantized:
      model_base_name = 'quantized_graph.pb'
    else:
      model_base_name = 'frozen_graph.pb'
    model_dir_name = 'mobilenet_v1_' + version_string + '_' + size_string
    model_file_name = os.path.join(model_dir_name, model_base_name)
    input_mean = 127.5
    input_std = 127.5
  else:
    tf.logging.error("Couldn't understand architecture name '%s'", architecture)
    raise ValueError('Unknown architecture', architecture)

  return {
      'data_url': data_url,
      'bottleneck_tensor_name': bottleneck_tensor_name,
      'bottleneck_tensor_size': bottleneck_tensor_size,
      'input_width': input_width,
      'input_height': input_height,
      'input_depth': input_depth,
      'resized_input_tensor_name': resized_input_tensor_name,
      'model_file_name': model_file_name,
      'input_mean': input_mean,
      'input_std': input_std,
  }

def maybe_download_and_extract(data_url, model_dir):
  """Download and extract model tar file.

  If the pretrained model we're using doesn't already exist, this function
  downloads it from the TensorFlow.org website and unpacks it into a directory.

  Args:
    data_url: Web location of the tar file containing the pretrained model.
    model_dir: Local folder in which the tar file is stored and unpacked.
  """
  dest_directory = model_dir
  if not os.path.exists(dest_directory):
    os.makedirs(dest_directory)
  filename = data_url.split('/')[-1]
  filepath = os.path.join(dest_directory, filename)
  if not os.path.exists(filepath):

    def _progress(count, block_size, total_size):
      sys.stdout.write('\r>> Downloading %s %.1f%%' %
                       (filename,
                        float(count * block_size) / float(total_size) * 100.0))
      sys.stdout.flush()

    filepath, _ = urllib.request.urlretrieve(data_url, filepath, _progress)
    print()
    statinfo = os.stat(filepath)
    tf.logging.info('Successfully downloaded {}, {} bytes.'.format(filename, statinfo.st_size))
  tarfile.open(filepath, 'r:gz').extractall(dest_directory)
  
def create_model_graph(model_info, model_dir):
  """Creates a graph from saved GraphDef file and returns a Graph object.

  Args:
    model_info: Dictionary containing information about the model architecture.
    model_dir: Local folder containing the unpacked model files.

  Returns:
    Graph holding the pre-trained network, and various tensors we'll be
    manipulating.
  """
  with tf.Graph().as_default() as graph:
    model_path = os.path.join(model_dir, model_info['model_file_name'])
    with gfile.FastGFile(model_path, 'rb') as f:
      graph_def = tf.GraphDef()
      graph_def.ParseFromString(f.read())
      bottleneck_tensor, resized_input_tensor = (tf.import_graph_def(
          graph_def,
          name='',
          return_elements=[
              model_info['bottleneck_tensor_name'],
              model_info['resized_input_tensor_name'],
          ]))
  return graph, bottleneck_tensor, resized_input_tensor

# Gather information about the model architecture we'll be using.
architecture='mobilenet_0.50_224'
model_info = create_model_info(architecture=architecture)

# Set up the pre-trained graph.
model_dir='tf_files/models/'
maybe_download_and_extract(model_info['data_url'], model_dir)
graph, bottleneck_tensor, resized_image_tensor = (create_model_graph(model_info, model_dir))

>> Downloading mobilenet_v1_0.50_224_frozen.tgz 100.1%()
INFO:tensorflow:Successfully downloaded mobilenet_v1_0.50_224_frozen.tgz, 6308169 bytes.


## Exercise 1: Generate training, validation and test datasets

All the candy images available for this training are located in the Cloud Storage folder **gs://candies-ml/dataset/v1/images**. The images are organized into subfolders by class: almondjoy, bounty, dove, mars, milkyway, snickers, threemusketeers, twix.

Your first task is to split the candy images into 3 distinct subsets:
* Training (70%): the dataset on which cross-entropy loss will be minimized through gradient descent during training
* Validation (15%): the dataset on which the accuracy and the cross entropy of the model will be measured periodically during training
* Test (15%): the dataset on which the final accuracy of the model will be calculated after training

To achieve this, complete the **create_image_lists** function so that each image (the 'base_name' variable) is consistently added to either the training set, the validation set or the test set, based on a value between 0 and 100 computed from a hash of the image file name.

In [None]:
def create_image_lists(image_dir, testing_percentage, validation_percentage):
  """Builds a list of training images from the file system.

  Analyzes the sub folders in the image directory, splits them into stable
  training, testing, and validation sets, and returns a data structure
  describing the lists of images for each label and their paths.

  Args:
    image_dir: String path to a folder containing subfolders of images.
    testing_percentage: Integer percentage of the images to reserve for tests.
    validation_percentage: Integer percentage of images reserved for validation.

  Returns:
    A dictionary containing an entry for each label subfolder, with images split
    into training, testing, and validation sets within each label.
  """
  if not gfile.Exists(image_dir):
    tf.logging.error("Image directory '" + image_dir + "' not found.")
    return None
  result = collections.OrderedDict()
  sub_dirs = gfile.ListDirectory(image_dir)
  sub_dirs = sorted(
    item for item in sub_dirs
    if gfile.IsDirectory(os.path.join(image_dir, item)))
  print(sub_dirs)
  for dir_name in sub_dirs:
    extensions = ['jpg', 'jpeg', 'JPG', 'JPEG']
    file_list = []
    if dir_name == image_dir:
      continue
    tf.logging.info("Looking for images in '" + dir_name + "'")

    for extension in extensions:
      file_glob = os.path.join(image_dir, dir_name, '*.' + extension)
      file_list.extend(gfile.Glob(file_glob))
    if not file_list:
      tf.logging.warning('No files found')
      continue
    if len(file_list) < 20:
      tf.logging.warning(
          'WARNING: Folder has less than 20 images, which may cause issues.')
    elif len(file_list) > MAX_NUM_IMAGES_PER_CLASS:
      tf.logging.warning(
          'WARNING: Folder {} has more than {} images. Some images will '
          'never be selected.'.format(dir_name, MAX_NUM_IMAGES_PER_CLASS))
    label_name = re.sub(r'[^a-z0-9]+', ' ', dir_name.lower())
    training_images = []
    testing_images = []
    validation_images = []
    for file_name in file_list:
      base_name = os.path.basename(file_name)
      hash_name = re.sub(r'_nohash_.*$', '', file_name)
      hash_name_hashed = hashlib.sha1(compat.as_bytes(hash_name)).hexdigest()
      percentage_hash = ((int(hash_name_hashed, 16) % (MAX_NUM_IMAGES_PER_CLASS + 1)) * (100.0 / MAX_NUM_IMAGES_PER_CLASS))
      
      # Based on the percentage_hash probability add the image (base_name)
      # in a consistent manner in either the validation set, test set or
      # training set. 
      # <YOUR CODE HERE>
      raise NotImplementedError()
      
    result[label_name] = {
        'dir': dir_name,
        'training': training_images,
        'testing': testing_images,
        'validation': validation_images,
    }
  return result

image_dir='gs://candies-ml/dataset/v1/images'
testing_percentage=15
validation_percentage=15
image_lists = create_image_lists(image_dir=image_dir,
                                 testing_percentage=testing_percentage,
                                 validation_percentage=validation_percentage)

### Solution
Display the content of the box below to see the solution to the exercise.

In [None]:
def create_image_lists(image_dir, testing_percentage, validation_percentage):
  """Builds a list of training images from the file system.

  Analyzes the sub folders in the image directory, splits them into stable
  training, testing, and validation sets, and returns a data structure
  describing the lists of images for each label and their paths.

  Args:
    image_dir: String path to a folder containing subfolders of images.
    testing_percentage: Integer percentage of the images to reserve for tests.
    validation_percentage: Integer percentage of images reserved for validation.

  Returns:
    A dictionary containing an entry for each label subfolder, with images split
    into training, testing, and validation sets within each label.
  """
  if not gfile.Exists(image_dir):
    tf.logging.error("Image directory '" + image_dir + "' not found.")
    return None
  result = collections.OrderedDict()
  sub_dirs = gfile.ListDirectory(image_dir)
  sub_dirs = sorted(
    item for item in sub_dirs
    if gfile.IsDirectory(os.path.join(image_dir, item)))
  print(sub_dirs)
  for dir_name in sub_dirs:
    extensions = ['jpg', 'jpeg', 'JPG', 'JPEG']
    file_list = []
    if dir_name == image_dir:
      continue
    tf.logging.info("Looking for images in '" + dir_name + "'")

    for extension in extensions:
      file_glob = os.path.join(image_dir, dir_name, '*.' + extension)
      file_list.extend(gfile.Glob(file_glob))
    if not file_list:
      tf.logging.warning('No files found')
      continue
    if len(file_list) < 20:
      tf.logging.warning(
          'WARNING: Folder has less than 20 images, which may cause issues.')
    elif len(file_list) > MAX_NUM_IMAGES_PER_CLASS:
      tf.logging.warning(
          'WARNING: Folder {} has more than {} images. Some images will '
          'never be selected.'.format(dir_name, MAX_NUM_IMAGES_PER_CLASS))
    label_name = re.sub(r'[^a-z0-9]+', ' ', dir_name.lower())
    training_images = []
    testing_images = []
    validation_images = []
    for file_name in file_list:
      base_name = os.path.basename(file_name)
      hash_name = re.sub(r'_nohash_.*$', '', file_name)
      hash_name_hashed = hashlib.sha1(compat.as_bytes(hash_name)).hexdigest()
      percentage_hash = ((int(hash_name_hashed, 16) % (MAX_NUM_IMAGES_PER_CLASS + 1)) * (100.0 / MAX_NUM_IMAGES_PER_CLASS))
      if percentage_hash < validation_percentage:
        validation_images.append(base_name)
      elif percentage_hash < (testing_percentage + validation_percentage):
        testing_images.append(base_name)
      else:
        training_images.append(base_name)
        
    result[label_name] = {
        'dir': dir_name,
        'training': training_images,
        'testing': testing_images,
        'validation': validation_images,
    }
  return result

image_dir='gs://candies-ml/dataset/v1/images'
testing_percentage=15
validation_percentage=15
image_lists = create_image_lists(image_dir=image_dir,
                                 testing_percentage=testing_percentage,
                                 validation_percentage=validation_percentage)
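
The hash-based assignment used in the solution can also be tried in isolation. The block below is a minimal sketch, not the notebook's code: `assign_split` is a hypothetical helper, the file names are made up, and the `_nohash_` suffix handling is left out for brevity. It illustrates that the same file name always lands in the same subset and that the subsets come out close to the requested proportions.

```python
import hashlib

MAX_NUM_IMAGES_PER_CLASS = 2 ** 27 - 1  # same constant as in the notebook


def assign_split(file_name, testing_percentage, validation_percentage):
    """Returns 'validation', 'testing' or 'training' for a file name."""
    digest = hashlib.sha1(file_name.encode('utf-8')).hexdigest()
    # Map the hash to a value between 0 and 100.
    percentage_hash = ((int(digest, 16) % (MAX_NUM_IMAGES_PER_CLASS + 1)) *
                       (100.0 / MAX_NUM_IMAGES_PER_CLASS))
    if percentage_hash < validation_percentage:
        return 'validation'
    elif percentage_hash < testing_percentage + validation_percentage:
        return 'testing'
    return 'training'


# Hypothetical file names, used only for illustration.
file_names = ['snickers_%04d.jpg' % i for i in range(1000)]
splits = [assign_split(name, 15, 15) for name in file_names]
counts = dict((s, splits.count(s))
              for s in ('training', 'validation', 'testing'))
print(counts)
```

Because the assignment depends only on the file name, adding new images later does not move existing images between subsets.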

## Extract bottlenecks for all images
You are now ready to generate the bottlenecks, or image feature vectors, for all images in your training, validation and test datasets. Using the MobileNet TensorFlow graph loaded in the **graph** global variable and the **model_info** variable, which holds the input image resolution expected by the MobileNet model, the code below runs a TensorFlow session on the loaded graph which does the following:
* Converts each image to the expected resolution (224px in our case)
* Extracts the bottleneck features by running the loaded graph on the image
* Saves the bottleneck features in a file for each image
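
Besides resizing, the decoding step in the code below also rescales pixel values using `input_mean` and `input_std` (both 127.5 for MobileNet), and each bottleneck is cached as a comma-separated text file. The following plain-Python sketch illustrates both ideas; the helper names and the bottleneck values are made up for illustration and are not part of the notebook.

```python
def normalize_pixels(pixels, input_mean=127.5, input_std=127.5):
    """Maps raw pixel values in [0, 255] to the range [-1, 1]."""
    return [(p - input_mean) / input_std for p in pixels]


def serialize_bottleneck(values):
    """Joins bottleneck values into the comma-separated cache format."""
    return ','.join(str(v) for v in values)


def parse_bottleneck(text):
    """Reads bottleneck values back from the cache format."""
    return [float(v) for v in text.split(',')]


print(normalize_pixels([0, 127.5, 255]))  # [-1.0, 0.0, 1.0]

bottleneck = [0.25, 0.5, 0.125]  # made-up values
cached = serialize_bottleneck(bottleneck)
print(cached)  # 0.25,0.5,0.125
print(parse_bottleneck(cached) == bottleneck)  # True
```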
<p>

**This is the longest step in the training process and may take up to 10 minutes. Please be patient.**
<p>
Display the content of the box if you want to have a closer look at the code.

In [None]:
def get_image_path(image_lists, label_name, index, image_dir, category):
  """Returns a path to an image for a label at the given index.

  Args:
    image_lists: Dictionary of training images for each label.
    label_name: Label string we want to get an image for.
    index: Int offset of the image we want. This will be moduloed by the
    available number of images for the label, so it can be arbitrarily large.
    image_dir: Root folder string of the subfolders containing the training
    images.
    category: Name string of set to pull images from - training, testing, or
    validation.

  Returns:
    File system path string to an image that meets the requested parameters.

  """
  if label_name not in image_lists:
    tf.logging.fatal('Label does not exist %s.', label_name)
  label_lists = image_lists[label_name]
  if category not in label_lists:
    tf.logging.fatal('Category does not exist %s.', category)
  category_list = label_lists[category]
  if not category_list:
    tf.logging.fatal('Label %s has no images in the category %s.',
                     label_name, category)
  mod_index = index % len(category_list)
  base_name = category_list[mod_index]
  sub_dir = label_lists['dir']
  full_path = os.path.join(image_dir, sub_dir, base_name)
  return full_path


def get_bottleneck_path(image_lists, label_name, index, bottleneck_dir,
                        category, architecture):
  """Returns a path to a bottleneck file for a label at the given index.

  Args:
    image_lists: Dictionary of training images for each label.
    label_name: Label string we want to get an image for.
    index: Integer offset of the image we want. This will be moduloed by the
    available number of images for the label, so it can be arbitrarily large.
    bottleneck_dir: Folder string holding cached files of bottleneck values.
    category: Name string of set to pull images from - training, testing, or
    validation.
    architecture: The name of the model architecture.

  Returns:
    File system path string to the bottleneck file for the requested image.
  """
  return get_image_path(image_lists, label_name, index, bottleneck_dir,
                        category) + '_' + architecture + '.txt'


def run_bottleneck_on_image(sess, image_data, image_data_tensor,
                            decoded_image_tensor, resized_input_tensor,
                            bottleneck_tensor):
  """Runs inference on an image to extract the 'bottleneck' summary layer.

  Args:
    sess: Current active TensorFlow Session.
    image_data: String of raw JPEG data.
    image_data_tensor: Input data layer in the graph.
    decoded_image_tensor: Output of initial image resizing and  preprocessing.
    resized_input_tensor: The input node of the recognition graph.
    bottleneck_tensor: Layer before the final softmax.

  Returns:
    Numpy array of bottleneck values.
  """
  # First decode the JPEG image, resize it, and rescale the pixel values.
  resized_input_values = sess.run(decoded_image_tensor,
                                  {image_data_tensor: image_data})
  # Then run it through the recognition network.
  bottleneck_values = sess.run(bottleneck_tensor,
                               {resized_input_tensor: resized_input_values})
  bottleneck_values = np.squeeze(bottleneck_values)
  return bottleneck_values


bottleneck_path_2_bottleneck_values = {}


def create_bottleneck_file(bottleneck_path, image_lists, label_name, index,
                           image_dir, category, sess, jpeg_data_tensor,
                           decoded_image_tensor, resized_input_tensor,
                           bottleneck_tensor):
  """Create a single bottleneck file."""
  tf.logging.info('Creating bottleneck at ' + bottleneck_path)
  image_path = get_image_path(image_lists, label_name, index,
                              image_dir, category)
  if not gfile.Exists(image_path):
    tf.logging.fatal('File does not exist %s', image_path)
  image_data = gfile.FastGFile(image_path, 'rb').read()
  try:
    bottleneck_values = run_bottleneck_on_image(
        sess, image_data, jpeg_data_tensor, decoded_image_tensor,
        resized_input_tensor, bottleneck_tensor)
  except Exception as e:
    raise RuntimeError('Error during processing file %s (%s)' % (image_path,
                                                                 str(e)))
  bottleneck_string = ','.join(str(x) for x in bottleneck_values)
  with open(bottleneck_path, 'w') as bottleneck_file:
    bottleneck_file.write(bottleneck_string)


def get_or_create_bottleneck(sess, image_lists, label_name, index, image_dir,
                             category, bottleneck_dir, jpeg_data_tensor,
                             decoded_image_tensor, resized_input_tensor,
                             bottleneck_tensor, architecture):
  """Retrieves or calculates bottleneck values for an image.

  If a cached version of the bottleneck data exists on-disk, return that,
  otherwise calculate the data and save it to disk for future use.

  Args:
    sess: The current active TensorFlow Session.
    image_lists: Dictionary of training images for each label.
    label_name: Label string we want to get an image for.
    index: Integer offset of the image we want. This will be modulo-ed by the
    available number of images for the label, so it can be arbitrarily large.
    image_dir: Root folder string  of the subfolders containing the training
    images.
    category: Name string of which  set to pull images from - training, testing,
    or validation.
    bottleneck_dir: Folder string holding cached files of bottleneck values.
    jpeg_data_tensor: The tensor to feed loaded jpeg data into.
    decoded_image_tensor: The output of decoding and resizing the image.
    resized_input_tensor: The input node of the recognition graph.
    bottleneck_tensor: The output tensor for the bottleneck values.
    architecture: The name of the model architecture.

  Returns:
    Numpy array of values produced by the bottleneck layer for the image.
  """
  label_lists = image_lists[label_name]
  sub_dir = label_lists['dir']
  sub_dir_path = os.path.join(bottleneck_dir, sub_dir)
  ensure_dir_exists(sub_dir_path)
  bottleneck_path = get_bottleneck_path(image_lists, label_name, index,
                                        bottleneck_dir, category, architecture)
  if not os.path.exists(bottleneck_path):
    create_bottleneck_file(bottleneck_path, image_lists, label_name, index,
                           image_dir, category, sess, jpeg_data_tensor,
                           decoded_image_tensor, resized_input_tensor,
                           bottleneck_tensor)
  with open(bottleneck_path, 'r') as bottleneck_file:
    bottleneck_string = bottleneck_file.read()
  did_hit_error = False
  try:
    bottleneck_values = [float(x) for x in bottleneck_string.split(',')]
  except ValueError:
    tf.logging.warning('Invalid float found, recreating bottleneck')
    did_hit_error = True
  if did_hit_error:
    create_bottleneck_file(bottleneck_path, image_lists, label_name, index,
                           image_dir, category, sess, jpeg_data_tensor,
                           decoded_image_tensor, resized_input_tensor,
                           bottleneck_tensor)
    with open(bottleneck_path, 'r') as bottleneck_file:
      bottleneck_string = bottleneck_file.read()
    # Allow exceptions to propagate here, since they shouldn't happen after a
    # fresh creation
    bottleneck_values = [float(x) for x in bottleneck_string.split(',')]
  return bottleneck_values, bottleneck_string


def add_jpeg_decoding(input_width, input_height, input_depth, input_mean,
                      input_std):
  """Adds operations that perform JPEG decoding and resizing to the graph.

  Args:
    input_width: Desired width of the image fed into the recognizer graph.
    input_height: Desired height of the image fed into the recognizer graph.
    input_depth: Desired channels of the image fed into the recognizer graph.
    input_mean: Pixel value that should be zero in the image for the graph.
    input_std: How much to divide the pixel values by before recognition.

  Returns:
    Tensors for the node to feed JPEG data into, and the output of the
      preprocessing steps.
  """
  jpeg_data = tf.placeholder(tf.string, name='DecodeJPGInput')
  decoded_image = tf.image.decode_jpeg(jpeg_data, channels=input_depth)
  decoded_image_as_float = tf.cast(decoded_image, dtype=tf.float32)
  decoded_image_4d = tf.expand_dims(decoded_image_as_float, 0)
  resize_shape = tf.stack([input_height, input_width])
  resize_shape_as_int = tf.cast(resize_shape, dtype=tf.int32)
  resized_image = tf.image.resize_bilinear(decoded_image_4d,
                                           resize_shape_as_int)
  offset_image = tf.subtract(resized_image, input_mean)
  mul_image = tf.multiply(offset_image, 1.0 / input_std)
  return jpeg_data, mul_image


def cache_bottlenecks(sess, image_lists, image_dir, bottleneck_dir,
                      jpeg_data_tensor, decoded_image_tensor,
                      resized_input_tensor, bottleneck_tensor, architecture):
  """Ensures all the training, testing, and validation bottlenecks are cached.

  Because we're likely to read the same image multiple times it can speed
  things up a lot if we calculate the bottleneck layer values once for each
  image during preprocessing, and then just read those cached values
  repeatedly during training. Here we go through all the images we've found,
  calculate those values, and save them off.

  Args:
    sess: The current active TensorFlow Session.
    image_lists: Dictionary of training images for each label.
    image_dir: Root folder string of the subfolders containing the training
    images.
    bottleneck_dir: Folder string holding cached files of bottleneck values.
    jpeg_data_tensor: Input tensor for jpeg data from file.
    decoded_image_tensor: The output of decoding and resizing the image.
    resized_input_tensor: The input node of the recognition graph.
    bottleneck_tensor: The penultimate output layer of the graph.
    architecture: The name of the model architecture.

  Returns:
    Nothing.
  """
  how_many_bottlenecks = 0
  ensure_dir_exists(bottleneck_dir)
  bottleneck_gcs_dir = os.path.join(bucket, 'bottlenecks')
  for category in ['training', 'testing', 'validation']:
    bottlenecks = []
    for label_name, label_lists in image_lists.items():
      category_list = label_lists[category]
      for index, unused_base_name in enumerate(category_list):
        _, bottleneck_string = get_or_create_bottleneck(
            sess, image_lists, label_name, index, image_dir, category,
            bottleneck_dir, jpeg_data_tensor, decoded_image_tensor,
            resized_input_tensor, bottleneck_tensor, architecture)
        bottlenecks.append('%s,%s\n' % (bottleneck_string, label_name.strip()))

        how_many_bottlenecks += 1
        if how_many_bottlenecks % 100 == 0:
          tf.logging.info(str(how_many_bottlenecks) + ' bottleneck files created.')
    
    # Saving bottlenecks on GCS by category for Cloud ML Engine training
    bottleneck_gcs_path = os.path.join(bottleneck_gcs_dir, '%s.csv' % category)
    tf.logging.info('Writing %s bottlenecks on GCS to %s.' % (category, bottleneck_gcs_path))
    # The `with` block closes the file on exit, so no explicit close() is needed.
    with gfile.FastGFile(bottleneck_gcs_path, 'w') as bottleneck_gcs_file:
      for bottleneck_string in bottlenecks:
        bottleneck_gcs_file.write(bottleneck_string)


bottleneck_dir='/tmp/bottleneck'
with tf.Session(graph=graph) as sess:
  # Set up the image decoding sub-graph.
  jpeg_data_tensor, decoded_image_tensor = add_jpeg_decoding(
      model_info['input_width'], model_info['input_height'],
      model_info['input_depth'], model_info['input_mean'],
      model_info['input_std'])

  # We'll make sure we've calculated the 'bottleneck' image summaries and
  # cached them on disk.
  cache_bottlenecks(sess, image_lists, image_dir,
                    bottleneck_dir, jpeg_data_tensor,
                    decoded_image_tensor, resized_image_tensor,
                    bottleneck_tensor, architecture)
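
Each line of the CSV files written by `cache_bottlenecks` holds the bottleneck string of one image followed by its label. The sketch below shows how such a line could be built and parsed back. It assumes that the bottleneck string is a comma-separated list of floats and that labels contain no commas; `format_bottleneck_line` and `parse_bottleneck_line` are hypothetical helper names that do not exist in this notebook.

```python
def format_bottleneck_line(bottleneck_values, label_name):
  """Builds one CSV line: comma-separated floats followed by the label."""
  # Assumption: the bottleneck string is a comma-separated list of floats.
  bottleneck_string = ','.join(str(x) for x in bottleneck_values)
  return '%s,%s\n' % (bottleneck_string, label_name.strip())


def parse_bottleneck_line(line):
  """Splits a CSV line back into (list of floats, label)."""
  fields = line.rstrip('\n').split(',')
  # The label is the last field; everything before it is the bottleneck.
  return [float(x) for x in fields[:-1]], fields[-1]


line = format_bottleneck_line([0.25, 0.0, 1.5], ' lollipop ')
values, label = parse_bottleneck_line(line)
print(values, label)
```

A trainer reading `training.csv` or `validation.csv` would apply the parsing step to every line of the file.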

## Code the model

### Exercise 2: Retrain the last layer of the model
After the bottlenecks have been extracted for each image of the dataset, retraining the last layer of the model involves the following steps:
* Applying [softmax](https://en.wikipedia.org/wiki/Softmax_function) to calculate the predictions of the last layer for each one of the new classes
* Calculating the [cross entropy](https://en.wikipedia.org/wiki/Cross_entropy) loss with respect to the ground truth (the actual candy labels)
* Adjusting the weights and biases of the last layer through gradient descent in order to minimize the cross entropy loss

<p>
Your task in this exercise is to fully code the training of the last layer by following the instructions in the code comments below.
<p>
*Please remember that TensorFlow has a deferred execution model. Therefore, the code you will write in this section will be executed in the **Run the training** section below.*
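
The three steps listed above can be illustrated outside TensorFlow with a minimal NumPy sketch. The toy data, the zero initialization of the weights and the learning rate of 0.5 are assumptions chosen for illustration only. The exercise itself uses TensorFlow ops and a truncated normal initialization.

```python
import numpy as np


def softmax(logits):
  """Row-wise softmax; subtracting the row max keeps exp() stable."""
  shifted = logits - logits.max(axis=1, keepdims=True)
  exp = np.exp(shifted)
  return exp / exp.sum(axis=1, keepdims=True)


def mean_cross_entropy(probabilities, one_hot_labels):
  """Average over the batch of -sum(labels * log(probabilities))."""
  per_example = -np.sum(one_hot_labels * np.log(probabilities), axis=1)
  return per_example.mean()


# Toy data (assumed): 4 examples, 3 bottleneck values each, 2 classes.
rng = np.random.RandomState(0)
bottlenecks = rng.rand(4, 3)
labels = np.array([[1., 0.], [0., 1.], [1., 0.], [0., 1.]])

weights = np.zeros((3, 2))
biases = np.zeros(2)
learning_rate = 0.5

losses = []
for step in range(50):
  # Forward pass: logits = input * weights + biases, then softmax.
  logits = bottlenecks.dot(weights) + biases
  probabilities = softmax(logits)
  losses.append(mean_cross_entropy(probabilities, labels))
  # Gradient of the mean cross entropy with respect to the logits.
  grad_logits = (probabilities - labels) / len(bottlenecks)
  # Gradient descent update of the weights and biases.
  weights -= learning_rate * bottlenecks.T.dot(grad_logits)
  biases -= learning_rate * grad_logits.sum(axis=0)

print('first loss: %.4f, last loss: %.4f' % (losses[0], losses[-1]))
```

With all weights at zero, both classes start at probability 0.5, so the first loss equals ln 2. The loss then decreases as the weights and biases are adjusted.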

In [None]:
def variable_summaries(var):
  """Attach a lot of summaries to a Tensor (for TensorBoard visualization)."""
  with tf.name_scope('summaries'):
    mean = tf.reduce_mean(var)
    tf.summary.scalar('mean', mean)
    with tf.name_scope('stddev'):
      stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
    tf.summary.scalar('stddev', stddev)
    tf.summary.scalar('max', tf.reduce_max(var))
    tf.summary.scalar('min', tf.reduce_min(var))
    tf.summary.histogram('histogram', var)


def add_final_training_ops(class_count, final_tensor_name, bottleneck_tensor,
                           bottleneck_tensor_size, learning_rate):
  """Adds a new softmax and fully-connected layer for training.

  We need to retrain the top layer to identify our new classes, so this function
  adds the right operations to the graph, along with some variables to hold the
  weights, and then sets up all the gradients for the backward pass.

  The set up for the softmax and fully-connected layers is based on:
  https://www.tensorflow.org/versions/master/tutorials/mnist/beginners/index.html

  Args:
    class_count: Integer of how many categories of things we're trying to
      recognize.
    final_tensor_name: Name string for the new final node that produces results.
    bottleneck_tensor: The output of the main CNN graph.
    bottleneck_tensor_size: How many entries in the bottleneck vector.
    learning_rate: Learning rate of the gradient descent optimizer.

  Returns:
    The tensors for the training and cross entropy results, tensors for the
    bottleneck input and ground truth input, and the final softmax tensor.
  """
  with tf.name_scope('input'):
    bottleneck_input = tf.placeholder_with_default(
        bottleneck_tensor,
        shape=[None, bottleneck_tensor_size],
        name='BottleneckInputPlaceholder')

    ground_truth_input = tf.placeholder(tf.float32,
                                        [None, class_count],
                                        name='GroundTruthInput')

  # Organizing the following ops as `final_training_ops` so they're easier
  # to see in TensorBoard
  layer_name = 'final_training_ops'
  with tf.name_scope(layer_name):
    
    with tf.name_scope('weights'):
      
      # Create the 'layer_weights' tensor variable with the following characteristics:
      # - size: entries_in_the_bottleneck_vector x number_of_classes
      # - initial values: truncated normal distribution of 0.001 standard deviation
      # - name: "final_weights"
      # <YOUR CODE HERE>
      raise NotImplementedError()
      
      variable_summaries(layer_weights)
      
    with tf.name_scope('biases'):
      
      # Create the 'layer_biases' tensor variable with the following characteristics:
      # - size: number of classes
      # - initial values: 0
      # - name: "final_biases"
      # <YOUR CODE HERE>
      raise NotImplementedError()
      
      variable_summaries(layer_biases)
      
    with tf.name_scope('logits'):
      
      # Create the 'logits' operation for calculating the last layer logits:
      # logits = input * weights + biases
      # <YOUR CODE HERE>
      raise NotImplementedError()
      
      tf.summary.histogram('pre_activations', logits)
  
  # Create the 'final_tensor' representing the predictions of the last
  # layer by applying softmax on the logits previously calculated. Name
  # it with the value of the parameter 'final_tensor_name'. 
  # <YOUR CODE HERE>
  raise NotImplementedError()
  
  tf.summary.histogram('activations', final_tensor)

  with tf.name_scope('cross_entropy'):
    
    # Calculate the 'cross_entropy' loss between the predictions
    # and the ground truth.
    # Hint: tf.nn.softmax_cross_entropy_with_logits() expects the
    # raw logits, not the softmax output.
    # <YOUR CODE HERE>
    raise NotImplementedError()
    
    with tf.name_scope('total'):

      # Average the cross entropy over all the examples in the
      # current batch and store the result in 'cross_entropy_mean'.
      # Hint: tf.reduce_mean()
      # <YOUR CODE HERE>
      raise NotImplementedError()
      
  tf.summary.scalar('cross_entropy', cross_entropy_mean)

  with tf.name_scope('train'):

    # Create a gradient descent optimizer with the given learning
    # rate. Create the 'train_step' operation that minimizes
    # the mean cross entropy with gradient descent.
    # <YOUR CODE HERE>
    raise NotImplementedError()

  return (train_step, cross_entropy_mean, bottleneck_input, ground_truth_input,
          final_tensor)

### Solution
Display the content of the box below to see the solution to the exercise.

In [12]:
def variable_summaries(var):
  """Attach a lot of summaries to a Tensor (for TensorBoard visualization)."""
  with tf.name_scope('summaries'):
    mean = tf.reduce_mean(var)
    tf.summary.scalar('mean', mean)
    with tf.name_scope('stddev'):
      stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
    tf.summary.scalar('stddev', stddev)
    tf.summary.scalar('max', tf.reduce_max(var))
    tf.summary.scalar('min', tf.reduce_min(var))
    tf.summary.histogram('histogram', var)


def add_final_training_ops(class_count, final_tensor_name, bottleneck_tensor,
                           bottleneck_tensor_size, learning_rate):
  """Adds a new softmax and fully-connected layer for training.

  We need to retrain the top layer to identify our new classes, so this function
  adds the right operations to the graph, along with some variables to hold the
  weights, and then sets up all the gradients for the backward pass.

  The set up for the softmax and fully-connected layers is based on:
  https://www.tensorflow.org/versions/master/tutorials/mnist/beginners/index.html

  Args:
    class_count: Integer of how many categories of things we're trying to
      recognize.
    final_tensor_name: Name string for the new final node that produces results.
    bottleneck_tensor: The output of the main CNN graph.
    bottleneck_tensor_size: How many entries in the bottleneck vector.
    learning_rate: Learning rate of the gradient descent optimizer.

  Returns:
    The tensors for the training and cross entropy results, tensors for the
    bottleneck input and ground truth input, and the final softmax tensor.
  """
  with tf.name_scope('input'):
    bottleneck_input = tf.placeholder_with_default(
        bottleneck_tensor,
        shape=[None, bottleneck_tensor_size],
        name='BottleneckInputPlaceholder')

    ground_truth_input = tf.placeholder(tf.float32,
                                        [None, class_count],
                                        name='GroundTruthInput')

  # Organizing the following ops as `final_training_ops` so they're easier
  # to see in TensorBoard
  layer_name = 'final_training_ops'
  with tf.name_scope(layer_name):
    with tf.name_scope('weights'):
      initial_value = tf.truncated_normal([bottleneck_tensor_size, class_count], stddev=0.001)
      layer_weights = tf.Variable(initial_value, name='final_weights')
      variable_summaries(layer_weights)
    with tf.name_scope('biases'):
      layer_biases = tf.Variable(tf.zeros([class_count]), name='final_biases')
      variable_summaries(layer_biases)
    with tf.name_scope('logits'):
      logits = tf.matmul(bottleneck_input, layer_weights) + layer_biases
      tf.summary.histogram('pre_activations', logits)

  final_tensor = tf.nn.softmax(logits, name=final_tensor_name)
  tf.summary.histogram('activations', final_tensor)

  with tf.name_scope('cross_entropy'):
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
        labels=ground_truth_input, logits=logits)
    with tf.name_scope('total'):
      cross_entropy_mean = tf.reduce_mean(cross_entropy)
  tf.summary.scalar('cross_entropy', cross_entropy_mean)

  with tf.name_scope('train'):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    train_step = optimizer.minimize(cross_entropy_mean)

  return (train_step, cross_entropy_mean, bottleneck_input, ground_truth_input,
          final_tensor)

### Exercise 3: Calculate the accuracy
**Accuracy** is one of the metrics of success of our model. It is defined as the percentage of predictions generated by our model that are correct.
<p>
Your task in this exercise is to code the calculation of the accuracy on a given dataset by following the instructions in the code comments below.

*Please remember that TensorFlow has a deferred execution model. Therefore, the code you will write in this section will be executed in the **Run the training** section below.*
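
The definition of accuracy can be illustrated with a small NumPy sketch. The probability values and one-hot labels below are made up for illustration. The predicted class of each example is the index of its highest probability, and the accuracy is the fraction of examples whose predicted class matches the ground truth.

```python
import numpy as np

# Toy predictions (assumed): 4 examples, 3 classes, one row per example.
result = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.3, 0.4],
                   [0.6, 0.3, 0.1]])
# One-hot ground truth for the same 4 examples.
ground_truth = np.array([[1., 0., 0.],
                         [0., 1., 0.],
                         [0., 0., 1.],
                         [0., 1., 0.]])

# The predicted class is the one with the highest probability.
prediction = result.argmax(axis=1)
# Compare with the class index encoded in the one-hot labels.
correct = prediction == ground_truth.argmax(axis=1)
# Cast the booleans to floats and average them over the batch.
accuracy = correct.astype(np.float32).mean()
print(prediction, accuracy)
```

Here three of the four examples are classified correctly, which gives an accuracy of 0.75.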

In [None]:
def add_evaluation_step(result_tensor, ground_truth_tensor):
  """Inserts the operations we need to evaluate the accuracy of our results.

  Args:
    result_tensor: The new final node that produces results.
    ground_truth_tensor: The node we feed ground truth data
    into.

  Returns:
    Tuple of (evaluation step, prediction).
  """
  with tf.name_scope('accuracy'):
    with tf.name_scope('correct_prediction'):
      
      # Create a 'prediction' tensor containing the predicted class. Remember
      # that the 'result_tensor' contains a probability distribution of predictions
      # for all classes for all examples in the batch and the class with the
      # highest probability is considered to be the predicted one.
      # <YOUR CODE HERE>
      raise NotImplementedError()
            
      # Calculate all the correct predictions by comparing the 'prediction' tensor
      # with the class index encoded in the one-hot ground_truth_tensor.
      # <YOUR CODE HERE>
      raise NotImplementedError()
      
    with tf.name_scope('accuracy'):
      
      # Create an 'evaluation_step' averaging the correct predictions over all
      # examples in the batch.
      # <YOUR CODE HERE>
      raise NotImplementedError()
      
  tf.summary.scalar('accuracy', evaluation_step)
  return evaluation_step, prediction

### Solution
Display the content of the box below to see the solution to the exercise.

In [13]:
def add_evaluation_step(result_tensor, ground_truth_tensor):
  """Inserts the operations we need to evaluate the accuracy of our results.

  Args:
    result_tensor: The new final node that produces results.
    ground_truth_tensor: The node we feed ground truth data
    into.

  Returns:
    Tuple of (evaluation step, prediction).
  """
  with tf.name_scope('accuracy'):
    with tf.name_scope('correct_prediction'):
      prediction = tf.argmax(result_tensor, 1)
      correct_prediction = tf.equal(
          prediction, tf.argmax(ground_truth_tensor, 1))
    with tf.name_scope('accuracy'):
      evaluation_step = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
  tf.summary.scalar('accuracy', evaluation_step)
  return evaluation_step, prediction

### Save the graph to a file
The last step of the training is to save the TensorFlow graph in a file for serving or deployment on mobile devices. Please run the code in the box below to define the function that will save the TensorFlow graph in a file at the end of the training.
<p>
Display the content of the box if you want to have a closer look at the code.
<p>
*Please remember that TensorFlow has a deferred execution model. Therefore, the code in this section will be executed in the **Run the training** section below.*

In [14]:
def save_graph_to_file(sess, graph, graph_file_name, final_tensor_name):
  output_graph_def = graph_util.convert_variables_to_constants(
      sess, graph.as_graph_def(), [final_tensor_name])
  with gfile.FastGFile(graph_file_name, 'wb') as f:
    f.write(output_graph_def.SerializeToString())
  return

## Train the model

### Run TensorBoard
Before launching the training, run [TensorBoard](https://www.tensorflow.org/programmers_guide/summaries_and_tensorboard), the learning visualization tool for TensorFlow. TensorBoard lets you visualize the metrics of your training, in our case **accuracy** and **cross entropy**.

In [15]:
from google.datalab.ml import TensorBoard
tb_id = TensorBoard.start('tf_files/training_summaries/mobilenet_0.50_224')

### Run the training
The training of the model will run for 1000 steps. At each step the weights and biases of the last layer will be adjusted to decrease the cross entropy loss on the training set.
<p>
Every 10 steps the accuracy and the cross entropy loss will be calculated on the validation set.
<p>
At the end of the training the final accuracy will be calculated on the test set and all misclassified images will be listed.
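
The evaluation schedule described above can be checked with a short pure-Python sketch. It mirrors the condition used in the training loop of this notebook: evaluate when the step index is a multiple of the interval, or at the very last step. `evaluation_steps` is a hypothetical helper name introduced only for this illustration.

```python
def evaluation_steps(how_many_training_steps, eval_step_interval):
  """Returns the step indices at which the metrics are evaluated."""
  steps = []
  for i in range(how_many_training_steps):
    is_last_step = (i + 1 == how_many_training_steps)
    if (i % eval_step_interval) == 0 or is_last_step:
      steps.append(i)
  return steps


steps = evaluation_steps(1000, 10)
print(len(steps), steps[:3], steps[-2:])
```

With 1000 steps and an interval of 10, the evaluation runs at steps 0, 10, ..., 990 and once more at the final step 999, for a total of 101 evaluations.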

In [None]:
def get_random_cached_bottlenecks(sess, image_lists, how_many, category,
                                  bottleneck_dir, image_dir, jpeg_data_tensor,
                                  decoded_image_tensor, resized_input_tensor,
                                  bottleneck_tensor, architecture):
  """Retrieves bottleneck values for cached images directly from disk. It
  picks a random set of images from the specified category.

  Args:
    sess: Current TensorFlow Session.
    image_lists: Dictionary of training images for each label.
    how_many: If positive, a random sample of this size will be chosen.
    If negative, all bottlenecks will be retrieved.
    category: Name string of which set to pull from - training, testing, or
    validation.
    bottleneck_dir: Folder string holding cached files of bottleneck values.
    image_dir: Root folder string of the subfolders containing the training
    images.
    jpeg_data_tensor: The layer to feed jpeg image data into.
    decoded_image_tensor: The output of decoding and resizing the image.
    resized_input_tensor: The input node of the recognition graph.
    bottleneck_tensor: The bottleneck output layer of the CNN graph.
    architecture: The name of the model architecture.

  Returns:
    List of bottleneck arrays, their corresponding ground truths, and the
    relevant filenames.
  """
  class_count = len(image_lists.keys())
  bottlenecks = []
  ground_truths = []
  filenames = []
  if how_many >= 0:
    # Retrieve a random sample of bottlenecks.
    for unused_i in range(how_many):
      label_index = random.randrange(class_count)
      label_name = list(image_lists.keys())[label_index]
      image_index = random.randrange(MAX_NUM_IMAGES_PER_CLASS + 1)
      image_name = get_image_path(image_lists, label_name, image_index,
                                  image_dir, category)
      bottleneck, _ = get_or_create_bottleneck(
          sess, image_lists, label_name, image_index, image_dir, category,
          bottleneck_dir, jpeg_data_tensor, decoded_image_tensor,
          resized_input_tensor, bottleneck_tensor, architecture)
      ground_truth = np.zeros(class_count, dtype=np.float32)
      ground_truth[label_index] = 1.0
      bottlenecks.append(bottleneck)
      ground_truths.append(ground_truth)
      filenames.append(image_name)
  else:
    # Retrieve all bottlenecks.
    for label_index, label_name in enumerate(image_lists.keys()):
      for image_index, image_name in enumerate(
          image_lists[label_name][category]):
        image_name = get_image_path(image_lists, label_name, image_index,
                                    image_dir, category)
        bottleneck, _ = get_or_create_bottleneck(
            sess, image_lists, label_name, image_index, image_dir, category,
            bottleneck_dir, jpeg_data_tensor, decoded_image_tensor,
            resized_input_tensor, bottleneck_tensor, architecture)
        ground_truth = np.zeros(class_count, dtype=np.float32)
        ground_truth[label_index] = 1.0
        bottlenecks.append(bottleneck)
        ground_truths.append(ground_truth)
        filenames.append(image_name)
  return bottlenecks, ground_truths, filenames


def retrain(architecture='mobilenet_0.50_224',
            image_dir='gs://candies-ml/dataset/v2/images',
            output_graph='gs://candies-ml/model/test/retrained_graph.pb',
            output_labels='gs://candies-ml/model/test/retrained_labels.txt',
            final_tensor_name='final_result',
            how_many_training_steps=1000,
            eval_step_interval=10,
            learning_rate=0.01,
            train_batch_size=100,
            test_batch_size=-1,
            validation_batch_size=100,
            intermediate_store_frequency=0,
            intermediate_output_graphs_dir='/tmp/intermediate_graph/',
            bottleneck_dir='/tmp/bottleneck',
            summaries_dir='tf_files/training_summaries/mobilenet_0.50_224',         
            print_misclassified_test_images=True):

  # Count the classes found in the image lists created earlier in the notebook.
  class_count = len(image_lists.keys())

  with tf.Session(graph=graph) as sess:

    # Add the new layer that we'll be training.
    (train_step, cross_entropy, bottleneck_input, ground_truth_input,
     final_tensor) = add_final_training_ops(
         len(image_lists.keys()), final_tensor_name, bottleneck_tensor,
         model_info['bottleneck_tensor_size'], learning_rate)

    # Create the operations we need to evaluate the accuracy of our new layer.
    evaluation_step, prediction = add_evaluation_step(
        final_tensor, ground_truth_input)

    # Merge all the summaries and write them out to the summaries_dir
    merged = tf.summary.merge_all()
    train_writer = tf.summary.FileWriter(summaries_dir + '/train', sess.graph)
    validation_writer = tf.summary.FileWriter(summaries_dir + '/validation')

    # Set up all our weights to their initial default values.
    init = tf.global_variables_initializer()
    sess.run(init)

    # Run the training for as many cycles as requested by how_many_training_steps.
    for i in range(how_many_training_steps):
      # Get a batch of input bottleneck values from the cache stored on disk.
      (train_bottlenecks,
       train_ground_truth, _) = get_random_cached_bottlenecks(
           sess, image_lists, train_batch_size, 'training',
           bottleneck_dir, image_dir, jpeg_data_tensor,
           decoded_image_tensor, resized_image_tensor, bottleneck_tensor,
           architecture)
      
      # Feed the bottlenecks and ground truth into the graph, and run a training
      # step. Capture training summaries for TensorBoard with the `merged` op.
      train_summary, _ = sess.run(
          [merged, train_step],
          feed_dict={bottleneck_input: train_bottlenecks,
                     ground_truth_input: train_ground_truth})
      train_writer.add_summary(train_summary, i)

      # Every so often, print out how well the graph is training.
      is_last_step = (i + 1 == how_many_training_steps)
      if (i % eval_step_interval) == 0 or is_last_step:
        train_accuracy, cross_entropy_value = sess.run(
            [evaluation_step, cross_entropy],
            feed_dict={bottleneck_input: train_bottlenecks,
                       ground_truth_input: train_ground_truth})
        tf.logging.info('%s: Step %d: Train accuracy = %.1f%%' %
                        (datetime.now(), i, train_accuracy * 100))
        tf.logging.info('%s: Step %d: Cross entropy = %f' %
                        (datetime.now(), i, cross_entropy_value))
        validation_bottlenecks, validation_ground_truth, _ = (
            get_random_cached_bottlenecks(
                sess, image_lists, validation_batch_size, 'validation',
                bottleneck_dir, image_dir, jpeg_data_tensor,
                decoded_image_tensor, resized_image_tensor, bottleneck_tensor,
                architecture))
        # Run a validation step and capture validation summaries for TensorBoard
        # with the `merged` op.
        validation_summary, validation_accuracy = sess.run(
            [merged, evaluation_step],
            feed_dict={bottleneck_input: validation_bottlenecks,
                       ground_truth_input: validation_ground_truth})
        validation_writer.add_summary(validation_summary, i)
        tf.logging.info('%s: Step %d: Validation accuracy = %.1f%% (N=%d)' %
                        (datetime.now(), i, validation_accuracy * 100,
                         len(validation_bottlenecks)))

      # Store intermediate results
      intermediate_frequency = intermediate_store_frequency

      if (intermediate_frequency > 0 and (i % intermediate_frequency == 0)
          and i > 0):
        intermediate_file_name = (intermediate_output_graphs_dir +
                                  'intermediate_' + str(i) + '.pb')
        tf.logging.info('Save intermediate result to : ' +
                        intermediate_file_name)
        save_graph_to_file(sess, graph, intermediate_file_name, final_tensor_name)

    # We've completed all our training, so run a final test evaluation on
    # some new images we haven't used before.
    test_bottlenecks, test_ground_truth, test_filenames = (
        get_random_cached_bottlenecks(
            sess, image_lists, test_batch_size, 'testing',
            bottleneck_dir, image_dir, jpeg_data_tensor,
            decoded_image_tensor, resized_image_tensor, bottleneck_tensor,
            architecture))
    test_accuracy, predictions = sess.run(
        [evaluation_step, prediction],
        feed_dict={bottleneck_input: test_bottlenecks,
                   ground_truth_input: test_ground_truth})
    tf.logging.info('Final test accuracy = %.1f%% (N=%d)' %
                    (test_accuracy * 100, len(test_bottlenecks)))

    if print_misclassified_test_images:
      tf.logging.info('=== MISCLASSIFIED TEST IMAGES ===')
      for i, test_filename in enumerate(test_filenames):
        if predictions[i] != test_ground_truth[i].argmax():
          tf.logging.info('%70s  %s' %
                          (test_filename,
                           list(image_lists.keys())[predictions[i]]))

    # Write out the trained graph and labels with the weights stored as
    # constants.
    save_graph_to_file(sess, graph, output_graph, final_tensor_name)
    with gfile.FastGFile(output_labels, 'w') as f:
      f.write('\n'.join(image_lists.keys()) + '\n')      

output_graph_path='%s/retrained_graph.pb' % bucket
output_labels_path='%s/retrained_labels.txt' % bucket
retrain(architecture=architecture,
        image_dir=image_dir,
        output_graph=output_graph_path,
        output_labels=output_labels_path,
        summaries_dir=summaries_dir,
        intermediate_output_graphs_dir=intermediate_output_graphs_dir,
        intermediate_store_frequency=intermediate_store_frequency,
        bottleneck_dir=bottleneck_dir)

## Exercise 4: Evaluate the model
To better understand the behavior of the model, please try to answer the following questions:
* How do the accuracy and cross entropy behave on the training set? Could you explain why?
* How do the accuracy and the cross entropy behave on the validation set? Could you explain why?
* What is the final accuracy of the model on the test set?
* What images were misclassified? Could you explain why?

## Exercise 5: Train the model on Cloud ML Engine

In [22]:
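# Import the class rather than the module so that `datetime` stays bound to the
# class used by the datetime.now() calls in retrain() above.
from datetime import datetime
curr_date = '{:%Y%m%d%H%M}'.format(datetime.today())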
train_file = 'gs://intelligent-candy-image-classifier/bottlenecks/training.csv'
eval_file = 'gs://intelligent-candy-image-classifier/bottlenecks/validation.csv'
train_steps = 1000
eval_steps = 100
job_name = 'retrain_mobilenets_candies_%s' % curr_date
job_dir = '%s/cloudml-training/%s' % (bucket, job_name)

gcloud_template_command = """ml-engine jobs submit training %s \
  --stream-logs \
  --runtime-version 1.4 \
  --job-dir %s \
  --module-name trainer.task \
  --package-path trainer/ \
  --region us-central1 \
  -- \
  --train-files %s \
  --eval-files %s \
  --train-steps %d \
  --eval-steps %d
"""
gcloud_command = gcloud_template_command % (job_name, job_dir, train_file, eval_file,
                                            train_steps, eval_steps)
print('Submitting the Cloud ML Engine job with the command: gcloud %s' % gcloud_command)

!gcloud $gcloud_command

ml-engine jobs submit training retrain_mobilenets_candies_201805291432   --stream-logs   --runtime-version 1.4   --job-dir gs://intelligent-candy-image-classifier/cloudml-training/retrain_mobilenets_candies_201805291432   --module-name trainer.task   --package-path trainer/   --region us-central1   --   --train-files gs://intelligent-candy-image-classifier/bottlenecks/training.csv   --eval-files gs://intelligent-candy-image-classifier/bottlenecks/validation.csv   --train-steps 1000   --eval-steps 100

Job [retrain_mobilenets_candies_201805291432] submitted successfully.
INFO	2018-05-29 14:32:10 +0000	service		Validating job requirements...
INFO	2018-05-29 14:32:10 +0000	service		Job creation request has been successfully validated.
INFO	2018-05-29 14:32:11 +0000	service		Job retrain_mobilenets_candies_201805291432 is queued.
INFO	2018-05-29 14:32:11 +0000	service		Waiting for job to be provisioned.
INFO	2018-05-29 14:32:13 +0000	service		Waiting for TensorFlow to start.
INFO	2018-05-29

## Cleanup

In [None]:
TensorBoard.stop(tb_id)