![](img/nethone_logo_full_black.png)
![](img/daftcode_logo.jpg)

# First, let's look at the bottleneck activations

In [None]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(0)

from sklearn.manifold import TSNE

from utils import read_bottlenecks

In [None]:
DF_activations = read_bottlenecks(model_name='mobilenet_1.0_224')

unique_labels = DF_activations['label'].unique()

We'll use **t-SNE** (t-distributed Stochastic Neighbor Embedding) for dimensionality reduction, to see whether the activation patterns in the last hidden layer (here called "bottleneck") can be grouped into clouds of points that (hopefully) correspond to the motorcycle types we're trying to identify.

(Note that *scikit-learn*'s TSNE uses the Barnes-Hut approximation by default, which is much faster than the exact algorithm but may lead to suboptimal embeddings.)

Dimensionality reduction is a branch of unsupervised machine learning, which means that it does not look at the labels of the points (the types of motorcycles). We'll use the labels only to color the points, after t-SNE has found a mapping to the 2D space.
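
To make both remarks concrete, here is a minimal sketch on synthetic data. The blobs, `toy_labels`, and the `perplexity` value are made up for illustration and are not part of this presentation's pipeline. It passes `method='exact'` to skip the Barnes-Hut approximation, and it never hands the labels to t-SNE.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.RandomState(0)

# Hypothetical stand-in for the bottleneck activations:
# three Gaussian blobs of 20 points each in a 20-dimensional space.
centers = rng.normal(scale=10.0, size=(3, 20))
points = np.vstack([c + rng.normal(size=(20, 20)) for c in centers])
toy_labels = np.repeat(['a', 'b', 'c'], 20)  # never passed to t-SNE

# method='exact' skips the Barnes-Hut approximation; its cost grows
# quadratically with the number of points, so it suits small data only.
tsne_exact = TSNE(n_components=2, method='exact', perplexity=10,
                  init='pca', random_state=0)
embedding = tsne_exact.fit_transform(points)

print(embedding.shape)  # (60, 2)
```

The labels would only come into play afterwards, for example as the `hue` of a scatter plot.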

In [None]:
tsne = TSNE(n_components=2, init='pca', random_state=0, learning_rate=1)
DF_tsne_2D = pd.DataFrame(
    tsne.fit_transform(DF_activations.select_dtypes(include=[np.number]))
)
DF_tsne_2D['label'] = DF_activations['label']

In [None]:
with plt.rc_context(dict(sns.axes_style("darkgrid"),
                         **sns.plotting_context("notebook", font_scale=2.2))):
    fg = sns.FacetGrid(
        data=DF_tsne_2D,
        hue='label',
        hue_order=unique_labels,
        aspect=1.3,
        height=10)
    fg.map(plt.scatter, 0, 1, s=120).add_legend()

# Now, let's look at predictions for previously unseen images

Note that we're using the `create_model_info` and `add_jpeg_decoding` functions from the "retrain.py" script. The first returns a dictionary describing the architecture (mainly tensor names that we then pass to `sess.graph.get_tensor_by_name`). The second returns two tensors, `jpeg_data_tensor` and `decoded_image_tensor`, which correspond to the raw string of the JPEG file and to the resized, preprocessed image, respectively.
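
As a rough illustration of what "preprocessed" means here, the sketch below applies a mean/std normalization to raw pixel values with NumPy. It assumes that the preprocessing centers and scales pixels as `(pixel - input_mean) / input_std`, and that both values are 127.5 for "mobilenet_1.0_224"; check `model_info` in your own setup rather than relying on these numbers. The tiny image is made up.

```python
import numpy as np

# Assumed values for 'mobilenet_1.0_224'; verify them against model_info.
input_mean = 127.5
input_std = 127.5

# A hypothetical 2x2 RGB image with raw 8-bit pixel values.
raw_pixels = np.array([[[0, 0, 0], [255, 255, 255]],
                       [[127, 128, 64], [255, 0, 0]]], dtype=np.float32)

# Center and scale, as the preprocessing step is assumed to do.
normalized = (raw_pixels - input_mean) / input_std

print(normalized.min(), normalized.max())  # -1.0 1.0
```

Under these assumptions the model receives values in the range [-1, 1] instead of [0, 255].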

In [None]:
import glob
import sys
import tensorflow as tf

from IPython.display import Image, display

from utils import load_graph
sys.path.append('tensorflow/tensorflow/examples/image_retraining/')
from retrain import create_model_info, add_jpeg_decoding

mpl.rcParams.update(mpl.rcParamsDefault)
plt.rcParams["figure.figsize"] = (20, 20)

test_filenames = sorted(glob.glob('./test_images/*'))
with open('labels.txt') as labels_file:
    labels = [line.strip() for line in labels_file.readlines()]

In this presentation we'll only see predictions made by the "mobilenet_1.0_224" model, but you can carry out your own analysis by switching the `architecture` to "inception_v3". 

In [None]:
architecture = 'mobilenet_1.0_224' # 'inception_v3'
tf.reset_default_graph()
load_graph(architecture)

model_info = create_model_info(architecture)
jpeg_data_tensor, decoded_image_tensor = add_jpeg_decoding(
        model_info['input_width'], model_info['input_height'],
        model_info['input_depth'], model_info['input_mean'],
        model_info['input_std'])

Now we're ready to create a TensorFlow session, and to calculate the values of the tensors for particular input data (raw JPG strings), corresponding to images in the "test_images/" directory.

In [None]:
output_layer_name = 'final_result:0'
resized_input_tensor_name = model_info['resized_input_tensor_name']


with tf.Session() as sess:
    softmax_tensor = sess.graph.get_tensor_by_name(output_layer_name)
    resized_input_tensor = sess.graph.get_tensor_by_name(resized_input_tensor_name)

    for filename in test_filenames:
        with open(filename, 'rb') as image_file:
            image_data = image_file.read()

        image = sess.run(decoded_image_tensor, 
                         {jpeg_data_tensor: image_data})
        predictions, = sess.run(softmax_tensor, 
                                {resized_input_tensor: image})

        print('\n\n\n{}'.format(filename))
        display(Image(filename, width=1000))
        print('')

        top_k = predictions.argsort()[::-1]
        for node_id in top_k:
            human_string = labels[node_id]
            score = 100*predictions[node_id]
            print('{:6.2f}% for {}'.format(score, human_string))
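
The ranking step at the end of the loop above can be tried in isolation. This is a small sketch with made-up class names and scores (`toy_labels` and `toy_predictions` are hypothetical, not the contents of "labels.txt" or real model output).

```python
import numpy as np

# Hypothetical class names and softmax scores, for illustration only.
toy_labels = ['classic', 'cross', 'cruiser']
toy_predictions = np.array([0.15, 0.05, 0.80])

# argsort() orders indices by ascending score; [::-1] reverses the order,
# so the most probable class comes first.
ranked_ids = toy_predictions.argsort()[::-1]

for node_id in ranked_ids:
    print('{:6.2f}% for {}'.format(100 * toy_predictions[node_id],
                                   toy_labels[node_id]))
```

Slicing the result, for example `ranked_ids[:3]`, would keep only the top three classes.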

## Let's see what a preprocessed image looks like

In [None]:
plt.imshow(image[0])

This is how the image is "seen" by the model.
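
One caveat: `plt.imshow` expects float RGB values in [0, 1] and clips anything outside that range. If the preprocessed values lie in [-1, 1] (an assumption that holds when `input_mean` and `input_std` are both 127.5), the plot shows a clipped version of the input. A minimal sketch of mapping the values back for display, using a random array as a hypothetical stand-in for `image[0]`:

```python
import numpy as np

# Hypothetical preprocessed image with values in [-1, 1]
# (assumes input_mean = input_std = 127.5).
rng = np.random.RandomState(0)
preprocessed = rng.uniform(-1.0, 1.0, size=(224, 224, 3)).astype(np.float32)

# Map [-1, 1] to [0, 1]; the clip only guards against rounding overshoot.
displayable = np.clip((preprocessed + 1.0) / 2.0, 0.0, 1.0)

print(displayable.shape)  # (224, 224, 3)
```

Calling `plt.imshow(displayable)` would then show the full range of pixel values.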

Some of the images were misclassified. We'll gather them into a list called `weird_cases` and run an analysis that indicates which parts of each image were relevant to the prediction.

In [None]:
weird_cases = ['./test_images/cruiser1.jpeg', 
               './test_images/x2.jpeg',
               './test_images/superbike3.jpeg',
               './test_images/cross1.jpeg',
               './test_images/cross2.jpeg',
               './test_images/classic2.jpeg']

# Which parts of the images led the model to these predictions?

In this part, we're using the LIME package to visualize which patches (superpixels) were relevant for the model when predicting motorcycle class.
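
The core idea can be sketched in a few lines of NumPy: switch superpixels on and off at random, ask the model for a score each time, and fit a locally weighted linear model whose coefficients rank the superpixels. This is a toy illustration under simplifying assumptions, not the `lime` package's implementation; `toy_lime`, `black_box`, and the kernel settings are all hypothetical.

```python
import numpy as np


def toy_lime(predict_fn, n_segments, num_samples=500, kernel_width=0.25,
             seed=0):
    """Estimate per-superpixel importance with a weighted linear fit."""
    rng = np.random.default_rng(seed)
    # Each row is a binary mask: 1 keeps a superpixel, 0 hides it.
    masks = rng.integers(0, 2, size=(num_samples, n_segments))
    masks[0, :] = 1  # include the unperturbed instance
    scores = np.array([predict_fn(mask) for mask in masks])

    # Weight samples by closeness to the original (all-ones) mask.
    distances = 1.0 - masks.mean(axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)

    # Weighted least squares with an intercept column.
    design = np.hstack([masks, np.ones((num_samples, 1))])
    sqrt_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * sqrt_w[:, None], scores * sqrt_w,
                               rcond=None)
    return coef[:-1]  # drop the intercept


def black_box(mask):
    # Hypothetical model: the score depends on superpixels 2 and 5 only.
    return 0.1 + 0.6 * mask[2] + 0.2 * mask[5]


importance = toy_lime(black_box, n_segments=8)
ranking = np.argsort(importance)[::-1]
print(ranking[:2])  # [2 5]
```

In the real package, the superpixels come from an image segmentation algorithm and the score comes from the classifier being explained.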

In [None]:
import numpy as np
from lime import lime_image
from skimage.segmentation import mark_boundaries

from utils import GraphWrap


with tf.Session() as sess:
    for weird_case in weird_cases:
        with open(weird_case, 'rb') as image_file:
            image_data = image_file.read()
        print('\n\n\n### IMAGE: "{}" ##################################################'
              .format(weird_case))
        
        wrapper = GraphWrap(sess, resized_input_tensor_name)
        image = sess.run(decoded_image_tensor, {jpeg_data_tensor: image_data})
        image = image[0]
        prediction = wrapper.predict([image])[0]

        explainer = lime_image.LimeImageExplainer()
        explanation = explainer.explain_instance(image,
                                                 wrapper.predict,
                                                 top_labels=len(labels),
                                                 num_samples=1000)
        
        for k in range(len(labels)):
            print('\n\n\n{:6.2f}% for {}'.format(100*prediction[k], labels[k]))
            img, mask = explanation.get_image_and_mask(k,
                                                        positive_only=False,
                                                        num_features=5,
                                                        hide_rest=False)
            plt.imshow(mark_boundaries(img, mask))
            plt.show()

# References

You'll find the materials for this presentation at: https://github.com/daftcode/PyCon-motorcycle-transfer-learning

### General references

A valuable set of rules regarding transfer learning: http://cs231n.github.io/transfer-learning/

MobileNets on Google Research Blog: https://research.googleblog.com/2017/06/mobilenets-open-source-models-for.html

A great source of learning materials: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/

The LIME repo: https://github.com/marcotcr/lime/ (where you'll also find a link to their article describing the algorithm in more detail)


### More detailed sources

An interesting issue in the TensorFlow repo (in particular, there's a valuable observation about how resizing/cropping of images influences predictions): https://github.com/tensorflow/tensorflow/issues/4128

"Inception module: explained and implemented": https://hacktilldawn.com/2016/09/25/inception-modules-explained-and-implemented/

Two softmax outputs in the Inception v3 model?: https://stackoverflow.com/questions/39352108/does-the-inception-model-have-two-softmax-outputs

How to integrate MobileNets into your project: https://github.com/tensorflow/models/blob/master/slim/README.md


### Other interesting resources

TensorFlow mobile: https://www.tensorflow.org/mobile/