<a href="https://colab.research.google.com/github/GantMan/MachineLearningTraining/blob/master/Transfer_Truck_Identification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Transfer learning!

![transfer learning](https://cdn-images-1.medium.com/max/1600/1*FfQWL0PFb1FRgFVAUx5NWQ.jpeg)

Grab a model that has already been trained well to start.  Then reuse its learned feature detection in your *"new"* model.

Here's an example of what multiple CNN layers can detect and organize into features:

![Detect features](https://i.stack.imgur.com/Hl2H6.png)

In [0]:
from pathlib import Path
import numpy as np
import joblib
from keras.preprocessing import image
from keras.applications import xception

[Xception](https://arxiv.org/abs/1610.02357) is a very capable pre-trained model created in 2016 by Google.  It is claimed to be better than Google's Inception v3, thanks to its "depthwise separable convolutions."  Want to dig into what that means?  [Here's a blog post for you](https://towardsdatascience.com/review-xception-with-depthwise-separable-convolution-better-than-inception-v3-image-dc967dd42568), but otherwise just know it's pretty damn good.
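
If you just want the gist of "depthwise separable," here is a rough sketch that only counts weights.  The helper names and the layer sizes (3x3 kernel, 128 channels in, 256 out) are made up for illustration and are not taken from Xception itself.  A regular convolution learns one filter per input/output channel pair, while the separable version splits the job into one spatial filter per input channel plus a 1x1 convolution that mixes channels.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a regular convolution (bias terms ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def separable_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise separable convolution (bias terms ignored)."""
    depthwise = kernel_size * kernel_size * in_channels  # one spatial filter per input channel
    pointwise = in_channels * out_channels               # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer sizes only (assumed, not read from Xception)
standard = standard_conv_params(3, 128, 256)
separable = separable_conv_params(3, 128, 256)
print(standard, separable, round(standard / separable, 1))
```

For these assumed sizes the separable layer needs roughly a ninth of the weights, which is the kind of saving that lets the architecture spend its capacity elsewhere.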

[Keras gives us several lovely models we can build from](https://keras.io/applications/):

![keras models](https://i.imgur.com/rVn81It.png =600x)

In [0]:
from keras.datasets import cifar10
# Load data set from CIFAR-10 again
(x_train, y_train), (x_test, y_test) = cifar10.load_data()


# We'll be separating a test set differently this time
# So let's combine back in our test data
x_train = np.concatenate((x_train, x_test))
y_train = np.concatenate((y_train, y_test))

In [0]:
# Set all trucks (CIFAR-10 class 9) to True and everything else to False
# by broadcasting the comparison throughout the dataset
y_train = y_train == 9
# Scale x to the range Xception expects (between -1 and 1),
# the same preprocessing we apply to new images at prediction time
x_train = x_train.astype('float32')
x_train = xception.preprocess_input(x_train)
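
To see what the label comparison is doing, here is a tiny sketch on a made-up label array (the five values are invented, only the `(N, 1)` shape mirrors CIFAR-10).  Comparing an array against a scalar broadcasts the test over every element and hands back booleans.

```python
import numpy as np

# Stand-in for CIFAR-10 labels, which have shape (N, 1) and values 0-9
labels = np.array([[9], [3], [9], [0], [1]])

# Comparing against a scalar broadcasts across every element
is_truck = labels == 9

print(is_truck.ravel())   # one boolean per image
print(is_truck.mean())    # fraction of positive (truck) examples
```

Worth keeping in mind: in the real dataset each of the 10 classes holds a tenth of the images, so only about 10% of our labels are `True`.  A model that always answers "not a truck" would already score about 90% accuracy, so read the accuracy numbers later on with that baseline in mind.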



Now we load in Xception.  Thanks Keras!!!

Xception's minimum input size is 71x71 and CIFAR images are 32x32, but leaving `input_shape` unspecified keeps the input size variable, which makes it work for our case.

You will need all input images to be the same size.  So if you add an image from outside CIFAR, it will need to be resized to match our feature extractor.
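
Why does the image size matter so much?  Each downsampling step in the convolutional base shrinks the feature map, so a different input size produces a different feature shape.  The sketch below is a back-of-the-envelope estimate based on my reading of the Keras Xception layout (two unpadded 3x3 convolutions, the first with stride 2, followed by four stride-2 reductions with 'same' padding).  The helper name is hypothetical, and `feature_extractor.summary()` or the shape of the extracted features is the authoritative check.

```python
import math


def xception_feature_map_size(size):
    """Estimate the spatial size of Xception's output for a square input."""
    size = (size - 3) // 2 + 1      # first 3x3 convolution, stride 2, no padding
    size = size - 2                 # second 3x3 convolution, stride 1, no padding
    for _ in range(4):              # four stride-2 reductions with 'same' padding
        size = math.ceil(size / 2)
    return size


for side in (32, 71, 299):
    print(side, "->", xception_feature_map_size(side))
```

Under those assumptions a 32x32 CIFAR image is squeezed all the way down to a single spatial position, while a larger image yields a bigger grid of features that our classifier would not know what to do with.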


In [0]:
# Load the pre-trained neural network to use
# as a feature extractor
feature_extractor = xception.Xception(
    weights='imagenet',
    include_top=False
)

Let's take a quick look at the wild world of Xception


In [0]:
feature_extractor.summary()

Let's use Xception to get alllllll the features we can out of our training set.  Technically speaking, the feature_extractor is known as the "convolutional base."  Running our training data through the convolutional base pre-chews it, so the small network we train on top starts from rich, ready-made features instead of raw pixels.

In [0]:
features_x = feature_extractor.predict(x_train)

You can store these features so you don't have to re-extract them every time.

In [0]:
joblib.dump(features_x, "x_train.dat")
joblib.dump(y_train, "y_train.dat")

# Load anytime with features_x = joblib.load("x_train.dat")

In [0]:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten



---


> ## Thoughtful side note
Our model expects input from Xception.  Another way we could have handled this is to add Xception as our first sequential step, like so:

> `model.add(feature_extractor)`

> #### This would have had two drawbacks:

1.  We would not have been able to store our pre-chewed "x_train.dat" to speed up iterations on these new layers that we are experimenting with.
2. We would have had to freeze (or partially freeze) our convolutional base so it isn't trained.  Updating the weights of Xception's primary layers would undo a significant advantage.  We can lock `feature_extractor` with `feature_extractor.trainable = False`

> #### Benefits would have been:

1. We could have had a more streamlined model, with less code.
2. We could have unfrozen a few layers of Xception (top down) to fine-tune the model.



---







In [0]:
model = Sequential()

model.add(Flatten(input_shape=features_x.shape[1:]))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))


![too soon](https://pbs.twimg.com/media/Dv6XHN2WkAIKWPJ.jpg =600x)


And finally, sigmoid for percentage likelihood.  Sigmoid with 'binary_crossentropy' is code for "Make very low value outputs close to zero and high value outputs close to one, thus telling me the likelihood percentage of my categorization."
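
Here is a minimal sketch of both pieces in plain Python, just to make the idea concrete.  The function names are my own, and Keras computes the same quantities in batched, numerically safer form.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for one example: small when the prediction agrees with the label."""
    y_pred = min(max(y_pred, eps), 1 - eps)  # avoid log(0)
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))


# Raw network outputs (very negative, neutral, very positive)
for z in (-4.0, 0.0, 4.0):
    p = sigmoid(z)
    print(f"z={z:+.1f}  p={p:.3f}  loss if truck={binary_crossentropy(1, p):.3f}")
```

The loss shrinks as the predicted probability for the true label approaches one, which is exactly the pressure that pushes outputs toward confident zeros and ones.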

![sigmoid](https://upload.wikimedia.org/wikipedia/commons/thumb/8/88/Logistic-curve.svg/320px-Logistic-curve.svg.png)

In [0]:
print("Trainable weights = " + str(len(model.trainable_weights)))
# View some info!
model.summary()  # prints the table itself; wrapping it in print() adds a stray "None"

In [0]:
model.compile(
  loss ='binary_crossentropy',
  optimizer = 'adam',
  metrics=['accuracy']
)

# OPTIMIZE!
Momentum, weight decay, and so much more.
![momentum and decay](https://datascience-enthusiast.com/figures/opt_momentum.png)

*SmoooOOoooOOoooOOoth*

![smooth](https://media1.tenor.com/images/7591117adda5f9537830a976ada14bd4/tenor.gif?itemid=3484360)

![smooth curve](https://i.stack.imgur.com/bcaCP.png)
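
One way to picture the smoothing: instead of reacting to each noisy gradient on its own, keep a running average that mostly remembers the past.  The sketch below is a plain exponential moving average on made-up numbers, with a hypothetical `smooth` helper.  It is not the full update rule of any optimizer, though Adam (which we compile with) does track a running average of gradients in this spirit.

```python
def smooth(values, beta=0.9):
    """Exponential moving average, the smoothing idea behind momentum."""
    average = values[0]
    smoothed = []
    for value in values:
        # Keep most of the old average, blend in a little of the new value
        average = beta * average + (1 - beta) * value
        smoothed.append(average)
    return smoothed


# Hypothetical noisy gradient readings that zigzag around 1.0
noisy = [1.5, 0.5, 1.6, 0.4, 1.4, 0.6, 1.5, 0.5]
smoothed = smooth(noisy)

print("raw swing:     ", max(noisy) - min(noisy))
print("smoothed swing:", round(max(smoothed) - min(smoothed), 3))
```

A larger `beta` remembers more history and smooths harder, at the cost of reacting more slowly when the gradient genuinely changes direction.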

In [0]:
history = model.fit(
  features_x,
  y_train,
  validation_split=0.05,
  epochs=20,
  shuffle=True
)

**Let's graph it!**

In [0]:
import matplotlib.pyplot as plt
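# Older Keras releases log 'acc'; newer ones log 'accuracy'
acc = history.history.get('acc', history.history.get('accuracy'))
val_acc = history.history.get('val_acc', history.history.get('val_accuracy'))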
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'bo', label='Training accuracy')
plt.plot(epochs, val_acc, 'b', label='Validation accuracy')
plt.title('Training and Validation Accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and Validation Loss')
plt.legend()

plt.show()

Validation lets us spot overfitting, which is a warning sign that the model will generalize poorly!

**Overfitting looks like this:**

![overfit data](https://i.imgur.com/CR3sdQj.png)

## What can we do?

*   **Early Stop**
*   **More training data / Data augmentation**
*   **Regularization**
*   **Use bigger images**
*   **Fine-tune conv base**

There are lots of ways to make sure your trained model is a well-trained model.  Awesome data means an awesome model.
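
As a taste of the first option, here is a minimal sketch of the early-stopping idea in plain Python.  The `best_epoch` helper and the validation losses are made up for illustration: stop once validation loss has failed to improve for `patience` epochs in a row, and keep the epoch that scored best.

```python
def best_epoch(val_losses, patience=3):
    """Return the 1-based epoch early stopping would keep, given a patience."""
    best_loss = float("inf")
    best = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # no improvement for `patience` epochs in a row
    return best


# Made-up validation losses that improve, then drift upward (overfitting)
val_losses = [0.40, 0.31, 0.27, 0.26, 0.28, 0.30, 0.33, 0.37]
print(best_epoch(val_losses))
```

In practice you would not hand-roll this.  Keras ships a `keras.callbacks.EarlyStopping` callback that takes `monitor` and `patience` arguments and can be passed to `model.fit` through `callbacks=[...]`.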

![l2](https://pbs.twimg.com/media/DrqbthQVAAAEaWj.jpg =350x)


In [0]:
# Save it for later
model.save("truck_feature_classifier_model.h5")

The accuracy after 20 epochs is lower than expected, but the model is fast to train!

We now have a model that we can use with any image, so long as we extract the features from it first!



In [0]:
import urllib.request
from IPython.display import Image, display


urllib.request.urlretrieve("https://www.cs.toronto.edu/~kriz/cifar-10-sample/truck3.png", "truck.png")
urllib.request.urlretrieve("https://www.cs.toronto.edu/~kriz/cifar-10-sample/bird8.png", "not_truck.png")

# Show and tell
print('Here are my 2 images to test!')
display(Image("not_truck.png"))
display(Image("truck.png"))

In [0]:
# load and convert to numpy arrays
truck_img = image.img_to_array(image.load_img("truck.png"))
not_truck_img = image.img_to_array(image.load_img("not_truck.png"))

In [0]:
# add a fourth (batch) dimension to each image (Keras expects a batch of images, not a single one)
truck_img = np.expand_dims(truck_img, axis=0)
not_truck_img = np.expand_dims(not_truck_img, axis=0)
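
If the extra dimension feels abstract, this small sketch shows the shape change on a blank stand-in image (the zero-filled array is just a placeholder for a real picture).

```python
import numpy as np

# Stand-in for one 32x32 RGB image loaded with img_to_array
single_image = np.zeros((32, 32, 3), dtype="float32")

# axis=0 inserts a new leading "batch" dimension of size 1
batch = np.expand_dims(single_image, axis=0)

print(single_image.shape)
print(batch.shape)
```

The pixel data is untouched.  The array is simply relabeled as a batch containing one image, which is the layout `predict` expects.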


In [0]:
truck_img = xception.preprocess_input(truck_img)
not_truck_img = xception.preprocess_input(not_truck_img)
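
What is `preprocess_input` actually doing?  As far as I know, for Xception it rescales pixel values from the 0 to 255 range into -1 to 1.  The helper below is a hypothetical NumPy stand-in for that arithmetic, not the Keras function itself.

```python
import numpy as np


def scale_like_xception(pixels):
    """Map pixel values from [0, 255] to [-1, 1]."""
    return pixels.astype("float32") / 127.5 - 1.0


# Darkest, middle, and brightest possible pixel values
pixels = np.array([0, 127.5, 255])
print(scale_like_xception(pixels))
```

Whatever scaling you choose, the important part is that training images and new images go through the same one.  Otherwise the classifier is handed features it has never seen the likes of.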

In [0]:
truck_features = feature_extractor.predict(truck_img)
not_truck_features = feature_extractor.predict(not_truck_img)

In [0]:
truck_result = model.predict(truck_features)[0][0]
not_truck_result = model.predict(not_truck_features)[0][0]

print('truck.png : {:0.2f}% chance of truck'.format(truck_result * 100))
print('not_truck.png : {:0.2f}% chance of truck'.format(not_truck_result * 100))

What about some other truck out of the wild?  Like this one from a Google search?!

![truck](https://media.wired.com/photos/5b9c3d5e7d9d332cf364ad66/master/pass/AV-Trucks-187479297.jpg =600x)

**Let's grab this one and try it on!**

In [0]:
urllib.request.urlretrieve("https://media.wired.com/photos/5b9c3d5e7d9d332cf364ad66/master/pass/AV-Trucks-187479297.jpg", "internet_truck.png")
# Resize image to 32x32 so the featureset size matches what our model expects as input!
internet_truck = image.img_to_array(image.load_img("internet_truck.png", target_size=(32, 32)))
internet_truck = np.expand_dims(internet_truck, axis=0)
internet_truck = xception.preprocess_input(internet_truck)
internet_truck_features = feature_extractor.predict(internet_truck)
print(internet_truck_features.shape)
internet_truck_result = model.predict(internet_truck_features)[0][0]

print('internet_truck.png : {:0.2f}% chance of truck'.format(internet_truck_result * 100))