# Deep Learning: Image Recognition
**Instructor:** Adam Geitgey

Thanks to deep learning, image recognition systems have improved and are now used for everything from searching photo libraries to generating text-based descriptions of photographs. In this course, learn how to build a deep neural network that can recognize objects in photographs. Find out how to adjust state-of-the-art deep neural networks to recognize new objects, without the need to retrain the network. Explore cloud-based image recognition APIs that you can use as an alternative to building your own systems. Learn the steps involved to start building and deploying your own image recognition system.

#### Build cutting-edge image recognition systems
* **Image recognition** is the ability for computers to look at a photograph and understand what's in the photograph
* In the last few years, researchers have made major breakthroughs in image recognition thanks to neural networks
* **Keras** is a high-level library for building neural networks in Python with only a few lines of code; built on top of either TensorFlow or Theano
* One of the most important things to configure in a neural network is activation functions
    * Before values flow from one layer to the next they pass through an activation function
    * **Activation functions** decide which inputs from the previous layer are important enough to feed to the next layer
* The final step of defining a neural network is to compile it by calling `model.compile()`; this tells Keras that we're done building the model and that we actually want to create it in memory
* The optimizer algorithm is used to train the neural network
* The loss function is how the training process measures how right or how wrong your NN's predictions are
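
To make the idea of an activation function concrete, here is a minimal sketch of ReLU (the activation these notes use later) written with NumPy. The input values are made up for illustration; this is not how Keras implements it internally, only the same idea in a few lines.

```python
import numpy as np

def relu(values):
    """ReLU activation: keep positive values, replace negatives with zero."""
    return np.maximum(values, 0.0)

# Hypothetical weighted sums computed by one layer with four nodes
layer_output = np.array([-2.0, 0.5, 3.0, -0.1])

# Only the positive signals are passed on to the next layer
activated = relu(layer_output)
print(activated)
```

The negative values are zeroed out, so only the two positive signals reach the next layer.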

#### Using Images as Input to a NN
* Bright points are closer to 255 and dark points are closer to 0
* We can think of a color image as a 3D array that is three layers deep (one layer each for red, green, and blue); so to be able to feed this image into a NN, we need the NN to have 1 input node for every number in this 3D array (that is, three per pixel)
* These numbers add up very quickly
* For a small **256 x 256 pixel image**, (by modern terms, a pretty tiny image):
    * We need 256 x 256 x 3 = **196,608 input nodes**
    * And that's just for the input layer
    * The number of nodes in the entire neural network will quickly grow into the millions
    * That's why using NNs for image processing is so computationally intensive
        * **Because of this, image recognition systems tend to use small image sizes**
        * It's very common to build image recognition systems that work with images that are **between 128 and 512 pixels wide.**
        * Any larger than that, and it gets too slow and requires too much memory
        * When working with larger images, we usually just scale them down to those smaller sizes before feeding them into the neural network 
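
The arithmetic above is easy to check for other image sizes. The helper name `input_node_count` is made up for this sketch, which assumes square RGB images.

```python
def input_node_count(width, height, channels=3):
    """Number of input nodes needed: one per color value in the image array."""
    return width * height * channels

# Compare the common image widths mentioned above
for size in (128, 256, 512):
    print(f"{size} x {size} RGB image -> {input_node_count(size, size):,} input nodes")
```

Doubling the width and height quadruples the number of input nodes, which is why scaling images down pays off so quickly.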

### Recognizing Image Contents with a Neural Network
* During the **inference phase** the neural network will give us a **prediction**; this prediction will be in the form of a probability
* We can also build a single neural network that has more than one output

<img src='data/nn1.png' width="600" height="300" align="center"/>

* You can roughly think of the top (leftmost) layers as looking for simple patterns like lines and sharp edges, while the lower layers use the signals from the higher layers to look for more and more complex shapes and patterns
* With all the layers working together, the model can identify very complex objects
* That means that adding more layers to a NN tends to give it the capacity to learn more complex patterns and shapes; this is where the term **deep learning** originally came from
* **Deep learning** is just the idea that making models deeper by adding more capacity to them lets us recognize more complex patterns in data

### Adding convolution for translational invariance

* If we only train the NN with pictures of numbers that are perfectly centered, the NN will get confused if it sees anything else (for example, an uncentered "8")
* For example:

<img src='data/nn2.png' width="600" height="300" align="center"/>

<img src='data/nn3.png' width="600" height="300" align="center"/>

* The neural network won't be able to make a good prediction on the uncentered 8 from the lower example above (where the model is only trained on centered 8s). 
* But, the 8 could appear anywhere in the image; it could just as easily appear at the bottom, like this:

<img src='data/nn4.png' width="600" height="300" align="center"/>

* **We need to improve our neural network so that it can recognize objects in any position in the image.**
    * This is called **Translation invariance.**
    
#### Translation Invariance and Convolutional Layers
* **Translation invariance** is the idea that a machine learning model can recognize an object no matter whether it is moved (or *translated*) in the image.
* The solution is to add a new type of layer to our neural network: a **convolutional layer**
* Unlike a normal Dense layer, where every node is connected to every node in the previous layer, this (convolutional) layer breaks apart the image in a special way so that it can recognize the same object in different positions
* We do this by passing a small window (shown in orange below) over the image. 
* Each time it lands somewhere, we grab a new image tile; we repeat this until we've covered the entire image

<img src='data/nn5.png' width="600" height="300" align="center"/>

* Next, we pass each image tile through the same NN layer(s). Each tile will be processed the same way and will save a value each time
* In other words, we're turning the image into an array, where each entry in the array represents whether or not the neural network thinks a certain pattern appears at that part of the image
* Next, we'll repeat the exact process again, but this time we'll use a different set of weights on the nodes in our NN layer
* This will create another feature map that tells us whether or not a certain pattern appears in the image
* But because we're using different weights, it will be looking for a different pattern than the first time
* We can repeat this process several times until we have several layers in our new array 

<img src='data/nn6.png' width="600" height="300" align="center"/>

<img src='data/nn7.png' width="600" height="300" align="center"/>

* This turns our original array into a 3D array
* Each element in the array represents whether a certain pattern occurs at a particular location in the image
* But because we are checking each tile of the original image, it doesn't matter where in the image a pattern occurs, we can find it anywhere
* This **3D array is what we'll feed into the next layer of the neural network.**
* It will use this information to determine which patterns are most important in determining the final output
* Adding a convolutional layer makes it possible for our neural network to be able to find the pattern, no matter where it appears in an image
* **Normally, we'll have several convolutional layers** that repeat the above process multiple times. 
* **The rough idea is that we keep squishing down the image with each convolutional layer while still capturing the most important information from it.** By the time we reach the output layer, the neural network will have been able to identify whether or not the object appeared
* Convolutional neural networks are the standard approach to building image recognition systems
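
The sliding-window idea can be sketched in plain NumPy. This is a simplified illustration, not the Keras implementation: it assumes a single-channel image, a window that moves one pixel at a time, and a hand-picked 3 x 3 pattern standing in for learned weights. The function name `feature_map` is made up for this example.

```python
import numpy as np

def feature_map(image, kernel):
    """Slide the kernel over every tile of a 2D image and record how strongly each tile matches."""
    k = kernel.shape[0]
    rows = image.shape[0] - k + 1
    cols = image.shape[1] - k + 1
    result = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            tile = image[r:r + k, c:c + k]
            # The same weights are applied to every tile
            result[r, c] = np.sum(tile * kernel)
    return result

# Hypothetical 3x3 pattern (a small plus sign) used as the filter weights
pattern = np.array([[0, 1, 0],
                    [1, 1, 1],
                    [0, 1, 0]], dtype=float)

# The same pattern drawn in two different places on a blank 8x8 image
top_left = np.zeros((8, 8))
top_left[0:3, 0:3] = pattern
bottom_right = np.zeros((8, 8))
bottom_right[5:8, 5:8] = pattern

# The strongest response moves with the pattern, but its value is identical
map_a = feature_map(top_left, pattern)
map_b = feature_map(bottom_right, pattern)
print(np.unravel_index(np.argmax(map_a), map_a.shape), map_a.max())
print(np.unravel_index(np.argmax(map_b), map_b.shape), map_b.max())
```

Because every tile is scored with the same weights, the filter responds just as strongly to the pattern in the bottom-right corner as in the top-left one. Repeating this with several different kernels and stacking the results gives the 3D array described above.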

## 3. Designing a Deep Neural Network for Image Recognition

### Designing a neural network architecture for image recognition
* Before we start coding our image recognition NN, let's sketch out how a basic neural network works
* **A basic neural network comprised of all dense, or fully-connected, layers doesn't work efficiently for images because objects can appear in lots of different places in an image.**
* The solution is to add one or more convolution layers, which help us detect patterns no matter where they appear in our image
* **It can be very effective to place two or more convolutional layers in a row** so in our example we'll add them in pairs
* The convolutional layers are looking for patterns in our image and recording whether or not they found those patterns in each part of our image; but we don't usually need to know *where* in an image a pattern was found down to the specific pixel; it's good enough to know the rough location where it was found. To solve this problem we can use a technique called **max pooling**

#### Max Pooling

<img src='data/nn8.png' width="600" height="300" align="center"/>

* We could pass the above information (regarding whether or not a pixel corresponds to a cloud) directly to the rest of our neural network, but if we can reduce the amount of information that we pass to the next layer, it will make the neural network's job much easier (and faster)
* The idea of **max pooling** is to down sample the data by only passing on the most important bits

<img src='data/nn9.png' width="600" height="300" align="center"/>

* The idea above is that, by capturing the most important data (most extreme values), we'll get nearly the same result, but much more efficiently.
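
As a rough sketch of the idea, here is 2 x 2 max pooling on a small made-up grid using NumPy. It assumes the grid's height and width are both even; the function name `max_pool_2x2` is hypothetical.

```python
import numpy as np

def max_pool_2x2(grid):
    """Keep only the largest value in each non-overlapping 2x2 block."""
    rows, cols = grid.shape
    blocks = grid.reshape(rows // 2, 2, cols // 2, 2)
    return blocks.max(axis=(1, 3))

# Hypothetical 4x4 feature map
grid = np.array([[0.0, 0.1, 0.9, 0.2],
                 [0.3, 0.5, 0.1, 0.0],
                 [0.0, 0.0, 0.2, 0.4],
                 [0.7, 0.1, 0.3, 0.6]])

pooled = max_pool_2x2(grid)
print(pooled)
```

The 16 input values shrink to 4, and each surviving value is the strongest signal from its 2 x 2 region.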

#### Dropout
* **Dropout** is a technique to make the NN more robust and prevent overfitting
* The idea is that we add a dropout layer between other layers that will randomly throw away some of the data passing through by cutting some of the connections in the neural network

<img src='data/nn10.png' width="600" height="300" align="center"/>

* By randomly cutting connections with each training image, the neural network is forced to try harder to learn multiple ways to represent the same ideas (rather than *memorize* an image).
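
A minimal NumPy sketch of what a dropout layer does during training is shown below. It assumes the common "inverted dropout" convention, where the values that survive are scaled up by `1 / (1 - rate)` so the overall signal strength stays roughly the same. The function name and the input values are made up for illustration.

```python
import numpy as np

def dropout(values, rate, rng):
    """Zero out roughly `rate` of the values and rescale the rest (training-time behavior)."""
    keep_mask = rng.random(values.shape) >= rate
    # Survivors are scaled up so the expected total signal stays the same
    return values * keep_mask / (1.0 - rate)

rng = np.random.default_rng(seed=0)
signals = np.ones(10)

# Each call cuts a different random subset of the signals
print(dropout(signals, rate=0.5, rng=rng))
print(dropout(signals, rate=0.5, rng=rng))
```

With a rate of 0.5, every value comes out as either 0 (cut) or 2 (kept and rescaled), and the pattern of cuts changes on every call.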

<img src='data/nn11.png' width="600" height="300" align="center"/>

* If we want to make our network more powerful and able to recognize more complex images, we can add more layers to it
* But, instead of just adding layers randomly, we'll add more copies of our convolutional block.
* When all these layers are working together, we'll be able to detect complex objects 

<img src='data/nn12.png' width="600" height="300" align="center"/>

* This is a very typical design for an image recognition neural network, but it's also one of the most basic
* The latest designs involve branching pathways, shortcuts between groups of layers, and all sorts of other tricks, but they all build on these same basic ideas.

### Exploring the CIFAR-10 Data Set
* [See here](https://www.cs.toronto.edu/~kriz/cifar.html) for more detail on the CIFAR-10 dataset (and AlexNet)

#### Exploring your dataset
   * Always look through the data by hand
   * Check for obvious errors
   * Verify that the data makes sense
   
### Loading an image dataset
* The function `cifar10.load_data()` returns **four different arrays** (grouped into two tuples)
    * `x_train`
    * `y_train`
    * `x_test`
    * `y_test`
* **`(x_train, y_train), (x_test, y_test) = cifar10.load_data()`**
* **NNs work best when the data are floats between zero and one**:

```
# Normalize data set to 0-to-1 range
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
```

* cifar10 provides the labels for each class as values from 0 to 9, **but since we are creating a NN with 10 outputs, we need a separate expected value for each of those outputs. So we need to convert each label from a single number into an array with 10 elements.** In that array, one element should be set to one and the rest set to zero. 
* **This is something you'll almost always need to do with your training data, so Keras provides a helper function: `keras.utils.to_categorical()`**
    * To use this function, you just pass in your array with the labels (which in our case is `y_train`) along with the number of classes it has (which in our case is `10`)
    * `y_train = keras.utils.to_categorical(y_train, 10)`
    * `y_test = keras.utils.to_categorical(y_test, 10)`
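
To see what this conversion produces, here is a small NumPy sketch of the same idea. The function name `one_hot` is made up for this example and is only an illustration of the encoding, not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into rows with a single 1 at the label's index."""
    labels = np.asarray(labels).reshape(-1)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# CIFAR-10 style labels arrive as a column of integers from 0 to 9
example_labels = np.array([[6], [9], [0]])
print(one_hot(example_labels, 10))
```

Each row has ten elements, with a one in the position of the correct class and zeros everywhere else.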

#### Dense Layers
* `relu` is the standard choice for activation function when working with images because it works well and is computationally efficient
* We'll need one node in the output layer for *each* object we want to detect
* **When doing classification with more than one kind of object, the output layer will almost always use a `softmax` activation function.**
    * The **softmax** activation function is a special function that $\star$ **makes sure all the output values from this layer add up to exactly one** $\star$
* When we're building a neural network and adding layers to it, it's helpful to print out a list of the layers in the neural network so far; we can do this by calling:
    * `model.summary()`
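
The "adds up to exactly one" property of softmax can be checked with a few lines of NumPy. The raw scores below are made up; subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result.

```python
import numpy as np

def softmax(scores):
    """Convert raw output scores into probabilities that sum to one."""
    shifted = scores - np.max(scores)  # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / np.sum(exps)

raw_scores = np.array([2.0, 1.0, 0.1])
probabilities = softmax(raw_scores)
print(probabilities, probabilities.sum())
```

The largest raw score still gets the largest probability, but now the outputs can be read as likelihoods for each class.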
    
#### Convolution layers
* **To be able to recognize images efficiently, we'll add convolutional layers before our densely connected layers**
* **Note that there are 2 types of convolutional layers: 1D and 2D**
    * Since we're working with images, we'll want to add the Conv2D layer
    * For some data like sound waves, you can use Conv1D (but typically you'll use Conv2D)
* Parameters:
    * The first parameter is how many different filters should be in the layer
        * Each filter will be capable of detecting one pattern in the image (we'll start with 32, a power of 2)
    * Next, we need to pass in the size of the window that we'll use when creating image tiles from each image
        * By passing in the tuple `(3,3)`, we are selecting a 3 pixel x 3 pixel window
        * The window slides across the image (one pixel at a time by default) to create overlapping 3 x 3 tiles; when we do that, we have to decide what to do with the edges of the image. A 3 x 3 window can't be centered on the outermost pixels, so without padding the output comes out slightly smaller than the input. We can either accept that or we can add padding to the image. **Padding is just extra zeros added to the edge(s) of the image to make the math work out, and also to avoid losing info from the edges.**
    * To add extra padding that causes the image to retain its original size: **`padding="same"`**
    * Just like a normal Dense layer, convolutional layers also need an activation function and we almost always use the `relu` activation function because of its efficiency
* `model.add(Conv2D(32, (3, 3), padding="same", activation="relu", input_shape=(32, 32, 3)))`
* **Note:** Whenever we transition between convolutional layers and dense layers, we need to tell Keras that we're no longer working with 2D data
    * **To do that, we need to create a `Flatten()` layer**
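
The effect of padding on output size can be sketched with NumPy alone. This assumes a 3 x 3 window that moves one pixel at a time (the default stride in Keras) and a made-up 5 x 5 single-channel image.

```python
import numpy as np

image = np.arange(25, dtype=float).reshape(5, 5)   # hypothetical 5x5 single-channel image
window = 3

# Without padding ("valid"), a 3x3 window fits in only 3 positions along each side
valid_size = image.shape[0] - window + 1

# With one ring of zeros ("same" for a 3x3 window), the window fits in 5 positions again
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
same_size = padded.shape[0] - window + 1

print(valid_size, same_size)   # 3 5
```

This is why a layer with `padding="same"` keeps its input's width and height, while a layer without it trims one pixel from every edge.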
    

```
# Create a model and add layers
model = Sequential()

model.add(Conv2D(32, (3, 3), padding="same", activation="relu", input_shape=(32, 32, 3)))
model.add(Conv2D(32, (3, 3), activation="relu"))

model.add(Conv2D(64, (3, 3), padding="same", activation="relu"))
model.add(Conv2D(64, (3, 3), activation="relu"))

model.add(Flatten())

model.add(Dense(512, activation="relu"))
model.add(Dense(10, activation="softmax"))

# Print a summary of the model
model.summary()
```

* Note that each layer also has a **total number of parameters** listed. This is the total number of weights in that layer (including bias)
* **The total params = the size or complexity of our NN**
* **Note too that the first Dense layer has 512 nodes; the 32 x 32 pixel input images pass through the convolutional layers and are then flattened into a 1D array by the `Flatten()` layer before they reach it.**
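
The parameter counts that `model.summary()` reports can be reproduced by hand. The sketch below uses two made-up helper functions and assumes every layer has a bias term, which is the Keras default.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights for every filter plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(inputs, nodes):
    """One weight per input-node pair plus one bias per node."""
    return inputs * nodes + nodes

# First layer above: 32 filters of 3x3 over a 3-channel (RGB) input
print(conv2d_params(3, 3, 3, 32))     # 896
# Second layer: 32 filters of 3x3 over the 32 feature maps from the first layer
print(conv2d_params(3, 3, 32, 32))    # 9248
# Output layer: 512 inputs feeding 10 nodes
print(dense_params(512, 10))          # 5130
```

Dense layers tend to dominate the total because every input is connected to every node.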

### Max pooling
* **Typically we'll do max pooling right after a block of convolutional layers**
* The only parameter that we have to pass in to a max pooling layer is the size of the area we want to pool together (**`pool_size`**)

```
# Create a model and add layers
model = Sequential()

model.add(Conv2D(32, (3, 3), padding='same', input_shape=(32, 32, 3), activation="relu"))
model.add(Conv2D(32, (3, 3), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3), padding='same', activation="relu"))
model.add(Conv2D(64, (3, 3), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(512, activation="relu"))
model.add(Dense(10, activation="softmax"))

# Print a summary of the model
model.summary()
```

* **Note** that the max pooling layers have zero parameters
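
Although the pooling layers have no parameters of their own, they shrink what the later layers have to process. The sketch below traces the image width and height through the model with and without pooling. The helper names are made up, and it assumes a stride of 1 for convolutions and non-overlapping pooling windows; layers with `padding='same'` leave the size unchanged and so are skipped.

```python
def conv_valid(size, window=3):
    """Spatial size after an unpadded convolution with stride 1."""
    return size - window + 1

def pool(size, pool_size=2):
    """Spatial size after non-overlapping max pooling."""
    return size // pool_size

# Model without pooling: 32 -> (same) 32 -> 30 -> (same) 30 -> 28
no_pool = conv_valid(conv_valid(32))
flat_no_pool = no_pool * no_pool * 64

# Model with pooling: 32 -> 32 -> 30 -> pool 15 -> 15 -> 13 -> pool 6
with_pool = pool(conv_valid(pool(conv_valid(32))))
flat_with_pool = with_pool * with_pool * 64

print(flat_no_pool, flat_with_pool)   # 50176 2304
```

The flattened array going into the first `Dense` layer is roughly 20 times smaller with pooling, which cuts that layer's weight count by about the same factor.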

### Dropout
* Force the NN to try harder to learn without memorizing the input data.
* **Usually we'll add `Dropout` right after `MaxPooling` layers or after a group of `Dense` layers.**
* The only parameter we need to pass in to a `Dropout()` layer is the percentage of NN connections to randomly cut
* **Usually a value between 25%-50% works well.**
* Note that we add dropout layers after each max pooling layer, but also after the first `Dense` layer; but here, **we make the NN work really hard to get the last answer correct and use a 50% dropout.**
* Note that the Dropout, Max Pooling and Flatten layers all have 0 parameters because they aren't adding any additional info to our model 

```
# Create a model and add layers
model = Sequential()

model.add(Conv2D(32, (3, 3), padding='same', input_shape=(32, 32, 3), activation="relu"))
model.add(Conv2D(32, (3, 3), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same', activation="relu"))
model.add(Conv2D(64, (3, 3), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512, activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(10, activation="softmax"))

# Print a summary of the model
model.summary()
```

### A complete neural network for image recognition
* The final step of defining a NN is to **compile** the NN.
* When we `compile` the model, we're telling Keras we actually want to create the NN in memory. 
* We also tell Keras how we'll be training it and measuring its accuracy 

```
# Compile the model
model.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)
```
* **Note that you can write your own custom loss function** but most of the time there are a few standard functions that you'll choose between 
* **If you're trying to classify an image into different categories, use `categorical_crossentropy`.**
* **If you're only checking if an image belongs to one category, use `binary_crossentropy`.**
* **`adam`**: **Ada**ptive **m**oment estimation.
* Finally we need to tell Keras what metrics we want it to report during the training process

## 4. Building and Training the Deep Neural Network

### Setting up a neural network for training
* The first two parameters are our training dataset and our training labels
* **`batch_size`**: how many images we want to feed into the network at once during training
    * If we set the # too low, training will take a long time and might not ever finish
    * If we set the # too high, we'll run out of memory on our computer
    * Typical batch sizes are between **32-128 images**
* **`epochs`**: one full pass through the entire training dataset
    * The more passes through the data we do, the more chance the neural network has to learn, but the longer the training process will take (and you risk some serious overfitting)
    * In general, the larger your dataset, the fewer training passes you'll do over it
    * For example, for extremely large datasets with millions of images, you might only do 5 passes
* **`validation_data`**:
    * This is data that the model will never see during training and will only be used to test the accuracy of the trained model 
    * Pass `x_test` and `y_test` together as a tuple 
* **`shuffle`**:
    * Randomizes the order of the training data when `True`
    * It is important to set this to true (when possible) so that the model doesn't memorize the order of the data
    * Default is `True`

```
# Train the model
model.fit(
    x_train,
    y_train,
    batch_size=32,
    epochs=30,
    validation_data=(x_test, y_test),
    shuffle=True
)
```
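
A quick back-of-the-envelope calculation shows how `batch_size` and `epochs` translate into work. It uses the 50,000 images in the CIFAR-10 training set and assumes one weight update per batch.

```python
import math

training_images = 50_000   # size of the CIFAR-10 training set
batch_size = 32
epochs = 30

# The last batch of each epoch may be smaller than batch_size, hence the ceiling
batches_per_epoch = math.ceil(training_images / batch_size)
total_weight_updates = batches_per_epoch * epochs

print(batches_per_epoch, total_weight_updates)   # 1563 46890
```

Doubling the batch size would halve the number of updates per epoch, at the cost of more memory per batch.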

* Note that here we aren't saving the results anywhere. So if we run the model right now, we'll be doing all the work and then throwing away the results.
* We'll definitely want to learn to save our results before running the lengthy training process. 

### Training a neural network and saving weights
* When we train a NN, we want to make sure that we save the results so that we can reuse the trained model later
* After the training completes, we want to save the NN to a file, so that we'll be able to use it to recognize objects in other programs.
* **Saving a NN is two separate steps:**
    * First, we want to save the structure of the NN itself; that includes which layers get created and the order that they're all hooked together
    * Second, we want to save the weights of the neural network 
    * **The reason that we save the structure separately from the weights is because often you'll train the same NN multiple times on different datasets. It's convenient to be able to load different sets of weights using the same neural network structure.**
    
#### Saving the NN structure itself
* Keras can convert the structure of a NN into JSON by calling the `model.to_json()` function. 
* Then we just need to write this JSON data to a text file; there are lots of ways to do this in Python, but one easy way using the `Path` class from the `pathlib` library is shown below:


```
from pathlib import Path

# Save neural network structure
model_structure = model.to_json()
f = Path("model_structure.json")
f.write_text(model_structure)

# Save neural network's trained weights
model.save_weights("model_weights.h5")
```
* To save the model weights, we just need to call `model.save_weights()` and pass the name we want to save the model as.
* The data that gets saved in this last step is in a binary format called HDF5
* The HDF5 format is designed for saving and loading large binary files efficiently
* The loss is the numerical representation of how wrong our model is right now

### Making predictions with the trained neural network
* When we pass an image through our NN, it's going to return a likelihood for each type of object it was trained to recognize.
* In order to decode those numbers to names, we need a list of names that correspond with each number:

```
# These are the CIFAR10 class labels from the training data (in order from 0 to 9)
class_labels = [
    "Plane",
    "Car",
    "Bird",
    "Cat",
    "Deer",
    "Dog",
    "Frog",
    "Horse",
    "Boat",
    "Truck"
]
```

* Now, we're ready to load the NN
* First we load the structure 

```
# Load the json file that contains the model's structure
f = Path("model_structure.json")
model_structure = f.read_text()

# Recreate the Keras model object from the json data
model = model_from_json(model_structure)
```
* So far we've only restored the structure of the NN
* To restore the training as well, we need to load the weights file that we created when we trained the NN 

```
# Re-load the model's trained weights
model.load_weights("model_weights.h5")
```

#### Test one image and generate prediction start to finish:
* Note that right now we're only testing one image with our neural network, but for efficiency reasons Keras lets you pass in batches of images at once so you can run more than one image through the neural network at one time
* **So, we need to create a batch of images to pass in, even though we're only testing this one image**
* Keras expects these batches as a **4-dimensional array**
* The **first dimension is the list of images** and the other three dimensions are the image data itself
* Since we only have this one image, **we can turn it into a 4D array by adding a new axis to it with numpy.**
* We can do this by calling a function called **`np.expand_dims()`** and passing in the name of the array; **note that we also need to pass in `axis=0` to tell it that the new axis is the *first* dimension.** This is the convention that Keras expects
* The results variable will contain a **list of results** for each image that we passed in.
* Since we only passed in one image, we can just grab the first array index, hence `single_result = results[0]`
* The `single_result` array is an array with 10 elements; each element represents how likely the image is to belong to each of the object types we listed at the top of the program.
* Instead of returning 10 separate numbers, we can just grab the array element with the highest value (ie highest probability)
* We do this using `np.argmax()`
* We also grab the likelihood value of that array index so we can print it out later

```
from keras.models import model_from_json
from pathlib import Path
from keras.preprocessing import image
import numpy as np

# These are the CIFAR10 class labels from the training data (in order from 0 to 9)
class_labels = [
    "Plane",
    "Car",
    "Bird",
    "Cat",
    "Deer",
    "Dog",
    "Frog",
    "Horse",
    "Boat",
    "Truck"
]

# Load the json file that contains the model's structure
f = Path("model_structure.json")
model_structure = f.read_text()

# Recreate the Keras model object from the json data
model = model_from_json(model_structure)

# Re-load the model's trained weights
model.load_weights("model_weights.h5")

# Load an image file to test, resizing it to 32x32 pixels (as required by this model)
img = image.load_img("frog.png", target_size=(32, 32))

# Convert the image to a numpy array and scale it to the 0-to-1 range used during training
image_to_test = image.img_to_array(img) / 255

# Add a fourth dimension to the image (since Keras expects a list of images, not a single image)
list_of_images = np.expand_dims(image_to_test, axis=0)

# Make a prediction using the model
results = model.predict(list_of_images)

# Since we are only testing one image, we only need to check the first result
single_result = results[0]

# We will get a likelihood score for all 10 possible classes. Find out which class had the highest score.
most_likely_class_index = int(np.argmax(single_result))
class_likelihood = single_result[most_likely_class_index]

# Get the name of the most likely class
class_label = class_labels[most_likely_class_index]

# Print the result
print("This image is a {} - Likelihood: {:.2f}".format(class_label, class_likelihood))
```
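
The batching and decoding steps can be tried in isolation, without a trained model. In this sketch the "image" is random data and the prediction scores are made up, so only the array handling is real.

```python
import numpy as np

# Stand-in for a loaded 32x32 RGB image (random values, not a real photo)
rng = np.random.default_rng(seed=0)
image_to_test = rng.random((32, 32, 3))

# Wrap the single image in a batch of one: shape goes from (32, 32, 3) to (1, 32, 32, 3)
list_of_images = np.expand_dims(image_to_test, axis=0)
print(list_of_images.shape)

# Made-up prediction output for a batch of one image across three classes
results = np.array([[0.1, 0.7, 0.2]])
class_labels = ["Plane", "Car", "Bird"]

# Pick the class with the highest score for the first (and only) image
single_result = results[0]
most_likely_class_index = int(np.argmax(single_result))
print(class_labels[most_likely_class_index], single_result[most_likely_class_index])
```

The same indexing works for larger batches: `results[i]` holds the scores for the i-th image that was passed in.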

## 5. Fine-Tuning Pre-trained Neural Networks

### Pre-trained neural networks included with Keras

* Researchers around the world compete to build the most accurate image recognition neural networks 
* **So instead of inventing our own neural network designs from scratch, it often makes sense to reuse an existing neural network design as a starting point for your own projects.**
* Even better, researchers have also trained these neural network designs on large datasets and shared the *trained* versions of the neural networks. So we can take those pre-trained neural networks and either reuse them directly or use them as a starting point for our own training
* Keras includes copies of many popular pre-trained NNs that are ready to use.
* **The image recognition models included with Keras are all trained to recognize images from the ImageNet dataset.**
* The **ImageNet** data set is a collection of millions of pictures of objects that have been labeled so that you can use them to train computers to recognize those objects
    * Each year ImageNet holds a worldwide image recognition contest called the **ImageNet Large Scale Visual Recognition Challenge** or **ILSVRC**
    * Note that the pre-trained models included with Keras are trained on the more limited dataset used by this contest 
    * **This (limited) data set contains images of 1,000 different types or classes of objects; the data set includes over 1200 pictures of just, for example, Granny Smith apples.**
    
#### Image Recognition Models
* Let's talk about the NN designs included with Keras that we can reuse:
    * **VGG** (University of Oxford): DNN; very standard CNN design; 16 or 19 layers; state of the art circa 2014; still used widely today as a basis for other models because it's **easy to work with and understand,** but it's not as efficient as newer designs. 
    * **ResNet-50** (Microsoft Research): 50-layer NN; more complex design; state of the art circa 2015; more accurate than VGG but uses less memory; ResNet uses a more complicated design where higher layers in the NN are connected not just to the layer directly below them, but also have multiple connections to deeper layers.
    * **Inception v3** (Google): another 2015 design; even more complex design than ResNet-50; also performs very well; its layers branch out into multiple separate paths before rejoining
    
**These (above) networks show the research trend in 2014 and 2015 of making NNs bigger and more complex to try to increase their accuracy. More recent NN designs tend to be more specialized:**

   * **MobileNet** (Google); 2017, low resource usage; specifically designed to be able to run well on low power devices
   * **NASNet** (Google): end of 2017; explores the idea of having algorithms design NNs. In a sense they're using ML to build and tweak ML models on their own; this let them create something that was more accurate than existing models while still using even less computer power
   
#### Two Uses
* Having access to these pre-trained models is useful for two reasons:
    * **1) Reuse a trained model directly in your own programs to recognize objects in images** (if you need to recognize any of the 1,000 types of objects they're already trained on).
    * **2) $\star$ Transfer learning: Adapt an existing model to recognize new types of objects instead of starting from scratch.** $\star$
    
    
### Using a pre-trained network for object recognition
* All the pretrained models included with Keras are under the `applications` package
    * `from keras.applications import vgg16`
    
```
import numpy as np
from keras.preprocessing import image
from keras.applications import vgg16

# Load Keras' VGG16 model that was pre-trained against the ImageNet database
model = vgg16.VGG16()

# Load the image file, resizing it to 224x224 pixels (required by this model)
img = image.load_img("bay.jpg", target_size=(224, 224))

# Convert the image to a numpy array
x = image.img_to_array(img)

# Add a fourth dimension (since Keras expects a list of images)
x = np.expand_dims(x, axis=0)

# Normalize the input image's pixel values to the range used when training the neural network
x = vgg16.preprocess_input(x)

# Run the image through the deep neural network to make a prediction
predictions = model.predict(x)

# Look up the names of the predicted classes. Index zero is the results for the first image.
predicted_classes = vgg16.decode_predictions(predictions)

print("Top predictions for this image:")

for imagenet_id, name, likelihood in predicted_classes[0]:
    print("Prediction: {} - {:.2f}".format(name, likelihood))
```

## Transfer Learning
* If you have a lot of training data, you can train a CNN to recognize objects and images, but here's a secret: **In the real world, you almost never need to train the neural network from scratch. Instead, we use *transfer learning*.**
* **Transfer learning** is using a model trained on one set of data as a starting point for modeling a new set of data (to give it a head start on solving a problem).
* A typical CNN is structured like this:

<img src='data/nn13.png' width="600" height="300" align="center"/>

* The network is made up of a series of convolutional layers and the training process teaches each of those layers to be activated when it sees certain patterns in the input image. Those layers learn to tell images apart by looking for those unique patterns
* Notice that as the layers get deeper, the patterns that each layer is looking for get more complex:

<img src='data/nn14.png' width="600" height="300" align="center"/>

<img src='data/nn15.png' width="600" height="300" align="center"/>

<img src='data/nn16.png' width="600" height="300" align="center"/>

<img src='data/nn17.png' width="600" height="300" align="center"/>

<img src='data/nn18.png' width="600" height="300" align="center"/>

* Since NNs are really just mathematical models, it's impossible to tell exactly what each of these patterns represents, but you can see how each layer is getting more and more complex in what it's looking for 
* The basic idea is that **neural networks learn to detect simple patterns in the top layer, and then the next layer uses that information to look for slightly more complex patterns and so on down through the convolutional layers.**
* But the final layer of the neural network is a densely connected layer that uses the information from the convolutional layers to decide which object is in the image.
* With transfer learning, we're gonna start with a NN that's already been trained to recognize objects from a large dataset like ImageNet. To reuse this NN with new data, we can simply **slice off the last layer** (the dense output layer). We'll **keep all the layers that detect patterns, but remove the part that maps those patterns to specific objects.**
* **We'll call this pre-trained NN a feature extractor because we're using it to extract training features from images.**
* Next, we create a new neural network to replace the last layer in the original network. This is the only part that we'll have to train ourselves.

### Training with Transfer Learning 
* When we build our new image recognition system, we'll pass our new training images through the "feature extractor" and save the results for each training image to a file; then we'll use those extracted features to train the new neural network.
* Since we're using the feature extractor to recognize shapes and patterns, our neural network only has to learn to tell which patterns map to which objects.
* Since this new neural network isn't doing much work, it can learn to do it with a small amount of training data.

### Predicting with Transfer Learning
* When we want to test the new image, we have to first pass it through the same feature extractor.
* Then we can use those extracted features as input to our newly-trained neural network, which will give us the final prediction.
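
The two-stage workflow can be sketched end to end in NumPy. Everything here is a stand-in chosen so the example runs without downloading anything: the "pre-trained" feature extractor is just one layer of fixed random weights (in practice it would be the convolutional layers of a network such as VGG16), the data is synthetic, and the new network is a single logistic-regression output node. The point is only the structure: the extractor's weights are never updated, and only the small new layer is trained.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Stand-in for the pre-trained layers: fixed weights that are never updated
frozen_weights = rng.normal(size=(20, 8))
frozen_copy = frozen_weights.copy()

def extract_features(inputs):
    """Pass inputs through the frozen layers (a single ReLU layer in this sketch)."""
    features = np.maximum(inputs @ frozen_weights, 0.0)
    # Scale each row to at most unit length so the training step size below is safe
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, 1.0)

# Small synthetic two-class dataset (40 examples of 20 values each)
inputs = rng.normal(size=(40, 20))
labels = (inputs[:, 0] > 0).astype(float)

# Step 1: run every training example through the feature extractor once
features = extract_features(inputs)

# Step 2: train only the new output layer on those extracted features
new_weights = np.zeros(8)
new_bias = 0.0
learning_rate = 1.0

def predict(w, b):
    return 1.0 / (1.0 + np.exp(-(features @ w + b)))

def loss(probs):
    probs = np.clip(probs, 1e-7, 1 - 1e-7)
    return float(-np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs)))

initial_loss = loss(predict(new_weights, new_bias))
for _ in range(200):
    error = predict(new_weights, new_bias) - labels
    new_weights -= learning_rate * (features.T @ error) / len(labels)
    new_bias -= learning_rate * np.mean(error)
final_loss = loss(predict(new_weights, new_bias))

print(initial_loss, final_loss)
```

To make a prediction for a new example, it would go through `extract_features` first and then through the trained output layer, mirroring the two steps described above.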

#### When to use transfer learning
* Transfer learning can cut the time required to build an image recognition system from days to minutes 
* Always try it first-- because it's quick!
* If transfer learning works, there's no need to do the extra work to train a new model
* **Transfer learning is also really useful when you don't have a lot of training data but already have a model that solves a similar problem**.
* "Training a neural network from scratch is sort of like teaching a baby to read. The baby has to learn about letters and words and sentences before it can read and understand anything. Transfer learning is more like asking an adult that already knows how to read to learn something new."
* If you only have a few hundred training images for your image recognition system, you don't have enough data to teach your model from scratch, so it makes sense to start with a model trained for something else and adapt it to your problem. 

<img src='data/nn5.png' width="600" height="300" align="center"/>