## Introducing convolutional neural networks

In [None]:
# Images as data
import matplotlib.pyplot as plt
data = plt.imread('stop_sign.jpg')
plt.imshow(data)
plt.show()

In [None]:
data.shape  # (2832, 4256, 3)  Height, Width, (RGB)
data[1000, 1500] # array([0.73333333, 0.07843137, 0.14509804])  High intensity in the red color.

In [None]:
# Modifying image data
data[:, :, 1] = 0
data[:, :, 2] = 0
plt.imshow(data)
plt.show()   # Only the information in the red channel

In [None]:
# Changing an image
data[200:1200, 200:1200, :] = [0, 1, 0]
plt.imshow(data)
plt.show()   # Result in an image with a green square in it.
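
The same indexing can be tried without an image file. The sketch below is only an illustration on a small synthetic array (the name `img` and its 6x6 size are assumptions, not part of the original example). It zeroes the green and blue channels and then paints a green square.

```python
import numpy as np

# Hypothetical 6x6 RGB image with float intensities in [0, 1]
img = np.full((6, 6, 3), 0.5)

# Keep only the red channel by zeroing green (index 1) and blue (index 2)
img[:, :, 1] = 0
img[:, :, 2] = 0

# Paint a 2x2 green square; the 3-element list is broadcast over the region
img[1:3, 1:3, :] = [0, 1, 0]

print(img[0, 0])  # a pixel outside the square
print(img[1, 1])  # a pixel inside the square
```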

## Classifying images

In [None]:
# Representing class data: one-hot encoding
labels = ["shoe", "dress", "shoe", "t-shirt", 
          "shoe", "t-shirt", "shoe", "dress"]

# One-hot encoded labels (columns: t-shirt, dress, shoe)
array([[0., 0., 1.],    <= shoe
       [0., 1., 0.],    <= dress
       [0., 0., 1.],    <= shoe
       [1., 0., 0.],    <= t-shirt
       [0., 0., 1.],    <= shoe
       [1., 0., 0.],    <= t-shirt
       [0., 0., 1.],    <= shoe
       [0., 1., 0.]])   <= dress

Each row represents one sample, and each column corresponds to one of the classes.

In [None]:
# One-hot encoding
import numpy as np
categories = np.array(["t-shirt", "dress", "shoe"])
n_categories = 3
ohe_labels = np.zeros((len(labels), n_categories))  # nSamples x nCategories
for ii in range(len(labels)):
    jj = np.where(categories == labels[ii])
    ohe_labels[ii, jj] = 1
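
The loop above can also be written without explicit iteration. The following is a minimal sketch, assuming the same `labels` and `categories` as above; it relies on NumPy broadcasting to compare every label with every category at once.

```python
import numpy as np

labels = ["shoe", "dress", "shoe", "t-shirt",
          "shoe", "t-shirt", "shoe", "dress"]
categories = np.array(["t-shirt", "dress", "shoe"])

# Compare every label against every category: shape (n_samples, n_categories)
matches = np.array(labels)[:, np.newaxis] == categories[np.newaxis, :]
ohe_labels = matches.astype(float)

print(ohe_labels)
```

Each row contains a single 1 in the column of the matching category, which is the same result the loop produces.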

```python3
test

array([[0., 0., 1.], 
       [0., 1., 0.], 
       [0., 0., 1.], 
       [0., 1., 0.], 
       [0., 0., 1.],
       [0., 0., 1.],
       [0., 0., 1.],
       [0., 1., 0.]])

prediction

array([[0., 0., 1.], 
       [0., 1., 0.], 
       [0., 0., 1.], 
       [1., 0., 0.], <= incorrect
       [0., 0., 1.],
       [1., 0., 0.], <= incorrect
       [0., 0., 1.],
       [0., 1., 0.]])
       
(test * prediction).sum()
6.0       # The number of correct classifications.
```
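
Dividing the number of correct classifications by the number of samples gives the accuracy. A minimal sketch, reusing the `test` and `prediction` arrays shown above:

```python
import numpy as np

test = np.array([[0., 0., 1.], [0., 1., 0.], [0., 0., 1.], [0., 1., 0.],
                 [0., 0., 1.], [0., 0., 1.], [0., 0., 1.], [0., 1., 0.]])
prediction = np.array([[0., 0., 1.], [0., 1., 0.], [0., 0., 1.], [1., 0., 0.],
                       [0., 0., 1.], [1., 0., 0.], [0., 0., 1.], [0., 1., 0.]])

# The element-wise product is 1 only where the predicted class matches the true class
n_correct = (test * prediction).sum()
accuracy = n_correct / len(test)
print(n_correct, accuracy)
```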

## Classification with Keras

In [None]:
# Keras for image classification
from keras.models import Sequential
model = Sequential()

from keras.layers import Dense
train_data.shape  # (50, 28, 28, 1)  50 samples, each sample is 28x28 size black and white image.

In [None]:
# Fully connected layers.
model.add(Dense(10, activation='relu', input_shape=(784,)))  # Num of nodes -> model complexity
model.add(Dense(10, activation='relu'))
model.add(Dense(3, activation='softmax')) # Output units. softmax for multiclass classification.

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

train_data = train_data.reshape((50, 784))

model.fit(train_data, train_labels, 
          validation_split=0.2,
          epochs=3)

```

Train on 40 samples, validate on 10 samples
Epoch 1/3

32/40 [=======================>......] - ETA: 0s - loss: 1.0117 - acc: 0.4688
40/40 [==============================] - 0s 4ms/step - loss: 1.0438 - acc: 0.4250 - val_loss: 0.9668 - val_acc: 0.4000
Epoch 2/3

32/40 [=======================>......] - ETA: 0s - loss: 0.9556 - acc: 0.5312
40/40 [==============================] - 0s 195us/step - loss: 0.9404 - acc: 0.5750 - val_loss: 0.9068 - val_acc: 0.4000
Epoch 3/3

32/40 [=======================>......] - ETA: 0s - loss: 0.9143 - acc: 0.5938
40/40 [==============================] - 0s 189us/step - loss: 0.8726 - acc: 0.6750 - val_loss: 0.8452 - val_acc: 0.4000
```

In [None]:
test_data = test_data.reshape((10, 784))
model.evaluate(test_data, test_labels)

```
10/10 [==============================] - 0s 335us/step
[1.0191701650619507, 0.4000000059604645]
```

## Convolutions

### Using correlations in images
- Natural images contain spatial correlations
- For example, pixels along a contour or edge
- How can we use these correlations?

### Biological inspiration
<img src="https://assets.datacamp.com/production/repositories/1820/datasets/c7810cb749aa1003f9fbb0ceead35c2ccda197d2/Visual_map_Swindale_Monkey_ori_domains_Blasdel_1986.jpg" width="400">  
  
Our own visual system uses these correlations, and each nerve cell in the visual areas in our brain responds to oriented edges at a particular location in the visual field.  
This image depicts a small part of the visual cortex (the scale bar is 1 mm in size).  
Each part of the image responds to some part of the visual field, and to the orientation depicted by the colors on the right.  
  
Looking for the same feature, such as a particular orientation, in every location in an image is like a mathematical operation called a convolution.  
This is the fundamental operation that convolutional neural networks use to process images.

## What is a convolution?

In [None]:
array = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

This array contains an "edge" in the middle, where the values go from zero to one.

In [None]:
kernel = np.array([-1, 1])

The kernel defines the feature that we are looking for.  
In this case, we are looking for a change from small values on the left to large values on the right.

In [None]:
conv = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0])  # start the result as all zeros
conv[0] = (kernel * array[0:2]).sum()
conv[1] = (kernel * array[1:3]).sum()
conv[2] = (kernel * array[2:4]).sum()

Start the result as all zeros.  
Then, we slide the kernel along the array.

In [None]:
for ii in range(9):  # 10 - 2 + 1 = 9 positions for the kernel
    conv[ii] = (kernel * array[ii:ii+2]).sum()
conv # array([0, 0, 0, 0, 1, 0, 0, 0, 0])

In each location we multiply the values in the array with the values in the kernel and sum them up.
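
The sliding operation can be wrapped in a small function. The sketch below is an illustration only (the helper name `slide_kernel_1d` is hypothetical). It also compares the result with `np.correlate`, since the operation as written here, with no flipping of the kernel, is what signal processing calls cross-correlation, while deep-learning usage generally calls it convolution.

```python
import numpy as np

def slide_kernel_1d(array, kernel):
    """Slide `kernel` along `array` and sum the element-wise products."""
    n_out = len(array) - len(kernel) + 1
    conv = np.zeros(n_out, dtype=array.dtype)
    for ii in range(n_out):
        conv[ii] = (kernel * array[ii:ii + len(kernel)]).sum()
    return conv

array = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
kernel = np.array([-1, 1])

conv = slide_kernel_1d(array, kernel)
print(conv)
# np.correlate in 'valid' mode computes the same sliding dot product
print(np.correlate(array, kernel, mode='valid'))
```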

In [None]:
array = np.array([0, 0, 1, 1, 0, 0, 1, 1, 0, 0])
kernel = np.array([-1, 1])

In this example, the array goes between 0 and 1 twice.  
In this case, the edges that go from zero to one match the kernel, but the edges from 1 to 0 are the opposite of the kernel.  
In these locations, the convolution becomes negative.

In [None]:
conv = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0])
for ii in range(9):  # 10 - 2 + 1 = 9 positions for the kernel
    conv[ii] = (kernel * array[ii:ii+2]).sum()
conv # array([ 0,  1,  0, -1,  0,  1,  0, -1,  0])

## Image convolution
Convolutions of images do the same operation, but in two dimensions.

### Two-dimensional convolution

In [None]:
kernel = np.array([[-1, 1], 
                   [-1, 1]])
conv = np.zeros((27, 27))
for ii in range(27):
    for jj in range(27):
        window = image[ii:ii+2, jj:jj+2]    # with same size of kernel.
        conv[ii, jj] = np.sum(window * kernel)

<img src="https://assets.datacamp.com/production/repositories/1820/datasets/6c00959d6fb39d4e14fb369cb605a6aa21562e75/no_padding_no_strides.gif">

The **kernel** is the gray 3-by-3 box that slides over the blue input image at the bottom.  
In each location, the window is multiplied with the values in the kernel and added up to create the value of one of the pixels in the resulting green array at the top.  
In neural networks, we call this resulting array a **"Feature map"**, because it contains a map of the locations in the image that match the feature represented by this kernel.
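
To see a feature map on a concrete input, the loop can be generalized to any image and kernel size. This is a minimal sketch under stated assumptions: the helper `convolve2d_valid` and the tiny 4x4 image are hypothetical, chosen so the result is easy to check by hand.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding and return the feature map."""
    k_rows, k_cols = kernel.shape
    out_rows = image.shape[0] - k_rows + 1
    out_cols = image.shape[1] - k_cols + 1
    feature_map = np.zeros((out_rows, out_cols))
    for ii in range(out_rows):
        for jj in range(out_cols):
            window = image[ii:ii + k_rows, jj:jj + k_cols]
            feature_map[ii, jj] = np.sum(window * kernel)
    return feature_map

# Hypothetical 4x4 image: dark on the left, bright on the right
image = np.array([[0, 0, 1, 1]] * 4)
kernel = np.array([[-1, 1],
                   [-1, 1]])

feature_map = convolve2d_valid(image, kernel)
print(feature_map)
```

The feature map is nonzero only in the middle column, which is where the dark-to-bright edge sits in the image.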

#### Image convolutions
The convolution of an image with a kernel summarizes a part of the image as the sum of the multiplication of that part of the image with the kernel.  
In this exercise, you will write the code that executes a convolution of an image with a kernel using NumPy.  
Given a black and white image that is stored in the variable `im`,  
write the operations inside the loop that would execute the convolution with the provided kernel.

Select the right window from the image in each iteration and multiply this part of the image with the kernel.  
Sum the result and assign the sum to the correct entry in the output array (`result`).

In [None]:
kernel = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
result = np.zeros(im.shape)

# Output array
for ii in range(im.shape[0] - 3):
    for jj in range(im.shape[1] - 3):
        result[ii, jj] = (im[ii:ii+3, jj:jj+3] * kernel).sum()

# Print result
print(result)

```
[[2.68104586 2.95947725 2.84313735 ... 0.         0.         0.        ]
 [3.01830077 3.07058835 3.05098051 ... 0.         0.         0.        ]
 [2.95163405 3.09934652 3.20261449 ... 0.         0.         0.        ]
 ...
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]]
```

### Defining image convolution kernels

You will be asked to define the kernel that finds a particular feature in the image.  
  
For example, the following kernel finds a vertical line in images:  
```  
np.array([[-1, 1, -1], 
          [-1, 1, -1], 
          [-1, 1, -1]])
```

#### 1. Define a kernel that finds horizontal lines in images.

In [None]:
kernel = np.array([[-1, -1, -1], 
                   [1, 1, 1],
                   [-1, -1, -1]])

#### 2. Define a kernel that finds a light spot surrounded by dark pixels.

In [None]:
kernel = np.array([[-1, -1, -1], 
                   [-1, 1, -1],
                   [-1, -1, -1]])

#### 3. Define a kernel that finds a dark spot surrounded by bright pixels.

In [None]:
kernel = np.array([[1, 1, 1], 
                   [1, -1, 1],
                   [1, 1, 1]])
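
One way to check a kernel is to apply it to small patches that do and do not contain the feature. The sketch below uses the vertical-line kernel from the example above; the 3x3 patches are hypothetical test inputs.

```python
import numpy as np

vertical_kernel = np.array([[-1, 1, -1],
                            [-1, 1, -1],
                            [-1, 1, -1]])

# Hypothetical 3x3 patches: 1 is bright, 0 is dark
vertical_line = np.array([[0, 1, 0],
                          [0, 1, 0],
                          [0, 1, 0]])
horizontal_line = vertical_line.T

match = (vertical_line * vertical_kernel).sum()       # patch contains the feature
mismatch = (horizontal_line * vertical_kernel).sum()  # patch does not
print(match, mismatch)
```

The response is largest when the patch has the same layout as the kernel and becomes negative when bright pixels fall on the kernel's negative entries.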

## Implementing image convolutions in Keras

### Keras 'Convolution' layer

In [None]:
from keras.layers import Conv2D
Conv2D(10, kernel_size=3, activation='relu')   # 10 convolution units.

It resembles the "Dense" layers, but instead of having every unit in the layer connected to every unit in the previous layer,  
these connect to the previous layer through a convolution kernel.  
This means that the output of each unit in this layer is a convolution of a kernel over the image input.    
During training of a neural network that has convolutional layers, the kernels in each unit would be adjusted using back-propagation.  
  
A dense layer has one weight for each pixel in the image, but a convolution layer has only one weight for each pixel in the kernel.  
For example, if we set the kernel size argument to 3, that means that the kernel of each unit has 9 pixels.  
If the layer has 10 units, it would have 90 parameters for these kernels.

### Integrating convolution layers into a network

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Flatten
model = Sequential()
model.add(Conv2D(10, kernel_size=3, activation='relu', 
              input_shape=(img_rows, img_cols, 1)))
model.add(Flatten())
model.add(Dense(3, activation='softmax'))

The Flatten layer serves as a connector between the convolutional and densely connected layers.  
It takes the output of the convolution, which we previously referred to as a "feature map",  
and flattens it into a **one-dimensional array**, which is the expected input to the densely connected layer.  
  
Here, the output is one of three classes of clothing, so there are three units.  
To classify among the categories represented by the three units, we use the softmax activation function. 
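
As a rough illustration of what softmax does, the sketch below converts three raw scores into probabilities that sum to one. The function and the example scores are assumptions for illustration, not Keras's internal implementation.

```python
import numpy as np

def softmax(scores):
    """Convert a vector of raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical raw outputs of the three units (t-shirt, dress, shoe)
scores = np.array([1.0, 2.0, 4.0])
probs = softmax(scores)
print(probs, probs.argmax())
```

The predicted class is the unit with the highest probability, here the third one.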
  
<img src="https://assets.datacamp.com/production/repositories/1820/datasets/4acb90dffc2d4226bf351ba01e164b60e3188541/conv2d_1.png">

### Fitting a CNN

In [None]:
model.compile(optimizer='adam', 
              loss='categorical_crossentropy', 
              metrics=['accuracy'])

For classification tasks categorical cross-entropy is an appropriate loss function.
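
To make the loss concrete, here is a minimal NumPy sketch of categorical cross-entropy for one-hot labels. It is an illustration under stated assumptions (the function, the clipping constant `eps`, and the example predictions are hypothetical), and Keras's own implementation may differ in its details.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """Mean cross-entropy between one-hot labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

y_true = np.array([[0., 0., 1.],
                   [0., 1., 0.]])
confident = np.array([[0.05, 0.05, 0.90],
                      [0.10, 0.80, 0.10]])
uniform = np.full((2, 3), 1 / 3)

loss_confident = categorical_crossentropy(y_true, confident)
loss_uniform = categorical_crossentropy(y_true, uniform)
print(loss_confident, loss_uniform)
```

A model that guesses uniformly among three classes has a loss of ln(3), about 1.0986, and the loss falls toward zero as the probability assigned to the correct class approaches one.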

In [None]:
train_data.shape # (50, 28, 28, 1)  50 training items, each has 28 by 28 pixel image with one channel.
model.fit(train_data, train_labels, validation_split=0.2, epochs=3)
model.evaluate(test_data, test_labels)

## Exercise

### Convolutional network for image classification
Convolutional networks for classification are constructed from a sequence of convolutional layers (for image processing) and fully connected (Dense) layers (for readout).  
In this exercise, you will construct a small convolutional network for classification of the data from the fashion dataset.  

#### Instructions
Add a Conv2D layer to construct the input layer of the network. Use a kernel size of 3 by 3.  
You can use the img_rows and img_cols objects available in your workspace to define the input_shape of this layer.  
Add a Flatten layer to translate between the image processing and classification part of your network.  
Add a Dense layer to classify the 3 different categories of clothing in the dataset.  

In [None]:
# Import the necessary components from Keras
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Flatten

# Initialize the model object
model = Sequential()

# Add a convolutional layer
model.add(Conv2D(10, kernel_size=3, activation='relu', 
               input_shape=(img_rows,img_cols,1)))

# Flatten the output of the convolutional layer
model.add(Flatten())
# Add an output layer for the 3 categories
model.add(Dense(3, activation='softmax'))

### Training a CNN to classify clothing types
Before training a neural network it needs to be compiled with the right cost function, using the right optimizer.  
During compilation, you can also define metrics that the network calculates and reports in every epoch.  
Model fitting requires a training data set, together with the training labels to the network.  
  
The Conv2D model you built in the previous exercise is available in your workspace.  

#### Instructions
Compile the network using the 'adam' optimizer and the 'categorical_crossentropy' cost function.  
In the metrics list, specify that the network should report 'accuracy'.  
Fit the network on train_data and train_labels. Train for 3 epochs with a batch size of 10 images.  
In training, set aside 20% of the data as a validation set, using the validation_split keyword argument.  

In [None]:
# Compile the model 
model.compile(optimizer='adam', 
              loss='categorical_crossentropy', 
              metrics=['accuracy'])

# Fit the model on a training set
model.fit(train_data, train_labels, 
          validation_split=0.2, 
          epochs=3, batch_size=10)

### Evaluating a CNN with test data
To evaluate a trained neural network, you should provide a separate testing data set of labeled images.  
The model you fit in the previous exercise is available in your workspace.  
#### Instructions
Evaluate the model on a separate test set: test_data and test_labels.  
Use the same batch size that was used for fitting (10 images per batch).  

In [None]:
# Evaluate the model on separate test data
model.evaluate(test_data, test_labels, batch_size=10)

```
[0.5284238457679749, 1.0]
```
The first number in the output is the value of the cross-entropy loss, the second is the value of the accuracy. For this model, it's 100%!

## Tweaking your convolutions

<img src="https://assets.datacamp.com/production/repositories/1820/datasets/6c00959d6fb39d4e14fb369cb605a6aa21562e75/no_padding_no_strides.gif">

The blue input image is larger than the green output image because the convolution kernel has the size of 3 by 3 pixels.  
In this case, it converts a 3 by 3 window into one pixel in the output image.  
One way to deal with this issue is to **zero-pad** the input image.

<img src="https://assets.datacamp.com/production/repositories/1820/datasets/d372dfbd801c83213bd137655da896e28eb7b5b3/convolution_animation.gif" width="300">

The dashed boxes around the central image are zeros that are added to the image.  
When the input image is zero padded, the output feature map has the same size as the input.  
This can be useful if you want to build a network that has many layers.  
Otherwise, you might lose a pixel off the edge of the image in each subsequent layer.  
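
The effect of zero padding on the output size can be checked directly in NumPy. This is a minimal sketch, assuming a hypothetical 5x5 image and a 3x3 kernel of ones; `np.pad` adds the ring of zeros.

```python
import numpy as np

image = np.arange(25, dtype=float).reshape(5, 5)  # hypothetical 5x5 image
kernel = np.ones((3, 3))

# Add one ring of zeros around the image (padding of 1 for a 3x3 kernel)
padded = np.pad(image, pad_width=1, mode='constant', constant_values=0)

out = np.zeros(image.shape)
for ii in range(image.shape[0]):
    for jj in range(image.shape[1]):
        out[ii, jj] = (padded[ii:ii + 3, jj:jj + 3] * kernel).sum()

print(padded.shape, out.shape)
```

The padded image is 7 by 7, so sliding a 3 by 3 kernel over it yields a 5 by 5 output, the same size as the original image.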

### Zero padding in Keras

In [None]:
model.add(Conv2D(10, kernel_size=3, activation='relu', 
                 input_shape=(img_rows, img_cols, 1),
                 padding='valid'))

To implement zero padding in Keras, we use the Conv2D object's padding keyword argument.  
If we provide the value **'valid'**, **no zero padding** is added (this is the default).  
On the other hand, if we provide the value **'same'**, **zero padding will be applied** to the input to this layer,  
so that the output of the convolution has the same size as the input into the convolution.

In [None]:
model.add(Conv2D(10, kernel_size=3, activation='relu', 
                 input_shape=(img_rows, img_cols, 1),
                 padding='same'))

Another factor that affects the size of the output of a convolution is the size of the step that we take with the kernel between input pixels.  
This is called the size of the **stride**.
<img src="https://assets.datacamp.com/production/repositories/1820/datasets/25bb19212056fe26e3f73e246a323768c1881fbf/padding_strides.gif" width="300">

In this animation the kernel is strided by two pixels in each step. This again means that the output is smaller than the input.  

### Strides in Keras
Strides are also implemented as a keyword argument to the Conv2D layers.  
The default is for the stride to be set to 1.  
If the stride is set to more than 1, the kernel jumps in steps of that number of pixels.  

In [None]:
model.add(Conv2D(10, kernel_size=3, activation='relu', 
                 input_shape=(img_rows, img_cols, 1),
                 strides=1))
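
To see how the stride changes the output, the sliding-window loop can take a step size. The sketch below is an illustration only: the helper `convolve2d_strided` and the 6x6 input are hypothetical, and it assumes a square kernel with no padding.

```python
import numpy as np

def convolve2d_strided(image, kernel, stride=1):
    """Valid (unpadded) sliding-window convolution with a configurable stride."""
    k = kernel.shape[0]  # assumes a square kernel
    out_rows = (image.shape[0] - k) // stride + 1
    out_cols = (image.shape[1] - k) // stride + 1
    out = np.zeros((out_rows, out_cols))
    for ii in range(out_rows):
        for jj in range(out_cols):
            r, c = ii * stride, jj * stride  # top-left corner of the window
            out[ii, jj] = (image[r:r + k, c:c + k] * kernel).sum()
    return out

image = np.arange(36, dtype=float).reshape(6, 6)  # hypothetical 6x6 image
kernel = np.ones((3, 3))

out_stride1 = convolve2d_strided(image, kernel, stride=1)
out_stride2 = convolve2d_strided(image, kernel, stride=2)
print(out_stride1.shape, out_stride2.shape)
```

With a stride of 1 the output is 4 by 4; with a stride of 2 the kernel visits only every other position and the output shrinks to 2 by 2.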

## Calculating the size of the output
$ O\ =\ ((I\ -\ K\ +\ 2P)\ /\ S)\ +\ 1$

where   
  
- $O$ = size of the output
- $I$ = size of the input
- $K$ = size of the kernel
- $P$ = size of the zero padding
- $S$ = strides
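
The formula translates directly into a small helper. This is a minimal sketch (the function name is hypothetical); it uses floor division for the quotient, which is an assumption for inputs where the stride does not divide evenly.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """O = ((I - K + 2P) / S) + 1, using floor division for the quotient."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

print(conv_output_size(28, 3))             # no padding, stride 1
print(conv_output_size(28, 3, padding=1))  # padding of 1 keeps the size
print(conv_output_size(28, 3, stride=2))   # stride 2 roughly halves the size
```

For a 28-pixel input and a 3-pixel kernel this gives 26 without padding, 28 with a padding of 1, and 13 with a stride of 2.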

## Dilated convolutions

<img src="https://assets.datacamp.com/production/repositories/1820/datasets/228817e1e9e70f9d2afdba84d99cc6cf48648990/dilation.gif" width="400">

Finally, you can also tweak the spacing between the pixels affected by the kernel.  
In this case, the convolution kernel has only 9 parameters, but it has the same field of view as a kernel that would have the size 5 by 5.  
This is **useful** in cases where you **need to aggregate information across multiple scales**.  
This too is controlled through a keyword argument **'dilation_rate'**, that sets the distance between subsequent pixels.  
#### Dilation in Keras

In [None]:
model.add(Conv2D(10, kernel_size=3, activation='relu', 
                 input_shape=(img_rows, img_cols, 1),
                 dilation_rate=2))
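
One way to picture dilation is to spread the kernel's 9 weights over a larger grid with zeros in between. The sketch below is an illustration under stated assumptions (the helper `dilate_kernel` is hypothetical and assumes a square kernel); sliding the spread-out kernel as an ordinary convolution touches the same pixels as the dilated convolution.

```python
import numpy as np

def dilate_kernel(kernel, rate):
    """Insert (rate - 1) zeros between neighbouring kernel entries."""
    k = kernel.shape[0]  # assumes a square kernel
    size = k + (k - 1) * (rate - 1)
    dilated = np.zeros((size, size), dtype=kernel.dtype)
    dilated[::rate, ::rate] = kernel
    return dilated

kernel = np.arange(1, 10).reshape(3, 3)
dilated = dilate_kernel(kernel, rate=2)
print(dilated)
```

With a dilation rate of 2, the 3 by 3 kernel covers a 5 by 5 field of view while still having only 9 nonzero weights.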

## Exercise

### Add padding to a CNN
Padding allows a convolutional layer to retain the resolution of the input into this layer.  
This is done by adding zeros around the edges of the input image, so that the convolution kernel can overlap with the pixels on the edge of the image.  
  
#### Instructions
Add a Conv2D layer and choose a padding such that the output has the same size as the input.  

In [None]:
# Initialize the model
model = Sequential()

# Add the convolutional layer
model.add(Conv2D(10, kernel_size=3, activation='relu', 
                 input_shape=(img_rows, img_cols, 1), 
                 padding='same'))

# Feed into output layer
model.add(Flatten())
model.add(Dense(3, activation='softmax'))

### Add strides to a convolutional network
The size of the strides of the convolution kernel determines whether the kernel will skip over some of the pixels as it slides along the image.  
This affects the size of the output because when strides are larger than one, the kernel will be centered on only some of the pixels.  
#### Instructions
Construct a neural network with a Conv2D layer with strided convolutions that skips every other pixel.

In [None]:
# Initialize the model
model = Sequential()

# Add the convolutional layer
model.add(Conv2D(10, kernel_size=3, activation='relu', 
              input_shape=(img_rows, img_cols, 1), 
              strides=2))

# Feed into output layer
model.add(Flatten())
model.add(Dense(3, activation='softmax'))

### Calculate the size of convolutional layer output
Zero padding and strides affect the size of the output of a convolution.  
What is the size of the output for an input of size 256 by 256, with a kernel of size 4 by 4, padding of 1 and strides of 2?

$ O\ =\ ((I\ -\ K\ +\ 2P)\ /\ S)\ +\ 1$

$O\ =\ ((256\ -\ 4\ +\ 2\ \cdot\ 1)\ /\ 2)\ +\ 1\ =\ 128$

## Going deeper

One of the major strengths of convolutional neural networks comes from building networks with **multiple layers** of convolutional filters.  
This is why using artificial neural networks is sometimes also called **"Deep Learning"**.

<img src="https://assets.datacamp.com/production/repositories/1820/datasets/4acb90dffc2d4226bf351ba01e164b60e3188541/conv2d_1.png" width="300">

This network has one convolutional layer, followed by a flattening and readout with a fully connected layer with 3 units.

#### Implementation of this network.

In [None]:
model = Sequential()
model.add(Conv2D(10, kernel_size=2, activation='relu', 
                 input_shape=(img_rows, img_cols, 1)))
model.add(Flatten())
model.add(Dense(3, activation='softmax'))

#### Deeper network
<img src="https://assets.datacamp.com/production/repositories/1820/datasets/cdc8c93bd1986368c7e9bad5a0258f8d1d513935/Conv2D_2.png" width="400">

In [None]:
model = Sequential()
model.add(Conv2D(10, kernel_size=2, activation='relu', 
                 input_shape=(img_rows, img_cols, 1), 
                 padding='same'))
# Second convolutional layer
model.add(Conv2D(10, kernel_size=2, activation='relu'))
model.add(Flatten())
model.add(Dense(3, activation='softmax'))

### Why do we want deep networks?
<img src="https://assets.datacamp.com/production/repositories/1820/datasets/89a36121b1a50776992a9447b499efa564da411d/googlenet.png" width="600">

This is again motivated by our own visual system, which has multiple layers of processing in it.   
For example, this is the architecture of a network developed by Google researchers in 2014.  
It has 22 layers of convolutions, and some other kinds of layers, like pooling layers.  
For the time being, one way to understand why we would want a network this deep is by looking at the kinds of things that the kernels and feature maps in the different layers tend to respond to.

### Features in early layers
<img src="https://assets.datacamp.com/production/repositories/1820/datasets/3502a5aeb3de0cd032447b616583171cf65c9a12/layer2_viz.png" width="200">

<img src="https://assets.datacamp.com/production/repositories/1820/datasets/264ef75cf9c131c20ff627ffc69b1941dbff24fd/layer2_2_viz.png" width="200">


These are the kinds of things that layers in the early part of the network tend to respond to.  
**Oriented lines**, or **simple textures**.

### Features in intermediate layers
<img src="https://assets.datacamp.com/production/repositories/1820/datasets/20dcdf073a4ee899d3aeaf00f1594e42c1547bb4/layer4a_viz.png" width="200">

<img src="https://assets.datacamp.com/production/repositories/1820/datasets/8457337f9d46a7e64f8b15398cc11eae0954e090/layer4b_viz.png" width="200">


Intermediate layers of the network tend to respond to **more complex features**, that include **simple objects**, such as eyes.  

### Features in late layers
<img src="https://assets.datacamp.com/production/repositories/1820/datasets/94c904a51c33d5a64cb806b0c56c905cd2bd4a03/layer4e_viz.png" width="200">

<img src="https://assets.datacamp.com/production/repositories/1820/datasets/05b02ebadec6bb485f1d08b1a3f3673ac373aec1/layer4d_viz.png" width="200">


By the time the information travels up to higher layers of the network, the feature maps tend to extract **specific types of objects**.  
This allows the fully connected layers at the top of the network to extract useful information for object classification based on the responses of these layers.  
In other words, **having multiple layers of convolutions in the network** allows the network to **gradually build up representations of objects in the images from simple features to more complex features** and up to **sensitivity to distinct categories of objects**.

### How deep?
- Depth comes at a computational cost
- May require more data

## Exercise
### Creating a deep learning network
A deep convolutional neural network is a network that has more than one layer.  
Each layer in a deep network receives its input from the preceding layer,  
with the very first layer receiving its input from the images used as training or test data.  
  
Here, you will create a network that has two convolutional layers.  
  
#### Instructions
The first convolutional layer is the input layer of the network. This should have 15 units with kernels of 2 by 2 pixels.  
It should have a 'relu' activation function. It can use the variables img_rows and img_cols to define its input_shape.  
The second convolutional layer receives its inputs from the first layer. It should have 5 units with kernels of 2 by 2 pixels.  
It should also have a 'relu' activation function.

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Flatten

model = Sequential()

# Add a convolutional layer (15 units)
model.add(Conv2D(15, kernel_size=2, activation='relu', input_shape=(img_rows, img_cols, 1)))


# Add another convolutional layer (5 units)
model.add(Conv2D(5, kernel_size=2, activation='relu'))

# Flatten and feed to output layer
model.add(Flatten())
model.add(Dense(3, activation='softmax'))

### Train a deep CNN to classify clothing images
Training a deep learning model is very similar to training a single layer network. Once the model is constructed (as you have done in the previous exercise), the model needs to be compiled with the right set of parameters.  
Then, the model is fit by providing it with training data, as well as training labels.  
After training is done, the model can be evaluated on test data.  
  
The model you built in the previous exercise is available in your workspace.  
   
#### Instructions
Compile the model to use the categorical cross-entropy loss function and the Adam optimizer.  
Train the network with train_data for 3 epochs with batches of 10 images each.  
Use randomly selected 20% of the training data as validation data during training.  
Evaluate the model with test_data, use a batch size of 10.  

In [None]:
# Compile model
model.compile(optimizer='adam', 
              loss='categorical_crossentropy', 
              metrics=['accuracy'])

# Fit the model to training data 
model.fit(train_data, train_labels, 
          validation_split=0.2, 
          epochs=3, batch_size=10)

# Evaluate the model on test data
model.evaluate(test_data, test_labels, batch_size=10)

```
Train on 40 samples, validate on 10 samples
    Epoch 1/3
    
    10/40 [======>.......................] - ETA: 0s - loss: 1.0957 - acc: 0.4000
    40/40 [==============================] - 0s 6ms/step - loss: 1.0861 - acc: 0.5750 - val_loss: 1.0628 - val_acc: 0.7000
    Epoch 2/3
    
    10/40 [======>.......................] - ETA: 0s - loss: 1.0664 - acc: 0.7000
    40/40 [==============================] - 0s 927us/step - loss: 1.0376 - acc: 0.8750 - val_loss: 0.9896 - val_acc: 1.0000
    Epoch 3/3
    
    10/40 [======>.......................] - ETA: 0s - loss: 1.0019 - acc: 1.0000
    40/40 [==============================] - 0s 935us/step - loss: 0.9547 - acc: 0.9750 - val_loss: 0.8810 - val_acc: 1.0000
    
    10/10 [==============================] - 0s 479us/step
```

Because the test data is not used during training, accuracy calculated on it is not inflated by overfitting.

## How many parameters?

When considering the architecture of networks, it is sometimes useful to think about the number of parameters in the network.

### Counting parameters

In [None]:
model = Sequential()

model.add(Dense(10, activation='relu', 
          input_shape=(784,)))
model.add(Dense(10, activation='relu'))

model.add(Dense(3, activation='softmax'))

The first layer has 10 units. Each one of them is connected to each one of the pixels in the image through a weight.  
The second layer has 10 units, and each one of these is connected to all the units in the first layer.  
Finally, each one of the units in the last layer is connected to each of the units in layer two.  
  
When you construct a Keras model, you can get a description of this model by calling the model's summary method. 

### Model summary

In [None]:
# Call the summary method 
model.summary()

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 10)                7850      
_________________________________________________________________
dense_2 (Dense)              (None, 10)                110       
_________________________________________________________________
dense_3 (Dense)              (None, 3)                 33        
=================================================================
Total params: 7,993
Trainable params: 7,993
Non-trainable params: 0
_________________________________________________________________
```

This tells us that the total number of parameters in the model is 7,993.

### Counting parameters

#### 1. First layer

In [None]:
model.add(Dense(10, 
                activation='relu', 
                input_shape=(784,)))

$parameters\ =\ 784\ \cdot\ 10\ +\ 10\ =\ 7850$  
**Every pixel** in the image, 784 pixels, **times** **the number of units in this layer**, 10 units **+** 10 parameters for **bias** terms in every one of these units.

#### 2. Second layer

In [None]:
model.add(Dense(10, 
                activation='relu'))

$parameters\ =\ 10\ \cdot\ 10\ +\ 10\ =\ 110$

#### 3. Last layer

In [None]:
model.add(Dense(3,
               activation='softmax'))

$parameters\ =\ 10\ \cdot\ 3\ +\ 3\ =\ 33$

Total number of parameters is 7850 + 110 + 33 = 7993
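
The same count can be reproduced in a few lines of Python. This is a minimal sketch (the helper `dense_params` is hypothetical), using one weight per input per unit plus one bias per unit.

```python
def dense_params(n_inputs, n_units):
    """One weight per input per unit, plus one bias per unit."""
    return n_inputs * n_units + n_units

layer_sizes = [784, 10, 10, 3]  # input pixels, then units per Dense layer
per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)
```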

## The number of parameters in a CNN

In [None]:
model = Sequential()
model.add(Conv2D(10, kernel_size=3, activation='relu', 
                 input_shape=(28, 28, 1), padding='same'))
model.add(Conv2D(10, kernel_size=3, activation='relu', 
                 padding='same'))
model.add(Flatten())
model.add(Dense(3, activation='softmax'))

In [None]:
model.summary()

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)           (None, 28, 28, 10)        100       
_________________________________________________________________
conv2d_2 (Conv2D)           (None, 28, 28, 10)        910       
_________________________________________________________________
flatten_3 (Flatten)          (None, 7840)              0         
_________________________________________________________________
dense_4 (Dense)              (None, 3)                 23523     
=================================================================
Total params: 24,533
Trainable params: 24,533
Non-trainable params: 0
_________________________________________________________________
```

#### 1. First layer

In [None]:
model.add(Conv2D(10, kernel_size=3, 
          activation='relu', 
          input_shape=(28, 28, 1), 
          padding='same'))

$parameters\ =\ 9\ \cdot\ 10\ +\ 10\ =\ 100$

10 kernels with 9 parameters each, + 10 bias terms.

#### 2. Second layer

In [None]:
model.add(Conv2D(10, kernel_size=3, 
                 activation='relu', 
                 padding='same'))

$parameters\ =\ 10\ \cdot\ 9\ \cdot\ 10\ +\ 10\ =\ 910$

Each unit is connected through a convolutional kernel to each feature map in the first layer.  
That's 10 times 9 times 10 parameters, and a bias term for each unit, which is a total of 910.

#### 3. Flatten layer

In [None]:
model.add(Flatten())

The **flatten layer has no parameters at all**.  
It just takes the output from the feature maps in layer 2 and flattens them into one big array.  

#### 4. Last layer

In [None]:
model.add(Dense(3, 
                activation='softmax'))

$parameters\ =\ 7840\ \cdot\ 3\ +\ 3\ =\ 23523$

Because zero padding is used here, the convolutions leave the number of pixels unchanged from layer to layer, so each feature map is 28 by 28 pixels. With 10 feature maps, that is 7840 values in total. Multiplied by the 3 units in the last layer, this gives 23520 weights, plus 3 bias terms, for a total of 23523.

Adding together, we get 24533.  
So **convolutional layers don't necessarily reduce the number of parameters**.

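In a network made only of Dense layers with increasing numbers of units, most of the parameters sit in the first layer.  
In a CNN, by contrast, the convolutional layers have few parameters and most of the parameters sit in the final Dense layer.  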
  
One way to think about this is that the convolutions have more expressive power, so they require fewer parameters,  
but reading out these more expressive representations then requires many more parameters on the output side.
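
The per-layer counts worked out above can be reproduced with two small helper functions. This is a minimal sketch: the names `conv2d_params` and `dense_params` are made up for illustration, and the layer sizes are the ones from the model summary.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return kernel_size * kernel_size * in_channels * filters + filters


def dense_params(n_inputs, units):
    """Weights plus one bias per unit for a Dense layer."""
    return n_inputs * units + units


# Layer sizes taken from the model summary above
conv1 = conv2d_params(kernel_size=3, in_channels=1, filters=10)
conv2 = conv2d_params(kernel_size=3, in_channels=10, filters=10)
flat = 28 * 28 * 10              # 'same' padding keeps 28 x 28 pixels per feature map
dense = dense_params(flat, units=3)
total = conv1 + conv2 + dense

print(conv1, conv2, flat, dense, total)
```

The Dense layer accounts for almost all of the total, which is the point made above: the readout side dominates the parameter count.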

## Pooling operations

One of the challenges in fitting neural networks is the large number of parameters.  
One way to mitigate this is to summarize the output of convolutional layers in a concise manner.  
To do this, we can use pooling operations.   
  
For example, we might summarize a group of pixels based on its maximal value. This is called "max pooling".

<img src="https://assets.datacamp.com/production/repositories/1820/datasets/44bb48885e3c2c92a2defc3ef9695e74134e9e7c/maxpooling6.png" width="500">

<img src="https://assets.datacamp.com/production/repositories/1820/datasets/02ff70485a5583941a0699352430ad9ebbaa7d66/maxpooling_result.png" width="500">

Replace each group of pixels with one large pixel that stores the group's maximal value.  
If we repeat this operation in multiple windows of size 2 by 2, we end up with an image that has a quarter of the number of the original pixels, and retains only the brightest feature in each part of the image.

### Implementing max pooling

In [None]:
result = np.zeros((im.shape[0]//2, im.shape[1]//2))

We start by allocating the output. This has half as many pixels on each dimension as the input.

In [None]:
result[0, 0] = np.max(im[0:2, 0:2])
result[0, 1] = np.max(im[0:2, 2:4])
result[0, 2] = np.max(im[0:2, 4:6])

# ...

result[1, 0] = np.max(im[2:4, 0:2])
result[1, 1] = np.max(im[2:4, 2:4])
# ...

We start from the first coordinate in the output, calculating the maximum of the image in the first two coordinates on each dimension of the input.  
Next, we slide the window by 2 pixels along the second dimension (the columns), calculating the maximum for this window.  
We keep going like that, until we are done with the first row in the input.  
  
We then move the window to the beginning of the second row in the input, calculating the maximum for coordinates in the third and fourth rows in the input for this location.  
We continue sliding the window along.  
Ultimately, in each location in the output, we calculate the maximum for a window of 2 by 2 pixels at the corresponding location in the input.  

#### Another way of implementing this operation is with a loop.

In [None]:
for ii in range(result.shape[0]):
    for jj in range(result.shape[1]):
        result[ii, jj] = np.max(im[ii*2:ii*2+2, jj*2:jj*2+2])

In each iteration, we first select the corresponding rows: from the current row in the output, index ii, times two, and until 2 pixels beyond that.  
And the same for the internal loop on the column index jj.  
This performs the same operation that we previously broke down row by row.
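
As a cross-check, the same 2 by 2 pooling can also be written without an explicit loop by reshaping the image into blocks. The sketch below is an alternative formulation, not the approach used in these notes; the function names and the small test image are hypothetical.

```python
import numpy as np


def max_pool_loop(im):
    """2x2 max pooling with the explicit double loop from the text."""
    result = np.zeros((im.shape[0] // 2, im.shape[1] // 2))
    for ii in range(result.shape[0]):
        for jj in range(result.shape[1]):
            result[ii, jj] = np.max(im[ii*2:ii*2+2, jj*2:jj*2+2])
    return result


def max_pool_reshape(im):
    """Same operation: split each axis into (blocks, 2) and take the max within each block."""
    rows, cols = im.shape[0] // 2, im.shape[1] // 2
    trimmed = im[:rows * 2, :cols * 2]   # drop a trailing odd row or column, as the loop does
    return trimmed.reshape(rows, 2, cols, 2).max(axis=(1, 3))


# Hypothetical 4x6 test image, not data from the course
im = np.arange(24).reshape(4, 6)
print(max_pool_reshape(im))
```

Both functions should agree on any 2D input; the reshape version avoids the Python-level loop, which tends to matter for larger images.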

### Max pooling in Keras
We can integrate max pooling operations into a Keras convolutional neural network, using the **MaxPool2D object**.

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Flatten, MaxPool2D
model = Sequential()
model.add(Conv2D(5, kernel_size=3, activation='relu', 
              input_shape=(img_rows, img_cols, 1)))
model.add(MaxPool2D(2))
model.add(Conv2D(15, kernel_size=3, activation='relu'))
model.add(MaxPool2D(2))
model.add(Flatten())
model.add(Dense(3, activation='softmax'))

After each convolutional layer, we'll add a pooling layer.  
The input to the MaxPool2D object, two in this case, is the size of the pooling window.  
That means that here pooling will take the max over a window of two by two pixels from the input for each location in the output.  

In [None]:
model.summary()

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 26, 26, 5)         50        
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 5)         0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 11, 11, 15)        690       
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 15)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 375)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 3)                 1128      
=================================================================
Total params: 1,868
Trainable params: 1,868
Non-trainable params: 0
_________________________________________________________________
```

We can see that using the pooling operation dramatically reduces the number of parameters in the model.  
Instead of the more than 30,000 parameters that this model had with no pooling operations, we now have fewer than 2,000 parameters.    
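
To see where the numbers in the summary come from, the shapes and parameter counts can be traced by hand. The sketch below assumes 28 by 28 single-channel inputs (which is what the 26 by 26 output of the first layer implies), convolutions without padding, and pooling that drops leftover pixels. The helper names are hypothetical.

```python
def conv_output(size, kernel_size):
    """Side length after a convolution with no padding and stride 1."""
    return size - kernel_size + 1


def pool_output(size, pool_size):
    """Side length after non-overlapping pooling; leftover pixels are dropped."""
    return size // pool_size


# Assumes img_rows = img_cols = 28 and one input channel, consistent with the summary above
size, channels, total = 28, 1, 0
for filters in (5, 15):
    total += 3 * 3 * channels * filters + filters   # kernel weights + biases
    size = conv_output(size, kernel_size=3)         # 28 -> 26, then 13 -> 11
    size = pool_output(size, pool_size=2)           # 26 -> 13, then 11 -> 5
    channels = filters

flat = size * size * channels                       # 5 * 5 * 15
total += flat * 3 + 3                               # Dense(3): weights + biases
print(size, flat, total)
```

Pooling itself adds no parameters. The saving comes from the much smaller flattened vector that feeds the Dense layer.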

## Exercise

### Write your own pooling operation
As we have seen before, CNNs can have a lot of parameters.  
Pooling layers are often added between the convolutional layers of a neural network to summarize their outputs in a condensed manner, and reduce the number of parameters in the next layer in the network.  
This can help us if we want to train the network more rapidly, or if we don't have enough data to learn a very large number of parameters.  
  
A pooling layer can be described as a particular kind of convolution.   
For every window in the input it finds the maximal pixel value and passes only this pixel through.   
In this exercise, you will write your own max pooling operation, based on the code that you previously used to write a two-dimensional convolution operation.  
  
#### Instructions
Index into the input array (im) and select the right window.  
Find the maximum in this window.  
Allocate this into the right entry in the output array (result).  

In [None]:
# Result placeholder
result = np.zeros((im.shape[0]//2, im.shape[1]//2))

# Pooling operation
for ii in range(result.shape[0]):
    for jj in range(result.shape[1]):
        result[ii, jj] = np.max(im[ii*2:ii*2+2, jj*2:jj*2+2])

The resulting image is **smaller**, but **retains the salient features** in every location.

### Keras pooling layers
Keras implements a pooling operation as a layer that can be added to CNNs between other layers.  
In this exercise, you will construct a convolutional neural network similar to the one you have constructed before:  
  
Convolution => Convolution => Flatten => Dense  
  
However, you will also add a pooling layer.  
The architecture will add a single max-pooling layer between the two convolutional layers, with a pooling window of 2x2:  
  
Convolution => Max pooling => Convolution => Flatten => Dense  
  
A Sequential model along with Dense, Conv2D, Flatten, and MaxPool2D objects are available in your workspace.  
  
#### Instructions
Add an input convolutional layer (15 units, kernel size of 2, relu activation).  
Add a maximum pooling operation (pooling over windows of size 2x2).  
Add another convolution layer (5 units, kernel size of 2, relu activation).  
Flatten the output of the second convolution and add a Dense layer for output (3 categories, softmax activation).  

In [None]:
# Add a convolutional layer
model.add(Conv2D(15, kernel_size=2, activation='relu', 
                 input_shape=(img_rows, img_cols, 1)))

# Add a pooling operation
model.add(MaxPool2D(2))

# Add another convolutional layer
model.add(Conv2D(5, kernel_size=2, activation='relu'))

# Flatten and feed to output layer
model.add(Flatten())
model.add(Dense(3, activation='softmax'))
model.summary()

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 27, 27, 15)        75        
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 15)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 12, 12, 5)         305       
_________________________________________________________________
flatten_1 (Flatten)          (None, 720)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 3)                 2163      
=================================================================
Total params: 2,543
Trainable params: 2,543
Non-trainable params: 0
_________________________________________________________________
```

This model is even deeper, but has fewer parameters.

### Train a deep CNN with pooling to classify images
Training a CNN with pooling layers is very similar to training the deep networks that you have seen before.  
Once the network is constructed (as you did in the previous exercise), the model needs to be appropriately compiled, and then training data needs to be provided, together with the other arguments that control the fitting procedure.  
  
The following model from the previous exercise is available in your workspace:  
  
Convolution => Max pooling => Convolution => Flatten => Dense  
  
#### Instructions
Compile this model to use the categorical cross-entropy loss function and the Adam optimizer.  
Train the model for 3 epochs with batches of size 10.  
Use 20% of the data as validation data.  
Evaluate the model on test_data with test_labels (also batches of size 10).  

In [None]:
# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Fit to training data
model.fit(train_data, train_labels, epochs=3, batch_size=10, validation_split=0.2)

# Evaluate on test data 
model.evaluate(test_data, test_labels, batch_size=10)

## Tracking learning

During learning, the weights used by the network change, and as they change, the network becomes more attuned to the features of the images that discriminate between classes.  
This means that the loss function we use for training becomes smaller and smaller.  
Looking at the change in the loss with learning can be helpful to see whether learning is progressing as expected, and whether the network has learned enough.  
As long as learning is progressing well, we might expect the loss function to keep going down.


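If we plot the loss function per epoch, the loss on the training set keeps decreasing, but on the validation set there is a point where it starts to rise again.  
Because a neural network has a great many parameters, the longer it trains, the more precisely it can fit the training data.  
But then it fits only the training data and does not generalize! **Overfitting**  
So stop training before the validation loss starts to increase!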
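
Given a recorded loss history, the epoch at which the validation loss bottoms out can be located directly. The numbers below are invented purely for illustration and are not results from this course; the dictionary only mimics the layout of a Keras `history` attribute.

```python
# Hypothetical loss values for illustration only; not results from the course
history = {
    'loss':     [0.90, 0.55, 0.40, 0.31, 0.25, 0.20],
    'val_loss': [0.95, 0.62, 0.51, 0.49, 0.53, 0.60],
}

val_loss = history['val_loss']
best_epoch = min(range(len(val_loss)), key=lambda i: val_loss[i])   # 0-based index

print(f"lowest validation loss {val_loss[best_epoch]} at epoch index {best_epoch}")
```

In this made-up history the training loss keeps falling while the validation loss turns upward after the fourth epoch, which is the overfitting pattern described above.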

### Plotting training curves

In [None]:
training = model.fit(train_data, train_labels, epochs=3, validation_split=0.2)

import matplotlib.pyplot as plt

plt.plot(training.history['loss'])
plt.plot(training.history['val_loss'])
plt.show()

### Storing the optimal parameters

How to find the optimal parameters before overfitting sets in.

In [None]:
from keras.callbacks import ModelCheckpoint

# This checkpoint object will store the model parameters in the file "weights.hdf5"
checkpoint = ModelCheckpoint('weights.hdf5', monitor='val_loss', save_best_only=True)

# Store in a list to be used during training
callbacks_list = [checkpoint]

# Fit the model on a training set, using the checkpoint as a callback
model.fit(train_data, train_labels, validation_split=0.2, epochs=3, callbacks=callbacks_list)

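The checkpoint runs at the end of every epoch.  
It saves the weights to the file named when the checkpoint is initialized, here weights.hdf5.  
It monitors val_loss and overwrites the saved weights whenever the validation loss improves (decreases).  
In other words, the weights from the epoch with the lowest val_loss are the ones that are kept.  
The checkpoint object is placed in a list and passed to model.fit through the callbacks parameter.  
When fitting finishes, the best parameters are stored in the file.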
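
The "overwrite only when the validation loss improves" rule can be illustrated with a toy tracker. This is a simplified stand-in written for these notes, not Keras code: the class name, its attributes, and the loss and weight values are all hypothetical.

```python
class BestWeightsTracker:
    """Toy stand-in for the 'save only when val_loss improves' rule; not Keras code."""

    def __init__(self):
        self.best_loss = float('inf')
        self.best_weights = None
        self.saved_at = []

    def on_epoch_end(self, epoch, val_loss, weights):
        # Overwrite the stored weights only if this epoch beat the best so far
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_weights = list(weights)
            self.saved_at.append(epoch)


# Hypothetical per-epoch validation losses and weight snapshots
tracker = BestWeightsTracker()
for epoch, (val_loss, weights) in enumerate([(0.9, [0.1]), (0.5, [0.2]), (0.7, [0.3])]):
    tracker.on_epoch_end(epoch, val_loss, weights)

print(tracker.saved_at, tracker.best_weights)
```

The third epoch has a worse validation loss than the second, so nothing is overwritten and the stored weights remain those from the second epoch.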

### Loading stored parameters

In [None]:
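model.load_weights('weights.hdf5')
# predict_classes() exists only in older Keras releases; the argmax of predict() gives the same class indices
model.predict(test_data).argmax(axis=-1)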

First re-create the model with exactly the same architecture, then use the load_weights method to load the optimal parameters.

## Exercise

### Plot the learning curves
During learning, the model will store the loss function evaluated in each epoch.  
Looking at the learning curves can tell us quite a bit about the learning process.  
In this exercise, you will plot the learning and validation loss curves for a model that you will train.  
  
#### Instructions
Fit the model to the training data (train_data).  
Use a validation split of 20%, 3 epochs and batch size of 10.  
Plot the training loss.  
Plot the validation loss.  

In [None]:
import matplotlib.pyplot as plt

# Train the model and store the training object
training = model.fit(train_data, train_labels, validation_split=0.2, epochs=3, batch_size=10)

# Extract the history from the training object
history = training.history

# Plot the training loss 
plt.plot(history['loss'])
# Plot the validation loss
plt.plot(history['val_loss'])

# Show the figure
plt.show()

### Using stored weights to predict in a test set
Model weights stored in an hdf5 file can be reused to populate an untrained model.  
Once the weights are loaded into this model, it behaves just like a model that has been trained to reach these weights.  
For example, you can use this model to make predictions from an unseen data set (e.g. test_data).  
  
#### Instructions
Load the weights from a file called 'weights.hdf5'.  
Predict the classes of the first three images from test_data.  

In [None]:
# Load the weights from file
model.load_weights('weights.hdf5')

# Predict from the first three images in the test data
model.predict(test_data[:3])

## Regularization

How do we prevent over-fitting and make the best out of our training data?  
One of the strategies that has proven effective is regularization.  
  
Here we'll discuss two strategies for regularization of convolutional neural networks.  
### 1. The first strategy is called **"dropout"**. In each learning step:
- Select a subset of the units
    - We choose a random subset of the units in a layer
- Ignore it in the forward pass and in the back-propagation of error
    - Ignored both on the forward pass through the network, as well as in the back-propagation stage.
- First introduced in 2014.
- Dropout allows us to train many different networks on different parts of the data.
- If part of the network becomes too sensitive to some noise in the data, other parts will compensate for this, because they haven't seen the samples with this noise.
- It also helps prevent different units in the network from becoming overly correlated in their activity.
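
The forward-pass half of the steps above can be sketched in NumPy. This is an illustrative sketch, not the Keras implementation: the function name is hypothetical, and the rescaling of the kept units by 1 / (1 - rate) follows the behavior documented for the Keras `Dropout` layer during training.

```python
import numpy as np


def dropout_forward(activations, rate, rng):
    """Zero a random subset of units and rescale the rest by 1 / (1 - rate)."""
    keep_mask = rng.random(activations.shape) >= rate   # True for units that are kept
    return activations * keep_mask / (1.0 - rate), keep_mask


rng = np.random.default_rng(0)            # fixed seed so the sketch is repeatable
activations = np.ones((4, 5))             # hypothetical layer output
dropped, keep_mask = dropout_forward(activations, rate=0.25, rng=rng)
print(dropped)
```

A new mask is drawn at every learning step, so a different subset of units is ignored each time. In the backward pass the same mask would be applied to the gradients, so the ignored units receive no update for that step.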

### Dropout in Keras
In Keras, dropout is implemented as a layer.

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Conv2D , Flatten, Dropout

model = Sequential()
model.add(Conv2D(5, kernel_size=3, activation='relu', input_shape=(img_rows, img_cols, 1)))

model.add(Dropout(0.25))   # Proportion of the units in the layer to ignore in each learning step.

model.add(Conv2D(15, kernel_size=3, activation='relu'))
model.add(Flatten())
model.add(Dense(3, activation='softmax'))

### 2. Batch Normalization
- Rescale the outputs
    - Takes the output of a particular layer
    - and rescales it so that it always has 0 mean and standard deviation of 1 in every batch of training.
- Proposed in a 2015 paper by Sergey Ioffe and Christian Szegedy.
- This algorithm solves the problem where different batches of input might produce wildly different distributions of outputs in any given layer in the network.
- Because the adjustments to the weights through back-propagation depend on the activation of the units in every step of learning, widely varying output distributions can make the network progress very slowly through training.
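
The rescaling step can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions: it normalizes over the batch axis only, and it leaves out the learnable scale and shift parameters and the moving averages that a full batch normalization layer maintains. The function name and the random batch are hypothetical.

```python
import numpy as np


def batch_normalize(batch, epsilon=1e-3):
    """Rescale each feature to zero mean and unit standard deviation across the batch axis."""
    mean = batch.mean(axis=0)
    variance = batch.var(axis=0)
    return (batch - mean) / np.sqrt(variance + epsilon)


rng = np.random.default_rng(0)
# Hypothetical layer outputs: 64 samples, 10 features, far from zero mean and unit variance
batch = rng.normal(loc=5.0, scale=3.0, size=(64, 10))
normalized = batch_normalize(batch)
print(normalized.mean(axis=0).round(3), normalized.std(axis=0).round(3))
```

The small `epsilon` only guards against division by zero when a feature has almost no variance within a batch.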

### Batch Normalization in Keras
Batch normalization is also implemented as another type of layer that can be added after each one of the layers whose output should be normalized.

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Flatten, BatchNormalization

model = Sequential()
model.add(Conv2D(5, kernel_size=3, activation='relu', input_shape=(img_rows, img_cols, 1)))
model.add(BatchNormalization())
model.add(Conv2D(15, kernel_size=3, activation='relu'))
model.add(Flatten())
model.add(Dense(3, activation='softmax'))

### Be careful when using them together!
- The disharmony between dropout and batch normalization.
    - Dropout slows down learning, making it more incremental and careful.
    - Batch normalization tends to make learning go faster.
    - Together, their effects may in fact counteract each other.
    - Networks sometimes perform worse when both of these methods are used together than they would if neither were used.

## Exercise

### Adding dropout to your network
Dropout is a form of regularization that removes a different random subset of the units in a layer in each round of training.  
In this exercise, we will add dropout to the convolutional neural network that we have used in previous exercises:  
  
Convolution (15 units, kernel size 2, 'relu' activation)  
Dropout (20%)  
Convolution (5 units, kernel size 2, 'relu' activation)  
Flatten  
Dense (3 units, 'softmax' activation)  
A Sequential model along with Dense, Conv2D, Flatten, and Dropout objects are available in your workspace.  
  
#### Instructions
Add dropout applied to the first layer with 20%.  
Add a flattening layer.  

In [None]:
# Add a convolutional layer
model.add(Conv2D(15, kernel_size=2, activation='relu', 
                 input_shape=(img_rows, img_cols, 1)))

# Add a dropout layer
model.add(Dropout(0.2))

# Add another convolutional layer
model.add(Conv2D(5, kernel_size=2, activation='relu'))

# Flatten and feed to output layer
model.add(Flatten())
model.add(Dense(3, activation='softmax'))

### Add batch normalization to your network
Batch normalization is another form of regularization that rescales the outputs of a layer to make sure that they have mean 0 and standard deviation 1.  
In this exercise, we will add batch normalization to the convolutional neural network that we have used in previous exercises:  

```
Convolution (15 units, kernel size 2, 'relu' activation)
Batch normalization
Convolution (5 units, kernel size 2, 'relu' activation)
Flatten
Dense (3 units, 'softmax' activation)
```
A Sequential model along with Dense, Conv2D, Flatten, and BatchNormalization objects are available in your workspace.  
  
#### Instructions
Add the first convolutional layer.  
You can use the img_rows and img_cols objects available in your workspace to define the input_shape of this layer.  
Add batch normalization applied to the outputs of the first layer.  

In [None]:
# Add a convolutional layer
model.add(Conv2D(15, kernel_size=2, activation='relu', input_shape=(img_rows, img_cols, 1)))

# Add batch normalization layer
model.add(BatchNormalization())

# Add another convolutional layer
model.add(Conv2D(5, kernel_size=2, activation='relu'))

# Flatten and feed to output layer
model.add(Flatten())
model.add(Dense(3, activation='softmax'))

## Interpreting the model
One of the main criticisms of convolutional neural networks is that they are **"black boxes"** and that even when they work very well, it is hard to understand why they work so well.  
Many efforts are being made to improve the interpretability of neural networks, and this field is likely to evolve rapidly in the next few years.   
One of the major thrusts of this evolution is that people are interested in visualizing what different parts of the network are doing.

### Selecting layers
Once a model is constructed and compiled, it will store its layers in an attribute called "layers".  
This is a list of layer objects.

In [None]:
model.layers

```
[<keras.layers.convolutional.Conv2D at 0x109f10c18>,
 <keras.layers.convolutional.Conv2D at 0x109ec5ba8>,
 <keras.layers.core.Flatten at 0x1221ffcc0>,
 <keras.layers.core.Dense at 0x1221ffef0>]
 ```

### Getting model weights
If we want to look at the first convolutional layer, we can pull it out by indexing the first item in this list.  
The weights for this layer are accessible through the get_weights() method.

In [None]:
conv1 = model.layers[0]
weights1 = conv1.get_weights()
len(weights1)  # 2

This method returns a list with two items.  
1. The first item in this list is an array that holds the values of the weights for the convolutional kernels for this layer.

In [None]:
kernels1 = weights1[0]
kernels1.shape # (3, 3, 1, 5)

The kernels array has the shape 3 by 3 by 1 by 5.  
The **first 2 dimensions** denote the **kernel size**.  
This network was initialized with a kernel size of 3.  
The **third dimension** denotes **the number of channels in the kernels**.  
This is one because the network was looking at black and white data.  
The **last dimension** denotes **the number of kernels in this layer**: 5

In [None]:
kernel1_1 = kernels1[:, :, 0, 0]
kernel1_1.shape # (3, 3)

To pull out the first kernel in this layer, we would use the index 0 into the last dimension.  
Because there is only one channel, we can also index on the channel dimension, to collapse that dimension.  
This would return the 3 by 3 array containing this convolutional kernel.
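
The same indexing extends to every kernel in the layer. The sketch below uses a random array as a stand-in for `weights1[0]`, since no trained model is available here; only the shape (3, 3, 1, 5) is taken from the text.

```python
import numpy as np

# Random stand-in for weights1[0]; a trained model would supply the real values
kernels1 = np.random.default_rng(0).normal(size=(3, 3, 1, 5))

n_rows, n_cols, n_channels, n_kernels = kernels1.shape

# One 3x3 array per kernel: fix channel 0, then step through the last axis
all_kernels = [kernels1[:, :, 0, k] for k in range(n_kernels)]
print(len(all_kernels), all_kernels[0].shape)
```

The shorthand `kernels1[..., 0, 0]` selects the same array as `kernels1[:, :, 0, 0]`, because the ellipsis stands for all leading axes.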

### Visualizing the kernel
We can then visualize this kernel directly, but understanding what kinds of features this kernel is responding to may be hard just from direct observation.

In [None]:
plt.imshow(kernel1_1)

<img src="https://assets.datacamp.com/production/repositories/1820/datasets/03f3b5f99d57f9e1ad9d7b10504e987925b265ae/kernel1_1.png" width="300">

### Visualizing the kernel responses
To understand what this kernel does, it might sometimes be even more useful to convolve one of the images from our test set with this kernel and see what aspects of the image are emphasized by this kernel.  

In [None]:
test_image = test_data[3, :, :, 0]   # The fourth image from the test_set
plt.imshow(test_image)

<img src="https://assets.datacamp.com/production/repositories/1820/datasets/eaab5e32e6e2007474d61f31180097f9958aad09/test_img3.png" width="300">

In [None]:
filtered_image = convolution(test_image, kernel1_1)
plt.imshow(filtered_image)

We convolve it with the kernel using the function that we created previously, and create a filtered image that is the result of this convolution.  
<img src="https://assets.datacamp.com/production/repositories/1820/datasets/13f309b9d8acd32a3799654104ca600f2b90d832/conv1_1.png" width="300">

This filter seems to like the external edges of this image on the left.

Repeating this check with other kernels and other images helps build intuition, and it can be used to interpret the network.
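
For readers who do not have the earlier `convolution()` helper at hand, the idea of a kernel response can be sketched with a minimal sliding-window function. This is a hypothetical stand-in, not the function defined earlier in the course: it uses no padding and a stride of 1, so its output is slightly smaller than the input, and the name `convolve_valid`, the image, and the kernel are all made up for illustration.

```python
import numpy as np


def convolve_valid(image, kernel):
    """Slide the kernel over the image and sum the elementwise products at each position.

    Hypothetical stand-in for the course's helper: no padding, stride 1.
    """
    k_rows, k_cols = kernel.shape
    out_rows = image.shape[0] - k_rows + 1
    out_cols = image.shape[1] - k_cols + 1
    result = np.zeros((out_rows, out_cols))
    for ii in range(out_rows):
        for jj in range(out_cols):
            window = image[ii:ii + k_rows, jj:jj + k_cols]
            result[ii, jj] = (window * kernel).sum()
    return result


# Hypothetical image with a vertical edge, and a kernel that responds to such edges
image = np.zeros((5, 6))
image[:, 3:] = 1.0
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])
filtered = convolve_valid(image, kernel)
print(filtered)
```

The response is nonzero only in the column where the image changes from dark to bright, which is the sense in which a kernel "likes" a particular kind of edge.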

## Exercise
### Extracting a kernel from a trained network
One way to interpret models is to examine the properties of the kernels in the convolutional layers.  
In this exercise, you will extract one of the kernels from a convolutional neural network with weights that you saved in an hdf5 file.
  
#### Instructions
Load the weights into the model from the file weights.hdf5.  
Get the first convolutional layer in the model from the layers attribute.  
Use the .get_weights() method to extract the weights from this layer.  

In [None]:
# Load the weights into the model
model.load_weights('weights.hdf5')

# Get the first convolutional layer from the model
c1 = model.layers[0]

# Get the weights of the first convolutional layer
weights1 = c1.get_weights()

# Pull out the first channel of the first kernel in the first layer
kernel = weights1[0][...,0, 0]
print(kernel)

```
[[ 0.03504268  0.4328133 ]
 [-0.17416623  0.4680562 ]]
```

### Shape of the weights
A Keras neural network stores its layers in a list called model.layers.  
For the convolutional layers, you can get the weights using the .get_weights() method.  
This returns a list, and the first item in this list is an array representing the weights of the convolutional kernels.  
If the shape of this array is (2, 2, 1, 5), what does the first number `2` represent?  
  
#### Instructions
Possible Answers  
1. There are 2 channels in black and white images.  
2. The kernel size is 2 by 2.
3. The model used a convolutional unit with 2 dimensions.
4. There are 2 convolutional layers in the network.
  
The answer is 2: each of the 2s in this shape is one of the dimensions of the kernel.

### Visualizing kernel responses
One of the ways to interpret the weights of a neural network is to see how the kernels stored in these weights "see" the world.  
That is, what properties of an image are emphasized by this kernel.  
In this exercise, we will do that by convolving an image with the kernel and visualizing the result.  
Given images in the test_data variable, a function called extract_kernel() that extracts a kernel from the provided network, and the function called convolution() that we defined in the first chapter, extract the kernel, load the data from a file and visualize it with matplotlib.  
  
A deep CNN model, a function convolution(), and the kernel you extracted in an earlier exercise are available in your workspace. 
  
#### Instructions
Use the convolution() function to convolve the extracted kernel with the first channel of the fourth item in the image array.  
Visualize the resulting convolution with imshow().

In [None]:
import matplotlib.pyplot as plt

# Convolve with the fourth image in test_data
out = convolution(test_data[3, :, :, 0], kernel)

# Visualize the result
plt.imshow(out)
plt.show()



### What did we learn?
- Image classification
- Convolutions
- Reducing the number of parameters
    - Tweaking your convolutions
    - Adding pooling layers
- Improving your network
    - Regularization
- Understanding your network
    - Monitoring learning
    - Interpreting the parameters

#### Model interpretation
https://distill.pub/2017/feature-visualization/

## What next?
- Even deeper networks
### Residual networks
<img src="https://assets.datacamp.com/production/repositories/1820/datasets/8bd8fc985a4e9ed8c3081c05a874210e826220c4/resnet.png" width="400">
These include connections that skip over several layers, and they are called residual networks because the network will use this skipped connection to compute a difference between the input of a stack of layers and their output.  
Learning this difference, or residual, turns out to often be easier than learning the output.  
This means that these networks have been surprisingly effective at tasks such as classification.
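
The skip connection can be sketched in a few lines of NumPy. This is a schematic illustration only, not a trainable network: `small_correction` is a hypothetical stand-in for a stack of layers, and the input values are made up.

```python
import numpy as np


def residual_block(x, layers):
    """Add the block's input back onto the output of its stacked layers (skip connection)."""
    return x + layers(x)


def small_correction(x):
    """Hypothetical stand-in for a stack of layers that has learned a small residual."""
    return 0.1 * np.tanh(x)


x = np.array([0.5, -1.0, 2.0])
y = residual_block(x, small_correction)
print(y - x)   # the residual that the stacked layers had to learn
```

Because the input is added back at the end, the stacked layers only need to model the difference between output and input. If that difference is zero, the block simply passes its input through.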

### Transfer learning
In this approach an already-trained network is adapted to a new task.  
It's a great strategy for cases where you don't have a lot of data.

### Fully Convolutional Networks
Take an image as input and produce another image as output.  
<img src="https://assets.datacamp.com/production/repositories/1820/datasets/9191cd9da78093f76937101eea8aed1cda994c2c/fully-convolutional.png" width="500">
For example, these networks can be used to find the part of an image that contains a particular kind of object, doing segmentation rather than classification.

### Generative Adversarial Networks (GAN)
<img src="https://assets.datacamp.com/production/repositories/1820/datasets/0625067ec02a09a3557dc5481feb5a96ec18f59f/gan.png" width="500">
These complex architectures can be used to train a network to create new images that didn't exist before.