# Chapter 8 Introduction to Deep Learning for Computer Vision

This chapter introduces convolutional neural networks, also known as convnets, the type of deep learning model that is now used almost universally in computer vision applications. You’ll learn to apply convnets to image-classification problems, in particular those involving small training datasets, which are the most common use case if you aren’t a large tech company.

## 8.1 Introduction to convnets

First, let’s take a practical look at a simple convnet example that classifies MNIST digits, a task we performed in chapter 2 using a densely connected network (our test accuracy then was 97.8%). Even though the convnet will be basic, its accuracy will blow our densely connected model from chapter 2 out of the water.

The following listing shows what a basic convnet looks like. It’s a stack of Conv2D and MaxPooling2D layers. You’ll see in a minute exactly what they do. We’ll build the model using the Functional API, which we introduced in the previous chapter.

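Listing 8.1 Instantiating a small convnet

```python
from tensorflow import keras
from tensorflow.keras import layers

# MNIST images: 28 x 28 pixels with a single (grayscale) channel
inputs = keras.Input(shape=(28, 28, 1))
# Three Conv2D layers with growing filter counts; the first two are followed by 2 x 2 max pooling
x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(inputs)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=128, kernel_size=3, activation="relu")(x)
# Flatten the 3D feature maps into a vector for the Dense classifier
x = layers.Flatten()(x)
# 10-way softmax: one probability per digit class
outputs = layers.Dense(10, activation="softmax")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
```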
Importantly, a convnet takes as input tensors of shape __(image_height, image_width,
image_channels)__, not including the batch dimension. 

In this case, we’ll configure the
convnet to process inputs of size __(28, 28, 1)__, which is the format of MNIST images.
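To make the shape convention concrete, here is a small NumPy sketch. It uses synthetic zero-valued arrays in place of real MNIST data (an assumption made only so the snippet is self-contained). Raw MNIST images come without a channel axis, so one has to be added before they match the `(28, 28, 1)` input shape; the batch dimension is not part of the shape given to `keras.Input`, which is why it shows up as `None` in the model summary.

```python
import numpy as np

# Stand-in for a batch of MNIST digits: synthetic uint8 data laid out like
# the arrays mnist.load_data() returns, (num_samples, 28, 28), no channel axis.
raw = np.zeros((4, 28, 28), dtype=np.uint8)

# Add a trailing channel axis: grayscale images have exactly one channel.
images = raw.reshape((raw.shape[0], 28, 28, 1))
# np.expand_dims(raw, axis=-1) would produce the same array.

print(images.shape)      # (4, 28, 28, 1): (batch, height, width, channels)
print(images.shape[1:])  # (28, 28, 1): the per-sample shape keras.Input expects
```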

Listing 8.2 Displaying the model’s summary

```python
model.summary()
```

```
Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 28, 28, 1)]       0         
                                                                 
 conv2d_3 (Conv2D)           (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 13, 13, 32)       0         
 2D)                                                             
                                                                 
 conv2d_4 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_3 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
 conv2d_5 (Conv2D)           (None, 3, 3, 128)         73856
```

1. You can see that the output of every Conv2D and MaxPooling2D layer is a rank-3 tensor
of shape (height, width, channels). The width and height dimensions tend to
shrink as you go deeper in the model. The number of channels is controlled by the
first argument passed to the Conv2D layers (32, 64, or 128).


2. After the last Conv2D layer, we end up with an output of shape (3, 3, 128)—a 3 × 3
feature map of 128 channels. The next step is to feed this output into a densely connected classifier like those you’re already familiar with: a stack of Dense layers. These
classifiers process vectors, which are 1D, whereas the current output is a rank-3 tensor.


3. To bridge the gap, we flatten the 3D outputs to 1D with a Flatten layer before adding
the Dense layers.
 
 
4. Finally, we do 10-way classification, so our last layer has 10 outputs and a softmax
activation.
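The shapes and parameter counts in the summary can be checked by hand. The sketch below recomputes them in plain Python. The helper names (`conv2d_output`, `conv2d_params`, `maxpool2d_output`, `trace_model`) are hypothetical, not Keras APIs, and they assume the settings used in the listing: 3 × 3 kernels with stride 1 and "valid" (no) padding for Conv2D, and 2 × 2 windows with stride 2 for MaxPooling2D.

```python
# Hypothetical helpers (not part of Keras) that reproduce the output shapes
# and parameter counts of the small MNIST convnet.

def conv2d_output(h, w, kernel_size=3):
    # "valid" padding: the kernel must fit entirely inside the input,
    # so each spatial dimension shrinks by kernel_size - 1.
    return h - kernel_size + 1, w - kernel_size + 1

def conv2d_params(in_channels, filters, kernel_size=3):
    # One kernel_size x kernel_size x in_channels weight tensor per filter,
    # plus one bias per filter.
    return kernel_size * kernel_size * in_channels * filters + filters

def maxpool2d_output(h, w, pool_size=2):
    # Stride equals pool_size and leftover rows/columns are dropped,
    # so odd sizes are floored (11 -> 5).
    return h // pool_size, w // pool_size

def trace_model():
    h, w, c = 28, 28, 1
    rows = []
    # (filters, followed_by_pooling) for each Conv2D layer in the listing
    for filters, pool in [(32, True), (64, True), (128, False)]:
        params = conv2d_params(c, filters)
        h, w = conv2d_output(h, w)
        c = filters
        rows.append(("Conv2D", (h, w, c), params))
        if pool:
            h, w = maxpool2d_output(h, w)
            rows.append(("MaxPooling2D", (h, w, c), 0))
    flat = h * w * c  # Flatten: 3 * 3 * 128 values
    rows.append(("Flatten", (flat,), 0))
    rows.append(("Dense", (10,), flat * 10 + 10))  # weights + biases
    return rows

for name, shape, params in trace_model():
    print(f"{name:<14}{str(shape):<16}{params}")
```

The Conv2D and MaxPooling2D rows match `model.summary()`, and the trace continues to the Flatten layer (1,152 values) and the final Dense layer (11,530 parameters).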

Now, let’s train the convnet on the MNIST digits. We’ll reuse a lot of the code from the MNIST example in chapter 2. Because we’re doing 10-way classification with a softmax output, we’ll use the categorical crossentropy loss, and because our labels are integers, we’ll use the sparse version, sparse_categorical_crossentropy.
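The "sparse" variant computes the same quantity as categorical crossentropy; it only differs in taking integer labels instead of one-hot vectors. The NumPy sketch below illustrates that equivalence with made-up softmax outputs. It shows the math only, not Keras’s implementation, which adds numerical-stability details such as clipping that are omitted here; the function names are hypothetical.

```python
import numpy as np

def categorical_crossentropy(one_hot_targets, probs):
    # Batch mean of -sum_k y_k * log(p_k), with y one-hot encoded.
    return float(np.mean(-np.sum(one_hot_targets * np.log(probs), axis=-1)))

def sparse_categorical_crossentropy(int_targets, probs):
    # Same quantity: select log(p) at each integer label directly
    # instead of multiplying by a one-hot vector.
    picked = probs[np.arange(len(int_targets)), int_targets]
    return float(np.mean(-np.log(picked)))

# Softmax outputs for a batch of 3 samples over 10 classes (random, illustrative).
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 10))
probs = np.exp(logits) / np.sum(np.exp(logits), axis=-1, keepdims=True)

labels = np.array([7, 2, 1])   # integer labels, like MNIST train_labels
one_hot = np.eye(10)[labels]   # equivalent one-hot encoding

print(sparse_categorical_crossentropy(labels, probs))
print(categorical_crossentropy(one_hot, probs))  # same value
```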

Listing 8.3 Training the convnet on MNIST images

```python
from tensorflow.keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
# Add the channel axis the model expects: (samples, 28, 28) -> (samples, 28, 28, 1)
train_images = train_images.reshape((60000, 28, 28, 1))
# Rescale pixel values from integers in [0, 255] to floats in [0, 1]
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype("float32") / 255
```

```python
model.compile(
    optimizer=keras.optimizers.RMSprop(),
    # Integer labels + softmax output -> sparse categorical crossentropy
    loss=keras.losses.SparseCategoricalCrossentropy(),
    metrics=["accuracy"],
)
model.fit(train_images, train_labels, epochs=5, batch_size=64)
```

```
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
```

Let’s evaluate the model on the test data.

```python
test_loss, test_acc = model.evaluate(test_images, test_labels)
print("Test Accuracy: {}".format(test_acc))
```

```
Test Accuracy: 0.991599977016449
```
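One way to quantify the improvement over chapter 2 is the relative drop in error rate. The short calculation below uses only the two accuracies stated in this chapter (97.8% for the densely connected model and the test accuracy printed above); your own run will give a slightly different number because weights are initialized randomly.

```python
# Accuracies taken from the text: 97.8% for the chapter 2 dense model,
# and the convnet test accuracy printed above.
dense_acc = 0.978
convnet_acc = 0.991599977016449

dense_err = 1 - dense_acc      # about 2.2% of test digits misclassified
convnet_err = 1 - convnet_acc  # about 0.84% misclassified

# Fraction of the dense model's errors that the convnet eliminates
relative_reduction = (dense_err - convnet_err) / dense_err
print(f"Relative error reduction: {relative_reduction:.0%}")
```

In other words, the basic convnet makes roughly 60% fewer mistakes on the test set than the densely connected model.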


### 8.1.1 The convolution operation