# Deep Learning for Computer Vision

The problem we’re trying to solve here is to classify grayscale images of handwritten digits (28 × 28 pixels) into 10 categories (0 through 9). We’ll use the MNIST dataset, a set of 60,000 training images plus 10,000 test images, assembled by the National Institute of Standards and Technology (the NIST in MNIST) in the 1980s.

First we load the MNIST dataset.

In [1]:
from keras.datasets import mnist

(train_images_orig, train_labels_orig), (test_images_orig, test_labels_orig) = mnist.load_data()

Let's look at the training data.

In [2]:
train_images_orig.shape

(60000, 28, 28)

In [3]:
len(train_labels_orig)

60000

In [4]:
train_labels_orig

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

Here is the test data.

In [5]:
test_images_orig.shape

(10000, 28, 28)

In [6]:
len(test_labels_orig)

10000

In [7]:
test_labels_orig

array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)

## Dense network

First we will build a densely connected network.

In [8]:
from keras import models
from keras import layers

network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28*28,)))
network.add(layers.Dense(10, activation='softmax'))
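
As a quick sanity check on the size of this network, the trainable parameters of a Dense layer can be counted by hand: one weight per input-unit pair plus one bias per unit. The helper name `dense_params` is made up for this illustration and is not part of Keras.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


hidden = dense_params(28 * 28, 512)  # 784 pixel inputs -> 512 hidden units
output = dense_params(512, 10)       # 512 hidden units -> 10 digit classes
total = hidden + output

print(hidden, output, total)
```

Almost all of the parameters sit in the first layer, because every one of the 784 pixels is connected to every hidden unit.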

In [9]:
network.compile(optimizer='adam',
                loss='categorical_crossentropy',
                metrics=['acc'])
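
To make the `softmax` activation and the `categorical_crossentropy` loss more concrete, here is a minimal NumPy sketch of both. It illustrates the underlying formulas on made-up numbers and is not Keras's actual implementation; the function names and the `eps` clipping value are assumptions chosen for the example.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def categorical_crossentropy(y_true: np.ndarray, y_pred: np.ndarray,
                             eps: float = 1e-7) -> float:
    """Mean over samples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-(y_true * np.log(y_pred)).sum(axis=-1).mean())


# Made-up scores for two samples and three classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
print(probs, loss)
```

The loss shrinks toward 0 as the probability assigned to the correct class approaches 1, which is what the optimizer pushes the network toward during training.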

Before training, we’ll preprocess the data by reshaping it into the shape the network expects and scaling it so that all values are in the [0, 1] interval. Previously, our training images, for instance, were stored in an array of shape (60000, 28, 28) of type uint8 with values in the [0, 255] interval. We transform it into a float32 array of shape (60000, 28 * 28) with values between 0 and 1.

In [10]:
train_images = train_images_orig.reshape((60000,28*28))
train_images = train_images.astype('float32') / 255

test_images = test_images_orig.reshape((10000,28*28))
test_images = test_images.astype('float32') / 255

In [11]:
train_images.shape

(60000, 784)
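
The same preprocessing can be written without hard-coding the number of samples. The sketch below uses a hypothetical helper, `flatten_and_scale`, and a small synthetic array in place of the real MNIST images, so it runs without downloading the dataset.

```python
import numpy as np


def flatten_and_scale(images: np.ndarray) -> np.ndarray:
    """Flatten each image to a vector and scale uint8 pixels to [0, 1]."""
    n_samples = images.shape[0]
    flat = images.reshape((n_samples, -1))  # -1 lets NumPy infer 28 * 28 = 784
    return flat.astype('float32') / 255


# Synthetic stand-in for a batch of MNIST-like images (an assumption for this demo).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

prepared = flatten_and_scale(fake_images)
print(prepared.shape, prepared.dtype, prepared.min(), prepared.max())
```

Because the sample count is read from the array itself, the same function would apply unchanged to the training and test images.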

Next we one-hot (categorically) encode the labels.

In [12]:
from keras.utils import to_categorical

train_labels = to_categorical(train_labels_orig)
test_labels = to_categorical(test_labels_orig)

In [13]:
train_labels

array([[0., 0., 0., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.]], dtype=float32)

In [14]:
train_labels.shape

(60000, 10)
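
To show what this encoding does, here is a minimal NumPy sketch that produces the same kind of one-hot matrix for a few labels. The `one_hot` helper is a hypothetical name used for illustration; it is not how `to_categorical` is implemented internally.

```python
import numpy as np


def one_hot(labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Return a (len(labels), num_classes) float32 matrix with a single 1 per row."""
    encoded = np.zeros((labels.shape[0], num_classes), dtype='float32')
    encoded[np.arange(labels.shape[0]), labels] = 1.0  # row i gets a 1 at column labels[i]
    return encoded


# The first three training labels shown earlier: 5, 0, 4.
sample_labels = np.array([5, 0, 4], dtype=np.uint8)
encoded = one_hot(sample_labels, num_classes=10)
print(encoded)
```

Each row contains exactly one 1, at the column index of the digit it represents, which matches the format the `categorical_crossentropy` loss expects.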

Next we train the network.

In [15]:
network.fit(train_images,train_labels, epochs=5, batch_size=128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x1e2d131d580>

Next we check how the network performs on the test data.

In [16]:
test_loss, test_acc = network.evaluate(test_images,test_labels)



In [17]:
print(f'test accuracy:{test_acc} and test loss:{test_loss}')

test accuracy:0.9786999821662903 and test loss:0.06821893900632858


The test-set accuracy turns out to be 97.9%, which is quite a bit lower than the accuracy reached on the training set. This gap between training accuracy and test accuracy is an example of overfitting: the tendency of machine-learning models to perform worse on new data than on their training data.
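
For reference, the accuracy metric is in essence the fraction of samples whose highest-probability class matches the true label. The sketch below computes it with NumPy on made-up predictions; the `accuracy` function and the numbers are illustrative assumptions, not output from the network above.

```python
import numpy as np


def accuracy(y_true_one_hot: np.ndarray, y_pred_probs: np.ndarray) -> float:
    """Fraction of rows where the predicted class equals the true class."""
    predicted = y_pred_probs.argmax(axis=1)
    actual = y_true_one_hot.argmax(axis=1)
    return float((predicted == actual).mean())


# Four made-up samples with three classes each.
y_true = np.array([[0, 0, 1],
                   [1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]], dtype='float32')
y_pred = np.array([[0.1, 0.2, 0.7],
                   [0.8, 0.1, 0.1],
                   [0.3, 0.3, 0.4],   # wrong: predicts class 2, truth is class 1
                   [0.2, 0.2, 0.6]], dtype='float32')

acc = accuracy(y_true, y_pred)
print(acc)
```

Computing this quantity on both the training and the test set and comparing the two values is one simple way to quantify the overfitting gap described above.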

## Convolutional Network (Convnets)

Now let's use a convnet to classify the MNIST digits. The following lines of code show what a basic convnet looks like: a stack of Conv2D and MaxPooling2D layers.

In [18]:
model = models.Sequential()
model.add(layers.Conv2D(32,(3,3),activation='relu', input_shape=(28,28,1)))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64,(3,3), activation='relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64,(3,3),activation='relu'))

model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 13, 13, 32)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 3, 3, 64)          36928     
                                                                 
Total params: 55,744
Trainable params: 55,744
Non-trainable params: 0
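
The output shapes and parameter counts in this summary can be reproduced with a little arithmetic. The sketch below assumes the Keras defaults used above: 'valid' padding and stride 1 for Conv2D, and non-overlapping windows for MaxPooling2D. The helper names are hypothetical.

```python
def conv_out(size: int, kernel: int) -> int:
    """Output width/height of a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size: int, pool: int) -> int:
    """Output width/height of non-overlapping max pooling (floor division)."""
    return size // pool


def conv_params(kernel: int, in_channels: int, filters: int) -> int:
    """kernel * kernel * in_channels weights per filter, plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


size = 28
sizes = []
for layer in ("conv", "pool", "conv", "pool", "conv"):
    size = conv_out(size, 3) if layer == "conv" else pool_out(size, 2)
    sizes.append(size)

params = [conv_params(3, 1, 32), conv_params(3, 32, 64), conv_params(3, 64, 64)]

print(sizes)                # spatial size after each layer
print(params, sum(params))  # per-layer and total parameter counts
```

The spatial size shrinks from 28 to 3 while the number of channels grows from 1 to 64, and the pooling layers contribute no parameters at all.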

The next step is to feed the last output tensor (of shape (3, 3, 64)) into a densely connected classifier network like those you’re already familiar with: a stack of Dense layers. These classifiers process vectors, which are 1D, whereas the current output is a 3D tensor. First we have to flatten the 3D outputs to 1D, and then add a few Dense layers on top.

In [19]:
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

In [20]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 13, 13, 32)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 3, 3, 64)          36928     
                                                                 
 flatten (Flatten)           (None, 576)              
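
The size of the flattened vector and the parameter counts of the two Dense layers added on top can be worked out by hand. The figures below are computed from the layer definitions, not copied from the summary output.

```python
# Flatten turns each (3, 3, 64) feature map into a single vector.
flat_size = 3 * 3 * 64

# Dense layers: one weight per input-unit pair, plus one bias per unit.
dense_hidden = flat_size * 64 + 64  # Dense(64, activation='relu')
dense_output = 64 * 10 + 10         # Dense(10, activation='softmax')

conv_total = 55_744                 # total reported for the convolutional stack
total = conv_total + dense_hidden + dense_output

print(flat_size, dense_hidden, dense_output, total)
```

Even with the classifier on top, this convnet has far fewer parameters than a 512-unit Dense layer applied directly to the 784 raw pixels.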

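Now, let’s train the convnet on the MNIST digits. The images are reshaped to (28, 28, 1) so that they carry the channel axis the first Conv2D layer expects.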

In [21]:
train_images_conv = train_images_orig.reshape((60000,28,28,1))
train_images_conv = train_images_conv.astype('float32') / 255

test_images_conv = test_images_orig.reshape((10000,28,28,1))
test_images_conv = test_images_conv.astype('float32') / 255

In [22]:
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['acc'])
model.fit(train_images_conv, train_labels, epochs=5, batch_size=64)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x1e2d2a94580>

In [23]:
test_loss_conv, test_acc_conv = model.evaluate(test_images_conv,test_labels)
test_acc_conv



0.9904999732971191

Whereas the densely connected network had a test accuracy of 97.9%, the basic convnet has a test accuracy of 99.0%: we decreased the error rate by about 55% (relative). Not bad!
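
The relative reduction in error rate can be checked directly from the two test accuracies printed above. The function name `relative_error_reduction` is introduced here for illustration only.

```python
def relative_error_reduction(acc_before: float, acc_after: float) -> float:
    """Fraction by which the error rate (1 - accuracy) shrinks."""
    err_before = 1.0 - acc_before
    err_after = 1.0 - acc_after
    return (err_before - err_after) / err_before


dense_acc = 0.9786999821662903  # test accuracy printed for the dense network
conv_acc = 0.9904999732971191   # test accuracy printed for the convnet

reduction = relative_error_reduction(dense_acc, conv_acc)
print(f"{reduction:.1%}")
```

Exact figures will vary from run to run, since the weights are initialized randomly and the training batches are shuffled.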