# 5.1 - Introduction to convnets

- We will use our first convnet to classify MNIST digits.
- Previously, a densely-connected network attained a test accuracy of 97.8%.
- Even our basic convnet easily surpasses that accuracy.

----

- The six lines of code below define a basic convnet.
- It's a stack of `Conv2D` and `MaxPooling2D` layers.
- A convnet takes input tensors of shape `(image_height, image_width, image_channels)` (not including the batch dimension).
- In our case, we will configure our convnet to process inputs of size `(28, 28, 1)`, which is the format of MNIST images.
- We do this by passing the argument `input_shape=(28, 28, 1)` to our first layer.

In [1]:
import keras
keras.__version__

Using TensorFlow backend.


'2.3.1'

In [6]:
from keras import layers
from keras import models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

- Let's display the architecture of our convnet so far:

In [7]:
model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_4 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 3, 3, 64)          36928     
=================================================================
Total params: 55,744
Trainable params: 55,744
Non-trainable params: 0
_________________________________________________________________


- The output of every `Conv2D` and `MaxPooling2D` layer is a 3D tensor of shape `(height, width, channels)`. 
- The width and height dimensions tend to shrink as we go deeper in the network. 
- The number of channels is controlled by the first argument passed to the `Conv2D` layers (e.g. 32 or 64).
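
The shapes and parameter counts in the summary can be reproduced by hand. The sketch below assumes what the code above implies: 3x3 kernels with stride 1 and no padding, and 2x2 non-overlapping max pooling. The helper names are hypothetical and not part of Keras.

```python
def conv2d_output(height, width, kernel, filters):
    """Output shape of a stride-1 convolution with no padding."""
    return (height - kernel + 1, width - kernel + 1, filters)

def maxpool_output(height, width, channels, pool):
    """Output shape of non-overlapping max pooling (incomplete windows are dropped)."""
    return (height // pool, width // pool, channels)

def conv2d_params(kernel, in_channels, filters):
    """Each filter has kernel * kernel * in_channels weights plus one bias."""
    return (kernel * kernel * in_channels + 1) * filters

shape = (28, 28, 1)  # one MNIST image, batch dimension excluded
shapes, params = [], []
for layer, arg in [("conv", 32), ("pool", 2), ("conv", 64), ("pool", 2), ("conv", 64)]:
    if layer == "conv":
        params.append(conv2d_params(3, shape[2], arg))
        shape = conv2d_output(shape[0], shape[1], 3, arg)
    else:
        shape = maxpool_output(shape[0], shape[1], shape[2], arg)
    shapes.append(shape)

print(shapes)
print(params, sum(params))
```

Each 3x3 convolution trims one pixel from every border (28 to 26, 13 to 11, 5 to 3), and each pooling layer halves the spatial size, rounding down (11 to 5).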

----

- The next step is to feed our last output tensor (of shape `(3, 3, 64)`) into a densely-connected classifier network: a stack of `Dense` layers.
- These classifiers process vectors, which are 1D, whereas our current output is a 3D tensor. 
- So first, we will have to flatten our 3D outputs to 1D, and then add a few `Dense` layers on top:

In [8]:
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

- We are going to do 10-way classification, so we use a final layer with 10 outputs and a softmax activation. 
- Now here's what our network looks like:

In [9]:
model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_4 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten_2 (Flatten)          (None, 576)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 64)               

As you can see, our `(3, 3, 64)` outputs were flattened into vectors of shape `(576,)`, before going through two `Dense` layers.
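
As a rough check of the flattening step, the sketch below reshapes a stand-in `(3, 3, 64)` array with NumPy and computes the parameter counts the two `Dense` layers would have. The `dense_params` helper is hypothetical, and the counts are calculated from the layer sizes rather than read from the summary output.

```python
import numpy as np

# Stand-in for the conv output of a single sample (values are irrelevant here).
feature_map = np.zeros((3, 3, 64), dtype="float32")
flat = feature_map.reshape(-1)  # same data, laid out as one vector
print(flat.shape)

def dense_params(inputs, units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return inputs * units + units

hidden = dense_params(flat.shape[0], 64)  # Dense(64) on the flattened vector
output = dense_params(64, 10)             # Dense(10) softmax classifier
print(hidden, output)
```

Under these assumptions the first `Dense` layer alone holds about as many parameters as the largest convolution layer.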

Now, let's train our convnet on the MNIST digits. We will reuse a lot of the code we have already covered in the MNIST example from Chapter 2.

In [10]:
from keras.datasets import mnist
from keras.utils import to_categorical

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
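
The label preparation above turns each integer digit into a one-hot vector of length 10, which is the format `categorical_crossentropy` expects. A minimal NumPy sketch of that encoding follows; `one_hot` is a hypothetical helper written for illustration, not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return one row per label, with a 1 at the label's index and 0 elsewhere."""
    labels = np.asarray(labels)
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(num_classes, dtype="float32")[labels]

encoded = one_hot([7, 3])
print(encoded.shape)
print(encoded[0])
```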

In [11]:
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5, batch_size=64)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.callbacks.History at 0x10afa1860>

- Let's evaluate the model on the test data:

In [8]:
test_loss, test_acc = model.evaluate(test_images, test_labels)



In [9]:
test_acc

0.9894

- This is an improvement on the 97.8% test accuracy of the densely-connected network from Chapter 2.
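
The gap looks small in accuracy terms but is larger when expressed as an error rate. The arithmetic below uses only the two accuracies quoted in this section; the variable names are illustrative.

```python
dense_acc = 0.978     # densely-connected network, as quoted above
convnet_acc = 0.9894  # convnet test accuracy from the cell above

dense_err = 1 - dense_acc
convnet_err = 1 - convnet_acc
# Fraction of the dense network's errors that the convnet avoids.
relative_reduction = (dense_err - convnet_err) / dense_err

print(f"error rate: {dense_err:.4f} -> {convnet_err:.4f}")
print(f"relative reduction: {relative_reduction:.0%}")
```

On these figures the convnet makes roughly half as many mistakes on the test set.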

## 5.1.1 The convolution operator

- Dense layers learn global patterns; convolution layers learn local patterns.
- These local patterns are *translation invariant*.
    - A pattern learned in (say) the lower-left corner can be recognised anywhere (e.g. the top-right corner).
    - A dense layer would have to re-learn a local pattern at each new location.
    - Convnets are efficient in visual tasks because *the visual world is fundamentally translation invariant*.
    - They need fewer training samples to learn representations with generalisation power.
- Convnets learn *spatial pattern hierarchies*.
    - The first layer might learn edges, the second layer patterns of edges, and so on.
    - They learn increasingly complex and abstract patterns because *the visual world is fundamentally hierarchical*.
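
The translation-invariance point can be illustrated with a small NumPy sketch. It slides one fixed 3x3 filter over two toy images that contain the same pattern in different corners. The filter, the 8x8 image size and the helper names are all assumptions made for illustration; a real `Conv2D` layer learns its filters and applies many of them in parallel.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical vertical-line pattern, used both as the feature and as the filter.
pattern = np.array([[0., 1., 0.],
                    [0., 1., 0.],
                    [0., 1., 0.]])

def image_with_pattern(row, col, size=8):
    """Blank image with the pattern pasted so its top-left corner is at (row, col)."""
    image = np.zeros((size, size))
    image[row:row + 3, col:col + 3] = pattern
    return image

lower_left = correlate2d_valid(image_with_pattern(5, 0), pattern)
upper_right = correlate2d_valid(image_with_pattern(0, 5), pattern)

# The same weights give the same peak response; only its location moves.
print(np.unravel_index(lower_left.argmax(), lower_left.shape), lower_left.max())
print(np.unravel_index(upper_right.argmax(), upper_right.shape), upper_right.max())
```

A dense layer has a separate weight for every pixel position, so it has no such sharing and would need examples of the pattern at each location.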

In [74]:
import numpy as np

# One-hot label of test sample 950 (a different sample from the two predicted below).
print(test_labels[950])

# First sample: test_images is already scaled to [0, 1], so it is not rescaled again.
sample_a = test_images[1600].reshape((1, 28, 28, 1))  # add the batch dimension
print(type(sample_a))
print(model.predict_classes(sample_a))  # predicted class index
print(model.predict(sample_a))          # class probabilities

# Second sample: dividing by 255 a second time shrinks an already-normalised image
# to nearly zero, so the network sees an almost blank input.
sample_b = test_images[7382].reshape((1, 28, 28, 1))
sample_b = sample_b.astype('float32') / 255
print(type(sample_b))
print(model.predict_classes(sample_b))
print(model.predict(sample_b))

[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
<class 'numpy.ndarray'>
[3]
[[1.4265431e-15 4.5707474e-12 8.8261096e-15 9.9678457e-01 1.3733521e-14
  3.2154343e-03 2.6658595e-12 3.0703218e-13 8.0104261e-09 3.7800847e-09]]
<class 'numpy.ndarray'>
[8]
[[0.09769746 0.09149734 0.08825679 0.10723402 0.09863829 0.0974934
  0.09340783 0.09571445 0.1349504  0.09511005]]
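
The class index reported for each sample is the position of the largest entry in its probability vector. The sketch below checks this with NumPy, using the two vectors printed above, copied by hand. The variable names are illustrative.

```python
import numpy as np

# Probabilities printed above for the first sample.
confident = np.array([1.4265431e-15, 4.5707474e-12, 8.8261096e-15, 9.9678457e-01,
                      1.3733521e-14, 3.2154343e-03, 2.6658595e-12, 3.0703218e-13,
                      8.0104261e-09, 3.7800847e-09])
# Probabilities printed above for the second sample.
uncertain = np.array([0.09769746, 0.09149734, 0.08825679, 0.10723402, 0.09863829,
                      0.0974934, 0.09340783, 0.09571445, 0.1349504, 0.09511005])

for probs in (confident, uncertain):
    # argmax gives the predicted digit; the max value is the model's confidence in it.
    print(int(np.argmax(probs)), round(float(probs.max()), 4))
```

The second vector is close to uniform (about 0.1 per class), so its predicted class carries little confidence. That may be related to the extra division by 255 applied to that image in the cell above.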


In [72]:
print(train_images[4])

[[[0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]]

 [[0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]]

 [[0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0.        ]
  [0. 