
## Chapter 5. Deep learning for computer vision

**Convnets**: convolutional neural networks, the type of model used in computer vision (CV) applications.

In [2]:
from keras import layers
from keras import models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))


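A convnet takes input tensors of shape (height, width, channels), where channels is also called depth.
The output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels).
The height and width shrink as you go deeper in the network, while the number of channels is set by the first argument passed to Conv2D (32 or 64).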
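The shrinking height and width can be traced by hand. The sketch below is only shape arithmetic under the assumptions used in this model (3x3 unpadded convolutions with stride 1, 2x2 max pooling with stride 2); the helper names are made up for illustration and are not Keras functions.

```python
def conv_valid(size, window=3):
    """Spatial size after an unpadded convolution with stride 1."""
    return size - window + 1

def max_pool(size, window=2):
    """Spatial size after non-overlapping max pooling (stride == window)."""
    return size // window

# Same layer order as the model above: conv, pool, conv, pool, conv.
size = 28
trace = [size]
for step in (conv_valid, max_pool, conv_valid, max_pool, conv_valid):
    size = step(size)
    trace.append(size)

print(trace)  # [28, 26, 13, 11, 5, 3]
```

The last value, 3, is the height and width of the final (3, 3, 64) feature map.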

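Next, the final output of shape (3, 3, 64) is fed into a densely connected classifier network: a stack of Dense layers. Because the classifier takes 1D vectors as input, the output of the convnet has to be flattened first.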

In [0]:
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

In [4]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten_1 (Flatten)          (None, 576)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 64)                36928     
__________

This is a 10-way classifier. The Flatten layer between the last Conv2D and the first Dense layer produces a vector of 3 * 3 * 64 = 576 values.
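The parameter counts in the summary can be reproduced with a little arithmetic. This is a sketch assuming the standard counting (one weight per kernel cell per input channel, plus one bias per filter or unit); the two helper functions are hypothetical names, not part of Keras.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # Weights of one filter plus its bias, times the number of filters.
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_units, out_units):
    # One weight per input plus one bias, for each output unit.
    return (in_units + 1) * out_units

counts = {
    "conv2d_1": conv2d_params(3, 3, 1, 32),
    "conv2d_2": conv2d_params(3, 3, 32, 64),
    "conv2d_3": conv2d_params(3, 3, 64, 64),
    "dense_1": dense_params(3 * 3 * 64, 64),
}
print(counts)
# {'conv2d_1': 320, 'conv2d_2': 18496, 'conv2d_3': 36928, 'dense_1': 36928}
```

conv2d_3 and dense_1 happen to have the same count because 3 * 3 * 64 = 576 is both the number of weights per filter in conv2d_3 and the number of inputs to dense_1.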

In [5]:
# Training the convnet on MNIST images
from keras.datasets import mnist
from keras.utils import to_categorical
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5, batch_size=64)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f11e9fdfda0>

In [6]:
# Evaluate the model on the test data:
test_loss, test_acc = model.evaluate(test_images, test_labels)
test_acc



0.9911

## 5.1.1. The convolution operation
#### 1. Fundamental difference between a dense layer and a convolution layer: <br>
Dense layers learn **global** patterns,<br>
convolution layers learn **local** patterns.  

#### 2. Two properties of convnets:<br>
2.1. **The patterns they learn are translation invariant**. This makes convnets efficient when processing images: they need fewer training samples to gain generalization power.<br>
2.2. **They can learn spatial hierarchies of patterns**. The first layer learns small local patterns such as edges; the second layer learns larger patterns built from the features of the first layer, and so on. This lets convnets efficiently learn increasingly complex and abstract visual concepts.
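A small NumPy sketch may make the translation-invariance property concrete: the same kernel gives the same peak response wherever the pattern appears, only at a shifted location. The kernel and image sizes are arbitrary illustrative choices, and `correlate2d_valid` is a hand-written helper, not a library call.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def peak(response):
    """Location (row, col) of the strongest response."""
    row, col = np.unravel_index(np.argmax(response), response.shape)
    return int(row), int(col)

kernel = np.array([[1.0, 0.0],
                   [0.0, 1.0]])  # detects a small diagonal stroke

image_a = np.zeros((6, 6))
image_a[1:3, 1:3] = kernel       # pattern placed at (1, 1)
image_b = np.zeros((6, 6))
image_b[3:5, 2:4] = kernel       # same pattern placed at (3, 2)

resp_a = correlate2d_valid(image_a, kernel)
resp_b = correlate2d_valid(image_b, kernel)
print(peak(resp_a), resp_a.max())  # (1, 1) 2.0
print(peak(resp_b), resp_b.max())  # (3, 2) 2.0
```

A dense layer would need separate weights to recognize the stroke at each position, which is why it needs more training samples for the same task.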

#### 3. Feature maps: 
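Feature maps are 3D tensors with 2 spatial axes (height and width) plus a depth axis (also called the channels axis).
For an RGB image the depth is 3; for a black-and-white image the depth is 1.

The output of a convolution is still a 3D tensor: the output feature map. Here the depth no longer stands for colors as it does in the input. The number of channels is set by a parameter of the layer, and the channels are called filters; a filter can be understood as encoding the "presence of a fact in the input".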

#### 4. Two key parameters of convolutions. Conv2D(output_depth, (window_height, window_width))
4.1. **Size of the patches extracted from the inputs**: typically 3x3 or 5x5. This is the (3, 3) in Conv2D(32, (3, 3)).

4.2. **Depth of the output feature map**: here it starts at 32 and ends at 64.

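#### 5. How convolution works (Fig. 5.4)
A window slides over the input feature map and extracts a patch at every location, for example patches of shape 3 x 3 x input depth; <br>
each patch is multiplied with the convolution kernel (a tensor product) and becomes a 1D transformed patch of shape 1 x output depth; <br>
the transformed patches are reassembled spatially into the output feature map, for example 3 x 3 x output depth when a 5 x 5 input is convolved with a 3 x 3 window.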
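The three steps above can be sketched in NumPy. This is a minimal illustration, not how Keras implements Conv2D: it assumes no padding, stride 1, and no bias or activation, and the function name and the random test shapes are made up for the example.

```python
import numpy as np

def conv2d_valid(feature_map, kernel):
    """feature_map: (H, W, C_in); kernel: (kh, kw, C_in, C_out)."""
    h, w, c_in = feature_map.shape
    kh, kw, k_in, c_out = kernel.shape
    assert c_in == k_in, "kernel depth must match input depth"
    out = np.zeros((h - kh + 1, w - kw + 1, c_out))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Step 1: extract a patch of shape (kh, kw, C_in).
            patch = feature_map[i:i + kh, j:j + kw, :]
            # Step 2: tensor product with the kernel gives a (C_out,) vector.
            # Step 3: store it at the matching spatial location.
            out[i, j, :] = np.tensordot(patch, kernel, axes=3)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 5, 2))     # 5 x 5 input with depth 2
k = rng.normal(size=(3, 3, 2, 4))  # 3 x 3 window, output depth 4
y = conv2d_valid(x, k)
print(y.shape)  # (3, 3, 4)
```

Every output location (i, j) lines up with the top-left corner of the input patch it was computed from.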

#### 6. Border effects and padding
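**Border effects**: when a 3 x 3 window slides one step at a time and each patch yields one output vector, the output is 2 smaller than the input in both height and width. In the example above, 28 x 28 became 26 x 26.

To keep the output at 28 x 28 you need padding, for example one extra row at the top and at the bottom and one extra column on the left and on the right.

In tf.layers.conv2d(..., padding='valid',...) the padding argument can be 'valid' or 'same'. 'valid' means no padding and is the default; 'same' means the output has the same height and width as the input.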
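As a rough sketch of what 'same' padding does for a 3 x 3 window with stride 1, the snippet below pads a 28 x 28 array with zeros using `np.pad` and compares the resulting output sizes. Zero padding and the dummy input are assumptions made for the illustration.

```python
import numpy as np

image = np.zeros((28, 28))   # dummy 28 x 28 input, values do not matter here
window = 3
pad = (window - 1) // 2      # 1 row/column on each side for a 3 x 3 window

padded = np.pad(image, pad_width=pad, mode="constant", constant_values=0)

valid_size = image.shape[0] - window + 1   # no padding
same_size = padded.shape[0] - window + 1   # after padding

print(padded.shape)  # (30, 30)
print(valid_size)    # 26
print(same_size)     # 28
```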


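#### 7. Convolution strides
In tf.layers.conv2d(..., strides=(1, 1),...) the strides argument is the step size of the sliding window. The default is 1. For example, a 3x3 convolution over a 5x5 input with stride 1 extracts (5-2)x(5-2)=9 patches.
In practice, strided convolutions are rarely used to downsample feature maps; max pooling is used instead.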
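The effect of the stride on the number of patches can be checked by listing the positions where a window starts along one axis. `window_starts` is a hypothetical helper written for this note, assuming no padding.

```python
def window_starts(size, window, stride):
    """Start indices of all windows that fit along one axis (no padding)."""
    return list(range(0, size - window + 1, stride))

starts_1 = window_starts(5, 3, 1)
starts_2 = window_starts(5, 3, 2)

print(starts_1, len(starts_1) ** 2)  # [0, 1, 2] 9
print(starts_2, len(starts_2) ** 2)  # [0, 2] 4
```

With stride 2 the 5x5 input yields a 2x2 output instead of 3x3, which is how a strided convolution downsamples.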



## 5.1.2. The max-pooling operation
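#### 1. Role: downsample feature maps.
Max pooling extracts windows from the input feature maps and outputs the max value of each channel.
Unlike convolution, it transforms the patches with a hardcoded max tensor operation (rather than a learned convolution kernel).
The usual choice is a 2x2 window with stride 2, which downsamples the feature maps by a factor of 2. 
The earlier model summary shows this: the size goes from 26x26 to 13x13.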
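A minimal NumPy sketch of 2x2 max pooling with stride 2 is shown below. It assumes even height and width and uses a reshape trick instead of an explicit sliding window; it illustrates the operation and is not the Keras implementation.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """feature_map: (H, W, C) with even H and W; returns (H // 2, W // 2, C)."""
    h, w, c = feature_map.shape
    # Split each spatial axis into (block index, position inside the block).
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2, c)
    # Take the max over the two in-block axes, separately for every channel.
    return blocks.max(axis=(1, 3))

small = np.arange(16).reshape(4, 4, 1)
pooled = max_pool_2x2(small)
print(pooled[:, :, 0])
# [[ 5  7]
#  [13 15]]

print(max_pool_2x2(np.zeros((26, 26, 32))).shape)  # (13, 13, 32)
```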

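#### 2. Advantages over successive Conv2D layers without pooling:
2.1. It helps the network learn a spatial hierarchy of features. <br>
2.2. It avoids the huge number of parameters the latter produces, which would cause overfitting.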
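To get a feel for point 2.2, the sketch below compares the size of the first Dense layer with and without the two pooling layers. It assumes the same three 3x3 convolutions (64 output channels at the end) and the same Dense(64) layer as the model above; the helper names are hypothetical.

```python
def final_size(size, use_pooling):
    """Height/width after three unpadded 3x3 convs, pooling after the first two."""
    for layer in range(3):
        size = size - 2                 # 3x3 convolution, no padding
        if use_pooling and layer < 2:
            size = size // 2            # 2x2 max pooling, stride 2
    return size

def dense_params(in_units, out_units):
    return (in_units + 1) * out_units   # weights plus biases

with_pool = dense_params(final_size(28, True) ** 2 * 64, 64)
without_pool = dense_params(final_size(28, False) ** 2 * 64, 64)

print(final_size(28, True), with_pool)       # 3 36928
print(final_size(28, False), without_pool)   # 22 1982528
```

The convolution layers themselves have the same number of parameters in both cases; the growth comes from flattening a 22 x 22 x 64 feature map instead of a 3 x 3 x 64 one.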

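#### 3. Other downsampling methods
Another option is average pooling, which takes the average value of each channel over the patch instead of the max.
Max pooling tends to give better results, though.

The most reasonable strategy is:<br>
first produce dense maps of features (via unstrided convolutions), <br>
then look at the maximal activation of the features over small patches. <br>
This is what the program above does: Conv2D first produces features over many small patches, and max pooling is then applied over small patches of the result.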
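One way to see why the max is often preferred: if a channel encodes the presence of a feature, averaging dilutes a strong, localized activation, while the max keeps it. The activation values below are made up for illustration.

```python
import numpy as np

# One channel in which the feature is detected strongly at a single location.
activations = np.array([[0.0, 0.0, 0.0, 0.0],
                        [0.0, 8.0, 0.0, 0.0],
                        [0.0, 0.0, 0.0, 0.0],
                        [0.0, 0.0, 0.0, 0.0]])

# Group into non-overlapping 2x2 blocks: axes (block row, in-row, block col, in-col).
blocks = activations.reshape(2, 2, 2, 2)
max_pooled = blocks.max(axis=(1, 3))
avg_pooled = blocks.mean(axis=(1, 3))

print(max_pooled.tolist())  # [[8.0, 0.0], [0.0, 0.0]]
print(avg_pooled.tolist())  # [[2.0, 0.0], [0.0, 0.0]]
```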

<font color=red>But why alternate Conv2D and MaxPooling2D? Why not Conv2D, Conv2D, MaxPooling2D, MaxPooling2D? Is there a rule behind this? </font>

# 5.2. Training a convnet from scratch on a small dataset




In [0]:
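# Scratch cell: Conv2D needs at least filters and kernel_size (illustrative values)
layers.Conv2D(32, (3, 3), activation='relu')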