## Convolutional Neural Network

In [3]:
import tensorflow as tf


image_batch = tf.constant([
    [
        [[0, 255, 0], [0, 255, 0], [0, 255, 0]],
        [[0, 255, 0], [0, 255, 0], [0, 255, 0]]
    ],
    [
        [[0, 255, 0], [0, 255, 0], [0, 255, 0]],
        [[0, 255, 0], [0, 255, 0], [0, 255, 0]]
    ]
])

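# TensorFlow 1.x API: tensors are evaluated by running them in a session
sess = tf.Session()

# Dimensions are (batch, height, width, channels)
print(image_batch.get_shape())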

(2, 2, 3, 3)


In [9]:
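sess.run(image_batch)[0]     # first image in the batch
sess.run(image_batch)[0][0]  # first row of pixels in that image
sess.run(image_batch)        # whole batch (only this last expression is displayed)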

array([[[[  0, 255,   0],
         [  0, 255,   0],
         [  0, 255,   0]],

        [[  0, 255,   0],
         [  0, 255,   0],
         [  0, 255,   0]]],


       [[[  0, 255,   0],
         [  0, 255,   0],
         [  0, 255,   0]],

        [[  0, 255,   0],
         [  0, 255,   0],
         [  0, 255,   0]]]])
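The shape `(2, 2, 3, 3)` reads as (batch, height, width, channels). As a rough illustration in plain Python (no TensorFlow), the same nesting can be built up level by level; `nested_shape` is a hypothetical helper written for this sketch, not a TensorFlow function.

```python
def nested_shape(data):
    """Return the shape of a regular (non-ragged) nested list as a tuple."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        data = data[0]
    return tuple(shape)


pixel = [0, 255, 0]            # one pixel: 3 channels (R, G, B)
row = [pixel, pixel, pixel]    # 3 pixels wide
image = [row, row]             # 2 rows high
image_batch = [image, image]   # 2 images in the batch

print(nested_shape(image_batch))  # (2, 2, 3, 3)
print(image_batch[0][0][1])       # pixel at image 0, row 0, column 1
```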

### Input and Kernel

In [10]:
input_batch = tf.constant([
    [
        [[0.0], [1.0]],
        [[2.0], [3.0]]
    ],
    [
        [[2.0], [4.0]],
        [[6.0], [8.0]]
    ]
])
kernel = tf.constant([
    [
        [[1.0, 2.0]]
    ]
])

In [11]:
conv2d = tf.nn.conv2d(input_batch, kernel, strides=[1,1,1,1], padding="SAME")
sess.run(conv2d)

array([[[[  0.,   0.],
         [  1.,   2.]],

        [[  2.,   4.],
         [  3.,   6.]]],


       [[[  2.,   4.],
         [  4.,   8.]],

        [[  6.,  12.],
         [  8.,  16.]]]], dtype=float32)

In [15]:
print(input_batch.get_shape())
print(conv2d.get_shape())

(2, 2, 2, 1)
(2, 2, 2, 2)
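The kernel above has shape (1, 1, 1, 2): a 1x1 window, one input channel, two output channels. A minimal plain-Python sketch of what that convolution computes, assuming a single input channel (`conv2d_1x1` is a hypothetical helper, not the TensorFlow implementation):

```python
def conv2d_1x1(batch, kernel_weights):
    """Apply a 1x1 convolution to a single-channel batch.

    batch is nested as [image][row][column][channel];
    kernel_weights holds one weight per output channel.
    """
    return [
        [
            [[pixel[0] * w for w in kernel_weights] for pixel in row]
            for row in image
        ]
        for image in batch
    ]


input_batch = [
    [[[0.0], [1.0]], [[2.0], [3.0]]],
    [[[2.0], [4.0]], [[6.0], [8.0]]],
]

out = conv2d_1x1(input_batch, [1.0, 2.0])
print(out[0][0][1])  # [1.0, 2.0]
print(out[1][1][1])  # [8.0, 16.0]
```

Each input value is multiplied by both kernel weights, which is why the channel dimension grows from 1 to 2 while batch, height and width stay the same.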


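### Strides

Strides let the kernel skip over positions in the input, which reduces the dimensions of the output. The example below uses a stride of 1 in every dimension, so the only size reduction (6x6 down to 4x4) comes from applying a 3x3 kernel with "VALID" padding.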

In [28]:
input_batch = tf.constant([
    [
        [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]],
        [[0.1], [1.1], [2.1], [3.1], [4.1], [5.1]],
        [[0.2], [1.2], [2.2], [3.2], [4.2], [5.2]],
        [[0.3], [1.3], [2.3], [3.3], [4.3], [5.3]],
        [[0.4], [1.4], [2.4], [3.4], [4.4], [5.4]],
        [[0.5], [1.5], [2.5], [3.5], [4.5], [5.5]],
    ],
])

kernel = tf.constant([
    [[[0.0]], [[0.5]], [[0.0]]],
    [[[0.0]], [[1.0]], [[0.0]]],
    [[[0.0]], [[0.5]], [[0.0]]],
])

conv2d = tf.nn.conv2d(input_batch, kernel, strides=[1, 1, 1, 1], padding="VALID")
sess.run(conv2d)


array([[[[ 2.20000005],
         [ 4.19999981],
         [ 6.19999981],
         [ 8.19999981]],

        [[ 2.4000001 ],
         [ 4.4000001 ],
         [ 6.4000001 ],
         [ 8.39999962]],

        [[ 2.5999999 ],
         [ 4.60000038],
         [ 6.60000038],
         [ 8.60000038]],

        [[ 2.79999995],
         [ 4.80000019],
         [ 6.80000019],
         [ 8.80000019]]]], dtype=float32)
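To see how the stride changes the result, here is a minimal plain-Python sketch of a single-channel convolution with "VALID" padding. `conv2d_valid` is a hypothetical helper written for illustration, and the stride of 3 is an added example that does not appear in the cell above.

```python
def conv2d_valid(image, kernel, stride):
    """Slide kernel over image (both 2-D lists) without padding."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for top in range(0, len(image) - kh + 1, stride):
        out_row = []
        for left in range(0, len(image[0]) - kw + 1, stride):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[top + i][left + j] * kernel[i][j]
            out_row.append(total)
        out.append(out_row)
    return out


# Same 6x6 input as above: the value at (row, col) is col + row / 10
image = [[col + row / 10 for col in range(6)] for row in range(6)]
kernel = [
    [0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 0.0],
]

stride1 = conv2d_valid(image, kernel, 1)
stride3 = conv2d_valid(image, kernel, 3)

print(len(stride1), len(stride1[0]))  # 4 4
print(len(stride3), len(stride3[0]))  # 2 2
```

With a stride of 1 the kernel visits every valid position and the output is 4x4; with a stride of 3 it visits only every third position and the output shrinks to 2x2.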

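### Padding

Padding fills in the missing area when the kernel extends past the edge of the image.
With "SAME", the input is zero-padded so that, at a stride of 1, the output has the same height and width as the input. With "VALID", no padding is added and the kernel is applied only where it fits entirely inside the input.
In practice, "VALID" is preferred when the input, kernel and strides fit together well; otherwise "SAME" can be used to cover the other situations.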
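The effect of the two padding modes on output size can be written down as a small formula. The sketch below follows the output-size rule given in the TensorFlow documentation for one spatial dimension; `conv_output_size` is a hypothetical helper name.

```python
import math


def conv_output_size(input_size, kernel_size, stride, padding):
    """Output length along one spatial dimension of a convolution."""
    if padding == "SAME":
        # Zero-padding is added as needed, so the kernel size does not matter
        return math.ceil(input_size / stride)
    if padding == "VALID":
        # The kernel must fit entirely inside the input
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


print(conv_output_size(6, 3, 1, "VALID"))  # 4
print(conv_output_size(6, 3, 1, "SAME"))   # 6
print(conv_output_size(6, 3, 3, "VALID"))  # 2
print(conv_output_size(6, 3, 3, "SAME"))   # 2
```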

### Common Layers

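##### tf.nn.depthwise_conv2d
Applies its own set of filters to each input channel independently, instead of mixing all input channels into every output channel. It is used when attaching the output of one convolution to the input of another convolution layer.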

##### tf.nn.separable_conv2d
A depthwise convolution followed by a 1x1 pointwise convolution. For large models, it speeds up training without sacrificing accuracy. For small models, it converges quickly but with worse accuracy.
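One way to see where the speed-up can come from is to count weights. The sketch below compares a standard convolution with a separable one (depthwise filters followed by a 1x1 pointwise convolution). The layer sizes are made-up illustration values, biases are ignored, and both function names are hypothetical.

```python
def standard_conv_weights(kh, kw, in_channels, out_channels):
    """Weights in a regular convolution kernel."""
    return kh * kw * in_channels * out_channels


def separable_conv_weights(kh, kw, in_channels, out_channels, multiplier=1):
    """Weights in a depthwise filter plus a 1x1 pointwise filter."""
    depthwise = kh * kw * in_channels * multiplier
    pointwise = in_channels * multiplier * out_channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 64 input channels, 128 output channels
print(standard_conv_weights(3, 3, 64, 128))   # 73728
print(separable_conv_weights(3, 3, 64, 128))  # 8768
```

Fewer weights mean less computation per step, but also less capacity, which is consistent with small models converging quickly to a worse accuracy.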

##### tf.nn.conv2d_transpose
This applies a kernel to a new feature map where each section is filled with the same values as the kernel. As the kernel strides over the new image, any overlapping sections are summed together.
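The summing of overlapping sections is easier to see in one dimension. The following is a minimal sketch of a 1-D transposed convolution without padding: each input value stamps a scaled copy of the kernel into the output, and overlaps are added. `conv1d_transpose` is a hypothetical helper, not the TensorFlow op.

```python
def conv1d_transpose(values, kernel, stride):
    """1-D transposed convolution without padding."""
    out = [0.0] * ((len(values) - 1) * stride + len(kernel))
    for i, v in enumerate(values):
        for j, k in enumerate(kernel):
            # Stamp a copy of the kernel scaled by v; overlaps accumulate
            out[i * stride + j] += v * k
    return out


print(conv1d_transpose([1.0, 2.0], [1.0, 1.0, 1.0], 2))
# [1.0, 1.0, 3.0, 2.0, 2.0]
```

The middle value is 3.0 because the two stamped kernels (scaled by 1.0 and 2.0) overlap at that position.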

### Activation Functions

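Activation functions should meet two conditions:
###### Monotonic
The output always increases or always decreases along with the input, which lets gradient descent search for local minima.
###### Differentiable
The function has a derivative that gradient descent can use to update the weights.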


##### tf.nn.relu
ReLU returns max(0, x), so its output range is [0, +∞). Its drawback is that neurons can become saturated (stuck at an output of 0) when the learning rate is too high.

##### tf.nn.sigmoid
Sigmoid returns a value in the range [0.0, 1.0]. This reduced range of output values can cause trouble: inputs become saturated, and changes in the input become exaggerated.

##### tf.tanh
The hyperbolic tangent function: an S-shaped curve with an output range of [-1.0, 1.0].
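For reference, the three activation functions above can be written in a few lines of plain Python. This is only a scalar sketch of the underlying math, not how TensorFlow implements them.

```python
import math


def relu(x):
    """Rectified linear unit: output range [0, +inf)."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic sigmoid: output between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Hyperbolic tangent: output between -1 and 1."""
    return math.tanh(x)


for x in [-2.0, 0.0, 2.0]:
    print(x, relu(x), sigmoid(x), tanh(x))
```

At an input of 0.0 the three functions return 0.0, 0.5 and 0.0 respectively; for large positive or negative inputs, sigmoid and tanh flatten out, which is the saturation mentioned above.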


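##### tf.nn.dropout
Dropout sets each value to 0 with probability 1 - keep_prob and scales the kept values by 1 / keep_prob, which is why the surviving values in the output below are doubled. This layer performs well in scenarios where a little randomness helps training. An example scenario is when the patterns being learned are too tied to their neighboring features.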

In [36]:
features = tf.constant([-0.1, 0.0, 0.1, 0.2])
sess.run([features, tf.nn.dropout(features, keep_prob=0.5)])

[array([-0.1,  0. ,  0.1,  0.2], dtype=float32),
 array([-0.2,  0. ,  0.2,  0. ], dtype=float32)]
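The behavior visible in the output above (some values zeroed, the rest doubled) can be sketched in plain Python. `dropout` here is a hypothetical stand-in that mirrors the keep-or-zero-then-rescale rule; the seed is arbitrary, and which values survive depends on the random draw.

```python
import random


def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob, scaled by 1 / keep_prob."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)  # arbitrary seed so the sketch is repeatable
features = [-0.1, 0.0, 0.1, 0.2]
dropped = dropout(features, 0.5, rng)
print(dropped)
```

Scaling by 1 / keep_prob keeps the expected sum of the values unchanged, so the next layer sees inputs of a similar magnitude whether or not dropout is active.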

### Pooling Layers
Pooling layers reduce over-fitting and improve performance by reducing the size of
the input.

##### tf.nn.max_pool
Useful when the intensity of the input data is relevant to importance in the image.
A 2x2 max-pooling operation is commonly used.

##### tf.nn.avg_pool
Useful when reducing values where the entire kernel is important, for example, input tensors with a large width and height but small depth.


In [40]:
batch_size = 1
input_height = 3
input_width = 3
input_channels = 1

layer_input = tf.constant([
    [
        [[1.0], [0.2], [1.5]],
        [[0.1], [1.2], [1.4]],
        [[1.1], [0.4], [0.4]]
    ]
])

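# ksize: pool over the whole 3x3 input in a single window
kernel = [batch_size, input_height, input_width, input_channels]
max_pool = tf.nn.max_pool(layer_input, kernel, [1,1,1,1], "VALID")
sess.run(max_pool)  # maximum of the window, 1.5 (not displayed: only the last expression is shown)

avg_pool = tf.nn.avg_pool(layer_input, kernel, [1,1,1,1], "VALID")
sess.run(avg_pool)  # mean of the window, displayed below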

array([[[[ 0.81111115]]]], dtype=float32)
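Both pooling operations slide a window over the input and reduce each window to one number. A minimal single-channel sketch in plain Python, assuming square windows and "VALID" padding (`pool2d` and `mean` are hypothetical helpers):

```python
def pool2d(image, size, stride, reduce_fn):
    """Reduce each size x size window of a 2-D list with reduce_fn."""
    out = []
    for top in range(0, len(image) - size + 1, stride):
        out_row = []
        for left in range(0, len(image[0]) - size + 1, stride):
            window = [
                image[top + i][left + j]
                for i in range(size)
                for j in range(size)
            ]
            out_row.append(reduce_fn(window))
        out.append(out_row)
    return out


def mean(values):
    return sum(values) / len(values)


layer_input = [
    [1.0, 0.2, 1.5],
    [0.1, 1.2, 1.4],
    [1.1, 0.4, 0.4],
]

print(pool2d(layer_input, 3, 1, max))   # [[1.5]]
print(pool2d(layer_input, 3, 1, mean))  # one value, roughly 0.811
print(pool2d(layer_input, 2, 1, max))   # [[1.2, 1.5], [1.2, 1.4]]
```

The last line uses a 2x2 window, the commonly used size, instead of pooling the whole input at once.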

### Normalization
Because the output of ReLU is unbounded, it is often useful to apply some form of normalization to identify high-frequency features.


##### tf.nn.local_response_normalization (tf.nn.lrn)
Local response normalization normalizes values while taking into account the significance of each value

In [41]:
layer_input = tf.constant([
    [[[1.]], [[2.]], [[3.]]]
])

lrn = tf.nn.local_response_normalization(layer_input)
sess.run([layer_input, lrn])

[array([[[[ 1.]],
 
         [[ 2.]],
 
         [[ 3.]]]], dtype=float32), array([[[[ 0.70710677]],
 
         [[ 0.89442718]],
 
         [[ 0.94868326]]]], dtype=float32)]
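The numbers above can be reproduced by hand. The sketch below normalizes the channel values of a single pixel, assuming the documented defaults of `tf.nn.local_response_normalization` (depth_radius=5, bias=1, alpha=1, beta=0.5); the function itself is a plain-Python stand-in.

```python
def local_response_normalization(channels, depth_radius=5, bias=1.0,
                                 alpha=1.0, beta=0.5):
    """Normalize each channel value by its neighbors along the depth axis."""
    out = []
    for d, value in enumerate(channels):
        lo = max(0, d - depth_radius)
        hi = min(len(channels), d + depth_radius + 1)
        sqr_sum = sum(c * c for c in channels[lo:hi])
        out.append(value / (bias + alpha * sqr_sum) ** beta)
    return out


# Each pixel in the example has a single channel
for pixel in [[1.0], [2.0], [3.0]]:
    print(local_response_normalization(pixel))
```

With one channel per pixel this reduces to x / sqrt(1 + x²), giving roughly 0.7071, 0.8944 and 0.9487 for inputs 1, 2 and 3.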

### High Level Layers
High-level layers are useful for avoiding duplicate code while following best practices.

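##### tf.contrib.layers.convolution2d
A wrapper around tf.nn.conv2d that also creates and initializes the weights and bias variables and applies the activation function.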

##### tf.contrib.layers.fully_connected
For CNNs, the last layer is quite often fully connected. The tf.contrib.layers.fully_connected layer offers a convenient shorthand for creating this last layer while following best practices.

In [48]:
image_input = tf.constant([
    [
        [[0., 0., 0.], [255., 255., 255.], [254., 0., 0.]],
        [[0., 191., 0.], [3., 108., 233.], [0., 191., 0.]],
        [[254., 0., 0.], [255., 255., 255.], [0., 0., 0.]]
    ]
])

conv2d = tf.contrib.layers.convolution2d(
    image_input, 
    num_outputs=4,
    kernel_size=(1,1),
    activation_fn=tf.nn.relu,
    stride=(1,1),
    trainable=True
)
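# The layer creates its own weight variables, which must be initialized before use.
# They are initialized randomly, so the values below differ between runs.
sess.run(tf.global_variables_initializer())
sess.run(conv2d)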

array([[[[   0.        ,    0.        ,    0.        ,    0.        ],
         [   0.        ,  308.15246582,  161.81060791,   45.64549255],
         [   0.        ,   14.6790638 ,   73.46196747,    0.        ]],

        [[ 171.01591492,  143.91101074,    0.        ,   19.55454254],
         [  38.12561798,  174.09210205,  151.00965881,  174.52609253],
         [ 171.01591492,  143.91101074,    0.        ,   19.55454254]],

        [[   0.        ,   14.6790638 ,   73.46196747,    0.        ],
         [   0.        ,  308.15246582,  161.81060791,   45.64549255],
         [   0.        ,    0.        ,    0.        ,    0.        ]]]], dtype=float32)

In [51]:
features = tf.constant(
    [[1.2], [3.4]]
)
fc = tf.contrib.layers.fully_connected(features, num_outputs=2)
sess.run(tf.global_variables_initializer())
sess.run(fc)

array([[ 0.17464128,  1.00942397],
       [ 0.49481696,  2.8600347 ]], dtype=float32)
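Under the hood, a fully connected layer computes activation(x · W + b). The plain-Python sketch below assumes a ReLU activation and zero biases, which are the defaults of `tf.contrib.layers.fully_connected`. The weights are hypothetical values chosen only so the result lines up with the output printed above; the real layer draws them randomly.

```python
def fully_connected(inputs, weights, biases):
    """Compute relu(inputs · weights + biases) for a batch of rows."""
    outputs = []
    for row in inputs:
        out_row = []
        for j in range(len(biases)):
            z = sum(x * weights[i][j] for i, x in enumerate(row)) + biases[j]
            out_row.append(max(0.0, z))  # ReLU activation
        outputs.append(out_row)
    return outputs


features = [[1.2], [3.4]]
weights = [[0.1455344, 0.8411866]]  # hypothetical: 1 input x 2 outputs
biases = [0.0, 0.0]                 # assumed zero initial bias

fc = fully_connected(features, weights, biases)
print(fc)
```

Because the bias is zero and both inputs are positive, the second output row is simply the first row scaled by 3.4 / 1.2.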