This page introduces the basics of convolutional neural networks (CNNs), with an implementation in TensorFlow. A simple CNN flow is shown below:

<img src="simple_CNN_flow.png" width="70%">

The parameters of each layer in this flow are listed below:

<img src="paramter_in_each_layer.png" width="70%">



Each CNN layer maps to a TensorFlow function:
convolution layer:      tf.nn.conv2d
activation layer:       tf.nn.relu
pooling layer:          tf.nn.max_pool
fully connected layer:  tf.matmul(x, W) + b
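
To make the last entry concrete, here is a minimal plain-Python sketch of what `tf.matmul(x, W) + b` computes for a batch of row vectors. The helper name `dense` and the sample numbers are only illustrative, they do not come from the notebook.

```python
def dense(x, W, b):
    """Compute x @ W + b for a batch x given as a list of rows."""
    out = []
    for row in x:
        out.append([
            sum(row[i] * W[i][j] for i in range(len(row))) + b[j]
            for j in range(len(b))
        ])
    return out

x = [[1.0, 2.0, 3.0]]          # batch of 1 sample with 3 features
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]               # 3 inputs -> 2 outputs
b = [0.5, -0.5]

y = dense(x, W, b)
print(y)                       # [[4.5, 4.5]]
```

The shapes follow the usual rule: x is [batch, inputs], W is [inputs, outputs], and b is [outputs].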

1. conv2d usage

filter has the shape [filter_height, filter_width, in_channels, out_channels]: height, width, number of image channels, number of filters. Note that this layout is completely different from that of input, which is [batch, in_height, in_width, in_channels].
The layout of a 4-D filter is hard to picture, so the figures below illustrate it.

<img src="multi_D_filter.png" width="70%">


The filter matrix:
<img src="4D_filter.png" width="60%">

One red column is one filter, so two red columns are two filters. The green and yellow columns are the two input channels. (This dimension layout is the hardest part to understand.)
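
The layout can also be checked by hand. The sketch below reshapes a flat list into [filter_height, filter_width, in_channels, out_channels] in row-major order, which is how NumPy's reshape orders elements, and then pulls out one filter at a time. The helper names `unflatten4` and `one_filter` are made up for this illustration.

```python
def unflatten4(flat, shape):
    """Reshape a flat list into nested lists of a 4-D shape (row-major order)."""
    H, W, C, K = shape
    assert len(flat) == H * W * C * K
    return [[[[flat[((h * W + w) * C + c) * K + k] for k in range(K)]
              for c in range(C)]
             for w in range(W)]
            for h in range(H)]

def one_filter(filt, k):
    """Extract filter k as a [height][width][in_channels] block."""
    return [[[cell[k] for cell in col] for col in row] for row in filt]

# height 1, width 2, 2 input channels, 2 filters
filt = unflatten4([1, 1, 1, 1, 0, 0, 0, 0], (1, 2, 2, 2))

print(one_filter(filt, 0))   # [[[1, 1], [0, 0]]]
print(one_filter(filt, 1))   # [[[1, 1], [0, 0]]]
```

The last index selects the filter, so the values of a single filter are spread across the flat list rather than stored next to each other.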

The result:
<img src="conv2D_result.png" width="60%">


In the same way, one red column is the result of one filter.


In [7]:
import tensorflow as tf
import tensorflow.contrib.eager as tfe
import numpy as np
tfe.enable_eager_execution()

# input data: one image (batch = 1), 2 x 2 pixels, 2 channels -> NHWC shape [1, 2, 2, 2]
data = np.array([0, 1, 1, 2, 3, 2, 1, 2], dtype=np.float32)
reshaped_data = data.reshape([1, 2, 2, 2])

print("input data: ")
print(reshaped_data)

# filter: height 1, width 2, 2 input channels, 2 output channels (filters)
filter_values = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=np.float32)
reshaped_filter = filter_values.reshape([1, 2, 2, 2])

print("filter: ")
print(reshaped_filter)

# conv2d(input, filter, strides, padding, use_cudnn_on_gpu (default True), data_format ("NHWC" (default) or "NCHW"), name)
# N: batch size, H: height, W: width, C: channels
convolution = tf.nn.conv2d(reshaped_data, reshaped_filter, [1, 1, 1, 1], padding="SAME")


print("convolution: ")
print(convolution)



input data: 
[[[[0. 1.]
   [1. 2.]]

  [[3. 2.]
   [1. 2.]]]]
filter: 
[[[[1. 1.]
   [1. 1.]]

  [[0. 0.]
   [0. 0.]]]]
convolution: 
tf.Tensor(
[[[[1. 1.]
   [3. 3.]]

  [[5. 5.]
   [3. 3.]]]], shape=(1, 2, 2, 2), dtype=float32)
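
The printed result can be reproduced without TensorFlow. The following is a plain-Python sketch of a stride-1 convolution with zero padding, written under the assumption that when the total padding is odd the extra column goes on the right (and the extra row at the bottom), which is how "SAME" padding is understood here. The function name `conv2d_same` is illustrative only.

```python
def conv2d_same(image, filt):
    """Stride-1 convolution with zero padding that keeps the spatial size.

    image: [height][width][in_channels]
    filt:  [fh][fw][in_channels][out_channels]
    """
    H, W, C = len(image), len(image[0]), len(image[0][0])
    fh, fw, K = len(filt), len(filt[0]), len(filt[0][0][0])
    top, left = (fh - 1) // 2, (fw - 1) // 2   # assumed: extra padding goes bottom/right
    out = [[[0.0] * K for _ in range(W)] for _ in range(H)]
    for i in range(H):
        for j in range(W):
            for di in range(fh):
                for dj in range(fw):
                    y, x = i + di - top, j + dj - left
                    if 0 <= y < H and 0 <= x < W:   # positions outside count as zero
                        for c in range(C):
                            for k in range(K):
                                out[i][j][k] += image[y][x][c] * filt[di][dj][c][k]
    return out

# same values as the cell above, without the batch dimension
image = [[[0.0, 1.0], [1.0, 2.0]],
         [[3.0, 2.0], [1.0, 2.0]]]
filt = [[[[1.0, 1.0], [1.0, 1.0]],
         [[0.0, 0.0], [0.0, 0.0]]]]

result = conv2d_same(image, filt)
print(result)   # [[[1.0, 1.0], [3.0, 3.0]], [[5.0, 5.0], [3.0, 3.0]]]
```

Because the second filter column is all zeros, each output value is just the sum of the two channels of the pixel at the same position, for both filters.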


2. sigmoid, tanh and ReLU usage
-> sigmoid(x) = 1/(1+e^(-x))
-> tanh(x) = sinh(x)/cosh(x) = (e^(x) - e^(-x))/(e^(x) + e^(-x)) = 2 sigmoid(2x) - 1
-> ReLU: f(y) = 0 when y < 0, f(y) = y when y >= 0
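
These formulas can be checked numerically with the standard library alone. The sketch below defines the three functions directly from the definitions above and verifies the identity tanh(x) = 2 sigmoid(2x) - 1 on a few sample points; the sample points are an arbitrary choice.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

xs = [-1.0, 0.0, 1.0, 2.0]

# tanh(x) should equal 2 * sigmoid(2x) - 1 up to floating-point error
for x in xs:
    assert abs(math.tanh(x) - (2.0 * sigmoid(2.0 * x) - 1.0)) < 1e-12

sigmoid_values = [round(sigmoid(x), 4) for x in xs]
relu_values = [relu(x) for x in xs]
print(sigmoid_values)   # [0.2689, 0.5, 0.7311, 0.8808]
print(relu_values)      # [0.0, 0.0, 1.0, 2.0]
```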

In [23]:
import tensorflow as tf
import tensorflow.contrib.eager as tfe
import numpy as np
tfe.enable_eager_execution()

x = tf.range(-1, 3)
y = tf.to_float(x)

z = tf.sigmoid(y)

print(z)

m = tf.tanh(y)

print(m)

n = tf.nn.relu(y)
print(n)


tf.Tensor([0.26894143 0.5        0.7310586  0.8807971 ], shape=(4,), dtype=float32)
tf.Tensor([-0.7615942  0.         0.7615942  0.9640276], shape=(4,), dtype=float32)
tf.Tensor([0. 0. 1. 2.], shape=(4,), dtype=float32)


3. dropout usage
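-> keep_prob: the probability that an element is kept; kept elements are multiplied by 1/keep_prob so that the expected sum stays the same.
-> noise_shape: the shape of the random keep/drop mask, which is broadcast to the shape of the input. If the input is [k,l,m,n] and noise_shape is [k,1,1,n], the keep/drop decisions are made independently along the first and fourth dimensions and shared across the second and third.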
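
The effect of noise_shape is easier to see in a small plain-Python sketch. It is only an illustration of the idea for 2-D input, not TensorFlow's implementation; the function name `dropout`, the fixed seed, and the use of Python's `random` module are assumptions made here.

```python
import random

def dropout(x, keep_prob, noise_shape, rng):
    """Inverted dropout on a 2-D list.

    noise_shape is (rows, cols); a 1 means the mask is shared along that axis.
    """
    mask = [[1.0 if rng.random() < keep_prob else 0.0
             for _ in range(noise_shape[1])]
            for _ in range(noise_shape[0])]
    # the modulo broadcasts a size-1 mask axis over the whole input axis
    return [[x[i][j] * mask[i % noise_shape[0]][j % noise_shape[1]] / keep_prob
             for j in range(len(x[0]))]
            for i in range(len(x))]

a = [[12.0, 11.0, 10.0, 9.0],
     [8.0, 7.0, 6.0, 5.0],
     [4.0, 3.0, 2.0, 1.0],
     [-1.0, -2.0, -3.0, -4.0]]

rng = random.Random(0)                    # fixed seed, only for repeatability
by_row = dropout(a, 0.4, (4, 1), rng)     # each row is kept or dropped as a whole
by_col = dropout(a, 0.4, (1, 4), rng)     # each column is kept or dropped as a whole
print(by_row)
print(by_col)
```

With keep_prob = 0.4 every surviving value is multiplied by 1/0.4 = 2.5, which matches the scaling in the TensorFlow output.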

In [29]:
import tensorflow as tf
import tensorflow.contrib.eager as tfe
import numpy as np
tfe.enable_eager_execution()

a = tf.constant([
                    [12, 11, 10, 9],
                    [8, 7, 6, 5],
                    [4, 3, 2, 1],
                    [-1, -2, -3, -4]
])

a = tf.to_float(a)
print(a, a.shape)

print("#####random dropout#######")
print(tf.nn.dropout(a, 0.2))

print("#####first dim############")
print(tf.nn.dropout(a, 0.4, noise_shape=[4, 1]))

print("#####second dim###########")
print(tf.nn.dropout(a, 0.4, noise_shape=[1, 4]))



W0707 15:21:10.445700 10624 nn_ops.py:4224] Large dropout rate: 0.8 (>0.5). In TensorFlow 2.x, dropout() uses dropout rate instead of keep_prob. Please ensure that this is intended.
W0707 15:21:10.445700 10624 nn_ops.py:4224] Large dropout rate: 0.6 (>0.5). In TensorFlow 2.x, dropout() uses dropout rate instead of keep_prob. Please ensure that this is intended.


tf.Tensor(
[[12. 11. 10.  9.]
 [ 8.  7.  6.  5.]
 [ 4.  3.  2.  1.]
 [-1. -2. -3. -4.]], shape=(4, 4), dtype=float32) (4, 4)
#####random dropout#######
tf.Tensor(
[[  0.  55.   0.   0.]
 [  0.   0.   0.   0.]
 [  0.   0.   0.   0.]
 [ -0. -10.  -0.  -0.]], shape=(4, 4), dtype=float32)
#####first dim############
tf.Tensor(
[[  0.    0.    0.    0. ]
 [  0.    0.    0.    0. ]
 [  0.    0.    0.    0. ]
 [ -2.5  -5.   -7.5 -10. ]], shape=(4, 4), dtype=float32)
#####second dim###########
tf.Tensor(
[[  0.    0.   25.   22.5]
 [  0.    0.   15.   12.5]
 [  0.    0.    5.    2.5]
 [ -0.   -0.   -7.5 -10. ]], shape=(4, 4), dtype=float32)


4. max_pool/avg_pool usage
-> ksize: [1, height, width, 1], the pooling window size
-> strides: [1, stride, stride, 1]
-> padding: "VALID" or "SAME"
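
As a rough illustration of these parameters, the sketch below applies a 2 x 2 max pooling window with stride 2 and no padding ("VALID") to a single-channel 4 x 4 image. It drops the batch and channel dimensions for brevity, and the helper name `max_pool_valid` is made up for this example.

```python
def max_pool_valid(x, k, s):
    """Max pooling without padding.

    x: 2-D list (one image, one channel); k: window size; s: stride.
    """
    out_h = (len(x) - k) // s + 1
    out_w = (len(x[0]) - k) // s + 1
    return [[max(x[i * s + di][j * s + dj]
                 for di in range(k) for dj in range(k))
             for j in range(out_w)]
            for i in range(out_h)]

a = [[12, 11, 10, 9],
     [8, 7, 6, 5],
     [4, 3, 2, 1],
     [-1, -2, -3, -4]]

pooled = max_pool_valid(a, 2, 2)
print(pooled)   # [[12, 10], [4, 2]]
```

Without padding the output size along each axis is (input - window) // stride + 1, so a 4 x 4 input becomes 2 x 2.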


In [13]:
import tensorflow as tf
import tensorflow.contrib.eager as tfe
import numpy as np
tfe.enable_eager_execution()

a = tf.constant([
                  [
                    [[12], [11], [10], [9]],
                    [[8], [7], [6], [5]],
                    [[4], [3], [2], [1]],
                    [[-1], [-2], [-3], [-4]]
                  ]
                ])


a = tf.to_float(a)
#print(a, a.shape)


b = tf.nn.max_pool(a, [1,2,2,1], [1,2,2,1], padding="VALID")

print(a.shape, b.shape)



(1, 4, 4, 1) (1, 2, 2, 1)


Normalization:
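Batch normalization is an algorithm that was created to overcome the difficulty of training neural networks as they get deeper. According to the ICS (internal covariate shift) theory, when the distribution of the training samples differs from that of the target samples, the trained model cannot generalize well.
In a neural network, the input of each layer inevitably ends up with a different distribution from the original input signal after passing through the operations inside the layer, and the shifts introduced by earlier layers are continually accumulated and amplified by the later layers. One way to address this is to correct the training samples according to the ratio between training samples and target samples. The BN algorithm (batch normalization) instead normalizes the inputs of some or all layers, which fixes the mean and variance of each layer's input signal.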

Without normalization, the data becomes less and less effective as more layers are stacked. So before the data goes to the next layer, we normalize it so that it falls in the sensitive region of the activation function. The data then stays effective even after many layers.

<img src="batch_norm.jpg" width="70%">

epsilon (ε) is a small constant that keeps the divisor from being zero.

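Batch normalization is generally applied before the nonlinear mapping (the activation function). It normalizes y = Wx + b so that the result (each dimension of the output signal) has mean 0 and variance 1. Giving the input of every layer a stable distribution helps the network train.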
<img src="batch_norm_layer.png" width="50%">
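
The computation can be sketched in plain Python for a single feature over a small batch. This assumes the usual form scale * (x - mean) / sqrt(variance + ε) + offset with the variance divided by the batch size; the names `batch_norm`, `scale` and `offset` and the sample values are chosen here for illustration.

```python
import math

def batch_norm(column, scale, offset, eps):
    """Normalize one feature over a batch, then apply scale and offset."""
    n = len(column)
    mean = sum(column) / n
    var = sum((v - mean) ** 2 for v in column) / n   # population variance
    return [scale * (v - mean) / math.sqrt(var + eps) + offset for v in column]

values = [1.0, 2.0, 3.0, 4.0]        # one feature across a batch of 4 samples
normalized = batch_norm(values, scale=1.0, offset=0.0, eps=1e-3)

new_mean = sum(normalized) / len(normalized)
new_var = sum((v - new_mean) ** 2 for v in normalized) / len(normalized)
print(normalized)
print(new_mean, new_var)
```

With scale 1 and offset 0 the output mean is 0 and the variance is slightly below 1, because ε is added to the variance before taking the square root.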



In [41]:
import tensorflow as tf
import tensorflow.contrib.eager as tfe
import numpy as np
tfe.enable_eager_execution()


img_shape= [128, 32, 32, 64]
Wx_plus_b = tf.Variable(tf.random_normal(img_shape))
axis = list(range(len(img_shape)-1)) # [0,1,2]: reduce over batch, height and width
wb_mean, wb_var = tf.nn.moments(Wx_plus_b, axis) # per-channel mean and variance, each of shape (64,)

print(wb_mean)
print(wb_var)

variance_epsilon =  0.001
offset_B = tf.Variable(tf.ones([64]))
scale_r = tf.Variable(tf.ones([64]))

Wx_plus_b_normalized = tf.nn.batch_normalization(Wx_plus_b, wb_mean, wb_var, offset_B, scale_r, variance_epsilon)
print(Wx_plus_b_normalized)





tf.Tensor(
[ 3.8289744e-03 -5.6336448e-04  1.4799552e-03  5.7142577e-04
  4.7703541e-04 -2.0087755e-04  6.9286919e-04  1.1301619e-03
  1.3710989e-05 -8.5227739e-04  3.0588850e-03  1.3085750e-03
 -6.7887345e-04  1.5129700e-03  3.1190203e-03 -3.7281897e-03
 -2.3043153e-03  1.9120270e-03  1.3753374e-03 -1.5845909e-03
 -3.1477124e-03  1.1667183e-03  1.5678816e-06  4.1462742e-03
 -1.5225249e-03  1.5106871e-03  1.7305558e-03  5.9000710e-03
 -2.1053595e-03  1.3912013e-03 -1.8897259e-03 -3.6140513e-03
  5.8038225e-03  3.0917746e-03 -4.6636704e-03  6.2597808e-03
 -2.1593431e-03  9.2474639e-04  1.4937918e-03  4.0106038e-03
 -4.3696724e-04 -2.2082906e-03  2.2996292e-03 -1.9521383e-04
  3.8914609e-04  1.6991515e-04 -1.5982927e-03 -1.8002823e-03
 -2.5288896e-03  5.9729507e-03  1.8585534e-03  1.1368748e-03
 -3.2065357e-03 -3.6522895e-03 -4.1670967e-03  2.8582634e-03
  1.3978590e-04 -2.0171325e-03  1.4095200e-03  5.9454900e-04
 -4.8744190e-04  8.1463857e-04 -1.3336164e-03 -1.9655142e-03], shape=(64,)

<img src="batch_norm_example.png" width="50%">



We can now put together a complete CNN example that classifies the MNIST digits.
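
The listing that follows flattens the output of the second pooling layer to 7 * 7 * 64 values. That number can be traced with a short sketch, assuming that with "SAME" padding the output length along one axis is ceil(input / stride). The helper name `same_out` is illustrative.

```python
import math

def same_out(size, stride):
    """Output length along one axis with SAME padding."""
    return math.ceil(size / stride)

side = 28                    # MNIST images are 28 x 28
side = same_out(side, 1)     # conv1, stride 1 -> 28
side = same_out(side, 2)     # pool1, stride 2 -> 14
side = same_out(side, 1)     # conv2, stride 1 -> 14
side = same_out(side, 2)     # pool2, stride 2 -> 7

flat = side * side * 64      # 64 output channels after conv2
print(side, flat)            # 7 3136
```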

In [11]:
import tensorflow as tf
import tensorflow.contrib.eager as tfe
import numpy as np

tfe.enable_eager_execution()
from tensorflow.examples.tutorials.mnist import input_data

class MNIST:
    def __init__(self):
        self.mnist = input_data.read_data_sets('data/MNIST_data/', one_hot=True)
        
        self.train_ds = tf.data.Dataset.from_tensor_slices((self.mnist.train.images, self.mnist.train.labels))\
                        .map(lambda x, y: (x, tf.cast(y, tf.float32)))\
                        .shuffle(buffer_size=1000)\
                        .batch(100)
        
        self.test_ds = tf.data.Dataset.from_tensor_slices((self.mnist.test.images, self.mnist.test.labels))\
                        .map(lambda x, y: (x, tf.cast(y, tf.float32)))\
                        .shuffle(buffer_size=1000)\
                        .batch(100)
        
        self.filter1 = tf.get_variable(name="filter1", shape=[5, 5, 1, 32], dtype=tf.float32)
        self.bias1 = tf.get_variable(name="bias1", shape=[32], dtype=tf.float32)
              
        self.filter2 = tf.get_variable(name="filter2", shape=[5, 5, 32, 64], dtype=tf.float32)
        self.bias2 = tf.get_variable(name="bias2", shape=[64], dtype=tf.float32)

        self.Weight_fc = tf.get_variable(name="full_connection_weight", shape=[7 * 7 * 64, 1024], dtype=tf.float32)
        self.Bias_fc = tf.get_variable(name="full_connection_bias", shape=[1024], dtype=tf.float32)
 
        self.Weight_sm = tf.get_variable(name="softmax_weight", shape=[1024, 10], dtype=tf.float32)
        self.Bias_sm = tf.get_variable(name="softmax_bias", shape=[10], dtype=tf.float32)
               
    def cnn_model(self, image_batch):
        # image_batch: 100 * 784
        input_image = tf.reshape(image_batch, [-1, 28, 28, 1])
        # construct first convolution layer
        # step 1: initial filter
        #input_image = tf.zeros([1, 28, 28, 1], dtype=tf.float32)
        input_image = tf.to_float(input_image)
        conv1_result = tf.nn.conv2d(input_image, self.filter1, strides=[1, 1, 1, 1], padding="SAME")
        relu_result = tf.nn.relu(conv1_result + self.bias1)
        #print(relu_result.shape)
        max_pool_result = tf.nn.max_pool(relu_result, [1, 2, 2, 1], [1, 2, 2, 1], padding="SAME")
        #print(max_pool_result.shape) # (1, 14, 14, 32)

        # construct second convolution layer.
        # step 2: second filter
        conv2_result = tf.nn.conv2d(max_pool_result, self.filter2, strides=[1, 1, 1, 1], padding="SAME")
        relu2_result = tf.nn.relu(conv2_result + self.bias2)
        #print(relu2_result.shape)
        max_pool2_result = tf.nn.max_pool(relu2_result, [1, 2, 2, 1], [1, 2, 2, 1], padding="SAME")
        #print(max_pool2_result.shape)

        # The hard part is keeping track of the dimensions; check them carefully at every step.

        # full connection layer
        full_connection_input = tf.reshape(max_pool2_result, [-1, 7 * 7 * 64])
        full_connection_output = tf.nn.relu(tf.matmul(full_connection_input, self.Weight_fc) + self.Bias_fc)
        #print(full_connection_output.shape)

        # dropout and softmax
        # note: keep_prob is fixed here, so dropout is also applied when predict() calls this method
        keep_prob = 0.5
        softmax_input = tf.nn.dropout(full_connection_output, keep_prob)
        y_conv = tf.nn.softmax(tf.matmul(softmax_input, self.Weight_sm) + self.Bias_sm)
        #print(y_conv, y_conv.shape)

        return y_conv
        
    def cross_entropy(self, image_batch, label_batch):
        y = self.cnn_model(image_batch)
        loss = tf.reduce_mean(-tf.reduce_sum(label_batch * tf.log(y), 1))
        
        return loss
        
    def cal_gradient(self, image_batch, label_batch):
        grad = tfe.implicit_value_and_gradients(self.cross_entropy)
        
        return grad(image_batch, label_batch)    
    
    def train(self):
        optimizer = tf.train.GradientDescentOptimizer(0.5)  # learning rate is 0.5
        for step, (image_batch, label_batch) in enumerate(tfe.Iterator(self.train_ds)):
            loss, grads_and_vars = self.cal_gradient(image_batch, label_batch)
            # reuse the optimizer created above instead of building a new one every step
            optimizer.apply_gradients(grads_and_vars)
            print("step: {} loss: {}".format(step, loss.numpy()))

    def predict(self):
        print(self.mnist.test.images.shape)
        for step, (image_batch, label_batch) in enumerate(tfe.Iterator(self.test_ds)):
            y = self.cnn_model(image_batch)
            correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(label_batch, 1)) # boolean tensor: [True, True, False, ..., True]
            accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))    # cast to [1, 1, 0, ..., 1], then take the mean
            
            print("test accuracy = {}".format(accuracy.numpy()))
           
    
if __name__ == '__main__':
    mnist_model = MNIST()
    mnist_model.train()
    mnist_model.predict()
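
As a sanity check on the loss computed in `cross_entropy`, the sketch below evaluates -sum(label * log(y)) for one sample in plain Python. It assumes a 10-class model that predicts every class with equal probability, which gives ln(10), roughly 2.30; the first losses printed by this cell are in that neighborhood. The standalone function below is an illustration and not the notebook's method.

```python
import math

def cross_entropy(label_one_hot, probs):
    """Cross-entropy of one sample: -sum(label * log(prob))."""
    return -sum(t * math.log(p) for t, p in zip(label_one_hot, probs))

uniform = [0.1] * 10             # every class equally likely
label = [0.0] * 9 + [1.0]        # one-hot label for class 9

loss = cross_entropy(label, uniform)
print(round(loss, 4))            # 2.3026
```

A loss well below this value means the model assigns more than 1/10 probability to the correct class on average.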
    
    
    

Extracting data/MNIST_data/train-images-idx3-ubyte.gz
Extracting data/MNIST_data/train-labels-idx1-ubyte.gz
Extracting data/MNIST_data/t10k-images-idx3-ubyte.gz
Extracting data/MNIST_data/t10k-labels-idx1-ubyte.gz
step: 0 loss: 2.4305596351623535
step: 1 loss: 2.605008602142334
step: 2 loss: 2.4038338661193848
step: 3 loss: 2.3041231632232666
step: 4 loss: 2.376779079437256
step: 5 loss: 2.3150153160095215
step: 6 loss: 2.317183017730713
step: 7 loss: 2.2991952896118164
step: 8 loss: 2.2660367488861084
step: 9 loss: 2.29044246673584
step: 10 loss: 2.233370065689087
step: 11 loss: 2.174989700317383
step: 12 loss: 2.1273531913757324
step: 13 loss: 2.019120693206787
step: 14 loss: 1.9836868047714233
step: 15 loss: 2.243748188018799
step: 16 loss: 2.1470084190368652
step: 17 loss: 1.798295259475708
step: 18 loss: 1.791310429573059
step: 19 loss: 2.3259949684143066
step: 20 loss: 2.201274871826172
step: 21 loss: 2.0483617782592773
step: 22 loss: 1.892544150352478
step: 23 loss: 2.1100966930

step: 230 loss: 0.055690936744213104
step: 231 loss: 0.2711726129055023
step: 232 loss: 0.11379889398813248
step: 233 loss: 0.11739370971918106
step: 234 loss: 0.07144855707883835
step: 235 loss: 0.160944402217865
step: 236 loss: 0.26357725262641907
step: 237 loss: 0.10892410576343536
step: 238 loss: 0.1572379171848297
step: 239 loss: 0.19911348819732666
step: 240 loss: 0.33268195390701294
step: 241 loss: 0.4183018207550049
step: 242 loss: 0.38179272413253784
step: 243 loss: 0.20210213959217072
step: 244 loss: 0.14793500304222107
step: 245 loss: 0.16565610468387604
step: 246 loss: 0.11878971755504608
step: 247 loss: 0.23818595707416534
step: 248 loss: 0.24248270690441132
step: 249 loss: 0.15782903134822845
step: 250 loss: 0.08866970986127853
step: 251 loss: 0.2175481915473938
step: 252 loss: 0.07118186354637146
step: 253 loss: 0.0911497175693512
step: 254 loss: 0.18688569962978363
step: 255 loss: 0.18400244414806366
step: 256 loss: 0.2608906626701355
step: 257 loss: 0.2703910171985626


step: 460 loss: 0.13307665288448334
step: 461 loss: 0.10938932746648788
step: 462 loss: 0.03443401679396629
step: 463 loss: 0.06547408550977707
step: 464 loss: 0.03877031430602074
step: 465 loss: 0.023705828934907913
step: 466 loss: 0.10590185225009918
step: 467 loss: 0.0684177428483963
step: 468 loss: 0.07310835272073746
step: 469 loss: 0.0306281466037035
step: 470 loss: 0.21698857843875885
step: 471 loss: 0.2126525491476059
step: 472 loss: 0.292464017868042
step: 473 loss: 0.12313944101333618
step: 474 loss: 0.09717605262994766
step: 475 loss: 0.07198337465524673
step: 476 loss: 0.1158265620470047
step: 477 loss: 0.08269722014665604
step: 478 loss: 0.09573327004909515
step: 479 loss: 0.08865087479352951
step: 480 loss: 0.11767527461051941
step: 481 loss: 0.04076649621129036
step: 482 loss: 0.12104841321706772
step: 483 loss: 0.12847387790679932
step: 484 loss: 0.021216420456767082
step: 485 loss: 0.07959320396184921
step: 486 loss: 0.04996421933174133
step: 487 loss: 0.02918371744453