![alt text](https://adeshpande3.github.io/assets/Cover.png)
# *Reference*： https://www.tensorflow.org/tutorials/quickstart/advanced




# **Import Libraries**

In [0]:
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPool2D, Dropout
from tensorflow.keras import Model

# **Load the dataset and split it**<br>
There are 60,000 images in the training set, 10,000 images in the test set<br> Each image is represented as 28 x 28 pixels<br>
x_train.shape = (60000, 28, 28)<br>
x_test.shape = (10000, 28, 28)

In [0]:
dataset = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = dataset.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


Pixel values are 0 to 255. 0 means background (white), 255 means foreground (black).<br>
Normalize the data, turn the integers in x_train and x_test into floating points between 0-1









In [0]:
x_train, x_test = x_train/255.0, x_test/255.0

The Shape of an Image Tensor: [Sample, length, width, Channel]<br>
Sample : number of images<br>
Channel: number of color channels<br>
Since the gray-scale picture is read here, the number of color channels is 1 (the number of color channels of a color RGB image is 3)<br>
x_train.shape = (60000, 28, 28, 1)<br>
x_test.shape = (10000, 28, 28, 1)

In [0]:
x_train = x_train[..., tf.newaxis]  
x_test = x_test[..., tf.newaxis]

# **Data preprocessing: shuffle the dataset and batch**
tf.data.Dataset.from_tensor_slices(), suitable for the case where the amount of data is small (it can be loaded into memory as a whole)<br>
When providing multiple tensors as input, the 0th dimension of tensors must be the same, and multiple tensors must be used as tuples<br>

In [0]:
batch_size = 128
train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle(10000).batch(batch_size)
test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(batch_size)

# **Build the model**

# **Zero Padding**<br> 
With Zero Padding, the convolution layer is the same size as the original
image (or the previous layer).<br>
Kernel size (Receptive field): [4, 4]<br>
![alt text](https://xrds.acm.org/blog/wp-content/uploads/2016/06/Figure_3.png)

# **Max Pooling**<br> 
MaxPooling is the process of downsampling the image. For all values in each sliding window, the maximum value is output.
![alt text](https://cdn-images-1.medium.com/freeze/max/1000/1*GksqN5XY8HPpIddm5wzm7A.jpeg?q=20)

# **Activation functions**<br>
The activation function is to increase the nonlinearity of the neural network model. Each layer without an activation function is equivalent to matrix multiplication

# **Why ReLU?**
1.   When using functions such as sigmoid, the calculation amount is large, and using the Relu activation function, the calculation amount is saved a lot.
2.   For deep networks, when the sigmoid function is back-propagated, it is easy for the gradient to disappear (when the sigmoid is close to the saturation region, the transformation is too slow, the derivative tends to 0, this situation will cause information loss), and the deep layer cannot be completed Network training.
3.   Relu will make the output of some neurons to 0, which causes the sparsity of the network, and reduces the interdependence of parameters, alleviating the occurrence of overfitting problems.




![alt text](https://miro.medium.com/max/1192/1*4ZEDRpFuCIpUjNgjDdT2Lg.png)

In [0]:
class CNN(Model):
    def __init__(self):
        super().__init__()
        self.conv1 = Conv2D(
            filters=64,     # Number of convolutional neurons (convolution kernel)      
            kernel_size=[5, 5], # Receptive Field    
            padding='same',   # zero padding    
            activation=tf.nn.relu   
        )
        self.max_pool1 = MaxPool2D(pool_size=[2, 2], strides=2)
        self.conv2 = Conv2D(
            filters=128,
            kernel_size=[5, 5],
            padding='same',
            activation=tf.nn.relu
        )
        self.max_pool2 = MaxPool2D(pool_size=[2, 2], strides=2)
        self.flatten = Flatten()
        self.dense1 = Dense(units=1024, activation=tf.nn.relu)
        self.dropout = Dropout(rate = 0.5) # prevent overfitting
        self.dense2 = Dense(units=10)

    def call(self, inputs, training):
        x = self.conv1(inputs)    # [batch_size, 28, 28, 32]
        x = self.max_pool1(x)     # [batch_size, 14, 14, 32]
        x = self.conv2(x)       # [batch_size, 14, 14, 64]
        x = self.max_pool2(x)     # [batch_size, 7, 7, 64]
        x = self.flatten(x)      # [batch_size, 7 * 7 * 64]
        x = self.dense1(x)      # [batch_size, 1024] 
        if(training == True):
            x = self.dropout(x)  # [batch_size, 1024]
        x = self.dense2(x)      # [batch_size, 10], 10 represents the probability that this picture belongs to 0 to 9
        output = tf.nn.softmax(x)  # After using softmax, each element in this vector is between [0, 1] and the sum of all elements of this vector is 1
        return output

model = CNN()
# model.summary()

# **Train and test the model**

tf.keras.losses.SparseCategoricalCrossentropy(): the closer the predicted probability distribution and the true distribution are, the smaller the value of cross-entropy and vice versa (the function expects labels to be provided as integers)<br>
Note: Using from_logits=True may be more numerically stable

In [0]:
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
learning_rate = 0.001
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

tf.keras.metrics.SparseCategoricalAccuracy(): compare the predicted results of the model with the real results, and output the ratio of the number of samples predicted correctly to the total number of samples

In [0]:
train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')

test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='test_accuracy')

tf.GradientTape() can record the calculation steps and operations in its context and be used for automatic derivation
1.   Use tape.gradient() to automatically calculate the gradient
2.   Use optimizer.apply_gradients() to automatically update model parameters

In [0]:
@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
    # training=True is only needed if there are layers with different
    # behavior during training versus inference (e.g. Dropout).
        predictions = model(images, training=True)
        loss = loss_object(labels, predictions)

    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    
    train_loss(loss)
    train_accuracy(labels, predictions)

In [0]:
@tf.function
def test_step(images, labels):
    # training=False is only needed if there are layers with different
    # behavior during training versus inference (e.g. Dropout).
    predictions = model(images, training=False)
    loss = loss_object(labels, predictions)

    test_loss(loss)
    test_accuracy(labels, predictions)

In [0]:
num_epoch = 5

for epoch in range(num_epoch):
    # Reset the metrics at the start of the next epoch
    train_loss.reset_states()
    train_accuracy.reset_states()
    test_loss.reset_states()
    test_accuracy.reset_states()

    for train_images, train_labels in train_ds:
        train_step(train_images, train_labels)

    for test_images, test_labels in test_ds:
        test_step(test_images, test_labels)
        
    template = 'Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, Test Accuracy: {}'
    print(template.format(epoch+1, train_loss.result(), train_accuracy.result()*100, test_loss.result(), test_accuracy.result()*100))

print('End')



To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

Epoch 1, Loss: 1.530685544013977, Accuracy: 93.25833129882812, Test Loss: 1.4822640419006348, Test Accuracy: 97.93999481201172
Epoch 2, Loss: 1.4836128950119019, Accuracy: 97.79666900634766, Test Loss: 1.4737579822540283, Test Accuracy: 98.79000091552734
Epoch 3, Loss: 1.47808039188385, Accuracy: 98.3316650390625, Test Loss: 1.4733924865722656, Test Accuracy: 98.77999877929688
Epoch 4, Loss: 1.4762775897979736, Accuracy: 98.50833129882812, Test Loss: 1.4718986749649048, Test Accuracy: 98.94999694824219
Epoch 5, Loss: 1.4743982553482056, Accuracy: 98.67666625976562, Test Loss: 1.4702008962631226, Test Accuracy: 99.0999984741211
End
