<a href="https://colab.research.google.com/github/NurFortuna/Deep_Learning_with_Tensorflow_notes/blob/main/Deep_Learning_applied_on_MNIST.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install grpcio==1.24.3
!pip install tensorflow==2.2.0

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting grpcio==1.24.3
  Downloading grpcio-1.24.3-cp38-cp38-manylinux2010_x86_64.whl (2.2 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.2/2.2 MB[0m [31m10.1 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: grpcio
  Attempting uninstall: grpcio
    Found existing installation: grpcio 1.51.1
    Uninstalling grpcio-1.51.1:
      Successfully uninstalled grpcio-1.51.1
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
grpcio-status 1.48.2 requires grpcio>=1.48.2, but you have grpcio 1.24.3 which is incompatible.
google-cloud-bigquery 3.4.1 requires grpcio<2.0dev,>=1.47.0, but you have grpcio 1.24.3 which is incompatible.[0m[31m
[0mSuccessfully installed grpcio-1.24.3
Looking in indexes: https://pypi.org/simple, https://u

In [39]:
import tensorflow as tf
from IPython.display import Markdown, display

In [40]:
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [41]:
x_train, x_test = x_train / 255.0, x_test / 255.0

In [42]:
print("categorical labels")
print(y_train[0:5])

# make labels one hot encoded
y_train = tf.one_hot(y_train, 10)
y_test = tf.one_hot(y_test, 10)

print("one hot encoded labels")
print(y_train[0:5])

categorical labels
[5 0 4 1 9]
one hot encoded labels
tf.Tensor(
[[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]], shape=(5, 10), dtype=float32)


In [44]:
#Initial parameters
width = 28 # width of the image in pixels 
height = 28 # height of the image in pixels
flat = width * height # number of pixels in one image 
class_output = 10 # number of possible classifications for the problem

In [45]:
#Converting images of the data set to tensors
x_image_train = tf.reshape(x_train, [-1,28,28,1])  
x_image_train = tf.cast(x_image_train, 'float32') 

x_image_test = tf.reshape(x_test, [-1,28,28,1]) 
x_image_test = tf.cast(x_image_test, 'float32') 

#creating new dataset with reshaped inputs
train_ds = tf.data.Dataset.from_tensor_slices((x_image_train, y_train)).batch(50)
test_ds = tf.data.Dataset.from_tensor_slices((x_image_test, y_test)).batch(50)

In [46]:
x_image_train = tf.slice(x_image_train,[0,0,0,0],[10000, 28, 28, 1])
y_train = tf.slice(y_train,[0,0],[10000, 10])

### Convolutional Layer 1 ###

We define a kernel here. The Size of the filter/kernel is 5x5; Input channels is 1 (grayscale); and we need 32 different feature maps (here, 32 feature maps means 32 different filters are applied on each image. So, the output of convolution layer would be 28x28x32). In this step, we create a filter / kernel tensor of shape [filter_height, filter_width, in_channels, out_channels]

In [47]:
W_conv1 = tf.Variable(tf.random.truncated_normal([5, 5, 1, 32], stddev=0.1, seed=0))
b_conv1 = tf.Variable(tf.constant(0.1, shape=[32])) # need 32 biases for 32 outputs

tensor of shape [batch, in_height, in_width, in_channels]. x of shape [batch_size,28 ,28, 1]

a filter / kernel tensor of shape [filter_height, filter_width, in_channels, out_channels]. W is of size [5, 5, 1, 32]

stride which is [1, 1, 1, 1]. The convolutional layer, slides the "kernel window" across the input tensor. As the input tensor has 4 dimensions: [batch, height, width, channels], then the convolution operates on a 2D window on the height and width dimensions. strides determines how much the window shifts by in each of the dimensions. As the first and last dimensions are related to batch and channels, we set the stride to 1. But for second and third dimension, we could set other values, e.g. [1, 2, 2, 1]

Process:

Change the filter to a 2-D matrix with shape [5*5*1,32]

Extracts image patches from the input tensor to form a virtual tensor of shape [batch, 28, 28, 5*5*1].

For each batch, right-multiplies the filter matrix and the image vector.

In [48]:
def convolve1(x):
    return(
        tf.nn.conv2d(x, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1)

![resim](https://www.nomidl.com/wp-content/uploads/2022/04/image-10.png)

In [49]:
#Apply the ReLU activation Function
def h_conv1(x): return(tf.nn.relu(convolve1(x)))

 ### Apply the max pooling ###
 It partitions the input image into a set of rectangles and, and then find the maximum value for that region.

 Lets use tf.nn.max_pool function to perform max pooling. Kernel size: 2x2 (if the window is a 2x2 matrix, it would result in one output pixel)

 Strides: dictates the sliding behaviour of the kernel. In this case it will move 2 pixels everytime, thus not overlapping. The input is a matrix of size 28x28x32, and the output would be a matrix of size 14x14x32.

 ![resim](https://production-media.paperswithcode.com/methods/MaxpoolSample2.png)

In [50]:
def conv1(x):
    return tf.nn.max_pool(h_conv1(x), ksize=[1, 2, 2, 1], 
                          strides=[1, 2, 2, 1], padding='SAME')

### Convolutional Layer 2 ###

Filter/kernel: 5x5 (25 pixels)

Input channels: 32 (from the 1st Conv layer, we had 32 feature maps)

64 output feature maps

In [51]:
W_conv2 = tf.Variable(tf.random.truncated_normal([5, 5, 32, 64], stddev=0.1, seed=1))
b_conv2 = tf.Variable(tf.constant(0.1, shape=[64])) #need 64 biases for 64 outputs

In [52]:
#Convolve image with weight tensor and add biases.
def convolve2(x): 
    return( 
    tf.nn.conv2d(conv1(x), W_conv2, strides=[1, 1, 1, 1], padding='SAME') + b_conv2)

In [53]:
#Apply the ReLU activation Function
def h_conv2(x):  return tf.nn.relu(convolve2(x))

In [54]:
#Apply the max pooling
def conv2(x):  
    return(
    tf.nn.max_pool(h_conv2(x), ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME'))

#### Fully Connected Layer ####

You need a fully connected layer to use the Softmax and create the probabilities in the end. Fully connected layers take the high-level filtered images from previous layer, that is all 64 matrices, and convert them to a flat array.

So, each matrix [7x7] will be converted to a matrix of [49x1], and then all of the 64 matrix will be connected, which make an array of size [3136x1]. We will connect it into another layer of size [1024x1]. So, the weight between these 2 layers will be [3136x1024]


In [55]:
#Flattening Second Layer
def layer2_matrix(x): return tf.reshape(conv2(x), [-1, 7 * 7 * 64])

In [56]:
#Weights and Biases between layer 2 and 3
W_fc1 = tf.Variable(tf.random.truncated_normal([7 * 7 * 64, 1024], stddev=0.1, seed = 2))
b_fc1 = tf.Variable(tf.constant(0.1, shape=[1024])) # need 1024 biases for 1024 outputs

In [57]:
#Matrix Multiplication (applying weights and biases)
def fcl(x): return tf.matmul(layer2_matrix(x), W_fc1) + b_fc1

In [58]:
#Apply the ReLU activation Function
def h_fc1(x): return tf.nn.relu(fcl(x))

#### Dropout Layer, Optional phase for reducing overfitting ####

It is a phase where the network "forget" some features. At each training step in a mini-batch, some units get switched off randomly so that it will not interact with the network. That is, it weights cannot be updated, nor affect the learning of the other network nodes. This can be very useful for very large neural networks to prevent overfitting.

In [59]:
keep_prob=0.5
def layer_drop(x): return tf.nn.dropout(h_fc1(x), keep_prob)

In [60]:
#Softmax Layer
#Input channels: 1024 (neurons from the 3rd Layer); 10 output features

In [61]:
W_fc2 = tf.Variable(tf.random.truncated_normal([1024, 10], stddev=0.1, seed = 2)) #1024 neurons
b_fc2 = tf.Variable(tf.constant(0.1, shape=[10])) # 10 possibilities for digits [0,1,2,3,4,5,6,7,8,9]

In [62]:
#Matrix Multiplication (applying weights and biases)
def fc(x): return tf.matmul(layer_drop(x), W_fc2) + b_fc2

In [63]:
#Apply the Softmax activation Function
def y_CNN(x): return tf.nn.softmax(fc(x))

#### Define the loss function ####

In [64]:
def cross_entropy(y_label, y_pred):
    return (-tf.reduce_sum(y_label * tf.math.log(y_pred + 1.e-10)))

#### Define the optimizer ####

It is obvious that we want minimize the error of our network which is calculated by cross_entropy metric. To solve the problem, we have to compute gradients for the loss (which is minimizing the cross-entropy) and apply gradients to variables. It will be done by an optimizer: GradientDescent or Adagrad.

In [65]:
optimizer = tf.keras.optimizers.Adam(1e-4)

In [66]:
variables = [W_conv1, b_conv1, W_conv2, b_conv2, 
             W_fc1, b_fc1, W_fc2, b_fc2, ]

def train_step(x, y):
    with tf.GradientTape() as tape:
        current_loss = cross_entropy( y, y_CNN( x ))
        grads = tape.gradient( current_loss , variables )
        optimizer.apply_gradients( zip( grads , variables ) )
        return current_loss.numpy()


In [67]:
#Define prediction
correct_prediction = tf.equal(tf.argmax(y_CNN(x_image_train), axis=1), tf.argmax(y_train, axis=1))

In [68]:
#Define accuracy
accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float32'))

In [70]:
#train
loss_values=[]
accuracies = []
epochs = 1

for i in range(epochs):
    j=0
    # each batch has 50 examples
    for x_train_batch, y_train_batch in train_ds:
        j+=1
        current_loss = train_step(x_train_batch, y_train_batch)
        if j%50==0: #reporting intermittent batch statistics
            correct_prediction = tf.equal(tf.argmax(y_CNN(x_train_batch), axis=1),
                                  tf.argmax(y_train_batch, axis=1))
            #  accuracy
            accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)).numpy()
            print("epoch ", str(i), "batch", str(j), "loss:", str(current_loss),
                     "accuracy", str(accuracy)) 
            
    current_loss = cross_entropy( y_train, y_CNN( x_image_train )).numpy()
    loss_values.append(current_loss)
    correct_prediction = tf.equal(tf.argmax(y_CNN(x_image_train), axis=1),
                                  tf.argmax(y_train, axis=1))
    #  accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)).numpy()
    accuracies.append(accuracy)
    print("end of epoch ", str(i), "loss", str(current_loss), "accuracy", str(accuracy) )  

epoch  0 batch 50 loss: 93.19175 accuracy 0.54
epoch  0 batch 100 loss: 50.721977 accuracy 0.8
epoch  0 batch 150 loss: 28.514547 accuracy 0.78
epoch  0 batch 200 loss: 20.925966 accuracy 0.86
epoch  0 batch 250 loss: 31.7288 accuracy 0.84
epoch  0 batch 300 loss: 19.00697 accuracy 0.9
epoch  0 batch 350 loss: 30.449793 accuracy 0.84
epoch  0 batch 400 loss: 15.62064 accuracy 0.86
epoch  0 batch 450 loss: 23.445072 accuracy 0.84
epoch  0 batch 500 loss: 12.635733 accuracy 0.86
epoch  0 batch 550 loss: 7.8887844 accuracy 0.94
epoch  0 batch 600 loss: 21.098469 accuracy 0.86
epoch  0 batch 650 loss: 22.575092 accuracy 0.9
epoch  0 batch 700 loss: 9.838955 accuracy 0.9
epoch  0 batch 750 loss: 15.91292 accuracy 0.9
epoch  0 batch 800 loss: 16.354631 accuracy 0.94
epoch  0 batch 850 loss: 19.517063 accuracy 0.9
epoch  0 batch 900 loss: 9.14138 accuracy 0.9
epoch  0 batch 950 loss: 11.600172 accuracy 0.94
epoch  0 batch 1000 loss: 7.836789 accuracy 0.92
epoch  0 batch 1050 loss: 5.4864144 a

In [72]:
#Evaluate the model
j=0
acccuracies=[]
# evaluate accuracy by batch and average...reporting every 100th batch
for x_train_batch, y_train_batch in train_ds:
        j+=1
        correct_prediction = tf.equal(tf.argmax(y_CNN(x_train_batch), axis=1),
                                  tf.argmax(y_train_batch, axis=1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)).numpy()
        #accuracies.append(accuracy)
        if j%100==0:
            print("batch", str(j), "accuracy", str(accuracy) ) 
import numpy as np
print("accuracy of entire set", str(np.mean(accuracies)))    

batch 100 accuracy 0.96
batch 200 accuracy 0.92
batch 300 accuracy 0.96
batch 400 accuracy 0.98
batch 500 accuracy 0.94
batch 600 accuracy 0.94
batch 700 accuracy 0.98
batch 800 accuracy 0.96
batch 900 accuracy 0.92
batch 1000 accuracy 0.96
batch 1100 accuracy 0.9
batch 1200 accuracy 0.94
accuracy of entire set 0.945
