# Deep Neural Network for MNIST Classification

We'll apply all the knowledge from the lectures in this section to write a deep neural network. The problem we've chosen is referred to as the "Hello World" of deep learning because for most students it is the first deep learning algorithm they see.

The dataset is called MNIST, and the task is handwritten digit recognition. You can find more about it on Yann LeCun's website (Director of AI Research, Facebook). He is one of the pioneers of what we've been talking about and of more complex approaches that are widely used today, such as convolutional neural networks (CNNs).

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image). 

The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes. 

Our goal is to build a neural network with 3 hidden layers, matching the model defined in the Model section.

### Importing Packages

In [4]:
# !pip install tensorflow-datasets

import numpy as np
import tensorflow as tf

import tensorflow_datasets as tfds

### Data

In [5]:
# as_supervised=True returns each example as an (image, label) tuple instead of a dictionary
# with_info=True also returns a DatasetInfo object describing the dataset (splits, features, sizes)

mnist_data, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)

In [6]:
mnist_info

tfds.core.DatasetInfo(
    name='mnist',
    full_name='mnist/3.0.1',
    description="""
    The MNIST database of handwritten digits.
    """,
    homepage='http://yann.lecun.com/exdb/mnist/',
    data_dir='/home/alexender/tensorflow_datasets/mnist/3.0.1',
    file_format=tfrecord,
    download_size=11.06 MiB,
    dataset_size=21.00 MiB,
    features=FeaturesDict({
        'image': Image(shape=(28, 28, 1), dtype=uint8),
        'label': ClassLabel(shape=(), dtype=int64, num_classes=10),
    }),
    supervised_keys=('image', 'label'),
    disable_shuffling=False,
    splits={
        'test': <SplitInfo num_examples=10000, num_shards=1>,
        'train': <SplitInfo num_examples=60000, num_shards=1>,
    },
    citation="""@article{lecun2010mnist,
      title={MNIST handwritten digit database},
      author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
      journal={ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist},
      volume={2},
      year={2010}
    }""",
)

In [7]:
# We can split the train and test sets from data 

mnist_train, mnist_test = mnist_data['train'], mnist_data['test']

In [8]:
# The dataset ships only 'train' and 'test' splits, so we reserve 10% of the training data for validation.
# The counts come from the dataset info and are cast to int64 with TensorFlow.

num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)

In [9]:
def scale(image, label):
    """Cast an MNIST image to float32 and scale its pixel values to the range [0, 1].
    The label is returned unchanged."""
    image = tf.cast(image, tf.float32)

    # Each cell of the 28x28 image holds an integer from 0 to 255, so dividing by 255 maps it to [0, 1]
    image = image / 255.
    return image, label
    

In [10]:
scaled_train_and_validation_data = mnist_train.map(scale)
scaled_test_data = mnist_test.map(scale)
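
To see what the scaling step does to pixel values, here is a minimal NumPy sketch on a made-up 2x2 array. The array and its values are illustrative assumptions, not MNIST data. It mirrors the cast-then-divide logic of `scale` without needing TensorFlow.

```python
import numpy as np

# A made-up 2x2 "image" with uint8 pixel intensities (0 = black, 255 = white).
pixels = np.array([[0, 51], [204, 255]], dtype=np.uint8)

# Cast to float first, then divide by 255 so every value lands in [0, 1].
scaled = pixels.astype(np.float32) / 255.0

print(scaled.dtype)
print(scaled)
```

Casting before dividing keeps the result in floating point, which is what the dense layers of the network expect as input.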

In [11]:
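# Shuffling the data once. reshuffle_each_iteration=False keeps the shuffled order fixed across epochs,
# so the take/skip split in the next cell always selects the same validation samples.
# With the default (reshuffling on every pass), validation images could end up in later training epochs.

BUFFER_SIZE = 10000
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE, reshuffle_each_iteration=False)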
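
The role of `BUFFER_SIZE` is easier to see in a plain-Python sketch of buffer-based shuffling. This is a simplified illustration of the idea (fill a buffer, emit a random element, replace it with the next incoming one), not TensorFlow's actual implementation. `buffered_shuffle` is a hypothetical helper name, and it assumes `buffer_size` is at least 1.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in pseudo-random order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Still filling the buffer: nothing is emitted yet.
            buffer.append(item)
            continue
        # Buffer is full: emit a random element and put the new item in its slot.
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: emit whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer

shuffled = list(buffered_shuffle(range(20), buffer_size=5, seed=0))
print(shuffled)
```

In this sketch, a buffer smaller than the dataset gives only an approximate shuffle: the first emitted element always comes from the first `buffer_size` inputs. A buffer at least as large as the dataset yields a uniform shuffle, at the cost of holding everything in memory.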

In [12]:
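# Splitting off the validation set: the first num_validation_samples examples become validation data, the rest stay as training data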

validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)

In [13]:
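# Batching the training data for mini-batch gradient descent.
# Validation and test data are each placed in a single batch, since they are only used for forward passes.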

BATCH_SIZE = 100
train_data = train_data.batch(BATCH_SIZE)
validation_data = validation_data.batch(num_validation_samples)
test_data = scaled_test_data.batch(num_test_samples)

validation_inputs, validation_targets = next(iter(validation_data))

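
The sizes implied by the split and batching steps can be checked with plain arithmetic. The sketch below hard-codes the 60,000 training examples reported by `mnist_info`; the variable names are illustrative and not part of the notebook.

```python
TRAIN_EXAMPLES = 60000        # mnist_info.splits['train'].num_examples
VALIDATION_FRACTION = 0.1
BATCH_SIZE = 100

# 10% of the training split is held out for validation.
num_validation = int(VALIDATION_FRACTION * TRAIN_EXAMPLES)
num_train = TRAIN_EXAMPLES - num_validation

# Number of mini-batches per epoch (ceiling division covers a partial last batch).
steps_per_epoch = -(-num_train // BATCH_SIZE)

print(num_validation, num_train, steps_per_epoch)
```

The step count is the number of weight updates per epoch, and it corresponds to the `540/540` progress counter in the training log.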


### Model

#### Outline of the model

In [19]:
input_size = 784
output_size = 10
hidden_layer_size = 200

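model = tf.keras.Sequential([
    # Declaring the input shape with an Input object avoids passing input_shape to a layer
    tf.keras.Input(shape=(28, 28, 1)),
    # Flatten each 28x28x1 image into a vector of 784 values
    tf.keras.layers.Flatten(),
    # Three hidden layers with ReLU activations
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    # Output layer: softmax turns the 10 outputs into class probabilities
    tf.keras.layers.Dense(output_size, activation='softmax'),
])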

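
To make the data flow through this architecture concrete, here is a NumPy sketch of the forward pass: flatten, three ReLU layers of 200 units, then a softmax over 10 outputs. The random weights and the random input batch are stand-ins (assumptions for illustration only), so the probabilities are meaningless. The point is the shapes and the operations.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [784, 200, 200, 200, 10]  # input, three hidden layers, output

# Random weights stand in for trained parameters.
weights = [rng.normal(0.0, 0.05, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(images):
    # Flatten: (batch, 28, 28, 1) -> (batch, 784)
    x = images.reshape(images.shape[0], -1)
    # Hidden layers: affine transform followed by ReLU
    for w, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(x @ w + b, 0.0)
    # Output layer: softmax (subtracting the max improves numerical stability)
    logits = x @ weights[-1] + biases[-1]
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

batch = rng.random((4, 28, 28, 1))  # four fake "images" with values in [0, 1)
probs = forward(batch)
print(probs.shape)
```

Each row of `probs` sums to 1, which is what lets the output be read as a probability distribution over the 10 digit classes.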


#### Choosing Optimizer and Loss Function

In [20]:
# This is a classification problem with integer labels (0-9), so we use 'sparse_categorical_crossentropy':
# it computes categorical cross-entropy without requiring one-hot encoded targets
# For the optimizer we use Adam, a widely used adaptive optimizer; a custom learning rate could be set like this:
# custom_optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
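
The difference between the sparse and one-hot forms of cross-entropy is only in how the targets are encoded. The NumPy sketch below uses made-up probabilities for a 3-class problem (an assumption to keep the numbers readable) and shows that both forms give the same per-example loss.

```python
import numpy as np

# Made-up softmax outputs for two examples and three classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
labels = np.array([0, 2])  # integer class labels, as in MNIST

# Sparse form: pick the predicted probability of the true class by index.
sparse_loss = -np.log(probs[np.arange(len(labels)), labels])

# One-hot form: encode the labels first, then sum over classes.
one_hot = np.eye(probs.shape[1])[labels]
categorical_loss = -(one_hot * np.log(probs)).sum(axis=1)

print(sparse_loss, categorical_loss, sparse_loss.mean())
```

Since MNIST labels arrive as integers, the sparse form saves the one-hot encoding step.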

#### Training 

In [21]:
NUM_EPOCH = 10

# EarlyStopping monitors val_loss by default and stops training after 2 epochs without improvement
model.fit(train_data, epochs=NUM_EPOCH, callbacks=[tf.keras.callbacks.EarlyStopping(patience=2)], verbose=2, validation_data=(validation_inputs, validation_targets))

Epoch 1/10




540/540 - 7s - 14ms/step - accuracy: 0.9228 - loss: 0.2623 - val_accuracy: 0.9570 - val_loss: 0.1431
Epoch 2/10
540/540 - 2s - 3ms/step - accuracy: 0.9681 - loss: 0.1022 - val_accuracy: 0.9693 - val_loss: 0.1001
Epoch 3/10
540/540 - 4s - 8ms/step - accuracy: 0.9778 - loss: 0.0683 - val_accuracy: 0.9778 - val_loss: 0.0764
Epoch 4/10
540/540 - 2s - 4ms/step - accuracy: 0.9838 - loss: 0.0514 - val_accuracy: 0.9848 - val_loss: 0.0498
Epoch 5/10
540/540 - 2s - 3ms/step - accuracy: 0.9862 - loss: 0.0412 - val_accuracy: 0.9822 - val_loss: 0.0534
Epoch 6/10
540/540 - 2s - 3ms/step - accuracy: 0.9881 - loss: 0.0354 - val_accuracy: 0.9837 - val_loss: 0.0550


<keras.src.callbacks.history.History at 0x78585ffa2510>
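
Training stopped after epoch 6 even though 10 epochs were requested. The pure-Python sketch below shows the patience logic in simplified form, applied to the `val_loss` values from the log above. It is an illustration of the idea, not Keras's implementation, and `early_stopping_epoch` is a hypothetical helper name.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# val_loss values copied from the training log.
logged_val_losses = [0.1431, 0.1001, 0.0764, 0.0498, 0.0534, 0.0550]
stop_epoch = early_stopping_epoch(logged_val_losses, patience=2)
print(stop_epoch)  # 6
```

The best validation loss occurs at epoch 4. Epochs 5 and 6 do not improve on it, so the patience of 2 runs out at epoch 6.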

### Testing the model 

In [22]:
test_loss, test_accuracy = model.evaluate(test_data)

1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 1s/step - accuracy: 0.9743 - loss: 0.0971


In [23]:
print("Test Loss: {:.2f}, Test Accuracy: {:.2f}".format(test_loss, test_accuracy*100))

Test Loss: 0.10, Test Accuracy: 97.43


In [25]:
model.summary()
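
The number of trainable parameters can also be worked out by hand: a dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The sketch below assumes the architecture defined in the Model section (784 inputs, three hidden layers of 200 units, 10 outputs); the variable names are illustrative.

```python
layer_sizes = [784, 200, 200, 200, 10]  # input, three hidden layers, output

# Weights plus biases for each dense layer.
params_per_layer = [n_in * n_out + n_out
                    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total_params = sum(params_per_layer)

print(params_per_layer)  # [157000, 40200, 40200, 2010]
print(total_params)      # 239410
```

Under that assumption, this total should agree with the trainable parameter count shown by `model.summary()`. The Flatten layer contributes no parameters.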