# Deep Neural Network for MNIST Classification

The dataset is called MNIST, a standard benchmark for handwritten digit recognition. You can find more about it on Yann LeCun's website (Director of AI Research, Facebook). He is one of the pioneers of the methods we've been discussing and of more complex approaches that are widely used today, such as convolutional neural networks (CNNs).

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image). 

We create an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes.



## Import the relevant packages

In [2]:
pip install tensorflow-datasets 


In [1]:
import numpy as np
import tensorflow as tf

# TensorFlow Datasets provides a ready-made loader for MNIST that we'll use.
import tensorflow_datasets as tfds


## Load MNIST dataset

This is where we load and preprocess the data.

In [2]:

mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)


In [3]:
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

# Hold out 10% of the training split for validation
num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
num_validation_samples = tf.cast(num_validation_samples, tf.int64)
# Number of examples in the test split
num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)
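
The split sizes can be checked with plain arithmetic. The sketch below assumes the standard MNIST split sizes (60,000 training and 10,000 test images) and the batch size of 100 used later in this notebook. It does not read them from `mnist_info`.

```python
# Assumed sizes of the official MNIST splits (not read from mnist_info here)
num_train_examples = 60_000
num_test_examples = 10_000
validation_fraction = 0.1  # same 10% hold-out as in the cell above
batch_size = 100           # batch size used later in this notebook

num_validation = round(validation_fraction * num_train_examples)
num_training = num_train_examples - num_validation
steps_per_epoch = num_training // batch_size

print(num_validation, num_training, steps_per_epoch)
```

The resulting 540 steps per epoch match the `540/540` counter in the training log further down.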


### Define scaling and reshaping function


In [4]:
def scale_and_reshape(image, label):
    image = tf.cast(image, tf.float32)
    image = image / 255.0
    image = tf.reshape(image, (28, 28, 1))  # Ensure image shape is (28, 28, 1)
    return image, label


scaled_train_and_validation_data = mnist_train.map(scale_and_reshape) 
test_data = mnist_test.map(scale_and_reshape) 
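
To see what the function does to individual pixel values, here is a plain-Python stand-in that works on nested lists instead of tensors. The helper name `scale_and_reshape_py` and the tiny 2x2 "image" are hypothetical and only serve as an illustration.

```python
def scale_and_reshape_py(image_rows):
    """Scale integer pixels from [0, 255] to [0, 1] and add a trailing channel axis."""
    return [[[pixel / 255.0] for pixel in row] for row in image_rows]

# Hypothetical 2x2 grayscale image with values in the uint8 range
tiny_image = [[0, 128], [255, 64]]
scaled = scale_and_reshape_py(tiny_image)

# The result has shape (2, 2, 1): rows, columns, one channel
print(len(scaled), len(scaled[0]), len(scaled[0][0]))
print(scaled[0][0], scaled[1][0])
```

Black (0) maps to 0.0 and white (255) maps to 1.0, so all inputs share the same small numeric range.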


### Shuffle and batch data

In [5]:
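# Shuffle once with a fixed order so that take() and skip() select disjoint examples.
# With the default reshuffle_each_iteration=True, the data would be reshuffled on every
# pass and validation examples could leak into the training set.
BUFFER_SIZE = 10000

shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(
    BUFFER_SIZE, reshuffle_each_iteration=False)
validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)

# Reshuffle only the training examples on every epoch, which helps mini-batch SGD
train_data = train_data.shuffle(BUFFER_SIZE)

# We train with mini-batches, so we have to set the batch size
batch_size = 100

train_data = train_data.batch(batch_size)
# Validation and test data are each evaluated as a single batch
validation_data = validation_data.batch(num_validation_samples)
test_data = test_data.batch(num_test_samples)

validation_inputs, validation_targets = next(iter(validation_data))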
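
`take` and `skip` give disjoint subsets only when both see the same shuffled order. The sketch below shows the idea with plain Python lists. The helper `split_once` and its seed are hypothetical and are not part of the `tf.data` API.

```python
import random

def split_once(items, num_validation, seed=0):
    """Shuffle a copy once with a fixed seed, then split it like take/skip."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    validation = shuffled[:num_validation]  # analogous to .take(n)
    training = shuffled[num_validation:]    # analogous to .skip(n)
    return training, validation

training, validation = split_once(range(20), num_validation=2)

# Because both slices come from the same shuffled order, they never overlap
print(len(training), len(validation), set(training).isdisjoint(validation))
```

If the order were reshuffled between the two slices, the same example could land in both subsets.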


## Define the model

### Outline the model

In [6]:
# Architecture: 784 inputs -> three hidden layers of 1000 units each -> 10 outputs
# Each 28x28x1 image flattens to 28 * 28 = 784 input values
input_size = 784
# One output per digit (0-9)
output_size = 10
hidden_layer_size = 1000  # the width of each hidden layer

model = tf.keras.Sequential([
    # Dense layers operate only on vectors (1D arrays), so we flatten multidimensional inputs such as images.
    # Each image has shape (28, 28, 1); after Flatten its shape becomes (784,).
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    # A Dense layer holds the weights and biases.
    # tf.keras.layers.Dense implements: output = activation(dot(input, weight) + bias)
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),  # 1st hidden layer
    tf.keras.layers.Dropout(0.3),  # Dropout after the first hidden layer, to reduce overfitting

    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),  # 2nd hidden layer
    tf.keras.layers.Dropout(0.3),  # Dropout after the second hidden layer

    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),  # 3rd hidden layer
    tf.keras.layers.Dropout(0.3),  # Dropout after the third hidden layer

    # Softmax in the output layer turns the scores into class probabilities (suitable for multi-class classification)
    tf.keras.layers.Dense(output_size, activation='softmax')
])
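
As a sanity check on the model size, the number of trainable parameters can be counted by hand: a dense layer with `n` inputs and `m` units has `n * m` weights plus `m` biases, while Flatten and Dropout add none. The helper `dense_param_count` below is a hypothetical name for this arithmetic.

```python
def dense_param_count(num_inputs, num_units):
    """Weights (num_inputs * num_units) plus one bias per unit."""
    return num_inputs * num_units + num_units

# Flattened input, three hidden layers, output layer (as in the model above)
layer_sizes = [28 * 28 * 1, 1000, 1000, 1000, 10]
per_layer = [dense_param_count(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer)     # [785000, 1001000, 1001000, 10010]
print(total_params)  # 2797010
```

Calling `model.summary()` should report the same total of about 2.8 million trainable parameters.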


### Choose the optimizer and the loss function

In [7]:
model.compile(optimizer='adam',loss='sparse_categorical_crossentropy', metrics=['accuracy'])
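
For a single example, sparse categorical cross-entropy is the negative log of the probability that the softmax output assigns to the true class, with the label given as an integer rather than a one-hot vector. Below is a minimal plain-Python sketch of that formula. Keras additionally clips probabilities for numerical stability and averages over the batch, which this sketch leaves out.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(probabilities, label):
    """Loss for one example: -log(probability of the true class)."""
    return -math.log(probabilities[label])

# Hypothetical untrained network: equal scores for all 10 digits
uniform_probabilities = softmax([0.0] * 10)
uniform_loss = sparse_categorical_crossentropy(uniform_probabilities, label=3)

# With 10 equally likely classes the loss equals ln(10)
print(math.isclose(uniform_loss, math.log(10)))
```

A loss near ln(10) ≈ 2.30 therefore means the model is doing no better than guessing. The values in the training log below are far lower.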

### Define callbacks

In [8]:
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau

early_stopping = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
model_checkpoint = ModelCheckpoint('best_mnist_model.keras', save_best_only=True, save_weights_only=False, monitor='val_loss', mode='min')
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2, min_lr=1e-6)
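
The `patience` logic behind early stopping can be sketched in a few lines: training stops once the monitored loss has failed to improve for `patience` consecutive epochs. This is a simplified illustration, not the Keras implementation, which also supports options such as `min_delta`. The function name and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop, or None."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Hypothetical validation losses: the best value is reached at epoch 3
hypothetical_val_losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38, 0.30]
print(early_stopping_epoch(hypothetical_val_losses))  # 6
```

Here training would stop after epoch 6 and never reach the better value at epoch 7, which is the trade-off a small `patience` makes. With `restore_best_weights=True`, the weights from the best epoch (epoch 3 in this example) are restored.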

## Train the model

In [9]:
# Train for at most num_epochs epochs; the EarlyStopping callback may stop training sooner

num_epochs = 10
model.fit(train_data,
          epochs=num_epochs,
          validation_data=(validation_inputs, validation_targets),
          callbacks=[early_stopping, model_checkpoint, reduce_lr],
          verbose=2)

Epoch 1/10
540/540 - 10s - 19ms/step - accuracy: 0.9223 - loss: 0.2526 - val_accuracy: 0.9683 - val_loss: 0.1123 - learning_rate: 1.0000e-03
Epoch 2/10
540/540 - 8s - 16ms/step - accuracy: 0.9633 - loss: 0.1220 - val_accuracy: 0.9723 - val_loss: 0.0860 - learning_rate: 1.0000e-03
Epoch 3/10
540/540 - 8s - 16ms/step - accuracy: 0.9713 - loss: 0.0935 - val_accuracy: 0.9827 - val_loss: 0.0607 - learning_rate: 1.0000e-03
Epoch 4/10
540/540 - 8s - 16ms/step - accuracy: 0.9761 - loss: 0.0798 - val_accuracy: 0.9833 - val_loss: 0.0590 - learning_rate: 1.0000e-03
Epoch 5/10
540/540 - 8s - 15ms/step - accuracy: 0.9784 - loss: 0.0679 - val_accuracy: 0.9838 - val_loss: 0.0552 - learning_rate: 1.0000e-03
Epoch 6/10
540/540 - 8s - 15ms/step - accuracy: 0.9799 - loss: 0.0652 - val_accuracy: 0.9880 - val_loss: 0.0422 - learning_rate: 1.0000e-03
Epoch 7/10
540/540 - 8s - 15ms/step - accuracy: 0.9825 - loss: 0.0593 - val_accuracy: 0.9845 - val_loss: 0.0570 - learning_rate: 1.0000e-03
Epoch 8/10
540/540 

<keras.src.callbacks.history.History at 0x137f82ad070>

## Evaluate the model on the test data

In [10]:
test_loss, test_accuracy = model.evaluate(test_data)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 383ms/step - accuracy: 0.9810 - loss: 0.0728


In [11]:
# We can apply some nice formatting if we want to
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

Test loss: 0.07. Test accuracy: 98.10%
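
The accuracy metric is the fraction of examples whose most probable class matches the label. Below is a minimal sketch with hypothetical softmax outputs for a 3-class problem. The helper names are illustrative and not part of Keras.

```python
def argmax(values):
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(predicted_probabilities, labels):
    """Fraction of examples where the most probable class equals the label."""
    predictions = [argmax(p) for p in predicted_probabilities]
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)

# Hypothetical softmax outputs for four examples
hypothetical_probabilities = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
hypothetical_labels = [0, 1, 2, 1]
print(accuracy(hypothetical_probabilities, hypothetical_labels))  # 0.75
```

By the same definition, the 98.10% reported above on the 10,000-image test split corresponds to roughly 190 misclassified digits.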


## Save the model

In [None]:
model.save('final_mnist_model.keras')
print("Model saved as 'final_mnist_model.keras'")