## 2.1 Loading the MNIST dataset

In [None]:

from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()


train_images and train_labels form the training set, the data that the model will learn from. The model will then be tested on the test set, test_images and test_labels.

The images are encoded as NumPy arrays, and the labels are an array of digits, ranging from 0 to 9. The images and labels have a one-to-one correspondence.
Let’s look at the training data:

In [2]:
train_images.shape

(60000, 28, 28)

In [3]:
len(train_labels)

60000

In [4]:
train_labels

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

In [5]:
test_images.shape

(10000, 28, 28)

In [6]:
len(test_labels)

10000

In [7]:
test_labels

array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)

## 2.2 The network architecture

In [8]:
from tensorflow import keras
from tensorflow.keras import layers
model = keras.Sequential([
    layers.Dense(512, activation='relu'),   # hidden layer: 512 units with ReLU activation
    layers.Dense(10, activation='softmax')  # output layer: 10 units, one per digit class
    # softmax converts the 10 output values into probabilities that sum to 1;
    # the unit with the highest probability is the predicted digit (0 to 9)
    # relu computes max(x, 0): it zeroes out negative values and adds non-linearity
])




The core building block of neural networks is the layer. You can think of a layer as a filter for data: some data goes in, and it comes out in a more useful form. Specifically, layers extract representations out of the data fed into them—hopefully, representations that are more meaningful for the problem at hand. Most of deep learning consists of chaining together simple layers that will implement a form of progressive data distillation. A deep learning model is like a sieve for data processing, made of a succession of increasingly refined data filters—the layers. 
**Here, our model consists of a sequence of two Dense layers, which are densely connected (also called fully connected) neural layers. The second (and last) layer is a 10-way softmax classification layer, which means it will return an array of 10 probability scores (summing to 1). Each score will be the probability that the current digit image belongs to one of our 10 digit classes.**
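
To make the two-layer structure concrete, here is a minimal NumPy sketch of what a forward pass through such a model computes. It is an illustration only, not how Keras implements `Dense`: the weights `W1`, `b1`, `W2`, `b2` are hypothetical, untrained random values, and the input is assumed to be already flattened to 28 * 28 = 784 values per image.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

def relu(x):
    # Element-wise max(x, 0)
    return np.maximum(x, 0.0)

def softmax(logits):
    # Subtract the row-wise max for numerical stability, then normalize
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

# Hypothetical, untrained weights: 784 inputs -> 512 hidden units -> 10 classes
W1 = rng.normal(scale=0.01, size=(784, 512))
b1 = np.zeros(512)
W2 = rng.normal(scale=0.01, size=(512, 10))
b2 = np.zeros(10)

batch = rng.random((3, 784))        # three fake flattened "images" in [0, 1)
hidden = relu(batch @ W1 + b1)      # first Dense layer, shape (3, 512)
probs = softmax(hidden @ W2 + b2)   # second Dense layer, shape (3, 10)

print(probs.shape)        # (3, 10)
print(probs.sum(axis=1))  # each row sums to 1, up to floating-point error
```

Because the weights are random, the 10 scores in each row are close to uniform; training is what makes one of them dominate.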

#### To make a model ready for training, we need to pick three more things as part of the compilation step:

* An optimizer - The mechanism through which the model will update itself based on the training data it sees, so as to improve its performance.

* A loss function - How the model will be able to measure its performance on the training data, and thus how it will be able to steer itself in the right direction.

* Metrics to monitor during training and testing - Here, we'll only care about accuracy (the fraction of the images that were correctly classified).
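
The loss and the accuracy metric from the list above can be sketched in a few lines of NumPy. This is a simplified illustration of the idea behind a sparse categorical crossentropy loss, not the Keras implementation; the helper names and the sample values are made up for the example.

```python
import numpy as np

def sparse_categorical_crossentropy(probs, labels):
    # Mean negative log-probability assigned to the correct class.
    # probs: (n_samples, n_classes); labels: (n_samples,) integer class ids
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked + 1e-7).mean())  # epsilon guards against log(0)

def accuracy(probs, labels):
    # Fraction of samples whose highest-probability class matches the label
    return float((probs.argmax(axis=1) == labels).mean())

# Made-up predictions for three samples over three classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.4, 0.3]])
labels = np.array([0, 1, 2])

loss = sparse_categorical_crossentropy(probs, labels)
acc = accuracy(probs, labels)
print(loss, acc)  # the third sample is misclassified, so accuracy is 2/3
```

The loss is a smooth quantity the optimizer can reduce step by step, whereas accuracy only changes when a prediction flips, which is why the two play different roles.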



##### Epochs 

The number of epochs is a hyperparameter that determines how many times the model will iterate over the entire training dataset. Each epoch allows the model to learn from the data and refine its predictions. Increasing the number of epochs gives the model more opportunities to improve its performance on the training data, but too many epochs can lead to overfitting, where accuracy on new data stops improving or gets worse.
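
As a rough illustration of what an epoch means in practice, the sketch below counts the weight updates performed during training, assuming one update per batch and the values used in this section (60,000 training images, 5 epochs, and the batch size of 128 passed to the fit call).

```python
import math

num_samples = 60000  # size of the MNIST training set
batch_size = 128     # batch size passed to fit() in this section
epochs = 5           # number of passes over the training set

# The last batch of each epoch is smaller, hence the ceiling
steps_per_epoch = math.ceil(num_samples / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 469
print(total_updates)    # 2345
```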

## 2.3 The compilation step

In [9]:
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# optimizer: how the model updates its weights, based on the data it sees and the loss
# loss: how the model measures its error on the training data during training
# metrics: quantities reported during training and testing (here, accuracy); they do not drive the updates




Before training, we will preprocess the data by reshaping it into the shape the model expects and scaling it so that all values are in the [0, 1] interval. Previously, our training images were stored in an array of shape (60000, 28, 28) of type uint8 with values in the [0, 255] interval. We'll transform it into a float32 array of shape (60000, 28 * 28) with values between 0 and 1.

## 2.4 Preparing the image data

In [10]:
train_images = train_images.reshape((60000, 28 * 28)) # reshape the training data from 3D to 2D array 
train_images = train_images.astype('float32') / 255 # normalize the training data 
test_images = test_images.reshape((10000, 28 * 28)) # reshape the test data from 3D to 2D array 
test_images = test_images.astype('float32') / 255   # normalize the test data 
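
The same two steps can be checked on a small stand-in array without loading the dataset. In this sketch `fake_images` is a hypothetical random array that only mimics the shape and dtype of the real images.

```python
import numpy as np

# Stand-in for the real data: 4 fake "images" of 28 x 28 uint8 pixels
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

flat = fake_images.reshape((4, 28 * 28))  # (4, 28, 28) -> (4, 784)
scaled = flat.astype("float32") / 255     # uint8 in [0, 255] -> float32 in [0, 1]

print(scaled.shape, scaled.dtype)         # (4, 784) float32
print(scaled.min() >= 0.0, scaled.max() <= 1.0)

# Reshaping only changes the layout: row i is still image i, read row by row
same_pixels = np.array_equal(flat[0], fake_images[0].ravel())
print(same_pixels)
```

Converting to float32 before dividing matters: the division then produces fractional values in [0, 1] stored at the precision the model expects.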

## 2.5 "Fitting" the model 

In [11]:
model.fit(train_images, train_labels, epochs=5, batch_size=128)

Epoch 1/5


Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.src.callbacks.History at 0x1a1edc78c10>

## 2.6 Using the model to make predictions

In [12]:
test_digits = test_images[0:10]
prediction = model.predict(test_digits) 
prediction[0]



array([5.7792047e-08, 2.3060052e-09, 4.4718195e-06, 2.3890556e-05,
       3.1259901e-11, 8.6375831e-09, 1.4229726e-12, 9.9997091e-01,
       3.3524625e-08, 6.6297309e-07], dtype=float32)

In [13]:
prediction[0].argmax()

7

In [14]:
prediction[0][7]

0.9999709
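
To check this prediction against the ground truth, the sketch below reuses the probabilities printed above and compares the predicted class with the first test label, which is 7 according to the test_labels output in section 2.1. The variable names are chosen for the example only.

```python
import numpy as np

# Probabilities printed above for the first test image
first_prediction = np.array(
    [5.7792047e-08, 2.3060052e-09, 4.4718195e-06, 2.3890556e-05,
     3.1259901e-11, 8.6375831e-09, 1.4229726e-12, 9.9997091e-01,
     3.3524625e-08, 6.6297309e-07], dtype=np.float32)
first_label = 7  # first entry of test_labels, as shown in section 2.1

predicted_class = int(first_prediction.argmax())
print(predicted_class == first_label)  # True

# The three most likely classes, from most to least probable
top3 = np.argsort(first_prediction)[::-1][:3].tolist()
print(top3)  # [7, 3, 2]
```

Looking past the top class is a cheap way to see which digits the model considers similar, even when it is almost certain of its answer.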