In [2]:
import keras
from keras.datasets import mnist
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import SGD, Adam

In [3]:
(train_x, train_y), (test_x, test_y)= mnist.load_data()

In [4]:
train_x.shape, test_x.shape, train_y.shape, test_y.shape

((60000, 28, 28), (10000, 28, 28), (60000,), (10000,))

In [5]:
#rescale dataset
train_x = train_x.astype('float32')/255
test_x = test_x.astype('float32')/255

In [6]:
# #normalization (an alternative to the rescaling above, not an extra step: maps pixels to [-1, 1])
# train_x = (train_x.astype('float32') - 127.5) / 127.5
# test_x = (test_x.astype('float32') - 127.5) / 127.5
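
Centering pixel values around 127.5 depends on operator precedence, because Python evaluates division before subtraction. The short check below is only an illustration on a single made-up pixel value (the names `pixel`, `without_parens` and `with_parens` are not part of the notebook):

```python
pixel = 255.0          # brightest possible MNIST pixel
dark_pixel = 0.0       # darkest possible MNIST pixel

# Division binds tighter than subtraction, so this computes pixel - 1.0.
without_parens = pixel - 127.5 / 127.5

# Parentheses center first and then scale, mapping [0, 255] onto [-1, 1].
with_parens = (pixel - 127.5) / 127.5
with_parens_dark = (dark_pixel - 127.5) / 127.5

print(without_parens, with_parens, with_parens_dark)
```

Only the parenthesized form keeps every pixel inside the range [-1, 1].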

In [7]:
#flatten each 28 x 28 image into a vector of 784 pixels (28 * 28)
train_x = train_x.reshape(60000, 784)
test_x = test_x.reshape(10000, 784)

Here, we reshape each 28 x 28 image into a one-dimensional array of 784 
pixels (28 * 28).
This is an undesirable but necessary step. It is undesirable because flattening the 
image into a single dimension discards its 2D structure: pixels that were vertical 
neighbours end up 28 positions apart. It is necessary because the fully connected 
(Dense) layers we are about to build expect each sample to be a flat feature vector 
rather than a 2D grid.
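
To make the flattening concrete, here is a minimal sketch on one stand-in image built from plain Python lists rather than the NumPy arrays used in the notebook. The names `SIDE`, `image` and `flat` are illustrative assumptions, and the row-by-row order matches NumPy's default reshape order.

```python
SIDE = 28

# A stand-in "image": each pixel value encodes its own position in the grid.
image = [[row * SIDE + col for col in range(SIDE)] for row in range(SIDE)]

# Row-major flattening: the rows are laid end to end.
flat = [pixel for image_row in image for pixel in image_row]

print(len(flat))  # number of values per flattened image

# Pixel (row, col) ends up at index row * SIDE + col in the flat vector,
# so vertical neighbours sit SIDE positions apart.
row, col = 3, 5
print(flat[row * SIDE + col] == image[row][col])
print(flat.index(image[row + 1][col]) - flat.index(image[row][col]))
```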

In [8]:
train_x.shape

(60000, 784)

In [9]:
#convert the integer labels to one-hot vectors
train_y = keras.utils.to_categorical(train_y,10)
test_y = keras.utils.to_categorical(test_y,10)
#to_categorical one-hot encodes the 10 classes in the MNIST dataset

In [10]:
test_y.shape

(10000, 10)
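
As a rough picture of what one-hot encoding does to a single label, the sketch below re-implements the idea in plain Python. The helper `one_hot` is hypothetical and is not the Keras implementation; it only illustrates why each label becomes a vector of length 10.

```python
def one_hot(label, num_classes=10):
    """Return a list with 1.0 at position `label` and 0.0 everywhere else."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in the range [0, num_classes)")
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector

encoded = one_hot(3)
print(encoded)        # the digit 3 becomes a vector with a single 1.0 at index 3
print(len(encoded))   # one entry per class
```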

In [11]:
# #define the model network
# model = Sequential()
# model.add(Dense(units= 128, activation= 'relu', input_shape = (784,)))
# model.add(Dense(units= 128, activation= 'relu'))
# model.add(Dense(units= 128, activation= 'relu'))
# model.add(Dense(units= 10, activation= 'softmax'))


The input shape passed to the first layer tells Keras the shape of each input sample. In this case, each 28 x 28 image has become a vector of 784 values after flattening.

The last layer uses the softmax function we explained under Loss Functions. Since 
we have ten classes, 0 to 9, this final output layer needs exactly ten neurons, 
which together output ten softmax probabilities, one for each class.
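
The sketch below shows how ten raw output scores turn into ten probabilities. It is a plain Python illustration, not the Keras implementation, and the `logits` values are made up for the example.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of the ten output neurons for one image.
logits = [0.5, 1.2, -0.3, 4.0, 0.0, 0.1, -1.0, 2.2, 0.3, 0.9]
probs = softmax(logits)

# The predicted digit is the class with the highest probability.
predicted_digit = probs.index(max(probs))
print(round(sum(probs), 6), predicted_digit)
```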

In [12]:
# model.summary()

In [13]:
# #compile the model
# model.compile(optimizer = SGD(0.01), loss= 'categorical_crossentropy', metrics= ['accuracy'])

This part is vital to our training procedure. First we specify the optimizer, which in this case is SGD (Stochastic Gradient Descent) with a learning rate of 0.01. This is a 
reasonable starting value: if the learning rate is too small, training becomes unnecessarily slow, while if it is too high, SGD is likely to overshoot and fail to converge to either a global 
or a local minimum.
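
The effect of the learning rate can be seen on a toy problem. The sketch below runs plain gradient descent on f(x) = x**2, not SGD on our network, and the three learning rates are chosen only to illustrate the slow, well-behaved and overshooting cases.

```python
def minimize(learning_rate, steps=50, start=1.0):
    """Run gradient descent on f(x) = x**2, whose gradient is 2 * x."""
    x = start
    for _ in range(steps):
        x -= learning_rate * 2 * x
    return x

too_small = minimize(0.001)  # barely moves toward the minimum at 0
moderate = minimize(0.1)     # ends up very close to 0
too_large = minimize(1.1)    # every step overshoots and x grows in magnitude

print(too_small, moderate, too_large)
```

With the largest rate each update jumps past the minimum to a point further away than where it started, which is the overshooting described above.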

Next we specify our loss function, in this case categorical_crossentropy, which is the cross-entropy between the one-hot labels and the softmax outputs (often called softmax cross-entropy).
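
For a one-hot label, this loss reduces to the negative log of the probability assigned to the correct class. The following is a minimal plain Python sketch of that formula, not the Keras implementation; the `eps` clamp and the example probabilities are assumptions made for illustration.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0.0, 1.0, 0.0]        # the correct class is index 1
confident = [0.05, 0.90, 0.05]  # high probability on the correct class
unsure = [0.40, 0.30, 0.30]     # low probability on the correct class

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

The loss is small when the model is confident and correct, and grows as the probability on the true class shrinks.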

Finally, we state the metrics we want to track. Here that is the accuracy, i.e. the fraction of images classified correctly. In a classification setting accuracy is usually the metric of most interest, whereas in a regression setting we are primarily concerned with an error measure such as the Mean Squared Error.
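
Both metrics are simple to compute by hand. The sketch below uses small made-up label lists and hypothetical helper names purely to show the two definitions side by side.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

def mean_squared_error(targets, predictions):
    """Average squared difference between targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return sum(squared) / len(targets)

# Classification: four of the five digits are predicted correctly.
acc = accuracy([7, 2, 1, 0, 4], [7, 2, 1, 0, 9])

# Regression: errors of 0.0, 0.5 and 1.0 on three targets.
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
print(acc, mse)
```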

In [14]:
# #fit the model
# model.fit(train_x, train_y, batch_size= 32, epochs= 10, verbose=1)

Here we pass the training images and their corresponding labels into our model. We also set the batch size to 32, which means the model processes 32 images per weight update. Finally, we specify that training should run for 10 
epochs. In each epoch the model is trained on every batch, so each of the 10 epochs consists of 60000/32 = 1875 iterations.
verbose=1 simply tells Keras to display a progress bar for each epoch.
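
The batch arithmetic can be checked directly. This sketch uses only the numbers from this notebook (60000 training images, batch size 32, 10 epochs); the variable names are illustrative.

```python
import math

num_samples = 60000
batch_size = 32
epochs = 10

# Each batch triggers one weight update; ceil covers a final partial batch.
steps_per_epoch = math.ceil(num_samples / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)  # 1875 18750
```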

Calling the fit function would invoke the training process right away.

In [15]:
# accuracy = model.evaluate(x= test_x, y = test_y, batch_size= 32)
# print('Accuracy: ', accuracy)

In [17]:
#define the model network
model2 = Sequential()
model2.add(Dense(units= 256, activation= 'relu',input_shape = (784,)))
model2.add(Dense(units= 256, activation= 'relu'))
model2.add(Dense(units= 256, activation= 'relu'))
model2.add(Dense(units= 10, activation='softmax'))

In [18]:
# Build the model by specifying the input shape
model2.build(input_shape=(None, 784))

model2.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_4 (Dense)             (None, 256)               200960    
                                                                 
 dense_5 (Dense)             (None, 256)               65792     
                                                                 
 dense_6 (Dense)             (None, 256)               65792     
                                                                 
 dense_7 (Dense)             (None, 10)                2570      
                                                                 
Total params: 335114 (1.28 MB)
Trainable params: 335114 (1.28 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
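
The parameter counts in the summary can be reproduced by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. The sketch below assumes 4 bytes per parameter (float32) for the size estimate; `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(inputs, units):
    """Weights (inputs * units) plus one bias per unit."""
    return inputs * units + units

# Input size followed by the number of units in each Dense layer.
layer_sizes = [784, 256, 256, 256, 10]
per_layer = [dense_params(i, u) for i, u in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

# Assuming float32 weights, each parameter takes 4 bytes.
size_mb = total * 4 / 1024 ** 2

print(per_layer, total, round(size_mb, 2))
```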


In [19]:
#compile the model
model2.compile(optimizer ='Adam', loss= 'categorical_crossentropy', metrics= ['accuracy'])

In [20]:
#fit the model
model2.fit(train_x, train_y, batch_size= 32, epochs= 10, verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x157eba7f3d0>

Increasing the number of neurons per layer from 128 to 256 increases the accuracy to about 99% at 10 epochs.

Changing the optimizer from SGD to Adam also increased the accuracy.

In [21]:
#Evaluate the model on the test dataset; evaluate returns [loss, accuracy]
accuracy2 = model2.evaluate(x=test_x,y=test_y,batch_size=32)
print("Accuracy: ",accuracy2)

Accuracy:  [0.10138635337352753, 0.9775000214576721]
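
The printed list holds two numbers: the test loss first, then the test accuracy. The sketch below unpacks the values copied from the output above and converts the accuracy into an approximate count of misclassified test images; the variable names are illustrative.

```python
# Values copied from the evaluate output above: [loss, accuracy].
result = [0.10138635337352753, 0.9775000214576721]
test_loss, test_accuracy = result

num_test_images = 10000
misclassified = round((1 - test_accuracy) * num_test_images)

print(f"loss={test_loss:.4f} accuracy={test_accuracy:.2%} errors={misclassified}")
```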


In [22]:
model2.save('mnistmodel22.h5')

