# Creating a Neural Network for MNIST Classification Using NumPy

In this notebook, we build a small neural network with NumPy and train it to recognize the digit shown in an image. We use the MNIST dataset of 28x28 grayscale images of handwritten digits, which contains 60,000 training images and 10,000 test images. Each image is labeled with the digit it shows.

## Importing the libraries

We only need two libraries: NumPy for the array math, and the standard-library `struct` module for parsing the binary dataset files.

In [2]:
import numpy as np # arrays and linear algebra
import struct # parsing the binary IDX file headers

## Loading the dataset

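First, we load the training images and labels. The files use the IDX format: a 4-byte header (two zero bytes, a data-type code, and the number of dimensions), followed by one big-endian 32-bit size per dimension, and then the raw data. As a sanity check, we print the first three labels together with a thresholded rendering of each image, where pixels brighter than 127 appear as 1.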

In [14]:
image_file:str = "./train-images.idx3-ubyte"
label_file:str = "./train-labels.idx1-ubyte"

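def read_idx(file:str) -> np.ndarray:
	"""Read an IDX file of unsigned bytes (as used by MNIST) into a NumPy array."""
	with open(file, 'rb') as f:
		# Header: two zero bytes, a data-type code, and the number of dimensions
		zero, data_type, dims = struct.unpack('>HBB', f.read(4))
		# One big-endian 32-bit size per dimension
		shape:tuple = tuple(struct.unpack('>I', f.read(4))[0] for _ in range(dims))
		# The data-type code is not checked; uint8 is assumed, which holds for the MNIST files
		return np.frombuffer(f.read(), dtype=np.uint8).reshape(shape)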

image_data:object = read_idx(image_file)
print(image_data.shape)
label_data:object = read_idx(label_file)
for i in range(3):
	print(label_data[i])
	for j in range(0, image_data.shape[1]):
		for k in range(0, image_data.shape[2]):
			if image_data[i][j][k] > 127:
				print("1", end="")
			else:
				print("0", end="")
		print("")

(60000, 28, 28)
5
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
0000000000000000011011100000
0000000000011111111111100000
0000000011111111110000000000
0000000011111111110000000000
0000000001011100010000000000
0000000000011000000000000000
0000000000011100000000000000
0000000000001100000000000000
0000000000000111000000000000
0000000000000011100000000000
0000000000000001111000000000
0000000000000000011100000000
0000000000000000011100000000
0000000000000001111100000000
0000000000000111111100000000
0000000000001111110000000000
0000000000111111000000000000
0000000111111100000000000000
0000011111111000000000000000
0000111111100000000000000000
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
0
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
0000000000000000111000000000
00000000000000011111000
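
To make the header layout that `read_idx` relies on concrete, here is a minimal sketch that builds a tiny IDX byte string in memory and parses it back. The helper names `make_idx_bytes` and `parse_idx_bytes` are hypothetical and exist only for this illustration, and the sketch assumes unsigned-byte data (type code 0x08), as in the MNIST files.

```python
import struct
import numpy as np

def make_idx_bytes(array: np.ndarray) -> bytes:
    """Serialize a uint8 array into IDX bytes (hypothetical helper for this sketch)."""
    # Header: two zero bytes, type code 0x08 (unsigned byte), number of dimensions
    header = struct.pack('>HBB', 0, 0x08, array.ndim)
    # One big-endian 32-bit size per dimension
    sizes = b''.join(struct.pack('>I', n) for n in array.shape)
    return header + sizes + array.astype(np.uint8).tobytes()

def parse_idx_bytes(raw: bytes) -> np.ndarray:
    """Parse IDX bytes holding unsigned-byte data (hypothetical helper)."""
    zero, data_type, dims = struct.unpack('>HBB', raw[:4])
    if zero != 0 or data_type != 0x08:
        raise ValueError("expected an unsigned-byte IDX file")
    # Read all dimension sizes in one call
    shape = struct.unpack('>' + 'I' * dims, raw[4:4 + 4 * dims])
    return np.frombuffer(raw, dtype=np.uint8, offset=4 + 4 * dims).reshape(shape)

# Round-trip two tiny 2x3 "images"
images = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)
raw = make_idx_bytes(images)
restored = parse_idx_bytes(raw)
print(raw[:4].hex())   # the 4 header bytes
print(restored.shape)
```

Unlike `read_idx`, this sketch rejects files whose type code is not 0x08, which is one way to fail early on an unexpected file.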

## Creating the neural network

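Next, we define a neural network with one hidden ReLU layer and a softmax output layer, and train it with full-batch gradient descent on the cross-entropy loss.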
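
For reference, the forward pass and the gradients for this kind of network can be written out as follows. This is the standard derivation for a ReLU hidden layer with a softmax output and cross-entropy loss, not something specific to this notebook. Here $X$ is the batch of $m$ flattened images, $Y$ the one-hot labels, $y_i$ the true class of sample $i$, and $\odot$ the elementwise product. The symbols $\delta_1$ and $\delta_2$ are introduced here and correspond to `dZ1` and `dZ2` in the code cell that follows.

$$
\begin{aligned}
Z_1 &= X W_1 + b_1, & A_1 &= \max(0, Z_1), \\
Z_2 &= A_1 W_2 + b_2, & A_2 &= \operatorname{softmax}(Z_2), \\
L &= -\frac{1}{m} \sum_{i=1}^{m} \log A_2[i, y_i]
\end{aligned}
$$

$$
\begin{aligned}
\delta_2 &= A_2 - Y, & \frac{\partial L}{\partial W_2} &= \frac{1}{m} A_1^{\top} \delta_2, & \frac{\partial L}{\partial b_2} &= \frac{1}{m} \sum_{i=1}^{m} \delta_2[i, :], \\
\delta_1 &= \left(\delta_2 W_2^{\top}\right) \odot \mathbb{1}[Z_1 > 0], & \frac{\partial L}{\partial W_1} &= \frac{1}{m} X^{\top} \delta_1, & \frac{\partial L}{\partial b_1} &= \frac{1}{m} \sum_{i=1}^{m} \delta_1[i, :]
\end{aligned}
$$

Note that $\delta_2$ is $m$ times the gradient of $L$ with respect to $Z_2$; the factor $1/m$ is applied when the weight and bias gradients are formed.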

In [17]:
class NeuralNetwork:
	def __init__(self, input_size:int, hidden_size:int, output_size:int) -> None:
		self.W1:object = np.random.randn(input_size, hidden_size) * 0.01
		self.b1:object = np.zeros((1, hidden_size))
		self.W2:object = np.random.randn(hidden_size, output_size) * 0.01
		self.b2:object = np.zeros((1, output_size))

	def relu(self, Z:object) -> object:
		return np.maximum(0, Z)

	def relu_derivative(self, Z:object) -> object:
		return Z > 0

	def softmax(self, Z:object) -> object:
		expZ:object = np.exp(Z - np.max(Z, axis=1, keepdims=True))
		return expZ / np.sum(expZ, axis=1, keepdims=True)

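	def cross_entropy_loss(self, Y:object, Y_hat:object) -> object:
		m = Y.shape[0]
		# Probability assigned to the true class of each sample, clipped to avoid log(0)
		true_class_probs = np.clip(Y_hat[range(m), np.argmax(Y, axis=1)], 1e-12, None)
		log_likelihood = -np.log(true_class_probs)
		loss = np.sum(log_likelihood) / m
		return loss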

	def forward(self, X:object) -> object:
		self.Z1:object = np.dot(X, self.W1) + self.b1
		self.A1:object = self.relu(self.Z1)
		self.Z2:object = np.dot(self.A1, self.W2) + self.b2
		self.A2:object = self.softmax(self.Z2)
		return self.A2

	def backward(self, X:object, Y:object, learning_rate:float) -> None:
		m:object = X.shape[0]

		dZ2:object = self.A2 - Y
		dW2:object = np.dot(self.A1.T, dZ2) / m
		db2:object = np.sum(dZ2, axis=0, keepdims=True) / m

		dA1:object = np.dot(dZ2, self.W2.T)
		dZ1:object = dA1 * self.relu_derivative(self.Z1)
		dW1:object = np.dot(X.T, dZ1) / m
		db1:object = np.sum(dZ1, axis=0, keepdims=True) / m

		self.W1 -= learning_rate * dW1
		self.b1 -= learning_rate * db1
		self.W2 -= learning_rate * dW2
		self.b2 -= learning_rate * db2

	def train(self, X:object, Y:object, epochs:int, learning_rate:float) -> None:
		for epoch in range(epochs):
			Y_hat:object = self.forward(X)
			loss:object = self.cross_entropy_loss(Y, Y_hat)
			self.backward(X, Y, learning_rate)
			if epoch % 5 == 0:
				print(f"Epoch {epoch}, Loss: {loss}")

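input_size = image_data.shape[1] * image_data.shape[2]	# 28 * 28 = 784 pixels per image
hidden_size = 128	# Number of hidden neurons
output_size = 10	# 10 classes (digits 0-9)
epochs = 100		# Number of full-batch gradient descent steps
learning_rate = 0.1	# Step size for each update

# Flatten each image into one row and scale the pixel values to [0, 1]
x_train = image_data.reshape(image_data.shape[0], -1) / 255
# One-hot encode the labels: row i has a 1 in column label_data[i]
y_train = np.eye(10)[label_data]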

nn = NeuralNetwork(input_size, hidden_size, output_size)
nn.train(x_train, y_train, epochs, learning_rate)

Epoch 0, Loss: 2.304340548869987
Epoch 5, Loss: 2.298829195990266
Epoch 10, Loss: 2.2927531845543534
Epoch 15, Loss: 2.284759239654879
Epoch 20, Loss: 2.2734159610461333
Epoch 25, Loss: 2.256937469695447
Epoch 30, Loss: 2.233081524923437
Epoch 35, Loss: 2.1992654331961345
Epoch 40, Loss: 2.152844245623377
Epoch 45, Loss: 2.0912937903266933
Epoch 50, Loss: 2.0123410036099583
Epoch 55, Loss: 1.9145460333887507
Epoch 60, Loss: 1.7989010375346504
Epoch 65, Loss: 1.6704877064573498
Epoch 70, Loss: 1.5379612639931388
Epoch 75, Loss: 1.4104419171989566
Epoch 80, Loss: 1.294251276131505
Epoch 85, Loss: 1.191974249550084
Epoch 90, Loss: 1.103491323720678
Epoch 95, Loss: 1.0273988338452178
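
The loss falls steadily, which is what we expect when the gradients in `backward` are correct. A more direct way to check such gradients is to compare them against finite differences. The following is a minimal sketch of that check for the softmax and cross-entropy step only; it uses standalone functions and random data rather than the trained network, and the function name `loss_fn` is an assumption made for this example.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    expZ = np.exp(Z - np.max(Z, axis=1, keepdims=True))
    return expZ / np.sum(expZ, axis=1, keepdims=True)

def loss_fn(Z, Y):
    """Mean cross-entropy of softmax(Z) against one-hot labels Y."""
    m = Y.shape[0]
    probs = softmax(Z)
    return -np.sum(np.log(probs[np.arange(m), np.argmax(Y, axis=1)])) / m

rng = np.random.default_rng(0)
m, classes = 4, 10
Z = rng.standard_normal((m, classes))                   # random logits
Y = np.eye(classes)[rng.integers(0, classes, size=m)]   # random one-hot labels

# Analytic gradient of the mean loss with respect to the logits
analytic = (softmax(Z) - Y) / m

# Central finite differences, perturbing one logit at a time
eps = 1e-6
numeric = np.zeros_like(Z)
for i in range(m):
    for j in range(classes):
        Z_plus, Z_minus = Z.copy(), Z.copy()
        Z_plus[i, j] += eps
        Z_minus[i, j] -= eps
        numeric[i, j] = (loss_fn(Z_plus, Y) - loss_fn(Z_minus, Y)) / (2 * eps)

max_diff = np.max(np.abs(analytic - numeric))
print(f"Largest difference between analytic and numeric gradient: {max_diff:.2e}")
```

A very small difference indicates that the analytic expression matches the loss it is meant to differentiate. The same idea extends to the weights, at the cost of one pair of forward passes per parameter.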


## Testing

Finally, we evaluate the trained network on the 10,000 test images, which it did not see during training, and report the fraction it classifies correctly.

In [18]:
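# Preprocess the test set exactly like the training set: flatten and scale to [0, 1]
x_test = read_idx("./t10k-images.idx3-ubyte").reshape(10000, -1) / 255
y_test = np.eye(10)[read_idx("./t10k-labels.idx1-ubyte")]
y_test_pred = nn.forward(x_test)

# The predicted digit is the class with the highest softmax probability
accuracy = np.mean(np.argmax(y_test_pred, axis=1) == np.argmax(y_test, axis=1))
print(f"Test set accuracy: {accuracy * 100:.2f}%")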

Test set accuracy: 79.61%
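
A single accuracy number does not show which digits are confused with which. One possible follow-up is a confusion matrix with per-class accuracy. The sketch below uses a hypothetical helper, `confusion_matrix`, and made-up labels for three classes; the numbers are for illustration only and are not results from the network above.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Count samples for each (true, predicted) pair (hypothetical helper)."""
    counts = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when an index pair repeats
    np.add.at(counts, (true_labels, predicted_labels), 1)
    return counts

# Made-up labels for illustration only
true_labels = np.array([0, 0, 1, 1, 2, 2, 2])
predicted_labels = np.array([0, 1, 1, 1, 2, 0, 2])

counts = confusion_matrix(true_labels, predicted_labels, num_classes=3)
# Row i holds all samples whose true class is i; the diagonal holds the correct ones
per_class_accuracy = np.diag(counts) / counts.sum(axis=1)
print(counts)
print(per_class_accuracy)
```

With the variables from the cell above, `true_labels` would be `np.argmax(y_test, axis=1)`, `predicted_labels` would be `np.argmax(y_test_pred, axis=1)`, and `num_classes` would be 10.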
