# Implementing a Neural Network Model to classify a given image into 1 of the 10 categories of the Fashion MNIST dataset

## Step 1: Importing the dataset

In [1]:
path = r"C:\NumberMNIST"

print("Path to dataset files:", path)

Path to dataset files: C:\NumberMNIST


## Step 2: Importing necessary libraries and loading the dataset

In [2]:
import numpy as np
import struct
import os

def load_images(file_path):
    with open(file_path, 'rb') as f:
        magic, num_images, rows, cols = struct.unpack('>IIII', f.read(16))
        images = np.frombuffer(f.read(), dtype=np.uint8).reshape(num_images, rows * cols)
    return images

def load_labels(file_path):
    with open(file_path, 'rb') as f:
        magic, num_labels = struct.unpack('>II', f.read(8))
        labels = np.frombuffer(f.read(), dtype=np.uint8)
    return labels

os.listdir(path) # List files in the dataset directory to verify paths


['t10k-images-idx3-ubyte',
 't10k-images.idx3-ubyte',
 't10k-labels-idx1-ubyte',
 't10k-labels.idx1-ubyte',
 'train-images-idx3-ubyte',
 'train-images.idx3-ubyte',
 'train-labels-idx1-ubyte',
 'train-labels.idx1-ubyte']

In [3]:
X_train = load_images(path + r"\train-images.idx3-ubyte")
y_train = load_labels(path + r"\train-labels.idx1-ubyte")

X_test = load_images(path + r"\t10k-images.idx3-ubyte")
y_test = load_labels(path + r"\t10k-labels.idx1-ubyte")

print("Train shape:", X_train.shape)
print("Test shape:", X_test.shape)
print("Train labels shape:", y_train.shape)
print("Test labels shape:", y_test.shape)

Train shape: (60000, 784)
Test shape: (10000, 784)
Train labels shape: (60000,)
Test labels shape: (10000,)


## Step 3: Data Preprocessing

**Flattening** is the process of converting a multi-dimensional input (such as a 2D image matrix) into a one-dimensional vector so that it can be fed into a fully connected neural network.

Here, that means converting each 28×28 image into a 784-dimensional vector (28 × 28 = 784).

**In this case, no separate flattening step is needed: `load_images` already reshapes the images to (num_samples, 784)!**
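As a minimal sketch of what flattening does, the snippet below reshapes a small stack of 2D images into row vectors. The array `images_2d` is a hypothetical stand-in filled with random pixel values; it is not part of the dataset loaded above.

```python
import numpy as np

# Hypothetical stand-in: three random 28x28 grayscale images.
rng = np.random.default_rng(0)
images_2d = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

# Keep the sample axis and merge the row and column axes into one.
images_flat = images_2d.reshape(images_2d.shape[0], -1)
print(images_flat.shape)

# With NumPy's default row-major order, pixel (r, c) lands at index r * 28 + c.
r, c = 5, 7
same_pixel = images_flat[0, r * 28 + c] == images_2d[0, r, c]
print(same_pixel)
```

The reshape only changes how the same bytes are indexed, so no pixel values are altered.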

In [4]:
np.min(X_train), np.max(X_train) # Check pixel value range

(0, 255)

In [5]:
# Normalize pixel values to [0, 1]
X_train = X_train / 255.0
X_test = X_test / 255.0

# Why 255.0? Because pixel values in the dataset are in the range [0, 255], 
# and dividing by 255.0 scales them to the range [0, 1], 
# which is beneficial for training neural networks.
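
A small check of the scaling step, using a hypothetical three-pixel array rather than the real dataset. It also shows that dividing a `uint8` array by `255.0` yields `float64`; casting to `float32` is an optional choice to halve memory use, not something the cells above do.

```python
import numpy as np

# Hypothetical pixels covering the minimum, a mid value, and the maximum.
pixels = np.array([[0, 128, 255]], dtype=np.uint8)

scaled = pixels / 255.0
print(scaled.min(), scaled.max(), scaled.dtype)

# Optional: store the scaled values as float32 to use half the memory.
scaled32 = scaled.astype(np.float32)
print(scaled32.dtype, scaled32.nbytes, scaled.nbytes)
```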

In [6]:
print(f"Unique labels in y_train: {np.unique(y_train)}")

Unique labels in y_train: [0 1 2 3 4 5 6 7 8 9]


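The labels are integer class IDs (0 to 9), but the network outputs one score per class. We therefore one-hot encode the labels, so that each target becomes a 10-element vector that can be compared directly with the network's output.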

In [7]:
# One-hot encode labels

def one_hot_encode(y, num_classes):
    one_hot_matrix = np.zeros((y.shape[0], num_classes))
    for i in range(y.shape[0]):
        one_hot_matrix[i, y[i]] = 1
    return one_hot_matrix

y_train_encoded = one_hot_encode(y_train, 10)
y_test_encoded = one_hot_encode(y_test, 10)

print("One-hot encoded train labels shape:", y_train_encoded.shape)
print("One-hot encoded test labels shape:", y_test_encoded.shape)

One-hot encoded train labels shape: (60000, 10)
One-hot encoded test labels shape: (10000, 10)


**What One-Hot Encoding Does:**

If:

y = [2, 0, 3]
num_classes = 4


Then output becomes:

[[0 0 1 0]
[1 0 0 0]
[0 0 0 1]]

where the position of the 1 in each row denotes the class (e.g., index 2 -> Class 2).
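
The same example can be reproduced without a Python loop. This is a sketch of one possible vectorized variant; the name `one_hot_vectorized` is hypothetical and is not used elsewhere in this notebook.

```python
import numpy as np

def one_hot_vectorized(y, num_classes):
    """Return a (len(y), num_classes) matrix with a single 1 per row."""
    y = np.asarray(y)
    one_hot = np.zeros((y.shape[0], num_classes))
    # Integer-array indexing: for row i, set column y[i] to 1.
    one_hot[np.arange(y.shape[0]), y] = 1
    return one_hot

example = one_hot_vectorized([2, 0, 3], 4)
print(example)
```

For 60,000 labels the loop version is already fast enough, so the vectorized form is mainly a matter of style.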

**-> One-hot targets are the standard form for computing cross-entropy loss in multi-class classification: the loss compares each target vector with the predicted class probabilities.**
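
To make the link to cross-entropy concrete, here is a minimal sketch with made-up probabilities for three classes. The function name `cross_entropy` and the small constant `eps` (added to avoid `log(0)`) are assumptions for illustration. Multiplying by the one-hot target selects the log-probability of the true class, which gives the same number as indexing by the integer label.

```python
import numpy as np

def cross_entropy(probs, y_one_hot, eps=1e-12):
    """Mean cross-entropy between predicted probabilities and one-hot targets."""
    return -np.mean(np.sum(y_one_hot * np.log(probs + eps), axis=1))

# Hypothetical predictions for two samples over three classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
y_one_hot = np.eye(3)[labels]  # rows of the identity matrix are one-hot vectors

loss_one_hot = cross_entropy(probs, y_one_hot)

# Equivalent computation that indexes by the integer labels directly.
loss_indexed = -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

print(loss_one_hot, loss_indexed)
```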

Initially, the label arrays had these shapes:

Train labels shape: (60000,)
Test labels shape: (10000,)

i.e., they were 1D arrays.

After one-hot encoding:

One-hot encoded train labels shape: (60000, 10)
One-hot encoded test labels shape: (10000, 10)


i.e., they are 2D matrices now.

Now the question is: **Why Do We Need 2D Matrices?**

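It's because we train on many samples at once, not one image at a time. Each row of the label matrix is the target for one sample, which matches the network's output of one row of 10 class scores per sample.

Neural networks use matrix multiplication to process such batches efficiently.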
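
As a rough illustration of batched processing, the sketch below pushes a batch through a single linear layer with one matrix multiplication and checks it against a per-sample loop. The batch size of 64, the random inputs, and the names `W` and `b` are assumptions for this example, not the network built later.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, num_features, num_classes = 64, 784, 10

X_batch = rng.random((batch_size, num_features))             # one row per sample
W = rng.standard_normal((num_features, num_classes)) * 0.01  # hypothetical weights
b = np.zeros(num_classes)                                    # hypothetical biases

# One matrix multiplication handles the whole batch: (64, 784) @ (784, 10).
logits = X_batch @ W + b
print(logits.shape)

# The same result computed one sample at a time.
logits_loop = np.stack([x @ W + b for x in X_batch])
print(np.allclose(logits, logits_loop))
```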

In [8]:
# Quick dataset analysis

unique, counts = np.unique(y_train, return_counts=True)

for u, c in zip(unique, counts):
    print(f"Class {u}: {c} samples")

# zip() is a built-in Python function that lets you iterate over 
# multiple sequences (or lists) at the same time.

Class 0: 5923 samples
Class 1: 6742 samples
Class 2: 5958 samples
Class 3: 6131 samples
Class 4: 5842 samples
Class 5: 5421 samples
Class 6: 5918 samples
Class 7: 6265 samples
Class 8: 5851 samples
Class 9: 5949 samples


Here, each class has roughly 6,000 samples (from 5,421 for Class 5 to 6,742 for Class 1), hence the dataset is **mostly balanced**.

Before we move on to the next step, we have three options for training: 

1. Batch Gradient Descent (use all 60,000 samples at once)
2. Mini-Batch Gradient Descent (use small batches like 32/64/128)
3. Stochastic Gradient Descent (1 sample at a time)

We will pick the second one here: it typically converges faster than full-batch gradient descent, gives more stable updates than SGD, and often generalises better.
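
The three options above differ only in batch size, which determines how many weight updates happen per pass over the data. The following sketch counts those updates for 60,000 samples; the helper `iterate_minibatches` and the batch size of 64 are assumptions chosen for illustration.

```python
import numpy as np

def iterate_minibatches(num_samples, batch_size, rng):
    """Yield index arrays that together cover a shuffled dataset once."""
    indices = rng.permutation(num_samples)
    for start in range(0, num_samples, batch_size):
        # The last batch may be smaller if batch_size does not divide num_samples.
        yield indices[start:start + batch_size]

rng = np.random.default_rng(0)
num_samples = 60000
updates = {}
for name, batch_size in [("batch", num_samples), ("mini-batch", 64), ("stochastic", 1)]:
    updates[name] = sum(1 for _ in iterate_minibatches(num_samples, batch_size, rng))
    print(f"{name}: {updates[name]} updates per epoch")
```

Each yielded index array would be used to slice both the inputs and the one-hot labels, so that features and targets stay aligned after shuffling.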

## Step 4: Implementing Mini-Batch Training