#**This question should be answered using PyTorch and the MNIST dataset discussed in class.**

In [1]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

In [2]:
# Download training data from open datasets.
training_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from open datasets
test_data = datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)


Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1007)>

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 9.91M/9.91M [00:00<00:00, 45.4MB/s]


Extracting data/MNIST/raw/train-images-idx3-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Failed to download (trying next):
<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1007)>

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 28.9k/28.9k [00:00<00:00, 1.76MB/s]


Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Failed to download (trying next):
<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1007)>

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 1.65M/1.65M [00:00<00:00, 13.3MB/s]


Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Failed to download (trying next):
<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1007)>

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 4.54k/4.54k [00:00<00:00, 4.50MB/s]

Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw






In [3]:
batch_size = 64

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64
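
With `batch_size = 64` and the `DataLoader` default of `drop_last=False`, each pass over the data ends with one smaller batch. The plain-Python sketch below (no PyTorch needed) works out how many batches each loader yields. The helper `batches_per_epoch` is a hypothetical name used only for illustration. On the real objects, `len(train_dataloader)` should report the same batch count.

```python
import math

# Hypothetical helper for illustration: batch count and last-batch size with drop_last=False.
def batches_per_epoch(num_samples: int, batch_size: int) -> tuple[int, int]:
    """Return (number of batches, size of the final batch)."""
    num_batches = math.ceil(num_samples / batch_size)
    last_batch = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch

# MNIST has 60,000 training images and 10,000 test images.
print(batches_per_epoch(60_000, 64))  # (938, 32)
print(batches_per_epoch(10_000, 64))  # (157, 16)
```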


#**a. Find a suitable configuration of a neural network and fit the model to predict handwritten digits with high accuracy.**

In [39]:
# Get cpu, gpu or mps device for training.
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 1024),
            nn.ReLU(),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)

Using cpu device
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=1024, bias=True)
    (1): ReLU()
    (2): Linear(in_features=1024, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
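
The printed architecture lets you work out the model size by hand. Each `nn.Linear(in, out)` holds `in * out` weights plus `out` biases, and `Flatten` and `ReLU` have no parameters. The sketch below does that arithmetic in plain Python, using the hypothetical helper `count_linear_params`. On the actual model, `sum(p.numel() for p in model.parameters())` should give the same total.

```python
# Hypothetical helper for illustration: parameters in a stack of fully connected layers.
def count_linear_params(layer_sizes: list[int]) -> int:
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix + bias vector
    return total

# 784 inputs -> 1024 -> 512 -> 10 outputs, matching NeuralNetwork above.
print(count_linear_params([28 * 28, 1024, 512, 10]))  # 1333770
```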


In [40]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
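
`nn.CrossEntropyLoss` applies a log-softmax to the 10 logits and takes the negative log-probability of the true class. By default it averages this value over the batch. The single-sample sketch below is written in plain Python for illustration and is not PyTorch's implementation. It explains why the first logged training loss (about 2.31) is close to ln 10 ≈ 2.30: an untrained network gives roughly equal scores to all ten digits.

```python
import math

# Illustrative re-implementation for one sample (not PyTorch's code).
def cross_entropy(logits: list[float], target: int) -> float:
    """Return -log(softmax(logits)[target])."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]

uniform = [0.0] * 10  # equal scores for all 10 digits
print(round(cross_entropy(uniform, target=3), 4))  # 2.3026, i.e. ln(10)

confident = [10.0] + [0.0] * 9  # strongly favours digit 0
print(cross_entropy(confident, target=0) < 0.001)  # True
```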

In [41]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
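
The bracketed counter in the training log is `(batch + 1) * len(X)`, printed every 100 batches. The plain-Python sketch below reproduces the counts printed in each epoch; `logged_progress` is a hypothetical helper. The final partial batch of 32 samples falls between logging points, so it never appears in the log.

```python
# Hypothetical helper for illustration: which sample counts train() prints.
def logged_progress(size: int, batch_size: int, log_every: int = 100) -> list[int]:
    num_batches = -(-size // batch_size)  # ceiling division
    counts = []
    for batch in range(num_batches):
        # len(X) equals batch_size except possibly for the final, smaller batch
        current_len = min(batch_size, size - batch * batch_size)
        if batch % log_every == 0:
            counts.append((batch + 1) * current_len)
    return counts

print(logged_progress(60_000, 64))
# [64, 6464, 12864, 19264, 25664, 32064, 38464, 44864, 51264, 57664]
```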

In [42]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
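
In `test()`, `correct` counts individual samples and is divided by `len(dataloader.dataset)`. `test_loss`, by contrast, sums per-batch mean losses and is divided by the number of batches. The plain-Python sketch below mirrors the accuracy part, `(pred.argmax(1) == y).sum()`, on a made-up batch of three samples with three classes. The helper name and the logits are illustrative only.

```python
# Hypothetical helper for illustration: count rows whose largest logit matches the label.
def count_correct(logits: list[list[float]], labels: list[int]) -> int:
    correct = 0
    for row, label in zip(logits, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # index of the largest logit
        correct += int(predicted == label)
    return correct

batch_logits = [
    [2.0, 0.1, -1.0],  # predicts class 0
    [0.3, 0.2, 1.5],   # predicts class 2
    [0.0, 3.0, 0.5],   # predicts class 1
]
batch_labels = [0, 1, 1]
print(count_correct(batch_logits, batch_labels))  # 2
```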

In [43]:
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.308900  [   64/60000]
loss: 2.300448  [ 6464/60000]
loss: 2.300684  [12864/60000]
loss: 2.282619  [19264/60000]
loss: 2.287493  [25664/60000]
loss: 2.279827  [32064/60000]
loss: 2.270112  [38464/60000]
loss: 2.277222  [44864/60000]
loss: 2.255399  [51264/60000]
loss: 2.248954  [57664/60000]
Test Error: 
 Accuracy: 46.8%, Avg loss: 2.246060 

Epoch 2
-------------------------------
loss: 2.245853  [   64/60000]
loss: 2.233983  [ 6464/60000]
loss: 2.249128  [12864/60000]
loss: 2.202466  [19264/60000]
loss: 2.223453  [25664/60000]
loss: 2.214411  [32064/60000]
loss: 2.191483  [38464/60000]
loss: 2.218281  [44864/60000]
loss: 2.174505  [51264/60000]
loss: 2.160833  [57664/60000]
Test Error: 
 Accuracy: 58.9%, Avg loss: 2.160034 

Epoch 3
-------------------------------
loss: 2.159463  [   64/60000]
loss: 2.137447  [ 6464/60000]
loss: 2.170759  [12864/60000]
loss: 2.083395  [19264/60000]
loss: 2.120805  [25664/60000]
loss: 2.106322  [32064/600
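
Over roughly the first two and a half epochs, the log shows the training loss falling only from about 2.31 to 2.11, which is consistent with a small learning rate. The toy below illustrates the effect on a one-parameter quadratic, not on the MNIST model. Plain gradient descent with `lr=1e-3` barely moves in 100 steps, while a larger step size converges. This notebook does not test whether a larger rate, the `momentum` argument of `torch.optim.SGD`, or more epochs would raise accuracy on this network.

```python
# Toy illustration only: minimize f(w) = w**2 with plain gradient descent.
def gradient_descent(lr: float, steps: int, w0: float = 1.0) -> float:
    w = w0
    for _ in range(steps):
        grad = 2 * w       # derivative of w**2
        w = w - lr * grad  # the update torch.optim.SGD performs without momentum or weight decay
    return w

print(gradient_descent(lr=1e-3, steps=100))  # about 0.819: still far from the minimum at 0
print(gradient_descent(lr=1e-1, steps=100))  # about 2e-10: essentially converged
```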

#**b. Demonstrate prediction for some values.**

In [47]:
import random

# Select a random subset of data points from the test set
num_samples_to_predict = 10
random_indices = random.sample(range(len(test_data)), num_samples_to_predict)

# Wrap the selected indices in a Subset and load them one sample at a time.
subset_test_data = torch.utils.data.Subset(test_data, random_indices)
subset_test_dataloader = DataLoader(subset_test_data, batch_size=1)

# Perform predictions
model.eval()
with torch.no_grad():
    for X, y in subset_test_dataloader:
        X, y = X.to(device), y.to(device)
        pred = model(X)
        predicted_label = pred.argmax(1).item()
        print(f"Predicted label: {predicted_label}, Actual label: {y.item()}")

Predicted label: 7, Actual label: 5
Predicted label: 7, Actual label: 7
Predicted label: 3, Actual label: 3
Predicted label: 9, Actual label: 9
Predicted label: 1, Actual label: 1
Predicted label: 8, Actual label: 8
Predicted label: 4, Actual label: 4
Predicted label: 8, Actual label: 8
Predicted label: 1, Actual label: 1
Predicted label: 0, Actual label: 0
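
Because `random.sample` is called without a seed, each run of the cell shows a different set of ten test images. The sketch below assumes reproducibility is wanted and draws the indices from a seeded `random.Random` instance. The seed value 0 and the helper name `pick_indices` are arbitrary choices for illustration.

```python
import random

# Hypothetical helper for illustration: reproducible choice of k distinct indices.
def pick_indices(n_total: int, k: int, seed: int) -> list[int]:
    rng = random.Random(seed)  # dedicated generator, so global random state is untouched
    return rng.sample(range(n_total), k)

first = pick_indices(10_000, 10, seed=0)
second = pick_indices(10_000, 10, seed=0)
print(first == second)  # True: the same seed selects the same 10 test images
```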


#**c. Print the possible Performance Metrics of the model.**

In [48]:
from sklearn.metrics import classification_report, confusion_matrix

def get_predictions(dataloader, model):
    model.eval()
    all_predictions = []
    all_labels = []
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            predicted_labels = pred.argmax(1).cpu().numpy()
            all_predictions.extend(predicted_labels)
            all_labels.extend(y.cpu().numpy())
    return all_predictions, all_labels


y_pred, y_true = get_predictions(test_dataloader, model)


print(classification_report(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))

              precision    recall  f1-score   support

           0       0.70      0.98      0.81       980
           1       0.76      0.99      0.86      1135
           2       0.85      0.72      0.78      1032
           3       0.63      0.84      0.72      1010
           4       0.82      0.65      0.72       982
           5       1.00      0.09      0.17       892
           6       0.86      0.87      0.86       958
           7       0.74      0.87      0.80      1028
           8       0.79      0.64      0.71       974
           9       0.67      0.72      0.70      1009

    accuracy                           0.75     10000
   macro avg       0.78      0.74      0.71     10000
weighted avg       0.78      0.75      0.72     10000

[[ 956    0    2    6    0    0   10    1    5    0]
 [   0 1119    6    3    0    0    4    1    2    0]
 [  67   93  745   36   19    0   26   22   24    0]
 [  28   17   28  852    0    0    9   31   35   10]
 [  12   26    4    0  634   
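
Each row of the classification report can be derived from the confusion matrix. In `sklearn.metrics.confusion_matrix`, rows are true labels and columns are predicted labels. Precision for a class is its diagonal entry divided by its column sum, and recall is the diagonal entry divided by its row sum. F1 is their harmonic mean, which is why digit 5 scores so low: F1 = 2(1.00)(0.09)/(1.00 + 0.09) ≈ 0.17. The model almost never predicts 5, so most true 5s land in other columns. The sketch below uses a small made-up 3-class matrix, and the helper name is hypothetical.

```python
# Hypothetical helper for illustration: per-class metrics from a confusion matrix
# laid out as sklearn does (rows = true labels, columns = predicted labels).
def per_class_metrics(cm: list[list[int]]) -> list[tuple[float, float, float]]:
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum (support)
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results.append((precision, recall, f1))
    return results

toy_cm = [
    [5, 1, 0],
    [2, 3, 1],
    [0, 0, 4],
]
for label, (p, r, f) in enumerate(per_class_metrics(toy_cm)):
    print(f"class {label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```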

#**Summary**

The notebook trains a feedforward neural network on the MNIST dataset to classify handwritten digits. The main points are:

1. **Model Architecture:** A simple feedforward neural network with two hidden layers (1024 and 512 neurons) and ReLU activations. Each 28x28 input image is flattened into a 784-element vector, and the output layer has 10 neurons (one per digit). Other combinations of layers and neurons that I tried were either slower to train or less accurate.

2. **Training:** The model is trained with mini-batch stochastic gradient descent (SGD, batch size 64) at a learning rate of 1e-3, using the cross-entropy loss. Training runs for 5 epochs. The loss is printed every 100 batches, and the model is evaluated on the test set after each epoch.

3. **Evaluation:** Test-set accuracy and average loss are reported after each epoch. A `classification_report` gives precision, recall, F1-score and support for each digit class. A confusion matrix counts the predictions for every pair of true and predicted digits, both correct and incorrect.

4. **Prediction Demonstration:** The code randomly selects 10 samples from the test set and prints the predicted and actual label for each. In the run shown, 9 of the 10 were correct; the one error was a 5 predicted as a 7.

5. **Performance Metrics:** The `classification_report` gives class-wise metrics. The `confusion_matrix` is a table, not a plot, showing how often each digit is misclassified as another (rows are true labels, columns are predicted labels). The weakest class is digit 5, with a recall of only 0.09.


In essence, the notebook trains and evaluates a basic neural network for handwritten digit recognition, shows examples of its predictions, and measures its performance with common classification metrics.

**The model reaches an overall test accuracy of about 75%** (the best I achieved with this setup).
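
The 75% figure is the overall accuracy. It equals the support-weighted average of per-class recall, because each class's recall weighted by its share of samples sums to total correct divided by total samples. The unweighted (macro) average is lower because digit 5 is recalled only 9% of the time. The check below uses the rounded values from the classification report, so small differences come from that rounding.

```python
# Per-class recall and support, copied (rounded) from the classification report.
recall = [0.98, 0.99, 0.72, 0.84, 0.65, 0.09, 0.87, 0.87, 0.64, 0.72]
support = [980, 1135, 1032, 1010, 982, 892, 958, 1028, 974, 1009]

macro_recall = sum(recall) / len(recall)  # every digit counts equally
weighted_recall = sum(r * s for r, s in zip(recall, support)) / sum(support)  # weighted by class size

print(f"macro avg recall:    {macro_recall:.2f}")     # 0.74
print(f"weighted avg recall: {weighted_recall:.2f}")  # 0.75
```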

