# MNIST analysis

## Imports
The first step is to import the necessary libraries. We will use:
1. **numpy**: for numerical operations
2. **matplotlib**: for plotting
3. **PyTorch** (imported as `torch`): for neural networks
4. **torchvision**: for loading the MNIST dataset and applying image transforms

These can be installed by running the following command in the terminal:
```bash
pip install -r requirements.txt
```
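
Before running the notebook, it can help to confirm that the packages are importable. The sketch below assumes the import names `numpy`, `matplotlib`, `torch` and `torchvision` (the contents of `requirements.txt` are not shown here, so this list is an assumption), and `missing_packages` is a hypothetical helper name.

```python
import importlib.util

# Import names assumed from the library list above (PyTorch is imported as "torch").
REQUIRED = ["numpy", "matplotlib", "torch", "torchvision"]

def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```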

- Displaying the first MNIST image

In [1]:
import torch
import torchvision
import numpy as np

# Load the MNIST dataset
mnist_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=None, target_transform=None, download=True)

# Get the first image from the dataset
image, label = mnist_dataset[0]

# Display the image (PIL's show() opens it in an external viewer)
image.show()

# display 10 images from the dataset
# for i in range(10):
# 	image, label = mnist_dataset[i]
# 	image.show()
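
Calling `image.show()` in a loop opens one viewer window per image. As an alternative, the sketch below draws several images in a single matplotlib grid. It uses random placeholder arrays so that it runs without downloading anything, and `plot_image_grid` is a hypothetical helper, not part of the original notebook.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

def plot_image_grid(images, labels, cols=5):
    """Draw grayscale images in a grid, one subplot per image, and return the Figure."""
    rows = (len(images) + cols - 1) // cols
    fig, axes = plt.subplots(rows, cols, figsize=(cols * 1.5, rows * 1.7), squeeze=False)
    for ax in axes.flat:
        ax.axis("off")  # hide ticks, including on unused subplots
    for ax, img, lab in zip(axes.flat, images, labels):
        ax.imshow(img, cmap="gray", vmin=0, vmax=255)
        ax.set_title(str(lab))
    return fig

# Placeholder data: ten random 28x28 images with made-up labels.
rng = np.random.default_rng(0)
demo_images = [rng.integers(0, 256, size=(28, 28), dtype=np.uint8) for _ in range(10)]
demo_labels = list(range(10))

fig = plot_image_grid(demo_images, demo_labels)
buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # render to memory to confirm the figure draws
```

With the real dataset, the placeholder lists could be replaced by `[np.array(mnist_dataset[i][0]) for i in range(10)]` and `[mnist_dataset[i][1] for i in range(10)]`.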

- Displaying the image in form of pixels

To analyse the image, we need to convert it into an array of pixels. We can do this with the **numpy** library: passing the image to **np.array** returns a 28x28 array in which each entry is a grayscale intensity between 0 and 255.

In [2]:
# displaying the image in form of 28x28 matrix of pixels
np.array(image)

array([[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   3,
         18,  18,  18, 126, 136, 175,  26, 166, 255, 247, 127,   0,   0,
          0,   0],
       [  

The image and the array above contain the same data. The image is rendered as a 28x28 grid of pixels, while the array lists the same 28x28 grayscale intensities as numbers (0 is black, 255 is white). Flattened, the array has 784 elements. Both represent the digit 5.
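
The relationship between the 28x28 matrix and its 784-element flattened form can be sketched with **reshape**. The array below is a placeholder standing in for `np.array(image)`; only the four non-zero values are copied from the output above, and everything else is assumed to be zero for illustration.

```python
import numpy as np

# Placeholder 28x28 grayscale image standing in for np.array(image).
pixels = np.zeros((28, 28), dtype=np.uint8)
pixels[5, 12:16] = [3, 18, 18, 18]  # a few intensities taken from the sixth row shown above

flat = pixels.reshape(-1)        # 1D array with 28 * 28 = 784 elements
restored = flat.reshape(28, 28)  # back to the 2D matrix

print(pixels.shape, flat.shape)  # (28, 28) (784,)
```

Row `r`, column `c` of the matrix ends up at index `r * 28 + c` of the flattened array, so no information is lost in either direction.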

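## Finding the occurrences of each digit

The cell below trains a small convolutional network on the MNIST training set, reports its accuracy on the test set, and then counts how often each digit is predicted across the test images.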

In [4]:
# Step 1: Import the necessary libraries
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import numpy as np

# Step 2: Load the MNIST dataset
transform = transforms.ToTensor()

train_data = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_data = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_data, batch_size=200, shuffle=True)
test_loader = DataLoader(test_data, batch_size=200, shuffle=False)

# Step 3: Define the CNN model
class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
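        # 1 input channel -> 32 feature maps, 5x5 kernel: 28x28 becomes 24x24
        self.conv1 = nn.Conv2d(1, 32, 5)
        # After the 2x2 max pooling in forward(), 24x24 becomes 12x12, hence 12*12*32 inputs
        self.fc1 = nn.Linear(12*12*32, 128)
        # 10 outputs, one logit per digit class
        self.fc2 = nn.Linear(128, 10)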

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 12*12*32)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

model = ConvNet()

# Step 4: Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Step 5: Train the model
for epoch in range(10):
    for i, (images, labels) in enumerate(train_loader):
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Step 6: Evaluate the model
model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        outputs = model(images)
        predicted = outputs.argmax(dim=1)  # index of the largest logit is the predicted digit
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Test Accuracy: {} %'.format(100 * correct / total))

# Step 7: Predict on the test data and count how often each digit is predicted
with torch.no_grad():
    all_predicted = []
    for images, labels in test_loader:
        outputs = model(images)
        predicted = outputs.argmax(dim=1)  # index of the largest logit is the predicted digit
        all_predicted += predicted.tolist()

(unique, counts) = np.unique(all_predicted, return_counts=True)
frequencies = np.asarray((unique, counts)).T
print(frequencies)

Test Accuracy: 98.81 %
[[   0  992]
 [   1 1139]
 [   2 1038]
 [   3 1032]
 [   4  968]
 [   5  879]
 [   6  944]
 [   7 1026]
 [   8  977]
 [   9 1005]]
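
The counts above describe the model's predictions, not the true labels. A minimal sketch of how the two could be compared side by side is shown below. It uses short, made-up label lists in place of the real test set, `digit_counts` is a hypothetical helper, and `np.bincount` is swapped in for `np.unique` because it also reports digits that never occur.

```python
import numpy as np

def digit_counts(labels, num_classes=10):
    """Count how often each digit 0..num_classes-1 appears, including zero counts."""
    return np.bincount(np.asarray(labels, dtype=np.int64), minlength=num_classes)

# Made-up example labels; in the notebook these would be collected from test_loader.
true_labels = [5, 0, 4, 1, 9, 2, 1, 3, 1, 4]
predicted_labels = [5, 0, 4, 1, 9, 2, 1, 3, 7, 4]  # one "1" misclassified as "7"

true_counts = digit_counts(true_labels)
pred_counts = digit_counts(predicted_labels)

print("digit  true  predicted")
for digit in range(10):
    print(f"{digit:5d}  {true_counts[digit]:4d}  {pred_counts[digit]:9d}")
```

A digit whose predicted count differs from its true count has been over- or under-predicted. Equal counts do not by themselves mean every prediction was correct, since errors in opposite directions can cancel out.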
