Description: Building a neural network (NN) with PyTorch and understanding the steps involved

Dataset: MNIST

Credits: [Tutorial by Nicholas Renotte (nicknochnack)](https://github.com/nicknochnack/PyTorchin15)

In [1]:
# Import dependencies
import torch 
from PIL import Image
from torch import nn, save, load
from torch.optim import Adam
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

In [2]:
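# Get data
train = datasets.MNIST(root="data", download=True, train=True, transform=ToTensor())
# Wrap the training set in a DataLoader that yields batches of 32 samples
dataset = DataLoader(train, batch_size=32)
# Each image is a tensor of shape (1, 28, 28); labels are the digit classes 0-9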

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz


Extracting data/MNIST/raw/train-images-idx3-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz


Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz


Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz


Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw
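
To see what the DataLoader hands to the training loop, it can help to inspect one batch. The sketch below uses a synthetic stand-in for MNIST (random tensors with the same shapes, an assumption made here to avoid the download); the names `fake_train` and `loader` are illustrative and not part of the notebook.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST (an assumption, used to avoid the download):
# 100 grayscale 28x28 "images" with integer labels in 0-9.
fake_images = torch.rand(100, 1, 28, 28)
fake_labels = torch.randint(0, 10, (100,))
fake_train = TensorDataset(fake_images, fake_labels)

# Same batch size as the notebook's DataLoader.
loader = DataLoader(fake_train, batch_size=32)

# Each iteration yields one (images, labels) pair.
X, y = next(iter(loader))
print(X.shape)      # torch.Size([32, 1, 28, 28])
print(y.shape)      # torch.Size([32])
print(len(loader))  # 4 batches: 32 + 32 + 32 + 4
```

With the real training set the batch shapes are the same; only the number of batches differs.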



In [3]:
# Image Classifier Neural Network
class ImageClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Define the layers of the neural network using nn.Sequential
        self.model = nn.Sequential(
            # First convolutional layer: input channels=1, output channels=32, kernel size=(3,3)
            nn.Conv2d(1, 32, (3, 3)),
            nn.ReLU(),  # ReLU activation function
            # Second convolutional layer: input channels=32, output channels=64, kernel size=(3,3)
            nn.Conv2d(32, 64, (3, 3)),
            nn.ReLU(),  # ReLU activation function
            # Third convolutional layer: input channels=64, output channels=64, kernel size=(3,3)
            nn.Conv2d(64, 64, (3, 3)),
            nn.ReLU(),  # ReLU activation function
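            nn.Flatten(),  # Flatten each sample to a 1D vector (the batch dimension is kept)
            # Each unpadded 3x3 convolution shrinks height and width by 2, so 28 -> 22 after three
            # Fully connected layer: input size=64*(28-6)*(28-6)=30976, output size=10
            nn.Linear(64 * (28 - 6) * (28 - 6), 10)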
        )

    def forward(self, x):
        # Define the forward pass of the neural network
        return self.model(x)

Here's a summary of each part:

### Constructor (__init__) Method:
- Calls super().__init__() to initialize the nn.Module base class that ImageClassifier inherits from.
- Defines a sequential model (self.model) that consists of convolutional layers, ReLU activation functions, and a fully connected layer.

### Sequential Model (self.model):
Convolutional Layers:
- The first convolutional layer has 1 input channel (assumed to be grayscale images), produces 32 output channels, and uses a 3x3 kernel.
- The second convolutional layer takes 32 input channels, produces 64 output channels, and uses a 3x3 kernel.
- The third convolutional layer takes 64 input channels, produces 64 output channels, and uses a 3x3 kernel.
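
The spatial size after each convolution follows the standard formula out = (in + 2 * padding - kernel) // stride + 1. The helper below is a small illustrative function (its name, `conv_output_size`, is made up here) that applies the formula to the three layers, assuming the defaults of stride 1, no padding, and no dilation used in this notebook.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution (dilation of 1 assumed)."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 28  # MNIST images are 28x28
for layer in range(1, 4):
    size = conv_output_size(size, kernel_size=3)
    print(f"after conv {layer}: {size}x{size}")
# after conv 1: 26x26
# after conv 2: 24x24
# after conv 3: 22x22

# 64 output channels of 22x22 each give the input size of the linear layer.
flattened = 64 * size * size
print(flattened)  # 30976
```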

ReLU Activation Functions:
- Rectified Linear Unit (ReLU) activation functions follow each convolutional layer.
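
As a quick illustration, ReLU replaces every negative value with zero and leaves positive values unchanged. The input values below are arbitrary.

```python
import torch
from torch import nn

relu = nn.ReLU()
x = torch.tensor([-2.0, -0.5, 0.0, 1.5])

# Negative entries become 0, non-negative entries pass through.
out = relu(x)
print(out.tolist())  # [0.0, 0.0, 0.0, 1.5]
```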

Flatten Layer:
- Flattens the output of the convolutional layers into a 1D vector per sample; the batch dimension is kept.

Fully Connected Layer:
- A fully connected (linear) layer takes the flattened input of size 64 * 22 * 22 = 30976 (the output shape of the last convolutional layer) and produces 10 outputs, one raw score (logit) per digit class 0-9.
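
The 10 outputs are unnormalized scores rather than probabilities. A minimal sketch of how they can be read, using made-up scores for a single image: softmax turns them into probabilities, and argmax picks the predicted class.

```python
import torch

# Made-up scores for one image (batch of 1, 10 classes); class 1 scores highest.
logits = torch.tensor([[1.0, 3.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])

probs = torch.softmax(logits, dim=1)  # probabilities that sum to 1 per row
pred = torch.argmax(logits, dim=1)    # index of the largest score

print(pred.item())  # 1
```

Softmax preserves the ordering of the scores, so argmax over the logits and over the probabilities gives the same class.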

### Forward Method (forward):
- Defines the forward pass of the neural network by applying the layers defined in the constructor (self.model) to the input tensor x.

This code assumes input images of size 28x28 pixels (MNIST-like), and it's designed for a classification task with 10 output classes. The specific architecture and hyperparameters can be adjusted based on the requirements of the task at hand.
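
One way to check the architecture described above is to push a dummy batch through the layers one at a time and print the shape after each. This sketch repeats the class definition so it runs on its own, uses the CPU, and feeds random values in place of real images.

```python
import torch
from torch import nn


class ImageClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Conv2d(1, 32, (3, 3)), nn.ReLU(),
            nn.Conv2d(32, 64, (3, 3)), nn.ReLU(),
            nn.Conv2d(64, 64, (3, 3)), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * (28 - 6) * (28 - 6), 10),
        )

    def forward(self, x):
        return self.model(x)


clf = ImageClassifier()
x = torch.rand(4, 1, 28, 28)  # dummy batch of 4 grayscale 28x28 images

# Apply the layers one by one and report the shape after each.
for layer in clf.model:
    x = layer(x)
    print(f"{type(layer).__name__:<8} -> {tuple(x.shape)}")
# The final line printed is: Linear   -> (4, 10)
```

If the input size or the number of convolutions changes, the first argument of nn.Linear has to change with it; this loop makes a mismatch easy to spot.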

In [4]:
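# Instance of the neural network, loss, optimizer
# Use the GPU when one is available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
clf = ImageClassifier().to(device)
opt = Adam(clf.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()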
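
nn.CrossEntropyLoss expects raw logits and integer class labels, and applies log-softmax internally, which is why the network has no softmax layer of its own. The sketch below checks this on made-up values with 3 classes (instead of 10, for brevity) by comparing against an explicit log-softmax followed by negative log-likelihood.

```python
import torch
import torch.nn.functional as F
from torch import nn

# Made-up logits for 2 samples and 3 classes, with their true labels.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])

loss = nn.CrossEntropyLoss()(logits, targets)

# The same quantity computed in two explicit steps.
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss, manual))  # True
```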

In [5]:
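if __name__ == "__main__":
    # Train on whichever device the model's parameters live on
    device = next(clf.parameters()).device

    # Training loop for 10 epochs
    for epoch in range(10):
        for batch in dataset:
            X, y = batch
            X, y = X.to(device), y.to(device)
            yhat = clf(X)  # Forward pass: raw class scores (logits)
            loss = loss_fn(yhat, y)

            # Apply backpropagation
            opt.zero_grad()  # Clear gradients left over from the previous batch
            loss.backward()  # Compute gradients of the loss w.r.t. the parameters
            opt.step()       # Update the parameters

        # Note: this is the loss of the last batch in the epoch, not an epoch average
        print(f"Epoch:{epoch} loss is {loss.item()}")

    # Save the trained model
    with open('model_state.pt', 'wb') as f:
        save(clf.state_dict(), f)

    # Load the saved model
    with open('model_state.pt', 'rb') as f:
        clf.load_state_dict(load(f))

    # Load an image and make a prediction
    # (the image must be 28x28 with a single channel to match the network)
    img = Image.open('img_3.jpg')
    img_tensor = ToTensor()(img).unsqueeze(0).to(device)

    # Make a prediction on the image; gradients are not needed for inference
    with torch.no_grad():
        print(torch.argmax(clf(img_tensor)))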

Epoch:0 loss is 0.020670462399721146
Epoch:1 loss is 0.0012369464384391904
Epoch:2 loss is 0.0040187956765294075
Epoch:3 loss is 3.8196114473976195e-05
Epoch:4 loss is 0.00019526893447618932
Epoch:5 loss is 2.0104871509829536e-05
Epoch:6 loss is 8.828911290947872e-07
Epoch:7 loss is 3.0397582122532185e-06
Epoch:8 loss is 8.940689610881236e-08
Epoch:9 loss is 6.444721520892926e-07
tensor(9, device='cuda:0')


Training Loop:
- The code iterates through 10 epochs, where each epoch processes batches of data from the dataset.
- For each batch, it performs a forward pass (clf(X)) to get predictions and calculates the loss using loss_fn. It then clears the old gradients (opt.zero_grad()), backpropagates (loss.backward()), and updates the weights (opt.step()).
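
A single training step can be run in isolation to see its effect. The sketch below is not the notebook's model: it assumes a small linear stand-in and one random batch so that it runs quickly on the CPU, and it checks that the step produces gradients and changes the weights.

```python
import torch
from torch import nn
from torch.optim import Adam

torch.manual_seed(0)

# Small stand-in model (an assumption, chosen for speed).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
opt = Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One random batch shaped like MNIST.
X = torch.rand(32, 1, 28, 28)
y = torch.randint(0, 10, (32,))

before = model[1].weight.detach().clone()

yhat = model(X)          # forward pass
loss = loss_fn(yhat, y)  # scalar loss for the batch
opt.zero_grad()          # clear gradients from any previous step
loss.backward()          # fill .grad on every parameter
opt.step()               # update parameters using those gradients

print(model[1].weight.grad.shape)            # torch.Size([10, 784])
print(torch.equal(before, model[1].weight))  # False: the weights moved
```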

Print Epoch Loss:
- After each epoch, it prints the epoch number and the loss of the last batch processed in that epoch (not an average over the epoch).
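
The value printed in the loop comes from a single batch, so it can fluctuate from epoch to epoch. A possible alternative, sketched here on synthetic data with a small stand-in model (both are assumptions, not the notebook's setup), is to accumulate a sample-weighted mean over the whole epoch.

```python
import torch
from torch import nn
from torch.optim import Adam
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic data and a small stand-in model, for illustration only.
data = TensorDataset(torch.rand(96, 1, 28, 28), torch.randint(0, 10, (96,)))
loader = DataLoader(data, batch_size=32)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
opt = Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(2):
    total_loss, n_samples = 0.0, 0
    for X, y in loader:
        loss = loss_fn(model(X), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        # Weight each batch by its size so a short final batch is not over-counted.
        total_loss += loss.item() * X.size(0)
        n_samples += X.size(0)
    mean_loss = total_loss / n_samples
    print(f"Epoch:{epoch} mean loss is {mean_loss:.4f}")
```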

Save Model:
- It saves the state dictionary of the trained model (clf.state_dict()) to a file named 'model_state.pt'.

Load Model:
- It loads the saved model state back into the clf model.
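
The save and load steps can be checked with a round trip. This sketch uses an in-memory buffer instead of a file on disk and a tiny nn.Linear in place of the classifier (both chosen here for brevity); the restored model should hold identical weights.

```python
import io

import torch
from torch import nn

model = nn.Linear(4, 2)

# Save only the parameters (the state dict), not the whole model object.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)  # rewind before reading

# A fresh model with the same architecture receives the saved parameters.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(buffer))

print(torch.equal(model.weight, restored.weight))  # True
print(torch.equal(model.bias, restored.bias))      # True
```

Because only the state dict is stored, the class definition must be available when loading, and the architecture must match the saved one.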

Make Prediction on an Image:
- It loads an image ('img_3.jpg'), converts it to a PyTorch tensor using ToTensor(), adds a batch dimension with unsqueeze(0), and moves it to the same device as the model (the GPU, 'cuda', in this run). The image must be 28x28 with a single channel to match the network's input.
- It then makes a prediction using the trained model (clf) and prints the index of the maximum value in the prediction tensor (argmax), which corresponds to the predicted class.

In [8]:
# Predict a second image; it must also be 28x28 with a single channel
img = Image.open('img_1.jpg')
img_tensor = ToTensor()(img).unsqueeze(0).to(next(clf.parameters()).device)

print(torch.argmax(clf(img_tensor)))

tensor(2, device='cuda:0')
