# Stochastic Regression Lab
In this lab you will use logistic regression, trained with stochastic gradient descent, to classify our favourite dataset, the MNIST handwritten digits

The Python library we will be using is PyTorch

Use the PyTorch 2.0.0 Python 3.10 CPU Optimized kernel

## Step One -- Import the libraries we need


In [None]:
import torch
import torchvision.transforms as transforms
from torchvision import datasets
import matplotlib.pyplot as plt
import pandas as pd


## Step Two -- Getting the Data

The MNIST dataset is available through torchvision, the companion package to PyTorch

Load the 60,000 images for training


In [None]:
train_dataset = datasets.MNIST(root='./data', 
                               train=True, 
                               transform=transforms.ToTensor(),
                               download=True)

Now load the 10,000 images for testing

In [3]:
test_dataset = datasets.MNIST(root='./data', 
                              train=False, 
                              transform=transforms.ToTensor())

Confirm the number of images in the data sets

In [None]:
print("number of training samples: " + str(len(train_dataset)) + "\n" +
      "number of testing samples: " + str(len(test_dataset)))

Inspect the shape of the first training sample: it is a 28 x 28 greyscale image (stored as a 1 x 28 x 28 tensor) paired with a label

In [None]:
print("datatype of the 1st training sample: ", train_dataset[0][0].type())
print("size of the 1st training sample: ", train_dataset[0][0].size())

And the labels of the first two samples

In [None]:
print("label of the first training sample: ", train_dataset[0][1])
print("label of the second training sample: ", train_dataset[1][1])

Display the two images to confirm what they look like

In [None]:
img_5 = train_dataset[0][0].numpy().reshape(28, 28)
plt.imshow(img_5, cmap='gray')
plt.show()
img_0 = train_dataset[1][0].numpy().reshape(28, 28)
plt.imshow(img_0, cmap='gray')
plt.show()

## Loading the Data
The data will be read in batches of 32 samples by using a DataLoader class

In [11]:
from torch.utils.data import DataLoader

batch_size = 32
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)
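
To see how a DataLoader splits a dataset into batches, here is a minimal sketch. It uses a small set of random tensors as a stand-in for MNIST (the names `fake_images`, `fake_labels` and `fake_dataset` are made up for illustration), so it runs without downloading anything.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 100 random "images" and labels, not real MNIST
fake_images = torch.rand(100, 1, 28, 28)
fake_labels = torch.randint(0, 10, (100,))
fake_dataset = TensorDataset(fake_images, fake_labels)

loader = DataLoader(dataset=fake_dataset, batch_size=32, shuffle=False)

# Collect the shape of the image tensor in every batch
batch_shapes = [tuple(images.shape) for images, labels in loader]
print("number of batches:", len(loader))
print("batch shapes:", batch_shapes)

# The same arithmetic applied to the 60,000 MNIST training images
mnist_train_batches = math.ceil(60000 / 32)
print("batches per epoch on the MNIST training set:", mnist_train_batches)
```

The last batch is smaller than the others whenever the dataset size is not a multiple of the batch size.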

## Defining the model

Notice that the model is Logistic Regression: a single linear layer followed by the Sigmoid function

In [12]:
class LogisticRegression(torch.nn.Module):    
   
    def __init__(self, n_inputs, n_outputs):
        super(LogisticRegression, self).__init__()
        self.linear = torch.nn.Linear(n_inputs, n_outputs)
   
    def forward(self, x):
        y_pred = torch.sigmoid(self.linear(x))
        return y_pred

The previous step defined a module to do the work, so now you just need to create an instance of it

Notice that each 28x28 array of pixels is flattened into a single vector of 784 inputs, and there are 10 outputs, one per digit

In [13]:
n_inputs = 28*28
n_outputs = 10
log_regr = LogisticRegression(n_inputs, n_outputs)
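
The flattening of a 28x28 image into a vector of 784 values can be sketched as follows. The tensor here is random data standing in for one MNIST sample, and `view(-1, 28 * 28)` is the same call the training loop in this lab uses.

```python
import torch

# Stand-in for one MNIST sample: 1 channel, 28 x 28 pixels (random values, not real data)
sample = torch.rand(1, 28, 28)

# -1 lets PyTorch infer the batch dimension; each image becomes a row of 784 values
flat = sample.view(-1, 28 * 28)

print("original size:", sample.size())
print("flattened size:", flat.size())
```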

## Training the Model

Now the actual training begins.
- The loss function is cross-entropy loss
- The optimizer is stochastic gradient descent (SGD) with a learning rate of 0.001
- Training runs in epochs, each one a full pass over the training data; you will train for 50 epochs
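
As a rough illustration of what one stochastic gradient descent update does, the sketch below applies a single step to a toy parameter and compares it with the hand-computed rule `w - lr * grad`. The parameter `w` and the squared loss are made up for the example; only the learning rate of 0.001 matches this lab.

```python
import torch

# Toy parameter and learning rate (the parameter values are arbitrary)
w = torch.tensor([1.0, -2.0], requires_grad=True)
lr = 0.001
optimizer = torch.optim.SGD([w], lr=lr)

# Toy loss: sum of squares, whose gradient with respect to w is 2 * w
loss = (w ** 2).sum()

optimizer.zero_grad()   # clear any old gradients
loss.backward()         # compute the gradient of the loss

# Manual update computed before the optimizer changes w in place
expected = w.detach() - lr * w.grad

optimizer.step()        # plain SGD update: w <- w - lr * grad

print("updated by optimizer:", w.detach())
print("updated by hand:     ", expected)
```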


In [14]:
optimizer = torch.optim.SGD(log_regr.parameters(), lr=0.001)
criterion = torch.nn.CrossEntropyLoss()
epochs = 50
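
One detail worth knowing about `torch.nn.CrossEntropyLoss` is that it applies a log-softmax to its input internally, so it is usually given the raw outputs of the linear layer. The model in this lab passes sigmoid outputs instead, which all lie between 0 and 1. The sketch below, using made-up scores for a 3-class toy problem, compares the two cases and computes the smallest loss reachable with 10 classes when every score is confined to that range.

```python
import math
import torch
import torch.nn.functional as F

# Made-up raw scores (logits) for one sample in a 3-class toy problem
logits = torch.tensor([[2.0, 0.5, -1.0]])
target = torch.tensor([0])

criterion = torch.nn.CrossEntropyLoss()

# CrossEntropyLoss on raw scores equals the negative log-softmax of the true class
loss_builtin = criterion(logits, target)
loss_manual = -F.log_softmax(logits, dim=1)[0, 0]

# Passing the scores through a sigmoid first squeezes them into (0, 1)
loss_after_sigmoid = criterion(torch.sigmoid(logits), target)

# With 10 classes and all scores in (0, 1), the loss cannot drop below ln(1 + 9/e)
loss_floor_10_classes = math.log(1 + 9 / math.e)

print(loss_builtin.item(), loss_manual.item(), loss_after_sigmoid.item())
print("loss floor with sigmoid outputs and 10 classes:", loss_floor_10_classes)
```

This suggests that the loss values printed during training are not expected to approach zero with this model, even when accuracy is high.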

In [None]:
Loss = []   # loss of the last training batch in each epoch
acc = []    # test accuracy (%) after each epoch
for epoch in range(epochs):
    # One pass over the training data, with one SGD update per batch
    for i, (images, labels) in enumerate(train_loader):
        optimizer.zero_grad()
        outputs = log_regr(images.view(-1, 28*28))
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    Loss.append(loss.item())
    # Evaluate on the test set; gradients are not needed here
    correct = 0
    with torch.no_grad():
        for images, labels in test_loader:
            outputs = log_regr(images.view(-1, 28*28))
            _, predicted = torch.max(outputs, 1)
            correct += (predicted == labels).sum().item()
    accuracy = 100 * correct / len(test_dataset)
    acc.append(accuracy)
    print('Epoch: {}. Loss: {}. Accuracy: {}'.format(epoch, loss.item(), accuracy))
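
The accuracy calculation in the loop above picks the class with the highest output for each sample and counts how often it matches the label. Here is a minimal sketch on made-up outputs for 4 samples and 3 classes (the numbers are invented for illustration).

```python
import torch

# Made-up model outputs: one row per sample, one column per class
outputs = torch.tensor([[0.1, 0.7, 0.2],
                        [0.9, 0.05, 0.05],
                        [0.3, 0.3, 0.4],
                        [0.2, 0.5, 0.3]])
labels = torch.tensor([1, 0, 2, 2])

# torch.max along dimension 1 returns (max values, index of the max) for each row
_, predicted = torch.max(outputs, 1)

correct = (predicted == labels).sum().item()
accuracy = 100 * correct / len(labels)

print("predicted classes:", predicted.tolist())
print("accuracy (%):", accuracy)
```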

## Examine the loss and accuracy during training

In [None]:
plt.plot(Loss)
plt.xlabel("no. of epochs")
plt.ylabel("loss (last batch of each epoch)")
plt.title("Loss")
plt.show()

In [None]:
plt.plot(acc)
plt.xlabel("no. of epochs")
plt.ylabel("test accuracy (%)")
plt.title("Accuracy")
plt.show()