# Fashion MNIST Image Classification with PyTorch

## Overview
The [IMPROVE Project](https://jdacs4c-improve.github.io/docs/index.html) focuses on data standardization, quality, and integration for a variety of cancer drug response problems. This example shows how to modify a model so that it complies with the [IMPROVE library](https://github.com/JDACS4C-IMPROVE/IMPROVE). See the [Unified Interface Documentation](https://jdacs4c-improve.github.io/docs/content/unified_interface.html) for details.

Note: This model does not use any cancer data. It simply classifies images of clothing items from the Fashion MNIST dataset into 10 categories:

- T-shirt/top
- Trouser
- Pullover
- Dress
- Coat
- Sandal
- Shirt
- Sneaker
- Bag
- Ankle boot
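
In the Fashion MNIST dataset, the categories above correspond to integer labels 0 through 9 in the order listed. A minimal sketch of that mapping follows; the names `FASHION_MNIST_CLASSES` and `label_to_name` are hypothetical helpers for illustration and are not part of this example's code.

```python
# Fashion MNIST class names, indexed by integer label (0-9).
# FASHION_MNIST_CLASSES and label_to_name are hypothetical names for this sketch.
FASHION_MNIST_CLASSES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def label_to_name(label: int) -> str:
    """Return the class name for an integer label."""
    return FASHION_MNIST_CLASSES[label]


print(label_to_name(9))  # Ankle boot
```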

## Requirements
Break the model's workflow into three primary steps:

1. Preprocessing
2. Training
3. Inference
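
One way to picture this split is as three functions that hand their results to one another. The sketch below is only an illustration on synthetic data: the function names `preprocess`, `train`, and `infer` are assumptions made here and are not the IMPROVE library's interface, and the single linear layer is a stand-in for a real model.

```python
import torch
import torch.nn as nn


def preprocess(raw_images: torch.Tensor) -> torch.Tensor:
    """Scale uint8 pixels to [0, 1], then shift and scale them to [-1, 1]."""
    return (raw_images.float() / 255.0 - 0.5) / 0.5


def train(images: torch.Tensor, labels: torch.Tensor, epochs: int = 5) -> nn.Module:
    """Fit a tiny stand-in classifier on one batch and return it."""
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    return model


def infer(model: nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Return the predicted class index for each image."""
    model.eval()
    with torch.no_grad():
        return model(images).argmax(dim=1)


# Synthetic stand-in for a batch of 28x28 grayscale images and labels.
torch.manual_seed(0)
raw = torch.randint(0, 256, (32, 1, 28, 28), dtype=torch.uint8)
labels = torch.randint(0, 10, (32,))

x = preprocess(raw)
model = train(x, labels)
predictions = infer(model, x)
print(predictions.shape)  # torch.Size([32])
```

Keeping the steps separate means each one can be run and checked independently, as long as the preprocessing used at inference time matches the one used for training.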

## Preprocessing

- Load the Fashion MNIST dataset with PyTorch's `torchvision.datasets.FashionMNIST` class, passing `download=True` to download the dataset locally.
- Split the data into training and validation sets. The code below relies on the dataset's predefined training and test splits (`train=True` and `train=False`).
- Convert the images to tensors with pixel values in [0, 1], then normalize them with a mean of 0.5 and a standard deviation of 0.5, which rescales the values to [-1, 1]. This helps stabilize the training process and improve model convergence.
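
To make the normalization step concrete, the sketch below reproduces its arithmetic by hand on a few synthetic pixel values. It mimics what `transforms.ToTensor()` followed by `transforms.Normalize((0.5,), (0.5,))` computes for a single-channel image; the pixel values are made up for illustration.

```python
import torch

# Synthetic 8-bit pixel values (illustrative only, not taken from the dataset).
pixels = torch.tensor([0, 64, 128, 255], dtype=torch.uint8)

# ToTensor-style scaling: [0, 255] -> [0, 1]
scaled = pixels.float() / 255.0

# Normalize with mean 0.5 and std 0.5: (x - mean) / std maps [0, 1] -> [-1, 1]
normalized = (scaled - 0.5) / 0.5

print(scaled.min().item(), scaled.max().item())          # 0.0 1.0
print(normalized.min().item(), normalized.max().item())  # -1.0 1.0
```

Note that the mean and standard deviation of 0.5 are the parameters of the transform, not statistics of the result: the output is centered around zero and bounded by -1 and 1.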

## Training
This section assumes that the data has been preprocessed and is available for training.

- Define a neural network architecture suitable for image classification (e.g., a convolutional neural network; the code below uses a small fully connected network) and instantiate it with PyTorch's `torch.nn` modules.
- Select a loss function (cross-entropy loss) and an optimizer (SGD) to update the model parameters during training.
- Iterate through the training data in batches for a fixed number of epochs. For each batch:
  - Reset the gradients accumulated from the previous batch.
  - Feed the batch of images to the model.
  - Calculate the loss from the model's predictions and the true labels.
  - Backpropagate the loss and update the model parameters with the optimizer.
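
The text above mentions a convolutional neural network as a suitable architecture, while the code in this example uses a fully connected network. For comparison, here is a minimal sketch of what a small CNN for 28x28 grayscale images could look like. The class name `SmallCNN` and its layer sizes are assumptions chosen for illustration; they are not the architecture used in this example.

```python
import torch
import torch.nn as nn


class SmallCNN(nn.Module):
    """Hypothetical two-block CNN for 1x28x28 inputs (illustrative sizes)."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 1x28x28 -> 16x28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 16x14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 32x14x14
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32x7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, 1)    # keep the batch dimension, flatten the rest
        return self.classifier(x)  # raw logits, one per class


cnn = SmallCNN()
dummy_batch = torch.randn(4, 1, 28, 28)  # random stand-in for a batch of images
logits = cnn(dummy_batch)
print(logits.shape)  # torch.Size([4, 10])
```

Because it returns one raw score per class, a model like this could be used with the same loss function and optimizer as the fully connected network.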

## Inference

- Load the trained model's state.
- For new, unseen images:
  - Preprocess them as done during training.
  - Pass the preprocessed images through the model to compute predictions, and compare them with the ground truth labels.
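
Loading the trained model's state is commonly done through a model's `state_dict`. The sketch below saves and restores one using an in-memory buffer so that it runs without touching the disk; in practice a file path would normally be passed to `torch.save` and `torch.load`. The helper `make_model` and the single-layer model are assumptions for illustration, not this example's network.

```python
import io

import torch
import torch.nn as nn


def make_model() -> nn.Module:
    """Hypothetical stand-in model; the architecture must match when reloading."""
    return nn.Sequential(nn.Flatten(), nn.Linear(784, 10))


torch.manual_seed(0)
trained = make_model()  # pretend this model has been trained

# Save only the parameters (the state_dict), here into an in-memory buffer.
buffer = io.BytesIO()
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

# Rebuild the same architecture, then load the saved parameters into it.
restored = make_model()
restored.load_state_dict(torch.load(buffer))
restored.eval()

# Both models should now produce identical outputs for the same input.
batch = torch.randn(2, 1, 28, 28)
with torch.no_grad():
    same = torch.equal(trained(batch), restored(batch))
print(same)  # True
```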

## Implementation Details

Refer to the code below for the specific model architecture, hyperparameters, and other implementation choices.

In [9]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

# Part 1: Data Preparation

# Define transformations for data preprocessing
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Load Fashion MNIST dataset
# NOTE: train=True for trainset and train=False for testset; download=True for both.
trainset = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
testset = torchvision.datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)
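
The preprocessing notes mention a validation set, which the cell above does not create. One possible way to carve one out of the training data is `torch.utils.data.random_split`. The sketch below uses a small synthetic dataset so that it runs without a download; the 500/100 split sizes are arbitrary assumptions, and with the real 60,000-image `trainset` a split such as `[50000, 10000]` could be used instead.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Synthetic stand-in for the training set (600 blank 28x28 images).
full_train = TensorDataset(
    torch.zeros(600, 1, 28, 28),
    torch.zeros(600, dtype=torch.long),
)

# A seeded generator makes the split reproducible across runs.
generator = torch.Generator().manual_seed(0)
train_subset, val_subset = random_split(full_train, [500, 100], generator=generator)

print(len(train_subset), len(val_subset))  # 500 100
```

The resulting subsets can be wrapped in `DataLoader` objects in the same way as `trainset` and `testset`.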

In [12]:
# Part 2: Model Training

# Create data loaders using the trainset and testset in Part 1, with batch size 64 and shuffle=True for trainset
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False)

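# Define the neural network model: a fully connected network with two hidden layers
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)  # 784 = 28 * 28 input pixels
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 10)   # one output per class

    def forward(self, x):
        x = x.view(x.size(0), -1)  # flatten each image to a 784-element vector
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)  # raw logits; CrossEntropyLoss applies log-softmax itself
        return x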

# Create an instance of the model
model = Net()

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Train the model
for epoch in range(10):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data

        # Reset gradients accumulated from the previous batch
        optimizer.zero_grad()

        # Forward pass, loss computation, backward pass, and parameter update
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        # Print loss every 200 mini-batches
        # if i % 200 == 199:
        #     print(f'Epoch: {epoch + 1}, Batch: {i + 1}, Loss: {running_loss / 200}')
        #     running_loss = 0.0

    # Average loss over all mini-batches in this epoch
    epoch_loss = running_loss / len(trainloader)
    print(f'-- Epoch: {epoch + 1}, Loss: {epoch_loss}')

print('Training finished.')

-- Epoch: 1, Loss: 1.027475444969338
-- Epoch: 2, Loss: 0.5553531648317126
-- Epoch: 3, Loss: 0.4895101859371291
-- Epoch: 4, Loss: 0.45323895183262797
-- Epoch: 5, Loss: 0.429904751845006
-- Epoch: 6, Loss: 0.41118921979721673
-- Epoch: 7, Loss: 0.3971544050458652
-- Epoch: 8, Loss: 0.3844668244454525
-- Epoch: 9, Loss: 0.37378820707040555
-- Epoch: 10, Loss: 0.3639335038978408
Training finished.
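
The network's `forward` method returns raw scores (logits) without a softmax layer. That fits `nn.CrossEntropyLoss`, which applies log-softmax internally before computing the negative log-likelihood. The sketch below checks this equivalence on random logits and also computes the loss of a uniform guess over 10 classes, ln(10) ≈ 2.30, as a reference point for reading the loss values above. The tensors are synthetic and only for illustration.

```python
import math

import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)          # random stand-in for model outputs
labels = torch.tensor([0, 3, 7, 9])  # arbitrary target classes

# Cross-entropy on logits equals NLL loss on log-softmax outputs.
ce = F.cross_entropy(logits, labels)
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(torch.allclose(ce, manual))  # True

# Equal logits for all 10 classes give a loss of ln(10).
uniform_loss = F.cross_entropy(torch.zeros(1, 10), torch.tensor([0])).item()
print(abs(uniform_loss - math.log(10)) < 1e-5)  # True
```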


In [13]:
# Part 3: Model Inferencing

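correct = 0
total = 0
model.eval()  # switch to evaluation mode (good practice before inference)
with torch.no_grad():  # gradients are not needed for inference
    for data in testloader:
        images, labels = data
        outputs = model(images)
        # The predicted class is the index of the largest logit
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()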

print(f'Accuracy on test set: {(100 * correct / total):.2f}%')


Accuracy on test set: 85.62%
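
A single overall accuracy can hide differences between classes. As a possible extension, the sketch below computes accuracy per class from predicted and true labels using `torch.bincount`. The function name `per_class_accuracy` and the toy tensors are assumptions for illustration; in the inference loop above, the per-batch `predicted` and `labels` tensors would first need to be collected, for example with `torch.cat`.

```python
import torch


def per_class_accuracy(predicted: torch.Tensor, labels: torch.Tensor,
                       num_classes: int = 10) -> torch.Tensor:
    """Return the fraction of correctly classified samples for each class."""
    # Count correct predictions per true class, and total samples per class.
    correct = torch.bincount(labels[predicted == labels], minlength=num_classes)
    total = torch.bincount(labels, minlength=num_classes)
    # clamp avoids division by zero for classes with no samples
    return correct.float() / total.clamp(min=1).float()


# Toy example with 3 classes (values are made up).
labels = torch.tensor([0, 0, 1, 1, 2])
predicted = torch.tensor([0, 1, 1, 1, 2])
acc = per_class_accuracy(predicted, labels, num_classes=3)
print(acc)  # tensor([0.5000, 1.0000, 1.0000])
```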
