# Overview

For background on tensors, see [What are tensors?](https://www.kaggle.com/code/aisuko/what-are-tensors).

We are going to use the FashionMNIST dataset to train a new simple model and optimize it using PyTorch.

PyTorch has two primitives for working with data:

* `torch.utils.data.DataLoader`
* `torch.utils.data.Dataset`

`Dataset` stores the samples and their corresponding labels, and `DataLoader` wraps an iterable around the `Dataset`.
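
As a minimal sketch of how the two primitives fit together, the following wraps a small in-memory dataset in a `DataLoader`. The `SquaresDataset` class and its contents are hypothetical and only illustrate the map-style protocol (`__len__` and `__getitem__`); they are not part of the FashionMNIST workflow.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical map-style dataset: sample i is the pair (i, i*i)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        # Number of samples in the dataset
        return self.n

    def __getitem__(self, idx):
        # Return one (sample, label) pair as tensors
        x = torch.tensor([float(idx)])
        y = torch.tensor(idx * idx)
        return x, y


dataset = SquaresDataset(10)
loader = DataLoader(dataset, batch_size=4)  # no shuffling, so order is 0..9

# Collect the shape of every batch the loader yields
batch_shapes = [(tuple(X.shape), tuple(y.shape)) for X, y in loader]
print(batch_shapes)
```

With 10 samples and a batch size of 4, the loader yields two full batches and one final batch of 2 samples.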

## TORCH.UTILS.DATA

The `torch.utils.data.DataLoader` class is at the heart of PyTorch's data loading utility. It supports:

* map-style and iterable-style datasets
* customizing data loading order
* automatic batching
* single- and multi-process data loading
* automatic memory pinning

For more detail, see [DataLoader in PyTorch](https://pytorch.org/docs/stable/data.html).
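
To make "loading order" and "automatic batching" concrete, here is a plain-Python sketch of the idea: choose an order of indices, then group them into fixed-size batches. The function name `batched_indices` and its parameters are hypothetical. This is not how `DataLoader` is implemented internally, only an illustration of the behaviour listed above.

```python
import random


def batched_indices(n, batch_size, shuffle=False, seed=0):
    """Yield lists of dataset indices, batch_size at a time."""
    order = list(range(n))
    if shuffle:
        # A seeded generator keeps the shuffled order reproducible
        random.Random(seed).shuffle(order)
    for start in range(0, n, batch_size):
        # The final batch may be smaller when n is not a multiple of batch_size
        yield order[start:start + batch_size]


sequential = list(batched_indices(10, batch_size=4))
shuffled = list(batched_indices(10, batch_size=4, shuffle=True))

print(sequential)
print([len(b) for b in shuffled])
```

Shuffling changes which indices land in which batch, but every index still appears exactly once per pass.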

PyTorch offers domain-specific libraries such as:
* TorchText
* TorchVision
* TorchAudio

All of these include datasets. We will be using a TorchVision dataset. The datasets available in the `torchvision.datasets` module are listed [here](https://pytorch.org/vision/stable/datasets.html).

# Download the dataset

Every TorchVision `Dataset` accepts two arguments that modify the samples and labels respectively:
* `transform`
* `target_transform`
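
A rough sketch of how these two arguments are applied: the dataset calls `transform` on each sample and `target_transform` on each label when an item is fetched. `TinyDataset`, `scale_to_unit`, and `one_hot` are hypothetical names used for illustration in plain Python; the real TorchVision datasets operate on images and tensors.

```python
class TinyDataset:
    """Hypothetical dataset that mimics the transform / target_transform hooks."""

    def __init__(self, samples, labels, transform=None, target_transform=None):
        self.samples = samples
        self.labels = labels
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        sample, label = self.samples[idx], self.labels[idx]
        if self.transform is not None:
            sample = self.transform(sample)        # modifies the sample
        if self.target_transform is not None:
            label = self.target_transform(label)   # modifies the label
        return sample, label


def scale_to_unit(pixels):
    # Map 0..255 integer pixel values to floats in [0, 1]
    return [p / 255 for p in pixels]


def one_hot(label, num_classes=10):
    # Turn an integer class index into a one-hot list
    return [1 if i == label else 0 for i in range(num_classes)]


ds = TinyDataset([[0, 255, 51]], [9], transform=scale_to_unit, target_transform=one_hot)
sample, label = ds[0]
print(sample, label)
```

The `scale_to_unit` helper is similar in spirit to `ToTensor`, which also scales pixel values into the range [0, 1].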

In [1]:
import torch
from torchvision import datasets
from torchvision.transforms import ToTensor

training_data=datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data=datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 26421880/26421880 [00:05<00:00, 4752474.83it/s]


Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 29515/29515 [00:00<00:00, 197469.29it/s]


Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 4422102/4422102 [00:01<00:00, 3666688.63it/s]


Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 5148/5148 [00:00<00:00, 3137956.26it/s]

Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw






# Loading the dataset

We pass the `Dataset` as an argument to `DataLoader`. This wraps an iterable over our dataset and supports automatic batching, sampling, shuffling, and multiprocess data loading. Here we define a batch size of 64, so each element in the dataloader iterable returns a batch of 64 features and labels.

In [2]:
from torch.utils.data import DataLoader

batch_size=64

train_dataloader=DataLoader(training_data, batch_size=batch_size)
test_dataloader=DataLoader(test_data, batch_size=batch_size)

# Inspect the first batch only
for X, y in test_dataloader:
    print(f"Shape of X [N,C,H,W]:{X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

Shape of X [N,C,H,W]:torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64
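
As a quick check on what the loaders will yield, the number of batches and the size of the last batch follow from integer arithmetic on the dataset sizes (FashionMNIST has 60,000 training and 10,000 test images). The helper name `batch_layout` is hypothetical, and the sketch assumes the `DataLoader` default of keeping the final, smaller batch (`drop_last=False`).

```python
import math


def batch_layout(num_samples, batch_size):
    """Return (number of batches, size of the final batch)."""
    num_batches = math.ceil(num_samples / batch_size)
    last = num_samples - (num_batches - 1) * batch_size
    return num_batches, last


train_layout = batch_layout(60000, 64)  # training set
test_layout = batch_layout(10000, 64)   # test set
print(train_layout, test_layout)
```

With a batch size of 64 this gives 938 training batches (the last holding 32 samples) and 157 test batches (the last holding 16), so the final batch of each loader is smaller than `[64, 1, 28, 28]`.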

# Define a Model

To define a neural network in PyTorch, we create a class that inherits from `nn.Module`. We define the layers of the network in the `__init__` function and specify how data passes through the network in the `forward` function. To accelerate operations in the neural network, we move it to the GPU or MPS if available.

In [3]:
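# Prefer CUDA, then Apple MPS, then fall back to the CPU
device=("cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu")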
device

'cuda'

In [4]:
from torch import nn

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten=nn.Flatten()
        self.linear_relu_stack=nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512,512),
            nn.ReLU(),
            nn.Linear(512,10)
        )
    
    def forward(self, x):
        x=self.flatten(x)
        logits=self.linear_relu_stack(x)
        return logits

model=NeuralNetwork().to(device)
model

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
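
The printed layer sizes are enough to work out how many trainable parameters the network has: each `Linear` layer holds `in_features * out_features` weights plus `out_features` biases. The following is a plain-Python sketch of that arithmetic; the helper name `linear_params` is hypothetical.

```python
def linear_params(in_features, out_features, bias=True):
    """Parameter count of one fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


# (in_features, out_features) for the three Linear layers in the model
layers = [(28 * 28, 512), (512, 512), (512, 10)]
per_layer = [linear_params(i, o) for i, o in layers]
total = sum(per_layer)

print(per_layer, total)
```

This comes to 401,920 + 262,656 + 5,130 = 669,706 parameters. The `ReLU` and `Flatten` layers add none.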

# Optimizing the Model Parameters

To train a model, we need a loss function and an optimizer.

In [5]:
loss_fn=nn.CrossEntropyLoss()
optimizer=torch.optim.SGD(model.parameters(), lr=1e-3)
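
To give a feel for what these two objects compute, here is a plain-Python sketch for a single sample: cross-entropy is the negative log of the softmax probability assigned to the true class, and plain SGD moves each parameter against its gradient by `lr` times the gradient. The function names are hypothetical, and the sketch ignores batching (`nn.CrossEntropyLoss` averages over the batch by default).

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax of the target class, for one sample."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def sgd_step(params, grads, lr=1e-3):
    """One vanilla SGD update: p <- p - lr * grad."""
    return [p - lr * g for p, g in zip(params, grads)]


# An untrained 10-class model that scores every class equally
uniform_loss = cross_entropy([0.0] * 10, target=3)
print(round(uniform_loss, 4))

# A confident, correct prediction gives a much smaller loss
confident_loss = cross_entropy([0.0, 8.0, 0.0], target=1)
print(confident_loss < 0.01)

updated = sgd_step([1.0, -2.0], [10.0, -10.0], lr=0.1)
print(updated)
```

The uniform case evaluates to ln(10) ≈ 2.3026, which is close to the first loss value logged in the training output further down (2.309162), as expected for a freshly initialized 10-class model.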

# Training the model

In a single training loop, the model makes predictions on the training dataset (fed to it in batches) and backpropagates the prediction error to adjust the model's parameters.

In [6]:
def train(dataloader, model, loss_fn, optimizer):
    size=len(dataloader.dataset)
    model.train()
    for batch, (X,y) in enumerate(dataloader):
        X,y =X.to(device), y.to(device)
        
        # compute prediction error
        pred=model(X)
        loss=loss_fn(pred,y)

        # backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        
        if batch%100==0:
            loss,current=loss.item(), (batch+1) *len(X)
            print(f"loss:{loss:>7f} [{current:>5d}/{size:>5d}]")
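
The same predict / measure error / update cycle can be sketched without PyTorch on a one-parameter model. The example below fits `w` in `y = w * x` by mini-batch gradient descent, with the gradient written out by hand instead of obtained from `loss.backward()`. The data, the function name `train_one_epoch`, and the learning rate are assumptions chosen for illustration.

```python
def train_one_epoch(w, batches, lr):
    """Run one pass over the batches; return the updated weight and last loss."""
    loss = None
    for xs, ys in batches:
        # Forward pass: predictions and mean squared error for this batch
        preds = [w * x for x in xs]
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
        # Backward pass: d(loss)/dw, derived by hand
        grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
        # Optimizer step: move against the gradient
        w -= lr * grad
    return w, loss


# Toy data generated from y = 2 * x, split into two batches
batches = [([1.0, 2.0], [2.0, 4.0]), ([3.0, 4.0], [6.0, 8.0])]

w, loss = 0.0, None
for epoch in range(20):
    w, loss = train_one_epoch(w, batches, lr=0.05)

print(round(w, 3), loss)
```

In the PyTorch version, autograd computes the gradient for every parameter, and `optimizer.zero_grad()` clears it so that gradients from one batch do not accumulate into the next.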
            

# Evaluate the model

In [7]:
def test(dataloader, model, loss_fn):
    size=len(dataloader.dataset)
    num_batches=len(dataloader)
    model.eval()
    test_loss, correct=0,0
    with torch.no_grad():
        for X,y in dataloader:
            X,y =X.to(device), y.to(device)
            pred=model(X)
            test_loss+=loss_fn(pred, y).item()
            correct+=(pred.argmax(1)==y).type(torch.float).sum().item()
    test_loss/=num_batches
    correct/=size
    print(f"Test Error:\n Accuracy: {(100*correct):>0.1f}%, Avg loss:{test_loss:>8f} \n")
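
The accuracy bookkeeping in `test` comes down to taking the index of the largest logit in each row and counting how often it equals the label. A plain-Python sketch of that step, with made-up logits and hypothetical helper names `argmax` and `accuracy`:

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy(logit_rows, labels):
    """Fraction of rows whose highest-scoring class matches the label."""
    correct = sum(argmax(row) == y for row, y in zip(logit_rows, labels))
    return correct / len(labels)


# Made-up logits for four samples and three classes
logits = [
    [2.0, 0.1, -1.0],   # predicts class 0
    [0.3, 0.2, 1.5],    # predicts class 2
    [-0.5, 0.9, 0.0],   # predicts class 1
    [1.2, 1.1, 0.4],    # predicts class 0
]
labels = [0, 2, 0, 0]

acc = accuracy(logits, labels)
print(f"Accuracy: {100 * acc:.1f}%")
```

Three of the four made-up predictions match their labels, so the sketch reports 75.0%.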

The training process is conducted over several iterations (epochs). During each epoch, the model learns parameters to make better predictions. We print the model's accuracy and loss at each epoch; we'd like to see the accuracy increase and the loss decrease with every epoch.

In [8]:
epochs=5

for t in range(epochs):
    print(f"Epoch {t+1}\n------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
------------
loss:2.309162 [   64/60000]
loss:2.287271 [ 6464/60000]
loss:2.270671 [12864/60000]
loss:2.271568 [19264/60000]
loss:2.250602 [25664/60000]
loss:2.226507 [32064/60000]
loss:2.242480 [38464/60000]
loss:2.204216 [44864/60000]
loss:2.207146 [51264/60000]
loss:2.180778 [57664/60000]
Test Error:
 Accuracy: 46.2%, Avg loss:2.166024 

Epoch 2
------------
loss:2.184081 [   64/60000]
loss:2.160520 [ 6464/60000]
loss:2.106382 [12864/60000]
loss:2.125067 [19264/60000]
loss:2.069165 [25664/60000]
loss:2.023327 [32064/60000]
loss:2.053430 [38464/60000]
loss:1.969500 [44864/60000]
loss:1.982460 [51264/60000]
loss:1.914750 [57664/60000]
Test Error:
 Accuracy: 53.4%, Avg loss:1.900546 

Epoch 3
------------
loss:1.942873 [   64/60000]
loss:1.900106 [ 6464/60000]
loss:1.780484 [12864/60000]
loss:1.820999 [19264/60000]
loss:1.712080 [25664/60000]
loss:1.668019 [32064/60000]
loss:1.694818 [38464/60000]
loss:1.585313 [44864/60000]
loss:1.619307 [51264/60000]
loss:1.514308 [57664/6000

In [9]:
torch.save(model.state_dict(),"simple_model.pth")

In [10]:
!ls

__notebook__.ipynb  data  simple_model.pth


# Inference

The process of loading a model includes re-creating the model structure and loading the state dictionary into it.

In [11]:
import gc

del model
gc.collect()
torch.cuda.empty_cache()

In [12]:
model=NeuralNetwork().to(device)
# map_location places the saved tensors on the current device
model.load_state_dict(torch.load("simple_model.pth", map_location=device))
model

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
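
As a small self-contained check of this save-and-reload pattern, the sketch below round-trips the `state_dict` of a tiny `nn.Linear` layer through an in-memory buffer instead of a file on disk. The layer sizes and the use of `io.BytesIO` are assumptions made for illustration.

```python
import io

import torch
from torch import nn

torch.manual_seed(0)
original = nn.Linear(4, 2)

# Save only the parameters (the state dictionary), not the module object
buffer = io.BytesIO()
torch.save(original.state_dict(), buffer)
buffer.seek(0)

# Re-create the same architecture, then load the saved parameters into it
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(buffer))

# Compare every tensor in the two state dictionaries
same = all(
    torch.equal(p, q)
    for p, q in zip(original.state_dict().values(), restored.state_dict().values())
)
keys = sorted(original.state_dict().keys())
print(keys, same)
```

Because only the parameters are stored, the class definition (here `nn.Linear`, in the notebook `NeuralNetwork`) must be available when the weights are loaded.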

In [13]:
classes=[
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

model.eval()
x,y=test_data[0][0], test_data[0][1]
with torch.no_grad():
    x=x.to(device)
    pred=model(x)
    predicted,actual=classes[pred[0].argmax(0)], classes[y]
    print(f"Predicted: {predicted}, Actual: {actual}")

Predicted: Ankle boot, Actual: Ankle boot
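
The model outputs raw logits, and `argmax` picks the class with the largest one. If a confidence score is also wanted, the logits can be passed through a softmax first. The plain-Python sketch below uses made-up logits (not the model's actual output) and a hypothetical helper `softmax`:

```python
import math

classes = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def softmax(logits):
    """Convert logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for one image, one value per class
logits = [-1.2, -2.0, -0.5, -1.0, -0.8, 1.5, -0.3, 2.0, 0.4, 3.1]
probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)

print(f"Predicted: {classes[best]} ({probs[best]:.2f})")
```

Softmax preserves the ordering of the logits, so the predicted class is the same with or without it; it only adds a probability alongside the label.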
