<a href="https://www.kaggle.com/code/vedantpancholi/pytorch?scriptVersionId=216242030" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Fashion-MNIST
Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with a label from 10 classes.

Each image is 28 pixels high and 28 pixels wide, for a total of 784 pixels. Each pixel has a single value indicating its lightness or darkness, with higher numbers meaning darker. This pixel value is an integer between 0 and 255. In the CSV version of the dataset, the training and test sets have 785 columns: one label column and 784 pixel columns.
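
The pixel arithmetic above can be checked directly. The sketch below is illustrative only: the names `height`, `width`, and `raw_pixels` are made up for the example, and the division by 255 mirrors the rescaling that torchvision's `ToTensor` applies to 8-bit images.

```python
# Illustrative arithmetic only; the variable names are assumptions, not part of any dataset API.
height, width = 28, 28
num_pixels = height * width      # pixels per image
num_columns = num_pixels + 1     # CSV layout: one label column plus the pixel columns

# A few hypothetical raw 8-bit pixel values (integers in [0, 255]).
raw_pixels = [0, 51, 255]
# ToTensor rescales 8-bit values to floats in [0.0, 1.0], i.e. value / 255.
scaled = [p / 255 for p in raw_pixels]

print(num_pixels, num_columns)  # 784 785
print(scaled)                   # [0.0, 0.2, 1.0]
```

This is why the tensors produced later in the notebook hold floating-point values rather than the original integers.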

In [1]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

We load the FashionMNIST dataset with the following **parameters**:
- root is the path where the train/test data is stored.
- train specifies whether to load the training or the test dataset.
- download=True downloads the data from the internet if it is not available at root.
- transform and target_transform specify the feature and label transformations.

In [2]:

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 26421880/26421880 [00:01<00:00, 15001608.43it/s]


Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 29515/29515 [00:00<00:00, 263507.18it/s]


Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 4422102/4422102 [00:00<00:00, 4997300.34it/s]


Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 5148/5148 [00:00<00:00, 8352911.80it/s]

Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw






We pass the Dataset as an argument to DataLoader. This wraps an iterable over our dataset, and supports automatic batching, sampling, shuffling and multiprocess data loading. Here we define a batch size of 64, i.e. each element in the dataloader iterable will return a batch of 64 features and labels.

**Batch size** refers to the number of samples (data points) that are passed through the model at one time during training or evaluation. It determines how many data samples are processed together in one forward/backward pass through the neural network.
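
To make the batch arithmetic concrete, the sketch below computes how many batches a dataset of a given size yields and how large the final batch is. The helper `batch_layout` is a hypothetical name introduced for illustration; it assumes no samples are dropped, which matches the DataLoader default of `drop_last=False`.

```python
import math

def batch_layout(num_samples: int, batch_size: int) -> tuple[int, int]:
    """Return (number of batches, size of the last batch) when no samples are dropped."""
    num_batches = math.ceil(num_samples / batch_size)
    last_batch = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch

print(batch_layout(60_000, 64))  # (938, 32)
print(batch_layout(10_000, 64))  # (157, 16)
```

With a batch size of 64, the last batch of each epoch is therefore smaller than the others for both the training and the test set.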

In [3]:
batch_size = 64

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64


X: The input batch with shape [N, C, H, W]:

* N: Batch size (here 64).
* C: Number of channels (here 1, because the images are grayscale; RGB images would have 3).
* H: Height of the image (28).
* W: Width of the image (28).

y: The corresponding labels for the batch, one integer class index per image.

break: Stops the loop after the first batch, so the shape and type of X and y are printed only once.

# Creating Models
To define a neural network in PyTorch, we create a class that inherits from nn.Module. We define the layers of the network in the `__init__` function and specify how data passes through the network in the `forward` function. To accelerate operations in the neural network, we move it to an accelerator such as a GPU (CUDA) or MPS, if one is available.

In [4]:
# model = NeuralNetwork().to(device)
# print(model)

# Explanation of Layers
**Flatten:**
Converts 2D images into a 1D vector to pass through fully connected layers.

**Linear:**
Fully connected layers that perform weighted sum operations followed by bias addition.

**ReLU:**
Applies the ReLU activation function $f(x) = \max(0, x)$.
Introduces non-linearity to help the model learn complex patterns.

**Sequential:**
Groups layers into a single block for simplicity.
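
The effect of each layer on tensor shapes can be traced on a stand-in batch. This is a minimal sketch using random values rather than real images; the variable names are chosen for illustration, and only the first Linear layer of the network is reproduced.

```python
import torch
from torch import nn

# Stand-in batch: 64 random "images" with 1 channel and 28x28 pixels (not real data).
x = torch.rand(64, 1, 28, 28)

flat = nn.Flatten()(x)                   # keeps the batch dimension, flattens the rest
hidden = nn.Linear(28 * 28, 512)(flat)   # weighted sum plus bias
activated = nn.ReLU()(hidden)            # negative values become 0

# The same three steps grouped into a single Sequential block.
stack = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 512), nn.ReLU())
out = stack(x)

print(flat.shape)       # torch.Size([64, 784])
print(activated.shape)  # torch.Size([64, 512])
print(out.shape)        # torch.Size([64, 512])
```

Note that `stack` creates its own randomly initialized Linear layer, so `out` has the same shape as `activated` but not the same values.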


In [5]:
# Get cpu, gpu or mps device for training.
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        # print(logits)
        return logits

model = NeuralNetwork().to(device)
print(model)

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

Using cuda device
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[-0.0097, -0.0138,  0.0351,  ..., -0.0088, -0.0273,  0.0213],
        [ 0.0322, -0.0218,  0.0122,  ...,  0.0155, -0.0179, -0.0007]],
       device='cuda:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([-0.0207,  0.0300], device='cuda:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[ 0.0226,  0.0273, -0.0098,  ...,  0.0163, -0.0167, -0.0275],
        [-0.0071,  0.0094, -0.0301,  ...,  0.0324,  0.0409,  0.0329]],
       device='cuda:0', grad_fn=<Sli

# Optimizing the Model Parameters
**nn.CrossEntropyLoss**
* This criterion computes the cross entropy loss between input logits and target.
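
For a single sample, the cross-entropy between logits and a target class is the negative log of the softmax probability assigned to that class. The sketch below computes it by hand in plain Python; `cross_entropy` is a hypothetical helper written for illustration, not the PyTorch implementation, which additionally averages over the batch by default.

```python
import math

def cross_entropy(logits: list[float], target: int) -> float:
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    shift = max(logits)  # subtract the max for numerical stability
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum_exp - logits[target]

# With 10 equal logits every class gets probability 1/10, so the loss is ln(10).
uniform_loss = cross_entropy([0.0] * 10, target=3)
print(round(uniform_loss, 4))  # 2.3026

# A confident, correct prediction gives a loss close to 0.
confident_loss = cross_entropy([10.0] + [0.0] * 9, target=0)
print(confident_loss < 0.001)  # True
```

An untrained 10-class model predicts close to uniformly, which is consistent with the first loss values logged in this notebook's training run being near 2.30.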


For more information on optimizers, see [https://pytorch.org/docs/stable/optim.html](https://pytorch.org/docs/stable/optim.html).

In [6]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

In [7]:
print(optimizer)

SGD (
Parameter Group 0
    dampening: 0
    differentiable: False
    foreach: None
    fused: None
    lr: 0.001
    maximize: False
    momentum: 0
    nesterov: False
    weight_decay: 0
)


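**optimizer.zero_grad()** resets the gradients of all model parameters. PyTorch accumulates gradients by default, so they must be cleared once per iteration; otherwise each backward pass would add to the gradients left over from the previous batch. In this notebook's training loop it is called right after optimizer.step(), which clears the gradients before the next batch's backward pass.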

**optimizer.step()** is called to update the model’s parameters based on the gradients stored in the .grad attributes. The optimizer adjusts the parameters to minimize the loss, typically by moving them in the opposite direction of the gradients.
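
The interplay of these two calls can be observed on a single scalar parameter. This is a minimal sketch: the tensor `w` stands in for the model's weights, and the function being minimized, w², is chosen only because its gradient 2w is easy to check by hand.

```python
import torch

# A single scalar parameter stands in for the model's weights (illustrative only).
w = torch.tensor(2.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# Two backward passes without zero_grad(): the gradients add up.
(w ** 2).backward()   # d(w^2)/dw = 2w = 4
(w ** 2).backward()   # accumulates: 4 + 4
accumulated = w.grad.item()
print(accumulated)    # 8.0

# Reset, recompute a fresh gradient, then take one SGD step.
optimizer.zero_grad()
(w ** 2).backward()   # gradient is 4 again
optimizer.step()      # w <- w - lr * grad = 2 - 0.1 * 4
print(round(w.item(), 4))  # 1.6
```

Without the reset, the step would have used the stale accumulated gradient instead of the gradient of the current pass.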


Instead of SGD, we can use the **Adam optimizer**. Adam (Adaptive Moment Estimation) is an extension of SGD that computes adaptive learning rates for each parameter. It uses running estimates of both the first moment (mean) and the second moment (uncentered variance) of the gradients to scale the update for each parameter.
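
What "adaptive" means can be seen from a single update computed by hand. The sketch below follows the standard Adam update rule for one scalar gradient at the first step; `adam_first_step` is a hypothetical helper, and the default values for `beta1`, `beta2`, and `eps` are assumed to match the PyTorch defaults.

```python
import math

def adam_first_step(grad: float, lr: float = 0.001,
                    beta1: float = 0.9, beta2: float = 0.999,
                    eps: float = 1e-8) -> float:
    """Return the parameter change produced by the first Adam step for one scalar gradient."""
    m = (1 - beta1) * grad          # first moment (running mean of gradients)
    v = (1 - beta2) * grad ** 2     # second moment (running mean of squared gradients)
    m_hat = m / (1 - beta1 ** 1)    # bias correction at step t = 1
    v_hat = v / (1 - beta2 ** 1)
    return -lr * m_hat / (math.sqrt(v_hat) + eps)

# Gradients of very different size give nearly the same step magnitude (about lr).
small = adam_first_step(0.01)
large = adam_first_step(100.0)
print(small, large)
```

Plain SGD would instead change the parameter by `-lr * grad`, so the two gradients above would produce steps that differ by a factor of 10,000.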

> optimizer = torch.optim.Adam(model.parameters(), lr=0.001)


In [8]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

We also check the model’s performance against the test dataset to ensure it is learning.

**torch.argmax(..., dim=1)** (written as pred.argmax(1) in the code):
- This computes the index of the class with the highest predicted score (logit) for each test sample.
- Fashion-MNIST has 10 classes, so the result is an integer from 0 to 9 (for binary classification it would be 0 or 1).
- The argument 1 is the dimension to reduce: the operation is applied along the class dimension, giving one index per sample.
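
The accuracy computation can be followed on a small hand-made example. The logits and labels below are hypothetical values for 3 samples and 4 classes, chosen so the result is easy to verify by eye.

```python
import torch

# Hypothetical logits for 3 samples and 4 classes (rows = samples, columns = classes).
pred = torch.tensor([
    [0.1, 2.5, 0.3, 0.0],
    [1.2, 0.4, 0.9, 3.1],
    [0.7, 0.2, 0.1, 0.5],
])
y = torch.tensor([1, 3, 2])  # true class indices

predicted_classes = pred.argmax(1)  # index of the largest logit in each row
correct = (predicted_classes == y).type(torch.float).sum().item()
accuracy = correct / len(y)

print(predicted_classes.tolist())  # [1, 3, 0]
print(correct)                     # 2.0
```

Two of the three predictions match their labels, so `accuracy` here is 2/3.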

In [9]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

In [10]:
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.307847  [   64/60000]
loss: 2.292337  [ 6464/60000]
loss: 2.273041  [12864/60000]
loss: 2.268509  [19264/60000]
loss: 2.250344  [25664/60000]
loss: 2.225611  [32064/60000]
loss: 2.238950  [38464/60000]
loss: 2.202011  [44864/60000]
loss: 2.203968  [51264/60000]
loss: 2.171115  [57664/60000]
Test Error: 
 Accuracy: 44.1%, Avg loss: 2.165017 

Epoch 2
-------------------------------
loss: 2.176085  [   64/60000]
loss: 2.165334  [ 6464/60000]
loss: 2.108520  [12864/60000]
loss: 2.128055  [19264/60000]
loss: 2.069658  [25664/60000]
loss: 2.016087  [32064/60000]
loss: 2.049450  [38464/60000]
loss: 1.967141  [44864/60000]
loss: 1.978970  [51264/60000]
loss: 1.899716  [57664/60000]
Test Error: 
 Accuracy: 52.3%, Avg loss: 1.899212 

Epoch 3
-------------------------------
loss: 1.927643  [   64/60000]
loss: 1.899173  [ 6464/60000]
loss: 1.785386  [12864/60000]
loss: 1.836100  [19264/60000]
loss: 1.711725  [25664/60000]
loss: 1.660042  [32064/600

# Saving Models
A common way to save a model is to serialize the internal state dictionary (containing the model parameters).

In [11]:
torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")

Saved PyTorch Model State to model.pth


# Loading Models
The process for loading a model includes re-creating the model structure and loading the state dictionary into it.
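
The save-and-load round trip can be sketched on a tiny stand-in model. Everything here is illustrative: `nn.Linear(4, 2)` replaces the notebook's network, the file name `tiny_model.pth` is hypothetical, and a temporary directory is used so nothing is left on disk.

```python
import os
import tempfile

import torch
from torch import nn

# A tiny stand-in model; a larger nn.Module is handled the same way.
model_a = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tiny_model.pth")  # hypothetical file name
    torch.save(model_a.state_dict(), path)

    # Re-create the same architecture, then load the saved parameters into it.
    model_b = nn.Linear(4, 2)
    # map_location="cpu" remaps tensors saved on a GPU so they load on a CPU-only machine.
    state = torch.load(path, map_location="cpu", weights_only=True)
    result = model_b.load_state_dict(state)

print(result)  # <All keys matched successfully>
same = all(torch.equal(p, q) for p, q in zip(model_a.parameters(), model_b.parameters()))
print(same)    # True
```

The freshly created `model_b` starts with different random weights; after `load_state_dict` its parameters are identical to those of `model_a`.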

In [12]:
model = NeuralNetwork().to(device)
model.load_state_dict(torch.load("model.pth", weights_only=True))

<All keys matched successfully>

**This model can now be used to make predictions.**

In [13]:
classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]


model.eval()
x, y = test_data[0][0], test_data[0][1]
with torch.no_grad():
    x = x.to(device)
    pred = model(x)
    predicted, actual = classes[pred[0].argmax(0)], classes[y]
    print(f'Predicted: "{predicted}", Actual: "{actual}"')

Predicted: "Ankle boot", Actual: "Ankle boot"


In [14]:
state_dict = torch.load("model.pth", weights_only=True)
print(state_dict)


OrderedDict([('linear_relu_stack.0.weight', tensor([[-0.0097, -0.0138,  0.0351,  ..., -0.0088, -0.0273,  0.0213],
        [ 0.0322, -0.0218,  0.0122,  ...,  0.0155, -0.0179, -0.0007],
        [ 0.0265, -0.0068,  0.0016,  ...,  0.0153,  0.0203,  0.0025],
        ...,
        [ 0.0160,  0.0304,  0.0331,  ..., -0.0088, -0.0330, -0.0182],
        [ 0.0017, -0.0033,  0.0025,  ..., -0.0205, -0.0357, -0.0047],
        [ 0.0288,  0.0306,  0.0072,  ...,  0.0270, -0.0015,  0.0019]],
       device='cuda:0')), ('linear_relu_stack.0.bias', tensor([-0.0222,  0.0359,  0.0316, -0.0355,  0.0376,  0.0043, -0.0192,  0.0211,
        -0.0071,  0.0285, -0.0095, -0.0009,  0.0282,  0.0181,  0.0244,  0.0305,
         0.0143,  0.0087, -0.0075, -0.0117, -0.0039,  0.0259,  0.0195,  0.0197,
         0.0352,  0.0257, -0.0059, -0.0174, -0.0286,  0.0248, -0.0151, -0.0289,
        -0.0239, -0.0283,  0.0151,  0.0168, -0.0121,  0.0258, -0.0171,  0.0312,
        -0.0183,  0.0172,  0.0282, -0.0312,  0.0099, -0.0013,  0.01

