<a href="https://colab.research.google.com/github/Ahsan-folium/ai-intern-week04-deep-learnin/blob/main/fashionmnist_task_drive.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [14]:
# imports
import torch
import torch.nn as nn

# For optimization algorithms like SGD and Adam
import torch.optim as optim

# For datasets like Fashion-MNIST
import torchvision

# To transform data
import torchvision.transforms as transforms

## DATASET AND DATA

In [15]:
# load the dataset


# Transform: Convert images to tensors and normalize to range [-1, 1]
transform = transforms.Compose([
    transforms.ToTensor(), # Convert PIL images to PyTorch tensors
    transforms.Normalize((0.5,), (0.5,)) # Normalize: mean=0.5, std=0.5
])


In [16]:
# download training dataset and testing dataset

trainset = torchvision.datasets.FashionMNIST(
    root='./data',
    train=True,
    download=True,
    transform=transform
)

testset = torchvision.datasets.FashionMNIST(
    root='./data',
    train=False,
    download=True,
    transform=transform
)


100%|██████████| 26.4M/26.4M [00:01<00:00, 17.1MB/s]
100%|██████████| 29.5k/29.5k [00:00<00:00, 275kB/s]
100%|██████████| 4.42M/4.42M [00:00<00:00, 4.51MB/s]
100%|██████████| 5.15k/5.15k [00:00<00:00, 13.1MB/s]


**Notes for understanding**

ToTensor() converts images from [0,255] to [0,1].

Normalize((0.5,), (0.5,)) scales data to roughly [-1,1], which helps training stability.
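
The effect of the two transforms on a single pixel can be checked by hand. The sketch below uses plain Python rather than torchvision. The helper name `normalize_pixel` is made up for illustration, and it assumes the usual definition of Normalize, (x - mean) / std.

```python
def normalize_pixel(raw, mean=0.5, std=0.5):
    """Mimic ToTensor() followed by Normalize((0.5,), (0.5,)) for one pixel."""
    scaled = raw / 255.0           # ToTensor(): [0, 255] -> [0, 1]
    return (scaled - mean) / std   # Normalize(): [0, 1] -> [-1, 1]

darkest = normalize_pixel(0)       # black pixel
brightest = normalize_pixel(255)   # white pixel
print(darkest, brightest)
```

With mean=0.5 and std=0.5 the endpoints land exactly on -1 and 1, so every pixel ends up inside [-1, 1].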

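DataLoader groups the dataset into mini-batches and can reshuffle the training data every epoch.

Typical batch sizes are 32, 64, 128 and 256. The choice is a trade-off:

Small batch size (like 16 or 32): less memory per step and more weight updates per epoch, but noisier gradient estimates and less efficient use of the hardware.

Large batch size (like 256, 512): smoother gradient estimates and better hardware utilisation, but more memory per step and fewer weight updates per epoch.

👉 So:
batch_size=64 means "process 64 images at a time". It is a common middle ground between speed, memory usage, and model performance.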
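
To make the trade-off concrete, the number of weight updates per epoch can be worked out from the dataset size. This is a rough sketch in plain Python. `batches_per_epoch` is a hypothetical helper, and it assumes the DataLoader default of keeping the last, smaller batch (drop_last=False).

```python
import math

def batches_per_epoch(num_samples, batch_size):
    # The final batch may be smaller than batch_size but is still counted
    return math.ceil(num_samples / batch_size)

TRAIN_SIZE = 60_000  # Fashion-MNIST training images
TEST_SIZE = 10_000   # Fashion-MNIST test images

for bs in (16, 64, 256):
    print(f"batch_size={bs}: {batches_per_epoch(TRAIN_SIZE, bs)} updates per epoch")
```

A smaller batch size means more, but noisier, updates for the same pass over the data.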

In [27]:
trainloader = torch.utils.data.DataLoader(
    trainset,
    batch_size= 64,
    shuffle=True
)

testloader = torch.utils.data.DataLoader(
    testset,
    batch_size= 64,
    shuffle=False
)

## NEURAL NETWORK

**input size** always follows from the dataset. Fashion-MNIST images are 28*28 pixels, so input_size = 784.

**output size** is determined by the number of classes in your classification problem.

Fashion-MNIST has 10 categories → output_size = 10.


**Hidden layers (what you experiment with)**



Number of hidden layers (1, 2, 3, …).

Size of hidden layers (64 neurons, 128, 256, etc.).

Activation functions (ReLU, Sigmoid, Tanh, …).

These are your design choices. Changing them changes how powerful or efficient your model is.
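
One way to see how hidden size affects model size is to count the trainable parameters. The sketch below assumes a network with a single hidden layer between 784 inputs and 10 outputs, as in this notebook. `count_parameters` is an illustrative helper, not part of PyTorch.

```python
def count_parameters(hidden_size, input_size=28 * 28, output_size=10):
    # Each Linear layer has one weight per input-output pair plus one bias per output
    fc1 = input_size * hidden_size + hidden_size
    fc2 = hidden_size * output_size + output_size
    return fc1 + fc2

for h in (64, 128, 256):
    print(f"hidden_size={h}: {count_parameters(h)} parameters")
```

With hidden_size=256 the count is 203,530 values. Stored as 32-bit floats that is roughly 795 KiB, which is in line with the 798K size of model2.pt listed at the end of the notebook.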

In [31]:
class FashionNN(nn.Module):
  def __init__(self, hidden_size, activation, input_size=28*28, output_size=10):
    super().__init__()

    # 2 fully connected layers, as we are implementing a feed-forward NN
    self.fc1 = nn.Linear(input_size, hidden_size)
    self.fc2 = nn.Linear(hidden_size, output_size)

    # activation function: relu / sigmoid
    if activation == 'relu':
      self.activation = nn.ReLU()
    elif activation == 'sigmoid':
      self.activation = nn.Sigmoid()
    else:
      # fail early instead of leaving self.activation undefined
      raise ValueError(f"Unsupported activation: {activation!r}")

  # forward pass
  def forward(self, x):
    # Flatten each image (1x28x28) to a vector (784,)
    x = x.view(x.size(0), -1)

    x = self.fc1(x)
    x = self.activation(x)
    x = self.fc2(x)  # raw logits, one score per class
    return x


**notes**

Fully connected layers (nn.Linear) act on the last dimension of their input, so each 28x28 image has to be flattened into a single 784-element vector first.

So if your input is:

[64, 1, 28, 28]   # 64 images in the batch


After x.view(x.size(0), -1) →

[64, 784]         # 64 flattened vectors, each of length 784
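
The shape change can be checked on a dummy batch. This is a small sketch with an all-zero tensor standing in for real images. `torch.flatten(..., start_dim=1)` is shown as an equivalent alternative to `view`.

```python
import torch

batch = torch.zeros(64, 1, 28, 28)        # dummy batch: 64 grayscale 28x28 images
flat = batch.view(batch.size(0), -1)      # keep the batch dimension, merge the rest
also_flat = torch.flatten(batch, start_dim=1)  # same result, different spelling

print(batch.shape)
print(flat.shape)
```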

**First I wrote these steps without functions; now I am wrapping them in functions so I can try different optimizers and activation functions and keep the most accurate combination.**

## Create Model

In [39]:
# NOW WE CREATE THE MODEL , define loss and optimizer

def create_model(hidden_size, activation , optimizer , lr):
  # relu / sigmoid
  model = FashionNN(
    hidden_size = hidden_size ,
    activation= activation
  )

  # Loss function
  criterion = nn.CrossEntropyLoss() # Good for multi-class classification
  # CrossEntropyLoss combines LogSoftmax + NLLLoss, perfect for classification.


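  # Choose optimizer: SGD or Adam
  optimizer_choice = optimizer

  if optimizer_choice == 'sgd':
    optimizer = optim.SGD(model.parameters(), lr=lr)
  elif optimizer_choice == 'adam':
    optimizer = optim.Adam(model.parameters(), lr=lr)
  else:
    # fail early instead of returning the string unchanged
    raise ValueError(f"Unsupported optimizer: {optimizer_choice!r}")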

  return model, criterion, optimizer
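
The "LogSoftmax + NLLLoss" comment can be unpacked for a single example. The sketch below is a plain-Python illustration of that computation, not PyTorch's implementation. The function names and the example logits are made up.

```python
import math

def log_softmax(logits):
    # Subtract the max before exponentiating for numerical stability
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def cross_entropy(logits, target):
    # NLLLoss step: negative log-probability assigned to the correct class
    return -log_softmax(logits)[target]

logits = [2.0, 0.5, -1.0]                        # raw scores for 3 classes
loss_correct = cross_entropy(logits, target=0)   # highest score is the true class
loss_wrong = cross_entropy(logits, target=2)     # lowest score is the true class
print(loss_correct, loss_wrong)
```

The loss is small when the true class has the highest logit and grows as the model puts its confidence elsewhere. This is why the network returns raw logits and has no softmax layer of its own.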


## Train the network

In [40]:
def train_model(model, train_dataset, criterion, optimizer, epochs):
  # epochs = number of times the model will see the entire dataset
  # train_dataset is a DataLoader that yields (images, labels) batches

  model.train()  # training mode

  for epoch in range(epochs):
    running_loss = 0.0
    for images, labels in train_dataset:
      # Zero the parameter gradients
      optimizer.zero_grad()

      # Forward pass
      outputs = model(images)
      loss = criterion(outputs, labels)

      # Backward pass
      loss.backward()

      # Update weights
      optimizer.step()

      running_loss += loss.item()

    # Average loss over the batches of the loader that was passed in
    print(f"Epoch [{epoch+1}/{epochs}], Loss: {running_loss/len(train_dataset):.4f}")


## Test the network

**notes**

outputs.data → the raw tensor values without gradient history. Inside torch.no_grad() this is redundant, so plain outputs gives the same result.

torch.max(tensor, dim=1) → find the maximum along dimension 1 (the class scores).

Returns (values, indices) → values = max score, indices = where it occurred.

_ means “ignore values, we only care about indices”.

predicted shape = [batch_size], containing predicted class IDs.


labels.size(0) = batch size (number of images in this batch).

Add to total.
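
The notes above can be traced on a tiny hand-made batch. The logits and labels below are invented for illustration; only the `torch.max` and comparison calls mirror the test loop.

```python
import torch

# Made-up logits for a batch of 2 images and 3 classes
outputs = torch.tensor([[0.1, 2.5, -1.0],
                        [1.2, 0.3, 0.9]])
labels = torch.tensor([1, 2])

values, predicted = torch.max(outputs, 1)     # max score and its index per row
total = labels.size(0)                        # batch size
correct = (predicted == labels).sum().item()  # number of matching predictions

print(predicted.tolist(), correct, total)
```

`outputs.argmax(dim=1)` returns the same indices when the max values themselves are not needed.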

In [41]:
def test_model(model, test_dataset):

  correct = 0   # number of correct predictions so far
  total = 0     # number of images seen so far

  model.eval()  # evaluation mode (no effect on this model, but a good habit)

  # Normally, PyTorch tracks every operation for backpropagation.
  # But in testing, we don’t update weights → we don’t need gradients.

  with torch.no_grad(): # No need to compute gradients
    for images, labels in test_dataset:  # use the loader passed in, not the global one
      outputs = model(images)
      _, predicted = torch.max(outputs, 1) # Index of max logit = predicted class
      total += labels.size(0)
      correct += (predicted == labels).sum().item()

  # Compute accuracy once, after all batches have been counted
  accuracy = 100 * correct / total
  print(f"Test Accuracy: {accuracy:.2f}%")
  return accuracy


## Model 1

In [42]:
# create model
model, criterion, optimizer = create_model(
    hidden_size= 128 ,
    activation= 'relu' ,
    optimizer= 'adam' ,
    lr= 0.001
    )

# train

train_model(model, trainloader, criterion, optimizer, epochs=5)

# test
test_model(model,testloader)

Epoch [1/5], Loss: 0.5017
Epoch [2/5], Loss: 0.3808
Epoch [3/5], Loss: 0.3424
Epoch [4/5], Loss: 0.3149
Epoch [5/5], Loss: 0.2996
Test Accuracy: 86.44%


86.44

## Model 2

In [43]:
# create model
model2, criterion2, optimizer2 = create_model(
    hidden_size= 256 ,
    activation= 'sigmoid' ,
    optimizer= 'adam' ,
    lr= 0.001
    )

# train

train_model(model2, trainloader, criterion2, optimizer2, epochs=5)

# test
test_model(model2,testloader)

Epoch [1/5], Loss: 0.5401
Epoch [2/5], Loss: 0.3886
Epoch [3/5], Loss: 0.3490
Epoch [4/5], Loss: 0.3207
Epoch [5/5], Loss: 0.3001
Test Accuracy: 87.11%


87.11

## Model 3

In [45]:
# create model
model3, criterion3, optimizer3 = create_model(
    hidden_size= 128 ,
    activation= 'sigmoid' ,
    optimizer= 'sgd' ,
    lr= 0.01
    )

# train

train_model(model3, trainloader, criterion3, optimizer3, epochs=5)

# test
test_model(model3,testloader)

Epoch [1/5], Loss: 1.5136
Epoch [2/5], Loss: 0.9009
Epoch [3/5], Loss: 0.7321
Epoch [4/5], Loss: 0.6558
Epoch [5/5], Loss: 0.6087
Test Accuracy: 78.32%


78.32
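
The three runs can be compared programmatically instead of by eye. This is a small sketch using the test accuracies printed above. The dictionary keys are just labels chosen for this example.

```python
# Test accuracies (%) reported by the three runs above
results = {
    "model1 (128, relu, adam, lr=0.001)": 86.44,
    "model2 (256, sigmoid, adam, lr=0.001)": 87.11,
    "model3 (128, sigmoid, sgd, lr=0.01)": 78.32,
}

best_name = max(results, key=results.get)   # key with the highest accuracy
print(f"Best: {best_name} with {results[best_name]:.2f}%")
```

Each configuration was trained once, so the gap of under one point between the two Adam runs may partly reflect run-to-run variation.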

In [46]:
# as we can see model 2 has best accuracy , so we save the best model
# Save weights + hyperparams in one checkpoint
torch.save({
    "hidden_size": 256,
    "activation": "sigmoid",
    "optimizer": "adam",
    "state_dict": model2.state_dict()
}, "model2.pt")

print("Model saved as model2.pt")


Model saved as model2.pt
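
Storing the hyperparameters next to the weights means the network can be rebuilt from the file alone. The sketch below shows one possible way to do that. `load_checkpoint` is a hypothetical helper, the class is a compact copy of the notebook's network so the example runs on its own, and an untrained model saved to a temporary file stands in for model2.pt.

```python
import os
import tempfile

import torch
import torch.nn as nn

class FashionNN(nn.Module):
    # Compact copy of the notebook's network (relu / sigmoid only)
    def __init__(self, hidden_size, activation, input_size=28 * 28, output_size=10):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, output_size)
        self.activation = nn.ReLU() if activation == "relu" else nn.Sigmoid()

    def forward(self, x):
        x = x.view(x.size(0), -1)
        return self.fc2(self.activation(self.fc1(x)))

def load_checkpoint(path):
    # Rebuild the architecture from the saved hyperparameters, then load weights
    checkpoint = torch.load(path, map_location="cpu")
    model = FashionNN(checkpoint["hidden_size"], checkpoint["activation"])
    model.load_state_dict(checkpoint["state_dict"])
    model.eval()
    return model

# Stand-in checkpoint with the same keys as the one saved above
original = FashionNN(256, "sigmoid")
path = os.path.join(tempfile.mkdtemp(), "demo_checkpoint.pt")
torch.save({"hidden_size": 256, "activation": "sigmoid", "optimizer": "adam",
            "state_dict": original.state_dict()}, path)

restored = load_checkpoint(path)
sample = torch.zeros(1, 1, 28, 28)
with torch.no_grad():
    same_output = torch.equal(original(sample), restored(sample))
print(same_output)
```

In the notebook itself, calling the helper with "model2.pt" would be the equivalent step, provided the FashionNN class is defined before loading.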


In [47]:
!ls -lh /content


total 808K
drwxr-xr-x 3 root root 4.0K Aug 20 07:39 data
-rw-r--r-- 1 root root 798K Aug 20 09:43 model2.pt
drwxr-xr-x 1 root root 4.0K Aug 18 13:38 sample_data
