# Yoga: Maven (Deep Neural Networks)


> Objectives:
>
> - Explore the effects of data on model performance
> - Experiment with training regimes
> - Experiment with model architectures

<small>🚫🤖 AI code generation is NOT recommended for this notebook.</small>

<small>🙋 Have a suggestion for how to improve this file? Please open an issue for [this repo on GitHub](https://github.com/orgs/deepatlasai/repositories). Create a Help Desk ticket in Discord for time-sensitive technical issues.</small>

## Standard Deep Atlas Exercise Set Up

- [ ] Ensure you are using the coursework Pipenv environment and kernel ([instructions](../SETUP.md))
- [ ] Apply the standard Deep Atlas environment setup process by running this cell:

In [2]:
import sys, os
sys.path.insert(0, os.path.join('..', 'includes'))
import deep_atlas

### 🚦 Checkpoint: Start

- [ ] Run this cell to record your start time:

In [3]:
deep_atlas.log_start_time()

Started at: 2025-05-08T14:48:40.282964
🚀 Success! Get started...


## Training for Fashion-MNIST

Use your new deep learning know-how to adapt your MNIST training workflow to [Fashion-MNIST](https://pytorch.org/vision/stable/generated/torchvision.datasets.FashionMNIST.html).

Fashion-MNIST is formatted like MNIST: 70,000 grayscale images of 28×28 px (60,000 for training, 10,000 for testing) representing 10 classes.

How does the architecture designed for MNIST perform when trained for Fashion-MNIST?

In [5]:
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torchinfo import summary


In [6]:
train_batch_size = 64
test_batch_size = 64

transform = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Normalize((0.5,), (0.5,)),
    ]
)
trainset = datasets.FashionMNIST(
    "./downloads/Fashionmnist",
    download=True,
    train=True,
    transform=transform,
)
testset = datasets.FashionMNIST(
    "./downloads/Fashionmnist",
    download=True,
    train=False,
    transform=transform,
)

trainloader = torch.utils.data.DataLoader(
    trainset, batch_size=train_batch_size, shuffle=True
)
testloader = torch.utils.data.DataLoader(
    testset, batch_size=test_batch_size, shuffle=True
)

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to ./downloads/Fashionmnist/FashionMNIST/raw/train-images-idx3-ubyte.gz
Extracting ./downloads/Fashionmnist/FashionMNIST/raw/train-images-idx3-ubyte.gz to ./downloads/Fashionmnist/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to ./downloads/Fashionmnist/FashionMNIST/raw/train-labels-idx1-ubyte.gz
Extracting ./downloads/Fashionmnist/FashionMNIST/raw/train-labels-idx1-ubyte.gz to ./downloads/Fashionmnist/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to ./downloads/Fashionmnist/FashionMNIST/raw/t10k-images-idx3-ubyte.gz
Extracting ./downloads/Fashionmnist/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to ./downloads/Fashionmnist/FashionMNIST/raw
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to ./downloads/Fashionmnist/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting ./downloads/Fashionmnist/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to ./downloads/Fashionmnist/FashionMNIST/raw

In [None]:
# Set the device to run on: CUDA GPU, Apple MPS, or CPU
device = torch.device(
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

In [26]:


class SimpleNN(nn.Module):
    """Small CNN: three conv/ReLU/max-pool blocks followed by two linear layers."""

    def __init__(self):
        super().__init__()
        # The attribute name is carried over from the MNIST notebook;
        # the stack now begins with convolutional layers.
        self.linear_relu_stack = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),  # [N, 128, 3, 3] -> [N, 1152]
            nn.Linear(128 * 3 * 3, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        logits = self.linear_relu_stack(x)
        return logits


# Create the model instance
model = SimpleNN()
# Use CUDA if it is available, otherwise fall back to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
# Note: no training has run yet, so this checkpoint holds randomly initialized weights
torch.save(model, "./downloads/Fashionmnist.pth")
print(f"Model moved to {device} and saved to ./downloads/Fashionmnist.pth")

# Print model summary
summary(model, input_size=(train_batch_size, 1, 28, 28))


Model moved to cpu and saved to ./downloads/Fashionmnist.pth


Layer (type:depth-idx)                   Output Shape              Param #
SimpleNN                                 [64, 10]                  --
├─Sequential: 1-1                        [64, 10]                  --
│    └─Conv2d: 2-1                       [64, 32, 28, 28]          320
│    └─ReLU: 2-2                         [64, 32, 28, 28]          --
│    └─MaxPool2d: 2-3                    [64, 32, 14, 14]          --
│    └─Conv2d: 2-4                       [64, 64, 14, 14]          18,496
│    └─ReLU: 2-5                         [64, 64, 14, 14]          --
│    └─MaxPool2d: 2-6                    [64, 64, 7, 7]            --
│    └─Conv2d: 2-7                       [64, 128, 7, 7]           73,856
│    └─ReLU: 2-8                         [64, 128, 7, 7]           --
│    └─MaxPool2d: 2-9                    [64, 128, 3, 3]           --
│    └─Flatten: 2-10                     [64, 1152]                --
│    └─Linear: 2-11                      [64, 512]                 590,336
│
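The shapes and parameter counts in the summary above can be checked by hand. The sketch below is plain Python, not part of the original notebook, and its helper names are made up for illustration. It applies the standard output-size formulas for a 3×3 convolution with padding 1 and for a 2×2 max pool with the default stride, then counts weights plus biases for each layer.

```python
def conv2d_out(size, kernel=3, padding=1, stride=1):
    """Spatial output size of a square convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def maxpool_out(size, kernel=2):
    """Spatial output size of a max pool whose stride equals its kernel size."""
    return (size - kernel) // kernel + 1


def conv2d_params(c_in, c_out, kernel=3):
    """Weights plus one bias per output channel."""
    return c_in * c_out * kernel * kernel + c_out


def linear_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out


size = 28  # Fashion-MNIST images are 28x28
conv_channels = [(1, 32), (32, 64), (64, 128)]
conv_param_counts = []
for c_in, c_out in conv_channels:
    # Each block is conv (keeps the size) followed by a pool (halves it, rounding down)
    size = maxpool_out(conv2d_out(size))
    conv_param_counts.append(conv2d_params(c_in, c_out))

flat_features = 128 * size * size
first_linear_params = linear_params(flat_features, 512)

print(size, flat_features)     # 3 1152
print(conv_param_counts)       # [320, 18496, 73856]
print(first_linear_params)     # 590336
```

The 7×7 feature map pools down to 3×3 because the pool rounds down, which is why the first linear layer takes `128 * 3 * 3` inputs.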

In [27]:
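# The file holds the whole pickled model rather than a state_dict, so
# weights_only=False is needed on recent PyTorch versions, where torch.load
# defaults to weights_only=True. Only load files you trust this way.
model_v1 = torch.load("./downloads/Fashionmnist.pth", weights_only=False)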

In [31]:
model_v1.eval()

# Running sum of per-batch accuracies; divided by the number of batches below
correct = 0

with torch.no_grad():
    for images, labels in testloader:
        # The CNN expects 4D input [N, 1, 28, 28], so the images are not flattened
        images = images.to(device)
        outputs = model_v1(images)
        _, predicted = torch.max(outputs, 1)
        # Add this batch's accuracy (fraction of correct predictions)
        correct += (predicted == labels.to(device)).sum().item() / labels.shape[0]

# Mean of per-batch accuracies: the smaller final batch counts as much as a full one
print(f"Accuracy of model_v1 on test data: {100 * correct / len(testloader):.2f}%")

Accuracy of model_v1 on test data: 9.43%
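An accuracy of 9.43% is about what random guessing gives across 10 classes. That is expected here, because the checkpoint was saved before any training step ran. Separately, the cell above averages per-batch accuracies, so the smaller last batch is weighted like a full one. A minimal sketch of a sample-weighted alternative follows. The function name and the toy stand-ins are assumptions made for illustration, not part of the notebook.

```python
import torch
import torch.nn as nn


def sample_weighted_accuracy(model, loader, device="cpu"):
    """Return accuracy in percent: total correct predictions / total samples."""
    model.eval()
    n_correct, n_total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            predicted = model(images).argmax(dim=1)
            n_correct += (predicted == labels).sum().item()
            n_total += labels.size(0)
    return 100 * n_correct / n_total


# Hypothetical stand-ins so the sketch runs without downloading data:
# a tiny untrained classifier and two batches of different sizes.
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
toy_loader = [
    (torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,))),
    (torch.randn(16, 1, 28, 28), torch.randint(0, 10, (16,))),
]
toy_accuracy = sample_weighted_accuracy(toy_model, toy_loader)
print(f"Toy accuracy: {toy_accuracy:.2f}%")
```

In the notebook, the call would presumably look like `sample_weighted_accuracy(model_v1, testloader, device)`.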


## Training a single model on both sets

Is it possible for a single neural network to learn how to classify images for MNIST and Fashion-MNIST? 

Try training a single model with the following constraints: 

- Train the model on MNIST and Fashion-MNIST training sets.
- Retain the same number of output classes (both datasets have 10 possible classes).
- Try adjusting the size of the layers and track the effects of the changes on model performance. 
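One way to meet the first constraint is to concatenate the two training sets and draw shuffled batches from the result. The sketch below uses `torch.utils.data.ConcatDataset` for that. The helper name `make_combined_loader` and the random stand-in datasets are assumptions chosen so the example runs without downloads. In the notebook, the real arguments would be an MNIST training set and the Fashion-MNIST `trainset`.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset


def make_combined_loader(dataset_a, dataset_b, batch_size=64, shuffle=True):
    """Concatenate two datasets and wrap them in a single DataLoader."""
    combined = ConcatDataset([dataset_a, dataset_b])
    return DataLoader(combined, batch_size=batch_size, shuffle=shuffle)


# Hypothetical stand-ins shaped like the real data: 1x28x28 images, labels 0-9.
fake_mnist = TensorDataset(
    torch.randn(100, 1, 28, 28), torch.randint(0, 10, (100,))
)
fake_fashion = TensorDataset(
    torch.randn(80, 1, 28, 28), torch.randint(0, 10, (80,))
)

combined_loader = make_combined_loader(fake_mnist, fake_fashion, batch_size=32)
images, labels = next(iter(combined_loader))
print(len(combined_loader.dataset), images.shape, labels.shape)
```

With 10 shared outputs, a label such as 3 stands for both a digit and a clothing category, so each output unit has to cover one class from each dataset. Shuffling matters here: without it, the model would see all of one dataset before any of the other.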

## Artifact and metrics logging

As you work, make sure to document your modeling experiments by logging your inputs (code, data, parameter values) and outputs (models, metrics).

- You can use Weights and Biases or any other system of your choice, but try to log enough information that a classmate could understand and reproduce your modeling experiments by viewing only your logs, not your notebook.

From now on, you should document all your modeling experiments, even if we don't remind you.
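If you prefer not to depend on a hosted service, a plain append-only log file also meets this goal. The sketch below writes one JSON object per line using only the standard library. The function names, the file location, and the example values are placeholders invented for illustration, not results from this notebook.

```python
import json
import tempfile
import time
from pathlib import Path


def log_record(path, record):
    """Append one timestamped JSON object per line to the log file."""
    entry = {"time": time.strftime("%Y-%m-%dT%H:%M:%S"), **record}
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


def read_log(path):
    """Read the log back as a list of dictionaries."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical location; the file is reset so reruns of this demo start clean.
log_path = Path(tempfile.gettempdir()) / "experiment_log_demo.jsonl"
log_path.unlink(missing_ok=True)

# Log the inputs once, then one metrics record per epoch (placeholder values).
log_record(log_path, {"type": "config", "architecture": "SimpleNN", "batch_size": 64})
log_record(log_path, {"type": "metrics", "epoch": 0, "train_loss": 2.3})

for record in read_log(log_path):
    print(record)
```

Keeping the config and the metrics in the same file means a classmate can reconstruct a run from the log alone.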

In [38]:
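import wandb

epochs = 10
learning_rate = 0.001

# Loss and optimizer, matching the values logged in the config below
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model_v1.parameters(), lr=learning_rate)


def train_one_epoch(model, loader, criterion, optimizer):
    """Run one training pass and return the mean loss per sample."""
    model.train()
    total_loss = 0.0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * labels.size(0)
    return total_loss / len(loader.dataset)


def validate(model, loader, criterion):
    """Return the mean loss per sample and the accuracy (%) on loader."""
    model.eval()
    total_loss, n_correct = 0.0, 0
    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            total_loss += criterion(outputs, labels).item() * labels.size(0)
            n_correct += (outputs.argmax(dim=1) == labels).sum().item()
    n_samples = len(loader.dataset)
    return total_loss / n_samples, 100 * n_correct / n_samples


# Initialize wandb
wandb.init(
    project="mnist-fashion-mnist-classification",
    config={
        "architecture": "SimpleNN",
        # trainloader currently holds Fashion-MNIST only
        "dataset": "Fashion-MNIST",
        "epochs": epochs,
        "batch_size": train_batch_size,
        "learning_rate": learning_rate,
        "optimizer": "Adam",
    },
)

# Track the model's gradients during training
wandb.watch(model_v1)

# During training, log metrics
for epoch in range(epochs):
    # Training loop
    train_loss = train_one_epoch(model_v1, trainloader, criterion, optimizer)

    # No validation split is defined in this notebook, so the test loader
    # stands in here; a held-out slice of trainset would be cleaner.
    val_loss, val_acc = validate(model_v1, testloader, criterion)

    # Log metrics
    wandb.log({
        "train_loss": train_loss,
        "val_loss": val_loss,
        "val_accuracy": val_acc,
        "epoch": epoch,
    })

# The test loader doubled as the validation set, so the last val_acc
# is the final test accuracy
wandb.log({"test_accuracy": val_acc})

# Save the trained weights, then upload the file with the run
torch.save(model_v1.state_dict(), "model_v1.pth")
wandb.save("model_v1.pth")

# Close wandb run
wandb.finish()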

### 🚦 Checkpoint: Stop

- [ ] Uncomment this code
- [ ] Complete the feedback form
- [ ] Run the cell to log your responses and record your stop time:

In [35]:
deep_atlas.log_feedback(
    {
        # How long were you actively focused on this section? (HH:MM)
        "active_time": "Still going over it",
        # Did you feel finished with this section (Yes/No):
        "finished": "not really, I need to go back over a lot.",
        # How much did you enjoy this section? (1–5)
        "enjoyment": "It was fun",
        # How useful was this section? (1–5)
        "usefulness": "4",
        # Did you skip any steps?
        "skipped_steps": "I made a crappy model for fashion, couldn't get it to work well and haven't made the second yet",
        # Any obvious opportunities for improvement?
        "suggestions": "I apparently have many opportunities for improvement here lol",
    }
)
deep_atlas.log_stop_time()

📝 Feedback logged!
Stopped at: 2025-05-08T15:42:44.244821
🎉 Thanks! You're all done.


## You did it!