# PyTorch Deep Dive: Image Classification

Regression is cool, but deep learning is best known for **Vision**.

In this notebook, we will teach a computer to "see" by classifying images of clothing from the FashionMNIST dataset.

## Learning Objectives
- **The Vocabulary**: What do "Pixel", "Channel", "Class", "Logits", and "Softmax" mean?
- **The Intuition**: How a computer sees an image (a grid of numbers).
- **The Practice**: Building a classifier for 10 types of clothing.
- **The Visual**: Seeing the model's predictions.


```python
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt

torch.manual_seed(42)
```

## Part 1: The Vocabulary (Definitions First)

Computer Vision has its own language. Let's learn it.

### 1. Pixel
- The smallest dot in an image.
- In an 8-bit image, each pixel is stored as an integer from 0 (black) to 255 (white).
- In PyTorch, `transforms.ToTensor()` rescales these values to floats between 0.0 and 1.0 by dividing by 255.
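The following minimal plain-Python sketch makes the rescaling concrete. It assumes 8-bit pixel values and mirrors the divide-by-255 that `ToTensor()` performs. `scale_pixels` is a hypothetical helper, not a PyTorch function.

```python
def scale_pixels(pixels):
    """Map 8-bit pixel values (integers 0-255) to floats in [0.0, 1.0]."""
    for p in pixels:
        if not 0 <= p <= 255:
            raise ValueError(f"pixel value {p} is outside 0-255")
    return [p / 255 for p in pixels]


row = [0, 51, 255]          # black, dark gray, white
print(scale_pixels(row))    # [0.0, 0.2, 1.0]
```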

### 2. Channel
- The color layers of an image.
- **Grayscale**: 1 Channel (Brightness).
- **Color (RGB)**: 3 Channels (Red, Green, Blue).
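The sketch below shows how channels are laid out, using nested Python lists as a stand-in for tensors. It assumes PyTorch's channels-first order (Channel, Height, Width), which is the order the batch shape printed later reports. The 2x2 image and the `shape` and `pixel_color` helpers are hypothetical.

```python
# A hypothetical 2x2 color image stored channels-first: image[channel][row][col]
red   = [[255, 0], [0, 255]]
green = [[0, 255], [0, 255]]
blue  = [[0, 0], [255, 255]]
rgb_image = [red, green, blue]

# A grayscale image has a single channel
gray_image = [[[0, 128], [128, 255]]]


def shape(img):
    """Return (channels, height, width) for a nested-list image."""
    return (len(img), len(img[0]), len(img[0][0]))


def pixel_color(img, row, col):
    """Collect one value per channel for the pixel at (row, col)."""
    return tuple(channel[row][col] for channel in img)


print(shape(rgb_image))              # (3, 2, 2)
print(pixel_color(rgb_image, 0, 0))  # (255, 0, 0): pure red
print(pixel_color(rgb_image, 1, 1))  # (255, 255, 255): white
print(shape(gray_image))             # (1, 2, 2)
```

A FashionMNIST image follows the same layout with shape (1, 28, 28).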

### 3. Class
- The category we want to predict.
- Example: "T-Shirt", "Trouser", "Sneaker".
- We map each class to an integer index (0, 1, 2, ...), and the model predicts that index.

### 4. Logits
- The raw, unnormalized scores coming out of the last layer of the model.
- They can be any number (e.g., -5.2, 12.8).
- Higher score = the model favors that class more. The predicted class is the one with the highest logit.
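Here is a minimal sketch of reading a prediction off logits: the predicted class is the index of the largest score. The logits and the three-class label list are made up for illustration and are not FashionMNIST's real indices. `argmax` is a hand-written helper standing in for `torch.argmax`.

```python
# Hypothetical logits for one image over 3 classes (any real numbers are allowed)
logits = [-5.2, 12.8, 3.1]
class_names = ["T-Shirt", "Trouser", "Sneaker"]  # illustrative 3-class example


def argmax(values):
    """Index of the largest value (the first one wins on ties)."""
    best = 0
    for i in range(1, len(values)):
        if values[i] > values[best]:
            best = i
    return best


pred = argmax(logits)
print(pred, class_names[pred])  # 1 Trouser
```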

### 5. Softmax
- A function that turns Logits into Probabilities (0 to 1).
- All probabilities are non-negative and add up to 1 (100%), and the class with the highest logit always gets the highest probability.
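Softmax is short enough to write by hand. This plain-Python sketch computes exp(z_i) / Σ_j exp(z_j). It subtracts the largest logit first so `math.exp` does not overflow. That subtraction does not change the result, which the last line checks. In the notebook itself, `nn.CrossEntropyLoss` takes care of this step internally.

```python
import math


def softmax(logits):
    """Turn raw logits into probabilities: exp(z_i) / sum over j of exp(z_j)."""
    m = max(logits)                              # subtracting the max avoids overflow in exp
    exps = [math.exp(z - m) for z in logits]     # exp makes every weight non-negative
    total = sum(exps)
    return [e / total for e in exps]             # normalize so the weights sum to 1


logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
print([round(p, 3) for p in probs])
print(sum(probs))

# Adding the same constant to every logit leaves the probabilities unchanged
shifted = softmax([z + 100 for z in logits])
print(all(math.isclose(a, b) for a, b in zip(probs, shifted)))  # True
```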

## Part 2: The Intuition (The Grid)

To you, an image is a picture of a shoe.
To a computer, an image is just a **Grid of Numbers**.

If you have a 28x28 image, you have 784 numbers.
The model's job is to look at these 784 numbers and say "That pattern looks like a Shoe".
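A grid of numbers really can hold a picture. This sketch draws a tiny made-up 5x5 grayscale image as text, printing `#` for bright pixels and `.` for dark ones. The image, the `render` helper, and the threshold of 128 are all illustrative assumptions.

```python
# A hypothetical 5x5 grayscale "image" (0 = black, 255 = white): a bright vertical bar
image = [
    [0, 0, 255, 0, 0],
    [0, 0, 255, 0, 0],
    [0, 0, 255, 0, 0],
    [0, 0, 255, 0, 0],
    [0, 0, 255, 0, 0],
]


def render(grid, threshold=128):
    """Draw bright pixels as '#' and dark pixels as '.' so we can 'see' the numbers."""
    return ["".join("#" if p >= threshold else "." for p in row) for row in grid]


for line in render(image):
    print(line)  # ..#..

height, width = len(image), len(image[0])
print(height * width, "numbers")  # 25 numbers; a 28x28 image has 784
```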

## Part 3: The Data (FashionMNIST)

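We will use FashionMNIST. It's like "Hello World" for vision, but harder than the classic MNIST handwritten digits.
- Images: 28x28 grayscale (60,000 for training, 10,000 for testing).
- Classes: 10, labeled 0 to 9 in this order: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, Ankle boot.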
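The sketch below shows the label-to-name mapping, assuming the class order listed above. You do not need to type this list in the notebook: torchvision's `FashionMNIST` dataset exposes the same names as `trainset.classes`.

```python
# FashionMNIST label order (index -> name), matching torchvision's FashionMNIST.classes
FASHION_CLASSES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

# Reverse lookup (name -> index)
name_to_index = {name: i for i, name in enumerate(FASHION_CLASSES)}

print(FASHION_CLASSES[7])          # Sneaker
print(name_to_index["Trouser"])    # 1
```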

```python
# Download the data. ToTensor() turns each image into a (1, 28, 28) float tensor with values in [0, 1]
transform = transforms.Compose([transforms.ToTensor()])

trainset = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

testset = torchvision.datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False)

# Grab one batch to inspect
dataiter = iter(trainloader)
images, labels = next(dataiter)

print(f"Image Batch Shape: {images.shape} (Batch, Channel, Height, Width)")
print(f"Label Batch Shape: {labels.shape}")

# Show the first image with its label index and class name (trainset.classes maps index -> name)
plt.imshow(images[0].squeeze(), cmap='gray')
plt.title(f"Label: {labels[0].item()} ({trainset.classes[labels[0].item()]})")
plt.show()
```
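How many batches does the loader yield, and how large are they? This simplified sketch splits sample indices into mini-batches the way `DataLoader` does with its default `drop_last=False`, which keeps a partial last batch. It is an illustration, not DataLoader's actual implementation. `make_batches` is a hypothetical helper.

```python
import math
import random


def make_batches(num_samples, batch_size, shuffle=True, seed=42):
    """Split sample indices into mini-batches, keeping a smaller final batch if needed."""
    indices = list(range(num_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)  # new random order of the samples
    return [indices[i:i + batch_size] for i in range(0, num_samples, batch_size)]


train_batches = make_batches(60_000, 64)
test_batches = make_batches(10_000, 64, shuffle=False)

print(len(train_batches), len(train_batches[-1]))  # 938 32
print(len(test_batches), len(test_batches[-1]))    # 157 16
print(math.ceil(60_000 / 64))                      # 938
```

So `len(trainloader)` is 938, and the final training batch holds 32 images rather than 64. For the same reason, the evaluation code counts `labels.size(0)` instead of assuming 64.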

## Part 4: The Model (Flattening)

Since we are using Linear Layers (for now), we need to **Flatten** the 2D image (28x28) into a 1D vector (784).

Imagine cutting the image into its 28 rows (strips of 28 pixels), then laying them end-to-end: 28 × 28 = 784 numbers.
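Laying the strips end-to-end is row-major flattening. This plain-Python sketch flattens a small made-up grid and checks that pixel (row r, column c) lands at position r × width + c. `nn.Flatten` does the same for each image but keeps the batch dimension, because by default it flattens from dimension 1 onward. That turns a (64, 1, 28, 28) batch into (64, 784).

```python
def flatten(grid):
    """Lay the rows of a 2D grid end-to-end (row-major order)."""
    return [value for row in grid for value in row]


grid = [
    [1, 2, 3],
    [4, 5, 6],
]
flat = flatten(grid)
print(flat)  # [1, 2, 3, 4, 5, 6]

# Pixel (row r, column c) lands at position r * width + c in the flat vector
width = len(grid[0])
r, c = 1, 2
assert flat[r * width + c] == grid[r][c]
```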

```python
class VisionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()            # (B, 1, 28, 28) -> (B, 784); keeps the batch dimension
        self.layer1 = nn.Linear(28 * 28, 128)  # 784 -> 128
        self.relu = nn.ReLU()
        self.layer2 = nn.Linear(128, 10)       # 128 -> 10 classes

    def forward(self, x):
        x = self.flatten(x)
        x = self.layer1(x)
        x = self.relu(x)
        x = self.layer2(x)  # raw logits; no softmax here
        return x

model = VisionNet()
criterion = nn.CrossEntropyLoss()  # applies LogSoftmax + NLLLoss internally, so it expects raw logits
optimizer = optim.Adam(model.parameters(), lr=0.001)
```
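This sketch computes, by hand and for a single example, what `nn.CrossEntropyLoss` measures: the negative log of the softmax probability assigned to the correct class. It is written in plain Python and uses a log-sum-exp for numerical stability. PyTorch's version, by default, averages this quantity over the batch. The example logits are made up.

```python
import math


def cross_entropy(logits, target):
    """Loss for one example: -log(softmax(logits)[target]), computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# All 10 logits equal (a uniform guess): loss = ln(10), about 2.3026
print(cross_entropy([0.0] * 10, target=3))

# Confident and correct -> small loss; confident and wrong -> large loss
print(cross_entropy([0.0, 8.0, 0.0], target=1))
print(cross_entropy([0.0, 8.0, 0.0], target=0))
```

A model that gives all 10 classes equal logits scores ln(10) ≈ 2.30. That is a handy reference point when reading the first epoch's loss.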

## Part 5: Training (The Loop)

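The same training loop as before (zero the gradients, forward pass, compute the loss, backward pass, optimizer step), but now we iterate over mini-batches from the `trainloader`.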

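```python
epochs = 5

for epoch in range(epochs):
    model.train()  # training mode (no dropout/batch norm here, but good practice)
    running_loss = 0.0
    for inputs, labels in trainloader:
        optimizer.zero_grad()              # clear gradients from the previous batch
        outputs = model(inputs)            # forward pass: logits of shape (batch_size, 10)
        loss = criterion(outputs, labels)  # mean cross-entropy over the batch
        loss.backward()                    # backpropagate
        optimizer.step()                   # update the weights

        running_loss += loss.item()

    print(f"Epoch {epoch + 1}/{epochs}: Loss {running_loss / len(trainloader):.4f}")

print("Training Complete!")
```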
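The printed epoch loss is `running_loss / len(trainloader)`, which is an average of per-batch mean losses. This sketch uses made-up numbers to compare it with the true per-image average. The two differ only because the last batch is smaller. FashionMNIST's training set splits into 937 full batches of 64 and one batch of 32, so the difference there is negligible and the printed value is a fine progress indicator.

```python
# Hypothetical per-batch mean losses for a tiny "epoch": three full batches of 64 and one of 32
batch_losses = [0.60, 0.50, 0.40, 0.90]
batch_sizes = [64, 64, 64, 32]

# What the training loop prints: the plain average of batch means
mean_of_batch_means = sum(batch_losses) / len(batch_losses)

# The per-image average weights each batch by its size
per_image_mean = sum(loss * n for loss, n in zip(batch_losses, batch_sizes)) / sum(batch_sizes)

print(round(mean_of_batch_means, 4), round(per_image_mean, 4))
```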

## Part 6: Evaluation (Accuracy)

How many did we get right?

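```python
model.eval()  # evaluation mode (no dropout/batch norm here, but good practice)
correct = 0
total = 0
with torch.no_grad():  # no gradients needed for evaluation
    for images, labels in testloader:
        outputs = model(images)            # logits, shape (batch_size, 10)
        predicted = outputs.argmax(dim=1)  # index of the highest logit per image
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f"Accuracy: {100 * correct / total:.2f}%")

# Show one test image from the last batch with its predicted and true class
plt.imshow(images[0].squeeze(), cmap='gray')
plt.title(f"Predicted: {testset.classes[predicted[0].item()]} | True: {testset.classes[labels[0].item()]}")
plt.show()
```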
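The accuracy loop boils down to three steps: take the argmax of each row of logits, compare it with the label, and count. This plain-Python sketch mirrors that bookkeeping over several batches. It uses `len(labels)` the way the notebook uses `labels.size(0)`, so a short final batch is counted correctly. The logits and labels are invented.

```python
def argmax(values):
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(batches):
    """batches: iterable of (logits_per_image, labels) pairs; returns percent correct."""
    correct = 0
    total = 0
    for logits_batch, labels in batches:
        predicted = [argmax(row) for row in logits_batch]
        total += len(labels)
        correct += sum(p == y for p, y in zip(predicted, labels))
    return 100 * correct / total


# Two hypothetical batches of 3-class logits (the second batch is shorter)
batches = [
    ([[2.0, 0.1, 0.3], [0.2, 1.5, 0.1]], [0, 1]),  # both correct
    ([[0.1, 0.2, 3.0]], [1]),                       # predicted 2, label 1: wrong
]
print(f"Accuracy: {accuracy(batches):.2f}%")  # Accuracy: 66.67%
```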

## Summary Checklist

1. **Pixel** = Input number, scaled to 0-1 by `ToTensor()`.
2. **Flatten** = Turning a 2D grid into a 1D vector.
3. **Logits** = Raw scores from the model.
4. **CrossEntropyLoss** = The standard loss for multi-class classification; it takes raw logits and applies softmax internally.

You have now built a computer vision model!