# Day 13: Image Classification with PyTorch – Dataset, Display, and Augmentation

This document summarizes the foundational concepts and implementation steps for building an image classification pipeline using PyTorch. The focus is on handling image data, loading structured datasets, visualizing samples, and applying data augmentation to improve model robustness.

---

## 1. Introduction to Image Data

Digital images are composed of **pixels** (picture elements), which are the smallest units of visual information.

- **Grayscale images**: In the common 8-bit format, each pixel is a single integer between 0 (black) and 255 (white).
- **Color images**: Each pixel is represented by three integers for **Red**, **Green**, and **Blue** channels (RGB). For example:
  - Pixel `[52, 171, 235]` represents a specific shade of blue.

Understanding pixel structure is essential for preprocessing and model input formatting.
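
As a minimal sketch of how these pixel values look in code, the snippet below builds two tiny images as tensors. It assumes PyTorch is installed; the 2×2 images are made up for illustration and are not part of the dataset.

```python
import torch

# Hypothetical 2x2 grayscale image: one 8-bit intensity per pixel.
gray = torch.tensor([[0, 128],
                     [200, 255]], dtype=torch.uint8)

# Hypothetical 2x2 color image stored as [height, width, channels] (RGB).
color = torch.tensor([[[52, 171, 235], [255, 0, 0]],
                      [[0, 255, 0], [0, 0, 0]]], dtype=torch.uint8)

print(gray.shape)    # torch.Size([2, 2])
print(color.shape)   # torch.Size([2, 2, 3])

# The top-left pixel is the blue shade from the text.
red, green, blue = color[0, 0].tolist()
print(red, green, blue)  # 52 171 235
```

Note that this example stores channels last; PyTorch models usually expect channels first, which is what `ToTensor()` produces.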

---

## 2. Cloud Type Classification Dataset

The project uses the **Cloud Type Classification** dataset from Kaggle:
[Cloud Type Classification Dataset](https://www.kaggle.com/competitions/cloud-type-classification2/data)

- **Directory structure**:
  - `cloud_train/` and `cloud_test/` folders
  - Each contains **seven subfolders**, one for each cloud type
  - Each subfolder contains `.jpg` images representing that class

This structure is compatible with PyTorch’s `ImageFolder` utility.
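
The folder-per-class convention can be sketched with the standard library alone. The example below does not call `ImageFolder`; it only mimics how labels follow from folder names (subfolders sorted alphabetically, position used as the label). The class names and the temporary directory are hypothetical stand-ins for the real seven cloud types.

```python
import tempfile
from pathlib import Path

# Hypothetical class names for illustration; the real dataset has seven folders.
example_classes = ["cumulus", "cirrus", "stratus"]

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "cloud_train"
    for name in example_classes:
        (root / name).mkdir(parents=True)
        (root / name / "sample_0.jpg").touch()  # empty placeholder file

    # Sorted subfolder names become the class list; the position is the label.
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}

    # Each file is labeled by the folder that contains it.
    samples = [(path.name, class_to_idx[path.parent.name])
               for path in sorted(root.glob("*/*.jpg"))]

print(classes)       # ['cirrus', 'cumulus', 'stratus']
print(class_to_idx)  # {'cirrus': 0, 'cumulus': 1, 'stratus': 2}
```

On a real dataset, the same mapping is available as `dataset.classes` and `dataset.class_to_idx` after constructing an `ImageFolder`.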

---

## 3. Loading Images with PyTorch

To load and preprocess images:

- Use `ImageFolder` from `torchvision.datasets` to create a labeled dataset.
- Apply transformations using `transforms.Compose`:
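  - `Resize((128, 128))`: Resizes every image to 128×128 pixels so all inputs share the same dimensions
  - `ToTensor()`: Converts the image to a PyTorch float tensor of shape `[channels, height, width]` with values scaled to the range 0 to 1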

This ensures consistent input size and format for the model.

---

## 4. Displaying Image Samples

When images are drawn through a `DataLoader`, each batch has the shape:  
`[batch_size, channels, height, width]` → `[1, 3, 128, 128]` for a batch size of 1

To visualize an image using `matplotlib`:

- Use `squeeze()` to remove the batch dimension (this works only when the batch holds a single image)
- Use `permute(1, 2, 0)` to rearrange dimensions to `[height, width, channels]`, the layout `plt.imshow()` expects
- Call `plt.imshow()` followed by `plt.show()`

This step is crucial for verifying data integrity and understanding input structure.
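
The reshaping steps above can be sketched as follows. A random tensor stands in for a real batch, and the helper `show` is a hypothetical name introduced here; it is defined but not called so that the tensor steps run without a display.

```python
import torch

# Stand-in for one batch from a DataLoader with batch_size=1: [1, 3, 128, 128].
batch = torch.rand(1, 3, 128, 128)

image = batch.squeeze(0)        # drop the batch dimension -> [3, 128, 128]
image = image.permute(1, 2, 0)  # channels last -> [128, 128, 3]
print(image.shape)              # torch.Size([128, 128, 3])


def show(image_hwc):
    """Display a [height, width, channels] tensor with values in [0, 1]."""
    import matplotlib.pyplot as plt  # imported here so the tensor steps run without it

    plt.imshow(image_hwc.numpy())
    plt.axis("off")
    plt.show()
```

In a notebook, calling `show(image)` would render the picture.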

---

## 5. Data Augmentation Techniques

Data augmentation increases dataset diversity and helps prevent overfitting.

Common transformations include:

- `RandomHorizontalFlip()`: Flips an image horizontally with a probability of 0.5 (the default)
- `RandomRotation(degrees=(0, 45))`: Rotates an image by a random angle between 0 and 45 degrees

Benefits of augmentation:

- Simulates real-world distortions
- Improves model generalization
- Reduces reliance on specific pixel patterns

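Augmentation is applied on the fly each time a training image is loaded, so the model sees a different random variant every epoch. It is used only for training data; validation and test images should receive just the deterministic steps (resizing and tensor conversion) so that evaluation results are comparable.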

---

## 6. Summary of Key Concepts

| Concept               | Description                                           |
|-----------------------|-------------------------------------------------------|
| Pixels                | Fundamental units of image data                      |
| RGB Channels          | Represent color intensity per pixel                  |
| ImageFolder           | Loads structured image datasets with labels          |
| ToTensor + Resize     | Converts and standardizes image input                |
| Squeeze + Permute     | Prepares image for visualization                     |
| Data Augmentation     | Adds variability to training data                    |

---

## 7. Final Notes

Today’s session laid the groundwork for building image classifiers in PyTorch: loading a labeled image folder, preprocessing images into tensors, inspecting samples, and augmenting the training set. The code cells below combine these steps into a small end-to-end training and validation example.



In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt


In [None]:
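# Training pipeline: random augmentation is applied on the fly to each sample.
train_transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5]*3, std=[0.5]*3)  # maps [0, 1] to [-1, 1]
])

# Validation pipeline: same resizing and normalization, but no random augmentation.
val_transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5]*3, std=[0.5]*3)
])

train_dataset = datasets.ImageFolder(root='path_to_cloud_images/train', transform=train_transform)
val_dataset = datasets.ImageFolder(root='path_to_cloud_images/val', transform=val_transform)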

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32)


In [None]:
class SimpleCNN(nn.Module):
    def __init__(self, num_classes):
        super(SimpleCNN, self).__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 32 * 32, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes)
        )

    def forward(self, x):
        return self.net(x)

model = SimpleCNN(num_classes=len(train_dataset.classes))
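
The input size of the first linear layer, `64 * 32 * 32`, follows from the 128×128 input: the padded 3×3 convolutions keep the spatial size, and each `MaxPool2d(2)` halves it (128 → 64 → 32), leaving 64 channels of 32×32 values. A quick way to confirm this is to pass a dummy tensor through the same stack, stopped after `Flatten`:

```python
import torch
import torch.nn as nn

# Same convolutional stack as SimpleCNN, stopped right after Flatten.
features = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),   # 128x128 stays 128x128
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 128x128 -> 64x64
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # 64x64 stays 64x64
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 64x64 -> 32x32
    nn.Flatten(),
)

dummy = torch.zeros(1, 3, 128, 128)  # one fake 128x128 RGB image
with torch.no_grad():
    flat = features(dummy)

print(flat.shape)    # torch.Size([1, 65536])
print(64 * 32 * 32)  # 65536
```

If the `Resize` target changes, this number changes with it, so the check is worth repeating.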




In [None]:
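# Setup assumed here because the notebook does not define these names elsewhere:
# cross-entropy loss and Adam (lr=1e-3) are common defaults, not tuned choices.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(10):
    model.train()  # switch to training mode
    total_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()              # clear gradients from the previous step
        outputs = model(images)            # forward pass: raw class scores (logits)
        loss = criterion(outputs, labels)
        loss.backward()                    # backpropagate
        optimizer.step()                   # update weights
        total_loss += loss.item()
    # Report the mean loss per batch so the value does not scale with dataset size
    print(f"Epoch {epoch+1}, Loss: {total_loss / len(train_loader):.4f}")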


In [None]:
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for images, labels in val_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f"Validation Accuracy: {100 * correct / total:.2f}%")
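
To make the accuracy computation concrete, the sketch below runs the same steps on a tiny hand-made batch. The logits and labels are hypothetical values chosen so the result can be checked by eye.

```python
import torch

# Hypothetical logits for 4 images and 3 classes, with their true labels.
outputs = torch.tensor([[2.0, 0.1, -1.0],
                        [0.3, 0.2, 1.5],
                        [0.0, 3.0, 0.5],
                        [1.0, 0.9, 0.8]])
labels = torch.tensor([0, 2, 0, 0])

# torch.max over dim 1 returns (max values, indices); the indices are the predictions.
_, predicted = torch.max(outputs, 1)
correct = (predicted == labels).sum().item()
accuracy = 100 * correct / labels.size(0)

print(predicted.tolist())  # [0, 2, 1, 0]
print(f"{accuracy:.2f}%")  # 75.00%
```

Three of the four predictions match their labels, which gives the 75% figure.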
