# Niche Image Classifier: Data Exploration and Transformation

This notebook builds the input pipeline (transformations, datasets, and DataLoaders) for a ResNet50 model that will be trained via transfer learning to classify images of rock, paper, and scissors.

**Key Steps:**
1.  **Define Transformations:** Create separate transformation pipelines for the training and validation datasets.
2.  **Load Datasets:** Use `torchvision.datasets.ImageFolder` to load the images.
3.  **Create DataLoaders:** Prepare the data for batching and training.

### 1. Import Libraries

In [None]:
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import os

### 2. Define Data Transformations

Here, we define two separate transformation pipelines:
- **Training (`train_transforms`):** Includes data augmentation (`RandomResizedCrop`, `RandomHorizontalFlip`) to improve model generalization.
- **Validation (`val_transforms`):** A deterministic pipeline with only resizing and center cropping, so that evaluation is repeatable and free of random augmentation.

Both pipelines output `224x224` images, convert them to tensors, and normalize each channel using the ImageNet mean and standard deviation.

In [None]:
# ImageNet mean and standard deviation (the statistics the pretrained ResNet50 weights were trained with)
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

# Define the transformations for the training and validation sets
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD)
])

val_transforms = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD)
])

print("Transformations defined successfully!")
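
To make the validation pipeline concrete, the sketch below traces its arithmetic in plain Python, without torchvision. It assumes that `Resize(256)` scales the shorter side to 256 while preserving the aspect ratio, that `CenterCrop(224)` takes a centered window, and that `Normalize` computes `(x - mean) / std` per channel. The helper names (`resized_size`, `center_crop_box`, `normalize_pixel`) and the 300x200 input size are hypothetical and used only for illustration.

```python
# Pure-Python trace of the validation pipeline's arithmetic (illustrative only).
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def resized_size(width, height, target=256):
    """Scale so the shorter side equals `target`, keeping the aspect ratio."""
    if width <= height:
        return target, int(target * height / width)
    return int(target * width / height), target


def center_crop_box(width, height, crop=224):
    """Return (left, top, right, bottom) of a centered crop-by-crop window."""
    left = int(round((width - crop) / 2.0))
    top = int(round((height - crop) / 2.0))
    return left, top, left + crop, top + crop


def normalize_pixel(rgb):
    """Apply (x - mean) / std per channel to a pixel already scaled to [0, 1]."""
    return [(x - m) / s for x, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)]


# A hypothetical 300x200 (width x height) input image.
new_w, new_h = resized_size(300, 200)
box = center_crop_box(new_w, new_h)
white = normalize_pixel([1.0, 1.0, 1.0])  # a pure-white pixel after ToTensor

print(new_w, new_h)
print(box)
print([round(v, 3) for v in white])
```

For the 300x200 example, the resized image is 384x256 and the crop window is `(80, 16, 304, 240)`, which is exactly 224 pixels on each side. The normalized values are no longer confined to `[0, 1]`, which is expected after subtracting the mean and dividing by the standard deviation.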

### 3. Load Datasets with `ImageFolder`

Now that the transformations are defined, we can load the datasets from their respective directories. We use `ImageFolder`, which automatically infers class labels from the folder structure.

In [18]:
# Define the data directories
# Using an absolute path to avoid issues with the current working directory
# (machine-specific: update DATA_DIR to match your environment)
DATA_DIR = "/home/kushsoni/Desktop/ai-engineer-roadmap-y1/06-The_ML_Core/01_TransferLearning_Classifier/data/processed"
TRAIN_DIR = os.path.join(DATA_DIR, "train")
VAL_DIR = os.path.join(DATA_DIR, "val")

# Load the datasets using ImageFolder
train_dataset = datasets.ImageFolder(TRAIN_DIR, transform=train_transforms)
val_dataset = datasets.ImageFolder(VAL_DIR, transform=val_transforms)

# Print the class names and the number of images in each dataset
print(f"Classes: {train_dataset.classes}")
print(f"Number of training images: {len(train_dataset)}")
print(f"Number of validation images: {len(val_dataset)}")

Classes: ['paper', 'rock', 'scissors']
Number of training images: 2518
Number of validation images: 35
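
The class order printed above is alphabetical, not the order in the project name. A minimal sketch of the label inference, assuming `ImageFolder` sorts the subdirectory names and assigns indices in that order, is shown below. The helper `find_classes` is a hypothetical stand-in written with the standard library, and the temporary directory holds empty class folders rather than the real dataset.

```python
import os
import tempfile


def find_classes(root):
    """Return the sorted subdirectory names and a name -> index mapping."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    return classes, class_to_idx


# Build a throwaway folder layout: one subdirectory per class.
with tempfile.TemporaryDirectory() as root:
    for name in ["rock", "paper", "scissors"]:
        os.makedirs(os.path.join(root, name))
    classes, class_to_idx = find_classes(root)

print(classes)       # ['paper', 'rock', 'scissors']
print(class_to_idx)  # {'paper': 0, 'rock': 1, 'scissors': 2}
```

On the real dataset, the same mapping is available as `train_dataset.class_to_idx`, which is useful when translating predicted indices back to class names.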


### 4. Create DataLoaders

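`DataLoader` wraps an iterable around the dataset to enable easy access to the samples. It handles batching and shuffling, and it can load data in parallel worker processes when `num_workers` is set above its default of 0. The training loader is shuffled so that each epoch sees the samples in a different order; the validation loader is not, since ordering does not affect evaluation.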

In [19]:
# Define the batch size
BATCH_SIZE = 32

# Create the DataLoaders
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False)

# Verify the number of batches
print(f"Number of batches in train_loader: {len(train_loader)}")
print(f"Number of batches in val_loader: {len(val_loader)}")

Number of batches in train_loader: 79
Number of batches in val_loader: 2
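
The batch counts above follow directly from the dataset sizes and `BATCH_SIZE`. The sketch below reproduces that arithmetic, assuming the default `drop_last=False`, in which case the final, smaller batch is kept. The helpers `num_batches` and `last_batch_size` are hypothetical names for illustration.

```python
import math

BATCH_SIZE = 32


def num_batches(num_samples, batch_size, drop_last=False):
    """Number of batches a loader yields per epoch."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


def last_batch_size(num_samples, batch_size):
    """Size of the final batch when the incomplete batch is kept."""
    remainder = num_samples % batch_size
    return remainder if remainder else batch_size


train_batches = num_batches(2518, BATCH_SIZE)
val_batches = num_batches(35, BATCH_SIZE)

print(train_batches, last_batch_size(2518, BATCH_SIZE))  # 79 22
print(val_batches, last_batch_size(35, BATCH_SIZE))      # 2 3
```

The last training batch holds only 22 images and the last validation batch only 3, so any per-batch metric should be weighted by batch size when averaged over an epoch.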
