# Part 6: Datasets & DataLoaders 📦

In the previous notebooks, we manually sliced our tensors (`X_train`, `y_train`).
For huge datasets (say, 1 TB of images), you can't load everything into memory at once.

PyTorch solves this with **Datasets** (how to get one item) and **DataLoaders** (how to batch them).
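
To make the contrast with manual slicing concrete, here is a minimal sketch. The tensors are small hypothetical stand-ins for the `X_train` and `y_train` of earlier notebooks, and `TensorDataset` is a built-in PyTorch helper that this notebook does not otherwise use. With `shuffle=False`, the `DataLoader` should yield the same batches as slicing by hand.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins for the tensors used in earlier notebooks
X_train = torch.arange(10, dtype=torch.float32).unsqueeze(1)  # shape (10, 1)
y_train = 2 * X_train

batch_size = 4

# Manual batching: slice the tensors ourselves
manual_batches = [
    (X_train[i:i + batch_size], y_train[i:i + batch_size])
    for i in range(0, len(X_train), batch_size)
]

# The same batching through a DataLoader (shuffle=False keeps the original order)
loader = DataLoader(TensorDataset(X_train, y_train), batch_size=batch_size, shuffle=False)
loader_batches = list(loader)

print(len(manual_batches), len(loader_batches))  # 3 3
print(loader_batches[-1][0].shape)               # torch.Size([2, 1]): the last batch is partial
```

The manual version only works because both tensors are already in memory. The `DataLoader` version keeps working when the dataset fetches items from somewhere else.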

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import numpy as np

## 1. Custom Dataset Class

To use PyTorch's data tools, we just need to create a class that inherits from `Dataset` and implements:
1. `__len__`: How many items are there?
2. `__getitem__`: Get the i-th item.
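
Because only these two methods are required, `__getitem__` is free to fetch data on demand rather than from memory. The sketch below illustrates that idea under stated assumptions: `LazyFileDataset` and the file layout (one small `.pt` file per item, each holding a dict with keys `x` and `y`) are hypothetical and chosen only for this example.

```python
import os
import tempfile

import torch
from torch.utils.data import Dataset


class LazyFileDataset(Dataset):
    """Hypothetical dataset that keeps only file paths in memory."""

    def __init__(self, paths):
        self.paths = list(paths)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # The file is read only when this item is requested
        item = torch.load(self.paths[idx])
        return item["x"], item["y"]


# Write a few tiny items to a temporary directory to stand in for a large on-disk dataset
tmp_dir = tempfile.mkdtemp()
paths = []
for i in range(5):
    x = torch.tensor([float(i)])
    path = os.path.join(tmp_dir, f"item_{i}.pt")
    torch.save({"x": x, "y": x ** 2 + 1}, path)
    paths.append(path)

lazy_dataset = LazyFileDataset(paths)
print(len(lazy_dataset))  # 5
print(lazy_dataset[3])    # (tensor([3.]), tensor([10.]))
```

Only the list of paths lives in memory. Each item is read from disk when it is indexed, which is the pattern that scales to datasets larger than RAM.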

In [None]:
class MyCustomDataset(Dataset):
    def __init__(self, size=1000):
        # Generate fake data on init: x in [0, 10), y = x^2 + 1 plus Gaussian noise
        self.x = torch.rand(size, 1) * 10
        self.y = self.x ** 2 + 1 + torch.randn(size, 1) * 2
        
    def __len__(self):
        return len(self.x)
    
    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

# Instantiate
dataset = MyCustomDataset(size=500)
print(f"Dataset size: {len(dataset)}")
print(f"Item 0: {dataset[0]}")

## 2. Using DataLoader

`DataLoader` takes a Dataset and gives you an iterator that handles:
- Batching (e.g., 32 items at a time)
- Shuffling (random order)
- Parallel loading (`num_workers`)
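
As a quick check of the batching arithmetic, the sketch below uses 500 dummy items (the dataset size used in this notebook) and a batch size of 32. `TensorDataset` is a built-in PyTorch helper used here only to keep the example self-contained, and `drop_last` is a `DataLoader` option not otherwise covered in this notebook.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# 500 dummy items, matching the dataset size used in this notebook
data = TensorDataset(torch.arange(500, dtype=torch.float32).unsqueeze(1))

loader = DataLoader(data, batch_size=32, shuffle=True)
batch_sizes = [batch[0].shape[0] for batch in loader]

print(len(loader))                 # 16 batches per epoch
print(math.ceil(len(data) / 32))   # 16, the same number computed by hand
print(batch_sizes[-1])             # 20: the final batch is partial (500 - 15 * 32)

# drop_last=True discards the partial batch
loader_drop = DataLoader(data, batch_size=32, shuffle=True, drop_last=True)
print(len(loader_drop))            # 15
```

`num_workers` is left at its default of 0 here, which loads batches in the main process. Larger values start worker processes that load batches in parallel.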

In [None]:
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

# Iterate through one batch
for X_batch, y_batch in dataloader:
    print(f"Batch Shape X: {X_batch.shape}")
    print(f"Batch Shape y: {y_batch.shape}")
    break

## 3. Training Loop with DataLoader

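Real training loops typically have this shape: an outer loop over epochs and an inner loop over the batches produced by the `DataLoader`.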

In [None]:
# Setup Model
model = nn.Sequential(
    nn.Linear(1, 20), 
    nn.ReLU(), 
    nn.Linear(20, 1)
)
optimizer = optim.Adam(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

# Loop
epochs = 5

for epoch in range(epochs):
    total_loss = 0
    
    for X_batch, y_batch in dataloader:
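        # The standard 5 steps:
        # 1. Forward pass
        predictions = model(X_batch)
        # 2. Compute the loss
        loss = criterion(predictions, y_batch)
        
        # 3. Clear gradients left over from the previous batch
        optimizer.zero_grad()
        # 4. Backward pass: compute gradients
        loss.backward()
        # 5. Update the weights
        optimizer.step()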
        
        total_loss += loss.item()
    
    # Mean of the per-batch losses (a smaller final batch counts the same as a full one)
    avg_loss = total_loss / len(dataloader)
    print(f"Epoch {epoch+1}: Avg Loss = {avg_loss:.4f}")

## 🧠 Summary

1. **`Dataset`**: Defines HOW to get a single item.
2. **`DataLoader`**: Defines HOW to batch and shuffle items.
3. **Training Loop**: Iterates over the `DataLoader` instead of raw tensors.

Next up: **CNNs** - Finally, we work with images!