# 🖼 Level 4: Working with Data

In this level, you’ll learn how to handle data in PyTorch, whether it’s images, text, or your own custom datasets.  
PyTorch provides two core tools, **Datasets** and **DataLoaders**, which separate how a single sample is accessed from how samples are batched and shuffled, so the same training loop works for any data source.

---

## ✔️ Datasets & DataLoaders

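- A **Dataset** stores your samples: it returns one sample (for example an `(input, target)` pair) for a given index via `__getitem__`, and reports its size via `__len__`.
- A **DataLoader** wraps a Dataset and handles batching, shuffling, and parallel loading (using worker processes when `num_workers > 0`).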
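
To make the batching concrete: with N samples and `batch_size=B`, a DataLoader yields ⌈N / B⌉ batches per pass, and the final batch is smaller when B does not divide N, unless `drop_last=True` discards it. A minimal sketch on 100 random samples (the sizes are arbitrary, chosen only to show a partial batch):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 100 samples with one feature each (arbitrary sizes for illustration)
X = torch.randn(100, 1)
y = 3 * X + 1
dataset = TensorDataset(X, y)

# A Dataset returns one sample per index: here a tuple (X[i], y[i])
x0, y0 = dataset[0]
print(len(dataset), x0.shape)  # 100 torch.Size([1])

# batch_size=32 does not divide 100, so the last batch holds the remainder
loader = DataLoader(dataset, batch_size=32)
batch_sizes = [xb.shape[0] for xb, _ in loader]
print(len(loader), batch_sizes)  # 4 [32, 32, 32, 4]

# drop_last=True discards the incomplete final batch
loader_dropped = DataLoader(dataset, batch_size=32, drop_last=True)
print(len(loader_dropped))  # 3
```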

### Example using TensorDataset:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Dummy data
X = torch.randn(100, 1)
y = 3 * X + 1

# Create Dataset and DataLoader
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=10, shuffle=True)

# Iterate through batches
for batch_X, batch_y in loader:
    print(batch_X.shape, batch_y.shape)
```

Output:

```
torch.Size([10, 1]) torch.Size([10, 1])
torch.Size([10, 1]) torch.Size([10, 1])
torch.Size([10, 1]) torch.Size([10, 1])
torch.Size([10, 1]) torch.Size([10, 1])
torch.Size([10, 1]) torch.Size([10, 1])
torch.Size([10, 1]) torch.Size([10, 1])
torch.Size([10, 1]) torch.Size([10, 1])
torch.Size([10, 1]) torch.Size([10, 1])
torch.Size([10, 1]) torch.Size([10, 1])
torch.Size([10, 1]) torch.Size([10, 1])
```

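✅ With 100 samples and `batch_size=10`, the loader yields 10 batches per pass over the data (one epoch), and `shuffle=True` draws a new random order every epoch. This makes it easy to train models on mini-batches.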
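
A minimal sketch of training on these mini-batches, fitting the same y = 3x + 1 data with a single `nn.Linear` layer. The optimizer (plain SGD), learning rate, and epoch count are arbitrary choices for illustration, not part of the original example; the model takes one gradient step per batch.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)  # fixed seed so the run is reproducible

# Same synthetic data as above: y = 3x + 1
X = torch.randn(100, 1)
y = 3 * X + 1
loader = DataLoader(TensorDataset(X, y), batch_size=10, shuffle=True)

model = nn.Linear(1, 1)  # learns a weight (slope) and a bias (intercept)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # lr chosen arbitrarily

for epoch in range(50):
    for batch_X, batch_y in loader:  # 10 shuffled batches of 10 samples per epoch
        optimizer.zero_grad()
        loss = loss_fn(model(batch_X), batch_y)
        loss.backward()
        optimizer.step()

print(model.weight.item(), model.bias.item())
```

Because the data has no noise, every batch is fit exactly by slope 3 and intercept 1, so the printed values should end up very close to those.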

## ✔️ torchvision & torchtext (for Images and Text)
- PyTorch has companion libraries for images and text:

    - torchvision → image datasets, transforms, and pre-trained models.

    - torchtext → text datasets, tokenization, and embeddings (note that torchtext is no longer actively developed; its 0.18 release in 2024 was announced as the last).
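
Text batches need extra care because sentences have different lengths, and the DataLoader's default collation can only stack tensors of equal shape. As a sketch that does not use torchtext (it swaps in plain Python whitespace tokenization and `torch.nn.utils.rnn.pad_sequence`), a custom `collate_fn` can pad each batch to its longest sequence. The toy corpus, the vocabulary scheme, and padding index 0 are assumptions for illustration.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# Hypothetical toy corpus; real pipelines would use a proper tokenizer
sentences = ["the cat sat", "a dog", "the dog sat on the mat"]

# Build a vocabulary from whitespace tokens; index 0 is reserved for padding
vocab = {"<pad>": 0}
for sentence in sentences:
    for token in sentence.split():
        vocab.setdefault(token, len(vocab))

# Encode each sentence as a 1-D tensor of token ids (lengths differ)
encoded = [torch.tensor([vocab[t] for t in s.split()]) for s in sentences]

def pad_collate(batch):
    """Pad a list of 1-D id tensors to the longest one in the batch."""
    lengths = torch.tensor([len(seq) for seq in batch])
    padded = pad_sequence(batch, batch_first=True, padding_value=vocab["<pad>"])
    return padded, lengths

# A plain Python list works as a map-style dataset (it has __getitem__ and __len__)
loader = DataLoader(encoded, batch_size=3, collate_fn=pad_collate)
padded, lengths = next(iter(loader))
print(padded.shape, lengths.tolist())  # torch.Size([3, 6]) [3, 2, 6]
```

Returning the original lengths alongside the padded batch lets a model ignore the padding positions later.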

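### Example: Loading Images with torchvision

```python
import torch
from torchvision import datasets, transforms

# ToTensor converts each PIL image to a float tensor of shape (C, H, W) with values in [0, 1]
transform = transforms.ToTensor()

# FakeData generates random images and labels (1000 samples of size 3x224x224 by default)
train_dataset = datasets.FakeData(transform=transform)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=16, shuffle=True)
```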


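✅ `FakeData` is handy for testing a pipeline without downloading anything. For real data, torchvision provides built-in datasets such as MNIST, CIFAR-10, and ImageNet (MNIST and CIFAR-10 can be fetched with `download=True`; ImageNet requires obtaining the archives manually).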

## ✔️ Custom Datasets
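- When your data doesn’t fit a standard format, create a custom Dataset by subclassing `torch.utils.data.Dataset` and implementing `__len__` (the number of samples) and `__getitem__` (return the sample at a given index).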

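### Example: Custom Dataset

```python
import torch
from torch.utils.data import DataLoader, Dataset

class MyDataset(Dataset):
    def __init__(self, X, y):
        # Store inputs and targets; both must have the same first dimension
        self.X = X
        self.y = y

    def __len__(self):
        # Number of samples
        return len(self.X)

    def __getitem__(self, idx):
        # Return the (input, target) pair at position idx
        return self.X[idx], self.y[idx]

# Usage, with the same dummy data as the TensorDataset example:
X = torch.randn(100, 1)
y = 3 * X + 1
dataset = MyDataset(X, y)
loader = DataLoader(dataset, batch_size=5)
```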

✅ Because `__getitem__` can run arbitrary Python code (reading a file, parsing a row, slicing a window), this pattern handles non-standard data such as time series, tabular data, or custom file formats.
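
As an example of a non-standard dataset, time-series forecasting often turns one long sequence into (window, next value) pairs. A minimal sketch, assuming a 1-D series and a hypothetical `SlidingWindowDataset` class (not part of PyTorch): `__getitem__` slices each window on demand instead of materializing all of them, and `__len__` counts how many full windows fit.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class SlidingWindowDataset(Dataset):
    """Hypothetical dataset: sample i is (series[i:i+window], series[i+window])."""

    def __init__(self, series, window):
        self.series = series
        self.window = window

    def __len__(self):
        # Each sample needs `window` inputs plus one target value
        return len(self.series) - self.window

    def __getitem__(self, idx):
        x = self.series[idx : idx + self.window]
        target = self.series[idx + self.window]
        return x, target

series = torch.arange(10, dtype=torch.float32)  # toy series 0, 1, ..., 9
dataset = SlidingWindowDataset(series, window=3)

print(len(dataset))  # 7
print(dataset[0])    # (tensor([0., 1., 2.]), tensor(3.))

loader = DataLoader(dataset, batch_size=4)
for xb, yb in loader:
    print(xb.shape, yb.shape)
# torch.Size([4, 3]) torch.Size([4])
# torch.Size([3, 3]) torch.Size([3])
```

Note that no shuffling is used here; for forecasting, shuffling windows within the training split is usually fine, but the train/test split itself should respect time order.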

## ✅ Summary Table

| Concept                | PyTorch Example                                         |
|------------------------|--------------------------------------------------------|
| Datasets & DataLoaders  | `TensorDataset`, `DataLoader`                           |
| torchvision / torchtext | `datasets.MNIST`, `datasets.CIFAR10`, text pipelines   |
| Custom Dataset          | Subclass `Dataset` and implement `__len__` + `__getitem__` |
