<H4>What is a DataLoader?</H4>

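- Takes a Dataset as input
- Groups samples into batches (batch_size)
- Optionally shuffles the sample order (shuffle=True)
- Can load samples in parallel worker processes (num_workers)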
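
To make the list above concrete, here is a minimal hand-written sketch of batching and shuffling without DataLoader. The function name manual_batches and the toy tensors are assumptions for illustration only; the real DataLoader does considerably more (collation, samplers, worker processes).

```python
import torch

# Hypothetical toy data, for illustration only
x = torch.tensor([1., 2., 3., 4., 5.])
y = 2 * x

def manual_batches(x, y, batch_size, shuffle=True):
    """Yield (x, y) batches, roughly mimicking what a DataLoader produces."""
    n = len(x)
    # Pick a random visiting order, or keep the original order
    order = torch.randperm(n) if shuffle else torch.arange(n)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]

batches = list(manual_batches(x, y, batch_size=2))
for bx, by in batches:
    print(bx, by)
```

Each sample appears exactly once per pass, and the last batch is smaller when the dataset size is not a multiple of the batch size.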


In [1]:
import torch
from torch.utils.data import Dataset

class SimpleDataset(Dataset):
    """Toy dataset of five (x, y) pairs where y = 2 * x."""

    def __init__(self):
        self.x = torch.tensor([1, 2, 3, 4, 5], dtype=torch.float32)
        self.y = torch.tensor([2, 4, 6, 8, 10], dtype=torch.float32)

    def __len__(self):
        # Number of samples in the dataset
        return len(self.x)

    def __getitem__(self, index):
        # Return one (input, target) pair
        return self.x[index], self.y[index]



<h3> Creating a DataLoader </h3>

In [2]:
from torch.utils.data import DataLoader
dataset = SimpleDataset()

loader = DataLoader(dataset, batch_size=2, shuffle=True)

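Iterating over the loader now yields the data shuffled and grouped into batches of two. With five samples, the final batch holds the single leftover sample.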

In [3]:
for batchX, batchY in loader:
    print(batchX, batchY)

tensor([1., 4.]) tensor([2., 8.])
tensor([2., 5.]) tensor([ 4., 10.])
tensor([3.]) tensor([6.])
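
The single-element last batch in the output above comes from 5 not being divisible by 2. A short sketch of how to inspect the batch count and, if desired, drop the partial batch with drop_last=True. TensorDataset is used here only as a compact stand-in for SimpleDataset; that swap is an assumption made to keep the example self-contained.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for SimpleDataset: same five (x, y) pairs
x = torch.tensor([1., 2., 3., 4., 5.])
y = 2 * x
dataset = TensorDataset(x, y)

full = DataLoader(dataset, batch_size=2, shuffle=True)
trimmed = DataLoader(dataset, batch_size=2, shuffle=True, drop_last=True)

print(len(full))     # number of batches, including the partial one
print(len(trimmed))  # number of batches once the partial one is dropped

# Every batch from the trimmed loader has the full batch size
sizes = [len(bx) for bx, _ in trimmed]
print(sizes)
```

Dropping the last batch keeps batch sizes uniform, at the cost of skipping one sample per epoch on this dataset.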


In [13]:
for batchX, batchY in loader:
    print(batchX.view(-1, 1))
    print(batchY.view(-1, 1))

tensor([[5.],
        [4.]])
tensor([[10.],
        [ 8.]])
tensor([[1.],
        [3.]])
tensor([[2.],
        [6.]])
tensor([[2.]])
tensor([[4.]])


<h3> Training Loop </h3>

In [4]:
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(1, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

In [14]:
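for epoch in range(30):

    for batchX, batchY in loader:
        # Reshape (N,) -> (N, 1): one row per sample, one feature per row
        batchX = batchX.view(-1, 1)
        batchY = batchY.view(-1, 1)

        # Forward pass and loss for this batch
        y_pred = model(batchX)
        loss = criterion(y_pred, batchY)

        # Clear old gradients, backpropagate, update the weights
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Note: this is the loss of the last batch only, not the epoch average
    print(f"Epoch {epoch}: Loss {loss.item()}")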

Epoch 0: Loss 35.20134353637695
Epoch 1: Loss 1.348494052886963
Epoch 2: Loss 0.2828003764152527
Epoch 3: Loss 0.09064068645238876
Epoch 4: Loss 0.1604979932308197
Epoch 5: Loss 0.017627129331231117
Epoch 6: Loss 0.043362122029066086
Epoch 7: Loss 0.05831851810216904
Epoch 8: Loss 0.008720354177057743
Epoch 9: Loss 0.12110701948404312
Epoch 10: Loss 0.04412495344877243
Epoch 11: Loss 0.11093009263277054
Epoch 12: Loss 0.04143565148115158
Epoch 13: Loss 0.03907747566699982
Epoch 14: Loss 0.006892671808600426
Epoch 15: Loss 0.042529404163360596
Epoch 16: Loss 0.005525493994355202
Epoch 17: Loss 0.038001082837581635
Epoch 18: Loss 0.03674193471670151
Epoch 19: Loss 0.005699891597032547
Epoch 20: Loss 0.03603934496641159
Epoch 21: Loss 0.09115283936262131
Epoch 22: Loss 0.04793546348810196
Epoch 23: Loss 0.04273330420255661
Epoch 24: Loss 0.005918231792747974
Epoch 25: Loss 0.0844867154955864
Epoch 26: Loss 0.003599134972319007
Epoch 27: Loss 0.08022372424602509
Epoch 28: Loss 0.0783220976
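
The printed values jump around because each one is the loss of whichever batch happened to come last in that epoch, and shuffling changes which batch that is. A possible variant, sketched under assumptions, reports the sample-weighted mean loss over the whole epoch instead. TensorDataset stands in for SimpleDataset, and the seed and the epoch_losses list are illustrative additions, not part of the original notebook.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable

# Same toy data as above, already shaped (N, 1)
x = torch.tensor([1., 2., 3., 4., 5.]).view(-1, 1)
y = 2 * x
loader = DataLoader(TensorDataset(x, y), batch_size=2, shuffle=True)

model = nn.Linear(1, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

epoch_losses = []
for epoch in range(30):
    total, count = 0.0, 0
    for batchX, batchY in loader:
        loss = criterion(model(batchX), batchY)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Weight each batch loss by its size so the partial batch counts fairly
        total += loss.item() * len(batchX)
        count += len(batchX)
    epoch_losses.append(total / count)
    print(f"Epoch {epoch}: mean loss {epoch_losses[-1]:.4f}")
```

The mean is usually a smoother signal than the last-batch loss, though it still mixes losses computed before and after each weight update within the epoch.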

<h4>Notes</h4>

Dataset defines how to access individual data samples, while DataLoader handles batching and shuffling for efficient training.

A Dataset returns ONE sample per index.

A DataLoader returns MANY samples (a batch) per iteration.
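
The difference between one sample and one batch shows up directly in the tensor shapes. A brief sketch that repeats the SimpleDataset definition so it runs on its own; shuffle is left off here so the first batch is predictable.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SimpleDataset(Dataset):
    def __init__(self):
        self.x = torch.tensor([1, 2, 3, 4, 5], dtype=torch.float32)
        self.y = torch.tensor([2, 4, 6, 8, 10], dtype=torch.float32)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, index):
        return self.x[index], self.y[index]

dataset = SimpleDataset()

# Indexing the Dataset gives a single (x, y) pair of scalar tensors
sample_x, sample_y = dataset[0]

# Iterating the DataLoader gives stacked tensors, one entry per sample
batch_x, batch_y = next(iter(DataLoader(dataset, batch_size=2)))

print(sample_x.shape, batch_x.shape)
```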

The training loop iterates over batches.

Epoch = one full pass over the dataset.

With shuffle=True, the sample order is reshuffled at the start of every epoch.
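
To see the per-epoch reshuffling, one can record the order in which samples arrive over a few passes. This is a sketch under assumptions: TensorDataset stands in for SimpleDataset, and the generator seed is an arbitrary choice that only makes the sequence of shuffles repeatable across runs.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

x = torch.arange(1., 6.)                # 1., 2., 3., 4., 5.
dataset = TensorDataset(x, 2 * x)

g = torch.Generator().manual_seed(0)    # arbitrary seed for repeatable shuffles
loader = DataLoader(dataset, batch_size=2, shuffle=True, generator=g)

epochs = []
for epoch in range(3):
    # Flatten the batches back into the order the samples were visited
    order = [v.item() for bx, _ in loader for v in bx]
    epochs.append(order)
    print(f"Epoch {epoch}: {order}")
```

The orders will typically differ from one epoch to the next, while each epoch still visits every sample exactly once.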
