- Epoch -> One forward and backward pass over ALL training samples
  >
- batch_size -> Number of training samples used in one forward & backward pass
  >
- number of iterations -> Number of passes per epoch, each pass using [batch_size] samples

- eg: 100 samples, batch_size=20 --> 100/20=5 iterations for 1 epoch
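
The arithmetic above as a minimal sketch. `iterations_per_epoch` is a hypothetical helper name, and rounding up assumes the final, smaller batch is still used when the dataset size is not a multiple of the batch size.

```python
import math

def iterations_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of iterations (weight updates) needed to see every sample once."""
    # Round up so a final, smaller batch still counts as one iteration.
    return math.ceil(n_samples / batch_size)

print(iterations_per_epoch(100, 20))  # 5
print(iterations_per_epoch(120, 16))  # 8 (7 full batches + 1 batch of 8 samples)
```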

***The main use of the DataLoader is to enable mini-batch training***
- It automatically splits the data into batches
- It shuffles the data (when `shuffle=True`)
- It loads data in parallel using worker processes (`num_workers`)
- It loads samples batch by batch, so a Dataset that reads from disk can handle data that doesn't fit in RAM
>
So while it's not mathematically required, it's a standard tool that saves you from writing batch-splitting code and can speed up training.
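
A small sketch of the splitting and shuffling described above, assuming PyTorch is installed. The random toy tensors (100 samples, 4 features, 3 classes) are made up purely for illustration.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Toy data (assumption for illustration): 100 samples, 4 features, 3 classes
X = torch.randn(100, 4)
y = torch.randint(0, 3, (100,))

dataset = TensorDataset(X, y)
# num_workers=0 loads in the main process; a higher value loads batches in parallel
loader = DataLoader(dataset, batch_size=20, shuffle=True, num_workers=0)

print(len(loader))  # 5 batches per epoch (100 / 20)
for x_batch, y_batch in loader:
    print(x_batch.shape, y_batch.shape)  # torch.Size([20, 4]) torch.Size([20])
```

Each pass over `loader` is one epoch, and with `shuffle=True` the samples are drawn in a new order every time.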

### Mini Batch Gradient Descent, Batch GD and Stochastic GD

- Mini-batch gradient descent = Updates the weights once per mini-batch, i.e. (dataset size / batch size) times per epoch (Stable, the usual default)
- Batch gradient descent = Updates the weights only once per epoch, using all samples (Slow)
- Stochastic gradient descent = Updates the weights after every single sample (Fast updates, Unstable)


#### Mini Batch 
**Eg:-**
- Dataset size = 100
- Batch size = 20
- So 100/20 = 5 mini batches
>
- Take the 1st 20 samples - forward pass - loss calc - backward pass (gradients) - update weights
- Take the 2nd 20 samples - forward pass - loss calc - backward pass (gradients) - update weights
- Repeat until all 5 mini batches are done
- 1 epoch ends, weights have been updated 5 times

***Full-batch GD and SGD work the same way; only the batch size changes (100 and 1 in this example)***
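
The same loop can be sketched in plain Python, where only `batch_size` distinguishes the three variants. The one-weight model `y_pred = w * x`, the toy data and the helper name `train_one_epoch` are assumptions chosen to keep the example short.

```python
import random

def train_one_epoch(xs, ys, w, batch_size, lr=0.01):
    """One epoch of gradient descent on the model y_pred = w * x with MSE loss.

    Returns the updated weight and the number of weight updates performed.
    """
    indices = list(range(len(xs)))
    random.shuffle(indices)  # new sample order every epoch
    n_updates = 0
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        # forward pass + gradient of the mean squared error w.r.t. w
        grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
        w -= lr * grad  # update weights
        n_updates += 1
    return w, n_updates

# Toy dataset (assumption): 100 samples of y = 2x
xs = [i / 100 for i in range(100)]
ys = [2 * x for x in xs]

for name, batch_size in [("Full-batch GD", 100), ("Mini-batch GD", 20), ("SGD", 1)]:
    _, n_updates = train_one_epoch(xs, ys, w=0.0, batch_size=batch_size)
    print(f"{name}: batch_size={batch_size}, updates per epoch={n_updates}")
```

With 100 samples this gives 1, 5 and 100 updates per epoch respectively.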


### OPTIMIZERS
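- SGD: plain gradient step, w = w - lr * gradient.
- RMSprop: divides each gradient by a running average of its recent magnitudes, so every weight gets its own effective step size.
- Adam: combines momentum (a running average of the gradients) with RMSprop-style scaling.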
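
To make the difference concrete, here is a rough sketch of the three update rules in plain Python, minimising the toy loss f(w) = w². The hyperparameter values are common defaults chosen for illustration, and the functions are simplified textbook versions, not the `torch.optim` implementations.

```python
import math

def grad(w):
    """Gradient of the toy loss f(w) = w**2."""
    return 2 * w

def sgd(w, steps, lr=0.1):
    for _ in range(steps):
        w -= lr * grad(w)  # plain gradient step
    return w

def rmsprop(w, steps, lr=0.1, alpha=0.9, eps=1e-8):
    v = 0.0  # running average of squared gradients
    for _ in range(steps):
        g = grad(w)
        v = alpha * v + (1 - alpha) * g * g
        w -= lr * g / (math.sqrt(v) + eps)  # step scaled by recent magnitude
    return w

def adam(w, steps, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    m, v = 0.0, 0.0  # running averages of the gradient and the squared gradient
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g      # momentum part
        v = beta2 * v + (1 - beta2) * g * g  # RMSprop part
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialisation
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

for name, optimizer_fn in [("SGD", sgd), ("RMSprop", rmsprop), ("Adam", adam)]:
    print(f"{name}: w after 500 steps = {optimizer_fn(5.0, steps=500):.4f}")
```

All three start at w = 5 and move towards the minimum at w = 0; they differ only in how the raw gradient is turned into a step.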



**STEP BY STEP TRAINING PROCESS**
- Forward pass - (on your chosen batch - 1 sample, mini batch or full batch)
  - predict y_pred
- Loss calculation - (difference between y_pred and y_true)
- Backward pass
  - Compute gradients of the loss with respect to the weights
  - The gradient is computed over the samples in the batch, so it depends on the batch size (smaller batches give noisier gradients)
- Optimizer step (SGD, Adam, etc)
  - Takes the gradient and updates the weights, each optimizer in its own way
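
A short sketch of how the gradient depends on the batch, assuming a mean-reduced loss (the default for losses such as `nn.CrossEntropyLoss`). The one-weight model, the toy data and the helper name `mse_gradient` are assumptions for illustration.

```python
def mse_gradient(xs, ys, w):
    """Gradient of the mean squared error of y_pred = w * x with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

# Toy dataset (assumption): 100 samples of y = 3x
xs = [i / 10 for i in range(100)]
ys = [3 * x for x in xs]
w = 0.5  # current weight

full_grad = mse_gradient(xs, ys, w)  # gradient from all 100 samples

batch_size = 20
batch_grads = [
    mse_gradient(xs[i:i + batch_size], ys[i:i + batch_size], w)
    for i in range(0, len(xs), batch_size)
]

print("full-batch gradient:", round(full_grad, 3))
print("mini-batch gradients:", [round(g, 3) for g in batch_grads])
print("mean of mini-batch gradients:", round(sum(batch_grads) / len(batch_grads), 3))
```

Each mini-batch gives a different estimate of the gradient, and for equal-sized batches their mean equals the full-batch gradient. The batches here are taken in order without shuffling, which exaggerates the spread between them.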
 
  

In [None]:
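# Without DataLoader eg
import numpy as np

# Assumption: wine.csv is comma-separated, has a header row,
# and stores the class label in the first column
data = np.loadtxt('wine.csv', delimiter=',', skiprows=1, dtype=np.float32)
# training loop
for epoch in range(1000):
    x, y = data[:, 1:], data[:, 0]
    # forward + backward + weight update on the WHOLE dataset at once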

In [None]:
# With dataloader eg

# training loop
for epoch in range(1000):
    # loop over all batches
    for i in range(total_batches):
        x_batch, y_batch = ...

# --> use Dataset and DataLoader to load wine.csv

In [1]:
# Pytorch with Dataloader

import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split


# Load dataset
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert to tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.long)
y_test = torch.tensor(y_test, dtype=torch.long)

# Create dataset & dataloader
train_dataset = TensorDataset(X_train,y_train)
test_dataset = TensorDataset(X_test, y_test)

train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True, num_workers=2)  
test_loader = DataLoader(test_dataset, batch_size=16)
# num_workers -> number of worker subprocesses that load batches in parallel (multiprocessing, not multithreading)


# Define Model 
class IrisNet(nn.Module):
    def __init__(self):
        super(IrisNet, self).__init__()
        self.fc1 = nn.Linear(4, 100)
        self.fc2 = nn.Linear(100, 3)

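    def forward(self, x):
        x = torch.relu(self.fc1(x))
        # Return raw logits: nn.CrossEntropyLoss applies log-softmax internally,
        # so adding softmax here would apply it twice and slow down learning
        return self.fc2(x)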

model = IrisNet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Training loop
for epoch in range(10):
    for batch_X, batch_y in train_loader:
        # Forward pass - prediction and loss
        y_pred = model(batch_X)
        loss = criterion(y_pred, batch_y)

        # backward pass - gradient
        optimizer.zero_grad()
        loss.backward()

        # update weights 
        optimizer.step()
        
    # Loss of the last mini-batch in this epoch (noisy, so it may not decrease every epoch)
    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")


# Evaluation sketch: iterate over test_loader and take the most likely class
# correct = 0
# with torch.no_grad():
#     for batch_X, batch_y in test_loader:
#         y_predicted = model(batch_X).argmax(dim=1)
#         correct += (y_predicted == batch_y).sum().item()
# acc = correct / len(test_dataset)
# print(f'Accuracy score: {acc * 100:.3f}')

Epoch 1, Loss: 0.8270
Epoch 2, Loss: 0.5606
Epoch 3, Loss: 0.8279
Epoch 4, Loss: 0.8281
Epoch 5, Loss: 0.6240
Epoch 6, Loss: 0.7933
Epoch 7, Loss: 0.6436
Epoch 8, Loss: 0.7201
Epoch 9, Loss: 0.6077
Epoch 10, Loss: 0.6002


In [18]:
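# PyTorch without DataLoader (manual batching)


import numpy as np

# Same X_train, y_train, X_test, y_test from before.
# They are already tensors, so reuse them instead of calling torch.tensor() again
# (which raises a UserWarning about copy-constructing a tensor from a tensor).
X_train_tensor = X_train.float()
y_train_tensor = y_train.long()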

batch_size = 16
n_batches = int(np.ceil(len(X_train_tensor) / batch_size))

model = IrisNet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Manual batching loop
for epoch in range(10):
    permutation = torch.randperm(X_train_tensor.size(0))
    for i in range(n_batches):
        idx = permutation[i*batch_size:(i+1)*batch_size]
        batch_X, batch_y = X_train_tensor[idx], y_train_tensor[idx]

        outputs = model(batch_X)
        loss = criterion(outputs, batch_y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")


Epoch 1, Loss: 1.0834
Epoch 2, Loss: 0.9552
Epoch 3, Loss: 0.9585
Epoch 4, Loss: 0.7964
Epoch 5, Loss: 0.7861
Epoch 6, Loss: 0.7293
Epoch 7, Loss: 0.7380
Epoch 8, Loss: 0.7130
Epoch 9, Loss: 0.7749
Epoch 10, Loss: 0.7118




In [20]:
# # USING TENSORFLOW

# import numpy as np
# from sklearn.datasets import load_iris
# from sklearn.model_selection import train_test_split
# from tensorflow.keras.models import Sequential
# from tensorflow.keras.layers import Dense
# from tensorflow.keras.utils import to_categorical

# # Load dataset
# iris = load_iris()
# X, y = iris.data, iris.target
# y = to_categorical(y, num_classes=3)  # one-hot encode

# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# # Build model
# model = Sequential([
#     Dense(10, activation='relu', input_shape=(4,)),
#     Dense(3, activation='softmax')
# ])

# model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# # Training with mini-batch size 16
# model.fit(X_train, y_train, batch_size=16, epochs=20, validation_data=(X_test, y_test))
