# Training Parameters: Epochs, Batch Size, Iterations, Learning Rate

---

## 1. Epoch
- **Definition:** One epoch means **one complete pass** of the **entire training dataset** through the model.  
- Example: If you have 10,000 training samples and you pass all of them once → 1 epoch.  
- **Effect:**  
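  - More epochs → the model gets more passes to fit the training data, so training loss usually keeps falling (up to a limit).  
  - Too many epochs → risk of **overfitting**: training loss keeps improving while validation loss starts to get worse.  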

---

## 2. Batch Size
- **Definition:** Number of samples processed **before the weights are updated** once.  
- If dataset = 10,000 samples and batch size = 100 → each epoch has 10,000 / 100 = 100 weight updates.  
- **Effect:**  
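  - Small batch size → noisier gradient estimates, which often helps generalization, but more updates per epoch and slower training.  
  - Large batch size → smoother gradient estimates and better GPU throughput, but more memory per step and a risk of poorer generalization.  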
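
To make the batch-size definition concrete, here is a minimal sketch of how a dataset might be cut into mini-batches. The helper name `make_batches` is hypothetical and only illustrates the idea; real frameworks usually also shuffle the samples first.

```python
def make_batches(num_samples, batch_size):
    """Return (start, end) index ranges, one range per weight update."""
    return [
        (start, min(start + batch_size, num_samples))
        for start in range(0, num_samples, batch_size)
    ]


batches = make_batches(10_000, 100)
print(len(batches))             # 100 updates per epoch
print(batches[0], batches[-1])  # (0, 100) (9900, 10000)
```

Each range corresponds to one weight update, so the length of the list equals the number of updates in one epoch.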

---

## 3. Iteration
- **Definition:** One iteration = **one weight update step**.  
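- Relationship (rounded up when the dataset size is not an exact multiple of the batch size and the final, smaller batch is kept):  
  $$
  \text{Iterations per Epoch} = \left\lceil \frac{\text{Number of Training Samples}}{\text{Batch Size}} \right\rceil
  $$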
- Example: Dataset = 10,000 samples, Batch Size = 100 → 100 iterations per epoch.  
- If trained for 50 epochs → Total iterations = $50 \times 100 = 5000$.  
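
The arithmetic above can be checked with a short sketch. The function names and the `drop_last` flag are assumptions made for illustration: the flag models the common choice between keeping the final partial batch (round up) and discarding it (round down).

```python
import math


def iterations_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of weight updates in one pass over the dataset."""
    if drop_last:
        # The final partial batch is discarded.
        return num_samples // batch_size
    # The final partial batch still triggers one update.
    return math.ceil(num_samples / batch_size)


def total_iterations(num_samples, batch_size, epochs, drop_last=False):
    """Total weight updates over the whole training run."""
    return epochs * iterations_per_epoch(num_samples, batch_size, drop_last)


print(iterations_per_epoch(10_000, 100))   # 100
print(total_iterations(10_000, 100, 50))   # 5000
print(iterations_per_epoch(10_050, 100))   # 101 (last batch has 50 samples)
```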

---

## 4. Learning Rate (η)
- **Definition:** Controls the **step size** at each iteration while moving toward a minimum of the loss function.  
- Update Rule (basic gradient descent):  
  $$
  w_{t+1} = w_t - \eta \nabla L(w_t)
  $$
  where:  
  - $w_t$: weights at step *t*  
  - $\eta$: learning rate  
  - $\nabla L(w_t)$: gradient of the loss with respect to the weights, evaluated at $w_t$  

- **Effect:**  
  - Small $\eta$ → slow learning; many iterations are needed to reach a minimum.  
  - Large $\eta$ → updates can overshoot the minimum, so the loss may oscillate or diverge.  
  - Well-chosen $\eta$ → fast and stable convergence.  
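
The three regimes can be seen on a toy problem. The sketch below applies the update rule to the one-dimensional loss $L(w) = w^2$, whose gradient is $2w$ and whose minimum is at $w = 0$. The loss, the starting point, and the learning-rate values are assumptions chosen purely for illustration.

```python
def gradient_descent(lr, steps=20, w0=1.0):
    """Run w <- w - lr * dL/dw on L(w) = w**2 and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w      # derivative of w**2
        w = w - lr * grad   # basic gradient descent update
    return w


for lr in (0.01, 0.4, 1.1):
    print(f"lr={lr}: w after 20 steps = {gradient_descent(lr):.6g}")
```

On this loss each step multiplies $w$ by $(1 - 2\eta)$: with $\eta = 0.01$ the factor is 0.98 (slow shrinkage), with $\eta = 0.4$ it is 0.2 (fast convergence), and with $\eta = 1.1$ it is $-1.2$ (the sign flips and the magnitude grows, so the iterates oscillate and diverge).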

---

## 5. Summary Relationships
- **Epoch:** 1 full pass of dataset.  
- **Batch Size:** number of samples per update.  
- **Iteration:** one weight update step.  
- **Learning Rate:** size of the update step.  

**Formula Recap:**  
$$
\text{Iterations per Epoch} = \left\lceil \frac{\text{Dataset Size}}{\text{Batch Size}} \right\rceil
$$  
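
To show how the four parameters fit together, here is a minimal sketch of a mini-batch gradient descent loop in plain Python. It fits a single weight `w` so that `w * x` approximates `y`; the toy data, the function name `train`, and the chosen hyperparameter values are all assumptions for illustration, not a recommended setup.

```python
import random


def train(xs, ys, epochs, batch_size, lr, seed=0):
    """Fit y ~ w * x with mini-batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    w = 0.0
    iterations = 0
    indices = list(range(len(xs)))
    for _ in range(epochs):                    # one epoch = one full pass
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]   # one mini-batch
            # Gradient of mean((w*x - y)**2) over the batch with respect to w.
            grad = sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad                     # learning rate scales the step
            iterations += 1                    # one iteration = one update
    return w, iterations


xs = [i / 100 for i in range(100)]   # 100 training samples
ys = [3.0 * x for x in xs]           # true weight is 3.0
w, iterations = train(xs, ys, epochs=50, batch_size=10, lr=0.1)
print(iterations)   # 500 = 50 epochs * (100 samples / 10 per batch)
print(round(w, 4))
```

The outer loop counts epochs, the inner loop walks over batches, and every pass through the inner loop body is one iteration, which matches the formula recap.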

---
