# Optimizers and Practical Skills
## Deep Learning Toolbox
### Data processing
Data augmentation  
- Flip, rotate, random crop, colour shift, noise addition, information loss, contrast change  

Batch normalization  
- Normalize each layer's inputs over the mini-batch, then apply a learned scale and shift  
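
A few of the augmentation operations listed above (flip, random crop, noise addition) can be sketched in plain Python. This is only an illustration, assuming a grayscale image stored as a list of rows with pixel values in [0, 1]. The helper names are made up for this example, and a real pipeline would normally use an image library instead.

```python
import random

def horizontal_flip(image):
    """Mirror every row of a 2-D image (list of rows)."""
    return [row[::-1] for row in image]

def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position inside the image."""
    top = rng.randint(0, len(image) - crop_h)      # randint is inclusive on both ends
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise, then clamp pixels back into [0, 1]."""
    return [[min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
image = [[(r * 4 + c) / 15 for c in range(4)] for r in range(4)]  # toy 4x4 image

flipped = horizontal_flip(image)
cropped = random_crop(image, 3, 3, rng)
noisy = add_gaussian_noise(image, 0.05, rng)
```

Each function returns a new image and leaves the input untouched, so the same original can be augmented differently on every epoch.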

Training neural network parameters  
- Epoch  
- Mini-batch gradient descent  
- Loss function  
    - Cross-entropy loss
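
To make the training vocabulary concrete, here is a rough sketch of one epoch of mini-batch iteration with a cross-entropy loss. The toy `(logits, label)` pairs and the function names are assumptions for illustration. In real training a model would produce the logits, and each batch loss would drive a gradient update.

```python
import math
import random

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def minibatches(samples, batch_size, rng):
    """Yield shuffled mini-batches; one full pass over them is one epoch."""
    order = list(range(len(samples)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]

# Hypothetical toy data: (logits a model might output, true class index).
samples = [
    ([2.0, 0.5, -1.0], 0),
    ([0.1, 1.5, 0.3], 1),
    ([-0.5, 0.2, 2.2], 2),
    ([1.0, 1.0, 1.0], 0),
    ([0.0, 3.0, 0.0], 1),
]

rng = random.Random(0)
batch_losses = []
for batch in minibatches(samples, batch_size=2, rng=rng):
    # Mean loss over the mini-batch; the gradient step would go here.
    mean_loss = sum(cross_entropy(logits, y) for logits, y in batch) / len(batch)
    batch_losses.append(mean_loss)
```

With 5 samples and a batch size of 2, the epoch consists of three batches, the last one holding a single sample.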

Finding optimal weights  
- Backpropagation weight update  

Parameter tuning: weight initialization  
- Xavier initialization  
    - Instead of drawing weights from an arbitrary random distribution, scale the random initial weights by the layer's fan-in and fan-out so that activation variance stays roughly constant across layers  
- Transfer learning
    - Depending on how much training data is available: freeze all pretrained layers and train only the classifier, train the last layers plus the classifier, or retrain all layers  
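
As a sketch of the Xavier idea, the uniform variant draws weights from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)), which gives a weight variance of 2 / (fan_in + fan_out). The function name and the layer sizes below are illustrative assumptions, not part of the notes.

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng):
    """Return a fan_in x fan_out weight matrix drawn from U(-limit, limit)."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

rng = random.Random(0)
fan_in, fan_out = 256, 128  # arbitrary example layer sizes
weights = xavier_uniform(fan_in, fan_out, rng)

# Compare the empirical second moment with the theoretical variance.
flat = [w for row in weights for w in row]
empirical_var = sum(w * w for w in flat) / len(flat)
target_var = 2.0 / (fan_in + fan_out)
```

The weights are still random; what changes with the architecture is only the scale of the distribution.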

Optimizing convergence  
- Learning rate  
- Adaptive learning rates  
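
Adam is one common adaptive learning rate method. The sketch below applies its standard update rule to a single scalar parameter, minimizing the toy objective f(x) = (x - 3)^2. The objective, step count, and the `adam_step` helper are assumptions chosen for illustration.

```python
import math

def adam_step(param, grad, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; `state` holds t, m, and v."""
    state["t"] += 1
    # Exponential moving averages of the gradient and the squared gradient.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    # The effective step size is scaled per parameter by 1 / sqrt(v_hat).
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)

x = 0.0
state = {"t": 0, "m": 0.0, "v": 0.0}
for _ in range(500):
    grad = 2.0 * (x - 3.0)  # derivative of (x - 3)^2
    x = adam_step(x, grad, state)
```

Because the step is divided by the running gradient magnitude, the first updates have size close to `lr` regardless of how large the raw gradient is.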

Regularization  
- Dropout  
- Weight regularization  
    - Lasso: L1 regularization; can shrink coefficients exactly to 0, which acts as variable selection  
    - Ridge: L2 regularization; makes coefficients smaller without zeroing them  
    - Elastic Net: L1 + L2; a trade-off between variable selection and small coefficients  
- Early stopping  
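
Two of the regularizers above are easy to sketch: the elastic net penalty that gets added to the loss, and a patience-based early stopping rule. Both helpers, the strengths, and the validation losses are hypothetical values for illustration.

```python
def elastic_net_penalty(weights, l1_strength, l2_strength):
    """L1 term (sum of |w|) plus L2 term (sum of w^2), each with its own strength.

    Setting l2_strength=0 gives the lasso penalty; l1_strength=0 gives ridge.
    """
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return l1_strength * l1 + l2_strength * l2

def early_stopping_epoch(val_losses, patience):
    """Return the epoch index at which training stops.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: ran all epochs

penalty = elastic_net_penalty([1.0, -2.0, 0.0], l1_strength=0.1, l2_strength=0.01)
stop_at = early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.72, 0.9], patience=2)
```

In practice one would also keep a copy of the weights from the best epoch and restore them after stopping.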


# 🧠 Neural Network Debugging Checklist (Quick Notes)

## 0. First Response
- ✅ Use simple baseline model (e.g., VGG for images).  
- ✅ Standard loss, no custom functions.  
- ✅ Disable regularization & augmentation.  
- ✅ Check preprocessing (esp. for finetuning).  
- ✅ Verify input data visually.  
- ✅ Overfit on a tiny dataset (2–20 samples).  
- ✅ Add complexity back gradually.  
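
The "overfit on a tiny dataset" check can be sketched with a model small enough to fit in a few lines. This assumes a one-feature logistic regression trained by plain gradient descent on four hand-made points; the data and hyperparameters are arbitrary. If a training loop cannot drive the loss close to zero on data like this, the bug is in the pipeline, not in the model capacity.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, b, xs, ys):
    """Mean binary cross-entropy of the model sigmoid(w * x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)

# Tiny, perfectly separable toy dataset.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

w, b, lr = 0.0, 0.0, 0.5
initial_loss = log_loss(w, b, xs, ys)  # equals ln(2) at w = b = 0

for _ in range(500):
    errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
    grad_w = sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = sum(errors) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

final_loss = log_loss(w, b, xs, ys)
accuracy = sum((sigmoid(w * x + b) > 0.5) == bool(y) for x, y in zip(xs, ys)) / len(xs)
```

The expected pattern is a starting loss near the chance level and a final loss near zero with perfect training accuracy.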

---

## I. Dataset Issues
- 📸 Check input/labels (e.g., swapped dims, wrong batch, all zeroes).  
- 🎲 Feed random input → if same error, data not used properly.  
- 🛠️ Validate data loader → inspect first layer’s input.  
- 🔗 Ensure label mapping stays correct (shuffle inputs and labels together).  
- ❓ Check if input–output relationship is meaningful.  
- 🔊 Inspect dataset noise & mislabels.  
- 🔀 Shuffle dataset properly.  
- ⚖️ Handle class imbalance (loss balancing, resampling).  
- 📈 Enough training examples? (~1k images/class for scratch training).  
- 🗂️ Ensure batches aren’t single-label.  
- 📦 Reduce batch size if too large.  
- 🏷️ Use standard datasets first (MNIST, CIFAR-10) to validate pipeline.  
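
For the class imbalance item, one common form of loss balancing weights each class by the inverse of its frequency. The sketch below uses the "balanced" heuristic, weight = n_samples / (n_classes * class_count); the function name and the 90/10 label split are made up for illustration.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Map each class to n_samples / (n_classes * count_of_that_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical imbalanced dataset: 90 cats, 10 dogs.
labels = ["cat"] * 90 + ["dog"] * 10
weights = balanced_class_weights(labels)

# Total weight per class is now equal, so neither class dominates the loss.
total_cat = weights["cat"] * 90
total_dog = weights["dog"] * 10
```

These weights would multiply each sample's loss term; resampling the minority class is the alternative when the loss cannot be reweighted.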

---

## II. Data Normalization / Augmentation
- 📏 Standardize features (zero mean, unit variance).  
- 🔄 Avoid excessive augmentation → underfitting risk.  
- 🖼️ Match pretrained model preprocessing ([0,1], [-1,1], [0,255]).  
- 📊 Compute preprocessing stats on the training split only; apply them unchanged to val/test.  
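
The train-only statistics rule can be sketched as a fit/apply pair: the mean and standard deviation come from the training split, and the same two numbers are reused for validation and test. The helper names and the single toy feature are assumptions for this example.

```python
import statistics

def fit_standardizer(train_values):
    """Compute mean and (population) standard deviation on training data only."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)

def standardize(values, mean, std):
    """Apply previously fitted statistics; never refit on val/test data."""
    return [(v - mean) / std for v in values]

train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # toy feature column
val = [3.0, 6.0]

mean, std = fit_standardizer(train)
train_z = standardize(train, mean, std)  # zero mean, unit variance
val_z = standardize(val, mean, std)      # uses the training statistics
```

Fitting the statistics on the full dataset would leak information from the validation and test splits into training.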

---

## III. Implementation Issues
- 🧩 Solve simpler subproblem first.  
- 🎯 Check loss “at chance” (e.g., 10 classes → CE loss ≈ ln(10) ≈ 2.303).  
- ⚠️ Verify loss function (bugs in custom loss?).  
- 🛑 Ensure correct inputs to loss (in PyTorch, NLLLoss expects log-probabilities; CrossEntropyLoss expects raw logits).  
- ⚖️ Balance multi-loss weights.  
- 📊 Track multiple metrics (not just loss).  
- 🧪 Unit test custom layers.  
- 🔒 Check for unintentionally frozen layers.  
- 🏗️ Increase network size if too weak.  
- 🔢 Use unusual dims (primes) to detect shape errors.  
- 🧮 Gradient checking (if manual backprop).  
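
Two of the checks above lend themselves to a short sketch: the chance-level loss and gradient checking. The code below assumes a softmax cross-entropy loss, whose gradient with respect to the logits is softmax(logits) - one_hot(target), and compares that analytic gradient with central finite differences. The logits are arbitrary, and 7 classes are used as an "unusual" dimension.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    return -math.log(softmax(logits)[target])

def analytic_grad(logits, target):
    """d(loss)/d(logits) = softmax(logits) - one_hot(target)."""
    return [p - (1.0 if i == target else 0.0)
            for i, p in enumerate(softmax(logits))]

def numerical_grad(f, x, eps=1e-5):
    """Central finite differences, one coordinate at a time."""
    grad = []
    for i in range(len(x)):
        plus = x[:i] + [x[i] + eps] + x[i + 1:]
        minus = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((f(plus) - f(minus)) / (2 * eps))
    return grad

# Chance-level loss: uniform predictions over 10 classes give ln(10).
chance_loss = cross_entropy([0.0] * 10, target=3)

# Gradient check on arbitrary logits with 7 classes.
logits = [0.3, -1.2, 0.8, 2.0, -0.5, 0.1, 1.1]
target = 2
numeric = numerical_grad(lambda z: cross_entropy(z, target), logits)
max_diff = max(abs(a - n) for a, n in zip(analytic_grad(logits, target), numeric))
```

A `max_diff` many orders of magnitude below the gradient values suggests the hand-written backward pass matches the loss; a large gap points to a bug in one of them.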

---

## IV. Training Issues
- 🔍 Overfit tiny subset (1–2 samples).  
- 🎲 Try different weight inits (Xavier, He).  
- 🔧 Tune hyperparams (grid/random search).  
- 🚫 Reduce reg. if underfitting (dropout, weight decay, BN).  
- ⏳ Allow more training time if loss steadily ↓.  
- 🔀 Switch Train ↔ Test mode correctly (BN, dropout).  
- 👁️ Visualize training (weights, activations, updates, TensorBoard).  
- 📉 Check activations (std ~ 0.5–2.0) for vanishing/exploding.  
- ⚡ Try different optimizer (Adam, SGD+momentum).  
- 🎚️ Adjust learning rate (×0.1 or ×10).  
- 🚫 Debug NaNs (reduce LR, check div/0, log(≤0), trace layer by layer).  
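
The activation and NaN checks above can be combined into a small layer-by-layer report. This is a sketch under the assumption that each layer's activations have been collected into a flat list of floats; the layer names, values, and the `activation_report` helper are hypothetical, and the 0.5 to 2.0 range is the rule of thumb from the checklist.

```python
import math
import statistics

def activation_report(layer_outputs, low=0.5, high=2.0):
    """Flag layers with NaN/inf values or a standard deviation outside [low, high]."""
    report = {}
    for name, values in layer_outputs.items():
        if any(not math.isfinite(v) for v in values):
            report[name] = "non-finite values"
            continue
        std = statistics.pstdev(values)
        if std < low:
            report[name] = "std too low (possible vanishing activations)"
        elif std > high:
            report[name] = "std too high (possible exploding activations)"
        else:
            report[name] = "ok"
    return report

# Hypothetical activations captured from four layers, in forward order.
layer_outputs = {
    "fc1": [-1.0, 0.5, 1.2, -0.7],
    "fc2": [0.01, -0.02, 0.0, 0.01],
    "fc3": [-10.0, 10.0, 5.0, -5.0],
    "fc4": [1.0, float("nan"), 2.0],
}
report = activation_report(layer_outputs)

# The first layer (in forward order) that reports non-finite values
# is where NaN tracing should start.
first_bad = next((n for n, s in report.items() if s == "non-finite values"), None)
```

Running such a report every few hundred steps makes it easier to see at which layer, and when, the statistics start to drift.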

---
