To systematically explore potential optimizations for the ResNet architecture while staying within the 5 million parameter constraint, we will experiment with a variety of hyperparameters across architecture, training, optimization, and regularization settings.

---

## 1. Architecture Hyperparameters

| **Hyperparameter** | **Description** | **Possible Values to Test** |
|-------------------|----------------|---------------------------|
| **Number of Residual Blocks** | Controls model depth and complexity | 8, 10, 14 |
| **Number of Channels per Layer** | Sets the number of feature maps (model width) per layer | 16, 32, 64 |
| **Kernel Size (Standard Conv)** | Controls receptive field per layer | 3x3, 5x5 |
| **Kernel Size (Skip Connection)** | Controls information flow in residual block | 1x1, 3x3 |
| **Use Depthwise Separable Convs** | Reduces parameter count while maintaining expressiveness | Yes, No |
| **Use Squeeze-and-Excitation Blocks** | Enhances channel-wise feature recalibration | Yes, No |
| **Global Average Pooling (GAP) vs. Fully Connected Layers** | Reduces overfitting and parameter count | GAP, FC Layers |
| **DropPath Regularization** | Prevents overfitting in deeper models | 0.1, 0.2, 0.3 |
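
Because every architecture choice above competes for the same 5 million parameter budget, it helps to estimate convolution costs before training anything. The following is a minimal sketch, not the project's actual tooling: the helper names `conv_params` and `depthwise_separable_params` are hypothetical, and biases are assumed to be disabled (as is common when a BatchNorm layer follows the convolution).

```python
def conv_params(c_in, c_out, k, bias=False):
    """Weight count of a standard k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def depthwise_separable_params(c_in, c_out, k, bias=False):
    """Depthwise k x k conv (one filter per input channel) plus a 1x1 pointwise conv."""
    depthwise = c_in * k * k + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Compare both layer types over the channel widths and kernel sizes in the table.
for channels in (16, 32, 64):
    for k in (3, 5):
        std = conv_params(channels, channels, k)
        sep = depthwise_separable_params(channels, channels, k)
        print(f"C={channels:>2} k={k}: standard={std:>7,} separable={sep:>6,}")
```

For a 64-to-64-channel 3x3 layer this gives 36,864 weights for the standard convolution versus 4,672 for the separable variant, which is why the separable option frees up budget for extra depth or width.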

---

## 2. Training Hyperparameters

| **Hyperparameter** | **Description** | **Possible Values to Test** |
|-------------------|----------------|---------------------------|
| **Batch Size** | Controls the number of samples per update | 64, 128, 256 |
| **Learning Rate (Initial)** | Defines step size for weight updates | 0.01, 0.001, 0.0005 |
| **Optimizer Type** | Determines how gradients are used to update the weights | SGD (Momentum), AdamW, RMSProp, LAMB |
| **Momentum (for SGD)** | Helps smooth gradient updates | 0.8, 0.9, 0.95 |
| **Weight Decay (L2 Regularization)** | Prevents overfitting by penalizing large weights | 1e-5, 1e-4, 1e-3 |
| **Gradient Clipping** | Prevents exploding gradients | No Clipping, Max Norm (1.0, 5.0) |
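
The training table above defines a sizeable search space, and enumerating it makes the cost of a full grid search explicit. This is a sketch under stated assumptions: the dictionary keys are hypothetical names, momentum is swept only for SGD (as the table suggests), and the random subset of 20 trials is an arbitrary illustrative budget.

```python
import random
from itertools import product

batch_sizes = [64, 128, 256]
learning_rates = [0.01, 0.001, 0.0005]
weight_decays = [1e-5, 1e-4, 1e-3]
clip_norms = [None, 1.0, 5.0]  # None means no gradient clipping

# Momentum is only swept for SGD; the other optimizers get a single entry each.
optimizers = [("sgd", m) for m in (0.8, 0.9, 0.95)]
optimizers += [(name, None) for name in ("adamw", "rmsprop", "lamb")]

configs = [
    {
        "batch_size": bs,
        "lr": lr,
        "optimizer": opt,
        "momentum": mom,
        "weight_decay": wd,
        "clip_norm": clip,
    }
    for bs, lr, (opt, mom), wd, clip in product(
        batch_sizes, learning_rates, optimizers, weight_decays, clip_norms
    )
]
print(len(configs))  # 3 * 3 * 6 * 3 * 3 = 486 combinations

# A full grid is expensive, so one option is to sample a reproducible subset.
subset = random.Random(0).sample(configs, 20)
```

With 486 combinations from this table alone, sampling a subset or tuning one group of hyperparameters at a time is likely more practical than an exhaustive sweep.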

---

## 3. Learning Rate Scheduling Strategies

| **Scheduler Type** | **Description** | **Hyperparameters to Tune** |
|-------------------|----------------|---------------------------|
| **Cosine Annealing** | Smooth decay of learning rate | Min LR: 1e-6, Restart Intervals |
| **Step Decay** | Reduces LR at fixed intervals | Drop Rate: 0.1, Step Size: 10 epochs |
| **OneCycleLR** | Ramps LR up to a peak, then anneals it over a single cycle for fast convergence | Max LR: 0.1, Anneal Strategy |
| **Cyclic LR** | Periodically increases and decreases LR | Base LR: 1e-4, Max LR: 1e-2 |
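
To make the first two schedules concrete, here is a minimal framework-free sketch of cosine annealing (without restarts) and step decay, using the defaults from the table. The function names are hypothetical, and the 100-epoch horizon and base LR of 0.01 in the loop are assumptions for illustration only.

```python
import math


def cosine_annealing(step, total_steps, base_lr, min_lr=1e-6):
    """Decay from base_lr to min_lr along half a cosine period (no restarts)."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


def step_decay(epoch, base_lr, drop_rate=0.1, step_size=10):
    """Multiply the LR by drop_rate once every step_size epochs."""
    return base_lr * drop_rate ** (epoch // step_size)


# Print both schedules side by side over an assumed 100-epoch run.
for epoch in (0, 10, 50, 100):
    cos_lr = cosine_annealing(epoch, 100, 0.01)
    step_lr = step_decay(epoch, 0.01)
    print(f"epoch {epoch:>3}: cosine={cos_lr:.6f} step={step_lr:.2e}")
```

Note that with a drop rate of 0.1 every 10 epochs, step decay shrinks the LR by ten orders of magnitude over 100 epochs, so the step size may need to scale with the total training length.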

---

## 4. Data Augmentation Hyperparameters

| **Augmentation Type** | **Description** | **Possible Values to Test** |
|----------------------|----------------|---------------------------|
| **CutMix** | Replaces a patch of an image with a patch from another image and mixes the labels in proportion to patch area | Alpha = 0.2, 0.4 |
| **MixUp** | Blends two images with a weighted sum | Alpha = 0.1, 0.2, 0.4 |
| **AutoAugment** | Uses a learned augmentation policy | CIFAR-10 Policy, SVHN Policy |
| **Random Erasing** | Randomly removes parts of an image | Probability = 0.25, 0.5 |
| **Horizontal Flip** | Flips images left-right | Always On, 50% Probability |
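
The role of the CutMix alpha parameter can be sketched as follows: a mixing ratio is drawn from a Beta(alpha, alpha) distribution, a box covering roughly `1 - lam` of the image is pasted from the second image, and the labels are mixed using the box's actual area after clipping. The helper `cutmix_box` is a hypothetical name, and the 32x32 image size is an assumption based on the CIFAR-10 policy mentioned in the table.

```python
import math
import random


def cutmix_box(height, width, lam, rng=random):
    """Return a clipped box (top, bottom, left, right) and the area-corrected lambda."""
    cut_ratio = math.sqrt(1.0 - lam)  # side ratio so that box area is about (1 - lam)
    cut_h, cut_w = int(height * cut_ratio), int(width * cut_ratio)
    cy, cx = rng.randrange(height), rng.randrange(width)  # random box center
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, height)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, width)
    # Clipping at the border can shrink the box, so recompute lambda from its real area.
    lam_adjusted = 1.0 - (bottom - top) * (right - left) / (height * width)
    return (top, bottom, left, right), lam_adjusted


rng = random.Random(0)
lam = rng.betavariate(0.4, 0.4)  # alpha = 0.4, one of the values to test
box, lam_adj = cutmix_box(32, 32, lam, rng)
# The mixed target would be lam_adj * label_a + (1 - lam_adj) * label_b.
print(box, lam_adj)
```

Small alpha values concentrate the Beta distribution near 0 and 1, so most mixed samples stay close to one of the two source images; larger values produce more evenly mixed samples.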

---

## 5. Batch Normalization & Dropout Hyperparameters

| **Regularization Type** | **Description** | **Possible Values to Test** |
|----------------------|----------------|---------------------------|
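| **BatchNorm Momentum** | Controls how quickly the running mean/variance statistics are updated. Values here appear to follow the running-average decay convention; PyTorch's `momentum` argument is 1 minus this value | 0.8, 0.9, 0.99 |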
| **Dropout Rate** | Prevents overfitting by randomly disabling neurons | 0.2, 0.3, 0.5 |
| **Label Smoothing** | Reduces confidence of softmax predictions | 0.05, 0.1, 0.2 |
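
As an illustration of what the label smoothing values do, the sketch below builds a smoothed target vector using one common convention: the smoothing mass is spread uniformly over all classes, including the true one. The function name `smooth_labels` is hypothetical, and 10 classes are assumed.

```python
def smooth_labels(true_class, num_classes, epsilon):
    """Return (1 - epsilon) * one_hot + epsilon / num_classes as a list."""
    off_value = epsilon / num_classes
    target = [off_value] * num_classes
    target[true_class] += 1.0 - epsilon
    return target


# Show the target for class 3 at each smoothing value from the table.
for eps in (0.05, 0.1, 0.2):
    target = smooth_labels(true_class=3, num_classes=10, epsilon=eps)
    print(eps, [round(t, 3) for t in target])
```

With epsilon = 0.1 and 10 classes, the true class receives a target of 0.91 and every other class 0.01, which discourages the network from driving its logits toward extreme values.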

---

## 6. Residual Block Optimizations

| **Block Design Choice** | **Description** | **Possible Values to Test** |
|------------------------|----------------|---------------------------|
| **Number of Residual Blocks per Stage** | Controls model depth within each resolution stage | 3, 5, 7 |
| **Use Grouped Convolutions** | Splits filters into smaller groups | Yes, No |
| **Position of BatchNorm in Residual Block** | Affects gradient flow and stability | Before Activation, After Activation |
| **Shortcut Connection Type** | Defines how information flows through blocks | Identity Shortcut, 1x1 Conv |
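
The block design choices above also change the parameter count, which matters under the 5 million limit. The sketch below estimates the size of a basic residual block with two 3x3 convolutions, assuming bias-free convolutions, BatchNorm with two learnable parameters per channel, and an optional 1x1 projection shortcut followed by its own BatchNorm. Both helper names are hypothetical.

```python
def grouped_conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution split into `groups` independent groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * k * k


def basic_block_params(c_in, c_out, groups=1, projection=False):
    """Two 3x3 convs with BatchNorm, plus an optional 1x1 conv shortcut with BatchNorm."""
    convs = grouped_conv_params(c_in, c_out, 3, groups)
    convs += grouped_conv_params(c_out, c_out, 3, groups)
    batchnorms = 2 * (2 * c_out)  # scale and shift for each of the two BN layers
    shortcut = (c_in * c_out + 2 * c_out) if projection else 0
    return convs + batchnorms + shortcut


print(basic_block_params(64, 64))                   # identity shortcut, no groups
print(basic_block_params(64, 64, groups=4))         # grouped convolutions
print(basic_block_params(32, 64, projection=True))  # 1x1 conv shortcut on a width change
```

Grouping with 4 groups cuts the convolution weights of a 64-channel block to a quarter, while a 1x1 projection shortcut adds comparatively few parameters, so the identity-versus-projection choice is mostly about accuracy and gradient flow rather than budget.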

---

## Final Model Selection Criteria
Compare and integrate the best-performing:
- **Optimizer + Learning Rate Schedule**
- **Data Augmentation Strategy**
- **Residual Block Architecture**
- **Regularization Techniques**
- **Batch Normalization & Dropout Settings**
