---

# **Experiment Report: Model Performance and Hyperparameter Tuning**

## **Overview**

This report summarizes the results of six experiments performed on the MViT_V1_B model. Each experiment varies one or more settings: the optimizer, the dropout rate, the number of unfrozen layers, and the use of data augmentation. The goal was to evaluate how these settings affect the model's test accuracy.

---

## **Experiment Details**

### **Common Parameters Across Experiments**

* **Dataset**: Balanced with 31 classes
* **Training/Validation Split**: 80 training and 20 validation videos per class
* **Training Samples**: 2,480 training videos
* **Validation Samples**: 619 validation videos
* **Frames per Clip**: 16
* **Batch Size**: 4
* **Learning Rate**: 0.0001
* **Model**: MViT_V1_B
* **K (video temporal duration)**: Varies across experiments
* **Num_classes**: 31

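The sample counts above can be cross-checked against the per-class split. The sketch below is plain arithmetic on the numbers reported in this section; the variable names are illustrative. It shows that the training count matches 31 classes × 80 videos exactly, while the validation count is one short of 31 × 20. The report does not say why.

```python
# Cross-check the reported sample counts against the per-class split.
NUM_CLASSES = 31
TRAIN_PER_CLASS = 80
VAL_PER_CLASS = 20

reported_train = 2480  # from "Training Samples"
reported_val = 619     # from "Validation Samples"

expected_train = NUM_CLASSES * TRAIN_PER_CLASS
expected_val = NUM_CLASSES * VAL_PER_CLASS

train_gap = expected_train - reported_train
val_gap = expected_val - reported_val

print(f"training:   expected {expected_train}, reported {reported_train}, gap {train_gap}")
print(f"validation: expected {expected_val}, reported {reported_val}, gap {val_gap}")
```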
---

### **Experiment 1: Adam Optimizer, No Augmentation**

* **Optimizer**: Adam

* **Frozen Layers**: 3 layers unfreezed

* **Epochs**: 15

* **Train/Val**: 80/20 per class

* **Test Accuracy**: 71.43%

* **Training Run ID**: 49470cf15ae3430391516c4684044f9e

---

### **Experiment 2: AdamW Optimizer, No Augmentation**

* **Optimizer**: AdamW

* **Frozen Layers**: 3 layers unfreezed

* **Epochs**: 10

* **Train/Val**: 80/20 per class

* **Weight Decay**: 0.05

* **Test Accuracy**: 76.19%

* **Training Run ID**: d403d5ccc2af4feeb7d728ebaee90eb9

---

### **Experiment 3: AdamW Optimizer, 2 Unfreezed Layers, No Augmentation**

* **Optimizer**: AdamW

* **Frozen Layers**: 2 layers unfreezed

* **Epochs**: 10

* **Train/Val**: 80/20 per class

* **Weight Decay**: 0.05

* **Test Accuracy**: 71.43%

* **Training Run ID**: 8a485caa2a244fc992f4874daa203dd3

---

### **Experiment 4: AdamW Optimizer, 3 Unfreezed Layers, Dropout = 0.5, No Augmentation**

* **Optimizer**: AdamW

* **Frozen Layers**: 3 layers unfreezed

* **Epochs**: 10

* **Dropout**: 0.5

* **Train/Val**: 80/20 per class

* **Weight Decay**: 0.05

* **Test Accuracy**: 76.19%

* **Training Run ID**: 11549cdad64c4d29a49f14c06aae05cc

---

### **Experiment 5: AdamW Optimizer, 1 Unfreezed Layer, Dropout = 0.5, No Augmentation**

* **Optimizer**: AdamW

* **Frozen Layers**: 1 layer unfreezed

* **Epochs**: 10

* **Dropout**: 0.5

* **Train/Val**: 80/20 per class

* **Weight Decay**: 0.05

* **Test Accuracy**: 80.95%

* **Training Run ID**: 3f8940e46655425c9a7a38386c3e5698

---

### **Experiment 6: AdamW Optimizer, 1 Unfreezed Layer, Dropout = 0.5, Temporal Augmentation**

* **Optimizer**: AdamW

* **Frozen Layers**: 1 layer unfreezed

* **Epochs**: 10

* **Dropout**: 0.5

* **Train/Val**: 80/20 per class

* **Weight Decay**: 0.05

* **Augmentation**: Temporal (Random Stride, Random Start Point)

* **Test Accuracy**: 66.67%

* **Training Run ID**: dedd45053c52412f8d2eb72ed53ab13e

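The report does not describe how Random Stride and Random Start Point were implemented. As an illustration only, one common way to sample a 16-frame clip is sketched below. The function name `sample_clip_indices`, the `max_stride` limit of 4, and the policy of repeating the last frame for short videos are all assumptions, not details of these experiments.

```python
import random


def sample_clip_indices(num_video_frames, frames_per_clip=16, max_stride=4, rng=None):
    """Pick frame indices for one clip using a random stride and a random start."""
    if num_video_frames < 1 or frames_per_clip < 2:
        raise ValueError("need at least 1 video frame and 2 frames per clip")
    rng = rng or random.Random()

    # Random stride: limited so that the clip still fits inside the video.
    largest_fitting = (num_video_frames - 1) // (frames_per_clip - 1)
    stride = rng.randint(1, max(1, min(max_stride, largest_fitting)))

    # Random start point: any start that keeps the last index in range.
    span = (frames_per_clip - 1) * stride + 1
    start = rng.randint(0, max(0, num_video_frames - span))

    # Videos shorter than one clip repeat their final frame (an assumed policy).
    return [min(start + i * stride, num_video_frames - 1) for i in range(frames_per_clip)]


# Example: one clip drawn from a hypothetical 120-frame video.
indices = sample_clip_indices(num_video_frames=120, rng=random.Random(0))
print(indices)
```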
---

## **Summary of Results**

| Experiment | Optimizer | Unfrozen Layers  | Dropout | Augmentation          | Test Accuracy |
| ---------- | --------- | ---------------- | ------- | --------------------- | ------------- |
| **1**      | Adam      | 3                | None    | None                  | 71.43%        |
| **2**      | AdamW     | 3                | None    | None                  | 76.19%        |
| **3**      | AdamW     | 2                | None    | None                  | 71.43%        |
| **4**      | AdamW     | 3                | 0.5     | None                  | 76.19%        |
| **5**      | AdamW     | 1                | 0.5     | None                  | 80.95%        |
| **6**      | AdamW     | 1                | 0.5     | Temporal Augmentation | 66.67%        |

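Every accuracy in the table is, to two decimals, a multiple of 1/21, which would be consistent with a test set of 21 videos. The report does not state the test set size, so `ASSUMED_TEST_SIZE` below is an inference, not a reported fact. Under that assumption, the sketch recovers the number of correct predictions per run and computes a 95% Wilson score interval for each, which gives a rough sense of how much a difference of one or two videos can mean.

```python
import math

# Assumption: inferred from the accuracy values, not stated in the report.
ASSUMED_TEST_SIZE = 21


def correct_count(accuracy_percent, n=ASSUMED_TEST_SIZE):
    """Return k such that k/n matches the reported percentage to two decimals."""
    k = round(accuracy_percent / 100 * n)
    if abs(100 * k / n - accuracy_percent) > 0.006:
        raise ValueError(f"{accuracy_percent}% is not a multiple of 1/{n}")
    return k


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for k successes out of n trials."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Test accuracies of Experiments 1 to 6, copied from the table above.
reported = {1: 71.43, 2: 76.19, 3: 71.43, 4: 76.19, 5: 80.95, 6: 66.67}
counts = {exp: correct_count(acc) for exp, acc in reported.items()}

for exp, k in counts.items():
    low, high = wilson_interval(k, ASSUMED_TEST_SIZE)
    print(f"Experiment {exp}: {k}/{ASSUMED_TEST_SIZE} correct, "
          f"95% interval {low:.2f} to {high:.2f}")
```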
---

## **Analysis and Insights**

1. **Optimizer Choice**:

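   * **AdamW** (Experiment 2, 76.19%) outperformed **Adam** (Experiment 1, 71.43%) with the same 3 unfrozen layers, although the two runs also differ in epoch count (10 vs. 15) and in weight decay (0.05 vs. none listed). Experiment 1 is the only Adam run, and the AdamW runs in Experiments 3 and 6 matched it or fell below it, so the evidence favors **AdamW** only tentatively. Its decoupled handling of weight decay is a plausible reason for the gain.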

2. **Frozen Layers**:

   * Experiments with more unfrozen layers (3) did not consistently outperform experiments with fewer. Experiment 5, with only 1 unfrozen layer, achieved the highest accuracy (80.95%), while Experiment 3 (2 unfrozen layers, 71.43%) fell below Experiment 2 (3 unfrozen layers, 76.19%).

3. **Dropout**:

   * Adding dropout (0.5) made no difference in the one controlled comparison: **Experiment 4** matches **Experiment 2** in every other listed setting, and both reached 76.19%. **Experiment 5** improved to 80.95%, but it also reduced the number of unfrozen layers from 3 to 1, so the gain cannot be attributed to dropout alone.

4. **Temporal Augmentation**:

   * Experiment 6, which used temporal augmentation (Random Stride and Random Start Point), showed the lowest test accuracy (66.67%), down from 80.95% for the otherwise identical Experiment 5. Temporal augmentation, at least with the current setup, does not appear to be beneficial and may require further tuning.

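To make the weight-decay point concrete, the sketch below applies one update step to a single scalar weight under two schemes: Adam with the decay folded into the gradient as an L2 penalty, and AdamW-style decoupled decay. It follows the textbook update rules and is not the training code used in these experiments; the function name `adam_step` and the default `beta1`, `beta2`, and `eps` values are conventional choices assumed here. With a zero loss gradient, the coupled step has a size of roughly the learning rate regardless of the decay value, whereas the decoupled step shrinks the weight by exactly `lr * weight_decay * w`.

```python
import math


def adam_step(w, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8,
              weight_decay=0.0, decoupled=False):
    """One Adam update for a scalar weight; returns the new (w, m, v)."""
    if weight_decay and not decoupled:
        # Adam + L2: the penalty enters the gradient and gets rescaled
        # by the adaptive denominator below.
        grad = grad + weight_decay * w

    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)  # bias correction
    v_hat = v / (1 - beta2 ** t)

    if weight_decay and decoupled:
        # AdamW: shrink the weight directly, outside the adaptive step.
        w = w - lr * weight_decay * w

    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# First step (t=1) from w=1.0 with a zero loss gradient and weight decay 0.05.
w_l2, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, weight_decay=0.05, decoupled=False)
w_decoupled, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, weight_decay=0.05, decoupled=True)
print(f"Adam + L2 : {w_l2:.8f}")
print(f"AdamW     : {w_decoupled:.8f}")
```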
---

## **Conclusion**

* **Experiment 5** shows the best test accuracy (80.95%), which makes the **AdamW optimizer**, **1 unfrozen layer**, and **dropout = 0.5** the best of the six configurations tested on this dataset.
* While **temporal augmentation** had no beneficial effect in **Experiment 6**, further experimentation with different augmentation strategies may still be worthwhile.
* Future experiments could revisit the number of unfrozen layers and test other augmentation methods to improve generalization.

---


# **Experiment Report: Model Performance and Hyperparameter Tuning (Continued)**

## **Overview**

This section continues the evaluation of the MViT_V1_B model across 7 more experiments. These experiments further explore the effects of temporal augmentation, random flips, dropout rates, and the number of epochs on the model's performance.

---

## **Experiment Details**

### **Common Parameters Across Experiments**

* **Dataset**: Balanced with 31 classes
* **Training/Validation Split**: 80 training and 20 validation videos per class
* **Training Samples**: 2,480 training videos
* **Validation Samples**: 619 validation videos
* **Frames per Clip**: 16
* **Batch Size**: 4
* **Learning Rate**: 0.0001
* **Model**: MViT_V1_B
* **Num_classes**: 31
* **K (video temporal duration)**: Varies across experiments

---

### **Experiment 7: AdamW Optimizer, 1 Unfreezed Layer, Dropout = 0.5, Temporal Augmentation**

* **Optimizer**: AdamW

* **Frozen Layers**: 1 layer unfreezed

* **Epochs**: 10

* **Dropout**: 0.5

* **Train/Val**: 80/20 per class

* **Weight Decay**: 0.05

* **Augmentation**: Temporal (Random Stride, Random Start Point, Temporal Jitter)

* **Test Accuracy**: 66.67%

* **Training Run ID**: 57cc8ff7c49c4d59b3507976dcb2b637

---

### **Experiment 8: AdamW Optimizer, 1 Unfreezed Layer, Dropout = 0.5, Random Flip (0.5)**

* **Optimizer**: AdamW

* **Frozen Layers**: 1 layer unfreezed

* **Epochs**: 10

* **Dropout**: 0.5

* **Train/Val**: 80/20 per class

* **Weight Decay**: 0.05

* **Augmentation**: Random Flip (Probability = 0.5)

* **Test Accuracy**: 71.43%

* **Training Run ID**: 2f44dd2a2882421fa6efa8ce19c0daba

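The report gives only the flip probability. As a sketch of what a random flip with probability 0.5 typically means for video, the code below mirrors every frame of a clip left to right with probability `p`, making one decision per clip so that all frames stay consistent. The per-clip decision, the horizontal direction, the nested-list frame layout, and the function name `random_horizontal_flip` are assumptions made for illustration.

```python
import random


def random_horizontal_flip(clip, p=0.5, rng=None):
    """Flip all frames of a clip left to right with probability p.

    `clip` is a list of frames; each frame is a list of pixel rows.
    """
    rng = rng or random.Random()
    if rng.random() < p:
        # One decision for the whole clip keeps motion direction consistent.
        return [[row[::-1] for row in frame] for frame in clip]
    return clip


# Example: a 2-frame clip of 2x3 "images".
clip = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
always_flipped = random_horizontal_flip(clip, p=1.0)
never_flipped = random_horizontal_flip(clip, p=0.0)
print(always_flipped)
```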
---

### **Experiment 9: AdamW Optimizer, 1 Unfreezed Layer, Dropout = 0.5, Random Flip (0.7)**

* **Optimizer**: AdamW

* **Frozen Layers**: 1 layer unfreezed

* **Epochs**: 10

* **Dropout**: 0.5

* **Train/Val**: 80/20 per class

* **Weight Decay**: 0.05

* **Augmentation**: Random Flip (Probability = 0.7)

* **Test Accuracy**: 71.43%

* **Training Run ID**: a5e6452fe83c4cdbb031127393f2e28d

---

### **Experiment 10: AdamW Optimizer, 2 Unfreezed Layers, Dropout = 0.5, Epochs = 15**

* **Optimizer**: AdamW

* **Frozen Layers**: 2 layers unfreezed

* **Epochs**: 15

* **Dropout**: 0.5

* **Train/Val**: 80/20 per class

* **Weight Decay**: 0.05

* **Test Accuracy**: 76.19%

* **Training Run ID**: 6eddc45f099b4b19b585fce06874c970

---

### **Experiment 11: AdamW Optimizer, 2 Unfreezed Layers, Dropout = 0.6, Epochs = 15**

* **Optimizer**: AdamW

* **Frozen Layers**: 2 layers unfreezed

* **Epochs**: 15

* **Dropout**: 0.6

* **Train/Val**: 80/20 per class

* **Weight Decay**: 0.05

* **Test Accuracy**: 71.43%

* **Training Run ID**: c3063b7968064de182f0db590a7d797f

---

### **Experiment 12: AdamW Optimizer, 1 Unfreezed Layer, Dropout = 0.5, Epochs = 15**

* **Optimizer**: AdamW

* **Frozen Layers**: 1 layer unfreezed

* **Epochs**: 15

* **Dropout**: 0.5

* **Train/Val**: 80/20 per class

* **Weight Decay**: 0.05

* **Test Accuracy**: 71.43%

* **Training Run ID**: d519b75001524f6f988d1ced79ea5bf3

---

### **Experiment 13: AdamW Optimizer, 1 Unfreezed Layer, Dropout = 0.5, Epochs = 5**

* **Optimizer**: AdamW

* **Frozen Layers**: 1 layer unfreezed

* **Epochs**: 5

* **Dropout**: 0.5

* **Train/Val**: 80/20 per class

* **Test Accuracy**: 71.43%

* **Training Run ID**: 75a7f0c2f39e479f85ae3aadde395fd9

---

## **Summary of Results**

| Experiment | Optimizer | Unfrozen Layers  | Dropout | Augmentation                           | Epochs | Test Accuracy |
| ---------- | --------- | ---------------- | ------- | -------------------------------------- | ------ | ------------- |
| **7**      | AdamW     | 1                | 0.5     | Temporal (Stride, Start Point, Jitter) | 10     | 66.67%        |
| **8**      | AdamW     | 1                | 0.5     | Random Flip (0.5)                      | 10     | 71.43%        |
| **9**      | AdamW     | 1                | 0.5     | Random Flip (0.7)                      | 10     | 71.43%        |
| **10**     | AdamW     | 2                | 0.5     | None                                   | 15     | 76.19%        |
| **11**     | AdamW     | 2                | 0.6     | None                                   | 15     | 71.43%        |
| **12**     | AdamW     | 1                | 0.5     | None                                   | 15     | 71.43%        |
| **13**     | AdamW     | 1                | 0.5     | None                                   | 5      | 71.43%        |

---

## **Analysis and Insights**

1. **Effect of Temporal Augmentation (Experiment 7)**:

   * **Experiment 7**, which included temporal augmentation (random stride, start point, and jitter), reached a test accuracy of **66.67%**. That matches Experiment 6 (stride and start point only) and is well below the 80.95% of the unaugmented Experiment 5, so adding jitter made no measurable difference. The cause of the drop is not established; the augmented clips may differ too much from the test clips, or 10 epochs may be too few to benefit from the extra variation.

2. **Impact of Random Flip (Experiments 8 & 9)**:

   * Both **Experiment 8** (with a random flip probability of 0.5) and **Experiment 9** (with a higher probability of 0.7) resulted in a test accuracy of **71.43%**, showing no improvement with the increase in flip probability. This suggests that random flipping, at least at these probabilities, might not have had a significant impact on the model's performance.

3. **Epochs and Test Accuracy**:

   * **Experiment 10**, which used 15 epochs and 2 unfrozen layers, achieved the best test accuracy in this batch, **76.19%**. Epoch count alone does not explain the result: Experiments 11 and 12 also trained for 15 epochs and reached 71.43%, the same as Experiment 13 with only 5 epochs.

4. **Dropout Variations**:

   * Raising the dropout rate from 0.5 to **0.6** (Experiment 11 vs. Experiment 10, otherwise identical) lowered test accuracy from 76.19% to **71.43%**. A **dropout rate of 0.5** therefore looks like the better choice for this task, although this rests on a single comparison.

5. **Effect of Epochs**:

   * **Experiment 13**, which used only 5 epochs, reached 71.43%, the same as Experiment 12 with 15 epochs and the same 1-unfrozen-layer, dropout 0.5 setup. Within this batch, training longer with 1 unfrozen layer did not improve test accuracy.

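For readers less familiar with what the dropout rate controls, the sketch below shows inverted dropout on a plain list of activations: each value is zeroed with probability `p`, and the survivors are scaled by `1 / (1 - p)` so the expected value is unchanged. At `p = 0.5` half the activations are kept on average; at `p = 0.6` only 40% are. This is a generic illustration with a hypothetical function name, not the dropout layer used in the model.

```python
import random


def inverted_dropout(activations, p=0.5, rng=None):
    """Zero each activation with probability p and rescale the rest."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    rng = rng or random.Random()
    keep_scale = 1.0 / (1.0 - p)
    return [a * keep_scale if rng.random() >= p else 0.0 for a in activations]


# Compare the fraction of kept activations at the two rates tested above.
ones = [1.0] * 10_000
for rate in (0.5, 0.6):
    out = inverted_dropout(ones, p=rate, rng=random.Random(0))
    kept = sum(1 for x in out if x != 0.0) / len(out)
    mean = sum(out) / len(out)
    print(f"p={rate}: kept {kept:.2%} of activations, mean output {mean:.3f}")
```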
---

## **Conclusion**

* **Experiment 10**, which used **2 unfrozen layers**, **dropout = 0.5**, and **15 epochs**, yielded the best performance in this batch with a **test accuracy of 76.19%**.
* **Temporal augmentation** (Experiment 7) did not improve model performance and might have introduced more complexity without sufficient benefits.
* **Random flip augmentation** had little effect on performance, suggesting that its use should be further tested with different configurations.
* The dropout rate of **0.5** seems optimal, as it generally produced the best results compared to higher dropout values (e.g., **0.6**).
* The **number of epochs** showed no consistent effect: of the three runs trained for **15 epochs**, only Experiment 10 exceeded 71.43%, and the 5-epoch run in Experiment 13 matched the other two.

---


# **Experiment Report: Model Performance and Hyperparameter Tuning (Continued)**

## **Overview**

This section continues the evaluation of the MViT_V1_B model across 8 more experiments (14 to 21). These experiments vary the training and validation video counts, the number of epochs, the number of unfrozen layers, and the learning rate, all with the AdamW optimizer and a dropout rate of 0.5.

---

## **Experiment Details**

### **Common Parameters Across Experiments**

* **Dataset**: Balanced with 31 classes
* **Training/Validation Split**: Varies by experiment: either 80 training and 20 validation videos per class, or 70 training and 15 validation videos per class
* **Frames per Clip**: 16
* **Batch Size**: 4
* **Learning Rate**: 0.0001 (except in Experiments 20 and 21, where it is 0.1)
* **Model**: MViT_V1_B
* **Num_classes**: 31
* **K (video temporal duration)**: Varies across experiments

---

### **Experiment 14: AdamW Optimizer, 2 Unfreezed Layers, Dropout = 0.5, Epochs = 8**

* **Optimizer**: AdamW

* **Frozen Layers**: 2 layers unfreezed

* **Epochs**: 8

* **Dropout**: 0.5

* **Train/Val**: 80/20 per class

* **Test Accuracy**: 71.43%

* **Training Run ID**: 232ba76f2eb54546a9f8ab312c5b1a5d

---

### **Experiment 15: AdamW Optimizer, 1 Unfreezed Layer, Dropout = 0.5, Epochs = 10**

* **Optimizer**: AdamW

* **Frozen Layers**: 1 layer unfreezed

* **Epochs**: 10

* **Dropout**: 0.5

* **Train/Val**: 80/20 per class

* **Test Accuracy**: 66.67%

* **Training Run ID**: 808297df76234823b7abf794ccf299fe

---

### **Experiment 16: AdamW Optimizer, 2 Unfreezed Layers, Dropout = 0.5, Epochs = 10**

* **Optimizer**: AdamW

* **Frozen Layers**: 2 layers unfreezed

* **Epochs**: 10

* **Dropout**: 0.5

* **Train/Val**: 80/20 per class

* **Test Accuracy**: 71.43%

* **Training Run ID**: dc5f69f8169b414a82ace13a056b5b0e

---

### **Experiment 17: AdamW Optimizer, 2 Unfreezed Layers, Dropout = 0.5, Epochs = 10 (with 15 validation videos per class)**

* **Optimizer**: AdamW

* **Frozen Layers**: 2 layers unfreezed

* **Epochs**: 10

* **Dropout**: 0.5

* **Train/Val**: 70/15 per class

* **Weight Decay**: 0.05

* **Test Accuracy**: 76.19%

* **Training Run ID**: 3796b4bdc2fc48f18f2a6fd0b04c6210

---

### **Experiment 18: AdamW Optimizer, 1 Unfreezed Layer, Dropout = 0.5, Epochs = 10 (with 15 validation videos per class)**

* **Optimizer**: AdamW

* **Frozen Layers**: 1 layer unfreezed

* **Epochs**: 10

* **Dropout**: 0.5

* **Train/Val**: 70/15 per class

* **Test Accuracy**: 80.95%

* **Training Run ID**: ec9a3aee97754a18b6b417ed9e9e5bd5

---

### **Experiment 19: AdamW Optimizer, 1 Unfreezed Layer, Dropout = 0.5, Epochs = 15 (with 15 validation videos per class)**

* **Optimizer**: AdamW

* **Frozen Layers**: 1 layer unfreezed

* **Epochs**: 15

* **Dropout**: 0.5

* **Train/Val**: 70/15 per class

* **Test Accuracy**: 76.19%

* **Training Run ID**: 4fcfc53a8e594b7cac5c0baee4b7d7be

---

### **Experiment 20: AdamW Optimizer, 1 Unfreezed Layer, Dropout = 0.5, Epochs = 15, Learning Rate = 0.1**

* **Optimizer**: AdamW

* **Frozen Layers**: 1 layer unfreezed

* **Epochs**: 15

* **Dropout**: 0.5

* **Train/Val**: 70/15 per class

* **Weight Decay**: 0.1

* **Test Accuracy**: 71.43%

* **Training Run ID**: 71422a00e422416eb1c2c286992edd89

---

### **Experiment 21: AdamW Optimizer, 1 Unfreezed Layer, Dropout = 0.5, Epochs = 10, Learning Rate = 0.1**

* **Optimizer**: AdamW

* **Frozen Layers**: 1 layer unfreezed

* **Epochs**: 10

* **Dropout**: 0.5

* **Train/Val**: 70/15 per class

* **Weight Decay**: 0.1

* **Test Accuracy**: 71.43%

* **Training Run ID**: 7dcecb8d9b9b43d2a4d784f1b3c743b8

---

## **Summary of Results**

| Experiment | Optimizer | Unfrozen Layers  | Dropout | Learning Rate | Epochs | Train/Val Split  | Test Accuracy |
| ---------- | --------- | ---------------- | ------- | ------------- | ------ | ---------------- | ------------- |
| **14**     | AdamW     | 2                | 0.5     | 0.0001        | 8      | 80/20            | 71.43%        |
| **15**     | AdamW     | 1                | 0.5     | 0.0001        | 10     | 80/20            | 66.67%        |
| **16**     | AdamW     | 2                | 0.5     | 0.0001        | 10     | 80/20            | 71.43%        |
| **17**     | AdamW     | 2                | 0.5     | 0.0001        | 10     | 70/15            | 76.19%        |
| **18**     | AdamW     | 1                | 0.5     | 0.0001        | 10     | 70/15            | 80.95%        |
| **19**     | AdamW     | 1                | 0.5     | 0.0001        | 15     | 70/15            | 76.19%        |
| **20**     | AdamW     | 1                | 0.5     | 0.1           | 15     | 70/15            | 71.43%        |
| **21**     | AdamW     | 1                | 0.5     | 0.1           | 10     | 70/15            | 71.43%        |

---

## **Analysis and Insights**

1. **Effect of Training and Validation Split**:

   * **Experiment 17** and **Experiment 18**, which used the smaller **70/15 per-class split**, scored higher than their 80/20 counterparts: 76.19% vs. 71.43% for Experiment 16, and 80.95% vs. 66.67% for Experiment 15. The cause is unclear, since the 70/15 split also provides fewer training videos. Experiment 15 also appears to repeat the configuration of Experiment 5, which reached 80.95% on the 80/20 split, so run-to-run variation may account for much of the gap.

2. **Epochs**:

   * **Experiment 18** achieved the best accuracy (80.95%) with 10 epochs. **Experiment 19**, trained for **15 epochs** with otherwise the same settings, reached a lower **76.19%**, so the additional epochs did not help in this case and might have led to overfitting.

3. **Learning Rate**:

   * **Experiment 20** and **Experiment 21**, where the learning rate was increased to **0.1** (compared to the previous 0.0001), did not show improvements in accuracy. This suggests that **a higher learning rate of 0.1** may not be effective for the task at hand, and sticking with a lower learning rate is preferable.

4. **Dropout**:

   * **Dropout at 0.5** was used in every experiment of this batch, so its effect cannot be assessed here. The only higher rate tested so far, **dropout = 0.6** in Experiment 11, lowered accuracy relative to the otherwise identical Experiment 10.

---

## **Conclusion**

* **Experiment 18**, with **1 unfrozen layer**, **dropout = 0.5**, **10 epochs**, and the **70/15 per-class split**, achieved the best test accuracy of **80.95%**.
* **Experiment 17**, which used **2 unfrozen layers** and the **70/15 per-class split**, achieved a strong accuracy of **76.19%**.
* **Learning rate** adjustments (Experiments 20 and 21) did not result in better accuracy, suggesting that the model performs better with the default learning rate of **0.0001**.
* The **dropout rate of 0.5** consistently showed good performance across multiple configurations.

---



# **Experiment Report: Model Performance and Hyperparameter Tuning (Continued)**

## **Overview**

This section details the results of **6 additional experiments**, focusing on variations in dropout rate, label smoothing, horizontal flipping, and weight decay. One experiment also explores the effect of unfreezing 5 layers instead of 1. The experiments use the **MViT_V1_B** model, the **AdamW** optimizer, and **balanced datasets**. The first experiment in this section reuses the number 21; its run ID shows it is a different run from the Experiment 21 of the previous section.

---

## **Experiment Details**

### **Common Parameters Across Experiments**

* **Dataset**: Balanced with 31 classes
* **Training/Validation Split**: 70 training and 15 validation videos per class in every experiment of this section
* **Frames per Clip**: 16
* **Batch Size**: 4
* **Learning Rate**: 0.0001
* **Model**: MViT_V1_B
* **Num_classes**: 31
* **K (video temporal duration)**: Varies across experiments

---

### **Experiment 21: AdamW Optimizer, 1 Unfreezed Layer, Dropout = 0.5, Label Smoothing = 0.05, Epochs = 10**

* **Optimizer**: AdamW

* **Frozen Layers**: 1 layer unfreezed

* **Epochs**: 10

* **Dropout**: 0.5

* **Train/Val**: 70/15 per class

* **Label Smoothing**: 0.05

* **Test Accuracy**: 71.43%

* **Training Run ID**: 6d42a2a2a7f84eb7a18af1ec5b9d59f0

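As a reminder of what the label smoothing value does, the sketch below builds the smoothed target distribution for one sample and evaluates the cross-entropy against it. It uses the common convention in which the true class receives `1 - smoothing` plus its share of `smoothing / num_classes`; whether the training code used exactly this convention is an assumption, and the function names are illustrative. With 31 classes and a smoothing of 0.05, the true class gets a target of about 0.9516 and every other class about 0.0016.

```python
import math


def smoothed_targets(true_class, num_classes=31, smoothing=0.05):
    """Target distribution: smoothing mass spread uniformly over all classes."""
    off_value = smoothing / num_classes
    targets = [off_value] * num_classes
    targets[true_class] += 1.0 - smoothing
    return targets


def cross_entropy(logits, targets):
    """Cross-entropy between soft targets and softmax(logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return -sum(t * (x - log_norm) for x, t in zip(logits, targets))


targets = smoothed_targets(true_class=3, smoothing=0.05)
print(f"true-class target:  {targets[3]:.4f}")
print(f"other-class target: {targets[0]:.4f}")

# A confident, correct prediction is still penalised slightly under smoothing.
logits = [0.0] * 31
logits[3] = 10.0
hard = [0.0] * 31
hard[3] = 1.0
print(f"loss with hard labels:     {cross_entropy(logits, hard):.4f}")
print(f"loss with smoothed labels: {cross_entropy(logits, targets):.4f}")
```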
---

### **Experiment 22: AdamW Optimizer, 1 Unfreezed Layer, Dropout = 0.5, Label Smoothing = 0.1, Epochs = 10**

* **Optimizer**: AdamW

* **Frozen Layers**: 1 layer unfreezed

* **Epochs**: 10

* **Dropout**: 0.5

* **Train/Val**: 70/15 per class

* **Label Smoothing**: 0.1

* **Test Accuracy**: 66.67%

* **Training Run ID**: 86c0b71841a94168adacbdcafe659d53

---

### **Experiment 23: AdamW Optimizer, 1 Unfreezed Layer, Dropout = 0.7, Label Smoothing = 0.1, Horizontal Flipping = 0.7, Weight Decay = 0.1, Epochs = 10**

* **Optimizer**: AdamW

* **Frozen Layers**: 1 layer unfreezed

* **Epochs**: 10

* **Dropout**: 0.7

* **Train/Val**: 70/15 per class

* **Label Smoothing**: 0.1

* **Horizontal Flipping**: 0.7

* **Weight Decay**: 0.1

* **Test Accuracy**: 66.67%

* **Training Run ID**: fe8fd35d791c4d90aa2a98473abc21b2

---

### **Experiment 24: AdamW Optimizer, 1 Unfreezed Layer, Dropout = 0.7, Horizontal Flipping = 0.5, Weight Decay = 0.1, Epochs = 10**

* **Optimizer**: AdamW

* **Frozen Layers**: 1 layer unfreezed

* **Epochs**: 10

* **Dropout**: 0.7

* **Train/Val**: 70/15 per class

* **Horizontal Flipping**: 0.5

* **Weight Decay**: 0.1

* **Test Accuracy**: 66.67%

* **Training Run ID**: d64412ad1d1045e7a62e7beedc871989

---

### **Experiment 25: AdamW Optimizer, 1 Unfreezed Layer, Dropout = 0.8, Horizontal Flipping = 0.5, Weight Decay = 0.05, Epochs = 10**

* **Optimizer**: AdamW

* **Frozen Layers**: 1 layer unfreezed

* **Epochs**: 10

* **Dropout**: 0.8

* **Train/Val**: 70/15 per class

* **Horizontal Flipping**: 0.5

* **Weight Decay**: 0.05

* **Test Accuracy**: 66.67%

* **Training Run ID**: 79e48ec72f8648dd87123b8d067d8dbb

---

### **Experiment 26: AdamW Optimizer, 5 Unfreezed Layers, Dropout = 0.8, Horizontal Flipping = 0.5, Weight Decay = 0.05, Epochs = 10**

* **Optimizer**: AdamW

* **Frozen Layers**: 5 layers unfreezed

* **Epochs**: 10

* **Dropout**: 0.8

* **Train/Val**: 70/15 per class

* **Horizontal Flipping**: 0.5

* **Weight Decay**: 0.05

* **Test Accuracy**: 80.95%

* **Training Run ID**: 106e4008a54a4baa92886afbb6457432

---

## **Summary of Results**

| Experiment | Optimizer | Unfrozen Layers  | Dropout | Label Smoothing | Horizontal Flipping | Weight Decay | Epochs | Test Accuracy |
| ---------- | --------- | ---------------- | ------- | --------------- | ------------------- | ------------ | ------ | ------------- |
| **21**     | AdamW     | 1                | 0.5     | 0.05            | -                   | -            | 10     | 71.43%        |
| **22**     | AdamW     | 1                | 0.5     | 0.1             | -                   | -            | 10     | 66.67%        |
| **23**     | AdamW     | 1                | 0.7     | 0.1             | 0.7                 | 0.1          | 10     | 66.67%        |
| **24**     | AdamW     | 1                | 0.7     | -               | 0.5                 | 0.1          | 10     | 66.67%        |
| **25**     | AdamW     | 1                | 0.8     | -               | 0.5                 | 0.05         | 10     | 66.67%        |
| **26**     | AdamW     | 5                | 0.8     | -               | 0.5                 | 0.05         | 10     | 80.95%        |

---

## **Analysis and Insights**

1. **Effect of Dropout**:

   * **Dropout** rates of 0.7 and 0.8 in Experiments 23, 24, and 25 coincided with lower accuracy (66.67%). However, these runs also changed flipping, weight decay, and label smoothing, and Experiment 26 reached 80.95% with a dropout rate of 0.8, so the drop cannot be attributed to **higher dropout rates** alone.

2. **Impact of Label Smoothing**:

   * **Label smoothing** did not improve accuracy. Experiment 21, with label smoothing set to 0.05, achieved 71.43%, while Experiment 22 (label smoothing = 0.1) resulted in 66.67%. Both are below the 80.95% of Experiment 18, which used the same settings without label smoothing, indicating that **label smoothing at higher values** could lead to performance degradation.

3. **Unfreezing More Layers**:

   * **Experiment 26**, which unfroze **5 layers**, reached **80.95%** accuracy, up from 66.67% for Experiment 25, which differs only in having 1 unfrozen layer. This suggests that unfreezing **more layers** might allow the model to learn more complex features and patterns, potentially improving its performance.

4. **Weight Decay**:

   * **Weight decay** (with values of 0.05 and 0.1) did not appear to have a drastic impact on the model's performance: accuracy stayed at **66.67%** in Experiments 23, 24, and 25. **Experiment 26** used the same **weight decay (0.05)** as Experiment 25, so its improvement is more plausibly due to the **5 unfrozen layers**.

---

## **Conclusion**

* **Experiment 26** achieved the highest test accuracy of this section, **80.95%**. It differs from Experiment 25 (66.67%) only in having **5 unfrozen layers** instead of 1, which points to the number of unfrozen layers rather than **dropout = 0.8** or the **moderate weight decay (0.05)** as the likely cause.
* **Higher dropout values** (0.7, 0.8) coincided with a consistent accuracy of **66.67%** when only 1 layer was unfrozen, but not when 5 layers were unfrozen (Experiment 26).
* **Label smoothing** had a minimal impact on performance and may require fine-tuning to improve its effectiveness.

---

## **Consolidated Summary of Results (Experiments 1-21)**

| Experiment | Optimizer | Weight Decay | Unfrozen Layers  | Dropout | Augmentation                     | Epochs | Train/Val (per class) | Test Accuracy |
| ---------: | --------- | ------------ | ---------------- | ------- | -------------------------------- | ------ | --------------------- | ------------- |
|      **1** | Adam      | None         | 3                | 0.0     | None                             | 15     | 80 / 20               | 71.43%        |
|      **2** | AdamW     | 0.05         | 3                | 0.0     | None                             | 10     | 80 / 20               | 76.19%        |
|      **3** | AdamW     | 0.05         | 2                | 0.0     | None                             | 10     | 80 / 20               | 71.43%        |
|      **4** | AdamW     | 0.05         | 3                | 0.5     | None                             | 10     | 80 / 20               | 76.19%        |
|      **5** | AdamW     | 0.05         | 1                | 0.5     | None                             | 10     | 80 / 20               | **80.95%**    |
|      **6** | AdamW     | 0.05         | 1                | 0.5     | Temporal (Stride, Start)         | 10     | 80 / 20               | 66.67%        |
|      **7** | AdamW     | 0.05         | 1                | 0.5     | Temporal (Stride, Start, Jitter) | 10     | 80 / 20               | 66.67%        |
|      **8** | AdamW     | 0.05         | 1                | 0.5     | Random Horizontal Flip (p=0.5)   | 10     | 80 / 20               | 71.43%        |
|      **9** | AdamW     | 0.05         | 1                | 0.5     | Random Horizontal Flip (p=0.7)   | 10     | 80 / 20               | 71.43%        |
|     **10** | AdamW     | 0.05         | 2                | 0.5     | None                             | 15     | 80 / 20               | 76.19%        |
|     **11** | AdamW     | 0.05         | 2                | 0.6     | None                             | 15     | 80 / 20               | 71.43%        |
|     **12** | AdamW     | 0.05         | 1                | 0.5     | None                             | 15     | 80 / 20               | 71.43%        |
|     **13** | AdamW     | 0.05         | 1                | 0.5     | None                             | 5      | 80 / 20               | 71.43%        |
|     **14** | AdamW     | 0.05         | 2                | 0.5     | None                             | 8      | 80 / 20               | 71.43%        |
|     **15** | AdamW     | 0.05         | 1                | 0.5     | None                             | 10     | 80 / 20               | 66.67%        |
|     **16** | AdamW     | 0.05         | 2                | 0.5     | None                             | 10     | 80 / 20               | 71.43%        |
|     **17** | AdamW     | 0.05         | 2                | 0.5     | None                             | 10     | 70 / 15               | 76.19%        |
|     **18** | AdamW     | 0.05         | 1                | 0.5     | None                             | 10     | 70 / 15               | **80.95%**    |
|     **19** | AdamW     | 0.05         | 1                | 0.5     | None                             | 15     | 70 / 15               | 76.19%        |
|     **20** | AdamW     | 0.10         | 1                | 0.5     | None                             | 15     | 70 / 15               | 71.43%        |
|     **21** | AdamW     | 0.10         | 1                | 0.5     | None                             | 10     | 70 / 15               | 71.43%        |

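One way to judge how much weight the differences in this table can bear is to look for configurations that were run more than once. The sketch below groups the 21 rows by every setting except the experiment number and the accuracy, then reports any group with repeated runs. The rows are transcribed from the table above; the tuple layout, the short augmentation labels, and the function name `repeated_configurations` are illustrative choices. On this data it finds one repeated configuration, Experiments 5 and 15, whose accuracies are 14.28 points apart.

```python
from collections import defaultdict

# (experiment, optimizer, weight_decay, unfrozen_layers, dropout,
#  augmentation, epochs, train/val per class, test accuracy in %)
ROWS = [
    (1, "Adam", None, 3, 0.0, "none", 15, "80/20", 71.43),
    (2, "AdamW", 0.05, 3, 0.0, "none", 10, "80/20", 76.19),
    (3, "AdamW", 0.05, 2, 0.0, "none", 10, "80/20", 71.43),
    (4, "AdamW", 0.05, 3, 0.5, "none", 10, "80/20", 76.19),
    (5, "AdamW", 0.05, 1, 0.5, "none", 10, "80/20", 80.95),
    (6, "AdamW", 0.05, 1, 0.5, "temporal(stride,start)", 10, "80/20", 66.67),
    (7, "AdamW", 0.05, 1, 0.5, "temporal(stride,start,jitter)", 10, "80/20", 66.67),
    (8, "AdamW", 0.05, 1, 0.5, "hflip(p=0.5)", 10, "80/20", 71.43),
    (9, "AdamW", 0.05, 1, 0.5, "hflip(p=0.7)", 10, "80/20", 71.43),
    (10, "AdamW", 0.05, 2, 0.5, "none", 15, "80/20", 76.19),
    (11, "AdamW", 0.05, 2, 0.6, "none", 15, "80/20", 71.43),
    (12, "AdamW", 0.05, 1, 0.5, "none", 15, "80/20", 71.43),
    (13, "AdamW", 0.05, 1, 0.5, "none", 5, "80/20", 71.43),
    (14, "AdamW", 0.05, 2, 0.5, "none", 8, "80/20", 71.43),
    (15, "AdamW", 0.05, 1, 0.5, "none", 10, "80/20", 66.67),
    (16, "AdamW", 0.05, 2, 0.5, "none", 10, "80/20", 71.43),
    (17, "AdamW", 0.05, 2, 0.5, "none", 10, "70/15", 76.19),
    (18, "AdamW", 0.05, 1, 0.5, "none", 10, "70/15", 80.95),
    (19, "AdamW", 0.05, 1, 0.5, "none", 15, "70/15", 76.19),
    (20, "AdamW", 0.10, 1, 0.5, "none", 15, "70/15", 71.43),
    (21, "AdamW", 0.10, 1, 0.5, "none", 10, "70/15", 71.43),
]


def repeated_configurations(rows):
    """Map each configuration that occurs more than once to its runs."""
    groups = defaultdict(list)
    for exp, *config, acc in rows:
        groups[tuple(config)].append((exp, acc))
    return {cfg: runs for cfg, runs in groups.items() if len(runs) > 1}


for cfg, runs in repeated_configurations(ROWS).items():
    accuracies = [acc for _, acc in runs]
    spread = max(accuracies) - min(accuracies)
    print(f"config {cfg}: runs {runs}, spread {spread:.2f} points")
```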

