## How Well Do the Models Do Their Jobs? - Edison

## Visualizations - Angie

### Training vs. Testing Loss (Before Augmentation)
![My Image](project_images/screenshot1.png)
***Figure 1. Training and testing loss over 5 epochs before applying data augmentation.***

Before augmentation, the training loss drops quickly while the test loss remains high and unstable, including a sharp spike at epoch 3. This behavior indicates overfitting, where the model memorizes the training data rather than learning generalizable features.

### Training vs. Testing Loss (After Augmentation)
![My Image](project_images/screenshot2.png)
***Figure 2. Training and testing loss over 15 epochs after applying data augmentation.***

After augmentation, the training loss still steadily decreases, but the test loss becomes more stable and shows fewer severe spikes. Although the test loss remains relatively high, the model is forced to generalize better because the augmented images add variability. This shows reduced overfitting compared to the first model, even though perfect generalization is not reached.
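The report does not list which augmentations were applied, so the following is only an illustrative sketch. Assuming a random horizontal flip was among them, it shows how a single transform produces a new variant of the same screenshot. The helper name `random_hflip` and the toy 2x3 "image" are hypothetical and are not taken from the project code.

```python
import random

def random_hflip(image, p=0.5, rng=random):
    """Return the image mirrored left-to-right with probability p.

    `image` is a list of rows, each row a list of pixel values.
    Hypothetical helper for illustration, not the project's pipeline.
    """
    if rng.random() < p:
        return [row[::-1] for row in image]
    return [row[:] for row in image]  # unchanged copy

# Toy 2x3 grayscale "screenshot" (made-up values).
image = [[1, 2, 3],
         [4, 5, 6]]

flipped = random_hflip(image, p=1.0)    # p=1.0 forces the flip
unchanged = random_hflip(image, p=0.0)  # p=0.0 never flips
print(flipped)    # [[3, 2, 1], [6, 5, 4]]
print(unchanged)  # [[1, 2, 3], [4, 5, 6]]
```

Because the flip is re-sampled every time an image is drawn, the model rarely sees exactly the same pixels twice, which is one plausible reason the test loss is steadier in Figure 2.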

### Predicted Labels for Validation Images
![My Image](project_images/screenshot3.png)
***Figure 3. Model predictions for images in the validation set. Each column corresponds to a different genre, and each image is labeled with the predicted class.***

These predictions demonstrate how the trained model interprets visual features from new gameplay screenshots. Genres with distinctive visuals (such as puzzle games’ bright colors or strategy games’ overhead views) are classified more accurately. Errors typically occur in visually similar categories (e.g., RPG vs. shooter), highlighting the challenges of learning genre-specific visual patterns.

### Confusion Matrix for Genre Classification
![My Image](project_images/screenshot4.png)
***Figure 4. Confusion matrix showing the number of validation images predicted for each genre. Darker cells indicate higher counts; cells on the diagonal represent correct predictions.***

The confusion matrix shows that the model performs very well on puzzle, racing, and strategy images, with strong diagonal values indicating accurate classification. However, it struggles more with RPG and shooter images, which are sometimes confused with each other. This is likely due to similar camera perspectives or action-heavy scenes. Overall, the matrix highlights which genres are easiest to recognize and where the model’s weaknesses lie.
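As a rough illustration of how such a matrix is built and read, the sketch below counts (true, predicted) pairs and derives per-genre recall from the diagonal. The labels are made-up toy data, not the project's validation results, and the orientation (rows are true genres, columns are predicted genres) is an assumption, since the report does not state it.

```python
GENRES = ["puzzle", "racing", "rpg", "shooter", "strategy"]

def confusion_matrix(true_labels, predicted_labels, classes):
    """Count predictions: matrix[i][j] = images of class i predicted as class j."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

def per_class_recall(matrix):
    """Fraction of each true class that lands on the diagonal."""
    recalls = []
    for i, row in enumerate(matrix):
        total = sum(row)
        recalls.append(row[i] / total if total else 0.0)
    return recalls

# Toy labels (hypothetical): one RPG image predicted as shooter and vice versa.
y_true = ["puzzle", "puzzle", "racing", "rpg", "rpg", "shooter", "shooter", "strategy"]
y_pred = ["puzzle", "puzzle", "racing", "shooter", "rpg", "rpg", "shooter", "strategy"]

cm = confusion_matrix(y_true, y_pred, GENRES)
recall = dict(zip(GENRES, per_class_recall(cm)))
for genre in GENRES:
    print(f"{genre:9s} recall = {recall[genre]:.2f}")
```

In this toy case the off-diagonal counts sit only in the RPG and shooter rows, which is the same kind of pattern the paragraph above describes.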

### RNN Prediction Grid (Pure Action, RPG, Strategy)
![My Image](project_images/image.png)
***Figure 5. Grid of RNN predictions on test images using three genres. Each tile shows the predicted label, true label, and the model’s output probabilities.***

This figure shows the RNN’s predicted labels for test images after training on a subset of three genres. While the model correctly identifies several pure action images, it struggles significantly to distinguish RPG from strategy.
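The report does not say how the per-tile output probabilities are computed. A common choice, assumed here, is a softmax over the network's raw scores, with the predicted label taken as the most probable class. The logits below are invented for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

CLASSES = ["pure action", "rpg", "strategy"]

# Made-up scores for one test image (not real model output).
logits = [2.0, 0.5, 0.4]
probs = softmax(logits)
predicted = CLASSES[probs.index(max(probs))]

for name, p in zip(CLASSES, probs):
    print(f"{name:12s} {p:.3f}")
print("predicted:", predicted)
```

When two classes receive nearly equal probabilities, as RPG and strategy do in this toy example, a small change in the input can flip the predicted label.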

### RNN Loss Curve (Pure Action, RPG, Strategy)
![My Image](project_images/image2.png)
***Figure 6. Training and testing loss curves for the RNN over 50 epochs. The training loss continually decreases while test loss begins rising, indicating overfitting.***

This plot shows the RNN’s training and testing loss over 50 epochs for a subset of genres. The training curve steadily decreases, indicating the model is learning patterns from the training data. In contrast, the test loss begins increasing after ~15 epochs, revealing clear overfitting. 
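One simple way to locate the point where overfitting sets in is to find the epoch with the lowest test loss; training beyond it mostly fits the training set. The sketch below does this on made-up loss values (not the values plotted in Figure 6), and `best_epoch` is a hypothetical helper name.

```python
def best_epoch(test_losses):
    """Return the 1-based epoch with the lowest test loss."""
    return min(range(len(test_losses)), key=test_losses.__getitem__) + 1

# Hypothetical per-epoch test losses: they fall, then rise again.
test_loss = [1.10, 0.95, 0.88, 0.84, 0.86, 0.93, 1.05]

stop_at = best_epoch(test_loss)
print(f"lowest test loss at epoch {stop_at}")  # lowest test loss at epoch 4
```

Keeping the weights from that epoch (early stopping) is a standard remedy; whether it was used in this project is not stated in the report.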

### RNN Training and Testing Loss (Genres: shooter, puzzle, rpg, strategy, racing)
![My Image](project_images/image3.png)
***Figure 7. RNN training and testing loss curves for five genres. Test loss becomes unstable and rises early, indicating severe overfitting.***

When trained on the full genre set, the RNN shows even stronger overfitting. Training loss decreases smoothly, but test loss rises sharply as training progresses. This indicates the RNN struggles to model the visual structure of screenshot data.

### RNN Predicted Labels for Test Images (Genres: shooter, rpg, puzzle, racing, strategy)
![My Image](project_images/image4.png)
***Figure 8. Grid of RNN predictions on the five genres. Predictions include confidence probabilities and true labels for each test image.***

The model classifies several shooter images correctly but frequently misclassifies other genres, such as RPG and strategy. With five genres, the RNN struggles even more than it did with three.

### CNN Predicted Labels for Test Images (Genres: pure action, strategy, rpg)
![My Image](project_images/image5.png)
***Figure 9. Grid of CNN predictions for three genres. Each image shows the predicted and true labels along with output probabilities.***

Compared to the RNN, the CNN makes fewer misclassifications. The model correctly identifies most pure action and strategy images.

### CNN Training and Testing Loss (Genres: pure action, strategy, rpg)
![My Image](project_images/image6.png)
***Figure 10. CNN loss curves showing stable test loss and steadily decreasing training loss. The smaller gap between curves suggests good generalization.***

Compared to the RNN, the CNN model’s training loss decreases steadily while its test loss increases only mildly. Some overfitting is present, but its magnitude is smaller.

### CNN Predicted Labels for Test Images (Genres: shooter, puzzle, rpg, strategy, racing)
![My Image](project_images/image7.png)
***Figure 11. CNN predictions on the five-genre dataset.***

The CNN handles the five-genre task far better than the RNN. Puzzle and shooter images are identified correctly more often, while RPG, racing, and strategy images are still confused with one another.

### CNN Training and Testing Loss (Genres: shooter, puzzle, rpg, strategy, racing)
![My Image](project_images/image8.png)
***Figure 12. CNN training and testing loss curves for the five genres.***

This plot displays the CNN’s training and testing loss across all five genres. Training loss decreases smoothly, while test loss fluctuates and rises after several epochs, indicating overfitting. However, the gap between curves is significantly smaller than the RNN's.
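The "gap between curves" can be made concrete as test loss minus training loss at each epoch. The sketch below computes it for two sets of invented loss values standing in for an RNN-like and a CNN-like run; the numbers are hypothetical and chosen only to mirror the qualitative pattern described above.

```python
def generalization_gap(train_losses, test_losses):
    """Per-epoch difference: test loss minus training loss."""
    return [te - tr for tr, te in zip(train_losses, test_losses)]

# Made-up loss curves (not read off Figures 7 or 12).
rnn_train, rnn_test = [1.0, 0.6, 0.3], [1.1, 1.2, 1.5]
cnn_train, cnn_test = [1.0, 0.7, 0.5], [1.05, 0.9, 0.95]

rnn_gap = generalization_gap(rnn_train, rnn_test)
cnn_gap = generalization_gap(cnn_train, cnn_test)

print(f"final RNN gap: {rnn_gap[-1]:.2f}")
print(f"final CNN gap: {cnn_gap[-1]:.2f}")
```

A gap that keeps widening from epoch to epoch signals overfitting; a smaller final gap, as in the CNN-like toy run, corresponds to better generalization.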