# 2. Model Development and Training

This notebook details the architecture, compilation, and training process for the two models developed in this project:
1.  **Model #1:** A custom Convolutional Neural Network (CNN) built from scratch.
2.  **Model #2:** A more advanced model utilizing transfer learning with a pre-trained MobileNetV2 base.

All training is performed on the V2 (cropped) dataset. Training itself runs through the `src/train.py` script for reproducibility; this notebook documents the model definitions and the training setup.

In [None]:
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2
from pathlib import Path

# --- Configuration ---
PROJECT_ROOT = Path.cwd().parent
DATA_DIR = PROJECT_ROOT / "data"
IMAGE_SIZE = (150, 150)
NUM_CLASSES = 4 # rock, paper, scissors, none

## 2.1. Model #1: Custom CNN from Scratch

This model serves as a baseline to understand the difficulty of the task without pre-trained knowledge.

### Architecture Justification
-   **Convolutional Blocks (3x):** The model stacks three blocks, each a 3x3 `Conv2D` layer followed by a `MaxPooling2D` layer. The filter count doubles from block to block (32 -> 64 -> 128) so that deeper layers can represent increasingly complex features, from simple edges in the first layer to more intricate shapes in deeper layers.
-   **MaxPooling:** After each convolution, 2x2 max pooling halves the height and width of the feature maps. This makes the learned features more robust to small shifts in position, cuts the computation in later layers, and shrinks the flattened vector, which in turn reduces the parameter count of the first `Dense` layer.
-   **Flatten & Dense Layers:** The stack of 2D feature maps is flattened into a 1D vector and passed to a fully-connected `Dense` layer with 512 units.
-   **Dropout:** A `Dropout` layer with a rate of 0.4 sits before the final output layer. It regularizes the network by randomly zeroing 40% of the incoming activations at each training step, which discourages units from co-adapting and pushes the network toward more redundant, robust representations. Dropout is inactive at inference time.
-   **Output Layer:** The final `Dense` layer has `NUM_CLASSES` neurons with a `softmax` activation function to output a probability distribution over the four classes.
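
The effect of the three pooling steps on the parameter count can be checked by hand. The sketch below is plain Python arithmetic, not Keras; the helper names are hypothetical, and it assumes the Keras defaults used in this notebook (unpadded `'valid'` 3x3 convolutions with stride 1, and 2x2 pooling that rounds down).

```python
def conv_out(size, kernel=3):
    """Side length after an unpadded ('valid'), stride-1 convolution."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Side length after non-overlapping max pooling (rounds down)."""
    return size // pool


def conv_params(in_ch, out_ch, kernel=3):
    """Weights plus biases of a Conv2D layer."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(in_units, out_units):
    """Weights plus biases of a Dense layer."""
    return in_units * out_units + out_units


def scratch_model_stats(side=150, channels=3, filters=(32, 64, 128),
                        dense_units=512, num_classes=4):
    """Return (final map side, flattened length, total parameter count)."""
    total = 0
    for f in filters:
        total += conv_params(channels, f)
        side = pool_out(conv_out(side))  # conv, then 2x2 pooling
        channels = f
    flat = side * side * channels
    total += dense_params(flat, dense_units)
    total += dense_params(dense_units, num_classes)
    return side, flat, total


side, flat, total = scratch_model_stats()
print(side, flat, total)  # 17 36992 19035716
```

Under these assumptions the 150x150 input shrinks to a 17x17x128 map, and the `Dense(512)` layer alone holds 18,940,416 of the roughly 19 million parameters. Nearly all of the model's capacity therefore sits in the classifier, not in the convolutional blocks.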

In [None]:
def create_scratch_model(input_shape, num_classes):
    """Defines the custom CNN architecture."""
    model = models.Sequential([
        # Explicit Input layer instead of passing input_shape to the first Conv2D
        layers.Input(shape=input_shape),
        # Three Conv2D + MaxPooling2D blocks with 32 -> 64 -> 128 filters
        layers.Conv2D(32, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        # Classifier head
        layers.Flatten(),
        layers.Dense(512, activation='relu'),
        layers.Dropout(0.4),  # only active during training
        layers.Dense(num_classes, activation='softmax')
    ])
    return model

# Instantiate and print summary
scratch_model = create_scratch_model(input_shape=IMAGE_SIZE + (3,), num_classes=NUM_CLASSES)
scratch_model.summary()

## 2.2. Model #2: Transfer Learning with MobileNetV2

This model builds on a pre-trained network, with the aim of reaching higher accuracy than the baseline from the same limited amount of data.

### Architecture Justification
-   **Base Model (MobileNetV2):** MobileNetV2 was chosen as the base model. It is a lightweight, efficient architecture designed for mobile and embedded devices. The weights used here were pre-trained on the ImageNet dataset, so the base already encodes a rich hierarchy of general visual features.
-   **Freezing the Base:** The convolutional base of MobileNetV2 is frozen (`base_model.trainable = False`). This prevents the pre-trained weights from being updated during initial training, preserving the valuable learned features. We only want to train our new classifier head.
-   **Custom Classifier Head:**
    -   `GlobalAveragePooling2D`: This layer is added after the base model and averages each feature map over its spatial dimensions, yielding one value per channel (a 1280-element vector for MobileNetV2). The classifier that follows therefore needs far fewer parameters than it would after a `Flatten` layer.
    -   `Dropout`: A dropout layer (rate 0.3) is still used to regularize the new classifier head and keep it from overfitting.
    -   `Dense` Output Layer: The final layer is identical to that of the scratch model: a `Dense` layer with `NUM_CLASSES` units and a `softmax` activation for the 4-class problem.
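
To put a number on the `GlobalAveragePooling2D` versus `Flatten` comparison, the head's parameter count can be worked out by hand. This is a rough sketch in plain Python with hypothetical helper names. It assumes that MobileNetV2 halves the spatial size five times with rounding up at each step, and that its last feature map has 1280 channels (the default width).

```python
import math


def mobilenet_v2_map_side(input_side, halvings=5):
    """Side length of the final feature map, assuming five stride-2 stages
    that each round up (an assumption of this sketch)."""
    side = input_side
    for _ in range(halvings):
        side = math.ceil(side / 2)
    return side


def head_params(features, num_classes=4):
    """Weights plus biases of the Dense softmax layer."""
    return features * num_classes + num_classes


channels = 1280                      # MobileNetV2 output channels (default width)
side = mobilenet_v2_map_side(150)    # 150 -> 75 -> 38 -> 19 -> 10 -> 5

flatten_params = head_params(side * side * channels)  # Dense after Flatten
gap_params = head_params(channels)                    # Dense after global average pooling

print(side, flatten_params, gap_params)  # 5 128004 5124
```

With a 5x5x1280 feature map, the pooled head trains 5,124 parameters where a flattened head would train 128,004, about 25 times as many.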

In [None]:
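def create_transfer_model(input_shape, num_classes):
    """Defines the transfer learning architecture."""
    # ImageNet weights are published for square inputs of 96, 128, 160, 192, or 224 px.
    # For other sizes (such as 150x150) Keras warns and loads the 224x224 weights.
    base_model = MobileNetV2(input_shape=input_shape,
                             include_top=False,
                             weights='imagenet')
    base_model.trainable = False  # freeze the convolutional base

    # Note: the ImageNet weights expect pixel values scaled to [-1, 1]
    # (tf.keras.applications.mobilenet_v2.preprocess_input). That scaling is
    # assumed to happen in the data pipeline, which is not shown in this notebook.
    model = models.Sequential([
        base_model,
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation='softmax')
    ])
    return model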

# Instantiate and print summary
transfer_model = create_transfer_model(input_shape=IMAGE_SIZE + (3,), num_classes=NUM_CLASSES)
transfer_model.summary()
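
One way to confirm that the freeze took effect is to count the model's trainable weight tensors. The cell below is a minimal sketch and not part of `src/train.py`. To stay fast and offline it assumes `weights=None` and a 96x96 input, whereas the model above uses `weights='imagenet'` at 150x150; the variable names are illustrative only.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2

# Assumption: random initial weights and a small input, chosen only to
# avoid downloading the ImageNet checkpoint in this sketch.
demo_shape = (96, 96, 3)
base = MobileNetV2(input_shape=demo_shape, include_top=False, weights=None)
base.trainable = False  # freeze the convolutional base

model = models.Sequential([
    layers.Input(shape=demo_shape),
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),
    layers.Dense(4, activation="softmax"),
])

# With the base frozen, only the Dense kernel and bias remain trainable.
frozen_count = len(model.trainable_weights)

# Unfreezing (e.g. for a later fine-tuning phase) exposes the base weights again.
base.trainable = True
unfrozen_count = len(model.trainable_weights)

print(frozen_count, unfrozen_count)
```

While the base is frozen, `frozen_count` should be 2. If the base is later unfrozen for fine-tuning, the model needs to be compiled again for the change to apply to training.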