![Practicum AI Logo image](https://github.com/PracticumAI/practicumai.github.io/blob/main/images/logo/PracticumAI_logo_250x50.png?raw=true) <img src="https://github.com/PracticumAI/practicumai.github.io/blob/84b04be083ca02e5c7e92850f9afd391fc48ae2a/images/icons/practicumai_computer_vision.png?raw=true" alt="Practicum AI: Computer Vision icon" align="right" width=50>
***

# Computer Vision Concepts

You may recall *Practicum AI*'s heroine Amelia, the AI-savvy nutritionist. At the end of our [*Deep Learning Foundations* course](https://practicumai.org/courses/deep_learning/), Amelia was helping with a computer vision project. Her colleague, an entomologist named Kevin, had a dataset of images of bees and wasps and wanted to classify them.

**For this exercise, we have made a subsample of those data to make training easier.**

![Image of bees and wasps from the dataset cover image](https://github.com/PracticumAI/deep_learning/blob/main/images/bees_wasps_dataset-cover.png?raw=true)


## AI Pathway review for Bees vs Wasps

If you have taken our [*Getting Started with AI* course](https://practicumai.org/courses/getting_started/), you may remember this figure of the **AI Application Development Pathway**. Let's take a quick review of how we applied this pathway in the case of the Bees vs Wasps example.

![AI Application Development Pathway image showing the 7 steps in developing an AI application](https://practicumai.org/getting_started/images/application_dev_pathway.png)

1. **Choose a problem to solve:** In this example, we need to classify images as bees, wasps, other insects, or non-insects.
2. **Gather data:** The data for the example come from [Kaggle](https://www.kaggle.com/datasets/jerzydziewierz/bee-vs-wasp), a great repository of datasets, code, and models. Again, we have subsampled the full dataset to make training times more reasonable for the exercises and, in most notebooks, to avoid the class imbalance problem. We will explore that problem in [notebook 01.4_data_imbalance.ipynb](01.4_data_imbalance.ipynb).
3. **Clean and prepare the data:** In the *Deep Learning Foundations* course, we assumed that this was done for us. One issue that we ran into was that of class imbalance. There are many more images in some classes than others, leading to a poorly performing model. For most exercises, we have created a smaller, balanced dataset.
4. **Choose a model:** In the *Deep Learning Foundations* course, we presented the model with little detail. Now that we know more about Convolutional Neural Networks (CNNs) and some other tools at our disposal, we will explore the model in more detail.
   * As part of the iterative process of training models, one thing we noticed is that most of our models were **overfitting** — performing better on the training data than they did on the testing data. Essentially, the models memorized the training data but did not generalize well to new data that had not been seen. 
      * In this notebook, we will explore **dropout** as one mechanism to mitigate overfitting.
5. **Train the model:** In training the model, we may have had a few issues. With so many hyperparameters to tune, it's easy to lose track of what combinations have been tried and how changes impacted model performance. 
   * In this notebook, we introduce you to [TensorBoard](https://www.tensorflow.org/tensorboard), a popular tool in a class of tools known as **experiment tracking** or **MLOps (Machine learning operations) tools**. These tools help track changes to hyperparameters, the training process, and the data. They allow comparison among runs and can even automate multiple runs for you. Learning to use MLOps tools will help you as you continue to learn more about AI workflows.
6. **Evaluate the model:** We will continue to assess how the model performs on the validation set and adjust the model and hyperparameters to attempt to produce a better model.
7. **Deploy the model:** We won't get to this stage in this exercise, but hopefully, we will end up with a model that could be deployed and achieve relatively good accuracy at solving the problem.


## PyTorch and PyTorch Lightning

Since the introduction of the *Practicum AI* program, the AI landscape has shifted significantly (no real surprise there!). While [TensorFlow](https://www.tensorflow.org/) seemed like a good choice when we started making courses in 2021, and it can certainly be easier to get started with, the reality is that [PyTorch](https://pytorch.org/) has since become much more popular. The *Practicum AI* team has therefore decided to transition our code to PyTorch.

### PyTorch Lightning 

<img src='images/Lightning_logo.svg' alt="PyTorch Lightning logo" width="20%" align="right">

[PyTorch Lightning](https://lightning.ai/) is an open-source framework built on top of PyTorch that makes training deep learning models more straightforward. It abstracts many common tasks like managing training loops, logging, checkpointing, and handling hardware setups, allowing you to focus on the core aspects of your model and experimentation. 

Rather than writing repetitive code, you define key methods—such as `training_step` and `validation_step`—to describe the model's behavior while the Lightning trainer automates optimization details, synchronization, and even distributed training. This separation between scientific code and engineering routines leads to cleaner, more maintainable projects that are easier to scale.

Additionally, PyTorch Lightning integrates smoothly with popular tools such as [TensorBoard](https://www.tensorflow.org/tensorboard), which simplifies tracking experiment metrics and visualizing performance. Overall, Lightning streamlines the training workflow, boosts reproducibility, and helps both beginners and seasoned researchers concentrate on innovation, not boilerplate coding.

This course will make use of Lightning to simplify training.

The Beginner Series of courses has also been updated to have a PyTorch version.

## Outline of this notebook

This notebook covers a fair bit of ground. To orient you, here's an outline of the topics covered. Note that you can also open the notebook outline to see section headers.

1. Run through loading the data and exploring it a bit ([sections 1](#1.-Import-the-libraries-we-will-use) through 5).
1. Set initial hyperparameters, train a CNN model, and evaluate the performance ([sections 6](#6.-Train-the-Model) through 7).
1. Explore TensorBoard as a tool to gain more insight into model performance ([section 8](#8.-View-training-metrics-in-TensorBoard)).
1. Summarize the results obtained so far ([section 9](#9.-Summary-so-far)).
1. Explore what a convolutional kernel is in more detail, visualizing kernels and convolved images ([section 10](#10.-A-look-inside-CNNs)).
1. Add dropout to our model ([section 11](#11.-Dropout)) and discuss the padding and stride hyperparameters ([section 12](#12.-Padding-and-stride-for-convolutional-layers)).
1. Experiment with hyperparameters ([section 13](#13.-Experimenting-with-Hyperparameters)).

## 1. Import the libraries we will use

In [None]:
import torch
import torch.nn.functional as F
import os
import matplotlib.pyplot as plt
import numpy as np
import random


from torch.utils.data import random_split
from torchvision.transforms.functional import to_pil_image

from sklearn.model_selection import train_test_split
import pytorch_lightning as pl

# Many functions are moved to helpers_01.py to keep this file clean.
import helpers_01

# Set seed for reproducibility
pl.seed_everything(42)

## 2. Check PyTorch installation

In [None]:
# Print the PyTorch version and check for a GPU
print(f"PyTorch version: {torch.__version__}")
print(f'  Should be "True" if PyTorch can use a GPU: {torch.cuda.is_available()}')
if torch.cuda.is_available():
    print(f"  Available GPU: {torch.cuda.get_device_name()}")
else:
    print("  No GPU available, will use CPU")

## 3. Getting the data

For details about the dataset and the code used to get the data, you can look at the [helpers_01.py file](helpers_01.py). 

If you need to download the data, it is [hosted for public download from HiPerGator as a `tar.gz` file](https://data.rc.ufl.edu/pub/practicum-ai/Computer_Vision/bee_vs_wasp_reduced.tar.gz). If you need to manually extract the data, you can add a cell and run: `helpers_01.extract_file("bee_vs_wasp_reduced.tar.gz", "data")`


In [None]:
# Check for the data.
# This will look for the data files required for this notebook in some common locations.
#   * If it can't find the data, it will ask if you know where it is.
#   * If you do, answer yes and provide the path to the data
#       (up to and including the `bee_vs_wasp` folder name).
#   * If not, it will ask if you want to download it.
#      * If you answer yes, it will download the data and
#        extract it into your data folder.

data_path = helpers_01.manage_data(
    folder_name="bee_vs_wasp_reduced",
)

## 4. Setting the number of workers

In machine learning, especially when working with large datasets, it is often necessary to load data in parallel to speed up the training process. This is where the concept of "workers" comes into play. Each worker is a process that can load data independently, allowing for faster data preparation and feeding into the model.

One helpful feature of PyTorch's data loaders, which Lightning uses under the hood, is that they can spread data loading across multiple CPUs (or cores) using these worker processes. This matters because it keeps the GPU fed with data rather than sitting idle waiting for it.

Up to a certain point, more workers will help keep the GPU fed with data. Requesting more workers than the number of available CPU cores will hurt performance, however.
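As a minimal sketch of the point above, the helper below clamps a requested worker count to the number of CPU cores Python reports. The name `cap_workers` is hypothetical and is not part of `helpers_01`; it only illustrates the idea of never asking for more workers than there are cores.

```python
import os


def cap_workers(requested, available=None):
    """Clamp a requested worker count to the number of available CPU cores."""
    if available is None:
        # os.cpu_count() can return None on unusual platforms; fall back to 1.
        available = os.cpu_count() or 1
    # Never go below 0 (in PyTorch, 0 means "load data in the main process").
    return max(0, min(requested, available))


print(cap_workers(64, available=8))  # 8
print(cap_workers(4, available=8))   # 4
```

On a shared cluster, the scheduler may allocate fewer cores than `os.cpu_count()` reports, which is why the Slurm variables are checked first when they are set.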

The cell below will set the number of workers. The options here are:
- Using a manually set number (change `None` in the first line of code to an integer)
- Using a number from the Slurm scheduler. Slurm is the software that manages the compute load for many high-performance computer clusters, such as UF's HiPerGator.
- Using the number of CPU cores on the computer. 

In [None]:
# Set the number of workers to use for data loading
num_workers = None  # To manually set the number of workers, change this to an integer

if num_workers is None:
    # If Slurm is being used, set the number of workers to a Slurm-provided value.
    # If Slurm is not being used, set the number of workers to the number of available CPU cores.
    if os.getenv("SLURM_CPUS_PER_TASK") is not None:
        num_workers = int(os.getenv("SLURM_CPUS_PER_TASK"))
    elif os.getenv("SLURM_NTASKS_PER_NODE") is not None:
        num_workers = int(os.getenv("SLURM_NTASKS_PER_NODE"))
    elif os.getenv("SLURM_NTASKS") is not None:
        num_workers = int(os.getenv("SLURM_NTASKS"))
    else:
        num_workers = os.cpu_count() or 1  # cpu_count() can return None

print(f"Using {num_workers} workers for data loading.")

## 5. Examine some images

Many of the steps in this notebook are written as functions, making it easier to run these steps repeatedly as you work on optimizing the various hyperparameters.

The `helpers_01.load_display_data()` function takes: 
* A path to the data: set from above.
* The batch size: set as 32 below, but a good hyperparameter to tune.
* Target shape for images: set as 80x80 color images below, another possible hyperparameter.
* Whether or not to show sample images.
* The train/validation split
* The number of workers

The function returns a Lightning `DataModule` that wraps the training and validation datasets. To help highlight the class imbalance issue, the function has been updated to report the number of images and the percentage of the total in each class.
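To make the per-class report concrete, here is a small sketch of how counts and percentages can be computed. The label list is made up for illustration (the real labels come from the dataset folders), and this is not the code used inside `helpers_01`.

```python
from collections import Counter

# Hypothetical labels, one per image.
labels = ["bee", "bee", "wasp", "other_insect", "bee", "other_noinsect", "wasp", "bee"]

counts = Counter(labels)
total = sum(counts.values())

# Report each class with its image count and share of the dataset.
for class_name, count in sorted(counts.items()):
    print(f"{class_name:>15}: {count:3d} images ({count / total:.1%})")
```

A class whose percentage is far above or below the others is a sign of imbalance.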

In [None]:
# Use the Lightning DataModule from helpers_01
data_module = helpers_01.load_display_data(
    data_path,
    batch_size=32,
    shape=(80, 80, 3),
    show_pictures=True,
    train_split=0.8,  # 80% train, 20% validation
    num_workers=num_workers,  # Number of workers for data loading
)

## 6. Train the Model

Using PyTorch Lightning's standard patterns, we create a LightningModule that defines:

- **Model architecture** in `__init__` and `forward`
- **Training logic** in `training_step` 
- **Validation logic** in `validation_step`
- **Optimizer configuration** in `configure_optimizers`
- **Built-in metrics** using Lightning's torchmetrics integration

The training process is handled by Lightning's `Trainer` which provides:
- Automatic GPU/CPU handling
- Built-in logging to TensorBoard
- Model checkpointing and early stopping
- Progress bars and model summaries

In [None]:
# Train the model using standard PyTorch Lightning approach
model, trainer = helpers_01.train_model(
    data_module=data_module,
    num_classes=4,
    learning_rate=0.001,
    max_epochs=10,
    accelerator="auto",  # Lightning will choose GPU if available
    devices="auto",  # Lightning will choose optimal device count
    input_shape=(3, 80, 80),  # Match the shape used in data loading
)

## 7. Evaluate the model

The `test_model` function will now provide a comprehensive evaluation, including:
- Test accuracy and loss metrics
- Training and validation loss curves over epochs
- Training and validation accuracy curves over epochs  
- Confusion matrix showing prediction accuracy per class
- Per-class precision, recall, and F1 scores

These visualizations help us understand:
- **Loss curves**: Whether the model is overfitting (validation loss increases while training loss decreases)
- **Accuracy curves**: How well the model generalizes to unseen data
- **Confusion matrix**: Which classes the model confuses with each other
- **Per-class metrics**: How well the model performs on each individual class
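
To show how the per-class metrics relate to the confusion matrix, here is a small NumPy sketch. The matrix values are invented for illustration and are not results from this notebook; rows are assumed to be the true classes and columns the predicted classes.

```python
import numpy as np

# Hypothetical confusion matrix: rows are true classes, columns are predictions.
confusion = np.array([
    [40,  8,  2,  0],
    [10, 35,  5,  0],
    [ 3,  4, 30,  3],
    [ 0,  1,  2, 47],
])

true_positives = np.diag(confusion)
precision = true_positives / confusion.sum(axis=0)  # correct / everything predicted as the class
recall = true_positives / confusion.sum(axis=1)     # correct / everything truly in the class
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_positives.sum() / confusion.sum()

for i in range(len(confusion)):
    print(f"class {i}: precision={precision[i]:.2f} recall={recall[i]:.2f} f1={f1[i]:.2f}")
print(f"overall accuracy: {accuracy:.2f}")
```

A class can have high recall but low precision (or the reverse), which overall accuracy alone would hide.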

In [None]:
# Test the model and display comprehensive evaluation plots
# The debug output will show us what TensorBoard metrics are actually available
test_results = helpers_01.test_model(data_module, model, trainer)

## 8. View training metrics in TensorBoard

Now that we've run one training cycle, we can open TensorBoard and have a look at the visualizations it provides to evaluate training performance.

The detailed instructions for different platforms are in the course content. In general, we use the `tensorboard --logdir ./logs` command to start TensorBoard and then connect in a Web browser. Here's a screenshot of what that might look like:

![Screenshot of the TensorBoard web page](images/tensorboard_screenshot.png)

In [None]:
# View TensorBoard logs
# %load_ext tensorboard
# %tensorboard --logdir lightning_logs/

## 9. Summary so far

A key point that we can take away so far is:

* Most of the time, our model struggles to do better than about 70% accuracy on the validation data. Accuracy on the training data is closer to 90%. This suggests that our model is overfit to the training data.

Before we move on to working more to improve the model, let's take a quick look at the inner workings of the convolutional kernels.

## 10. A look inside CNNs
To get an idea of what is happening *inside* this model, let's look at a **feature map**. Below we see a vertical edge detection filter applied to a sunflower picture, resulting in a feature map of that image.

![Two pictures of a sunflower. On the left is the original image, on the right is the convolved image resulting from applying an edge detecting convolutional filter.](images/filtered_sunflower_nb01.jpg)

Imagine you're a detective investigating a scene.  A feature map is like a sketch you create, focusing on specific details that might be clues to solving the case.  In a CNN, the "case" is recognizing patterns in an image, and the feature maps capture these patterns at different levels of complexity. Early layers might create feature maps that detect basic edges, corners, or blobs of color. As the network progresses through more layers, the feature maps become more intricate, combining these simpler features to represent more complex objects or shapes.

Getting a bit more technical, a feature map is a 2D array of activations produced by applying a convolutional filter to an input image or a previous layer's feature map. It essentially captures the presence and strength of specific visual features that the filter is optimized to detect within the input.

The **convolutional filters** (also just called "filters" or "kernels") are small matrices containing learnable weights. The filter "slides" across the input image, performing element-wise multiplication with the underlying image data at each position. The results of the multiplications are summed and then passed through an activation function (like ReLU) to introduce non-linearity and help the network learn complex features. A convolutional layer typically has multiple filters, each generating a separate feature map. These feature maps capture different aspects of the input, providing a richer representation of the image.
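
The sliding multiply-and-sum described above can be written out by hand. This is a minimal NumPy sketch for a single-channel image with no padding, a stride of 1, and no bias term; the function name `convolve2d` and the tiny test image are illustrative assumptions, and real layers such as `nn.Conv2d` do the same thing far more efficiently across many channels and filters.

```python
import numpy as np


def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1), then apply ReLU."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            patch = image[row:row + kh, col:col + kw]
            feature_map[row, col] = np.sum(patch * kernel)  # multiply, then sum
    return np.maximum(feature_map, 0)  # ReLU


# A 5x5 image that is dark on the left and bright on the right.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# A hand-written vertical edge filter; a trained CNN learns its own weights.
vertical_edge = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=float)

print(convolve2d(image, vertical_edge))
```

The feature map is large where the window straddles the dark-to-bright edge and zero where the window covers a uniform region.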

### <img src='images/note_icon.svg' width=40, align='center' alt='Note icon'> Note
> The above sunflower example could potentially be a bit misleading. While a model *might* develop a vertical edge detection filter, the model develops its filters' weights through the same backpropagation process as other deep neural networks. Most of the filters and their resulting feature maps will not be as easily interpretable as the vertical edge detection filter.

In [None]:
# Get the filters from the first convolutional layer of the PyTorch model
model.eval()  # Set model to evaluation mode
filters = (
    model.conv1.weight.data.cpu().numpy()
)  # Shape: (out_channels, in_channels, height, width)

# Get the first batch of images from the validation set
val_loader = data_module.val_dataloader()
images, labels = next(iter(val_loader))

# Take only the first image for visualization
sample_image = images[0:1]  # Keep batch dimension
sample_label = labels[0]

# Ensure the input tensor is on the same device as the model
device = next(model.parameters()).device
sample_image = sample_image.to(device)

# Get feature maps from the first conv layer
with torch.no_grad():
    # Forward pass through first conv + relu + pool
    x = model.relu(model.conv1(sample_image))
    feature_maps_conv1 = x.cpu().numpy()
    feature_maps_pool1 = model.pool(x).cpu().numpy()

# Convert sample image to numpy for display
sample_image_np = sample_image[0].permute(1, 2, 0).cpu().numpy()

# Normalize the filters and feature maps for better visualization
# PyTorch filters have shape (out_channels, in_channels, height, width)
# We'll take the first 3 filters and visualize them
num_filters = min(3, filters.shape[0])
normal_filters = (filters - filters.min()) / (filters.max() - filters.min())

plt.figure(figsize=(12, 10))

# ----- Display First 3 Filters -----
for i in range(num_filters):
    # Get the i-th filter (shape: in_channels, height, width)
    filter_weights = normal_filters[i]  # Shape: (3, 3, 3) for RGB input

    # Create RGB visualization by treating channels as RGB
    if filter_weights.shape[0] == 3:  # RGB input
        # Rearrange from (C, H, W) to (H, W, C) for display
        filter_rgb = np.transpose(filter_weights, (1, 2, 0))
        # Normalize to [0, 1] range
        filter_rgb = (filter_rgb - filter_rgb.min()) / (
            filter_rgb.max() - filter_rgb.min()
        )
    else:
        # For non-RGB, just show the first channel
        filter_rgb = filter_weights[0]

    plt.subplot(3, 3, i + 1)
    if len(filter_rgb.shape) == 3:
        plt.imshow(filter_rgb)
    else:
        plt.imshow(filter_rgb, cmap="gray")
    plt.title(f"Filter {i}")
    plt.axis("off")

# ----- Original Image -----
plt.subplot(3, 3, 5)  # Center position
# Normalize image for display (assuming it's normalized for training)
img_display = sample_image_np.copy()
# If image was normalized during training, denormalize it
img_display = (img_display - img_display.min()) / (
    img_display.max() - img_display.min()
)
plt.imshow(img_display)
plt.title("Original Image")
plt.axis("off")

# ----- Feature Maps from First Conv Layer -----
num_feature_maps = min(3, feature_maps_conv1.shape[1])
for i in range(num_feature_maps):
    plt.subplot(3, 3, i + 7)  # Bottom row
    feature_map = feature_maps_conv1[0, i, :, :]  # Get i-th feature map
    plt.imshow(feature_map, cmap="gray")
    plt.title(f"Feature Map {i}")
    plt.axis("off")

plt.suptitle("Visualizing PyTorch CNN - Filters and Feature Maps")
plt.tight_layout()
plt.show()

# Print some information about the shapes
print(f"Filter weights shape: {filters.shape}")
print(f"Sample image shape: {sample_image.shape}")
print(f"Feature maps after conv1 shape: {feature_maps_conv1.shape}")
print(f"Sample image label: {sample_label}")

Each filter above is rendered as a small color image: its learned weights for the red, green, and blue input channels are rescaled to the range 0 to 1 and shown as the red, green, and blue intensity of each cell. The colors are for illustrative purposes only; it's the weights that matter!

As mentioned above, the feature maps you see may not be as easily interpretable as the edge detection filter to us, but they *are* useful to the model. They help the model learn to recognize patterns in the images. You can rerun the cell above to see the feature maps for different images.

Below, we'll look at a few more feature maps, this time from the first and second convolutional layers of our model (in our simple model, the second is also the last convolutional layer). This will give us an idea of what the model is learning at different levels of abstraction. In the first module, we explained that each successive layer looks at a larger area of the image, so the feature maps from the first layer will be more detailed than those from the second layer, and so on.
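
To put numbers on "a larger area of the image", the sketch below computes the receptive field (the patch of input pixels that influences one output value) layer by layer. The architecture listed here, 3x3 convolutions each followed by 2x2 pooling, is an assumption for illustration and is not read from `helpers_01`.

```python
# (name, kernel_size, stride) for each layer; an assumed architecture.
layers = [("conv1", 3, 1), ("pool1", 2, 2), ("conv2", 3, 1), ("pool2", 2, 2)]

receptive_field = 1  # a single input pixel sees only itself
jump = 1             # distance, in input pixels, between neighboring outputs

sizes = {}
for name, kernel_size, stride in layers:
    # Each layer widens the view by (kernel_size - 1) steps of the current jump.
    receptive_field += (kernel_size - 1) * jump
    jump *= stride
    sizes[name] = receptive_field
    print(f"{name}: each value sees a {receptive_field}x{receptive_field} patch of the input")
```

Under these assumptions, a value after the second pooling layer summarizes a 10x10 patch of the original image, compared with 3x3 after the first convolution.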

### <img src='images/note_icon.svg' width=40, align='center' alt='Note icon'> Note
> If you have changed the model architecture, you may need to adjust the layer numbers below!


In [None]:
# Get feature maps from different layers of the PyTorch model
model.eval()

# Get the first batch of images from the validation set
val_loader = data_module.val_dataloader()
images, labels = next(iter(val_loader))

# Take only the first image for visualization
sample_image = images[0:1]  # Keep batch dimension

# Ensure the input tensor is on the same device as the model
device = next(model.parameters()).device
sample_image = sample_image.to(device)

with torch.no_grad():
    # First conv block: conv -> ReLU -> pool
    x1 = model.pool(model.relu(model.conv1(sample_image)))
    feature_maps_conv1 = x1.cpu().numpy()

    # Second conv block takes the *pooled* output of the first block
    # (assumes the model pools between its convolutional layers)
    x2 = model.pool(model.relu(model.conv2(x1)))
    feature_maps_conv2 = x2.cpu().numpy()

    # The "last" layer would be the final conv layer (conv2 in our simple model)
    # So we'll use conv2 as both middle and last for demonstration
    feature_maps_last = feature_maps_conv2

# Convert sample image to numpy for display
sample_image_np = sample_image[0].permute(1, 2, 0).cpu().numpy()
sample_image_np = (sample_image_np - sample_image_np.min()) / (
    sample_image_np.max() - sample_image_np.min()
)

# Select random feature maps to display
num_filters_conv1 = feature_maps_conv1.shape[1]
num_filters_conv2 = feature_maps_conv2.shape[1]

# Ensure we don't try to select more indices than available
random_indices_conv1 = random.sample(
    range(num_filters_conv1), min(3, num_filters_conv1)
)
random_indices_conv2 = random.sample(
    range(num_filters_conv2), min(3, num_filters_conv2)
)
random_indices_last = random_indices_conv2  # Same as conv2 for our simple model

plt.figure(figsize=(10, 12))

# ----- Original Image -----
plt.subplot(4, 3, 2)  # Position 2 in top row
plt.imshow(sample_image_np)
plt.title("Original Image")
plt.axis("off")

# ----- First Layer Feature Maps (Conv1 + Pool) -----
for i, idx in enumerate(random_indices_conv1):
    plt.subplot(4, 3, i + 4)  # Second row: positions 4, 5, 6
    feature_map = feature_maps_conv1[0, idx, :, :]
    plt.imshow(feature_map, cmap="gray")
    plt.title(f"Conv1 Feature Map {idx}")
    plt.axis("off")

# ----- Second Layer Feature Maps (Conv2 + Pool) -----
for i, idx in enumerate(random_indices_conv2):
    plt.subplot(4, 3, i + 7)  # Third row: positions 7, 8, 9
    feature_map = feature_maps_conv2[0, idx, :, :]
    plt.imshow(feature_map, cmap="gray")
    plt.title(f"Conv2 Feature Map {idx}")
    plt.axis("off")

# ----- "Last" Layer Feature Maps (same as Conv2 for our simple model) -----
for i, idx in enumerate(random_indices_last):
    plt.subplot(4, 3, i + 10)  # Fourth row: positions 10, 11, 12
    feature_map = feature_maps_last[0, idx, :, :]
    plt.imshow(feature_map, cmap="gray")
    plt.title(f"Final Feature Map {idx}")
    plt.axis("off")

plt.suptitle("PyTorch CNN - Feature Maps Across Layers")
plt.tight_layout()
plt.show()

# Print information about the feature map shapes
print(f"Original image shape: {sample_image.shape}")
print(f"Conv1 feature maps shape: {feature_maps_conv1.shape}")
print(f"Conv2 feature maps shape: {feature_maps_conv2.shape}")
print(f"Selected Conv1 feature map indices: {random_indices_conv1}")
print(f"Selected Conv2 feature map indices: {random_indices_conv2}")

Just as with the other hyperparameters in Section 5 above, the number of filters, the size of the filters, and the stride of the filters are all hyperparameters that can be adjusted. You can also add or remove convolutional and pooling layers, or add dropout layers.


## 11. Dropout

Dropout layers are a regularization technique that helps prevent overfitting by randomly setting a fraction of input units to 0 at each update during training. In PyTorch, you can add dropout using:
    
```python
from torch import nn
nn.Dropout(0.5)  # 50% dropout rate (which is actually higher than generally used)
```
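
Dropout behaves differently in training and evaluation modes. The short sketch below, using a made-up tensor of ones, shows that in training mode each value is either zeroed or scaled up by `1 / (1 - p)`, while in evaluation mode the layer passes its input through unchanged.

```python
import torch
from torch import nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
activations = torch.ones(1, 8)

dropout.train()  # training mode: units are zeroed at random
print(dropout(activations))  # each value is either 0.0 or 2.0 (survivors are scaled by 1 / (1 - p))

dropout.eval()  # evaluation mode: dropout does nothing
print(dropout(activations))  # all ones
```

This is why calling `model.eval()` before inspecting or testing a model matters when it contains dropout layers.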

## 12. Padding and stride for convolutional layers

In PyTorch, you can adjust the stride and padding of convolutional layers using the `stride` and `padding` arguments in `nn.Conv2d`. The `stride` parameter controls how far the filter moves at each step, while `padding` adds zeros (or other values) around the input. Here are some examples:

```python
from torch import nn

# 3x3 convolutional layer with stride=1 and padding=1 (keeps height and width the same)
nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1)

# Convolutional layer with stride=2 (roughly halves the height and width)
nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1)

# Max pooling layer (typically no padding needed)
nn.MaxPool2d(kernel_size=2, stride=2)
```

The `train_model` function accepts `dropout_rate` and `conv_padding` parameters, allowing you to experiment with different configurations to reduce overfitting and control the spatial dimensions of your feature maps.
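
To see how kernel size, stride, and padding determine the size of a feature map, the sketch below applies the standard output-size formula, `floor((n + 2 * padding - kernel_size) / stride) + 1`, to an 80-pixel-wide input. The helper name `conv_output_size` is hypothetical, and the formula assumes a dilation of 1.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output height or width of a convolution or pooling layer (dilation 1)."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


size = 80  # the image height and width used in this notebook
size = conv_output_size(size, kernel_size=3, stride=1, padding=1)
print(f"after 3x3 conv, stride 1, padding 1: {size}")  # 80
size = conv_output_size(size, kernel_size=3, stride=2, padding=1)
print(f"after 3x3 conv, stride 2, padding 1: {size}")  # 40
size = conv_output_size(size, kernel_size=2, stride=2)
print(f"after 2x2 max pool, stride 2:        {size}")  # 20
```

With a 3x3 kernel and a stride of 1, a padding of 0 would shrink the 80-pixel input to 78, and a padding of 2 would grow it to 82.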

In [None]:
# Example: Train a model with different dropout and padding settings

# Example 1: High dropout to reduce overfitting
print("Training model with high dropout (0.5) to reduce overfitting...")
model_dropout, trainer_dropout = helpers_01.train_model(
    data_module=data_module,
    num_classes=4,
    learning_rate=0.001,
    max_epochs=10,  # Same number of epochs as the first run, for a fair comparison
    dropout_rate=0.5,  # Add a dropout rate
    conv_padding=1,  # Standard padding
)

print("Model with dropout - check for reduced overfitting")
helpers_01.test_model(data_module, model_dropout, trainer_dropout)

## 13. Experimenting with Hyperparameters

The example above demonstrates how to train a model with different hyperparameter configurations. You can experiment with these parameters:
- **`dropout_rate`**: Try values from 0.0 (no dropout) to 0.8 (aggressive dropout)
- **`conv_padding`**: Usually 0, 1, or 2 depending on desired output size

Compare the training/validation curves and confusion matrices to see which configuration works best for reducing overfitting while maintaining good performance.
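
One way to keep experiments organized is to list the combinations you plan to try before training anything. The sketch below is only a suggestion: the specific values and the `run_name` label are made up for illustration, and `run_name` is not an argument that `train_model` is known to accept.

```python
from itertools import product

dropout_rates = [0.0, 0.2, 0.5]
conv_paddings = [0, 1]

# Build one configuration per (dropout_rate, conv_padding) combination.
configs = [
    {"dropout_rate": d, "conv_padding": p, "run_name": f"dropout{d}_pad{p}"}
    for d, p in product(dropout_rates, conv_paddings)
]

for config in configs:
    print(config)
```

Each configuration's `dropout_rate` and `conv_padding` could then be passed to `helpers_01.train_model`, with the label noted alongside the results to make later comparisons easier.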

In [None]:
# Let's also try some extreme examples to see the effects more clearly

# Example 2: Very conservative model (low dropout, minimal padding)
print("Training conservative model (low dropout, minimal padding)...")
model_conservative, trainer_conservative = helpers_01.train_model(
    data_module=data_module,
    num_classes=4,
    learning_rate=0.001,
    max_epochs=5,
    dropout_rate=0.1,  # Very low dropout
    conv_padding=0,  # No padding (valid convolution)
)

# Example 3: Very aggressive regularization
print("\nTraining highly regularized model...")
model_aggressive, trainer_aggressive = helpers_01.train_model(
    data_module=data_module,
    num_classes=4,
    learning_rate=0.001,
    max_epochs=5,
    dropout_rate=0.8,  # Very high dropout
    conv_padding=2,  # High padding
)

print("\nConservative model results:")
helpers_01.test_model(data_module, model_conservative, trainer_conservative)

print("\nAggressive regularization model results:")
helpers_01.test_model(data_module, model_aggressive, trainer_aggressive)

### Continue experimenting with hyperparameters

Spend some time experimenting and see how good a model you can get. Remember to check TensorBoard to make comparisons easier.

----
## Push changes to GitHub <img src="images/push_to_github.png" alt="Push to GitHub icon" align="right" width=150>

 Remember to **add**, **commit**, and **push** the changes you have made to this notebook to GitHub to keep your repository in sync.

In Jupyter, those are done in the git tab on the left. In Google Colab, use File > Save a copy in GitHub.