## Import Required Libraries

This section imports `torch.nn` for defining the image classification models and `summary` from `torchinfo` for inspecting their layer shapes and parameter counts.

In [1]:
import torch.nn as nn
from torchinfo import summary

## SimpleCNN Model Architecture

The `SimpleCNN` class defines a straightforward convolutional neural network for grayscale image classification. Below are the key architectural details:

- **Version:** 1.0
- **Input:** Grayscale images of size 256x256 (1 channel)
- **Layers:**
    - **Convolutional Layer:**  
        - `nn.Conv2d` with 1 input channel, 1 output channel, kernel size 6x6, stride 4, no padding  
        - Followed by `nn.ReLU` activation
        - Followed by `nn.MaxPool2d` with kernel size 3x3, stride 3
    - **Flatten Layer:**  
        - Flattens the output from the convolutional block to a 1D tensor
    - **Fully Connected Layer:**  
        - `nn.Linear` with 441 input features (1 × 21 × 21) and 14 output classes
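
The 441 input features follow from the usual output-size formula for convolution and pooling layers, `floor((n + 2p - k) / s) + 1`. The arithmetic can be sketched as below; `out_size` is a hypothetical helper written for this illustration, not a PyTorch function, and it assumes dilation 1 and floor rounding (the PyTorch defaults).

```python
def out_size(n: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial output size of a conv or pooling layer (dilation 1, floor mode)."""
    return (n + 2 * padding - kernel) // stride + 1


conv_out = out_size(256, kernel=6, stride=4)        # after nn.Conv2d
pool_out = out_size(conv_out, kernel=3, stride=3)   # after nn.MaxPool2d
fc_in = 1 * pool_out * pool_out                     # channels x height x width

print(conv_out, pool_out, fc_in)  # 63 21 441
```

The convolution does not divide the input evenly (250 / 4 = 62.5), so the last two rows and columns of the image never reach the output.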

### Forward Pass

1. The input image passes through the convolutional block (`layer1`).
2. The output is flattened into a vector.
3. The flattened vector is passed through a fully connected layer (`fc1`) to produce class logits.
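
Logits are unnormalized scores: the largest one marks the predicted class, and a softmax turns them into probabilities. Here is a minimal pure-Python sketch that uses made-up values for three classes rather than real model output. `softmax` is a hypothetical helper for this illustration; in PyTorch, `torch.softmax` plays the same role.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, -1.0, 0.5]  # illustrative values only, not model output
probs = softmax(logits)
predicted = max(range(len(logits)), key=lambda i: logits[i])

print(predicted)  # 0
```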

### Usage

This model is suitable for simple grayscale image classification tasks with 14 output classes. The architecture is intentionally minimal for educational or prototyping purposes.

In [2]:
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.version = '1.0'
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=1, kernel_size=6, stride=4, padding=0),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=3),
        )
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(1 * 21 * 21, 14)

    def forward(self, x):
        out = self.layer1(x)
        out = self.flatten(out)
        out = self.fc1(out)
        return out

summary(
    SimpleCNN(), 
    input_size=(1, 1, 256, 256), 
    col_names=[
        "kernel_size", "input_size", "output_size", 
        "num_params", "mult_adds", "trainable"
    ], 
    row_settings=["var_names", "depth"],
)

Layer (type (var_name):depth-idx)        Kernel Shape              Input Shape               Output Shape              Param #                   Mult-Adds                 Trainable
SimpleCNN (SimpleCNN)                    --                        [1, 1, 256, 256]          [1, 14]                   --                        --                        True
├─Sequential (layer1): 1-1               --                        [1, 1, 256, 256]          [1, 1, 21, 21]            --                        --                        True
│    └─Conv2d (0): 2-1                   [6, 6]                    [1, 1, 256, 256]          [1, 1, 63, 63]            37                        146,853                   True
│    └─ReLU (1): 2-2                     --                        [1, 1, 63, 63]            [1, 1, 63, 63]            --                        --                        --
│    └─MaxPool2d (2): 2-3                3                         [1, 1, 63, 63]            [1, 1, 21, 21]          

## Model Version Update and Input Adaptation

The `SimpleCNN` model has been upgraded to **version 1.1**. The primary change is in the input layer, which now expects **3-channel (RGB) images** instead of single-channel (grayscale) images. This update enables the model to process color images, expanding its applicability to a wider variety of image classification tasks.

### Key Details:

- **Input Layer:**  
    - Now configured for 3 input channels (`in_channels=3`), suitable for RGB images.
    - Previous version (1.0) accepted only grayscale images (`in_channels=1`).

- **Architecture:**  
    - Apart from the convolution's input channel count, the convolutional, pooling, flatten, and fully connected layers are unchanged from version 1.0.
    - The model outputs predictions for **14 classes**.

- **Image Preprocessing:**  
    - The transformation pipeline (`trans`) resizes images to 256x256 pixels and converts them to tensors.
    - The current sample image (`img`) is loaded in grayscale mode (`mode='L'`), which is incompatible with the new model expecting RGB input.
    - **Action Required:** To use the updated model, ensure images are loaded in RGB mode:
        ```python
        img = Image.open("path/to/image.jpg").convert('RGB')
        ```

- **Summary:**  
    - This version is better suited for real-world datasets where color information is important.
    - Always verify that your input data matches the model's expected input shape and channel configuration.

> **Note:** If you continue to use grayscale images, you must either convert them to RGB (by duplicating the single channel) or revert the model to accept single-channel input.
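
Duplicating the single channel can also be done at the tensor level. The sketch below uses a random dummy batch in place of a real image, and the names `gray` and `rgb_like` are placeholders chosen for this example.

```python
import torch

# Dummy grayscale batch with shape (N, C, H, W) = (1, 1, 256, 256).
gray = torch.rand(1, 1, 256, 256)

# Repeat the single channel three times along the channel dimension.
rgb_like = gray.repeat(1, 3, 1, 1)

print(rgb_like.shape)  # torch.Size([1, 3, 256, 256])
```

All three channels hold identical values, so this only satisfies the shape requirement; it adds no color information.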

In [3]:
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.version = '1.1'
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=1, kernel_size=6, stride=4, padding=0),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=3),
        )
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(1 * 21 * 21, 14)

    def forward(self, x):
        out = self.layer1(x)
        out = self.flatten(out)
        out = self.fc1(out)
        return out

summary(
    SimpleCNN(), 
    input_size=(1, 3, 256, 256), 
    col_names=[
        "kernel_size", "input_size", "output_size", 
        "num_params", "mult_adds", "trainable"
    ], 
    row_settings=["var_names", "depth"],
)

Layer (type (var_name):depth-idx)        Kernel Shape              Input Shape               Output Shape              Param #                   Mult-Adds                 Trainable
SimpleCNN (SimpleCNN)                    --                        [1, 3, 256, 256]          [1, 14]                   --                        --                        True
├─Sequential (layer1): 1-1               --                        [1, 3, 256, 256]          [1, 1, 21, 21]            --                        --                        True
│    └─Conv2d (0): 2-1                   [6, 6]                    [1, 3, 256, 256]          [1, 1, 63, 63]            109                       432,621                   True
│    └─ReLU (1): 2-2                     --                        [1, 1, 63, 63]            [1, 1, 63, 63]            --                        --                        --
│    └─MaxPool2d (2): 2-3                3                         [1, 1, 63, 63]            [1, 1, 21, 21]          

## SimpleCNN Version 1.2: Grayscale Image Classification

This section documents **SimpleCNN version 1.2**, which keeps the single-channel grayscale input of version 1.0 and widens the convolution to 10 output channels.

- **Input:**  
    - Expects grayscale images (1 channel) of size 256x256 pixels.

- **Architecture:**  
    - **Convolutional Layer:**  
        - `nn.Conv2d` with 1 input channel and 10 output channels, kernel size 6x6, stride 4, no padding.
        - Followed by `nn.ReLU` activation.
        - Followed by `nn.MaxPool2d` with kernel size 3x3, stride 3.
    - **Flatten Layer:**  
        - Flattens the output to a 1D tensor.
    - **Fully Connected Layer:**  
        - `nn.Linear` with 4410 input features (10 × 21 × 21) and 14 output classes.

- **Output:**  
    - Produces logits for 14 classes.

### Notes

- Ensure input images are loaded in grayscale mode and preprocessed to 256x256 pixels.
- This version increases model capacity by using 10 convolutional output channels instead of 1.
- The architecture remains simple for ease of experimentation and educational use.


In [5]:
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.version = '1.2'
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=10, kernel_size=6, stride=4, padding=0),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=3),
        )
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(10 * 21 * 21, 14)

    def forward(self, x):
        out = self.layer1(x)
        out = self.flatten(out)
        out = self.fc1(out)
        return out

summary(
    SimpleCNN(), 
    input_size=(1, 1, 256, 256), 
    col_names=[
        "kernel_size", "input_size", "output_size", 
        "num_params", "mult_adds", "trainable"
    ], 
    row_settings=["var_names", "depth"],
)

Layer (type (var_name):depth-idx)        Kernel Shape              Input Shape               Output Shape              Param #                   Mult-Adds                 Trainable
SimpleCNN (SimpleCNN)                    --                        [1, 1, 256, 256]          [1, 14]                   --                        --                        True
├─Sequential (layer1): 1-1               --                        [1, 1, 256, 256]          [1, 10, 21, 21]           --                        --                        True
│    └─Conv2d (0): 2-1                   [6, 6]                    [1, 1, 256, 256]          [1, 10, 63, 63]           370                       1,468,530                 True
│    └─ReLU (1): 2-2                     --                        [1, 10, 63, 63]           [1, 10, 63, 63]           --                        --                        --
│    └─MaxPool2d (2): 2-3                3                         [1, 10, 63, 63]           [1, 10, 21, 21]         

## SimpleCNN Version 1.3: Architecture and Summary

The following cell implements **SimpleCNN version 1.3**, which is designed for RGB image classification tasks:

- **Input:**  
    - Expects RGB images (3 channels) of size 256x256 pixels.
- **Convolutional Layer:**  
    - 3 input channels, 10 output channels, kernel size 6x6, stride 4, no padding.
    - Activation: ReLU
    - Max pooling: 3x3 kernel, stride 3
- **Flatten Layer:**  
    - Converts the output to a 1D tensor.
- **Fully Connected Layer:**  
    - 4410 input features (10 × 21 × 21), 14 output classes.

This version combines the RGB input introduced in version 1.1 with a convolution that has 10 output channels, which gives it more capacity than the single-filter versions 1.0 and 1.1. The summary below provides details on the model's structure and parameter count.
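
The parameter counts for this configuration can be checked by hand. In this sketch, `conv2d_params` and `linear_params` are hypothetical helpers written for this illustration; they assume that both layers use a bias term, which is the PyTorch default.

```python
def conv2d_params(in_ch: int, out_ch: int, kernel: int) -> int:
    """Weights (out_ch x in_ch x kernel x kernel) plus one bias per output channel."""
    return out_ch * in_ch * kernel * kernel + out_ch


def linear_params(in_features: int, out_features: int) -> int:
    """Weight matrix plus one bias per output feature."""
    return in_features * out_features + out_features


conv = conv2d_params(in_ch=3, out_ch=10, kernel=6)
fc = linear_params(in_features=10 * 21 * 21, out_features=14)

print(conv, fc, conv + fc)  # 1090 61754 62844
```

The convolution's 1,090 parameters match the `Param #` column in the summary output for the `Conv2d` row.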

In [6]:
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.version = '1.3'
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=10, kernel_size=6, stride=4, padding=0),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=3),
        )
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(10 * 21 * 21, 14)

    def forward(self, x):
        out = self.layer1(x)
        out = self.flatten(out)
        out = self.fc1(out)
        return out

summary(
    SimpleCNN(), 
    input_size=(1, 3, 256, 256), 
    col_names=[
        "kernel_size", "input_size", "output_size", 
        "num_params", "mult_adds", "trainable"
    ], 
    row_settings=["var_names", "depth"],
)

Layer (type (var_name):depth-idx)        Kernel Shape              Input Shape               Output Shape              Param #                   Mult-Adds                 Trainable
SimpleCNN (SimpleCNN)                    --                        [1, 3, 256, 256]          [1, 14]                   --                        --                        True
├─Sequential (layer1): 1-1               --                        [1, 3, 256, 256]          [1, 10, 21, 21]           --                        --                        True
│    └─Conv2d (0): 2-1                   [6, 6]                    [1, 3, 256, 256]          [1, 10, 63, 63]           1,090                     4,326,210                 True
│    └─ReLU (1): 2-2                     --                        [1, 10, 63, 63]           [1, 10, 63, 63]           --                        --                        --
│    └─MaxPool2d (2): 2-3                3                         [1, 10, 63, 63]           [1, 10, 21, 21]         

## SimpleCNN Version 1.4: High-Capacity Grayscale Model

This section documents **SimpleCNN version 1.4**, a higher-capacity variant for grayscale image classification.

- **Input:**  
    - Expects grayscale images (1 channel) of size 256x256 pixels.

- **Architecture:**  
    - **Convolutional Layer:**  
        - `nn.Conv2d` with 1 input channel and 100 output channels, kernel size 6x6, stride 4, no padding.
        - Activation: ReLU
        - Max pooling: 3x3 kernel, stride 3
    - **Flatten Layer:**  
        - Flattens the output to a 1D tensor.
    - **Fully Connected Layer:**  
        - `nn.Linear` with 44,100 input features (100 × 21 × 21) and 14 output classes.

- **Output:**  
    - Produces logits for 14 classes.

### Notes

- This version significantly increases model capacity for grayscale images by using 100 convolutional output channels.
- Suitable for large and complex grayscale image datasets where more expressive power is needed.
- Ensure input images are loaded in grayscale mode and preprocessed to 256x256 pixels.
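
The extra channels also raise the compute cost. In the summary tables in this notebook, the `Mult-Adds` figure for the convolution equals its parameter count multiplied by the number of output positions. That is an observation from the tables, not a statement about how torchinfo documents the metric. Under that assumption, the figure for this version can be reproduced as follows:

```python
# Conv2d(1 -> 100, kernel 6x6): weights plus one bias per output channel.
conv_params = 100 * (1 * 6 * 6) + 100

# The convolution output is 63 x 63 per channel for a 256 x 256 input.
positions = 63 * 63

mult_adds = conv_params * positions
print(f"{conv_params:,} {mult_adds:,}")  # 3,700 14,685,300
```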

In [7]:
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.version = '1.4'
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=100, kernel_size=6, stride=4, padding=0),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=3),
        )
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(100 * 21 * 21, 14)

    def forward(self, x):
        out = self.layer1(x)
        out = self.flatten(out)
        out = self.fc1(out)
        return out

summary(
    SimpleCNN(), 
    input_size=(1, 1, 256, 256), 
    col_names=[
        "kernel_size", "input_size", "output_size", 
        "num_params", "mult_adds", "trainable"
    ], 
    row_settings=["var_names", "depth"],
)

Layer (type (var_name):depth-idx)        Kernel Shape              Input Shape               Output Shape              Param #                   Mult-Adds                 Trainable
SimpleCNN (SimpleCNN)                    --                        [1, 1, 256, 256]          [1, 14]                   --                        --                        True
├─Sequential (layer1): 1-1               --                        [1, 1, 256, 256]          [1, 100, 21, 21]          --                        --                        True
│    └─Conv2d (0): 2-1                   [6, 6]                    [1, 1, 256, 256]          [1, 100, 63, 63]          3,700                     14,685,300                True
│    └─ReLU (1): 2-2                     --                        [1, 100, 63, 63]          [1, 100, 63, 63]          --                        --                        --
│    └─MaxPool2d (2): 2-3                3                         [1, 100, 63, 63]          [1, 100, 21, 21]        

## SimpleCNN Version 1.5: High-Capacity RGB Model

The following cell implements **SimpleCNN version 1.5**, which is designed for challenging RGB image classification tasks:

- **Input:**  
    - Expects RGB images (3 channels) of size 256x256 pixels.
- **Convolutional Layer:**  
    - 3 input channels, 100 output channels, kernel size 6x6, stride 4, no padding.
    - Activation: ReLU
    - Max pooling: 3x3 kernel, stride 3
- **Flatten Layer:**  
    - Converts the output to a 1D tensor.
- **Fully Connected Layer:**  
    - 44,100 input features (100 × 21 × 21), 14 output classes.

This version pairs the RGB input with 100 convolutional output channels, giving it the highest capacity of the versions in this notebook and making it a candidate for larger, more complex color image datasets. The summary below provides details on the model's structure and parameter count.
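
To put the versions side by side, this sketch tabulates convolution and fully connected parameter counts from the channel settings used in each code cell. The `versions` mapping and the variable names are chosen for this illustration, and the arithmetic assumes bias terms in both layers (the PyTorch default).

```python
# version -> (input channels, convolution output channels), taken from the code cells
versions = {
    "1.0": (1, 1),
    "1.1": (3, 1),
    "1.2": (1, 10),
    "1.3": (3, 10),
    "1.4": (1, 100),
    "1.5": (3, 100),
}

totals = {}
for name, (in_ch, out_ch) in versions.items():
    conv = out_ch * in_ch * 6 * 6 + out_ch   # 6x6 kernels plus biases
    fc = (out_ch * 21 * 21) * 14 + 14        # flattened features -> 14 classes
    totals[name] = (conv, fc, conv + fc)
    print(f"v{name}: conv={conv:>6,}  fc={fc:>7,}  total={conv + fc:>7,}")
```

In every version the fully connected layer holds most of the parameters, so adding output channels grows the model mainly through `fc1` rather than through the convolution itself.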

In [None]:
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.version = '1.5'
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=100, kernel_size=6, stride=4, padding=0),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=3),
        )
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(100 * 21 * 21, 14)

    def forward(self, x):
        out = self.layer1(x)
        out = self.flatten(out)
        out = self.fc1(out)
        return out

summary(
    SimpleCNN(), 
    input_size=(1, 3, 256, 256), 
    col_names=[
        "kernel_size", "input_size", "output_size", 
        "num_params", "mult_adds", "trainable"
    ], 
    row_settings=["var_names", "depth"],
)

Layer (type (var_name):depth-idx)        Kernel Shape              Input Shape               Output Shape              Param #                   Mult-Adds                 Trainable
SimpleCNN (SimpleCNN)                    --                        [1, 3, 256, 256]          [1, 14]                   --                        --                        True
├─Sequential (layer1): 1-1               --                        [1, 3, 256, 256]          [1, 100, 21, 21]          --                        --                        True
│    └─Conv2d (0): 2-1                   [6, 6]                    [1, 3, 256, 256]          [1, 100, 63, 63]          10,900                    43,262,100                True
│    └─ReLU (1): 2-2                     --                        [1, 100, 63, 63]          [1, 100, 63, 63]          --                        --                        --
│    └─MaxPool2d (2): 2-3                3                         [1, 100, 63, 63]          [1, 100, 21, 21]        