## Imports

The following libraries are imported for this notebook:

- **PyTorch** (`torch`, `torch.nn`, `torch.utils.data`): For building and training neural networks.
- **Torchvision** (`transforms`, `datasets.ImageFolder`): For image transformations and dataset handling.
- **TensorBoard** (`torch.utils.tensorboard.SummaryWriter`): For logging and visualizing training metrics.
- **Scikit-learn** (`accuracy_score`, `f1_score`, `precision_score`, `recall_score`, `confusion_matrix`): For evaluation metrics.
- **Seaborn** and **Matplotlib**: For plotting and visualization.
- **tqdm**: For progress bars during training.

These imports enable data loading, preprocessing, model building, training, evaluation, and visualization throughout the notebook.

In [1]:
import torch
import torch.nn as nn
from torchvision import transforms
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder
from torch.utils.tensorboard import SummaryWriter
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt
from tqdm import tqdm

In [2]:
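# writer = SummaryWriter(log_dir="/Users/arponbiswas/Computer-Vision-Projects/Image_classification_projects/PC_Parts_Image_Classification/Tensorboard_Graph/Models_Graph/SimpleModel/v_1.0")
# Later cells call writer.add_graph, writer.add_image, and writer.add_scalar,
# so writer must be a real SummaryWriter (default log directory: ./runs/).
writer = SummaryWriter()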

## Training Data Transformations

The following transformations are applied to the training images:

1. **Geometric Transformations:**
    - `Resize((256, 256))`: Resizes all images to 256x256 pixels.
    - `RandomHorizontalFlip(p=0.5)`: Randomly flips images horizontally with a 50% probability.
    - `RandomRotation(10)`: Rotates images randomly within ±10 degrees.
    - `RandomAffine(degrees=0, translate=(0.05, 0.05), scale=(0.9, 1.1))`: Applies small translations and scaling for positional and size variance.
    - `RandomPerspective(distortion_scale=0.1, p=0.3, interpolation=3)`: Applies subtle perspective distortions with a 30% probability; `interpolation=3` is the integer code for bicubic resampling.

2. **Color Augmentations:**
    - `ColorJitter(brightness=0.15, contrast=0.15, saturation=0.15, hue=0.05)`: Introduces mild variations in brightness, contrast, saturation, and hue.

3. **Grayscale and Tensor Conversion:**
    - `Grayscale(num_output_channels=1)`: Converts images to single-channel grayscale.
    - `ToTensor()`: Converts images to PyTorch tensors and scales pixel values to the range [0, 1]. No separate `Normalize` step is applied.

These augmentations help improve model robustness by simulating real-world variations in the training data.

In [3]:
train_trans = transforms.Compose([
    # 1. Geometric Transformations (with resizing)
    transforms.Resize((256, 256)),
    transforms.RandomHorizontalFlip(p=0.5),  # Horizontal flips for left-right consistency
    transforms.RandomRotation(10),  # Minor rotation for realistic variation
    transforms.RandomAffine(
        degrees=0,  # No additional rotation
        translate=(0.05, 0.05),  # Small positional variance
        scale=(0.9, 1.1)  # Conservative scaling for proportion preservation
    ),
    transforms.RandomPerspective(distortion_scale=0.1, p=0.3, interpolation=3),  # Subtle 3D perspective

    # 2. Color Augmentations
    transforms.ColorJitter(
        brightness=0.15, contrast=0.15, saturation=0.15, hue=0.05
    ),  # Mild color variation for natural lighting

    # 3. Grayscale conversion and ToTensor (no Normalize step is applied)
    transforms.Grayscale(num_output_channels=1),  # Convert to 1 channel
    transforms.ToTensor()
])

## Dataset and Data Loading

The dataset consists of images of PC parts, organized in subfolders by class. Images are loaded using PyTorch's `ImageFolder`, which automatically assigns labels based on folder names.

For training, the dataset is loaded with the transformations defined above, which augment and preprocess each image as it is accessed. The `DataLoader` batches the data and reshuffles it every epoch. With the default `num_workers=0` used here, loading runs in the main process; setting `num_workers` to a positive value enables parallel loading in worker processes.

- **Dataset root:** `/Users/arponbiswas/Computer-Vision-Projects/Image_classification_projects/PC_Parts_Image_Classification/Data/pc_parts_ready`
- **Transformations:** See previous section for details on augmentation and preprocessing.
- **Batch size:** 16
- **Shuffling:** Enabled for training

This setup ensures that the model receives diverse and well-preprocessed data for robust learning.

In [4]:
train_dataset = ImageFolder(root='/Users/arponbiswas/Computer-Vision-Projects/Image_classification_projects/PC_Parts_Image_Classification/Data/pc_parts_ready', transform=train_trans)
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
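
The label assignment can be illustrated without the real dataset. The sketch below mimics the rule `ImageFolder` follows (class folders sorted by name, then numbered from 0) on a temporary directory. The folder names `cpu`, `gpu`, and `ram` are hypothetical and are not taken from the dataset.

```python
import os
import tempfile

# Hypothetical class folders, for illustration only.
folder_names = ["ram", "cpu", "gpu"]

with tempfile.TemporaryDirectory() as root:
    for name in folder_names:
        os.makedirs(os.path.join(root, name))

    # Same rule ImageFolder applies: sort the subfolder names, then enumerate.
    classes = sorted(
        entry.name for entry in os.scandir(root) if entry.is_dir()
    )
    class_to_idx = {name: idx for idx, name in enumerate(classes)}

print(class_to_idx)  # {'cpu': 0, 'gpu': 1, 'ram': 2}
```

On the real dataset the same mapping is available as `train_dataset.class_to_idx`, which is handy for labelling confusion-matrix axes.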

## Model Definition

This section defines `SimpleCNN`, a small network for single-channel 256x256 inputs. It consists of one convolutional block (`Conv2d` with a 6x6 kernel and stride 4, `ReLU`, and 3x3 max pooling with stride 3), a flatten step, and one fully connected layer that maps the 1 x 21 x 21 feature map to 14 output logits.

In [41]:
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.version = '1.0'
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=1, kernel_size=6, stride=4, padding=0),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=3),
        )
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(1 * 21 * 21, 14)

    def forward(self, x):
        out = self.layer1(x)
        out = self.flatten(out)
        out = self.fc1(out)
        return out
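
The `1 * 21 * 21` input size of `fc1` follows from the usual output-size formula for convolution and pooling layers, floor((size + 2 * padding - kernel) / stride) + 1, applied to the 256x256 input. A small sketch of that arithmetic (the helper `out_size` is introduced here for illustration only):

```python
def out_size(size, kernel, stride, padding=0):
    """Spatial output size of a Conv2d or MaxPool2d layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

after_conv = out_size(256, kernel=6, stride=4)         # Conv2d(kernel_size=6, stride=4)
after_pool = out_size(after_conv, kernel=3, stride=3)  # MaxPool2d(kernel_size=3, stride=3)
flat_features = 1 * after_pool * after_pool            # 1 output channel

print(after_conv, after_pool, flat_features)  # 63 21 441
```

If the `Resize` target or any kernel size or stride changes, the input size of `fc1` has to be recomputed to match.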

## Model Instantiation and TensorBoard Visualization

In this section, the `SimpleCNN` model is instantiated and a sample image from the training dataset is prepared for visualization. The following steps are performed:

- **Model Creation:**  
    The `SimpleCNN` neural network is instantiated and assigned to the variable `model`. This model is designed for grayscale image classification with a simple convolutional architecture.

- **Sample Image Preparation:**  
    A single image is extracted from the training dataset (`train_dataset[8][0]`) and reshaped with `unsqueeze(0)` to add a batch dimension, making it compatible with the model's expected input shape.

- **TensorBoard Graph Logging:**  
    The model's computational graph is logged to TensorBoard using `writer.add_graph(model, img)`. This allows for visual inspection of the model architecture in TensorBoard.

- **Feature Map Visualization:**  
    The function `add_image_to_tensorboard` is defined to:
    - Log the original input image to TensorBoard.
    - Pass the image through each convolutional and pooling layer of the model.
    - Log the resulting feature maps after each major operation, providing insight into how the model transforms the input at each stage.

- **Function Execution:**  
    The visualization function is called with the sample image and model, enabling detailed monitoring of feature extraction and transformation within the network.

This setup aids in debugging, understanding model behavior, and ensuring that data flows correctly through the network.

In [None]:
# Instantiate the SimpleCNN model
model = SimpleCNN()

# Get a sample image from the training dataset and add a batch dimension
img = train_dataset[8][0].unsqueeze(0)

# Add the model graph to TensorBoard for visualization
writer.add_graph(model, img)

# Define a function to log images and feature maps to TensorBoard
def add_image_to_tensorboard(trans_image, model):
    # Log the original input image
    writer.add_image("Actual_Image", trans_image.squeeze(0))
    # Visualization only, so gradient tracking is not needed
    with torch.no_grad():
        # Iterate through model layers
        for name, layers in model.named_children():
            if isinstance(layers, nn.Sequential):
                for layer in layers:
                    # Apply every layer (including ReLU) so the logged maps
                    # match what the real forward pass produces
                    trans_image = layer(trans_image)
                    # Log the feature map after each Conv2d and MaxPool2d layer
                    if isinstance(layer, (nn.Conv2d, nn.MaxPool2d)):
                        writer.add_image(type(layer).__name__, trans_image.squeeze(0)[:3])

# Call the function to log the sample image and feature maps
add_image_to_tensorboard(img, model)

# Display the model architecture
model

## Loss Function and Optimizer

In this section, the loss function and optimizer for training the neural network are defined:

- **Loss Function:**  
    `nn.CrossEntropyLoss()` is used as the loss criterion. It is suited to multi-class classification: it takes the model's raw logits and the integer class labels, applies log-softmax internally, and returns the negative log-likelihood of the true class.

- **Optimizer:**  
    The Adam optimizer (`torch.optim.Adam`) is initialized with the model's parameters and a learning rate of 0.001. Adam is an adaptive learning rate optimization algorithm that combines the advantages of two other extensions of stochastic gradient descent: AdaGrad and RMSProp. It is widely used for training deep learning models due to its efficiency and effectiveness.

These components are essential for guiding the model's learning process during training by updating the model weights to minimize the loss.

In [None]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
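
To make the loss concrete, the sketch below reproduces the per-sample cross-entropy computation (log-softmax of the logits, then the negative value at the target index) in plain Python. The 3-class logit vector is made up for illustration and does not come from this model.

```python
import math

def cross_entropy(logits, target):
    """Negative log-softmax of the target class, for one sample."""
    # Subtract the max logit for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

logits = [2.0, 0.5, -1.0]  # hypothetical raw model outputs

loss_correct = cross_entropy(logits, target=0)  # target is the highest-scoring class
loss_wrong = cross_entropy(logits, target=2)    # target is the lowest-scoring class

print(f"{loss_correct:.4f} {loss_wrong:.4f}")
```

The loss is small when the target class has the largest logit and grows as the target's logit falls behind the others. Because softmax is applied inside the loss, the model's `forward` can return logits without a final softmax layer.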

## Training Loop and Performance Analysis

This section defines the `training_analysis` function, which manages the end-to-end training process for the neural network and provides comprehensive performance monitoring:

- **Device Selection:**  
    Uses a CUDA GPU if one is available and otherwise falls back to the CPU.

- **Epoch-wise Training:**  
    For each epoch:
    - Sets the model to training mode.
    - Iterates over the training data in batches, performing forward and backward passes.
    - Computes the loss using the defined loss function and updates model weights via the optimizer.
    - Tracks predictions and true labels for metric calculation.

- **Progress Monitoring:**  
    Uses `tqdm` to display a per-epoch progress bar showing the current batch loss.

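- **Metric Calculation:**  
    After each epoch, calculates the following metrics from the training predictions collected during that epoch (F1, precision, and recall are support-weighted averages across classes):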
    - **Accuracy**
    - **F1 Score**
    - **Precision**
    - **Recall**
    - **Confusion Matrix**

- **TensorBoard Logging:**  
    Logs loss, accuracy, F1 score, precision, recall, and the confusion matrix to TensorBoard, enabling detailed visualization and analysis of the training process.

- **Confusion Matrix Visualization:**  
    Plots the confusion matrix using Seaborn and Matplotlib, providing insights into class-wise prediction performance.

- **Summary Output:**  
    Prints a concise summary of metrics for each epoch, facilitating quick assessment of model progress.

This function streamlines the training workflow and ensures that both quantitative and qualitative aspects of model performance are thoroughly tracked and visualized.

In [None]:
def training_analysis(model, epochs=10):
    # Select device: use GPU if available, else fallback to CPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Using device: {device}")
    model.to(device)
    
    for epoch in range(epochs):
        model.train()  # Set model to training mode
        epoch_loss = 0.0  # Accumulate loss for the epoch
        all_preds = []    # Store all predictions for metric calculation
        all_labels = []   # Store all true labels for metric calculation
        
        # Progress bar for batches in the current epoch
        progress_bar = tqdm(train_loader, desc=f"Epoch [{epoch+1}/{epochs}]", leave=False)
        
        for images, labels in progress_bar:
            images, labels = images.to(device), labels.to(device)  # Move data to device
            optimizer.zero_grad()  # Reset gradients
            outputs = model(images)  # Forward pass
            loss = loss_fn(outputs, labels)  # Compute loss
            loss.backward()  # Backpropagation
            optimizer.step()  # Update model parameters
            epoch_loss += loss.item()  # Accumulate batch loss
            all_preds.extend(outputs.argmax(dim=1).tolist())  # Store predicted classes
            all_labels.extend(labels.tolist())  # Store true classes

            # Update progress bar with current batch loss
            progress_bar.set_postfix(loss=loss.item())

        # Calculate metrics for the epoch
        accuracy = accuracy_score(all_labels, all_preds)
        f1 = f1_score(all_labels, all_preds, average='weighted', zero_division=0)
        precision = precision_score(all_labels, all_preds, average='weighted', zero_division=0)
        recall = recall_score(all_labels, all_preds, average='weighted', zero_division=0)
        cm = confusion_matrix(all_labels, all_preds)

        # Log scalar metrics to TensorBoard
        writer.add_scalar('Loss/train', epoch_loss / len(train_loader), epoch)
        writer.add_scalar('Accuracy/train', accuracy, epoch)
        writer.add_scalar('F1/train', f1, epoch)
        writer.add_scalar('Precision/train', precision, epoch)
        writer.add_scalar('Recall/train', recall, epoch)

        # Plot and log confusion matrix to TensorBoard
        fig, ax = plt.subplots()
        sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=ax)
        ax.set_xlabel('Predicted')
        ax.set_ylabel('True')
        ax.set_title('Confusion Matrix')
        writer.add_figure('Confusion Matrix', fig, global_step=epoch)
        plt.close(fig)

        # Print summary of metrics for the epoch
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {epoch_loss/len(train_loader):.4f}, Accuracy: {accuracy:.4f}, F1: {f1:.4f}, Precision: {precision:.4f}, Recall: {recall:.4f}")

In [None]:
# training_analysis(model, epochs=1)
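
The F1, precision, and recall values in `training_analysis` use `average='weighted'`, meaning each per-class score is weighted by the number of true samples of that class. A plain-Python sketch of weighted F1 follows; the labels and predictions are hypothetical, made up for illustration, and are not results from this notebook.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to class support."""
    support = Counter(y_true)
    total = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        n_pred = sum(1 for p in y_pred if p == cls)
        precision = tp / n_pred if n_pred else 0.0  # mirrors zero_division=0
        recall = tp / n_true
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        total += f1 * n_true
    return total / len(y_true)

# Hypothetical labels and predictions for three classes.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]

score = weighted_f1(y_true, y_pred)
print(round(score, 4))
```

Classes that appear only in the predictions have zero support and therefore contribute nothing to the weighted average, so frequent classes dominate the reported score.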