# 🧠 Real-time Entity Classifier - CNN Architecture

This notebook defines the convolutional neural network (CNN) architecture that will classify webcam frames into one of the following categories:

- 🐱 **Pet** (my girlfriend's Sphynx cat, Lucy)
- 👤 **Pet Owner** (me, Sebastian)
- 🧍 **Another Person**  
- 🚫 **Nobody Present**

The CNN architecture is designed to extract meaningful features from real-time video input and make an accurate classification based on the detected subject.  
It is lightweight and suitable for live webcam processing while respecting user privacy. 🔒

<hr>

## 📦 PyTorch Imports Overview

To build and train our convolutional neural network (CNN), we import key components from the **PyTorch** framework:

- **`torch`**: The main PyTorch library, which provides core functionalities such as tensor operations, model creation, and automatic differentiation. It is the backbone of PyTorch.

- **`torch.nn`**: This module offers essential tools for building neural networks, such as `Conv2d` for convolutional layers, `Linear` for fully connected layers, and many other layers and utilities that help define and train models.

- **`torch.nn.functional`**: A functional interface to neural network operations. It provides stateless functions such as `F.relu` and `F.softmax` (the function counterparts of the `nn.ReLU` and `nn.Softmax` modules). These functions are typically called directly in the `forward()` method of a model.

- **`torch.optim`**: Contains optimization algorithms like Adam, SGD, and more. These are used to update the model’s weights during training, minimizing the loss function.

- **`torchvision.transforms`**: A collection of image transformation utilities that allow us to preprocess and augment image data, like resizing, normalization, and random flips. This is essential when working with image datasets.

- **`torchvision.datasets`**: Provides easy access to popular datasets. In this project, we use **`datasets.ImageFolder`**, which is designed to load images organized into class-specific folders. It helps us load and manage custom datasets where images are stored in a directory structure.

- **`DataLoader`**: A utility that simplifies data loading, batching, and shuffling, ensuring that our data is efficiently processed during training and evaluation.

- **`tqdm`**: A Python library that adds a progress bar to loops, making it easy to monitor the progress of long-running operations, such as training over multiple epochs.

- **`matplotlib.pyplot`**: A plotting library used to visualize key metrics like training loss and accuracy over time, helping us understand the model’s performance during training.

These imports form the foundation for constructing, training, and evaluating our deep learning model. 🧠

In [96]:
# Importing the necessary libraries for deep learning with PyTorch

# Importing torch for tensor operations and model creation
import torch

# Importing neural network modules to build and train neural networks
import torch.nn as nn

# Importing functional APIs from PyTorch for operations like activation functions
import torch.nn.functional as F

# Importing torch.optim for various optimization algorithms (like Adam, SGD)
import torch.optim as optim

# Importing transforms and datasets for image processing and loading standard datasets
from torchvision import transforms
from torchvision import datasets

# Importing DataLoader to handle batching and shuffling of data
from torch.utils.data import DataLoader

# Importing tqdm to visualize progress in loops (like during training)
from tqdm import tqdm

# Importing matplotlib for plotting graphs and visualizing metrics
import matplotlib.pyplot as plt

<hr>

## 🧠 CNN Model Definition & Architecture Explanation

We define a Convolutional Neural Network (CNN) model called `EntityClassifierCNN`, designed to classify webcam frames into one of four categories:

### 🎯 Classification Targets:
1. **Pet** – my girlfriend’s Sphynx cat, Lucy 🐱  
2. **Owner** – me, Sebastian 👨  
3. **Other Person** – anyone who is not the owner 🧍  
4. **None** – when no person or pet is present 🚫

### 🧱 Model Architecture Overview:

The model follows a standard convolutional architecture:

- **Three Convolutional Layers**:
  - Each convolution uses a **3×3 kernel** with **padding of 1** and the default **stride of 1**, which preserves the spatial dimensions.
  - The number of output feature maps increases across layers: **32 → 64 → 128**.
  - After each convolution, a **ReLU activation** is applied, followed by **2×2 MaxPooling**, which halves the spatial resolution.

- **Fully Connected Layers**:
  - The output of the final convolutional layer is **flattened** and passed to a fully connected layer with **512 neurons**.
  - A **Dropout layer** with a probability of **0.3** is applied to reduce overfitting.
  - The final fully connected layer outputs **logits for 4 classes**, one for each possible classification.

### 🔁 Forward Pass Flow:

1. Input image (assumed to be RGB, 3 channels)  
2. Conv → ReLU → MaxPool  
3. Conv → ReLU → MaxPool  
4. Conv → ReLU → MaxPool  
5. Flatten  
6. Fully connected → ReLU → Dropout  
7. Final fully connected layer → Output scores for each class
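The spatial sizes in this flow can be checked with a little arithmetic. The sketch below assumes a 224×224 input (the size used for preprocessing in this notebook); `conv_output_size` is a helper name introduced here purely for illustration.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 224  # assumed input height/width in pixels
sizes_after_pool = []
for _ in range(3):
    size = conv_output_size(size, kernel=3, stride=1, padding=1)  # 3x3 conv keeps the size
    size = conv_output_size(size, kernel=2, stride=2)             # 2x2 max pool halves it
    sizes_after_pool.append(size)

flattened_features = 128 * size * size  # 128 channels come out of the last conv layer
print(sizes_after_pool, flattened_features)  # [112, 56, 28] 100352
```

The final value is the number of input features the first fully connected layer has to accept. A different input resolution would change it.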

In [51]:
class EntityClassifierCNN(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()  # Call the constructor of nn.Module
        
        # MaxPooling layer with kernel_size=2 and stride=2
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        
        # Convolutional Layer 1: 
        # in_channels=3 (RGB), out_channels=32, kernel_size=3x3, padding=1
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1)
        
        # Convolutional Layer 2: 
        # in_channels=32 (from previous layer), out_channels=64, kernel_size=3x3, padding=1
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1)
        
        # Convolutional Layer 3:
        # in_channels=64, out_channels=128, kernel_size=3x3, padding=1
        self.conv3 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1)
        
        # Fully Connected Layer 1:
        # input features = 128 channels * 28 * 28 (flattened; assumes 224x224 input images), output features = 512
        self.fc1 = nn.Linear(in_features=128 * 28 * 28, out_features=512)
        
        # Fully Connected Layer 2 (Output Layer):
        # input features = 512, output features = num_classes (default 4)
        self.fc2 = nn.Linear(in_features=512, out_features=num_classes)
        
        # Dropout with probability = 0.3
        self.dropout = nn.Dropout(p=0.3)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # Output: 32 x 112 x 112 (for a 224x224 input)
        x = self.pool(F.relu(self.conv2(x)))  # Output: 64 x 56 x 56
        x = self.pool(F.relu(self.conv3(x)))  # Output: 128 x 28 x 28
        x = torch.flatten(x, start_dim=1)     # Flatten to (batch, 128 * 28 * 28), keeping the batch dimension
        x = self.dropout(F.relu(self.fc1(x))) # Apply dropout after activation
        return self.fc2(x)                    # Output logits
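
Since `forward()` returns raw logits, turning them into a readable prediction takes one more step. The sketch below uses plain Python and made-up logit values. The class order is an assumption based on `ImageFolder` sorting the folder names alphabetically.

```python
import math

# Assumed class order: ImageFolder sorts the class folders alphabetically
class_names = ["nobody", "other_person", "owner", "pet"]

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(value - largest) for value in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [value / total for value in exps]

example_logits = [0.2, -1.0, 0.5, 3.1]  # made-up values, not real model output
probabilities = softmax(example_logits)
predicted_index = probabilities.index(max(probabilities))
predicted_class = class_names[predicted_index]
print(predicted_class, round(probabilities[predicted_index], 3))
```

With a real model, the same idea would typically be written as `F.softmax(outputs, dim=1)` followed by `argmax(dim=1)`.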

<hr>

## Model, Optimizer, and Loss Function Setup

### 1. **Model Instantiation** 🧑‍💻
We instantiate the `EntityClassifierCNN` model with its default of four output classes.

### 2. **Optimizer Setup** ⚙️
We use **Adam** 🚀, a widely used optimizer for training deep learning models. Adam keeps running averages of each parameter's gradient and squared gradient and uses them to scale the step size per parameter. Plain **SGD** (Stochastic Gradient Descent), by contrast, applies one global learning rate to every parameter. In practice Adam often converges faster with little tuning, although a well-tuned SGD can match or beat it on some tasks. We specify a learning rate of **0.001** as a starting point, which can be adjusted based on experimentation.
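
To make the per-parameter scaling concrete, here is a simplified sketch of the very first Adam update for a single parameter, next to a plain SGD step. It uses the default Adam hyperparameters (`beta1=0.9`, `beta2=0.999`, `eps=1e-8`). The function names and gradient values are illustrative only.

```python
import math

def adam_first_step(gradient, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Size of the first Adam update (step t = 1) for one parameter."""
    m = (1 - beta1) * gradient        # first-moment estimate, starting from 0
    v = (1 - beta2) * gradient ** 2   # second-moment estimate, starting from 0
    m_hat = m / (1 - beta1)           # bias correction for t = 1
    v_hat = v / (1 - beta2)
    return lr * m_hat / (math.sqrt(v_hat) + eps)

def sgd_step(gradient, lr=0.001):
    """Size of a plain SGD update: proportional to the gradient."""
    return lr * gradient

small_gradient, large_gradient = 0.01, 100.0  # made-up gradient magnitudes
adam_steps = [adam_first_step(g) for g in (small_gradient, large_gradient)]
sgd_steps = [sgd_step(g) for g in (small_gradient, large_gradient)]
print(adam_steps, sgd_steps)
```

Both Adam steps come out close to the learning rate itself, while the two SGD steps differ by the same factor as the gradients.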

### 3. **Loss Function Setup** ⚖️
For our multi-class classification task, we use **CrossEntropyLoss** as our loss function. It takes the raw logits from the model, applies `log_softmax`, and then computes the negative log-likelihood (`NLLLoss`) of the true class. This is why the model's `forward()` method returns logits without a softmax layer. The loss is small when the model assigns high probability to the correct class and grows as that probability drops.

In [67]:
# We instantiate our model
model = EntityClassifierCNN()

# We use the Adam optimizer since it adapts the learning rate for each parameter and is usually more efficient in practice
optimizer = optim.Adam(model.parameters(), lr=0.001)

# We will also use the CrossEntropyLoss criterion/loss function due to us having a multi-class classification task
criterion = nn.CrossEntropyLoss()
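
The relationship between `CrossEntropyLoss`, `log_softmax`, and `NLLLoss` can be verified on a few random logits. This is a small self-contained check with made-up inputs, not part of the training pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 4)               # 5 fake samples, 4 classes
labels = torch.tensor([0, 1, 2, 3, 1])   # made-up target classes

cross_entropy = nn.CrossEntropyLoss()(logits, labels)
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(torch.allclose(cross_entropy, manual))  # True
```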

<hr>

## 📦 DataLoader and Dataset Setup

Before training our model, we need to properly load and preprocess the image data. This is handled using PyTorch’s `ImageFolder` and `DataLoader` utilities.

### 🖼️ Image Preprocessing
We define a series of transformations to ensure consistency and improve model performance:
- **Resize**: All images are resized to a fixed dimension of **224x224** pixels. This is a common size for CNNs and ensures a uniform input shape.
- **ToTensor**: Images are converted into PyTorch tensors, enabling them to be processed by the model.
- **Normalize**: Each channel is normalized as `(x - mean) / std` using the channel statistics of the ImageNet dataset. This keeps the inputs on a similar scale, which typically helps training converge:
  - Mean: `[0.485, 0.456, 0.406]`
  - Standard Deviation: `[0.229, 0.224, 0.225]`
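
As a quick worked example, the sketch below applies the same two steps (`ToTensor` scaling to `[0, 1]`, then `Normalize`) to a single pixel in plain Python. `preprocess_pixel` is a hypothetical helper used only for this illustration.

```python
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

def preprocess_pixel(rgb_0_to_255):
    """Scale an 8-bit RGB pixel to [0, 1], then normalize each channel."""
    scaled = [channel / 255 for channel in rgb_0_to_255]  # what ToTensor does for 8-bit images
    return [(value - m) / s for value, m, s in zip(scaled, mean, std)]

white = preprocess_pixel([255, 255, 255])
black = preprocess_pixel([0, 0, 0])
print([round(v, 3) for v in white])
print([round(v, 3) for v in black])
```

After normalization, the values are no longer confined to `[0, 1]`. For the red channel, a white pixel maps to roughly 2.25 and a black pixel to roughly -2.12.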

### 🗂️ Dataset Structure
The dataset is organized using folders representing each class. Each subdirectory (e.g., `nobody`, `pet`, `owner`, `other_person`) contains images specific to that class. PyTorch’s `ImageFolder` automatically maps these folders to class labels.
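
The folder-to-label mapping follows alphabetical order of the folder names. The sketch below mimics that behavior with the standard library on a temporary directory. `find_classes` here is a simplified stand-in, and on a real `ImageFolder` dataset the mapping is available as the `class_to_idx` attribute.

```python
import os
import tempfile

def find_classes(root):
    """Map class folder names to integer labels in alphabetical order."""
    class_names = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: index for index, name in enumerate(class_names)}

with tempfile.TemporaryDirectory() as root:
    for folder in ["nobody", "pet", "owner", "other_person"]:
        os.makedirs(os.path.join(root, folder))
    class_to_idx = find_classes(root)

print(class_to_idx)  # {'nobody': 0, 'other_person': 1, 'owner': 2, 'pet': 3}
```

Note that this index order differs from the order in which the classes are listed earlier in the notebook, so predictions should be decoded with this mapping.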

### 🛠️ DataLoader Creation
We create two `DataLoader` objects:
- **Training Loader**: Loads the dataset in shuffled batches of 32 images. Shuffling changes the batch composition every epoch, so the model does not see the samples in a fixed order.
- **Validation Loader**: Loads the validation data without shuffling, maintaining the order for consistent evaluation.

The `num_workers=4` setting uses four worker processes to load and preprocess batches in parallel.


In [71]:
# Define transformations for preprocessing
transform = transforms.Compose([
    transforms.Resize((224, 224)),  # Resize images to 224x224
    transforms.ToTensor(),  # Convert images to PyTorch tensors
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # Normalize
])

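# Load the dataset
# NOTE: both datasets read the same '../data' folder, so val_dataset contains
# exactly the same images as train_dataset (it is not a held-out split)
train_dataset = datasets.ImageFolder(root='../data', transform=transform)
val_dataset = datasets.ImageFolder(root='../data', transform=transform)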

# Create DataLoader for training
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=4)

# Create DataLoader for validation
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False, num_workers=4)

FileNotFoundError: Found no valid file for the classes nobody, other_person, owner, pet. Supported extensions are: .jpg, .jpeg, .png, .ppm, .bmp, .pgm, .tif, .tiff, .webp
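
In the cell above, `train_dataset` and `val_dataset` are both loaded from `'../data'`, so the validation loader would see the same images as the training loader. One possible way to hold out a separate validation set is `torch.utils.data.random_split`. The sketch below uses a small fake dataset as a stand-in for the `ImageFolder` dataset, and the 80/20 ratio and the seed are assumptions.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in for the ImageFolder dataset: 100 fake "images" with labels 0-3
full_dataset = TensorDataset(torch.zeros(100, 3, 8, 8), torch.arange(100) % 4)

val_size = int(0.2 * len(full_dataset))        # 20% held out (assumed ratio)
train_size = len(full_dataset) - val_size
generator = torch.Generator().manual_seed(42)  # fixed seed for a reproducible split
train_subset, val_subset = random_split(
    full_dataset, [train_size, val_size], generator=generator
)

overlap = set(train_subset.indices) & set(val_subset.indices)
print(len(train_subset), len(val_subset), len(overlap))  # 80 20 0
```

The two subsets can then be passed to `DataLoader` in the same way as the datasets above.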

<hr>

## 🧠 Model Training Loop Explained

This section explains the logic behind training our deep learning model using PyTorch.

### 🔁 Epochs

We train the model over **100 epochs** — one epoch is a complete pass over the entire training dataset. Repeating this allows the model to learn gradually and improve its performance over time.

### 📊 Metrics Tracked

During training, we monitor two key metrics every epoch:
- **Loss** (📉): Measures how far off the model’s predictions are from the actual labels.
- **Accuracy** (✅): Tells us how many predictions were correct out of the total.

These are stored in `train_losses` and `train_accuracies` lists to help us visualize progress later.

### 🔄 Training Steps Per Epoch

Each epoch consists of several steps that happen for every batch of images:

1. **Set model to training mode** 🏋️  
   `model.train()` switches layers such as dropout (and batch normalization, where a model uses it) to their training behavior. In this model, only the dropout layer is affected.

2. **Loop through batches of data** 📦  
   We use `tqdm` to wrap our DataLoader, which gives us a nice real-time progress bar.

3. **Zero out gradients** 🧽  
   We reset gradients using `optimizer.zero_grad()` so that past gradient values don’t accumulate.

4. **Forward pass** 📤  
   The input images are passed through the model to generate predictions.

5. **Calculate loss** 📏  
   We use a loss function (CrossEntropyLoss) to compute how wrong the model was.

6. **Backward pass** 🧮  
   We call `.backward()` on the loss to compute gradients of the model parameters.

7. **Update weights** 🔧  
   The optimizer (Adam) updates the model parameters based on the gradients with `optimizer.step()`.

8. **Track loss and accuracy** 🧾  
   We add the loss to a running total and count how many predictions were correct.
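
The effect of training mode in step 1 can be seen on a standalone dropout layer. This is a small illustration with the same `p=0.3` as the model, applied to a tensor of ones rather than real activations.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.3)
inputs = torch.ones(1000)

dropout.train()
train_output = dropout(inputs)  # some values zeroed, survivors scaled by 1 / (1 - 0.3)

dropout.eval()
eval_output = dropout(inputs)   # dropout is a no-op in evaluation mode

kept_value = 1 / (1 - 0.3)
print((train_output == 0).float().mean().item(), torch.equal(eval_output, inputs))
```

The fraction of zeroed values is close to 0.3 in training mode, and the output equals the input in evaluation mode.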

### 🧮 After Each Epoch

Once all batches are processed in an epoch:
- We calculate the **average loss** and **accuracy** for that epoch.
- We print this info to the console.
- We append the values to our tracking lists for later use.
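
One detail worth knowing about the average loss: the loop in this notebook divides the summed batch losses by the number of batches, which gives a mean of batch means. When the last batch is smaller than the others, this differs slightly from the true per-sample mean. The numbers below are made up to exaggerate the effect.

```python
batch_sizes = [32, 32, 1]            # last batch is smaller (65 samples in total)
batch_mean_losses = [0.5, 0.5, 2.0]  # made-up mean loss of each batch

# What running_loss / len(train_loader) computes
mean_of_batch_means = sum(batch_mean_losses) / len(batch_mean_losses)

# Weighting each batch by its size gives the per-sample average
per_sample_mean = sum(
    loss * size for loss, size in zip(batch_mean_losses, batch_sizes)
) / sum(batch_sizes)

print(mean_of_batch_means, round(per_sample_mean, 4))  # 1.0 0.5231
```

For most datasets the difference is small, so the simpler form is a reasonable choice for monitoring.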

### 💾 Saving the Model

At the end of training, we save the learned weights of our model to a file `entity_classifier.pth`. This allows us to reuse the model later without retraining it from scratch! 💡
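
Because only the `state_dict` is saved, the architecture has to be recreated before the weights can be loaded. The sketch below shows the round trip with a tiny stand-in model (`build_model` is hypothetical) and a temporary folder. In the real application, `EntityClassifierCNN()` would take the place of the stand-in.

```python
import os
import tempfile

import torch
import torch.nn as nn

def build_model():
    # Tiny stand-in for EntityClassifierCNN so the example runs quickly
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 4))

trained_model = build_model()
example_batch = torch.randn(2, 3, 8, 8)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "entity_classifier.pth")
    torch.save(trained_model.state_dict(), path)  # same call as in this notebook

    restored_model = build_model()                # recreate the architecture first
    restored_model.load_state_dict(torch.load(path, map_location="cpu"))

restored_model.eval()  # inference mode (disables dropout in the real model)
with torch.no_grad():
    same_outputs = torch.equal(trained_model(example_batch), restored_model(example_batch))
print(same_outputs)  # True
```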

### 📈 Plotting Metrics

We use `matplotlib` to generate two side-by-side plots:
- **Training Loss per Epoch** (in red)
- **Training Accuracy per Epoch** (in green)

These plots visually show how well the model learned over time — ideally, loss should decrease while accuracy increases! 📉📈

In [None]:
num_epochs = 100  # We train the model for 100 epochs

# We keep track of the training loss and accuracy for each epoch
train_losses = []
train_accuracies = []

for epoch in range(num_epochs):
    model.train()  # Set model to training mode
    running_loss = 0.0  # Accumulates total loss in the epoch
    correct_predictions = 0  # Tracks correct predictions
    total_samples = 0  # Tracks number of samples seen

    # tqdm progress bar for the current epoch
    loop = tqdm(train_loader, desc=f"Epoch {epoch+1}/{num_epochs}", leave=False)
    for images, labels in loop:
        optimizer.zero_grad()  # Clear gradients from previous step

        # Forward pass
        outputs = model(images)

        # Compute loss
        loss = criterion(outputs, labels)

        # Backward pass to compute gradients
        loss.backward()

        # Update weights
        optimizer.step()

        # Accumulate the running loss
        running_loss += loss.item()

        # Calculate number of correct predictions
        _, predicted = torch.max(outputs, 1)
        total_samples += labels.size(0)
        correct_predictions += (predicted == labels).sum().item()

        # Update tqdm progress bar with current loss
        loop.set_postfix(loss=loss.item())

    # Calculate average loss and accuracy for the epoch
    avg_loss = running_loss / len(train_loader)
    accuracy = 100 * correct_predictions / total_samples

    # Store metrics for later visualization
    train_losses.append(avg_loss)
    train_accuracies.append(accuracy)

    # Print epoch statistics
    print(f"Epoch [{epoch+1}/{num_epochs}] - Loss: {avg_loss:.4f}, Accuracy: {accuracy:.2f}%")

# ✅ Save the trained model for future use or inference
torch.save(model.state_dict(), 'entity_classifier.pth')
print("Model saved as 'entity_classifier.pth' ✅")

# 📈 Plot training loss and accuracy over epochs using matplotlib
plt.figure(figsize=(12, 5))

# Plot training loss
plt.subplot(1, 2, 1)
plt.plot(train_losses, label="Loss", color="red")
plt.title("Training Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.grid(True)
plt.legend()

# Plot training accuracy
plt.subplot(1, 2, 2)
plt.plot(train_accuracies, label="Accuracy", color="green")
plt.title("Training Accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy (%)")
plt.grid(True)
plt.legend()

# Show plots
plt.tight_layout()
plt.show()
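
The loop above reports training metrics only, and `val_loader` is not used yet. A possible evaluation pass is sketched below. `evaluate` is a hypothetical helper, and the model and data are small random stand-ins so the example runs on its own.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, criterion):
    """Return (average loss per sample, accuracy in percent) without updating weights."""
    model.eval()                      # switch dropout to inference behavior
    total_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():             # no gradients are needed for evaluation
        for images, labels in loader:
            outputs = model(images)
            total_loss += criterion(outputs, labels).item() * labels.size(0)
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    return total_loss / total, 100 * correct / total

# Stand-in model and random data (not the real classifier or dataset)
torch.manual_seed(0)
demo_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 4))
demo_data = TensorDataset(torch.randn(10, 3, 8, 8), torch.randint(0, 4, (10,)))
demo_loader = DataLoader(demo_data, batch_size=4)

val_loss, val_accuracy = evaluate(demo_model, demo_loader, nn.CrossEntropyLoss())
print(f"Validation loss: {val_loss:.4f}, accuracy: {val_accuracy:.2f}%")
```

Calling something like `evaluate(model, val_loader, criterion)` at the end of each epoch would give a validation curve to plot next to the training curve, provided the validation images are not also used for training.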

🎉 **Success!** Our model is now fully trained and we have the metrics to prove it! 🎉

With training complete, the **training loss** and **accuracy** curves show how well the model has fit the training data. Because both metrics are measured on the training set, they do not yet tell us how the model handles images it has never seen.

With the model trained, it's time to move on to **real-time inference**! 🚀

<hr>

### What’s next? 👀
Instead of continuing in a notebook, the next step is to integrate the trained model into a **standalone Python application** (developed in PyCharm) for real-time usage. We'll use the model to make predictions on live data, such as from a **webcam feed**. This allows us to test the model’s performance in real-world scenarios and see how it handles new, unseen data in an interactive application.

Let’s take the model for a spin and see how it performs when it’s really put to work in a **standalone app**. 💻✨