<a href="https://colab.research.google.com/github/dajopr/lectures/blob/main/image_processing/lecture_08_image_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Exercise Sheet: Image Classification - From Feature Engineering to Neural Networks

**Prerequisites**: Python, NumPy, OpenCV, Matplotlib. Familiarity with image processing fundamentals and basic machine learning concepts.

**Goal**: To build a practical understanding of image classification by progressing from classical machine learning to neural networks. Along the way, you will implement feature engineering (LBP, SIFT), apply classical classifiers (k-NN, Naive Bayes, SVM) and clustering (k-Means), and finally build, train, and tune a neural network in PyTorch.


In [None]:
import os
from collections import Counter

import cv2
import matplotlib.pyplot as plt
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

SHAPES_DIR = "images/shape_dataset"
VEHICLES_DIR = "images/vehicles"


def load_dataset(dataset="shapes", split="train"):
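    """
    Loads grayscale images and their labels for one of the exercise datasets.

    Args:
        dataset (str): Either "shapes" or "vehicles".
        split (str): Subdirectory to load for "vehicles" (e.g. "train" or
            "test"). Ignored for "shapes".

    Returns:
        tuple: For "shapes", a NumPy array of flattened 28x28 images and a
        NumPy array of integer labels (0 = circle, 1 = square). For
        "vehicles", a list of 224x224 grayscale images and a list of
        category names (the subdirectory names) used as labels.
    """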
    data = []
    labels = []

    if dataset == "shapes":
        # Define class names and their corresponding integer labels
        class_map = {"circles": 0, "squares": 1}

        for class_name, label in class_map.items():
            class_dir = os.path.join(SHAPES_DIR, class_name)
            if not os.path.isdir(class_dir):
                print(f"Warning: Directory not found at {class_dir}")
                continue

            for filename in os.listdir(class_dir):
                if filename.endswith(".png"):
                    img_path = os.path.join(class_dir, filename)
                    # Read the image in grayscale mode
                    img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
                    if img is not None:
                        # Flatten the 28x28 image to a 784-element vector
                        flattened_img = img.flatten()
                        data.append(flattened_img)
                        labels.append(label)

        # Convert lists to NumPy arrays for efficient computation
        return np.array(data), np.array(labels)
    elif dataset == "vehicles":
        split_dir = os.path.join(VEHICLES_DIR, split)
        for category in sorted(os.listdir(split_dir)):
            category_path = os.path.join(split_dir, category)
            if not os.path.isdir(category_path):
                continue
            for filename in os.listdir(category_path):
                img_path = os.path.join(category_path, filename)
                img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
                # Skip unreadable files before resizing (cv2.resize fails on None)
                if img is None:
                    continue
                img = cv2.resize(img, (224, 224))
                data.append(img)
                labels.append(category)  # Store category name as label
        return data, labels
    else:
        raise ValueError(f"Unknown dataset: {dataset!r}")


## Exercise 1: k-Nearest Neighbor Classification

**Tasks**

1. Load the shapes dataset using `load_dataset`
2. Split the dataset into training and test subsets using `train_test_split`
3. Fit a kNN classifier using `KNeighborsClassifier`
4. Evaluate the classifier on the test set
5. Repeat steps 3 and 4 for different values of k
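The kNN tasks above can be sketched as follows. This is a minimal illustration, not a reference solution: it uses scikit-learn's `make_classification` to generate a synthetic stand-in with 784 features (an assumption, so the snippet runs without the image folder). To run it on the real data, replace the `make_classification` call with `X, y = load_dataset("shapes")`. The list of k values is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the flattened 28x28 shape images (assumption: 784 features, 2 classes)
X, y = make_classification(
    n_samples=400, n_features=784, n_informative=20, n_classes=2, random_state=0
)

# Hold out 20% of the samples for testing, keeping the class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit and evaluate one classifier per value of k
results = {}
for k in [1, 3, 5, 7, 9, 15]:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    results[k] = accuracy_score(y_test, knn.predict(X_test))
    print(f"k={k:2d}  accuracy={results[k]:.3f}")

best_k = max(results, key=results.get)
print(f"Best k on this split: {best_k}")
```

Because the "best" k is chosen on a single test split, it can change with a different `random_state`; cross-validation gives a more stable estimate.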

**Discussion questions**
1. What is the role of k in k-NN? What happens when k=1?
2. Impact of k: Based on your experiment, which value of k gave the highest accuracy? Why do you think that value performed best for this dataset?


## Exercise 2: Scene Classification with Local Binary Patterns (LBP)

In the previous exercise, we classified shapes directly from their raw pixel values. Now, we'll shift our focus to another fundamental aspect of an image: **texture**. Different scenes, like a sandy beach or a dense forest, have distinct textural properties. We can exploit these differences for classification.

Our goal is to **classify images into "beach" or "forest" categories based solely on their texture**.

To do this, we will use a powerful texture descriptor called **Local Binary Patterns (LBP)**.

**Tasks**:

1.  **Feature Extraction Pipeline**: For each image in the dataset, you need to perform the following steps:
    * Load the image and convert it to **grayscale**. LBP compares pixel intensities, so it does not need color information.
    * Compute the **LBP representation** of the grayscale image with `skimage.feature.local_binary_pattern`. This creates a new 2D array in which each pixel value encodes the local texture pattern around that pixel.
    * Calculate a **histogram** of the LBP image's pixel values. This histogram serves as our feature vector. The number of bins depends on the LBP method: with the default method, an 8-point LBP produces 2^8 = 256 possible codes, so 256 bins are needed; with `method="uniform"` (used in the code below), `n_points` sampling points produce only `n_points + 2` distinct codes, i.e. 26 bins for `n_points = 24`.
    * Store these histograms (features) and their corresponding labels ("beach" or "forest").

2.  **Train and Evaluate a Classifier**:
    * Split your extracted features and labels into a training set and a testing set.
    * Train a **Naive Bayes classifier** (`sklearn.naive_bayes.GaussianNB`) on the training data.
    * Evaluate the classifier's performance on the test data. Calculate and print metrics like accuracy to see how well it performs.
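The feature-extraction step can be sketched with `skimage.feature.local_binary_pattern`. Each LBP code is formed by thresholding `n_points` neighbours on a circle of radius `radius` against the centre pixel. The sketch assumes the same parameters as the exercise code (`radius = 3`, `n_points = 24`, `method="uniform"`). The helper name `lbp_histogram` and the two synthetic test images (random noise and vertical stripes) are hypothetical, chosen only so the snippet runs on its own.

```python
import numpy as np
from skimage.feature import local_binary_pattern

RADIUS = 3
N_POINTS = 8 * RADIUS  # 24 sampling points, as in the exercise code


def lbp_histogram(gray, n_points=N_POINTS, radius=RADIUS):
    """Return a normalized histogram of 'uniform' LBP codes for a grayscale image."""
    lbp = local_binary_pattern(gray, n_points, radius, method="uniform")
    # 'uniform' codes take integer values 0..n_points+1, so use n_points + 2 bins
    n_bins = n_points + 2
    hist, _ = np.histogram(lbp.ravel(), bins=np.arange(0, n_bins + 1))
    hist = hist.astype(np.float64)
    return hist / (hist.sum() + 1e-6)  # epsilon guards against division by zero


rng = np.random.default_rng(0)
# Two synthetic "textures": fine-grained noise and vertical stripes 4 pixels wide
noise = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
stripes = np.tile(((np.arange(64) // 4) % 2 * 255).astype(np.uint8), (64, 1))

h_noise = lbp_histogram(noise)
h_stripes = lbp_histogram(stripes)
print(len(h_noise))  # 26 bins
print(f"L1 distance between the two histograms: {np.abs(h_noise - h_stripes).sum():.3f}")
```

Using a fixed bin count (rather than deriving it from each image's maximum code) guarantees that every image yields a feature vector of the same length.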

**Discussion Questions**

1.  **Interpreting the Results**: Look at the classification report. How well did the model perform in terms of accuracy, precision, and recall for each class? Is one class easier to identify than the other? Why might that be?

2.  **The Power of LBP**: Why is a histogram of Local Binary Patterns a good feature for describing textures like sand, water, leaves, and tree bark? What kind of image information does LBP capture?

3.  **LBP Parameters**: In the code, `radius` and `n_points` are key parameters for LBP. What do these parameters control? How do you think changing them (e.g., a smaller radius or fewer points) would affect the feature vector and the final classification accuracy?

4.  **Choice of Classifier**: We used a Naive Bayes classifier. What is the core assumption that a "naive" Bayes classifier makes? Why might this be a reasonable (or perhaps flawed) assumption for LBP histogram features?

5.  **From Texture to Scene**: Our model only "sees" texture, not objects or shapes. How does this compare to the shape classification exercise? What are the limitations of a purely texture-based approach for general scene understanding?

In [None]:
# --- 1. Feature Extraction ---


def extract_lbp_features(data_dir):
    """
    Iterates through subdirectories of data_dir, extracts LBP features
    for each image, and returns features and labels.
    """
    features = []
    labels = []

    # LBP parameters
    radius = 3
    n_points = 8 * radius

    # Loop through each category (beach, forest)
    for category in os.listdir(data_dir):
        category_dir = os.path.join(data_dir, category)
        if not os.path.isdir(category_dir):
            continue

        label = 1 if category == "forest" else 0

        # Loop through images in the category
        for filename in os.listdir(category_dir):
            img_path = os.path.join(category_dir, filename)

            # Read and convert to grayscale
            image = cv2.imread(img_path)
            if image is None:
                continue
            gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

            # Compute LBP
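            lbp = None  # Use local_binary_pattern with method="uniform"

            # Compute histogram of LBP
            # The 'uniform' method yields codes 0..n_points+1, so use n_points + 2 bins.
            # A fixed bin count keeps all feature vectors the same length
            # (lbp.max() + 1 would vary from image to image).
            n_bins = n_points + 2
            (hist, _) = None  # use np.histogram() with bins=np.arange(0, n_bins + 1)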

            # Normalize the histogram to make it comparable across images
            hist = hist.astype("float")
            hist /= hist.sum() + 1e-6  # Add a small epsilon to avoid division by zero

            # Append features and label
            features.append(hist)
            labels.append(label)

    return np.array(features), np.array(labels)


# Path to your dataset
DATA_DIR = "dataset"

print("Starting feature extraction...")
features, labels = extract_lbp_features(DATA_DIR)
print(f"Feature extraction complete. Extracted {len(features)} feature vectors.")
print(f"Feature vector shape: {features.shape}")
print(f"Labels shape: {labels.shape}")


# --- 2. Train and Evaluate the Classifier ---

# Split data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.20, random_state=42, stratify=labels
)

# Initialize and train the Gaussian Naive Bayes model
model = None  # Implement this


# --- 3. Evaluation ---
predictions = model.predict(X_test)

# Print a detailed classification report
# target_names=['beach', 'forest'] maps the labels (0, 1) to their names

report = None  # Create classification_report

print(report)


# Visualize a sample LBP image and its histogram
def visualize_sample_lbp():
    sample_image_path = os.path.join(
        DATA_DIR, "beach", os.listdir(os.path.join(DATA_DIR, "beach"))[0]
    )
    image = cv2.imread(sample_image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    radius = 3
    n_points = 8 * radius
    lbp = local_binary_pattern(gray, n_points, radius, method="uniform")
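    n_bins = n_points + 2  # 'uniform' LBP codes take values 0..n_points+1
    hist, _ = np.histogram(lbp.ravel(), bins=np.arange(0, n_bins + 1))
    hist = hist / hist.sum()  # normalize to match the "Normalized Frequency" axis label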

    plt.style.use("ggplot")
    plt.figure(figsize=(12, 5))

    plt.subplot(1, 3, 1)
    plt.title("Original Grayscale")
    plt.imshow(gray, cmap="gray")
    plt.axis("off")

    plt.subplot(1, 3, 2)
    plt.title("LBP Representation")
    plt.imshow(lbp, cmap="gray")
    plt.axis("off")

    plt.subplot(1, 3, 3)
    plt.title("LBP Histogram (Feature Vector)")
    plt.bar(range(len(hist)), hist, width=1.0)
    plt.xlabel("LBP Bins")
    plt.ylabel("Normalized Frequency")

    plt.tight_layout()
    plt.show()


print("\nVisualizing a sample LBP transformation...")
visualize_sample_lbp()

## Exercise 3: Object Classification with SIFT and SVM

Welcome to the third exercise! We are moving from classifying general scenes by texture to identifying specific objects within images. Our goal is to build a robust classifier that can distinguish between **"buses"** and **"motorbikes"**.

To achieve this, we will use the **Scale-Invariant Feature Transform (SIFT)** to find unique keypoints on the objects. Since each image will have a variable number of these keypoints, we can't directly feed them into a classifier. We will solve this using the **Bag of Visual Words (BoVW)** model, a powerful technique for converting local features into a fixed-size vector representation for any image.

**Tasks**

You will implement the full Bag of Visual Words pipeline.

**Step 1: SIFT Feature Extraction**
- For every image in your **training set**, detect SIFT keypoints and compute their 128-dimensional descriptor vectors.
- Since each image will have a different number of keypoints, you will end up with a large collection of descriptors from all training images.

**Step 2: Create a "Visual Vocabulary"**
- Pool all the SIFT descriptors from all training images into one large list.
- Use **k-Means clustering** on this list of descriptors to group them into `k` clusters. A good starting point is `k=50`.
- The `k` cluster centers are your **"visual words."** Together, they form the vocabulary. Each visual word represents a common type of local feature (e.g., a wheel edge, a window corner, a handlebar) found in your training images.

**Step 3: Represent Images as Feature Histograms**
- Now, you need a way to represent each image using a single, fixed-length vector. For every image (both **training and testing**):
    - Extract its SIFT descriptors.
    - For each descriptor, find the closest "visual word" (cluster center) from your vocabulary.
    - Create a **histogram of size k**. Count how many times each of the `k` visual words appears in the image.
- This histogram is your new feature vector for the image. Every image, regardless of its size or number of keypoints, is now represented by a single vector of length `k`.
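Steps 2 and 3 can be sketched as follows. Random 128-dimensional vectors stand in for SIFT descriptors (an assumption, so the example runs without images; in the exercise they come from `sift.detectAndCompute`). The helper `bovw_histogram` is hypothetical. `np.bincount(words, minlength=k)` does the same counting as the `Counter` hint in the exercise code, but returns the full length-k vector directly.

```python
import numpy as np
from sklearn.cluster import KMeans

K = 50
rng = np.random.default_rng(0)

# Synthetic stand-in for SIFT output: each "image" yields a variable number
# of 128-D descriptors (20 to 199 here)
descriptors_per_image = [
    rng.random((rng.integers(20, 200), 128), dtype=np.float32) for _ in range(10)
]

# Step 2: pool all training descriptors and cluster them into K visual words
pooled = np.vstack(descriptors_per_image)
vocabulary = KMeans(n_clusters=K, n_init=1, random_state=0).fit(pooled)


def bovw_histogram(descriptors, vocabulary):
    """Assign each descriptor to its nearest cluster centre and count the words."""
    k = vocabulary.n_clusters
    if descriptors is None or len(descriptors) == 0:
        return np.zeros(k)  # images without keypoints get an all-zero histogram
    words = vocabulary.predict(descriptors)  # index of the closest centre per descriptor
    return np.bincount(words, minlength=k).astype(np.float64)


# Step 3: one fixed-length vector per image, regardless of its keypoint count
X_bovw = np.array([bovw_histogram(d, vocabulary) for d in descriptors_per_image])
print(X_bovw.shape)  # (10, 50)
```

Each row sums to the number of descriptors in that image, so images with many keypoints produce larger counts; normalizing each histogram is one option to consider when comparing images.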

**Step 4: Train and Evaluate the SVM Classifier**
- Train a **Support Vector Machine (SVM)** classifier using the feature histograms of your training images and their corresponding labels ("bus" or "motorbike").
- Use the trained model to predict the labels for the feature histograms of your **test images**.
- Evaluate the classifier's performance and analyze the results.

**Discussion Questions**

1.  **The Visual Vocabulary (`k`)**: We chose `k=50` for our vocabulary size. What does this `k` represent in the context of our images? What do you think would happen if you chose a very small `k` (e.g., 5) or a very large `k` (e.g., 500)? How might it affect performance and computation time?

2.  **SIFT vs. LBP**: Compare the SIFT features used in this exercise with the LBP features from the previous one. Why is SIFT better suited for *object recognition* (buses vs. motorbikes), while LBP was effective for *scene classification* (beach vs. forest)?

3.  **The Role of SVM**: Why is a Support Vector Machine (SVM) a good choice for this classification task? What is the role of the `kernel` parameter (we used `'linear'`)? How might other kernels like `'rbf'` change the decision boundary and potentially the performance?

4.  **BoVW Limitations**: The Bag of Visual Words model is powerful, but it has a major limitation: it discards all spatial information about the features. It only counts *how many* of each visual word appear, not *where* they appear. How could this be a problem? For example, could it confuse an image of a bicycle with an image of two separate wheels and handlebars?

5.  **Improving the Model**: Based on your results and understanding of the BoVW model, what are two different ways you could try to improve the classifier's accuracy?

In [None]:
# The number of clusters for k-Means, which determines the size of the vocabulary.
K_CLUSTERS = 50

# --- Step 1: SIFT Feature Extraction (for training set) ---
print("1. Loading training images and extracting SIFT features...")
train_images, train_labels = load_dataset("vehicles", split="train")
sift = cv2.SIFT_create()

all_descriptors = []
for img in train_images:
    pass  # Implement this

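# Assumes descriptors were added row by row (e.g. with list.extend);
# use np.vstack instead if you appended one array per image.
all_descriptors = np.asarray(all_descriptors)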
print(
    f"   - Extracted {len(all_descriptors)} total descriptors from {len(train_images)} training images."
)

# --- Step 2: Create a "Visual Vocabulary" with k-Means ---

# Implement this (use Scikit-Learn's KMeans implementation)
kmeans = None


# --- Step 3: Represent Images as Feature Histograms ---
def create_feature_histograms(images, vocabulary: KMeans):
    """Creates a histogram of visual words for a list of images."""
    sift_detector = cv2.SIFT_create()
    histograms = []
    for image in images:
        keypoints, descriptors = sift_detector.detectAndCompute(image, None)

        # Create a histogram of size K_CLUSTERS, initialized to zeros
        hist = np.zeros(K_CLUSTERS)

        if descriptors is not None:
            # For each descriptor, find the closest visual word
            words = None  # Implement this
            # Count the occurrences of each word
            word_counts = None  # Implement this (you can use Counter)

            # Populate the histogram
            pass  # Write the word counts into hist

        histograms.append(hist)

    return np.asarray(histograms)


print("\n3. Creating feature histograms for training and testing sets...")
# Create histograms for the training data
X_train = create_feature_histograms(train_images, kmeans)
y_train = np.asarray(train_labels)

# Create histograms for the testing data
test_images, test_labels = load_dataset("vehicles", split="test")
X_test = create_feature_histograms(test_images, kmeans)
y_test = np.asarray(test_labels)
print("   - Histograms created.")

# --- Step 4: Train and Evaluate the SVM Classifier ---

svm_classifier = None  # Implement this
print("   - Training complete.")

print("\n--- Evaluating the model ---")
# Make predictions on the test set
y_pred = svm_classifier.predict(X_test)

# Calculate and print the results
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy:.2f}")
print("\nClassification Report:")
print(report)

## Exercise 4: Color Quantization with k-Means Clustering

In this exercise, we will explore an application of clustering outside of feature-based classification. We will use the **k-Means algorithm** for **color quantization**.

The goal is to reduce the number of distinct colors in an image to a smaller, representative palette of `k` colors. This is a common technique used in image compression and for creating artistic effects. Instead of classifying whole images, we will group the *pixels themselves* by color.

**Tasks**:

1.  **Load Image**:
    * Choose a single, colorful image (e.g., a landscape, a bowl of fruit, a vibrant painting) and set `IMAGE_PATH` in the code below to its location.
    * Load the image using OpenCV. Remember that OpenCV loads images in BGR format. For display purposes, you will want to convert it to RGB.

2.  **Prepare Pixel Data**:
    * The features for this task are the colors of the individual pixels. To prepare the data for k-Means, you need to reshape the image from a 2D grid of pixels (height x width) into a 1D list of pixels.
    * Each pixel is a 3-dimensional data point (its R, G, and B value). Your final data structure should be a NumPy array of shape `(width * height, 3)`.

3.  **Apply k-Means Clustering**:
    * Use the `sklearn.cluster.KMeans` algorithm on your list of pixel colors.
    * Set `k` to the desired number of final colors. Good starting values are `k=8` or `k=16`.
    * The algorithm will group all the pixel colors into `k` clusters and find the centroid (the mean color) for each cluster. These `k` centroids will form your new, optimized color palette.

4.  **Recreate the Image**:
    * Create a new image by replacing each original pixel's color with the color of the centroid it was assigned to by the k-Means algorithm.
    * You will need to use the labels assigned by the fitted k-Means model to map each pixel to its new color from the palette (the cluster centers).
    * Finally, reshape the list of new pixel colors back to the original image's dimensions (height x width x 3).

5.  **Display Results**:
    * Display the original image and the new color-quantized image side-by-side to visually compare the results.
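The steps above can be sketched end to end on a synthetic image. The image (four flat color blocks plus Gaussian noise) is an assumption so the snippet runs without an image file; with a real photo, use the `rgb_image` loaded in the code cell below instead. `n_init=10` is an arbitrary choice that runs several initializations and keeps the best one.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic RGB image: four flat color blocks plus mild noise
base = np.zeros((40, 40, 3), dtype=np.float32)
base[:20, :20] = (220, 30, 30)
base[:20, 20:] = (30, 200, 40)
base[20:, :20] = (40, 60, 210)
base[20:, 20:] = (240, 220, 50)
rgb_image = np.clip(base + rng.normal(0, 8, base.shape), 0, 255).astype(np.uint8)

K_COLORS = 4
height, width, _ = rgb_image.shape
pixel_data = rgb_image.reshape(-1, 3).astype(np.float32)  # shape (height * width, 3)

# Cluster the pixel colors; the centroids become the new palette
kmeans = KMeans(n_clusters=K_COLORS, n_init=10, random_state=0).fit(pixel_data)
palette = np.clip(kmeans.cluster_centers_, 0, 255).astype(np.uint8)  # (K_COLORS, 3)

# Replace every pixel by its centroid color and restore the image shape
quantized_image = palette[kmeans.labels_].reshape(height, width, 3)

n_colors = len(np.unique(quantized_image.reshape(-1, 3), axis=0))
print(quantized_image.shape, n_colors)
```

The indexing `palette[kmeans.labels_]` is what maps each pixel's cluster index to a color in a single vectorized step.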


**Discussion Questions**

1.  **The Effect of `k`**: In the code, `K_COLORS` determines the final number of colors. What happens to the final image as you make `k` very small (e.g., 2 or 3)? What happens when you make it larger (e.g., 64)? Discuss the trade-off between the value of `k`, visual fidelity, and the level of compression.

2.  **Understanding the Centroids**: What do the `kmeans.cluster_centers_` physically represent in this exercise? Why are these centroids a natural choice for the new color palette?

3.  **Real-World Applications**: Beyond creating artistic effects, where is color quantization useful? Think about old video games or image file formats like GIF. How does this technique help with image compression?

4.  **Limitations of k-Means**: Did you notice any strange artifacts or color choices in the quantized image? K-Means only considers color, not the spatial location of pixels. How might this lead to less-than-ideal results on certain images (e.g., an image with a smooth, subtle gradient)?

5.  **k-Means Initialization**: The k-Means algorithm's final result can sometimes depend on the initial placement of its clusters. The `n_init` parameter in `sklearn.cluster.KMeans` runs the algorithm multiple times with different starting points and chooses the best result. Why is this important for achieving a good and consistent color palette?

In [None]:
import cv2
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# --- 1. Load Image ---

# Change this filename to the path of your chosen colorful image.
IMAGE_PATH = "images/bowl_of_fruit.jpg"


bgr_image = cv2.imread(IMAGE_PATH)
if bgr_image is None:
    raise FileNotFoundError(f"Could not read image at {IMAGE_PATH}")

# Convert the image from BGR to RGB for correct display with Matplotlib
rgb_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)


# --- 2. Prepare Pixel Data ---

# Get the dimensions of the image
height, width, _ = rgb_image.shape

# Reshape the image to be a list of pixels (N_pixels, 3)
# We also convert to float32: k-Means operates on floating-point data,
# and float32 needs half the memory of float64.
pixel_data = rgb_image.reshape((-1, 3))
pixel_data = np.float32(pixel_data)


# --- 3. Apply k-Means Clustering ---

# Set the number of clusters (desired number of colors)
K_COLORS = 10

print(f"\nApplying k-Means clustering to find {K_COLORS} dominant colors...")

# Create a k-Means model and fit it to the pixel data
# n_init='auto' (the default since scikit-learn 1.4) runs a single k-means++ initialization;
# pass an integer such as n_init=10 to run several initializations and keep the best one


kmeans = None  # Implement this


# The cluster centers are our new color palette.
# These are floats, so we convert them back to 8-bit unsigned integers.


new_palette = None  # Implement this


# --- 4. Recreate the Image ---

# `kmeans.labels_` contains the cluster index for each pixel.
# We can use this to create the new image.
labels = kmeans.labels_
quantized_pixels = new_palette[labels]

# Reshape the 1D array of pixels back to the original image dimensions
quantized_image = quantized_pixels.reshape((height, width, 3))


# --- 5. Display Results ---

plt.style.use("default")
plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)
plt.imshow(rgb_image)
plt.title("Original Image")
plt.axis("off")

plt.subplot(1, 2, 2)
plt.imshow(quantized_image)
plt.title(f"Color-Quantized Image (k={K_COLORS})")
plt.axis("off")

plt.tight_layout()
plt.show()

## Exercise 5: Simple Object Classification with a Neural Network

In this exercise, we return to the problem from Exercise 1, classifying basic geometric shapes, but this time we solve it with a neural network built in **PyTorch**, a major deep learning framework.

The goal is to build, train, and evaluate a neural network that distinguishes between "Circles" and "Squares." This will give you a chance to see how PyTorch handles model definition, training loops, and data handling, which are core skills for any deep learning practitioner.

**Tasks**:

1.  **Load and Prepare Data**:
    * Use the `create_shape_dataset` function supplied in the code cell below, or implement the data pipeline yourself. The function generates 28x28 grayscale images of circles and squares and already carries out the remaining steps in this list.
    * Flatten each 28x28 image into a 784-element vector and normalize the pixel values to the [0, 1] range.
    * Convert the NumPy data arrays into PyTorch Tensors.
    * Use PyTorch's `TensorDataset` and `DataLoader` to create efficient data loaders for batching and shuffling.

2.  **Define the Network Architecture**:
    * Create a simple neural network by defining a class that inherits from PyTorch's `torch.nn.Module`.
    * In the `__init__` method, define your layers using `torch.nn.Linear`.
        * One hidden layer with a small number of neurons (e.g., 64).
        * An output layer with a single neuron.
    * In the `forward` method, define the flow of data through the layers, applying a **ReLU** activation to the hidden layer and a **Sigmoid** activation to the output layer.

3.  **Define Loss Function and Optimizer**:
    * Choose a loss function suitable for binary classification, such as `nn.BCELoss` (Binary Cross-Entropy Loss), which expects probabilities in [0, 1] like those produced by the Sigmoid output.
    * Select an optimizer, such as `torch.optim.SGD`, and pass it the model's parameters.

4.  **Write the Training Loop**:
    * In PyTorch, you write the training loop explicitly.
    * For a set number of `epochs`, you will loop through the training `DataLoader`, and for each batch you must:
        1.  Perform a forward pass to get predictions.
        2.  Compute the loss.
        3.  Clear previous gradients (`optimizer.zero_grad()`).
        4.  Perform backpropagation (`loss.backward()`).
        5.  Update the model's weights (`optimizer.step()`).

5. **Tune hyperparameters**
    * Try different values for the following hyperparameters:
        - Learning rate
        - Momentum
        - Hidden layer size
        - Dropout
    * Achieve a validation accuracy of more than 80 %
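The five training-loop steps from task 4 can be sketched on toy data as follows. This sketch uses `nn.Sequential` for brevity rather than the `SimpleNet(nn.Module)` class the exercise asks for. The random inputs, the labeling rule, and the hyperparameters (learning rate 0.1, momentum 0.9, dropout 0.2) are illustrative assumptions, not tuned values.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Toy stand-in data: 784-D inputs, label 1 when the mean "pixel" exceeds 0.5
X = torch.rand(256, 784)
y = (X.mean(dim=1, keepdim=True) > 0.5).float()  # shape (256, 1), as BCELoss expects
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

# One hidden ReLU layer, dropout for regularization, sigmoid output for BCELoss
model = nn.Sequential(
    nn.Linear(784, 64),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(64, 1),
    nn.Sigmoid(),
)
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

epoch_losses = []
for epoch in range(5):
    model.train()  # enables dropout
    total = 0.0
    for inputs, labels in loader:
        outputs = model(inputs)            # 1. forward pass
        loss = criterion(outputs, labels)  # 2. compute the loss
        optimizer.zero_grad()              # 3. clear old gradients
        loss.backward()                    # 4. backpropagate
        optimizer.step()                   # 5. update the weights
        total += loss.item()
    epoch_losses.append(total / len(loader))
    print(f"epoch {epoch + 1}: loss {epoch_losses[-1]:.4f}")
```

Calling `optimizer.zero_grad()` before `loss.backward()` matters because PyTorch accumulates gradients across calls by default.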

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Training hyper parameters
HIDDEN_SIZE = 64
LEARNING_RATE = 0.001
DROPOUT = 0.0
MOMENTUM = 0.0
NUM_EPOCHS = 100

# Data Parameters
NUM_SAMPLES_PER_CLASS = 256
IMG_SIZE = 28
BATCH_SIZE = 32


def create_shape_dataset():
    def generate_shape_data(num_images, shape_type, img_size=28):
        """Generates a dataset of simple shapes (circles or squares)."""
        images = []
        for _ in range(num_images):
            img = np.zeros((img_size, img_size), dtype=np.uint8)
            center = (np.random.randint(10, 18), np.random.randint(10, 18))
            size = np.random.randint(5, 15)
            color = 255

            if shape_type == "circle":
                cv2.circle(img, center, size, color, -1)
            elif shape_type == "square":
                pt1 = (center[0] - size, center[1] - size)
                pt2 = (center[0] + size, center[1] + size)
                cv2.rectangle(img, pt1, pt2, color, -1)
            angle = np.random.randint(-10, 10)
            M = cv2.getRotationMatrix2D((center[0], center[1]), angle, 1)
            img = cv2.warpAffine(img, M, (img_size, img_size))
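            # Add the noise in float and clip, so negative noise values do not
            # wrap around to large values when cast to uint8
            noise = np.random.normal(0, 5, img.shape)
            img = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)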
            images.append(img)
        return np.array(images)

    # --- 1. Load and Prepare Data ---
    print("1. Generating and preparing data for PyTorch...")

    # Generate images
    circles = generate_shape_data(NUM_SAMPLES_PER_CLASS, "circle", IMG_SIZE)
    squares = generate_shape_data(NUM_SAMPLES_PER_CLASS, "square", IMG_SIZE)

    # Create labels (0 for circle, 1 for square)
    circle_labels = np.zeros(circles.shape[0])
    square_labels = np.ones(squares.shape[0])

    # Combine, flatten, and normalize
    X = (
        np.concatenate((circles, squares), axis=0)
        .reshape(-1, IMG_SIZE * IMG_SIZE)
        .astype("float32")
        / 255.0
    )
    y = np.concatenate((circle_labels, square_labels), axis=0).astype("float32")

    # Split data
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )

    # Convert to PyTorch Tensors
    X_train_tensor = torch.from_numpy(X_train)
    y_train_tensor = torch.from_numpy(y_train).view(-1, 1)  # Reshape for BCELoss
    X_test_tensor = torch.from_numpy(X_test)
    y_test_tensor = torch.from_numpy(y_test).view(-1, 1)  # Reshape for BCELoss

    # Create TensorDatasets and DataLoaders
    train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
    test_dataset = TensorDataset(X_test_tensor, y_test_tensor)

    train_loader = DataLoader(
        dataset=train_dataset, batch_size=BATCH_SIZE, shuffle=True
    )
    test_loader = DataLoader(dataset=test_dataset, batch_size=BATCH_SIZE, shuffle=False)
    return train_loader, test_loader


# Change this: create the data loaders, e.g. by calling create_shape_dataset()
train_loader, test_loader = None, None

print(f"   - Data loaded into PyTorch DataLoaders with batch size {BATCH_SIZE}.")

# --- 2. Define the Network Architecture ---
print("\n2. Defining the Neural Network model using nn.Module...")


class SimpleNet(nn.Module):
    def __init__(self, hidden_size, dropout=0.0):
        pass  # Your code goes here

    def forward(self, x):
        pass  # Your code goes here


model = SimpleNet(hidden_size=HIDDEN_SIZE, dropout=DROPOUT)

# --- 3. Define Loss Function and Optimizer ---
print("\n3. Defining loss function and optimizer...")

criterion = None  # Binary Cross-Entropy Loss for binary classification
optimizer = None

# --- 4. Write the Training Loop ---
print("\n4. Training the network...")
train_losses, val_losses = [], []
train_accuracies, val_accuracies = [], []


for epoch in range(NUM_EPOCHS):
    model.train()  # Set the model to training mode
    running_loss = 0.0
    correct_train = 0
    total_train = 0

    for inputs, labels in train_loader:
        # Implement the forward pass and loss, then clear the gradients,
        # backpropagate, and update the weights (see task 4)
        outputs = None
        loss = None

        running_loss += loss.item()
        predicted = (outputs > 0.5).float()
        total_train += labels.size(0)
        correct_train += (predicted == labels).sum().item()

    train_loss = running_loss / len(train_loader)
    train_acc = correct_train / total_train
    train_losses.append(train_loss)
    train_accuracies.append(train_acc)

    # --- 5. Evaluation within the loop ---
    model.eval()  # Set the model to evaluation mode
    running_val_loss = 0.0
    correct_val = 0
    total_val = 0
    with torch.no_grad():  # Disable gradient calculation
        for inputs, labels in test_loader:
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            running_val_loss += loss.item()
            predicted = (outputs > 0.5).float()
            total_val += labels.size(0)
            correct_val += (predicted == labels).sum().item()

    val_loss = running_val_loss / len(test_loader)
    val_acc = correct_val / total_val
    val_losses.append(val_loss)
    val_accuracies.append(val_acc)

    print(
        f"Epoch {epoch + 1}/{NUM_EPOCHS} | Train Loss: {train_loss:.4f} | Train Acc: {train_acc:.4f} | Val Loss: {val_loss:.4f} | Val Acc: {val_acc:.4f}"
    )

print("\nTraining complete.")

# --- Final Evaluation ---
print(f"\nFinal Test Accuracy: {val_accuracies[-1] * 100:.2f}%")
print(f"Final Test Loss: {val_losses[-1]:.4f}")


# --- Plotting ---
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(train_accuracies, label="Training Accuracy")
plt.plot(val_accuracies, label="Validation Accuracy")
plt.title("Model Accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(train_losses, label="Training Loss")
plt.plot(val_losses, label="Validation Loss")
plt.title("Model Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()

plt.tight_layout()
plt.show()

**Discussion Questions**

1. Learning Rate and Convergence: Describe the effect of the learning rate. What did you observe in the training/validation loss curves when you set the learning rate too high (e.g., 0.1) versus too low (e.g., 1e-5)? How does the learning rate control how quickly the model finds an optimal solution?

2. Hidden Layer Capacity: Explain the trade-off involved in setting the hidden layer size. What happened to your training and validation accuracy when you used a very small number of neurons (e.g., 8)? What happened with a very large number (e.g., 512)? Relate your findings to the concepts of underfitting (the model is too simple) and overfitting (the model is too complex and memorizes the training data).

3. The Role of Dropout: Dropout is a form of regularization. What problem is it designed to solve? Describe where you applied the `nn.Dropout` layer in your model's `forward` method. What effect did adding dropout have on the gap between your training accuracy and your validation accuracy?

4. The Optimizer and Momentum: Optimizers such as Adam keep running averages of the gradients (a momentum-like term) and of their squares (for adaptive step sizes). With the standard SGD (Stochastic Gradient Descent) optimizer used in this exercise, what is the role of the momentum parameter? How did adding momentum (e.g., a value of 0.9) to SGD affect the training speed and the final accuracy?

5. Your Winning Combination: There is no single "correct" combination of hyperparameters to achieve the target accuracy. Describe the final set of parameters that worked for you. Which hyperparameter do you feel had the most significant impact on improving the model's performance for this specific task, and why?

