# Data Preprocessing for AlexNet Model

## Objective
In this notebook, we will preprocess the human detection dataset to prepare it for training and evaluation on the AlexNet model. Preprocessing involves the following key steps:

1. **Loading Images**: Load images from the dataset directory, which contains subdirectories for images with humans (`1`) and without humans (`0`).
2. **Resizing**: Resize all images to 227x227 pixels to match the input size required by AlexNet.
3. **Normalization**: Normalize pixel values to the range [0, 1] to improve training stability.
4. **Dataset Splitting**: Split the dataset into training, validation, and test sets (roughly 72% / 8% / 20% with the default arguments used below) so that the model is evaluated on images it has not seen during training.

This preprocessing is essential to standardize the input data and ensure compatibility with the AlexNet architecture.

In [1]:
# Import necessary libraries
import os
import cv2
import numpy as np
from sklearn.model_selection import train_test_split

## Define Functions for Data Preprocessing

In [3]:
def load_and_preprocess_images(data_dir, target_size=(227, 227)):
    """
    Load and preprocess images from the dataset directory.
    
    Parameters:
        data_dir (str): Path to the dataset directory.
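        target_size (tuple): Desired image size as (width, height), the order cv2.resize expects.
        
    Returns:
        X (numpy array): float32 array of shape (N, height, width, 3) with values in [0, 1],
            in OpenCV's BGR channel order.
        y (numpy array): Corresponding labels (1 for humans, 0 for no humans).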
    """
    images = []
    labels = []
    
    # Define subdirectories for classes
    classes = ['0', '1']  # 0: No humans, 1: With humans

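    for label in classes:
        class_dir = os.path.join(data_dir, label)
        # Sort file names so the loading order is reproducible across runs
        for file_name in sorted(os.listdir(class_dir)):
            file_path = os.path.join(class_dir, file_name)
            # cv2.imread returns None for unreadable files and uses BGR channel order
            image = cv2.imread(file_path)
            if image is not None:  # Skip files that could not be decoded
                image = cv2.resize(image, target_size)
                images.append(image)
                labels.append(int(label))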

    # Convert to NumPy arrays
    X = np.array(images, dtype='float32') / 255.0  # Scale 8-bit pixel values from [0, 255] to [0, 1]
    y = np.array(labels, dtype='int')
    
    return X, y
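
Because the function keeps every image in one dense `float32` array, memory use grows linearly with the number of images. The following is a rough back-of-the-envelope sketch, not part of the original pipeline; `estimate_array_bytes` is a hypothetical helper introduced only for illustration, and the figure of 921 images is taken from the split sizes printed at the end of this notebook.

```python
def estimate_array_bytes(n_images, height=227, width=227, channels=3, bytes_per_value=4):
    """Return the size in bytes of a dense (N, H, W, C) array; float32 uses 4 bytes per value."""
    return n_images * height * width * channels * bytes_per_value


# 921 is the total implied by the split sizes printed at the end of this notebook
n_bytes = estimate_array_bytes(921)
print(f"{n_bytes / 1024**2:.0f} MiB")
```

For this dataset the estimate comes to a little over half a gigabyte, which is manageable in memory; for much larger datasets, loading images in batches may be a better fit.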

In [4]:
def split_dataset(X, y, test_size=0.2, val_size=0.1):
    """
    Split dataset into training, validation, and test sets.
    
    Parameters:
        X (numpy array): Images.
        y (numpy array): Labels.
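        test_size (float): Fraction of the full dataset held out for testing.
        val_size (float): Fraction of the remaining (non-test) data held out for validation,
            so the default 0.1 corresponds to about 8% of the full dataset.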
        
    Returns:
        X_train, X_val, X_test, y_train, y_val, y_test
    """
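    # Stage 1: hold out the test set from the full dataset
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=42)
    # Stage 2: carve the validation set out of the remaining training data
    X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=val_size, random_state=42)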
    return X_train, X_val, X_test, y_train, y_val, y_test
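
Since the validation fraction is applied after the test set has been removed, the resulting set sizes are not simply 70/10/20. The sketch below works through the arithmetic, assuming `train_test_split` rounds a fractional `test_size` up to a whole number of samples and gives the rest to the training side; `split_sizes` is a hypothetical helper, not part of the notebook's pipeline.

```python
import math


def split_sizes(n_samples, test_size=0.2, val_size=0.1):
    """Predict (train, val, test) counts for the two-stage split above."""
    # A fractional test_size is rounded up to a whole number of samples
    n_test = math.ceil(test_size * n_samples)
    n_remaining = n_samples - n_test
    # val_size applies to what is left after the test set is removed
    n_val = math.ceil(val_size * n_remaining)
    n_train = n_remaining - n_val
    return n_train, n_val, n_test


print(split_sizes(921))  # 921 images in total, as in the run recorded in this notebook
```

With 921 images this gives 662 training, 74 validation, and 185 test samples, which agrees with the shapes printed in the final cell.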

## Load and Preprocess the Dataset

In [8]:
# Path to the dataset
DATA_DIR = "human_detection_dataset"  # Replace with your dataset path

# Load and preprocess images
X, y = load_and_preprocess_images(DATA_DIR)

# Split the dataset
X_train, X_val, X_test, y_train, y_val, y_test = split_dataset(X, y)

# Print dataset sizes
print(f"Training set: {X_train.shape}, Validation set: {X_val.shape}, Test set: {X_test.shape}")
print(f"Training labels: {y_train.shape}, Validation labels: {y_val.shape}, Test labels: {y_test.shape}")



Training set: (662, 227, 227, 3), Validation set: (74, 227, 227, 3), Test set: (185, 227, 227, 3)
Training labels: (662,), Validation labels: (74,), Test labels: (185,)
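
The shapes above confirm the size of each split but say nothing about how the two classes are distributed within it. A minimal sketch for checking this is shown below; `class_fractions` is a hypothetical helper, and the label list is toy data used only for illustration.

```python
from collections import Counter


def class_fractions(labels):
    """Return {label: fraction of samples} for an iterable of integer labels."""
    counts = Counter(int(label) for label in labels)
    total = sum(counts.values())
    return {label: count / total for label, count in sorted(counts.items())}


# Toy labels for illustration; in the notebook, pass y_train, y_val and y_test instead
print(class_fractions([0, 1, 1, 0, 1]))
```

If the fractions differ noticeably between the three splits, one option is to pass the labels through the `stratify` argument of `train_test_split`, which keeps the class proportions approximately equal across splits.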
