<center><img src="https://javier.rodriguez.org.mx/itesm/2014/tecnologico-de-monterrey-blue.png" width="450" align="center"></center>
<br><p><center><h1><b>Object Detection with a Neural Network Approach: Analyzing the Stanford Cars Dataset</b></h1></center></p>
<p><center><h3>Course: <i>Neural Network Design and Deep Learning</i></h3></center></p>
<p><center><h4>Instructed by: <i>Dr. Leonardo Mauricio Cañete Sifuentes</i></h4></center></p> 

<p style="text-align: right;">Alejandro Santiago Baca Eyssautier - A01656580</p> 
<p style="text-align: right;">André Colín Avila - A01657474</p> 
<p style="text-align: right;">Santiago Caballero - A01657699</p> 
<p style="text-align: right;">November 28th, 2024</p><br>

<br><p><h3> <b>1. Introduction</b></h3></p>

The objective of this project is to explore object detection using neural networks by analyzing the **Stanford Cars Dataset**. This dataset is widely used for fine-grained visual categorization, making it an excellent choice for detecting and classifying cars into various categories based on their make, model, and year. 

Throughout this project, the team aims to preprocess the data, construct multiple neural network architectures, and evaluate their performance to identify the most efficient and accurate model. The project is divided into individual and team contributions, ensuring a collaborative yet personalized approach to model development.

By leveraging deep learning techniques, the team seeks to tackle the challenges of detecting subtle differences between visually similar objects while maintaining computational efficiency.

<br>

<br><p><h3> <b>2. Dataset Selection and Justification</b></h3></p>

The **Stanford Cars Dataset** consists of 16,185 images of cars categorized into 196 classes based on their make, model, and year. The dataset is split into 8,144 training images and 8,041 test images. It is commonly used for fine-grained image classification and object detection tasks.
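As a quick sanity check on these figures, the arithmetic below shows that the two splits add up to the stated total, that the split is close to 50/50, and that the training set holds only about 42 images per class on average. That small per-class count is part of what makes the task demanding. The numbers are taken directly from the paragraph above.

```python
# Counts as stated in the dataset description above
num_classes = 196
num_train, num_test = 8144, 8041

total = num_train + num_test
train_share = num_train / total          # fraction of all images used for training
per_class = num_train / num_classes      # average training images per class

print(total)                  # 16185
print(round(train_share, 3))  # 0.503
print(round(per_class, 1))    # 41.6
```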

**Key Features:**

- **Name**: Stanford Cars Dataset
- **Download URL**: [Stanford Cars Dataset on Papers with Code](https://paperswithcode.com/dataset/stanford-cars)
- **Description**: The images are provided at relatively high resolution, enabling detailed feature extraction. With 196 classes, the dataset provides a challenging setting for distinguishing between visually similar cars. The original release includes bounding boxes as well as class labels, making it suitable for both object detection and classification tasks.
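To make the pairing of a bounding box with a class label concrete, here is a minimal sketch of one annotation record. The class name `CarAnnotation`, its field names, the pixel corner-coordinate convention, and the 0-based class index are all assumptions for illustration, not the dataset's actual file format. The `iou` helper computes intersection-over-union, a common measure of how well a predicted box overlaps a ground-truth box.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CarAnnotation:
    """Hypothetical annotation: box corners in pixels plus a class index."""
    x1: float
    y1: float
    x2: float
    y2: float
    class_id: int  # assumed 0-based index over the 196 make/model/year classes

    def area(self) -> float:
        # Width times height, clamped at zero for degenerate boxes
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: CarAnnotation, b: CarAnnotation) -> float:
    """Intersection-over-union of two boxes (class ids are ignored)."""
    ix1, iy1 = max(a.x1, b.x1), max(a.y1, b.y1)
    ix2, iy2 = min(a.x2, b.x2), min(a.y2, b.y2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


truth = CarAnnotation(10, 20, 110, 120, class_id=5)
pred = CarAnnotation(60, 20, 160, 120, class_id=5)  # shifted 50 px to the right
print(round(iou(truth, pred), 3))  # 0.333
```

Here the two 100 x 100 boxes share a 50 x 100 region, so the overlap is 5000 / 15000, one third of their union.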

**Justification**  

The Stanford Cars Dataset is an ideal choice for this project for several reasons:

1. **Problem Relevance**: The dataset aligns with the team's objective of solving an object detection problem. Its detailed annotations support both object localization and classification tasks.
2. **Complexity**: The fine-grained nature of the dataset presents a significant challenge, requiring advanced neural network architectures to achieve high performance.
3. **Broad Applicability**: Insights gained from working on this dataset can be extended to other fine-grained object detection problems, such as species identification or defect detection in industrial processes.

<br>

<br><p><h3> <b>3. Data Preprocessing and Splitting</b></h3></p>

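The dataset is loaded from the Hugging Face Hub (`tanganke/stanford_cars`) using the **Hugging Face `datasets` library**, which provides a convenient way to access and manipulate datasets. Besides the predefined **train** and **test** splits, this version includes seven distorted variants (contrast changes, Gaussian and impulse noise, JPEG compression, motion blur, pixelation, and spatter), each with the same 8,041 rows as the test split. Every example contains only an `image` and its class `label`; the bounding-box annotations of the original release are not part of this version.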

The project focuses on the **training** and **test** splits for building and evaluating the models, with a portion of the training set used for validation.

**Preprocessing Steps**

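1. **Image Resizing**: Images are resized to a fixed resolution ($224 \times 224$) so that they can be stacked into uniform batches; this is also the input size commonly used by ImageNet-pretrained CNNs. Resizing directly to a square does not preserve the original aspect ratio.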
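2. **Normalization**: `ToTensor` scales pixel values to [0, 1], and `Normalize` with a per-channel mean and standard deviation of 0.5 then maps them to [-1, 1], keeping inputs centered around zero to stabilize training.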
3. **Data Augmentation**: Random horizontal flips and small random rotations are applied to the training subset only, to improve generalization; validation and test images are left unaugmented.
4. **Dataset Splitting**:
   - The training set is further split into **training** (80%) and **validation** (20%) subsets for hyperparameter tuning and model evaluation.
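The normalization and flipping steps above can be illustrated on a toy image without any imaging library. This sketch assumes a single-channel image stored as nested lists of 8-bit values. It mirrors, per channel, what `ToTensor` followed by `Normalize(mean=0.5, std=0.5)` computes, namely $(x/255 - 0.5)/0.5$. It also shows that a horizontal flip only rearranges pixels, so the car's class label stays valid. The helper names are hypothetical.

```python
def to_unit_range(image):
    """Scale 8-bit pixel values to [0, 1], as ToTensor does for uint8 images."""
    return [[px / 255.0 for px in row] for row in image]


def normalize(image, mean=0.5, std=0.5):
    """Standardize each value as (x - mean) / std; with 0.5/0.5, [0, 1] maps to [-1, 1]."""
    return [[(px - mean) / std for px in row] for row in image]


def horizontal_flip(image):
    """Mirror each row left-to-right; the class label of the car is unchanged."""
    return [list(reversed(row)) for row in image]


toy = [[0, 128, 255],
       [255, 0, 51]]  # 2x3 single-channel "image"

scaled = normalize(to_unit_range(toy))
print(scaled[0][0], scaled[0][2])  # -1.0 1.0
print(horizontal_flip(toy))        # [[255, 128, 0], [51, 0, 255]]
```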

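```python
from datasets import load_dataset
from torchvision import transforms
from torch.utils.data import DataLoader

# Load the dataset from the Hugging Face Hub
dataset = load_dataset("tanganke/stanford_cars")

# Display dataset structure
print(dataset)

# Split the original training set into training (80%) and validation (20%).
# Splitting before attaching transforms lets each subset use its own pipeline.
split = dataset["train"].train_test_split(test_size=0.2, seed=42)
train_set, val_set = split["train"], split["test"]
test_set = dataset["test"]

# ToTensor scales pixels to [0, 1]; Normalize then maps them to [-1, 1]
normalize = transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])

# Training pipeline with augmentation (15 degrees is an illustrative choice)
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=15),
    transforms.ToTensor(),
    normalize,
])

# Evaluation pipeline for validation and test: no augmentation
eval_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    normalize,
])

def make_batch_transform(pipeline):
    # with_transform passes a batch (dict of lists), even for a single index,
    # so each image in the list is transformed individually. convert("RGB")
    # handles grayscale images so Normalize always receives 3 channels.
    def apply(batch):
        return {
            "image": [pipeline(img.convert("RGB")) for img in batch["image"]],
            "label": batch["label"],
        }
    return apply

# Attach transformations; they run lazily when examples are accessed
train_set = train_set.with_transform(make_batch_transform(train_transform))
val_set = val_set.with_transform(make_batch_transform(eval_transform))
test_set = test_set.with_transform(make_batch_transform(eval_transform))

# Create DataLoaders for batching
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32, shuffle=False)
test_loader = DataLoader(test_set, batch_size=32, shuffle=False)
```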

```text
DatasetDict({
    train: Dataset({
        features: ['image', 'label'],
        num_rows: 8144
    })
    test: Dataset({
        features: ['image', 'label'],
        num_rows: 8041
    })
    contrast: Dataset({
        features: ['image', 'label'],
        num_rows: 8041
    })
    gaussian_noise: Dataset({
        features: ['image', 'label'],
        num_rows: 8041
    })
    impulse_noise: Dataset({
        features: ['image', 'label'],
        num_rows: 8041
    })
    jpeg_compression: Dataset({
        features: ['image', 'label'],
        num_rows: 8041
    })
    motion_blur: Dataset({
        features: ['image', 'label'],
        num_rows: 8041
    })
    pixelate: Dataset({
        features: ['image', 'label'],
        num_rows: 8041
    })
    spatter: Dataset({
        features: ['image', 'label'],
        num_rows: 8041
    })
})
```
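The printed structure gives 8,144 rows in the training split. The sizes produced by the 80/20 split can be checked with plain arithmetic, following the notebook's rule of truncating `0.8 * len(train_set)` for training and giving the remainder to validation. The row counts are copied from the output above, and `split_sizes` is a hypothetical helper.

```python
# Row counts copied from the printed DatasetDict
num_rows = {"train": 8144, "test": 8041}


def split_sizes(n_total, train_fraction=0.8):
    """Truncate the training share; the remainder goes to validation."""
    train_size = int(train_fraction * n_total)
    return train_size, n_total - train_size


train_size, val_size = split_sizes(num_rows["train"])
print(train_size, val_size)  # 6515 1629
```

Because the training share is truncated, the validation subset absorbs any rounding remainder, and the two sizes always sum to the original split size.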
