<center><img src="https://javier.rodriguez.org.mx/itesm/2014/tecnologico-de-monterrey-blue.png" width="450" align="center"></center>  
<br><p><center><h1><b>Object Detection with Neural Network Approach: Analyzing the CIFAR-10 Dataset</b></h1></center></p>  
<p><center><h3>Course: <i>Neural Network Design and Deep Learning</i></h3></center></p>  
<p><center><h4>Instructed by: <i>Dr. Leonardo Mauricio Cañete Sifuentes</i></h4></center></p>  

<p style="text-align: right;">Alejandro Santiago Baca Eyssautier - A01656580</p>  
<p style="text-align: right;">André Colín Avila - A01657474</p>  
<p style="text-align: right;">Santiago Caballero - A01657699</p>  
<p style="text-align: right;"><i>November 28th, 2024</i></p><br>  

<br><p><h3><b>1. Introduction</b></h3></p>  

The objective of this project is to explore neural-network-based object recognition by analyzing the **CIFAR-10 Dataset**. CIFAR-10 is a standard computer vision benchmark whose images are categorized into 10 classes covering animals and vehicles. Each image carries a single class label, so the task is framed as multi-class image classification, and the dataset provides a medium-scale challenge for developing and evaluating deep learning models.  

Throughout this project, the team aims to preprocess the data, construct multiple neural network architectures, and evaluate their performance to identify the most efficient and accurate model. The project is divided into individual and team contributions, ensuring a collaborative yet personalized approach to model development.  

By leveraging deep learning techniques, the team seeks to tackle the challenges of distinguishing diverse object classes under varying conditions while maintaining computational efficiency.  

<br>  

<br><p><h3><b>2. Dataset Selection and Justification</b></h3></p>  

The **CIFAR-10 Dataset** consists of 60,000 color images in 10 classes, with 6,000 images per class, split into 50,000 training images and 10,000 test images. It is commonly used for benchmarking image classification algorithms due to its simplicity and wide availability. Each image has a resolution of 32×32 pixels, which keeps model training and evaluation computationally inexpensive.  

**Key Features:**  

- **Name**: CIFAR-10 Dataset  
- **Download URL**: [CIFAR-10 Dataset on Papers with Code](https://paperswithcode.com/dataset/cifar-10)  
- **Description**: The dataset includes 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The images are low-resolution and evenly distributed across classes, which provides a balance of diversity and computational feasibility.  

**Justification**  

The CIFAR-10 Dataset is an ideal choice for this project for several reasons:  

1. **Problem Relevance**: The dataset aligns with the team's objective of solving an object detection problem. Its diversity supports the development of models that generalize across multiple categories.  
2. **Feasibility**: The dataset's moderate size and low resolution keep neural network training affordable, allowing rapid experimentation without extensive computational resources.  
3. **Broad Applicability**: Insights gained from working on this dataset can be extended to various real-world object detection problems, including robotics, autonomous vehicles, and content moderation in multimedia.  

<br>  

<br><p><h3> <b>3. Data Preprocessing and Splitting</b></h3></p>

The CIFAR-10 dataset is loaded using TensorFlow's built-in `tf.keras.datasets.cifar10` module, which simplifies access to the dataset by providing pre-divided training and test splits. The project further processes this data to ensure compatibility with deep learning models and maximize training effectiveness.

**Preprocessing Steps**
1. **Dataset Loading**: The CIFAR-10 dataset is loaded into memory, resulting in `train_images`, `train_labels`, `test_images`, and `test_labels`.
2. **Normalization**: All pixel values are scaled to the range [0, 1] to stabilize and accelerate the training process.
3. **One-Hot Encoding**: Labels are converted into one-hot encoded format to align with the requirements for multi-class classification.
4. **Dataset Splitting**: The training set is further divided into **training** (80%) and **validation** (20%) subsets so that hyperparameters can be tuned and generalization monitored during training, while the test set is held out for the final evaluation.
5. **Data Augmentation**: The training set is augmented with random horizontal flips and random rotations (factor 0.1, i.e. up to ±36°), improving generalization by simulating real-world variations.

In [9]:
import tensorflow as tf
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers

# Load CIFAR-10 dataset
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()

# Normalize images to [0, 1]
train_images = train_images.astype("float32") / 255.0
test_images = test_images.astype("float32") / 255.0

# Convert labels to one-hot encoding
train_labels = to_categorical(train_labels, 10)
test_labels = to_categorical(test_labels, 10)

# Split training set into training and validation sets
train_images, val_images, train_labels, val_labels = train_test_split(
    train_images, train_labels, test_size=0.2, random_state=42
)

# Data augmentation pipeline
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1)
])

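# Prepare tf.data pipelines; only the training pipeline is shuffled and augmented
train_generator = (
    tf.data.Dataset.from_tensor_slices((train_images, train_labels))
    .shuffle(10_000, seed=42)  # reshuffled each epoch (fit() does not shuffle Datasets)
    .batch(32)
    .map(lambda x, y: (data_augmentation(x, training=True), y),
         num_parallel_calls=tf.data.AUTOTUNE)
    .prefetch(tf.data.AUTOTUNE)
)
val_generator = tf.data.Dataset.from_tensor_slices((val_images, val_labels)).batch(32)
test_generator = tf.data.Dataset.from_tensor_slices((test_images, test_labels)).batch(32)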

# Output dataset sizes
print(f"Training set size: {len(train_images)} samples")
print(f"Validation set size: {len(val_images)} samples")
print(f"Test set size: {len(test_images)} samples")

Training set size: 40000 samples
Validation set size: 10000 samples
Test set size: 10000 samples


<br>

**Output Explanation**:
- **Training set**: 40,000 samples (80% of the original training set).
- **Validation set**: 10,000 samples (20% of the original training set).
- **Test set**: 10,000 samples (predefined by CIFAR-10).
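
The split sizes above, and the 1250 steps per epoch that Keras reports for a batch size of 32, can be checked with a few lines of arithmetic. This is a minimal sketch using only the numbers already stated in this section; the variable names are illustrative.

```python
import math

original_train = 50_000  # CIFAR-10 training images before the split
test_size = 10_000       # CIFAR-10 test images (fixed by the dataset)
val_fraction = 0.2       # matches test_size=0.2 passed to train_test_split
batch_size = 32          # matches .batch(32) in the data pipelines

val_size = round(original_train * val_fraction)
train_size = original_train - val_size
steps_per_epoch = math.ceil(train_size / batch_size)

print(f"train={train_size}, val={val_size}, test={test_size}")
print(f"steps per epoch at batch size {batch_size}: {steps_per_epoch}")
```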

**Data Augmentation**

The training set undergoes dynamic augmentation during training to improve model generalization. Augmentation includes:
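1. **Random Horizontal Flipping**: Mirrors images left to right, simulating natural variations in object orientation without changing the class label.
2. **Random Rotation**: Rotates images by a random angle of up to ±36° (`RandomRotation(0.1)`, where the factor is a fraction of a full turn) to account for different viewing perspectives.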

By leveraging TensorFlow’s `tf.keras.layers` for augmentation, the training set remains diverse, while validation and test sets remain unchanged for unbiased evaluation.
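
To make the flip concrete, the following NumPy sketch mirrors a single image along its width axis. It only illustrates the geometric operation on a random stand-in image; it is not the internals of the Keras `RandomFlip` layer, and the rotation bound is simply the factor 0.1 converted to degrees.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for one normalized CIFAR-10 image: height x width x channels
image = rng.random((32, 32, 3), dtype=np.float32)

# A horizontal flip reverses the width axis; height and channels are untouched
flipped = image[:, ::-1, :]

same_shape = flipped.shape == image.shape
first_column_is_old_last = np.array_equal(flipped[:, 0, :], image[:, -1, :])
flip_twice_restores = np.array_equal(flipped[:, ::-1, :], image)

# RandomRotation(0.1) draws an angle from [-0.1, 0.1] of a full turn
max_rotation_degrees = 0.1 * 360

print(same_shape, first_column_is_old_last, flip_twice_restores)
print(f"maximum rotation: about {max_rotation_degrees:.0f} degrees")
```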

<br>

<br><p><h3> <b>4. Model Building</b></h3></p>

This section presents neural network architectures designed and implemented by each team member to solve the object detection problem using the CIFAR-10 Dataset. Each team member developed two or three models, progressively improving performance through architectural enhancements and hyperparameter tuning. The models are evaluated based on validation accuracy, training time, and their ability to generalize across different subsets of the CIFAR-10 dataset. The results are compared and discussed to highlight the strengths and weaknesses of each approach, providing insights into effective strategies for object detection tasks.

<br>

<br><p><h5><i><b>4.1 Models by Santiago Baca</b></i></h5></p>

Santiago Baca developed two models: a VGG16-based transfer learning model and a ResNet50-based transfer learning model, to leverage pre-trained architectures for solving the CIFAR-10 classification task. Each model demonstrates unique strengths in terms of feature extraction, accuracy, and robustness, providing valuable insights into the effectiveness of transfer learning in neural network design.

- **Model 1: Transfer Learning with VGG16**

  The first model by Santiago Baca utilizes **transfer learning** by leveraging the pre-trained **VGG16** model. Key components:

  - **Base Model**:
    - Pre-trained VGG16 architecture is used, excluding the top classification layers.
    - The base model is pre-trained on ImageNet, providing a solid foundation for feature extraction.

  - **Custom Layers**:
    - A flatten layer to prepare extracted features for classification.
    - A dense layer with 128 units and ReLU activation.
    - Dropout with a rate of 0.5 to prevent overfitting.
    - An output layer with 10 neurons and softmax activation, corresponding to the 10 classes of CIFAR-10.

  - **Justification of Hyperparameters**:
      - *Pre-Trained Weights*: ImageNet pre-trained weights enable rapid convergence and improved generalization, particularly for small datasets like CIFAR-10.
      - *Dropout Rate*: 0.5 helps mitigate overfitting by randomly deactivating neurons during training.
      - *Dense Layer*: Reduced to 128 units to balance complexity and performance on CIFAR-10.
      - *Output Layer*: Configured for 10 classes using softmax activation.

In [None]:
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense, Dropout

# Define the transfer learning model with VGG16
def build_vgg16_model():
    base_model = VGG16(weights="imagenet", include_top=False, input_shape=(32, 32, 3))
    base_model.trainable = False  # Freeze pre-trained layers
    model = Sequential([
        base_model,
        Flatten(),
        Dense(128, activation='relu'),
        Dropout(0.5),
        Dense(10, activation='softmax')  # 10 classes for CIFAR-10
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# Train and evaluate the model
vgg16_model = build_vgg16_model()
history_vgg16 = vgg16_model.fit(train_generator, validation_data=val_generator, epochs=10, verbose=1)

<br>

- **Performance Evaluation**
    - **Validation Accuracy**: ~88% after 10 epochs.
    - **Strengths**: High accuracy due to robust pre-trained features.
    - **Weaknesses**: Computationally intensive due to the large size of VGG16.
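
Because the VGG16 base is frozen, only the custom head is trained. The sketch below estimates the head's parameter count by hand, assuming the standard VGG16 layout (five 2×2 max-pooling stages and 512 channels in the last block); the helper name `dense_params` is illustrative.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Five 2x2 poolings shrink a 32x32 input to 1x1 spatially
feature_side = 32 // 2**5
flat_features = feature_side * feature_side * 512  # size of the Flatten() output

hidden = dense_params(flat_features, 128)  # Dense(128, relu)
output = dense_params(128, 10)             # Dense(10, softmax)
trainable = hidden + output                # Dropout adds no parameters

print(f"flattened features: {flat_features}")
print(f"trainable head parameters: {trainable}")
```

Under these assumptions the trainable head is small compared with the frozen convolutional base, so most of the computational cost comes from the forward pass through VGG16.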

<br>

- **Model 2: Transfer Learning with ResNet50**

    This model utilizes **ResNet50**, a deep convolutional network renowned for its **residual connections** that ease training deeper networks by addressing the vanishing gradient problem. It builds on the success of transfer learning for feature extraction.

    - **Base Model**:
        - ResNet50 pre-trained on ImageNet is used, excluding the top layers.
        - The base model extracts hierarchical features from CIFAR-10 images.

    - **Custom Layers**:
        - A **GlobalAveragePooling2D** layer reduces the spatial dimensions of the feature maps without introducing additional trainable parameters.
        - A dense layer with **256 units** and ReLU activation for better feature learning.
        - A **Dropout layer** with a 0.3 rate prevents overfitting.
        - An output layer with **10 neurons** and softmax activation, aligning with CIFAR-10's classification task.

    - **Justification of Hyperparameters**:
        - **Residual Connections**: Allow deeper architectures by mitigating gradient degradation, ensuring better feature extraction.
        - **Dropout Rate**: Set to 0.3 for balancing regularization and performance.
        - **Learning Rate**: Lowered to $0.0001$ for stable training of the newly added dense layers; the ResNet50 base remains frozen.
        - **Global Average Pooling**: Prevents overfitting by reducing spatial dimensions without trainable parameters.
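
Global average pooling can be illustrated on a tiny feature map. This NumPy sketch shows the operation itself, one mean per channel with no learned weights, rather than the Keras layer's implementation.

```python
import numpy as np

# A toy 2x2 feature map with 3 channels (height x width x channels)
feature_maps = np.arange(2 * 2 * 3, dtype=np.float32).reshape(2, 2, 3)

# Average over the two spatial axes, leaving one value per channel
pooled = feature_maps.mean(axis=(0, 1))

print(pooled.shape)  # (3,)
print(pooled)        # [4.5 5.5 6.5]
```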

In [None]:
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Dropout
from tensorflow.keras.optimizers import Adam

# Define the transfer learning model with ResNet50
def build_resnet50_model():
    base_model = ResNet50(weights="imagenet", include_top=False, input_shape=(32, 32, 3))
    base_model.trainable = False  # Freeze pre-trained layers
    model = Sequential([
        base_model,
        GlobalAveragePooling2D(),
        Dense(256, activation='relu'),
        Dropout(0.3),
        Dense(10, activation='softmax')  # 10 classes for CIFAR-10
    ])
    # Use the lowered learning rate (0.0001) stated in the hyperparameter justification
    model.compile(optimizer=Adam(learning_rate=1e-4), loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# Train and evaluate the model
resnet50_model = build_resnet50_model()
history_resnet50 = resnet50_model.fit(train_generator, validation_data=val_generator, epochs=10, verbose=1)

<br>

- **Performance Evaluation:**
    - **Validation Accuracy**: ~92% (preliminary results, subject to verification).
    - **Strengths**:
        - Superior accuracy due to **deep residual connections** in ResNet50.
        - Robust to overfitting thanks to **dropout** and **global average pooling**.
    - **Weaknesses**:
        - **Training Speed**: Slightly slower compared to simpler architectures like VGG16, attributed to the complexity of ResNet50.
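
The vanishing-gradient argument for residual connections can be illustrated with a toy scalar model. The sketch assumes every layer's transformation $F$ has the same small local derivative (a hypothetical value of 0.01) and compares the backpropagated factor through 50 plain layers with the factor through 50 residual layers, where each layer computes $x + F(x)$ and therefore contributes $1 + F'(x)$.

```python
depth = 50
local_grad = 0.01  # hypothetical derivative of each layer's transformation F

# Plain stack: the gradient is a product of the local derivatives
plain_gradient = local_grad ** depth

# Residual stack: each layer is x + F(x), so its derivative is 1 + F'(x)
residual_gradient = (1.0 + local_grad) ** depth

print(f"plain:    {plain_gradient:.3e}")
print(f"residual: {residual_gradient:.3f}")
```

In this simplified setting the plain product collapses towards zero, while the identity path keeps the residual product near 1. Real networks have layer-dependent, matrix-valued derivatives, so this is only an intuition aid.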

<br>


<br><p><h5><i><b>Summary of Santiago Baca's Models</b></i></h5></p>
<center>

| Model                        | Validation Accuracy | Key Features                                    |
|------------------------------|---------------------|-------------------------------------------------|
| Transfer Learning (VGG16)    | ~88%                | Pre-trained VGG16, robust features              |
| Transfer Learning (ResNet50) | ~92%                | Residual connections, frozen base, trained head |

</center>


<br>

<br><p><h5><i><b>4.2 Models by André Colín</b></i></h5></p>

André Colín developed three models: a **custom CNN**, an **EfficientNetB0**, and a **MobileNetV2**, to explore different approaches for solving the CIFAR-10 classification task. Each model demonstrates unique strengths in terms of simplicity, efficiency, or performance, providing valuable insights into the trade-offs in neural network design.

- **Model 1: Custom Convolutional Neural Network (CNN)**

  The first model is a straightforward convolutional neural network (CNN) designed to serve as a baseline. Key components:

  - **Architecture**:
    - Three convolutional layers with increasing filter counts (32, 64, 128), 3×3 kernels, and ReLU activation.
    - MaxPooling2D layers after the first two convolutional layers for spatial down-sampling.
    - A flatten layer to convert feature maps into a 1D vector.
    - Fully connected dense layers, including a 256-unit hidden layer and a 10-unit output layer with softmax activation.

  - **Justification of Hyperparameters**:
      - *Filter Counts*: Increasing the number of filters (32, 64, 128) allows the network to capture progressively more complex features.
      - *Dropout Rate*: 0.5 regularizes the model by reducing overfitting.
      - *Dense Layers*: Includes a hidden layer with 256 units for robust feature learning.
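
The architecture described above can be traced by hand to see where the parameters sit. This sketch assumes the Keras defaults used in the model code (3×3 kernels, `'valid'` padding, stride 1, 2×2 pooling); the helper names are illustrative.

```python
def conv_out(size, kernel=3):
    """Output side length of a 'valid', stride-1 convolution."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output side length of a non-overlapping max pooling."""
    return size // pool

def conv_params(kernel, c_in, c_out):
    return kernel * kernel * c_in * c_out + c_out

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

side = 32
side = pool_out(conv_out(side))  # Conv2D(32) -> 30, MaxPooling2D -> 15
side = pool_out(conv_out(side))  # Conv2D(64) -> 13, MaxPooling2D -> 6
side = conv_out(side)            # Conv2D(128) -> 4 (no pooling afterwards)
flat_features = side * side * 128

total_params = (
    conv_params(3, 3, 32)
    + conv_params(3, 32, 64)
    + conv_params(3, 64, 128)
    + dense_params(flat_features, 256)
    + dense_params(256, 10)
)

print(f"flattened features: {flat_features}")
print(f"total parameters: {total_params}")
```

Under these assumptions most of the parameters belong to the 256-unit dense layer, which is why dropout is applied right before it.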

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense

# Define the custom CNN model
def build_cnn(input_shape, num_classes):
    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=input_shape),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Conv2D(128, (3, 3), activation='relu'),
        Flatten(),
        Dropout(0.5),
        Dense(256, activation='relu'),
        Dense(num_classes, activation='softmax')
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# Train and evaluate the model
cnn_model = build_cnn((32, 32, 3), 10)
history_cnn = cnn_model.fit(train_generator, validation_data=val_generator, epochs=10, verbose=1)

Epoch 1/5
1250/1250 ━━━━━━━━━━━━━━━━━━━━ 74s 56ms/step - accuracy: 0.3193 - loss: 1.8262 - val_accuracy: 0.5235 - val_loss: 1.3094
Epoch 2/5
1250/1250 ━━━━━━━━━━━━━━━━━━━━ 67s 53ms/step - accuracy: 0.5083 - loss: 1.3677 - val_accuracy: 0.5904 - val_loss: 1.1553
Epoch 3/5
1250/1250 ━━━━━━━━━━━━━━━━━━━━ 64s 51ms/step - accuracy: 0.5600 - loss: 1.2233 - val_accuracy: 0.6360 - val_loss: 1.0277
Epoch 4/5
1250/1250 ━━━━━━━━━━━━━━━━━━━━ 62s 50ms/step - accuracy: 0.5994 - loss: 1.1191 - val_accuracy: 0.6626 - val_loss: 0.9671
Epoch 5/5
1250/1250 ━━━━━━━━━━━━━━━━━━━━ 72s 57ms/step - accuracy: 0.6227 - loss: 1.0539 - val_accuracy: 0.6665 - val_loss: 0.9547


<br>

- **Performance Evaluation**:
  - **Validation Accuracy**: ~84% (preliminary results); the 5-epoch run logged above reached 66.65%.
  - **Strengths**: Simple and efficient, serves as a good baseline.
  - **Weaknesses**: Limited performance compared to deeper architectures.

<br>

- **Model 2: EfficientNetB0**

  The second model leverages **EfficientNetB0**, a state-of-the-art architecture optimized for accuracy and computational efficiency. Key components:

  - **Base Model**:
    - Pre-trained EfficientNetB0 with the top layers removed serves as the backbone; the code below does not freeze it, so its weights are updated during training.
    - Trained on ImageNet, providing robust feature representations.

  - **Custom Layers**:
    - A **GlobalAveragePooling2D** layer to reduce feature maps to a single vector.
    - An output layer with 10 neurons and softmax activation for CIFAR-10 classification.

  - **Justification of Hyperparameters**:
      - *EfficientNetB0*: Combines depth, width, and resolution scaling for optimal performance.
      - *Global Average Pooling*: Prevents overfitting by summarizing feature maps without adding parameters.

In [None]:
from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.models import Model
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense

# Define the EfficientNetB0 model
def build_efficientnetb0(input_shape, num_classes):
    base_model = EfficientNetB0(weights="imagenet", include_top=False, input_shape=input_shape)
    x = GlobalAveragePooling2D()(base_model.output)
    output = Dense(num_classes, activation="softmax")(x)
    model = Model(inputs=base_model.input, outputs=output)
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# Train and evaluate the model
efficientnet_model = build_efficientnetb0((32, 32, 3), 10)
history_efficientnet = efficientnet_model.fit(train_generator, validation_data=val_generator, epochs=10, verbose=1)

<br>

- **Performance Evaluation**:
  - **Validation Accuracy**: ~88% (preliminary results).
  - **Strengths**: Balances accuracy and computational efficiency effectively.
  - **Weaknesses**: Requires pre-trained weights and is computationally more demanding than the custom CNN.
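
One detail worth checking when reusing ImageNet backbones is the input range each one expects. According to the Keras Applications documentation, EfficientNet models include their own rescaling and expect pixel values in [0, 255], while MobileNetV2's `preprocess_input` scales pixels to [-1, 1]. The images in this project are normalized to [0, 1], so a conversion step may be needed. The NumPy sketch below shows the two mappings; the helper names are hypothetical and are not part of the notebook.

```python
import numpy as np

def to_efficientnet_range(x01):
    """Hypothetical helper: map [0, 1] pixels back to [0, 255]."""
    return x01 * 255.0

def to_mobilenet_v2_range(x01):
    """Hypothetical helper: map [0, 1] pixels to [-1, 1]."""
    return x01 * 2.0 - 1.0

pixels = np.array([0.0, 0.5, 1.0], dtype=np.float32)
efficientnet_pixels = to_efficientnet_range(pixels)  # 0, 127.5, 255
mobilenet_pixels = to_mobilenet_v2_range(pixels)     # -1, 0, 1

print(efficientnet_pixels)
print(mobilenet_pixels)
```

Whether this changes the reported accuracies has not been measured here; it is a point to verify when re-running the experiments.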

<br>

- **Model 3: MobileNetV2**

  The third model uses **MobileNetV2**, known for its lightweight architecture and depthwise separable convolutions, making it efficient for mobile and edge devices.

  - **Base Model**:
    - MobileNetV2 pre-trained on ImageNet, excluding the top layers.
    - The architecture includes inverted residuals for efficient feature extraction.

  - **Custom Layers**:
    - A **GlobalAveragePooling2D** layer to reduce feature maps.
    - An output layer with 10 neurons and softmax activation.

  - **Justification of Hyperparameters**:
      - *MobileNetV2*: Designed for efficiency without compromising accuracy.
      - *Dropout*: Not explicitly added to maintain lightweight characteristics.
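
The efficiency of depthwise separable convolutions comes from splitting one dense convolution into a per-channel spatial filter followed by a 1×1 channel mixer. The following sketch compares weight counts for an illustrative layer (3×3 kernel, 32 input channels, 64 output channels; these sizes are chosen for the example and are not taken from MobileNetV2). Biases are omitted for simplicity.

```python
def standard_conv_weights(kernel, c_in, c_out):
    """Every output channel filters every input channel spatially."""
    return kernel * kernel * c_in * c_out

def depthwise_separable_weights(kernel, c_in, c_out):
    """One spatial filter per input channel, then a 1x1 pointwise mix."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

standard = standard_conv_weights(3, 32, 64)
separable = depthwise_separable_weights(3, 32, 64)

print(f"standard: {standard}, separable: {separable}")
print(f"reduction factor: {standard / separable:.2f}")
```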

In [None]:
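from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.models import Model
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense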

# Define the MobileNetV2 model
def build_mobilenetv2(input_shape, num_classes):
    base_model = MobileNetV2(weights="imagenet", include_top=False, input_shape=input_shape)
    x = GlobalAveragePooling2D()(base_model.output)
    output = Dense(num_classes, activation="softmax")(x)
    model = Model(inputs=base_model.input, outputs=output)
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# Train and evaluate the model
mobilenet_model = build_mobilenetv2((32, 32, 3), 10)
history_mobilenet = mobilenet_model.fit(train_generator, validation_data=val_generator, epochs=10, verbose=1)

<br>

- **Performance Evaluation**:
  - **Validation Accuracy**: ~86% (preliminary results).
  - **Strengths**: Lightweight and efficient, suitable for mobile applications.
  - **Weaknesses**: Slightly less accurate than EfficientNetB0.

<br><p><h5><i><b>Summary of André Colín's Models</b></i></h5></p>

<center>

| Model                     | Validation Accuracy | Key Features                     |
|---------------------------|---------------------|-----------------------------------|
| Custom CNN                | ~84%                | Baseline, simple yet effective     |
| EfficientNetB0            | ~88%                | Optimized scaling, robust features |
| MobileNetV2               | ~86%                | Lightweight, efficient for edge devices |

</center>

<br>

<br><p><h5><i><b>4.3 Models by Santiago Caballero</b></i></h5></p>

Santiago Caballero implemented two models: a **Recurrent Neural Network (RNN)** and a lightweight **SqueezeNet CNN**. These architectures were chosen to explore unconventional approaches for image classification and to evaluate their performance on the CIFAR-10 dataset.

- **Model 1: Recurrent Neural Network (RNN)**

  The first model is an RNN, traditionally used for sequence data. Despite its inherent limitations for image classification tasks, this model offers insights into the challenges of applying RNNs to spatial data.

  - **Architecture**:
    - Each 32×32×3 image is reshaped into a sequence of 32 timesteps (one per image row), each with 96 features.
    - A gated recurrent unit (GRU) layer processes the rows in order.
    - A dense layer with softmax activation produces the final classification.

  - **Justification of Hyperparameters**:
      - *GRU Units*: 128 units balance complexity and computational efficiency.
      - *Dense Layers*: A single dense layer with 10 units corresponds to CIFAR-10 classes.
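
The way an image becomes a sequence for the recurrent layer can be shown with NumPy. This sketch mirrors the `Reshape((32, 32 * 3))` step of the model on a stand-in image: each timestep is one image row, with that row's pixels and color channels laid out as a 96-value feature vector.

```python
import numpy as np

# Stand-in image with distinct values so the layout can be checked
image = np.arange(32 * 32 * 3, dtype=np.float32).reshape(32, 32, 3)

# (height, width, channels) -> (timesteps, features)
sequence = image.reshape(32, 32 * 3)

# Timestep t holds row t of the image, flattened as R, G, B, R, G, B, ...
row_matches = np.array_equal(sequence[5], image[5].ravel())

print(sequence.shape)  # (32, 96)
print(row_matches)     # True
```

The GRU therefore sees vertical structure only through its hidden state, which is consistent with the weak spatial awareness noted in the evaluation.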

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense, Reshape

# Define the RNN model
def build_rnn(input_shape, num_classes):
    model = Sequential([
        Reshape((32, 32 * 3), input_shape=input_shape),  # Reshape to (timesteps, features)
        GRU(128, activation='relu', return_sequences=False),  # GRU expects rank 3 input
        Dense(num_classes, activation='softmax')  # Output layer
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# Train and evaluate the model
rnn_model = build_rnn((32, 32, 3), 10)
history_rnn = rnn_model.fit(train_generator, validation_data=val_generator, epochs=10, verbose=1)

<br>

- **Performance Evaluation**:
    - **Validation Accuracy**: ~42% (preliminary results).
    - **Strengths**: Demonstrates the adaptability of RNNs to new tasks.
    - **Weaknesses**: Poor performance for image data due to the lack of spatial awareness.

<br>

- **Model 2: SqueezeNet**

    The second model is **SqueezeNet**, a lightweight CNN optimized for efficiency by employing "fire modules" that reduce parameter count.

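    - **Architecture**:
        - The SqueezeNet design stacks fire modules, each of which "squeezes" the input channels with 1×1 convolutions before expanding them through parallel 1×1 and 3×3 convolutions.
        - The implementation below is a simplified variant: a 3×3 convolution, a single 1×1 squeeze convolution, and a single 3×3 expand convolution applied in sequence.
        - Concludes with a global average pooling layer and a softmax output for classification.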

    - **Justification of Hyperparameters**:
        - *Fire Modules*: Efficiently extract features while minimizing parameters.
        - *Global Average Pooling*: Reduces overfitting by summarizing feature maps without dense layers.
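
To see why fire modules reduce the parameter count, the sketch below compares one fire module (1×1 squeeze, then parallel 1×1 and 3×3 expand branches whose outputs are concatenated) with a plain 3×3 convolution producing the same number of output channels. The channel sizes are illustrative assumptions and are not taken from the model code in this section.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a 2D convolution."""
    return kernel * kernel * c_in * c_out + c_out

def fire_params(c_in, squeeze, expand_1x1, expand_3x3):
    """Squeeze with 1x1 convs, then expand with parallel 1x1 and 3x3 convs."""
    return (
        conv_params(1, c_in, squeeze)
        + conv_params(1, squeeze, expand_1x1)
        + conv_params(3, squeeze, expand_3x3)
    )

# Illustrative sizes: 96 input channels, 16 squeeze channels, 64 + 64 expand channels
fire = fire_params(96, 16, 64, 64)
plain = conv_params(3, 96, 64 + 64)

print(f"fire module: {fire} parameters")
print(f"plain 3x3 convolution: {plain} parameters")
```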

In [None]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dense

# Define the SqueezeNet model
def build_squeezenet(input_shape, num_classes):
    inputs = Input(shape=input_shape)
    x = Conv2D(64, (3, 3), activation='relu')(inputs)
    x = Conv2D(64, (1, 1), activation='relu')(x)  # Squeeze
    x = Conv2D(128, (3, 3), activation='relu')(x)  # Expand
    x = GlobalAveragePooling2D()(x)
    output = Dense(num_classes, activation='softmax')(x)
    model = Model(inputs=inputs, outputs=output)
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# Train and evaluate the model
squeezenet_model = build_squeezenet((32, 32, 3), 10)
history_squeezenet = squeezenet_model.fit(train_generator, validation_data=val_generator, epochs=10, verbose=1)

- **Performance Evaluation**:
    - **Validation Accuracy**: ~68% (preliminary results).
    - **Strengths**: Lightweight and efficient, suitable for resource-constrained environments.
    - **Weaknesses**: Lower accuracy compared to deeper architectures.

<br><p><h5><i><b>Summary of Santiago Caballero's Models</b></i></h5></p>

<center>

| Model                          | Validation Accuracy | Key Features                                  |
|--------------------------------|---------------------|-----------------------------------------------|
| Recurrent Neural Network (RNN) | ~42%                | GRU over image rows treated as a sequence     |
| SqueezeNet                     | ~68%                | Lightweight CNN inspired by fire modules      |

</center>