# Object Detection with TensorFlow API

## Overview
In this lesson, we will train an **Object Detection model** using the **TensorFlow Object Detection API**. You'll be able to choose between two datasets, **"PlasticinWater.coco.zip"** or **"PlasticinWaterNOAUGMENTS.coco.zip"**, based on your system's capabilities and runtime for Colab sessions. Both datasets are annotated in **COCO format**, with the primary difference being that **PlasticinWater.coco.zip** includes augmented images, whereas **PlasticinWaterNOAUGMENTS.coco.zip** contains the raw images without augmentation.

### Learning Objectives

By the end of this section, you will:
- Understand the structure and format of a dataset in **COCO format**.
- Learn how to load and preprocess a COCO-formatted dataset for object detection.
- Train a custom object detection model using the **TensorFlow Object Detection API** in Google Colab.
- Evaluate model performance using metrics such as **mAP** and visualize results.

---

## Downloading the Dataset

You can choose one of the following datasets based on your machine's capabilities and the available runtime in Colab:

1. **PlasticinWater.coco.zip**:
   - **Size**: 10,789 images (with augmentations).
   - **Description**: Includes a variety of augmented images for a more robust model. Useful if you have more computational power or access to Colab's extended runtimes.

2. **PlasticinWaterNOAUGMENTS.coco.zip**:
   - **Size**: 4,511 images (without augmentations).
   - **Description**: Ideal for machines with limited resources or Colab sessions with shorter runtimes.

**Download Links**:
- [PlasticinWater.coco.zip](https://drive.google.com/file/d/1RVPp7EmZ1H3R6IgX2azi0ScKQmNzkp2J/view?usp=sharing)
- [PlasticinWaterNOAUGMENTS.coco.zip](https://drive.google.com/file/d/1XO9uzUZsBfNhOS0DXi-W8V2Le80f0srs/view?usp=sharing)

---

## Understanding the COCO Format

The **COCO (Common Objects in Context)** format is a widely used annotation format in object detection tasks. It stores annotations for object detection, segmentation, and keypoint detection in a single **JSON file**. Here's how the dataset is structured:

### File Structure:
Both datasets are split into **train**, **valid** (validation), and **test** subfolders, each containing:
- `_annotations.coco.json`: The COCO-formatted annotations file for the respective split.
- Images for that split, with annotations stored in the COCO JSON file.

### COCO JSON Structure:
1. **Images**: Contains metadata about each image, such as its ID, file name, and dimensions.
2. **Annotations**: Contains the **bounding boxes**, **category IDs**, and **segmentation masks** (if any) for each image. 
3. **Categories**: Defines the classes of objects in the dataset (e.g., "plastic").

**Example**:
```json
{
    "images": [
        {
            "id": 1,
            "file_name": "000001.jpg",
            "height": 360,
            "width": 640
        }
    ],
    "annotations": [
        {
            "image_id": 1,
            "category_id": 1,
            "bbox": [100, 50, 150, 200],
            "area": 30000,
            "iscrowd": 0
        }
    ],
    "categories": [
        {"id": 1, "name": "Orange_Plastic_Cap"}
    ]
}
```

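- `bbox`: The bounding box in `[x, y, width, height]` format, where `(x, y)` is the top-left corner in pixels.
- `iscrowd`: Indicates whether the annotation covers a crowd of objects (`1`) or a single object (`0`).
- `area`: The area of the annotated region in pixels; for box-only annotations this is width × height (here 150 × 200 = 30,000).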
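
To make the field descriptions concrete, the sketch below parses the example annotation shown above and converts its `bbox` from COCO's `[x, y, width, height]` pixel format into normalized `[ymin, xmin, ymax, xmax]` corners, which is the layout the Object Detection API uses for detection boxes. The helper name `coco_bbox_to_normalized_corners` is hypothetical and only serves this illustration.

```python
import json

# The example annotation file from above, inlined so the sketch is self-contained.
coco_json = """
{
    "images": [{"id": 1, "file_name": "000001.jpg", "height": 360, "width": 640}],
    "annotations": [
        {"image_id": 1, "category_id": 1, "bbox": [100, 50, 150, 200],
         "area": 30000, "iscrowd": 0}
    ],
    "categories": [{"id": 1, "name": "Orange_Plastic_Cap"}]
}
"""


def coco_bbox_to_normalized_corners(bbox, img_width, img_height):
    """Convert COCO [x, y, width, height] pixels to normalized [ymin, xmin, ymax, xmax]."""
    x, y, w, h = bbox
    return [y / img_height, x / img_width, (y + h) / img_height, (x + w) / img_width]


coco = json.loads(coco_json)
images_by_id = {img["id"]: img for img in coco["images"]}
names_by_id = {cat["id"]: cat["name"] for cat in coco["categories"]}

converted = []
for ann in coco["annotations"]:
    # Each annotation points at its image and category through their IDs.
    img = images_by_id[ann["image_id"]]
    corners = coco_bbox_to_normalized_corners(ann["bbox"], img["width"], img["height"])
    converted.append((img["file_name"], names_by_id[ann["category_id"]], corners))
    print(img["file_name"], names_by_id[ann["category_id"]], [round(c, 3) for c in corners])
```

The same lookup pattern (image by `image_id`, class name by `category_id`) applies to the full `_annotations.coco.json` files in each split.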

## Preparing the Environment

Before training, we'll set up our environment in Google Colab. Follow these steps to get started:

### Install TensorFlow Object Detection API

In [1]:
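# Clone the TensorFlow Models repository
!git clone https://github.com/tensorflow/models.git

# Navigate to the research directory
%cd models/research/

# Compile the protobuf definitions used by the Object Detection API
!protoc object_detection/protos/*.proto --python_out=.

# Copy the TF2 setup script into place and install the API with its dependencies
!cp object_detection/packages/tf2/setup.py .
!python -m pip install .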




### Download and Extract the Dataset

You’ll need to upload and extract your chosen dataset.

In [None]:
import zipfile
import os

# Path to the uploaded dataset (ensure you upload the dataset file manually via Colab interface)
zip_file = '/content/PlasticinWater.coco.zip'  # OR 'PlasticinWaterNOAUGMENTS.coco.zip'

# Extract the zip file
with zipfile.ZipFile(zip_file, 'r') as zip_ref:
    zip_ref.extractall('/content/PlasticinWater/')
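
Before moving on, it can help to confirm that the archive extracted into the expected layout. The following is a minimal sketch, assuming the `train`, `valid`, and `test` folder names used in this lesson; the helper `find_missing_annotation_files` is a hypothetical name introduced here.

```python
import os


def find_missing_annotation_files(dataset_root, splits=("train", "valid", "test")):
    """Return the expected annotation paths that do not exist under dataset_root."""
    missing = []
    for split in splits:
        path = os.path.join(dataset_root, split, "_annotations.coco.json")
        if not os.path.isfile(path):
            missing.append(path)
    return missing


# Change the folder name if you extracted the NOAUGMENTS dataset elsewhere.
missing = find_missing_annotation_files("/content/PlasticinWater/")
if missing:
    print("Missing annotation files:")
    for path in missing:
        print("  ", path)
else:
    print("All three splits contain an annotations file.")
```

If any path is reported as missing, check whether the archive extracted into an extra nested folder before adjusting the paths used in later cells.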


### Loading the Dataset

Next, we'll prepare the COCO dataset for use with TensorFlow's Object Detection API.
:::{note}
If you chose the other dataset, replace `PlasticinWater` with `PlasticinWaterNOAUGMENTS` in the paths below.
:::

In [None]:
import tensorflow as tf
from object_detection.utils import config_util
from object_detection.builders import model_builder
from object_detection.utils import visualization_utils as viz_utils

# Define the paths to the train and validation sets
train_images_dir = '/content/PlasticinWater/train/'
valid_images_dir = '/content/PlasticinWater/valid/'
train_annotations_file = '/content/PlasticinWater/train/_annotations.coco.json'
valid_annotations_file = '/content/PlasticinWater/valid/_annotations.coco.json'


### Converting COCO Annotations to TensorFlow TFRecord Format

To use the COCO dataset with TensorFlow, we need to convert the COCO annotations to TFRecord format.

In [None]:
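# Script to convert COCO JSON to TFRecord (available from TensorFlow Object Detection API)
# An absolute script path is used because the working directory is now models/research/.
# The script also expects the test-split flags, so they point at the test folder here.
!python /content/models/research/object_detection/dataset_tools/create_coco_tf_record.py \
    --logtostderr \
    --train_image_dir="$train_images_dir" \
    --val_image_dir="$valid_images_dir" \
    --test_image_dir="/content/PlasticinWater/test/" \
    --train_annotations_file="$train_annotations_file" \
    --val_annotations_file="$valid_annotations_file" \
    --testdev_annotations_file="/content/PlasticinWater/test/_annotations.coco.json" \
    --output_dir='/content/tfrecords/'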
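
Alongside the TFRecords, the Object Detection API reads class names from a label map written in protobuf text format. The sketch below shows one way to derive that text from the `categories` list of a COCO file. The helper `build_label_map_pbtxt` is a hypothetical name, and the sketch assumes category IDs start at 1, since ID 0 is reserved for the background class in label maps.

```python
def build_label_map_pbtxt(categories):
    """Render COCO categories as label map text, one `item` block per class."""
    entries = []
    for cat in sorted(categories, key=lambda c: c["id"]):
        if cat["id"] < 1:
            # Label map IDs must be positive; remap the dataset IDs first if needed.
            raise ValueError(f"category id {cat['id']} is not a valid label map id")
        entries.append("item {\n  id: %d\n  name: '%s'\n}\n" % (cat["id"], cat["name"]))
    return "\n".join(entries)


# Categories taken from the example annotation file shown earlier.
categories = [{"id": 1, "name": "Orange_Plastic_Cap"}]
label_map_text = build_label_map_pbtxt(categories)
print(label_map_text)
```

In a notebook you could write `label_map_text` to a file (for example a path such as `/content/label_map.pbtxt`, chosen here purely for illustration) and reference that file from the pipeline configuration.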


## Transfer Learning in Object Detection

**Transfer learning** is a popular technique used in deep learning, especially in object detection tasks, where it is often difficult to train a model from scratch due to the large amount of data and time required. Instead of training a model from the ground up, **transfer learning** allows you to start with a pre-trained model that has already learned useful features from a large dataset (like COCO), and then fine-tune it for your specific task.

### Why Use Transfer Learning?

1. **Faster Convergence**: Pre-trained models already have learned basic features (such as edges, textures, and shapes) from large datasets. By using a pre-trained model, your model can converge faster because it doesn't need to learn these features from scratch.
   
2. **Smaller Dataset Requirement**: Transfer learning allows you to use smaller, task-specific datasets, as the model has already learned a lot of general features from the larger dataset. For instance, in detecting plastics in water, you can start with a model pre-trained on the COCO dataset and fine-tune it on your dataset.

3. **Better Performance**: Models trained from scratch may struggle to achieve high performance if the dataset is too small or not diverse enough. Using a pre-trained model improves performance, especially when the target task (plastic detection in water) shares similarities with the pre-training task (object detection in general).

---

### Walking Through the Code

The following code demonstrates how to build a detection model from a pre-trained model's configuration using the **TensorFlow Object Detection API** so that it can be fine-tuned for your task.

### Code Breakdown

#### 1. **Loading the Pipeline Configuration File**
```python
pipeline_config = '/content/models/research/object_detection/configs/tf2/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.config'
```

- This line specifies the path to the **pipeline configuration file**. In this case, we are using a pre-built configuration file for the **SSD ResNet50 FPN model**.
  - **SSD ResNet50 FPN**: This is a model architecture that combines a **Single Shot Multibox Detector (SSD)** with a **ResNet50** backbone, enhanced with a **Feature Pyramid Network (FPN)**. This architecture balances speed and accuracy, making it well-suited for detecting small objects like plastics in water.
  - **640x640**: This is the image input size used by the model, which has been pre-configured for training on the COCO dataset.
  - **TPU-8**: This indicates that the configuration was tuned for training on a TPU (Tensor Processing Unit) with 8 cores. You can still use it with GPUs, although the batch size set in the file may need to be reduced to fit GPU memory.

#### 2. **Loading the Configuration File**
```python
configs = config_util.get_configs_from_pipeline_file(pipeline_config)
model_config = configs['model']
```

- This section uses **`config_util.get_configs_from_pipeline_file()`** to read the pipeline configuration file and load it into a Python dictionary named **`configs`**.
  - The **`configs`** dictionary contains different sections of the model configuration, such as the model architecture, training hyperparameters, input pipeline, and more.
  
- **`model_config = configs['model']`** extracts the specific configuration for the model itself (e.g., SSD with ResNet50 backbone and FPN) from the overall configuration file.
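
The stock configuration files shipped with the API generally contain placeholder paths and a class count sized for COCO, so they usually need editing before training on a custom dataset. One common approach, sketched here as an assumption rather than as this lesson's prescribed method, is to patch the config text with regular expressions. The sample text below is an abbreviated stand-in for a real config, and the checkpoint path, label map path, class count, and batch size are hypothetical placeholders.

```python
import re

# Abbreviated stand-in for a stock pipeline config; the real file has many more fields.
sample_config = '''
model {
  ssd {
    num_classes: 90
  }
}
train_config {
  batch_size: 64
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED"
  fine_tune_checkpoint_type: "classification"
}
train_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED"
}
eval_config {
  batch_size: 1
}
eval_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED"
}
'''


def patch_pipeline_config(text, num_classes, batch_size, checkpoint_path, label_map_path):
    """Return a copy of the config text with dataset-specific values filled in."""
    text = re.sub(r"num_classes: \d+", lambda m: f"num_classes: {num_classes}", text)
    # count=1 touches only the first batch_size, assumed to belong to train_config.
    text = re.sub(r"batch_size: \d+", lambda m: f"batch_size: {batch_size}", text, count=1)
    text = re.sub(r'fine_tune_checkpoint: ".*?"',
                  lambda m: f'fine_tune_checkpoint: "{checkpoint_path}"', text)
    # Fine-tuning from a full detection checkpoint uses the "detection" type.
    text = re.sub(r'fine_tune_checkpoint_type: ".*?"',
                  'fine_tune_checkpoint_type: "detection"', text)
    text = re.sub(r'label_map_path: ".*?"',
                  lambda m: f'label_map_path: "{label_map_path}"', text)
    return text


patched = patch_pipeline_config(
    sample_config,
    num_classes=1,                                           # hypothetical class count
    batch_size=8,                                            # hypothetical GPU-friendly value
    checkpoint_path="/content/pretrained/checkpoint/ckpt-0",  # hypothetical path
    label_map_path="/content/label_map.pbtxt",               # hypothetical path
)
print(patched)
```

The `input_path` entries of the train and eval input readers need the same treatment so that they point at the TFRecord files you generated.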

#### 3. **Building the Detection Model**
```python
detection_model = model_builder.build(model_config=model_config, is_training=True)
```

- **`model_builder.build()`** is a function from the TensorFlow Object Detection API that takes the model configuration and constructs the detection model based on the specified architecture.
  - **`model_config=model_config`**: Passes the loaded model configuration to the builder function to create the model as specified (SSD with ResNet50 FPN backbone).
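  - **`is_training=True`**: This flag builds the model in **training mode**, meaning it is ready to be fine-tuned on the new dataset (i.e., plastics in water). In training mode, layers such as batch normalization behave as they do during training, and the model's weights are updated during backpropagation. Note that building the model does not load pre-trained weights by itself; they are restored from the checkpoint named in the pipeline configuration when training starts.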


In [None]:
# Load the pipeline config and build the detection model
pipeline_config = '/content/models/research/object_detection/configs/tf2/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.config'

# Load the configuration file
configs = config_util.get_configs_from_pipeline_file(pipeline_config)
model_config = configs['model']
detection_model = model_builder.build(model_config=model_config, is_training=True)


### Train the Model

This command **fine-tunes a pre-trained model** using the TensorFlow Object Detection API. Fine-tuning allows us to adapt the model to the specific task of detecting plastics in water, without needing to train from scratch.
```python
!python /content/models/research/object_detection/model_main_tf2.py
```
with the parameters chosen as follows:
- **Fine-Tuning**: Starts from pre-trained weights (as set by `fine_tune_checkpoint` in the pipeline configuration), so the model only needs to adapt to new data.
- **10000 Training Steps**: `--num_train_steps=10000` sets the number of steps for which the model will train.
- **Evaluation Samples**: `--sample_1_of_n_eval_examples=1` uses every validation example during evaluation; a value of `n` would use one out of every `n` examples.

By using this command, we fine-tune a pre-trained model to fit our dataset while saving significant time compared to training a new model from scratch.


:::{note}
**Steps vs. Epochs**

In object detection, training is often measured in **steps**, while in classification tasks, we usually refer to **epochs**. Here’s a comparison:

- **Epochs (Classification Models)**: An epoch refers to one complete pass over the entire training dataset. For example, training for 10 epochs means the model has seen every image in the dataset 10 times.
  
- **Steps (Object Detection Models)**: A **step** refers to one iteration where a batch of images is passed through the model. Training for 10,000 steps means that the model processes 10,000 batches of images, but it may not see every image in the dataset depending on the batch size and total dataset size.

- The TensorFlow Object Detection API configures and reports training in **steps**, so the training length does not depend on the dataset size, whereas many classification workflows count **epochs**. Both describe the same process and differ only in the unit used.

:::
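
The relationship between steps and epochs is simple arithmetic: epochs = steps × batch size ÷ number of training images. The sketch below applies it to the 10,000 training steps and the two dataset sizes quoted in this lesson. The batch size of 8 is an assumption made for illustration; use the value from your own pipeline configuration.

```python
import math


def steps_to_epochs(num_steps, batch_size, num_images):
    """Number of passes over the data made by num_steps batches."""
    return num_steps * batch_size / num_images


def epochs_to_steps(num_epochs, batch_size, num_images):
    """Number of steps needed to see every image num_epochs times."""
    return math.ceil(num_epochs * num_images / batch_size)


BATCH_SIZE = 8       # assumed value, not taken from the lesson
NUM_STEPS = 10_000   # matches --num_train_steps in the training command

# Sizes quoted for the whole datasets; the train split alone is smaller,
# so the real number of epochs would be somewhat higher.
DATASET_SIZES = {"PlasticinWater": 10_789, "PlasticinWaterNOAUGMENTS": 4_511}

for name, num_images in DATASET_SIZES.items():
    epochs = steps_to_epochs(NUM_STEPS, BATCH_SIZE, num_images)
    print(f"{name}: {NUM_STEPS} steps at batch size {BATCH_SIZE} is about {epochs:.1f} epochs")
```

Under these assumptions, the same step budget covers the smaller dataset more than twice as many times as the augmented one.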

In [None]:
# Train the model (example uses fine-tuning from a pre-trained model)
# An absolute script path is used because the working directory is now models/research/.
!python /content/models/research/object_detection/model_main_tf2.py \
    --pipeline_config_path=$pipeline_config \
    --model_dir='/content/model/' \
    --num_train_steps=10000 \
    --sample_1_of_n_eval_examples=1 \
    --alsologtostderr


### Evaluate the Model

In [None]:
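# Run evaluation on the validation set.
# Passing --checkpoint_dir switches the script to evaluation mode; summaries are written under model_dir.
# The evaluation loop waits for new checkpoints until --eval_timeout (in seconds) expires.
!python /content/models/research/object_detection/model_main_tf2.py \
    --pipeline_config_path=$pipeline_config \
    --model_dir='/content/model/' \
    --checkpoint_dir='/content/model/' \
    --eval_timeout=60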


## Reflecting on the Evaluation Results

Once the evaluation is complete, you will be presented with several key metrics and visualizations. **Open the evaluation output** and take note of the following:

### **mAP (Mean Average Precision)**
- Reflect on the **mAP** values, particularly **mAP@[IoU=0.50]** and **mAP@[IoU=0.75]**.
  - What do these values tell you about the model’s ability to correctly predict bounding boxes?
  - Is there a large difference between **mAP@0.50** and **mAP@0.75**? If so, what does that say about the model's precision at higher IoU thresholds (tighter bounding box predictions)?
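
The IoU thresholds behind these two metrics can be illustrated with a small calculation. The sketch below computes IoU for boxes in COCO's `[x, y, width, height]` format, using the ground-truth box from the earlier annotation example and two made-up predictions. The function name `iou_xywh` and the predicted boxes are hypothetical.

```python
def iou_xywh(box_a, box_b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


ground_truth = [100, 50, 150, 200]
predictions = {
    "tight prediction": [110, 60, 150, 200],   # shifted by 10 pixels
    "loose prediction": [130, 80, 150, 200],   # shifted by 30 pixels
}

for label, box in predictions.items():
    iou = iou_xywh(ground_truth, box)
    print(f"{label}: IoU={iou:.3f}, "
          f"counts at 0.50: {iou >= 0.50}, counts at 0.75: {iou >= 0.75}")
```

A prediction that only clears the 0.50 threshold contributes to mAP@0.50 but not to mAP@0.75, which is why a large gap between the two values points to loosely placed boxes.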

### **Per-Class mAP**
- Look at the **mAP for each class**. In this case, reflect on the performance for detecting plastics in water.
  - Is the model performing better for one class over others? If so, why might this be the case (e.g., more training data for certain classes, better distinguishability)?
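
To check whether differences in per-class performance line up with the amount of training data, you can count the annotations per category in a COCO file. This is a minimal sketch on a toy dictionary. The second class name and all counts are made up for illustration, and `count_annotations_per_category` is a hypothetical helper.

```python
from collections import Counter


def count_annotations_per_category(coco):
    """Map each category name to the number of annotations that use it."""
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    counts = Counter(names[ann["category_id"]] for ann in coco["annotations"])
    # Include categories with zero annotations so gaps are visible.
    return {name: counts.get(name, 0) for name in names.values()}


# Toy data in COCO layout; load your train split's JSON with json.load() instead.
toy_coco = {
    "categories": [
        {"id": 1, "name": "Orange_Plastic_Cap"},
        {"id": 2, "name": "Clear_Plastic_Bottle"},  # hypothetical class
    ],
    "annotations": [
        {"image_id": 1, "category_id": 1},
        {"image_id": 1, "category_id": 1},
        {"image_id": 2, "category_id": 1},
        {"image_id": 2, "category_id": 2},
    ],
}

for name, count in count_annotations_per_category(toy_coco).items():
    print(f"{name}: {count} annotations")
```

A class with far fewer annotations than the others is a natural first suspect when its mAP lags behind.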

### **Precision and Recall**
- Take note of the **precision** and **recall** scores provided in the evaluation output.
  - Is there a balance between precision and recall? For instance, if precision is high but recall is low, the model might be too conservative and missing some detections. On the other hand, if recall is high but precision is low, the model may be making too many false positives.
  - Reflect on what this balance means for your model in the context of plastic detection in water.
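
The trade-off described above follows directly from the definitions precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below evaluates both for two made-up scenarios; the counts are hypothetical and only illustrate the conservative and permissive cases.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall), using 0.0 when a denominator is zero."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts (TP, FP, FN) for 100 ground-truth objects.
scenarios = {
    "conservative model": (40, 2, 60),    # few detections, most of them correct
    "permissive model": (90, 110, 10),    # many detections, many of them wrong
}

for label, (tp, fp, fn) in scenarios.items():
    precision, recall = precision_recall(tp, fp, fn)
    print(f"{label}: precision={precision:.2f}, recall={recall:.2f}")
```

For plastic detection, the preferred balance depends on whether a missed item or a false alarm is more costly in your application.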

### **Confusion Matrix**
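- Review the **confusion matrix**, if your evaluation tooling produces one (the COCO metrics reported by the Object Detection API do not include it by default), to see where your model is making mistakes.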
  - Are there specific classes or categories that are often misclassified? Reflect on what might be causing these misclassifications (e.g., similarities between objects, poor quality images, overlapping bounding boxes).
  - Consider what steps you could take to improve these results (e.g., more data, different augmentations, improving model architecture).
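
If you need to build a confusion matrix yourself, one simple approach is to tally pairs of (true class, predicted class) for detections that have already been matched to ground-truth boxes, for example by IoU. The sketch below assumes that matching has been done. The pairs, the second class name, and the helper `build_confusion_matrix` are all hypothetical.

```python
def build_confusion_matrix(pairs, class_names):
    """Rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(class_names)}
    matrix = [[0] * len(class_names) for _ in class_names]
    for true_name, predicted_name in pairs:
        matrix[index[true_name]][index[predicted_name]] += 1
    return matrix


class_names = ["Orange_Plastic_Cap", "Clear_Plastic_Bottle"]  # second name is made up

# Hypothetical (true, predicted) pairs for matched detections.
pairs = (
    [("Orange_Plastic_Cap", "Orange_Plastic_Cap")] * 3
    + [("Orange_Plastic_Cap", "Clear_Plastic_Bottle")]
    + [("Clear_Plastic_Bottle", "Clear_Plastic_Bottle")] * 2
)

matrix = build_confusion_matrix(pairs, class_names)
for name, row in zip(class_names, matrix):
    print(f"{name:>22}: {row}")
```

Off-diagonal entries show which classes are confused with each other; unmatched detections and missed objects would need extra rows or columns that this sketch leaves out.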

### **Loss Values**
- Check the **classification loss** and **localization loss** values.
  - Are these values low? If the classification loss is high, it may suggest the model is struggling to assign the correct class labels. If the localization loss is high, the model may be having trouble placing accurate bounding boxes around objects.
  - Compare these losses to the training losses. Are they significantly different? If so, reflect on whether your model may be overfitting or underfitting.

---

## Activity: Interpreting Your Results

Now that you’ve reviewed the key metrics and reflected on their meaning, summarize the evaluation results by answering the following questions:
1. **What does the mAP tell you about your model's performance?**
2. **Is there a clear balance between precision and recall? How might this affect the model's usability in real-world applications?**
3. **Does the confusion matrix reveal any common misclassifications? If so, what could be the cause?**
4. **How do the validation loss values compare to the training loss values? Is the model overfitting or underfitting?**

---

By reflecting on these metrics and considering what they reveal about your model, you’ll gain valuable insights into its strengths and weaknesses. Use this information to guide future improvements and adjustments to your object detection model.


### Visualizing Results

Metrics and training graphs are informative, but another useful check is to watch the model perform **inference** on unseen test images. Visualizing the predictions can reveal patterns that are not clear from the metrics alone. For example, you can spot cases where the model consistently misplaces bounding boxes, misses small objects, or misclassifies certain items.

This is particularly useful in object detection, where success depends not only on recognizing that an object is present but also on placing an accurate bounding box around it. By visualizing results on test images, you can quickly assess whether the model is over-detecting objects (too many false positives) or missing important detections (false negatives). You may also observe patterns in failure cases, such as objects partially out of frame or overlapping plastics in the water.

Visualization also helps you evaluate the model’s **confidence scores**, which tell you how certain the model is about its predictions. A prediction with a high confidence score but a visibly incorrect bounding box or class indicates that the model is overconfident in that instance, suggesting a need for additional tuning or more diverse training data.

This process adds a qualitative aspect to the quantitative evaluation, allowing you to better judge how your model performs in real-world scenarios. It is an essential step in deciding how to improve the model, whether that involves tweaking hyperparameters, adding more data, or applying different augmentations.


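import json
import os

import cv2
import matplotlib.pyplot as plt
import tensorflow as tf
from object_detection.builders import model_builder
from object_detection.utils import visualization_utils as viz_utils

# Rebuild the model in inference mode and restore the latest fine-tuned checkpoint.
# This assumes model_config matches the configuration that was used for training.
detection_model = model_builder.build(model_config=model_config, is_training=False)
ckpt = tf.train.Checkpoint(model=detection_model)
ckpt.restore(tf.train.latest_checkpoint('/content/model/')).expect_partial()

# Build the category index (class ID -> name) from the COCO categories.
# This assumes the IDs in the JSON match the label map used during training.
test_images_dir = '/content/PlasticinWater/test/'
with open(os.path.join(test_images_dir, '_annotations.coco.json')) as f:
    categories = json.load(f)['categories']
category_index = {c['id']: {'id': c['id'], 'name': c['name']} for c in categories}

# Get the first image from the test directory, skipping the annotations file
test_image_files = sorted(
    name for name in os.listdir(test_images_dir)
    if name.lower().endswith(('.jpg', '.jpeg', '.png'))
)
first_image_path = os.path.join(test_images_dir, test_image_files[0])

# Load the first test image and convert OpenCV's BGR channel order to RGB
image_np = cv2.cvtColor(cv2.imread(first_image_path), cv2.COLOR_BGR2RGB)

# Run the detection model on a float32 batch containing the single image
input_tensor = tf.convert_to_tensor(image_np[None, ...], dtype=tf.float32)
preprocessed, shapes = detection_model.preprocess(input_tensor)
prediction_dict = detection_model.predict(preprocessed, shapes)
detections = detection_model.postprocess(prediction_dict, shapes)

# Visualize the detection results on the image.
# The model's class indices start at 0, while label map IDs start at 1.
label_id_offset = 1
viz_utils.visualize_boxes_and_labels_on_image_array(
    image_np,
    detections['detection_boxes'][0].numpy(),
    (detections['detection_classes'][0].numpy() + label_id_offset).astype(int),
    detections['detection_scores'][0].numpy(),
    category_index,
    use_normalized_coordinates=True,
    min_score_thresh=0.5
)

# Display the result (the image is already in RGB order)
plt.figure(figsize=(10, 10))
plt.imshow(image_np)
plt.axis('off')
plt.show()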
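
The `min_score_thresh=0.5` argument above hides low-confidence detections. To see how the choice of threshold changes what is kept, the sketch below applies a similar filter in plain Python and converts the surviving normalized boxes to pixel coordinates. The detections are made-up values, and both helper names are hypothetical.

```python
def keep_confident_detections(boxes, classes, scores, min_score=0.5):
    """Keep (box, class, score) triples whose score exceeds min_score."""
    return [(b, c, s) for b, c, s in zip(boxes, classes, scores) if s > min_score]


def to_pixel_box(box, img_width, img_height):
    """Convert normalized [ymin, xmin, ymax, xmax] to pixel (xmin, ymin, xmax, ymax)."""
    ymin, xmin, ymax, xmax = box
    return (round(xmin * img_width), round(ymin * img_height),
            round(xmax * img_width), round(ymax * img_height))


# Hypothetical detections for a 640x360 image, in normalized coordinates.
boxes = [[0.1, 0.2, 0.5, 0.6], [0.3, 0.3, 0.4, 0.4], [0.0, 0.0, 1.0, 1.0]]
classes = [1, 1, 1]
scores = [0.92, 0.55, 0.12]

for threshold in (0.3, 0.5, 0.9):
    kept = keep_confident_detections(boxes, classes, scores, min_score=threshold)
    print(f"threshold {threshold}: {len(kept)} detection(s) kept")
    for box, cls, score in kept:
        print("   class", cls, "score", score, "pixels", to_pixel_box(box, 640, 360))
```

Trying a few thresholds on real test images is a quick way to judge whether the default of 0.5 suits your model.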