# Detectron2 & TFOD2 Assignment

# Question 1: What is Detectron2 and how does it differ from previous object detection frameworks?

- Detectron2 is an open-source computer vision framework developed by Facebook AI Research (FAIR). It’s built on PyTorch and is used mainly for object detection, instance segmentation, semantic segmentation, and keypoint detection.

- How is Detectron2 different from previous object detection frameworks?

1. Framework base

   - Detectron2: Built on PyTorch → dynamic computation graph, easier debugging, more flexible.

   - Older frameworks: The original Detectron was built on Caffe2 and the original Faster R-CNN code on Caffe; both relied on static graphs and were harder to debug and customize.

2. Modularity & design

   - Detectron2 has a modular architecture: the backbone, proposal generator, and ROI heads are separate components that can be swapped through the config system.

   - Older frameworks were more tightly coupled, so changing one component often meant editing others.

3. Performance & scalability

   - Detectron2 is optimized for:

      - Multi-GPU training

      - Large datasets

   - Earlier frameworks were harder to scale efficiently.

4. State-of-the-art models

   - Detectron2 ships ready-made implementations and pretrained weights (the Model Zoo) for models such as:

      - Faster R-CNN

      - Mask R-CNN

      - RetinaNet

      - Cascade R-CNN

      - Panoptic FPN


# Question 2: Explain the process and importance of data annotation when working with Detectron2.

- Data annotation is the process of labeling raw images with information such as:

   - Bounding boxes

   - Class labels

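   - Segmentation masks

   - Keypoints

- Annotation quality matters because the model learns directly from these labels: missing, inconsistent, or loosely drawn labels limit accuracy regardless of the model chosen.

- The annotation process: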

1. Define the task

   - First, decide what type of problem you’re solving:

      - Object detection → bounding boxes

      - Instance segmentation → pixel-level masks

      - Keypoint detection → landmarks (eyes, joints, etc.)

   - Each task needs different annotation fields (bbox, segmentation, or keypoints in COCO format).

2. Choose an annotation format

   - Detectron2 mainly supports:

      - COCO format (most common)

      - Pascal VOC (with conversion)

      - Custom datasets (registered manually)

3. Annotate the data

   - Use annotation tools like:

      - LabelImg (bounding boxes)

      - CVAT

      - Roboflow

      - LabelMe
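
As a rough illustration of what a COCO-format annotation looks like, the sketch below builds a one-image dataset by hand. The file name, class name, and coordinates are hypothetical; only the key names (`images`, `annotations`, `categories`, `bbox`, `area`, `iscrowd`) follow the COCO convention.

```python
import json


def corners_to_coco_bbox(xmin, ymin, xmax, ymax):
    """Convert corner coordinates to COCO's [x, y, width, height] layout."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]


# Hypothetical single-image dataset, for illustration only.
coco_example = {
    "images": [
        {"id": 1, "file_name": "deer_001.jpg", "width": 640, "height": 480}
    ],
    "categories": [{"id": 1, "name": "deer"}],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,        # links the box to the image above
            "category_id": 1,     # links the box to the "deer" category
            "bbox": corners_to_coco_bbox(100, 120, 300, 360),
            "area": (300 - 100) * (360 - 120),
            "iscrowd": 0,
        }
    ],
}

print(json.dumps(coco_example["annotations"][0]["bbox"]))
```

Most annotation tools export this structure directly or in a format that converts to it, so writing it by hand is rarely needed; the point is only to show how images, categories, and annotations reference each other by id.
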

# Question 3: Describe the steps involved in training a custom object detection model using Detectron2.

1. Install and set up Detectron2

    - Install PyTorch (CUDA version if GPU is available)

    - Install Detectron2 compatible with your PyTorch version

    - This ensures the environment is ready for training.

2. Prepare and annotate the dataset

    - Collect relevant images

    - Annotate objects using bounding boxes (COCO format preferred)

    - Assign correct class labels

    - Split dataset into training and validation sets

    - Detectron2 relies on high-quality labeled data.

3. Register the dataset

   - Register custom datasets using DatasetCatalog

    - Define class names using MetadataCatalog

    - This step allows Detectron2 to recognize your custom data.

4. Choose a base model (pre-trained weights)

    - Select a model from Detectron2 Model Zoo (e.g., Faster R-CNN, RetinaNet)

    - Load pre-trained weights (usually trained on COCO)

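    - This speeds up training and improves accuracy via transfer learning.

5. Configure training

    - Point the config at the registered training dataset

    - Set the number of classes, batch size, learning rate, and number of iterations

    - Choose an output directory for checkpoints and logs

6. Train and evaluate

    - Run training with DefaultTrainer

    - Evaluate on the validation set with COCOEvaluator and inspect the AP metrics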
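
The train/validation split from step 2 can be done directly on a COCO-style dictionary. The following is a minimal sketch, not a Detectron2 API: `split_coco`, its parameters, and the 20% default are assumptions made for illustration.

```python
import random


def split_coco(coco, val_fraction=0.2, seed=0):
    """Split a COCO-style dict into (train, val) dicts, image by image."""
    image_ids = sorted(img["id"] for img in coco["images"])
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    rng.shuffle(image_ids)
    n_val = max(1, int(len(image_ids) * val_fraction))
    val_ids = set(image_ids[:n_val])

    def subset(want_val):
        images = [im for im in coco["images"]
                  if (im["id"] in val_ids) == want_val]
        kept = {im["id"] for im in images}
        # Keep only the annotations that belong to the selected images.
        annotations = [a for a in coco["annotations"] if a["image_id"] in kept]
        return {
            "images": images,
            "annotations": annotations,
            "categories": coco["categories"],  # both splits share the classes
        }

    return subset(False), subset(True)
```

Each half can then be written to its own JSON file and registered under its own name, so that one name goes into the training config and the other is used for evaluation.
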


# Question 4: What are evaluation curves in Detectron2, and how are metrics like mAP and IoU interpreted?

- Evaluation curves are graphical representations that show how well an object detection model performs as the detection confidence threshold changes. In Detectron2, evaluation is usually done with the COCO evaluation protocol (COCOEvaluator), which is built on:

   - Precision–Recall (PR) curves

   - Average precision (AP) computed at different IoU thresholds, reported as AP, AP50, AP75, and size-specific variants

- Key Evaluation Curves

1. Precision–Recall (PR) Curve

   - Precision = Correct detections / Total detections = TP / (TP + FP)

   - Recall = Correct detections / Total ground-truth objects = TP / (TP + FN)

   - The PR curve plots:

      - Precision on Y-axis

      - Recall on X-axis
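
The PR curve and the average precision derived from it can be sketched in a few lines. This is a simplified illustration, not the COCO implementation: it assumes each detection has already been marked as a true or false positive, and it integrates the curve with all-point interpolation rather than COCO's 101 fixed recall points. The function names and sample detections are hypothetical.

```python
def precision_recall_points(detections, num_ground_truth):
    """detections: list of (confidence, is_true_positive) pairs.

    Returns one (recall, precision) point per detection, obtained by
    lowering the confidence threshold one detection at a time.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    points = []
    tp = fp = 0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        points.append((tp / num_ground_truth, tp / (tp + fp)))
    return points


def average_precision(points):
    """Area under the PR curve, using the best precision at or beyond each recall."""
    ap = 0.0
    prev_recall = 0.0
    for i, (recall, _) in enumerate(points):
        best_precision = max(p for _, p in points[i:])
        ap += (recall - prev_recall) * best_precision
        prev_recall = recall
    return ap


# Hypothetical detections for one class with 4 ground-truth objects.
sample = [(0.9, True), (0.8, False), (0.7, True), (0.6, True)]
curve = precision_recall_points(sample, num_ground_truth=4)
print(curve)
print(average_precision(curve))
```

Averaging this per-class value over all classes gives mAP for one IoU threshold.
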
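- How to interpret mAP and IoU together

    - IoU (Intersection over Union) measures how well a predicted box overlaps a ground-truth box: area of overlap / area of union. A detection counts as correct only if its IoU meets the chosen threshold.

    - mAP (mean Average Precision) is the area under the PR curve, averaged over classes. COCO also averages it over IoU thresholds from 0.50 to 0.95.

    - High AP at both loose (AP50) and strict (AP75) thresholds → Excellent detection and localization

    - High AP50, low AP75 → Correct objects detected, poor box alignment

    - Low AP, but well-aligned boxes on the objects that are found → Accurate boxes but many missed objects or false positives

    - Low at every threshold → Model is performing poorly on both detection and localization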
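
To make the IoU side concrete, here is a small sketch that computes IoU for two boxes and lists which COCO-style thresholds (0.50 to 0.95 in steps of 0.05) a match would satisfy. The box coordinates and the `(x1, y1, x2, y2)` layout are assumptions chosen for the example.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def thresholds_passed(iou_value):
    """IoU thresholds 0.50, 0.55, ..., 0.95 at which this match counts as correct."""
    thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
    return [t for t in thresholds if iou_value >= t]


prediction = (10, 10, 110, 110)      # hypothetical predicted box
ground_truth = (20, 20, 120, 120)    # hypothetical ground-truth box
score = iou(prediction, ground_truth)
print(round(score, 3), thresholds_passed(score))
```

A box that passes only the loosest thresholds contributes to AP50 but not to AP75, which is why the gap between those two numbers says something about localization quality.
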

# Question 5: Compare Detectron2 and TFOD2 in terms of features, performance, and ease of use.

1. Framework & Backend

   - Detectron2:

      - Built on PyTorch. Dynamic computation graph, easier debugging, very research-friendly.

   - TFOD2:

      - Built on TensorFlow 2. Supports eager execution, but training is driven by a large pipeline configuration file, which makes it feel heavier.

2. Performance

   - Detectron2

      - Optimized for multi-GPU training

      - Faster experimentation

      - Strong COCO benchmark performance

   - TFOD2

      - Stable but often slower to configure and train

      - Performance depends heavily on pipeline tuning

3. Ease of Use

   - Detectron2

      - Clean API

      - Minimal boilerplate

      - Steep learning curve at first, but smooth once understood

   - TFOD2

      - Heavy configuration files

      - Complex directory structure

      - Beginner-friendly tutorials, but customization gets messy

# Question 6: Write Python code to install Detectron2 and verify the installation.


In [None]:

!pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 \
--index-url https://download.pytorch.org/whl/cu118


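# The official install guide lists prebuilt Detectron2 wheels only for older
# PyTorch/CUDA combinations, so for PyTorch 2.x build from source instead
# (needs a C++ compiler, plus a CUDA toolkit matching PyTorch for GPU support).
!python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'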


import torch
import detectron2
from detectron2.utils.logger import setup_logger

setup_logger()

print("PyTorch version:", torch.__version__)
print("Detectron2 version:", detectron2.__version__)
print("CUDA available:", torch.cuda.is_available())


# Question 7: Annotate a dataset using any tool of your choice and convert the annotations to COCO format for Detectron2.


In [None]:
import os
import json
import xml.etree.ElementTree as ET
from detectron2.data.datasets import register_coco_instances

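# Assumes the images were annotated with a tool that exports Pascal VOC XML
# (for example LabelImg), with one XML file per image.
XML_DIR = "annotations"                  # folder with the VOC XML files
IMG_DIR = "images"                       # folder with the corresponding images
OUTPUT_JSON = "coco_annotations.json"    # COCO file written by this script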

coco = {
    "images": [],
    "annotations": [],
    "categories": []
}

category_map = {}
ann_id = 1
img_id = 1

# Sort so that image ids and category ids are assigned in a reproducible order
for xml_file in sorted(os.listdir(XML_DIR)):
    if not xml_file.endswith(".xml"):
        continue

    tree = ET.parse(os.path.join(XML_DIR, xml_file))
    root = tree.getroot()

    filename = root.find("filename").text
    width = int(root.find("size/width").text)
    height = int(root.find("size/height").text)

    coco["images"].append({
        "id": img_id,
        "file_name": filename,
        "width": width,
        "height": height
    })

    for obj in root.findall("object"):
        label = obj.find("name").text

        if label not in category_map:
            category_map[label] = len(category_map) + 1
            coco["categories"].append({
                "id": category_map[label],
                "name": label
            })

        bbox = obj.find("bndbox")
        # Some tools write coordinates as floats ("12.0"), so parse via float
        xmin = int(float(bbox.find("xmin").text))
        ymin = int(float(bbox.find("ymin").text))
        xmax = int(float(bbox.find("xmax").text))
        ymax = int(float(bbox.find("ymax").text))

        # VOC stores corners; COCO expects [x, y, width, height]
        coco["annotations"].append({
            "id": ann_id,
            "image_id": img_id,
            "category_id": category_map[label],
            "bbox": [xmin, ymin, xmax - xmin, ymax - ymin],
            "area": (xmax - xmin) * (ymax - ymin),
            "iscrowd": 0
        })
        ann_id += 1

    img_id += 1

with open(OUTPUT_JSON, "w") as f:
    json.dump(coco, f, indent=4)

register_coco_instances(
    "custom_dataset",
    {},
    OUTPUT_JSON,
    IMG_DIR
)
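
Before training on a converted file, it can help to run a quick consistency check. The sketch below is a hypothetical helper, not part of Detectron2; `check_coco` and the specific checks are assumptions about what is worth catching early (dangling ids, empty boxes, boxes outside the image).

```python
def check_coco(coco):
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = []
    images = {im["id"]: im for im in coco["images"]}
    category_ids = {c["id"] for c in coco["categories"]}
    seen_ann_ids = set()

    for ann in coco["annotations"]:
        if ann["id"] in seen_ann_ids:
            problems.append(f"duplicate annotation id {ann['id']}")
        seen_ann_ids.add(ann["id"])

        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id")

        image = images.get(ann["image_id"])
        if image is None:
            problems.append(f"annotation {ann['id']}: unknown image_id")
            continue

        x, y, w, h = ann["bbox"]
        if w <= 0 or h <= 0:
            problems.append(f"annotation {ann['id']}: empty box")
        elif x < 0 or y < 0 or x + w > image["width"] or y + h > image["height"]:
            problems.append(f"annotation {ann['id']}: box outside image")

    return problems
```

Calling `check_coco(coco)` on the dictionary built above and printing the result gives a quick view of any annotations that need to be fixed in the labeling tool.
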


# Question 8: Write a script to download pretrained weights and configure paths for training in Detectron2.

In [None]:
from detectron2.config import get_cfg
from detectron2 import model_zoo
import os

cfg = get_cfg()

cfg.merge_from_file(
    model_zoo.get_config_file(
        "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"
    )
)

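# Only the checkpoint URL is stored here; the weights are downloaded and
# cached the first time a trainer or predictor loads them.
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"
)

cfg.DATASETS.TRAIN = ("custom_dataset",)   # name passed to register_coco_instances
cfg.DATASETS.TEST = ()                     # no evaluation during training

cfg.DATALOADER.NUM_WORKERS = 2

cfg.SOLVER.IMS_PER_BATCH = 2         # total images per iteration across all GPUs
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 3000
cfg.SOLVER.STEPS = []                # no learning-rate decay steps
cfg.SOLVER.CHECKPOINT_PERIOD = 500   # save a checkpoint every 500 iterations

cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 3  # must equal the number of categories in the dataset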

cfg.OUTPUT_DIR = "./output"
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

with open(os.path.join(cfg.OUTPUT_DIR, "config.yaml"), "w") as f:
    f.write(cfg.dump())
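
Detectron2 counts training length in iterations rather than epochs, so it is useful to relate `cfg.SOLVER.MAX_ITER` and `cfg.SOLVER.IMS_PER_BATCH` to the dataset size. The helpers below are a small sketch of that arithmetic; the function names and the dataset size of 500 images are hypothetical.

```python
import math


def iterations_for_epochs(num_images, ims_per_batch, epochs):
    """Iterations needed to pass over the dataset `epochs` times."""
    return math.ceil(num_images * epochs / ims_per_batch)


def epochs_for_iterations(num_images, ims_per_batch, max_iter):
    """Approximate number of passes over the dataset for a given MAX_ITER."""
    return max_iter * ims_per_batch / num_images


# With the settings above (IMS_PER_BATCH = 2, MAX_ITER = 3000) and a
# hypothetical dataset of 500 training images:
print(epochs_for_iterations(num_images=500, ims_per_batch=2, max_iter=3000))
print(iterations_for_epochs(num_images=500, ims_per_batch=2, epochs=12))
```

For a much larger or smaller dataset, the same 3000 iterations would correspond to very different amounts of training, so `MAX_ITER` is worth recomputing whenever the dataset changes.
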


# Question 9: Show the steps and code to run inference using a trained Detectron2 model on a new image.


In [None]:
import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog
from detectron2 import model_zoo

cfg = get_cfg()
cfg.merge_from_file(
    model_zoo.get_config_file(
        "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"
    )
)

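cfg.MODEL.WEIGHTS = "output/model_final.pth"   # final checkpoint written by training

cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5    # drop detections below 50% confidence
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 3            # must match the value used for training
cfg.MODEL.DEVICE = "cpu"                       # use "cuda" if a GPU is available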

predictor = DefaultPredictor(cfg)

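image = cv2.imread("test.jpg")   # BGR image, the default input format for DefaultPredictor
if image is None:
    raise FileNotFoundError("Could not read test.jpg")

outputs = predictor(image)

# Class names are drawn only if the dataset's metadata (thing_classes) is
# available in this session; otherwise labels fall back to class indices.
metadata = MetadataCatalog.get("custom_dataset")
v = Visualizer(image[:, :, ::-1], metadata=metadata, scale=1.0)
result = v.draw_instance_predictions(outputs["instances"].to("cpu"))

# cv2.imshow needs a desktop GUI and fails in hosted notebooks such as Colab,
# so the result is written to disk instead.
cv2.imwrite("inference_output.jpg", result.get_image()[:, :, ::-1])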


# Question 10: You are assigned to build a wildlife monitoring system to detect and track different animal species in a forest using Detectron2. Describe the end-to-end pipeline from data collection to deploying the model, and how you would handle challenges like occlusion or nighttime detection.


- **Wildlife Monitoring System using Detectron2 – End-to-End Pipeline**

1. Data is collected using camera traps and drones placed in forest areas to capture animals under different conditions such as daytime, nighttime, and dense vegetation. The collected images and videos are annotated using tools like CVAT or LabelImg by labeling animal species with bounding boxes or instance masks and converting annotations to COCO format.

2. The data is preprocessed and augmented using techniques like resizing, flipping, brightness adjustment, motion blur, and synthetic occlusion to improve model robustness. A Detectron2 model such as Faster R-CNN or Mask R-CNN is fine-tuned using pretrained COCO weights. Model performance is evaluated using metrics like mAP and IoU, with special focus on occluded and low-light samples.

3. Occlusion is handled by training on partially visible animals, using instance segmentation (Mask R-CNN), multi-scale features, and temporal information from video frames. Nighttime detection is improved by training on infrared or thermal images, applying low-light augmentation, and using image enhancement techniques.

4. For video streams, detected animals are tracked using algorithms like DeepSORT or ByteTrack. The trained model is deployed on edge devices or central servers for real-time monitoring, and continuous retraining is performed using newly collected data.
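
To show the idea behind the tracking step, here is a deliberately simple greedy IoU tracker. It is a stand-in for illustration only, not DeepSORT or ByteTrack: it has no motion model or appearance features, and a track is dropped as soon as it goes one frame without a match. The class name, threshold, and box layout `(x1, y1, x2, y2)` are assumptions.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


class GreedyIoUTracker:
    """Links detections across frames by overlap with the previous frame."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}      # track id -> box from the most recent frame
        self.next_id = 1

    def update(self, boxes):
        """Return one track id per box in the current frame."""
        unmatched = dict(self.tracks)
        new_tracks = {}
        ids = []
        for box in boxes:
            best_id, best_iou = None, self.iou_threshold
            for track_id, prev_box in unmatched.items():
                overlap = iou(box, prev_box)
                if overlap >= best_iou:
                    best_id, best_iou = track_id, overlap
            if best_id is None:
                # No sufficiently overlapping track: start a new one.
                best_id = self.next_id
                self.next_id += 1
            else:
                del unmatched[best_id]   # each track is matched at most once
            new_tracks[best_id] = box
            ids.append(best_id)
        self.tracks = new_tracks         # unmatched tracks are dropped
        return ids


tracker = GreedyIoUTracker()
print(tracker.update([(0, 0, 10, 10), (50, 50, 60, 60)]))
print(tracker.update([(52, 51, 62, 61), (1, 1, 11, 11)]))
```

In a real deployment the boxes would come from the Detectron2 predictor for each video frame, and a production tracker would keep lost tracks alive for several frames so that briefly occluded animals retain their identity.
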