# Faster R-CNN Object Detection

In this notebook, we’ll explore **Faster R-CNN (Region-based Convolutional Neural Network)**, a two-stage object detection architecture that first proposes candidate regions and then classifies and refines them.

We'll cover:
- Concept of region-based CNNs
- Faster R-CNN architecture overview
- Using a pre-trained Faster R-CNN from torchvision
- Performing inference on sample images
- Understanding bounding boxes and predictions

---

## 1️⃣ What is Faster R-CNN?

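**Faster R-CNN** is a deep learning model for **object detection**. It builds upon the earlier **R-CNN** and **Fast R-CNN** models by replacing their external region proposal step (such as selective search) with a learned **Region Proposal Network (RPN)** that shares convolutional features with the detection head, which makes the pipeline end-to-end trainable.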

### ⚙️ Pipeline Overview:
1. **CNN Backbone (Feature Extractor)** – Extracts features from the image (e.g., ResNet50).
2. **Region Proposal Network (RPN)** – Suggests regions (bounding boxes) that may contain objects.
3. **RoI Pooling Layer** – Extracts a fixed-size feature map for each proposed region (torchvision's implementation uses RoIAlign for this step).
4. **Fully Connected Layers + Classifier** – Predicts class label and refines box coordinates.

**Key Benefits:**
- Strong accuracy on standard benchmarks such as COCO and Pascal VOC
- Works with relatively small datasets when starting from pre-trained weights (transfer learning)
- Easy to fine-tune on custom data

---

## 2️⃣ Import Dependencies

In [None]:
import torch
import torchvision
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms import functional as F
from PIL import Image
import requests
import matplotlib.pyplot as plt
import cv2
import numpy as np

print('Torch version:', torch.__version__)

## 3️⃣ Load Pre-trained Faster R-CNN Model

We’ll use the **ResNet-50 + FPN backbone** version of Faster R-CNN, pre-trained on the **COCO dataset**.

In [None]:
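# weights='DEFAULT' loads the COCO-pretrained weights (torchvision >= 0.13);
# the older pretrained=True argument is deprecated.
model = fasterrcnn_resnet50_fpn(weights='DEFAULT')
model.eval()  # Set model to evaluation mode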

print('Model loaded successfully!')

## 4️⃣ Load and Preprocess an Image

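We’ll use a sample image from the internet for inference. The model expects a list of images as **PyTorch tensors** of shape `[C, H, W]` with values in the `[0, 1]` range; resizing and mean/std normalization are handled inside the model.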

In [None]:
# Download and load an example image
url = 'https://ultralytics.com/images/zidane.jpg'
response = requests.get(url, stream=True, timeout=10)
response.raise_for_status()  # Fail early on HTTP errors
image = Image.open(response.raw).convert('RGB')

# Convert image to tensor
img_tensor = F.to_tensor(image)
print('Image shape:', img_tensor.shape)

## 5️⃣ Run Object Detection

We’ll pass the image tensor to the model, which returns one dictionary per input image containing the detected bounding boxes (`boxes`), class labels (`labels`), and confidence scores (`scores`).

In [None]:
# Run inference
with torch.no_grad():
    predictions = model([img_tensor])

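# Extract prediction data for the first (and only) image
boxes = predictions[0]['boxes']    # Tensor [N, 4] in (x1, y1, x2, y2) pixel coordinates
labels = predictions[0]['labels']  # Tensor [N] of COCO category ids
scores = predictions[0]['scores']  # Tensor [N] of confidence scores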

print('Number of objects detected:', len(boxes))

## 6️⃣ Visualize Detections

We’ll draw bounding boxes and labels on the original image using OpenCV for visualization.

In [None]:
# Convert PIL to OpenCV format
img_cv = np.array(image)
img_cv = cv2.cvtColor(img_cv, cv2.COLOR_RGB2BGR)

# COCO class labels, indexed by the category ids the model returns.
# 'N/A' marks ids that COCO leaves unused; they must stay so the indices line up.
COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
    'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
    'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
    'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
    'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]

# Draw boxes for predictions above a confidence threshold
for box, label_id, score in zip(boxes, labels, scores):
    if score > 0.6:
        x1, y1, x2, y2 = box.int().tolist()  # Plain Python ints for OpenCV
        label = COCO_INSTANCE_CATEGORY_NAMES[int(label_id)]
        cv2.rectangle(img_cv, (x1, y1), (x2, y2), (0, 255, 0), 2)
        # Keep the label inside the image when the box touches the top edge
        cv2.putText(img_cv, label, (x1, max(y1 - 10, 15)), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

# Convert BGR to RGB for display
plt.figure(figsize=(10, 8))
plt.imshow(cv2.cvtColor(img_cv, cv2.COLOR_BGR2RGB))
plt.axis('off')
plt.show()

## 7️⃣ Custom Model Training (Optional Overview)

Faster R-CNN can be fine-tuned on a **custom dataset** by following these steps:

1. Prepare your dataset with **images + bounding box annotations** (Pascal VOC format or COCO JSON).
2. Use **`torchvision.datasets`** or a custom dataset class to load them.
3. Modify the classifier head to match your number of classes:
```python
num_classes = 3  # background + 2 classes
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor(in_features, num_classes)
```
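4. Train using a standard optimizer and learning-rate scheduler (for example, SGD with momentum and a step scheduler). In training mode the model takes images together with their targets and returns a dictionary of losses to sum and backpropagate.
5. Save the trained weights using `torch.save()`, typically by saving `model.state_dict()`.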

## ✅ Summary

- **Faster R-CNN** is a two-stage detector (an RPN followed by a classification and box-regression head).
- Provides strong accuracy on standard detection benchmarks such as COCO and Pascal VOC.
- Pre-trained models from **torchvision** make it easy to use.
- Can be fine-tuned for custom applications.

---
**Next:** `13-Instance_Segmentation_Basics.ipynb` → Learn how to detect and segment objects at the pixel level using **Mask R-CNN**.