
# **Image Segmentation & Mask R-CNN - Assignment**

**Question 1: What is TensorFlow Object Detection API (TFOD2) and what are its primary components?**

The TensorFlow Object Detection API (TFOD2) is an open-source framework built on TensorFlow that simplifies building, training, and deploying object detection models, which localize and classify objects in images and video. Its primary components are the Model Zoo (a collection of pre-trained models), pipelines and their configuration files, training scripts, and export tools.

**Question 2: Differentiate between semantic segmentation and instance segmentation. Provide examples of where each might be used.**

Semantic segmentation assigns every pixel a class label, so all cars in a scene share the single label "car". It suits scene-understanding tasks, such as separating road, sidewalk, and sky. Instance segmentation additionally separates each individual object of a class (e.g., car_1, car_2), which enables counting objects and telling them apart. It suits tasks that need object-level precision, such as autonomous driving or robotics, where distinguishing between multiple vehicles is key.
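
The difference can be seen in how the two outputs are stored. The sketch below is only an illustration on a toy grid: the `instances` list and the helper `to_semantic_map` are made up for this example. Collapsing instance masks into a semantic map keeps the class of each pixel but loses the number of objects.

```python
from collections import Counter

# Instance segmentation output: one pixel set per object (toy 3x6 image).
instances = [
    {"class": "car", "pixels": {(1, 0), (1, 1)}},
    {"class": "car", "pixels": {(1, 3), (1, 4)}},
    {"class": "person", "pixels": {(0, 5)}},
]


def to_semantic_map(instances, height, width, background="bg"):
    """Collapse per-object masks into a single per-pixel class map."""
    grid = [[background] * width for _ in range(height)]
    for inst in instances:
        for row, col in inst["pixels"]:
            grid[row][col] = inst["class"]
    return grid


semantic = to_semantic_map(instances, height=3, width=6)

# The semantic map only knows which classes are present ...
semantic_classes = {label for row in semantic for label in row if label != "bg"}
# ... while the instance list also knows how many objects of each class there are.
instance_counts = Counter(inst["class"] for inst in instances)

print(semantic_classes)
print(dict(instance_counts))
```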

**Question 3: Explain the Mask R-CNN architecture. How does it extend Faster R-CNN?**

Mask R-CNN extends Faster R-CNN from object detection to instance segmentation. It keeps the Faster R-CNN pipeline: a shared backbone (like ResNet) for feature extraction, a Region Proposal Network (RPN), and a head that predicts a class label and a bounding box for each region. To this it adds a parallel branch that predicts a pixel-level mask for each detected object. Two further components matter for segmentation quality: a Feature Pyramid Network (FPN), which helps with objects at different scales, and RoIAlign, which replaces RoIPool and avoids its coordinate rounding so that the extracted features stay aligned with the image pixels.
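
The effect of RoIAlign can be illustrated on a tiny feature map. This is a simplified sketch, not the full operator: real RoIAlign samples several points per output bin and pools them, and the function names here are chosen for this example only. It compares bilinear sampling at a fractional location with snapping the location to an integer cell, which is the kind of quantization RoIPool introduces.

```python
import math


def bilinear_sample(feature_map, y, x):
    """Interpolate a 2-D grid at a fractional (y, x) location."""
    height, width = len(feature_map), len(feature_map[0])
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    dy, dx = y - y0, x - x0
    top = feature_map[y0][x0] * (1 - dx) + feature_map[y0][x1] * dx
    bottom = feature_map[y1][x0] * (1 - dx) + feature_map[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def quantized_sample(feature_map, y, x):
    """Snap the location to an integer cell (floor is used here for illustration)."""
    return feature_map[int(y)][int(x)]


feature_map = [[0.0, 10.0],
               [20.0, 30.0]]

aligned = bilinear_sample(feature_map, 0.5, 0.5)   # uses all four neighbours
pooled = quantized_sample(feature_map, 0.5, 0.5)   # ignores the fractional part

print(aligned, pooled)
```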

**Question 4: Describe the purpose of masks in image segmentation. How are they used during training and inference?**

In image segmentation, a mask is a pixel-level map that marks which pixels belong to an object or region of interest (ROI). Masks tell the model both what is present and where it is, which matters in applications like autonomous driving or medical analysis.

During training, annotated masks act as ground truth: the model's predicted masks are compared with the actual pixel labels, and the resulting error is minimized. During inference, the trained model outputs its own predicted masks for new images. These are typically resized to match the input image's size and thresholded to give the final object outlines.
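
A minimal sketch of both uses, assuming the mask branch outputs one probability per pixel (as Mask R-CNN's per-pixel sigmoid does). The helper names and the toy 2x2 masks are illustrative only. Training compares predictions with the ground-truth mask through an average binary cross-entropy; inference turns the probabilities into a binary mask with a threshold.

```python
import math


def mask_bce_loss(pred_probs, target_mask, eps=1e-7):
    """Average per-pixel binary cross-entropy between a soft mask and a 0/1 mask."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred_probs, target_mask):
        for p, t in zip(pred_row, target_row):
            p = min(max(p, eps), 1 - eps)  # avoid log(0)
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
            count += 1
    return total / count


def binarize(pred_probs, threshold=0.5):
    """Inference step: turn per-pixel probabilities into a 0/1 mask."""
    return [[1 if p > threshold else 0 for p in row] for row in pred_probs]


target = [[1, 0],
          [1, 0]]
good_prediction = [[0.9, 0.1],
                   [0.8, 0.2]]
bad_prediction = [[0.4, 0.6],
                  [0.3, 0.7]]

print(mask_bce_loss(good_prediction, target))  # small loss
print(mask_bce_loss(bad_prediction, target))   # larger loss
print(binarize(good_prediction))
```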

**Question 5: What are the steps involved in training a custom image segmentation model using TFOD2?**

Training a custom image segmentation model using the TensorFlow Object Detection API (TFOD2) involves several key steps:

The process begins with data collection and annotation, where we gather images and use a tool like LabelMe to create pixel-level segmentation masks for the custom objects.

Next, we prepare the dataset by dividing the annotated images into training and testing sets and converting them into the TFRecord format, which is required by the TFOD API. We also need to create a label map file (label_map.pbtxt) that maps class names to numeric IDs.
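
As a small illustration of the label map step, the sketch below writes the `item { id ... name ... }` entries that a `label_map.pbtxt` file contains. The helper `build_label_map` and the class names are hypothetical. IDs start at 1 because 0 is reserved for the background class.

```python
def build_label_map(class_names):
    """Return label-map text with one `item` entry per class, ids starting at 1."""
    items = []
    for class_id, name in enumerate(class_names, start=1):
        items.append(f"item {{\n  id: {class_id}\n  name: '{name}'\n}}\n")
    return "\n".join(items)


label_map_text = build_label_map(["car", "truck"])  # hypothetical classes
print(label_map_text)

# To save it next to the TFRecord files:
# with open("label_map.pbtxt", "w") as f:
#     f.write(label_map_text)
```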

Then, we configure the training pipeline by selecting a pre-trained model (like Mask R-CNN) from the TensorFlow 2 Detection Model Zoo and modifying its configuration file to point to our data, specify the number of classes, and adjust hyperparameters.
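
One common way to make these changes from a notebook is plain text substitution on the `pipeline.config` file. The sketch below works on a stripped-down, hypothetical stand-in for a Model Zoo config; a real file has many more fields. All paths and the helpers `set_field` and `customize` are assumptions made for this example.

```python
import re

# Stripped-down, hypothetical stand-in for a Model Zoo pipeline.config.
SAMPLE_CONFIG = """\
model {
  faster_rcnn {
    num_classes: 90
  }
}
train_config {
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED"
}
train_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED"
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED"
  }
}
"""


def set_field(text, field, value):
    """Replace every `field: <old value>` entry with `field: <value>`."""
    pattern = rf'\b{re.escape(field)}: (?:"[^"]*"|\S+)'
    return re.sub(pattern, lambda _match: f"{field}: {value}", text)


def customize(text, num_classes, checkpoint, label_map, train_record):
    """Point the config at custom data; string values are wrapped in quotes."""
    text = set_field(text, "num_classes", num_classes)
    text = set_field(text, "fine_tune_checkpoint", f'"{checkpoint}"')
    text = set_field(text, "label_map_path", f'"{label_map}"')
    text = set_field(text, "input_path", f'"{train_record}"')
    return text


custom_config = customize(
    SAMPLE_CONFIG,
    num_classes=1,
    checkpoint="pretrained/checkpoint/ckpt-0",   # hypothetical path
    label_map="data/label_map.pbtxt",            # hypothetical path
    train_record="data/train.record",            # hypothetical path
)
print(custom_config)
```

Note that `set_field` replaces every occurrence of a field. A full config also has an evaluation input reader with its own `input_path`, which would need to be set separately to the test record.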

The next step is to train the model using the configured pipeline and data, monitoring its progress and metrics (like loss) with TensorBoard.

After training is complete, we export the trained checkpoint as a SavedModel that can be loaded for inference.

Finally, we can test the model by running inference on new images to evaluate its performance and visualize the segmentation results.
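
The training and export steps are usually run with two scripts that ship with the API, `model_main_tf2.py` and `exporter_main_v2.py`. The sketch below only assembles the command lines; the directory names are hypothetical, and the commands are meant to be run from the `models/research` folder.

```python
import shlex


def training_command(pipeline_config, model_dir):
    """Command that starts training and writes checkpoints and logs to model_dir."""
    return [
        "python", "object_detection/model_main_tf2.py",
        f"--pipeline_config_path={pipeline_config}",
        f"--model_dir={model_dir}",
        "--alsologtostderr",
    ]


def export_command(pipeline_config, checkpoint_dir, output_dir):
    """Command that exports the latest checkpoint as a SavedModel."""
    return [
        "python", "object_detection/exporter_main_v2.py",
        "--input_type=image_tensor",
        f"--pipeline_config_path={pipeline_config}",
        f"--trained_checkpoint_dir={checkpoint_dir}",
        f"--output_directory={output_dir}",
    ]


# Hypothetical locations for this example.
train_cmd = training_command("training/pipeline.config", "training/")
export_cmd = export_command("training/pipeline.config", "training/", "exported_model/")

print(shlex.join(train_cmd))
print(shlex.join(export_cmd))
```

The lists can be passed to `subprocess.run(cmd, check=True)`, and TensorBoard can be pointed at the same `model_dir` to follow the loss during training.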

**Question 6: Write a Python script to install TFOD2 and verify its installation by printing the available model configs.**

In [None]:
# Clone the TensorFlow Models repository
!git clone https://github.com/tensorflow/models.git

# Move into the models/research folder
%cd models/research

!pwd

# Compile the protobuf definitions to Python
!protoc object_detection/protos/*.proto --python_out=.

# Copy the TF2 setup file into the current folder
!cp object_detection/packages/tf2/setup.py .

In [None]:
# Install the object_detection package using the copied setup.py
!pip install .

In [None]:
# Installing other dependencies
!pip install tf-models-official

In [None]:
import os
# Verification: Print available model configs
print("\n--- Verifying Installation: Available Model Configs ---")
config_dir = 'object_detection/configs/tf2'
if os.path.exists(config_dir):
    configs = [f for f in os.listdir(config_dir) if f.endswith('.config')]
    print(f"Found {len(configs)} model configurations:")
    for cfg in sorted(configs):
        print(f" - {cfg}")
else:
    print("Error: Model configuration directory not found.")

**Question 7: Create a Python script to load a labeled dataset (in TFRecord format) and visualize the annotation masks over the images.**

In [None]:
!pip install -q tensorflow opencv-python matplotlib

In [None]:
import tensorflow as tf
import numpy as np
import cv2
import matplotlib.pyplot as plt

In [None]:
import requests

url = "https://images.pexels.com/photos/13872248/pexels-photo-13872248.jpeg"
image_path = "/content/sample_image.jpg"

response = requests.get(url)
with open(image_path, "wb") as f:
    f.write(response.content)

print("Image downloaded successfully")

In [None]:
import matplotlib.pyplot as plt
import cv2

img = cv2.imread(image_path)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

plt.figure(figsize=(6,6))
plt.imshow(img)
plt.axis("off")
plt.show()

In [None]:
from PIL import Image
import requests
from io import BytesIO
import matplotlib.pyplot as plt

url = "https://images.pexels.com/photos/13872248/pexels-photo-13872248.jpeg"
img = Image.open(BytesIO(requests.get(url).content))

plt.imshow(img)
plt.axis("off")
plt.show()

Alternatively, run a pre-trained TFOD2 detection model on the image and draw its predictions:

In [None]:
# Install TFOD2 and Dependencies
!git clone https://github.com/tensorflow/models.git
%cd models/research
!protoc object_detection/protos/*.proto --python_out=.
!cp object_detection/packages/tf2/setup.py .
!pip install .
!pip install tf-models-official

In [None]:
!pip install protobuf==3.20.*

In [None]:
from object_detection.utils import label_map_util, visualization_utils as viz_utils

import numpy as np
import cv2
import tensorflow as tf
from matplotlib import pyplot as plt

%matplotlib inline

In [None]:
# Download a pretrained Model
model_name = "faster_rcnn_resnet101_v1_1024x1024_coco17_tpu-8"
!wget http://download.tensorflow.org/models/object_detection/tf2/20200711/{model_name}.tar.gz
!tar -xvf {model_name}.tar.gz

In [None]:
# 1. Use pretrained model path
PATH_TO_SAVED_MODEL = "faster_rcnn_resnet101_v1_1024x1024_coco17_tpu-8/saved_model"

# 2. Load model
detect_fn = tf.saved_model.load(PATH_TO_SAVED_MODEL)

In [None]:
# 3. Use COCO label map (download and place it if needed)
category_index = label_map_util.create_category_index_from_labelmap(
    "/content/models/research/object_detection/data/mscoco_label_map.pbtxt", use_display_name=True
)

In [None]:
import requests
import os

def download_image(url, filename):
  """Downloads an image from a given URL and saves it to a file using requests.

  Args:
    url: The URL of the image.
    filename: The name of the file to save the image to.
  """
  try:
    # Add a User-Agent header to mimic a browser request
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
    }
    response = requests.get(url, headers=headers, stream=True)
    response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

    with open(filename, 'wb') as out_file:
        for chunk in response.iter_content(chunk_size=8192):
            out_file.write(chunk)

    print(f"Image downloaded successfully to {filename}")
  except requests.exceptions.RequestException as e:
    print(f"Error downloading image: {e}")

# Now call the modified download_image function
download_image("https://images.pexels.com/photos/13872248/pexels-photo-13872248.jpeg", "pexels-photo-13872248.jpg")

In [None]:
# Read and display the image
import cv2
from google.colab.patches import cv2_imshow

im = cv2.imread("/content/pexels-photo-13872248.jpg")
if im is not None:
    cv2_imshow(im)
else:
    print("Error: Could not read the image file.")

In [None]:
# 4. Test image path
image_path = "/content/pexels-photo-13872248.jpg"
image_np = cv2.imread(image_path)

In [None]:
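# 5. Inference (cv2.imread returns BGR; convert to RGB before passing it to the model)
image_rgb = cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB)
input_tensor = tf.convert_to_tensor([image_rgb], dtype=tf.uint8)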

detections = detect_fn(input_tensor)

In [None]:
detections

In [None]:
# 6. Visualization
viz_utils.visualize_boxes_and_labels_on_image_array(
    image_np,
    detections['detection_boxes'][0].numpy(),
    detections['detection_classes'][0].numpy().astype(np.int32),
    detections['detection_scores'][0].numpy(),
    category_index,
    use_normalized_coordinates=True,  # detection_boxes are normalized to [0, 1]
    line_thickness=15
)
# 7. Show image
plt.figure(figsize=(12,12))
plt.imshow(cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB))
plt.axis('off')
plt.show()
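
To read annotation masks from a TFRecord itself, a minimal sketch could look like the following. It assumes the records use the TFOD2 feature keys `image/encoded` and `image/object/mask`, with masks stored as PNG-encoded bytes. The path `train.record` and all function names are hypothetical. TensorFlow and matplotlib are imported inside the functions so that the overlay helper can be used on its own.

```python
import os

import numpy as np

TFRECORD_PATH = "train.record"  # hypothetical path to a TFOD2-style TFRecord


def overlay_mask(image, mask, color, alpha=0.5):
    """Alpha-blend `color` into an RGB uint8 image wherever `mask` is nonzero."""
    blended = image.astype(np.float32)  # astype returns a copy
    selected = np.asarray(mask).astype(bool)
    blended[selected] = ((1 - alpha) * blended[selected]
                         + alpha * np.asarray(color, dtype=np.float32))
    return blended.astype(np.uint8)


def read_images_and_masks(tfrecord_path):
    """Yield (image, [mask, ...]) pairs, assuming PNG-encoded instance masks."""
    import tensorflow as tf  # imported lazily, see the note above

    feature_spec = {
        "image/encoded": tf.io.FixedLenFeature([], tf.string),
        "image/object/mask": tf.io.VarLenFeature(tf.string),
    }
    for record in tf.data.TFRecordDataset(tfrecord_path):
        example = tf.io.parse_single_example(record, feature_spec)
        image = tf.io.decode_image(
            example["image/encoded"], channels=3, expand_animations=False
        ).numpy()
        masks = [
            tf.io.decode_png(png, channels=1).numpy()[..., 0]
            for png in example["image/object/mask"].values
        ]
        yield image, masks


def show_first_example(tfrecord_path):
    """Draw every annotation mask of the first record over its image."""
    import matplotlib.pyplot as plt

    palette = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]
    image, masks = next(read_images_and_masks(tfrecord_path))
    for i, mask in enumerate(masks):
        image = overlay_mask(image, mask, palette[i % len(palette)])
    plt.figure(figsize=(8, 8))
    plt.imshow(image)
    plt.axis("off")
    plt.show()


if os.path.exists(TFRECORD_PATH):
    show_first_example(TFRECORD_PATH)
```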

**Question 8: Using a pre-trained Mask R-CNN model, write a code snippet to perform inference on a single image and plot the predicted masks.**


In [None]:
!pip install -q torch torchvision matplotlib pillow requests

In [None]:
import torch
import torchvision
import torchvision.transforms as T
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
import requests
from io import BytesIO

In [None]:
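# `pretrained=True` is deprecated since torchvision 0.13; `weights="DEFAULT"` loads the default COCO weights
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")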
model.eval()

In [None]:
image_url = "https://images.pexels.com/photos/13872248/pexels-photo-13872248.jpeg"

image = Image.open(BytesIO(requests.get(image_url).content)).convert("RGB")

In [None]:
transform = T.Compose([T.ToTensor()])
img_tensor = transform(image)

In [None]:
with torch.no_grad():
    predictions = model([img_tensor])

In [None]:
def plot_masks(image, predictions, score_threshold=0.5):
    img = np.array(image)
    masks = predictions[0]['masks']
    scores = predictions[0]['scores']

    plt.figure(figsize=(8,8))
    plt.imshow(img)

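    for i in range(len(masks)):
        if scores[i] > score_threshold:
            # Binarize the soft mask (values in [0, 1]) and hide background
            # pixels so that only the object region is tinted
            mask = masks[i, 0].cpu().numpy() > 0.5
            overlay = np.ma.masked_where(~mask, mask.astype(float))
            plt.imshow(overlay, alpha=0.4, cmap='autumn', vmin=0, vmax=1)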

    plt.axis("off")
    plt.show()

plot_masks(image, predictions)

**Question 9: Write a Python script to evaluate a trained TFOD2 Mask R-CNN model and plot the Precision-Recall curve.**

In [None]:
import os
import numpy as np
import matplotlib.pyplot as plt
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# 1. Path to your COCO ground truth and detection results
# You can generate 'detections.json' by running the model on the test set
ann_file = 'path/to/kangaroo_coco_annotations.json'
det_file = 'path/to/detections.json'

def plot_precision_recall_curve(ann_file, det_file):
    # Load ground truth and detections
    coco_gt = COCO(ann_file)
    coco_dt = coco_gt.loadRes(det_file)

    # Initialize COCOeval object (iouType='segm' for Mask R-CNN, 'bbox' for boxes)
    coco_eval = COCOeval(coco_gt, coco_dt, iouType='segm')
    coco_eval.evaluate()
    coco_eval.accumulate()
    coco_eval.summarize()

    # Extract Precision-Recall data
    # precision has shape [T, R, K, A, M]
    # T: iou thresholds [0.5:0.05:0.95] (index 0 is 0.5)
    # R: recall thresholds [0:0.01:1]
    # K: categories
    # A: area ranges
    # M: max detections
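    # Indices: IoU=0.5, all recall thresholds, first category, area='all', maxDets=100
    # (entries are -1 where a category has no ground truth)
    precision = coco_eval.eval['precision'][0, :, 0, 0, 2]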
    recall = np.linspace(0, 1, 101)

    # Plot the curve
    plt.figure(figsize=(8, 6))
    plt.plot(recall, precision, color='blue', lw=2, label='Mask R-CNN (IoU=0.5)')
    plt.fill_between(recall, precision, alpha=0.2, color='blue')
    plt.xlabel('Recall')
    plt.ylabel('Precision')
    plt.title('Precision-Recall Curve (Mask R-CNN)')
    plt.legend(loc="lower left")
    plt.grid(True)
    plt.show()

# Run the plotting function
# plot_precision_recall_curve(ann_file, det_file)
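
For intuition about where the points on the curve come from, the sketch below computes precision and recall by hand. It is a simplified illustration rather than the COCO procedure, which also interpolates precision at fixed recall thresholds. The function name and the toy detections are made up. Detections are sorted by score, and each one has already been marked as a true or false positive at a chosen IoU threshold.

```python
def precision_recall_points(scores, is_true_positive, num_ground_truth):
    """Return (precisions, recalls) after each detection, in descending score order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = false_positives = 0
    precisions, recalls = [], []
    for i in order:
        if is_true_positive[i]:
            true_positives += 1
        else:
            false_positives += 1
        precisions.append(true_positives / (true_positives + false_positives))
        recalls.append(true_positives / num_ground_truth)
    return precisions, recalls


# Toy example: 4 detections, 4 ground-truth objects.
scores = [0.9, 0.8, 0.7, 0.6]
matched = [True, False, True, True]
precisions, recalls = precision_recall_points(scores, matched, num_ground_truth=4)

for p, r in zip(precisions, recalls):
    print(f"precision={p:.2f} recall={r:.2f}")
```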

**Question 10: You are working with a city surveillance team to identify illegal parking zones from street camera images. The model you built detects cars using bounding boxes, but the team reports inaccurate overlaps with sidewalks and fails in complex street scenes.
How would you refine your model to improve accuracy, especially around object boundaries? What segmentation strategy and tools would you use?**

To improve illegal parking detection around sidewalk boundaries, I would transition from simple bounding boxes to Instance Segmentation using models like Mask R-CNN or YOLOv8/v11, which provide pixel-level masks to accurately delineate cars from sidewalks.

To handle complex street scenes, I would employ a multi-task learning strategy that segments cars, sidewalks, and curb lines together, using a semantic segmentation model such as DeepLabV3+ for the static regions (sidewalks and curbs).

For training, I would utilize semantic segmentation data with polygonal annotations created in Labelbox or V7 Labs, and apply augmentation techniques like random cropping, rotation, and lighting changes to improve robustness against occlusions and varied weather.

Finally, I would incorporate a post-processing step that calculates the percentage of each car mask that overlaps the labeled sidewalk mask, and flags a car as illegally parked when that percentage exceeds a chosen threshold.
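
A minimal sketch of that post-processing step, assuming both masks are binary arrays of the same shape. The function names and the 20% threshold are illustrative choices, not tuned values.

```python
import numpy as np


def sidewalk_overlap_fraction(car_mask, sidewalk_mask):
    """Fraction of the car's pixels that also lie on the sidewalk."""
    car = np.asarray(car_mask, dtype=bool)
    sidewalk = np.asarray(sidewalk_mask, dtype=bool)
    car_area = car.sum()
    if car_area == 0:
        return 0.0
    return float(np.logical_and(car, sidewalk).sum() / car_area)


def is_illegally_parked(car_mask, sidewalk_mask, threshold=0.2):
    """Flag the car when more than `threshold` of its mask is on the sidewalk."""
    return sidewalk_overlap_fraction(car_mask, sidewalk_mask) > threshold


# Toy 4x6 scene: the sidewalk covers the two left-most columns.
sidewalk = np.zeros((4, 6), dtype=bool)
sidewalk[:, :2] = True

# The car covers 8 pixels, 2 of which are on the sidewalk.
car = np.zeros((4, 6), dtype=bool)
car[1:3, 1:5] = True

fraction = sidewalk_overlap_fraction(car, sidewalk)
print(f"{fraction:.0%} of the car is on the sidewalk")
print("Illegally parked:", is_illegally_parked(car, sidewalk))
```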