Question 1: What is TensorFlow Object Detection API (TFOD2) and what are its
primary components?

->TensorFlow Object Detection API (TFOD2) is an open-source framework developed by Google using TensorFlow 2 for building, training, evaluating, and deploying object detection models. It helps in identifying and locating objects in images or videos by drawing bounding boxes and assigning class labels.

The primary components of TFOD2 include pre-trained models from the model zoo (such as SSD, Faster R-CNN, and EfficientDet), model architectures consisting of a backbone and detection head, data pipelines for handling images and annotations, training and evaluation tools, configuration files for setting model parameters, and inference tools for exporting models for deployment.

Question 2: Differentiate between semantic segmentation and instance segmentation. Provide examples of where each might be used.

->Semantic segmentation classifies each pixel in an image into a predefined category but does not distinguish between different objects of the same class. All objects belonging to the same class are treated as one region.
Example: Road and lane detection in autonomous driving, where all road pixels are labeled as “road”.

Instance segmentation not only classifies each pixel but also separates different instances of the same object class. Each object is detected individually with its own mask.
Example: Counting people or vehicles in a crowded scene, where each person or car must be identified separately.
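
As a toy illustration of the difference, the sketch below takes a small made-up semantic mask (1 = "car", 0 = background) and splits it into separate instances with a simple connected-component pass. Real instance segmentation models do not work this way (touching objects would merge here); `label_instances` and the data are hypothetical and only show what each kind of output can answer.

```python
from collections import deque

# Toy 5x7 semantic mask: 1 = "car" pixel, 0 = background (made-up data).
semantic_mask = [
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
]

def label_instances(mask):
    """Give each 4-connected blob of foreground pixels its own id (1, 2, ...)."""
    h, w = len(mask), len(mask[0])
    ids = [[0] * w for _ in range(h)]
    next_id = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] == 1 and ids[r][c] == 0:
                next_id += 1
                ids[r][c] = next_id
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one blob
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] == 1 and ids[ny][nx] == 0):
                            ids[ny][nx] = next_id
                            queue.append((ny, nx))
    return ids, next_id

instance_ids, num_instances = label_instances(semantic_mask)
car_pixels = sum(map(sum, semantic_mask))

print("semantic view, car pixels:", car_pixels)
print("instance view, separate cars:", num_instances)
```

The semantic mask can only say how many pixels are "car"; the instance ids are what make counting individual cars possible.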


Question 3: Explain the Mask R-CNN architecture. How does it extend Faster R-CNN?

->Mask R-CNN is an advanced deep learning model used for instance segmentation. It is an extension of Faster R-CNN, designed to not only detect objects but also generate a pixel-level mask for each detected object.

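Mask R-CNN follows the Faster R-CNN pipeline: a backbone network (such as ResNet) extracts features, and a Region Proposal Network (RPN) generates object proposals. Features for each proposed region are then extracted with RoIAlign, which replaces RoIPool. Instead of rounding region coordinates to the feature-map grid, RoIAlign samples features at fractional positions using bilinear interpolation, which preserves the pixel-level alignment that mask prediction needs.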
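
To make the RoIAlign idea concrete, here is a minimal sketch that compares snapping a fractional sampling location to a grid cell (the kind of quantization RoIPool performs) with bilinear interpolation (the sampling RoIAlign uses). The 3x3 feature map and the helper `bilinear_sample` are made up for illustration; a real RoIAlign layer samples several points per output bin and pools them.

```python
import math

def bilinear_sample(feature_map, y, x):
    """Sample a 2D feature map at a fractional (y, x) inside its bounds."""
    h, w = len(feature_map), len(feature_map[0])
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = feature_map[y0][x0] * (1 - dx) + feature_map[y0][x1] * dx
    bottom = feature_map[y1][x0] * (1 - dx) + feature_map[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

feature_map = [
    [0.0, 1.0, 2.0],
    [3.0, 4.0, 5.0],
    [6.0, 7.0, 8.0],
]

y, x = 0.5, 1.5  # fractional location produced by projecting an RoI
quantized = feature_map[int(y)][int(x)]       # snap to a cell, losing the offset
aligned = bilinear_sample(feature_map, y, x)  # keep the sub-cell offset

print("quantized sample:", quantized)
print("bilinear sample:", aligned)
```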

In addition to the classification and bounding box regression branches of Faster R-CNN, Mask R-CNN adds a parallel mask prediction branch that outputs a binary mask for each object. This enables precise instance-level segmentation.

Question 4: Describe the purpose of masks in image segmentation. How are they used
during training and inference?

->In image segmentation, a mask is a pixel-level representation that indicates which pixels belong to a particular object or class. Masks help models understand the exact shape and boundaries of objects, rather than just their location.

During training, ground-truth masks are used as target labels. The model learns to predict accurate pixel-wise regions by comparing predicted masks with the true masks through a mask loss, such as the per-pixel binary cross-entropy used by Mask R-CNN.
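
A common choice of mask loss is per-pixel binary cross-entropy averaged over the mask. The sketch below computes it in plain Python for tiny made-up masks; `mask_bce_loss` is a hypothetical helper, not a TFOD2 function, and a real framework would compute this on tensors.

```python
import math

def mask_bce_loss(pred_probs, gt_mask, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and a 0/1 mask."""
    total, count = 0.0, 0
    for pred_row, gt_row in zip(pred_probs, gt_mask):
        for p, t in zip(pred_row, gt_row):
            p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
            count += 1
    return total / count

gt_mask = [[1, 1], [0, 0]]              # ground-truth mask (made-up)
good_pred = [[0.9, 0.8], [0.1, 0.2]]    # close to the ground truth
bad_pred = [[0.2, 0.1], [0.9, 0.8]]     # nearly inverted

good_loss = mask_bce_loss(good_pred, gt_mask)
bad_loss = mask_bce_loss(bad_pred, gt_mask)
print("loss for good prediction:", round(good_loss, 4))
print("loss for bad prediction:", round(bad_loss, 4))
```

The better the predicted probabilities match the ground-truth pixels, the lower the loss, which is the signal the mask branch is trained on.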

During inference, the trained model generates masks for new images. These predicted masks are overlaid on the image to highlight detected objects or regions, enabling tasks like object counting, medical image analysis, and scene understanding.
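
At inference time a mask head typically outputs per-pixel probabilities that are thresholded into a binary mask before being overlaid or measured. The following sketch assumes a 0.5 threshold and a made-up 3x3 probability map; `binarize` is a hypothetical helper.

```python
def binarize(mask_probs, threshold=0.5):
    """Turn per-pixel probabilities into a 0/1 mask."""
    return [[1 if p > threshold else 0 for p in row] for row in mask_probs]

# Made-up mask probabilities for one detected object.
mask_probs = [
    [0.05, 0.60, 0.95],
    [0.10, 0.85, 0.90],
    [0.02, 0.20, 0.55],
]

binary_mask = binarize(mask_probs)
object_pixels = sum(map(sum, binary_mask))
coverage = object_pixels / (len(binary_mask) * len(binary_mask[0]))

print("binary mask:", binary_mask)
print("object pixels:", object_pixels)
print("fraction of image covered:", round(coverage, 3))
```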


Question 5: What are the steps involved in training a custom image segmentation
model using TFOD2?

->Dataset Preparation
Collect images and create pixel-level annotations (masks). Convert the dataset into TFRecord format.

Label Map Creation
Define all object classes in a label_map.pbtxt file.

Model Selection
Choose a suitable pre-trained model (e.g., Mask R-CNN) from the TensorFlow Model Zoo.

Configuration Setup
Modify the pipeline .config file to set paths, number of classes, batch size, and learning rate.

Training the Model
Run the TFOD2 training script (model_main_tf2.py) to fine-tune the model on the custom dataset.

Evaluation
Evaluate the model using validation data to measure performance (e.g., mAP, mask accuracy).

Model Export & Inference
Export the trained model (exporter_main_v2.py) for inference and test it on new images.
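
As a small illustration of the label map step above, the sketch below generates the text of a label_map.pbtxt file from a list of class names. The class names and the helper `build_label_map` are hypothetical; ids start at 1 because 0 is reserved for the background class.

```python
def build_label_map(class_names):
    """Return label map text with one `item` entry per class, ids starting at 1."""
    entries = []
    for class_id, name in enumerate(class_names, start=1):
        entries.append("item {\n  id: %d\n  name: '%s'\n}\n" % (class_id, name))
    return "\n".join(entries)

# Example classes (assumed for illustration).
label_map_text = build_label_map(["car", "person"])
print(label_map_text)

# To write the file used by the pipeline config:
# with open("label_map.pbtxt", "w") as f:
#     f.write(label_map_text)
```

The number of entries should match the num_classes value set in the pipeline .config file.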



In [9]:
'''Question 6: Write a Python script to install TFOD2 and verify its installation by printing
the available model configs.
'''
!pip uninstall -y protobuf
!pip install protobuf==3.20.3
!rm -rf models
!git clone https://github.com/tensorflow/models.git
!apt-get install -y protobuf-compiler
!cd models/research && protoc object_detection/protos/*.proto --python_out=.

import os   # needed for os.listdir below
import sys
sys.path.append("models/research")
sys.path.append("models/research/slim")

import tensorflow as tf
from object_detection.utils import config_util

print(tf.__version__)

config_dir = "models/research/object_detection/configs/tf2"
for f in os.listdir(config_dir):
    print(f)




In [11]:
'''Question 7: Create a Python script to load a labeled dataset (in TFRecord format) and
visualize the annotation masks over the images.
'''


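import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

def parse_tfrecord(example):
    # Feature keys follow the TFOD2 TFRecord convention; masks are PNG-encoded.
    feature_description = {
        "image/encoded": tf.io.FixedLenFeature([], tf.string),
        "image/height": tf.io.FixedLenFeature([], tf.int64),
        "image/width": tf.io.FixedLenFeature([], tf.int64),
        "image/object/mask": tf.io.VarLenFeature(tf.string),
    }

    example = tf.io.parse_single_example(example, feature_description)

    # decode_jpeg already returns a uint8 tensor of shape (H, W, 3).
    image = tf.image.decode_jpeg(example["image/encoded"], channels=3)

    # Decode the masks with tf.map_fn: a Python list comprehension cannot
    # iterate over a symbolic tensor inside Dataset.map.
    # Assumes every record contains at least one mask.
    png_masks = tf.sparse.to_dense(example["image/object/mask"], default_value="")
    masks = tf.map_fn(
        lambda m: tf.image.decode_png(m, channels=1)[..., 0],
        png_masks,
        fn_output_signature=tf.uint8,
    )  # shape: (num_masks, H, W)

    return image, masks

dataset = tf.data.TFRecordDataset("sample_dataset.tfrecord")
dataset = dataset.map(parse_tfrecord)

for image, masks in dataset.take(1):
    plt.figure(figsize=(6, 6))
    plt.imshow(image.numpy())
    for mask in masks.numpy():
        # Hide background pixels so only the object region is tinted.
        plt.imshow(np.ma.masked_equal(mask, 0), alpha=0.5, cmap="autumn")
    plt.axis("off")
    plt.show()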



In [1]:
'''Question 8: Using a pre-trained Mask R-CNN model, write a code snippet to perform
inference on a single image and plot the predicted masks.
'''

!pip install tensorflow==2.19.0
!pip install tf-slim
!pip install tensorflow-models-official==2.19.1
!pip install opencv-python matplotlib

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import cv2
import urllib.request
import tarfile
import os
# Assumes object_detection is importable, e.g. via the models/research setup from Question 6.
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils

MODEL_DATE = "20200711"
MODEL_NAME = "mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu_tpu-8"
MODEL_TAR_FILENAME = MODEL_NAME + ".tar.gz"
DOWNLOAD_BASE = "http://download.tensorflow.org/models/object_detection/tf2/" + MODEL_DATE + "/"
PATH_TO_CKPT = MODEL_NAME + "/saved_model"

if not os.path.exists(MODEL_NAME):
    urllib.request.urlretrieve(DOWNLOAD_BASE + MODEL_TAR_FILENAME, MODEL_TAR_FILENAME)
    # The context manager closes the archive even if extraction fails.
    with tarfile.open(MODEL_TAR_FILENAME) as tar_file:
        tar_file.extractall()

detect_fn = tf.saved_model.load(PATH_TO_CKPT)

IMAGE_URL = "https://tensorflow.org/images/surf.jpg"
IMAGE_PATH = "test_image.jpg"
urllib.request.urlretrieve(IMAGE_URL, IMAGE_PATH)
image_np = cv2.cvtColor(cv2.imread(IMAGE_PATH), cv2.COLOR_BGR2RGB)
input_tensor = tf.convert_to_tensor([image_np])
detections = detect_fn(input_tensor)

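from object_detection.utils import ops as utils_ops

# Keep only the outputs needed for visualization, trimmed to the valid detections.
num_detections = int(detections.pop('num_detections'))
detections = {key: detections[key][0, :num_detections].numpy()
              for key in ('detection_boxes', 'detection_classes',
                          'detection_scores', 'detection_masks')}
detections['num_detections'] = num_detections
detections['detection_classes'] = detections['detection_classes'].astype(np.int64)

# The model returns each mask in its box's coordinate frame; reframe the masks
# to full-image size and binarize them before drawing.
masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
    tf.convert_to_tensor(detections['detection_masks']),
    tf.convert_to_tensor(detections['detection_boxes']),
    image_np.shape[0], image_np.shape[1])
detections['detection_masks'] = tf.cast(masks_reframed > 0.5, tf.uint8).numpy()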

PATH_TO_LABELS = tf.keras.utils.get_file(
    'mscoco_label_map.pbtxt',
    'https://raw.githubusercontent.com/tensorflow/models/master/research/object_detection/data/mscoco_label_map.pbtxt'
)
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)

viz_utils.visualize_boxes_and_labels_on_image_array(
    image_np,
    detections['detection_boxes'],
    detections['detection_classes'],
    detections['detection_scores'],
    category_index,
    instance_masks=detections.get('detection_masks'),
    use_normalized_coordinates=True,
    line_thickness=2,
    min_score_thresh=0.5
)

plt.figure(figsize=(12, 8))
plt.imshow(image_np)
plt.axis('off')
plt.show()



In [3]:
'''Question 9: Write a Python script to evaluate a trained TFOD2 Mask R-CNN model and
plot the Precision-Recall curve.'''

!pip install tensorflow==2.19.0
!pip install tf-slim
!pip install tensorflow-models-official==2.19.1
!pip install matplotlib

import os
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

PATH_TO_MODEL_DIR = "trained_mask_rcnn_model"
PATH_TO_TEST_TFRECORD = "test.record"
TARGET_CLASS = 1     # label-map id of the class to evaluate (assumed value)
IOU_THRESHOLD = 0.5  # minimum box IoU for a detection to count as a true positive

detection_model = tf.saved_model.load(os.path.join(PATH_TO_MODEL_DIR, "saved_model"))

def box_iou(box, boxes):
    """IoU between one [ymin, xmin, ymax, xmax] box and an (N, 4) array of boxes."""
    ymin = np.maximum(box[0], boxes[:, 0])
    xmin = np.maximum(box[1], boxes[:, 1])
    ymax = np.minimum(box[2], boxes[:, 2])
    xmax = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ymax - ymin, 0, None) * np.clip(xmax - xmin, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)

def run_inference(image_np):
    return detection_model(tf.convert_to_tensor([image_np]))

# A PR curve needs each detection labeled as true or false positive against the
# ground truth; predicted class ids cannot serve as labels.
all_scores, all_is_tp, num_gt = [], [], 0

for record in tf.data.TFRecordDataset(PATH_TO_TEST_TFRECORD):
    example = tf.train.Example()
    example.ParseFromString(record.numpy())
    feat = example.features.feature
    image_np = tf.image.decode_jpeg(feat['image/encoded'].bytes_list.value[0], channels=3).numpy()

    # Ground-truth boxes of the target class (normalized coordinates, standard TFOD2 keys).
    gt_classes = np.array(feat['image/object/class/label'].int64_list.value)
    gt_boxes = np.stack([np.array(feat['image/object/bbox/' + k].float_list.value)
                         for k in ('ymin', 'xmin', 'ymax', 'xmax')], axis=1)
    gt_boxes = gt_boxes[gt_classes == TARGET_CLASS]
    num_gt += len(gt_boxes)

    detections = run_inference(image_np)
    n = int(detections['num_detections'][0])
    scores = detections['detection_scores'][0].numpy()[:n]
    classes = detections['detection_classes'][0].numpy()[:n].astype(np.int64)
    boxes = detections['detection_boxes'][0].numpy()[:n]

    # Greedy matching, highest score first; each ground-truth box is matched once.
    # Box IoU is used here; mask IoU could be substituted for a mask-level curve.
    matched = np.zeros(len(gt_boxes), dtype=bool)
    for i in np.argsort(-scores):
        if classes[i] != TARGET_CLASS:
            continue
        is_tp = False
        if len(gt_boxes) > 0:
            ious = box_iou(boxes[i], gt_boxes)
            ious[matched] = 0.0
            best = int(np.argmax(ious))
            if ious[best] >= IOU_THRESHOLD:
                matched[best] = True
                is_tp = True
        all_scores.append(scores[i])
        all_is_tp.append(is_tp)

# Sweep the score threshold from high to low and accumulate true positives.
order = np.argsort(-np.array(all_scores))
tp = np.cumsum(np.array(all_is_tp, dtype=np.int64)[order])
precision = tp / np.arange(1, len(tp) + 1)
recall = tp / max(num_gt, 1)

plt.figure(figsize=(8,6))
plt.plot(recall, precision, marker='.')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve')
plt.grid(True)
plt.show()


Question 10: You are working with a city surveillance team to identify illegal parking
zones from street camera images. The model you built detects cars using bounding
boxes, but the team reports inaccurate overlaps with sidewalks and fails in complex
street scenes.
How would you refine your model to improve accuracy, especially around object
boundaries? What segmentation strategy and tools would you use?


->To improve the accuracy of detecting cars in complex street scenes and reduce overlaps with sidewalks:

Switch to Instance Segmentation

Instead of only bounding boxes (object detection), use Mask R-CNN or similar instance segmentation models.

Instance segmentation provides pixel-level masks for each object, accurately separating cars from sidewalks and other objects.

Refine Data and Annotations

Include boundary-sensitive annotations (masks that precisely cover cars).

Augment the dataset with crowded scenes, occlusions, and varied angles.

Segmentation Strategies

Use Mask R-CNN for object-level masks.

Alternatively, for very dense scenes, consider panoptic segmentation, which combines semantic and instance segmentation in a single output.

Tools and Libraries

TensorFlow Object Detection API (TFOD2) with Mask R-CNN.

Detectron2 (from Facebook AI) for advanced instance segmentation.

OpenCV / PIL for preprocessing and visualization.

Post-processing

Apply Non-Max Suppression and mask refinement (e.g., Conditional Random Fields) to improve object boundaries.
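
For reference, greedy Non-Max Suppression can be sketched in a few lines of plain Python. The boxes, scores, and 0.5 IoU threshold below are made-up values, and `non_max_suppression` is an illustrative helper rather than the TFOD2 implementation.

```python
def iou(a, b):
    """IoU of two boxes given as (ymin, xmin, ymax, xmax)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes, dropping any that overlap a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# Two near-duplicate detections of one car plus a separate car (made-up values).
boxes = [
    (0.10, 0.10, 0.50, 0.50),
    (0.12, 0.12, 0.52, 0.52),
    (0.60, 0.60, 0.90, 0.90),
]
scores = [0.90, 0.75, 0.80]

kept = non_max_suppression(boxes, scores)
print("indices kept after NMS:", kept)
```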

Summary: Replace plain object detection with Mask R-CNN or similar instance segmentation, refine your dataset, and use boundary-aware post-processing to improve detection accuracy around sidewalks and complex street layouts.
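
Once per-car masks are available, an illegal-parking check could compare each mask with a no-parking zone mask drawn for the fixed camera view. The sketch below is one possible way to do that under stated assumptions: the zone mask, the car masks, the 0.3 overlap threshold, and the helper `overlap_fraction` are all hypothetical.

```python
def overlap_fraction(instance_mask, zone_mask):
    """Fraction of the instance's pixels that fall inside the zone."""
    inside = total = 0
    for inst_row, zone_row in zip(instance_mask, zone_mask):
        for inst, zone in zip(inst_row, zone_row):
            total += inst
            inside += inst * zone
    return inside / total if total else 0.0

# 1 = no-parking area (e.g. sidewalk), defined once per camera (made-up layout).
zone_mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]

# Binary instance masks as they might come from a segmentation model (made-up).
car_masks = {
    "car_a": [[0, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 0]],
    "car_b": [[0, 0, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]],
}

OVERLAP_THRESHOLD = 0.3  # assumed tolerance before a car is flagged
flagged = [name for name, mask in car_masks.items()
           if overlap_fraction(mask, zone_mask) > OVERLAP_THRESHOLD]
print("cars flagged as illegally parked:", flagged)
```

Because the comparison is pixel-wise, a car whose bounding box touches the sidewalk but whose actual outline does not would no longer be flagged.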