
# **Object Detection with Detectron 2 using a Custom Dataset**

**Detectron2** is a research platform and a production library for object detection, built by **Facebook AI Research (FAIR)**.

It is written entirely in PyTorch, is flexible and extensible, and supports fast training on single or multiple GPU servers.

It also includes high-quality implementations of state-of-the-art object detection algorithms, including **DensePose**, **Panoptic Feature Pyramid Networks**, and numerous variants of the pioneering **Mask R-CNN** model family, also developed by FAIR.

Its extensible design makes it easy to implement cutting-edge research projects without having to fork the entire codebase.


**In this notebook, we identify masked faces using a fine-tuned, saved instance of the Faster R-CNN model from Detectron2's model zoo.**


The labels are **with_mask**, **without_mask** and **mask_weared_incorrect** (mask worn incorrectly), although the model loaded in this notebook is configured for only the first two classes.

#### Import a few necessary packages.

In [1]:
# Some basic setup:
# Setup detectron2 logger
import detectron2
from detectron2.utils.logger import setup_logger

setup_logger()

# import some common libraries
import os, sys, json, cv2, random, numpy as np
import matplotlib.pyplot as plt

# from google.colab.patches import cv2_imshow

# import some common detectron2 utilities

from detectron2.engine import DefaultTrainer, DefaultPredictor
from detectron2.config import get_cfg
from detectron2 import model_zoo
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog
from detectron2.structures import BoxMode

import pandas as pd
import time
import datetime
import re

import torch, torchvision

In [2]:
# check pytorch installation:
print(
    f"torch version: {torch.__version__}\n\ntorch cuda is available: {torch.cuda.is_available()}"
)
assert torch.__version__.startswith(
    "1.8"
)  # please manually install torch 1.8 if Colab changes its default version

torch version: 1.8.0+cu101

torch cuda is available: True


In [3]:
torch.cuda.empty_cache()
print(torch.cuda.memory_summary())

|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |       0 B  |       0 B  |       0 B  |       0 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |       0 B  |       0 B  |       0 B  |
|---------------------------------------------------------------------------|
| Active memory         |       0 B  |       0 B  |       0 B  |       0 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |       0 B  |       0 B  |       0 B  |
|---------------------------------------------------------------

#### Define global variables

In [4]:
print(os.getcwd())

base_path = os.getcwd()

masked_face_detector = os.path.join(base_path, "ai", "model", "model_final.pth")
conf_level = 0.5

output_folder = os.path.join(base_path, "data", "output_files")

metadata_folder = os.path.join(base_path, "data", "metadata")
video_folder = os.path.join(base_path, "data", "videos")

broadcast_dict = {}

C:\Users\raghav\atoti_play\mask\mask_detection\mask_detection_better_model


In [5]:
# PROJECT_PATH = '../'
# DATA_PATH = os.path.join(PROJECT_PATH, 'data')
# MODELS_PATH = os.path.join(PROJECT_PATH, 'results')

In [6]:
# USE_MODEL = 1
# MODEL = None

# if USE_MODEL == 1:
#     MODEL = "MAX-ITER3000_LR0.05_GAMMA0.1-FASTRCNN-ONLY-2-CLASSES"
# elif USE_MODEL == 2:
#      MODEL = "MAX-ITER5000_LR0.05_GAMMA0.2-NMS-0.3-FASTRCNN-ONLY-2-CLASSES"
# elif USE_MODEL == 3:
#      MODEL = "MAX-ITER7000_LR0.05_GAMMA0.2-NMS-0.3-FASTRCNN-ONLY-2-CLASSES"
# elif USE_MODEL == 4:
#      MODEL = None

## **Preparing and registering the Dataset**



Bounding boxes can be stored in several formats, and the format of each annotation must be a member of `structures.BoxMode`. `BoxMode` defines five formats, but dataset annotations currently support only two: **BoxMode.XYXY_ABS** and **BoxMode.XYWH_ABS**. We use the second, which stores (x, y, width, height) in absolute pixel coordinates, and record it in every annotation. After that, we need to register our dataset.
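To make the difference between the two supported formats concrete, here is a minimal sketch in plain Python. The helper names `xywh_to_xyxy` and `xyxy_to_xywh` are hypothetical and exist only for illustration; Detectron2 ships its own converter, `BoxMode.convert`, which is what you would normally use.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]


def xywh_to_xyxy(box: Sequence[float]) -> Box:
    """Convert (x_min, y_min, width, height) to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def xyxy_to_xywh(box: Sequence[float]) -> Box:
    """Convert (x_min, y_min, x_max, y_max) back to (x_min, y_min, width, height)."""
    x0, y0, x1, y1 = box
    return (x0, y0, x1 - x0, y1 - y0)


# An XYWH_ABS box: top-left corner at (10, 20), 30 px wide, 40 px tall.
example = (10.0, 20.0, 30.0, 40.0)
print(xywh_to_xyxy(example))  # (10.0, 20.0, 40.0, 60.0)
```

Mixing up the two layouts is an easy mistake to make, because both are four-number lists; the declared `bbox_mode` is the only thing telling Detectron2 how to read them.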

In [7]:
# if your dataset is in COCO format, this cell can be replaced by the following three lines:
# from detectron2.data.datasets import register_coco_instances
# register_coco_instances("my_dataset_train", {}, "json_annotation_train.json", "path/to/image/dir")
# register_coco_instances("my_dataset_val", {}, "json_annotation_val.json", "path/to/image/dir")


def get_masked_face_dicts(coco_format_json_file, img_dir):
    with open(coco_format_json_file) as f:
        dataset_dicts = json.load(f)

    for d in dataset_dicts:
        d["file_name"] = os.path.join(img_dir, d["file_name"])

        for anno in d["annotations"]:
            anno["bbox_mode"] = BoxMode.XYWH_ABS
            anno["category_id"] = int(anno["category_id"])

    return dataset_dicts


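# NOTE: DATA_PATH is only defined in the commented-out cell In [5]; define it before
# loading a dataset. Registration is lazy, so a missing DATA_PATH only raises a
# NameError when the dataset is first loaded, not when this cell runs.
for d in ["train", "val", "test"]: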
    DatasetCatalog.register(
        "masked_face_" + d,
        lambda d=d: get_masked_face_dicts(
            os.path.join(DATA_PATH, d, d + "-with-only-2-classes.json"),
            os.path.join(DATA_PATH, d),
        ),
    )
    MetadataCatalog.get("masked_face_" + d).set(
        thing_classes=["with_mask", "without_mask"]
    )
masked_face_metadata = MetadataCatalog.get("masked_face_train")
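The loader above assumes that each JSON file already holds a list of records in Detectron2's dataset-dict layout (`file_name`, `annotations` with `bbox` and `category_id`, and so on). As a rough sketch, a check like the following could catch malformed records before they are loaded. The helper `find_record_problems` and the sample record are hypothetical, not part of Detectron2 or of the original notebook.

```python
REQUIRED_RECORD_KEYS = {"file_name", "annotations"}
REQUIRED_ANNOTATION_KEYS = {"bbox", "category_id"}


def find_record_problems(record, num_classes):
    """Return a list of human-readable problems found in one dataset record."""
    problems = []
    missing = REQUIRED_RECORD_KEYS - set(record)
    if missing:
        problems.append(f"missing record keys: {sorted(missing)}")
    for index, anno in enumerate(record.get("annotations", [])):
        missing = REQUIRED_ANNOTATION_KEYS - set(anno)
        if missing:
            problems.append(f"annotation {index}: missing keys {sorted(missing)}")
            continue
        if len(anno["bbox"]) != 4:
            problems.append(f"annotation {index}: bbox must have 4 values")
        # category_id may arrive as a string, so cast it like the loader does.
        if not 0 <= int(anno["category_id"]) < num_classes:
            problems.append(f"annotation {index}: category_id out of range")
    return problems


sample_record = {
    "file_name": "frame_0001.png",  # hypothetical file name
    "height": 200,
    "width": 300,
    "image_id": 0,
    "annotations": [
        {"bbox": [10, 20, 30, 40], "category_id": "1"},  # valid, id stored as text
        {"bbox": [5, 5, 10], "category_id": 2},  # deliberately broken
    ],
}

for problem in find_record_problems(sample_record, num_classes=2):
    print(problem)
```

With two classes, valid category ids are 0 and 1, so the second annotation is reported both for its three-value box and for its out-of-range id.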

## **Inference from new images using the trained model**

Training saves an output folder in local storage that holds the final weights (`model_final.pth`). Keep this folder so that you can run inference with the model later.

Set the score threshold to drop low-confidence detections from the prediction results; the NMS threshold then controls how strongly overlapping boxes are suppressed.
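To illustrate how a score threshold and an NMS threshold work together, here is a simplified, class-agnostic sketch in plain Python. It is not Detectron2's implementation (which, as far as I know, applies NMS per class on tensors); the function names and the sample boxes are made up for illustration.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes, scores, score_thresh=0.5, nms_thresh=0.3):
    """Return indices of boxes kept after score filtering and greedy NMS."""
    # Step 1: drop every detection whose score does not exceed the threshold.
    candidates = [i for i, s in enumerate(scores) if s > score_thresh]
    # Step 2: visit survivors from most to least confident and keep a box
    # only if it does not overlap an already kept box by more than nms_thresh.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in kept):
            kept.append(i)
    return kept


boxes = [
    (0, 0, 10, 10),    # confident detection
    (1, 1, 11, 11),    # near-duplicate of the first box
    (20, 20, 30, 30),  # separate face, confident
    (40, 40, 50, 50),  # separate face, low confidence
]
scores = [0.9, 0.8, 0.7, 0.2]
print(filter_detections(boxes, scores))  # [0, 2]
```

The near-duplicate is removed by NMS, while the low-confidence box is removed by the score threshold. Raising the score threshold trades missed faces for fewer false alarms; lowering the NMS threshold suppresses overlapping boxes more aggressively.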

In [8]:
%%time
from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file(
    model_zoo.get_config_file("COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml")
)
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model
# cfg.MODEL.ROI_HEADS.NMS_THRESH_TEST = 0.5
cfg.MODEL.WEIGHTS = masked_face_detector
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 2  # No. of classes = [with_mask, without_mask]

#################################################################################################################################
# add config params
cfg.MODEL.PIXEL_STD = [
    57.375,
    57.120,
    58.395,
]  # ImageNet std (Detectron2's global default is [1.0, 1.0, 1.0])

cfg.MODEL.ROI_HEADS.IOU_THRESHOLDS = [
    0.5
]  # An RoI counts as foreground if its IoU is >= 0.5, and as background if it is < 0.5
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 512  # 128
# cfg.MODEL.ROI_HEADS.POSITIVE_FRACTION = 0.1 # 0.25 Target fraction of RoI minibatch that is labeled foreground (i.e. class > 0)
cfg.MODEL.ROI_BOX_HEAD.NORM = "GN"  # Normalization method for the convolution layers
cfg.MODEL.ROI_HEADS.NMS_THRESH_TEST = 0.3
# cfg.MODEL.ROI_BOX_HEAD.SMOOTH_L1_BETA = 0.5

cfg.MODEL.RPN.NMS_THRESH = 0.7  # 0.3
cfg.MODEL.FPN.NORM = "GN"
cfg.MODEL.RPN.IOU_THRESHOLDS = [0.3, 0.7]
cfg.MODEL.RPN.BATCH_SIZE_PER_IMAGE = 512  # 256

##################################################################################################################################

predictor = DefaultPredictor(cfg)

Wall time: 3.6 s


## **Inference from videos using the trained model**

In [9]:
from detectron2.utils.visualizer import ColorMode
import glob
import cv2

# from google.colab.patches import cv2_imshow

# cap = cv2.VideoCapture(os.path.join(DATA_PATH, 'images-from-youtube-videos', 'video_5', 'Paris_camera5.mp4'))
cap = cv2.VideoCapture(os.path.join(video_folder, "Paris_camera5.mp4"))
while cap.isOpened():
    ret, frame = cap.read()
    # if frame is read correctly ret is True
    if not ret:
        break

    # frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    frame = cv2.resize(frame, (300, 200))  # cv2.resize(frame, (1800, 1540))
    outputs = predictor(frame)
    v = Visualizer(
        frame[:, :, ::-1],
        metadata=masked_face_metadata,
        scale=1.0,  # 0.8
        instance_mode=ColorMode.IMAGE,
    )
    out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    # cv2.namedWindow( "frame", CV_WINDOW_AUTOSIZE);
    # imshow("Display frame", image);
    cv2.imshow("frame", out.get_image()[:, :, ::-1])
    # plt.figure(figsize = (150, 15))
    # plt.imshow(out.get_image()[:, :, ::-1])
    # plt.show()

    if cv2.waitKey(1) == ord("q"):
        break

# Release the video capture and close the display window once the loop ends.
cap.release()
cv2.destroyAllWindows()

RuntimeError: cuDNN error: CUDNN_STATUS_ALLOC_FAILED
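The loop above only draws the predictions. If a per-frame summary is also wanted, the class ids can be tallied. In Detectron2, `outputs["instances"]` carries `pred_classes`, `scores` and `pred_boxes`, so a call such as `outputs["instances"].to("cpu").pred_classes.tolist()` should yield plain integer ids. The sketch below works on such a list; the helper `count_by_class` and the sample predictions are hypothetical.

```python
from collections import Counter


def count_by_class(pred_classes, class_names):
    """Count detections per class name, including classes with zero hits."""
    counts = Counter(class_names[c] for c in pred_classes)
    return {name: counts.get(name, 0) for name in class_names}


class_names = ["with_mask", "without_mask"]  # same order as thing_classes
pred_classes = [0, 0, 1, 0]  # made-up class ids for one frame
print(count_by_class(pred_classes, class_names))
# {'with_mask': 3, 'without_mask': 1}
```

Keeping the class order identical to `thing_classes` matters, because the predicted ids are just indices into that list.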