# Object Detection API Demo

<table align="left"><td>
  <a target="_blank"  href="https://colab.sandbox.google.com/github/tensorflow/models/blob/master/research/object_detection/object_detection_tutorial.ipynb">
    <img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab
  </a>
</td><td>
  <a target="_blank"  href="https://github.com/tensorflow/models/blob/master/research/object_detection/object_detection_tutorial.ipynb">
    <img width=32px src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
</td></table>

Welcome to the [Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection). This notebook will walk you step by step through the process of using a pre-trained model to detect objects in an image.

> **Important**: This tutorial helps you take the first step towards using the [Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection) to build models. If you just need an off-the-shelf model that does the job, see the [TFHub object detection example](https://colab.sandbox.google.com/github/tensorflow/hub/blob/master/examples/colab/object_detection.ipynb).

# Setup

Important: If you're running on a local machine, be sure to follow the [installation instructions](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md). This notebook includes only what's necessary to run in Colab.

### Install

Get `tensorflow/models`, or `cd` to the parent directory of the repository.

In [17]:
# Minimum detection score; 0.3 is used for person and car detection
thresh_hold_val = 0.3
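
To illustrate what this threshold does, here is a minimal NumPy sketch that keeps only detections scoring above it. The helper name `keep_confident` and the sample scores are made up for illustration; the notebook itself passes the threshold to the visualization utility as `min_score_thresh`.

```python
import numpy as np

def keep_confident(scores, classes, threshold=0.3):
    """Return the class ids of detections whose score is above `threshold`."""
    scores = np.asarray(scores, dtype=float)
    classes = np.asarray(classes)
    mask = scores > threshold          # boolean mask of confident detections
    return classes[mask].tolist()

# Made-up scores and COCO class ids (1 = person, 3 = car), for illustration only.
print(keep_confident([0.91, 0.28, 0.45], [1, 3, 3]))
```

Lowering the threshold keeps more boxes at the cost of more false positives; raising it does the opposite.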

In [18]:
import os
import pathlib


if "models" in pathlib.Path.cwd().parts:
  while "models" in pathlib.Path.cwd().parts:
    os.chdir('..')
elif not pathlib.Path('models').exists():
  !git clone --depth 1 https://github.com/tensorflow/models

'git' is not recognized as an internal or external command,
operable program or batch file.


Compile protobufs and install the object_detection package

### Imports

In [19]:
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile

from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image
from IPython.display import display
from collections import Counter

In [20]:
import cv2
from lxml import etree
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
from tkinter import Tk, Label,Entry
from tkinter import filedialog,ttk,StringVar
from tkinter import Checkbutton
from tkinter import Button

In [21]:
import tkinter.messagebox

Import the object detection module.

In [22]:
from object_detection.utils import ops as utils_ops
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

Patches:

In [23]:
# patch tf1 into `utils.ops`
utils_ops.tf = tf.compat.v1

# Patch the location of gfile
tf.gfile = tf.io.gfile

# Model preparation 

## Variables

Any model exported using the `export_inference_graph.py` tool can be loaded here simply by changing the path.

By default we use an "SSD with Mobilenet" model here. See the [detection model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md) for a list of other models that can be run out-of-the-box with varying speeds and accuracies.
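
As a rough sketch of what "changing the path" means, the helper below derives the download URL and the expected `saved_model` directory from a model name. The helper name `model_locations` and the default cache directory are assumptions for illustration; the actual extraction location depends on the installed Keras version.

```python
import pathlib

# Base URL taken from the loader in this notebook.
BASE_URL = 'http://download.tensorflow.org/models/object_detection/'

def model_locations(model_name, cache_dir='~/.keras/datasets'):
    """Return (archive URL, expected saved_model directory) for a model name.

    `cache_dir` is an assumed default; adjust it to wherever the archive
    is actually extracted on your machine.
    """
    url = BASE_URL + model_name + '.tar.gz'
    saved_model_dir = pathlib.Path(cache_dir).expanduser() / model_name / 'saved_model'
    return url, saved_model_dir

url, path = model_locations('ssd_mobilenet_v1_coco_2018_01_28')
print(url)
print(path)
```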

## Loader

In [24]:
def load_model(model_name):
  base_url = 'http://download.tensorflow.org/models/object_detection/'
  model_file = model_name + '.tar.gz'
  model_dir = tf.keras.utils.get_file(
    fname=model_name, 
    origin=base_url + model_file,
    untar=True)

  model_dir = pathlib.Path(model_dir)/"saved_model"

  model = tf.saved_model.load(str(model_dir))
  model = model.signatures['serving_default']

  return model

## Loading label map
Label maps map indices to category names, so that when our convolutional network predicts `5`, we know that this corresponds to `airplane`. Here we use internal utility functions, but anything that returns a dictionary mapping integers to appropriate string labels would be fine.
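
For example, a hand-written mapping can stand in for the label map file. The sketch below assumes the `id -> {'id', 'name'}` layout used by the category index; the names `toy_category_index` and `class_name` are hypothetical, and only two COCO entries are listed.

```python
# Hypothetical two-entry category index (COCO ids: 1 = person, 3 = car).
toy_category_index = {
    1: {'id': 1, 'name': 'person'},
    3: {'id': 3, 'name': 'car'},
}

def class_name(class_id, index=toy_category_index):
    """Look up the display name for a class id, or 'unknown' if it is missing."""
    return index.get(class_id, {}).get('name', 'unknown')

print(class_name(3))
print(class_name(99))
```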

In [25]:
# Path to the label map used to assign the correct label to each box.
PATH_TO_LABELS = r'F:\models-master\research\object_detection\data\mscoco_label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)

For the sake of simplicity we will test on 2 images:

# Detection

Load an object detection model:

In [26]:
model_name = 'ssd_mobilenet_v1_coco_2018_01_28'
detection_model = load_model(model_name)

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Check the model's input signature; it expects a batch of 3-channel color images of type uint8:
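
As a minimal sketch of what that signature implies, the helper below checks a single frame and adds the batch axis using NumPy only. The name `as_model_batch` is an assumption for illustration and is not part of the Object Detection API.

```python
import numpy as np

def as_model_batch(frame):
    """Validate one H x W x 3 uint8 image and return it with a leading batch axis."""
    frame = np.asarray(frame)
    if frame.dtype != np.uint8:
        raise TypeError('expected uint8 pixels, got %s' % frame.dtype)
    if frame.ndim != 3 or frame.shape[2] != 3:
        raise ValueError('expected shape (height, width, 3), got %s' % (frame.shape,))
    return frame[np.newaxis, ...]      # shape becomes (1, height, width, 3)

dummy = np.zeros((480, 640, 3), dtype=np.uint8)   # blank placeholder image
print(as_model_batch(dummy).shape)
```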

Run it on each test image and show the results:

## Real-time object detection

> **Important**: Before running this code, go to the `utils` folder in the `object_detection` directory and replace the existing `visualization_utils.py` with the modified `visualization_utils.py` provided with this notebook.

In [27]:
# videofile (input video directory) and location (image save directory) are global
# variables because playvideo() reads them after the GUI callbacks below set them.


def opendir():
    global videofile
    videofile = filedialog.askdirectory()   #ask for the directory that contains the input videos

def filepath():
    global location
    location = filedialog.askdirectory()    #ask for the directory where images and labels are saved
    


The `playvideo` function below reads the GUI inputs and saves every nth frame as an image, numbered incrementally. It also creates two separate folders, one for `person` and one for `car`, that hold only the cropped detections of each class. It then checks which label formats are selected (xml, kitti, yolo, or all), creates the corresponding label folders, and saves a label file with the same name as the frame image.

Note: if no person or car is found in a frame, no label file is created for that image.
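
The YOLO and KITTI label lines described above can be sketched in isolation. The helper names `to_yolo_line` and `to_kitti_line` are hypothetical; the arithmetic and the line layout follow the function below, with the box given in pixel coordinates.

```python
def to_yolo_line(class_id, left, top, right, bottom, width, height):
    """Format one box as 'class x_center y_center w h', normalized to [0, 1]."""
    x = (left + right) / 2.0 / width      # box center, as a fraction of image width
    y = (top + bottom) / 2.0 / height     # box center, as a fraction of image height
    w = (right - left) / width
    h = (bottom - top) / height
    values = [str(round(v, 3)) for v in (x, y, w, h)]
    return ' '.join([str(class_id)] + values)

def to_kitti_line(name, left, top, right, bottom):
    """Format one box as a KITTI line; fields other than the box are zero placeholders."""
    return '%s 0.00 0 0.00 %d %d %d %d 0.00 0.00 0.00 0.00 0.00 0.00 0.00' % (
        name, left, top, right, bottom)

# A 100 x 200 pixel box inside a 200 x 400 image.
print(to_yolo_line(0, 10, 20, 110, 220, 200, 400))
print(to_kitti_line('car', 10, 20, 110, 220))
```

YOLO stores the box center and size relative to the image, so the labels stay valid if the image is resized; KITTI keeps absolute pixel corners.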

In [28]:
def playvideo():       # This function will be input to the play video button
    
    out_format = [var1.get(),var2.get(),var3.get(),var4.get()]  #value of each checkbox variable
    output_format = []          #holds only the checked formats
    for i in out_format:
        if i not in ('', '0'):  #'0' means unchecked; '' means the box was never touched
            output_format.append(i)
    
    if output_format == []:     #if no checkbox is checked, show a warning message
        tkinter.messagebox.showwarning('Warning','Please check at least one format checkbox.')

    frame_number = int(t1.get())    #frame interval (integer) from the GUI input
    
    if output_format != []:    #if one or more checkboxes are checked, the following code runs
    
        videoslist = os.listdir(videofile)   #lists the given input video directory
        videos = []                 #creates empty array to hold only .mp4 files
        for i in videoslist:
            if i.endswith('.mp4'):
                videos.append(videofile+'/'+i)
        countid2= 1              #image names start at 1
        zero_array = [0]*2       #running counts of person and car crops
        for i in videos:

            cap = cv2.VideoCapture(i)   #capture video from video file
            countid= countid2           #continue numbering from the previous video
            count = 0                  #index of the next frame to read
            new_array = zero_array     #same list object, so crop counts carry over between videos
            items = ['person','car']  #the two object classes that are cropped and labeled
            sourceDirectory = location   #directory chosen in the GUI for the saved frames


            while cap.isOpened():
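                ret, image_np = cap.read()
                # Keep an unannotated copy of the same frame for cropping and saving;
                # a second cap.read() would return the next frame instead.
                frame = image_np.copy() if ret else None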
                if ret:
                    # OpenCV returns BGR frames; the detection model expects RGB.
                    image = cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB)
                    # The input needs to be a tensor, convert it using `tf.convert_to_tensor`.
                    input_tensor = tf.convert_to_tensor(image)
                    # The model expects a batch of images with shape [1, None, None, 3],
                    # so add an axis with `tf.newaxis`.
                    input_tensor = input_tensor[tf.newaxis,...]

                    # Run inference
                    output_dict = detection_model(input_tensor)

                    # All outputs are batched tensors.
                    # Convert to numpy arrays, and take index [0] to remove the batch dimension.
                    # We're only interested in the first num_detections.
                    num_detections = int(output_dict.pop('num_detections'))
                    output_dict = {key:value[0, :num_detections].numpy() 
                                 for key,value in output_dict.items()}
                    output_dict['num_detections'] = num_detections

                    # detection_classes should be ints.
                    output_dict['detection_classes'] = output_dict['detection_classes'].astype(np.int64)
                    # Draw the detections on image_np and get back the kept boxes, class names
                    # and class ids. The extra return values come from the modified
                    # visualization_utils.py, so one call covers both drawing and extraction.
                    image,boxes,classname,classes = vis_util.visualize_boxes_and_labels_on_image_array(
                      image_np,
                      output_dict['detection_boxes'],
                      output_dict['detection_classes'],
                      output_dict['detection_scores'],
                      category_index,
                      instance_masks=output_dict.get('detection_masks_reframed', None),
                      use_normalized_coordinates=True,
                      min_score_thresh=thresh_hold_val,
                      line_thickness=8)

                    img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  #convert opencv bgr to rgb format



                    im = Image.fromarray(img)      #read the frame into pil format
                    width,height = im.size         #get width and height

                    #xml label format
                    annotation = ET.Element('annotation')           
                    ET.SubElement(annotation, 'folder').text = 'xml labels'
                    ET.SubElement(annotation, 'filename').text = str(countid)+'.jpg'
                    ET.SubElement(annotation, 'segmented').text = '0'
                    size = ET.SubElement(annotation, 'size')
                    ET.SubElement(size, 'width').text = str(width)
                    ET.SubElement(size, 'height').text = str(height)
                    ET.SubElement(size, 'depth').text = '3'



                    lines = []          #lines to store items in image in kitti format
                    yolotext = []       #yolotext stores image in yolo format
                    for index, box in enumerate(boxes):
                      # Boxes are normalized (ymin, xmin, ymax, xmax); scale them to pixels.
                      top = int(box[0]*height)
                      left = int(box[1]*width)
                      bottom = int(box[2]*height)
                      right = int(box[3]*width)
                      im1 = im.crop((left, top, right, bottom))   #crop car or person from frame 

                      object_name = classname[index]          #get the object name like car or person from array
                      classid = classes[index]                #get the class id
                      if classid == 3:                        #COCO id 3 (car) is written as class 0
                          classid = 0
                      path = location+'/'+object_name    #folder named after the object class

                      objindex= items.index(object_name)
                      new_array[objindex] +=1
                      a = new_array[objindex]  

                      dw = 1./width      #there is standard formula for yolo format
                      dh = 1./height

                      x = (left+right)/2.0
                      y = (top+bottom)/2.0
                      w = right - left
                      h = bottom - top

                      x = x*dw      #x,y are centers w and h are width and height for yolo format
                      y = y*dh
                      w = w*dw
                      h = h*dh

                      yolotext.append(str(classid)+' '+str(round(x,3))+' '+str(round(y,3))+' '+str(round(w,3))+' '+str(round(h,3)))
                      lines.append(object_name + ' ' + '0.00 0 0.00 ' +str(left) + ' ' + str(top) + ' ' + str(right) + ' ' + str(bottom) + ' 0.00 0.00 0.00 0.00 0.00 0.00 0.00')

                      if not os.path.exists(path):
                         os.makedirs(path)
                      im1.save(path+'/'+object_name+str(a)+'.jpg')  

                      ob = ET.SubElement(annotation, 'object')
                      ET.SubElement(ob, 'name').text = object_name
                      ET.SubElement(ob, 'pose').text = 'Unspecified'
                      ET.SubElement(ob, 'truncated').text = '0'
                      ET.SubElement(ob, 'difficult').text = '0'
                      bbox = ET.SubElement(ob, 'bndbox')
                      ET.SubElement(bbox, 'xmin').text = str(left)
                      ET.SubElement(bbox, 'ymin').text = str(top)
                      ET.SubElement(bbox, 'xmax').text = str(right)
                      ET.SubElement(bbox, 'ymax').text = str(bottom)


                    xml_str = ET.tostring(annotation)
                    xml_root = etree.fromstring(xml_str)    #re-parse with lxml to pretty-print; avoids reusing the name of the Tk window
                    xml_str = etree.tostring(xml_root, pretty_print=True)


                    #if the 'xml' or 'all' checkbox is checked, create the xml label folder and write the label file
                    if 'xml' in output_format or 'all' in output_format:

                        Directory1 = location+'/'+'xml labels'
                        if not os.path.exists(Directory1):
                            os.makedirs(Directory1)

                        if boxes != []:
                            xml_labels = str(countid) + '.xml'
                            save_path = os.path.join(Directory1, xml_labels)
                            with open(save_path, 'wb') as temp_xml:
                                temp_xml.write(xml_str)


                    #if the 'yolo' or 'all' checkbox is checked, create the yolo label folder and write the label file

                    if 'yolo' in output_format or 'all' in output_format:

                        Directory2 = location+'/'+'yolo labels'
                        if not os.path.exists(Directory2):
                            os.makedirs(Directory2)

                        if yolotext != []:
                            yolo_label = str(countid)+ '.txt'
                            with open(os.path.join(Directory2, yolo_label), "w") as text12_file:
                                for item in yolotext:
                                    text12_file.write("%s\n" % item)              

                    #if the 'kitti' or 'all' checkbox is checked, create the kitti label folder and write the label file
                                    
                    if 'kitti' in output_format or 'all' in output_format:

                        Directory3 = location+'/'+'kitti labels'
                        if not os.path.exists(Directory3):
                            os.makedirs(Directory3)

                        if lines != []:
                            kitti_labels = str(countid) + '.txt'
                            with open(os.path.join(Directory3, kitti_labels), "w") as text_file:
                                #if lines is not []:
                                for item in lines:
                                    text_file.write("%s\n" % item)


                    im.save(sourceDirectory+'/'+str(countid)+'.jpg')
                    count += frame_number         #skip ahead; e.g. 30 advances one second at 30 fps
                    countid+=1                    #increases the countid for next image to be saved
                    cap.set(cv2.CAP_PROP_POS_FRAMES, count)   #jump to the next frame to sample






                    cv2.imshow('object detection',image_np)
                    if cv2.waitKey(25) & 0xFF == ord('q'):
                        break
                else:
                    cap.release()
                    break
            cap.release()            #also releases the capture when the loop exits via 'q'
            cv2.destroyAllWindows()

            zero_array = new_array     #carry the crop counts over to the next video
            countid2=countid           #countid already points at the next unused image number

The following code creates the GUI app that collects the inputs and then runs the function above with them.
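
The checkbox handling can be sketched without Tk. An unchecked `Checkbutton` reports its off value (`'0'` by default), and a `StringVar` that was never set reports an empty string, so both are treated as "not selected" here. The helper name `selected_formats` is hypothetical, and expanding `'all'` into the three formats is an illustrative choice; the notebook instead tests for `'all'` next to each format name.

```python
def selected_formats(values):
    """Return the label formats to write, given the raw checkbox variable values."""
    chosen = [v for v in values if v not in ('', '0')]   # drop unchecked / untouched boxes
    if 'all' in chosen:
        return ['xml', 'yolo', 'kitti']                  # 'all' stands for every format
    return chosen

print(selected_formats(['xml', '0', '', '0']))
print(selected_formats(['0', '0', '0', 'all']))
```

An empty result means no box was checked, which is the case where the GUI shows its warning.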

In [29]:
root = Tk()       #create GUI window using tkinter
root.wm_title("Object detection")   #title of the GUI window
root.geometry("400x300")            #window size

l2=Label(root,text="Enter frame interval N")    #label for the frame interval (every Nth frame is saved)
l2.grid(row=1,column=0)                         #location of the label
t1=Entry(root)                                  #Entry widget takes the frame interval as input
t1.grid(row=1,column=1)                         #location of entry window


Label(root, text ='Select the label format(s) to save:').place(x = 20, y = 180) 
  
# check buttons 
# Each variable starts at the off value '0', so every box is initially unchecked.
# place() returns None, so the widgets are not assigned to names.
var1 = StringVar(value='0')  #holds 'xml' when checked and '0' when unchecked
Checkbutton(root, text ='xml', 
            variable = var1,onvalue = 'xml',offvalue = '0').place(x = 40, y = 200) 

var2 = StringVar(value='0')
Checkbutton(root, text ='yolo', 
            variable = var2,onvalue = 'yolo',offvalue = '0').place(x = 40, y = 220) 

var3 = StringVar(value='0')
Checkbutton(root, text ='kitti', 
            variable = var3,onvalue = 'kitti',offvalue = '0').place(x = 40, y = 240) 

var4 = StringVar(value='0')
Checkbutton(root, text ='all', 
            variable = var4,onvalue = 'all',offvalue = '0').place(x = 40, y = 260) 

#Create the buttons; each one calls the function given in its command argument when clicked
Button(root, text ="Open video path",command=opendir,height = 1, width = 15).place(x=40,y=50)
Button(root, text ="Open image save path",command=filepath,height = 1, width = 18).place(x=180,y=50)
Button(root, text ="Play video",command=playvideo,height = 2, width = 20).place(x=100,y=100)

root.mainloop()