In [1]:
#Importing necessary libraries
import cv2 
import numpy as np 
import matplotlib.pyplot as plt
import os
os.chdir('C:\\Users\\akhil\\Desktop\\mobilenet')

The DNN I am using for this assignment is a Caffe version of the MobileNet-SSD model. It combines MobileNet, a pretrained network from Google that serves as the base (transfer learning), with a detection framework called Single Shot MultiBox Detector (SSD).

- __MobileNetSSD_deploy.caffemodel__: This is the trained model (its learned weights).
- __MobileNetSSD_deploy.prototxt__: This is the text file that describes the model's architecture (its layers and their parameters).
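
Both files have to be present in the working directory before the model can be loaded. A small pre-flight check can make a missing file easier to spot. This is only a sketch: `missing_files` is a hypothetical helper, not part of OpenCV.

```python
import os

def missing_files(paths):
    """Return the subset of paths that do not exist as regular files."""
    return [path for path in paths if not os.path.isfile(path)]

required = ['MobileNetSSD_deploy.prototxt', 'MobileNetSSD_deploy.caffemodel']
missing = missing_files(required)
if missing:
    print('Missing model files:', ', '.join(missing))
```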

### Why MobileNet SSD?

- __SSD Object Detection__ extracts feature maps using a base deep learning network (a CNN-based classifier) and then applies convolution filters to those maps to detect objects.
- My implementation uses MobileNet as the base network (other options include VGGNet, ResNet, and DenseNet).

In [2]:
##STEP 1: Loading the model from its prototxt (architecture) and caffemodel (weights) files
model = cv2.dnn.readNetFromCaffe('MobileNetSSD_deploy.prototxt','MobileNetSSD_deploy.caffemodel')

MobileNet needs a few preprocessing parameters that are specific to this model:
- It expects the input image to be 300 pixels high.
- It expects the pixel values in the image to be on a scale from -1.0 to 1.0.
- This means that, relative to the usual scale from 0 to 255, it is necessary to subtract 127.5 and then divide by 127.5.
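
The scaling rule above can be checked with a few lines of plain Python. This is a minimal sketch rather than part of the pipeline: `normalize_pixel` is a hypothetical helper that mirrors the per-value arithmetic of `cv2.dnn.blobFromImage` (subtract the mean, then multiply by the scale factor).

```python
def normalize_pixel(value, mean=127.5, scale=1.0 / 127.5):
    """Map a pixel value from the 0..255 range to the -1.0..1.0 range."""
    return (value - mean) * scale

# The darkest, middle, and brightest values land at the ends and centre of the range.
for value in (0, 127.5, 255):
    print(value, '->', round(normalize_pixel(value), 6))
```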

In [3]:
##STEP 2: Defining the preprocessing parameters
blob_height = 300
color_scale = 1.0/127.5
average_color = (127.5, 127.5, 127.5)
confidence_threshold = 0.5 #Also need to define a confidence threshold
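
Because only the blob height is fixed here, the blob width has to be derived from each frame's aspect ratio. A minimal sketch of that calculation follows; `blob_size_for` is a hypothetical helper name, and the 1280x720 frame is just an assumed example.

```python
def blob_size_for(frame_width, frame_height, blob_height=300):
    """Return (width, height) for a blob that keeps the frame's aspect ratio."""
    aspect_ratio = frame_width / frame_height
    return (int(blob_height * aspect_ratio), blob_height)

print(blob_size_for(1280, 720))  # a 16:9 frame
```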

In [4]:
##STEP 3: The model supports 20 classes of objects, with IDs from 1 to 20 (I am using all the classes, since the same code
## could also be used in real time to identify different objects). The labels for these classes can be defined as follows:

labels = ['airplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',
          'car', 'cat', 'chair', 'cow', 'dining table', 'dog',
          'horse', 'motorbike', 'person', 'potted plant', 'sheep',
          'sofa', 'train', 'TV or monitor']
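
Since the class IDs start at 1 while Python lists start at 0, a lookup has to subtract 1, and an ID outside the expected range would otherwise index the wrong label (an ID of 0, for example, would wrap around to the last entry). One possible guard is sketched below, with `label_for` as a hypothetical helper and a shortened label list for brevity.

```python
def label_for(class_id, labels):
    """Return the label for a 1-based class ID, or 'unknown' if out of range."""
    if 1 <= class_id <= len(labels):
        return labels[class_id - 1]
    return 'unknown'

example_labels = ['airplane', 'bicycle', 'bird']  # shortened for illustration
print(label_for(2, example_labels))
print(label_for(0, example_labels))
```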

In [5]:
##STEP 4: For each frame, we need to calculate the aspect ratio.

cap = cv2.VideoCapture('1615363610851.mp4')  # Pass 0 instead for real-time (webcam) processing, or another video
                                             # path to check the model's performance on it
font = cv2.FONT_HERSHEY_SIMPLEX
success, frame = cap.read()
color = (255,170,0)
while success:
    h, w = frame.shape[:2]
    aspect_ratio = w/h

    # Detect objects in the frame.

    blob_width = int(blob_height * aspect_ratio)
    blob_size = (blob_width, blob_height)
    
    #STEP 4 (continued): I will be using the cv2.dnn.blobFromImage function, with several of its optional arguments, to
    #perform the necessary preprocessing, including resizing the frame and converting its pixel data to a scale from -1.0 to 1.0:
    
    blob = cv2.dnn.blobFromImage(frame, scalefactor=color_scale, size=blob_size, mean=average_color)
    
    # Feed the resulting blob to the DNN and get the model's output:
    model.setInput(blob)
    results = model.forward()  # results are an array, in a format that is specific to the model we are using
    
    # STEP 5: For an object detection DNN trained with the SSD framework, the results include a subarray of detected objects,
    # each with its own confidence score, rectangle coordinates, and class ID. The following code shows
    # how to access these, as well as how to use an ID to look up a label in the list I defined earlier:
    
    count = []  # Empty list that collects this frame's detections, used for the vehicle count
    
    # Iterate over the detected objects.
    for detection in results[0, 0]:
        confidence = detection[2]
        if confidence > confidence_threshold:
            # Get the object's coordinates, scaled from fractions of the frame to pixels.

            x0, y0, x1, y1 = (detection[3:7] * [w, h, w, h]).astype(int)
            if (x0 <= 1364) and (y0 >= 189):  # This expression enforces the detection area condition, which is a line in
                                              # our case (once a vehicle crosses these coordinates, its detection is used).
                                              # The x0 and y0 limits simply represent that line.
                                              # They can be changed to suit other videos.

                # Get the classification result.
                class_id = int(detection[1])
                label = labels[class_id - 1]

                #STEP 6: As we iterate over the detected objects, we draw the detection rectangles, along
                # with the classification labels and confidence scores:

                # Draw a blue rectangle around the object.
                cv2.rectangle(frame, (x0, y0), (x1, y1),
                              (255, 0, 0), 2)

                # Draw the classification result and confidence.
                text = '%s (%.1f%%)' % (label, confidence * 100.0)
                cv2.putText(frame, text, (x0, y0 - 20),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2)
                count.append(detection)

    # The last things to do with the frame are to draw the detection line and the count, and then to show it:
    cv2.line(frame, (1,266), (1364,177), color, 2)
    cv2.putText(frame, "vehicles detected: " + str(len(count)), (889, 19), font, 0.6, (0, 180, 80), 2)
    cv2.imshow('Objects', frame)
    
    k = cv2.waitKey(1)
    if k == 27:  # Escape
        break

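    success, frame = cap.read()

# Release the video source and close the display window once the loop ends.
cap.release()
cv2.destroyAllWindows()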
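
The per-detection logic in the loop above can also be factored into a small function, which makes it easier to test without a video. This is a sketch under assumptions: `parse_detection` is a hypothetical helper, the row layout (image index, class ID, confidence, then four normalized box coordinates) is inferred from how the loop indexes each row, and the sample row is made up.

```python
def parse_detection(row, frame_width, frame_height, threshold=0.5):
    """Return (class_id, confidence, box) in pixels, or None if not above threshold."""
    confidence = float(row[2])
    if confidence <= threshold:
        return None
    class_id = int(row[1])
    # Box coordinates are fractions of the frame size; scale them to pixels.
    x0 = int(row[3] * frame_width)
    y0 = int(row[4] * frame_height)
    x1 = int(row[5] * frame_width)
    y1 = int(row[6] * frame_height)
    return class_id, confidence, (x0, y0, x1, y1)

sample_row = [0.0, 7.0, 0.9, 0.25, 0.5, 0.75, 1.0]  # made-up detection
print(parse_detection(sample_row, 1280, 720))
```

Keeping the threshold test and the coordinate scaling in one place also means the detection area condition can be applied to the returned box separately.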