**Problem Statement**
* The goal is to build an ML pipeline that reports how long each detected object appears in a given video.
* The steps are:
  * Conversion of video into frames
  * Finding out the duration of the video
  * Calculating the duration of each frame
  * Applying object detection on each frame using YOLO algorithm
  * Finding out unique objects present in the video
  * Counting the number of frames in which each object is present
  * Calculating the total duration of each detected object in the video
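
The last three steps reduce to one formula. As a sketch, using symbols introduced here only for illustration ($N$ for the total number of frames, $T$ for the video duration in seconds, $n_o$ for the number of frames in which object $o$ is detected) and assuming a constant frame rate:

```latex
\[
\Delta t = \frac{T}{N},
\qquad
d_o = n_o \,\Delta t = n_o \cdot \frac{T}{N}
\]
```

Here $d_o$ is the total on-screen time of object $o$. It counts every frame in which the object is detected at least once, so the appearances do not need to be contiguous.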

In [117]:
#importing necessary libraries
import numpy as np

Conversion of video into frames

In [76]:
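#video to frames
import cv2
cap=cv2.VideoCapture("train-6074.mp4")
res,vf=cap.read()   # res becomes False once no more frames can be read
frames=[]
count=0
while res:
  cv2.imwrite("frame%d.jpg" %count,vf)       # save the full-size frame to disk
  frames.append(cv2.resize(vf, (200, 500)))  # keep a 200x500 (width x height) copy in memory
  res,vf=cap.read()
  count+=1
cap.release()  # release the video file handle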

In [77]:
no_of_frames=len(frames)
print("No.of Frames in the given video : ", no_of_frames)

No.of Frames in the given video :  133


Finding out the duration of the video

In [79]:
import subprocess
import json
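# NOTE: this should be the same video that was split into frames above;
# otherwise the per-frame duration computed later is not meaningful.
input_filename = "SrisailamDam_Trim.mp4"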
out = subprocess.check_output(["ffprobe", "-v", "quiet", "-show_format", "-print_format", "json", input_filename])
ffprobe_data = json.loads(out)
duration_seconds = float(ffprobe_data["format"]["duration"])
print("Duration of input Video : ",duration_seconds)

Duration of input Video :  4.074646
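
The duration lookup above depends on the `format.duration` field of ffprobe's JSON output. A minimal sketch of that parsing step in isolation follows, so a missing field produces a clear error. The helper name `parse_duration` and the sample JSON string are assumptions made for illustration; real ffprobe output contains many more fields.

```python
import json


def parse_duration(ffprobe_json: str) -> float:
    """Return the container duration in seconds from ffprobe's JSON output."""
    data = json.loads(ffprobe_json)
    try:
        # ffprobe reports the duration as a string, so convert it explicitly
        return float(data["format"]["duration"])
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError("ffprobe output has no usable format.duration") from exc


# Hypothetical, trimmed-down sample of `ffprobe -show_format -print_format json` output
sample = '{"format": {"filename": "video.mp4", "duration": "4.074646"}}'
print(parse_duration(sample))
```

Keeping the parsing separate from the `subprocess` call makes it testable without ffprobe installed.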


Calculating the duration of each frame

In [113]:
duration_of_frame=duration_seconds/no_of_frames  # average seconds per frame
print("Duration of each frame : ",duration_of_frame)

Duration of each frame :  0.030636436090225566
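
A quick sanity check on this number is to invert it into an effective frame rate. The sketch below does that with the values printed above. The function name `frame_timing` is an assumption for illustration, and the calculation assumes a constant frame rate.

```python
def frame_timing(duration_seconds: float, no_of_frames: int) -> tuple[float, float]:
    """Return (seconds per frame, frames per second) for a constant-rate video."""
    if duration_seconds <= 0 or no_of_frames <= 0:
        raise ValueError("duration and frame count must be positive")
    duration_of_frame = duration_seconds / no_of_frames
    fps = no_of_frames / duration_seconds
    return duration_of_frame, fps


# Values taken from the outputs above: 133 frames, 4.074646 seconds
per_frame, fps = frame_timing(4.074646, 133)
print(f"{per_frame:.4f} s per frame, about {fps:.2f} frames per second")
```

An implausible frame rate here (for example a few frames per second, or several hundred) would suggest that the frame count and the duration were measured on different files.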


In [81]:
print((frames[0].shape))

(500, 200, 3)


Applying object detection on each frame using YOLO algorithm

In [23]:
from google.colab.patches import cv2_imshow

In [83]:
net=cv2.dnn.readNetFromDarknet('/content/YOLOV3.cfg','/content/yolov3.weights')
net

< cv2.dnn.Net 0x7f48a8885b70>

In [86]:
classes=[]
with open('/content/coco.names.txt','r') as f:
  classes = [i.strip() for i in f.readlines()]

In [105]:
def detect_objects(frame, net):
    # frame is a BGR image as returned by OpenCV
    my_img = cv2.resize(frame, (200,500)) # (width, height) of the annotated output image
    # cv2_imshow(my_img)

    # scale pixels to [0, 1], resize to the 416x416 network input, convert BGR -> RGB
    blob = cv2.dnn.blobFromImage(my_img, 1/255, (416, 416), (0, 0, 0), swapRB=True, crop=False)

    net.setInput(blob)
    last_layer = net.getUnconnectedOutLayersNames()
    last_out = net.forward(last_layer)

    ht, wt, _ = my_img.shape

    boxes = []
    confidences = []
    classes_id = []
    for output in last_out:
        for detection in output:
            # detection = [cx, cy, w, h, objectness, class scores...]; box values are relative to the image size
            score = detection[5:]
            class_id = np.argmax(score)
            confidence = score[class_id]
            if confidence > 0.3: # keep detections whose best class score exceeds 0.3
                center_x = int(detection[0] * wt)
                center_y = int(detection[1] * ht)
                w = int(detection[2] * wt)
                h = int(detection[3] * ht)
                x = int(center_x - w / 2)  # top-left corner of the box
                y = int(center_y - h / 2)
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                classes_id.append(class_id)
    # score threshold 0.4, NMS (overlap) threshold 0.3;
    # boxes that passed the 0.3 filter but score below 0.4 are dropped here
    indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.4, 0.3)
    font = cv2.FONT_HERSHEY_PLAIN
    # print(indexes)
    colors = np.random.uniform(0, 255, size=(len(boxes), 3))
    labels = []
    # NMSBoxes can return an empty result when nothing is detected, so convert before flattening
    for i in np.array(indexes).flatten():
        x, y, w, h = boxes[i]
        label = str(classes[classes_id[i]])
        confidence = str(round(confidences[i], 2))
        labels.append(label)
        color = colors[i]
        cv2.rectangle(my_img, (x, y), (x + w, y + h), color, 2)
        cv2.putText(my_img, label + " " + confidence, (x, y + 20), font, 2, (0, 0, 0), 2)
    # the return shape is kept as a one-element list: [[unique labels in this frame]]
    u = [sorted(set(labels))]
    # print(u)
    return my_img,u
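
The box arithmetic inside `detect_objects` is easier to check in isolation. The sketch below repeats the same conversion for a single output row, using plain Python lists instead of NumPy arrays. The helper name `decode_detection` and the sample row are made up for illustration, and the row layout (centre x, centre y, width, height, objectness, then class scores) is the one the function above assumes.

```python
def decode_detection(detection, img_w, img_h):
    """Convert one YOLO output row into a pixel-space [x, y, w, h] box plus its best class."""
    scores = detection[5:]
    # index of the highest class score (plain-Python stand-in for np.argmax)
    class_id = max(range(len(scores)), key=lambda k: scores[k])
    confidence = scores[class_id]
    # box values are fractions of the image size, so scale them to pixels
    center_x = int(detection[0] * img_w)
    center_y = int(detection[1] * img_h)
    w = int(detection[2] * img_w)
    h = int(detection[3] * img_h)
    # shift from the box centre to its top-left corner
    x = int(center_x - w / 2)
    y = int(center_y - h / 2)
    return [x, y, w, h], class_id, float(confidence)


# Made-up row: box centred in the image, half its width and height, three classes
row = [0.5, 0.5, 0.5, 0.5, 0.9, 0.1, 0.8, 0.1]
box, class_id, confidence = decode_detection(row, img_w=200, img_h=500)
print(box, class_id, confidence)
```

As in `detect_objects`, the confidence here is the best class score alone; the objectness value at index 4 is not used.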



In [106]:
img=frames[120]
res,p=detect_objects(img,net)
# cv2_imshow(res)
print(p)

[['car']]
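
A frame can yield far fewer labels than raw detections because `detect_objects` passes its boxes through `cv2.dnn.NMSBoxes`. The sketch below is a simplified greedy non-maximum suppression written for illustration only; it is not OpenCV's implementation, and details such as whether the threshold comparisons are strict are assumptions. The names `iou` and `greedy_nms` and the sample boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, confidences, score_threshold, nms_threshold):
    """Return indices of kept boxes, highest confidence first."""
    # drop low-scoring boxes, then visit the rest from most to least confident
    candidates = [i for i, c in enumerate(confidences) if c > score_threshold]
    candidates.sort(key=lambda i: confidences[i], reverse=True)
    kept = []
    for i in candidates:
        # keep a box only if it does not overlap too much with any box already kept
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in kept):
            kept.append(i)
    return kept


# Two heavily overlapping boxes and one distant, low-confidence box
boxes = [[10, 10, 100, 100], [12, 12, 100, 100], [300, 300, 50, 50]]
confidences = [0.9, 0.6, 0.35]
print(greedy_nms(boxes, confidences, 0.4, 0.3))
```

With the thresholds used in `detect_objects` (0.4 and 0.3), the third box clears the 0.3 detection filter but is removed by the 0.4 score threshold, so in this sketch the effective confidence cut-off is 0.4 rather than 0.3.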


Finding out unique objects present in the video

In [108]:
images_in_frames=[]
for i in range(no_of_frames):
  img=frames[i]
  res,p=detect_objects(img,net)
  images_in_frames.append(p[0])
print(images_in_frames)

[['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['car', 'train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['train'], ['car', 'train'], ['car', 'train'], ['car', 'train'], ['car', 'train'], ['car', 'train'], ['car', 'train'], ['car', 'train'], ['car', 'train'], [

In [110]:
z=[]
for i in images_in_frames:
  for j in i:
    z.append(j)
print("Unique objects detected in the video : ",set(z))

Unique objects detected in the video :  {'bus', 'car', 'train', 'truck'}
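
The flattening loop above can also be written with the standard library. This is a minimal sketch on a toy input; the helper name `unique_labels` is an assumption, and sorting is added only so the printed order is stable, since Python sets are unordered.

```python
from itertools import chain


def unique_labels(labels_per_frame):
    """Return the distinct labels across all frames, in alphabetical order."""
    return sorted(set(chain.from_iterable(labels_per_frame)))


# Toy stand-in for images_in_frames; an empty list is a frame with no detections
toy_frames = [["train"], ["car", "train"], [], ["bus"]]
print(unique_labels(toy_frames))
```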


Counting the number of frames in which each object is present

In [111]:
# each frame's label list has no duplicates, so every increment counts one frame
tn=c=b=tk=0
for i in images_in_frames:
  for j in i:
    if j=="bus":
      b+=1
    elif j=="car":
      c+=1
    elif j=="train":
      tn+=1
    elif j=="truck":
      tk+=1
print(f'No.of frames in which bus appears {b}')
print(f'No.of frames in which car appears {c}')
print(f'No.of frames in which train appears {tn}')
print(f'No.of frames in which truck appears {tk}')

No.of frames in which bus appears 1
No.of frames in which car appears 52
No.of frames in which train appears 94
No.of frames in which truck appears 1
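
The counters above are hard-coded for the four classes found in this video. A possible generalisation, sketched here on a toy input, uses `collections.Counter` so that any set of labels is handled. The function name `frames_per_label` is an assumption for illustration.

```python
from collections import Counter


def frames_per_label(labels_per_frame):
    """Count, for each label, the number of frames in which it appears at least once."""
    counts = Counter()
    for labels in labels_per_frame:
        # set() guards against a label being listed twice for the same frame
        counts.update(set(labels))
    return counts


# Toy stand-in for images_in_frames
toy_frames = [["train"], ["car", "train"], ["car", "car"], []]
counts = frames_per_label(toy_frames)
for label, n in sorted(counts.items()):
    print(f"Number of frames in which {label} appears: {n}")
```

Applied to `images_in_frames`, this should give the same counts as the explicit counters, provided each frame's list holds unique labels as `detect_objects` returns them.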


Calculating the total duration of each detected object in the video

In [116]:
print(f'Bus appears for {b*duration_of_frame} secs')
print(f'Car appears for {c*duration_of_frame} secs')
print(f'Train appears for {tn*duration_of_frame} secs')
print(f'Truck appears for {tk*duration_of_frame} secs')

Bus appears for 0.030636436090225566 secs
Car appears for 1.5930946766917295 secs
Train appears for 2.879824992481203 secs
Truck appears for 0.030636436090225566 secs
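
The four print statements can be folded into one step that works for any number of classes. The sketch below uses the frame counts and per-frame duration reported above; the helper name `durations_from_counts` and the rounding to two decimals are choices made here for illustration.

```python
def durations_from_counts(frame_counts, duration_of_frame, ndigits=2):
    """Map each label to its total on-screen time in seconds, rounded for display."""
    return {
        label: round(n * duration_of_frame, ndigits)
        for label, n in frame_counts.items()
    }


# Frame counts and video figures taken from the outputs above
frame_counts = {"bus": 1, "car": 52, "train": 94, "truck": 1}
duration_of_frame = 4.074646 / 133
for label, secs in durations_from_counts(frame_counts, duration_of_frame).items():
    print(f"{label} appears for {secs} secs")
```

Labels seen in a single frame, such as bus and truck here, account for only about 0.03 seconds each. They may be momentary misdetections, so a minimum frame count before reporting a label could be worth considering.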
