### Part 1: Using available pre-trained models for object detection, conduct inference on a short video (5-10 seconds) of a street scene, drawing bounding boxes around detected vehicles.

### Step 1. Collect a source video. It may be necessary to divide the video into discrete image frames.
I downloaded the MP4 from this location: https://github.com/OlafenwaMoses/ImageAI/blob/master/data-videos/traffic-mini.mp4 and saved it locally as input_traffic-mini.mp4, the file name used in the code in Step 2.
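
Step 1 mentions that the video may need to be divided into discrete frames. The sketch below shows one way to do that. `iter_frames` is a hypothetical helper (not part of OpenCV) that works with any object whose `read()` method returns a `(ret, frame)` pair, which is the interface `cv2.VideoCapture` exposes. A small stand-in capture is used here so the example runs without a video file.

```python
from typing import Any, Iterator, Tuple


def iter_frames(cap: Any) -> Iterator[Tuple[int, Any]]:
    """Yield (index, frame) pairs until the capture has no more frames.

    `cap` is assumed to behave like cv2.VideoCapture: read() returns
    (ret, frame), with ret == False once the video is exhausted.
    """
    index = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        yield index, frame
        index += 1


class FakeCapture:
    """Stand-in for cv2.VideoCapture, used only for this illustration."""

    def __init__(self, frames):
        self._frames = list(frames)

    def read(self):
        if self._frames:
            return True, self._frames.pop(0)
        return False, None


for i, frame in iter_frames(FakeCapture(["frame-a", "frame-b"])):
    print(i, frame)
```

With the real video, the same helper could be called as `iter_frames(cv2.VideoCapture("input_traffic-mini.mp4"))`, and each frame saved with `cv2.imwrite(f"frame_{i:04d}.jpg", frame)` if individual image files are wanted. The code in Step 2 processes frames directly from the capture, so writing them to disk is optional.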

### Step 2. Conduct inference on each frame of the video, drawing bounding boxes around detected vehicles.


In [1]:
# install the necessary libraries
#!pip install ultralytics

In [2]:
import cv2
import numpy as np
# Ultralytics package, which provides the YOLOv8 object detection models
from ultralytics import YOLO

In [3]:
# Open the input video file
cap = cv2.VideoCapture("input_traffic-mini.mp4")

# Load the pre-trained YOLOv8 medium model (yolov8m.pt).
# It is trained on the 80 COCO classes listed below;
# the weights file is downloaded automatically on first use.

model = YOLO("yolov8m.pt")

1. Person
2. Bicycle
3. Car
4. Motorcycle
5. Airplane
6. Bus
7. Train
8. Truck
9. Boat
10. Traffic light
11. Fire hydrant
12. Stop sign
13. Parking meter
14. Bench
15. Bird
16. Cat
17. Dog
18. Horse
19. Sheep
20. Cow
21. Elephant
22. Bear
23. Zebra
24. Giraffe
25. Backpack
26. Umbrella
27. Handbag
28. Tie
29. Suitcase
30. Frisbee
31. Skis
32. Snowboard
33. Sports ball
34. Kite
35. Baseball bat
36. Baseball glove
37. Skateboard
38. Surfboard
39. Tennis racket
40. Bottle
41. Wine glass
42. Cup
43. Fork
44. Knife
45. Spoon
46. Bowl
47. Banana
48. Apple
49. Sandwich
50. Orange
51. Broccoli
52. Carrot
53. Hot dog
54. Pizza
55. Donut
56. Cake
57. Chair
58. Couch
59. Potted plant
60. Bed
61. Dining table
62. Toilet
63. TV
64. Laptop
65. Mouse
66. Remote
67. Keyboard
68. Cell phone
69. Microwave
70. Oven
71. Toaster
72. Sink
73. Refrigerator
74. Book
75. Clock
76. Vase
77. Scissors
78. Teddy bear
79. Hair drier
80. Toothbrush
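
Note that the list above is numbered from 1, while the class indices returned by the model are 0-based, so Car is index 2, Motorcycle 3, Bus 5, and Truck 7. Since the task asks for boxes around vehicles only, the detections can be filtered by class index. The sketch below is a minimal illustration. `keep_vehicles` is a hypothetical helper, and treating exactly these four classes as "vehicles" is an assumption.

```python
# 0-based COCO class indices, as returned in result.boxes.cls.
# Which classes count as "vehicles" is an assumption made for this sketch.
VEHICLE_CLASSES = {2: "car", 3: "motorcycle", 5: "bus", 7: "truck"}


def keep_vehicles(classes, bboxes):
    """Return (class_name, bbox) pairs for detections that are vehicles."""
    kept = []
    for cls, bbox in zip(classes, bboxes):
        cls = int(cls)
        if cls in VEHICLE_CLASSES:
            kept.append((VEHICLE_CLASSES[cls], tuple(int(v) for v in bbox)))
    return kept


# Example with made-up detections: person, car, bus, bicycle
classes = [0, 2, 5, 1]
bboxes = [(10, 20, 50, 80), (100, 120, 200, 220), (300, 50, 620, 300), (5, 5, 30, 60)]
print(keep_vehicles(classes, bboxes))
# -> [('car', (100, 120, 200, 220)), ('bus', (300, 50, 620, 300))]
```

The helper accepts the `classes` and `bboxes` arrays built in the inference loop. As far as I know, Ultralytics also exposes the index-to-name mapping as `model.names` and accepts a `classes=[...]` argument at prediction time to restrict detections, which may be simpler than filtering afterwards.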

In [4]:
# Check that the video file opened correctly
if not cap.isOpened():
    print("Error: could not open video")
    exit()

# Get video properties
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# Define the output video codec and create a VideoWriter object.
# Note: 'XVID' is not supported in an .mp4 container, so OpenCV's FFMPEG backend
# prints a warning and falls back to 'mp4v'. Passing *'mp4v' here avoids the warning.
fourcc = cv2.VideoWriter_fourcc(*'XVID')
output_video = cv2.VideoWriter('output_video_vk.mp4', fourcc, fps, (width, height))


while True:
    ret, frame = cap.read()
    # if there are no more frames
    if not ret:
        break
    # Run inference on the current frame
    results = model(frame)
    # A single frame produces a single result
    result = results[0]
    # Extract box coordinates (x1, y1, x2, y2) and class indices as integer arrays
    bboxes = np.array(result.boxes.xyxy.cpu(), dtype="int")
    classes = np.array(result.boxes.cls.cpu(), dtype="int")

    # Draw each bounding box and its class index on the frame
    for cls, bbox in zip(classes, bboxes):
        # Convert NumPy integers to plain Python ints for OpenCV
        x, y, x2, y2 = (int(v) for v in bbox)
        cv2.rectangle(frame, (x, y), (x2, y2), (0, 0, 255), 2)
        cv2.putText(frame, str(cls), (x, y - 5), cv2.FONT_HERSHEY_PLAIN, 2, (0, 0, 255), 2)

    # Write the annotated frame to the output video
    output_video.write(frame)

    # Show the annotated frame; press Esc (key code 27) to stop early
    cv2.imshow("img", frame)
    key = cv2.waitKey(1)
    if key == 27:
        break

# Release the video capture and writer objects
cap.release()
output_video.release()
# close all windows
cv2.destroyAllWindows()




OpenCV: FFMPEG: tag 0x44495658/'XVID' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'


0: 384x640 10 persons, 1 bicycle, 9 cars, 2 motorcycles, 4 buss, 1 truck, 122.7ms
Speed: 1.6ms preprocess, 122.7ms inference, 1.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 9 persons, 1 bicycle, 8 cars, 2 motorcycles, 4 buss, 1 truck, 109.9ms
Speed: 1.1ms preprocess, 109.9ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 8 persons, 2 bicycles, 9 cars, 2 motorcycles, 3 buss, 1 truck, 103.7ms
Speed: 1.2ms preprocess, 103.7ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 10 persons, 2 bicycles, 9 cars, 2 motorcycles, 3 buss, 1 truck, 105.9ms
Speed: 1.2ms preprocess, 105.9ms inference, 0.8ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 10 persons, 1 bicycle, 7 cars, 2 motorcycles, 5 buss, 110.2ms
Speed: 1.2ms preprocess, 110.2ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 10 persons, 2 bicycles, 7 cars, 2 motorcycles, 5 buss, 110.4ms
Speed: 1.1ms preprocess, 110.4...

Speed: 1.2ms preprocess, 112.1ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 7 persons, 3 bicycles, 4 cars, 5 buss, 101.0ms
Speed: 1.3ms preprocess, 101.0ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 7 persons, 2 bicycles, 4 cars, 5 buss, 100.5ms
Speed: 1.2ms preprocess, 100.5ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 9 persons, 3 bicycles, 4 cars, 6 buss, 97.7ms
Speed: 1.2ms preprocess, 97.7ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 8 persons, 2 bicycles, 4 cars, 6 buss, 102.3ms
Speed: 1.0ms preprocess, 102.3ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 8 persons, 2 bicycles, 4 cars, 6 buss, 99.2ms
Speed: 1.1ms preprocess, 99.2ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 8 persons, 2 bicycles, 4 cars, 6 buss, 109.2ms
Speed: 1.1ms preprocess, 109.2ms inference, 0.5ms postprocess ...

Speed: 1.1ms preprocess, 112.1ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 8 persons, 4 bicycles, 6 cars, 4 buss, 112.0ms
Speed: 1.3ms preprocess, 112.0ms inference, 0.8ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 8 persons, 4 bicycles, 7 cars, 4 buss, 107.0ms
Speed: 1.2ms preprocess, 107.0ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 7 persons, 3 bicycles, 7 cars, 4 buss, 109.3ms
Speed: 1.1ms preprocess, 109.3ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 7 persons, 4 bicycles, 7 cars, 3 buss, 111.0ms
Speed: 1.1ms preprocess, 111.0ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 7 persons, 4 bicycles, 9 cars, 5 buss, 111.1ms
Speed: 1.3ms preprocess, 111.1ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 6 persons, 3 bicycles, 9 cars, 5 buss, 108.3ms
Speed: 1.0ms preprocess, 108.3ms inference, 0.5ms postproc...


0: 384x640 8 persons, 2 bicycles, 12 cars, 1 motorcycle, 5 buss, 112.3ms
Speed: 1.2ms preprocess, 112.3ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 9 persons, 2 bicycles, 9 cars, 1 motorcycle, 4 buss, 114.4ms
Speed: 1.2ms preprocess, 114.4ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 8 persons, 3 bicycles, 10 cars, 1 motorcycle, 5 buss, 106.7ms
Speed: 1.2ms preprocess, 106.7ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 7 persons, 3 bicycles, 10 cars, 1 motorcycle, 5 buss, 113.4ms
Speed: 1.3ms preprocess, 113.4ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 7 persons, 3 bicycles, 10 cars, 1 motorcycle, 5 buss, 111.1ms
Speed: 1.4ms preprocess, 111.1ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 7 persons, 3 bicycles, 10 cars, 1 motorcycle, 4 buss, 111.3ms
Speed: 1.2ms preprocess, 111.3ms inference, 0.5ms postprocess per i...
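
The per-frame log above reports inference times of roughly 100-120 ms. To summarize that figure, the inference times can be pulled out of the log text with a regular expression. This is only a sketch: `mean_inference_ms` is a hypothetical helper, and the sample text is just the first two complete records copied from the output above.

```python
import re

# Matches the "<number>ms inference" part of each "Speed:" line
INFERENCE_RE = re.compile(r"([\d.]+)ms inference")


def mean_inference_ms(log_text):
    """Return the mean inference time in ms, or None if no times are found."""
    times = [float(t) for t in INFERENCE_RE.findall(log_text)]
    if not times:
        return None
    return sum(times) / len(times)


sample_log = """
0: 384x640 10 persons, 1 bicycle, 9 cars, 2 motorcycles, 4 buss, 1 truck, 122.7ms
Speed: 1.6ms preprocess, 122.7ms inference, 1.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 9 persons, 1 bicycle, 8 cars, 2 motorcycles, 4 buss, 1 truck, 109.9ms
Speed: 1.1ms preprocess, 109.9ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)
"""

mean_ms = mean_inference_ms(sample_log)
print(f"mean inference: {mean_ms:.1f} ms, about {1000 / mean_ms:.1f} frames per second")
```

Truncated records without an "ms inference" field are simply skipped by the pattern. Preprocess and postprocess time are not included, so the frames-per-second figure is an upper bound on actual throughput.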

In [5]:
import torch
torch.__version__

'2.2.0'

In [6]:
import torchvision
torchvision.__version__ 

'0.17.0'

In [7]:
# Debugging: I was having torch/torchvision version issues.
# torchvision 0.17.x is the release paired with torch 2.2.x.

# Uncomment to reinstall torchvision; the -y flag skips the confirmation prompt
#!pip uninstall -y torchvision
#!pip install torchvision==0.17

### Step 3. Format the results back into a video.

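The annotated frames are written back into a video file, output_video_vk.mp4, by the VideoWriter in the loop in Step 2. In the output video you can see bounding boxes for Person, Bicycle, Car, and Motorcycle, among other classes, each labeled with its class index. The model was even able to detect the bus driver inside the bus!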