# - Object Detection and Counting -

Realized by: Maleke KOUKI

This notebook tests object detection and counting on images, on videos, and on a live camera feed, which is our primary objective.

For this project, we use SSD (Single Shot MultiBox Detector). Like YOLO, SSD predicts boxes and classes in a single forward pass, which makes it fast, while aiming for accuracy closer to that of two-stage detectors such as Faster R-CNN. For performance reasons, we use MobileNetV2 as the backbone.
SSD is an object detection architecture built on top of a CNN, while MobileNetV2 is a lightweight CNN architecture optimized for efficient feature extraction. The two are often combined: MobileNetV2 extracts the features and SSD predicts the bounding boxes and classes from them, which gives a balanced and efficient object detection system.


# Import libraries

- Capture video streams from the camera and display the result

In [1]:
pip install opencv-python 

Note: you may need to restart the kernel to use updated packages.


In [1]:
# A failed import raises ImportError, so the check has to wrap the import itself
try:
    import cv2
except ImportError:
    print("OpenCV is not installed or not properly configured.")
else:
    # Get OpenCV version
    print("OpenCV version:", cv2.__version__)
    print("OpenCV is properly installed.")


OpenCV version: 4.7.0
OpenCV is properly installed.


In [2]:
import numpy as np
import cv2

In [3]:
image_path = 'img.jpg'

- The prototxt file: defines the architecture of the model (the layers, the input, the output, pooling, ...).
- The caffemodel file: contains the actual numerical values (the weights, i.e. the parameters learned during training) associated with each connection in the neural network.
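
Since the model is split across these two files, it can help to confirm that both exist before loading them. The following is a minimal sketch, not part of the original notebook: the helper name `missing_model_files` and the relative paths are hypothetical and used only for illustration.

```python
from pathlib import Path


def missing_model_files(prototxt_path, model_path):
    """Return the list of model files that do not exist on disk."""
    candidates = (Path(prototxt_path), Path(model_path))
    return [str(p) for p in candidates if not p.is_file()]


# Hypothetical relative paths, used only for illustration
missing = missing_model_files("models/MobileNetSSD_deploy.prototxt",
                              "models/MobileNetSSD_deploy.caffemodel")
if missing:
    print("Missing model files:", missing)
```

An empty list means both files were found, so the model can be loaded.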

In [4]:
prototxt_path = 'C:/Users/malek/Desktop/stage2023/object detection/models/MobileNetSSD_deploy.prototxt'
model_path = 'C:/Users/malek/Desktop/stage2023/object detection/models/MobileNetSSD_deploy.caffemodel'

- Define the minimum confidence of the prediction

In [5]:
min_confidence = 0.05
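
To illustrate what this threshold does, here is a small sketch that filters detections by confidence. It assumes the network output has shape (1, 1, N, 7) with the confidence in column 2, which is the layout this notebook indexes later. The array values are synthetic and the helper name `filter_detections` is hypothetical.

```python
import numpy as np


def filter_detections(detections, min_confidence):
    """Keep only the rows whose confidence (column 2) exceeds the threshold."""
    rows = detections[0, 0]  # shape (N, 7)
    return rows[rows[:, 2] > min_confidence]


# Synthetic detections, made up for illustration: shape (1, 1, 3, 7)
fake = np.array([[[
    [0, 15, 0.95, 0.1, 0.1, 0.5, 0.9],
    [0, 12, 0.03, 0.2, 0.2, 0.4, 0.4],
    [0, 7, 0.40, 0.6, 0.5, 0.9, 0.8],
]]], dtype=np.float32)

kept = filter_detections(fake, min_confidence=0.05)
print(len(kept), "of", fake.shape[2], "detections kept")
```

A low threshold such as 0.05 keeps almost every candidate box, so the counts may include weak detections. Raising it trades recall for fewer false positives.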

- Define the list of different objects 

In [21]:
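# The label order must match the 21 classes the model was trained on
# (the 20 Pascal VOC classes plus 'background'): the index of each name
# is the class ID returned by the network.
CLASSES = ('background',
           'aeroplane', 'bicycle', 'bird', 'boat',
           'bottle', 'bus', 'car', 'cat', 'chair',
           'cow', 'diningtable', 'dog', 'horse',
           'motorbike', 'person', 'pottedplant',
           'sheep', 'sofa', 'train', 'tvmonitor')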

- Define the color of the rectangles of the detected objects

In [22]:
np.random.seed(543210)  # fixed seed so each class keeps the same color on every run
colors = np.random.uniform(0, 255, size=(len(CLASSES), 3))  # random colors: two classes can still end up with similar colors
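
One possible way to avoid similar colors is to space the hues evenly instead of drawing them at random. This is only a sketch of that alternative, not the method used in this notebook. The helper name `distinct_colors` is hypothetical, and the value 21 assumes one color per class.

```python
import colorsys


def distinct_colors(n):
    """Return n BGR colors (0-255) whose hues are evenly spaced around the color wheel."""
    palette = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, 1.0, 1.0)
        # OpenCV drawing functions expect colors in BGR order
        palette.append((int(b * 255), int(g * 255), int(r * 255)))
    return palette


palette = distinct_colors(21)  # assumption: one color per class
print(palette[:3])
```

Each tuple can be passed directly as the color argument of `cv2.rectangle` or `cv2.putText`.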

- Load the pretrained model

In [23]:
net = cv2.dnn.readNetFromCaffe(prototxt_path, model_path)

In [24]:
if net.empty():
    print("Error loading the model.")
    

# 1- Image

- Load the image, resize it, and feed it to the neural network
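
Before an image reaches the network it is turned into a "blob": the pixel values are shifted by a mean, multiplied by a scale factor, and rearranged from height x width x channels to batch x channels x height x width. The sketch below reproduces that arithmetic in plain NumPy for illustration only. It assumes the same mean is applied to every channel, and the helper `to_blob` is hypothetical, not a replacement for `cv2.dnn.blobFromImage`.

```python
import numpy as np


def to_blob(image_bgr, scale, mean):
    """Apply (pixel - mean) * scale and reorder the result to 1 x C x H x W."""
    pixels = image_bgr.astype(np.float32)
    normalized = (pixels - np.array(mean, dtype=np.float32)) * scale
    return normalized.transpose(2, 0, 1)[np.newaxis, ...]  # HWC -> NCHW


# Dummy 300x300 image whose pixels all equal the mean, made up for illustration
dummy = np.full((300, 300, 3), 130, dtype=np.uint8)
blob = to_blob(dummy, scale=0.007, mean=(130, 130, 130))
print(blob.shape)
```

With a mean of 130 and a scale of 0.007, pixel values in [0, 255] end up roughly in [-0.91, 0.875], centered near zero.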

In [25]:
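image = cv2.imread(image_path)
if image is None:  # cv2.imread returns None instead of raising when the file cannot be read
    raise FileNotFoundError(f"Could not read image: {image_path}")
height, width = image.shape[:2]
# Resize to 300x300 and build the input blob (scale factor 0.007, mean value 130)
blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 0.007, (300, 300), 130)
# Feed the blob to the neural network and run a forward pass
net.setInput(blob)
detected_objects = net.forward()
print(detected_objects[0][0][0])  # result: values of the first detected object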

[ 0.         15.          0.94687384  0.50484574  0.3137808   0.6396011
  0.5986711 ]


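 - 0: the index of the image in the batch
 - 15: the class ID, i.e. the index of the object in the `CLASSES` list (object number 15, 'person')
 - 0.947: the confidence
 - others: the normalized x and y coordinates (between 0 and 1) of the upper-left and lower-right corners of the bounding box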
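
The layout described above can be sketched as a small parser that turns one output row into a class ID, a confidence, and a pixel bounding box. The names `Detection` and `parse_detection` and the 640x480 image size are assumptions made for illustration. The row is the one printed above. Coordinates are clamped to the image because the network can return values slightly outside [0, 1].

```python
from typing import NamedTuple


class Detection(NamedTuple):
    class_id: int
    confidence: float
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels


def parse_detection(row, width, height):
    """Convert one 7-value SSD output row into a Detection with pixel coordinates."""
    _, class_id, confidence, x1, y1, x2, y2 = (float(v) for v in row)

    def clamp(value, upper):
        # Keep the coordinate inside the image
        return max(0, min(int(value), upper - 1))

    box = (clamp(x1 * width, width), clamp(y1 * height, height),
           clamp(x2 * width, width), clamp(y2 * height, height))
    return Detection(int(class_id), confidence, box)


row = [0.0, 15.0, 0.94687384, 0.50484574, 0.3137808, 0.6396011, 0.5986711]
det = parse_detection(row, width=640, height=480)  # hypothetical image size
print(det)
```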

In [26]:
# Initialize a dictionary to count each class of object
object_counts = {class_name: 0 for class_name in CLASSES[1:]}  # Exclude 'background'

for i in range(detected_objects.shape[2]):
    confidence = detected_objects[0, 0, i, 2]  # confidence of detection i
    # Only count and draw detections above the minimum confidence
    if confidence > min_confidence:
        class_id = int(detected_objects[0, 0, i, 1])
        class_name = CLASSES[class_id]
        if class_name in object_counts:
            object_counts[class_name] += 1
        # Scale the normalized box coordinates to pixel coordinates
        upper_left_x = int(detected_objects[0, 0, i, 3] * width)
        upper_left_y = int(detected_objects[0, 0, i, 4] * height)
        lower_right_x = int(detected_objects[0, 0, i, 5] * width)
        lower_right_y = int(detected_objects[0, 0, i, 6] * height)
        # Label: class name and confidence as a percentage (confidence is in [0, 1])
        prediction_txt = f"{class_name}: {confidence * 100:.2f}%"
        # Draw the rectangle
        cv2.rectangle(image, (upper_left_x, upper_left_y), (lower_right_x, lower_right_y), colors[class_id], 3)
        # Put the label above the box, or just inside it if there is no room above
        text_y = upper_left_y - 15 if upper_left_y > 30 else upper_left_y + 15
        cv2.putText(image, prediction_txt, (upper_left_x, text_y), cv2.FONT_HERSHEY_SIMPLEX, 0.6, colors[class_id], 2)
                                 
        

In [27]:
# Display object counts on the image
y_position = 30
for class_name, count in object_counts.items():
    cv2.putText(image, f"{class_name}: {count}", (10, y_position), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    y_position += 30
cv2.imshow("Detected objects", image)
cv2.waitKey(0)
cv2.destroyAllWindows()

- We use the `object_counts` dictionary to track how many objects of each class were detected in the image.
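
The same tally can also be sketched with `collections.Counter`, which only stores the classes that actually appear. This is an alternative shown for illustration, not the approach used in this notebook. The label tuple `CLASS_NAMES`, the helper `count_classes`, and the list of class IDs are all hypothetical.

```python
from collections import Counter

CLASS_NAMES = ("background", "dog", "person", "sofa")  # hypothetical label list


def count_classes(class_ids, class_names):
    """Count detections per class name, ignoring the 'background' class."""
    return Counter(class_names[i] for i in class_ids
                   if class_names[i] != "background")


# Made-up class IDs, as if they came from the detections above the threshold
counts = count_classes([2, 2, 1, 0, 3, 2], CLASS_NAMES)
print(counts["person"], counts["dog"], counts["sofa"])
```

A `Counter` returns 0 for classes that were never seen, so no initialization with every class name is needed.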

In [13]:
# Count the total number of objects detected (the sum of the per-class counts)
print("Number of detected objects:", sum(object_counts.values()))
for class_name, count in object_counts.items():
    print(f"{class_name}: {count} objects")

Number of detected objects: 9
aeroplane: 0 objects
bicycle: 0 objects
bird: 0 objects
boat: 0 objects
bottle: 0 objects
bus: 0 objects
car: 0 objects
cat: 0 objects
chair: 0 objects
cow: 0 objects
diningtable: 0 objects
dog: 1 objects
truck: 0 objects
motorbike: 0 objects
person: 7 objects
pottedplant: 0 objects
sheep: 0 objects
sofa: 1 objects
train: 0 objects
tvmonitor: 0 objects


# 2- Video

For a video, we have to capture the frames and process each frame individually.
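
The frame-by-frame idea can be sketched as a generator that keeps reading until the source reports that no frame is left. It relies only on a `read()` method returning an `(ok, frame)` pair, which is how `cv2.VideoCapture.read` behaves. The helper `iter_frames` and the `FakeCapture` stand-in are hypothetical and exist only so the sketch runs without a video file.

```python
def iter_frames(capture):
    """Yield frames from any object whose read() returns (ok, frame), until ok is False."""
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        yield frame


class FakeCapture:
    """Stand-in for cv2.VideoCapture, used only for illustration."""

    def __init__(self, frames):
        self._frames = list(frames)

    def read(self):
        if self._frames:
            return True, self._frames.pop(0)
        return False, None


for index, frame in enumerate(iter_frames(FakeCapture(["frame-a", "frame-b"]))):
    print(index, frame)
```

With a real video, each yielded frame would go through the same blob creation, forward pass, and drawing steps as a single image.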

In [14]:
# The video path
video_path = 'traffic.mp4'

# Open the video file
cap = cv2.VideoCapture(video_path)
if not cap.isOpened():
    print("Error opening the video file.")

while True:
    ret, frame = cap.read()  # Read a frame from the video

    if not ret:
        break  # Break the loop if the video has ended

    # Reset the counts for every frame so they reflect only the objects in that frame
    object_counts = {class_name: 0 for class_name in CLASSES[1:]}  # Exclude 'background'

    height, width = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 0.007, (300, 300), 130)
    net.setInput(blob)
    detected_objects = net.forward()

    for i in range(detected_objects.shape[2]):
        confidence = detected_objects[0, 0, i, 2]
        if confidence > min_confidence:
            class_id = int(detected_objects[0, 0, i, 1])
            class_name = CLASSES[class_id]
            if class_name in object_counts:
                object_counts[class_name] += 1
            # Scale the normalized box coordinates to pixel coordinates
            upper_left_x = int(detected_objects[0, 0, i, 3] * width)
            upper_left_y = int(detected_objects[0, 0, i, 4] * height)
            lower_right_x = int(detected_objects[0, 0, i, 5] * width)
            lower_right_y = int(detected_objects[0, 0, i, 6] * height)
            # Confidence is in [0, 1], so convert it to a percentage for the label
            prediction_txt = f"{class_name}: {confidence * 100:.2f}%"
            cv2.rectangle(frame, (upper_left_x, upper_left_y), (lower_right_x, lower_right_y), colors[class_id], 3)
            cv2.putText(frame, prediction_txt, (upper_left_x, upper_left_y - 15 if upper_left_y > 30 else upper_left_y + 15),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, colors[class_id], 2)

    # Display object counts on the frame
    y_position = 30
    for class_name, count in object_counts.items():
        cv2.putText(frame, f"{class_name}: {count}", (10, y_position), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        y_position += 30
    
    cv2.imshow("Detected objects", frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):  #q : if we press q the window will be closed
        break

cap.release()
cv2.destroyAllWindows()

# 3- Webcam

In [None]:
capp = cv2.VideoCapture(0)

while True:
    ret, image = capp.read()  # read a frame from the webcam
    if not ret:
        break  # stop if the camera returns no frame
    height, width = image.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 0.007, (300, 300), 130)
    # Feed the blob to the neural network and run a forward pass
    net.setInput(blob)
    detected_objects = net.forward()
    print(detected_objects[0][0][0])  # result: values of the first detected object
    # Initialize a dictionary to count each class of object in this frame
    object_counts = {class_name: 0 for class_name in CLASSES[1:]}  # Exclude 'background'

    for i in range(detected_objects.shape[2]):
        confidence = detected_objects[0, 0, i, 2]  # confidence of detection i
        # Only count and draw detections above the minimum confidence
        if confidence > min_confidence:
            class_id = int(detected_objects[0, 0, i, 1])
            class_name = CLASSES[class_id]
            if class_name in object_counts:
                object_counts[class_name] += 1
            # Scale the normalized box coordinates to pixel coordinates
            upper_left_x = int(detected_objects[0, 0, i, 3] * width)
            upper_left_y = int(detected_objects[0, 0, i, 4] * height)
            lower_right_x = int(detected_objects[0, 0, i, 5] * width)
            lower_right_y = int(detected_objects[0, 0, i, 6] * height)
            # Label: class name and confidence as a percentage (confidence is in [0, 1])
            prediction_txt = f"{class_name}: {confidence * 100:.2f}%"
            # Draw the rectangle
            cv2.rectangle(image, (upper_left_x, upper_left_y), (lower_right_x, lower_right_y), colors[class_id], 3)
            # Put the label above the box, or just inside it if there is no room above
            cv2.putText(image, prediction_txt, (upper_left_x, upper_left_y - 15 if upper_left_y > 30 else upper_left_y + 15), cv2.FONT_HERSHEY_SIMPLEX, 0.6, colors[class_id], 2)
    # Display object counts on the image
    y_position = 30
    for class_name, count in object_counts.items():
        cv2.putText(image, f"{class_name}: {count}", (10, y_position), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        y_position += 30
    cv2.imshow("Detected objects", image)
    # waitKey(0) would freeze on the first frame; wait 5 ms and quit when 'q' is pressed
    if cv2.waitKey(5) & 0xFF == ord('q'):
        break

capp.release()  # release the webcam opened above
cv2.destroyAllWindows()

[0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0.]
[ 0.         15.          0.5869797   0.0166032   0.3362716   0.96063876
  0.99549127]
[ 0.         15.          0.98055387  0.03561494  0.40419468  0.9451518
  1.0025605 ]
[ 0.         15.          0.9798772   0.04478461  0.39916     0.9389892
  1.0034566 ]
[ 0.         15.          0.98233265  0.02544087  0.37237734  0.97572416
  0.99761754]
[0.0000000e+00 1.5000000e+01 9.8930675e-01 1.3895363e-02 3.4237641e-01
 9.8863828e-01 9.9555796e-01]
[ 0.         15.          0.98283494  0.01806551  0.34690216  0.98723704
  0.99568653]
[ 0.         15.          0.9612344   0.01833183  0.36919916  0.98673004
  0.9949087 ]
[0.0000000e+00 1.5000000e+01 9.7305793e-01 1.3500184e-02 3.7148130e-01
 9.8748672e-01 9.9543846e-01]
[ 0.         15.          0.9595606   0.01510254  0.3780044   0.9869721
  0.99541783]
[0.0000000e+00 1.5000000e+01 9.5758122e-01 2.8273463e-03 3.8291740e-01
 9.9213278e-01 9.9930549e-01]