# Real-time Object Detection using YOLO and OpenCV
This script performs real-time object detection on a webcam feed using the YOLO (You Only Look Once) model. It reads each frame from the webcam, runs detection on it, and displays the frame in a window with bounding boxes, class labels, and confidence scores drawn over the detected objects.

The script performs the following steps, in order:
1. Initialize the YOLO model and the webcam capture
2. Process each frame in real time
3. Draw bounding boxes and labels around detected objects
4. Display the confidence score next to each label
5. Continue until 'q' is pressed
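
Step 5 depends on comparing the key code returned by `cv2.waitKey` with the code of the letter 'q'. The sketch below isolates that comparison in a hypothetical helper, `is_quit_key`, which is not part of the script. It assumes that the returned value may carry extra high bits on some platforms, which is why only the lowest byte is compared.

```python
def is_quit_key(key_code: int, quit_char: str = "q") -> bool:
    """Return True if the low byte of key_code matches quit_char."""
    # Keep only the lowest 8 bits before comparing with the character code.
    return (key_code & 0xFF) == ord(quit_char)


# cv2.waitKey returns -1 when no key was pressed during the wait.
print(is_quit_key(-1))                   # False
print(is_quit_key(ord("q")))             # True
print(is_quit_key(0x100000 | ord("q")))  # True: high bits are masked off
```

The mask also explains why an idle frame never stops the loop: `-1 & 0xFF` is 255, which is not the code of 'q'.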

Created by Picciano Alisia, Patrikov Martin and Micaletto Giorgio

Requirements: OpenCV (`cv2`), Ultralytics YOLO (`ultralytics`) and a working webcam
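
Before running the cells, it can help to confirm that both modules are importable. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical and not part of the script.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["cv2", "ultralytics"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

If a module is reported missing, it is typically installed from the `opencv-python` or `ultralytics` package on PyPI.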

In [3]:
import cv2
from ultralytics import YOLO

In [7]:
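# Load the pretrained YOLOv5su weights.
model = YOLO('yolov5su.pt')
# Open the default webcam (device index 0).
cap = cv2.VideoCapture(0)

# Warn early if the camera is unavailable; the detection loop would fail on its first read.
if not cap.isOpened():
    print("Error: Could not open webcam.")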
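
The check above only reports a failure for device index 0. On machines with several cameras, one possible variation is to try a few indices in turn and stop with an exception if none opens. The sketch below is an assumption, not the authors' method: `open_first_available` and `FakeCapture` are hypothetical names, and the fake class stands in for `cv2.VideoCapture` so the example runs without a camera.

```python
def open_first_available(factory, indices=(0, 1, 2)):
    """Return the first capture object that reports isOpened(), else raise."""
    for index in indices:
        candidate = factory(index)
        if candidate.isOpened():
            return candidate
        # Free the handle before trying the next index.
        candidate.release()
    raise RuntimeError(f"Could not open a webcam at any of the indices {tuple(indices)}")


class FakeCapture:
    """Stand-in for cv2.VideoCapture so the sketch runs without a camera."""

    def __init__(self, index):
        self.index = index

    def isOpened(self):
        # Pretend that only device 1 exists.
        return self.index == 1

    def release(self):
        pass


cap_demo = open_first_available(FakeCapture)
print(cap_demo.index)  # 1
```

With a real camera, the call would be `open_first_available(cv2.VideoCapture)`.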

### Real-time Detection Loop
The following cell runs the main loop. It will open a separate window named 'YOLO Webcam'. 
Press 'q' while that window is active to stop the loop.

In [5]:
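while True:
    # Grab the next frame; ret is False if the read failed.
    ret, frame = cap.read()

    if not ret:
        print("Failed to grab frame. Check webcam connection or permissions.")
        break

    # Run YOLO inference on the frame.
    results = model(frame)
    for r in results:
        for box in r.boxes:
            # Box corners in pixels, class name, and confidence score.
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            label = model.names[int(box.cls[0])]
            confidence = box.conf[0].item()
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            # Keep the label inside the frame when the box touches the top edge.
            cv2.putText(frame, f'{label} {confidence:.2f}', (x1, max(y1 - 10, 15)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)

    cv2.imshow('YOLO Webcam', frame)

    # Exit when 'q' is pressed in the display window.
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break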


0: 384x640 1 person, 1 keyboard, 74.4ms
Speed: 5.3ms preprocess, 74.4ms inference, 4.7ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 laptop, 1 keyboard, 81.0ms
Speed: 1.2ms preprocess, 81.0ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 1 laptop, 1 keyboard, 69.8ms
Speed: 1.3ms preprocess, 69.8ms inference, 0.9ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 3 persons, 1 laptop, 1 keyboard, 78.5ms
Speed: 1.5ms preprocess, 78.5ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 1 keyboard, 65.1ms
Speed: 1.2ms preprocess, 65.1ms inference, 0.8ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 1 keyboard, 63.9ms
Speed: 1.4ms preprocess, 63.9ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)



0: 384x640 2 persons, 1 keyboard, 66.8ms
Speed: 1.3ms preprocess, 66.8ms inference, 0.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 1 keyboard, 69.7ms
Speed: 1.9ms preprocess, 69.7ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 1 keyboard, 72.6ms
Speed: 1.3ms preprocess, 72.6ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 1 keyboard, 75.3ms
Speed: 1.3ms preprocess, 75.3ms inference, 0.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 1 keyboard, 1 cell phone, 76.4ms
Speed: 2.1ms preprocess, 76.4ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 1 keyboard, 1 book, 74.1ms
Speed: 1.5ms preprocess, 74.1ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 1 laptop, 1 keyboard, 133.0ms
Speed: 1.5ms preprocess, 133.0ms inference, 1.8ms postprocess per image at shape (1, 3, 384, 640)
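
The log reports an inference time for every frame, mostly between about 64 and 81 ms in this run, with one 133 ms outlier. A rough way to summarize it is to extract those timings and average them. The sketch below does this for three `Speed:` lines copied from the output above; the helper name `mean_inference_ms` and the regular expression are assumptions based on the line format shown here.

```python
import re

# Three "Speed:" lines copied from the output above.
SAMPLE_LOG = """\
Speed: 5.3ms preprocess, 74.4ms inference, 4.7ms postprocess per image at shape (1, 3, 384, 640)
Speed: 1.2ms preprocess, 81.0ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)
Speed: 1.3ms preprocess, 69.8ms inference, 0.9ms postprocess per image at shape (1, 3, 384, 640)
"""

# Matches the number directly before "ms inference".
INFERENCE_RE = re.compile(r"(\d+(?:\.\d+)?)ms inference")


def mean_inference_ms(log_text):
    """Average the per-frame inference times (in ms) found in log_text."""
    times = [float(value) for value in INFERENCE_RE.findall(log_text)]
    if not times:
        raise ValueError("no inference timings found")
    return sum(times) / len(times)


mean_ms = mean_inference_ms(SAMPLE_LOG)
print(f"mean inference: {mean_ms:.1f} ms")
print(f"upper bound on frame rate: {1000 / mean_ms:.1f} frames/s")
```

The frame rate computed this way is only an upper bound, since capture, preprocessing, postprocessing, and drawing add time to every frame.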


In [8]:
cap.release()
cv2.destroyAllWindows()
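
Because the cleanup lives in its own cell, it does not run automatically if the loop is interrupted or raises an error. One possible alternative, sketched here under that assumption, is a small context manager that always releases the capture. The names `releasing` and `DummyCapture` are hypothetical; the dummy class stands in for `cv2.VideoCapture` so the example runs without a camera.

```python
from contextlib import contextmanager


@contextmanager
def releasing(capture, on_exit=None):
    """Yield capture, then always call capture.release() and on_exit()."""
    try:
        yield capture
    finally:
        capture.release()
        if on_exit is not None:
            on_exit()


class DummyCapture:
    """Stand-in for cv2.VideoCapture that records whether it was released."""

    def __init__(self):
        self.released = False

    def release(self):
        self.released = True


dummy = DummyCapture()
with releasing(dummy):
    pass  # the detection loop would go here
print(dummy.released)  # True
```

With the real objects, the loop could be wrapped as `with releasing(cv2.VideoCapture(0), on_exit=cv2.destroyAllWindows) as cap:`.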