# Object Detection using YOLOv11
This notebook uses a pre-trained YOLOv11 model to detect people in images from the training dataset.
<br/><br/>How it works:
1. Load the YOLOv11 model.
2. Read images from the training dataset.
3. Run the YOLO model on each image.
4. Draw bounding boxes around detected people.
5. Display the processed image.
6. Exit on pressing 'q'.

In [1]:
import cv2
from ultralytics import YOLO
import os

In [4]:
# Define path to training images
train_dir = "../data/images/train"

# Load YOLOv11 model (pre-trained) - n/s/m/l/x are available
model = YOLO("yolo11s.pt")

# Fixed display width for the resized image
display_width = 800

# Toggle grayscale conversion (grayscale seems to have worse performance)
use_grayscale = False

# Process images in the train folder
for img_name in sorted(os.listdir(train_dir)):
    img_path = os.path.join(train_dir, img_name)

    # Read image
    frame = cv2.imread(img_path)
    if frame is None:
        continue  # skip if image cannot be read

    # Convert to grayscale if enabled
    if use_grayscale:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frame = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)  # Convert back to 3 channels for YOLO

    # Run YOLO detection
    results = model(frame)

    # Draw bounding boxes
    for result in results:
        for box in result.boxes:
            if int(box.cls[0]) == 0:  # Class 0 = "person" in the COCO-pretrained weights
                x1, y1, x2, y2 = map(int, box.xyxy[0])
                cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)

    # Resize image for display while keeping aspect ratio
    height, width = frame.shape[:2]
    new_height = int((display_width / width) * height)  # Maintain aspect ratio
    frame_resized = cv2.resize(frame, (display_width, new_height))

    # Show the processed image
    cv2.imshow("Drowning Detection", frame_resized)

    # Wait for 1ms and exit if 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Cleanup
cv2.destroyAllWindows()


0: 384x640 2 persons, 1 surfboard, 81.7ms
Speed: 2.5ms preprocess, 81.7ms inference, 0.7ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 1 dog, 1 chair, 92.4ms
Speed: 2.1ms preprocess, 92.4ms inference, 0.7ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 86.2ms
Speed: 2.3ms preprocess, 86.2ms inference, 1.2ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 3 persons, 1 surfboard, 99.7ms
Speed: 1.7ms preprocess, 99.7ms inference, 0.9ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 3 persons, 85.2ms
Speed: 2.0ms preprocess, 85.2ms inference, 0.8ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 1 chair, 82.4ms
Speed: 1.9ms preprocess, 82.4ms inference, 0.8ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 person, 1 surfboard, 87.3ms
Speed: 2.1ms preprocess, 87.3ms inference, 0.8ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 85.7ms
Speed: 1.8ms prepro

### Possible Ways to Improve the Model
1. Use a larger YOLO model (for example the m, l, or x variant instead of s)
2. Apply image preprocessing (contrast enhancement, histogram equalisation)
3. Adjust the confidence threshold
4. Resize frames before detection
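
Items 3 and 4 above can be tried without touching the drawing code. The sketch below is one possible way to wrap the detection call. `detect_people` and its default values are hypothetical, chosen only for illustration, while `conf`, `classes`, `imgsz`, and `verbose` are inference arguments accepted by Ultralytics models. It assumes `model` is a loaded `YOLO` object like the one in the cell above.

```python
def detect_people(model, frame, conf=0.4, imgsz=640):
    """Return (x1, y1, x2, y2, confidence) tuples for detected people.

    Hypothetical helper: the defaults are illustrative starting points,
    not tuned values.
    """
    # conf    : discard detections scoring below this threshold
    # classes : keep only class 0 ("person" in the COCO-pretrained weights)
    # imgsz   : inference size; the model resizes the frame internally
    # verbose : suppress the per-image log lines
    results = model(frame, conf=conf, classes=[0], imgsz=imgsz, verbose=False)

    people = []
    for result in results:
        for box in result.boxes:
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            people.append((x1, y1, x2, y2, float(box.conf[0])))
    return people
```

Inside the image loop, `for x1, y1, x2, y2, score in detect_people(model, frame, conf=0.5):` could replace the nested loop over `results`, and loading `"yolo11m.pt"` instead of `"yolo11s.pt"` would cover item 1. Raising `conf` generally trades missed detections for fewer false positives, so any chosen value is worth checking against a few images with known content.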