About the dataset:
1. Dataset diversity: various postures are included (treading water, breaststroke, backstroke, freestyle), as well as group configurations with 2, 3, or 4 individuals
2. 8572 labeled images representing drowning, treading water, and swimming
3. 7000 randomly chosen for training set, 1572 allocated to validation set
4. Every image under images folder has a corresponding text file under labels folder with the same name
5. Each text file shows YOLO object detection format: class, x_center, y_center, width, height
6. However, the paper does not explicitly define what each class ID (e.g., 0, 1, 2) represents in the drowning detection task.
7. Based on a labeled image of someone drowning in the paper and observations in the dataset, we can infer that class 2 represents drowning (vertical position with both hands up).
8. Further inference from Figure 5 in the paper suggests that class 0 is swimming and class 1 is treading water.
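
As a quick check on the class inference above, the label files can be parsed and tallied per class ID. The sketch below assumes the five-field YOLO format described in item 5. The `CLASS_NAMES` mapping reflects the inference in items 7 and 8, not a definition from the paper, and the function names are illustrative.

```python
from collections import Counter
from pathlib import Path

# Inferred mapping (items 7-8 above); the paper does not define these IDs.
CLASS_NAMES = {0: "swimming", 1: "treading water", 2: "drowning"}


def parse_label_line(line):
    """Parse one YOLO label line into (class_id, x_center, y_center, width, height)."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    return (int(parts[0]), *map(float, parts[1:]))


def count_classes(labels_dir):
    """Count bounding boxes per class ID across all .txt files in labels_dir."""
    counts = Counter()
    for txt in Path(labels_dir).glob("*.txt"):
        for line in txt.read_text().splitlines():
            if line.strip():  # skip blank lines
                counts[parse_label_line(line)[0]] += 1
    return counts
```

Calling `count_classes` on a labels folder (for example `../data/labels/train`, a path assumed from the notebook layout) returns a `Counter` keyed by class ID, which can be compared against the three categories listed above.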

# 1.0 Train YOLO Model
Train a YOLO model on the dataset.

In [1]:
import os
import yaml
import cv2
import matplotlib.pyplot as plt
import torch
from ultralytics import YOLO

In [2]:
# Check if GPU is available
device = "cuda:0" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    print(f"GPU detected: {gpu_name}")

Using device: cpu


In [None]:
# Create dataset.yaml file
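# Note: the label files described above use class IDs 0-2; a single
# 'person' class assumes the labels were remapped to 0 beforehand.
yaml_content = {
    # Absolute dataset root; how a relative root is resolved can depend on the Ultralytics version
    'path': os.path.abspath('..'),
    'train': 'data/images/train',  # Training images
    'val': 'data/images/val',      # Validation images
    'names': {
        0: 'person'               # class 0 is 'person'
    }
}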

# Create the YAML file
with open('dataset.yaml', 'w') as f:
    yaml.dump(yaml_content, f)
    print("Created dataset.yaml file")
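
Before training, it can be worth confirming the pairing described in the dataset notes, where every image has a same-named text file under the labels folder. The following is a minimal sketch of such a check. The helper name `find_unpaired` and the set of image extensions are assumptions, not part of the original notebook.

```python
from pathlib import Path

# Image extensions treated as dataset images (an assumption; adjust as needed).
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def find_unpaired(images_dir, labels_dir):
    """Return sorted image file names that have no same-named .txt label file."""
    label_stems = {p.stem for p in Path(labels_dir).glob("*.txt")}
    return sorted(
        p.name
        for p in Path(images_dir).iterdir()
        if p.suffix.lower() in IMAGE_EXTS and p.stem not in label_stems
    )
```

An empty list from `find_unpaired(images_dir, labels_dir)` means every image in that split has a label file.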

In [None]:
# Load the pre-trained model
model = YOLO("yolo11s.pt")
print("Loaded base YOLO model")

In [None]:
# Train the model
print("Starting training...")
results = model.train(
    data='dataset.yaml',
    epochs=50,
    imgsz=640,
    batch=16,
    name='human_detection_model',
    device=0 if torch.cuda.is_available() else 'cpu',  # Use GPU if available
    patience=15,  # early stopping patience
    save=True,    # save best model
    verbose=True
)
print("Training completed")

In [None]:
# Evaluate the model on validation data
print("Evaluating model...")
metrics = model.val()
print(f"mAP50: {metrics.box.map50}")
print(f"mAP50-95: {metrics.box.map}")
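
Both numbers printed above are built on intersection over union (IoU): mAP50 counts a prediction as correct when its IoU with a ground-truth box is at least 0.5, and mAP50-95 averages over IoU thresholds from 0.5 to 0.95. The sketch below computes IoU for two boxes in `(x1, y1, x2, y2)` form. The function name `iou_xyxy` is illustrative and is not the routine Ultralytics uses internally.

```python
def iou_xyxy(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may describe an empty region).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For instance, two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square, so their IoU is 1/7 and they would not count as a match at the 0.5 threshold.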

In [None]:
# Test on a few validation images
test_dir = "../data/images/val"
if os.path.exists(test_dir):
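    # Take the first 5 image files (sorted so the selection is reproducible)
    test_images = sorted(
        f for f in os.listdir(test_dir)
        if f.lower().endswith(('.jpg', '.jpeg', '.png'))
    )[:5]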

    plt.figure(figsize=(15, 12))
    for i, img_name in enumerate(test_images):
        img_path = os.path.join(test_dir, img_name)

        # Perform prediction
        results = model.predict(img_path)

        # Plot the image with detected bounding boxes
        img = cv2.imread(img_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

        # Extract and draw bounding boxes
        for r in results:
            boxes = r.boxes.xyxy.cpu().numpy()
            confs = r.boxes.conf.cpu().numpy()

            for box, conf in zip(boxes, confs):
                x1, y1, x2, y2 = map(int, box)
                cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
                cv2.putText(img, f"Person: {conf:.2f}", (x1, y1-10),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

        plt.subplot(2, 3, i+1)
        plt.imshow(img)
        plt.title(f"{img_name} - Detection")
        plt.axis('off')

    plt.tight_layout()
    os.makedirs('../models', exist_ok=True)  # the folder may not exist yet
    plt.savefig('../models/detection_examples.png')
    plt.show()
    print("Generated detection examples")

In [None]:
# Save the model
os.makedirs("../models", exist_ok=True)
save_path = "../models/human_detection_yolo.pt"
model.save(save_path)  # write the trained weights as a PyTorch .pt checkpoint
print(f"Model saved to {save_path}")

In [None]:
print("\nCode block for loading and using the model:")
print("""
# Load the saved model for inference
from ultralytics import YOLO
model = YOLO("../models/human_detection_yolo.pt")

# To run on an image
results = model.predict("path_to_image.jpg", conf=0.5)

# To run on a video
results = model.predict("path_to_video.mp4", conf=0.5)

# To run on a webcam
results = model.predict(0, conf=0.5)  # 0 is the default webcam ID
""")

1. The code first checks whether a GPU is available
2. It creates a YAML configuration file that defines the dataset structure and class names
3. It loads the pre-trained YOLO11s model as a starting point (transfer learning)
4. Training itself runs through model.train()
5. During training, the model learns to detect people in the images from the labeled data
6. Early stopping limits overfitting: training halts if there is no improvement for 15 epochs
7. After training, the model is evaluated on the validation data to measure accuracy (mAP metrics)
8. Detection results are visualized on a few validation images
9. The trained model is saved to the models folder for future use in drowning detection
# 2.0 Drowning Prediction Model
1. Pose (posture) detection: use MediaPipe
2. Drowning prediction: -
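
Once pose landmarks are available, the "both hands up" cue noted in the dataset description could be checked with simple geometry. The sketch below only illustrates that idea and is not the prediction method planned for this section. It assumes landmarks have already been extracted into a plain dict (a hypothetical structure) keyed by MediaPipe Pose landmark indices, with normalized coordinates whose y axis points down.

```python
# MediaPipe Pose landmark indices for the joints used below.
LEFT_SHOULDER, RIGHT_SHOULDER, LEFT_WRIST, RIGHT_WRIST = 11, 12, 15, 16


def both_hands_up(landmarks):
    """True if both wrists are above their shoulders.

    `landmarks` maps landmark index -> (x, y) in normalized image
    coordinates, where y grows downward (0 is the top of the image).
    """
    return (
        landmarks[LEFT_WRIST][1] < landmarks[LEFT_SHOULDER][1]
        and landmarks[RIGHT_WRIST][1] < landmarks[RIGHT_SHOULDER][1]
    )
```

A single-frame check like this would likely flag waving swimmers as well, so it is best read as one candidate feature rather than a classifier.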
