Imports and Model Initialization

1. YOLOv8 Model Loading:

The YOLOv8 model is loaded from the trained weights file (terapistchildbestwright.pt). These weights are assumed to have been trained on two classes: "Child" (class 0) and "Therapist" (class 1).
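Because the class indices are an assumption about how the weights were trained, it can be worth checking them before running the pipeline. The following is a minimal sketch; `check_class_names` is a hypothetical helper, and it only assumes that the model exposes an index-to-name mapping, as Ultralytics models do through their `names` attribute.

```python
def check_class_names(names, expected=("Child", "Therapist")):
    """Return {class_name: class_index} after verifying the expected classes exist.

    `names` is an index -> name mapping, such as the `names` attribute
    of a loaded Ultralytics YOLO model.
    """
    lookup = {name: index for index, name in names.items()}
    missing = [name for name in expected if name not in lookup]
    if missing:
        raise ValueError(f"Model is missing expected classes: {missing}")
    return lookup


# Stand-in mapping for illustration; the real one comes from the weights file.
example_names = {0: "Child", 1: "Therapist"}
class_ids = check_class_names(example_names)
```

With the real model loaded, the call would be `check_class_names(model.names)`, and the returned lookup can replace hard-coded 0 and 1 in later code.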

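2. Re-identification (Re-ID) Model Loading:

OSNet (osnet_x0_25) is loaded with ImageNet-pretrained weights for appearance-based person re-identification. It extracts an embedding from each cropped person image; comparing embeddings across frames helps keep the same identity attached to the same person over time.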
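To illustrate how embeddings can be compared, here is a minimal sketch using cosine distance on toy vectors. Cosine distance is one common choice for Re-ID features; the metric and thresholds used inside the tracker may differ. The function names and the three-dimensional vectors are illustrative assumptions, not real OSNet outputs.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # Treat an all-zero embedding as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def closest_identity(query, gallery):
    """Return (identity, distance) for the gallery embedding nearest to `query`."""
    scored = ((identity, cosine_distance(query, emb)) for identity, emb in gallery.items())
    return min(scored, key=lambda pair: pair[1])


# Toy gallery of previously seen people (real embeddings are much longer vectors).
gallery = {"person_a": [1.0, 0.0, 0.0], "person_b": [0.0, 1.0, 0.0]}
identity, distance = closest_identity([0.9, 0.1, 0.0], gallery)
```

A small distance suggests the new crop shows a person already in the gallery; a large distance to every entry suggests a new person.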

3. DeepSORT Tracker Initialization:

A DeepSort tracker is initialized to associate detections across frames using both motion (predicted box positions) and appearance embeddings.
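The tracker consumes detections in a left/top/width/height layout rather than the corner layout that the detector reports. As a minimal sketch, assuming the `([left, top, width, height], confidence, class)` tuple layout described in the deep_sort_realtime README, the conversion could look like this; `xyxy_to_deepsort` is a hypothetical helper name.

```python
def xyxy_to_deepsort(x1, y1, x2, y2, confidence, class_id):
    """Convert a corner-format box to ([left, top, width, height], confidence, class)."""
    width = x2 - x1
    height = y2 - y1
    if width <= 0 or height <= 0:
        raise ValueError("Box must have positive width and height")
    return ([x1, y1, width, height], float(confidence), class_id)


# Example: a 100x200 pixel box for class 0 ("Child" in this notebook).
detection = xyxy_to_deepsort(10, 20, 110, 220, 0.9, 0)
```

Carrying the class inside each detection keeps the class attached to the box it belongs to, instead of relying on a variable left over from an earlier loop.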

Index

1. Imports and Dependencies

Importing Libraries

cv2

cv2_imshow from Google Colab

DeepSort from deep_sort_realtime

YOLO from ultralytics

torch

torchreid

numpy

2. Model Initialization

Loading YOLOv8 Model

Loading Re-identification Model (OSNet)

Moving Re-identification Model to GPU

3. Tracker Initialization

Configuring DeepSORT Tracker
4. Unique ID Management

Initializing Counters for Unique IDs

Creating ID Map for Tracks

5. Video Processing Setup

Loading Video File

Setting Up Video Writer for Output

6. Main Processing Loop

Reading Frames from Video

YOLOv8 Object Detection

Extracting Bounding Boxes and Features
Extracting Appearance Features using Re-identification Model

Updating Tracker with Detections and Features

Drawing Bounding Boxes and IDs on Frames

Assigning Unique IDs to Tracks
Labeling Objects in Frames
Writing Processed Frame to Output Video

Displaying Frame in Notebook

7. Cleanup

Releasing Video Capture and Writer

Closing OpenCV Windows

In [1]:
!pip install ultralytics

Collecting ultralytics
  Downloading ultralytics-8.2.87-py3-none-any.whl.metadata (41 kB)
Collecting ultralytics-thop>=2.0.0 (from ultralytics)
  Downloading ultralytics_thop-2.0.6-py3-none-any.whl.metadata (9.1 kB)
Downloading ultralytics-8.2.87-py3-none-any.whl (872 kB)
Downloading ultralytics_thop-2.0.6-py3-none-any.whl (26 kB)
Installing collected packages: ultralytics-thop, ultralytics
Successfully installed ultralytics-8.2.87 ultralytics-thop-2.0.6


In [2]:
!pip install --upgrade ultralytics



In [3]:
!pip install deep-sort-realtime

Collecting deep-sort-realtime
  Downloading deep_sort_realtime-1.3.2-py3-none-any.whl.metadata (12 kB)
Downloading deep_sort_realtime-1.3.2-py3-none-any.whl (8.4 MB)
Installing collected packages: deep-sort-realtime
Successfully installed deep-sort-realtime-1.3.2


In [4]:
!pip install torchreid

Collecting torchreid
  Downloading torchreid-0.2.5.tar.gz (92 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: torchreid
  Building wheel for torchreid (setup.py) ... done
  Created wheel for torchreid: filename=torchreid-0.2.5-py3-none-any.whl size=144325 sha256=7ef32b44512e5646959fd05eb6d3d24c850bb7d2c57dd5127149d11e7ace0b5e
  Stored in directory: /root/.cache/pip/wheels/bb/2d/36/816a48465cefd3e58be0317648a4c52ce39ae817f935212099
Successfully built torchreid
Installing collected packages: torchreid
Successfully installed torchreid-0.2.5


In [5]:
!pip install opencv-python



In [None]:
import cv2
from deep_sort_realtime.deepsort_tracker import DeepSort
from ultralytics import YOLO
import torch
import torchreid
import numpy as np

# Load the YOLOv8 model
model = YOLO('/content/drive/MyDrive/terapistchildbestwright.pt')

# Load the re-identification model (OSNet)
reid_model = torchreid.models.build_model(name='osnet_x0_25', num_classes=1000, pretrained=True)
reid_model.eval()

# Move the re-identification model to GPU if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
reid_model.to(device)

# Initialize DeepSORT Tracker
tracker = DeepSort(max_age=30, nn_budget=70, max_iou_distance=0.7, n_init=3)

# Unique ID management for children and therapists
child_unique_id = 1
therapist_unique_id = 1
id_map = {}  # Maps track_id to (unique_id, class)

# Load the video
cap = cv2.VideoCapture('/content/drive/MyDrive/autism2.mp4')

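# Get video writer to save the output video, matching the source frame rate and size
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
fps = cap.get(cv2.CAP_PROP_FPS) or 20.0  # Fall back to 20 FPS if the source does not report one
frame_size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
out = cv2.VideoWriter('output_video.mp4', fourcc, fps, frame_size)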

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # YOLOv8 detection
    results = model(frame)
    detections = results[0].boxes

    # Convert detections to DeepSORT format
    boxes = []
    features = []
    frame_h, frame_w = frame.shape[:2]
    for box in detections:
        x1, y1, x2, y2 = box.xyxy[0].cpu().numpy().astype(int)
        # Clip to the frame so the crop below never uses out-of-range indices
        x1, y1 = max(x1, 0), max(y1, 0)
        x2, y2 = min(x2, frame_w), min(y2, frame_h)
        width = x2 - x1
        height = y2 - y1
        if width <= 0 or height <= 0:
            continue  # Skip degenerate boxes; cv2.resize fails on an empty crop
        conf = box.conf[0]
        cls = int(box.cls[0])
        if cls in [0, 1]:  # 0 = Child, 1 = Therapist in your trained model
            # Format for DeepSORT: [[x1, y1, width, height], confidence, class]
            # The class is stored with each detection so it can be looked up per track later
            boxes.append([[x1, y1, width, height], conf.item(), cls])

            # Extract appearance embeddings using the re-identification model
            crop = frame[y1:y2, x1:x2]
            crop = cv2.resize(crop, (128, 256))  # Resize as per the model's input size
            crop = torch.tensor(crop).permute(2, 0, 1).float().unsqueeze(0).to(device) / 255.0  # Prepare the image tensor
            with torch.no_grad():
                feature = reid_model(crop).cpu().numpy().flatten()  # Flatten to ensure proper shape
            features.append(feature)

    # Ensure boxes and features align
    assert len(boxes) == len(features), "Mismatch between number of boxes and features."

    # Update tracker with frame and appearance embeddings
    tracks = tracker.update_tracks(raw_detections=boxes, embeds=features, frame=frame)

    # Draw bounding boxes and IDs
    for track in tracks:
        if track.is_confirmed() and track.time_since_update <= 1:
            box = track.to_tlbr()  # Get bounding box in format [x1, y1, x2, y2]
            track_id = track.track_id  # Get track ID

            # Initialize unique_id for each track
            unique_id = None
            current_class = None  # Initialize current_class before use

            if track_id not in id_map:
                # Find the detection that contains this track's box and take its class
                for detection in boxes:
                    detection_box = detection[0]
                    if (detection_box[0] <= box[0] <= detection_box[0] + detection_box[2] and
                        detection_box[1] <= box[1] <= detection_box[1] + detection_box[3] and
                        detection_box[0] <= box[2] <= detection_box[0] + detection_box[2] and
                        detection_box[1] <= box[3] <= detection_box[1] + detection_box[3]):
                        # Use the matched detection's own class, not the stale `cls`
                        # left over from the last iteration of the detection loop
                        current_class = detection[2]
                        break

                # Assign unique IDs based on the detected class
                if current_class == 0:  # Child
                    unique_id = child_unique_id
                    id_map[track_id] = (child_unique_id, current_class)
                    child_unique_id += 1
                elif current_class == 1:  # Therapist
                    unique_id = therapist_unique_id
                    id_map[track_id] = (therapist_unique_id, current_class)
                    therapist_unique_id += 1
            else:
                unique_id, current_class = id_map[track_id]

            # Ensure unique_id is defined before using it
            if unique_id is not None and current_class is not None:
                # Draw bounding boxes and IDs
                label = f"ID {unique_id} {'Child' if current_class == 0 else 'Therapist'}"
                cv2.rectangle(frame, (int(box[0]), int(box[1])), (int(box[2]), int(box[3])), (0, 255, 0), 2)
                cv2.putText(frame, label, (int(box[0]), int(box[1])-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    # Write the frame to the output video
    out.write(frame)

# Release the video capture and writer
cap.release()
out.release()
cv2.destroyAllWindows()


Successfully loaded imagenet pretrained weights from "/root/.cache/torch/checkpoints/osnet_x0_25_imagenet.pth"

0: 384x640 2 Childs, 39.8ms
Speed: 3.1ms preprocess, 39.8ms inference, 1.2ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 Childs, 31.7ms
Speed: 3.0ms preprocess, 31.7ms inference, 1.3ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 Childs, 31.8ms
Speed: 3.3ms preprocess, 31.8ms inference, 1.2ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 Childs, 31.8ms
Speed: 3.3ms preprocess, 31.8ms inference, 1.2ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 Childs, 31.8ms
Speed: 3.4ms preprocess, 31.8ms inference, 1.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 Childs, 31.7ms
Speed: 3.2ms preprocess, 31.7ms inference, 1.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 Childs, 31.8ms
Speed: 3.2ms preprocess, 31.8ms inference, 1.8ms postprocess per image at shape (1, 3, 384, 640)

0: 384

Strategies for Improvement

1. Increase the Number of Training Epochs:

Training for more epochs (for example, up to 100) gives the model more opportunity to learn the features that distinguish children from therapists, which can improve detection accuracy and, in turn, make tracking more reliable.
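A longer training run could be configured roughly as follows. This is a minimal sketch, not the training setup actually used for these weights: `build_train_args` and `train` are hypothetical helpers, the dataset YAML path is a placeholder, and the image size and patience values are illustrative assumptions. `data`, `epochs`, `imgsz`, and `patience` are standard Ultralytics training arguments.

```python
def build_train_args(data_yaml, epochs=100, imgsz=640, patience=20):
    """Collect keyword arguments for an Ultralytics `model.train(...)` call."""
    if epochs < 1:
        raise ValueError("epochs must be at least 1")
    return {
        "data": data_yaml,     # Dataset description file (placeholder path)
        "epochs": epochs,      # Upper bound on training epochs
        "imgsz": imgsz,        # Training image size (assumed value)
        "patience": patience,  # Stop early after this many epochs without improvement
    }


def train(weights_path, data_yaml, **overrides):
    """Start training; ultralytics is imported lazily so the helper above has no dependencies."""
    from ultralytics import YOLO

    model = YOLO(weights_path)
    return model.train(**build_train_args(data_yaml, **overrides))


# Placeholder dataset file name, for illustration only.
train_args = build_train_args("child_therapist.yaml")
```

Setting `patience` alongside a higher epoch count lets training stop early if validation metrics plateau, so the 100-epoch figure acts as a ceiling rather than a fixed cost.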

2. Enhance Annotation Quality:

More accurate and consistent annotations (tight boxes, correct class labels, and the same labeling rules applied across all images) give the model a cleaner training signal and typically improve performance.
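Some annotation problems can be caught automatically before training. The sketch below checks a single line of a YOLO-format detection label file, where each line holds a class index followed by a normalized center x, center y, width, and height. `validate_label_line` is a hypothetical helper, and the two-class default reflects the Child/Therapist setup assumed in this notebook.

```python
def validate_label_line(line, num_classes=2):
    """Return a list of problems found in one YOLO-format label line (empty if none)."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        class_id = int(parts[0])
        coords = [float(value) for value in parts[1:]]
    except ValueError:
        return ["non-numeric field"]

    problems = []
    if not 0 <= class_id < num_classes:
        problems.append(f"class id {class_id} out of range")
    if any(not 0.0 <= value <= 1.0 for value in coords):
        problems.append("coordinates must be normalized to [0, 1]")
    if coords[2] <= 0 or coords[3] <= 0:
        problems.append("width and height must be positive")
    return problems


good = validate_label_line("0 0.5 0.5 0.2 0.4")
bad = validate_label_line("2 0.5 0.5 0.2 0.4")
```

Checks like this only catch formatting errors. Mislabeled classes and loose boxes still need a visual review of the annotated images.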

3. Use Segmentation Instead of Detection:

Adopting segmentation techniques can provide more precise class differentiation. Segmentation captures detailed object boundaries rather than rectangular boxes, which can improve the model's ability to distinguish between visually similar classes such as children and therapists.
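Switching to segmentation also changes the annotation format: each object is described by a polygon instead of a box. As a minimal sketch, assuming the Ultralytics segmentation label layout (class index followed by normalized x/y polygon vertices on one line), a pixel-space polygon could be converted like this. `polygon_to_yolo_seg` is a hypothetical helper, and the example polygon is made up.

```python
def polygon_to_yolo_seg(class_id, polygon, image_width, image_height):
    """Format a pixel-space polygon as one YOLO segmentation label line."""
    if len(polygon) < 3:
        raise ValueError("A polygon needs at least three vertices")
    values = []
    for x, y in polygon:
        # Normalize each vertex by the image size so values fall in [0, 1]
        values.append(f"{x / image_width:.6f}")
        values.append(f"{y / image_height:.6f}")
    return " ".join([str(class_id)] + values)


# Illustrative triangle in a 128x256 image, labeled as class 0.
label_line = polygon_to_yolo_seg(0, [(0, 0), (64, 0), (64, 128)], 128, 256)
```

Existing box annotations cannot be converted into meaningful polygons automatically, so this strategy implies re-annotating the dataset with outlines.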

6. Troubleshooting and Considerations

Common Issues

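Misidentification: A child is labeled as a therapist (or vice versa), usually because the detection model's accuracy is low.
Tracking Inconsistencies: IDs may switch or be reassigned as a result of misidentifications or overlapping detections.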
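When detections overlap, one way to reduce inconsistencies is to attach each track to the detection it overlaps most, measured by intersection over union (IoU), and take the class from that detection. The following is a minimal sketch of that idea, not the matching used inside DeepSORT; the function names and the 0.3 threshold are illustrative assumptions.

```python
def iou_ltrb(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def class_for_track(track_box, detections, min_iou=0.3):
    """Return the class of the detection that best overlaps `track_box`, or None.

    `detections` is a list of ((x1, y1, x2, y2), class_id) pairs.
    """
    best_class, best_iou = None, min_iou
    for det_box, class_id in detections:
        overlap = iou_ltrb(track_box, det_box)
        if overlap >= best_iou:
            best_class, best_iou = class_id, overlap
    return best_class


detections = [((0, 0, 100, 200), 0), ((150, 0, 250, 200), 1)]
matched_class = class_for_track((5, 5, 105, 195), detections)
```

Unlike a strict containment test, IoU still produces a match when the track box extends slightly beyond the detection box, which is common because track boxes are smoothed estimates.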

Suggested Fixes

Increase Training Epochs: As mentioned above, extending the training period can improve accuracy.

Improve Annotation Quality: Ensure high-quality and consistent labeling of training data.

Consider Segmentation: For more precise object class differentiation.