# YOLO Video Inference

This notebook demonstrates how to use our trained YOLO model to detect objects (bees and feeders) in videos. It will:
1. Run the trained YOLO model on each video
2. Save annotated videos showing the detections
3. Save the detection coordinates to text files

## Setup

First, let's import the required libraries and functions:

In [1]:
from video_inference import process_video, process_all_videos
from pathlib import Path
from IPython.display import Video, display

# Create necessary directories if they don't exist
for dir_path in ['videos/trimmed', 'videos/output']:
    Path(dir_path).mkdir(parents=True, exist_ok=True)

## Configuration

Set up paths for input videos, output directory, and model weights:

In [2]:
# Paths
INPUT_FOLDER = "videos/trimmed"
OUTPUT_FOLDER = "videos/output"
MODEL_PATH = "runs/detect/train7/weights/best.pt"  # Your trained model

# Check available videos
input_videos = list(Path(INPUT_FOLDER).glob("*_trimmed.mp4"))
print(f"Found {len(input_videos)} videos in {INPUT_FOLDER}:")
for video in input_videos:
    print(f"- {video.name}")

Found 17 videos in videos/trimmed:
- 240702_rPi1_5_240702_CTL_rPi1_video_trimmed.mp4
- 240702_rPi2_12_240702_CTL_rPi2_video_trimmed.mp4
- 240702_rPi3_3_240207_CTL_rPi3_video_trimmed.mp4
- 240702_rPi4_1_240702_FLU1_rPi4_video_trimmed.mp4
- 240703_rPi5_13_240207_FLU1_rPi5_video_trimmed.mp4
- 240808_rPi6_43_240807_SFX50_rPi6_video_trimmed.mp4
- 240828_rPi7_44_240827_SFX1_rPi7_video_trimmed.mp4
- 240828_rPi8_14_240827_SFX1_rPi8_video_trimmed.mp4
- 240903_rPi9_43_240902_SFX1TMX1_rPi9_video_trimmed.mp4
- 240925_rPi10_47_240924_FLU50_rPi10_video_trimmed.mp4
- 241004_rPi11_40_241003_FLU10SFX10_rPi11_video_trimmed.mp4
- 241004_rPi16_5_241003_FLU10SFX10_rPi16_video_trimmed.mp4
- 241009_rPi13_6_241009_TBD_rPi13_video_trimmed.mp4
- 241010_rPi12_46_241009_TBD_rPi12_video_trimmed.mp4
- 241010_rPi13_15_241009_TBD_rPi13_video_trimmed.mp4
- 241010_rPi15_12_241009_TBD_rPi15_video_trimmed.mp4
- 241010_rPi17_9_241009_TBD_rPi17_video_trimmed.mp4


## Process Single Video

Let's process a single video to test the detection:

In [3]:
if input_videos:  # Only run if we found videos
    # Process the first video as a test
    test_video = input_videos[0]
    output_name = test_video.stem.replace("_trimmed", "_detected")
    output_video_path = Path(OUTPUT_FOLDER) / f"{output_name}.mp4"
    output_coords_path = Path(OUTPUT_FOLDER) / f"{output_name}.txt"
    
    print(f"Processing video: {test_video.name}")
    process_video(test_video, MODEL_PATH, output_video_path, output_coords_path)
    
    # Display the processed video
    if output_video_path.exists():
        print("\nProcessed video:")
        display(Video(str(output_video_path)))
else:
    print("No videos found to process!")

Processing video: 240702_rPi1_5_240702_CTL_rPi1_video_trimmed.mp4

0: 384x640 4 bees, 1 feeder, 237.2ms
Speed: 9.8ms preprocess, 237.2ms inference, 8.3ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 4 bees, 1 feeder, 545.1ms
Speed: 9.2ms preprocess, 545.1ms inference, 2.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 4 bees, 1 feeder, 281.2ms
Speed: 4.6ms preprocess, 281.2ms inference, 11.1ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 4 bees, 1 feeder, 363.4ms
Speed: 7.0ms preprocess, 363.4ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 4 bees, 1 feeder, 268.4ms
Speed: 5.8ms preprocess, 268.4ms inference, 0.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 4 bees, 1 feeder, 231.2ms
Speed: 4.7ms preprocess, 231.2ms inference, 0.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 4 bees, 1 feeder, 225.4ms
Speed: 6.6ms preprocess, 225.4ms inference, 1.0ms postprocess per image at shape (1, 3, 384, 640)
...

## Examine Detection Results

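Each row of the coordinates file is one detection: frame number, class ID, bounding box (center_x, center_y, width, height) and confidence score. The box values all lie between 0 and 1, so they appear to be normalized by the frame size. Let's load the file and look at the first few detections: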
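
To draw or measure boxes in pixels, the normalized values have to be scaled back by the frame size. The sketch below assumes the columns are normalized center/size values (as the 0 to 1 range suggests) and that you know the resolution they were normalized against, presumably the original video frames rather than the 384x640 inference size. `to_pixel_corners` is a hypothetical helper, not part of `video_inference`.

```python
import pandas as pd


def to_pixel_corners(df: pd.DataFrame, frame_w: int, frame_h: int) -> pd.DataFrame:
    """Return a copy of df with pixel corner columns x1, y1, x2, y2 added.

    Assumes center_x, center_y, width and height are normalized to 0..1
    by the frame size; frame_w and frame_h are that frame size in pixels.
    """
    out = df.copy()
    cx = out["center_x"] * frame_w
    cy = out["center_y"] * frame_h
    half_w = out["width"] * frame_w / 2
    half_h = out["height"] * frame_h / 2
    out["x1"] = cx - half_w
    out["y1"] = cy - half_h
    out["x2"] = cx + half_w
    out["y2"] = cy + half_h
    return out


# Made-up box centred in a hypothetical 1280x720 frame
demo = pd.DataFrame({"center_x": [0.5], "center_y": [0.5], "width": [0.25], "height": [0.5]})
print(to_pixel_corners(demo, 1280, 720)[["x1", "y1", "x2", "y2"]])
```

For the real data, the frame size could be read from the input video with OpenCV, e.g. `cv2.VideoCapture(str(test_video)).get(cv2.CAP_PROP_FRAME_WIDTH)` and `CAP_PROP_FRAME_HEIGHT`.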

In [4]:
import pandas as pd

if 'output_coords_path' in locals() and output_coords_path.exists():
    # Read the coordinates file
    coords_df = pd.read_csv(output_coords_path, sep=' ')
    
    print("First few detections:")
    display(coords_df.head())
    
    print(f"\nTotal detections: {len(coords_df)}")
    
    # Basic statistics
    print("\nConfidence score statistics:")
    display(coords_df['confidence'].describe())
else:
    print("No coordinate file found!")

First few detections:


| | frame_number | class_id | center_x | center_y | width | height | confidence |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0.186822 | 0.428398 | 0.125749 | 0.198829 | 0.893969 |
| 1 | 0 | 1 | 0.20302 | 0.179799 | 0.15563 | 0.246552 | 0.882184 |
| 2 | 0 | 0 | 0.378544 | 0.183905 | 0.092463 | 0.154101 | 0.8579 |
| 3 | 0 | 0 | 0.170585 | 0.285759 | 0.066319 | 0.14126 | 0.774312 |
| 4 | 0 | 0 | 0.217195 | 0.301937 | 0.064496 | 0.124534 | 0.715545 |



Total detections: 4315

Confidence score statistics:


count    4315.000000
mean        0.851808
std         0.083806
min         0.500139
25%         0.837890
50%         0.880155
75%         0.902679
max         0.956824
Name: confidence, dtype: float64
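
The log for the first frame reports "4 bees, 1 feeder", and the five frame-0 rows above contain four class_id 0 and one class_id 1, which suggests 0 = bee and 1 = feeder (an inference; confirm with the model's `names` attribute). The minimum confidence of about 0.50 also hints that detections were already filtered at 0.5 during inference, though that is an assumption about `video_inference`. The hypothetical helper below counts detections per class in each frame, optionally applying a stricter confidence cut.

```python
import pandas as pd

# ASSUMED mapping, inferred from frame 0 matching the "4 bees, 1 feeder" log line
CLASS_NAMES = {0: "bee", 1: "feeder"}


def counts_per_frame(df: pd.DataFrame, min_conf: float = 0.0) -> pd.DataFrame:
    """Count detections of each class per frame, keeping rows with confidence >= min_conf.

    A class missing from a frame that has other detections gets 0;
    frames with no kept detections at all do not appear in the result.
    """
    kept = df[df["confidence"] >= min_conf]
    return (
        kept.groupby(["frame_number", "class_id"])
        .size()
        .unstack(fill_value=0)
        .rename(columns=CLASS_NAMES)
    )


# Small made-up example: two frames, one low-confidence bee in frame 0
demo_dets = pd.DataFrame({
    "frame_number": [0, 0, 0, 1, 1],
    "class_id": [0, 1, 0, 0, 1],
    "confidence": [0.9, 0.88, 0.6, 0.85, 0.95],
})
print(counts_per_frame(demo_dets, min_conf=0.7))
```

On the notebook data, `counts_per_frame(coords_df)["bee"].describe()` would summarize how many bees are detected per frame.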

## Process All Videos

Now let's process all videos in the input folder:

In [None]:
# Process all videos
print(f"Processing all videos in {INPUT_FOLDER}...\n")
process_all_videos(INPUT_FOLDER, OUTPUT_FOLDER, MODEL_PATH)

# List processed videos
processed_videos = list(Path(OUTPUT_FOLDER).glob("*_detected.mp4"))
print(f"\nProcessed {len(processed_videos)} videos:")
for video in processed_videos:
    print(f"- {video.name}")
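
The body of `process_all_videos` is not shown here. Judging from the single-video cell, an equivalent loop pairs each `*_trimmed.mp4` with `*_detected.mp4` and `*_detected.txt` outputs and calls `process_video(video, model_path, output_video, output_coords)`. Since inference took roughly 225 to 545 ms per frame in the log above, a long batch can be worth resuming rather than rerunning. The sketch below is a hypothetical reimplementation of the planning step that also skips videos whose outputs already exist; `plan_jobs` and `pending_jobs` are illustrative names, not part of `video_inference`.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

# (input video, output video, output coordinates file)
Job = Tuple[Path, Path, Path]


def plan_jobs(videos: Iterable[Path], output_folder: Path) -> List[Job]:
    """Pair each *_trimmed.mp4 with its *_detected.mp4/.txt outputs,
    using the same naming rule as the single-video cell."""
    jobs = []
    for video in sorted(videos):
        name = video.stem.replace("_trimmed", "_detected")
        jobs.append((video, output_folder / f"{name}.mp4", output_folder / f"{name}.txt"))
    return jobs


def pending_jobs(jobs: List[Job], exists: Callable[[Path], bool] = Path.exists) -> List[Job]:
    """Keep only jobs whose output video or coordinate file is still missing."""
    return [job for job in jobs if not (exists(job[1]) and exists(job[2]))]


# Usage in the notebook (process_video signature as called in the single-video cell):
# for video, out_video, out_coords in pending_jobs(plan_jobs(input_videos, Path(OUTPUT_FOLDER))):
#     process_video(video, MODEL_PATH, out_video, out_coords)
```

Passing `exists` as a parameter keeps the planning logic independent of the file system, so it can be checked without touching real videos.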

## Summary Statistics

Let's compile statistics across all processed videos:

In [7]:
# Compile statistics from all coordinate files
coord_files = list(Path(OUTPUT_FOLDER).glob("*_detected.txt"))
all_detections = []

for coord_file in coord_files:
    df = pd.read_csv(coord_file, sep=' ')
    df['video'] = coord_file.stem
    all_detections.append(df)

if all_detections:
    combined_df = pd.concat(all_detections, ignore_index=True)
    
    # Per-video statistics ('count' on frame_number counts rows, i.e. detections per video)
    print("Detection statistics by video:")
    video_stats = combined_df.groupby('video').agg({
        'frame_number': 'count',
        'confidence': ['mean', 'std', 'min', 'max']
    }).round(3)
    display(video_stats)
    
    # Overall statistics
    print("\nOverall statistics across all videos:")
    overall_stats = {
        'Total Detections': len(combined_df),
        'Mean Confidence': combined_df['confidence'].mean().round(3),
        'Std Confidence': combined_df['confidence'].std().round(3),
        'Min Confidence': combined_df['confidence'].min().round(3),
        'Max Confidence': combined_df['confidence'].max().round(3)
    }
    display(pd.Series(overall_stats))
else:
    print("No detection files found!")

Detection statistics by video:


| video | frame_number (count) | confidence (mean) | confidence (std) | confidence (min) | confidence (max) |
|---|---|---|---|---|---|
| 240702_rPi1_5_240702_CTL_rPi1_video_detected | 4315 | 0.852 | 0.084 | 0.5 | 0.957 |
| 240702_rPi2_12_240702_CTL_rPi2_video_detected | 4501 | 0.873 | 0.045 | 0.518 | 0.947 |
| 240702_rPi3_3_240207_CTL_rPi3_video_detected | 3758 | 0.821 | 0.094 | 0.501 | 0.95 |
| 240702_rPi4_1_240702_FLU1_rPi4_video_detected | 5402 | 0.892 | 0.049 | 0.602 | 0.967 |
| 240703_rPi5_13_240207_FLU1_rPi5_video_detected | 5400 | 0.886 | 0.033 | 0.696 | 0.956 |
| 240808_rPi6_43_240807_SFX50_rPi6_video_detected | 5239 | 0.885 | 0.054 | 0.501 | 0.968 |
| 240828_rPi7_44_240827_SFX1_rPi7_video_detected | 5101 | 0.841 | 0.086 | 0.503 | 0.949 |
| 240828_rPi8_14_240827_SFX1_rPi8_video_detected | 6065 | 0.813 | 0.139 | 0.5 | 0.96 |
| 240903_rPi9_43_240902_SFX1TMX1_rPi9_video_detected | 4580 | 0.811 | 0.092 | 0.5 | 0.961 |
| 240925_rPi10_47_240924_FLU50_rPi10_video_detected | 4710 | 0.855 | 0.081 | 0.502 | 0.96 |



Overall statistics across all videos:


Total Detections    80875.000
Mean Confidence         0.855
Std Confidence          0.087
Min Confidence          0.500
Max Confidence          0.968
dtype: float64
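
The video names appear to encode the recording setup, with a token such as CTL, FLU1, SFX50 or TBD in the fifth underscore-separated position. Assuming that token is the experimental condition (the notebook does not say so explicitly), the sketch below groups the combined detections by it. `condition_from_name` and `stats_by_condition` are hypothetical helpers that expect the `video` and `confidence` columns built in the summary cell.

```python
import pandas as pd


def condition_from_name(video_name: str) -> str:
    """Return the 5th underscore-separated token of a video name.

    ASSUMPTION: in names like '240702_rPi1_5_240702_CTL_rPi1_video_detected'
    this token ('CTL', 'FLU1', 'SFX50', ...) is the experimental condition.
    """
    parts = video_name.split("_")
    return parts[4] if len(parts) > 4 else "unknown"


def stats_by_condition(combined: pd.DataFrame) -> pd.DataFrame:
    """Videos, detections and confidence summary per assumed condition."""
    return (
        combined.assign(condition=combined["video"].map(condition_from_name))
        .groupby("condition")
        .agg(
            videos=("video", "nunique"),
            detections=("confidence", "count"),  # non-null confidences = detections
            mean_conf=("confidence", "mean"),
            std_conf=("confidence", "std"),
        )
        .round(3)
    )


# Small made-up example with two conditions
demo_combined = pd.DataFrame({
    "video": ["240702_rPi1_5_240702_CTL_rPi1_video_detected"] * 2
    + ["240702_rPi4_1_240702_FLU1_rPi4_video_detected"],
    "confidence": [0.8, 0.9, 0.7],
})
print(stats_by_condition(demo_combined))
```

In the notebook, `stats_by_condition(combined_df)` would produce the same table for all processed videos. A condition with a single detection gets NaN for `std_conf`.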