# Player Tracking: Dataset Preparation

**SC549: Neural Networks - Programming Assignment 03**

In this notebook, we'll:
1. Download sports videos from YouTube
2. Extract frames from videos
3. Organize our dataset
4. Perform basic video analysis

---

## 🎯 Learning Objectives
- Understand video data structure (frames, FPS, resolution)
- Learn to process videos with OpenCV
- Prepare data for computer vision models
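
To make these terms concrete, the sketch below relates frame count, FPS, and resolution for a hypothetical 10-second 1280x720 clip at 30 FPS. The helper names and example numbers are illustrative assumptions, and the size estimate assumes uncompressed 8-bit, 3-channel frames.

```python
def frame_count_for(duration_s: float, fps: float) -> int:
    """Approximate number of frames in a clip of the given length."""
    return round(duration_s * fps)


def decoded_size_bytes(width: int, height: int, n_frames: int, channels: int = 3) -> int:
    """Memory needed to hold every frame uncompressed, at 1 byte per channel."""
    return width * height * channels * n_frames


# Hypothetical clip: 10 s at 30 FPS, 1280x720
n_frames = frame_count_for(10, 30)
size_mib = decoded_size_bytes(1280, 720, n_frames) / 2**20
print(f"{n_frames} frames, about {size_mib:.0f} MiB uncompressed")
# prints: 300 frames, about 791 MiB uncompressed
```

Numbers of this size are one reason to keep clips short and to store frames as compressed images rather than holding a whole video in memory.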

## 1. Import Libraries

In [None]:
import os
import cv2
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
from tqdm import tqdm
import pandas as pd

# For downloading videos (optional)
try:
    import yt_dlp
    YOUTUBE_AVAILABLE = True
except ImportError:
    print("yt-dlp not installed. You'll need to download videos manually.")
    YOUTUBE_AVAILABLE = False

print("✅ Libraries imported successfully!")

## 2. Setup Paths

In [None]:
# Project directory structure
PROJECT_ROOT = Path('../')
DATA_DIR = PROJECT_ROOT / 'data'
VIDEOS_DIR = DATA_DIR / 'videos'
FRAMES_DIR = DATA_DIR / 'frames'
OUTPUTS_DIR = PROJECT_ROOT / 'outputs'

# Create directories if they don't exist
VIDEOS_DIR.mkdir(parents=True, exist_ok=True)
FRAMES_DIR.mkdir(parents=True, exist_ok=True)
OUTPUTS_DIR.mkdir(parents=True, exist_ok=True)

print(f"📁 Videos directory: {VIDEOS_DIR}")
print(f"📁 Frames directory: {FRAMES_DIR}")
print(f"📁 Outputs directory: {OUTPUTS_DIR}")
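
Because `Path('../')` is relative, the directories are created relative to the notebook's current working directory. The following sketch builds the same layout under an explicit root, using a temporary directory as a stand-in; `make_project_dirs` is a hypothetical helper, not part of the notebook.

```python
import tempfile
from pathlib import Path


def make_project_dirs(root: Path) -> dict:
    """Create the data/videos, data/frames and outputs folders under root."""
    dirs = {
        "videos": root / "data" / "videos",
        "frames": root / "data" / "frames",
        "outputs": root / "outputs",
    }
    for path in dirs.values():
        # parents=True creates missing parents; exist_ok=True makes reruns safe
        path.mkdir(parents=True, exist_ok=True)
    return dirs


with tempfile.TemporaryDirectory() as tmp:
    dirs = make_project_dirs(Path(tmp))
    dirs_again = make_project_dirs(Path(tmp))  # second call changes nothing
    created = sorted(p.relative_to(tmp).as_posix() for p in dirs.values())
    print(created)
    # prints: ['data/frames', 'data/videos', 'outputs']
```

Passing an absolute root (for example one obtained with `Path.cwd().parent.resolve()`) would make the printed paths independent of where the notebook is launched from.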

## 3. Download Videos from YouTube (Optional)

**Note**: You can skip this section and manually download videos to `data/videos/`

### Recommended Video Sources:
- Search for: "football highlights 10 seconds"
- Search for: "football match short clip"
- Search for: "rugby tackle slow motion"
- Search for: "basketball dunk replay"

### Manual Download Instructions:
1. Go to YouTube
2. Find short sports clips (5-10 seconds)
3. Use online tools like `y2mate.com` or `savefrom.net`
4. Save as MP4 to `data/videos/` folder
5. Rename files: `football_1.mp4`, `football_2.mp4`, etc.

In [None]:
def download_youtube_video(url, output_path, video_name):
    """
    Download a video from YouTube.
    
    Args:
        url (str): YouTube video URL
        output_path (Path): Directory to save video
        video_name (str): Name for the output file (without extension)
    """
    if not YOUTUBE_AVAILABLE:
        print("❌ yt-dlp not installed. Install with: pip install yt-dlp")
        return False
    
    ydl_opts = {
        'format': 'best[ext=mp4]',  # Download best quality MP4
        'outtmpl': str(output_path / f'{video_name}.%(ext)s'),
        'quiet': False,
    }
    
    try:
        with yt_dlp.YoutubeDL(ydl_opts) as ydl:
            ydl.download([url])
        print(f"✅ Downloaded: {video_name}")
        return True
    except Exception as e:
        print(f"❌ Error downloading {video_name}: {str(e)}")
        return False

# Example: Uncomment and add your video URLs
# video_urls = [
#     ("https://youtube.com/watch?v=...", "football_1"),
#     ("https://youtube.com/watch?v=...", "football_2"),
# ]

# for url, name in video_urls:
#     download_youtube_video(url, VIDEOS_DIR, name)

print("💡 TIP: Add your YouTube URLs above or download videos manually!")
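
One possible refinement of the options dictionary is a format selector with fallbacks, so that a download can still proceed when the preferred MP4 variant is not offered. The sketch below only builds the dictionary and does not call yt-dlp; `build_ydl_opts` and the `max_height` cap are assumptions for illustration.

```python
from pathlib import Path


def build_ydl_opts(output_dir: Path, video_name: str, max_height: int = 720) -> dict:
    """Build a yt-dlp options dict; selectors separated by '/' are tried in order."""
    fmt = f"best[ext=mp4][height<={max_height}]/best[ext=mp4]/best"
    return {
        "format": fmt,
        "outtmpl": str(output_dir / f"{video_name}.%(ext)s"),
        "noplaylist": True,  # download only the single video, not a whole playlist
        "quiet": True,
    }


opts = build_ydl_opts(Path("data/videos"), "football_1")
print(opts["format"])
# prints: best[ext=mp4][height<=720]/best[ext=mp4]/best
```

The last fallback (`best`) can yield a non-MP4 file, which a scan limited to `*.mp4` and `*.avi` would not pick up, so it may be preferable to drop it and handle the failure instead.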

## 4. Analyze Videos

Let's check what videos we have and their properties.

In [None]:
def get_video_info(video_path):
    """
    Extract information from a video file.
    
    Args:
        video_path (Path): Path to video file
    
    Returns:
        dict: Video properties (fps, frames, duration, resolution)
    """
    cap = cv2.VideoCapture(str(video_path))
    
    # Get video properties (FPS stays a float: rates such as 29.97 are common)
    fps = cap.get(cv2.CAP_PROP_FPS)
    frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    duration = frame_count / fps if fps > 0 else 0
    
    cap.release()
    
    return {
        'filename': video_path.name,
        'fps': round(fps, 2),
        'frames': frame_count,
        'duration': round(duration, 2),
        'width': width,
        'height': height,
        'resolution': f"{width}x{height}"
    }

# Scan all videos in the directory
video_files = list(VIDEOS_DIR.glob('*.mp4')) + list(VIDEOS_DIR.glob('*.avi'))

if len(video_files) == 0:
    print("⚠️  No videos found in data/videos/")
    print("📥 Please add 5-10 sports videos (MP4 or AVI format)")
else:
    # Analyze each video
    video_info_list = []
    for video_path in video_files:
        info = get_video_info(video_path)
        video_info_list.append(info)
    
    # Create a pandas DataFrame for nice display
    df_videos = pd.DataFrame(video_info_list)
    print(f"\n📊 Found {len(video_files)} video(s):\n")
    print(df_videos.to_string(index=False))
    
    # Check if videos meet requirements
    print("\n✅ Checking requirements:")
    valid_count = 0
    for info in video_info_list:
        if 5 <= info['duration'] <= 10:
            valid_count += 1
            print(f"  ✓ {info['filename']}: {info['duration']}s (valid)")
        else:
            print(f"  ✗ {info['filename']}: {info['duration']}s (should be 5-10s)")
    
    print(f"\n{valid_count}/{len(video_files)} videos meet the 5-10 second requirement")
    if valid_count >= 5:
        print("✅ Dataset requirement satisfied!")
    else:
        print(f"⚠️  Need at least {5 - valid_count} more valid videos")
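
Frame rates are frequently fractional (broadcast-style video often uses 30000/1001, about 29.97 FPS), so how FPS is rounded affects the duration check. The sketch below uses a hypothetical 295-frame clip to show that truncating FPS to an integer can push a valid clip outside the 5-10 second window; `clip_duration` and `is_valid` are illustrative helpers.

```python
def clip_duration(frame_count: int, fps: float) -> float:
    """Duration in seconds, or 0.0 when the FPS is unknown."""
    return frame_count / fps if fps > 0 else 0.0


def is_valid(duration_s: float, min_s: float = 5, max_s: float = 10) -> bool:
    """Same 5-10 second window used in the requirement check."""
    return min_s <= duration_s <= max_s


ntsc_fps = 30000 / 1001  # about 29.97
frames = 295             # hypothetical clip length in frames

exact = clip_duration(frames, ntsc_fps)
truncated = clip_duration(frames, int(ntsc_fps))  # int() drops FPS to 29

print(f"exact: {exact:.2f}s valid={is_valid(exact)}")
print(f"truncated FPS: {truncated:.2f}s valid={is_valid(truncated)}")
# prints: exact: 9.84s valid=True
# prints: truncated FPS: 10.17s valid=False
```

Keeping FPS as a float and rounding only for display avoids this kind of borderline misclassification.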

## 5. Display Sample Frames

Let's visualize the first frame of each video to understand our dataset.

In [None]:
def show_sample_frames(video_files, num_videos=5):
    """
    Display the first frame from each video.
    
    Args:
        video_files (list): List of video file paths
        num_videos (int): Maximum number of videos to display
    """
    num_videos = min(len(video_files), num_videos)
    
    fig, axes = plt.subplots(1, num_videos, figsize=(15, 3))
    if num_videos == 1:
        axes = [axes]
    
    for i, video_path in enumerate(video_files[:num_videos]):
        # Read first frame
        cap = cv2.VideoCapture(str(video_path))
        ret, frame = cap.read()
        cap.release()
        
        if ret:
            # Convert BGR to RGB for matplotlib
            frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            
            # Display
            axes[i].imshow(frame_rgb)
            axes[i].set_title(video_path.name, fontsize=10)
            axes[i].axis('off')
    
    plt.tight_layout()
    plt.savefig(OUTPUTS_DIR / 'screenshots' / 'sample_frames.png', dpi=150, bbox_inches='tight')
    plt.show()
    print("💾 Saved: outputs/screenshots/sample_frames.png")

if len(video_files) > 0:
    # Create screenshots directory
    (OUTPUTS_DIR / 'screenshots').mkdir(parents=True, exist_ok=True)
    show_sample_frames(video_files)
else:
    print("⚠️  No videos to display")
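
The first frame of a clip is not always representative (it may be a fade-in or a title card). One alternative, sketched here as an assumption rather than a requirement of the assignment, is to preview a frame from part-way through the clip. `preview_frame_index` is a hypothetical helper; the index it returns could be passed to `cap.set(cv2.CAP_PROP_POS_FRAMES, index)` before calling `cap.read()`.

```python
def preview_frame_index(frame_count: int, fraction: float = 0.5) -> int:
    """Pick a frame index a given fraction of the way through the clip."""
    if frame_count <= 0:
        return 0
    fraction = min(max(fraction, 0.0), 1.0)  # clamp to [0, 1]
    # Never return an index past the last frame
    return min(int(frame_count * fraction), frame_count - 1)


for count in (0, 1, 300):
    print(count, preview_frame_index(count))
# prints: 0 0
# prints: 1 0
# prints: 300 150
```

Seeking by frame index is not exact for every codec, so the frame actually returned may differ slightly from the requested one.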

## 6. Extract Frames from Videos

We'll extract frames to work with individual images.

**Why extract frames?**
- Easier to process individual images
- Can apply detection frame-by-frame
- Helpful for debugging and visualization

In [None]:
def extract_frames(video_path, output_dir, sample_rate=1):
    """
    Extract frames from a video file.
    
    Args:
        video_path (Path): Path to video file
        output_dir (Path): Directory to save frames
        sample_rate (int): Extract every Nth frame (1 = all frames)
    
    Returns:
        int: Number of frames extracted
    """
    # Create output directory for this video
    video_name = video_path.stem  # Filename without extension
    frames_output = output_dir / video_name
    frames_output.mkdir(parents=True, exist_ok=True)
    
    # Open video
    cap = cv2.VideoCapture(str(video_path))
    frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    
    print(f"Processing: {video_path.name}")
    
    extracted = 0
    frame_idx = 0
    
    # Progress bar
    pbar = tqdm(total=frame_count, desc="Extracting frames")
    
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        
        # Save every Nth frame
        if frame_idx % sample_rate == 0:
            frame_filename = frames_output / f"frame_{frame_idx:06d}.jpg"
            cv2.imwrite(str(frame_filename), frame)
            extracted += 1
        
        frame_idx += 1
        pbar.update(1)
    
    cap.release()
    pbar.close()
    
    print(f"  ✅ Extracted {extracted} frames to {frames_output}")
    return extracted

# Extract frames from all videos
if len(video_files) > 0:
    print("🎬 Extracting frames from all videos...\n")
    
    total_frames = 0
    for video_path in video_files:
        num_frames = extract_frames(video_path, FRAMES_DIR, sample_rate=1)
        total_frames += num_frames
        print()
    
    print(f"\n✅ Total frames extracted: {total_frames}")
    print(f"📁 Frames saved in: {FRAMES_DIR}")
else:
    print("⚠️  No videos to process")
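
The `sample_rate` argument trades dataset size for temporal detail. As a rough sketch, the helpers below (hypothetical names, not part of the notebook) list which frame indices would be saved for a given `sample_rate` and estimate a `sample_rate` that approximates a target frame rate.

```python
def sampled_indices(frame_count: int, sample_rate: int) -> list:
    """Indices kept when saving every sample_rate-th frame, starting at 0."""
    if sample_rate < 1:
        raise ValueError("sample_rate must be >= 1")
    return list(range(0, frame_count, sample_rate))


def sample_rate_for(source_fps: float, target_fps: float) -> int:
    """Step between saved frames that roughly yields target_fps."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    return max(1, round(source_fps / target_fps))


kept = sampled_indices(10, 3)
names = [f"frame_{i:06d}.jpg" for i in kept]  # same naming pattern as extract_frames
print(kept)
print(names[-1])
print(sample_rate_for(30, 5))
# prints: [0, 3, 6, 9]
# prints: frame_000009.jpg
# prints: 6
```

Because the file name carries the original frame index, sampled frames can still be mapped back to their position in the video.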

## 7. Dataset Summary

In [None]:
# Count total frames extracted
total_frames = sum(len(list(d.glob('*.jpg'))) for d in FRAMES_DIR.iterdir() if d.is_dir())

print("📊 Dataset Summary")
print("=" * 50)
print(f"Total videos: {len(video_files)}")
print(f"Total frames: {total_frames}")
print(f"Frames location: {FRAMES_DIR}")
print(f"Outputs location: {OUTPUTS_DIR}")
print("=" * 50)

if len(video_files) >= 5:
    print("\n✅ Dataset ready for model training!")
    print("\n📌 Next step: Open notebook 02_player_detection.ipynb")
else:
    print(f"\n⚠️  Need {5 - len(video_files)} more videos to meet requirements")
    print("Please add more sports videos to data/videos/")
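
A per-video breakdown can be more informative than a single total. The sketch below counts `.jpg` files in each sub-folder of a frames directory; it runs against a temporary directory with empty placeholder files, and `frames_per_video` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def frames_per_video(frames_dir: Path) -> dict:
    """Map each video sub-folder name to the number of .jpg frames inside it."""
    return {
        d.name: sum(1 for _ in d.glob("*.jpg"))
        for d in sorted(frames_dir.iterdir())
        if d.is_dir()
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Stand-in data: two fake videos with 3 and 2 empty "frames"
    for name, n in (("football_1", 3), ("football_2", 2)):
        (root / name).mkdir()
        for i in range(n):
            (root / name / f"frame_{i:06d}.jpg").touch()
    counts = frames_per_video(root)
    print(counts)
    # prints: {'football_1': 3, 'football_2': 2}
```

In the notebook, the same function could be called as `frames_per_video(FRAMES_DIR)`; a folder with zero frames would suggest a video that failed to decode.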

## 🎓 Key Takeaways

1. **Video Structure**: Videos are sequences of frames (images) displayed at a specific FPS
2. **Frame Extraction**: Converting videos to images makes processing easier
3. **Dataset Organization**: Proper structure helps with reproducibility
4. **Metadata**: Always check video properties (resolution, FPS, duration)

---

## ✅ Checklist
- [ ] Downloaded/collected 5-10 sports videos
- [ ] Each video is 5-10 seconds long
- [ ] Videos are in MP4 or AVI format
- [ ] Frames extracted successfully
- [ ] Dataset organized in proper directories

---

**Next**: Notebook 02 - Player Detection with YOLOv8