# Autoinjector Pose Estimation Demo

This notebook demonstrates real-time pose estimation for autoinjectors using a custom-trained YOLO pose model.

## Overview

The demo pipeline performs the following tasks:
1. **Object Detection**: Detects autoinjectors in the frame (capped/uncapped classes)
2. **Pose Estimation**: Estimates keypoint locations (front tip and back base)
3. **Temporal Smoothing**: Applies EMA filtering for stable keypoint tracking
4. **Visualization**: Renders bounding boxes, keypoints, and orientation information

## Key Features

### Advanced Technical Features
- **EMA Smoothing**: Exponential Moving Average filter reduces keypoint jitter for stable tracking
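- **Model Warmup**: Runs one dummy inference at startup so one-time initialization cost does not land on the first camera frame
- **Half-Precision Inference**: FP16 is enabled automatically when a CUDA device is requested (up to roughly 2x faster on GPUs with fast FP16 support)
- **FPS Throttling**: Caps the loop at `max_fps` so CPU/GPU load stays predictable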
- **Orientation Calculation**: Real-time angle and length computation from keypoints
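
As a rough illustration of the EMA smoothing listed above, the following standalone sketch applies the update rule used in this notebook (`smoothed = alpha * current + (1 - alpha) * previous`, with `alpha = 0.4`) to a short, made-up sequence of keypoint positions. The function name `ema_step` and the sample coordinates are assumptions for illustration and are not part of the demo class.

```python
def ema_step(prev, cur, alpha=0.4):
    """Return the EMA of a 2D point; pass prev=None on the first frame."""
    if prev is None:
        return cur  # Nothing to blend with yet
    return tuple(alpha * c + (1 - alpha) * p for c, p in zip(cur, prev))


# Hypothetical noisy (x, y) readings for one keypoint over four frames
readings = [(100.0, 50.0), (110.0, 50.0), (100.0, 50.0), (110.0, 50.0)]

state = None
smoothed = []
for point in readings:
    state = ema_step(state, point)
    smoothed.append(state)

for raw, s in zip(readings, smoothed):
    print(f"raw={raw} smoothed=({s[0]:.2f}, {s[1]:.2f})")
```

In this toy sequence the raw x-coordinate jumps by 10 px every frame, while the smoothed value moves by 4 px or less. A lower `alpha` damps the jitter further at the cost of more lag behind real motion.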

### Production-Ready Components
- Robust error handling for camera disconnections
- Graceful degradation when detections are unavailable
- Contiguous-array conversion of each frame before inference and drawing
- OS-specific camera backend optimizations

## Technical Architecture

### Class Schema
- **Classes**: 
  - `0` = capped (autoinjector with cap)
  - `1` = uncapped (autoinjector without cap)
- **Keypoints**:
  - `0` = front tip of autoinjector
  - `1` = back base of autoinjector
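
Given the two keypoints in this schema, orientation can be derived from the vector running from the back base (index `1`) to the front tip (index `0`). The sketch below uses only the standard library; `orientation` is a hypothetical helper name and the coordinates are invented. Because image coordinates grow downward, an angle of 90° means the tip points toward the bottom of the frame.

```python
import math

KP_FRONT = 0  # Front tip
KP_BACK = 1   # Back base


def orientation(keypoints):
    """Return (angle_deg, length_px) for the back-to-front vector."""
    xf, yf = keypoints[KP_FRONT]
    xb, yb = keypoints[KP_BACK]
    dx, dy = xf - xb, yf - yb
    return math.degrees(math.atan2(dy, dx)), math.hypot(dx, dy)


# Tip directly below the base in image coordinates (y grows downward)
angle, length = orientation([(200.0, 340.0), (200.0, 300.0)])
print(f"{angle:.1f} deg, L={length:.0f}px")
```

The length is in pixels, so it changes with camera distance and resolution; converting it to physical units would need a calibration step that this notebook does not include.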

### Performance Optimizations
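1. **Model Warmup**: Runs a dummy inference so CUDA kernel initialization and memory allocation happen before the first real frame
2. **Half-Precision (FP16)**: Enabled whenever the device string starts with `"cuda"`. The code does not check GPU capability; the largest gains are expected on GPUs with Tensor Cores (CUDA compute capability ≥ 7.0)
3. **Stream Buffering**: Optional `stream_buffer` flag passed through to Ultralytics, off by default
4. **Memory Contiguity**: Converts each frame with `np.ascontiguousarray` so OpenCV and PyTorch receive a C-contiguous array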

## Usage Instructions

1. **Configure Model Path**: Update `model_path` to point to your trained model weights
2. **Select Device**: Choose appropriate device based on your hardware:
   - NVIDIA GPU: `"cuda"` or `"cuda:0"` (fastest, enables FP16)
   - Apple Silicon: `"mps"` (note: known pose model issues, use `"cpu"` if problems occur)
   - CPU: `"cpu"` (slowest but most compatible). `None` leaves device selection to Ultralytics and keeps FP16 off in this class
3. **Adjust Parameters**: Tune confidence/IoU thresholds based on your use case
4. **Run Demo**: Execute the cells to start real-time pose estimation

## Known Issues

- **Apple MPS Warning**: YOLO pose models have known issues with MPS backend. If you encounter errors, use `device="cpu"` instead.
  - Reference: https://github.com/ultralytics/ultralytics/issues/4031
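
One way to follow this advice is to pick the device in code and use MPS only when explicitly requested. The sketch below is a hypothetical helper (`pick_device` and its `allow_mps` flag are illustrative names, not part of the notebook). It relies on `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and falls back to `"cpu"` when PyTorch is not installed.

```python
def pick_device(allow_mps=False):
    """Return "cuda", "mps", or "cpu" based on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        return "cpu"  # No PyTorch available, so CPU is the only option
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # Absent in older PyTorch builds
    if allow_mps and mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()                 # MPS is skipped unless allow_mps=True
use_half = device.startswith("cuda")   # Mirrors the FP16 rule used in this notebook
print(device, use_half)
```

The returned string can be passed as the `device` argument of `AutoInjectorPoseDemo`. Keeping MPS opt-in is a design choice made here because of the issue referenced above.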

## Performance Tips

- Lower `max_fps` if experiencing performance issues (reduces CPU/GPU usage)
- Use OS-specific camera backends for better stability (see code comments)
- Adjust `conf` threshold based on detection requirements (lower = more detections but more false positives)
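
The effect of `max_fps` can be pictured as a per-frame time budget: once the work for a frame is done, the loop sleeps for whatever remains of `1 / max_fps`. The following is a simplified, standalone sketch rather than the notebook's loop verbatim. `run_throttled` and `fake_work` are hypothetical names, and `fake_work` stands in for capture plus inference.

```python
import time


def run_throttled(work, n_frames, max_fps):
    """Call work() n_frames times, sleeping so each frame takes at least 1/max_fps s."""
    budget = 1.0 / max_fps
    start = time.perf_counter()
    for _ in range(n_frames):
        frame_start = time.perf_counter()
        work()
        remaining = budget - (time.perf_counter() - frame_start)
        if remaining > 0:
            time.sleep(remaining)  # Finished early: wait out the rest of the budget
    return time.perf_counter() - start


def fake_work():
    time.sleep(0.005)  # Stand-in for roughly 5 ms of capture and inference


total = run_throttled(fake_work, n_frames=10, max_fps=50)
print(f"10 frames in {total:.2f}s, about {10 / total:.0f} FPS")
```

Throttling of this kind only caps the rate. If `work()` takes longer than the budget, no sleep happens and the achieved FPS simply follows the processing time.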


In [34]:
"""
Autoinjector Pose Estimation Demo Notebook

This notebook demonstrates real-time pose estimation for autoinjectors using YOLO pose models.
The implementation includes advanced features such as:
- Exponential Moving Average (EMA) smoothing for stable keypoint tracking
- Model warmup to eliminate first-frame inference delays
- FPS throttling for consistent performance
- Orientation calculation and visualization
- Robust error handling for production use

Key Technical Decisions:
1. EMA smoothing (alpha=0.4) reduces jitter in keypoint detection for stable visualization
2. Half-precision (FP16) inference enabled automatically for CUDA devices (performance boost)
3. Stream buffer optimization for video processing pipelines
4. FPS throttling caps the frame rate at max_fps; it cannot raise FPS when inference is slow
"""

# Standard library imports
import time

# Third-party imports
import cv2
import numpy as np
from ultralytics import YOLO


class AutoInjectorPoseDemo:
    """
    Production-ready YOLO pose estimation demo for autoinjector detection and tracking.
    
    This class provides an optimized pipeline for real-time pose estimation with advanced
    features including temporal smoothing, model warmup, and adaptive frame rate control.
    Intended to be launched from a notebook; annotated frames are shown in an OpenCV window.
    
    Class Schema:
        - Classes: 0=capped, 1=uncapped (autoinjector state detection)
        - Keypoints: 0=front tip, 1=back base (pose estimation)
    
    Key Features:
        - EMA-based keypoint smoothing for stable tracking
        - Model warmup to eliminate first-frame latency
        - Half-precision inference for CUDA devices (up to roughly 2x speedup)
        - Real-time orientation calculation (angle and length)
        - Robust error handling and graceful degradation
    
    Attributes:
        KP_FRONT (int): Keypoint index for the front (tip) of the autoinjector.
        KP_BACK (int): Keypoint index for the back (base) of the autoinjector.
        model (YOLO): Loaded YOLO pose estimation model.
        conf (float): Confidence threshold for detections (0.0-1.0).
        iou (float): Intersection over Union threshold for NMS (0.0-1.0).
        imgsz (int): Input image size for model inference (typically 640 for YOLO).
        stream_buffer (bool): Enable stream buffering for video processing optimization.
        half (bool): Use FP16 half-precision inference (set only when a CUDA device is requested).
        _kf (dict): Internal state for Exponential Moving Average filtering of keypoints.
    """

    # Keypoint indices as defined in the model's training schema
    KP_FRONT = 0  # Front tip of the autoinjector
    KP_BACK = 1   # Back base of the autoinjector

    def __init__(self, model_path, conf=0.25, iou=0.5, device=None, imgsz=640, stream_buffer=False):
        """
        Initialize the pose estimation demo with optimized configuration.
        
        Args:
            model_path (str): Path to trained YOLO pose model weights (.pt file).
            conf (float, optional): Confidence threshold for detections. Lower values
                increase recall but may introduce false positives. Defaults to 0.25.
            iou (float, optional): IoU threshold for Non-Maximum Suppression.
                Higher values allow more overlapping detections. Defaults to 0.5.
            device (str, optional): Computation device. Options:
                - "cuda" or "cuda:0" for NVIDIA GPUs (enables FP16 automatically)
                - "mps" for Apple Silicon Macs (note: MPS has known pose model issues)
                - "cpu" for CPU inference
                - None leaves device selection to Ultralytics (FP16 stays off). Defaults to None.
            imgsz (int, optional): Input image size. YOLO models are typically trained
                at 640x640. Larger sizes improve accuracy but reduce speed. Defaults to 640.
            stream_buffer (bool, optional): Passed through to Ultralytics predict, where it
                controls frame queuing for video-stream sources. Leave it off for webcam
                frames read with OpenCV. Defaults to False.
        
        Note:
            - Model warmup is performed automatically to eliminate first-frame latency
            - Half-precision (FP16) is automatically enabled for CUDA devices
            - EMA filter state is initialized for keypoint smoothing
        """
        # Load YOLO pose model - task="pose" enables keypoint estimation
        self.model = YOLO(model_path, task="pose")
        
        # Override device if specified (allows forcing CPU/GPU/MPS)
        if device:
            self.model.overrides.update({"device": device})
        
        # Store inference configuration parameters
        self.conf = conf
        self.iou = iou
        self.imgsz = imgsz
        self.stream_buffer = stream_buffer
        
        # Enable half-precision inference whenever a CUDA device is requested
        # No capability check is done here; FP16 gains are largest on Tensor Core GPUs
        self.half = bool(device and device.startswith("cuda"))
        
        # Initialize EMA filter state for keypoint smoothing
        # EMA reduces temporal jitter in keypoint positions for stable visualization
        self._kf = {"f": None, "b": None}  # Front and back keypoint EMA state
        
        print("Loaded model with classes:", self.model.model.names)

        # Model warmup: run one inference on a dummy image before the first camera frame
        # so one-time setup cost (CUDA kernel initialization, memory allocation) is paid
        # here rather than during live capture
        dummy = np.zeros((self.imgsz, self.imgsz, 3), dtype=np.uint8)
        _ = self.model.predict(
            dummy,
            conf=self.conf,
            iou=self.iou,
            imgsz=self.imgsz,
            half=self.half,
            stream_buffer=self.stream_buffer,
            verbose=False
        )

    @staticmethod
    def _put_label(frame, text, org):
        """
        Render text label with black background for optimal readability.
        
        This utility ensures text is visible regardless of underlying image content,
        which is critical for computer vision visualization overlays. Uses anti-aliased
        text rendering for professional appearance.
        
        Args:
            frame (np.ndarray): BGR image array to draw on (modified in-place).
            text (str): Text string to display.
            org (tuple): (x, y) coordinates for bottom-left corner of text baseline.
        
        Note:
            Black background rectangle with white text provides high contrast.
            Uses OpenCV LINE_AA for smooth, anti-aliased text rendering.
        """
        x, y = org
        
        # Calculate text dimensions to size background rectangle appropriately
        (tw, th), baseline = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, 0.6, 2)
        
        # Draw solid black background rectangle for text readability
        cv2.rectangle(frame, (x - 2, y - th - 4), (x + tw + 2, y + baseline), (0, 0, 0), -1)
        
        # Draw white text with anti-aliasing (LINE_AA = Line Anti-Aliased)
        cv2.putText(frame, text, (x, y), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), 2, cv2.LINE_AA)

    @staticmethod
    def _ema(prev, cur, alpha=0.4):
        """
        Exponential Moving Average (EMA) filter for temporal smoothing.
        
        EMA is a low-pass filter that reduces high-frequency noise while maintaining
        responsiveness to actual movements. The alpha parameter controls the balance
        between smoothing (low alpha) and responsiveness (high alpha).
        
        Formula: smoothed = alpha * current + (1 - alpha) * previous
        
        Args:
            prev (np.ndarray or None): Previous smoothed value. None for first frame.
            cur (np.ndarray): Current raw value to filter.
            alpha (float, optional): Smoothing factor (0.0-1.0). 
                - Lower values (0.1-0.3): More smoothing, slower response
                - Higher values (0.5-0.9): Less smoothing, faster response
                Defaults to 0.4 (balanced).
        
        Returns:
            np.ndarray: Smoothed value. Returns cur unchanged if prev is None.
        
        Note:
            Alpha=0.4 provides good balance for keypoint smoothing (40% new value,
            60% previous value). This reduces jitter from detection noise while
            maintaining reasonable tracking responsiveness.
        """
        # First frame: return current value as-is (no smoothing possible)
        if prev is None:
            return cur
        
        # Apply EMA filter: weighted combination of previous and current values
        return alpha * cur + (1 - alpha) * prev

    def _draw_pose(self, frame, kp: np.ndarray):
        """
        Visualize keypoint pose with temporal smoothing and orientation calculation.
        
        This method renders the autoinjector pose by:
        1. Applying EMA smoothing to reduce keypoint jitter
        2. Drawing connecting line between keypoints (if both visible)
        3. Rendering keypoint markers with labels
        4. Calculating and displaying orientation (angle and length)
        
        Args:
            frame (np.ndarray): BGR image array to draw on (modified in-place).
            kp (np.ndarray): Keypoint array of shape (N, 2) or (N, 3) where:
                - Shape (N, 2): [x, y] coordinates (confidence assumed 1.0)
                - Shape (N, 3): [x, y, confidence] coordinates
                - N >= 2 required (needs at least front and back keypoints)
        
        Note:
            - Keypoint visibility determined by confidence > 0
            - EMA smoothing maintains temporal consistency across frames
            - Orientation angle: 0° = horizontal right, 90° = vertical down
            - Color scheme: Green line, Red front, Blue back
        """
        # Validate input: ensure keypoints are present and sufficient
        if kp is None or kp.size == 0 or kp.shape[0] < 2:
            return
        
        # Handle 2D keypoints (x, y only) by adding confidence channel
        # If confidence not provided, assume all keypoints are visible (conf=1.0)
        if kp.shape[1] == 2:
            kp = np.concatenate([kp, np.ones((kp.shape[0], 1), dtype=kp.dtype)], axis=1)

        # Extract front and back keypoint coordinates and confidence scores
        xf, yf, cf = kp[self.KP_FRONT]  # Front keypoint: (x, y, confidence)
        xb, yb, cb = kp[self.KP_BACK]   # Back keypoint: (x, y, confidence)
        front_visible, back_visible = cf > 0, cb > 0

        # Apply Exponential Moving Average smoothing to reduce temporal jitter
        # Only keypoints that pass the visibility check update the filter, so an
        # unseen keypoint cannot drag the smoothed position away
        # The filter state is shared across detections, so smoothing is only
        # meaningful when a single autoinjector is in view
        if front_visible:
            self._kf["f"] = self._ema(self._kf["f"], np.array([xf, yf], dtype=float), 0.4)
            # Convert smoothed coordinates to plain ints for pixel-level drawing
            xf, yf = (int(v) for v in self._kf["f"])
        if back_visible:
            self._kf["b"] = self._ema(self._kf["b"], np.array([xb, yb], dtype=float), 0.4)
            xb, yb = (int(v) for v in self._kf["b"])

        # Draw connecting line between keypoints if both are visible
        # Green line represents the autoinjector's orientation vector
        if front_visible and back_visible:
            cv2.line(frame, (xb, yb), (xf, yf), (0, 255, 0), 2)

        # Draw front keypoint (tip) - Red circle with label
        if front_visible:
            cv2.circle(frame, (xf, yf), 5, (0, 0, 255), -1)
            cv2.putText(frame, "front", (xf + 6, yf - 6), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)

        # Draw back keypoint (base) - Blue circle with label
        if back_visible:
            cv2.circle(frame, (xb, yb), 5, (255, 0, 0), -1)
            cv2.putText(frame, "back", (xb + 6, yb - 6), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 1)

        # Orientation needs both endpoints, so skip it unless both keypoints are visible
        if not (front_visible and back_visible):
            return

        # Orientation is computed from back to front (base to tip)
        dx, dy = float(xf - xb), float(yf - yb)

        # Calculate angle in degrees: arctan2 handles all quadrants correctly
        # 0° = pointing right (horizontal), 90° = pointing down (vertical)
        angle = np.degrees(np.arctan2(dy, dx))

        # Calculate length (Euclidean distance) in pixels
        length = np.hypot(dx, dy)

        # Display orientation overlay (angle and length) in top-left corner
        self._put_label(frame, f"{angle:.1f} deg, L={length:.0f}px", (10, 70))

    def _process_frame(self, frame):
        """
        Process a single frame through the pose estimation pipeline.
        
        This is the core inference method that:
        1. Ensures memory contiguity for optimal OpenCV/PyTorch performance
        2. Runs YOLO inference with configured parameters
        3. Extracts and validates detection results (boxes, classes, keypoints)
        4. Visualizes all detections with bounding boxes, labels, and pose
        5. Handles edge cases gracefully with informative overlays
        
        Args:
            frame (np.ndarray): Input BGR image frame (H, W, 3).
        
        Returns:
            np.ndarray: Annotated frame with bounding boxes, labels, keypoints, and
                statistics drawn. Original frame returned if inference fails.
        
        Note:
            - np.ascontiguousarray guarantees the C-contiguous layout that OpenCV drawing calls expect
            - Early returns handle missing detections gracefully
            - Statistics overlay shows detection and keypoint counts for debugging
            - All tensor operations moved to CPU and converted to NumPy for OpenCV
        """
        # Ensure the frame is C-contiguous before inference and in-place drawing
        # np.ascontiguousarray returns the input unchanged when it is already contiguous
        frame = np.ascontiguousarray(frame)
        
        # Run YOLO inference with configured parameters
        # verbose=False suppresses YOLO's default progress logging
        results = self.model.predict(
            frame,
            conf=self.conf,
            iou=self.iou,
            imgsz=self.imgsz,
            half=self.half,
            stream_buffer=self.stream_buffer,
            verbose=False
        )
        
        # Early return if inference produced no results
        if not results:
            self._put_label(frame, "no results", (10, 50))
            return frame

        # Extract first (and typically only) result from batch
        res = results[0]
        
        # Validate that detections exist (boxes are present)
        if res.boxes is None or res.boxes.xyxy is None:
            self._put_label(frame, "no detections", (10, 50))
            return frame

        # Extract detection data from PyTorch tensors to NumPy arrays
        # All operations moved to CPU for OpenCV compatibility
        boxes_np = res.boxes.xyxy.cpu().numpy()  # Bounding boxes: (N, 4) in xyxy format
        
        # Extract class IDs (integer indices mapping to class names)
        clses_np = (
            res.boxes.cls.cpu().numpy().astype(int)
            if res.boxes.cls is not None
            else np.array([], dtype=int)
        )
        
        # Extract confidence scores for each detection
        confs_np = (
            res.boxes.conf.cpu().numpy()
            if res.boxes.conf is not None
            else np.array([])
        )

        # Extract keypoints if available (pose estimation results)
        # Check multiple conditions to handle edge cases gracefully
        kpts_np = None
        if (
            hasattr(res, "keypoints")
            and res.keypoints is not None
            and res.keypoints.data is not None
            and res.keypoints.data.numel() > 0  # Ensure tensor is non-empty
        ):
            kpts_np = res.keypoints.data.cpu().numpy()
            # Display detection and keypoint statistics for debugging/monitoring
            self._put_label(frame, f"dets {kpts_np.shape[0]}, kpts {kpts_np.shape[1]}", (10, 45))
        else:
            # Show detection count even when keypoints are unavailable
            self._put_label(frame, f"dets {boxes_np.shape[0]}, kpts 0", (10, 45))

        # Visualize each detected object
        for i in range(boxes_np.shape[0]):
            # Extract bounding box coordinates (top-left, bottom-right corners)
            x1, y1, x2, y2 = boxes_np[i].astype(int)
            
            # Draw bounding box (orange color: BGR 0, 200, 255)
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 200, 255), 2)

            # Build label with class name and confidence score
            label = "object"  # Default fallback label
            if i < clses_np.shape[0]:
                cid = clses_np[i]
                # Map class ID to human-readable name (e.g., "capped", "uncapped")
                name = self.model.model.names.get(cid, str(cid))
                label = name
                # Append confidence score if available
                if i < confs_np.shape[0]:
                    label = f"{name} {confs_np[i]:.2f}"
            
            # Position label above bounding box with minimum y-offset for edge cases
            self._put_label(frame, label, (x1, max(20, y1 - 8)))

            # Draw keypoint pose visualization if available
            if kpts_np is not None and i < kpts_np.shape[0]:
                self._draw_pose(frame, kpts_np[i])

        return frame

    def run_inline(self, camera_index=0, backend=None, max_fps=24, stop_after_seconds=30):
        """
        Run real-time pose estimation on webcam frames and show them in an OpenCV window.
        
        This method provides a production-ready video capture and processing loop with:
        - Configurable camera backend selection (OS-specific optimizations)
        - Frame rate throttling for consistent performance
        - Adaptive FPS calculation and display
        - Robust error handling and graceful shutdown
        - Automatic timeout and user interrupt handling
        
        Args:
            camera_index (int, optional): Camera device index (typically 0 for default).
                Multiple cameras can be accessed via index 0, 1, 2, etc. Defaults to 0.
            backend (int, optional): OpenCV backend for video capture. Examples:
                - cv2.CAP_AVFOUNDATION (macOS, recommended for better stability)
                - cv2.CAP_DSHOW (Windows)
                - cv2.CAP_V4L2 (Linux)
                - None (uses default backend). Defaults to None.
            max_fps (int, optional): Maximum target frame rate. The loop is throttled
                so it does not exceed this FPS. Lower values reduce CPU/GPU usage.
                None falls back to 30 FPS; there is no unlimited mode. Defaults to 24.
            stop_after_seconds (int, optional): Maximum runtime in seconds. Demo will
                automatically stop after this duration. Useful for automated testing.
                Set to a large value for indefinite operation. Defaults to 30.
        
        Note:
            - Camera resolution set to 1280x720 for balance of quality and performance
            - MJPG codec used for efficient video capture
            - FPS throttling caps the frame rate; slow inference still lowers the achieved FPS
            - Fail-safe mechanism handles camera disconnection gracefully
            - Press 'q' key or Ctrl+C to stop early
        
        Raises:
            Nothing is propagated for Ctrl+C: KeyboardInterrupt is caught internally
            so the camera and window are released before returning.
        """
        # Initialize video capture with optional backend specification
        # Backend specification is important for OS-specific camera access optimizations
        cap = (
            cv2.VideoCapture(camera_index)
            if backend is None
            else cv2.VideoCapture(camera_index, backend)
        )
        
        # Validate camera access
        if not cap.isOpened():
            print("Cannot open webcam at index:", camera_index, "| backend:", backend)
            return

        # Configure camera settings for optimal performance
        # MJPG codec provides good compression and is widely supported
        cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc(*"MJPG"))
        cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)   # 720p width
        cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)   # 720p height

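        # Calculate FPS throttling parameters
        target_fps = max_fps or 30  # Fall back to 30 FPS if max_fps is None or 0
        # Poll the keyboard for only 1 ms: the sleep at the end of the loop does the
        # throttling, and waiting a full frame period here would add that period on
        # top of the processing time
        wait_time = 1  # OpenCV waitKey time in milliseconds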

        # Create resizable window for flexible viewing
        # WINDOW_NORMAL allows manual resizing by user
        cv2.namedWindow("Autoinjector Pose Demo", cv2.WINDOW_NORMAL)
        cv2.resizeWindow("Autoinjector Pose Demo", 960, 540)  # Initial window size

        # Initialize timing and error tracking variables
        t_start = time.time()  # Start time for timeout calculation
        prev = 0.0  # Previous frame time for FPS calculation
        fail_count, max_fail = 0, 30  # Camera read failure counter (robustness)

        print(f"Running for up to {stop_after_seconds} seconds. Press 'q' to stop early.")
        
        try:
            while True:
                frame_start = time.time()  # Track frame processing start time
                
                # Read frame from camera
                ret, frame = cap.read()
                
                # Handle camera read failures gracefully
                # This prevents crashes from temporary camera disconnections
                if not ret or frame is None:
                    fail_count += 1
                    if fail_count >= max_fail:
                        print("Camera feed lost")
                        break
                    continue
                fail_count = 0  # Reset counter on successful read

                # Process frame through pose estimation pipeline
                frame = self._process_frame(frame)

                # Calculate and display FPS (frames per second)
                # Uses wall-clock time for accurate real-time measurement
                now = time.time()
                fps_disp = 1.0 / (now - prev) if prev > 0 else 0.0
                prev = now
                self._put_label(frame, f"FPS {fps_disp:.1f}", (10, 25))

                # Display annotated frame
                cv2.imshow("Autoinjector Pose Demo", frame)

                # Check for user exit command (q key)
                if cv2.waitKey(wait_time) & 0xFF == ord("q"):
                    print("Stopped by user (pressed 'q').")
                    break

                # Check for timeout
                if time.time() - t_start > stop_after_seconds:
                    print(f"Reached time limit of {stop_after_seconds} seconds.")
                    break

                # FPS throttling: maintain consistent frame rate
                # This ensures predictable performance and prevents excessive CPU/GPU usage
                elapsed = time.time() - frame_start
                budget = 1.0 / target_fps  # Time budget per frame
                if elapsed < budget:
                    # Sleep if we finished processing early to maintain target FPS
                    time.sleep(budget - elapsed)

        except KeyboardInterrupt:
            # Handle Ctrl+C gracefully
            print("Stopped by keyboard interrupt.")
        finally:
            # Cleanup: always release resources regardless of exit condition
            cap.release()
            cv2.destroyAllWindows()  # Closes the demo window and any other OpenCV windows


In [35]:
"""
Demo Execution Cell

This cell demonstrates how to use the AutoInjectorPoseDemo class for real-time
pose estimation with a webcam. Configure the parameters below based on your setup.
"""

# Configure model path (relative to notebook location)
model_path = "../model_training/training_runs/example_results/weights/last.pt"

# Device selection for inference
# Options:
#   - "cuda" or "cuda:0" for NVIDIA GPUs (enables FP16, fastest)
#   - "mps" for Apple Silicon Macs (note: MPS has known pose model issues)
#   - "cpu" for CPU inference (slowest but most compatible)
#   - None leaves device selection to Ultralytics (FP16 stays off)
# 
# Recommendation: Use "cpu" for MPS devices until Ultralytics fixes the pose bug
# See: https://github.com/ultralytics/ultralytics/issues/4031
device_choice = "mps"  # Change to "cpu" if you encounter MPS warnings

# Initialize the pose estimation demo
# Parameters:
#   - conf: Confidence threshold (0.25 = balanced, lower = more detections)
#   - iou: IoU threshold for NMS (0.5 = standard, higher = more overlapping boxes)
#   - device: Computation device (see options above)
#   - imgsz: Input image size (640 = standard YOLO size, larger = slower but more accurate)
#   - stream_buffer: Enable for video files (False for webcam)
demo = AutoInjectorPoseDemo(
    model_path,
    conf=0.25,      # Confidence threshold
    iou=0.5,        # IoU threshold for NMS
    device=device_choice,
    imgsz=640,      # Input image size
    stream_buffer=False  # Disable for webcam (enable for video files)
)

# Run the demo with webcam capture
# Parameters:
#   - camera_index: Camera device index (0 = default, 1 = second camera, etc.)
#   - backend: OS-specific camera backend (None = auto-detect)
#              macOS: cv2.CAP_AVFOUNDATION (recommended for stability)
#              Windows: cv2.CAP_DSHOW
#              Linux: cv2.CAP_V4L2
#   - max_fps: Target frame rate (24 = smooth, lower = less CPU/GPU usage)
#   - stop_after_seconds: Auto-stop after N seconds (30 = demo mode)
#
# Usage tips:
#   - If preview stalls, try specifying the OS-specific backend
#   - Lower max_fps if you experience performance issues
#   - Press 'q' to stop early
demo.run_inline(
    camera_index=0,          # Default camera
    backend=None,            # Auto-detect backend (try cv2.CAP_AVFOUNDATION on macOS if issues)
    max_fps=24,             # Target 24 FPS (smooth real-time performance)
    stop_after_seconds=30   # Run for 30 seconds (increase for longer demos)
)

Loaded model with classes: {0: 'capped', 1: 'uncapped'}
Running for up to 30 seconds. Press 'q' to stop early.
Reached time limit of 30 seconds.
