# Object-Based Key Frame Extraction in Videos
*    Install PyTorch for deep learning.

*    Install OpenCV for computer vision tasks.

*    Install NumPy for numerical computing.

In [None]:
!pip install torch
!pip install opencv-python
!pip install numpy


*    Loads PyTorch for deep learning and tensor operations.

*    Imports OpenCV for image/video processing and computer vision.

*    Loads NumPy for numerical computing with array support.

*    Imports gc to trigger garbage collection manually and keep memory usage down.

*    Imports time to measure how long each pipeline stage takes.

In [None]:
import torch
import cv2
import numpy as np
import gc
import time

### 1. Object Detection (YOLOv5)

This code sets up object detection with YOLOv5, a lightweight deep learning model, and wraps inference in a helper that accepts a batch of frames. Below is a breakdown of its functionality:

***Key Steps:***

**Model Loading:**

*  Loads the YOLOv5n (nano) model from Torch Hub, pretrained on the COCO dataset.

*  Optimized for low-RAM environments while maintaining decent accuracy.

**GPU Acceleration:**

*  Moves the model to the GPU (model.cuda()) for faster inference when a CUDA device is available.

**Inference Setup:**

*   Switches to evaluation mode (model.eval()), which disables dropout and makes batch norm layers use their stored running statistics.

*   Uses gradient-free inference (torch.no_grad()) to save memory during detection.

**Detection Function:**

*   Processes input frames (detect_objects(frames)) and returns detected objects with bounding boxes.

In [None]:
model = torch.hub.load('ultralytics/yolov5', 'yolov5n', pretrained=True)
if torch.cuda.is_available():  # Stay on the CPU when no CUDA device is present
    model.cuda()
model.eval()

def detect_objects(frames):
    with torch.no_grad():
        results = model(frames)
    return results

Downloading: "https://github.com/ultralytics/yolov5/zipball/master" to /root/.cache/torch/hub/master.zip


Creating new Ultralytics Settings v0.0.6 file ✅ 
View Ultralytics Settings with 'yolo settings' or at '/root/.config/Ultralytics/settings.json'
Update Settings with 'yolo settings key=value', i.e. 'yolo settings runs_dir=path/to/dir'. For help see https://docs.ultralytics.com/quickstart/#ultralytics-settings.


YOLOv5 🚀 2025-4-21 Python-3.11.12 torch-2.6.0+cu124 CUDA:0 (Tesla T4, 15095MiB)

Downloading https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5n.pt to yolov5n.pt...
100%|██████████| 3.87M/3.87M [00:00<00:00, 248MB/s]

Fusing layers... 
YOLOv5n summary: 213 layers, 1867405 parameters, 0 gradients, 4.5 GFLOPs
Adding AutoShape... 
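
To make "detected objects with bounding boxes" concrete, here is a minimal NumPy sketch of filtering detections by confidence. It assumes each detection is a row of `[x1, y1, x2, y2, confidence, class_id]`, which is the layout YOLOv5 uses for `results.xyxy`. The helper name `filter_detections` and the sample values are made up for illustration and are not part of the pipeline.

```python
import numpy as np

def filter_detections(detections, min_confidence=0.5):
    """Keep rows whose confidence (column 4) exceeds min_confidence."""
    detections = np.asarray(detections, dtype=float)
    if detections.size == 0:
        # No detections: return an empty array with the expected 6 columns
        return detections.reshape(0, 6)
    return detections[detections[:, 4] > min_confidence]

# Hypothetical detections for one 320x240 frame:
# x1, y1, x2, y2, confidence, class_id
sample = np.array([
    [ 10.0,  20.0, 110.0, 120.0, 0.91, 2.0],
    [ 50.0,  60.0,  90.0, 100.0, 0.32, 0.0],
    [200.0,  40.0, 300.0, 200.0, 0.67, 7.0],
])

kept = filter_detections(sample)
for x1, y1, x2, y2, conf, cls in kept:
    print(f"class {int(cls)}: box=({int(x1)}, {int(y1)}, {int(x2)}, {int(y2)}) conf={conf:.2f}")
```

Only the first and third rows survive the 0.5 cut-off, the same threshold the pipeline applies before drawing boxes.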


### Kalman Filter for Object Tracking
This code implements a Kalman Filter, a recursive algorithm used to estimate the state of a dynamic system (e.g., tracking object positions in 2D space).

***Key Components:***

**Initialization (`__init__`):**

*  State Transition Matrix (F): Models object motion (position + velocity).

*  Control Matrix (B): Adjusts state based on acceleration inputs (u_x, u_y).

*  Measurement Matrix (H): Maps true state to observed measurements.

*  Process Noise (Q): Uncertainty in motion model (scaled by std_acc).

*  Measurement Noise (R): Sensor noise (scaled by std_meas).

*  Covariance (P): Tracks estimation confidence.

**Prediction Step (predict):**

*  Updates state (x) and covariance (P) using motion dynamics and control input (u).

**Update Step (update):**

*  Corrects predictions with new measurements (z) using the Kalman Gain (K).



In [None]:
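class KalmanFilter:
    def __init__(self, dt, u_x, u_y, std_acc, std_meas):
        # Default control input (acceleration along x and y)
        self.u = np.array([[u_x], [u_y]])

        # State transition matrix for the state [x, vx, y, vy]
        self.F = np.array([[1, dt, 0, 0],
                           [0, 1, 0, 0],
                           [0, 0, 1, dt],
                           [0, 0, 0, 1]])

        # Control matrix: maps acceleration to position and velocity changes
        self.B = np.array([[0.5 * dt**2, 0],
                           [dt, 0],
                           [0, 0.5 * dt**2],
                           [0, dt]])

        # Measurement matrix: only the positions x and y are observed
        self.H = np.array([[1, 0, 0, 0],
                           [0, 0, 1, 0]])

        # Process noise covariance, scaled by the acceleration variance
        self.Q = np.array([[(dt**4)/4, (dt**3)/2, 0, 0],
                           [(dt**3)/2, dt**2, 0, 0],
                           [0, 0, (dt**4)/4, (dt**3)/2],
                           [0, 0, (dt**3)/2, dt**2]]) * std_acc**2

        self.R = np.eye(2) * std_meas**2  # Measurement noise covariance
        self.P = np.eye(4)                # Initial state covariance
        self.x = np.zeros((4, 1))         # Initial state estimate

    def predict(self, u=None):
        # Fall back to the control input given at construction
        if u is None:
            u = self.u
        self.x = np.dot(self.F, self.x) + np.dot(self.B, u)
        self.P = np.dot(self.F, np.dot(self.P, self.F.T)) + self.Q
        return self.x

    def update(self, z):
        y = z - np.dot(self.H, self.x)                          # Innovation
        S = np.dot(self.H, np.dot(self.P, self.H.T)) + self.R   # Innovation covariance
        K = np.dot(self.P, np.dot(self.H.T, np.linalg.inv(S)))  # Kalman gain

        self.x = self.x + np.dot(K, y)
        # The covariance update is a matrix product, not an element-wise one
        self.P = np.dot(np.eye(len(self.x)) - np.dot(K, self.H), self.P)
        return self.x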
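
To see the predict and update steps at work, the following is a minimal single-axis sketch (state = position and velocity) that tracks a point moving at constant velocity from noisy position readings. The function names `kf_predict` and `kf_update`, the noise levels, and the simulated track are illustrative assumptions; it uses no control input and is not the tracker configuration used later in the notebook.

```python
import numpy as np

def kf_predict(x, P, F, Q):
    """Propagate the state estimate and its covariance one step forward."""
    x = F @ x
    P = F @ P @ F.T + Q
    return x, P

def kf_update(x, P, z, H, R):
    """Correct the prediction with measurement z via the Kalman gain."""
    y = z - H @ x                      # innovation
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x + K @ y
    P = (np.eye(P.shape[0]) - K @ H) @ P
    return x, P

dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])  # constant-velocity motion model
H = np.array([[1.0, 0.0]])             # only position is measured
Q = 1e-4 * np.eye(2)                   # small process noise (assumed)
R = np.array([[4.0]])                  # measurement variance (std = 2)

rng = np.random.default_rng(0)
x = np.zeros((2, 1))                   # start with no knowledge of the track
P = 100.0 * np.eye(2)                  # so use a large initial uncertainty
true_velocity = 2.0

for k in range(1, 51):
    true_pos = true_velocity * k * dt
    z = np.array([[true_pos + rng.normal(0.0, 2.0)]])  # noisy reading
    x, P = kf_predict(x, P, F, Q)
    x, P = kf_update(x, P, z, H, R)

est_pos, est_vel = x[0, 0], x[1, 0]
print(f"estimated position {est_pos:.2f}, estimated velocity {est_vel:.2f}")
```

Although velocity is never measured directly, the filter recovers it from the sequence of positions, and the position variance in `P` ends up well below the raw measurement variance.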


### Saliency Score Computation (Simplified Itti-Koch Model)

This function calculates a single saliency score for an image frame, a rough measure of how visually prominent its content is.

***Key Steps:***

**Convert to Grayscale (cv2.COLOR_BGR2GRAY):**

*  Simplifies the image to single-channel intensity values.

**Compute Intensity Contrast:**

*  Measures standard deviation (np.std) of grayscale pixel intensities.

*  Higher values indicate larger intensity variation across the frame (e.g., strong edges, textures).

**Compute Color Contrast:**

*  Measures standard deviation of the original BGR frame’s pixel values.

*  Captures color diversity (e.g., vivid or contrasting hues).

**Combine Scores:**

*  Saliency Score = intensity_contrast + color_contrast.

*  Higher scores suggest a more visually "salient" frame.

In [None]:
def calculate_saliency(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    intensity_contrast = np.std(gray)
    color_contrast = np.std(frame)
    saliency_score = intensity_contrast + color_contrast
    return saliency_score
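
As a quick sanity check of this score, the sketch below recomputes it with NumPy only and compares a flat gray image against a black-and-white checkerboard. The grayscale step uses the standard luma weights (0.299 R + 0.587 G + 0.114 B) that OpenCV documents for its BGR-to-gray conversion, without OpenCV's rounding to 8-bit, so values can differ slightly from `calculate_saliency`. The name `contrast_saliency` and both test images are illustrative assumptions.

```python
import numpy as np

def contrast_saliency(frame_bgr):
    """Global contrast score: std of gray intensities plus std of all BGR values."""
    frame = np.asarray(frame_bgr, dtype=np.float64)
    b, g, r = frame[..., 0], frame[..., 1], frame[..., 2]
    gray = 0.114 * b + 0.587 * g + 0.299 * r   # luma-weighted grayscale
    intensity_contrast = gray.std()
    color_contrast = frame.std()
    return float(intensity_contrast + color_contrast)

# A uniform mid-gray image: no variation at all
flat = np.full((8, 8, 3), 128, dtype=np.uint8)

# A checkerboard of pure black and pure white pixels: maximal variation
checker = np.zeros((8, 8, 3), dtype=np.uint8)
checker[::2, ::2] = 255
checker[1::2, 1::2] = 255

flat_score = contrast_saliency(flat)
checker_score = contrast_saliency(checker)
print(f"flat: {flat_score:.2f}, checkerboard: {checker_score:.2f}")
```

The flat image scores essentially zero and the checkerboard scores far higher, which also shows that the scores live on the pixel-value scale (0 to 255) rather than in the range 0 to 1.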

### Keyframe Selection Based on Saliency Scores

This function selects the most visually important frames from a sequence using precomputed saliency scores.

**How It Works:**

**Inputs:**

*  frames: List of video frames (images)

*  saliency_scores: Corresponding importance scores for each frame

*  threshold: Score a frame must exceed to be selected (default=0.5; process_video passes the mean score instead)

**Selection Process:**

*  Iterates through all frames and their scores

*  Selects frames where the saliency score exceeds the threshold

*  Returns the subset of high-importance frames

In [None]:
def select_keyframes(frames, saliency_scores, threshold=0.5):
    keyframes = []
    for i, score in enumerate(saliency_scores):
        if score > threshold:
            keyframes.append(frames[i])
    return keyframes
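
The thresholding rule can be tried on toy numbers. The sketch below is a hypothetical variant, `select_keyframe_indices`, that returns frame indices instead of frames and defaults to the mean score as the threshold. Both choices are assumptions made for illustration; returning indices simply makes it easy to see which frames were picked.

```python
def select_keyframe_indices(saliency_scores, threshold=None):
    """Return the indices of frames whose score exceeds the threshold.

    When no threshold is given, the mean score is used.
    """
    if not saliency_scores:
        return []
    if threshold is None:
        threshold = sum(saliency_scores) / len(saliency_scores)
    return [i for i, score in enumerate(saliency_scores) if score > threshold]

# Made-up scores for five frames; their mean is 30.0
scores = [12.0, 40.0, 8.0, 55.0, 35.0]
picked = select_keyframe_indices(scores)
print(picked)  # [1, 3, 4]
```

With a mean threshold, at least one frame is left out unless every score is identical, in which case no frame exceeds the threshold and nothing is selected.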

### Scene Transition Detection (Histogram Comparison)

In [None]:
def detect_scene_transitions(frames, threshold=0.3):
    transitions = []
    for i in range(1, len(frames)):
        # 256-bin histograms of channel 0 (blue in OpenCV's BGR order)
        hist1 = cv2.calcHist([frames[i-1]], [0], None, [256], [0, 256])
        hist2 = cv2.calcHist([frames[i]], [0], None, [256], [0, 256])
        # Correlation near 1 means similar frames; a low value suggests a cut
        similarity = cv2.compareHist(hist1, hist2, cv2.HISTCMP_CORREL)
        if similarity < threshold:
            transitions.append(i)
    return transitions
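
`cv2.HISTCMP_CORREL` is the Pearson correlation between the two histograms' bin counts, so the idea can be reproduced with NumPy alone. The sketch below compares a synthetic dark frame with itself and with a synthetic bright frame. The helper name `histogram_correlation` and the random test images are assumptions for illustration.

```python
import numpy as np

def histogram_correlation(channel_a, channel_b, bins=256):
    """Pearson correlation between the intensity histograms of two channels."""
    hist_a, _ = np.histogram(channel_a, bins=bins, range=(0, 256))
    hist_b, _ = np.histogram(channel_b, bins=bins, range=(0, 256))
    return float(np.corrcoef(hist_a, hist_b)[0, 1])

rng = np.random.default_rng(0)
dark = rng.integers(0, 64, size=(64, 64))       # intensities in 0..63
bright = rng.integers(192, 256, size=(64, 64))  # intensities in 192..255

same_scene = histogram_correlation(dark, dark)
scene_cut = histogram_correlation(dark, bright)
print(f"same scene: {same_scene:.2f}, scene cut: {scene_cut:.2f}")
```

Identical frames correlate at 1, while frames whose intensities occupy different parts of the histogram fall below the 0.3 threshold and would be flagged as a transition.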

### Summary Generation

In [None]:
def generate_summary(keyframes, output_path="summary.mp4", fps=30):
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    if len(keyframes) > 0:
        height, width, layers = keyframes[0].shape
        video = cv2.VideoWriter(output_path, fourcc, fps, (width, height))

        for frame in keyframes:
            video.write(frame)

        video.release()
        print(f"Summary video saved to {output_path}")
    else:
        print("No keyframes selected, cannot generate summary.")

In [None]:
def process_video(video_path, frame_resize=(320, 240), frame_sample_rate = 5, batch_size=16):
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print("Error opening video file")
        return

    frames = []
    saliency_scores = []
    kf = KalmanFilter(dt=0.1, u_x=1, u_y=1, std_acc=1, std_meas=0.1)  # NOTE: created here but not used in the steps below
    start_time = time.time()

    frame_count = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break

        if frame_count % frame_sample_rate == 0: #Sample every N frames
            frame = cv2.resize(frame, frame_resize) # Resize the frame
            frames.append(frame)

        frame_count += 1
    cap.release()
    print(f"Time taken to read and resize frames: {time.time() - start_time:.2f} seconds")

    # Object Detection (Batched with limited batch size)
    start_time = time.time()
    labeled_frames = []
    for i in range(0, len(frames), batch_size):
        batch = frames[i:i + batch_size]
        results = detect_objects(batch)

        # Process results and draw bounding boxes
        detections = results.pandas().xyxy  # One DataFrame per frame in the batch
        for frame_idx, frame in enumerate(batch):
            for _, row in detections[frame_idx].iterrows():
                confidence = row['confidence']
                if confidence > 0.5:  # Confidence threshold
                    x1, y1 = int(row['xmin']), int(row['ymin'])
                    x2, y2 = int(row['xmax']), int(row['ymax'])
                    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
                    # Draw label
                    label = f"{row['name']} {confidence:.2f}"
                    cv2.putText(frame, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
            labeled_frames.append(frame)
        del results, batch, detections # Delete results to free memory
        torch.cuda.empty_cache() # Clear CUDA cache
        gc.collect() # Run garbage collection
    print(f"Time taken for object detection: {time.time() - start_time:.2f} seconds")

    # Simplified Saliency Score and Keyframe Selection
    start_time = time.time()
    saliency_scores = [calculate_saliency(frame) for frame in labeled_frames] # Use labeled frames
    keyframes = select_keyframes(labeled_frames, saliency_scores, threshold=np.mean(saliency_scores)) # Use labeled frames
    print(f"Time taken for saliency and keyframe selection: {time.time() - start_time:.2f} seconds")

    # Scene Transition Detection
    start_time = time.time()
    transition_frames = detect_scene_transitions(labeled_frames) # Use labeled frames
    transition_keyframes = [labeled_frames[i] for i in transition_frames] # Use labeled frames
    print(f"Time taken for scene transition detection: {time.time() - start_time:.2f} seconds")

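    # Generate Summary
    start_time = time.time()
    # Merge both selections by frame index so the summary stays in temporal
    # order and a frame picked by both criteria is written only once
    saliency_threshold = np.mean(saliency_scores)
    summary_indices = sorted(
        {i for i, score in enumerate(saliency_scores) if score > saliency_threshold}
        | set(transition_frames)
    )
    summary_frames = [labeled_frames[i] for i in summary_indices]
    generate_summary(summary_frames, output_path="summary.mp4", fps=30)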
    print(f"Time taken to generate summary: {time.time() - start_time:.2f} seconds")

    del frames, saliency_scores, keyframes, transition_frames, transition_keyframes, labeled_frames # Delete all frames
    torch.cuda.empty_cache() # Clear CUDA cache
    gc.collect() # Run garbage collection

In [None]:
# Example usage:
video_path = "cars.mp4"  # Replace with your video path
process_video(video_path, frame_resize=(320, 240), frame_sample_rate=5, batch_size=16) # Example parameters

Time taken to read and resize frames: 9.64 seconds


  with amp.autocast(autocast):

Time taken for object detection: 72.67 seconds
Time taken for saliency and keyframe selection: 2.10 seconds
Time taken for scene transition detection: 0.26 seconds
Summary video saved to summary.mp4
Time taken to generate summary: 0.62 seconds
