
🎥 Recommended Video: [Video Object Detection Using Yolo](https://www.youtube.com/watch?v=Rwvd0PJF2jk)



## **7. Video Analysis**

### **7.1 What is Video Analysis?**
Video analysis involves processing and interpreting video data to extract meaningful information. It builds on image-based computer vision techniques but extends them to handle temporal information across frames. Key tasks in video analysis include:
- **Object Detection**: Identifying objects in each frame.
- **Object Tracking**: Following objects across frames.
- **Activity Recognition**: Understanding actions or events in a video.

---

### **7.2 Key Challenges in Video Analysis**
- **Temporal Consistency**: Keeping detections and tracks stable from one frame to the next, so boxes do not flicker or jump between objects.
- **Computational Complexity**: Every frame has to be processed, so the cost grows with resolution, frame rate, and video length.
- **Variability**: Lighting, perspective, and object appearance change over time, which can break models that work well on single images.
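
To make the computational cost concrete, here is a rough back-of-the-envelope sketch of how much raw pixel data a short clip contains once it is decoded. The helper name `raw_video_bytes` and the clip dimensions are illustrative assumptions, not values from this notebook.

```python
def raw_video_bytes(width, height, fps, seconds, channels=3):
    """Bytes needed to hold uncompressed 8-bit frames for a clip."""
    frames = int(fps * seconds)
    return width * height * channels * frames

# Illustrative clip: 10 seconds of 1280x720 video at 30 FPS, 3 colour channels
size = raw_video_bytes(1280, 720, 30, 10)
print(f"{size / 1e9:.2f} GB uncompressed")  # 0.83 GB uncompressed
```

Even a short clip decodes to hundreds of megabytes, which is why the examples below read and process one frame at a time instead of loading the whole video into memory.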

---

### **7.3 Code Example: Reading and Displaying a Video**
Let’s start by reading and displaying a video using OpenCV.

```python
import cv2

# Open a video file
video_path = 'example_video.mp4'
cap = cv2.VideoCapture(video_path)

# Check if the video was opened successfully
if not cap.isOpened():
    print("Error: Could not open video.")
else:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        # Display the frame
        cv2.imshow('Video', frame)

        # Exit if 'q' is pressed
        if cv2.waitKey(25) & 0xFF == ord('q'):
            break

# Release the video capture object and close windows
cap.release()
cv2.destroyAllWindows()
```

#### **Explanation**:
1. `cv2.VideoCapture()` opens the video file.
2. `cap.read()` reads each frame of the video.
3. `cv2.imshow()` displays the frame in a window.
4. The loop continues until the video ends or the user presses 'q'.
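
The example above waits a fixed 25 ms between frames, so playback speed only matches the source when the video is about 40 FPS. A minimal sketch of deriving the delay from the reported frame rate follows. The helper `frame_delay_ms` and its fallback value are assumptions for illustration; some files and cameras report an FPS of 0, which is why a fallback is included.

```python
def frame_delay_ms(fps, fallback_ms=25):
    """Return a per-frame wait in milliseconds suitable for cv2.waitKey()."""
    if fps is None or fps <= 0:
        # The container did not report a usable frame rate
        return fallback_ms
    # waitKey(0) blocks until a key is pressed, so never return less than 1
    return max(1, round(1000 / fps))

print(frame_delay_ms(30))  # 33
print(frame_delay_ms(0))   # 25
```

In the loop above this could be used as `delay = frame_delay_ms(cap.get(cv2.CAP_PROP_FPS))` followed by `cv2.waitKey(delay)`. Decoding and drawing also take time, so real playback will still run slightly slower than the source.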

---

### **7.4 Code Example: Object Detection in Videos**
Let’s use a pre-trained YOLO model to detect objects in a video.

```python
from ultralytics import YOLO
import cv2

# Load a pre-trained YOLO model
model = YOLO('yolov8n.pt')

# Open a video file
video_path = 'example_video.mp4'
cap = cv2.VideoCapture(video_path)

# Process each frame
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Perform object detection
    results = model(frame)

    # Visualize the results
    annotated_frame = results[0].plot()

    # Display the annotated frame
    cv2.imshow('Object Detection', annotated_frame)

    # Exit if 'q' is pressed
    if cv2.waitKey(25) & 0xFF == ord('q'):
        break

# Release the video capture object and close windows
cap.release()
cv2.destroyAllWindows()
```

#### **Explanation**:
1. The YOLO model detects objects in each frame.
2. The `results[0].plot()` method visualizes the detected objects with bounding boxes and labels.
3. The annotated frames are displayed as soon as they are processed, so playback speed depends on how long inference takes per frame.
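
Besides drawing boxes, it is often useful to turn each frame's detections into counts per class, similar to the `2 persons, 9 cars` summaries the `yolo` command prints. The sketch below assumes an Ultralytics result exposes class ids as `result.boxes.cls` and a class-name mapping as `result.names`. The helper names and the sample ids are made up for illustration, and the demo runs on stand-in data so it does not need a model.

```python
from collections import Counter

def count_labels(class_ids, names):
    """Map class ids to names and count how often each name occurs."""
    return Counter(names[int(c)] for c in class_ids)

def detections_per_frame(result):
    """Count labels in one result (assumes .boxes.cls and .names exist)."""
    return count_labels(result.boxes.cls.tolist(), result.names)

# Stand-in data shaped like one frame's class ids (values are illustrative)
names = {0: "person", 2: "car", 7: "truck"}
counts = count_labels([2.0, 2.0, 0.0, 7.0, 2.0], names)
print(dict(counts))  # {'car': 3, 'person': 1, 'truck': 1}
```

Inside the detection loop, `detections_per_frame(results[0])` would give the counts for the current frame.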

---

### **7.5 Code Example: Object Tracking in Videos**
Object tracking involves following objects across frames. Let’s use OpenCV’s `Tracker` API to track an object in a video.

```python
import cv2

# Open a video file
video_path = 'example_video.mp4'
cap = cv2.VideoCapture(video_path)

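# Initialize the tracker (KCF typically requires the opencv-contrib-python build)
tracker = cv2.TrackerKCF_create()

# Read the first frame and stop if the video cannot be read
ret, frame = cap.read()
if not ret:
    cap.release()
    raise SystemExit("Error: Could not read video.")

# Select a bounding box to track (opens an interactive window)
bbox = cv2.selectROI('Tracking', frame, False)
tracker.init(frame, bbox)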

# Track the object in subsequent frames
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Update the tracker
    success, bbox = tracker.update(frame)

    # Draw the bounding box
    if success:
        x, y, w, h = [int(v) for v in bbox]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    else:
        cv2.putText(frame, "Tracking failed", (100, 80), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 0, 255), 2)

    # Display the frame
    cv2.imshow('Object Tracking', frame)

    # Exit if 'q' is pressed
    if cv2.waitKey(25) & 0xFF == ord('q'):
        break

# Release the video capture object and close windows
cap.release()
cv2.destroyAllWindows()
```

#### **Explanation**:
1. The user selects a bounding box to track in the first frame.
2. The `tracker.update()` method tracks the object in subsequent frames.
3. The bounding box is drawn around the tracked object.
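
The tracker reports its box as `(x, y, width, height)`, while `cv2.rectangle()` expects two corner points. The sketch below shows that conversion and also clamps the corners to the frame, since a tracked object can drift partly out of view. The helper name `bbox_to_corners` and the sample numbers are assumptions for illustration.

```python
def bbox_to_corners(bbox, frame_width, frame_height):
    """Convert (x, y, w, h) to top-left and bottom-right points inside the frame."""
    x, y, w, h = (int(round(v)) for v in bbox)
    x1 = min(max(x, 0), frame_width - 1)
    y1 = min(max(y, 0), frame_height - 1)
    x2 = min(max(x + w, 0), frame_width - 1)
    y2 = min(max(y + h, 0), frame_height - 1)
    return (x1, y1), (x2, y2)

# A box that extends past the right and bottom edges of a 640x360 frame
print(bbox_to_corners((600.4, 300.0, 80.0, 60.0), 640, 360))
# ((600, 300), (639, 359))
```

The two returned points can be passed directly as the second and third arguments of `cv2.rectangle()`.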

---

### **7.6 Code Example: Activity Recognition in Videos**
Activity recognition involves identifying actions or events in a video. Let’s use a pre-trained model from TensorFlow Hub to recognize activities.

```python
import tensorflow as tf
import tensorflow_hub as hub
import cv2
import numpy as np

# Load a pre-trained activity recognition model.
# This is a TF1-format Hub module, so it is called through its 'default' signature.
model = hub.load('https://tfhub.dev/deepmind/i3d-kinetics-400/1').signatures['default']

# Open a video file
video_path = 'example_video.mp4'
cap = cv2.VideoCapture(video_path)

# Process each frame
frames = []
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Convert BGR (OpenCV) to RGB, resize, and scale pixel values to [0, 1]
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    frame = cv2.resize(frame, (224, 224))
    frame = frame / 255.0
    frames.append(frame)

    # Stop after collecting 16 frames to keep the example small
    if len(frames) == 16:
        break

# Convert frames to a float32 array of shape (num_frames, 224, 224, 3)
frames = np.array(frames, dtype=np.float32)

# Perform activity recognition on a batch containing one clip
inputs = tf.expand_dims(frames, axis=0)
logits = model(inputs)['default'][0]
predictions = tf.argmax(logits, axis=-1).numpy()

# Print the predicted class index (an index into the Kinetics-400 label list)
print("Predicted Activity (class index):", predictions)

# Release the video capture object
cap.release()
```

#### **Explanation**:
1. The I3D model from TensorFlow Hub is used for activity recognition.
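2. The video frames are resized, scaled to the range [0, 1], and stacked into a single clip that is fed into the model.
3. The model outputs one score per Kinetics-400 action class, and `tf.argmax` picks the index of the highest-scoring class.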
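
A class index on its own is hard to read, so the scores are usually converted to probabilities and paired with label names. The sketch below applies a softmax and returns the top entries using only the standard library. The four labels and the logit values are hypothetical stand-ins. The real model has 400 classes, and its label list has to be loaded separately.

```python
import math

def top_k(logits, labels, k=3):
    """Return the k most likely (label, probability) pairs via softmax."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda p: p[1], reverse=True)
    return ranked[:k]

# Hypothetical subset of action labels and made-up scores
labels = ["archery", "bowling", "cycling", "dancing"]
logits = [0.5, 2.0, 1.0, -1.0]

for label, prob in top_k(logits, labels, k=2):
    print(f"{label}: {prob:.2f}")
```

With real model output, `logits` would be the score vector for one clip converted to a Python list, for example with `.numpy().tolist()`.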

---

### **7.7 Code Example: Video Summarization**
Video summarization involves creating a shorter version of a video that captures its key events. Let’s use OpenCV to extract keyframes based on frame differences.

```python
import cv2
import numpy as np

# Open a video file
video_path = 'example_video.mp4'
cap = cv2.VideoCapture(video_path)

# Initialize variables
prev_frame = None
keyframes = []

# Process each frame
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Convert to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Compare with the previous frame
    if prev_frame is not None:
        diff = cv2.absdiff(gray, prev_frame)
        if diff.mean() > 10:  # Threshold for significant change
            keyframes.append(frame)

    prev_frame = gray

# Save keyframes as a summary video
if keyframes:
    height, width, _ = keyframes[0].shape
    out = cv2.VideoWriter('summary.mp4', cv2.VideoWriter_fourcc(*'mp4v'), 10, (width, height))
    for frame in keyframes:
        out.write(frame)
    out.release()

# Release the video capture object
cap.release()
```

#### **Explanation**:
1. Keyframes are identified by comparing consecutive frames.
2. Frames with significant changes are saved as keyframes.
3. The keyframes are compiled into a summary video.
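
The selection rule can be checked without a video file. The sketch below applies the same idea, comparing each frame with the one before it and keeping it when the mean absolute difference exceeds a threshold, to tiny synthetic "frames" represented as flat lists of pixel intensities. The function names and pixel values are assumptions for illustration.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def select_keyframes(frames, threshold=10):
    """Return indices of frames that differ strongly from their predecessor."""
    keyframes = []
    prev = None
    for index, frame in enumerate(frames):
        if prev is not None and mean_abs_diff(frame, prev) > threshold:
            keyframes.append(index)
        prev = frame
    return keyframes

# Five 4-pixel frames: small changes, then a scene change, then another
frames = [
    [0, 0, 0, 0],
    [2, 2, 2, 2],
    [50, 50, 50, 50],
    [52, 52, 52, 52],
    [0, 0, 0, 0],
]
print(select_keyframes(frames))  # [2, 4]
```

Raising the threshold yields a shorter summary and lowering it keeps more frames, so the value of 10 used above is best treated as a starting point to tune per video.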

---

### **7.8 In Conclusion**
- Video analysis extends image-based techniques to handle temporal data.
- Tasks include object detection, tracking, activity recognition, and summarization.
- Libraries like OpenCV, Ultralytics YOLO, and TensorFlow Hub provide ready-made building blocks for video analysis pipelines.


In [None]:
! pip install ultralytics -q

In [None]:
!yolo detect predict model=yolov8m.pt source="/content/Cars.mp4"

Ultralytics 8.3.74 🚀 Python-3.11.11 torch-2.5.1+cu124 CUDA:0 (Tesla T4, 15095MiB)
YOLOv8m summary (fused): 218 layers, 25,886,080 parameters, 0 gradients, 78.9 GFLOPs

video 1/1 (frame 1/3000) /content/Cars.mp4: 384x640 2 persons, 9 cars, 60.0ms
video 1/1 (frame 2/3000) /content/Cars.mp4: 384x640 2 persons, 8 cars, 1 truck, 25.4ms
video 1/1 (frame 3/3000) /content/Cars.mp4: 384x640 2 persons, 8 cars, 1 truck, 25.3ms
video 1/1 (frame 4/3000) /content/Cars.mp4: 384x640 1 person, 8 cars, 1 truck, 25.3ms
video 1/1 (frame 5/3000) /content/Cars.mp4: 384x640 1 person, 9 cars, 25.3ms
video 1/1 (frame 6/3000) /content/Cars.mp4: 384x640 1 person, 9 cars, 1 truck, 25.3ms
video 1/1 (frame 7/3000) /content/Cars.mp4: 384x640 2 persons, 8 cars, 1 truck, 25.3ms
video 1/1 (frame 8/3000) /content/Cars.mp4: 384x640 3 persons, 8 cars, 1 truck, 25.3ms
video 1/1 (frame 9/3000) /content/Cars.mp4: 384x640 1 person, 8 cars, 1 truck, 17.8ms
video 1/1 (frame 10/3000) /content/Cars.mp4: 384x640 1 person, 8 cars, 

In [None]:
!ffmpeg -i "/content/runs/detect/predict/Cars.avi" -vcodec libx264 "final.mp4"



In [None]:
from ultralytics import YOLO
import cv2
from google.colab.patches import cv2_imshow # Import the cv2_imshow function

# Load a pre-trained YOLO model
model = YOLO('yolov8n.pt')

# Open a video file
video_path = '/content/Cars.mp4'
cap = cv2.VideoCapture(video_path)

# Process each frame
frame_index = 0
preview_every = 100  # Show only every 100th frame to keep the notebook output small
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Perform object detection
    results = model(frame)

    # Visualize the results
    annotated_frame = results[0].plot()

    # Display the annotated frame inline using cv2_imshow
    if frame_index % preview_every == 0:
        cv2_imshow(annotated_frame)
    frame_index += 1

# Release the video capture object (Colab has no GUI windows to close or keys to poll)
cap.release()

In [None]:
from ultralytics import YOLO
import cv2
from google.colab.patches import cv2_imshow  # If you still want to preview frames

# Load a pre-trained YOLO model
model = YOLO('yolov8n.pt')

# Open the video file
video_path = '/content/Cars.mp4'  # Make sure the path is correct
cap = cv2.VideoCapture(video_path)

if not cap.isOpened():
    # Stop the cell without shutting down the notebook kernel
    raise SystemExit(f"Error opening video: {video_path}")

# Get video properties (width, height, FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # Fall back to 30 FPS if the file reports 0

# Define the codec and create a VideoWriter object
fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # Or use *'XVID' for .avi
output_path = '/content/output_video.mp4'  # Path to save the new video
out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))  # Use original width and height

# Process each frame and write to the output video
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Perform object detection
    results = model(frame)

    # Visualize the results
    annotated_frame = results[0].plot()

    # Write the annotated frame to the output video
    out.write(annotated_frame)

    # Preview (optional - comment out if not needed)
    # cv2_imshow(annotated_frame)  # Preview if you want to see frame by frame
    # if cv2.waitKey(1) & 0xFF == ord('q'): # Break if 'q' is pressed during preview
    #     break # Break the loop

# Release resources
cap.release()
out.release()
cv2.destroyAllWindows()

print(f"Video saved to: {output_path}")

# Display the saved video (if needed, in a new cell)
# from IPython.display import HTML
# from base64 import b64encode

# mp4 = open(output_path,'rb').read()
# data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
# HTML("""
# <video width=640 controls>
#       <source src="%s" type="video/mp4">
# </video>
# """ % data_url)

Streaming output truncated to the last 5000 lines.
Speed: 4.9ms preprocess, 157.7ms inference, 1.2ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 5 cars, 1 truck, 151.9ms
Speed: 6.6ms preprocess, 151.9ms inference, 1.2ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 5 cars, 1 truck, 159.6ms
Speed: 5.2ms preprocess, 159.6ms inference, 1.2ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 5 cars, 1 truck, 149.8ms
Speed: 5.4ms preprocess, 149.8ms inference, 1.1ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 5 cars, 1 truck, 152.0ms
Speed: 6.1ms preprocess, 152.0ms inference, 1.2ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 5 cars, 1 truck, 164.9ms
Speed: 5.0ms preprocess, 164.9ms inference, 1.2ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 5 cars, 1 truck, 148.8ms
Speed: 5.1ms preprocess, 148.8ms inference, 1.3ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 5 cars, 1 truck

In [None]:
import cv2

# Open a video file
video_path = '/content/output_video.mp4'
cap = cv2.VideoCapture(video_path)

# Initialize the tracker
tracker = cv2.TrackerKCF_create()

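# Read the first frame and stop if the video cannot be read
ret, frame = cap.read()
if not ret:
    cap.release()
    raise SystemExit("Error: Could not read video.")

# Select a bounding box to track
# Note: cv2.selectROI and cv2.imshow need a desktop GUI, so they do not work in a hosted notebook such as Colab
bbox = cv2.selectROI('Tracking', frame, False)
tracker.init(frame, bbox)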

# Track the object in subsequent frames
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Update the tracker
    success, bbox = tracker.update(frame)

    # Draw the bounding box
    if success:
        x, y, w, h = [int(v) for v in bbox]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    else:
        cv2.putText(frame, "Tracking failed", (100, 80), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 0, 255), 2)

    # Display the frame
    cv2.imshow('Object Tracking', frame)

    # Exit if 'q' is pressed
    if cv2.waitKey(25) & 0xFF == ord('q'):
        break

# Release the video capture object and close windows
cap.release()
cv2.destroyAllWindows()