# **Histogram Frame Similarity Assessment - Detailed Process Explanation**

# **Step 1: Import Libraries**
# First, we import the required libraries for video processing and memory profiling:
# - `cv2` (OpenCV): This library allows us to load and process video frames, compute histograms, and compare images.
# - `numpy`: Used for numerical operations, which are helpful in image processing tasks such as flattening histograms.
# - `time`: This is used to measure the time taken for loading frames and processing them, giving performance metrics.
# - `psutil`: This library is used to track the memory usage of the program to monitor resource consumption during video processing.

# **Step 2: Set the Video Paths**
# We specify the paths for the input video and the output processed video:
# - `video_path` is the path to the video file that will be processed.
# - `output_video_path` is where the output video (after processing) will be saved.

# **Step 3: Define the Histogram Similarity Calculation Function**
# The function `calculate_histogram_similarity` compares two frames based on their color histograms:
# - It first calculates the histogram of both frames. The histograms are calculated for the grayscale frames using `cv2.calcHist`.
# - The histograms are then normalized to ensure they are on the same scale, which helps improve the comparison's accuracy.
# - The function then computes the correlation between the two histograms using `cv2.compareHist`. The result is a similarity score between 0 and 1:
#   - A score of `1` indicates the histograms are identical (i.e., the frames are visually identical).
#   - A score of `0` indicates no similarity.
# - This method works well for comparing the overall brightness distribution and structure of the frames, which can be effective for detecting changes in video content.

# **Step 4: Define the Frame Similarity Function**
# The function `is_frame_similar` is used to check whether two frames are similar enough based on histogram similarity:
# - It calls `calculate_histogram_similarity` to obtain the similarity score between two frames.
# - The function then compares this similarity score to a predefined threshold (default set to `0.991`). If the score exceeds this threshold, the frames are considered similar and the frame is skipped. Otherwise, the frame is considered significantly different and included in the output video.

# **Step 5: Read Video Frames**
# We use OpenCV to load all frames from the video:
# - `cv2.VideoCapture` is used to read the video file frame by frame.
# - Each frame is then converted to grayscale using `cv2.cvtColor`, as color is not relevant for histogram similarity in this case.
# - The grayscale frames are appended to a list `frames`, which will later be used for comparison.
# - This step loads all the frames from the video into memory, allowing us to perform frame-by-frame comparison.

# **Step 6: Load Video Frames and Measure Loading Time**
# The frames are read into memory, and the time taken to load the video frames is measured:
# - The `get_video_frames` function is called to load the frames, and we record the elapsed time taken to load all frames from the video.
# - The `loading_time` gives us insight into how long it takes to read the video and prepare the frames for processing.

# **Step 7: Set Up the Video Writer**
# To save the processed frames into a new video, we need to set up the video writer:
# - `cv2.VideoWriter_fourcc` is used to define the codec for the output video. Here, we choose `mp4v` for MP4 file format.
# - We specify the frame rate (`fps`) for the output video, which should ideally match the original video’s frame rate.
# - The resolution of the video is obtained from the first frame, ensuring that all output frames will have the correct size.
# - The video writer object `out` is then initialized, which will be used to write frames to the output video.

# **Step 8: Process Frames and Write to Output Video**
# Now, we begin processing the frames:
# - The first frame is always written to the output video as the baseline.
# - We then start comparing each subsequent frame to the last processed frame using the histogram similarity measure.
# - If the histogram similarity is above the threshold, the frame is skipped, meaning it is very similar to the last frame and does not need to be saved to the output video.
# - If the histogram similarity is below the threshold, the frame is written to the output video, and the last processed frame index is updated.
# - This step reduces the size of the output video by removing redundant frames and only keeping the frames that show significant differences.

# **Step 9: Calculate Processing Time**
# After processing all frames, we calculate the total time taken to compare frames and write the output:
# - `processing_time` is the time taken to compare frames and write non-similar frames to the output video.
# - The total time includes both the time to load the frames and the time to process them.
# - These metrics are helpful for understanding how efficient the frame comparison and video processing tasks are.

# **Step 10: Release Video Writer**
# After processing all the frames, we release the video writer:
# - `out.release()` ensures that the video file is properly closed and saved with the processed frames.
# - Without this step, the processed video may not be written correctly or may not be finalized.

# **Step 11: Memory Profiling**
# Memory usage is tracked before and after processing:
# - We use `psutil` to measure the memory usage of the process before and after the frames have been processed.
# - `memory_info.rss` gives the current memory used by the process in bytes, which is then converted to megabytes (MB).
# - We estimate the final memory consumption by considering the number of skipped frames, as each skipped frame would have been loaded into memory but not written to the output video.
# - Memory profiling is useful for understanding the program's resource consumption and can help in optimizing large-scale video processing tasks.

# **Step 12: Log the Results**
# Finally, we log important metrics about the video processing:
# - `Processed video saved as`: Displays the path to the output processed video.
# - `Total number of frames`: Shows the total number of frames in the original video.
# - `Number of frames skipped`: Displays how many frames were skipped due to high similarity to the previous frame.
# - `Percentage of frames skipped`: The percentage of frames that were not written to the output video.
# - `Frames processing per second`: Indicates how many frames were processed per second, which gives a sense of the processing speed.
# - Time metrics: Including the time taken to load the frames, process them, and the total time.
# - Memory usage: Shows both initial and final memory consumption, as well as the memory consumed during processing.
# - Total processing load: Indicates how many frames were processed in total.



In [14]:
import cv2
import numpy as np
import time
import psutil  # For memory profiling

# Set the video path (update this to your video file path)
video_path = '/content/00067cfb-e535423e.mov'  # Change this to your video file path
output_video_path = '/content/processed_video_histogram.mp4'  # Output video path

# Function to calculate histogram similarity between frames
def calculate_histogram_similarity(frame1, frame2):
    # Convert frames to histograms
    hist1 = cv2.calcHist([frame1], [0], None, [256], [0, 256])
    hist2 = cv2.calcHist([frame2], [0], None, [256], [0, 256])

    # Normalize the histograms
    hist1 = cv2.normalize(hist1, hist1).flatten()
    hist2 = cv2.normalize(hist2, hist2).flatten()

    # Calculate the correlation between the two histograms
    similarity = cv2.compareHist(hist1, hist2, cv2.HISTCMP_CORREL)

    return similarity  # Similarity ranges from 0 (no similarity) to 1 (identical histograms)

# Function to check if frames are similar based on histogram similarity
def is_frame_similar(frame1, frame2, threshold=0.991):
    # Calculate histogram similarity
    histogram_similarity = calculate_histogram_similarity(frame1, frame2)

    # Check if similarity exceeds the threshold
    return histogram_similarity > threshold

# Read video frames
def get_video_frames(video_path):
    cap = cv2.VideoCapture(video_path)
    frames = []
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # Convert to grayscale
        frames.append(gray_frame)
    cap.release()
    return frames

# Load video frames
start_time = time.time()  # Start time for reading frames
frames = get_video_frames(video_path)
loading_time = time.time() - start_time  # Time taken to load frames

# Set up the video writer
fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # Codec for mp4
fps = 30  # Set to your video's frame rate
frame_height, frame_width = frames[0].shape
out = cv2.VideoWriter(output_video_path, fourcc, fps, (frame_width, frame_height))

# Process frames and write to output video
frame_count = len(frames)
skipped_frames = 0
out.write(cv2.cvtColor(frames[0], cv2.COLOR_GRAY2BGR))  # Write the first frame
last_processed_frame_index = 0  # Track the last processed frame index

# Start time for processing
start_processing_time = time.time()

# Start comparing from frame 1 onwards
for i in range(1, frame_count):
    # Compare current frame with the last processed frame using histogram similarity
    if is_frame_similar(frames[last_processed_frame_index], frames[i]):
        skipped_frames += 1
    else:
        out.write(cv2.cvtColor(frames[i], cv2.COLOR_GRAY2BGR))  # Write the current frame if not similar
        last_processed_frame_index = i  # Update the last processed frame index

# Calculate processing time
end_processing_time = time.time()
processing_time = end_processing_time - start_processing_time
total_time = loading_time + processing_time  # Total time includes loading and processing

# Release the video writer
out.release()

# Memory usage before and after processing
process = psutil.Process()
memory_info = process.memory_info()
initial_memory = memory_info.rss / (1024 * 1024)  # Convert bytes to MB
final_memory = (memory_info.rss + (skipped_frames * frames[0].nbytes)) / (1024 * 1024)  # Rough estimate of final memory
memory_consumed = final_memory - initial_memory

# Log the required information
print(f"Processed video saved as: {output_video_path}")
print(f"Total number of frames: {frame_count}")
print(f"Number of frames skipped: {skipped_frames}")
print(f"% of frames skipped: {(skipped_frames / frame_count) * 100:.2f}%")
print(f"Frames processing per second: {frame_count / processing_time:.2f} FPS")
print(f"Total time taken to load frames: {loading_time:.6f} seconds")
print(f"Total processing time: {processing_time:.6f} seconds")
print(f"Total time (loading + processing): {total_time:.6f} seconds")
print(f"Average time per frame comparison: {processing_time / (frame_count - 1):.6f} seconds")
print(f"Total memory usage: Initial = {initial_memory:.2f} MB, Final = {final_memory:.2f} MB")
print(f"Memory consumed during processing: {memory_consumed:.2f} MB")
print(f"Total processing load: {frame_count} frames")


Processed video saved as: /content/processed_video_histogram.mp4
Total number of frames: 1206
Number of frames skipped: 1090
% of frames skipped: 90.38%
Frames processing per second: 411.70 FPS
Total time taken to load frames: 10.101037 seconds
Total processing time: 2.929312 seconds
Total time (loading + processing): 13.030348 seconds
Average time per frame comparison: 0.002431 seconds
Total memory usage: Initial = 2296.19 MB, Final = 3254.20 MB
Memory consumed during processing: 958.01 MB
Total processing load: 1206 frames
