
# Video Indexing, Embedding, and Metadata Generation

This guide walks through a pipeline for processing video data. You'll learn how to:

1. Generate initial video metadata.
2. Extract key frames from videos.
3. Calculate embeddings for these frames using ResNet50 and GPT-3.
4. Generate additional metadata using a Language Model.
5. Write the metadata to a JSON file, Supabase, and Weaviate.

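The pipeline is designed to be idempotent, meaning you can run it multiple times without creating duplicate data. Several steps are shown as placeholders that you will need to replace with your own implementation. So let's get started!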



## Video Indexing

The first step in our pipeline is to generate initial metadata for our videos. This includes basic information like filename, type, source path, frame rate, original resolution, and whether the video has audio. The metadata serves as the foundation for all subsequent steps.


In [6]:
from pathlib import Path

# Create a Path object for the mounted NAS directory
nas_path = Path("S:\\")

nas_path

S:\


In [7]:
#!/usr/local/bin/python3

import cv2
import os
import json
from pydantic import BaseModel

current_directory = os.getcwd()

# Define Pydantic model for video metadata
class VideoMetadata(BaseModel):
    filename: str
    type: str
    src_path: str
    frame_rate: float  # FPS is often fractional (e.g. 29.97), so it is kept as a float
    original_resolution: str
    has_audio: bool

# Function to index all video files in a directory and generate metadata
def index_videos(root_directory):
    video_files = []
    video_metadata_list = []

    # Traverse root directory to find all video files
    for root, dirs, files in os.walk(root_directory):
        for file in files:
            if file.lower().endswith(('.mp4', '.mkv', '.flv', '.avi')):
                video_files.append(os.path.join(root, file))

    # Generate metadata for each video file
    for video_file in video_files:
        cap = cv2.VideoCapture(video_file)
        if not cap.isOpened():
            # Skip files that OpenCV cannot open (e.g. corrupt or unsupported)
            cap.release()
            continue
        frame_rate = float(cap.get(cv2.CAP_PROP_FPS))
        width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        cap.release()  # Release the capture once the properties are read
        resolution = f'{width}x{height}'
        has_audio = True  # This is a placeholder. Actual check for audio stream needed.

        video_metadata = VideoMetadata(
            filename=os.path.basename(video_file),
            type=os.path.splitext(video_file)[1][1:],
            src_path=video_file,
            frame_rate=frame_rate,
            original_resolution=resolution,
            has_audio=has_audio
        )

        video_metadata_list.append(video_metadata.dict())

    return video_metadata_list

# Sample usage
root_directory = 'S:\\'  # Drive root; a raw string with a doubled backslash would put '\\\\' into every src_path
# root_directory = '/volume1'  # Replace with your actual root directory

# To source the USER's HOME Directory on Linux and Windows
# home_directory = os.getenv("HOME") or os.getenv("USERPROFILE")

video_metadata_list = index_videos(root_directory)

# Convert metadata list to JSON and save
json_file_path = os.path.join(current_directory, 'indexed_videos.json')
# json_file_path = '$PWD/indexed_videos.json'  # Replace with your actual JSON file path

with open(json_file_path, 'w') as f:
    json.dump(video_metadata_list, f, indent=4)
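
The `has_audio` placeholder above could be filled in by probing the file for audio streams. The following is a minimal sketch, assuming FFmpeg's `ffprobe` command-line tool is installed and on `PATH`. The helper name `has_audio_stream` and its `run` parameter are hypothetical and not part of the original pipeline.

```python
import subprocess

def has_audio_stream(video_path, run=subprocess.run):
    """Return True if ffprobe reports at least one audio stream.

    Assumption: the `ffprobe` binary (shipped with FFmpeg) is on PATH.
    `run` is injectable so the parsing can be exercised without ffprobe.
    """
    cmd = [
        "ffprobe",
        "-v", "error",                        # suppress banner and warnings
        "-select_streams", "a",               # only consider audio streams
        "-show_entries", "stream=codec_type",
        "-of", "csv=p=0",                     # one bare value per stream
        str(video_path),
    ]
    result = run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # ffprobe could not read the file; report no audio detected
        return False
    # Each audio stream yields a line starting with "audio"; no lines means no audio
    return any(line.strip().startswith("audio") for line in result.stdout.splitlines())
```

Inside `index_videos`, the placeholder assignment could then become `has_audio = has_audio_stream(video_file)`. Treating a failed probe as "no audio" is a design choice for this sketch; raising an error may be preferable if unreadable files should be surfaced.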



## Frame Extraction

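Once we have the basic metadata, we move on to frame extraction. The goal is to use key-frame selection to minimize the number of frames needed for analysis; the sample code in this section simply keeps every 10th frame as a stand-in for real key-frame detection. The extracted frames are stored as NumPy arrays (`.npy` files) in a directory for easy retrieval and further processing.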


In [None]:
#!/usr/local/bin/python3

import cv2
import numpy as np
import os

# Function to extract key frames from video
def extract_key_frames(video_path, save_path):
    cap = cv2.VideoCapture(video_path)
    frames = []
    
    while cap.isOpened():
        ret, frame = cap.read()
        
        if not ret:
            break
            
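        # Your key frame extraction logic here
        # For demonstration, we'll save every 10th frame

        # After read(), CAP_PROP_POS_FRAMES is the 0-based index of the next frame,
        # which equals the 1-based number of the frame that was just read
        frame_id = int(cap.get(cv2.CAP_PROP_POS_FRAMES))
        if frame_id % 10 == 0: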
            frame_file = os.path.join(save_path, f'frame_{frame_id}.npy')
            np.save(frame_file, frame)
            frames.append({'frame_id': frame_id, 'frame_path': frame_file})
    
    cap.release()
    return frames

# Sample usage
video_path = '/path/to/sample_video.mp4'
frame_save_path = '/path/to/save/frames'

# Create directory to save frames if it doesn't exist
os.makedirs(frame_save_path, exist_ok=True)

key_frames = extract_key_frames(video_path, frame_save_path)
print(key_frames)
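
The fixed "every 10th frame" rule above could be swapped for a content-based rule. One common heuristic, shown here only as a sketch and not as the method this guide prescribes, keeps a frame when its mean absolute pixel difference from the last kept frame exceeds a threshold. The function name `select_key_frames` and the default threshold of 30.0 are assumptions chosen for illustration.

```python
import numpy as np

def select_key_frames(frames, threshold=30.0):
    """Return indices of frames that differ enough from the last kept frame."""
    key_indices = []
    last_kept = None
    for index, frame in enumerate(frames):
        # Use a signed, wider dtype so uint8 subtraction cannot wrap around
        current = np.asarray(frame, dtype=np.int16)
        if last_kept is None:
            # Always keep the first frame as the initial reference
            key_indices.append(index)
            last_kept = current
            continue
        mean_abs_diff = float(np.mean(np.abs(current - last_kept)))
        if mean_abs_diff > threshold:
            key_indices.append(index)
            last_kept = current
    return key_indices

# Small synthetic example: two dark frames, two bright frames, one dark frame
dark = np.zeros((4, 4, 3), dtype=np.uint8)
bright = np.full((4, 4, 3), 200, dtype=np.uint8)
print(select_key_frames([dark, dark, bright, bright, dark]))  # [0, 2, 4]
```

Comparing against the last kept frame, rather than the immediately preceding one, means slow gradual changes still eventually produce a new key frame. The right threshold depends on the footage and would need tuning.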



## Embedding Calculation

After extracting the key frames, the next step is to calculate their embeddings. We'll be using both ResNet50 and GPT-3 to generate these embeddings. These embeddings will serve as a compact representation of the frames and will be used for various downstream tasks like search, categorization, and analysis.


In [None]:

# Placeholder code for Embedding Calculation
# Here you would add the code to calculate ResNet50 and GPT-3 embeddings for the extracted frames
# For demonstration, let's assume we have a function `calculate_embeddings` that takes a frame path and returns the embeddings

def calculate_embeddings(frame_path):
    # Your actual embedding calculation logic here
    return 'sample_embedding'

# Calculate embeddings for key frames
frame_embeddings = {frame['frame_id']: calculate_embeddings(frame['frame_path']) for frame in key_frames}
print(frame_embeddings)
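
Search is one of the downstream uses mentioned for the embeddings. A minimal sketch of how that could work, assuming `calculate_embeddings` is replaced with something that returns numeric vectors (the placeholder above returns a string): rank frames by cosine similarity to a query vector. The names `cosine_similarity` and `rank_frames` are hypothetical helpers.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything
        return 0.0
    return dot / (norm_a * norm_b)

def rank_frames(query, embeddings_by_frame):
    """Return (frame_id, similarity) pairs, most similar first."""
    scored = [
        (frame_id, cosine_similarity(query, vector))
        for frame_id, vector in embeddings_by_frame.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy 2-dimensional embeddings keyed by frame_id
toy_embeddings = {10: [1.0, 0.0], 20: [0.0, 1.0], 30: [1.0, 1.0]}
print([frame_id for frame_id, _ in rank_frames([1.0, 0.0], toy_embeddings)])  # [10, 30, 20]
```

For large collections, a vector database would take over this ranking; the pure-Python version is only meant to show what the embeddings are used for.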



## LLM Data Generation

Once we have the embeddings, we can enrich our video metadata using a language model. We'll be using GPT-3 for this task. The model generates a summary, description, keywords, category, and creator, which are added to our metadata. The placeholder in this section produces these fields once per key frame.


In [None]:

# Placeholder code for LLM Data Generation
# Here you would add the code to generate additional metadata like Summary, Description, Keywords, etc., using a Language Model
# For demonstration, let's assume we have a function `generate_llm_data` that takes an embedding and returns the additional metadata

def generate_llm_data(embedding):
    # Your actual LLM data generation logic here
    return {
        'Summary': 'sample_summary',
        'Description': 'sample_description',
        'Keywords': ['sample_keyword1', 'sample_keyword2'],
        'Category': 'sample_category',
        'Creator': 'sample_creator'
    }

# Generate LLM data for key frames
llm_data = {frame['frame_id']: generate_llm_data(frame_embeddings[frame['frame_id']]) for frame in key_frames}
print(llm_data)
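
Because `llm_data` is keyed by frame, some aggregation is needed to arrive at video-level fields. As one possible approach, and only a sketch, keywords could be merged by counting how many frames mention each one. The helper name `aggregate_keywords` and the `top_n` parameter are assumptions for illustration.

```python
from collections import Counter

def aggregate_keywords(llm_data, top_n=5):
    """Merge per-frame 'Keywords' lists into one video-level list."""
    counts = Counter()
    for frame_metadata in llm_data.values():
        # Count each keyword at most once per frame
        counts.update(set(frame_metadata.get('Keywords', [])))
    # Sort by descending frame count, then alphabetically for a stable order
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [keyword for keyword, _ in ordered[:top_n]]

# Toy per-frame data in the same shape as generate_llm_data's output
toy_llm_data = {
    10: {'Keywords': ['cat', 'dog']},
    20: {'Keywords': ['cat', 'tree']},
    30: {'Keywords': ['cat', 'dog']},
}
print(aggregate_keywords(toy_llm_data, top_n=2))  # ['cat', 'dog']
```

Free-text fields such as the summary do not aggregate this simply; they would more likely come from a separate model call over the per-frame results.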



## Updating JSON File

All the metadata, frame paths, and embeddings will be consolidated and stored in a JSON file. We'll be using a nested JSON structure to keep the data organized. The JSON file will be updated in an idempotent manner, ensuring that running the code multiple times does not create duplicate entries.


In [None]:
#!/usr/local/bin/python3

import json
import os

# Function to update JSON file in an idempotent manner
def update_json_file(json_file_path, new_data):
    # Read existing data from JSON file
    if os.path.exists(json_file_path):
        with open(json_file_path, 'r') as f:
            existing_data = json.load(f)
    else:
        existing_data = {}
    
    # Update existing data with new data
    for video_id, metadata in new_data.items():
        if video_id not in existing_data:
            existing_data[video_id] = {}
        existing_data[video_id].update(metadata)
    
    # Write updated data back to JSON file
    with open(json_file_path, 'w') as f:
        json.dump(existing_data, f, indent=4)

# Sample usage
new_data = {
    "some_unique_video_id": {
        "metadata": video_metadata_list[0],  # video_metadata is local to index_videos; pick the entry for this video
        "key_frames": key_frames,
        "frame_embeddings": frame_embeddings,
        "llm_data": llm_data
    }
}

# Update JSON file with new data
json_file_path = '/path/to/indexed_videos.json'  # Replace with your actual JSON file path
update_json_file(json_file_path, new_data)
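
The update above is only idempotent if the same video always maps to the same key. A minimal sketch of one way to get such a key, assuming the source path uniquely identifies a video: hash the normalized path. The helper name `make_video_id` and the 16-character truncation are hypothetical choices.

```python
import hashlib

def make_video_id(src_path):
    """Derive a stable ID from a video's source path."""
    # Normalize separators so 'S:\\a\\b.mp4' and 'S:/a/b.mp4' give the same ID
    normalized = str(src_path).replace('\\', '/')
    digest = hashlib.sha256(normalized.encode('utf-8')).hexdigest()
    # Truncated for readability; keep the full digest if collisions are a concern
    return digest[:16]

# The same path always yields the same ID, regardless of separator style
print(make_video_id('S:\\videos\\clip.mp4') == make_video_id('S:/videos/clip.mp4'))  # True
```

A path-based ID changes when a file is moved or renamed. Hashing the file contents instead would survive moves, at the cost of reading every file.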



## Updating Supabase and Weaviate

Finally, all the generated and collected data is written to Supabase and Weaviate. This allows us to efficiently query and retrieve video data for various applications.


In [None]:

# Placeholder code for Database Updating
# Here you would add the code to update Supabase and Weaviate databases with the new data
# You can use the Python SDKs for both Supabase and Weaviate to accomplish this

# Sample Supabase code (placeholder)
# supabase = create_client(SUPABASE_URL, SUPABASE_API_KEY)
# response = supabase.table('videos').upsert(new_data).execute()

# Sample Weaviate code (placeholder)
# client = weaviate.Client(WEAVIATE_URL)
# client.batch.create(new_data)

# Note: Make sure to replace the placeholder code with your actual implementation for database updates
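
Table-oriented stores generally expect flat rows with a unique key rather than the nested `new_data` dictionary, and upserting on that key is what keeps repeated runs from creating duplicates. The sketch below only reshapes the data and makes no database calls. The function name `to_rows` and the column names `video_id` and `key_frame_count` are assumptions, not a schema defined by this guide.

```python
def to_rows(new_data):
    """Flatten {video_id: record} into a list of row dictionaries."""
    rows = []
    for video_id, record in new_data.items():
        # The dictionary key becomes the row's unique identifier
        row = {'video_id': video_id}
        # Copy the basic metadata fields (filename, type, src_path, ...) as columns
        row.update(record.get('metadata', {}))
        # Store a simple count instead of the full key-frame list
        row['key_frame_count'] = len(record.get('key_frames', []))
        rows.append(row)
    return rows

# Toy record in the same shape as new_data
toy_data = {
    'abc123': {
        'metadata': {'filename': 'clip.mp4', 'type': 'mp4'},
        'key_frames': [{'frame_id': 10}, {'frame_id': 20}],
    }
}
print(to_rows(toy_data))
```

The resulting list could then be passed to whichever upsert call your database client provides, with `video_id` as the conflict key.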
