# Video Analysis with Azure OpenAI

This notebook demonstrates how to process video files frame-by-frame and analyze them using Azure OpenAI's GPT-4.1 vision capabilities.

Azure offers a built-in video content understanding solution: [Azure Video Content Understanding](https://learn.microsoft.com/en-us/azure/ai-services/content-understanding/video/overview). 

**However, the Azure service has important technical constraints:**
- **Frame sampling (~1 FPS):** The analyzer inspects about one frame per second. Rapid motions or single-frame events may be missed.
- **Frame resolution (512 × 512 px):** Sampled frames are resized to 512 pixels square. Small text or distant objects can be lost.

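This notebook gives you direct control over the frame extraction rate, the image detail level, and the analysis workflow, so you can work around these limits when needed. The defaults used below (1 FPS and `low` detail) favor cost; raise `target_fps` or switch to `high` detail to capture fast events or fine detail.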

## Overview
- Extract frames from a video file
- Convert frames to base64 format for API consumption
- Send frames to Azure OpenAI for analysis
- Calculate the cost of processing

## Prerequisites
- Azure OpenAI resource with GPT-4.1 model deployed
- Video file to analyze (e.g., `dash-cam-sample.mov`)
- Required Python packages: `opencv-python`, `openai`, `azure-identity`

## Import Required Libraries

Import the necessary libraries for video processing and Azure OpenAI integration.

In [None]:
import base64
import os

import cv2
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

## Extract Frames from Video

**Note:** Make sure your video file is in the same directory as this notebook, or update the path accordingly.

In [None]:
# Set the desired frame extraction rate (frames per second)
target_fps = 1

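video = cv2.VideoCapture("dash-cam-sample.mov")
if not video.isOpened():
    raise RuntimeError("Could not open dash-cam-sample.mov; check the path and codec support.")

# Get video properties
fps = video.get(cv2.CAP_PROP_FPS)
print(f"Video FPS: {fps}")

# Keep every nth frame; the skip is rounded down, so the effective rate can slightly exceed target_fps
frame_skip = max(1, int(fps // target_fps))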

base64Frames = []
frame_count = 0

while video.isOpened():
    success, frame = video.read()
    if not success:
        break
    
    # Only process every nth frame where n = frame_skip
    if frame_count % frame_skip == 0:
        _, buffer = cv2.imencode(".jpg", frame)
        base64Frames.append(base64.b64encode(buffer).decode("utf-8"))
    
    frame_count += 1

video.release()
print(f"{len(base64Frames)} frames extracted (target: {target_fps} per second).")
print(f"Total frames processed: {frame_count}")
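The two settings this section is about, sampling rate and (optionally) resolution, can be reasoned about without opening a video. The sketch below is illustrative only: `frame_skip_for`, `effective_fps`, and `scaled_size` are hypothetical helper names that are not part of OpenCV or of this notebook, and `max_side` is an assumed parameter. It mirrors the `frame_skip` formula used above and computes a downscaled size that preserves the aspect ratio.

```python
def frame_skip_for(fps: float, target_fps: float) -> int:
    """Same formula as the extraction cell: keep every nth frame."""
    return max(1, int(fps // target_fps))


def effective_fps(fps: float, target_fps: float) -> float:
    """Rate actually achieved once the skip has been rounded down to an integer."""
    return fps / frame_skip_for(fps, target_fps)


def scaled_size(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Shrink (width, height) so the longer side is at most max_side.

    Aspect ratio is preserved; frames that are already small enough are unchanged.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


print(frame_skip_for(30.0, 1))            # 30
print(round(effective_fps(29.97, 2), 2))  # 2.14
print(scaled_size(1920, 1080, 1024))      # (1024, 576)
```

If you choose to downscale, the returned size could be passed to `cv2.resize(frame, (w, h), interpolation=cv2.INTER_AREA)` before `cv2.imencode`; OpenCV expects the size in `(width, height)` order.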

## Analyze Video with Azure OpenAI

Send the extracted frames to Azure OpenAI's GPT-4.1 model for analysis. The model will describe what happens in the video based on the frame sequence.

### Configuration
- **Model**: GPT-4.1 with vision capabilities
- **Authentication**: Azure AD (Entra ID) using DefaultAzureCredential
- **Image Detail**: Set to 'low' for cost efficiency

### Environment Variables Required
- `AZURE_OPENAI_ENDPOINT`: Your Azure OpenAI resource endpoint
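
Because `os.getenv` returns `None` when a variable is unset, a missing endpoint may only surface later as a confusing client error. A minimal guard might look like the following; `require_env` is a hypothetical helper name, not part of any library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set.")
    return value


try:
    endpoint = require_env("AZURE_OPENAI_ENDPOINT")
    print(f"Using endpoint: {endpoint}")
except RuntimeError as err:
    print(err)
```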

In [None]:
endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
deployment = "gpt-4.1"
detail = "low"

# Initialize Azure OpenAI client with Entra ID authentication
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint=endpoint,
    azure_ad_token_provider=token_provider,
    api_version="2025-01-01-preview",
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "These are frames from a video. Please describe what happens in the video."
            },
            *[
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{frame}",
                        "detail": detail
                    }
                }
                for frame in base64Frames
            ]
        ]
    }
]

completion = client.chat.completions.create(
    model=deployment,
    messages=messages,
    temperature=0.5
)

print(completion.choices[0].message.content)
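
The request body above can be wrapped in a small helper so that the prompt, detail level, and frame subset are easy to vary. The following is a sketch: `build_messages` and `chunk` are hypothetical names, and the batch size of 50 is an arbitrary illustrative value rather than a documented service limit.

```python
from typing import Iterator


def build_messages(frames: list[str], prompt: str, detail: str = "low") -> list[dict]:
    """Build one user message: a text part followed by one image part per frame."""
    image_parts = [
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{frame}", "detail": detail},
        }
        for frame in frames
    ]
    return [{"role": "user", "content": [{"type": "text", "text": prompt}, *image_parts]}]


def chunk(frames: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` frames, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(frames), size):
        yield frames[start:start + size]


# Placeholder strings stand in for real base64-encoded JPEG frames.
fake_frames = [f"frame{i}" for i in range(120)]
batches = [build_messages(batch, "Describe these frames.") for batch in chunk(fake_frames, 50)]
print([len(m[0]["content"]) - 1 for m in batches])  # [50, 50, 20]
```

Each list in `batches` could be passed as `messages=` to `client.chat.completions.create`; how the per-batch descriptions are combined into one summary is left to the caller.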

## Cost Analysis

Calculate the estimated cost of processing this video based on Azure OpenAI pricing.

### Current Pricing (as of June 2025)
- **Input tokens**: $2.00 per million tokens
- **Output tokens**: $8.00 per million tokens

**Reference**: [Azure OpenAI Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/)

### Cost Factors
- Number of frames sent
- Image detail level (low/high)
- Length of text prompt
- Length of generated response
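
Before sending a long video, a rough input-cost estimate can be made from the frame count alone. This is a sketch under stated assumptions: `tokens_per_frame=85` is an assumed flat cost for a low-detail image (check the current vision token rules for your model), `prompt_tokens=20` is a guess at the size of the text prompt, and the $2.00 per million price is the figure listed above. The authoritative numbers remain the `usage` fields returned by the API.

```python
def estimate_input_cost(
    num_frames: int,
    tokens_per_frame: int = 85,       # ASSUMPTION: flat low-detail image cost
    prompt_tokens: int = 20,          # ASSUMPTION: rough size of the text prompt
    price_per_million: float = 2.00,  # input price listed above (USD)
) -> tuple[int, float]:
    """Return (estimated input tokens, estimated input cost in USD)."""
    tokens = num_frames * tokens_per_frame + prompt_tokens
    return tokens, tokens / 1_000_000 * price_per_million


tokens, cost = estimate_input_cost(num_frames=60)
print(f"~{tokens:,} input tokens, ~${cost:.4f}")  # ~5,120 input tokens, ~$0.0102
```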

In [None]:
input_tokens = completion.usage.prompt_tokens
output_tokens = completion.usage.completion_tokens

input_cost = input_tokens / 1_000_000 * 2  # $2 per million input tokens
output_cost = output_tokens / 1_000_000 * 8  # $8 per million output tokens
total_cost = input_cost + output_cost

print("Token Usage:")
print(f"  Input tokens: {input_tokens:,}")
print(f"  Output tokens: {output_tokens:,}")
print(f"  Total tokens: {input_tokens + output_tokens:,}")
print()
print("Cost Breakdown:")
print(f"  Input cost: ${input_cost:.6f}")
print(f"  Output cost: ${output_cost:.6f}")
print(f"  Total cost: ${total_cost:.6f}")
print()
# Guard against division by zero when no frames were extracted
if base64Frames:
    print(f"Cost per frame: ${total_cost / len(base64Frames):.6f}")

### Use Cases
- **Security**: Analyze surveillance footage
- **Insurance**: Process dash cam footage for claims
- **Content Moderation**: Review video content
- **Research**: Analyze behavioral patterns in video data

### Performance Tips
- Use `"detail": "low"` for faster, cheaper processing
- Use `"detail": "high"` if you need more precise analysis (e.g., for fine-grained object detection or scene understanding), but note this increases cost and latency
- Adjust the frame sampling rate to the video content, and experiment with different rates: higher for fast-moving scenes, lower for mostly static footage
- Customize prompts for specific use cases (e.g., object detection, activity recognition)
- Use structured outputs if integrating into software/pipelines
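
For the structured-output tip, Chat Completions accepts a `response_format` of type `json_schema`. The sketch below only builds that argument and parses a sample reply. The schema fields (`summary`, `events`, `timestamp_s`, `description`) and the name `video_analysis` are invented for illustration, and whether your deployment and API version support structured outputs is something to verify.

```python
import json

# Hypothetical schema for a video description; adapt the fields to your use case.
video_schema = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "events": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "timestamp_s": {"type": "number"},
                    "description": {"type": "string"},
                },
                "required": ["timestamp_s", "description"],
                "additionalProperties": False,
            },
        },
    },
    "required": ["summary", "events"],
    "additionalProperties": False,
}

response_format = {
    "type": "json_schema",
    "json_schema": {"name": "video_analysis", "strict": True, "schema": video_schema},
}

# A made-up reply, standing in for completion.choices[0].message.content.
sample_reply = '{"summary": "A car changes lanes.", "events": [{"timestamp_s": 3, "description": "Lane change"}]}'
parsed = json.loads(sample_reply)
print(parsed["events"][0]["description"])  # Lane change
```

The dictionary would be passed as `response_format=response_format` in the `client.chat.completions.create(...)` call, after which the reply content can be parsed with `json.loads` as shown.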