A Python script that processes smart-glasses videos using GCP Vertex AI to enable natural language queries about video content.
Live demonstration of Video-Analyzer processing smart-glasses footage with AI-powered analysis
- Video Segmentation: Uses Google Cloud Video Intelligence API to segment videos into shots
- Multimodal Embeddings: Embeds video segments using Vertex AI's multimodalembedding@001 model
- Vector Search: Stores embeddings in Vertex AI Vector Search for efficient retrieval
- Natural Language Queries: Query videos using natural language (e.g., "who did I meet at the gym?")
- AI Analysis: Uses Gemini 2.5 Pro to analyze retrieved video segments and extract structured insights (the model is configurable via the optional GEMINI_MODEL variable)
- Install dependencies:
  pip install -r requirements.txt
- Configure environment:
  cp env.example .env
  # Edit .env with your actual credentials
  # Make sure .env is in .gitignore!
- Required environment variables:
  GOOGLE_CLOUD_PROJECT=your-project-id
  GOOGLE_APPLICATION_CREDENTIALS=./path-to-service-account.json
  GCS_BUCKET_NAME=your-bucket-name
  GEMINI_API_KEY=your-gemini-api-key
  VECTOR_SEARCH_INDEX_ENDPOINT_ID=your-endpoint-id   # Optional
  VECTOR_SEARCH_DEPLOYED_INDEX_ID=your-index-name    # Optional
  GCP_REGION=us-central1                             # Optional
  GEMINI_MODEL=gemini-2.5-flash                      # Optional
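As an illustration of how these variables might be consumed, here is a minimal sketch that validates the required ones and applies defaults to the optional ones. It assumes .env has already been loaded into the process environment (for example with python-dotenv's load_dotenv()). The Settings class and load_settings function are hypothetical, not the project's actual src/config/settings.py, and the defaults simply mirror the example values above.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping, Optional


# Hypothetical settings container; field names are illustrative only.
@dataclass(frozen=True)
class Settings:
    project_id: str
    credentials_path: str
    bucket_name: str
    gemini_api_key: str
    region: str = "us-central1"
    gemini_model: str = "gemini-2.5-flash"
    index_endpoint_id: Optional[str] = None
    deployed_index_id: Optional[str] = None


# Settings field -> environment variable that must be present.
REQUIRED = {
    "project_id": "GOOGLE_CLOUD_PROJECT",
    "credentials_path": "GOOGLE_APPLICATION_CREDENTIALS",
    "bucket_name": "GCS_BUCKET_NAME",
    "gemini_api_key": "GEMINI_API_KEY",
}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Hypothetical loader: fail fast on missing required vars, default the optional ones."""
    missing = [var for var in REQUIRED.values() if not env.get(var)]
    if missing:
        raise ValueError("missing required environment variables: " + ", ".join(missing))
    return Settings(
        **{field: env[var] for field, var in REQUIRED.items()},
        region=env.get("GCP_REGION", "us-central1"),
        gemini_model=env.get("GEMINI_MODEL", "gemini-2.5-flash"),
        # Empty strings are treated the same as "not set".
        index_endpoint_id=env.get("VECTOR_SEARCH_INDEX_ENDPOINT_ID") or None,
        deployed_index_id=env.get("VECTOR_SEARCH_DEPLOYED_INDEX_ID") or None,
    )
```

Failing at startup with the full list of missing variables is usually friendlier than a credentials error deep inside the first API call.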
GCP Setup Requirements:
- Go to Google Cloud Console
- Click "Select a project" → "New Project"
- Name your project (e.g., video-reasoning-project)
- Note the Project ID (not the name); this is your GOOGLE_CLOUD_PROJECT
- Go to "APIs & Services" → "Library"
- Enable these APIs:
- Vertex AI API
- Cloud Storage API
- Cloud Video Intelligence API
- Go to "IAM & Admin" → "Service Accounts"
- Click "Create Service Account"
- Name: video-reasoning-sa
- Grant these roles:
- Storage Admin (for GCS access)
- Vertex AI User (for AI models)
- Service Usage Consumer (for API access)
- Create a key:
- Click the service account → "Keys" → "Add Key" → "Create new key" → "JSON"
- Download the JSON file
- Place it in your project directory
- This file path is your GOOGLE_APPLICATION_CREDENTIALS
- Go to "Cloud Storage" → "Buckets"
- Click "Create Bucket"
- Name: your-unique-bucket-name (must be globally unique)
- Region: us-central1 (or your preferred region)
- This bucket name is your GCS_BUCKET_NAME
- Go to Google AI Studio
- Click "Create API Key"
- Copy the API key
- This is your GEMINI_API_KEY
- Go to "Billing" in GCP Console
- Enable billing for your project
- Note: GCP requires billing to be enabled for most AI services
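Before running the pipeline, it can help to sanity-check the downloaded key against the project you created. This is a minimal sketch assuming the standard service-account JSON key layout (type, project_id, private_key and client_email fields); check_service_account_key is a hypothetical helper, not part of this project.

```python
from __future__ import annotations

import json
from pathlib import Path

# Fields that a Google Cloud service-account JSON key normally contains.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path: str, expected_project: str) -> list[str]:
    """Hypothetical helper: return problems found in the key file (empty list = looks usable)."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"key file not found: {path}"]
    try:
        data = json.loads(key_path.read_text())
    except json.JSONDecodeError as exc:
        return [f"key file is not valid JSON: {exc}"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    if data.get("type") != "service_account":
        problems.append("'type' is not 'service_account'")
    if data.get("project_id") != expected_project:
        problems.append(f"key belongs to project {data.get('project_id')!r}, expected {expected_project!r}")
    # Service-account emails have the form NAME@PROJECT_ID.iam.gserviceaccount.com.
    email = data.get("client_email", "")
    if not email.endswith(f"@{expected_project}.iam.gserviceaccount.com"):
        problems.append(f"unexpected client_email: {email!r}")
    return problems


if __name__ == "__main__":
    import os

    issues = check_service_account_key(
        os.environ.get("GOOGLE_APPLICATION_CREDENTIALS", ""),
        os.environ.get("GOOGLE_CLOUD_PROJECT", ""),
    )
    print("OK" if not issues else "\n".join(issues))
```

A mismatched project_id is a common cause of confusing permission errors, since the key then authenticates against a different project than GOOGLE_CLOUD_PROJECT.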
If you have the gcloud CLI installed, you can do the same setup from the command line:
# Set project
gcloud config set project YOUR_PROJECT_ID
# Enable APIs
gcloud services enable aiplatform.googleapis.com
gcloud services enable storage.googleapis.com
gcloud services enable videointelligence.googleapis.com
# Create service account
gcloud iam service-accounts create video-reasoning-sa \
  --description="Video Reasoning Service Account" \
  --display-name="Video Reasoning SA"
# Grant permissions (Storage Admin, Vertex AI User, Service Usage Consumer)
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:video-reasoning-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/storage.admin"
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:video-reasoning-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:video-reasoning-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/serviceusage.serviceUsageConsumer"
# Create bucket
gcloud storage buckets create gs://your-unique-bucket-name --location=us-central1
# Create key
gcloud iam service-accounts keys create service-account-key.json \
  --iam-account=video-reasoning-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com
Project structure:
Video Reasoning/
├── src/
│   ├── config/
│   │   ├── __init__.py
│   │   └── settings.py        # Configuration management
│   ├── services/
│   │   ├── __init__.py
│   │   ├── storage.py         # GCS operations
│   │   ├── segmentation.py    # Video Intelligence API
│   │   ├── embeddings.py      # Multimodal embeddings
│   │   ├── vector_search.py   # Vector Search
│   │   └── analysis.py        # Gemini analysis
│   ├── utils/
│   │   ├── __init__.py
│   │   └── formatter.py       # Output formatting
│   └── pipeline.py            # Main pipeline orchestrator
├── app.py                     # Gradio web UI
├── main.py                    # CLI entry point
├── requirements.txt
├── .env                       # Your credentials (not in git)
└── README.md
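The layout suggests a pipeline that chains one service module per stage. Below is a minimal sketch of such an orchestrator in which every stage is passed in as a plain function, so each can be backed by a real GCP client or stubbed out in tests. The Pipeline and Segment names and the stage signatures are assumptions for illustration, not the contents of src/pipeline.py.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Segment:
    """Hypothetical record for one shot: boundaries in seconds plus its embedding."""
    start: float
    end: float
    embedding: Optional[list[float]] = None


@dataclass
class Pipeline:
    # Hypothetical stage signatures, one per module in src/services/.
    upload: Callable[[str], str]                      # local path -> gs:// URI (storage.py)
    segment: Callable[[str], list[Segment]]           # video URI -> shots (segmentation.py)
    embed_video: Callable[[str, Segment], list[float]]  # (URI, shot) -> vector (embeddings.py)
    index: Callable[[list[Segment]], None]            # store vectors (vector_search.py)
    search: Callable[[str, int], list[Segment]]       # (query, top_k) -> shots (vector_search.py)
    analyze: Callable[[str, str, Segment], dict]      # (URI, query, shot) -> insights (analysis.py)

    def run(self, video_path: str, query: str, top_k: int = 10) -> list[dict]:
        # GCS URIs are used directly; local files are uploaded first.
        uri = video_path if video_path.startswith("gs://") else self.upload(video_path)
        segments = self.segment(uri)
        for seg in segments:
            seg.embedding = self.embed_video(uri, seg)
        self.index(segments)
        hits = self.search(query, top_k)
        return [self.analyze(uri, query, seg) for seg in hits]
```

Keeping the stages injectable means the orchestration logic can be tested without credentials, while the service modules own all GCP-specific code.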
The easiest way to use Video-Analyzer is through the web interface:
# Install Gradio (compatible with Python 3.9.7+)
pip install gradio
# Run the web app
python app.py
Then open your browser to http://localhost:7860 and:
- Upload a video file
- Ask a question about the video
- Click "Analyze Video" to get AI-powered insights
For programmatic use or automation:
# Simple usage (reads from .env)
python main.py \
--video-path ./video.mp4 \
--query "what did I promise?"
# Or override with command line args
python main.py \
--video-path ./video.mp4 \
--query "what did I promise?" \
--project-id your-project-id \
--region us-central1
Command-line arguments:
- --video-path: Local video file path or GCS URI (gs://bucket/video.mp4)
- --query: Natural language question about the video content
- --project-id: GCP project ID (overrides the GOOGLE_CLOUD_PROJECT env var)
- --region: GCP region (overrides the GCP_REGION env var)
- --top-k: Number of segments to retrieve and analyze (default: 10)
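The override rules above (command-line value first, then the environment) can be expressed with the standard argparse module. This is a minimal sketch that mirrors the documented flags; build_parser and resolve_args are hypothetical, the real main.py may parse its arguments differently, and falling back to us-central1 when GCP_REGION is unset is an assumption.

```python
from __future__ import annotations

import argparse
import os
from typing import Mapping, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented for main.py."""
    parser = argparse.ArgumentParser(description="Ask natural language questions about a video.")
    parser.add_argument("--video-path", required=True, help="Local file or gs://bucket/video.mp4")
    parser.add_argument("--query", required=True, help="Natural language question about the video")
    parser.add_argument("--project-id", default=None, help="Overrides GOOGLE_CLOUD_PROJECT")
    parser.add_argument("--region", default=None, help="Overrides GCP_REGION")
    parser.add_argument("--top-k", type=int, default=10, help="Segments to retrieve and analyze")
    return parser


def resolve_args(argv: Optional[list[str]] = None,
                 env: Mapping[str, str] = os.environ) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # Command-line values win; otherwise fall back to the environment (.env).
    args.project_id = args.project_id or env.get("GOOGLE_CLOUD_PROJECT")
    # Assumed default region when neither --region nor GCP_REGION is given.
    args.region = args.region or env.get("GCP_REGION", "us-central1")
    return args
```

argparse turns --video-path and --top-k into the attributes video_path and top_k, so the rest of the script can read them as ordinary Python names.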
The script outputs:
- Colorized Console Summary: Top 3 analyzed segments with key insights
- Full JSON Results: Complete analysis for all retrieved segments
Example entry from the JSON results:
{
"clip_start": 45.2,
"clip_end": 78.9,
"summary": "Meeting with John at the gym entrance",
"promises": ["Call him tomorrow about the project"],
"body_language": "Confident handshake, direct eye contact",
"confidence_score": 0.85,
"actions": ["Handshake", "Pointing at equipment"]
}
Approximate costs (rough estimates based on GCP list pricing; rates change, so check the current pricing pages before budgeting):
- Video Intelligence API: ~$0.10-0.20 per minute of video
- Multimodal Embeddings: ~$0.0002 per embedding (1408 dimensions)
- Vector Search: ~$0.10 per 1000 queries + storage costs
- Gemini 2.5 Pro: ~$0.001-0.002 per query
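To budget a run, the per-unit figures above can be combined into a quick estimate. This is a minimal sketch using the upper ends of the listed ranges; it assumes one embedding per video segment plus one per text query, and one Gemini charge per query as the list implies. Storage costs are excluded. estimate_cost is a hypothetical helper, and the rates are only as accurate as the approximations above.

```python
from __future__ import annotations


def estimate_cost(
    video_minutes: float,
    segments: int,
    queries: int,
    vi_per_minute: float = 0.20,      # Video Intelligence, upper end of ~$0.10-0.20/min
    per_embedding: float = 0.0002,    # multimodal embedding, per call
    vs_per_1000: float = 0.10,        # Vector Search, per 1000 queries (storage excluded)
    gemini_per_query: float = 0.002,  # Gemini, upper end of ~$0.001-0.002/query
) -> dict[str, float]:
    """Hypothetical rough USD estimate for one video and `queries` questions."""
    costs = {
        "video_intelligence": video_minutes * vi_per_minute,
        # Assumption: one embedding per segment plus one per text query.
        "embeddings": (segments + queries) * per_embedding,
        "vector_search": queries / 1000 * vs_per_1000,
        "gemini": queries * gemini_per_query,
    }
    costs["total"] = sum(costs.values())
    return costs


if __name__ == "__main__":
    for item, usd in estimate_cost(video_minutes=10, segments=40, queries=20).items():
        print(f"{item:>18}: ${usd:.4f}")
```

With these rates, a 10-minute video split into 40 segments and queried 20 times comes to roughly $2.05, almost all of it Video Intelligence; segmentation dominates, so repeated queries against an already indexed video are cheap.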
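Rate Limits (approximate; actual quotas vary by project and region, so check the Quotas page in the GCP Console):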
- Video Intelligence: 100 videos/hour
- Multimodal Embeddings: 1000 requests/minute
- Vector Search: Varies by index configuration
- Gemini: 60 requests/minute
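When analyzing many segments, a client-side throttle helps stay under per-minute limits such as the 60 requests/minute listed for Gemini. This is a minimal sketch of a sliding-window limiter; SlidingWindowLimiter is hypothetical and not part of this project, and the clock and sleep parameters exist only so the behaviour can be tested without real waiting.

```python
from __future__ import annotations

import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Hypothetical throttle: at most `max_calls` calls in any rolling `period`-second window."""

    def __init__(self, max_calls: int, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls: Deque[float] = deque()  # timestamps of calls still inside the window

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have left the window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Wait until the oldest call in the window expires, then re-check.
            self.sleep(self.period - (now - self.calls[0]))
```

Usage would be one shared instance per API, for example calling `gemini_limiter.acquire()` (with `gemini_limiter = SlidingWindowLimiter(60)`) immediately before each Gemini request.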
How it works:
- Input Processing: Local videos are uploaded to GCS; GCS URIs are used directly
- Segmentation: Video Intelligence API detects shot boundaries
- Embedding: Each segment gets a 1408-D multimodal embedding
- Storage: Embeddings stored in Vector Search with metadata
- Query: Text queries embedded and matched against video segments
- Analysis: Top segments analyzed by Gemini for structured insights
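The Query step works because multimodalembedding@001 places text and video embeddings in the same vector space, so a text query vector can be compared directly with segment vectors. Vertex AI Vector Search does this with approximate nearest-neighbour search at scale; for a handful of segments the idea can be illustrated with exact, brute-force cosine similarity. top_k_segments is a hypothetical helper and does not reflect how Vector Search is implemented internally.

```python
from __future__ import annotations

import heapq
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_segments(query_vec: Sequence[float],
                   segments: list[tuple[str, Sequence[float]]],
                   k: int = 10) -> list[tuple[float, str]]:
    """Hypothetical exact search: segments are (segment_id, embedding); best match first."""
    scored = ((cosine(query_vec, emb), seg_id) for seg_id, emb in segments)
    return heapq.nlargest(k, scored)
```

Real embeddings have 1408 dimensions rather than the toy two used in a quick test, but the ranking logic is the same; the top-k segment IDs are what would then be handed to Gemini for analysis.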
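Security considerations:
- Store service account keys securely and never commit them to version control
- Use environment variables for sensitive configuration
- Restrict access to the GCS bucket (no public access; grant only the roles the service account needs)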
- Consider VPC Service Controls for production deployments
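A quick way to catch accidental commits of secrets is to check that .gitignore covers them. This is a minimal sketch that only understands plain glob patterns (negations and directory-specific rules are skipped, so it is not a full gitignore implementation); unignored_secrets is hypothetical, and the file names checked are the ones used earlier in this README.

```python
from __future__ import annotations

import fnmatch
from pathlib import Path


def unignored_secrets(gitignore_text: str, secret_files: list[str]) -> list[str]:
    """Hypothetical check: return secret files not matched by any simple .gitignore pattern."""
    patterns = [
        line.strip().lstrip("/")
        for line in gitignore_text.splitlines()
        # Skip blank lines, comments and negated patterns (not handled here).
        if line.strip() and not line.strip().startswith(("#", "!"))
    ]
    return [f for f in secret_files if not any(fnmatch.fnmatch(f, p) for p in patterns)]


if __name__ == "__main__":
    gitignore = Path(".gitignore")
    text = gitignore.read_text() if gitignore.exists() else ""
    missing = unignored_secrets(text, [".env", "service-account-key.json"])
    print("Not ignored:", ", ".join(missing) if missing else "none")
```

For an authoritative answer, `git check-ignore` applies Git's own matching rules.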