Extract frames from videos and analyze them using AI-powered image recognition.
VisionFrameAnalyzer is a Go-based tool that:

- Extracts frames from a video at set intervals using ffmpeg (see the sketch below)
- Uses an AI-powered vision model to analyze and describe each frame
- Provides a structured pipeline for video-to-image processing
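
The extraction step can be pictured as a thin wrapper around ffmpeg. This is a minimal sketch under assumed flag values and output naming, not the tool's actual code:

```go
package main

import (
	"fmt"
	"os/exec"
)

// extractFrames saves one frame every intervalSeconds from videoPath into outDir.
func extractFrames(videoPath, outDir string, intervalSeconds int) error {
	// fps=1/N keeps one frame per N seconds; %04d numbers the output files.
	filter := fmt.Sprintf("fps=1/%d", intervalSeconds)
	pattern := fmt.Sprintf("%s/frame_%%04d.jpg", outDir)

	cmd := exec.Command("ffmpeg", "-i", videoPath, "-vf", filter, pattern)
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("ffmpeg failed: %w\n%s", err, out)
	}
	return nil
}

func main() {
	if err := extractFrames("input.mp4", "output_frames", 5); err != nil {
		fmt.Println(err)
	}
}
```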
Key features:

- Frame Extraction – Converts video frames into images
- AI-Powered Analysis – Describes each frame using an LLM vision model
- Multi-Frame Processing – Handles multiple frames efficiently
- Detailed Logging – Provides structured logs for debugging
Built with:

- Go (Golang)
- FFmpeg (frame extraction)
- Ollama (LLM-powered image analysis)
- Slog + Tint (logging)
- Kubernetes Ready (optional multi-cluster support)
Prerequisites:

- Go (1.21+ recommended)
- FFmpeg
- Ollama
If you don't have Go installed or are experiencing GOROOT issues, install/fix it first:
```sh
# macOS with Homebrew
brew install go

# Verify installation
go version

# If you see GOROOT errors, set it explicitly in your shell profile
echo 'export GOROOT=$(brew --prefix go)/libexec' >> ~/.zshrc   # or ~/.bash_profile
echo 'export PATH=$PATH:$GOROOT/bin:$HOME/go/bin' >> ~/.zshrc
source ~/.zshrc   # or source ~/.bash_profile
```
```sh
brew install ffmpeg
brew install ollama
```
```sh
# Fetch Go module dependencies
go mod tidy
```
```sh
# Navigate to project directory
cd /path/to/vision

# Build and run in one step
go run ./cmd/visionanalyzer --video path/to/video.mp4 --output output_directory

# Or build an executable and run it
go build -o ./bin/visionanalyzer ./cmd/visionanalyzer
./bin/visionanalyzer --video path/to/video.mp4 --output output_directory
```
```sh
# Build the container
docker build -t vision-analyzer .

# Run the container, mounting the working directory for input/output
docker run -v $(pwd):/data vision-analyzer --video /data/input.mp4 --output /data/frames
```
- Ensure Ollama is running locally on port 11434
- The tool uses the `llama3.2-vision:11b` model by default
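
For reference, a single frame-analysis request to a local Ollama server looks roughly like the sketch below. It targets Ollama's `/api/generate` endpoint with a base64-encoded image; the prompt text and error handling are assumptions, not the tool's actual code:

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// describeFrame sends one extracted frame to Ollama and returns the description.
func describeFrame(imagePath string) (string, error) {
	img, err := os.ReadFile(imagePath)
	if err != nil {
		return "", err
	}

	payload, err := json.Marshal(map[string]any{
		"model":  "llama3.2-vision:11b",
		"prompt": "Describe this video frame in detail.", // assumed prompt
		"images": []string{base64.StdEncoding.EncodeToString(img)},
		"stream": false, // ask for one complete JSON response
	})
	if err != nil {
		return "", err
	}

	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(payload))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	var out struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Response, nil
}

func main() {
	desc, err := describeFrame("output_frames/frame_0001.jpg")
	if err != nil {
		panic(err)
	}
	fmt.Println(desc)
}
```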
Command-line flags:

- `--video`: Path to input video file (required)
- `--output`: Output directory for frames (default: `output_frames`)
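
A minimal sketch of how these flags could be declared with Go's standard `flag` package; the actual wiring in `cmd/visionanalyzer` may differ:

```go
package main

import (
	"flag"
	"fmt"
)

func main() {
	// Names and defaults follow the flag documentation above.
	video := flag.String("video", "", "path to the input video file (required)")
	output := flag.String("output", "output_frames", "output directory for frames")
	flag.Parse()

	if *video == "" {
		fmt.Println("--video is required")
		flag.Usage()
		return
	}
	fmt.Printf("processing %s into %s\n", *video, *output)
}
```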
```sh
# Build and run directly
go build -o visionanalyzer ./cmd/visionanalyzer
./visionanalyzer --video path/to/video.mp4

# Or run without building
go run ./cmd/visionanalyzer --video path/to/video.mp4

# Specify custom output directory
./visionanalyzer --video path/to/video.mp4 --output custom_output

# Show help
./visionanalyzer --help
```
With the PostgreSQL backend enabled, you can also search analyzed frames:

```sh
# Search for frames containing specific content (requires PostgreSQL)
export DB_ENABLED=true
./visionanalyzer --search "person cooking" --limit 10 --video path/to/video.mp4
```
Output is organized per video:

```
output_frames/
└── video_name/
    ├── frame_0001.jpg
    ├── frame_0002.jpg
    ├── ...
    └── analysis_results.json
```
The `analysis_results.json` file contains frame-by-frame analysis:

```json
[
  {
    "frame": "frame_0001.jpg",
    "content": "Detailed analysis of frame contents..."
  }
]
```
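
To consume this file from your own Go code, a struct like the following matches the schema shown above (`FrameAnalysis` is a hypothetical name, not a type exported by this project):

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// FrameAnalysis mirrors one entry in analysis_results.json.
type FrameAnalysis struct {
	Frame   string `json:"frame"`
	Content string `json:"content"`
}

func main() {
	data, err := os.ReadFile("output_frames/video_name/analysis_results.json")
	if err != nil {
		panic(err)
	}

	var results []FrameAnalysis
	if err := json.Unmarshal(data, &results); err != nil {
		panic(err)
	}

	for _, r := range results {
		fmt.Printf("%s: %s\n", r.Frame, r.Content)
	}
}
```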
VisionFrameAnalyzer can store analysis results in PostgreSQL with pgvector for vector similarity search.
Install PostgreSQL (14+ recommended):

```sh
# macOS with Homebrew
brew install postgresql@14
brew services start postgresql@14

# Verify installation
psql --version
```
For easy development, you can use Docker Compose to set up PostgreSQL with pgvector:

```yaml
version: '3.8'
services:
  db:
    image: pgvector/pgvector:pg14   # the stock postgres:14 image lacks the pgvector extension
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
      POSTGRES_DB: visiondb
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
```
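
With this saved as `docker-compose.yml`, running `docker compose up -d` brings the database up on `localhost:5432`.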
Create the necessary tables and enable the pgvector extension:

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE videos (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE frames (
    id SERIAL PRIMARY KEY,
    video_id INTEGER REFERENCES videos(id),
    frame_number INTEGER NOT NULL,
    image_path TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE analyses (
    id SERIAL PRIMARY KEY,
    frame_id INTEGER REFERENCES frames(id),
    content JSONB,
    vector VECTOR(768), -- Adjust dimension based on your model
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
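
To see how rows flow into this schema, here is a hedged Go sketch that stores one analysis with its embedding. It assumes the `lib/pq` driver and passes the vector as pgvector's bracketed text literal; the embedding and JSON values are placeholders:

```go
package main

import (
	"database/sql"
	"fmt"
	"strings"

	_ "github.com/lib/pq"
)

// vectorLiteral renders a float slice as a pgvector text literal, e.g. "[0.1,0.2]".
func vectorLiteral(v []float32) string {
	parts := make([]string, len(v))
	for i, x := range v {
		parts[i] = fmt.Sprintf("%g", x)
	}
	return "[" + strings.Join(parts, ",") + "]"
}

func main() {
	db, err := sql.Open("postgres", "postgres://user:password@localhost:5432/visiondb?sslmode=disable")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	embedding := make([]float32, 768) // placeholder; use a real model embedding

	// Assumes a row with frames.id = 1 already exists.
	_, err = db.Exec(
		`INSERT INTO analyses (frame_id, content, vector) VALUES ($1, $2, $3::vector)`,
		1, `{"frame": "frame_0001.jpg", "content": "..."}`, vectorLiteral(embedding),
	)
	if err != nil {
		panic(err)
	}
}
```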
The pgvector implementation allows you to search for frames with similar content using vector similarity, which is much more powerful than basic text search.
VisionFrameAnalyzer supports vector similarity search, which finds frames that are semantically similar to your query:

```sh
# Search for frames showing a person cooking (top 5 results)
export DB_ENABLED=true
./visionanalyzer --search "person cooking" --video path/to/video.mp4

# Increase the number of results
./visionanalyzer --search "person cooking" --limit 10 --video path/to/video.mp4
```
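
Under the hood, a pgvector similarity lookup reduces to ordering by a distance operator. The sketch below uses `<=>`, pgvector's cosine-distance operator, and assumes the schema and `lib/pq` driver from the storage section; embedding the query text is left to whichever model produced the stored vectors:

```go
package main

import (
	"database/sql"
	"fmt"
	"strings"

	_ "github.com/lib/pq"
)

// vectorLiteral renders a float slice as a pgvector text literal (see the insert sketch).
func vectorLiteral(v []float32) string {
	parts := make([]string, len(v))
	for i, x := range v {
		parts[i] = fmt.Sprintf("%g", x)
	}
	return "[" + strings.Join(parts, ",") + "]"
}

// searchFrames returns the image paths of the frames closest to queryVec.
func searchFrames(db *sql.DB, queryVec []float32, limit int) ([]string, error) {
	rows, err := db.Query(`
		SELECT f.image_path
		FROM analyses a
		JOIN frames f ON f.id = a.frame_id
		ORDER BY a.vector <=> $1::vector
		LIMIT $2`, vectorLiteral(queryVec), limit)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var paths []string
	for rows.Next() {
		var p string
		if err := rows.Scan(&p); err != nil {
			return nil, err
		}
		paths = append(paths, p)
	}
	return paths, rows.Err()
}

func main() {
	db, err := sql.Open("postgres", "postgres://user:password@localhost:5432/visiondb?sslmode=disable")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	query := make([]float32, 768) // placeholder; embed the search text with your model
	paths, err := searchFrames(db, query, 10)
	if err != nil {
		panic(err)
	}
	fmt.Println(paths)
}
```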
Project structure:

```
vision/
├── cmd/
│   └── visionanalyzer/   # Main executable package
└── internal/
    ├── analyzer/         # AI vision analysis functionality
    ├── extractor/        # Video frame extraction functionality
    ├── models/           # Shared data structures
    └── storage/          # Result storage and persistence
```
Use cases:

- Automated Video Analysis – Extract insights from video feeds
- Content Moderation – Detect and describe images in video content
- Machine Learning Pipelines – Pre-process video datasets for AI models
MIT License. See LICENSE for details.