vidlizer

Point it at a video, image, or PDF. Get structured JSON — scene by scene.

vidlizer pulls frames out of any video, image, or PDF using ffmpeg, sends them to a vision LLM, and returns a flow array — one entry per scene. Each entry tells you what happened, who was on screen, what text was visible, and what changed. If the video has audio, it transcribes it with Apple MLX Whisper and merges the speech into each step.

Runs fully local via Ollama or any OpenAI-compatible server (LM Studio, vLLM, oMLX) — no API key, no data leaving your machine. Or connect OpenRouter for cloud models. vidlizer setup detects what you have installed and writes your config in under a minute.

vidlizer demo.mp4
vidlizer "https://youtube.com/watch?v=..."
vidlizer screenshot.png
vidlizer document.pdf

✨ Features

Any input — local video, image (jpg/png/webp/…), PDF, or URL (YouTube, Loom, Vimeo, Twitter)
4 providers — Ollama (fully offline), LM Studio (port 1234), oMLX (Apple Silicon, port 8000), OpenRouter (cloud) — auto-detected in that order
Cross-provider fallback — primary model fails → automatically switches provider (e.g. oMLX → OpenRouter)
JSON repair — malformed model output is re-sent to the model to fix before skipping; recovers from partial JSON
Free-model guard — :free OpenRouter models auto-force concurrency=1 to stay within rate limits
3 output formats — --format json (default), summary (plain text by phase), markdown (step-per-section doc)
Usage tracking — --stats shows per-model token + cost breakdown across all runs; get_usage_stats() MCP tool
Auto transcript — detects audio, transcribes with Apple MLX Whisper (Neural Engine), merges speech into each flow step
Perceptual dedup — removes near-duplicate frames before sending (saves tokens)
analyze_moment — --start/--end flags to focus on a time range
In-memory cache — repeat runs on the same file skip the API call
Cost guard — aborts if spend exceeds MAX_COST_USD (default $1.00)
Live progress — Rich streaming indicator shows elapsed time and token count per batch
MCP server — use from Claude Code, Cursor, Claude Desktop; provider/model locked via env vars; result includes model_used + provider_used
Auto-install — missing ffmpeg is brew-installed; mlx-whisper bundled in default install (macOS)
doctor --fix — interactive repair wizard: installs missing ffmpeg/Ollama/LM Studio via Homebrew, re-runs vidlizer setup for .env, upgrades mlx-whisper
mcp-setup — one-command MCP config wizard: detects vidlizer-mcp, reads .env, writes editor config or shows a claude mcp add-json one-liner
Mac-native — file picker dialog, Apple MLX transcription, handles macOS Unicode filenames (e.g. "11:26 AM")

📦 Requirements

macOS (Apple Silicon recommended for transcription speed)
Python 3.10+
Ollama mode: Ollama installed + a vision model pulled (5 GB+ RAM)
LM Studio mode: LM Studio 0.3.16+ with a vision model loaded
Cloud mode: An OpenRouter API key

ffmpeg is installed automatically via Homebrew on first run if missing.

🚀 Install

Option 1 — pipx (recommended, isolated)

pipx install vidlizer
vidlizer setup    # interactive wizard: detects providers, writes .env

Option 2 — pip / virtualenv

pip install vidlizer
vidlizer setup

Option 3 — from source

git clone https://github.com/arizawan/vidlizer.git
cd vidlizer
python -m venv .venv && source .venv/bin/activate
pip install -e .
vidlizer setup    # or: cp env.sample .env

First-run wizard

vidlizer setup detects all installed providers, lets you pick primary + fallback, and writes a .env for you. It also offers to pull a vision model for Ollama if none is installed.

$ vidlizer setup
  Detected providers:
    1.  Ollama        → qwen2.5vl:3b
    2.  OpenRouter    → google/gemma-3-27b-it:free

  Primary provider (1–2): 1
  Fallback (1–1, Enter to skip): 2

✓  .env written → /your/project/.env

Health check

vidlizer doctor          # shows ffmpeg, .env, provider, mlx-whisper status
vidlizer doctor --fix    # interactive repair: brew-installs ffmpeg/Ollama/LM Studio, re-runs setup

Manual provider setup

Ollama (fully offline, no API key):

# Install Ollama from https://ollama.com, then:
ollama pull qwen2.5vl:3b   # ~3.2 GB, requires 5 GB+ RAM (recommended)
ollama pull qwen2.5vl:7b   # ~6.0 GB, requires 10 GB+ RAM (best quality)

LM Studio (GPU-accelerated local inference):

# In LM Studio: load a vision model (e.g. Qwen2.5-VL 7B), enable the local server
# Set PROVIDER=openai and OPENAI_BASE_URL=http://localhost:1234/v1 in .env

OpenRouter (cloud):

# Paste your OpenRouter key in .env:
# OPENROUTER_API_KEY=sk-or-v1-...

⚡ Quick start

# Ollama — fully local, no API key
vidlizer demo.mp4 --provider ollama --model qwen2.5vl:3b

# LM Studio (or any OpenAI-compat server)
vidlizer demo.mp4 --provider openai --model qwen/qwen2.5-vl-7b-instruct

# OpenRouter (cloud)
vidlizer demo.mp4 --provider openrouter --model google/gemini-2.5-flash

# Analyze a YouTube video
vidlizer "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# Analyze a single image or PDF
vidlizer screenshot.png
vidlizer report.pdf

# Focus on a time range
vidlizer demo.mp4 --start 30 --end 90

# Output as Markdown or plain-text summary
vidlizer demo.mp4 --format markdown -o result.md
vidlizer demo.mp4 --format summary -o result.txt

Run with no arguments for an interactive file picker + provider/model selector.

📄 Output

Three formats via --format:

Format	Flag	Default extension	Description
JSON	`--format json`	`.analysis.json`	Full structured flow array (default)
Markdown	`--format markdown`	`.analysis.md`	Step-per-section document with scene/action/speech
Summary	`--format summary`	`.analysis.txt`	Plain text grouped by phase

Default output path is <normalized-name>.analysis.json (or matching extension), or pass -o path.

{
  "flow": [
    {
      "step": 1,
      "timestamp_s": 0.0,
      "phase": "Introduction",
      "scene": "Title card with product logo on dark background.",
      "subjects": ["Logo", "Text overlay"],
      "action": "Static title card displayed.",
      "text_visible": "vidlizer — analyze any video",
      "context": "Opening of a product demo.",
      "observations": "Clean minimal design.",
      "next_scene": "Screen recording of the CLI.",
      "speech": "Welcome to the vidlizer demo."
    }
  ],
  "transcript": [
    { "start": 0.0, "end": 2.4, "text": "Welcome to the vidlizer demo." }
  ],
  "model_used": "gemma-4-E2B-it-MLX-4bit",
  "provider_used": "openai"
}

model_used and provider_used reflect the model that actually produced the result — including after fallback. Surfaced in MCP analyze_video response so agents know which provider ran.

Flow step fields

Field	Description
`step`	Sequential integer
`timestamp_s`	Approximate time in seconds (from frame label)
`phase`	Logical section — Introduction, Demo, Action, Conclusion…
`scene`	What is currently visible
`subjects`	Key people, objects, UI elements present
`action`	What is happening — interaction, movement, narration, event
`text_visible`	All readable text on screen
`context`	Persistent state — timer, score, topic, brand…
`observations`	Errors, anomalies, emotions, key facts
`next_scene`	Brief description of what follows
`speech`	Transcript text spoken during this step (audio videos only)

🤖 Models

All providers send one frame per request (batch_size=1) for maximum compatibility with context-limited models. Thinking-mode output (<think> tags) is stripped automatically.

Local — Ollama

Fully offline, no API key. Install with ollama pull <name>:

Model	Disk	RAM	Notes
`qwen2.5vl:3b` ★	3.2 GB	~5 GB	Recommended — 128K ctx, strong JSON, multi-image
`qwen2.5vl:7b` ★	6.0 GB	~9 GB	Best Ollama quality — 128K ctx, needs 10+ GB RAM
`minicpm-v:8b`	5.5 GB	~8 GB	Strong OCR + visual reasoning, 32K ctx
`llava-onevision:7b`	5.5 GB	~8 GB	Strong multi-image + video frames, reliable JSON

Fallback order (if configured model unavailable): qwen2.5vl:7b → qwen2.5vl:3b → minicpm-v:8b → llava-onevision:7b → llava:13b → llava:7b

Uses Ollama's native /api/chat with format: json for reliable structured output.

Local — LM Studio / oMLX / vLLM / LocalAI (OpenAI-compatible)

GPU-accelerated inference via any OpenAI-compatible server. Set PROVIDER=openai and point OPENAI_BASE_URL at your server.

Model	VRAM	Notes
`qwen/qwen2.5-vl-7b-instruct` ★	~8 GB	Recommended — 128K ctx, reliable JSON, multi-image
`qwen/qwen3-vl-8b` ★	~10 GB	Latest Qwen vision — thinking tags stripped automatically
`qwen/qwen2.5-vl-3b-instruct`	~5 GB	Lightweight — fast, 5–6 GB VRAM
`google/gemma-4-e4b-it`	~6 GB	Google MoE — LM Studio 0.3.16+ native support
`google/gemma-4-9b-it`	~10 GB	Stronger Gemma 4 — 128K ctx, better instruction following
`zai-org/glm-4.6v-flash`	~8 GB	ZhipuAI MoE — 128K ctx, strong JSON, low latency
`openbmb/minicpm-v-4.5`	~8 GB	8B Qwen3-based — strong OCR, multi-image, vLLM ready

oMLX (Apple Silicon native, omlx.ai) — MLX-format models from HuggingFace, auto-detected on port 8000 (distinct from LM Studio). Model IDs are HuggingFace paths:

oMLX Model	Unified RAM	Notes
`mlx-community/Qwen2.5-VL-7B-Instruct-8bit` ★	~8 GB	Best Apple Silicon pick — fast, strong JSON
`mlx-community/Qwen2.5-VL-3B-Instruct-8bit`	~4 GB	Lightweight — 4–5 GB RAM
`mlx-community/Qwen3-VL-8B-8bit`	~9 GB	Latest Qwen vision — thinking tags stripped
`mlx-community/MiniCPM-V-2_6-8bit`	~8 GB	Strong OCR + reasoning

Model IDs are as shown in LM Studio's model browser, oMLX's admin panel, or your vLLM config. LM Studio / oMLX serve one model at a time (no fallback needed). vLLM with multiple models loaded uses the same fallback sequence.

Fallback fragment order (vLLM / oMLX with multiple models): qwen2.5-vl-7b → qwen2.5-vl-3b → qwen3-vl → gemma-4 → glm-4 → minicpm-v → llava-onevision → llava

Cloud — OpenRouter

Models fetched live with current pricing. Run vidlizer --list-models to see the live list.

Model	Input / 1M tokens	Notes
`google/gemini-2.5-flash` ★	$0.15	Recommended — 1M ctx, fast, accurate
`google/gemini-2.5-flash-lite`	$0.075	Cheaper — slightly less accurate
`google/gemini-2.5-pro`	$1.25	Best quality, higher cost
`openai/gpt-4o`	$2.50	OpenAI flagship
`openai/gpt-4o-mini`	$0.15	OpenAI budget option
`nvidia/nemotron-nano-12b-v2-vl:free`	free ⚡	Rate-limited (8K/req), 128K ctx
`google/gemma-4-31b-it:free`	free ⚡	Rate-limited, 128K ctx

Free models auto-fallback to the cheapest available paid model on failure.

🎙️ Transcription

For videos with an audio track, vidlizer automatically:

Extracts a mono 16kHz WAV with ffmpeg
Transcribes with Apple MLX Whisper (Neural Engine + GPU on M-series)
Merges each transcript segment into the nearest flow step as speech

mlx-whisper is included in the default install on macOS. The base Whisper model (~150 MB) is downloaded on first transcription.

To opt out: --no-transcript

🛠️ CLI reference

Subcommands

vidlizer setup              # interactive wizard: detects providers, writes .env
vidlizer doctor             # health check: ffmpeg, .env, providers, mlx-whisper
vidlizer doctor --fix       # auto-repair: brew-installs missing deps, re-runs setup
vidlizer mcp-setup          # generate MCP config for Claude Code / Cursor / Claude Desktop

Analysis options

vidlizer [video] [options]

positional:
  video                 Path to file or URL (YouTube/Loom/Vimeo/Twitter)

options:
  -o, --output PATH     Output path (default: <name>.analysis.json/.md/.txt)
  --format FORMAT       Output format: json (default), summary, markdown
  --provider PROVIDER   ollama | openai | openrouter  (openai covers LM Studio + oMLX + vLLM)
  --model MODEL         Model slug — Ollama name, OpenAI-compat ID, or OpenRouter slug
  --max-frames N        Max frames to send (default 60, hard cap 200)
  --start SECONDS       Analyze from this timestamp
  --end SECONDS         Analyze up to this timestamp
  --scene THRESHOLD     Scene-change sensitivity 0–1 (default 0.1)
  --min-interval SECS   Minimum seconds between frames (default 2)
  --fps FPS             Extract at fixed FPS instead of scene-change
  --scale PX            Frame width in pixels (default 512)
  --batch-size N        Frames per API call (0 = auto, default 1 for all providers)
  --concurrency N       Parallel batch workers (default: 4 for OpenRouter, 1 for local)
  --dedup-threshold N   Perceptual dedup Hamming distance (default 8, 0 = off)
  --no-transcript       Skip audio transcription
  --max-cost USD        Abort if spend exceeds this (default 1.00)
  --timeout SECS        Per-request timeout (default 600)
  -v, --verbose         Debug output
  --stats               Show token + cost usage stats by model, then exit
  --clear-stats         Reset usage log, then exit

🔧 Environment variables

Copy env.sample to .env:

# ── Provider ────────────────────────────────────────────────────────────────
# ollama    — local Ollama server (no API key, no cost)
# openai    — any OpenAI-compatible server (LM Studio, oMLX, vLLM, LocalAI, real OpenAI)
# openrouter — cloud inference via OpenRouter
PROVIDER=ollama

# ── Ollama (local, no API key) ──────────────────────────────────────────────
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=qwen2.5vl:3b

# ── OpenAI-compatible (LM Studio, oMLX, vLLM, LocalAI, real OpenAI) ─────────
#   LM Studio default port: 1234  |  oMLX default port: 8000  |  vLLM: 8000
OPENAI_BASE_URL=http://localhost:1234/v1
OPENAI_API_KEY=lm-studio        # "not-needed" for oMLX/vLLM, real key for OpenAI
OPENAI_MODEL=                   # exact model ID as shown in server (required)

# ── OpenRouter (cloud) ──────────────────────────────────────────────────────
OPENROUTER_API_KEY=sk-or-v1-...
OPENROUTER_MODEL=google/gemini-2.5-flash

# ── Frame extraction ────────────────────────────────────────────────────────
SCENE_THRESHOLD=0.1     # lower = more frames
MIN_INTERVAL=2          # seconds between forced frames
MAX_FRAMES=60
FRAME_WIDTH=512
MAX_COST_USD=1.00
BATCH_SIZE=0            # 0 = auto (defaults to 1 per request for all providers)
REQUEST_TIMEOUT=600

# ── Fallback ────────────────────────────────────────────────────────────────
# Same-provider fallback: set FALLBACK_MODEL only.
#   Blank = auto-detect installed models (Ollama) or available models (OpenAI-compat).
# Cross-provider fallback: set FALLBACK_PROVIDER + FALLBACK_MODEL.
#   FALLBACK_BASE_URL and FALLBACK_API_KEY default to the primary provider's values
#   if not set — override only when the fallback uses a different server/key.
FALLBACK_PROVIDER=      # ollama | openai | openrouter  (blank = same as PROVIDER)
FALLBACK_MODEL=         # model ID for fallback  (blank = auto-detect same-provider)
FALLBACK_BASE_URL=      # base URL for fallback openai/ollama server (optional)
FALLBACK_API_KEY=       # API key for fallback provider (optional)

📊 Usage tracking

Every successful run appends a record to ~/.cache/vidlizer/usage.jsonl. Test runs (pytest) are excluded.

vidlizer --stats          # print per-model breakdown
vidlizer --clear-stats    # reset log

Example output:

Usage statistics  (/Users/you/.cache/vidlizer/usage.jsonl)

  Total runs:       12
  Total tokens in:  58,240
  Total tokens out: 6,102
  Total cost:       ~$0.0312
  Total steps:      94

  Model                               Provider     Runs   Tokens in  Tokens out  Cost USD
  gemma-4-E2B-it-MLX-4bit             openai          9      45,800       4,900      free
  google/gemini-2.5-flash             openrouter       3      12,440       1,202  ~$0.0312

Also available as MCP tool: get_usage_stats() → same data as JSON. clear_usage_stats() resets the log.

🔌 MCP Server

Use vidlizer from any MCP-compatible agent — Claude Code, Cursor, Claude Desktop, Gemini CLI.

Model and provider are set via env vars and cannot be overridden by the AI agent. This prevents agents from switching to unexpected or expensive models mid-session.

Install

# pipx (recommended — isolated, globally available)
pipx install "vidlizer[mcp]"

# If vidlizer is already installed via pipx — inject mcp into the same venv
pipx inject vidlizer mcp

# pip / virtualenv
pip install "vidlizer[mcp]"

vidlizer-mcp is added to your PATH automatically. Verify with which vidlizer-mcp.

Note on pipx: pipx install vidlizer creates the vidlizer-mcp entry point but the mcp package is not installed. Run pipx inject vidlizer mcp to add it, or use pipx install "vidlizer[mcp]" upfront.

One-command MCP setup

vidlizer mcp-setup

Interactive wizard — detects vidlizer-mcp, reads your .env, and either writes the config to your editor's config file or shows a one-liner to copy-paste. Supports Claude Code, Cursor, Claude Desktop, Windsurf.

Configure manually

Use the absolute path to the binary — which vidlizer-mcp gives it. All configs below use JSON (works in Claude Code, Cursor, Claude Desktop, Gemini CLI).

For Claude Code via CLI:

claude mcp add-json vidlizer '{"type":"stdio","command":"/path/to/vidlizer-mcp","env":{"PROVIDER":"openrouter","OPENROUTER_API_KEY":"sk-or-v1-...","OPENROUTER_MODEL":"google/gemini-2.5-flash"}}'
# Add --scope project for per-project config instead of global

Single provider — OpenRouter (cloud)

{
  "mcpServers": {
    "vidlizer": {
      "type": "stdio",
      "command": "/absolute/path/to/.venv/bin/vidlizer-mcp",
      "env": {
        "PROVIDER": "openrouter",
        "OPENROUTER_API_KEY": "sk-or-v1-...",
        "OPENROUTER_MODEL": "google/gemini-2.5-flash"
      }
    }
  }
}

Free models (:free suffix) automatically fall back to the cheapest paid model if rate-limited — no extra config needed.

Single provider — Ollama (local, no API key)

{
  "mcpServers": {
    "vidlizer": {
      "type": "stdio",
      "command": "/absolute/path/to/.venv/bin/vidlizer-mcp",
      "env": {
        "PROVIDER": "ollama",
        "OLLAMA_HOST": "http://localhost:11434",
        "OLLAMA_MODEL": "qwen2.5vl:7b"
      }
    }
  }
}

FALLBACK_MODEL is optional — omitting it auto-detects installed models and tries them in preferred order.

Single provider — LM Studio (OpenAI-compatible, port 1234)

{
  "mcpServers": {
    "vidlizer": {
      "type": "stdio",
      "command": "/absolute/path/to/.venv/bin/vidlizer-mcp",
      "env": {
        "PROVIDER": "openai",
        "OPENAI_BASE_URL": "http://localhost:1234/v1",
        "OPENAI_API_KEY": "lm-studio",
        "OPENAI_MODEL": "qwen/qwen2.5-vl-7b-instruct"
      }
    }
  }
}

Single provider — oMLX (Apple Silicon, port 8000)

{
  "mcpServers": {
    "vidlizer": {
      "type": "stdio",
      "command": "/absolute/path/to/.venv/bin/vidlizer-mcp",
      "env": {
        "PROVIDER": "openai",
        "OPENAI_BASE_URL": "http://localhost:8000/v1",
        "OPENAI_API_KEY": "not-needed",
        "OPENAI_MODEL": "mlx-community/Qwen2.5-VL-7B-Instruct-8bit"
      }
    }
  }
}

Cross-provider fallback — oMLX → OpenRouter

Primary runs on local oMLX; if it fails, switches to OpenRouter cloud automatically.

{
  "mcpServers": {
    "vidlizer": {
      "type": "stdio",
      "command": "/absolute/path/to/.venv/bin/vidlizer-mcp",
      "env": {
        "PROVIDER": "openai",
        "OPENAI_BASE_URL": "http://localhost:8000/v1",
        "OPENAI_API_KEY": "not-needed",
        "OPENAI_MODEL": "mlx-community/Qwen2.5-VL-7B-Instruct-8bit",
        "FALLBACK_PROVIDER": "openrouter",
        "FALLBACK_MODEL": "google/gemini-2.5-flash",
        "FALLBACK_API_KEY": "sk-or-v1-..."
      }
    }
  }
}

Cross-provider fallback — oMLX → Ollama

Both local; oMLX fails (model missing / server down) → Ollama takes over.

{
  "mcpServers": {
    "vidlizer": {
      "type": "stdio",
      "command": "/absolute/path/to/.venv/bin/vidlizer-mcp",
      "env": {
        "PROVIDER": "openai",
        "OPENAI_BASE_URL": "http://localhost:8000/v1",
        "OPENAI_API_KEY": "not-needed",
        "OPENAI_MODEL": "mlx-community/Qwen2.5-VL-7B-Instruct-8bit",
        "FALLBACK_PROVIDER": "ollama",
        "FALLBACK_MODEL": "qwen2.5vl:7b"
      }
    }
  }
}

FALLBACK_BASE_URL defaults to OLLAMA_HOST (or http://localhost:11434) — only set it if Ollama is on a non-standard host.

Cross-provider fallback — Ollama → OpenRouter

Local-first; cloud kicks in if no model is installed or inference fails.

{
  "mcpServers": {
    "vidlizer": {
      "type": "stdio",
      "command": "/absolute/path/to/.venv/bin/vidlizer-mcp",
      "env": {
        "PROVIDER": "ollama",
        "OLLAMA_HOST": "http://localhost:11434",
        "OLLAMA_MODEL": "qwen2.5vl:7b",
        "FALLBACK_PROVIDER": "openrouter",
        "FALLBACK_MODEL": "google/gemini-2.5-flash",
        "FALLBACK_API_KEY": "sk-or-v1-..."
      }
    }
  }
}

Same-provider fallback — pin explicit fallback model

When auto-detection isn't wanted; both models use the same provider.

{
  "mcpServers": {
    "vidlizer": {
      "type": "stdio",
      "command": "/absolute/path/to/.venv/bin/vidlizer-mcp",
      "env": {
        "PROVIDER": "ollama",
        "OLLAMA_MODEL": "qwen2.5vl:7b",
        "FALLBACK_MODEL": "qwen2.5vl:3b"
      }
    }
  }
}

Fallback env var reference:

Var	Purpose	Default
`FALLBACK_PROVIDER`	Provider for fallback (`ollama`/`openai`/`openrouter`)	same as `PROVIDER`
`FALLBACK_MODEL`	Model ID for fallback	auto-detect (same provider)
`FALLBACK_BASE_URL`	Base URL for fallback `openai`/`ollama` server	`OPENAI_BASE_URL` or `OLLAMA_HOST`
`FALLBACK_API_KEY`	API key for fallback provider	`OPENAI_API_KEY` or `OPENROUTER_API_KEY`

Logs

All activity (frame extraction, API calls, errors) written to:

tail -f ~/.cache/vidlizer/mcp.log

Tools

Tool	Returns	Tokens out
`analyze_video(path, **opts)`	`analysis_id` + meta	~100
`list_analyses()`	all stored analyses (meta only)	~50/entry
`get_summary(id, level)`	brief/medium/full text summary	~200–2K
`get_step(id, step)`	single flow step	~150
`get_steps(id, start, end)`	step range	scaled
`get_phase(id, phase)`	all steps in named phase	scaled
`search_analysis(id, query)`	steps matching text (`query` or `keyword`)	only hits
`get_transcript(id, start_s, end_s)`	transcript slice	scaled
`get_full_analysis(id)`	full flow + transcript	full
`delete_analysis(id)`	confirmation	~10
`get_usage_stats()`	per-model token + cost breakdown	~50/model
`clear_usage_stats()`	reset usage log	~10

Token-efficient workflow

analyze_video stores the full result on disk and returns only analysis_id + metadata (~100 tokens). The LLM pulls specific parts on demand — a 60-step video costs ~100 tokens to register but only loads what's needed per query.

agent: analyze_video("demo.mp4")
→ { "analysis_id": "abc123", "step_count": 42, "phases": ["Intro", "Demo", "Outro"] }

agent: get_summary("abc123", level="brief")
→ "Intro: Title card displayed | Demo: CLI recording | Outro: Results shown"

agent: get_phase("abc123", "Demo")
→ [ { step, timestamp_s, action, scene }, … ]

agent: search_analysis("abc123", "error")
→ [ { step: 17, matched_field: "observations", matched_value: "Stack trace visible" } ]

MCP resources

URI	Content
`vidlizer://analyses`	All analyses (meta JSON)
`vidlizer://analyses/{id}`	Full analysis JSON
`vidlizer://analyses/{id}/summary`	Medium text summary

🧪 Testing

Fully automated test suite — 248 unit + integration tests, 3 e2e tests.

make install-dev    # installs pytest, pytest-html, pytest-mock
make test           # runs unit + integration (no network) → reports/test-report.html
make test-e2e       # also runs YouTube download + full pipeline e2e
make smoke          # real-provider smoke test against all detected providers

Unit/integration report: reports/test-report.html. Tests cover:

Frame extraction (ffmpeg), perceptual dedup, cache TTL
Audio detection, transcript merge (no duplicates)
PDF → frames, image encoding, URL detection
Output formatter: json/summary/markdown correctness
Full pipeline with mocked OpenRouter (fake HTTP server)
HTTP layer: SSE parsing, 429 handling, cost cap, Ollama streaming
Batch: JSON repair retry, free-model concurrency guard
Models: pricing lookup, fallback sequences, format helpers
Usage tracking: record/stats/clear lifecycle
Real CLI subprocess invocations against real media
Real YouTube download + full analysis (opt-in -m e2e)

Smoke test

make smoke runs the full real-provider pipeline against every detected provider in order (Ollama → LM Studio → oMLX → OpenRouter). Before tests start, it interactively prompts for each local provider:

Vision model found → "Test ollama with qwen2.5vl:3b? [Y/n]"
No vision model (Ollama) → "Download qwen2.5vl:3b (~2.3 GB)? [y/N]"
No vision model (LM Studio/oMLX) → instructions to load one + retry prompt

Each provider is tested in isolation with its own output directory (prevents cache contamination). Local models are unloaded from VRAM after each provider's tests. Results land in reports/smoke-TIMESTAMP.html with per-provider pass/fail scorecards.

make smoke                                  # auto-detect all providers
make smoke ARGS="--provider ollama"         # force single provider
make smoke ARGS="--provider openrouter"     # OpenRouter only (no local needed)

🤝 Contributing

Contributions welcome! See CONTRIBUTING.md for setup, code style, and PR guidelines.

Bug report: open an issue — include provider, model, macOS version, full error
Feature request: open an issue describing the use case
PR: branch from main, add tests, make test passes, ruff check vidlizer/ clean

❤️ Support

vidlizer is free and open source. If it saves you time:

📝 License

MIT — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 79 Commits
.github		.github
assets		assets
reports		reports
scripts		scripts
tests		tests
vidlizer		vidlizer
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
demo.tape		demo.tape
env.sample		env.sample
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini
sample-output.json		sample-output.json

Folders and files

Latest commit

History

Repository files navigation

vidlizer

✨ Features

📦 Requirements

🚀 Install

Option 1 — pipx (recommended, isolated)

Option 2 — pip / virtualenv

Option 3 — from source

First-run wizard

Health check

Manual provider setup

⚡ Quick start

📄 Output

Flow step fields

🤖 Models

Local — Ollama

Local — LM Studio / oMLX / vLLM / LocalAI (OpenAI-compatible)

Cloud — OpenRouter

🎙️ Transcription

🛠️ CLI reference

Subcommands

Analysis options

🔧 Environment variables

📊 Usage tracking

🔌 MCP Server

Install

One-command MCP setup

Configure manually

Single provider — OpenRouter (cloud)

Single provider — Ollama (local, no API key)

Single provider — LM Studio (OpenAI-compatible, port 1234)

Single provider — oMLX (Apple Silicon, port 8000)

Cross-provider fallback — oMLX → OpenRouter

Cross-provider fallback — oMLX → Ollama

Cross-provider fallback — Ollama → OpenRouter

Same-provider fallback — pin explicit fallback model

Logs

Tools

Token-efficient workflow

MCP resources

🧪 Testing

Smoke test

🤝 Contributing

❤️ Support

📝 License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages