FFmpeg for AI agents — every command returns structured JSON with recovery hints.
CutAgent is designed from the ground up for AI agents and programmatic video editing. Every CLI command outputs structured JSON. Every operation is composable through a declarative Edit Decision List (EDL) format. No GUI, no human-formatted text — just clean machine-readable interfaces for professional video cutting.
- Agent-first: Every command returns structured JSON — built for LLM tool use, not human eyes
- Declarative EDL: Describe your edit as a JSON document, execute it in one call
- Zero runtime dependencies: Pure Python + FFmpeg, or `pip install 'cutagent[ffmpeg]'` to bundle everything
- Content intelligence: Scene detection, silence detection, audio levels, keyframe analysis, beat detection
- Professional editing: Trim, split, concat, reorder, extract, crop, resize, fade with crossfade transitions, speed control
- Audio polish: Mix background music, adjust volume, replace audio, normalize loudness (EBU R128)
- Text & motion graphics: Burn-in titles, lower-thirds, annotations, and keyframe-driven animations
- Structured errors: Error codes, recovery hints, and context in every failure response
| Dimension | CutAgent | MoviePy | ffmpeg-python | raw FFmpeg |
|---|---|---|---|---|
| Output format | Structured JSON | Python objects / text | N/A (returns nothing) | Human text |
| Error handling | Codes + recovery hints | Exceptions | Exceptions | Unstructured stderr |
| Agent-friendly | Yes | Partial | No | No |
| Declarative EDL | Yes | No | No | No |
| Content intelligence | Scenes, silence, beats | Limited | No | Manual |
| Required dependencies | Python + FFmpeg | NumPy, etc. | FFmpeg | FFmpeg |
- Python 3.10+
- FFmpeg and FFprobe (see setup options below)
```bash
pip install cutagent
```

With bundled FFmpeg (no separate install needed):

```bash
pip install 'cutagent[ffmpeg]'
```

This uses static-ffmpeg to auto-download the ffmpeg and ffprobe binaries on first use. It works on Windows, macOS (Intel and Apple Silicon), and Linux.
From source (development):

```bash
git clone https://github.com/DaKev/cutagent.git
cd cutagent
pip install -e ".[dev]"
```

CutAgent needs `ffmpeg` and `ffprobe`. It searches for them in this order:

- Environment variables `CUTAGENT_FFMPEG` / `CUTAGENT_FFPROBE` (exact path to each binary)
- Environment variable `CUTAGENT_FFMPEG_DIR` (directory containing both binaries)
- System PATH (`ffmpeg` / `ffprobe` on `$PATH`)
- static-ffmpeg package (if installed via `pip install 'cutagent[ffmpeg]'`)
- imageio-ffmpeg package (ffmpeg only, if installed)
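The first three steps of that lookup order can be approximated with the standard library. This is a minimal sketch, not CutAgent's actual resolver: `resolve_binary` is a hypothetical name, the package fallbacks (static-ffmpeg, imageio-ffmpeg) are omitted, and falling through when an explicit path does not exist is an assumption:

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def resolve_binary(name: str) -> Optional[str]:
    """Hypothetical resolver for 'ffmpeg' or 'ffprobe' following the
    documented order. Package fallbacks are intentionally left out."""
    # 1. Exact path via CUTAGENT_FFMPEG / CUTAGENT_FFPROBE
    explicit = os.environ.get(f"CUTAGENT_{name.upper()}")
    if explicit and Path(explicit).is_file():
        return explicit
    # (assumption: a missing explicit path falls through to the next step)

    # 2. Directory containing both binaries via CUTAGENT_FFMPEG_DIR
    bin_dir = os.environ.get("CUTAGENT_FFMPEG_DIR")
    if bin_dir:
        # shutil.which with an explicit search path also applies PATHEXT on Windows
        found = shutil.which(name, path=bin_dir)
        if found:
            return found

    # 3. System PATH
    return shutil.which(name)
```

For example, `resolve_binary("ffprobe")` returns the first match as a path string, or `None` when none of the three routes finds the binary.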
Platform-specific install (if not using cutagent[ffmpeg]):
| Platform | Command |
|---|---|
| macOS | brew install ffmpeg |
| Ubuntu/Debian | sudo apt install ffmpeg |
| Windows | winget install ffmpeg or choco install ffmpeg |
Verify your setup:

```bash
cutagent doctor
```

This checks for ffmpeg/ffprobe, reports their versions, and flags any issues.
```python
from cutagent import crop, execute_edl, probe, trim

# Inspect a video
info = probe("interview.mp4")
print(info.duration, info.width, info.height)

# Trim a segment
result = trim("interview.mp4", start="00:02:15", end="00:05:40", output="clip.mp4")

# Crop a 9:16 vertical center cut at full source height
crop_w = int(info.height * 9 / 16) // 2 * 2  # even width (libx264 with yuv420p needs even dimensions)
result = crop(
    "interview.mp4",
    x=(info.width - crop_w) // 2, y=0, width=crop_w, height=info.height,
    output="vertical.mp4",
)

# Execute a full edit decision list
edl = {
    "version": "1.0",
    "inputs": ["interview.mp4"],
    "operations": [
        {"op": "trim", "source": "interview.mp4", "start": "00:02:15", "end": "00:05:40"},
        {"op": "trim", "source": "interview.mp4", "start": "00:12:00", "end": "00:14:30"},
        {"op": "concat", "segments": ["$0", "$1"]},
    ],
    "output": {"path": "highlight.mp4", "codec": "copy"},
}
result = execute_edl(edl)
```

AI agents, start here: run `cutagent capabilities` to get the full machine-readable schema of all operations, a quality checklist, a phased workflow, and recipe examples for common video editing patterns.
```bash
# AI agents: start here to discover all operations, the workflow, and recipes
cutagent capabilities
```

CutAgent also supports a payload-first execution path for single operations:
```bash
# 1) Discover schemas at runtime
cutagent schema index
cutagent schema operation trim
cutagent schema edl

# 2) Dry-run a single operation payload (no media mutation)
cutagent op trim --dry-run --json '{
  "source": "input.mp4",
  "start": "00:00:01",
  "end": "00:00:05",
  "output": {"path": "clip.mp4", "codec": "copy"}
}'

# 3) Execute after validation
cutagent op trim --json '{
  "source": "input.mp4",
  "start": "00:00:01",
  "end": "00:00:05",
  "output": {"path": "clip.mp4", "codec": "copy"}
}'

# 4) Transform operations use the same payload-first flow
cutagent op resize --dry-run --json '{
  "source": "input.mp4",
  "width": 1080,
  "height": 1920,
  "fit": "contain",
  "output": {"path": "social.mp4", "codec": "libx264"}
}'
```

For large analysis responses, shape the output to protect the agent's context:
```bash
# Keep only selected fields
cutagent probe input.mp4 --fields path,duration,width,height

# Stream heavy list responses as NDJSON
cutagent scenes input.mp4 --response-format ndjson
cutagent keyframes input.mp4 --response-format ndjson --limit 100
cutagent beats input.mp4 --response-format ndjson --limit 100 --min-strength 1.0
```

Optional response sanitization for agent-facing reads:
```bash
cutagent execute edit.json --dry-run --sanitize-output basic
```

Analysis:

```bash
cutagent probe interview.mp4                        # Media metadata
cutagent summarize interview.mp4                    # Full content map (scenes + silence + suggested cuts)
cutagent scenes interview.mp4 --threshold 0.3       # Scene boundaries
cutagent silence interview.mp4                      # Silence intervals (dead air, pauses)
cutagent silence interview.mp4 --limit 50           # Limit large silence outputs
cutagent beats interview.mp4                        # Musical beats (for rhythm-aligned cuts)
cutagent beats interview.mp4 --min-strength 1.0     # Keep only stronger beats
cutagent keyframes interview.mp4                    # Keyframe positions
cutagent keyframes interview.mp4 --limit 100        # Limit large keyframe outputs
cutagent audio-levels interview.mp4                 # Audio levels over time
```

Editing:

```bash
cutagent trim interview.mp4 --start 00:02:15 --end 00:05:40 -o clip.mp4
cutagent split interview.mp4 --at 00:05:00,00:10:00 --prefix segment
cutagent concat clip1.mp4 clip2.mp4 -o merged.mp4
cutagent speed interview.mp4 --factor 2.0 -o fast.mp4
cutagent crop interview.mp4 --x 160 --y 0 --width 320 --height 480 -o portrait_crop.mp4
cutagent resize interview.mp4 --width 1080 --height 1920 --fit contain -o social.mp4
cutagent extract interview.mp4 --stream audio -o audio.aac
```

Audio:

```bash
cutagent normalize interview.mp4 -o normalized.mp4                              # EBU R128 loudness
cutagent mix interview.mp4 --audio music.mp3 --mix-level 0.2 -o with_music.mp4  # Background music
cutagent volume interview.mp4 --gain-db 6.0 -o louder.mp4                       # Volume adjustment
cutagent replace-audio interview.mp4 --audio voiceover.mp3 -o replaced.mp4      # Replace audio track
```

Text, motion graphics, and fades:

```bash
# Burn-in titles and lower-thirds
cutagent text interview.mp4 --entries-json '[{"text": "Interview Title", "position": "center", "font_size": 72, "start": "0", "end": "3"}]' -o titled.mp4

# Keyframe-driven animations (slide-in, fade-in)
cutagent animate interview.mp4 --layers-json '[{"type": "text", "text": "Hello", "start": 0, "end": 3, "properties": {"opacity": {"keyframes": [{"t": 0, "value": 0}, {"t": 0.5, "value": 1}]}}}]' -o animated.mp4

# Fade in/out for polished opening and closing
cutagent fade interview.mp4 --fade-in 1.0 --fade-out 1.0 -o faded.mp4
```

EDL workflow:

```bash
cutagent validate edit.json   # Dry-run validation
cutagent execute edit.json    # Execute the full edit
```

The Edit Decision List is a declarative JSON format for multi-step edits. Operations run sequentially; `$N` references the output of operation N (counting from 0), and `$input.K` references the K-th entry in `inputs`:
```json
{
  "version": "1.0",
  "inputs": ["interview.mp4", "broll.mp4", "background_music.mp3"],
  "operations": [
    {"op": "trim", "source": "$input.0", "start": "00:01:00", "end": "00:03:00"},
    {"op": "trim", "source": "$input.1", "start": "00:00:10", "end": "00:00:20"},
    {"op": "crop", "source": "$0", "x": 160, "y": 0, "width": 320, "height": 480},
    {"op": "resize", "source": "$2", "width": 1080, "height": 1920, "fit": "contain"},
    {"op": "normalize", "source": "$3"},
    {"op": "fade", "source": "$1", "fade_in": 0.5, "fade_out": 0.5},
    {"op": "concat", "segments": ["$4", "$5"], "transition": "crossfade", "transition_duration": 0.5},
    {"op": "mix_audio", "source": "$6", "audio": "$input.2", "mix_level": 0.15}
  ],
  "output": {"path": "final.mp4", "codec": "libx264"}
}
```

Available operations: `trim`, `split`, `concat`, `reorder`, `extract`, `fade`, `speed`, `crop`, `resize`, `mix_audio`, `volume`, `replace_audio`, `normalize`, `text`, `animate`
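Because operations run in order, `$N` can only name an operation that appears earlier in the list, and `$input.K` must index into `inputs`. Mistakes of that kind can be caught on the client before calling `cutagent validate`. The following is a hypothetical pre-check (`lint_edl` and `to_seconds` are illustrative names, not CutAgent APIs); it assumes references only appear in the `source`, `audio`, and `segments` fields used in the example above, and it also checks that each `trim` starts before it ends:

```python
import re
from typing import Any

# Matches "$input.K" (group 1) or "$N" (group 2)
REF = re.compile(r"^\$(?:input\.(\d+)|(\d+))$")


def to_seconds(tc: str) -> float:
    """Convert 'HH:MM:SS(.fff)', 'MM:SS', or plain seconds to float seconds."""
    seconds = 0.0
    for part in str(tc).split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def lint_edl(edl: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means none were found."""
    problems: list[str] = []
    n_inputs = len(edl.get("inputs", []))
    for i, op in enumerate(edl.get("operations", [])):
        refs = [op.get("source"), op.get("audio"), *op.get("segments", [])]
        for ref in refs:
            if not isinstance(ref, str) or not ref.startswith("$"):
                continue  # absent field or a literal file path
            m = REF.match(ref)
            if m is None:
                problems.append(f"op {i}: malformed reference {ref!r}")
            elif m.group(1) is not None and int(m.group(1)) >= n_inputs:
                problems.append(f"op {i}: {ref} exceeds the {n_inputs} inputs")
            elif m.group(2) is not None and int(m.group(2)) >= i:
                problems.append(f"op {i}: {ref} is not an earlier operation")
        if op.get("op") == "trim" and "start" in op and "end" in op:
            if to_seconds(op["start"]) >= to_seconds(op["end"]):
                problems.append(f"op {i}: trim start must be before end")
    return problems
```

Applied to the example EDL above, `lint_edl` returns an empty list; CutAgent's own `validate` command remains the authoritative check.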
CutAgent exposes tool schemas and CLI commands designed for LLM tool use and MCP integration. Use `cutagent.tools` to get JSON schema definitions for your agent's tool registry, then invoke the CLI and parse the structured output:

```python
import json
import subprocess

from cutagent.tools import dump_all_schemas

# Get tool definitions for your LLM
schemas = json.loads(dump_all_schemas())

# Invoke the CLI and parse its JSON output
result = subprocess.run(
    ["cutagent", "probe", "video.mp4"],
    capture_output=True, text=True, check=False,
)
info = json.loads(result.stdout)

# Validate an EDL before executing it
subprocess.run(["cutagent", "validate", "edit.json"], check=True)

# Runtime schema introspection from the CLI
subprocess.run(["cutagent", "schema", "operation", "trim"], check=True)
```

CutAgent doesn't capture screens; FFmpeg (its underlying engine) handles that part. Capture with FFmpeg, then hand the file straight to CutAgent for post-production.
macOS (avfoundation):

```bash
# List available devices first; indices vary between machines
ffmpeg -f avfoundation -list_devices true -i ""
# Record screen (video device 1) with audio device 0, usually the built-in
# microphone (system audio needs a loopback device); -t 300 stops after 5 minutes
ffmpeg -f avfoundation -i "1:0" -t 300 screen.mp4
```

Linux (x11grab):

```bash
# Full-screen capture of display :0.0 at 1920×1080, 30 fps
ffmpeg -f x11grab -video_size 1920x1080 -framerate 30 -i :0.0 -t 300 screen.mp4
```

Windows (gdigrab):

```bash
# Full desktop capture at 30 fps
ffmpeg -f gdigrab -framerate 30 -i desktop -t 300 screen.mp4
```

After recording, the typical cleanup steps are silence detection (to find dead air at the start, at the end, or during pauses), trimming, and audio normalization:
```bash
# Inspect the recording
cutagent probe screen.mp4

# Find silence intervals (dead air, pauses)
cutagent silence screen.mp4 --threshold -35 --min-duration 0.5

# Get a full content map (scenes + silence + suggested cuts)
cutagent summarize screen.mp4

# Trim to the content window (remove intro/outro dead air)
cutagent trim screen.mp4 --start 00:00:02.1 --end 00:08:43.7 -o content.mp4

# Normalize audio loudness for streaming/sharing
cutagent normalize content.mp4 -o final.mp4
```

This example auto-detects silence boundaries and builds the full post-processing pipeline programmatically:
```python
from cutagent import detect_silence, execute_edl, probe
from cutagent.models import format_time

recording = "screen.mp4"
duration = probe(recording).duration

# Detect silence intervals across the whole recording
silences = detect_silence(recording, threshold=-35.0, min_duration=0.5)

# Only silences that touch the start or end of the file count as dead air;
# a pause in the middle must not shift the content window.
# EDGE_TOLERANCE (seconds) is an illustrative value, not a CutAgent setting.
EDGE_TOLERANCE = 0.25
start_s = silences[0].end if silences and silences[0].start <= EDGE_TOLERANCE else 0.0
end_s = silences[-1].start if silences and silences[-1].end >= duration - EDGE_TOLERANCE else duration

# Build and execute the EDL: trim dead air → normalize audio
edl = {
    "version": "1.0",
    "inputs": [recording],
    "operations": [
        {"op": "trim", "source": "$input.0", "start": format_time(start_s), "end": format_time(end_s)},
        {"op": "normalize", "source": "$0", "target_lufs": -16.0},
    ],
    "output": {"path": "final.mp4", "codec": "libx264"},
}
result = execute_edl(edl)
print(result.to_dict())
```

The same pipeline can also be written as a static EDL file, here with fixed timecodes and a short text overlay added:

```json
{
  "version": "1.0",
  "inputs": ["screen.mp4"],
  "operations": [
    {"op": "trim", "source": "$input.0", "start": "00:00:02.1", "end": "00:08:43.7"},
    {"op": "normalize", "source": "$0", "target_lufs": -16.0},
    {"op": "text", "source": "$1",
     "entries": [{"text": "Demo", "position": "bottom-right", "font_size": 32,
                  "start": "0", "end": "5", "font_color": "white"}]}
  ],
  "output": {"path": "final.mp4", "codec": "libx264"}
}
```

Architecture:

```text
┌──────────────────────────────────────────────────────────────────┐
│ cutagent (CLI / Python API)                                      │
├──────────────────┬─────────────────┬─────────────────────────────┤
│ cli/__init__.py  │ engine.py       │ validation.py               │
│ CLI composition  │ EDL execution   │ Dry-run validation          │
├──────────────────┼─────────────────┼─────────────────────────────┤
│ probe.py         │ operations.py   │ models.py                   │
│ Media analysis   │ Video ops       │ Typed dataclasses           │
│ + beat detect    │ audio_ops.py    │                             │
│                  │ Audio ops       │                             │
├──────────────────┴─────────────────┴─────────────────────────────┤
│ ffmpeg.py (subprocess wrappers)    │ errors.py (error codes)     │
└──────────────────────────────────────────────────────────────────┘
```
- `ffmpeg.py` is the only module that spawns subprocesses
- `models.py` and `errors.py` have zero internal dependencies
- All public functions return typed dataclasses, never raw dicts
- The CLI outputs JSON exclusively, designed for machine consumption
The CLI signals the outcome of every command with these exit codes:

| Exit code | Meaning |
|---|---|
| 0 | Success |
| 1 | Validation error (bad input, invalid EDL) |
| 2 | Execution error (FFmpeg failed) |
| 3 | System error (FFmpeg not found, permissions) |
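An agent can branch on the exit code and then read the structured JSON error body for recovery hints. The sketch below is a hypothetical wrapper, not part of CutAgent: `interpret`, `run_op`, and `validate_then_run` are illustrative names, it assumes the JSON body (including error responses) is printed on stdout with stderr as a fallback, and it reuses the `cutagent op <name> --dry-run --json` flow shown earlier to check a payload before touching any media:

```python
import json
import subprocess
from typing import Any

# Exit codes from the table above
EXIT_KINDS = {0: "success", 1: "validation", 2: "execution", 3: "system"}


def interpret(returncode: int, stdout: str, stderr: str = "") -> dict[str, Any]:
    """Classify a CLI result. Assumes the JSON body is on stdout, else stderr."""
    text = stdout.strip() or stderr.strip()
    try:
        body = json.loads(text) if text else {}
    except json.JSONDecodeError:
        body = {"message": text}  # non-JSON output, e.g. a shell-level failure
    if not isinstance(body, dict):
        body = {"data": body}
    return {
        "ok": returncode == 0,
        "kind": EXIT_KINDS.get(returncode, "unknown"),
        "code": body.get("code"),
        "recovery": body.get("recovery", []),
        "body": body,
    }


def run_op(op: str, payload: dict[str, Any], dry_run: bool = False) -> dict[str, Any]:
    """Invoke `cutagent op <op> [--dry-run] --json <payload>` and classify the result."""
    args = ["cutagent", "op", op]
    if dry_run:
        args.append("--dry-run")
    args += ["--json", json.dumps(payload)]
    proc = subprocess.run(args, capture_output=True, text=True, check=False)
    return interpret(proc.returncode, proc.stdout, proc.stderr)


def validate_then_run(op: str, payload: dict[str, Any]) -> dict[str, Any]:
    """Execute only if the dry run succeeds; otherwise return its diagnosis."""
    check = run_op(op, payload, dry_run=True)
    return run_op(op, payload) if check["ok"] else check
```

With this split, a `"validation"` result can feed its `recovery` hints into the agent's next attempt, while `"system"` usually points at the FFmpeg setup (which `cutagent doctor` checks) rather than at the payload.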
Every error includes a code, a message, recovery suggestions, and context:

```json
{
  "error": true,
  "code": "TRIM_BEYOND_DURATION",
  "message": "End time 01:00:00 (3600.000s) exceeds duration (120.500s)",
  "recovery": [
    "Source duration is 120.500s — set end to 120.500 or less",
    "Run 'cutagent probe <file>' to check the actual duration"
  ],
  "context": {"source": "clip.mp4", "duration": 120.5, "end": "01:00:00"}
}
```

Contributions are welcome! Please read CONTRIBUTING.md for guidelines on:
- Setting up the development environment
- Architecture principles and code style
- Adding new operations
- The JSON output contract
