watch-video

Give a coding agent the ability to "watch" a video — entirely on your own machine.

watch-video turns a Loom, a YouTube / Vimeo / TikTok / Instagram / X link (or any of the ~1000+ sites yt-dlp supports), or a local file into artifacts an LLM agent can actually read:

a timestamped transcript (local Whisper),
a deduplicated set of key frames (so static screencasts don't waste tokens),
OCR of on-screen text (so financial figures, dashboard numbers, and labels that are shown but never spoken survive), and
a timeline.md that interleaves each frame with what was said and what was on screen at that moment.

It ships both as a command-line tool and as a Claude Code / Codex plugin (the agent runs it for you and reads the result).

Everything runs locally (yt-dlp + ffmpeg + faster-whisper + tesseract). No video or audio is sent to any cloud service. This is the whole point: recordings often contain private or financial data.

Version: 1.1.1 · License: MIT

Why

Coding agents increasingly receive screen recordings instead of written tickets — usually a Loom where someone points at a dashboard and says "change this number." Those recordings carry sparse, non-technical instructions and on-screen figures that are never spoken. To act on one, an agent needs the video as readable text + frames.

Existing video plugins either send audio to a cloud transcription API (a non-starter for private/financial recordings) or have no OCR and no frame de-duplication. watch-video is built for the local + Loom-native + on-screen-numbers case. See docs/WHY.md for the full rationale and a feature comparison, and docs/REFERENCE-PLUGINS.md for prior art / credits.

Install

Dependencies

Tool	Required?	Purpose	Install
`uv`	✅	runs the single-file script + its inline deps	`curl -LsSf https://astral.sh/uv/install.sh \| sh`
`ffmpeg`	✅	frame extraction	`brew install ffmpeg` / `apt install ffmpeg` / `dnf install ffmpeg`
`tesseract`	optional	OCR of on-screen text	`brew install tesseract` / `apt install tesseract-ocr` / `dnf install tesseract`

yt-dlp, faster-whisper, Pillow, numpy, pytesseract are pulled automatically by uv on first run (pinned to Python 3.10–3.12). Or run the bootstrap:

python3 scripts/setup.py        # installs uv + ffmpeg (required) and tesseract (optional)
python3 scripts/setup.py --check  # verify only, no install

(On Windows, use python or py if python3 isn't on your PATH.)
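For a sense of what a verify-only pass involves, here is a minimal sketch that looks up the tools from the dependency table on PATH. It is not the code in scripts/setup.py; the function names `check_dependencies` and `missing_required` are hypothetical, and the real `--check` may verify more than presence on PATH.

```python
import shutil

# Required/optional split taken from the dependency table above.
REQUIRED = ("uv", "ffmpeg")
OPTIONAL = ("tesseract",)


def check_dependencies(which=shutil.which):
    """Return {tool: resolved path or None} for every tool in the table.

    `which` is injectable so the lookup can be exercised without the
    tools actually being installed.
    """
    return {tool: which(tool) for tool in REQUIRED + OPTIONAL}


def missing_required(found):
    """List the required tools that were not found on PATH."""
    return [tool for tool in REQUIRED if not found.get(tool)]
```

Calling `missing_required(check_dependencies())` returns an empty list when both `uv` and `ffmpeg` resolve; a missing `tesseract` only means OCR is unavailable.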

Platform support

Platform	Status	Notes
Linux	✅ Supported	Primary dev/test platform.
macOS	✅ Supported	`setup.py` uses Homebrew; all deps available. All paths are POSIX.
Windows (incl. Claude CLI in PowerShell)	⚠️ Should work, not yet tested	The plugin invokes the tool through `uv run` (the one required dep), so it does not depend on a `python3` on PATH, and all paths use `pathlib`. `uv`, `ffmpeg`, `tesseract`, `yt-dlp`, and `faster-whisper` all have Windows builds (`setup.py` uses `winget`). Two caveats: (1) run the bare CLI as `uv run --script watch-video …`, because the `./watch-video` shebang form is POSIX-only; (2) the dev scripts `tests/*.sh` and `scripts/build-skill.sh` are Bash, so they need Git Bash or WSL (end users don't run these).

⚠️ Windows has not been tested. The plugin install/run path is built on uv + pathlib specifically to be cross-platform, and Windows should work, but the maintainers have not verified it (macOS is also not maintainer-smoke-tested). If you hit any issue on Windows (or macOS), please open an issue and we'll try to address it.

A. As a Claude Code plugin (recommended for agents)

This repository is a single-plugin marketplace (.claude-plugin/marketplace.json at its root). In Claude Code:

/plugin marketplace add webdevbar/watch-video
/plugin install watch-video@watch-video

(or /plugin marketplace add /path/to/watch-video for a local clone).

Non-interactive equivalents (handy for scripting/CI): claude plugin marketplace add webdevbar/watch-video, then claude plugin install watch-video@watch-video. Run claude plugin validate . to check the manifests.

Then invoke the namespaced command on any video:

/watch-video:watch https://www.loom.com/share/XXXX
/watch-video:watch ./recording.mp4 --no-ocr

The command runs the bundled CLI via a thin wrapper, applies sensible agent defaults (tuned OCR on), prints the output directory, and the skill instructs the agent to read the result in order. Dependencies: run python3 scripts/setup.py once (the plugin's SKILL.md documents this).

B. As a Codex skill

git clone https://github.com/webdevbar/watch-video ~/.codex/skills/watch-video
# or copy the built bundle (see Development → build-skill.sh)

The .codex-plugin/plugin.json manifest + shared SKILL.md drive it the same way.

C. As a CLI

git clone https://github.com/webdevbar/watch-video
cd watch-video
chmod +x watch-video
./watch-video <loom-url | any-url | local-file.mp4> [options]

The shebang is #!/usr/bin/env -S uv run --script, so uv handles the environment. On Windows (PowerShell/cmd) the shebang doesn't apply — run it explicitly:

uv run --script watch-video <loom-url | any-url | local-file.mp4> [options]

D. claude.ai web bundle (non-sensitive videos only)

scripts/build-skill.sh produces a watch.skill bundle you can upload at Settings → Capabilities → Skills. ⚠️ The hosted web sandbox runs the pipeline on Anthropic's servers, so the video leaves your machine — use the web surface for non-sensitive / public videos only. Sensitive recordings stay on Claude Code/Codex.

Quick start

Through an agent (Claude Code):

/watch-video:watch https://www.loom.com/share/abc123

As a CLI:

# a local file
./watch-video ./demo.mp4

# a public URL (YouTube, Vimeo, etc. — anything yt-dlp supports)
./watch-video "https://www.youtube.com/watch?v=XXXX"

# a private/team Loom or any login-gated source (reads your browser cookies locally)
./watch-video https://www.loom.com/share/XXXX --cookies-from-browser firefox

# denser sampling + sharper OCR for a number-heavy dashboard recording
./watch-video ./dashboard.mp4 --periodic 2 --ocr-tuned

Sources: local files, Loom, YouTube, Vimeo, TikTok, Instagram, X/Twitter, and any other site yt-dlp supports. Login-gated or private videos need --cookies-from-browser <browser> (or --cookies cookies.txt).

Output lands in ./watch-video-out/<slug>/ (override with --out). The CLI prints the path to SUMMARY.md on stdout; everything else goes to stderr.
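Because only the SUMMARY.md path goes to stdout, a calling script can capture stdout and let stderr pass through. The sketch below assumes the path is the last non-empty stdout line and that the script is invoked from the repository root via `uv run --script`; the helper names `summary_path_from_stdout` and `run_watch_video` are hypothetical.

```python
import subprocess
from pathlib import Path


def summary_path_from_stdout(stdout: str) -> Path:
    """Treat the last non-empty stdout line as the SUMMARY.md path (assumption)."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("watch-video printed nothing on stdout")
    return Path(lines[-1])


def run_watch_video(source: str, *extra_args: str) -> Path:
    """Run the CLI and return the SUMMARY.md path it reports."""
    result = subprocess.run(
        ["uv", "run", "--script", "watch-video", source, *extra_args],
        stdout=subprocess.PIPE,  # captured: expected to hold only the path
        stderr=None,             # inherited: progress and logs stay visible
        text=True,
        check=True,              # raise CalledProcessError on a non-zero exit
    )
    return summary_path_from_stdout(result.stdout)
```

The parent of the returned path is the run's output folder, for example `run_watch_video("./demo.mp4", "--no-ocr").parent`.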

Output (what the agent reads)

In ./watch-video-out/<slug>/:

File	What it is
`SUMMARY.md`	Read first. Run metadata + the exact list of artifacts produced + read order.
`timeline.md`	Each kept frame interleaved with the transcript line(s) spoken at that moment, plus an OCR snippet. The primary comprehension doc.
`transcript.md` / `.txt`	Full timestamped transcript.
`frames/*.jpg`	The deduplicated key frames (timestamp in the filename).
`frames/ocr-combined.md`	All on-screen text, per frame. Cross-check exact numbers against the frame image (OCR can misread; the image is ground truth).
`frames/ocr/*.txt`	Per-frame raw OCR.
`contact-sheet.jpg`	A montage overview of all kept frames.
`.watch-video.json`	Hidden run manifest (tool signature, version, source, created, downloaded-source). Used as the ownership marker for safe cleanup — not for reading.

Read order (skip any file a run didn't produce): SUMMARY.md → timeline.md → transcript.md → frames/*.jpg → frames/ocr-combined.md.
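The read order can be expressed as a small helper that returns only the files a run actually produced. This is a sketch, not part of the tool: `artifacts_in_read_order` is a hypothetical name, and sorting frames by filename assumes the timestamped names sort chronologically.

```python
from pathlib import Path


def artifacts_in_read_order(out_dir):
    """Return existing artifacts in the documented read order."""
    out = Path(out_dir)
    ordered = []

    # Top-level documents first; skip any the run did not produce.
    for name in ("SUMMARY.md", "timeline.md", "transcript.md"):
        path = out / name
        if path.is_file():
            ordered.append(path)

    # Then the key frames (assumed to sort chronologically by filename).
    frames = out / "frames"
    if frames.is_dir():
        ordered.extend(sorted(frames.glob("*.jpg")))

    # Finally the combined OCR text, if OCR ran.
    ocr = frames / "ocr-combined.md"
    if ocr.is_file():
        ordered.append(ocr)

    return ordered
```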

Options

Flag	Default	Description
`--out DIR`	`./watch-video-out/<slug>`	exact output folder for this run
`--model NAME`	`small`	Whisper model: `tiny`/`base`/`small`/`medium`/`large-v3`
`--periodic SECONDS`	`4`	sample a frame every N seconds
`--scene-threshold FLOAT`	`0.08`	extra frame on scene change (0–1)
`--max-frames N`	`200`	strict cap on kept frames (evenly thinned, endpoints kept)
`--dedupe-distance N`	`6`	perceptual-hash distance to treat frames as duplicates
`--ocr-tuned`	off	upscale + threshold + tuned PSM for sharper on-screen-number OCR
`--no-ocr`	—	skip OCR
`--no-timeline`	—	skip `timeline.md`
`--no-contact-sheet`	—	skip the montage
`--no-transcribe`	—	skip transcription (no model download)
`--no-source`	—	delete the downloaded source video after extraction (never a local input)
`--language LANG`	auto	force transcription language (e.g. `en`)
`--cookies-from-browser B`	—	private Loom auth via browser cookies (`firefox`/`chrome`/…)
`--cookies FILE`	—	a Netscape `cookies.txt` for yt-dlp
`--force`	—	allow writing into a non-empty directory not created by watch-video
`--clean …` / `--clean-older-than DAYS`	—	janitorial cleanup (see below)
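To make `--dedupe-distance` and `--max-frames` concrete, here is a minimal sketch of both ideas. It rests on assumptions the table does not confirm: hashes are plain integers, each frame is compared only against the last kept frame, a frame counts as a duplicate when the Hamming distance is at or below the threshold, and thinning picks evenly spaced indices. The function names are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def dedupe_by_hash(hashes, max_distance=6):
    """Return indices of frames to keep.

    A frame is dropped when its hash is within `max_distance` bits of the
    most recently kept frame (assumed comparison rule).
    """
    kept = []
    for i, h in enumerate(hashes):
        if not kept or hamming(h, hashes[kept[-1]]) > max_distance:
            kept.append(i)
    return kept


def thin_evenly(items, max_items):
    """Cap a list at `max_items`, keeping the first and last elements."""
    n = len(items)
    if max_items <= 0:
        return []
    if n <= max_items:
        return list(items)
    if max_items == 1:
        return [items[0]]
    # Evenly spaced positions from index 0 to index n - 1 inclusive.
    step = (n - 1) / (max_items - 1)
    return [items[round(i * step)] for i in range(max_items)]
```

A lower distance keeps more near-identical frames; a higher one collapses small on-screen changes, which matters when only a number changes on a dashboard.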

Safety: a run refuses to write into a non-empty directory it didn't create (use --force to override), so it never clobbers your files.
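The refusal rule can be sketched as a single predicate. This assumes the `.watch-video.json` manifest is what marks a directory as created by watch-video, which the Output table describes only as the ownership marker for cleanup; `may_write_into` is a hypothetical name.

```python
from pathlib import Path

MANIFEST_NAME = ".watch-video.json"


def may_write_into(out_dir, force=False):
    """Decide whether a run may write into `out_dir`."""
    out = Path(out_dir)
    if not out.exists():
        return True                      # will be created fresh
    if not out.is_dir():
        return False                     # a file is in the way
    if not any(out.iterdir()):
        return True                      # empty directory: nothing to clobber
    if (out / MANIFEST_NAME).is_file():
        return True                      # assumed marker of a previous run
    return force                         # foreign, non-empty: only with --force
```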

Cleanup & ephemeral mode

Output folders persist (re-readable, auditable) and are gitignored. To remove them:

./watch-video --clean <slug>            # delete one run's folder (under ./watch-video-out/)
./watch-video --clean ./path/to/dir     # delete an explicit folder (custom --out)
./watch-video --clean all               # delete all watch-video folders under ./watch-video-out/
./watch-video --clean-older-than 7      # delete watch-video folders older than 7 days

Cleanup only deletes folders carrying a valid .watch-video.json signature, removes only watch-video's own artifacts within them, and removes the folder itself only if it's then empty — your files are never deleted.
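The cleanup guarantees can be illustrated with a short sketch. Everything specific here is an assumption for illustration: the manifest key `"tool"` and its value are hypothetical, the artifact list is taken from the Output table, and removing `frames/` wholesale is a simplification (a stricter version would delete only files it recognizes inside it).

```python
import json
import shutil
from pathlib import Path

MANIFEST_NAME = ".watch-video.json"
# Artifact names from the Output table.
OWNED_FILES = ("SUMMARY.md", "timeline.md", "transcript.md",
               "transcript.txt", "contact-sheet.jpg")
OWNED_DIRS = ("frames",)


def has_valid_manifest(folder):
    """True if the folder carries a parseable manifest with the expected signature."""
    try:
        data = json.loads((Path(folder) / MANIFEST_NAME).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return False
    # Hypothetical signature field; the real manifest schema may differ.
    return isinstance(data, dict) and data.get("tool") == "watch-video"


def clean_run_folder(folder):
    """Remove watch-video artifacts; remove the folder only if it ends up empty."""
    folder = Path(folder)
    if not has_valid_manifest(folder):
        return False                       # not ours: touch nothing
    for name in OWNED_FILES:
        (folder / name).unlink(missing_ok=True)
    for name in OWNED_DIRS:
        shutil.rmtree(folder / name, ignore_errors=True)
    (folder / MANIFEST_NAME).unlink(missing_ok=True)
    if not any(folder.iterdir()):
        folder.rmdir()                     # nothing of the user's remains
    return True
```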

Ephemeral mode (/watch-video:watch … --ephemeral) is a two-step, agent-driven flow: the agent runs the tool, reads the artifacts into context, then runs watch-run.py --clean <dir> to delete them — nothing sensitive lingers on disk.

Privacy

Local-first. On Claude Code and Codex (your machine), transcription + OCR never leave the device. No cloud transcription backend is ever used.
Web surface caveat. The claude.ai web bundle runs in a hosted sandbox — for non-sensitive / public videos only.
Output is sensitive. Folders may contain private/financial data; they're .gitignored — never commit one. For private Loom links, cookies are read locally; never copy them into output/transcripts.

How it works

acquire (local file | yt-dlp)
   → extract frames (first + scene-change + periodic, via ffmpeg)
   → dedupe (perceptual hash; strict --max-frames cap)
   → transcribe (faster-whisper, local CPU int8)   [skippable]
   → OCR each frame (tesseract)                     [optional]
   → contact sheet
   → timeline.md (interleave frames + transcript + OCR)
   → SUMMARY.md + .watch-video.json manifest
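The interleave step is the least obvious part of the pipeline, so here is a minimal sketch of one way to attach transcript segments to frames by timestamp. The windowing rule (a segment belongs to the latest frame at or before its start time) and the names `Segment` and `interleave` are assumptions for illustration, not the tool's actual logic.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def interleave(frame_times, segments):
    """Pair each frame timestamp with the text spoken until the next frame.

    Returns a list of (frame_time, [texts]) in chronological order.
    """
    times = sorted(frame_times)
    rows = []
    for i, t in enumerate(times):
        # The first frame also absorbs anything spoken before it.
        lower = float("-inf") if i == 0 else t
        upper = times[i + 1] if i + 1 < len(times) else float("inf")
        spoken = [s.text for s in segments if lower <= s.start < upper]
        rows.append((t, spoken))
    return rows
```

Each row would then be rendered with the frame image and its OCR snippet to form one timeline entry.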

The transcription backend is isolated in one transcribe() function, so swapping in CUDA or whisper.cpp + Vulkan/Metal is a drop-in change. See docs/TRANSCRIPTION-BACKENDS.md.

The plugin layer is a thin wrapper (scripts/watch-run.py) around the unchanged CLI — one implementation of the pipeline, never duplicated.

A step-by-step user guide lives in docs/USER-GUIDE.md.

Development

bash tests/run_all.sh                 # full suite (offline; tesseract optional)
WV_TRANSCRIBE_TEST=1 bash tests/transcribe_test.sh   # opt-in: exercises real Whisper
bash scripts/build-skill.sh dist/watch.skill         # build the distributable bundle

Tests are offline bash integration + unit tests against a synthetic clip (no network once the uv cache is warm; --no-transcribe avoids the model download). The CLI is a single file (watch-video); the wrapper/bootstrap/builder live in scripts/.

Credits

Packaging modeled on, and with thanks to, mathiaschu/watch and bradautomates/claude-video. The OCR, frame de-duplication, timeline.md interleave, and local faster-whisper backend are watch-video's additions.

License

MIT — see LICENSE.
