OpenCode skill for transcribing online videos to text, enabling AI agents to learn from video content. Works with Bilibili (B站), YouTube, Vimeo, Twitch, and any platform supported by yt-dlp.
When you send a video link to your AI agent, this skill:
- Fetches existing subtitles if available (instant)
- Falls back to downloading audio and transcribing with local Whisper (1-2 min for typical videos)
- Returns the full transcript your agent can read and summarize
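
The subtitle-first, Whisper-fallback order described above can be sketched roughly as follows. This is an illustrative outline, not the skill's actual implementation: the function names (`subtitle_command`, `audio_command`, `get_transcript`), the output template, and the default language are all assumptions, and the two callables stand in for whatever actually runs yt-dlp and Whisper.

```python
from __future__ import annotations

from typing import Callable, Optional


def subtitle_command(url: str, lang: str = "en") -> list[str]:
    """Build a yt-dlp argv that fetches subtitles only (no media download)."""
    return [
        "yt-dlp", "--skip-download",
        "--write-subs", "--write-auto-subs",
        "--sub-langs", lang,
        "-o", "%(id)s.%(ext)s",  # hypothetical output template
        url,
    ]


def audio_command(url: str) -> list[str]:
    """Build a yt-dlp argv that extracts the audio track for transcription."""
    return ["yt-dlp", "-x", "--audio-format", "mp3", "-o", "%(id)s.%(ext)s", url]


def get_transcript(
    url: str,
    fetch_subtitles: Callable[[str], Optional[str]],
    transcribe: Callable[[str], str],
) -> tuple[str, str]:
    """Return (source, text): try existing subtitles, else fall back to Whisper."""
    text = fetch_subtitles(url)
    if text:  # subtitles exist: the fast path
        return "subtitles", text
    # No subtitles: download audio and transcribe locally (the slow path)
    return "whisper", transcribe(url)
```

Passing the two steps in as callables keeps the ordering logic separate from the tools that do the work, which also makes the fallback easy to exercise without network access.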
Bilibili (bilibili.com, b23.tv) · YouTube (youtube.com, youtu.be) · Vimeo · Twitch · and hundreds more
See references/setup.md for full installation instructions.
Quick check:

    yt-dlp --version && ffmpeg -version | head -1 && whisper --help > /dev/null && echo "All good"

CPU transcription with the faster-whisper small model (~4 cores):
| Video length | Time |
|---|---|
| 5 min | ~1 min |
| 15 min | ~3 min |
| 30 min | ~6.5 min |
| 1 hour | ~13 min |
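
For lengths between the rows above, a rough estimate can be interpolated from the table. The sketch below assumes transcription time grows roughly linearly between the measured points and scales proportionally outside them. That is an assumption rather than a measured result, and `estimate_minutes` is a hypothetical helper, not part of the skill.

```python
from bisect import bisect_left

# (video length in minutes, approximate transcription time in minutes),
# copied from the table above: faster-whisper small model on ~4 CPU cores.
BENCHMARKS = [(5.0, 1.0), (15.0, 3.0), (30.0, 6.5), (60.0, 13.0)]


def estimate_minutes(video_minutes: float) -> float:
    """Estimate CPU transcription time by interpolating the benchmark table."""
    if video_minutes <= 0:
        raise ValueError("video length must be positive")
    lengths = [length for length, _ in BENCHMARKS]
    if video_minutes <= lengths[0]:
        # Shorter than the first row: scale the first row's ratio
        length, cost = BENCHMARKS[0]
        return video_minutes * cost / length
    if video_minutes >= lengths[-1]:
        # Longer than the last row: scale the last row's ratio
        length, cost = BENCHMARKS[-1]
        return video_minutes * cost / length
    i = bisect_left(lengths, video_minutes)
    (x0, y0), (x1, y1) = BENCHMARKS[i - 1], BENCHMARKS[i]
    # Linear interpolation between the two surrounding rows
    return y0 + (y1 - y0) * (video_minutes - x0) / (x1 - x0)
```

For example, `estimate_minutes(45)` gives 9.75, halfway between the 30-minute and 1-hour rows. Actual times will vary with hardware and audio content.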
The video-toolkit MCP server (configured in opencode.jsonc) exposes the following native tools:
- `video-toolkit_get-transcript`: fetch existing subtitles (platform-dependent, instant)
- `video-toolkit_generate-subtitles`: AI transcription via local Whisper
- `video-toolkit_list-transcript-languages`: list available subtitle languages
- `video-toolkit_download-video`: download video to local storage
- `video-toolkit_list-downloads`: list downloaded videos
- `video-toolkit_transcribe-audio`: transcribe local audio files
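
In normal use the agent invokes these tools through OpenCode, so no hand-written requests are needed. Purely as an illustration of what happens underneath, the sketch below builds an MCP `tools/call` JSON-RPC request. It rests on two assumptions that are not confirmed here: that the `video-toolkit_` prefix is the server name joined to the tool name with an underscore, and that the tool accepts a `url` argument (a hypothetical parameter name; check the server's tool schema for the real one).

```python
from __future__ import annotations

import json


def split_tool_name(qualified: str) -> tuple[str, str]:
    """Split 'server_tool' at the first underscore (assumed naming scheme)."""
    server, _, tool = qualified.partition("_")
    return server, tool


def build_tool_call(qualified: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request for the given tool."""
    _, tool = split_tool_name(qualified)
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(request)


# "url" is a hypothetical argument name used only for illustration.
payload = build_tool_call(
    "video-toolkit_get-transcript",
    {"url": "https://www.youtube.com/watch?v=VIDEO_ID"},
)
```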
MIT