v0.7.0 — Audio & video (transcribe + media Q&A)

@MarcosNahuel MarcosNahuel released this 19 Jun 18:13

Gemini handles audio and video natively, which Claude Code does not. v0.7.0 offloads both to agy.

  • /agy:transcribe <audio|video|YouTube-URL> [focus] — faithful transcript + summary in the source language; timestamps for video/URLs. Voice notes, meetings, calls, screencasts. → docs/agy/transcripts/
  • /agy:media <file|URL> | <question> — multimodal Q&A over audio/video/image ("what decisions were made?", "what happens at 2:30?"), grounded with time references. → docs/agy/media/
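
As a rough illustration of the two syntaxes above, the invocations might look like the following. The file names, the `<VIDEO_ID>` placeholder, and the "action items" focus text are hypothetical; the two questions are the ones quoted above. Whether arguments need quoting is not stated in these notes, so treat the exact spelling as an assumption.

```
# Transcribe a local voice note, with an optional focus (hypothetical file name)
/agy:transcribe voice-note.ogg action items

# Transcribe a YouTube video (placeholder URL)
/agy:transcribe https://www.youtube.com/watch?v=<VIDEO_ID>

# Ask a question about a recording; the "|" separates the source from the question
/agy:media meeting.mp4 | what decisions were made?
/agy:media https://www.youtube.com/watch?v=<VIDEO_ID> | what happens at 2:30?
```

Per the list above, transcripts should land in docs/agy/transcripts/ and media answers in docs/agy/media/.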

Verified on a real WhatsApp .ogg voice note and a YouTube video. 15 commands now, no Node runtime.