Speech-to-text via model.transcribe(audio, options?) — a third call
primitive alongside answer() and image(). One shared adapter posts
multipart audio to the OpenAI-compatible /audio/transcriptions
endpoint; registered for groq, mistral, and openai. Audio comes
from a file:// or data: URI; result is { status, text, language, durationSeconds, cost, timestamps }.
Factory path only for now — thin-gate has no /v1/transcription route
yet, so the cross-process client cannot transcribe.
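As a rough illustration of what posting multipart audio to an OpenAI-compatible /audio/transcriptions endpoint involves, here is a minimal sketch for Node 18+. The helper names (dataUriToBlob, buildTranscriptionForm, postTranscription) are hypothetical and are not the adapter's actual internals; only the form fields file, model, language, and prompt come from the OpenAI-compatible API. It handles plain base64 data: URIs only.

```javascript
// Sketch only: helper names are hypothetical, not the shared adapter's real code.

// Decode a base64 `data:` URI into a Blob (Blob is a global in Node 18+).
function dataUriToBlob(uri) {
  const match = /^data:([^;,]+);base64,(.*)$/s.exec(uri);
  if (!match) throw new Error("expected a base64 data: URI");
  const [, mime, payload] = match;
  return new Blob([Buffer.from(payload, "base64")], { type: mime });
}

// Build the multipart body for POST {baseUrl}/audio/transcriptions.
function buildTranscriptionForm(audioUri, { model, language, prompt } = {}) {
  if (!model) throw new Error("model is required");
  const form = new FormData();
  form.append("file", dataUriToBlob(audioUri), "audio.wav");
  form.append("model", model);
  if (language) form.append("language", language); // optional hint
  if (prompt) form.append("prompt", prompt);       // optional hint
  return form;
}

// Send the request; fetch adds the multipart boundary header on its own.
async function postTranscription(baseUrl, apiKey, form) {
  const res = await fetch(`${baseUrl}/audio/transcriptions`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!res.ok) throw new Error(`transcription failed: HTTP ${res.status}`);
  return res.json();
}
```

Keeping form construction separate from the network call means the request body can be checked in tests without an API key.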
mo transcribe <model> <audio-file> CLI command — MIME type guessed
from the extension (--mime to override), --language / --prompt
hints, --json output, duration/cost summary on stderr.
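Guessing the MIME type from the extension, with an explicit override, could look roughly like the sketch below. The extension table and the guessAudioMime name are assumptions for illustration; the CLI's real mapping and fallback may differ.

```javascript
// Hypothetical extension table; the CLI's real mapping may differ.
const AUDIO_MIME_BY_EXT = {
  ".mp3": "audio/mpeg",
  ".wav": "audio/wav",
  ".m4a": "audio/mp4",
  ".ogg": "audio/ogg",
  ".flac": "audio/flac",
  ".webm": "audio/webm",
};

// Return the override (the --mime value) when given, else guess from the extension.
function guessAudioMime(filePath, override) {
  if (override) return override;
  const dot = filePath.lastIndexOf(".");
  const ext = dot === -1 ? "" : filePath.slice(dot).toLowerCase();
  // Unknown or missing extensions fall back to a generic binary type.
  return AUDIO_MIME_BY_EXT[ext] ?? "application/octet-stream";
}
```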
Catalog support for transcription entries: type: "transcription", transcriptionPrice (USD per audio minute), "audio" in inputFormat. Cost is duration × per-minute price when the provider
reports duration, with a token-pricing fallback for OpenAI's gpt-4o-*-transcribe models (computeTranscriptionCost in _pricing.js).
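The cost rule above can be sketched as follows. This is not the actual computeTranscriptionCost from _pricing.js: the function name, the usage shape, and the per-token price fields (inputTokenPrice, outputTokenPrice, assumed to be USD per token) are assumptions; only transcriptionPrice (USD per audio minute) is taken from the catalog description.

```javascript
// Sketch only: everything except `transcriptionPrice` is a hypothetical name.
function estimateTranscriptionCost(entry, usage) {
  const { transcriptionPrice, inputTokenPrice, outputTokenPrice } = entry;

  // Preferred path: provider reported duration, so bill per audio minute.
  if (typeof usage.durationSeconds === "number" && typeof transcriptionPrice === "number") {
    return (usage.durationSeconds / 60) * transcriptionPrice;
  }

  // Fallback: no duration, but token counts and per-token prices are available.
  if (typeof usage.inputTokens === "number" && typeof inputTokenPrice === "number") {
    const outputCost = (usage.outputTokens ?? 0) * (outputTokenPrice ?? 0);
    return usage.inputTokens * inputTokenPrice + outputCost;
  }

  // Not enough information to price the call.
  return null;
}
```

Returning null rather than 0 in the last branch keeps "unknown cost" distinguishable from "free".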
Live transcription smoke tests (test/live/transcription.live.test.js),
key-gated per provider; the audio fixture is a generated sine WAV.
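A generated sine WAV fixture needs no binary file in the repository. The sketch below shows one way to build a 16-bit mono PCM WAV in memory with Node's Buffer; the makeSineWav name and its defaults (1 second, 440 Hz, 16 kHz) are assumptions, not the test suite's actual generator.

```javascript
// Sketch only: name and defaults are hypothetical.
function makeSineWav({ seconds = 1, frequency = 440, sampleRate = 16000 } = {}) {
  const numSamples = Math.floor(seconds * sampleRate);
  const dataBytes = numSamples * 2; // 16-bit mono: 2 bytes per sample
  const buf = Buffer.alloc(44 + dataBytes);

  // RIFF/WAVE header (44 bytes for plain PCM).
  buf.write("RIFF", 0, "ascii");
  buf.writeUInt32LE(36 + dataBytes, 4);   // file size minus the first 8 bytes
  buf.write("WAVE", 8, "ascii");
  buf.write("fmt ", 12, "ascii");
  buf.writeUInt32LE(16, 16);              // fmt chunk size
  buf.writeUInt16LE(1, 20);               // audio format: PCM
  buf.writeUInt16LE(1, 22);               // channels: mono
  buf.writeUInt32LE(sampleRate, 24);
  buf.writeUInt32LE(sampleRate * 2, 28);  // byte rate
  buf.writeUInt16LE(2, 32);               // block align
  buf.writeUInt16LE(16, 34);              // bits per sample
  buf.write("data", 36, "ascii");
  buf.writeUInt32LE(dataBytes, 40);

  // Sine samples at half amplitude to stay well inside the int16 range.
  for (let i = 0; i < numSamples; i++) {
    const sample = Math.sin((2 * Math.PI * frequency * i) / sampleRate);
    buf.writeInt16LE(Math.round(sample * 0x7fff * 0.5), 44 + i * 2);
  }
  return buf;
}
```

The resulting buffer can be written to a temporary file or base64-encoded into a data: URI.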
Changed
README repositioned around the gateway wedge; new "How it compares"
section (LiteLLM, Vercel AI SDK, OpenRouter, raw SDKs).