mohdel v0.111.0

@clbrge clbrge released this 10 Jun 10:25
· 7 commits to main since this release

Added

  • Speech-to-text via model.transcribe(audio, options?) — a third call
    primitive alongside answer() and image(). One shared adapter posts
    multipart audio to the OpenAI-compatible /audio/transcriptions
    endpoint; registered for groq, mistral, and openai. Audio comes
    from a file:// or data: URI; result is
    { status, text, language, durationSeconds, cost, timestamps }.
    Factory path only for now — thin-gate has no /v1/transcription route
    yet, so the cross-process client cannot transcribe.
  • mo transcribe <model> <audio-file> CLI command — MIME type guessed
    from the extension (--mime to override), --language / --prompt
    hints, --json output, duration/cost summary on stderr.
  • Catalog support for transcription entries: type: "transcription",
    transcriptionPrice (USD per audio minute), "audio" in
    inputFormat. Cost is duration × per-minute price when the provider
    reports duration, with a token-pricing fallback for OpenAI's
    gpt-4o-*-transcribe models (computeTranscriptionCost in
    _pricing.js).
  • Live transcription smoke tests (test/live/transcription.live.test.js),
    key-gated per provider; the audio fixture is a generated sine WAV.
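
The duration-based pricing rule from the catalog entry above can be sketched as follows. This is only an illustration of "duration × per-minute price", not the actual computeTranscriptionCost from _pricing.js: the function name estimateTranscriptionCost, its signature, and the 0.006 USD per minute price are all assumptions made for the example, and the token-pricing fallback is not modelled.

```javascript
// Hypothetical helper, NOT mohdel's computeTranscriptionCost.
// durationSeconds: audio duration as reported by the provider (may be missing).
// pricePerMinute:  the catalog's transcriptionPrice, in USD per audio minute.
function estimateTranscriptionCost(durationSeconds, pricePerMinute) {
  // Without a usable duration this rule cannot apply; return null so the
  // caller can try another pricing path (for example token-based pricing).
  if (!Number.isFinite(durationSeconds) || durationSeconds < 0) {
    return null;
  }
  // Convert seconds to minutes, then multiply by the per-minute price.
  return (durationSeconds / 60) * pricePerMinute;
}

// Assumed values: a 90-second clip at 0.006 USD per audio minute.
const cost = estimateTranscriptionCost(90, 0.006);
console.log(cost.toFixed(4));

// A provider that reports no duration yields null here.
const unknown = estimateTranscriptionCost(undefined, 0.006);
console.log(unknown);
```

Returning null rather than 0 when the duration is missing keeps "cost unknown" distinct from "free", which matters if a fallback pricing path is tried next.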

Changed

  • README repositioned around the gateway wedge, with a new "How it
    compares" section (LiteLLM, Vercel AI SDK, OpenRouter, raw SDKs).
  • package.json description rewritten in searcher vocabulary:
    LiteLLM-style unified API, provider names spelled out, per-call USD
    cost tracking, speech-to-text.