mohdel v0.111.0

clbrge released this 10 Jun 10:25

4040151

Added

Speech-to-text via model.transcribe(audio, options?) — a third call
primitive alongside answer() and image(). One shared adapter posts
multipart audio to the OpenAI-compatible /audio/transcriptions
endpoint; registered for groq, mistral, and openai. Audio comes
from a file:// or data: URI; result is
{ status, text, language, durationSeconds, cost, timestamps }.
Factory path only for now — thin-gate has no /v1/transcription route
yet, so the cross-process client cannot transcribe.
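
As a rough illustration of the multipart request described above, the sketch below decodes a base64 data: URI into a Blob and builds the form body for an OpenAI-compatible /audio/transcriptions endpoint. The helper names (dataUriToBlob, buildTranscriptionForm) are hypothetical and are not mohdel's adapter; the form fields file, model, language, and prompt follow the OpenAI transcription API. It assumes Node 18+ for the global Blob and FormData, and it leaves out file:// handling.

```javascript
// Hypothetical helpers for illustration only; not mohdel's shared adapter.

// Decode a base64 data: URI into a Blob that keeps the declared MIME type.
function dataUriToBlob(uri) {
  const comma = uri.indexOf(',');
  if (!uri.startsWith('data:') || comma === -1) {
    throw new Error('expected a data: URI');
  }
  const meta = uri.slice(5, comma).split(';');
  if (!meta.includes('base64')) {
    throw new Error('this sketch only handles base64 data: URIs');
  }
  const mime = meta[0] || 'application/octet-stream';
  const bytes = Buffer.from(uri.slice(comma + 1), 'base64');
  return new Blob([bytes], { type: mime });
}

// Build the multipart body; optional hints are only added when provided.
function buildTranscriptionForm(audioUri, modelId, options = {}) {
  const form = new FormData();
  form.append('file', dataUriToBlob(audioUri), 'audio');
  form.append('model', modelId);
  if (options.language) form.append('language', options.language);
  if (options.prompt) form.append('prompt', options.prompt);
  return form;
}
```

The resulting form could be sent with fetch as a POST to the provider's /audio/transcriptions URL, with a bearer token in the Authorization header; the base URL and key are provider-specific and not shown here.
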
mo transcribe <model> <audio-file> CLI command — MIME type guessed
from the extension (--mime to override), --language / --prompt
hints, --json output, duration/cost summary on stderr.
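
Guessing a MIME type from the extension, with an explicit override, might look like the sketch below. The extension table and the name guessAudioMime are assumptions made for illustration; the CLI's real mapping may cover different formats.

```javascript
// Hypothetical extension table; the CLI's actual mapping may differ.
const AUDIO_MIME_BY_EXT = new Map([
  ['wav', 'audio/wav'],
  ['mp3', 'audio/mpeg'],
  ['m4a', 'audio/mp4'],
  ['ogg', 'audio/ogg'],
  ['flac', 'audio/flac'],
  ['webm', 'audio/webm'],
]);

// An explicit override (as --mime would supply) wins over the guess.
function guessAudioMime(filePath, override) {
  if (override) return override;
  const dot = filePath.lastIndexOf('.');
  const ext = dot === -1 ? '' : filePath.slice(dot + 1).toLowerCase();
  return AUDIO_MIME_BY_EXT.get(ext) ?? 'application/octet-stream';
}
```
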
Catalog support for transcription entries: type: "transcription",
transcriptionPrice (USD per audio minute), "audio" in
inputFormat. Cost is duration × per-minute price when the provider
reports duration, with a token-pricing fallback for OpenAI's
gpt-4o-*-transcribe models (computeTranscriptionCost in
_pricing.js).
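
The pricing rule can be sketched as follows. This is not the computeTranscriptionCost implementation from _pricing.js. Only type, transcriptionPrice, and "audio" in inputFormat come from the notes above. The entry id, the prices, the array shape of inputFormat, and the inputPrice / outputPrice fields (assumed here to be USD per million tokens) are illustrative assumptions.

```javascript
// Illustrative catalog entry; the id and price are made up.
const exampleTranscriptionEntry = {
  id: 'example/transcriber',
  type: 'transcription',
  inputFormat: ['audio'],
  transcriptionPrice: 0.006, // USD per audio minute
};

// Hypothetical cost helper: prefer duration pricing, fall back to tokens.
function estimateTranscriptionCost(entry, usage = {}) {
  const { durationSeconds, inputTokens = 0, outputTokens = 0 } = usage;
  if (typeof durationSeconds === 'number' &&
      typeof entry.transcriptionPrice === 'number') {
    // duration (minutes) × per-minute price
    return (durationSeconds / 60) * entry.transcriptionPrice;
  }
  if (typeof entry.inputPrice === 'number' ||
      typeof entry.outputPrice === 'number') {
    // Assumed token fallback: prices are USD per million tokens.
    const input = inputTokens * (entry.inputPrice ?? 0);
    const output = outputTokens * (entry.outputPrice ?? 0);
    return (input + output) / 1e6;
  }
  return null; // not enough information to price the call
}
```

With this entry, a two-minute clip (durationSeconds: 120) is priced at 2 × 0.006 USD.
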
Live transcription smoke tests (test/live/transcription.live.test.js),
key-gated per provider; the audio fixture is a generated sine WAV.
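
A generated sine fixture avoids committing a binary audio file. The sketch below shows one way to build such a WAV in memory as 16-bit mono PCM; it is not the test suite's actual generator, and the function name and defaults (440 Hz, 1 second, 16 kHz) are assumptions.

```javascript
// Hypothetical fixture generator: 16-bit mono PCM sine wave in a WAV container.
function makeSineWav({ frequency = 440, seconds = 1, sampleRate = 16000 } = {}) {
  const numSamples = Math.floor(seconds * sampleRate);
  const dataBytes = numSamples * 2; // 2 bytes per 16-bit sample
  const buf = Buffer.alloc(44 + dataBytes);

  // RIFF header
  buf.write('RIFF', 0, 'ascii');
  buf.writeUInt32LE(36 + dataBytes, 4); // file size minus the first 8 bytes
  buf.write('WAVE', 8, 'ascii');

  // fmt chunk
  buf.write('fmt ', 12, 'ascii');
  buf.writeUInt32LE(16, 16);             // fmt chunk size for PCM
  buf.writeUInt16LE(1, 20);              // audio format 1 = PCM
  buf.writeUInt16LE(1, 22);              // channels: mono
  buf.writeUInt32LE(sampleRate, 24);
  buf.writeUInt32LE(sampleRate * 2, 28); // byte rate
  buf.writeUInt16LE(2, 32);              // block align
  buf.writeUInt16LE(16, 34);             // bits per sample

  // data chunk
  buf.write('data', 36, 'ascii');
  buf.writeUInt32LE(dataBytes, 40);
  for (let i = 0; i < numSamples; i++) {
    const sample = Math.sin((2 * Math.PI * frequency * i) / sampleRate);
    // Half amplitude keeps the tone well inside the int16 range.
    buf.writeInt16LE(Math.round(sample * 0.5 * 32767), 44 + i * 2);
  }
  return buf;
}
```

The returned Buffer can be written to disk or base64-encoded into a data: URI for a transcription call.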

Changed

README repositioned around the gateway wedge, with a new "How it compares"
section (LiteLLM, Vercel AI SDK, OpenRouter, raw SDKs).
package.json description rewritten in searcher vocabulary:
LiteLLM-style unified API, provider names spelled out, per-call USD
cost tracking, speech-to-text.
