
docs: regenerate API documentation#494

Merged
AlemTuzlak merged 1 commit into main from docs/auto-update-1776855558
Apr 23, 2026

Conversation

@github-actions
Contributor

Automated documentation update from release

@AlemTuzlak AlemTuzlak merged commit e33ee09 into main Apr 23, 2026
@AlemTuzlak AlemTuzlak deleted the docs/auto-update-1776855558 branch April 23, 2026 09:49
tombeckenham pushed a commit to tombeckenham/ai-tom that referenced this pull request Apr 23, 2026
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AlemTuzlak added a commit that referenced this pull request Apr 23, 2026
…ia + 3.1 Flash TTS, streaming generateAudio + hooks (#463)

* feat: add fal audio, speech, and transcription adapters

Adds falSpeech, falTranscription, and falAudio adapters to @tanstack/ai-fal,
completing fal's media coverage alongside image and video. Introduces a new
generateAudio activity in @tanstack/ai for music and sound-effect generation,
with matching devtools events and types.

Closes #328

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: add ElevenLabs TTS/music/SFX/transcription adapters and Gemini Lyria + 3.1 Flash TTS

Extends @tanstack/ai-elevenlabs (which already covers realtime voice) with
Speech, Music, Sound Effects, and Transcription adapters, each tree-shakeable
under its own import.

Adds Gemini Lyria 3 Pro / Clip music generation via a new generateAudio
adapter, plus the new Gemini 3.1 Flash TTS Preview model with multi-speaker
dialogue support.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: document fal audio, speech, and transcription adapters

Adds a new Audio Generation page, expands the fal adapter reference with
sections for text-to-speech, transcription, and audio/music, and adds fal
sections to the Text-to-Speech and Transcription guides.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: add example pages and tests for audio/tts providers

Expand the ts-react-chat example with provider tabs for OpenAI,
ElevenLabs, Gemini, and Fal on the TTS and transcription pages, plus a
new /generations/audio page covering ElevenLabs Music, ElevenLabs SFX,
Gemini Lyria, and Fal audio generation.

Add a Gemini TTS unit test and wire an audio-gen feature into the E2E
harness (adapter factory, API route, UI, fixture, and Playwright spec).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: apply automated fixes

* docs: lead audio generation guide with Gemini and ElevenLabs

Reorder the Audio Generation page so the direct Gemini (Lyria) and
ElevenLabs (music/sfx) adapters appear before fal.ai, and update the
environment variables + result-shape notes to cover all three providers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ts-react-chat): add audio home tile, sample prompts, and fal model selector

Expose an Audio tile on the welcome grid, offer one-click sample prompts
for every audio provider, and let the Fal provider pick between current
text-to-music models (default MiniMax v2.6). Threads a model override
through the audio API and server fn.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: apply automated fixes

* chore: split ElevenLabs audio adapters out to separate PR (#485)

Moves the new ElevenLabs TTS / Music / SFX / Transcription REST adapters
out of this PR into their own issue (#485) and branch
(`elevenlabs-audio-adapters`) so the fal + Gemini audio work can ship
independently. The follow-up PR will rebuild these adapters on top of
the official `@elevenlabs/elevenlabs-js` SDK rather than hand-rolled
fetch calls.

Removed from this branch:
- `packages/typescript/ai-elevenlabs/src/{adapters,utils,model-meta.ts}`
  and their tests (realtime voice code untouched)
- ElevenLabs sections in `docs/media/audio-generation.md`
- ElevenLabs entries in `examples/ts-react-chat` audio-providers catalog,
  server adapter factories, zod schemas, and default provider wiring
- `@tanstack/ai-elevenlabs` bump from the audio changeset

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: apply automated fixes

* fix(ai-fal, ai-gemini): audio adapter bug fixes

- ai-fal: replace `btoa(String.fromCharCode(...bytes))` with a chunked
  helper; the spread form throws RangeError on any realistic TTS clip
  (V8 arg limit ~65k).
- ai-gemini: honor `TTSOptions.voice` as a fallback for the prebuilt
  voice name, move `systemInstruction` inside `config` per the
  @google/genai contract, and wrap raw `audio/L16;codec=pcm` output in
  a RIFF/WAV container so the result is actually playable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
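The chunked-encoding fix described above can be sketched as follows; the helper name and chunk size are illustrative assumptions, not the actual @tanstack/ai-fal code:

```typescript
// Hypothetical chunked base64 encoder. Spreading a whole audio buffer into
// String.fromCharCode(...bytes) throws a RangeError once the byte count
// exceeds V8's argument limit (~65k), so we build the binary string in
// fixed-size chunks instead.
function bytesToBase64(bytes: Uint8Array, chunkSize = 0x8000): string {
  let binary = ''
  for (let i = 0; i < bytes.length; i += chunkSize) {
    // subarray() returns a view, not a copy, so each chunk stays cheap.
    binary += String.fromCharCode(...bytes.subarray(i, i + chunkSize))
  }
  return btoa(binary)
}
```

With a 32 KiB chunk size, each `String.fromCharCode` call stays safely under the argument limit while keeping the number of string concatenations small.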

* refactor(ts-react-chat): warn on rejected audio model overrides

Log a warning instead of silently swapping to the default when a client
sends a model id outside the provider's allowlist, so stale clients or
typo'd config ids are debuggable. Also correct the AudioProviderConfig
JSDoc to describe the models[] ordering as a non-binding UI convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
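A minimal sketch of the warn-instead-of-silently-swap behavior; the function name and signature below are assumptions for illustration, not the example's actual code:

```typescript
// Hypothetical resolver: accept an override only if it is in the provider's
// allowlist, otherwise log a warning and fall back to the default so stale
// clients or typo'd config ids remain debuggable.
function resolveModel(
  requested: string | undefined,
  allowed: string[],
  fallback: string,
): string {
  if (requested === undefined) return fallback
  if (allowed.includes(requested)) return requested
  console.warn(
    `Rejected audio model override "${requested}"; falling back to "${fallback}"`,
  )
  return fallback
}
```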

* feat: split generateAudio into generateMusic and generateSoundEffects

Replaces the unreleased generateAudio activity with two distinct activities so
music and sound-effects each have their own types, adapter kinds, provider
factories, and devtools events. This lets providers advertise only the
capabilities they support (Gemini Lyria is music-only; fal has distinct music
and SFX catalogs) and leaves room for kind-specific options without a breaking
change.

- Core: generateMusic/generateSoundEffects activities and MusicAdapter/
  SoundEffectsAdapter interfaces + bases; GeneratedAudio shared between
  MusicGenerationResult and SoundEffectsGenerationResult
- Events: music:request:* and soundEffects:request:* replace audio:*
- fal: falMusic + falSoundEffects factories sharing internal request/response
  helpers; FalMusic/FalSoundEffectsProviderOptions in model-meta
- Gemini: geminiMusic/createGeminiMusic/GeminiMusicAdapter (Lyria is music-only
  so no SFX counterpart)
- ts-react-chat: /generations/music and /generations/sound-effects routes
  backed by a shared AudioGenerationForm; split server fns and API routes
- E2E: music-gen + sound-effects-gen features, parameterized MediaAudioGenUI,
  split fixtures and specs (both feature support sets are empty since
  aimock 1.14 cannot mock Gemini's Lyria AUDIO modality)
- Docs: music-generation.md + sound-effects-generation.md; fal adapter docs
  split; changesets rewritten in place

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fixed type issue

* Delete terminal output

* revert: restore single generateAudio activity

Supersedes 1010e9b. The split into generateMusic + generateSoundEffects
doesn't hold up against fal's audio catalog: dozens of models span
audio-to-audio, voice-change/clone, enhancement, separation, isolation,
merge, and understanding, and individual models (e.g. stable-audio-25)
generate music AND sound effects. A single broader generateAudio activity
fits that reality.

Keeps the aimock Gemini-Lyria gap: audio-gen feature-support stays empty
because aimock 1.14 has no AUDIO-modality mock for generateContent — the
E2E is green by skipping rather than by hitting a mock that doesn't exist.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: enforce exactly one of url or b64Json on GeneratedImage and GeneratedAudio

Model GeneratedImage and GeneratedAudio on a shared mutually-exclusive GeneratedMediaSource union so the type rejects empty objects and objects that set both fields. Update the openai, gemini, grok, openrouter, and fal image adapters to construct results by branching on which field is present; openrouter and fal no longer synthesize a data URI on url when returning base64.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
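The mutually-exclusive shape described above can be modeled with a `never`-based union; the names mirror the commit message, but the exact definitions are assumptions:

```typescript
// Illustrative sketch: exactly one of url or b64Json must be set. An empty
// object, or an object setting both fields, fails to typecheck:
//   const bad: GeneratedMediaSource = {}                          // error
//   const both: GeneratedMediaSource = { url: 'x', b64Json: 'y' } // error
type GeneratedMediaSource =
  | { url: string; b64Json?: never }
  | { b64Json: string; url?: never }

// Runtime branch matching how the adapters construct results:
function sourceKind(s: GeneratedMediaSource): 'url' | 'b64Json' {
  return s.url !== undefined ? 'url' : 'b64Json'
}
```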

* ci: apply automated fixes

* chore(e2e): drop audio-gen scaffolding pending aimock support

The audio-gen feature set was empty because aimock cannot currently mock audio generation, so the Playwright spec ran against zero providers. Remove the dead scaffolding; the wiring can return once aimock audio support lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: add useGenerateAudio hook and streaming support for generateAudio

Closes the parity gap with the other media activities — audio generation
now has the same client-hook UX (connection + fetcher transports) as
image, speech, video, transcription, and summarize. Adds streaming to
generateAudio so it can ride the SSE transport, a matching
AudioGenerateInput type in ai-client, framework hooks in ai-react /
ai-solid / ai-vue / ai-svelte, unit tests, an updated ts-react-chat
example, and docs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ai-fal): translate duration per audio model

Fal audio models use different input field names for length: ElevenLabs
Music takes `music_length_ms` in milliseconds, Stable Audio 2.5 takes
`seconds_total`, and most others accept `duration`. The adapter was
passing a generic `duration` unconditionally, so the slider in the
example was silently ignored for ElevenLabs and Stable Audio.

Also: align the Gemini Lyria adapter with the API's MP3 default (only
send responseMimeType when the caller asks for WAV), expand the example
to include Lyria 3 Pro and a dedicated Fal SFX provider, and rename the
example's "Direct" mode to "Hooks" to better reflect what it demos.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
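The per-model duration translation described above might look roughly like this; the model-id matching is a simplified assumption, with field names taken from the commit message:

```typescript
// Hypothetical mapping of a generic duration (seconds) to each fal audio
// model's input field: ElevenLabs Music wants milliseconds in
// music_length_ms, Stable Audio 2.5 wants seconds_total, and most other
// models accept a plain duration.
function durationInput(
  modelId: string,
  seconds: number,
): Record<string, number> {
  if (modelId.includes('elevenlabs/music')) {
    return { music_length_ms: seconds * 1000 }
  }
  if (modelId.includes('stable-audio-25')) {
    return { seconds_total: seconds }
  }
  return { duration: seconds }
}
```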

* refactor(ai-gemini): rename GEMINI_LYRIA_MODELS to GEMINI_AUDIO_MODELS

Align the audio model constant and its re-export with the `generateAudio`
activity naming used across providers, and drop the unused duplicate
`GeminiLyriaModel` type — `GeminiAudioModel` is the single canonical type.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ai-gemini): address CR findings — constructor config, TTS model name, PCM channels, voice validation, image error surfacing

* fix(ai-fal): address CR findings — generateId entropy, fetch.ok guards, response-shape validation, size params, proxy+apiKey, content types

* fix(ai-fal): throw on unknown image response shape instead of returning empty

* fix(ai-image-adapters): fix double-wrapped errors, duplicate keys, signature mismatch, null guards

* fix(ai-gemini): address CR findings — test import, image model output meta, option filtering

* fix(example-ts-react-chat): blob URL revocation, route link, body validation, falsy duration render

* fix(ai-core): emit adapter-error events, consistent async, reordered base adapter ctor, type sync

* fix(ai-openrouter): drop redundant null guards that TS types already enforce

The defensive nullish-coalescing guards on response.choices and
img/img.imageUrl that the fix-loop added are impossible per the SDK type
signatures; eslint's no-unnecessary-condition correctly rejects them. Keep
only the typeof url !== 'string' check, which is a real runtime shape guard
(imageUrl.url is typed as string, but the provider may send a non-string in
rare degraded responses).

* fix: address CodeRabbit review feedback — SSE types, mime normalization, voice validation, etc.

Applies the reviewer-flagged changes that weren't load-bearing for the merge:

- event-client: AudioRequestCompletedEvent.audio is now a mutually-exclusive
  {url; never b64Json} | {b64Json; never url} union so consumers can't read
  both fields simultaneously, mirroring the GeneratedAudio contract in core.
- fal utils: extractUrlExtension now strips URL fragments and trailing
  slashes, parses via the URL API so a TLD like `.com` isn't mistaken for
  an extension, and only inspects the final path segment.
- fal utils: deriveAudioContentType returns `audio/aac` for aac, separated
  from the `m4a`/`mp4` → `audio/mp4` case.
- fal speech: prefer URL-derived extension when deriving `format`, and
  normalize `mpeg` → `mp3` so the field is a usable file extension.
- gemini audio: drop `negativePrompt` (not accepted by GenerateContentConfig)
  and `responseMimeType` (Lyria Clip rejects it, Pro returns MP3 by default)
  from the public provider options surface, and document that the generic
  `duration` option is ignored by Lyria (Clip is fixed at 30s, Pro takes
  duration via the natural-language prompt).
- gemini tts: multiSpeakerVoiceConfig.speakerVoiceConfigs length is now
  validated (1 or 2 speakers), partial user-supplied voiceConfig correctly
  falls back to the standard voice/'Kore' default, parsePcmMimeType tightens
  detection to exclude subtypes containing "wav" so containerized
  `audio/wav;codec=pcm` is no longer re-wrapped, and createGeminiSpeech /
  createGeminiAudio factory functions now spread config before the explicit
  apiKey argument so caller config can't silently override the API key.
- ts-react-chat API routes: replace zod 4's removed `.flatten()` with
  `z.treeifyError()` for validation error details.
- ts-react-chat audio route: `toAudioOutput` returns `null` per the
  `onResult` hook contract instead of throwing synchronously — failures
  are still surfaced via the hook's error state.
- Updates the tests affected by the above behavior changes.
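The URL-extension behavior from the fal utils bullet can be approximated as below; this is a sketch of the described behavior under stated assumptions, not the actual helper:

```typescript
// Illustrative extension derivation: parse via the URL API so the hostname
// (and a TLD like ".com") never reaches the extension check, look only at
// the final path segment, and ignore fragments, query strings, and
// trailing slashes.
function extractUrlExtension(raw: string): string | undefined {
  let pathname: string
  try {
    pathname = new URL(raw).pathname // drops #fragment and ?query for us
  } catch {
    return undefined // not an absolute URL
  }
  const segment = pathname.replace(/\/+$/, '').split('/').pop() ?? ''
  const dot = segment.lastIndexOf('.')
  // dot > 0 also rejects dotfile-style segments like ".hidden"
  return dot > 0 ? segment.slice(dot + 1).toLowerCase() : undefined
}
```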

* docs: document debug logging for new audio/speech/transcription activities

- debug-logging.md: list generateAudio/generateTranscription in Non-chat
  activities section; clarify that the `provider` category now applies to
  streaming generateAudio/generateSpeech/generateTranscription calls too.
- audio-generation.md, text-to-speech.md, transcription.md: add a single
  contextual callout at the moment a builder is most likely to need it
  (immediately before the Options table / next to Error Handling), pointing
  to the debug-logging guide.

* docs(skill): add audio/speech CR gotchas + debug-logging to media-generation skill

Agents hitting the new generateAudio/generateSpeech/generateTranscription
activities will run into:

- Gemini Lyria doesn't accept responseMimeType or negativePrompt via
  GenerateContentConfig — shape the prompt instead.
- Lyria 3 Clip is fixed 30s; Lyria 3 Pro reads duration from natural-language
  in the prompt, not the duration option. fal audio maps duration per-model.
- Gemini TTS multiSpeakerVoiceConfig is validated to 1 or 2 speakers.
- debug: DebugOption is threaded through every generate*() activity — reach
  for it instead of writing logging middleware.

Adds four Common Mistake entries, sources the debug-logging doc, and
cross-references the ai-core/debug-logging sub-skill.

* fix(ai-fal): decode data URL audio inputs to Blob for transcription

fal-client auto-uploads Blob/File inputs via fal.storage.upload but
passes strings through unchanged, so data URLs reached fal's API and
got rejected with 422 "Unsupported data URL". Decode data URL strings
to a Blob in buildInput so the auto-upload path handles them; plain
http(s) URLs still pass through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
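The decode step can be sketched as follows; the function name and placement are assumptions, but the behavior matches the commit description (data URLs become Blobs for the auto-upload path, plain http(s) URLs pass through unchanged):

```typescript
// Hypothetical sketch: turn a data URL string into a Blob so an
// auto-upload path (e.g. fal.storage.upload) can handle it; any other
// string is returned untouched.
function toUploadable(input: string): string | Blob {
  if (!input.startsWith('data:')) return input
  const comma = input.indexOf(',')
  if (comma < 0) return input // malformed data URL; pass through
  const header = input.slice(5, comma) // e.g. "audio/wav;base64"
  const data = input.slice(comma + 1)
  const mime = header.split(';')[0] || 'application/octet-stream'
  const bytes = header.includes(';base64')
    ? Uint8Array.from(atob(data), (c) => c.charCodeAt(0))
    : new TextEncoder().encode(decodeURIComponent(data))
  return new Blob([bytes], { type: mime })
}
```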

* docs: regenerate API documentation (#494)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Alem Tuzlak <t.zlak@hotmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>