Release Implement Text-to-Speech (TTS) UI and Enhanced Audio Playback · damianvtran/local-operator-ui

What's Changed

This release introduces comprehensive Text-to-Speech (TTS) capabilities to the Local Operator UI, allowing users to generate speech from agent messages and selected text. It includes new UI components for audio playback, integrates with the backend's new speech API, and enhances user interaction with new keyboard shortcuts and controls.

What does this change address? This is a new feature that enables audio output for agents and improves spoken-language interaction within the UI.
What are the key improvements or modifications?
- TTS Playback Controls: Added play/pause, stop, and replay functionality for agent messages, with caching of audio blobs.
- Text Selection to Speech: Implemented a new control that appears on text selection, allowing users to generate and play speech for selected text.
- New AudioAttachment Component: A dedicated React component for playing audio files, including progress, volume, and playback rate controls.
- Keyboard Shortcuts: Introduced Cmd/Ctrl + Shift + S to start speech-to-text recording and spacebar hold to record.
- Content Security Policy Update: Modified index.html to allow blob: sources in the media-src directive of the Content Security Policy, so that generated audio blobs can be played.
- Platform Detection: Added platform detection to display correct keyboard shortcuts (Cmd for macOS, Ctrl for others).
- Integration with Backend API: Connected the UI to the new /v1/tools/speech and /v1/agents/{agent_id}/speech endpoints.
- Zustand Speech Store: Created a new Zustand store (useSpeechStore) to manage speech playback state, audio caching, and loading.
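
To illustrate the platform-aware shortcut handling listed above, here is a minimal sketch in plain TypeScript. The function names (detectPlatform, recordingShortcutLabel, isRecordingShortcut) and the KeyLike type are hypothetical and chosen for illustration; the release notes do not show the actual implementation, which may detect the platform differently.

```typescript
// Hypothetical helpers, not the project's actual code.
type Platform = "mac" | "other";

// Minimal structural subset of a KeyboardEvent, so the sketch runs outside a browser.
interface KeyLike {
  key: string;
  metaKey: boolean;
  ctrlKey: boolean;
  shiftKey: boolean;
}

// Classify a platform string (for example navigator.platform or a user agent).
// Assumption: any string containing "mac" is treated as macOS.
function detectPlatform(platformString: string): Platform {
  return /mac/i.test(platformString) ? "mac" : "other";
}

// Build the label shown in the UI for the speech-to-text recording shortcut.
function recordingShortcutLabel(platformString: string): string {
  const modifier = detectPlatform(platformString) === "mac" ? "Cmd" : "Ctrl";
  return `${modifier} + Shift + S`;
}

// Check whether a key event matches Cmd/Ctrl + Shift + S on the given platform.
function isRecordingShortcut(event: KeyLike, platformString: string): boolean {
  const modifierHeld =
    detectPlatform(platformString) === "mac" ? event.metaKey : event.ctrlKey;
  return modifierHeld && event.shiftKey && event.key.toLowerCase() === "s";
}
```

In a browser, such helpers would typically be called with navigator.platform (or a user-agent string) and the event from a keydown listener.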

Impact

Does this change introduce any breaking changes? No breaking changes; it's additive and backward-compatible.
Are there any dependency updates? No new external dependencies.
Are there any performance or security implications? Performance depends on the backend speech API and network conditions. Security considerations include proper handling of audio data and API key authentication for speech generation.

PRs

feat: Implement Text-to-Speech (TTS) UI and Enhanced Audio Playback by @damianvtran in #67

Full Changelog: v0.11.0...v0.11.1
