LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
-
Updated
May 19, 2025 - Python
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
A simple, high-quality voice conversion tool focused on ease of use and performance.
Realtime AI speech with OpenAI Realtime API and Gemini Live API on Arduino ESP32 with Secure Websockets and Deno edge functions with >15 minutes uninterrupted conversations globally for AI toys, AI companions, AI devices and more
A lightning-fast, cross-platform AI chat application built with React Native.
High-quality and streaming Speech-to-Speech interactive agent in a single file. 只用一个文件实现的流式全双工语音交互原型智能体!
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
A desktop application that uses AI to translate voice between languages in real time, while preserving the speaker's tone and emotion.
A real-time speech-to-speech chatbot powered by Whisper Small, Llama 3.2, and Kokoro-82M.
MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction models along with training and inference code, covering but not limited to end-to-end speech interaction, end-to-end speech translation and speech recognition.
Speech-to-speech AI assistant with natural conversation flow, mid-speech interruption, vision capabilities and AI-initiated follow-ups. Features low-latency audio streaming, dynamic visual feedback, and works with local LLM/TTS services via OpenAI-compatible endpoints.
Samantha OS1 is a conversational AI assistant powered by the Realtime API from OpenAI
This is an on-CPU real-time conversational system for two-way speech communication with AI models, utilizing a continuous streaming architecture for fluid conversations with immediate responses and natural interruption handling.
An easy-to-use, fast, and easily integrable tool for evaluating audio LLM
Cascading voice assistant combining real-time speech recognition, AI reasoning, and neural text-to-speech capabilities.
svelte component for using the openai realtime api
Code for NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation".
🗣️ Real‑time, low‑latency voice, vision, and conversational‑memory AI assistant built on LiveKit and local LLMs ✨
💬 "Realtime" voice transcription and cloning using ElevenLabs's API.
Codename's rvc fork version 3, based on Applio.
Add a description, image, and links to the speech-to-speech topic page so that developers can more easily learn about it.
To associate your repository with the speech-to-speech topic, visit your repo's landing page and select "manage topics."