A lightweight voice assistant that uses Ollama for AI responses and Google Text-to-Speech (gTTS) for voice output, featuring real-time voice interaction, conversation memory, and audio processing optimizations.
## Features

### Voice Interaction
- Real-time speech detection using Silero VAD
- Whisper-based transcription (faster-whisper)
- Interruptible speech playback
- Background audio processing
### Enhanced Audio
- Google TTS with natural chunking
- FFmpeg-accelerated playback (1.15x speed - Optional)
- Audio queue prioritization system
- Automatic temp file cleanup
### Conversation Management
- Persistent conversation history (JSON)
- Context-aware prompting
- Model-specific system prompts
- Configurable history length
### Technical Features
- GPU acceleration support (CUDA)
- Multi-threaded audio processing
- Cross-platform compatibility
- Model selection interface
## Prerequisites

- Python 3.7+
- Ollama installed and running locally
- Internet connection (for Google TTS service)
- FFmpeg (optional - for audio speed adjustment)
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/ExoFi-Labs/OllamaGTTS.git
   cd OllamaGTTS
   ```

2. Create a virtual environment:

   ```bash
   python -m venv venv
   ```

3. Activate the virtual environment:

   - On Windows:

     ```bash
     venv\Scripts\activate
     ```

   - On macOS/Linux:

     ```bash
     source venv/bin/activate
     ```

4. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```

   > [!NOTE]
   > To install PyAudio on macOS, install `portaudio` first:
   >
   > ```bash
   > xcode-select --install
   > brew install portaudio
   > ```

## Ollama Setup

If you haven't already installed Ollama, follow the instructions at Ollama's official website.
Make sure you have at least one model downloaded:

```bash
ollama pull llama3.3
```

or any other model of your choice.
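To sanity-check which models your local Ollama instance has available, its REST API exposes a `GET /api/tags` endpoint that returns the installed models as JSON. A minimal parsing sketch (the helper names here are my own, not part of this project):

```python
import json
import urllib.request


def list_model_names(tags_response):
    """Extract model names from an Ollama /api/tags JSON response."""
    return [m["name"] for m in tags_response.get("models", [])]


def fetch_installed_models(host="http://localhost:11434"):
    """Query a running local Ollama instance for its installed models."""
    with urllib.request.urlopen(host + "/api/tags") as resp:
        return list_model_names(json.load(resp))


# Abridged example of the /api/tags response shape:
sample = {"models": [{"name": "llama3.3:latest"}, {"name": "mistral:latest"}]}
print(list_model_names(sample))
```

If `fetch_installed_models()` returns an empty list, the application will have no models to offer at startup.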
## FFmpeg Installation (Optional)

FFmpeg is used to adjust the speed of audio playback. The application works without FFmpeg, but audio will play at normal speed.

- Windows: Download from FFmpeg's official website and add it to your PATH
- macOS: Install using Homebrew:

  ```bash
  brew install ffmpeg
  ```

- Ubuntu/Debian: Install using apt:

  ```bash
  sudo apt install ffmpeg
  ```

- Other Linux: Use your distribution's package manager

> [!NOTE]
> Currently untested on macOS and Ubuntu.
## Usage

1. Run the application:

   ```bash
   python ollama_gttsg.py
   ```

2. Select a model from the list of available models
3. Enter a system message, or press Enter to use the model's default
4. Start your conversation with the assistant
5. Type your message and press Enter to send
6. Type `exit` or `quit` to end the conversation
## How It Works

- The application connects to your local Ollama instance and lists the available models
- When you send a message, it is streamed to the selected Ollama model
- As the response arrives, the text is chunked at natural pause points
- Each chunk is converted to speech using Google's TTS service
- Audio chunks are played back in order, with speed adjustment if FFmpeg is available
- Conversation history is stored for future context
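The chunking step above can be sketched as a split at sentence-like boundaries, with short pieces merged so very small fragments aren't sent to TTS one by one. This is a minimal illustration; the function name, regex, and merge threshold are my own assumptions, not the app's exact logic:

```python
import re

# Split after sentence-ending punctuation followed by whitespace.
PAUSE_BOUNDARY = re.compile(r"(?<=[.!?;:])\s+")


def chunk_at_pauses(text, max_len=200):
    """Split text at natural pause points, merging pieces up to max_len chars."""
    chunks = []
    for piece in PAUSE_BOUNDARY.split(text.strip()):
        if chunks and len(chunks[-1]) + len(piece) + 1 <= max_len:
            chunks[-1] = chunks[-1] + " " + piece
        else:
            chunks.append(piece)
    return [c for c in chunks if c]


# Each chunk could then be synthesized with gTTS, e.g.:
#   from gtts import gTTS
#   gTTS(chunk, lang="en").save("chunk.mp3")
print(chunk_at_pauses("Hello there! This is a test. Short bits merge."))
```

Chunking like this lets playback of the first sentence begin while the model is still streaming the rest of its response.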
## Configuration

To change the TTS voice language, modify the `lang` parameter in the `create_and_queue_audio` function. The default is English (`'en'`).

If you have FFmpeg installed, you can change the speech speed by modifying the `speed_factor` value in the `create_and_queue_audio` function. The default is 1.15 (15% faster than normal).
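The speed change itself can be done with FFmpeg's `atempo` audio filter. A hedged sketch of building and running such a command (the helper names and file paths are illustrative, not the app's actual code):

```python
import subprocess


def build_speed_command(src, dst, speed_factor=1.15):
    """Build an FFmpeg command that re-times audio via the atempo filter.

    A single atempo filter instance accepts factors between 0.5 and 2.0.
    """
    if not 0.5 <= speed_factor <= 2.0:
        raise ValueError("atempo supports 0.5-2.0 per filter instance")
    return ["ffmpeg", "-y", "-i", src, "-filter:a", f"atempo={speed_factor}", dst]


def speed_up(src, dst, speed_factor=1.15):
    """Run the command; requires ffmpeg on the system PATH."""
    subprocess.run(build_speed_command(src, dst, speed_factor), check=True)
```

Building the argument list as a Python list (rather than a shell string) avoids quoting issues with file paths that contain spaces.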
```python
vad_threshold = 0.5     # Speech detection sensitivity (0.3-0.7)
silence_duration = 1.0  # Seconds of silence to end speech
speed_factor = 1.15     # Playback speed multiplier
max_history = 10        # Number of exchanges to remember
```
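A sketch of how a capped, JSON-persisted conversation history might work, showing the effect of `max_history` (the function names and file path are illustrative assumptions, not the app's actual implementation):

```python
import json


def trim_history(history, max_history=10):
    """Keep only the most recent max_history exchanges."""
    return history[-max_history:]


def save_history(history, path="conversation_history.json"):
    """Persist the conversation history to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(history, f, ensure_ascii=False, indent=2)


def load_history(path="conversation_history.json"):
    """Load persisted history, or start fresh if none exists."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return []
```

Trimming before each request keeps the prompt sent to the model bounded while still providing recent context.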
## Troubleshooting

### Audio Issues

- Make sure your system's audio is not muted
- Check that pygame is properly installed
- Try restarting the application

### Model Issues

- Make sure you've downloaded at least one model using `ollama pull`

### FFmpeg Issues

- If you want audio speed adjustment, make sure FFmpeg is installed and available on your system PATH
- Without FFmpeg, the application will still work but will play audio at normal speed
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
## License

This project is licensed under the MIT License - see the LICENSE file for details.