Feel free to download any version you want; this README applies to all of them. All features work in versions after 1.4.
A voice-driven AI assistant that listens for spoken queries, performs local actions, optionally captures images for visual requests, and answers using Ollama chat models.
- Voice input with Whisper speech recognition
- Text-to-speech output using `edge-tts`
- Intelligent command handling with Ollama chat models
- App launching on macOS for `open <app>` voice commands
- Web search query simplification and results retrieval
- Vision-context detection and camera image capture for visual queries
- ANSI-enhanced console output with `rich`
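To illustrate the `open <app>` voice command, the sketch below shows one way a transcribed phrase could be split into a command and an app name. The function name `parse_open_command` and the regular expression are assumptions made for this example; the project's own parsing may work differently.

```python
import re
from typing import Optional

# Hypothetical pattern: an optional "please", the word "open", then the app name.
# Trailing punctuation added by the transcriber is ignored.
_OPEN_RE = re.compile(
    r"^\s*(?:please\s+)?open\s+(?P<app>.+?)\s*[.!?]*\s*$",
    re.IGNORECASE,
)


def parse_open_command(transcript: str) -> Optional[str]:
    """Return the app name from an 'open <app>' phrase, or None if it is not one."""
    match = _OPEN_RE.match(transcript)
    if match is None:
        return None
    return match.group("app").strip()
```

For example, `parse_open_command("Open Safari.")` returns `"Safari"`, while a phrase such as `"what time is it"` returns `None` and can be passed on to the chat model instead.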
- Install Ollama
- Sign in to Ollama
- Install ministral-3:3b-cloud, ministral-3:8b-cloud, and gemma4:31b-cloud
- macOS
- Python 3.11+ (recommended)
- Microphone access
- Camera access for image-based queries
- Installed Python packages:
  - `ollama`
  - `speech_recognition`
  - `edge_tts`
  - `ddgs`
  - `opencv-python`
  - `rich`
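A quick way to confirm that the packages listed above are importable is sketched below. It is only a convenience check and is not part of the project; the helper name `find_missing_modules` is hypothetical. Note that `opencv-python` is imported as `cv2`.

```python
import importlib.util

# Import names for the required packages (opencv-python is imported as cv2).
REQUIRED_MODULES = ["ollama", "speech_recognition", "edge_tts", "ddgs", "cv2", "rich"]


def find_missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

Run it inside the activated virtual environment so that it inspects the same interpreter the assistant will use.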
- Clone or copy this folder:

      cd /path/to/AI/OSS_speech_model/Phase-1/V1.7

- Create and activate a Python virtual environment:

      python3 -m venv venv
      source venv/bin/activate

- Install required packages:

      pip install ollama SpeechRecognition edge_tts ddgs opencv-python rich

- Make sure your environment has access to Ollama.
Run the assistant with:

    python main.py

The assistant will prompt:
- Choose a voice: `Male` or `Female`
- Speak a query after the prompt
- It will attempt to answer, launch apps, or capture an image if visual context is needed
- When finished, it asks whether you want to continue
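The voice choice in the first step could be mapped to an `edge-tts` voice roughly as follows. This is a minimal sketch: the two voice names, the `pick_voice` and `synthesize` helpers, and the `reply.mp3` output path are assumptions for illustration, not necessarily what `main.py` uses.

```python
# Assumed mapping from the prompt's choices to edge-tts voice names.
VOICES = {"male": "en-US-GuyNeural", "female": "en-US-JennyNeural"}


def pick_voice(choice: str, default: str = "female") -> str:
    """Map a 'Male'/'Female' answer to a voice name, falling back to the default."""
    return VOICES.get(choice.strip().lower(), VOICES[default])


async def synthesize(text: str, choice: str, out_path: str = "reply.mp3") -> str:
    """Write spoken audio for `text` to `out_path` using edge-tts (needs network access)."""
    import edge_tts  # imported lazily so pick_voice works without the package

    await edge_tts.Communicate(text, pick_voice(choice)).save(out_path)
    return out_path
```

Because `synthesize` is a coroutine, it would be called with something like `asyncio.run(synthesize("Hello", "Male"))`, after which the resulting file can be played back.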
- The current implementation uses `afplay` to play audio on macOS.
- App launching relies on `mdfind` and the local macOS application metadata.
- If the assistant cannot understand audio, it prompts again.
- Visual queries are captured and saved temporarily as `instant_photo.png` in the current folder.
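The notes above mention `mdfind` and `afplay`. The sketch below shows one way these tools could be driven from Python with `subprocess`. The helper names and the exact Spotlight query are assumptions for illustration; the project's own lookup may use a different query.

```python
import subprocess
from typing import List, Optional


def build_mdfind_query(app_name: str) -> List[str]:
    """Build an mdfind command that looks up an application bundle by file name."""
    # Drop quotes and backslashes so the name cannot break out of the query string.
    safe_name = app_name.replace('"', "").replace("\\", "")
    query = (
        'kMDItemContentType == "com.apple.application-bundle" && '
        f'kMDItemFSName == "{safe_name}.app"c'  # trailing c: case-insensitive match
    )
    return ["mdfind", query]


def find_app(app_name: str) -> Optional[str]:
    """Return the first .app path reported by mdfind, or None if nothing matches."""
    result = subprocess.run(
        build_mdfind_query(app_name), capture_output=True, text=True, check=False
    )
    paths = [line for line in result.stdout.splitlines() if line.endswith(".app")]
    return paths[0] if paths else None


def launch_app(app_name: str) -> bool:
    """Open the app with the macOS `open` command; return True on success."""
    path = find_app(app_name)
    if path is None:
        return False
    return subprocess.run(["open", path], check=False).returncode == 0


def play_audio(path: str) -> bool:
    """Play an audio file with afplay; return True if playback exited cleanly."""
    return subprocess.run(["afplay", path], check=False).returncode == 0
```

Passing the command as a list avoids shell quoting problems when an app name contains spaces, as in `launch_app("Visual Studio Code")`.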
- If speech recognition fails, verify microphone permissions and `speech_recognition` installation.
- If camera capture fails, verify camera permissions and that `opencv-python` can access the device.
- If audio playback fails, ensure `afplay` is available on macOS.
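The checks above can be partly automated. The following sketch is an optional diagnostic, not part of the project: `check_tools` and `camera_available` are hypothetical helpers, and the default tool list assumes the `ollama` command-line tool is on `PATH`.

```python
import shutil
from typing import Dict, Iterable


def check_tools(tools: Iterable[str] = ("afplay", "mdfind", "ollama")) -> Dict[str, bool]:
    """Report whether each command-line tool can be found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


def camera_available(index: int = 0) -> bool:
    """Try to open the camera and read a single frame with OpenCV."""
    import cv2  # imported lazily so check_tools works without opencv-python

    capture = cv2.VideoCapture(index)
    try:
        if not capture.isOpened():
            return False
        ok, _frame = capture.read()
        return bool(ok)
    finally:
        # Always release the device so other programs can use it.
        capture.release()
```

A `False` result from `camera_available()` on macOS often points to a missing camera permission for the terminal application rather than a problem with `opencv-python` itself.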