A privacy-first, full-stack local pipeline that transforms voice commands into technical actions. The assistant uses Llama 3 for orchestration and Whisper for transcription to execute code and manage files within a secure, local sandbox. It also saves sessions to Pinecone so they can be retrieved later.
The project follows a modular pipeline designed for low latency and high reliability:
- Input Layer: Captures real-time audio via `streamlit-mic-recorder`, uploaded audio files (`.wav`, `.mp3`), or direct text commands.
- STT Layer (OpenAI Whisper): Local transcription of audio into text. We use the `base` model with `fp16=False` to ensure stability across CPU/GPU configurations.
- Task Router (Ollama/Llama 3): A high-precision intent classifier that breaks down user requests into actionable JSON tasks.
- Execution Layer:
  - Logic Engine: Generates code or summaries based on detected intents.
  - File System: Automatically writes outputs to the local `/output` directory.
- Memory Layer (Pinecone RAG):
  - Uses `mxbai-embed-large` to vectorize interactions.
  - Stores data in the `voice-assist` index for long-term session persistence and retrieval.

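
The hand-off from the Task Router to the Execution Layer can be sketched as follows. This is a minimal illustration, not the code in `agent.py`: the task schema (an `intent` key per task), the function names `parse_tasks` and `write_output`, and the relative `output` path are all assumptions made for the example. It shows one way to pull a JSON task list out of a model reply and to keep generated files inside the output directory.

```python
import json
from pathlib import Path

OUTPUT_DIR = Path("output")  # assumed location of the sandbox directory


def parse_tasks(llm_reply: str) -> list[dict]:
    """Extract a JSON task list from a model reply (hypothetical schema)."""
    # Models often wrap JSON in extra prose, so slice out the outermost array.
    start, end = llm_reply.find("["), llm_reply.rfind("]")
    if start == -1 or end <= start:
        raise ValueError("no JSON array found in model reply")
    tasks = json.loads(llm_reply[start:end + 1])
    # Keep only well-formed entries that name an intent.
    return [t for t in tasks if isinstance(t, dict) and "intent" in t]


def write_output(filename: str, content: str, root: Path = OUTPUT_DIR) -> Path:
    """Write content under root, refusing any path that escapes it."""
    root = root.resolve()
    target = (root / filename).resolve()
    if not target.is_relative_to(root):  # rejects "../x" and absolute paths
        raise ValueError(f"refusing to write outside {root}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target
```

Resolving the path before the check is what blocks traversal attempts such as `../secrets.txt`; a plain string-prefix comparison would not catch them.
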
Before running the application, ensure you have the following installed:
- Ollama: The engine for running LLMs locally. Available from the official Ollama website.
- Python 3.10+: Ensure Python is added to your system PATH.
- Pinecone API Key: Required for the vector database. Get it from the Pinecone Console.

Open your terminal and pull the models required for logic and embeddings:

    ollama pull llama3:latest
    ollama pull mxbai-embed-large

Install all necessary libraries using the `requirements.txt` file:

    pip install -r requirements.txt

If you encounter library conflicts or version mismatches (common with PyTorch/Whisper), run the recovery script:

    python repair_env.py

Set your Pinecone API key in your environment variables or directly in `agent.py`:

    # In agent.py
    api_key = "YOUR_PINECONE_API_KEY"

- Cross-Platform STT: `fp16=False` is set in the Whisper configuration so transcription runs in FP32, which avoids half-precision problems on CPU-only and non-NVIDIA hardware.
- Local Vectorization: Embeddings are computed by Ollama on the local machine, so no text is sent to an external embedding service, which improves privacy and reduces latency. Session data is still stored in the Pinecone index.
- Self-Healing Index: The system checks whether the `voice-assist` index exists in Pinecone with the correct 1024 dimensions (the output size of `mxbai-embed-large`) and creates it if it does not.
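
The API key setup and the self-healing index check might look roughly like the sketch below. It is an illustration under stated assumptions rather than the logic in `agent.py`: the helper names `load_api_key` and `ensure_index`, the `PINECONE_API_KEY` variable name, the cosine metric, and the `recreate` flag are all hypothetical. The `pc` argument is expected to behave like a `pinecone.Pinecone` client, using only its `list_indexes`, `describe_index`, `create_index`, and `delete_index` methods.

```python
import os


def load_api_key(env_var: str = "PINECONE_API_KEY") -> str:
    """Read the key from the environment instead of hard-coding it."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"set {env_var} before starting the app")
    return key


def ensure_index(pc, spec, name: str = "voice-assist",
                 dimension: int = 1024, recreate: bool = False) -> str:
    """Create the index if it is missing and verify its dimension otherwise."""
    if name not in pc.list_indexes().names():
        pc.create_index(name=name, dimension=dimension,
                        metric="cosine", spec=spec)  # metric is an assumption
        return "created"
    found = pc.describe_index(name).dimension
    if found == dimension:
        return "ok"
    if not recreate:
        raise ValueError(f"index {name!r} has dimension {found}, "
                         f"expected {dimension}")
    # Destructive path: deleting the index drops every stored session.
    pc.delete_index(name)
    pc.create_index(name=name, dimension=dimension,
                    metric="cosine", spec=spec)
    return "recreated"


# Possible wiring with the real client (cloud and region are assumptions):
#   from pinecone import Pinecone, ServerlessSpec
#   pc = Pinecone(api_key=load_api_key())
#   ensure_index(pc, ServerlessSpec(cloud="aws", region="us-east-1"))
```

Raising on a dimension mismatch by default is a deliberate choice in this sketch: silently recreating the index would erase saved sessions, so that path is opt-in.
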

| File | Function |
|---|---|
| `app.py` | Streamlit frontend and terminal-themed UI logic. |
| `agent.py` | Backend logic, Task Router, and Pinecone integration. |
| `requirements.txt` | Comprehensive list of project dependencies. |
| `repair_env.py` | Environment diagnostic and fix script. |
| `/output` | Secure directory for AI-generated files and scripts. |

- Launch: `streamlit run app.py`
- Initialize: Click "Start New Session" in the sidebar.
- Command: Type or speak: "Write a python script to add two numbers and save it."
- Recall: Use the sidebar dropdown to reload any past session stored in Pinecone.
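
Behind the Recall step, fetching a stored session from Pinecone could work along these lines. This is a hedged sketch only: the function name `recall_session` and the metadata fields `session_id` and `text` are hypothetical, and the project may store and load sessions differently. The `index` argument is expected to behave like a Pinecone index object, and `embed` is any callable that turns text into a vector.

```python
from typing import Callable, Sequence


def recall_session(index, embed: Callable[[str], Sequence[float]],
                   query: str, session_id: str, top_k: int = 5) -> list[str]:
    """Return stored texts from one session, most similar to the query first."""
    response = index.query(
        vector=list(embed(query)),
        top_k=top_k,
        include_metadata=True,
        # Restrict results to a single session (hypothetical metadata field).
        filter={"session_id": {"$eq": session_id}},
    )
    # Skip matches that carry no usable metadata.
    return [m.metadata["text"] for m in response.matches
            if m.metadata and "text" in m.metadata]


# Possible embedding callable using the Ollama Python client:
#   import ollama
#   embed = lambda s: ollama.embeddings(model="mxbai-embed-large",
#                                       prompt=s)["embedding"]
```

Passing `embed` in as a parameter keeps the function independent of the embedding backend, which also makes it easy to exercise with a stub.
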