🎙️ Voice-Assistant (Memo AI)

A privacy-first, full-stack local pipeline that transforms voice commands into technical actions. The assistant uses Llama 3 for orchestration and Whisper for transcription to execute code and manage files within a secure, local sandbox. It also saves sessions to Pinecone so they can be reloaded later.


🏗️ System Architecture

The project follows a modular pipeline designed for low latency and high reliability:

  1. Input Layer: Captures real-time audio via streamlit-mic-recorder, uploaded audio files (.wav, .mp3), or direct text commands.
  2. STT Layer (OpenAI Whisper): Local transcription of audio into text. We use the base model with fp16=False to ensure stability across CPU/GPU configurations.
  3. Task Router (Ollama/Llama 3): An intent classifier that breaks each user request down into one or more actionable JSON tasks.
  4. Execution Layer:
    • Logic Engine: Generates code or summaries based on detected intents.
    • File System: Automatically writes outputs to the local /output directory.
  5. Memory Layer (Pinecone RAG):
    • Uses mxbai-embed-large to vectorize interactions.
    • Stores data in the voice-assist index for long-term session persistence and retrieval.
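
The Task Router step depends on turning a free-form model reply into structured tasks. The following is a minimal sketch of that parsing step, not the project's actual code: the intent labels, the `intent`/`payload` keys, and the `parse_tasks` name are all hypothetical, and it assumes the model is prompted to reply with a JSON array.

```python
import json

# Hypothetical intent labels; the project's real label set is not documented here.
ALLOWED_INTENTS = {"write_code", "summarize", "create_file", "chat"}


def parse_tasks(raw_reply: str) -> list[dict]:
    """Extract task dicts from a model reply that should contain a JSON array.

    Falls back to a single "chat" task when no usable JSON is found.
    """
    fallback = [{"intent": "chat", "payload": raw_reply.strip()}]

    # Models often wrap JSON in extra prose, so slice from the first "[" to the last "]".
    start = raw_reply.find("[")
    end = raw_reply.rfind("]")
    if start == -1 or end <= start:
        return fallback

    try:
        candidate = json.loads(raw_reply[start:end + 1])
    except json.JSONDecodeError:
        return fallback

    # Keep only well-formed tasks whose intent is a known label.
    tasks = [
        item
        for item in candidate
        if isinstance(item, dict)
        and isinstance(item.get("intent"), str)
        and item["intent"] in ALLOWED_INTENTS
    ]
    return tasks or fallback
```

Validating against a fixed label set keeps an unexpected model reply from reaching the Execution Layer as an unknown action.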

🛠️ Prerequisites

Before running the application, ensure you have the following installed:

  • Ollama: The engine for running LLMs locally. Download it from the official Ollama website.
  • Python 3.10+: Ensure Python is added to your system PATH.
  • Pinecone API Key: Required for the vector database. Get it from the Pinecone Console.

🚀 Setup & Installation

1. Pull Local Models

Open your terminal and pull the models required for logic and embeddings:

ollama pull llama3:latest
ollama pull mxbai-embed-large

2. Install Python Dependencies

Install all necessary libraries using the requirements.txt file:

pip install -r requirements.txt

3. Environment Repair (Optional)

If you encounter library conflicts or version mismatches (common with PyTorch/Whisper), run the recovery script:

python repair_env.py

4. Configure API Key

Set your Pinecone API key in your environment variables or directly in agent.py:

# In agent.py
api_key = "YOUR_PINECONE_API_KEY"
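
If you prefer the environment-variable route, which keeps the key out of source control, a sketch like the one below could replace the hardcoded value. The variable name `PINECONE_API_KEY` and the helper `load_pinecone_api_key` are assumptions for illustration; agent.py may use different names.

```python
import os


def load_pinecone_api_key(env_var: str = "PINECONE_API_KEY") -> str:
    """Read the Pinecone key from the environment and fail early if it is missing.

    The variable name is an assumption, not something the project defines.
    """
    key = os.environ.get(env_var, "").strip()
    # Treat the README placeholder as "not configured".
    if not key or key == "YOUR_PINECONE_API_KEY":
        raise RuntimeError(f"Set the {env_var} environment variable before starting the app.")
    return key
```

With this in place, `api_key = load_pinecone_api_key()` would raise a clear error at startup instead of failing later on the first Pinecone call.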

🔌 Hardware Workarounds

  • Cross-Platform STT: fp16=False is set in the Whisper configuration to prevent CUDA-related crashes on non-NVIDIA hardware.
  • Local Vectorization: Embeddings are computed locally through Ollama, so raw text is never sent to a third-party embedding API. Note that the resulting vectors are still stored in Pinecone, which is a hosted service.
  • Self-Healing Index: On startup, the system checks whether the voice-assist index exists in Pinecone with the expected 1024 dimensions and creates it if it does not.
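
The index check described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in agent.py: the `ensure_index` helper is hypothetical, the cosine metric is a guess, and the client methods (`list_indexes().names()`, `describe_index(...).dimension`, `create_index(...)`) follow the Pinecone Python client as of its v3-style API, so check them against your installed version. What the project does on a dimension mismatch is not stated; this sketch refuses to continue rather than deleting data.

```python
def ensure_index(client, spec, name="voice-assist", dimension=1024, metric="cosine"):
    """Create the index if it is missing; verify its dimension if it exists.

    `client` is expected to behave like a pinecone.Pinecone instance and
    `spec` like a pinecone.ServerlessSpec (both are assumptions).
    Returns "created" or "ok".
    """
    existing = client.list_indexes().names()
    if name not in existing:
        client.create_index(name=name, dimension=dimension, metric=metric, spec=spec)
        return "created"

    found = client.describe_index(name).dimension
    if found != dimension:
        # Recreating the index would drop stored sessions, so stop instead.
        raise ValueError(
            f"Index {name!r} has dimension {found}, expected {dimension}"
        )
    return "ok"
```

The 1024 default matches the dimension this README gives for the index, which must equal the size of the vectors produced by mxbai-embed-large.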

📁 File Structure

| File | Function |
|------|----------|
| app.py | Streamlit frontend and terminal-themed UI logic. |
| agent.py | Backend logic, Task Router, and Pinecone integration. |
| requirements.txt | Comprehensive list of project dependencies. |
| repair_env.py | Environment diagnostic and fix script. |
| /output | Secure directory for AI-generated files and scripts. |
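
One way to keep generated files confined to the output directory is to resolve every target path and reject anything that lands outside it. The sketch below shows that idea; `write_output` is a hypothetical helper, and how agent.py actually enforces the sandbox is not documented here.

```python
from pathlib import Path


def write_output(filename: str, content: str, output_dir: str = "output") -> Path:
    """Write `content` to `filename` inside `output_dir`, refusing path escapes."""
    root = Path(output_dir).resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    target = (root / filename).resolve()

    if target == root or not target.is_relative_to(root):
        raise ValueError(f"Refusing to write outside {root}: {filename!r}")

    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target
```

A request such as `write_output("../agent.py", ...)` or an absolute path raises `ValueError` instead of overwriting files elsewhere on disk. `Path.is_relative_to` requires Python 3.9 or later, which the Python 3.10+ prerequisite already covers.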

📝 Usage

  1. Launch: streamlit run app.py
  2. Initialize: Click "Start New Session" in the sidebar.
  3. Command: Type or speak: "Write a python script to add two numbers and save it."
  4. Recall: Use the sidebar dropdown to reload any past session stored in Pinecone.

⚖️ License

MIT
