A Voice-Controlled AI Agent that accepts audio input via microphone or file upload, accurately classifies the user's intent, and executes corresponding actions locally (file creation, code generation, summarization) via a clean UI.
- Audio Input: Record directly via microphone or upload an audio file securely in the UI.
- Intelligent Intent Parsing: Processes the transcript and classifies intent using structured JSON output.
- Core Actions:
  - `create_file`: Safely creates empty files.
  - `write_code`: Automatically generates and saves code.
  - `summarize_text`: Outputs summarized content.
  - `general_chat`: Responds contextually to standard queries.
- Safe Sandbox Environment: All file and code generation strictly targets the `/output` directory to prevent accidental system changes.
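Sandbox enforcement can be sketched as a path check before any write. This is a minimal illustration, not necessarily the project's actual implementation; `safe_output_path` is a hypothetical helper name:

```python
from pathlib import Path

OUTPUT_DIR = Path("output").resolve()  # the project's sandbox directory

def safe_output_path(filename: str) -> Path:
    """Resolve a user-supplied filename inside the sandbox (hypothetical helper).

    Rejects names like "../secrets.txt" that would resolve outside /output.
    """
    candidate = (OUTPUT_DIR / filename).resolve()
    if candidate != OUTPUT_DIR and OUTPUT_DIR not in candidate.parents:
        raise ValueError(f"path escapes sandbox: {filename}")
    return candidate
```

Resolving the path first and then comparing against the sandbox root catches both `..` traversal and absolute paths with one check.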
This project accounts for varying levels of local machine capability and was designed and fallback-tested for resource-constrained environments (e.g., an AMD Radeon RX 6500M laptop with 8GB of system RAM):
- Speech-to-Text (STT):
  - Local Fallback: Uses `openai-whisper` (`tiny` model) running purely on CPU.
  - Fast Performance Mode: Because 8GB of system RAM restricts heavy local inference, the UI provides a toggle to use the Groq API's Whisper-large-v3 for a massive speed improvement.
- Intent Classification & LLM Engine:
  - Running a reliable 8B-parameter model (e.g., Llama 3) locally consumes roughly 4.5-5GB of RAM, leaving the system severely bottlenecked.
  - Therefore, the Groq API (Llama 3 8B or 70B) is used for near-instant inference, providing structured intent parsing (JSON-formatted outputs) and conversational responses.
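Structured intent parsing over Groq might look like the sketch below. The system prompt, the `llama3-8b-8192` model id, and the exact JSON schema are assumptions based on the feature list; `response_format={"type": "json_object"}` is Groq's JSON-mode flag:

```python
import json

INTENTS = {"create_file", "write_code", "summarize_text", "general_chat"}

SYSTEM_PROMPT = (
    "Classify the user's command. Reply ONLY with JSON like "
    '{"intent": "create_file", "filename": "hello.py"}. '
    f"Valid intents: {sorted(INTENTS)}."
)

def parse_intent(raw: str) -> dict:
    """Validate the model's JSON reply against the known intent set."""
    payload = json.loads(raw)
    if payload.get("intent") not in INTENTS:
        payload["intent"] = "general_chat"  # safe default for odd replies
    return payload

def classify(transcript: str, model: str = "llama3-8b-8192") -> dict:
    """Ask a Groq-hosted Llama 3 model to classify a transcript (sketch)."""
    from groq import Groq  # lazy import: parse_intent stays testable offline
    client = Groq()  # reads GROQ_API_KEY from the environment
    reply = client.chat.completions.create(
        model=model,
        response_format={"type": "json_object"},  # force JSON-only output
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": transcript},
        ],
    )
    return parse_intent(reply.choices[0].message.content)
```

Validating the reply in `parse_intent` and defaulting unknown intents to `general_chat` keeps one malformed model response from crashing the pipeline.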
- Frontend: Streamlit powers the web interface; its native `st.audio_input` widget removes the need for external recording tools.
- Python 3.10+
- Free Groq API Key (from console.groq.com)
- FFmpeg installed on your system (Required for Whisper local audio processing).
- Clone the repository:
  ```bash
  git clone <your-repo-link>
  cd local_ai_agent
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Set up environment variables: Rename `.env.example` to `.env` and add your Groq API key:
  ```
  GROQ_API_KEY=your_groq_api_key_here
  ```
  (Alternatively, you can paste the key directly into the settings field in the Streamlit web UI.)
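The "env file or UI field" fallback can be sketched with the standard library alone (the app may use `python-dotenv` instead; `get_groq_key` is an illustrative name):

```python
import os

def get_groq_key(ui_value: str = "") -> str:
    """Prefer a key pasted into the UI, else fall back to the environment."""
    key = ui_value or os.getenv("GROQ_API_KEY", "")
    if not key:
        raise RuntimeError("GROQ_API_KEY is not set")
    return key
```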
Start the Streamlit server:
```bash
streamlit run app.py
```
This will launch the web UI locally.
- Open the UI.
- Click the microphone and say: "Create a python file called hello.py."
- Click "Process Command".
- The system transcribes the audio, detects the `create_file` intent, outputs a JSON payload, creates `hello.py` in the `/output` folder, and displays a success message.
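The final step, routing the parsed JSON payload to an action, can be sketched as a dispatch table. Handler names here are illustrative, not the project's actual functions, and a real implementation would also validate that every path stays inside the `/output` sandbox:

```python
from pathlib import Path

OUTPUT_DIR = Path("output")

def create_file(payload: dict) -> str:
    """Create an empty file in the sandbox, per the create_file intent."""
    OUTPUT_DIR.mkdir(exist_ok=True)
    path = OUTPUT_DIR / payload["filename"]
    path.touch()  # create_file only makes an empty file
    return f"Created {path}"

def general_chat(payload: dict) -> str:
    """Pass a conversational reply straight through to the UI."""
    return payload.get("reply", "")

HANDLERS = {
    "create_file": create_file,
    "general_chat": general_chat,
    # "write_code" and "summarize_text" would be registered the same way
}

def dispatch(payload: dict) -> str:
    """Route a parsed intent payload to its handler, defaulting to chat."""
    handler = HANDLERS.get(payload["intent"], general_chat)
    return handler(payload)
```

A dict of handlers keeps adding a new intent to a two-line change: write the handler, register it in `HANDLERS`.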