A Voice-Controlled AI Agent that accepts audio input via microphone or file upload, accurately classifies the user's intent, and executes corresponding actions locally (file creation, code generation, summarization) via a clean UI.
- Audio Input: Record directly via microphone or upload an audio file securely in the UI.
- Intelligent Intent Parsing: Processes the transcript and classifies intent using structured JSON output.
- Core Actions:
  - `create_file`: Safely creates empty files.
  - `write_code`: Automatically generates and saves code.
  - `summarize_text`: Outputs summarized content.
  - `general_chat`: Responds contextually to standard queries.
- Safe Sandbox Environment: All file and code generation strictly targets the `/output` directory to prevent accidental system changes.
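Sandbox enforcement can be sketched as a path check before any write. This is a minimal illustration, not necessarily the project's actual implementation; `safe_output_path` is a hypothetical helper name:

```python
from pathlib import Path

OUTPUT_DIR = Path("output").resolve()  # the project's sandbox directory

def safe_output_path(filename: str) -> Path:
    """Resolve a user-supplied filename inside the sandbox (hypothetical helper).

    Rejects names like "../secrets.txt" that would resolve outside /output.
    """
    candidate = (OUTPUT_DIR / filename).resolve()
    if candidate != OUTPUT_DIR and OUTPUT_DIR not in candidate.parents:
        raise ValueError(f"path escapes sandbox: {filename}")
    return candidate
```

Resolving the path first and then comparing against the sandbox root catches both `..` traversal and absolute paths with one check.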
This project accounts for varying levels of local machine capability and was designed and fallback-tested for resource-constrained environments (e.g., an AMD Radeon RX 6500M laptop with 8GB of system RAM):
- Speech-to-Text (STT):
  - Local Fallback: Uses `openai-whisper` (`tiny` model) running purely on CPU.
  - Fast Performance Mode: Because 8GB of system RAM restricts heavy local inference, the UI provides a toggle to use the Groq API's Whisper-large-v3 for a massive speed improvement.
- Intent Classification & LLM Engine:
  - Running a reliable 8B-parameter model (e.g., Llama 3) locally consumes roughly 4.5-5GB of RAM, leaving the system severely bottlenecked.
  - Therefore, the Groq API (Llama 3 8B or 70B) is used for near-instant inference, providing structured intent parsing (JSON-formatted outputs) and conversational responses.
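Structured intent parsing over Groq might look like the sketch below. The system prompt, the `llama3-8b-8192` model id, and the exact JSON schema are assumptions based on the feature list; `response_format={"type": "json_object"}` is Groq's JSON-mode flag:

```python
import json

INTENTS = {"create_file", "write_code", "summarize_text", "general_chat"}

SYSTEM_PROMPT = (
    "Classify the user's command. Reply ONLY with JSON like "
    '{"intent": "create_file", "filename": "hello.py"}. '
    f"Valid intents: {sorted(INTENTS)}."
)

def parse_intent(raw: str) -> dict:
    """Validate the model's JSON reply against the known intent set."""
    payload = json.loads(raw)
    if payload.get("intent") not in INTENTS:
        payload["intent"] = "general_chat"  # safe default for odd replies
    return payload

def classify(transcript: str, model: str = "llama3-8b-8192") -> dict:
    """Ask a Groq-hosted Llama 3 model to classify a transcript (sketch)."""
    from groq import Groq  # lazy import: parse_intent stays testable offline
    client = Groq()  # reads GROQ_API_KEY from the environment
    reply = client.chat.completions.create(
        model=model,
        response_format={"type": "json_object"},  # force JSON-only output
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": transcript},
        ],
    )
    return parse_intent(reply.choices[0].message.content)
```

Validating the reply in `parse_intent` and defaulting unknown intents to `general_chat` keeps one malformed model response from crashing the pipeline.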
- Frontend: Streamlit powers the web interface; its native `st.audio_input` widget removes the need for external recording tools.
- Python 3.10+
- Free Groq API Key (from console.groq.com)
- FFmpeg installed on your system (Required for Whisper local audio processing).
- Clone the repository:
  ```bash
  git clone <your-repo-link>
  cd local_ai_agent
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Set up environment variables: Rename `.env.example` to `.env` and add your Groq API key:
  ```
  GROQ_API_KEY=your_groq_api_key_here
  ```
  (Alternatively, you can paste the key directly into the settings field in the Streamlit web UI.)
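The "env file or UI field" fallback can be sketched with the standard library alone (the app may use `python-dotenv` instead; `get_groq_key` is an illustrative name):

```python
import os

def get_groq_key(ui_value: str = "") -> str:
    """Prefer a key pasted into the UI, else fall back to the environment."""
    key = ui_value or os.getenv("GROQ_API_KEY", "")
    if not key:
        raise RuntimeError("GROQ_API_KEY is not set")
    return key
```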
Start the Streamlit server:
```bash
streamlit run app.py
```
This will launch the web UI locally.
- Open the UI.
- Click the microphone and say: "Create a python file called hello.py."
- Click "Process Command".
- The system transcribes the audio, detects the `create_file` intent, outputs a JSON payload, creates `hello.py` in the `/output` folder, and displays a success message.
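The final step, routing the parsed JSON payload to an action, can be sketched as a dispatch table. Handler names here are illustrative, not the project's actual functions, and a real implementation would also validate that every path stays inside the `/output` sandbox:

```python
from pathlib import Path

OUTPUT_DIR = Path("output")

def create_file(payload: dict) -> str:
    """Create an empty file in the sandbox, per the create_file intent."""
    OUTPUT_DIR.mkdir(exist_ok=True)
    path = OUTPUT_DIR / payload["filename"]
    path.touch()  # create_file only makes an empty file
    return f"Created {path}"

def general_chat(payload: dict) -> str:
    """Pass a conversational reply straight through to the UI."""
    return payload.get("reply", "")

HANDLERS = {
    "create_file": create_file,
    "general_chat": general_chat,
    # "write_code" and "summarize_text" would be registered the same way
}

def dispatch(payload: dict) -> str:
    """Route a parsed intent payload to its handler, defaulting to chat."""
    handler = HANDLERS.get(payload["intent"], general_chat)
    return handler(payload)
```

A dict of handlers keeps adding a new intent to a two-line change: write the handler, register it in `HANDLERS`.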