A local AI agent that accepts voice input, converts speech to text, detects user intent, and executes local tools like file creation, code execution, and summarization.
```
Chronos/
├── app.py            # Streamlit UI
├── agent.py          # Intent routing & orchestration
├── tools.py          # Tool execution (files, code, etc.)
├── stt.py            # Speech-to-text logic
├── output/           # Safe execution directory
├── requirements.txt
└── README.md
```
A fully offline, voice-controlled AI agent that transcribes speech, detects intent, and executes local actions — built with Whisper, qwen2.5-coder, Ollama, and Streamlit.
Due to local model dependencies (Ollama, Whisper), this project runs locally. Watch the demo here: https://youtu.be/UmMykbFlc6k
Audio Input --> Speech-to-Text (Whisper) --> Intent Detection (qwen2.5-coder via Ollama) --> Command Routing (Agent Logic) --> Tool Execution (File / Code / System Actions) --> Output Display (Streamlit UI)
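The pipeline above can be sketched as a chain of small functions. The stubs below are hypothetical placeholders for the real stages implemented in stt.py, agent.py, and tools.py:

```python
def transcribe(audio_path: str) -> str:
    # Placeholder for Whisper transcription (stt.py).
    return "create a file called notes.txt"

def detect_intent(text: str) -> dict:
    # Placeholder for the LLM call (qwen2.5-coder via Ollama, agent.py).
    return {"intent": "create_file", "filename": "notes.txt"}

def execute(intent: dict) -> str:
    # Placeholder for tool dispatch (tools.py).
    return f"executed {intent['intent']}"

def run_pipeline(audio_path: str) -> str:
    # Each stage feeds the next, mirroring the arrow diagram above.
    text = transcribe(audio_path)
    intent = detect_intent(text)
    return execute(intent)
```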
- Voice or file audio input
- Speech to text via OpenAI Whisper (medium model)
- Intent detection via qwen2.5-coder:7b (GPU via Ollama)
- Create files, write code, summarize text, run code, launch files, general chat
- Human-in-the-loop confirmation for file operations
- Session memory — full conversation history
- Direct text summarizer
- 100% offline — no data leaves your machine
- Compound commands supported
- Graceful error handling
- Output folder safety
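As a rough illustration of compound-command support, a naive splitter might break an utterance on common connectives before routing each part. This is a simplification of our own; the project may instead let the LLM decompose the request:

```python
import re

def split_compound(command: str) -> list[str]:
    # Split a compound utterance on "and then", "then", or "and".
    parts = re.split(r"\b(?:and then|then|and)\b", command, flags=re.IGNORECASE)
    return [p.strip() for p in parts if p.strip()]
```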
| Component | Tool |
|---|---|
| Speech to Text | OpenAI Whisper (medium) |
| Intent Detection | qwen2.5-coder:7b via Ollama |
| UI | Streamlit |
| Runtime | Python 3.10 |
- NVIDIA RTX 4050 6GB (qwen2.5 inference)
- Intel i7-12650HX
- 12GB DDR5 RAM
The two models never compete for GPU memory. Ollama exposes an OpenAI-compatible API, so the OpenAI Python SDK can talk to local models with little more than a base-URL change.
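For illustration, here is a minimal standard-library sketch of calling Ollama's native chat endpoint directly (no SDK required). It assumes `ollama serve` is running on the default port; the function names are our own:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"

def build_chat_request(prompt: str, model: str = "qwen2.5-coder:7b") -> urllib.request.Request:
    # Build a chat request for the local Ollama server.
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def chat(prompt: str) -> str:
    # Requires a running `ollama serve`; not exercised in this sketch.
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["message"]["content"]
```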
- Accepts voice input via microphone or uploaded audio file
- Designed to simulate real-world human interaction with AI
- Implemented using OpenAI Whisper (medium model)
- Converts raw audio into accurate textual transcription
- Chosen for its robustness across accents and noisy input
- Powered by qwen2.5-coder:7b running locally via Ollama
- Converts natural language into structured intent
- Central decision-making layer
- Maps detected intent to appropriate tool functions
- Handles:
- Validation
- Error handling
- Input normalization
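A toy version of this routing layer, with hypothetical handlers and intent names, might look like:

```python
def normalize(text: str) -> str:
    # Input normalization: trim and lowercase before matching intents.
    return " ".join(text.strip().lower().split())

# Hypothetical intent table; the real agent derives intents from the LLM.
HANDLERS = {
    "create_file": lambda args: f"created {args['filename']}",
    "summarize": lambda args: "summary: ...",
}

def route(intent: str, args: dict) -> str:
    # Validation + error handling: unknown intents and missing
    # arguments fail gracefully instead of crashing the session.
    handler = HANDLERS.get(intent)
    if handler is None:
        return f"Sorry, I don't know how to handle '{intent}'."
    try:
        return handler(args)
    except KeyError as exc:
        return f"Missing argument for '{intent}': {exc}"
```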
Responsible for executing real system-level actions:
- File creation and modification
- Code generation and saving
- Running Python scripts in terminal
- Opening files in VS Code
- Text summarization
All operations are modular and extendable.
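One way to keep tools modular and extendable is a small registry; the decorator and toy summarizer below are an illustrative sketch, not the project's actual API:

```python
from pathlib import Path

TOOLS = {}

def tool(name: str):
    # Decorator that registers a function as a callable tool,
    # so new capabilities are added by writing one function.
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("summarize")
def summarize(text: str) -> str:
    # Toy summarizer: first sentence only (the real tool uses the LLM).
    return text.split(".")[0].strip() + "."

@tool("create_file")
def create_file(name: str, content: str) -> str:
    # Write only inside the output/ sandbox directory.
    out = Path("output")
    out.mkdir(exist_ok=True)
    (out / name).write_text(content, encoding="utf-8")
    return f"created output/{name}"
```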
- Provides an interactive interface
- Displays:
- Transcribed text
- Detected intent
- Execution output
- Enables both voice and manual text input
- Accepts real-time voice commands
- Enables natural human-AI interaction
- Uses LLM reasoning to convert language → structured commands
- Automatically writes and executes Python scripts
- Create, open, and manage files dynamically
- Extracts key information from large text inputs
- Handles multi-step instructions logically
- Confirms sensitive actions before execution
- Maintains conversation context within a session
- All operations are sandboxed inside the `/output` directory
- No external API calls
- Ensures complete privacy
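The human-in-the-loop confirmation can be as simple as a yes/no gate before any sensitive tool runs; `auto_yes` below is a hypothetical flag added only to keep the sketch self-contained:

```python
def confirm(action: str, auto_yes: bool = False) -> bool:
    # Gate sensitive operations (file writes, code execution)
    # behind an explicit user confirmation.
    if auto_yes:
        return True
    reply = input(f"About to {action}. Proceed? [y/N] ")
    return reply.strip().lower() in {"y", "yes"}
```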
| Component | Technology Used |
|---|---|
| Speech Recognition | OpenAI Whisper (medium) |
| LLM (Intent) | qwen2.5-coder:7b via Ollama |
| Interface | Streamlit |
| Backend Logic | Python |
- GPU: NVIDIA RTX 4050 (6GB VRAM)
- CPU: Intel i7-12650HX
- RAM: 12GB DDR5
This setup enables efficient local inference for both Whisper and LLM models.
- Eliminates dependency on cloud APIs
- Reduces latency and improves privacy
- Whisper and LLM run independently
- Prevents GPU memory conflicts
- Each function is isolated and reusable
- Easy to extend with new capabilities
- Converts unstructured input into structured commands
- Makes the system scalable and maintainable
- Python 3.10+
- Ollama installed (ollama.com)
- NVIDIA GPU recommended
```bash
git clone https://github.com/YOURUSERNAME/chronos-ai
cd chronos-ai
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
ollama serve        # only if Ollama is not already running
ollama pull qwen2.5-coder:7b
streamlit run app.py
```
- "Create a file called notes.txt with my shopping list"
- "Write a Python file called calculator with add and subtract functions"
- "Summarize this: [text]"
- "Run the calculator file"
- "Open the shopping file"
- "What is machine learning?"
- All file operations are sandboxed to the `/output` folder for safety
- Prevents accidental modification of system files
- Controlled execution environment
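A minimal sketch of the sandbox check, assuming `output/` is the safe directory: resolve every requested path and reject anything that would escape it:

```python
from pathlib import Path

SANDBOX = Path("output").resolve()

def safe_path(name: str) -> Path:
    # Resolve the requested name inside the sandbox and reject any
    # path (e.g. "../secrets.txt") that would escape it.
    candidate = (SANDBOX / name).resolve()
    if candidate != SANDBOX and SANDBOX not in candidate.parents:
        raise ValueError(f"path escapes sandbox: {name}")
    return candidate
```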
- Model benchmarking between llama3 and qwen2.5
- Autonomous multi-step planning agent
- Cross-language execution support
- Persistent long-term memory
- GUI automation (mouse/keyboard control)
- Deployment as a desktop assistant
Chronos demonstrates how modern local AI models can be combined into a cohesive system capable of understanding and executing real-world tasks. This project highlights the potential of offline AI agents as powerful, privacy-preserving alternatives to cloud-based assistants.