Personal Project | AI Agents & Automation
Velune is an autonomous multi-agent system that decomposes high-level goals into executable subtasks and completes them with minimal human intervention. It uses Hugging Face open-source LLMs with a Plan → Execute → Critique → Remember pipeline.
| Capability | Implementation |
|---|---|
| Goal Decomposition | Dedicated planning agent generates a 3–5 step structured plan before execution |
| ReAct Reasoning | LangChain ReAct agent — Thought → Action → Observation loop, streamed live |
| Quality Critique | Critic agent scores responses 1–10 and flags gaps below a configurable threshold |
| Multi-Tool Use | Web search, news, Wikipedia, arXiv, calculator, datetime, Python REPL |
| Semantic Memory | ChromaDB vector store (short-term conversation buffer + long-term retrieval) |
| Human-in-the-Loop | Python REPL requires explicit opt-in; full reasoning trace always visible |
| HF Models | Qwen 2.5 72B (default) and 7B — both served free via HF Inference |
Example goals you can try:
- "What are the latest breakthroughs in nuclear fusion energy (2024–2025)?"
- "Calculate the future value of $10,000 at 7% compounded annually for 25 years."
- "Compare microservices vs monolithic architecture — trade-offs and when to choose each."
- "Summarise recent large language model research from arXiv."
User Goal
│
▼
┌──────────────────────────────────────────────────────┐
│ Planner (structured LLM call) │
│ → 3–5 prioritised tasks + complexity + strategy │
└──────────────────────┬───────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────┐
│ ReAct Executor (LangChain) │
│ ┌──────────┐ ┌──────────┐ ┌─────────────────────┐│
│ │ Thought │→ │ Action │→ │ Observation ││
│ └──────────┘ └──────────┘ └─────────────────────┘│
│ │ │
│ web_search · news_search · wikipedia │
│ arxiv_search · calculator · get_datetime │
│ python_repl (opt-in) │
└──────────────────────┬───────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────┐
│ Critic Agent │
│ • Scores accuracy, completeness, clarity, depth │
│ • Flags responses below threshold with follow-up │
└──────────────────────┬───────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────┐
│ Memory Layer │
│ • Short-term: ConversationBufferWindowMemory (k=10) │
│ • Long-term: ChromaDB + all-MiniLM-L6-v2 │
└──────────────────────────────────────────────────────┘
- Agent Framework — LangChain (ReAct pattern)
- LLMs — Hugging Face Inference API (
Qwen/Qwen2.5-72B-Instruct,Qwen/Qwen2.5-7B-Instruct) - Vector Store — ChromaDB (in-memory)
- Embeddings — sentence-transformers
all-MiniLM-L6-v2 - Tools — DuckDuckGo Search, Wikipedia, arXiv, math eval, Python REPL
- UI — Streamlit (Agent · Memory · Analytics tabs)
- Deployment — Hugging Face Spaces
Both models are served free through Hugging Face's own inference provider (hf-inference) and support the chat completions API.
| Model | Best for |
|---|---|
Qwen/Qwen2.5-72B-Instruct |
Complex reasoning, research, long-form answers (default) |
Qwen/Qwen2.5-7B-Instruct |
Faster responses, lighter tasks |
Other open models (Mistral, Gemma, Phi, Llama) may require paid inference providers or gated access on Hugging Face — they are intentionally excluded.
git clone https://github.com/YOUR_USERNAME/velune
cd velune
pip install -r requirements.txt
# Add your Hugging Face token
echo "HUGGINGFACE_TOKEN=hf_..." > .env
streamlit run app.py- Create a new Space on huggingface.co/spaces
- Select Streamlit as the SDK
- Push this repo to the Space
- Add your token as a Space Secret:
HUGGINGFACE_TOKEN
The app reads it from the environment automatically — no code changes needed.
velune/
├── app.py # Streamlit UI, LLM factory, agent pipeline
├── requirements.txt
├── .env # Local secrets — gitignored, never committed
└── src/
├── config.py # Model list, AgentConfig, AppConfig
├── models.py # Pydantic types: Plan, Task, Critique, AgentStep
├── agent/
│ ├── planner.py # Structured planning agent
│ ├── executor.py # ReAct executor + streaming
│ └── critic.py # Quality-assurance critic
├── memory/
│ └── store.py # ChromaDB long-term + conversation short-term memory
├── tools/
│ ├── registry.py # Tool assembly
│ ├── search.py # web_search, news_search, wikipedia, arxiv_search
│ └── compute.py # calculator, get_datetime, python_repl
└── ui/
├── components.py # render_plan, render_step, render_critique, tabs
└── styles.py # Custom CSS
- Python REPL is off by default — users explicitly enable it in the sidebar
- The full reasoning trace (Thought → Action → Observation) is always visible
- Relevant past memories are surfaced before each task so the agent uses prior context
- Critic scores flag low-quality responses and suggest targeted follow-ups
Apache License 2.0