Reads papers, writes code, runs experiments, monitors training. No API keys required. No token costs. No data sent to the cloud. Cloud providers (OpenRouter, Anthropic, OpenAI, Google) are optional upgrades.
Current status: Week 5 of 10 complete.
- Chat with Gemma 4 (E4B) running locally on your GPU via Ollama
- Search the web through a self-hosted SearXNG instance
- Decide autonomously when to search using a ReAct agent loop
- Run multiple web searches in parallel
- Know the current time in any timezone
- Serve a streaming HTTP API (FastAPI + SSE)
- Accept image uploads and analyse them with Gemma 4's vision (works with both local and cloud providers)
- Maintain per-session conversation history
- Switch between local and cloud models via a single `.env` change
| Component | Spec |
|---|---|
| GPU | RTX 5060 8 GB GDDR7 |
| CPU RAM | 32 GB |
| OS | Windows 11 |
| Shell | Conda + bash |
Gemma 4 E4B uses ~3.5 GB VRAM. Any GPU with 6+ GB should work.
CPU-only is possible but slow; change `default_model` to a smaller model.
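For example, pointing `default_model` at the E2B variant (the ~2 GB fast model pulled in the install steps below) keeps memory use low:

```python
# In app/config.py: default to the lighter Gemma 4 E2B model.
default_model: str = "gemma4:e2b"
```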
```
aira/
├── app/
│   ├── main.py              FastAPI entry point
│   ├── config.py            All settings, reads from .env
│   ├── agent/
│   │   ├── react_loop.py    ReAct agent - decides tools, runs them, answers
│   │   ├── tool_schema.py   Tool definitions (JSON schema for Gemma 4)
│   │   └── context.py       System prompt + result formatters
│   ├── api/
│   │   └── chat.py          POST /api/chat, SSE streaming, session history
│   └── tools/
│       ├── web_search.py    SearXNG + trafilatura pipeline
│       ├── searxng.py       SearXNG HTTP client
│       ├── extractor.py     HTML → clean plain text
│       └── time_tool.py     Current time in any timezone
├── searxng/
│   └── settings.yml         SearXNG configuration
├── chat.py                  Terminal REPL (Week 1, kept for reference)
├── docker-compose.yml       Starts SearXNG
├── requirements.txt         Python dependencies
├── .env.example             Environment variable template
└── ROADMAP.md               Full 10-week build plan
```
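The search tool chains two of these pieces: a SearXNG query for result URLs, then trafilatura for readable text. A minimal sketch of that pipeline (an illustrative stand-in for `app/tools/web_search.py`, not the actual implementation; it assumes SearXNG's JSON API is enabled as described below):

```python
# Illustrative stand-in for the web_search.py pipeline: query SearXNG's
# JSON API, then extract clean text from each hit with trafilatura.
import httpx
import trafilatura

def web_search(query: str, max_results: int = 3) -> list[dict]:
    resp = httpx.get(
        "http://localhost:8080/search",
        params={"q": query, "format": "json"},
    )
    hits = resp.json()["results"][:max_results]
    pages = []
    for hit in hits:
        html = trafilatura.fetch_url(hit["url"])
        text = trafilatura.extract(html) if html else None
        pages.append({
            "title": hit["title"],
            "url": hit["url"],
            "text": text or hit.get("content", ""),  # fall back to the search snippet
        })
    return pages
```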
```bash
git clone <your-repo-url>
cd aira
```
```bash
conda create -n aira python=3.11
conda activate aira
pip install -r requirements.txt
```
Additional packages needed for the FastAPI backend (Week 4):
```bash
pip install fastapi "uvicorn[standard]" sse-starlette python-dotenv pydantic-settings python-multipart openai
```
Download from https://ollama.com and install it.
Verify it is running:
```bash
curl http://localhost:11434/api/tags
```
Pull the models:
```bash
ollama pull gemma4:e4b           # main model - ~3.5 GB VRAM
ollama pull gemma4:e2b           # fast model - ~2 GB VRAM (optional)
ollama pull qwen2.5-coder:7b     # coding model (optional, Week 10)
```
Ollama runs as a background service automatically after install. If it is not running, start it with:
```bash
ollama serve
```
Install Docker Desktop from https://www.docker.com if you don't have it.
```bash
docker compose up -d
```
Verify SearXNG is running:
```bash
curl "http://localhost:8080/search?q=test&format=json"
```
You should get a JSON response with search results. If it returns HTML instead, check `searxng/settings.yml`: the `json` format must be listed under `search.formats`.
The `searxng/settings.yml` included in this repo already has JSON enabled:
```yaml
search:
  formats:
    - html
    - json
```
```bash
cp .env.example .env
```
Edit `.env`:
```ini
# Ollama server - leave as-is if running locally
OLLAMA_HOST=http://localhost:11434

# SearXNG - leave as-is if using docker-compose
SEARXNG_URL=http://localhost:8080

# Provider: "local" (free, GPU) or "cloud" (paid, OpenRouter)
USE_PROVIDER=local

# Optional cloud API keys - leave blank to use local only
OPENROUTER_API_KEY=
ANTHROPIC_API_KEY=
OPENAI_API_KEY=
GOOGLE_API_KEY=
BRAVE_API_KEY=
```
```bash
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
The API is now live at http://localhost:8000.
Interactive API docs: http://localhost:8000/docs
```bash
curl http://localhost:8000/api/health
```
Expected response:
```json
{
  "app": "ok",
  "ollama": "ok",
  "searxng": "ok",
  "provider": "local",
  "model": "gemma4:e4b",
  "openrouter": "not configured"
}
```
```bash
curl -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"session_id": "test", "message": "What is the capital of France?"}'
```
The response is a stream of Server-Sent Events:
data: {"type": "token", "text": "The "}
data: {"type": "token", "text": "capital "}
data: {"type": "token", "text": "of France is Paris."}
data: {"type": "done"}
When the agent searches the web, status events appear first:
data: {"type": "status", "text": "Searching: latest Python release"}
data: {"type": "token", "text": "Python 3.13 was released..."}
data: {"type": "done"}
```bash
curl -X POST http://localhost:8000/api/chat/image \
  -F "session_id=test" \
  -F "message=What does this show?" \
  -F "image=@/path/to/image.png"
```
To use the cloud provider for image analysis:
```bash
curl -X POST http://localhost:8000/api/chat/image \
  -F "session_id=test" \
  -F "message=What does this show?" \
  -F "provider=cloud" \
  -F "image=@/path/to/image.png"
```
Supported formats: JPEG, PNG, WEBP, GIF.

Both local (Gemma 4 E4B via Ollama) and cloud (OpenRouter) handle images natively. The two providers use different wire formats: Ollama takes a separate `images` field, while OpenRouter uses the OpenAI content-array format. This is handled automatically.
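For reference, the two message shapes look roughly like this (the base64 plumbing is illustrative; the field names follow the public Ollama and OpenAI chat APIs):

```python
import base64

with open("image.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Ollama: base64 images travel in a dedicated field next to the text.
ollama_message = {
    "role": "user",
    "content": "What does this show?",
    "images": [image_b64],
}

# OpenRouter (OpenAI format): content is an array of typed parts.
openrouter_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What does this show?"},
        {"type": "image_url",
         "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
    ],
}
```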
List the available models:
```bash
curl http://localhost:8000/api/models
```
Reset a session's conversation history:
```bash
curl -X POST http://localhost:8000/api/reset \
  -H "Content-Type: application/json" \
  -d '{"session_id": "test"}'
```
Run the terminal REPL:
```bash
python chat.py
```
This runs the agent directly in your terminal. Useful for quick testing.
Set `USE_PROVIDER=cloud` in `.env` and add your OpenRouter key:
```ini
USE_PROVIDER=cloud
OPENROUTER_API_KEY=sk-or-v1-...
```
Restart the server. All requests now go to `google/gemini-2.5-flash` by default. Your tools (web search, time) still run locally; only the LLM calls go to the cloud.

To change which cloud model is used, edit `config.py`:
```python
cloud_default_model: str = "google/gemini-2.5-flash"
```
Browse available models at https://openrouter.ai/models.
Every message goes through `app/agent/react_loop.py`:
```
User message
     │
     ▼
_call(messages, tools=TOOLS)   ← ask model: search or answer?
     │
     ├── model requests tools
     │      │
     │      ├── asyncio.gather(web_search(), ...)   ← parallel tool execution
     │      │
     │      ├── inject results into messages
     │      │
     │      └── loop back (up to 5 rounds)
     │
     └── model gives final answer
            │
            ▼
     stream tokens → SSE → client
```
The model never calls tools directly. It requests them, your code runs them locally, results are injected back into the conversation, and the model reads them to answer.
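In code, the loop reduces to a few lines. A simplified sketch, not the actual `react_loop.py`: `call_model` stands in for the `_call` helper in the diagram, and `run_tool` for the tool dispatcher, both injected as parameters so the sketch is self-contained.

```python
import asyncio
from typing import Awaitable, Callable

MAX_ROUNDS = 5  # matches the "up to 5 rounds" cap in the diagram

async def react_loop(
    messages: list[dict],
    tools: list[dict],
    call_model: Callable[..., Awaitable[dict]],
    run_tool: Callable[[dict], Awaitable[str]],
) -> str:
    for _ in range(MAX_ROUNDS):
        reply = await call_model(messages, tools=tools)  # search or answer?
        tool_calls = reply.get("tool_calls")
        if not tool_calls:
            return reply["content"]  # model answered directly
        messages.append(reply)  # keep the tool request in the transcript
        # Run every requested tool in parallel.
        results = await asyncio.gather(*(run_tool(c) for c in tool_calls))
        for call, result in zip(tool_calls, results):
            messages.append({"role": "tool", "name": call["name"], "content": result})
    # Round limit reached: ask for a final answer with tools disabled.
    reply = await call_model(messages, tools=None)
    return reply["content"]
```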
All settings live in `app/config.py` and can be overridden via `.env`.

| Setting | Default | Description |
|---|---|---|
| `OLLAMA_HOST` | `http://localhost:11434` | Ollama server URL |
| `SEARXNG_URL` | `http://localhost:8080` | SearXNG server URL |
| `USE_PROVIDER` | `local` | `local` or `cloud` |
| `OPENROUTER_API_KEY` | — | OpenRouter key (optional) |
| `BRAVE_API_KEY` | — | Brave Search key (optional, replaces SearXNG) |

Model names (`default_model`, `cloud_default_model`, etc.) are set directly in `config.py` since they rarely change per machine.
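As a sketch, a settings class of this shape is what pydantic-settings produces; only the fields documented above are grounded in this README, and the rest is illustrative:

```python
# Sketch of a pydantic-settings class like app/config.py; env vars match
# field names case-insensitively, so OLLAMA_HOST fills ollama_host.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    ollama_host: str = "http://localhost:11434"
    searxng_url: str = "http://localhost:8080"
    use_provider: str = "local"        # "local" or "cloud"
    openrouter_api_key: str = ""       # optional
    brave_api_key: str = ""            # optional

    # Model names live in code rather than .env:
    default_model: str = "gemma4:e4b"
    cloud_default_model: str = "google/gemini-2.5-flash"

settings = Settings()
```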
| Week | Goal | Status |
|---|---|---|
| 1 | Local chat — Gemma 4 + Ollama + conversation memory | Complete |
| 2 | Web search — SearXNG + trafilatura pipeline | Complete |
| 3 | ReAct agent loop — autonomous tool use, parallel search | Complete |
| 4 | FastAPI backend — SSE streaming, session history, image upload | Complete |
| 5 | Multimodal input — image understanding via Gemma 4 vision | Complete |
| 6 | Frontend + GitHub release — vanilla JS chat UI | Not started |
| 7 | RAG — chat with your own documents via ChromaDB | Not started |
| 8 | Voice — Whisper STT + Kokoro TTS, fully offline | Not started |
| 9 | Code interpreter — run Python/ML code safely | Not started |
| 10 | Coding agent — read/write files, run tests, git ops | Not started |
See `ROADMAP.md` for the full day-by-day plan.
Apache 2.0 — free for personal and commercial use.