Reads papers, writes code, runs experiments, monitors training. No API keys required. No token costs. No data sent to the cloud. Cloud providers (OpenRouter, Anthropic, OpenAI, Google) are optional upgrades.
Current status: Week 5 of 10 complete.
- Chat with Gemma 4 (E4B) running locally on your GPU via Ollama
- Search the web through a self-hosted SearXNG instance
- Decide autonomously when to search using a ReAct agent loop
- Run multiple web searches in parallel
- Know the current time in any timezone
- Serve a streaming HTTP API (FastAPI + SSE)
- Accept image uploads and analyse them with Gemma 4's vision (works with both local and cloud providers)
- Maintain per-session conversation history
- Switch between local and cloud models via a single `.env` change
| Component | Spec |
|---|---|
| GPU | RTX 5060 8 GB GDDR7 |
| CPU RAM | 32 GB |
| OS | Windows 11 |
| Shell | Conda + bash |
Gemma 4 E4B uses ~3.5 GB VRAM. Any GPU with 6+ GB should work.
CPU-only is possible but slow; change `default_model` to a smaller model.
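For example, pointing `default_model` at the E2B variant (the ~2 GB fast model pulled in the install steps below) keeps memory use low:

```python
# In app/config.py: default to the lighter Gemma 4 E2B model.
default_model: str = "gemma4:e2b"
```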
```
aira/
├── app/
│   ├── main.py              FastAPI entry point
│   ├── config.py            All settings, reads from .env
│   ├── agent/
│   │   ├── react_loop.py    ReAct agent - decides tools, runs them, answers
│   │   ├── tool_schema.py   Tool definitions (JSON schema for Gemma 4)
│   │   └── context.py       System prompt + result formatters
│   ├── api/
│   │   └── chat.py          POST /api/chat, SSE streaming, session history
│   └── tools/
│       ├── web_search.py    SearXNG + trafilatura pipeline
│       ├── searxng.py       SearXNG HTTP client
│       ├── extractor.py     HTML → clean plain text
│       └── time_tool.py     Current time in any timezone
├── searxng/
│   └── settings.yml         SearXNG configuration
├── chat.py                  Terminal REPL (Week 1, kept for reference)
├── docker-compose.yml       Starts SearXNG
├── requirements.txt         Python dependencies
├── .env.example             Environment variable template
└── ROADMAP.md               Full 10-week build plan
```
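The search tool chains two of these pieces: a SearXNG query for result URLs, then trafilatura for readable text. A minimal sketch of that pipeline (an illustrative stand-in for `app/tools/web_search.py`, not the actual implementation; it assumes SearXNG's JSON API is enabled as described below):

```python
# Illustrative stand-in for the web_search.py pipeline: query SearXNG's
# JSON API, then extract clean text from each hit with trafilatura.
import httpx
import trafilatura

def web_search(query: str, max_results: int = 3) -> list[dict]:
    resp = httpx.get(
        "http://localhost:8080/search",
        params={"q": query, "format": "json"},
    )
    hits = resp.json()["results"][:max_results]
    pages = []
    for hit in hits:
        html = trafilatura.fetch_url(hit["url"])
        text = trafilatura.extract(html) if html else None
        pages.append({
            "title": hit["title"],
            "url": hit["url"],
            "text": text or hit.get("content", ""),  # fall back to the search snippet
        })
    return pages
```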
```bash
git clone <your-repo-url>
cd aira
```
```bash
conda create -n aira python=3.11
conda activate aira
pip install -r requirements.txt
```
Additional packages needed for the FastAPI backend (Week 4):
```bash
pip install fastapi "uvicorn[standard]" sse-starlette python-dotenv pydantic-settings python-multipart openai
```
Download from https://ollama.com and install it.
Verify it is running:
```bash
curl http://localhost:11434/api/tags
```
Pull the models:
```bash
ollama pull gemma4:e4b           # main model - ~3.5 GB VRAM
ollama pull gemma4:e2b           # fast model - ~2 GB VRAM (optional)
ollama pull qwen2.5-coder:7b     # coding model (optional, Week 10)
```
Ollama runs as a background service automatically after install. If it is not running, start it with:
```bash
ollama serve
```
Install Docker Desktop from https://www.docker.com if you don't have it.
```bash
docker compose up -d
```
Verify SearXNG is running:
```bash
curl "http://localhost:8080/search?q=test&format=json"
```
You should get a JSON response with search results. If it returns HTML instead, check `searxng/settings.yml`: the `json` format must be listed under `search.formats`.
The `searxng/settings.yml` included in this repo already has JSON enabled:
```yaml
search:
  formats:
    - html
    - json
```
```bash
cp .env.example .env
```
Edit `.env`:
```ini
# Ollama server - leave as-is if running locally
OLLAMA_HOST=http://localhost:11434

# SearXNG - leave as-is if using docker-compose
SEARXNG_URL=http://localhost:8080

# Provider: "local" (free, GPU) or "cloud" (paid, OpenRouter)
USE_PROVIDER=local

# Optional cloud API keys - leave blank to use local only
OPENROUTER_API_KEY=
ANTHROPIC_API_KEY=
OPENAI_API_KEY=
GOOGLE_API_KEY=
BRAVE_API_KEY=
```
```bash
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
The API is now live at http://localhost:8000.
Interactive API docs: http://localhost:8000/docs
```bash
curl http://localhost:8000/api/health
```
Expected response:
```json
{
  "app": "ok",
  "ollama": "ok",
  "searxng": "ok",
  "provider": "local",
  "model": "gemma4:e4b",
  "openrouter": "not configured"
}
```
```bash
curl -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"session_id": "test", "message": "What is the capital of France?"}'
```
The response is a stream of Server-Sent Events:
data: {"type": "token", "text": "The "}
data: {"type": "token", "text": "capital "}
data: {"type": "token", "text": "of France is Paris."}
data: {"type": "done"}
When the agent searches the web, status events appear first:
data: {"type": "status", "text": "Searching: latest Python release"}
data: {"type": "token", "text": "Python 3.13 was released..."}
data: {"type": "done"}
```bash
curl -X POST http://localhost:8000/api/chat/image \
  -F "session_id=test" \
  -F "message=What does this show?" \
  -F "image=@/path/to/image.png"
```
To use the cloud provider for image analysis:
```bash
curl -X POST http://localhost:8000/api/chat/image \
  -F "session_id=test" \
  -F "message=What does this show?" \
  -F "provider=cloud" \
  -F "image=@/path/to/image.png"
```
Supported formats: JPEG, PNG, WEBP, GIF.

Both local (Gemma 4 E4B via Ollama) and cloud (OpenRouter) handle images natively. The two providers use different wire formats: Ollama takes a separate `images` field, while OpenRouter uses the OpenAI content-array format. This is handled automatically.
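For reference, the two message shapes look roughly like this (the base64 plumbing is illustrative; the field names follow the public Ollama and OpenAI chat APIs):

```python
import base64

with open("image.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Ollama: base64 images travel in a dedicated field next to the text.
ollama_message = {
    "role": "user",
    "content": "What does this show?",
    "images": [image_b64],
}

# OpenRouter (OpenAI format): content is an array of typed parts.
openrouter_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What does this show?"},
        {"type": "image_url",
         "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
    ],
}
```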
List the available models:
```bash
curl http://localhost:8000/api/models
```
Reset a session's conversation history:
```bash
curl -X POST http://localhost:8000/api/reset \
  -H "Content-Type: application/json" \
  -d '{"session_id": "test"}'
```
Run the terminal REPL:
```bash
python chat.py
```
This runs the agent directly in your terminal. Useful for quick testing.
Set `USE_PROVIDER=cloud` in `.env` and add your OpenRouter key:
```ini
USE_PROVIDER=cloud
OPENROUTER_API_KEY=sk-or-v1-...
```
Restart the server. All requests now go to `google/gemini-2.5-flash` by default. Your tools (web search, time) still run locally; only the LLM calls go to the cloud.

To change which cloud model is used, edit `config.py`:
```python
cloud_default_model: str = "google/gemini-2.5-flash"
```
Browse available models at https://openrouter.ai/models.
Every message goes through `app/agent/react_loop.py`:
```
User message
     │
     ▼
_call(messages, tools=TOOLS)   ← ask model: search or answer?
     │
     ├── model requests tools
     │      │
     │      ├── asyncio.gather(web_search(), ...)   ← parallel tool execution
     │      │
     │      ├── inject results into messages
     │      │
     │      └── loop back (up to 5 rounds)
     │
     └── model gives final answer
            │
            ▼
     stream tokens → SSE → client
```
The model never calls tools directly. It requests them, your code runs them locally, results are injected back into the conversation, and the model reads them to answer.
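In code, the loop reduces to a few lines. A simplified sketch, not the actual `react_loop.py`: `call_model` stands in for the `_call` helper in the diagram, and `run_tool` for the tool dispatcher, both injected as parameters so the sketch is self-contained.

```python
import asyncio
from typing import Awaitable, Callable

MAX_ROUNDS = 5  # matches the "up to 5 rounds" cap in the diagram

async def react_loop(
    messages: list[dict],
    tools: list[dict],
    call_model: Callable[..., Awaitable[dict]],
    run_tool: Callable[[dict], Awaitable[str]],
) -> str:
    for _ in range(MAX_ROUNDS):
        reply = await call_model(messages, tools=tools)  # search or answer?
        tool_calls = reply.get("tool_calls")
        if not tool_calls:
            return reply["content"]  # model answered directly
        messages.append(reply)  # keep the tool request in the transcript
        # Run every requested tool in parallel.
        results = await asyncio.gather(*(run_tool(c) for c in tool_calls))
        for call, result in zip(tool_calls, results):
            messages.append({"role": "tool", "name": call["name"], "content": result})
    # Round limit reached: ask for a final answer with tools disabled.
    reply = await call_model(messages, tools=None)
    return reply["content"]
```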
All settings live in `app/config.py` and can be overridden via `.env`.

| Setting | Default | Description |
|---|---|---|
| `OLLAMA_HOST` | `http://localhost:11434` | Ollama server URL |
| `SEARXNG_URL` | `http://localhost:8080` | SearXNG server URL |
| `USE_PROVIDER` | `local` | `local` or `cloud` |
| `OPENROUTER_API_KEY` | — | OpenRouter key (optional) |
| `BRAVE_API_KEY` | — | Brave Search key (optional, replaces SearXNG) |

Model names (`default_model`, `cloud_default_model`, etc.) are set directly in `config.py` since they rarely change per machine.
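As a sketch, a settings class of this shape is what pydantic-settings produces; only the fields documented above are grounded in this README, and the rest is illustrative:

```python
# Sketch of a pydantic-settings class like app/config.py; env vars match
# field names case-insensitively, so OLLAMA_HOST fills ollama_host.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    ollama_host: str = "http://localhost:11434"
    searxng_url: str = "http://localhost:8080"
    use_provider: str = "local"        # "local" or "cloud"
    openrouter_api_key: str = ""       # optional
    brave_api_key: str = ""            # optional

    # Model names live in code rather than .env:
    default_model: str = "gemma4:e4b"
    cloud_default_model: str = "google/gemini-2.5-flash"

settings = Settings()
```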
| Week | Goal | Status |
|---|---|---|
| 1 | Local chat — Gemma 4 + Ollama + conversation memory | Complete |
| 2 | Web search — SearXNG + trafilatura pipeline | Complete |
| 3 | ReAct agent loop — autonomous tool use, parallel search | Complete |
| 4 | FastAPI backend — SSE streaming, session history, image upload | Complete |
| 5 | Multimodal input — image understanding via Gemma 4 vision | Complete |
| 6 | Frontend + GitHub release — vanilla JS chat UI | Not started |
| 7 | RAG — chat with your own documents via ChromaDB | Not started |
| 8 | Voice — Whisper STT + Kokoro TTS, fully offline | Not started |
| 9 | Code interpreter — run Python/ML code safely | Not started |
| 10 | Coding agent — read/write files, run tests, git ops | Not started |
See `ROADMAP.md` for the full day-by-day plan.
Apache 2.0 — free for personal and commercial use.