
Aleph — your AI research partner

Reads papers, writes code, runs experiments, monitors training. No API keys required. No token costs. No data sent to the cloud. Cloud providers (OpenRouter, Anthropic, OpenAI, Google) are optional upgrades.

Current status: Week 5 of 10 complete.


What it can do right now

  • Chat with Gemma 4 (E4B) running locally on your GPU via Ollama
  • Search the web through a self-hosted SearXNG instance
  • Decide autonomously when to search using a ReAct agent loop
  • Run multiple web searches in parallel
  • Know the current time in any timezone
  • Serve a streaming HTTP API (FastAPI + SSE)
  • Accept image uploads and analyse them with Gemma 4's vision (works with both local and cloud providers)
  • Maintain per-session conversation history
  • Switch between local and cloud models via a single .env change

Hardware this was built on

Component   Spec
GPU         RTX 5060, 8 GB GDDR7
CPU RAM     32 GB
OS          Windows 11
Shell       Conda + bash

Gemma 4 E4B uses ~3.5 GB VRAM. Any GPU with 6+ GB should work. CPU-only is possible but slow — change default_model in app/config.py to a smaller model.


Project structure

aira/
├── app/
│   ├── main.py              FastAPI entry point
│   ├── config.py            All settings, reads from .env
│   ├── agent/
│   │   ├── react_loop.py    ReAct agent — decides tools, runs them, answers
│   │   ├── tool_schema.py   Tool definitions (JSON schema for Gemma 4)
│   │   └── context.py       System prompt + result formatters
│   ├── api/
│   │   └── chat.py          POST /api/chat, SSE streaming, session history
│   └── tools/
│       ├── web_search.py    SearXNG + trafilatura pipeline
│       ├── searxng.py       SearXNG HTTP client
│       ├── extractor.py     HTML → clean plain text
│       └── time_tool.py     Current time in any timezone
├── searxng/
│   └── settings.yml         SearXNG configuration
├── chat.py                  Terminal REPL (Week 1, kept for reference)
├── docker-compose.yml       Starts SearXNG
├── requirements.txt         Python dependencies
├── .env.example             Environment variable template
└── ROADMAP.md               Full 10-week build plan

Setup — step by step

1. Clone the repo

git clone <your-repo-url>
cd aira

2. Create the Conda environment

conda create -n aira python=3.11
conda activate aira
pip install -r requirements.txt

Additional packages needed for the FastAPI backend (Week 4):

pip install fastapi "uvicorn[standard]" sse-starlette python-dotenv pydantic-settings python-multipart openai

3. Install Ollama

Download from https://ollama.com and install it.

Verify it is running:

curl http://localhost:11434/api/tags

Pull the models:

ollama pull gemma4:e4b          # main model — ~3.5 GB VRAM
ollama pull gemma4:e2b          # fast model — ~2 GB VRAM (optional)
ollama pull qwen2.5-coder:7b    # coding model (optional, Week 10)

Ollama runs as a background service automatically after install. If it is not running, start it with:

ollama serve

4. Start SearXNG with Docker

Install Docker Desktop from https://www.docker.com if you don't have it.

docker compose up -d

Verify SearXNG is running:

curl "http://localhost:8080/search?q=test&format=json"

You should get a JSON response with search results. If it returns HTML instead, check searxng/settings.yml — the json format must be listed under search.formats.

The searxng/settings.yml included in this repo already has JSON enabled:

search:
  formats:
    - html
    - json
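To run the same check from Python, here is a minimal sketch mirroring what tools/web_search.py does (the httpx and trafilatura calls are standard; the exact pipeline in the repo may differ):

import httpx
import trafilatura

# Query SearXNG for JSON results (same request as the curl above).
resp = httpx.get(
    "http://localhost:8080/search",
    params={"q": "test", "format": "json"},
)
results = resp.json()["results"]

# Fetch each hit and reduce it to clean plain text, as the
# web_search pipeline does with trafilatura.
for hit in results[:3]:
    html = trafilatura.fetch_url(hit["url"])
    if not html:
        continue
    text = trafilatura.extract(html) or ""
    print(hit["title"], "->", text[:200])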

5. Create your .env file

cp .env.example .env

Edit .env:

# Ollama server — leave as-is if running locally
OLLAMA_HOST=http://localhost:11434

# SearXNG — leave as-is if using docker-compose
SEARXNG_URL=http://localhost:8080

# Provider: "local" (free, GPU) or "cloud" (paid, OpenRouter)
USE_PROVIDER=local

# Optional cloud API keys — leave blank to use local only
OPENROUTER_API_KEY=
ANTHROPIC_API_KEY=
OPENAI_API_KEY=
GOOGLE_API_KEY=
BRAVE_API_KEY=

6. Start the FastAPI server

uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

The API is now live at http://localhost:8000. Interactive API docs: http://localhost:8000/docs


Using the API

Health check — verify all services are up

curl http://localhost:8000/api/health

Expected response:

{
  "app": "ok",
  "ollama": "ok",
  "searxng": "ok",
  "provider": "local",
  "model": "gemma4:e4b",
  "openrouter": "not configured"
}

Send a chat message

curl -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"session_id": "test", "message": "What is the capital of France?"}'

The response is a stream of Server-Sent Events:

data: {"type": "token", "text": "The "}
data: {"type": "token", "text": "capital "}
data: {"type": "token", "text": "of France is Paris."}
data: {"type": "done"}

When the agent searches the web, status events appear first:

data: {"type": "status", "text": "Searching: latest Python release"}
data: {"type": "token", "text": "Python 3.13 was released..."}
data: {"type": "done"}

Send a message with an image

curl -X POST http://localhost:8000/api/chat/image \
  -F "session_id=test" \
  -F "message=What does this show?" \
  -F "image=@/path/to/image.png"

To use the cloud provider for image analysis:

curl -X POST http://localhost:8000/api/chat/image \
  -F "session_id=test" \
  -F "message=What does this show?" \
  -F "provider=cloud" \
  -F "image=@/path/to/image.png"

Supported formats: JPEG, PNG, WEBP, GIF. Both the local provider (Gemma 4 E4B via Ollama) and the cloud provider (OpenRouter) handle images natively. The two providers use different wire formats — Ollama takes a separate images field, while OpenRouter uses the OpenAI content array format. This is handled automatically.
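For the curious, the two shapes look roughly like this (illustrative payloads, not the repo's exact code):

import base64

with open("image.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

# Ollama: the image rides in a separate "images" field on the message.
ollama_message = {
    "role": "user",
    "content": "What does this show?",
    "images": [b64],
}

# OpenRouter (OpenAI-style): the image is one entry in a content array.
openrouter_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What does this show?"},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"},
        },
    ],
}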

List available models

curl http://localhost:8000/api/models

Clear session history

curl -X POST http://localhost:8000/api/reset \
  -H "Content-Type: application/json" \
  -d '{"session_id": "test"}'

Terminal chat (no server needed)

python chat.py

This runs the agent directly in your terminal. Useful for quick testing.


Switching to cloud models

Set USE_PROVIDER=cloud in .env and add your OpenRouter key:

USE_PROVIDER=cloud
OPENROUTER_API_KEY=sk-or-v1-...

Restart the server. All requests now go to google/gemini-2.5-flash by default. Your tools (web search, time) still run locally — only the LLM calls go to the cloud.

To change which cloud model is used, edit config.py:

cloud_default_model: str = "google/gemini-2.5-flash"

Browse available models at https://openrouter.ai/models.


How the ReAct agent loop works

Every message goes through app/agent/react_loop.py:

User message
    │
    ▼
_call(messages, tools=TOOLS)         ← ask model: search or answer?
    │
    ├── model requests tools
    │       │
    │       ├── asyncio.gather(web_search(), ...)   ← parallel tool execution
    │       │
    │       ├── inject results into messages
    │       │
    │       └── loop back (up to 5 rounds)
    │
    └── model gives final answer
            │
            ▼
        stream tokens → SSE → client

The model never calls tools directly. It requests them, your code runs them locally, results are injected back into the conversation, and the model reads them to answer.
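In code terms, the loop condenses to something like this (a simplified sketch: _call, TOOLS, and run_tool are placeholders mirroring the diagram, not the exact source):

import asyncio
from typing import Any

MAX_ROUNDS = 5  # matches the "up to 5 rounds" cap in the diagram

# Placeholders for the real pieces in app/agent/ — illustrative only.
TOOLS: list[dict[str, Any]] = []  # tool_schema.py JSON schemas

async def _call(messages, tools):  # wraps the Ollama/OpenRouter call
    raise NotImplementedError

async def run_tool(call):  # dispatches to app/tools/*
    raise NotImplementedError

async def react_loop(messages: list[dict]) -> str:
    for _ in range(MAX_ROUNDS):
        reply = await _call(messages, tools=TOOLS)  # search or answer?
        if not reply.tool_calls:
            return reply.content  # final answer, streamed to the client
        # Run every requested tool concurrently (the parallel-search step),
        # then inject the results back into the conversation.
        results = await asyncio.gather(
            *(run_tool(call) for call in reply.tool_calls)
        )
        for call, result in zip(reply.tool_calls, results):
            messages.append(
                {"role": "tool", "tool_call_id": call.id, "content": result}
            )
    return "Tool budget exhausted; answering from gathered context."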


Configuration reference

All settings live in app/config.py and can be overridden via .env.

Setting              Default                   Description
OLLAMA_HOST          http://localhost:11434    Ollama server URL
SEARXNG_URL          http://localhost:8080     SearXNG server URL
USE_PROVIDER         local                     local or cloud
OPENROUTER_API_KEY   (empty)                   OpenRouter key (optional)
BRAVE_API_KEY        (empty)                   Brave Search key (optional, replaces SearXNG)

Model names (default_model, cloud_default_model, etc.) are set directly in config.py since they rarely change per machine.
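The pattern is the standard pydantic-settings one. A sketch of what app/config.py plausibly looks like (field names follow the table above; the real file may differ):

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    # Overridable via .env
    ollama_host: str = "http://localhost:11434"
    searxng_url: str = "http://localhost:8080"
    use_provider: str = "local"  # "local" or "cloud"
    openrouter_api_key: str = ""
    brave_api_key: str = ""

    # Set directly in code, since they rarely change per machine
    default_model: str = "gemma4:e4b"
    cloud_default_model: str = "google/gemini-2.5-flash"

settings = Settings()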


Progress

Week   Goal                                                              Status
1      Local chat — Gemma 4 + Ollama + conversation memory               Complete
2      Web search — SearXNG + trafilatura pipeline                       Complete
3      ReAct agent loop — autonomous tool use, parallel search           Complete
4      FastAPI backend — SSE streaming, session history, image upload   Complete
5      Multimodal input — image understanding via Gemma 4 vision        Complete
6      Frontend + GitHub release — vanilla JS chat UI                   Not started
7      RAG — chat with your own documents via ChromaDB                  Not started
8      Voice — Whisper STT + Kokoro TTS, fully offline                  Not started
9      Code interpreter — run Python/ML code safely                     Not started
10     Coding agent — read/write files, run tests, git ops              Not started

See ROADMAP.md for the full day-by-day plan.


License

Apache 2.0 — free for personal and commercial use.

About

Aleph — your AI research partner. It reads papers, writes code, runs experiments, and monitors training. The ultimate goal is to automate research work to the point of replacing ML/AI researchers. Let's see how far it can take us.
