Skip to content

VarunGitGood/repi

Repository files navigation

repi

Log ingestion and LLM-based investigation engine. Ingests log files into PostgreSQL (pgvector), retrieves relevant log clusters via hybrid search (BM25 + dense vectors with RRF), and runs a ReAct loop where an LLM autonomously investigates root causes.

Architecture

repi/
├── cli.py          # Typer CLI — lifecycle commands: init, serve, ui, stop
├── worker.py       # Background file watcher — polls watcher_configs, ingests new log bytes
├── api/            # FastAPI — ingest, investigate, watchers, config endpoints
├── core/           # Settings (pydantic-settings), DI container, Redis cache
├── ingestion/      # Log parsing → signature clustering → embedding → upsert
├── retrieval/      # pgvector HNSW + PostgreSQL FTS, RRF fusion, query expansion
├── investigation/  # ReAct loop, tools (search_logs, get_timeline, scan_window, get_service_summary), evidence store
├── llm/            # Provider protocol + adapters (OpenAI, Anthropic, Mistral, Gemini, Ollama)
├── intent/         # Natural-language query → service / time-window / log-level extraction
└── models/         # SQLModel tables: log_chunks, investigations, investigation_steps, investigation_chunks, watcher_configs, watcher_offsets

Run with Docker (quickstart)

Multi-arch images (linux/amd64 + linux/arm64) are published to GitHub Container Registry on every push to main and every tagged release.

# Pull the latest image
docker pull ghcr.io/varungitgood/repi:latest

# One-shot help check
docker run --rm ghcr.io/varungitgood/repi:latest --help

# Full stack — Postgres + pgvector + Redis + repi API, all in one
OPENAI_API_KEY=sk-... docker compose --profile all-in-one up

The all-in-one profile lives in docker-compose.yml. The API binds to http://localhost:8000 and reads its provider key from your shell env (OPENAI_API_KEY, ANTHROPIC_API_KEY, MISTRAL_API_KEY, or GEMINI_API_KEY). Without --profile all-in-one, the compose file only brings up db + redis — the original behavior used by the local-dev flow below.

Override the image tag (e.g. to pin a release) via REPI_IMAGE=ghcr.io/varungitgood/repi:v0.1.0.

Local development (contributors)

Flow for hacking on repi from a fresh clone. End users who pipx install repi (once D1 lands) call the same commands without the uv run prefix — see /repi/docs for the install-and-run path.

Prerequisites: Docker, Python 3.11+, uv, Node.js (for the web UI).

uv sync                                 # resolve lockfile into .venv
uv run repi init --with-docker          # db + redis up, prompts provider/key, writes .repi/config.json, applies schema
uv run repi serve                       # terminal 1: API on :8000
uv run repi ui                          # terminal 2: web UI on :3000
uv run repi stop                        # when done: tear down docker stack

Configuration

repi has one source of truth: .repi/config.json. It's created by repi init and mutated by the web UI's Config page (PUT /config). The file is gitignored.

Three ways to edit it, in order of convenience:

  1. Web UI — visit the Config page; changes save immediately and the API hot-reloads them.
  2. Re-run repi init --force — re-prompts for provider + key and rewrites the file.
  3. Edit the JSON directly.repi/config.json is plain JSON; restart the API after editing.

Shell env vars (e.g. REPI_ENV=development uv run repi serve) override values from config.json at runtime — handy for one-off commands or CI.

REPI_ENV defaults to production. Flip to development in .repi/config.json (or via the UI) for verbose logs + --reload.

Coming from an older checkout? If you have a legacy config.json at the repo root, move it: mkdir -p .repi && mv config.json .repi/config.json. The root config.json is no longer read.

Usage

Ingest a log file manually

curl -X POST \
  -F "service=my-service" \
  -F "file=@/path/to/app.log" \
  http://localhost:8000/ingest

Investigate

curl -X POST http://localhost:8000/investigate \
  -H "Content-Type: application/json" \
  -d '{"query": "why did checkout fail last friday night"}'

Stream the ReAct steps live:

curl -N http://localhost:8000/investigate/{id}/stream

Continuous ingestion with the Worker

The worker watches directories for new log bytes and ingests them automatically.

  1. Register a watcher via the API:
curl -X POST http://localhost:8000/watchers \
  -H "Content-Type: application/json" \
  -d '{"service_name": "auth-svc", "watch_path": "/var/log/auth"}'
  1. Start the worker:
uv run python -m repi.worker

The worker polls watcher_configs every 30s (WATCHER_CONFIG_REFRESH_SECS) and uses watchfiles to detect new bytes, seeking forward from the last stored offset.

Environment Variables

Variable Default Description
REPI_ENV production production | development. Production = quiet logs, no auto-reload.
DATABASE_URL postgresql+asyncpg://lograg_user:password_here@localhost:5432/lograg PostgreSQL asyncpg URL
LLM_PROVIDER openai openai | anthropic | mistral | gemini | ollama
LLM_MODEL provider default Override model name
OPENAI_API_KEY / ANTHROPIC_API_KEY / … Provider API key
REDIS_URL redis://localhost:6379 Redis for caching
ENABLE_REDIS_CACHE true Set false to disable Redis
TIME_WINDOW_INITIAL_MINUTES 10 First search window for investigation
TIME_WINDOW_EXPANSIONS 60,360,1440 Progressive window expansion (minutes)
UI_PORT 3000 Port the web UI binds to (read by repi ui)
WATCHER_CONFIG_REFRESH_SECS 30 How often the worker polls for config changes
OLLAMA_BASE_URL http://localhost:11434 Ollama endpoint

Development

# Run all tests
uv run pytest tests/ -v

# Run worker tests only
uv run pytest tests/worker/ -v

# Run investigation tests
uv run pytest tests/investigation/ -v

License

MIT

About

Autonomous log investigation engine — ingest logs into pgvector, run hybrid search, and let an LLM ReAct loop surface root causes.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors