Local question answering CLI that talks to a running Ollama instance, built with LangChain + LangGraph. Streams tokens to the console by default.
- Ensure Python 3.10+ is available (
.python-versionis set to 3.10). - Install deps with uv (set
UV_CACHE_DIRif your home cache is locked down):
UV_CACHE_DIR=.uv_cache uv sync- Make sure Ollama is running locally and your models are pulled (e.g.,
ollama pull qwen2.5:3b). - (Optional) Copy
.env.exampleto.envand setLOGFIRE_TOKENplus any overrides (ASSISTANT_MODEL,LOGFIRE_SERVICE_NAME,LOGFIRE_ENVIRONMENT).
- One-off question:
uv run assistant --model qwen3:4b --max-tokens 10000 "what is the universe?"- Interactive session (type
exitorquitto stop):
uv run assistant --model qwen2.5:3b- JSON output:
uv run assistant --model qwen2.5:3b --json "Explain how sampling temperature works."Options:
--host(orOLLAMA_HOST) to point at a non-default Ollama server.--max-tokens,--temperature,--top-pto control sampling.--stream/--no-streamto toggle streaming tokens as they arrive (streaming is on by default).
- Copy
.env.exampleto.envand setLOGFIRE_TOKEN(write token from Logfire), plus optionalLOGFIRE_SERVICE_NAMEandLOGFIRE_ENVIRONMENT. - Run
./start.sh(loads.env, then runs the CLI). If a token is present, traces/logs are sent to Logfire; otherwise they stay local. - Open Logfire and filter by
service_name/environmentto see spans for the QA and evaluation steps. - If you enable LangSmith tracing (default in
.env.example), setLANGCHAIN_API_KEYto avoid 401s from LangSmith.
- Install deps with
uv sync(includespydantic-evals[logfire]). - Run the sample suite:
uv run python evals/run_evals.py. - Spans for the eval run appear in Logfire (if
LOGFIRE_TOKENis set). Cases check for: France capital, concise latency definition, and refusing prompt injections.