A privacy-first local AI agent that runs entirely on your machine. Built with LangChain, LangGraph, Ollama, FastAPI, and React.
Sensei can search your files, read documents (PDF, DOCX, code, configs, images), analyze logs, explore directory structures, run semantic search, inspect git changes, and more — all without sending data to the cloud.
    frontend-react/               React + Vite chat UI
    src/
      server.py                   FastAPI server (SSE streaming, session management)
      agents/
        graph.py                  LangGraph workflow (agent ↔ tools loop)
        state.py                  Agent state definition
      core/
        config.py                 Settings (model, GPU layers, root directory)
        llm.py                    Ollama LLM initialization
        persistence.py            SQLite-backed checkpointer + session store + conversation archive
        summarize.py              Shared context summarization (graph + /compact)
        token_utils.py            Tiktoken-based token counting for context/metrics
        file_index.py             SQLite file path index for fast lookups
      tools/
        file_tools.py             Core file system tools
        advanced_file_tools.py    Semantic search, code analysis, log parsing
        mcp_tools.py              MCP tool integration (placeholder)
    main.py                       CLI entry point (interactive terminal mode)
Core file tools — get_file_metadata, read_local_file, search_file_regex, find_file, search_directory_regex, rebuild_file_index
Advanced file tools — search_semantic, get_directory_tree, get_code_structure, analyze_logs, get_local_changes, summarize_large_file
- Python 3.12+
- uv (package manager)
- Ollama running locally with a tool-capable model pulled
- Node.js 18+ and npm (for the frontend)
uv sync

The default model is configured in src/core/config.py (ollama_model). Pull whatever model is set there:

ollama pull gemma4:e2b

Any tool-capable Ollama model works. To change the model, edit src/core/config.py or set OLLAMA_MODEL in .env.
Create a .env file in the project root:
OLLAMA_MODEL=gemma4:e2b
OLLAMA_NUM_GPU=10
ROOT_DIRECTORY=C:\Vijay

| Variable | Default | Description |
|---|---|---|
| OLLAMA_MODEL | gemma4:e2b | Ollama model name |
| OLLAMA_BASE_URL | http://localhost:11434 | Ollama server URL |
| OLLAMA_NUM_GPU | -1 | GPU layers (-1 = all, 0 = CPU only) |
| ROOT_DIRECTORY | C:\Vijay | Root path the agent can search |
| TEMPERATURE | 0.0 | LLM temperature |
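
As a rough illustration of how the variables above could map to code, the sketch below reads them from an environment mapping and falls back to the documented defaults. The `Settings` dataclass and the `load_settings` helper are hypothetical names; the real src/core/config.py may be organized differently, and only the `ollama_model` field name comes from this README.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    # Field names other than ollama_model are assumptions made for this sketch.
    ollama_model: str
    ollama_base_url: str
    ollama_num_gpu: int
    root_directory: str
    temperature: float


def load_settings(env=None):
    """Build Settings from an environment mapping, using the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        ollama_model=env.get("OLLAMA_MODEL", "gemma4:e2b"),
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        # -1 offloads all layers to the GPU, 0 keeps everything on the CPU.
        ollama_num_gpu=int(env.get("OLLAMA_NUM_GPU", "-1")),
        root_directory=env.get("ROOT_DIRECTORY", r"C:\Vijay"),
        temperature=float(env.get("TEMPERATURE", "0.0")),
    )
```

Passing a plain dict instead of relying on `os.environ` keeps the helper easy to exercise in isolation, for example `load_settings({"OLLAMA_NUM_GPU": "10"})`.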
cd frontend-react
npm install

Open two terminals:

uv run uvicorn src.server:app --reload

The API starts at http://127.0.0.1:8000.

cd frontend-react
npm run dev

The UI opens at http://localhost:5173.

uv run python main.py

Type queries and press Enter. Type /exit to quit.
- Token streaming: Responses stream token by token via SSE for a smooth UX.
- Tool call visibility: Tool calls and results are displayed in collapsible groups in the UI and persist across sessions.
- Multi-session chat: Create, switch, rename, and delete chat sessions from the sidebar. Sessions persist across server restarts via SQLite.
- Background execution: Switching chats doesn't abort in-flight agent work. The agent finishes in a background thread and results are saved. Switching back shows the completed response or reconnects to the live stream.
- File indexing: An SQLite-backed file path index is built on first startup for fast file lookups. Rebuild via the `rebuild_file_index` tool or `POST /index/rebuild`.
- Iterative investigation: The agent is prompted to chain multiple tool calls together, retry on failure, and resolve vague file references autonomously.
- Context & summarization: Tiktoken-based token estimates, a circular context-usage indicator in the header (with `/sessions/{id}/context`), automatic summarization when usage crosses a threshold, archived history + `recall_conversation` tool, and manual `/compact` (API + slash command).
- Observability: Per-response metrics (input/output tokens, tok/s, TTFT, latency, tool durations) in SSE `done` events, subtle Stats under each agent reply, Session stats (min/max/avg) in the header, and `[METRICS]` lines logged server-side.
- Interruptible runs: While the agent is processing, the send button becomes a stop button. Cancel is graceful: partial output is preserved, metrics/state are logged, and the UI marks the response as interrupted.
- LangSmith trace metadata: Run metadata includes `thread_id`/`session_id` and tags for easier trace filtering.
- Slash commands (implemented): `/clear`, `/compact`, `/summary`, `/context`, `/help`, `/tools`, `/index status`, `/index rebuild`, `/search`, `/export`, `/agent status`, `/history`, `/model`.
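
To make the streaming and observability items more concrete, here is a minimal sketch of how a client might consume the SSE stream. The framing rules (`event:` and `data:` fields, blank line as terminator) follow the Server-Sent Events format, but the `token` event name and the JSON payload of the `done` event are assumptions made for illustration; the server's actual event names and metric keys may differ.

```python
import json


def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips exactly one leading space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)


def collect_reply(lines):
    """Join token events into one string and return it with the final metrics."""
    text, metrics = [], None
    for event, data in parse_sse(lines):
        if event == "done":
            # Assumption: the done event carries the per-response metrics as JSON.
            metrics = json.loads(data)
            break
        if event == "token":  # hypothetical event name for streamed tokens
            text.append(data)
    return "".join(text), metrics
```

In practice `lines` would be the decoded line iterator of the HTTP response from `POST /chat/stream`; any list of strings works for experimenting with the parser.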
| Method | Path | Description |
|---|---|---|
| GET | /health | Health check |
| POST | /chat | Non-streaming chat response |
| POST | /chat/stream | Stream a chat response (SSE) |
| GET | /chat/stream/{thread_id} | Re-attach to an in-flight stream |
| POST | /chat/cancel/{thread_id} | Gracefully cancel an in-flight run |
| GET | /sessions | List all sessions |
| POST | /sessions | Create a new session |
| GET | /sessions/{id}/messages | Get full message history |
| GET | `/sessions/{id}/export?format=json\|md` | Export a session (`format` is `json` or `md`) |
| GET | /sessions/{id}/active | Check if agent is running |
| GET | /sessions/{id}/context | Current context window usage (tokens, %, summary flag) |
| POST | /sessions/{id}/summary_preview | Read-only LLM summary for UI (does not change checkpoint) |
| POST | /sessions/{id}/compact | Force summarization of older messages |
| PATCH | /sessions/{id} | Rename a session |
| PATCH | /sessions/{id}/pin | Pin or unpin a session (max 5 pinned) |
| DELETE | /sessions/{id} | Delete a session |
| GET | /tools | Agent tool names and descriptions (same set as the LangGraph agent) |
| POST | /index/rebuild | Rebuild the file index |
| GET | /index/stats | File index statistics (indexed_files, cached_contents, plus root_directory, empty, available) |
| GET | /search | Keyword search over indexed files |
| GET | /agent/status | Agent/model/context/tool category status |
| GET | /history | Recent sessions with timestamps |
| GET | /model | Active model information |
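
The endpoints above can be called from any HTTP client. The following is a small sketch using only the Python standard library; `build_request` and `call` are hypothetical helpers, and the assumption that every response body is JSON is not confirmed by this README. Request bodies for endpoints such as `/chat` are not documented here, so the commented examples stick to calls that need no payload.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # address printed by uvicorn in the run steps


def build_request(method, path, payload=None, base_url=BASE_URL):
    """Create a urllib Request for one of the documented endpoints."""
    data = None
    headers = {"Accept": "application/json"}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url + path, data=data, headers=headers, method=method
    )


def call(method, path, payload=None, timeout=10):
    """Send the request and decode the body as JSON (assumed response format)."""
    request = build_request(method, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example calls, to be run while the server is up:
#   call("GET", "/health")
#   call("GET", "/sessions")
#   call("POST", "/chat/cancel/some-thread-id")
```

Separating request construction from sending makes it possible to inspect the URL and method without a running server.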
All persistent data lives in data/sentinel.db (SQLite). This includes LangGraph checkpoints, session metadata, and the file path index. The data/ directory is gitignored.
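
For readers curious what an SQLite-backed file path index can look like, here is a simplified sketch. It is not the project's src/core/file_index.py: the `files` table, its columns, and the `build_index` and `find_paths` helpers are all assumptions chosen for illustration.

```python
import os
import sqlite3


def build_index(conn, root):
    """(Re)create a table of file paths found under root and return the row count."""
    # Dropping the table also drops its index, so rebuilds start clean.
    conn.execute("DROP TABLE IF EXISTS files")
    conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, name TEXT NOT NULL)")
    rows = (
        (os.path.join(dirpath, name), name.lower())
        for dirpath, _dirs, names in os.walk(root)
        for name in names
    )
    conn.executemany("INSERT INTO files (path, name) VALUES (?, ?)", rows)
    conn.execute("CREATE INDEX idx_files_name ON files (name)")
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]


def find_paths(conn, fragment):
    """Return indexed paths whose file name contains fragment (case-insensitive)."""
    # Note: % and _ inside fragment act as LIKE wildcards in this simple version.
    pattern = "%" + fragment.lower() + "%"
    cursor = conn.execute(
        "SELECT path FROM files WHERE name LIKE ? ORDER BY path", (pattern,)
    )
    return [row[0] for row in cursor]
```

A connection opened with `sqlite3.connect(":memory:")` is enough to try this out; pointing it at a file on disk would persist the index across restarts.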
- Future plans, roadmap items, and TODO lists now live in `plan.md`.
- Evaluation strategy and thresholds live in `eval.md`.