Sensei

A privacy-first local AI agent that runs entirely on your machine. Built with LangChain, LangGraph, Ollama, FastAPI, and React.

Sensei can search your files, read documents (PDF, DOCX, code, configs, images), analyze logs, explore directory structures, run semantic search, inspect git changes, and more — all without sending data to the cloud.

Architecture

frontend-react/          React + Vite chat UI
src/
  server.py              FastAPI server (SSE streaming, session management)
  agents/
    graph.py             LangGraph workflow (agent ↔ tools loop)
    state.py             Agent state definition
  core/
    config.py            Settings (model, GPU layers, root directory)
    llm.py               Ollama LLM initialization
    persistence.py       SQLite-backed checkpointer + session store + conversation archive
    summarize.py         Shared context summarization (graph + /compact)
    token_utils.py       Tiktoken-based token counting for context/metrics
    file_index.py        SQLite file path index for fast lookups
  tools/
    file_tools.py        Core file system tools
    advanced_file_tools.py  Semantic search, code analysis, log parsing
    mcp_tools.py         MCP tool integration (placeholder)
main.py                  CLI entry point (interactive terminal mode)

Tools

Core file tools — get_file_metadata, read_local_file, search_file_regex, find_file, search_directory_regex, rebuild_file_index

Advanced file tools — search_semantic, get_directory_tree, get_code_structure, analyze_logs, get_local_changes, summarize_large_file

Prerequisites

Python 3.12+
uv (package manager)
Ollama running locally with a tool-capable model pulled
Node.js 18+ and npm (for the frontend)

Setup

1. Install Python dependencies

uv sync

2. Pull an Ollama model

The default model is configured in src/core/config.py (ollama_model). Pull whatever model is set there:

ollama pull gemma4:e2b

Any tool-capable Ollama model works. To change the model, edit src/core/config.py or set OLLAMA_MODEL in .env.

3. Configure environment (optional)

Create a .env file in the project root:

OLLAMA_MODEL=gemma4:e2b
OLLAMA_NUM_GPU=10
ROOT_DIRECTORY=C:\Vijay

Variable	Default	Description
`OLLAMA_MODEL`	`gemma4:e2b`	Ollama model name
`OLLAMA_BASE_URL`	`http://localhost:11434`	Ollama server URL
`OLLAMA_NUM_GPU`	`-1`	GPU layers (-1 = all, 0 = CPU only)
`ROOT_DIRECTORY`	`C:\Vijay`	Root path the agent can search
`TEMPERATURE`	`0.0`	LLM temperature
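
The table above can be read as a mapping from environment variables to typed settings. The following is a minimal sketch of that mapping, not the project's actual src/core/config.py, which may load values differently. The `Settings` class and `load_settings` function are hypothetical names introduced here for illustration; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container mirroring the variables in the table above."""
    ollama_model: str
    ollama_base_url: str
    ollama_num_gpu: int
    root_directory: str
    temperature: float


def load_settings(env=None) -> Settings:
    """Build Settings from an environment mapping, falling back to the README defaults."""
    env = os.environ if env is None else env
    return Settings(
        ollama_model=env.get("OLLAMA_MODEL", "gemma4:e2b"),
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        # -1 = all layers on GPU, 0 = CPU only (per the table)
        ollama_num_gpu=int(env.get("OLLAMA_NUM_GPU", "-1")),
        root_directory=env.get("ROOT_DIRECTORY", r"C:\Vijay"),
        temperature=float(env.get("TEMPERATURE", "0.0")),
    )
```

Calling `load_settings()` with no argument reads the process environment. This sketch does not parse a .env file itself, so the file would need to be loaded into the environment by whatever mechanism the project uses.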

4. Install frontend dependencies

cd frontend-react
npm install

Running

Open two terminals:

Terminal 1 — Backend

uv run uvicorn src.server:app --reload

The API starts at http://127.0.0.1:8000.

Terminal 2 — Frontend

cd frontend-react
npm run dev

The UI opens at http://localhost:5173.

CLI mode (no frontend)

uv run python main.py

Type queries and press Enter. Type /exit to quit.

Key features

Token streaming — Responses stream token-by-token via SSE for a smooth UX.
Tool call visibility — Tool calls and results are displayed in collapsible groups in the UI and persist across sessions.
Multi-session chat — Create, switch, rename, and delete chat sessions from the sidebar. Sessions persist across server restarts via SQLite.
Background execution — Switching chats doesn't abort in-flight agent work. The agent finishes in a background thread and results are saved. Switching back shows the completed response or reconnects to the live stream.
File indexing — An SQLite-backed file path index is built on first startup for fast file lookups. Rebuild via the rebuild_file_index tool or POST /index/rebuild.
Iterative investigation — The agent is prompted to chain multiple tool calls together, retry on failure, and resolve vague file references autonomously.
Context & summarization — Tiktoken-based token estimates, a circular context-usage indicator in the header (with /sessions/{id}/context), automatic summarization when usage crosses a threshold, archived history + recall_conversation tool, and manual /compact (API + slash command).
Observability — Per-response metrics (input/output tokens, tok/s, TTFT, latency, tool durations) in SSE done events, subtle Stats under each agent reply, Session stats (min/max/avg) in the header, and [METRICS] lines logged server-side.
Interruptible runs — While the agent is processing, the send button becomes a stop button. Cancel is graceful: partial output is preserved, metrics/state are logged, and the UI marks the response as interrupted.
LangSmith trace metadata — Run metadata includes thread_id/session_id and tags for easier trace filtering.
Slash commands (implemented) — /clear, /compact, /summary, /context, /help, /tools, /index status, /index rebuild, /search, /export, /agent status, /history, /model.
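
To illustrate the file indexing feature listed above, here is a minimal sketch of an SQLite-backed file path index. It is an assumption about how such an index could work, not the code in src/core/file_index.py: the `files` table, its columns, and the `build_index` and `find_by_name` functions are hypothetical names chosen for this example.

```python
import os
import sqlite3


def build_index(conn: sqlite3.Connection, root: str) -> int:
    """Walk `root`, store every file path, and return the number of files indexed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute("DELETE FROM files")  # a rebuild starts from an empty table
    rows = [
        (os.path.join(dirpath, fname), fname)
        for dirpath, _dirs, fnames in os.walk(root)
        for fname in fnames
    ]
    conn.executemany("INSERT INTO files (path, name) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def find_by_name(conn: sqlite3.Connection, fragment: str) -> list[str]:
    """Return indexed paths whose file name contains `fragment`.

    Note: `%` and `_` inside `fragment` act as LIKE wildcards in this sketch.
    """
    cur = conn.execute(
        "SELECT path FROM files WHERE name LIKE ? ORDER BY path",
        (f"%{fragment}%",),
    )
    return [row[0] for row in cur]
```

A lookup then becomes a single query against the index instead of a fresh directory walk, which is the trade-off that makes a rebuild step necessary when files change on disk.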

API endpoints

Method	Path	Description
`GET`	`/health`	Health check
`POST`	`/chat`	Non-streaming chat response
`POST`	`/chat/stream`	Stream a chat response (SSE)
`GET`	`/chat/stream/{thread_id}`	Re-attach to an in-flight stream
`POST`	`/chat/cancel/{thread_id}`	Gracefully cancel an in-flight run
`GET`	`/sessions`	List all sessions
`POST`	`/sessions`	Create a new session
`GET`	`/sessions/{id}/messages`	Get full message history
`GET`	`/sessions/{id}/export?format=json|md`	Export a session as JSON or Markdown
`GET`	`/sessions/{id}/active`	Check if agent is running
`GET`	`/sessions/{id}/context`	Current context window usage (tokens, %, summary flag)
`POST`	`/sessions/{id}/summary_preview`	Read-only LLM summary for UI (does not change checkpoint)
`POST`	`/sessions/{id}/compact`	Force summarization of older messages
`PATCH`	`/sessions/{id}`	Rename a session
`PATCH`	`/sessions/{id}/pin`	Pin or unpin a session (max 5 pinned)
`DELETE`	`/sessions/{id}`	Delete a session
`GET`	`/tools`	Agent tool names and descriptions (same set as the LangGraph agent)
`POST`	`/index/rebuild`	Rebuild the file index
`GET`	`/index/stats`	File index statistics (`indexed_files`, `cached_contents`, plus `root_directory`, `empty`, `available`)
`GET`	`/search`	Keyword search over indexed files
`GET`	`/agent/status`	Agent/model/context/tool category status
`GET`	`/history`	Recent sessions with timestamps
`GET`	`/model`	Active model information
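
For clients other than the bundled UI, the streaming endpoints return Server-Sent Events. The sketch below parses a generic SSE line stream into (event, data) pairs. The parser follows the standard SSE framing, but the sample payloads are assumptions: the `token` event name and the JSON field names are hypothetical, and only the existence of a `done` event carrying metrics is stated in this README.

```python
import json


def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of decoded SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # a blank line terminates the current event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # the SSE format drops a single leading space after the colon
            data.append(value[1:] if value.startswith(" ") else value)


# Hypothetical stream: event names and JSON fields are illustrative only.
sample = [
    'event: token\n', 'data: {"text": "Hel"}\n', '\n',
    'event: token\n', 'data: {"text": "lo"}\n', '\n',
    'event: done\n', 'data: {"output_tokens": 2}\n', '\n',
]
events = [(name, json.loads(data)) for name, data in parse_sse(sample)]
reply = "".join(payload["text"] for name, payload in events if name == "token")
```

Against a running server, `lines` would come from an HTTP client iterating over the response of POST /chat/stream. The request body is not documented in this README, so it is not shown here.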

Data

All persistent data lives in data/sentinel.db (SQLite). This includes LangGraph checkpoints, session metadata, and the file path index. The data/ directory is gitignored.

Planning documents

Future plans, roadmap items, and TODO lists now live in plan.md.
Evaluation strategy and thresholds live in eval.md.
