Skip to content

VjayRam/project-sensei

Repository files navigation

Sensei

A privacy-first local AI agent that runs entirely on your machine. Built with LangChain, LangGraph, Ollama, FastAPI, and React.

Sensei can search your files, read documents (PDF, DOCX, code, configs, images), analyze logs, explore directory structures, run semantic search, inspect git changes, and more — all without sending data to the cloud.

Architecture

frontend-react/          React + Vite chat UI
src/
  server.py              FastAPI server (SSE streaming, session management)
  agents/
    graph.py             LangGraph workflow (agent ↔ tools loop)
    state.py             Agent state definition
  core/
    config.py            Settings (model, GPU layers, root directory)
    llm.py               Ollama LLM initialization
    persistence.py       SQLite-backed checkpointer + session store + conversation archive
    summarize.py         Shared context summarization (graph + /compact)
    token_utils.py       Tiktoken-based token counting for context/metrics
    file_index.py        SQLite file path index for fast lookups
  tools/
    file_tools.py        Core file system tools
    advanced_file_tools.py  Semantic search, code analysis, log parsing
    mcp_tools.py         MCP tool integration (placeholder)
main.py                  CLI entry point (interactive terminal mode)

Tools

Core file toolsget_file_metadata, read_local_file, search_file_regex, find_file, search_directory_regex, rebuild_file_index

Advanced file toolssearch_semantic, get_directory_tree, get_code_structure, analyze_logs, get_local_changes, summarize_large_file

Prerequisites

  • Python 3.12+
  • uv (package manager)
  • Ollama running locally with a tool-capable model pulled
  • Node.js 18+ and npm (for the frontend)

Setup

1. Install Python dependencies

uv sync

2. Pull an Ollama model

The default model is configured in src/core/config.py (ollama_model). Pull whatever model is set there:

ollama pull gemma4:e2b

Any tool-capable Ollama model works. To change the model, edit src/core/config.py or set OLLAMA_MODEL in .env.

3. Configure environment (optional)

Create a .env file in the project root:

OLLAMA_MODEL=gemma4:e2b
OLLAMA_NUM_GPU=10
ROOT_DIRECTORY=C:\Vijay
Variable Default Description
OLLAMA_MODEL gemma4:e2b Ollama model name
OLLAMA_BASE_URL http://localhost:11434 Ollama server URL
OLLAMA_NUM_GPU -1 GPU layers (-1 = all, 0 = CPU only)
ROOT_DIRECTORY C:\Vijay Root path the agent can search
TEMPERATURE 0.0 LLM temperature

4. Install frontend dependencies

cd frontend-react
npm install

Running

Open two terminals:

Terminal 1 — Backend

uv run uvicorn src.server:app --reload

The API starts at http://127.0.0.1:8000.

Terminal 2 — Frontend

cd frontend-react
npm run dev

The UI opens at http://localhost:5173.

CLI mode (no frontend)

uv run python main.py

Type queries and press Enter. Type /exit to quit.

Key features

  • Token streaming — Responses stream token-by-token via SSE for a smooth UX.
  • Tool call visibility — Tool calls and results are displayed in collapsible groups in the UI and persist across sessions.
  • Multi-session chat — Create, switch, rename, and delete chat sessions from the sidebar. Sessions persist across server restarts via SQLite.
  • Background execution — Switching chats doesn't abort in-flight agent work. The agent finishes in a background thread and results are saved. Switching back shows the completed response or reconnects to the live stream.
  • File indexing — An SQLite-backed file path index is built on first startup for fast file lookups. Rebuild via the rebuild_file_index tool or POST /index/rebuild.
  • Iterative investigation — The agent is prompted to chain multiple tool calls together, retry on failure, and resolve vague file references autonomously.
  • Context & summarization — Tiktoken-based token estimates, a circular context-usage indicator in the header (with /sessions/{id}/context), automatic summarization when usage crosses a threshold, archived history + recall_conversation tool, and manual /compact (API + slash command).
  • Observability — Per-response metrics (input/output tokens, tok/s, TTFT, latency, tool durations) in SSE done events, subtle Stats under each agent reply, Session stats (min/max/avg) in the header, and [METRICS] lines logged server-side.
  • Interruptible runs — While the agent is processing, the send button becomes a stop button. Cancel is graceful: partial output is preserved, metrics/state are logged, and the UI marks the response as interrupted.
  • LangSmith trace metadata — Run metadata includes thread_id/session_id and tags for easier trace filtering.
  • Slash commands (implemented)/clear, /compact, /summary, /context, /help, /tools, /index status, /index rebuild, /search, /export, /agent status, /history, /model.

API endpoints

Method Path Description
GET /health Health check
POST /chat Non-streaming chat response
POST /chat/stream Stream a chat response (SSE)
GET /chat/stream/{thread_id} Re-attach to an in-flight stream
POST /chat/cancel/{thread_id} Gracefully cancel an in-flight run
GET /sessions List all sessions
POST /sessions Create a new session
GET /sessions/{id}/messages Get full message history
GET `/sessions/{id}/export?format=json md`
GET /sessions/{id}/active Check if agent is running
GET /sessions/{id}/context Current context window usage (tokens, %, summary flag)
POST /sessions/{id}/summary_preview Read-only LLM summary for UI (does not change checkpoint)
POST /sessions/{id}/compact Force summarization of older messages
PATCH /sessions/{id} Rename a session
PATCH /sessions/{id}/pin Pin or unpin a session (max 5 pinned)
DELETE /sessions/{id} Delete a session
GET /tools Agent tool names and descriptions (same set as the LangGraph agent)
POST /index/rebuild Rebuild the file index
GET /index/stats File index statistics (indexed_files, cached_contents, plus root_directory, empty, available)
GET /search Keyword search over indexed files
GET /agent/status Agent/model/context/tool category status
GET /history Recent sessions with timestamps
GET /model Active model information

Data

All persistent data lives in data/sentinel.db (SQLite). This includes LangGraph checkpoints, session metadata, and the file path index. The data/ directory is gitignored.

Planning documents

  • Future plans, roadmap items, and TODO lists now live in plan.md.
  • Evaluation strategy and thresholds live in eval.md.

About

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors