Gloss

Persistent memory and recall for Claude Code. Every conversation compounds.

Why

Claude throws 40 things at you and there's no way to respond bullet by bullet. You can't highlight the 3 that matter, push back on the 1 you disagree with, and ignore the rest — not in a terminal. And when context compaction kicks in, the good stuff vanishes. Next turn, Claude doesn't remember what it said. Next session, you don't remember which conversation it happened in.

Gloss fixes both problems. It gives you the full uncompacted record of every Claude Code session across every project, with tools to curate (highlight, comment, tag) and recall (semantic search, MCP tools) — so the knowledge compounds instead of evaporating.

Highlights are a response mechanism. Select the parts that matter, annotate the parts you disagree with, tag decisions as settled. Those curated signals feed back into Claude as structured context — not next session, but this session, via the MCP server.

Semantic search is cross-project recall. Ask "what did we decide about the DuckDB architecture?" and Gloss searches 12,000+ conversations using hybrid FTS + vector embeddings. Claude finds the answer itself, mid-conversation, without you copy-pasting from old sessions.

The core loop: Claude generates → you curate in Gloss → curation feeds back to Claude → Claude generates better. Every conversation makes every future conversation — and the current one — smarter.

Quick Start

bun install
bun src/cli.ts serve

Opens http://localhost:3456 — all your conversations from ~/.claude/projects/ are discovered automatically, indexed in SQLite, and available to browse. On first launch, Gloss will start building the full-text and vector search indexes in the background.

Features

Server (default mode)

  • Multi-session browsing at localhost:3456/c/<session-id>
  • Index page with search, Recent/By-project views, project filter, and min-turns filter
  • Live updates via WebSocket — new turns appear as the JSONL grows
  • Annotation API — highlights persist in SQLite, sync across tabs
  • Session discovery — scans ~/.claude/projects/ on startup, rescans periodically with adaptive backoff
  • Semantic search — hybrid FTS + vector search with AI-powered answers at /ask
  • Copy resume — one-click copy of claude --resume <uuid> from the index page

Semantic Search (Ask)

Type a natural language question in the search bar on the index page. Gloss uses a three-stage retrieval pipeline:

1. Retrieval (FTS + Vector, ~50ms)

  • FTS5 full-text search — per-token queries ensure each concept gets proper representation instead of being drowned by generic words in a combined OR query
  • Vector similarity — 256-dimensional Snowflake Arctic Embed embeddings, cosine similarity search across all indexed turns
  • Metadata matching — project names and session titles
  • RRF fusion (k=60) — combines all ranking signals fairly so no single retriever dominates
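
RRF can be sketched in a few lines. This is a hypothetical TypeScript helper, not Gloss's actual implementation:

```typescript
// Reciprocal Rank Fusion: each retriever contributes 1 / (k + rank)
// per item; summing across retrievers rewards items that rank well in
// several lists, so no single retriever can dominate the final order.
function rrfFuse(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return scores;
}

// "s1" is ranked well everywhere; "s2" tops only one list.
const fused = rrfFuse([
  ["s2", "s1", "s3"], // FTS ranking
  ["s1", "s3", "s2"], // vector ranking
  ["s1", "s2"],       // metadata ranking
]);
const top = [...fused.entries()].sort((a, b) => b[1] - a[1])[0][0]; // "s1"
```

With k=60, consistent mid-list placement ("s1") beats a single first-place finish ("s2"), which is the fairness property the pipeline relies on.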

2. Context assembly

Top-ranked sessions are loaded and the most relevant turns are extracted with surrounding context windows. This produces a focused evidence set for the LLM.
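
Window extraction might look like this (a hypothetical sketch; the radius and merging behavior are assumptions, not Gloss's actual parameters):

```typescript
// Expand each matched turn index into a [start, end] window of
// surrounding turns, then merge overlapping or adjacent windows so
// the same turn is never included in the evidence set twice.
function contextWindows(
  matched: number[],
  totalTurns: number,
  radius = 1,
): [number, number][] {
  const windows = matched
    .map((i): [number, number] => [
      Math.max(0, i - radius),
      Math.min(totalTurns - 1, i + radius),
    ])
    .sort((a, b) => a[0] - b[0]);
  const merged: [number, number][] = [];
  for (const w of windows) {
    const last = merged[merged.length - 1];
    if (last && w[0] <= last[1] + 1) last[1] = Math.max(last[1], w[1]);
    else merged.push([w[0], w[1]]);
  }
  return merged;
}
```

For matched turns 2, 3, and 10 in a 20-turn session, this yields two evidence spans: turns 1-4 and 9-11.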

3. Answer synthesis (~5-15s)

Claude Haiku reads the evidence and generates a direct answer with numbered source citations. The answer streams in real-time. Source cards below the answer link directly to the referenced turns in their conversations.

Vector indexing

Embeddings are generated locally using @huggingface/transformers — no API calls, no external services. The model runs in a subprocess to avoid blocking the server.

First-run expectations:

  • Model download: ~100MB on first launch (cached after that)
  • Indexing speed: ~50 sessions/minute depending on conversation length
  • A typical collection of ~800 sessions takes 15-20 minutes to fully index
  • Indexing runs in the background — the server is usable immediately, search quality improves as more sessions get indexed

What gets indexed:

  • Sessions with 3+ turns by default (configurable via Settings > Min turns on the index page)
  • Files between 10KB and 50MB
  • Each turn is truncated to 2,000 characters before embedding
  • Embeddings are stored as 1KB BLOBs in SQLite (256 × float32)
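
Packing and unpacking that BLOB is nearly a one-liner with typed arrays: a minimal sketch, not Gloss's actual storage code:

```typescript
// 256 float32 values occupy exactly 256 * 4 = 1024 bytes.
function packEmbedding(vec: number[]): Uint8Array {
  return new Uint8Array(new Float32Array(vec).buffer);
}

function unpackEmbedding(blob: Uint8Array): Float32Array {
  return new Float32Array(blob.buffer, blob.byteOffset, blob.byteLength / 4);
}

const embedding = Array.from({ length: 256 }, (_, i) => i / 256);
const blob = packEmbedding(embedding); // 1024-byte BLOB for SQLite
const restoredVec = unpackEmbedding(blob);
```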

Recommendation: Set the min turns filter to 5-7 if you have many short test/debug sessions. This avoids wasting indexing time on throwaway conversations and keeps the vector index focused on substantive sessions. You can change this in Settings on the index page — it applies to both the visible session list and what gets vectorized.

Disabling embeddings:

bun src/cli.ts serve --no-embeddings    # Skip vector indexing entirely

FTS search still works without embeddings. Vector search adds recall for semantic/synonym queries (e.g., finding "database" when the conversation says "SQLite") but FTS handles exact keyword matches well on its own.
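
The vector side boils down to cosine similarity between the query embedding and each stored turn embedding. A generic sketch:

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
// 1 means identical direction; 0 means orthogonal (unrelated).
function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

This is why "database" can match a turn that only says "SQLite": their embeddings point in similar directions even though the tokens never overlap.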

Viewer

  • Dark/light mode (follows system preference)
  • Collapsible tool calls, results, and thinking blocks
  • Toggle checkboxes for tools, thinking, and tags/kinds
  • Rendered markdown tables, code blocks, inline formatting
  • Clickable file paths and URLs
  • Slash commands shown as styled pills
  • Session continuations as expandable dividers

Annotations

  • Select text and press h to highlight
  • Add comments, assign kinds (decision, bug, constraint, todo, question, insight)
  • Tag highlights for organization
  • Three-tier restore: precise (char offsets) > fuzzy (prefix/suffix) > legacy (text search)
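
The tiers degrade gracefully as the underlying text drifts. A simplified sketch (hypothetical shape, not Gloss's actual schema):

```typescript
interface StoredHighlight {
  start: number;  // original character offsets
  end: number;
  quote: string;  // the highlighted text
  prefix: string; // text immediately before the quote
}

// Tier 1: stored offsets still match. Tier 2: relocate the quote via
// its prefix context. Tier 3: plain text search. Null if it's gone.
function restoreOffsets(text: string, h: StoredHighlight): [number, number] | null {
  if (text.slice(h.start, h.end) === h.quote) return [h.start, h.end];
  const ctx = text.indexOf(h.prefix + h.quote);
  if (ctx !== -1) {
    const start = ctx + h.prefix.length;
    return [start, start + h.quote.length];
  }
  const i = text.indexOf(h.quote);
  return i === -1 ? null : [i, i + h.quote.length];
}

// Two chars were inserted before the quote: the precise tier fails,
// and the fuzzy tier relocates "world" to [8, 13].
const moved = restoreOffsets("X hello world foo", {
  start: 6, end: 11, quote: "world", prefix: "hello ",
});
```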

Export formats

| Format | What | Use case |
| --- | --- | --- |
| For Claude | XML `<context_bundle>` with `<highlight>`, `<trigger>`, `<quote>`, `<note>` | Paste into Claude Code to give it context from a previous session |
| Markdown | Numbered list with speaker, timestamp, quoted text, and comments | Documentation, notes, sharing |
| JSONL Slice | Full turn text for annotated exchanges + their conversation partner | Raw material for further processing |
| Download | Raw annotations JSON with all metadata and offsets | Backup, portability |

Static export

Self-contained HTML files with CSS/JS inlined — works via file:// with no server needed.

bun src/cli.ts export <session.jsonl>
bun src/cli.ts export --no-tools --no-thinking <session.jsonl>
bun src/cli.ts export -o output.html <session.jsonl>

CLI

bun src/cli.ts serve                       # Start the server (default)
bun src/cli.ts serve --port 8080           # Custom port
bun src/cli.ts serve --no-embeddings       # Disable vector indexing
bun src/cli.ts export <file>               # Export to self-contained HTML
bun src/cli.ts export -o out.html          # Custom output path
bun src/cli.ts highlights --json           # Query highlights from SQLite
bun src/cli.ts highlights --tags           # List all tags with counts
bun src/cli.ts import                      # Import sidecar .annotations.json files
bun src/cli.ts search-exclude list         # Show excluded project patterns
bun src/cli.ts search-exclude add "foo*"   # Exclude projects matching pattern
bun src/cli.ts search-exclude remove "foo*"

Slash Commands

When working in the Gloss repo, these skills are available:

| Command | Description |
| --- | --- |
| /gloss:convo | Start server or export a conversation |
| /gloss:index | Browse all conversations |
| /gloss:highlights | Pull highlights from the current session |
| /gloss:search | Search highlights across all sessions |
| /gloss:auto-tag | AI-powered auto-tagging of highlights |

MCP Server (Claude Code integration)

Gloss exposes a Model Context Protocol server so Claude Code can search and read past conversations directly during a session.

claude mcp add --transport stdio --scope user gloss -- bun /path/to/gloss/src/mcp-server.ts

Requires the Gloss server running on :3456. The MCP server talks to it over HTTP — no duplicate embedding engine.

Tools:

| Tool | What it does |
| --- | --- |
| search_conversations | Hybrid FTS + vector search across all sessions. Returns sources with relevance scores, matched tokens, and turn ranges. |
| read_conversation | Read turns from a session by ID + range (server-side slicing, max 30 turns/call). |
| get_highlights | Query annotations by session, tag, text search, or recency. Filters compose. |
| list_sessions | Browse sessions by project or recency. |

Search results include RRF relevance scores per source so Claude can weight strong matches over weak ones, plus startTurnIndex/endTurnIndex for precise follow-up reads. Text truncation is semantic-aware — won't break mid-code-block.
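
The fence-aware truncation could be approximated like this (a hypothetical sketch; Gloss's actual rules may differ):

```typescript
// Truncate at the last paragraph break before `limit`; if that cut
// would land inside an open code fence, back up to before the fence
// so a code block is never split mid-way.
function truncateSemantic(text: string, limit: number): string {
  if (text.length <= limit) return text;
  let cut = text.lastIndexOf("\n\n", limit);
  if (cut === -1) cut = limit;
  const fencesBeforeCut = text.slice(0, cut).split("```").length - 1;
  if (fencesBeforeCut % 2 === 1) cut = text.lastIndexOf("```", cut);
  return text.slice(0, cut).trimEnd();
}
```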

Architecture

~/.claude/projects/       JSONL session logs (source of truth)
        |
   discovery.ts           Scans for sessions, extracts metadata from first 32KB
        |
   ~/.convo/db.sqlite     Session index + annotations + embeddings
        |
   server.ts              HTTP routes + WebSocket live updates
        |                  Background: turn counts, FTS indexing, vector indexing
        |
   localhost:3456          Index page, conversation viewer, Ask, annotation API

Key modules:

| File | Role |
| --- | --- |
| src/server.ts | Multi-session HTTP + WebSocket server |
| src/discovery.ts | JSONL scanning, SQLite sync, turn counting |
| src/db.ts | SQLite schema, session/annotation/embedding/FTS CRUD |
| src/cli.ts | CLI entry point (serve, export, highlights, search-exclude) |
| src/ask.ts | Hybrid search pipeline: FTS + vector + RRF fusion + Haiku synthesis |
| src/ask-page.ts | Streaming Ask UI with answer + source cards |
| src/mcp-server.ts | MCP stdio server — bridges Claude Code to the Gloss HTTP API |
| src/embeddings.ts | Embedding engine (subprocess) + in-memory vector index |
| src/indexer.ts | Background embedding backfill with batching and progress logging |
| src/index-page.ts | Server index page with search/filter/grouping |
| src/incremental-parser.ts | Streaming JSONL parser for live updates |
| src/parser.ts | Full JSONL-to-conversation parser |
| src/renderer.ts | Turn-to-HTML renderer |
| src/convert.ts | JSONL-to-HTML pipeline (export path) |
| src/templates/html-template.ts | Dual-mode HTML (server vs inline) |
| src/templates/client-js.ts | Client JS (annotations, WS, exports) |
| src/templates/css.ts | Shared styles |

How Ask works (detailed)

User question
     |
     v
 Per-token FTS queries ──────────┐
 (each keyword searched           |
  individually for coverage)      |
                                  |── RRF fusion (k=60) ──> Top N sessions
 Vector cosine search ───────────┤
 (256-dim Arctic Embed,           |
  "query:" prefix encoding)       |
                                  |
 Metadata LIKE matching ─────────┘
 (project names, titles)
     |
     v
 Load source turns + context windows
     |
     v
 Claude Haiku (-p --model haiku)
 reads evidence, streams answer
 with numbered citations

The -p flag runs the Claude CLI in non-interactive print mode, with the prompt piped in via stdin. Haiku was chosen for synthesis because it's fast (~5-15s) and the retrieval pipeline has already done the hard work of finding relevant content; the LLM just needs to read and summarize.
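
Mechanically, the synthesis step amounts to spawning the CLI with the assembled evidence on stdin. A stand-alone sketch, where `cat` stands in for the real `claude -p --model haiku` invocation so the example runs anywhere:

```typescript
import { spawnSync } from "node:child_process";

// Pipe a prompt to a CLI via stdin and capture its output. In Gloss
// the command would be the Claude CLI; `cat` simply echoes stdin
// back, which is enough to demonstrate the plumbing.
const prompt = "Evidence:\n[1] ...\n\nAnswer the question with citations.";
const result = spawnSync("cat", [], { input: prompt, encoding: "utf8" });
const answer = result.stdout; // equals the prompt when using `cat`
```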

Development

bun install
bun test              # Run tests
bunx tsc --noEmit     # Type check

About

Convert Claude Code JSONL conversation logs to annotated, readable HTML
