A reusable Model Context Protocol server that provides semantic search and a tag-based knowledge graph for any project. Auto-discovers a knowledge directory from cwd; silently disables when absent.
Written in TypeScript. Uses local sentence-transformer embeddings (Xenova/multilingual-e5-small), so it needs no API keys and makes no network calls after the first model download.
- Semantic search: embedding-based natural language queries (multilingual)
- RAG search: tiered results with automatic LLM summarization via MCP sampling
- Tag search with graph traversal: follow `related:` links across fragments
- Markdown fragments with YAML frontmatter: human-readable, git-friendly
- Zero overhead when unused: exits silently if no knowledge is present
- Flexible auto-discovery: co-located, hidden, sibling, or user-global
```bash
npm install -g knowledgebased
# or run on demand:
npx -y knowledgebased setup
```

`setup` registers the server in ~/.copilot/mcp-config.json (or you can configure any MCP client manually). It will:
- Auto-activate in any project where knowledge is discovered
- Stay disabled (zero overhead) elsewhere
Add to your .mcp.json / client config:
```json
{
  "mcpServers": {
    "knowledge": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "knowledgebased"]
    }
  }
}
```

The server discovers knowledge in two independent phases, then unions all results.
Given cwd = ~/workspace/my-project/, here is every location the server checks:
```
~/
├── .knowledgebased.json           ← Phase 2: user-global config (always read)
├── notes/                         ← Phase 2: external KB (declared in bases)
│   └── *.md
│
└── workspace/
    ├── my-project.knowledge/      ← Phase 1 ④: sibling folder
    │   └── *.md
    │
    └── my-project/                ← cwd
        ├── .knowledge.json        ← Phase 1 ①: config pointer (highest priority)
        ├── knowledge/             ← Phase 1 ②: co-located, visible
        │   └── *.md
        ├── .knowledge/            ← Phase 1 ③: co-located, hidden
        │   └── *.md
        └── src/
```
Phase 1 walks up from cwd. At each ancestor directory, it tries four patterns in order; the first match stops the entire walk:
| Priority | Pattern | Within git root | Beyond git root |
|---|---|---|---|
| ① | `.knowledge.json` | ✅ | ✅ (explicit intent) |
| ② | `knowledge/` | ✅ | ❌ (too generic) |
| ③ | `.knowledge/` | ✅ | ❌ (too generic) |
| ④ | `../<project>.knowledge/` | ✅ | ✅ (explicit naming) |
Beyond the git root, only explicitly intentioned patterns (① config pointer and ④ sibling) are checked. If no git root is found at all, generic patterns are never used; only ① and ④ apply. This prevents accidental matches with unrelated knowledge/ directories outside a project context.
Result: 0 or 1 project source (alias: repo, refs validated against cwd).
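The Phase 1 rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the server's actual code: `findProjectSource`, `Match`, and the path helpers are hypothetical names, paths are POSIX-style and absolute, the file system is replaced by an injected `exists` callback, and the walk is assumed to continue up to the filesystem root.

```typescript
// Hypothetical sketch of the Phase 1 walk; all names here are illustrative.
type Match = { pattern: 1 | 2 | 3 | 4; path: string };

// Parent of a POSIX-style absolute path, or null at the root.
function parentOf(dir: string): string | null {
  if (dir === "/") return null;
  const i = dir.lastIndexOf("/");
  return i <= 0 ? "/" : dir.slice(0, i);
}

function baseName(dir: string): string {
  return dir.slice(dir.lastIndexOf("/") + 1);
}

function join(dir: string, name: string): string {
  return dir === "/" ? `/${name}` : `${dir}/${name}`;
}

// `exists` is injected so the sketch does not touch the real file system.
function findProjectSource(
  cwd: string,
  gitRoot: string | null,
  exists: (path: string) => boolean,
): Match | null {
  for (let dir: string | null = cwd; dir !== null; dir = parentOf(dir)) {
    // Generic patterns (2 and 3) are allowed only at or below the git root.
    const insideGit =
      gitRoot !== null &&
      (dir === gitRoot || dir.startsWith(gitRoot === "/" ? "/" : gitRoot + "/"));
    const parent = parentOf(dir);
    const candidates: Array<[1 | 2 | 3 | 4, string | null, boolean]> = [
      [1, join(dir, ".knowledge.json"), true],
      [2, join(dir, "knowledge"), insideGit],
      [3, join(dir, ".knowledge"), insideGit],
      [4, parent === null ? null : join(parent, `${baseName(dir)}.knowledge`), true],
    ];
    for (const [pattern, path, allowed] of candidates) {
      // The first match stops the entire walk.
      if (allowed && path !== null && exists(path)) return { pattern, path };
    }
  }
  return null;
}
```

Passing `gitRoot = null` models the "no git root found" case, in which only patterns 1 and 4 can match.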
Phase 2 always runs, even if Phase 1 found a project source. It reads ~/.knowledgebased.json and matches cwd against repos entries.
Result: 0 to N external sources (alias: base ID, refs unscoped). The results of both phases are unioned and deduplicated by canonical directory hash.
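As a rough illustration of the union step, the sketch below merges one optional project source with any number of external sources and drops duplicates. The names `Source`, `canonicalKey`, and `unionSources` are hypothetical. Two simplifications are assumptions of this sketch: the key is a trimmed path string standing in for the canonical directory hash, and the first occurrence (project source first) wins on a collision.

```typescript
// Hypothetical shapes; the real server's types are not shown in this README.
type Source = { alias: string; dir: string };

// Stand-in for canonicalization plus hashing: strip trailing slashes only.
function canonicalKey(dir: string): string {
  const trimmed = dir.replace(/\/+$/, "");
  return trimmed === "" ? "/" : trimmed;
}

// Union the Phase 1 result (0 or 1 source) with the Phase 2 results (0 to N).
function unionSources(project: Source | null, external: Source[]): Source[] {
  const byKey = new Map<string, Source>();
  for (const source of [...(project ? [project] : []), ...external]) {
    const key = canonicalKey(source.dir);
    // Assumption: keep the first source seen for a given directory.
    if (!byKey.has(key)) byKey.set(key, source);
  }
  return Array.from(byKey.values());
}
```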
~/.knowledgebased.json defines named knowledge bases and binds them to repos:
```json
{
  "bases": {
    "personal": "~/notes",
    "team": { "knowledge": "~/team/conventions", "cacheDir": "~/.cache/team" }
  },
  "repos": {
    "*": ["personal"],
    "~/workspace/my-project": ["team"]
  }
}
```

| Field | Description |
|---|---|
| `bases.<id>` | A string path (shorthand) or `{ "knowledge": "...", "cacheDir": "..." }`. Paths support ~ expansion. |
| `repos."*"` | Wildcard: these bases are active in every project. |
| `repos.<path>` | Array of base IDs to activate when cwd is inside this path. Longest-prefix match wins (segment-boundary, case-insensitive on Windows). |
In the example above:
- `personal` is available everywhere (wildcard `"*"`)
- `team` is only available when working inside `~/workspace/my-project`
- Fragments from external sources are prefixed with their alias: `personal@notes/foo.md`
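The matching rule for repos can be sketched as below. This is a simplified illustration, not the actual implementation: `activeBases` and `isInside` are hypothetical names, paths are assumed to be already `~`-expanded, normalized, and POSIX-style (so the Windows case-insensitive comparison is omitted), and wildcard bases are assumed to be combined with the longest-prefix match, as the example suggests.

```typescript
type Repos = Record<string, string[]>;

// Segment-boundary containment: "/a/b" contains "/a/b/c" but not "/a/bc".
function isInside(cwd: string, root: string): boolean {
  return cwd === root || cwd.startsWith(root.endsWith("/") ? root : root + "/");
}

// Return the base IDs active for cwd: wildcard bases plus the longest matching path entry.
function activeBases(cwd: string, repos: Repos): string[] {
  let best: string | null = null;
  for (const root of Object.keys(repos)) {
    if (root === "*") continue;
    if (isInside(cwd, root) && (best === null || root.length > best.length)) {
      best = root;
    }
  }
  const ids = [...(repos["*"] ?? []), ...(best !== null ? repos[best] ?? [] : [])];
  return Array.from(new Set(ids)); // drop repeated IDs, keep first-seen order
}

const exampleRepos: Repos = {
  "*": ["personal"],
  "/home/u/workspace/my-project": ["team"],
};
console.log(activeBases("/home/u/workspace/my-project/src", exampleRepos));
```

With the example configuration, a cwd inside my-project activates both personal and team, while any other directory activates only personal.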
.knowledge.json points to a knowledge directory that lives elsewhere:
```json
{ "knowledge": "../shared-kb", "cacheDir": "./.cache/embeddings" }
```

| Field | Required | Description |
|---|---|---|
| `knowledge` | optional | Path to the knowledge directory. Resolved relative to the config file. Defaults to `./knowledge`. |
| `cacheDir` | optional | Override for the embedding cache. Defaults to `~/.cache/knowledgebased/<hash>`. |
These conditions cause a loud startup error:
- `repos` references a non-existent base ID
- Base ID is `"*"`, or contains `@`, `/`, or spaces
- Two bases resolve to the same canonical directory
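The three checks above could look roughly like this. It is a sketch under assumptions: `validateConfig`, `GlobalConfig`, and `BaseEntry` are hypothetical names, the error messages are invented for illustration, and directory canonicalization is passed in as a callback rather than resolved against the real file system.

```typescript
type BaseEntry = string | { knowledge: string; cacheDir?: string };
type GlobalConfig = {
  bases: Record<string, BaseEntry>;
  repos: Record<string, string[]>;
};

// Collect every violation instead of throwing on the first one.
function validateConfig(
  config: GlobalConfig,
  canonicalize: (dir: string) => string,
): string[] {
  const errors: string[] = [];
  const seenDirs = new Map<string, string>(); // canonical dir -> first base ID

  for (const [id, entry] of Object.entries(config.bases)) {
    // Rule: base ID must not be "*" or contain "@", "/", or spaces.
    if (id === "*" || /[@\/ ]/.test(id)) errors.push(`invalid base ID "${id}"`);

    // Rule: two bases must not resolve to the same canonical directory.
    const dir = canonicalize(typeof entry === "string" ? entry : entry.knowledge);
    const other = seenDirs.get(dir);
    if (other !== undefined) {
      errors.push(`bases "${other}" and "${id}" resolve to the same directory`);
    } else {
      seenDirs.set(dir, id);
    }
  }

  // Rule: repos may only reference base IDs that exist.
  for (const [repo, ids] of Object.entries(config.repos)) {
    for (const id of ids) {
      if (!Object.prototype.hasOwnProperty.call(config.bases, id)) {
        errors.push(`repos["${repo}"] references unknown base "${id}"`);
      }
    }
  }
  return errors;
}
```

A caller would treat a non-empty result as fatal, matching the loud startup error described above.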
Markdown files with YAML frontmatter:
```markdown
---
tags: [workflow, git]
related: [workflow/branch-naming]
source: session/2026-04-21
verified: false
refs: [src/utils.ts::parseArgs]
---
# Fragment Title

Content goes here...
```

| Tool | Description |
|---|---|
| `search_knowledge` | Tag-based search with graph traversal |
| `search_semantic` | Embedding-based semantic search with similarity scores |
| `search_rag` | Semantic search with automatic LLM summarization via MCP sampling |
| `list_tags` | List all tags with counts |
| `list_sources` | List loaded knowledge sources |
| `add_knowledge` | Create a new fragment |
| `update_knowledge` | Update an existing fragment |
| `delete_knowledge` | Delete a fragment permanently |
| `audit_knowledge` | Validate refs and related links |
| `reload_sources` | Re-discover sources from config |
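For clients that talk to the server without an SDK, a tool invocation is a JSON-RPC `tools/call` request as defined by the Model Context Protocol. The sketch below only builds the request object; `buildToolCall` and `ToolCallRequest` are hypothetical helper names, and the `tags` argument is an assumption about the `search_knowledge` input schema, which this README does not spell out.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
type ToolCallRequest = {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
};

// Hypothetical helper: wrap a tool name and its arguments in a request envelope.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Assumed argument name: `tags`.
const request = buildToolCall(1, "search_knowledge", { tags: ["git"] });
console.log(JSON.stringify(request));
```

In practice an MCP client library would construct and send this message over stdio.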
```
User question
│
├─ "What topics does the KB cover?" → search_semantic (explore)
│     Low threshold, scan fragment titles and scores.
│
├─ "How does X work?" → search_rag (answer)
│     Returns concise summary + references.
│     If key details are missing, follow up with search_knowledge.
│
└─ "Give me everything about Y" → search_knowledge (enumerate)
      tags=["Y"], returns full unabridged content.
```
search_rag combines semantic search with MCP client sampling to deliver concise, query-aware results. Results are split into tiers:
| Tier | Score | Behavior |
|---|---|---|
| direct | ≥ directThreshold (0.85) | Full content returned verbatim |
| related | One-hop graph neighbors of direct hits | Summarized via LLM sampling |
| summarized | ≥ threshold (0.80), < directThreshold | Summarized via LLM sampling |
Every response includes a references table listing all used fragments with their similarity score, tier, and reason for inclusion.
When the MCP client doesn't support sampling, summarized/related fragments fall back to metadata-only output (title, tags, and a content preview).
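The tiering rule can be illustrated with a small classifier. This is a sketch, not the server's code: `assignTiers`, `Scored`, and `Tier` are hypothetical names, `related` is assumed to hold the outgoing related links of each fragment, and tiers are assumed to take precedence in table order (direct, then related, then summarized).

```typescript
type Scored = { id: string; score: number; related: string[] };
type Tier = "direct" | "related" | "summarized";

// Assign each fragment at most one tier; fragments below `threshold`
// that are not neighbors of a direct hit are left out.
function assignTiers(
  fragments: Scored[],
  threshold = 0.8,
  directThreshold = 0.85,
): Map<string, Tier> {
  const tiers = new Map<string, Tier>();

  // Pass 1: direct hits are at or above directThreshold.
  for (const f of fragments) {
    if (f.score >= directThreshold) tiers.set(f.id, "direct");
  }

  // Pass 2: one-hop neighbors of direct hits, unless already direct.
  for (const f of fragments) {
    if (tiers.get(f.id) !== "direct") continue;
    for (const neighbor of f.related) {
      if (!tiers.has(neighbor)) tiers.set(neighbor, "related");
    }
  }

  // Pass 3: remaining fragments at or above threshold are summarized.
  for (const f of fragments) {
    if (!tiers.has(f.id) && f.score >= threshold) tiers.set(f.id, "summarized");
  }
  return tiers;
}
```

Under these assumptions a low-scoring fragment can still appear in the results when a direct hit links to it.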
Parameters:
| Parameter | Default | Description |
|---|---|---|
| `query` | (none) | Natural language search query |
| `threshold` | 0.80 | Minimum similarity score for inclusion |
| `directThreshold` | 0.85 | Score at or above which fragments are returned verbatim |
| `maxTokens` | 500 | Max tokens for the LLM summary |
```bash
knowledgebased setup                          # Register globally in ~/.copilot/mcp-config.json
knowledgebased init                           # Create knowledge/ in cwd
knowledgebased init --knowledge ../other/kb   # Create .knowledge.json pointing elsewhere
```

```bash
npm install
npm run build    # compile TS → dist/
npm test         # run unit tests via node:test + tsx
npm start        # run from compiled output
npm run watch    # incremental rebuild
```

License: MIT