znote-mcp

A Zettelkasten knowledge management server for the Model Context Protocol. Create, link, search, and synthesize atomic notes through Claude and other MCP-compatible clients.

Origin: This project began as a fork of entanglr/zettelkasten-mcp by Peter J. Herrel and has since diverged substantially.

What It Does

znote-mcp gives AI assistants a persistent, structured memory. Instead of losing context between conversations, your notes accumulate into a knowledge network where ideas connect, patterns emerge, and past work informs future thinking.

You write notes. The system links them, indexes them, and makes them searchable — by keyword, by concept, or by meaning. Over time, a few hundred notes become a knowledge graph that surfaces connections you wouldn't have found on your own.

Quick Start

# Install
git clone https://github.com/AlexK-Notable/znote-mcp.git && cd znote-mcp
uv venv && source .venv/bin/activate
uv sync

# Run
python -m znote_mcp.main

# Optional: semantic search (find notes by meaning, not just keywords)
uv pip install -e ".[semantic]"       # CPU
uv pip install -e ".[semantic-gpu]"   # NVIDIA GPU

Add to Claude Desktop:

{
  "mcpServers": {
    "znote": {
      "command": "/path/to/znote-mcp/.venv/bin/python",
      "args": ["-m", "znote_mcp.main"],
      "env": {
        "ZETTELKASTEN_NOTES_DIR": "~/.zettelkasten/notes",
        "ZETTELKASTEN_DATABASE_PATH": "~/.zettelkasten/db/zettelkasten.db"
      }
    }
  }
}

Configuration lives in ~/.zettelkasten/.env (survives updates) or <project>/.env. See .env.example for all options.

The Zettelkasten Method

The system follows the Zettelkasten method developed by Niklas Luhmann, who used it to produce over 70 books and hundreds of articles. Three principles:

  1. Atomicity — each note holds one idea
  2. Connectivity — notes link to each other with typed relationships
  3. Emergence — the network reveals patterns the individual notes don't

The result is both vertical depth (following a thread deeper into a topic) and horizontal breadth (discovering unexpected connections across domains). Luhmann called his system a "communication partner" — this is a digital implementation of that idea.


Features

17 MCP Tools

The server exposes 17 tools, all prefixed with zk_:

Notes: zk_create_note, zk_get_note, zk_update_note, zk_delete_note, zk_note_history, zk_bulk_create_notes

Links: zk_manage_links (create/remove with semantic types)

Search: zk_search_notes (auto-selects strategy), zk_fts_search (FTS5 with boolean/phrase/prefix), zk_list_notes (by date, project, connectivity), zk_find_related (linked, similar, or semantic)

Tags: zk_manage_tags (add/remove, batch-capable), zk_cleanup_tags

Projects: zk_manage_projects (create/list/get/delete)

System: zk_status (dashboard), zk_system (rebuild, sync, backup, reindex), zk_restore

Note Types and Link Types

Five note types capture different stages of thought:

| Type | Purpose |
| --- | --- |
| fleeting | Quick captures: ideas, observations, things to revisit |
| literature | Notes from reading material |
| permanent | Refined, evergreen notes; the core of the system |
| structure | Index or outline notes that organize others |
| hub | Entry points to major topics |

Seven link types with semantic inverses create a multi-dimensional knowledge graph:

| Link | Inverse | Meaning |
| --- | --- | --- |
| reference | reference | Related information (symmetric) |
| extends | extended_by | Builds on concepts from another note |
| refines | refined_by | Clarifies or improves another note |
| contradicts | contradicted_by | Presents opposing views |
| questions | questioned_by | Poses questions about another note |
| supports | supported_by | Provides evidence for another note |
| related | related | Generic relationship (symmetric) |
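
The inverse mapping in the table can be sketched as a small lookup. This is an illustration only: `LINK_INVERSES`, `inverse_of`, and `link_pair` are hypothetical names, not taken from the znote-mcp source.

```python
# Hypothetical sketch: link types and their inverses, as listed in the table above.
LINK_INVERSES = {
    "reference": "reference",          # symmetric
    "extends": "extended_by",
    "refines": "refined_by",
    "contradicts": "contradicted_by",
    "questions": "questioned_by",
    "supports": "supported_by",
    "related": "related",              # symmetric
}


def inverse_of(link_type: str) -> str:
    """Return the link type as seen from the target note's side."""
    try:
        return LINK_INVERSES[link_type]
    except KeyError:
        raise ValueError(f"unknown link type: {link_type!r}") from None


def link_pair(source_id: str, target_id: str, link_type: str):
    """Build the forward edge and its inverse edge for one typed link."""
    forward = (source_id, target_id, link_type)
    backward = (target_id, source_id, inverse_of(link_type))
    return forward, backward
```

Storing both directions is one possible design; it lets a note list its incoming links without scanning the whole graph.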

Semantic Search

Beyond keyword matching, semantic search finds notes by meaning. A search for "distributed consensus algorithms" will surface your notes about Raft, Paxos, and Byzantine fault tolerance, even if the words in the query never appear in those notes.

How it works:

  1. Notes are embedded into 768-dimensional vectors using a local ONNX model
  2. Long notes are split into overlapping chunks, each embedded separately
  3. Queries are embedded and matched against note vectors via cosine similarity (sqlite-vec)
  4. A cross-encoder reranker rescores the top candidates for higher precision
  5. Multi-chunk notes are deduplicated, keeping the best-matching chunk
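
Steps 3 and 5 above can be sketched in plain Python. This is a simplified illustration, not the server's code: it scores every chunk by brute force instead of using sqlite-vec, it skips the reranking step, and the function names are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    limit: int = 5,
) -> List[Tuple[str, float]]:
    """chunks holds (note_id, vector) pairs; a long note contributes several."""
    best: Dict[str, float] = {}
    for note_id, vec in chunks:
        score = cosine_similarity(query_vec, vec)
        # Deduplicate multi-chunk notes by keeping the best-matching chunk.
        if score > best.get(note_id, float("-inf")):
            best[note_id] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]
```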

Everything runs locally. No API calls, no data leaves your machine.

When the [semantic] extra is installed, embeddings auto-enable on startup and models are downloaded in the background on first run.

Hardware-Aware Auto-Tuning

The server detects your hardware on startup and configures itself accordingly. GPU memory, system RAM, and CPU architecture determine batch sizes, token limits, and memory budgets. You don't need to tune anything — but if you want to, every auto-detected value can be overridden with an environment variable.

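| Tier | Batch Size | Embed Tokens | Rerank Tokens | Memory Budget |
| --- | --- | --- | --- | --- |
| GPU 16GB+ | 64 | 8192 | 8192 | 10 GB |
| GPU 8GB+ | 32 | 4096 | 4096 | 6 GB |
| GPU small | 16 | 2048 | 2048 | 3 GB |
| CPU 32GB+ | 16 | 8192 | 4096 | 8 GB |
| CPU 16GB+ | 8 | 4096 | 2048 | 4 GB |
| CPU 8GB+ | 4 | 2048 | 1024 | 2 GB |
| CPU small | 2 | 512 | 512 | 1 GB |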

The detected tier is logged at startup (e.g., Hardware auto-tune: gpu-8gb+ (NVIDIA RTX 4070)).
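
Tier selection can be pictured as a threshold lookup over the table above. The sketch below is an assumption about how such a lookup might work, not the contents of hardware.py: the thresholds are treated as inclusive, and every tier name other than `gpu-8gb+` (which appears in the log example) is made up to match that pattern.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    batch_size: int
    embed_tokens: int
    rerank_tokens: int
    memory_budget_gb: float


# (minimum GB, tier) pairs, highest threshold first. Values copied from the table.
GPU_TIERS = [
    (16.0, Tier("gpu-16gb+", 64, 8192, 8192, 10.0)),
    (8.0, Tier("gpu-8gb+", 32, 4096, 4096, 6.0)),
    (0.0, Tier("gpu-small", 16, 2048, 2048, 3.0)),
]
CPU_TIERS = [
    (32.0, Tier("cpu-32gb+", 16, 8192, 4096, 8.0)),
    (16.0, Tier("cpu-16gb+", 8, 4096, 2048, 4.0)),
    (8.0, Tier("cpu-8gb+", 4, 2048, 1024, 2.0)),
    (0.0, Tier("cpu-small", 2, 512, 512, 1.0)),
]


def pick_tier(ram_gb: float, vram_gb: Optional[float] = None) -> Tier:
    """Use GPU tiers when a GPU was detected (vram_gb given), else CPU tiers."""
    if vram_gb is not None:
        tiers, available = GPU_TIERS, vram_gb
    else:
        tiers, available = CPU_TIERS, ram_gb
    for threshold, tier in tiers:
        if available >= threshold:
            return tier
    return tiers[-1][1]  # fallback for nonsensical negative inputs
```

For example, `pick_tier(ram_gb=64, vram_gb=12)` returns the `gpu-8gb+` tier under these assumptions.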

Why These Models

The default embedding and reranker models weren't chosen arbitrarily — they were selected through benchmarking on a real 961-note Zettelkasten knowledge base.

Embedding model: Alibaba-NLP/gte-modernbert-base — selected after evaluating 9 models across 12 configurations (FP32 and INT8, varying chunk sizes). Quality was measured by link prediction MRR and tag coherence; performance by throughput and memory usage. gte-modernbert-base offers the best quality-to-performance ratio with native long-context support (8192 tokens). Models evaluated: all-MiniLM-L6-v2, bge-small-en-v1.5, bge-base-en-v1.5, gte-modernbert-base, nomic-embed-text-v1.5, snowflake-arctic-embed-m-v2.0, snowflake-arctic-embed-l-v2.0, mxbai-embed-large-v1, and Alibaba-NLP/gte-embedding-gemma-300m.

Reranker model: Alibaba-NLP/gte-reranker-modernbert-base — selected after evaluating 10 cross-encoder models across semantic challenge categories (synonym matching, conceptual relationships, negation handling, specificity ranking). It's the only model that consistently improves retrieval quality across all categories while supporting long contexts. Models evaluated: ms-marco-MiniLM-L-6-v2, ms-marco-MiniLM-L-12-v2, bge-reranker-base, bge-reranker-large, gte-reranker-modernbert-base, bge-reranker-v2-m3, jina-reranker-v2-base-multilingual, stsb-distilroberta-base, stsb-roberta-base, and stsb-roberta-large.

Full benchmark methodology and results are in docs/MODEL_SELECTION.md. Raw data lives in benchmarks/. Reproduction scripts are in scripts/.

Dual Storage

Notes are stored as plain Markdown files with YAML frontmatter. That's the source of truth — you can edit them in any text editor, version them with Git, back them up however you like. They're just files.

SQLite sits alongside as an indexing layer: FTS5 for full-text search, sqlite-vec for vector search, and standard tables for the link graph. The database is disposable — delete it and it rebuilds from your Markdown files.
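
To illustrate why the database is disposable, here is a minimal sketch of rebuilding an index from the Markdown files. It is not the project's rebuild code: it handles only flat `key: value` frontmatter rather than full YAML, uses an ordinary table instead of FTS5, and the `id` and `title` field names are assumptions.

```python
import sqlite3
from pathlib import Path
from typing import Dict, Tuple


def parse_note(text: str) -> Tuple[Dict[str, str], str]:
    """Split a note into (metadata, body). Flat `key: value` frontmatter only."""
    meta: Dict[str, str] = {}
    if text.startswith("---\n"):
        header, sep, body = text[4:].partition("\n---\n")
        if sep:
            for line in header.splitlines():
                key, colon, value = line.partition(":")
                if colon:
                    meta[key.strip()] = value.strip()
            return meta, body
    return meta, text


def rebuild_index(notes_dir: Path) -> sqlite3.Connection:
    """Recreate a throwaway index from the Markdown files on disk."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE notes (id TEXT PRIMARY KEY, title TEXT, body TEXT)")
    for path in sorted(notes_dir.glob("*.md")):
        meta, body = parse_note(path.read_text(encoding="utf-8"))
        db.execute(
            "INSERT INTO notes VALUES (?, ?, ?)",
            (meta.get("id", path.stem), meta.get("title", ""), body.strip()),
        )
    db.commit()
    return db
```

Because the files are the source of truth, nothing in the index needs to be preserved across a rebuild.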

Multi-Process Safety

Multiple Claude Code instances (or other MCP clients) can safely work with the same notes simultaneously:

  • Each process gets its own in-memory SQLite database — no lock contention
  • Git commits on every write provide version hashes for optimistic concurrency control
  • Pass expected_version on updates to detect conflicts instead of silently overwriting
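
The expected_version check can be sketched with a toy in-memory store. Everything here is hypothetical except the `expected_version` parameter name: a content hash stands in for the Git commit hash, and `NoteStore` and `VersionConflict` are illustrative names.

```python
import hashlib
from typing import Dict, Optional


class VersionConflict(Exception):
    """Raised when the stored version differs from the one the caller expected."""


class NoteStore:
    """Toy store; a short content hash plays the role of the Git version hash."""

    def __init__(self) -> None:
        self._content: Dict[str, str] = {}

    def version(self, note_id: str) -> str:
        data = self._content[note_id].encode("utf-8")
        return hashlib.sha1(data).hexdigest()[:7]

    def create(self, note_id: str, content: str) -> str:
        self._content[note_id] = content
        return self.version(note_id)

    def update(
        self, note_id: str, content: str, expected_version: Optional[str] = None
    ) -> str:
        current = self.version(note_id)
        # Only check when the caller opted in by passing a version.
        if expected_version is not None and expected_version != current:
            raise VersionConflict(f"expected {expected_version}, found {current}")
        self._content[note_id] = content
        return self.version(note_id)
```

A client that read version `v1`, and then tries to update after another process has already written, gets a `VersionConflict` instead of silently overwriting the newer content.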

Projects and Organization

Notes are organized into projects (with hierarchical sub-projects via /). Project moves are batch-capable. Obsidian vault mirroring copies notes into a vault directory for browsing in Obsidian, with wikilinks rewritten to match.


Configuration Reference

Create ~/.zettelkasten/.env (recommended) or <project>/.env. Priority: process env > project .env > ~/.zettelkasten/.env > defaults.
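
The priority order can be read as a layered merge in which later layers win. The sketch below uses only the standard library and is an assumption about the behavior, not the project's config.py (which, per the tech stack, uses Pydantic and python-dotenv); `parse_env_text` and `resolve` are hypothetical helpers.

```python
from typing import Dict, Mapping

# Two defaults taken from the Core table; the real list is longer.
DEFAULTS = {"ZETTELKASTEN_LOG_LEVEL": "INFO", "ZETTELKASTEN_GIT_ENABLED": "true"}


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve(
    process_env: Mapping[str, str],
    project_env: Mapping[str, str],
    home_env: Mapping[str, str],
    defaults: Mapping[str, str] = DEFAULTS,
) -> Dict[str, str]:
    """Merge from lowest to highest priority so later layers override earlier ones."""
    merged = dict(defaults)
    for layer in (home_env, project_env, process_env):
        merged.update(layer)
    return merged
```

With `ZETTELKASTEN_LOG_LEVEL=DEBUG` in the home file and `WARNING` in the project file, the merged value is `WARNING` unless the process environment sets it as well.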

Core

| Variable | Default | Description |
| --- | --- | --- |
| ZETTELKASTEN_NOTES_DIR | ~/.zettelkasten/notes | Markdown file storage |
| ZETTELKASTEN_DATABASE_PATH | ~/.zettelkasten/db/zettelkasten.db | SQLite database path |
| ZETTELKASTEN_LOG_LEVEL | INFO | Logging level |
| ZETTELKASTEN_GIT_ENABLED | true | Git versioning for conflict detection |
| ZETTELKASTEN_IN_MEMORY_DB | true | Per-process in-memory SQLite |
| ZETTELKASTEN_OBSIDIAN_VAULT | (unset) | Obsidian vault path for mirroring |

Semantic Search

Requires pip install znote-mcp[semantic] (CPU) or znote-mcp[semantic-gpu] (NVIDIA CUDA 12.x). These are mutually exclusive — don't install both.

| Variable | Default | Description |
| --- | --- | --- |
| ZETTELKASTEN_EMBEDDINGS_ENABLED | false | Auto-enables when deps installed; set false to force off |
| ZETTELKASTEN_EMBEDDING_MODEL | Alibaba-NLP/gte-modernbert-base | Embedding model (HuggingFace ID) |
| ZETTELKASTEN_RERANKER_MODEL | Alibaba-NLP/gte-reranker-modernbert-base | Reranker model |
| ZETTELKASTEN_EMBEDDING_DIM | 768 | Must match model output dimension |
| ZETTELKASTEN_EMBEDDING_MAX_TOKENS | 2048 | Max tokens per embedding input (auto-tuned) |
| ZETTELKASTEN_RERANKER_MAX_TOKENS | 2048 | Max tokens for reranker input (auto-tuned) |
| ZETTELKASTEN_EMBEDDING_BATCH_SIZE | 8 | Batch size for reindex (auto-tuned) |
| ZETTELKASTEN_EMBEDDING_CHUNK_SIZE | 2048 | Token threshold for splitting long notes |
| ZETTELKASTEN_EMBEDDING_CHUNK_OVERLAP | 256 | Overlap tokens between chunks |
| ZETTELKASTEN_EMBEDDING_MEMORY_BUDGET_GB | 6.0 | Memory budget for adaptive batching (auto-tuned) |
| ZETTELKASTEN_ONNX_PROVIDERS | auto | auto, cpu, or explicit provider list |
| ZETTELKASTEN_ONNX_QUANTIZED | false | INT8 quantized models (~4x smaller, ~97% quality) |
| ZETTELKASTEN_EMBEDDING_CACHE_DIR | HF default | Custom model cache directory |
| ZETTELKASTEN_RERANKER_IDLE_TIMEOUT | 600 | Seconds before idle reranker unloads (0 = never) |

See .env.example for full documentation including memory usage guidance.
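
To show how the chunk size and chunk overlap settings interact, here is a minimal sliding-window sketch. It is a simplification under stated assumptions: tokens are passed in as a plain list, and sentence boundaries are ignored, whereas the project's text_chunker.py is described as sentence-aware. The function name is hypothetical.

```python
from typing import List


def chunk_tokens(
    tokens: List[str], chunk_size: int = 2048, overlap: int = 256
) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if len(tokens) <= chunk_size:
        return [tokens]  # at or below the threshold: no splitting
    step = chunk_size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end
    return chunks
```

With the defaults, each new chunk starts 1792 tokens after the previous one, so the final 256 tokens of one chunk are repeated at the start of the next.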

Prompting

System prompts, project knowledge, and chat prompts are provided in the docs/ directory:

System prompts: system-prompt.md, system-prompt-with-protocol.md

Project knowledge: zettelkasten-methodology-technical.md, link-types-in-zettelkasten-mcp-server.md

Chat prompts: knowledge-creation.md, knowledge-creation-batch.md, knowledge-exploration.md, knowledge-synthesis.md

Developer docs: Example MCP server, MCP Python SDK


Architecture

Layer Structure

MCP Tools (server/mcp_server.py) ─── 17 registered tools
    │
Services ─── business logic
    ├── zettel_service.py ─── CRUD, links, tags, bulk ops, embedding on write
    ├── search_service.py ─── text, FTS5, semantic, graph traversal
    └── embedding_service.py ─── thread-safe embedding/reranking lifecycle
    │
Repositories ─── storage abstraction
    ├── note_repository.py ─── dual storage (markdown + SQLite + vectors)
    ├── tag_repository.py ─── tag CRUD and batch ops
    ├── link_repository.py ─── semantic link graph
    ├── project_repository.py ─── hierarchical project registry
    └── fts_index.py ─── FTS5 full-text index
    │
Infrastructure
    ├── hardware.py ─── GPU/RAM detection, 7-tier auto-tuning
    ├── onnx_providers.py ─── ONNX Runtime embedding + reranker
    ├── text_chunker.py ─── token-aware chunking with sentence boundaries
    ├── git_wrapper.py ─── git versioning for concurrency
    ├── obsidian_mirror.py ─── vault sync with wikilink rewriting
    ├── resilience_coordinator.py ─── AIMD + circuit breaker orchestration per component
    ├── aimd.py ─── AIMD adaptive memory budget controller
    ├── circuit_breaker.py ─── GPU↔CPU switching with cooldown escalation
    └── setup_manager.py ─── auto-install deps, model warmup

Project Structure

znote-mcp/
├── src/znote_mcp/
│   ├── models/                  # Pydantic schemas + SQLAlchemy ORM
│   ├── storage/                 # Repository layer
│   │   ├── base.py                  # Abstract repository interface
│   │   ├── note_repository.py       # Markdown + SQLite + sqlite-vec
│   │   ├── tag_repository.py        # Tag operations
│   │   ├── link_repository.py       # Link graph
│   │   ├── project_repository.py    # Hierarchical projects
│   │   ├── fts_index.py             # FTS5 index
│   │   ├── git_wrapper.py           # Git versioning
│   │   ├── obsidian_mirror.py       # Obsidian sync
│   │   └── markdown_parser.py       # Frontmatter parsing
│   ├── services/                # Business logic
│   │   ├── zettel_service.py        # Core CRUD + embedding on write
│   │   ├── search_service.py        # All search strategies
│   │   ├── embedding_service.py     # Embedding lifecycle
│   │   ├── embedding_types.py       # Protocol interfaces (PEP 544)
│   │   ├── onnx_providers.py        # ONNX Runtime providers
│   │   ├── resilience_coordinator.py # AIMD + circuit breaker orchestration
│   │   ├── aimd.py                  # AIMD adaptive memory budget controller
│   │   ├── circuit_breaker.py       # GPU↔CPU switching with cooldowns
│   │   └── text_chunker.py          # Token-aware chunking
│   ├── server/mcp_server.py     # MCP server (17 tools)
│   ├── config.py                # Pydantic config with env var support
│   ├── hardware.py              # Hardware detection + auto-tuning
│   ├── setup_manager.py         # Semantic dep auto-install + model warmup
│   ├── backup.py                # Snapshot backup/restore
│   ├── observability.py         # Structured logging and metrics
│   ├── exceptions.py            # Error hierarchy with codes
│   └── main.py                  # Entry point
├── tests/                       # 40 test files, 1030 tests
├── benchmarks/                  # Embedding + reranker benchmark data
├── scripts/                     # Benchmark and utility scripts
├── alembic/                     # Database migrations
├── docs/                        # Prompts, design docs, model selection guide
└── .env.example                 # Full configuration reference

Tech Stack

| Component | Technology | Why |
| --- | --- | --- |
| Protocol | MCP (Python SDK) | Standard for AI tool integration |
| Database | SQLite (WAL mode) + sqlite-vec | Zero-config, embedded, vector-capable |
| Search | FTS5 + sqlite-vec KNN | Full-text and semantic in one database |
| Embeddings | ONNX Runtime | Local inference, no API dependency |
| Embedding model | gte-modernbert-base (768-dim) | Best quality/performance ratio (benchmarked) |
| Reranker | gte-reranker-modernbert-base | Only model that improves all categories (benchmarked) |
| Tokenizer | HuggingFace tokenizers | Fast, model-aligned tokenization |
| Config | Pydantic + python-dotenv | Typed config with env var support |
| ORM | SQLAlchemy 2.0 | Database abstraction |
| Migrations | Alembic | Schema versioning |
| Versioning | Git (subprocess) | Optimistic concurrency control |
| Python | 3.10+ | Type hints, structural pattern matching |

Embedding Interfaces

Embedding and reranker providers use typing.Protocol (PEP 544) for structural subtyping. Any object with the right method signatures works — no base class inheritance required. This makes testing straightforward: swap in a fake provider that returns deterministic vectors.
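
A minimal sketch of the idea, assuming a single `embed` method. The protocol and class names here are hypothetical; the real interfaces live in embedding_types.py and may use different method names and signatures.

```python
from typing import List, Protocol, Sequence, runtime_checkable


@runtime_checkable
class EmbeddingProvider(Protocol):
    """Hypothetical interface: anything with a matching embed() qualifies."""

    def embed(self, texts: Sequence[str]) -> List[List[float]]: ...


class FakeEmbedder:
    """Deterministic test double. Note that it does not inherit from the protocol."""

    def __init__(self, dim: int = 4) -> None:
        self.dim = dim

    def embed(self, texts: Sequence[str]) -> List[List[float]]:
        # Vectors depend only on text length, so repeated calls always agree.
        return [[float((len(t) + i) % 7) for i in range(self.dim)] for t in texts]


def index_notes(provider: EmbeddingProvider, notes: Sequence[str]) -> int:
    """Embed notes with whatever provider was passed in; return the vector count."""
    vectors = provider.embed(notes)
    return len(vectors)
```

A type checker accepts `index_notes(FakeEmbedder(), notes)` because the method signatures line up, which is the structural subtyping the paragraph above describes.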

Adaptive Batching

Rather than fixed batch sizes, the embedding system uses greedy adaptive batching. Given a memory budget and the actual token lengths of pending texts, it packs as many items as possible into each batch without exceeding the budget. Short notes get large batches (fast); long notes get small batches (safe). This replaces the earlier fixed-bucket approach and improves reindex throughput significantly on mixed-length corpora.
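
One way to picture greedy packing is shown below. The cost model is an assumption made for the sketch (batch cost equals item count times the longest item, as with padded batches, measured in tokens rather than bytes), and `greedy_batches` is a hypothetical name; the server's actual memory estimate is not documented here.

```python
from typing import List


def greedy_batches(token_lengths: List[int], budget_tokens: int) -> List[List[int]]:
    """Return batches of item indices whose padded size fits the budget.

    Assumed cost of a batch: number of items * longest item in the batch.
    An item that exceeds the budget on its own still gets a batch to itself.
    """
    # Sort by length so short texts share batches and padding waste stays low.
    order = sorted(range(len(token_lengths)), key=lambda i: token_lengths[i])
    batches: List[List[int]] = []
    current: List[int] = []
    for idx in order:
        longest = token_lengths[idx]  # ascending order: the newest item is longest
        if current and (len(current) + 1) * longest > budget_tokens:
            batches.append(current)
            current = []
        current.append(idx)
    if current:
        batches.append(current)
    return batches
```

With lengths `[100, 100, 100, 100, 4000, 3000]` and a budget of 4096, the four short texts share one batch while each long text runs alone.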

AIMD Adaptive Resilience

When ONNX embedding encounters memory pressure (OOM errors, allocation failures), the server adapts automatically using an AIMD (Additive Increase / Multiplicative Decrease) controller — the same algorithm TCP uses for congestion control, applied to GPU memory budgets.

How it works:

  • On success: memory budget increases linearly (additive increase), probing toward full capacity
  • On failure: memory budget halves immediately (multiplicative decrease), reducing pressure fast
  • Circuit breaker: if failures persist or budget hits the 1GB floor, the circuit breaker trips and switches to CPU. After a cooldown, it retries GPU
  • Cross-component linking: stress on the embedder sends a caution signal to the reranker (and vice versa), preventing cascading failures
  • Automatic recovery: unlike the old fixed staircase, AIMD recovers to full capacity when conditions improve — no server restart needed
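
The increase and decrease rules above can be sketched as a small controller. This is an illustration under assumptions, not the contents of aimd.py: the 0.25 GB step and the class name are invented, the 6.0 GB ceiling is borrowed from the default memory budget, and cooldowns and cross-component signals are left out.

```python
class AIMDBudget:
    """Additive increase, multiplicative decrease for a memory budget in GB."""

    def __init__(
        self,
        max_gb: float = 6.0,
        floor_gb: float = 1.0,
        step_gb: float = 0.25,   # assumed step size
        decrease: float = 0.5,   # halve on failure, as described above
    ) -> None:
        self.max_gb = max_gb
        self.floor_gb = floor_gb
        self.step_gb = step_gb
        self.decrease = decrease
        self.budget_gb = max_gb
        self.tripped = False  # True once the budget has hit the floor

    def on_success(self) -> float:
        """Probe upward linearly, never past the ceiling."""
        self.budget_gb = min(self.max_gb, self.budget_gb + self.step_gb)
        return self.budget_gb

    def on_failure(self) -> float:
        """Back off fast; flag the breaker when the floor is reached."""
        self.budget_gb *= self.decrease
        if self.budget_gb <= self.floor_gb:
            self.budget_gb = self.floor_gb
            self.tripped = True  # a caller would switch to CPU here
        return self.budget_gb
```

Starting from 6.0 GB, two consecutive failures bring the budget to 1.5 GB, and a third hits the 1 GB floor and sets the flag.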

Agent signaling: state transitions are prepended as inline notices to tool responses, so agents know what's happening:

---
⚠ Embedding system state change
  Event: memory pressure — budget reduced to 1.50GB
  Component: embedder
  Semantic search: operational, may be slower
---

Agent controls via zk_system:

| Action | Effect |
| --- | --- |
| embedding_reset | Reset AIMD + circuit breakers to initial state |
| embedding_force_cpu | Force CPU mode (stays until reset) |
| embedding_disable | Disable semantic search for session |
| embedding_enable | Re-enable after manual disable |

Embedder and reranker are tracked independently. zk_status reports AIMD phase, budget, circuit breaker state, and provider for each component.

INT8 Quantization

Quantized ONNX models are supported for both embedding and reranking. INT8 models are roughly 4x smaller (143MB vs 569MB) with approximately 97% quality retention. The performance impact is hardware-dependent: faster on some platforms, slower on others. Set ZETTELKASTEN_ONNX_QUANTIZED=true to enable. The server falls back to FP32 if the quantized model files aren't found.
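
The fallback can be sketched as a file lookup. The file names `model.onnx` and `model_int8.onnx` and the function name are placeholders chosen for this example; the actual layout of the model cache may differ.

```python
from pathlib import Path


def select_model_file(
    model_dir: Path,
    quantized: bool,
    fp32_name: str = "model.onnx",        # placeholder file name
    int8_name: str = "model_int8.onnx",   # placeholder file name
) -> Path:
    """Prefer the INT8 file when requested and present; otherwise use FP32."""
    if quantized:
        candidate = model_dir / int8_name
        if candidate.is_file():
            return candidate
    return model_dir / fp32_name
```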

Tests

1030 tests across 40 files covering unit, integration, E2E, protocol, embedding, concurrency, and resilience:

uv run pytest -v tests/                                    # All tests
uv run pytest --cov=znote_mcp --cov-report=term-missing    # With coverage
uv run pytest tests/test_e2e.py -v                         # E2E (isolated)
uv run pytest tests/test_mcp_protocol.py -v                # MCP protocol (34 tests, no mocking)

| Category | What it covers |
| --- | --- |
| Unit | Models, repositories, services, config, hardware detection |
| Integration | Cross-layer, MCP tool wiring, protocol round-trips |
| E2E | Full system workflows in isolated environments |
| Embeddings | Provider, service, chunking, search, reindex (5 phased files) |
| Concurrency | Thread safety, multi-process, versioned operations |
| Resilience | AIMD controller, circuit breaker, coordinator, error injection, OOM recovery, inline notices, agent controls |

Important Notice

This software is experimental and provided as-is without warranty of any kind. While efforts have been made to ensure data integrity, it may contain bugs that could potentially lead to data loss or corruption. Always back up your notes regularly.

Acknowledgments

  • Peter J. Herrel (@diggy) and Entanglr — original creators of zettelkasten-mcp
  • Alibaba NLP — gte-modernbert-base and gte-reranker-modernbert-base models (Apache-2.0)
  • This MCP server was built with the assistance of Claude, who helped organize the atomic thoughts of this project into a coherent knowledge graph.

License

MIT License — see LICENSE for details.
