A secure, vector-based memory server for Claude Desktop using `sqlite-vec` and `sentence-transformers`. This MCP server provides persistent semantic memory capabilities that enhance AI coding assistants by remembering and retrieving relevant coding experiences, solutions, and knowledge.
- 🔍 Semantic Search: Vector-based similarity search using 384-dimensional embeddings
- 💾 Persistent Storage: SQLite database with vector indexing via `sqlite-vec`
- 🏷️ Smart Organization: Categories and tags for better memory organization
- 🔒 Security First: Input validation, path sanitization, and resource limits
- ⚡ High Performance: Fast embedding generation with `sentence-transformers`
- 🧹 Auto-Cleanup: Intelligent memory management and cleanup tools
- 📊 Rich Statistics: Comprehensive memory database analytics
- 🔄 Automatic Deduplication: SHA-256 content hashing prevents storing duplicate memories
- 📈 Access Tracking: Monitors memory usage with access counts and timestamps for optimization
- 🧠 Smart Cleanup Algorithm: Prioritizes memory retention based on recency, access patterns, and importance
| Component | Technology | Purpose |
|---|---|---|
| Vector DB | sqlite-vec | Vector storage and similarity search |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 | 384D text embeddings |
| MCP Framework | FastMCP | High-level tools-only server |
| Dependencies | uv script headers | Self-contained deployment |
| Security | Custom validation | Path/input sanitization |
| Testing | pytest + coverage | Comprehensive test suite |
```
vector-memory-mcp/
├── main.py                              # Main MCP server entry point
├── README.md                            # This documentation
├── requirements.txt                     # Python dependencies
├── pyproject.toml                       # Modern Python project config
├── .python-version                      # Python version specification
├── claude-desktop-config.example.json   # Claude Desktop config example
│
├── src/                                 # Core package modules
│   ├── __init__.py                      # Package initialization
│   ├── models.py                        # Data models & configuration
│   ├── security.py                      # Security validation & sanitization
│   ├── embeddings.py                    # Sentence-transformers wrapper
│   └── memory_store.py                  # SQLite-vec operations
│
└── .gitignore                           # Git exclusions
```
This project is organized for clarity and ease of use:

- `main.py`: Start here! Main server entry point
- `src/`: Core implementation (security, embeddings, memory store)
- `claude-desktop-config.example.json`: Configuration template

New here? Start with `main.py` and `claude-desktop-config.example.json`.
- Python 3.10 or higher (recommended: 3.11)
- uv package manager
- Claude Desktop app
Installing uv (if not already installed):

macOS and Linux:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Verify installation:

```bash
uv --version
```
1. Clone the project:

   ```bash
   git clone <repository-url>
   cd vector-memory-mcp
   ```

2. Install dependencies (automatic with uv): dependencies are managed via inline metadata in `main.py`, so no manual installation is needed. To verify them:

   ```bash
   uv pip list
   ```

3. Test the server:

   ```bash
   # Test with a sample working directory
   uv run main.py --working-dir ./test-memory
   ```

4. Configure Claude Desktop. Copy the example configuration:

   ```bash
   cp claude-desktop-config.example.json ~/path/to/your/config/
   ```

   Then open Claude Desktop Settings → Developer → Edit Config and add the following (replace the paths with absolute paths):

   ```json
   {
     "mcpServers": {
       "vector-memory": {
         "command": "uv",
         "args": [
           "run",
           "/absolute/path/to/vector-memory-mcp/main.py",
           "--working-dir",
           "/your/project/path"
         ]
       }
     }
   }
   ```

   Important: use absolute paths, not relative paths.

5. Restart Claude Desktop and look for the MCP integration icon.
Store coding experiences, solutions, and insights:

```
Please store this memory:
Content: "Fixed React useEffect infinite loop by adding dependency array with [userId, apiKey]. The issue was that the effect was recreating the API call function on every render."
Category: bug-fix
Tags: ["react", "useEffect", "infinite-loop", "hooks"]
```

Find relevant memories using natural language:

```
Search for: "React hook dependency issues"
```

See what you've stored recently:

```
Show me my 10 most recent memories
```

View memory database statistics:

```
Show memory database statistics
```

Clean up old, unused memories:

```
Clear memories older than 30 days, keep max 1000 total
```
| Category | Use Cases |
|---|---|
| code-solution | Working code snippets, implementations |
| bug-fix | Bug fixes and debugging approaches |
| architecture | System design decisions and patterns |
| learning | New concepts, tutorials, insights |
| tool-usage | Tool configurations, CLI commands |
| debugging | Debugging techniques and discoveries |
| performance | Optimization strategies and results |
| security | Security considerations and fixes |
| other | Everything else |
The server requires a working directory to be specified:

```bash
# Run with uv (recommended)
uv run main.py --working-dir /path/to/project

# The working directory is where the memory database is stored
uv run main.py --working-dir ~/projects/my-project
```
```
your-project/
├── memory/
│   └── vector_memory.db    # SQLite database with vectors
├── src/                    # Your project files
└── other-files...
```
- Max memory content: 10,000 characters
- Max total memories: 10,000 entries
- Max search results: 50 per query
- Max tags per memory: 10 tags
- Path validation: Blocks suspicious characters
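As a rough sketch, enforcing these limits can be as simple as module-level constants plus a pre-insert check. The constant and function names below are illustrative assumptions, not the server's actual identifiers (the real validation lives in the `src/` modules described above):

```python
# Hypothetical constants mirroring the documented limits; names are
# assumptions, not the server's actual identifiers.
MAX_CONTENT_CHARS = 10_000
MAX_TOTAL_MEMORIES = 10_000
MAX_SEARCH_RESULTS = 50
MAX_TAGS_PER_MEMORY = 10

def validate_memory(content: str, tags: list[str]) -> None:
    # Reject oversized content and tag lists before touching the database
    if len(content) > MAX_CONTENT_CHARS:
        raise ValueError(f"content exceeds {MAX_CONTENT_CHARS} characters")
    if len(tags) > MAX_TAGS_PER_MEMORY:
        raise ValueError(f"more than {MAX_TAGS_PER_MEMORY} tags")
```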
```
# Store a useful code pattern
"Implemented JWT refresh token logic using axios interceptors"

# Store a debugging discovery
"Memory leak in React was caused by missing cleanup in useEffect"

# Store architecture decisions
"Chose Redux Toolkit over Context API for complex state management because..."

# Store team conventions
"Team coding style: always use async/await instead of .then() chains"

# Store deployment procedures
"Production deployment requires running migration scripts before code deploy"

# Store infrastructure knowledge
"AWS RDS connection pooling settings for high-traffic applications"

# Store learning insights
"Understanding JavaScript closures: inner functions have access to outer scope"

# Store performance discoveries
"Using React.memo reduced re-renders by 60% in the dashboard component"

# Store security learnings
"OWASP Top 10: Always sanitize user input to prevent XSS attacks"
```
The server uses sentence-transformers to convert your memories into 384-dimensional vectors that capture semantic meaning:
| Query | Finds Memories About |
|---|---|
| "authentication patterns" | JWT, OAuth, login systems, session management |
| "database performance" | SQL optimization, indexing, query tuning, caching |
| "React state management" | useState, Redux, Context API, state patterns |
| "API error handling" | HTTP status codes, retry logic, error responses |
- 0.9+ similarity: Extremely relevant, almost exact matches
- 0.8-0.9: Highly relevant, strong semantic similarity
- 0.7-0.8: Moderately relevant, good contextual match
- 0.6-0.7: Somewhat relevant, might be useful
- <0.6: Low relevance, probably not helpful
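For intuition, here is a minimal sketch of how such scores can be produced with sentence-transformers. This is illustrative only; the server's actual embedding code lives in `src/embeddings.py`:

```python
from sentence_transformers import SentenceTransformer, util

# The same 384-dimensional model the server uses
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

query = "React hook dependency issues"
memory = ("Fixed React useEffect infinite loop by adding a dependency "
          "array with [userId, apiKey].")

# With normalized embeddings, cosine similarity reduces to a dot product
q_vec, m_vec = model.encode([query, memory], normalize_embeddings=True)
score = float(util.cos_sim(q_vec, m_vec))
print(f"similarity: {score:.2f}")  # higher means more relevant
```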
The `get_memory_stats` tool provides comprehensive insights:
```json
{
  "total_memories": 247,
  "memory_limit": 10000,
  "usage_percentage": 2.5,
  "categories": {
    "code-solution": 89,
    "bug-fix": 67,
    "learning": 45,
    "architecture": 23,
    "debugging": 18,
    "other": 5
  },
  "recent_week_count": 12,
  "database_size_mb": 15.7,
  "health_status": "Healthy"
}
```
- total_memories: Current number of memories stored in the database
- memory_limit: Maximum allowed memories (default: 10,000)
- usage_percentage: Database capacity usage (total_memories / memory_limit * 100)
- categories: Breakdown of memory count by category type
- recent_week_count: Number of memories created in the last 7 days
- database_size_mb: Physical size of the SQLite database file on disk
- health_status: Overall database health indicator based on usage and performance metrics
- Sanitizes all user input to prevent injection attacks
- Removes control characters and null bytes
- Enforces length limits on all content
- Validates and normalizes all file paths
- Prevents directory traversal attacks
- Blocks suspicious character patterns
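A minimal sketch of what these path checks can look like; the blocked character list and function name are assumptions, and the real implementation lives in `src/security.py`:

```python
from pathlib import Path

def validate_working_dir(raw: str) -> Path:
    # Reject obviously suspicious characters before resolving
    # (illustrative blocklist; the server's actual rules may differ)
    if any(ch in raw for ch in ("\0", ";", "|", "`", "$")):
        raise ValueError("suspicious characters in path")
    # expanduser + resolve normalizes the path and collapses any '..'
    path = Path(raw).expanduser().resolve()
    if not path.is_dir():
        raise ValueError(f"not a directory: {path}")
    return path
```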
- Limits total memory count and individual memory size
- Prevents database bloat and memory exhaustion
- Implements cleanup mechanisms for old data
- Uses parameterized queries exclusively
- No dynamic SQL construction from user input
- SQLite WAL mode for safe concurrent access
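For illustration, a parameterized query under WAL mode might look like the following sketch (table and column names are assumptions; the real statements live in `src/memory_store.py`):

```python
import sqlite3

conn = sqlite3.connect("memory/vector_memory.db")
conn.execute("PRAGMA journal_mode=WAL")  # safe concurrent access

def memories_in_category(category: str) -> list[tuple]:
    # The ? placeholder keeps user input out of the SQL text entirely
    return conn.execute(
        "SELECT id, content FROM memories WHERE category = ?",
        (category,),
    ).fetchall()
```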
```bash
# Check if uv is installed
uv --version

# Test the server manually
uv run main.py --working-dir ./test

# Check the Python version
python --version  # Should be 3.10+
```
- Verify absolute paths in configuration
- Check Claude Desktop logs: `~/Library/Logs/Claude/`
- Restart Claude Desktop after config changes
- Test server manually before configuring Claude
- Verify sentence-transformers model downloaded successfully
- Check database file permissions in memory/ directory
- Try broader search terms
- Review memory content for relevance
- Run `get_memory_stats` to check database health
- Use `clear_old_memories` to clean up old entries
- Consider increasing hardware resources for embedding generation
Run the server manually to see detailed logs:

```bash
uv run main.py --working-dir ./debug-test
```
Store multiple related memories by calling the tool multiple times through the Claude Desktop interface.
Use tags to organize by project:

```
["project-alpha", "frontend", "react"]
["project-beta", "backend", "node"]
["project-gamma", "devops", "docker"]
```

Or by technology and topic:

```
["javascript", "react", "hooks"]
["python", "django", "orm"]
["aws", "lambda", "serverless"]
["authentication", "security", "jwt"]
["performance", "optimization", "caching"]
["testing", "unit-tests", "mocking"]
```
"Code review insight: Extract validation logic into separate functions for better testability and reusability"
"Sprint retrospective: Using feature flags reduced deployment risk and enabled faster rollbacks"
"Technical debt: UserService class has grown too large, needs refactoring into smaller domain-specific services"
Based on testing with various dataset sizes:

| Memory Count | Search Time | Storage Size | RAM Usage |
|---|---|---|---|
| 1,000 | <50ms | ~5MB | ~100MB |
| 5,000 | <100ms | ~20MB | ~200MB |
| 10,000 | <200ms | ~40MB | ~300MB |

Tested on a MacBook Air M1 with sentence-transformers/all-MiniLM-L6-v2.
The memory store uses 4 optimized indexes for performance:
- idx_category: Speeds up category-based filtering and statistics
- idx_created_at: Optimizes temporal queries and recent memory retrieval
- idx_content_hash: Enables fast deduplication checks via SHA-256 hash lookups
- idx_access_count: Improves cleanup algorithm efficiency by tracking usage patterns
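In sketch form, with an assumed simplified schema, the corresponding DDL would look roughly like this (the actual schema lives in `src/memory_store.py`):

```python
import sqlite3

conn = sqlite3.connect("memory/vector_memory.db")
conn.executescript("""
    -- Assumed simplified schema for illustration only
    CREATE TABLE IF NOT EXISTS memories (
        id            INTEGER PRIMARY KEY,
        content       TEXT NOT NULL,
        category      TEXT,
        content_hash  TEXT,
        access_count  INTEGER DEFAULT 0,
        created_at    TEXT
    );
    CREATE INDEX IF NOT EXISTS idx_category     ON memories(category);
    CREATE INDEX IF NOT EXISTS idx_created_at   ON memories(created_at);
    CREATE INDEX IF NOT EXISTS idx_content_hash ON memories(content_hash);
    CREATE INDEX IF NOT EXISTS idx_access_count ON memories(access_count);
""")
```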
Content deduplication uses SHA-256 hashing to prevent storing identical memories:
- Hash calculated on normalized content (trimmed, lowercased)
- Check performed before insertion
- Duplicate attempts return existing memory ID
- Reduces storage overhead and maintains data quality
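A minimal sketch of the hashing step described above (the real logic lives in `src/memory_store.py`):

```python
import hashlib

def content_hash(content: str) -> str:
    # Normalize first (trim + lowercase) so trivially different copies
    # of the same memory collapse to one hash
    normalized = content.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# These hash identically, so the second store attempt would return the
# existing memory's ID instead of inserting a duplicate
assert content_hash("  Fix X  ") == content_hash("fix x")
```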
Each memory tracks usage statistics for intelligent management:
- access_count: Number of times memory retrieved via search or direct access
- last_accessed_at: Timestamp of most recent access
- created_at: Original creation timestamp
- Used by cleanup algorithm to identify valuable vs. stale memories
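Illustratively, each retrieval can bump these counters with a single UPDATE; this is an assumed sketch whose column names follow the fields listed above:

```python
import sqlite3
from datetime import datetime, timezone

def record_access(conn: sqlite3.Connection, memory_id: int) -> None:
    # Bump the usage counter and refresh the last-access timestamp
    conn.execute(
        "UPDATE memories "
        "SET access_count = access_count + 1, last_accessed_at = ? "
        "WHERE id = ?",
        (datetime.now(timezone.utc).isoformat(), memory_id),
    )
    conn.commit()
```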
Smart cleanup prioritizes memory retention based on multiple factors:
- Recency: Newer memories are prioritized over older ones
- Access patterns: Frequently accessed memories are protected
- Age threshold: Configurable days_old parameter for hard cutoff
- Count limit: Maintains max_memories cap by removing least valuable entries
- Scoring system: Combines access_count and recency for retention decisions
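As a sketch, a retention score combining these factors might look like the following; the weights and access-count cap are illustrative assumptions, not the server's actual numbers:

```python
from datetime import datetime, timezone

def retention_score(access_count: int, created_at: datetime) -> float:
    # Higher score = more worth keeping; the lowest-scoring memories are
    # removed first once the days_old / max_memories limits kick in
    age_days = (datetime.now(timezone.utc) - created_at).days
    recency = 1.0 / (1.0 + age_days)       # newer -> closer to 1.0
    usage = min(access_count, 50) / 50.0   # cap so heavy hitters don't dominate
    return 0.5 * recency + 0.5 * usage     # assumed equal weighting
```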
Advanced operations available through internal implementation:
- get_memory_by_id(memory_id): Direct memory retrieval by unique ID for internal operations
- delete_memory(memory_id): Permanently removes specific memory from both metadata and vector tables
These methods support advanced workflows and custom integrations beyond standard MCP tools.
This is a standalone MCP server designed for personal/team use. For improvements:
- Fork the repository
- Modify as needed for your use case
- Test thoroughly with your specific requirements
- Share improvements via pull requests
This project is released under the MIT License.
- sqlite-vec: Alex Garcia's excellent SQLite vector extension
- sentence-transformers: Nils Reimers' semantic embedding library
- FastMCP: Anthropic's high-level MCP framework
- Claude Desktop: For providing the MCP integration platform
Built for developers who want persistent AI memory without the complexity of dedicated vector databases.