An intelligent middleware server that adds memory and learning capabilities to Ollama by implementing the MCP (Model Context Protocol) pattern.
MCP Memory Server acts as a smart proxy between your applications (like ELVIS) and Ollama, automatically enriching prompts with relevant context and learning from every interaction.
- Automatic Context Enrichment: Searches Brain memory and adds relevant context to prompts
- Learning from Experience: Tracks what works and improves over time
- Model-Specific Optimization: Learns each model's strengths and best practices
- Similar Task Recognition: Finds and applies lessons from similar past tasks
- Drop-in Ollama Replacement: Compatible with existing Ollama API clients
- MCP Tool Integration: Access to Brain, filesystem, and other MCP tools
```
Your App (ELVIS) → MCP Memory Server → Ollama
                          ↓
                 Brain Memory System
                 Learning Engine
                 Context Enricher
```
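In code terms, the proxy boils down to enrich, forward, learn. A minimal sketch, assuming hypothetical `enrich()` and `learn()` helpers (the real logic lives in `ContextEnricher.ts` and `LearningEngine.ts`):

```typescript
import express from 'express';

// Hypothetical stand-ins for the enrichment and learning steps.
async function enrich(model: string, prompt: string): Promise<string> {
  return prompt; // placeholder: the real enricher adds memory context
}
async function learn(model: string, prompt: string, response: string): Promise<void> {
  // placeholder: the real engine assesses quality and stores insights
}

const app = express();
app.use(express.json());

app.post('/api/generate', async (req, res) => {
  const { model, prompt } = req.body;

  // 1. Enrich the prompt before it ever reaches the model
  const enriched = await enrich(model, prompt);

  // 2. Forward to Ollama, unchanged except for the enriched prompt
  const ollamaRes = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, prompt: enriched, stream: false }),
  });
  const result = (await ollamaRes.json()) as { response: string };

  // 3. Learn from the completed interaction
  await learn(model, prompt, result.response);

  res.json(result);
});

app.listen(8090);
```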
```bash
# Clone the repository
git clone [repository-url]
cd mcp-memory-server

# Install dependencies
npm install

# Build the TypeScript code
npm run build

# Copy environment template
cp .env.example .env

# Edit .env with your settings
```
Create a `.env` file with:
```bash
# Server Configuration
PORT=8090                         # Port for Memory Server
OLLAMA_URL=http://localhost:11434 # Ollama API endpoint
MCP_URL=http://localhost:3000     # MCP tools endpoint (optional)

# Memory Configuration
MAX_CONTEXT_TOKENS=2000           # Max tokens to add as context
SIMILARITY_LIMIT=10               # How many similar memories to search
RELEVANCE_THRESHOLD=0.3           # Min relevance score (0-1)
AUTO_ENRICH=true                  # Enable automatic enrichment

# Brain Integration
BRAIN_ENABLED=true                # Enable Brain memory system
BRAIN_DATA_DIR=~/.brain           # Brain data directory

# Cache Configuration
CACHE_TTL=3600                    # Cache TTL in seconds
CACHE_MAX_SIZE=1000               # Max cache entries
```
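These are plain environment variables; a loader along these lines (a sketch with the same defaults, not the project's actual code) is all the server needs:

```typescript
// Reads configuration from the environment, falling back to the
// defaults documented in .env.example above.
const config = {
  port: Number(process.env.PORT ?? 8090),
  ollamaUrl: process.env.OLLAMA_URL ?? 'http://localhost:11434',
  mcpUrl: process.env.MCP_URL ?? 'http://localhost:3000',
  maxContextTokens: Number(process.env.MAX_CONTEXT_TOKENS ?? 2000),
  similarityLimit: Number(process.env.SIMILARITY_LIMIT ?? 10),
  relevanceThreshold: Number(process.env.RELEVANCE_THRESHOLD ?? 0.3),
  autoEnrich: (process.env.AUTO_ENRICH ?? 'true') === 'true',
  brainEnabled: (process.env.BRAIN_ENABLED ?? 'true') === 'true',
  brainDataDir: process.env.BRAIN_DATA_DIR ?? '~/.brain',
  cacheTtl: Number(process.env.CACHE_TTL ?? 3600),
  cacheMaxSize: Number(process.env.CACHE_MAX_SIZE ?? 1000),
};
```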
```bash
# Start the server
npm start

# Or with custom settings
PORT=9000 OLLAMA_URL=http://remote:11434 npm start
```
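Once it's up, you can smoke-test the Ollama-compatible endpoint from Node 18+ (built-in `fetch`); the model name here is just an example:

```typescript
// Sends a test generation request through the Memory Server.
async function smokeTest() {
  const res = await fetch('http://localhost:8090/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'deepseek-r1',
      prompt: 'Summarize the MCP middleware pattern in one sentence.',
      stream: false,
    }),
  });
  console.log(await res.json());
}

smokeTest();
```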
Simply point ELVIS to the Memory Server instead of Ollama:
```javascript
// Before (direct to Ollama)
const elvis = new ELVIS({
  ollamaUrl: 'http://localhost:11434'
});

// After (through Memory Server)
const elvis = new ELVIS({
  ollamaUrl: 'http://localhost:8090'  // Memory Server port
});
```
The server provides Ollama-compatible endpoints plus additional memory endpoints:
Ollama-compatible endpoints:

- `POST /api/generate` - Generate text (with automatic enrichment)
- `POST /api/generate/stream` - Streaming generation
- `GET /api/tags` - List available models

Memory endpoints:

- `GET /api/memory/stats` - Get memory statistics
- `POST /api/memory/search` - Search memories
- `GET /api/memory/insights` - Get recent insights

Learning endpoints:

- `POST /api/learning/feedback` - Provide feedback on responses
- `GET /api/learning/model-stats/:model` - Get model performance stats
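For example, querying stored memories over HTTP (the request body shape is an assumption; check the handler in `server.ts`):

```typescript
// Hypothetical request body; adjust to the actual search handler.
async function searchMemories() {
  const res = await fetch('http://localhost:8090/api/memory/search', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: 'performance bottlenecks', limit: 5 }),
  });
  console.log(await res.json());
}

searchMemories();
```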
When a request comes in, the server:
- Extracts keywords from the prompt
- Searches Brain for relevant memories
- Finds similar past tasks
- Adds model-specific tips
- Includes recent insights
- Builds an enriched prompt with all context
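As a sketch, that pipeline might look like this in TypeScript (all helper names are assumptions; the real logic lives in `ContextEnricher.ts`):

```typescript
// Hypothetical helper signatures; the actual implementations live elsewhere.
declare function extractKeywords(prompt: string): string[];
declare function searchBrain(keywords: string[]): Promise<string[]>;
declare function findSimilarTasks(prompt: string): Promise<string[]>;
declare function modelTips(model: string): string[];
declare function recentInsights(): Promise<string[]>;

// Assembles the enriched prompt in the same order as the steps above.
async function enrichPrompt(model: string, prompt: string): Promise<string> {
  const keywords = extractKeywords(prompt);
  const memories = await searchBrain(keywords);
  const similar = await findSimilarTasks(prompt);
  const tips = modelTips(model);
  const insights = await recentInsights();

  return [
    '=== MODEL GUIDANCE ===', ...tips,
    '=== RELEVANT CONTEXT ===', ...memories,
    '=== SIMILAR PAST TASKS ===', ...similar,
    '=== RECENT INSIGHTS ===', ...insights,
    '=== CURRENT TASK ===', prompt,
  ].join('\n');
}
```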
After each response, the server:
- Assesses response quality
- Identifies the approach used
- Extracts insights and patterns
- Stores successful patterns
- Updates model performance metrics
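Sketched in the same spirit (the names and quality threshold are assumptions, not the actual `LearningEngine.ts` API):

```typescript
// Hypothetical persistence hooks.
declare function storeInsight(text: string): Promise<void>;
declare function updateModelStats(model: string, quality: number): Promise<void>;

interface Outcome {
  model: string;
  prompt: string;
  response: string;
  quality: number;   // assessed quality, 0-1
  approach: string;  // e.g. "profiling-driven analysis"
}

// Records what worked so future enrichment can reuse it.
async function recordOutcome(outcome: Outcome): Promise<void> {
  if (outcome.quality > 0.8) {
    await storeInsight(`${outcome.approach} works well for this task type`);
  }
  await updateModelStats(outcome.model, outcome.quality);
}
```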
The server tracks several types of memory:
- Task Memories: Complete record of past tasks and outcomes
- Model Context: Performance stats and best practices per model
- Insights: Learned patterns and successful approaches
- Domain Knowledge: Subject-specific information
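In TypeScript terms, the four categories might be shaped roughly like this (illustrative only; the real interfaces are in `src/types.ts`):

```typescript
interface TaskMemory {
  prompt: string;
  approach: string;
  quality: number;      // 0-1 quality score
  durationMs: number;
}

interface ModelContext {
  strengths: string[];
  tips: string[];
  avgResponseTime: number; // milliseconds
}

interface Insight {
  text: string;
  learnedAt: Date;
}

interface DomainKnowledge {
  domain: string;
  facts: string[];
}
```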
Original prompt:

```
Analyze the performance bottlenecks in our Brain memory system
```

Enriched prompt (automatically generated):

```
=== MODEL GUIDANCE ===
You are deepseek-r1, with these strengths:
- deep analysis
- complex reasoning
- step-by-step thinking

Tips for best results:
- Use "think step by step" in prompts
- Excellent for mathematical proofs

=== RELEVANT CONTEXT ===
Context 1 (relevance: 89%):
Previous Brain performance analysis showed query optimization...

Context 2 (relevance: 76%):
Memory indexing strategies that improved recall speed by 40%...

=== SIMILAR PAST TASKS ===
Task 1: Analyze todo-manager performance issues
Approach: Profiling-driven analysis
Quality: 92%
Duration: 38 minutes
Key learning: Identifying hotspots first saved significant time

=== RECENT INSIGHTS ===
1. Using profiler data improves analysis accuracy
2. Database queries are often the bottleneck
3. Caching strategies significantly impact performance

=== CURRENT TASK ===
Analyze the performance bottlenecks in our Brain memory system
```
```
src/
├── types.ts            # TypeScript interfaces
├── server.ts           # Express server setup
├── MemoryManager.ts    # Memory storage and retrieval
├── ContextEnricher.ts  # Prompt enrichment logic
├── LearningEngine.ts   # Learning from interactions
├── clients/
│   ├── BrainClient.ts  # Brain memory integration
│   └── OllamaClient.ts # Ollama API client
└── index.ts            # Entry point
```
```bash
npm test                # Run all tests
npm run test:watch      # Watch mode
npm run test:coverage   # With coverage

npm run build           # Build TypeScript
npm run dev             # Watch mode
```
You can extend the memory system by implementing custom memory providers:
```typescript
import { Memory } from './types';

class CustomMemoryProvider {
  async search(query: string): Promise<Memory[]> {
    // Your custom search logic
    return [];
  }
}
```
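How a provider gets registered depends on `MemoryManager.ts`; a plausible hookup (hypothetical method name) would be:

```typescript
// Hypothetical registration call; check MemoryManager.ts for the real hook.
memoryManager.addProvider(new CustomMemoryProvider());
```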
Add model-specific configurations in the code:
```typescript
modelStats.set('your-model', {
  strengths: ['domain expertise'],
  avgResponseTime: 20 * 60 * 1000,  // 20 minutes, in milliseconds
  tips: ['Works best with examples']
});
```
The server logs all interactions for analysis:
- Request/response times
- Memory hit rates
- Model performance trends
- Quality assessments
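If you consume these logs programmatically, a record along these lines (an assumed shape, not the server's actual schema) covers the fields above:

```typescript
interface InteractionLog {
  timestamp: Date;
  model: string;
  responseMs: number;  // request/response time
  memoryHits: number;  // memories found during enrichment
  quality: number;     // assessed response quality, 0-1
}
```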
- **Ollama Connection Failed**
  - Ensure Ollama is running: `curl http://localhost:11434`
  - Check `OLLAMA_URL` in `.env`

- **Brain Not Available**
  - The server works without Brain, but with limited memory features
  - Check the `MCP_URL` configuration

- **High Memory Usage**
  - Adjust `CACHE_MAX_SIZE`
  - Implement cache cleanup

- **Slow Enrichment**
  - Reduce `SIMILARITY_LIMIT`
  - Increase `RELEVANCE_THRESHOLD`
- Vector database integration for semantic search
- Web UI for memory management
- Multi-user support with isolated memories
- Plugin system for custom enrichers
- Metrics dashboard
- Memory export/import
- A/B testing for enrichment strategies
This is a proof-of-concept for the MCP middleware pattern. Contributions welcome!
Key areas for contribution:
- Better learning algorithms
- More sophisticated context selection
- Additional memory providers
- Performance optimizations
- Testing infrastructure
MIT
Built with curiosity by MikeyBeez & Claude 🧠✨