An intelligent middleware server that adds memory and learning capabilities to Ollama by implementing the MCP (Model Context Protocol) pattern.
MCP Memory Server acts as a smart proxy between your applications (like ELVIS) and Ollama, automatically enriching prompts with relevant context and learning from every interaction.
- Automatic Context Enrichment: Searches Brain memory and adds relevant context to prompts
- Learning from Experience: Tracks what works and improves over time
- Model-Specific Optimization: Learns each model's strengths and best practices
- Similar Task Recognition: Finds and applies lessons from similar past tasks
- Drop-in Ollama Replacement: Compatible with existing Ollama API clients
- MCP Tool Integration: Access to Brain, filesystem, and other MCP tools
```
Your App (ELVIS) → MCP Memory Server → Ollama
                          ↓
                 Brain Memory System
                 Learning Engine
                 Context Enricher
```
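In code terms, the proxy boils down to enrich, forward, learn. A minimal sketch, assuming hypothetical `enrich()` and `learn()` helpers (the real logic lives in `ContextEnricher.ts` and `LearningEngine.ts`):

```typescript
import express from 'express';

// Hypothetical stand-ins for the enrichment and learning steps.
async function enrich(model: string, prompt: string): Promise<string> {
  return prompt; // placeholder: the real enricher adds memory context
}
async function learn(model: string, prompt: string, response: string): Promise<void> {
  // placeholder: the real engine assesses quality and stores insights
}

const app = express();
app.use(express.json());

app.post('/api/generate', async (req, res) => {
  const { model, prompt } = req.body;

  // 1. Enrich the prompt before it ever reaches the model
  const enriched = await enrich(model, prompt);

  // 2. Forward to Ollama, unchanged except for the enriched prompt
  const ollamaRes = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, prompt: enriched, stream: false }),
  });
  const result = (await ollamaRes.json()) as { response: string };

  // 3. Learn from the completed interaction
  await learn(model, prompt, result.response);

  res.json(result);
});

app.listen(8090);
```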
```bash
# Clone the repository
git clone [repository-url]
cd mcp-memory-server

# Install dependencies
npm install

# Build the TypeScript code
npm run build

# Copy environment template
cp .env.example .env

# Edit .env with your settings
```
Create a `.env` file with:
```bash
# Server Configuration
PORT=8090                         # Port for Memory Server
OLLAMA_URL=http://localhost:11434 # Ollama API endpoint
MCP_URL=http://localhost:3000     # MCP tools endpoint (optional)

# Memory Configuration
MAX_CONTEXT_TOKENS=2000           # Max tokens to add as context
SIMILARITY_LIMIT=10               # How many similar memories to search
RELEVANCE_THRESHOLD=0.3           # Min relevance score (0-1)
AUTO_ENRICH=true                  # Enable automatic enrichment

# Brain Integration
BRAIN_ENABLED=true                # Enable Brain memory system
BRAIN_DATA_DIR=~/.brain           # Brain data directory

# Cache Configuration
CACHE_TTL=3600                    # Cache TTL in seconds
CACHE_MAX_SIZE=1000               # Max cache entries
```
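These are plain environment variables; a loader along these lines (a sketch with the same defaults, not the project's actual code) is all the server needs:

```typescript
// Reads configuration from the environment, falling back to the
// defaults documented in .env.example above.
const config = {
  port: Number(process.env.PORT ?? 8090),
  ollamaUrl: process.env.OLLAMA_URL ?? 'http://localhost:11434',
  mcpUrl: process.env.MCP_URL ?? 'http://localhost:3000',
  maxContextTokens: Number(process.env.MAX_CONTEXT_TOKENS ?? 2000),
  similarityLimit: Number(process.env.SIMILARITY_LIMIT ?? 10),
  relevanceThreshold: Number(process.env.RELEVANCE_THRESHOLD ?? 0.3),
  autoEnrich: (process.env.AUTO_ENRICH ?? 'true') === 'true',
  brainEnabled: (process.env.BRAIN_ENABLED ?? 'true') === 'true',
  brainDataDir: process.env.BRAIN_DATA_DIR ?? '~/.brain',
  cacheTtl: Number(process.env.CACHE_TTL ?? 3600),
  cacheMaxSize: Number(process.env.CACHE_MAX_SIZE ?? 1000),
};
```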
```bash
# Start the server
npm start

# Or with custom settings
PORT=9000 OLLAMA_URL=http://remote:11434 npm start
```
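Once it's up, you can smoke-test the Ollama-compatible endpoint from Node 18+ (built-in `fetch`); the model name here is just an example:

```typescript
// Sends a test generation request through the Memory Server.
async function smokeTest() {
  const res = await fetch('http://localhost:8090/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'deepseek-r1',
      prompt: 'Summarize the MCP middleware pattern in one sentence.',
      stream: false,
    }),
  });
  console.log(await res.json());
}

smokeTest();
```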
Simply point ELVIS to the Memory Server instead of Ollama:
```javascript
// Before (direct to Ollama)
const elvis = new ELVIS({
  ollamaUrl: 'http://localhost:11434'
});

// After (through Memory Server)
const elvis = new ELVIS({
  ollamaUrl: 'http://localhost:8090'  // Memory Server port
});
```
The server provides Ollama-compatible endpoints plus additional memory endpoints:
Ollama-compatible endpoints:

- `POST /api/generate` - Generate text (with automatic enrichment)
- `POST /api/generate/stream` - Streaming generation
- `GET /api/tags` - List available models

Memory endpoints:

- `GET /api/memory/stats` - Get memory statistics
- `POST /api/memory/search` - Search memories
- `GET /api/memory/insights` - Get recent insights

Learning endpoints:

- `POST /api/learning/feedback` - Provide feedback on responses
- `GET /api/learning/model-stats/:model` - Get model performance stats
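For example, querying stored memories over HTTP (the request body shape is an assumption; check the handler in `server.ts`):

```typescript
// Hypothetical request body; adjust to the actual search handler.
async function searchMemories() {
  const res = await fetch('http://localhost:8090/api/memory/search', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: 'performance bottlenecks', limit: 5 }),
  });
  console.log(await res.json());
}

searchMemories();
```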
When a request comes in, the server:
- Extracts keywords from the prompt
- Searches Brain for relevant memories
- Finds similar past tasks
- Adds model-specific tips
- Includes recent insights
- Builds an enriched prompt with all context
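As a sketch, that pipeline might look like this in TypeScript (all helper names are assumptions; the real logic lives in `ContextEnricher.ts`):

```typescript
// Hypothetical helper signatures; the actual implementations live elsewhere.
declare function extractKeywords(prompt: string): string[];
declare function searchBrain(keywords: string[]): Promise<string[]>;
declare function findSimilarTasks(prompt: string): Promise<string[]>;
declare function modelTips(model: string): string[];
declare function recentInsights(): Promise<string[]>;

// Assembles the enriched prompt in the same order as the steps above.
async function enrichPrompt(model: string, prompt: string): Promise<string> {
  const keywords = extractKeywords(prompt);
  const memories = await searchBrain(keywords);
  const similar = await findSimilarTasks(prompt);
  const tips = modelTips(model);
  const insights = await recentInsights();

  return [
    '=== MODEL GUIDANCE ===', ...tips,
    '=== RELEVANT CONTEXT ===', ...memories,
    '=== SIMILAR PAST TASKS ===', ...similar,
    '=== RECENT INSIGHTS ===', ...insights,
    '=== CURRENT TASK ===', prompt,
  ].join('\n');
}
```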
After each response, the server:
- Assesses response quality
- Identifies the approach used
- Extracts insights and patterns
- Stores successful patterns
- Updates model performance metrics
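Sketched in the same spirit (the names and quality threshold are assumptions, not the actual `LearningEngine.ts` API):

```typescript
// Hypothetical persistence hooks.
declare function storeInsight(text: string): Promise<void>;
declare function updateModelStats(model: string, quality: number): Promise<void>;

interface Outcome {
  model: string;
  prompt: string;
  response: string;
  quality: number;   // assessed quality, 0-1
  approach: string;  // e.g. "profiling-driven analysis"
}

// Records what worked so future enrichment can reuse it.
async function recordOutcome(outcome: Outcome): Promise<void> {
  if (outcome.quality > 0.8) {
    await storeInsight(`${outcome.approach} works well for this task type`);
  }
  await updateModelStats(outcome.model, outcome.quality);
}
```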
The server tracks several types of memory:
- Task Memories: Complete record of past tasks and outcomes
- Model Context: Performance stats and best practices per model
- Insights: Learned patterns and successful approaches
- Domain Knowledge: Subject-specific information
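In TypeScript terms, the four categories might be shaped roughly like this (illustrative only; the real interfaces are in `src/types.ts`):

```typescript
interface TaskMemory {
  prompt: string;
  approach: string;
  quality: number;      // 0-1 quality score
  durationMs: number;
}

interface ModelContext {
  strengths: string[];
  tips: string[];
  avgResponseTime: number; // milliseconds
}

interface Insight {
  text: string;
  learnedAt: Date;
}

interface DomainKnowledge {
  domain: string;
  facts: string[];
}
```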
Original prompt:

```
Analyze the performance bottlenecks in our Brain memory system
```

Enriched prompt (automatically generated):

```
=== MODEL GUIDANCE ===
You are deepseek-r1, with these strengths:
- deep analysis
- complex reasoning
- step-by-step thinking

Tips for best results:
- Use "think step by step" in prompts
- Excellent for mathematical proofs

=== RELEVANT CONTEXT ===
Context 1 (relevance: 89%):
Previous Brain performance analysis showed query optimization...

Context 2 (relevance: 76%):
Memory indexing strategies that improved recall speed by 40%...

=== SIMILAR PAST TASKS ===
Task 1: Analyze todo-manager performance issues
Approach: Profiling-driven analysis
Quality: 92%
Duration: 38 minutes
Key learning: Identifying hotspots first saved significant time

=== RECENT INSIGHTS ===
1. Using profiler data improves analysis accuracy
2. Database queries are often the bottleneck
3. Caching strategies significantly impact performance

=== CURRENT TASK ===
Analyze the performance bottlenecks in our Brain memory system
```
```
src/
├── types.ts            # TypeScript interfaces
├── server.ts           # Express server setup
├── MemoryManager.ts    # Memory storage and retrieval
├── ContextEnricher.ts  # Prompt enrichment logic
├── LearningEngine.ts   # Learning from interactions
├── clients/
│   ├── BrainClient.ts  # Brain memory integration
│   └── OllamaClient.ts # Ollama API client
└── index.ts            # Entry point
```
```bash
npm test                # Run all tests
npm run test:watch      # Watch mode
npm run test:coverage   # With coverage

npm run build           # Build TypeScript
npm run dev             # Watch mode
```
You can extend the memory system by implementing custom memory providers:
```typescript
import { Memory } from './types';

class CustomMemoryProvider {
  async search(query: string): Promise<Memory[]> {
    // Your custom search logic
    return [];
  }
}
```
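How a provider gets registered depends on `MemoryManager.ts`; a plausible hookup (hypothetical method name) would be:

```typescript
// Hypothetical registration call; check MemoryManager.ts for the real hook.
memoryManager.addProvider(new CustomMemoryProvider());
```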
Add model-specific configurations in the code:
```typescript
modelStats.set('your-model', {
  strengths: ['domain expertise'],
  avgResponseTime: 20 * 60 * 1000,  // 20 minutes, in milliseconds
  tips: ['Works best with examples']
});
```
The server logs all interactions for analysis:
- Request/response times
- Memory hit rates
- Model performance trends
- Quality assessments
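If you consume these logs programmatically, a record along these lines (an assumed shape, not the server's actual schema) covers the fields above:

```typescript
interface InteractionLog {
  timestamp: Date;
  model: string;
  responseMs: number;  // request/response time
  memoryHits: number;  // memories found during enrichment
  quality: number;     // assessed response quality, 0-1
}
```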
- **Ollama Connection Failed**
  - Ensure Ollama is running: `curl http://localhost:11434`
  - Check `OLLAMA_URL` in `.env`

- **Brain Not Available**
  - The server works without Brain, but with limited memory features
  - Check the `MCP_URL` configuration

- **High Memory Usage**
  - Adjust `CACHE_MAX_SIZE`
  - Implement cache cleanup

- **Slow Enrichment**
  - Reduce `SIMILARITY_LIMIT`
  - Increase `RELEVANCE_THRESHOLD`
- Vector database integration for semantic search
- Web UI for memory management
- Multi-user support with isolated memories
- Plugin system for custom enrichers
- Metrics dashboard
- Memory export/import
- A/B testing for enrichment strategies
This is a proof-of-concept for the MCP middleware pattern. Contributions welcome!
Key areas for contribution:
- Better learning algorithms
- More sophisticated context selection
- Additional memory providers
- Performance optimizations
- Testing infrastructure
MIT
Built with curiosity by MikeyBeez & Claude 🧠✨