A persistent vector storage system focused on CRUD operations with Redis caching and graph-based memory relationships. Search functionality has been moved to the mcp__omni__ service for better cross-service coordination.
Latest Update: January 2025
✅ Simplified API - CRUD operations only, search moved to omni service
✅ Clean Architecture - Clear separation of concerns between services
✅ Testing Verified - 90% API success rate, comprehensive test coverage
✅ Performance Validated - Sub-second response times, efficient caching
✅ Infrastructure Ready - Docker deployment, Redis health monitoring
Architecture Change: Search and query operations now handled by mcp__omni__ service
- Vector HashMap: Fast hash-based lookups with vector embeddings
- Graph Structure: Content nodes connected by contextual edges
- Semantic Analysis: Similarity detection and deduplication
- Persistence: Redis caching + ChromaDB vector storage
- Queue Management: Async request processing with buffering
- Edge Weighting: Automatic weight normalization and resolve tracking
- Inference Engine: Query-based memory retrieval with context
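The similarity detection and deduplication feature can be pictured with a small sketch. This is not the service's implementation: the helper names `cosine_similarity` and `deduplicate`, the toy two-dimensional vectors, and the 0.95 threshold are all assumptions chosen for illustration. Real embeddings would come from the configured sentence transformer model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def deduplicate(items, threshold=0.95):
    """Keep the first item of every group of near-identical embeddings.

    items is a list of (label, vector) pairs; threshold is a hypothetical
    similarity cut-off above which two items count as duplicates.
    """
    kept = []
    for label, vec in items:
        if all(cosine_similarity(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((label, vec))
    return kept


if __name__ == "__main__":
    toy = [("cat", [1.0, 0.0]), ("kitten", [0.99, 0.05]), ("car", [0.0, 1.0])]
    print([label for label, _ in deduplicate(toy)])
```

The pairwise comparison is quadratic in the number of items, which is fine for a sketch but would usually be replaced by a vector index lookup at scale.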
┌─────────────────┐
│ FastAPI App │
├─────────────────┤
│ Middleware │
│ - Cache │
│ - RateLimit │
│ - Queue │
├─────────────────┤
│ Vector HashMap │
│ - Embeddings │
│ - Graph Store │
├─────────────────┤
│ Storage │
│ - Redis │
│ - ChromaDB │
└─────────────────┘
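To illustrate how the Cache layer in the diagram might sit in front of the storage layer, here is a minimal cache-aside sketch. It uses an in-process dictionary as a stand-in for Redis, and `TTLCache`, `get_or_compute`, and the `compute` callback are hypothetical names rather than parts of this codebase. The 3600-second default mirrors the documented `cache_ttl` default.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLCache:
    """In-memory stand-in for a Redis cache with per-entry expiry."""

    def __init__(self, ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, List[float]]] = {}

    def get_or_compute(self, key: str,
                       compute: Callable[[str], List[float]]) -> List[float]:
        """Return the cached value for key, or compute and cache it."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh entry: skip the expensive computation
        value = compute(key)
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=3600)
    # A fake "embedding" function; a real one would call the embedding model.
    print(cache.get_or_compute("machine learning", lambda text: [float(len(text))]))
```

With a real Redis backend the expiry would typically be delegated to the server instead of being tracked in the client.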
├── src/ # Source code
│ ├── api/ # FastAPI REST endpoints
│ │ └── main.py # Application entry point
│ ├── core/ # Core business logic
│ │ ├── vector_hashmap.py # Main memory system
│ │ └── middleware.py # Request processing middleware
│ └── models/ # Data models and schemas
│ └── models.py # Pydantic models
├── tests/ # Test suite
│ ├── unit/ # Unit tests (future)
│ ├── integration/ # Integration tests (future)
│ ├── reports/ # Test result reports
│ └── *.py # Current test files
├── static/ # Web interface
│ ├── index.html # Main web UI
│ ├── js/app.js # Frontend JavaScript
│ └── css/styles.css # Styling
├── docs/ # Documentation
│ └── README.md # Documentation index
└── README.md # This file
pip install -r requirements.txt
docker-compose up -d
- Start Redis:
redis-server
- Run the application:
uvicorn src.api.main:app --reload

The memory service now focuses exclusively on CRUD operations. All search functionality has been moved to the mcp__omni__ service.
New API Functions:
- mcp__memory__remember - Store new memories (Create)
- mcp__memory__update - Update existing memories (Update)
- mcp__memory__forget - Delete memories (Delete)
- mcp__memory__list - List memories with pagination (Read)
- mcp__memory__get - Get specific memory by ID (Read)
- mcp__memory__stats - Get statistics
- mcp__memory__health - Health check
Removed Functions (use mcp__omni__ instead):
- memory/search → Use mcp__omni__search
- memory/query → Use mcp__omni__search
- memory/related → Use mcp__omni__get_related
See MEMORY_SERVICE_API.md for complete API documentation.
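Since mcp__memory__list is described as paginated, a client usually needs a loop that walks all pages. The sketch below shows one way to do that under stated assumptions: `fetch_page` is a hypothetical callable standing in for whatever actually invokes mcp__memory__list, and it is assumed to return a plain list of items for a given `offset` and `limit`. The real response shape is defined in MEMORY_SERVICE_API.md and may differ.

```python
from typing import Callable, Iterator, List


def iter_memories(fetch_page: Callable[[int, int], List[dict]],
                  limit: int = 100) -> Iterator[dict]:
    """Yield every memory by requesting successive offset/limit pages.

    fetch_page(offset, limit) is a hypothetical wrapper around the list call.
    """
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        if not page:
            return  # an empty page means there is nothing left
        yield from page
        if len(page) < limit:
            return  # a short page is the last one
        offset += limit


if __name__ == "__main__":
    # Fake backing store used only to exercise the loop.
    store = [{"node_id": f"node-{i}"} for i in range(250)]
    fake_fetch = lambda offset, limit: store[offset:offset + limit]
    print(sum(1 for _ in iter_memories(fake_fetch)))
```

Stopping on a short page saves one request whenever the total is not an exact multiple of the page size.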
mcp__memory__remember
{
  "content": ["word1", "word2", "word3"],
  "context": "brief context"
}
mcp__memory__update
{
  "node_id": "node-123",
  "content": "updated content",
  "context": "new context"
}
mcp__memory__forget
{
  "node_id": "node-123"
}
mcp__memory__list
{
  "offset": 0,
  "limit": 100
}
POST /memory/infer?query=your+question&context_limit=5
GET /memory/statistics
- Content Nodes: Individual concepts/words with embeddings
- Context Nodes: Contextual relationships between content
- Connect content nodes
- Weighted by distance and frequency
- resolve counter increases with repetition
- Context embedding for semantic relationships
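The node and edge description above can be sketched as a small in-memory structure. This is an illustrative model only: the `Edge` and `MemoryGraph` classes, the undirected edge keys, and the 1/distance weight increment are assumptions, since the README does not state the exact weighting formula. Embeddings are left out to keep the sketch self-contained.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Edge:
    source: str
    target: str
    context: str
    weight: float = 0.0  # grows with proximity and repetition
    resolve: int = 0     # how many times this pair has been seen


@dataclass
class MemoryGraph:
    edges: Dict[Tuple[str, str], Edge] = field(default_factory=dict)

    def remember(self, words: List[str], context: str) -> None:
        """Connect every pair of words, favouring pairs that sit close together."""
        for i, first in enumerate(words):
            for j in range(i + 1, len(words)):
                second = words[j]
                if first == second:
                    continue  # no self-loops
                key = tuple(sorted((first, second)))  # undirected edge
                edge = self.edges.setdefault(key, Edge(key[0], key[1], context))
                edge.resolve += 1
                # Assumed formula: adjacent words add 1.0, words two apart add 0.5, ...
                edge.weight += 1.0 / (j - i)


if __name__ == "__main__":
    graph = MemoryGraph()
    graph.remember(["machine", "learning", "algorithm"], "AI basics")
    graph.remember(["machine", "learning", "algorithm"], "AI basics")
    for edge in graph.edges.values():
        print(edge)
```

Because repeated inputs keep adding to `weight`, a structure like this needs some form of normalization to stay bounded.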
pytest test_memory_system.py -v

| Parameter | Default | Description |
|---|---|---|
| redis_url | redis://localhost:6379 | Redis connection URL |
| chroma_persist_dir | ./chroma_db | ChromaDB storage directory |
| embedding_model | all-MiniLM-L6-v2 | Sentence transformer model |
| cache_ttl | 3600 | Cache TTL in seconds |
| max_edges_per_node | 100 | Maximum edges per node |
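One way to carry these parameters in code is a frozen dataclass whose defaults match the table. The sketch below is an assumption about how such settings could be loaded: the `Settings` class, the `load_settings` helper, and the environment variable names (`REDIS_URL`, `CACHE_TTL`, and so on) are hypothetical and not taken from this project.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    # Defaults copied from the configuration table.
    redis_url: str = "redis://localhost:6379"
    chroma_persist_dir: str = "./chroma_db"
    embedding_model: str = "all-MiniLM-L6-v2"
    cache_ttl: int = 3600
    max_edges_per_node: int = 100


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    defaults = Settings()
    return Settings(
        redis_url=env.get("REDIS_URL", defaults.redis_url),
        chroma_persist_dir=env.get("CHROMA_PERSIST_DIR", defaults.chroma_persist_dir),
        embedding_model=env.get("EMBEDDING_MODEL", defaults.embedding_model),
        cache_ttl=int(env.get("CACHE_TTL", defaults.cache_ttl)),
        max_edges_per_node=int(env.get("MAX_EDGES_PER_NODE", defaults.max_edges_per_node)),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary as `env` makes the loader easy to exercise in tests without touching the process environment.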
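import httpx

BASE_URL = "http://localhost:8000"

# Store a memory: a list of content words plus a short context string
input_data = {
    "content": ["machine", "learning", "algorithm"],
    "context": "AI basics"
}
response = httpx.post(f"{BASE_URL}/memory/store", json=input_data)
response.raise_for_status()  # fail loudly on HTTP errors

# Query memory. Note: search and query operations have moved to the
# mcp__omni__ service, so this endpoint may no longer be served here.
query_data = {
    "query": "explain machine learning",
    "top_k": 10,
    "threshold": 0.7
}
response = httpx.post(f"{BASE_URL}/memory/query", json=query_data)
response.raise_for_status()
print(response.json())

- Embeddings are cached in Redis with configurable TTL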
- Batch processing for large inputs
- Rate limiting prevents overload
- Request queue manages concurrent operations
- Edge weight normalization prevents unbounded growth
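As an illustration of how edge weight normalization can prevent unbounded growth, the sketch below rescales one node's outgoing weights so they sum to 1 and drops all but the heaviest edges. The function name `normalize_weights` and the sum-to-one scheme are assumptions, as the README does not specify the normalization used. The `max_edges` default mirrors the documented `max_edges_per_node` value.

```python
from typing import Dict


def normalize_weights(weights: Dict[str, float], max_edges: int = 100) -> Dict[str, float]:
    """Keep the max_edges heaviest edges and rescale them to sum to 1.

    weights maps a target node to the raw weight of the edge leading to it.
    """
    strongest = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:max_edges]
    total = sum(weight for _, weight in strongest)
    if total <= 0:
        # Nothing meaningful to scale; avoid dividing by zero.
        return {target: 0.0 for target, _ in strongest}
    return {target: weight / total for target, weight in strongest}


if __name__ == "__main__":
    raw = {"learning": 12.0, "algorithm": 3.0, "model": 1.0}
    print(normalize_weights(raw, max_edges=2))
```

Relative ordering among the surviving edges is preserved, so ranking by weight gives the same result before and after normalization.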
- Additional abstraction layers
- Advanced inference algorithms
- Multi-modal embeddings
- Distributed processing
- Real-time synchronization