# Multi-Layer Caching Functionality Demo

This notebook demonstrates the multi-layer caching functionality of the RAG Engine Mini.

## Learning Objectives

By the end of this notebook, you will understand:
1. How multi-layer caching works in the RAG Engine
2. The different cache layers (L1, L2, L3) and their purposes
3. How to use the caching service
4. The architecture of the multi-layer cache system
5. How caching improves RAG system performance

In [1]:
import sys
import os
from pathlib import Path
import asyncio
import json
from datetime import datetime, timedelta

# Add the project root to the path
project_root = Path("../")
sys.path.insert(0, str(project_root))

print(f"Project root: {project_root}")
print("Environment set up successfully")

Project root: ..
Environment set up successfully


## Understanding the Multi-Layer Cache Architecture

The multi-layer caching functionality follows the same architectural patterns as the rest of the RAG Engine:

1. **Port/Adapter Pattern**: The `MultiLayerCachePort` defines the interface
2. **Dependency Injection**: Services are injected through the container
3. **Separation of Concerns**: Caching logic is separate from API logic
4. **Hierarchical Storage**: Three-tier cache hierarchy (L1, L2, L3)
5. **Performance Optimization**: Fast access to frequently used data
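
To make the port/adapter idea concrete, here is a minimal sketch. A `typing.Protocol` plays the role of the port, and a dictionary-backed class plays the role of an adapter. The method names mirror ones the service exposes (`get`, `set`, `delete`). The `CachePort`, `InMemoryCacheAdapter`, and `answer_with_cache` names and signatures are hypothetical and are not the RAG Engine's actual `MultiLayerCachePort`. The block is written as a standalone script. Inside Jupyter, replace `asyncio.run(main())` with `await main()`, because the notebook already runs an event loop.

```python
import asyncio
from typing import Any, Optional, Protocol


class CachePort(Protocol):
    """Hypothetical port: the interface the application layer depends on."""

    async def get(self, key: str) -> Optional[Any]: ...
    async def set(self, key: str, value: Any) -> bool: ...
    async def delete(self, key: str) -> bool: ...


class InMemoryCacheAdapter:
    """Hypothetical adapter: one concrete implementation of CachePort."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    async def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)

    async def set(self, key: str, value: Any) -> bool:
        self._data[key] = value
        return True

    async def delete(self, key: str) -> bool:
        return self._data.pop(key, None) is not None


async def answer_with_cache(cache: CachePort, question: str) -> str:
    # Application code depends only on the port, so any adapter can be injected.
    cached = await cache.get(f"answer:{question}")
    if cached is not None:
        return cached
    answer = f"computed answer to {question!r}"  # stand-in for an expensive RAG call
    await cache.set(f"answer:{question}", answer)
    return answer


async def main() -> None:
    cache = InMemoryCacheAdapter()  # a DI container would normally supply this
    first = await answer_with_cache(cache, "What is RAG?")
    second = await answer_with_cache(cache, "What is RAG?")  # served from the cache
    print(first == second)


asyncio.run(main())
```

Swapping `InMemoryCacheAdapter` for a Redis-backed or multi-layer adapter requires no change to `answer_with_cache`. This decoupling is the main benefit of the port/adapter split.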

In [2]:
# Let's look at the multi-layer cache service definition
from src.application.services.multi_layer_cache_service import MultiLayerCacheService, CacheLayer

print("Multi-Layer Cache Service Components:")
print(f"- MultiLayerCacheService: {MultiLayerCacheService.__name__}")
print(f"- Cache Layers: {len(list(CacheLayer))} layers")

print(f"\nCache layers:")
for layer in CacheLayer:
    print(f"- {layer.value}: {layer.name}")

public_methods = [
    name
    for name in dir(MultiLayerCacheService)
    if not name.startswith("_") and callable(getattr(MultiLayerCacheService, name, None))
]
print(f"\nMulti-layer cache service methods: {public_methods}\n")

Multi-Layer Cache Service Components:
- MultiLayerCacheService: MultiLayerCacheService
- Cache Layers: 3 layers

Cache layers:
- l1_memory: L1_MEMORY
- l2_redis: L2_REDIS
- l3_persistent: L3_PERSISTENT

Multi-layer cache service methods: ['clear_layer', 'delete', 'get', 'get_stats', 'invalidate_by_prefix', 'set', 'warm_up']


## Using the Multi-Layer Cache Service

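Let's create a `MultiLayerCacheService`. For this demo, the Redis client is replaced with an `AsyncMock`, so no Redis server is needed. The mock returns fixed values; for example, its `get` always returns `None`: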

In [3]:
# Import required classes
from src.application.services.multi_layer_cache_service import MultiLayerCacheService, CacheLayer
from unittest.mock import AsyncMock

# Create a mock Redis client for demonstration
mock_redis_client = AsyncMock()
mock_redis_client.get = AsyncMock(return_value=None)
mock_redis_client.set = AsyncMock(return_value=True)
mock_redis_client.delete = AsyncMock(return_value=1)
mock_redis_client.keys = AsyncMock(return_value=[])
mock_redis_client.flushdb = AsyncMock(return_value=True)
mock_redis_client.info = AsyncMock(return_value={"used_memory": 1000, "db0": {"keys": 5}})

# Create the multi-layer cache service
cache_service = MultiLayerCacheService(redis_client=mock_redis_client)

print("Multi-layer cache service initialized successfully")
print(f"L1 Memory Cache: Max items = {cache_service._l1_max_size}, Max bytes = {cache_service._l1_max_bytes}")

Multi-layer cache service initialized successfully
L1 Memory Cache: Max items = 1000, Max bytes = 10485760
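
The L1 limits printed above (1,000 items and 10,485,760 bytes, i.e. 10 MiB) suggest that the memory layer is bounded both by entry count and by total size. A common way to enforce both limits is an LRU map that evicts the least recently used entries until both bounds hold. The sketch below estimates sizes with `sys.getsizeof`, which counts only the top-level object. How the RAG Engine actually measures `memory_usage_bytes` is not shown in this notebook, and `BoundedLRU` is a hypothetical name.

```python
import sys
from collections import OrderedDict
from typing import Any, Optional


class BoundedLRU:
    """Hypothetical L1 store bounded by item count and approximate byte size."""

    def __init__(self, max_items: int = 1000, max_bytes: int = 10 * 1024 * 1024) -> None:
        self.max_items = max_items
        self.max_bytes = max_bytes
        self._data: "OrderedDict[str, Any]" = OrderedDict()
        self._sizes: dict[str, int] = {}
        self.bytes_used = 0

    def get(self, key: str) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key: str, value: Any) -> None:
        if key in self._data:
            self._remove(key)
        size = sys.getsizeof(value)  # shallow size estimate only
        self._data[key] = value
        self._sizes[key] = size
        self.bytes_used += size
        # Evict from the least recently used end until both limits hold.
        while len(self._data) > self.max_items or self.bytes_used > self.max_bytes:
            self._remove(next(iter(self._data)))

    def _remove(self, key: str) -> None:
        del self._data[key]
        self.bytes_used -= self._sizes.pop(key)

    def __len__(self) -> int:
        return len(self._data)


lru = BoundedLRU(max_items=2)
lru.set("a", 1)
lru.set("b", 2)
lru.get("a")     # "a" becomes the most recently used entry
lru.set("c", 3)  # evicts "b", the least recently used entry
print(lru.get("b"), lru.get("a"), lru.get("c"))  # None 1 3
```

Moving an entry to the end on every read is what makes this LRU rather than FIFO. Both `OrderedDict.move_to_end` and removing the first key are O(1).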


## Working with Different Cache Layers

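Let's write to and read from specific cache layers. The `target_layers` argument limits which layers a `set` writes to, and omitting it writes to all layers. The `layer` argument limits which layer a `get` reads from. Because the Redis client is an `AsyncMock`, no real Redis round trip happens in this cell: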

In [4]:
# Set values in different cache layers
print("Setting values in different cache layers:\n")

# Set in L1 only
await cache_service.set("l1_only_key", "L1 Only Value", target_layers=[CacheLayer.L1_MEMORY])
result = await cache_service.get("l1_only_key", layer=CacheLayer.L1_MEMORY)
print(f"L1 Only - Set and retrieved: {result}")

# Set in L2 only (simulated via our mock)
await cache_service.set("l2_only_key", "L2 Only Value", target_layers=[CacheLayer.L2_REDIS])
result = await cache_service.get("l2_only_key", layer=CacheLayer.L2_REDIS)
print(f"L2 Only - Set and retrieved: {result}")

# Set in all layers
await cache_service.set("all_layers_key", "All Layers Value")
result = await cache_service.get("all_layers_key")
print(f"All Layers - Set and retrieved: {result}")

# Set with TTL
await cache_service.set("ttl_key", "TTL Value", ttl=timedelta(minutes=10))
result = await cache_service.get("ttl_key")
print(f"With TTL - Set and retrieved: {result}")

Setting values in different cache layers:

L1 Only - Set and retrieved: L1 Only Value
L2 Only - Set and retrieved: L2 Only Value
All Layers - Set and retrieved: All Layers Value
With TTL - Set and retrieved: TTL Value
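
When a lookup does not name a layer, a multi-layer cache typically checks layers from fastest to slowest. On a hit in a slower layer, it copies the value into the faster layers (promotion), so the next read is cheaper. The sketch below illustrates that read-through pattern together with per-entry TTLs. Whether `MultiLayerCacheService` promotes entries this way is an assumption. `TieredCache` and `TTLLayer` are hypothetical names, and plain dictionaries stand in for Redis and persistent storage.

```python
import time
from datetime import timedelta
from typing import Any, Optional


class TTLLayer:
    """Hypothetical cache layer storing (value, expiry) pairs in a dict."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._data: dict[str, tuple[Any, Optional[float]]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # lazily drop expired entries on read
            return None
        return value

    def set(self, key: str, value: Any, ttl: Optional[timedelta] = None) -> None:
        expires_at = None if ttl is None else time.monotonic() + ttl.total_seconds()
        self._data[key] = (value, expires_at)


class TieredCache:
    """Hypothetical read-through cache over layers ordered fastest first."""

    def __init__(self, layers: list[TTLLayer]) -> None:
        self.layers = layers

    def set(self, key: str, value: Any, ttl: Optional[timedelta] = None,
            target_layers: Optional[list[TTLLayer]] = None) -> None:
        for layer in target_layers or self.layers:
            layer.set(key, value, ttl)

    def get(self, key: str) -> Optional[Any]:
        for i, layer in enumerate(self.layers):
            value = layer.get(key)
            if value is not None:
                # Promote the hit into every faster layer checked before it.
                # Promoted copies get no TTL in this sketch.
                for faster in self.layers[:i]:
                    faster.set(key, value)
                return value
        return None


l1, l2, l3 = TTLLayer("l1"), TTLLayer("l2"), TTLLayer("l3")
cache = TieredCache([l1, l2, l3])
cache.set("doc:42", "chunk text", target_layers=[l3])  # only the slowest layer has it
print(cache.get("doc:42"))  # prints "chunk text"; the hit came from l3
print(l1.get("doc:42"))     # prints "chunk text"; the value was promoted into l1
```

Promotion trades a little extra write work on slow-layer hits for cheaper repeat reads. A fuller version would propagate the remaining TTL to promoted copies instead of storing them without expiry.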


## Cache Statistics

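Let's examine cache statistics to understand performance characteristics. Note that the L2 figures (`size: 5`, `memory_usage_bytes: 1000`) mirror the canned values returned by the mocked `info()` call in the setup cell. They are not measurements from a running Redis server: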

In [5]:
# Get cache statistics
stats = await cache_service.get_stats()

print("Cache Statistics:\n")
for layer, layer_stats in stats.items():
    print(f"{layer.value.upper()} ({layer.name}):")
    for stat_name, stat_value in layer_stats.items():
        print(f"  {stat_name}: {stat_value}")
    print()

Cache Statistics:

L1_MEMORY (L1_MEMORY):
  size: 4
  memory_usage_bytes: 116
  hit_count: 6
  miss_count: 1
  hit_rate: 0.8571428571428571
  max_size_items: 1000
  max_size_bytes: 10485760

L2_REDIS (L2_REDIS):
  size: 5
  memory_usage_bytes: 1000
  hit_count: 0
  miss_count: 0
  hit_rate: 0.0
  connected: True

L3_PERSISTENT (L3_PERSISTENT):
  hit_count: 0
  miss_count: 0
  hit_rate: 0.0
  available: True
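
The `hit_rate` values are consistent with hits divided by total lookups. For L1, 6 / (6 + 1) ≈ 0.857, and layers with no lookups report 0.0 rather than dividing by zero. Below is a small sketch of such a counter. The `LayerStats` name is hypothetical, and the zero-lookup convention simply mirrors the output above.

```python
from dataclasses import dataclass


@dataclass
class LayerStats:
    """Hypothetical per-layer hit/miss counter."""

    hit_count: int = 0
    miss_count: int = 0

    def record(self, hit: bool) -> None:
        if hit:
            self.hit_count += 1
        else:
            self.miss_count += 1

    @property
    def hit_rate(self) -> float:
        total = self.hit_count + self.miss_count
        return self.hit_count / total if total else 0.0  # 0.0 when nothing was looked up


l1_stats = LayerStats()
for hit in [True] * 6 + [False]:
    l1_stats.record(hit)
print(l1_stats.hit_rate)      # 0.8571428571428571
print(LayerStats().hit_rate)  # 0.0
```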



## Cache Operations

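Let's delete a single key and then invalidate a group of keys by prefix. In this run, `invalidate_by_prefix` reports `0` invalidated keys, and all three entries are still present in L1 afterwards, so the output does not demonstrate a successful invalidation. A likely cause is the mocked Redis `keys()` call, which always returns an empty list. Check the behavior against a real Redis instance before relying on it: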

In [6]:
# Delete a key
await cache_service.set("delete_me", "To be deleted")
result = await cache_service.get("delete_me")
print(f"Before deletion: {result}")

deleted = await cache_service.delete("delete_me")
result = await cache_service.get("delete_me")
print(f"After deletion: {result}, Deletion successful: {deleted}")

# Invalidate by prefix
await cache_service.set("user:profile:123", "User profile data")
await cache_service.set("user:settings:123", "User settings data")
await cache_service.set("post:content:456", "Post content data")

print(f"\nBefore invalidation:")
print(f"user:profile:123 exists: {await cache_service.get('user:profile:123', layer=CacheLayer.L1_MEMORY) is not None}")
print(f"user:settings:123 exists: {await cache_service.get('user:settings:123', layer=CacheLayer.L1_MEMORY) is not None}")
print(f"post:content:456 exists: {await cache_service.get('post:content:456', layer=CacheLayer.L1_MEMORY) is not None}")

# Invalidate all user keys
invalidated_count = await cache_service.invalidate_by_prefix("user:")
print(f"\nInvalidated {invalidated_count} keys with prefix 'user:'")

print(f"\nAfter invalidation:")
print(f"user:profile:123 exists: {await cache_service.get('user:profile:123', layer=CacheLayer.L1_MEMORY) is not None}")
print(f"user:settings:123 exists: {await cache_service.get('user:settings:123', layer=CacheLayer.L1_MEMORY) is not None}")
print(f"post:content:456 exists: {await cache_service.get('post:content:456', layer=CacheLayer.L1_MEMORY) is not None}")

Before deletion: To be deleted
After deletion: None, Deletion successful: True

Before invalidation:
user:profile:123 exists: True
user:settings:123 exists: True
post:content:456 exists: True

Invalidated 0 keys with prefix 'user:'

After invalidation:
user:profile:123 exists: True
user:settings:123 exists: True
post:content:456 exists: True
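
For comparison, here is a minimal sketch of prefix invalidation over an in-process dictionary. Matching keys are found by scanning and then removed, and the return value is the number of keys removed. This is not the RAG Engine's implementation, and `invalidate_prefix` is a hypothetical helper. Against Redis, matching keys are usually found with the `SCAN` command and a `MATCH` pattern such as `user:*`, rather than the blocking `KEYS` command.

```python
from typing import Any


def invalidate_prefix(store: dict[str, Any], prefix: str) -> int:
    """Hypothetical helper: delete every key starting with prefix, return the count."""
    # Collect matches first; deleting from a dict while iterating over it raises RuntimeError.
    matches = [key for key in store if key.startswith(prefix)]
    for key in matches:
        del store[key]
    return len(matches)


l1 = {
    "user:profile:123": "User profile data",
    "user:settings:123": "User settings data",
    "post:content:456": "Post content data",
}
print(invalidate_prefix(l1, "user:"))  # 2
print(sorted(l1))                      # ['post:content:456']
```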


## Cache Warming

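Cache warming preloads entries that are known to be needed (here, embeddings, a prompt template, and chunk metadata) so that the first requests for them hit the cache: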

In [7]:
# Warm up the cache with sample data
sample_data = {
    "embedding:gpt4:doc1": [0.1, 0.2, 0.3, 0.4, 0.5],
    "embedding:gpt4:doc2": [0.5, 0.4, 0.3, 0.2, 0.1],
    "prompt:qa:template": "Answer the question based on the context: {context}. Question: {question}",
    "chunk:summary:doc1": "This document discusses RAG architectures",
    "chunk:entities:doc1": ["RAG", "Architecture", "LLM"]
}

warm_success = await cache_service.warm_up(sample_data, ttl=timedelta(hours=1))
print(f"Cache warming successful: {warm_success}")

# Verify the data was cached
print(f"\nCached data verification:")
for key in sample_data.keys():
    cached_val = await cache_service.get(key)
    exists = cached_val is not None
    print(f"{key}: {'✓' if exists else '✗'}")

Cache warming successful: True

Cached data verification:
embedding:gpt4:doc1: ✓
embedding:gpt4:doc2: ✓
prompt:qa:template: ✓
chunk:summary:doc1: ✓
chunk:entities:doc1: ✓
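
Warming amounts to a bulk `set` with a shared TTL. The minimal async sketch below assumes a per-key `set` coroutine like the one used throughout this notebook. It issues the writes concurrently with `asyncio.gather` and reports success only if every write succeeded. `DictCache` and `warm_up` here are hypothetical stand-ins, not the service's code. The block runs as a script via `asyncio.run`; inside Jupyter, use `await main()` instead.

```python
import asyncio
from datetime import timedelta
from typing import Any, Optional


class DictCache:
    """Hypothetical async cache with the same set/get shape used in this notebook."""

    def __init__(self) -> None:
        self.data: dict[str, Any] = {}
        self.ttls: dict[str, Optional[timedelta]] = {}

    async def set(self, key: str, value: Any, ttl: Optional[timedelta] = None) -> bool:
        self.data[key] = value
        self.ttls[key] = ttl  # recorded only; a real layer would expire the entry
        return True

    async def get(self, key: str) -> Optional[Any]:
        return self.data.get(key)


async def warm_up(cache: DictCache, items: dict[str, Any],
                  ttl: Optional[timedelta] = None) -> bool:
    # Issue all writes concurrently; success means every individual set returned True.
    results = await asyncio.gather(*(cache.set(k, v, ttl=ttl) for k, v in items.items()))
    return all(results)


async def main() -> None:
    cache = DictCache()
    ok = await warm_up(
        cache,
        {"prompt:qa:template": "Answer: {question}", "chunk:entities:doc1": ["RAG", "LLM"]},
        ttl=timedelta(hours=1),
    )
    print(ok, await cache.get("chunk:entities:doc1"))  # True ['RAG', 'LLM']


asyncio.run(main())
```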


## Layer-Specific Operations

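Let's clear a single cache layer. Note that the L1 size reported before clearing includes entries left over from earlier cells, not only the five items added here: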

In [8]:
# Fill L1 with some data
for i in range(5):
    await cache_service.set(f"l1_item_{i}", f"Value {i}", target_layers=[CacheLayer.L1_MEMORY])

# Check L1 size before clearing
stats_before = await cache_service.get_stats()
l1_size_before = stats_before[CacheLayer.L1_MEMORY]['size']
print(f"L1 cache size before clearing: {l1_size_before}")

# Clear L1 layer
cleared = await cache_service.clear_layer(CacheLayer.L1_MEMORY)
print(f"L1 layer cleared successfully: {cleared}")

# Check L1 size after clearing
stats_after = await cache_service.get_stats()
l1_size_after = stats_after[CacheLayer.L1_MEMORY]['size']
print(f"L1 cache size after clearing: {l1_size_after}")

L1 cache size before clearing: 9
L1 layer cleared successfully: True
L1 cache size after clearing: 0
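
Clearing one layer leaves the others intact. In a read-through cache, a later read of a key that was also written to L2 therefore falls through to L2 and repopulates L1. The sketch below shows that behavior with two plain dictionaries standing in for L1 and L2. The `TwoLayerCache` class and its refill-on-read behavior are assumptions for illustration. They do not describe how `clear_layer` interacts with the RAG Engine's read path.

```python
from typing import Any, Optional


class TwoLayerCache:
    """Hypothetical two-layer cache: dict L1 in front of a dict standing in for Redis."""

    def __init__(self) -> None:
        self.l1: dict[str, Any] = {}
        self.l2: dict[str, Any] = {}

    def set(self, key: str, value: Any) -> None:
        self.l1[key] = value
        self.l2[key] = value

    def get(self, key: str) -> Optional[Any]:
        if key in self.l1:
            return self.l1[key]
        if key in self.l2:
            self.l1[key] = self.l2[key]  # refill L1 from L2 on a fall-through hit
            return self.l2[key]
        return None

    def clear_l1(self) -> int:
        removed = len(self.l1)
        self.l1.clear()
        return removed


cache = TwoLayerCache()
for i in range(5):
    cache.set(f"item_{i}", f"Value {i}")
print(cache.clear_l1(), len(cache.l1))     # 5 0
print(cache.get("item_3"), len(cache.l1))  # Value 3 1
```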


## Performance Benefits in RAG Systems

Multi-layer caching helps a RAG system avoid repeating expensive work such as embedding calls and LLM generations. Each layer trades speed for capacity and durability:

1. **L1 (Memory)**: In-process access with no network round trip, suited to frequently requested embeddings and prompts
2. **L2 (Redis)**: Shared caching across multiple application instances
3. **L3 (Persistent)**: Long-term storage of computed results

Checking the fastest layer first and falling back to slower ones keeps common lookups cheap. Only a miss in every layer triggers recomputation.
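
A back-of-the-envelope model makes the benefit concrete. Assume each layer is checked in order and a full miss falls back to recomputation (for example, an embedding or LLM call). The expected cost per lookup is then the probability-weighted sum of the time spent on each path. The latencies and hit rates below are illustrative assumptions, not measurements of the RAG Engine.

```python
def expected_latency_us(layers: list[tuple[float, float]], miss_cost_us: float) -> float:
    """Expected lookup cost for layers checked in order.

    layers: (hit probability given the lookup reached this layer, lookup latency in µs)
    miss_cost_us: cost of recomputing the value when every layer misses
    """
    expected = 0.0
    reach_prob = 1.0  # probability that the lookup gets as far as the current layer
    elapsed = 0.0     # time already spent checking layers on this path
    for hit_prob, latency in layers:
        elapsed += latency
        expected += reach_prob * hit_prob * elapsed
        reach_prob *= 1.0 - hit_prob
    return expected + reach_prob * (elapsed + miss_cost_us)


# Illustrative numbers only: L1 ~1 µs, L2 (Redis) ~500 µs, L3 ~5 ms, recompute ~200 ms.
tiers = [(0.8, 1.0), (0.5, 500.0), (0.5, 5_000.0)]
print(round(expected_latency_us(tiers, 200_000.0), 2))  # 10601.0
print(expected_latency_us([], 200_000.0))               # 200000.0
```

With these assumed numbers, the expected cost drops from 200,000 µs to about 10,600 µs. Nearly all of the remainder comes from the 5% of lookups that miss every layer. In this model, raising the overall hit rate therefore matters more than shaving latency off L1.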

## Summary

In this notebook, we explored the multi-layer caching functionality of the RAG Engine Mini:

1. **Architecture**: The multi-layer cache follows the same port/adapter and dependency-injection patterns as the rest of the system
2. **Cache Hierarchy**: Three tiers (L1 memory, L2 Redis, L3 persistent), ordered from fastest to most durable
3. **Management Operations**: `delete`, `invalidate_by_prefix`, `warm_up`, and `clear_layer` for controlling cache contents
4. **Statistics**: Per-layer size, hit/miss counts, and hit rate from `get_stats`
5. **RAG Applications**: Caching embeddings, prompts, and intermediate results to avoid recomputation

Multi-layer caching reduces latency and computational overhead in RAG systems by keeping frequently accessed data in the fastest tiers, backed by slower, larger, and more durable ones. The RAG Engine's implementation provides tools for writing to specific layers, invalidating and warming entries, and monitoring each layer's performance.