Persistent JSON caching for Python with async support - Cache function results and object state effortlessly.
Cacherator is a Python library that provides persistent JSON-based caching for class state and function results. It enables developers to cache expensive operations with minimal configuration, supporting both synchronous and asynchronous functions.
## Features

- Zero-configuration caching - Simple inheritance and decorator pattern
- Async/await support - Native support for asynchronous functions
- Persistent storage - Cache survives program restarts
- TTL (Time-To-Live) - Automatic cache expiration
- Selective caching - Fine-grained control over what gets cached
- Cache management - Built-in methods for inspection and clearing
- Cache status tracking - Per-call hit/miss detection
- Flexible logging - Global and per-instance control
- DynamoDB backend - Optional cloud cache for cross-machine sharing
## Installation

```bash
pip install cacherator
```

For cross-machine cache sharing via DynamoDB:

```bash
pip install boto3
```

## Quick Start

```python
from cacherator import JSONCache, Cached
import time

class Calculator(JSONCache):
    def __init__(self):
        super().__init__(data_id="calc")

    @Cached()
    def expensive_calculation(self, x, y):
        time.sleep(2)  # Simulate expensive operation
        return x ** y

calc = Calculator()
result = calc.expensive_calculation(2, 10)  # Takes 2 seconds
result = calc.expensive_calculation(2, 10)  # Instant!
```

## Async Support

```python
class APIClient(JSONCache):
    @Cached(ttl=1)  # Cache for 1 day
    async def fetch_user(self, user_id):
        # Expensive API call
        response = await api.get(f"/users/{user_id}")
        return response.json()

client = APIClient()
user = await client.fetch_user(123)  # API call
user = await client.fetch_user(123)  # Cached!
```

## Persistent Object State

```python
class GameState(JSONCache):
    def __init__(self, game_id):
        super().__init__(data_id=f"game_{game_id}")
        if not hasattr(self, "score"):
            self.score = 0
            self.level = 1

    def add_points(self, points):
        self.score += points
        self.json_cache_save()

# Session 1
game = GameState("player1")
game.add_points(100)

# Session 2 (after restart)
game = GameState("player1")
print(game.score)  # 100 - persisted!
```

## DynamoDB Backend

Pass `dynamodb_table` to use DynamoDB as the cache backend instead of local JSON:

```python
from cacherator import JSONCache, Cached

class WebScraper(JSONCache):
    def __init__(self):
        super().__init__(dynamodb_table='my-cache-table')

    @Cached(ttl=7)
    def scrape_expensive_data(self, url):
        return fetch_data(url)

# On machine 1 (laptop)
scraper = WebScraper()
data = scraper.scrape_expensive_data("https://example.com")  # Scrapes and caches

# On machine 2 (EC2 instance) - same code
scraper = WebScraper()
data = scraper.scrape_expensive_data("https://example.com")  # Uses cached data!
```

How it works:
- No table specified: saves/loads from local JSON file
- Table specified: saves/loads from DynamoDB only — no local file used
- Compression: Payloads over 100KB are automatically gzip-compressed before writing to DynamoDB, reducing typical HTML payloads by 80-90% (sketched below). A warning is logged if the compressed payload still exceeds DynamoDB's 400KB item limit.
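The size gating and compression can be pictured with a few lines of standard-library Python. This is a sketch of the general technique using the thresholds from the list above, not cacherator's internal code:

```python
import gzip
import json

# A repetitive HTML-like payload, roughly 120KB once serialized
payload = json.dumps({"html": "<div>content</div>" * 7000}).encode("utf-8")

if len(payload) > 100 * 1024:         # gate: only compress payloads over 100KB
    payload = gzip.compress(payload)  # repetitive HTML shrinks by roughly 80-90%

if len(payload) > 400 * 1024:         # DynamoDB's hard per-item limit
    print("warning: compressed payload still exceeds DynamoDB's 400KB item limit")
```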
DynamoDB table:
- Auto-created if missing (requires IAM permissions; a manual-provisioning sketch follows this list)
- Partition key: `cache_id` (String)
- TTL enabled for automatic expiry
- Pay-per-request billing mode
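If your IAM policy disallows table creation, you can provision the table yourself. A boto3 sketch matching the documented schema follows; the TTL attribute name `ttl` is an assumption, so inspect the items cacherator writes to confirm it:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Fails with ResourceInUseException if the table already exists
dynamodb.create_table(
    TableName="my-cache-table",
    KeySchema=[{"AttributeName": "cache_id", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "cache_id", "AttributeType": "S"}],
    BillingMode="PAY_PER_REQUEST",
)
dynamodb.get_waiter("table_exists").wait(TableName="my-cache-table")

# Enable automatic expiry on the (assumed) TTL attribute
dynamodb.update_time_to_live(
    TableName="my-cache-table",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "ttl"},
)
```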
AWS credentials are resolved via the standard boto3 chain (a quick check follows this list):
- Environment variables: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`
- IAM role (recommended for EC2/Lambda)
- AWS credentials file (`~/.aws/credentials`)
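Before pointing cacherator at DynamoDB, it can help to confirm which source boto3 actually resolves. This check uses plain boto3, independently of cacherator:

```python
import boto3

session = boto3.Session()
creds = session.get_credentials()

# creds.method names the source, e.g. "env", "iam-role", or "shared-credentials-file"
print(creds.method if creds else "no credentials found")
print(session.region_name)  # None means no region is configured
```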
## Custom TTL

```python
class WeatherService(JSONCache):
    @Cached(ttl=0.25)  # 6 hours (0.25 days)
    def get_forecast(self, city):
        return fetch_weather(city)

    @Cached(ttl=30)  # 30 days
    def get_historical(self, city, year):
        return fetch_historical(city, year)
```

## Selective Caching

Exclude attributes from object-state caching by listing them in `_excluded_cache_vars`:

```python
class DataProcessor(JSONCache):
    def __init__(self):
        self._excluded_cache_vars = ["temp_data", "api_key"]
        super().__init__()
        self.results = {}
        self.temp_data = []       # Won't be cached
        self.api_key = "secret"   # Won't be cached
```

## Cache Management

```python
processor = DataProcessor()
# Get cache statistics
stats = processor.json_cache_stats()
print(stats)
# {'total_entries': 5, 'functions': {'process': 3, 'analyze': 2}}

# Clear specific function cache
processor.json_cache_clear("process")

# Clear all cache
processor.json_cache_clear()
```

## Cache Status Tracking

Detect whether a `@Cached` method returned cached data or executed the function:

```python
class DataService(JSONCache):
    def __init__(self):
        super().__init__(data_id="my-service", ttl=7)

    @Cached(ttl=7)
    def fetch(self, key: str) -> str:
        return expensive_operation(key)

svc = DataService()

# last_cache_status is None before any call
print(svc.last_cache_status)  # None

svc.fetch("foo")
print(svc.last_cache_status)  # "miss" (first run) or "hit" (subsequent runs)

# Full per-call history keyed by function signature
print(svc.cache_status)
# {"fetch('foo',){}": "hit"}
```

Status values:
"hit"— returned from cache"miss"— function was executed (no valid cache entry)None— no@Cachedmethod has been called yet
`cache_status` is populated on init for all keys loaded from cache and updated on every `@Cached` call. It is cleared when `json_cache_clear()` is called.
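A typical use is instrumenting cache effectiveness. This sketch builds on the `DataService` class above:

```python
svc = DataService()
value = svc.fetch("foo")

# Branch on whether the call hit the cache or ran expensive_operation
if svc.last_cache_status == "miss":
    print("executed expensive_operation")
else:
    print("served from cache")
```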
## Logging

```python
from cacherator import JSONCache

# Disable logging globally
JSONCache.set_logging(False)

# Enable logging globally (default)
JSONCache.set_logging(True)

# Per-instance control
processor = DataProcessor(logging=False)  # Silent mode
```

When logging is enabled:
- DynamoDB operations are logged (table creation, reads, writes)
- Local JSON operations are silent (fast, not interesting)
When logging is disabled:
- All operations are silent
## Configuration
### JSONCache Constructor
```python
JSONCache(
    data_id="unique_id",    # Unique identifier (default: class name)
    directory="cache",      # Cache directory (default: "data/cache"); ignored if dynamodb_table is set
    clear_cache=False,      # Clear existing cache on init
    ttl=999,                # Default TTL in days
    logging=True,           # Enable logging (True/False)
    dynamodb_table=None,    # DynamoDB table name; uses DynamoDB instead of local JSON if set
)
```

### @Cached Decorator

```python
@Cached(
    ttl=7,              # Time-to-live in days (default: class ttl)
    clear_cache=False   # Clear cache for this function
)
```

## More Examples

### Async API Client

```python
import aiohttp

class GitHubClient(JSONCache):
    def __init__(self):
        super().__init__(data_id="github_client", ttl=1)

    @Cached(ttl=0.5)  # 12 hours
    async def get_user(self, username):
        async with aiohttp.ClientSession() as session:
            async with session.get(f"https://api.github.com/users/{username}") as resp:
                return await resp.json()

    @Cached(ttl=7)  # 1 week
    async def get_repos(self, username):
        async with aiohttp.ClientSession() as session:
            async with session.get(f"https://api.github.com/users/{username}/repos") as resp:
                return await resp.json()
```

### Database Query Caching

```python
class UserRepository(JSONCache):
    def __init__(self):
        super().__init__(data_id="user_repo", ttl=0.1)  # 2.4 hours

    @Cached()
    def get_user_by_id(self, user_id):
        return db.query("SELECT * FROM users WHERE id = ?", user_id)

    @Cached(ttl=1)
    def get_user_stats(self, user_id):
        return db.query("SELECT COUNT(*) FROM posts WHERE user_id = ?", user_id)
```

### ML Model Predictions

```python
class ModelPredictor(JSONCache):
    def __init__(self):
        super().__init__(data_id="ml_predictor")
        self.model = load_model()

    @Cached(ttl=30)
    def predict(self, features_hash, features):
        # Cache predictions by feature hash
        return self.model.predict(features)
```

## Best Practices

Good candidates for caching:
- Expensive API calls and network requests
- Database queries with relatively static data
- Heavy computational operations
- Machine learning model predictions
- Data transformations and aggregations
TTL guidelines (a sketch follows this list):
- Set short TTL (minutes to hours) for frequently changing data
- Set long TTL (days to weeks) for stable reference data
- Consider data freshness requirements for your application
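Because `ttl` is measured in days, fractional values cover the short end of these guidelines. The class, methods, and fetchers below are illustrative placeholders, not part of the library:

```python
from cacherator import JSONCache, Cached

class FreshnessTiers(JSONCache):
    @Cached(ttl=1 / 24)  # 1 hour: frequently changing data
    def live_prices(self):
        return fetch_prices()

    @Cached(ttl=1)       # 1 day: daily aggregates
    def daily_report(self):
        return build_report()

    @Cached(ttl=30)      # 30 days: stable reference data
    def country_codes(self):
        return fetch_country_codes()
```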
Avoid caching (a pitfall sketch follows this list):
- Non-deterministic functions (random number generation, timestamps)
- Very fast operations (overhead exceeds benefit)
- Non-JSON-serializable objects without custom handling
- Real-time data without appropriate TTL configuration
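The non-determinism pitfall is easy to reproduce. In this sketch the second call returns the cached first roll instead of re-rolling:

```python
import random
from cacherator import JSONCache, Cached

class Dice(JSONCache):
    @Cached()
    def roll(self):
        return random.randint(1, 6)

dice = Dice()
first = dice.roll()   # executes and caches one roll
second = dice.roll()  # cache hit: always equals first, never re-rolls
```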
## Performance

Cacherator introduces minimal overhead:
- Cache hit: ~0.1ms
- Cache miss: Function execution time + ~1ms
- Disk I/O: Non-blocking, asynchronous operations
Typical speedups (see the timing sketch after this list):
- API calls (100ms - 5s) reduced to ~0.1ms
- Database queries (10ms - 1s) reduced to ~0.1ms
- Heavy computations (1s+) reduced to ~0.1ms
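These figures are straightforward to check yourself. The sketch below reuses the `Calculator` class from the Quick Start:

```python
import time

calc = Calculator()
calc.json_cache_clear()  # start cold so the first call is a miss

start = time.perf_counter()
calc.expensive_calculation(2, 10)  # miss: runs the 2-second function
print(f"miss took {time.perf_counter() - start:.3f}s")

start = time.perf_counter()
calc.expensive_calculation(2, 10)  # hit: served from cache
print(f"hit took {time.perf_counter() - start:.5f}s")
```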
## Compatibility

- Python: 3.7 and above
- Async: Full support for async/await syntax
- Operating Systems: Windows, macOS, Linux
- Data Types: All JSON-serializable types plus datetime objects (see the sketch after this list)
- Optional Dependencies: boto3 (for DynamoDB backend), dynamorator
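datetime support means cached methods can return datetime values without manual serialization. A small illustration; the class and data are made up:

```python
from datetime import datetime
from cacherator import JSONCache, Cached

class ReleaseCalendar(JSONCache):
    @Cached(ttl=30)
    def release_date(self, version):
        # The datetime survives the JSON round-trip on later cache hits
        return {"version": version, "released": datetime(2024, 6, 1)}
```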
## Changelog

- Changed: Simplified to a single-backend model: local JSON (default) or DynamoDB (when `dynamodb_table` is set), never both simultaneously
- Changed: `cache_status` values simplified to `"hit"`/`"miss"` (removed `"l1"`/`"l2"`)
- Removed: `save_on_del` parameter (no longer needed)
- Removed: L1→L2 backfill on load
- Removed: Deprecated `json_cache_save_db()` method

- Added: `cache_status` dict: per-function-signature hit/miss tracking with L1/L2 source, populated on init and updated on every `@Cached` call
- Added: `last_cache_status`: status of the most recent `@Cached` call (`"l1"`, `"l2"`, `"miss"`, or `None`)
- Changed: `json_cache_clear()` now also clears `cache_status` entries

- Fixed: L1 cache hits now automatically backfill L2 (DynamoDB) when enabled
- Fixed: Removed misleading `json_cache_save_db` branch in the `@Cached` decorator: `json_cache_save()` is always used, which handles both L1 and L2

- Added: `save_on_del` parameter (default `False`): `__del__` no longer writes to DynamoDB unless opted in, eliminating unnecessary writes on program exit
- Changed: `__del__` always saves to local JSON (L1); a DynamoDB (L2) write requires an explicit `json_cache_save()` or `save_on_del=True`
- Removed: Unreliable dirty-check on `json_cache_save()`: saves are now always performed when called

- Added: Automatic gzip compression for DynamoDB payloads exceeding 100KB
- Added: Warning logged when a compressed payload still exceeds DynamoDB's 400KB limit
- Added: Compression is transparent; no API changes required

- Fixed: `json_cache_save()` now automatically syncs to DynamoDB (L2) when enabled
- Deprecated: `json_cache_save_db()` is now redundant (use `json_cache_save()` instead)

- Added: Optional DynamoDB backend for cross-machine cache sharing via dynamorator
- Added: Two-layer cache architecture (L1: local JSON, L2: DynamoDB)
- Added: Constructor parameter `dynamodb_table` for enabling DynamoDB
- Added: Automatic DynamoDB table creation with TTL support
- Changed: DynamoDB backend now uses the dynamorator package
- Changed: Simplified logging to a boolean (True/False)
- Removed: Environment variable configuration (use the constructor parameter instead)
- Removed: LogLevel enum (simplified to boolean)
## Troubleshooting

Cache not saving:

```python
# Explicitly save cache
obj.json_cache_save()

# Check for serialization errors
obj._excluded_cache_vars = ["problematic_attr"]
```

Cached results not being used:

```python
# Verify TTL hasn't expired
obj = MyClass(ttl=30)  # Increase TTL

# Ensure arguments are identical (type matters)
obj.func(1, 2)    # Different from
obj.func(1.0, 2)  # (int vs float)
```

Cache files too large:

```python
# Exclude large attributes
self._excluded_cache_vars = ["large_data"]

# Use separate cache instances
processor1 = DataProcessor(data_id="dataset1")
processor2 = DataProcessor(data_id="dataset2")
```

## Contributing

Contributions are welcome. Please see CONTRIBUTING.md for guidelines.
## License

MIT License - see LICENSE file for details.
## Links

- GitHub Repository: https://github.com/Redundando/cacherator
- Issue Tracker: https://github.com/Redundando/cacherator/issues
- PyPI Package: https://pypi.org/project/cacherator/
Developed by Arved Klöhn