Production-quality Python implementations of essential system design patterns (rate limiting, circuit breaker, cache-aside, pub/sub, Saga, and CQRS), plus case studies for a URL shortener and a notification system.

```bash
python patterns/rate_limiter.py              # Token bucket, sliding window, fixed window
python patterns/circuit_breaker.py           # CLOSED/OPEN/HALF_OPEN with metrics
python patterns/cache_aside.py               # LRU cache with TTL, hit-rate stats
python patterns/pub_sub.py                   # Event bus with wildcard subscriptions
python patterns/saga.py                      # Distributed transaction with compensation
python patterns/cqrs.py                      # Separate read/write models + event sourcing
python case_studies/url_shortener.py         # Base62 encoding, dedup, analytics
python case_studies/notification_system.py   # Multi-channel, priority queue, retry
```

| Algorithm | Description | Best for |
|---|---|---|
| Token Bucket | Tokens refill at rate R, allow bursts up to capacity C | APIs with burst tolerance |
| Sliding Window Log | Track exact request timestamps in window | Strict fairness |
| Fixed Window | Simple counter per time window | Basic throttling |
| Leaky Bucket | Smooths burst into constant output rate | Uniform downstream load |

Real world: Stripe's rate limiters and AWS API Gateway throttling are both documented as token bucket. GitHub's REST API exposes an hourly quota with a reset timestamp, which behaves like a fixed window.
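
The token bucket row in the table can be sketched in a few lines. This is a minimal illustration, not the code in `patterns/rate_limiter.py`: the class name `TokenBucketSketch`, the `cost` parameter, and the injectable `clock` are assumptions made here so the refill logic is easy to test.

```python
import time


class TokenBucketSketch:
    """Token bucket: refill at `rate` tokens/second, hold at most `capacity`.

    Hypothetical sketch; not thread-safe (a real limiter would guard
    `allow` with a lock).
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock              # injectable so tests can fake time
        self.tokens = float(capacity)   # start full so an initial burst is allowed
        self.last_refill = clock()

    def allow(self, cost=1):
        now = self.clock()
        elapsed = now - self.last_refill
        self.last_refill = now
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Refilling lazily inside `allow` avoids a background timer: the bucket only needs the time elapsed since the previous call.
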
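
```python
from patterns.rate_limiter import TokenBucket

limiter = TokenBucket(rate=10, capacity=50)  # 10 req/s, burst of 50


def handle_request():
    if limiter.allow():
        return process_request()  # process_request is the application's handler
    return 429  # Too Many Requests
```

A circuit breaker has three states: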
- CLOSED: all requests pass through normally
- OPEN: fail fast, return error immediately (no call made)
- HALF_OPEN: let one request through to test recovery
Transitions:
- CLOSED → OPEN: after N consecutive failures
- OPEN → HALF_OPEN: after timeout_seconds
- HALF_OPEN → CLOSED: on success
- HALF_OPEN → OPEN: on failure
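
The states and transitions above can be sketched as a small state machine. This is an illustrative sketch rather than the repository's `CircuitBreaker`: the names `CircuitBreakerSketch`, `State`, and `CircuitOpenError`, and the injectable `clock`, are assumptions. It is single-threaded, so "one request" in HALF_OPEN holds trivially; a concurrent version would need a lock or a probe flag.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(Exception):
    """Raised when a call is rejected without being attempted."""


class CircuitBreakerSketch:
    def __init__(self, failure_threshold=5, timeout_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.timeout_seconds = timeout_seconds
        self.clock = clock          # injectable so tests can fake time
        self.state = State.CLOSED
        self.failures = 0           # consecutive failures seen so far
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.timeout_seconds:
                raise CircuitOpenError("circuit is open")  # fail fast, no call made
            self.state = State.HALF_OPEN  # OPEN -> HALF_OPEN after the timeout
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        # CLOSED stays CLOSED; HALF_OPEN -> CLOSED. Either way the count resets.
        self.state = State.CLOSED
        self.failures = 0

    def _on_failure(self):
        self.failures += 1
        # HALF_OPEN -> OPEN on any failure; CLOSED -> OPEN at the threshold.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()
```
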

Real world: Netflix Hystrix (now in maintenance mode) and Resilience4j (Java) implement this pattern; the AWS SDKs' built-in retry handling covers related ground. It suits any external call (DB, HTTP, third-party API).

```python
from patterns.circuit_breaker import CircuitBreaker, CircuitBreakerOpen

cb = CircuitBreaker(failure_threshold=5, timeout_seconds=30)

try:
    result = cb.call(external_api.get, user_id)
except CircuitBreakerOpen:
    result = get_cached_fallback(user_id)  # serve stale data while the circuit is open
```

In cache-aside, the application manages the cache explicitly:
- Read: check cache → miss → load from DB → store in cache → return
- Write: update DB → invalidate cache (NOT update cache)
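
Why invalidate instead of update on write? Updating the cache on write opens a race: two concurrent writers can apply their cache updates in the opposite order from their DB commits, leaving a stale value cached. Deleting the key lets the next read reload the current row.

Real world: Cache-aside is one of the most widely used caching patterns in web applications. Redis + PostgreSQL is a common stack.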
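
The read and write paths above need only three cache operations: get, set with a TTL, and delete. Below is a minimal in-memory sketch of such a cache with LRU eviction and hit-rate counters. It is not the code in `patterns/cache_aside.py`; the class name `TTLCacheSketch`, the `max_size` default, and the injectable `clock` are assumptions for illustration.

```python
import time
from collections import OrderedDict


class TTLCacheSketch:
    """In-memory cache with per-key TTL and LRU eviction (hypothetical sketch)."""

    def __init__(self, max_size=1024, clock=time.monotonic):
        self.max_size = max_size
        self.clock = clock             # injectable so tests can fake time
        self._data = OrderedDict()     # key -> (value, expires_at), least recent first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] <= self.clock():
            self._data.pop(key, None)  # drop the expired entry if present
            self.misses += 1
            return None
        self._data.move_to_end(key)    # mark as most recently used
        self.hits += 1
        return entry[0]

    def set(self, key, value, ttl=300):
        self._data[key] = (value, self.clock() + ttl)
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used key

    def delete(self, key):
        self._data.pop(key, None)

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Because `get` returns `None` on a miss, callers should compare against `None` rather than test truthiness, so that cached falsy values such as `0` or `""` still count as hits.
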
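
```python
# `cache` and `db` stand for the application's cache and database clients.
def get_user(user_id):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:         # explicit None check: falsy values are still hits
        return cached              # hit
    user = db.find(user_id)        # miss: load from the database
    cache.set(key, user, ttl=300)  # keep for 5 minutes
    return user


def update_user(user_id, data):
    db.update(user_id, data)
    cache.delete(f"user:{user_id}")  # invalidate; the next read repopulates
```

In pub/sub, publishers emit events to topics. Subscribers listen to topics they care about. Publishers don't know who's listening.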
Benefits: decoupled services, easy to add new subscribers, audit logging for free.
Real world: Kafka, RabbitMQ, AWS SNS/SQS, Google Pub/Sub. Shopify's order pipeline, Uber's driver location updates.
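
An in-process event bus with wildcard topics can be sketched with the standard library's `fnmatch` module. This is an assumption-laden illustration, not the code in `patterns/pub_sub.py`: the class name `EventBusSketch` is hypothetical, handlers here receive `(topic, payload)`, and `publish` returns the number of handlers called.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


class EventBusSketch:
    """Synchronous in-process pub/sub with shell-style topic patterns."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # pattern -> [handler, ...]

    def subscribe(self, pattern, handler):
        self._subscribers[pattern].append(handler)

    def publish(self, topic, payload):
        """Call every handler whose pattern matches `topic`; return the call count."""
        delivered = 0
        for pattern, handlers in list(self._subscribers.items()):
            # fnmatchcase: "*" matches any run of characters, including dots,
            # so "orders.*" also matches "orders.created.v2".
            if fnmatchcase(topic, pattern):
                for handler in handlers:
                    handler(topic, payload)
                    delivered += 1
        return delivered
```

Delivery here is synchronous and in registration order; brokers such as Kafka or RabbitMQ add persistence, ordering guarantees, and delivery across processes that this sketch does not attempt.
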

```python
bus.subscribe("orders.created", send_confirmation_email)
bus.subscribe("orders.*", analytics_tracker)  # wildcard
bus.publish("orders.created", {"order_id": "ORD-001", "amount": 99.99})
```

A saga manages a distributed transaction across multiple services as a sequence of local steps. Each step has a compensation (rollback) action.
If step 3 fails → run compensation for step 2, then step 1.
Real world: Uber uses Saga for trip booking (reserve driver → charge payment → update trip). Amazon uses it for order fulfillment.
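
The compensation order described above (undo completed steps, most recent first) can be sketched as follows. This is a minimal illustration, not the code in `patterns/saga.py`: the names `StepSketch` and `SagaSketch` and the shape of the returned dict are assumptions, and a failing compensation is not handled here.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class StepSketch:
    name: str
    action: Callable[[dict], Any]
    compensation: Callable[[dict], Any]


class SagaSketch:
    def __init__(self, steps):
        self.steps = list(steps)

    def execute(self, context):
        completed = []
        for step in self.steps:
            try:
                step.action(context)
            except Exception as exc:
                # Undo already-completed steps, most recent first.
                # The failed step itself is not compensated.
                for done in reversed(completed):
                    done.compensation(context)
                return {"status": "rolled_back",
                        "failed_step": step.name,
                        "error": str(exc)}
            completed.append(step)
        return {"status": "completed"}
```

In production, compensations usually need to be idempotent and retried, because the process running the saga can itself crash midway through a rollback.
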

```python
order_saga = Saga([
    SagaStep("reserve_inventory", reserve, release),  # with compensation
    SagaStep("charge_payment", charge, refund),       # with compensation
    SagaStep("send_confirmation", confirm, no_op),    # no rollback needed
])
order_saga.execute({"product_id": "A1", "amount": 49.99})
```

CQRS (Command Query Responsibility Segregation) uses separate models for writing (commands) and reading (queries).
- Write model: enforces business rules, emits events
- Read model: pre-projected, optimized for specific queries (no joins, no computation at read time)
Real world: LinkedIn uses CQRS for the activity feed. Twitter uses it for timeline generation. Event sourcing is often combined with CQRS.
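
The split between a rule-enforcing write model and a pre-projected read model can be sketched with an in-memory event log. This is an illustration only, not the code in `patterns/cqrs.py`: the class names, the `OrderPlaced` event shape, and the per-customer totals projection are all assumptions chosen for the example.

```python
from collections import defaultdict


class OrderWriteModel:
    """Command side: validates business rules and appends events."""

    def __init__(self):
        self.events = []  # append-only event log (the source of truth in this sketch)

    def place_order(self, order_id, customer_id, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        event = {"type": "OrderPlaced", "order_id": order_id,
                 "customer_id": customer_id, "amount": amount}
        self.events.append(event)
        return event


class CustomerTotalsReadModel:
    """Query side: a projection kept up to date so reads are a single lookup."""

    def __init__(self):
        self.totals = defaultdict(float)

    def apply(self, event):
        if event["type"] == "OrderPlaced":
            self.totals[event["customer_id"]] += event["amount"]

    def total_spent(self, customer_id):
        return self.totals.get(customer_id, 0.0)


# Usage: each emitted event is fed to the projection.
write_model = OrderWriteModel()
read_model = CustomerTotalsReadModel()
read_model.apply(write_model.place_order("ORD-001", "C1", 40.0))
read_model.apply(write_model.place_order("ORD-002", "C1", 10.0))
```

Because the event log is kept, a new or corrupted read model can be rebuilt by replaying `write_model.events` through `apply`, which is the link to event sourcing.
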

URL shortener design decisions:
- Base62 counter encoding (not a random hash): shorter codes, no collisions
- 7 characters give 62^7 ≈ 3.5 trillion possible codes
- MD5-based deduplication: same URL → same short code
- Click analytics with referrer tracking
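
Base62 counter encoding can be sketched as repeated division by 62. This is a minimal illustration, not the code in `case_studies/url_shortener.py`; the alphabet order (digits, then lowercase, then uppercase) and the function names are assumptions.

```python
import string

# 62 symbols; the ordering is an assumption made for this sketch.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(n):
    """Encode a non-negative integer counter as a Base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))  # most significant digit first


def base62_decode(code):
    """Inverse of base62_encode."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Every counter value below 62^7 = 3,521,614,606,208 encodes to at most 7 characters, and distinct counters always yield distinct codes, which is why no collision check is needed.
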

Notification system design decisions:
- Priority queue (CRITICAL delivered before NORMAL)
- User preference/opt-out checking before dispatch
- Exponential backoff retry (2^attempt seconds)
- Multi-channel: send same notification to email + SMS + push simultaneously
- Scheduled notifications (delay delivery until timestamp)
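
Two of the decisions above, priority ordering and exponential backoff, can be sketched with `heapq`. This is an illustration, not the code in `case_studies/notification_system.py`: the names `Priority`, `NotificationQueueSketch`, and `send_with_retry`, the 0-based `attempt` numbering, and the `max_attempts` default are assumptions.

```python
import heapq
import itertools
import time
from enum import IntEnum


class Priority(IntEnum):
    CRITICAL = 0  # lower value is popped first
    NORMAL = 1


class NotificationQueueSketch:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: FIFO within a priority

    def push(self, priority, notification):
        heapq.heappush(self._heap, (priority, next(self._counter), notification))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def backoff_delay(attempt):
    """Seconds to wait after failed attempt number `attempt` (0-based): 2^attempt."""
    return 2 ** attempt


def send_with_retry(send, notification, max_attempts=4, sleep=time.sleep):
    """Call `send` up to `max_attempts` times, backing off between failures."""
    for attempt in range(max_attempts):
        try:
            return send(notification)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(attempt))
```

The `sleep` parameter is injectable so tests can record delays instead of waiting; production code would often add random jitter to the delay to avoid synchronized retries.
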