Problem
The subscription endpoints make multiple synchronous Stripe API calls per request with insufficient caching, causing HTTP 408 timeouts in production. Observed repeated 408s on GET /v1/users/me/subscription in prod logs (2026-04-17 21:46–00:20 UTC).
Each request to the subscription endpoint can make up to 7 Stripe API calls:
- 1 × Stripe.Subscription.retrieve(): get the current price ID
- 6 × Stripe.Price.retrieve() (3 plans × 2 intervals): build the available plans
The available-plans endpoint (GET /v1/payments/available-plans) is worse — zero caching, plus an additional Stripe.Subscription.retrieve() and subscription schedule lookups.
Current Caching State
| Data | Endpoint | Cached? | Details |
|---|---|---|---|
| Stripe Price (by price_id) | /v1/users/me/subscription | ✅ Redis, 24h TTL | stripe_price:{price_id} key |
| Stripe Price (by price_id) | /v1/payments/available-plans | ❌ None | Same data, but no cache |
| Stripe Subscription (by sub_id) | Both endpoints | ❌ None | Called every request for current_price_id |
| User subscription (Firestore) | Both endpoints | ❌ None | Single doc read, fast |
| App review config | /v1/users/me/subscription | ✅ Memory, 60s TTL | should_hide_subscription_ui |
Architecture
Layer 1: Stripe Price Cache (extend existing)
- What: Stripe.Price.retrieve(price_id) results
- Where: Redis, key stripe_price:{price_id}
- TTL: 24 hours (prices change rarely, only on plan restructure)
- Scope: Already implemented in the users.py subscription endpoint. Extend to the payment.py available-plans endpoint using the same get_generic_cache/set_generic_cache pattern.
- Invalidation: Manual; flush keys when price IDs change (deploy-time). Acceptable since price changes are rare and planned.
- Files: backend/routers/payment.py lines 256-293
Layer 2: Stripe Subscription Cache (new)
- What: Stripe.Subscription.retrieve(subscription_id) results (status, current price, schedule)
- Where: Redis, key stripe_sub:{subscription_id}
- TTL: 5–10 minutes (short, because the subscription changes on upgrade/cancel/renewal)
- Scope: Both users.py (line 807) and payment.py (line 194)
- Invalidation:
  - TTL-based expiry (5–10 min) covers most cases
  - Explicit invalidation on write paths: upgrade_subscription_endpoint, cancel_subscription, and the Stripe webhook handler (stripe_webhook)
  - On invalidation, delete stripe_sub:{subscription_id} so the next read fetches fresh data
- Files: backend/routers/users.py line 807, backend/routers/payment.py lines 193-228
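The Layer 2 read path and its explicit invalidation can be sketched as follows. This is illustrative only. A dict stands in for Redis, and the Stripe call is injected as fetch_subscription. The clock is also injectable so that TTL expiry can be shown without sleeping. The invalidate_stripe_sub_cache name is the helper proposed under Implementation Notes. Its body here is an assumption: in Redis it would be a key delete.

```python
import time
from typing import Any, Callable, Dict, Tuple

SUB_TTL_SECONDS = 5 * 60  # lower end of the proposed 5-10 minute window

# Hypothetical in-memory stand-in for Redis: key -> (expires_at, value)
_store: Dict[str, Tuple[float, Any]] = {}


def _sub_key(subscription_id: str) -> str:
    return f"stripe_sub:{subscription_id}"


def get_subscription_cached(
    subscription_id: str,
    fetch_subscription: Callable[[str], dict],
    now: Callable[[], float] = time.monotonic,
) -> dict:
    """Return the subscription dict, calling Stripe only on a miss or after TTL expiry."""
    key = _sub_key(subscription_id)
    entry = _store.get(key)
    if entry is not None and now() < entry[0]:
        return entry[1]
    # e.g. Stripe.Subscription.retrieve(subscription_id).to_dict()
    sub = fetch_subscription(subscription_id)
    _store[key] = (now() + SUB_TTL_SECONDS, sub)
    return sub


def invalidate_stripe_sub_cache(subscription_id: str) -> None:
    """Drop the cached entry so the next read fetches fresh data from Stripe."""
    _store.pop(_sub_key(subscription_id), None)
```

Write paths such as upgrade and cancel would call invalidate_stripe_sub_cache after their Stripe mutation succeeds. The short TTL is the fallback for any change that reaches Stripe without going through one of those paths.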
Layer 3: Available Plans Catalog Cache (new, optional)
- What: The fully assembled available_plans list (plan definitions + resolved prices)
- Where: Redis, key available_plans_catalog:{version_gate} (keyed by the new_plans_enabled bool)
- TTL: 1 hour
- Scope: Both endpoints build the same plan catalog, so it could be computed once and shared.
- Invalidation: Flush on price ID env var changes (deploy-time) or Stripe price updates.
- Trade-off: Higher complexity. Layers 1 and 2 may be sufficient; measure before implementing.
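If Layer 3 is pursued, the main idea is to key the assembled catalog by the version gate so each variant is built once. The sketch below assumes a few things for illustration. The plan names and intervals are taken from the call flow. The price_ids mapping is a stand-in for the real env-var config. Rendering the bool as "true"/"false" in the key is an arbitrary choice. A dict of JSON strings stands in for Redis, and the 1-hour TTL is noted in a comment but not enforced.

```python
import json
from typing import Callable, Dict, List

# Hypothetical plan/interval grid: 3 plans x 2 intervals = 6 prices
PLANS = ["neo", "operator", "architect"]
INTERVALS = ["monthly", "annual"]


def catalog_key(new_plans_enabled: bool) -> str:
    """available_plans_catalog:{version_gate}, keyed by the new_plans_enabled flag."""
    return f"available_plans_catalog:{str(new_plans_enabled).lower()}"


def build_catalog(price_ids: Dict[str, str], get_price: Callable[[str], dict]) -> List[dict]:
    """Assemble plan definitions plus resolved prices (get_price may itself be cached)."""
    catalog = []
    for plan in PLANS:
        entry: dict = {"plan": plan, "prices": {}}
        for interval in INTERVALS:
            price = get_price(price_ids[f"{plan}_{interval}"])
            entry["prices"][interval] = {"id": price["id"], "unit_amount": price["unit_amount"]}
        catalog.append(entry)
    return catalog


def get_catalog_cached(
    cache: Dict[str, str], new_plans_enabled: bool, build: Callable[[], List[dict]]
) -> List[dict]:
    """Return the catalog for this version gate, building and storing it on a miss."""
    key = catalog_key(new_plans_enabled)
    raw = cache.get(key)
    if raw is not None:
        return json.loads(raw)
    catalog = build()
    # In Redis this value would be written with the proposed 1-hour TTL.
    cache[key] = json.dumps(catalog)
    return catalog
```

Each gate value gets its own key, so flipping new_plans_enabled never serves the other variant's catalog. A deploy-time flush only needs to delete the two available_plans_catalog:* keys.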
Call Flow (before vs after)
BEFORE (up to 7 Stripe calls per request):
Client → /subscription
→ Firestore: get_user_subscription(uid)
→ Stripe: Subscription.retrieve(sub_id) ← SLOW, no cache
→ Stripe: Price.retrieve(neo_monthly) ← cached (24h)
→ Stripe: Price.retrieve(neo_annual) ← cached (24h)
→ Stripe: Price.retrieve(operator_monthly) ← cached (24h)
→ Stripe: Price.retrieve(operator_annual) ← cached (24h)
→ Stripe: Price.retrieve(architect_monthly) ← cached (24h)
→ Stripe: Price.retrieve(architect_annual) ← cached (24h)
AFTER (0-1 Stripe calls when warm):
Client → /subscription
→ Firestore: get_user_subscription(uid)
→ Redis: stripe_sub:{sub_id} ← HIT (5-10min TTL)
→ Redis: stripe_price:{neo_monthly} ← HIT (24h TTL)
→ Redis: stripe_price:{neo_annual} ← HIT
→ Redis: stripe_price:{operator_monthly} ← HIT
→ Redis: stripe_price:{operator_annual} ← HIT
→ Redis: stripe_price:{architect_monthly} ← HIT
→ Redis: stripe_price:{architect_annual} ← HIT
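To check the "up to 7" versus "0-1" counts, the sketch below simulates one /subscription request against an in-memory TTL cache and counts upstream calls. It does not call Stripe. The fake client, explicit clock values, 5-minute subscription TTL and price IDs are all illustrative assumptions.

```python
from typing import Callable, Dict, Tuple

SUB_TTL = 5 * 60          # stripe_sub:{sub_id}, lower end of the 5-10 min window
PRICE_TTL = 24 * 60 * 60  # stripe_price:{price_id}, 24h
# Six hypothetical price IDs: 3 plans x 2 intervals
PRICE_IDS = [
    f"{plan}_{interval}"
    for plan in ("neo", "operator", "architect")
    for interval in ("monthly", "annual")
]

Cache = Dict[str, Tuple[float, dict]]  # key -> (expires_at, value)


class CountingStripe:
    """Fake upstream that only records how many Stripe calls were made."""

    def __init__(self) -> None:
        self.calls = 0

    def retrieve(self, obj_id: str) -> dict:
        self.calls += 1
        return {"id": obj_id}


def cached(cache: Cache, key: str, ttl: int, now: float, fetch: Callable[[], dict]) -> dict:
    """Return an unexpired cached value, or call fetch() and store it with the given TTL."""
    hit = cache.get(key)
    if hit is not None and now < hit[0]:
        return hit[1]
    value = fetch()
    cache[key] = (now + ttl, value)
    return value


def handle_subscription_request(cache: Cache, stripe: CountingStripe, sub_id: str, now: float) -> int:
    """Resolve the subscription and the 6 prices; return the Stripe calls this request made."""
    before = stripe.calls
    cached(cache, f"stripe_sub:{sub_id}", SUB_TTL, now, lambda: stripe.retrieve(sub_id))
    for pid in PRICE_IDS:
        cached(cache, f"stripe_price:{pid}", PRICE_TTL, now, lambda pid=pid: stripe.retrieve(pid))
    return stripe.calls - before
```

With an empty cache, the first request makes 7 calls. A second request 60 seconds later makes 0. Once the subscription entry has expired but the prices are still warm (for example at t = 60 + SUB_TTL), a request makes exactly 1 call. That is the "0-1 when warm" case.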
Implementation Notes
- Use the existing get_generic_cache/set_generic_cache from database.redis_db; no new infra needed
- Redis is fail-open in this codebase (errors are caught and logged, and the request proceeds with a fresh Stripe call)
- The Stripe Subscription cache value should store the full .to_dict() result (same pattern as the price cache)
- Invalidation helper: add invalidate_stripe_sub_cache(subscription_id), called from the upgrade/cancel/webhook paths
- The reconcile_basic_plan_with_stripe function (line 796) also calls Stripe.Subscription.retrieve and can use the same cache
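The fail-open behaviour and webhook-driven invalidation described above can be sketched together as follows. Several names are assumptions for illustration, not the codebase's API: FlakyRedis is a stand-in client, and fail_open_get and on_stripe_event are hypothetical names. The choice of customer.subscription.updated and customer.subscription.deleted as triggers is also an assumption. Those are real Stripe event types whose data.object is the subscription, but the actual stripe_webhook handler may key off other events as well.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("stripe_cache")


class FlakyRedis:
    """Hypothetical stand-in for a Redis client whose calls can raise."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.data: dict = {}

    def _check(self) -> None:
        if self.fail:
            raise ConnectionError("redis unavailable")

    def get(self, key: str) -> Optional[Any]:
        self._check()
        return self.data.get(key)

    def set(self, key: str, value: Any) -> None:
        self._check()
        self.data[key] = value

    def delete(self, key: str) -> None:
        self._check()
        self.data.pop(key, None)


def fail_open_get(redis: FlakyRedis, key: str, fetch: Callable[[], Any]) -> Any:
    """Read-through cache where a Redis error never fails the request."""
    try:
        cached = redis.get(key)
        if cached is not None:
            return cached
    except Exception:
        logger.warning("cache read failed for %s; falling back to Stripe", key, exc_info=True)
    value = fetch()  # e.g. Stripe.Subscription.retrieve(sub_id).to_dict()
    try:
        redis.set(key, value)
    except Exception:
        logger.warning("cache write failed for %s", key, exc_info=True)
    return value


def invalidate_stripe_sub_cache(redis: FlakyRedis, subscription_id: str) -> None:
    """Delete stripe_sub:{id}; if Redis is down, the TTL still bounds staleness."""
    try:
        redis.delete(f"stripe_sub:{subscription_id}")
    except Exception:
        logger.warning("cache invalidation failed for %s", subscription_id, exc_info=True)


# Assumed trigger set; events whose data.object is a Subscription
SUBSCRIPTION_EVENTS = {"customer.subscription.updated", "customer.subscription.deleted"}


def on_stripe_event(redis: FlakyRedis, event: dict) -> None:
    """Hypothetical hook inside stripe_webhook: drop the cached subscription on change."""
    if event.get("type") in SUBSCRIPTION_EVENTS:
        invalidate_stripe_sub_cache(redis, event["data"]["object"]["id"])
```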
Priority
High: this directly causes 408 timeouts on the subscription endpoint in production. When the subscription data fails to load, the mobile app's subscription management card silently disappears.
Related