feat(server): CallableSubdomainTenantRouter for DB-backed tenant lookups #544
Adds an adopter-callable :class:`SubdomainTenantRouter` that takes a single sync-or-async callable mapping a normalized host to a Tenant. Framework owns host normalization (lower-case + port-strip) and optionally provides a bounded TTL cache; adopters write ~5 LOC of glue against their tenant table instead of ~25 LOC of hand-rolled routing.

Closes salesagent SDK_FEEDBACK round 2 #20. Reference impl in salesagent's core/main.py::_load_tenant_subdomain_map() collapses to a ~5-line CallableSubdomainTenantRouter instantiation.

Surface:
- CallableSubdomainTenantRouter(resolver, *, cache_size=0, cache_ttl_seconds=0.0)
- TenantResolver callable type alias

Caching is opt-in (cache_size > 0). Explicit TTL required when caching is enabled: no "cache forever" mode (production safety against stale-tenant footguns). Bounded LRU via OrderedDict (no third-party dependency). Negative results cached too (DOS-style probing can't bypass). invalidate(host=None) for adopter-driven eviction.

Memory profile: zero state without caching; with caching, bounded by cache_size entries, typically <1MB for a 1024-entry cache. Designed specifically with salesagent's slow-leak-investigation lens.

Tests: 14 new tests covering normalization, sync+async resolvers, cache TTL/bound/invalidation, validation errors, end-to-end through the existing middleware. 30 total pass in test_subdomain_tenant_router.py; 3801 framework tests pass with no regressions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
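The caching rules in this commit message fit in a small standalone class: opt-in, mandatory TTL, negative caching, LRU eviction through `OrderedDict.popitem(last=False)`, and `invalidate`. The sketch below is a hypothetical illustration, not the SDK's implementation. `BoundedTTLCache`, the `MISS` sentinel, and the injectable `clock` parameter are names invented here. The router would presumably wrap something similar around the adopter's resolver.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple

# Sentinel distinguishing "nothing cached" from a cached None (negative result).
MISS = object()


class BoundedTTLCache:
    """Hypothetical sketch of the opt-in cache rules above; not the SDK's code."""

    def __init__(
        self,
        cache_size: int = 0,
        cache_ttl_seconds: float = 0.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if cache_size < 0:
            raise ValueError("cache_size must be >= 0")
        if cache_size > 0 and cache_ttl_seconds <= 0:
            # No "cache forever" mode: caching requires an explicit TTL.
            raise ValueError("cache_ttl_seconds must be > 0 when cache_size > 0")
        self._size = cache_size
        self._ttl = cache_ttl_seconds
        self._clock = clock
        # host -> (expires_at, value); value may be None for a negative result.
        self._entries: "OrderedDict[str, Tuple[float, object]]" = OrderedDict()

    def get(self, host: str) -> object:
        """Return the cached value (possibly None), or MISS if absent or expired."""
        entry = self._entries.get(host)
        if entry is None:
            return MISS
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[host]  # expired: drop it and report a miss
            return MISS
        self._entries.move_to_end(host)  # mark as most recently used
        return value

    def put(self, host: str, value: object) -> None:
        if self._size == 0:
            return  # caching disabled: hold zero state
        self._entries[host] = (self._clock() + self._ttl, value)
        self._entries.move_to_end(host)
        while len(self._entries) > self._size:
            self._entries.popitem(last=False)  # evict the least recently used entry

    def invalidate(self, host: Optional[str] = None) -> None:
        """Drop one host, or everything when host is None."""
        if host is None:
            self._entries.clear()
        else:
            self._entries.pop(host, None)

    def __len__(self) -> int:
        return len(self._entries)
```

The sentinel matters because a cached `None` is a legitimate hit. Without it, a negative result would look like an empty slot, and every probe for an unknown host would reach the resolver again.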
Strong PR — design instincts are right (opt-in caching, explicit TTL required, negative caching, bounded LRU, no deps). A few thoughts: Worth a docstring callout (not a blocker):
Minor — not asking for a change:
Nothing here blocks merge; the design is sound and the test coverage is good.
…le in CallableSubdomainTenantRouter

Per @bokelley review on #544:
- Add an explicit docstring callout that adopters must call `invalidate(host)` on tenant *creation* too (not only deactivation). Otherwise, when a new tenant is provisioned mid-cache, cached None results cause 404s until the TTL expires.
- Update `invalidate()` param docs to mention creation alongside deactivation.
- Reframe the memory-profile claim: sizeof(Tenant) is adopter-controlled (ext field), so "well under 1 MB" is replaced with the honest `cache_size × sizeof(your_Tenant)`.
- Add a test pinning that case/port variants (Acme.localhost:3001 + acme.localhost) normalize to one cache key and hit the resolver exactly once.

https://claude.ai/code/${CLAUDE_CODE_REMOTE_SESSION_ID}
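The onboarding footgun this commit documents, and the case/port test it adds, can be reproduced with a stripped-down, sync-only stand-in. `CachingTenantLookup`, its `ttl` parameter, and the use of plain strings as tenants are assumptions made for brevity. Only the behavior follows the description above: a normalized cache key, a cached `None`, and `invalidate(host)`.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class CachingTenantLookup:
    """Hypothetical sync-only sketch: normalize, then cache hits *and* misses."""

    def __init__(self, resolver: Callable[[str], Optional[str]], ttl: float = 60.0) -> None:
        self._resolver = resolver
        self._ttl = ttl
        self._cache: Dict[str, Tuple[float, Optional[str]]] = {}

    @staticmethod
    def key(host: str) -> str:
        return host.strip().lower().partition(":")[0]  # lower-case + strip ":port"

    def resolve(self, host: str) -> Optional[str]:
        k = self.key(host)
        entry = self._cache.get(k)
        if entry is not None and time.monotonic() < entry[0]:
            return entry[1]  # cached tenant id, or a cached None
        value = self._resolver(k)
        self._cache[k] = (time.monotonic() + self._ttl, value)
        return value

    def invalidate(self, host: Optional[str] = None) -> None:
        if host is None:
            self._cache.clear()
        else:
            self._cache.pop(self.key(host), None)


# Onboarding scenario: the tenant row is created after its host was probed.
tenants: Dict[str, str] = {}
calls: List[str] = []


def lookup(host: str) -> Optional[str]:
    calls.append(host)
    return tenants.get(host)


router = CachingTenantLookup(lookup)
print(router.resolve("acme.localhost"))       # None, and the miss is now cached
tenants["acme.localhost"] = "t-acme"         # tenant provisioned mid-cache
print(router.resolve("acme.localhost"))       # still None until the TTL expires
router.invalidate("acme.localhost")          # hence: call invalidate() on creation too
print(router.resolve("Acme.localhost:3001"))  # t-acme; same key as acme.localhost
```

The resolver runs twice in total: once for the initial miss and once after `invalidate`. The mixed-case, port-suffixed host reuses the `acme.localhost` entry.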
Thanks for the thorough review — pushed a follow-up commit (
Items 3 (thundering herd) and 4 (the test suggestion already addressed above) are noted; the thundering-herd callout is a good candidate for a future issue if DB-pressure spikes show up in salesagent telemetry.

Generated by Claude Code
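The deferred thundering-herd item works like this: with no in-flight deduplication, N simultaneous cold requests for the same host each call the resolver. The sketch below contrasts that with one common mitigation, a per-key "singleflight" that lets concurrent callers share a single pending call. `SingleFlight` and `cold_burst` are hypothetical helpers written for this illustration. Neither this PR nor the router it describes implements them.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class SingleFlight:
    """Hypothetical helper: concurrent callers for one key share one in-flight call."""

    def __init__(self) -> None:
        self._inflight: Dict[str, "asyncio.Future[object]"] = {}

    async def do(self, key: str, fn: Callable[[], Awaitable[object]]) -> object:
        fut = self._inflight.get(key)
        if fut is None:
            fut = asyncio.ensure_future(fn())
            self._inflight[key] = fut
            # Forget the call once it settles so later misses start a fresh one.
            fut.add_done_callback(lambda _f, k=key: self._inflight.pop(k, None))
        # shield() keeps one cancelled caller from cancelling the shared call.
        return await asyncio.shield(fut)


async def cold_burst(concurrent: int, dedupe: bool) -> int:
    """Fire `concurrent` cold lookups for one host; return how many hit the 'DB'."""
    db_calls = 0

    async def slow_lookup() -> object:
        nonlocal db_calls
        db_calls += 1
        await asyncio.sleep(0.01)  # simulated tenant-table round-trip
        return "t-acme"

    flight = SingleFlight()

    async def request() -> object:
        if dedupe:
            return await flight.do("acme.localhost", slow_lookup)
        return await slow_lookup()

    results = await asyncio.gather(*(request() for _ in range(concurrent)))
    assert all(r == "t-acme" for r in results)
    return db_calls


print(asyncio.run(cold_burst(50, dedupe=False)))  # 50: every cold request hits the resolver
print(asyncio.run(cold_burst(50, dedupe=True)))   # 1: one shared in-flight call
```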
Thanks for the follow-up: the three addressed items (negative-cache onboarding callout, adopter-honest memory framing, and the case/port normalization test) look like solid resolutions. Deferring the thundering-herd callout to a follow-up issue is reasonable given current scope. Triaged by Claude Code.

Generated by Claude Code
…struction (#552)

Drop-in alternative to PlatformRouter that defers per-tenant DecisioningPlatform construction to first request, with a bounded LRU + TTL cache. Closes the eager-router pain point for adopters with N tenants × per-tenant SDK auth handshake (Google Ad Manager service-account, Kevel API key), where boot scales O(N).

* Async or sync factory; awaited via inspect.isawaitable, matching CallableSubdomainTenantRouter's convention (PR #544).
* Bounded cache: cache_size > 0 mandatory (default 256); cache_ttl_seconds >= 0 (default 3600.0; 0 = size-only eviction). Distinct from CallableSubdomainTenantRouter, which rejects ttl=0: there tenants go stale, here platform adapters don't (unless the factory reads mutable config, which the docstring calls out).
* invalidate(tenant_id=None) for hot-reload. Per-tenant + global generation counter snapshots prevent the invalidate-during-build race from resurrecting an evicted slot.
* Drop-in: isinstance(router, DecisioningPlatform) is true, serve() accepts it identically, and the ACCOUNT_NOT_FOUND / UNSUPPORTED_FEATURE projection matches PlatformRouter.
* No singleflight in v1: concurrent cold requests each build (locked by test_concurrent_cold_requests_each_build_v1_contract).
* proposal_managers stays an eager dict (cheap to hold).
* platform_for_tenant() async sibling-API parity for admin/health endpoints.
* Extracted _select_proposal_method module-level so PlatformRouter and LazyPlatformRouter share the same routing logic without drift.

Closes #547.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
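The generation-counter guard in the #552 commit can be illustrated with a small sketch. `LazyBuildCache` is a hypothetical name. It leaves out the LRU/TTL bounds and, like the v1 described above, does no singleflight. It only shows why snapshotting both counters before an awaited build stops an `invalidate()` that lands mid-build from republishing the stale platform.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional, Tuple


class LazyBuildCache:
    """Hypothetical sketch of generation-snapshot publishing (no LRU/TTL, no singleflight)."""

    def __init__(self, factory: Callable[[str], Awaitable[object]]) -> None:
        self._factory = factory
        self._slots: Dict[str, object] = {}
        self._key_gen: Dict[str, int] = {}  # bumped by invalidate(tenant_id)
        self._global_gen = 0                # bumped by invalidate()

    def invalidate(self, tenant_id: Optional[str] = None) -> None:
        if tenant_id is None:
            self._slots.clear()
            self._global_gen += 1
        else:
            self._slots.pop(tenant_id, None)
            self._key_gen[tenant_id] = self._key_gen.get(tenant_id, 0) + 1

    def _generation(self, tenant_id: str) -> Tuple[int, int]:
        return (self._global_gen, self._key_gen.get(tenant_id, 0))

    async def get(self, tenant_id: str) -> object:
        if tenant_id in self._slots:
            return self._slots[tenant_id]
        snapshot = self._generation(tenant_id)  # taken before the slow build
        platform = await self._factory(tenant_id)
        # Publish only if no invalidate() ran while we were building; otherwise
        # the stale build would resurrect the slot that was just evicted.
        if self._generation(tenant_id) == snapshot:
            self._slots[tenant_id] = platform
        return platform
```

In this sketch, the caller whose build raced with the invalidation still receives its (now stale) platform for that one request, and the next request rebuilds. The commit message does not say whether the real router behaves the same way.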
Summary
Adds an adopter-callable `SubdomainTenantRouter` that takes a single sync-or-async callable mapping a normalized host to a `Tenant`. Framework owns host normalization (lower-case + port-strip) and optionally provides a bounded TTL cache; adopters write ~5 LOC of glue against their tenant table instead of ~25 LOC of hand-rolled routing.

Closes salesagent SDK_FEEDBACK round-2 #20. Reference impl in salesagent's `core/main.py::_load_tenant_subdomain_map()` is now a ~5-line `CallableSubdomainTenantRouter` instantiation.

Surface
- `CallableSubdomainTenantRouter(resolver, *, cache_size=0, cache_ttl_seconds=0.0)`
- `TenantResolver` callable type alias
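Here is a sketch of the resolver contract: the adopter supplies one function, and the framework normalizes the host and awaits the result only when it is awaitable. `Tenant`'s fields, the exact `TenantResolver` alias, and the `normalize_host`/`resolve` helpers are assumptions for illustration, not the SDK's definitions.

```python
import asyncio
import inspect
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, Optional, Union


@dataclass(frozen=True)
class Tenant:
    """Hypothetical stand-in for the framework's Tenant type."""
    tenant_id: str
    subdomain: str


# Assumed shape of the TenantResolver alias: normalized host -> Tenant or None,
# returned either directly (sync) or as an awaitable (async).
TenantResolver = Callable[[str], Union[Optional[Tenant], Awaitable[Optional[Tenant]]]]


def normalize_host(host: str) -> str:
    """Lower-case and strip a ':port' suffix (IPv6 literals are out of scope here)."""
    return host.strip().lower().partition(":")[0]


async def resolve(resolver: TenantResolver, host: str) -> Optional[Tenant]:
    result = resolver(normalize_host(host))
    if inspect.isawaitable(result):  # async resolver: await its coroutine
        result = await result
    return result


# Adopter glue against a hypothetical tenant table (a dict stands in for the DB).
TENANTS: Dict[str, Tenant] = {"acme.localhost": Tenant("t-1", "acme")}


def lookup_sync(host: str) -> Optional[Tenant]:
    return TENANTS.get(host)


async def lookup_async(host: str) -> Optional[Tenant]:
    await asyncio.sleep(0)  # placeholder for an async DB query
    return TENANTS.get(host)
```

An adopter would pass a function like `lookup_sync` or `lookup_async` as the `resolver` argument of `CallableSubdomainTenantRouter`. This PR text does not give the router's import path, so that line is omitted.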
Design notes
Caching is opt-in. `cache_size=0` (default) skips the cache entirely; the resolver is called (and awaited, if async) on every request. Adopters opt in with explicit bounds.

Explicit TTL required when caching is enabled. No "cache forever" mode: the constructor raises `ValueError` when `cache_size > 0` and `cache_ttl_seconds <= 0`. Tenants come and go (suspension, deactivation); long-lived caches without TTL are a stale-tenant footgun.

Negative results cached too. `None` returns are stored alongside positive hits, so DOS-style probing for unknown hosts doesn't bypass the cache.

Bounded LRU via `OrderedDict`. No third-party dependency. Eviction on overflow is `popitem(last=False)`, oldest entry first.

Sync or async resolvers. Mirrors `MediaBuyStore`'s pattern; `inspect.isawaitable(result)` decides whether to `await`.

Memory profile
This was designed with the salesagent slow-leak investigation in mind:
- Without caching: zero state; `resolve()` calls the adopter directly.
- With caching: bounded by `cache_size` entries × (host string + frozen `Tenant` + 16-byte expiry float). For a typical 1024-entry cache that's well under 1 MB.
- No unbounded growth paths.

Tests
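14 new tests in `tests/test_subdomain_tenant_router.py`:
- Host normalization (lower-case + port-strip)
- Sync and async resolvers
- Negative caching (`None` cached)
- TTL expiry (`time.monotonic`)
- `invalidate(host)` drops a specific entry; `invalidate()` clears all
- `invalidate` is a no-op when caching is disabled
- `ValueError` on `cache_size > 0` without `cache_ttl_seconds`
- Eviction beyond `cache_size`
- End-to-end through the existing `SubdomainTenantMiddleware`

30 total tests pass in `test_subdomain_tenant_router.py`; 3801 framework tests pass with no regressions.

Test plan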
- `InMemorySubdomainTenantRouter` tests still pass
- … `# type: ignore` in this PR

🤖 Generated with Claude Code