chore(devex): cut heavy SDK imports off Django startup path#60635
Conversation
Django app-load imported a pile of vendor SDKs eagerly, slowing every process that runs django.setup() — manage.py shell, migrate, web workers, and every CI step. Two aggregators did most of the damage: the data-imports source registry (sources/__init__.py eager-loaded ~150 source modules and their SDKs to self-register) and a handful of viewsets/models pulling SDKs at module top. Make them lazy: - SourceRegistry self-populates on first use instead of at package import. Source modules move to a _load_all helper; back-compat preserved via module __getattr__ so `from ...sources import XSource` still resolves. - Defer trafilatura, matplotlib, elevenlabs, databricks/bigquery/snowflake destination tests, and stripe/anthropic into the functions that use them. - Add ruff TID253 banning module-level imports of elevenlabs/matplotlib/ trafilatura (single lazy call site each) to stop regressions. Cuts shell startup ~16s -> ~12s locally. The remaining big lever — dropping the posthog.temporal.ai preload — is blocked on an 18-module import cycle in the ee.hogai agent core (surfaced via grimp) and needs its own PR.
test_integration.py patched StripeClient/Anthropic at posthog.models.integration, but those moved to function-local imports. Patch the source modules (stripe.StripeClient, anthropic.Anthropic) instead, matching where the names are now looked up.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 036c24d86f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Prompt To Fix All With AIFix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
posthog/temporal/data_imports/sources/common/registry.py:15-25
`_loaded` is set to `True` before `load_all_sources()` finishes. If any source module raises an `ImportError` partway through `_load_all.py` (e.g., a broken optional dependency), the exception propagates to the first caller, but `_loaded` is already `True`. Every subsequent call to `_ensure_loaded` returns immediately, leaving the registry permanently incomplete for the lifetime of the process — subsequent lookups either silently return a partial source list or raise a `ValueError` for the missing source, with no indication that loading ever failed.
```suggestion
@classmethod
def _ensure_loaded(cls) -> None:
# Sources self-register via @SourceRegistry.register on import. We import them
# on first registry use rather than at package import, so a process that never
# touches the registry (most of them) doesn't pay for every vendor SDK at startup.
if cls._loaded:
return
cls._loaded = True
try:
from posthog.temporal.data_imports.sources import load_all_sources
load_all_sources()
except Exception:
cls._loaded = False
raise
```
Reviews (1): Last reviewed commit: "fix(devex): patch stripe/anthropic at so..." | Re-trigger Greptile |
PLC0415 (import-outside-top-level) is currently exempted for most dirs, but those exemptions are being removed over time. Pre-empt that: tag the lazy imports introduced for startup-perf with `# noqa: PLC0415` so this cleanup doesn't regress when the rule is turned on. The TID253-banned trio (elevenlabs/matplotlib/trafilatura) is already exempt from PLC0415 via the ban, so it needs no marker.
There was a problem hiding this comment.
Pull request overview
This PR reduces Django startup overhead by moving heavyweight vendor SDK imports off module-load paths and into first-use code paths, especially for data-import source registration and integration-related SDK clients.
Changes:
- Reworks
SourceRegistryto load source modules lazily via_load_all.py. - Moves several SDK imports (
stripe,elevenlabs,trafilatura,matplotlib, destination-test SDKs) into runtime call sites. - Adds Ruff
TID253enforcement for selected heavy module-level imports.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
pyproject.toml |
Enables TID253 and bans selected heavy module-level imports. |
products/user_interviews/backend/presentation/views.py |
Lazily initializes and caches the ElevenLabs client. |
products/business_knowledge/backend/html_parse.py |
Defers trafilatura import until HTML parsing. |
products/batch_exports/backend/api/destination_tests/__init__.py |
Defers destination test imports per destination branch. |
posthog/temporal/data_imports/sources/common/registry.py |
Adds lazy source-registry loading. |
posthog/temporal/data_imports/sources/_load_all.py |
Centralizes imports that trigger source self-registration. |
posthog/temporal/data_imports/sources/__init__.py |
Replaces eager source re-exports with lazy loading helpers and __getattr__. |
posthog/temporal/ai/anomaly_investigation/charts.py |
Defers matplotlib.pyplot import until chart rendering. |
posthog/models/integration.py |
Defers Anthropic and Stripe client imports to usage sites. |
posthog/api/test/test_integration.py |
Updates mocks to patch SDK modules directly after lazy import changes. |
posthog/api/integration.py |
Defers stripe import used for install signature verification. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…gration API Address review feedback on the startup-import cleanup: - SourceRegistry._ensure_loaded set `_loaded` before the import finished, so a failed load (broken optional dep) would cache as "loaded" permanently and a concurrent web-worker request could read a half-populated registry. Use a double-checked lock and flip `_loaded` only after load_all_sources() succeeds. - api/integration.py still imported the Anthropic exception classes at module scope, pulling the SDK onto the startup path. Defer them into the one method that catches them.
|
i'm really curious to see the impact of this, did you run some benchmarks? |
|
It's in the description, 25% reduction. Not sure how much more to benchmark. I could look at reduction in import work in the profiler report I guess.
|
| [tool.ruff.lint.flake8-tidy-imports] | ||
| # Heavy SDKs that noticeably slow startup. They must be imported lazily (inside the | ||
| # function that uses them), not at module top, so they stay off the Django app-load path. | ||
| # Only modules with a single, already-lazy call site are listed — vendor source/destination | ||
| # leaf modules legitimately import their own SDK at module level and are loaded on demand. | ||
| banned-module-level-imports = ["elevenlabs", "matplotlib", "trafilatura"] |
There was a problem hiding this comment.
Any impact to the ruff PLC0415 rule? Wondering if we need to mark explicit exceptions to that once that rule is more enforceable
There was a problem hiding this comment.
Good thinking!
I've marked location that we definitely should not un-inline with a (current superflous) noqa PLC0415 and also adjusted guidance (AGENTS.md) in a separate PR.
|
🎭 Playwright report · View test results →
These issues are not necessarily caused by your changes. |
…p-speed # Conflicts: # posthog/temporal/data_imports/sources/__init__.py
|
No dependency changes detected. Learn more about Socket for GitHub. 👍 No dependency changes detected in pull request |
The second `if cls._loaded` guard inside the lock is genuinely reachable at runtime — another thread can flip the flag while we wait on the lock. mypy narrows the flag to falsy after the first guard and can't model the concurrent mutation, so it marks the inner return unreachable. Suppress narrowly rather than restructure the lock.
Problem
Django app-load eagerly imports a pile of heavy vendor SDKs, so every process that runs
django.setup()pays for them —manage.py shell,migrate, web workers, and every CI step that boots Django. Profilingmanage.py shell -c passshowed most of the weight sits behind two eager aggregators rather than the code that actually uses the SDKs.Changes
Make the heavy imports lazy so they load on first use, not at app-load:
data_importsSourceRegistryself-populates lazily.sources/__init__.pyused to eager-import ~150 source modules so each could self-register via a decorator — dragging bingads/suds, google-ads, and friends into every startup. Sources now live in a_load_allhelper that the registry imports on first use (get_source/get_all_sources/…). Back-compat is preserved via a module-level__getattr__, sofrom ...sources import StripeSourcestill resolves.models/integration.py+api/integration.py).TID253bans module-level imports ofelevenlabs/matplotlib/trafilatura(each has a single, already-lazy call site) so this doesn't regress. Vendor source/destination leaf modules legitimately import their SDK at module level and are loaded on demand, so they're not banned.Local
manage.py shell -c passdrops from ~16s to ~12s (min of 5 runs).Out of scope / follow-up: the other big lever is dropping the
import posthog.temporal.aipreload inposthog/api/__init__.py. That preload masks an 18-module import cycle in theee.hogaiagent core (tools↔chat_agent↔agent_modes.presets↔mcp_tool, surfaced withgrimp) which has no safe import entry point. Untangling it touches the AI agent core and needs the eval suite, so it's left for a dedicated PR.How did you test this code?
Agent-authored (Claude Code). No manual UI testing. Automated tests run locally with
--reuse-db:posthog/temporal/data_imports/sources/bigquery/tests/test_source.py— 5 passed (exercisesSourceRegistry)posthog/models/test/test_integration_model.py— 106 passed (stripe/anthropic lazy paths)products/data_warehouse/backend/api/test/test_external_data_source.py— 191 passed (mainSourceRegistryconsumer)ruff check+ruff formatclean;TID253clean repo-wide;ty checkpassed via lint-stagedAlso verified via
sys.modulesafterdjango.setup()that elevenlabs, trafilatura, matplotlib, bingads, suds, and google-ads are no longer imported at startup.🤖 Agent context
Authored with Claude Code (Opus). Approach and key decisions:
temporal.aipreload — is entangled with a real circular-import knot. Usedgrimpto compute strongly-connected components and confirmed it's an 18-module SCC with no safe entry point (even preloadingAssistantGraphfirst crashes). Deliberately scoped that out rather than untangle the AI core blind; recommend an import-linter contract to drive and lock that follow-up.__getattr__back-compat for the registry over deleting the re-exports, after confirming all consumers use onlySourceRegistry(no source classes imported by name).Agent-assisted, human-reviewed — not self-merged.