chore(devex): cut heavy SDK imports off Django startup path #60635

Merged
webjunkie merged 6 commits into master from chore/django-startup-speed on Jun 1, 2026

Conversation

@webjunkie
Contributor

Problem

Django app-load eagerly imports a pile of heavy vendor SDKs, so every process that runs django.setup() pays for them — manage.py shell, migrate, web workers, and every CI step that boots Django. Profiling manage.py shell -c pass showed most of the weight sits behind two eager aggregators rather than the code that actually uses the SDKs.

Changes

Make the heavy imports lazy so they load on first use, not at app-load:

  • data_imports SourceRegistry self-populates lazily. sources/__init__.py used to eager-import ~150 source modules so each could self-register via a decorator — dragging bingads/suds, google-ads, and friends into every startup. Sources now live in a _load_all helper that the registry imports on first use (get_source/get_all_sources/…). Back-compat is preserved via a module-level __getattr__, so from ...sources import StripeSource still resolves.
  • Defer leaf SDKs into the functions that use them: trafilatura (business_knowledge), matplotlib (anomaly charts), elevenlabs (user_interviews, via a cached client factory), databricks/bigquery/snowflake (batch-export destination tests — per-branch import in the factory), and stripe/anthropic (models/integration.py + api/integration.py).
  • Ruff TID253 bans module-level imports of elevenlabs/matplotlib/trafilatura (each has a single, already-lazy call site) so this doesn't regress. Vendor source/destination leaf modules legitimately import their SDK at module level and are loaded on demand, so they're not banned.
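The back-compat re-export described in the first bullet can be sketched with a PEP 562 module-level `__getattr__`. This is a minimal illustration, not the PR's code: the `lazy_sources` module, the `_LAZY_EXPORTS` table, and the stdlib classes standing in for source classes are all assumptions made so the snippet runs on its own.

```python
import importlib
import sys
import types

# Hypothetical re-export table: public name -> module that defines it.
# Stdlib modules stand in for the real per-vendor source modules.
_LAZY_EXPORTS = {"Fraction": "fractions", "Decimal": "decimal"}

# Hypothetical package, built in memory so the sketch is self-contained.
lazy_sources = types.ModuleType("lazy_sources")


def _module_getattr(name: str):
    """PEP 562 hook: runs only when normal attribute lookup on the module fails."""
    module_name = _LAZY_EXPORTS.get(name)
    if module_name is None:
        raise AttributeError(f"module 'lazy_sources' has no attribute {name!r}")
    value = getattr(importlib.import_module(module_name), name)
    setattr(lazy_sources, name, value)  # cache so the hook is skipped next time
    return value


# In a real package this would simply be `def __getattr__(name)` in __init__.py.
lazy_sources.__getattr__ = _module_getattr
sys.modules["lazy_sources"] = lazy_sources

# The old import style still resolves; the defining module loads only now.
from lazy_sources import Fraction

half = Fraction(1, 2)
```

Names that are never requested are never imported, which is the property that keeps unused vendor SDKs off the startup path.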

Local manage.py shell -c pass drops from ~16s to ~12s (min of 5 runs).
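A "min of N runs" measurement like the one above can be reproduced with a small harness. This is a sketch under assumptions: the helper name `min_wall_time` is hypothetical, and a bare `python -c pass` stands in for the real `manage.py shell -c pass` command so the snippet runs outside the repo.

```python
import subprocess
import sys
import time


def min_wall_time(command: list[str], runs: int = 5) -> float:
    """Run `command` `runs` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True, capture_output=True)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in command so the sketch runs anywhere; in the repo this would be
# something like [sys.executable, "manage.py", "shell", "-c", "pass"].
baseline = min_wall_time([sys.executable, "-c", "pass"], runs=3)
print(f"fastest run: {baseline:.3f}s")
```

Taking the minimum rather than the mean filters out noise from disk cache misses and other processes, which matters when the effect being measured is a few seconds of import time.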

Out of scope / follow-up: the other big lever is dropping the import posthog.temporal.ai preload in posthog/api/__init__.py. That preload masks an 18-module import cycle in the ee.hogai agent core (tools → chat_agent → agent_modes.presets → mcp_tool, surfaced with grimp) which has no safe import entry point. Untangling it touches the AI agent core and needs the eval suite, so it's left for a dedicated PR.

How did you test this code?

Agent-authored (Claude Code). No manual UI testing. Automated tests run locally with --reuse-db:

  • posthog/temporal/data_imports/sources/bigquery/tests/test_source.py: 5 passed (exercises SourceRegistry)
  • posthog/models/test/test_integration_model.py: 106 passed (stripe/anthropic lazy paths)
  • products/data_warehouse/backend/api/test/test_external_data_source.py: 191 passed (main SourceRegistry consumer)
  • batch-export destination tests skip locally (need real cloud credentials)
  • ruff check + ruff format clean; TID253 clean repo-wide; ty check passed via lint-staged

Also verified via sys.modules after django.setup() that elevenlabs, trafilatura, matplotlib, bingads, suds, and google-ads are no longer imported at startup.
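The `sys.modules` check mentioned above could look roughly like the following. It is a sketch: the helper name `loaded_from` is hypothetical, and the import name `google.ads` for the google-ads distribution is an assumption. In the real check this would run right after `django.setup()`.

```python
import sys

# Import roots to look for. "google.ads" is an assumed import name for google-ads.
HEAVY_SDKS = ("elevenlabs", "trafilatura", "matplotlib", "bingads", "suds", "google.ads")


def loaded_from(candidates=HEAVY_SDKS) -> list[str]:
    """Return the candidate roots that have at least one module in sys.modules."""
    imported = list(sys.modules)  # snapshot, in case imports happen while we scan
    loaded = []
    for root in candidates:
        if any(name == root or name.startswith(root + ".") for name in imported):
            loaded.append(root)
    return sorted(loaded)


# Sanity check with one module that is certainly loaded and one that cannot be.
sanity = loaded_from(("sys", "no_such_module_for_this_sketch"))
print(sanity)
```

An empty result from `loaded_from()` after startup is the signal that none of the listed SDKs were pulled in eagerly.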

🤖 Agent context

Authored with Claude Code (Opus). Approach and key decisions:

  • Started from a pyinstrument profile of shell startup, cross-checked against Django startup best practices (Adam Johnson, the BeyondPricing lazy-imports writeup, ruff TID253).
  • Found the headline win — removing the temporal.ai preload — is entangled with a real circular-import knot. Used grimp to compute strongly-connected components and confirmed it's an 18-module SCC with no safe entry point (even preloading AssistantGraph first crashes). Deliberately scoped that out rather than untangle the AI core blind; recommend an import-linter contract to drive and lock that follow-up.
  • Discovered most SDK weight hides behind two aggregators (the source registry and the AI preload), so per-viewset lazy imports only pay off once the aggregator is also addressed — hence prioritizing the registry rewrite.
  • Chose lazy self-population + __getattr__ back-compat for the registry over deleting the re-exports, after confirming all consumers use only SourceRegistry (no source classes imported by name).
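The strongly-connected-component analysis mentioned above was done with grimp. The sketch below does not use grimp's API; it shows the underlying idea with Kosaraju's algorithm on a toy import graph whose module names (`pkg.a`, `pkg.entry`, and so on) are hypothetical.

```python
from collections import defaultdict


def strongly_connected_components(graph: dict[str, set[str]]) -> list[set[str]]:
    """Kosaraju's algorithm over an import graph given as module -> imported modules."""
    nodes = set(graph) | {dep for deps in graph.values() for dep in deps}
    order: list[str] = []
    seen: set[str] = set()

    def visit(node: str) -> None:
        # First pass: record nodes in order of DFS completion.
        seen.add(node)
        for dep in graph.get(node, ()):
            if dep not in seen:
                visit(dep)
        order.append(node)

    for node in sorted(nodes):
        if node not in seen:
            visit(node)

    reversed_graph: dict[str, set[str]] = defaultdict(set)
    for src, deps in graph.items():
        for dst in deps:
            reversed_graph[dst].add(src)

    assigned: set[str] = set()

    def collect(node: str, component: set[str]) -> None:
        # Second pass: walk the reversed graph to gather one component.
        assigned.add(node)
        component.add(node)
        for importer in reversed_graph[node]:
            if importer not in assigned:
                collect(importer, component)

    components = []
    for node in reversed(order):
        if node not in assigned:
            component: set[str] = set()
            collect(node, component)
            components.append(component)
    return components


# Toy graph: a -> b -> c -> a form a cycle; entry imports into it; leaf is isolated.
toy_imports = {
    "pkg.a": {"pkg.b"},
    "pkg.b": {"pkg.c"},
    "pkg.c": {"pkg.a"},
    "pkg.entry": {"pkg.a"},
    "pkg.leaf": set(),
}
cycles = [c for c in strongly_connected_components(toy_imports) if len(c) > 1]
```

Any component with more than one module is an import cycle: every module in it transitively imports every other, so no single module can be imported first without touching the rest.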

Agent-assisted, human-reviewed — not self-merged.

webjunkie added 2 commits May 29, 2026 11:55
Django app-load imported a pile of vendor SDKs eagerly, slowing every
process that runs django.setup() — manage.py shell, migrate, web workers,
and every CI step. Two aggregators did most of the damage: the data-imports
source registry (sources/__init__.py eager-loaded ~150 source modules and
their SDKs to self-register) and a handful of viewsets/models pulling SDKs
at module top.

Make them lazy:
- SourceRegistry self-populates on first use instead of at package import.
  Source modules move to a _load_all helper; back-compat preserved via
  module __getattr__ so `from ...sources import XSource` still resolves.
- Defer trafilatura, matplotlib, elevenlabs, databricks/bigquery/snowflake
  destination tests, and stripe/anthropic into the functions that use them.
- Add ruff TID253 banning module-level imports of elevenlabs/matplotlib/
  trafilatura (single lazy call site each) to stop regressions.

Cuts shell startup ~16s -> ~12s locally. The remaining big lever — dropping
the posthog.temporal.ai preload — is blocked on an 18-module import cycle in
the ee.hogai agent core (surfaced via grimp) and needs its own PR.

test_integration.py patched StripeClient/Anthropic at
posthog.models.integration, but those moved to function-local imports.
Patch the source modules (stripe.StripeClient, anthropic.Anthropic) instead,
matching where the names are now looked up.
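Why the patch target has to move can be shown with a small sketch. The function `serialize` is hypothetical and stdlib `json` stands in for the vendor SDK; the point is only that a function-local import resolves the name on the defining module at call time.

```python
from unittest.mock import patch


def serialize(payload: dict) -> str:
    # Function-local import: `dumps` is looked up on the json module at call time,
    # so this file has no module-level `dumps` name for a test to patch.
    from json import dumps

    return dumps(payload)


# Patch the defining module, mirroring the move to stripe.StripeClient
# and anthropic.Anthropic as patch targets.
with patch("json.dumps", return_value="patched") as fake_dumps:
    patched_result = serialize({"a": 1})

# Outside the context manager the real function is back.
real_result = serialize({"a": 1})
```

With a module-level `from json import dumps`, the same patch would have no effect on `serialize`, because the name would already be bound in the importing module.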
@webjunkie webjunkie marked this pull request as ready for review May 29, 2026 11:14
Copilot AI review requested due to automatic review settings May 29, 2026 11:14
@assign-reviewers-posthog assign-reviewers-posthog Bot requested review from a team, MattBro, fercgomes and rafaeelaudibert and removed request for a team May 29, 2026 11:14
@webjunkie webjunkie removed request for a team May 29, 2026 11:15

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 036c24d86f

Comment thread posthog/temporal/data_imports/sources/common/registry.py Outdated
@greptile-apps
Contributor

greptile-apps Bot commented May 29, 2026

Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
posthog/temporal/data_imports/sources/common/registry.py:15-25
`_loaded` is set to `True` before `load_all_sources()` finishes. If any source module raises an `ImportError` partway through `_load_all.py` (e.g., a broken optional dependency), the exception propagates to the first caller, but `_loaded` is already `True`. Every subsequent call to `_ensure_loaded` returns immediately, leaving the registry permanently incomplete for the lifetime of the process — subsequent lookups either silently return a partial source list or raise a `ValueError` for the missing source, with no indication that loading ever failed.

```suggestion
    @classmethod
    def _ensure_loaded(cls) -> None:
        # Sources self-register via @SourceRegistry.register on import. We import them
        # on first registry use rather than at package import, so a process that never
        # touches the registry (most of them) doesn't pay for every vendor SDK at startup.
        if cls._loaded:
            return
        cls._loaded = True
        try:
            from posthog.temporal.data_imports.sources import load_all_sources

            load_all_sources()
        except Exception:
            cls._loaded = False
            raise
```

Reviews (1): Last reviewed commit: "fix(devex): patch stripe/anthropic at so..." | Re-trigger Greptile

Comment thread posthog/temporal/data_imports/sources/common/registry.py Outdated
PLC0415 (import-outside-top-level) is currently exempted for most dirs, but
those exemptions are being removed over time. Pre-empt that: tag the lazy
imports introduced for startup-perf with `# noqa: PLC0415` so this cleanup
doesn't regress when the rule is turned on. The TID253-banned trio
(elevenlabs/matplotlib/trafilatura) is already exempt from PLC0415 via the
ban, so it needs no marker.
Contributor

Copilot AI left a comment

Pull request overview

This PR reduces Django startup overhead by moving heavyweight vendor SDK imports off module-load paths and into first-use code paths, especially for data-import source registration and integration-related SDK clients.

Changes:

  • Reworks SourceRegistry to load source modules lazily via _load_all.py.
  • Moves several SDK imports (stripe, elevenlabs, trafilatura, matplotlib, destination-test SDKs) into runtime call sites.
  • Adds Ruff TID253 enforcement for selected heavy module-level imports.
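Moving an SDK import into its call site, combined with a cached client factory, can be sketched as follows. The names `get_client` and `handle_request` are hypothetical, and stdlib `decimal` stands in for a heavy vendor SDK so the snippet runs without extra dependencies.

```python
import functools


@functools.cache
def get_client():
    """Build the client on first call; later calls reuse the cached instance."""
    # Deferred import: stdlib `decimal` stands in for a heavy vendor SDK.
    import decimal

    return decimal.Context(prec=10)  # hypothetical stand-in for an SDK client


def handle_request(value: str) -> str:
    # Only code paths that actually need the client trigger the import.
    return str(get_client().create_decimal(value))


first_client = get_client()
second_client = get_client()
answer = handle_request("1.5")
```

The cache keeps the deferred import from turning into per-request construction cost: the import and the client setup both happen once, on first use.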

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| pyproject.toml | Enables TID253 and bans selected heavy module-level imports. |
| products/user_interviews/backend/presentation/views.py | Lazily initializes and caches the ElevenLabs client. |
| products/business_knowledge/backend/html_parse.py | Defers trafilatura import until HTML parsing. |
| products/batch_exports/backend/api/destination_tests/__init__.py | Defers destination test imports per destination branch. |
| posthog/temporal/data_imports/sources/common/registry.py | Adds lazy source-registry loading. |
| posthog/temporal/data_imports/sources/_load_all.py | Centralizes imports that trigger source self-registration. |
| posthog/temporal/data_imports/sources/__init__.py | Replaces eager source re-exports with lazy loading helpers and __getattr__. |
| posthog/temporal/ai/anomaly_investigation/charts.py | Defers matplotlib.pyplot import until chart rendering. |
| posthog/models/integration.py | Defers Anthropic and Stripe client imports to usage sites. |
| posthog/api/test/test_integration.py | Updates mocks to patch SDK modules directly after lazy import changes. |
| posthog/api/integration.py | Defers stripe import used for install signature verification. |

Comment thread posthog/temporal/data_imports/sources/common/registry.py Outdated
Comment thread posthog/api/integration.py Outdated
…gration API

Address review feedback on the startup-import cleanup:

- SourceRegistry._ensure_loaded set `_loaded` before the import finished, so a
  failed load (broken optional dep) would cache as "loaded" permanently and a
  concurrent web-worker request could read a half-populated registry. Use a
  double-checked lock and flip `_loaded` only after load_all_sources() succeeds.
- api/integration.py still imported the Anthropic exception classes at module
  scope, pulling the SDK onto the startup path. Defer them into the one method
  that catches them.
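The double-checked lock described in this commit message can be sketched as below. It is an illustration under assumptions, not the merged code: `LazyRegistry`, its injected `loader`, and `flaky_loader` are hypothetical, and the real registry uses class-level state rather than instances.

```python
import threading
from typing import Callable


class LazyRegistry:
    """Hypothetical stand-in for a registry that populates itself on first use."""

    def __init__(self, loader: Callable[["LazyRegistry"], None]) -> None:
        self._loader = loader
        self._lock = threading.Lock()
        self._loaded = False
        self._sources: dict[str, str] = {}

    def register(self, name: str, source: str) -> None:
        self._sources[name] = source

    def _ensure_loaded(self) -> None:
        if self._loaded:  # fast path, no lock once populated
            return
        with self._lock:
            if self._loaded:  # another thread may have finished while we waited
                return
            self._loader(self)
            self._loaded = True  # flipped only after the load succeeded

    def get_source(self, name: str) -> str:
        self._ensure_loaded()
        return self._sources[name]


attempts = []


def flaky_loader(registry: LazyRegistry) -> None:
    # Fails once to mimic a broken optional dependency, then succeeds.
    attempts.append(1)
    if len(attempts) == 1:
        raise ImportError("broken optional dependency")
    registry.register("stripe", "StripeSource")


registry = LazyRegistry(flaky_loader)
try:
    registry.get_source("stripe")
except ImportError:
    loaded_after_failure = registry._loaded

source = registry.get_source("stripe")  # retries the load and succeeds
registry.get_source("stripe")  # served from the populated registry
```

Because the flag stays false after a failed load, the next caller retries instead of reading a permanently incomplete registry, and readers on the fast path never see the flag set before the sources are registered.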
Contributor

i'm really curious to see the impact of this, did you run some benchmarks?

@webjunkie
Contributor Author

It's in the description, 25% reduction. Not sure how much more to benchmark. I could look at reduction in import work in the profiler report I guess.

Local manage.py shell -c pass drops from ~16s to ~12s (min of 5 runs).

Comment thread pyproject.toml
Comment on lines +374 to +379
[tool.ruff.lint.flake8-tidy-imports]
# Heavy SDKs that noticeably slow startup. They must be imported lazily (inside the
# function that uses them), not at module top, so they stay off the Django app-load path.
# Only modules with a single, already-lazy call site are listed — vendor source/destination
# leaf modules legitimately import their own SDK at module level and are loaded on demand.
banned-module-level-imports = ["elevenlabs", "matplotlib", "trafilatura"]
Member

Any impact to the ruff PLC0415 rule? Wondering if we need to mark explicit exceptions to that once that rule is more enforceable

Contributor Author

Good thinking!
I've marked the locations that we definitely should not un-inline with a (currently superfluous) noqa PLC0415 and also adjusted the guidance (AGENTS.md) in a separate PR.

@github-actions
Contributor

github-actions Bot commented May 29, 2026

🎭 Playwright report · View test results →

⚠️ 3 flaky tests:

  • Inline editing insight title via compact card popover (chromium)
  • Change date range and toggle comparison (chromium)
  • launch basic survey from multivariant feature flag (chromium)

These issues are not necessarily caused by your changes.
Annoyed by this comment? Help fix flakies and failures and it'll disappear!

…p-speed

# Conflicts:
#	posthog/temporal/data_imports/sources/__init__.py
@socket-security

socket-security Bot commented Jun 1, 2026

No dependency changes detected. Learn more about Socket for GitHub.

👍 No dependency changes detected in pull request

The second `if cls._loaded` guard inside the lock is genuinely
reachable at runtime — another thread can flip the flag while we wait
on the lock. mypy narrows the flag to falsy after the first guard and
can't model the concurrent mutation, so it marks the inner return
unreachable. Suppress narrowly rather than restructure the lock.
@webjunkie webjunkie enabled auto-merge (squash) June 1, 2026 06:36
@webjunkie webjunkie merged commit c7fcc60 into master Jun 1, 2026
205 of 207 checks passed
@webjunkie webjunkie deleted the chore/django-startup-speed branch June 1, 2026 06:58
@deployment-status-posthog

deployment-status-posthog Bot commented Jun 1, 2026

Deploy status

| Environment | Status | Deployed At | Workflow |
| --- | --- | --- | --- |
| dev | ✅ Deployed | 2026-06-01 07:20 UTC | Run |
| prod-us | ✅ Deployed | 2026-06-01 07:35 UTC | Run |
| prod-eu | ✅ Deployed | 2026-06-01 07:37 UTC | Run |

4 participants