perf(api): lazy DRF router, defer heavy imports off django.setup()#60849
perf(api): lazy DRF router, defer heavy imports off django.setup()#60849webjunkie wants to merge 47 commits into
Conversation
Introduce RouterRegistry in posthog/api/routing.py: named, addressable handles onto the shared nested DRF routers (projects/environments/ organizations) that feed the single OpenAPI schema. Products can now register their endpoints from their own folder via a register_routes(routers) function instead of editing the central posthog/api/__init__.py. Migrate the surveys product as the pilot: its route now lives in products/surveys/backend/routes.py. The shared parents are registered into the registry while remaining module globals, so the rest of the file is unchanged. https://claude.ai/code/session_01JtM7FdS7NYzUDqYQ4G8AbN
…ort cycle Make ee/hogai/tools/__init__ and ee/hogai/chat_agent/__init__ resolve their exports lazily (PEP 562 __getattr__) instead of eager re-exports, and import SlashCommandName from its leaf module. The MCP tool registry self-loads via load_all_tools() so registration is preserved without the eager package import. Breaks the tools <-> chat_agent import cycle that was only held together by the temporal.ai startup preload — prerequisite for letting hogai load lazily once the API router stops force-loading it.
Move the route aggregator (160 router.register calls + ~200 product view imports) out of posthog/api/__init__.py into posthog/api/rest_router.py. __init__ is now a thin shim that imports rest_router lazily via module __getattr__, only when a router object (router, projects_router, ...) is first accessed — which happens when the URLconf resolves, i.e. on the first web request. Django already loads the URLconf (and thus the router + every view it imports) lazily; building the router at package-import time defeated that, so every django.setup() (shell, migrate, celery, CI) paid ~6s to build routes and drag every product's views (and the AI core) onto the startup path. Importing a posthog.api submodule no longer triggers the build. Builds on the RouterRegistry seam from the product-local-routes pilot. Drops the temporal.ai preload — the lazy ee.hogai package facades make the import order safe without it. django.setup() wall-clock ~11.8s -> ~9.3s (min of 5).
The slack_app AppConfig.ready() imports signals.py to register Integration signal receivers. signals.py imported _invalidate_repo_list_cache from backend/api.py at module scope, and api.py imports posthog.temporal.ai workflows — which pull the entire ee.hogai chat-agent core. So every django.setup() (shell, migrate, celery, CI) dragged the AI agent in. Defer the import into the receiver body; it only fires when a GitHub integration changes. Drops ee.hogai modules loaded at setup from ~270 to ~30 and posthog.temporal.ai no longer loads at startup.
table.py is a Django models module (loaded at app population), so its module-level 'import chdb' pulled embedded ClickHouse into every django.setup(). chdb is only used in get_columns/get_count — defer it there. Add posthog/test/test_startup_import_budget.py as the regression guard: it boots a bare django.setup() in a clean subprocess and asserts the heavy subsystems deliberately kept off the startup path (the lazy API router, posthog.temporal.ai, the ee.hogai chat-agent core, chdb) are not imported. When it fails, defer the offending import rather than widening the budget.
… path Two 'misplaced shared leaf' relocations that grimp's nominate_cycle_breakers pointed at, both off the django.setup() path: - Move tasks exceptions out of the eager-init temporal package: products/tasks/backend/temporal/exceptions.py -> products/tasks/backend/exceptions.py. The sandbox services import these exceptions; living under temporal/ meant importing them tripped temporal/__init__'s eager workflow load, forming a services<->temporal cycle. - Extract generic temporal metric helpers (get_metric_meter, ExecutionTimeRecorder) to posthog/temporal/common/metrics.py. session-replay delete-recordings imported them from ai_observability.metrics, which dragged the whole ai_observability worker-registration package (clustering -> scipy, run_evaluation -> google-genai) onto setup. ai_observability.metrics now re-exports from common. Together these take ai_observability + scipy off the startup path. django.setup() wall-clock ~8.0s -> ~6.1s (baseline ~11.8s). Extends the startup-import budget guard to cover ai_observability and scipy.
…ations posthog/api/file_system/registrations.py runs from AppConfig.ready() to register core file-system types. Its module-level import of log_playlist_activity pulled session_recording_playlist_api -> session_recording_api -> the session_summary temporal workflow -> google-genai, all onto every process's startup path. log_playlist_activity is only called from the playlist post-delete/restore hooks, so defer it there. Takes the whole session-recording API subtree (and the Gemini SDK) off startup. django.setup() wall-clock ~6.1s -> ~5.1s (baseline ~11.8s). Budget guard extended to session_recording_api + google.genai.
demo_create_data and demo_reset_master_team are eager-imported by posthog/tasks/__init__ (celery autoimport) and imported the demo matrix at module scope, which pulls mimesis (a fake-data generator) onto every process's startup path. The matrix is only needed when the task runs; defer it into the task bodies. Takes mimesis off startup; budget guard extended.
The email task (eager-imported by posthog/tasks/__init__) imported a single constant from posthog.api.two_factor_reset, a DRF viewset module (~120ms of import). Defer it into the one function that uses it.
Move emit_signal from products/signals/backend/api.py to products/signals/backend/facade/api.py so cross-boundary callers (endpoints product, core temporal workers, scout harness) go through a real facade per product-isolation guidelines. The temporal workflow imports emit_signal needs are deferred into the function body. At module scope they formed a latent circular import (api -> temporal package -> reingestion -> api) that only resolved by import-order luck and dragged the temporal workflow stack onto the Django startup path. The facade now imports nothing temporal at module scope, so importing it from the API/URL hot path is cheap and safe. Pure rename + deferred imports; no behavior change.
…oved symbols CI fallout from the startup-import refactor: - posthog/api/__getattr__ now resolves real submodules directly (cheap, keeps setup lazy) and delegates everything else to rest_router, so re-exported names like `posthog.api.feature_flag` (which come from products.feature_flags, not a submodule) work again. The bare-setup budget guard still passes — the aggregator only builds on first access to such a name. - ai_observability ExecutionTimeRecorder tests patch the helper at its new home (posthog.temporal.common.metrics.get_metric_meter). - test_table chdb test patches chdb.query directly (chdb is function-local now). - ee/hogai/eval/data_setup imports HedgeboxMatrix from its real source, not the demo_create_data re-export that's now deferred.
… research cycle More CI fallout after merging master: - Remove services/agent-cli/ — local scratch accidentally committed via a git add -A; it's a new 'service' dir not covered by ci-security.yaml, failing lint-workflows. - notebook.py: re-sort imports (master added a query_validation import next to the relocated tasks.exceptions import; the merge left the block unsorted -> ruff I001). - report_generation/research.py: importing temporal.types ran the signals temporal package __init__ (agentic -> report -> back here), a circular import surfaced by the reorder. SignalData is annotation-only (module uses __future__ annotations) so move it to TYPE_CHECKING; import the one runtime helper locally.
|
⏭️ Skipped snapshot commit because branch advanced to The new commit will trigger its own snapshot update workflow. If you expected this workflow to succeed: This can happen due to concurrent commits. To get a fresh workflow run, either:
|
Query snapshots: Backend query snapshots updatedChanges: 2 snapshots (2 modified, 0 added, 0 deleted) What this means:
Next steps:
|
Module-level import of render_signals_to_text from the temporal types pulled the signals workflow chain at import time, re-introducing the cycle the research facade fix already broke. Move SignalData to TYPE_CHECKING and defer the render import into the call site.
With the API route aggregator moved off the startup path (lazy rest_router), 7 models stopped registering at django.setup(): they were only reachable through a viewset import the aggregator used to pull eagerly. A model registers as a side effect of importing its class, so once nothing imported them at app-population they vanished from apps.get_models() — breaking the django-stubs mypy plugin (36 'has no attribute objects'/field errors) and risking makemigrations/admin. Import the 7 models from their app's models/__init__ (where Django expects model classes to live): PersistedFolder, SessionRecordingExternalReference (posthog), ExplicitTeamMembership, FeatureFlagRoleAccess, LLMTraceSummary (ee), WebExperiment (experiments), HogFlowSchedule (workflows). Add a startup-budget guard asserting importing the router registers zero new models, so any future misplaced model fails CI at the source instead of silently dropping out of the registry. Load time is unchanged (~4.8s): these modules are light and loaded at setup on master too, via the old eager aggregator.
…odule Semantic merge conflict: this branch moved temporal/exceptions.py to backend/exceptions.py (off the eager temporal startup path), while master's sandbox-credential-refresh change added a new importer of the old path. Both sides merged textually clean but left a dangling import. Point it at the new location. The startup-budget model guard caught this before CI.
# Conflicts: # products/slack_app/backend/signals.py
…ready() A master-side change made dashboards.api.dashboard pull the error_tracking query-runner chain (-> experiments bayesian stats -> scipy). early_access's ready() imported the api module just for _set_enrollment_filters, dragging that whole chain — and scipy — onto every process's startup. The helper is only used in the pre-delete hook, so import it there. Caught by the startup-budget guard after merging master.
…tup() The lazy API router connects model-signal receivers only when the router is built, because importing a viewset module runs its @receiver decorators as a side effect. Processes that never build the router (celery, temporal, migrate, shell) silently lose them — feature-flag cache invalidation and alert cleanup among them. Add a guard (counterpart to the model-registration guard) asserting the router connects zero new receivers; everything must wire at setup, canonically from AppConfig.ready(). Marked xfail(strict) on this branch since the lazy-router work regresses it — wiring the receivers into AppConfig.ready() makes it xpass, which strict turns into a failure, forcing the marker's removal.
The lazy API router stopped importing ~200 viewset modules at django.setup(), which also stopped running their @receiver decorators as an import side effect. 29 receivers across 10 modules then connected only when the router was first built — so any process that never builds it (celery, temporal, migrate, shell) silently dropped them: feature-flag cache invalidation and alert cleanup would not fire on background writes, serving stale flags. Wire them from the owning app's AppConfig.ready() so they connect at app-population regardless of the router. batch_imports and alert pulled heavy viewset trees at module scope, so their one heavy import is deferred into the method that uses it — both a correctness fix and a small startup win. Flips the receiver guard from xfail to passing; drop the marker.
# Conflicts: # products/tasks/backend/services/modal_sandbox.py
Resolve conflicts from master's product migrations against the lazy API router:
- annotation, batch_imports, and exports moved from posthog/api into the products
tree (products/{annotations,batch_exports,exports}); deployments product removed.
Ported these import + route-registration deltas into posthog/api/rest_router.py
(the lazy aggregator) and added master's new desktop file-system routes.
- Kept posthog/api/__init__.py as the lazy shim (master kept editing the old eager
aggregator, which now lives in rest_router.py).
- Re-wired the annotation + batch_imports signal receivers: they no longer live under
posthog/api, so connect them from each product's AppConfig.ready() instead of
posthog/apps.py. batch_imports' one heavy import (the batch-export Temporal
framework via resolve_and_validate_url) is re-deferred so wiring it at ready()
stays cheap and off the startup path.
- Regenerated test_process_scheduled_changes snapshots (query sequence renumbered;
content unchanged, experiment columns intact, makemigrations clean).
Wiring the Vercel sync receivers at EnterpriseConfig.ready() imports ee.vercel.integration, which imported BillingManager at module scope -> the agentic-provisioning signature module -> the stripe SDK. So reconnecting that receiver under the lazy router dragged the whole billing stack (stripe, requests, jwt) back onto django.setup() that the lazy router had removed (~1s on every process: celery, temporal, migrate, shell, CI). Defer the BillingManager import into the two call sites that use it, and defer 'import stripe' in signature.py into the verify function (its only user). Relocate the test patch targets from ee.vercel.integration.BillingManager to its definition module ee.billing.billing_manager, since the name is no longer bound at integration import time. Same pattern as the slack_app receiver fix.
👀 Auto-assigned reviewersThese soft owners were skipped because their changes are minor, or the reviewer list was getting long. Nothing blocks merge, so self-assign if you'd like a look:
Soft owners come from |
Deferring the two_factor_reset import out of module scope removed the only import separating the third-party posthoganalytics group from the first-party posthog.* group, so `ruff check .` flagged I001 (un-sorted import block) repo-wide — lint-staged only sorts the files in a given commit, so it slipped through. Re-add the group separator.
There was a problem hiding this comment.
💡 Codex Review
Moving the exception definitions to products.tasks.backend.exceptions leaves at least one remaining import of the old module path (products/tasks/backend/temporal/process_task/activities/tests/test_refresh_sandbox_credentials.py:7), and products/tasks/backend/temporal/exceptions.py no longer exists, so that test module fails at import time with ModuleNotFoundError before it can run. Either update the remaining import or leave a small compatibility shim at the old path so existing imports continue to resolve.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
This PR reduces django.setup() startup cost by moving heavy imports (DRF route aggregation, AI/Temporal paths, vendor SDKs, embedded ClickHouse, etc.) off the app-population path and into lazy/first-use import points, while preserving correctness by explicitly wiring signal receivers in AppConfig.ready() and adding a regression guard test.
Changes:
- Made the DRF API router build lazy by moving the large route aggregator into
posthog/api/rest_router.pyand turningposthog/api/__init__.pyinto a PEP 562__getattr__shim. - Deferred multiple heavy imports to runtime call sites (e.g.
chdb,stripe,dagster,user_agents, revenue-analytics orchestrator), and broke/unmasked cycles by relocating imports. - Added a subprocess-based startup import budget guard plus supporting docs/skill to prevent regressions; moved/wired receivers into
AppConfig.ready()to avoid silent receiver loss with a lazy router.
Reviewed changes
Copilot reviewed 77 out of 78 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| products/workflows/backend/models/init.py | Ensure model registers at app population |
| products/warehouse_sources/backend/models/table.py | Defer chdb import to call sites |
| products/tasks/backend/temporal/process_task/activities/tests/test_get_task_processing_context.py | Update exception import path |
| products/tasks/backend/temporal/process_task/activities/tests/test_get_pr_context.py | Update exception import path |
| products/tasks/backend/temporal/process_task/activities/tests/test_execute_task_in_sandbox.py | Update exception import path |
| products/tasks/backend/temporal/process_task/activities/tests/test_cleanup_sandbox.py | Update exception import path |
| products/tasks/backend/temporal/process_task/activities/start_agent_server.py | Update exception import path |
| products/tasks/backend/temporal/process_task/activities/refresh_sandbox_credentials.py | Update exception import path |
| products/tasks/backend/temporal/process_task/activities/provision_sandbox.py | Update exception import path |
| products/tasks/backend/temporal/process_task/activities/get_task_processing_context.py | Update exception import path |
| products/tasks/backend/temporal/process_task/activities/get_sandbox_for_repository.py | Update exception import path |
| products/tasks/backend/temporal/process_task/activities/get_pr_context.py | Update exception import path |
| products/tasks/backend/temporal/process_task/activities/execute_task_in_sandbox.py | Update exception import path |
| products/tasks/backend/temporal/process_task/activities/create_sandbox_from_snapshot.py | Update exception import path + reorder imports |
| products/tasks/backend/temporal/process_task/activities/cleanup_sandbox.py | Update exception import path |
| products/tasks/backend/temporal/oauth.py | Update exception import path |
| products/tasks/backend/temporal/create_snapshot/activities/tests/test_setup_repository.py | Update exception import path |
| products/tasks/backend/temporal/create_snapshot/activities/tests/test_create_snapshot.py | Update exception import path |
| products/tasks/backend/temporal/create_snapshot/activities/tests/test_clone_repository.py | Update exception import path |
| products/tasks/backend/temporal/create_snapshot/activities/setup_repository.py | Update exception import path |
| products/tasks/backend/temporal/create_snapshot/activities/get_snapshot_context.py | Update exception import path |
| products/tasks/backend/temporal/create_snapshot/activities/clone_repository.py | Update exception import path |
| products/tasks/backend/services/tests/test_modal_sandbox.py | Update exception import path |
| products/tasks/backend/services/test_docker_sandbox.py | Update exception import path |
| products/tasks/backend/services/modal_sandbox.py | Move exceptions to shared module |
| products/tasks/backend/services/docker_sandbox.py | Move exceptions to shared module |
| products/tasks/backend/exceptions.py | New shared task/temporal exception types |
| products/surveys/backend/routes.py | Add product-local API route registration |
| products/slack_app/backend/signals.py | Defer heavy API import inside receiver |
| products/signals/backend/report_generation/select_repo.py | Defer temporal.types import to avoid cycle |
| products/signals/backend/report_generation/research.py | Defer temporal.types import to avoid cycle |
| products/notebooks/backend/api/notebook.py | Update exception import path |
| products/feature_flags/backend/apps.py | Wire receivers in ready() |
| products/feature_flags/backend/api/test/snapshots/test_organization_feature_flag.ambr | Update snapshot output |
| products/experiments/backend/models/init.py | Ensure model registers at app population |
| products/early_access_features/backend/apps.py | Defer heavy import inside hook |
| products/data_warehouse/backend/models/test/test_table.py | Update chdb patch target |
| products/batch_exports/backend/apps.py | Wire receivers in ready() |
| products/batch_exports/backend/api/batch_imports.py | Defer heavy import inside validator |
| products/annotations/backend/apps.py | Wire receivers in ready() |
| products/alerts/backend/apps.py | Wire receivers in ready() |
| products/alerts/backend/api/alert.py | Defer insight serializer import |
| posthog/utils.py | Defer user_agents import |
| posthog/test/test_startup_import_budget.py | New startup import/model/receiver guard tests |
| posthog/temporal/session_replay/delete_recordings/metrics.py | Use common metrics helpers |
| posthog/temporal/data_imports/pipelines/pipeline/typings.py | Make dlt type-only + runtime placeholder |
| posthog/temporal/common/metrics.py | New shared Temporal metrics helpers |
| posthog/temporal/ai/posthog_code_slack_mention.py | Break circular import via lazy import |
| posthog/temporal/ai_observability/test_metrics.py | Patch new metrics import path |
| posthog/temporal/ai_observability/metrics.py | Re-export common metrics helpers |
| posthog/tasks/test/snapshots/test_process_scheduled_changes.ambr | Update snapshot output |
| posthog/tasks/email.py | Defer DRF constant import |
| posthog/tasks/demo_reset_master_team.py | Defer demo imports |
| posthog/tasks/demo_create_data.py | Defer demo imports |
| posthog/models/init.py | Ensure models register at app population |
| posthog/hogql/database/database.py | Defer revenue analytics orchestrator import |
| posthog/clickhouse/cluster.py | Lazy dagster logger to avoid import at setup |
| posthog/apps.py | Wire core receivers in ready() |
| posthog/api/test/test_routing.py | Add RouterRegistry tests |
| posthog/api/routing.py | Introduce RouterRegistry |
| posthog/api/rest_router.py | New heavy router aggregator module |
| posthog/api/file_system/registrations.py | Defer session replay import |
| posthog/api/init.py | Lazy shim via __getattr__ |
| ee/vercel/test/test_integration.py | Patch BillingManager import location |
| ee/vercel/integration.py | Defer BillingManager import |
| ee/models/init.py | Ensure EE models register at app population |
| ee/hogai/tools/init.py | Lazy tool exports + explicit load hook |
| ee/hogai/mcp_tool.py | Ensure tools register via load_all_tools() |
| ee/hogai/eval/data_setup.py | Fix HedgeboxMatrix import |
| ee/hogai/chat_agent/slash_commands/commands/remember/command.py | Import path adjustment for const |
| ee/hogai/chat_agent/slash_commands/commands/feedback/command.py | Import path adjustment for const |
| ee/hogai/chat_agent/memory/nodes.py | Import path adjustment for const |
| ee/hogai/chat_agent/init.py | Lazy AssistantGraph export |
| ee/apps.py | Wire Vercel receivers in ready() |
| ee/api/vercel/test/test_vercel_permission.py | Patch BillingManager import location |
| ee/api/agentic_provisioning/signature.py | Defer stripe import |
| docs/internal/django-startup-time.md | New internal doc for startup performance |
| .agents/skills/django-startup-time/SKILL.md | New agent skill mirroring the doc |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Prompt To Fix All With AIFix the following 3 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 3
products/batch_exports/backend/apps.py:7-15
**Missing `batch_export` receiver – audit log regression in temporal workers**
`products/batch_exports/backend/apps.py` imports only `batch_imports` to wire its receivers, but `batch_export.py` also contains `@mutable_receiver(model_activity_signal, sender=BatchExport)` at line 1727 (`handle_batch_export_change`). Before this PR, `posthog/api/__init__.py` eagerly imported that module, so the receiver connected at `django.setup()` for all processes. Now it only connects when the router is first built (first web request). The Temporal batch-export service (`products/batch_exports/backend/service.py`) calls `batch_export.save()` directly when pausing/unpausing exports — those saves fire `model_activity_signal`, but `handle_batch_export_change` is never registered in the Temporal worker process, so those audit-log entries are silently dropped.
The regression guard test (`test_signal_receivers_connect_at_setup_not_via_router`) misses this because it only iterates `django.db.models.signals` (`pre_save`, `post_save`, etc.) and does not inspect `model_activity_signal`, so the gap is invisible to CI. Similarly, several other API-layer `@mutable_receiver(model_activity_signal, ...)` handlers in `feature_flag.py`, `dashboard.py`, `external_data_source.py`, etc. are now router-only — worth auditing whether any of those models are mutated in non-web processes.
### Issue 2 of 3
posthog/test/test_startup_import_budget.py:99-158
**Receiver guard misses `model_activity_signal`**
`_SIGNALS` covers only Django's built-in model signals (`pre_save`, `post_save`, `pre_delete`, `post_delete`, `m2m_changed`, `pre_init`, `post_init`). `model_activity_signal` is a custom `django.dispatch.Signal` — its receivers are not stored on any of those standard signal objects and are therefore invisible to this diff check. Handlers decorated with `@mutable_receiver(model_activity_signal, sender=<Model>)` in `batch_export.py`, `feature_flag.py`, `dashboard.py`, `external_data_source.py`, etc. can silently stop connecting in non-web processes without the test failing. Adding `model_activity_signal` from `posthog.models.signals` to the checked set would close the gap.
### Issue 3 of 3
posthog/temporal/data_imports/pipelines/pipeline/typings.py:12-18
**`TDataType = str` runtime placeholder changes visible type semantics**
At runtime `TDataType` resolves to `str` instead of the actual `dlt.common.data_types.typing.TDataType` (a union of Python native types). With `from __future__ import annotations` in effect, `get_type_hints()` on any dataclass in this file that uses `TDataType` in its field annotations will return `str` — not the intended union. If a Pydantic model, Temporal payload serialiser, or the existing `get_type_hints()` call anywhere in the warehouse import stack adds introspection of these hints, it would silently accept or coerce any string as valid rather than rejecting non-string values. The PR correctly documents this, but a narrow sentinel (e.g., `class TDataType: ...` whose `__repr__` makes it obvious it is a stub) would make accidental use more visible than using `str`.
Reviews (1): Last reviewed commit: "chore(tasks): restore import group spaci..." | Re-trigger Greptile |
# Conflicts: # posthog/api/__init__.py # posthog/api/routing.py # posthog/api/test/test_routing.py
The tasks exceptions module moved from products/tasks/backend/temporal/exceptions to products/tasks/backend/exceptions to keep it off the eager-init Temporal package, but this test still imported the old path — every Temporal test shard failed at collection with ModuleNotFoundError. Update the import.
…a router The lazy API router no longer imports viewset modules at django.setup(), so the `@mutable_receiver(model_activity_signal, ...)` audit-log handlers that lived in those modules stopped connecting in any process that never builds the router — celery, the Temporal workers, migrate, shell. Models mutated outside web requests (BatchExport pause/unpause, FeatureFlag cohort recalculation, ExternalDataSource/ Schema data-import workflows, etc.) silently dropped their activity-log entries. Reconnect all of them at app-population. Light viewset modules are imported directly from their product's AppConfig.ready() (actions, batch_exports, logs, tagged_item via core). For viewsets that pull heavy deps onto the import path — feature_flag and dashboard reach scipy via the error-tracking query runners, external_data_source/schema and evaluations reach dlt / scipy / the genai + ai_observability worker — the receiver is extracted into a light backend/activity_logging.py module so it can wire without dragging those onto startup. Verified at parity with the eager-router receiver set. The regression guard only inspected django.db.models signals, so this whole class was invisible to CI. Extend it to also diff model_activity_signal receivers across the router build, so a dropped audit-log receiver fails the build instead of shipping silently.
`__exit__` checked `if not self._start_counter`, which treats a legitimate 0.0 reading from time.perf_counter() as "never started" and raises spuriously. Use an explicit `is None` check so only the uninitialized case raises.
The runtime placeholder for the type-only TDataType was `str`, which silently reads as a plausible-but-wrong type if anything ever introspects these hints. Swap it for a small named stub whose repr makes accidental runtime use obvious.
Breaking the slack_app.backend.api <-> posthog_code_slack_mention cycle made resolve_slack_user a function-local import in forward_posthog_code_followup_activity, so patching it as a module attribute on posthog_code_slack_mention no longer takes effect — the function reimports the real one at call time. Patch it where it's defined (products.slack_app.backend.api) instead.
Query snapshots: Backend query snapshots updatedChanges: 1 snapshots (1 modified, 0 added, 0 deleted) What this means:
Next steps:
|
|
Docs from this PR will be published at posthog.com
Preview will be ready in ~10 minutes. Click Preview link above to access docs at |
|
🎭 Playwright report · View test results →
These issues are not necessarily caused by your changes. |
haacked
left a comment
There was a problem hiding this comment.
Nice refactoring! A few suggestions and nits, nothing blocking.
| """ | ||
|
|
||
|
|
||
| def test_signal_receivers_connect_at_setup_not_via_router() -> None: |
There was a problem hiding this comment.
suggestion: This guard catches a receiver that connects only when the router builds, but it can't catch the failure mode this refactor is actually about: a receiver whose module gets dropped from its AppConfig.ready() entirely. With no import it never connects in any process, so before and after both lack it, the diff stays empty, and the test stays green while the audit log silently stops (say, on external-data-source writes in a Temporal worker). A future "remove unused import" cleanup that deletes one of the # noqa: F401 side-effect imports in an apps.py slips through the same way.
This test already computes the setup-time receiver snapshot (before, taken before rest_router is imported). Worth a companion that asserts the relocated receivers are present in that snapshot, one expected name per receiver moved into a ready(), so a missing wire fails loudly.
| import ee.hogai.tools # noqa: F401 - ensure tools are registered | ||
| from ee.hogai.tools import load_all_tools # noqa: PLC0415 - ensure tools are registered | ||
|
|
||
| load_all_tools() |
There was a problem hiding this comment.
suggestion: All three public readers of self._tools need the tools imported first, but only get and get_scopes call load_all_tools(). get_names() reads the dict directly, so it returns an empty or partial list depending on call order. Rather than adding the call a third time (and remembering it on the next accessor), consider having all the reads call one helper:
def _ensure_loaded(self) -> dict[str, MCPToolRegistration]:
from ee.hogai.tools import load_all_tools # noqa: PLC0415 - ensure tools are registered
load_all_tools()
return self._toolsand have get/get_scopes/get_names read from self._ensure_loaded(). That keeps the load-then-read invariant in one place. It stays a function-local import to avoid re-introducing the tools ↔ mcp_tool cycle.
| self._start_counter: float | None = None | ||
| self._status_override: str | None = None | ||
|
|
||
| def set_status(self, status: str) -> None: |
There was a problem hiding this comment.
nit: set_status lost its docstring when it moved here from ai_observability/metrics.py. The original explained when to call it ("Override the status that will be recorded. Use for non-exception outcomes like SKIPPED."), which is the only thing distinguishing it from letting __exit__ derive the status. Worth carrying over:
| def set_status(self, status: str) -> None: | |
| def set_status(self, status: str) -> None: | |
| """Override the status that will be recorded. Use for non-exception outcomes like SKIPPED.""" | |
| self._status_override = status |
|
|
||
| # The API route aggregator (the DRF router build + ~200 viewset imports) lives in | ||
| # `posthog.api.rest_router`, imported lazily — only when a name it defines or re-exports | ||
| # (the router objects, view modules like `feature_flag`) is first accessed, which in practice |
There was a problem hiding this comment.
nit: feature_flag was migrated into products/ and the aggregator no longer binds it, so make the example a name it actually re-exports, like the router object. Same at line 20.
Problem
django.setup()now boots in under 4s, down from roughly 12s — cut by about two-thirds. That cost is paid by every process, not just web:manage.py,migrate,celery,temporal, and every CI test-shard boot.The time went to heavy subsystems dragged onto the startup path by module-level imports behind
AppConfig.ready(), signal receivers, the eagerposthog/tasks/__init__, and eager package__init__s — none of which a plainsetup()needs. Profiling (python -X importtime+tuna+ an__import__-hook tracer) traced each trigger.Changes
Defer or relocate those imports so they load on first use, not at app-load. Each was found by tracing the trigger and confirmed with an A/B re-measure.
posthog/api/__init__.pyintoposthog/api/rest_router.py;__init__is now a thin shim that builds it lazily via module__getattr__, only when the URLconf resolves (first web request — Django's intended laziness). For a web server that means the first request pays the build, which we prefer over per-request deferral. Builds on theRouterRegistryseam from feat(api): migrate product API routes to product-local registration #60700.stripe,dlt,dagster(via a lazy logger proxy),user_agents, the revenue-analytics view builders, andchdball moved off the module-import path to their call sites. Each is pinned in the guard'sFORBIDDEN_AT_SETUPlist so it can't silently creep back.tools,chat_agent) — break the tools ↔ chat_agent cycle the oldtemporal.aipreload masked; prerequisite for the AI core loading lazily.AppConfig.ready()— annotations, batch_exports, feature_flags, and early_access_features used to wire their signal receivers as a side effect of a viewset import. The lazy router no longer pulls that, so a process that never builds the router (celery, temporal, migrate, shell) would silently stop running them — e.g. the flags cache would stop invalidating on writes. Wiring moved into each product'sready(), with the heavy bits still deferred inside the handlers.posthog/urls.pyused to pull in first, which had been masking a real cycle:slack_app.backend.apiimports the workflow classes fromposthog_code_slack_mention, which imported a helper straight back at module scope. With the pre-import gone, Django's system checks (check_custom_error_handlers, duringensure_migration_defaults) hit the cycle head-on. Broke it by deferringresolve_slack_userto its two call sites.dltleftTDataTypetype-only, soget_type_hints()onSourceResponsewould raiseNameError. Nothing introspects it today, but it's a landmine for future pydantic/Temporal-payload use; added a runtime placeholder so hints resolve without importingdlt.posthog/test/test_startup_import_budget.pyboots a cleandjango.setup()in a subprocess and fails if any evicted subsystem reappears. Plus an internal doc + shallow skill capturing the traps (the silent-receiver-loss and circular-import-unmasking ones have each caused follow-up fixes).What's left is the genuine floor:
posthog.schema(generated, imported everywhere) andclickhouse.client(core query infra), both reached from the model layer at app population — confirmed with grimp, and a separate project.How did you test this code?
I'm an agent (Claude Code). Automated only, no manual testing:
ready()).django.setup()+import_module("posthog.urls")+manage.py checkclean — this is exactly the path the Dagster CI job'sensure_migration_defaultsfailed on before the circular-import fix.origin/masterworktree to confirm it was ours (the lazy router unmasked it), not master's; boot-imported the full temporaldata_imports+aiworkflow/activity module set; resolved every deferred symbol live post-setup(); checkedget_type_hints()on the affected dataclasses; and verified via_live_receiversthat every relocated receiver actually connects in a process that never builds the router.surveysroute registers its complete API surface).ruff/tyclean via lint-staged.CI still needs to cover what I couldn't run locally: the AI eval suite (the hogai facades and slack defer touch the agent core), drf-spectacular OpenAPI schema-gen (the lazy router), and import-linter/tach.
Docs update
No user-facing change. Internal-only doc + skill added under
docs/internal/and.agents/skills/.🤖 Agent context
Stacked on #60700 — its
RouterRegistrycommit appears in this diff and will dedupe when #60700 merges. Tooling and method: profiled with-X importtime+tuna+ an__import__-hook tracer, and used grimp's static graph (nominate_cycle_breakers) to find cuttable edges — repeatedly a "misplaced shared leaf" (exceptions/metrics living under an eager-init package). A self-assessing triage tool was built alongside to rank deferral candidates by removable cost and check deferability; it reproduced the manual findings.The instructive part for reviewers: removing an eager import chain is not free — it can expose latent circular imports that only ever worked by import order, and it can silently drop signal receivers that rode in on a viewset import. Both happened here and both are now covered (the cycle broken at its call sites; receivers re-wired in
ready()and verified live). Wins landed where the root was a single incidental importer; the remaining floor (posthog.schema,clickhouse.client) has many legitimate roots through the model layer. Agent-assisted, human-reviewed — not self-merged.