Skip to content

perf(api): lazy DRF router, defer heavy imports off django.setup()#60849

Open
webjunkie wants to merge 47 commits into
masterfrom
pr-60700-base
Open

perf(api): lazy DRF router, defer heavy imports off django.setup()#60849
webjunkie wants to merge 47 commits into
masterfrom
pr-60700-base

Conversation

@webjunkie
Copy link
Copy Markdown
Contributor

@webjunkie webjunkie commented Jun 1, 2026

Problem

django.setup() now boots in under 4s, down from roughly 12s — cut by about two-thirds. That cost is paid by every process, not just web: manage.py, migrate, celery, temporal, and every CI test-shard boot.

The time went to heavy subsystems dragged onto the startup path by module-level imports behind AppConfig.ready(), signal receivers, the eager posthog/tasks/__init__, and eager package __init__s — none of which a plain setup() needs. Profiling (python -X importtime + tuna + an __import__-hook tracer) traced each trigger.

Changes

Defer or relocate those imports so they load on first use, not at app-load. Each was found by tracing the trigger and confirmed with an A/B re-measure.

  • Lazy DRF router — the large route aggregator and the product-view imports moved out of posthog/api/__init__.py into posthog/api/rest_router.py; __init__ is now a thin shim that builds it lazily via module __getattr__, only when the URLconf resolves (first web request — Django's intended laziness). For a web server that means the first request pays the build, which we prefer over per-request deferral. Builds on the RouterRegistry seam from feat(api): migrate product API routes to product-local registration #60700.
  • Deferred heavy librariesstripe, dlt, dagster (via a lazy logger proxy), user_agents, the revenue-analytics view builders, and chdb all moved off the module-import path to their call sites. Each is pinned in the guard's FORBIDDEN_AT_SETUP list so it can't silently creep back.
  • hogai lazy package facades (tools, chat_agent) — break the tools ↔ chat_agent cycle the old temporal.ai preload masked; prerequisite for the AI core loading lazily.
  • Receivers relocated to AppConfig.ready() — annotations, batch_exports, feature_flags, and early_access_features used to wire their signal receivers as a side effect of a viewset import. The lazy router no longer pulls that, so a process that never builds the router (celery, temporal, migrate, shell) would silently stop running them — e.g. the flags cache would stop invalidating on writes. Wiring moved into each product's ready(), with the heavy bits still deferred inside the handlers.
  • Circular-import fix — making the router lazy removed an eager import chain that posthog/urls.py used to pull in first, which had been masking a real cycle: slack_app.backend.api imports the workflow classes from posthog_code_slack_mention, which imported a helper straight back at module scope. With the pre-import gone, Django's system checks (check_custom_error_handlers, during ensure_migration_defaults) hit the cycle head-on. Broke it by deferring resolve_slack_user to its two call sites.
  • dlt-typed annotations stay runtime-resolvable — deferring dlt left TDataType type-only, so get_type_hints() on SourceResponse would raise NameError. Nothing introspects it today, but it's a landmine for future pydantic/Temporal-payload use; added a runtime placeholder so hints resolve without importing dlt.
  • Regression guard + docsposthog/test/test_startup_import_budget.py boots a clean django.setup() in a subprocess and fails if any evicted subsystem reappears. Plus an internal doc + shallow skill capturing the traps (the silent-receiver-loss and circular-import-unmasking ones have each caused follow-up fixes).

What's left is the genuine floor: posthog.schema (generated, imported everywhere) and clickhouse.client (core query infra), both reached from the model layer at app population — confirmed with grimp, and a separate project.

How did you test this code?

I'm an agent (Claude Code). Automated only, no manual testing:

  • Import-budget guard — 3 passed (heavy subsystems off the path, all models register at app-population, receivers wire at ready()).
  • django.setup() + import_module("posthog.urls") + manage.py check clean — this is exactly the path the Dagster CI job's ensure_migration_defaults failed on before the circular-import fix.
  • Footgun audit for the "boots fine, breaks at runtime later" class: bisected the circular import against a detached origin/master worktree to confirm it was ours (the lazy router unmasked it), not master's; boot-imported the full temporal data_imports + ai workflow/activity module set; resolved every deferred symbol live post-setup(); checked get_type_hints() on the affected dataclasses; and verified via _live_receivers that every relocated receiver actually connects in a process that never builds the router.
  • Lazy router materializes fully (5386 resolved routes; the relocated surveys route registers its complete API surface).
  • ruff / ty clean via lint-staged.

CI still needs to cover what I couldn't run locally: the AI eval suite (the hogai facades and slack defer touch the agent core), drf-spectacular OpenAPI schema-gen (the lazy router), and import-linter/tach.

Docs update

No user-facing change. Internal-only doc + skill added under docs/internal/ and .agents/skills/.

🤖 Agent context

Stacked on #60700 — its RouterRegistry commit appears in this diff and will dedupe when #60700 merges. Tooling and method: profiled with -X importtime + tuna + an __import__-hook tracer, and used grimp's static graph (nominate_cycle_breakers) to find cuttable edges — repeatedly a "misplaced shared leaf" (exceptions/metrics living under an eager-init package). A self-assessing triage tool was built alongside to rank deferral candidates by removable cost and check deferability; it reproduced the manual findings.

The instructive part for reviewers: removing an eager import chain is not free — it can expose latent circular imports that only ever worked by import order, and it can silently drop signal receivers that rode in on a viewset import. Both happened here and both are now covered (the cycle broken at its call sites; receivers re-wired in ready() and verified live). Wins landed where the root was a single incidental importer; the remaining floor (posthog.schema, clickhouse.client) has many legitimate roots through the model layer. Agent-assisted, human-reviewed — not self-merged.

claude and others added 14 commits May 29, 2026 15:57
Introduce RouterRegistry in posthog/api/routing.py: named, addressable
handles onto the shared nested DRF routers (projects/environments/
organizations) that feed the single OpenAPI schema. Products can now
register their endpoints from their own folder via a register_routes(routers)
function instead of editing the central posthog/api/__init__.py.

Migrate the surveys product as the pilot: its route now lives in
products/surveys/backend/routes.py. The shared parents are registered into
the registry while remaining module globals, so the rest of the file is
unchanged.

https://claude.ai/code/session_01JtM7FdS7NYzUDqYQ4G8AbN
…ort cycle

Make ee/hogai/tools/__init__ and ee/hogai/chat_agent/__init__ resolve their
exports lazily (PEP 562 __getattr__) instead of eager re-exports, and import
SlashCommandName from its leaf module. The MCP tool registry self-loads via
load_all_tools() so registration is preserved without the eager package import.

Breaks the tools <-> chat_agent import cycle that was only held together by the
temporal.ai startup preload — prerequisite for letting hogai load lazily once
the API router stops force-loading it.
Move the route aggregator (160 router.register calls + ~200 product view
imports) out of posthog/api/__init__.py into posthog/api/rest_router.py.
__init__ is now a thin shim that imports rest_router lazily via module
__getattr__, only when a router object (router, projects_router, ...) is first
accessed — which happens when the URLconf resolves, i.e. on the first web
request.

Django already loads the URLconf (and thus the router + every view it imports)
lazily; building the router at package-import time defeated that, so every
django.setup() (shell, migrate, celery, CI) paid ~6s to build routes and drag
every product's views (and the AI core) onto the startup path. Importing a
posthog.api submodule no longer triggers the build.

Builds on the RouterRegistry seam from the product-local-routes pilot. Drops
the temporal.ai preload — the lazy ee.hogai package facades make the import
order safe without it.

django.setup() wall-clock ~11.8s -> ~9.3s (min of 5).
The slack_app AppConfig.ready() imports signals.py to register Integration
signal receivers. signals.py imported _invalidate_repo_list_cache from
backend/api.py at module scope, and api.py imports posthog.temporal.ai
workflows — which pull the entire ee.hogai chat-agent core. So every
django.setup() (shell, migrate, celery, CI) dragged the AI agent in.

Defer the import into the receiver body; it only fires when a GitHub
integration changes. Drops ee.hogai modules loaded at setup from ~270 to ~30
and posthog.temporal.ai no longer loads at startup.
table.py is a Django models module (loaded at app population), so its
module-level 'import chdb' pulled embedded ClickHouse into every
django.setup(). chdb is only used in get_columns/get_count — defer it there.

Add posthog/test/test_startup_import_budget.py as the regression guard: it
boots a bare django.setup() in a clean subprocess and asserts the heavy
subsystems deliberately kept off the startup path (the lazy API router,
posthog.temporal.ai, the ee.hogai chat-agent core, chdb) are not imported.
When it fails, defer the offending import rather than widening the budget.
… path

Two 'misplaced shared leaf' relocations that grimp's nominate_cycle_breakers
pointed at, both off the django.setup() path:

- Move tasks exceptions out of the eager-init temporal package:
  products/tasks/backend/temporal/exceptions.py -> products/tasks/backend/exceptions.py.
  The sandbox services import these exceptions; living under temporal/ meant
  importing them tripped temporal/__init__'s eager workflow load, forming a
  services<->temporal cycle.

- Extract generic temporal metric helpers (get_metric_meter,
  ExecutionTimeRecorder) to posthog/temporal/common/metrics.py. session-replay
  delete-recordings imported them from ai_observability.metrics, which dragged
  the whole ai_observability worker-registration package (clustering -> scipy,
  run_evaluation -> google-genai) onto setup. ai_observability.metrics now
  re-exports from common.

Together these take ai_observability + scipy off the startup path.
django.setup() wall-clock ~8.0s -> ~6.1s (baseline ~11.8s). Extends the
startup-import budget guard to cover ai_observability and scipy.
…ations

posthog/api/file_system/registrations.py runs from AppConfig.ready() to register
core file-system types. Its module-level import of log_playlist_activity pulled
session_recording_playlist_api -> session_recording_api -> the session_summary
temporal workflow -> google-genai, all onto every process's startup path.

log_playlist_activity is only called from the playlist post-delete/restore hooks,
so defer it there. Takes the whole session-recording API subtree (and the Gemini
SDK) off startup. django.setup() wall-clock ~6.1s -> ~5.1s (baseline ~11.8s).
Budget guard extended to session_recording_api + google.genai.
demo_create_data and demo_reset_master_team are eager-imported by
posthog/tasks/__init__ (celery autoimport) and imported the demo matrix at
module scope, which pulls mimesis (a fake-data generator) onto every process's
startup path. The matrix is only needed when the task runs; defer it into the
task bodies. Takes mimesis off startup; budget guard extended.
The email task (eager-imported by posthog/tasks/__init__) imported a single
constant from posthog.api.two_factor_reset, a DRF viewset module (~120ms of
import). Defer it into the one function that uses it.
Move emit_signal from products/signals/backend/api.py to
products/signals/backend/facade/api.py so cross-boundary callers
(endpoints product, core temporal workers, scout harness) go through a
real facade per product-isolation guidelines.

The temporal workflow imports emit_signal needs are deferred into the
function body. At module scope they formed a latent circular import
(api -> temporal package -> reingestion -> api) that only resolved by
import-order luck and dragged the temporal workflow stack onto the
Django startup path. The facade now imports nothing temporal at module
scope, so importing it from the API/URL hot path is cheap and safe.

Pure rename + deferred imports; no behavior change.
…oved symbols

CI fallout from the startup-import refactor:

- posthog/api/__getattr__ now resolves real submodules directly (cheap, keeps
  setup lazy) and delegates everything else to rest_router, so re-exported names
  like `posthog.api.feature_flag` (which come from products.feature_flags, not a
  submodule) work again. The bare-setup budget guard still passes — the aggregator
  only builds on first access to such a name.
- ai_observability ExecutionTimeRecorder tests patch the helper at its new home
  (posthog.temporal.common.metrics.get_metric_meter).
- test_table chdb test patches chdb.query directly (chdb is function-local now).
- ee/hogai/eval/data_setup imports HedgeboxMatrix from its real source, not the
  demo_create_data re-export that's now deferred.
… research cycle

More CI fallout after merging master:
- Remove services/agent-cli/ — local scratch accidentally committed via a git add -A;
  it's a new 'service' dir not covered by ci-security.yaml, failing lint-workflows.
- notebook.py: re-sort imports (master added a query_validation import next to the
  relocated tasks.exceptions import; the merge left the block unsorted -> ruff I001).
- report_generation/research.py: importing temporal.types ran the signals temporal
  package __init__ (agentic -> report -> back here), a circular import surfaced by the
  reorder. SignalData is annotation-only (module uses __future__ annotations) so move it
  to TYPE_CHECKING; import the one runtime helper locally.
@tests-posthog
Copy link
Copy Markdown
Contributor

tests-posthog Bot commented Jun 1, 2026

⏭️ Skipped snapshot commit because branch advanced to d1be910 while workflow was testing 942ac5f.

The new commit will trigger its own snapshot update workflow.

If you expected this workflow to succeed: This can happen due to concurrent commits. To get a fresh workflow run, either:

  • Merge master into your branch, or
  • Push an empty commit: git commit --allow-empty -m 'trigger CI' && git push

@tests-posthog
Copy link
Copy Markdown
Contributor

tests-posthog Bot commented Jun 1, 2026

Query snapshots: Backend query snapshots updated

Changes: 2 snapshots (2 modified, 0 added, 0 deleted)

What this means:

  • Query snapshots have been automatically updated to match current output
  • These changes reflect modifications to database queries or schema

Next steps:

  • Review the query changes to ensure they're intentional
  • If unexpected, investigate what caused the query to change

Review snapshot changes →

webjunkie added 7 commits June 1, 2026 14:45
Module-level import of render_signals_to_text from the temporal types pulled
the signals workflow chain at import time, re-introducing the cycle the research
facade fix already broke. Move SignalData to TYPE_CHECKING and defer the render
import into the call site.
With the API route aggregator moved off the startup path (lazy rest_router), 7
models stopped registering at django.setup(): they were only reachable through a
viewset import the aggregator used to pull eagerly. A model registers as a side
effect of importing its class, so once nothing imported them at app-population
they vanished from apps.get_models() — breaking the django-stubs mypy plugin
(36 'has no attribute objects'/field errors) and risking makemigrations/admin.

Import the 7 models from their app's models/__init__ (where Django expects model
classes to live): PersistedFolder, SessionRecordingExternalReference (posthog),
ExplicitTeamMembership, FeatureFlagRoleAccess, LLMTraceSummary (ee), WebExperiment
(experiments), HogFlowSchedule (workflows).

Add a startup-budget guard asserting importing the router registers zero new
models, so any future misplaced model fails CI at the source instead of silently
dropping out of the registry. Load time is unchanged (~4.8s): these modules are
light and loaded at setup on master too, via the old eager aggregator.
…odule

Semantic merge conflict: this branch moved temporal/exceptions.py to
backend/exceptions.py (off the eager temporal startup path), while master's
sandbox-credential-refresh change added a new importer of the old path. Both
sides merged textually clean but left a dangling import. Point it at the new
location. The startup-budget model guard caught this before CI.
# Conflicts:
#	products/slack_app/backend/signals.py
…ready()

A master-side change made dashboards.api.dashboard pull the error_tracking
query-runner chain (-> experiments bayesian stats -> scipy). early_access's
ready() imported the api module just for _set_enrollment_filters, dragging that
whole chain — and scipy — onto every process's startup. The helper is only used
in the pre-delete hook, so import it there. Caught by the startup-budget guard
after merging master.
@webjunkie webjunkie self-assigned this Jun 2, 2026
webjunkie added 5 commits June 2, 2026 14:59
…tup()

The lazy API router connects model-signal receivers only when the router is built,
because importing a viewset module runs its @receiver decorators as a side effect.
Processes that never build the router (celery, temporal, migrate, shell) silently
lose them — feature-flag cache invalidation and alert cleanup among them.

Add a guard (counterpart to the model-registration guard) asserting the router
connects zero new receivers; everything must wire at setup, canonically from
AppConfig.ready(). Marked xfail(strict) on this branch since the lazy-router work
regresses it — wiring the receivers into AppConfig.ready() makes it xpass, which
strict turns into a failure, forcing the marker's removal.
The lazy API router stopped importing ~200 viewset modules at django.setup(),
which also stopped running their @receiver decorators as an import side effect.
29 receivers across 10 modules then connected only when the router was first
built — so any process that never builds it (celery, temporal, migrate, shell)
silently dropped them: feature-flag cache invalidation and alert cleanup would
not fire on background writes, serving stale flags.

Wire them from the owning app's AppConfig.ready() so they connect at
app-population regardless of the router. batch_imports and alert pulled heavy
viewset trees at module scope, so their one heavy import is deferred into the
method that uses it — both a correctness fix and a small startup win.

Flips the receiver guard from xfail to passing; drop the marker.
# Conflicts:
#	products/tasks/backend/services/modal_sandbox.py
Resolve conflicts from master's product migrations against the lazy API router:

- annotation, batch_imports, and exports moved from posthog/api into the products
  tree (products/{annotations,batch_exports,exports}); deployments product removed.
  Ported these import + route-registration deltas into posthog/api/rest_router.py
  (the lazy aggregator) and added master's new desktop file-system routes.
- Kept posthog/api/__init__.py as the lazy shim (master kept editing the old eager
  aggregator, which now lives in rest_router.py).
- Re-wired the annotation + batch_imports signal receivers: they no longer live under
  posthog/api, so connect them from each product's AppConfig.ready() instead of
  posthog/apps.py. batch_imports' one heavy import (the batch-export Temporal
  framework via resolve_and_validate_url) is re-deferred so wiring it at ready()
  stays cheap and off the startup path.
- Regenerated test_process_scheduled_changes snapshots (query sequence renumbered;
  content unchanged, experiment columns intact, makemigrations clean).
Wiring the Vercel sync receivers at EnterpriseConfig.ready() imports ee.vercel.integration,
which imported BillingManager at module scope -> the agentic-provisioning signature module ->
the stripe SDK. So reconnecting that receiver under the lazy router dragged the whole billing
stack (stripe, requests, jwt) back onto django.setup() that the lazy router had removed (~1s on
every process: celery, temporal, migrate, shell, CI).

Defer the BillingManager import into the two call sites that use it, and defer 'import stripe' in
signature.py into the verify function (its only user). Relocate the test patch targets from
ee.vercel.integration.BillingManager to its definition module ee.billing.billing_manager, since
the name is no longer bound at integration import time.

Same pattern as the slack_app receiver fix.
@assign-reviewers-posthog assign-reviewers-posthog Bot requested review from a team June 3, 2026 14:06
@assign-reviewers-posthog
Copy link
Copy Markdown

👀 Auto-assigned reviewers

These soft owners were skipped because their changes are minor, or the reviewer list was getting long. Nothing blocks merge, so self-assign if you'd like a look:

  • @PostHog/team-analytics-platform (products/alerts/**)
  • @PostHog/batch-exports (products/batch_exports/**)
  • @PostHog/team-managed-warehouse (products/data_warehouse/**, products/warehouse_sources/**)
  • @PostHog/team-surveys (products/surveys/**)
  • @PostHog/team-product-analytics (products/annotations/**)
  • @PostHog/team-experiments (products/experiments/**)
  • @PostHog/team-workflows (products/workflows/**)
  • @PostHog/team-data-stack (products/data_warehouse/**)
  • @PostHog/team-platform-features (products/notebooks/**)

Soft owners come from CODEOWNERS-soft and each product's product.yaml. Generated files and lockfiles are ignored when deciding ownership.

@webjunkie webjunkie removed request for a team June 3, 2026 14:07
@posthog-project-board-bot posthog-project-board-bot Bot moved this to In Review in Feature Flags Jun 3, 2026
@webjunkie webjunkie requested review from a team and pauldambra June 3, 2026 14:07
Deferring the two_factor_reset import out of module scope removed the only
import separating the third-party posthoganalytics group from the first-party
posthog.* group, so `ruff check .` flagged I001 (un-sorted import block)
repo-wide — lint-staged only sorts the files in a given commit, so it slipped
through. Re-add the group separator.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

from typing import Any, Optional

P2 Badge Keep the old task exception import path working

Moving the exception definitions to products.tasks.backend.exceptions leaves at least one remaining import of the old module path (products/tasks/backend/temporal/process_task/activities/tests/test_refresh_sandbox_credentials.py:7), and products/tasks/backend/temporal/exceptions.py no longer exists, so that test module fails at import time with ModuleNotFoundError before it can run. Either update the remaining import or leave a small compatibility shim at the old path so existing imports continue to resolve.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread posthog/api/__init__.py
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reduces django.setup() startup cost by moving heavy imports (DRF route aggregation, AI/Temporal paths, vendor SDKs, embedded ClickHouse, etc.) off the app-population path and into lazy/first-use import points, while preserving correctness by explicitly wiring signal receivers in AppConfig.ready() and adding a regression guard test.

Changes:

  • Made the DRF API router build lazy by moving the large route aggregator into posthog/api/rest_router.py and turning posthog/api/__init__.py into a PEP 562 __getattr__ shim.
  • Deferred multiple heavy imports to runtime call sites (e.g. chdb, stripe, dagster, user_agents, revenue-analytics orchestrator), and broke/unmasked cycles by relocating imports.
  • Added a subprocess-based startup import budget guard plus supporting docs/skill to prevent regressions; moved/wired receivers into AppConfig.ready() to avoid silent receiver loss with a lazy router.

Reviewed changes

Copilot reviewed 77 out of 78 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
products/workflows/backend/models/init.py Ensure model registers at app population
products/warehouse_sources/backend/models/table.py Defer chdb import to call sites
products/tasks/backend/temporal/process_task/activities/tests/test_get_task_processing_context.py Update exception import path
products/tasks/backend/temporal/process_task/activities/tests/test_get_pr_context.py Update exception import path
products/tasks/backend/temporal/process_task/activities/tests/test_execute_task_in_sandbox.py Update exception import path
products/tasks/backend/temporal/process_task/activities/tests/test_cleanup_sandbox.py Update exception import path
products/tasks/backend/temporal/process_task/activities/start_agent_server.py Update exception import path
products/tasks/backend/temporal/process_task/activities/refresh_sandbox_credentials.py Update exception import path
products/tasks/backend/temporal/process_task/activities/provision_sandbox.py Update exception import path
products/tasks/backend/temporal/process_task/activities/get_task_processing_context.py Update exception import path
products/tasks/backend/temporal/process_task/activities/get_sandbox_for_repository.py Update exception import path
products/tasks/backend/temporal/process_task/activities/get_pr_context.py Update exception import path
products/tasks/backend/temporal/process_task/activities/execute_task_in_sandbox.py Update exception import path
products/tasks/backend/temporal/process_task/activities/create_sandbox_from_snapshot.py Update exception import path + reorder imports
products/tasks/backend/temporal/process_task/activities/cleanup_sandbox.py Update exception import path
products/tasks/backend/temporal/oauth.py Update exception import path
products/tasks/backend/temporal/create_snapshot/activities/tests/test_setup_repository.py Update exception import path
products/tasks/backend/temporal/create_snapshot/activities/tests/test_create_snapshot.py Update exception import path
products/tasks/backend/temporal/create_snapshot/activities/tests/test_clone_repository.py Update exception import path
products/tasks/backend/temporal/create_snapshot/activities/setup_repository.py Update exception import path
products/tasks/backend/temporal/create_snapshot/activities/get_snapshot_context.py Update exception import path
products/tasks/backend/temporal/create_snapshot/activities/clone_repository.py Update exception import path
products/tasks/backend/services/tests/test_modal_sandbox.py Update exception import path
products/tasks/backend/services/test_docker_sandbox.py Update exception import path
products/tasks/backend/services/modal_sandbox.py Move exceptions to shared module
products/tasks/backend/services/docker_sandbox.py Move exceptions to shared module
products/tasks/backend/exceptions.py New shared task/temporal exception types
products/surveys/backend/routes.py Add product-local API route registration
products/slack_app/backend/signals.py Defer heavy API import inside receiver
products/signals/backend/report_generation/select_repo.py Defer temporal.types import to avoid cycle
products/signals/backend/report_generation/research.py Defer temporal.types import to avoid cycle
products/notebooks/backend/api/notebook.py Update exception import path
products/feature_flags/backend/apps.py Wire receivers in ready()
products/feature_flags/backend/api/test/snapshots/test_organization_feature_flag.ambr Update snapshot output
products/experiments/backend/models/init.py Ensure model registers at app population
products/early_access_features/backend/apps.py Defer heavy import inside hook
products/data_warehouse/backend/models/test/test_table.py Update chdb patch target
products/batch_exports/backend/apps.py Wire receivers in ready()
products/batch_exports/backend/api/batch_imports.py Defer heavy import inside validator
products/annotations/backend/apps.py Wire receivers in ready()
products/alerts/backend/apps.py Wire receivers in ready()
products/alerts/backend/api/alert.py Defer insight serializer import
posthog/utils.py Defer user_agents import
posthog/test/test_startup_import_budget.py New startup import/model/receiver guard tests
posthog/temporal/session_replay/delete_recordings/metrics.py Use common metrics helpers
posthog/temporal/data_imports/pipelines/pipeline/typings.py Make dlt type-only + runtime placeholder
posthog/temporal/common/metrics.py New shared Temporal metrics helpers
posthog/temporal/ai/posthog_code_slack_mention.py Break circular import via lazy import
posthog/temporal/ai_observability/test_metrics.py Patch new metrics import path
posthog/temporal/ai_observability/metrics.py Re-export common metrics helpers
posthog/tasks/test/snapshots/test_process_scheduled_changes.ambr Update snapshot output
posthog/tasks/email.py Defer DRF constant import
posthog/tasks/demo_reset_master_team.py Defer demo imports
posthog/tasks/demo_create_data.py Defer demo imports
posthog/models/init.py Ensure models register at app population
posthog/hogql/database/database.py Defer revenue analytics orchestrator import
posthog/clickhouse/cluster.py Lazy dagster logger to avoid import at setup
posthog/apps.py Wire core receivers in ready()
posthog/api/test/test_routing.py Add RouterRegistry tests
posthog/api/routing.py Introduce RouterRegistry
posthog/api/rest_router.py New heavy router aggregator module
posthog/api/file_system/registrations.py Defer session replay import
posthog/api/init.py Lazy shim via __getattr__
ee/vercel/test/test_integration.py Patch BillingManager import location
ee/vercel/integration.py Defer BillingManager import
ee/models/init.py Ensure EE models register at app population
ee/hogai/tools/init.py Lazy tool exports + explicit load hook
ee/hogai/mcp_tool.py Ensure tools register via load_all_tools()
ee/hogai/eval/data_setup.py Fix HedgeboxMatrix import
ee/hogai/chat_agent/slash_commands/commands/remember/command.py Import path adjustment for const
ee/hogai/chat_agent/slash_commands/commands/feedback/command.py Import path adjustment for const
ee/hogai/chat_agent/memory/nodes.py Import path adjustment for const
ee/hogai/chat_agent/init.py Lazy AssistantGraph export
ee/apps.py Wire Vercel receivers in ready()
ee/api/vercel/test/test_vercel_permission.py Patch BillingManager import location
ee/api/agentic_provisioning/signature.py Defer stripe import
docs/internal/django-startup-time.md New internal doc for startup performance
.agents/skills/django-startup-time/SKILL.md New agent skill mirroring the doc

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread posthog/temporal/common/metrics.py
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Jun 3, 2026

Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
products/batch_exports/backend/apps.py:7-15
**Missing `batch_export` receiver – audit log regression in temporal workers**

`products/batch_exports/backend/apps.py` imports only `batch_imports` to wire its receivers, but `batch_export.py` also contains `@mutable_receiver(model_activity_signal, sender=BatchExport)` at line 1727 (`handle_batch_export_change`). Before this PR, `posthog/api/__init__.py` eagerly imported that module, so the receiver connected at `django.setup()` for all processes. Now it only connects when the router is first built (first web request). The Temporal batch-export service (`products/batch_exports/backend/service.py`) calls `batch_export.save()` directly when pausing/unpausing exports — those saves fire `model_activity_signal`, but `handle_batch_export_change` is never registered in the Temporal worker process, so those audit-log entries are silently dropped.

The regression guard test (`test_signal_receivers_connect_at_setup_not_via_router`) misses this because it only iterates `django.db.models.signals` (`pre_save`, `post_save`, etc.) and does not inspect `model_activity_signal`, so the gap is invisible to CI. Similarly, several other API-layer `@mutable_receiver(model_activity_signal, ...)` handlers in `feature_flag.py`, `dashboard.py`, `external_data_source.py`, etc. are now router-only — worth auditing whether any of those models are mutated in non-web processes.

### Issue 2 of 3
posthog/test/test_startup_import_budget.py:99-158
**Receiver guard misses `model_activity_signal`**

`_SIGNALS` covers only Django's built-in model signals (`pre_save`, `post_save`, `pre_delete`, `post_delete`, `m2m_changed`, `pre_init`, `post_init`). `model_activity_signal` is a custom `django.dispatch.Signal` — its receivers are not stored on any of those standard signal objects and are therefore invisible to this diff check. Handlers decorated with `@mutable_receiver(model_activity_signal, sender=<Model>)` in `batch_export.py`, `feature_flag.py`, `dashboard.py`, `external_data_source.py`, etc. can silently stop connecting in non-web processes without the test failing. Adding `model_activity_signal` from `posthog.models.signals` to the checked set would close the gap.

### Issue 3 of 3
posthog/temporal/data_imports/pipelines/pipeline/typings.py:12-18
**`TDataType = str` runtime placeholder changes visible type semantics**

At runtime `TDataType` resolves to `str` instead of the actual `dlt.common.data_types.typing.TDataType` (a union of Python native types). With `from __future__ import annotations` in effect, `get_type_hints()` on any dataclass in this file that uses `TDataType` in its field annotations will return `str` — not the intended union. If a Pydantic model, Temporal payload serialiser, or the existing `get_type_hints()` call anywhere in the warehouse import stack adds introspection of these hints, it would silently accept or coerce any string as valid rather than rejecting non-string values. The PR correctly documents this, but a narrow sentinel (e.g., `class TDataType: ...` whose `__repr__` makes it obvious it is a stub) would make accidental use more visible than using `str`.

Reviews (1): Last reviewed commit: "chore(tasks): restore import group spaci..." | Re-trigger Greptile

Comment thread products/batch_exports/backend/apps.py Outdated
Comment thread posthog/test/test_startup_import_budget.py
Comment thread posthog/temporal/data_imports/pipelines/pipeline/typings.py Outdated
webjunkie and others added 7 commits June 3, 2026 17:29
# Conflicts:
#	posthog/api/__init__.py
#	posthog/api/routing.py
#	posthog/api/test/test_routing.py
The tasks exceptions module moved from products/tasks/backend/temporal/exceptions
to products/tasks/backend/exceptions to keep it off the eager-init Temporal
package, but this test still imported the old path — every Temporal test shard
failed at collection with ModuleNotFoundError. Update the import.
…a router

The lazy API router no longer imports viewset modules at django.setup(), so the
`@mutable_receiver(model_activity_signal, ...)` audit-log handlers that lived in
those modules stopped connecting in any process that never builds the router —
celery, the Temporal workers, migrate, shell. Models mutated outside web requests
(BatchExport pause/unpause, FeatureFlag cohort recalculation, ExternalDataSource/
Schema data-import workflows, etc.) silently dropped their activity-log entries.

Reconnect all of them at app-population. Light viewset modules are imported
directly from their product's AppConfig.ready() (actions, batch_exports, logs,
tagged_item via core). For viewsets that pull heavy deps onto the import path —
feature_flag and dashboard reach scipy via the error-tracking query runners,
external_data_source/schema and evaluations reach dlt / scipy / the genai +
ai_observability worker — the receiver is extracted into a light
backend/activity_logging.py module so it can wire without dragging those onto
startup. Verified at parity with the eager-router receiver set.

The regression guard only inspected django.db.models signals, so this whole class
was invisible to CI. Extend it to also diff model_activity_signal receivers across
the router build, so a dropped audit-log receiver fails the build instead of
shipping silently.
`__exit__` checked `if not self._start_counter`, which treats a legitimate 0.0
reading from time.perf_counter() as "never started" and raises spuriously. Use an
explicit `is None` check so only the uninitialized case raises.
The runtime placeholder for the type-only TDataType was `str`, which silently
reads as a plausible-but-wrong type if anything ever introspects these hints.
Swap it for a small named stub whose repr makes accidental runtime use obvious.
Breaking the slack_app.backend.api <-> posthog_code_slack_mention cycle made
resolve_slack_user a function-local import in forward_posthog_code_followup_activity,
so patching it as a module attribute on posthog_code_slack_mention no longer takes
effect — the function reimports the real one at call time. Patch it where it's
defined (products.slack_app.backend.api) instead.
@tests-posthog
Copy link
Copy Markdown
Contributor

tests-posthog Bot commented Jun 3, 2026

Query snapshots: Backend query snapshots updated

Changes: 1 snapshots (1 modified, 0 added, 0 deleted)

What this means:

  • Query snapshots have been automatically updated to match current output
  • These changes reflect modifications to database queries or schema

Next steps:

  • Review the query changes to ensure they're intentional
  • If unexpected, investigate what caused the query to change

Review snapshot changes →

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 3, 2026

Docs from this PR will be published at posthog.com

Project Deployment Preview Updated (UTC)
posthog.com 🤷 Unknown Preview Jun 3, 2026, 4:24 PM

Preview will be ready in ~10 minutes. Click Preview link above to access docs at /handbook/engineering/

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 3, 2026

🎭 Playwright report · View test results →

⚠️ 1 flaky test:

  • Save an insight, make changes, discard them, and save a copy (chromium)

These issues are not necessarily caused by your changes.
Annoyed by this comment? Help fix flakies and failures and it'll disappear!

Copy link
Copy Markdown
Contributor

@haacked haacked left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactoring! A few suggestions and nits, nothing blocking.

"""


def test_signal_receivers_connect_at_setup_not_via_router() -> None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: This guard catches a receiver that connects only when the router builds, but it can't catch the failure mode this refactor is actually about: a receiver whose module gets dropped from its AppConfig.ready() entirely. With no import it never connects in any process, so before and after both lack it, the diff stays empty, and the test stays green while the audit log silently stops (say, on external-data-source writes in a Temporal worker). A future "remove unused import" cleanup that deletes one of the # noqa: F401 side-effect imports in an apps.py slips through the same way.

This test already computes the setup-time receiver snapshot (before, taken before rest_router is imported). Worth a companion that asserts the relocated receivers are present in that snapshot, one expected name per receiver moved into a ready(), so a missing wire fails loudly.

Comment thread ee/hogai/mcp_tool.py
import ee.hogai.tools # noqa: F401 - ensure tools are registered
from ee.hogai.tools import load_all_tools # noqa: PLC0415 - ensure tools are registered

load_all_tools()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: All three public readers of self._tools need the tools imported first, but only get and get_scopes call load_all_tools(). get_names() reads the dict directly, so it returns an empty or partial list depending on call order. Rather than adding the call a third time (and remembering it on the next accessor), consider having all the reads call one helper:

def _ensure_loaded(self) -> dict[str, MCPToolRegistration]:
    from ee.hogai.tools import load_all_tools  # noqa: PLC0415 - ensure tools are registered

    load_all_tools()
    return self._tools

and have get/get_scopes/get_names read from self._ensure_loaded(). That keeps the load-then-read invariant in one place. It stays a function-local import to avoid re-introducing the tools ↔ mcp_tool cycle.

self._start_counter: float | None = None
self._status_override: str | None = None

def set_status(self, status: str) -> None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: set_status lost its docstring when it moved here from ai_observability/metrics.py. The original explained when to call it ("Override the status that will be recorded. Use for non-exception outcomes like SKIPPED."), which is the only thing distinguishing it from letting __exit__ derive the status. Worth carrying over:

Suggested change
def set_status(self, status: str) -> None:
def set_status(self, status: str) -> None:
"""Override the status that will be recorded. Use for non-exception outcomes like SKIPPED."""
self._status_override = status

Comment thread posthog/api/__init__.py

# The API route aggregator (the DRF router build + ~200 viewset imports) lives in
# `posthog.api.rest_router`, imported lazily — only when a name it defines or re-exports
# (the router objects, view modules like `feature_flag`) is first accessed, which in practice
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: feature_flag was migrated into products/ and the aggregator no longer binds it, so make the example a name it actually re-exports, like the router object. Same at line 20.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

4 participants