feat(data-warehouse): add ClickHouse source #53601
Conversation
Adds a data warehouse source for ClickHouse, built to scale to very large databases via clickhouse-connect's streaming Arrow reader, free row/byte counts from system.tables, and sorting-key-based primary key discovery. Supports HTTPS, an SSL verification toggle, and SSH tunneling.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Size Change: +91 B (0%) · Total Size: 129 MB
Migration SQL Changes: Hey 👋, we've detected some migrations on this PR. Here's the SQL output for each migration; make sure they make sense:
🔍 Migration Risk Analysis: We've analyzed your migrations for potential risks. Summary: 0 Safe | 1 Needs Review | 0 Blocked
⏭️ Skipped snapshot commit because the branch advanced; the new commit will trigger its own snapshot update workflow. If you expected this workflow to succeed: this can happen due to concurrent commits. To get a fresh workflow run, either:
🎭 Playwright report · View test results →
These issues are not necessarily caused by your changes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Query snapshots: Backend query snapshots updated. Changes: 2 snapshots (2 modified, 0 added, 0 deleted).
…eature-clickhouse-source
- Bound the duplicate-primary-key probe to a 10M-row prefix with read_overflow_mode='break' so misconfiguration detection is O(budget) instead of a full-table GROUP BY on every incremental sync (a sketch follows below). The fail-safe flips to True on unexpected errors to block merges against unverifiable keys.
- Add optimize_read_in_order and max_bytes_before_external_sort to the data query. When the cursor leads the sorting key, the top-level sort is skipped; otherwise we spill to disk instead of OOMing. Warn when the cursor isn't a sort-key prefix.
- Accumulate streamed Arrow blocks into ~200 MiB / ~100k-row pa.Tables before yielding, collapsing the Delta commit count by ~5x on large tables without meaningfully raising peak memory.
- Replace the full-table row count on incremental resumes with a bounded WHERE cursor > last_value count so progress reporting tracks actual work. Default rows_to_sync to None instead of 0 when unknown.
- Widen _get_client exception wrapping to cover OSError and ssl.SSLError alongside ClickHouseError.

Made-with: Cursor
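For context, a minimal sketch of how such a bounded probe can look with clickhouse-connect. The helper and constant names mirror the PR, but the body here is an assumption, not the PR's code:

```python
DUPLICATE_PROBE_ROW_BUDGET = 10_000_000  # the 10M-row prefix mentioned above


def _quote_identifier(identifier: str) -> str:
    # Stand-in for the PR's quoting helper (which also rejects null bytes).
    return "`" + identifier.replace("`", "``") + "`"


def _has_duplicate_primary_keys(client, database: str, table_name: str, keys: list[str]) -> bool:
    # client: a clickhouse-connect Client
    key_list = ", ".join(_quote_identifier(k) for k in keys)
    qualified = f"{_quote_identifier(database)}.{_quote_identifier(table_name)}"
    sql = f"SELECT {key_list} FROM {qualified} GROUP BY {key_list} HAVING count() > 1 LIMIT 1"
    try:
        result = client.query(
            sql,
            settings={
                # Scan at most the budgeted prefix; 'break' stops reading instead of
                # erroring, so the probe costs O(budget) rather than a full-table scan.
                "max_rows_to_read": DUPLICATE_PROBE_ROW_BUDGET,
                "read_overflow_mode": "break",
            },
        )
        return len(result.result_rows) > 0
    except Exception:
        # Fail safe: if the probe cannot be evaluated, report duplicates so the
        # sync refuses to merge on a key it could not verify.
        return True
```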
…to dc-feature-clickhouse-source

Made-with: Cursor

# Conflicts:
#   posthog/api/test/__snapshots__/test_api_docs.ambr
…eature-clickhouse-source
- password: str -> str | None across clickhouse.py signatures (matches ClickHouseSourceConfig); coerce to "" at the clickhouse-connect boundary.
- pa.timestamp: branch on the optional tz and tighten _datetime_unit_for_precision's return type to Literal so the overload resolves (illustrated below).
- test: narrow response.items() away from AsyncIterable before list().

Made-with: Cursor
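A hedged illustration of the second bullet; the function names follow the commit, but the precision-to-unit mapping shown here is an assumption rather than the PR's exact code:

```python
from typing import Literal, Optional

import pyarrow as pa


def _datetime_unit_for_precision(precision: int) -> Literal["s", "ms", "us", "ns"]:
    # DateTime64(P): P is the number of sub-second digits; pick the smallest
    # Arrow unit that can represent it without loss.
    if precision == 0:
        return "s"
    if precision <= 3:
        return "ms"
    if precision <= 6:
        return "us"
    return "ns"


def _timestamp_type(precision: int, tz: Optional[str]) -> pa.DataType:
    unit = _datetime_unit_for_precision(precision)
    # Branch on the optional timezone instead of always passing tz=...,
    # which is what lets the pa.timestamp overload resolve under mypy.
    if tz is not None:
        return pa.timestamp(unit, tz=tz)
    return pa.timestamp(unit)
```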
Prompt To Fix All With AI: This is a comment left during a code review.
Path: posthog/temporal/data_imports/sources/clickhouse/clickhouse.py
Line: 423-428
Comment:
**Wrong precision/scale for Decimal shorthand types**
For `Decimal32(S)`, `Decimal64(S)`, `Decimal128(S)`, and `Decimal256(S)` the single argument is the **scale**, not the precision — the precision is fixed by the variant (9 / 18 / 38 / 76). The current regex puts `S` into group 1 and interprets it as precision with an implied scale of 0, so `Decimal32(4)` produces `pa.decimal128(4, 0)` instead of the correct `pa.decimal128(9, 4)`. ClickHouse sends Arrow data with the real precision/scale, so the registered Delta schema and the actual wire schema disagree — downstream writes can fail or silently corrupt values.
The test `test_decimal_types` only asserts `isinstance(…, Decimal128Type)` so it doesn't catch the wrong precision/scale values.
Suggested fix — split the two forms:
```python
_DECIMAL_FIXED_WIDTHS: dict[str, int] = {"32": 9, "64": 18, "128": 38, "256": 76}
_DECIMAL_FIXED_RE = re.compile(r"^Decimal(32|64|128|256)\(\s*(\d+)\s*\)$")
_DECIMAL_VAR_RE = re.compile(r"^Decimal\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\)$")
```
Then in `_inner_to_arrow_type`:
```python
match_fixed = _DECIMAL_FIXED_RE.match(inner)
if match_fixed is not None:
    precision = _DECIMAL_FIXED_WIDTHS[match_fixed.group(1)]
    scale = int(match_fixed.group(2))
    return build_pyarrow_decimal_type(precision, scale)

match_dec = _DECIMAL_VAR_RE.match(inner)
if match_dec is not None:
    precision = int(match_dec.group(1))
    scale = int(match_dec.group(2)) if match_dec.group(2) is not None else 0
    return build_pyarrow_decimal_type(precision, scale)
```
And the test should be extended to assert both `field.type.precision` and `field.type.scale`.
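For that last point, a possible extension of the test along those lines (a sketch; it assumes _inner_to_arrow_type is importable in the test module):

```python
import pyarrow as pa


def test_decimal_fixed_width_precision_and_scale():
    # Decimal32(4): precision is fixed at 9 by the variant, the lone argument is the scale.
    arrow_type = _inner_to_arrow_type("Decimal32(4)")
    assert isinstance(arrow_type, pa.Decimal128Type)
    assert arrow_type.precision == 9
    assert arrow_type.scale == 4
```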
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: posthog/temporal/data_imports/sources/clickhouse/clickhouse.py
Line: 672-699
Comment:
**`incremental_field_type` parameter accepted but never used**
`_build_query` accepts `incremental_field_type` solely to validate it is not `None`, but the value is never used in the query string or the returned parameter dict (the returned `{}` is always discarded with `_` at the call site in `get_rows`). The parameter is superfluous — the guard on `incremental_field` alone is sufficient, and the type-specific logic (`incremental_type_to_initial_value`) already lives in `get_rows`.
```suggestion
def _build_query(
    *,
    database: str,
    table_name: str,
    should_use_incremental_field: bool,
    incremental_field: Optional[str],
) -> str:
    """Build the data extraction query.

    Returns the SQL string. We never interpolate the incremental cursor
    value directly — only identifiers (which are validated) end up in the
    SQL string.
    """
    qualified = _qualified_table(database, table_name)
    if not should_use_incremental_field:
        return f"SELECT * FROM {qualified}"
    if incremental_field is None:
        raise ValueError("incremental_field can't be None when should_use_incremental_field is True")
    quoted_field = _quote_identifier(incremental_field)
    return f"SELECT * FROM {qualified} WHERE {quoted_field} > %(last_value)s ORDER BY {quoted_field} ASC"
```
The call site in `get_rows` would change to `query = _build_query(...)` and you can drop `incremental_field_type=incremental_field_type` from that call.
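Concretely, the adjusted call could look like this (variable names are assumed from the surrounding get_rows code, which isn't shown here):

```python
query = _build_query(
    database=database,
    table_name=table_name,
    should_use_incremental_field=should_use_incremental_field,
    incremental_field=incremental_field,
)
```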
How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(data-warehouse): satisfy mypy for Cl..."
Query snapshots: Backend query snapshots updated. Changes: 1 snapshot (1 modified, 0 added, 0 deleted).
- query_arrow_stream yields RecordBatches; switch the accumulator to pa.Table.from_batches to avoid the pa.concat_tables type mismatch.
- Build an explicit SELECT list and wrap Arrow-incompatible column types (UUID, IPv4/6, wide ints, Enum*, FixedString, Array, Map, Tuple, Nested, Variant, Dynamic, JSON, Object) in toString() to avoid ClickHouse error 50 on SELECT *.
- Extend row-count discovery to Distributed tables (SELECT count() fallback) and MaterializedViews (resolve the TO target, else the .inner_id inner table). Plain views and no-counter engines stay skipped. (Sketched below.)
- Upgrade discovery/query log lines to info so users see them on the syncs tab; add an entry log for get_rows().
- Frontend: show "Skipped" with an explanatory tooltip instead of "Unknown" when the row count is unavailable.
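As an illustration of the row-count branches above, a hedged sketch; the engine matching and the MaterializedView handling are simplified assumptions, and the PR's real get_clickhouse_row_count differs in detail:

```python
def get_row_count(client, database: str, table: str) -> int | None:
    # client: a clickhouse-connect Client
    rows = client.query(
        "SELECT engine, total_rows FROM system.tables WHERE database = %(db)s AND name = %(name)s",
        parameters={"db": database, "name": table},
    ).result_rows
    if not rows:
        return None
    engine, total_rows = rows[0]
    if engine.endswith("MergeTree"):
        # Maintained by the engine, so no table scan is needed.
        return total_rows
    if engine == "Distributed":
        # count() is pushed down to the shards, so this stays cheap.
        return client.query(f"SELECT count() FROM `{database}`.`{table}`").result_rows[0][0]
    if engine == "MaterializedView":
        # Resolve the TO target's total_rows, else the `.inner_id.<uuid>` table
        # (resolution omitted in this sketch).
        return None
    # Plain views and Memory/Buffer/Log/Kafka/URL engines: skipped.
    return None
```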
- Add get_primary_keys_for_schemas that reuses _get_primary_keys per table (a thin wrapper, sketched below) and wire detected_primary_keys into SourceSchema so the frontend can suggest sorting-key columns during setup.
- Split DecimalN(S) from Decimal(P[, S]) — the former has fixed precision (9/18/38/76) and the lone argument is the scale. The previous regex mis-mapped Decimal32(4) to Decimal(4, 0). Tests now assert exact precision and scale.
- Drop the unused incremental_field_type parameter from _build_query and return a plain SQL string instead of (str, dict). The type-aware cursor seeding already lives in get_rows.
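The wrapper from the first bullet amounts to roughly the following (a sketch; the real signature and error handling in the PR may differ):

```python
def get_primary_keys_for_schemas(client, database: str, table_names: list[str]) -> dict[str, list[str] | None]:
    # Reuse the per-table sorting-key lookup so setup can suggest columns per schema.
    return {name: _get_primary_keys(client, database, name) for name in table_names}
```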
Prompt To Fix All With AI: This is a comment left during a code review.
Path: products/data_warehouse/frontend/shared/components/forms/SchemaForm.tsx
Line: 109-113
Comment:
**ClickHouse-specific tooltip in a shared component**
The tooltip text references "Memory/Buffer/Log-engine tables, or Kafka/URL table functions" — these are ClickHouse engine names that make no sense to a Postgres, MySQL, or Snowflake user seeing a null row count. The `SchemaForm` component is shared across every source; any source that fails to return a row count (e.g. due to a permissions error, or simply because a given source never populates that field) will now surface ClickHouse-specific jargon to unrelated users.
```suggestion
return (
    <Tooltip title="Row count is unavailable for this table. The table can still be synced — we just don't know its size up front.">
        <span className="text-muted-alt cursor-help">Skipped</span>
    </Tooltip>
)
```
How can I resolve this? If you propose a fix, please make it concise.

Reviews (2): Last reviewed commit: "fix(data-warehouse): detect primary keys..."
Cast the table name parameter to str before dict lookup to match the typed dict's key.
Drop tests that only assert trivial string formatting or exact dict values — keep the ones that exercise real logic (regex parsing, error translation substring match, engine-specific row-count branches, etc.).
Query snapshots: Backend query snapshots updated. Changes: 1 snapshot (1 modified, 0 added, 0 deleted).
Check for an empty or missing table name before indexing so callers get the "Table name is missing" ValueError instead of an IndexError.
Resolve SchemaForm.tsx conflict — keep master's new primary-key column and nested LemonCollapse structure while preserving the "Skipped" tooltip that replaces the "Unknown" row-count label.
Problem
PostHog's data warehouse supports a long list of sources but not ClickHouse itself. Users running their own ClickHouse deployments want to pull that data into PostHog without building a custom pipeline, and the target databases are often multi-terabyte and multi-billion row — the import path has to stream, not buffer.
Changes
Adds a new warehouse source for ClickHouse under
posthog/temporal/data_imports/sources/clickhouse/, following the same split as the Postgres source (source.py for registration/form fields/validation, clickhouse.py for transport). Designed up front to scale to very large databases.

Scalability:
- Reads stream through clickhouse-connect with query_arrow_stream, so data flows as a sequence of pa.RecordBatch chunks sized by ClickHouse's max_block_size. Batches are accumulated into ~100k-row / 200 MiB pa.Tables via pa.Table.from_batches before yielding, so Delta sees fewer, larger commits (a sketch of this loop follows the list). Memory per worker is bounded regardless of table size.
- Row counts come from system.tables.total_rows for MergeTree tables — no SELECT COUNT(*) on multi-billion row tables. Distributed tables fall back to SELECT count() (cheap, distributed). MaterializedViews resolve to their TO target's total_rows or their .inner_id.<uuid> inner table. Plain views and no-counter engines (Memory/Buffer/Log/Kafka/URL) are reported as "Skipped" with an explanatory tooltip in the UI.
- Partition sizing uses system.tables.total_bytes/total_rows rather than sampling the table, targeting DEFAULT_PARTITION_TARGET_SIZE_IN_BYTES (200 MiB) per partition.
- Query-level settings: output_format_arrow_string_as_string=1, output_format_arrow_low_cardinality_as_dictionary=0, optimize_read_in_order=1, max_bytes_before_external_sort=500 MiB, max_execution_time, tunable max_block_size.
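A minimal sketch of the accumulation loop from the first bullet, assuming a clickhouse-connect client and following the PR's statement that the stream yields RecordBatches; thresholds and names are illustrative, not the PR's code:

```python
from collections.abc import Iterator

import pyarrow as pa

TARGET_BYTES = 200 * 1024 * 1024  # ~200 MiB per yielded table
TARGET_ROWS = 100_000


def stream_tables(client, query: str) -> Iterator[pa.Table]:
    # Collect RecordBatches from the Arrow stream into larger pa.Tables before
    # yielding, so the Delta writer sees fewer, bigger commits.
    batches: list[pa.RecordBatch] = []
    rows = 0
    nbytes = 0
    with client.query_arrow_stream(query) as stream:
        for batch in stream:
            batches.append(batch)
            rows += batch.num_rows
            nbytes += batch.nbytes
            if rows >= TARGET_ROWS or nbytes >= TARGET_BYTES:
                yield pa.Table.from_batches(batches)
                batches, rows, nbytes = [], 0, 0
    if batches:
        yield pa.Table.from_batches(batches)
```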
Arrow compatibility:
- ClickHouse's Arrow output refuses several common column types (error 50), so we build an explicit SELECT list and wrap those columns in toString(col) AS col so the stream never crashes (a sketch follows this list).
- Type mapping handles Nullable/LowCardinality wrappers, DateTime/DateTime64 with precision + timezone, Decimal[32-256], Date/Date32, signed/unsigned ints up to 64-bit (wide Int128/256 fall back to string), Enum8/16, and composites serialized to string.
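A hedged sketch of that explicit SELECT list; the prefix set is paraphrased from this PR's commits and the quoting is simplified, so treat it as an illustration rather than the shipped code:

```python
ARROW_INCOMPATIBLE_PREFIXES = (
    "UUID", "IPv4", "IPv6", "Int128", "Int256", "UInt128", "UInt256",
    "Enum8", "Enum16", "FixedString", "Array", "Map", "Tuple", "Nested",
    "Variant", "Dynamic", "JSON", "Object",
)


def _select_list(columns: list[tuple[str, str]]) -> str:
    # columns: (name, ClickHouse type with Nullable/LowCardinality already stripped)
    parts = []
    for name, ch_type in columns:
        quoted = f"`{name}`"  # the real code uses the PR's _quote_identifier helper
        if ch_type.startswith(ARROW_INCOMPATIBLE_PREFIXES):
            # ClickHouse's Arrow output rejects these types (error 50), so
            # serialize them to String on the server side.
            parts.append(f"toString({quoted}) AS {quoted}")
        else:
            parts.append(quoted)
    return ", ".join(parts)
```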
Schema discovery:
- Column metadata is read from system.columns for the whole database.
- Primary keys are detected via is_in_sorting_key on system.columns (a lookup sketched after this list). Because ClickHouse's sorting key is not necessarily unique, every incremental sync runs _has_duplicate_primary_keys first (bounded-prefix probe with max_rows_to_read + read_overflow_mode='break').
- Table engines come from system.tables.engine.
- Discovery/query log lines are emitted at info level so they surface on the syncs tab.
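A minimal sketch of the sorting-key lookup, assuming a clickhouse-connect client; ordering by column position is a simplification here, not necessarily what the PR does:

```python
def _get_primary_keys(client, database: str, table_name: str) -> list[str] | None:
    # Sorting-key columns double as suggested primary keys for incremental syncs.
    rows = client.query(
        """
        SELECT name
        FROM system.columns
        WHERE database = %(db)s AND table = %(table)s AND is_in_sorting_key = 1
        ORDER BY position
        """,
        parameters={"db": database, "table": table_name},
    ).result_rows
    return [r[0] for r in rows] or None
```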
Incremental sync:
- Supports integer (Int8-Int256, UInt8-UInt256) and temporal (Date, Date32, DateTime, DateTime64) cursor fields.
- The cursor value is bound as a query parameter (%(last_value)s) — only validated, backtick-quoted identifiers land in the SQL string. Identifier quoting escapes embedded backticks and rejects null bytes (see the sketch below).

Connection options: host, port, database, user, password (optional), HTTPS toggle, SSL-verify toggle, optional SSH tunnel. SSH tunnel works transparently because we use HTTP(S).
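A minimal sketch of the quoting rules and parameter binding described in the incremental-sync bullets; the helper body and the example column are assumptions, not the PR's code:

```python
def _quote_identifier(identifier: str) -> str:
    if "\x00" in identifier:
        # Null bytes are rejected outright rather than escaped.
        raise ValueError("Invalid identifier")
    return "`" + identifier.replace("`", "``") + "`"


# The cursor value itself is never interpolated; it travels as a bound parameter.
field = _quote_identifier("updated_at")  # hypothetical cursor column
sql = f"SELECT * FROM `analytics`.`events` WHERE {field} > %(last_value)s ORDER BY {field} ASC"
# client.query_arrow_stream(sql, parameters={"last_value": last_value})
```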
Registration / plumbing:
- Adds CLICKHOUSE to ExternalDataSourceType in products/data_warehouse/backend/types.py, posthog/schema.py (via schema:build), and frontend/src/queries/schema/schema-general.ts.
- Migration 0042_alter_externaldatasource_source_type adds the choice to the model.
- Registers ClickHouseSource in posthog/temporal/data_imports/sources/__init__.py.
- ClickHouseSourceConfig is regenerated via generate:source-configs.
- SchemaForm renders "Skipped" with a tooltip instead of "Unknown" when the row count is unavailable, explaining that counting would require a full scan.

How did you test this code?
This PR was authored by an agent (Claude Code). Verification so far is code-level plus one round of manual smoke-testing against a local ClickHouse:
- Unit tests in test_clickhouse.py covering identifier quoting, type-modifier stripping (Nullable/LowCardinality), incremental-field filtering across every supported CH type, query-builder output (including toString wrapping of Arrow-incompatible types), ClickHouseColumn → pa.Field mapping for all supported types (including DateTime64 precision/timezone, Decimal[32-256], wide ints, enums, composites), non-retryable error pattern matching, error translation, schema grouping (mocked client), validate_credentials error paths, batch-accumulation boundaries in get_rows, MV target parsing (qualified/backticked/unqualified/none), and get_clickhouse_row_count across MergeTree/Distributed/MV-with-TO/MV-inner/View paths. All 141 passing.
- Manual smoke test against a local ClickHouse: Distributed tables take the count() fallback, views render as "Skipped".

Follow-ups for merging:
- Add a clickhouse.png icon to frontend/public/services/ (the source currently references a placeholder path).
- Add docs at posthog.com/docs/cdp/sources/clickhouse (the URL referenced in docsUrl).

Publish to changelog?
Yes — new warehouse source.
🤖 LLM context
Authored by Claude Code (Opus 4.6, 1M context) across multiple sessions. The agent read the existing Postgres, Snowflake, and MySQL sources as reference, followed the
implementing-warehouse-sources skill, and chose clickhouse-connect's query_arrow_stream over clickhouse-driver specifically for the Arrow streaming path (which bounds memory on huge tables). Later sessions hardened the source after discovering real-world failures: ClickHouse's Arrow output refuses several common types (error 50), query_arrow_stream yields RecordBatch not Table, and system.tables.total_rows is NULL for Distributed tables and MaterializedViews — all now handled.