
docs: document Kafka streams mode for ksqlDB integration #10797

Merged
paveltiunov merged 11 commits into master from cursor/docs-ksqldb-kafka-streams-mode-623b on May 1, 2026

Conversation

@paveltiunov (Member) commented May 1, 2026

Check List

  • Tests have been run in packages where changes have been made if available
  • Linter has been run for changed code
  • Tests for the changes have been added if not covered yet
  • Docs have been added / updated if required

Description of Changes Made

Documents the Kafka streams mode for the ksqlDB integration. In this mode, Cube reads data directly from the underlying Kafka topics instead of going through the ksqlDB REST API for data streaming. Cube does not create any tables or streams in ksqlDB when this mode is enabled.

Changes

Mintlify docs (docs-mintlify/admin/connect-to-data/data-sources/ksqldb.mdx):

  • Added CUBEJS_DB_KAFKA_HOST, CUBEJS_DB_KAFKA_USER, CUBEJS_DB_KAFKA_PASS, and CUBEJS_DB_KAFKA_USE_SSL to the environment variables table
  • Added new "Kafka streams mode" section with:
    • Overview of what the mode does and how it differs from the default
    • "When to use" guidance (read-only ksqlDB, higher throughput, restricted permissions)
    • "Enabling Kafka streams mode" with .env examples including multi-broker setup
    • Confluent Cloud note about separate Kafka vs ksqlDB credentials
    • "How it works" numbered walkthrough
    • "Data modeling" subsection with a full lambda pre-aggregation example (batch + streaming cubes with rollup_lambda)
    • "Primary key and ungrouped queries" — explains that at least one dimension must have primary_key: true for the streaming pre-aggregation to work, because Cube Store's stream engine doesn't support GROUP BY; Cube auto-omits grouping when a primary key is present
    • "Unique key columns and deduplication" — __seq column, last-row-per-key semantics, Kafka message key fallback
    • "Stream format" — expected JSON format, case-sensitive field names, with Timestamp handling subsection covering string (ISO 8601/RFC 3339) and numeric (epoch ms) formats, PARSE_TIMESTAMP for non-standard formats, time dimension truncation via granularity, and date_trunc availability
    • "Filtering on the stream" — Cube Store applies SQL projections/filters on micro-batches
    • "Supported SQL syntax" — strict plan shape, supported/unsupported clauses, alias requirements
  • Fixed incorrect "Possible Values" for CUBEJS_DB_USER and CUBEJS_DB_PASS

Legacy Nextra docs (docs/content/product/configuration/data-sources/ksqldb.mdx):

  • Same content additions and fixes, adapted to Nextra component conventions
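
The variables listed above can be sketched in a .env file. This is a minimal illustration, assuming the comma-separated broker list format of Kafka bootstrap servers; all host names and credentials are hypothetical:

```ini
# Kafka streams mode: Cube reads topics directly from the brokers.
# All values below are placeholders.
CUBEJS_DB_TYPE=ksql
CUBEJS_DB_URL=https://ksqldb.example.com:8088
CUBEJS_DB_USER=ksql_user
CUBEJS_DB_PASS=ksql_password

# Multi-broker setup: comma-separated host:port pairs
CUBEJS_DB_KAFKA_HOST=broker-1.example.com:9092,broker-2.example.com:9092
CUBEJS_DB_KAFKA_USER=kafka_user
CUBEJS_DB_KAFKA_PASS=kafka_password
CUBEJS_DB_KAFKA_USE_SSL=true
```

On Confluent Cloud the Kafka credentials are separate from the ksqlDB credentials, which is why the CUBEJS_DB_KAFKA_* variables exist alongside CUBEJS_DB_USER/CUBEJS_DB_PASS.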

Add documentation for the Kafka streams mode, where Cube reads data
directly from Kafka topics instead of going through the ksqlDB REST API
for data streaming. In this mode, Cube does not create any tables or
streams in ksqlDB.

The documentation covers:
- What Kafka streams mode is and how it differs from the default mode
- When to use it (read-only ksqlDB, higher throughput, restricted perms)
- How to enable it via CUBEJS_DB_KAFKA_* environment variables
- How it works under the hood (metadata from ksqlDB, data from Kafka)
- Configuration via driverFactory for programmatic setup

Also fixes incorrect 'Possible Values' descriptions for CUBEJS_DB_USER
and CUBEJS_DB_PASS in the env vars table.

Updated both the Mintlify docs (docs-mintlify/) and the legacy Nextra
docs (docs/) for consistency.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
mintlify Bot commented May 1, 2026

Preview deployment for your docs:
cubed3: 🟢 Ready, updated May 1, 2026, 3:00 AM (UTC)


claude Bot commented May 1, 2026

Claude finished @paveltiunov's task in 2m 27s.


PR Review: docs: document Kafka streams mode for ksqlDB integration

  • Gather context and read changed files
  • Review mintlify docs changes
  • Review legacy Nextra docs changes
  • Check consistency between both doc versions
  • Verify documentation against driver implementation
  • Post review feedback

Overall

Good documentation addition. The content is well-structured, clearly explains the feature, and the driverFactory example matches the actual driver implementation. The bug fix for CUBEJS_DB_USER/CUBEJS_DB_PASS "Possible Values" (were incorrectly showing "A valid port number" / "A valid database name") is a nice catch.

I verified all constructor options (url, username, password, kafkaHost, kafkaUser, kafkaPassword, kafkaUseSsl) and environment variables against packages/cubejs-ksql-driver/src/KsqlDriver.ts — they all match.

Issues

1. Broken link anchors for new Kafka env vars (mintlify)
docs-mintlify/admin/connect-to-data/data-sources/ksqldb.mdx lines 77–80

The new Kafka env vars link to anchors like /reference/configuration/environment-variables#cubejs_db_kafka_host, but these anchors do not exist in the environment variables reference page — there are no CUBEJS_DB_KAFKA_* entries there. These links will 404 or land at the top of the page. Either:

  • Add the Kafka env vars to the environment variables reference page, or
  • Remove the links (use plain backtick formatting like the Nextra version does)


2. Inconsistent <EnvVar> component usage (Nextra)
docs/content/product/configuration/data-sources/ksqldb.mdx lines 69–72

The existing env vars use the <EnvVar> component (which auto-links to the reference page), but the new Kafka env vars use plain backtick formatting. This is visually inconsistent within the same table. If the Kafka env vars aren't in the reference (and shouldn't be linked), this might be intentional — but it would be good to add a brief note or make the formatting consistent.

3. Unrelated ref link changes (mintlify)
docs-mintlify/admin/connect-to-data/data-sources/ksqldb.mdx lines 190–191

The PR changed two reference links that are unrelated to the Kafka streams mode feature:

  • ref-powerbi: /admin/connect-to-data/visualization-tools/powerbi → /docs/integrations/power-bi
  • ref-dax-api: /reference/dax-api → /reference/core-data-apis/dax-api

The original ref-powerbi path (/admin/connect-to-data/visualization-tools/powerbi) has a valid file on disk, while the new path (/docs/integrations/power-bi) points to a directory. For ref-dax-api, the original path (/reference/dax-api) has no file while the new one (/reference/core-data-apis/dax-api) does exist. The powerbi change may be introducing a regression — worth verifying that Mintlify resolves directory paths to index pages correctly.

4. Missing trailing newline
Both files end without a trailing newline. Minor, but POSIX convention.

Nit

The driverFactory example hardcodes kafkaUseSsl: true instead of reading from process.env.CUBEJS_DB_KAFKA_USE_SSL. For consistency with the other properties (which all read from env vars), consider:

kafkaUseSsl: process.env.CUBEJS_DB_KAFKA_USE_SSL === "true",
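
The suggested fix can be sketched as a small helper that assembles the driver options from environment variables. The helper name ksqlDriverOptions and the sample values are hypothetical; the option keys match those listed in the review above:

```javascript
// Sketch: building KsqlDriver constructor options from env vars, parsing
// CUBEJS_DB_KAFKA_USE_SSL as a boolean as the review suggests.
// Helper name and sample values are hypothetical.
function ksqlDriverOptions(env) {
  return {
    url: env.CUBEJS_DB_URL,
    username: env.CUBEJS_DB_USER,
    password: env.CUBEJS_DB_PASS,
    kafkaHost: env.CUBEJS_DB_KAFKA_HOST,
    kafkaUser: env.CUBEJS_DB_KAFKA_USER,
    kafkaPassword: env.CUBEJS_DB_KAFKA_PASS,
    // "true" → true; unset or anything else → false,
    // matching the env var's documented false default
    kafkaUseSsl: env.CUBEJS_DB_KAFKA_USE_SSL === "true",
  };
}

const opts = ksqlDriverOptions({
  CUBEJS_DB_URL: "https://ksqldb.example.com:8088",
  CUBEJS_DB_KAFKA_HOST: "broker-1:9092,broker-2:9092",
  CUBEJS_DB_KAFKA_USE_SSL: "true",
});
console.log(opts.kafkaUseSsl); // true
```

Reading every property from process.env keeps the example consistent and avoids silently forcing SSL on for users who copy the snippet.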

claude Bot commented May 1, 2026

Claude finished @paveltiunov's task in 1m 27s.


PR Review: docs: document Kafka streams mode for ksqlDB integration

  • Gather context and read changed files
  • Review mintlify docs changes
  • Review legacy Nextra docs changes
  • Verify documentation against driver implementation
  • Check consistency between both doc versions
  • Post review feedback

Overall

Well-structured documentation addition. The content clearly explains the Kafka streams mode feature, when to use it, and how to configure it. The bug fix for CUBEJS_DB_USER/CUBEJS_DB_PASS "Possible Values" (which incorrectly showed "A valid port number" / "A valid database name") is a good catch.

I verified all constructor options (url, username, password, kafkaHost, kafkaUser, kafkaPassword, kafkaUseSsl) and environment variables against packages/cubejs-ksql-driver/src/KsqlDriver.ts and packages/cubejs-backend-shared/src/env.ts — they all match. The "How it works" section accurately reflects the getStreamingTableData method in the driver.

Both doc versions (mintlify and Nextra) are consistent in content, with appropriate component differences (<Info> vs <InfoBox>, <EnvVar> vs links, etc.).


Issues

1. Broken link anchors for new Kafka env vars (mintlify)
docs-mintlify/.../ksqldb.mdx lines 77–80

The new Kafka env vars link to anchors like /reference/configuration/environment-variables#cubejs_db_kafka_host, but these anchors do not exist in the environment variables reference page — I confirmed via grep that there are no CUBEJS_DB_KAFKA_* entries there. These links will 404 or land at the top of the page. Either:

  • Add the Kafka env vars to the environment variables reference page, or
  • Remove the links and use plain backtick formatting (like the Nextra version already does)


2. driverFactory example hardcodes kafkaUseSsl: true — both files

The driverFactory code example hardcodes kafkaUseSsl: true while every other property reads from process.env. For consistency and to match how users would actually configure this:

kafkaUseSsl: process.env.CUBEJS_DB_KAFKA_USE_SSL === "true",

This also matches the env var behavior defined in packages/cubejs-backend-shared/src/env.ts:489-493 where it defaults to false and parses as a boolean.


3. Missing trailing newline — both files

Both files end without a trailing newline (POSIX convention). Minor.


Nits

Inconsistent <EnvVar> component usage in Nextra
docs/.../ksqldb.mdx lines 69–72

The existing env vars use the <EnvVar> component (which auto-links to the reference), but the new Kafka env vars use plain backtick formatting. This is likely intentional since the Kafka vars aren't in the reference page, but it's visually inconsistent within the same table. If these vars are added to the reference page (per issue #1), they could also use <EnvVar>.

ref-powerbi path change in mintlify — line 190

Changed from /admin/connect-to-data/visualization-tools/powerbi to /docs/integrations/power-bi. The new path is a directory (with index.mdx inside). This should resolve correctly in Mintlify, but worth verifying the preview deployment to confirm.

ref-dax-api path change in mintlify — line 191

Changed from /reference/dax-api (which has no file on disk) to /reference/core-data-apis/dax-api (which exists) — this is a good fix.


…ams mode

Remove the driverFactory configuration section and replace it with a
Data modeling section that shows:
- How to configure ksqlDB as a named data source using decorated
  environment variables (CUBEJS_DS_KSQL_DB_*)
- How to create a cube with data_source: ksql that references an
  existing ksqlDB stream or table
- A complete cube definition with measures, dimensions, and a streaming
  pre-aggregation in both YAML and JavaScript

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
Replace the simple single-cube example with a full lambda pre-aggregation
pattern showing:
- A batch cube (order_events) querying a warehouse with FILTER_PARAMS,
  incremental daily partitions, and a rollup_lambda that merges batch
  and streaming rollups
- A streaming cube (order_events_stream) with data_source: ksql pointing
  at an existing ksqlDB stream, using read_only: true, stream_offset,
  unique_key_columns, and incremental refresh
- Documentation of key streaming pre-aggregation properties (read_only,
  stream_offset, unique_key_columns)
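
The lambda pattern above can be sketched in Cube's YAML data model syntax. The cube names come from the commit message; the warehouse table, column names, and granularity choices are hypothetical:

```yaml
cubes:
  # Batch cube querying the warehouse (table/column names hypothetical)
  - name: order_events
    sql: SELECT * FROM warehouse.order_events
    dimensions:
      - name: order_id
        sql: order_id
        type: string
        primary_key: true
      - name: created_at
        sql: created_at
        type: time
    pre_aggregations:
      - name: orders_batch
        dimensions: [CUBE.order_id]
        time_dimension: CUBE.created_at
        granularity: day
        partition_granularity: day
      # Lambda rollup merging the batch and streaming rollups
      - name: orders_lambda
        type: rollup_lambda
        rollups:
          - orders_batch
          - order_events_stream.orders_stream

  # Streaming cube pointing at an existing ksqlDB stream
  - name: order_events_stream
    data_source: ksql
    sql: SELECT * FROM ORDER_EVENTS_STREAM
    dimensions:
      - name: order_id
        sql: order_id
        type: string
        primary_key: true
      - name: created_at
        sql: created_at
        type: time
    pre_aggregations:
      - name: orders_stream
        dimensions: [CUBE.order_id]
        time_dimension: CUBE.created_at
        granularity: day
        stream_offset: earliest
        unique_key_columns: [order_id]
```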

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
It is true by default, so specifying it is redundant.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
Add three new subsections to the Kafka streams mode documentation:

- Unique key columns and deduplication: explains how __seq column is
  appended from Kafka offset, deduplication happens at read/compaction
  time (last row per key wins), and key values can fall back to Kafka
  message key when missing from payload
- Stream format: documents the expected JSON object format for Kafka
  message values, case-sensitivity of field names, and optional message
  key parsing
- Filtering on the stream: explains that Cube Store applies SELECT
  projections and WHERE filters from the cube's sql property directly
  on each micro-batch of Kafka messages, without creating any objects
  in ksqlDB

Also expands stream_offset documentation to explain defaults and
automatic resume behavior on subsequent refreshes.
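
The expected message value shape can be illustrated with a hypothetical order event. Field names are examples only; per the docs described above, Cube matches them case-sensitively and appends a __seq column derived from the Kafka offset:

```json
{
  "order_id": "o-1001",
  "status": "shipped",
  "amount": 49.95,
  "created_at": "2026-05-01T02:59:59.000Z"
}
```

If a unique key field is missing from the payload, the value can fall back to the Kafka message key, as the deduplication subsection explains.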

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
Add a 'Supported SQL syntax' subsection under 'Filtering on the stream'
that documents:
- The strict plan shape requirement (Projection > Filter > TableScan)
- Supported clauses: SELECT, WHERE with comparisons/boolean logic,
  IS NULL, IN, BETWEEN, CASE, CAST, EXTRACT, SUBSTRING, scalar
  functions, CONVERT_TZ, nested expressions
- Unsupported clauses: JOIN, subqueries, GROUP BY, HAVING, aggregates,
  ORDER BY, LIMIT/OFFSET, UNION/INTERSECT/EXCEPT, window functions,
  multiple FROM/WHERE, CTEs
- Alias requirement for non-column expressions
- Unique key column expression constraints
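
Under those constraints, a streaming cube's sql stays a single projection plus filter over one source. A hypothetical example (stream and column names are placeholders) that uses only the supported clauses:

```sql
SELECT
  order_id,
  status,
  -- non-column expressions must carry an alias
  CASE WHEN status = 'shipped' THEN 1 ELSE 0 END AS is_shipped,
  CAST(amount AS DOUBLE) AS amount
FROM ORDER_EVENTS_STREAM
WHERE status IS NOT NULL
  AND amount BETWEEN 0 AND 10000
-- no JOIN, GROUP BY, ORDER BY, LIMIT, subqueries, or CTEs
```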

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
Add PARSE_TIMESTAMP and FORMAT_TIMESTAMP to the supported functions
list, and add a paragraph explaining that time dimension truncation
(granularity) is fully supported via the
PARSE_TIMESTAMP(FORMAT_TIMESTAMP(CONVERT_TZ(...))) expression chain
that Cube generates automatically and Cube Store evaluates natively
as custom UDFs in its post-processing engine.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
Add a 'Timestamp handling' subsection under 'Stream format' that covers:
- String timestamps: ISO 8601 / RFC 3339 formats with examples
- Numeric timestamps: epoch milliseconds (not seconds/microseconds)
- PARSE_TIMESTAMP for converting non-standard timestamp formats
- Time dimension truncation via granularity (auto-generated
  PARSE_TIMESTAMP/FORMAT_TIMESTAMP/CONVERT_TZ chain)
- date_trunc availability as a standard SQL function

Move time dimension truncation docs from the Supported SQL syntax
section into the Timestamp handling section where it fits better.
Add date_trunc to the supported functions list.
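
As a sketch of the two accepted input forms and a PARSE_TIMESTAMP conversion (column names and the format pattern are hypothetical illustrations, not verified syntax):

```sql
-- Accepted as-is in the payload: ISO 8601 / RFC 3339 strings
-- such as "2026-05-01T03:00:00.000Z", or epoch milliseconds
-- (not seconds or microseconds).

-- Converting a non-standard string timestamp
SELECT
  order_id,
  PARSE_TIMESTAMP(created_at_raw, 'yyyy/MM/dd HH:mm:ss') AS created_at
FROM ORDER_EVENTS_STREAM
```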

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
Add 'Primary key and ungrouped queries' subsection explaining:
- Cube Store's stream post-processing does not support GROUP BY
- Cube omits GROUP BY when at least one dimension has primary_key: true
- This makes the query a simple SELECT ... FROM ... eligible for
  read-only streaming
- Without a primary key dimension, GROUP BY is generated and the
  pre-aggregation cannot use the streaming path

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
Update the ungrouped query requirement to state that all primary key
columns must be present in the streaming pre-aggregation's dimensions
list, not just one.
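
The requirement can be sketched as a YAML fragment (names hypothetical): every primary key column appears both as a primary_key dimension and in the pre-aggregation's dimensions list, so Cube can omit GROUP BY and keep the query a plain SELECT:

```yaml
# Fragment of a streaming cube definition
dimensions:
  - name: order_id
    sql: order_id
    type: string
    primary_key: true   # lets Cube omit GROUP BY for the streaming query
pre_aggregations:
  - name: orders_stream
    dimensions: [CUBE.order_id]   # all primary key columns must be listed
    time_dimension: CUBE.created_at
    granularity: day
```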

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
Wrap the data modeling example in CodeGroup/CodeTabs with both YAML and
JavaScript versions of the batch cube (order_events) and streaming cube
(order_events_stream) including all pre-aggregation configuration.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
paveltiunov merged commit 98128af into master on May 1, 2026. 7 checks passed.
paveltiunov deleted the cursor/docs-ksqldb-kafka-streams-mode-623b branch on May 1, 2026 at 15:51.