[Feature] SSE Materialized View Creation and Ingestion#18528

Merged
xiangfu0 merged 1 commit into apache:master from hongkunxu:feat/sse_mv_view_creation
May 20, 2026

@hongkunxu hongkunxu commented May 19, 2026

Related Issue

This PR is part 1/2 of the solution to PEP-Request #17298 — Support for Materialised Query Rewrite in Apache Pinot using MV Tables and Calcite.

| Part | Delivers | PR |
|------|----------|----|
| 1/2 | MV creation, metadata mapping, ingestion / incremental refresh | #18528 |
| 2/2 | Calcite-based broker query rewrite (auto-route base-table queries to the most cost-effective MV) | #18529 |

Part 1 — Materialized View: Create, Metadata & Ingestion

Summary

This PR introduces the storage / ingestion half of Apache Pinot's Materialized View (MV) feature: an MV is modeled as a derived Pinot table whose contents are built and incrementally refreshed by a Minion task framework that reacts to base-table segment changes. It lands the ZK-persisted MV definition + runtime model, the controller-side consistency manager that marks partitions STALE on base-table changes, partition-level fingerprinting (64-bit farmHashFingerprint64, a non-cryptographic hash) for change detection, and the APPEND / OVERWRITE / DELETE task lifecycle.

This PR is consumer-agnostic: once built, the MV table is queryable by name like any other Pinot OFFLINE table. The Companion PR (linked at the end) ships the broker-side rewrite engine that automatically routes user queries against the base table onto the MV.

Key capabilities:

  • ZK-persisted MV definition + runtime model, validated at controller table-create time via TaskConfigUtils.validateTaskConfigs
  • Event-driven freshness tracking: every controller-side ZK segment-metadata write notifies the consistency manager, which marks affected MV partitions STALE (debounce-buffered, CAS-protected)
  • Partition-level fingerprinting (farmHashFingerprint64, a 64-bit non-cryptographic hash, over the sorted (segmentName, crc) pairs) for change detection with a low collision probability
  • Minion APPEND / OVERWRITE / DELETE lifecycle with batched APPEND scheduling for fast historical back-fill (maxTasksPerBatch)
  • CAS-protected runtime updates under batched-completion contention; typed CasConflictException separates retry-worthy conflicts from real validation errors
  • Streaming gRPC executor — query results are pulled frame-by-frame and chunked directly into segment builds, so heap residency is bounded by maxNumRecordsPerSegment, not by total window row count
  • Defense-in-depth against silent result truncation (analyzer LIMIT probe + generator verify-re-parse + executor end-of-stream saturation gate)
  • Lineage rollback on task failure
  • Base-table delete is blocked when dependent MVs exist (rejects with the dependent MV list)
  • Configurable broker-gRPC client (TLS, max-inbound-message-size, keepalive) via pinot.minion.materializedview.broker.grpc.*
  • All hard caps are cluster-config-overridable at runtime (no restart needed)
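The partition fingerprint described above can be sketched roughly as follows. This is a minimal illustration, not the PR's Java code: the function name `partition_fingerprint` is hypothetical, and `hashlib.blake2b` with an 8-byte digest stands in for farmHashFingerprint64, which is not in the Python standard library. The point it shows is that sorting the (segmentName, crc) pairs makes the fingerprint independent of listing order.

```python
import hashlib
from typing import Iterable, Tuple


def partition_fingerprint(segments: Iterable[Tuple[str, int]]) -> str:
    """Order-independent 64-bit fingerprint over (segmentName, crc) pairs.

    Assumption: blake2b (8-byte digest) is only a stand-in for
    farmHashFingerprint64; CRC values are assumed non-negative.
    """
    digest = hashlib.blake2b(digest_size=8)
    for name, crc in sorted(segments):
        encoded = name.encode("utf-8")
        # Length-prefix the name so adjacent fields cannot blur together.
        digest.update(len(encoded).to_bytes(4, "big"))
        digest.update(encoded)
        digest.update(crc.to_bytes(8, "big"))
    return digest.hexdigest()
```

A changed CRC or an added / removed segment yields a different fingerprint, while re-listing the same segments in another order does not.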

Scope

| In this PR | Deferred to Companion PR |
|------------|--------------------------|
| pinot-materialized-view module: analysis/, consistency/, metadata/, scheduler/, executor/, context/ | rewrite/ package (AggregationEquivalenceRegistry + Passthrough/SketchMerge rules); lands with the engine that applies them |
| Controller-side notifyMaterializedView* hooks on PinotHelixResourceManager (every segment-metadata write) and SegmentDeletionManager | Broker MaterializedViewQueryRewriteEngine + subsumption strategies |
| Wiring in BaseControllerStarter: MaterializedViewConsistencyManager registered before segment lifecycle listeners | MaterializedViewMetadataCache on the broker |
| MaterializedViewTask minion plugin (generator + executor + factory + observer) | BrokerResponse.materializedViewQueried annotation |
| MaterializedViewTask constants moved from MinionConstants to CommonConstants (pinot-spi) | BaseSingleStageBrokerRequestHandler MV integration |
| airlineStatsMv quickstart with MV-vs-base result comparison for every demonstrated aggregation | Per-query rewriteEnabled enforcement + SLO-based eligibility gate |
| pinot-materialized-view/DESIGN.md covering today's time-windowed model and the planned fixed-partition extension | |

Design Doc

Long-form design doc: https://docs.google.com/document/d/1ToJfN42IMNySEY8YODb99Beis9YpLa8A8OWLcPcvG0M/edit?usp=sharing

In-repo design notes for the next contributor: pinot-materialized-view/DESIGN.md — covers today's time-windowed partition model, what's already partition-shape-neutral, and the migration plan for fixed-partition (CATEGORICAL / SINGLETON) MVs.

Architecture (this PR)

graph TB
    subgraph User
        Q["User Query<br/>(queries airlineStatsMv directly by name<br/>until Companion PR lands)"]
    end

    subgraph Controller
        PHRM["PinotHelixResourceManager<br/>createSegmentZkMetadata / updateZkMetadata<br/>notifyMaterializedView*"]
        SDM["SegmentDeletionManager<br/>notifyMaterializedView*"]
        MVCM["MaterializedViewConsistencyManager<br/>debounce + CAS-mark STALE<br/>onBaseTableDataChange(range)<br/>onBaseTableFullInvalidation()"]
        TASKVAL["TaskConfigUtils.validateTaskConfigs<br/>→ MaterializedViewAnalyzer.analyze<br/>(REST table-create / update)"]
    end

    subgraph ZK["ZooKeeper"]
        DEF["/CONFIGS/MATERIALIZED_VIEW/DEFINITION<br/>definedSql, baseTables, splitSpec,<br/>partitionExprMaps,<br/>rewriteEnabled, stalenessThresholdMs"]
        RUN["/CONFIGS/MATERIALIZED_VIEW/RUNTIME<br/>watermarkMs +<br/>partitions { bucketStartMs →<br/>(state, fingerprint, lastRefreshTime) }<br/>typed ZNRecord MapField per partition"]
    end

    subgraph Minion
        TG["MaterializedViewTaskGenerator<br/>APPEND / OVERWRITE / DELETE<br/>maxTasksPerBatch"]
        TE["MaterializedViewTaskExecutor<br/>1. stream gRPC rows from broker<br/>2. chunk into segments (bounded heap)<br/>3. upload + segment-replace<br/>4. CAS-update runtime metadata"]
        GRPC_FACTORY["GrpcMaterializedViewQueryExecutor<br/>broker discovery + round-robin<br/>BrokerGrpcQueryClient cache<br/>QueryHandle exposes Iterator<Object[]>"]
    end

    subgraph Broker
        BG["gRPC broker query endpoint<br/>existing, unchanged"]
    end

    BT["Base Table"] -->|segment add/replace/delete| PHRM
    PHRM --> MVCM
    SDM --> MVCM
    MVCM -->|CAS write STALE via persist| RUN
    TASKVAL -.->|reads| DEF

    TG -->|read| DEF
    TG -->|read| RUN
    TG -->|emit task| TE
    TE --> GRPC_FACTORY
    GRPC_FACTORY -->|SQL| BG
    BG -->|streamed frames| TE
    TE -->|segment upload| Server[Server segments]
    TE -->|CAS update runtime via persist| RUN

The Companion PR adds a broker-side rewrite engine on top of RUN that lets a query against the base table get answered from the MV transparently. This PR stops at "MV table is queryable by name."

Coverage Model

The MV runtime ZNode tracks coverage as a per-partition map keyed by bucketStartMs:

  • partitions[bucketStart] = { state: VALID | STALE, fingerprint, lastRefreshTime } — partition present
  • Bucket absent from the map = MV does not cover that time range. The broker's rewrite engine (Companion PR) routes queries for those ranges to the base table. The consistency manager never synthesizes STALE entries for absent buckets.
  • watermarkMs — highest contiguous VALID prefix from epoch. Advances monotonically as APPEND tasks succeed; resets on OVERWRITE that punches a hole below the current watermark.
  • DELETE task removes a partition entry — there is no separate EXPIRED state.
  • PartitionInfo is serialized as a typed Map<String,String> ZNRecord MapField per bucket (keys: state, segmentCount, crc, lastRefreshTime), forward-compatible to new keys via "unknown keys ignored on read".
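One way to read the watermark rule above is the sketch below. It is an illustration under stated assumptions, not the PR's implementation: the function name and the `first_bucket_start_ms` parameter are hypothetical, and the watermark is taken to be the exclusive end of the contiguous VALID prefix, which matches the APPEND window [watermarkMs, watermarkMs + bucketMs) described under Task Lifecycle.

```python
from typing import Dict

VALID, STALE = "VALID", "STALE"


def contiguous_valid_watermark(
    partitions: Dict[int, str], first_bucket_start_ms: int, bucket_ms: int
) -> int:
    """Return the end of the longest run of VALID buckets starting at the first bucket.

    A STALE bucket or an absent bucket (no coverage) both stop the scan.
    """
    watermark = first_bucket_start_ms
    while partitions.get(watermark) == VALID:
        watermark += bucket_ms
    return watermark
```

With daily buckets where day 0 and day 1 are VALID and day 2 is STALE, the watermark sits at the start of day 2 even if day 3 is VALID, which is the "hole below the watermark" case the OVERWRITE bullet refers to.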

Key Components

| Component | Module | Description |
|-----------|--------|-------------|
| MaterializedViewConsistencyManager | pinot-materialized-view | Event-driven STALE marking on base-table changes; debounce-buffered with CAS retry; registered on the controller before lifecycle listener init. Lock-free fast-path skips work on clusters with no dependent MVs. |
| MaterializedViewDefinitionMetadata | pinot-materialized-view | ZK-persisted MV config (SQL, base tables, split spec, rewriteEnabled, stalenessThresholdMs) |
| MaterializedViewRuntimeMetadata | pinot-materialized-view | ZK-persisted mutable MV state: watermarkMs + per-bucket partitions map (typed ZNRecord MapField per partition); defensive-copy on construction |
| MaterializedViewRuntimeMetadataUtils | pinot-materialized-view | Central persist(...) helper. Throws typed CasConflictException on CAS failure so callers distinguish "retry me" from validation / transport errors. |
| PartitionInfo / PartitionFingerprint / PartitionState | pinot-materialized-view | Per-bucket state (VALID/STALE) + farmHashFingerprint64 over sorted (segmentName, crc). PartitionFingerprint.encodeMap sorts by key for deterministic output. |
| MaterializedViewAnalyzer | pinot-materialized-view | SQL validation invoked at controller table-config validation time: required bucketTimePeriod, LIMIT/OFFSET/nested-SELECT rules, time-column format inference, source-table eligibility (rejects UPSERT/DEDUP/DIMENSION/REFRESH/REALTIME sources). |
| MaterializedViewTaskScheduler | pinot-materialized-view | APPEND/OVERWRITE/DELETE task selection; calls MaterializedViewAnalyzer.analyze from validateTaskConfigs (wired through TaskConfigUtils at REST table-create time). |
| MaterializedViewTaskGenerator | pinot-plugins/pinot-minion-tasks | Minion plugin: registers MaterializedViewTask and delegates to the scheduler. |
| MaterializedViewTaskExecutor | pinot-plugins/pinot-minion-tasks | Executes MV build tasks: streams broker rows, chunks into segments, uploads, CAS-updates runtime. End-of-stream saturation gate (totalRows >= effectiveLimit) fails BEFORE segment commit; narrow ZkException-only retry boundary; lineage rollback on failure. |
| GrpcMaterializedViewQueryExecutor | pinot-materialized-view | Streaming gRPC client to the broker query endpoint; returns a QueryHandle exposing schema + Iterator<Object[]> so the executor pulls rows frame-by-frame. Discovers brokers via Helix, round-robins, caches one BrokerGrpcQueryClient per endpoint, evicts stale clients. Configured via pinot.minion.materializedview.broker.grpc.* (TLS, max inbound size, keepalive). |
| MaterializedViewQuickStart | pinot-tools | End-to-end quickstart that loads the airlineStats fixture, materializes airlineStatsMv, then runs each aggregation against both tables and asserts results match (numeric-tolerant comparator). |

Quick Start

bin/pinot-admin.sh QuickStart -type MATERIALIZED_VIEW

Launches a local cluster, loads airlineStats (base) and airlineStatsMv (the MV), back-fills all 31 days of MV segments via batched APPEND tasks (maxTasksPerBatch=31), then runs each demo aggregation twice:

  1. Against the base table directly
  2. Against the MV table using the re-aggregation form (SUM over sum_ArrDelay, SUM over flight_count, DISTINCTCOUNTHLL over the raw-sketch column, etc.)

The quickstart prints both result sets side by side and reports MATCH / DIFFER so you can see the correctness contract that the Companion PR's rewriter will rely on. End-to-end smoke test results: MV table created, 432 pre-aggregated rows materialized in ~40s, all 5 base-vs-MV comparisons return MATCH.

SUM and COUNT: top 10 carriers by total arrival delay — base table
+---------+-------------+---------+
| Carrier | total_delay | flights |
+---------+-------------+---------+
| AA      | 12345       | 1023    |
| DL      | 11890       |  998    |
...
SUM and COUNT: top 10 carriers by total arrival delay — MV table
+---------+-------------+---------+
| Carrier | total_delay | flights |
+---------+-------------+---------+
| AA      | 12345       | 1023    |
| DL      | 11890       |  998    |
...
*** Base and MV results MATCH ***

Configuration Reference

MV task config — task.taskTypeConfigsMap.MaterializedViewTask

| Key | Default | Required | Description |
|-----|---------|----------|-------------|
| definedSQL | | yes | The SQL that defines the MV (validated at table-create time). LIMIT optional; OFFSET rejected; nested SELECTs rejected. |
| bucketTimePeriod | | yes | Per-partition window size (Joda period: 1d, 1h, 30m). Required: no implicit default to avoid silent drift. |
| bufferTimePeriod | 0d | no | Lag behind wall-clock before a window is eligible for materialization. Must be >= 0. |
| maxTasksPerBatch | 4 | no | Max APPEND tasks scheduled per generator cycle. Range [1, 1000]. |
| maxNumRecordsPerSegment | 5_000_000 | no | Output segment size cap. The streaming executor uses this as the chunk size: heap residency is bounded by this value, not by total window row count. |
| stalenessThresholdMs | 0 (no SLO) | no | Per-MV SLO consumed by the Companion PR's broker isEligible check. Persisted today on the definition znode. |

Minion gRPC client config — pinot.minion.materializedview.broker.grpc.*

The minion's broker gRPC client defaults to plaintext, which is fine for local quickstarts; on TLS-enabled clusters the settings below must be configured. MaterializedViewTaskExecutorFactory reads the keys under this prefix as a config subset when it builds the gRPC client.

# Disable plaintext on TLS clusters
pinot.minion.materializedview.broker.grpc.usePlainText=false

# TLS material (any key supported by TlsUtils.extractTlsConfig under prefix 'tls.')
pinot.minion.materializedview.broker.grpc.tls.keystore.path=/etc/pinot/tls/keystore.jks
pinot.minion.materializedview.broker.grpc.tls.keystore.password=...
pinot.minion.materializedview.broker.grpc.tls.truststore.path=/etc/pinot/tls/truststore.jks
pinot.minion.materializedview.broker.grpc.tls.truststore.password=...

# Optional: raise inbound message size for large MV result frames (default 128 MiB)
pinot.minion.materializedview.broker.grpc.maxInboundMessageSizeBytes=268435456

# Optional: keepalive tuning
pinot.minion.materializedview.broker.grpc.channelKeepAliveTimeSeconds=30

Per-task auth credentials (Bearer tokens, etc.) are unaffected by this prefix — they flow through the task's AuthProvider as gRPC metadata.

Caps and cluster-config overrides

Each cap has a compile-time default and a Helix CLUSTER-scope cluster-config key. The consumer site reads the override live on every call (no controller / minion restart needed); an unset / malformed / non-positive value falls back to the compile-time default.

| Compile-time default | Value | Cluster-config key | Purpose |
|----------------------|-------|--------------------|---------|
| DEFAULT_MATERIALIZED_VIEW_QUERY_LIMIT | 1_000_000 | pinot.materialized.view.query.default.limit | Auto-injected LIMIT when definedSQL omits one. Per-MV alternative: declare an explicit LIMIT N in definedSQL. |
| MAX_MATERIALIZED_VIEW_QUERY_LIMIT | 100_000_000 | pinot.materialized.view.query.max.limit | Hard ceiling on any user-declared LIMIT. Sized so a single window cannot OOM the executor's segment-build chunk. |
| MAX_TASKS_PER_BATCH_USER_CAP | 1_000 | pinot.materialized.view.scheduler.max.tasks.per.batch.cap | User-facing upper bound on per-MV maxTasksPerBatch. |
| DEFAULT_MAX_BATCH_LOOP_ITERATIONS | 100_000 | pinot.materialized.view.scheduler.max.batch.loop.iterations | Cap on the scheduler's APPEND-window enumeration loop. |
| DEFAULT_MAX_RUNTIME_UPDATE_ATTEMPTS | 128 | pinot.materialized.view.executor.runtime.update.max.attempts | Executor's CAS retry budget on the runtime znode under batched-completion contention. |
| DEFAULT_DEBOUNCE_DELAY_MS | 5_000 ms | pinot.materialized.view.consistency.debounce.ms | Consistency manager debounce window. Lower = faster STALE-marking visibility (more ZK writes); higher = better batching (slower visibility). |

Setting an override (via the controller REST /cluster/configs):

curl -X POST 'http://<controller>:9000/cluster/configs' \
  -H 'Content-Type: application/json' \
  -d '{
    "pinot.materialized.view.consistency.debounce.ms": "10000",
    "pinot.materialized.view.scheduler.max.tasks.per.batch.cap": "2000"
  }'

Reads are live — operators do not need to restart any role for the new value to take effect. Existing in-flight tasks complete with whatever value they captured at the start of their cycle.
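The fallback rule for overrides (unset, malformed, or non-positive values revert to the compile-time default) can be sketched as below. The helper name `resolve_cap` and the plain-dict cluster config are assumptions made for illustration; the PR's own code reads the Helix CLUSTER-scope config in Java.

```python
from typing import Mapping

DEFAULT_DEBOUNCE_DELAY_MS = 5_000
DEBOUNCE_KEY = "pinot.materialized.view.consistency.debounce.ms"


def resolve_cap(cluster_config: Mapping[str, str], key: str, default: int) -> int:
    """Read a cap override on every call; fall back to the default when unusable."""
    raw = cluster_config.get(key)
    if raw is None:  # unset
        return default
    try:
        value = int(raw.strip())
    except ValueError:  # malformed
        return default
    return value if value > 0 else default  # non-positive


# Example: an override of 10 s replaces the 5 s compile-time default.
debounce_ms = resolve_cap({DEBOUNCE_KEY: "10000"}, DEBOUNCE_KEY, DEFAULT_DEBOUNCE_DELAY_MS)
```

Because the lookup happens at the call site rather than at startup, a new value is picked up on the next call without a restart.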

Defense-in-depth against silent truncation

definedSQL may omit LIMIT. The system protects against silent result truncation in the materialized segments at three layers:

1. Analyzer (table-create time, REST API):

  • Required bucketTimePeriod — no implicit default.
  • LIMIT present → validates positive bound; rejects values > MAX_MATERIALIZED_VIEW_QUERY_LIMIT.
  • LIMIT absent → simulates the generator's auto-injection probe and re-parses to verify the LIMIT survives. Catches trailing line/block comments.
  • OFFSET rejected. Nested SELECTs rejected.
  • Time-column format/granularity inferred from the SELECT expression and matched against the MV schema's DateTimeFieldSpec.
  • Source table cannot be UPSERT / DEDUP / DIMENSION / REFRESH / REALTIME.

2. Generator (per scheduling cycle):

  • Pre-resolves effectiveLimit via AST.
  • Appends LIMIT effectiveLimit to the broker SQL when the user did not declare one.
  • Runs an unconditional verify-re-parse on the broker-bound SQL.
  • Stores the value in EFFECTIVE_LIMIT_KEY for the executor.

3. Executor (per task):

  • parseEffectiveLimit fails loud if EFFECTIVE_LIMIT_KEY is missing or non-positive.
  • End-of-stream saturation gate: after consuming all rows from the streaming gRPC iterator, if totalRows >= effectiveLimit the task fails BEFORE any segment is committed via lineage. The broker enforces LIMIT by truncating at exactly N rows, so receiving exactly N is treated as "possibly truncated" — conservative by design.
  • Fingerprint validation is hoisted outside the CAS retry loop — a real source drift fails fast with an actionable message instead of being retried 128× and surfacing as "Failed after N attempts".
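The executor-side behaviour (chunked consumption plus the end-of-stream saturation gate) might look roughly like the sketch below. All names here (`build_segments`, `SaturatedLimitError`, the `build_segment` and `commit` callbacks) are hypothetical stand-ins; only the rule "fail before commit when totalRows >= effectiveLimit" comes from the description above.

```python
from typing import Callable, Iterable, List, Sequence


class SaturatedLimitError(RuntimeError):
    """Hypothetical error: the result may have been truncated by LIMIT."""


def build_segments(
    rows: Iterable[Sequence],
    effective_limit: int,
    max_records_per_segment: int,
    build_segment: Callable[[List[Sequence]], object],
    commit: Callable[[List[object]], None],
) -> int:
    if effective_limit <= 0:
        raise ValueError("effectiveLimit must be positive")
    built: List[object] = []       # handles to built segments, not the rows themselves
    chunk: List[Sequence] = []     # at most max_records_per_segment rows held at once
    total_rows = 0
    for row in rows:               # rows are pulled lazily from the stream
        chunk.append(row)
        total_rows += 1
        if len(chunk) == max_records_per_segment:
            built.append(build_segment(chunk))
            chunk = []
    if chunk:
        built.append(build_segment(chunk))
    # Receiving exactly effective_limit rows is treated as "possibly truncated".
    if total_rows >= effective_limit:
        raise SaturatedLimitError(f"{total_rows} rows >= effectiveLimit {effective_limit}")
    commit(built)                  # only reached when the gate passes
    return total_rows
```

The gate runs after the stream is drained but before `commit`, so a saturated window leaves nothing committed.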

Task Lifecycle

| Mode | When | Effect |
|------|------|--------|
| APPEND | Watermark can advance into a new bucket window past bufferTimePeriod | Materializes [watermarkMs, watermarkMs + bucketMs), advances watermark on success. With maxTasksPerBatch > 1, schedules up to N consecutive windows per cycle. |
| OVERWRITE | A partition was marked STALE by the consistency manager and a fresh fingerprint computation confirms a real change | Re-materializes the affected window; replaces existing MV segments via segment-replace lineage. |
| DELETE | A STALE partition's fresh fingerprint shows zero source segments (retention deleted the base data) | Removes MV segments and the partition entry from the runtime map; no broker query. Triggered internally by the executor, no separate operator-facing state. |

DELETE / OVERWRITE are mutually exclusive (both touch existing MV segments via segment-replace) and gated at one in-flight per MV table. APPEND is gated by maxTasksPerBatch.
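Batched APPEND scheduling can be pictured with the sketch below. It assumes a window becomes eligible once its end is at or before wall-clock time minus bufferTimePeriod; that eligibility rule, the function name, and the parameter names are illustrative assumptions rather than the scheduler's actual code.

```python
from typing import List, Tuple


def plan_append_windows(
    watermark_ms: int,
    now_ms: int,
    bucket_ms: int,
    buffer_ms: int,
    max_tasks_per_batch: int,
) -> List[Tuple[int, int]]:
    """Enumerate up to max_tasks_per_batch consecutive [start, end) windows."""
    windows: List[Tuple[int, int]] = []
    start = watermark_ms
    cutoff = now_ms - buffer_ms  # windows ending after this are not yet eligible
    while len(windows) < max_tasks_per_batch and start + bucket_ms <= cutoff:
        windows.append((start, start + bucket_ms))
        start += bucket_ms
    return windows
```

With daily buckets and maxTasksPerBatch=31, a single cycle on an empty MV would plan all 31 quickstart days at once, which is the back-fill case the quickstart exercises.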

End-to-end Example

1. Base table (airlineStats) — already part of the standard quickstart

2. MV table config (airlineStatsMv_offline_table_config.json)

{
  "tableName": "airlineStatsMv",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "timeColumnName": "tsMs",
    "timeType": "MILLISECONDS",
    "segmentPushType": "APPEND",
    "replication": "1"
  },
  "tableIndexConfig": { "loadMode": "MMAP" },
  "task": {
    "taskTypeConfigsMap": {
      "MaterializedViewTask": {
        "definedSQL": "SELECT DaysSinceEpoch * 86400000 AS tsMs, Carrier, SUM(ArrDelay) AS sum_ArrDelay, COUNT(*) AS flight_count, MIN(ArrDelay) AS min_ArrDelay, MAX(ArrDelay) AS max_ArrDelay, DISTINCTCOUNTRAWHLL(FlightNum) AS raw_hll_FlightNum, DISTINCTCOUNTRAWHLLPLUS(FlightNum) AS raw_hllplus_FlightNum FROM airlineStats GROUP BY DaysSinceEpoch * 86400000, Carrier",
        "bucketTimePeriod": "1d",
        "maxTasksPerBatch": "31"
      }
    }
  }
}

3. MV schema (airlineStatsMv_schema.json)

{
  "schemaName": "airlineStatsMv",
  "dimensionFieldSpecs": [
    { "name": "Carrier", "dataType": "STRING" }
  ],
  "metricFieldSpecs": [
    { "name": "sum_ArrDelay", "dataType": "LONG" },
    { "name": "flight_count", "dataType": "LONG" },
    { "name": "min_ArrDelay", "dataType": "INT" },
    { "name": "max_ArrDelay", "dataType": "INT" },
    { "name": "raw_hll_FlightNum", "dataType": "BYTES" },
    { "name": "raw_hllplus_FlightNum", "dataType": "BYTES" }
  ],
  "dateTimeFieldSpecs": [
    { "name": "tsMs", "dataType": "TIMESTAMP", "format": "1:MILLISECONDS:TIMESTAMP", "granularity": "1:MILLISECONDS" }
  ]
}

4. Querying the MV directly (until Companion PR lands)

Re-aggregation rules used by both the quickstart's verification step and the Companion PR's broker rewriter:

| User query (against base) | Equivalent MV query |
|---------------------------|---------------------|
| SUM(ArrDelay) | SUM(sum_ArrDelay) |
| COUNT(*) | SUM(flight_count); COUNT is a SUM-with-trivial-input under re-aggregation |
| MIN(ArrDelay) | MIN(min_ArrDelay) |
| MAX(ArrDelay) | MAX(max_ArrDelay) |
| DISTINCTCOUNTHLL(FlightNum) | DISTINCTCOUNTHLL(raw_hll_FlightNum); the same function applied to the raw-sketch column merges sketches |
| DISTINCTCOUNTHLLPLUS(FlightNum) | DISTINCTCOUNTHLLPLUS(raw_hllplus_FlightNum) |

Concrete example:

-- Base query (the quickstart runs this against airlineStats)
SELECT Carrier,
       SUM(ArrDelay) AS total_delay,
       COUNT(*) AS flights
FROM airlineStats
WHERE DaysSinceEpoch < 16102
GROUP BY Carrier
ORDER BY Carrier;

-- Re-aggregated MV query (the quickstart runs this against airlineStatsMv,
-- expects identical rows once back-fill completes)
SELECT Carrier,
       SUM(sum_ArrDelay) AS total_delay,
       SUM(flight_count) AS flights
FROM airlineStatsMv
GROUP BY Carrier
ORDER BY Carrier;
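Why the two queries agree can be checked on toy data: summing per-day partial sums and per-day counts gives the same totals as aggregating the raw rows. The rows below are made-up illustrative values, not the airlineStats fixture, and the function names are hypothetical.

```python
from collections import defaultdict

# Toy rows: (DaysSinceEpoch, Carrier, ArrDelay). Illustrative values only.
BASE_ROWS = [
    (16071, "AA", 10), (16071, "AA", -4), (16071, "DL", 7),
    (16072, "AA", 3), (16072, "DL", 0), (16072, "DL", 12),
]


def aggregate_base(rows):
    """SUM(ArrDelay), COUNT(*) grouped by Carrier, straight from base rows."""
    out = defaultdict(lambda: [0, 0])
    for _day, carrier, delay in rows:
        out[carrier][0] += delay
        out[carrier][1] += 1
    return {c: (s, n) for c, (s, n) in out.items()}


def materialize(rows):
    """Pre-aggregate per (day, Carrier), like the MV's definedSQL."""
    buckets = defaultdict(lambda: [0, 0])
    for day, carrier, delay in rows:
        buckets[(day, carrier)][0] += delay
        buckets[(day, carrier)][1] += 1
    return [(day, carrier, s, n) for (day, carrier), (s, n) in sorted(buckets.items())]


def reaggregate(mv_rows):
    """SUM(sum_ArrDelay), SUM(flight_count) grouped by Carrier, from MV rows."""
    out = defaultdict(lambda: [0, 0])
    for _day, carrier, sum_delay, flight_count in mv_rows:
        out[carrier][0] += sum_delay
        out[carrier][1] += flight_count
    return {c: (s, n) for c, (s, n) in out.items()}
```

The same argument covers MIN and MAX; the sketch-based functions rely on sketch merging instead and are not modeled here.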

MV Definition Examples

Scan MV (no aggregation):

SELECT DaysSinceEpoch, Carrier, Origin, Dest, DestCityName
FROM airlineStats

Aggregation MV:

SELECT DaysSinceEpoch, Carrier, Origin, Dest,
       SUM(ArrDelayMinutes) AS ArrDelayMinutes_sum,
       SUM(Cancelled)       AS Cancelled_sum
FROM airlineStats
GROUP BY DaysSinceEpoch, Carrier, Origin, Dest

HLL sketch MV (raw sketch storage, mergeable at query time):

SELECT DaysSinceEpoch, Origin, Dest,
       DISTINCTCOUNTRAWHLL(TailNum) AS hll_tailnum
FROM airlineStats
GROUP BY DaysSinceEpoch, Origin, Dest

DATETRUNC bucket MV (coarser MV time column than the base):

SELECT DATETRUNC('DAY', baseTimeColumn) AS dayBucket,
       Carrier,
       SUM(ArrDelay) AS sum_ArrDelay,
       COUNT(*) AS flight_count
FROM airlineStats
GROUP BY DATETRUNC('DAY', baseTimeColumn), Carrier

The MV's designated segmentsConfig.timeColumnName must be dayBucket, its unit must match bucketTimePeriod=1d, and the analyzer enforces both at create time.

Common Errors

| Error message (excerpt) | When | How to fix |
|-------------------------|------|------------|
| MaterializedViewTask requires 'bucketTimePeriod' | bucketTimePeriod missing on task config | Add "bucketTimePeriod": "1d" (or other Joda period). |
| LIMIT must be strictly positive | LIMIT 0 or negative | Remove the LIMIT or set a positive value. |
| LIMIT … exceeds maximum allowed … | LIMIT > 100M | Reduce LIMIT; if the MV genuinely produces that many rows, the bucket is mis-sized. |
| definedSQL must not declare OFFSET | OFFSET present | Remove OFFSET; MV windows are independent. |
| nested SELECT / subquery | Nested SELECT in definedSQL | Flatten the query; MV definitions are restricted to single-level queries. |
| Source table … is mutable (UPSERT/DEDUP/DIMENSION/REFRESH) | Source uses upsert/dedup/dimension/REFRESH | MV requires an immutable-coverage source. Use APPEND-style OFFLINE tables. |
| Source table … is a REALTIME table … not yet supported | Source resolves to _REALTIME only | Materialize from an OFFLINE base table; realtime support arrives in a follow-up PR alongside LLC-commit notify wiring. |
| MV definedSQL uses aggregation 'X' for which no MV-side re-aggregation is supported | definedSQL uses an aggregation the rewriter (in the Companion PR) cannot re-aggregate | Replace with one of SUM / MIN / MAX / COUNT / DISTINCTCOUNTRAWHLL / DISTINCTCOUNTRAWHLLPLUS / DISTINCTCOUNTRAWTHETASKETCH. |
| MV result saturated LIMIT | A window's actual row count >= effectiveLimit | Narrow bucketTimePeriod or add WHERE filters in definedSQL. |
| Missing effectiveLimit in task config | Pre-upgrade task config picked up by upgraded executor | Drop and recreate the MV table. |
| Cannot delete table 'X': N materialized view(s) depend on it: […]. Drop the dependent materialized views first. | Operator tried to drop a base table while an MV depends on it | Drop the dependent MVs first. |

Operational Notes

  • Upgrade order: controller → minion. The MV consistency manager is constructed and registered before segment lifecycle listeners initialize (BaseControllerStarter), so segment events arriving as Helix comes online don't slip past the notify path.
  • Wire-format additions (backward-compatible — old readers ignore unknown fields):
    • Two new ZK paths under /CONFIGS/MATERIALIZED_VIEW/{DEFINITION,RUNTIME}.
    • PartitionInfo is a typed ZNRecord.MapField per partition; new fields can be added without breaking older readers.
  • No new overhead on clusters without MVs: PinotHelixResourceManager.notifyMaterializedViewConsistencyManager exits early via getDependentMaterializedViews(rawTableName).isEmpty() before any log emission or downstream dispatch, so every controller-side ZK segment-metadata write — including the high-volume realtime LLC-commit path — does a single ConcurrentHashMap lookup and returns.
  • Worst-case CAS sleep: under maximum contention (maxTasksPerBatch=1000), one task can spend up to ~25 seconds in jittered backoff before exhausting MAX_RUNTIME_UPDATE_ATTEMPTS=128. Size the minion thread pool to absorb this. Non-CAS exceptions (e.g. IllegalStateException from invariant checks, validation errors) propagate immediately without retry.
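The retry boundary described in the last note (retry only on CAS conflicts, let everything else propagate) can be sketched as follows. The backoff constants, the function name, and the `read` / `compute` / `write` callbacks are hypothetical; the PR's real backoff schedule is not stated here, so the numbers below should not be read as its values.

```python
import random
import time
from typing import Callable, Tuple


class CasConflictError(Exception):
    """Stand-in for the typed CasConflictException: another writer won the race."""


def update_with_cas_retry(
    read: Callable[[], Tuple[object, int]],
    compute: Callable[[object], object],
    write: Callable[[object, int], None],
    max_attempts: int = 128,
    base_sleep_s: float = 0.01,   # illustrative value
    max_sleep_s: float = 0.2,     # illustrative value
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Return the number of attempts used; re-raise once the budget is exhausted."""
    for attempt in range(1, max_attempts + 1):
        current, version = read()
        updated = compute(current)  # validation errors raised here are NOT retried
        try:
            write(updated, version)  # expected to raise CasConflictError on version mismatch
            return attempt
        except CasConflictError:
            if attempt == max_attempts:
                raise
            ceiling = min(max_sleep_s, base_sleep_s * 2 ** min(attempt, 10))
            sleep(random.uniform(0, ceiling))  # jittered backoff
    raise AssertionError("unreachable")
```

Keeping the `except` clause this narrow is what separates "retry me" conflicts from real failures, which surface on the first attempt.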

Testing

  • Unit tests (133 in pinot-materialized-view, 25 in pinot-minion-builtin-tasks):
    • Partition metadata serialization (PartitionFingerprintTest, PartitionInfoTest, PartitionStateTest, MaterializedViewMetadataTest)
    • MV analyzer (MaterializedViewAnalyzerTest): LIMIT / OFFSET / nested-SELECT, trailing-comment probe, time-column format inference, mutable-source rejection, realtime-source rejection
    • gRPC executor (GrpcMaterializedViewQueryExecutorTest): client cache, broker discovery, stale-client eviction
    • Task executor (MaterializedViewTaskExecutorTest): saturation gate, lineage rollback, narrow-retry boundary
    • Task scheduler (MaterializedViewTaskSchedulerTest): end-to-end LIMIT injection, cold-start metadata, batched APPEND
    • Consistency manager (MaterializedViewConsistencyManagerTest): debounce + CAS retry, no synthetic-STALE on full invalidation, no-op when no dependent MV
    • Time-expression validator (TimeExprValidatorTest): identity passthrough vs DATETRUNC rules
  • Quickstart smoke test: MaterializedViewQuickStart exercises the full ingestion pipeline end-to-end against the airlineStats fixture and asserts per-aggregation equality between base and MV queries using a numeric-tolerant comparator. Verified locally end-to-end: cluster boot 60s, 432-row materialization in ~40s, all 5 comparisons MATCH.

Modules Affected

| Module | Change kind |
|--------|-------------|
| pinot-spi | CommonConstants.MaterializedViewTask (new), MINION_BROKER_GRPC_CONFIG_PREFIX, 6 cluster-config-override keys |
| pinot-common | ZKMetadataProvider helpers for MV definition / runtime znode paths |
| pinot-controller | MaterializedViewConsistencyManager wiring in BaseControllerStarter; notifyMaterializedView* hooks in createSegmentZkMetadata / updateZkMetadata / addNewSegment / endReplaceSegments (PHRM) and SegmentDeletionManager; base-table delete blocked when MVs depend |
| pinot-materialized-view (NEW MODULE) | Metadata model, consistency manager, analyzer, scheduler, gRPC streaming query executor, DESIGN.md |
| pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks | MaterializedViewTaskGenerator, MaterializedViewTaskExecutor, factory, observer factory |
| pinot-tools | MaterializedViewQuickStart + airlineStatsMv example resources |

Limitations / Deferred Work

  • REALTIME source tables: rejected at create time. LLC-commit notify wiring lives in a follow-up PR.
  • Fixed-partition MVs (categorical / singleton): not implemented. The current code is time-windowed only; the interfaces have been arranged so the extension is additive — see pinot-materialized-view/DESIGN.md.
  • DDL surface: no CREATE MATERIALIZED VIEW syntax — table-config-driven only. DDL parser hooks land alongside the broker rewrite.
  • Direct queries before Companion PR: until the rewrite engine lands, the MV table is queried by name (airlineStatsMv_OFFLINE) — same as any other Pinot OFFLINE table.

Companion PR

Part 2 of this work — the broker-side query rewrite engine that transparently routes user queries against the base table onto the MV table — ships as a separate PR:

That PR adds the subsumption strategies (Exact / Scan / Aggregation), the MaterializedViewQueryRewriteEngine, the broker-side metadata cache, hybrid (MV + base) execution, the per-MV rewriteEnabled / stalenessThresholdMs gates on the MV definition, and the materializedViewQueried field on the broker response. It depends on this PR for the metadata model and the populated runtime watermarks.

Without the Companion PR, the artifacts produced by this PR are still fully functional: MVs are built and incrementally refreshed, and operators can query the MV table directly by name like any other Pinot OFFLINE table — exactly the round-trip the quickstart's comparison step verifies.

@hongkunxu hongkunxu marked this pull request as ready for review May 19, 2026 02:02

codecov-commenter commented May 19, 2026

Codecov Report

❌ Patch coverage is 46.47202% with 1100 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.23%. Comparing base (5c20e71) to head (ca74920).
⚠️ Report is 5 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| ...dview/scheduler/MaterializedViewTaskScheduler.java | 20.93% | 325 Missing and 15 partials ⚠️ |
| ...materializedview/MaterializedViewTaskExecutor.java | 6.85% | 323 Missing and 3 partials ⚠️ |
| ...onsistency/MaterializedViewConsistencyManager.java | 65.32% | 65 Missing and 21 partials ⚠️ |
| ...ew/executor/GrpcMaterializedViewQueryExecutor.java | 40.51% | 67 Missing and 2 partials ⚠️ |
| ...ializedview/analysis/MaterializedViewAnalyzer.java | 78.13% | 29 Missing and 32 partials ⚠️ |
| ...ntroller/helix/core/PinotHelixResourceManager.java | 54.26% | 42 Missing and 17 partials ⚠️ |
| ...lizedview/scheduler/MaterializedViewTaskUtils.java | 50.79% | 26 Missing and 5 partials ⚠️ |
| ...aterializedview/MaterializedViewTaskGenerator.java | 0.00% | 24 Missing ⚠️ |
| ...lizedview/analysis/timeexpr/TimeExprValidator.java | 81.90% | 6 Missing and 13 partials ⚠️ |
| ...lizedview/MaterializedViewTaskExecutorFactory.java | 0.00% | 17 Missing ⚠️ |
| ... and 11 more | | |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18528      +/-   ##
============================================
+ Coverage     63.72%   64.23%   +0.50%     
+ Complexity     1932     1126     -806     
============================================
  Files          3292     3309      +17     
  Lines        201470   203555    +2085     
  Branches      31316    31684     +368     
============================================
+ Hits         128396   130759    +2363     
+ Misses        62789    62301     -488     
- Partials      10285    10495     +210     
| Flag | Coverage Δ |
|------|------------|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (?) |
| java-21 | 64.23% <46.47%> (+0.50%) ⬆️ |
| temurin | 64.23% <46.47%> (+0.50%) ⬆️ |
| unittests | 64.23% <46.47%> (+0.50%) ⬆️ |
| unittests1 | 56.69% <0.00%> (+0.90%) ⬆️ |
| unittests2 | 35.44% <46.47%> (+0.20%) ⬆️ |


@xiangfu0 xiangfu0 force-pushed the feat/sse_mv_view_creation branch 6 times, most recently from 7ab3df3 to c712f9b Compare May 19, 2026 05:37
@xiangfu0 xiangfu0 force-pushed the feat/sse_mv_view_creation branch from c712f9b to 4a1d1ae Compare May 19, 2026 05:57
@xiangfu0 xiangfu0 force-pushed the feat/sse_mv_view_creation branch 5 times, most recently from ba83f2e to 59e38bc Compare May 19, 2026 09:13
@xiangfu0 xiangfu0 requested review from Copilot and xiangfu0 May 19, 2026 09:15
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@xiangfu0 xiangfu0 force-pushed the feat/sse_mv_view_creation branch 3 times, most recently from 1520791 to 2a016e0 Compare May 19, 2026 10:37
@xiangfu0 xiangfu0 requested a review from Copilot May 19, 2026 12:07

PR 1 of a 2-PR split.  This change introduces the view-creation and
materialization pipeline; broker-side query rewrite ships in PR 2.

**Scope of PR 1:**
- MV table definition (analyzer, time-expression validator, ZK metadata)
- Per-partition runtime metadata (PartitionInfo, PartitionState,
  PartitionFingerprint) and storage utilities
- Controller-side consistency manager (subscribes to base-table segment
  events; debounces and CAS-marks affected MV partitions STALE)
- Controller integration (PinotHelixResourceManager / SegmentDeletionManager
  notify methods; BaseControllerStarter wires the manager before listeners
  fire)
- Minion task pipeline (generator + executor + segment lineage replace,
  saturation-LIMIT gate, partition-fingerprint CAS write to runtime znode)
- pinot-materialized-view module: analysis/, consistency/, metadata/,
  scheduler/, executor/, context/
- Constants moved from MinionConstants.MaterializedViewTask to
  CommonConstants.MaterializedViewTask (pinot-spi)
- Quickstart: airlineStatsMv example (TIMESTAMP MV column derived from
  base via DaysSinceEpoch * 86400000), with per-aggregation MV-vs-base
  result comparison
- DESIGN.md documenting the time-windowed model and the planned
  fixed-partition extension

**Out of scope (deferred to PR 2):**
- Broker query rewrite engine + subsumption strategies
  (AggregationEquivalenceRegistry and the equivalence rule classes
  land with the engine that actually applies them)
- Broker metadata cache + handler + split dispatcher
- BrokerResponse.materializedViewQueried response annotation
- BaseSingleStageBrokerRequestHandler MV integration

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Hongkun Xu <xuhongkun666@163.com>
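
The "CAS-marks affected MV partitions STALE" step in the scope list can be pictured with a small in-memory sketch. This is an illustration only, not the PR's code: the class `StaleMarkerSketch`, the `VALID` state, and the `ConcurrentMap` standing in for the ZK-backed runtime znode are all assumptions; only the `STALE` state name comes from the description above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: a compare-and-set style VALID -> STALE transition.
// The real feature persists partition state in ZooKeeper; a map is used here
// only to show the "flip once, ignore concurrent duplicates" behaviour.
final class StaleMarkerSketch {
  enum State { VALID, STALE } // VALID is an assumed name; STALE is from the PR text

  private final ConcurrentMap<Long, State> partitions = new ConcurrentHashMap<>();

  // Registers a partition (keyed by its start time in millis) as VALID.
  void register(long partitionStartMs) {
    partitions.put(partitionStartMs, State.VALID);
  }

  // Returns true only for the caller that actually flipped VALID -> STALE.
  // Unknown partitions and already-STALE partitions return false.
  boolean markStale(long partitionStartMs) {
    return partitions.replace(partitionStartMs, State.VALID, State.STALE);
  }

  State stateOf(long partitionStartMs) {
    return partitions.get(partitionStartMs);
  }

  public static void main(String[] args) {
    StaleMarkerSketch marker = new StaleMarkerSketch();
    marker.register(0L);
    System.out.println(marker.markStale(0L)); // first notification
    System.out.println(marker.markStale(0L)); // duplicate notification
  }
}
```

The conditional `replace` is what keeps duplicate or racing notifications for the same partition harmless: only one caller observes the transition.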
@xiangfu0 xiangfu0 force-pushed the feat/sse_mv_view_creation branch from 2a016e0 to ca74920 Compare May 19, 2026 12:56
@xiangfu0 xiangfu0 requested a review from Copilot May 19, 2026 13:13

@xiangfu0 xiangfu0 merged commit 42491d0 into apache:master May 20, 2026
20 of 23 checks passed
@xiangfu0 xiangfu0 deleted the feat/sse_mv_view_creation branch May 20, 2026 02:40
@xiangfu0
Docs PR opened: pinot-contrib/pinot-docs#811

xiangfu0 added a commit to pinot-contrib/pinot-docs that referenced this pull request May 20, 2026
## What changed
- add a new Querying & SQL page for the current materialized-view
feature surface
- wire the page into the Querying & SQL landing page and SUMMARY
navigation
- document the Data Explorer and controller REST discovery surface that
landed with the MV UI work

## Source cross-checks
- validated the docs against `pinot-materialized-view`,
`MaterializedViewQuickStart`, `PinotMaterializedViewRestletResource`,
and the bundled `airlineStatsMv` example config/schema in the local
`apache/pinot` checkout
- kept the docs scoped to the shipped surface: offline MV tables,
append-only source tables, explicit `MaterializedViewTask` config,
supported aggregations, and direct MV-table querying

## Validation
- `git diff --check`
- verified the edited internal links and content refs point at existing
docs files
xiangfu0 added a commit to hongkunxu/pinot that referenced this pull request May 20, 2026
PR 2 of 2 in the SSE Materialized View series. PR 1 (apache#18528) landed the MV
view-definition, controller routing, and minion-task plumbing; this change
turns on the broker-side query rewrite that actually serves user queries from
the materialized view.

Highlights
----------
* Broker query rewrite engine with three subsumption strategies — Exact,
  Aggregation (re-aggregation via the AggregationEquivalence registry), and
  Scan — chosen at compile time from the rewrite plan.
* Two execution modes: FULL_REWRITE swaps the server query, schema, and
  table name at compile time; SPLIT_REWRITE issues dual scatter-gather (base
  side `ts >= watermarkMs`, MV side `ts < watermarkMs`) and merges DataTables
  via IdentityHashMap so the same physical server can carry both sides.
* Defense-in-depth fallbacks at both compile and execute time bump a new
  `QUERY_REWRITE_EXCEPTIONS` meter and fall back to the base-table query
  whenever the MV path raises, so an MV failure never surfaces as a 500.
* MV cache lifecycle wired through the broker's resource state model:
  invalidate on OFFLINE/DROPPED, refresh on OFFLINE→ONLINE so a cycled
  broker resource cannot leave an MV silently un-queryable.
* SPLIT_REWRITE detects the "all MV servers failed" case and refuses to
  return partial results — falls back to the base-table path instead.
* Result-column names are preserved through rewrite via implicit aliases on
  the rewritten select list, so clients reading the result by column name
  are not silently broken when the MV path kicks in.
* TIMESTAMP-only contract on the MV time column, validated at view creation,
  so the broker never has to guess time-format alignment between base and MV.
* Per-MV `rewriteEnabled` and `stalenessThresholdMs` knobs for SLO control.
* Quickstart example wires a `TIMESTAMP` time column on the airlineStatsView
  table via `definedSQL` (`daysSinceEpoch * 24 * 60 * 60 * 1000`).

Co-Authored-By: Hongkun Xu <xuhongkun666@163.com>
Co-Authored-By: Xiang Fu <xiangfu@startree.ai>
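
The SPLIT_REWRITE time split described above (base side `ts >= watermarkMs`, MV side `ts < watermarkMs`) can be sketched as plain range arithmetic. This is a minimal illustration under assumptions: `SplitRewriteSketch`, `Range`, and the half-open `[startMs, endMs)` convention are hypothetical and not taken from the PR.

```java
// Hypothetical sketch: split one query time range at a watermark so that the
// MV serves everything strictly before it and the base table serves the rest.
final class SplitRewriteSketch {

  // Half-open range [startMs, endMs); empty when startMs >= endMs.
  static final class Range {
    final long startMs;
    final long endMs;

    Range(long startMs, long endMs) {
      this.startMs = startMs;
      this.endMs = endMs;
    }

    boolean isEmpty() {
      return startMs >= endMs;
    }
  }

  // MV side: ts < watermarkMs, clipped to the query range.
  static Range mvSide(Range query, long watermarkMs) {
    return new Range(query.startMs, Math.min(query.endMs, watermarkMs));
  }

  // Base side: ts >= watermarkMs, clipped to the query range.
  static Range baseSide(Range query, long watermarkMs) {
    return new Range(Math.max(query.startMs, watermarkMs), query.endMs);
  }

  public static void main(String[] args) {
    Range query = new Range(0L, 100L);
    Range mv = mvSide(query, 60L);
    Range base = baseSide(query, 60L);
    System.out.println("mv=[" + mv.startMs + "," + mv.endMs + ") base=[" + base.startMs + "," + base.endMs + ")");
  }
}
```

Because the two sides meet exactly at the watermark, no row is counted twice and none is skipped; when one side comes out empty, only the other needs to be queried.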
xiangfu0 added a commit to hongkunxu/pinot that referenced this pull request May 20, 2026
PR 2 of 2 in the SSE Materialized View series. PR 1 (apache#18528) landed the MV
view-definition, controller routing, and minion-task plumbing; this change
turns on the broker-side query rewrite that actually serves user queries from
the materialized view.

Highlights
----------
* Broker query rewrite engine with three subsumption strategies — Exact,
  Aggregation (re-aggregation via the AggregationEquivalence registry), and
  Scan — chosen at compile time from the rewrite plan.
* Two execution modes: FULL_REWRITE swaps the server query, schema, and
  table name at compile time; SPLIT_REWRITE issues dual scatter-gather (base
  side `ts >= watermarkMs`, MV side `ts < watermarkMs`) and merges DataTables
  via IdentityHashMap so the same physical server can carry both sides.
* Defense-in-depth fallbacks at both compile and execute time bump a new
  `QUERY_REWRITE_EXCEPTIONS` meter and fall back to the base-table query
  whenever the MV path raises, so an MV failure never surfaces as a 500.
* MV cache lifecycle wired through the broker's resource state model:
  invalidate on OFFLINE/DROPPED, refresh on OFFLINE→ONLINE so a cycled
  broker resource cannot leave an MV silently un-queryable.
* SPLIT_REWRITE detects "all MV servers failed" AND "all base servers failed"
  and refuses to return partial results — falls back to the base-table path.
* FULL_REWRITE is rejected on hybrid base tables (where a batch MV would
  otherwise silently drop realtime data) and the broker-internal
  `materializedViewRewrite` query-option marker is stripped from user input
  so it can never bypass `BrokerReduceService`'s nested-query safety net.
* Result-column names are preserved through rewrite via implicit aliases on
  the rewritten select list, so clients reading the result by column name
  are not silently broken when the MV path kicks in.
* TIMESTAMP-only contract on the MV time column, validated at view creation,
  so the broker never has to guess time-format alignment between base and MV.
* Per-MV `rewriteEnabled` and `stalenessThresholdMs` knobs for SLO control.
* `BrokerGauge.MATERIALIZED_VIEW_CACHE_ENTRY_COUNT` exposes the MV metadata
  cache size so operators can detect unbounded growth.
* Quickstart example wires a `TIMESTAMP` time column on the airlineStatsView
  table via `definedSQL` (`daysSinceEpoch * 24 * 60 * 60 * 1000`).

Co-Authored-By: Hongkun Xu <xuhongkun666@163.com>
Co-Authored-By: Xiang Fu <xiangfu@startree.ai>
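
Stripping the broker-internal `materializedViewRewrite` marker from user-supplied query options, as described above, amounts to filtering one key out of an options map. The sketch below is an assumption-laden illustration: `QueryOptionSanitizerSketch`, `stripInternalMarker`, and the case-insensitive match are hypothetical, and the real handling inside the broker may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: drop an internal-only option before the request is
// processed, so a client cannot set it by hand.
final class QueryOptionSanitizerSketch {
  // Option name taken from the commit message above.
  static final String INTERNAL_MARKER = "materializedViewRewrite";

  // Returns a copy of the user options without the internal marker.
  // Matching case-insensitively is an assumption made for this sketch.
  static Map<String, String> stripInternalMarker(Map<String, String> userOptions) {
    Map<String, String> sanitized = new HashMap<>(userOptions);
    sanitized.keySet().removeIf(key -> key.equalsIgnoreCase(INTERNAL_MARKER));
    return sanitized;
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    options.put("timeoutMs", "1000");
    options.put("materializedViewRewrite", "true");
    System.out.println(stripInternalMarker(options));
  }
}
```

Returning a copy rather than mutating the caller's map keeps the sketch free of side effects.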
xiangfu0 added a commit to hongkunxu/pinot that referenced this pull request May 20, 2026
PR 2 of 2 in the SSE Materialized View series. PR 1 (apache#18528) landed the MV
view-definition, controller routing, and minion-task plumbing; this change
turns on the broker-side query rewrite that actually serves user queries from
the materialized view.

Highlights
----------
* Broker query rewrite engine with three subsumption strategies — Exact,
  Aggregation (re-aggregation via the AggregationEquivalence registry), and
  Scan — chosen at compile time from the rewrite plan.
* Two execution modes: FULL_REWRITE swaps the server query, schema, and
  table name at compile time; SPLIT_REWRITE issues dual scatter-gather (base
  side `ts >= watermarkMs`, MV side `ts < watermarkMs`) and merges DataTables
  via IdentityHashMap so the same physical server can carry both sides.
* Defense-in-depth fallbacks at both compile and execute time bump a new
  `QUERY_REWRITE_EXCEPTIONS` meter and fall back to the base-table query
  whenever the MV path raises, so an MV failure never surfaces as a 500.
  The fallback path refreshes routing locals from the recomputed route so
  cancel + metrics + error reporting reflect the live state.
* MV cache lifecycle wired through the broker's resource state model:
  invalidate on OFFLINE/DROPPED, refresh on OFFLINE→ONLINE so a cycled
  broker resource cannot leave an MV silently un-queryable.
* SPLIT_REWRITE refuses to return partial results when *all* MV-side or
  *all* base-side servers failed — falls back to the base-table path.
* FULL_REWRITE is rejected on hybrid base tables (where a batch MV would
  otherwise silently drop realtime data) and the broker-internal
  `materializedViewRewrite` query-option marker is stripped from user input
  so it can never bypass `BrokerReduceService`'s nested-query safety net.
* Result-column names are preserved through rewrite via implicit aliases on
  the rewritten select list, so clients reading the result by column name
  are not silently broken when the MV path kicks in.
* TIMESTAMP-only contract on the MV time column, validated at view creation,
  so the broker never has to guess time-format alignment between base and MV.
* Per-MV `rewriteEnabled` and `stalenessThresholdMs` knobs for SLO control.
* `BrokerGauge.MATERIALIZED_VIEW_CACHE_ENTRY_COUNT` exposes the MV metadata
  cache size so operators can detect unbounded growth.
* Quickstart example wires a `TIMESTAMP` time column on the airlineStatsView
  table via `definedSQL` (`daysSinceEpoch * 24 * 60 * 60 * 1000`).

Co-Authored-By: Hongkun Xu <xuhongkun666@163.com>
Co-Authored-By: Xiang Fu <xiangfu@startree.ai>
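
The fallback behaviour described above (bump a meter, then serve the base-table query whenever the MV path raises) follows a common try-then-fallback shape. The following is a minimal sketch, not the broker's implementation: `RewriteFallbackSketch`, the `Supplier`-based paths, and the `AtomicLong` standing in for the `QUERY_REWRITE_EXCEPTIONS` meter are assumptions.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical sketch: try the MV path first; on any runtime failure, count it
// and answer from the base-table path instead of surfacing an error.
final class RewriteFallbackSketch {
  // Stand-in for the QUERY_REWRITE_EXCEPTIONS meter named in the commit message.
  final AtomicLong rewriteExceptions = new AtomicLong();

  <T> T execute(Supplier<T> mvPath, Supplier<T> basePath) {
    try {
      return mvPath.get();
    } catch (RuntimeException e) {
      rewriteExceptions.incrementAndGet();
      return basePath.get();
    }
  }

  public static void main(String[] args) {
    RewriteFallbackSketch handler = new RewriteFallbackSketch();
    String result = handler.execute(
        () -> { throw new IllegalStateException("MV unavailable"); },
        () -> "base-table result");
    System.out.println(result + ", failures=" + handler.rewriteExceptions.get());
  }
}
```

A failure in the base path itself is deliberately not caught here, so genuine base-table errors still reach the caller.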
xiangfu0 added a commit to hongkunxu/pinot that referenced this pull request May 20, 2026
PR 2 of 2 in the SSE Materialized View series. PR 1 (apache#18528) landed the MV
view-definition, controller routing, and minion-task plumbing; this change
turns on the broker-side query rewrite that actually serves user queries from
the materialized view.

Highlights
----------
* Broker query rewrite engine with three subsumption strategies — Exact,
  Aggregation (re-aggregation via the AggregationEquivalence registry), and
  Scan — chosen at compile time from the rewrite plan.
* Two execution modes: FULL_REWRITE swaps the server query, schema, and
  table name at compile time; SPLIT_REWRITE issues dual scatter-gather (base
  side `ts >= watermarkMs`, MV side `ts < watermarkMs`) and merges DataTables
  via IdentityHashMap so the same physical server can carry both sides.
* Defense-in-depth fallbacks at both compile and execute time bump a new
  `QUERY_REWRITE_EXCEPTIONS` meter and fall back to the base-table query
  whenever the MV path raises, so an MV failure never surfaces as a 500.
  The fallback path refreshes routing locals from the recomputed route so
  cancel + metrics + error reporting reflect the live state.
* MV cache lifecycle wired through the broker's resource state model:
  invalidate on OFFLINE/DROPPED, refresh on OFFLINE→ONLINE so a cycled
  broker resource cannot leave an MV silently un-queryable.
* SPLIT_REWRITE refuses to return partial results when *all* MV-side or
  *all* base-side servers failed — falls back to the base-table path.
* FULL_REWRITE is rejected on hybrid base tables (where a batch MV would
  otherwise silently drop realtime data) and the broker-internal
  `materializedViewRewrite` query-option marker is stripped from user input
  so it can never bypass `BrokerReduceService`'s nested-query safety net.
  Both guards have regression tests pinning the contract.
* Result-column names are preserved through rewrite via implicit aliases on
  the rewritten select list, so clients reading the result by column name
  are not silently broken when the MV path kicks in.
* TIMESTAMP-only contract on the MV time column, validated at view creation,
  so the broker never has to guess time-format alignment between base and MV.
* Per-MV `rewriteEnabled` and `stalenessThresholdMs` knobs for SLO control.
* `BrokerGauge.MATERIALIZED_VIEW_CACHE_ENTRY_COUNT` exposes the MV metadata
  cache size so operators can detect unbounded growth.
* Integration test polls real cache observables (not fixed sleeps) so the
  test runs as fast as the cluster's actual ZK propagation allows.
* Quickstart example wires a `TIMESTAMP` time column on the airlineStatsView
  table via `definedSQL` (`daysSinceEpoch * 24 * 60 * 60 * 1000`).

Co-Authored-By: Hongkun Xu <xuhongkun666@163.com>
Co-Authored-By: Xiang Fu <xiangfu@startree.ai>
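
The quickstart expression `daysSinceEpoch * 24 * 60 * 60 * 1000` converts a day count into epoch milliseconds. The sketch below reproduces that arithmetic in Java as an illustration only; `TimeColumnSketch` and `daysToEpochMillis` are hypothetical names, and the point about 32-bit overflow applies to Java `int` arithmetic, not necessarily to how Pinot evaluates the SQL expression.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: derive an epoch-millis timestamp from a days-since-epoch
// value, mirroring the quickstart's daysSinceEpoch * 24 * 60 * 60 * 1000.
final class TimeColumnSketch {
  static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000; // 86,400,000

  // Uses long arithmetic: in Java, multiplying two ints here would overflow
  // for any realistic present-day day count.
  static long daysToEpochMillis(int daysSinceEpoch) {
    return daysSinceEpoch * MILLIS_PER_DAY;
  }

  public static void main(String[] args) {
    int days = 19000;
    System.out.println(daysToEpochMillis(days));
    // Cross-check against the JDK's own unit conversion.
    System.out.println(TimeUnit.DAYS.toMillis(days));
  }
}
```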
xiangfu0 added a commit to hongkunxu/pinot that referenced this pull request May 20, 2026
PR 2 of 2 in the SSE Materialized View series. PR 1 (apache#18528) landed the MV
view-definition, controller routing, and minion-task plumbing; this change
turns on the broker-side query rewrite that actually serves user queries from
the materialized view.

Highlights
----------
* Broker query rewrite engine with three subsumption strategies — Exact,
  Aggregation (re-aggregation via the AggregationEquivalence registry), and
  Scan — chosen at compile time from the rewrite plan.
* Two execution modes: FULL_REWRITE swaps the server query, schema, and
  table name at compile time; SPLIT_REWRITE issues dual scatter-gather (base
  side `ts >= watermarkMs`, MV side `ts < watermarkMs`) and merges DataTables
  via IdentityHashMap so the same physical server can carry both sides.
* Defense-in-depth fallbacks at both compile and execute time bump a new
  `QUERY_REWRITE_EXCEPTIONS` meter and fall back to the base-table query
  whenever the MV path raises, so an MV failure never surfaces as a 500.
  The fallback path refreshes routing locals from the recomputed route so
  cancel + metrics + error reporting reflect the live state.
* MV cache lifecycle wired through the broker's resource state model:
  invalidate on OFFLINE/DROPPED, refresh on OFFLINE→ONLINE so a cycled
  broker resource cannot leave an MV silently un-queryable.
* SPLIT_REWRITE refuses to return partial results when *all* MV-side or
  *all* base-side servers failed — falls back to the base-table path.
* FULL_REWRITE is rejected on hybrid base tables (where a batch MV would
  otherwise silently drop realtime data) and the broker-internal
  `materializedViewRewrite` query-option marker is stripped from user input
  inside `doHandleRequest` so it can never bypass `BrokerReduceService`'s
  nested-query safety net — even on the IN_SUBQUERY recursion path.
* Result-column names are preserved through rewrite via implicit aliases on
  the rewritten select list, so clients reading the result by column name
  are not silently broken when the MV path kicks in.
* TIMESTAMP-only contract on the MV time column, validated at view creation,
  so the broker never has to guess time-format alignment between base and MV.
* Per-MV `rewriteEnabled` and `stalenessThresholdMs` knobs for SLO control.
* `BrokerGauge.MATERIALIZED_VIEW_CACHE_ENTRY_COUNT` exposes the MV metadata
  cache size so operators can detect unbounded growth.
* Integration test polls real cache observables (not fixed sleeps) so the
  test runs as fast as the cluster's actual ZK propagation allows.
* Quickstart example wires a `TIMESTAMP` time column on the airlineStatsView
  table via `definedSQL` (`daysSinceEpoch * 24 * 60 * 60 * 1000`).

Co-Authored-By: Hongkun Xu <xuhongkun666@163.com>
Co-Authored-By: Xiang Fu <xiangfu@startree.ai>
xiangfu0 added a commit to hongkunxu/pinot that referenced this pull request May 21, 2026
PR 2 of 2 in the SSE Materialized View series. PR 1 (apache#18528) landed the MV
view-definition, controller routing, and minion-task plumbing; this change
turns on the broker-side query rewrite that actually serves user queries from
the materialized view.

Highlights
----------
* Broker query rewrite engine with three subsumption strategies — Exact,
  Aggregation (re-aggregation via the AggregationEquivalence registry), and
  Scan — chosen at compile time from the rewrite plan.
* Two execution modes: FULL_REWRITE swaps the server query, schema, and
  table name at compile time; SPLIT_REWRITE issues dual scatter-gather (base
  side `ts >= watermarkMs`, MV side `ts < watermarkMs`) and merges DataTables
  via IdentityHashMap so the same physical server can carry both sides.
* Defense-in-depth fallbacks at both compile and execute time bump a new
  `QUERY_REWRITE_EXCEPTIONS` meter and fall back to the base-table query
  whenever the MV path raises, so an MV failure never surfaces as a 500.
  The fallback path refreshes routing locals from the recomputed route so
  cancel + metrics + error reporting reflect the live state.
* MV cache lifecycle wired through the broker's resource state model:
  invalidate on OFFLINE/DROPPED, refresh on OFFLINE→ONLINE for BOTH MV-table
  cycles (direct rehydrate) and base-table cycles (walk MV defs and reload
  any whose baseTables reference the transitioning table) so a transient
  broker-resource bounce never silently disables MV rewrite.
* `annotateResponse` stamps `materializedViewQueried` only when a swap was
  actually committed (`isFullRewrite() || isSplitRewrite()`) — no
  false-positives on `fromRewriteResult` (skip) paths.
* SPLIT_REWRITE refuses to return partial results when *all* MV-side or
  *all* base-side servers failed — falls back to the base-table path.
* FULL_REWRITE is rejected on hybrid base tables (where a batch MV would
  otherwise silently drop realtime data) and the broker-internal
  `materializedViewRewrite` query-option marker is stripped from user input
  inside `doHandleRequest` so it can never bypass `BrokerReduceService`'s
  nested-query safety net — even on the IN_SUBQUERY recursion path.
* Result-column names are preserved through rewrite via implicit aliases on
  the rewritten select list, so clients reading the result by column name
  are not silently broken when the MV path kicks in.
* TIMESTAMP-only contract on the MV time column, validated at view creation,
  so the broker never has to guess time-format alignment between base and MV.
* Per-MV `rewriteEnabled` and `stalenessThresholdMs` knobs for SLO control.
* `BrokerGauge.MATERIALIZED_VIEW_CACHE_ENTRY_COUNT` exposes the MV metadata
  cache size so operators can detect unbounded growth.
* Integration test polls real cache observables (not fixed sleeps) so the
  test runs as fast as the cluster's actual ZK propagation allows.
* Quickstart example wires a `TIMESTAMP` time column on the airlineStatsView
  table via `definedSQL` (`daysSinceEpoch * 24 * 60 * 60 * 1000`).

Co-Authored-By: Hongkun Xu <xuhongkun666@163.com>
Co-Authored-By: Xiang Fu <xiangfu@startree.ai>
xiangfu0 added a commit to hongkunxu/pinot that referenced this pull request May 21, 2026
PR 2 of 2 in the SSE Materialized View series. PR 1 (apache#18528) landed the MV
view-definition, controller routing, and minion-task plumbing; this change
turns on the broker-side query rewrite that actually serves user queries from
the materialized view.

Highlights
----------
* Broker query rewrite engine with three subsumption strategies — Exact,
  Aggregation (re-aggregation via the AggregationEquivalence registry), and
  Scan — chosen at compile time from the rewrite plan.
* Two execution modes: FULL_REWRITE swaps the server query, schema, and
  table name at compile time; SPLIT_REWRITE issues dual scatter-gather (base
  side `ts >= watermarkMs`, MV side `ts < watermarkMs`) and merges DataTables
  via IdentityHashMap so the same physical server can carry both sides.
* Defense-in-depth fallbacks at both compile and execute time bump a new
  `QUERY_REWRITE_EXCEPTIONS` meter and fall back to the base-table query
  whenever the MV path raises, so an MV failure never surfaces as a 500.
  The fallback path refreshes routing locals from the recomputed route so
  cancel + metrics + error reporting reflect the live state.
* MV cache lifecycle wired through the broker's resource state model:
  invalidate on OFFLINE/DROPPED, refresh on OFFLINE→ONLINE for BOTH MV-table
  cycles (direct rehydrate) and base-table cycles (walk MV defs and reload
  any whose baseTables reference the transitioning table) so a transient
  broker-resource bounce never silently disables MV rewrite.
* `annotateResponse` stamps `materializedViewQueried` only when a swap was
  actually committed (`isFullRewrite() || isSplitRewrite()`) — no
  false-positives on `fromRewriteResult` (skip) paths.
* SPLIT_REWRITE refuses to return partial results when *all* MV-side or
  *all* base-side servers failed — falls back to the base-table path.
* FULL_REWRITE is rejected on hybrid base tables (where a batch MV would
  otherwise silently drop realtime data) and the broker-internal
  `materializedViewRewrite` query-option marker is stripped from user input
  inside `doHandleRequest` so it can never bypass `BrokerReduceService`'s
  nested-query safety net — even on the IN_SUBQUERY recursion path.
* Result-column names are preserved through rewrite via implicit aliases on
  the rewritten select list, so clients reading the result by column name
  are not silently broken when the MV path kicks in.
* TIMESTAMP-only contract on the MV time column, validated at view creation,
  so the broker never has to guess time-format alignment between base and MV.
* Per-MV `rewriteEnabled` and `stalenessThresholdMs` knobs for SLO control.
* `BrokerGauge.MATERIALIZED_VIEW_CACHE_ENTRY_COUNT` exposes the MV metadata
  cache size so operators can detect unbounded growth.
* Integration test polls real cache observables (not fixed sleeps) so the
  test runs as fast as the cluster's actual ZK propagation allows.
* Quickstart example wires a `TIMESTAMP` time column on the airlineStatsView
  table via `definedSQL` (`daysSinceEpoch * 24 * 60 * 60 * 1000`).

Co-Authored-By: Hongkun Xu <xuhongkun666@163.com>
Co-Authored-By: Xiang Fu <xiangfu@startree.ai>
xiangfu0 added a commit to hongkunxu/pinot that referenced this pull request May 21, 2026
PR 2 of 2 in the SSE Materialized View series. PR 1 (apache#18528) landed the MV
view-definition, controller routing, and minion-task plumbing; this change
turns on the broker-side query rewrite that actually serves user queries from
the materialized view.

Highlights
----------
* Broker query rewrite engine with three subsumption strategies — Exact,
  Aggregation (re-aggregation via the AggregationEquivalence registry), and
  Scan — chosen at compile time from the rewrite plan.
* Two execution modes: FULL_REWRITE swaps the server query, schema, and
  table name at compile time; SPLIT_REWRITE issues dual scatter-gather (base
  side `ts >= watermarkMs`, MV side `ts < watermarkMs`) and merges DataTables
  via IdentityHashMap so the same physical server can carry both sides.
* Defense-in-depth fallbacks at both compile and execute time bump a new
  `QUERY_REWRITE_EXCEPTIONS` meter and fall back to the base-table query
  whenever the MV path raises, so an MV failure never surfaces as a 500.
  The fallback path refreshes routing locals from the recomputed route so
  cancel + metrics + error reporting reflect the live state.
* MV cache lifecycle wired through the broker's resource state model:
  invalidate on OFFLINE/DROPPED, refresh on OFFLINE→ONLINE for BOTH MV-table
  cycles (direct rehydrate) and base-table cycles (walk MV defs and reload
  any whose baseTables reference the transitioning table) so a transient
  broker-resource bounce never silently disables MV rewrite.
* `annotateResponse` stamps `materializedViewQueried` only when a swap was
  actually committed (`isFullRewrite() || isSplitRewrite()`) — no
  false-positives on `fromRewriteResult` (skip) paths.
* SPLIT_REWRITE refuses to return partial results when *all* MV-side or
  *all* base-side servers failed — falls back to the base-table path.
* FULL_REWRITE is rejected on hybrid base tables (where a batch MV would
  otherwise silently drop realtime data) and the broker-internal
  `materializedViewRewrite` query-option marker is stripped from user input
  inside `doHandleRequest` so it can never bypass `BrokerReduceService`'s
  nested-query safety net — even on the IN_SUBQUERY recursion path.
* Result-column names are preserved through rewrite via implicit aliases on
  the rewritten select list, so clients reading the result by column name
  are not silently broken when the MV path kicks in.
* TIMESTAMP-only contract on the MV time column, validated at view creation,
  so the broker never has to guess time-format alignment between base and MV.
* Per-MV `rewriteEnabled` and `stalenessThresholdMs` knobs for SLO control.
* `BrokerGauge.MATERIALIZED_VIEW_CACHE_ENTRY_COUNT` exposes the MV metadata
  cache size so operators can detect unbounded growth.
* Integration test polls real cache observables (not fixed sleeps) so the
  test runs as fast as the cluster's actual ZK propagation allows.
* Quickstart example wires a `TIMESTAMP` time column on the airlineStatsView
  table via `definedSQL` (`daysSinceEpoch * 24 * 60 * 60 * 1000`).

Co-Authored-By: Hongkun Xu <xuhongkun666@163.com>
Co-Authored-By: Xiang Fu <xiangfu@startree.ai>
xiangfu0 added a commit to hongkunxu/pinot that referenced this pull request May 21, 2026
PR 2 of 2 in the SSE Materialized View series. PR 1 (apache#18528) landed the MV
view-definition, controller routing, and minion-task plumbing; this change
turns on the broker-side query rewrite that actually serves user queries from
the materialized view.

Highlights
----------
* Broker query rewrite engine with three subsumption strategies — Exact,
  Aggregation (re-aggregation via the AggregationEquivalence registry), and
  Scan — chosen at compile time from the rewrite plan.
* Two execution modes: FULL_REWRITE swaps the server query, schema, and
  table name at compile time; SPLIT_REWRITE issues dual scatter-gather (base
  side `ts >= watermarkMs`, MV side `ts < watermarkMs`) and merges DataTables
  via IdentityHashMap so the same physical server can carry both sides.
* Defense-in-depth fallbacks at both compile and execute time bump a new
  `QUERY_REWRITE_EXCEPTIONS` meter and fall back to the base-table query
  whenever the MV path raises, so an MV failure never surfaces as a 500.
  The fallback path refreshes routing locals from the recomputed route so
  cancel + metrics + error reporting reflect the live state.
* MV cache lifecycle wired through the broker's resource state model:
  invalidate on OFFLINE/DROPPED, refresh on OFFLINE→ONLINE for BOTH MV-table
  cycles (direct rehydrate) and base-table cycles (walk MV defs and reload
  any whose baseTables reference the transitioning table) so a transient
  broker-resource bounce never silently disables MV rewrite.
* `annotateResponse` stamps `materializedViewQueried` only when a swap was
  actually committed (`isFullRewrite() || isSplitRewrite()`) — no
  false-positives on `fromRewriteResult` (skip) paths.
* SPLIT_REWRITE refuses to return partial results when *all* MV-side or
  *all* base-side servers failed — falls back to the base-table path.
* FULL_REWRITE is rejected on hybrid base tables (where a batch MV would
  otherwise silently drop realtime data) and the broker-internal
  `materializedViewRewrite` query-option marker is stripped from user input
  inside `doHandleRequest` so it can never bypass `BrokerReduceService`'s
  nested-query safety net — even on the IN_SUBQUERY recursion path.
* Result-column names are preserved through rewrite via implicit aliases on
  the rewritten select list, so clients reading the result by column name
  are not silently broken when the MV path kicks in.
* TIMESTAMP-only contract on the MV time column, validated at view creation,
  so the broker never has to guess time-format alignment between base and MV.
* Per-MV `rewriteEnabled` and `stalenessThresholdMs` knobs for SLO control.
* `BrokerGauge.MATERIALIZED_VIEW_CACHE_ENTRY_COUNT` exposes the MV metadata
  cache size so operators can detect unbounded growth.
* Integration test polls real cache observables (not fixed sleeps) so the
  test runs as fast as the cluster's actual ZK propagation allows.
* Quickstart example wires a `TIMESTAMP` time column on the airlineStatsView
  table via `definedSQL` (`daysSinceEpoch * 24 * 60 * 60 * 1000`).

Co-Authored-By: Hongkun Xu <xuhongkun666@163.com>
Co-Authored-By: Xiang Fu <xiangfu@startree.ai>
xiangfu0 added a commit to hongkunxu/pinot that referenced this pull request May 21, 2026
PR 2 of 2 in the SSE Materialized View series. PR 1 (apache#18528) landed the MV
view-definition, controller routing, and minion-task plumbing; this change
turns on the broker-side query rewrite that actually serves user queries from
the materialized view.

Highlights
----------
* Broker query rewrite engine with three subsumption strategies — Exact,
  Aggregation (re-aggregation via the AggregationEquivalence registry), and
  Scan — chosen at compile time from the rewrite plan.
* Two execution modes: FULL_REWRITE swaps the server query, schema, and
  table name at compile time; SPLIT_REWRITE issues dual scatter-gather (base
  side `ts >= watermarkMs`, MV side `ts < watermarkMs`) and merges DataTables
  via IdentityHashMap so the same physical server can carry both sides.
* Defense-in-depth fallbacks at both compile and execute time bump a new
  `QUERY_REWRITE_EXCEPTIONS` meter and fall back to the base-table query
  whenever the MV path raises, so an MV failure never surfaces as a 500.
  The fallback path refreshes routing locals from the recomputed route so
  cancel + metrics + error reporting reflect the live state.
* MV cache lifecycle wired through the broker's resource state model:
  invalidate on OFFLINE/DROPPED, refresh on OFFLINE→ONLINE for BOTH MV-table
  cycles (direct rehydrate) and base-table cycles (walk MV defs and reload
  any whose baseTables reference the transitioning table) so a transient
  broker-resource bounce never silently disables MV rewrite.
* `annotateResponse` stamps `materializedViewQueried` only when a swap was
  actually committed (`isFullRewrite() || isSplitRewrite()`) — no
  false-positives on `fromRewriteResult` (skip) paths.
* SPLIT_REWRITE refuses to return partial results when *all* MV-side or
  *all* base-side servers failed — falls back to the base-table path.
* FULL_REWRITE is rejected on hybrid base tables (where a batch MV would
  otherwise silently drop realtime data) and the broker-internal
  `materializedViewRewrite` query-option marker is stripped from user input
  inside `doHandleRequest` so it can never bypass `BrokerReduceService`'s
  nested-query safety net — even on the IN_SUBQUERY recursion path.
* Result-column names are preserved through rewrite via implicit aliases on
  the rewritten select list, so clients reading the result by column name
  are not silently broken when the MV path kicks in.
* TIMESTAMP-only contract on the MV time column, validated at view creation,
  so the broker never has to guess time-format alignment between base and MV.
* Per-MV `rewriteEnabled` and `stalenessThresholdMs` knobs for SLO control.
* `BrokerGauge.MATERIALIZED_VIEW_CACHE_ENTRY_COUNT` exposes the MV metadata
  cache size so operators can detect unbounded growth.
* Integration test polls real cache observables (not fixed sleeps) so the
  test runs as fast as the cluster's actual ZK propagation allows.
* Quickstart example wires a `TIMESTAMP` time column on the airlineStatsView
  table via `definedSQL` (`daysSinceEpoch * 24 * 60 * 60 * 1000`).

Co-Authored-By: Hongkun Xu <xuhongkun666@163.com>
Co-Authored-By: Xiang Fu <xiangfu@startree.ai>
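
The SPLIT_REWRITE bullet above describes two ideas: a watermark that partitions rows between the base side (`ts >= watermarkMs`) and the MV side (`ts < watermarkMs`), and an identity-keyed merge so one physical server can contribute a response for each side. The following is a minimal sketch of both, assuming a value-equal routing key. `SplitMergeSketch`, `ServerKey`, and `merge` are hypothetical stand-ins, not the broker's actual types.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Objects;

final class SplitMergeSketch {
  private SplitMergeSketch() {}

  // Hypothetical routing key with value-based equality: two instances for the
  // same host are equal() but remain distinct objects.
  static final class ServerKey {
    final String host;

    ServerKey(String host) {
      this.host = host;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof ServerKey && ((ServerKey) o).host.equals(host);
    }

    @Override
    public int hashCode() {
      return Objects.hash(host);
    }
  }

  // The two predicates are complementary, so every timestamp is served by
  // exactly one side.
  static boolean servedByBase(long tsMs, long watermarkMs) {
    return tsMs >= watermarkMs;
  }

  static boolean servedByMv(long tsMs, long watermarkMs) {
    return tsMs < watermarkMs;
  }

  // Merges per-server responses from both sides. With byIdentity=true the map
  // compares keys with ==, so equal keys from the two sides do not collide.
  static <V> Map<ServerKey, V> merge(
      Map<ServerKey, V> baseSide, Map<ServerKey, V> mvSide, boolean byIdentity) {
    Map<ServerKey, V> merged;
    if (byIdentity) {
      merged = new IdentityHashMap<>();
    } else {
      merged = new HashMap<>();
    }
    merged.putAll(baseSide);
    merged.putAll(mvSide);
    return merged;
  }
}
```

With one `ServerKey("server-1")` instance per side, the identity-based merge keeps two entries, while a plain `HashMap` would keep one and silently overwrite the base-side response.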
xiangfu0 added a commit to hongkunxu/pinot that referenced this pull request May 21, 2026
PR 2 of 2 in the SSE Materialized View series. PR 1 (apache#18528) landed the MV
view-definition, controller routing, and minion-task plumbing; this change
turns on the broker-side query rewrite that actually serves user queries from
the materialized view.

Highlights
----------
* Broker query rewrite engine with three subsumption strategies — Exact,
  Aggregation (re-aggregation via the AggregationEquivalence registry), and
  Scan — chosen at compile time from the rewrite plan.
* Two execution modes: FULL_REWRITE swaps the server query, schema, and
  table name at compile time; SPLIT_REWRITE issues dual scatter-gather (base
  side `ts >= watermarkMs`, MV side `ts < watermarkMs`) and merges DataTables
  via IdentityHashMap so the same physical server can carry both sides.
* Defense-in-depth fallbacks at both compile and execute time bump a new
  `QUERY_REWRITE_EXCEPTIONS` meter and fall back to the base-table query
  whenever the MV path raises, so an MV failure never surfaces as a 500.
  The fallback path refreshes routing locals from the recomputed route so
  cancel + metrics + error reporting reflect the live state.
* MV cache lifecycle wired through the broker's resource state model:
  invalidate on OFFLINE/DROPPED, refresh on OFFLINE→ONLINE for BOTH MV-table
  cycles (direct rehydrate) and base-table cycles (walk MV defs and reload
  any whose baseTables reference the transitioning table) so a transient
  broker-resource bounce never silently disables MV rewrite.
* `annotateResponse` stamps `materializedViewQueried` only when a swap was
  actually committed (`isFullRewrite() || isSplitRewrite()`) — no
  false-positives on `fromRewriteResult` (skip) paths.
* SPLIT_REWRITE refuses to return partial results when *all* MV-side or
  *all* base-side servers failed — falls back to the base-table path.
* FULL_REWRITE is rejected on hybrid base tables (where a batch MV would
  otherwise silently drop realtime data) and the broker-internal
  `materializedViewRewrite` query-option marker is stripped from user input
  inside `doHandleRequest` so it can never bypass `BrokerReduceService`'s
  nested-query safety net — even on the IN_SUBQUERY recursion path.
* `BaseSingleStageBrokerRequestHandler` keeps MV-specific logic in dedicated
  helper methods (`applyMaterializedViewRewriteAtCompile`,
  `tryExecuteMaterializedViewSplit`) so the non-MV call sites in
  `compileRequest` / `doHandleRequest` stay short — a non-MV deployment
  reads the existing flow with only a small guarded MV branch added.
* Result-column names are preserved through rewrite via implicit aliases on
  the rewritten select list, so clients reading the result by column name
  are not silently broken when the MV path kicks in.
* TIMESTAMP-only contract on the MV time column, validated at view creation,
  so the broker never has to guess time-format alignment between base and MV.
* Per-MV `rewriteEnabled` and `stalenessThresholdMs` knobs for SLO control.
* `BrokerGauge.MATERIALIZED_VIEW_CACHE_ENTRY_COUNT` exposes the MV metadata
  cache size so operators can detect unbounded growth.
* Integration test polls real cache observables (not fixed sleeps) so the
  test runs as fast as the cluster's actual ZK propagation allows.
* Quickstart example wires a `TIMESTAMP` time column on the airlineStatsView
  table via `definedSQL` (`daysSinceEpoch * 24 * 60 * 60 * 1000`).

Co-Authored-By: Hongkun Xu <xuhongkun666@163.com>
Co-Authored-By: Xiang Fu <xiangfu@startree.ai>
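
The fallback bullet above says that a failure on the MV path bumps `QUERY_REWRITE_EXCEPTIONS` and falls back to the base-table query. One way to picture that control flow is the sketch below. It is an assumption-laden illustration: `RewriteFallbackSketch`, `executeWithFallback`, and the `AtomicLong` counter are hypothetical stand-ins for the broker's handler and metrics classes, and the real code also refreshes routing state on fallback, which is omitted here.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

final class RewriteFallbackSketch {
  // Stand-in for the QUERY_REWRITE_EXCEPTIONS meter named in the commit message.
  final AtomicLong queryRewriteExceptions = new AtomicLong();

  // Tries the MV path first. Any runtime failure is counted and the
  // base-table path runs instead, so the caller never sees the MV error.
  <T> T executeWithFallback(Supplier<T> mvPath, Supplier<T> basePath) {
    try {
      return mvPath.get();
    } catch (RuntimeException e) {
      queryRewriteExceptions.incrementAndGet();
      return basePath.get();
    }
  }
}
```

A failure in `basePath` itself still propagates, which matches the stated intent: only MV-specific failures are masked.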
xiangfu0 added a commit to hongkunxu/pinot that referenced this pull request May 21, 2026
PR 2 of 2 in the SSE Materialized View series. PR 1 (apache#18528) landed the MV
view-definition, controller routing, and minion-task plumbing; this change
turns on the broker-side query rewrite that actually serves user queries from
the materialized view.

Highlights
----------
* Broker query rewrite engine with three subsumption strategies — Exact,
  Aggregation (re-aggregation via the AggregationEquivalence registry), and
  Scan — chosen at compile time from the rewrite plan.
* Two execution modes: FULL_REWRITE swaps the server query, schema, and
  table name at compile time; SPLIT_REWRITE issues dual scatter-gather (base
  side `ts >= watermarkMs`, MV side `ts < watermarkMs`) and merges DataTables
  via IdentityHashMap so the same physical server can carry both sides.
* Defense-in-depth fallbacks at both compile and execute time bump a new
  `QUERY_REWRITE_EXCEPTIONS` meter and fall back to the base-table query
  whenever the MV path raises, so an MV failure never surfaces as a 500.
* MV cache lifecycle wired through the broker's resource state model:
  invalidate on OFFLINE/DROPPED, refresh on OFFLINE→ONLINE for both
  MV-table and base-table cycles so a transient broker-resource bounce
  never silently disables MV rewrite.  `MaterializedViewHandler.close()`
  is invoked from broker shutdown so a hot-reload or test teardown does
  not leak Helix watcher slots.
* `annotateResponse` stamps `materializedViewQueried` only when a swap was
  actually committed — no false-positives on `fromRewriteResult` paths.
* SPLIT_REWRITE refuses to return partial results when *all* MV-side or
  *all* base-side servers failed.
* FULL_REWRITE is rejected on hybrid base tables (where a batch MV would
  otherwise silently drop realtime data) and the broker-internal
  `materializedViewRewrite` query-option marker is stripped from user input
  inside `doHandleRequest` so it can never bypass `BrokerReduceService`'s
  nested-query safety net.
* `BaseSingleStageBrokerRequestHandler` keeps MV-specific logic in dedicated
  helpers (`applyMaterializedViewRewriteAtCompile`,
  `tryExecuteMaterializedViewSplit`) so the non-MV call sites stay short.
* Result-column names are preserved through rewrite via implicit aliases on
  the rewritten select list.
* TIMESTAMP-only contract on the MV time column, validated at view creation.
* Per-MV `rewriteEnabled` and `stalenessThresholdMs` knobs for SLO control.
* `BrokerGauge.MATERIALIZED_VIEW_CACHE_ENTRY_COUNT` exposes the MV metadata
  cache size so operators can detect unbounded growth.
* Integration test polls real cache observables (not fixed sleeps).
* Quickstart example wires a `TIMESTAMP` time column on the airlineStatsView
  table via `definedSQL` (`daysSinceEpoch * 24 * 60 * 60 * 1000`).

Co-Authored-By: Hongkun Xu <xuhongkun666@163.com>
Co-Authored-By: Xiang Fu <xiangfu@startree.ai>
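
The SPLIT_REWRITE bullet above states that partial results are refused when all MV-side or all base-side servers failed, and the longer revisions of this message add that the broker then falls back to the base-table path. The decision can be sketched as a small predicate. `SplitFailurePolicy` and `mustFallBack` are hypothetical names, and treating a side with zero queried servers as "not failed" is an assumption of this sketch, not something the commit message specifies.

```java
// Hypothetical policy helper, not Pinot code.
final class SplitFailurePolicy {
  private SplitFailurePolicy() {}

  // Returns true when one whole side of the split produced no usable
  // response, in which case a merged result would be silently incomplete.
  static boolean mustFallBack(int mvQueried, int mvFailed, int baseQueried, int baseFailed) {
    // Assumption: a side that queried no servers is not counted as failed.
    boolean allMvFailed = mvQueried > 0 && mvFailed >= mvQueried;
    boolean allBaseFailed = baseQueried > 0 && baseFailed >= baseQueried;
    return allMvFailed || allBaseFailed;
  }
}
```

Under this sketch a split with some failures on each side still merges, since neither side is entirely missing.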