Support consuming tier overrides for realtime consuming segments #18479

Closed
xiangfu0 wants to merge 1 commit into apache:master from xiangfu0:codex/consuming-segment-index-profile
@xiangfu0
Contributor

Summary

  • Reuse existing tierOverwrites support for mutable realtime consuming segments by applying tierOverwrites.consuming when building RealtimeSegmentConfig.
  • Keep committed/immutable segment generation and real storage-tier loading on the persisted table config or the actual segment tier.
  • Treat consuming as a synthetic mutable-consuming tier only when the table does not already define a real storage tier with that name, preserving existing storage-tier behavior.
  • Validate the effective consuming view and reject unsupported tableIndexConfig.tierOverwrites.consuming keys that do not flow through mutable index loading.
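
The rule in the third bullet can be sketched roughly as follows. This is an illustration in Python, not the PR's Java implementation: `is_synthetic_consuming_tier` is a hypothetical helper, and it assumes the table config is a plain dict whose optional top-level `tierConfigs` list holds entries with a `name` field.

```python
# Tier name taken from the PR description.
CONSUMING_TIER = "consuming"


def is_synthetic_consuming_tier(table_config: dict) -> bool:
    """Return True when "consuming" should act as the synthetic mutable-consuming tier.

    Assumption: a real storage tier is any entry in table_config["tierConfigs"]
    whose "name" matches. If such an entry exists, existing storage-tier
    semantics win and the synthetic override is not applied.
    """
    tier_configs = table_config.get("tierConfigs") or []
    real_tier_names = {tier.get("name") for tier in tier_configs}
    return CONSUMING_TIER not in real_tier_names
```

Under these assumptions, a table with no `tierConfigs` (or only tiers with other names) gets the synthetic behavior, while a table that already defines a tier named `consuming` does not.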

User Manual

Configure the committed segment shape as usual, then add tierOverwrites.consuming where the mutable consuming segment should differ.

Example: keep userId RAW-encoded with no inverted index after commit, but use dictionary encoding plus an inverted index while the segment is consuming:

{
  "tableIndexConfig": {
    "noDictionaryColumns": ["userId"],
    "tierOverwrites": {
      "consuming": {
        "noDictionaryColumns": []
      }
    }
  },
  "fieldConfigList": [
    {
      "name": "userId",
      "encodingType": "RAW",
      "tierOverwrites": {
        "consuming": {
          "encodingType": "DICTIONARY",
          "indexes": {
            "inverted": {
              "enabled": true
            }
          }
        }
      }
    }
  ]
}
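
To make the effect of this config concrete, here is a rough sketch of the consuming view it would produce. It assumes a shallow, key-by-key overlay in which each key under `tierOverwrites.consuming` replaces the persisted key of the same name; `overlay` and `consuming_view` are hypothetical helpers, and the actual merge performed in Pinot may differ.

```python
import copy


def overlay(base: dict, tier: str) -> dict:
    """Return a copy of `base` with base["tierOverwrites"][tier] applied on top.

    Assumption: overrides replace top-level keys wholesale (no deep merge).
    """
    result = copy.deepcopy(base)
    overrides = (result.pop("tierOverwrites", None) or {}).get(tier, {})
    result.update(overrides)
    return result


def consuming_view(table_config: dict, tier: str = "consuming") -> dict:
    """Build the effective view for mutable consuming segments (illustrative only)."""
    view = copy.deepcopy(table_config)
    if "tableIndexConfig" in view:
        view["tableIndexConfig"] = overlay(view["tableIndexConfig"], tier)
    if "fieldConfigList" in view:
        view["fieldConfigList"] = [overlay(fc, tier) for fc in view["fieldConfigList"]]
    return view


# The example config from this description.
table_config = {
    "tableIndexConfig": {
        "noDictionaryColumns": ["userId"],
        "tierOverwrites": {"consuming": {"noDictionaryColumns": []}},
    },
    "fieldConfigList": [
        {
            "name": "userId",
            "encodingType": "RAW",
            "tierOverwrites": {
                "consuming": {
                    "encodingType": "DICTIONARY",
                    "indexes": {"inverted": {"enabled": True}},
                }
            },
        }
    ],
}

view = consuming_view(table_config)
```

In this sketch the consuming view ends up with an empty `noDictionaryColumns` and a DICTIONARY-encoded `userId` carrying an inverted index, while `table_config` itself (the persisted shape used for committed segments) is left untouched.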

Query example that can benefit while rows are still in consuming segments:

SELECT COUNT(*)
FROM userEvents
WHERE userId = 'u123';

Notes:

  • You normally do not need a tierConfigs entry named consuming for this feature.
  • If a table already uses consuming as a real storage tier name, Pinot keeps existing storage-tier semantics and does not treat tierOverwrites.consuming as the synthetic mutable-consuming override for that table.
  • If the persisted column is listed in tableIndexConfig.noDictionaryColumns or noDictionaryConfig, clear that setting under tableIndexConfig.tierOverwrites.consuming so the consuming view can enable dictionary encoding.
  • tableIndexConfig.tierOverwrites.consuming is limited to index-loading settings such as dictionary, inverted, range, JSON, Bloom filter, and dictionary optimization options. Row-shape or ingestion settings such as aggregateMetrics and segmentPartitionConfig are rejected for the synthetic consuming tier.
  • Real storage-tier overrides are unchanged and still apply through actual immutable segment tiers.
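
The key restriction described in the notes above could be checked along these lines. This is a minimal sketch, not the validation added in this PR: `find_rejected_consuming_keys` is a hypothetical helper, and the rejected set contains only the two keys named in this description, whereas the real validation may reject more.

```python
# Only the two keys this description names as rejected; the actual list may be longer.
REJECTED_CONSUMING_KEYS = {"aggregateMetrics", "segmentPartitionConfig"}


def find_rejected_consuming_keys(table_config: dict) -> list:
    """Return the sorted keys under tableIndexConfig.tierOverwrites.consuming
    that this sketch treats as unsupported for the synthetic consuming tier."""
    index_config = table_config.get("tableIndexConfig") or {}
    overrides = (index_config.get("tierOverwrites") or {}).get("consuming") or {}
    return sorted(key for key in overrides if key in REJECTED_CONSUMING_KEYS)
```

A caller would treat a non-empty result as a validation failure. The sketch does not handle the case where `consuming` is a real storage tier, for which the notes say existing semantics are unchanged.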

A complete sample config and walkthrough are included under pinot-tools/src/main/resources/examples/stream/consumingSegmentTierOverride/.

Validation

  • ./mvnw -pl pinot-segment-local -am -Dtest=TableConfigConsumingSegmentTierOverrideTest,TableConfigUtilsTest,IndexLoadingConfigTest -Dsurefire.failIfNoSpecifiedTests=false test
  • ./mvnw -pl pinot-common -am -Dtest=TableConfigSerDeUtilsTest -Dsurefire.failIfNoSpecifiedTests=false test
  • ./mvnw -pl pinot-integration-tests -am -Dskip.npm=true -Dtest=ConsumingSegmentTierOverrideRealtimeTest -Dsurefire.failIfNoSpecifiedTests=false test
  • ./mvnw spotless:apply -pl pinot-common,pinot-core,pinot-segment-local,pinot-spi,pinot-integration-tests,pinot-tools
  • ./mvnw license:format -pl pinot-common,pinot-core,pinot-segment-local,pinot-spi,pinot-integration-tests,pinot-tools
  • ./mvnw checkstyle:check -pl pinot-common,pinot-core,pinot-segment-local,pinot-spi,pinot-integration-tests,pinot-tools
  • ./mvnw license:check -pl pinot-common,pinot-core,pinot-segment-local,pinot-spi,pinot-integration-tests,pinot-tools
  • git diff --cached --check

@codecov-commenter

codecov-commenter commented May 12, 2026

Codecov Report

❌ Patch coverage is 65.33333% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.69%. Comparing base (7fe517a) to head (ede179e).
⚠️ Report is 6 commits behind head on master.

Files with missing lines | Patch % | Lines
...he/pinot/segment/local/utils/TableConfigUtils.java | 66.66% | 11 Missing and 11 partials ⚠️
...ealtime/writer/StatelessRealtimeSegmentWriter.java | 0.00% | 3 Missing ⚠️
...local/segment/index/loader/IndexLoadingConfig.java | 66.66% | 0 Missing and 1 partial ⚠️

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18479      +/-   ##
============================================
+ Coverage     63.68%   63.69%   +0.01%     
  Complexity     1684     1684              
============================================
  Files          3262     3262              
  Lines        199826   199907      +81     
  Branches      31031    31050      +19     
============================================
+ Hits         127264   127340      +76     
+ Misses        62414    62404      -10     
- Partials      10148    10163      +15     

Flag | Coverage Δ
custom-integration1 | 100.00% <ø> (ø)
integration | 100.00% <ø> (ø)
integration1 | 100.00% <ø> (ø)
integration2 | 0.00% <ø> (ø)
java-21 | 63.69% <65.33%> (+0.01%) ⬆️
temurin | 63.69% <65.33%> (+0.01%) ⬆️
unittests | 63.69% <65.33%> (+0.01%) ⬆️
unittests1 | 55.76% <18.66%> (+<0.01%) ⬆️
unittests2 | 34.98% <65.33%> (+<0.01%) ⬆️

@xiangfu0 xiangfu0 closed this May 13, 2026