
Support consuming tier overrides for realtime consuming segments #18474

Closed
xiangfu0 wants to merge 1 commit into apache:master from xiangfu0:codex/consuming-segment-index-profile

Conversation

xiangfu0 (Contributor) commented May 12, 2026

Summary

  • Reuse existing tierOverwrites support for mutable realtime consuming segments by applying tierOverwrites.consuming when building RealtimeSegmentConfig.
  • Keep committed/immutable segment generation and real storage-tier loading on the persisted table config or the actual segment tier.
  • Treat consuming as a synthetic mutable-consuming tier only when the table does not already define a real storage tier with that name, preserving existing storage-tier behavior.
  • Validate the effective consuming view and reject unsupported tableIndexConfig.tierOverwrites.consuming keys that do not flow through mutable index loading.
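
The tier-selection rule in the summary can be sketched roughly as follows. This is an illustration only, not code from the PR: the class name, the `SegmentKind` enum, and the `overrideKeyFor` helper are all hypothetical, and the table's real storage tiers are reduced to a set of names.

```java
import java.util.Optional;
import java.util.Set;

public class ConsumingTierResolution {
    static final String CONSUMING = "consuming";

    // Hypothetical distinction between a mutable consuming segment and a committed one.
    enum SegmentKind { MUTABLE_CONSUMING, IMMUTABLE }

    // Hypothetical helper: returns the tierOverwrites key to apply, if any.
    // storageTierNames stands in for the names of the table's real storage tiers.
    static Optional<String> overrideKeyFor(SegmentKind kind, String segmentTier, Set<String> storageTierNames) {
        if (kind == SegmentKind.IMMUTABLE) {
            // Committed segments keep real storage-tier semantics (no tier means no override).
            return Optional.ofNullable(segmentTier);
        }
        if (storageTierNames.contains(CONSUMING)) {
            // "consuming" is already a real storage tier, so it is not treated as synthetic.
            return Optional.empty();
        }
        // Otherwise "consuming" acts as the synthetic mutable-consuming tier.
        return Optional.of(CONSUMING);
    }

    public static void main(String[] args) {
        System.out.println(overrideKeyFor(SegmentKind.MUTABLE_CONSUMING, null, Set.of("coldTier")));
        System.out.println(overrideKeyFor(SegmentKind.MUTABLE_CONSUMING, null, Set.of("consuming")));
        System.out.println(overrideKeyFor(SegmentKind.IMMUTABLE, "coldTier", Set.of("coldTier")));
    }
}
```

The sketch keeps the two paths separate so that the immutable branch never consults the synthetic name, which mirrors the stated goal of preserving existing storage-tier behavior.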

User Manual

Configure the committed segment shape as usual, then add tierOverwrites.consuming where the mutable consuming segment should differ.

Example: keep userId RAW-encoded with no inverted index after commit, but use dictionary encoding plus an inverted index while the segment is consuming:

{
  "tableIndexConfig": {
    "noDictionaryColumns": ["userId"],
    "tierOverwrites": {
      "consuming": {
        "noDictionaryColumns": []
      }
    }
  },
  "fieldConfigList": [
    {
      "name": "userId",
      "encodingType": "RAW",
      "tierOverwrites": {
        "consuming": {
          "encodingType": "DICTIONARY",
          "indexes": {
            "inverted": {
              "enabled": true
            }
          }
        }
      }
    }
  ]
}
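
To make the effect of the override concrete, here is a minimal sketch of how a consuming view of the userId field config could be derived from the persisted one. It assumes a shallow, key-by-key replacement in which each key under tierOverwrites.consuming replaces the same key in the persisted config; the PR text does not spell out the merge semantics, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConsumingOverlaySketch {
    // Hypothetical helper: copy the persisted config, then let overlay keys win.
    // Assumption: shallow replacement, no deep merge of nested objects.
    static Map<String, Object> applyOverlay(Map<String, Object> persisted, Map<String, Object> overlay) {
        Map<String, Object> effective = new LinkedHashMap<>(persisted);
        effective.putAll(overlay);
        return effective;
    }

    public static void main(String[] args) {
        // Persisted (committed-segment) field config for userId, minus its tierOverwrites entry.
        Map<String, Object> persisted = new LinkedHashMap<>();
        persisted.put("name", "userId");
        persisted.put("encodingType", "RAW");

        // Contents of tierOverwrites.consuming from the example above.
        Map<String, Object> overlay = new LinkedHashMap<>();
        overlay.put("encodingType", "DICTIONARY");
        overlay.put("indexes", Map.of("inverted", Map.of("enabled", true)));

        Map<String, Object> consumingView = applyOverlay(persisted, overlay);
        System.out.println("consuming view: " + consumingView.get("encodingType"));
        System.out.println("persisted:      " + persisted.get("encodingType"));
    }
}
```

Because the helper copies before merging, the persisted config is left untouched, which matches the intent that committed segments keep the persisted shape.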

Example query that can benefit from the consuming-time inverted index while matching rows are still in consuming segments:

SELECT COUNT(*)
FROM userEvents
WHERE userId = 'u123';

Notes:

  • You normally do not need a tierConfigs entry named consuming for this feature.
  • If a table already uses consuming as a real storage tier name, Pinot keeps existing storage-tier semantics and does not treat tierOverwrites.consuming as the synthetic mutable-consuming override for that table.
  • If the persisted column is listed in tableIndexConfig.noDictionaryColumns or noDictionaryConfig, clear that setting under tableIndexConfig.tierOverwrites.consuming so that the consuming view can enable the dictionary.
  • tableIndexConfig.tierOverwrites.consuming is limited to index-loading settings such as dictionary, inverted, range, JSON, Bloom filter, and dictionary optimization options. Row-shape or ingestion settings such as aggregateMetrics and segmentPartitionConfig are rejected for the synthetic consuming tier.
  • Real storage-tier overrides are unchanged and still apply through actual immutable segment tiers.
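
The key restriction described in the notes could be checked along these lines. This is a simplified sketch, not the PR's validation: it uses a denylist containing only the two keys the notes name as rejected, whereas the actual check may well be an allowlist of index-loading settings. The class, constant, and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ConsumingOverrideKeyCheck {
    // Assumption: only the two rejected keys named in the notes; the real list may differ.
    static final Set<String> REJECTED_KEYS = Set.of("aggregateMetrics", "segmentPartitionConfig");

    // Hypothetical helper: returns the offending keys found under
    // tableIndexConfig.tierOverwrites.consuming, sorted for stable error messages.
    static List<String> findRejectedKeys(Set<String> overrideKeys) {
        return overrideKeys.stream()
            .filter(REJECTED_KEYS::contains)
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> overrideKeys = Set.of("noDictionaryColumns", "aggregateMetrics");
        List<String> rejected = findRejectedKeys(overrideKeys);
        if (!rejected.isEmpty()) {
            System.out.println("Unsupported keys for the synthetic consuming tier: " + rejected);
        }
    }
}
```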

A complete sample config and walkthrough are included under pinot-tools/src/main/resources/examples/stream/consumingSegmentTierOverride/.

Validation

  • ./mvnw -pl pinot-segment-local -am -Dtest=TableConfigConsumingSegmentTierOverrideTest,TableConfigUtilsTest,IndexLoadingConfigTest -Dsurefire.failIfNoSpecifiedTests=false test
  • ./mvnw -pl pinot-common -am -Dtest=TableConfigSerDeUtilsTest -Dsurefire.failIfNoSpecifiedTests=false test
  • ./mvnw -pl pinot-integration-tests -am -Dskip.npm=true -Dtest=ConsumingSegmentTierOverrideRealtimeTest -Dsurefire.failIfNoSpecifiedTests=false test
  • ./mvnw spotless:apply -pl pinot-common,pinot-core,pinot-segment-local,pinot-spi,pinot-integration-tests,pinot-tools
  • ./mvnw license:format -pl pinot-common,pinot-core,pinot-segment-local,pinot-spi,pinot-integration-tests,pinot-tools
  • ./mvnw checkstyle:check -pl pinot-common,pinot-core,pinot-segment-local,pinot-spi,pinot-integration-tests,pinot-tools
  • ./mvnw license:check -pl pinot-common,pinot-core,pinot-segment-local,pinot-spi,pinot-integration-tests,pinot-tools
  • git diff --cached --check

xiangfu0 force-pushed the codex/consuming-segment-index-profile branch from f379368 to 8f5b053 on May 12, 2026 10:24

codecov-commenter commented May 12, 2026

Codecov Report

❌ Patch coverage is 71.77419% with 70 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.71%. Comparing base (7fe517a) to head (10ee317).
⚠️ Report is 4 commits behind head on master.

Files with missing lines | Patch % | Lines
...he/pinot/segment/local/utils/TableConfigUtils.java | 68.04% | 37 Missing and 25 partials ⚠️
...ealtime/writer/StatelessRealtimeSegmentWriter.java | 0.00% | 3 Missing ⚠️
.../spi/config/table/ConsumingSegmentIndexConfig.java | 75.00% | 2 Missing ⚠️
...org/apache/pinot/spi/config/table/TableConfig.java | 75.00% | 2 Missing ⚠️
...a/manager/realtime/RealtimeSegmentDataManager.java | 75.00% | 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18474      +/-   ##
============================================
+ Coverage     63.68%   63.71%   +0.02%     
  Complexity     1684     1684              
============================================
  Files          3262     3265       +3     
  Lines        199826   200080     +254     
  Branches      31031    31087      +56     
============================================
+ Hits         127264   127481     +217     
- Misses        62414    62430      +16     
- Partials      10148    10169      +21     
Flag | Coverage Δ
custom-integration1 | 100.00% <ø> (ø)
integration | 100.00% <ø> (ø)
integration1 | 100.00% <ø> (ø)
integration2 | 0.00% <ø> (ø)
java-21 | 63.71% <71.77%> (+0.02%) ⬆️
temurin | 63.71% <71.77%> (+0.02%) ⬆️
unittests | 63.71% <71.77%> (+0.02%) ⬆️
unittests1 | 55.72% <16.93%> (-0.04%) ⬇️
unittests2 | 35.01% <70.16%> (+0.03%) ⬆️


xiangfu0 marked this pull request as ready for review on May 12, 2026 11:40
xiangfu0 force-pushed the codex/consuming-segment-index-profile branch from 8f5b053 to 10ee317 on May 12, 2026 19:31
xiangfu0 closed this on May 12, 2026
xiangfu0 changed the title from "Add realtime consuming segment index profile" to "Support consuming tier overrides for realtime consuming segments" on May 12, 2026