Remove SegmentIndexCreationInfo and ColumnIndexCreationInfo wrapper classes #18116
Merged
xiangfu0 merged 1 commit into apache:master on Apr 8, 2026
Codecov Report
@@ Coverage Diff @@
## master #18116 +/- ##
============================================
+ Coverage 63.97% 64.03% +0.06%
- Complexity 1611 1617 +6
============================================
Files 3181 3179 -2
Lines 193767 193697 -70
Branches 29918 29914 -4
============================================
+ Hits 123953 124028 +75
+ Misses 60034 59865 -169
- Partials 9780 9804 +24
…lasses

These two classes add unnecessary indirection:
- SegmentIndexCreationInfo is a trivial wrapper holding only `int totalDocs`
- ColumnIndexCreationInfo wraps ColumnStatistics with 3 extra fields that can be derived at usage sites

Replace SegmentIndexCreationInfo with plain `int totalDocs` parameter. Replace ColumnIndexCreationInfo with ColumnStatistics directly. The extra fields are handled as:
- useVarLengthDictionary: re-derived from DictionaryIndexConfig + stats
- isAutoGenerated: detected via `instanceof DefaultColumnStatistics`
- defaultNullValue: computed from FieldSpec.getDefaultNullValueString()

Also simplifies SegmentCreator.init() and SegmentGeneratorConfig by consolidating parameters and moving common initialization into BaseSegmentCreator.
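The three derivations listed in the commit message could look roughly like the sketch below. It is not the code from this PR: the nested types are hypothetical stand-ins for Pinot's `ColumnStatistics` and `DefaultColumnStatistics`, the `configuredVarLength` and `isIntColumn` parameters are invented for illustration, and the rule used for `useVarLengthDictionary` (config flag OR non-fixed-length values) is an assumption, as is the string-to-typed-value conversion.

```java
public class DerivedColumnFieldsSketch {
  // Hypothetical stand-in for Pinot's ColumnStatistics; only the method this sketch needs.
  interface ColumnStatistics {
    boolean isFixedLength();
  }

  // Hypothetical stand-in for statistics auto-generated for a column that has no collected data.
  static final class DefaultColumnStatistics implements ColumnStatistics {
    @Override
    public boolean isFixedLength() {
      return true;
    }
  }

  // Hypothetical stand-in for statistics collected from real values.
  static final class CollectedColumnStatistics implements ColumnStatistics {
    private final boolean _fixedLength;

    CollectedColumnStatistics(boolean fixedLength) {
      _fixedLength = fixedLength;
    }

    @Override
    public boolean isFixedLength() {
      return _fixedLength;
    }
  }

  // isAutoGenerated: a type check replaces the boolean the wrapper used to carry.
  static boolean isAutoGenerated(ColumnStatistics stats) {
    return stats instanceof DefaultColumnStatistics;
  }

  // useVarLengthDictionary: ASSUMED rule, true when configured or when values vary in length.
  static boolean useVarLengthDictionary(boolean configuredVarLength, ColumnStatistics stats) {
    return configuredVarLength || !stats.isFixedLength();
  }

  // defaultNullValue: ASSUMED conversion from the field spec's string form to a typed value.
  static Object defaultNullValue(String defaultNullValueString, boolean isIntColumn) {
    return isIntColumn ? (Object) Integer.valueOf(defaultNullValueString) : defaultNullValueString;
  }
}
```

The point of the design is that each value is a pure function of data the call site already holds, so no wrapper object has to be built and threaded through the creators.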
xiangfu0 approved these changes on Apr 8, 2026
Summary
Removes unnecessary wrapper classes and consolidates segment creation parameters.

Delete SegmentIndexCreationInfo and ColumnIndexCreationInfo
- `SegmentIndexCreationInfo` was a trivial wrapper holding only `int totalDocs`; replaced with a plain `int` parameter
- `ColumnIndexCreationInfo` wrapped `ColumnStatistics` with 3 extra fields that are now derived at usage sites:
  - `useVarLengthDictionary`: re-derived from `DictionaryIndexConfig` + column stats
  - `isAutoGenerated`: detected via `instanceof DefaultColumnStatistics`
  - `defaultNullValue`: computed from `FieldSpec.getDefaultNullValueString()`

Simplify SegmentCreator interface
- `init()` reduced from 7 parameters to 4: `(SegmentGeneratorConfig, int totalDocs, TreeMap<String, ColumnStatistics>, File outDir)`
- `InstanceType`, `immutableToMutableIdMap`, `Schema` moved into `SegmentGeneratorConfig`
- Common initialization (`_colIndexes` population) consolidated into `BaseSegmentCreator.init()`
- `SegmentColumnarIndexCreator` no longer overrides `init()`; it inherits from `BaseSegmentCreator`

Consolidate parameters into SegmentGeneratorConfig
- `instanceType`, `mutableSegmentCompacted`, `mutableToImmutableDocIdMap` fields
- Callers (`SegmentPurger`, `SegmentProcessorFramework`, `RealtimeSegmentConverter`, minion tasks, etc.) now set these on the config instead of passing them as separate `init()` parameters
- Removed the `SegmentIndexCreationDriver.init(config, instanceType)` overload; `instanceType` is now on the config

Simplify SegmentIndexCreationDriverImpl
- Removed the `_segmentIndexCreationInfo` field; uses `_totalDocs` directly
- `_indexCreationInfoMap` replaced with `_columnStatisticsMap` (`TreeMap<String, ColumnStatistics>`)
- Uses `ColumnStatistics` directly instead of wrapping it in `ColumnIndexCreationInfo`
- `init()` overloads simplified: `InstanceType` parameter removed (read from config)

Fix NoDictColumnStatisticsCollector.getUniqueValuesSet()
- Changed from throwing `NotImplementedException` to returning `null`
- Added `@Nullable` annotation to `ColumnStatistics.getUniqueValuesSet()`
- … `MutableNoDictColumnStatistics`

Improve Lucene text index reuse during realtime conversion
- `LuceneTextIndexCreator`: renamed `immutableToMutableIdMap` to `mutableToImmutableDocIdMap` to match actual semantics
- `mutableSegmentCompacted` parameter: disables mutable index reuse when the segment is compacted (compaction changes doc IDs, making the mutable text index invalid)
- `RealtimeSegmentConverter`: computes `mutableToImmutableDocIdMap` from `sortedDocIds` when both sorting and mutable text index reuse are enabled
- `MutableSegmentImpl`: added `hasColumnWithReuseMutableTextIndex()` to check whether any column uses mutable text index reuse

Cleanup
- `IndexCreationContext.Builder`: replaced `withAllColumnStatistics()` + `withColumnIndexCreationInfo()` with a single `withColumnStatistics()` method
- `CompactedPinotSegmentRecordReader`: removed unused constructor overloads taking `ThreadSafeMutableRoaringBitmap`
- … `pinot-core`, `pinot-plugins`, `pinot-segment-local`, `pinot-segment-spi`

Test plan
- … (`ColumnMetadataTest`, `SegmentPreProcessorTest`, `MutableSegmentImplTest`, `LuceneTextIndexBufferIntegrationTest`)
- … (`MergeRollupTaskExecutorTest`, `PurgeTaskExecutorTest`, `RealtimeToOfflineSegmentsTaskExecutorTest`)
- `NoDictColumnStatisticsCollector.getUniqueValuesSet()` returns `null` instead of throwing, and mutable text index reuse is disabled when the segment is compacted
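
To make the doc ID remapping in the Lucene text index changes concrete, here is a minimal sketch of how a mutable-to-immutable map might be computed from `sortedDocIds`, together with the compaction guard. It is not the PR's implementation: the class and method names are hypothetical, and it assumes `sortedDocIds[i]` holds the mutable doc ID of the row that ends up at immutable position `i`, so that the map is simply the inverse permutation.

```java
public class DocIdMapSketch {
  /**
   * Inverts the sort order.
   * ASSUMPTION: sortedDocIds[immutableDocId] == mutableDocId, and sortedDocIds is a
   * permutation of 0..n-1 (no rows dropped).
   */
  static int[] mutableToImmutableDocIdMap(int[] sortedDocIds) {
    int[] map = new int[sortedDocIds.length];
    for (int immutableDocId = 0; immutableDocId < sortedDocIds.length; immutableDocId++) {
      map[sortedDocIds[immutableDocId]] = immutableDocId;
    }
    return map;
  }

  /**
   * Reuse guard: compaction drops rows and shifts doc IDs, so the mutable text index
   * can no longer be mapped onto the immutable segment and must be rebuilt.
   */
  static boolean canReuseMutableTextIndex(boolean reuseEnabled, boolean mutableSegmentCompacted) {
    return reuseEnabled && !mutableSegmentCompacted;
  }
}
```

For example, with `sortedDocIds = {2, 0, 1}` the mutable row 2 becomes immutable row 0, so the resulting map is `[1, 2, 0]`. The rename from `immutableToMutableIdMap` matters for exactly this reason: the two directions are different arrays unless the permutation happens to be its own inverse.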