Update Segment builder to use column major tables #11776

Merged
58 commits merged into apache:master on Oct 22, 2023

Conversation

@ege-st (Contributor) commented Oct 10, 2023

This PR updates the Realtime Segment Builder to remove an unnecessary and costly transformation: Pinot would transpose the column-major table into a row-major table and then transform it back into a column-oriented table. This PR updates the segment builder to use a column-oriented layout for the entire construction. This change improves the performance of segment building, especially for very wide tables.

In order to minimize disruption and risk, this change is kept behind a table-level configuration flag (columnMajorSegmentBuilderEnabled), which is false by default. If this flag is true, the table's segments will be built with the column-oriented process (a sketch of enabling it follows below).
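
For illustration, a minimal sketch of toggling the flag programmatically, assuming it ends up as a field on IndexingConfig with a conventional setter name. The setter shown here is an assumption, not something confirmed by this PR; in practice the flag would normally be set in the table config JSON.

import org.apache.pinot.spi.config.table.IndexingConfig;
import org.apache.pinot.spi.config.table.TableConfig;

public class ColumnMajorFlagExample {
  // Hypothetical: setter name assumed from the field _columnMajorSegmentBuilderEnabled.
  static void enableColumnMajorBuild(TableConfig tableConfig) {
    IndexingConfig indexingConfig = tableConfig.getIndexingConfig();
    indexingConfig.setColumnMajorSegmentBuilderEnabled(true);
  }
}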

PR description will be updated with performance test results in the next day or two.

@codecov-commenter commented Oct 10, 2023

Codecov Report

Merging #11776 (e566f43) into master (24af80d) will increase coverage by 48.49%.
Report is 48 commits behind head on master.
The diff coverage is 77.01%.

@@              Coverage Diff              @@
##             master   #11776       +/-   ##
=============================================
+ Coverage     14.45%   62.95%   +48.49%     
- Complexity      201     1141      +940     
=============================================
  Files          2342     2367       +25     
  Lines        125917   127951     +2034     
  Branches      19370    19743      +373     
=============================================
+ Hits          18205    80553    +62348     
+ Misses       106170    41677    -64493     
- Partials       1542     5721     +4179     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (?)
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 62.90% <77.01%> (+48.48%) ⬆️
java-17 ?
java-20 ?
java-21 62.81% <77.01%> (?)
skip-bytebuffers-false 62.94% <77.01%> (?)
skip-bytebuffers-true 62.78% <77.01%> (?)
temurin 62.95% <77.01%> (+48.49%) ⬆️
unittests 62.95% <77.01%> (+48.49%) ⬆️
unittests1 66.99% <77.01%> (?)
unittests2 14.42% <0.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown.

Files Coverage Δ
...ot/segment/spi/creator/SegmentGeneratorConfig.java 80.59% <ø> (+80.59%) ⬆️
.../apache/pinot/spi/config/table/IndexingConfig.java 97.93% <100.00%> (+97.93%) ⬆️
.../config/table/ingestion/StreamIngestionConfig.java 100.00% <100.00%> (+100.00%) ⬆️
...he/pinot/spi/utils/builder/TableConfigBuilder.java 85.88% <100.00%> (+85.88%) ⬆️
...a/manager/realtime/RealtimeSegmentDataManager.java 50.66% <0.00%> (+50.66%) ⬆️
...t/creator/impl/SegmentIndexCreationDriverImpl.java 73.66% <88.88%> (+73.66%) ⬆️
...l/realtime/converter/RealtimeSegmentConverter.java 70.00% <50.00%> (+70.00%) ⬆️
...ment/creator/impl/SegmentColumnarIndexCreator.java 84.95% <70.37%> (+84.95%) ⬆️

... and 1571 files with indirect coverage changes


@Jackie-Jiang added the enhancement and release-notes labels on Oct 11, 2023
@Jackie-Jiang (Contributor) left a comment:

The overall logic looks good. Let's clean up the TODOs and debug code

To test it completely, let's enable it by default and see if the existing tests can pass

@@ -220,15 +222,20 @@ public String getSegmentName() {
}

public void getRecord(int docId, GenericRow buffer) {
// TODO: start duration
Contributor:

Is the change in this class temporary debugging code?

Contributor Author:

Removed.


// Check if column major mode should be enabled
try {
// TODO(Erich): move this so that the code does not directly reference the flag name
Contributor:

Let's move this into TableConfig -> IngestionConfig -> StreamIngestionConfig, and add a field _enableColumnMajorSegmentCreation

Contributor Author:

Fixed this. The table configs I sampled are using the deprecated config structure, and this one is for the new stream config. To make migration simple I added a field for both the old and new configuration methods.

Comment on lines 183 to 189
public boolean isColumnMajorEnabled() {
return _enableColumnMajor;
}

public int getTotalDocCount() {
return _totalDocs;
}
Contributor:

Seems not used. Suggest removing them and changing these 2 variables to local variables.

Contributor Author:

This one is used in a log statement to record what was done while building the segment.

Contributor:

I see them used in RealtimeSegmentDataManager, but this info is already logged in SegmentIndexCreationDriverImpl, which has the best knowledge of how the segment is built. No need to log detailed info at the top level.

Contributor Author:

Removed.

@@ -104,6 +106,7 @@ public class SegmentColumnarIndexCreator implements SegmentCreator {
private int _totalDocs;
private int _docIdCounter;
private boolean _nullHandlingEnabled;
private long _durationNS = 0;
Contributor:

Is this used?

Contributor Author:

Removed.

}

private void indexColumnValue(PinotSegmentColumnReader colReader,
Map<IndexType<?, ?, ?>, IndexCreator> creatorsByIndex,
Contributor:

(code format) Please follow the Pinot Style

@@ -102,7 +104,7 @@ public class SegmentIndexCreationDriverImpl implements SegmentIndexCreationDrive
private int _totalDocs = 0;
private File _tempIndexDir;
private String _segmentName;
private long _totalRecordReadTime = 0;
private long _totalRecordReadTimeNS = 0;
Contributor:

Let's keep the naming consistent

Suggested change
private long _totalRecordReadTimeNS = 0;
private long _totalRecordReadTimeNs = 0;

@@ -344,7 +390,7 @@ private void handlePostCreation()
// Persist creation metadata to disk
persistCreationMeta(segmentOutputDir, crc, creationTime);

LOGGER.info("Driver, record read time : {}", _totalRecordReadTime);
LOGGER.info("Driver, record read time (NS) : {}", _totalRecordReadTimeNS);
Contributor:

Let's keep the log unchanged, but convert the time to ms

* @param sortedDocIds - If not null, then this provides the sorted order of documents.
* @param colReader - Used to get the values of the column.
*/
void indexColumn(String columnName, @Nullable int[] sortedDocIds, IndexSegment colReader)
Contributor:

The third argument is not really a column reader

_indexCreator.indexColumn(col, sortedDocIds, indexSegment);
}
} catch (Exception e) {
_indexCreator.close(); // TODO: Why is this only closed on an exception?
Contributor:

In the regular case, it will be closed after handlePostCreation().
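
For readers following along, a minimal self-contained sketch of the lifecycle being described; SegmentCreator here is a simplified stand-in, not the actual Pinot interface, and the method name handlePostCreation() is borrowed from the driver for illustration.

public class CloseOnFailureSketch {
  interface SegmentCreator {
    void indexColumn(String column) throws Exception;
    void close();
  }

  static void buildSegment(SegmentCreator indexCreator, Iterable<String> columns) throws Exception {
    try {
      for (String column : columns) {
        indexCreator.indexColumn(column);
      }
    } catch (Exception e) {
      // Failure path: close eagerly so resources are released before rethrowing.
      indexCreator.close();
      throw e;
    }
    // Regular path: the creator stays open here and is closed later,
    // inside the post-creation step (handlePostCreation() in the driver).
  }
}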

@ege-st marked this pull request as ready for review on October 19, 2023, 19:27
@@ -34,6 +34,9 @@ public class StreamIngestionConfig extends BaseJsonConfig {
@JsonPropertyDescription("All configs for the streams from which to ingest")
private final List<Map<String, String>> _streamConfigMaps;

@JsonPropertyDescription("Whether to use column major mode when creating the segment.")
private boolean _columnMajorSegmentBuilderEnabled;
Contributor:

Seems this PR always uses the config from IndexingConfig instead of this. Even though ideally it should be configured here, since it is a short-lived config (we want to always enable it in the future), let's remove it from here to avoid confusion

Contributor Author:

Both are checked in the RealtimeSegmentDataManager constructor: if there's a StreamIngestionConfig section, it uses that to check for the segment builder mode; if there isn't, it checks the old configuration section for the flag (sketched below).
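
To make the precedence concrete, a hypothetical sketch of the check described above; the accessor names for the flag are assumptions, and the "old configuration section" is taken here to be IndexingConfig, which may not match the exact code in this PR.

import org.apache.pinot.spi.config.table.TableConfig;
import org.apache.pinot.spi.config.table.ingestion.IngestionConfig;
import org.apache.pinot.spi.config.table.ingestion.StreamIngestionConfig;

public class ColumnMajorFlagResolver {
  static boolean isColumnMajorEnabled(TableConfig tableConfig) {
    IngestionConfig ingestionConfig = tableConfig.getIngestionConfig();
    if (ingestionConfig != null && ingestionConfig.getStreamIngestionConfig() != null) {
      // New-style config: read the flag from StreamIngestionConfig (getter name assumed).
      StreamIngestionConfig streamIngestionConfig = ingestionConfig.getStreamIngestionConfig();
      return streamIngestionConfig.isColumnMajorSegmentBuilderEnabled();
    }
    // Old-style config: fall back to the flag on IndexingConfig (getter name assumed).
    return tableConfig.getIndexingConfig().isColumnMajorSegmentBuilderEnabled();
  }
}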

Contributor:

Since this is a newly added temporary flag, don't see much value supporting it in 2 different places. Let's just remove it from here and only keep the one in IndexingConfig for simplicity

@@ -97,6 +93,7 @@ public enum TimeColumnType {
private String _segmentNamePrefix = null;
private String _segmentNamePostfix = null;
private String _segmentTimeColumnName = null;
private boolean _segmentEnableColumnMajor = false;
Contributor:

I don't think the changes in this config are required. Both fields are unused.

_segmentLogger
.info("Stopping consumption due to time limit start={} now={} numRowsConsumed={} numRowsIndexed={}",
_startTimeMs, now, _numRowsConsumed, _numRowsIndexed);
_segmentLogger.info(
Contributor:

Can you revert the format change for the unrelated code? Currently it is very hard to find the relevant changes. Alternatively, you may file a PR just for the reformat of the files changed in this PR; we can merge that first and then rebase this on top of it.

Contributor Author:

Done.

@@ -297,6 +297,7 @@ public void deleteSegmentFile() {
private final Semaphore _segBuildSemaphore;
private final boolean _isOffHeap;
private final boolean _nullHandlingEnabled;
private final boolean _enableColumnMajorSegmentBuilder;
Contributor:

We don't need to change this class. RealtimeSegmentConverter has access to TableConfig and we can extract this within RealtimeSegmentConverter

Contributor Author:

Done.

@@ -303,6 +305,8 @@ public static ChunkCompressionType getDefaultCompressionType(FieldType fieldType
@Override
public void indexRow(GenericRow row)
throws IOException {
long startNS = System.nanoTime();
Contributor:

(minor) Not used

Contributor Author:

Removed.

@@ -102,14 +104,14 @@ public class SegmentColumnarIndexCreator implements SegmentCreator {
private Schema _schema;
private File _indexDir;
private int _totalDocs;
private int _docIdCounter;
private int _docPosOnDisk;
Contributor:

Don't change this. We should not update this when building the segment in column major fashion

Contributor Author:

Done

if (sortedDocIds != null) {
int onDiskDocId = 0;
for (int docId : sortedDocIds) {
indexColumnValue(colReader, creatorsByIndex, columnName, fieldSpec, dictionaryCreator, docId, onDiskDocId,
Contributor:

For better performance, we want to change the order of the loops:

  1. Loop over the columns
  2. Loop over the index creators
  3. Loop over the docs

(A sketch of this ordering follows below.)
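
A minimal sketch of that ordering; the reader and creator types here are simplified stand-ins rather than the actual Pinot SegmentColumnarIndexCreator APIs.

import java.util.List;
import java.util.Map;

public class ColumnMajorLoopSketch {
  interface ColumnReader {
    Object getValue(int docId);
  }

  interface IndexCreator {
    void add(Object value, int onDiskDocId);
  }

  static void indexColumnMajor(Map<String, ColumnReader> readersByColumn,
      Map<String, List<IndexCreator>> creatorsByColumn, int[] sortedDocIds) {
    // 1. Outer loop over the columns
    for (Map.Entry<String, ColumnReader> entry : readersByColumn.entrySet()) {
      ColumnReader reader = entry.getValue();
      // 2. Then over each index creator for that column
      for (IndexCreator creator : creatorsByColumn.get(entry.getKey())) {
        // 3. Innermost loop over the docs, written in sorted (on-disk) order
        int onDiskDocId = 0;
        for (int docId : sortedDocIds) {
          creator.add(reader.getValue(docId), onDiskDocId++);
        }
      }
    }
  }
}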

Contributor:

Ideally we can directly seal the index creator when a column is indexed. Can be addressed separately

Contributor Author:

Good catch. Testing this right now.

Contributor Author:

Made the change.

Comment on lines 392 to 400
// In row oriented:
// - this.indexRow iterates over each column and checks if it isNullValue. If it is then it sets the null
// value vector for that doc id
// - This null value comes from the GenericRow that is created by PinotSegmentRecordReader
// - PinotSegmentRecordReader:L224 is where we figure out the null value stuff
// - PSegRecReader calls PinotSegmentColumnReader.isNull on the doc id to determine if the value for that
// column of that docId is null
// - if it returns true and we are NOT skipping null values we put the default null value into that field
// of the GenericRow
Contributor:

Is this relevant? I don't follow this comment

Contributor Author:

I put it in because I wanted to see if it would help people better understand the different steps that are involved in the null value logic. But since it didn't help, I'll remove it.

}

@Test
public void testNoRecordsIndexedColumnMajorSegmentBuilder()
Contributor:

Can you also add a test when there are some records indexed?

Contributor Author:

Done.

@Jackie-Jiang (Contributor) left a comment:

LGTM otherwise

@@ -117,6 +113,7 @@ public enum TimeColumnType {
// Use on-heap or off-heap memory to generate index (currently only affect inverted index and star-tree v2)
private boolean _onHeap = false;
private boolean _nullHandlingEnabled = false;
private boolean _columnMajorSegmentBuilderEnabled = false;
Contributor:

This is not used and not needed.

@@ -194,6 +194,10 @@ public int[] getSortedDocIds() {
return _sortedDocIds;
}

public boolean getSkipDefaultNullValues() {
Contributor:

Hmm, seems it is not changed. Was there a commit that wasn't pushed?

@@ -344,12 +394,13 @@ private void handlePostCreation()
// Persist creation metadata to disk
persistCreationMeta(segmentOutputDir, crc, creationTime);

LOGGER.info("Driver, record read time : {}", _totalRecordReadTime);
LOGGER.info("Driver, record read time : {}", ((float) _totalRecordReadTimeNs) / 1000000.0);
Contributor:

We don't want to log float time

Suggested change
LOGGER.info("Driver, record read time : {}", ((float) _totalRecordReadTimeNs) / 1000000.0);
LOGGER.info("Driver, record read time : {}", TimeUnit.NANOSECONDS.toMillis(_totalRecordReadTimeNs));

@@ -229,16 +232,21 @@ public void build()
GenericRow reuse = new GenericRow();
TransformPipeline.Result reusedResult = new TransformPipeline.Result();
while (_recordReader.hasNext()) {
long recordReadStartTime = System.currentTimeMillis();
long recordReadStopTime = System.currentTimeMillis();
long recordReadStopTime = System.nanoTime();
Contributor:

My IDE shows this as a redundant assignment; not sure if you need to enable that inspection explicitly.

Suggested change
long recordReadStopTime = System.nanoTime();
long recordReadStopTime;

Contributor Author:

I'd assumed that was a deliberate coding style choice for Pinot: went ahead and got rid of it.

for (int docId : sortedDocIds) {
indexColumnValue(colReader, creatorsByIndex, columnName, fieldSpec, dictionaryCreator, docId, onDiskDocId,
nullVec, skipDefaultNullValues);
onDiskDocId += 1;
Contributor:

(nit)

Suggested change
onDiskDocId += 1;
onDiskDocId++;

indexMultiValueRow(dictionaryCreator, (Object[]) columnValueToIndex, creatorsByIndex);
}

if (_nullHandlingEnabled && !skipDefaultNullValues) {
Contributor:

Remove the second check

Suggested change
if (_nullHandlingEnabled && !skipDefaultNullValues) {
if (_nullHandlingEnabled) {

Contributor Author:

No idea how this change got lost, but re-removed it.

@xiangfu0 merged commit 88edfd4 into apache:master on Oct 22, 2023
19 checks passed
Labels
enhancement, release-notes (referenced by PRs that need attention when compiling the next release notes)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants