Support STRING and BYTES for no dictionary columns in realtime consuming segments #4791
Conversation
@@ -38,11 +41,13 @@
   private static final String LONG_COL_NAME = "longcol";
   private static final String FLOAT_COL_NAME = "floatcol";
   private static final String DOUBLE_COL_NAME = "doublecol";
-  private static final int NUM_ROWS = 1000;
+  private static final String STRING_COL_NAME = "stringcol";
+  private static final int NUM_ROWS = 10;
Sorry, this was a bad change made while I was debugging the test I added. Need to undo this and set NUM_ROWS back to 1000.
Fixed
@@ -420,6 +420,7 @@ public int size() {
       case DOUBLE:
         return Double.BYTES;
       case BYTES:
+      case STRING:
The comment below is not valid anymore, I suppose. Actually, I am not sure what that comment means, so can you please clarify it? Thanks.
indexReaderWriter = new FixedByteSingleColumnSingleValueReaderWriter(_capacity, indexColumnSize, _memoryManager,
    allocationContext);

if (forwardIndexColumnSize > 0) {
I am a little uncomfortable assuming the size of the column to be a certain value when FieldSpec does not export that value. How about we add a method to FieldSpec like isFixedWidthColumn()? We can set forwardIndexColumnSize to -1 in this method, and modify it only if it is a fixed-width no-dictionary column, or a column for which we create a dictionary.
Done. Added isFixedWidthColumn() method to FieldSpec.DataType
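For readers following along, the isFixedWidthColumn() idea could look roughly like the sketch below. DataTypeSketch is a stripped-down, hypothetical stand-in for FieldSpec.DataType, not Pinot's actual code:

```java
// Stand-in for FieldSpec.DataType: fixed-width types know their byte size,
// variable-width types (STRING/BYTES) do not.
public class DataTypeSketch {
  public enum DataType {
    INT(Integer.BYTES), LONG(Long.BYTES), FLOAT(Float.BYTES), DOUBLE(Double.BYTES),
    STRING(-1), BYTES(-1);

    private final int _size;

    DataType(int size) {
      _size = size;
    }

    // A type is fixed-width when every value occupies the same number of bytes.
    public boolean isFixedWidth() {
      return _size > 0;
    }

    public int size() {
      if (!isFixedWidth()) {
        throw new UnsupportedOperationException("Variable-width type has no fixed size: " + this);
      }
      return _size;
    }
  }

  public static void main(String[] args) {
    System.out.println(DataType.LONG.isFixedWidth());   // true
    System.out.println(DataType.STRING.isFixedWidth()); // false
  }
}
```

With this shape, callers never hard-code per-type sizes; they ask the type whether a fixed size even exists.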
-      // No dictionary
-      indexColumnSize = dataType.size();
+          && !invertedIndexColumns.contains(column)) {
+      // No dictionary -- size will be equal to size of data
This does mean that we support no-dictionary for all types of columns. Seems ok, but I am just worried that if another column type is added for which we need some special logic, it may be hard to locate this place to change it. Short of introducing a method isNoDictionarySupportedForColumnType(), I am not sure what else can be done. We can add the isSingleValueField() check inside the new method, though.
Good suggestion. Done
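A hedged sketch of what such a consolidated helper could look like, with the isSingleValueField() check folded in; the signature is illustrative, not the actual Pinot method:

```java
import java.util.Set;

// Sketch of the helper discussed above: every precondition for using a raw
// (no-dictionary) index on a consuming-segment column is checked in one place,
// so future per-type restrictions have a single home.
public class NoDictionaryCheckSketch {
  public static boolean isNoDictionaryColumn(Set<String> noDictionaryColumns, Set<String> invertedIndexColumns,
      boolean isSingleValueField, String column) {
    // Raw index requires: the column opted out of dictionary encoding, no
    // inverted index (inverted indexes are built on dictionary ids), and a
    // single-value field.
    return noDictionaryColumns.contains(column) && !invertedIndexColumns.contains(column) && isSingleValueField;
  }

  public static void main(String[] args) {
    Set<String> noDict = Set.of("stringcol");
    Set<String> inverted = Set.of("intcol");
    System.out.println(isNoDictionaryColumn(noDict, inverted, true, "stringcol")); // true
    System.out.println(isNoDictionaryColumn(noDict, inverted, true, "intcol"));    // false
  }
}
```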
pinot-core/src/main/java/org/apache/pinot/core/indexsegment/mutable/MutableSegmentImpl.java
// for STRING/BYTES SV column, we support raw index in consuming segments
// RealtimeSegmentStatsHistory does not have the stats for no-dictionary columns
// from previous consuming segments
// TODO: come up with better estimated values
Cardinality should not be a factor here, since it is a raw index and the actual values are stored. You only need some estimate for the average string length. We can get that from StatsHistory (as long as we update it correctly, of course). The call to construct VarByteSingleColumnSVRW should take _capacity as the number of strings to add, and the averageLen that we can get from stats history.
Cardinality was the wrong word to use in the variable names. I meant number of values (rows).
I have added a TODO to capture the estimated average column length in realtime segment stats history for no-dictionary columns as well. Currently we only do this for dictionary-encoded columns. For now, a constant value of 100 is being used. I will follow up with a PR to add support for this.
Secondly, using _capacity (which is essentially the maxSegmentRows indicated in RealtimeSegmentConfig) directly for allocating the memory for VarByteSVRW might not be good. Note that VarByteSVRW internally uses MutableOffHeapByteArrayStore, which stores data in a list of buffers. Passing _capacity means the byte store will try to allocate a single giant buffer to store the strings/bytes for all rows. This allocation might fail if we are using DIRECT off-heap memory mode, _capacity is very high, and memory is fragmented. So I take min(_capacity, 100_000) as the initial capacity for VarByteSVRW, to begin with a smaller capacity as opposed to _capacity.
Looking at the code, it seems like we already have a TODO to start with smaller capacity for MV columns.
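The capacity clamp described above can be sketched as follows; the constants (a 100_000 cap on the initial capacity and an average value length of 100) are the values quoted in this thread, not values read from Pinot's code:

```java
// Sketch of the capacity clamp: instead of sizing the var-byte writer's first
// buffer for all _capacity rows, start small and let the underlying byte store
// grow buffer by buffer as the consuming segment fills up.
public class CapacityClampSketch {
  static final int MAX_INITIAL_CAPACITY = 100_000;
  static final int ESTIMATED_AVG_VALUE_LENGTH = 100; // placeholder until stats history supplies it

  static int initialCapacity(int segmentCapacity) {
    // A single buffer of segmentCapacity * avgLen bytes may fail to allocate in
    // DIRECT off-heap mode when memory is fragmented, so cap the first buffer.
    return Math.min(segmentCapacity, MAX_INITIAL_CAPACITY);
  }

  public static void main(String[] args) {
    // First-buffer size for a 5M-row segment: ~10 MB instead of ~500 MB.
    long firstBufferBytes = (long) initialCapacity(5_000_000) * ESTIMATED_AVG_VALUE_LENGTH;
    System.out.println(firstBufferBytes); // 10000000
  }
}
```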
public VarByteSingleColumnSingleValueReaderWriter(
    PinotDataBufferMemoryManager memoryManager,
    String allocationContext,
    int estimatedCardinality,
rename to maxNumberOfValues?
Done
@@ -63,12 +68,52 @@ public int getCardinality() {

   @Override
   public int getLengthOfShortestElement() {
-    return lengthOfDataType(); // Only fixed length data types supported.
+    FieldSpec.DataType dataType = _blockValSet.getValueType();
+    if (dataType == FieldSpec.DataType.STRING || dataType == FieldSpec.DataType.BYTES) {
Let us introduce a method isFixedWidth() in FieldSpec.DataType? It will be easier when/if we add new data types.
Done
if (dataType == FieldSpec.DataType.STRING || dataType == FieldSpec.DataType.BYTES) {
  // variable width no dictionary columns
  int minLength = Integer.MAX_VALUE;
  BaseSingleColumnSingleValueReaderWriter readerWriter = (BaseSingleColumnSingleValueReaderWriter) _forwardIndex;
Can we keep track of the shortest and longest element in the fwd index and just read them here? That will save time as well as garbage generation during segment build.
We can definitely compute min and max in one go, rather than walking over the fwd index for each of them separately.
Done
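A minimal sketch of the write-time tracking suggested above: update the shortest/longest lengths on every setBytes() so segment build reads two ints instead of re-scanning the forward index. Class and field names here are illustrative, not Pinot's actual classes:

```java
// Illustrative writer that maintains min/max element lengths as values arrive,
// so getLengthOfShortestElement()/getLengthOfLongestElement() are O(1) reads.
public class LengthTrackingWriterSketch {
  private int _lengthOfShortestElement = Integer.MAX_VALUE;
  private int _lengthOfLongestElement = Integer.MIN_VALUE;

  public void setBytes(int row, byte[] value) {
    // ... the real writer appends value to its off-heap byte store here ...
    _lengthOfShortestElement = Math.min(_lengthOfShortestElement, value.length);
    _lengthOfLongestElement = Math.max(_lengthOfLongestElement, value.length);
  }

  public int getLengthOfShortestElement() {
    return _lengthOfShortestElement;
  }

  public int getLengthOfLongestElement() {
    return _lengthOfLongestElement;
  }

  public static void main(String[] args) {
    LengthTrackingWriterSketch writer = new LengthTrackingWriterSketch();
    writer.setBytes(0, new byte[]{1, 2});
    writer.setBytes(1, new byte[]{1, 2, 3, 4, 5});
    System.out.println(writer.getLengthOfShortestElement()); // 2
    System.out.println(writer.getLengthOfLongestElement());  // 5
  }
}
```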
Force-pushed from e0275d8 to e64b46f
@mcvsubbu, I have addressed the review comments. Also rebased on latest master.
Fixed test failures. Build is passing with latest changes.
/**
 * Returns the forward index for the data source
 */
public abstract DataFileReader getForwardIndex();
Instead of modifying the DataSource interface and returning null in many places, let us modify the ColumnDataSource class to get the forwardIndex. We can then use it and type-cast it to our heart's content in the RealtimeNoDictionaryColStatistics class. Since this class is very specific, the type-casting should work fine without danger of exception.
Done
int forwardIndexColumnSize = -1;
if (isNoDictionarySupportedForColumn(noDictionaryColumns, invertedIndexColumns, fieldSpec, column)) {
  // no dictionary
  // each forward index entry will be equal to size of data
- // each forward index entry will be equal to size of data
+ // each forward index entry will be equal to size of data for that row
Done
if (isNoDictionarySupportedForColumn(noDictionaryColumns, invertedIndexColumns, fieldSpec, column)) {
  // no dictionary
  // each forward index entry will be equal to size of data
  // for INT, LONG, FLOAT, DOUBLE it is equal to the number of fixed bytes used to store the value,
- // for INT, LONG, FLOAT, DOUBLE it is equal to the number of fixed bytes used to store the value,
+ // For INT, LONG, FLOAT, DOUBLE it is equal to the size of the (serialized) raw value,
Done
 * @param column column name
 * @return true if column is no-dictionary, false if dictionary encoded
 */
private boolean isNoDictionarySupportedForColumn(Set<String> noDictionaryColumns, Set<String> invertedIndexColumns,
- private boolean isNoDictionarySupportedForColumn(Set<String> noDictionaryColumns, Set<String> invertedIndexColumns,
+ private boolean isNoDictionaryColumn(Set<String> noDictionaryColumns, Set<String> invertedIndexColumns,
Done
    allocationContext);

// create forward index reader/writer
if (forwardIndexColumnSize != -1) {
It may be easier to read if we invert the if condition and exchange the if and else bodies.
Done
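The inverted-condition suggestion might read like this sketch, branching on the variable-width case first rather than testing forwardIndexColumnSize != -1 at the top. The writer classes are stand-ins for Pinot's reader/writer implementations:

```java
// Illustrative writer selection: the sentinel value -1 marks a variable-width
// (STRING/BYTES) no-dictionary column; any positive size gets a fixed-byte writer.
public class WriterSelectionSketch {
  interface ForwardIndexWriter { }
  static class FixedByteWriter implements ForwardIndexWriter { }
  static class VarByteWriter implements ForwardIndexWriter { }

  static ForwardIndexWriter createWriter(int forwardIndexColumnSize) {
    if (forwardIndexColumnSize == -1) {
      // Variable-width (STRING/BYTES) no-dictionary column.
      return new VarByteWriter();
    } else {
      // Fixed-width raw value, or dictionary id of known size.
      return new FixedByteWriter();
    }
  }

  public static void main(String[] args) {
    System.out.println(createWriter(-1).getClass().getSimpleName());            // VarByteWriter
    System.out.println(createWriter(Integer.BYTES).getClass().getSimpleName()); // FixedByteWriter
  }
}
```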
@@ -76,6 +77,9 @@
   private static final int MIN_ROWS_TO_INDEX = 1000_000; // Min size of recordIdMap for updatable metrics.
   private static final int MIN_RECORD_ID_MAP_CACHE_SIZE = 10000; // Min overflow map size for updatable metrics.

   private static final int NODICT_VARIABLE_WIDTH_ESTIMATED_AVERAGE_VALUE_LENGTH = 100;
Maybe suffix these definitions with DEFAULT at the end (since we will eventually get these from the stats) -- unless you are going to file the stats PR real soon. In that case, just let it be; this definition will probably move into the stats object.
Done. I will put up a PR soon, though.
}
try {
  int[] intValues = new int[NUM_ROWS];
  dataFetcher.fetchIntValues(INT_COL_NAME, docIds, numDocIds, intValues);
Why are we testing this again? This functionality is already tested in testIntValues(), right? You only need to add testStringValue() and testBytesValue(). Since these data types cannot be fetched as any other, we only need to iterate the fetcher through the corresponding value types.
Yes, those were redundant. I included tests for STRING and BYTES in the same unit test.
...a/org/apache/pinot/core/io/readerwriter/impl/VarByteSingleColumnSingleValueReaderWriter.java
Force-pushed from e626b64 to 981e94a
@mcvsubbu, I have addressed the review comments. Please take another look.
Codecov Report
@@ Coverage Diff @@
## master #4791 +/- ##
=============================================
+ Coverage 36.06% 57.34% +21.27%
- Complexity 0 12 +12
=============================================
Files 1174 1174
Lines 62349 62244 -105
Branches 9175 9148 -27
=============================================
+ Hits 22487 35694 +13207
+ Misses 37851 23879 -13972
- Partials 2011 2671 +660
Continue to review full report at Codecov.
lgtm other than a minor comment, thanks
@Override
public void setString(int row, String val) {
  byte[] serializedValue = StringUtil.encodeUtf8(val);
  _byteArrayStore.add(serializedValue);
can we just invoke setBytes() here?
Done. Thanks
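A minimal sketch of the agreed change: setString() encodes to UTF-8 and delegates to setBytes(), leaving a single write path into the byte store. The byte store is mocked with a plain field here; the real writer appends to MutableOffHeapByteArrayStore:

```java
import java.nio.charset.StandardCharsets;

// Illustrative delegation: one conversion at the String boundary, then
// everything below deals only in bytes.
public class SetStringDelegationSketch {
  private byte[] _lastWritten; // stand-in for the off-heap byte store

  public void setBytes(int row, byte[] value) {
    _lastWritten = value; // the real writer appends to an off-heap byte store
  }

  public void setString(int row, String value) {
    setBytes(row, value.getBytes(StandardCharsets.UTF_8));
  }

  public byte[] lastWritten() {
    return _lastWritten;
  }

  public static void main(String[] args) {
    SetStringDelegationSketch writer = new SetStringDelegationSketch();
    writer.setString(0, "pinot");
    System.out.println(writer.lastWritten().length); // 5
  }
}
```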
Force-pushed from 981e94a to 33f780f
Addressed the latest comment. Thanks for the review @mcvsubbu
(1) PR #5256 added support for deriving num docs per chunk for var byte raw index creation from column length. This was done specifically as part of supporting text blobs. Use cases that don't want this feature and are high QPS see a negative impact, since the chunk size increases (the earlier value of numDocsPerChunk was hardcoded to 1000), and based on the access pattern we might end up uncompressing a bigger chunk to get values for a set of docIds. We have made this change configurable, so the default behaviour is the same as before (1000 docs per chunk).

(2) PR #4791 added support for noDict on STRING/BYTES in consuming segments. This change particularly impacts use cases that set noDict on their STRING dimension columns for other performance reasons and also want metrics aggregation. These use cases no longer get aggregateMetrics, because the new implementation honors their table config setting of noDict on STRING/BYTES. Without metrics aggregation, memory pressure increases. So, to continue aggregating metrics for such cases, we will create a dictionary even if the column is part of the noDictionary set in the table config.

Co-authored-by: Siddharth Teotia <steotia@steotia-mn1.linkedin.biz>
Added support for creation of raw index for var length columns in realtime consuming segments.
Issue #4034
cc @mcvsubbu
This is also needed for the text search feature.