Fix double memory allocation during operator setup #5619

Merged: 1 commit into apache:master on Jun 25, 2020

Conversation

@siddharthteotia (Contributor) commented Jun 25, 2020

This is a fix for the issue reported in #5610.

The call stack for the OOM is:

```
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
	at java.nio.Bits.reserveMemory(Bits.java:694) ~[?:1.8.0_252]
	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[?:1.8.0_252]
	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[?:1.8.0_252]
	at org.apache.pinot.core.io.reader.impl.ChunkReaderContext.<init>(ChunkReaderContext.java:38)
	at org.apache.pinot.core.io.reader.impl.v1.VarByteChunkSingleValueReader.createContext(VarByteChunkSingleValueReader.java:93)
	at org.apache.pinot.core.operator.docvalsets.SingleValueSet.<init>(SingleValueSet.java:35) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.operator.blocks.SingleValueBlock.<init>(SingleValueBlock.java:41) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.segment.index.datasource.BaseDataSource.getNextBlock(BaseDataSource.java:105) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:49) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.common.DataFetcher.<init>(DataFetcher.java:65) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.operator.ProjectionOperator.<init>(ProjectionOperator.java:46) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.plan.ProjectionPlanNode.run(ProjectionPlanNode.java:51) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.plan.TransformPlanNode.run(TransformPlanNode.java:103) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.plan.SelectionPlanNode.run(SelectionPlanNode.java:55) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.plan.CombinePlanNode$1.callJob(CombinePlanNode.java:122) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
```

This implies that the error happens during the execution setup phase, when the operators are being set up and have not yet begun execution by passing data between them.

As per @fx19880617, the call stack is consistently reproducible as soon as the client upgrades to 0.4.0 in production and runs a SELECT * query on segments created in 0.3.0. At the top of the stack is ChunkReaderContext, which allocates a chunk and fails with a direct memory OOM. @fx19880617 verified that the chunk size, numDocsPerChunk, etc. are the same, so it is not the case that 0.4.0 suddenly allocates more memory in the ChunkReaderContext and fails for that reason.

So either there is a leak, or something else is holding on to direct memory.

As part of PR #5510, one change made to SingleValueSet (also on the call stack) was that the reader context is now created in the constructor, as opposed to during every read call. There are two implications of this:

  • Earlier, since chunk reader context objects were created on a per-call basis, they were short-lived and probably never got promoted to the old generation. They were garbage collected quickly, which in turn garbage collected the direct byte buffer reference inside them and freed the direct memory -- this is essentially how direct memory is freed in the JVM unless the cleaner is invoked explicitly (see the sketch after this list). Now that the context is allocated in the constructor, it has become a long-lived object and is unlikely to be GCed as quickly as in the previous code. Thus, there will be memory pressure.

  • The second implication, explained below, is worse and is probably the actual root cause of the OOM reported by the customer.
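
For context, here is a minimal Java 8 sketch of the direct-memory lifecycle described in the first bullet. The explicit cleaner invocation uses a JDK-internal API (`sun.nio.ch.DirectBuffer`) and is shown purely for illustration, not as Pinot code:

```java
import java.nio.ByteBuffer;

public final class DirectMemorySketch {

  public static void main(String[] args) {
    // Counts against the -XX:MaxDirectMemorySize quota as soon as it is allocated.
    ByteBuffer chunk = ByteBuffer.allocateDirect(16 << 20);

    // The native memory stays reserved until either (a) the buffer object is
    // garbage collected, which runs its internal Cleaner, or (b) the cleaner
    // is invoked explicitly via the internal Java 8 API below. A long-lived
    // buffer therefore holds its reservation for its entire lifetime, and
    // subsequent allocateDirect() calls can fail with
    // "OutOfMemoryError: Direct buffer memory" even though the heap is fine.
    ((sun.nio.ch.DirectBuffer) chunk).cleaner().clean();
  }
}
```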

See this code for the creation of ProjectionOperator (note that it is part of the call stack):

```java
public ProjectionOperator(Map<String, DataSource> dataSourceMap, BaseOperator<DocIdSetBlock> docIdSetOperator) {
    _dataSourceMap = dataSourceMap;
    _dataBlockMap = new HashMap<>(dataSourceMap.size());
    for (Map.Entry<String, DataSource> entry : dataSourceMap.entrySet()) {
      _dataBlockMap.put(entry.getKey(), entry.getValue().nextBlock());
    }
    _docIdSetOperator = docIdSetOperator;
    _dataBlockCache = new DataBlockCache(new DataFetcher(dataSourceMap));
  }
```

`_dataBlockMap.put(entry.getKey(), entry.getValue().nextBlock());` creates a block by going down the path nextBlock() -> SingleValueBlock -> SingleValueSet -> constructor -> create reader context (direct memory is allocated here with the new PR).

We then create the DataFetcher with dataSourceMap, and this code again creates a block:

`dataSourceMap.get(column).nextBlock().getBlockValueSet()` goes down the same path again, eventually creating SingleValueBlock -> SingleValueSet -> reader context -> allocating direct memory.

So the memory is allocated twice per column, and somewhere in the middle of doing this in the DataFetcher for a given column, we fail with OOM. A condensed sketch of the two allocation paths is shown below.
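
A hypothetical, condensed sketch of the two allocation paths; the class names mirror the Pinot classes on the call stack, but the bodies are simplified stand-ins for illustration:

```java
import java.nio.ByteBuffer;
import java.util.Map;

class ReaderContext {
  // Stand-in for ChunkReaderContext: each instance reserves direct memory.
  final ByteBuffer chunk = ByteBuffer.allocateDirect(1 << 20);
}

class SingleValueSet {
  // After PR #5510, the context is allocated in the constructor
  // instead of on every read call.
  final ReaderContext context = new ReaderContext();
}

class DataSource {
  // Every nextBlock() call allocates a fresh context (via SingleValueBlock
  // -> SingleValueSet in the real code).
  SingleValueSet nextBlock() {
    return new SingleValueSet();
  }
}

class ProjectionOperator {
  ProjectionOperator(Map<String, DataSource> dataSourceMap) {
    // Path 1: populating _dataBlockMap -- one direct allocation per column.
    for (DataSource dataSource : dataSourceMap.values()) {
      dataSource.nextBlock();
    }
    // Path 2: the DataFetcher constructor calls nextBlock() again for every
    // column -- a second direct allocation per column, doubling the direct
    // memory footprint during operator setup.
    for (DataSource dataSource : dataSourceMap.values()) {
      dataSource.nextBlock();
    }
  }
}
```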

@kishoreg (Member) commented:

Amazing work!

We definitely need a backward compatibility test for segments.

@kishoreg (Member) commented:

Can you check in a small segment file from 0.3.0 and add a test case on top of that?

@xiangfu0 linked an issue Jun 25, 2020 that may be closed by this pull request.
@siddharthteotia (Contributor, Author) commented:

> Amazing work!
>
> We definitely need a backward compatibility test for segments.

@kishoreg, there is no backward incompatibility here. PR #5510 actually results in double memory allocation, as explained in the description above.

W.r.t. tests for backward compatibility of the raw index reader/writer, we already have them: there are files checked in from prior versions to ensure that current readers can read old versions, for both compressed and uncompressed data. I recently added more tests too; see VarByteChunkSingleValueWriterTest.java.

@xiangfu0 (Contributor) left a comment:

lgtm

@kishoreg (Member) commented:

> > Amazing work!
> > We definitely need a backward compatibility test for segments.
>
> @kishoreg, there is no backward incompatibility here. PR #5510 actually results in double memory allocation, as explained in the description above.
>
> W.r.t. tests for backward compatibility of the raw index reader/writer, we already have them: there are files checked in from prior versions to ensure that current readers can read old versions, for both compressed and uncompressed data. I recently added more tests too; see VarByteChunkSingleValueWriterTest.java.

Got it. I thought there was something wrong with the metadata in the segment header.

@Jackie-Jiang (Contributor) commented:

Good finding!
In order to fix this, I would recommend replacing _dataBlockMap with `Map<String, DataSourceMetadata>` or `Map<String, FieldSpec>`. Setting blocks in the operator constructor is not correct.

Besides, the real root cause of this issue is allocating direct memory within the reader context without freeing it explicitly. There is no guarantee when the direct memory will be released. We need to figure out a way to free the memory explicitly (a possible shape of such a fix is sketched below).
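
As a sketch of what explicit freeing could look like, in the spirit of the close-able ForwardIndexReaderContext later introduced by the follow-up PR referenced at the end of this thread. This is illustrative only, not the actual Pinot implementation, and the eager-free idiom is the Java 8 internal cleaner API:

```java
import java.nio.ByteBuffer;

// Illustrative close-able reader context: the off-heap buffer is released
// deterministically when the context is closed, instead of waiting for GC.
class CloseableChunkReaderContext implements AutoCloseable {
  private final ByteBuffer _chunkBuffer;

  CloseableChunkReaderContext(int chunkSize) {
    _chunkBuffer = ByteBuffer.allocateDirect(chunkSize);
  }

  ByteBuffer getChunkBuffer() {
    return _chunkBuffer;
  }

  @Override
  public void close() {
    // Java 8 internal API: free the native memory eagerly.
    ((sun.nio.ch.DirectBuffer) _chunkBuffer).cleaner().clean();
  }
}

// Usage: scope the context to the read so the buffer is freed when done.
// try (CloseableChunkReaderContext context = new CloseableChunkReaderContext(chunkSize)) {
//   ... read chunks through the context ...
// }
```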

@Jackie-Jiang (Contributor) left a comment:

LGTM

```diff
@@ -62,7 +62,7 @@ public DataFetcher(Map<String, DataSource> dataSourceMap) {
       DataSource dataSource = entry.getValue();
       _dictionaryMap.put(column, dataSource.getDictionary());
       DataSourceMetadata dataSourceMetadata = dataSource.getDataSourceMetadata();
-      BlockValSet blockValueSet = dataSource.nextBlock().getBlockValueSet();
+      BlockValSet blockValueSet = dataSourceMap.get(column).nextBlock().getBlockValueSet();
```

Contributor commented:

Revert this line

Author (Contributor) replied:

done

```diff
-  private final BlockValSet _blockValSet;
   private final SingleColumnMultiValueReader _reader;
   private final FieldSpec.DataType _dataType;
+  private BlockValSet _blockValSet;
```

Contributor commented:

(nit) put this line after line 37

Author (Contributor) replied:

done

```diff
@@ -32,13 +32,14 @@
 @SuppressWarnings("rawtypes")
 public final class SingleValueBlock implements Block {
   private final SingleColumnSingleValueReader _reader;
-  private final BlockValSet _blockValSet;
   private final FieldSpec.DataType _dataType;
+  private BlockValSet _blockValSet;
```

Contributor commented:

(nit) Put this line after line 37

Author (Contributor) replied:

done

```diff
@@ -47,6 +50,9 @@ public BlockDocIdSet getBlockDocIdSet() {

   @Override
   public BlockValSet getBlockValueSet() {
```

@siddharthteotia (Contributor, Author) commented:

@Jackie-Jiang, I am concerned about the invocation of this in a multi-threaded scenario. From the code, it doesn't look like that will happen, since each query will create its own. Thoughts?

@Jackie-Jiang (Contributor) replied:

@siddharthteotia There is no multi-threading for the blocks. All the blocks are processed by one thread.

@siddharthteotia siddharthteotia merged commit 32feeb3 into apache:master Jun 25, 2020
siddharthteotia added a commit that referenced this pull request Jun 30, 2020
Co-authored-by: Siddharth Teotia <steotia@steotia-mn1.linkedin.biz>
Jackie-Jiang added a commit to Jackie-Jiang/pinot that referenced this pull request Jul 6, 2020
Motivation:
Currently DataSource is modeled as an Operator, where values are returned
as a block complying with the Operator interface. This is confusing because
of the following reasons:
- The block contains all the documents in the segment, instead of a block
  of at most 10000 documents as in the Projection layer.
- The values are always fetched with their document ids, instead of fetched
  as a block. Currently BlockValSet interface has 2 sets of APIs because of
  this, which is confusing.
- The BlockValSet returned by the DataSource is not really a value set, but
  a value reader on top of the forward index.
- Extra BlockMetadata has to be maintained which can cause unexpected
  problems (e.g. the issue fixed in apache#5619)

Changes:
- Make DataSource standalone without implementing Operator
- Add interfaces for forward index (ForwardIndexReader, ForwardIndexWriter,
  ForwardIndexReaderWriter)
- Add ColumnValueReader class to help read forward index from DataSource
- Remove the docId based APIs from BlockValSet
Jackie-Jiang added a commit that referenced this pull request Jul 6, 2020
…5625)

Changes:
- Make `DataSource` independent of the query execution (not extend `Operator`).
- Add interfaces for forward index.
- Make `ForwardIndexReaderContext` close-able so that resources (e.g. off-heap buffer) can be released.
- In `ForwardIndexReader`, always read with `ForwardIndexReaderContext` and close it when the reading is done to prevent resource leaks.
- Remove `SingleValueSet` and `MultiValueSet` and change caller logic to directly use forward index and prevent resource leak.
- Make `DataFetcher` directly use forward index and handle the type conversion within the class. Similar to Dictionary convention, type conversion among INT, LONG, FLOAT, DOUBLE, STRING, and between STRING and BYTES are supported.

Motivation:
Currently `DataSource` is modeled as an `Operator`, where values are returned as a block complying with the Operator interface.
This is confusing because of the following reasons:
- The block contains all the documents in the segment, instead of a block of at most 10000 documents as in the `Projection` layer.
- The values are always fetched with their document ids, instead of fetched as a block. Currently `BlockValSet` interface has 2 sets of APIs because of this, which is confusing.
- The `BlockValSet` returned by the `DataSource` is not really a value set, but a value reader on top of the forward index.
- Extra `BlockMetadata` has to be maintained which can cause unexpected problems (e.g. the issue fixed in #5619)

TODO:
There are 2 remaining places where `ForwardIndexReaderContext` needs to be closed:
- `ScanBasedDocIdIterator`
- `DataFetcher`

New inheritance hierarchy for the forward index classes:
- `ForwardIndexReader`
  - Dictionary-encoded unsorted SV: `FixedBitSVForwardIndexReader`
  - Dictionary-encoded sorted SV: `SortedIndexReaderImpl`
  - Dictionary-encoded MV: `FixedBitMVForwardIndexReader`
  - Raw fixed-byte SV: `FixedByteChunkSVForwardIndexReader`
  - Raw var-byte SV: `VarByteChunkSVForwardIndexReader`
- `MutableForwardIndex` extends `ForwardIndexReader`
  - Fixed-byte SV (for both dictionary-encoded and raw): `FixedByteSVMutableForwardIndex`
  - Var-byte SV: `VarByteSVMutableForwardIndex`
  - Fixed-byte MV (for both dictionary-encoded and raw): `FixedByteMVMutableForwardIndex`
- `ForwardIndexCreator`
  - Dictionary-encoded unsorted SV: `SingleValueUnsortedForwardIndexCreator`
  - Dictionary-encoded sorted SV: `SingleValueSortedForwardIndexCreator`
  - Dictionary-encoded MV: `MultiValueUnsortedForwardIndexCreator`
  - Raw fixed-byte SV: `SingleValueFixedByteRawIndexCreator`
  - Raw var-byte SV: `SingleValueVarByteRawIndexCreator`

Successfully merging this pull request may close these issues.

Direct Memory OOM issue during upgrade to 0.4.0