Fix double memory allocation during operator setup #5619

Merged: 1 commit into apache:master on Jun 25, 2020

Conversation

@siddharthteotia (Contributor) commented Jun 25, 2020

This is a fix for the issue reported in #5610.

The call stack for the OOM is:

```
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
	at java.nio.Bits.reserveMemory(Bits.java:694) ~[?:1.8.0_252]
	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[?:1.8.0_252]
	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[?:1.8.0_252]
	at org.apache.pinot.core.io.reader.impl.ChunkReaderContext.<init>(ChunkReaderContext.java:38)
	at org.apache.pinot.core.io.reader.impl.v1.VarByteChunkSingleValueReader.createContext(VarByteChunkSingleValueReader.java:93)
	at org.apache.pinot.core.operator.docvalsets.SingleValueSet.<init>(SingleValueSet.java:35) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.operator.blocks.SingleValueBlock.<init>(SingleValueBlock.java:41) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.segment.index.datasource.BaseDataSource.getNextBlock(BaseDataSource.java:105) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:49) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.common.DataFetcher.<init>(DataFetcher.java:65) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.operator.ProjectionOperator.<init>(ProjectionOperator.java:46) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.plan.ProjectionPlanNode.run(ProjectionPlanNode.java:51) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.plan.TransformPlanNode.run(TransformPlanNode.java:103) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.plan.SelectionPlanNode.run(SelectionPlanNode.java:55) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
	at org.apache.pinot.core.plan.CombinePlanNode$1.callJob(CombinePlanNode.java:122) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-01cdc55f514b10bee2d8108d9736d4b57c48b517]
```

This implies that the error happens during the execution setup phase, when the operators are being set up and have not yet begun execution by passing data between them.

As per @fx19880617, the call stack is consistently reproducible as soon as the client upgrades to 0.4.0 in production and runs a SELECT * query on segments created in 0.3.0. At the top of the stack is ChunkReaderContext, which allocates a chunk and fails with a direct memory OOM. @fx19880617 verified that the chunk size, numDocsPerChunk, etc. are the same, so it is not the case that 0.4.0 suddenly allocates more memory in the ChunkReaderContext and fails for that reason.

So either there is a leak, or something else is holding on to direct memory.

As part of PR #5510, one change made to SingleValueSet (also on the call stack) was that the reader context is now created in the constructor, as opposed to during every read call. There are two implications of this:

  • Earlier, since chunk reader context objects were created on a per-call basis, they were short-lived and probably never got promoted to the old generation. They were garbage collected quickly, which in turn garbage collected the direct byte buffer reference inside them and freed the direct memory -- this is essentially how direct memory is freed in the JVM unless the cleaner is invoked explicitly (see the sketch after this list). Now that the context is allocated in the constructor, it has become a long-lived object and is unlikely to be GCed as quickly as in the previous code. Thus, there will be memory pressure.

  • The second implication, explained below, is worse and is probably the actual root cause of the OOM reported by the customer.
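
For context, here is a minimal Java 8 sketch of the direct-memory lifecycle described in the first bullet. The explicit cleaner invocation uses a JDK-internal API (`sun.nio.ch.DirectBuffer`) and is shown purely for illustration, not as Pinot code:

```java
import java.nio.ByteBuffer;

public final class DirectMemorySketch {

  public static void main(String[] args) {
    // Counts against the -XX:MaxDirectMemorySize quota as soon as it is allocated.
    ByteBuffer chunk = ByteBuffer.allocateDirect(16 << 20);

    // The native memory stays reserved until either (a) the buffer object is
    // garbage collected, which runs its internal Cleaner, or (b) the cleaner
    // is invoked explicitly via the internal Java 8 API below. A long-lived
    // buffer therefore holds its reservation for its entire lifetime, and
    // subsequent allocateDirect() calls can fail with
    // "OutOfMemoryError: Direct buffer memory" even though the heap is fine.
    ((sun.nio.ch.DirectBuffer) chunk).cleaner().clean();
  }
}
```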

See this code for the creation of ProjectionOperator (note that it is part of the call stack):

```java
public ProjectionOperator(Map<String, DataSource> dataSourceMap, BaseOperator<DocIdSetBlock> docIdSetOperator) {
    _dataSourceMap = dataSourceMap;
    _dataBlockMap = new HashMap<>(dataSourceMap.size());
    for (Map.Entry<String, DataSource> entry : dataSourceMap.entrySet()) {
      _dataBlockMap.put(entry.getKey(), entry.getValue().nextBlock());
    }
    _docIdSetOperator = docIdSetOperator;
    _dataBlockCache = new DataBlockCache(new DataFetcher(dataSourceMap));
  }
```

`_dataBlockMap.put(entry.getKey(), entry.getValue().nextBlock());` creates a block by going down the path nextBlock() -> SingleValueBlock -> SingleValueSet -> constructor -> create reader context (direct memory is allocated here with the new PR).

We then create the DataFetcher with dataSourceMap, and this code again creates a block:

`dataSourceMap.get(column).nextBlock().getBlockValueSet()` goes down the same path again, eventually creating SingleValueBlock -> SingleValueSet -> reader context -> allocating direct memory.

So the memory is allocated twice per column, and somewhere in the middle of doing this in the DataFetcher for a given column, we fail with OOM. A condensed sketch of the two allocation paths is shown below.
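
A hypothetical, condensed sketch of the two allocation paths; the class names mirror the Pinot classes on the call stack, but the bodies are simplified stand-ins for illustration:

```java
import java.nio.ByteBuffer;
import java.util.Map;

class ReaderContext {
  // Stand-in for ChunkReaderContext: each instance reserves direct memory.
  final ByteBuffer chunk = ByteBuffer.allocateDirect(1 << 20);
}

class SingleValueSet {
  // After PR #5510, the context is allocated in the constructor
  // instead of on every read call.
  final ReaderContext context = new ReaderContext();
}

class DataSource {
  // Every nextBlock() call allocates a fresh context (via SingleValueBlock
  // -> SingleValueSet in the real code).
  SingleValueSet nextBlock() {
    return new SingleValueSet();
  }
}

class ProjectionOperator {
  ProjectionOperator(Map<String, DataSource> dataSourceMap) {
    // Path 1: populating _dataBlockMap -- one direct allocation per column.
    for (DataSource dataSource : dataSourceMap.values()) {
      dataSource.nextBlock();
    }
    // Path 2: the DataFetcher constructor calls nextBlock() again for every
    // column -- a second direct allocation per column, doubling the direct
    // memory footprint during operator setup.
    for (DataSource dataSource : dataSourceMap.values()) {
      dataSource.nextBlock();
    }
  }
}
```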

@kishoreg (Member) commented:

Amazing work!

We definitely need a backward compatibility test for segments.

@kishoreg (Member) commented:

Can you check in a small segment file from 0.3.0 and add a test case on top of that?

@xiangfu0 linked an issue Jun 25, 2020 that may be closed by this pull request.
@siddharthteotia (Contributor, Author) commented:

> Amazing work!
>
> We definitely need a backward compatibility test for segments.

@kishoreg, there is no backward incompatibility here. PR #5510 actually results in double memory allocation, as explained in the description above.

W.r.t. tests for backward compatibility of the raw index reader/writer, we already have them: there are files checked in from prior versions to ensure that current readers can read old versions, for both compressed and uncompressed data. I recently added more tests too; see VarByteChunkSingleValueWriterTest.java.

@xiangfu0 (Contributor) left a comment:

lgtm

@kishoreg (Member) commented:

> > Amazing work!
> > We definitely need a backward compatibility test for segments.
>
> @kishoreg, there is no backward incompatibility here. PR #5510 actually results in double memory allocation, as explained in the description above.
>
> W.r.t. tests for backward compatibility of the raw index reader/writer, we already have them: there are files checked in from prior versions to ensure that current readers can read old versions, for both compressed and uncompressed data. I recently added more tests too; see VarByteChunkSingleValueWriterTest.java.

Got it. I thought there was something wrong with the metadata in the segment header.

@Jackie-Jiang (Contributor) commented:

Good finding!
In order to fix this, I would recommend replacing _dataBlockMap with `Map<String, DataSourceMetadata>` or `Map<String, FieldSpec>`. Setting blocks in the operator constructor is not correct.

Besides, the real root cause of this issue is allocating direct memory within the reader context without freeing it explicitly. There is no guarantee when the direct memory will be released. We need to figure out a way to free the memory explicitly (a possible shape of such a fix is sketched below).
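
As a sketch of what explicit freeing could look like, in the spirit of the close-able ForwardIndexReaderContext later introduced by the follow-up PR referenced at the end of this thread. This is illustrative only, not the actual Pinot implementation, and the eager-free idiom is the Java 8 internal cleaner API:

```java
import java.nio.ByteBuffer;

// Illustrative close-able reader context: the off-heap buffer is released
// deterministically when the context is closed, instead of waiting for GC.
class CloseableChunkReaderContext implements AutoCloseable {
  private final ByteBuffer _chunkBuffer;

  CloseableChunkReaderContext(int chunkSize) {
    _chunkBuffer = ByteBuffer.allocateDirect(chunkSize);
  }

  ByteBuffer getChunkBuffer() {
    return _chunkBuffer;
  }

  @Override
  public void close() {
    // Java 8 internal API: free the native memory eagerly.
    ((sun.nio.ch.DirectBuffer) _chunkBuffer).cleaner().clean();
  }
}

// Usage: scope the context to the read so the buffer is freed when done.
// try (CloseableChunkReaderContext context = new CloseableChunkReaderContext(chunkSize)) {
//   ... read chunks through the context ...
// }
```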

@Jackie-Jiang (Contributor) left a comment:

LGTM

```diff
@@ -62,7 +62,7 @@ public DataFetcher(Map<String, DataSource> dataSourceMap) {
       DataSource dataSource = entry.getValue();
       _dictionaryMap.put(column, dataSource.getDictionary());
       DataSourceMetadata dataSourceMetadata = dataSource.getDataSourceMetadata();
-      BlockValSet blockValueSet = dataSource.nextBlock().getBlockValueSet();
+      BlockValSet blockValueSet = dataSourceMap.get(column).nextBlock().getBlockValueSet();
```

Contributor commented:

Revert this line

Author (Contributor) replied:

done

```diff
-  private final BlockValSet _blockValSet;
   private final SingleColumnMultiValueReader _reader;
   private final FieldSpec.DataType _dataType;
+  private BlockValSet _blockValSet;
```

Contributor commented:

(nit) put this line after line 37

Author (Contributor) replied:

done

```diff
@@ -32,13 +32,14 @@
 @SuppressWarnings("rawtypes")
 public final class SingleValueBlock implements Block {
   private final SingleColumnSingleValueReader _reader;
-  private final BlockValSet _blockValSet;
   private final FieldSpec.DataType _dataType;
+  private BlockValSet _blockValSet;
```

Contributor commented:

(nit) Put this line after line 37

Author (Contributor) replied:

done

```diff
@@ -47,6 +50,9 @@ public BlockDocIdSet getBlockDocIdSet() {

   @Override
   public BlockValSet getBlockValueSet() {
```

@siddharthteotia (Contributor, Author) commented:

@Jackie-Jiang, I am concerned about the invocation of this in a multi-threaded scenario. From the code, it doesn't look like that will happen, since each query will create its own. Thoughts?

@Jackie-Jiang (Contributor) replied:

@siddharthteotia There is no multi-threading for the blocks. All the blocks are processed by one thread.

@siddharthteotia siddharthteotia merged commit 32feeb3 into apache:master Jun 25, 2020
siddharthteotia added a commit that referenced this pull request Jun 30, 2020
Co-authored-by: Siddharth Teotia <steotia@steotia-mn1.linkedin.biz>
Jackie-Jiang added a commit to Jackie-Jiang/pinot that referenced this pull request Jul 6, 2020
Motivation:
Currently DataSource is modeled as an Operator, where values are returned
as a block complying with the Operator interface. This is confusing because
of the following reasons:
- The block contains all the documents in the segment, instead of a block
  of at most 10000 documents as in the Projection layer.
- The values are always fetched with their document ids, instead of fetched
  as a block. Currently BlockValSet interface has 2 sets of APIs because of
  this, which is confusing.
- The BlockValSet returned by the DataSource is not really a value set, but
  a value reader on top of the forward index.
- Extra BlockMetadata has to be maintained which can cause unexpected
  problems (e.g. the issue fixed in apache#5619)

Changes:
- Make DataSource standalone without implementing Operator
- Add interfaces for forward index (ForwardIndexReader, ForwardIndexWriter,
  ForwardIndexReaderWriter)
- Add ColumnValueReader class to help read forward index from DataSource
- Remove the docId based APIs from BlockValSet
Jackie-Jiang added a commit that referenced this pull request Jul 6, 2020
…5625)

Changes:
- Make `DataSource` independent of the query execution (not extend `Operator`).
- Add interfaces for forward index.
- Make `ForwardIndexReaderContext` close-able so that resources (e.g. off-heap buffer) can be released.
- In `ForwardIndexReader`, always read with `ForwardIndexReaderContext` and close it when the reading is done to prevent resource leaks.
- Remove `SingleValueSet` and `MultiValueSet` and change caller logic to directly use forward index and prevent resource leak.
- Make `DataFetcher` directly use forward index and handle the type conversion within the class. Similar to Dictionary convention, type conversion among INT, LONG, FLOAT, DOUBLE, STRING, and between STRING and BYTES are supported.

Motivation:
Currently `DataSource` is modeled as an `Operator`, where values are returned as a block complying with the Operator interface.
This is confusing because of the following reasons:
- The block contains all the documents in the segment, instead of a block of at most 10000 documents as in the `Projection` layer.
- The values are always fetched with their document ids, instead of fetched as a block. Currently `BlockValSet` interface has 2 sets of APIs because of this, which is confusing.
- The `BlockValSet` returned by the `DataSource` is not really a value set, but a value reader on top of the forward index.
- Extra `BlockMetadata` has to be maintained which can cause unexpected problems (e.g. the issue fixed in #5619)

TODO:
There are 2 remaining places where `ForwardIndexReaderContext` needs to be closed:
- `ScanBasedDocIdIterator`
- `DataFetcher`

New inheritance hierarchy for the forward index classes:
- `ForwardIndexReader`
  - Dictionary-encoded unsorted SV: `FixedBitSVForwardIndexReader`
  - Dictionary-encoded sorted SV: `SortedIndexReaderImpl`
  - Dictionary-encoded MV: `FixedBitMVForwardIndexReader`
  - Raw fixed-byte SV: `FixedByteChunkSVForwardIndexReader`
  - Raw var-byte SV: `VarByteChunkSVForwardIndexReader`
- `MutableForwardIndex` extends `ForwardIndexReader`
  - Fixed-byte SV (for both dictionary-encoded and raw): `FixedByteSVMutableForwardIndex`
  - Var-byte SV: `VarByteSVMutableForwardIndex`
  - Fixed-byte MV (for both dictionary-encoded and raw): `FixedByteMVMutableForwardIndex`
- `ForwardIndexCreator`
  - Dictionary-encoded unsorted SV: `SingleValueUnsortedForwardIndexCreator`
  - Dictionary-encoded sorted SV: `SingleValueSortedForwardIndexCreator`
  - Dictionary-encoded MV: `MultiValueUnsortedForwardIndexCreator`
  - Raw fixed-byte SV: `SingleValueFixedByteRawIndexCreator`
  - Raw var-byte SV: `SingleValueVarByteRawIndexCreator`

Successfully merging this pull request may close these issues.

Direct Memory OOM issue during upgrade to 0.4.0