DRILL-6644: Don't reserve space for incoming probe batches unnecessarily during the build phase. #1409

Closed

wants to merge 1 commit from ilooner:DRILL-6644

Conversation (4 participants)
@ilooner (Contributor) commented Jul 30, 2018

Problem

Previously the HashJoin memory calculator reserved space for a worst-case-sized incoming probe batch. This is unnecessary: after DRILL-6453 we no longer read probe-side data before the build phase, so the extra reservation just leaves memory underutilized.
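
For concreteness, here is a minimal sketch of the pre-change build-phase reservation; the variable names mirror the calculator's fields, and the byte values are made up for illustration:

```java
// Minimal sketch of the pre-change build-phase reservation (example values).
public class ReservationSketch {
  public static void main(String[] args) {
    long incompletePartitionsBatchSizes = 4_000_000L; // in-memory partition batches
    long maxBuildBatchSize = 2_000_000L;              // incoming build-side batch
    long maxProbeBatchSize = 2_000_000L;              // WORST-CASE incoming probe batch

    // The last term is the one this PR removes: after DRILL-6453 no probe data
    // is read during the build phase, so reserving for a worst-case probe batch
    // only shrinks the memory available to the build side.
    long reservedMemory = incompletePartitionsBatchSizes
        + maxBuildBatchSize
        + maxProbeBatchSize;
    System.out.println("reservedMemory = " + reservedMemory + " bytes");
  }
}
```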

Solution

There are two cases we have to handle:

  • We received probe-side data when fetching the schema. In this case we only have to account for the memory consumed by the probe data we actually received, not for a worst-case incoming probe batch.
  • We received NO probe-side data when fetching the schema for the probe side. In this case we don't have to reserve any memory for probe data during the build phase (see the sketch after this list).
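
A minimal sketch of that decision, assuming a hypothetical hasData() accessor on the predictor; only getBatchSize() appears in the real calculator code quoted below:

```java
// Sketch only: BatchSizePredictor here is a stand-in for the real interface,
// and hasData() is a hypothetical accessor.
public class ProbeReservationSketch {
  interface BatchSizePredictor {
    boolean hasData();   // did we sniff probe data while fetching the schema?
    long getBatchSize(); // memory consumed by the sniffed probe batch
  }

  static long probeReservation(BatchSizePredictor probeSizePredictor) {
    if (probeSizePredictor.hasData()) {
      // Case 1: reserve only what the sniffed probe data actually consumes.
      return probeSizePredictor.getBatchSize();
    }
    // Case 2: no probe data was received; reserve nothing during the build phase.
    return 0;
  }
}
```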

Prerequisite

The PR for DRILL-6453 (#1408) must be merged before this one can go in. When reviewing this PR, look only at the changes in the last commit.

ilooner requested a review from @Ben-Zvi on Jul 30, 2018

@@ -391,7 +391,7 @@ private void calculateMemoryUsage()
   // probe batch we sniffed.
   // TODO when batch sizing project is complete we won't have to sniff probe batches since
   // they will have a well defined size.
-  reservedMemory = incompletePartitionsBatchSizes + maxBuildBatchSize + maxProbeBatchSize;
+  reservedMemory = incompletePartitionsBatchSizes + maxBuildBatchSize + probeSizePredictor.getBatchSize();

@Ben-Zvi (Contributor) commented Jul 31, 2018
While experimenting, I noticed that the size of the incoming probe-side batch is not charged to the HashJoin allocator's available memory (I set the available memory to zero before the next() call and never got an OOM). So is this addition needed?

@ilooner (Author, Contributor) replied Aug 3, 2018
I'm not sure. I think we need to get some clarification for the desired behavior of operators. If the Drill engine is operating under the assumption that ownership of VectorContainers passes to a downstream operator after a call to next, then this is a bug that should be fixed.

I will ask a question on the dev list about this.

@ilooner (Author, Contributor) replied Aug 3, 2018
@sohami explained that it is up to the operator whether to transfer ownership of the incoming vector batch, and described the two primary use cases for transferring or not transferring ownership. Looking at the HashJoin code, we never transfer ownership of the incoming probe batch; we copy the probe data directly into the outgoing batch. This explains why you never got an OOM.

So I think you are right: we should not be reserving space for the incoming probe batch, since it is never owned by HashJoin's allocator. Thanks for catching this, @Ben-Zvi. I will update the PR.
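
To make the two patterns concrete, here is an illustrative sketch of the ownership accounting; the types and methods are stand-ins, not the actual Drill API:

```java
// Stand-in types; real Drill tracks this via BufferAllocator ownership of
// VectorContainers.
public class OwnershipSketch {
  interface Batch { long bufferBytes(); }

  long operatorAllocatorBytes; // bytes charged to this operator's allocator

  // Pattern 1: transfer ownership. The incoming buffers are now charged to
  // this operator's allocator and count against its memory limit.
  void takeOwnership(Batch incoming) {
    operatorAllocatorBytes += incoming.bufferBytes();
  }

  // Pattern 2: copy without transferring ownership. HashJoin's probe side
  // copies values into the outgoing batch; the incoming buffers stay charged
  // to the upstream operator's allocator, which is why no OOM was observed.
  void copyOut(Batch incoming, Batch outgoing) {
    // values are copied into 'outgoing'; operatorAllocatorBytes is
    // unchanged for 'incoming'
  }
}
```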

@ilooner (Author, Contributor) replied Aug 3, 2018
@Ben-Zvi I thought about this a bit more, and there is a complication with spilling. If we are joining on a spilled partition, the HashJoin operator actually owns the probe batch. So during the first cycle we don't want to account for the probe batch, because the upstream operator owns it; but during the later spill cycles we do, because we own the batches ourselves.

ilooner force-pushed the ilooner:DRILL-6644 branch from b34db79 to 5de6ff9 on Aug 6, 2018


if (!firstCycle) {
  // If this is NOT the first cycle, the HashJoin operator owns the probe batch
  // and we need to reserve space for it.
  reservedMemory += probeSizePredictor.getBatchSize();
}

@Ben-Zvi (Contributor) commented Aug 8, 2018
Sounds correct, but note that this method, calculateMemoryUsage(), is invoked from innerNext(), which is called (recursively) after both the first build-side and first probe-side spilled batches have already been read.

  • In line 363 maxProbeBatchSize is calculated; how different is it from getBatchSize() here?
  • Should the build-side spilled "incoming" batch size also be taken into account?

@ilooner (Author, Contributor) replied Aug 8, 2018

  • This should work correctly with spilling: innerNext calls executeBuild, which creates a new calculator each time. The calculator is initialized with the current build and probe batches, so it always sees the correct state (see the sketch after this list).
  • You are right: there should be no difference between maxProbeBatchSize and getBatchSize() now, since this code is only activated when reading spilled batches. It is leftover logic from when the calculator computed a worst-case incoming probe batch size during the first cycle. I will consolidate the two sizes to clean things up.
  • The build-side incoming batch is taken into account on line 390. Note: this code still reserves memory for the incoming build-side batch during the first cycle, which is not correct. I want to fix that in a separate PR, since the changes get quite large and require extensive changes to the unit tests.
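
A minimal sketch of the per-cycle flow described in the first bullet; the constructor shape and helper names are assumptions, though HashJoinBatch.innerNext() and the memory calculator are real classes (see the stack trace further down):

```java
// Sketch only: class shapes here are assumed, not the actual Drill signatures.
public class CycleSketch {
  interface Batch {}

  static class MemoryCalculator {
    MemoryCalculator(boolean firstCycle, Batch build, Batch probe) {
      // Seeded with the CURRENT build and probe batches, so a calculator
      // created during a spill cycle sees the spilled (operator-owned) batches.
    }
  }

  void innerNext(boolean firstCycle, Batch buildBatch, Batch probeBatch) {
    // A fresh calculator is created on every (possibly recursive) call, so
    // each spill cycle recalculates memory with up-to-date state.
    MemoryCalculator calc = new MemoryCalculator(firstCycle, buildBatch, probeBatch);
    // ... build phase proceeds using calc ...
  }
}
```
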
@ilooner (Author, Contributor) commented Aug 8, 2018

@Ben-Zvi I have addressed the review comments. See the latest commit for the maxProbeBatchSize consolidation; the code is much simpler now.

ilooner force-pushed the ilooner:DRILL-6644 branch from f803797 to 514e7b2 on Aug 13, 2018

@Ben-Zvi (Contributor) left a review

+1 Good work ....

@@ -379,7 +379,7 @@ public void testLeftHashJoinLeftEmpty() throws Exception {
     .build();

   RecordBatch joinBatch = new PopBuilder()
-    .physicalOperator(new HashJoinPOP(null, null, Lists.newArrayList(joinCond("a", "EQUALS", "a2")), JoinRelType.LEFT))
+    .physicalOperator(new HashJoinPOP(null, null, Lists.newArrayList(joinCond("a2", "EQUALS", "a")), JoinRelType.LEFT))

@Ben-Zvi (Contributor) commented Aug 15, 2018

Good catch .... so it took a failure in the calculator to reveal that the columns were mismatched.

ilooner force-pushed the ilooner:DRILL-6644 branch from 514e7b2 to 795d8f8 on Aug 15, 2018

@ilooner (Author, Contributor) commented Aug 15, 2018

@Ben-Zvi thanks for the review.

vvysotskyi added a commit to vvysotskyi/drill that referenced this pull request Aug 18, 2018

@vvysotskyi (Member) commented Aug 18, 2018

@ilooner, this PR causes Advanced test failures:

tpcds/tpcds_sf1/original/parquet/query11.sql
tpcds/tpcds_sf1/original/parquet/query4.sql

java.sql.SQLException: SYSTEM ERROR: IllegalStateException

Fragment 35:1

[Error Id: 81d1f358-6871-4ab4-a79c-d632b6e5c1ae on cv1:31010]

  (java.lang.IllegalStateException) null
    com.google.common.base.Preconditions.checkState():158
    org.apache.drill.exec.physical.impl.join.BatchSizePredictorImpl.predictBatchSize():76
    org.apache.drill.exec.physical.impl.join.HashJoinMemoryCalculatorImpl$PostBuildCalculationsImpl.initialize():635
    org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase():844
    org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():414
    org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():544
    org.apache.drill.exec.record.AbstractRecordBatch.next():172
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():142
    org.apache.drill.exec.record.AbstractRecordBatch.next():172
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():142
    org.apache.drill.exec.record.AbstractRecordBatch.next():172
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.test.generated.HashAggregatorGen75169.doWork():649
    org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():273
    org.apache.drill.exec.record.AbstractRecordBatch.next():172
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():142
    org.apache.drill.exec.record.AbstractRecordBatch.next():172
    org.apache.drill.exec.physical.impl.BaseRootExec.next():103
    org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
    org.apache.drill.exec.physical.impl.BaseRootExec.next():93
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():294
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():281
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1595
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():281
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1149
    java.util.concurrent.ThreadPoolExecutor$Worker.run():624
    java.lang.Thread.run():748

Could you please take a look?

@arina-ielchiieva (Member) commented Aug 20, 2018

@ilooner I'll remove the ready-to-commit label; please put it back when the test failures are resolved.

ilooner force-pushed the ilooner:DRILL-6644 branch 5 times, most recently from b798990 to 3746eb9 on Aug 20, 2018

ilooner force-pushed the ilooner:DRILL-6644 branch from 3746eb9 to 4919de4 on Aug 23, 2018

@ilooner (Author, Contributor) commented Aug 23, 2018

@vvysotskyi Thanks for catching the failures. @arina-ielchiieva I have fixed the failures and am adding back the ready-to-commit label.

asfgit closed this in e9ffb5b on Aug 27, 2018

mattpollack added a commit to mattpollack/drill that referenced this pull request Feb 25, 2019
