Skip to content

Fix selection early termination test#5167

Closed
siddharthteotia wants to merge 1 commit intoapache:masterfrom
siddharthteotia:sel_early
Closed

Fix selection early termination test#5167
siddharthteotia wants to merge 1 commit intoapache:masterfrom
siddharthteotia:sel_early

Conversation

@siddharthteotia
Copy link
Copy Markdown
Contributor

We are seeing some failures in our automation jobs where Selection Early termination test is failing spuriously when verifying the numDocsScanned.

The test assumes that each single thread would have done it's part of the work inside SelectionOnlyOperator.nextBlock() and returned from there.

Take an example:

LIMIT: 20480
threads per server: 5
num servers: 2
num segments (operators) per thread: 2
docs per segment: 30000

So as per the changes made in early termination PR, each thread will break out immediately after finishing the nextBlock() call on the first operator/segment.

We check for numDocsScanned as limit * num servers * num threads per server => 20480 * 2 * 5 = 204800

Now consider the fact that 5th thread is a bit slow -- it is still executing the SelectionOnlyOperator.nextBlock() code and continuously updating it's local numDocsScanned

The root thread in CombineOperator has already broken out because it has found a mergedBlock across all worker threads that has records >= LIMIT

Next we compute total numDocsScanned across all operators (10) for statistics so it adds up something like this:

20480 + 20480 + 20480 + 20480 + X (X is slightly less than 20480) + 0 + 0 + 0 + 0 + 0 => something just under 204800

In the test we should assert for expected range of numDocsScanned considering the max would be when each of the 5 threads finished the first segment thus leading to a perfect total of 204800

Also added a TODO for a small optimization in CombineOperator

Copy link
Copy Markdown
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

}
});
}
// TODO: we can track a volatile variable "desiredLimitReached"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will cancel the future in line 187. Here we should catch the InterruptedException and silently return, or we might end up logging lots of invalid ERROR messages.

@@ -62,7 +63,8 @@ public void testSelectOnlyQuery() {
assertNull(brokerResponse.getResultTable());
assertEquals(brokerResponse.getNumSegmentsProcessed(), numSegmentsPerServer * NUM_SERVERS);
assertEquals(brokerResponse.getNumSegmentsMatched(), numThreadsPerServer * NUM_SERVERS);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numSegmentsMatched should be between NUM_SERVERS to numThreadsPerServer * NUM_SERVERS

assertEquals(brokerResponse.getNumSegmentsMatched(), numThreadsPerServer * NUM_SERVERS);
assertEquals(brokerResponse.getNumDocsScanned(), numThreadsPerServer * NUM_SERVERS * limit);
long numDocsScanned = brokerResponse.getNumDocsScanned();
assertTrue(limit <= numDocsScanned && numDocsScanned <= numThreadsPerServer * NUM_SERVERS * limit);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assertTrue(limit <= numDocsScanned && numDocsScanned <= numThreadsPerServer * NUM_SERVERS * limit);
assertTrue(NUM_SERVERS * limit <= numDocsScanned && numDocsScanned <= numThreadsPerServer * NUM_SERVERS * limit);

assertEquals(brokerResponse.getNumSegmentsMatched(), numThreadsPerServer * NUM_SERVERS);
assertEquals(brokerResponse.getNumDocsScanned(), numThreadsPerServer * NUM_SERVERS * limit);
numDocsScanned = brokerResponse.getNumDocsScanned();
assertTrue(limit <= numDocsScanned && numDocsScanned <= numThreadsPerServer * NUM_SERVERS * limit);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assertTrue(limit <= numDocsScanned && numDocsScanned <= numThreadsPerServer * NUM_SERVERS * limit);
assertTrue(NUM_SERVERS * limit <= numDocsScanned && numDocsScanned <= numThreadsPerServer * NUM_SERVERS * limit);

Jackie-Jiang added a commit to Jackie-Jiang/pinot that referenced this pull request Mar 23, 2020
Credit to: siddharthteotia
Replace: apache#5167
Jackie-Jiang added a commit that referenced this pull request Mar 23, 2020
Credit to: siddharthteotia
Replace: #5167
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants