Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CARBONDATA-3548]Polygon expression processing using unknown expression and filtering performance improvement #3616

Closed

Conversation

VenuReddy2103
Copy link
Contributor

@VenuReddy2103 VenuReddy2103 commented Feb 12, 2020

Why is this PR needed?

This PR improves the query processing performance of in_polygon UDF.

What changes were proposed in this PR?

At present, PolygonExpression processing leverages the existing InExpression. PolygonExpression internally creates a InExpression as a child to it. InExpression is constructed/build from the result of Quad tree algorithm. Algorithm returns the list of ranges(with each range having min and max Id for that range). And this list is a sorted one.
InExpression constitute of 2 childs. One child is a columnExpression(for geohash column) and the other is a ListExpression( with List of LiternalExpressions. One LiteralExpression for each Id returned from algo).
Problems associated with this approach:

  • We expand the list of ranges(with each range having minand max) to all individual Ids. And create LiteralExpression for each Id. Since we can have large ranges(and the numerous ranges), it consumes huge amount of memory in processing.
  • Due to same reason, it slows does the filter execution.

Modifications with this PR:
Instead we can use UnknownExpression with RowLevelFilterResolverImpl and RowLevelFilterExecuterImpl processing. And override evaluate() method to do the binary searchon the list of ranges directly. This will significanly inprove the polygon filter query performance. And Polygon filter expression type is not required anymore at Carbon-Core module.

Does this PR introduce any user interface change?

  • No

Is any new testcase added?

  • Yes. Added an end to end test case

@CarbonDataQA1
Copy link

Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/263/

@CarbonDataQA1
Copy link

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1966/

@VenuReddy2103 VenuReddy2103 changed the title Polygon expression processing using unknown expression and filtering performance improvement [CARBONDATA-3548]Polygon expression processing using unknown expression and filtering performance improvement Feb 12, 2020
buildExpression(ranges);
}

private boolean rangeBinarySearch(List<Long[]> ranges, long searchForNumber) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use collections.binarysearch()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did the binary seach on ranges directly instead of expanding ranges to values in another list and doing collections.binarysearch() on the resultant one. Tried to avoid the unnecessary expansion of ranges. It can reduce the search complexity to order of ln(number of list of ranges).

}

@Override
public ExpressionResult evaluate(RowIntf value) {
throw new UnsupportedOperationException("Operation not supported for Polygon expression");
if (rangeBinarySearch(ranges, (Long) value.getVal(0))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Data is already sorted. So may be in future can take advantage of it , instead of checking each row exist in a range.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah Agreed. Will think of it in future.

@ajantha-bhat
Copy link
Member

LGTM

@asfgit asfgit closed this in 8ff487f Feb 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants