[CARBONDATA-3548]Polygon expression processing using unknown expression and filtering performance improvement #3616

VenuReddy2103 · 2020-02-12T11:55:56Z

Why is this PR needed?

This PR improves the query processing performance of the in_polygon UDF.

What changes were proposed in this PR?

At present, PolygonExpression processing leverages the existing InExpression: PolygonExpression internally creates an InExpression as its child. The InExpression is built from the result of the quad tree algorithm, which returns a sorted list of ranges, each range having a min and a max ID.
The InExpression has two children. One is a ColumnExpression (for the geohash column) and the other is a ListExpression containing a list of LiteralExpressions, one LiteralExpression for each ID returned by the algorithm.
Problems associated with this approach:

We expand the list of ranges (each with a min and a max) into all the individual IDs and create a LiteralExpression for each ID. Since the ranges can be large and numerous, this consumes a huge amount of memory during processing.
For the same reason, it slows down filter execution.
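To make the memory problem concrete, here is a small plain-Java model of the two representations. It is an illustration only: it uses boxed `Long` values where the real code created one `LiteralExpression` per ID, and the class and method names (`RangeExpansionDemo`, `expand`, `expandedSize`) and the sample ranges are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Illustrative model (not CarbonData code): the old approach expanded every
 * inclusive [min, max] range into individual IDs, one object per ID.
 */
public class RangeExpansionDemo {

  // Old approach, modelled with boxed longs: one list entry per ID in every range.
  static List<Long> expand(List<long[]> ranges) {
    List<Long> ids = new ArrayList<>();
    for (long[] range : ranges) {
      for (long id = range[0]; id <= range[1]; id++) {
        ids.add(id);
      }
    }
    return ids;
  }

  // Number of IDs the expansion would produce, computed without materializing them.
  static long expandedSize(List<long[]> ranges) {
    long total = 0;
    for (long[] range : ranges) {
      total += range[1] - range[0] + 1;
    }
    return total;
  }

  public static void main(String[] args) {
    List<long[]> ranges = Arrays.asList(
        new long[] {100, 199},
        new long[] {1_000, 50_999},
        new long[] {2_000_000, 2_999_999});
    System.out.println(ranges.size());                         // 3 ranges kept
    System.out.println(expandedSize(ranges));                  // 1050100 IDs if expanded
    System.out.println(expand(ranges.subList(0, 1)).size());   // 100
  }
}
```

Three ranges stay three small objects, while expanding them yields over a million entries, each of which became a separate expression object in the old code path.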

Modifications with this PR:
Instead, we can use an UnknownExpression with RowLevelFilterResolverImpl and RowLevelFilterExecuterImpl processing, and override the evaluate() method to do a binary search directly on the list of ranges. This will significantly improve polygon filter query performance. In addition, the Polygon filter expression type is no longer required in the Carbon-Core module.
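A minimal sketch of a binary search over the ranges, assuming the list is sorted by min, the ranges do not overlap, bounds are inclusive, and each element is a `Long[]` of `{min, max}`. The method name and parameter types follow the `rangeBinarySearch` signature in the review diff of PolygonExpression.java, but the body is an illustration, not the code that was merged; the wrapper class `RangeSearch` and the sample ranges are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Sketch: binary search over sorted, non-overlapping, inclusive {min, max}
 * ranges. Illustration only, not the merged PolygonExpression code.
 */
public class RangeSearch {

  static boolean rangeBinarySearch(List<Long[]> ranges, long searchForNumber) {
    int low = 0;
    int high = ranges.size() - 1;
    while (low <= high) {
      int mid = (low + high) >>> 1;  // unsigned shift avoids int overflow
      Long[] range = ranges.get(mid);
      if (searchForNumber < range[0]) {
        high = mid - 1;              // value lies before this range
      } else if (searchForNumber > range[1]) {
        low = mid + 1;               // value lies after this range
      } else {
        return true;                 // min <= value <= max
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<Long[]> ranges = Arrays.asList(
        new Long[] {10L, 20L},
        new Long[] {35L, 40L},
        new Long[] {100L, 150L});
    System.out.println(rangeBinarySearch(ranges, 37L));   // true
    System.out.println(rangeBinarySearch(ranges, 25L));   // false
    System.out.println(rangeBinarySearch(ranges, 150L));  // true
  }
}
```

In the PR, `evaluate()` passes the row's geohash value, `(Long) value.getVal(0)`, to this check. Each lookup is O(log n) in the number of ranges and allocates no per-ID objects.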

Does this PR introduce any user interface change?

No

Is any new testcase added?

Yes. Added an end-to-end test case.


CarbonDataQA1 · 2020-02-12T12:15:40Z

Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/263/

CarbonDataQA1 · 2020-02-12T13:13:47Z

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1966/

ajantha-bhat · 2020-02-12T14:03:01Z

geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonExpression.java

```diff
-    buildExpression(ranges);
+  }
+
+  private boolean rangeBinarySearch(List<Long[]> ranges, long searchForNumber) {
```


Please use Collections.binarySearch().

I did the binary search on the ranges directly instead of expanding the ranges into individual values in another list and calling Collections.binarySearch() on the result. This avoids the unnecessary expansion of ranges and reduces the search complexity to O(log n), where n is the number of ranges.
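For comparison, the reviewer's `Collections.binarySearch()` suggestion can also be met without expanding the ranges, by passing a comparator that treats two ranges as equal when they overlap and probing with the degenerate range `{x, x}`. This is a hedged alternative sketch, not what the PR does. It relies on the same assumptions (sorted, non-overlapping, inclusive ranges); outside them the comparator is not a valid ordering. The names `CollectionsRangeSearch`, `RANGE_ORDER` and `contains` are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/**
 * Alternative sketch: Collections.binarySearch with a range-aware comparator.
 * Only behaves like an ordering because the ranges are assumed sorted and
 * non-overlapping.
 */
public class CollectionsRangeSearch {

  private static final Comparator<Long[]> RANGE_ORDER = (a, b) -> {
    if (a[1] < b[0]) {
      return -1;  // a ends before b starts
    }
    if (a[0] > b[1]) {
      return 1;   // a starts after b ends
    }
    return 0;     // overlap: for a {x, x} probe, x lies inside the range
  };

  static boolean contains(List<Long[]> ranges, long value) {
    Long[] probe = {value, value};
    return Collections.binarySearch(ranges, probe, RANGE_ORDER) >= 0;
  }

  public static void main(String[] args) {
    List<Long[]> ranges = Arrays.asList(
        new Long[] {10L, 20L},
        new Long[] {35L, 40L},
        new Long[] {100L, 150L});
    System.out.println(contains(ranges, 37L));  // true
    System.out.println(contains(ranges, 25L));  // false
  }
}
```

Both variants are O(log n) per lookup; the hand-written loop avoids allocating a probe array per row, which matters when the check runs once for every scanned row.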

ajantha-bhat · 2020-02-12T14:14:06Z

geo/src/main/java/org/apache/carbondata/geo/scan/expression/PolygonExpression.java

```diff
   }

   @Override
   public ExpressionResult evaluate(RowIntf value) {
-    throw new UnsupportedOperationException("Operation not supported for Polygon expression");
+    if (rangeBinarySearch(ranges, (Long) value.getVal(0))) {
```


The data is already sorted, so in the future we may be able to take advantage of that instead of checking whether each row falls within a range.

Yes, agreed. I will look into it in the future.
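A hypothetical sketch of the reviewer's idea, which was not implemented in this PR: if the geohash values being filtered arrive in ascending order, a single merge-style pass over values and ranges decides every row in O(values + ranges) rather than O(values × log ranges). The class and method names, the `long[]` input layout and the sorted-input assumption are all illustrative.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not part of this PR): match ascending values against
 * sorted, non-overlapping, inclusive {min, max} ranges in one linear pass.
 */
public class SortedMergeFilter {

  static boolean[] matchSorted(long[] sortedValues, List<Long[]> ranges) {
    boolean[] matches = new boolean[sortedValues.length];
    int r = 0;  // cursor into ranges; only ever moves forward
    for (int i = 0; i < sortedValues.length; i++) {
      long v = sortedValues[i];
      // Skip ranges that end before v; later values are >= v, so they skip them too.
      while (r < ranges.size() && ranges.get(r)[1] < v) {
        r++;
      }
      if (r == ranges.size()) {
        break;  // all remaining values lie past the last range
      }
      matches[i] = v >= ranges.get(r)[0];
    }
    return matches;
  }

  public static void main(String[] args) {
    List<Long[]> ranges = Arrays.asList(new Long[] {10L, 20L}, new Long[] {35L, 40L});
    long[] values = {5L, 10L, 15L, 25L, 36L, 41L};
    // prints [false, true, true, false, true, false]
    System.out.println(Arrays.toString(matchSorted(values, ranges)));
  }
}
```

The forward-only range cursor is what makes the pass linear. If the input is not actually sorted the results are wrong, so a real implementation would need to guarantee the sort order before choosing this path.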

ajantha-bhat · 2020-02-12T14:48:34Z

LGTM

Polygon expression processing using unknown expression and filter optimization

d504ad3

VenuReddy2103 changed the title ~~Polygon expression processing using unknown expression and filtering performance improvement~~ [CARBONDATA-3548]Polygon expression processing using unknown expression and filtering performance improvement Feb 12, 2020

ajantha-bhat reviewed Feb 12, 2020

asfgit closed this in 8ff487f Feb 12, 2020
