[CARBONDATA-3922] Support order by limit push down for secondary index queries #3861
Conversation
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3481/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1739/
@ajantha-bhat please rebase
retest this please
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2017/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3758/
retest this please
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3855/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2115/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3911/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2170/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3912/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2171/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2172/
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3913/
@ajantha-bhat, can test cases be added to check that there is no filter pushdown in the not-equal-to case?
@vikramahuja1001: Done, added.
case attr: AttributeReference =>
  attr.name.toLowerCase
}
val filterAttributes = filter.condition collect {
Is filterAttributes the same as originalFilterAttributes? The code looks identical.
Agreed, it was overlooked. We cannot compare here. I moved this comparison into createIndexFilterDataFrame, where needPushDown is decided.
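As an aside for readers following the diff, the `collect`-style attribute gathering above can be illustrated with a standalone toy expression tree. The `Expr` classes and `attributeNames` below are illustrative stand-ins, not Catalyst's actual classes:

```scala
// Toy expression ADT mimicking the shape of a filter condition tree;
// attributeNames plays the role of
//   filter.condition collect { case attr: AttributeReference => attr.name.toLowerCase }
object CollectAttributesSketch {
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class Lit(value: Any) extends Expr
  case class EqualTo(left: Expr, right: Expr) extends Expr
  case class And(left: Expr, right: Expr) extends Expr

  // Walk the tree and gather every attribute name, lower-cased.
  def attributeNames(e: Expr): Seq[String] = e match {
    case Attr(n)       => Seq(n.toLowerCase)
    case Lit(_)        => Seq.empty
    case EqualTo(l, r) => attributeNames(l) ++ attributeNames(r)
    case And(l, r)     => attributeNames(l) ++ attributeNames(r)
  }

  def main(args: Array[String]): Unit = {
    val cond = And(EqualTo(Attr("Name"), Lit("a")), EqualTo(Attr("City"), Lit("b")))
    assert(attributeNames(cond) == Seq("name", "city"))
  }
}
```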
filterAttributes.toSet.asJava,
CarbonIndexUtil.getSecondaryIndexes(indexTableRelation).mapValues(_.toList.asJava).asJava)
.asScala
val databaseName = filter.child.asInstanceOf[LogicalRelation].relation
Why not use indexTableRelation.carbonRelation.databaseName?
done
.map(_.child.asInstanceOf[AttributeReference].name.toLowerCase())
.toSet
val indexCarbonTable = CarbonEnv
.getCarbonTable(Some(databaseName), enabledMatchingIndexTables.head)(sparkSession)
Use indexTableRelation.carbonTable to get indexCarbonTable.
indexTableRelation.carbonTable refers to the main table, hence this code.
This part of the code is adapted from existing code. Let me rename it here.
.toSet
val indexCarbonTable = CarbonEnv
.getCarbonTable(Some(databaseName), enabledMatchingIndexTables.head)(sparkSession)
var allColumnsFound = true
Use forall to check whether all the columns exist.
done
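For readers following along, the suggested refactor — replacing a mutable allColumnsFound flag with a single forall — can be sketched standalone. getColumnByName below is a stand-in that returns null for unknown columns, only mimicking the shape of the Carbon table API:

```scala
// Sketch of the forall refactor: the mutable flag plus a loop collapses
// into one expression that short-circuits on the first missing column.
object ForallSketch {
  // Stand-in for the SI table's column metadata (hypothetical data).
  val indexColumns: Map[String, String] = Map("name" -> "string", "city" -> "string")

  // Mimics a lookup API that returns null when the column is absent.
  def getColumnByName(name: String): String = indexColumns.getOrElse(name, null)

  // Instead of: var allColumnsFound = true; for (c <- sortColumns) { ... }
  def allColumnsFound(sortColumns: Seq[String]): Boolean =
    sortColumns.forall(c => getColumnByName(c) != null)

  def main(args: Array[String]): Unit = {
    assert(allColumnsFound(Seq("name", "city")))
    assert(!allColumnsFound(Seq("name", "age")))
  }
}
```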
// By default, do not push down the notNull filter;
// but for order by limit push down, also push down the notNull filter, else we get wrong results.
var pushDownNotNullFilter : Boolean = _
Why not keep these as local variables in transformFilterToJoin and pass to rewritePlanForSecondaryIndex()?
Because too many functions would need to change to pass the arguments. I used default arguments and changed only the required places, so it is a local variable now.
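The default-argument approach mentioned here can be illustrated standalone. rewritePlan below is a hypothetical stand-in, not the actual CarbonData function; the point is that existing call sites keep compiling unchanged:

```scala
// Sketch of threading a new flag through with a default argument:
// only the call sites that need pushDownNotNullFilter = true change.
object DefaultArgSketch {
  def rewritePlan(plan: String, pushDownNotNullFilter: Boolean = false): String =
    if (pushDownNotNullFilter) s"$plan+notNull" else plan

  def main(args: Array[String]): Unit = {
    // Old call site: untouched, picks up the default of false.
    assert(rewritePlan("scan") == "scan")
    // New order-by-limit path: opts in explicitly.
    assert(rewritePlan("scan", pushDownNotNullFilter = true) == "scan+notNull")
  }
}
```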
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3934/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2192/
.toSet
val indexCarbonTable = CarbonEnv
.getCarbonTable(Some(databaseName), enabledMatchingIndexTables.head)(sparkSession)
if (sortColumns.forall { x => indexCarbonTable.getColumnByName(x) != null }) {
directly return sortColumns.forall { x => indexCarbonTable.getColumnByName(x) != null }
yeah. done
case _ =>
}
(limit, transformChild)
case limit@Limit(literal: Literal, _@Project(_, child)) if child.isInstanceOf[Sort] =>
If you use the following, you will not have to check isInstanceOf or cast the child to Sort:
case limit@Limit(literal: Literal, _@Project(_, Sort(_, _)))
done
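The nested-extractor suggestion can be shown with a standalone toy plan ADT. These case classes only mimic the shape of Catalyst's operators; the point is that matching Sort structurally inside Project removes the isInstanceOf check and the cast:

```scala
// Toy logical-plan ADT: the nested pattern binds the Sort directly,
// so no guard or asInstanceOf is needed.
object NestedMatchSketch {
  sealed trait Plan
  case object Scan extends Plan
  case class Sort(child: Plan) extends Plan
  case class Project(child: Plan) extends Plan
  case class Limit(n: Int, child: Plan) extends Plan

  def describe(plan: Plan): String = plan match {
    // Matches only Limit -> Project -> Sort, with no casts.
    case Limit(n, Project(Sort(_))) => s"limit $n over sorted projection"
    case _                          => "other"
  }

  def main(args: Array[String]): Unit = {
    assert(describe(Limit(10, Project(Sort(Scan)))) == "limit 10 over sorted projection")
    assert(describe(Limit(10, Project(Scan))) == "other")
  }
}
```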
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2213/
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3953/
@kunal642: The PR is ready. Please check and merge.
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3976/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2236/
retest this please
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3978/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2238/
LGTM
retest this please
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4041/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2303/
Why is this PR needed?
a) Limit pushdown for SI is already supported, but the limit was pushed down even when the order by column is not an SI column. This needs to be fixed.
b) When a limit is present and the order by column and all the filter columns are SI columns, we can push down order by + limit.
This can reduce the SI output results and reduce the scan time in the main table.
c) The SI transformation rule is applied even when no relation contains an SI.
What changes were proposed in this PR?
a) Block limit push down if the order by column is not an SI column.
b) When a limit is present and the order by column and all the filter columns are SI columns, push down order by + limit.
c) Apply the SI transformation rule only when some relation contains an SI.
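The decision described in (a) and (b) can be sketched as a set check. The identifiers below are illustrative only, not the actual CarbonData code:

```scala
// Sketch: push down order by + limit only when every order-by column
// and every filter column is covered by the secondary index table.
object PushDownDecisionSketch {
  def canPushOrderByLimit(sortCols: Set[String],
      filterCols: Set[String],
      siCols: Set[String]): Boolean =
    sortCols.nonEmpty && (sortCols ++ filterCols).subsetOf(siCols)

  def main(args: Array[String]): Unit = {
    // Order-by and filter columns are all SI columns: push down.
    assert(canPushOrderByLimit(Set("name"), Set("name"), Set("name", "city")))
    // Order-by column is not an SI column: block the push down (case a).
    assert(!canPushOrderByLimit(Set("age"), Set("name"), Set("name", "city")))
  }
}
```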
Does this PR introduce any user interface change?
Is any new testcase added?