
[CARBONDATA-3922] Support order by limit push down for secondary index queries #3861

Closed
wants to merge 1 commit into from

Conversation

ajantha-bhat
Member

@ajantha-bhat ajantha-bhat commented Jul 23, 2020

Why is this PR needed?

a) Limit pushdown for SI is already supported, but the limit was being pushed down even when the order by column is not an SI column. This needs to be fixed.
b) When a limit is present and the order by column and all the filter columns are SI columns, we can push down order by + limit.
This reduces the SI output and the scan time on the main table.
c) The SI transformation rule is applied even when no relation in the plan contains an SI.

What changes were proposed in this PR?

a) Block limit pushdown if the order by column is not an SI column.
b) When a limit is present and the order by column and all the filter columns are SI columns, push down order by + limit (see the sketch after this list).
c) Apply the SI transformation rule only when at least one relation in the plan contains an SI.
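
To make (a) and (b) concrete, a minimal Scala sketch of the two query shapes, assuming a hypothetical main table maintable with a secondary index on the city column (the table and column names are illustrative, not taken from this PR):

// Minimal sketch, assuming a hypothetical main table "maintable" with a secondary index
// on the "city" column; names below are illustrative, not from this PR.
import org.apache.spark.sql.SparkSession

object OrderByLimitPushDownSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("si-orderby-limit-sketch").getOrCreate()

    // Filter column and order by column are both SI columns:
    // order by + limit can be pushed down to the secondary index scan (change b).
    spark.sql("SELECT * FROM maintable WHERE city = 'shenzhen' ORDER BY city LIMIT 10").show()

    // The order by column "name" is not an SI column:
    // the limit must not be pushed down to the SI (change a).
    spark.sql("SELECT * FROM maintable WHERE city = 'shenzhen' ORDER BY name LIMIT 10").show()

    spark.stop()
  }
}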

Does this PR introduce any user interface change?

  • No

Is any new testcase added?

  • Yes

@ajantha-bhat changed the title from "[CARBONDATA-3922] Support order limit by push down for secondary index queries" to "[CARBONDATA-3922] Support order by limit push down for secondary index queries" on Jul 23, 2020
@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3481/

@CarbonDataQA1

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1739/

@vikramahuja1001
Contributor

@ajantha-bhat please rebase

@vikramahuja1001
Contributor

retest this please

@CarbonDataQA1

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2017/

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3758/

@ajantha-bhat
Member Author

retest this please

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3855/

@CarbonDataQA1

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2115/

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3911/

@CarbonDataQA1

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2170/

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3912/

@CarbonDataQA1

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2171/

@ajantha-bhat
Member Author

@QiangCai, @akashrn5, @kunal642: please check and merge.

@CarbonDataQA1

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2172/

@CarbonDataQA1

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3913/

@vikramahuja1001
Contributor

@ajantha-bhat, can a test case be added to check that there is no filter pushdown in the not-equal-to case?

@ajantha-bhat
Member Author

@ajantha-bhat, can a test case be added to check that there is no filter pushdown in the not-equal-to case?

@vikramahuja1001: Done, added.

case attr: AttributeReference =>
attr.name.toLowerCase
}
val filterAttributes = filter.condition collect {
Contributor

Is filterAttributes the same as originalFilterAttributes? The code looks the same.

Member Author

Agreed, it was overlooked. We cannot compare them here; I moved this comparison into createIndexFilterDataFrame, where needPushDown is decided.
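
For illustration, the kind of check createIndexFilterDataFrame could use to decide needPushDown might look like the following hypothetical helper (the helper name and parameters are assumptions, not the PR's actual code):

// Hypothetical sketch of the needPushDown decision described above; not the PR's real code.
def decideNeedPushDown(
    originalFilterAttributes: Set[String],
    filterAttributes: Set[String],
    sortColumns: Set[String]): Boolean = {
  // Push down order by + limit only when the rewritten filter references exactly the
  // original filter columns and every sort column is also one of those (SI) columns.
  filterAttributes == originalFilterAttributes && sortColumns.subsetOf(filterAttributes)
}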

filterAttributes.toSet.asJava,
CarbonIndexUtil.getSecondaryIndexes(indexTableRelation).mapValues(_.toList.asJava).asJava)
.asScala
val databaseName = filter.child.asInstanceOf[LogicalRelation].relation
Contributor

Why not use indexTableRelation.carbonRelation.databaseName?

Member Author

done

.map(_.child.asInstanceOf[AttributeReference].name.toLowerCase())
.toSet
val indexCarbonTable = CarbonEnv
.getCarbonTable(Some(databaseName), enabledMatchingIndexTables.head)(sparkSession)
Contributor

Use indexTableRelation.carbonTable to get indexCarbonTable.

Member Author

indexTableRelation.carbonTable holds the main table, hence this code.

Member Author

This part of the code is taken from the existing code. Let me rename it here.

.toSet
val indexCarbonTable = CarbonEnv
.getCarbonTable(Some(databaseName), enabledMatchingIndexTables.head)(sparkSession)
var allColumnsFound = true
Contributor

Use forall to check whether all the columns exist.

Member Author

done
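
As a side-by-side illustration of the suggestion (a sketch only; columnExists stands in for indexCarbonTable.getColumnByName(_) != null, and the helper name is hypothetical):

// Sketch of the refactor: a mutable flag updated in a loop versus a single forall call.
def allSortColumnsCovered(sortColumns: Seq[String], columnExists: String => Boolean): Boolean = {
  // Before: var allColumnsFound = true; sortColumns.foreach(c => if (!columnExists(c)) allColumnsFound = false)
  // After (as suggested): one expression that short-circuits on the first missing column.
  sortColumns.forall(columnExists)
}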


// By default, do not push down the notNull filter,
// but for order by + limit pushdown, push down the notNull filter too; otherwise we get wrong results.
var pushDownNotNullFilter : Boolean = _
Contributor

Why not keep these as local variables in transformFilterToJoin and pass to rewritePlanForSecondaryIndex()?

Member Author

Because too many functions would need to change to pass the arguments.

I used default arguments and changed the required places, so it is a local variable now.
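
A rough sketch of the default-argument approach described above (the parameter list is simplified and assumed; the real rewritePlanForSecondaryIndex in CarbonData differs):

import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Simplified sketch only: the flag travels as a default argument instead of a class-level var,
// so only the order by + limit call sites need to pass true.
def rewritePlanForSecondaryIndex(
    plan: LogicalPlan,
    pushDownNotNullFilter: Boolean = false): LogicalPlan = {
  // When pushDownNotNullFilter is true (the order by + limit case), the IsNotNull filter is
  // also pushed to the SI table; the default keeps the previous behaviour.
  plan // placeholder body for the sketch
}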

@CarbonDataQA1

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3934/

@CarbonDataQA1

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2192/

.toSet
val indexCarbonTable = CarbonEnv
.getCarbonTable(Some(databaseName), enabledMatchingIndexTables.head)(sparkSession)
if (sortColumns.forall { x => indexCarbonTable.getColumnByName(x) != null }) {
Contributor

Directly return sortColumns.forall { x => indexCarbonTable.getColumnByName(x) != null }.

Member Author

yeah. done

case _ =>
}
(limit, transformChild)
case limit@Limit(literal: Literal, _@Project(_, child)) if child.isInstanceOf[Sort] =>
Contributor

If you use the following, you will not have to check isInstanceOf or cast the child to Sort.

case limit@Limit(literal: Literal, _@Project(_, Sort(_, _)))

Member Author

done
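
For reference, a self-contained sketch of the nested pattern (matchOrderByLimit is a hypothetical helper, not code from this PR; note that Sort in Spark's catalyst takes three parameters):

import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.catalyst.plans.logical.{Limit, LogicalPlan, Project, Sort}

// Matching Sort directly inside the pattern avoids the separate isInstanceOf check and the
// cast; binding it with sort @ Sort(...) keeps a handle on the Sort node.
def matchOrderByLimit(plan: LogicalPlan): Option[(Literal, Sort)] = plan match {
  case Limit(literal: Literal, Project(_, sort @ Sort(_, _, _))) => Some((literal, sort))
  case _ => None
}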

@CarbonDataQA1

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2213/

@CarbonDataQA1

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3953/

@ajantha-bhat
Member Author

@kunal642 : PR is ready. please check and merge

@CarbonDataQA1

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3976/

@CarbonDataQA1

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2236/

@ajantha-bhat
Member Author

retest this please

@CarbonDataQA1

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3978/

@CarbonDataQA1

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2238/

@ajantha-bhat
Member Author

Build passed after rebase. @kunal642 / @QiangCai : please check and merge.

@kunal642
Contributor

LGTM

@ajantha-bhat
Member Author

retest this please

@CarbonDataQA1

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4041/

@CarbonDataQA1

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2303/

@asfgit asfgit closed this in a59aec7 Sep 11, 2020