[CARBONDATA-3922] Support order by limit push down for secondary index queries #3861
Conversation
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3481/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1739/
@ajantha-bhat please rebase
retest this please
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2017/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3758/
retest this please
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3855/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2115/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3911/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2170/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3912/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2171/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2172/
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3913/
@ajantha-bhat, can test cases be added to check that there is no filter pushdown in the not-equal-to case?
@vikramahuja1001: Done, added.
case attr: AttributeReference =>
  attr.name.toLowerCase
}
val filterAttributes = filter.condition collect {
Is filterAttributes the same as originalFilterAttributes? The code looks identical.
Agreed, it was overlooked. We cannot compare here. I moved this comparison into createIndexFilterDataFrame, where needPushDown is decided.
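As an aside for readers following the diff, the `collect`-style attribute gathering above can be illustrated with a standalone toy expression tree. The `Expr` classes and `attributeNames` below are illustrative stand-ins, not Catalyst's actual classes:

```scala
// Toy expression ADT mimicking the shape of a filter condition tree;
// attributeNames plays the role of
//   filter.condition collect { case attr: AttributeReference => attr.name.toLowerCase }
object CollectAttributesSketch {
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class Lit(value: Any) extends Expr
  case class EqualTo(left: Expr, right: Expr) extends Expr
  case class And(left: Expr, right: Expr) extends Expr

  // Walk the tree and gather every attribute name, lower-cased.
  def attributeNames(e: Expr): Seq[String] = e match {
    case Attr(n)       => Seq(n.toLowerCase)
    case Lit(_)        => Seq.empty
    case EqualTo(l, r) => attributeNames(l) ++ attributeNames(r)
    case And(l, r)     => attributeNames(l) ++ attributeNames(r)
  }

  def main(args: Array[String]): Unit = {
    val cond = And(EqualTo(Attr("Name"), Lit("a")), EqualTo(Attr("City"), Lit("b")))
    assert(attributeNames(cond) == Seq("name", "city"))
  }
}
```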
filterAttributes.toSet.asJava,
CarbonIndexUtil.getSecondaryIndexes(indexTableRelation).mapValues(_.toList.asJava).asJava)
.asScala
val databaseName = filter.child.asInstanceOf[LogicalRelation].relation
Why not use indexTableRelation.carbonRelation.databaseName?
done
.map(_.child.asInstanceOf[AttributeReference].name.toLowerCase())
.toSet
val indexCarbonTable = CarbonEnv
.getCarbonTable(Some(databaseName), enabledMatchingIndexTables.head)(sparkSession)
Use indexTableRelation.carbonTable to get indexCarbonTable.
indexTableRelation.carbonTable refers to the main table, hence this code.
This part of the code is adapted from existing code. Let me rename it here.
.toSet
val indexCarbonTable = CarbonEnv
.getCarbonTable(Some(databaseName), enabledMatchingIndexTables.head)(sparkSession)
var allColumnsFound = true
Use forall to check whether all the columns exist.
done
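For readers following along, the suggested refactor — replacing a mutable allColumnsFound flag with a single forall — can be sketched standalone. getColumnByName below is a stand-in that returns null for unknown columns, only mimicking the shape of the Carbon table API:

```scala
// Sketch of the forall refactor: the mutable flag plus a loop collapses
// into one expression that short-circuits on the first missing column.
object ForallSketch {
  // Stand-in for the SI table's column metadata (hypothetical data).
  val indexColumns: Map[String, String] = Map("name" -> "string", "city" -> "string")

  // Mimics a lookup API that returns null when the column is absent.
  def getColumnByName(name: String): String = indexColumns.getOrElse(name, null)

  // Instead of: var allColumnsFound = true; for (c <- sortColumns) { ... }
  def allColumnsFound(sortColumns: Seq[String]): Boolean =
    sortColumns.forall(c => getColumnByName(c) != null)

  def main(args: Array[String]): Unit = {
    assert(allColumnsFound(Seq("name", "city")))
    assert(!allColumnsFound(Seq("name", "age")))
  }
}
```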
// By default, do not push down the notNull filter;
// but for order by limit push down, also push down the notNull filter, else we get wrong results.
var pushDownNotNullFilter : Boolean = _
Why not keep these as local variables in transformFilterToJoin and pass to rewritePlanForSecondaryIndex()?
Because too many functions would need to change to pass the arguments. I used default arguments and changed only the required places, so it is a local variable now.
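The default-argument approach mentioned here can be illustrated standalone. rewritePlan below is a hypothetical stand-in, not the actual CarbonData function; the point is that existing call sites keep compiling unchanged:

```scala
// Sketch of threading a new flag through with a default argument:
// only the call sites that need pushDownNotNullFilter = true change.
object DefaultArgSketch {
  def rewritePlan(plan: String, pushDownNotNullFilter: Boolean = false): String =
    if (pushDownNotNullFilter) s"$plan+notNull" else plan

  def main(args: Array[String]): Unit = {
    // Old call site: untouched, picks up the default of false.
    assert(rewritePlan("scan") == "scan")
    // New order-by-limit path: opts in explicitly.
    assert(rewritePlan("scan", pushDownNotNullFilter = true) == "scan+notNull")
  }
}
```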
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3934/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2192/
.toSet
val indexCarbonTable = CarbonEnv
.getCarbonTable(Some(databaseName), enabledMatchingIndexTables.head)(sparkSession)
if (sortColumns.forall { x => indexCarbonTable.getColumnByName(x) != null }) {
directly return sortColumns.forall { x => indexCarbonTable.getColumnByName(x) != null }
yeah. done
case _ =>
}
(limit, transformChild)
case limit@Limit(literal: Literal, _@Project(_, child)) if child.isInstanceOf[Sort] =>
If you use the following, you will not have to check isInstanceOf or cast the child to Sort:
case limit@Limit(literal: Literal, _@Project(_, Sort(_, _)))
done
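The nested-extractor suggestion can be shown with a standalone toy plan ADT. These case classes only mimic the shape of Catalyst's operators; the point is that matching Sort structurally inside Project removes the isInstanceOf check and the cast:

```scala
// Toy logical-plan ADT: the nested pattern binds the Sort directly,
// so no guard or asInstanceOf is needed.
object NestedMatchSketch {
  sealed trait Plan
  case object Scan extends Plan
  case class Sort(child: Plan) extends Plan
  case class Project(child: Plan) extends Plan
  case class Limit(n: Int, child: Plan) extends Plan

  def describe(plan: Plan): String = plan match {
    // Matches only Limit -> Project -> Sort, with no casts.
    case Limit(n, Project(Sort(_))) => s"limit $n over sorted projection"
    case _                          => "other"
  }

  def main(args: Array[String]): Unit = {
    assert(describe(Limit(10, Project(Sort(Scan)))) == "limit 10 over sorted projection")
    assert(describe(Limit(10, Project(Scan))) == "other")
  }
}
```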
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2213/
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3953/
@kunal642: The PR is ready. Please check and merge.
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3976/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2236/
retest this please
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3978/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2238/
LGTM
retest this please
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4041/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2303/
Why is this PR needed?
a) Limit pushdown for SI is already supported, but the limit was pushed down even when the order by column is not an SI column. This needs to be fixed.
b) When a limit is present and the order by column and all the filter columns are SI columns, we can push down order by + limit.
This can reduce the SI output results and reduce the scan time in the main table.
c) The SI transformation rule is applied even when no relation contains an SI.
What changes were proposed in this PR?
a) Block limit push down if the order by column is not an SI column.
b) When a limit is present and the order by column and all the filter columns are SI columns, push down order by + limit.
c) Apply the SI transformation rule only when some relation contains an SI.
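The decision described in (a) and (b) can be sketched as a set check. The identifiers below are illustrative only, not the actual CarbonData code:

```scala
// Sketch: push down order by + limit only when every order-by column
// and every filter column is covered by the secondary index table.
object PushDownDecisionSketch {
  def canPushOrderByLimit(sortCols: Set[String],
      filterCols: Set[String],
      siCols: Set[String]): Boolean =
    sortCols.nonEmpty && (sortCols ++ filterCols).subsetOf(siCols)

  def main(args: Array[String]): Unit = {
    // Order-by and filter columns are all SI columns: push down.
    assert(canPushOrderByLimit(Set("name"), Set("name"), Set("name", "city")))
    // Order-by column is not an SI column: block the push down (case a).
    assert(!canPushOrderByLimit(Set("age"), Set("name"), Set("name", "city")))
  }
}
```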
Does this PR introduce any user interface change?
Is any new testcase added?