-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-10371] [SQL] Implement subexpr elimination for UnsafeProjections #9480
Closed
Closed
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
f3ab2bc
[SPARK-10371] [SQL] Implement subexpr elimination for UnsafeProjections
nongli 05b59e6
Code review comments from davies.
nongli fa7aa9d
Added a few more test cases.
nongli a9bcdb0
Clean up from CR.
nongli 6cf0186
Add flag to disable.
nongli File filter
Filter by extension
Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
There are no files selected for viewing
106 changes: 106 additions & 0 deletions
106
...lyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,106 @@ | ||
/* | ||
* Licensed to the Apache Software Foundation (ASF) under one or more | ||
* contributor license agreements. See the NOTICE file distributed with | ||
* this work for additional information regarding copyright ownership. | ||
* The ASF licenses this file to You under the Apache License, Version 2.0 | ||
* (the "License"); you may not use this file except in compliance with | ||
* the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
||
package org.apache.spark.sql.catalyst.expressions | ||
|
||
import scala.collection.mutable | ||
|
||
/** | ||
* This class is used to compute equality of (sub)expression trees. Expressions can be added | ||
* to this class and they subsequently query for expression equality. Expression trees are | ||
* considered equal if for the same input(s), the same result is produced. | ||
*/ | ||
class EquivalentExpressions { | ||
/** | ||
* Wrapper around an Expression that provides semantic equality. | ||
*/ | ||
case class Expr(e: Expression) { | ||
val hash = e.semanticHash() | ||
override def equals(o: Any): Boolean = o match { | ||
case other: Expr => e.semanticEquals(other.e) | ||
case _ => false | ||
} | ||
override def hashCode: Int = hash | ||
} | ||
|
||
// For each expression, the set of equivalent expressions. | ||
private val equivalenceMap: mutable.HashMap[Expr, mutable.MutableList[Expression]] = | ||
new mutable.HashMap[Expr, mutable.MutableList[Expression]] | ||
|
||
/** | ||
* Adds each expression to this data structure, grouping them with existing equivalent | ||
* expressions. Non-recursive. | ||
* Returns if there was already a matching expression. | ||
*/ | ||
def addExpr(expr: Expression): Boolean = { | ||
if (expr.deterministic) { | ||
val e: Expr = Expr(expr) | ||
val f = equivalenceMap.get(e) | ||
if (f.isDefined) { | ||
f.get.+= (expr) | ||
true | ||
} else { | ||
equivalenceMap.put(e, mutable.MutableList(expr)) | ||
false | ||
} | ||
} else { | ||
false | ||
} | ||
} | ||
|
||
/** | ||
* Adds the expression to this datastructure recursively. Stops if a matching expression | ||
* is found. That is, if `expr` has already been added, its children are not added. | ||
* If ignoreLeaf is true, leaf nodes are ignored. | ||
*/ | ||
def addExprTree(root: Expression, ignoreLeaf: Boolean = true): Unit = { | ||
val skip = root.isInstanceOf[LeafExpression] && ignoreLeaf | ||
if (!skip && root.deterministic && !addExpr(root)) { | ||
root.children.foreach(addExprTree(_, ignoreLeaf)) | ||
} | ||
} | ||
|
||
/** | ||
* Returns all fo the expression trees that are equivalent to `e`. Returns | ||
* an empty collection if there are none. | ||
*/ | ||
def getEquivalentExprs(e: Expression): Seq[Expression] = { | ||
equivalenceMap.get(Expr(e)).getOrElse(mutable.MutableList()) | ||
} | ||
|
||
/** | ||
* Returns all the equivalent sets of expressions. | ||
*/ | ||
def getAllEquivalentExprs: Seq[Seq[Expression]] = { | ||
equivalenceMap.values.map(_.toSeq).toSeq | ||
} | ||
|
||
/** | ||
* Returns the state of the datastructure as a string. If all is false, skips sets of equivalent | ||
* expressions with cardinality 1. | ||
*/ | ||
def debugString(all: Boolean = false): String = { | ||
val sb: mutable.StringBuilder = new StringBuilder() | ||
sb.append("Equivalent expressions:\n") | ||
equivalenceMap.foreach { case (k, v) => { | ||
if (all || v.length > 1) { | ||
sb.append(" " + v.mkString(", ")).append("\n") | ||
} | ||
}} | ||
sb.toString() | ||
} | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since all of the expressions are the case class, probably we don't need to our own way to
computeHash
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. Looking at thecomments on semanticEquals, we want to ignore cosmetic differences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Identical hash values doesn't mean the identical values, I am suggesting to use the
hashCode
plussemanticEquals
to identity the common expression, that's also the motivation ofsemanticEquals
. SeeAttributeReference
for details.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's what I do. I put the exprs in a hash set using semantic hash/semantic equals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I am still confusing, why we can't use the
hashCode
instead.The motivation of
semanticEquals
is to ignore theAttributeReference.name
in comparison forAttributeReference
, and theAttributeReference.hashCode
also does ignore theAttributeReference.name
, I don't think thecosmetic differences
really exists forhashCode
. Isn't it?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And there is more discussion on the
semanticEquals
can be found at #6587, even, I don't think we need thesemanticEquals
if we changed the implementation ofAttributeReference.equals
, as it does make lots of code complicated like this one, by using thesemanticEquals
, other thanequals
or==
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a quick look at 6587 and I agree with michael about semantic equals. I think when equivalence classes are added, semantic equals is even more different than equals.
That being said, this patch is not the motivation to do this. If we decide to remove semanticEquals, this patch can be updated trivially to use equals.
Regarding hashCode vs semanticHash code, I think it does no? It looks to me like the hash everything, including the cosmetic stuff but please correct me if I'm wrong. In general, i think it makes sense to implement hash if you implement equals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you look at the methods
equals
,semanticEquals
andhashCode
of theAttributeReference
, you will see that they are not matched, asequals
will take consideration of thename
, but the other 2 are not, that's why I am thinking we can also use thehashCode
, instead of adding the new methodsemanticHash
.Anyway, it's not an external API, we can change it back anytime, as 1.6 is almost code freeze, and this is critical for people now.