[SPARK-15825][SQL] Fix SMJ invalid results #13589
Test build #60268 has finished for PR 13589 at commit
cc @davies mystery of INPUT_ROW...
@@ -490,6 +490,7 @@ class CodegenContext {
        addNewFunction(compareFunc, funcCode)
        s"this.$compareFunc($c1, $c2)"
      case schema: StructType =>
        INPUT_ROW = "i"
        val comparisons = GenerateOrdering.genComparisons(this, schema)
Would it be better to use `InternalRow $INPUT_ROW = null;` in the assignment to `funcCode`, to show the intention of this change clearly?
@kiszk `i` is set to `null` here: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L498
I'd rather not change this to `INPUT_ROW`, since this could potentially make things even more confusing.
Still getting seg faults on Power and on Intel with OpenJDK with this change when we use spark-submit with two worker instances.
Git diff: --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
The segv occurs four times: `cat work/app-20160610093117-0000/*/stderr | grep -C 2 "Segmentation error"`
spark-env.sh has
As Adam says, I still get the segv with OpenJDK on Linux amd64 running our app. This fix does appear to fix the issue reported in https://issues.apache.org/jira/browse/SPARK-15825
Test build #60302 has finished for PR 13589 at commit
This line requires that INPUT_ROW should be
Merging this into master and 2.0, thanks!
## What changes were proposed in this pull request?

Code generated `SortMergeJoin` failed with wrong results when using structs as keys. This could (eventually) be traced back to the use of a wrong row reference when comparing structs.

## How was this patch tested?

TBD

Author: Herman van Hovell <hvanhovell@databricks.com>

Closes #13589 from hvanhovell/SPARK-15822.

(cherry picked from commit e05a2fe)
Signed-off-by: Davies Liu <davies.liu@gmail.com>
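To make the "wrong row reference" failure mode concrete, here is a minimal, self-contained sketch of how a mutable row-variable name in a code generation context can leak into a generated struct comparator. It is not Spark's implementation: `MiniCodegenContext`, `StructCompareSketch`, `genFieldRead`, `genStructCompare`, the `resetInputRow` flag, and the stale name `smj_leftRow` are all hypothetical names chosen for illustration, and the emitted Java text is only built as a string, never compiled.

```scala
// Hypothetical, simplified stand-in for a codegen context (NOT Spark's CodegenContext).
final class MiniCodegenContext {
  // Name of the Java variable that generated field accessors read from.
  var inputRow: String = "i"
}

object StructCompareSketch {
  // Emits Java source that reads an int field from whatever row variable
  // the context currently names.
  def genFieldRead(ctx: MiniCodegenContext, ordinal: Int): String =
    s"${ctx.inputRow}.getInt($ordinal)"

  // Emits a Java comparator for a struct with `numFields` int fields.
  // The emitted body declares a local `i` and rebinds it to `a` and `b`,
  // so every field read is only correct if it goes through `i`.
  def genStructCompare(
      ctx: MiniCodegenContext,
      numFields: Int,
      resetInputRow: Boolean): String = {
    // Assumed analogue of the one-line change in the diff: pin the row
    // variable name before generating the per-field comparisons.
    if (resetInputRow) ctx.inputRow = "i"
    val comparisons = (0 until numFields).map { ord =>
      val read = genFieldRead(ctx, ord)
      s"""|  i = a; int l$ord = $read;
          |  i = b; int r$ord = $read;
          |  if (l$ord != r$ord) return l$ord < r$ord ? -1 : 1;""".stripMargin
    }.mkString("\n")
    s"""|public int compareStruct(InternalRow a, InternalRow b) {
        |  InternalRow i = null;
        |$comparisons
        |  return 0;
        |}""".stripMargin
  }
}
```

If an enclosing operator has left `inputRow` set to some other name (say `smj_leftRow`), calling `genStructCompare(ctx, 1, resetInputRow = false)` emits reads such as `smj_leftRow.getInt(0)` on both sides of the comparison, so the comparator never looks at `a` or `b`. With `resetInputRow = true` the reads go through `i`, which the emitted body rebinds to each argument in turn.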