Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-22591][SQL] GenerateOrdering shouldn't change CodegenContext.INPUT_ROW #19800

Closed
wants to merge 3 commits into from

Conversation

viirya
Copy link
Member

@viirya viirya commented Nov 23, 2017

What changes were proposed in this pull request?

When I played with codegen in developing another PR, I found the value of CodegenContext.INPUT_ROW is not reliable. Under wholestage codegen, it is assigned to null first and then suddenly changed to i.

The reason is GenerateOrdering changes CodegenContext.INPUT_ROW but doesn't restore it back.

How was this patch tested?

Added test.

@viirya
Copy link
Member Author

viirya commented Nov 23, 2017

cc @cloud-fan @kiszk

@@ -156,4 +156,13 @@ class OrderingSuite extends SparkFunSuite with ExpressionEvalHelper {
assert(genOrdering.compare(rowB1, rowB2) < 0)
}
}

test("SPARK-22591: GenerateOrdering shouldn't change ctx.INPUT_ROW") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it cause any bugs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm working to support wholestage codegen when generating expression codes safe from 64k limit. When there is not INPUT_ROW in context but we wrongly set a INPUT_ROW value, a non-existing InternalRow i will be added into function parameters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we use INPUT_ROW a lot in codegen framework, it is risky to change its value without restoring it.

@@ -72,6 +72,7 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR
* Generates the code for ordering based on the given order.
*/
def genComparisons(ctx: CodegenContext, ordering: Seq[SortOrder]): String = {
val oldInputRow = ctx.INPUT_ROW
val comparisons = ordering.map { order =>
val oldCurrentVars = ctx.currentVars
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems we can also move this our of the loop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Copy link
Contributor

@cloud-fan cloud-fan Nov 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about

val oldInputRow = ...
val oldCurrentVars = ...
val inputRow = "i"
ctx.INPUT_ROW = inputRow
ctx.currentVars = null

val comparisons = ...
...
ctx.INPUT_ROW = oldInputRow
ctx.currentVars = oldCurrentVars
...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Looks good.

@SparkQA
Copy link

SparkQA commented Nov 23, 2017

Test build #84128 has finished for PR 19800 at commit 121d309.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Nov 23, 2017

This test org.apache.spark.sql.catalyst.expressions.CastSuite.SPARK-22500: cast for struct should not generate codes beyond 64KB seems flaky? I saw it fails with java.lang.OutOfMemoryError: GC overhead limit exceeded in other PR too.

@SparkQA
Copy link

SparkQA commented Nov 23, 2017

Test build #84131 has finished for PR 19800 at commit 27145ae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// Restore original currentVars and INPUT_ROW.
ctx.currentVars = oldCurrentVars
ctx.INPUT_ROW = oldInputRow
finalCode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctx.currentVars = oldCurrentVars
ctx.INPUT_ROW = oldInputRow
s"""
  |InternalRow $inputRow = null;
  |$code
 """.stripMargin

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks.

@cloud-fan
Copy link
Contributor

I created https://issues.apache.org/jira/browse/SPARK-22595 to track the flaky test

@SparkQA
Copy link

SparkQA commented Nov 24, 2017

Test build #84144 has finished for PR 19800 at commit 7462a55.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master/2.2!

asfgit pushed a commit that referenced this pull request Nov 24, 2017
…NPUT_ROW

## What changes were proposed in this pull request?

When I played with codegen in developing another PR, I found the value of `CodegenContext.INPUT_ROW` is not reliable. Under wholestage codegen, it is assigned to null first and then suddenly changed to `i`.

The reason is `GenerateOrdering` changes `CodegenContext.INPUT_ROW` but doesn't restore it back.

## How was this patch tested?

Added test.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #19800 from viirya/SPARK-22591.

(cherry picked from commit 62a826f)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@asfgit asfgit closed this in 62a826f Nov 24, 2017
asfgit pushed a commit that referenced this pull request Nov 24, 2017
…ruct should not generate codes beyond 64KB

This PR reduces the number of fields in the test case of `CastSuite` to fix an issue that is pointed at [here](#19800 (comment)).

```
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at org.codehaus.janino.UnitCompiler.findClass(UnitCompiler.java:10971)
	at org.codehaus.janino.UnitCompiler.findTypeByName(UnitCompiler.java:7607)
	at org.codehaus.janino.UnitCompiler.getReferenceType(UnitCompiler.java:5758)
	at org.codehaus.janino.UnitCompiler.getType2(UnitCompiler.java:5732)
	at org.codehaus.janino.UnitCompiler.access$13200(UnitCompiler.java:206)
	at org.codehaus.janino.UnitCompiler$18.visitReferenceType(UnitCompiler.java:5668)
	at org.codehaus.janino.UnitCompiler$18.visitReferenceType(UnitCompiler.java:5660)
	at org.codehaus.janino.Java$ReferenceType.accept(Java.java:3356)
	at org.codehaus.janino.UnitCompiler.getType(UnitCompiler.java:5660)
	at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:2892)
	at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2764)
	at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1262)
	at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1234)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:538)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:890)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:894)
	at org.codehaus.janino.UnitCompiler.access$600(UnitCompiler.java:206)
	at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:377)
	at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:369)
	at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1128)
	at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:369)
	at org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:1209)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:564)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:890)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:894)
	at org.codehaus.janino.UnitCompiler.access$600(UnitCompiler.java:206)
	at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:377)
	at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:369)
	at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1128)
	at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:369)
	at org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:1209)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:564)
...
```

Used existing test case

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes #19806 from kiszk/SPARK-22595.

(cherry picked from commit 554adc7)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
asfgit pushed a commit that referenced this pull request Nov 24, 2017
…ruct should not generate codes beyond 64KB

## What changes were proposed in this pull request?

This PR reduces the number of fields in the test case of `CastSuite` to fix an issue that is pointed at [here](#19800 (comment)).

```
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at org.codehaus.janino.UnitCompiler.findClass(UnitCompiler.java:10971)
	at org.codehaus.janino.UnitCompiler.findTypeByName(UnitCompiler.java:7607)
	at org.codehaus.janino.UnitCompiler.getReferenceType(UnitCompiler.java:5758)
	at org.codehaus.janino.UnitCompiler.getType2(UnitCompiler.java:5732)
	at org.codehaus.janino.UnitCompiler.access$13200(UnitCompiler.java:206)
	at org.codehaus.janino.UnitCompiler$18.visitReferenceType(UnitCompiler.java:5668)
	at org.codehaus.janino.UnitCompiler$18.visitReferenceType(UnitCompiler.java:5660)
	at org.codehaus.janino.Java$ReferenceType.accept(Java.java:3356)
	at org.codehaus.janino.UnitCompiler.getType(UnitCompiler.java:5660)
	at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:2892)
	at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2764)
	at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1262)
	at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1234)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:538)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:890)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:894)
	at org.codehaus.janino.UnitCompiler.access$600(UnitCompiler.java:206)
	at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:377)
	at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:369)
	at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1128)
	at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:369)
	at org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:1209)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:564)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:890)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:894)
	at org.codehaus.janino.UnitCompiler.access$600(UnitCompiler.java:206)
	at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:377)
	at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:369)
	at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1128)
	at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:369)
	at org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:1209)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:564)
...
```

## How was this patch tested?

Used existing test case

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes #19806 from kiszk/SPARK-22595.
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
…NPUT_ROW

## What changes were proposed in this pull request?

When I played with codegen in developing another PR, I found the value of `CodegenContext.INPUT_ROW` is not reliable. Under wholestage codegen, it is assigned to null first and then suddenly changed to `i`.

The reason is `GenerateOrdering` changes `CodegenContext.INPUT_ROW` but doesn't restore it back.

## How was this patch tested?

Added test.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes apache#19800 from viirya/SPARK-22591.

(cherry picked from commit 62a826f)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
…ruct should not generate codes beyond 64KB

This PR reduces the number of fields in the test case of `CastSuite` to fix an issue that is pointed at [here](apache#19800 (comment)).

```
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at org.codehaus.janino.UnitCompiler.findClass(UnitCompiler.java:10971)
	at org.codehaus.janino.UnitCompiler.findTypeByName(UnitCompiler.java:7607)
	at org.codehaus.janino.UnitCompiler.getReferenceType(UnitCompiler.java:5758)
	at org.codehaus.janino.UnitCompiler.getType2(UnitCompiler.java:5732)
	at org.codehaus.janino.UnitCompiler.access$13200(UnitCompiler.java:206)
	at org.codehaus.janino.UnitCompiler$18.visitReferenceType(UnitCompiler.java:5668)
	at org.codehaus.janino.UnitCompiler$18.visitReferenceType(UnitCompiler.java:5660)
	at org.codehaus.janino.Java$ReferenceType.accept(Java.java:3356)
	at org.codehaus.janino.UnitCompiler.getType(UnitCompiler.java:5660)
	at org.codehaus.janino.UnitCompiler.buildLocalVariableMap(UnitCompiler.java:2892)
	at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2764)
	at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1262)
	at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1234)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:538)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:890)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:894)
	at org.codehaus.janino.UnitCompiler.access$600(UnitCompiler.java:206)
	at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:377)
	at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:369)
	at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1128)
	at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:369)
	at org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:1209)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:564)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:890)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:894)
	at org.codehaus.janino.UnitCompiler.access$600(UnitCompiler.java:206)
	at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:377)
	at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:369)
	at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1128)
	at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:369)
	at org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:1209)
	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:564)
...
```

Used existing test case

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes apache#19806 from kiszk/SPARK-22595.

(cherry picked from commit 554adc7)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@viirya viirya deleted the SPARK-22591 branch December 27, 2023 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants