Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-22603][SQL] Fix 64KB JVM bytecode limit problem with FormatString #19817

Closed
wants to merge 2 commits into from

Conversation

kiszk
Copy link
Member

@kiszk kiszk commented Nov 24, 2017

What changes were proposed in this pull request?

This PR changes FormatString code generation to place generated code for expressions for arguments into separated methods if these size could be large.
This PR passes variable arguments by using an Object array.

How was this patch tested?

Added new test cases into StringExpressionSuite

val argList = ctx.freshName("argLists")
val numArgLists = argListGen.length
val argListCode = argListGen.zipWithIndex.map { case(v, index) =>
v._2.code + s"\n$argList[$index] = " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be possible to use the syntax:

 s"""
  ...
  """

as it is done in many other places? I think it would be more readable. What do you think?

@SparkQA
Copy link

SparkQA commented Nov 24, 2017

Test build #84173 has finished for PR 19817 at commit e5bcbc4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 25, 2017

Test build #84180 has finished for PR 19817 at commit 84c8bf4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Copy link
Contributor

LGTM

arguments = ("InternalRow", ctx.INPUT_ROW) :: ("Object[]", argList) :: Nil)
} else {
argListCode.mkString("\n")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you create a splitExpressions in CodegenContext for avoiding the duplicate codes, like

    if (ctx.INPUT_ROW != null && ctx.currentVars == null) {
      ctx.splitExpressions(...)
    } else {
      inputs.mkString("\n")
    }

Copy link
Member Author

@kiszk kiszk Nov 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, is it better to do this in this PR? Or, should I create another PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to submit a PR for that, but I was waiting for all the PRs related (this one should be the last one) to be merged. Let me know if I can do that or you are doing it. My suggestion would be: can't we just change the existing method to include this check?

Copy link
Member Author

@kiszk kiszk Nov 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your comment. After writing the above comment, I thought it would be good to create a separate PR.
Thus, I have just submitted a PR #19821 as WIP before I read your comment. That PR should wait for merging this PR and then I will do rebase for #19821.

@cloud-fan
Copy link
Contributor

LGTM, merging to master/2.2!

asfgit pushed a commit that referenced this pull request Nov 27, 2017
## What changes were proposed in this pull request?

This PR changes `FormatString` code generation to place generated code for expressions for arguments into separated methods if these size could be large.
This PR passes variable arguments by using an `Object` array.

## How was this patch tested?

Added new test cases into `StringExpressionSuite`

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes #19817 from kiszk/SPARK-22603.

(cherry picked from commit 2dbe275)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@asfgit asfgit closed this in 2dbe275 Nov 27, 2017
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
## What changes were proposed in this pull request?

This PR changes `FormatString` code generation to place generated code for expressions for arguments into separated methods if these size could be large.
This PR passes variable arguments by using an `Object` array.

## How was this patch tested?

Added new test cases into `StringExpressionSuite`

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes apache#19817 from kiszk/SPARK-22603.

(cherry picked from commit 2dbe275)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants