[SPARK-22508][SQL] Fix 64KB JVM bytecode limit problem with GenerateUnsafeRowJoiner.create() #19737

Closed
wants to merge 3 commits into from


Member

@kiszk kiszk commented Nov 13, 2017

What changes were proposed in this pull request?

This PR changes the GenerateUnsafeRowJoiner.create() code generation so that generated statements operating on the bitmap and offsets are placed into separate methods when their size could be large.

How was this patch tested?

Added a new test case to GenerateUnsafeRowJoinerSuite
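The change above can be pictured with a small sketch: generated statements are packed into blocks under a size budget, and each block becomes its own private method that the caller invokes, so no single method grows toward the JVM's 64KB bytecode limit. The helper name `splitIntoMethods`, the 1024-character default budget (a rough proxy for bytecode size), and the keep-it-inline-when-it-fits rule are assumptions loosely modelled on the idea behind Spark's `CodegenContext.splitExpressions`; this is not the code from this PR.

```scala
import scala.collection.mutable

object SplitSketch {
  /** Hypothetical helper (not Spark's implementation).
    * Returns (helper method definitions, code the caller should emit). */
  def splitIntoMethods(
      statements: Seq[String],
      funcPrefix: String,
      params: Seq[(String, String)],
      maxChars: Int = 1024): (Seq[String], String) = {
    // Greedily pack statements into blocks of at most maxChars characters;
    // a single oversized statement still gets a block of its own.
    val blocks = mutable.ArrayBuffer.empty[String]
    val current = new StringBuilder
    for (stmt <- statements) {
      if (current.nonEmpty && current.length + stmt.length > maxChars) {
        blocks += current.toString
        current.clear()
      }
      current.append(stmt)
    }
    if (current.nonEmpty) blocks += current.toString

    if (blocks.length <= 1) {
      // Small enough: keep the code inline, no extra methods needed.
      (Seq.empty, blocks.headOption.getOrElse(""))
    } else {
      val paramDecl = params.map { case (tpe, name) => s"$tpe $name" }.mkString(", ")
      val argNames = params.map(_._2).mkString(", ")
      // One private method per block, plus one call per method for the caller.
      val functions = blocks.zipWithIndex.map { case (body, i) =>
        s"private void ${funcPrefix}_$i($paramDecl) {\n$body}\n"
      }.toList
      val calls = blocks.indices.map(i => s"${funcPrefix}_$i($argNames);\n").mkString
      (functions, calls)
    }
  }
}
```

For example, 100 `Platform.putLong(...)` statement strings with `maxChars = 200` yield several `copyBitsetFunc_N(java.lang.Object obj1, long offset1)` methods plus a sequence of calls such as `copyBitsetFunc_0(obj1, offset1);`, while two short statements are simply returned inline.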

@SparkQA

SparkQA commented Nov 13, 2017

Test build #83801 has finished for PR 19737 at commit aef727b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


val functions = mutable.ArrayBuffer.empty[String]
val args = "java.lang.Object obj1, long offset1, java.lang.Object obj2, long offset2"
val copyBitsets = splitPlatformCode(copyBitset).zipWithIndex.map { case(body, index) =>
Contributor

why can't we use splitExpressions here?

Member Author

We do not have a ctx (codegen context) here.

Contributor

Then create one. The benefits are code reuse and the nested class optimization.

Member Author

@kiszk kiszk Nov 20, 2017

Good idea; I forgot that we can create one. I do not want to reinvent the wheel :)

@@ -166,6 +214,8 @@ object GenerateUnsafeRowJoiner extends CodeGenerator[(StructType, StructType), U
| private byte[] buf = new byte[64];
| private UnsafeRow out = new UnsafeRow(${schema1.size + schema2.size});
|
| ${functions.mkString("\n")}
Contributor

@cloud-fan cloud-fan Nov 20, 2017

this can be ${ctx.declareAddedFunctions()} if we create a codegen context.
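For context on the suggestion: Spark's `CodegenContext` collects functions registered through `addNewFunction`, and `declareAddedFunctions()` emits them into the generated class, which makes a local `functions` buffer unnecessary. The sketch below is a hypothetical, much-simplified model of that bookkeeping, including the nested class idea mentioned earlier in this thread: once the outer class exceeds a size budget, new functions go into a nested class and are called through an instance field. The class name, the character-count budget, and the `NestedClassN`/`nestedClassInstanceN` naming are assumptions, not Spark's implementation.

```scala
import scala.collection.mutable

/** Hypothetical sketch: tracks added functions and spills them into nested
  * classes once the outer class's function source exceeds the budget. */
class FunctionRegistrySketch(classSizeBudget: Int) {
  private val outerFunctions = mutable.ArrayBuffer.empty[String]
  private var outerSize = 0
  // One buffer of function sources per nested class.
  private val nestedClasses = mutable.ArrayBuffer.empty[mutable.ArrayBuffer[String]]
  private var lastNestedSize = 0

  /** Registers `code` defining method `name`; returns the expression a caller
    * in the outer class should use to invoke it. */
  def addNewFunction(name: String, code: String): String = {
    if (outerSize + code.length <= classSizeBudget) {
      outerFunctions += code
      outerSize += code.length
      name
    } else {
      // Start a new nested class if none exists or the current one is full.
      if (nestedClasses.isEmpty || lastNestedSize + code.length > classSizeBudget) {
        nestedClasses += mutable.ArrayBuffer.empty[String]
        lastNestedSize = 0
      }
      nestedClasses.last += code
      lastNestedSize += code.length
      s"nestedClassInstance${nestedClasses.size - 1}.$name"
    }
  }

  /** Source to splice into the generated class body in place of a
    * hand-maintained `functions.mkString("\n")`. */
  def declareAddedFunctions(): String = {
    val nested = nestedClasses.zipWithIndex.map { case (fns, i) =>
      s"private NestedClass$i nestedClassInstance$i = new NestedClass$i();\n" +
        s"private class NestedClass$i {\n${fns.mkString("\n")}\n}"
    }
    (outerFunctions ++ nested).mkString("\n")
  }
}
```

With a 50-character budget, a first 32-character function stays in the outer class and is called as `f0`, while a second one lands in `NestedClass0` and is called as `nestedClassInstance0.f1`.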

@SparkQA

SparkQA commented Nov 20, 2017

Test build #84031 has finished for PR 19737 at commit 65885bf.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member Author

kiszk commented Nov 20, 2017

Jenkins, retest this please

s"$putLong(buf, ${offset + i * 8}, $bits);\n"
}

val functions = mutable.ArrayBuffer.empty[String]
Contributor

not needed?
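The `putLong(buf, ${offset + i * 8}, $bits)` line in the excerpt above writes the joined row's null bitset one 64-bit word at a time. As a plain-Scala model of what those words should contain (an illustration under assumptions, not the generated Java), the sketch below concatenates a bitset of `n1` bits with one of `n2` bits so that row 2's bits start immediately after row 1's. It assumes, as with UnsafeRow, that a bitset for `n` fields occupies `(n + 63) / 64` words and that unused high bits of each input's last word are zero; `BitsetMergeSketch` is a hypothetical name.

```scala
object BitsetMergeSketch {
  /** 64-bit words needed for n bits, i.e. ceil(n / 64). */
  def words(n: Int): Int = (n + 63) / 64

  // Word i of bitset b, or 0 if i is out of range.
  private def word(b: Array[Long], i: Int): Long =
    if (i >= 0 && i < b.length) b(i) else 0L

  /** Concatenates bitset1 (n1 bits) and bitset2 (n2 bits): bit j of bitset2
    * becomes bit n1 + j of the result. Assumes unused high bits of each
    * input's last word are zero. */
  def merge(b1: Array[Long], n1: Int, b2: Array[Long], n2: Int): Array[Long] = {
    val fullWords = n1 / 64 // whole words contributed by bitset1
    val shift = n1 % 64     // bit position where bitset2 starts within a word
    Array.tabulate(words(n1 + n2)) { i =>
      val fromRow1 = word(b1, i)
      val fromRow2Low = word(b2, i - fullWords) << shift
      // Bits carried over from the previous bitset2 word; skipped when
      // shift == 0 because a JVM shift by 64 is a no-op, not a clear.
      val fromRow2High =
        if (shift == 0) 0L else word(b2, i - fullWords - 1) >>> (64 - shift)
      fromRow1 | fromRow2Low | fromRow2High
    }
  }
}
```

For instance, merging the 3-bit bitset `5L` (bits 0 and 2) with a 64-bit bitset whose bits 0 and 63 are set gives `Array(13L, 4L)`: row 2's bit 0 lands at bit 3, and its bit 63 spills into bit 2 of the second word.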

}

val functions = mutable.ArrayBuffer.empty[String]
val args = "java.lang.Object obj1, long offset1, java.lang.Object obj2, long offset2"
Contributor

ditto

@@ -154,7 +164,10 @@ object GenerateUnsafeRowJoiner extends CodeGenerator[(StructType, StructType), U
|$putLong(buf, $cursor, $getLong(buf, $cursor) + ($shift << 32));
""".stripMargin
Contributor

nit: s"$putLong(buf, $cursor, $getLong(buf, $cursor) + ($shift << 32));\n"
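The `+ ($shift << 32)` in this statement relies on UnsafeRow's layout for variable-length fields: the field's 8-byte slot holds a long whose upper 32 bits are the data offset from the start of the row and whose lower 32 bits are the size in bytes. Adding `shift << 32` therefore moves the offset by `shift` bytes without touching the size, which is presumably why the joiner uses it when variable-length data ends up at a new position in the joined row. The helper names below are hypothetical; the sketch only models the arithmetic.

```scala
object OffsetAndSizeSketch {
  // Packs an offset (upper 32 bits) and a size in bytes (lower 32 bits)
  // into one long, as in an UnsafeRow variable-length field slot.
  def pack(offset: Int, size: Int): Long =
    (offset.toLong << 32) | (size.toLong & 0xFFFFFFFFL)

  def offsetOf(offsetAndSize: Long): Int = (offsetAndSize >>> 32).toInt
  def sizeOf(offsetAndSize: Long): Int = offsetAndSize.toInt

  // Models `getLong(...) + (shift << 32)`: moves the offset by `shift`
  // bytes and leaves the size in the lower 32 bits unchanged.
  def relocate(offsetAndSize: Long, shift: Long): Long =
    offsetAndSize + (shift << 32)
}
```

For example, `relocate(pack(24, 10), 16)` decodes to offset 40 and size 10.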

@SparkQA

SparkQA commented Nov 20, 2017

Test build #84037 has finished for PR 19737 at commit 65885bf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val args = "java.lang.Object obj1, long offset1, java.lang.Object obj2, long offset2"
val copyBitsets = ctx.splitExpressions(copyBitset, "copyBitsetFunc",
("java.lang.Object", "obj1") :: ("long", "offset1") ::
("java.lang.Object", "obj2") :: ("long", "offset2") :: Nil)
Member

@gatorsmile gatorsmile Nov 21, 2017

Could you use named arguments when calling splitExpressions, as I did in #19790?

}

val updateOffsets = ctx.splitExpressions(updateOffset, "copyBitsetFunc",
("long", "numBytesVariableRow1") :: Nil)
Member

The same here.
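Named arguments, as requested above, label each argument at the call site and let the compiler check the names, which helps when several arguments share a type such as `String`. The toy function below is hypothetical: its parameter names are chosen to mirror the `splitExpressions` calls in the excerpts, but it is not Spark's API. It only shows that a named call is equivalent to the positional one.

```scala
object NamedArgsSketch {
  // Hypothetical stand-in whose parameters are shaped like the
  // splitExpressions calls in the excerpts; not Spark's real API.
  def splitExpressions(
      expressions: Seq[String],
      funcName: String,
      arguments: Seq[(String, String)],
      returnType: String = "void"): String = {
    val params = arguments.map { case (tpe, name) => s"$tpe $name" }.mkString(", ")
    s"private $returnType $funcName($params) // ${expressions.size} statement(s)"
  }
}

object NamedArgsDemo {
  val argList = ("java.lang.Object", "obj1") :: ("long", "offset1") :: Nil

  // Positional call: the reader must know the parameter order.
  val positional = NamedArgsSketch.splitExpressions(Seq("a();", "b();"), "copyBitsetFunc", argList)

  // Named call: each argument's role is visible at the call site.
  val named = NamedArgsSketch.splitExpressions(
    expressions = Seq("a();", "b();"),
    funcName = "copyBitsetFunc",
    arguments = argList)
}
```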

@cloud-fan
Contributor

LGTM pending jenkins

@SparkQA

SparkQA commented Nov 21, 2017

Test build #84054 has finished for PR 19737 at commit 0704a82.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Nov 21, 2017

Test build #84061 has finished for PR 19737 at commit 0704a82.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master/2.2!

asfgit pushed a commit that referenced this pull request Nov 21, 2017
…nsafeRowJoiner.create()

## What changes were proposed in this pull request?

This PR changes the `GenerateUnsafeRowJoiner.create()` code generation so that generated statements operating on the bitmap and offsets are placed into separate methods when their size could be large.

## How was this patch tested?

Added a new test case to `GenerateUnsafeRowJoinerSuite`

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes #19737 from kiszk/SPARK-22508.

(cherry picked from commit c957714)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@asfgit asfgit closed this in c957714 Nov 21, 2017
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
…nsafeRowJoiner.create()

4 participants