[SPARK-34607][SQL][2.4] Add Utils.isMemberClass to fix a malformed class name error on jdk8u #31747
It seems the test failed and I'll check it.
Kubernetes integration test starting
Kubernetes integration test status failure
See #31745 (comment) for the failure reason.
Test build #135782 has finished for PR 31747 at commit
Could you make a PR to branch-2.4, @rednaxelafx? Although we know that it will fail, we can merge this PR and the original PR together.
Let me briefly describe the situation in a new comment in my original PR: #31709 (comment) We definitely still want the isMemberClass fix in @maropu san's PR, since it avoids a Java
I made a relevant test suite fix for this.
Should this not go into master + 3.x as well? Or is it already fixed there?
Jenkins retest this please
Is Jenkins unstable?
retest this please
Kubernetes integration test starting
Kubernetes integration test status success
I suppose this should be in 2.4.8, right? @maropu
retest this please
There is a test error in the GA:
[info] Cause: java.lang.StringIndexOutOfBoundsException: String index out of range: -83
[info]   at java.lang.String.substring(String.java:1931)
[info]   at java.lang.Class.getSimpleBinaryName(Class.java:1448)
[info]   at java.lang.Class.getSimpleName(Class.java:1309)
[info]   at org.apache.spark.sql.catalyst.expressions.objects.NewInstance$$anonfun$11.apply(objects.scala:490)
...
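The two innermost frames of this trace, `String.substring` called from `Class.getSimpleBinaryName`, can be reproduced with plain strings. The following is a minimal sketch, not JDK or Spark code: `stripEnclosing` is a hypothetical stand-in for the stripping step, and the two names are copied from the class names listed in this PR's description (nesting levels 11 and 12).

```java
final class SubstringOverflowDemo {
    // Shared prefix of the shortened class names quoted in the PR description.
    static final String PREFIX =
        "org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite"
            + "$OuterLevelWithVeryVeryVeryLongClassNam$$$$";
    static final String LEVEL = "OuterLevelWithVeryVeryVeryLongClassName";

    // Name of nesting level 11 (the enclosing class of level 12).
    static final String ENCLOSING = PREFIX
        + "2a1321b953c615695d7442b2adb1$$$$ryVeryLongClassName8$"
        + LEVEL + "9$" + LEVEL + "10$" + LEVEL + "11$";

    // Name of nesting level 12, which Scala re-shortened with a new hash.
    static final String NESTED = PREFIX
        + "85f068777e7ecf112afcbe997d461b$$$$VeryLongClassName11$"
        + LEVEL + "12$";

    // Hypothetical stand-in: drop the enclosing name from the front of the nested name.
    static String stripEnclosing(String name, String enclosingName) {
        return name.substring(enclosingName.length());
    }

    public static void main(String[] args) {
        // The nested name is shorter than its enclosing name, so this is negative.
        System.out.println(NESTED.length() - ENCLOSING.length());
        try {
            stripEnclosing(NESTED, ENCLOSING);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("out of bounds: " + e.getMessage());
        }
    }
}
```

Because the enclosing name is the longer of the two, the start index passed to `substring` exceeds the string length, which is the condition the trace reports.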
Yea, I think so. I'll check the test failure.
I noticed that spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala Line 271 in 3c627ad
I think we have two options;
I feel there is not much difference between them for users though, the second one looks better cuz
@@ -487,7 +487,7 @@ case class NewInstance(
     ev.isNull = resultIsNull

     val constructorCall = outer.map { gen =>
-      s"${gen.value}.new ${cls.getSimpleName}($argString)"
+      s"${gen.value}.new ${Utils.getSimpleName(cls)}($argString)"
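The changed line swaps a direct `getSimpleName` call for a guarded helper. As a rough illustration of that idea only, here is a Java sketch: `safeSimpleName` and `fallbackSimpleName` are hypothetical names, and the fallback (stripping just the package) is an assumption made for this example, not a description of what Spark's `Utils.getSimpleName` returns.

```java
final class SimpleNameGuard {
    // Assumed fallback: keep everything after the last '.', so nested
    // classes keep their '$' separators (e.g. "Outer$Inner").
    static String fallbackSimpleName(String binaryName) {
        int lastDot = binaryName.lastIndexOf('.');
        return binaryName.substring(lastDot + 1);
    }

    // Try the JDK method first; on jdk8u it can throw InternalError
    // ("Malformed class name") for the shortened Scala names.
    static String safeSimpleName(Class<?> cls) {
        try {
            return cls.getSimpleName();
        } catch (InternalError e) {
            return fallbackSimpleName(cls.getName());
        }
    }

    public static void main(String[] args) {
        System.out.println(safeSimpleName(String.class));
        System.out.println(fallbackSimpleName("org.example.Outer$Inner"));
    }
}
```

The fallback value is not a true simple name, so whether it is acceptable depends on where the result is used.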
I temporarily applied this fix (#31709) to check whether the tests can pass.
SQL tests all seem to have passed in GA? I only saw an unrelated R install error. Triggered GA again.
retest this please
Yea, in the latest commit, I modified the test itself and applied the #31709 fix so that the GA tests can pass. The modified test (
Hmm, so even with this and #31709, there is still a compilation error? Do we know what the compilation error is?
Yea, right. All the branches have the same compilation error if branch-2.4 has #31709 and this PR. The only difference is that master/branch-3.1/branch-3.2 can fall back to the interpreted one, but branch-2.4 does not.
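The branch difference described here can be pictured as a try-compiled-then-interpreted pattern. This is a generic sketch under assumptions, not Spark's API: `createWithFallback` and both suppliers are hypothetical, and a `RuntimeException` stands in for whatever error the code generator raises.

```java
import java.util.function.Supplier;

final class CodegenFallbackSketch {
    // Prefer the compiled (generated-code) path; if it fails, use the
    // interpreted path instead. Without such a fallback the error simply
    // propagates to the caller.
    static <T> T createWithFallback(Supplier<T> compiled, Supplier<T> interpreted) {
        try {
            return compiled.get();
        } catch (RuntimeException e) {
            return interpreted.get();
        }
    }

    public static void main(String[] args) {
        String chosen = createWithFallback(
            () -> { throw new IllegalStateException("simulated compilation error"); },
            () -> "interpreted");
        System.out.println(chosen); // prints "interpreted"
    }
}
```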
Okay, then a compilation error instead of mystical
I'm fine with having this and #31709 in branch-2.4.
Sure.
bd7dbf0 to a9c3188
done
retest this please
Kubernetes integration test starting
Kubernetes integration test status success
Test build #136484 has finished for PR 31747 at commit
lgtm
Note that this is a backport PR. I will merge this to branch-2.4 tomorrow if there are no more comments.
Thanks all. Merging to branch-2.4.
…class name error on jdk8u

### What changes were proposed in this pull request?

This PR fixes a bug in `objects.NewInstance` that occurs when a user runs Spark on jdk8u and the given `cls` in `NewInstance` is a deeply nested inner class, e.g.:

```
object OuterLevelWithVeryVeryVeryLongClassName1 {
  object OuterLevelWithVeryVeryVeryLongClassName2 {
    object OuterLevelWithVeryVeryVeryLongClassName3 {
      object OuterLevelWithVeryVeryVeryLongClassName4 {
        object OuterLevelWithVeryVeryVeryLongClassName5 {
          object OuterLevelWithVeryVeryVeryLongClassName6 {
            object OuterLevelWithVeryVeryVeryLongClassName7 {
              object OuterLevelWithVeryVeryVeryLongClassName8 {
                object OuterLevelWithVeryVeryVeryLongClassName9 {
                  object OuterLevelWithVeryVeryVeryLongClassName10 {
                    object OuterLevelWithVeryVeryVeryLongClassName11 {
                      object OuterLevelWithVeryVeryVeryLongClassName12 {
                        object OuterLevelWithVeryVeryVeryLongClassName13 {
                          object OuterLevelWithVeryVeryVeryLongClassName14 {
                            object OuterLevelWithVeryVeryVeryLongClassName15 {
                              object OuterLevelWithVeryVeryVeryLongClassName16 {
                                object OuterLevelWithVeryVeryVeryLongClassName17 {
                                  object OuterLevelWithVeryVeryVeryLongClassName18 {
                                    object OuterLevelWithVeryVeryVeryLongClassName19 {
                                      object OuterLevelWithVeryVeryVeryLongClassName20 {
                                        case class MalformedNameExample2(x: Int)
}}}}}}}}}}}}}}}}}}}}
```

The root cause, investigated by Kris (rednaxelafx), is as follows (kudos to Kris):

The test case above is so convoluted because of the way Scala generates the class name for nested classes. In general, Scala generates a class name for a nested class by inserting the dollar-sign ( `$` ) in between each level of class nesting. The problem is that this format can concatenate into a very long string that goes beyond certain limits, so Scala changes the class name format beyond a certain length threshold.

For the example above, we can see that the first two levels of class nesting have class names that look like this:

```
org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassName1$
org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassName1$OuterLevelWithVeryVeryVeryLongClassName2$
```

If we leave out the fact that Scala uses a dollar-sign ( `$` ) suffix for the class name of the companion object, `OuterLevelWithVeryVeryVeryLongClassName1`'s full name is a prefix (substring) of `OuterLevelWithVeryVeryVeryLongClassName2`'s.

But if we keep going deeper into the levels of nesting, you'll find names that look like:

```
org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$2a1321b953c615695d7442b2adb1$$$$ryVeryLongClassName8$OuterLevelWithVeryVeryVeryLongClassName9$OuterLevelWithVeryVeryVeryLongClassName10$
org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$2a1321b953c615695d7442b2adb1$$$$ryVeryLongClassName8$OuterLevelWithVeryVeryVeryLongClassName9$OuterLevelWithVeryVeryVeryLongClassName10$OuterLevelWithVeryVeryVeryLongClassName11$
org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$85f068777e7ecf112afcbe997d461b$$$$VeryLongClassName11$OuterLevelWithVeryVeryVeryLongClassName12$
org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$85f068777e7ecf112afcbe997d461b$$$$VeryLongClassName11$OuterLevelWithVeryVeryVeryLongClassName12$OuterLevelWithVeryVeryVeryLongClassName13$
org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$85f068777e7ecf112afcbe997d461b$$$$VeryLongClassName11$OuterLevelWithVeryVeryVeryLongClassName12$OuterLevelWithVeryVeryVeryLongClassName13$OuterLevelWithVeryVeryVeryLongClassName14$
org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$5f7ad51804cb1be53938ea804699fa$$$$VeryLongClassName14$OuterLevelWithVeryVeryVeryLongClassName15$
org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$5f7ad51804cb1be53938ea804699fa$$$$VeryLongClassName14$OuterLevelWithVeryVeryVeryLongClassName15$OuterLevelWithVeryVeryVeryLongClassName16$
org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$5f7ad51804cb1be53938ea804699fa$$$$VeryLongClassName14$OuterLevelWithVeryVeryVeryLongClassName15$OuterLevelWithVeryVeryVeryLongClassName16$OuterLevelWithVeryVeryVeryLongClassName17$
org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$69b54f16b1965a31e88968df1a58d8$$$$VeryLongClassName17$OuterLevelWithVeryVeryVeryLongClassName18$
org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$69b54f16b1965a31e88968df1a58d8$$$$VeryLongClassName17$OuterLevelWithVeryVeryVeryLongClassName18$OuterLevelWithVeryVeryVeryLongClassName19$
org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite$OuterLevelWithVeryVeryVeryLongClassNam$$$$69b54f16b1965a31e88968df1a58d8$$$$VeryLongClassName17$OuterLevelWithVeryVeryVeryLongClassName18$OuterLevelWithVeryVeryVeryLongClassName19$OuterLevelWithVeryVeryVeryLongClassName20$
```

with a hash code in the middle and various levels of nesting omitted.

The `java.lang.Class.isMemberClass` method is implemented in JDK8u as:
http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/tip/src/share/classes/java/lang/Class.java#l1425

```
/**
 * Returns {@code true} if and only if the underlying class
 * is a member class.
 *
 * @return {@code true} if and only if this class is a member class.
 * @since 1.5
 */
public boolean isMemberClass() {
    return getSimpleBinaryName() != null && !isLocalOrAnonymousClass();
}

/**
 * Returns the "simple binary name" of the underlying class, i.e.,
 * the binary name without the leading enclosing class name.
 * Returns {@code null} if the underlying class is a top level
 * class.
 */
private String getSimpleBinaryName() {
    Class<?> enclosingClass = getEnclosingClass();
    if (enclosingClass == null) // top level class
        return null;
    // Otherwise, strip the enclosing class' name
    try {
        return getName().substring(enclosingClass.getName().length());
    } catch (IndexOutOfBoundsException ex) {
        throw new InternalError("Malformed class name", ex);
    }
}
```

The problematic code is `getName().substring(enclosingClass.getName().length())`: if a class's enclosing class's full name is *longer* than the nested class's full name, this logic ends up going out of bounds.

The bug has been fixed in JDK9 by https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8057919 , but it still exists in the latest JDK8u release. So from the Spark side we'd need to do something to avoid hitting this problem.

This is the backport of #31733.

### Why are the changes needed?

Bugfix on jdk8u.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added tests.

Closes #31747 from maropu/SPARK34607-BRANCH2.4.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
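One way to "avoid hitting this problem" from the caller's side is to wrap the JDK call and fall back to checks that never compare class-name lengths. The Java sketch below illustrates that idea under assumptions: `safeIsMemberClass` is a hypothetical helper, and its fallback is an approximation chosen for this example, not the code merged by this PR.

```java
final class MemberClassGuard {
    // Hypothetical guard around Class.isMemberClass.
    static boolean safeIsMemberClass(Class<?> cls) {
        try {
            return cls.isMemberClass();
        } catch (InternalError e) {
            // Assumed fallback: a member class has an enclosing class but is
            // not declared inside a method or constructor. This is an
            // approximation; for example, an anonymous class created in a
            // field initializer would also pass these checks.
            return cls.getEnclosingClass() != null
                && cls.getEnclosingMethod() == null
                && cls.getEnclosingConstructor() == null;
        }
    }

    static class Nested {}

    public static void main(String[] args) {
        class Local {}
        System.out.println(safeIsMemberClass(Nested.class)); // true: static nested member
        System.out.println(safeIsMemberClass(String.class)); // false: top-level class
        System.out.println(safeIsMemberClass(Local.class));  // false: local class
    }
}
```

On JDKs where the bug is fixed the `catch` branch is never taken, so the guard behaves exactly like `isMemberClass` there.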
Thanks for the reviews, @viirya @cloud-fan!