Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-38510][SQL] Retry ClassSymbol.selfType to work around cyclic references #35852

Closed
wants to merge 4 commits into from

Conversation

shardulm94
Copy link
Contributor

What changes were proposed in this pull request?

Retry ClassSymbol.selfType in case we receive a cyclic reference exception.

Why are the changes needed?

ClassSymbol.selfType fails on certain classes like HiveGenericUDF due to cyclic reference. This issue is due to bug#12190 in Scala which does not handle cyclic references in Java annotations correctly. The cyclic reference in this case comes from InterfaceAudience annotation which annotates itself. This annotation class is present in the type hierarchy of HiveGenericUDF.

To workaround the issue, we can just retry the operation and it will succeed on the retry probably because the annotation is partially resolved from the previous attempt.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a unit test. This unit test prefers to be run a separate JVM as the error for cyclic references may not be thrown if the annotation class is previously loaded by some other test and so may be dependent on test execution order.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@shardulm94
Copy link
Contributor Author

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hacky but ok if it is necessary to work around for now

Copy link
Contributor

@mridulm mridulm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @srowen, hacky ... looks like no other workarounds until this is addressed in scala.

@@ -639,7 +643,7 @@ object ScalaReflection extends ScalaReflection {
def getConstructorParameters(cls: Class[_]): Seq[(String, Type)] = {
val m = runtimeMirror(cls.getClassLoader)
val classSymbol = m.staticClass(cls.getName)
val t = classSymbol.selfType
val t = selfType(classSymbol)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this limited only to selfType ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I do not have the technical knowhow to answer this question. What I can say is that ScalaReflection also uses ClassSymbol.companion which seems to work fine in this case. ClassSymbol.annotations fails, as might be expected, but it is not used anywhere in this class.

@shardulm94
Copy link
Contributor Author

@mridulm @srowen Mind taking a second look here? I have updated the PR as per the comments.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How far back should this probably back-port? 3.1?

@shardulm94
Copy link
Contributor Author

shardulm94 commented Mar 17, 2022

How far back should this probably back-port? 3.1?

The issue itself is reproducible all the way back to 2.3 (at least). But I don't think this is too critical since .selfType only ever gets used when you try to fetch the JSON representation of the SQL plan using TreeNode.toJSON or TreeNode.prettyJSON AND if the plan uses Hive UDFs. It has remained unreported all this while, which points to low usage. So I think it is okay for this to only be committed to master.

@srowen
Copy link
Member

srowen commented Mar 19, 2022

Merged to master (3.4.0)

@srowen srowen closed this in 91614ff Mar 19, 2022
otterc pushed a commit to linkedin/spark that referenced this pull request Mar 22, 2023
…eferences

Retry `ClassSymbol.selfType` in case we receive a cyclic reference exception.

`ClassSymbol.selfType` fails on certain classes like `HiveGenericUDF` due to cyclic reference. This issue is due to [bug#12190 in Scala](scala/bug#12190) which does not handle cyclic references in Java annotations correctly. The cyclic reference in this case comes from `InterfaceAudience` annotation which [annotates itself](https://github.com/apache/hadoop/blob/db8ae4b65448c506c9234641b2c1f9b8e894dc18/hadoop-common-project/hadoop-annotations/src/main/java/org/apache/hadoop/classification/InterfaceAudience.java#L45). This annotation class is present in the type hierarchy of `HiveGenericUDF`.

To workaround the issue, we can just retry the operation and it will succeed on the retry probably because the annotation is partially resolved from the previous attempt.

No

Added a unit test. This unit test prefers to be run a separate JVM as the error for cyclic references may not be thrown if the annotation class is previously loaded by some other test and so may be dependent on test execution order.

Closes apache#35852 from shardulm94/spark-38510.

Authored-by: Shardul Mahadik <smahadik@linkedin.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(cherry picked from commit 91614ff)

RB=3246777
BUG=LIHADOOP-63262
G=spark-reviewers
R=tgudivad,ekrogen
A=ekrogen

RB=3456494
BUG=LIHADOOP-63952
G=spark-reviewers
R=ekrogen,tgudivad
A=ekrogen
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants