
[Bug] NoSuchMethodError will be encountered when using Spark 2.x #1567

Closed
3 tasks done
rickyma opened this issue Mar 7, 2024 · 0 comments · Fixed by #1565

rickyma commented Mar 7, 2024

Code of Conduct

  • I agree to follow this project's Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

```
24/03/07 16:34:37 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(tdwadmin); groups with view permissions: Set(); users  with modify permissions: Set(tdwadmin); groups with modify permissions: Set()
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.network.util.NettyUtils.createEventLoop(Lorg/apache/spark/network/util/IOMode;ILjava/lang/String;)Lio/netty/channel/EventLoopGroup;
	at org.apache.spark.network.client.TransportClientFactory.<init>(TransportClientFactory.java:104)
	at org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:89)
	at org.apache.spark.rpc.netty.NettyRpcEnv.<init>(NettyRpcEnv.scala:70)
	at org.apache.spark.rpc.netty.NettyRpcEnvFactory.create(NettyRpcEnv.scala:449)
	at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:56)
	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:264)
	at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:193)
	at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:271)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:474)
	at org.apache.spark.deploy.yarn.SQLApplicationMaster.<init>(SQLApplicationMaster.scala:96)
	at org.apache.spark.deploy.yarn.SQLApplicationMaster.<init>(SQLApplicationMaster.scala:53)
	at org.apache.spark.deploy.yarn.SQLApplicationMaster$$anonfun$main$1.apply$mcV$sp(SQLApplicationMaster.scala:544)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2286)
	at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:60)
	at org.apache.spark.deploy.yarn.SQLApplicationMaster$.main(SQLApplicationMaster.scala:543)
	at org.apache.spark.deploy.yarn.SQLApplicationMaster.main(SQLApplicationMaster.scala)
```

Affects Version(s)

master

Uniffle Server Log Output

No response

Uniffle Engine Log Output

No response

Uniffle Server Configurations

No response

Uniffle Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

rickyma added a commit to rickyma/incubator-uniffle that referenced this issue Mar 7, 2024
rickyma added a commit to rickyma/incubator-uniffle that referenced this issue Mar 8, 2024
jerqi pushed a commit that referenced this issue Mar 11, 2024

### What changes were proposed in this pull request?

When we release the shaded client jar for Spark 2.x, the class `org.apache.spark.network.util.NettyUtils` should not be included in the jar.
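As a quick illustration (this sketch is ours, not part of the PR; the class name and jar-path argument are hypothetical), the fix can be sanity-checked by scanning a locally built shaded jar for the class entry:

```java
import java.util.jar.JarFile;

public class ShadedJarCheck {
    public static void main(String[] args) throws Exception {
        // args[0]: path to the locally built shaded Spark 2 client jar.
        try (JarFile jar = new JarFile(args[0])) {
            boolean present = jar.stream().anyMatch(entry ->
                    "org/apache/spark/network/util/NettyUtils.class".equals(entry.getName()));
            System.out.println(present
                    ? "NettyUtils is still bundled (pre-fix behavior)"
                    : "NettyUtils is excluded (post-fix behavior)");
        }
    }
}
```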

### Why are the changes needed?

Fix #1567. It is also a follow-up to #727.

When running on Spark 2.4, we encounter the following exception:
```
24/03/07 16:34:37 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(tdwadmin); groups with view permissions: Set(); users  with modify permissions: Set(tdwadmin); groups with modify permissions: Set()
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.network.util.NettyUtils.createEventLoop(Lorg/apache/spark/network/util/IOMode;ILjava/lang/String;)Lio/netty/channel/EventLoopGroup;
	at org.apache.spark.network.client.TransportClientFactory.<init>(TransportClientFactory.java:104)
	at org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:89)
	at org.apache.spark.rpc.netty.NettyRpcEnv.<init>(NettyRpcEnv.scala:70)
	at org.apache.spark.rpc.netty.NettyRpcEnvFactory.create(NettyRpcEnv.scala:449)
	at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:56)
	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:264)
	at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:193)
	at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:271)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:474)
	at org.apache.spark.deploy.yarn.SQLApplicationMaster.<init>(SQLApplicationMaster.scala:96)
	at org.apache.spark.deploy.yarn.SQLApplicationMaster.<init>(SQLApplicationMaster.scala:53)
	at org.apache.spark.deploy.yarn.SQLApplicationMaster$$anonfun$main$1.apply$mcV$sp(SQLApplicationMaster.scala:544)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2286)
	at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:60)
	at org.apache.spark.deploy.yarn.SQLApplicationMaster$.main(SQLApplicationMaster.scala:543)
	at org.apache.spark.deploy.yarn.SQLApplicationMaster.main(SQLApplicationMaster.scala)
```

This is because `createEventLoop` in Uniffle's copy of `NettyUtils` returns `org.apache.uniffle.io.netty.channel.EventLoopGroup` (the shaded type), while `createEventLoop` in Spark's own `NettyUtils` returns `io.netty.channel.EventLoopGroup`. When a Spark application runs, the driver loads `NettyUtils` from the rss-client JAR, so the method's return type no longer matches the descriptor that Spark's call sites were compiled against, which ultimately leads to a `NoSuchMethodError`.
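Here is a minimal, self-contained sketch of that failure mode (the class and method names are stand-ins we invented, not the real Spark or Netty types): the JVM links each call site by the full method descriptor, return type included, so two methods that differ only in return type are unrelated at link time.

```java
public class DescriptorMismatchSketch {
    // Stands in for io.netty.channel.EventLoopGroup.
    static class UpstreamEventLoopGroup {}

    // Stands in for the relocated org.apache.uniffle.io.netty.channel.EventLoopGroup.
    static class ShadedEventLoopGroup {}

    // What Spark's TransportClientFactory was compiled against, i.e. the descriptor
    //   createEventLoop(...)Lio/netty/channel/EventLoopGroup;
    static UpstreamEventLoopGroup createEventLoopUpstream() {
        return new UpstreamEventLoopGroup();
    }

    // What the shaded copy declares after relocation, i.e. the descriptor
    //   createEventLoop(...)Lorg/apache/uniffle/io/netty/channel/EventLoopGroup;
    static ShadedEventLoopGroup createEventLoopShaded() {
        return new ShadedEventLoopGroup();
    }

    public static void main(String[] args) {
        // If the driver loads the shaded class first, call sites expecting the
        // upstream descriptor find no matching method and throw NoSuchMethodError.
        System.out.println(createEventLoopShaded().getClass().getSimpleName());
    }
}
```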

We should let Spark use its own `NettyUtils` instead of ours.

However, if we simply remove the `org.apache.spark.network.util.NettyUtils` file from the code repository, we will encounter errors when running integration tests. 

```
java.lang.RuntimeException: java.lang.NoSuchFieldException: DEFAULT_TINY_CACHE_SIZE
	at org.apache.spark.network.util.NettyUtils.getPrivateStaticField(NettyUtils.java:131)
	at org.apache.spark.network.util.NettyUtils.createPooledByteBufAllocator(NettyUtils.java:118)
	at org.apache.spark.network.server.TransportServer.init(TransportServer.java:94)
	at org.apache.spark.network.server.TransportServer.<init>(TransportServer.java:73)
	at org.apache.spark.network.TransportContext.createServer(TransportContext.java:114)
	at org.apache.spark.rpc.netty.NettyRpcEnv.startServer(NettyRpcEnv.scala:119)
	at org.apache.spark.rpc.netty.NettyRpcEnvFactory$$anonfun$4.apply(NettyRpcEnv.scala:465)
	at org.apache.spark.rpc.netty.NettyRpcEnvFactory$$anonfun$4.apply(NettyRpcEnv.scala:464)
	at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2275)
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
	at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2267)
	at org.apache.spark.rpc.netty.NettyRpcEnvFactory.create(NettyRpcEnv.scala:469)
	at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:57)
	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:249)
	at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:175)
	at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:256)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:423)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:925)
	at org.apache.uniffle.test.SparkIntegrationTestBase.runSparkApp(SparkIntegrationTestBase.java:92)
	at org.apache.uniffle.test.SparkIntegrationTestBase.run(SparkIntegrationTestBase.java:53)
	at org.apache.uniffle.test.RSSStageResubmitTest.testRSSStageResubmit(RSSStageResubmitTest.java:86)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
```

This is because our project _**globally controls**_ the Netty version in the root `pom.xml`'s `dependencyManagement`, which upgrades Spark 2's own, older Netty to a newer version. The newer Netty is not fully compatible: certain fields that Spark 2 reads reflectively no longer exist, hence the `NoSuchFieldException` above. This issue does not occur in production, because there Spark ships its own `NettyUtils` and never needs the copy we provide. Retaining `org.apache.spark.network.util.NettyUtils` is therefore a workaround to keep the integration tests passing. Given that Spark 2 is no longer actively updated, maintaining a static copy of `NettyUtils` should not pose a significant problem.
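To make the incompatibility concrete, here is a small probe we wrote (assuming a Netty jar on the classpath; the exact release that dropped the field, around 4.1.52, is our recollection and not something this PR asserts) that mimics Spark 2's reflective private-static-field lookup:

```java
import java.lang.reflect.Field;

import io.netty.buffer.PooledByteBufAllocator;

public class TinyCacheFieldProbe {
    public static void main(String[] args) {
        try {
            // Spark 2's NettyUtils reads this private static field reflectively.
            Field f = PooledByteBufAllocator.class.getDeclaredField("DEFAULT_TINY_CACHE_SIZE");
            f.setAccessible(true);
            System.out.println("DEFAULT_TINY_CACHE_SIZE = " + f.get(null));
        } catch (NoSuchFieldException e) {
            // This branch is taken once dependencyManagement upgrades Netty to a
            // release that dropped the "tiny" cache: exactly the integration-test
            // failure shown above.
            System.out.println("Field removed in this Netty version: " + e.getMessage());
        } catch (IllegalAccessException e) {
            throw new RuntimeException(e);
        }
    }
}
```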

Of course, the optimal approach would be to shade our own Netty during integration testing, allowing Spark to keep using its own Netty dependency and effectively separating the two. This would provide the most accurate testing, as any change in Spark 2's Netty version could be verified through the tests. However, it would mean prefixing `org.apache.uniffle` to the Netty `import` statements across a large amount of integration test code, as sketched below, which would ultimately lead to significant redundancy and confuse those who write code for the tests in the future.
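A sketch of what that would look like in test code (our illustration; the relocated import is commented out because the type only exists inside the shaded jar):

```java
// Today, integration tests import upstream Netty directly:
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;
// Under the rejected approach, every such import would need the relocated
// prefix instead:
// import org.apache.uniffle.io.netty.buffer.ByteBuf;

public class RelocatedImportExample {
    public static void main(String[] args) {
        ByteBuf buf = Unpooled.buffer(16);
        buf.writeInt(42);
        System.out.println("readable bytes: " + buf.readableBytes());
        buf.release();
    }
}
```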

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs.