
[Feature][Connector-V2][Translation] Support spark 3.3 #2574

Closed
wants to merge 3 commits

Conversation

zhaomin1423
Member

Purpose of this pull request

Check list


@ashulin ashulin left a comment


Please add licenses according to the Dependency licenses action; see https://seatunnel.apache.org/docs/contribution/new-license

@Hisoka-X
Member

Can you add e2e tests for both Spark 3 and Spark 2?
Reference: #2499

@Hisoka-X Hisoka-X added the Waiting for users feedback Waiting for feedback from issue/PR author label Sep 3, 2022
@zhaomin1423
Member Author

Can you add e2e from spark3 and spark2? reference:#2499

ok, I will add it.

@zhaomin1423
Member Author

Please add licenses according to Dependency licenses action. you can see https://seatunnel.apache.org/docs/contribution/new-license

Thanks, I will do it.

@zhaomin1423 zhaomin1423 marked this pull request as draft September 5, 2022 17:26
@zhaomin1423 zhaomin1423 force-pushed the spark3_sink branch 2 times, most recently from a2baef9 to a308365 Compare September 6, 2022 13:31
@zhaomin1423 zhaomin1423 marked this pull request as ready for review September 6, 2022 23:24
@zhaomin1423
Member Author

zhaomin1423 commented Sep 6, 2022

Please help me review it. I only added e2e coverage on Spark 3.3 for FakeSourceToConsoleIT; I will add more after this is merged, because the module changes frequently and I have had to fix conflicts repeatedly. @Hisoka-X
The Spark scope is changed to provided in the example module: the example module is not part of the release package, and keeping the compile scope would mean handling licenses for Spark's many dependencies, which is unnecessary. @ashulin

@zhaomin1423 zhaomin1423 requested review from ashulin and removed request for Hisoka-X September 6, 2022 23:30
@Hisoka-X
Member

Hisoka-X commented Sep 7, 2022

Hi, please fix the UT.

@zhaomin1423
Member Author

Hi, Please fix UT

The failure is related to Flink. The PR hasn't changed; can you help me rerun it?

@zhaomin1423
Member Author

zhaomin1423 commented Sep 7, 2022

[screenshot of the CI failure]

The failure is strange; I don't know how to fix it. @Hisoka-X

@Hisoka-X Hisoka-X removed the Waiting for users feedback Waiting for feedback from issue/PR author label Sep 7, 2022

@Hisoka-X Hisoka-X left a comment


Overall looks good to me. @ashulin, please check the source part. Thanks.

@Hisoka-X
Member

Hisoka-X commented Sep 7, 2022

How does the user switch between Spark 2.4 and Spark 3.3 with a shell command?

<dependencyManagement>
<dependencies>
<!--spark-->
<dependency>
Contributor


Move this to the spark-2.4 module?

Member Author


Because seatunnel-translation-spark-common depends on Spark; if we move this, the common module would need to add the dependency itself.

@@ -74,7 +74,7 @@

<dependency>
<groupId>org.apache.seatunnel</groupId>
<artifactId>seatunnel-translation-spark-2.4</artifactId>
<artifactId>seatunnel-translation-spark-common</artifactId>
Contributor


How does the user choose from the seatunnel-transform-spark (2.4, 3.3) list?

Member Author


How user switch seatunnel-transform-spark(2.4、3.3) list?

Our release package includes both; the user chooses the jar according to their Spark client version.

Member


How to choose? It seems you should add jar-selection logic in seatunnel-spark-starter.

Member Author


[screenshot of the plugin directory layout]

Here, we can put it into that directory; the e2e tests already use this style.

Member Author


Typically, users choose one spark version, so they only use one translation.

Member


I prefer a more automated way, such as judging the version from SPARK_HOME, or using Spark 2 by default and switching versions by adding a parameter to the shell script.

Member Author


Good idea. The argument can control which plugin directory is chosen. We can do it in a new PR. What do you think?

Member


Ok for me
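The automated switch discussed above could look roughly like the sketch below. This is not code from the PR: the function name, the fallback to `spark-submit --version`, and the plugin directory names (`translation-spark-2.4` / `translation-spark-3.3`, matching the layout mentioned later in this thread) are all assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the shell-side switch discussed above: detect the
# Spark major version (from an explicit argument, falling back to
# ${SPARK_HOME}) and map it to a per-version translation plugin directory.

choose_translation_dir() {
  local version="$1"            # e.g. "3.3.1" or "2.4.8"
  case "${version%%.*}" in      # keep only the major version
    3) echo "translation-spark-3.3" ;;
    2) echo "translation-spark-2.4" ;;
    *) echo "unsupported Spark version: ${version}" >&2; return 1 ;;
  esac
}

# Prefer an explicit argument; otherwise ask spark-submit for its version.
if [ -n "${1:-}" ]; then
  detected="$1"
elif [ -n "${SPARK_HOME:-}" ]; then
  detected="$("${SPARK_HOME}/bin/spark-submit" --version 2>&1 \
    | grep -m1 -oE 'version [0-9]+\.[0-9]+\.[0-9]+' | awk '{print $2}')"
fi

if [ -n "${detected:-}" ]; then
  choose_translation_dir "${detected}"
fi
```

The default-to-Spark-2 behaviour suggested above would just mean treating an empty `detected` as `2.4` instead of printing nothing.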

@zhaomin1423
Member Author

How user switch spark2.4 and spark3.3 use shell command?

The user chooses the jar according to their Spark client version. But we can handle that later; this PR only adds the Spark 3.3 translation. There are frequent conflicts if we add more content.

@Hisoka-X
Member

Hisoka-X commented Sep 7, 2022

How user switch spark2.4 and spark3.3 use shell command?

According to spark client version, choose the jar. But, we can handle it later, the pr only add spark3.3 translation. There are frequent conflicts if we add more content.

OK for me


@Override
protected String getTranslationJarTargetPath() {
return Paths.get(SEATUNNEL_HOME, "plugins", "translation-spark-3.3", "lib",
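For readers outside the diff, the target path above amounts to a per-version plugin directory under SEATUNNEL_HOME. A hypothetical shell mirror (the jar file name is elided in the diff and stays elided here):

```shell
# Hypothetical shell mirror of getTranslationJarTargetPath above: each Spark
# version gets its own translation plugin directory under ${SEATUNNEL_HOME}.
translation_jar_target_dir() {
  local seatunnel_home="$1" spark_version="$2"
  echo "${seatunnel_home}/plugins/translation-spark-${spark_version}/lib"
}

translation_jar_target_dir /opt/seatunnel 3.3
# → /opt/seatunnel/plugins/translation-spark-3.3/lib
```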
Hisoka-X previously approved these changes Sep 8, 2022
@zhaomin1423 zhaomin1423 requested review from ashulin and hailin0 and removed request for ashulin and hailin0 September 8, 2022 14:36
@zhaomin1423
Member Author

Could you help me review it? Merging is blocked, and the conflicts are frequent. @ashulin @CalvinKirs

@Hisoka-X
Member

Can we support running all Spark e2e tests on both 3.3 and 2.4 without configuring anything? I find we need to write the same code twice to test a job on both 3.3 and 2.4. If we do this and all e2e tests pass, I think we can merge this PR.

@zhaomin1423
Member Author

Can we support all spark e2e run both 3.3 and 2.4 without configure anything? I find we need write same code twice if we want test job both on 3.3 and 2.4. If we do this, and all e2e passed. I think we can merge this PR.

I only added one test for 3.3; I will fix the conflicts later.

@dik111
Contributor

dik111 commented Nov 1, 2022

Spark version: 3.3.1
Hive version: 3.0.0
MySQL version: 5.7
I tested MySQL to Hive and it throws this exception:

         client token: N/A
         diagnostics: User class threw exception: java.util.ServiceConfigurationError: org.apache.seatunnel.spark.BaseSparkSink: Provider org.apache.seatunnel.spark.hive.sink.Hive could not be instantiated
        at java.util.ServiceLoader.fail(ServiceLoader.java:232)
        at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
        at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
        at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
        at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
        at org.apache.seatunnel.plugin.discovery.AbstractPluginDiscovery.loadPluginInstance(AbstractPluginDiscovery.java:128)
        at org.apache.seatunnel.plugin.discovery.AbstractPluginDiscovery.createPluginInstance(AbstractPluginDiscovery.java:99)
        at org.apache.seatunnel.core.spark.config.SparkExecutionContext.lambda$getSinks$2(SparkExecutionContext.java:95)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
        at org.apache.seatunnel.core.spark.config.SparkExecutionContext.getSinks(SparkExecutionContext.java:98)
        at org.apache.seatunnel.core.spark.command.SparkTaskExecuteCommand.execute(SparkTaskExecuteCommand.java:57)
        at org.apache.seatunnel.core.base.Seatunnel.run(Seatunnel.java:40)
        at org.apache.seatunnel.core.spark.SeatunnelSpark.main(SeatunnelSpark.java:33)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:739)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class
        at org.apache.seatunnel.spark.hive.sink.Hive.<init>(Hive.scala:29)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at java.lang.Class.newInstance(Class.java:442)
        at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
        ... 21 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging$class
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 28 more

         ApplicationMaster host: jtbihdp04.sogal.com
         ApplicationMaster RPC port: 39125
         queue: default
         start time: 1667267960414
         final status: FAILED
         tracking URL: http://jtbihdp13.sogal.com:8088/proxy/application_1667009280185_5845/
         user: suofy
22/11/01 10:00:51 ERROR Client: Application diagnostics message: User class threw exception: java.util.ServiceConfigurationError: org.apache.seatunnel.spark.BaseSparkSink: Provider org.apache.seatunnel.spark.hive.sink.Hive could not be instantiated

Exception in thread "main" org.apache.spark.SparkException: Application application_1667009280185_5845 finished with failed status
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1342)
        at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1764)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
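A note on the likely cause of the `NoClassDefFoundError: org/apache/spark/internal/Logging$class` above: `Foo$class` is the synthetic trait-implementation class the Scala 2.11 compiler emitted; Scala 2.12, which Spark 3.x is built against, dropped that encoding. So a connector jar compiled for Scala 2.11 (Spark 2.4) cannot load on a Spark 3.3 cluster. One hedged way to check is to compare the Scala binary-version suffix in jar file names (the example jar name below is hypothetical):

```shell
# Extract the Scala binary-version suffix (e.g. 2.11 or 2.12) from an
# artifact file name such as foo_2.11-1.0.jar. Jars without a suffix
# produce no output.
scala_binary_version() {
  printf '%s\n' "$1" | grep -oE '_2\.1[1-3]' | head -1 | tr -d '_'
}

scala_binary_version "seatunnel-connector-spark-hive_2.11-2.1.3.jar"  # → 2.11

# Compare against the Scala version your Spark distribution ships with, e.g.:
#   ls "${SPARK_HOME}/jars"/scala-library-*.jar
```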

@TyrantLucifer TyrantLucifer changed the title Support seatunnel-translation-spark-3.3 [Feature][Connector-V2][Translation] Support spark 3.3 Nov 5, 2022
@jinmu0410

[Screenshot 2022-12-20 14:52:40]

Who can help me look at this problem?

@jinmu0410

[Screenshot 2022-12-20 14:52:40]

Who can help me look at this problem?

This was testing Spark 3 with a build of the spark3_sink branch.

@Hisoka-X
Member

[Screenshot 2022-12-20 14:52:40] Who can help me look at this problem

for test spark3 by use spark3_sink build

@JinJiDeJinMu Hi, Spark 3 support is not finished yet, but @nishuihanqiu has a branch for Spark 3 in his repository. #875 (comment)

@jinmu0410

[Screenshot 2022-12-20 14:52:40] Who can help me look at this problem

for test spark3 by use spark3_sink build

@JinJiDeJinMu Hi, Support Spark3 not finished at now. But @nishuihanqiu have a branch for spark3 in his repository. #875 (comment)

@Hisoka-X thank you

@jinmu0410

@Hisoka-X I want to join in the translation of Spark 3.x based on V2. What can I do?

@Hisoka-X
Member

@Hisoka-X I want to join in the translation of Spark3. x based on v2, What can I do?

@JinJiDeJinMu Wonderful! You can continue the work in this PR: fix the conflicts, do some testing, and fix any problems you find.


9 participants