[FEATURE] Run Kyuubi Engine on Alibaba Cloud MaxCompute or AWS Glue #3409

Open
2 of 3 tasks
kevinclcn opened this issue Sep 5, 2022 · 18 comments

@kevinclcn

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the feature

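Currently, Kyuubi Engine can run on YARN or K8s to execute tasks submitted through JDBC. In cloud-native environments, however, cloud providers usually offer elastic compute resources, such as Alibaba Cloud MaxCompute and AWS Glue. If Kyuubi Engine could run on MaxCompute and Glue, the cost of running and maintaining Spark could be reduced significantly.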

Alibaba Cloud API for running Spark jobs through MaxCompute:
https://help.aliyun.com/document_detail/102357.html

AWS API for running Spark jobs through Glue:
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-jobs-job.html#aws-glue-api-jobs-job-CreateJob

Motivation

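Currently, Kyuubi Engine can only run on YARN or K8s. In a cloud-native environment this means you must provision either EMR resources or K8s compute nodes, which raises two problems:

  1. EMR and K8s resources are not elastic: when there are few tasks, they cannot scale in to reduce hardware costs, and when there are many tasks, they cannot scale out to speed up computation.
  2. In a cloud environment, if you use elastic compute resources such as MaxCompute, JDBC access is limited to interactive query engines such as Trino, so the SQL dialects used by offline jobs and interactive queries are not fully consistent.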

Describe the solution

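Running Kyuubi Engine on elastic Spark compute resources such as MaxCompute and Glue would let offline batch jobs and interactive queries share the same Spark SQL capabilities. It would also make compute resources elastic, saving infrastructure and operations costs.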

Additional context

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

@kevinclcn changed the title from "[FEATURE] Support Kyuubi Engine run on Alicloud MaxCompute and AWS Glue" to "[FEATURE] 让 Kyuubi Engine 跑在阿里MaxCompute或AWS Glue上" (Run Kyuubi Engine on Alibaba Cloud MaxCompute or AWS Glue) on Sep 5, 2022

@github-actions

github-actions bot commented Sep 5, 2022

Hello @kevinclcn,
Thanks for finding the time to report the issue!
We really appreciate the community's efforts to improve Apache Kyuubi (Incubating).

@pan3793
Member

pan3793 commented Sep 5, 2022

I had a quick look at the doc. I think Kyuubi should work out of the box with MaxCompute, but not with Glue. Since Kyuubi uses spark-submit to create the Spark engine application, technically you can deploy Kyuubi in any environment as long as there is a runnable spark-submit (requires Spark 3.x) under $SPARK_HOME/bin.
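
For anyone who wants to check that precondition on a target host, here is a minimal sketch in Java. The class and method names (`SparkSubmitCheck`, `hasRunnableSparkSubmit`) are made up for illustration and are not part of Kyuubi. It only checks that the file exists and is executable; it does not verify that the Spark version is 3.x.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SparkSubmitCheck {

    /** Builds the expected spark-submit location under a given Spark home. */
    public static Path sparkSubmitPath(String sparkHome) {
        return Paths.get(sparkHome, "bin", "spark-submit");
    }

    /** Returns true if spark-submit exists there as a regular file and is executable. */
    public static boolean hasRunnableSparkSubmit(String sparkHome) {
        if (sparkHome == null || sparkHome.isEmpty()) {
            return false; // SPARK_HOME is not set
        }
        Path candidate = sparkSubmitPath(sparkHome);
        return Files.isRegularFile(candidate) && Files.isExecutable(candidate);
    }

    public static void main(String[] args) {
        String sparkHome = System.getenv("SPARK_HOME");
        System.out.println("SPARK_HOME=" + sparkHome
                + ", runnable spark-submit: " + hasRunnableSparkSubmit(sparkHome));
    }
}
```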

@pan3793
Member

pan3793 commented Sep 6, 2022

@kevinclcn would you like to try deploying Kyuubi on MaxCompute? Docs are welcome too.

@kevinclcn
Author

Sure.

@badbye

badbye commented Mar 27, 2023

> I had a quick look at the doc. I think Kyuubi should work out of the box with MaxCompute, but not with Glue. Since Kyuubi uses spark-submit to create the Spark engine application, technically you can deploy Kyuubi in any environment as long as there is a runnable spark-submit (requires Spark 3.x) under $SPARK_HOME/bin.

I'm trying to run Kyuubi with ADB Spark (which is similar to MaxCompute Spark), and I got this error in ADB Spark:

at org.apache.kyuubi.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
23/03/27 20:46:27 ERROR ConnectionState: Connection timed out for connection string (beijing-datascience-dev-01:2181)

I'm using a standalone Kyuubi with the EmbeddedZookeeper service. My question is how to make the ZooKeeper connection string use the ip:port format instead of hostname:port, since the remote Spark server cannot resolve my hostname.

I've tried setting kyuubi.zookeeper.embedded.client.port.address to the public IP, but it does not work.

@pan3793
Member

pan3793 commented Mar 27, 2023

The embedded ZooKeeper is not recommended for production; it is designed for local testing. Please deploy a dedicated ZooKeeper first.

@badbye

badbye commented Mar 28, 2023

After fixing the connection between ZooKeeper and ADB Spark, I got a connection timeout error on the client side:

2023-03-28 11:47:11.728 INFO org.apache.kyuubi.ha.client.zookeeper.ZookeeperDiscoveryClient: Get service instance:21.25.1.59:45625 and version:Some(1.6.1-incubating) under /kyuubi_1.6.1-incubating_USER_SPARK_SQL/test/default
2023-03-28 11:47:11.768 ERROR org.apache.kyuubi.session.KyuubiSessionImpl: Opening engine [kyuubi_USER_SPARK_SQL_test_default_32adf216-e872-48a9-a87e-6789ef2d4a4c 21.25.1.59:45625] for test session failed
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: connect timed out
	at org.apache.thrift.transport.TSocket.open(TSocket.java:226) ~[libthrift-0.9.3.jar:0.9.3]
	at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:266) ~[libthrift-0.9.3.jar:0.9.3]
	at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) ~[libthrift-0.9.3.jar:0.9.3]
	at org.apache.kyuubi.client.KyuubiSyncThriftClient$.createTProtocol(KyuubiSyncThriftClient.scala:455) ~[kyuubi-server_2.12-1.6.1-incubating.jar:1.6.1-incubating]
	at org.apache.kyuubi.client.KyuubiSyncThriftClient$.createClient(KyuubiSyncThriftClient.scala:471) ~[kyuubi-server_2.12-1.6.1-incubating.jar:1.6.1-incubating]
	at org.apache.kyuubi.session.KyuubiSessionImpl.$anonfun$openEngineSession$1(KyuubiSessionImpl.scala:128) ~[kyuubi-server_2.12-1.6.1-incubating.jar:1.6.1-incubating]
	at org.apache.kyuubi.session.KyuubiSessionImpl.$anonfun$openEngineSession$1$adapted(KyuubiSessionImpl.scala:113) ~[kyuubi-server_2.12-1.6.1-incubating.jar:1.6.1-incubating]
	at org.apache.kyuubi.ha.client.DiscoveryClientProvider$.withDiscoveryClient(DiscoveryClientProvider.scala:36) ~[kyuubi-ha_2.12-1.6.1-incubating.jar:1.6.1-incubating]
	at org.apache.kyuubi.session.KyuubiSessionImpl.openEngineSession(KyuubiSessionImpl.scala:113) ~[kyuubi-server_2.12-1.6.1-incubating.jar:1.6.1-incubating]
	at org.apache.kyuubi.operation.LaunchEngine.$anonfun$runInternal$2(LaunchEngine.scala:49) ~[kyuubi-server_2.12-1.6.1-incubating.jar:1.6.1-incubating]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_271]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_271]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_271]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_271]
	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_271]
Caused by: java.net.SocketTimeoutException: connect timed out
	at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:1.8.0_271]
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:476) ~[?:1.8.0_271]
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:218) ~[?:1.8.0_271]
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:200) ~[?:1.8.0_271]
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:394) ~[?:1.8.0_271]
	at java.net.Socket.connect(Socket.java:606) ~[?:1.8.0_271]
	at org.apache.thrift.transport.TSocket.open(TSocket.java:221) ~[libthrift-0.9.3.jar:0.9.3]
	... 14 more
2023-03-28 11:47:11.774 INFO org.apache.curator.framework.imps.CuratorFrameworkImpl: backgroundOperationsLoop exiting
2023-03-28 11:47:11.777 INFO org.apache.zookeeper.ZooKeeper: Session: 0x10926b572df0001 closed
2023-03-28 11:47:11.777 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down for session: 0x10926b572df0001
2023-03-28 11:47:11.789 INFO org.apache.kyuubi.operation.LaunchEngine: Processing test's query[19ab56d1-a2eb-429e-a858-6d96b0ffdbbb]: RUNNING_STATE -> ERROR_STATE, time taken: 60.261 seconds
Error: org.apache.kyuubi.KyuubiSQLException: Error operating LaunchEngine: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: connect timed out
	at org.apache.thrift.transport.TSocket.open(TSocket.java:226)
	at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:266)
	at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
	at org.apache.kyuubi.client.KyuubiSyncThriftClient$.createTProtocol(KyuubiSyncThriftClient.scala:455)
	at org.apache.kyuubi.client.KyuubiSyncThriftClient$.createClient(KyuubiSyncThriftClient.scala:471)
	at org.apache.kyuubi.session.KyuubiSessionImpl.$anonfun$openEngineSession$1(KyuubiSessionImpl.scala:128)
	at org.apache.kyuubi.session.KyuubiSessionImpl.$anonfun$openEngineSession$1$adapted(KyuubiSessionImpl.scala:113)
	at org.apache.kyuubi.ha.client.DiscoveryClientProvider$.withDiscoveryClient(DiscoveryClientProvider.scala:36)
	at org.apache.kyuubi.session.KyuubiSessionImpl.openEngineSession(KyuubiSessionImpl.scala:113)
	at org.apache.kyuubi.operation.LaunchEngine.$anonfun$runInternal$2(LaunchEngine.scala:49)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: connect timed out
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:476)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:218)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:200)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:394)
	at java.net.Socket.connect(Socket.java:606)
	at org.apache.thrift.transport.TSocket.open(TSocket.java:221)
	... 14 more

	at org.apache.kyuubi.KyuubiSQLException$.apply(KyuubiSQLException.scala:69)
	at org.apache.kyuubi.operation.KyuubiOperation$$anonfun$onError$1.applyOrElse(KyuubiOperation.scala:75)
	at org.apache.kyuubi.operation.KyuubiOperation$$anonfun$onError$1.applyOrElse(KyuubiOperation.scala:56)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
	at org.apache.kyuubi.operation.LaunchEngine.$anonfun$runInternal$2(LaunchEngine.scala:51)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: connect timed out
	at org.apache.thrift.transport.TSocket.open(TSocket.java:226)
	at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:266)
	at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
	at org.apache.kyuubi.client.KyuubiSyncThriftClient$.createTProtocol(KyuubiSyncThriftClient.scala:455)
	at org.apache.kyuubi.client.KyuubiSyncThriftClient$.createClient(KyuubiSyncThriftClient.scala:471)
	at org.apache.kyuubi.session.KyuubiSessionImpl.$anonfun$openEngineSession$1(KyuubiSessionImpl.scala:128)
	at org.apache.kyuubi.session.KyuubiSessionImpl.$anonfun$openEngineSession$1$adapted(KyuubiSessionImpl.scala:113)
	at org.apache.kyuubi.ha.client.DiscoveryClientProvider$.withDiscoveryClient(DiscoveryClientProvider.scala:36)
	at org.apache.kyuubi.session.KyuubiSessionImpl.openEngineSession(KyuubiSessionImpl.scala:113)
	at org.apache.kyuubi.operation.LaunchEngine.$anonfun$runInternal$2(LaunchEngine.scala:49)
	... 5 more
Caused by: java.net.SocketTimeoutException: connect timed out
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:476)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:218)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:200)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:394)
	at java.net.Socket.connect(Socket.java:606)
	at org.apache.thrift.transport.TSocket.open(TSocket.java:221)
	... 14 more (state=,code=0)
Beeline version 1.6.1-incubating by Apache Kyuubi (Incubating)

Any ideas on how to fix it? @pan3793

Part of my Kyuubi conf:

kyuubi.session.engine.login.timeout = 30
kyuubi.session.engine.alive.probe.interval = 30
kyuubi.session.engine.alive.timeout = 120
kyuubi.session.engine.alive.probe.enabled = true

@pan3793
Member

pan3793 commented Mar 28, 2023

> Get service instance:21.25.1.59:45625 and version:Some(1.6.1-incubating) under /kyuubi_1.6.1-incubating_USER_SPARK_SQL/test/default

Does ADB Spark allow Kyuubi Server to access the Driver through IP directly?
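
One way to answer that question from the Kyuubi server host is a plain TCP connect with a timeout, which is the same operation that fails in the stack trace (`java.net.Socket.connect`). The following is a rough sketch only; the class name `EngineReachability` and the 5-second timeout are illustrative choices, not anything Kyuubi ships.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class EngineReachability {

    /** Tries a TCP connect to host:port and reports whether it succeeded within the timeout. */
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers connect timeouts, refused connections, and unresolvable hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Address taken from the "Get service instance" log line in this thread.
        boolean reachable = canConnect("21.25.1.59", 45625, 5000);
        System.out.println("engine reachable from this host: " + reachable);
    }
}
```

If this returns false when run on the Kyuubi server machine, the problem is network reachability between the server and the driver rather than anything in the Thrift or SASL layers.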

@pan3793
Member

pan3793 commented Mar 28, 2023

Also, kyuubi.session.engine.login.timeout = 30 means 30 ms. I suppose you expect 30 s, not 30 ms; the suggested format is PT30S.

@pan3793
Member

pan3793 commented Mar 28, 2023

Kyuubi uses the ISO-8601 standard duration format; please read the documentation comments of java.time.Duration for more details.
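
To make the unit difference concrete, here is a small sketch using `java.time.Duration`. The helper `parseTimeout` is hypothetical: it imitates the behaviour described in this thread (a bare number is read as milliseconds, anything else as an ISO-8601 duration) and is not Kyuubi's actual config parser.

```java
import java.time.Duration;

public class DurationConfigDemo {

    /**
     * Hypothetical helper mirroring the behaviour described above:
     * a bare integer is treated as milliseconds, otherwise the value
     * is parsed as an ISO-8601 duration such as PT30S.
     */
    public static Duration parseTimeout(String raw) {
        String value = raw.trim();
        if (value.matches("\\d+")) {
            return Duration.ofMillis(Long.parseLong(value));
        }
        return Duration.parse(value);
    }

    public static void main(String[] args) {
        System.out.println(parseTimeout("30").toMillis());     // 30
        System.out.println(parseTimeout("PT30S").toMillis());  // 30000
        System.out.println(parseTimeout("PT2M").getSeconds()); // 120
    }
}
```

`Duration.parse` itself only accepts the ISO-8601 form, so passing it a bare `30` would throw a `DateTimeParseException`; the numeric branch here exists only to show what a unitless value ends up meaning.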

@badbye

badbye commented Mar 28, 2023

> > Get service instance:21.25.1.59:45625 and version:Some(1.6.1-incubating) under /kyuubi_1.6.1-incubating_USER_SPARK_SQL/test/default
>
> Does ADB Spark allow Kyuubi Server to access the Driver through IP directly?

No, the Kyuubi server cannot access this IP. I'll try to fix it.
I see, so I guess the whole workflow is:

  1. The Kyuubi server looks up a Spark session in ZooKeeper; if there is none, it starts a Spark session via spark-submit.
  2. After the Spark session has started, it registers itself in ZooKeeper.
  3. The Kyuubi server finds the Spark session in ZooKeeper and tries to connect to it directly.
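
The three steps above can be sketched as a toy model in Java. Everything here is an illustrative stand-in: an in-memory map plays the role of ZooKeeper, `launchEngine` plays the role of spark-submit, and the class name `EngineDiscoverySketch` is invented. In the real system the launch is asynchronous and the server waits for the engine to register; this sketch collapses that into one synchronous call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class EngineDiscoverySketch {

    // Stand-in for ZooKeeper: engine namespace -> "host:port" of the registered engine.
    private final Map<String, String> registry = new ConcurrentHashMap<>();
    private int launches = 0;

    /** Step 2: a started engine registers its address under its namespace. */
    public void registerEngine(String namespace, String hostPort) {
        registry.put(namespace, hostPort);
    }

    /** Stand-in for spark-submit: "starts" an engine, which then registers itself. */
    public void launchEngine(String namespace) {
        launches++;
        registerEngine(namespace, "21.25.1.59:45625"); // address from the log in this thread
    }

    /** Steps 1 and 3: look up an engine, launch one if missing, return the address to connect to. */
    public String findOrLaunchEngine(String namespace) {
        if (!registry.containsKey(namespace)) {
            launchEngine(namespace);
        }
        return registry.get(namespace);
    }

    public int launchCount() {
        return launches;
    }

    public static void main(String[] args) {
        EngineDiscoverySketch server = new EngineDiscoverySketch();
        String ns = "/kyuubi_1.6.1-incubating_USER_SPARK_SQL/test/default";
        System.out.println(server.findOrLaunchEngine(ns)); // first call launches an engine
        System.out.println(server.findOrLaunchEngine(ns)); // second call reuses it
        System.out.println("launches: " + server.launchCount()); // launches: 1
    }
}
```

The point of the model is that the address the server connects to in step 3 is whatever the engine wrote in step 2, so an engine that registers an unreachable IP causes the connect timeout seen earlier.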

@badbye

badbye commented Mar 28, 2023

> Also, kyuubi.session.engine.login.timeout = 30 means 30 ms. I suppose you expect 30 s, not 30 ms; the suggested format is PT30S.

Sorry, my bad. I've read the doc; I just forgot the unit.

@pan3793
Member

pan3793 commented Mar 28, 2023

Yes, that's exactly how Kyuubi works, you got it.

@badbye

badbye commented Mar 28, 2023

> > Get service instance:21.25.1.59:45625 and version:Some(1.6.1-incubating) under /kyuubi_1.6.1-incubating_USER_SPARK_SQL/test/default
>
> Does ADB Spark allow Kyuubi Server to access the Driver through IP directly?

It turns out the ADB Spark cluster has two NICs (network interface cards), and the default NIC is used when the service starts. Is there a way to make it start and register with the second NIC?

@badbye

badbye commented Mar 28, 2023

It seems to use the findLocalInetAddress function to find the default IP.

Currently, there is no easy way to use the second NIC, am I right? @pan3793

@pan3793
Member

pan3793 commented Mar 28, 2023

Yes, we need to enhance this part to make it more flexible, e.g. by introducing an address-binding election strategy. That would also help in K8s environments.
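
As a rough idea of what such a strategy could look like, here is a sketch that enumerates the host's IPv4 addresses with `java.net.NetworkInterface` and prefers one matching a configured prefix. This is not Kyuubi's findLocalInetAddress and not a proposed patch; the class name `BindAddressSketch`, the `pick` method, and the prefix-based rule are all assumptions made for illustration.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.Optional;

public class BindAddressSketch {

    /** Collects the IPv4 addresses of all interfaces that are up and not loopback. */
    public static List<InetAddress> candidateAddresses() throws SocketException {
        List<InetAddress> result = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return result; // no interfaces reported on this host
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            if (!nic.isUp() || nic.isLoopback()) {
                continue;
            }
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                if (addr instanceof Inet4Address) {
                    result.add(addr);
                }
            }
        }
        return result;
    }

    /**
     * Hypothetical election rule: return the first candidate whose textual
     * address starts with the preferred prefix, else fall back to the first candidate.
     */
    public static Optional<InetAddress> pick(List<InetAddress> candidates, String preferredPrefix) {
        for (InetAddress addr : candidates) {
            if (addr.getHostAddress().startsWith(preferredPrefix)) {
                return Optional.of(addr);
            }
        }
        return candidates.stream().findFirst();
    }

    public static void main(String[] args) throws SocketException {
        List<InetAddress> candidates = candidateAddresses();
        System.out.println("candidates: " + candidates);
        // "10." is an arbitrary example prefix for the network the server can reach.
        System.out.println("picked: " + pick(candidates, "10."));
    }
}
```

With two NICs, a rule like this would let the engine register the address on the network the Kyuubi server can actually reach instead of whichever interface happens to be the default.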

@badbye

badbye commented Mar 28, 2023

> Yes, we need to enhance this part to make it more flexible, e.g. by introducing an address-binding election strategy. That would also help in K8s environments.

Cool. I guess this is the last problem to make it work. I may not have the ability to contribute the code, but I'd like to write a doc. Let me know if there is any progress on this feature.

@badbye

badbye commented Mar 29, 2023

Finally solved, I wrote a doc: https://gist.github.com/badbye/2618d6ef47a042427836d4ba9518e203


3 participants