[KYUUBI #6223] Fix Scala interpreter can not access spark.jars issue #6235
Conversation
Force-pushed from cc12660 to aa4292d
      !file.getName.contains("scala-lang_scala-reflect")
    }
    info(s"Adding jars to Scala interpreter's class path: " +
      allJars.mkString(File.pathSeparator))
24/04/01 19:01:30 INFO KyuubiSparkILoop: Adding jars to Scala interpreter's class path: file:/Users/fwang12/todo/kyuubi/externals/kyuubi-spark-sql-engine/target/kyuubi-spark-sql-engine_2.12-1.10.0-SNAPSHOT.jar:file:/var/folders/pr/9x9rt3j974zfn7p4f5dpf8m80000gq/T/kyuubi-32215772-8004-43c1-a070-1cb0efd79bf0/test-function-02417951-6b76-4cf7-bb4a-5fc20372c034.jar
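A minimal, self-contained sketch of what the quoted hunk appears to do: collect `file:` URLs from a `URLClassLoader`, drop the bundled scala-reflect jar, and join the rest with `File.pathSeparator` into the string seen in the log line above. The object name and the way the class loader is obtained here are assumptions for illustration, not the actual `KyuubiSparkILoop` code.

```scala
import java.io.File
import java.net.{URL, URLClassLoader}

// Sketch only: not the actual KyuubiSparkILoop implementation.
object ReplClasspathSketch {

  /** Keep only local jar files visible to the given class loader. */
  def candidateJars(loader: ClassLoader): Array[URL] = loader match {
    case urlLoader: URLClassLoader =>
      urlLoader.getURLs.filter { u =>
        val file = new File(u.getFile)
        // Same shape as the quoted hunk: local, existing files only,
        // skipping the scala-reflect jar the REPL already provides.
        u.getProtocol == "file" && file.isFile &&
          !file.getName.contains("scala-lang_scala-reflect")
      }
    case _ => Array.empty[URL]
  }

  def main(args: Array[String]): Unit = {
    val allJars = candidateJars(Thread.currentThread().getContextClassLoader)
    // Produces the path-separator-joined string seen in the INFO line above.
    println("Adding jars to Scala interpreter's class path: " +
      allJars.mkString(File.pathSeparator))
  }
}
```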
-     s"${spark.sparkContext.getConf.get("spark.repl.class.outputDir")}")
+     s"${spark.sparkContext.getConf.get("spark.repl.class.outputDir")}",
+     "-classpath",
+     allJars.mkString(File.pathSeparator))
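As a companion sketch, this is roughly how such a classpath string can be handed to a Scala 2.12 REPL through compiler options, mirroring the quoted `-Yrepl-outdir`/`-classpath` arguments. The `outputDir` and `allJars` values are placeholders; this is not the actual Kyuubi implementation.

```scala
import java.io.File
import scala.tools.nsc.Settings

object ReplSettingsSketch {
  def main(args: Array[String]): Unit = {
    // Placeholders standing in for spark.repl.class.outputDir and the collected jars.
    val outputDir = new File(System.getProperty("java.io.tmpdir"), "repl-classes").getAbsolutePath
    val allJars = Seq("/tmp/a.jar", "/tmp/b.jar")

    val settings = new Settings
    // Same option list as the quoted hunk: class-based wrappers, an output dir for
    // compiled REPL lines, and the extra jars on the interpreter compiler's classpath.
    val interpArguments = List(
      "-Yrepl-class-based",
      "-Yrepl-outdir", outputDir,
      "-classpath", allJars.mkString(File.pathSeparator))
    settings.processArguments(interpArguments, true)
    settings.usejavacp.value = true

    // The -classpath value lands in settings.classpath, which the REPL's compiler
    // consults when compiling each statement.
    println(settings.classpath.value)
  }
}
```

Such a `Settings` object would then be used to bootstrap the interpreter (for example via `process(settings)` on a Scala 2.12 `ILoop`), which is where the quoted Kyuubi hunk sits.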
Tested in yarn-cluster mode:
24/04/01 20:12:30 INFO KyuubiSparkILoop: Adding jars to Scala interpreter's class path: file:/hadoop/7/yarn/local/usercache/b_stf/appcache/application_1710906451321_1347246/container_e4072_1710906451321_1347246_01_000002/__app__.jar:file:/hadoop/7/yarn/local/usercache/b_stf/appcache/application_1710906451321_1347246/container_e4072_1710906451321_1347246_01_000002/delta-core_2.12-1.0.2.0.3.0.jar:file:/hadoop/7/yarn/local/usercache/b_stf/appcache/application_1710906451321_1347246/container_e4072_1710906451321_1347246_01_000002/iceberg-spark-runtime-3.1_2.12-1.2.1.0.7.1.jar:file:/hadoop/7/yarn/local/usercache/b_stf/appcache/application_1710906451321_1347246/container_e4072_1710906451321_1347246_01_000002/simplark-etl-assembly-latest-testing.jar:file:/hadoop/7/yarn/local/usercache/b_stf/appcache/application_1710906451321_1347246/container_e4072_1710906451321_1347246_01_000002/zeta-serde-0.0.4-jar-with-dependencies.jar
24/04/01 20:12:36 INFO ExecuteScala: Processing b_stf's query[874507af-4dfe-4521-ad26-3caeaa154d28]: PENDING_STATE -> RUNNING_STATE, statement:
import org.apache.iceberg._
24/04/01 20:12:36 INFO OperationAuditLogger: operation=874507af-4dfe-4521-ad26-3caeaa154d28 opType=ExecuteScala state=RUNNING user=b_stf session=973b28ec-52d8-454e-b2f1-45fc95957615
24/04/01 20:12:36 INFO ExecuteScala:
Spark application name: kyuubi_apollorno_CONNECTION_SPARK_SQL_b_stf_973b28ec-52d8-454e-b2f1-45fc95957615
application ID: application_1710906451321_1347246
application web UI: ......
master: yarn
deploy mode: cluster
version: 3.1.1.1.3.0
Start time: 2024-04-01T20:11:29.244
User: b_stf
24/04/01 20:12:36 INFO ExecuteScala: Processing b_stf's query[874507af-4dfe-4521-ad26-3caeaa154d28]: RUNNING_STATE -> FINISHED_STATE, time taken: 0.154 seconds
24/04/01 20:12:36 INFO OperationAuditLogger: operation=874507af-4dfe-4521-ad26-3caeaa154d28 opType=ExecuteScala state=FINISHED user=b_stf session=973b28ec-52d8-454e-b2f1-45fc95957615
24/04/01 20:12:36 INFO ExecuteScala: statementId=874507af-4dfe-4521-ad26-3caeaa154d28, operationRunTime=0 ms, operationCpuTime=0 ms
2024-04-01 20:12:36.439 INFO org.apache.kyuubi.operation.ExecuteStatement: Query[874507af-4dfe-4521-ad26-3caeaa154d28] in FINISHED_STATE
2024-04-01 20:12:36.439 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing b_stf's query[874507af-4dfe-4521-ad26-3caeaa154d28]: RUNNING_STATE -> FINISHED_STATE, time taken: 0.197 seconds
+------------------------------+
| output |
+------------------------------+
| import org.apache.iceberg._ |
+------------------------------+
1 row selected (6.231 seconds)
I tried `this.addUrlsToClassPath(allJars: _*)`, but it does not work for my use case (yarn-cluster mode).
cc @cxzl25
Codecov Report
All modified and coverable lines are covered by tests ✅

@@             Coverage Diff              @@
##             master    #6235      +/-   ##
============================================
- Coverage     58.49%   58.38%    -0.11%
  Complexity       24       24
============================================
  Files           651      651
  Lines         39509    39511        +2
  Branches       5439     5440        +1
============================================
- Hits          23109    23068       -41
- Misses        13932    13966       +34
- Partials       2468     2477        +9

☔ View full report in Codecov by Sentry.
I verified this patch with Spark 3.3 in K8s cluster mode.
    u.getProtocol == "file" && file.isFile &&
      file.getName.contains("scala-lang_scala-reflect")
    }
    this.addUrlsToClassPath(allJars: _*)
Just a question: is there a fundamental difference between `-classpath` and `addUrlsToClassPath`? Why does passing `-classpath` solve this problem?
not sure.
Not sure about the underlying implementation; I checked the Spark and Zeppelin code, and both use `-classpath`.
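For reference, a hedged sketch of where the two mechanisms plug in, without claiming to explain the yarn-cluster failure: `-classpath` ends up in the compiler `Settings` used to build the interpreter, while `addUrlsToClassPath` is an API on an already-created Scala 2.12 interpreter (`IMain`, the same call that backs the `:require` command). The jar chosen below is just a locally available stand-in, not a Kyuubi artifact.

```scala
import java.io.File
import java.net.URL
import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.IMain

object ClasspathMechanismsSketch {
  def main(args: Array[String]): Unit = {
    // A jar that exists locally (the scala-library jar), standing in for a user jar
    // that would normally come from spark.jars.
    val someJar: URL =
      classOf[scala.collection.immutable.List[_]].getProtectionDomain.getCodeSource.getLocation

    // Mechanism 1 (what this PR settled on): the jar is on the compiler's classpath
    // before the interpreter is created; equivalent to passing -classpath.
    val eagerSettings = new Settings
    eagerSettings.usejavacp.value = true
    eagerSettings.classpath.value = new File(someJar.toURI).getAbsolutePath

    // Mechanism 2 (tried earlier in this PR): extend an interpreter that already exists.
    val lateSettings = new Settings
    lateSettings.usejavacp.value = true
    val intp = new IMain(lateSettings)
    intp.addUrlsToClassPath(someJar)
    // (cleanup omitted in this sketch)
  }
}
```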
# 🔍 Description

## Issue References 🔗

This pull request fixes #6223.

Even if the user specifies `spark.jars`, they cannot access the classes in those jars from Scala code.

## Describe Your Solution 🔧

Add the jars to the REPL classpath.

## Types of changes 🔖

- [x] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️

#### Behavior With This Pull Request 🎉

#### Related Unit Tests

UT.

---

# Checklist 📝

- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6235 from turboFei/scala_repl_urls.

Closes #6223

3445026 [Wang, Fei] scala 2.13
cc6e289 [Wang, Fei] todo
a8b3731 [Wang, Fei] refine
65b438c [Wang, Fei] remove scala reflect check
eb257c7 [Wang, Fei] using -classpath
e1c6f0e [Wang, Fei] revert 2.13
15d3766 [Wang, Fei] repl
41ebe10 [Wang, Fei] fix ut
ed5d344 [Wang, Fei] info
1cdd82a [Wang, Fei] comment
aa4292d [Wang, Fei] fix
a10cfa5 [Wang, Fei] ut
63fdb88 [Wang, Fei] Use global.classPath.asURLs instead of class loader urls

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
(cherry picked from commit 9b618c9)
Signed-off-by: Cheng Pan <chengpan@apache.org>
Thanks, merged to master/1.9/1.8 |