Skip to content

Conversation

@turboFei
Copy link
Member

@turboFei turboFei commented Apr 1, 2024

🔍 Description

Issue References 🔗

This pull request fixes #6223

Even the user specify spark.jars, but they can not access the classes in jars with Scala code.

Describe Your Solution 🔧

Add the jars into repl classpath.

Types of changes 🔖

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Test Plan 🧪

Behavior Without This Pull Request ⚰️

Behavior With This Pull Request 🎉

Related Unit Tests

UT.


Checklist 📝

Be nice. Be informative.

@turboFei turboFei changed the title Scala repl urls Fix scala repl issue Apr 1, 2024
@turboFei turboFei changed the title Fix scala repl issue Fix scala repl can not access spark.jars issue Apr 1, 2024
@turboFei turboFei requested review from pan3793 and yaooqinn April 1, 2024 22:13
@turboFei turboFei self-assigned this Apr 1, 2024
@turboFei turboFei changed the title Fix scala repl can not access spark.jars issue Fix Scala interpreter can not access spark.jars issue Apr 1, 2024
!file.getName.contains("scala-lang_scala-reflect")
}
info(s"Adding jars to Scala interpreter's class path: " +
allJars.mkString(File.pathSeparator))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

24/04/01 19:01:30 INFO KyuubiSparkILoop: Adding jars to Scala interpreter's class path: file:/Users/fwang12/todo/kyuubi/externals/kyuubi-spark-sql-engine/target/kyuubi-spark-sql-engine_2.12-1.10.0-SNAPSHOT.jar:file:/var/folders/pr/9x9rt3j974zfn7p4f5dpf8m80000gq/T/kyuubi-32215772-8004-43c1-a070-1cb0efd79bf0/test-function-02417951-6b76-4cf7-bb4a-5fc20372c034.jar

s"${spark.sparkContext.getConf.get("spark.repl.class.outputDir")}")
s"${spark.sparkContext.getConf.get("spark.repl.class.outputDir")}",
"-classpath",
allJars.mkString(File.pathSeparator))
Copy link
Member Author

@turboFei turboFei Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested with yarn-cluster mode:

24/04/01 20:12:30 INFO KyuubiSparkILoop: Adding jars to Scala interpreter's class path: file:/hadoop/7/yarn/local/usercache/b_stf/appcache/application_1710906451321_1347246/container_e4072_1710906451321_1347246_01_000002/__app__.jar:file:/hadoop/7/yarn/local/usercache/b_stf/appcache/application_1710906451321_1347246/container_e4072_1710906451321_1347246_01_000002/delta-core_2.12-1.0.2.0.3.0.jar:file:/hadoop/7/yarn/local/usercache/b_stf/appcache/application_1710906451321_1347246/container_e4072_1710906451321_1347246_01_000002/iceberg-spark-runtime-3.1_2.12-1.2.1.0.7.1.jar:file:/hadoop/7/yarn/local/usercache/b_stf/appcache/application_1710906451321_1347246/container_e4072_1710906451321_1347246_01_000002/simplark-etl-assembly-latest-testing.jar:file:/hadoop/7/yarn/local/usercache/b_stf/appcache/application_1710906451321_1347246/container_e4072_1710906451321_1347246_01_000002/zeta-serde-0.0.4-jar-with-dependencies.jar

24/04/01 20:12:36 INFO ExecuteScala: Processing b_stf's query[874507af-4dfe-4521-ad26-3caeaa154d28]: PENDING_STATE -> RUNNING_STATE, statement:
import org.apache.iceberg._
24/04/01 20:12:36 INFO OperationAuditLogger: operation=874507af-4dfe-4521-ad26-3caeaa154d28	opType=ExecuteScala	state=RUNNING	user=b_stf	session=973b28ec-52d8-454e-b2f1-45fc95957615
24/04/01 20:12:36 INFO ExecuteScala:
           Spark application name: kyuubi_apollorno_CONNECTION_SPARK_SQL_b_stf_973b28ec-52d8-454e-b2f1-45fc95957615
                 application ID: application_1710906451321_1347246
                 application web UI:  ......
                 master: yarn
                 deploy mode: cluster
                 version: 3.1.1.1.3.0
           Start time: 2024-04-01T20:11:29.244
           User: b_stf
24/04/01 20:12:36 INFO ExecuteScala: Processing b_stf's query[874507af-4dfe-4521-ad26-3caeaa154d28]: RUNNING_STATE -> FINISHED_STATE, time taken: 0.154 seconds
24/04/01 20:12:36 INFO OperationAuditLogger: operation=874507af-4dfe-4521-ad26-3caeaa154d28	opType=ExecuteScala	state=FINISHED	user=b_stf	session=973b28ec-52d8-454e-b2f1-45fc95957615
24/04/01 20:12:36 INFO ExecuteScala: statementId=874507af-4dfe-4521-ad26-3caeaa154d28, operationRunTime=0 ms, operationCpuTime=0 ms
2024-04-01 20:12:36.439 INFO org.apache.kyuubi.operation.ExecuteStatement: Query[874507af-4dfe-4521-ad26-3caeaa154d28] in FINISHED_STATE
2024-04-01 20:12:36.439 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing b_stf's query[874507af-4dfe-4521-ad26-3caeaa154d28]: RUNNING_STATE -> FINISHED_STATE, time taken: 0.197 seconds
+------------------------------+
|            output            |
+------------------------------+
| import org.apache.iceberg._  |
+------------------------------+
1 row selected (6.231 seconds)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this.addUrlsToClassPath(allJars: _*), it does not work for my use case - yarn-cluster mode.

@pan3793
Copy link
Member

pan3793 commented Apr 2, 2024

cc @cxzl25

@pan3793 pan3793 changed the title Fix Scala interpreter can not access spark.jars issue [KYUUBI #6223] Fix Scala interpreter can not access spark.jars issue Apr 2, 2024
@codecov-commenter
Copy link

codecov-commenter commented Apr 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 58.38%. Comparing base (d0448f2) to head (3445026).
Report is 3 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #6235      +/-   ##
============================================
- Coverage     58.49%   58.38%   -0.11%     
  Complexity       24       24              
============================================
  Files           651      651              
  Lines         39509    39511       +2     
  Branches       5439     5440       +1     
============================================
- Hits          23109    23068      -41     
- Misses        13932    13966      +34     
- Partials       2468     2477       +9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pan3793
Copy link
Member

pan3793 commented Apr 2, 2024

I verified this patch with Spark(3.3) on K8s cluster mode.

u.getProtocol == "file" && file.isFile &&
file.getName.contains("scala-lang_scala-reflect")
}
this.addUrlsToClassPath(allJars: _*)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question, Is there a fundamental difference between -classpath and addUrlsToClassPath ? Why can -classpath solve this problem in this way?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the underlying implementation, I checked the Spark and Zeppelin code, both use -classpath

@pan3793 pan3793 added this to the v1.8.2 milestone Apr 3, 2024
@pan3793 pan3793 closed this in 9b618c9 Apr 3, 2024
pan3793 pushed a commit that referenced this pull request Apr 3, 2024
# 🔍 Description
## Issue References 🔗

This pull request fixes #6223

Even the user specify `spark.jars`, but they can not access the classes in jars with Scala code.

## Describe Your Solution 🔧

Add the jars into repl classpath.

## Types of changes 🔖

- [x] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️

#### Behavior With This Pull Request 🎉

#### Related Unit Tests
UT.

---

# Checklist 📝

- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6235 from turboFei/scala_repl_urls.

Closes #6223

3445026 [Wang, Fei] scala 2.13
cc6e289 [Wang, Fei] todo
a8b3731 [Wang, Fei] refine
65b438c [Wang, Fei] remove scala reflect check
eb257c7 [Wang, Fei] using -classpath
e1c6f0e [Wang, Fei] revert 2.13
15d3766 [Wang, Fei] repl
41ebe10 [Wang, Fei] fix ut
ed5d344 [Wang, Fei] info
1cdd82a [Wang, Fei] comment
aa4292d [Wang, Fei] fix
a10cfa5 [Wang, Fei] ut
63fdb88 [Wang, Fei] Use global.classPath.asURLs instead of class loader urls

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
(cherry picked from commit 9b618c9)
Signed-off-by: Cheng Pan <chengpan@apache.org>
pan3793 pushed a commit that referenced this pull request Apr 3, 2024
This pull request fixes #6223

Even the user specify `spark.jars`, but they can not access the classes in jars with Scala code.

Add the jars into repl classpath.

- [x] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

UT.

---

- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6235 from turboFei/scala_repl_urls.

Closes #6223

3445026 [Wang, Fei] scala 2.13
cc6e289 [Wang, Fei] todo
a8b3731 [Wang, Fei] refine
65b438c [Wang, Fei] remove scala reflect check
eb257c7 [Wang, Fei] using -classpath
e1c6f0e [Wang, Fei] revert 2.13
15d3766 [Wang, Fei] repl
41ebe10 [Wang, Fei] fix ut
ed5d344 [Wang, Fei] info
1cdd82a [Wang, Fei] comment
aa4292d [Wang, Fei] fix
a10cfa5 [Wang, Fei] ut
63fdb88 [Wang, Fei] Use global.classPath.asURLs instead of class loader urls

Authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
(cherry picked from commit 9b618c9)
Signed-off-by: Cheng Pan <chengpan@apache.org>
@pan3793
Copy link
Member

pan3793 commented Apr 3, 2024

Thanks, merged to master/1.9/1.8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[TASK][EASY] Jars added by spark.jars can not be used in Scala mode

5 participants