LIVY-246. Support multiple Spark environments in Livy #232

Open
wants to merge 5 commits into master

Conversation

jerryshao (Contributor) commented Nov 17, 2016

This PR proposes adding a SparkEnvironment abstraction to isolate Spark-related configurations and libraries, and extends it so that a LivyServer can host multiple SparkEnvironments.

Why

Currently Livy supports different Sparks with one build: the user can configure a different Spark home to choose which Spark to run, but this still requires stopping LivyServer, updating the configuration, and restarting it. To further improve Livy's usability, it would be better to support different Sparks at runtime: when creating a session, the user could specify which Spark they want, and Livy would pick the right Spark and start the application.

How to use

To enable this, we extend the Livy configuration to support multiple Spark profiles; the user can configure:

livy.server.spark-env.test.spark-home = xxx or TEST_SPARK_HOME = xxx
livy.server.spark-env.test.spark-conf-dir = xxx or TEST_SPARK_CONF_DIR = xxx

and

livy.server.spark-env.production.spark-home = xxx or PRODUCTION_SPARK_HOME = xxx
livy.server.spark-env.production.spark-conf-dir = xxx or PRODUCTION_SPARK_CONF_DIR = xxx

Internally Livy will create two Spark environments, "test" and "production". When a user issues a session creation request, they can specify sparkEnv as "test" in the JSON body; Livy will then pick the right Spark environment and start the application.
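
For example (a sketch based on the description above, not necessarily the exact request format in this PR), a session creation request targeting the "test" environment could look like:

POST /sessions
{
  "kind": "spark",
  "sparkEnv": "test"
}

Here kind follows Livy's existing session creation API, while sparkEnv is the new field described in this PR.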

To be compatible with existing configurations and tests, if the user configured:

livy.server.spark-home = xxx or SPARK_HOME = xxx

This is equivalent to:

livy.server.spark-env.default.spark-home = xxx or DEFAULT_SPARK_HOME = xxx

Livy will treat this as the "default" Spark environment. If the user doesn't specify sparkEnv in the JSON body, the default one will be picked.
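
Putting the two parts together, a livy.conf combining the legacy key with one extra environment might look like the following sketch (paths are placeholders):

# Legacy key, treated as the "default" Spark environment
livy.server.spark-home = /path/to/default/spark

# An additional "production" Spark environment
livy.server.spark-env.production.spark-home = /path/to/production/spark
livy.server.spark-env.production.spark-conf-dir = /path/to/production/spark/conf

A session created without sparkEnv would then run against the default Spark, while one created with sparkEnv set to "production" would run against the production Spark.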

Implementation

To achieve this, I introduced a SparkEnvironment class; one LivyServer can hold multiple SparkEnvironments based on the configuration. I also refactored the code to move Spark-related logic into this class.
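
As a rough illustration of the idea (the class shape and helper names below are only a sketch, not the actual code in this PR), resolving the configured environments from the Livy configuration could look like:

// Illustrative sketch only; not the implementation in this PR.
// Builds one SparkEnvironment per configured profile, falling back to the
// legacy livy.server.spark-home key as the "default" environment.
case class SparkEnvironment(name: String, sparkHome: String, sparkConfDir: Option[String])

object SparkEnvironment {
  private val SparkHomeKey = """livy\.server\.spark-env\.([^.]+)\.spark-home""".r

  def load(conf: Map[String, String]): Map[String, SparkEnvironment] = {
    // Profiles configured via livy.server.spark-env.<name>.spark-home, plus
    // "default" when the legacy livy.server.spark-home key is present.
    val profiles = conf.keys.collect { case SparkHomeKey(name) => name }.toSet ++
      (if (conf.contains("livy.server.spark-home")) Set("default") else Set.empty[String])

    profiles.map { name =>
      val home = conf.getOrElse(
        s"livy.server.spark-env.$name.spark-home", conf("livy.server.spark-home"))
      val confDir = conf.get(s"livy.server.spark-env.$name.spark-conf-dir")
      name -> SparkEnvironment(name, home, confDir)
    }.toMap
  }
}

LivyServer would then hold such a map and, when launching an application, pick the entry matching the session's sparkEnv (or "default" when none is given), matching the behavior described above.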

Limitation

Some configurations, such as the Spark master and deploy mode, currently cannot be set per Spark environment.

Work done and to be done

  • Code implementation for isolated Spark environments.
  • Add unit tests.
  • Change launching scripts.
  • Change docs and configuration files.

Please review and suggest, thanks a lot.

codecov-io commented Nov 17, 2016

Codecov Report

Merging #232 into master will increase coverage by 0.57%.
The diff coverage is 67.2%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #232      +/-   ##
============================================
+ Coverage     70.46%   71.03%   +0.57%     
- Complexity      726      742      +16     
============================================
  Files            96       97       +1     
  Lines          5123     5183      +60     
  Branches        774      779       +5     
============================================
+ Hits           3610     3682      +72     
+ Misses          996      981      -15     
- Partials        517      520       +3
Impacted Files | Coverage Δ | Complexity Δ
.../com/cloudera/livy/utils/SparkProcessBuilder.scala | 58.42% <ø> (+1.28%) | 13 <0> (+1) ⬆️
.../main/scala/com/cloudera/livy/utils/SparkApp.scala | 80.76% <ø> (ø) | 1 <0> (ø) ⬇️
...in/scala/com/cloudera/livy/server/LivyServer.scala | 35.52% <ø> (-1.9%) | 9 <0> (ø)
...a/com/cloudera/livy/utils/LineBufferedStream.scala | 78.37% <ø> (ø) | 5 <0> (ø) ⬇️
...n/scala/com/cloudera/livy/utils/SparkProcApp.scala | 38.88% <ø> (ø) | 5 <0> (ø) ⬇️
.../com/cloudera/livy/utils/LineBufferedProcess.scala | 73.33% <ø> (ø) | 7 <0> (ø) ⬇️
...n/scala/com/cloudera/livy/utils/SparkYarnApp.scala | 61.7% <ø> (ø) | 29 <0> (ø) ⬇️
...er/src/main/scala/com/cloudera/livy/LivyConf.scala | 93.2% <100%> (-1.16%) | 14 <0> (-2)
...scala/com/cloudera/livy/utils/LivySparkUtils.scala | 69.01% <100%> (-0.85%) | 0 <0> (ø)
...ain/scala/com/cloudera/livy/sessions/Session.scala | 73.33% <100%> (ø) | 17 <0> (ø) ⬇️
... and 16 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2abb8a3...19381f8.

@jerryshao jerryshao changed the title from "LIVY-246. WIP. Support multiple Spark environments in Livy" to "LIVY-246. Support multiple Spark environments in Livy" on Nov 21, 2016
@jerryshao jerryshao closed this Nov 22, 2016
@jerryshao jerryshao reopened this Nov 22, 2016
@alex-the-man alex-the-man added this to the 0.4 milestone Dec 16, 2016
Change-Id: Ib05f074f2470f67101c76d7de7c3c41c3c9632a7
Change-Id: I8100820a7ac05a6568a1c05fe63f4b05c0cf5278
Change-Id: Id8b410a90117014aaea7c1b8496e54bcef4cdddc
Change-Id: I6ef0650f135c0dbf5e57437ef6cd93c98d9352e7
Change-Id: I0e66ff509903ab276ca816818e4b288b45699213