40 changes: 20 additions & 20 deletions conf/zeppelin-env.cmd.template
@@ -17,35 +17,35 @@ REM limitations under the License.
REM

REM set JAVA_HOME=
REM set MASTER= REM Spark master url. eg. spark://master_addr:7077. Leave empty if you want to use local mode.
REM set ZEPPELIN_JAVA_OPTS REM Additional jvm options. for example, set ZEPPELIN_JAVA_OPTS="-Dspark.executor.memory=8g -Dspark.cores.max=16"
REM set ZEPPELIN_MEM REM Zeppelin jvm mem options Default -Xms1024m -Xmx1024m -XX:MaxMetaspaceSize=512m
REM set ZEPPELIN_INTP_MEM REM zeppelin interpreter process jvm mem options. Default -Xmx1024m -Xms1024m -XX:MaxMetaspaceSize=512m
REM set ZEPPELIN_INTP_JAVA_OPTS REM zeppelin interpreter process jvm options.
REM set ZEPPELIN_JMX_ENABLE REM Enable JMX feature by defining it like "true"
REM set ZEPPELIN_JMX_PORT REM Port number which JMX uses. If not set, JMX won't be enabled
REM set SPARK_MASTER= REM Spark master url. eg. spark://master_addr:7077. Leave empty if you want to use local mode.
REM set ZEPPELIN_JAVA_OPTS REM Additional jvm options. for example, set ZEPPELIN_JAVA_OPTS="-Dspark.executor.memory=8g -Dspark.cores.max=16"
REM set ZEPPELIN_MEM REM Zeppelin jvm mem options Default -Xms1024m -Xmx1024m -XX:MaxMetaspaceSize=512m
REM set ZEPPELIN_INTP_MEM REM zeppelin interpreter process jvm mem options. Default -Xmx1024m -Xms1024m -XX:MaxMetaspaceSize=512m
REM set ZEPPELIN_INTP_JAVA_OPTS REM zeppelin interpreter process jvm options.
REM set ZEPPELIN_JMX_ENABLE REM Enable JMX feature by defining it like "true"
REM set ZEPPELIN_JMX_PORT REM Port number which JMX uses. If not set, JMX won't be enabled

REM set ZEPPELIN_LOG_DIR REM Where log files are stored. PWD by default.
REM set ZEPPELIN_PID_DIR REM The pid files are stored. /tmp by default.
REM set ZEPPELIN_WAR_TEMPDIR REM The location of jetty temporary directory.
REM set ZEPPELIN_NOTEBOOK_DIR REM Where notebook saved
REM set ZEPPELIN_NOTEBOOK_HOMESCREEN REM Id of notebook to be displayed in homescreen. ex) 2A94M5J1Z
REM set ZEPPELIN_NOTEBOOK_HOMESCREEN_HIDE REM hide homescreen notebook from list when this value set to "true". default "false"
REM set ZEPPELIN_LOG_DIR REM Where log files are stored. PWD by default.
REM set ZEPPELIN_PID_DIR REM The pid files are stored. /tmp by default.
REM set ZEPPELIN_WAR_TEMPDIR REM The location of jetty temporary directory.
REM set ZEPPELIN_NOTEBOOK_DIR REM Where notebook saved
REM set ZEPPELIN_NOTEBOOK_HOMESCREEN REM Id of notebook to be displayed in homescreen. ex) 2A94M5J1Z
REM set ZEPPELIN_NOTEBOOK_HOMESCREEN_HIDE REM hide homescreen notebook from list when this value set to "true". default "false"
REM set ZEPPELIN_NOTEBOOK_S3_BUCKET REM Bucket where notebook saved
REM set ZEPPELIN_NOTEBOOK_S3_USER REM User in bucket where notebook saved. For example bucket/user/notebook/2A94M5J1Z/note.json
REM set ZEPPELIN_NOTEBOOK_S3_ENDPOINT REM Endpoint of the bucket
REM set ZEPPELIN_NOTEBOOK_S3_KMS_KEY_ID REM AWS KMS key ID
REM set ZEPPELIN_NOTEBOOK_S3_KMS_KEY_REGION REM AWS KMS key region
REM set ZEPPELIN_NOTEBOOK_S3_SSE REM Server-side encryption enabled for notebooks
REM set ZEPPELIN_IDENT_STRING REM A string representing this instance of zeppelin. $USER by default.
REM set ZEPPELIN_NICENESS REM The scheduling priority for daemons. Defaults to 0.
REM set ZEPPELIN_IDENT_STRING REM A string representing this instance of zeppelin. $USER by default.
REM set ZEPPELIN_NICENESS REM The scheduling priority for daemons. Defaults to 0.
REM set ZEPPELIN_INTERPRETER_LOCALREPO REM Local repository for interpreter's additional dependency loading
REM set ZEPPELIN_INTERPRETER_DEP_MVNREPO REM Maven principal repository for interpreter's additional dependency loading
REM set ZEPPELIN_HELIUM_NODE_INSTALLER_URL REM Remote Node installer url for Helium dependency loader
REM set ZEPPELIN_HELIUM_NPM_INSTALLER_URL REM Remote Npm installer url for Helium dependency loader
REM set ZEPPELIN_HELIUM_YARNPKG_INSTALLER_URL REM Remote Yarn package installer url for Helium dependency loader
REM set ZEPPELIN_NOTEBOOK_STORAGE REM Refers to pluggable notebook storage class, can have two classes simultaneously with a sync between them (e.g. local and remote).
REM set ZEPPELIN_NOTEBOOK_ONE_WAY_SYNC REM If there are multiple notebook storages, should we treat the first one as the only source of truth?
REM set ZEPPELIN_NOTEBOOK_STORAGE REM Refers to pluggable notebook storage class, can have two classes simultaneously with a sync between them (e.g. local and remote).
REM set ZEPPELIN_NOTEBOOK_ONE_WAY_SYNC REM If there are multiple notebook storages, should we treat the first one as the only source of truth?


REM Spark interpreter configuration
@@ -62,10 +62,10 @@ REM without SPARK_HOME defined, Zeppelin still able to run spark interpreter pro
REM however, it is not encouraged when you can define SPARK_HOME
REM
REM Options read in YARN client mode
REM set HADOOP_CONF_DIR REM yarn-site.xml is located in configuration directory in HADOOP_CONF_DIR.
REM set HADOOP_CONF_DIR REM yarn-site.xml is located in configuration directory in HADOOP_CONF_DIR.
REM Pyspark (supported with Spark 1.2.1 and above)
REM To configure pyspark, you need to set spark distribution's path to 'spark.home' property in Interpreter setting screen in Zeppelin GUI
REM set PYSPARK_PYTHON REM path to the python command. must be the same path on the driver(Zeppelin) and all workers.
REM set PYSPARK_PYTHON REM path to the python command. must be the same path on the driver(Zeppelin) and all workers.
REM set PYTHONPATH

REM Spark interpreter options
@@ -77,6 +77,6 @@ REM set ZEPPELIN_SPARK_MAXRESULT REM Max number of Spark SQL result to dis

REM ZeppelinHub connection configuration
REM
REM set ZEPPELINHUB_API_ADDRESS REM Refers to the address of the ZeppelinHub service in use
REM set ZEPPELINHUB_API_ADDRESS REM Refers to the address of the ZeppelinHub service in use
REM set ZEPPELINHUB_API_TOKEN REM Refers to the Zeppelin instance token of the user
REM set ZEPPELINHUB_USER_KEY REM Optional, when using Zeppelin with authentication.
78 changes: 39 additions & 39 deletions conf/zeppelin-env.sh.template
@@ -17,50 +17,50 @@
#

# export JAVA_HOME=
# export MASTER= # Spark master url. eg. spark://master_addr:7077. Leave empty if you want to use local mode.
# export ZEPPELIN_ADDR # Bind address (default 127.0.0.1)
# export ZEPPELIN_PORT # port number to listen (default 8080)
# export ZEPPELIN_LOCAL_IP # Zeppelin's thrift server ip address, if not specified, one random IP address will be choosen.
# export ZEPPELIN_JAVA_OPTS # Additional jvm options. for example, export ZEPPELIN_JAVA_OPTS="-Dspark.executor.memory=8g -Dspark.cores.max=16"
# export ZEPPELIN_MEM # Zeppelin jvm mem options Default -Xms1024m -Xmx1024m -XX:MaxMetaspaceSize=512m
# export ZEPPELIN_INTP_MEM # zeppelin interpreter process jvm mem options. Default -Xms1024m -Xmx1024m -XX:MaxMetaspaceSize=512m
# export ZEPPELIN_INTP_JAVA_OPTS # zeppelin interpreter process jvm options.
# export ZEPPELIN_SSL_PORT # ssl port (used when ssl environment variable is set to true)
# export ZEPPELIN_JMX_ENABLE # Enable JMX feature by defining "true"
# export ZEPPELIN_JMX_PORT # Port number which JMX uses. If not set, JMX won't be enabled

# export ZEPPELIN_LOG_DIR # Where log files are stored. PWD by default.
# export ZEPPELIN_PID_DIR # The pid files are stored. ${ZEPPELIN_HOME}/run by default.
# export ZEPPELIN_WAR_TEMPDIR # The location of jetty temporary directory.
# export ZEPPELIN_NOTEBOOK_DIR # Where notebook saved
# export ZEPPELIN_NOTEBOOK_HOMESCREEN # Id of notebook to be displayed in homescreen. ex) 2A94M5J1Z
# export ZEPPELIN_NOTEBOOK_HOMESCREEN_HIDE # hide homescreen notebook from list when this value set to "true". default "false"

# export ZEPPELIN_NOTEBOOK_S3_BUCKET # Bucket where notebook saved
# export ZEPPELIN_NOTEBOOK_S3_ENDPOINT # Endpoint of the bucket
# export ZEPPELIN_NOTEBOOK_S3_USER # User in bucket where notebook saved. For example bucket/user/notebook/2A94M5J1Z/note.json
# export ZEPPELIN_NOTEBOOK_S3_KMS_KEY_ID # AWS KMS key ID
# export ZEPPELIN_NOTEBOOK_S3_KMS_KEY_REGION # AWS KMS key region
# export ZEPPELIN_NOTEBOOK_S3_SSE # Server-side encryption enabled for notebooks
# export SPARK_MASTER= # Spark master url. eg. spark://master_addr:7077. Leave empty if you want to use local mode.
# export ZEPPELIN_ADDR # Bind address (default 127.0.0.1)
# export ZEPPELIN_PORT # port number to listen (default 8080)
# export ZEPPELIN_LOCAL_IP # Zeppelin's thrift server ip address, if not specified, one random IP address will be choosen.
# export ZEPPELIN_JAVA_OPTS # Additional jvm options. for example, export ZEPPELIN_JAVA_OPTS="-Dspark.executor.memory=8g -Dspark.cores.max=16"
# export ZEPPELIN_MEM # Zeppelin jvm mem options Default -Xms1024m -Xmx1024m -XX:MaxMetaspaceSize=512m
# export ZEPPELIN_INTP_MEM # zeppelin interpreter process jvm mem options. Default -Xms1024m -Xmx1024m -XX:MaxMetaspaceSize=512m
# export ZEPPELIN_INTP_JAVA_OPTS # zeppelin interpreter process jvm options.
# export ZEPPELIN_SSL_PORT # ssl port (used when ssl environment variable is set to true)
# export ZEPPELIN_JMX_ENABLE # Enable JMX feature by defining "true"
# export ZEPPELIN_JMX_PORT # Port number which JMX uses. If not set, JMX won't be enabled

# export ZEPPELIN_LOG_DIR # Where log files are stored. PWD by default.
# export ZEPPELIN_PID_DIR # The pid files are stored. ${ZEPPELIN_HOME}/run by default.
# export ZEPPELIN_WAR_TEMPDIR # The location of jetty temporary directory.
# export ZEPPELIN_NOTEBOOK_DIR # Where notebook saved
# export ZEPPELIN_NOTEBOOK_HOMESCREEN # Id of notebook to be displayed in homescreen. ex) 2A94M5J1Z
# export ZEPPELIN_NOTEBOOK_HOMESCREEN_HIDE # hide homescreen notebook from list when this value set to "true". default "false"

# export ZEPPELIN_NOTEBOOK_S3_BUCKET # Bucket where notebook saved
# export ZEPPELIN_NOTEBOOK_S3_ENDPOINT # Endpoint of the bucket
# export ZEPPELIN_NOTEBOOK_S3_USER # User in bucket where notebook saved. For example bucket/user/notebook/2A94M5J1Z/note.json
# export ZEPPELIN_NOTEBOOK_S3_KMS_KEY_ID # AWS KMS key ID
# export ZEPPELIN_NOTEBOOK_S3_KMS_KEY_REGION # AWS KMS key region
# export ZEPPELIN_NOTEBOOK_S3_SSE # Server-side encryption enabled for notebooks

# export ZEPPELIN_NOTEBOOK_GCS_STORAGE_DIR # GCS "directory" (prefix) under which notebooks are saved. E.g. gs://example-bucket/path/to/dir
# export GOOGLE_APPLICATION_CREDENTIALS # Provide a service account key file for GCS and BigQuery API calls (overrides application default credentials)

# export ZEPPELIN_NOTEBOOK_MONGO_URI # MongoDB connection URI used to connect to a MongoDB database server. Default "mongodb://localhost"
# export ZEPPELIN_NOTEBOOK_MONGO_DATABASE # Database name to store notebook. Default "zeppelin"
# export ZEPPELIN_NOTEBOOK_MONGO_COLLECTION # Collection name to store notebook. Default "notes"
# export ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT # If "true" import local notes under ZEPPELIN_NOTEBOOK_DIR on startup. Default "false"
# export ZEPPELIN_NOTEBOOK_MONGO_URI # MongoDB connection URI used to connect to a MongoDB database server. Default "mongodb://localhost"
# export ZEPPELIN_NOTEBOOK_MONGO_DATABASE # Database name to store notebook. Default "zeppelin"
# export ZEPPELIN_NOTEBOOK_MONGO_COLLECTION # Collection name to store notebook. Default "notes"
# export ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT # If "true" import local notes under ZEPPELIN_NOTEBOOK_DIR on startup. Default "false"

# export ZEPPELIN_IDENT_STRING # A string representing this instance of zeppelin. $USER by default.
# export ZEPPELIN_NICENESS # The scheduling priority for daemons. Defaults to 0.
# export ZEPPELIN_IDENT_STRING # A string representing this instance of zeppelin. $USER by default.
# export ZEPPELIN_NICENESS # The scheduling priority for daemons. Defaults to 0.
# export ZEPPELIN_INTERPRETER_LOCALREPO # Local repository for interpreter's additional dependency loading
# export ZEPPELIN_INTERPRETER_DEP_MVNREPO # Remote principal repository for interpreter's additional dependency loading
# export ZEPPELIN_HELIUM_NODE_INSTALLER_URL # Remote Node installer url for Helium dependency loader
# export ZEPPELIN_HELIUM_NPM_INSTALLER_URL # Remote Npm installer url for Helium dependency loader
# export ZEPPELIN_HELIUM_YARNPKG_INSTALLER_URL # Remote Yarn package installer url for Helium dependency loader
# export ZEPPELIN_NOTEBOOK_STORAGE # Refers to pluggable notebook storage class, can have two classes simultaneously with a sync between them (e.g. local and remote).
# export ZEPPELIN_NOTEBOOK_ONE_WAY_SYNC # If there are multiple notebook storages, should we treat the first one as the only source of truth?
# export ZEPPELIN_NOTEBOOK_PUBLIC # Make notebook public by default when created, private otherwise
# export ZEPPELIN_NOTEBOOK_STORAGE # Refers to pluggable notebook storage class, can have two classes simultaneously with a sync between them (e.g. local and remote).
# export ZEPPELIN_NOTEBOOK_ONE_WAY_SYNC # If there are multiple notebook storages, should we treat the first one as the only source of truth?
# export ZEPPELIN_NOTEBOOK_PUBLIC # Make notebook public by default when created, private otherwise

# export DOCKER_TIME_ZONE # Set to the same time zone as the zeppelin server. E.g, "America/New_York" or "Asia/Shanghai"

@@ -84,10 +84,10 @@
## however, it is not encouraged when you can define SPARK_HOME
##
# Options read in YARN client mode
# export HADOOP_CONF_DIR # yarn-site.xml is located in configuration directory in HADOOP_CONF_DIR.
# export HADOOP_CONF_DIR # yarn-site.xml is located in configuration directory in HADOOP_CONF_DIR.
# Pyspark (supported with Spark 1.2.1 and above)
# To configure pyspark, you need to set spark distribution's path to 'spark.home' property in Interpreter setting screen in Zeppelin GUI
# export PYSPARK_PYTHON # path to the python command. must be the same path on the driver(Zeppelin) and all workers.
# export PYSPARK_PYTHON # path to the python command. must be the same path on the driver(Zeppelin) and all workers.
# export PYTHONPATH

## Spark interpreter options ##
@@ -106,9 +106,9 @@
# export HBASE_CONF_DIR= # (optional) Alternatively, configuration directory can be set to point to the directory that has hbase-site.xml

#### ZeppelinHub connection configuration ####
# export ZEPPELINHUB_API_ADDRESS # Refers to the address of the ZeppelinHub service in use
# export ZEPPELINHUB_API_TOKEN # Refers to the Zeppelin instance token of the user
# export ZEPPELINHUB_USER_KEY # Optional, when using Zeppelin with authentication.
# export ZEPPELINHUB_API_ADDRESS # Refers to the address of the ZeppelinHub service in use
# export ZEPPELINHUB_API_TOKEN # Refers to the Zeppelin instance token of the user
# export ZEPPELINHUB_USER_KEY # Optional, when using Zeppelin with authentication.

#### Zeppelin impersonation configuration
# export ZEPPELIN_IMPERSONATE_CMD # Optional, when user want to run interpreter as end web user. eg) 'sudo -H -u ${ZEPPELIN_IMPERSONATE_USER} bash -c '
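
For reference, a minimal sketch of how the renamed variable might be set in `conf/zeppelin-env.sh`, assuming Zeppelin picks up `SPARK_MASTER` as the template comment suggests. The `resolve_master` helper is hypothetical and only illustrates the "leave empty for local mode" behaviour described in the template; it is not part of Zeppelin.

```shell
# Hypothetical excerpt for conf/zeppelin-env.sh.
# Keep any value already present in the environment; an empty value means local mode.
export SPARK_MASTER="${SPARK_MASTER:-}"

# resolve_master is an illustrative helper, not a Zeppelin function.
# It prints the configured master URL, or local[*] when none is set.
resolve_master() {
  if [ -n "${SPARK_MASTER:-}" ]; then
    printf '%s\n' "$SPARK_MASTER"
  else
    printf '%s\n' 'local[*]'
  fi
}
```

With `SPARK_MASTER=spark://master_addr:7077` the helper prints that URL unchanged; with the variable empty it falls back to `local[*]`, the default shown for `spark.master` in the interpreter documentation.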
4 changes: 2 additions & 2 deletions docs/interpreter/spark.md
@@ -81,7 +81,7 @@ You can also set other Spark properties which are not listed in the table. For a
<td>Location of spark distribution</td>
<tr>
<tr>
<td>master</td>
<td>spark.master</td>
<td>local[*]</td>
<td>Spark master uri. <br/> e.g. spark://master_host:7077</td>
<tr>
@@ -248,7 +248,7 @@ configuration with code together for more flexibility. e.g.
</center>

### Set master in Interpreter menu
After starting Zeppelin, go to **Interpreter** menu and edit **master** property in your Spark interpreter setting. The value may vary depending on your Spark cluster deployment type.
After starting Zeppelin, go to **Interpreter** menu and edit **spark.master** property in your Spark interpreter setting. The value may vary depending on your Spark cluster deployment type.

For example,

4 changes: 2 additions & 2 deletions docs/quickstart/kubernetes.md
@@ -113,7 +113,7 @@ And then start your spark interpreter
sc.parallelize(1 to 100).count
...
```
While `master` property of SparkInterpreter starts with `k8s://` (default `k8s://https://kubernetes.default.svc` when Zeppelin started using zeppelin-server.yaml), Spark executors will be automatically created in your Kubernetes cluster.
While `spark.master` property of SparkInterpreter starts with `k8s://` (default `k8s://https://kubernetes.default.svc` when Zeppelin started using zeppelin-server.yaml), Spark executors will be automatically created in your Kubernetes cluster.
Spark UI is accessible by clicking `SPARK JOB` on the Paragraph.

Check [here](https://spark.apache.org/docs/latest/running-on-kubernetes.html) to know more about Running Spark on Kubernetes.
@@ -192,7 +192,7 @@ and all interpreter properties are accessible inside the templates.

When interpreter group is `spark`, Zeppelin sets necessary spark configuration automatically to use Spark on Kubernetes.
It uses client mode, so Spark interpreter Pod works as a Spark driver, spark executors are launched in separate Pods.
This auto configuration can be overrided by manually setting `master` property of Spark interpreter.
This auto configuration can be overrided by manually setting `spark.master` property of Spark interpreter.
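
As a rough illustration of the prefix check described in this file, the sketch below classifies a `spark.master` value by its scheme. `classify_master` is a hypothetical helper written for this note, not Zeppelin code, and it only covers the master forms that appear in these docs (`k8s://`, `spark://`, `yarn-client`, `local[*]`).

```shell
# classify_master is a hypothetical helper for illustration only.
# It maps a spark.master value to the deployment type implied by its prefix.
classify_master() {
  case "$1" in
    k8s://*)   echo "kubernetes" ;;  # executors are created in the Kubernetes cluster
    spark://*) echo "standalone" ;;  # Spark standalone master URL
    yarn*)     echo "yarn" ;;        # e.g. yarn-client
    local*)    echo "local" ;;       # e.g. local[*]
    *)         echo "unknown" ;;
  esac
}
```

For example, `classify_master 'k8s://https://kubernetes.default.svc'` prints `kubernetes`, which is the case where the docs say executors are launched automatically.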


### Accessing Spark UI (or Service running in interpreter Pod)
3 changes: 1 addition & 2 deletions docs/setup/deployment/cdh.md
@@ -76,14 +76,13 @@ To verify the application is running well, check the web UI for HDFS on `http://
Set following configurations to `conf/zeppelin-env.sh`.

```bash
export MASTER=yarn-client
export HADOOP_CONF_DIR=[your_hadoop_conf_path]
export SPARK_HOME=[your_spark_home_path]
```

`HADOOP_CONF_DIR`(Hadoop configuration path) is defined in `/scripts/docker/spark-cluster-managers/cdh/hdfs_conf`.

Don't forget to set Spark `master` as `yarn-client` in Zeppelin **Interpreters** setting page like below.
Don't forget to set Spark `spark.master` as `yarn-client` in Zeppelin **Interpreters** setting page like below.

<img src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/zeppelin_yarn_conf.png" />

2 changes: 1 addition & 1 deletion docs/setup/deployment/flink_and_spark_cluster.md
@@ -394,7 +394,7 @@ Open a web browser and go to the Zeppelin web-ui at http://yourip:8080.
Now go back to the Zeppelin web-ui at http://`yourip`:8080 and this time click on *anonymous* at the top right, which will open a drop-down menu, select *Interpreters* to enter interpreter configuration.

In the Spark section, click the edit button in the top right corner to make the property values editable (looks like a pencil).
The only field that needs to be edited in the Spark interpreter is the master field. Change this value from `local[*]` to the URL you used to start the slave, mine was `spark://ubuntu:7077`.
The only field that needs to be edited in the Spark interpreter is the `spark.master` field. Change this value from `local[*]` to the URL you used to start the slave, mine was `spark://ubuntu:7077`.

Click *Save* to update the parameters, and click *OK* when it asks you about restarting the interpreter.
