varOne

Join the chat at https://gitter.im/SparkMonitor/varOne

This is an Apache Spark web monitoring tool, named varOne.

varOne provides a web UI that lets you monitor the metrics of Spark applications more efficiently and easily. varOne ingests the Spark event logs and metrics data and summarizes them as rich charts. If you don't want to use the web UI, you can use the RESTful APIs provided by varOne and build a custom one yourself.

Usage

Prerequisites

  • Spark on YARN (we will relax this restriction in the near future)
    • Currently only yarn-client mode is supported.
  • JDK 7 or later
  • metrics.properties should enable CsvSink for all instances; you can use the following configuration:
*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
*.sink.csv.period=1
*.sink.csv.unit=seconds
*.sink.csv.directory=/path/to/CSV_SINK
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource

Tip: metrics.properties lives in $SPARK_HOME/conf (if it does not exist yet, copy it from metrics.properties.template).

  • Set the following flags when using spark-submit:

    --files=/path/to/metrics.properties --conf spark.metrics.conf=metrics.properties
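
    For example, a full spark-submit invocation might look like this (the application jar, main class, and paths are placeholders):

    ```shell
    # Submit in yarn-client mode (the only mode varOne currently supports),
    # shipping metrics.properties to the cluster and pointing Spark at it.
    spark-submit \
      --master yarn-client \
      --class com.example.MyApp \
      --files /path/to/metrics.properties \
      --conf spark.metrics.conf=metrics.properties \
      /path/to/my-app.jar
    ```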

    or set the same properties in your application. For example:

    import org.apache.spark.{SparkConf, SparkContext}

    val sparkConf = new SparkConf()
      .set("spark.metrics.conf.*.sink.csv.class", "org.apache.spark.metrics.sink.CsvSink")
      .set("spark.metrics.conf.*.sink.csv.period", "1")
      .set("spark.metrics.conf.*.sink.csv.unit", "seconds")
      .set("spark.metrics.conf.*.sink.csv.directory", "/path/to/CSV_SINK")
      .set("spark.metrics.conf.driver.source.jvm.class", "org.apache.spark.metrics.source.JvmSource")
      .set("spark.metrics.conf.executor.source.jvm.class", "org.apache.spark.metrics.source.JvmSource")

    val sc = new SparkContext(sparkConf)
  • Enable the Spark event log in $SPARK_HOME/conf/spark-defaults.conf

    • Set spark.eventLog.enabled to true
    • Point spark.eventLog.dir at an HDFS directory

Tip: varOne currently only supports event logs stored on HDFS; we will relax this restriction in the near future.
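
The two entries in spark-defaults.conf would look like the following (the namenode hostname, port, and directory are placeholders — use a path that exists on your HDFS):

```
spark.eventLog.enabled   true
spark.eventLog.dir       hdfs://namenode:8020/spark-events
```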

a. Download

Click here to download varOne-0.1.0

b. Start the varOne daemons

Deploy varOne-0.1.0.tgz to each node in your cluster and untar it on every node.
Then pick one node and start all the daemons with the following steps:

  • Configure varOne-site.xml in $VARONE_HOME/conf directory
  • Configure varOne-env.sh in $VARONE_HOME/conf directory
    • Make sure you have set SPARK_HOME
  • Configure varonedaemond in $VARONE_HOME/conf directory
    • List each node's hostname (one host per line)
  • Run: ./bin/varOned-all.sh start
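
For example, for a three-node cluster the conf/varonedaemond file would simply contain the hostnames, one per line (the names below are placeholders):

```
spark-node1
spark-node2
spark-node3
```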

After running it, you can check with jps whether a VarOned process is listed on each node.
You can stop all the varOne daemons with this command: ./bin/varOned-all.sh stop

c. Start varOne web server

Follow the steps below to start the varOne web server:

  • Configure varOne-site.xml in $VARONE_HOME/conf directory
  • Configure varOne-env.sh in $VARONE_HOME/conf directory
    • Make sure you have set SPARK_HOME
    • Make sure you have set HADOOP_CONF_DIR
  • Run: ./bin/varOne.sh

After running it, open a browser and go to http://localhost:8080/varOne-web/index.html

d. About varOne-site.xml

varOne.server.port

  • varOne Web Server port, default is 8080

varOne.node.port

  • varOne daemon port, default is 8181

varOne.node.thread.number

  • The number of RPC handlers for the varOne daemon, default is 5

varOne.server.context.path

  • Context Path of the varOne Web Application, default is /varOne-web

varOne.war.tempdir

  • Location of Jetty's temporary directory
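
Putting these together, a minimal varOne-site.xml might look like the following. This sketch assumes the file uses the Hadoop-style configuration XML format, and the values shown are just the documented defaults:

```xml
<configuration>
  <!-- varOne web server port -->
  <property>
    <name>varOne.server.port</name>
    <value>8080</value>
  </property>
  <!-- varOne daemon port -->
  <property>
    <name>varOne.node.port</name>
    <value>8181</value>
  </property>
  <!-- RPC handler threads per daemon -->
  <property>
    <name>varOne.node.thread.number</name>
    <value>5</value>
  </property>
  <!-- Context path of the web application -->
  <property>
    <name>varOne.server.context.path</name>
    <value>/varOne-web</value>
  </property>
</configuration>
```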

Development

Check this document