Build and Package

For EclairJS Client to talk to Apache Spark, it needs a running instance of Apache Toree, and Toree must be able to connect to your Spark master.

Prerequisites

Java 8 update 70 or higher

Instructions

Download Apache Spark 2.0.0 built with Hadoop 2.7 and extract it from the archive.
Install Jupyter (for example, pip install jupyter) and the Jupyter Kernel Gateway (pip install jupyter-kernel-gateway).
Download and build Apache Toree:

$ git clone https://github.com/apache/incubator-toree
$ cd incubator-toree
$ git checkout e8ecd0623c65ad104045b1797fb27f69b8dfc23f
$ make dist
$ make sbt-publishM2

This creates a dist directory that contains the script dist/toree/bin/run.sh.

Download the EclairJS Server JAR file from Maven (http://repo2.maven.org/maven2/org/eclairjs/eclairjs-nashorn/${ECLAIRJS_VERSION}/eclairjs-nashorn-${ECLAIRJS_VERSION}-jar-with-dependencies.jar), replacing ${ECLAIRJS_VERSION} with the version you are using.
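
The URL pattern above can be filled in programmatically. The following is a minimal sketch: the function name eclairJarUrl is a hypothetical helper, and X.Y.Z is a placeholder, not a real EclairJS version.

```javascript
// Build the download URL for the EclairJS Server JAR from a version string.
// The URL pattern is the one given above; eclairJarUrl is a hypothetical name.
function eclairJarUrl(version) {
  if (!version) {
    throw new Error('an EclairJS version is required');
  }
  var base = 'http://repo2.maven.org/maven2/org/eclairjs/eclairjs-nashorn/';
  return base + version + '/eclairjs-nashorn-' + version + '-jar-with-dependencies.jar';
}

// Placeholder version; substitute the version you are using.
console.log(eclairJarUrl('X.Y.Z'));
```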
Download kernel.json and replace the following:

Replace /usr/local/share/jupyter/kernels/apache_toree_scala/bin/run.sh with the location of the run.sh from your Apache Toree build (for example, /usr/local/incubator-toree/dist/toree/bin/run.sh on OS X; see the location from step 3).
"SPARK_HOME" should point to the extracted Apache Spark directory (spark-2.0.0-bin-hadoop2.7).
/opt/nashorn/lib/eclairjs.jar should point to the JAR file downloaded in step 4.

If you run into memory issues (such as out-of-memory errors), you can raise the memory limit by adding --driver-memory 8g to SPARK_OPTS in the kernel.json file.
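
The replacements above can also be applied with a short script. This is only a sketch under stated assumptions: it assumes the parsed kernel.json has an "argv" array holding the run.sh path and an "env" object with "SPARK_HOME" and a Spark options string named "SPARK_OPTS" (the name Apache Toree kernel specs commonly use). Check these against your own file. The function name patchKernelSpec, the option names, and the example input are hypothetical.

```javascript
// Apply the kernel.json replacements described above to a parsed spec.
// ASSUMPTION: the spec has an "argv" array holding the run.sh path and an
// "env" object with "SPARK_HOME" and "SPARK_OPTS" strings.
function patchKernelSpec(spec, opts) {
  var OLD_RUN = '/usr/local/share/jupyter/kernels/apache_toree_scala/bin/run.sh';
  var OLD_JAR = '/opt/nashorn/lib/eclairjs.jar';
  var patched = JSON.parse(JSON.stringify(spec)); // deep copy; input stays untouched

  // Point the launcher at the locally built Toree run.sh.
  patched.argv = (patched.argv || []).map(function (arg) {
    return arg === OLD_RUN ? opts.toreeRunScript : arg;
  });

  patched.env = patched.env || {};
  patched.env.SPARK_HOME = opts.sparkHome;

  // Swap the placeholder JAR path, then optionally raise the driver memory.
  var sparkOpts = (patched.env.SPARK_OPTS || '').split(OLD_JAR).join(opts.eclairJar);
  if (opts.driverMemory) {
    sparkOpts = (sparkOpts + ' --driver-memory ' + opts.driverMemory).trim();
  }
  patched.env.SPARK_OPTS = sparkOpts;
  return patched;
}

// Example input shaped like the assumption above (not the real file contents).
var example = {
  argv: ['/usr/local/share/jupyter/kernels/apache_toree_scala/bin/run.sh', '--profile', '{connection_file}'],
  env: { SPARK_HOME: '/opt/spark', SPARK_OPTS: '--jars file:/opt/nashorn/lib/eclairjs.jar' }
};

var result = patchKernelSpec(example, {
  toreeRunScript: '/usr/local/incubator-toree/dist/toree/bin/run.sh',
  sparkHome: '/path/to/spark-2.0.0-bin-hadoop2.7',
  eclairJar: '/path/to/eclairjs-nashorn-jar-with-dependencies.jar',
  driverMemory: '8g'
});
console.log(JSON.stringify(result, null, 2));
```

In practice you would read the downloaded file with JSON.parse(fs.readFileSync(...)), pass it through the function, and write the result back with JSON.stringify.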

Find your Jupyter data directory by running:

$ jupyter --data-dir
/Users/youruser/Library/Jupyter

Copy kernel.json to kernels/eclair/ inside the data directory printed above.
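
The copy step can be scripted as well. A minimal sketch, assuming Node 10.12 or later (for the recursive mkdir option); kernelSpecTarget and installKernelSpec are hypothetical helper names, and the path in the last line is the example data directory shown above.

```javascript
var fs = require('fs');
var path = require('path');

// Resolve where kernel.json should live inside the Jupyter data directory.
function kernelSpecTarget(dataDir) {
  return path.join(dataDir, 'kernels', 'eclair', 'kernel.json');
}

// Copy an edited kernel.json into place, creating kernels/eclair/ if needed.
function installKernelSpec(sourceFile, dataDir) {
  var target = kernelSpecTarget(dataDir);
  fs.mkdirSync(path.dirname(target), { recursive: true });
  fs.copyFileSync(sourceFile, target);
  return target;
}

// Show where the file would go for the example data directory.
console.log(kernelSpecTarget('/Users/youruser/Library/Jupyter'));
```

Calling installKernelSpec('kernel.json', dataDir) with your own data directory performs the copy and returns the destination path.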

Start Jupyter:

$ jupyter notebook --no-browser

If you get an error similar to '_xsrf' argument missing from POST, you are running a version of Jupyter that has token authentication enabled. Please read this guide on how to fix this.

Test EclairJS Client

To make sure everything is working, create a simple EclairJS Client example.

Create a file called package.json:

{
  "name": "eclairjs-test",
  "version": "0.1.0",
  "dependencies": {
    "eclairjs": "*"
  }
}

And a file called test.js:

var eclairjs = require('eclairjs');

var spark = new eclairjs();

var sc = new spark.SparkContext("local[*]", "Simple Text");

var data = sc.parallelize([1,2,3,4,5,6,7,8,9,0]);

data.collect().then(function(val) {
  console.log("Success:", val);

  sc.stop().then(process.exit);
}).catch(function(err) {
  console.log("Error:", err);
  sc.stop().then(process.exit);
});

Install the dependencies:

$ npm install

Now run the example:

$ node --harmony test.js
Starting WebSocket: ws://127.0.0.1:8888/api/kernels/436e67e6-2605-4085-9c5d-ba43d828a038
got kernel
Success: [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 ]

If you get an error similar to API request failed or Failed to connect to Jupyter instance, please follow this guide on how to fix this.
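
test.js stops the SparkContext in both the success and the error branch. That pattern can be factored into a small helper, sketched below. The helper withContext is hypothetical and not part of EclairJS; it assumes only that the context's stop() returns a promise, as it does in test.js. A stand-in context is used so the sketch runs without Spark or Jupyter.

```javascript
// Run some work against a context and always stop the context afterwards.
// ASSUMPTION: `sc` is any object whose stop() returns a promise.
function withContext(sc, work) {
  return Promise.resolve()
    .then(function () { return work(sc); })
    .then(
      function (value) {
        // Success: stop the context, then pass the result through.
        return sc.stop().then(function () { return value; });
      },
      function (err) {
        // Failure: stop the context, then rethrow the original error.
        return sc.stop().then(function () { throw err; });
      }
    );
}

// Stand-in context that only counts how often stop() is called.
var fakeContext = {
  stopped: 0,
  stop: function () { this.stopped += 1; return Promise.resolve(); }
};

withContext(fakeContext, function () { return [1, 2, 3]; })
  .then(function (val) { console.log('Success:', val); });
```

With a real context, the work function would return the promise from data.collect(), and the caller would decide the exit code after the returned promise settles.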

To run the test suite:

$ npm run integration-test
