Build and Package

Doron Rosenberg edited this page Jan 20, 2017 · 12 revisions

For EclairJS Client to talk to Apache Spark, it needs a running instance of Apache Toree, and Toree must be able to connect to your Spark master.

Prerequisites

  • Java 8 update 70 or higher
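The prerequisite can be checked from Node with a small helper. This is a minimal sketch, not part of EclairJS: the function names are hypothetical, and it assumes the usual Java 8 banner format (`java version "1.8.0_NN"`). It only recognizes Java 8 banners and treats anything else as not satisfying the check.

```javascript
// Extract the update number from a Java 8 version banner such as
//   java version "1.8.0_111"
// Returns the update as a number, or null if the banner is not a Java 8 one.
function java8Update(banner) {
  const match = /version "1\.8\.0_(\d+)/.exec(banner);
  return match ? Number(match[1]) : null;
}

// True only for Java 8 builds at update 70 or later.
function meetsPrerequisite(banner) {
  const update = java8Update(banner);
  return update !== null && update >= 70;
}

console.log(meetsPrerequisite('java version "1.8.0_111"')); // true
console.log(meetsPrerequisite('java version "1.8.0_45"'));  // false
```

To check the installed JVM, pass in the banner that `java -version` writes to stderr, for example `require('child_process').spawnSync('java', ['-version']).stderr.toString()`.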

Instructions

  1. Download Apache Spark 2.0.0 built with Hadoop 2.7 and extract it from the archive.

  2. Install Jupyter (for example, pip install jupyter) and the Jupyter Kernel Gateway (pip install jupyter-kernel-gateway).

  3. Download and build Apache Toree:

$ git clone https://github.com/apache/incubator-toree
$ cd incubator-toree
$ git checkout e8ecd0623c65ad104045b1797fb27f69b8dfc23f
$ make dist
$ make sbt-publishM2

This creates a dist directory, including dist/toree/bin/run.sh.

  4. Download the EclairJS Server JAR file from Maven (http://repo2.maven.org/maven2/org/eclairjs/eclairjs-nashorn/${ECLAIRJS_VERSION}/eclairjs-nashorn-${ECLAIRJS_VERSION}-jar-with-dependencies.jar), replacing ${ECLAIRJS_VERSION} with the version you are using.
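The version substitution in the download URL can be sketched as a small helper. The function name is hypothetical and `X.Y.Z` is a placeholder, not a real release number; only the URL pattern comes from this step.

```javascript
// Build the download URL for the EclairJS Server JAR by substituting the
// version into the URL pattern given in this step.
function eclairjsJarUrl(version) {
  const base = 'http://repo2.maven.org/maven2/org/eclairjs/eclairjs-nashorn';
  return `${base}/${version}/eclairjs-nashorn-${version}-jar-with-dependencies.jar`;
}

// 'X.Y.Z' is a placeholder; use the EclairJS version you are actually running.
console.log(eclairjsJarUrl('X.Y.Z'));
```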

  5. Download kernel.json and replace the following:

  • Replace /usr/local/share/jupyter/kernels/apache_toree_scala/bin/run.sh with the location of your Apache Toree build (for example, /usr/local/incubator-toree/dist/toree/bin/run.sh on OSX; see the location from step 3).
  • Point "SPARK_HOME" at the extracted Apache Spark directory (spark-2.0.0-bin-hadoop2.7).
  • Point /opt/nashorn/lib/eclairjs.jar at the JAR file downloaded in step 4. If you run into memory issues (such as out of memory errors), you can raise the memory limit in the kernel.json file by adding --driver-memory 8g to SPARK_OPT.
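The three replacements above can also be scripted. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the two default paths are the ones quoted above, and `SPARK_HOME` is assumed to be a key of an `env` object in kernel.json (the usual place for environment variables in a Jupyter kernel spec). Check your downloaded file before relying on it.

```javascript
// Default values that the downloaded kernel.json is described as containing.
const DEFAULT_RUN_SH = '/usr/local/share/jupyter/kernels/apache_toree_scala/bin/run.sh';
const DEFAULT_JAR = '/opt/nashorn/lib/eclairjs.jar';

// Recursively copy a parsed JSON value, applying `fn` to every string in it.
function mapStrings(value, fn) {
  if (typeof value === 'string') return fn(value);
  if (Array.isArray(value)) return value.map((item) => mapStrings(item, fn));
  if (value !== null && typeof value === 'object') {
    const out = {};
    for (const key of Object.keys(value)) out[key] = mapStrings(value[key], fn);
    return out;
  }
  return value;
}

// Return a copy of the parsed kernel spec with the three locations swapped in.
// `paths` is { toreeRunSh, sparkHome, eclairjsJar }.
function customizeKernelSpec(spec, paths) {
  const updated = mapStrings(spec, (s) =>
    s.split(DEFAULT_RUN_SH).join(paths.toreeRunSh)
      .split(DEFAULT_JAR).join(paths.eclairjsJar));
  // Assumption: SPARK_HOME lives under an "env" object in kernel.json.
  if (updated.env && typeof updated.env === 'object' && 'SPARK_HOME' in updated.env) {
    updated.env.SPARK_HOME = paths.sparkHome;
  }
  return updated;
}
```

A typical use would be to read the file with `JSON.parse(fs.readFileSync('kernel.json', 'utf8'))`, pass the result through `customizeKernelSpec`, and write it back with `JSON.stringify(updated, null, 2)`. Working on the parsed object rather than the raw text keeps paths containing quotes or backslashes correctly escaped.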
  6. Find your Jupyter data directory by running:
$ jupyter --data-dir
/Users/youruser/Library/Jupyter

Copy kernel.json to kernels/eclair/ in the directory you got above.
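The copy step can be sketched in Node as follows. The function name is hypothetical; the `kernels/eclair/` layout is the one described above, and the `recursive` option of `fs.mkdirSync` assumes Node 10.12 or later.

```javascript
const fs = require('fs');
const path = require('path');

// Copy kernel.json into <dataDir>/kernels/eclair/, creating the directory
// if it does not exist yet, and return the destination path.
function installKernelSpec(kernelJsonPath, dataDir) {
  const targetDir = path.join(dataDir, 'kernels', 'eclair');
  fs.mkdirSync(targetDir, { recursive: true });
  const target = path.join(targetDir, 'kernel.json');
  fs.copyFileSync(kernelJsonPath, target);
  return target;
}
```

For example, with the data directory shown above, `installKernelSpec('kernel.json', '/Users/youruser/Library/Jupyter')` would place the file at `/Users/youruser/Library/Jupyter/kernels/eclair/kernel.json`.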

  7. Start Jupyter:
$ jupyter notebook --no-browser

If you get an error similar to '_xsrf' argument missing from POST, you are running a version of Jupyter that has token authentication enabled. Please read this guide on how to fix this.

  8. Test EclairJS Client

To make sure everything is working, create a simple EclairJS Client example.

Create a file called package.json:

{
  "name": "eclairjs-test",
  "version": "0.1.0",
  "dependencies": {
    "eclairjs": "*"
  }
}

And a file called test.js:

var eclairjs = require('eclairjs');

var spark = new eclairjs();

var sc = new spark.SparkContext("local[*]", "Simple Text");

var data = sc.parallelize([1,2,3,4,5,6,7,8,9,0]);

data.collect().then(function(val) {
  console.log("Success:", val);

  sc.stop().then(process.exit);
}).catch(function(err) {
  console.log("Error:", err);
  sc.stop().then(process.exit);
});

Install the dependencies:

$ npm install

Now we are ready to actually run the example:

$ node --harmony test.js
Starting WebSocket: ws://127.0.0.1:8888/api/kernels/436e67e6-2605-4085-9c5d-ba43d828a038
got kernel
Success: [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 ]

If you get an error similar to API request failed or Failed to connect to Jupyter instance, please follow this guide on how to fix this.

To run the test suite:

$ npm run integration-test