EclairJS Server

The EclairJS Server API exposes the Spark programming model to JavaScript. EclairJS Server is built on top of Spark's Java API.

Build from source

Prerequisites

Build Toree

git clone https://github.com/apache/incubator-toree
cd incubator-toree
git checkout e8ecd0623c65ad104045b1797fb27f69b8dfc23f
make dist
make sbt-publishM2

Note: the last step, publishing to your local Maven repository, may report an error. That error can be ignored.

Build EclairJS Jar

git clone https://github.com/eclairjs/eclairjs
cd eclairjs/server
mvn package
export SPARK_HOME=<location of Spark binary distribution>

Usage

bin/eclairjs.sh examples/word_count.js

or

bin/eclairjs.sh
eclairJS>var list = sc.parallelize([1,10,20,30,40]);
list.count();

Examples

    var SparkConf = require('eclairjs/SparkConf');
    var SparkContext = require('eclairjs/SparkContext');
    
    var file = "src/test/resources/dream.txt"; // Should be some file on your system
    var conf = new SparkConf().setAppName("JavaScript word count")
                          .setMaster("local[*]");
    var sparkContext = new SparkContext(conf);
    var rdd = sparkContext.textFile(file).cache();
    var rdd2 = rdd.flatMap(function(sentence) {
        return sentence.split(" ");
    });
    var rdd3 = rdd2.filter(function(word) {
        return word.trim().length > 0;
    });
    var rdd4 = rdd3.mapToPair(function(word) {
        return [word, 1];
    });
    var rdd5 = rdd4.reduceByKey(function(a, b) {
        return a + b;
    });
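    // Swap each pair to [count, word] so the pairs can be sorted by count.
    var rdd6 = rdd5.mapToPair(function(tuple) {
        return [tuple[1]+0.0, tuple[0]];
    });
    var rdd7 = rdd6.sortByKey(false); // false = descending order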
    print("top 10 words = " + rdd7.take(10));
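
To make the stages of the example above easier to follow, here is a minimal plain-JavaScript sketch of the same computation on an in-memory array. It does not use Spark or EclairJS at all. The function name `topWords` and the sample input are assumptions made for illustration and are not part of the EclairJS API.

```javascript
// Plain-JavaScript sketch of the word count stages (no Spark involved).
// topWords is a hypothetical helper, not an EclairJS function.
function topWords(lines, n) {
    // flatMap: split every line into words
    var words = [];
    lines.forEach(function(sentence) {
        sentence.split(" ").forEach(function(word) {
            words.push(word);
        });
    });
    // filter: drop empty strings left by repeated spaces
    var nonEmpty = words.filter(function(word) {
        return word.trim().length > 0;
    });
    // mapToPair + reduceByKey: add 1 for each occurrence of a word
    var counts = Object.create(null);
    nonEmpty.forEach(function(word) {
        counts[word] = (counts[word] || 0) + 1;
    });
    // second mapToPair: swap to [count, word]
    var pairs = Object.keys(counts).map(function(word) {
        return [counts[word], word];
    });
    // sortByKey(false): order by count, descending
    pairs.sort(function(a, b) {
        return b[0] - a[0];
    });
    // take(n)
    return pairs.slice(0, n);
}

// Made-up sample input; the second line contains a double space.
var sample = ["the dream the dream", "the  end"];
console.log(topWords(sample, 2)); // [[3, "the"], [2, "dream"]]
```

Unlike this sketch, the Spark version distributes each stage across partitions and evaluates nothing until `take(10)` is called.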

Usage with Jupyter notebooks

Prerequisites

  1. Edit kernel.json and update the following values:

       <path to incubator-toree clone>/dist/toree/bin/run.sh
       "SPARK_OPTS": --jars file:<path to nashorn jar>
       "SPARK_HOME": <path to spark 2.0.0 distribution>

  2. Copy kernel.json to Jupyter's data directory (run jupyter --data-dir to find it)

  3. Create a directory for your notebook: mkdir ~/jsNotebook

  4. Change to that directory: cd ~/jsNotebook

  5. Start Jupyter: jupyter notebook

  6. A browser opens at http://localhost:8888/tree. Select New -> Spark 2.0.0 (EclairJS)

  7. Enter the following code in a notebook cell and run it

var SparkContext = require('eclairjs/SparkContext');
var sc = new SparkContext("local[*]", "myapp");
var rdd = sc.parallelize([10, 4, 2, 12, 3]);
eval("count = " + rdd.count());
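
If the kernel does not start, one thing worth checking is whether any `<...>` placeholder from the kernel.json edit was left unfilled. The following is a small sketch of such a check, meant to be run with Node.js rather than inside the notebook. The helper name `findPlaceholders` and the sample content are assumptions for illustration, and the sample shows only an `env` section, not the full kernel.json shipped with EclairJS.

```javascript
// Hypothetical helper: list the "<...>" placeholders still present in a text.
function findPlaceholders(text) {
    var matches = text.match(/<[^<>\n]+>/g);
    return matches === null ? [] : matches;
}

// Sample content for illustration only. The SPARK_HOME value is a made-up
// path; SPARK_OPTS still contains the placeholder from the README.
var sample = JSON.stringify({
    env: {
        SPARK_OPTS: "--jars file:<path to nashorn jar>",
        SPARK_HOME: "/opt/spark"
    }
}, null, 2);

console.log(findPlaceholders(sample)); // ["<path to nashorn jar>"]
```

To check a real file, the text could be read with `require('fs').readFileSync(path, 'utf8')` and passed to the same function.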

Versions

The master branch is used for development. Every effort is made to keep it stable, but it may be in flux at any given time, depending on the work in progress.