Setting up a Spark 2.0 notebook with MLeap and Toree

Mikhail Semeniuk edited this page Jan 23, 2017 · 7 revisions

Install Jupyter and Toree

We are going to assume you already have the following installed:

  1. Python 2.x
  2. Docker (required to build Toree)

Install Jupyter

$ virtualenv venv

$ source ./venv/bin/activate

$ pip install jupyter

Build and install Toree

Clone the master branch of Toree's GitHub repo into your working directory.

For the next step, make sure that Docker is running.

$ cd incubator-toree

$ make release

$ cd dist/toree-pip

$ pip install .

$ SPARK_HOME=<path to spark> jupyter toree install
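
To confirm that the install step registered a kernel, you can look for a Toree kernel.json in the usual kernel directories. The following is a minimal sketch: the directory list is an assumption (common Jupyter data locations, which may differ on your system), and find_toree_kernels is a hypothetical helper, not part of Jupyter or Toree.

```python
import glob
import os

# Assumed search locations; Jupyter may use other directories on your system.
KERNEL_DIRS = [
    "/usr/local/share/jupyter/kernels",
    "/usr/share/jupyter/kernels",
    os.path.expanduser("~/.local/share/jupyter/kernels"),
]


def find_toree_kernels(kernel_dirs):
    """Hypothetical helper: list kernel.json files in directories named like '*toree*'."""
    found = []
    for base in kernel_dirs:
        # Each kernel lives in its own subdirectory containing a kernel.json.
        for path in sorted(glob.glob(os.path.join(base, "*", "kernel.json"))):
            kernel_name = os.path.basename(os.path.dirname(path))
            if "toree" in kernel_name.lower():
                found.append(path)
    return found


# Print every Toree kernel spec found (prints nothing if none is installed).
for path in find_toree_kernels(KERNEL_DIRS):
    print(path)
```

Running `jupyter kernelspec list` reports the same information for every installed kernel.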

Launch Notebook and Include MLeap

The most reliable way to add MLeap to your project is to modify the kernel configuration directly (or create a new kernel for Toree and Spark 2.0).

Kernel config files are typically located in /usr/local/share/jupyter/kernels/apache_toree_scala/kernel.json

Add or modify __TOREE_SPARK_OPTS__ like so:

"__TOREE_SPARK_OPTS__": "--packages com.databricks:spark-avro_2.11:3.0.1,ml.combust.mleap:mleap-spark_2.11:0.5.0", 

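The same change can be scripted instead of edited by hand. This is a minimal sketch, assuming the kernel spec keeps its environment variables under a top-level "env" key and that the file lives at the typical path mentioned above; set_toree_spark_opts is a hypothetical helper, not part of Toree or Jupyter.

```python
import json

# Assumed location; adjust if your kernel spec lives elsewhere.
KERNEL_JSON = "/usr/local/share/jupyter/kernels/apache_toree_scala/kernel.json"

# Package coordinates taken from the setting shown above.
SPARK_OPTS = ("--packages com.databricks:spark-avro_2.11:3.0.1,"
              "ml.combust.mleap:mleap-spark_2.11:0.5.0")


def set_toree_spark_opts(path, spark_opts):
    """Hypothetical helper: set __TOREE_SPARK_OPTS__ in a kernel.json file."""
    with open(path) as f:
        spec = json.load(f)
    # Create the "env" section if it is missing, then set the option.
    spec.setdefault("env", {})["__TOREE_SPARK_OPTS__"] = spark_opts
    with open(path, "w") as f:
        json.dump(spec, f, indent=2)
    return spec


# Example call (may require write permission on the kernel directory):
# set_toree_spark_opts(KERNEL_JSON, SPARK_OPTS)
```

Other keys in the file are left untouched. Restart the kernel afterwards so that the new options take effect.
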
Alternatively, you can use the %AddDeps magic, but we have run into dependency collisions with it, so use it at your own risk:

%AddDeps ml.combust.mleap mleap-spark_2.11 0.5.0 --transitive