Toree provides an interface that allows clients to interact with a Spark cluster. Clients can send libraries and snippets of code that are interpreted and executed using a preconfigured Spark context. These snippets can do a variety of things:
- Define and run Spark jobs of all kinds
- Collect results from Spark and push them to the client
- Load necessary dependencies for the running code
- Start and monitor a stream
Apache Toree supports the Scala programming language. It implements version 5.0 of the Jupyter message protocol, so it plugs into Jupyter/IPython releases from 3.2.x onward for quick, interactive data exploration.
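To make the message protocol concrete, here is a minimal Python sketch of what a client sends to a kernel such as Toree when it submits a snippet: an `execute_request` message in the 5.0 format, signed with HMAC-SHA256 as the Jupyter wire protocol describes. It illustrates the generic protocol, not Toree's own code. The helper names `make_execute_request` and `sign` are hypothetical, the signing key would normally come from the kernel's connection file, and the ZeroMQ transport is left out.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code: str, session: str, username: str = "user") -> dict:
    """Build an execute_request message in Jupyter messaging protocol 5.0 format."""
    header = {
        "msg_id": uuid.uuid4().hex,
        "username": username,
        "session": session,
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.0",
    }
    content = {
        "code": code,              # the snippet the kernel will interpret
        "silent": False,           # publish output back to the client
        "store_history": True,     # record the snippet in the execution history
        "user_expressions": {},
        "allow_stdin": False,      # this client will not answer input requests
    }
    return {"header": header, "parent_header": {}, "metadata": {}, "content": content}


def sign(msg: dict, key: bytes) -> str:
    """HMAC-SHA256 over serialized header, parent_header, metadata, content (in that order).

    The same serialized bytes must be the ones actually sent as message frames,
    otherwise the kernel's signature check fails.
    """
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for part in ("header", "parent_header", "metadata", "content"):
        mac.update(json.dumps(msg[part]).encode("utf-8"))
    return mac.hexdigest()


if __name__ == "__main__":
    # Hypothetical Scala snippet; `sc` is the preconfigured SparkContext.
    msg = make_execute_request("sc.parallelize(1 to 10).sum()", session=uuid.uuid4().hex)
    print(sign(msg, b"key-from-connection-file"))
```

In practice a library such as `jupyter_client` builds and signs these messages for you; the sketch only shows which fields travel with each snippet.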
This project uses make as the entry point for build, test, and packaging. To perform a local build, you need to install jupyter/ipython and other development requirements locally on your machine.
To build and interact with Toree using Jupyter, run
This will start a Jupyter notebook server. Depending on your mode, it will be accessible at http://192.168.44.44:8888. From here you can create notebooks that use Toree configured for Spark local mode.
Tests can be run by doing
NOTE: Do not use
Build & Package
To build and package up Toree, run
This produces two packages:
- ./dist/toree-<VERSION>-binary-release.tar.gz is a simple package that contains the JAR and executable
- A pip-installable package that adds Toree as a Jupyter kernel
make release uses Docker. Please refer to the Docker installation instructions for your system.
To play with the example notebooks, run
A notebook server will be launched in a Docker container with Toree and some other dependencies installed.
Refer to your Docker setup for the IP address. The notebook will be at
This requires you to have a distribution of Apache Spark downloaded to the system where Apache Toree will run. The following commands will install Apache Toree.
pip install --upgrade toree
jupyter toree install --spark_home=<YOUR_SPARK_PATH>
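The install step depends on --spark_home pointing at a real Spark distribution, so a small pre-check can catch a wrong path before the installer runs. This is a hypothetical helper, not part of Toree, and it assumes that any usable distribution contains a bin/spark-submit launcher; that is a heuristic rather than a complete validation.

```python
from pathlib import Path
from typing import List


def looks_like_spark_home(path: str) -> bool:
    """Heuristic: a Spark distribution ships a launcher script at bin/spark-submit."""
    return (Path(path).expanduser() / "bin" / "spark-submit").is_file()


def toree_install_command(spark_home: str) -> List[str]:
    """Return the `jupyter toree install` argument list, refusing obviously wrong paths."""
    if not looks_like_spark_home(spark_home):
        raise ValueError(f"{spark_home!r} does not contain bin/spark-submit")
    return ["jupyter", "toree", "install", f"--spark_home={spark_home}"]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to run the same `jupyter toree install` command shown above, after `pip install` has put the `toree` package on the system.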
Dev snapshots of Toree are located at https://dist.apache.org/repos/dist/dev/incubator/toree. To install using one of those packages, you can use the following:
pip install <PIP_RELEASE_URL>
jupyter toree install --spark_home=<YOUR_SPARK_PATH>
PIP_RELEASE_URL is the URL of one of the pip packages in that directory. For example:
pip install https://dist.apache.org/repos/dist/dev/incubator/toree/0.2.0/snapshots/dev1/toree-pip/toree-0.2.0.dev1.tar.gz
jupyter toree install --spark_home=<YOUR_SPARK_PATH>
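The example URL follows the layout <version>/snapshots/dev<N>/toree-pip/toree-<version>.dev<N>.tar.gz under the snapshot directory. The sketch below builds a URL in that shape, assuming (unverified) that other snapshots use the same layout as the 0.2.0 dev1 example; check the directory listing before relying on it. The function names are hypothetical.

```python
import sys
from typing import List

# Snapshot location given in this README.
SNAPSHOT_BASE = "https://dist.apache.org/repos/dist/dev/incubator/toree"


def snapshot_pip_url(version: str, dev: int) -> str:
    """Build a pip snapshot URL, ASSUMING the layout of the 0.2.0 dev1 example."""
    return (
        f"{SNAPSHOT_BASE}/{version}/snapshots/dev{dev}/"
        f"toree-pip/toree-{version}.dev{dev}.tar.gz"
    )


def pip_install_command(version: str, dev: int) -> List[str]:
    """Equivalent of `pip install <PIP_RELEASE_URL>` for the current interpreter."""
    return [sys.executable, "-m", "pip", "install", snapshot_pip_url(version, dev)]
```

Using `sys.executable -m pip` installs into the same Python environment that will later run `jupyter toree install`, which avoids picking up a different `pip` from the PATH.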
Refer to existing issues, or open a new one, here.
We are working on publishing binary releases of Toree. As part of our move into the Apache Incubator, Toree will start a new version sequence starting at
Our goal is to keep master up to date with the latest version of Spark. When a new version of Spark requires specific code changes to Toree, we will create a branch to maintain support for the older Spark version.
As it stands, we maintain several branches for legacy versions of Spark. The table below shows what is available now.
| Branch | Apache Spark Version |
Please note that new features will mainly be added to the
We are currently enhancing our documentation, which is available on our website.