Documentation: Spark-IO
Add Spark-IO documentation: building and configuration.

Signed-off-by: Jonas Pfefferle <>
PepperJo committed Sep 6, 2018
1 parent f1dcb0d commit c3ed9d648a616c44d10e9534d570e61dfffded61
Showing 1 changed file with 40 additions and 0 deletions.

Crail-Spark-IO contains various I/O acceleration plugins for Spark, tailored to
high-performance network and storage hardware (RDMA, NVMf, etc.).
Spark-IO is not included in the default Crail deployment, but it can be
obtained `here <>`_.
It currently contains two I/O plugins: a shuffle engine and a broadcast module.
Both plugins inherit all the benefits of Crail such as very high performance
(throughput and latency) and multi-tiering (e.g., DRAM and flash).

Requirements
~~~~~~~~~~~~

* Spark >= 2.0
* Java 8
* Maven
* Crail >= 1.0
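
Before building, the minimum versions listed above can be checked with a small shell helper. The sketch below is an illustration, not part of Spark-IO: the function names :code:`version_ge` and :code:`check_min_version` are hypothetical, and the comparison only handles purely numeric, dot-separated versions such as :code:`2.3.1`. The installed versions have to be supplied by hand, for example from the output of :code:`spark-submit --version` or :code:`mvn -v`.

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical helpers for checking the minimum versions listed above.
   # Only numeric, dot-separated versions are supported (e.g. "2.3.1");
   # suffixes such as "-SNAPSHOT" or "_181" must be stripped first.

   # Succeeds if version $1 is greater than or equal to version $2.
   version_ge() {
     local -a have want
     local i x y
     IFS=. read -r -a have <<< "$1"
     IFS=. read -r -a want <<< "$2"
     for ((i = 0; i < ${#have[@]} || i < ${#want[@]}; i++)); do
       x=${have[i]:-0}   # a missing component counts as 0, so "2" equals "2.0"
       y=${want[i]:-0}
       if ((10#$x > 10#$y)); then return 0; fi
       if ((10#$x < 10#$y)); then return 1; fi
     done
     return 0            # all components are equal
   }

   # Prints whether a component meets its minimum version; fails if it does not.
   check_min_version() {
     local name="$1" have="$2" want="$3"
     if version_ge "$have" "$want"; then
       echo "$name $have: ok (>= $want)"
     else
       echo "$name $have: too old (need >= $want)" >&2
       return 1
     fi
   }

For example, :code:`check_min_version Spark 2.3.1 2.0` prints :code:`Spark 2.3.1: ok (>= 2.0)`. Java 8 reports its version in the form :code:`1.8.0_<update>`, so compare it against :code:`1.8` rather than :code:`8`.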

Building
~~~~~~~~

To build Crail-Spark-IO, execute the following steps:

1. Obtain a copy of Crail-Spark-IO from `GitHub <>`_.
2. Make sure your local Maven repository contains Crail. If it does not, build
   Crail from :ref:`source <Building from source>`.
3. Run: :code:`mvn -DskipTests install`
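
The build steps can be scripted roughly as follows. This is a sketch under assumptions that do not come from the Crail documentation: it assumes Crail is published under the Maven group :code:`org.apache.crail` and that the local Maven repository is :code:`~/.m2/repository` unless :code:`M2_REPO` is set. The helper names are hypothetical, and the Crail-Spark-IO checkout from step 1 is passed in as a directory.

.. code-block:: bash

   #!/usr/bin/env bash
   # Sketch of the Crail-Spark-IO build steps (helper names are hypothetical).
   # Assumption: Crail artifacts live under the Maven group org.apache.crail.

   # Step 2: succeeds if the local Maven repository already holds Crail artifacts.
   crail_in_local_repo() {
     local repo="${1:-${M2_REPO:-$HOME/.m2/repository}}"
     [ -d "$repo/org/apache/crail" ]
   }

   # Step 3: builds and installs Crail-Spark-IO from a source checkout (step 1).
   build_spark_io() {
     local src_dir="$1"
     if ! crail_in_local_repo; then
       echo "Crail not found in the local Maven repository; build Crail from source first." >&2
       return 1
     fi
     (cd "$src_dir" && mvn -DskipTests install)
   }

A call such as :code:`build_spark_io path/to/checkout` stops early with an error if Crail is missing, instead of letting Maven fail on unresolved dependencies.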

Configure Spark
~~~~~~~~~~~~~~~

To configure the Crail shuffle plugin, add the following lines to :code:`spark-defaults.conf`:

.. code-block:: text

   spark.shuffle.manager            org.apache.spark.shuffle.crail.CrailShuffleManager
   spark.driver.extraClassPath      $CRAIL_HOME/jars/*:<path>/crail-spark-X.Y.jar:.
   spark.executor.extraClassPath    $CRAIL_HOME/jars/*:<path>/crail-spark-X.Y.jar:.

Unfortunately, since Spark 2.0.0, broadcast is no longer a pluggable component.
To use the Crail broadcast plugin, it must be added manually to Spark's :code:`BroadcastManager.scala`.
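
The shuffle settings for :code:`spark-defaults.conf` listed above can also be written by a script. The sketch below is illustrative only: :code:`add_crail_shuffle_conf` is a hypothetical helper, and its second argument stands in for the :code:`<path>/crail-spark-X.Y.jar` built earlier. It writes the current value of :code:`CRAIL_HOME` into the file instead of the literal :code:`$CRAIL_HOME`, so the result does not depend on how the variable is resolved when Spark reads the file.

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical helper: appends the Crail shuffle settings to a Spark config file.
   # Usage: add_crail_shuffle_conf <spark-defaults.conf> <crail-spark jar>
   add_crail_shuffle_conf() {
     local conf_file="$1" spark_io_jar="$2"
     if [ -z "${CRAIL_HOME:-}" ]; then
       echo "CRAIL_HOME is not set" >&2
       return 1
     fi
     # Same layout as the documented lines: Crail jars, the Spark-IO jar, then ".".
     # The "*" is quoted, so it is written literally rather than expanded by the shell.
     local cp="$CRAIL_HOME/jars/*:$spark_io_jar:."
     {
       echo "spark.shuffle.manager org.apache.spark.shuffle.crail.CrailShuffleManager"
       echo "spark.driver.extraClassPath $cp"
       echo "spark.executor.extraClassPath $cp"
     } >> "$conf_file"
   }

For example, :code:`add_crail_shuffle_conf "$SPARK_HOME/conf/spark-defaults.conf" <path>/crail-spark-X.Y.jar`. The helper appends without checking for existing entries, so running it twice duplicates the settings; it is meant for a configuration file that does not yet contain them.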

