RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

SparkRDMA ShuffleManager Plugin

SparkRDMA is a high performance ShuffleManager plugin for Apache Spark that uses RDMA (instead of TCP) when performing Shuffle data transfers in Spark jobs.

This open-source project is developed, maintained and supported by Mellanox Technologies.

Performance results

TeraSort

TeraSort results

Running a 320GB TeraSort workload with SparkRDMA is 2.63x faster than with standard Spark (runtime in seconds)

Test environment:

7 Spark standalone workers on Azure "h16mr" VM instance, Intel Haswell E5-2667 V3, 224GB RAM, 2000GB SSD for temporary storage, Mellanox InfiniBand FDR (56Gb/s)

Also featured at the Spark+AI Summit 2018. For more information, see our session: https://databricks.com/session/accelerated-spark-on-azure-seamless-and-scalable-hardware-offloads-in-the-cloud

PageRank

PageRank results

Running a 19GB PageRank workload with SparkRDMA is 2.01x faster than with standard Spark (runtime in seconds)

Test environment:

5 Spark standalone workers, 2x Intel Xeon E5-2697 v3 @ 2.60GHz, 25 cores per Worker, 150GB RAM, non-flash storage (HDD)

Mellanox ConnectX-5 network adapter with 100GbE RoCE fabric, connected with a Mellanox Spectrum switch

Wiki pages

For more information on configuration, performance tuning and troubleshooting, please visit the SparkRDMA GitHub Wiki

Runtime requirements

  • Apache Spark 2.0.0/2.1.0/2.2.0/2.3.0/2.4.0
  • Java 8
  • An RDMA-supported network, e.g. RoCE or InfiniBand

Installation

Obtain SparkRDMA and DiSNI binaries

Please use the "Releases" page to download pre-built binaries.
If you would like to build the project yourself, please refer to the "Build" section below.

The pre-built binaries are packed as an archive that contains the following files:

  • spark-rdma-3.1-for-spark-2.0.0-jar-with-dependencies.jar
  • spark-rdma-3.1-for-spark-2.1.0-jar-with-dependencies.jar
  • spark-rdma-3.1-for-spark-2.2.0-jar-with-dependencies.jar
  • spark-rdma-3.1-for-spark-2.3.0-jar-with-dependencies.jar
  • spark-rdma-3.1-for-spark-2.4.0-jar-with-dependencies.jar
  • libdisni.so

libdisni.so must be in java.library.path on every Spark Master and Worker (it is usually installed in /usr/lib).
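
As a quick sanity check for the requirement above, the following sketch searches a colon-separated directory list (the form java.library.path takes on Linux) for libdisni.so. The function name `find_libdisni` is hypothetical and not part of SparkRDMA or DiSNI; the check only confirms that the file exists, not that the JVM can load it.

```shell
# Hypothetical helper (not part of SparkRDMA or DiSNI).
# $1: colon-separated list of directories, as in java.library.path on Linux.
# Prints the first libdisni.so found and returns 0; returns 1 if none is found.
find_libdisni() {
  old_ifs="$IFS"
  IFS=':'
  for dir in $1; do
    if [ -f "$dir/libdisni.so" ]; then
      IFS="$old_ifs"
      printf '%s\n' "$dir/libdisni.so"
      return 0
    fi
  done
  IFS="$old_ifs"
  return 1
}
```

For example, `find_libdisni "/usr/lib:/usr/local/lib"` could be run on each Master and Worker node; the directory list here is only an assumption about where the library was installed.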

Configuration

Provide Spark with the location of the SparkRDMA plugin jar by using the extraClassPath options. In standalone mode, these can be added to either spark-defaults.conf or any runtime configuration file. In client mode, they must be added to spark-defaults.conf. For example, for Spark 2.0.0 (replace 2.0.0 with 2.1.0, 2.2.0, 2.3.0 or 2.4.0 according to your Spark version):

spark.driver.extraClassPath   /path/to/SparkRDMA/target/spark-rdma-3.1-for-spark-2.0.0-jar-with-dependencies.jar
spark.executor.extraClassPath /path/to/SparkRDMA/target/spark-rdma-3.1-for-spark-2.0.0-jar-with-dependencies.jar
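
Because the jar name differs only in the Spark version, the two lines above can be generated rather than edited by hand. This is a minimal sketch: the function name `sparkrdma_classpath_conf` and its arguments are hypothetical, and the default directory is the same placeholder path used above.

```shell
# Hypothetical helper (not shipped with SparkRDMA).
# $1: Spark version; $2: directory holding the SparkRDMA jars (placeholder default).
# Prints the two extraClassPath lines for spark-defaults.conf.
sparkrdma_classpath_conf() {
  spark_version="$1"
  sparkrdma_dir="${2:-/path/to/SparkRDMA/target}"
  case "$spark_version" in
    2.0.0|2.1.0|2.2.0|2.3.0|2.4.0) ;;
    *)
      echo "unsupported Spark version: $spark_version" >&2
      return 1
      ;;
  esac
  jar="$sparkrdma_dir/spark-rdma-3.1-for-spark-$spark_version-jar-with-dependencies.jar"
  printf 'spark.driver.extraClassPath   %s\n' "$jar"
  printf 'spark.executor.extraClassPath %s\n' "$jar"
}
```

The output can be reviewed and then appended to spark-defaults.conf, e.g. `sparkrdma_classpath_conf 2.4.0 /opt/sparkrdma`, where `/opt/sparkrdma` is an assumed install location.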

Running

To enable the SparkRDMA Shuffle Manager plugin, add the following line to either spark-defaults.conf or any runtime configuration file:

spark.shuffle.manager   org.apache.spark.shuffle.rdma.RdmaShuffleManager
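
When provisioning many nodes, it may be convenient to add the line above only if it is not already there. The sketch below assumes a hypothetical helper named `enable_rdma_shuffle` that takes the path of the configuration file; it leaves any existing spark.shuffle.manager entry untouched, even one that names a different manager.

```shell
# Hypothetical helper (not shipped with SparkRDMA).
# $1: path to spark-defaults.conf or another Spark configuration file.
# Appends the shuffle-manager line unless a spark.shuffle.manager entry exists.
# An existing entry is never rewritten, so check its value by hand.
enable_rdma_shuffle() {
  conf_file="$1"
  if grep -q '^[[:space:]]*spark\.shuffle\.manager[[:space:]=]' "$conf_file" 2>/dev/null; then
    return 0
  fi
  printf '%s\n' 'spark.shuffle.manager   org.apache.spark.shuffle.rdma.RdmaShuffleManager' >> "$conf_file"
}
```

Calling the function twice on the same file leaves a single entry, which makes it safe to rerun from a setup script.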

Build

Building the SparkRDMA plugin requires Apache Maven and Java 8.

  1. Obtain a clone of SparkRDMA

  2. Build the plugin for your Spark version (one of 2.0.0, 2.1.0, 2.2.0, 2.3.0 or 2.4.0), e.g. for Spark 2.0.0:

mvn -DskipTests clean package -Pspark-2.0.0

  3. Obtain a clone of DiSNI for building libdisni:

git clone https://github.com/zrlio/disni.git
cd disni
git checkout tags/v1.7 -b v1.7

  4. Compile and install only libdisni (the jars are already included in the SparkRDMA plugin):

cd libdisni
./autoprepare.sh
./configure --with-jdk=/path/to/java8/jdk
make
make install

Community discussions and support

For any questions, issues or suggestions, please use our Google group: https://groups.google.com/forum/#!forum/sparkrdma

Contributions

Pull request submissions are welcome.
