Spark-PMoF: RPMem extension for Spark Shuffle

Spark-PMoF (Persistent Memory over Fabric), an RPMem extension for Spark Shuffle, is a Spark shuffle plugin that enables persistent memory and high-performance fabric technologies such as RDMA for Spark shuffle, improving Spark performance in shuffle-intensive scenarios.

Contents

Introduction

Installation

Make sure HPNL is installed before building.

git clone https://github.com/Intel-bigdata/Spark-PMoF.git
cd Spark-PMoF; mvn package

Benchmark

Usage

This plugin currently supports Spark 2.3 and works on various network fabrics, including Socket, RDMA, and Omni-Path. Before running a Spark workload, add the following settings to spark-defaults.conf, replacing Spark-PMoF-PATH with the path to your Spark-PMoF checkout, then have fun! :-)

spark.driver.extraClassPath Spark-PMoF-PATH/target/sso-0.1-jar-with-dependencies.jar
spark.executor.extraClassPath Spark-PMoF-PATH/target/sso-0.1-jar-with-dependencies.jar
spark.shuffle.manager org.apache.spark.shuffle.pmof.RdmaShuffleManager
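
The settings above can also be appended with a small script. This is a minimal sketch, not part of the plugin: the variable names SPARK_PMOF_PATH and CONF_FILE and the helper add_conf are made up for illustration, and the default locations are assumptions to adjust for your deployment.

```shell
#!/bin/sh
# Sketch: append the Spark-PMoF settings to spark-defaults.conf.
# SPARK_PMOF_PATH, CONF_FILE and add_conf are illustrative names (assumptions),
# not something the plugin itself defines.
set -eu

# Assumed default: the repository was cloned into the current directory.
SPARK_PMOF_PATH="${SPARK_PMOF_PATH:-$PWD/Spark-PMoF}"
# Assumed default: the config lives under $SPARK_HOME/conf (or ./conf if unset).
CONF_FILE="${CONF_FILE:-${SPARK_HOME:-$PWD}/conf/spark-defaults.conf}"
JAR="$SPARK_PMOF_PATH/target/sso-0.1-jar-with-dependencies.jar"

mkdir -p "$(dirname "$CONF_FILE")"
touch "$CONF_FILE"

# Append a "key value" line only if the key is not already present,
# so running the script twice does not duplicate entries.
add_conf() {
    if ! grep -q "^$1[[:space:]]" "$CONF_FILE"; then
        printf '%s %s\n' "$1" "$2" >> "$CONF_FILE"
    fi
}

add_conf spark.driver.extraClassPath "$JAR"
add_conf spark.executor.extraClassPath "$JAR"
add_conf spark.shuffle.manager org.apache.spark.shuffle.pmof.RdmaShuffleManager
```

Existing keys are left untouched, so if spark-defaults.conf already sets a different shuffle manager or class path, edit that line by hand.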

Contact

Chendi Xue, chendi.xue@intel.com
Jian Zhang, jian.zhang@intel.com
