This repository has been archived by the owner on Dec 20, 2022. It is now read-only.

SparkRDMA performance tips

Peter Rudenko edited this page Mar 30, 2018 · 2 revisions
  • Compression! Spark compresses shuffle data by default. Compression reduces the amount of data sent between nodes, at the cost of higher CPU utilization to compress and decompress it. Because an RDMA network offers high throughput with low CPU overhead, it is recommended to disable compression when using SparkRDMA. In your spark.conf file set:

    spark.shuffle.compress          false
    spark.shuffle.spill.compress    false
    

    By disabling compression, you reclaim the CPU cycles previously spent compressing and decompressing data, and you should also see faster RDMA data transfers.

  • Disk Media! For the highest and most consistent performance, use the fastest disk media available. Where possible, consider placing the spark-tmp and hadoop tmp files on a ramdrive or NVMe device.
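
    As a minimal sketch of the tip above, Spark's scratch space (shuffle files and spilled data) can be pointed at a fast device through the spark.local.dir property. The mount point /mnt/nvme0 is a hypothetical path used only for illustration; substitute your own NVMe or ramdrive mount:

    spark.local.dir                 /mnt/nvme0/spark-tmp

    Note that the cluster manager may override this setting: the SPARK_LOCAL_DIRS (Standalone) or LOCAL_DIRS (YARN) environment variables take precedence over spark.local.dir. The Hadoop temporary directory is configured separately, on the Hadoop side.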