Benchmark
Kazuaki Ishizaki edited this page Feb 3, 2016 · 10 revisions
To show the efficiency of our column-based RDD, we measure performance with and without a GPU by running a simple logistic regression program that uses map() and reduce().
We achieved a 3.15x performance improvement for logistic regression (SparkGPULR in the examples) on a 16-thread IvyBridge box with an NVIDIA K40 GPU card, compared with the run that does not use the GPU card. There is still room to improve performance (e.g. by eliminating the data copy between map() and reduce()).
Spark code for non-GPU version
Spark code for GPU version, CUDA code
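For readers without the linked sources at hand, the map()/reduce() structure of one logistic regression run can be sketched in plain Python. This is only an illustration under stated assumptions: labels are taken to be -1 or +1, the update uses the standard logistic-loss gradient with no step size, and every name here (generate_points, point_gradient, add_vectors, batch_gradient, train) is hypothetical. It is not the Spark or CUDA code linked above.

```python
import math
import random
from functools import reduce


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def generate_points(n, d, seed=42):
    # Hypothetical synthetic data: n points with d features, labels -1.0 / +1.0.
    rng = random.Random(seed)
    points = []
    for i in range(n):
        y = -1.0 if i % 2 == 0 else 1.0
        x = [rng.gauss(0.0, 1.0) + 0.7 * y for _ in range(d)]
        points.append((x, y))
    return points


def point_gradient(w, point):
    # map() step: gradient contribution of a single point.
    x, y = point
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    scale = (sigmoid(margin) - 1.0) * y
    return [scale * xi for xi in x]


def add_vectors(a, b):
    # reduce() step: element-wise sum of two gradient vectors.
    return [ai + bi for ai, bi in zip(a, b)]


def batch_gradient(w, points):
    # One map() followed by one reduce() over all points.
    return reduce(add_vectors, map(lambda p: point_gradient(w, p), points))


def train(points, d, iterations, seed=0):
    # Start from random weights in [-1, 1) and apply full-batch updates.
    rng = random.Random(seed)
    w = [2.0 * rng.random() - 1.0 for _ in range(d)]
    for _ in range(iterations):
        gradient = batch_gradient(w, points)
        w = [wi - gi for wi, gi in zip(w, gradient)]
    return w


if __name__ == "__main__":
    pts = generate_points(n=1000, d=10)
    print(train(pts, d=10, iterations=5))
```

In this shape, the per-point work in point_gradient is the part that maps naturally onto a GPU kernel, while the vector sum in add_vectors is the reduction whose input currently has to be copied between the two steps.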
N=1000000
D=400
ITERATIONS=5
Slices=128 (w/o GPU), 16 (with GPU)
MASTER=local[8] (w/o GPU), local[8] (with GPU)
Machine: nx360 M4, 2 sockets 8-core Intel Xeon E5-2667 3.3GHz, 256GB memory, with one NVIDIA K40m card
OS: RedHat 6.6
CUDA: 7.0
Java: IBM Java8 pxa6480sr2-20151023_01(SR2)
Spark version: https://github.com/kiszk/spark-gpu/commit/34e9b75c0cab297ed7feb8aef7072164b6a5972c
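To put the N, D and Slices settings in perspective, the following back-of-the-envelope sketch computes how many points land in each slice and how large the raw feature data is. It assumes 8-byte double-precision features and an even split across slices, and it ignores labels and JVM object overhead; the helper name workload_summary is hypothetical.

```python
def workload_summary(n, d, slices, bytes_per_value=8):
    # Points per slice (assuming an even split) and total raw feature bytes.
    points_per_slice = n / slices
    feature_bytes = n * d * bytes_per_value
    return points_per_slice, feature_bytes


if __name__ == "__main__":
    N, D = 1_000_000, 400
    for label, slices in (("w/o GPU", 128), ("with GPU", 16)):
        per_slice, total = workload_summary(N, D, slices)
        print(
            f"{label}: {per_slice:,.1f} points/slice, "
            f"{total / slices / 1e6:.0f} MB/slice, "
            f"{total / 1e9:.1f} GB of raw features in total"
        )
```

Under these assumptions the raw features come to about 3.2 GB, split into roughly 25 MB per slice without the GPU and 200 MB per slice with it, so the GPU run hands far larger chunks to each task.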
spark-env.sh
JAVA_HOME=/u/ishizaki/ibm-java-x86_64-802
CUDA_DEVICE_MAX_CONNECTION=32
CUDA_VISIBLE_DEVICES=0
spark-defaults.conf
spark.driver.extraJavaOptions -Xmn96g -Xgcthreads8 -Xdump:system:none -Xdump:heap:none -Xtrace:none -Xnoloa -Xdisableexplicitgc
spark.eventLog.enabled true
spark.eventLog.dir file:///tmp/eventlog-ishizaki
spark.history.fs.logDirectory file:///tmp/eventlog-ishizaki
spark.driver.cores 16
spark.driver.memory 144g
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max 512m
spark.akka.frameSize 1024
spark.history.ui.port 18080
non-GPU version
$ MASTER='local[8]' bin/run-example SparkLR 128 1000000 400 5
GPU version
$ MASTER='local[8]' bin/run-example SparkGPULR 16 1000000 400 5