xgboost4j-spark-scalability

A benchmark to test the scalability of xgboost4j-spark and related projects.

Prerequisites

Ensure that Maven (3.0+) and CMake are installed and available on your $PATH.
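A quick way to confirm both tools are reachable before building is sketched below. The helper name `check_tools` is hypothetical and not part of this repository; `mvn` and `cmake` are assumed to be the executable names for Maven and CMake on your system.

```shell
# Hypothetical helper: succeed only if every named tool is found on $PATH.
check_tools() {
  missing=""
  for tool in "$@"; do
    # command -v exits non-zero when the tool cannot be resolved.
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing from PATH:$missing" >&2
    return 1
  fi
  echo "all tools found: $*"
}

# Assumption: Maven is invoked as mvn and CMake as cmake.
check_tools mvn cmake || echo "install the missing tools before running build/build.sh"
```

This only checks presence, not versions; `mvn -version` prints the installed Maven version if you need to confirm it is 3.0 or newer.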

Build Benchmark

  1. Edit build/build.sh and define variables such as TARGET_URL and TARGET_BRANCH.

  2. Run build/build.sh.

  3. The benchmark jar is written to target/.
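To confirm that the build actually produced a jar, something like the following may help. It is a minimal sketch: the function name `find_benchmark_jar` is hypothetical, and the `*-assembly-*.jar` pattern is an assumption inferred from the jar name used in the run commands in this README.

```shell
# Hypothetical helper: print the first assembly jar found under a directory
# (default: target). Returns non-zero if none is found.
find_benchmark_jar() {
  dir="${1:-target}"
  # Assumption: the jar name contains "-assembly-", as in the run commands.
  jar="$(find "$dir" -type f -name '*-assembly-*.jar' 2>/dev/null | head -n 1)"
  if [ -z "$jar" ]; then
    echo "no assembly jar under $dir; did build/build.sh finish?" >&2
    return 1
  fi
  echo "$jar"
}

# Report the jar if present; do not abort the shell if it is missing.
find_benchmark_jar target || true
```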

Run Benchmarks

  1. Generate Data:
spark-submit --master yarn-cluster --num-executors 10 --executor-memory 6g --executor-cores 8 \
    --class me.codingcat.xgboost4j.AirlineDataGenerator --files conf/airline_datagen.conf \
     target/scala-2.11/xgboost4j-spark-scalability-assembly-0.1-SNAPSHOT.jar ./airline_datagen.conf
  2. Run workload:
spark-submit --master yarn-cluster --num-executors 10 --executor-memory 6g --executor-cores 8 \
    --class me.codingcat.xgboost4j.AirlineClassifier --files conf/airline.conf \
     target/scala-2.11/xgboost4j-spark-scalability-assembly-0.1-SNAPSHOT.jar ./airline.conf
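Since the two commands above differ only in the main class and the config file, they can be generated from one helper, which also makes it easier to vary the executor settings when probing scalability. The sketch below only prints the command lines (a dry run). The function name `benchmark_cmd` and the environment variables `NUM_EXECUTORS`, `EXECUTOR_MEMORY` and `EXECUTOR_CORES` are hypothetical, not something the repository defines; their defaults mirror the values used above.

```shell
# Path of the assembly jar, as used in the commands above.
JAR="target/scala-2.11/xgboost4j-spark-scalability-assembly-0.1-SNAPSHOT.jar"

# Hypothetical helper: print a spark-submit command line for one step.
#   $1: fully qualified main class
#   $2: local path of the config file shipped with --files
# NUM_EXECUTORS, EXECUTOR_MEMORY and EXECUTOR_CORES are assumed names;
# when unset, the README's values (10, 6g, 8) are used.
benchmark_cmd() {
  class="$1"
  conf="$2"
  echo "spark-submit --master yarn-cluster" \
    "--num-executors ${NUM_EXECUTORS:-10}" \
    "--executor-memory ${EXECUTOR_MEMORY:-6g}" \
    "--executor-cores ${EXECUTOR_CORES:-8}" \
    "--class $class --files $conf" \
    "$JAR ./$(basename "$conf")"
}

# Dry run: print both steps without submitting anything.
benchmark_cmd me.codingcat.xgboost4j.AirlineDataGenerator conf/airline_datagen.conf
benchmark_cmd me.codingcat.xgboost4j.AirlineClassifier conf/airline.conf
```

For example, `NUM_EXECUTORS=20 benchmark_cmd me.codingcat.xgboost4j.AirlineClassifier conf/airline.conf` prints the same workload command with 20 executors; review the printed line and then run it yourself to actually submit the job.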