Add ml xgboost workload #638

bobjiang82 · 2020-07-30T15:41:53Z

Reuse GradientBoostedTreeDataGenerator to generate dataset
Read dataset and convert to ml.LabeledPoint and to ml.DataFrame
Call XGBoost with the passed-in params for training
Call XGBoost prediction and print test error
Add XGBoost libs configuration doc
Use pipeline for training
Verified with Scala 2.12, Apache Spark 2.4, and XGBoost v1.1.

Note: based on Xiaochang's PR #628.


xwu99 · 2020-07-31T06:32:49Z

@bobjiang82 #628 is merged. Could you rebase the code to resolve the conflict?

bobjiang82 · 2020-08-03T13:17:25Z

@xwu99 Done.

conf/hibench.conf

xwu99 · 2020-08-05T09:46:02Z

docs/run-sparkbench.md

+
+
+### 8. Run xgboost workload ###
+


Could you change "xgboost" to "XGBoost" here and do the same in the rest of the doc?

xwu99 · 2020-08-05T09:53:05Z

docs/run-sparkbench.md

+```
+
+#### 8.a latest xgboost release (default) ####
+


There's no need for 8.a, 8.b; please also use correct capitalization in titles.

I don't think you need to write this, since it's already covered above in section 4, "Run a workload".
I suggest you separate the doc out, merge only the code, and make sure it's runnable with the default HiBench process.

xwu99 · 2020-08-05T09:57:53Z

docs/run-sparkbench.md

+If you only have the xgboost jar files, just copy them to $SPARK_HOME/jars/ and update the relevant versions for xgboost4j and xgboost4j-spark in sparkbench/ml/pom.xml to get aligned.<br>
+For example, if xgboost is built from source on a Linux platform, the jars will be generated and installed to ```~/.m2/repository/ml/dmlc/xgboost4j_<scala version>/<xgboost version>-SNAPSHOT/``` and ```~/.m2/repository/ml/dmlc/xgboost4j-spark_<scala version>/<xgboost version>-SNAPSHOT/``` respectively. To use them, copy the 2 jars to $SPARK_HOME/jars/ and update the relevant versions for xgboost4j and xgboost4j-spark in the pom.xml files.<br>
+After that, build hibench, prepare data and run xgboost benchmark.
+


Generally, the doc style is not consistent with the original doc, and it is too complicated to follow.
I suggest rewriting or removing it. We can merge the code first; it should be runnable with the default settings.
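
For reference, the manual jar setup described in the diff above could be sketched roughly as follows. This is only an illustration of the quoted steps, not part of the PR: the function names and the `M2_REPO` override are hypothetical, and the jar file names are assumed to follow the usual Maven `<artifact>-<version>.jar` convention. The versions in pom.xml would still need to be updated by hand.

```shell
# Print the local Maven repository directory for an XGBoost artifact,
# following the path pattern quoted in the diff.
# M2_REPO is a hypothetical override; it defaults to ~/.m2/repository.
m2_artifact_dir() {
    artifact="$1"; scala_version="$2"; xgboost_version="$3"
    echo "${M2_REPO:-$HOME/.m2/repository}/ml/dmlc/${artifact}_${scala_version}/${xgboost_version}-SNAPSHOT"
}

# Copy the xgboost4j and xgboost4j-spark jars into $SPARK_HOME/jars.
# Arguments: <scala version> <xgboost version>
install_xgboost_jars() {
    for name in xgboost4j xgboost4j-spark; do
        cp "$(m2_artifact_dir "$name" "$1" "$2")"/*.jar "$SPARK_HOME/jars/"
    done
}

# Example invocation (versions taken from the PR description; adjust as needed):
# install_xgboost_jars 2.12 1.1.0
```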

xwu99 · 2020-08-05T10:02:31Z

docs/run-sparkbench.md

+```
+
+#### 8.a latest xgboost release (default) ####
+


I don't think you need to write this, since it's already covered above in section 4, "Run a workload".
I suggest you separate the doc out, merge only the code, and make sure it's runnable with the default HiBench process.

Commit the code first and continue to refine the doc.

bobjiang82 · 2020-08-07T02:51:56Z

Updated to merge the code first and continue to refine the doc.

xwu99 · 2020-08-10T01:04:40Z

> Updated to merge the code first and continue to refine the doc.

Thanks! Could you add this to the benchmark list (conf/benchmarks.lst) and to CI (travis/benchmarks_ml.lst)?

bobjiang82 · 2020-08-14T10:08:55Z

Added xgboost to conf/benchmarks.lst and travis/benchmarks_ml.lst
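
A change like this amounts to adding one line to each list file. As a rough sketch, assuming the lists hold one workload entry per line: the entry name `ml.xgboost`, the sample entry `ml.gbt`, and the helper `add_benchmark` are illustrative assumptions, not taken from the PR diff. The demo writes to a scratch directory rather than the real repository.

```shell
# Append a workload entry to a list file only if it is not already present.
add_benchmark() {
    list_file="$1"
    entry="$2"
    # -x matches whole lines only, -F treats the entry as a fixed string
    if ! grep -qxF "$entry" "$list_file" 2>/dev/null; then
        printf '%s\n' "$entry" >> "$list_file"
    fi
}

# Demo against a scratch copy of the two list files.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/conf" "$demo_dir/travis"
printf 'ml.gbt\n' > "$demo_dir/conf/benchmarks.lst"       # sample existing entry
printf 'ml.gbt\n' > "$demo_dir/travis/benchmarks_ml.lst"  # sample existing entry

add_benchmark "$demo_dir/conf/benchmarks.lst" "ml.xgboost"
add_benchmark "$demo_dir/travis/benchmarks_ml.lst" "ml.xgboost"
# Running it a second time leaves the file unchanged.
add_benchmark "$demo_dir/conf/benchmarks.lst" "ml.xgboost"
```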

xwu99 · 2020-08-17T01:37:34Z

@bobjiang82 Could you modify bin/run_all.sh to mask out Hadoop for this workload, since it is Spark-only?
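
One way such a mask could look is sketched below. This is not the actual contents of bin/run_all.sh: the function names, the `ml/xgboost` and `micro/sort` workload names, and the fixed `hadoop spark` framework list are assumptions made for illustration.

```shell
# Return 0 (true) when a benchmark/framework pair should be skipped.
should_skip() {
    case "$1:$2" in
        ml/xgboost:hadoop) return 0 ;;  # Spark-only workload: no Hadoop run
        *) return 1 ;;
    esac
}

# Print the benchmark/framework pairs that would actually run.
planned_runs() {
    for bench in "$@"; do
        for fw in hadoop spark; do
            if should_skip "$bench" "$fw"; then
                continue
            fi
            echo "$bench/$fw"
        done
    done
}

planned_runs micro/sort ml/xgboost
```

With this check in the loop, the Hadoop pass simply moves on to the next pair when it reaches the XGBoost workload, while the other workloads still run under both frameworks.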


xwu99 and others added 8 commits July 26, 2020 19:34

Add XGBoost

45b71a9

Clean up code

b81f847

Add num_workers and nthread params and check spark.task.cpus

6210f92

modify README

85851a6

Update README.md

93b663b

Merge commit 'refs/pull/628/head' of https://github.com/Intel-bigdata…

0e923f0

…/HiBench into master

update xgboost configuration to use v1.1.0-scala2.12 by default

31c3b59

refine XGBoost.scala, use pipeline

5791a0f

Merge branch 'master' into addxgboost

205a4e2


bobjiang82 added 2 commits August 7, 2020 09:28

Update hibench.conf

38fc921

Update run-sparkbench.md

a213d75


bobjiang82 added 2 commits August 14, 2020 18:06

Update benchmarks.lst

b92c8cc

Update benchmarks_ml.lst

7902f59

Merge pull request #1 from Intel-bigdata/master

b928bb9

sync the forked repo with HiBench base
