Prepare packages and dataset for PySpark

For simplicity, export the locations of these jars as environment variables. All examples assume the packages and dataset are placed in the /opt/xgboost directory.

Download the jars

  1. Download the XGBoost for Apache Spark jars

  2. Download the RAPIDS Accelerator for Apache Spark plugin jar

    Then download the version of the cudf jar that your version of the accelerator depends on.
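As a quick sanity check after downloading, the version embedded in each jar file name can be compared. This is only a sketch: `jar_version` is a hypothetical helper, it assumes file names shaped like `cudf-21.10.0-cuda11.jar` and `rapids-4-spark_2.12-21.10.0.jar`, and it assumes that matching version numbers indicate a compatible pair. The accelerator's own documentation remains the authority on which cudf version it depends on.

```shell
# Hypothetical helper: print the x.y.z version embedded in a jar file name.
# Assumes names like cudf-21.10.0-cuda11.jar or rapids-4-spark_2.12-21.10.0.jar.
jar_version() {
    basename "$1" .jar \
        | sed -E 's/-cuda[0-9]+$//' \
        | sed -E 's/.*-([0-9]+\.[0-9]+\.[0-9]+)$/\1/'
}

# Example comparison using the file names from this guide.
cudf_version=$(jar_version "/opt/xgboost/cudf-21.10.0-cuda11.jar")
rapids_version=$(jar_version "/opt/xgboost/rapids-4-spark_2.12-21.10.0.jar")

if [ "$cudf_version" = "$rapids_version" ]; then
    echo "cudf and accelerator versions match: $cudf_version"
else
    echo "version mismatch: cudf $cudf_version, accelerator $rapids_version"
fi
```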

Build XGBoost Python Examples

Follow this guide to build samples.zip and main.py, then copy both files to /opt/xgboost.

Download dataset

Download the mortgage dataset from this site to /opt/xgboost.

Set up environment variables

export SPARK_XGBOOST_DIR=/opt/xgboost
export CUDF_JAR=${SPARK_XGBOOST_DIR}/cudf-21.10.0-cuda11.jar
export RAPIDS_JAR=${SPARK_XGBOOST_DIR}/rapids-4-spark_2.12-21.10.0.jar
export XGBOOST4J_JAR=${SPARK_XGBOOST_DIR}/xgboost4j_3.0-1.4.2-0.2.0.jar
export XGBOOST4J_SPARK_JAR=${SPARK_XGBOOST_DIR}/xgboost4j-spark_3.0-1.4.2-0.2.0.jar
export SAMPLE_ZIP=${SPARK_XGBOOST_DIR}/samples.zip
export MAIN_PY=${SPARK_XGBOOST_DIR}/main.py
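Before submitting a job, it can help to confirm that every exported path points at a real file. The sketch below is not part of the original guide: `check_files` is a hypothetical helper, and it only checks that each path exists as a regular file. It does not validate the jar contents.

```shell
# Hypothetical helper: report whether each given path exists as a regular file.
# Returns 0 when all files are present, 1 when at least one is missing.
check_files() {
    status=0
    for f in "$@"; do
        if [ -f "$f" ]; then
            echo "ok: $f"
        else
            echo "missing: $f"
            status=1
        fi
    done
    return "$status"
}

# Check the paths exported above; unset variables are treated as missing.
check_files "${CUDF_JAR:-}" "${RAPIDS_JAR:-}" "${XGBOOST4J_JAR:-}" \
    "${XGBOOST4J_SPARK_JAR:-}" "${SAMPLE_ZIP:-}" "${MAIN_PY:-}" \
    || echo "Some files are missing; re-check the download steps."
```

Running this in the same shell session as the export lines picks up the variables directly.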