doc/tutorials/spark_estimator.rst (12 additions, 11 deletions)
We can create a ``SparkXGBRegressor`` estimator like:
The above snippet creates a Spark estimator that can fit on a Spark dataset and return a
Spark model that can transform a Spark dataset, generating a dataset with a prediction
column. Almost all XGBoost sklearn estimator parameters can be set as
``SparkXGBRegressor`` parameters, but some parameters, such as ``nthread``, are forbidden
in the Spark estimator, and some are replaced with PySpark-specific parameters such as
``weight_col`` and ``validation_indicator_col``. For details, please see the
``SparkXGBRegressor`` documentation.
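As a small illustration of this parameter split, here is a hedged sketch. This is not
xgboost's actual validation code, and the sets below contain only the example names
mentioned above; they are not the library's real parameter lists.

```python
# Illustration only: how parameters conceptually split for the PySpark
# estimator. These sets hold only the examples from the text above,
# not xgboost's real parameter lists.
FORBIDDEN = {"nthread"}  # thread count is managed by Spark, not the user
PYSPARK_SPECIFIC = {"weight_col", "validation_indicator_col"}

def classify_param(name: str) -> str:
    """Return how a parameter name is conceptually treated by the Spark estimator."""
    if name in FORBIDDEN:
        return "forbidden"
    if name in PYSPARK_SPECIFIC:
        return "pyspark-specific"
    return "passed through to XGBoost"

print(classify_param("nthread"))     # forbidden
print(classify_param("weight_col"))  # pyspark-specific
print(classify_param("max_depth"))   # passed through to XGBoost
```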
The following code snippet shows how to train a Spark XGBoost regressor model. First, we
need to prepare a training dataset as a Spark dataframe containing
XGBoost PySpark fully supports GPU acceleration. Users are not only able to enable
efficient training but also utilize their GPUs for the whole PySpark pipeline, including
ETL and inference. In the sections below, we will walk through an example of training on
a PySpark standalone GPU cluster. To get started, first we need to install some
additional packages, then we can set the ``device`` parameter to ``cuda`` or ``gpu``.
Prepare the necessary packages
==============================
Write your PySpark application
==============================

The snippet below is a small example of training an XGBoost model with PySpark. Notice
that we are using a list of feature names and the additional parameter ``device``:

.. code-block:: python
    # get a list with feature column names
    feature_names = [x.name for x in train_df.schema if x.name != label_name]

    # create an XGBoost PySpark regressor estimator and set device="cuda"
    regressor = SparkXGBRegressor(
        features_col=feature_names,
        label_col=label_name,
        num_workers=2,
        device="cuda",
    )

    # train and return the model
    predict_df = model.transform(test_df)
    predict_df.show()

Like other distributed interfaces, the ``device`` parameter doesn't support specifying an
ordinal, since GPUs are managed by Spark instead of XGBoost (good: ``device=cuda``; bad:
``device=cuda:0``).
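To make the ordinal rule concrete, here is a minimal sketch. ``check_device`` is a
hypothetical helper written for illustration, not part of xgboost's API; it simply
accepts bare device names and rejects anything with an ordinal suffix.

```python
import re

def check_device(device: str) -> str:
    """Accept only bare device names. Spark assigns the concrete GPU to each
    task, so ordinals like 'cuda:0' are rejected. Hypothetical helper for
    illustration; not part of xgboost's API."""
    if re.fullmatch(r"cpu|cuda|gpu", device):
        return device
    raise ValueError(f"device ordinals are not supported on Spark: {device!r}")

print(check_device("cuda"))  # cuda
try:
    check_device("cuda:0")
except ValueError as err:
    print("rejected:", err)
```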