Reason
The XGBoost JVM package is the Java language binding for the XGBoost library. It is supposed to be a lightweight, thin wrapper around XGBoost. However, the current implementation is quite heavyweight: it groups the dataset with an RDD for ranking, implements ranking inside XGBoostRegressor, samples the dataset into training and test splits, zips the training and evaluation datasets together, and duplicates code in several places. These extra responsibilities make the XGBoost JVM codebase difficult to read and maintain. In addition, it lacks support for the latest XGBoost parameters and does not properly handle dense/sparse input.
Goal
Create a new XGBoostRanker for ranking problems.
Improve code reuse.
Support DART booster
Remove "grouping" for the ranking problem.
Remove trainTestRatio and its implementation.
Stop zipping the train and eval datasets (add a new Boolean validation column to indicate whether an instance is for training or for evaluation).
Catch up with the latest XGBoost parameters.
Add setters/getters for all parameters.
Support XGBoost-style names when defining parameters (like final val baseScore = new DoubleParam(this, "base_score", "The initial …")).
Support dense data when the input is a vector type.
Support sparse data when the input is a vector type.
Support array input for both CPU and GPU
Support columnar input for CPU (still an open question).
Remove the mechanism that links xgboost4j/xgboost4j-spark to the GPU build.
Use the existing fasterxml.jackson dependency to handle JSON.
Avoid repartitioning when the number of input partitions already equals num_workers.
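The validation-column goal above could be sketched roughly as follows. This is only an illustration, not existing API: the column name `isValidation` and the `ValidationColumn` helper are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, functions => F}

// Hypothetical sketch: instead of co-partitioning and zipping two RDDs,
// tag each row with a Boolean column and let each training task split
// its partition iterator into a train and an eval DMatrix.
object ValidationColumn {
  val ValidationCol = "isValidation" // assumed column name

  // Merge a train and an eval DataFrame into one, marking eval rows.
  def merge(train: DataFrame, eval: DataFrame): DataFrame =
    train.withColumn(ValidationCol, F.lit(false))
      .unionByName(eval.withColumn(ValidationCol, F.lit(true)))
}
```

Inside each task, rows where the column is true would be routed to the evaluation matrix, removing the need to zip the two datasets partition by partition.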
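For the setter/getter and XGBoost-style-name goals, a Spark ML Param mixin could look roughly like this. It is a sketch under assumptions: the trait name is hypothetical, and the description string stands in for the truncated one in the issue; the default 0.5 is XGBoost's documented base_score default.

```scala
import org.apache.spark.ml.param.{DoubleParam, Params}

// Hypothetical mixin: one trait per parameter, keyed by the native
// XGBoost name ("base_score") so JVM params map directly onto the
// native booster configuration without a translation table.
private[spark] trait HasBaseScore extends Params {

  final val baseScore = new DoubleParam(this, "base_score",
    "The initial prediction score of all instances") // description assumed

  final def getBaseScore: Double = $(baseScore)

  def setBaseScore(value: Double): this.type = set(baseScore, value)

  setDefault(baseScore, 0.5) // XGBoost's documented default
}
```

Because the Param name matches the native parameter, collecting `params.map(p => p.name -> $(p))` yields a map that can be handed to the booster as-is.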
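The last goal, skipping the shuffle when the partition count already matches, can be sketched as below; the helper name is hypothetical.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper: only repartition when the input partition count
// differs from the requested number of XGBoost workers, so a dataset
// that already matches num_workers avoids a needless full shuffle.
def coalesceToWorkers(df: DataFrame, numWorkers: Int): DataFrame =
  if (df.rdd.getNumPartitions == numWorkers) df
  else df.repartition(numWorkers)
```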