Decision trees lend themselves naturally to categorical variables. It would be nice if XGBoost could handle categorical variables natively. It could read the feature map, or something similar, to identify categorical variables and handle them in a better way.
This is important because it is sometimes difficult to encode categorical variables as numerical values. For large data, one-hot encoding produces far too many columns, making the data blow up in size. Ordinal encoding assumes an implicit order that the data does not actually have, for example with airport codes.
Many other types of encoding exist, but why take a roundabout route when decision trees already lend themselves to categorical variables more naturally than to numerical ones?
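To make the two problems concrete, here is a minimal sketch in plain Python. The airport list and the helper names `one_hot` and `ordinal` are made up for illustration and are not part of any XGBoost API.

```python
# Hypothetical airport-code column; the values are illustrative only.
airports = ["JFK", "LAX", "SFO", "JFK", "ORD", "LAX"]

# Sorted distinct categories define the layout used by both encodings.
categories = sorted(set(airports))

def one_hot(value, categories):
    """Return a 0/1 vector with one slot per distinct category."""
    return [1 if value == c else 0 for c in categories]

def ordinal(value, categories):
    """Return the position of the category in the sorted category list."""
    return categories.index(value)

one_hot_rows = [one_hot(a, categories) for a in airports]
ordinal_codes = [ordinal(a, categories) for a in airports]

# One-hot width equals the number of distinct categories, so a column
# with thousands of airport codes becomes thousands of columns.
print(len(one_hot_rows[0]))  # 4

# Ordinal codes follow alphabetical order here, which says nothing
# meaningful about how the airports relate to each other.
print(ordinal_codes)  # [0, 1, 3, 0, 2, 1]
```

With only four distinct codes the one-hot rows are already four columns wide, while the ordinal codes place ORD "between" LAX and SFO purely because of spelling.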
XGBoost's policy is not to provide special support for categorical variables. It's up to you to handle them before passing the features to the algorithm.
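As one possible way to handle a categorical column yourself before training, here is a rough sketch of frequency encoding, which replaces each category with its share of the rows and so keeps a single numeric column. This is only one of several options and not something the XGBoost maintainers recommend in this thread; the function name `frequency_encode` and the sample data are hypothetical.

```python
from collections import Counter

def frequency_encode(values):
    """Map each category to the fraction of rows in which it appears."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]

# Hypothetical airport-code column; the values are illustrative only.
airports = ["JFK", "LAX", "SFO", "JFK"]
encoded = frequency_encode(airports)
print(encoded)  # [0.5, 0.25, 0.25, 0.5]
```

The trade-off is that categories with the same frequency (LAX and SFO above) become indistinguishable, so this fits best when frequency itself carries signal.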
This is a feature request.