
Catboost for Imbalanced Data Sets #223

Closed · carlosr29 opened this issue Jan 13, 2018 · 12 comments

@carlosr29 commented Jan 13, 2018

Is there a parameter like "scale_pos_weight" in the catboost package, as there is in the xgboost package in Python, to handle imbalanced classes?

I know there is a parameter called "class_weights", but the official documentation (https://tech.yandex.com/catboost/doc/dg/concepts/python-reference_parameters-list-docpage/#python-reference_parameters-list) does not explain clearly whether it helps with the imbalance problem, or how to set it.

Thanks in advance.

@annaveronika (Contributor)

Hi, sorry for the late reply. We'll update the documentation and add a scale_pos_weight parameter.

@annaveronika (Contributor)

For now you can use class_weights in the following way: set weight 1 for class 0 and weight scale_pos_weight for class 1. This is equivalent to having a scale_pos_weight parameter.
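
For example, a minimal sketch of that workaround (the weight value 10 and the names X_train, y_train are illustrative placeholders):

from catboost import CatBoostClassifier

# Weight 1 for class 0, weight 10 for class 1 -- mimics scale_pos_weight=10
model = CatBoostClassifier(class_weights=[1, 10], loss_function='Logloss')
model.fit(X_train, y_train)  # X_train, y_train stand in for your training data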

@annaveronika (Contributor)

The parameter was added in the latest release.

@annaveronika (Contributor)

@carlosr29 We are currently working on improving quality on imbalanced datasets for binary classification.
If you could share your dataset with us, it would be very helpful.

@elfwired

@annaveronika Is there a way to work with imbalanced datasets when solving regression problems?

@abhi070493

@annaveronika Is there a way to downscale predicted scores after using class_weights? (I have noticed that the model over-predicts when using class_weights, and I need point estimates for my problem.)
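
One known prior-correction sketch for this (not a CatBoost API; it assumes that upweighting the positive class by a factor w scales the model's predicted odds by w):

# Map a probability from a model trained with class_weights=[1, w]
# back to the unweighted scale by dividing the predicted odds by w
def downscale(p_weighted, w):
    return p_weighted / (p_weighted + w * (1.0 - p_weighted))

print(downscale(0.9, 10))  # ~0.47 for a raw prediction of 0.9 with w=10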

@The-Gupta

I'm still getting the error; I tried scale_pos_weight as well:

classifier = CatBoostRegressor(class_weights = [0.8, 0.2])
Traceback (most recent call last):

  File "<ipython-input-49-9098f24b0f97>", line 1, in <module>
    classifier = CatBoostRegressor(class_weights = [0.8, 0.2])

TypeError: __init__() got an unexpected keyword argument 'class_weights'

Could you check whether the version could be the problem?

@Evgueni-Petrov-aka-espetrov (Contributor)

Please use CatBoostClassifier.
The class_weights parameter is meaningful only for the Logloss, MultiClass, and MultiClassOneVsAll loss functions, while CatBoostRegressor implies that the loss function is RMSE, MAE, Quantile, LogLinQuantile, Poisson, or MAPE.
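
A minimal sketch of the corrected call from the traceback above (weights taken from that snippet):

from catboost import CatBoostClassifier

# class_weights belongs to the classifier, not the regressor
classifier = CatBoostClassifier(class_weights=[0.8, 0.2])

For an imbalanced regression problem, per-object weights can be supplied instead, for example via the weight argument of catboost.Pool.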

@Sandy4321

It is written above: "Hi, sorry for the late reply, we'll update the documentation and add a scale_pos_weight parameter."
But in the documentation I see scale_pos_weight (alias for: class_weights) (https://catboost.ai/docs/concepts/python-reference_parameters-list.html), so was what you added only a synonym?
It is also still not clear how to use it (https://stackoverflow.com/questions/54437646/catboost-precision-imbalanced-classes). Do you have a clear code example?
For example, in this video the data is imbalanced, but the model is trained as if for balanced data: https://www.youtube.com/watch?v=xl1fwCza9C8&t=44s (code: https://github.com/catboost/tutorials/blob/master/events/pydata_moscow_oct_13_2018.ipynb).

@annaveronika (Contributor)

scale_pos_weight sets the weight for objects of class 1 (the positive class). This is equivalent to setting class_weights to [1, {scale_pos_weight value}]. To deal properly with imbalanced data, you can experiment with either of these two parameters, or try oversampling.
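
A sketch of the equivalence (the weight value 5 is illustrative):

from catboost import CatBoostClassifier

# These two configurations weight the positive class identically
m1 = CatBoostClassifier(scale_pos_weight=5)
m2 = CatBoostClassifier(class_weights=[1, 5])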

@sskarkhanis commented Oct 16, 2019

Question about class_weights for multi-class problems.

In the CatBoost documentation here: https://catboost.ai/docs/concepts/python-reference_parameters-list.html

I see that class_weights is passed as a list; the documentation shows a binary-classification example, class_weights=[0.1, 4], which works fine for binary classification.

I know I can pass a list whose length equals the number of classes, but how does CatBoost assign these weights to the appropriate labels in a multi-class context?

I calculated the class weights using the sklearn utility as follows:

import numpy as np
from sklearn.utils import class_weight

# 'balanced' weights: n_samples / (n_classes * count(class)), one per class
cw = list(class_weight.compute_class_weight('balanced',
                                            classes=np.unique(df_train['Target']),
                                            y=df_train['Target']))

and get a list, e.g. [0.5, 4.5, 7.5, 3.4].
If I pass the list as-is, my model's performance is worse than without the class_weights option.

How do I address this? Would it be an option to allow class_weights to accept a dictionary?
e.g. class_weights = { "class_A": 3.5, "class_B": 4.5, "class_C": 0.5 }
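
For reference, a sketch of one way to make the mapping explicit, assuming CatBoost pairs class_weights with labels in sorted order (the order np.unique returns) and that the class_names parameter can pin that ordering (check the docs for your version):

import numpy as np
from catboost import CatBoostClassifier
from sklearn.utils import class_weight

labels = np.unique(df_train['Target'])  # sorted unique labels
cw = class_weight.compute_class_weight('balanced',
                                       classes=labels,
                                       y=df_train['Target'])

# Spell out the label-to-weight pairing instead of relying on implicit order
model = CatBoostClassifier(loss_function='MultiClass',
                           class_names=list(labels),
                           class_weights=list(cw))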

@annaveronika (Contributor)

@sskarkhanis Could you please create a separate issue about this?
