Catboost for Imbalanced Data Sets #223
Hi, sorry for the long delay in answering; we'll update the documentation and add a scale_pos_weight parameter.
For now you can use class_weights in the following way: set weight 1 for class 0 and weight scale_pos_weight for class 1. This is equivalent to having a scale_pos_weight parameter.
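The equivalence described above can be sketched as follows, computing scale_pos_weight from class counts with the common negative/positive ratio heuristic (the heuristic itself is an assumption here, not something this thread prescribes):

```python
import numpy as np

# Toy imbalanced binary labels: 90 negatives (class 0), 10 positives (class 1)
y = np.array([0] * 90 + [1] * 10)

# Common heuristic: weight positives by the negative/positive count ratio
n_neg = int((y == 0).sum())
n_pos = int((y == 1).sum())
scale_pos_weight = n_neg / n_pos  # 9.0 for this toy data

# Equivalent class_weights list: weight 1 for class 0,
# scale_pos_weight for class 1, as the maintainer describes
class_weights = [1.0, scale_pos_weight]

# With CatBoost installed, either of these should behave the same
# (usage sketch based on the maintainer's description):
# model = CatBoostClassifier(scale_pos_weight=scale_pos_weight)
# model = CatBoostClassifier(class_weights=class_weights)
print(class_weights)
```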
The parameter was added in the latest release.
@carlosr29 We are currently working on improving quality for imbalanced datasets with binary classification.
@annaveronika is there a way to work with imbalanced datasets when solving regression problems?
@annaveronika Is there a way to downscale predicted scores after using class_weights? (I have noticed that the model over-predicts when using class_weights, and I need point estimates for my problem.)
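On the score-downscaling question, one common approach is prior correction: divide the predicted odds by the positive-class weight that was used during training. This is a sketch of a general technique, not a documented CatBoost feature:

```python
import numpy as np

def undo_pos_weight(p, w):
    """Rescale probabilities from a model trained with positive-class
    weight `w` back toward the unweighted prior. Prior-correction
    sketch (an assumption, not a CatBoost API): the weight inflates
    the predicted odds by roughly a factor of w, so we divide it out."""
    odds = p / (1.0 - p)
    odds_corrected = odds / w
    return odds_corrected / (1.0 + odds_corrected)

# Example: predictions from a hypothetical model trained with w = 9
p = np.array([0.9, 0.5, 0.1])
print(undo_pos_weight(p, 9.0))
```

With w = 9, a weighted prediction of 0.5 (odds 1:1) corrects back to 0.1 (odds 1:9), matching the original class prior in the toy setting.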
I'm still getting the error; I tried scale_pos_weight as well.
Could you check the version, in case that could be the problem?
Please use CatBoostClassifier.
It is written above: "Hi, sorry for the long answer, we'll update documentation and add scale_pos_weight parameter."
scale_pos_weight sets the weight for objects of class 1 (the positive class). This is equal to setting class_weights to [1, {scale_pos_weight value}]. To deal properly with imbalanced data, you can experiment with either of these two parameters, or perhaps do oversampling.
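The oversampling option mentioned above can be sketched with naive random oversampling in NumPy (a hypothetical illustration; libraries such as imbalanced-learn offer more principled resamplers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy imbalanced dataset: 95 majority-class rows, 5 minority-class rows
X = rng.normal(size=(100, 3))
y = np.array([0] * 95 + [1] * 5)

# Naive random oversampling: resample minority rows with replacement
# until both classes are the same size
minority_idx = np.flatnonzero(y == 1)
n_extra = int((y == 0).sum() - (y == 1).sum())
extra = rng.choice(minority_idx, size=n_extra, replace=True)

X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])
print(np.bincount(y_bal))  # both classes now 95
```

The balanced arrays can then be passed to any classifier's fit; oversampling changes the training distribution, so hold out an untouched validation set for evaluation.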
Question about class_weights for multi-class problems: in the CatBoost documentation here: https://catboost.ai/docs/concepts/python-reference_parameters-list.html I see that class_weights is passed as a list; the documentation shows a binary-classification example, class_weights=[0.1, 4], which works fine in the binary case. I know I can pass a list whose length equals the number of classes, but how does CatBoost assign these weights to the appropriate labels in a multi-class context? I calculated the class weights using the sklearn utils and got a list, e.g. [0.5, 4.5, 7.5, 3.4]. How do I address this? Would it be an option to allow class_weights to accept a dictionary?
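For the multi-class case, one way to build such a list with sklearn is shown below, assuming (as a working hypothesis, not something this thread confirms) that the i-th list entry applies to the i-th class label in sorted order:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy multi-class labels with uneven counts
y = np.array([0] * 60 + [1] * 25 + [2] * 10 + [3] * 5)

# sklearn returns weights aligned with `classes` in the order given;
# passing the sorted unique labels keeps the mapping unambiguous
classes = np.unique(y)  # sorted: [0, 1, 2, 3]
weights = compute_class_weight("balanced", classes=classes, y=y)

# class_weights as a plain list: position i is assumed to be the
# weight for the i-th sorted class label
class_weights = weights.tolist()
print(dict(zip(classes.tolist(), class_weights)))
```

The "balanced" mode computes n_samples / (n_classes * count_per_class), so the rarest class gets the largest weight.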
@sskarkhanis Could you please create a separate issue about this?
Is there a parameter like "scale_pos_weight" in the catboost Python package, as there is in xgboost, in order to handle imbalanced classes?
I know there is a parameter called "class_weights", but the official documentation (https://tech.yandex.com/catboost/doc/dg/concepts/python-reference_parameters-list-docpage/#python-reference_parameters-list) doesn't explain well whether it helps with the imbalance problem, or how to set it.
Thanks in advance.