catboost/libs/target/target_converter.cpp:64: Unknown class name: "0.6" #773

Closed
Palmik opened this issue Apr 8, 2019 · 21 comments

@Palmik

Palmik commented Apr 8, 2019

Problem: The above exception is thrown for certain target values.
catboost version: 0.13.1
Operating System: Linux

How to reproduce:

import catboost as cb
import numpy as np

print(cb.__version__)

model = cb.CatBoostRegressor(
    iterations=1,
    depth=1,
    loss_function='RMSE',
    # If you change the eval metric to RMSE it works
    eval_metric='AUC:border={}'.format(0.5),
    train_dir='/tmp/cbtest2',
)

x = np.array([[1.5], [0.1]])
# If you change the following line to: y = np.array([0.6, 0.4]) it works
y = np.array([0.99, 0.4])
pool = cb.Pool(x, label=y)

x_valid = np.array([[0.33]])
y_valid = np.array([0.6])
pool_valid = cb.Pool(x_valid, label=y_valid)

model.fit(X=pool, eval_set=pool_valid, use_best_model=False)

Full output:

0.13.1

---------------------------------------------------------------------------
CatBoostError                             Traceback (most recent call last)
<ipython-input-81-d2333a747008> in <module>
     21 pool_valid = cb.Pool(x_valid, label=y_valid)
     22 
---> 23 model.fit(X=pool, eval_set=pool_valid, use_best_model=False)

~/.conda/envs/thehft-ml/lib/python3.7/site-packages/catboost/core.py in fit(self, X, y, cat_features, sample_weight, baseline, use_best_model, eval_set, verbose, logging_level, plot, column_description, verbose_eval, metric_period, silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval)
   2699                          use_best_model, eval_set, verbose, logging_level, plot, column_description,
   2700                          verbose_eval, metric_period, silent, early_stopping_rounds,
-> 2701                          save_snapshot, snapshot_file, snapshot_interval)
   2702 
   2703     def predict(self, data, ntree_start=0, ntree_end=0, thread_count=-1, verbose=None):

~/.conda/envs/thehft-ml/lib/python3.7/site-packages/catboost/core.py in _fit(self, X, y, cat_features, pairs, sample_weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, use_best_model, eval_set, verbose, logging_level, plot, column_description, verbose_eval, metric_period, silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval)
   1171 
   1172         with log_fixup(), plot_wrapper(plot, self.get_params()):
-> 1173             self._train(train_pool, eval_sets, params, allow_clear_pool)
   1174 
   1175         if (not self._object._has_leaf_weights_in_model()) and allow_clear_pool:

~/.conda/envs/thehft-ml/lib/python3.7/site-packages/catboost/core.py in _train(self, train_pool, test_pool, params, allow_clear_pool)
    864 
    865     def _train(self, train_pool, test_pool, params, allow_clear_pool):
--> 866         self._object._train(train_pool, test_pool, params, allow_clear_pool)
    867         self._set_trained_model_attributes()
    868 

_catboost.pyx in _catboost._CatBoost._train()

_catboost.pyx in _catboost._CatBoost._train()

CatBoostError: catboost/libs/target/target_converter.cpp:64: Unknown class name: "0.6"

@annaveronika
Contributor

This should already be fixed in the code. You can try building from source and running your example. If it is fixed, the fix will be on PyPI in the next version tomorrow. We'll check one more time and get back to you.

@annaveronika
Contributor

This is fixed in the latest release 0.14

@eccodolf

v0.15, error persists.

@annaveronika annaveronika reopened this May 29, 2019
@annaveronika
Contributor

We cannot reproduce the error. It looks like you are still using the old version.

@eccodolf

eccodolf commented May 29, 2019 via email

@annaveronika
Contributor

annaveronika commented May 29, 2019

Could you please run print(catboost.__version__) just to make sure that you are right?
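
When a fix seems not to have landed, it can also help to confirm which installation the failing interpreter actually resolves. A minimal sketch using only the standard library (Python 3.8+); the helper name `installed_version` is made up for illustration and is not part of CatBoost:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run this with the same interpreter (conda env, notebook kernel)
    # that raises the error.
    print("catboost:", installed_version("catboost"))
```

If the printed version differs from the one you expect, the notebook kernel or environment is probably not the one that was upgraded.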

@annaveronika
Contributor

And if it reproduces, please create a new issue with the code that you are running. The code above runs correctly in 0.15

@andrey-khropov
Member

> No, it's 0.15. Also tried 0.14 - same result. Appears when I pass validation pool with categorical columns in pandas dataset.

Can you provide a new minimal failing example? Original example in #773 (comment) works without problems in 0.15.

@eccodolf

eccodolf commented May 29, 2019

Solved. eval_set contained labels that the model had never seen.
My y has roughly 1500 categories; removing the classes whose value count is 1 and stratifying the split by y solved the problem.
I suggest throwing a more detailed exception so that people stop reporting errors like this.
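
The diagnosis above can be checked before calling fit by comparing the two label sets. This is only an illustrative sketch: `unseen_labels` is a hypothetical helper, not a CatBoost function, and the toy labels are made up.

```python
def unseen_labels(train_labels, eval_labels):
    """Return the sorted labels that occur in eval_labels but not in train_labels."""
    return sorted(set(eval_labels) - set(train_labels))


# Toy data: class "57" appears only in the validation labels.
y_train = ["1", "2", "2", "3"]
y_valid = ["2", "57"]

missing = unseen_labels(y_train, y_valid)
if missing:
    print("Labels in eval_set that the model never saw:", missing)
```

An empty result means every label in the evaluation set also occurs in the training set. The sketch assumes all labels have the same type, so they can be sorted.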

@annaveronika
Contributor

Yes, we'll update the error, thanks for the suggestion!

@JunpeiTakubo

I came across this error at version 0.18.

@agcala

agcala commented Sep 15, 2020

I came across this error at version 0.24. I used the class_names parameter to prevent it happening again.

@tobianointing

> I came across this error at version 0.24. I used the class_names parameter to prevent it happening again.

Please, how did you do this?

@gitpickle

Hi, am I understanding this issue correctly ... target category labels are being encountered in y that are not found in X, correct?

@eccodolf states "cleaning valuecounts for y =1 and stratifying split by y solved this problem". Does this mean that he removed the rows that contained labels not found in train, and if possible would someone post an example of how to achieve this?

I don't fully understand what he is saying and would greatly appreciate a pointer in the right direction. THANKS. Mike

@gitpickle

ah. I think I am seeing what eccodolf is referring to. https://stackoverflow.com/questions/34842405/parameter-stratify-from-method-train-test-split-scikit-learn.

It looks like we can split train/test in such a manner that we make sure all target y labels are found in both sets with a similar %.

Am I on a correct path?

@gitpickle

the error I am experiencing is similar to the one in this post:

CatBoostError: c:/program files (x86)/go agent/pipelines/buildmaster/catboost.git/catboost/private/libs/target/target_converter.cpp:228: Unknown class label: "57"

@Evgueni-Petrov-aka-espetrov
Contributor

> ah. I think I am seeing what eccodolf is referring to. https://stackoverflow.com/questions/34842405/parameter-stratify-from-method-train-test-split-scikit-learn.
>
> It looks like we can split train/test in such a manner that we make sure all target y labels are found in both sets with a similar %.
>
> Am I on a correct path?

Highly likely.
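
A minimal sketch of that approach, assuming scikit-learn is installed. It follows one reading of eccodolf's comment (drop the classes that occur only once, then split with `stratify`); the toy data and variable names are made up for illustration.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Toy data: class "c" has a single row, so it cannot appear in both splits.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [0.9]]
y = ["a", "a", "a", "a", "b", "b", "b", "b", "c"]

# 1. Drop rows whose class occurs only once.
counts = Counter(y)
keep = [i for i, label in enumerate(y) if counts[label] > 1]
X_kept = [X[i] for i in keep]
y_kept = [y[i] for i in keep]

# 2. Split so that class proportions are preserved in both parts.
X_train, X_valid, y_train, y_valid = train_test_split(
    X_kept, y_kept, test_size=0.25, stratify=y_kept, random_state=0
)

# Every validation label was also seen in training.
assert set(y_valid) <= set(y_train)
```

Stratification preserves class proportions, but with very small classes it is not a strict guarantee that each class lands in both parts, so it is worth keeping the final check after the split.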

@raffieeey

One way to solve this problem is to define class_names. You can do this using:

from catboost import CatBoostClassifier
catb_model = CatBoostClassifier(iterations=1000, learning_rate=0.05, loss_function='MultiClass', class_names=["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11"])
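
Rather than typing the list by hand, the class names could be collected from the data. The sketch below is an illustration under assumptions: `collect_class_names` is a hypothetical helper, the toy labels are made up, and all labels are assumed to share one type so they can be sorted.

```python
def collect_class_names(*label_sets):
    """Return the sorted union of all labels across the given collections."""
    names = set()
    for labels in label_sets:
        names.update(labels)
    return sorted(names)


y_train = ["1", "2", "3"]
y_valid = ["2", "11"]  # "11" does not occur in y_train

class_names = collect_class_names(y_train, y_valid)
print(class_names)  # ['1', '11', '2', '3'] (strings sort lexicographically)
```

The resulting list can then be passed as `class_names=class_names` when constructing `CatBoostClassifier`, which commenters in this thread report avoids the error.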

@Evgueni-Petrov-aka-espetrov
Contributor

Many thanks @raffieeey !

@Ashebir07

CatBoostError: catboost/cuda/cuda_lib/cuda_manager.cpp:201: Condition violated: `State == nullptr'

How can I solve this error? Please help me.

@Ashebir07

from catboost import CatBoostClassifier

# folds, X (a pandas DataFrame) and y are defined earlier (not shown).
for i, (train_index, test_index) in enumerate(folds.split(X, y)):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Instantiate model
    model = CatBoostClassifier(n_estimators=20000, task_type='GPU')

    # Note: this second assignment replaces the model created above.
    model = CatBoostClassifier(max_depth=12, learning_rate=0.15, task_type='GPU',
                               grow_policy='Lossguide', n_estimators=1500)

    # Train model
    model.fit(X_train, y_train,
              eval_set=[(X_test, y_test)],
              early_stopping_rounds=200,
              verbose=1000,
              use_best_model=True)

Here is my code.
