Exception: TypeError: a float is required #958

Closed
lrq3000 opened this issue Aug 16, 2019 · 5 comments

Comments

@lrq3000

lrq3000 commented Aug 16, 2019

catboost version: 0.16.4
Operating System: Windows 10 64-bit
CPU: Intel Core i7 8th Gen
Python 2.7
Problem:

When I try to fit a simple CatBoostClassifier on my DataFrame containing categorical variables, I get the very cryptic error below.

Here is what I tried in order to narrow down the issue:

  • ran a clean notebook from a tutorial (https://github.com/catboost/tutorials/blob/master/python_tutorial.ipynb) --> OK
  • fit CatBoostClassifier on my non-categorical variables only --> OK
  • encoded my categorical variables with a LabelEncoder, then fit CatBoostClassifier --> OK
  • used any single categorical variable without encoding, as in the Titanic tutorial above --> fails
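
The per-column isolation described in the list above could be automated roughly as follows. This is only a sketch: `find_failing_columns`, `fit_single_column`, and `fake_fit` are hypothetical names introduced for illustration, and the stand-in below merely simulates the error from this report instead of calling CatBoost.

```python
def find_failing_columns(column_names, fit_single_column):
    """Try each column on its own and collect the ones whose fit raises.

    fit_single_column is a hypothetical callable supplied by the caller:
    it should train a model on just that column and raise on failure.
    """
    failures = {}
    for name in column_names:
        try:
            fit_single_column(name)
        except Exception as exc:  # broad on purpose: we only record what failed
            failures[name] = "{}: {}".format(type(exc).__name__, exc)
    return failures


def fake_fit(name):
    # Stand-in for a real fit call; it simulates the error reported here.
    if name == "categoricalvariable":
        raise TypeError("a float is required")


print(find_failing_columns(["age", "categoricalvariable"], fake_fit))
```

In the setting of this report, `fit_single_column` would presumably wrap a call such as `model.fit(X_train.loc[:, [name]], y_train_enc, cat_features=[0])`.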

I can't understand what's happening here and have no idea how to debug this further. If anyone has any idea, please let me know; I can provide more info.

Here is the failing code (it's not a minimal example; I'm not sure how to reproduce this in isolation):

import catboost
model = catboost.CatBoostClassifier(custom_loss=['Accuracy'], random_seed=0, eval_metric='AUC', logging_level='Silent')
model.fit(X_train.loc[:, ['categoricalvariable']], y_train_enc,
    cat_features=[0],
    plot=True
)

And the error traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-100-7e62575dc820> in <module>()
----> 5     plot=True
      6 )

C:\Users\AI\Anaconda2\lib\site-packages\catboost\core.pyc in fit(self, X, y, cat_features, sample_weight, baseline, use_best_model, eval_set, verbose, logging_level, plot, column_description, verbose_eval, metric_period, silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, init_model)
   3455         self._fit(X, y, cat_features, None, sample_weight, None, None, None, None, baseline, use_best_model,
   3456                   eval_set, verbose, logging_level, plot, column_description, verbose_eval, metric_period,
-> 3457                   silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, init_model)
   3458         return self
   3459 

C:\Users\AI\Anaconda2\lib\site-packages\catboost\core.pyc in _fit(self, X, y, cat_features, pairs, sample_weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, use_best_model, eval_set, verbose, logging_level, plot, column_description, verbose_eval, metric_period, silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, init_model)
   1386             use_best_model, eval_set, verbose, logging_level, plot,
   1387             column_description, verbose_eval, metric_period, silent, early_stopping_rounds,
-> 1388             save_snapshot, snapshot_file, snapshot_interval, init_model
   1389         )
   1390         params = train_params["params"]

C:\Users\AI\Anaconda2\lib\site-packages\catboost\core.pyc in _prepare_train_params(self, X, y, cat_features, pairs, sample_weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, use_best_model, eval_set, verbose, logging_level, plot, column_description, verbose_eval, metric_period, silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, init_model)
   1281             del params['cat_features']
   1282 
-> 1283         train_pool = _build_train_pool(X, y, cat_features, pairs, sample_weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, column_description)
   1284         if train_pool.is_empty_:
   1285             raise CatBoostError("X is empty.")

C:\Users\AI\Anaconda2\lib\site-packages\catboost\core.pyc in _build_train_pool(X, y, cat_features, pairs, sample_weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, column_description)
    697             raise CatBoostError("y has not initialized in fit(): X is not catboost.Pool object, y must be not None in fit().")
    698         train_pool = Pool(X, y, cat_features=cat_features, pairs=pairs, weight=sample_weight, group_id=group_id,
--> 699                           group_weight=group_weight, subgroup_id=subgroup_id, pairs_weight=pairs_weight, baseline=baseline)
    700     return train_pool
    701 

C:\Users\AI\Anaconda2\lib\site-packages\catboost\core.pyc in __init__(self, data, label, cat_features, column_description, pairs, delimiter, has_header, weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, feature_names, thread_count)
    326                         )
    327 
--> 328                 self._init(data, label, cat_features, pairs, weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, feature_names)
    329         super(Pool, self).__init__()
    330 

C:\Users\AI\Anaconda2\lib\site-packages\catboost\core.pyc in _init(self, data, label, cat_features, pairs, weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, feature_names)
    678             baseline = np.reshape(baseline, (samples_count, -1))
    679             self._check_baseline_shape(baseline, samples_count)
--> 680         self._init_pool(data, label, cat_features, pairs, weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, feature_names)
    681 
    682 

_catboost.pyx in _catboost._PoolBase._init_pool()

_catboost.pyx in _catboost._PoolBase._init_pool()

_catboost.pyx in _catboost._PoolBase._init_features_order_layout_pool()

_catboost.pyx in _catboost._set_features_order_data_pd_data_frame()

_catboost.pyx in _catboost.get_cat_factor_bytes_representation()

_catboost.pyx in _catboost.get_id_object_bytes_string_representation()

TypeError: a float is required
@lrq3000
Author

lrq3000 commented Aug 17, 2019

After further debugging, and after updating my scripts and notebook to Python 3.7, I can confirm this bug happens only with Python 2, so it is specific to the Python version. On Python 3.7, CatBoost can handle categorical features without any prior encoding. For Python 2.7, there should at least be a warning that categorical features should be encoded, in case this bug cannot be fixed.

@lrq3000
Author

lrq3000 commented Aug 17, 2019

What is surprising is that the Titanic dataset provided with catboost works in Python 2.7 without any prior encoding. I also found that my columns had some None values. In Python 3.7 the error was more explicit, so I could track down the None values and convert them to strings. Maybe this is the culprit: string or object columns containing None values?
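
The check and conversion described above might look like the sketch below. It is plain Python, so it works on any iterable of column values. The helper names and the `"missing"` placeholder are assumptions made for illustration, and the thread does not confirm that None values are what triggers the Python 2.7 error.

```python
from collections import Counter


def summarize_value_types(values):
    """Count the Python types found in a column, e.g. to spot stray None values."""
    return Counter(type(v).__name__ for v in values)


def stringify_categorical(values, placeholder="missing"):
    """Return the values as strings, replacing None and float NaN with a placeholder.

    The placeholder text is an arbitrary choice for this sketch.
    """
    cleaned = []
    for v in values:
        is_nan = isinstance(v, float) and v != v  # NaN is the only float unequal to itself
        cleaned.append(placeholder if v is None or is_nan else str(v))
    return cleaned


column = ["a", None, "b"]
print(summarize_value_types(column))
print(stringify_categorical(column))
```

With pandas, the same idea can be written as `X[col] = X[col].fillna("missing").astype(str)` for each categorical column `col`.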

@annaveronika
Contributor

Can you please try with the latest version? We've made changes to this part of the code and the bug should be fixed.

@annaveronika
Contributor

I hope it has helped! Let us know if you still have problems, I'll reopen the issue!

@AnnaMosen

I keep getting the same issue. I made sure that I am using Python 3.7 and that the CatBoost package is up to date, and I still get the error: CatBoostError: Bad value for num_feature[non_default_doc_idx=0,feature_idx=11]="['5']": Cannot convert 'b"['5']"' to float
Does it make any difference that the value is a list? The data type is a string, but the string is a list of numbers (I don't know if that matters).
type(X['myfeature'][0])
str

I also made sure I'm passing the right index for the categorical feature:
categorical_features_indices = [11]

X.info()
Data columns (total 76 columns):
 #   Column   Non-Null Count   Dtype
...
11 myfeature 91607 non-null object
...
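
The message refers to `num_feature` and a failed float conversion. One possible reading (an assumption, not something this thread confirms) is that column 11 reached CatBoost as a numerical feature, for example because `categorical_features_indices` was not passed as `cat_features` to every `fit` call or `Pool` involved. A minimal sketch of a sanity check for that situation follows; the helper names are hypothetical and the code does not call CatBoost.

```python
def is_floatable(value):
    """Return True if float(value) would succeed."""
    try:
        float(value)
    except (TypeError, ValueError):
        return False
    return True


def undeclared_non_numeric_columns(columns, cat_feature_indices):
    """List (index, name) for columns that hold non-numeric values
    but are not declared in cat_feature_indices.

    columns: list of (name, values) pairs in column order.
    """
    declared = set(cat_feature_indices)
    flagged = []
    for idx, (name, values) in enumerate(columns):
        if idx in declared:
            continue
        if any(not is_floatable(v) for v in values):
            flagged.append((idx, name))
    return flagged


# Illustrative data only; "['5']" mirrors the value in the error message.
columns = [("age", [31.0, 42.5]), ("myfeature", ["['5']", "['7']"])]
print(undeclared_non_numeric_columns(columns, cat_feature_indices=[]))
print(undeclared_non_numeric_columns(columns, cat_feature_indices=[1]))
```

For a DataFrame `X`, the `columns` argument could be built as `[(c, X[c].tolist()) for c in X.columns]`. An empty result for the indices actually passed to CatBoost would suggest the cause lies elsewhere.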
