Vowpal-Wabbit - Scikit-Learn adaptor raises TypeError #4595

Closed
tdezhdar opened this issue May 19, 2023 · 3 comments
Labels
Bug Bug in learning semantics, critical by default

Comments

@tdezhdar

Describe the bug

Running the following Python code:

from vowpalwabbit.sklearn import VWClassifier
import numpy as np
import pandas as pd

X = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [4, 1, 4, 2]}, dtype='uint32').astype('category')
y = pd.Series(np.zeros(4), dtype='int64')

VWClassifier(ftrl=True).fit(X, y)

gives a rather cryptic traceback:

Traceback (most recent call last):
  File "/xxx/scratch.py", line 8, in <module>
    VWClassifier(ftrl=True).fit(X, y)
  File "/xxx/python3.8/site-packages/vowpalwabbit/sklearn.py", line 568, in fit
    return VW.fit(self, X=X, y=y, sample_weight=sample_weight)
  File "/xxx/python3.8/site-packages/vowpalwabbit/sklearn.py", line 348, in fit
    X = tovw(
  File "/xxx/python3.8/site-packages/vowpalwabbit/sklearn.py", line 831, in tovw
    dump_svmlight_file(x, np.zeros(rows), s)
  File "/xxx/python3.8/site-packages/sklearn/datasets/_svmlight_format_io.py", line 525, in dump_svmlight_file
    _dump_svmlight(X, y, f, multilabel, one_based, comment, query_id)
  File "/xxx/python3.8/site-packages/sklearn/datasets/_svmlight_format_io.py", line 395, in _dump_svmlight
    _dump_svmlight_file(
  File "sklearn/datasets/_svmlight_format_fast.pyx", line 233, in sklearn.datasets._svmlight_format_fast._dump_svmlight_file
  File "sklearn/datasets/_svmlight_format_fast.pyx", line 131, in sklearn.datasets._svmlight_format_fast.__pyx_fused_cpdef
TypeError: Function call with ambiguous argument types

Note the unusual original dtype of X (uint32): it is the output of another feature generator. Converting X to int64 first and then to category mostly circumvents the problem; only a cleanup error remains.

For the moment, converting X to a signed integer type before making it categorical is a viable workaround, but that might not always be the case. Furthermore, this behaviour should probably be documented.
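
For readers who hit the same error, the workaround described above can be sketched as follows. This is only an illustration: `to_signed_categorical` is a hypothetical helper name, not part of vowpalwabbit or pandas, and the snippet only shows the dtype conversion. It does not call `VWClassifier`, and per the report a cleanup error may still remain afterwards.

```python
import pandas as pd

# Same data as in the report, before the categorical cast.
X = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [4, 1, 4, 2]}, dtype='uint32')


def to_signed_categorical(df):
    """Hypothetical helper: cast to a signed integer type, then to category."""
    return df.astype('int64').astype('category')


X_fixed = to_signed_categorical(X)

# Inspect the dtype backing the categories of each column.
for col in X_fixed.columns:
    print(col, X_fixed[col].cat.categories.dtype)
```

The cast to `int64` is safe here because every `uint32` value fits in a signed 64-bit integer; a `uint64` source could overflow and would need separate handling.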

How to reproduce

See above.

Version

vowpalwabbit==9.8.0

OS

macos

Language

python

Additional context

No response

@tdezhdar added the Bug label (Bug in learning semantics, critical by default) on May 19, 2023
@jackgerrits
Member

Thanks for opening this. I looked into it and we're using a function in sklearn that seems to not work for unsigned types. I've added a more descriptive type error (#4610) and opened an issue (#4609) to track a real fix for this.
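
Until a real fix lands, a user-side check along these lines can catch unsigned columns before they reach `fit`. This is a rough sketch under stated assumptions: `check_no_unsigned` is a hypothetical name, the code is not taken from #4610, and the assumption that unsigned dtypes (including unsigned categories) are the trigger is based only on this thread.

```python
import pandas as pd
from pandas.api.types import is_unsigned_integer_dtype


def check_no_unsigned(df):
    """Hypothetical pre-flight check; NOT the implementation from #4610."""
    bad = []
    for col in df.columns:
        dtype = df[col].dtype
        # For categorical columns, look at the dtype of the categories.
        if isinstance(dtype, pd.CategoricalDtype):
            dtype = dtype.categories.dtype
        if is_unsigned_integer_dtype(dtype):
            bad.append(col)
    if bad:
        raise TypeError(
            f"Columns {bad} use unsigned integer types; "
            "convert them to a signed type such as int64 first."
        )


# Usage with the frame from the report: the check raises before fit is called.
X = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [4, 1, 4, 2]}, dtype='uint32').astype('category')
try:
    check_no_unsigned(X)
except TypeError as err:
    print(err)
```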

@tdezhdar
Author

tdezhdar commented Jun 7, 2023

Thanks @jackgerrits, that more descriptive type error is already super useful.

@jackgerrits
Member

I'm going to go ahead and close this issue now that the better error is merged and the more complete fix will be tracked by #4609.
