
Problem with detectors and model when using non-numeric class labels #1

Open
slimebob1975 opened this issue Sep 23, 2022 · 0 comments

@slimebob1975

Thank you for the excellent ML package scikit-clean!

We are using it primarily to look for illegal payments from the welfare system in our country. Our training labels may be incorrect, since not all illegal payments have been discovered.

However, we have run into some issues with the detectors MCS and PartitioningDetector and with the RobustCentroid model that we have not yet been able to solve. The following errors occur when we use string labels, e.g., in the well-known iris classification problem. We suspect the cause is that our class labels are strings rather than integers or floats.
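If string labels are indeed the cause, one possible workaround is to encode them as integers 0..K-1 with scikit-learn's `LabelEncoder` before calling the detectors or models, and to map predictions back afterwards. This is a minimal sketch, assuming the detectors and models only need integer labels; the skclean call itself is left as a comment because its exact signature is not verified here.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# String labels as they appear in the iris data
y = np.array(["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"])

# Map each distinct label to an integer code 0..K-1 (labels sorted alphabetically)
le = LabelEncoder()
y_enc = le.fit_transform(y)

# Pass y_enc (not y) to the detectors/models at this point
# (hypothetical usage; skclean signatures not verified)

# Map integer labels or predictions back to the original strings
y_back = le.inverse_transform(y_enc)
```

Keeping the fitted encoder around means any integer predictions returned later can be translated back with the same `inverse_transform` call.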

PartitioningDetector:
File "c:\scikit-clean\skclean\detectors\ensemble.py", line 64, in detect
preds[:, i] = clfs[i].predict(X)
ValueError: could not convert string to float: 'Iris-setosa'
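The `ValueError` is consistent with `preds` being a floating-point NumPy array: assigning the string predictions of a classifier trained on string labels makes NumPy try to convert each string to a float. A minimal reproduction, assuming `preds` is allocated as a float array with `np.zeros` (how `detect` actually allocates it has not been checked):

```python
import numpy as np

n_samples, n_partitions = 3, 2
# Assumed float buffer, analogous to `preds` in PartitioningDetector.detect
preds = np.zeros((n_samples, n_partitions))

# Predictions from a classifier fitted on string labels
string_preds = np.array(["Iris-setosa", "Iris-versicolor", "Iris-setosa"])
try:
    preds[:, 0] = string_preds  # NumPy attempts float("Iris-setosa") and fails
    failed = False
except ValueError:
    failed = True

# With integer-encoded labels the same assignment succeeds
preds[:, 0] = np.array([0, 1, 0])
```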

MCS:
File "c:\scikit-clean\skclean\detectors\ensemble.py", line 118, in detect
pc = probs[range(len(y)), y] # (N,), Prob assigned to correct class
IndexError: arrays used as indices must be of integer (or boolean) type
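The failing line uses NumPy fancy indexing: `probs[range(len(y)), y]` picks, for each row, the column whose index is that sample's label. This only works when `y` holds integer column positions matching the column order of `probs`. A sketch, assuming the probability columns follow a sorted `classes_` array (as scikit-learn classifiers typically expose), that converts string labels into those positions with `np.searchsorted`:

```python
import numpy as np

y = np.array(["Iris-setosa", "Iris-virginica", "Iris-versicolor"])
# Assumed column order of probs, e.g. a fitted classifier's classes_ (sorted)
classes_ = np.array(["Iris-setosa", "Iris-versicolor", "Iris-virginica"])
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.3, 0.5],
                  [0.1, 0.7, 0.2]])

try:
    probs[range(len(y)), y]  # string arrays are not valid indices
    failed = False
except IndexError:
    failed = True

# Map each label to its column position; valid because classes_ is sorted
y_idx = np.searchsorted(classes_, y)
pc = probs[np.arange(len(y)), y_idx]  # probability assigned to the observed class
```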

(Robust) Centroid:
File "c:\scikit-clean\skclean\models\svm.py", line 23, in predict
dist[:, c] = pairwise_kernels(X, self.data_[c], metric=self.kernel).mean(axis=1)
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
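Here the class label `c` is used directly as a column index into `dist` (and possibly into `self.data_`), which fails when `c` is a string. The hypothetical class below sketches the general fix: index everything by class position and use the labels only when mapping the final decision back through `classes_`. Scoring each class by its mean kernel similarity and taking the highest score is an assumption based on the traceback line, not a copy of skclean's model.

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_kernels


class KernelCentroidSketch:
    """Hypothetical kernel-centroid classifier that accepts any label type."""

    def __init__(self, kernel="rbf"):
        self.kernel = kernel

    def fit(self, X, y):
        # classes_ keeps the original labels; y_idx holds positions 0..K-1
        self.classes_, y_idx = np.unique(y, return_inverse=True)
        self.data_ = [X[y_idx == i] for i in range(len(self.classes_))]
        return self

    def predict(self, X):
        scores = np.zeros((X.shape[0], len(self.classes_)))
        for i, Xc in enumerate(self.data_):
            # Column i = mean kernel similarity to the training points of class i
            scores[:, i] = pairwise_kernels(X, Xc, metric=self.kernel).mean(axis=1)
        # Convert winning positions back to the original (string) labels
        return self.classes_[np.argmax(scores, axis=1)]


X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
y = np.array(["Iris-setosa", "Iris-setosa", "Iris-virginica", "Iris-virginica"])
pred = KernelCentroidSketch().fit(X, y).predict(np.array([[0.0, 0.05], [5.0, 5.05]]))
```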
