What format of s and psx should be inputted into get_noise_indices() in a multi-label scenario #23

Closed
KongMingxi opened this issue Mar 8, 2020 · 6 comments

Comments

@KongMingxi

I have 20 multi-label samples over 5 classes, for example:
[[2, 3, 4], [1, 3, 4, 5], [1, 3, 4], [1, 2, 3, 4, 5], [2, 3, 5], [1, 2, 4], [1, 3, 4, 5], [1, 3], [1, 5], [1, 3, 4, 5], [2, 3, 4], [3, 4], [4], [1, 3, 4], [2, 3, 4, 5], [1, 4], [3, 4], [3, 5], [2, 3, 5], [2, 5]]
I passed this label list as s and a probability matrix as psx (shape=(20,5)) to get_noise_indices().
However, I get the following error:
File "C:\Users\Anaconda2\envs\tf18\lib\site-packages\cleanlab\pruning.py", line 342, in get_noise_indices
multi_label=multi_label,
File "C:\Users\Anaconda2\envs\tf18\lib\site-packages\cleanlab\latent_estimation.py", line 303, in compute_confident_joint
calibrate=calibrate,
File "C:\Users\Anaconda2\envs\tf18\lib\site-packages\cleanlab\latent_estimation.py", line 216, in _compute_confident_joint_multi_label
multi_label=True,
File "C:\Users\Anaconda2\envs\tf18\lib\site-packages\cleanlab\latent_estimation.py", line 121, in calibrate_confident_joint
confident_joint.T / confident_joint.sum(axis=1) * s_counts
ValueError: operands could not be broadcast together with shapes (5,5) (6,)

Is there anything wrong with my inputs?
What format of s and psx is correct in this multi-label scenario?

@cgnorthcutt
Member

Hi @KongMingxi. Easy fix. The labels need to start at 0. So, use labels 0, 1, 2, 3, 4 instead of 1, 2, 3, 4, 5.
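
A minimal sketch of that re-indexing, assuming the labels are stored as a plain Python list of lists as in the original post. The helper name `shift_to_zero_based` is hypothetical and not part of cleanlab.

```python
def shift_to_zero_based(labels):
    """Subtract 1 from every label so classes 1..K become 0..K-1."""
    return [[k - 1 for k in sample] for sample in labels]


# First four samples from the original post (classes numbered 1..5).
s_one_based = [[2, 3, 4], [1, 3, 4, 5], [1, 3, 4], [1, 2, 3, 4, 5]]
s_zero_based = shift_to_zero_based(s_one_based)

# psx has 5 columns, so every label must be a valid column index 0..4.
num_classes = 5
assert all(0 <= k < num_classes for sample in s_zero_based for k in sample)

print(s_zero_based)
```

The (6,) shape in the traceback is consistent with label 5 being read as a sixth class index while psx has only 5 columns.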

@KongMingxi
Author

KongMingxi commented Mar 8, 2020

Hi @cgnorthcutt
Many thanks for your reply. Besides the above format, can one-hot labels be passed in directly?
For example, a numpy array such as:
array([[0, 0, 1, 0, 0],
[1, 1, 1, 0, 0],
[1, 1, 1, 1, 0],
[0, 1, 0, 0, 1]])

@cgnorthcutt
Member

cgnorthcutt commented Mar 26, 2020

Hi @KongMingxi

You can use cleanlab.util.onehot2int to convert your one-hot labels to the appropriate format before using cleanlab. The implementation is here: https://github.com/cgnorthcutt/cleanlab/blob/master/cleanlab/util.py#L264

For example,

from cleanlab.util import onehot2int
correctly_formatted_labels = onehot2int(onehot_matrix)
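
To illustrate the target format only (one list of zero-based class indices per sample), here is a small pure-Python sketch. It is independent of cleanlab's `onehot2int`, and the helper name `multihot_to_index_lists` is hypothetical. The matrix is the one from the question above.

```python
def multihot_to_index_lists(matrix):
    """For each row, collect the column indices whose entry is nonzero."""
    return [[j for j, flag in enumerate(row) if flag] for row in matrix]


# The example matrix from the question: 4 samples, 5 classes.
onehot_matrix = [
    [0, 0, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1],
]

labels = multihot_to_index_lists(onehot_matrix)
print(labels)
```

Because the indices come from column positions, the resulting labels already start at 0.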

@nuaayuyang

image
When I use it for multi-label data, I get this problem. How can I solve it?

@cgnorthcutt
Member

cgnorthcutt commented Jul 9, 2020 via email

@nuaayuyang

Hi @cgnorthcutt
image
Then I got this output:

image
The value printed above the image is s; my psx is an n x m matrix.
