raise error:ValueError: operands could not be broadcast together with shapes (20000,9140) (401,) #41
Comments
Does your s contain all of the labels 0, 1, 2, ..., 9138, 9139, and only those?
Based on the error, you only have 401 unique classes in s, but your psx has 9140 classes.
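The mismatch can be reproduced with NumPy alone. The sketch below is not cleanlab's code: it assumes, as the (401,) in the error suggests, that one threshold is computed per unique label found in s, and it uses small made-up sizes (50 classes, 7 of them present) in place of 9140 and 401.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_classes = 200, 50                  # stand-ins for 20000 and 9140
present = np.array([0, 3, 8, 15, 22, 31, 49])   # hypothetical classes that occur in s

# Labels drawn only from the classes listed above.
s = rng.choice(present, size=n_samples)
s[:len(present)] = present                      # make sure each listed class appears

# Random predicted probabilities over ALL classes, one row per sample.
psx = rng.random((n_samples, n_classes))
psx /= psx.sum(axis=1, keepdims=True)

# Assumption: one threshold per unique label in s, so the vector has 7 entries.
thresholds = np.array([psx[s == k, k].mean() for k in np.unique(s)])

# Classes that psx knows about but s never uses.
missing = np.setdiff1d(np.arange(psx.shape[1]), np.unique(s))
print(thresholds.shape, psx.shape, len(missing))

try:
    psx >= thresholds - 1e-6                    # same comparison as in the traceback
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)
```

Comparing np.unique(s) against np.arange(psx.shape[1]) before calling the library is a cheap way to spot this situation.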
Thank you, you are right. I found that the first 20000 samples contain only 401 classes.
Hmm, in my opinion cleanlab is not well suited to datasets with a very large number of classes, because each class needs a certain amount of data to estimate the joint probability distribution accurately. If the number of classes is large, there will also not be enough memory.
I am facing a similar issue. If I load only part of the labels at a time, that part might not contain all the unique labels in
Hi @sandeepnmenon, send me a minimal code example that reproduces this and I'll take a look.
Code to reproduce the error with random values.
It gives a similar error.
One workaround I can think of is to prune the dimensions of the
@cgnorthcutt Also, is the workaround valid?
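One way to read this workaround, assuming the dimension being pruned is the class axis of psx, is to keep only the columns for classes that occur in s, renormalize each row, and remap the labels to 0..K'-1. The helper name prune_to_present_classes is hypothetical and not part of cleanlab. Whether the pruned probabilities remain meaningful depends on how much probability mass sat on the dropped classes.

```python
import numpy as np

def prune_to_present_classes(s, psx):
    """Restrict psx to the classes that occur in s (hypothetical helper).

    Returns the remapped labels, the pruned and renormalized probabilities,
    and the sorted array of original class ids that were kept.
    """
    s = np.asarray(s)
    present = np.unique(s)                    # sorted original class ids
    s_compact = np.searchsorted(present, s)   # map each id to 0..K'-1

    psx_pruned = np.asarray(psx, dtype=float)[:, present]
    row_sums = psx_pruned.sum(axis=1, keepdims=True)
    # Renormalize rows; rows whose kept mass is zero are left as all zeros.
    psx_pruned = np.divide(
        psx_pruned, row_sums,
        out=np.zeros_like(psx_pruned),
        where=row_sums > 0,
    )
    return s_compact, psx_pruned, present
```

Row order is unchanged, so any sample indices computed on the pruned arrays refer to the same samples in the original data, and present[s_compact] recovers the original labels.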
I am facing the same issue as @sandeepnmenon
Hi folks, are you still facing this error?
Hi, this is a serious issue. Imagine I want to use cleanlab in a self-learning loop to estimate the noise a classifier introduces when annotating. At every iteration it is highly probable that the classifier's annotations do not cover all the classes already present in my initial labeled dataset. It should be possible to take K from the number of columns of psx, not from the unique values of s!
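The suggestion amounts to taking the number of classes from psx.shape[1] rather than from the unique values of s. A minimal sketch of per-class thresholds computed that way follows. It is an illustration, not cleanlab's implementation: the function name per_class_thresholds and the choice to give absent classes a threshold of np.inf (so that no sample is counted as confidently belonging to them) are assumptions.

```python
import numpy as np

def per_class_thresholds(s, psx, fill=np.inf):
    """Mean self-confidence per class, with K taken from psx (hypothetical helper)."""
    s = np.asarray(s)
    psx = np.asarray(psx, dtype=float)
    n_classes = psx.shape[1]                  # K comes from psx, not from s

    # Start every class at `fill`; classes absent from s keep that value.
    thresholds = np.full(n_classes, fill, dtype=float)
    for k in np.unique(s):
        # Average predicted probability of class k over samples labeled k.
        thresholds[k] = psx[s == k, k].mean()
    return thresholds
```

Because the result always has one entry per column of psx, the comparison psx >= thresholds - 1e-6 broadcasts cleanly, and looping over np.unique(s) avoids taking the mean of an empty slice for absent classes.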
Cleanlab now supports datasets with some classes missing. This support was added in: Feel free to reopen this issue if you still encounter any problems (using the latest developer version)!
Hello, I am using cleanlab to clean my dataset.
My dataset contains about 450000 samples across 9140 classes. When I ran the cleaning on just the first 20000 samples, I got an error:
(20000,)
(20000, 9140)
/opt/meituan/develop/lixianyang/miniconda3/lib/python3.7/site-packages/numpy/core/fromnumeric.py:3335: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/opt/meituan/develop/lixianyang/miniconda3/lib/python3.7/site-packages/numpy/core/_methods.py:161: RuntimeWarning: invalid value encountered in true_divide
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "clean_base_on_clean_lab.py", line 24, in
    sorted_index_method='normalized_margin', # Orders label errors
  File "/opt/meituan/develop/lixianyang/miniconda3/lib/python3.7/site-packages/cleanlab/pruning.py", line 342, in get_noise_indices
    multi_label=multi_label,
  File "/opt/meituan/develop/lixianyang/miniconda3/lib/python3.7/site-packages/cleanlab/latent_estimation.py", line 337, in compute_confident_joint
    psx_bool = (psx >= thresholds - 1e-6)
ValueError: operands could not be broadcast together with shapes (20000,9140) (401,)
My input s is a numpy.ndarray with shape (20000,)
and psx is a numpy.ndarray with shape (20000, 9140).
I don't understand why the shapes (20000,9140) and (401,) cannot be broadcast together.
Where does (401,) come from?