Compute Confident Joint number of classes K #85
Comments
The idea is to treat the newly annotated sample at every iteration of a self-learning (or active-learning) loop as a noisy test set that we would like to clean. We can compute the out-of-sample probabilities for the test set by training a model on the initial labeled training set (assumed to be clean and to contain every class). The problem is that when I apply the function cl_label_errors = get_noise_indices( If not all classes are represented in the test set, this function raises an incompatible-dimensions error, because it computes the number of classes as len(unique(s)) on the test set rather than from the number of columns of psx (one per class), which would be the correct value.
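To make the mismatch concrete, here is a minimal sketch on a hypothetical toy batch (the values of s and psx are invented for illustration and it does not call cleanlab). It assumes three known classes, one of which never appears in the labels of the batch:

```python
# Hypothetical toy batch: the model knows 3 classes, but class 2
# never appears among the (noisy) labels of this batch.
s = [0, 1, 1, 0]
psx = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]

# Counts only the classes that happen to be present in this batch.
k_from_labels = len(set(s))

# One probability column per class the model was trained on.
k_from_psx = len(psx[0])

print(k_from_labels, k_from_psx)  # prints "2 3"
```

Any array sized from the first count will not line up with the three columns of psx, which matches the kind of dimension error described above.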
I encountered the same issue. Even when computing the calibration_join beforehand which allows setting
@jcklie @filippoBUO It seems that
Then, to overwrite the original
This is a quick fix, but an update directly on
Thanks @vtsouval!! I'll add this functionality shortly.
@jwmueller I have tested this in my implementation and it works fine. However, if you plan to use
Amazing, thank you for the details!
Hi,
The compute_confident_joint function accepts the number of classes K as an argument. But no matter what value of K I set, the library returns a confident joint whose dimension is K=len(np.unique(s)).
I want to use this library dynamically, for example inside each iteration of a self-learning or active-learning algorithm, to find the labels that have a high probability of being noisy. For that I should be able to set K to the number of classes I know exist in my classification problem, not the number of unique classes cleanlab happens to see. At any given iteration I might not see all the classes, and in that case the library seems impossible to use.
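For illustration, the requested behaviour can be sketched in plain Python. This is not cleanlab's implementation: the function name confident_joint_fixed_k and the default_threshold parameter are hypothetical, and the fallback threshold for classes with no labeled examples is an assumption made only so the sketch stays well defined. The point is that K comes from the caller (or from the width of psx), never from the labels present in the batch.

```python
def confident_joint_fixed_k(s, psx, K=None, default_threshold=1.0):
    """Simplified confident-joint counts with an explicit class count K.

    Hypothetical sketch, not cleanlab's code. If K is omitted it is taken
    from the number of columns of psx rather than from the labels in s.
    """
    if K is None:
        K = len(psx[0])

    # Per-class threshold: mean predicted probability of class k among
    # the examples labeled k. Classes absent from s fall back to
    # default_threshold (an assumption of this sketch).
    thresholds = []
    for k in range(K):
        probs = [row[k] for label, row in zip(s, psx) if label == k]
        thresholds.append(sum(probs) / len(probs) if probs else default_threshold)

    # joint[i][j] counts examples labeled i that are confidently class j.
    joint = [[0] * K for _ in range(K)]
    for label, row in zip(s, psx):
        confident = [j for j in range(K) if row[j] >= thresholds[j]]
        if confident:
            best = max(confident, key=lambda j: row[j])
            joint[label][best] += 1
    return joint


# Toy batch (invented values) in which class 2 is never used as a label.
s = [0, 1, 1, 0]
psx = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
joint = confident_joint_fixed_k(s, psx, K=3)
print(joint)  # prints [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
```

The result stays 3 x 3 even though only two classes appear in s; the row and column for the unseen class are simply zero.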