How can we get the indices of examples having class overlap, when using the find_overlapping_classes method? #839
Replies: 2 comments
-
Hi @rishabh706, thank you for the question! It seems you're referring to the Find Dataset-level Issues for Dataset Curation tutorial, so I'll use that as a reference. Instead of running the loop over all the datasets, I just ran the following for the Caltech256 dataset:
import numpy as np
import cleanlab.dataset
# Load class names, given labels, and predicted probabilities from an already-trained model
pred_probs, labels, class_names = _load_classes_predprobs_labels("caltech256")
df = cleanlab.dataset.find_overlapping_classes(labels, pred_probs, class_names=class_names)
Given that you have the DataFrame df, you can get the indices like this:
# Extract the row or rows of interest from the df
selected_rows = df.iloc[:1] # This gets the first row. Adjust the slice as needed for multiple rows.
# Extract class indices
class_index_A = selected_rows["Class Index A"].values
class_index_B = selected_rows["Class Index B"].values
# Find overlapping indices in labels for both classes
overlap_indices_A = np.where(labels[:, None] == class_index_A)[0]
overlap_indices_B = np.where(labels[:, None] == class_index_B)[0]
# Combine the indices to get all examples of the overlapping classes
overlap_indices = np.unique(np.concatenate((overlap_indices_A, overlap_indices_B)))
print(overlap_indices)
This will provide you with the indices of ALL examples that belong to the overlapping classes, not just the ones estimated to be overlapping.
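If you only want the examples that actually look confused between the two classes, one possible heuristic is to keep the examples labeled as one class of the pair whose most likely predicted class is the other one. The sketch below is an assumption on my part, not necessarily how cleanlab counts overlapping examples internally. The helper name indices_confused_between and the toy arrays are hypothetical and only there for illustration. It assumes each row of pred_probs lines up with the same position in labels.

```python
import numpy as np

def indices_confused_between(labels, pred_probs, class_a, class_b):
    """Hypothetical helper: indices of examples labeled class_a or class_b
    whose most likely predicted class is the other class of the pair."""
    labels = np.asarray(labels)
    # Most likely class for each example according to the model
    predicted = np.argmax(pred_probs, axis=1)
    a_predicted_as_b = (labels == class_a) & (predicted == class_b)
    b_predicted_as_a = (labels == class_b) & (predicted == class_a)
    return np.flatnonzero(a_predicted_as_b | b_predicted_as_a)

# Toy data (made up for illustration): 5 examples, 3 classes
labels = np.array([0, 0, 1, 1, 2])
pred_probs = np.array([
    [0.9, 0.1, 0.0],  # labeled 0, predicted 0
    [0.2, 0.7, 0.1],  # labeled 0, predicted 1 (confused)
    [0.6, 0.3, 0.1],  # labeled 1, predicted 0 (confused)
    [0.1, 0.8, 0.1],  # labeled 1, predicted 1
    [0.1, 0.1, 0.8],  # labeled 2, not part of the pair
])

confused = indices_confused_between(labels, pred_probs, class_a=0, class_b=1)
print(confused)
```

With the real data you would pass the values from "Class Index A" and "Class Index B" of the row you care about as class_a and class_b.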
-
Hi @elisno,