errors found by cleanlab are mostly correct actually. #25

PromptExpert · 2020-04-11T09:05:42Z

I used the method in tutorial:

ordered_label_errors = get_noise_indices(
    s=numpy_array_of_noisy_labels,
    psx=numpy_array_of_predicted_probabilities,
    sorted_index_method='normalized_margin',  # Orders label errors
)

However, the examples returned as label errors actually have correct labels. What can I do to figure out why?

cgnorthcutt · 2020-04-11T12:02:44Z

Hi @NLPpupil. Can you please share (1) examples of your psx and the matching s, (2) how you compute psx, and (3) a minimal working example of your code?

PromptExpert · 2020-04-11T15:50:39Z

@cgnorthcutt Thank you. I don't want to trouble you with the examples and code just yet, so I will double-check on my side first.

PromptExpert · 2020-04-13T01:40:25Z

Hi @cgnorthcutt, could you please explain in detail how to use cleanlab.models.fasttext.py to find label errors? I have a training file in fastText format and I want to find the label errors in it. Thank you very much.

cgnorthcutt · 2020-04-13T04:08:09Z

Hi @NLPpupil. Create an instance of the object, model = FastTextClassifier. Then use the same approach as for any other model: https://github.com/cgnorthcutt/cleanlab#learning-with-noisy-labels-in-3-lines-of-code

PromptExpert · 2020-04-13T08:39:00Z

I tried, but the trained model is just like a normal model trained with the fasttext command line. Below is my code:
model = cleanft.FastTextClassifier(
    train_data_fn='train.txt',
    test_data_fn='test.txt',
    kwargs_train_supervised={'dim': 200, 'epoch': 10, 'minCount': 5, 'wordNgrams': 3},
)
model.fit()
predicted_test_labels = model.predict(train_data=False)

cgnorthcutt · 2020-04-13T15:37:41Z

Please provide the full error stack trace. Also, cleanlab does not have a cleanft.

PromptExpert · 2020-04-14T02:45:19Z

I figured out the reason.

The reason why "errors found by cleanlab are mostly correct" is that my data is almost clean!

If I randomly replace 10% of the labels with incorrect labels and check the output of ordered_label_errors = get_noise_indices(), I find that 97% of the top 100 instances really are noisy, while only 9% of the last 100 instances are noisy!
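For anyone who wants to repeat this sanity check, here is a minimal sketch of the two bookkeeping steps: injecting synthetic label noise and measuring how much of it lands at each end of the ranking. The helper names `flip_labels` and `noise_rate_at_ends` are hypothetical and not part of cleanlab. The sketch assumes integer class labels in the range 0..num_classes-1 and at least two classes.

```python
import numpy as np


def flip_labels(labels, num_classes, fraction, rng):
    """Return a noisy copy of `labels` and a boolean mask of flipped positions.

    Hypothetical helper (not a cleanlab function). Assumes integer labels
    in 0..num_classes-1 and num_classes >= 2.
    """
    labels = np.asarray(labels)
    n = len(labels)
    n_flip = int(round(fraction * n))
    # Pick distinct positions to corrupt.
    flip_idx = rng.choice(n, size=n_flip, replace=False)
    # An offset in 1..num_classes-1 (mod num_classes) always yields a different class.
    offsets = rng.integers(1, num_classes, size=n_flip)
    noisy = labels.copy()
    noisy[flip_idx] = (labels[flip_idx] + offsets) % num_classes
    is_flipped = np.zeros(n, dtype=bool)
    is_flipped[flip_idx] = True
    return noisy, is_flipped


def noise_rate_at_ends(ordered_indices, is_flipped, k=100):
    """Fraction of truly flipped labels among the first k and the last k indices."""
    ordered_indices = np.asarray(ordered_indices)
    top = is_flipped[ordered_indices[:k]].mean()
    bottom = is_flipped[ordered_indices[-k:]].mean()
    return float(top), float(bottom)
```

To use it, pass the noisy labels as `s` to `get_noise_indices`, with `psx` recomputed from a model trained on those noisy labels, then call `noise_rate_at_ends(ordered_label_errors, is_flipped)`. This assumes `ordered_label_errors` contains at least 100 indices. A high rate at the top and a low rate at the bottom suggests the ranking is behaving as intended.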

Thank you for your excellent work.

cgnorthcutt closed this as completed Apr 13, 2020

cgnorthcutt mentioned this issue Apr 9, 2021

Strange behavior for get_noise_indices #60

Closed
