How to benchmark with x% of label noise #977

gordon-lim · 2024-02-04T18:08:04Z

As I understand it, cleanlab.benchmarking.noise_generation.generate_noise_matrix_from_trace is used to produce a set percentage of incorrect labels. In the code below, I used a noise amount of 0.2 (20% label noise), but only 776 of the 20000 labels were changed (776/20000 = 3.88%), far short of the intended 20%. Could you clarify what "noise amount" means in relation to the trace? Also, is there a way to generate an exact percentage of incorrect labels, e.g. 20% of 20000 = 4000?

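import random

import numpy as np
from cleanlab.benchmarking.noise_generation import (
    generate_noise_matrix_from_trace,
    generate_noisy_labels,
)

random.seed(100)
# 20000 true labels drawn uniformly from 120 classes (0..119)
random_numbers = np.array([random.randint(0, 119) for _ in range(20000)])
# Intended noise amount of 0.2, i.e. an average diagonal entry of 0.8 over K=120 classes
trace = 120 * (1 - 0.2)
noisy_matrix = generate_noise_matrix_from_trace(K=120, trace=trace, valid_noise_matrix=False, seed=100)
noisy_numbers = generate_noisy_labels(random_numbers, noisy_matrix)
print(sum(noisy_numbers != random_numbers))  # prints 776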
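
To make the second question concrete, here is a minimal sketch of what "exactly 20% incorrect labels" could look like when the labels are flipped directly instead of through a noise matrix. It uses plain NumPy only and is not part of cleanlab; `flip_labels_exact` and its parameters are hypothetical names chosen for illustration. It also assumes uniform, class-independent noise, which differs from the class-conditional noise a noise matrix describes.

```python
import numpy as np


def flip_labels_exact(labels, num_classes, noise_frac, seed=0):
    """Hypothetical helper: return a copy of `labels` in which exactly
    round(noise_frac * len(labels)) entries are replaced by a different class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    n_flip = int(round(noise_frac * len(labels)))
    noisy = labels.copy()
    # Choose the positions to corrupt, without replacement.
    flip_idx = rng.choice(len(labels), size=n_flip, replace=False)
    # An offset in [1, num_classes - 1], applied modulo num_classes,
    # always lands on a class different from the original one.
    offsets = rng.integers(1, num_classes, size=n_flip)
    noisy[flip_idx] = (labels[flip_idx] + offsets) % num_classes
    return noisy


rng = np.random.default_rng(100)
true_labels = rng.integers(0, 120, size=20000)  # 120 classes, 20000 examples
noisy_labels = flip_labels_exact(true_labels, num_classes=120, noise_frac=0.2)
num_errors = int((noisy_labels != true_labels).sum())
print(num_errors)  # 4000
```

Because the flipped positions are sampled without replacement and every replacement differs from the original class, the number of label errors equals the requested count exactly rather than only in expectation.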

gordon-lim added the "question" label (A question for Cleanlab maintainers) on Feb 4, 2024