How to benchmark with x% of label noise #977

Open
gordon-lim opened this issue Feb 4, 2024 · 0 comments
Labels
question A question for Cleanlab maintainers

Comments

@gordon-lim
Contributor

As I understand it, cleanlab.benchmarking.noise_generation.generate_noise_matrix_from_trace is used to generate a set percentage of incorrect labels. In the code below, I tried a noise amount of 0.2 (20% label noise), but only 776 of the 20000 labels were changed (776/20000 = 3.88%), which is far below the intended 20%. Could you clarify what "noise amount" means when it is expressed through the trace? I would also like to ask whether there is a way to generate an exact percentage of incorrect labels, e.g. 20% of 20000 = 4000.
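As a point of reference for what the trace controls: assuming cleanlab's convention that entry T_ij of the noise matrix is P(noisy label = i | true label = j), so that each column sums to 1, the expected fraction of changed labels depends on both the diagonal and the class prior p(y). It reduces to 1 - trace/K only when the prior is uniform, and it is an expectation, not a guaranteed count.

```latex
\begin{aligned}
\Pr[\tilde{y} \neq y] &= \sum_{j=1}^{K} p(y=j)\,\bigl(1 - T_{jj}\bigr) = 1 - \sum_{j=1}^{K} p(y=j)\,T_{jj} \\
&= 1 - \frac{\operatorname{tr}(T)}{K} \quad \text{if } p(y=j) = \tfrac{1}{K} \text{ for all } j \\
&= 1 - \frac{96}{120} = 0.2 \quad \text{for } K = 120,\ \operatorname{tr}(T) = 96.
\end{aligned}
```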

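import random

from cleanlab.benchmarking.noise_generation import (
    generate_noise_matrix_from_trace,
    generate_noisy_labels,
)

random.seed(100)
# 20000 true labels drawn uniformly from K = 120 classes
random_numbers = [random.randint(0, 119) for _ in range(20000)]
# noise amount 0.2 -> trace = K * (1 - 0.2) = 96
trace = 120 * (1 - 0.2)
noisy_matrix = generate_noise_matrix_from_trace(K=120, trace=trace, valid_noise_matrix=False, seed=100)
noisy_numbers = generate_noisy_labels(random_numbers, noisy_matrix)
# count how many noisy labels differ from the true labels
print(sum(noisy_numbers != random_numbers))  # prints 776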
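Two small diagnostics may help frame the question. This is a sketch only: expected_noise_rate and flip_exact_fraction are hypothetical helpers, not part of cleanlab. The first assumes cleanlab's convention that noise_matrix[i][j] = P(noisy label = i | true label = j); comparing its result with the observed 776/20000 would indicate whether the matrix itself implies about 20% noise or whether the gap arises when labels are sampled. The second is a different technique from cleanlab's noise-matrix approach: it corrupts exactly round(fraction * n) labels, each to a uniformly random different class (symmetric noise).

```python
import numpy as np


def expected_noise_rate(noise_matrix, py):
    """Expected fraction of changed labels implied by a noise matrix.

    Assumes noise_matrix[i][j] = P(noisy = i | true = j), so the diagonal
    holds P(noisy = j | true = j) and py[j] is the prior P(true = j).
    """
    noise_matrix = np.asarray(noise_matrix, dtype=float)
    py = np.asarray(py, dtype=float)
    return 1.0 - float(np.sum(np.diag(noise_matrix) * py))


def flip_exact_fraction(true_labels, num_classes, noise_fraction, seed=None):
    """Hypothetical helper (not a cleanlab API): flip exactly
    round(noise_fraction * n) labels, each to a different, uniformly chosen class."""
    rng = np.random.default_rng(seed)
    true_labels = np.asarray(true_labels)
    noisy_labels = true_labels.copy()
    n = len(true_labels)
    n_flip = int(round(noise_fraction * n))
    # Choose which examples to corrupt, without replacement.
    idx = rng.choice(n, size=n_flip, replace=False)
    # Offsets in [1, num_classes - 1] guarantee the new label differs from the old one.
    offsets = rng.integers(1, num_classes, size=n_flip)
    noisy_labels[idx] = (true_labels[idx] + offsets) % num_classes
    return noisy_labels


labels = np.random.default_rng(100).integers(0, 120, size=20000)
noisy = flip_exact_fraction(labels, 120, 0.2, seed=100)
print(int(np.sum(noisy != labels)))  # 4000
```

For the code above, the prior could be estimated as np.bincount(random_numbers, minlength=120) / 20000 and passed to expected_noise_rate together with noisy_matrix.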
gordon-lim added the question (A question for Cleanlab maintainers) label on Feb 4, 2024