In the “Synthetic Data Quality” part, do we need the same amount of real data and generated data #1093

Closed
Lucy-IM opened this issue Apr 11, 2024 · 1 comment
Labels
question A question for Cleanlab maintainers

Comments

Lucy-IM commented Apr 11, 2024

No description provided.

Lucy-IM added the question label Apr 11, 2024
@jwmueller
Member

Hi @Lucy-IM, I presume you're referring to this Cleanlab Studio tutorial?
https://help.cleanlab.ai/tutorials/synthetic_data/

or the associated blogposts?
https://cleanlab.ai/blog/synthetic-image-with-stable-diffusion/
https://cleanlab.ai/blog/studio-synthetic-data/

Either way, the answer is no: you don't need the same amount of real data and generated data. Most of our customers have much more synthetic data than real data.
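As a rough illustration of what unequal amounts look like in practice, here is a minimal sketch that tags rows by origin and combines a small real set with a larger synthetic one. It is not taken from the tutorial: the helper `combine_real_and_synthetic`, the `text` field, and the `real_or_synthetic` column name are assumptions made for this example, so check the tutorial for the exact format it expects.

```python
import csv
import io


def combine_real_and_synthetic(real_rows, synthetic_rows,
                               source_column="real_or_synthetic"):
    """Tag every row with its origin and return one combined list.

    Hypothetical helper for illustration; the two inputs may have
    different lengths.
    """
    combined = []
    for row in real_rows:
        combined.append({**row, source_column: "real"})
    for row in synthetic_rows:
        combined.append({**row, source_column: "synthetic"})
    return combined


# Deliberately unequal sizes: 3 real rows, 10 synthetic rows.
real_rows = [{"text": f"real example {i}"} for i in range(3)]
synthetic_rows = [{"text": f"generated example {i}"} for i in range(10)]

combined = combine_real_and_synthetic(real_rows, synthetic_rows)

# Serialize the combined rows to CSV text (in memory here).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["text", "real_or_synthetic"])
writer.writeheader()
writer.writerows(combined)
csv_text = buffer.getvalue()
```

Nothing in this sketch requires the two lists to be the same length; the origin column is what distinguishes the rows.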
