Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
A framework for prompt tuning using Intent-based Prompt Calibration
Synthetic data curation for post-training and structured data extraction
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
Perception toolkit for sim2real training and validation in Unity
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data generation pipeline!
Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips
NVIDIA Deep learning Dataset Synthesizer (NDDS)
A curated list of awesome projects which use Machine Learning to generate synthetic content.
Compose multimodal datasets 🎹
Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
Generate large synthetic data using an LLM
SynthDet - An end-to-end object detection pipeline using synthetic data
[NeurIPS D&B Track 2024] Official implementation of HumanVid
A novel approach for synthesizing tabular data using pretrained large language models
Unity's privacy-preserving human-centric synthetic data generator
Random dataframe and database table generator
[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
Awesome synthetic (text) datasets