Few-step text generation from distilled diffusion language models.
Diffusion Language Models can generate high-quality text, but their iterative reverse-diffusion sampling makes inference slow. IDLM speeds them up by distilling a pretrained many-step diffusion language model into a few-step generator.
Instead of simply matching every teacher step, IDLM uses an Inverse Distillation view tailored to discrete token spaces. The paper reports 4×–64× fewer inference steps while preserving the teacher model's entropy and generative perplexity.
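As background, the speedup comes from running the reverse process on a much coarser time grid. A generic sketch (not the repo's sampler; denoiser and few_step_sample are hypothetical names):

import torch

@torch.no_grad()
def few_step_sample(denoiser, x_t, num_steps):
    # Each iteration is one forward pass of the denoiser, so wall-clock
    # inference cost scales linearly with num_steps.
    for t in torch.linspace(1.0, 0.0, steps=num_steps + 1)[:-1]:
        x_t = denoiser(x_t, t)  # predict a less-noisy token sequence
    return x_t

A 16-step student thus needs roughly 64× fewer forward passes than a 1024-step teacher.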
- Fast diffusion-language generation with 4, 8, 16, and 32 step sampling recipes.
- Inverse distillation for discrete tokens, including student and fake-model training logic.
- Hydra-powered experiments for easy configuration and reproducibility.
- PyTorch Lightning training loop with checkpointing, logging, and distributed training support.
- Ready-to-run scripts for MDLM, DUO, and DCD-style recipes.
- Evaluation utilities for NLL, BPD, perplexity, generative perplexity, and sample entropy (see the metric sketch after this list).
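A hedged sketch of two of these metrics, using standard definitions rather than the repo's metrics.py API: bits-per-dimension as NLL expressed in bits, and sample entropy as the empirical unigram entropy of generated token ids.

import math
from collections import Counter

def bpd_from_nll(nll_nats_per_token: float) -> float:
    # BPD is NLL measured in bits: divide nats by ln(2).
    return nll_nats_per_token / math.log(2)

def unigram_entropy(token_ids: list[int]) -> float:
    # Shannon entropy (in nats) of the empirical token distribution.
    counts = Counter(token_ids)
    total = len(token_ids)
    return -sum(c / total * math.log(c / total) for c in counts.values())

print(bpd_from_nll(3.2))             # 3.2 nats/token ≈ 4.62 BPD
print(unigram_entropy([1, 1, 2, 3]))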
IDLM/
├── configs/ # Hydra configs: data, model, algo, strategy, callbacks, etc.
│ ├── algo/ # ar, mdlm, duo, duo_base, d3pm, sedd
│ ├── data/ # OpenWebText configs
│ ├── model/ # tiny / small / medium model configs
│ ├── noise/ # diffusion noise schedules
│ └── config.yaml # main experiment config
├── integral/ # precomputed tokenizer / integration assets
├── models/ # DiT backbone, EMA utilities, attention tests
├── scripts/ # training and generation recipes
├── algo.py # model families and IDLM distillation logic
├── dataloader.py # tokenizers, datasets, dataloaders
├── main.py # Hydra + Lightning entry point
├── metrics.py # perplexity, entropy, BPD, NLL metrics
├── trainer_base.py # shared training / sampling base classes
├── utils.py # logging and helper utilities
├── requirements.txt # environment note / dependency list
└── LICENSE
git clone https://github.com/David-cripto/IDLM.git
cd IDLM

To get started, create a conda environment containing the required dependencies.
conda create -n idlm python=3.12
conda activate idlm
conda install nvidia/label/cuda-12.4.0::cuda-toolkit
pip install -r requirements.txt
pip install flash_attn==2.7.4.post1

Pretrained IDLM checkpoints:

- IDLM-MDLM. Trained on OpenWebText:
- IDLM-Duo. Trained on OpenWebText:
- IDLM-DCD. Trained on OpenWebText:
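The repo ids above look like Hugging Face Hub repositories (an assumption; check the project page for authoritative links). If so, they can be prefetched with huggingface_hub:

# Hedged sketch: cache an IDLM checkpoint locally before running evaluation.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("kekchpek/idlm-mdlm")  # or idlm-duo / idlm-dcd
print("checkpoint cached at:", local_dir)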
The repository includes three training recipes. Before running the scripts, set the cache_dir parameter in configs/data/openwebtext-split.yaml to the desired output path.
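For example, the relevant line in configs/data/openwebtext-split.yaml might look like this (illustrative value; the surrounding keys in the actual file may differ):

cache_dir: /path/to/owt_cache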
bash scripts/train_idlm_mdlm.sh
bash scripts/train_idlm_duo.sh
bash scripts/train_idlm_dcd.sh

These scripts use Hydra overrides for batch size, dataset, teacher checkpoint, algorithm, sampling steps, precision, logging name, and validation frequency. Use them as strong starting points, then tune the overrides for your compute budget.
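As an illustration only, a tweaked invocation inside one of the scripts might look like the following. The keys mirror those used by the generation commands below, and mode=train is a guess by analogy with mode=sample_eval, so check the actual scripts before relying on it.

python -m main \
mode=train \
data=openwebtext-split \
algo=mdlm \
algo.backbone=hf_dit \
loader.batch_size=32 \
sampling.steps=16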
The generation scripts sweep over 4, 8, 16, and 32 sampling steps.
Before running them, set eval.generated_samples_path to a real JSON output path.
mkdir -p samples
python -m main \
mode=sample_eval \
loader.batch_size=2 \
loader.eval_batch_size=8 \
data=openwebtext-split \
algo=mdlm \
algo.backbone=hf_dit \
eval.checkpoint_path=kekchpek/idlm-mdlm \
sampling.steps=16 \
sampling.num_sample_batches=10 \
sampling.predictor=ancestral_cache \
sampling.noise_removal=ancestral \
+wandb.offline=true \
eval.generated_samples_path=samples/idlm_mdlm_16steps.json

mkdir -p samples
python -m main \
mode=sample_eval \
loader.batch_size=2 \
loader.eval_batch_size=8 \
data=openwebtext-split \
algo=duo \
algo.backbone=hf_dit \
eval.checkpoint_path=kekchpek/idlm-duo \
sampling.steps=16 \
sampling.num_sample_batches=10 \
sampling.noise_removal=greedy \
+wandb.offline=true \
eval.generated_samples_path=samples/idlm_duo_16steps.json

mkdir -p samples
python -m main \
mode=sample_eval \
loader.batch_size=2 \
loader.eval_batch_size=8 \
data=openwebtext-split \
algo=duo \
algo.backbone=hf_dit \
eval.checkpoint_path=kekchpek/idlm-dcd \
sampling.steps=4 \
sampling.num_sample_batches=10 \
sampling.noise_removal=greedy \
+wandb.offline=true \
eval.generated_samples_path=samples/idlm_dcd_4steps.json

Equivalently, run the ready-made scripts:

bash scripts/generation_idlm_mdlm.sh
bash scripts/generation_idlm_duo.sh
bash scripts/generation_idlm_dcd.sh

Generated sample files contain:
{
"generative_ppl": 0.0,
"entropy": 0.0,
"generated_seqs": []
}
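For quick inspection, an output file can be loaded with plain json (the path below is the one produced by the MDLM command above):

import json

with open("samples/idlm_mdlm_16steps.json") as f:
    results = json.load(f)

# generative_ppl and entropy are scalars; generated_seqs is a list of texts.
print("generative PPL:", results["generative_ppl"])
print("entropy:", results["entropy"])
print("num sequences:", len(results["generated_seqs"]))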
Training scripts reference teacher checkpoints such as:
kuleshov-group/mdlm-owt
s-sahoo/duo
s-sahoo/duo-distilled
Generation scripts reference IDLM checkpoints such as:
kekchpek/idlm-mdlm
kekchpek/idlm-duo
kekchpek/idlm-dcd
Make sure your environment can access the required checkpoints before running training or generation.
By default, Hydra writes experiment outputs under:
outputs/<dataset>/<date>/<time>/
TensorBoard logs are written under:
tb_logs/
Checkpoints are written according to the checkpointing config in configs/config.yaml.
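To redirect outputs for a specific run, the standard Hydra override can be appended to any command (hydra.run.dir is a core Hydra setting, not specific to this repo):

python -m main hydra.run.dir=outputs/my_experiment  # plus the usual overrides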
If you find this repository useful, please cite:
@article{li2026idlm,
title={IDLM: Inverse-distilled Diffusion Language Models},
author={Li, David and Gushchin, Nikita and Abulkhanov, Dmitry and Moulines, Eric and Oseledets, Ivan and Panov, Maxim and Korotin, Alexander},
journal={arXiv preprint arXiv:2602.19066},
year={2026}
}

Our codebase is inspired by recent discrete diffusion model projects, namely MDLM and DUO.
This project is released under the MIT License. See LICENSE for details.
