Official code for the paper "One-step Language Modeling via Continuous Denoising"
Chanhyuk Lee1, Jaehoon Yoo1, Manan Agarwal2, Sheel Shah2, Jerry Huang2, Aditi Raghunathan2, Seunghoon Hong1, Nicholas M. Boffi†2, Jinwoo Kim†1
1KAIST 2Carnegie Mellon University †Equal advising
We introduce the Flow-based Language Model (FLM) and its flow-map-distilled variant, the Flow-map Language Model (FMLM), which enable one-step parallel text generation through continuous denoising.
FLM brings the benefits of continuous-space generation, as in image generation, to discrete state spaces by encoding text as one-hot vectors and using flow matching to map noise directly to one-hot data. Unlike discrete diffusion, FLM gradually denoises all tokens in parallel, allowing it to represent a superposition of sequences while capturing correlations between tokens, a fundamental bottleneck for discrete diffusion in the few-step regime.
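To make the mechanism concrete, below is a minimal, self-contained sketch of flow matching over one-hot token vectors with a velocity-prediction parameterization. The `model(x_t, t)` interface and function names are placeholders for illustration, not the classes used in this codebase:

```python
# Illustrative sketch of flow matching on one-hot token vectors (not the repo's API).
# A linear interpolant connects Gaussian noise to one-hot data; the model regresses
# the interpolant's velocity, and all token positions are denoised in parallel.
import torch
import torch.nn.functional as F

def flow_matching_loss(model, tokens, vocab_size):
    # tokens: (batch, seq_len) integer ids
    x1 = F.one_hot(tokens, vocab_size).float()   # data: one-hot vectors
    x0 = torch.randn_like(x1)                    # noise sample
    t = torch.rand(tokens.shape[0], 1, 1)        # per-sequence time in [0, 1]
    xt = (1 - t) * x0 + t * x1                   # linear interpolant
    v_target = x1 - x0                           # velocity of the interpolant
    v_pred = model(xt, t.view(-1))               # model predicts the velocity
    return F.mse_loss(v_pred, v_target)

@torch.no_grad()
def one_step_sample(model, batch, seq_len, vocab_size):
    # A single Euler step from t=0 to t=1 maps noise directly to (approximate)
    # one-hot data; flow-map distillation (FMLM) is what makes one step accurate.
    x0 = torch.randn(batch, seq_len, vocab_size)
    t0 = torch.zeros(batch)
    x1_hat = x0 + model(x0, t0)                  # Euler step with dt = 1
    return x1_hat.argmax(dim=-1)                 # decode tokens via argmax
```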
Installation

```bash
pip install "torch>=2.3.0"
pip install -r requirements.txt
# Install flash-attn separately, matching your Python / torch version
# (see https://github.com/Dao-AILab/flash-attention/releases)
pip install flash-attn==2.8.3 --no-build-isolation
```

Our DiT backbone supports torch.compile with max-autotune for faster training. Enable it by setting the environment variable before running any script:

```bash
export DIT_USE_COMPILE=TRUE
```

With this option enabled, we can train the OpenWebText experiments with a batch size of 512 on 8 H100 GPUs (80 GB VRAM) without gradient accumulation.
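For reference, the flag amounts to conditionally compiling the backbone, roughly as sketched below (illustrative only; the maybe_compile helper is hypothetical and not part of the repo):

```python
import os
import torch

def maybe_compile(model: torch.nn.Module) -> torch.nn.Module:
    # Sketch of what DIT_USE_COMPILE enables: compile the DiT backbone with
    # max-autotune when the environment variable is set (illustrative only).
    if os.environ.get("DIT_USE_COMPILE", "FALSE") == "TRUE":
        return torch.compile(model, mode="max-autotune")
    return model
```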
Before running, update data.cache_dir in the scripts to point to your dataset location. If the directory is empty, the dataset will be automatically downloaded and preprocessed.
FLM Training (1M steps)
| Dataset | Script |
|---|---|
| LM1B | scripts/train_lm1b_flm.sh |
| OpenWebText | scripts/train_owt_flm.sh |
Flow Map Distillation
Set algo.teacher_path to your pre-trained FLM checkpoint before running.
| Dataset | Script |
|---|---|
| LM1B | scripts/train_lm1b_flm_distill.sh |
| OpenWebText | scripts/train_owt_flm_distill.sh |
Second Stage Distillation (optional)
Set algo.teacher_path_f to your pre-trained FLM checkpoint and algo.teacher_path_g to the distilled checkpoint produced by the first-stage script above.
| Dataset | Script |
|---|---|
| LM1B | scripts/train_lm1b_flm_distill_second.sh |
| OpenWebText | scripts/train_owt_flm_distill_second.sh |
Generative Perplexity Evaluation
Set CKPT_PATH in the script to your trained checkpoint before running.
| Model | Dataset | Script |
|---|---|---|
| FLM | LM1B | scripts/gen_ppl_lm1b_flm.sh |
| FLM | OpenWebText | scripts/gen_ppl_owt_flm.sh |
| FMLM | LM1B | scripts/gen_ppl_lm1b_flm_distill_double.sh |
| FMLM | OpenWebText | scripts/gen_ppl_owt_flm_distill_double.sh |
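The gen_ppl scripts report generative perplexity, i.e., samples drawn from the model are scored under a pretrained autoregressive LM. Below is a minimal sketch of that metric; the evaluator choice (GPT-2 Large) and the generative_perplexity helper are assumptions for illustration, not necessarily what the scripts use:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

@torch.no_grad()
def generative_perplexity(texts, device="cuda"):
    # Score generated samples with a pretrained LM and report exp(mean NLL).
    # Using GPT-2 Large as the evaluator is an assumption for illustration.
    tok = GPT2TokenizerFast.from_pretrained("gpt2-large")
    lm = GPT2LMHeadModel.from_pretrained("gpt2-large").to(device).eval()
    total_nll, total_tokens = 0.0, 0
    for text in texts:
        ids = tok(text, return_tensors="pt").input_ids.to(device)
        out = lm(ids, labels=ids)                # mean next-token cross-entropy
        total_nll += out.loss.item() * (ids.numel() - 1)
        total_tokens += ids.numel() - 1
    return float(torch.exp(torch.tensor(total_nll / total_tokens)))
```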
Coming Soon
This codebase builds upon DUO.

