One-step Language Modeling via Continuous Denoising

Official code for the paper "One-step Language Modeling via Continuous Denoising"

Chanhyuk Lee1, Jaehoon Yoo1, Manan Agarwal2, Sheel Shah2, Jerry Huang2, Aditi Raghunathan2, Seunghoon Hong1, Nicholas M. Boffi†2, Jinwoo Kim†1

1KAIST   2Carnegie Mellon University   †Equal advising

Paper (arXiv) · Project Page

TL;DR

We introduce the Flow-based Language Model (FLM) and its flow-map distilled variant, the Flow-map Language Model (FMLM), enabling one-step parallel text generation through continuous denoising.

Overview

FLM brings the benefits of continuous generative modeling, as in image generation, to discrete state spaces: it encodes text as one-hot vectors and uses flow matching to map noise directly to one-hot data. Unlike discrete diffusion, FLM gradually denoises all tokens in parallel, allowing it to represent a superposition of sequences while capturing correlations between tokens; the inability to model such correlations is a fundamental bottleneck for discrete diffusion in the few-step regime.
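As a rough sketch of the underlying mechanism (the exact interpolant and training objective are specified in the paper), standard flow matching learns a velocity field along a straight path between noise and data:

$$x_t = (1 - t)\,x_0 + t\,x_1, \qquad x_0 \sim \mathcal{N}(0, I), \qquad u_t = x_1 - x_0$$

where $x_1$ is the one-hot encoding of a token. The model is trained to predict $u_t$ from $(x_t, t)$, and sampling integrates the learned flow from noise toward one-hot data at every position in parallel.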

How to Run

Install Dependencies

pip install "torch>=2.3.0"
pip install -r requirements.txt
# Install flash-attn separately matching your python / torch version (see https://github.com/Dao-AILab/flash-attention/releases)
pip install flash-attn==2.8.3 --no-build-isolation
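After installing, a quick sanity check (a minimal sketch, not part of the official scripts) confirms that torch and flash-attn import cleanly:

python -c "import torch, flash_attn; print(torch.__version__, flash_attn.__version__)"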

Our DiT backbone supports torch.compile with max-autotune for faster training. Enable it by setting the environment variable before running any script:

export DIT_USE_COMPILE=TRUE

With this option enabled, we can train the OpenWebText experiments with a batch size of 512 on 8 H100 GPUs (80GB VRAM) without gradient accumulation.
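Putting the two together, a typical launch (assuming the training scripts are plain bash entry points) looks like:

export DIT_USE_COMPILE=TRUE
bash scripts/train_owt_flm.sh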

Training

Before running, update data.cache_dir in the scripts to point to your dataset location. If the directory is empty, the dataset will be automatically downloaded and preprocessed.
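For example, a hypothetical one-liner that rewrites the option in place; the cache path and the exact option format inside the script are assumptions, so adjust to the script you are running:

sed -i 's|data.cache_dir=.*|data.cache_dir=/path/to/data_cache|' scripts/train_lm1b_flm.sh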

FLM Training (1M steps)

Dataset       Script
LM1B          scripts/train_lm1b_flm.sh
OpenWebText   scripts/train_owt_flm.sh

Flow Map Distillation

Set algo.teacher_path to your pre-trained FLM checkpoint before running; an example invocation follows the table.

Dataset       Script
LM1B          scripts/train_lm1b_flm_distill.sh
OpenWebText   scripts/train_owt_flm_distill.sh
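For example, if the scripts forward extra arguments as Hydra-style overrides (an assumption; otherwise edit the script directly), the teacher checkpoint can be passed inline:

bash scripts/train_lm1b_flm_distill.sh algo.teacher_path=/path/to/flm_checkpoint.ckpt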

Second Stage Distillation (optional)

Set algo.teacher_path_f to your pre-trained FLM checkpoint and algo.teacher_path_g to the distilled backbone produced by the first-stage script above.

Dataset       Script
LM1B          scripts/train_lm1b_flm_distill_second.sh
OpenWebText   scripts/train_owt_flm_distill_second.sh
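Under the same assumption as above, with both checkpoint paths as placeholders:

bash scripts/train_lm1b_flm_distill_second.sh \
    algo.teacher_path_f=/path/to/flm_checkpoint.ckpt \
    algo.teacher_path_g=/path/to/distilled_checkpoint.ckpt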

Evaluation

Set CKPT_PATH in the script to your trained checkpoint before running, as sketched below the table.

Model   Dataset       Script
FLM     LM1B          scripts/gen_ppl_lm1b_flm.sh
FLM     OpenWebText   scripts/gen_ppl_owt_flm.sh
FMLM    LM1B          scripts/gen_ppl_lm1b_flm_distill_double.sh
FMLM    OpenWebText   scripts/gen_ppl_owt_flm_distill_double.sh
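For example (the checkpoint path is a placeholder, and the edit assumes CKPT_PATH is assigned at the top of the script):

sed -i 's|^CKPT_PATH=.*|CKPT_PATH=/path/to/checkpoint.ckpt|' scripts/gen_ppl_lm1b_flm.sh
bash scripts/gen_ppl_lm1b_flm.sh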

BibTeX

Coming Soon


Acknowledgements

This codebase builds upon DUO.
