One-step Language Modeling via Continuous Denoising

Official code for the paper "One-step Language Modeling via Continuous Denoising"

Chanhyuk Lee1, Jaehoon Yoo1, Manan Agarwal2, Sheel Shah2, Jerry Huang2, Aditi Raghunathan2, Seunghoon Hong1, Nicholas M. Boffi†2, Jinwoo Kim†1

1KAIST   2Carnegie Mellon University   †Equal advising

Paper (arXiv) · Project Page

TL;DR

We introduce the Flow-based Language Model (FLM) and its flow-map distilled variant, the Flow-map Language Model (FMLM), enabling one-step parallel text generation through continuous denoising.

Overview

FLM brings the benefits of continuous generative modeling, as in image generation, to discrete state spaces: it encodes text as one-hot vectors and uses flow matching to map noise directly to one-hot data. Unlike discrete diffusion, FLM gradually denoises all tokens in parallel, allowing it to represent a superposition of sequences while capturing correlations between tokens; the inability to model such correlations is a fundamental bottleneck for discrete diffusion in the few-step regime.
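As a rough sketch of the underlying mechanism (the exact interpolant and training objective are specified in the paper), standard flow matching learns a velocity field along a straight path between noise and data:

$$x_t = (1 - t)\,x_0 + t\,x_1, \qquad x_0 \sim \mathcal{N}(0, I), \qquad u_t = x_1 - x_0$$

where $x_1$ is the one-hot encoding of a token. The model is trained to predict $u_t$ from $(x_t, t)$, and sampling integrates the learned flow from noise toward one-hot data at every position in parallel.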

How to Run

Install Dependencies

pip install "torch>=2.3.0"
pip install -r requirements.txt
# Install flash-attn separately matching your python / torch version (see https://github.com/Dao-AILab/flash-attention/releases)
pip install flash-attn==2.8.3 --no-build-isolation
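After installing, a quick sanity check (a minimal sketch, not part of the official scripts) confirms that torch and flash-attn import cleanly:

python -c "import torch, flash_attn; print(torch.__version__, flash_attn.__version__)"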

Our DiT backbone supports torch.compile with max-autotune for faster training. Enable it by setting the environment variable before running any script:

export DIT_USE_COMPILE=TRUE

With this option enabled, we can train the OpenWebText experiments with a batch size of 512 on 8 H100 GPUs (80GB VRAM) without gradient accumulation.
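Putting the two together, a typical launch (assuming the training scripts are plain bash entry points) looks like:

export DIT_USE_COMPILE=TRUE
bash scripts/train_owt_flm.sh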

Training

Before running, update data.cache_dir in the scripts to point to your dataset location. If the directory is empty, the dataset will be automatically downloaded and preprocessed.
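For example, a hypothetical one-liner that rewrites the option in place; the cache path and the exact option format inside the script are assumptions, so adjust to the script you are running:

sed -i 's|data.cache_dir=.*|data.cache_dir=/path/to/data_cache|' scripts/train_lm1b_flm.sh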

FLM Training (1M steps)

Dataset       Script
LM1B          scripts/train_lm1b_flm.sh
OpenWebText   scripts/train_owt_flm.sh

Flow Map Distillation

Set algo.teacher_path to your pre-trained FLM checkpoint before running; an example invocation follows the table.

Dataset       Script
LM1B          scripts/train_lm1b_flm_distill.sh
OpenWebText   scripts/train_owt_flm_distill.sh
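For example, if the scripts forward extra arguments as Hydra-style overrides (an assumption; otherwise edit the script directly), the teacher checkpoint can be passed inline:

bash scripts/train_lm1b_flm_distill.sh algo.teacher_path=/path/to/flm_checkpoint.ckpt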

Second Stage Distillation (optional)

Set algo.teacher_path_f to your pre-trained FLM checkpoint and algo.teacher_path_g to the distilled backbone produced by the first-stage script above.

Dataset       Script
LM1B          scripts/train_lm1b_flm_distill_second.sh
OpenWebText   scripts/train_owt_flm_distill_second.sh
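Under the same assumption as above, with both checkpoint paths as placeholders:

bash scripts/train_lm1b_flm_distill_second.sh \
    algo.teacher_path_f=/path/to/flm_checkpoint.ckpt \
    algo.teacher_path_g=/path/to/distilled_checkpoint.ckpt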

Evaluation

Set CKPT_PATH in the script to your trained checkpoint before running, as sketched below the table.

Model   Dataset       Script
FLM     LM1B          scripts/gen_ppl_lm1b_flm.sh
FLM     OpenWebText   scripts/gen_ppl_owt_flm.sh
FMLM    LM1B          scripts/gen_ppl_lm1b_flm_distill_double.sh
FMLM    OpenWebText   scripts/gen_ppl_owt_flm_distill_double.sh
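For example (the checkpoint path is a placeholder, and the edit assumes CKPT_PATH is assigned at the top of the script):

sed -i 's|^CKPT_PATH=.*|CKPT_PATH=/path/to/checkpoint.ckpt|' scripts/gen_ppl_lm1b_flm.sh
bash scripts/gen_ppl_lm1b_flm.sh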

BibTeX

Coming Soon


Acknowledgements

This codebase builds upon DUO.
