A PyTorch implementation of FS-DFM with custom solvers for efficient text generation and discrete sequence modeling. This software project accompanies the research paper *FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models*.
This repository contains:
- Flow Matching (`flow_matching/`): Based on the Flow Matching implementation from Meta, with our custom discrete solvers added in `discrete_solver_fsdfm.py`
- FS-DFM (`fs_dfm/`): Fast sampling diffusion flow matching for discrete sequences
- Pre-training (`pre_training/`): Transformer-based model pre-training utilities
| ARM | DFM | FS-DFM (Ours) |
|---|---|---|
| ![]() | ![]() | ![]() |
- Custom discrete flow matching solvers (`flow_matching/solver/discrete_solver_fsdfm.py`)
- Student-teacher distillation framework
- Multiple solver options: `mixture_euler`, `mixture_euler_with_cumulative_scalar`
- Support for various source distributions (uniform, mask)
- Efficient sampling with a configurable number of steps (see the sketch below)
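
To make the few-step sampling idea concrete, the snippet below sketches a simplified Euler-style loop over a mixture probability path. The `model(x_t, t)` call, the linear schedule, and the function name are illustrative assumptions, not the repository's solver API.

```python
import torch

@torch.no_grad()
def few_step_sampling_sketch(model, x_0, n_steps=8, temperature=1.0):
    """Illustrative mixture-path Euler sampler: at each step, tokens jump to a
    sample from the model's prediction of the clean data with a probability
    set by the (assumed linear) path schedule."""
    x_t = x_0.clone()  # tokens drawn from the source distribution (uniform or mask)
    ts = torch.linspace(0.0, 1.0, n_steps + 1)
    for i in range(n_steps):
        t, t_next = ts[i].item(), ts[i + 1].item()
        # hypothetical model signature: logits over the vocabulary for x_1
        logits = model(x_t, torch.full((x_t.shape[0],), t))
        p_1 = torch.softmax(logits / temperature, dim=-1)
        x_1 = torch.distributions.Categorical(probs=p_1).sample()
        # per-token probability of jumping to the predicted token on this step
        jump_prob = (t_next - t) / max(1.0 - t, 1e-8)
        jump = torch.rand(x_t.shape, device=x_t.device) < jump_prob
        x_t = torch.where(jump, x_1, x_t)
    return x_t
```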
- Python 3.8+
- CUDA 11.0+ (for GPU support)
- conda or mamba package manager
```bash
# Create conda environment
conda env create -f fsdfm_environment.yml

# Activate environment
conda activate FSDFM

# Install package in development mode
pip install -e .
```

Train a model using the FS-DFM framework:
```bash
python fs_dfm/run_train.py \
    data.cache_dir=${CACHE_DIR:-./cache_dir}
```

Configuration can be modified in `fs_dfm/configs/config.yaml`.
Evaluate a trained model:
```bash
python fs_dfm/run_eval.py \
    --work_dir "/path/to/output/artifacts" \
    --ngpus 1 \
    --perplexity_n_samples 320 \
    --eval_elbo \
    --eval_perplexity \
    --do_dynamic_step \
    --pre_trained_model_path "/path/to/checkpoint.pth"
```

For pre-training transformer models:
```bash
python pre_training/run_train.py
```

Key configuration parameters in `fs_dfm/configs/config.yaml`:
- Flow Settings:
  - `source_distribution`: Choose between `uniform` or `mask`
  - `sampling_steps`: Number of sampling steps (default: 1024)
  - `student_solver`: Solver type for the student model
  - `temperature`: Temperature for sampling (default: 1.0)
- Training Settings:
  - `optimizer`: AdamW optimizer with configurable learning rates
  - `weight_decay`: 0.03
  - `grad_clip`: 1.0
  - `n_iters`: Total training iterations
- Evaluation Settings:
  - `batch_size`: Evaluation batch size
  - `perplexity`: Enable perplexity evaluation
  - `sample_batch_size`: Batch size for sampling
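
As a quick way to inspect or tweak these parameters programmatically, the config can be loaded with OmegaConf (the library underlying Hydra-style YAML configs). The key paths below (`flow.*`, `training.*`) are assumptions for illustration; consult `fs_dfm/configs/config.yaml` for the actual layout.

```python
# Sketch only: key paths such as cfg.flow.sampling_steps are assumed for
# illustration and may not match the actual layout of config.yaml.
from omegaconf import OmegaConf

cfg = OmegaConf.load("fs_dfm/configs/config.yaml")

# Read a few of the parameters described above (hypothetical key paths)
print(cfg.flow.source_distribution)   # "uniform" or "mask"
print(cfg.flow.sampling_steps)        # e.g. 1024

# Override values in code and save a variant before launching training
cfg.flow.sampling_steps = 8
cfg.training.grad_clip = 1.0
OmegaConf.save(cfg, "config_override.yaml")
```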
```text
.
├── flow_matching/
│   └── solver/
│       └── discrete_solver_fsdfm.py   # Custom discrete flow solvers
├── fs_dfm/
│   ├── configs/                       # Configuration files
│   ├── eval.py                        # Evaluation utilities
│   ├── logic/
│   │   └── evaluate.py                # Likelihood estimation
│   └── run_train.py                   # Training script
└── pre_training/
    ├── data/
    │   └── data.py                    # Data loading utilities
    ├── model/
    │   └── transformer.py             # Transformer model components
    └── run_train.py                   # Pre-training script
```
The `finite_probs_to_generator` method converts probability distributions to flow generators with:
- Energy barrier for controlling transitions
- Proper normalization with step size (`dt_seg`)
- Safety clipping for numerical stability
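
As a rough mental model, a conversion of this kind can be sketched as below. The function name, arguments, and the specific form of the energy barrier are illustrative assumptions, not the actual implementation in `discrete_solver_fsdfm.py`.

```python
import torch
import torch.nn.functional as F

def probs_to_generator_sketch(probs, x_t, dt_seg, energy_barrier=0.0, clip_val=1e4):
    """Illustrative sketch: turn predicted per-token probabilities into discrete
    flow transition rates for one solver segment of size dt_seg."""
    # one-hot encoding of the current tokens (shape: batch x seq x vocab)
    delta = F.one_hot(x_t, num_classes=probs.shape[-1]).to(probs.dtype)
    # rates push probability mass from the current token toward the target
    # distribution, normalized by the segment size
    rates = (probs - delta) / dt_seg
    # an energy barrier can damp transitions (here: a simple global scaling)
    rates = rates * torch.exp(torch.tensor(-energy_barrier, dtype=probs.dtype))
    # safety clipping for numerical stability
    return rates.clamp(-clip_val, clip_val)
```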
The framework supports transformer-based architectures with:
- Configurable vocabulary size
- Dropout regularization
- Distributed training support
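
A minimal sketch of such an architecture, assuming a standard PyTorch encoder stack (the class name and hyperparameters are hypothetical, not the code in `pre_training/model/transformer.py`):

```python
import torch.nn as nn

class TinyDenoiserSketch(nn.Module):
    """Illustrative transformer denoiser with configurable vocabulary size
    and dropout regularization."""
    def __init__(self, vocab_size, d_model=512, n_heads=8, n_layers=6, dropout=0.1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            dropout=dropout, batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq) integer ids -> logits over the vocabulary
        return self.lm_head(self.encoder(self.embed(tokens)))
```

For multi-GPU training, a model like this is typically wrapped in `torch.nn.parallel.DistributedDataParallel`.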
If you use this code in your research, please cite:
```bibtex
@article{fsdfm2025,
  title={FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models},
  author={Amin Karimi Monsefi and Nikhil Bhendawade and Manuel Rafael Ciosici and Dominic Culver and Yizhe Zhang and Irina Belousova},
  year={2025}
}
```

The flow matching implementation is based on Flow Matching from Meta, with custom discrete solvers added in `discrete_solver_fsdfm.py`.
See the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.