This repository contains the official implementation of FLUID (Flexible Unidirectional Inference Diffusion), a framework designed to efficiently adapt pre-trained Autoregressive (AR) backbones into parallel diffusion models. By enforcing Strictly Causal Alignment and introducing Elastic Horizons, FLUID achieves state-of-the-art performance with orders of magnitude less training data compared to standard diffusion models.
- Strictly Causal Alignment: Unlike bidirectional diffusion, FLUID uses a lower-triangular attention mask to maintain the inductive biases of AR priors. This enables seamless initialization from GPT-style checkpoints like openPangu-Embedded-7B.
- Elastic Horizon Modeling: An entropy-driven mechanism that dynamically modulates denoising strides $K_t$ based on local information density. It "sprints" through predictable text and "downshifts" for complex reasoning.
- Training Efficiency: Achieves superior results on GSM8K (91.9) and MATH500 (61.8) using only 2.7B tokens of adaptation data, outperforming models trained on trillions of tokens.
- LLaMA-Factory Integration: Fully compatible with the LLaMA-Factory ecosystem for efficient LoRA fine-tuning and scaling.
FLUID bridges the gap between AR models and diffusion paradigms through two core architectural innovations:
Departing from standard diffusion language models, which rely on bidirectional attention, FLUID injects a lower-triangular attention mask into the Transformer. This restricts the conditional probability of restoring a token to depend solely on its causal history, preserving the inductive biases of pre-trained LLMs.
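As a rough illustration of this constraint (not the repository's code), the sketch below builds the lower-triangular mask and applies it to single-head attention scores; the function names and shapes are purely illustrative:

```python
import torch

def strictly_causal_mask(seq_len: int) -> torch.Tensor:
    """Lower-triangular mask: position i may attend only to positions j <= i."""
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

def masked_attention_scores(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Single-head attention scores with future positions masked to -inf,
    so restoring a token can depend only on its causal history."""
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    mask = strictly_causal_mask(q.size(-2)).to(q.device)
    return scores.masked_fill(~mask, float("-inf"))
```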
To resolve the "Entropy-Horizon Dilemma," we replace fixed-size blocks with Elastic Horizons. A lightweight Diffusion K-Head predicts the optimal generation stride
- High-confidence segments: The model expands the horizon to "sprint" through predictable text.
- High-entropy transitions: The model contracts the horizon for fine-grained, cautious reasoning.
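The sketch below captures the idea under simplifying assumptions: a single linear projection for the K-Head, a fixed stride range, and a simple entropy-threshold rule. None of these details are taken from the repository's implementation.

```python
import torch
import torch.nn as nn

class DiffusionKHead(nn.Module):
    """Illustrative K-Head: maps a hidden-state summary to a distribution over strides.

    A sketch only; the layer shape, stride range, and pooling scheme are
    assumptions, not the repository's exact architecture.
    """

    def __init__(self, hidden_size: int, max_stride: int = 32):
        super().__init__()
        self.proj = nn.Linear(hidden_size, max_stride)

    def forward(self, prefix_summary: torch.Tensor) -> torch.Tensor:
        # prefix_summary: (batch, hidden_size) summary of the causal prefix.
        return torch.softmax(self.proj(prefix_summary), dim=-1)  # P_phi over K_t

def choose_stride(stride_probs: torch.Tensor, token_entropy: torch.Tensor,
                  tau: float = 2.8) -> int:
    """Expand the horizon on confident (low-entropy) prefixes, contract otherwise.

    stride_probs: 1-D distribution over strides 1..max_stride for one position.
    token_entropy: per-token entropies over the recent causal prefix.
    """
    k_expected = int((stride_probs * torch.arange(1, stride_probs.numel() + 1)).sum())
    return k_expected if token_entropy.mean().item() < tau else max(1, k_expected // 4)
```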
The trained weights for FLUID-7B (adapted from openPangu-Embedded-7B) are available on Hugging Face:
| Model | Base Model | Adaptation Data | HF Link |
|---|---|---|---|
| FLUID-7B | openPangu-Embedded-7B | 2.7B Tokens | 🤗 Download from Hugging Face |
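The checkpoint is expected to load with standard `transformers` tooling. A minimal sketch, where the repo id is a placeholder and `trust_remote_code` is assumed to be needed for the diffusion-specific generation utilities:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; substitute the actual Hugging Face link from the table above.
repo_id = "your-org/FLUID-7B"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, trust_remote_code=True, device_map="auto"
)
```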
FLUID-7B matches or exceeds top-tier AR and Diffusion baselines across standard benchmarks:
| Model | Type | Training Tokens | MMLU | GSM8K | MATH500 | HumanEval |
|---|---|---|---|---|---|---|
| LLaMA-3-8B | AR | 15T | 68.4 | 78.3 | 27.4 | 59.8 |
| Qwen-2.5-7B | AR | 18T | 76.6 | 91.6 | 84.8 | 79.2 |
| LLaDA-8B | Diff | 2.0T | 65.5 | 36.2 | 34.2 | 47.6 |
| FLUID-7B (Ours) | Diff | 2.7B | 67.8 | 91.9 | 61.8 | 60.4 |
FLUID is trained via a two-stage process using LLaMA-Factory:
Stage 1: Fine-tune the AR backbone (e.g., openPangu-Embedded-7B) using a hybrid objective that combines AR generation and masked denoising (a sketch of this combined loss follows the list below).
- Duration: 32,000 iterations.
- Optimization: Rank-16 LoRA on the backbone.
- Objective: $\mathcal{L}_{\text{Stage1}} = \mathcal{L}_{\text{AR}} + \mathcal{L}_{\text{Diff}}$ under strictly causal constraints.
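A minimal sketch of such a hybrid objective, assuming a simple uniform masking scheme; the mask token, masking ratio, noise schedule, and term weighting are illustrative assumptions rather than the released training recipe:

```python
import torch
import torch.nn.functional as F

def stage1_loss(model, input_ids, mask_token_id, mask_prob=0.15):
    """Hybrid Stage-1 objective sketch: L_AR + L_Diff under one causal mask.

    The masking ratio, noise schedule, and weighting of the two terms are
    assumptions for illustration; the released configs define the real values.
    """
    # Autoregressive term: next-token prediction on the clean sequence.
    ar_logits = model(input_ids).logits
    l_ar = F.cross_entropy(ar_logits[:, :-1].flatten(0, 1), input_ids[:, 1:].flatten())

    # Masked-denoising term: corrupt a random subset of tokens, then restore
    # them conditioned only on their (strictly causal) history.
    noisy = input_ids.clone()
    corrupt = torch.rand(input_ids.shape, device=input_ids.device) < mask_prob
    noisy[corrupt] = mask_token_id
    diff_logits = model(noisy).logits
    l_diff = F.cross_entropy(diff_logits[corrupt], input_ids[corrupt])

    return l_ar + l_diff
```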
Stage 2: Freeze the backbone and train the Diffusion K-Head to predict the optimal generation stride (a sketch of the KL objective follows the list below).
- Duration: 2,000 steps.
- Objective: Minimize the KL divergence between the predicted horizon distribution $P_{\phi}$ and the Gaussian soft targets $\mathcal{Q}$.
- Confidence Threshold: $\tau = 2.8$ (optimized for openPangu-Embedded-7B).
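A minimal sketch of this objective, assuming a discretized Gaussian centred on an oracle stride $k^{*}$ and computing $\mathrm{KL}(P_{\phi}\,\|\,\mathcal{Q})$; the value of $\sigma$, the oracle-stride definition, and the KL direction are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def gaussian_soft_targets(k_star: int, max_stride: int, sigma: float = 1.0) -> torch.Tensor:
    """Soft target Q: a discretized Gaussian centred on an oracle stride k*.

    sigma and the way k* is obtained are assumptions for illustration.
    """
    strides = torch.arange(1, max_stride + 1, dtype=torch.float)
    q = torch.exp(-0.5 * ((strides - k_star) / sigma) ** 2)
    return q / q.sum()

def stage2_loss(khead_logits: torch.Tensor, k_star: int) -> torch.Tensor:
    """KL(P_phi || Q) between the K-Head prediction and the Gaussian soft target."""
    log_p = F.log_softmax(khead_logits, dim=-1)           # log P_phi
    q = gaussian_soft_targets(k_star, khead_logits.size(-1))
    p = log_p.exp()
    # The paper's exact KL direction/weighting may differ; this computes KL(P || Q).
    return (p * (log_p - q.clamp_min(1e-8).log())).sum(-1)
```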
- Clone LLaMA-Factory:

  ```bash
  git clone https://github.com/Oli-lab-nun/FLUID.git
  cd LLaMA-Factory
  pip install -e .[torch,metrics]
  ```

- Run Training: Use the provided `configs_pangu/pangu.yaml` config:

  ```bash
  bash train.sh
  ```
FLUID is developed based on the openPangu-Embedded-7B base model. We sincerely thank the openPangu team for releasing their model and related resources to the community, which made this research possible.
Our FLUID-7B checkpoint is adapted from openPangu-Embedded-7B, and we gratefully acknowledge the original developers and maintainers of openPangu.
Powered by openPangu.
openPangu is a trademark of Huawei Technologies Co., Ltd.
Please refer to the original openPangu repository and the corresponding license files for more details.
If you find FLUID helpful in your research, please cite our work:
@inproceedings{fluid2026,
title={From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons},
author={Anonymous},
booktitle={Submission to ACL 2026},
year={2026}
}