Code for supervised fine-tuning and evaluating diffusion LLMs with Learnability-Informed Fine-Tuning (LIFT).
📄 Paper | 🤗 Hugging Face
- April 2026: Accepted to ICML 2026! 🔥
Repository layout:
- `scripts/`: launch scripts
- `SFT/`: fine-tuning and LoRA merge
- `eval/`: evaluation, generation, and scoring
- `dataset/`: local datasets (countdown/sudoku/AIME JSONs)
Set up the environment:

```bash
conda env create -f lift.yml
conda activate lift
```

Run SFT with the root launcher:
```bash
bash scripts/sft/run_sft.sh
```

Merge LoRA adapters for standalone evaluation checkpoints:
```bash
python SFT/merge_lora.py \
    --base_model GSAI-ML/LLaDA-8B-Instruct \
    --adapter_path SFT/sft_output/<run>/checkpoint-<step> \
    --output_path SFT/merged_models/<run>/checkpoint-<step> \
    --architecture llada
```

Use `--architecture llada` for checkpoints that will be evaluated with this repo's LLaDA evaluation code. The script reads `base_model_name_or_path` from the adapter config when available; `--base_model` is used as the fallback.
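As a rough illustration of that resolution order, here is a minimal PEFT-based merge sketch. It is not the repo's `SFT/merge_lora.py` (the `--architecture` handling is omitted), and the placeholder paths follow the command above:

```python
# Minimal sketch of the base-model resolution and LoRA merge described above,
# using the PEFT library; NOT the repo's SFT/merge_lora.py.
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

adapter_path = "SFT/sft_output/<run>/checkpoint-<step>"
output_path = "SFT/merged_models/<run>/checkpoint-<step>"
fallback_base = "GSAI-ML/LLaDA-8B-Instruct"

# Prefer the base model recorded in the adapter config; fall back to the CLI value.
peft_config = PeftConfig.from_pretrained(adapter_path)
base_name = peft_config.base_model_name_or_path or fallback_base

# LLaDA ships custom modeling code, so trust_remote_code=True is required.
base = AutoModelForCausalLM.from_pretrained(base_name, trust_remote_code=True)
model = PeftModel.from_pretrained(base, adapter_path)

# Fold the LoRA deltas into the base weights and save a standalone checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained(output_path)
AutoTokenizer.from_pretrained(base_name, trust_remote_code=True).save_pretrained(output_path)
```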
Run evaluation with:

```bash
bash scripts/eval/run_eval.sh
```

The eval runner prints accuracy when generation finishes and writes the score into each generation JSON.
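For reference, a sketch of what that scoring pass could look like, assuming each generation JSON holds a list of records with `prediction` and `answer` string fields (the repo's actual schema, field names, and output directory may differ):

```python
# Hypothetical scoring pass over generation JSONs; schema and paths are
# assumptions, not the repo's actual eval runner.
import json
from pathlib import Path

def score_file(path: Path) -> float:
    records = json.loads(path.read_text())
    correct = sum(r["prediction"].strip() == r["answer"].strip() for r in records)
    accuracy = correct / max(len(records), 1)
    # Write the score back into the generation JSON, mirroring the eval runner.
    path.write_text(json.dumps({"accuracy": accuracy, "records": records}, indent=2))
    return accuracy

if __name__ == "__main__":
    for gen_file in sorted(Path("eval/outputs").glob("*.json")):  # assumed location
        print(f"{gen_file.name}: {score_file(gen_file):.4f}")
```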
Supported `--dataset` keys in `eval/eval.py`:
`gsm8k`, `math500`, `countdown`, `sudoku`, `aime24`, `aime25`, `humaneval`, `mbpp`
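One plausible way to group these keys by scoring style is sketched below; the set names and helper functions are illustrative assumptions, not `eval/eval.py`'s actual structure:

```python
# Illustrative grouping of the supported --dataset keys; names and scoring
# logic here are assumptions, not the actual structure of eval/eval.py.
MATH_TASKS = {"gsm8k", "math500", "aime24", "aime25"}  # exact-match numeric answers
PUZZLE_TASKS = {"countdown", "sudoku"}                 # local JSONs under dataset/
CODE_TASKS = {"humaneval", "mbpp"}                     # require executing generated code

def exact_match(prediction: str, answer: str) -> bool:
    # Whitespace-normalized string comparison.
    return prediction.strip() == answer.strip()

def scorer_for(dataset: str):
    if dataset in CODE_TASKS:
        # Code benchmarks are usually scored by running unit tests in a sandbox.
        raise NotImplementedError("code tasks need test execution, omitted here")
    return exact_match
```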
See also:
- `SFT/README.md` for training methods, datasets, and SFT scripts
- `eval/README.md` for evaluation workflow and task details
