FlashMorph: Fast LAyer Selection for Hybrid MORPHing
Disen Lan1,2,*, Jianbin Zheng2, Yuxi Ren2, Xin Xia2, Xuanda Wang2, Xuefeng Xiao2, Xipeng Qiu1,†, Yu Cheng3,†
*Work done at ByteDance Seed. †Corresponding authors.
1Fudan University, 2ByteDance Seed, 3The Chinese University of Hong Kong
FlashMorph is an effective, efficient, and scalable pipeline for converting pretrained Transformers into hybrid attention models, performing optimization-based layer selection under a global hybrid configuration with a fixed full-attention budget.
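To make the idea of layer selection under a fixed full-attention budget concrete, here is a minimal sketch. It is not FlashMorph's actual procedure: it assumes each layer already has a scalar importance score (the `scores` list below is made-up illustrative data) and swaps the optimization-based selection for a plain top-k ranking. The function names `select_full_attention_layers` and `build_hybrid_config` are hypothetical and do not come from this repo.

```python
from typing import List, Sequence


def select_full_attention_layers(scores: Sequence[float], budget: int) -> List[int]:
    """Pick `budget` layer indices to keep as full attention.

    Assumption: a higher score means the layer benefits more from full
    attention. How such scores are obtained is outside this sketch.
    """
    if not 0 <= budget <= len(scores):
        raise ValueError("budget must be between 0 and the number of layers")
    # Rank layers by descending score; break ties by lower layer index.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:budget])


def build_hybrid_config(num_layers: int, full_layers: Sequence[int]) -> List[str]:
    """Return a per-layer plan: 'full' for kept layers, 'linear' otherwise."""
    keep = set(full_layers)
    return ["full" if i in keep else "linear" for i in range(num_layers)]


# Hypothetical scores for a 6-layer model with a budget of 2 full-attention layers.
scores = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8]
full_layers = select_full_attention_layers(scores, budget=2)
config = build_hybrid_config(len(scores), full_layers)
```

The budget is enforced globally rather than per block, so the selected full-attention layers may fall anywhere in the stack; every other layer is marked for replacement by linear attention.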
FlashMorph achieves strong Needle-in-a-Haystack performance with only 20M layer-selection tokens.
FlashMorph maintains commonsense reasoning ability and improves recall-intensive performance across different linear-attention backbones.
The hybrid architecture improves long-context prefill and decode efficiency while using less GPU memory than the full-attention Transformer baseline.
FlashMorph substantially reduces layer-selection cost compared with prior methods.
If you find this repo useful in your research or applications, please consider starring and citing our work:
@article{lan2026morphing,
title={Morphing into Hybrid Attention Models},
author={Lan, Disen and Zheng, Jianbin and Ren, Yuxi and Xia, Xin and Wang, Xuanda and Xiao, Xuefeng and Qiu, Xipeng and Cheng, Yu},
journal={arXiv preprint arXiv:2606.30562},
year={2026}
}
For questions or discussion, please contact Disen Lan at disenlan1002@gmail.com.




