Morphing into Hybrid Attention Models

FlashMorph: Fast LAyer Selection for Hybrid MORPHing

Disen Lan¹,²,*, Jianbin Zheng², Yuxi Ren², Xin Xia², Xuanda Wang², Xuefeng Xiao², Xipeng Qiu¹,†, Yu Cheng³,†

*Work done at ByteDance Seed. †Corresponding authors.

¹Fudan University, ²ByteDance Seed, ³The Chinese University of Hong Kong

Method

FlashMorph is an effective, efficient, and scalable pipeline for converting pretrained Transformers into hybrid attention models, performing optimization-based layer selection under a global hybrid configuration with a fixed full-attention budget.
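
To make the "fixed full-attention budget" constraint concrete, here is a minimal Python sketch. It is not the FlashMorph algorithm: the per-layer scores, the `select_full_attention_layers` helper, and the top-k rule are all assumptions for illustration, standing in for whatever the optimization-based selection actually produces. It only shows that a global hybrid configuration assigns each layer either full or linear attention, with exactly `budget` layers keeping full attention.

```python
from typing import List, Sequence


def select_full_attention_layers(scores: Sequence[float], budget: int) -> List[str]:
    """Return a per-layer plan with exactly `budget` full-attention layers.

    `scores` are hypothetical per-layer importance values (assumption: a higher
    score means the layer benefits more from keeping full attention).
    """
    if not 0 <= budget <= len(scores):
        raise ValueError("budget must be between 0 and the number of layers")
    # Rank layer indices by score, highest first; ties go to the earlier layer.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    keep = set(ranked[:budget])
    return ["full" if i in keep else "linear" for i in range(len(scores))]


# Hypothetical scores for an 8-layer model with a budget of 2 full-attention layers.
plan = select_full_attention_layers(
    [0.1, 0.7, 0.2, 0.05, 0.9, 0.3, 0.4, 0.15], budget=2
)
print(plan)
```

Whatever selection rule is used, the budget stays fixed, so the resulting hybrid always has the same number of full-attention layers.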

Results

Long-context Retrieval

FlashMorph achieves strong Needle-in-a-Haystack performance with only 20M layer-selection tokens.

Commonsense Reasoning and Recall-intensive Tasks

FlashMorph maintains commonsense reasoning ability and improves recall-intensive performance across different linear-attention backbones.

Inference Efficiency

The hybrid architecture improves long-context prefill and decode efficiency while using less GPU memory than the full-attention Transformer baseline.
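
One likely contributor to the memory saving is the KV cache, which grows with sequence length only in full-attention layers. The back-of-the-envelope sketch below uses the standard KV-cache size formula. The model shape (32 layers, 8 KV heads, head dimension 128), the 8-layer full-attention budget, and the 128K context are hypothetical values, not figures from this repository, and the fixed-size state of linear-attention layers is ignored.

```python
def kv_cache_bytes(
    num_full_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int = 1,
    bytes_per_elem: int = 2,  # fp16 / bf16
) -> int:
    """KV-cache size in bytes for the layers that use full attention."""
    # Keys and values are both cached, hence the leading factor of 2.
    return (
        2 * num_full_layers * num_kv_heads * head_dim
        * seq_len * batch_size * bytes_per_elem
    )


GIB = 2**30
SEQ_LEN = 131072  # hypothetical 128K-token context

# Hypothetical baseline: all 32 layers use full attention.
full_model = kv_cache_bytes(32, 8, 128, SEQ_LEN)
# Hypothetical hybrid: only 8 of the 32 layers keep full attention.
hybrid_model = kv_cache_bytes(8, 8, 128, SEQ_LEN)

print(f"full attention: {full_model / GIB:.1f} GiB")
print(f"hybrid:         {hybrid_model / GIB:.1f} GiB")
```

Under these assumptions the cache shrinks in proportion to the share of layers that keep full attention; measured savings depend on the actual model and on the size of the linear-attention state.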

Layer-selection Efficiency

FlashMorph substantially reduces layer-selection cost compared with prior methods.

Citation

If you find this repo useful in your research or applications, please consider starring and citing our work:

@article{lan2026morphing,
  title={Morphing into Hybrid Attention Models},
  author={Lan, Disen and Zheng, Jianbin and Ren, Yuxi and Xia, Xin and Wang, Xuanda and Xiao, Xuefeng and Qiu, Xipeng and Cheng, Yu},
  journal={arXiv preprint arXiv:2606.30562},
  year={2026}
}

Contact

For questions or discussion, please contact Disen Lan at disenlan1002@gmail.com.
