InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning

ZJU-REAL/InftyThink-Plus

Yuchen Yan1,2,*,   Liang Jiang2,   Jin Jiang3,   Shuaicheng Li2,  
Zujie Wen2,   Zhiqiang Zhang2,   Jun Zhou2,   Jian Shao1,   Yueting Zhuang1,   Yongliang Shen1,†

1Zhejiang University,   2Ant Group,   3Peking University
Preprint. Under review.
*Contribution during internship at Ling Team, Ant Group. †Corresponding Author

arXiv | 📑 WebPage

News 🔥🔥

  • 2026.02.09: We release our paper.

Overview 🦾🦾

Building upon our previous work InftyThink, we introduce InftyThink+, an end-to-end reinforcement learning framework that directly optimizes the complete iterative reasoning trajectory. It retains InftyThink’s paradigm of model-controlled iteration boundaries and explicit summarization, and trains in two stages: a cold-start stage that uses supervised fine-tuning to establish the basic iterative reasoning format, followed by an RL stage that optimizes strategic decisions through trajectory-level learning. We design the rollout strategy, reward formulation, and policy gradient estimation specifically for InftyThink’s single-trajectory, multi-inference structure. Separating format acquisition from strategy optimization lets the model learn not only how to produce iterative reasoning, but also when to summarize, what to preserve, and how to make effective use of self-generated summaries across iterations.
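
To make the single-trajectory, multi-inference structure concrete, here is a minimal toy sketch in Python. It is not the released implementation (the official code is not yet available), and every name in it (`Step`, `build_prompt`, `rollout`, `trajectory_reward`, `toy_policy`) is hypothetical. It assumes that each inference call sees the question plus the previous summary, that the model itself decides whether to emit another summary or a final answer, and that a single exact-match reward is assigned to the whole trajectory. The actual reward formulation and prompt format may well differ.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    """One inference call inside a single reasoning trajectory."""
    prompt: str
    reasoning: str
    summary: Optional[str]  # None when the model decides to stop
    answer: Optional[str]   # set only on the final iteration


# Hypothetical policy interface: (question, previous summary) ->
# (reasoning text, new summary or None, final answer or None).
Policy = Callable[[str, Optional[str]], Tuple[str, Optional[str], Optional[str]]]


def build_prompt(question: str, summary: Optional[str]) -> str:
    """Assumed prompt layout: the question plus the latest summary only."""
    if summary is None:
        return f"Question: {question}"
    return f"Question: {question}\nSummary of progress so far: {summary}"


def rollout(question: str, policy: Policy, max_iters: int = 8) -> List[Step]:
    """Run iterations until the policy emits an answer or the budget runs out."""
    steps: List[Step] = []
    summary: Optional[str] = None
    for _ in range(max_iters):
        reasoning, new_summary, answer = policy(question, summary)
        steps.append(Step(build_prompt(question, summary), reasoning, new_summary, answer))
        if answer is not None:
            break
        summary = new_summary  # only the summary carries over, not the full reasoning
    return steps


def trajectory_reward(steps: List[Step], reference: str) -> float:
    """Assumed outcome reward: 1.0 if the final answer matches, shared by all steps."""
    final = steps[-1].answer if steps else None
    return 1.0 if final == reference else 0.0


def toy_policy(question: str, summary: Optional[str]):
    """Stand-in for a language model: summarizes twice, then answers."""
    done = 0 if summary is None else int(summary.split("=")[1])
    if done < 2:
        return f"partial work {done}", f"done={done + 1}", None
    return "final work", None, "42"


if __name__ == "__main__":
    trajectory = rollout("toy question", toy_policy)
    for step in trajectory:
        print(repr(step.prompt), "->", step.summary or step.answer)
    print("reward:", trajectory_reward(trajectory, "42"))
```

In this sketch the context passed to each call stays bounded because only the latest summary is carried forward, while the reward is computed once over the complete trajectory, so any credit assignment across iterations would have to be handled by the policy gradient estimator.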

QuickStart 🎯🎯

Code and documentation are on the way.

Citation

If you find our work helpful, please consider citing it:

@misc{yan2026inftythinkplus,
      title={InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning}, 
      author={Yuchen Yan and Liang Jiang and Jin Jiang and Shuaicheng Li and Zujie Wen and Zhiqiang Zhang and Jun Zhou and Jian Shao and Yueting Zhuang and Yongliang Shen},
      year={2026},
      eprint={2602.06960},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2602.06960}, 
}

Contact Us

If you have any questions, please contact us by email: yanyuchen@zju.edu.cn
