Osilly/Awesome-Interleaving-Reasoning

Awesome Interleaving Reasoning

With the release of OpenAI o1 and DeepSeek-R1, reasoning models have delivered remarkably promising results and attracted significant attention from the research community. This development suggests that reasoning models are a critical step toward Artificial General Intelligence (AGI). The standard reasoning paradigm can be defined as:

  • Standard Reasoning: The model conducts a comprehensive intermediate reasoning phase prior to generating the final response. This intermediate reasoning typically manifests as unstructured textual content, with the entire inference process constituting a single atomic operation.

Recently, the introduction of OpenAI o3, Deep Research, Zochi, and BAGEL has established an alternative reasoning formulation, which we call Interleaving Reasoning. In contrast to standard reasoning, Interleaving Reasoning is characterized by multi-turn interactions and exhibits sophisticated reasoning dynamics. This reasoning modality has shown higher accuracy on complex problems. Consequently, we posit that Interleaving Reasoning may constitute the next generation of reasoning systems for AGI. We propose a taxonomy of Interleaving Reasoning with the following categories:

  • Multimodal Interleaving Reasoning: The model's inference process operates on diverse information modalities (e.g., textual, visual, auditory, video). This involves an intricately interleaved execution of modality-specific information processing and cross-modal reasoning. Examples: OpenAI o3, DeepEyes.
  • Multi-Round Acting Interleaving Reasoning: The system achieves task completion through iterative interactions (actions) with the environment. Each action is either predicated upon or performed in conjunction with a reasoning-driven inference step, establishing an interleaved execution of action and inference processes. Examples: Deep Research, Search-R1, ReTool, UI-TARS, ReAct.
  • Multi-Agent Interleaving Reasoning: In a multi-agent system, multiple agents, such as LLMs and MLLMs, engage in collaborative or competitive dynamics via a paradigm of interleaved reasoning. This implies that agents alternate in contributing discrete reasoning steps, share intermediate conclusions to establish a shared cognitive state and then build upon it, or influence one another's inferential processes. Examples: Society of Minds, Zochi, MetaGPT.
  • Unified Understanding and Generation Interleaving Reasoning: The model's reasoning capabilities are not confined to producing solely unimodal outputs. Instead, it strategically generates multimodal content (e.g., textual and visual elements) as an integral intermediate step within its intrinsic processes of comprehension and problem-solving. Examples: GoT, T2I-R1, BAGEL.

It is imperative to establish precise categorical boundaries:

  • While Multimodal Interleaving Reasoning could conceivably be subsumed within the Multi-Round Acting Interleaving Reasoning paradigm, we formally define Multimodal Interleaving Reasoning as requiring the direct incorporation of multimodal information streams during the reasoning process. This information typically derives from the processing of input modalities, as exemplified by OpenAI o3, which extracts visual information and integrates it into text-based reasoning workflows.
  • The fundamental distinction between Multi-Round Acting Interleaving Reasoning and Multi-Agent Interleaving Reasoning lies in their architectural composition: Multi-Round Acting Interleaving Reasoning typically employs a single LLM/MLLM to perform reasoning and determine subsequent actions. Conversely, Multi-Agent Interleaving Reasoning leverages multiple LLM/MLLM entities that collaboratively contribute to reasoning steps.
  • The difference between Unified Understanding and Generation Interleaving Reasoning and Multimodal Interleaving Reasoning lies in their information processing mechanisms. Unified Understanding and Generation Interleaving Reasoning uses a unified understanding and generation model capable of directly generating multimodal outputs during the reasoning process. In contrast, Multimodal Interleaving Reasoning typically sources its multimodal information from external systems or processes.
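
The boundaries above can be read as a small decision rule. The following is a minimal sketch of that reading and is not part of the repository: the `SystemProfile` fields, the `categorize` function, and the fallback to standard reasoning are all assumptions made for illustration, and a real system may reasonably match several categories at once.

```python
from dataclasses import dataclass


@dataclass
class SystemProfile:
    """Hypothetical description of a reasoning system (all fields are illustrative)."""
    num_reasoning_models: int         # LLM/MLLM entities that contribute reasoning steps
    generates_multimodal_steps: bool  # a unified model emits e.g. images while reasoning
    ingests_multimodal_steps: bool    # multimodal info from inputs or external processing
    acts_on_environment: bool         # reasoning interleaved with actions (search, code, UI)


def categorize(p: SystemProfile) -> list[str]:
    """Return every taxonomy category the profile matches (assumed reading of the boundaries)."""
    labels = []
    if p.num_reasoning_models > 1:
        labels.append("Multi-Agent")
    if p.generates_multimodal_steps:
        labels.append("Unified Understanding and Generation")
    if p.ingests_multimodal_steps:
        labels.append("Multimodal")
    # Multi-Round Acting is described as typically using a single LLM/MLLM.
    if p.acts_on_environment and p.num_reasoning_models == 1:
        labels.append("Multi-Round Acting")
    # No interleaving signal at all: treat as a single atomic reasoning pass.
    return labels or ["Standard Reasoning"]
```

For example, a single model that only interleaves reasoning with search calls would be labeled `["Multi-Round Acting"]` under these assumptions.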

We aim to provide the community with a comprehensive and timely synthesis of this fascinating and promising field, along with some insights into it. This repository is intended as a reference for researchers working on Interleaving Reasoning. Please start your exploration!

This work is in progress!


Table of Contents


Our Group

Originators

            
  • Wenxuan Huang (ECNU & CUHK)
  • Zhenfei Yin (USYD & Oxford)

Members

Our Activities


🔥🔥🔥 ICCV 2025 Workshop on Multi-Modal Reasoning for Agentic Intelligence (MMRAgi-2025)

We organized the ICCV 2025 Workshop MMRAgi!
Submission deadlines: Proceeding Track, 24 June 2025, 23:59 AoE; Non-Proceeding Track, 24 July 2025, 23:59 AoE.


🔥🔥🔥 Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

This is the first paper to explore how to effectively use RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incentivize reasoning capability.  


🔥🔥🔥 DeepEyes: Incentivizing “Thinking with Images” via Reinforcement Learning

The first open-source "o3-like" interleaving reasoning MLLM with "Thinking with Images". The model does not just see an image; it integrates visual information directly into the reasoning chain.

Standard Reasoning Examples

Awesome Interleaving Reasoning Papers

PR Template

You can choose categories from [Pretrain, SFT, RL, Prompt, Position paper, Survey paper] and so on. You can also combine them, for example, SFT+RL.
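
As an illustration of how a combined label such as SFT+RL could be split and checked, here is a minimal sketch. The function name `parse_categories` and the strict rejection of unknown labels are assumptions; the list above ends with "and so on", so other labels may well be acceptable in practice.

```python
# Labels named in the text above; the real set is open-ended ("and so on").
KNOWN_CATEGORIES = ["Pretrain", "SFT", "RL", "Prompt", "Position paper", "Survey paper"]


def parse_categories(tag: str) -> list[str]:
    """Split a '+'-joined label such as 'SFT+RL' and reject labels not listed above."""
    parts = [part.strip() for part in tag.split("+")]
    unknown = [part for part in parts if part not in KNOWN_CATEGORIES]
    if unknown:
        raise ValueError(f"unknown categories: {unknown}")
    return parts
```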

Multimodal Interleaving Reasoning

Definition: The model's inference process operates on diverse information modalities (e.g., textual, visual, auditory, video). This involves an intricately interleaved execution of modality-specific information processing and cross-modal reasoning.

Multi-Round Acting Interleaving Reasoning

Definition: The system achieves task completion through iterative interactions (actions) with the environment. Each action is either predicated upon or performed in conjunction with a reasoning-driven inference step, establishing an interleaved execution of action and inference processes.
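
The interleaving of inference and action described above can be sketched as a simple loop. This is a toy illustration in the spirit of ReAct-style systems, not the implementation of any listed paper: `toy_policy` is a hand-written stand-in for an LLM, `search` is a lookup table standing in for a real tool, and all names are hypothetical.

```python
# Hypothetical toy tool: a lookup table standing in for a search engine.
FACTS = {
    "capital of France": "Paris",
    "population of Paris": "about 2 million",
}


def search(query: str) -> str:
    return FACTS.get(query, "no result")


def toy_policy(question: str, observations: list[str]) -> tuple[str, str, str]:
    """Stand-in for an LLM: returns (thought, action, argument)."""
    if not observations:
        return ("I need the capital first.", "search", "capital of France")
    if len(observations) == 1:
        city = observations[0]
        return (f"The capital is {city}; now its population.", "search", f"population of {city}")
    return ("I have enough information.", "finish", observations[-1])


def run(question, policy, tools, max_rounds=5):
    """Alternate reasoning steps and environment actions until the policy finishes."""
    observations, trace = [], []
    for _ in range(max_rounds):
        thought, action, arg = policy(question, observations)
        trace.append(("thought", thought))
        if action == "finish":
            return arg, trace
        result = tools[action](arg)  # act on the environment
        trace.append(("action", f"{action}({arg})"))
        trace.append(("observation", result))
        observations.append(result)  # feed the observation into the next reasoning step
    return None, trace  # round budget exhausted without an answer


answer, trace = run(
    "What is the population of the capital of France?",
    toy_policy,
    {"search": search},
)
```

Each round contributes a thought, and every round except the last also contributes an action and an observation, so the trace alternates between inference and interaction rather than forming one atomic reasoning pass.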

Search

Code

UI

Complex acting

Others

Multi-Agent Interleaving Reasoning

Definition: In a multi-agent system, multiple agents, such as LLMs and MLLMs, engage in collaborative or competitive dynamics via a paradigm of interleaved reasoning. This implies that agents alternate in contributing discrete reasoning steps, share intermediate conclusions to establish a shared cognitive state and then build upon it, or influence one another's inferential processes.
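
One way to picture agents alternating reasoning steps over a shared state is the sketch below. It is a toy under stated assumptions: `proposer` and `critic` are hand-written functions standing in for LLM agents, the shared state is just a list of strings, and the round-robin schedule in `interleave` is one possible coordination scheme among many.

```python
from typing import Callable, Optional

# An agent reads the shared list of steps and returns a new step, or None to stop.
Agent = Callable[[list[str]], Optional[str]]


def proposer(steps: list[str]) -> Optional[str]:
    if not steps:
        return "proposer: 17 * 24 = 17 * 20 + 17 * 4"
    if steps[-1].startswith("critic: ok"):
        return "proposer: 340 + 68 = 408"
    return None


def critic(steps: list[str]) -> Optional[str]:
    if steps and steps[-1].startswith("proposer: 17 * 24"):
        return "critic: ok, 17 * 20 = 340 and 17 * 4 = 68"
    return None


def interleave(agents: list[Agent], max_turns: int = 6) -> list[str]:
    """Round-robin over agents; each one builds on the shared intermediate conclusions."""
    shared: list[str] = []
    for turn in range(max_turns):
        step = agents[turn % len(agents)](shared)
        if step is None:  # the agent whose turn it is has nothing to add
            break
        shared.append(step)
    return shared


steps = interleave([proposer, critic])
```

Here the final proposer step depends on the critic's intermediate conclusion, which is the mutual influence the definition refers to.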

Debate

Coordination

Unified Understanding and Generation Interleaving Reasoning

Definition: The model's reasoning capabilities are not confined to producing solely unimodal outputs. Instead, it strategically generates multimodal content (e.g., textual and visual elements) as an integral intermediate step within its intrinsic processes of comprehension and problem-solving.

Generation

Understanding

Awesome Datasets
