# Reasoning Agent

ReasoningAgent is designed to enhance language models' reasoning capabilities through systematic exploration of thought processes. By implementing the Tree of Thoughts (ToT) framework, it enables LLMs like GPT-4 and Llama to break down complex problems into manageable steps and explore multiple solution paths simultaneously.

Here, we demonstrate the key features and capabilities of the [`ReasoningAgent`](/docs/api-reference/autogen/agents/experimental/ReasoningAgent), showing how it can effectively reason about problems.

## Search Strategies

The [`ReasoningAgent`](/docs/api-reference/autogen/agents/experimental/ReasoningAgent) supports multiple search strategies for exploring the reasoning space:

### 1. Beam Search (Default)
- Maintains the top `k` most promising paths at each step
- Efficient for problems with clear evaluation criteria
- Configurable beam width to balance exploration vs computation
- Special case: DFS mode (beam size = 1) for linear reasoning similar to Chain-of-Thought
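The pruning idea behind beam search can be sketched in a few lines. This is a toy illustration, not the `ReasoningAgent` implementation: the `expand` and `score` callables are hypothetical stand-ins for the thinker and grader, and paths are plain tuples of strings.

```python
from typing import Callable, List, Tuple

Path = Tuple[str, ...]


def beam_search(
    root: str,
    expand: Callable[[Path], List[str]],
    score: Callable[[Path], float],
    beam_size: int = 3,
    max_depth: int = 3,
) -> Path:
    """Keep only the `beam_size` best-scoring partial paths at each depth."""
    beam: List[Path] = [(root,)]
    for _ in range(max_depth):
        # Extend every surviving path by each proposed next step.
        candidates = [path + (step,) for path in beam for step in expand(path)]
        if not candidates:
            break
        # Rank all candidates and prune to the beam width.
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_size]
    return max(beam, key=score)


# Hypothetical toy problem: two options per step, and step "b" is always better.
def expand(path: Path) -> List[str]:
    return ["a", "b"]


def score(path: Path) -> float:
    return float(sum(1 for step in path if step == "b"))


best = beam_search("q", expand, score, beam_size=2, max_depth=3)
greedy = beam_search("q", expand, score, beam_size=1, max_depth=2)
```

With `beam_size=1` the loop keeps a single path per depth, which is the linear, Chain-of-Thought-like behavior described for DFS mode.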

### 2. Monte Carlo Tree Search (MCTS)
- Balances exploration and exploitation using the UCT (Upper Confidence bounds applied to Trees) formula
- Particularly effective for problems with delayed rewards
- Stochastic exploration helps avoid local optima
- Configurable number of simulations and exploration constant
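For reference, the textbook UCT score adds an exploration bonus to a node's average value. The sketch below uses that standard form; whether `ReasoningAgent` computes it in exactly this way is an assumption, and the function name, argument names, and child statistics are illustrative only.

```python
import math


def uct_score(
    total_value: float,
    visits: int,
    parent_visits: int,
    exploration_constant: float = 1.41,
) -> float:
    """Standard UCT: average value plus an exploration bonus."""
    if visits == 0:
        # Unvisited children are always tried first.
        return math.inf
    exploit = total_value / visits
    explore = exploration_constant * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore


# Hypothetical statistics: "A" has a good average, "B" has barely been explored.
children = {
    "A": {"value": 8.0, "visits": 10},
    "B": {"value": 1.0, "visits": 1},
}
parent_visits = sum(child["visits"] for child in children.values())

selected = max(
    children,
    key=lambda name: uct_score(
        children[name]["value"], children[name]["visits"], parent_visits
    ),
)
```

Here the rarely visited child `B` is selected because its exploration bonus outweighs the well-explored child `A`. A larger exploration constant strengthens that effect; setting it to 0 reduces the score to the plain average value.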

### 3. Language Agent Tree Search (LATS)
- Provides immediate reflection feedback before the next simulation
- Helps identify poor reasoning paths early for future improvement
- Especially useful for complex multi-step reasoning

## Core Components

1. **Thinker Agent**: Generates potential next steps in the reasoning process
2. **Grader Agent**: Evaluates the quality of each reasoning step
3. **Interim Execution**: Optionally executes the selected steps, enabling stepwise reasoning
4. **Code Execution**: A child user agent executes code automatically during reasoning
5. **Tree Structure**: Organizes thoughts hierarchically for systematic exploration
6. **Visualization Tools**: Built-in Graphviz support for analyzing reasoning paths
7. **Logging Features**: Log and save thinking trajectories to fine-tune the language model
8. **Configuration Options**: The agent is highly configurable through a single `reason_config` dictionary
9. **Customizability with scope**: Define task-specific context to guide the agent’s reasoning
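As a rough sketch of how the search strategy might be selected through `reason_config`: the keys `method`, `beam_size`, and `max_depth`, and the values `"dfs"` and `"beam_search"`, are taken from the example later in this notebook. The `"mcts"` and `"lats"` method names and the `nsim` and `exploration_constant` keys are assumptions here and should be checked against the API reference.

```python
# Keys and values confirmed by the example in this notebook.
dfs_config = {"method": "dfs", "max_depth": 3}
beam_config = {"method": "beam_search", "beam_size": 3, "max_depth": 3}

# ASSUMPTION: the method names "mcts" and "lats" and the keys "nsim"
# (number of simulations) and "exploration_constant" are not shown in this
# notebook; verify them against the ReasoningAgent API reference.
mcts_config = {
    "method": "mcts",
    "nsim": 5,
    "exploration_constant": 1.41,
    "max_depth": 3,
}
lats_config = {"method": "lats", "nsim": 5, "max_depth": 3}
```

Any one of these dictionaries would be passed as the `reason_config` argument when constructing the agent.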


In [None]:
import random

from autogen.agents.experimental import ReasoningAgent, ThinkNode
from autogen import UserProxyAgent, LLMConfig

# Put your key in the OPENAI_API_KEY environment variable
llm_config = LLMConfig(api_type="openai", model="gpt-4o-mini")

verbose = True

question = "What is the expected maximum dice value if you can roll a 6-sided dice three times?"
random.seed(1)  # set the seed for reproducibility


def last_meaningful_msg(sender, recipient, summary_args):
    import warnings

    if sender == recipient:
        return "TERMINATE"

    summary = ""
    chat_messages = recipient.chat_messages[sender]

    for msg in reversed(chat_messages):
        try:
            content = msg["content"]
            if isinstance(content, str):
                summary = content.replace("TERMINATE", "")
            elif isinstance(content, list):
                # Remove the `TERMINATE` word in the content list.
                summary = "\n".join(
                    x["text"].replace("TERMINATE", "")
                    for x in content
                    if isinstance(x, dict) and "text" in x
                )
            if summary.strip():
                return summary
        except (KeyError, IndexError, AttributeError) as e:
            warnings.warn(
                f"Cannot extract summary using last_msg: {e}. Using an empty str as summary.",
                UserWarning,
            )
    return summary


user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    is_termination_msg=lambda x: True,  # terminate when reasoning agent responds
)

In [None]:
with llm_config:
    reason_agent = ReasoningAgent(
        name="reason_agent",
        system_message="answer math questions",
        reason_config={"method": "dfs", "max_depth": 3},  # Using DFS
        silent=False,
        # NOTE: DFS is equivalent to beam search with a beam size of 1 (O1-style reasoning):
        # reason_config={"method": "beam_search", "beam_size": 1, "max_depth": 3},
    )

ans = user_proxy.initiate_chat(
    reason_agent, message=question, summary_method=last_meaningful_msg
)

print(ans.summary)
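
As a sanity check on the agent's answer, the exact value can be computed by brute force. This sketch assumes the question means the maximum of three independent rolls of a fair die; it enumerates all 216 outcomes and does not depend on the agent.

```python
from fractions import Fraction
from itertools import product

# Enumerate every outcome of three rolls of a fair 6-sided die.
rolls = list(product(range(1, 7), repeat=3))

# Average the maximum over all equally likely outcomes, kept as an exact fraction.
expected_max = Fraction(sum(max(roll) for roll in rolls), len(rolls))

print(expected_max, float(expected_max))
```

The result is 119/24, roughly 4.96, which is the value the agent's summary should match under this reading of the question.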

## Save data for future training
In this section, we save the reasoning agent's decision-making data for use in future training.

By capturing the structure and content of the reasoning tree, we obtain a dataset that records which reasoning steps the agent explored. This data lets us analyze the agent's reasoning patterns and serves as training material for improving the quality of its responses.

The saved data can be utilized for various training methodologies, including supervised fine-tuning and reinforcement learning, ultimately contributing to the development of a more robust and effective reasoning agent.

In [None]:
import json

data = reason_agent._root.to_dict()
with open("reasoning_tree.json", "w") as f:
    json.dump(data, f)

# Recover the tree from the saved file
with open("reasoning_tree.json") as f:
    new_node = ThinkNode.from_dict(json.load(f))

sft_data = reason_agent.extract_sft_dataset()
rlhf_data = reason_agent.extract_rlhf_preference_dataset()

print(rlhf_data)
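
To reuse the extracted datasets in a training pipeline, one option is to write them to JSON Lines files. The helpers below are a minimal sketch that assumes each dataset is a list of JSON-serializable dictionaries. `save_jsonl`, `load_jsonl`, and the sample records are hypothetical and not part of the library; the field names in the sample are placeholders rather than the agent's actual output schema.

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterable, List


def save_jsonl(records: Iterable[Dict[str, Any]], path: str) -> int:
    """Write one JSON object per line and return the number of records written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def load_jsonl(path: str) -> List[Dict[str, Any]]:
    """Read a JSON Lines file back into a list of dictionaries."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical records standing in for the extracted dataset.
sample_records = [
    {"instruction": "What is 2 + 2?", "response": "4"},
    {"instruction": "What is 3 * 3?", "response": "9"},
]

# Round-trip through a temporary file to show the format is lossless.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "sft_data.jsonl")
    n_written = save_jsonl(sample_records, out_path)
    restored = load_jsonl(out_path)
```

In the notebook, a call such as `save_jsonl(sft_data, "sft_data.jsonl")` would play the same role, provided the extracted data matches the assumption above.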