
Critic Agents and Tree Search


Philosophical Bits

Prerequisite: We assume that you have read the intro to role-playing section.

How do agents accomplish hard tasks? While reasoning can naturally emerge from next-token-prediction pretraining, it is still difficult for agents to solve complex tasks that require many intermediate steps. Tree search is a simple and effective framework for tackling this issue.

A typical tree search consists of node expansion and node selection. In the March 2023 paper, CAMEL introduces a heuristic tree-search approach with a critic in the loop: the role-playing agents expand the tree by generating several candidate proposals, and a critic selects the most promising one.

To put it simply, a critic agent is a helper agent in the role-playing session, capable of selecting among proposals and providing informative verbal feedback to the role-playing agents.
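
The sketch below illustrates this loop. It is a minimal, hypothetical stand-in (agent.propose and critic.select are made-up interfaces, not the actual CAMEL API) meant only to convey the idea:

def tree_search_round(agent, critic, state, num_proposals: int = 3):
    # Expansion: the acting agent samples several candidate next steps
    proposals = [agent.propose(state) for _ in range(num_proposals)]

    # Selection: the critic picks one proposal and explains its choice;
    # the verbal feedback becomes extra context for the next round
    choice, feedback = critic.select(state, proposals)
    return choice, feedback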

Quick Start

🕹 Step 0: Preparations

from camel.agents import CriticAgent
from camel.generators import SystemMessageGenerator as sys_msg_gen
from camel.messages import BaseMessage as bm
from camel.types import RoleType

🕹 Step 1: Configure the Specifications for Critic Agents

# Set the role name
critic_role = 'a picky critic'

# Create the meta_dict holding the critic's criteria
meta_dict = dict(critic_role=critic_role,
                 criteria='Help better accomplish the task.')

# Create the role tuple
role_tuple = (critic_role, RoleType.CRITIC)

# Generate the system message
sys_msg = sys_msg_gen().from_dict(meta_dict=meta_dict,
                                  role_tuple=role_tuple)

🕹 Step 2: Get the Critic Agents

With the above arguments, we have:

critic_agent = CriticAgent(system_message=sys_msg,
                           verbose=True)
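
You can also exercise the critic on its own before wiring it into a society. The snippet below is a sketch that assumes the reduce_step() method and the BaseMessage.make_assistant_message() helper available in recent CAMEL versions; check your installed version if the names differ:

# Two toy proposals for the critic to choose between
option1 = bm.make_assistant_message(
    role_name='experimental physicist',
    content='Option 1: Build a wormhole generator.')
option2 = bm.make_assistant_message(
    role_name='experimental physicist',
    content='Option 2: Study closed timelike curves first.')

# reduce_step() (assumed API) asks the critic to pick one option;
# the returned response's .msg holds the selected proposal
critic_response = critic_agent.reduce_step([option1, option2])
print(critic_response.msg.content)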

Let's take a look at the default system message:

print(critic_agent.system_message.content)

>>> You are a a picky critic who teams up with a {user_role} and a {assistant_role} to solve a task: {task}.
    Your job is to select an option from their proposals and provides your explanations.
    Your selection criteria are Help better accomplish the task..
    You always have to choose an option from the proposals.

You may overwrite the system message and configure the critic differently based on your own needs.
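
For example, here is a minimal sketch that builds a custom system message directly with the BaseMessage alias bm imported earlier (the prompt wording is just an illustration):

custom_sys_msg = bm(role_name=critic_role,
                    role_type=RoleType.CRITIC,
                    meta_dict=meta_dict,
                    content='You are a picky critic. Always pick the option that '
                            'best advances the task and justify your choice briefly.')

critic_agent = CriticAgent(system_message=custom_sys_msg, verbose=True)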

🕹 Step 3: Using Critic Agents for Task Solving

Our RolePlaying() class provides a simple way to add the critic into the loop. Below we provide a basic pipeline.

# Import necessary classes
from camel.societies import RolePlaying
from camel.configs import ChatGPTConfig
from camel.types import TaskType, ModelType
from colorama import Fore
from camel.utils import print_text_animated

# Set the LLM model type and model config
model_type = ModelType.GPT_3_5_TURBO
model_config = ChatGPTConfig(
    temperature=0.8,  # the sampling temperature; higher values are more random
    n=3,              # the number of completion choices to generate for each input
    )

We then need to set the kwargs for the task and each agent. Note that n > 1 in the model config is what gives the critic multiple proposals to choose from in each round:

task_kwargs = {
    'task_prompt': 'Develop a plan to TRAVEL TO THE PAST and make changes.',
    'with_task_specify': True,
    'task_specify_agent_kwargs': {'model_type': model_type,
                                  'model_config': model_config}
}

user_role_kwargs = {
    'user_role_name': 'an ambitious aspiring TIME TRAVELER',
    'user_agent_kwargs': {'model_type': model_type,
                          'model_config': model_config}
}

assistant_role_kwargs = {
    'assistant_role_name': 'the best-ever experimental physicist',
    'assistant_agent_kwargs': {'model_type': model_type,
                               'model_config': model_config}
}

critic_role_kwargs = {
    'with_critic_in_the_loop': True,
    'critic_criteria': 'improve the task performance',
    'critic_kwargs': dict(verbose=True)
}

Putting them together:

society = RolePlaying(
    **task_kwargs,             # The task arguments
    **user_role_kwargs,        # The instruction sender's arguments
    **assistant_role_kwargs,   # The instruction receiver's arguments
    **critic_role_kwargs,      # The critic's arguments       
)

And the helper functions to run our society:

def is_terminated(response):
    """
    Print an alert when the session should be terminated.
    """
    if response.terminated:
        role = response.msg.role_type.name
        reason = response.info['termination_reasons']
        print(f'AI {role} terminated due to {reason}')

    return response.terminated


def run(society, round_limit: int = 10):

    # Get the initial message from the ai assistant to the ai user
    input_msg = society.init_chat()

    # Starting the interactive session
    for _ in range(round_limit):

        # Get both responses for this round
        assistant_response, user_response = society.step(input_msg)

        # Check the termination condition
        if is_terminated(assistant_response) or is_terminated(user_response):
            break

        # Get the results
        print(f'[AI User] {user_response.msg.content}.\n')
        print(f'[AI Assistant] {assistant_response.msg.content}.\n')

        # Check if the task has ended
        if 'CAMEL_TASK_DONE' in user_response.msg.content:
            break

        # Get the input message for the next round
        input_msg = assistant_response.msg

    return None

Now let's set our code in motion:

run(society)

# We omit the real content here – try it on your own!
>>> [AI User] Option 1: ...; Option 2: ...; Option 3: ...
>>> [Critic] I would recommend option 2. This is because ...
>>> [AI Assistant] Option 1: ...; Option 2: ...; Option 3: ...
>>> [Critic] I would recommend option 1. This is because ...
>>> ...

In this setting, the AI User and AI Assistant generate several options in each response (you can tune the temperature in model_config to control their diversity). The AI Critic replies with its selected option and the reasoning behind it; this additional context is fed to the other two agents and helps them form better subsequent responses.
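
For instance, to make the proposals more diverse (or more numerous), you only need to tweak the sampling parameters and rebuild the society with the new config:

# Hotter sampling and more completion choices per round
model_config = ChatGPTConfig(temperature=1.2, n=5)
# Pass this config through the *_agent_kwargs above and re-create the society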

Remarks

While we see some performance gains from critic-in-the-loop, it may not solve the fundamental extrapolation problem (and self-consistency remains a strong baseline for many tasks). It is debatable whether such agents can extrapolate through self-play at their current scale. A more practical question is how we may efficiently introduce informative feedback/rewards when agents are connected to external environments and endowed with tools and memories. They are expected to have a good world model and to know how to make abstractions and analogies when necessary. Stay tuned for our next update 🦖🐆🐘🦒🦘🦕.