# Game of 24: ToT + DFS

## Utils

### Imports

In [1]:
import os
from textwrap import dedent

from langchain_openai import ChatOpenAI

from environments.game_of_24.common.environment import GameOf24Env
from planning_library.strategies import TreeOfThoughtsDFSStrategy
from planning_library.strategies.tot_dfs.components import ThoughtEvaluator, ThoughtGenerator

%load_ext autoreload
%autoreload 2

### Enabling logging

In [2]:
# Send traces of every run to LangSmith under a dedicated project
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "ToT (new) + Game of 24 test"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"

## Environment

In [3]:
env = GameOf24Env()

## Defining components

### Hyperparameters

In [4]:
# ToT hyperparameters
max_num_thoughts = 3  # number of thoughts to generate at each iteration
max_num_steps = 20  # total maximum number of iterations
value_threshold = 0.49  # threshold for evaluation; only thoughts with value > value_threshold will be explored

# other hyperparameters
model_name = "gpt-3.5-turbo"
temperature = 0.8
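
To make the roles of these hyperparameters concrete, here is a minimal, library-independent sketch of a depth-first Tree of Thoughts loop. It is an illustration under stated assumptions, not the implementation in `planning_library`: the function `tot_dfs` and the callables `generate`, `evaluate`, and `is_solution` are hypothetical names, and treating `max_num_steps` as a budget shared across the whole search is one plausible reading of "total maximum number of iterations".

```python
from typing import Callable, List, Optional


def tot_dfs(
    state: str,
    generate: Callable[[str], List[str]],  # hypothetical: proposes next states
    evaluate: Callable[[str], float],      # hypothetical: scores a state in [0, 1]
    is_solution: Callable[[str], bool],    # hypothetical: goal test
    value_threshold: float,
    max_num_steps: int,
) -> Optional[str]:
    """Depth-first search that only expands thoughts scoring above the threshold."""
    steps_left = [max_num_steps]  # assumption: one budget shared by the whole search

    def visit(current: str) -> Optional[str]:
        if is_solution(current):
            return current
        if steps_left[0] <= 0:
            return None  # budget exhausted
        steps_left[0] -= 1
        for candidate in generate(current):
            # Prune thoughts whose value does not exceed the threshold.
            if evaluate(candidate) > value_threshold:
                found = visit(candidate)
                if found is not None:
                    return found
        return None  # every branch failed: backtrack

    return visit(state)


# Toy problem: build the string "abb" one character at a time.
TARGET = "abb"


def toy_generate(s: str) -> List[str]:
    return [s + "a", s + "b"]


def toy_evaluate(s: str) -> float:
    return 1.0 if TARGET.startswith(s) else 0.0


def toy_is_solution(s: str) -> bool:
    return s == TARGET


result = tot_dfs("", toy_generate, toy_evaluate, toy_is_solution,
                 value_threshold=0.49, max_num_steps=20)
print(result)
```

In this toy run the search prunes `"aa"` and `"aba"` because they score 0.0, which is not above the 0.49 threshold, and descends only along prefixes of the target.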

### Thought Evaluator

The **Thought Evaluator** is responsible for evaluating the plausibility of individual thoughts.

The default **Thought Evaluator** is powered by a [`Runnable`](https://python.langchain.com/docs/expression_language/interface) that takes:

* `inputs`: the original inputs
* `intermediate_steps`: the sequence of agent actions taken up to the current node
* `next_thought`: the candidate suggestion for the next step (either finish or call one or more tools)

The runnable may return a value of any type. In this case, it returns a float between 0 and 1, which is compared against `value_threshold`.
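
For intuition about what a score between 0 and 1 could encode, the sketch below checks a final answer deterministically against the three criteria used in the evaluator prompt in this notebook (each given number used exactly once, no other numbers, expression equals 24). It is a hypothetical stand-in written for illustration, not the LLM-backed evaluator from `planning_library`; the names `score_answer` and `_eval` are assumptions.

```python
import ast
import operator
from collections import Counter
from fractions import Fraction

# Only the four basic arithmetic operations are allowed.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node, used):
    """Evaluate an expression tree exactly, recording every integer literal."""
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, int)
        and not isinstance(node.value, bool)
    ):
        used.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _eval(node.left, used)
        right = _eval(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def score_answer(numbers, expression, target=24):
    """Return 1.0 if the expression is a valid solution, else 0.0."""
    used = []
    try:
        tree = ast.parse(expression, mode="eval")
        value = _eval(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    # Criteria 1 and 2: exactly the given numbers, each used once.
    if Counter(used) != Counter(numbers):
        return 0.0
    # Criterion 3: the expression reaches the target.
    return 1.0 if value == target else 0.0


print(score_answer([1, 1, 4, 6], "4 * 6 * 1 * 1"))
print(score_answer([1, 1, 4, 6], "4 * 6"))
```

Using `Fraction` keeps division exact, so an expression such as `8 / 3 * 9` is compared with 24 without floating-point error. An LLM-based evaluator can additionally score partial states, which this all-or-nothing check cannot.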



In [5]:
thought_evaluator = ThoughtEvaluator.create(
    llm=ChatOpenAI(model=model_name, temperature=temperature),
    # Prompt asking the model to judge whether a proposed answer is valid
    user_message=dedent("""
        Given an input and an answer, give a judgement if the answer is correct, i.e.
        1) it uses each given number exactly once;
        2) it doesn't use any other number;
        3) the given mathematical expression correctly reaches 24.

        Inputs:
        {numbers}
        """),
    threshold=value_threshold,  # only thoughts with value > threshold are explored
    parser_name="openai-tools",
)

### Thought Generator

The **Thought Generator** is responsible for suggesting possible actions at each new step.

The default **Thought Generator** is powered by [any agent available in LangChain](https://python.langchain.com/docs/modules/agents/agent_types/). Specifically, it expects either a [`BaseSingleActionAgent`](https://api.python.langchain.com/en/latest/agents/langchain.agents.agent.BaseSingleActionAgent.html#langchain.agents.agent.BaseSingleActionAgent) or a [`BaseMultiActionAgent`](https://api.python.langchain.com/en/latest/agents/langchain.agents.agent.BaseMultiActionAgent.html#langchain.agents.agent.BaseMultiActionAgent).

The [Tree of Thoughts paper](https://arxiv.org/abs/2305.10601) describes two possible workflows for generating thoughts:

* *Sample*: $k$ i.i.d. calls to a Thought Generator to get $k$ suggestions
* *Propose*: a single call to a Thought Generator to get $k$ suggestions

Currently, only *Sample* is supported.
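
The difference between the two workflows can be sketched with stub generators in place of an LLM. Everything here is hypothetical and for illustration only: `sample_thoughts`, `propose_thoughts`, the stub functions, and the list of moves are assumptions, not part of `planning_library`.

```python
import random
from typing import Callable, List

MOVES = ["1 + 1 = 2", "4 * 6 = 24", "6 - 4 = 2", "6 / 1 = 6"]
calls = {"one": 0, "many": 0}  # counts how often each stub is invoked
rng = random.Random(0)


def stub_generate_one(state: str) -> str:
    """Stand-in for one LLM call that returns a single suggestion."""
    calls["one"] += 1
    return rng.choice(MOVES)


def stub_generate_many(state: str, k: int) -> List[str]:
    """Stand-in for one LLM call that returns up to k suggestions."""
    calls["many"] += 1
    return MOVES[:k]


def sample_thoughts(generate_one: Callable[[str], str], state: str, k: int) -> List[str]:
    """Sample: k independent calls, one suggestion each (duplicates are possible)."""
    return [generate_one(state) for _ in range(k)]


def propose_thoughts(
    generate_many: Callable[[str, int], List[str]], state: str, k: int
) -> List[str]:
    """Propose: a single call that returns up to k suggestions at once."""
    return generate_many(state, k)


sampled = sample_thoughts(stub_generate_one, "1 1 4 6", k=3)
proposed = propose_thoughts(stub_generate_many, "1 1 4 6", k=3)
print(calls)
```

With *Sample*, the number of generator calls grows with $k$ and the suggestions may repeat, whereas *Propose* issues one call and relies on the generator to return distinct suggestions.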

In [10]:
thought_generator = ThoughtGenerator.create(
    llm=ChatOpenAI(model=model_name, temperature=temperature),
    tools=env.tools,
    parser_name="openai-tools",
    user_message=dedent("""
        You are given four numbers, and your end goal is to obtain 24 from the given numbers via basic arithmetic operations.

        Let's play Game of 24 in a step-by-step fashion: use only one of the available tools to suggest a possible next step from the current state. Please make sure to suggest exactly ONE (1) tool call, no more and no less.

        Refrain from calling tools only when you're ready to give a final answer. In this case, make sure to include a mathematical expression showing how to obtain 24 from the given numbers, for instance: '(2 + 2) * (12 / 2) = 24'

        Inputs:
        {numbers}"""),
    max_num_thoughts=max_num_thoughts,  # number of suggestions sampled per step
)

## Defining strategy

In [11]:
thought_generator

<planning_library.strategies.tot_dfs.components.thought_generator.ThoughtGenerator at 0x176da1cd0>

In [12]:
from planning_library.action_executors import GymnasiumActionExecutor


action_executor = GymnasiumActionExecutor(env)


strategy_executor = TreeOfThoughtsDFSStrategy(
    tools=env.tools,
    action_executor=action_executor,
    max_iterations=max_num_steps,
    return_intermediate_steps=True,
    thought_generator=thought_generator,
    thought_evaluator=thought_evaluator,
    do_sorting=False,
)
env.reset(options={"numbers": [1, 1, 4, 6]})
strategy_executor.invoke({"numbers": "1 1 4 6"})



> Entering new TreeOfThoughtsDFSStrategy chain...


NotImplementedError: Unsupported message type: <class 'list'>