# Environment Setup Guide
- Install WSL Ubuntu
- Connect via VS Code
    - Install extensions for VS Code:
        - WSL
        - Jupyter notebooks
        - Python
        - Prettify JSON (optional)
        - Rainbow CSV (optional)
- Install Miniconda:
    - `mkdir -p ~/miniconda3`
    - `wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh`
    - `bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3`
    - `rm ~/miniconda3/miniconda.sh`
- Set up the Conda environment:
    - `conda create -n tot python=3.10 python-dotenv ipykernel`
    - `conda activate tot`
    - Clone the Git repo and install the Tree of Thoughts library:
        - `git clone https://github.com/princeton-nlp/tree-of-thought-llm`
        - `cd tree-of-thought-llm`
        - `pip install -r requirements.txt`
        - `pip install -e .`
- Add a `.env` file that defines `OPENAI_API_KEY`

In [1]:
from dotenv import load_dotenv
import os
load_dotenv()

True
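
The `True` above only indicates that `load_dotenv()` found and loaded something; it does not confirm that `OPENAI_API_KEY` itself is set. A minimal sketch of an explicit check follows. The helper name `missing_env_vars` is hypothetical and not part of `python-dotenv` or the Tree of Thoughts library.

```python
import os
from typing import List, Mapping, Optional, Sequence


def missing_env_vars(
    required: Sequence[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    # Default to the process environment, which load_dotenv() populates.
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("OPENAI_API_KEY is set.")
```

Running this in a cell right after `load_dotenv()` makes a missing or empty key visible before any API call is attempted.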

# Quick Start
The following minimal script attempts to solve the Game of 24 puzzle `4 5 6 10` (it may be slow because it calls GPT-4):

In [2]:
import argparse
from tot.methods.bfs import solve
from tot.tasks.game24 import Game24Task

args = argparse.Namespace(backend='gpt-4', temperature=0.7, task='game24', naive_run=False, prompt_sample=None, method_generate='propose', method_evaluate='value', method_select='greedy', n_generate_sample=1, n_evaluate_sample=3, n_select_sample=5)

task = Game24Task()
ys, infos = solve(args, task, 900)
print(ys[0])

functools.partial(<function gpt at 0x7ff6a0d99bd0>, model='gpt-4', temperature=0.7)
-- new_ys --: ('10 - 4 = 6 (left: 5 6 6)\n', '6 - 4 = 2 (left: 2 5 10)\n', '10 - 5 = 5 (left: 4 5 6)\n', '6 / 4 = 1.5 (left: 1.5 5 10)\n', '6 + 4 = 10 (left: 5 10 10)\n', '4 + 5 = 9 (left: 6 9 10)\n', '10 / 5 = 2 (left: 2 4 6)\n', '5 * 4 = 20 (left: 6 10 20)\n')
-- sol values --: (3.0, 3.0, 3.0, 3.0, 3.0, 2.001, 2.001, 0.003)
-- choices --: ['10 - 4 = 6 (left: 5 6 6)\n', '6 - 4 = 2 (left: 2 5 10)\n', '10 - 5 = 5 (left: 4 5 6)\n', '6 / 4 = 1.5 (left: 1.5 5 10)\n', '6 + 4 = 10 (left: 5 10 10)\n']

-- new_ys --: ('10 - 4 = 6 (left: 5 6 6)\n5 * 6 = 30 (left: 6 30)\n', '6 / 4 = 1.5 (left: 1.5 5 10)\n5 - 1.5 = 3.5 (left: 3.5 5 10)\n', '6 / 4 = 1.5 (left: 1.5 5 10)\n10 / 1.5 = 6.67 (approx.) (left: 5 6.67 10)\n', '6 / 4 = 1.5 (left: 1.5 5 10)\n5 / 1.5 = 3.33 (approx.) (left: 3.33 5 10)\n', '6 / 4 = 1.5 (left: 1.5 5 10)\n10 - 5 = 5 (left: 1.5 5 5)\n', '6 / 4 = 1.5 (left: 1.5 5 10)\n10 / 5 = 2 (left: 1.5 2 10)\n

## Test Case
- Uses a duplicate data file, `test24.csv`, containing 12 handmade test cases
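
If the handmade cases are meant to be solvable, it can help to verify each puzzle before adding it to the test file. The brute-force check below is a sketch written for this guide; `can_make_24` is a hypothetical helper, not a function from the Tree of Thoughts library. It uses exact fractions so that intermediate divisions such as `6 / 4` do not introduce rounding errors.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Sequence


def can_make_24(numbers: Sequence[int], target: int = 24) -> bool:
    """Return True if +, -, *, / can combine all numbers into `target`."""

    def search(values: List[Fraction]) -> bool:
        if len(values) == 1:
            return values[0] == target
        # Pick any two numbers, combine them, and recurse on the smaller list.
        for i, j in combinations(range(len(values)), 2):
            a, b = values[i], values[j]
            rest = [v for k, v in enumerate(values) if k not in (i, j)]
            results = {a + b, a - b, b - a, a * b}
            if b != 0:
                results.add(a / b)
            if a != 0:
                results.add(b / a)
            if any(search(rest + [r]) for r in results):
                return True
        return False

    return search([Fraction(n) for n in numbers])


for puzzle in ([4, 5, 6, 10], [1, 1, 10, 12], [1, 1, 1, 1]):
    print(puzzle, can_make_24(puzzle))
```

The first two puzzles are the ones that appear in the outputs of this guide; the third is an example of an unsolvable input.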

In [12]:
import argparse
from tot.methods.bfs import solve
from tot.tasks.game24 import Game24Task

args = argparse.Namespace(backend='gpt-4', temperature=0.7, task='game24', naive_run=False, prompt_sample=None, method_generate='propose', method_evaluate='value', method_select='greedy', n_generate_sample=1, n_evaluate_sample=3, n_select_sample=5)


# Point the task at the handmade test file instead of the default dataset
task = Game24Task(file='test24.csv')
ys, infos = solve(args, task, 7)
print(ys[0])

functools.partial(<function gpt at 0x7ff61c0f1bd0>, model='gpt-4', temperature=0.7)
-- new_ys --: ('12 + 1 = 13 (left: 1 10 13)\n', '10 + 1 = 11 (left: 1 11 12)\n', '1 + 1 = 2 (left: 2 10 12)\n', '1 * 1 = 1 (left: 1 10 12)\n', '10 - 1 = 9 (left: 1 9 12)\n', '12 / 1 = 12 (left: 1 10 12)\n', '10 / 1 = 10 (left: 1 10 12)\n', '12 - 1 = 11 (left: 1 10 11)\n', '12 - 10 = 2 (left: 1 2 1)\n', '10 - 1 = 9 (left: 1 9 12)\n')
-- sol values --: (60.0, 40.001, 21.001, 1.002, 1.002, 1.002, 1.002, 0.003, 0.003, 0)
-- choices --: ['12 + 1 = 13 (left: 1 10 13)\n', '10 + 1 = 11 (left: 1 11 12)\n', '1 + 1 = 2 (left: 2 10 12)\n', '1 * 1 = 1 (left: 1 10 12)\n', '10 - 1 = 9 (left: 1 9 12)\n']

-- new_ys --: ('12 + 1 = 13 (left: 1 10 13)\n1 + 10 = 11 (left: 11 13)\n', '10 + 1 = 11 (left: 1 11 12)\n1 + 11 = 12 (left: 12 12)\n', '1 + 1 = 2 (left: 2 10 12)\n2 + 10 = 12 (left: 12 12)\n', '12 + 1 = 13 (left: 1 10 13)\n10 - 1 = 9 (left: 9 13)\n', '12 + 1 = 13 (left: 1 10 13)\n13 - 1 = 12 (left: 10 12)\n', '12 + 1 

## Paper Experiments

Run experiments via ``sh scripts/{game24, text, crosswords}/{standard_sampling, cot_sampling, bfs}.sh``. The exception is crosswords, where ToT uses a DFS algorithm that can be run via ``scripts/crosswords/search_crosswords-dfs.ipynb``.

The very simple ``run.py`` implements the ToT + BFS algorithm, as well as the naive IO/CoT sampling. Some key arguments:

- ``--naive_run``: if True, run naive IO/CoT sampling instead of ToT + BFS.
- ``--prompt_sample`` (choices=[``standard``, ``cot``]): sampling prompt
- ``--method_generate`` (choices=[``sample``, ``propose``]): thought generator, whether to sample independent thoughts (used in Creative Writing) or propose sequential thoughts (used in Game of 24)
- ``--method_evaluate`` (choices=[``value``, ``vote``]): state evaluator, whether to value states independently (used in Game of 24) or vote on states together (used in Creative Writing)
- ``--n_generate_sample``: number of times to prompt for thought generation
- ``--n_evaluate_sample``: number of times to prompt for state evaluation
- ``--n_select_sample``: number of states to keep from each step (i.e. ``b`` in the paper's ToT + BFS algorithm)
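
The effect of ``--n_evaluate_sample`` and ``--n_select_sample`` can be illustrated with a toy version of one BFS step. This is a sketch, not the code in ``run.py``: `propose` and `score` are hypothetical stand-ins for the LLM calls. It assumes that each candidate's value is the sum of its per-sample scores and that greedy selection keeps the highest-valued candidates. The per-sample scores of 20, 1, and 0.001 are also an assumption, chosen because they are consistent with logged values such as `60.0`, `40.001`, and `21.001` for three evaluation samples.

```python
from typing import Callable, List, Tuple


def bfs_step(
    states: List[str],
    propose: Callable[[str], List[str]],
    score: Callable[[str], float],
    n_evaluate_sample: int,
    n_select_sample: int,
) -> Tuple[List[str], List[float]]:
    """Expand every state, value each candidate, and keep the best ones."""
    # Generation: every kept state proposes possible next thoughts.
    candidates = [state + step for state in states for step in propose(state)]
    # Evaluation: sum n_evaluate_sample scores per candidate (assumed scheme).
    values = [
        sum(score(c) for _ in range(n_evaluate_sample)) for c in candidates
    ]
    # Greedy selection: keep the n_select_sample highest-valued candidates.
    order = sorted(range(len(candidates)), key=lambda i: values[i], reverse=True)
    keep = order[:n_select_sample]
    return [candidates[i] for i in keep], [values[i] for i in keep]


# Toy stand-ins for the model: three fixed proposals with fixed scores.
TOY_SCORES = {"likely\n": 1.0, "sure\n": 20.0, "impossible\n": 0.001}


def toy_propose(state: str) -> List[str]:
    return list(TOY_SCORES)


def toy_score(candidate: str) -> float:
    return TOY_SCORES[candidate]


kept, kept_values = bfs_step(
    [""], toy_propose, toy_score, n_evaluate_sample=3, n_select_sample=2
)
print(kept, kept_values)
```

With `n_select_sample=2`, only the two best candidates survive to the next step, which is the role `b` plays in the paper's ToT + BFS algorithm.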

In [1]:
!sh scripts/crosswords/standard_sampling.sh

Namespace(backend='gpt-4', temperature=0.7, task='crosswords', task_start_index=0, task_end_index=20, naive_run=True, prompt_sample='standard', method_generate=None, method_evaluate=None, method_select='greedy', n_generate_sample=10, n_evaluate_sample=1, n_select_sample=1)
functools.partial(<function gpt at 0x7f7746b272e0>, model='gpt-4', temperature=0.7)
^C
Traceback (most recent call last):
  File "/home/awkwabear/miniconda3/envs/tot/lib/python3.10/site-packages/backoff/_sync.py", line 105, in retry
    ret = target(*args, **kwargs)
  File "/home/awkwabear/tree-of-thought-llm/src/tot/models.py", line 20, in completions_with_backoff
    return openai.ChatCompletion.create(**kwargs)
  File "/home/awkwabear/miniconda3/envs/tot/lib/python3.10/site-packages/openai/api_resources/chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
  File "/home/awkwabear/miniconda3/envs/tot/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", lin

## Paper Trajectories
``logs/`` contains all the trajectories from the paper's experiments, except for ``logs/game24/gpt-4_0.7_propose1_value3_greedy5_start900_end1000.json``, which was reproduced after the paper because the original experiment was run in a notebook. The reproduced run scored 69% instead of the original 74%, a difference attributed to randomness in GPT decoding. We hope to aggregate multiple runs in the future to account for sampling randomness and update the paper, but this should not affect the paper's main conclusions.
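
A rough calculation shows how large a gap sampling randomness alone could plausibly produce. The sketch below assumes the run covers 100 puzzles (inferred from `start900_end1000` in the file name) and treats each puzzle as an independent success/failure trial. Both are simplifying assumptions made here, not claims from the paper.

```python
import math


def binomial_standard_error(success_rate: float, n_trials: int) -> float:
    """Standard error of a success rate estimated from n independent trials."""
    return math.sqrt(success_rate * (1.0 - success_rate) / n_trials)


n_puzzles = 100  # assumption: indices 900 to 999, inferred from the file name
original, reproduced = 0.74, 0.69

se = binomial_standard_error(original, n_puzzles)
gap_in_se = (original - reproduced) / se
print(f"standard error: {se:.3f}")
print(f"gap in standard errors: {gap_in_se:.2f}")
```

Under these assumptions, the five-point gap is a little over one standard error, which fits the explanation that the two runs differ mainly by sampling noise.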

## How to Add a New Task
Setting up a new task is easy, and mainly involves two steps.
* Set up a new task class in ``tot/tasks/`` and task files in ``tot/data/``. See ``tot/tasks/game24.py`` for an example. Add the task to ``tot/tasks/__init__.py``.
* Set up task-specific prompts in ``tot/prompts/``. See ``tot/prompts/game24.py`` for an example. Depending on the nature of the task, choose ``--method_generate`` (choices=[``sample``, ``propose``]) and ``--method_evaluate`` (choices=[``value``, ``vote``]) and their corresponding prompts. 
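
The first step can be sketched with a toy task class. Everything below is an assumption for illustration: the method names (`get_input`, `test_output`, `standard_prompt_wrap`) and the `{'r': ...}` result shape are modeled loosely on what a task such as ``tot/tasks/game24.py`` is expected to provide, and should be checked against that file before use. The class deliberately does not import from `tot`, so it runs on its own.

```python
from typing import Dict, List


class UppercaseTask:
    """Hypothetical toy task: the correct output is the input in upper case."""

    def __init__(self, inputs: List[str]):
        # A real task would likely load its inputs from a file in tot/data/.
        self.data = inputs

    def __len__(self) -> int:
        return len(self.data)

    def get_input(self, idx: int) -> str:
        """Return the raw input for example `idx`."""
        return self.data[idx]

    def test_output(self, idx: int, output: str) -> Dict[str, int]:
        """Score a final output: reward 1 if correct, otherwise 0."""
        expected = self.data[idx].upper()
        return {"r": int(output.strip() == expected)}

    @staticmethod
    def standard_prompt_wrap(x: str, y: str = "") -> str:
        """Build the prompt for input `x` and partial output `y` (assumed)."""
        return f"Rewrite the input in upper case.\nInput: {x}\nOutput: {y}"


task = UppercaseTask(["tree of thoughts"])
print(task.get_input(0))
print(task.test_output(0, "TREE OF THOUGHTS"))
```

In the real library, the prompt text would live in ``tot/prompts/`` rather than inside the class.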
