# Prompt Engineering for Science Birds Level Generation and Beyond Tutorial
At [IEEE CoG 2024 Tutorial](https://2024.ieee-cog.org/cog-2024-tutorials/)

In this tutorial, our main goal is to implement tree-of-thought prompting for generating levels in the Science Birds game. We will use [the LLM4PCG Python package](https://github.com/Pittawat2542/llm4pcg-python) to interact with the LLM. This package is a modified version of the original [ChatGPT4PCG](https://github.com/chatgpt4pcg/chatgpt4pcg-python) package, which was designed for the [ChatGPT4PCG](https://chatgpt4pcg.github.io) competition.
 
Before turning to tree-of-thought prompting, we will walk through several simpler prompt engineering techniques that serve as its foundation.

The techniques covered are:
1. Zero-shot prompting
2. Zero-shot chain-of-thought prompting
3. Few-shot prompting
4. Tree-of-thought prompting

_**NOTE:** Please ensure that you follow the instructions in **`README.md`** to set up the environment before running the code._

# Mini-LLM4PCG Competition

![LLM4PCG logo](images/logo.jpg)

The mini LLM4PCG competition is a simplified version of the ChatGPT4PCG competition. The goal is to generate Science Birds levels with an LLM. The competition covers three target characters: `I`, `L`, and `U`. Participants must generate `10` levels for each character. Levels are evaluated on stability and on similarity to the target character. To enter, submit a zipped folder containing all the generated responses (see the image below).

![Folder Structure](images/folder.png)

**If you are interested in participating in the mini LLM4PCG competition, please submit your results via [this form](https://forms.gle/3WDW4acnEMidcqbu6) before Tue Aug 6, 2024 11:45AM. The final ranking will be announced on the [website](https://chatgpt4pcg.github.io/tutorial).**

---

# Tips

![Conversion Tool](images/converter.png)

Because computing the actual competition score requires running an evaluation pipeline, which may be costly, we recommend using our online converter. It turns a generated response into a visual representation that approximates the actual level. The converter can be found [here](https://chatgpt4pcg.github.io/tools/converter).

## Install required packages

The LLM4PCG Python package requires Python 3.11 or later; you may encounter issues with older versions. Instructions for setting up a virtual environment are provided in the `README.md` file.

In [None]:
# Install the LLM4PCG package, which provides the necessary functions for interacting with the LLM model via the competition API.
!pip install llm4pcg

## Importing the packages

Let's run the cell below to import all the packages that you will need during this tutorial.

- [llm4pcg](https://pypi.org/project/llm4pcg/) is a Python package containing the required and utility functions of the ChatGPT4PCG competition, modified to support local LLMs that are compatible with the OpenAI API interface.
- [re](https://docs.python.org/3/library/re.html) is a module for working with regular expressions in Python.
- [pathlib](https://docs.python.org/3/library/pathlib.html) provides object-oriented filesystem paths for different operating systems.

In [2]:
import re
from pathlib import Path

from llm4pcg.competition import chat_with_llm, run_evaluation
from llm4pcg.models.trial_context import TrialContext
from llm4pcg.models.trial_loop import TrialLoop

## Configuration

### How to check the port of the local server on LM Studio?

> In case you don't have LM Studio installed, you can follow the instructions on this [page](https://chatgpt4pcg.github.io/tutorial) to set up the program and the local server.

1. From the home screen of LM Studio, navigate to the "Local Server" tab.

![Local Server Tab](images/port-1.png)

2. Check the port number in "Server Port" section.

![Server Port](images/port-2.png)

In [None]:
CHARACTERS = ["I", "L", "U"]  # Change this to the list of targets you want to evaluate. Reduce the number of characters for faster testing. Please note that the `CHARACTERS` should be ["I", "L", "U"] in case you want to submit your results to the mini-competition.
NUM_TRIALS = 1  # Adjust the number of trials as needed. You may want to lower it for faster testing. Please note that the `NUM_TRIALS` should be 10 in case you want to submit your results to the mini-competition.
MODEL_NAME = "local-model"  # This value may need to be changed when using a program other than LM Studio (v0.2.27 at the time of writing).
PORT = 1234  # TODO: Please refer to the instructions above on how to check the port of the local server of LM Studio and change it if necessary.
LOCAL_MODEL_BASE_URL = f"http://localhost:{PORT}/v1"
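Before running any trial, it can help to confirm that the local server is reachable on the configured port. The sketch below is not part of the `llm4pcg` package. It assumes the server exposes the OpenAI-compatible `GET /v1/models` endpoint, and the helper names `models_url` and `list_local_models` are hypothetical.

```python
import json
import urllib.error
import urllib.request


def models_url(base_url: str) -> str:
    """Build the model-listing URL from an OpenAI-compatible base URL."""
    return base_url.rstrip("/") + "/models"


def list_local_models(base_url: str, timeout: float = 5.0) -> list[str]:
    """Return the model ids reported by the server, or an empty list if it cannot be reached."""
    try:
        with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, TimeoutError, ValueError):
        # Connection refused, timeout, or a non-JSON reply (assumption: treat all as "not ready").
        return []
    return [model.get("id", "") for model in payload.get("data", [])]
```

Calling `list_local_models(LOCAL_MODEL_BASE_URL)` should return a non-empty list once a model is loaded and the server is started; an empty list suggests the port or the server state needs checking.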

In [5]:
# Helper system prompt for improving structured output generation. You may experiment with modifying this prompt to enhance the output.
SYSTEM_PROMPT = "Output in Markdown code block format (between ``` and ```). The last code block must contain all the \
necessary code required to produce a level. Output only the 'drop_block' function with proper arguments, without any \
other code. You do not need to define the 'drop_block' function or any other functions."
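The system prompt asks the model to put the complete level in the last Markdown code block. To sanity-check a response locally, one could extract that block and list the `drop_block` calls it contains. This is only a sketch: the helper names are hypothetical, and the competition's evaluation pipeline may parse responses differently.

```python
import re

FENCE = "`" * 3  # Markdown code fence (three backticks)

# Matches a fenced block and captures its body; the optional language tag is skipped.
CODE_BLOCK_RE = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)
# Matches drop_block('b31', 10) with either quote style and optional spaces.
DROP_BLOCK_RE = re.compile(r"drop_block\(\s*['\"](b11|b13|b31)['\"]\s*,\s*(\d+)\s*\)")


def last_code_block(response: str) -> str:
    """Return the body of the last fenced code block, or an empty string if there is none."""
    blocks = CODE_BLOCK_RE.findall(response)
    return blocks[-1] if blocks else ""


def parse_drop_calls(code: str) -> list[tuple[str, int]]:
    """Return (block_type, x_position) pairs in the order they appear."""
    return [(block_type, int(x)) for block_type, x in DROP_BLOCK_RE.findall(code)]
```

An empty result from `parse_drop_calls(last_code_block(response))` is a quick signal that the model ignored the output format.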

## Zero-Shot Prompting

Zero-shot prompting is a technique in which a large language model (LLM) is given a task or query without any examples of that task and without task-specific training.

The prompt contains only an instruction, so the model relies on the knowledge and capabilities it acquired during pre-training to generate an appropriate response. This lets users apply the model's broad understanding of language and concepts to new tasks without additional training or fine-tuning.

Key aspects of zero-shot prompting include:
- No examples are provided in the prompt.
- It relies on the model's pre-training and ability to understand and follow instructions.
- It's particularly useful for quick tasks or exploring an LLM's capabilities without extensive prompt engineering.

Zero-shot prompting showcases the versatility of modern LLMs but may not always produce optimal results for all types of tasks or queries.

### Zero-Shot Prompt

This prompt starts with an instruction for the ChatGPT4PCG task and a brief description of what is expected. The next part uses role prompting (`1. Role`) to condition the model and introduce the basic idea of the task. The remaining parts provide additional information about the competition and the task. The prompt file is located at `prompts/task-zero-shot.txt`.

You are encouraged to modify the prompt to better suit your needs and potentially improve the performance. For example, rephrasing the instructions, removing, or adding more details to the task description.
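One practical caveat when editing a prompt file: the classes in this tutorial fill the `{object}` placeholder with Python's `str.format`, so any literal curly brace added to the prompt must be doubled. The template in the sketch below is a made-up stand-in, not the content of the real prompt file.

```python
# Hypothetical mini-template; the real prompt lives in prompts/task-zero-shot.txt.
template = (
    "Generate a structure that looks like the character {object}.\n"
    "Return the calls as a set, e.g. {{drop_block('b11', 3)}}.\n"
)

# Doubled braces come out as single literal braces; {object} is substituted.
filled = template.format(object="I")
print(filled)

# A single, unescaped brace pair is treated as a placeholder and must be supplied.
try:
    "Use {W} slots for {object}".format(object="I")
except KeyError as missing:
    print(f"Unfilled placeholder: {missing}")
```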

<code>
    Use `drop_block()` function to generate a stable structure that looks like the character <span style="color:green;background-color:yellow;font-weight:bold;">{object}</span>—the goal—and meets all the hard constraints. 
    Dropping position and order are crucial, and they must be determined using techniques in the block-stacking problem.<br>
    1. Role
    You are a player of the Tetris game who aims to generate a structure that meets the goal while satisfying all the hard constraints.<br>   
    2. Definitions
    Slots: The map's width is equally partitioned into W slots where W = 20, with slots 0 and 19 being the most left and right, respectively.
    Layers: The map's height is equally partitioned into H layers where H = 16, with layers 0 and 15 being the bottom and top layers, respectively.
    Base: The bottom of the map, i.e., layer 0.
    Map Initialization:
    # initialize the structure as an empty WxH grid
    structure = [[' ']*W for _ in range(H)]<br>
    3. Environment
    There are three block types as follows:
    b11, a square block whose width is 1 unit and height is 1 unit
    b31, a horizontal block whose width is 3 units and height is 1 unit
    b13, a vertical block whose width is 1 unit and height is 3 units<br>
    4. Tool
    Use the following function to vertically drop a block from layer H such that its center is at slot x_position and drop earlier blocks representing more bottom parts of the structure.
    drop_block(block_type: str, x_position: int),
    where block_type is a block type, and x_position is the slot number from 0 to W-1 where the block center is aligned.  After vertically falling down, the block will end up at either the layer on top of the base or a previously dropped block. This function is defined as follows:
    def drop_block(block_type: str, x_position: int):
        # block_type is the block type, x_position is the slot number from 0 to W-1 where the block center is aligned<br>   
        # initialize the drop position at the top of the map
        drop_pos = (H-1, x_position)<br>
        # drop the block from the top and move it down until it lands on the base or another block
        while drop_pos[0] > 0:
            drop_pos = (drop_pos[0]-1, x_position)
            if structure[drop_pos[0]+1][drop_pos[1]] != ' ':
                break<br>
        # place the block on the structure
        structure[drop_pos[0]][drop_pos[1]] = block_type<br>
    5. Constraints:
    The relevant constraints are given in the following.<br>
    5.1 No boundary intrusion: This is a soft constraint that should be met if possible. Namely, blocks should not intrude on the boundary of the map. In other words, the area of intrusion regions should be zero.
    ---
</code>
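To get a feel for what a list of `drop_block` calls produces, a rough local preview can be sketched from the definitions in the prompt. The sketch assumes that a block is centered on `x_position` (so `b31` spans `x_position - 1` to `x_position + 1`), that it comes to rest on the highest occupied layer under its footprint, and that blocks leaving the map are simply skipped. It ignores game physics entirely, so the online converter from the Tips section remains the better approximation; the function name `render` is hypothetical.

```python
W, H = 20, 16  # slots and layers, as defined in the prompt

# (width, height) per block type; the footprints are an assumption of this sketch.
BLOCK_SIZES = {"b11": (1, 1), "b31": (3, 1), "b13": (1, 3)}


def render(calls: list[tuple[str, int]]) -> str:
    """Return an ASCII preview of the level, with the top layer on the first line."""
    grid = [[" "] * W for _ in range(H)]
    heights = [0] * W  # next free layer in each slot

    for block_type, x in calls:
        width, height = BLOCK_SIZES[block_type]
        left = x - width // 2
        if left < 0 or left + width > W:
            continue  # block would leave the map horizontally
        columns = range(left, left + width)
        bottom = max(heights[c] for c in columns)  # resting layer under the footprint
        if bottom + height > H:
            continue  # block would exceed the map height
        for c in columns:
            for layer in range(bottom, bottom + height):
                grid[layer][c] = "#"
            heights[c] = bottom + height

    return "\n".join("".join(row) for row in reversed(grid))


# A small "I"-like shape: a base, a vertical stem, and a cap.
print(render([("b31", 5), ("b13", 5), ("b31", 5)]))
```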

For the ChatGPT4PCG competition, you need to implement a class that inherits from `TrialLoop` and implements the `run` function, which executes one trial. `run` is expected to return the generated text in Markdown format, including a code block that contains the `drop_block` calls and their arguments generated by the LLM.

For additional information on the `llm4pcg` package and its available functions and classes, please visit the [documentation](https://github.com/Pittawat2542/llm4pcg-python).

In [6]:
class ZeroShotPrompting(TrialLoop):
    # ^ If you rename this class, also update the class passed to `run_evaluation` below.
    @staticmethod
    def run(ctx: TrialContext, target_character: str) -> str:
        """
        Runs the zero-shot prompting.
        :param ctx: The trial context.
        :param target_character: The target character.
        :return: The generated text.
        """
        # Read the prompt text file
        prompt_template = Path("prompts/task-zero-shot.txt").read_text()

        # Generate the response from the LLM. We provide a system and a user prompt.
        responses = chat_with_llm(ctx, [
            {"role": "system", "content": SYSTEM_PROMPT},
            # This system prompt is optional but can help improve the output.
            {"role": "user", "content": prompt_template.format(
                object=target_character  # Replace the `{object}` placeholder in the prompt with the target character.
            )}])

        # `chat_with_llm` returns a list of responses; we select the first one because we only generate one response.
        response = responses[0]
        return response

In [7]:
# A helper function to run the evaluation for the ChatGPT4PCG task. The first parameter is the folder name, the second is the class that is to implement the prompt, and the rest are the parameters for the prompt engineering, zero-shot prompting in this case.
run_evaluation("zero_shot", ZeroShotPrompting, characters=CHARACTERS, num_trials=NUM_TRIALS,
               model_name=MODEL_NAME, local_model_base_url=LOCAL_MODEL_BASE_URL)

## Zero-Shot Chain-of-Thought Prompting

Zero-shot chain-of-thought (zero-shot CoT) prompting is a technique used with LLMs that encourages step-by-step reasoning without providing specific examples.

The key aspects of zero-shot CoT prompting are:
- It involves appending the phrase "Let's think step by step" (or similar variations) to the end of a query or prompt.
- This simple addition guides the LLM to generate a chain of thought, breaking down its reasoning process into intermediate steps.
- It aims to improve the model's performance on complex reasoning tasks, including arithmetic, commonsense, and symbolic reasoning.
- Unlike few-shot CoT prompting, zero-shot CoT prompting doesn't require manually crafted examples, making it more versatile and easier to implement.
- The technique has been shown to be effective with larger language models (>100 billion parameters).
- While generally effective, zero-shot CoT prompting may not always be as accurate as few-shot CoT prompting, especially for more complex tasks.

Zero-shot CoT prompting represents a simple yet powerful approach to enhance LLMs' reasoning capabilities without the need for task-specific training or examples.
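The core of the technique is a one-line change to the prompt. A minimal sketch, in which the helper name `to_zero_shot_cot` is hypothetical and the trigger phrase can be swapped for a variation:

```python
COT_TRIGGER = "Let's think step by step."


def to_zero_shot_cot(prompt: str, trigger: str = COT_TRIGGER) -> str:
    """Append the reasoning trigger on its own line, without duplicating it."""
    body = prompt.rstrip()
    if body.endswith(trigger):
        return body
    return f"{body}\n{trigger}"


print(to_zero_shot_cot("Generate a stable structure that looks like the character I."))
```

Keeping the trigger as the very last line means the model's continuation starts with the reasoning rather than with the answer.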

### Zero-Shot CoT Prompt

This prompt is largely the same as the zero-shot prompt, with the phrase "Let's think step by step" added at the end. This phrase guides the model to generate a chain of thought, which can improve performance. We place the phrase at the end of the prompt, following the original paper. The prompt file is located at `prompts/task-zero-shot-cot.txt`.

As usual, you are encouraged to modify the prompt to better suit your needs and potentially improve the performance. For example, you may opt for few-shot chain-of-thought prompting by providing explicit reasoning steps and examples in the prompt instead of relying on the model's own reasoning.

```
<zero-shot-prompt>
Let's think step-by-step.
```

In [8]:
class ZeroShotCoTPrompting(TrialLoop):
    @staticmethod
    def run(ctx: TrialContext, target_character: str) -> str:
        """
        Runs the zero-shot chain-of-thought prompting.
        :param ctx: The trial context.
        :param target_character: The target character.
        :return: The generated text.
        """
        # Similar to the previous zero-shot prompting, but this time we are reading a different prompt file.
        prompt_template = Path("prompts/task-zero-shot-cot.txt").read_text()

        responses = chat_with_llm(ctx, [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt_template.format(
                object=target_character
            )}])

        response = responses[0]
        return response

In [9]:
run_evaluation("zero_shot_cot", ZeroShotCoTPrompting, characters=CHARACTERS, num_trials=NUM_TRIALS,
               model_name=MODEL_NAME, local_model_base_url=LOCAL_MODEL_BASE_URL)

## Few-Shot Prompting

Few-shot prompting is a technique used with LLMs where the model is given a small number of examples or demonstrations within the prompt to guide its response for a specific task.

Key aspects of few-shot prompting include:
- Providing 2-5 input-output examples in the prompt to illustrate the desired task or format.
- Leveraging the LLM's ability to learn from these examples and generalize to new inputs (in-context learning).
- Enabling the model to quickly adapt to new tasks without fine-tuning or retraining.
- Particularly effective for tasks like classification, translation, or text generation in specific styles.
- Balancing between too few examples (insufficient guidance) and too many (overwhelming the model or context limit).
- Often more effective than zero-shot prompting, especially for complex or nuanced tasks.

Few-shot prompting takes advantage of LLMs' in-context learning capabilities, allowing them to perform new tasks with minimal task-specific training data. This technique has become an important tool in prompt engineering for enhancing LLM performance across various applications.
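When experimenting with different examples, it can be convenient to generate the examples section from data instead of editing it by hand. The sketch below assumes an `Input`/`Output` layout with fenced outputs and a section numbered 6; the helper name `format_examples` and the sample calls are made up for illustration.

```python
FENCE = "`" * 3  # Markdown code fence (three backticks)


def format_examples(examples: dict[str, list[tuple[str, int]]], section: int = 6) -> str:
    """Render {character: [(block_type, x_position), ...]} as a numbered examples section."""
    lines = [f"{section}. Examples"]
    for i, (character, calls) in enumerate(examples.items(), start=1):
        lines.append(f'{section}.{i}. Input: "{character}"')
        lines.append("Output:")
        lines.append(FENCE)
        lines.extend(f"drop_block('{block_type}', {x})" for block_type, x in calls)
        lines.append(FENCE)
    return "\n".join(lines)


# Illustrative data only; these calls are not taken from a validated level.
print(format_examples({"T": [("b13", 10), ("b31", 10)]}))
```

Because the rendered text contains no curly braces, it can be concatenated with a filled prompt without interfering with `str.format`.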

### Few-Shot Prompt

This prompt shares similarities with the zero-shot prompt but includes an additional section (`6. Examples`) containing three examples (3-shot prompting). These examples are provided to demonstrate the task and guide the model's response, i.e., allowing the model to perform in-context learning. These examples are also formatted to reflect what the model should generate. The prompt file is located at `prompts/task-few-shot.txt`.

As usual, you are encouraged to modify the prompt to better suit your needs and potentially improve the performance. For example, adding more examples or changing the style of the examples.

<code>
    <span style="color:blue;font-weight:bold;">&lt;zero-shot prompt&gt;</span>
    6. Examples
    6.1. Input: "G"
    Output:
    ```
    drop_block("b31",8)
    drop_block("b31",11)
    drop_block("b11",6)
    drop_block("b31",7)
    drop_block("b31",12)
    drop_block("b31",11)
    drop_block("b11",13)
    drop_block("b13",14)
    drop_block("b13",8)
    drop_block("b31",6)
    drop_block("b31",6)
    drop_block("b31",6)
    drop_block("b31",7)
    drop_block("b11",6)
    drop_block("b31",8)
    drop_block("b11",7)
    drop_block("b31",9)
    ```
    6.2. Input: "Q"
    Output:
    ```
    drop_block("b31",8)
    drop_block("b11",6)
    drop_block("b31",11)
    drop_block("b31",14)
    drop_block("b11",10)
    drop_block("b31",12)
    drop_block("b31",6)
    drop_block("b13",7)
    drop_block("b11",7)
    drop_block("b11",7)
    drop_block("b13",11)
    drop_block("b11",11)
    drop_block("b11",11)
    drop_block("b31",13)
    drop_block("b31",5)
    drop_block("b31",13)
    drop_block("b31",5)
    drop_block("b31",13)
    drop_block("b31",5)
    drop_block("b31",13)
    drop_block("b31",5)
    drop_block("b31",13)
    drop_block("b31",5)
    drop_block("b31",12)
    drop_block("b31",6)
    drop_block("b11",4)
    drop_block("b31",11)
    drop_block("b11",5)
    drop_block("b31",7)
    drop_block("b11",6)
    drop_block("b31",8)
    drop_block("b11",10)
    drop_block("b11",11)
    ```
    6.3. Input: "S"
    Output:
    ```
    drop_block('b31', 10)
    drop_block('b31', 11)
    drop_block('b31', 10)
    drop_block('b31', 9)
    drop_block('b31', 10)
    ```
    ---
    Input: "<span style="color:green;background-color:yellow;font-weight:bold;">{object}</span>"
    Output:
</code>

In [10]:
class FewShotPrompting(TrialLoop):
    @staticmethod
    def run(ctx: TrialContext, target_character: str) -> str:
        """
        Runs the few-shot prompting.
        :param ctx: The trial context.
        :param target_character: The target character.
        :return: The generated text.
        """
        # Similar to the previous zero-shot prompting, but this time we are reading a different prompt file.
        prompt_template = Path("prompts/task-few-shot.txt").read_text()

        responses = chat_with_llm(ctx, [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt_template.format(
                object=target_character
            )}])

        response = responses[0]
        return response

In [11]:
run_evaluation("few_shot", FewShotPrompting, characters=CHARACTERS, num_trials=NUM_TRIALS,
               model_name=MODEL_NAME, local_model_base_url=LOCAL_MODEL_BASE_URL)

## Tree-of-Thought Prompting

Tree-of-Thought (ToT) prompting is an advanced technique for enhancing the problem-solving capabilities of LLMs. Key aspects of ToT prompting include:
- Structured reasoning: ToT represents the problem-solving process as a tree of "thoughts," where each thought is an intermediate step towards the solution.
- Multiple paths: Unlike linear approaches, ToT generates and explores multiple reasoning paths simultaneously, allowing for a more comprehensive problem-solving approach.
- Self-evaluation: The LLM evaluates the promise of different thought branches, guiding the search through the reasoning tree.
- Deliberate search: ToT incorporates search algorithms like breadth-first or depth-first search to systematically explore the tree of thoughts.
- Improved performance: ToT has shown significant improvements over other prompting methods, especially in tasks requiring complex reasoning, planning, or exploration.
- No additional training: ToT can be implemented without retraining the LLM, making it a versatile prompt engineering technique.

By enabling LLMs to consider multiple possibilities, backtrack when necessary, and make global decisions, ToT prompting mimics more human-like problem-solving strategies, leading to enhanced performance on complex tasks.

![Tree-of-Thought Prompting](images/tot.jpeg)
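The selection loop at the heart of the exercise below can be sketched independently of any LLM. Note that this is a greedy variant that keeps only the single best thought per level, which is simpler than the breadth-first or depth-first search described above. The `propose` and `score` callables stand in for the LLM calls, and all names here are hypothetical.

```python
from typing import Callable


def greedy_tree_of_thought(
    propose: Callable[[str], list[str]],
    score: Callable[[str], float],
    max_depth: int,
) -> str:
    """At each level, keep the best-scoring thought and append it to the accumulated content."""
    content = ""
    for _ in range(max_depth):
        candidates = propose(content)
        if not candidates:
            break
        # Score each candidate together with what has been accumulated so far.
        best = max(candidates, key=lambda thought: score(content + thought))
        content += best
    return content


# Toy stand-ins for the LLM: always propose three letters, prefer letters early in the alphabet.
def toy_propose(content: str) -> list[str]:
    return ["b", "a", "c"]


def toy_score(content: str) -> float:
    return -sum(ord(ch) for ch in content)


result = greedy_tree_of_thought(toy_propose, toy_score, max_depth=2)
print(result)
```

Keeping only one thought per level bounds the number of LLM calls, at the cost of never revisiting a branch that was discarded early.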

### ToT Prompts

There are **three** main prompts required for the ToT task: a **`task prompt`** for generating thoughts, an **`evaluation prompt`** for evaluating thoughts and giving numerical scores, and an **`answering prompt`** for combining and formatting accumulated content from the best-performing thoughts at each level.

#### Task Prompt
The task prompt combines the general task instructions from the zero-shot prompt with a list of reasoning steps and an associated example, in the style of few-shot chain-of-thought prompting. The prompt file is located at `tot-prompts/task-tot.txt`.

<code>
    Use `drop_block()` function to generate a stable structure that looks like the character <span style="color:green;background-color:yellow;font-weight:bold;">{object}</span>—the goal—and meets all the hard constraints. 
    Dropping position and order are crucial, and they must be determined using techniques in the block-stacking problem.
    <span style="color:blue;font-weight:bold;">&lt;part of zero-shot prompt&gt;</span>
    ---
    <span style="color:red;">Let's follow the following steps</span>
    <span style="color:red;">1.</span> Generate the base layer of the structure
    <span style="color:red;">2.</span> Generate the top layer of the structure
    Only perform one step at a time.<br>
    Example
    Character A:
    ```
    # Base layer
    drop_block('b11', 0)
    drop_block('b11', 0)
    drop_block('b11', 2)
    drop_block('b11', 2)
    drop_block('b31', 1)
    ```    
    ```
    # Top layer
    drop_block('b11', 0)
    drop_block('b11', 2)
    drop_block('b31', 1)
    ``` 
    Currently, we have
    <span style="color:green;background-color:yellow;font-weight:bold;">{generated_content_so_far}</span>
    Next, we will perform the
</code>

#### Evaluation Prompt
The evaluation prompt includes instructions for evaluating the generated thoughts and providing numerical scores. In this case, we evaluate the thoughts based on stability and similarity and instruct the LLM to provide scores between 0 and 10 for each aspect. The prompt file is located at `tot-prompts/evaluate-tot.txt`.

<code>
    The following code is used to generate a Science Birds level that resembling the uppercase English character: <span style="color:green;background-color:yellow;font-weight:bold;">{object}</span>. 
    The description of the function utilized for this purpose is given below.<br>
    Provide integer scores for the following levels between 0 and 10 for two aspects stability and similarity. Provide the response in the following format.
    Stability: &lt;score&gt;
    Similarity: &lt;score&gt;<br>
    <span style="color:blue;font-weight:bold;">&lt;part of zero-shot prompt&gt;</span><br>
    3. Generated content to be evaluated
    <span style="color:green;background-color:yellow;font-weight:bold;">{generated_content_so_far}</span><br>
    Scores:
</code>

#### Answering Prompt
The answering prompt includes instructions for formatting the final response based on the evaluated scores. The prompt file is located at `tot-prompts/answer-tot.txt`.

<code>
    <span style="color:green;background-color:yellow;font-weight:bold;">{generated_content}</span><br>
    Give a final code in Markdown format that generates the target structure.<br>
    Output:
</code>

In [None]:
class TreeOfThoughtPrompting(TrialLoop):
    @staticmethod
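    def extract_scores(scores_str: str) -> tuple[int, int]:
        """
        Extracts the stability and similarity scores from the evaluation response.
        :param scores_str: The raw evaluation response from the LLM.
        :return: A (stability, similarity) tuple; a score that cannot be found defaults to 0.
        """
        # Lowercase the response so that "Stability" and "stability" are both matched.
        scores_str = scores_str.lower()
        stability_pattern = r".*stability: (10|\d).*"
        similarity_pattern = r".*similarity: (10|\d).*"
        stability = 0
        similarity = 0
        if stability_match := re.search(stability_pattern, scores_str):
            stability = int(stability_match.group(1))
        if similarity_match := re.search(similarity_pattern, scores_str):
            similarity = int(similarity_match.group(1))
        return stability, similarity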

    @staticmethod
    def tot(ctx: TrialContext, target_character: str) -> str:
        # The maximum depth of the tree corresponds to the number of steps in the task prompt. It can be adjusted as needed.
        # If you change it, also adjust the prompt so that the number of steps equals the maximum depth. A deeper tree requires more computation time but may also produce better results.
        max_depth = 2
        branching_factor = 2  # Number of thoughts to generate at each level. It can be adjusted as needed. Higher numbers require more computation time but may also produce better results.

        # A variable to hold accumulated content of the best thought at each level
        current_content = ""

        # Read the prompt text files
        task_prompt_template = Path("tot-prompts/task-tot.txt").read_text()
        evaluation_prompt_template = Path("tot-prompts/evaluate-tot.txt").read_text()
        answer_prompt_template = Path("tot-prompts/answer-tot.txt").read_text()

        try:
            # TODO: Implement this function (hints below and complete code is in `tot-final.py`)
            # Loop until reaching the maximum depth
            # | 1. Perform the task to generate {branching_factor} thoughts
            # | 2. Evaluate each thought and select the best one
            # | 3. Format the final response in a correct format and return it

            ### START CODE HERE ### 
            

            ### END CODE HERE ### 
        except (ValueError, TimeoutError) as e:
            print(e)
            return current_content

    @staticmethod
    def run(ctx: TrialContext, target_character: str) -> str:
        """
        Runs the tree-of-thought prompting.
        :param ctx: The trial context.
        :param target_character: The target character.
        :return: The generated text.
        """
        final_response = TreeOfThoughtPrompting.tot(ctx, target_character)

        return final_response

<details>
  <summary><font size="3" color="darkgreen"><b>Click for hints</b></font></summary>

    
```python
# Loop until reaching the maximum depth
for i in range(max_depth):
    # | 1. Perform the task to generate {branching_factor} thoughts
    responses = []

    for j in range(branching_factor):
        res = 
        responses.append(res[0])

    # | 2.1. Evaluate each thought ...
    scores = []
    for response in responses:
        score = 

        evaluation_result =
        scores.append(evaluation_result)

    # | 2.2. ... and select the best one
    best_performing_thought = 
    current_content += 

# | 3. Format the final response in a correct format and return it
final_response = 

return final_response[0]
```

</details>

<details>
  <summary><font size="3" color="darkgreen"><b>Click for code</b></font></summary>
   
```python
# Loop until reaching the maximum depth
for i in range(max_depth):
    # | 1. Perform the task to generate {branching_factor} thoughts
    responses = []

    for j in range(branching_factor):
        res = chat_with_llm(ctx, [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": task_prompt_template.format(
                object=target_character,
                generated_content_so_far=current_content or "nothing",
            )}])
        responses.append(res[0])

    # | 2.1. Evaluate each thought ...
    scores = []
    for response in responses:
        score = chat_with_llm(ctx, [
            {"role": "user", "content": evaluation_prompt_template.format(
                object=target_character,
                generated_content_so_far=f"{current_content}\n{response}"
            )}])[0]

        evaluation_result = (response, TreeOfThoughtPrompting.extract_scores(score))
        scores.append(evaluation_result)

    # | 2.2. ... and select the best one
    # Each entry is (thought, (stability, similarity)); pick the thought with the highest total score.
    best_performing_thought = max(scores, key=lambda x: sum(x[1]))[0]
    current_content += f"{best_performing_thought}\n"

# | 3. Format the final response in a correct format and return it
final_response = chat_with_llm(ctx, [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": answer_prompt_template.format(
        generated_content=current_content
    )}
])

return final_response[0]
```

</details>


In [13]:
run_evaluation("tot", TreeOfThoughtPrompting, characters=CHARACTERS, num_trials=NUM_TRIALS,
               model_name=MODEL_NAME, local_model_base_url=LOCAL_MODEL_BASE_URL)

Time limit exceeded. 134.79854710499967 > 120
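The message above shows a run that took about 135 seconds against a limit of 120. Assuming this limit applies to a single trial, a rough way to reason about it is to count the LLM calls that the reference solution above issues: for each level, every thought is generated once and evaluated once, followed by one final answering call. The helper names in this sketch are hypothetical.

```python
def tot_llm_calls(max_depth: int, branching_factor: int) -> int:
    """Chat calls per trial: one generation and one evaluation per thought per level, plus one answer."""
    return max_depth * branching_factor * 2 + 1


def seconds_per_call(time_limit: float, max_depth: int, branching_factor: int) -> float:
    """Average time each call may take if the whole trial must fit within the limit."""
    return time_limit / tot_llm_calls(max_depth, branching_factor)


calls = tot_llm_calls(max_depth=2, branching_factor=2)
budget = seconds_per_call(120, max_depth=2, branching_factor=2)
print(f"{calls} calls, about {budget:.1f} s per call")
```

With the default settings this comes to nine calls per trial, so a slow local model can exceed the limit quickly. Lowering `branching_factor`, shortening the prompts, or using a smaller model are the obvious levers.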

