# ToT Prompt Test

This notebook runs a simple check on one example from the datasets produced by the ToT-data-ETL notebook, which are based on the MMLU dataset (Hendrycks et al, 2021a; Hendrycks et al, 2021b; Hendrycks et al, 2023). The check tests whether a model given a prompt in the Tree of Thoughts (ToT) style can produce a better answer than the same model without such a prompt.

Please note that results can differ between runs, so you may need to run the notebook several times to obtain results similar to those shown here.

Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D. and Steinhardt, J., 2021a. Dataset Card for MMLU [Online]. s.l.: Hugging Face. Available from: https://huggingface.co/datasets/cais/mmlu [Accessed 5 August 2024].

Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D. and Steinhardt, J., 2021b. Measuring Massive Multitask Language Understanding. ICLR 2021, 4 May 2021, Vienna. Ithaca: Cornell University Library, arXiv.org, pp.1-27. Available from: https://arxiv.org/pdf/2009.03300.pdf [Accessed 5 August 2024].
 
Hendrycks, D., Burns, C., Basart, S., Critch, A., Li, J., Song, D. and Steinhardt, J., 2023. Aligning AI With Shared Human Values. ICLR 2021, 4 May 2021, Vienna. Ithaca: Cornell University Library, arXiv.org, pp.1-29. Available from: https://arxiv.org/pdf/2008.02275.pdf [Accessed 5 August 2024].

## Load example

In [55]:
import pandas as pd

full_train_dataset_path_csv = "full_train_dataset.csv"

# Code, that is, the loading of the dataset, adapted from: pandas, 2024. pandas.read_csv (v.2.2) [Online]. 
# Available from: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html [Accessed 17 August 2024].
full_train_dataset = pd.read_csv(full_train_dataset_path_csv)
#

Enhanced question example

In [56]:
# Code adapted from: pandas, 2024. pandas.DataFrame.at (v.2.2) [Online]. 
# Available from: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.at.html#pandas.DataFrame.at [Accessed 5 September 2024].
enhanced_question = full_train_dataset.at[0, "enhanced_question"]
#
print(enhanced_question)

Peter sued Don for breach of contract. The court admitted testimony by Peter that Don and his wife quarreled frequently, a fact of no consequence to the lawsuit. Don seeks to testify in response that he and his wife never quarreled. The court
A. must permit Don to answer if he had objected to Peter's testimony.
B. may permit Don to answer, whether or not he had objected to Peter's testimony. 
C. may permit Don to answer only if he had objected to Peter's testimony.
D. cannot permit Don to answer, whether or not he had objected to Peter's testimony
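
The enhanced question printed above combines a question stem with four lettered options. How the ToT-data-ETL notebook assembles it is not shown here, so the following is only a minimal sketch of one way such a string could be built. The function name `format_enhanced_question` and the sample question are hypothetical and are not taken from the dataset.

```python
from typing import Sequence

LETTERS = "ABCD"


def format_enhanced_question(question: str, choices: Sequence[str]) -> str:
    """Join a question stem and its four options into one lettered block.

    Hypothetical helper: the real ETL step may differ.
    """
    if len(choices) != len(LETTERS):
        raise ValueError(f"expected {len(LETTERS)} choices, got {len(choices)}")
    option_lines = [f"{letter}. {choice}" for letter, choice in zip(LETTERS, choices)]
    return "\n".join([question, *option_lines])


# Illustrative input only, not an MMLU item.
example = format_enhanced_question(
    "Which planet is closest to the Sun?",
    ["Venus", "Mercury", "Earth", "Mars"],
)
print(example)
```

Keeping the letters in one constant means the prompt text and any later answer check refer to the same set of options.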


Correct answer of the enhanced question example

In [57]:
# Code adapted from: pandas, 2024. pandas.DataFrame.at (v.2.2) [Online]. 
# Available from: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.at.html#pandas.DataFrame.at [Accessed 5 September 2024].
letter_answer = full_train_dataset.at[0, "letter_answer"]
#
print(letter_answer)

B


## Load a model

In [58]:
# Code, that is, the import, reused from: LangChain, 2023. OllamaLLM [Online]. 
# Available from: https://python.langchain.com/v0.2/api_reference/ollama/llms/langchain_ollama.llms.OllamaLLM.html [Accessed 1 September 2024].
from langchain_ollama import OllamaLLM
#

# Code, that is, the answer generator, adapted from: LangChain, 2023. OllamaLLM [Online]. 
# Available from: https://python.langchain.com/v0.2/api_reference/ollama/llms/langchain_ollama.llms.OllamaLLM.html [Accessed 1 September 2024].
answer_generator_llm = OllamaLLM(
    # Code, that is, the parameter, reused from: LangChain, 2023. OllamaLLM [Online]. 
    # Available from: https://python.langchain.com/v0.2/api_reference/ollama/llms/langchain_ollama.llms.OllamaLLM.html [Accessed 1 September 2024].
    # Code, that is, the value, reused from: Ollama, 2024. mixtral 8x7b-instruct-v0.1-fp16 [Online]. 
    # Available from: https://ollama.com/library/mixtral:8x7b-instruct-v0.1-fp16 [Accessed 25 September 2024].
    model="mixtral:8x7b-instruct-v0.1-fp16"
    #
    )
#

## Construct a simple prompt

In [59]:
# Prompt (lines 1 - 2 in the simple_prompt_to_generate_answer) is based on: Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D. and Steinhardt, J., 2021a. 
# Dataset Card for MMLU [Online]. s.l.: Hugging Face. Available from: https://huggingface.co/datasets/cais/mmlu [Accessed 5 August 2024].
#
# Prompt (lines 1 - 2 in the simple_prompt_to_generate_answer) is adapted and based on: mrspiggot, 2023. langchain_tree.py [computer program].
# Available from: https://github.com/mrspiggot/forestOfThoughts/blob/master/langchain_tree.py [Accessed 5 September 2024]. 
# (mrspiggot, 2023, lines 23 - 25)
#
# Please note that enhanced_question variable in the simple_prompt_to_generate_answer would include transformed data from: Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D. and Steinhardt, J., 2021a. 
# Dataset Card for MMLU [Online]. s.l.: Hugging Face. Available from: https://huggingface.co/datasets/cais/mmlu [Accessed 5 August 2024].
simple_prompt_to_generate_answer = f'''
{enhanced_question}
What is the answer and the answer letter? It is very important that you provide the correct answer letter in the format of Answer: A, Answer: B, Answer: C or Answer: D
'''
#
print(simple_prompt_to_generate_answer)


Peter sued Don for breach of contract. The court admitted testimony by Peter that Don and his wife quarreled frequently, a fact of no consequence to the lawsuit. Don seeks to testify in response that he and his wife never quarreled. The court
A. must permit Don to answer if he had objected to Peter's testimony.
B. may permit Don to answer, whether or not he had objected to Peter's testimony. 
C. may permit Don to answer only if he had objected to Peter's testimony.
D. cannot permit Don to answer, whether or not he had objected to Peter's testimony
What is the answer and the answer letter? It is very important that you provide the correct answer letter in the format of Answer: A, Answer: B, Answer: C or Answer: D



## Generate a base answer based on a simple prompt

In [60]:
retries_for_quality_simple_prompt = 0
max_retries_for_quality_simple_prompt = 5
while retries_for_quality_simple_prompt < max_retries_for_quality_simple_prompt:
    # Code, that is, the answer generation, adapted from: LangChain, 2023. OllamaLLM [Online]. 
    # Available from: https://python.langchain.com/v0.2/api_reference/ollama/llms/langchain_ollama.llms.OllamaLLM.html [Accessed 1 September 2024].
    generated_simple_answer = answer_generator_llm.invoke(simple_prompt_to_generate_answer)
    #
    print(generated_simple_answer)

    # Code is based on several outputs from Mixtral 8x7b Instruct version v0.1, fp16 (pers. comm.) on 03/10/2024 
    # from tot_prompt_to_generate_answer prompt in the prompt_tot function in the Prompt section in ToT-data-answer-generator-and-checker notebook, with and without hint_1 and hint_2 
    # values for the prompt which are indicated in the code in Generate and check answers subsection in ToT-data-answer-generator-and-checker notebook 
    # and enhanced_question values from train_dataset in ToT-data-answer-generator-and-checker notebook
    # for the prompt (that is, the outputs that could be produced with the code in ToT-data-answer-generator-and-checker notebook) which is based on the MMLU dataset: 
    # Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D. and Steinhardt, J., 2021a. 
    # Dataset Card for MMLU [Online]. s.l.: Hugging Face. Available from: https://huggingface.co/datasets/cais/mmlu [Accessed 5 August 2024].
    # Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D. and Steinhardt, J., 2021b. Measuring Massive Multitask Language Understanding. 
    # ICLR 2021, 4 May 2021, Vienna. Ithaca: Cornell University Library, arXiv.org, pp.1-27. Available from: https://arxiv.org/pdf/2009.03300.pdf [Accessed 5 August 2024].
    # Hendrycks, D., Burns, C., Basart, S., Critch, A., Li, J., Song, D. and Steinhardt, J., 2023. Aligning AI With Shared Human Values. 
    # ICLR 2021, 4 May 2021, Vienna. Ithaca: Cornell University Library, arXiv.org, pp.1-29. Available from: https://arxiv.org/pdf/2008.02275.pdf [Accessed 5 August 2024].
    # Please note that Mixtral 8x7b Instruct version v0.1, fp16 has been used locally using: 
    # Ollama, 2024a. Ollama [computer program]. Available from: https://ollama.com [Accessed 1 September 2024].
    # Ollama, 2024b. mixtral 8x7b-instruct-v0.1-fp16 [Online]. 
    # Available from: https://ollama.com/library/mixtral:8x7b-instruct-v0.1-fp16 [Accessed 25 September 2024].
    # Code is based on: Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D. and Steinhardt, J., 2021a. 
    # Dataset Card for MMLU [Online]. s.l.: Hugging Face. Available from: https://huggingface.co/datasets/cais/mmlu [Accessed 5 August 2024].
    if any(f"{prefix} {letter}" in generated_simple_answer
           for prefix in ("Answer:", "answer:", "Answer is", "answer is", "Answer is:", "answer is:")
           for letter in "ABCD"):
    #
        retries_for_quality_simple_prompt = max_retries_for_quality_simple_prompt
        print(f"The simple answer generated with {retries_for_quality_simple_prompt} retries with the specified format")
    else:
        retries_for_quality_simple_prompt += 1
        print(f"The simple answer generated with {retries_for_quality_simple_prompt} retries without the specified format")

 The answer is:

Answer: D,Cannot permit Don to answer, whether or not he had objected to Peter's testimony.

Explanation: In a court of law, evidence that is irrelevant to the case at hand is generally not admissible. The fact that Don and his wife quarreled frequently is not relevant to the breach of contract lawsuit between Peter and Don. Therefore, even if Don had objected to Peter's testimony, the court still cannot permit Don to testify in response that he and his wife never quarreled. This is because the response is also irrelevant and could prejudice the jury against Don or his wife for no legitimate reason.
The simple answer generated with 5 retries with the specified format
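
Note that the loop above exits by setting the counter to its maximum, so the success message reports 5 retries even when the first attempt already had the right format. The following is a minimal sketch of an alternative loop that returns the number of attempts actually made. The names `generate_with_retries`, `is_well_formatted` and `fake_generate` are hypothetical; `fake_generate` stands in for `answer_generator_llm.invoke` so that the example runs without a model.

```python
from typing import Callable, Tuple


def generate_with_retries(
    generate: Callable[[str], str],
    is_well_formatted: Callable[[str], bool],
    prompt: str,
    max_attempts: int = 5,
) -> Tuple[str, int, bool]:
    """Call generate(prompt) until the output passes the format check.

    Returns (last_output, attempts_used, format_ok).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    output = ""
    for attempt in range(1, max_attempts + 1):
        output = generate(prompt)
        if is_well_formatted(output):
            return output, attempt, True
    return output, max_attempts, False


# Stand-in generator: fails the format check twice, then succeeds.
_canned_outputs = iter(["I am not sure.", "Probably the second option.", "Answer: B"])


def fake_generate(prompt: str) -> str:
    return next(_canned_outputs)


output, attempts, format_ok = generate_with_retries(
    fake_generate,
    lambda text: "Answer: " in text,
    "hypothetical prompt",
)
print(output, attempts, format_ok)
```

Returning the attempt count separately from the success flag keeps the log message accurate without changing when the loop stops.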


## Check the base answer

In [61]:
# Code is based on several outputs from Mixtral 8x7b Instruct version v0.1, fp16 (pers. comm.) on 03/10/2024 
# from tot_prompt_to_generate_answer prompt in the prompt_tot function in the Prompt section in ToT-data-answer-generator-and-checker notebook, with and without hint_1 and hint_2 
# values for the prompt which are indicated in the code in Generate and check answers subsection in ToT-data-answer-generator-and-checker notebook 
# and enhanced_question values from train_dataset in ToT-data-answer-generator-and-checker notebook
# for the prompt (that is, the outputs that could be produced with the code in ToT-data-answer-generator-and-checker notebook) which is based on the MMLU dataset: 
# Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D. and Steinhardt, J., 2021a. 
# Dataset Card for MMLU [Online]. s.l.: Hugging Face. Available from: https://huggingface.co/datasets/cais/mmlu [Accessed 5 August 2024].
# Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D. and Steinhardt, J., 2021b. Measuring Massive Multitask Language Understanding. 
# ICLR 2021, 4 May 2021, Vienna. Ithaca: Cornell University Library, arXiv.org, pp.1-27. Available from: https://arxiv.org/pdf/2009.03300.pdf [Accessed 5 August 2024].
# Hendrycks, D., Burns, C., Basart, S., Critch, A., Li, J., Song, D. and Steinhardt, J., 2023. Aligning AI With Shared Human Values. 
# ICLR 2021, 4 May 2021, Vienna. Ithaca: Cornell University Library, arXiv.org, pp.1-29. Available from: https://arxiv.org/pdf/2008.02275.pdf [Accessed 5 August 2024].
# Please note that Mixtral 8x7b Instruct version v0.1, fp16 has been used locally using: 
# Ollama, 2024a. Ollama [computer program]. Available from: https://ollama.com [Accessed 1 September 2024].
# Ollama, 2024b. mixtral 8x7b-instruct-v0.1-fp16 [Online]. 
# Available from: https://ollama.com/library/mixtral:8x7b-instruct-v0.1-fp16 [Accessed 25 September 2024].
# Code is based on: Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D. and Steinhardt, J., 2021a. 
# Dataset Card for MMLU [Online]. s.l.: Hugging Face. Available from: https://huggingface.co/datasets/cais/mmlu [Accessed 5 August 2024].
if (f"Answer: {letter_answer}" in generated_simple_answer or
    f"answer: {letter_answer}" in generated_simple_answer or
    f"Answer is {letter_answer}" in generated_simple_answer or
    f"answer is {letter_answer}" in generated_simple_answer or
    f"Answer is: {letter_answer}" in generated_simple_answer or
    f"answer is: {letter_answer}" in generated_simple_answer):
#   
    print("The answer generated with a simple prompt is correct")
else:
    print("The answer generated with a simple prompt is incorrect")

The answer generated with a simple prompt is incorrect
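
The check above accepts the output if the expected pattern appears anywhere in the text, which could also match a letter mentioned in passing before the final answer. As a hedged alternative, the sketch below uses a regular expression covering the same six phrasings and compares only the last match. The names `ANSWER_PATTERN`, `extract_final_letter` and `is_correct` are hypothetical, and the word-boundary requirement after the letter is slightly stricter than the substring test used in this notebook.

```python
import re
from typing import Optional

# Matches "Answer: X", "Answer is X" and "Answer is: X" (either capitalisation),
# where X is a standalone letter A-D.
ANSWER_PATTERN = re.compile(r"[Aa]nswer(?::| is:?) ([ABCD])\b")


def extract_final_letter(text: str) -> Optional[str]:
    """Return the last answer letter found in text, or None if there is none."""
    matches = ANSWER_PATTERN.findall(text)
    return matches[-1] if matches else None


def is_correct(text: str, expected_letter: str) -> bool:
    """Compare the final answer letter in text with the expected letter."""
    return extract_final_letter(text) == expected_letter


# Illustrative output in which an early guess differs from the final answer.
sample = "Expert 1 first leaned towards Answer: B, but the group concluded.\nAnswer: D"
print(extract_final_letter(sample))
print("Answer: B" in sample)      # the substring test would accept B here
print(is_correct(sample, "B"))
```

Whether the last match is the right one to trust depends on the prompt; here it relies on the instruction that the answer letter is given only after all the thoughts.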


## Construct a ToT prompt

In [62]:
# Prompt (lines 1 - 11 in the tot_prompt_to_generate_answer) reused and slightly adapted from: mrspiggot, 2023. langchain_tree.py [computer program].
# Available from: https://github.com/mrspiggot/forestOfThoughts/blob/master/langchain_tree.py [Accessed 5 September 2024]. 
# (mrspiggot, 2023, lines 11 - 22)
#
# Prompt (lines 16 - 99 in the tot_prompt_to_generate_answer, that is, where delimited by #####) is based on: Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T. L., Cao, Y. and Narasimhan, K., 2023. 
# Tree of Thoughts: Deliberate Problem Solving with Large Language Models. 37th Conference on Neural Information Processing Systems (NeurIPS 2023), 10-16 December 2023, New Orleans. 
# Ithaca: Cornell University Library, arXiv.org, pp.1-14. Available from: https://arxiv.org/pdf/2305.10601.pdf [Accessed 17 August 2024].
#
# Prompt (lines 16 - 99 in the tot_prompt_to_generate_answer, that is, where delimited by #####) is based on the approach of how tree is structured: Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T. L., Cao, Y. and Narasimhan, K., 2023. 
# Tree of Thoughts: Deliberate Problem Solving with Large Language Models. 37th Conference on Neural Information Processing Systems (NeurIPS 2023), 10-16 December 2023, New Orleans. 
# Ithaca: Cornell University Library, arXiv.org, pp.1-14. Available from: https://arxiv.org/pdf/2305.10601.pdf [Accessed 17 August 2024]. 
# (for example, Yao et al, 2023, page 2, figure d)
#
# Prompt (lines 13 - 105, line 109 in the tot_prompt_to_generate_answer) is adapted and based on the ToT pattern according to: Zhang, Z., Ye, Z., Shen, Y. and Gan, C., 2023. 
# Autonomous Tree-Search Ability of Large Language Models. Ithaca: Cornell University Library, arXiv.org. arXiv [Online]. 
# Available from: https://arxiv.org/pdf/2310.10686.pdf [Accessed 25 August 2024]. 
# (Zhang et al, 2023, page 13, C.1)
#
# Prompt (lines 16 - 106 in the tot_prompt_to_generate_answer) is adapted and based on: mrspiggot, 2023. langchain_tree.py [computer program].
# Available from: https://github.com/mrspiggot/forestOfThoughts/blob/master/langchain_tree.py [Accessed 5 September 2024]. 
# (mrspiggot, 2023, lines 11 - 26)
#
# Prompt (line 102 in the tot_prompt_to_generate_answer, that is, where indicated about the letters and line 11, line 97, line 99, line 106, line 108, line 109 in the tot_prompt_to_generate_answer) is based on: Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D. and Steinhardt, J., 2021a. 
# Dataset Card for MMLU [Online]. s.l.: Hugging Face. Available from: https://huggingface.co/datasets/cais/mmlu [Accessed 5 August 2024].
#
# Prompt (line 108 in the tot_prompt_to_generate_answer) reused and slightly adapted from: mrspiggot, 2023. langchain_tree.py [computer program].
# Available from: https://github.com/mrspiggot/forestOfThoughts/blob/master/langchain_tree.py [Accessed 5 September 2024]. 
# (mrspiggot, 2023, line 23)
#
# Prompt (line 109 in the tot_prompt_to_generate_answer) is adapted and based on: mrspiggot, 2023. langchain_tree.py [computer program].
# Available from: https://github.com/mrspiggot/forestOfThoughts/blob/master/langchain_tree.py [Accessed 5 September 2024]. 
# (mrspiggot, 2023, lines 11 - 26)
#
# Prompt (line 109, that is, before asking about the answer) is based on: Kojima, T., Gu, S.S., Reid, M., Matsuo, Y. and Iwasawa, Y., 2023. 
# Large Language Models are Zero-Shot Reasoners. 36th Conference on Neural Information Processing Systems (NeurIPS 2022), 28 November 2022 – 9 December 2022, New Orleans. 
# Ithaca: Cornell University Library, arXiv.org, pp.1-42. Available from: https://arxiv.org/pdf/2205.11916 [Accessed 15 September 2024]. 
# (Kojima et al, 2023, page 2, figure d)
#
# Please note that enhanced_question variable in the tot_prompt_to_generate_answer would include transformed data from: Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D. and Steinhardt, J., 2021a. 
# Dataset Card for MMLU [Online]. s.l.: Hugging Face. Available from: https://huggingface.co/datasets/cais/mmlu [Accessed 5 August 2024].
#
tot_prompt_to_generate_answer = f'''
Imagine three different experts are answering this question in the Tree of Thoughts style.
They will brainstorm and debate the answer step by step reasoning carefully and taking all facts into consideration
All experts will write down 1 step of their thinking, then share it with the group.
They will each critique their response, and then all the responses of others
They will check their answer based on the appropriate rules.
Then all experts will go on to the next step and write down this step of their thinking.
They will keep going through steps until they reach their conclusion taking into account the thoughts of the other experts.
If at any time they realise that there is a flaw in their logic they will backtrack to where that flaw occurred. 
If any expert realises they're wrong at any point then they acknowledge this and start another train of thought.
Each expert will assign a likelihood of their current assertion being correct.
Continue until the experts agree on the single most likely answer

Use the following template delimited by #####.

#####
Step 1

Expert 1
concise thought 1:
concise critique of thought 1:
probability of thought 1 being correct in percentage:

Expert 2
concise thought 2:
concise critique of thought 2:
probability of thought 2 being correct in percentage:

Expert 3
concise thought 3:
concise critique of thought 3:
probability of thought 3 being correct in percentage:

Step 2

Expert 1
concise thought 1.1:
concise critique of thought 1.1:
probability of thought 1.1 being correct in percentage:

Expert 1
concise thought 1.2:
concise critique of thought 1.2:
probability of thought 1.2 being correct in percentage:

Expert 2
concise thought 2.1:
concise critique of thought 2.1:
probability of thought 2.1 being correct in percentage:

Expert 2
concise thought 2.2:
concise critique of thought 2.2:
probability of thought 2.2 being correct in percentage:

Expert 3
concise thought 3.1:
concise critique of thought 3.1:
probability of thought 3.1 being correct in percentage:

Expert 3
concise thought 3.2:
concise critique of thought 3.2:
probability of thought 3.2 being correct in percentage:

Step 3

Expert 1
concise thought 1.1.1:
concise critique of thought 1.1.1:
probability of thought 1.1.1 being correct in percentage:

Expert 1
concise thought 1.2.1:
concise critique of thought 1.2.1:
probability of thought 1.2.1 being correct in percentage:

Expert 2
concise thought 2.1.1:
concise critique of thought 2.1.1:
probability of thought 2.1.1 being correct in percentage:

Expert 2
concise thought 2.2.1:
concise critique of thought 2.2.1:
probability of thought 2.2.1 being correct in percentage:

Expert 3
concise thought 3.1.1:
concise critique of thought 3.1.1:
probability of thought 3.1.1 being correct in percentage:

Expert 3
concise thought 3.2.1:
concise critique of thought 3.2.1:
probability of thought 3.2.1 being correct in percentage:

Conclusion: write here the answer which all of the experts agree is the correct answer based on the final thoughts of the experts

Answer: write here the answer letter which all of the experts agree is the correct answer based on the final thoughts of the experts
#####

Very important. The thoughts of experts must be diverse and different, that is, the experts must not repeat what they previously said and the thoughts of the experts should include letters (A, B, C or D) to which the thoughts refer or relate. 
Very important. In Step 1, each expert must provide 1 thought (3 thoughts in total, that is, concise thought 1, concise thought 2, concise thought 3). 
Very important. In Step 2, each expert must provide 2 thoughts (6 thoughts in total, that is, concise thought 1.1, concise thought 1.2, concise thought 2.1, concise thought 2.2, concise thought 3.1, concise thought 3.2). 
Very important. In Step 3, each expert must provide at least 2 thoughts (at least 6 thoughts in total), etc.
Very important. The answer letter which all of the experts agree is the correct answer based on the final thoughts of the experts should be in the format of Answer: answer letter which all of the experts agree is the correct answer based on the final thoughts of the experts, for example, Answer: A, Answer: B, Answer: C or Answer: D.

The question is: {enhanced_question}
Think in the Tree of Thoughts style. What is the answer and the answer letter? It is very important that you provide the correct answer letter in the format of Answer: A, Answer: B, Answer: C or Answer: D and it is very important that you provide it only after all the thoughts in step 3
'''
#
print(tot_prompt_to_generate_answer)


Imagine three different experts are answering this question in the Tree of Thoughts style.
They will brainstorm and debate the answer step by step reasoning carefully and taking all facts into consideration
All experts will write down 1 step of their thinking, then share it with the group.
They will each critique their response, and then all the responses of others
They will check their answer based on the appropriate rules.
Then all experts will go on to the next step and write down this step of their thinking.
They will keep going through steps until they reach their conclusion taking into account the thoughts of the other experts.
If at any time they realise that there is a flaw in their logic they will backtrack to where that flaw occurred. 
If any expert realises they're wrong at any point then they acknowledge this and start another train of thought.
Each expert will assign a likelihood of their current assertion being correct.
Continue until the experts agree on the single most
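
The numbered slots in the template follow a regular pattern: one thought per expert in Step 1, two per expert in Step 2, and one continuation of each Step 2 thought in Step 3. As an illustration only, the sketch below generates the same skeleton programmatically instead of writing it out by hand. The function names `thought_block` and `build_tot_template` are hypothetical, and the notebook itself uses the hand-written template.

```python
def thought_block(expert: int, label: str) -> str:
    """Return the three template lines for one thought, headed by its expert."""
    return "\n".join([
        f"Expert {expert}",
        f"concise thought {label}:",
        f"concise critique of thought {label}:",
        f"probability of thought {label} being correct in percentage:",
    ])


def build_tot_template(n_experts: int = 3, branches: int = 2) -> str:
    """Build the Step 1 to Step 3 skeleton with labels such as 1, 1.2 and 1.2.1."""
    experts = range(1, n_experts + 1)
    branch_ids = range(1, branches + 1)
    step_labels = [
        [(e, f"{e}") for e in experts],
        [(e, f"{e}.{b}") for e in experts for b in branch_ids],
        [(e, f"{e}.{b}.1") for e in experts for b in branch_ids],
    ]
    sections = []
    for step_number, labels in enumerate(step_labels, start=1):
        blocks = [thought_block(expert, label) for expert, label in labels]
        sections.append(f"Step {step_number}\n\n" + "\n\n".join(blocks))
    return "\n\n".join(sections)


template = build_tot_template()
print(template)
```

Generating the skeleton makes it easier to vary the number of experts or branches, although any such change would also need matching edits to the "Very important" instructions in the prompt.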

## Generate an answer based on a ToT prompt

In [63]:
retries_for_quality_tot_prompt = 0
max_retries_for_quality_tot_prompt = 5
while retries_for_quality_tot_prompt < max_retries_for_quality_tot_prompt:
    # Code, that is, the answer generation, adapted from: LangChain, 2023. OllamaLLM [Online]. 
    # Available from: https://python.langchain.com/v0.2/api_reference/ollama/llms/langchain_ollama.llms.OllamaLLM.html [Accessed 1 September 2024].
    generated_tot_answer = answer_generator_llm.invoke(tot_prompt_to_generate_answer)
    #
    print(generated_tot_answer)

    # Code is based on several outputs from Mixtral 8x7b Instruct version v0.1, fp16 (pers. comm.) on 03/10/2024 
    # from tot_prompt_to_generate_answer prompt in the prompt_tot function in the Prompt section in ToT-data-answer-generator-and-checker notebook, with and without hint_1 and hint_2 
    # values for the prompt which are indicated in the code in Generate and check answers subsection in ToT-data-answer-generator-and-checker notebook 
    # and enhanced_question values from train_dataset in ToT-data-answer-generator-and-checker notebook
    # for the prompt (that is, the outputs that could be produced with the code in ToT-data-answer-generator-and-checker notebook) which is based on the MMLU dataset: 
    # Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D. and Steinhardt, J., 2021a. 
    # Dataset Card for MMLU [Online]. s.l.: Hugging Face. Available from: https://huggingface.co/datasets/cais/mmlu [Accessed 5 August 2024].
    # Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D. and Steinhardt, J., 2021b. Measuring Massive Multitask Language Understanding. 
    # ICLR 2021, 4 May 2021, Vienna. Ithaca: Cornell University Library, arXiv.org, pp.1-27. Available from: https://arxiv.org/pdf/2009.03300.pdf [Accessed 5 August 2024].
    # Hendrycks, D., Burns, C., Basart, S., Critch, A., Li, J., Song, D. and Steinhardt, J., 2023. Aligning AI With Shared Human Values. 
    # ICLR 2021, 4 May 2021, Vienna. Ithaca: Cornell University Library, arXiv.org, pp.1-29. Available from: https://arxiv.org/pdf/2008.02275.pdf [Accessed 5 August 2024].
    # Please note that Mixtral 8x7b Instruct version v0.1, fp16 has been used locally using: 
    # Ollama, 2024a. Ollama [computer program]. Available from: https://ollama.com [Accessed 1 September 2024].
    # Ollama, 2024b. mixtral 8x7b-instruct-v0.1-fp16 [Online]. 
    # Available from: https://ollama.com/library/mixtral:8x7b-instruct-v0.1-fp16 [Accessed 25 September 2024].
    # Code is based on: Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D. and Steinhardt, J., 2021a. 
    # Dataset Card for MMLU [Online]. s.l.: Hugging Face. Available from: https://huggingface.co/datasets/cais/mmlu [Accessed 5 August 2024].
    if any(f"{prefix} {letter}" in generated_tot_answer
           for prefix in ("Answer:", "answer:", "Answer is", "answer is", "Answer is:", "answer is:")
           for letter in "ABCD"):
    #
        retries_for_quality_tot_prompt = max_retries_for_quality_tot_prompt
        print(f"The tot answer generated with {retries_for_quality_tot_prompt} retries with the specified format")
    else:
        retries_for_quality_tot_prompt += 1
        print(f"The tot answer generated with {retries_for_quality_tot_prompt} retries without the specified format")

 #####
Step 1

Expert 1
concise thought 1A: The court's decision to admit testimony irrelevant to the lawsuit can be appealed as a potential prejudicial error. Don should have objected to Peter's testimony at the time of its delivery.
concise critique of thought 1A: This initial thought assumes that Don did not object, and focuses on the impact of that assumed omission. However, it is still unknown whether Don actually objected or not.
probability of thought 1A being correct in percentage: 60%

Expert 2
concise thought 1B: The admissibility of evidence in this case depends on its relevance and potential for prejudice. Frequent quarreling between Don and his wife may be character evidence, which is generally not admissible in contract disputes.
concise critique of thought 1B: While correct, this thought does not directly address the issue of whether Don should be permitted to respond to Peter's testimony.
probability of thought 1B being correct in percentage: 70%

Expert 3
concise thoug

## Check the answer

In [64]:
# Code is based on several outputs from Mixtral 8x7b Instruct version v0.1, fp16 (pers. comm.) on 03/10/2024 
# from tot_prompt_to_generate_answer prompt in the prompt_tot function in the Prompt section in ToT-data-answer-generator-and-checker notebook, with and without hint_1 and hint_2 
# values for the prompt which are indicated in the code in Generate and check answers subsection in ToT-data-answer-generator-and-checker notebook 
# and enhanced_question values from train_dataset in ToT-data-answer-generator-and-checker notebook
# for the prompt (that is, the outputs that could be produced with the code in ToT-data-answer-generator-and-checker notebook) which is based on the MMLU dataset: 
# Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D. and Steinhardt, J., 2021a. 
# Dataset Card for MMLU [Online]. s.l.: Hugging Face. Available from: https://huggingface.co/datasets/cais/mmlu [Accessed 5 August 2024].
# Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D. and Steinhardt, J., 2021b. Measuring Massive Multitask Language Understanding. 
# ICLR 2021, 4 May 2021, Vienna. Ithaca: Cornell University Library, arXiv.org, pp.1-27. Available from: https://arxiv.org/pdf/2009.03300.pdf [Accessed 5 August 2024].
# Hendrycks, D., Burns, C., Basart, S., Critch, A., Li, J., Song, D. and Steinhardt, J., 2023. Aligning AI With Shared Human Values. 
# ICLR 2021, 4 May 2021, Vienna. Ithaca: Cornell University Library, arXiv.org, pp.1-29. Available from: https://arxiv.org/pdf/2008.02275.pdf [Accessed 5 August 2024].
# Please note that Mixtral 8x7b Instruct version v0.1, fp16 has been used locally using: 
# Ollama, 2024a. Ollama [computer program]. Available from: https://ollama.com [Accessed 1 September 2024].
# Ollama, 2024b. mixtral 8x7b-instruct-v0.1-fp16 [Online]. 
# Available from: https://ollama.com/library/mixtral:8x7b-instruct-v0.1-fp16 [Accessed 25 September 2024].
# Code is based on: Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D. and Steinhardt, J., 2021a. 
# Dataset Card for MMLU [Online]. s.l.: Hugging Face. Available from: https://huggingface.co/datasets/cais/mmlu [Accessed 5 August 2024].
if (f"Answer: {letter_answer}" in generated_tot_answer or
    f"answer: {letter_answer}" in generated_tot_answer or
    f"Answer is {letter_answer}" in generated_tot_answer or
    f"answer is {letter_answer}" in generated_tot_answer or
    f"Answer is: {letter_answer}" in generated_tot_answer or
    f"answer is: {letter_answer}" in generated_tot_answer):
#   
    print("The answer generated with a tot prompt is correct")
else:
    print("The answer generated with a tot prompt is incorrect")

The answer generated with a tot prompt is correct
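
Because results can differ between runs, a single correct or incorrect answer says little about either prompt. The following is a minimal sketch of repeating the generation several times and reporting the fraction of correct answers. The names `fraction_correct` and `fake_generate` are hypothetical; `fake_generate` replaces `answer_generator_llm.invoke` with canned outputs so that the example runs without a model, and its outputs are illustrative rather than measured results.

```python
import itertools
from typing import Callable


def fraction_correct(
    generate: Callable[[str], str],
    prompt: str,
    expected_letter: str,
    n_runs: int = 10,
) -> float:
    """Run generate(prompt) n_runs times and return the share of correct answers."""
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    hits = 0
    for _ in range(n_runs):
        if f"Answer: {expected_letter}" in generate(prompt):
            hits += 1
    return hits / n_runs


# Stand-in generator cycling through canned outputs (three B answers, one D).
_canned_cycle = itertools.cycle(["Answer: B", "Answer: D", "Answer: B", "Answer: B"])


def fake_generate(prompt: str) -> str:
    return next(_canned_cycle)


rate = fraction_correct(fake_generate, "hypothetical prompt", "B", n_runs=8)
print(rate)
```

Calling the same function once with the simple prompt and once with the ToT prompt would give two rates that can be compared directly, at the cost of one model call per run.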
