## Installation and Imports
Please follow the **installation guide** in the [ThoughtSource Readme file](https://github.com/OpenBioLink/ThoughtSource) before using this notebook.

In [1]:
# Activate autoreload so that edited modules are reloaded automatically
%load_ext autoreload
%autoreload 2

In [2]:
import os
from cot import Collection
from cot.generate import FRAGMENTS
from rich.pretty import pprint
import json

## Overview
The ThoughtSource library offers functionality for: 
1) Loading datasets
2) Generating novel chain-of-thought reasoning data and answers
3) Evaluating results
4) Visualizing results on a Web Application

## 1. Loading, sampling and saving a dataset

In [3]:
# load a dataset to sample from 
worldtree = Collection(["worldtree"], verbose=False)
print(worldtree)

Loading worldtree...
| Name      |   Train |   Valid |   Test |
|-----------|---------|---------|--------|
| worldtree |    2207 |     496 |   1664 |

Not loaded: ['aqua', 'asdiv', 'commonsense_qa', 'entailment_bank', 'gsm8k', 'mawps', 'med_qa', 'medmc_qa', 'open_book_qa', 'pubmed_qa', 'qed', 'strategy_qa', 'svamp']


In [6]:
# Randomly select 10 rows from train split
worldtree_10 = worldtree.select(split="train", number_samples=10, random_samples=True, seed=0)
worldtree_10

| Name      |   Train | Valid   | Test   |
|-----------|---------|---------|--------|
| worldtree |      10 | -       | -      |

Not loaded: ['aqua', 'asdiv', 'commonsense_qa', 'entailment_bank', 'gsm8k', 'mawps', 'med_qa', 'medmc_qa', 'open_book_qa', 'pubmed_qa', 'qed', 'strategy_qa', 'svamp']
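
The `seed` argument makes the random selection repeatable. The idea can be illustrated with Python's standard `random` module. This is only an analogy: `sample_indices` is a hypothetical helper, and ThoughtSource's own sampling code may use a different random generator and return different rows.

```python
import random

def sample_indices(n_rows, number_samples, seed):
    """Pick `number_samples` distinct row indices, reproducibly for a given seed."""
    rng = random.Random(seed)  # private generator, does not touch global state
    return rng.sample(range(n_rows), number_samples)

# 2207 is the size of the worldtree train split shown in the table above.
first = sample_indices(2207, 10, seed=0)
second = sample_indices(2207, 10, seed=0)
other = sample_indices(2207, 10, seed=1)

print(first == second)  # same seed, same selection
print(len(set(first)))  # the sampled indices are distinct
```

Fixing the seed is what lets two people compare results on the same ten examples.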

In [9]:
# Note that you could also sample from multiple datasets into one collection like this:
collection_medical = Collection(["med_qa", "pubmed_qa"], verbose=False)
collection_medical_100 = collection_medical.select(split="train", number_samples=100)
collection_medical_100

Loading med_qa...
Loading pubmed_qa...


| Name      |   Train | Valid   | Test   |
|-----------|---------|---------|--------|
| med_qa    |     100 | -       | -      |
| pubmed_qa |     100 | -       | -      |

Not loaded: ['aqua', 'asdiv', 'commonsense_qa', 'entailment_bank', 'gsm8k', 'mawps', 'medmc_qa', 'open_book_qa', 'qed', 'strategy_qa', 'svamp', 'worldtree']

## 2. Generating reasoning chains and extracting answers

#### Using predefined text snippets

ThoughtSource comes pre-loaded with a large [collection of text snippets ('prompt fragments')](https://github.com/OpenBioLink/ThoughtSource/blob/main/libs/cot/cot/fragments.json) to elicit chain-of-thought reasoning in large language models and to extract answers from chains of thought. Let's see what prompt fragments look like:

In [10]:
# Chain of thought prompts
pprint(list(FRAGMENTS["cot_triggers"].items())[:3])

In [11]:
# Answer extraction prompts
pprint(list(FRAGMENTS["answer_extractions"].items())[2:7])

#### Configuration parameters

In [12]:
# overview of all available configurations
from cot.config import Config as config_overview
print(f'\033[94m {config_overview.__doc__[48:]}')

    "instruction_keys": list(str) - Determines which instruction_keys are used from fragments.json,
        the corresponding string will be inserted under "instruction" in the fragments. Default: [None] (No instruction)
    "cot_trigger_keys": list(str) - Determines which cot triggers are used from fragments.json,
        the corresponding string will be inserted under "cot_trigger" in the fragments. Default: ["kojima-01"]
    "answer_extraction_keys": list(str) - Determines which answer extraction prompts are used from fragments.json,
        the corresponding string will be inserted under "answer" in the fragments. Default: ["kojima-01"]
    "template_cot_generation": string - is the model input in the text generation step, variables in brackets.
        Only variables of this list are allowed: "instruction", 'question", "answer_choices", "cot_trigger"
        Default: {instruction}

{question}
{answer_choices}

{cot_trigger}
    "template_answer_extraction": string - is the 

#### Defining your own template (optional)

The default chain-of-thought generation template, as shown above, is: "{instruction}\n\n{question}\n{answer_choices}\n\n{cot_trigger}".<br>
You can also define your own template with a different structure and even free text:

In [9]:
# print("Answer this question:\n{question}\n{answer_choices}\n\nGive a detailed explanation of your answer.")

If you define a custom chain-of-thought generation template, do not forget to also define a custom answer extraction template.

In [10]:
# print("Answer this question:\n{question}\n{answer_choices}\n\nGive a detailed explanation of your answer.{cot}\n{answer_extraction}")
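
To make the placeholder mechanics concrete, the sketch below fills the two custom templates from the cells above with Python's built-in `str.format`. The question, answer choices, and reasoning text are invented illustration values, and whether ThoughtSource substitutes the placeholders in exactly this way internally is an assumption.

```python
# Illustration only: the templates are the ones shown above, the sample
# values are made up, and the library's own substitution may differ.
cot_template = (
    "Answer this question:\n{question}\n{answer_choices}\n\n"
    "Give a detailed explanation of your answer."
)
extraction_template = cot_template + "{cot}\n{answer_extraction}"

sample = {
    "question": "Which form of energy does a stove use to heat a pot?",
    "answer_choices": "A) sound\nB) thermal\nC) light\nD) magnetic",
}

# Prompt for the reasoning step.
cot_prompt = cot_template.format(**sample)

# Prompt for the answer extraction step: same start, plus the generated
# reasoning chain and the answer extraction fragment.
extraction_prompt = extraction_template.format(
    cot="\nThe stove transfers heat to the pot, so the energy is thermal.",
    answer_extraction="Therefore, among A through D, the answer is",
    **sample,
)

print(extraction_prompt)
```

Because the extraction template repeats the generation template and then adds `{cot}` and `{answer_extraction}`, the second prompt always begins with the first one, which is why the two templates have to be changed together.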

### 2.1 Using a Mock-API to create reasoning chains and extract answers

In [16]:
# Configuration of the input and parameters of the language model 
config={
    # We compare three different prompts for the chain of thought generation:
    # "Answer: Let's think step by step.", "Answer: We should think about this step by step.", and "Answer: First,"
    "cot_trigger_keys": ['kojima-01','kojima-02', 'kojima-03'],

    # We use the same answer extraction prompt for all three prompts
    # "Therefore, among A through D, the answer is"
    "answer_extraction_keys": ['kojima-A-D'], 
    
    "author" : "your_name",
    "api_service": "mock_api", # <--- We use a mock API here for demonstration purposes of the tutorial.
    "engine": "", 
    "temperature": 0,
    "max_tokens": 512,
    "verbose": False,
    "warn": True,
}
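
With three chain-of-thought triggers and one answer extraction prompt, this configuration yields three prompt combinations per sample. The sketch below enumerates them with `itertools.product`. The `instruction_cot-trigger_answer-extraction` naming is inferred from the keys in the evaluation output shown later in this notebook, not taken from the library's code, so treat it as an assumption.

```python
from itertools import product

# Keys as used in the config above; [None] is the documented default
# for "instruction_keys" (no instruction).
instruction_keys = [None]
cot_trigger_keys = ["kojima-01", "kojima-02", "kojima-03"]
answer_extraction_keys = ["kojima-A-D"]

# One entry per (instruction, cot trigger, answer extraction) combination.
combinations = [
    f"{instruction}_{cot_trigger}_{answer_extraction}"
    for instruction, cot_trigger, answer_extraction in product(
        instruction_keys, cot_trigger_keys, answer_extraction_keys
    )
]

for name in combinations:
    print(name)
```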

#### Generating reasoning chains and extracting answers in one call

In [17]:
# Generating chains-of-thought and answer extractions
worldtree_10.generate(config=config)

  0%|          | 0/10 [00:00<?, ?ex/s]

TypeError: Couldn't cast array of type
struct<annotation: list<item: null>, answers: list<item: struct<answer: string, answer_extraction: string, answer_extraction_template: string, answer_extraction_text: string, correct_answer: null, id: string>>, api_service: string, author: string, comment: string, cot: string, cot_trigger: string, cot_trigger_template: string, date: string, fragments_version: string, id: string, instruction: null, model: string, prompt_text: string>
to
{'id': Value(dtype='string', id=None), 'fragments_version': Value(dtype='string', id=None), 'instruction': Value(dtype='string', id=None), 'cot_trigger': Value(dtype='string', id=None), 'prompt_text': Value(dtype='string', id=None), 'answers': [{'id': Value(dtype='string', id=None), 'answer_extraction': Value(dtype='string', id=None), 'answer_extraction_text': Value(dtype='string', id=None), 'answer': Value(dtype='string', id=None), 'correct_answer': Value(dtype='bool', id=None)}], 'cot': Value(dtype='string', id=None), 'author': Value(dtype='string', id=None), 'date': Value(dtype='string', id=None), 'api_service': Value(dtype='string', id=None), 'model': Value(dtype='string', id=None), 'comment': Value(dtype='string', id=None), 'annotation': [{'author': Value(dtype='string', id=None), 'date': Value(dtype='string', id=None), 'key': Value(dtype='string', id=None), 'value': Value(dtype='string', id=None)}]}

Because the call above only went to the mock API, we now **load a prepared dataset** with real model answers for the purpose of the tutorial:

In [18]:
# loading a pre-generated example for the purpose of this tutorial
worldtree_10 = Collection.from_json("worldtree_10.json")

KeyError: 'answer_extraction_template'

### 2.2 Using your own API to create reasoning chains and extract answers

ThoughtSource can connect to external AI service providers such as the [OpenAI API](https://openai.com/api/) or the [Hugging Face Hub](https://huggingface.co/docs/hub/index). Set your token, 'api_service' and 'engine' parameters accordingly.

In this tutorial we will use the Hugging Face Hub, which is free. You can create an account and then copy your token from the Hugging Face settings page.

To use the API you need to set the environment variable `HUGGINGFACEHUB_API_TOKEN` to your API token:

In [14]:
# os.environ["HUGGINGFACEHUB_API_TOKEN"] = "<token>"   # <--- set token (can be found on your Hugging Face settings page)
# os.environ["OPENAI_API_KEY"] = "<token>"  # <--- set the token for whichever API you want to use
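
Before calling an external service it can help to confirm that the token is actually visible to the Python process. The following is a minimal sketch using only the standard library; `missing_tokens` is a hypothetical helper and not part of ThoughtSource.

```python
import os

def missing_tokens(names):
    """Return the names of environment variables that are unset or empty."""
    return [name for name in names if not os.environ.get(name)]

# Check the variable used for the Hugging Face Hub in this tutorial.
missing = missing_tokens(["HUGGINGFACEHUB_API_TOKEN"])
if missing:
    print("Please set:", ", ".join(missing))
else:
    print("Token found.")
```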

In [15]:
# Configuration of the input and parameters of the language model 
config={
    # We compare three different prompts for the chain of thought generation:
    # "Answer: Let's think step by step.", "Answer: We should think about this step by step.", and "Answer: First,"
    "cot_trigger_keys": ['kojima-01','kojima-02', 'kojima-03'],

    # We use the same answer extraction prompt for all three prompts
    # "Therefore, among A through D, the answer is"
    "answer_extraction_keys": ['kojima-A-D'], 
    
    "api_service": "huggingface_hub", # <--- Select which API you want to use
    "engine": "google/flan-t5-xl", # <--- Select which model you want to use
    "temperature": 0,
    "max_tokens": 512,
    "verbose": False,
    "warn": True,
}

In [16]:
# Loading just one sample from the dataset so it runs faster
worldtree = Collection(["worldtree"], verbose=False)
worldtree_1 = worldtree.select(split="train", number_samples=1) # just selecting 1 sample, change that if you want to run on more

Loading worldtree...


This is a **two-step process:**
 - The language model first answers a question with a detailed reasoning chain.
 - The language model then extracts the answer from its own reasoning chain.

The code performs both steps in a single call, but it is helpful to keep the underlying two-step process in mind.<br>
Each combination of sample and prompt therefore needs two API calls, as shown in the warning.
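
The two steps can be sketched with a stand-in for the language model. `fake_model` is a hypothetical function that returns canned text and records each call; the question and its reasoning are invented, and the prompt layout is a simplification rather than the library's exact implementation.

```python
calls = []  # every prompt sent to the stand-in model

def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an API call; returns canned text."""
    calls.append(prompt)
    if prompt.rstrip().endswith("the answer is"):
        return " B."
    return " Heat flows from the stove to the pot, so the energy is thermal."

question = "Which form of energy does a stove use to heat a pot?"
choices = "A) sound\nB) thermal\nC) light\nD) magnetic"
cot_trigger = "Answer: Let's think step by step."
answer_extraction = "Therefore, among A through D, the answer is"

# Step 1: ask for a reasoning chain.
cot_prompt = f"{question}\n{choices}\n\n{cot_trigger}"
cot = fake_model(cot_prompt)

# Step 2: append the reasoning chain and ask for the final answer.
extraction_prompt = f"{cot_prompt}{cot}\n{answer_extraction}"
answer = fake_model(extraction_prompt)

print("Reasoning:", cot.strip())
print("Answer:", answer.strip())
print("Model calls:", len(calls))
```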

In [17]:
# Calling the external API takes approximately 30 seconds for this example
worldtree_1.generate(config=config)


        You are about to call an external API in total 6 times, which may produce costs.
        API calls for reasoning chain generation: 1 samples  * 1 instructions  * 3 reasoning chain triggers
        API calls for answer extraction: n_samples  1 samples  * 1 instructions  * 3 reasoning chain triggers * 1 answer extraction triggers 
        Do you want to continue? y/n
        


If you cannot press 'y' because your coding environment is not interactive, set "warn" to False in the config to deactivate the warning.
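
The number of calls in the warning follows directly from the configuration, so it can be estimated before running on a larger sample. The helper below is a hypothetical sketch that reproduces the arithmetic printed in the warning; it is not a ThoughtSource function.

```python
def count_api_calls(n_samples, n_instructions, n_cot_triggers, n_answer_extractions):
    """Return (reasoning calls, extraction calls, total) for one generate() run."""
    # One reasoning call per sample, instruction and chain-of-thought trigger.
    cot_calls = n_samples * n_instructions * n_cot_triggers
    # Each reasoning chain is followed by one call per answer extraction prompt.
    extraction_calls = cot_calls * n_answer_extractions
    return cot_calls, extraction_calls, cot_calls + extraction_calls

# The setup from the warning: 1 sample, 1 instruction, 3 triggers, 1 extraction.
print(count_api_calls(1, 1, 3, 1))

# The same config applied to 10 samples.
print(count_api_calls(10, 1, 3, 1))
```

The first call gives a total of 6, matching the warning; with 10 samples the same configuration would need 60 calls.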

## 3. Evaluation of model answers and downloading all results

In [18]:
worldtree_10.evaluate()

Evaluating worldtree train...


{'worldtree': {'train': {'accuracy': {'text-davinci-003': {'None_kojima-01_kojima-A-D': 0.7,
     'None_kojima-02_kojima-A-D': 0.7,
     'None_kojima-03_kojima-A-D': 0.8}}}}}
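
The result is a nested dictionary (dataset, split, metric, model, prompt combination). A short sketch for flattening it and ranking the prompt combinations follows; the dictionary literal is copied from the output above so that the snippet runs on its own.

```python
# Evaluation result copied from the output above.
results = {'worldtree': {'train': {'accuracy': {'text-davinci-003': {
    'None_kojima-01_kojima-A-D': 0.7,
    'None_kojima-02_kojima-A-D': 0.7,
    'None_kojima-03_kojima-A-D': 0.8}}}}}

# Flatten to rows of (dataset, split, model, prompt combination, accuracy).
rows = []
for dataset, splits in results.items():
    for split, metrics in splits.items():
        for model, scores in metrics["accuracy"].items():
            for prompt_key, accuracy in scores.items():
                rows.append((dataset, split, model, prompt_key, accuracy))

# Highest accuracy first.
rows.sort(key=lambda row: row[-1], reverse=True)
best = rows[0]

for row in rows:
    print(row)
```

With only ten examples, a difference of 0.1 corresponds to a single question, so the ranking should be read as an illustration rather than a reliable comparison.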

Save the file, which now includes the dataset and your generated reasoning chains, extracted answers and evaluation results.

In [19]:
# Save the file that includes all generated chains of thought, answer extractions and the evaluation results
worldtree_10.dump("worldtree_10.json")

If you used your own API, use this code to evaluate and save the results:

In [20]:
# print(worldtree_1.evaluate())
# worldtree_1.dump("worldtree_1.json")

## 4. Inspect the model outputs in the Web Tool

### Here is the link: **[ThoughtSource Annotator](http://thought.samwald.info:3000/)**

**Upload the 'worldtree_10.json' file you just saved** to the web tool.
