## Installation and Imports
Please follow the **installation guide** in the [ThoughtSource Readme file](https://github.com/OpenBioLink/ThoughtSource) before using this notebook.

In [3]:
import os
from cot import Collection
from cot.generate import FRAGMENTS
from rich.pretty import pprint
import json

## Overview
The ThoughtSource library offers functionality for: 
1) Loading datasets
2) Generating novel chain-of-thought reasoning data and answers
3) Evaluating results
4) Visualizing results in a web application

## 1. Loading, sampling and saving a dataset

In [4]:
# load a dataset to sample from 
worldtree = Collection(["worldtree"], verbose=False)
print(worldtree)

Loading worldtree...
| Name      |   Train |   Valid |   Test |
|-----------|---------|---------|--------|
| worldtree |    2207 |     496 |   1664 |

Not loaded: ['aqua', 'asdiv', 'commonsense_qa', 'entailment_bank', 'gsm8k', 'mawps', 'med_qa', 'medmc_qa', 'open_book_qa', 'pubmed_qa', 'qed', 'strategy_qa', 'svamp']


In [5]:
# Randomly select 10 rows from the train split (seed=0 makes the selection reproducible)
worldtree_10 = worldtree.select(split="train", number_samples=10, random_samples=True, seed=0)
worldtree_10

| Name      |   Train | Valid   | Test   |
|-----------|---------|---------|--------|
| worldtree |      10 | -       | -      |

Not loaded: ['aqua', 'asdiv', 'commonsense_qa', 'entailment_bank', 'gsm8k', 'mawps', 'med_qa', 'medmc_qa', 'open_book_qa', 'pubmed_qa', 'qed', 'strategy_qa', 'svamp']
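The `seed` argument is what makes a random selection repeatable across runs. The following is a minimal pure-Python sketch of the idea, not ThoughtSource's implementation. It assumes that sampling draws distinct rows without replacement, and `select_rows` is a hypothetical helper:

```python
import random

def select_rows(rows, number_samples, seed=0):
    """Hypothetical helper: draw `number_samples` distinct rows reproducibly."""
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    return rng.sample(rows, number_samples)  # sampling without replacement

train_rows = list(range(2207))  # stand-in for the 2207 worldtree train rows
first = select_rows(train_rows, 10, seed=0)
second = select_rows(train_rows, 10, seed=0)
print(first == second)  # the same seed gives the same subset
```

If you change the seed, you get a different but equally reproducible subset.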

In [6]:
# Note that you could also sample from multiple datasets into one collection like this:
collection_medical = Collection(["med_qa", "medmc_qa", "pubmed_qa"], verbose=False)
collection_medical_100 = collection_medical.select(split="train", number_samples=100)
collection_medical_100

Loading med_qa...
Loading medmc_qa...
Loading pubmed_qa...


| Name      |   Train | Valid   | Test   |
|-----------|---------|---------|--------|
| med_qa    |     100 | -       | -      |
| medmc_qa  |     100 | -       | -      |
| pubmed_qa |     100 | -       | -      |

Not loaded: ['aqua', 'asdiv', 'commonsense_qa', 'entailment_bank', 'gsm8k', 'mawps', 'open_book_qa', 'qed', 'strategy_qa', 'svamp', 'worldtree']

## 2. Generating reasoning chains and extracting answers

This is a **two-step process:**
1) The language model answers a question with a detailed reasoning chain. 
2) The language model extracts the answer from its own reasoning chain.

The library performs both steps automatically in a single call, but it is helpful to understand the underlying two-step process.
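The two steps can be sketched in plain Python with a fake model. This is an illustration under stated assumptions, not the library's code. `mock_model` and `generate_and_extract` are hypothetical. The templates are modeled on the default generation template and the custom templates shown later in this notebook. The fragment texts come from the `qa-01`, `kojima-01` and `kojima-A-D` keys used in this tutorial:

```python
# Hypothetical stand-in for the language model; ThoughtSource would send the
# prompts to the configured api_service instead.
def mock_model(prompt: str) -> str:
    if prompt.rstrip().endswith("the answer is"):
        return " B."  # step 2: a short, extractable answer
    # step 1: a reasoning chain
    return " Plants use sunlight to make food by photosynthesis. So the answer is B."

# Assumed templates, modeled on the ones shown in this notebook.
COT_TEMPLATE = "{instruction}\n\n{question}\n{answer_choices}\n\n{cot_trigger}"
EXTRACTION_TEMPLATE = COT_TEMPLATE + "{cot}\n{answer_extraction}"

def generate_and_extract(question, answer_choices,
                         instruction="Answer the following question through step-by-step reasoning.",
                         cot_trigger="Answer: Let's think step by step.",
                         answer_extraction="Therefore, among A through D, the answer is"):
    fields = dict(instruction=instruction, question=question,
                  answer_choices=answer_choices, cot_trigger=cot_trigger)
    # Step 1: the model answers the question with a reasoning chain
    cot = mock_model(COT_TEMPLATE.format(**fields))
    # Step 2: the model reads its own chain and states the final answer
    answer = mock_model(EXTRACTION_TEMPLATE.format(**fields, cot=cot,
                                                   answer_extraction=answer_extraction))
    return cot, answer

question = "What do plants need to make their own food?"
answer_choices = "A) soil\nB) sunlight\nC) wind\nD) sand"
cot, answer = generate_and_extract(question, answer_choices)
print(answer.strip())  # B.
```

The second prompt repeats the first one, adds the generated chain, and ends with the extraction phrase. This nudges the model toward a short answer that is easy to parse.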

### Using predefined text snippets

ThoughtSource comes pre-loaded with a large [collection of text snippets ('prompt fragments')](https://github.com/OpenBioLink/ThoughtSource/blob/main/libs/cot/cot/fragments.json) that elicit chain-of-thought reasoning in large language models and extract answers from chains of thought. Let's see what prompt fragments look like:

In [7]:
# Chain of thought prompts
pprint(list(FRAGMENTS["cot_triggers"].items())[:3])

In [8]:
# Answer extraction prompts
pprint(list(FRAGMENTS["answer_extractions"].items())[2:7])

### Setting configuration parameters

In [9]:
# Configuration of the input and parameters of the language model 
config={
    # We compare three different prompts for the chain of thought generation:
    # "Answer: Let's think step by step.", "Answer: We should think about this step by step." and "Answer: First,"
    "cot_trigger_keys": ['kojima-01','kojima-02', 'kojima-03'],

    # We use the same answer extraction prompt for all three prompts
    # "Therefore, among A through D, the answer is"
    "answer_extraction_keys": ['kojima-A-D'], 
    
    "author" : "your_name",
    "api_service": "mock_api", # <--- We use a mock API here for demonstration purposes of the tutorial.
    "engine": "",  # left empty: the mock API does not call a model
    "temperature": 0,
    "max_tokens": 512,
    "verbose": False,
    "warn": True,
}

In [27]:
# overview of all available configurations
from cot.config import Config as config_overview
print(f'\033[94m {config_overview.__doc__[48:]}')

    "instruction_keys": list(str) - Determines which instruction_keys are used from fragments.json,
        the corresponding string will be inserted under "instruction" in the fragments. Default: [None] (No instruction)
    "cot_trigger_keys": list(str) - Determines which cot triggers are used from fragments.json,
        the corresponding string will be inserted under "cot_trigger" in the fragments. Default: ["kojima-01"]
    "answer_extraction_keys": list(str) - Determines which answer extraction prompts are used from fragments.json,
        the corresponding string will be inserted under "answer" in the fragments. Default: ["kojima-01"]
    "template_cot_generation": string - is the model input in the text generation step, variables in brackets.
        Only variables of this list are allowed: "instruction", 'question", "answer_choices", "cot_trigger"
        Default: {instruction}

{question}
{answer_choices}

{cot_trigger}
    "template_answer_extraction": string - is the 

#### Defining your own template (optional)

In [11]:
# The default chain of thought generation template as shown above is: "{instruction}\n\n{question}\n{answer_choices}\n\n{cot_trigger}"
# You could also define your own template with a different structure and even free text.

# config["template_cot_generation"] = "Answer this question:\n{question}\n{answer_choices}\n\nGive a detailed explanation of your answer."

In [12]:
# If you define a custom chain of thought generation template, also define a matching custom answer extraction template, e.g.:

# config["template_answer_extraction"] = "Answer this question:\n{question}\n{answer_choices}\n\nGive a detailed explanation of your answer.{cot}\n{answer_extraction}"
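A typo in a placeholder name is easy to miss in a custom template, so it can help to check the placeholders before calling an API. The sketch below uses Python's `string.Formatter().parse` to list them. The allowed generation variables come from the config overview above. That overview is cut off for the answer extraction template, so the extra `cot` and `answer_extraction` variables are an assumption based on the example template. `template_fields` and `check_template` are hypothetical helpers:

```python
from string import Formatter

# Allowed in the chain of thought generation template (from the config overview)
COT_GENERATION_VARS = {"instruction", "question", "answer_choices", "cot_trigger"}
# Assumption: the answer extraction template also accepts the two placeholders
# used in the example above.
ANSWER_EXTRACTION_VARS = COT_GENERATION_VARS | {"cot", "answer_extraction"}

def template_fields(template: str) -> set:
    """Return the names of all {placeholders} in a str.format-style template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}

def check_template(template: str, allowed: set) -> None:
    """Raise ValueError if the template uses a placeholder that is not allowed."""
    unknown = template_fields(template) - allowed
    if unknown:
        raise ValueError(f"Unsupported template variables: {sorted(unknown)}")

cot_template = ("Answer this question:\n{question}\n{answer_choices}\n\n"
                "Give a detailed explanation of your answer.")
extraction_template = cot_template + "{cot}\n{answer_extraction}"
check_template(cot_template, COT_GENERATION_VARS)
check_template(extraction_template, ANSWER_EXTRACTION_VARS)
print(sorted(template_fields(extraction_template)))
# ['answer_choices', 'answer_extraction', 'cot', 'question']
```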

### Generating reasoning chains and extracting answers in one call

In [13]:
# Generate chains of thought and extract answers (mock API mode: no model is called over an API).
# The warning shows how many model calls are needed for chain of thought generation and for answer extraction.

worldtree_10.generate(config=config)


        You are about to [1m call an external API [0m in total 60 times, which [1m may produce costs [0m.
        API calls for reasoning chain generation: 10 samples  * 1 instructions  * 3 reasoning chain triggers
        API calls for answer extraction: n_samples  10 samples  * 1 instructions  * 3 reasoning chain triggers * 1 answer extraction triggers 
        Do you want to continue? y/n
        [1m Note: You are using a mock api. When entering 'y', a test run without API calls is made. [0m


**If you cannot press 'y' because your coding environment is not interactive, set "warn" to False in the config.**


The call above only went to the mock API. For the **purpose of this tutorial**, we now **load a prepared dataset** with real model answers:

In [14]:
# loading a pre-generated example for the purpose of this tutorial
worldtree_10 = Collection.from_json("worldtree_10.json")

## 3. Evaluation of model answers

In [26]:
worldtree_10.evaluate()

Evaluating worldtree train...



{'worldtree': {'train': {'accuracy': {'text-davinci-003': {'None_kojima-01_kojima-A-D': 0.7,
     'None_kojima-02_kojima-A-D': 0.7,
     'None_kojima-03_kojima-A-D': 0.8}}}}}
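Accuracy is reported separately for each prompt combination. Each key joins the instruction, CoT trigger and answer extraction keys with underscores. Here the first part is `None` because no instruction was configured. With 10 samples, an accuracy of 0.7 means that 7 of the 10 extracted answers were correct.

The sketch below shows this bookkeeping. It assumes a simple exact match between the extracted and the gold answer letter; ThoughtSource's own answer matching may be more elaborate. `accuracy_by_prompt` is hypothetical, and the input rows are toy values, not results:

```python
from collections import defaultdict

def accuracy_by_prompt(results):
    """results: iterable of (instruction_key, cot_trigger_key, answer_extraction_key, predicted, gold)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for instruction, cot_trigger, extraction, predicted, gold in results:
        key = f"{instruction}_{cot_trigger}_{extraction}"  # e.g. "None_kojima-01_kojima-A-D"
        total[key] += 1
        correct[key] += predicted == gold  # True counts as 1
    return {key: correct[key] / total[key] for key in total}

toy_results = [  # toy data for illustration only
    (None, "kojima-01", "kojima-A-D", "A", "A"),
    (None, "kojima-01", "kojima-A-D", "C", "B"),
    (None, "kojima-03", "kojima-A-D", "D", "D"),
]
print(accuracy_by_prompt(toy_results))
# {'None_kojima-01_kojima-A-D': 0.5, 'None_kojima-03_kojima-A-D': 1.0}
```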

In [25]:
# Save the file that now also includes data in the 'correct_answer' fields 
worldtree_10.dump("worldtree_10.json")


## 4. Inspect the model outputs in the Web Tool

### Here is the link: **[ThoughtSource Annotator](http://thought.samwald.info:3000/)**

**Upload the 'worldtree_10.json' file you just saved** to the web tool.


## Next round: Creating your own data with your personal API key

ThoughtSource can connect to external AI service providers such as the [OpenAI API](https://openai.com/api/) or the [Hugging Face Hub](https://huggingface.co/docs/hub/index). Set your token and the 'api_service' and 'engine' parameters accordingly.

In this tutorial we use the Hugging Face Hub, which is free of charge. To use its API, set the environment variable `HUGGINGFACEHUB_API_TOKEN` to your API token.

You can find your token on your Hugging Face settings page and set the environment variable as follows:

In [None]:
# os.environ["HUGGINGFACEHUB_API_TOKEN"] = "<token>"  # <--- set your Hugging Face token (found on your Hugging Face settings page)
# os.environ["OPENAI_API_KEY"] = "<token>"  # <--- alternatively, set this token if you use the OpenAI API
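A missing token usually only shows up once the first API call fails. A small pre-flight check can catch it earlier. The sketch below is hypothetical and not part of ThoughtSource. The environment variable names come from the cell above. The `"openai"` service name is an assumption; check the config overview of your ThoughtSource version for the exact `api_service` values:

```python
import os

# Environment variable read for each service (names from the cell above;
# the "openai" service key is an assumption)
TOKEN_VARIABLES = {
    "huggingface_hub": "HUGGINGFACEHUB_API_TOKEN",
    "openai": "OPENAI_API_KEY",
}

def require_token(api_service: str) -> None:
    """Hypothetical pre-flight check: fail early if the needed API token is missing."""
    variable = TOKEN_VARIABLES.get(api_service)
    if variable is None:
        return  # e.g. "mock_api" makes no external calls
    if not os.environ.get(variable):
        raise RuntimeError(f"Set the {variable} environment variable before using {api_service!r}.")

require_token("mock_api")  # passes: the mock API makes no external calls
# require_token("huggingface_hub")  # call this before collection.generate(config=config)
```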

#### The full process of loading, selecting, generating, evaluating and saving in one cell:

Calling the external API takes approximately 10 seconds.

In [93]:
# 1) Dataset loading and selecting a random sample
collection = Collection(["worldtree"], verbose=False)
collection = collection.select(split="train", number_samples=1) # just using 1 sample for demonstration purposes

# 2) Language Model generates chains of thought and then extracts answers

config={
    "instruction_keys": ['qa-01'], # "Answer the following question through step-by-step reasoning."
    "cot_trigger_keys": ['kojima-01'], # "Answer: Let's think step by step."
    "answer_extraction_keys": ['kojima-A-D'], # "Therefore, among A through D, the answer is"
    "api_service": "huggingface_hub",
    "engine": "google/flan-t5-xl",
    "warn": False,
    "verbose": False,
}
collection.generate(config=config)

# 3) Evaluating answers generated by the model
print(collection.evaluate())

# 4) Saving the generated outputs and evaluation results
# collection.dump("worldtree_1.json")

# 5) Using the ThoughtSource Annotator web tool to inspect your results
# http://thought.samwald.info:3000/

Loading worldtree...
Evaluating worldtree train...
{'worldtree': {'train': {'accuracy': {'google/flan-t5-xl': {'qa-01_kojima-01_kojima-A-D': 1.0}}}}}
