# Knowledge Buddy Inference
This notebook uses the configured knowledge buddies to answer a dataset of question-answer pairs, then evaluates those answers against the reference answers. Next, we generate synthetic question-answer pairs from the corpus and repeat the same experiment. The evaluation metrics are customisable.

In [None]:
config = "sales_five_models"

In [None]:
import os
import sys
root = os.environ["PROJECT_ROOT"]
sys.path.append(root)
import pdstools.infinity.client as client
import random
from utils.aync_question import question_async
from functools import partial
import pandas as pd
import yaml
from utils.utils_eval import eval_dataset
from langchain_core.prompts import ChatPromptTemplate
from utils.dataset import MultiDocDataSet


In [None]:

with open(os.path.join(root, "configs", "experiments", f"{config}.yaml"), "r") as f:
    config = yaml.safe_load(f)

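num_gen_questions_per_document = config.get("num_gen_questions_per_document")
num_source_docs = config.get("num_source_docs")
# Fall back to the total number of generated questions. The product is only
# computed when needed, so it cannot raise a TypeError if num_human_questions is set.
num_human_questions = config.get("num_human_questions")
if num_human_questions is None:
    num_human_questions = num_gen_questions_per_document * num_source_docs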

cl_user = client.Infinity.from_basic_auth(
    pega_version='24.2',
    timeout=100000,
    user_name=os.environ["PEGA_USERNAME"],
    password=os.environ["PEGA_PASSWORD"])

cl_user.knowledge_buddy.question_async = partial(question_async, self=cl_user)
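
The cells above and below read a handful of keys from the experiment YAML file. As a rough sketch, the loaded configuration could be checked before any requests are sent. The key names are the ones this notebook reads; the helper `missing_keys` and all example values are hypothetical and only serve as illustration.

```python
# Keys this notebook reads with config[...] (a KeyError is raised if one is absent).
REQUIRED_KEYS = {
    "dataset_name",
    "experiment_name",
    "buddies",
    "generator_llm_deployment",
    "generated_dataset_name",
    "num_gen_questions_per_document",
    "num_source_docs",
}

# Keys this notebook only reads with config.get(...), so they may be omitted.
OPTIONAL_KEYS = {"num_human_questions", "use_doc_ids", "min_source_doc_length"}


def missing_keys(cfg: dict) -> list[str]:
    """Return the required keys that are absent from cfg, sorted by name."""
    return sorted(REQUIRED_KEYS - set(cfg))


# Hypothetical example values, for illustration only.
example_config = {
    "dataset_name": "example_dataset",
    "experiment_name": "example_experiment",
    "buddies": ["ExampleBuddyA", "ExampleBuddyB"],
    "generator_llm_deployment": "example-deployment",
    "generated_dataset_name": "example_generated",
    "num_gen_questions_per_document": 2,
    "num_source_docs": 5,
}

print(missing_keys(example_config))
```

Running such a check right after `yaml.safe_load` surfaces a misspelled key early, instead of halfway through the experiment.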

## Part 1 - Human dataset
We first make predictions on the human-annotated data. We start by loading the human-annotated questions:

### Inference

In [None]:
ds_kwargs = {}

ds = MultiDocDataSet(
    name=config["dataset_name"],
    experiment_name = config['experiment_name'],
    **ds_kwargs
)

ds.load_human_questions(N=num_human_questions, use_doc_ids=config.get("use_doc_ids", False))

data_descriptives = ds.get_descriptives()
print(f"Descriptives of the {config['dataset_name']} dataset:")
pd.DataFrame.from_dict(data_descriptives, orient="index", columns=["Value"]).round(2)

In [None]:
question_descriptives = ds.get_question_descriptives()
print(f"Descriptives of the human-annotated QA pairs of the {config['dataset_name']} dataset:")
pd.DataFrame.from_dict(question_descriptives, orient="index", columns=["Value"]).round(2)

Now let's put the questions to the buddies:

In [None]:
for buddy in config["buddies"]:
    print(buddy)
    try:
        await ds.predict_and_save(
            knowledge_buddy_client=cl_user,
            buddy_name=buddy,
            include_search_results=True,
            allow_overwrite=False,
            batching=10
        )
    except FileExistsError:
        print(f"Predictions for the {buddy} buddy have already been made. Set allow_overwrite=True or create a new experiment under a different name to make new predictions.")
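
How `predict_and_save` uses `batching=10` internally is not shown in this notebook. One plausible reading is that questions are sent in groups of ten concurrent requests. The sketch below illustrates that pattern with `asyncio.gather`; `ask_in_batches` and `fake_ask` are hypothetical stand-ins, not part of the project or of `pdstools`.

```python
import asyncio


async def ask_in_batches(questions, ask, batch_size=10):
    """Await ask(q) for every question, running at most batch_size calls at once."""
    answers = []
    for start in range(0, len(questions), batch_size):
        batch = questions[start:start + batch_size]
        # gather returns results in the same order as the awaitables it was given.
        answers.extend(await asyncio.gather(*(ask(q) for q in batch)))
    return answers


async def fake_ask(question: str) -> str:
    """Hypothetical stand-in for a request to a knowledge buddy."""
    await asyncio.sleep(0)
    return f"answer to: {question}"


# In a notebook, use `await ask_in_batches(...)` instead of asyncio.run,
# because the notebook already runs an event loop.
answers = asyncio.run(ask_in_batches([f"q{i}" for i in range(25)], fake_ask))
print(len(answers))
```

Batching of this kind caps the number of simultaneous requests while keeping the answers aligned with the order of the questions.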

### Evaluation
We now evaluate the answers by comparing them with the reference answers using the defined metrics.
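
The metrics actually applied are determined by `eval_dataset` and the experiment config, neither of which is shown here. To illustrate what a reference-based metric computes, here is a minimal sketch of token-level F1 between a predicted answer and its reference. `token_f1` is a hypothetical helper and not necessarily one of the project's metrics.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 over lowercased, whitespace-separated tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Two empty answers count as a match; a single empty answer scores 0.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection: each token counts as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(token_f1("the cat", "the cat sat"))
```

In the example call, precision is 1 and recall is 2/3, so the score is 0.8 up to floating-point rounding.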

In [None]:
results, simple_results = eval_dataset(ds, config, overwrite=False, upload=False)

Let's take a look at the results!

In [None]:
df = pd.DataFrame(simple_results).round(3)
df

## Part 2 - Synthetic data
We now repeat this experiment with synthetic data. Adjust the prompt as you like! Just note that the prompt keys ('context' and 'num_questions_per_document' in the default prompt) must match the placeholders in the prompt used to generate the questions.
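
A quick way to check that the prompt keys and the placeholders agree is to extract the placeholder names from the template text. The sketch below assumes f-string-style `{name}` placeholders and uses only the standard library; `template_fields` is a hypothetical helper and the template is a shortened example, not the default prompt.

```python
from string import Formatter


def template_fields(template: str) -> set[str]:
    """Return the names of all {placeholders} found in a format-style template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


# Shortened example template, for illustration only.
template = "Context:\n{context}\n\nWrite {num_questions_per_document} questions."
prompt_keys = ["context", "num_questions_per_document"]

unused_keys = set(prompt_keys) - template_fields(template)
unknown_fields = template_fields(template) - set(prompt_keys)
print(unused_keys, unknown_fields)
```

Both sets are empty when the keys and the placeholders match, so a non-empty set points directly at the mismatched name.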

### Question generation

In [None]:
random.seed(42)
deployment = config["generator_llm_deployment"]
prompt_keys = ['context', 'num_questions_per_document']
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "human",
            """\
                Context information is below.

                -----------------------------
                {context}
                -----------------------------

                Given the context information and not prior knowledge.
                Generate only questions based on the below query.

                You are a Professor. Your task is to set up \
                {num_questions_per_document} questions for an upcoming \
                quiz/examination. The questions should be diverse in nature \
                across the document. The questions should not contain options, not start with Q1/Q2. \
                Restrict the questions to the context information provided.\
                Provide the correct answers together with the questions in json format, using 'question' and 'reference' as keys.\
                Make sure you fact check your work.
            """
        )
    ]
)

num_questions_per_document = config["num_gen_questions_per_document"]

prompt_args = {
    "num_questions_per_document" : num_questions_per_document
    # ADD ADDITIONAL PROMPT ARGS (except for 'context') HERE
}
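
The prompt asks the model to return its questions and answers as JSON with the keys 'question' and 'reference'. How `MultiDocDataSet` parses that reply is not shown here. As a minimal sketch, assuming the reply is bare JSON (no surrounding prose or code fences), a parser could look like this. `parse_qa_pairs` is a hypothetical helper.

```python
import json


def parse_qa_pairs(raw: str) -> list[dict]:
    """Parse a JSON reply and keep only entries with both expected keys."""
    data = json.loads(raw)
    if isinstance(data, dict):
        # Accept a single question-answer object as well as a list of them.
        data = [data]
    if not isinstance(data, list):
        return []
    return [
        {"question": item["question"], "reference": item["reference"]}
        for item in data
        if isinstance(item, dict) and "question" in item and "reference" in item
    ]


# Made-up reply, for illustration only.
reply = """
[
  {"question": "What is covered?", "reference": "Example answer."},
  {"question": "Entry without an answer"}
]
"""
pairs = parse_qa_pairs(reply)
print(pairs)
```

Entries that lack one of the two keys are dropped rather than repaired, which keeps every remaining pair usable for evaluation against a reference.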

In [None]:
ds.init_generator(
    deployment=deployment,
    prompt=prompt,
    prompt_keys=prompt_keys
)

In [None]:
try:
    ds.generate_and_save(
        prompt_args=prompt_args,
        data_type=config["generated_dataset_name"],
        allow_overwrite=False,  # keep False to avoid regenerating questions unnecessarily
        num_source_docs=config["num_source_docs"],
        min_doc_length=config.get("min_source_doc_length", 0),
    )
except FileExistsError:
    print("These questions were already generated.")

# Load the questions whether they were just generated or already existed.
ds.load_generated_questions(data_type=config["generated_dataset_name"], use_doc_ids=True)

question_descriptives = {k:[v] for k,v in ds.get_question_descriptives().items()}
pd.DataFrame(question_descriptives).T
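
Both `predict_and_save` and `generate_and_save` are called with `allow_overwrite=False`, and the notebook catches `FileExistsError` to reuse earlier results. The project's own implementation is not shown; the sketch below only illustrates the general guard pattern with a hypothetical `save_text` helper and a temporary directory.

```python
import tempfile
from pathlib import Path


def save_text(path: Path, text: str, allow_overwrite: bool = False) -> None:
    """Write text to path, refusing to replace an existing file by default."""
    if path.exists() and not allow_overwrite:
        raise FileExistsError(f"{path} already exists")
    path.write_text(text)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "predictions.json"
    save_text(target, "first run")
    try:
        save_text(target, "second run")  # refused: the file already exists
    except FileExistsError:
        print("Results already exist, reusing them.")
    content_after_guard = target.read_text()
    save_text(target, "forced rerun", allow_overwrite=True)
    content_after_overwrite = target.read_text()
```

Keeping the default at `False` means an expensive run is never repeated or replaced by accident; overwriting has to be requested explicitly.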

### Inference

In [None]:
for buddy in config["buddies"]:
    print(f"asking {buddy}...")
    try:
        await ds.predict_and_save(
            knowledge_buddy_client=cl_user,
            buddy_name=buddy,
            include_search_results=True,
            allow_overwrite=False,
            batching=10
        )
    except FileExistsError:
        print(f"Predictions have already been made with the {buddy} buddy.")

### Evaluation


In [None]:
results, simple_results = eval_dataset(ds, config, overwrite=False, upload=False)
df = pd.DataFrame(simple_results).round(3)

In [None]:
df