# Example - one evaluation
In this notebook, we walk you through evaluating a single buddy on synthetic questions.

On your Knowledge Buddy platform, create the following:
- A DataSource named "football" in the Collection "knowledge"
- A Buddy named "FootballBuddy" that uses the "football" DataSource as its knowledge.

Once this is done, set up your API credentials in your `.env` file. The cells below read them from environment variables.
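The notebook reads its settings with `os.environ[...]`: `PROJECT_ROOT`, `PEGA_USERNAME`, `PEGA_PASSWORD`, and `PEGA_BASE_URL`. A quick check like the one below (the helper `missing_env_vars` is our own, hypothetical name) catches a missing variable before the first API call. If your environment does not load `.env` automatically, `load_dotenv()` from the python-dotenv package is one common option; that is an assumption about your setup, not a step this notebook performs.

```python
import os
from typing import Mapping

# Environment variables read by the cells in this notebook.
REQUIRED_VARS = ("PROJECT_ROOT", "PEGA_USERNAME", "PEGA_PASSWORD", "PEGA_BASE_URL")


def missing_env_vars(env: Mapping[str, str], required=REQUIRED_VARS) -> list[str]:
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


missing = missing_env_vars(os.environ)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```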

In [1]:
config = "example"
collection = "knowledge" # change this if you want to add your datasource to a different collection

In [2]:
import os
import sys
import json
root = os.environ["PROJECT_ROOT"]
sys.path.append(root)
import pdstools.infinity.client as client
import random
from utils.aync_question import question_async
from functools import partial
import pandas as pd
import yaml
from utils.utils_eval import eval_dataset
from utils.dataset import MultiDocDataSet
import wikipediaapi
import copy
from langchain_core.prompts import ChatPromptTemplate
from tqdm import tqdm

In [3]:
random.seed(0)

with open(os.path.join(root, "configs", "experiments", f"{config}.yaml"), "r") as f:
    config = yaml.safe_load(f)

eval_set_name = config.get("eval_set_name", "human")
num_questions = config.get("num_questions", None)

cl = client.Infinity.from_basic_auth(
    pega_version='24.2',
    timeout=100000,
    user_name=os.environ["PEGA_USERNAME"],
    password=os.environ["PEGA_PASSWORD"])

cl.knowledge_buddy.question_async = partial(question_async, self=cl)

## Data preparation and ingestion
1. **Preparing the corpus**

For this example, we scrape a single Wikipedia page and use it as our corpus.

In [4]:
# Initialize wikipedia scraper
wiki_wiki = wikipediaapi.Wikipedia(user_agent='RAG-Eval', language='en')
scraped_page = wiki_wiki.page("FC Utrecht")

# Collapse every run of whitespace (including newlines) into a single space.
text = " ".join(scraped_page.text.split())

# Format the corpus as {DOC_TITLE : DOC_TEXT}
corpus = {scraped_page.title : text}

# Save the corpus
datapath = f"{root}/data/{config['dataset_name']}"
os.makedirs(datapath, exist_ok=True)
with open(os.path.join(datapath, "corpus.json"), "w", encoding="utf8") as f:
    json.dump(corpus, f, indent=3)

corpus

{'FC Utrecht': 'Football Club Utrecht (Dutch pronunciation: [ɛfˈseː ˈytrɛxt]) is a Dutch professional football club based in Utrecht. The club competes in the Eredivisie, the top tier of Dutch football, and plays its home matches at the Stadion Galgenwaard. The club was formed in 1970 as a merger between local clubs VV DOS, USV Elinkwijk and Velox. Since then, the club has won three national cup tournaments: in 1985, 2003 and 2004, also winning the Johan Cruyff Shield in 2004 as the first club outside the traditional Dutch Big Three. Utrecht is also the only club outside the Big Three which has never suffered relegation from the top-flight Eredivisie. Utrecht have competed in 15 European campaigns, reaching the group stages of the 2004–05 UEFA Cup and the 2010–11 UEFA Europa League, their best European results. History 1970–1979: Merger and early years In the late 1960s, the municipality of Utrecht initiated talks of a merger between the professional departments of VV DOS, Velox and US
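The `{DOC_TITLE: DOC_TEXT}` format extends naturally to several pages. The stdlib sketch below isolates the two steps of the cell above, whitespace normalization and the JSON round trip through `corpus.json`; the helper names and the example page are made up for illustration.

```python
import json
import os
import tempfile


def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    # str.split() with no arguments splits on any whitespace and drops empty
    # strings, so leading and trailing whitespace disappears as well.
    return " ".join(text.split())


def build_corpus(pages: dict[str, str]) -> dict[str, str]:
    """Map each document title to its whitespace-normalized text."""
    return {title: normalize_whitespace(text) for title, text in pages.items()}


# Round-trip a toy corpus through a corpus.json file in a temporary directory.
corpus_demo = build_corpus({"Example page": "Line one.\n\n  Line   two.\t"})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "corpus.json")
    with open(path, "w", encoding="utf8") as f:
        json.dump(corpus_demo, f, indent=3)
    with open(path, encoding="utf8") as f:
        reloaded = json.load(f)

print(reloaded)  # {'Example page': 'Line one. Line two.'}
```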

2. **Ingesting the corpus**

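We now ingest the corpus into the DataSource on the Knowledge Buddy platform. To do this, we load the template ingestion file and, for each document, fill in our own values: the text, DataSource, title, collection, object ID, and chunking parameters (chunk size and overlap, both in characters).
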
In [5]:
dataset_name = config["dataset_name"]
ingestion_template_path = f"{root}/data/ingestion/ingestion_single_doc.json"
chunk_overlap = 200 #chars
chunk_size = 1000 #chars

with open(ingestion_template_path, "rb") as f:
    template = json.load(f)

ingestions = []
for title, content in corpus.items():
    ingestion = copy.deepcopy(template)
    ingestion["text"][0]["content"] = content
    ingestion["dataSource"] = dataset_name
    ingestion["title"] = title
    ingestion["collection"] = collection
    ingestion["objectId"] = title
    ingestion["chunkSize"] = chunk_size
    ingestion["chunkOverlap"] = chunk_overlap

    ingestions.append(ingestion)
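`chunkSize` and `chunkOverlap` are given in characters. To build intuition for these values, here is a hypothetical plain sliding-window splitter. It is an assumption for illustration, not the Knowledge Buddy platform's actual chunking algorithm, which may, for example, respect sentence boundaries.

```python
def chunk_spans(text_length: int, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[tuple[int, int]]:
    """Return (start, end) character offsets of fixed-size, overlapping chunks.

    Assumes a plain sliding window: each chunk starts `chunk_size - chunk_overlap`
    characters after the previous one, and the last chunk may be shorter.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    spans = []
    start = 0
    while True:
        end = min(start + chunk_size, text_length)
        spans.append((start, end))
        if end == text_length:
            break
        start += step
    return spans
```

For example, under this assumption a 2,500-character text yields three chunks starting at characters 0, 800, and 1600, and each consecutive pair shares 200 characters.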

Next, we ingest the corpus. The cell below is commented out to prevent accidental re-ingestion: uncomment it for your first run only, and do not run it more than once.

In [6]:
# for request in tqdm(ingestions):
#     response = cl.put(
#         data=request,
#         endpoint=os.environ["PEGA_BASE_URL"] + "indexes"
#     )
#     # Check every request, not only the last one.
#     assert response.status_code == 202, response.status_code
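The ingestion cell relies on you to run it exactly once. One way to make re-runs harmless is to keep a local record of the `objectId`s the platform has already accepted and skip them. The helper `ingest_once` and the ledger file are our own hypothetical additions, not part of pdstools or the platform; `put` stands for any callable that sends one request and returns an object with a `status_code`.

```python
import json
import os
from typing import Callable, Iterable


def ingest_once(requests: Iterable[dict], put: Callable[[dict], object], ledger_path: str) -> list[str]:
    """Send each ingestion request whose objectId is not yet in the local ledger.

    The ledger is a JSON list of accepted objectIds stored on this machine, so it
    only protects against re-runs from here. Returns the objectIds sent now.
    """
    done = set()
    if os.path.exists(ledger_path):
        with open(ledger_path, encoding="utf8") as f:
            done = set(json.load(f))

    sent = []
    for request in requests:
        object_id = request["objectId"]
        if object_id in done:
            continue  # already ingested in an earlier run
        response = put(request)
        if response.status_code != 202:
            raise RuntimeError(f"Ingestion of {object_id!r} failed with status {response.status_code}")
        done.add(object_id)
        sent.append(object_id)
        # Persist after each success so an interrupted run keeps its progress.
        with open(ledger_path, "w", encoding="utf8") as f:
            json.dump(sorted(done), f)
    return sent


# In this notebook, `put` could wrap the call from the cell above, e.g.:
# ingest_once(ingestions,
#             lambda req: cl.put(data=req, endpoint=os.environ["PEGA_BASE_URL"] + "indexes"),
#             os.path.join(datapath, "ingested.json"))  # hypothetical ledger file
```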

## Generating questions
Now we generate some evaluation questions. The cell below defines the prompt used to generate them. If you edit it, keep the instruction to return JSON with `question` and `reference` as keys. If you add or remove template variables such as `{num_questions_per_document}`, update `prompt_keys` to match.

In [7]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "human",
            """\
                Context information is below.

                -----------------------------
                {context}
                -----------------------------

                Given the context information and not prior knowledge.
                Generate only questions based on the below query.

                You are a Professor. Your task is to set up \
                {num_questions_per_document} questions for an upcoming \
                quiz/examination. The questions should be diverse in nature \
                across the document. The questions should not contain options, not start with Q1/Q2. \
                Restrict the questions to the context information provided.\
                Provide the correct answers together with the questions in json format, using 'question' and 'reference' as keys.\
                Make sure you fact check your work.
            """
        )
    ]
);
prompt_keys = ['context', 'num_questions_per_document']
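Because `prompt_keys` must list exactly the template's placeholders, a mismatch is easy to introduce when editing the prompt. The stdlib sketch below (helper names are ours) extracts `{placeholders}` from an f-string-style template with `string.Formatter` and compares them with `prompt_keys`.

```python
from string import Formatter


def template_variables(template: str) -> set[str]:
    """Return the names of the {placeholders} in an f-string-style template."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for plain text, and "{{" / "}}" escapes are not fields.
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def check_prompt_keys(template: str, prompt_keys: list[str]) -> None:
    """Raise ValueError if prompt_keys and the template's placeholders disagree."""
    found = template_variables(template)
    expected = set(prompt_keys)
    if found != expected:
        raise ValueError(
            f"missing from prompt_keys: {sorted(found - expected)}, "
            f"not in template: {sorted(expected - found)}"
        )


example = "Context:\n{context}\nWrite {num_questions_per_document} questions."
check_prompt_keys(example, ["context", "num_questions_per_document"])  # passes silently
```

With LangChain itself, the same check can be done on the template object, since `ChatPromptTemplate` exposes the variables it expects as `prompt.input_variables`. The escape handling also matters if you ever paste a literal JSON example into the prompt: its braces must be doubled (`{{` and `}}`) so they are not read as variables.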

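Next, we initialize the dataset and generate a few evaluation questions. Because `allow_overwrite=False`, generation is skipped if questions for this dataset already exist, and the saved questions are loaded instead.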

In [8]:
# Initialize the dataset
ds = MultiDocDataSet(
    name=config["dataset_name"],
    experiment_name = config['experiment_name'],
)

# Initialize the generator
ds.init_generator(
    prompt = prompt,
    prompt_keys = prompt_keys
)

# Generate questions if it has not been done yet
try:
    ds.generate_and_save(
        prompt_args = {
            "num_questions_per_document" : 5
            # context is the only prompt arg that is automatically specified.
        },
        data_type=config['dataset_name'],
        allow_overwrite=False,
        min_doc_length=0,
        num_source_docs=config["num_source_docs"]
    )
except FileExistsError:
    print("These questions were already generated.")

# Load questions
ds.load_generated_questions(
    data_type=config['dataset_name'],
    use_doc_ids=True
);

These questions were already generated.
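The prompt asks the model to return question/reference pairs as JSON, which is why that part of the instructions must stay intact. `MultiDocDataSet` handles the parsing internally; the hypothetical `parse_qa_pairs` below only illustrates the kind of validation involved, assuming the reply contains a single JSON list of objects, possibly surrounded by extra text. Locating the list by its outermost brackets is a heuristic, not a robust parser.

```python
import json


def parse_qa_pairs(raw: str) -> list[dict[str, str]]:
    """Extract question/reference pairs from a model reply.

    Assumes the reply contains one JSON list of objects, possibly surrounded by
    a preamble or a markdown fence. Raises ValueError if no list is found or if
    an item is not an object with both required keys.
    """
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON list found in the model reply")
    items = json.loads(raw[start : end + 1])  # JSONDecodeError is a ValueError
    pairs = []
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            raise ValueError(f"item {i} is not a JSON object")
        missing = {"question", "reference"} - set(item)
        if missing:
            raise ValueError(f"item {i} is missing keys: {sorted(missing)}")
        pairs.append({"question": item["question"], "reference": item["reference"]})
    return pairs


reply = (
    "Here are the questions:\n"
    '[{"question": "When was FC Utrecht formed?", "reference": "In 1970."}]'
)
print(parse_qa_pairs(reply))
```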


Here are the generated questions:

In [9]:
pd.options.display.max_colwidth = 1000
questions = pd.DataFrame.from_dict(ds.questions)
questions.style.set_properties(**{'text-align':'left'})

|   | question | reference |
|---|---|---|
| 0 | What year was FC Utrecht founded, and what was the reason for its formation? | FC Utrecht was founded on 1 July 1970 as a merger between local clubs VV DOS, USV Elinkwijk, and Velox to continue professional football at a top level in Utrecht. |
| 1 | Which notable achievement did FC Utrecht accomplish in 2004 that distinguished it from the traditional Dutch Big Three? | In 2004, FC Utrecht won the Johan Cruyff Shield, becoming the first club outside the traditional Dutch Big Three to do so. |
| 2 | What was the outcome of FC Utrecht's first official match, and who scored the club's first goal? | FC Utrecht's first official match was against Feyenoord, where they lost 4–1 despite taking a 0–1 lead with Groenendijk scoring the club's first goal. |
| 3 | What significant financial issues did FC Utrecht face in the early 1980s, and how did the community respond? | In the early 1980s, FC Utrecht faced financial malpractices and was placed under a debt moratorium, leading to a community campaign that collected 66,000 signatures to retain the club. |
| 4 | Who was the first manager of FC Utrecht, and what was the composition of the first-team squad in their inaugural season? | The first manager of FC Utrecht was Bert Jacobs, and the first-team squad in their inaugural season consisted almost entirely of former players from DOS, Velox, and USV Elinkwijk. |


In [10]:
question_descriptives = ds.get_question_descriptives()
print(f"Descriptives of the synthetic QA pairs of the {config['dataset_name']} dataset:")
pd.DataFrame.from_dict(question_descriptives, orient="index", columns=["Value"]).round(2)

Descriptives of the synthetic QA pairs of the football dataset:


|   | Value |
|---|---|
| N questions | 5.0 |
| Mean question length | 17.6 |
| SD question length | 2.24 |
| Shortest question | 14.0 |
| Longest question | 21.0 |
| Mean answer length | 26.8 |
| SD answer length | 3.19 |
| Shortest answer | 22.0 |
| Longest answer | 30.0 |
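The table does not say how length is measured. Counting whitespace-separated words and taking the population standard deviation (dividing by n) reproduces both the question-length and answer-length rows above for these five pairs, so it is a plausible reading; treat it as an assumption about `get_question_descriptives`, not its documented behavior. The helper `length_descriptives` is our own.

```python
from statistics import mean, pstdev


def length_descriptives(texts: list[str], label: str) -> dict[str, float]:
    """Summarize text lengths, counted as whitespace-separated words."""
    lengths = [len(t.split()) for t in texts]
    return {
        f"Mean {label} length": mean(lengths),
        f"SD {label} length": pstdev(lengths),  # population SD (divides by n)
        f"Shortest {label}": min(lengths),
        f"Longest {label}": max(lengths),
    }


questions_demo = [
    "What year was FC Utrecht founded, and what was the reason for its formation?",
    "Which notable achievement did FC Utrecht accomplish in 2004 that distinguished it from the traditional Dutch Big Three?",
    "What was the outcome of FC Utrecht's first official match, and who scored the club's first goal?",
    "What significant financial issues did FC Utrecht face in the early 1980s, and how did the community respond?",
    "Who was the first manager of FC Utrecht, and what was the composition of the first-team squad in their inaugural season?",
]
stats = length_descriptives(questions_demo, "question")
print({k: round(v, 2) for k, v in stats.items()})
# {'Mean question length': 17.6, 'SD question length': 2.24, 'Shortest question': 14, 'Longest question': 21}
```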


## Prediction
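The code below sends every loaded question to each buddy listed in the config. Because `allow_overwrite=False`, buddies that already have saved predictions for this experiment are skipped.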

In [11]:
for buddy in config["buddies"]:
    try:
        await ds.predict_and_save(
            knowledge_buddy_client=cl,
            buddy_name=buddy,
            include_search_results=True,
            allow_overwrite=False
        )
    except FileExistsError:
        print(f"predictions for the {buddy} buddy have already been made. Set overwrite to true or create a new experiment under a different name to make predictions.")

predictions for the FootballBuddy buddy have already been made. Set overwrite to true or create a new experiment under a different name to make predictions.
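`predict_and_save` is awaited because questions are sent asynchronously, through the `question_async` helper patched onto the client earlier. How it schedules requests is not shown here. The following is a generic sketch of bounding concurrency with `asyncio.Semaphore`, where `ask` is a hypothetical coroutine that sends one question to a buddy and returns its answer; capping in-flight requests avoids flooding the platform while keeping answers in question order.

```python
import asyncio
from typing import Awaitable, Callable


async def ask_all(
    questions: list[str],
    ask: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> list[str]:
    """Ask every question concurrently, with at most `max_concurrency` in flight.

    Answers are returned in the same order as `questions`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(question: str) -> str:
        async with semaphore:
            return await ask(question)

    # gather returns results in the order its awaitables were passed in.
    return await asyncio.gather(*(bounded(q) for q in questions))
```

In a notebook cell, call it with `answers = await ask_all(questions, ask)`; in a plain script, wrap it in `asyncio.run(...)` instead.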


In [12]:
output = ds.load_evaluation_dataset("FootballBuddy")
questions["Answer"] = [s.response for s in output.samples]
questions

Loaded FootballBuddy answers and updated predictions path


|   | question | reference | Answer |
|---|---|---|---|
| 0 | What year was FC Utrecht founded, and what was the reason for its formation? | FC Utrecht was founded on 1 July 1970 as a merger between local clubs VV DOS, USV Elinkwijk, and Velox to continue professional football at a top level in Utrecht. | FC Utrecht was founded on 1 July 1970. The reason for its formation was a merger between the professional departments of the local clubs VV DOS, Velox and USV Elinkwijk. |
| 1 | Which notable achievement did FC Utrecht accomplish in 2004 that distinguished it from the traditional Dutch Big Three? | In 2004, FC Utrecht won the Johan Cruyff Shield, becoming the first club outside the traditional Dutch Big Three to do so. | According to the CONTEXT, in 2004 FC Utrecht won the Johan Cruyff Shield at the expense of Ajax (2-1), as Hans Somers claimed a key role with two crucial goals. This was notable as FC Utrecht was the first club outside the traditional Dutch Big Three to win the Johan Cruyff Shield. |
| 2 | What was the outcome of FC Utrecht's first official match, and who scored the club's first goal? | FC Utrecht's first official match was against Feyenoord, where they lost 4–1 despite taking a 0–1 lead with Groenendijk scoring the club's first goal. | According to the CONTEXT, FC Utrecht's first official match was against defending European Cup winners Feyenoord. Groenendijk scored Utrecht's first goal, but despite the 0-1 lead, the team eventually lost 4-1. |
| 3 | What significant financial issues did FC Utrecht face in the early 1980s, and how did the community respond? | In the early 1980s, FC Utrecht faced financial malpractices and was placed under a debt moratorium, leading to a community campaign that collected 66,000 signatures to retain the club. | According to the CONTEXT, in the early 1980s, FC Utrecht faced significant financial malpractices, including not paying national insurance contributions and taxes on signing bonuses, as well as fraud with receipts. This led to the club being placed under a debt moratorium, and bankruptcy seemed inevitable at that point.<br><br>However, the community responded strongly to support the club. Players and supporters organized campaigns, and through a petition, they managed to collect 66,000 signatures for the retention of the club. The municipality of Utrecht eventually decided to respond to the massive local support and covered the club's expenses. |
| 4 | Who was the first manager of FC Utrecht, and what was the composition of the first-team squad in their inaugural season? | The first manager of FC Utrecht was Bert Jacobs, and the first-team squad in their inaugural season consisted almost entirely of former players from DOS, Velox, and USV Elinkwijk. | According to the CONTEXT provided, the first manager of FC Utrecht was Bert Jacobs, the then 29-year-old head coach of Velox, who was joined by 24-year-old Fritz Korbach from USV Elinkwijk as assistant. In the first season, the Utrecht first-team squad consisted almost entirely of former players from DOS, Velox and USV Elinkwijk. Only one outside player was recruited, as defender Co Adriaanse was signed for ƒ 125,000 from De Volewijckers from Amsterdam. The core of the squad also consisted of former DOS players Cor Hildebrand, Ed van Stijn, Piet van Oudenallen, Tom Nieuwenhuys and John Steen Olsen, former Elinkwijk players Joop Leliveld, Jan Blaauw, Dick Teunissen and Jan Groenendijk and former Velox player Marco Cabo. |


## Evaluation

We compute the metrics specified in the config by comparing the buddy's answers with the reference answers. `eval_dataset` returns `results`, which holds the metrics for each individual question, and `simple_results`, which holds values aggregated over all questions: metric averages plus answer-length statistics (mean, SD, minimum, and maximum).

In [13]:
results, simple_results = eval_dataset(ds, config, overwrite=False, upload=False)

Loaded FootballBuddy answers and updated predictions path
Loaded FootballBuddy rejections and updated predictions path
Results are already calculated. If you want to change metrics, set overwrite=True.
loaded results succesfully.
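`simple_results` condenses the per-question `results` into one value per metric. The structure of `results` is not shown in this notebook, so the sketch below assumes a hypothetical shape: one dict of metric scores per question, plus a list of answer lengths. It averages each metric and appends answer-length statistics; the function name and the use of the population SD are our assumptions.

```python
from statistics import mean, pstdev


def summarize(per_question: list[dict[str, float]], answer_lengths: list[int]) -> dict[str, float]:
    """Average each metric over all questions and add answer-length statistics."""
    if not per_question:
        raise ValueError("no per-question results to summarize")
    metric_names = per_question[0].keys()
    summary = {name: mean(row[name] for row in per_question) for name in metric_names}
    summary.update(
        answer_length_mean=mean(answer_lengths),
        answer_length_sd=pstdev(answer_lengths),  # population SD: an assumption
        answer_length_min=min(answer_lengths),
        answer_length_max=max(answer_lengths),
    )
    return summary
```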


In [14]:
df = pd.DataFrame(simple_results).round(3)
df

|   | FootballBuddy |
|---|---|
| bleu_score | 0.177 |
| rouge_score(mode=fmeasure) | 0.444 |
| string_present | 0.0 |
| non_llm_string_similarity | 0.325 |
| semantic_similarity | 0.9 |
| false_negatives | 0.0 |
| answer_length_mean | 65.0 |
| answer_length_sd | 34.963 |
| answer_length_min | 30.0 |
| answer_length_max | 116.0 |
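The table mixes lexical metrics (BLEU, ROUGE, string presence) with `semantic_similarity`. Their exact definitions live in `utils.utils_eval`, which is not shown here. The simplified, hypothetical stand-ins below illustrate why correct but verbose answers, like the ones above, can score high on semantic similarity and low on word overlap: extra words lower precision, and a paraphrase rarely contains the reference verbatim. `string_present` here is one plausible definition (the whole reference occurs in the answer), and `rouge1_f` is a bare unigram F-measure without the stemming or variants a real ROUGE implementation may use.

```python
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens (a simplification)."""
    return re.findall(r"\w+", text.lower())


def string_present(answer: str, reference: str) -> float:
    """1.0 if the whole reference occurs verbatim (case-insensitive) in the answer."""
    return float(reference.strip().lower() in answer.lower())


def rouge1_f(answer: str, reference: str) -> float:
    """Unigram-overlap F-measure between answer and reference (ROUGE-1 style)."""
    a, r = Counter(tokens(answer)), Counter(tokens(reference))
    overlap = sum((a & r).values())  # shared tokens, clipped to the smaller count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(a.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


answer = "FC Utrecht was founded on 1 July 1970."
reference = "FC Utrecht was founded in 1970."
print(string_present(answer, reference), round(rouge1_f(answer, reference), 3))  # 0.0 0.714
```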
