# Query pipeline evaluation example notebook

This notebook is a usable example of how to evaluate the Kwaak query pipeline. Evaluations are run with RAGAS, and the Kwaak repository itself serves as the data source.

The notebook is meant to be modified and tailored to any repository, to get a sense of how the Kwaak RAG performs on it.

## How does it work
* Generate a RAGAS-compatible dataset with recorded ground truths for a set of questions.
* Review and modify the generated initial answers to establish a ground truth.
* Make some changes to Kwaak, then run the eval step without recording ground truth, using the base file as input.
* `/evals` should then contain `ragas/base.json` and a JSON file for each evaluation.
* Repeat as many times as desired, then use the provided analysis (or do it better than me) to make a comparison.
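
As a rough illustration of the comparison step, the sketch below lines up a later run against the base run by question text. It assumes each JSON file holds a list of records with `question`, `answer` and `ground_truth` fields; `pair_by_question` is a hypothetical helper, not part of Kwaak or RAGAS.

```python
def pair_by_question(base_records, eval_records):
    """Join a base run and a later run on the question text."""
    base_by_question = {record["question"]: record for record in base_records}
    pairs = []
    for record in eval_records:
        base = base_by_question.get(record["question"])
        if base is None:
            continue  # question is not part of the base set, so skip it
        pairs.append({
            "question": record["question"],
            "ground_truth": base["ground_truth"],
            "base_answer": base["answer"],
            "new_answer": record["answer"],
        })
    return pairs


# Tiny in-memory example; real records would come from the JSON files
base = [{"question": "How are tools used by an agent?", "answer": "A", "ground_truth": "A"}]
run = [{"question": "How are tools used by an agent?", "answer": "B", "ground_truth": ""}]
pairs = pair_by_question(base, run)
```

Joining on the question text keeps the comparison independent of row order, which may differ between runs.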

## TODO
- [ ] Provide a predefined base dataset
- [ ] Store datasets in HF?

## Initial question generation

In [80]:
questions = [
    "Explain kwaak works and explain the architecture. Include a mermaid diagram of all the high level components.",
    "I'd like to be able to configure a session in a file, such that users can add their own custom agents. Create a detailed step-by-step plan.",
    "There are multiple uses of channels in the app. Explore how the channels work, interact, relate and explain it in simpel terms from a users perspective.",
    "How are tools used by an agent?",
    "How can I add a tool for an agent?",
]

# Prepare these for the shell: prefix each question with -q, wrap it in double quotes, and join them with spaces
questions_for_shell = " ".join([f'-q "{q}"' for q in questions])
questions_for_shell

'-q "Explain kwaak works and explain the architecture. Include a mermaid diagram of all the high level components." -q "I\'d like to be able to configure a session in a file, such that users can add their own custom agents. Create a detailed step-by-step plan." -q "There are multiple uses of channels in the app. Explore how the channels work, interact, relate and explain it in simpel terms from a users perspective." -q "How are tools used by an agent?" -q "How can I add a tool for an agent?"'
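
The double-quote wrapping above works for the current questions, but it would break on a question that itself contains a double quote, a `$`, or a backtick. A more defensive variant, sketched here with the standard library's `shlex.quote`, could look like this (`questions_to_flags` is a hypothetical helper name):

```python
import shlex


def questions_to_flags(questions):
    """Build a `-q <question>` argument string that is safe to hand to a shell."""
    return " ".join(f"-q {shlex.quote(q)}" for q in questions)


# Questions with characters that plain double-quote wrapping would mishandle
example = ['How are "tools" used by an agent?', "I'd like a $PLAN"]
flags = questions_to_flags(example)
```

Splitting `flags` again with `shlex.split` yields the original questions unchanged, which is a quick way to check the quoting.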

In [92]:
!cd ../.. && RUST_LOG=debug cargo run --features evaluations -- --allow-dirty eval ragas $questions_for_shell --output=evals/ragas/base_raw.json -r

 --> src/cli.rs:3:32
  |
3 | use clap::{Parser, Subcommand, ValueEnum};
  |                                ^^^^^^^^^
  |
  = note: `#[warn(unused_imports)]` on by default

    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.66s
     Running `target/debug/kwaak --allow-dirty eval ragas -q 'Explain kwaak works and explain the architecture. Include a mermaid diagram of all the high level components.' -q 'I'\''d like to be able to configure a session in a file, such that users can add their own custom agents. Create a detailed step-by-step plan.' -q 'There are multiple uses of channels in the app. Explore how the channels work, interact, relate and explain it in simpel terms from a 

In [105]:
import pandas as pd
from datasets import load_dataset
from IPython.display import display

ds = load_dataset("json", data_files={"base": "../../evals/ragas/base_raw.json"})
ds = ds.with_format("pandas")

# Pretty print the dataset as a pandas table
display(ds["base"].to_pandas())

| | answer | contexts | ground_truth | question |
|---|---|---|---|---|
| 0 | The provided context does not give specific in... | [//! Agents defines various agents that can be... | The provided context does not give specific in... | How can I add a tool for an agent? |
| 1 | The context provided doesn't have complete inf... | [Kwaak is free and open-source. You can bring ... | The context provided doesn't have complete inf... | Explain kwaak works and explain the architectu... |
| 2 | The context provided does not include detailed... | [# Architecture\n\nKwaak has a lightweight, ra... | The context provided does not include detailed... | There are multiple uses of channels in the app... |
| 3 | The context does not explicitly describe how t... | [## How is Kwaak different from other tools?\n... | The context does not explicitly describe how t... | How are tools used by an agent? |
| 4 | The provided context does not include explicit... | [### Session Management\n\nKwaak supports runn... | The provided context does not include explicit... | I'd like to be able to configure a session in ... |


In [None]:
import json
import ipywidgets as widgets
from IPython.display import display

# Create a Textarea widget for each answer
textareas = [widgets.Textarea(value=answer.replace('\\n', '\n'), layout=widgets.Layout(width='100%', height='200px')) for answer in ds["base"]["answer"]]

# Display the Textarea widgets with truncated questions as labels
for i, textarea in enumerate(textareas):
    question_label = ds["base"]["question"][i][:100] + "..." if len(ds["base"]["question"][i]) > 100 else ds["base"]["question"][i]
    display(widgets.Label(f"Question {i+1}: {question_label}"))
    display(textarea)

# Function to get the updated answers
def get_updated_answers():
    return [textarea.value for textarea in textareas]

# Button to save the updated answers
save_button = widgets.Button(description="Confirm Answers")
display(save_button)

def on_save_button_clicked(b):
    updated_answers = get_updated_answers()
    print("Updated Answers:")
    for answer in updated_answers:
        print(answer)
    
    # Update the dataset with the new answers
    ds["base"] = ds["base"].map(lambda example, idx: {"answer": updated_answers[idx]}, with_indices=True)
    
    # Save the updated dataset to the JSON file that the next cell loads
    # (Dataset.to_json writes JSON Lines by default)
    ds["base"].to_json("../../evals/ragas/base.json")

save_button.on_click(on_save_button_clicked)

Label(value='Question 1: How can I add a tool for an agent?')

Textarea(value='The provided context does not give specific instructions on how to add a tool for an agent wit…

Label(value='Question 2: Explain kwaak works and explain the architecture. Include a mermaid diagram of all th…

Textarea(value="The context provided doesn't have complete information on how Kwaak works or its full architec…

Label(value='Question 3: There are multiple uses of channels in the app. Explore how the channels work, intera…

Textarea(value="The context provided does not include detailed information on how channels specifically work, …

Label(value='Question 4: How are tools used by an agent?')

Textarea(value="The context does not explicitly describe how tools are used by an agent in detail. However, it…

Label(value="Question 5: I'd like to be able to configure a session in a file, such that users can add their o…

Textarea(value="The provided context does not include explicit step-by-step instructions for configuring a ses…

Button(description='Confirm Answers', style=ButtonStyle())

Map:   0%|          | 0/5 [00:00<?, ? examples/s]

Saving the dataset (0/1 shards):   0%|          | 0/5 [00:00<?, ? examples/s]

In [None]:
# Load the updated dataset from the saved location
updated_ds = load_dataset("json", data_files={"base": "../../evals/ragas/base.json"})
updated_ds = updated_ds.with_format("pandas")

# Pretty print the updated dataset as a pandas table
display(updated_ds["base"].to_pandas())

In [68]:
%pip install pandas datasets ipywidgets


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython -m pip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [69]:
import pandas as pd
from datasets import load_dataset

ds = load_dataset("json", data_files="../../eval.json")
ds = ds.with_format("pandas")
ds


DatasetDict({
    train: Dataset({
        features: ['answer', 'contexts', 'ground_truth', 'question'],
        num_rows: 5
    })
})
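
The dataset above exposes four features: `answer`, `contexts`, `ground_truth` and `question`. Before comparing runs, it can help to confirm that every record in a file carries all of them. The check below is a minimal sketch that assumes the records are available as plain dictionaries; `missing_fields` is a hypothetical helper.

```python
REQUIRED_FIELDS = ("answer", "contexts", "ground_truth", "question")


def missing_fields(records):
    """Return {record index: [missing field names]} for incomplete records."""
    problems = {}
    for index, record in enumerate(records):
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            problems[index] = missing
    return problems


# The second record lacks two of the required fields
records = [
    {"answer": "a", "contexts": ["c"], "ground_truth": "g", "question": "q"},
    {"answer": "a", "question": "q"},
]
problems = missing_fields(records)
```

An empty result means every record has the expected shape.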

In [70]:
ds["train"]["answer"][0]

"The context provides a wealth of information about Kwaak, but does not cover all high-level components or provide a mermaid diagram. Here's what I can provide based on the available context:\n\n### Overview of Kwaak\n\nKwaak is a tool that allows you to run a team of autonomous AI agents from your terminal. These agents can help write, execute, and improve code. It differentiates itself by focusing on utilizing autonomous agents to manage tasks like updating documentation or code quality, allowing the user to focus on higher priority tasks.\n\n### Key Features\n\n- **Autonomous Agents**: Agents can run multiple tasks autonomously.\n- **Code Awareness**: Kwaak is aware of your codebase and can answer questions, write code, execute commands, and create pull requests.\n- **Open-Source**: The tool is open-source and part of the bosun.ai project.\n- **Backend and Frontend**: It includes a lightweight frontend built on top of Ratatui and a backend communicated through dispatched commands.\n