# Generate synthetic test dataset (with RAGAS)

- Author: [Yoonji](https://github.com/samdaseuss)
- Design: 
- Peer Review: 
- This is a part of [LangChain Open Tutorial](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial)

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/99-TEMPLATE/00-BASE-TEMPLATE-EXAMPLE.ipynb) [![Open in GitHub](https://img.shields.io/badge/Open%20in%20GitHub-181717?style=flat-square&logo=github&logoColor=white)](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/99-TEMPLATE/00-BASE-TEMPLATE-EXAMPLE.ipynb)

## Overview

### Welcome Back!
Hi everyone! Welcome to our first lecture in the evaluation section. We're going to try something special today! While we've been building RAG systems, we haven't really talked about how to test if they're working well. To properly evaluate a RAG system, we need good test data - and that's exactly what we'll be creating in this tutorial! We'll learn how to build datasets that will help us measure our RAG pipeline's performance.

### What We'll Learn Today
In this session, we'll focus on using RAGAS to create evaluation datasets for RAG systems. Our main tasks will include:
- Preprocessing documents for evaluation.
- Defining evaluation objects.
- Configuring data distributions to generate various types of test questions.

We'll explore these concepts through hands-on practice, giving you a practical foundation for building evaluation datasets.

### Why this matters...
The goal is to craft datasets that objectively assess the performance of your RAG system. A well-designed test can highlight how your system handles diverse questions and scenarios, revealing both strengths and areas needing improvement.

By the end of this tutorial, you'll have the skills to build robust datasets for comprehensive evaluation. Without further ado, let's get started!

### Table of Contents

- [Overview](#overview)
- [Environment Setup](#environment-setup)
- [Looking Back at What We've Learned](#looking-back-at-what-weve-learned)
- [Installation](#installation)
- [What is RAGAS](#what-is-ragas)
- [RAGAS in Python](#ragas-in-python)
- [Document](#document)
- [Document Preprocessing](#document-preprocessing)
- [Dataset Generation](#dataset-generation)
- [Persona](#persona)
- [Distribution of Question Types](#distribution-of-question-types)

### References

- [Testset Generation for RAG](https://docs.ragas.io/en/stable/getstarted/rag_testset_generation/)
- [Testset Generation for RAG : 📚 Core Concepts > Test Data Generation > RAG](https://docs.ragas.io/en/stable/concepts/test_data_generation/rag/)

----

## Environment Setup

Set up the environment. You may refer to [Environment Setup](https://wikidocs.net/257836) for more details.

**[Note]**
- `langchain-opentutorial` is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials. 
- You can check out the [`langchain-opentutorial`](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) repository for more details.

In [1]:
%%capture --no-stderr
%pip install langchain-opentutorial

In [2]:
# Install required packages
from langchain_opentutorial import package

package.install(
    [
        "langchain",
        "langchain_core",
        "langchain_community",
        "langchain_text_splitters",
        "langchain_openai",
    ],
    verbose=False,
    upgrade=False,
)

In [3]:
# Set environment variables
from langchain_opentutorial import set_env

set_env(
    {
        "OPENAI_API_KEY": "",
        "LANGCHAIN_API_KEY": "",
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
        "LANGCHAIN_PROJECT": "Generate synthetic test dataset (with RAGAS)",
    }
)

Environment variables have been set successfully.


You can alternatively set API keys such as `OPENAI_API_KEY` in a `.env` file and load them.

[Note] This is not necessary if you've already set the required API keys in previous steps.

In [4]:
# Load API keys from .env file
from dotenv import load_dotenv

load_dotenv(override=True)

True
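
Before making any API calls, it can help to confirm that the required keys were actually loaded. The following is a minimal sketch; `missing_keys` is a hypothetical helper written for this illustration, not part of `langchain-opentutorial` or `python-dotenv`.

```python
import os


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    # Fall back to the real process environment when no mapping is given
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the real environment
print(missing_keys(["OPENAI_API_KEY"], env={"OPENAI_API_KEY": ""}))
```

An empty list means every required key has a non-empty value.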

## Looking Back at What We've Learned

### We Have Learned About RAG

LLMs are powerful, but they struggle to reflect real-time information because their knowledge is limited to their training data.

For example, let's say NASA discovered a new planet yesterday, making the total number of planets in the solar system nine. What would happen if we asked an LLM about the number of planets in the solar system? Because an LLM responds based on its training data, it would say there are eight planets. We call this phenomenon '`hallucination`', and to resolve it in the model itself, we would need to wait for a newer model version.

RAG emerged to overcome these limitations. Instead of immediately responding to user questions, the RAG pipeline first searches for the latest information from external knowledge repositories and then generates responses based on this information. This enables the system to provide answers that reflect the most `up-to-date` information.
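
The retrieve-then-generate flow described above can be sketched in a few lines. This is a toy illustration only: the word-overlap retriever and the `retrieve` and `build_prompt` helpers are assumptions made for this example, and a real pipeline would use embeddings, a vector store, and an LLM call.

```python
def retrieve(question, documents, top_k=1):
    """Rank documents by how many question words they share (toy retriever)."""
    q_words = set(question.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]


def build_prompt(question, contexts):
    """Combine retrieved contexts and the question into one prompt string."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {question}"


documents = [
    "NASA announced a ninth planet in the solar system yesterday.",
    "Bananas are rich in potassium.",
]
question = "How many planets are in the solar system?"
prompt = build_prompt(question, retrieve(question, documents))
print(prompt)
```

The prompt that reaches the model now carries the recent announcement, so the answer no longer depends only on what the model saw during training.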

### Is Our RAG Design Effective?

You have learned various techniques for implementing RAG. Some of you may have already built your own RAG systems and applied them to your work.

However, we need to ask an important question: Is our RAG system truly a 'good' RAG? How can we judge the quality of RAG?

Simply saying "this RAG doesn't perform well" is not enough. We need to be able to measure and verify RAG's performance through objective evaluation metrics.

### Why Use Synthetic Test Dataset?

Evaluating the performance of RAG systems is a crucial process. However, manually creating hundreds of question-answer pairs requires enormous time and effort.

Moreover, manually written questions often remain at a simple and superficial level, making it difficult to thoroughly evaluate the performance of RAG systems.

By utilizing synthetic data to solve these problems, we can reduce developer time spent on building test datasets by up to 90%. Additionally, it enables more thorough performance evaluation by automatically generating test cases of various difficulty levels and types.

## Installation

To proceed with this tutorial, you need to install the `ragas` package. The command below installs it; afterwards, we'll explore the concept of RAGAS and then look at the Python package in detail.

In [5]:
%pip install -qU ragas

Note: you may need to restart the kernel to use updated packages.


## What is RAGAS?
RAGAS (Retrieval Augmented Generation Assessment) is an evaluation framework designed to assess the performance of RAG systems. It helps developers and researchers measure how well their RAG implementations are working through various metrics and evaluation methods.

Let's revisit the example we saw earlier.

Let's say NASA discovered a new planet yesterday, making the total number of planets in our solar system nine. To evaluate the performance of a RAG system, let's ask the test question "How many planets are in our solar system?" RAGAS evaluates the system's response using these key metrics:

1. `Answer Relevancy`: Checks if the answer directly addresses the question about the number of planets
2. `Context Relevancy`: Checks if the system retrieved the recent NASA announcement instead of old astronomy textbooks
3. `Faithfulness`: Checks if the answer about nine planets is based on the NASA announcement and not on outdated data
4. `Context Precision`: Checks if the system used the NASA announcement efficiently without including unnecessary space information

For example, if the RAG system responds with **outdated information** saying there are eight planets, RAGAS will give it a low context relevancy score. Or if it makes claims about the new planet that aren't in the NASA announcement, it will receive a low faithfulness score.
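
To make the idea behind a faithfulness-style score concrete, here is a toy sketch: the score is the fraction of claims in the answer that are supported by the retrieved context. The substring check and the `toy_faithfulness` function are stand-ins assumed for this illustration; Ragas itself relies on an LLM to extract and verify claims.

```python
def toy_faithfulness(claims, context):
    """Fraction of claims whose text appears verbatim in the context."""
    if not claims:
        return 0.0
    supported = sum(1 for claim in claims if claim.lower() in context.lower())
    return supported / len(claims)


context = "NASA confirmed a ninth planet. The planet orbits beyond Neptune."
claims = ["a ninth planet", "orbits beyond Neptune", "has three moons"]
score = toy_faithfulness(claims, context)
print(round(score, 2))
```

Two of the three claims are backed by the context, so the unsupported claim about moons pulls the score down.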

## RAGAS in Python
You can easily use `RAGAS` with Python libraries.

Ragas is a library that provides tools to supercharge the evaluation of Large Language Model (LLM) applications. It is designed to help you evaluate your LLM applications with ease and confidence.

## Document
While the official RAGAS package website demonstrates tutorials using `markdown`, in this tutorial, we'll be working with `pdf` files. Please use the files located in the `data` folder.

In [6]:
file_path = 'data/Newwhitepaper_Agents2.pdf'

## Document Preprocessing

In [7]:
from langchain_community.document_loaders import PDFPlumberLoader

# Create document loader
loader = PDFPlumberLoader(file_path)

# Load documents
docs = loader.load()

# Exclude table of contents and last page
docs = docs[3:-1]

# Get the number of document pages
len(docs)

38

Each document object includes a metadata dictionary that can be used to store additional information about the document, which can be accessed through `metadata`.

Please check if the metadata dictionary contains a key called `filename`.

This key is used during test dataset generation: the `filename` attribute in metadata identifies chunks that belong to the same document.

In [8]:
# Set metadata ('filename' must exist)
for doc in docs:
    doc.metadata["filename"] = doc.metadata["source"]
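
If some documents already carry a `filename`, a slightly more defensive variant avoids overwriting it. The sketch below uses plain dictionaries as stand-ins for `doc.metadata`; `ensure_filename` is a hypothetical helper, not a LangChain or Ragas function.

```python
def ensure_filename(metadata_list, fallback_key="source"):
    """Copy `fallback_key` into 'filename' wherever 'filename' is missing."""
    for metadata in metadata_list:
        if "filename" not in metadata:
            # Raises KeyError if the fallback key is also absent
            metadata["filename"] = metadata[fallback_key]
    return metadata_list


# Plain dicts stand in for `doc.metadata` here
sample = [{"source": "data/a.pdf"}, {"source": "data/b.pdf", "filename": "b.pdf"}]
ensure_filename(sample)
print([m["filename"] for m in sample])
```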

In [9]:
docs

[Document(metadata={'source': 'data/Newwhitepaper_Agents2.pdf', 'file_path': 'data/Newwhitepaper_Agents2.pdf', 'page': 3, 'total_pages': 42, 'CreationDate': "D:20241113100853-07'00'", 'Creator': 'Adobe InDesign 20.0 (Macintosh)', 'ModDate': "D:20241113100858-07'00'", 'Producer': 'Adobe PDF Library 17.0', 'Trapped': 'False', 'filename': 'data/Newwhitepaper_Agents2.pdf'}, page_content="Agents\nThis combination of reasoning,\nlogic, and access to external\ninformation that are all connected\nto a Generative AI model invokes\nthe concept of an agent.\nIntroduction\nHumans are fantastic at messy pattern recognition tasks. However, they often rely on tools\n- like books, Google Search, or a calculator - to supplement their prior knowledge before\narriving at a conclusion. Just like humans, Generative AI models can be trained to use tools\nto access real-time information or suggest a real-world action. For example, a model can\nleverage a database retrieval tool to access specific information

## Dataset Generation
We'll create datasets using ChatOpenAI. Before writing the code, let's define the roles of our objects:
- Dataset Generator: `generator_llm`
- Dataset Critic: `critic_llm` (not defined in the code below, which uses only a generator model)
- Document Embeddings: `embeddings`

In [11]:
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from ragas.testset.graph import KnowledgeGraph
from ragas.testset.graph import Node, NodeType
from ragas.embeddings.base import embedding_factory


# Dataset Generator
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))

# Document Embeddings
embeddings = embedding_factory()

First, let's build the knowledge graph. Each loaded page becomes a `DOCUMENT` node, and we also prepare the LLM and embeddings wrappers that Ragas will use.

In [12]:
# Configure the text splitter
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100,
)

# Wrap LangChain's ChatOpenAI model with LangchainLLMWrapper to make it compatible with Ragas
langchain_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))

# Create ragas_embeddings
ragas_embeddings = LangchainEmbeddingsWrapper(embeddings)

# Add every loaded page to the knowledge graph as a DOCUMENT node
kg = KnowledgeGraph()
for doc in docs:
    kg.nodes.append(
        Node(
            type=NodeType.DOCUMENT,
            properties={
                "page_content": doc.page_content,
                "document_metadata": doc.metadata,
            },
        )
    )


### Self Check

```python
print(len(kg.nodes))
```
Run this code to verify that knowledge graph nodes have been created. If no nodes were created, subsequent code may fail.

```python
for node in kg.nodes:
    print(node.properties)
```

In [13]:
# check relationships
print("Total number of nodes:", len(kg.nodes))
print("Total number of relationships:", len(kg.relationships))
# Note: find_indirect_clusters is a method, so it has to be called,
# e.g. len(kg.find_indirect_clusters()), to count clusters.

Total number of nodes: 38
Total number of relationships: 0


Now we will establish relationships between nodes in the knowledge graph.

In [14]:
from ragas.testset.transforms.extractors.llm_based import NERExtractor
from ragas.testset.transforms.splitters import HeadlineSplitter

transforms = [HeadlineSplitter(), NERExtractor()]

In [15]:
from ragas.testset.transforms import apply_transforms
from ragas.testset.transforms import (
    HeadlinesExtractor,
    HeadlineSplitter,
    OverlapScoreBuilder,
    KeyphrasesExtractor
)
            
# Initialize the key phrase extractor using the LLM defined above
keyphrase_extractor = KeyphrasesExtractor(
    llm=langchain_llm, property_name="keyphrases", max_num=10
)
headline_extractor = HeadlinesExtractor(llm=generator_llm)
headline_splitter = HeadlineSplitter(min_tokens=300, max_tokens=1000)
relation_builder = OverlapScoreBuilder(
    property_name="keyphrases",
    new_property_name="overlap_score",
    threshold=0.01,
    distance_threshold=0.9,
)

transforms = [
    headline_extractor,
    headline_splitter,
    keyphrase_extractor,
    relation_builder,
]
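
To build intuition for what an overlap-based relation builder does, the sketch below scores how many keyphrases of one node have a near-match in another node. It is a simplified stand-in, not the Ragas implementation: `toy_overlap_score` is hypothetical, and `difflib.SequenceMatcher` is swapped in for whatever string-distance measure `OverlapScoreBuilder` uses internally.

```python
from difflib import SequenceMatcher


def toy_overlap_score(phrases_a, phrases_b, similarity_threshold=0.9):
    """Share of phrases in A that have a near-match in B (toy stand-in)."""
    if not phrases_a:
        return 0.0, []
    matches = []
    for a in phrases_a:
        for b in phrases_b:
            ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if ratio >= similarity_threshold:
                matches.append((a, b))
                break  # count each phrase in A at most once
    return len(matches) / len(phrases_a), matches


score, pairs = toy_overlap_score(
    ["API endpoint", "data store"], ["API endpoints", "reasoning loop"]
)
print(score, pairs)
```

Two nodes would then be linked when the score exceeds a chosen threshold, which is the role `threshold` plays in the configuration above.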

In [16]:
for node in kg.nodes:
    if 'keyphrases' in node.properties:
        del node.properties['keyphrases']

apply_transforms(kg, transforms=transforms)

Applying HeadlinesExtractor:   0%|          | 0/38 [00:00<?, ?it/s]

Applying HeadlineSplitter:   0%|          | 0/38 [00:00<?, ?it/s]

Applying KeyphrasesExtractor:   0%|          | 0/76 [00:00<?, ?it/s]

Property 'keyphrases' already exists in node 'c4a111'. Skipping!
Property 'keyphrases' already exists in node '5524a7'. Skipping!
Property 'keyphrases' already exists in node 'f86a9a'. Skipping!
Property 'keyphrases' already exists in node 'e88e53'. Skipping!
Property 'keyphrases' already exists in node '677cb7'. Skipping!
Property 'keyphrases' already exists in node 'd94dcd'. Skipping!
Property 'keyphrases' already exists in node '81d528'. Skipping!
Property 'keyphrases' already exists in node '72e7f8'. Skipping!
Property 'keyphrases' already exists in node '3d1142'. Skipping!
Property 'keyphrases' already exists in node 'e54d79'. Skipping!
Property 'keyphrases' already exists in node 'f42670'. Skipping!
Property 'keyphrases' already exists in node '8fb770'. Skipping!
Property 'keyphrases' already exists in node 'bdb19f'. Skipping!
Property 'keyphrases' already exists in node 'e6256c'. Skipping!
Property 'keyphrases' already exists in node '81e387'. Skipping!
Property 'keyphrases' alr

Applying OverlapScoreBuilder:   0%|          | 0/1 [00:00<?, ?it/s]

In [17]:
from ragas.testset import TestsetGenerator
clusters = kg.find_indirect_clusters()
generator = TestsetGenerator(
    llm=generator_llm,
    embedding_model=ragas_embeddings,
    knowledge_graph=kg,
)

In [18]:
# check relationships
print("Total number of nodes:", len(kg.nodes))
print("Total number of relationships:", len(kg.relationships))
# Note: the clusters computed in the previous cell are stored in `clusters`;
# use len(clusters) to count them.

Total number of nodes: 76
Total number of relationships: 318
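
As a rough mental model of what a cluster of related nodes is, the sketch below groups node ids into connected components based on their relationships. This is a simplified illustration under stated assumptions: `connected_clusters` is hypothetical, and `KnowledgeGraph.find_indirect_clusters` may apply extra rules (such as relationship conditions or depth limits) that are not modeled here.

```python
from collections import defaultdict


def connected_clusters(node_ids, relationships):
    """Group node ids into clusters linked by at least one relationship."""
    neighbors = defaultdict(set)
    for source, target in relationships:
        neighbors[source].add(target)
        neighbors[target].add(source)

    seen, clusters = set(), []
    for start in node_ids:
        if start in seen:
            continue
        # Depth-first walk that collects everything reachable from `start`
        stack, cluster = [start], set()
        while stack:
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(neighbors[node] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters


clusters_demo = connected_clusters(["a", "b", "c", "d"], [("a", "b"), ("b", "c")])
print(len(clusters_demo))
```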


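## Persona

A persona describes a type of user who might query the system. The personas defined below are passed to the synthesizers so that generated questions reflect different roles and interests.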

In [19]:
from ragas.testset.persona import Persona

persona1 = Persona(
    name="AI Researcher",
    role_description="""A theoretical machine learning researcher who investigates causality and fundamental mechanisms in AI systems. 
    Specializes in analyzing why certain AI architectures outperform others, examining the reasoning patterns of different models, 
    and understanding the underlying principles that drive AI system behaviors. 
    Focuses on hypothesis-driven research to uncover relationships between AI components and their impact on system performance, 
    while constantly questioning and analyzing how different approaches affect problem-solving capabilities."""
)

persona2 = Persona(
    name="AI Product Manager",
    role_description="An AI product manager focused on developing intelligent applications that enhance user experience through personalized recommendations and predictive analytics."
)

persona3 = Persona(
    name="AI Ethicist",
    role_description="An AI ethicist concerned about the societal impacts of AI systems and how to build them responsibly with strong principles around fairness and transparency."
)

persona4 = Persona(
    name="AI Developer",
    role_description="A software engineer building custom AI-powered workflows and automation tools to streamline business operations."
)

persona_list = [persona1, persona2, persona3, persona4]

In [20]:
simple_persona_list = [persona1, persona2]
reasoning_persona_list = [persona1]
multi_context_persona_list = [persona3, persona4]
conditional_persona_list = [persona2, persona4]

## Distribution of Question Types
Before we begin generating questions, let's first define the distribution (frequency) of questions by type. The code in this section uses **SingleHopSpecificQuerySynthesizer** for `simple` questions and a custom subclass of **MultiHopQuerySynthesizer** for the other three types. Ragas also ships **MultiHopAbstractQuerySynthesizer** and **MultiHopSpecificQuerySynthesizer** as ready-made alternatives. We aim to create a test set with the following distribution of question types:

- `simple`: Basic questions (40%), generated by **SingleHopSpecificQuerySynthesizer**
- `reasoning`: Questions requiring reasoning (20%), generated by a custom **MultiHopQuerySynthesizer** subclass
- `multi_context`: Questions requiring consideration of multiple contexts (20%), generated by a custom **MultiHopQuerySynthesizer** subclass
- `conditional`: Conditional questions (20%), generated by a custom **MultiHopQuerySynthesizer** subclass

### Role of the synthesizers Module
The synthesizers module in Ragas is a core module responsible for Query Synthesis. It provides functionality to generate various types of questions based on documents stored in the Knowledge Graph. This module is used to automatically generate test sets for evaluating RAG (Retrieval-Augmented Generation) systems.

In [21]:
from ragas.testset.synthesizers.single_hop.specific import SingleHopSpecificQuerySynthesizer

In [22]:
simple_synthesizer = SingleHopSpecificQuerySynthesizer(llm=generator_llm)

In [23]:
from dataclasses import dataclass
import typing as t
from ragas.testset.synthesizers.multi_hop.base import (
    MultiHopQuerySynthesizer,
    MultiHopScenario,
)
from ragas.testset.synthesizers.prompts import (
    ThemesPersonasInput,
    ThemesPersonasMatchingPrompt,
)

@dataclass
class NewMultiHopQuery(MultiHopQuerySynthesizer):

    theme_persona_matching_prompt = ThemesPersonasMatchingPrompt()

    async def _generate_scenarios(
        self,
        n: int,
        knowledge_graph,
        persona_list,
        callbacks,
    ) -> t.List[MultiHopScenario]:

        # query and get (node_a, rel, node_b) to create multi-hop queries
        # Note: this reads the module-level `kg` built earlier rather than
        # the `knowledge_graph` argument.
        results = kg.find_two_nodes_single_rel(
            relationship_condition=lambda rel: rel.type == "keyphrases_overlap"
        )

        num_sample_per_triplet = max(1, n // len(results))

        scenarios = []
        for triplet in results:
            if len(scenarios) < n:
                node_a, node_b = triplet[0], triplet[-1]
                overlapped_keywords = triplet[1].properties["overlapped_items"]
                if overlapped_keywords:

                    # match the keyword with a persona for query creation
                    themes = list(dict(overlapped_keywords).keys())
                    prompt_input = ThemesPersonasInput(
                        themes=themes, personas=persona_list
                    )
                    persona_concepts = (
                        await self.theme_persona_matching_prompt.generate(
                            data=prompt_input, llm=self.llm, callbacks=callbacks
                        )
                    )

                    overlapped_keywords = [list(item) for item in overlapped_keywords]

                    # prepare and sample possible combinations
                    base_scenarios = self.prepare_combinations(
                        [node_a, node_b],
                        overlapped_keywords,
                        personas=persona_list,
                        persona_item_mapping=persona_concepts.mapping,
                        property_name="keyphrases",
                    )

                    # get number of required samples from this triplet
                    base_scenarios = self.sample_diverse_combinations(
                        base_scenarios, num_sample_per_triplet
                    )

                    scenarios.extend(base_scenarios)

        return scenarios
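
One detail worth noting in `_generate_scenarios`: the expression `n // len(results)` raises `ZeroDivisionError` when no relationship matches the condition. Below is a minimal sketch of a guarded version of that calculation; `samples_per_triplet` is a hypothetical helper for illustration and is not part of Ragas.

```python
def samples_per_triplet(n, num_triplets):
    """Number of scenarios to draw from each (node_a, rel, node_b) triplet."""
    if num_triplets == 0:
        # Fail with a clear message instead of a ZeroDivisionError
        raise ValueError("No matching relationships found in the knowledge graph.")
    return max(1, n // num_triplets)


print(samples_per_triplet(10, 3))
print(samples_per_triplet(2, 50))
```

When there are more triplets than requested scenarios, each triplet contributes a single sample, and the `len(scenarios) < n` check in the loop stops collection once enough have been gathered.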

In [24]:
query = NewMultiHopQuery(llm=generator_llm)
scenarios2 = await query.generate_scenarios(n=2, knowledge_graph=kg, persona_list=reasoning_persona_list)

scenarios2

[MultiHopScenario(
 nodes=2
 combinations=['foundational components', 'foundational models']
 style=QueryStyle.WEB_SEARCH_LIKE
 length=QueryLength.LONG
 persona=name='AI Researcher' role_description='A theoretical machine learning researcher who investigates causality and fundamental mechanisms in AI systems. \n    Specializes in analyzing why certain AI architectures outperform others, examining the reasoning patterns of different models, \n    and understanding the underlying principles that drive AI system behaviors. \n    Focuses on hypothesis-driven research to uncover relationships between AI components and their impact on system performance, \n    while constantly questioning and analyzing how different approaches affect problem-solving capabilities.'),
 MultiHopScenario(
 nodes=2
 combinations=['ReAct reasoning', 'ReAct reasoning/planning']
 style=QueryStyle.WEB_SEARCH_LIKE
 length=QueryLength.MEDIUM
 persona=name='AI Researcher' role_description='A theoretical machine learning

In [25]:
reasoning_synthesizer = NewMultiHopQuery(llm=generator_llm)

In [26]:
query = NewMultiHopQuery(llm=generator_llm)
scenarios3 = await query.generate_scenarios(n=2, knowledge_graph=kg, persona_list=multi_context_persona_list)

scenarios3

[MultiHopScenario(
 nodes=2
 combinations=['Generative AI model', 'GenerativeModel']
 style=QueryStyle.POOR_GRAMMAR
 length=QueryLength.LONG
 persona=name='AI Developer' role_description='A software engineer building custom AI-powered workflows and automation tools to streamline business operations.'),
 MultiHopScenario(
 nodes=2
 combinations=['API endpoint', 'API endpoints']
 style=QueryStyle.MISSPELLED
 length=QueryLength.LONG
 persona=name='AI Developer' role_description='A software engineer building custom AI-powered workflows and automation tools to streamline business operations.')]

In [27]:
multi_context_synthesizer = NewMultiHopQuery(llm=generator_llm)

In [28]:
query = NewMultiHopQuery(llm=generator_llm)
scenarios4 = await query.generate_scenarios(n=2, knowledge_graph=kg, persona_list=conditional_persona_list)

scenarios4

[MultiHopScenario(
 nodes=2
 combinations=['Generative AI model', 'GenerativeModel']
 style=QueryStyle.PERFECT_GRAMMAR
 length=QueryLength.SHORT
 persona=name='AI Product Manager' role_description='An AI product manager focused on developing intelligent applications that enhance user experience through personalized recommendations and predictive analytics.'),
 MultiHopScenario(
 nodes=2
 combinations=['model', 'model']
 style=QueryStyle.POOR_GRAMMAR
 length=QueryLength.SHORT
 persona=name='AI Product Manager' role_description='An AI product manager focused on developing intelligent applications that enhance user experience through personalized recommendations and predictive analytics.')]

In [29]:
conditional_synthesizer = NewMultiHopQuery(llm=generator_llm)

In [30]:
# Set distribution by question type
distribution = [
    (simple_synthesizer, 0.4),         # simple: 40%
    (reasoning_synthesizer, 0.2),      # reasoning: 20%
    (multi_context_synthesizer, 0.2),  # multi_context: 20%
    (conditional_synthesizer, 0.2),    # conditional: 20%
]
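
To see how such weights translate into question counts, the sketch below splits a test set size across the weights with largest-remainder rounding. This is an illustration under assumptions: `allocate` is a hypothetical helper, and Ragas may split the counts differently internally.

```python
def allocate(weights, total):
    """Split `total` items across weights using largest-remainder rounding."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    raw = [w * total for w in weights]
    counts = [int(r) for r in raw]
    # Hand out leftover items to the largest fractional parts first
    leftovers = total - sum(counts)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:leftovers]:
        counts[i] += 1
    return counts


print(allocate([0.4, 0.2, 0.2, 0.2], 10))
```

For a test set of 10 questions, the 40/20/20/20 split corresponds to 4 simple questions and 2 of each remaining type.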

In [34]:
dataset = generator.generate_with_langchain_docs(
    documents=docs,                   # document data
    testset_size=10,                  # number of questions to generate
    query_distribution=distribution,  # distribution by question type
    with_debugging_logs=True,         # output debugging logs
)

Applying SummaryExtractor:   0%|          | 0/36 [00:00<?, ?it/s]

Applying CustomNodeFilter:   0%|          | 0/38 [00:00<?, ?it/s]

Node 0090a019-5288-498b-aba9-9f64e974e294 does not have a summary. Skipping filtering.
Node b385d3c9-c317-4ef9-b265-05b3e3ae0fe2 does not have a summary. Skipping filtering.


Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:   0%|          | 0/112 [00:00<?, ?it/s]

Applying OverlapScoreBuilder:   0%|          | 0/1 [00:00<?, ?it/s]

Generating Scenarios:   0%|          | 0/4 [00:00<?, ?it/s]

Generating Samples:   0%|          | 0/10 [00:00<?, ?it/s]

In [35]:
dataset.to_pandas()

| | user_input | reference_contexts | reference | synthesizer_name |
|---|---|---|---|---|
| 0 | Wht is an API? | `[Agents\nThis combination of reasoning,\nlogic...` | An API can be used by a model to make various ... | single_hop_specifc_query_synthesizer |
| 1 | What are the three essential components in an ... | `[Agents\nWhat is an agent?\nIn its most fundam...` | The three essential components in an agent's c... | single_hop_specifc_query_synthesizer |
| 2 | What Chain-of-Thought do in agent model, how i... | `[Agents\nFigure 1. General agent architecture ...` | Chain-of-Thought is a reasoning and logic fram... | single_hop_specifc_query_synthesizer |
| 3 | Waht is the DELETE method used for? | `[Agents\nThe tools\nFoundational models, despi...` | The DELETE method is a common web API method t... | single_hop_specifc_query_synthesizer |
| 4 | How do foundational components contribute to t... | `[<1-hop>\n\nAgents\ncombining specialized agen...` | Foundational components contribute to the cogn... | NewMultiHopQuery |
| 5 | How agents use tools and data stores for retri... | `[<1-hop>\n\nAgents\nThe tools\nFoundational mo...` | Agents use tools to bridge the gap between the... | NewMultiHopQuery |
| 6 | How do Extensions facilitate the use of API en... | `[<1-hop>\n\nAgents\nA more resilient approach ...` | Extensions facilitate the use of API endpoints... | NewMultiHopQuery |
| 7 | How do Generative AI models utilize reasoning,... | `[<1-hop>\n\nAgents\nWhat is an agent?\nIn its ...` | Generative AI models function as agents by uti... | NewMultiHopQuery |
| 8 | How do Extensions facilitate the use of API en... | `[<1-hop>\n\nAgents\nA more resilient approach ...` | Extensions facilitate the use of API endpoints... | NewMultiHopQuery |
| 9 | How do developers maintain control over data f... | `[<1-hop>\n\nAgents\nData stores\nImagine a lan...` | Developers maintain control over data flow whe... | NewMultiHopQuery |


In [36]:
dataset.to_pandas().to_csv("data/ragas_synthetic_dataset.csv", index=False)
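
One thing to keep in mind when reloading the saved file: list-valued columns such as `reference_contexts` are written to CSV as text, so they come back as strings. The sketch below shows the round trip with the standard library, using an in-memory buffer as a stand-in for the CSV file and made-up sample rows.

```python
import ast
import csv
import io

# In-memory stand-in for data/ragas_synthetic_dataset.csv
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["user_input", "reference_contexts"])
writer.writeheader()
writer.writerow(
    {"user_input": "What is an API?", "reference_contexts": str(["ctx one", "ctx two"])}
)

buffer.seek(0)
rows = list(csv.DictReader(buffer))

# The list column comes back as a string and has to be parsed again
contexts = ast.literal_eval(rows[0]["reference_contexts"])
print(type(rows[0]["reference_contexts"]).__name__, contexts)
```

`ast.literal_eval` only parses Python literals, which makes it a safer choice than `eval` for restoring these lists.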