# Introduction reading

**Why synthetic test data?**
Evaluating RAG (Retrieval-Augmented Generation) pipelines is crucial for assessing their performance. However, manually creating hundreds of QCA (Question-Context-Answer) samples from documents is time-consuming and labor-intensive. Human-written questions may also fall short of the complexity required for a thorough evaluation, which lowers the quality of the assessment. With synthetic data generation, the developer time spent on the data aggregation process can be reduced by 90%.

**How does Ragas differ in test data generation?**
Ragas takes a novel approach to evaluation data generation. An ideal evaluation dataset should cover the various types of questions encountered in production, including questions of varying difficulty. By default, LLMs are not good at creating diverse samples, as they tend to follow common paths. Inspired by works like Evol-Instruct, Ragas addresses this with an evolutionary generation paradigm, in which questions with different characteristics such as reasoning, conditioning, multi-context, and more are systematically crafted from the provided set of documents. This approach ensures comprehensive coverage of the performance of the various components within your pipeline, resulting in a more robust evaluation process.


**In-Depth Evolution**
Large Language Models (LLMs) possess the capability to transform simple questions into more complex ones effectively. To generate medium to hard samples from the provided documents, we employ the following methods:

* Reasoning: Rewrite the question in a way that enhances the need for reasoning to answer it effectively.

* Conditioning: Modify the question to introduce a conditional element, which adds complexity to the question.

* Multi-Context: Rephrase the question in a manner that necessitates information from multiple related sections or chunks to formulate an answer.

Moreover, our paradigm extends its capabilities to create conversational questions from the given documents:

* Conversational: A portion of the questions, following the evolution process, can be transformed into conversational samples. These questions simulate a chat-based question-and-follow-up interaction, mimicking a chat-Q&A pipeline.
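
The evolution types listed above can be pictured as rewrite instructions applied to a seed question. The snippet below is a minimal sketch of that idea only: the instruction strings paraphrase the list above, and `EVOLUTION_INSTRUCTIONS` and `build_evolution_prompt` are hypothetical names, not the prompts or functions Ragas uses internally.

```python
# Illustrative only: these instructions paraphrase the list above and are
# NOT the prompts that Ragas uses internally.
EVOLUTION_INSTRUCTIONS = {
    "reasoning": "Rewrite the question so that answering it requires more reasoning.",
    "conditioning": "Modify the question to introduce a conditional element.",
    "multi_context": (
        "Rephrase the question so that answering it requires information "
        "from multiple related sections or chunks."
    ),
}


def build_evolution_prompt(question: str, evolution: str) -> str:
    """Return a rewrite prompt for one evolution type (hypothetical helper)."""
    if evolution not in EVOLUTION_INSTRUCTIONS:
        raise ValueError(f"Unknown evolution type: {evolution!r}")
    instruction = EVOLUTION_INSTRUCTIONS[evolution]
    return f"{instruction}\n\nOriginal question: {question}\nRewritten question:"


seed_question = "What is the remote work policy?"
for name in EVOLUTION_INSTRUCTIONS:
    print(build_evolution_prompt(seed_question, name))
    print("---")
```

Each prompt would then be sent to the generating LLM, and the rewritten question would replace the seed question.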

# Synthetic test data generation

This notebook generates test data with the Ragas test set generator. It takes a set of JSON files, which are documents from the Confluence pages of Tech Innovators Inc., and generates a set of questions, answers, and contexts. This data serves as the reference for RAG pipeline evaluation.

The Ragas synthetic test data generator requires three core components:
* Generating agent: an LLM that generates Q&As from a set of documents
* Embedding model
* Critic agent: an LLM that serves as a quality-control agent and filters out bad Q&As produced by the generating agent

The test dataset is then uploaded to the LangSmith account for later use in RAG evaluation.

## Load source documents

In [None]:
import os
from dotenv import load_dotenv

# Load the .env file
load_dotenv()

True

In [2]:
# Create a function that takes in .json files and returns a list of LangChain documents
import json
from langchain.docstore.document import Document as LangchainDocument

def load_json_files_to_documents(directory):
  """Loads JSON files from a given directory into a list of Langchain Documents.

  Args:
    directory: The path to the directory containing JSON files.

  Returns:
    A list of Langchain Documents.
  """

  documents = []
  for filename in os.listdir(directory):
      if filename.endswith('.json'):
          file_path = os.path.join(directory, filename)
          with open(file_path, 'r', encoding='utf-8') as f:
              data = json.load(f)
              # Extract relevant fields from the JSON data
              content = data['text']  # Replace 'text' with the actual field name in your JSON files
              metadata = {'source': filename}  # Add additional metadata if needed
              document = LangchainDocument(page_content=content, metadata=metadata)
              documents.append(document)
  return documents

In [3]:
# Example usage: Read json files of HR Department
from from_root import from_root
department = os.path.join("data", "HR") # -> Choose the department you would like to generate synthetic data for
documents = load_json_files_to_documents(os.path.join(from_root(), department))

## Creating a generative agent

In [4]:
# Choose one of the LLMs as the generative LLM. Ideally OpenAI's LLMs

# For LLMs from OpenAI
from langchain_openai import ChatOpenAI
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")
model_name = os.getenv("DEFAULT_OPENAI_MODEL") # -> Choose your desired model
generator_llm = ChatOpenAI(model=model_name)

# For LLMs from Groq
#from langchain_groq import ChatGroq
#os.environ["GROD_CLOUD_API_KEY"] = os.getenv('GROD_CLOUD_API_KEY')
#model_name = 'llama3-8b-8192' # -> Choose your desired model
#generator_groq_llm = ChatGroq(
#                    groq_api_key=os.environ["GROD_CLOUD_API_KEY"],
#                    model_name=model_name
#                    )

## Create an embedding model

In [5]:
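# OpenAI embedding
from langchain_openai import OpenAIEmbeddings
embedding_model = os.getenv("DEFAULT_OPENAI_EMBEDDING")
# Use the configured embedding model if it is set; otherwise fall back to the library default
embeddings = (
    OpenAIEmbeddings(model=embedding_model, disallowed_special=())
    if embedding_model
    else OpenAIEmbeddings(disallowed_special=())
)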

# For other embedding models
#from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
#embeddings_fastembed = FastEmbedEmbeddings()

## Create a critic agent

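Ideally, the critic agent should be a different model from the generating agent. However, because of API rate limits, I skip creating a separate critic here and reuse the OpenAI generator LLM as the critic instead. The commented-out cell below keeps the Groq-based critic setup for reference.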

In [17]:
# Create a critic LLM. This should be different than the generative llm
#from langchain_groq import ChatGroq
#os.environ["GROD_CLOUD_API_KEY"] = os.getenv('GROD_CLOUD_API_KEY')
#model_name = 'llama-3.1-8b-instant'
#critic_llm = ChatGroq(
#                    groq_api_key=os.environ["GROD_CLOUD_API_KEY"],
#                    model_name=model_name
#                    )

#import time
#from ratelimit import limits, sleep_and_retry
#ONE_MINUTE = 60
# Set the rate limit to 25 calls per minute
#@sleep_and_retry
#@limits(calls=25, period=ONE_MINUTE)
#def call_groq_llm(prompt):
#    # Your Groq Llama 3.1 API call logic here
#    from langchain_groq import ChatGroq
#    os.environ["GROD_CLOUD_API_KEY"] = os.getenv('GROD_CLOUD_API_KEY')
#    model_name = 'llama-3.1-8b-instant'
#    critic_llm = ChatGroq(
#                    groq_api_key=os.environ["GROD_CLOUD_API_KEY"],
#                    model_name=model_name
#                    )
#    response = critic_llm(prompt)
#    return response

## Test set generation

In [6]:
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

generator_test = TestsetGenerator.from_langchain(
    generator_llm,
    generator_llm, #This one should be critic_llm
    embeddings
)

# Change resulting question type distribution (This is where we determine the distribution of question types in the test dataset)
distributions = {
    simple: 0.5,
    multi_context: 0.4,
    reasoning: 0.1
}

# Set the number of Q&As to generate.
# Note: the final test dataset may be smaller than this, since the critic agent may filter out some of the bad Q&As.
testset_size = 10 
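
As a rough check on the settings above, the weights can be turned into nominal per-type counts. This is a sketch under stated assumptions: string keys stand in for the `simple`, `multi_context`, and `reasoning` evolution objects, `expected_counts` is a hypothetical helper, and rounding is a simplification rather than how the generator allocates questions.

```python
import math

# String keys stand in for the ragas evolution objects used in the cell above.
distributions = {"simple": 0.5, "multi_context": 0.4, "reasoning": 0.1}
testset_size = 10


def expected_counts(distributions: dict, testset_size: int) -> dict:
    """Nominal number of questions per type, before any critic filtering."""
    total = sum(distributions.values())
    if not math.isclose(total, 1.0):
        raise ValueError(f"Weights are expected to sum to 1.0, got {total}")
    return {
        name: round(weight * testset_size)
        for name, weight in distributions.items()
    }


print(expected_counts(distributions, testset_size))
# {'simple': 5, 'multi_context': 4, 'reasoning': 1}
```

The generated dataset can contain fewer rows than these nominal counts once the critic has filtered out weak samples.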

In [7]:
# Use generator_test.generate_with_llamaindex_docs instead if you use llama-index as the document loader
testset = generator_test.generate_with_langchain_docs(documents, testset_size, distributions, with_debugging_logs=True)

embedding nodes:   0%|          | 0/40 [00:00<?, ?it/s]

Filename and doc_id are the same for all nodes.


Generating:   0%|          | 0/10 [00:00<?, ?it/s]

[ragas.testset.filters.DEBUG] context scoring: {'clarity': 3, 'depth': 2, 'structure': 3, 'relevance': 3, 'score': 2.75}
[ragas.testset.evolutions.DEBUG] keyphrases in merged node: ['SEO', 'Social media marketing', 'Sales strategies', 'Finance and Accounting', 'Financial analysis', 'Budgeting', 'Accounting principles', 'Financial software', 'Financial statements', 'Workshops and Webinars', 'Business Simulations', 'Professional Certifications', 'PMP', 'CMA', 'Continuous learning', 'Cloud computing', 'Data science', 'AI', 'Business functions', 'Innovation', 'Excellence', 'Professional development']
[ragas.testset.filters.DEBUG] context scoring: {'clarity': 3, 'depth': 3, 'structure': 3, 'relevance': 3, 'score': 3.0}
[ragas.testset.evolutions.DEBUG] keyphrases in merged node: ['Tech Innovators Inc.', 'Senior Director responsible for Analytics Delivery', 'Analytics capabilities', 'Data-driven decision-making', 'High-impact solutions', 'Strategic Leadership', 'Vision and Strategy', 'Roadmap
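
In the `context scoring` lines of the log above, the reported `score` matches the arithmetic mean of the four dimensions (for example, (3 + 2 + 3 + 3) / 4 = 2.75). The sketch below reproduces that calculation and applies a pass/fail threshold. Treating the score as a plain mean is inferred from these two log lines only, and both `passes_filter` and its threshold of 2.0 are hypothetical placeholders, not Ragas defaults.

```python
from statistics import mean

DIMENSIONS = ("clarity", "depth", "structure", "relevance")


def context_score(scores: dict) -> float:
    """Average the four per-dimension scores, as the log lines suggest."""
    return mean(scores[d] for d in DIMENSIONS)


def passes_filter(scores: dict, threshold: float = 2.0) -> bool:
    """Hypothetical quality gate; the threshold is a placeholder value."""
    return context_score(scores) >= threshold


logged = {"clarity": 3, "depth": 2, "structure": 3, "relevance": 3}
print(context_score(logged))   # 2.75, matching the first log line
print(passes_filter(logged))   # True with the placeholder threshold
```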

In [9]:
# Convert the test dataset to a pandas dataframe and print it out
test_df = testset.to_pandas()
test_df

Unnamed: 0,question,contexts,ground_truth,evolution_type,metadata,episode_done
0,What information should be included in the pol...,[2\n2\n Welcome to \n<Company>\n!\nAdd a welco...,"List and describe the policies, procedures, an...",simple,[{'source': 'Template - Employee handbook.json'}],True
1,What is Tech Innovators Inc.'s policy on remot...,[Q12: What is the process for handling workpla...,Tech Innovators Inc. supports flexible working...,simple,[{'source': 'Employee Frequently Asked Questio...,True
2,How does Tech Innovators Inc. strive to create...,[ Inc. upholds the highest ethical standards i...,Tech Innovators Inc. strives to create a diver...,simple,[{'source': 'Tech Innovators Inc. Recruitment ...,True
3,How can past performance reviews be used for s...,[ Self-assessment\nStart by thinking through y...,Start by thinking through your existing streng...,simple,[{'source': 'Template - Career development pla...,True
4,What is the role of mentorship in the onboardi...,[Welcome to Tech Innovators Inc.\n \nWe are th...,Mentorship plays a crucial role in the onboard...,simple,[{'source': 'Tech Innovators Inc. Employee Onb...,True
5,What support does Tech Innovators Inc. provide...,[Table of Contents\nTraining and Development\n...,,multi_context,[{'source': 'Employee Frequently Asked Questio...,True
6,How do employee engagement and disengagement d...,"[Introduction\nAt Tech Innovators Inc., we bel...",Employee engagement and disengagement differ i...,multi_context,[{'source': 'Employee Relations and Engagement...,True
7,What is the purpose of regular HR audits for c...,[5.2 Monitoring and Auditing\nInternal Audits\...,Regular HR audits for compliance with labor la...,multi_context,[{'source': 'Tech Innovators Inc. Compliance w...,True
8,Which section of the employee handbook covers ...,[Table of Contents\nTraining and Development\n...,The answer to given question is not present in...,reasoning,[{'source': 'Employee Frequently Asked Questio...,True


## Convert the resulting test dataset to a JSON file before uploading to LangSmith

In [22]:
# Export the resulting test dataset to a json file
from from_root import from_root
folder = "data-test/test_dataset/test_dataset_hr.json"
test_df[['question', 'ground_truth']].to_json(os.path.join(from_root(), folder), orient='records', indent=2)
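
The sample output above contains one row with an empty `ground_truth` and one whose `ground_truth` says the answer is not present. If such rows are unwanted in the reference set, they could be dropped from the exported records first. This is a minimal sketch assuming the `orient='records'` JSON layout used above; `drop_unanswerable` and `NOT_ANSWERABLE_PREFIX` are hypothetical names, and the marker text is copied from the visible part of the sample output.

```python
import json

# Marker text copied from the sample output; adjust it if your model phrases it differently.
NOT_ANSWERABLE_PREFIX = "The answer to given question is not present"


def drop_unanswerable(records: list) -> list:
    """Keep only records that have a usable ground_truth answer."""
    kept = []
    for record in records:
        answer = record.get("ground_truth")
        if not answer:
            # None (NaN is exported as null) or an empty string
            continue
        if answer.startswith(NOT_ANSWERABLE_PREFIX):
            continue
        kept.append(record)
    return kept


# Made-up records in the same shape as the exported file
sample = json.loads(
    '[{"question": "Q1", "ground_truth": "A1"},'
    ' {"question": "Q2", "ground_truth": null},'
    ' {"question": "Q3", "ground_truth": "The answer to given question is not present in..."}]'
)
print(drop_unanswerable(sample))
# [{'question': 'Q1', 'ground_truth': 'A1'}]
```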

## Upload the test dataset to LangSmith

### Upload directly to LangSmith

In [11]:
#from ragas.integrations.langsmith import upload_dataset

# dataset_name = "hr test"
# dataset_desc = "HR department test dataset"

# dataset = upload_dataset(testset, dataset_name, dataset_desc)

Created a new dataset 'hr test'. Dataset is accessible at https://smith.langchain.com/o/08bc9556-81b3-56d7-98aa-4f87d6cdfca5/datasets/f04f14f3-f165-48c3-8d94-dbf759844c7d


In [None]:
# Upload test dataset json file to LangSmith

# Add your code here
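
One possible way to fill in the cell above is to read the exported JSON file and create the dataset with the LangSmith client. This is a sketch under stated assumptions, not a tested part of this notebook: it assumes the `langsmith` package is installed, that the API key is available in the environment, and that the file holds `question` and `ground_truth` fields as exported earlier. `records_to_examples` and `upload_json_to_langsmith` are hypothetical helper names.

```python
import json


def records_to_examples(records: list) -> tuple:
    """Split exported records into LangSmith-style input and output dicts."""
    inputs = [{"question": r["question"]} for r in records]
    outputs = [{"ground_truth": r["ground_truth"]} for r in records]
    return inputs, outputs


def upload_json_to_langsmith(json_path: str, dataset_name: str, description: str = ""):
    """Create a LangSmith dataset from an exported JSON file (sketch)."""
    # Imported lazily so records_to_examples works without langsmith installed.
    from langsmith import Client

    with open(json_path, "r", encoding="utf-8") as f:
        records = json.load(f)
    inputs, outputs = records_to_examples(records)

    client = Client()  # picks up credentials from the environment
    dataset = client.create_dataset(dataset_name=dataset_name, description=description)
    client.create_examples(inputs=inputs, outputs=outputs, dataset_id=dataset.id)
    return dataset


# Example call (not executed here; the path and names mirror the cells above):
# upload_json_to_langsmith(
#     "data-test/test_dataset/test_dataset_hr.json",
#     "hr test",
#     "HR department test dataset",
# )
```

Creating a dataset whose name already exists in the account is likely to fail, so pick a new name or delete the old dataset first.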