# RAG Response Evals: Build Dataset

Build an evaluation dataset that combines real user queries with synthetic, LLM-generated queries.

Reuse the queries from the previous 'vector search cutoff' experiments.
- LLM-generated queries tended to be more complex than user queries.
- Aim for 100 questions: 50 purely LLM-generated, and 50 'user-like' (real user queries plus few-shot LLM queries written in the same style).

In [1]:
import pandas as pd

## Load previous queries

In [2]:
df_user = pd.read_csv("retrieval_relevance_evaluations_user_queries.csv")

In [5]:
df_synth = pd.read_csv("retrieval_relevance_evaluations.csv")

In [9]:
user_queries = list(df_user['query'].unique())
synth_queries = list(df_synth['query'].unique())

In [10]:
len(user_queries), len(synth_queries)

(16, 20)
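With 16 user queries and 20 synthetic queries on hand, the gap to the 100-question target can be worked out with a small helper. This is a minimal sketch: `generation_gap` is a hypothetical name, and it assumes the existing synthetic queries count toward the pure-LLM half while the existing user queries count toward the user-like half.

```python
def generation_gap(n_user, n_synth, target_user_like=50, target_pure_llm=50):
    """Return how many more queries of each kind are still needed.

    Assumption: existing user queries count toward the 'user-like' half and
    existing synthetic queries count toward the 'pure LLM' half.
    """
    return {
        "user_like_to_generate": max(target_user_like - n_user, 0),
        "pure_llm_to_generate": max(target_pure_llm - n_synth, 0),
    }


# Counts taken from len(user_queries), len(synth_queries) above
gap = generation_gap(n_user=16, n_synth=20)
print(gap)
```

Under these assumptions, 34 more user-like queries and 30 more pure LLM queries would be needed.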

In [36]:
user_queries

['What were the first civilizations?',
 'when did julius cesar rule?',
 'how does the author define barbarians?',
 'why are Sunnis and Shia called that?',
 'who were the guptas?',
 'name the Chinese dynasties',
 'what was hellenization?',
 'Can you tell me about groups that moved into europe during the roman empire?',
 'Who were the Magyars?',
 'who were the scythians?',
 "what does 'doge' mean?",
 'tell me about the antonine age in rome',
 'Tell me about the Roman Empire',
 'who were the Seljuks?',
 'What groups had interactions with the Magyars?',
 'What were the main causes of World War I?']

## Generate more queries

In [None]:
# Use an OpenAI LLM to generate a set of questions to ask about a world history book
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser
from langsmith import traceable
from pydantic import BaseModel, Field
from typing import List

class HistoryQuestions(BaseModel):
    questions: List[str] = Field(description="List of world history questions")

def format_few_shot_examples(query_list):
    # Render the example queries as a bulleted list for the prompt
    return "\n".join(f"- {q}" for q in query_list)

@traceable
def generate_simple_history_questions_few_shot(good_example_list, bad_example_list, num_questions=20):
    # Initialize parser and LLM
    parser = JsonOutputParser(pydantic_object=HistoryQuestions)
    llm = ChatOpenAI(model="gpt-4o", temperature=0.8)
    
    # Create prompt template
    prompt = PromptTemplate.from_template(
    """
    You are a world history expert.
        
    Generate {num_questions} diverse, specific questions that could be asked about a comprehensive world history book. 
    
    The questions should cover:
    - Different time periods (ancient, medieval, modern, contemporary)
    - Various civilizations and regions (Europe, Asia, Africa, Americas, Middle East)

    Below is a set of good example questions. These are simple, factual questions, not complex analytical ones. Try to mimic this style.
    Good examples:
    {good_examples}

    Below is a set of bad example questions. These are more complex and analytical. Try to avoid this style.
    Bad examples:
    {bad_examples}

    {format_instructions}
    """
    )
    
    # Create LCEL chain: format each example list into a bulleted string for the prompt
    good_formatter = RunnablePassthrough.assign(good_examples=lambda x: format_few_shot_examples(x["good_example_list"]))
    bad_formatter = RunnablePassthrough.assign(bad_examples=lambda x: format_few_shot_examples(x["bad_example_list"]))

    chain = good_formatter | bad_formatter | prompt | llm | parser

    # Execute chain
    result = chain.invoke({
        "good_example_list": good_example_list,
        "bad_example_list": bad_example_list,
        "num_questions": num_questions,
        "format_instructions": parser.get_format_instructions()
    })
    
    return result
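Reaching the target counts will likely take several calls, and repeated calls can return overlapping questions. The sketch below shows one way to collect unique questions across batches. `collect_unique_questions` and its parameters are hypothetical helpers, not part of the notebook; `generate_batch` stands in for any callable that takes a batch size and returns a dict with a `"questions"` list, which is the shape the parser output has in this notebook.

```python
from typing import Callable, Dict, List


def collect_unique_questions(
    generate_batch: Callable[[int], Dict[str, List[str]]],
    target: int,
    batch_size: int = 20,
    max_calls: int = 10,
) -> List[str]:
    """Call generate_batch until `target` unique questions are collected.

    Duplicates are detected after lowercasing and collapsing whitespace.
    `max_calls` caps the number of calls so the loop always terminates.
    """
    seen = set()
    collected: List[str] = []
    for _ in range(max_calls):
        if len(collected) >= target:
            break
        result = generate_batch(batch_size)
        for question in result.get("questions", []):
            key = " ".join(question.lower().split())
            if key and key not in seen:
                seen.add(key)
                collected.append(question.strip())
    return collected[:target]


# Hypothetical usage with the function defined above (requires an OpenAI API key):
# questions = collect_unique_questions(
#     lambda n: generate_simple_history_questions_few_shot(good, bad, num_questions=n),
#     target=34,
# )
```

Passing the generator in as a callable keeps the deduplication logic testable without making API calls.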


In [37]:
user_queries_selected = [user_queries[0], user_queries[1], user_queries[3]]

In [40]:
synth_queries[:3]

['What were the key factors that led to the fall of the Western Roman Empire?',
 'How did the spread of Islam in the 7th century influence trade and cultural exchanges across Africa and Europe?',
 'What were the primary motivations behind the European Age of Exploration during the 15th and 16th centuries?']
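The difference in complexity between the two query sets can be roughly quantified by word count. The sketch below compares the three selected user queries with the three synthetic queries shown above. Word count is only a crude proxy for complexity, and `word_count` and `mean_word_count` are hypothetical helper names.

```python
from statistics import mean


def word_count(query: str) -> int:
    # Count whitespace-separated tokens
    return len(query.split())


def mean_word_count(queries) -> float:
    return mean(word_count(q) for q in queries)


# user_queries[0], user_queries[1], user_queries[3]
user_sample = [
    "What were the first civilizations?",
    "when did julius cesar rule?",
    "why are Sunnis and Shia called that?",
]

# synth_queries[:3]
synth_sample = [
    "What were the key factors that led to the fall of the Western Roman Empire?",
    "How did the spread of Islam in the 7th century influence trade and cultural exchanges across Africa and Europe?",
    "What were the primary motivations behind the European Age of Exploration during the 15th and 16th centuries?",
]

print(f"user: {mean_word_count(user_sample):.1f} words")
print(f"synthetic: {mean_word_count(synth_sample):.1f} words")
```

The same check could be applied to newly generated questions to see whether they land closer to the user style or the synthetic style.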

In [38]:
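# Good examples: selected user queries; bad examples: the overly analytical synthetic queries
generate_simple_history_questions_few_shot(good_example_list=user_queries_selected, bad_example_list=synth_queries[:3], num_questions=5)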

{'questions': ['What factors led to the rise and fall of the Ancient Egyptian civilization?',
  'During which centuries did the Mongol Empire reach its peak, and who were its most notable leaders?',
  'What were the primary causes and outcomes of the Industrial Revolution in Europe?',
  'How did the Meiji Restoration transform Japan in the late 19th century?',
  'What significant events and changes occurred in Africa during the decolonization period of the mid-20th century?']}
