# RAG Evaluation with RAGAS

Ragas is an open-source framework that helps developers evaluate and test Large Language Model (LLM) applications, particularly Retrieval Augmented Generation (RAG) pipelines.

Ragas uses LLMs to score a pipeline on metrics such as answer correctness, faithfulness, and context relevancy, and to judge the quality of the retrieved contexts.

This notebook shows how to use the Ragas metrics in LangSmith to evaluate a RAG app.

## Prerequisites
This notebook requires Ragas and LangSmith.

LangSmith is a DevOps platform that helps developers and data scientists build, test, deploy, and monitor large language model (LLM) applications.

In [44]:
#%%capture --no-stderr
#%pip install -U langsmith ragas numpy openai

In [45]:
from pathlib import Path
import sys

root_dir = Path().resolve().parent
sys.path.append(str(root_dir))


In [46]:
import getpass
import os
import langsmith

from dotenv import load_dotenv
load_dotenv()


os.environ["LANGCHAIN_TRACING_V2"] = "true"
LANGCHAIN_API_KEY = os.environ.get("LANGCHAIN_API_KEY")
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
PINECONE_API_KEY = os.environ.get("PINECONE_API_KEY")
SONAR_API_KEY = os.environ.get("SONAR_API_KEY")
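
`os.environ.get` returns `None` for a key that is not set, so a missing key only surfaces later as an authentication error. A minimal sketch of an early check follows; the helper name `missing_keys` and the `REQUIRED_KEYS` list are illustrative and not part of the original notebook.

```python
import os

# Names of the environment variables this notebook reads.
REQUIRED_KEYS = [
    "LANGCHAIN_API_KEY",
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
    "SONAR_API_KEY",
]


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Demo with a fake environment; in the notebook, call missing_keys(REQUIRED_KEYS).
demo_env = {"LANGCHAIN_API_KEY": "ls-demo", "OPENAI_API_KEY": ""}
print(missing_keys(REQUIRED_KEYS, demo_env))
```

In the notebook itself, raising an error when the returned list is non-empty would stop the run before any API call is made.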

## Select a Dataset to use

In [47]:
# Load your Policy Pulse documents
from embeddings_utils import extract_text

data_dir = "../Policy Pulse + AVE collab"  # Your existing data folder
docs = []

supported_extensions = ['.pdf', '.docx', '.pptx', '.odp']
data_path = Path(data_dir)

for file_path in data_path.glob('**/*'):
    if file_path.is_file() and file_path.suffix.lower() in supported_extensions:
        print(f"Loading: {file_path.name}")
        text = extract_text(file_path)
        if text:
            docs.append({"file": file_path.name, "content": text})

print(f"Loaded {len(docs)} documents")

Loading: Module 1 Lesson 3 Compliance legal and ethical considerations.docx
Loading: Reproductive and Fertility Health at Work Course module 1.docx
Loading: Script Full curriculum with case studies from Chat GPT.docx
Loading: Script v3 module 1 Full curriculum with case studies from Chat GPT.docx
Loading: Case Study How Company X Implemented a Reproductive and Fertility Health Guide.docx
Loading: Handout Key Components Checklist.docx
Loading: Module 2 lesson 1 Key Components of an Effective Reproductive and Fertility Health Guide.docx
Loading: Outline and handout Structure Module 2 lesson 1 Key Components of an Effective Reproductive and Fertility Health Guide.docx
Loading: Script and Lesson outline.docx
Loading: Module_4_Evaluation_Script_We_Are_Eden.pdf
Loading: Mission Statement.odp
Loading: Outline Reproductive_Fertility_Health_Guide_Updated.pdf
Loading: Outline WeAreEdenReproductive_Fertility_Health_Guide_WeAreEden.pdf
Loading: Template Reproductive and Fertility Health Guide Outl

## Define your pipeline

In [48]:

client = langsmith.Client()

In [49]:
dataset_name = "Policy Pulse Reproductive Health Compliance"

test_cases = [
    {
        "inputs": {"question": "What accommodations must we provide for pregnant employees?"},
        "outputs": {"ground_truth": "Under Title VII and the Pregnancy Discrimination Act, employers must provide reasonable accommodations for pregnancy-related conditions."}
    },
    {
        "inputs": {"question": "Can we ask about pregnancy plans during hiring interviews?"},
        "outputs": {"ground_truth": "No, asking about pregnancy plans during hiring is illegal under pregnancy discrimination laws."}
    },
    {
        "inputs": {"question": "What lactation accommodation is required under the PUMP Act?"},
        "outputs": {"ground_truth": "The PUMP Act requires employers to provide reasonable break time and private space for pumping breast milk."}
    },
    {
        "inputs": {"question": "Are employers required to cover IVF treatments?"},
        "outputs": {"ground_truth": "IVF coverage requirements vary by state - some mandate coverage while others don't."}
    },
    {
        "inputs": {"question": "Do fertility benefits need to include same-sex couples?"},
        "outputs": {"ground_truth": "Yes, fertility benefits should be inclusive and cover same-sex couples equally."}
    }
]

# Add examples; the RAGAS evaluators expect the reference answer under "ground_truth"
for example in test_cases:
    try:
        client.create_example(
            inputs=example["inputs"],
            outputs=example["outputs"],  # This contains ground_truth
            dataset_name=dataset_name
        )
        print(f"Added example: {example['inputs']['question'][:50]}...")
    except Exception as e:
        print(f"Error adding example: {e}")

Added example: What accommodations must we provide for pregnant e...
Added example: Can we ask about pregnancy plans during hiring int...
Added example: What lactation accommodation is required under the...
Added example: Are employers required to cover IVF treatments?...
Added example: Do fertility benefits need to include same-sex cou...
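
The evaluation step further down fails with `Expected 'question' and 'ground_truth' in example`, so it can help to check each test case before uploading it. The sketch below is one way to do that in plain Python; `find_schema_problems` is a hypothetical helper, not a LangSmith or Ragas function.

```python
def find_schema_problems(examples, input_key="question", output_key="ground_truth"):
    """Return (index, message) pairs for examples missing the expected keys."""
    problems = []
    for i, ex in enumerate(examples):
        if input_key not in ex.get("inputs", {}):
            problems.append((i, f"missing inputs[{input_key!r}]"))
        if output_key not in ex.get("outputs", {}):
            problems.append((i, f"missing outputs[{output_key!r}]"))
    return problems


# Toy data: the second example uses "expected_answer" instead of "ground_truth".
sample = [
    {"inputs": {"question": "Q1"}, "outputs": {"ground_truth": "A1"}},
    {"inputs": {"question": "Q2"}, "outputs": {"expected_answer": "A2"}},
]
print(find_schema_problems(sample))
```

Running `find_schema_problems(test_cases)` before the upload loop and uploading only when the result is empty keeps malformed examples out of the dataset.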


In [None]:
# LangSmith-compatible structure

dataset_name = "Policy Pulse Reproductive Health Compliance"

langsmith_test_cases = [
    {
        "inputs": {"question": "What accommodations must we provide for pregnant employees?"},
        "outputs": {"expected_answer": "Under Title VII and the Pregnancy Discrimination Act, employers must provide reasonable accommodations for pregnancy-related conditions."}
    },
    {
        "inputs": {"question": "Can we ask about pregnancy plans during hiring interviews?"},
        "outputs": {"expected_answer": "No, asking about pregnancy plans during hiring is illegal under pregnancy discrimination laws."}
    },
    {
        "inputs": {"question": "What lactation accommodation is required under the PUMP Act?"},
        "outputs": {"expected_answer": "The PUMP Act requires employers to provide reasonable break time and private space for pumping breast milk."}
    },
    {
        "inputs": {"question": "Are employers required to cover IVF treatments?"},
        "outputs": {"expected_answer": "IVF coverage requirements vary by state - some mandate coverage while others don't."}
    },
    {
        "inputs": {"question": "Do fertility benefits need to include same-sex couples?"},
        "outputs": {"expected_answer": "Yes, fertility benefits should be inclusive and cover same-sex couples equally."}
    }
    
]

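# Create dataset with LangSmith structure
# Note: these examples use "expected_answer", not the "ground_truth" key the RAGAS evaluators look for
for example in langsmith_test_cases:
    try:
        client.create_example(
            inputs=example["inputs"],
            outputs=example["outputs"],
            dataset_name=dataset_name
        )
    except Exception as e:
        # Report the failure instead of silently swallowing it
        print(f"Error adding example: {e}")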
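
Both dataset cells call `create_example` every time they run, and the evaluation progress bar later in this notebook counts 40 examples for only five distinct questions, which suggests that re-running the cells appended duplicates. A minimal sketch of filtering out questions that are already present follows. `examples_to_add` is a hypothetical helper, and it assumes the existing questions were collected beforehand, for instance by iterating over `client.list_examples(dataset_name=dataset_name)`.

```python
def examples_to_add(candidates, existing_questions):
    """Keep only candidates whose question is not already in the dataset."""
    seen = set(existing_questions)
    fresh = []
    for ex in candidates:
        question = ex["inputs"]["question"]
        if question not in seen:
            fresh.append(ex)
            seen.add(question)  # also guards against repeats within candidates
    return fresh


# Toy data standing in for the dataset contents and a new batch.
existing = ["Are employers required to cover IVF treatments?"]
batch = [
    {"inputs": {"question": "Are employers required to cover IVF treatments?"}, "outputs": {}},
    {"inputs": {"question": "Do fertility benefits need to include same-sex couples?"}, "outputs": {}},
]
print(len(examples_to_add(batch, existing)))
```

Matching on the question text alone is a simplification: it would also skip an example whose question is unchanged but whose reference answer was corrected.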

## Create the retriever

In [51]:
from typing import List

import numpy as np
import openai
from langsmith import traceable


class VectorStoreRetriever:
    def __init__(self, docs: list, vectors: list, oai_client):
        self._arr = np.array(vectors)
        self._docs = docs
        self._client = oai_client

    @classmethod
    async def from_docs(cls, docs, oai_client):
        embeddings = await oai_client.embeddings.create(
            model="text-embedding-3-small", input=[doc["content"] for doc in docs]
        )
        vectors = [emb.embedding for emb in embeddings.data]
        return cls(docs, vectors, oai_client)

    @traceable
    async def query(self, query: str, k: int = 5) -> List[dict]:
        embed = await self._client.embeddings.create(
            model="text-embedding-3-small", input=[query]
        )
        # "@" is just a matrix multiplication in python
        scores = np.array(embed.data[0].embedding) @ self._arr.T
        top_k_idx = np.argpartition(scores, -k)[-k:]
        top_k_idx_sorted = top_k_idx[np.argsort(-scores[top_k_idx])]
        return [
            {**self._docs[idx], "similarity": scores[idx]} for idx in top_k_idx_sorted
        ]
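
The ranking inside `query` is a dot product followed by a top-k selection. The sketch below reproduces that logic in plain Python on toy two-dimensional vectors so the behaviour is easy to inspect. It assumes unit-length vectors, in which case the dot product equals cosine similarity; `top_k` is an illustrative name and not part of the retriever above.

```python
import heapq


def top_k(query_vec, doc_vecs, k=5):
    """Return (index, score) pairs for the k highest dot-product scores."""
    scores = [sum(q * d for q, d in zip(query_vec, vec)) for vec in doc_vecs]
    k = min(k, len(scores))  # never ask for more results than there are documents
    best = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    return [(idx, scores[idx]) for idx in best]


# Three toy "embeddings"; the query points in the same direction as the first.
doc_vecs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
print(top_k([1.0, 0.0], doc_vecs, k=2))
```

The `min(k, len(scores))` guard matters for small corpora: `np.argpartition(scores, -k)` should raise an error when `k` exceeds the number of documents, so the class above relies on at least `k` documents having been loaded.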

In [52]:
import sys
sys.path.append(".")  # Import your modules
from ai_agent import answer_question, retrieve_relevant_chunks
from langsmith import traceable

class PolicyPulseRagBot:
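    def __init__(self, retriever=None, model: str = "sonar"):
        # Kept for reference only: answer_question handles retrieval and generation itself
        self.retriever = retriever
        self.model = model
        self.index_name = "policypulse"
        self.api_key = os.environ.get("PINECONE_API_KEY")

    @traceable
    def get_answer(self, question: str):
        """Answer a question with the existing AI agent and return the answer with its contexts."""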
        
        # Your answer_question already handles:
        # - Pinecone retrieval
        # - Perplexity/Sonar API calls  
        # - Conversation history (if provided)
        # - System prompts with citations
        # - Domain filtering
        rag_response = answer_question(
            user_query=question,
            index_name=self.index_name, 
            api_key=self.api_key
            # No conversation_history needed for evaluation
        )
        
        # Get chunks separately for RAGAS context evaluation
        retrieved_chunks = retrieve_relevant_chunks(
            question, 
            self.index_name, 
            self.api_key, 
            top_k=5
        )
        
        # Return in RAGAS format
        return {
            "answer": rag_response,
            "contexts": [chunk['text'] for chunk in retrieved_chunks],
        }
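
`get_answer` returns a dict with an `answer` string and a `contexts` list, the shape the evaluators consume. A small check like the one below can catch a malformed response before an evaluation run starts; `check_rag_output` is a hypothetical helper written for this notebook, not a Ragas function.

```python
def check_rag_output(output):
    """Return a list of problems with a pipeline output; empty means it looks fine."""
    errors = []
    answer = output.get("answer")
    if not isinstance(answer, str) or not answer.strip():
        errors.append("answer must be a non-empty string")
    contexts = output.get("contexts")
    if not isinstance(contexts, list) or not all(isinstance(c, str) for c in contexts):
        errors.append("contexts must be a list of strings")
    return errors


# A well-formed output and one where the contexts are still raw chunk dicts.
good = {"answer": "Employers must ...", "contexts": ["chunk one", "chunk two"]}
bad = {"answer": "", "contexts": [{"text": "chunk one"}]}
print(check_rag_output(good))
print(check_rag_output(bad))
```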

In [53]:
retriever = await VectorStoreRetriever.from_docs(docs, openai.AsyncClient())
rag_bot = PolicyPulseRagBot(retriever)

In [54]:
response = rag_bot.get_answer("What accommodations must we provide for pregnant employees?")
response["answer"][:150]

Retrieving relevant chunks for query: What accommodations must we provide for pregnant employees?
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
Search response received
Found 5 hits


'To provide accommodations for pregnant employees, employers must consider the requirements outlined in laws like the Pregnant Workers Fairness Act (PW'

## Evaluate

In [55]:
from langchain.smith import RunEvalConfig
from ragas.integrations.langchain import EvaluatorChain
from ragas.metrics import (
    answer_correctness,
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

# Wrap the RAGAS metrics to use in LangChain
evaluators = [
    EvaluatorChain(metric)
    for metric in [
        answer_correctness,
        answer_relevancy,
        context_precision,
        context_recall,
        faithfulness,
    ]
]
eval_config = RunEvalConfig(custom_evaluators=evaluators)
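
Each metric draws on a different subset of the fields `question`, `answer`, `contexts`, and `ground_truth`. The mapping below is an assumption based on how the metrics are usually described, not something read from the installed Ragas version, so verify it against the documentation for your release. The sketch shows how such a table could be used to see which metrics a given set of fields supports.

```python
# Assumed field requirements per metric; verify against your Ragas version.
REQUIRED_FIELDS = {
    "faithfulness": {"question", "answer", "contexts"},
    "answer_relevancy": {"question", "answer"},
    "context_precision": {"question", "contexts", "ground_truth"},
    "context_recall": {"question", "contexts", "ground_truth"},
    "answer_correctness": {"question", "answer", "ground_truth"},
}


def runnable_metrics(available_fields):
    """Return the metric names whose assumed requirements are all available."""
    available = set(available_fields)
    return sorted(
        name for name, needed in REQUIRED_FIELDS.items() if needed <= available
    )


# Without a reference answer, only the reference-free metrics remain.
print(runnable_metrics(["question", "answer", "contexts"]))
```

Under these assumptions, three of the five metrics configured above depend on `ground_truth`, which is why the dataset examples need that key.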

In [None]:
import datetime

# Create a unique project name with timestamp
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
project_name = f"Policy Pulse Evaluation {timestamp}"

def rag_chain(inputs):
    question = inputs["question"]
    return rag_bot.get_answer(question)

results = client.run_on_dataset(
    dataset_name="Policy Pulse Reproductive Health Compliance",
    llm_or_chain_factory=rag_chain,
    evaluation=eval_config,
    project_name=project_name  # Unique name each time
)

View the evaluation results for project 'Policy Pulse Evaluation 20250617_155006' at:
https://smith.langchain.com/o/e2457dbb-11ba-4afa-9e52-b98b1c7e7125/datasets/02a42c2a-70dd-4326-8ee8-38dc7b23b68e/compare?selectedSessions=7aad9899-78ce-4aa1-abee-fbc9ebaa0a97

View all tests for Dataset Policy Pulse Reproductive Health Compliance at:
https://smith.langchain.com/o/e2457dbb-11ba-4afa-9e52-b98b1c7e7125/datasets/02a42c2a-70dd-4326-8ee8-38dc7b23b68e
[>                                                 ] 0/40Retrieving relevant chunks for query: Do fertility benefits need to include same-sex couples?
Retrieving relevant chunks for query: Are employers required to cover IVF treatments?
Retrieving relevant chunks for query: What lactation accommodation is required under the PUMP Act?
Retrieving relevant chunks for query: Can we ask about pregnancy plans during hiring interviews?
Retrieving relevant chunks for query: What accommodations must we provide for pregnant employees?
Search response rec

Error evaluating run 1496396b-4ec5-4fa4-85c5-2b7c508ec170 with EvaluatorChain
Traceback (most recent call last):
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\langchain_core\tracers\evaluation.py", line 133, in _evaluate_in_project
    evaluation_result = evaluator.evaluate_run(
                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\ragas\integrations\langchain.py", line 186, in evaluate_run
    self._validate_langsmith_eval(run, example)
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\ragas\integrations\langchain.py", line 159, in _validate_langsmith_eval
    raise ValueError(
ValueError: Expected 'question' and 'ground_truth' in example.Got: ['question']
Error in EvaluatorCallbackHandler.on_chain_end callback: ValueError("Expected 'question' and 'ground_truth' in exam

[>                                                 ] 1/40Retrieving relevant chunks for query: Do fertility benefits need to include same-sex couples?
[->                                                ] 2/40Retrieving relevant chunks for query: Are employers required to cover IVF treatments?
[--->                                              ] 3/40Retrieving relevant chunks for query: What lactation accommodation is required under the PUMP Act?
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
Search response received
Found 5 hits


[---->                                             ] 4/40Retrieving relevant chunks for query: Can we ask about pregnancy plans during hiring interviews?
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
Search response received
Found 5 hits
Search response received
Found 5 hits


[----->                                            ] 5/40Retrieving relevant chunks for query: What accommodations must we provide for pregnant employees?
Search response received
Found 5 hits
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
Search response received
Found 5 hits
Search response received
Found 5 hits
Search response received
Found 5 hits


Error evaluating run a14cea8b-5041-45fb-ac73-87fbdb02eef9 with EvaluatorChain
Traceback (most recent call last):
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\openai\_base_client.py", line 1497, in request
    response = await self._client.send(
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\httpx\_client.py", line 1629, in send
    response = await self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\httpx\_client.py", line 1657, in _send_handling_auth
    response = await self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\httpx\_client.py", line 1694, in _send_hand

[------->                                          ] 6/40Retrieving relevant chunks for query: Do fertility benefits need to include same-sex couples?
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
Search response received
Found 5 hits
[-------->                                         ] 7/40Retrieving relevant chunks for query: Are employers required to cover IVF treatments?


[--------->                                        ] 8/40Retrieving relevant chunks for query: What lactation accommodation is required under the PUMP Act?
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
[---------->                                       ] 9/40Retrieving relevant chunks for query: Can we ask about pregnancy plans during hiring interviews?
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
Search response received
Found 5 hits
Search response received
Found 5 hits


[----------->                                      ] 10/40Retrieving relevant chunks for query: What accommodations must we provide for pregnant employees?


[------------->                                    ] 11/40Retrieving relevant chunks for query: Do fertility benefits need to include same-sex couples?
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
Search response received
Found 5 hits


[-------------->                                   ] 12/40Retrieving relevant chunks for query: Are employers required to cover IVF treatments?
[--------------->                                  ] 13/40Retrieving relevant chunks for query: What lactation accommodation is required under the PUMP Act?
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
[----------------->                                ] 14/40Retrieving relevant chunks for query: Can we ask about pregnancy plans during hiring interviews?
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
Search response received
Found 5 hits


[------------------>                               ] 15/40Retrieving relevant chunks for query: What accommodations must we provide for pregnant employees?
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
Search response received
Found 5 hits
Search response received
Found 5 hits
Search response received
Found 5 hits
Search response received
Found 5 hits
Search response received
Found 5 hits


[------------------->                              ] 16/40Retrieving relevant chunks for query: Do fertility benefits need to include same-sex couples?
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...


[-------------------->                             ] 17/40Retrieving relevant chunks for query: Are employers required to cover IVF treatments?
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
Search response received
Found 5 hits
Search response received
Found 5 hits
[--------------------->                            ] 18/40Retrieving relevant chunks for query: What lactation accommodation is required under the PUMP Act?
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
Search response received
Found 5 hits


[----------------------->                          ] 19/40Retrieving relevant chunks for query: Can we ask about pregnancy plans during hiring interviews?
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
Search response received
Found 5 hits
[------------------------>                         ] 20/40Retrieving relevant chunks for query: What accommodations must we provide for pregnant employees?
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
[------------------------->                        ] 21/40Retrieving relevant chunks for query: Do fertility benefits need to include same-sex couples?
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
Search response received
Found 5 hits
[--------------------------->                      ] 22/40Retrieving relevant chunks for query: Are employers required to cover IVF treatments?
Search response received
Found 5 hits
Retrieved 5 rel

[------------------------------->                  ] 26/40Retrieving relevant chunks for query: Do fertility benefits need to include same-sex couples?
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
Search response received
Found 5 hits


[--------------------------------->                ] 27/40
Retrieving relevant chunks for query: Are employers required to cover IVF treatments?
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...


Error evaluating run bdaf46c0-1e26-4615-a354-4046d10ae2ac with EvaluatorChain
Traceback (most recent call last):
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\openai\_base_client.py", line 1497, in request
    response = await self._client.send(
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\httpx\_client.py", line 1629, in send
    response = await self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\httpx\_client.py", line 1657, in _send_handling_auth
    response = await self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\httpx\_client.py", line 1694, in _send_hand

[---------------------------------->               ] 28/40
Retrieving relevant chunks for query: What lactation accommodation is required under the PUMP Act?
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
Search response received
Found 5 hits


Error evaluating run 5824ef1f-5d82-404e-bc7a-a522836dc3bb with EvaluatorChain
Traceback (most recent call last):
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\langchain_core\tracers\evaluation.py", line 133, in _evaluate_in_project
    evaluation_result = evaluator.evaluate_run(
                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\ragas\integrations\langchain.py", line 186, in evaluate_run
    self._validate_langsmith_eval(run, example)
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\ragas\integrations\langchain.py", line 159, in _validate_langsmith_eval
    raise ValueError(
ValueError: Expected 'question' and 'ground_truth' in example.Got: ['question']
Error in EvaluatorCallbackHandler.on_chain_end callback: ValueError("Expected 'question' and 'ground_truth' in exam

[----------------------------------->              ] 29/40
Retrieving relevant chunks for query: Can we ask about pregnancy plans during hiring interviews?
Search response received
Found 5 hits
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...


Error evaluating run 11a9dec6-285b-43dc-92b9-7a8ed2e5a1d7 with EvaluatorChain
Traceback (most recent call last):
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\langchain_core\tracers\evaluation.py", line 133, in _evaluate_in_project
    evaluation_result = evaluator.evaluate_run(
                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\ragas\integrations\langchain.py", line 186, in evaluate_run
    self._validate_langsmith_eval(run, example)
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\ragas\integrations\langchain.py", line 159, in _validate_langsmith_eval
    raise ValueError(
ValueError: Expected 'question' and 'ground_truth' in example.Got: ['question']
Error in EvaluatorCallbackHandler.on_chain_end callback: ValueError("Expected 'question' and 'ground_truth' in exam

[------------------------------------->            ] 30/40
Retrieving relevant chunks for query: What accommodations must we provide for pregnant employees?
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
[-------------------------------------->           ] 31/40
Retrieving relevant chunks for query: Do fertility benefits need to include same-sex couples?
Search response received
Found 5 hits
Search response received
Found 5 hits


Error evaluating run aaff9b70-4aa0-4bc6-8f8a-da620a837337 with EvaluatorChain
Traceback (most recent call last):
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\langchain_core\tracers\evaluation.py", line 133, in _evaluate_in_project
    evaluation_result = evaluator.evaluate_run(
                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\ragas\integrations\langchain.py", line 186, in evaluate_run
    self._validate_langsmith_eval(run, example)
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\ragas\integrations\langchain.py", line 159, in _validate_langsmith_eval
    raise ValueError(
ValueError: Expected 'question' and 'ground_truth' in example.Got: ['question']
Error in EvaluatorCallbackHandler.on_chain_end callback: ValueError("Expected 'question' and 'ground_truth' in exam

Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
[--------------------------------------->          ] 32/40
Retrieving relevant chunks for query: Are employers required to cover IVF treatments?


Error evaluating run f2bf6007-c81c-43d6-ade8-71fd0af0e622 with EvaluatorChain
Traceback (most recent call last):
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\langchain_core\tracers\evaluation.py", line 133, in _evaluate_in_project
    evaluation_result = evaluator.evaluate_run(
                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\ragas\integrations\langchain.py", line 186, in evaluate_run
    self._validate_langsmith_eval(run, example)
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\ragas\integrations\langchain.py", line 159, in _validate_langsmith_eval
    raise ValueError(
ValueError: Expected 'question' and 'ground_truth' in example.Got: ['question']
Error in EvaluatorCallbackHandler.on_chain_end callback: ValueError("Expected 'question' and 'ground_truth' in exam

[---------------------------------------->         ] 33/40
Retrieving relevant chunks for query: What lactation accommodation is required under the PUMP Act?
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
Search response received
Found 5 hits
Search response received
Found 5 hits


Error evaluating run 2a7aab5b-6dad-4568-a8ff-14d03c77ca55 with EvaluatorChain
Traceback (most recent call last):
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\langchain_core\tracers\evaluation.py", line 133, in _evaluate_in_project
    evaluation_result = evaluator.evaluate_run(
                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\ragas\integrations\langchain.py", line 186, in evaluate_run
    self._validate_langsmith_eval(run, example)
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\ragas\integrations\langchain.py", line 159, in _validate_langsmith_eval
    raise ValueError(
ValueError: Expected 'question' and 'ground_truth' in example.Got: ['question']
Error in EvaluatorCallbackHandler.on_chain_end callback: ValueError("Expected 'question' and 'ground_truth' in exam

[----------------------------------------->        ] 34/40
Retrieving relevant chunks for query: Can we ask about pregnancy plans during hiring interviews?


Error evaluating run 50534dcb-2cdb-46c8-b73e-3882e80baa93 with EvaluatorChain
Traceback (most recent call last):
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\langchain_core\tracers\evaluation.py", line 133, in _evaluate_in_project
    evaluation_result = evaluator.evaluate_run(
                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\ragas\integrations\langchain.py", line 186, in evaluate_run
    self._validate_langsmith_eval(run, example)
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\ragas\integrations\langchain.py", line 159, in _validate_langsmith_eval
    raise ValueError(
ValueError: Expected 'question' and 'ground_truth' in example.Got: ['question']
Error in EvaluatorCallbackHandler.on_chain_end callback: ValueError("Expected 'question' and 'ground_truth' in exam

[------------------------------------------->      ] 35/40
Retrieving relevant chunks for query: What accommodations must we provide for pregnant employees?
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
Search response received
Found 5 hits
Retrieved 5 relevant chunks
Querying SONAR API...
Search response received
Found 5 hits


Error evaluating run 62de78f2-2acd-45a4-81f9-8afc16dc0d46 with EvaluatorChain
Traceback (most recent call last):
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\langchain_core\tracers\evaluation.py", line 133, in _evaluate_in_project
    evaluation_result = evaluator.evaluate_run(
                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\ragas\integrations\langchain.py", line 186, in evaluate_run
    self._validate_langsmith_eval(run, example)
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\ragas\integrations\langchain.py", line 159, in _validate_langsmith_eval
    raise ValueError(
ValueError: Expected 'question' and 'ground_truth' in example.Got: ['question']
Error in EvaluatorCallbackHandler.on_chain_end callback: ValueError("Expected 'question' and 'ground_truth' in exam

[-------------------------------------------->     ] 36/40
Search response received
Found 5 hits
Search response received
Found 5 hits


Error evaluating run b55b35ed-ca10-4632-806e-423b2aca7913 with EvaluatorChain
Traceback (most recent call last):
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\langchain_core\tracers\evaluation.py", line 133, in _evaluate_in_project
    evaluation_result = evaluator.evaluate_run(
                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\ragas\integrations\langchain.py", line 186, in evaluate_run
    self._validate_langsmith_eval(run, example)
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\ragas\integrations\langchain.py", line 159, in _validate_langsmith_eval
    raise ValueError(
ValueError: Expected 'question' and 'ground_truth' in example.Got: ['question']
Error in EvaluatorCallbackHandler.on_chain_end callback: ValueError("Expected 'question' and 'ground_truth' in exam

[--------------------------------------------->    ] 37/40

Error evaluating run cc048f6f-3953-477a-a553-e73569e06732 with EvaluatorChain
Traceback (most recent call last):
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\langchain_core\tracers\evaluation.py", line 133, in _evaluate_in_project
    evaluation_result = evaluator.evaluate_run(
                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\ragas\integrations\langchain.py", line 186, in evaluate_run
    self._validate_langsmith_eval(run, example)
  File "c:\Users\nnamd\OneDrive\Documents\AvE Consulting\We Are Eden\PolicyPulseApp\venv\Lib\site-packages\ragas\integrations\langchain.py", line 159, in _validate_langsmith_eval
    raise ValueError(
ValueError: Expected 'question' and 'ground_truth' in example.Got: ['question']
Error in EvaluatorCallbackHandler.on_chain_end callback: ValueError("Expected 'question' and 'ground_truth' in exam

[----------------------------------------------->  ] 38/40

In [None]:
print("Evaluation completed!")
print(f"Results: {results}")

# The results object should contain a URL like:
# View results at: https://smith.langchain.com/...

In [None]:
# Map each evaluation question to its reference answer (the RAGAS "ground_truth")
ragas_ground_truths = {
    "What accommodations must we provide for pregnant employees?": "Under Title VII and the Pregnancy Discrimination Act, employers must provide reasonable accommodations for pregnancy-related conditions.",
    "Can we ask about pregnancy plans during hiring interviews?": "No, asking about pregnancy plans during hiring is illegal under pregnancy discrimination laws.",
    "What lactation accommodation is required under the PUMP Act?": "The PUMP Act requires employers to provide reasonable break time and private space for pumping breast milk.",
    "Are employers required to cover IVF treatments?": "IVF coverage requirements vary by state - some mandate coverage while others don't.",
    "Do fertility benefits need to include same-sex couples?": "Yes, fertility benefits should be inclusive and cover same-sex couples equally."
}

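def rag_chain(inputs):
    """Answer one dataset question and attach its reference answer when known."""
    question = inputs["question"]
    result = rag_bot.get_answer(question)

    # Add ground_truth for RAGAS. The ValueError in the output above is raised
    # while validating the dataset example, so the example itself may also
    # need a 'ground_truth' field.
    if question in ragas_ground_truths:
        result["ground_truth"] = ragas_ground_truths[question]

    return result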
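
The `ValueError` in the output above says the example contained only `'question'`. One plausible reading, which is an assumption about the Ragas integration rather than something this notebook confirms, is that the check runs against the LangSmith dataset example and not the chain's return value. If so, the reference answer would need to be stored in each example's outputs. The sketch below builds such payloads from a question-to-answer mapping and checks them before any evaluation run. `build_examples`, `missing_ground_truth`, and `demo_ground_truths` are hypothetical names introduced for illustration.

```python
# Sketch only: build LangSmith-style example payloads that carry the
# reference answer under outputs["ground_truth"].

def build_examples(ground_truths):
    """Return parallel lists of example inputs and outputs."""
    inputs = [{"question": q} for q in ground_truths]
    outputs = [{"ground_truth": a} for a in ground_truths.values()]
    return inputs, outputs


def missing_ground_truth(inputs, outputs):
    """Return the questions whose example lacks a question/ground_truth pair."""
    return [
        inp.get("question")
        for inp, out in zip(inputs, outputs)
        if "question" not in inp or "ground_truth" not in (out or {})
    ]


# Hypothetical two-item mapping, standing in for ragas_ground_truths.
demo_ground_truths = {
    "Question A?": "Reference answer A.",
    "Question B?": "Reference answer B.",
}

example_inputs, example_outputs = build_examples(demo_ground_truths)

# Pre-flight check: an empty list means every example has both fields.
print(missing_ground_truth(example_inputs, example_outputs))

# Assumed upload step (not executed here); `client` would be a
# langsmith.Client and `dataset` an existing LangSmith dataset:
# client.create_examples(
#     inputs=example_inputs, outputs=example_outputs, dataset_id=dataset.id
# )
```

Running the pre-flight check before the evaluation should surface a missing reference answer once, up front, instead of once per run in the callback handler.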