# Evaluation of RAG-based Chatbot Responses with RAGAS

This notebook evaluates a RAG chatbot's response to the sample question "What payment methods are available to donate?"
The evaluation takes four inputs: the user query, the retrieved contexts, the LLM response, and the golden (reference) answer.
From these inputs, the following RAGAS metrics are computed:

- Faithfulness
- Response Relevancy
- Context Precision
- Context Recall
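
As a rough intuition for two of these metrics: Faithfulness is the share of claims in the response that are supported by the retrieved contexts, and Context Recall is the share of claims in the reference answer that can be attributed to the retrieved contexts. In RAGAS an LLM judge produces the per-claim verdicts. The sketch below skips that step and uses hand-written, hypothetical verdicts (the helper name `supported_fraction` is also an assumption, not part of the RAGAS API) to show only the final ratio.

```python
def supported_fraction(verdicts):
    """Return the share of claims judged as supported (1) among all claims."""
    if not verdicts:
        raise ValueError("at least one claim verdict is required")
    return sum(verdicts) / len(verdicts)


# Hypothetical verdicts: 1 = claim supported by the retrieved contexts, 0 = not supported.
# Faithfulness: verdicts are for claims extracted from the LLM response.
faithfulness_verdicts = [1, 1, 1, 0]
print(supported_fraction(faithfulness_verdicts))  # 0.75

# Context Recall: verdicts are for claims extracted from the reference answer.
context_recall_verdicts = [1, 1]
print(supported_fraction(context_recall_verdicts))  # 1.0
```

A score of 1.0 therefore means every extracted claim was judged as supported, not that the response is complete or correct in any broader sense.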



In [None]:
!pip install ragas > /dev/null

In [2]:
from google.colab import userdata
import os
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
os.environ["RAGAS_APP_TOKEN"] = userdata.get('RAGAS_APP_TOKEN')

In [3]:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas.llms import LangchainLLMWrapper

# Initialize the GPT-4o model used as the evaluator (judge) LLM
llm = ChatOpenAI(model="gpt-4o")
# Initialize the text embedding model (OpenAI default)
embeddings = OpenAIEmbeddings()
# Wrap the LangChain LLM so that RAGAS can use it
evaluator_llm = LangchainLLMWrapper(llm)

In [5]:
from ragas import evaluate, EvaluationDataset
from ragas.metrics import (
   Faithfulness,
   ResponseRelevancy,
   LLMContextPrecisionWithReference,
   LLMContextRecall,
)

def evaluate_rag_system(
    queries: list,
    retrieved_contexts: list,
    actual_responses: list,
    golden_answers: list
) -> None:
  """
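  Evaluate the performance of a Retrieval-Augmented Generation (RAG) system
  using multiple metrics provided by the RAGAS library, and print the scores.

  Each argument is a list with one entry per sample; `retrieved_contexts`
  holds a list of context strings for each query.
  """

  # Check input consistency (raise instead of assert, which is skipped under `python -O`)
  if not (len(queries) == len(retrieved_contexts) == len(actual_responses) == len(golden_answers)):
      raise ValueError(
          "All input lists (queries, retrieved_contexts, actual_responses, golden_answers) must have the same length."
      )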

  # Prepare evaluation dataset
  dataset = [
      {
          "user_input": query,
          "retrieved_contexts": relevant_docs,
          "response": response,
          "reference": golden_answer,
      }
      for query, relevant_docs, response, golden_answer in zip(
          queries, retrieved_contexts, actual_responses, golden_answers
      )
  ]

  # Convert dataset to EvaluationDataset format
  eval_dataset = EvaluationDataset.from_list(dataset)

  # Define evaluation metrics
  metrics = [
      Faithfulness(),
      ResponseRelevancy(),
      LLMContextPrecisionWithReference(),
      LLMContextRecall(),
  ]

  # Run evaluation and print the results
  # (the embedding model is used by ResponseRelevancy)
  result = evaluate(
      dataset=eval_dataset,
      metrics=metrics,
      llm=evaluator_llm,
      embeddings=embeddings,
  )

  print(result)

  # Upload results to the RAGAS dashboard (requires RAGAS_APP_TOKEN to be set)
  result.upload()

# Sample Evaluation

The following example evaluates a single response to the question "What payment methods are available to donate?".

The retrieved contexts are taken from actual support documents created for the prototype system.

In [6]:
queries = [
    "What payment methods are available to donate?",
]

retrieved_contexts = [
    [
        "What payment methods can I use to donate? You can donate using a debit or credit card, including Visa, MasterCard, and American Express. Please note that we currently do not accept other card types. We also accept payments through PayPal, Apple Pay, checks, and bank transfers. These options are available for both one-time donations and monthly sponsorship contributions in the Sponsor a Child program.",
        "What donation amounts can I choose from? You can choose to donate £20, £30, £40, or £50, either as a one-time contribution or a monthly donation. Every contribution, no matter the amount, helps make a meaningful difference in the lives of children and communities we support.",
        "How are donations spent? Your donations are allocated to essential programs and services that aim to address both immediate needs and long-term development goals. This includes emergency response work, education initiatives, healthcare projects, and advocacy efforts. The majority of funds directly support these programs, while a portion is allocated to raising additional funds to expand our reach and impact. Through these efforts, we ensure that your contributions create meaningful change and benefit as many people as possible.",
        "What payment plans are available? We offer flexible payment plans to suit your preferences. You can choose a one-time donation, which allows you to make a single contribution to support our programs. Alternatively, you can opt for a monthly donation, providing ongoing support through recurring contributions that are automatically processed each month. Both options are available for general donations and the Sponsor a Child program, giving you the freedom to support in the way that works best for you.",
        "Do I have to pay any card processing fees? No, The Virtual Charity covers all credit/debit card processing fees. 100% of your specified donation amount will go towards our work, with 85% used for direct charitable activities and 15% for fundraising efforts.",
    ],
]

actual_responses = [
    "You can donate using a debit or credit card (Visa, MasterCard, American Express), PayPal, Apple Pay, checks, and bank transfers. These options are available for both one-time donations and monthly sponsorship contributions in the Sponsor a Child program."
]

golden_answers = [
    "You can donate using a debit or credit card (Visa, MasterCard, American Express), PayPal, Apple Pay, checks, or bank transfers. These payment methods are available for both one-time donations and monthly sponsorship contributions in the Sponsor a Child program.",
]

# Evaluate the RAG system using predefined data
evaluate_rag_system(queries, retrieved_contexts, actual_responses, golden_answers)

{'faithfulness': 1.0000, 'answer_relevancy': 0.9820, 'llm_context_precision_with_reference': 1.0000, 'context_recall': 1.0000}
Evaluation results uploaded! View at https://app.ragas.io/dashboard/alignment/evaluation/d0054826-6e55-42ac-9cb9-6771450577f0
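
A context precision of 1.0 may look surprising, since only the first of the five retrieved contexts addresses payment methods. The metric is rank-aware: it averages precision@k over the ranks at which a relevant context appears, so irrelevant contexts that follow the relevant one do not lower the score. The sketch below illustrates that calculation with hand-written relevance verdicts. It assumes the judge marked only the first context as relevant, which the output above does not show, and `context_precision` is an illustrative helper rather than the RAGAS implementation.

```python
def context_precision(relevance):
    """Average precision over ranked contexts.

    relevance[k] is 1 if the context at rank k + 1 was judged relevant, else 0.
    """
    hits = 0
    weighted_sum = 0.0
    for rank, relevant in enumerate(relevance, start=1):
        if relevant:
            hits += 1
            # precision@rank, counted only at ranks holding a relevant context
            weighted_sum += hits / rank
    return weighted_sum / hits if hits else 0.0


# Assumed verdicts for the sample above: only the first context is relevant.
print(context_precision([1, 0, 0, 0, 0]))  # 1.0

# The same single relevant context retrieved last would score much lower.
print(context_precision([0, 0, 0, 0, 1]))  # 0.2
```

Under this reading, the score rewards placing the relevant context first; it does not indicate that all five retrieved contexts were useful.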
