**Reference Link:** [RAG Systems Essentials (Analytics Vidhya)](https://courses.analyticsvidhya.com/courses/take/rag-systems-essentials/lessons/60148017-hands-on-deep-dive-into-rag-evaluation-metrics-generator-metrics-i)

# Build a Simple RAG System

## Install OpenAI and LangChain Dependencies

In [2]:
!pip install -qq langchain
!pip install -qq langchain-openai
!pip install -qq langchain-community
!pip install -qq dill
!pip install -qq python-dotenv

## Install Chroma Vector DB and LangChain wrapper

In [8]:
!pip install -qq langchain-chroma

## Install RAG Evaluation Libraries

In [9]:
!pip install -qq ragas
!pip install -qq deepeval

## Enter OpenAI API Key

In [5]:
# from getpass import getpass

# OPENAI_KEY = getpass('Enter Open AI API Key: ')

## Setup Environment Variables

In [6]:
# import os

# os.environ['OPENAI_API_KEY'] = OPENAI_KEY

In [10]:
import os
from dotenv import load_dotenv

load_dotenv()

True
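`load_dotenv()` returning `True` only tells us that a `.env` file was found, not that it contains the key we need. A minimal sketch of an explicit check follows; the helper name `require_env` is hypothetical and not part of any library.

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in the shell."
        )
    return value

# Usage (commented out so the cell does not fail when the key is absent):
# require_env("OPENAI_API_KEY")
```

Calling this once right after `load_dotenv()` surfaces a missing key immediately instead of at the first API request.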

### OpenAI Embedding Models

LangChain gives us access to OpenAI embedding models, including the newest ones: the smaller, highly efficient `text-embedding-3-small` and the larger, more powerful `text-embedding-3-large`.

In [11]:
from langchain_openai import OpenAIEmbeddings

# details here: https://openai.com/blog/new-embedding-models-and-api-updates
openai_embed_model = OpenAIEmbeddings(model='text-embedding-3-small')

## Loading and Processing the Data

### Get the dataset

In [12]:
# Download the dataset with gdown. If that fails, download it manually from
# https://drive.google.com/file/d/1QkSY9W5RyaBnY8c5FLIsmpPVXoHTQ-fb/view?usp=sharing
# and upload it to Colab.
# !gdown 1QkSY9W5RyaBnY8c5FLIsmpPVXoHTQ-fb

### Load and Process CSV Documents

In [14]:
import pandas as pd

df = pd.read_csv('../../docs/rag_eval_docs.csv')
df

Unnamed: 0,id,title,context
0,1,Machine Learning,Machine learning is a field of artificial inte...
1,2,Deep Learning,Deep learning is a subset of machine learning ...
2,3,Natural Language Processing (NLP),NLP is a branch of AI that enables computers t...
3,4,Pyramids,"Pyramids are ancient structures, often serving..."
4,5,Photosynthesis,Photosynthesis is the process plants use to co...
5,6,Biology,"Biology is the study of living organisms, cove..."
6,7,Quantum Mechanics,Quantum mechanics is a branch of physics that ...
7,8,Cryptocurrency,Cryptocurrency is a digital currency that uses...
8,9,Renewable Energy,"Renewable energy sources, such as solar and wi..."
9,10,Artificial Intelligence,Artificial intelligence refers to machines mim...


In [16]:
docs = df.to_dict(orient='records')
docs[:3]

[{'id': 1,
  'title': 'Machine Learning',
  'context': 'Machine learning is a field of artificial intelligence focused on enabling systems to learn patterns from data. Algorithms analyze past data to make predictions or classify information. Popular applications include recommendation systems and image recognition.'},
 {'id': 2,
  'title': 'Deep Learning',
  'context': 'Deep learning is a subset of machine learning utilizing neural networks with many layers. It excels in complex tasks like image and speech recognition. Convolutional and recurrent neural networks are among the common architectures used.'},
 {'id': 3,
  'title': 'Natural Language Processing (NLP)',
  'context': 'NLP is a branch of AI that enables computers to understand, interpret, and generate human language. Techniques include tokenization, stemming, and sentiment analysis. Applications range from chatbots to language translation services.'}]

In [17]:
from langchain_core.documents import Document
processed_docs = []

for doc in docs:
    metadata = {
        "title": doc['title'],
        "id": doc['id'],
    }
    data = doc['context']
    processed_docs.append(Document(page_content=data, metadata=metadata))
processed_docs[:3]

[Document(metadata={'title': 'Machine Learning', 'id': 1}, page_content='Machine learning is a field of artificial intelligence focused on enabling systems to learn patterns from data. Algorithms analyze past data to make predictions or classify information. Popular applications include recommendation systems and image recognition.'),
 Document(metadata={'title': 'Deep Learning', 'id': 2}, page_content='Deep learning is a subset of machine learning utilizing neural networks with many layers. It excels in complex tasks like image and speech recognition. Convolutional and recurrent neural networks are among the common architectures used.'),
 Document(metadata={'title': 'Natural Language Processing (NLP)', 'id': 3}, page_content='NLP is a branch of AI that enables computers to understand, interpret, and generate human language. Techniques include tokenization, stemming, and sentiment analysis. Applications range from chatbots to language translation services.')]

## Index Document Chunks and Embeddings in Vector DB

Here we build a Chroma vector store from the documents and their embeddings. Because we also want to save the index to disk, we pass the directory where the data should be persisted.

In [18]:
from langchain_chroma import Chroma

# create vector DB of docs and embeddings - takes < 30s on Colab
chroma_db = Chroma.from_documents(documents=processed_docs,
                                  collection_name='my_db',
                                  embedding=openai_embed_model,
                                  # need to set the distance function to cosine else it uses euclidean by default
                                  # check https://docs.trychroma.com/guides#changing-the-distance-function
                                  collection_metadata={"hnsw:space": "cosine"},
                                  persist_directory="./my_db")

### Load Vector DB from disk

Once a vector database exists on disk, you can load it and reconnect to it at any time.

In [19]:
# load from disk
chroma_db = Chroma(persist_directory="./my_db",
                   collection_name='my_db',
                   embedding_function=openai_embed_model)

In [20]:
chroma_db

<langchain_chroma.vectorstores.Chroma at 0x16a53fc50>

### Semantic Similarity based Retrieval

We use cosine similarity and retrieve up to 3 documents for the user's query, keeping only those whose relevance score is at least 0.3.

In [21]:
similarity_retriever = chroma_db.as_retriever(search_type="similarity_score_threshold",
                                              search_kwargs={"k": 3, "score_threshold": 0.3})
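The combination of `k` and `score_threshold` explains why some queries return fewer than three documents. The sketch below illustrates the idea in plain Python with toy 2-D vectors. It is not LangChain's or Chroma's implementation: the vectors, document names, and the helper `retrieve` are made up for illustration, and the real retriever derives its relevance score from the distance stored in the index.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, doc_vecs, k=3, score_threshold=0.3):
    """Return up to k (doc_id, score) pairs with score >= score_threshold."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    kept = [pair for pair in scored if pair[1] >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]

# Hypothetical toy embeddings, not real model output
toy_docs = {"ai": [1.0, 0.0], "ml": [0.8, 0.6], "pyramids": [0.0, 1.0]}
hits = retrieve([1.0, 0.0], toy_docs)
print(hits)
```

Here only two of the three toy documents clear the threshold, so the result has two entries even though `k` is 3.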

In [22]:
from IPython.display import display, Markdown

def display_docs(docs):
    for doc in docs:
        print('Metadata:', doc.metadata)
        print('Content Brief:')
        display(Markdown(doc.page_content))
        print()

In [23]:
query = "what is AI?"
top_docs = similarity_retriever.invoke(query)
display_docs(top_docs)

Metadata: {'id': 10, 'title': 'Artificial Intelligence'}
Content Brief:


Artificial intelligence refers to machines mimicking human intelligence, like problem-solving and learning. AI includes applications like virtual assistants, robotics, and autonomous vehicles. It's evolving rapidly with advancements in machine learning and deep learning.


Metadata: {'id': 3, 'title': 'Natural Language Processing (NLP)'}
Content Brief:


NLP is a branch of AI that enables computers to understand, interpret, and generate human language. Techniques include tokenization, stemming, and sentiment analysis. Applications range from chatbots to language translation services.


Metadata: {'id': 1, 'title': 'Machine Learning'}
Content Brief:


Machine learning is a field of artificial intelligence focused on enabling systems to learn patterns from data. Algorithms analyze past data to make predictions or classify information. Popular applications include recommendation systems and image recognition.




In [24]:
query = "how do plants survive?"
top_docs = similarity_retriever.invoke(query)
display_docs(top_docs)

Metadata: {'id': 5, 'title': 'Photosynthesis'}
Content Brief:


Photosynthesis is the process plants use to convert sunlight into energy. This process produces glucose and releases oxygen as a byproduct. It is crucial for sustaining life on Earth by providing food and oxygen.




## Build the RAG Pipeline

In [25]:
from langchain_core.prompts import ChatPromptTemplate

rag_prompt = """You are an assistant who is an expert in question-answering tasks.
                Answer the following question using only the following pieces of retrieved context.
                If the answer is not in the context, do not make up answers, just say that you don't know.
                Keep the answer to the point based on the information from the context.

                Question:
                {question}

                Context:
                {context}

                Answer:
            """

rag_prompt_template = ChatPromptTemplate.from_template(rag_prompt)

In [26]:
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from operator import itemgetter


chatgpt = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

src_rag_response_chain = (
    {
        "context": (itemgetter('context')
                        |
                    RunnableLambda(format_docs)),
        "question": itemgetter("question")
    }
        |
    rag_prompt_template
        |
    chatgpt
        |
    StrOutputParser()
)

rag_chain_w_sources = (
    {
        "context": similarity_retriever,
        "question": RunnablePassthrough()
    }
        |
    RunnablePassthrough.assign(response=src_rag_response_chain)
)
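To make the data flow of `rag_chain_w_sources` easier to follow, here is a rough plain-Python equivalent. It is only a sketch of the shape of the computation: `fake_retriever` and `fake_llm` are hypothetical stand-ins for `similarity_retriever` and `chatgpt`, and documents are represented as plain strings.

```python
def fake_retriever(question):
    # Hypothetical stand-in for similarity_retriever: returns document texts
    return ["Photosynthesis is the process plants use to convert sunlight into energy."]

def format_docs(docs):
    # Join retrieved texts into one context string, as in the chain above
    return "\n\n".join(docs)

def fake_llm(prompt):
    # Hypothetical stand-in for the chat model
    return f"(answer generated from a {len(prompt)}-character prompt)"

def rag_with_sources(question):
    # Step 1: build the {"context", "question"} dict
    state = {"context": fake_retriever(question), "question": question}
    # Step 2: fill the prompt, call the model, and attach the result as "response"
    prompt = (
        f"Question:\n{state['question']}\n\n"
        f"Context:\n{format_docs(state['context'])}\n\nAnswer:"
    )
    return {**state, "response": fake_llm(prompt)}

result = rag_with_sources("How do plants survive?")
print(sorted(result))
```

The returned dict keeps the retrieved context next to the generated response, which is what later lets the evaluation step compare the answer against the documents it was based on.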

In [27]:
query = "What is AI?"
result = rag_chain_w_sources.invoke(query)
result

{'context': [Document(metadata={'id': 10, 'title': 'Artificial Intelligence'}, page_content="Artificial intelligence refers to machines mimicking human intelligence, like problem-solving and learning. AI includes applications like virtual assistants, robotics, and autonomous vehicles. It's evolving rapidly with advancements in machine learning and deep learning."),
  Document(metadata={'id': 3, 'title': 'Natural Language Processing (NLP)'}, page_content='NLP is a branch of AI that enables computers to understand, interpret, and generate human language. Techniques include tokenization, stemming, and sentiment analysis. Applications range from chatbots to language translation services.'),
  Document(metadata={'id': 1, 'title': 'Machine Learning'}, page_content='Machine learning is a field of artificial intelligence focused on enabling systems to learn patterns from data. Algorithms analyze past data to make predictions or classify information. Popular applications include recommendation 

In [28]:
query = "How do plants survive?"
result = rag_chain_w_sources.invoke(query)
result

{'context': [Document(metadata={'id': 5, 'title': 'Photosynthesis'}, page_content='Photosynthesis is the process plants use to convert sunlight into energy. This process produces glucose and releases oxygen as a byproduct. It is crucial for sustaining life on Earth by providing food and oxygen.')],
 'question': 'How do plants survive?',
 'response': 'Plants survive by using photosynthesis to convert sunlight into energy, producing glucose and releasing oxygen as a byproduct.'}

# Create End-to-End RAG Evaluation Workflow

![](https://i.imgur.com/GUIkpjy.png)

## Create a Synthetic RAG Golden Reference Dataset

In [29]:
doc_contexts = [doc.page_content for doc in processed_docs]
doc_contexts[:3]

['Machine learning is a field of artificial intelligence focused on enabling systems to learn patterns from data. Algorithms analyze past data to make predictions or classify information. Popular applications include recommendation systems and image recognition.',
 'Deep learning is a subset of machine learning utilizing neural networks with many layers. It excels in complex tasks like image and speech recognition. Convolutional and recurrent neural networks are among the common architectures used.',
 'NLP is a branch of AI that enables computers to understand, interpret, and generate human language. Techniques include tokenization, stemming, and sentiment analysis. Applications range from chatbots to language translation services.']

In [30]:
from deepeval.synthesizer import Synthesizer
from deepeval.synthesizer import types



In [31]:
synthesizer = Synthesizer(model='gpt-4o',
                          embedder=OpenAIEmbeddings())

eval_data = synthesizer.generate_goldens(
    # Provide a list of context for synthetic data generation
    contexts=[[doc] for doc in doc_contexts],
    include_expected_output=True,
    max_goldens_per_context=1,
    num_evolutions=1,
    scenario="Retrieval Augmented Generation",
    task="Question Answering",
    evolutions={
        types.Evolution.REASONING: 0.1,     # Evolves the input to require multi-step logical thinking.
        types.Evolution.MULTICONTEXT: 0.9,  # Ensures that all relevant information from the context is utilized.
        types.Evolution.CONCRETIZING: 0.0,  # Makes abstract ideas more concrete and detailed.
        types.Evolution.CONSTRAINED: 0.0,   # Introduces a condition or restriction, testing the model's ability to operate within specific limits.
        types.Evolution.COMPARATIVE: 0.0,   # Requires a response that involves a comparison between options or contexts.
        types.Evolution.HYPOTHETICAL: 0.0,  # Forces the model to consider and respond to a hypothetical scenario.
        types.Evolution.IN_BREADTH: 0.0,    # Broadens the input to touch on related or adjacent topics.
    }
)

Event loop is already running. Applying nest_asyncio patch to allow async execution...


✨ Generating up to 10 goldens using DeepEval (using gpt-4o, use case=QA, method=default): 100%|██████████| 10/10 [00:13<00:00,  1.36s/it]
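The `evolutions` dict above assigns a weight to each evolution type. Assuming those weights act as sampling probabilities (that is how this notebook uses them, with values summing to 1.0), the sketch below validates a weight dict and samples from it. The helper names are hypothetical and the keys are plain strings rather than DeepEval's `Evolution` enum.

```python
import math
import random

def check_weights(weights):
    """Raise if the weights are negative or do not sum to 1.0."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights sum to {total}, expected 1.0")
    return True

def sample_evolution(weights, rng):
    """Pick one evolution name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

# Same proportions as the cell above, with string keys for illustration
evolution_weights = {
    "REASONING": 0.1,
    "MULTICONTEXT": 0.9,
    "CONCRETIZING": 0.0,
    "CONSTRAINED": 0.0,
    "COMPARATIVE": 0.0,
    "HYPOTHETICAL": 0.0,
    "IN_BREADTH": 0.0,
}

check_weights(evolution_weights)
rng = random.Random(0)
samples = [sample_evolution(evolution_weights, rng) for _ in range(100)]
```

With these weights, only `REASONING` and `MULTICONTEXT` can ever be drawn, which matches the `'evolutions': ['Multi-context']` entry seen in the generated golden.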


In [50]:
from pprint import pprint

In [59]:
print(eval_data[0])

input='In what ways do solar and wind energy investments contribute to reducing emissions and supporting climate change mitigation?' actual_output=None expected_output='Solar and wind energy investments contribute to reducing emissions by providing sustainable alternatives to fossil fuels, which decrease greenhouse gas emissions. By harnessing naturally replenished resources, these investments support climate change mitigation efforts and are actively promoted by governments to combat environmental impacts.' context=['Renewable energy sources, such as solar and wind, provide sustainable alternatives to fossil fuels. These resources are replenished naturally and reduce greenhouse gas emissions. Governments are investing in renewables to combat climate change.'] retrieval_context=None additional_metadata={'evolutions': ['Multi-context'], 'synthetic_input_quality': 1.0, 'context_quality': None} comments=None tools_called=None expected_tools=None source_file=None


In [71]:
type(eval_data)

list

In [69]:
# Iterating over a golden yields (field name, value) pairs
for name, value in eval_data[0]:
    print(f"{name}: {value}")

input: In what ways do solar and wind energy investments contribute to reducing emissions and supporting climate change mitigation?
actual_output: None
expected_output: Solar and wind energy investments contribute to reducing emissions by providing sustainable alternatives to fossil fuels, which decrease greenhouse gas emissions. By harnessing naturally replenished resources, these investments support climate change mitigation efforts and are actively promoted by governments to combat environmental impacts.
context: ['Renewable energy sources, such as solar and wind, provide sustainable alternatives to fossil fuels. These resources are replenished naturally and reduce greenhouse gas emissions. Governments are investing in renewables to combat climate change.']
retrieval_context: None
additional_metadata: {'evolutions': ['Multi-context'], 'synthetic_input_quality': 1.0, 'context_quality': None}
comments: None
tools_called: None
expected_tools: None
source_file: None


## Save the Synthetic RAG Golden Reference Dataset

In [72]:
import dill

In [73]:
with open('golden_ref_data.bin', 'wb') as f:
    dill.dump(eval_data, f)

## Create RAG Evaluation Dataset

In [74]:
from deepeval.dataset import EvaluationDataset

eval_dataset = EvaluationDataset()

# load golden dataset
with open('golden_ref_data.bin', 'rb') as f:
    golden_docs = dill.load(f)

eval_dataset.goldens = golden_docs

In [75]:
type(eval_dataset)

deepeval.dataset.dataset.EvaluationDataset

In [76]:
# Iterating over a golden yields (field name, value) pairs
for name, value in eval_dataset.goldens[0]:
    print(f"{name}: {value}")

input: In what ways do solar and wind energy investments contribute to reducing emissions and supporting climate change mitigation?
actual_output: None
expected_output: Solar and wind energy investments contribute to reducing emissions by providing sustainable alternatives to fossil fuels, which decrease greenhouse gas emissions. By harnessing naturally replenished resources, these investments support climate change mitigation efforts and are actively promoted by governments to combat environmental impacts.
context: ['Renewable energy sources, such as solar and wind, provide sustainable alternatives to fossil fuels. These resources are replenished naturally and reduce greenhouse gas emissions. Governments are investing in renewables to combat climate change.']
retrieval_context: None
additional_metadata: {'evolutions': ['Multi-context'], 'synthetic_input_quality': 1.0, 'context_quality': None}
comments: None
tools_called: None
expected_tools: None
source_file: None


In [77]:
eval_dataset.goldens[0].input

'In what ways do solar and wind energy investments contribute to reducing emissions and supporting climate change mitigation?'

In [79]:
response_obj = rag_chain_w_sources.invoke(eval_dataset.goldens[0].input)

In [80]:
response_obj

{'context': [Document(metadata={'id': 9, 'title': 'Renewable Energy'}, page_content='Renewable energy sources, such as solar and wind, provide sustainable alternatives to fossil fuels. These resources are replenished naturally and reduce greenhouse gas emissions. Governments are investing in renewables to combat climate change.')],
 'question': 'In what ways do solar and wind energy investments contribute to reducing emissions and supporting climate change mitigation?',
 'response': 'Solar and wind energy investments contribute to reducing emissions by providing sustainable alternatives to fossil fuels, which helps to lower greenhouse gas emissions.'}

In [89]:
[doc.page_content for doc in response_obj['context']]

['Renewable energy sources, such as solar and wind, provide sustainable alternatives to fossil fuels. These resources are replenished naturally and reduce greenhouse gas emissions. Governments are investing in renewables to combat climate change.']

In [40]:
from typing import List
from deepeval.test_case import LLMTestCase
from deepeval.dataset import Golden
from tqdm import tqdm

def convert_goldens_to_test_cases(goldens: List[Golden]) -> List[LLMTestCase]:
    test_cases = []
    for golden in tqdm(goldens):
        response_obj = rag_chain_w_sources.invoke(golden.input)
        test_case = LLMTestCase(
            input=golden.input,
            actual_output=response_obj['response'],
            expected_output=golden.expected_output,
            context=golden.context,
            retrieval_context=[doc.page_content for doc in response_obj['context']]
        )
        test_cases.append(test_case)
    return test_cases

In [82]:
eval_dataset.test_cases = convert_goldens_to_test_cases(eval_dataset.goldens)

100%|██████████| 10/10 [00:18<00:00,  1.82s/it]


In [83]:
print(eval_dataset.test_cases[0])

LLMTestCase(input='In what ways do solar and wind energy investments contribute to reducing emissions and supporting climate change mitigation?', actual_output='Solar and wind energy investments contribute to reducing emissions by providing sustainable alternatives to fossil fuels, which helps to lower greenhouse gas emissions.', expected_output='Solar and wind energy investments contribute to reducing emissions by providing sustainable alternatives to fossil fuels, which decrease greenhouse gas emissions. By harnessing naturally replenished resources, these investments support climate change mitigation efforts and are actively promoted by governments to combat environmental impacts.', context=['Renewable energy sources, such as solar and wind, provide sustainable alternatives to fossil fuels. These resources are replenished naturally and reduce greenhouse gas emissions. Governments are investing in renewables to combat climate change.'], retrieval_context=['Renewable energy sources, s

In [88]:
for elem in eval_dataset.test_cases:
    print(f"Input: {elem.input}")
    print(f"Actual Output: {elem.actual_output}")
    print(f"Expected Output: {elem.expected_output}")
    print(f"Context: {elem.context}")
    print(f"Retrieval Context: {elem.retrieval_context}")
    print("-" * 100)

Input: In what ways do solar and wind energy investments contribute to reducing emissions and supporting climate change mitigation?
Actual Output: Solar and wind energy investments contribute to reducing emissions by providing sustainable alternatives to fossil fuels, which helps to lower greenhouse gas emissions.
Expected Output: Solar and wind energy investments contribute to reducing emissions by providing sustainable alternatives to fossil fuels, which decrease greenhouse gas emissions. By harnessing naturally replenished resources, these investments support climate change mitigation efforts and are actively promoted by governments to combat environmental impacts.
Context: ['Renewable energy sources, such as solar and wind, provide sustainable alternatives to fossil fuels. These resources are replenished naturally and reduce greenhouse gas emissions. Governments are investing in renewables to combat climate change.']
Retrieval Context: ['Renewable energy sources, such as solar and 

## Run and View RAG Evaluations on the Evaluation Dataset

In [90]:
from deepeval import evaluate
from deepeval.metrics import ContextualPrecisionMetric, ContextualRecallMetric, ContextualRelevancyMetric
from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric, HallucinationMetric
from deepeval.metrics.ragas import RAGASAnswerRelevancyMetric

contextual_precision = ContextualPrecisionMetric(threshold=0.5, include_reason=True, model="gpt-4o")
contextual_recall = ContextualRecallMetric(threshold=0.5, include_reason=True, model="gpt-4o")
contextual_relevancy = ContextualRelevancyMetric(threshold=0.5, include_reason=True, model="gpt-4o")
answer_relevancy = AnswerRelevancyMetric(threshold=0.5, include_reason=True, model="gpt-4o")
faithfulness = FaithfulnessMetric(threshold=0.5, include_reason=True, model="gpt-4o")
hallucination = HallucinationMetric(threshold=0.5, include_reason=True, model="gpt-4o")
ragas_answer_relevancy = RAGASAnswerRelevancyMetric(threshold=0.5, embeddings=OpenAIEmbeddings(), model="gpt-4o")

eval_results = evaluate(test_cases=eval_dataset.test_cases,
                        metrics=[contextual_precision, contextual_recall, contextual_relevancy,
                                 answer_relevancy, ragas_answer_relevancy, faithfulness, hallucination])

Event loop is already running. Applying nest_asyncio patch to allow async execution...


Evaluating 10 test case(s) in parallel: |          |  0% (0/10) [Time Taken: 00:00, ?test case/s]

Evaluating:   0%|          | 0/1 [00:00<?, ?it/s]

None



ERROR:root:OpenAI rate limit exceeded. Retrying: 1 time(s)...



Metrics Summary

  - ✅ Contextual Precision (score: 1.0, threshold: 0.5, strict: False, evaluation model: gpt-4o, reason: The score is 1.00 because the first node in the retrieval context directly addresses the ways AI systems utilize data analysis by explaining how algorithms analyze past data to make predictions or classify information. The irrelevant nodes, ranked second and third, discuss AI in general terms and applications like virtual assistants and robotics or focus on NLP techniques, which do not directly relate to the input question. Great job on getting it spot on!, error: None)
  - ✅ Contextual Recall (score: 1.0, threshold: 0.5, strict: False, evaluation model: gpt-4o, reason: The score is 1.00 because every sentence in the expected output is perfectly aligned with information from the nodes in the retrieval context. Great job!, error: None)
  - ❌ Contextual Relevancy (score: 0.3333333333333333, threshold: 0.5, strict: False, evaluation model: gpt-4o, reason: The score i




In [45]:
eval_results.test_results[0]

TestResult(success=True, metrics_data=[MetricData(name='Contextual Precision', threshold=0.5, success=True, score=1.0, reason="The score is 1.00 because the node in the retrieval context perfectly aligns with the input by explaining that 'renewable energy sources, such as solar and wind, provide sustainable alternatives to fossil fuels' and 'reduce greenhouse gas emissions,' which directly addresses how solar and wind investments contribute to reducing emissions and supporting climate change mitigation. Great job on ranking the relevant information at the top!", strict_mode=False, evaluation_model='gpt-4o', error=None, evaluation_cost=0.0040025, verbose_logs='Verdicts:\n[\n    {\n        "verdict": "yes",\n        "reason": "The context states that \'renewable energy sources, such as solar and wind, provide sustainable alternatives to fossil fuels\' and \'reduce greenhouse gas emissions,\' which directly addresses how solar and wind investments contribute to reducing emissions and supp

In [92]:
for result in eval_results.test_results:
    print(f"Input: {result.input}")
    print(f"Expected Output: {result.expected_output}")
    print(f"Actual Output: {result.actual_output}")
    print(f"Context: {result.context}")
    print(f"Retrieval Context: {result.retrieval_context}")
    print(f"Success: {result.success}")
    metrics = result.metrics_data
    for metric in metrics:
        print(f"{metric.name}: {metric.score}")
        print(f"{metric.name}_Success: {metric.success}")
        print(f"{metric.name}_Reason: {metric.reason}")
    print("-" * 100)

Input: In what ways do AI systems utilize data analysis to classify information and predict trends?
Expected Output: AI systems utilize data analysis by employing algorithms that learn patterns from historical data. These algorithms can classify information by identifying features and relationships within the data. Additionally, they predict trends by analyzing past behaviors and outcomes to forecast future events, commonly seen in recommendation systems and image recognition.
Actual Output: AI systems utilize data analysis by employing algorithms to analyze past data, which allows them to identify patterns for making predictions or classifying information.
Context: ['Machine learning is a field of artificial intelligence focused on enabling systems to learn patterns from data. Algorithms analyze past data to make predictions or classify information. Popular applications include recommendation systems and image recognition.']
Retrieval Context: ['Machine learning is a field of artifici

In [46]:
eval_metrics = []
for result in eval_results.test_results:
    eval_dict = {}
    eval_dict['Input'] = result.input
    eval_dict['Expected Output'] = result.expected_output
    eval_dict['Actual Output'] = result.actual_output
    eval_dict['Context'] = result.context
    eval_dict['Retrieval Context'] = result.retrieval_context
    eval_dict['Success'] = result.success
    metrics = result.metrics_data
    for metric in metrics:
        eval_dict[metric.name+'_Score'] = metric.score
    for metric in metrics:
        eval_dict[metric.name+'_Success'] = metric.success
    for metric in metrics:
        eval_dict[metric.name+'_Reason'] = metric.reason
    eval_metrics.append(eval_dict)

In [47]:
eval_metrics[0]

{'Input': 'In what ways do solar and wind energy investments contribute to reducing emissions and supporting climate change mitigation?',
 'Expected Output': 'Solar and wind energy investments contribute to reducing emissions by providing sustainable alternatives to fossil fuels, which decrease greenhouse gas emissions. By harnessing naturally replenished resources, these investments support climate change mitigation efforts and are actively promoted by governments to combat environmental impacts.',
 'Actual Output': 'Solar and wind energy investments contribute to reducing emissions by providing sustainable alternatives to fossil fuels, which helps to lower greenhouse gas emissions.',
 'Context': ['Renewable energy sources, such as solar and wind, provide sustainable alternatives to fossil fuels. These resources are replenished naturally and reduce greenhouse gas emissions. Governments are investing in renewables to combat climate change.'],
 'Retrieval Context': ['Renewable energy so

In [48]:
import pandas as pd

eval_results_df = pd.DataFrame(eval_metrics)
eval_results_df.T

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
Input,In what ways do solar and wind energy investme...,How do multi-layered neural networks contribut...,In what ways do the construction techniques an...,What role do cryptographic methods and decentr...,In what ways do AI systems utilize data analys...,In what ways do wave-particle duality and unce...,In what ways do AI applications like virtual a...,How have advancements in cellular and DNA rese...,How do plants convert sunlight into energy and...,How does NLP enhance language understanding an...
Expected Output,Solar and wind energy investments contribute t...,"Multi-layered neural networks, such as convolu...","The construction techniques of the pyramids, p...",Cryptographic methods in Bitcoin ensure secure...,AI systems utilize data analysis by employing ...,Wave-particle duality challenges classical phy...,AI applications like virtual assistants and ro...,Advancements in cellular and DNA research have...,Plants convert sunlight into energy through ph...,NLP enhances language understanding by using t...
Actual Output,Solar and wind energy investments contribute t...,Multi-layered neural networks contribute to im...,The context does not provide specific details ...,Cryptographic methods ensure secure transactio...,AI systems utilize data analysis by employing ...,Wave-particle duality and the uncertainty prin...,AI applications like virtual assistants and ro...,Advancements in cellular and DNA research have...,Plants convert sunlight into energy through th...,NLP enhances language understanding by enablin...
Context,"[Renewable energy sources, such as solar and w...",[Deep learning is a subset of machine learning...,"[Pyramids are ancient structures, often servin...",[Cryptocurrency is a digital currency that use...,[Machine learning is a field of artificial int...,[Quantum mechanics is a branch of physics that...,[Artificial intelligence refers to machines mi...,"[Biology is the study of living organisms, cov...",[Photosynthesis is the process plants use to c...,[NLP is a branch of AI that enables computers ...
Retrieval Context,"[Renewable energy sources, such as solar and w...",[Deep learning is a subset of machine learning...,"[Pyramids are ancient structures, often servin...",[Cryptocurrency is a digital currency that use...,[Machine learning is a field of artificial int...,[Quantum mechanics is a branch of physics that...,[Artificial intelligence refers to machines mi...,"[Biology is the study of living organisms, cov...",[Photosynthesis is the process plants use to c...,[NLP is a branch of AI that enables computers ...
Success,True,True,False,True,True,True,True,True,True,False
Contextual Precision_Score,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
Contextual Recall_Score,1.0,0.5,1.0,1.0,1.0,1.0,1.0,0.666667,0.75,1.0
Contextual Relevancy_Score,1.0,0.5,1.0,1.0,0.555556,0.666667,0.555556,1.0,0.666667,0.333333
Answer Relevancy_Score,1.0,1.0,0.5,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [49]:
eval_results_df

Unnamed: 0,Input,Expected Output,Actual Output,Context,Retrieval Context,Success,Contextual Precision_Score,Contextual Recall_Score,Contextual Relevancy_Score,Answer Relevancy_Score,...,Answer Relevancy (ragas)_Success,Faithfulness_Success,Hallucination_Success,Contextual Precision_Reason,Contextual Recall_Reason,Contextual Relevancy_Reason,Answer Relevancy_Reason,Answer Relevancy (ragas)_Reason,Faithfulness_Reason,Hallucination_Reason
0,In what ways do solar and wind energy investme...,Solar and wind energy investments contribute t...,Solar and wind energy investments contribute t...,"[Renewable energy sources, such as solar and w...","[Renewable energy sources, such as solar and w...",True,1.0,1.0,1.0,1.0,...,True,True,True,The score is 1.00 because the node in the retr...,The score is 1.00 because every sentence in th...,The score is 1.00 because the retrieval contex...,The score is 1.00 because the response is perf...,,The score is 1.00 because there are no contrad...,The score is 0.00 because the actual output co...
1,How do multi-layered neural networks contribut...,"Multi-layered neural networks, such as convolu...",Multi-layered neural networks contribute to im...,[Deep learning is a subset of machine learning...,[Deep learning is a subset of machine learning...,True,1.0,0.5,0.5,1.0,...,True,True,True,The score is 1.00 because the relevant node in...,The score is 0.50 because while the first sent...,The score is 0.50 because while the retrieval ...,The score is 1.00 because the response is full...,,The score is 1.00 because there are no contrad...,The score is 0.00 because the actual output pe...
2,In what ways do the construction techniques an...,"The construction techniques of the pyramids, p...",The context does not provide specific details ...,"[Pyramids are ancient structures, often servin...","[Pyramids are ancient structures, often servin...",False,1.0,1.0,1.0,0.5,...,False,True,True,The score is 1.00 because all relevant nodes i...,The score is 1.00 because all sentences in the...,The score is 1.00 because the context perfectl...,The score is 0.50 because while the output par...,,The score is 1.00 because there are no contrad...,The score is 0.00 because the actual output do...
3,What role do cryptographic methods and decentr...,Cryptographic methods in Bitcoin ensure secure...,Cryptographic methods ensure secure transactio...,[Cryptocurrency is a digital currency that use...,[Cryptocurrency is a digital currency that use...,True,1.0,1.0,1.0,1.0,...,True,True,True,The score is 1.00 because the first node in th...,The score is 1.00 because all sentences in the...,The score is 1.00 because the context precisel...,The score is 1.00 because the response perfect...,,The score is 1.00 because there are no contrad...,The score is 0.00 because the actual output al...
4,In what ways do AI systems utilize data analys...,AI systems utilize data analysis by employing ...,AI systems utilize data analysis by employing ...,[Machine learning is a field of artificial int...,[Machine learning is a field of artificial int...,True,1.0,1.0,0.555556,1.0,...,True,True,True,"The score is 1.00 because the relevant node, w...",The score is 1.00 because every sentence in th...,The score is 0.56 because while the relevant s...,The score is 1.00 because the response was per...,,The score is 1.00 because there are no contrad...,The score is 0.00 because the actual output fu...
5,In what ways do wave-particle duality and unce...,Wave-particle duality challenges classical phy...,Wave-particle duality and the uncertainty prin...,[Quantum mechanics is a branch of physics that...,[Quantum mechanics is a branch of physics that...,True,1.0,1.0,0.666667,1.0,...,True,True,True,The score is 1.00 because the node in retrieva...,The score is 1.00 because all sentences in the...,The score is 0.67 because while the retrieval ...,The score is 1.00 because the response perfect...,,The score is 1.00 because there are no contrad...,The score is 0.00 because there are no contrad...
6,In what ways do AI applications like virtual a...,AI applications like virtual assistants and ro...,AI applications like virtual assistants and ro...,[Artificial intelligence refers to machines mi...,[Artificial intelligence refers to machines mi...,True,1.0,1.0,0.555556,1.0,...,True,True,True,The score is 1.00 because all nodes in the ret...,The score is 1.00 because every sentence in th...,The score is 0.56 because while some statement...,The score is 1.00 because the response is perf...,,The score is 1.00 because there are no contrad...,The score is 0.00 because the actual output al...
7,How have advancements in cellular and DNA rese...,Advancements in cellular and DNA research have...,Advancements in cellular and DNA research have...,"[Biology is the study of living organisms, cov...","[Biology is the study of living organisms, cov...",True,1.0,0.666667,1.0,1.0,...,True,True,True,The score is 1.00 because the node in the retr...,The score is 0.67 because the retrieval contex...,The score is 1.00 because the retrieval contex...,The score is 1.00 because the output is perfec...,,The score is 1.00 because there are no contrad...,The score is 0.00 because the actual output fu...
8,How do plants convert sunlight into energy and...,Plants convert sunlight into energy through ph...,Plants convert sunlight into energy through th...,[Photosynthesis is the process plants use to c...,[Photosynthesis is the process plants use to c...,True,1.0,0.75,0.666667,1.0,...,True,True,True,The score is 1.00 because the first node in th...,The score is 0.75 because while the nodes in r...,The score is 0.67 because while the relevant s...,The score is 1.00 because the answer perfectly...,,The score is 1.00 because there are no contrad...,The score is 0.00 because there are no contrad...
9,How does NLP enhance language understanding an...,NLP enhances language understanding by using t...,NLP enhances language understanding by enablin...,[NLP is a branch of AI that enables computers ...,[NLP is a branch of AI that enables computers ...,False,1.0,1.0,0.333333,1.0,...,True,True,True,The score is 1.00 because all the relevant nod...,The score is 1.00 because every sentence in th...,The score is 0.33 because while the retrieval ...,The score is 1.00 because the response is perf...,,The score is 1.00 because there are no contrad...,The score is 0.00 because the actual output fu...
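
Rows 2 and 9 have `Success` set to False, but the wide table makes it hard to see which metric caused each failure. The following is a minimal sketch, assuming every per-metric flag column ends in `_Success` (as the visible `Faithfulness_Success` and `Hallucination_Success` columns do). The helper name `failing_metrics` and the toy DataFrame are hypothetical and only illustrate the idea.

```python
import pandas as pd


def failing_metrics(results: pd.DataFrame, suffix: str = "_Success") -> pd.Series:
    """Return, for each row, the list of metric names whose success flag is False."""
    # The overall "Success" column has no underscore prefix, so it is not matched here.
    flag_cols = [col for col in results.columns if col.endswith(suffix)]
    failed = [
        [col[: -len(suffix)] for col in flag_cols if not row[col]]
        for _, row in results[flag_cols].iterrows()
    ]
    return pd.Series(failed, index=results.index)


# Toy stand-in for eval_results_df (values are illustrative, not real results)
toy_results = pd.DataFrame(
    {
        "Input": ["question 1", "question 2"],
        "Success": [True, False],
        "Answer Relevancy_Success": [True, False],
        "Faithfulness_Success": [True, True],
    }
)
toy_results["Failed Metrics"] = failing_metrics(toy_results)
print(toy_results[["Input", "Success", "Failed Metrics"]])
```

Applied to the real results, the same call would be `failing_metrics(eval_results_df)`, provided the flag columns follow the naming assumed above.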


In [0]:
# Summary statistics (count, mean, std, min, quartiles, max) for each metric score
eval_results_df[['Contextual Precision_Score', 'Contextual Recall_Score', 'Contextual Relevancy_Score',
                 'Answer Relevancy_Score', 'Answer Relevancy (ragas)_Score',
                 'Faithfulness_Score', 'Hallucination_Score']].describe()