# Ragas
## Introduction

Ragas is an evaluation framework for Retrieval Augmented Generation (RAG) pipelines.
<https://github.com/explodinggradients/ragas>

Note: Make sure the following system packages are installed:

sudo apt-get update
sudo apt-get install libgl1-mesa-glx
sudo apt-get install libglib2.0-0
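
The two packages above supply shared libraries that OpenCV typically needs at import time. A minimal sketch, assuming a Linux host, for checking whether those libraries are visible to the dynamic loader. The helper name `find_system_libs` is hypothetical, and the short names `GL` and `glib-2.0` are assumptions about what the two packages provide:

```python
from ctypes.util import find_library


def find_system_libs(names=("GL", "glib-2.0")):
    """Map each library short name to the file the loader resolves, or None."""
    return {name: find_library(name) for name in names}


for name, path in find_system_libs().items():
    status = path if path else "not found (install the packages listed above)"
    print(f"{name}: {status}")
```

A `None` result only means the loader could not locate the library by that short name; it is a hint, not proof that the import will fail.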


## Installation

In [1]:
%pip install -q -r requirements.txt
%pip install -q "unstructured[pdf]"
%pip install -q --upgrade opencv-python

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
grpcio-status 1.66.1 requires protobuf<6.0dev,>=5.26.1, but you have protobuf 4.25.4 which is incompatible.
Note: you may need to restart the kernel to use updated packages.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
opentelemetry-proto 1.27.0 requires protobuf<5.0,>=3.19, but you have protobuf 5.28.0 which is incompatible.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
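
The resolver messages above show two packages requiring incompatible `protobuf` ranges. A minimal sketch for inspecting what actually ended up installed, using only the standard library. The helper name `installed_version` is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Packages named in the pip conflict messages above
for package in ("protobuf", "grpcio-status", "opentelemetry-proto"):
    print(f"{package}: {installed_version(package) or 'not installed'}")
```

Running this after a kernel restart shows which `protobuf` version won, which helps decide whether either conflict matters for the cells that follow.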


## Setting up the LLM

In [2]:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
# Initialize OpenAI models
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()
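
`ChatOpenAI` and `OpenAIEmbeddings` read the API key from the `OPENAI_API_KEY` environment variable by default. A small sketch for failing early with a clear message when it is missing. The helper name `require_env` is hypothetical:

```python
import os


def require_env(name, env=None):
    """Return the value of an environment variable or raise a clear error."""
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it before creating the models")
    return value
```

Calling `require_env("OPENAI_API_KEY")` before constructing the models turns a late authentication failure into an immediate, readable one.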

## Loading the documents

In [3]:
from langchain_community.document_loaders import DirectoryLoader
loader = DirectoryLoader("data/papers")
documents = loader.load()

# Ensure metadata includes 'filename'
for document in documents:
    document.metadata['filename'] = document.metadata['source']

## Set up a test generator

In [4]:
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness, context_recall
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

# Create a generator instance
generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)

## Generate the test set

In [5]:
# Generate a test set of 10 questions with the given mix of evolution types
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)

embedding nodes:   0%|          | 0/118 [00:00<?, ?it/s]

Generating:   0%|          | 0/10 [00:00<?, ?it/s]

max retries exceeded for ReasoningEvolution(generator_llm=LangchainLLMWrapper(run_config=RunConfig(timeout=180, max_retries=15, max_wait=90, max_workers=16, exception_types=<class 'openai.RateLimitError'>, log_tenacity=False, seed=42)), docstore=InMemoryDocumentStore(splitter=<langchain_text_splitters.base.TokenTextSplitter object at 0x7ffe9db86630>, nodes=[Node(metadata={'source': 'data/papers/microsoft_annual_report_2022.pdf', 'filename': 'data/papers/microsoft_annual_report_2022.pdf'}, page_content='Dear shareholders, colleagues, customers, and partners:\n\nWe are living through a period of historic economic, societal, and geopolitical change. The world in 2022 looks nothing like the world in 2019. As I write this, inflation is at a 40-year high, supply chains are stretched, and the war in Ukraine is ongoing. At the same time, we are entering a technological era with the potential to power awesome advancements across every sector of our economy and society. As the world’s largest so

## Show the generated test set

In [6]:
# Convert to Pandas DataFrame
test_df = testset.to_pandas()
test_df

| | question | contexts | ground_truth | evolution_type | metadata | episode_done |
|---|---|---|---|---|---|---|
| 0 | How is Microsoft Power Platform helping domain... | [ and related accessories.\n\nThe Ambitions Th... | Microsoft Power Platform is helping domain exp... | simple | [{'source': 'data/papers/microsoft_annual_repo... | True |
| 1 | How does average revenue per user expansion im... | [ and average revenue per user expansion, as w... | Average revenue per user expansion impacts Mic... | simple | [{'source': 'data/papers/microsoft_annual_repo... | True |
| 2 | How are foreign exchange forward contracts des... | [ hedged using foreign exchange forward contra... | Foreign exchange forward contracts that are de... | simple | [{'source': 'data/papers/microsoft_annual_repo... | True |
| 3 | What is the dividend history of Microsoft Corp... | [ STOCKHOLDERS\n\nOur common stock is traded o... | Our Board of Directors declared the following ... | simple | [{'source': 'data/papers/microsoft_annual_repo... | True |
| 4 | What is the method used to test goodwill for i... | [� are business dispositions and transfers bet... | We test goodwill for impairment annually on Ma... | simple | [{'source': 'data/papers/microsoft_annual_repo... | True |
| 5 | How are credit default swap contracts used in ... | [ hedged using foreign exchange forward contra... | Credit default swap contracts are used to mana... | reasoning | [{'source': 'data/papers/microsoft_annual_repo... | True |
| 6 | Which share repurchase programs are active for... | [ITY\n\nShares Outstanding\n\nShares of common... | The share repurchase program approved on Septe... | multi_context | [{'source': 'data/papers/microsoft_annual_repo... | True |
| 7 | What is Microsoft doing in mixed reality solut... | [ and related accessories.\n\nThe Ambitions Th... | Microsoft is accelerating their development of... | multi_context | [{'source': 'data/papers/microsoft_annual_repo... | True |
| 8 | What are the effects of IRS audits on a compan... | [ 2016, and a portion of the IRS audit for tax... | The effects of IRS audits on a company's finan... | multi_context | [{'source': 'data/papers/microsoft_annual_repo... | True |
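
The requested distribution was 50% `simple`, 25% `reasoning`, and 25% `multi_context`, but the table has nine rows rather than ten, so the realized mix differs from the request. A minimal sketch comparing the two, with the `evolution_type` values copied from the rows above (with a live DataFrame, `test_df["evolution_type"].tolist()` would supply the list instead):

```python
from collections import Counter

# evolution_type column, copied from the nine rows shown above
evolution_types = ["simple"] * 5 + ["reasoning"] + ["multi_context"] * 3

# Distribution requested in the generate call
requested = {"simple": 0.5, "reasoning": 0.25, "multi_context": 0.25}

counts = Counter(evolution_types)
total = sum(counts.values())
realized = {name: counts[name] / total for name in requested}

for name, target in requested.items():
    print(f"{name:14s} requested {target:.2f}  realized {realized[name]:.2f} ({counts[name]} rows)")
```

The shortfall is in `reasoning`, which is consistent with the "max retries exceeded for ReasoningEvolution" message logged during generation.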


In [7]:
import pandas as pd

# Find the maximum string width of each column
# (uses .str.len() instead of the deprecated DataFrame.applymap)
max_col_widths = test_df.astype(str).apply(lambda col: col.str.len()).max()

# Left-justify every value to its column's maximum width
justified_df = test_df.astype(str).apply(lambda col: col.str.ljust(max_col_widths[col.name]))

# Uncomment to print the aligned table
# print(justified_df.to_string(index=False))


In [8]:
# Convert the test dataset to a pandas DataFrame
df = testset.to_pandas()

# Ensure the 'expected_answer' column is present
if 'expected_answer' not in df.columns:
    df['expected_answer'] = None  # or any default value you prefer

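# Placeholder: no RAG pipeline is run here, so the ground truth is reused
# as the 'answer'. Replace this with your pipeline's real answers.
if 'answer' not in df.columns:
    df['answer'] = df['ground_truth'].astype(str)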

# Convert the DataFrame to a Dataset
from datasets import Dataset
dataset = Dataset.from_pandas(df)
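
The four metrics used in this notebook work from the `question`, `contexts`, `answer`, and `ground_truth` columns. A minimal sketch of a pre-flight check; the helper `missing_columns` is hypothetical, and the required set is an assumption based on the columns this notebook uses, not an official Ragas list:

```python
REQUIRED_COLUMNS = ("question", "contexts", "answer", "ground_truth")


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent, in a stable order."""
    present = set(columns)
    return [name for name in required if name not in present]


# Columns of the generated test set before the 'answer' column is added
generated = ["question", "contexts", "ground_truth", "evolution_type", "metadata", "episode_done"]
print(missing_columns(generated))  # prints ['answer']
```

With the real dataset, `missing_columns(dataset.column_names)` should return an empty list before evaluation starts.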

## Evaluating the results

In [9]:
# Run the evaluation on the Dataset built in the previous cell
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness, context_recall

result = evaluate(
    dataset,
    metrics=[answer_relevancy, context_precision, faithfulness, context_recall],
)

print(result)

Evaluating:   0%|          | 0/36 [00:00<?, ?it/s]

{'answer_relevancy': 0.8398, 'context_precision': 1.0000, 'faithfulness': 0.8775, 'context_recall': 0.8778}
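
One way to act on the aggregate scores is to flag metrics that fall below a chosen bar. A minimal sketch using the scores printed above; the 0.85 threshold is an arbitrary, hypothetical value, not a Ragas recommendation:

```python
# Aggregate scores copied from the output above
scores = {
    "answer_relevancy": 0.8398,
    "context_precision": 1.0000,
    "faithfulness": 0.8775,
    "context_recall": 0.8778,
}

THRESHOLD = 0.85  # hypothetical cut-off chosen for illustration


def below_threshold(scores, threshold):
    """Return metric names scoring under the threshold, lowest first."""
    flagged = [name for name, value in scores.items() if value < threshold]
    return sorted(flagged, key=scores.get)


print(below_threshold(scores, THRESHOLD))  # prints ['answer_relevancy']
```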


In [10]:
df = result.to_pandas()

for index, row in df.iterrows():
    print(f"Question: {row['question']}\n")
    print(f"Answer: {row['answer']}\n")
    print(f"Answer Relevancy: {row['answer_relevancy']}\n")
    print(f"Context Precision: {row['context_precision']}\n")
    print(f"Context Recall: {row['context_recall']}\n")
    print(f"Faithfulness: {row['faithfulness']}\n")

Question: How is Microsoft Power Platform helping domain experts drive productivity gains?

Answer: Microsoft Power Platform is helping domain experts drive productivity gains with low-code/no-code tools, robotic process automation, virtual agents, and business intelligence.

Answer Relevancy: 0.9769264988134311

Context Precision: 0.9999999999

Context Recall: 1.0

Faithfulness: 1.0

Question: How does average revenue per user expansion impact Microsoft's business?

Answer: Average revenue per user expansion impacts Microsoft's business by driving growth and increasing revenue. It is a key factor in the company's financial performance and success. As average revenue per user expands, Microsoft is able to generate more revenue from each user, leading to increased profitability and financial stability. This expansion is influenced by various factors, such as the shift from Office licensed on-premises to Office 365, the demand for communication and storage services, and the sale of addit

## Tracing LLM calls with LangSmith

Note: Export the following LangChain environment variables so that LangSmith tracing works.

```
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
export LANGCHAIN_API_KEY=<your-api-key>
export LANGCHAIN_PROJECT="09-ragas.ipynb"
```
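
`export` only affects processes started from that shell afterwards, so a kernel that is already running will not see the variables. A minimal sketch for setting the same values from inside the notebook, assuming the cell runs before any traced call is made. The API key is deliberately left to the environment:

```python
import os

# Same values as the shell exports above; setdefault keeps anything already exported
os.environ.setdefault("LANGCHAIN_TRACING_V2", "true")
os.environ.setdefault("LANGCHAIN_ENDPOINT", "https://api.smith.langchain.com")
os.environ.setdefault("LANGCHAIN_PROJECT", "09-ragas.ipynb")

# Avoid hard-coding the key in a notebook; only report whether it is present
if not os.environ.get("LANGCHAIN_API_KEY"):
    print("LANGCHAIN_API_KEY is not set; tracing is unlikely to work without it")
```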


In [11]:
%pip install -q -U langsmith

Note: you may need to restart the kernel to use updated packages.


In [12]:
import openai
from langsmith.wrappers import wrap_openai
from langsmith import traceable

# Auto-trace LLM calls in-context
client = wrap_openai(openai.Client())

@traceable # Auto-trace this function
def pipeline(user_input: str):
    result = client.chat.completions.create(
        messages=[{"role": "user", "content": user_input}],
        model="gpt-3.5-turbo"
    )
    return result.choices[0].message.content

pipeline("Hello, cruel world!")
# Out:  Hello there! How can I assist you today?

'Hello there! How can I assist you today?'

In [13]:
result = evaluate(
    dataset,
    metrics=[answer_relevancy, context_precision, faithfulness, context_recall],
)

print(result)

Evaluating:   0%|          | 0/36 [00:00<?, ?it/s]

{'answer_relevancy': 0.8398, 'context_precision': 1.0000, 'faithfulness': 0.8795, 'context_recall': 0.8778}
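
The two evaluation runs in this notebook use the same dataset, yet `faithfulness` moved from 0.8775 to 0.8795. LLM-judged metrics are not fully deterministic, so small run-to-run differences like this are to be expected. A minimal sketch that makes the comparison explicit, with both score dictionaries copied from the printed outputs:

```python
first_run = {"answer_relevancy": 0.8398, "context_precision": 1.0000,
             "faithfulness": 0.8775, "context_recall": 0.8778}
second_run = {"answer_relevancy": 0.8398, "context_precision": 1.0000,
              "faithfulness": 0.8795, "context_recall": 0.8778}

# Per-metric change between runs, rounded to the precision of the printed scores
deltas = {name: round(second_run[name] - first_run[name], 4) for name in first_run}
changed = {name: delta for name, delta in deltas.items() if delta != 0}

print(changed)  # prints {'faithfulness': 0.002}
```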


Check out the LangSmith traces at: https://smith.langchain.com/