In this notebook we evaluate the performance of our RAG pipeline with the metrics from Retrieval Augmented Generation Assessment (RAGAs), as implemented in the `ragas` package. We follow this [source](https://towardsdatascience.com/evaluating-rag-applications-with-ragas-81d67b0ee31a).

### Prerequisites
Because `ragas` relies on the `langchain` module, we need to make sure the right versions are installed so that they play well together.

In [None]:
# !pip install langchain==0.0.350
# !pip install ragas==0.0.22
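
To confirm that the environment really has the pinned versions, a small check like the one below can help. It is only a sketch: `PINNED` and `check_pins` are names made up for this illustration, and the version numbers are simply the ones from the install cell above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions taken from the install cell above (assumed to be the compatible pair).
PINNED = {"langchain": "0.0.350", "ragas": "0.0.22"}


def check_pins(pins):
    """Return {package: (installed_or_None, expected)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (installed, expected)
    return mismatches


print(check_pins(PINNED) or "all pinned versions match")
```

An empty result means both packages are installed at the expected versions; anything else lists what is installed next to what was expected.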

In [1]:
from IPython.display import Markdown, display

## Loading Documents

In [2]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter

In [3]:
loader = TextLoader("data/transcript.txt", encoding='utf-8')
documents = loader.load()

In [4]:
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=0, separator='\n')
docs = text_splitter.split_documents(documents)
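
To get a feel for what `chunk_size=200`, `chunk_overlap=0` and `separator='\n'` mean, here is a simplified plain-Python sketch of the idea: split on the separator, then greedily pack lines into chunks of at most `chunk_size` characters. It is an illustration under that assumption, not the library's implementation, and `split_by_lines` and `sample` are made-up names.

```python
def split_by_lines(text, chunk_size=200, separator="\n"):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters."""
    chunks, current = [], ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty lines
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Still fits, or a single oversized line that has to be kept whole
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Hypothetical text: three lines of 90 characters each
sample = "\n".join(["a" * 90, "b" * 90, "c" * 90])
print([len(chunk) for chunk in split_by_lines(sample)])
```

The first two lines fit together (90 + 1 + 90 = 181 characters), the third would push the chunk past 200, so it starts a new chunk.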

## Embeddings

In [6]:
from langchain.embeddings import GooglePalmEmbeddings

In [7]:
import os
palm_api_key = os.getenv("PALM_API_KEY")

In [8]:
embeddings = GooglePalmEmbeddings(google_api_key=palm_api_key)

## Vector Store (Pinecone)
A vector store lets us store our text embeddings and query them later. In this notebook we use `pinecone`.

In [9]:
from langchain.vectorstores.pinecone import Pinecone

In [11]:
index_name = "langchain-demo"
docsearch = Pinecone.from_documents(docs, embeddings, index_name=index_name)
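
Conceptually, querying a vector store means ranking the stored embeddings by their similarity to the query embedding. The sketch below shows this with cosine similarity in plain Python. It assumes a cosine-similarity index (the actual metric depends on how the Pinecone index was created), and the vectors, texts, and function names are all made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=2):
    """Return the k (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in stored]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up 3-dimensional "embeddings"; real embeddings have far more dimensions.
toy_store = [
    ("chunk about ubuntu", [0.9, 0.1, 0.0]),
    ("chunk about colonial history", [0.1, 0.9, 0.1]),
    ("chunk about education", [0.0, 0.2, 0.9]),
]
toy_query = [0.8, 0.2, 0.1]

for text, score in top_k(toy_query, toy_store):
    print(f"{score:.3f}  {text}")
```

A hosted store such as Pinecone does the same kind of ranking, but with approximate search so that it scales to many vectors.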

## LLM

In [15]:
from langchain.llms import GooglePalm

In [16]:
llm = GooglePalm(google_api_key=palm_api_key, temperature=0.7)

## RetrievalQA

In [17]:
from langchain.chains import RetrievalQA

In [18]:
query = "How have western perspectives of the world shaped africans' outlook on life?"

In [19]:
retriever = docsearch.as_retriever()

## Evaluation

In [20]:
import os

from langchain.chat_models import ChatGooglePalm
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser

In [21]:

# Define LLM
llm = ChatGooglePalm(google_api_key=os.getenv("PALM_API_KEY"), temperature=0.0)

# Define prompt template
template = """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, just say that you don't know. 
Use two sentences maximum and keep the answer concise.
Question: {question} 
Context: {context} 
Answer:
"""

prompt = ChatPromptTemplate.from_template(template)

# Setup RAG pipeline
rag_chain = (
    {"context": retriever,  "question": RunnablePassthrough()} 
    | prompt 
    | llm
    | StrOutputParser() 
)
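
The first two stages of the pipeline amount to "fetch context for the question, then fill the prompt template". The plain-Python sketch below mimics that step with `str.format`. The names `toy_template` and `build_prompt` and the example chunks are hypothetical, and joining the chunks with newlines is a simplification of mine: the chain in the cell above hands the retriever output to the prompt as-is.

```python
# Trimmed copy of the template, keeping only the two placeholders that matter.
toy_template = """Question: {question}
Context: {context}
Answer:
"""


def build_prompt(question, retrieved_chunks):
    """Fill the template with the question and the retrieved context."""
    context = "\n".join(retrieved_chunks)
    return toy_template.format(question=question, context=context)


# Hypothetical chunks standing in for whatever the retriever would return.
print(build_prompt("What is ubuntu?", ["first retrieved chunk", "second retrieved chunk"]))
```

The resulting string is what the LLM would see; the output parser then reduces the model's reply to a plain string.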

In [22]:
from datasets import Dataset

questions = ["What did professor Ali Mazrui say?", 
             "What is the social in this context?",
             "What is ubuntu?",
            ]
# Ground truths copied from the source tutorial; they do not match our questions,
# so they (and the context_recall metric further down) stay disabled.
# ground_truths = [["The president said that Justice Breyer has dedicated his life to serve the country and thanked him for his service."],
#                 ["The president said that Pat Gelsinger is ready to increase Intel's investment to $100 billion."],
#                 ["The president asked Congress to pass proven measures to reduce gun violence."]]
answers = []
contexts = []

# Inference
for query in questions:
    answers.append(rag_chain.invoke(query))
    # Keep only the text of each retrieved document as the context for this question
    contexts.append([doc.page_content for doc in retriever.get_relevant_documents(query)])

# To dict
data = {
    "question": questions,
    "answer": answers,
    "contexts": contexts,
    # "ground_truths": ground_truths
}

# Convert dict to dataset
dataset = Dataset.from_dict(data)

In [24]:
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    # context_recall,
    context_precision,
)

result = evaluate(
    dataset=dataset,
    metrics=[
        context_precision,
        # context_recall,
        faithfulness,
        answer_relevancy,
    ],
)

df = result.to_pandas()

evaluating with [context_precision]


  0%|          | 0/1 [00:00<?, ?it/s]

  0%|          | 0/1 [00:26<?, ?it/s]


RuntimeError: ('Fatal error occurred while running async tasks.', RateLimitError("Error code: 429 - {'error': {'message': 'Your account is not active, please check your billing details on our website.', 'type': 'billing_not_active', 'param': None, 'code': 'billing_not_active'}}"))

## Remarks
The `ragas` library is brittle and pulls in many dependencies. By default it appears to score its metrics with OpenAI models (the error above has the shape of an OpenAI API response), not with the PaLM model our pipeline uses. This results in
```
Error code: 429 - {'error': {'message': 'Your account is not active
```
because I do not have an active paid account with them.

This is frustrating: evaluation tools like this should be able to rely on open source services.