## Setting up the environment

In [1]:
!pip install -qU langchain langchain-core langchain-community langchain-openai

In [2]:
!pip install -qU qdrant-client

In [3]:
!pip install -qU tiktoken pymupdf

In [4]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

In [5]:
from langchain_openai import ChatOpenAI

openai_chat_model = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

## Loading the data

In [6]:
from langchain_community.document_loaders import PyMuPDFLoader

docs = PyMuPDFLoader("https://d18rn0p25nwr6d.cloudfront.net/CIK-0001326801/c7318154-f6ae-4866-89fa-f0c589f2ee3d.pdf").load()

## Chunking the data

In [7]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
import tiktoken

def tiktoken_len(text):
    tokens = tiktoken.encoding_for_model("gpt-3.5-turbo").encode(
        text,
    )
    return len(tokens)


In [8]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 200,
    chunk_overlap = 50,
    length_function = tiktoken_len,
)

split_chunks = text_splitter.split_documents(docs)

In [9]:
len(split_chunks)

765
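The sketch below is a simplified sliding-window splitter in plain Python that shows how `chunk_size` and `chunk_overlap` interact. It is not LangChain's algorithm. `RecursiveCharacterTextSplitter` first splits on separators (paragraph breaks, then line breaks, then spaces) and merges the pieces back up to `chunk_size`, so its boundaries and overlap only approximate a fixed window. The sketch also assumes whitespace-separated words as a stand-in for tiktoken tokens.

```python
from typing import List


def sliding_window_chunks(tokens: List[str], chunk_size: int, chunk_overlap: int) -> List[List[str]]:
    """Split `tokens` into windows of at most `chunk_size` items.

    Consecutive windows share `chunk_overlap` items, so each window starts
    `chunk_size - chunk_overlap` items after the previous one.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# Whitespace-separated "words" stand in for tiktoken tokens (assumption).
words = [f"w{i}" for i in range(500)]
demo_chunks = sliding_window_chunks(words, chunk_size=200, chunk_overlap=50)
print(len(demo_chunks))                             # → 3
print(demo_chunks[0][-50:] == demo_chunks[1][:50])  # → True
```

`split_documents` also processes each page returned by `PyMuPDFLoader` as a separate document, so no chunk spans two pages.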

## Embedding and vector storage

In [10]:
from langchain_openai.embeddings import OpenAIEmbeddings

embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")

In [11]:
from langchain_community.vectorstores import Qdrant

qdrant_vectorstore = Qdrant.from_documents(
    split_chunks,
    embedding_model,
    location=":memory:",
    collection_name="Meta 10-k Fillings",
)

In [12]:
qdrant_retriever = qdrant_vectorstore.as_retriever()
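`as_retriever()` wraps the vector store in a retriever. With LangChain's defaults, that retriever returns the 4 chunks whose embeddings are closest to the query's embedding. The following sketch illustrates the top-k step using a toy bag-of-words embedding and cosine similarity. The `embed` function is a hypothetical stand-in for `text-embedding-3-small`, and the in-memory Qdrant collection runs an analogous nearest-neighbour search over real embedding vectors.

```python
import math
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Dict[str, float]:
    """Toy embedding (hypothetical): bag-of-words counts of lowercased words."""
    return dict(Counter(text.lower().split()))


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: List[str], k: int = 4) -> List[str]:
    """Return the k chunks most similar to the query (highest cosine first)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


toy_chunks = [
    "cash and cash equivalents were 41,862 million",
    "our board of directors oversees risk",
    "we offer competitive compensation",
    "principal executive offices are in menlo park",
    "marketable securities and cash",
]
print(top_k("cash and cash equivalents", toy_chunks, k=2))
```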

## RAG Prompt

In [13]:
from langchain_core.prompts import ChatPromptTemplate

In [14]:
RAG_PROMPT = """
CONTEXT:
{context}

QUERY:
{question}

Answer the query if the context is related to it; otherwise, answer: 'Sorry, the context is unrelated to the query, I can't answer.'
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_PROMPT)
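The chain below passes the retrieved `Document` objects to `{context}` as a list, so the prompt receives their Python representation, metadata included. The sketch shows the same template filled with plain `str.format`. Here a hypothetical `format_docs` helper joins only the chunk texts. The helper and the placeholder chunk strings are illustrative assumptions and are not part of the pipeline.

```python
from typing import List

# Copy of the RAG_PROMPT template text defined above.
TEMPLATE = """
CONTEXT:
{context}

QUERY:
{question}

Answer the query if the context is related to it; otherwise, answer: 'Sorry, the context is unrelated to the query, I can't answer.'
"""


def format_docs(page_contents: List[str]) -> str:
    """Hypothetical helper: join chunk texts with a blank line between them."""
    return "\n\n".join(page_contents)


filled = TEMPLATE.format(
    context=format_docs(["first retrieved chunk", "second retrieved chunk"]),
    question="What was the total value of 'Cash and cash equivalents'?",
)
print(filled)
```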

## RAG Chain

In [15]:
from operator import itemgetter
from langchain_core.runnables import RunnablePassthrough

retrieval_augmented_qa_chain = (

    {"context": itemgetter("question") | qdrant_retriever, "question": itemgetter("question")}

    | RunnablePassthrough.assign(context=itemgetter("context"))

    | {"response": rag_prompt | openai_chat_model, "context": itemgetter("context")}
)
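The chain composes three steps. The plain-Python emulation below uses stand-in functions (`fake_retriever` and `fake_prompt_and_model` are hypothetical) to make the data flow visible. It also shows that the middle `RunnablePassthrough.assign(context=itemgetter("context"))` step sets `context` to its own value, which means it leaves the dict unchanged.

```python
from operator import itemgetter
from typing import Any, Dict, List


def fake_retriever(question: str) -> List[str]:
    """Hypothetical stand-in for qdrant_retriever."""
    return [f"doc about: {question}"]


def fake_prompt_and_model(inputs: Dict[str, Any]) -> str:
    """Hypothetical stand-in for rag_prompt | openai_chat_model."""
    return f"answer using {inputs['context']} for {inputs['question']!r}"


def run_chain(inputs: Dict[str, Any]) -> Dict[str, Any]:
    # Step 1: the first dict runs both branches on the same input.
    step1 = {
        "context": fake_retriever(itemgetter("question")(inputs)),
        "question": itemgetter("question")(inputs),
    }
    # Step 2: RunnablePassthrough.assign(context=itemgetter("context"))
    # sets "context" to its own value, so the dict is unchanged.
    step2 = {**step1, "context": itemgetter("context")(step1)}
    # Step 3: the last dict sends the whole step-2 dict to the prompt+model
    # branch and also passes the retrieved context through.
    return {
        "response": fake_prompt_and_model(step2),
        "context": itemgetter("context")(step2),
    }


demo_result = run_chain({"question": "What is X?"})
print(demo_result["context"])  # → ['doc about: What is X?']
```

The context is kept in the output so the retrieved chunks can be inspected (as in the cells below) and later passed to RAGAS.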

## Response generation

In [16]:
response_1 = retrieval_augmented_qa_chain.invoke({"question" : "What was the total value of 'Cash and cash equivalents' as of December 31, 2023?"})
response_1["response"].content

"The total value of 'Cash and cash equivalents' as of December 31, 2023, was $41,862."

In [17]:
response_2 = retrieval_augmented_qa_chain.invoke({"question" : "Who are Meta's 'Directors' (i.e., members of the Board of Directors)?"})
response_2["response"].content

"The context provided does not mention the specific names of Meta's Directors (members of the Board of Directors). Therefore, I cannot answer the query based on the given information. Sorry, the context is unrelated to the query, I can't answer."

In [18]:
response_2["context"]

[Document(page_content='to having a skilled, inclusive and diverse workforce because we believe cognitive diversity fuels innovation. To aid in this effort, we have taken steps to reduce\nbias from our hiring processes and performance management systems, as well as offering learning and development courses for our employees.\nCorporate Information\nWe were incorporated in Delaware in July 2004. We completed our initial public offering in May 2012 and our Class\xa0A common stock is currently listed\non the Nasdaq Global Select Market under the symbol "META." Our principal executive offices are located at 1 Meta Way, Menlo Park, California 94025, and\nour telephone number is (650) 543-4800.\nMeta, the Meta logo, Meta Quest, Meta Horizon, Facebook, FB, Instagram, Oculus, WhatsApp, Reels, and our other registered or common law', metadata={'source': 'https://d18rn0p25nwr6d.cloudfront.net/CIK-0001326801/c7318154-f6ae-4866-89fa-f0c589f2ee3d.pdf', 'file_path': 'https://d18rn0p25nwr6d.cloudfron

## First pipeline results analysis

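- The answer to the first question is correct, although it omits the unit (the figure is in millions of dollars).
- The answer to the second question is wrong: the model says the retrieved context does not name the directors. The retrieved chunks (such as the "Corporate Information" passage above) are not the ones needed to answer, so the pipeline fails at the retrieval step.
- I will therefore have to improve the context retrieval part of the pipeline.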

## Upgrading the chunking strategy
- Since we're dealing with a large PDF that includes tables, let's try a larger chunk size: 800 tokens with a 100-token overlap, instead of 200 and 50.

In [19]:
text_splitter_2 = RecursiveCharacterTextSplitter(
    chunk_size = 800,
    chunk_overlap = 100,
    length_function = tiktoken_len,
)

split_chunks_2 = text_splitter_2.split_documents(docs)

In [20]:
len(split_chunks_2)

220
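One rough way to compare the two chunking configurations is to look at the maximum amount of retrieved text per question. The calculation below assumes the retriever's default of 4 chunks (k = 4) and ignores overlap, which gives an upper bound of k × chunk_size tokens:

```python
# Assumption: as_retriever() returns K = 4 chunks (LangChain's default).
K = 4

configs = {
    "pipeline 1": {"chunk_size": 200, "chunk_overlap": 50},
    "pipeline 2": {"chunk_size": 800, "chunk_overlap": 100},
}

# Each chunk holds at most chunk_size tokens, so the retrieved context
# per question is bounded by K * chunk_size tokens.
budgets = {name: K * cfg["chunk_size"] for name, cfg in configs.items()}
for name, tokens in budgets.items():
    print(f"{name}: up to {tokens} tokens of retrieved context")
```

Larger chunks give the model four times as much surrounding text. However, each embedding now summarizes more content, which can make a narrow query match less precisely.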

In [21]:
qdrant_vectorstore_2 = Qdrant.from_documents(
    split_chunks_2,
    embedding_model,
    location=":memory:",
    collection_name="Meta 10-k Fillings 2",
)

In [22]:
qdrant_retriever_2 = qdrant_vectorstore_2.as_retriever()

In [23]:
retrieval_augmented_qa_chain_2 = (

    {"context": itemgetter("question") | qdrant_retriever_2, "question": itemgetter("question")}

    | RunnablePassthrough.assign(context=itemgetter("context"))

    | {"response": rag_prompt | openai_chat_model, "context": itemgetter("context")}
)

In [24]:
response_1b = retrieval_augmented_qa_chain_2.invoke({"question" : "What was the total value of 'Cash and cash equivalents' as of December 31, 2023?"})
response_1b["response"].content

"The total value of 'Cash and cash equivalents' as of December 31, 2023, was $41,862 million."

In [25]:
response_2b = retrieval_augmented_qa_chain_2.invoke({"question" : "Who are Meta's 'Directors' (i.e., members of the Board of Directors)?"})
response_2b["response"].content

"Sorry, the context is unrelated to the query, I can't answer."

In [26]:
response_2b["context"]

[Document(page_content='Table of Contents\nCompensation, Benefits, Health, and Well-being\nWe offer competitive compensation to attract and retain the best people, and we help care for our people so they can focus on our mission. Our\nemployees\' total compensation package includes market-competitive salary, bonuses or sales incentives, and equity. We generally offer full-time employees\nequity at the time of hire and through annual equity grants because we want them to be owners of the company and committed to our long-term success. We\nhave conducted pay equity analyses for many years, and continue to be committed to pay equity. For example, in July 2023, we announced that our analyses\nconfirm that we continue to have pay equity across genders globally and by race in the United States for people in similar jobs, accounting for factors such as\nlocation, role, and level.\nThrough Life@ Meta, our holistic approach to benefits, we continue to provide our employees and their dependents 

## Second pipeline results analysis
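- The second pipeline gives essentially the same results as the first one. The first answer is still correct (and now states the unit), but the second question still gets the fallback answer. Despite the larger chunk size, retrieval still does not return the passage that names the directors.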

## Upgrading the retrieval strategy
- Keeping the chunking strategy of the first pipeline (200-token chunks), I will try to improve the retrieval strategy by wrapping the original retriever in a MultiQueryRetriever.

In [27]:
from langchain.retrievers import MultiQueryRetriever

multiquery_retriever = MultiQueryRetriever.from_llm(retriever=qdrant_retriever, llm=openai_chat_model)
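`MultiQueryRetriever` asks the LLM to rewrite the question into several variants (three in LangChain's default prompt). It then runs the base retriever for each variant and returns the unique union of the results. The minimal sketch below follows that logic, with hypothetical stand-ins (`rephrase` and `INDEX`) taking the place of the LLM and the Qdrant retriever.

```python
from typing import Callable, Dict, List


def multi_query_retrieve(
    question: str,
    generate_queries: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
) -> List[str]:
    """Retrieve for every generated query; return the de-duplicated union
    of the results, keeping first-seen order."""
    seen = set()
    union: List[str] = []
    for query in generate_queries(question):
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                union.append(doc)
    return union


# Hypothetical stand-in for the LLM that rewrites the question.
def rephrase(question: str) -> List[str]:
    return [
        "Who sits on Meta's board of directors?",
        "List the directors who signed Meta's 10-K.",
        "Names of Meta board members",
    ]


# Hypothetical stand-in for the base retriever: query -> retrieved chunks.
INDEX: Dict[str, List[str]] = {
    "Who sits on Meta's board of directors?": ["governance chunk"],
    "List the directors who signed Meta's 10-K.": ["signatures chunk", "governance chunk"],
    "Names of Meta board members": ["signatures chunk"],
}

retrieved = multi_query_retrieve("Who are Meta's Directors?", rephrase, lambda q: INDEX.get(q, []))
print(retrieved)  # → ['governance chunk', 'signatures chunk']
```

Each of the three queries retrieves up to 4 chunks, so the union can hold up to 12 distinct chunks instead of 4. This gives a question like the second one more chances to hit the relevant passage.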

In [28]:
retrieval_augmented_qa_chain_3 = (

    {"context": itemgetter("question") | multiquery_retriever, "question": itemgetter("question")}

    | RunnablePassthrough.assign(context=itemgetter("context"))

    | {"response": rag_prompt | openai_chat_model, "context": itemgetter("context")}
)

In [29]:
response_1c = retrieval_augmented_qa_chain_3.invoke({"question" : "What was the total value of 'Cash and cash equivalents' as of December 31, 2023?"})
response_1c["response"].content

"The total value of 'Cash and cash equivalents' as of December 31, 2023, was $41,862."

In [30]:
response_2c = retrieval_augmented_qa_chain_3.invoke({"question" : "Who are Meta's 'Directors' (i.e., members of the Board of Directors)?"})
response_2c["response"].content

"The members of Meta's Board of Directors are:\n1. Mark Zuckerberg\n2. Susan Li\n3. Aaron Anderson\n4. Peggy Alford\n5. Marc L. Andreessen\n6. Andrew W. Houston\n7. Nancy Killefer\n8. Robert M. Kimmitt"

In [31]:
response_2c["context"]

[Document(page_content='to having a skilled, inclusive and diverse workforce because we believe cognitive diversity fuels innovation. To aid in this effort, we have taken steps to reduce\nbias from our hiring processes and performance management systems, as well as offering learning and development courses for our employees.\nCorporate Information\nWe were incorporated in Delaware in July 2004. We completed our initial public offering in May 2012 and our Class\xa0A common stock is currently listed\non the Nasdaq Global Select Market under the symbol "META." Our principal executive offices are located at 1 Meta Way, Menlo Park, California 94025, and\nour telephone number is (650) 543-4800.\nMeta, the Meta logo, Meta Quest, Meta Horizon, Facebook, FB, Instagram, Oculus, WhatsApp, Reels, and our other registered or common law', metadata={'source': 'https://d18rn0p25nwr6d.cloudfront.net/CIK-0001326801/c7318154-f6ae-4866-89fa-f0c589f2ee3d.pdf', 'file_path': 'https://d18rn0p25nwr6d.cloudfron

## Third pipeline results analysis
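- The third pipeline answers both questions: the upgraded retrieval strategy brings in context that names Meta's directors. The second answer still needs a manual check, though. Susan Li and Aaron Anderson sign the 10-K as Chief Financial Officer and Chief Accounting Officer, not as directors.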

# Implementing RAGAS

The objective is to use RAGAS to get a quantitative evaluation of the 1st and 3rd pipelines.

Since the only difference between the two pipelines is the retriever (the third pipeline wraps the same Qdrant retriever in a MultiQueryRetriever), I will use RAGAS to evaluate the retrieval strategy.

In [32]:
!pip install -qU ragas

In [33]:
eval_documents = docs

text_splitter_eval = RecursiveCharacterTextSplitter(
    chunk_size = 600,
    chunk_overlap = 80
)

eval_documents = text_splitter_eval.split_documents(eval_documents)

In [34]:
#!!! DON'T RUN THIS CELL !!! The test set is already generated and saved in the repo.

# from ragas.testset.generator import TestsetGenerator
# from ragas.testset.evolutions import simple, reasoning, multi_context
# from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
# critic_llm = ChatOpenAI(model="gpt-3.5-turbo")  # <--- use this instead if you don't have GPT-4 access, or run into rate-limit or `nan` issues.
# critic_llm = ChatOpenAI(model="gpt-4-turbo-preview")
# embeddings = OpenAIEmbeddings()

# generator = TestsetGenerator.from_langchain(
#     generator_llm,
#     critic_llm,
#     embeddings
# )

# distributions = {
#     simple: 0.4,
#     multi_context: 0.4,
#     reasoning: 0.2
# }

# testset = generator.generate_with_langchain_docs(eval_documents, 10, distributions, is_async = False)

Filename and doc_id are the same for all nodes.                     
Generating: 100%|██████████| 10/10 [03:11<00:00, 19.10s/it]


In [51]:
import pandas as pd

test_df = pd.read_csv("test_df.csv")
test_df

| | question | contexts | ground_truth | evolution_type | metadata | episode_done |
|---|---|---|---|---|---|---|
| 0 | How were the financial results of the acquired... | ['included the financial results of these acqu... | The financial results of the acquired business... | simple | [{'source': 'https://d18rn0p25nwr6d.cloudfront... | True |
| 1 | How do legislative and regulatory developments... | ['Table of Contents\nconsiderable uncertainty ... | Legislative and regulatory developments can le... | simple | [{'source': 'https://d18rn0p25nwr6d.cloudfront... | True |
| 2 | How might the issuance of significant equity a... | ['awards or by future arrangements may not be ... | If we issue significant equity to attract addi... | simple | [{'source': 'https://d18rn0p25nwr6d.cloudfront... | True |
| 3 | What were the net income amounts for META PLAT... | ['Table of Contents\nMETA PLATFORMS, INC.\nCON... | The net income amounts for META PLATFORMS, INC... | simple | [{'source': 'https://d18rn0p25nwr6d.cloudfront... | True |
| 4 | What benefits come from having a sales presenc... | ['is focused on attracting and retaining adver... | The benefits of having a sales presence in aro... | multi_context | [{'source': 'https://d18rn0p25nwr6d.cloudfront... | True |
| 5 | How does negative publicity affect user base s... | ['internal company documents by a former emplo... | Negative publicity can have an adverse effect ... | multi_context | [{'source': 'https://d18rn0p25nwr6d.cloudfront... | True |
| 6 | How could disruptions in logistics and the sup... | ['single region such as Asia. We have experien... | Disruptions in logistics and the supply chain ... | multi_context | [{'source': 'https://d18rn0p25nwr6d.cloudfront... | True |
| 7 | How do algorithms estimate the number of uniqu... | ['algorithms, and machine learning models that... | Algorithms estimate the number of unique indiv... | multi_context | [{'source': 'https://d18rn0p25nwr6d.cloudfront... | True |
| 8 | What are the potential consequences of negativ... | ['policies applicable to developers, have adve... | If we are not successful in our efforts to mai... | reasoning | [{'source': 'https://d18rn0p25nwr6d.cloudfront... | True |
| 9 | What are the potential impacts of penalties on... | ['personnel) or penalties (including substanti... | Penalties on a business, such as monetary reme... | reasoning | [{'source': 'https://d18rn0p25nwr6d.cloudfront... | True |


In [52]:
test_questions = test_df["question"].values.tolist()
test_groundtruths = test_df["ground_truth"].values.tolist()

In [53]:
# Pipeline 1 - Generating answers and contexts for the test questions
answers = []
contexts = []

for question in test_questions:
    response = retrieval_augmented_qa_chain.invoke({"question" : question})
    answers.append(response["response"].content)
    contexts.append([context.page_content for context in response["context"]])
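The loop above and the one for pipeline 3 differ only in which chain they call. A hypothetical `collect_answers` helper could factor out that shared logic. The demo below uses a stub chain (`EchoChain`, also hypothetical) so that it runs without an API key. In the notebook, `answers, contexts = collect_answers(retrieval_augmented_qa_chain_3, test_questions)` would replace the second loop.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Protocol, Tuple


class SupportsInvoke(Protocol):
    def invoke(self, inputs: Dict[str, Any]) -> Dict[str, Any]: ...


def collect_answers(chain: SupportsInvoke, questions: List[str]) -> Tuple[List[str], List[List[str]]]:
    """Run the chain on each question; return the answers and, per question,
    the page contents of the retrieved documents."""
    answers: List[str] = []
    contexts: List[List[str]] = []
    for question in questions:
        response = chain.invoke({"question": question})
        answers.append(response["response"].content)
        contexts.append([doc.page_content for doc in response["context"]])
    return answers, contexts


class EchoChain:
    """Hypothetical stub that mimics the chain's output shape."""

    def invoke(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        q = inputs["question"]
        return {
            "response": SimpleNamespace(content=f"answer to {q}"),
            "context": [SimpleNamespace(page_content=f"chunk for {q}")],
        }


demo_answers, demo_contexts = collect_answers(EchoChain(), ["q1", "q2"])
print(demo_answers)  # → ['answer to q1', 'answer to q2']
```

LCEL runnables also expose `.batch()`, which can run several inputs concurrently if the sequential loop is too slow.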

In [54]:
from datasets import Dataset

response_dataset = Dataset.from_dict({
    "question" : test_questions,
    "answer" : answers,
    "contexts" : contexts,
    "ground_truth" : test_groundtruths
})

In [55]:
response_dataset[0]

{'question': 'How were the financial results of the acquired businesses included in the consolidated financial statements?',
 'answer': 'The financial results of the acquired businesses were included in the consolidated financial statements from their respective dates of acquisition.',
 'contexts': ['Acquisition of businesses in accrued expenses and other current liabilities and other\nliabilities\n$\n119\xa0\n$\n291\xa0\n$\n73\xa0\nOther current assets through financing arrangement in accrued expenses and other\ncurrent liabilities\n$\n15\xa0\n$\n16\xa0\n$\n508\xa0\nRepurchases of Class A common stock in accrued expenses and other current liabilities\n$\n474\xa0\n$\n310\xa0\n$\n340\xa0\nSee Accompanying Notes to Consolidated Financial Statements.\n94',
  'accompanying notes to the consolidated financial statements to the extent material.\nBusiness Combinations\nWe allocate the fair value of purchase consideration to the tangible assets acquired, liabilities assumed and intangible asse

In [56]:
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    answer_correctness,
    context_recall,
    context_precision,
)

metrics = [
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
    answer_correctness,
]
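In RAGAS, an LLM judge produces verdicts for each sample, and the metrics are computed from those verdicts. The sketch below shows only that arithmetic, assuming the verdicts are already given as booleans. It follows the formulas as described in the RAGAS documentation. Context precision is the average of precision@k over the ranks that hold a relevant chunk, so relevant chunks ranked higher score better. Faithfulness and context recall are fractions: the share of answer claims supported by the context, and the share of ground-truth sentences attributable to the context, respectively. `answer_relevancy` and `answer_correctness` also involve embedding similarity and are not sketched here.

```python
from typing import List


def context_precision(relevant: List[bool]) -> float:
    """Mean of precision@k over the ranks k that hold a relevant chunk.

    relevant[i] is the (LLM-judged) verdict for the chunk at rank i + 1.
    """
    hits = 0
    total = 0.0
    for k, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            total += hits / k  # precision@k, counted only where v_k = 1
    return total / hits if hits else 0.0


def supported_fraction(verdicts: List[bool]) -> float:
    """Shape of faithfulness / context recall: share of items judged supported."""
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


print(context_precision([True, False, True, False]))  # (1/1 + 2/3) / 2
print(supported_fraction([True, True, True, False]))  # → 0.75
```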

In [57]:
baseline_results = evaluate(response_dataset, metrics)
baseline_results

Evaluating: 100%|██████████| 50/50 [00:19<00:00,  2.53it/s]


{'faithfulness': 1.0000, 'answer_relevancy': 0.9549, 'context_recall': 0.9750, 'context_precision': 0.9222, 'answer_correctness': 0.7871}

In [58]:
# Pipeline 3 - Generating answers and contexts for the test questions

answers = []
contexts = []

for question in test_questions:
    response = retrieval_augmented_qa_chain_3.invoke({"question" : question})
    answers.append(response["response"].content)
    contexts.append([context.page_content for context in response["context"]])

In [59]:
multiquery_response_dataset = Dataset.from_dict({
    "question" : test_questions,
    "answer" : answers,
    "contexts" : contexts,
    "ground_truth" : test_groundtruths
})

In [60]:
multiquery_results = evaluate(multiquery_response_dataset, metrics)
multiquery_results

Evaluating: 100%|██████████| 50/50 [00:21<00:00,  2.31it/s]


{'faithfulness': 0.9714, 'answer_relevancy': 0.9478, 'context_recall': 0.9500, 'context_precision': 0.9600, 'answer_correctness': 0.7249}

In [61]:
baseline_results_df = baseline_results.to_pandas()
multiquery_results_df = multiquery_results.to_pandas()

In [62]:
df_original = pd.DataFrame(list(baseline_results.items()), columns=['Metric', 'Baseline'])
df_comparison = pd.DataFrame(list(multiquery_results.items()), columns=['Metric', 'MultiQueryRetriever'])

df_merged = pd.merge(df_original, df_comparison, on='Metric')

df_merged['Delta'] = df_merged['MultiQueryRetriever'] - df_merged['Baseline']

df_merged

| | Metric | Baseline | MultiQueryRetriever | Delta |
|---|---|---|---|---|
| 0 | faithfulness | 1.0 | 0.971429 | -0.028571 |
| 1 | answer_relevancy | 0.954854 | 0.947837 | -0.007017 |
| 2 | context_recall | 0.975 | 0.95 | -0.025 |
| 3 | context_precision | 0.922222 | 0.96 | 0.037778 |
| 4 | answer_correctness | 0.787098 | 0.724884 | -0.062214 |


# RAGAS results analysis
- The RAGAS evaluation shows that both the first and third pipelines perform well, with every metric above 0.72. The first pipeline even scores higher than the MultiQueryRetriever one on four of the five metrics. The exception is context_precision, where the MultiQueryRetriever pipeline leads by only about 4 points (0.960 vs 0.922).

- The pipelines were built to answer these two questions:
    1. "What was the total value of 'Cash and cash equivalents' as of December 31, 2023?"
    2. "Who are Meta's 'Directors' (i.e., members of the Board of Directors)?"

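The first question is easy, and the first pipeline answers it. The second question is harder and needs a better retrieval strategy, which the third pipeline provides. The first pipeline never answered it.

This likely illustrates a limitation of RAGAS as used here. None of the 10 synthetic test questions resembles the second one, so the aggregate scores do not reflect the retrieval failure that motivated the MultiQueryRetriever.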
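The small size of the test set makes this limitation concrete. Every aggregate score is a mean over the 10 synthetic questions, and each per-question score lies between 0 and 1. A change d in the mean therefore corresponds to a total change of 10 × d across all questions. The check below applies that reasoning to the deltas in the comparison table:

```python
N_QUESTIONS = 10  # size of the synthetic RAGAS test set above

# Deltas (MultiQueryRetriever - Baseline) from the comparison table.
deltas = {
    "faithfulness": -0.028571,
    "answer_relevancy": -0.007017,
    "context_recall": -0.025,
    "context_precision": 0.037778,
    "answer_correctness": -0.062214,
}

# A mean over N questions moves by d only if the per-question scores move
# by N * d in total; one question can contribute at most 1.0 of that.
total_shift = {m: round(N_QUESTIONS * d, 3) for m, d in deltas.items()}
print(total_shift)
```

Every total shift is below 1, so each difference between the two pipelines could, in principle, come from a single question.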