# Evaluation with RAGAS and Advanced Retrieval Methods Using LangChain

In the following notebook we'll discuss a major component of LLM Ops:

- Evaluation

We're going to be leveraging the RAGAS framework for our evaluations today, as it's becoming a standard method of evaluating RAG systems (at least directionally).

We're also going to discuss a few more powerful Retrieval Systems that can potentially improve the quality of our generations!

Let's start as we always do: Grabbing our dependencies!

In [9]:
import logging
import os

import openai
from elasticsearch import Elasticsearch
from langchain.embeddings import OpenAIEmbeddings
from langchain.schema import Document
from langchain.text_splitter import MarkdownHeaderTextSplitter, RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores.faiss import FAISS
from langchain_elasticsearch import ElasticsearchStore
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from werkzeug.utils import secure_filename



# Read the Elastic Cloud credentials from the environment instead of hard-coding them in the notebook.
# The variable names ES_CLOUD_ID and ES_API_KEY are a convention chosen here; use whatever your setup exports.
es_cloud_id = os.environ["ES_CLOUD_ID"]
es_api_key = os.environ["ES_API_KEY"]
client = Elasticsearch(cloud_id=es_cloud_id, api_key=es_api_key)

In [79]:
#loader
loader = PyPDFLoader("env.pdf")
docs = loader.load()
len(docs)



4

In [80]:
for doc in docs:
  print(doc.metadata)

{'source': 'env.pdf', 'page': 0}
{'source': 'env.pdf', 'page': 1}
{'source': 'env.pdf', 'page': 2}
{'source': 'env.pdf', 'page': 3}


### Creating an Index

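Let's use a simple index creation strategy: split each page on Markdown headers with `MarkdownHeaderTextSplitter`, break the resulting sections into chunks with `RecursiveCharacterTextSplitter`, and embed each chunk into our vector store (an `ElasticsearchStore` in the code below) using `OpenAIEmbeddings()`.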

- [`RecursiveCharacterTextSplitter()`](https://api.python.langchain.com/en/latest/text_splitter/langchain.text_splitter.RecursiveCharacterTextSplitter.html)
- [`Chroma`](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.chroma.Chroma.html?highlight=chroma#langchain.vectorstores.chroma.Chroma)
- [`OpenAIEmbeddings()`](https://api.python.langchain.com/en/latest/embeddings/langchain.embeddings.openai.OpenAIEmbeddings.html?highlight=openaiembeddings#langchain-embeddings-openai-openaiembeddings)
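
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal pure-Python sketch of fixed-width chunking with overlap. It is a simplification and not what `RecursiveCharacterTextSplitter` does internally (the real splitter prefers to break on separators such as blank lines and spaces); the function name `sliding_window_chunks` is hypothetical and only used for illustration.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters, so text near a
    boundary appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
```

A larger overlap repeats more text across chunks (more embeddings to store) but lowers the chance that a relevant passage is cut in half at a chunk boundary.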

In [81]:
# Read the OpenAI key from the environment instead of hard-coding it (OPENAI_API_KEY is the conventional variable name).
openai_api_key = os.environ["OPENAI_API_KEY"]

def get_hierarchical_chunks(pages):
    headers_to_split_on = [
        ("#", "Header 1"),
        ("##", "Header 2"),
        ("###", "Header 3"),
    ]
    markdown_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
    
    final_chunks = []
    for page in pages:
        md_header_splits = markdown_splitter.split_text(page.page_content)
        
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=2000,
            chunk_overlap=400,
            length_function=len,
            separators=["\n\n", "\n", " ", ""]
        )

        for doc in md_header_splits:
            smaller_chunks = text_splitter.split_text(doc.page_content)
            for chunk in smaller_chunks:
                final_chunks.append(Document(
                    page_content=chunk,
                    metadata={
                        "source": page.metadata.get("source", ""),
                        "page": page.metadata.get("page", ""),
                        "header": " > ".join([doc.metadata.get(f"Header {i}", "") for i in range(1, 4) if f"Header {i}" in doc.metadata])
                    }
                ))

    return final_chunks

def create_index_with_mapping(index_name):
    mapping = {
        "mappings": {
            "properties": {
                "vector": {
                    "type": "dense_vector",
                    "dims": 1536 
                },
                "content": {
                    "type": "text"
                },
                "keyword_content": {
                    "type": "keyword"
                }
            }
        }
    }
    # Create the index with the specified mapping, dropping any existing index of the same name.
    # `client` is the Elasticsearch client created at the top of the notebook.
    if client.indices.exists(index=index_name):
        client.indices.delete(index=index_name)
    client.indices.create(index=index_name, body=mapping)
    logging.info(f"Index '{index_name}' with custom mapping created successfully.")

def get_text_chunks(pages, user_session):
    # Assuming `pages` is a list of Document objects, each representing a page of the document
    all_chunks = []

    # Iterate over each page and apply hierarchical chunking
    full_text = ""
    for page in pages:
        # Get hierarchical chunks
        hierarchical_chunks = get_hierarchical_chunks([page])
        all_chunks.extend(hierarchical_chunks)
        full_text += page.page_content
    
    if not os.path.exists(user_session):
        os.makedirs(user_session)
    filename = f'{user_session}/content.txt'
    with open(filename, "w") as file:
        file.write(full_text)
    
    return all_chunks

docss = get_text_chunks(docs, 'surya')

def elastic_store(docs, user_session):
    create_index_with_mapping(user_session)
    db = ElasticsearchStore.from_documents(
        docs,
        es_cloud_id=es_cloud_id,
        index_name=user_session,
        es_api_key=es_api_key,
    )

def get_vector_store(text_chunks, usersession):
    try:
        logging.info('creating vector store')
        embeddings = OpenAIEmbeddings(api_key=openai_api_key)
        logging.info('embedding model chosen')
        vector_store = ElasticsearchStore(
            index_name=str(usersession),
            embedding=embeddings,
            es_cloud_id=es_cloud_id,
            es_api_key=es_api_key,
            vector_query_field="vector",
        )
        vector_store.add_documents(text_chunks)
    except Exception:
        # Log the full traceback before re-raising so failures are visible
        logging.exception('failed to build the vector store')
        raise
        
def store_vector(raw_text, user_session):
    text_chunks = get_text_chunks(raw_text, user_session)
    logging.info('text converted to chunks')

    # Reset the Elasticsearch index if it already exists.
    # If it does not exist yet, ElasticsearchStore creates it when documents are added.
    if client.indices.exists(index=user_session):
        client.indices.delete(index=user_session)
        logging.info(f"Index '{user_session}' deleted.")
        client.indices.create(index=user_session)
        logging.info(f"Index '{user_session}' created successfully.")
    # elastic_store(text_chunks, user_session)
    get_vector_store(text_chunks, user_session)
    logging.info("Chunks stored to Elastic Search")

 

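# store_vector chunks its input itself, so pass the raw pages rather than the already-chunked `docss`
store_vector(docs, 'surya')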

In [82]:
len(docss)

7

In [83]:
print(max([len(chunk.page_content) for chunk in docss]))

1991


Let's convert our `ElasticsearchStore` vector store into a retriever with the `.as_retriever()` method, and combine it with a keyword retriever over the same index.

In [84]:
from langchain_elasticsearch.retrievers import ElasticsearchRetriever
from langchain.retrievers import EnsembleRetriever
from elasticsearch import Elasticsearch
from langchain.chains.question_answering import load_qa_chain
from langchain.prompts import PromptTemplate
from langchain_community.chat_models import ChatOllama 
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores.faiss import FAISS
from langchain.retrievers.multi_query import MultiQueryRetriever
import json
import requests
import logging
import time
from langchain_google_genai.chat_models import ChatGoogleGenerativeAI

In [85]:
def keyword_body_func(query):
    return {
        "query": {
            "match": {
                "text": query
            }
        },
        "_source": {
            "includes": ["text"]
        }
    }

In [86]:
es_weight = 0.5
vector_weight = 0.6
weights = [es_weight,vector_weight]
keyret = ElasticsearchRetriever(es_client=client, index_name="surya", body_func=keyword_body_func, content_field="text")
embeddings=OpenAIEmbeddings(api_key=openai_api_key)
vdb = ElasticsearchStore(
        es_cloud_id=es_cloud_id,
        es_api_key=es_api_key,
        index_name="surya",
        embedding=embeddings,
    )
vector_ret = vdb.as_retriever()

ensemble_retriever = EnsembleRetriever(retrievers=[keyret, vector_ret], weights=weights)
# `invoke` replaces the deprecated `get_relevant_documents`
docs = ensemble_retriever.invoke("hello")
docs

[Document(metadata={'source': 'env.pdf', 'page': 2, 'header': ''}, page_content='Magsaysay awardee Sh. Rajender Singh  known for his water conservation efforts are some such contemporary\nfigures. Salim Ali  is a renowned ornithologist, famous for his work on Indian birds. In modern India, our late\nPrime Minister Mrs. Indira Gandhi was instrumental in introducing the concept of environmental protection in\nthe Constitution of India as a fundamental duty while Mrs. Maneka Gan dhi, formerly environment minister, has\nworked a lot for the cause of wildlife protection. Citizen’s report on environment  was first published by late\nSh. Anil Aggarwal , the founder Chairman of Centre for Science & Environment. Even with many such key\npersons leading the cause to environment, India is yet to achieve a lot in this field.'),
 Document(metadata={'source': 'env.pdf', 'page': 3, 'header': ''}, page_content='(d) Role of Government, Concept of Ecomark: In order to increase consumer awareness about e

Now to give it a test!

## Creating a Retrieval Augmented Generation Prompt

Now we can set up a prompt template that will be used to provide the LLM with the necessary contexts, user query, and instructions!

In [87]:
from langchain.prompts import ChatPromptTemplate

template = prompt_template = """
Answer the question as detailed as possible but only on the provided context.  Review the chat history carefully to provide all necessary details and avoid incorrect information. Treat synonyms or similar words as equivalent within the context. For example, if a question refers to "modules" or "units" instead of "chapters" or "doc" instead of "document" consider them the same. 
If the question is not related to the provided context, simply respond that the question is out of context and instead provide a summary of the document and example questions that the user can ask.
Do not make up an answer if the provided question is not within the context. Instead, provide example questions that the user can ask and summary of the document.
Do not repeat facts in the answer if you have already stated them. 
If the question is short, like it is asking for the dates, names or requires a very short answer then keep the response short and to the point. 
If the question asks for a particular keyword that is in context, state information related to that keyword. 
Example: Question : When was bill gates born?
Answer: According to the documents you have uploaded, bill gates was born in 1989. 
However, if the question requires a detailed answer, then consider all possibilities related to the question and try to answer in bullet points and clear and concise paragraphs. 
If asked to summarise the document, try to provide a basic summary of the entire context and cover it in bullet points but keep the answer concise and not too long. 
If the question mentions 'what are the contents of the file' or asks about the uploaded document or anything similar, just cover the entire document as a summary.
VERY IMPORTANT State the source in the end along with the answer but don't state the source if the question is out of context.
If the source is like : temp/abc.docx, then just mention the file name like abc.docx.
In the beginning of the answer, always mention the exact line in quotation marks that is being referred to in the answer and enclose it within **bold** tags.
Example : "According to the document in the line "The company was started in 1990", to answer your question(state the query in a shorter and concise manner), the company was founded in 1990."
Highlight Key Points: Enclose each identified key point within `**bold**` tags.
Highlight Keywords: Enclose each identified keyword within `*italic*` tags.
Documents:\n{context}\n
Question:\n{question}\n
Answer:
"""


prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])

### Setting Up our Basic QA Chain

Now we can instantiate our basic RAG chain!

We'll follow *exactly* the chain we made on Tuesday to keep things simple for now - if you need a refresher on what it looked like - check out last week's notebook!

In [88]:
from operator import itemgetter

from langchain.chat_models import ChatOpenAI
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnableLambda, RunnablePassthrough

model = ChatOllama(temperature=0.2, model="llama3.1:70b", top_p=0.5, top_k=10)

retrieval_augmented_qa_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into the ensemble_retriever
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    # "context"  : is assigned to a RunnablePassthrough object (will not be called or considered in the next step)
    #              by getting the value of the "context" key from the previous step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": prompt | model, "context": itemgetter("context")}
)

Let's test it out!

In [90]:
question = "whats the environment"

result = retrieval_augmented_qa_chain.invoke({"question" : question})

print(result)

{'response': AIMessage(content='The text does not provide a clear definition of "the environment". However, based on the context, it appears to refer to the natural world and the ecosystem that supports life on Earth. The text discusses various environmental issues, such as conservation, sustainability, and pollution, which suggests that the environment is being referred to in a broad sense, encompassing both the physical and biological components of the planet.', additional_kwargs={}, response_metadata={'model': 'llama3.1:70b', 'created_at': '2024-10-01T08:16:59.520399287Z', 'message': {'role': 'assistant', 'content': ''}, 'done_reason': 'stop', 'done': True, 'total_duration': 7427899296, 'load_duration': 46558995, 'prompt_eval_count': 1026, 'prompt_eval_duration': 1999729000, 'eval_count': 80, 'eval_duration': 5329826000}, id='run-176b9187-40df-4e30-9fc3-075029654f5c-0'), 'context': [Document(metadata={'_index': 'surya', '_id': '59f898e9-fc26-4fbf-ae0a-1c47725caee3', '_score': 0.2528

### Ground Truth Dataset Creation Using GPT-4o

The next section might take you a long time to run, so the evaluation dataset is provided.

The basic idea is that we can use LangChain to create questions based on our contexts, and then answer those questions.

Let's look at how that works in the code!
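
Because the generation prompts ask the model to format its output as JSON, every reply has to be parsed before it can be used, and a malformed reply has to be skipped. The sketch below illustrates that idea in plain Python. It is not the implementation of `StructuredOutputParser`, and `extract_json_reply` is a hypothetical helper introduced only for this example.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable


def extract_json_reply(reply: str, required_keys: list[str]) -> dict:
    """Pull a JSON object out of an LLM reply, with or without a Markdown fence."""
    pattern = FENCE + r"(?:json)?\s*(.*?)" + FENCE
    match = re.search(pattern, reply, flags=re.DOTALL)
    payload = match.group(1) if match else reply
    parsed = json.loads(payload)  # raises json.JSONDecodeError on malformed output
    if not isinstance(parsed, dict):
        raise ValueError("reply is valid JSON but not an object")
    missing = [key for key in required_keys if key not in parsed]
    if missing:
        raise ValueError(f"reply is missing keys: {missing}")
    return parsed


good_reply = FENCE + 'json\n{"question": "When was the Earth Summit held?"}\n' + FENCE
bad_reply = "Sure! Here is a question about the context."

print(extract_json_reply(good_reply, ["question"]))

try:
    extract_json_reply(bad_reply, ["question"])
except ValueError as err:  # json.JSONDecodeError is a subclass of ValueError
    print("skipped malformed reply:", type(err).__name__)
```

This is why the generation loops in this section wrap the parse step in `try`/`except`: a single unparseable reply should drop one example, not abort the whole dataset build.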

In [91]:
from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser

question_schema = ResponseSchema(
    name="question",
    description="a question about the context."
)

question_response_schemas = [
    question_schema,
]

In [92]:
question_output_parser = StructuredOutputParser.from_response_schemas(question_response_schemas)
format_instructions = question_output_parser.get_format_instructions()

In [93]:
question_generation_llm = ChatOpenAI(model="gpt-4o", api_key=openai_api_key)

bare_prompt_template = "{content}"
bare_template = ChatPromptTemplate.from_template(template=bare_prompt_template)

In [94]:
from langchain.prompts import ChatPromptTemplate

qa_template = """\
You are a University Professor creating a test for advanced students. For each context, create a question that is specific to the context. Avoid creating generic or general questions.

question: a question about the context.

Format the output as JSON with the following keys:
question

context: {context}
"""

prompt_template = ChatPromptTemplate.from_template(template=qa_template)

messages = prompt_template.format_messages(
    context=docs[0],
    format_instructions=format_instructions
)

question_generation_chain = bare_template | question_generation_llm

response = question_generation_chain.invoke({"content" : messages})
output_dict = question_output_parser.parse(response.content)

In [95]:
for k, v in output_dict.items():
  print(k)
  print(v)

question
What role did Mrs. Indira Gandhi play in incorporating environmental protection into the Constitution of India?
context
Magsaysay awardee Sh. Rajender Singh known for his water conservation efforts are some such contemporary figures. Salim Ali is a renowned ornithologist, famous for his work on Indian birds. In modern India, our late Prime Minister Mrs. Indira Gandhi was instrumental in introducing the concept of environmental protection in the Constitution of India as a fundamental duty while Mrs. Maneka Gandhi, formerly environment minister, has worked a lot for the cause of wildlife protection. Citizen’s report on environment was first published by late Sh. Anil Aggarwal, the founder Chairman of Centre for Science & Environment. Even with many such key persons leading the cause to environment, India is yet to achieve a lot in this field.
metadata
{'source': 'env.pdf', 'page': 2, 'header': ''}


In [96]:
!pip install -q -U tqdm

In [97]:
from tqdm import tqdm

qac_triples = []

for text in tqdm(docs[:10]):
  messages = prompt_template.format_messages(
      context=text,
      format_instructions=format_instructions
  )
  response = question_generation_chain.invoke({"content" : messages})
  try:
    output_dict = question_output_parser.parse(response.content)
  except Exception as e:
    continue
  output_dict["context"] = text
  qac_triples.append(output_dict)

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:12<00:00,  3.11s/it]


In [98]:
qac_triples[2]

{'question': 'What was the significance of the United Nations Conference on Environment and Development, held in Rio de Janeiro in 1992, and how did it contribute to international environmental awareness?',
 'context': Document(metadata={'source': 'env.pdf', 'page': 1, 'header': ''}, page_content='Environment Calendar\nNEED FOR PUBLIC AWARENESS\n(a) International Efforts for Environment\nEnvironmental issues received international attention about 35 years back in Stockholm Conference, held on 5th\nJune, 1972. Since then we celebrate World Environment Day on 5th June. At the United Nations Conference\non Environment and Development held at Ri o de Janeiro, in 1992, known popularly as Earth Summit, and ten\nyears later, the World Summit on Sustainable Development, held at Johannesburg in 2002, key issues of global\nenvironmental concern were highlighted. Attention of general public was drawn towards the deteriorating\nenvironmental conditions all over the world.')}

In [99]:
answer_generation_llm = ChatOpenAI(model="gpt-4o", api_key=openai_api_key)

answer_schema = ResponseSchema(
    name="answer",
    description="an answer to the question"
)

answer_response_schemas = [
    answer_schema,
]

answer_output_parser = StructuredOutputParser.from_response_schemas(answer_response_schemas)
format_instructions = answer_output_parser.get_format_instructions()

qa_template = """\
You are a University Professor creating a test for advanced students. For each question and context, create an answer.

answer: an answer to the question.

Format the output as JSON with the following keys:
answer

question: {question}
context: {context}
"""

prompt_template = ChatPromptTemplate.from_template(template=qa_template)

messages = prompt_template.format_messages(
    context=qac_triples[0]["context"],
    question=qac_triples[0]["question"],
    format_instructions=format_instructions
)

answer_generation_chain = bare_template | answer_generation_llm

response = answer_generation_chain.invoke({"content" : messages})
output_dict = answer_output_parser.parse(response.content)

In [100]:
for k, v in output_dict.items():
  print(k)
  print(v)

question
Discuss the contributions of Sh. Rajender Singh to water conservation efforts in India and how it has impacted contemporary environmental policies.
context
page_content='Magsaysay awardee Sh. Rajender Singh known for his water conservation efforts are some such contemporary figures. Salim Ali is a renowned ornithologist, famous for his work on Indian birds. In modern India, our late Prime Minister Mrs. Indira Gandhi was instrumental in introducing the concept of environmental protection in the Constitution of India as a fundamental duty while Mrs. Maneka Gandhi, formerly environment minister, has worked a lot for the cause of wildlife protection. Citizen’s report on environment was first published by late Sh. Anil Aggarwal, the founder Chairman of Centre for Science & Environment. Even with many such key persons leading the cause to environment, India is yet to achieve a lot in this field.' metadata={'source': 'env.pdf', 'page': 2, 'header': ''}
answer
Sh. Rajender Singh, a Ma

In [101]:
for triple in tqdm(qac_triples):
  messages = prompt_template.format_messages(
      context=triple["context"],
      question=triple["question"],
      format_instructions=format_instructions
  )
  response = answer_generation_chain.invoke({"content" : messages})
  try:
    output_dict = answer_output_parser.parse(response.content)
  except Exception as e:
    continue
  triple["answer"] = output_dict["answer"]

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:30<00:00,  7.55s/it]


In [102]:
!pip install -q -U datasets

In [103]:
import pandas as pd
from datasets import Dataset

ground_truth_qac_set = pd.DataFrame(qac_triples)
ground_truth_qac_set["context"] = ground_truth_qac_set["context"].map(lambda x: str(x.page_content))
ground_truth_qac_set = ground_truth_qac_set.rename(columns={"answer" : "ground_truth"})


eval_dataset = Dataset.from_pandas(ground_truth_qac_set)

In [104]:
eval_dataset

Dataset({
    features: ['question', 'context', 'ground_truth'],
    num_rows: 4
})

In [105]:
eval_dataset[0]

{'question': 'Discuss the contributions of Sh. Rajender Singh to water conservation efforts in India and how it has impacted contemporary environmental policies.',
 'context': 'Magsaysay awardee Sh. Rajender Singh  known for his water conservation efforts are some such contemporary\nfigures. Salim Ali  is a renowned ornithologist, famous for his work on Indian birds. In modern India, our late\nPrime Minister Mrs. Indira Gandhi was instrumental in introducing the concept of environmental protection in\nthe Constitution of India as a fundamental duty while Mrs. Maneka Gan dhi, formerly environment minister, has\nworked a lot for the cause of wildlife protection. Citizen’s report on environment  was first published by late\nSh. Anil Aggarwal , the founder Chairman of Centre for Science & Environment. Even with many such key\npersons leading the cause to environment, India is yet to achieve a lot in this field.',
 'ground_truth': "Sh. Rajender Singh, a Magsaysay awardee, is renowned for hi

In [106]:
eval_dataset.to_csv("groundtruth_eval_dataset.csv")

Creating CSV from Arrow format: 100%|██████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 177.97ba/s]


5799

### Evaluating RAG Pipelines

If you skipped ahead and need to load the `.csv` directly, run the cell below.

If you're using Colab to do this notebook - please ensure you add it to your session files.

In [107]:
from datasets import Dataset
eval_dataset = Dataset.from_csv("groundtruth_eval_dataset.csv")

Generating train split: 4 examples [00:00, 481.27 examples/s]


In [112]:
eval_dataset

Dataset({
    features: ['question', 'context', 'ground_truth'],
    num_rows: 4
})



### Evaluation Using RAGAS

Now we can evaluate using RAGAS!

The set-up is fairly straightforward - we simply need to create a dataset with our generated answers and our contexts, and then evaluate using the framework.

In [78]:
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
    answer_correctness,
    answer_similarity
)

from ragas.metrics.critique import harmfulness
from ragas import evaluate

def create_ragas_dataset(rag_pipeline, eval_dataset):
  rag_dataset = []
  for row in tqdm(eval_dataset):
    answer = rag_pipeline.invoke({"question" : row["question"]})
    rag_dataset.append(
        {"question" : row["question"],
         "answer" : answer["response"].content,
         "contexts" : [context.page_content for context in answer["context"]],
         "ground_truths" : [row["ground_truth"]]
         }
    )
  rag_df = pd.DataFrame(rag_dataset)
  rag_eval_dataset = Dataset.from_pandas(rag_df)
  return rag_eval_dataset

def evaluate_ragas_dataset(ragas_dataset):
  result = evaluate(
    ragas_dataset,
    metrics=[
        context_precision,
        faithfulness,
        answer_relevancy,
        context_recall,
        answer_correctness,
        answer_similarity
    ],
  )
  return result

ImportError: cannot import name 'get_cache_dir' from 'ragas.utils' (/home/cr/.local/lib/python3.12/site-packages/ragas/utils.py)

Let's create our dataset first:

In [61]:
from tqdm import tqdm
import pandas as pd

basic_qa_ragas_dataset = create_ragas_dataset(retrieval_augmented_qa_chain, eval_dataset)

NameError: name 'create_ragas_dataset' is not defined

In [None]:
basic_qa_ragas_dataset

In [None]:
basic_qa_ragas_dataset[0]

Save it for later:

In [None]:
basic_qa_ragas_dataset.to_csv("uniquery_ragas_dataset.csv")

And finally - evaluate how it did!

In [None]:
basic_qa_result = evaluate_ragas_dataset(basic_qa_ragas_dataset)

In [None]:
basic_qa_result

### Testing Other Retrievers

Now we can test how changing our retriever impacts our RAGAS evaluation!

We'll build this simple qa_chain factory to create standardized qa_chains where the only different component will be the retriever.

In [None]:
def create_qa_chain(retriever):
  primary_qa_llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0, api_key=openai_api_key)
  created_qa_chain = (
    {"context": itemgetter("question") | retriever,
     "question": itemgetter("question")
    }
    | RunnablePassthrough.assign(
        context=itemgetter("context")
      )
    | {
         "response": prompt | primary_qa_llm,
         "context": itemgetter("context"),
      }
  )

  return created_qa_chain

#### Parent Document Retriever

One of the easier ways we can imagine improving a retriever is to embed our documents into small chunks, and then retrieve a significant amount of additional context that "surrounds" the found context.

You can read more about this method [here](https://python.langchain.com/docs/modules/data_connection/retrievers/parent_document_retriever)!

The basic outline of this retrieval method is as follows:

1. Obtain User Question
2. Retrieve child documents using Dense Vector Retrieval
3. Merge the child documents based on their parents. If they have the same parents - they become merged.
4. Replace the child documents with their respective parent documents from an in-memory-store.
5. Use the parent documents to augment generation.
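
The steps above can be sketched in plain Python. This is a toy illustration under stated assumptions, not how `ParentDocumentRetriever` is implemented: a word-overlap count stands in for dense vector similarity, a dict stands in for the in-memory docstore, and all function names (`split_into_children`, `build_child_index`, `overlap_score`, `retrieve_parents`) are hypothetical.

```python
def split_into_children(text: str, words_per_child: int) -> list[str]:
    """Split a parent text into small children of at most words_per_child words."""
    words = text.split()
    return [" ".join(words[i:i + words_per_child]) for i in range(0, len(words), words_per_child)]


def build_child_index(parents: dict[str, str], words_per_child: int) -> list[tuple[str, str]]:
    """Return (parent_id, child_text) pairs so every child remembers its parent."""
    return [
        (parent_id, child)
        for parent_id, text in parents.items()
        for child in split_into_children(text, words_per_child)
    ]


def overlap_score(query: str, text: str) -> int:
    """Toy relevance score: how many distinct query words occur in the text."""
    text_words = set(text.lower().split())
    return sum(1 for word in set(query.lower().split()) if word in text_words)


def retrieve_parents(query: str, parents: dict[str, str],
                     child_index: list[tuple[str, str]], k_children: int = 3) -> list[str]:
    """Rank children, then return their de-duplicated parents in rank order."""
    ranked = sorted(child_index, key=lambda pair: overlap_score(query, pair[1]), reverse=True)
    parent_ids: list[str] = []
    for parent_id, child in ranked[:k_children]:
        # Children sharing a parent are merged: each parent is returned once
        if overlap_score(query, child) > 0 and parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parents[parent_id] for parent_id in parent_ids]


parents = {
    "p1": "World Environment Day is celebrated on 5th June every year",
    "p2": "Salim Ali is a renowned ornithologist famous for his work on Indian birds",
}
child_index = build_child_index(parents, words_per_child=4)
print(retrieve_parents("renowned ornithologist Indian birds", parents, child_index))
```

Matching happens on the small children, where a short query has a better chance of lining up with the text, while the generator receives the full parent so it sees the surrounding context.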

In [None]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore

parent_splitter = RecursiveCharacterTextSplitter(chunk_size=1500)
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)

vectorstore = Chroma(collection_name="split_parents", embedding_function=OpenAIEmbeddings(api_key=openai_api_key))

store = InMemoryStore()

In [None]:
parent_document_retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)

In [None]:
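# `base_docs` holds the original PDF pages; reload them because `docs` was reassigned earlier in the notebook
base_docs = loader.load()
parent_document_retriever.add_documents(base_docs)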

Let's create, test, and then evaluate our new chain!

In [None]:
parent_document_retriever_qa_chain = create_qa_chain(parent_document_retriever)

In [None]:
parent_document_retriever_qa_chain.invoke({"question" : "What is RAG?"})["response"].content

In [None]:
pdr_qa_ragas_dataset = create_ragas_dataset(parent_document_retriever_qa_chain, eval_dataset)

In [None]:
pdr_qa_ragas_dataset.to_csv("pdr_qa_ragas_dataset.csv")

In [None]:
pdr_qa_result = evaluate_ragas_dataset(pdr_qa_ragas_dataset)

In [None]:
pdr_qa_result

#### Ensemble Retrieval

Next let's look at ensemble retrieval!

You can read more about this [here](https://python.langchain.com/docs/modules/data_connection/retrievers/ensemble)!

The basic idea is as follows:

1. Obtain User Question
2. Hit the Retriever Pair
    - Retrieve Documents with BM25 Sparse Vector Retrieval
    - Retrieve Documents with Dense Vector Retrieval Method
3. Collect and "fuse" the retrieved docs based on their weighting using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm into a single ranked list.
4. Use those documents to augment our generation.

Ensure your `weights` list - the relative weighting of each retriever - sums to 1!
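
To show how the fusion step combines two rankings, here is a minimal sketch of weighted Reciprocal Rank Fusion in plain Python. It illustrates the scoring idea only and is not taken from the `EnsembleRetriever` source; the function name `weighted_rrf` and the document ids are hypothetical, and the constant `c = 60` follows the value suggested in the linked paper.

```python
from collections import defaultdict


def weighted_rrf(rankings: list[list[str]], weights: list[float], c: int = 60) -> list[str]:
    """Fuse ranked lists of document ids with weighted Reciprocal Rank Fusion.

    Each document earns weight / (c + rank) from every list it appears in
    (rank starts at 1); documents are returned by descending total score.
    """
    if len(rankings) != len(weights):
        raise ValueError("need exactly one weight per ranking")
    scores: dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (c + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


sparse_ranking = ["doc_a", "doc_b", "doc_c"]  # e.g. from a keyword / BM25 retriever
dense_ranking = ["doc_c", "doc_a", "doc_d"]   # e.g. from a vector retriever

fused = weighted_rrf([sparse_ranking, dense_ranking], weights=[0.5, 0.5])
print(fused)
```

Documents that both retrievers rank highly rise to the top, and shifting weight toward one retriever pulls its favourites up the fused list.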

In [None]:
!pip install -q -U rank_bm25

In [None]:
from langchain.retrievers import BM25Retriever, EnsembleRetriever

text_splitter = RecursiveCharacterTextSplitter(chunk_size=450, chunk_overlap=75)
docs = text_splitter.split_documents(base_docs)

bm25_retriever = BM25Retriever.from_documents(docs)
bm25_retriever.k = 2

embedding = OpenAIEmbeddings(api_key=openai_api_key)
vectorstore = Chroma.from_documents(docs, embedding)
chroma_retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

ensemble_retriever = EnsembleRetriever(retrievers=[bm25_retriever, chroma_retriever], weights=[0.75, 0.25])

In [None]:
ensemble_retriever_qa_chain = create_qa_chain(ensemble_retriever)

In [None]:
ensemble_retriever_qa_chain.invoke({"question" : "What is RAG?"})["response"].content

In [None]:
ensemble_qa_ragas_dataset = create_ragas_dataset(ensemble_retriever_qa_chain, eval_dataset)

In [None]:
ensemble_qa_ragas_dataset.to_csv("ensemble_qa_ragas_dataset.csv")

In [None]:
ensemble_qa_result = evaluate_ragas_dataset(ensemble_qa_ragas_dataset)

In [None]:
ensemble_qa_result

### Conclusion

Observe your results in a table!

In [None]:
basic_qa_result

In [None]:
pdr_qa_result

In [None]:
ensemble_qa_result

We can also zoom in on each result and find specific information about each of the questions and answers.

In [None]:
ensemble_qa_result_df = ensemble_qa_result.to_pandas()

In [None]:
ensemble_qa_result_df

We'll also look at combining the results and looking at them in a single table so we can make inferences about them!

In [None]:
def create_df_dict(pipeline_name, pipeline_items):
  df_dict = {"name" : pipeline_name}
  for name, score in pipeline_items:
    df_dict[name] = score
  return df_dict

In [None]:
basic_rag_df_dict = create_df_dict("basic_rag", basic_qa_result.items())

In [None]:
pdr_rag_df_dict = create_df_dict("pdr_rag", pdr_qa_result.items())

In [None]:
ensemble_rag_df_dict = create_df_dict("ensemble_rag", ensemble_qa_result.items())

In [None]:
results_df = pd.DataFrame([basic_rag_df_dict, pdr_rag_df_dict, ensemble_rag_df_dict])

In [None]:
results_df.sort_values("answer_correctness", ascending=False)

### ❓QUESTION❓

What conclusions can you draw about the above results?

Describe in your own words what the metrics are expressing.

In [None]:
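from langchain.schema.runnable import RunnableParallel

# Variant of the basic chain that parses the LLM output into a plain string.
# Assumption: the original cell referenced undefined names (`base_retriever`, `primary_qa_llm`, `parser`);
# here they are replaced with `ensemble_retriever` and `model` from earlier cells and a `StrOutputParser`.
parser = StrOutputParser()

retrieval_augmented_qa_chain = (
    RunnableParallel({
        'context': itemgetter('question') | ensemble_retriever,
        'question': itemgetter('question')  # pass only the question string, not the whole input dict
    }) | {
        'response': prompt | model | parser,
        'context': itemgetter('context')
    }
)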