# **Rewrite-Retrieve-Read (RRR)**
Rewrite-Retrieve-Read is a three-step framework for retrieval-augmented tasks such as open-domain question answering. It first rewrites the input query, then retrieves context for the rewritten query, and finally reads that context to generate the answer. Refining the query before retrieval improves the quality of the retrieved information and, in turn, the accuracy of the output.

Research Paper: [Rewrite-Retrieve-Read](https://arxiv.org/pdf/2305.14283)
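
The three steps can be sketched in plain Python before any library is involved. This is a minimal illustration, not the paper's implementation: `toy_rewrite`, `toy_retrieve`, `toy_read`, `FIXES`, and `STOPWORDS` are hypothetical stand-ins (a hard-coded typo table instead of an LLM rewriter, keyword overlap instead of a vector store), and the corpus sentence is borrowed from the context shown later in this notebook.

```python
import re
from typing import Callable, List

# Toy corpus; the sentence comes from the context used later in this notebook.
CORPUS = [
    "The Matrix is a 1999 science fiction action film written and directed by the Wachowskis.",
]
STOPWORDS = {"who", "the", "is", "a", "and", "by"}
# Hypothetical hard-coded fixes standing in for an LLM-based rewriter.
FIXES = {"matrics": "matrix", "create": "created"}


def tokens(text: str) -> set:
    """Return lowercase word tokens with stopwords removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def toy_rewrite(question: str) -> str:
    """Step 1 stand-in: correct known typos word by word."""
    return " ".join(FIXES.get(word, word) for word in question.lower().split())


def toy_retrieve(query: str) -> List[str]:
    """Step 2 stand-in: keep passages that share a keyword with the query."""
    return [passage for passage in CORPUS if tokens(query) & tokens(passage)]


def toy_read(question: str, passages: List[str]) -> str:
    """Step 3 stand-in: answer from the first passage, or admit ignorance."""
    return passages[0] if passages else "I don't know."


def rewrite_retrieve_read(
    question: str,
    rewrite: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    read: Callable[[str, List[str]], str],
) -> str:
    search_query = rewrite(question)   # 1. rewrite the query
    passages = retrieve(search_query)  # 2. retrieve with the rewritten query
    return read(question, passages)    # 3. read: answer the original question


question = "who create the matrics"
baseline = toy_read(question, toy_retrieve(question))
improved = rewrite_retrieve_read(question, toy_rewrite, toy_retrieve, toy_read)
print(baseline)
print(improved)
```

Note that only retrieval sees the rewritten query; the reader still receives the original question, which mirrors how the chain later in this notebook is wired.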

## **Initial Setup**

In [None]:
! pip install --q athina chromadb

In [None]:
# ! pip install --q athina datasets langchain_community  langchain-openai langchainhub chromadb langchain

In [None]:
import os
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
os.environ['ATHINA_API_KEY'] = userdata.get('ATHINA_API_KEY')

## **Indexing**

In [None]:
# load embedding model
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

In [None]:
# load data
from langchain_community.document_loaders import CSVLoader
loader = CSVLoader("./context.csv")
documents = loader.load()

In [None]:
# split documents
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
documents = text_splitter.split_documents(documents)
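
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a simplified fixed-window splitter. It is an illustrative assumption, not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter also tries to break on separators such as paragraphs and sentences); `split_fixed` is a hypothetical helper.

```python
from typing import List


def split_fixed(text: str, chunk_size: int = 500, chunk_overlap: int = 0) -> List[str]:
    """Cut text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


no_overlap = split_fixed("abcdefghij", chunk_size=4, chunk_overlap=0)
with_overlap = split_fixed("abcdefghij", chunk_size=4, chunk_overlap=1)
print(no_overlap)
print(with_overlap)
```

With `chunk_overlap=0`, as in the cell above, no text is repeated across chunks, so a sentence that straddles a boundary is split between two chunks.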

In [None]:
# create vectorstore
from langchain_community.vectorstores import Chroma
vectorstore = Chroma.from_documents(documents, embeddings)

## **Retriever**

In [None]:
# create retriever
retriever = vectorstore.as_retriever()

## **RAG Chain**

In [None]:
# load llm
from langchain_openai import ChatOpenAI
llm = ChatOpenAI()

In [None]:
# create document chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

template = """
You are a helpful assistant that answers questions based on the following context.
If you don't find the answer in the context, just say that you don't know.
Context: {context}

Question: {input}

Answer:

"""
prompt = ChatPromptTemplate.from_template(template)


rag_chain = (
    {"context": retriever,  "input": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

## **Simple Query**

In [None]:
# define simple query
simple_query = "who directed the matrix"

In [None]:
# response
response = rag_chain.invoke(simple_query)
response

'The Matrix was directed by the Wachowskis.'

## **Distracted Query**

In [None]:
# define distracted query
distracted_query = "who create the matrics"

In [None]:
# response
response = rag_chain.invoke(distracted_query)
response

"I don't know."

## **Rewrite Retrieve Read**

In [None]:
# define rewrite prompt for distracted query

template = """Provide a better search query for \
web search engine to answer the given question, end \
the queries with ’**’. Question: \
{x} Answer:"""

rewrite_prompt = ChatPromptTemplate.from_template(template)

In [None]:
# parse response
def _parse(text):
    return text.strip('"').strip("**")
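
One caveat with chained `strip` calls is that order matters: if the model returns a quoted query followed by the `**` marker, the first `strip('"')` cannot reach the trailing quote. A slightly more defensive variant is sketched below; `parse_rewritten_query` is a hypothetical name, and the sample model outputs are assumptions rather than recorded responses.

```python
def parse_rewritten_query(text: str) -> str:
    """Remove surrounding whitespace, double quotes, and '*' markers in any order."""
    # str.strip with a character set removes any mix of those characters
    # from both ends, so '"...?"**' and '**"...?"' are both handled.
    return text.strip().strip('"*').strip()


quoted_then_marker = parse_rewritten_query('"Who directed The Matrix?"**')
marker_with_space = parse_rewritten_query("Who created The Matrix? **\n")
print(quoted_then_marker)
print(marker_with_space)
```

This only trims the ends of the string, so asterisks or quotes inside the query are left untouched.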

In [None]:
# create rewriter chain
rewriter = rewrite_prompt | ChatOpenAI(temperature=0) | StrOutputParser() | _parse

In [None]:
# updated query
rewriter.invoke({"x": distracted_query})

'Who is the creator of the Matrix film series?'

In [None]:
# create rewrite retrieve read chain
rewrite_retrieve_read_chain = (
    {
        "context": {"x": RunnablePassthrough()} | rewriter | retriever,
        "input": RunnablePassthrough(),
    }
    | prompt
    | llm
    | StrOutputParser()
)

In [None]:
# final response
rewrite_retrieve_read_chain.invoke(distracted_query)

'The Matrix was created by the Wachowskis.'

## **Preparing Data for Evaluation**

In [None]:
# create data
response = []
contexts = []
questions = []

rewritten_query = rewriter.invoke({"x": distracted_query})
questions.append(rewritten_query)
response.append(rewrite_retrieve_read_chain.invoke(distracted_query))
contexts.append([doc.page_content for doc in retriever.invoke(rewritten_query)])


data = {
    "query": questions,
    "response": response,
    "context": contexts,
}
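
The cell above prepares a single row. For several queries, the same steps could be wrapped in a loop. The helper below is a sketch under assumptions: `build_eval_data` is a hypothetical name, and the three callables are stubs so that the block runs on its own.

```python
from typing import Callable, Dict, Iterable, List


def build_eval_data(
    queries: Iterable[str],
    rewrite: Callable[[str], str],
    answer: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
) -> Dict[str, list]:
    """Collect rewritten query, response, and context lists for evaluation."""
    data: Dict[str, list] = {"query": [], "response": [], "context": []}
    for query in queries:
        rewritten = rewrite(query)
        data["query"].append(rewritten)                   # evaluated query is the rewritten one
        data["response"].append(answer(query))            # the chain receives the raw query
        data["context"].append(list(retrieve(rewritten)))  # context as a list of strings
    return data


# Stub callables for illustration only.
demo = build_eval_data(
    ["who create the matrics"],
    rewrite=lambda q: "Who created The Matrix?",
    answer=lambda q: "The Wachowskis.",
    retrieve=lambda q: ["The Matrix is a 1999 film."],
)
print(demo)
```

With the objects defined in this notebook, the callables would presumably be `lambda q: rewriter.invoke({"x": q})`, `rewrite_retrieve_read_chain.invoke`, and a function that returns the `page_content` of each document the retriever yields.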

In [None]:
# create dataset
from datasets import Dataset
dataset = Dataset.from_dict(data)

In [None]:
# create dataframe
import pandas as pd
df = pd.DataFrame(dataset)

In [None]:
df

| | query | response | context |
|---|---|---|---|
| 0 | Who is the creator of the Matrix film series? | The Matrix was created by the Wachowskis. | ['The Matrix is a 1999 science fiction action film written and directed by the Wachowskis. It is the first installment in the Matrix film series, starring Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving, and Joe Pantoliano, and depicts a dystopian future in which humanity is unknowingly trapped inside the Matrix, a simulated reality that intelligent machines have created to distract humans while using their bodies as an energy source. When computer programmer Thomas Anderson... |


In [None]:
# Convert to dictionary
df_dict = df.to_dict(orient='records')

# Convert context to list
for record in df_dict:
    if not isinstance(record.get('context'), list):
        if record.get('context') is None:
            record['context'] = []
        else:
            record['context'] = [record['context']]

## **Evaluation in Athina AI**

We will use the **Answer Relevancy** eval here. It measures how pertinent the generated response is to the given prompt. Please refer to our [documentation](https://docs.athina.ai/api-reference/evals/preset-evals/overview) for further details.
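
As a rough intuition for the score: the Ragas documentation describes answer relevancy as the mean cosine similarity between the embedding of the original question and the embeddings of questions that an LLM generates back from the response. The sketch below shows only that final averaging step. The vectors are hand-made placeholders rather than real embeddings, and `answer_relevancy` is a hypothetical helper, not the Ragas or Athina API.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def answer_relevancy(
    question_vec: Sequence[float],
    generated_question_vecs: Sequence[Sequence[float]],
) -> float:
    """Mean similarity between the question and questions generated from the answer."""
    if not generated_question_vecs:
        return 0.0
    sims = [cosine_similarity(question_vec, vec) for vec in generated_question_vecs]
    return sum(sims) / len(sims)


# Placeholder 2-D "embeddings": one generated question matches, one is unrelated.
question_vec = [1.0, 0.0]
generated_vecs = [[1.0, 0.0], [0.0, 1.0]]
score = answer_relevancy(question_vec, generated_vecs)
print(score)
```

A response that stays on topic yields generated questions close to the original, pushing the mean toward 1; incomplete or padded responses pull it down.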

In [None]:
# set api keys for Athina evals
from athina.keys import AthinaApiKey, OpenAiApiKey
OpenAiApiKey.set_key(os.getenv('OPENAI_API_KEY'))
AthinaApiKey.set_key(os.getenv('ATHINA_API_KEY'))

In [None]:
# load dataset
from athina.loaders import Loader
dataset = Loader().load_dict(df_dict)

In [None]:
# evaluate
from athina.evals import RagasAnswerRelevancy
RagasAnswerRelevancy(model="gpt-4o").run_batch(data=dataset).to_df()

evaluating with [answer_relevancy]


100%|██████████| 1/1 [00:01<00:00,  1.08s/it]


You can view your dataset at: https://app.athina.ai/develop/ddec3010-12e6-4f5e-bbc0-188934bc90dc


| | query | context | response | expected_response | display_name | failed | grade_reason | runtime | model | ragas_answer_relevancy |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Who is the creator of the Matrix film series? | ['The Matrix is a 1999 science fiction action film written and directed by the Wachowskis. It is the first installment in the Matrix film series, starring Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving, and Joe Pantoliano, and depicts a dystopian future in which humanity is unknowingly trapped inside the Matrix, a simulated reality that intelligent machines have created to distract humans while using their bodies as an energy source. When computer programmer Thomas Anderson... | The Matrix was created by the Wachowskis. | | Ragas Answer Relevancy | | A response is deemed relevant when it directly and appropriately addresses the original query. Importantly, our assessment of answer relevance does not consider factuality but instead penalizes cases where the response lacks completeness or contains redundant details | 1493 | gpt-4o | 0.961197 |
