In [1]:
from dotenv import load_dotenv
load_dotenv()

True

## Open Source RAG

As we did before, we'll leverage the ArxivLoader to load some papers, and then split them into more manageable chunks!

In [2]:
from langchain.document_loaders import ArxivLoader

docs = ArxivLoader(query="Retrieval Augmented Generation", load_max_docs=5).load()

Be aware of your model's context window - and how many input tokens you allowed when creating your endpoint!



In [3]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,      # maximum length of each chunk, in characters
    chunk_overlap=200,    # number of characters shared between consecutive chunks
    length_function=len,  # measure length with Python's built-in len(), i.e. in characters
)

split_chunks = text_splitter.split_documents(docs)
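To make the effect of chunk_size and chunk_overlap concrete, here is a minimal sketch of a plain sliding-window splitter. It is a simplification and not how RecursiveCharacterTextSplitter works internally (the real splitter first tries to break on separators such as paragraphs and line breaks); the function name `sliding_window_chunks` is hypothetical and only illustrates how consecutive chunks share characters.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: no separator-aware splitting is attempted.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


# A 2,500-character toy document made of repeating digits.
sample = "".join(str(i % 10) for i in range(2500))
chunks = sliding_window_chunks(sample)
print([len(c) for c in chunks])
# The last 200 characters of one chunk are the first 200 of the next.
print(chunks[0][-200:] == chunks[1][:200])
```

With a step of 800 characters (1000 minus 200), every chunk repeats the tail of the previous one, which helps keep sentences that straddle a boundary retrievable from at least one chunk.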

## Open Source Embeddings

Now we can leverage sentence-transformers models through Hugging Face to handle all our embedding needs locally!

Keep in mind that if you're using a GPU instance, you'll be able to set device to cuda. Otherwise you should set it to cpu.
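If you'd rather not hard-code the device, a small helper can choose it at runtime. This is a sketch under the assumption that PyTorch is the backend in use; `pick_device` is a hypothetical helper name, and it falls back to cpu when PyTorch is not installed or no GPU is visible.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA GPU is visible to PyTorch, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to query, so stay on the CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


model_kwargs = {"device": pick_device()}
print(model_kwargs)
```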

In [4]:
from langchain.embeddings import HuggingFaceEmbeddings

model_name = 'sentence-transformers/all-mpnet-base-v2'
model_kwargs = {'device': 'cpu'}  # use 'cuda' on a GPU instance
encode_kwargs = {'normalize_embeddings': False}

hf_embeddings = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
)



Just as we would with OpenAI's embeddings model, we can instantiate our FAISS vector store with our documents and our HuggingFaceEmbeddings model!

In [5]:
from langchain.vectorstores import FAISS

faiss_vectorstore = FAISS.from_documents(
    embedding=hf_embeddings,
    documents=split_chunks,
)
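Conceptually, the vector store embeds each chunk once and later returns the chunks whose vectors sit closest to the query vector. The sketch below shows that idea with a brute-force search in pure Python. It assumes Euclidean (L2) distance, which I believe is the default for the LangChain FAISS wrapper but have not confirmed against this notebook's version; the three-dimensional vectors and the `top_k` helper are made up for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, stored, k=4):
    """Return the k (distance, text) pairs closest to query_vec."""
    scored = [(l2_distance(query_vec, vec), text) for text, vec in stored]
    return sorted(scored, key=lambda pair: pair[0])[:k]


# Toy "embeddings": real ones from all-mpnet-base-v2 have far more dimensions.
toy_store = [
    ("chunk about retrieval", [0.9, 0.1, 0.0]),
    ("chunk about generation", [0.1, 0.9, 0.0]),
    ("chunk about training", [0.0, 0.1, 0.9]),
]

hits = top_k([1.0, 0.0, 0.0], toy_store, k=2)
for distance, text in hits:
    print(f"{distance:.3f}  {text}")
```

FAISS performs the same kind of lookup with optimized index structures, so it stays fast when there are many thousands of chunks.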

## Retrieval QA Chain

All that's left to do is set up a RetrievalQA pipeline - and away we go!

When setting a value for k, remember your model's context window and the maximum number of input tokens you allowed when creating the endpoint.
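A rough budget check can flag a k that is too large before you send anything to the endpoint. This is only a sketch: the 4-characters-per-token ratio is a common rule of thumb rather than a property of this model's tokenizer, and `max_input_tokens` and `reserved_tokens` are hypothetical values you would replace with your endpoint's actual settings.

```python
import math


def estimated_context_tokens(k: int, chunk_size: int, chars_per_token: float = 4.0) -> int:
    """Rough upper bound on the tokens contributed by k retrieved chunks."""
    return math.ceil(k * chunk_size / chars_per_token)


def fits_budget(k: int, chunk_size: int, max_input_tokens: int, reserved_tokens: int = 256) -> bool:
    """Check the retrieved context plus room for the prompt and question fits the limit."""
    return estimated_context_tokens(k, chunk_size) + reserved_tokens <= max_input_tokens


# k = 4 chunks of up to 1000 characters each, as configured in this notebook.
print(estimated_context_tokens(4, 1000))
print(fits_budget(4, 1000, max_input_tokens=2048))  # hypothetical endpoint limit
print(fits_budget(4, 1000, max_input_tokens=1024))  # hypothetical endpoint limit
```

For an exact count, tokenize the assembled prompt with the model's own tokenizer instead of relying on the character heuristic.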

In [6]:
from langchain.llms import HuggingFaceHub

llm = HuggingFaceHub(repo_id = 'mistralai/Mistral-7B-Instruct-v0.1')

In [7]:
from langchain.chains import RetrievalQA

k = 4  # number of chunks to retrieve per query
qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=faiss_vectorstore.as_retriever(search_kwargs={"k": k}),
)

Let's test it out!

In [8]:
qa("What is Retrieval Augmented Generation?")

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.

{'query': 'What is Retrieval Augmented Generation?',
 'result': ' Retrieval Augmented Generation is a method of generating text by combining a retrieval model with'}
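The huggingface/tokenizers warning in the output above names its own fix: set the TOKENIZERS_PARALLELISM environment variable explicitly. A minimal way to do that from Python is sketched below; the choice of "false" is an assumption that favors avoiding the fork-related deadlock over tokenization speed.

```python
import os

# Set this before the tokenizer is first used, e.g. in the first cell of the notebook.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```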

In [9]:
qa("What process is used to update the model's weights?")


{'query': "What process is used to update the model's weights?",
 'result': " The process used to update the model's weights is not specified in the provided context."}

## Evaluating Our Open Source RAG Pipeline

Now we can leverage RAGAS to evaluate our RAG pipeline.

We'll use a preset list of questions - though you can reference the previous week's assignment if you want to build your own test questions!



In [10]:
question_list = [
    'What is the title of the paper on Retrieval Augmented Generation?',
    'What is the title of the paper on Retrieval Augmented Generation mentioned in the context?',
    'What is the title of the paper on Retrieval Augmented Generation mentioned in the context information?',
    'What is the title of the paper on Retrieval Augmented Generation?',
    'What is the task of the Retrieval Augmented Generation (RAG) model according to the context information?',
    'What advantages does RAG have over the DPR QA system in terms of generating answers?',
    'What are some potential downsides of using RAG, a language model based on Wikipedia, in various scenarios?',
    'What is the file name of the paper on Retrieval Augmented Generation?',
    'What is the accuracy of RAG models in generating correct answers even when the correct answer is not in any retrieved document?',
    'Question: What is an example of how parametric and non-parametric memories work together in BART?',
    'What is the title of the paper on Retrieval Augmented Generation?',
    'What is the purpose of using multiple answer annotations in open-domain QA?',
    'Question: Which novel by Ernest Hemingway is based on his wartime experiences?',
    'What advantage do non-parametric memory models like RAG have over parametric-only models like T5 or BART?',
    'What is the title of the paper on Retrieval Augmented Generation?',
    'What is the title of the paper on Retrieval Augmented Generation?',
    'What training setup details are mentioned in the paper on Retrieval Augmented Generation?',
    'What is the title of the paper on Retrieval Augmented Generation mentioned in the context information?',
    'What is the title of the paper mentioned in the context information?',
    'What are the three sections into which the 14th century work "The Divine Comedy" is divided?',
    'What are the two components of RAG models described in the context?',
    'What is the title of the paper on Retrieval Augmented Generation?',
    'What is the benchmark dataset used for question answering research mentioned in the provided context?',
    'What are the two models proposed in the paper on Retrieval Augmented Generation?',
    'What is the approach used to train the retriever and generator components in the paper on Retrieval Augmented Generation?',
    'What is the best performing "closed-book" open-domain QA model mentioned in the context?',
    'What is the ratio of distinct to total tri-grams for the generation tasks in the Jeopardy Question Generation Task?',
    'What is the main finding of the paper on Retrieval Augmented Generation?',
    'What is the main objective of the work presented in the paper on Retrieval Augmented Generation?',
    'What are the limitations of large pre-trained language models in accessing and manipulating knowledge in knowledge-intensive tasks?',
    'What is the main objective of RAG in the experiments conducted in the paper on Retrieval Augmented Generation?',
    'What is the purpose of the MSMARCO NLG task v2.1?',
]

Now let's collect all the answers to our questions!

Be sure to return the source documents so that each answer comes with the retrieved contexts it will be scored against!

In [13]:
test_pipeline = RetrievalQA.from_chain_type(
    llm = llm,
    chain_type="stuff",
    retriever = faiss_vectorstore.as_retriever(search_kwargs={"k" : k}),
    return_source_documents = True
)

In [14]:
question_context_answer = []

for question in question_list:
    result = test_pipeline({"query": question})
    # print(result)
    # Package the question, the retrieved chunk texts, and the generated answer.
    answer_package = {
        "question": question,
        "contexts": [doc.page_content for doc in result["source_documents"]],
        "answer": result["result"],
    }

    question_context_answer.append(answer_package)

{'query': 'What is the title of the paper on Retrieval Augmented Generation?', 'result': ' The title of the paper on Retrieval Augmented Generation is "Forward-Looking Active', 'source_documents': [Document(page_content='resulting in the interleaving of retrieval and genera-\ntion. Formally, at step t(t ≥ 1), the retrieval query\nqt is formulated based on both the user input x and\npreviously generated output y<t = [y0, ..., yt−1]:\nqt = qry(x, y<t),\nwhere qry(·) is the query formulation function. At\nthe start of the generation (t = 1), the previous\ngeneration is empty (y<1 = ∅), and the user input\nis used as the initial query (q1 = x). Given the re-\ntrieved documents Dqt, LMs continually generate\nthe answer until the next retrieval is triggered or\nreaches the end:\nyt = LM([Dqt, x, y<t]),\nwhere yt represents the generated tokens at the\ncurrent step t, and the input to LMs is the concate-\nnation of the retrieved documents Dqt, the user\ninput x, and the previous generation y<

Let's convert these responses into a dataset in a format that is compatible with RAGAS!
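Before building the dataset, it can be worth checking that every record has the shape collected so far: a question string, a list of context strings, and an answer string. The helper below is a hypothetical sanity check written for this notebook, not part of RAGAS, and the example record is made up.

```python
def validate_record(record: dict) -> None:
    """Raise ValueError unless record has question/contexts/answer in the expected types."""
    for key in ("question", "contexts", "answer"):
        if key not in record:
            raise ValueError(f"missing key: {key}")
    if not isinstance(record["question"], str):
        raise ValueError("question must be a string")
    if not isinstance(record["answer"], str):
        raise ValueError("answer must be a string")
    contexts = record["contexts"]
    if not isinstance(contexts, list) or not all(isinstance(c, str) for c in contexts):
        raise ValueError("contexts must be a list of strings")


# Made-up record in the same shape as the entries of question_context_answer.
example = {
    "question": "What is Retrieval Augmented Generation?",
    "contexts": ["first retrieved chunk", "second retrieved chunk"],
    "answer": "An example answer.",
}
validate_record(example)  # passes silently when the record is well formed
```

In the notebook you could run it over the whole list with `for record in question_context_answer: validate_record(record)`.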

## RAGAS

In [15]:
import datasets
import pandas as pd

dataset = datasets.Dataset.from_pandas(pd.DataFrame(data=question_context_answer))

In [20]:
df = pd.DataFrame(data=question_context_answer)
df

| | question | contexts | answer |
|---|---|---|---|
| 0 | What is the title of the paper on Retrieval Au... | [resulting in the interleaving of retrieval an... | The title of the paper on Retrieval Augmented... |
| 1 | What is the title of the paper on Retrieval Au... | [the generation and use the query terms to ret... | FLARE: Forward-Looking Active REtrieval Augme... |
| 2 | What is the title of the paper on Retrieval Au... | [the generation and use the query terms to ret... | The title of the paper on Retrieval Augmented... |
| 3 | What is the title of the paper on Retrieval Au... | [resulting in the interleaving of retrieval an... | The title of the paper on Retrieval Augmented... |
| 4 | What is the task of the Retrieval Augmented Ge... | [boost the quality of ﬁnal generation. To this... | The task of the Retrieval Augmented Generatio... |
| 5 | What advantages does RAG have over the DPR QA ... | [and compare the performance of FLARE with all... | RAG has several advantages over the DPR QA sy... |
| 6 | What are some potential downsides of using RAG... | [Gretchen Krueger, Tom Henighan, Rewon Child,\... | \n\nThere are several potential downsides of u... |
| 7 | What is the file name of the paper on Retrieva... | [resulting in the interleaving of retrieval an... | The file name of the paper on Retrieval Augme... |
| 8 | What is the accuracy of RAG models in generati... | [LM (Ans-Deletion)\n(b) Answer-deletion test.\... | The provided context does not mention the acc... |
| 9 | Question: What is an example of how parametric... | [that our model consistently outperforms all t... | In the context of BART, parametric memory ref... |


In [21]:
df.to_csv('output.csv')

In [18]:
dataset

Dataset({
    features: ['question', 'contexts', 'answer'],
    num_rows: 32
})

In [19]:
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_precision,
)
from ragas.metrics.critique import harmfulness
from ragas import evaluate

result = evaluate(
    dataset,
    metrics=[
        answer_relevancy,
        faithfulness,
        context_precision,
        harmfulness
    ],
)

evaluating with [answer_relevancy]


  0%|          | 0/3 [00:00<?, ?it/s]Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised APIConnectionError: Error communicating with OpenAI: HTTPSConnectionPool(host='api.openai.com', port=443): Max retries exceeded with url: /v1/chat/completions (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2d08164d0>: Failed to resolve 'api.openai.com' ([Errno 8] nodename nor servname provided, or not known)")).

APIConnectionError: Error communicating with OpenAI: HTTPSConnectionPool(host='api.openai.com', port=443): Max retries exceeded with url: /v1/chat/completions (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x2d0816ef0>: Failed to resolve 'api.openai.com' ([Errno 8] nodename nor servname provided, or not known)"))
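The traceback above shows that this evaluate call went through langchain's ChatOpenAI and failed because api.openai.com could not be resolved, so the run never reached the metrics. A quick pre-flight DNS check can surface that kind of problem before a long evaluation starts. The sketch below uses only the standard library; `can_resolve` is a hypothetical helper, and a successful lookup does not by itself guarantee that the API is reachable or that credentials are valid.

```python
import socket


def can_resolve(host: str, port: int = 443) -> bool:
    """Return True if the hostname resolves to at least one address."""
    try:
        socket.getaddrinfo(host, port)
    except socket.gaierror:
        # Name resolution failed, the same class of failure seen in the traceback.
        return False
    return True
```

For example, you might guard the evaluation cell with `if not can_resolve("api.openai.com"): raise RuntimeError("No DNS resolution for api.openai.com; check your network connection")`.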

In [None]:
result