<a href="https://colab.research.google.com/github/Tanaya2012/QA-chatbot/blob/main/Retrieval_Augmented_Generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Retrieval Augmented Generation

This notebook implements retrieval-augmented generation (RAG) for question answering over a text file of cleaned sentences, using the Haystack library.

The system combines two techniques: retrieval and generation. A BM25 retriever and a cross-encoder reranker select the most relevant passage, and a LaMini-Flan-T5 language model generates an answer from it. Grounding the generator in retrieved text is intended to make its responses more accurate and contextually appropriate.

Retrieval-augmented generation works in two steps. First, relevant passages are retrieved from a knowledge base or dataset. Second, the retrieved passages are passed to the generation model together with the query. The generation model, typically a language model such as GPT or T5, then produces a response conditioned on both the query and the retrieved context, which tends to yield more accurate and informative answers than the query alone.
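The retrieve-then-generate flow described above can be sketched in plain Python. This is a toy illustration, not the notebook's implementation: the corpus, the term-overlap retriever, and the helper names `retrieve` and `build_prompt` are all hypothetical, and the final generation step is left out because it requires a language model.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieve(query, corpus, top_k=2):
    """Rank passages by the number of terms shared with the query.

    A toy stand-in for a real retriever such as BM25.
    """
    query_terms = set(tokenize(query))
    scored = [(len(query_terms & set(tokenize(p))), p) for p in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for score, passage in scored[:top_k] if score > 0]

def build_prompt(query, passages):
    """Combine the retrieved passages and the query into one prompt string."""
    context = " ".join(passages)
    return f"Context: {context}\n\nQuestion: {query}\n\nAnswer:"

# Hypothetical corpus, used only for illustration.
corpus = [
    "The cafeteria opens at 8 am on weekdays.",
    "The library closes at 9 pm on Fridays.",
    "Parking is free after 6 pm.",
]
query = "When does the library close?"
passages = retrieve(query, corpus, top_k=1)
prompt_text = build_prompt(query, passages)
print(prompt_text)  # this string would be sent to the generation model
```

In a full system, `prompt_text` is what the language model receives, so the quality of the answer depends heavily on which passages the retriever selects.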

The notebook walks through the implementation step by step: loading the data into a document store, setting up the retriever and reranker, connecting them to the generation model in a pipeline, and running inference on a few sample questions.

In [1]:
# import locale
# def getpreferredencoding(do_setlocale = True):
#     return "UTF-8"
# locale.getpreferredencoding = getpreferredencoding
# !pip install transformers

In [2]:
# Only Path is needed at this point; torch and the Haystack classes
# are imported in the cells that use them.
from pathlib import Path

In [16]:
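# Read the cleaned text (one sentence per line) from Google Drive.
pdf_text = Path('/content/drive/MyDrive/cleaned_sentences.txt').read_text(encoding='utf-8')
# Each line becomes one candidate document for retrieval.
examples = pdf_text.split('\n')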

In [4]:
# !pip install --upgrade pip
# !pip install 'farm-haystack[all]' ## or 'all-gpu' for the GPU-enabled dependencies

In [5]:
# !pip install TensorFlow --upgrade

## Model

The next cell uses the Haystack library to build an in-memory document store from the sentences loaded above.

First, each item in the `examples` list is wrapped in a `Document` object that holds the sentence as its content and its position in the list as a unique ID. These `Document` objects are collected in the `documents` list.

Next, an `InMemoryDocumentStore` is created with `use_gpu=False` and `use_bm25=True`. In Haystack 1.x, `use_gpu` governs whether embedding-similarity computations run on a GPU, so it has little effect on the BM25-only retrieval used here. `use_bm25=True` tells the store to build a BM25 index over the documents, which the BM25 retriever needs in order to score relevance.

Finally, the `write_documents` method of the `InMemoryDocumentStore` is called with the `documents` list. This writes the documents to the store and updates the BM25 index, making them available for retrieval.
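To make the BM25 relevance scoring mentioned above more concrete, here is a minimal pure-Python sketch of an Okapi-style BM25 scorer. It is an illustration only: the function name `bm25_scores`, the tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, and the non-negative IDF variant are assumptions, and Haystack's own implementation may differ in tokenization, IDF formula, and defaults.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = tokenize(query)

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Rare terms get a higher weight (non-negative IDF variant).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates via k1; b normalizes for document length.
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

# Hypothetical documents, used only for illustration.
docs = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "the stock market fell today",
]
scores = bm25_scores("cat mat", docs)
best_index = max(range(len(docs)), key=lambda i: scores[i])
print(docs[best_index])
```

Note that BM25 matches exact tokens: the second document contains "cats" but not "cat", so it scores zero for this query. This kind of lexical mismatch is one reason a reranker is often added after a BM25 retriever.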

In [8]:
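from haystack.schema import Document
from haystack.document_stores import InMemoryDocumentStore

# Wrap each sentence in a Document; the list index serves as a unique ID.
documents = []
for i, sentence in enumerate(examples):
    documents.append(Document(content=sentence, id=str(i)))

# Keep everything in memory and build a BM25 index for keyword retrieval.
document_store = InMemoryDocumentStore(use_gpu=False, use_bm25=True)
document_store.write_documents(documents)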

Updating BM25 representation...:   0%|          | 0/993 [00:00<?, ? docs/s]

The next cell sets up the retrieval and generation components. A `BM25Retriever` fetches up to 100 candidate documents from the store (`top_k=100`), and a `SentenceTransformersRanker` backed by the cross-encoder `cross-encoder/ms-marco-MiniLM-L-12-v2` reranks them and keeps only the best one (`top_k=1`).

The cell then defines a `PromptTemplate` named "lfqa". Its prompt text instructs the model to answer the question from the provided context, in its own words and in no more than 50 words; `{join(documents)}` and `{query}` are placeholders that are filled in at run time. The `AnswerParser` output parser wraps the generated text in an answer object.

Lastly, a `PromptNode` is instantiated with the model `MBZUAI/LaMini-Flan-T5-783M` and `lfqa_prompt` as its default template. The `model_kwargs` argument sets the maximum input length to 2048 tokens and loads the weights in `torch.bfloat16`.

Together, the retriever, ranker, and prompt node are the building blocks of the pipeline defined below.
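The two-stage pattern used here (a cheap retriever selecting many candidates, followed by a more careful reranker keeping the best one) can be sketched without any libraries. Everything in this example is a stand-in: `cheap_score` plays the role of BM25, `careful_score` plays the role of the cross-encoder, and the documents are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cheap_score(query, doc):
    """First stage: number of shared terms (stand-in for BM25)."""
    return len(set(tokenize(query)) & set(tokenize(doc)))

def careful_score(query, doc):
    """Second stage: Jaccard similarity (stand-in for a cross-encoder)."""
    q_terms, d_terms = set(tokenize(query)), set(tokenize(doc))
    union = q_terms | d_terms
    return len(q_terms & d_terms) / len(union) if union else 0.0

def retrieve_then_rerank(query, docs, retriever_top_k=100, reranker_top_k=1):
    """Select candidates with the cheap scorer, then reorder them carefully."""
    candidates = sorted(
        docs, key=lambda d: cheap_score(query, d), reverse=True
    )[:retriever_top_k]
    reranked = sorted(
        candidates, key=lambda d: careful_score(query, d), reverse=True
    )
    return reranked[:reranker_top_k]

# Hypothetical documents, used only for illustration.
docs = [
    "opening hours of the museum and the cafe and the gift shop on public holidays",
    "museum opening hours",
    "train timetable",
]
best = retrieve_then_rerank(
    "museum opening hours", docs, retriever_top_k=2, reranker_top_k=1
)
print(best)
```

Both of the first two documents share three terms with the query, so the first stage cannot separate them; the second stage prefers the shorter, more focused document. With `top_k=1` on the reranker, as in this notebook, the generator sees only that single document, so a reranking mistake directly limits what the model can answer.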

In [9]:
import torch
from haystack.nodes import PromptNode, PromptTemplate
from haystack.nodes import BM25Retriever, SentenceTransformersRanker

# Stage 1: BM25 fetches up to 100 candidate documents.
retriever = BM25Retriever(document_store=document_store, top_k=100)
# Stage 2: a cross-encoder reranks the candidates and keeps the best one.
reranker = SentenceTransformersRanker(model_name_or_path="cross-encoder/ms-marco-MiniLM-L-12-v2", top_k=1)

# Prompt template: the retrieved documents and the query are filled in at run time.
lfqa_prompt = PromptTemplate(name="lfqa",
                             prompt_text="Answer the question using the provided context. Your answer should be in your own words and be no longer than 50 words. \n\n Context: {join(documents)} \n\n Question: {query} \n\n Answer:",
                             output_parser={"type": "AnswerParser"})
# Generator: LaMini-Flan-T5 loaded in bfloat16.
prompt = PromptNode(model_name_or_path="MBZUAI/LaMini-Flan-T5-783M", default_prompt_template=lfqa_prompt,
                    model_kwargs={"model_max_length": 2048, "torch_dtype": torch.bfloat16})

## Defining the pipeline

In [10]:
from haystack import Pipeline
p = Pipeline()
p.add_node(component=retriever, name="Retriever", inputs=["Query"])
p.add_node(component=reranker, name="Reranker", inputs=["Retriever"])
p.add_node(component=prompt, name="prompt_node", inputs=["Reranker"])
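
The way a linear pipeline hands each node's output to the next can be sketched with a small stand-in class. This is a simplified illustration, not Haystack's implementation: `MiniPipeline` and the three toy components are hypothetical, and real Haystack pipelines pass richer dictionaries between nodes and support branching graphs.

```python
class MiniPipeline:
    """A toy linear pipeline: each node reads the output of one earlier node."""

    def __init__(self):
        self.nodes = {}   # node name -> (callable, name of its input node)
        self.order = []   # node names in the order they were added

    def add_node(self, component, name, inputs):
        if len(inputs) != 1:
            raise ValueError("this sketch supports exactly one input per node")
        source = inputs[0]
        if source != "Query" and source not in self.nodes:
            raise ValueError(f"unknown input node: {source}")
        self.nodes[name] = (component, source)
        self.order.append(name)

    def run(self, query):
        outputs = {"Query": query}
        for name in self.order:
            component, source = self.nodes[name]
            # Every component receives the query plus its upstream output.
            outputs[name] = component(query, outputs[source])
        return outputs[self.order[-1]]


# Toy components, used only for illustration.
def toy_retriever(query, _upstream):
    corpus = ["apples are red", "bananas are yellow", "the sky is blue"]
    terms = set(query.lower().split())
    return [doc for doc in corpus if terms & set(doc.split())]

def toy_reranker(query, docs):
    terms = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(terms & set(d.split())), reverse=True)
    return ranked[:1]

def toy_prompt(query, docs):
    return f"Context: {' '.join(docs)} Question: {query} Answer:"


mini = MiniPipeline()
mini.add_node(component=toy_retriever, name="Retriever", inputs=["Query"])
mini.add_node(component=toy_reranker, name="Reranker", inputs=["Retriever"])
mini.add_node(component=toy_prompt, name="prompt_node", inputs=["Reranker"])
result = mini.run("what colour are bananas")
print(result)
```

The `inputs` argument is what wires the nodes together: the reranker consumes the retriever's candidates, and the prompt node consumes the reranker's selection.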

## Testing

In [11]:
# Run one question through the pipeline and show the generated answer.
result = p.run(query="When did the GARDASIL 9 recommendations change?")
result['answers'][0].answer

'The GARDASIL 9 recommendations changed in February 2015.'

## Final Answer

In [14]:
user_questions = [
    'When did the GARDASIL 9 recommendations change?',
    'What were the past 3 recommendation changes for GARDASIL 9?',
    'Is GARDASIL 9 recommended for Adults?',
    'Does the ACIP recommend one dose GARDASIL 9?',
]

# Run each question through the pipeline and print the top answer.
for question in user_questions:
    a = p.run(question, debug=True)
    print('Question: ', question)
    print('Answer: ', a['answers'][0].answer)

Question:  When did the GARDASIL 9 recommendations change?
Answer:  The GARDASIL 9 recommendations changed in February 2015.
Question:  What were the past 3 recommendation changes for GARDASIL 9?
Answer:  The Acip advisory committee on immunization practices recommended GARDASIL 9 as one of three HPV vaccines that can be used for routine vaccination.
Question:  Is GARDASIL 9 recommended for Adults?
Answer:  The context does not provide information on whether GARDASIL 9 is recommended for adults.
Question:  Does the ACIP recommend one dose GARDASIL 9?
Answer:  The ACIP recommends one dose of GARDASIL 9 as one of three HPV vaccines that can be used for routine vaccination.
