# Building RAG Chatbots for Technical Documentation

## Table of contents

- [Introduction](#introduction)
- [Environment Setup](#environment-setup)
- [Embeddings](#embeddings)
- [Indexing](#indexing)
- [Retriever](#retriever)
- [Prompt](#prompt)
- [LLM](#llm)
- [RAG Chain](#rag-chain)
- [Comparisons](#evaluation-metrics-and-comparison)
- [Chat history](#chat-history)
- [Basic RAG limitations](#basic-rag-limitations)

## Introduction 

This project implements retrieval-augmented generation (RAG) with *LangChain* to build a chatbot that answers questions about technical documentation. The document chosen for this assignment is **The European Union Medical Device Regulation - Regulation (EU) 2017/745 (EU MDR)**.

## Environment Setup

Install the packages and dependencies to be used:

In [1]:
# Install required libraries
%pip install -qU langchain langchain-community langchain-chroma langchain-text-splitters unstructured sentence_transformers langchain-huggingface huggingface_hub pdfplumber langchain-google-genai ipywidgets python-dotenv lark chainlit

Note: you may need to restart the kernel to use updated packages.


Because the notebook calls Google's generative AI models, store the ``GOOGLE_API_KEY`` in a ``.env`` file; ``load_dotenv()`` then loads it into the environment.

In [2]:
from dotenv import load_dotenv

load_dotenv()

True
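
A quick check after `load_dotenv()` can catch a missing key before the first API call. The sketch below is an assumption and not part of the original notebook; `require_env` is a hypothetical helper name.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to the .env file")
    return value
```

Calling `require_env("GOOGLE_API_KEY")` right after `load_dotenv()` would fail fast with a readable message instead of an authentication error later in the notebook.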

## Embeddings

We start by connecting to Google's generative AI embeddings model. Embeddings are generated with the **Text Embeddings 004** model from the Gemini API (`models/text-embedding-004`), with `task_type` set to *retrieval_document* so that the vectors are optimized for retrieval tasks.

In [3]:
from langchain_chroma import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004", task_type="retrieval_document")

## Indexing

In the indexing stage, we load the PDF document and split it into overlapping chunks of up to 1,000 characters. To avoid re-indexing the document on every run, the vector store is persisted locally in a folder named `db` and reloaded from there whenever that folder already exists.

In [4]:
import os
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PDFPlumberLoader

if os.path.exists("db"):
    vectorstore = Chroma(persist_directory="db", embedding_function=embeddings)
else:
    loader = PDFPlumberLoader("document.pdf")
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        add_start_index=True,
        separators=["\n\n", "\n", " ", ""],
    )
    pages = loader.load_and_split(text_splitter)
    vectorstore = Chroma.from_documents(
        documents=pages, embedding=embeddings, persist_directory="db"
    )
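
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window splitter. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break on the listed separators); `sliding_chunks` is a hypothetical name used only for illustration.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)
```

Each chunk repeats the tail of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.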

## Retriever

From the vector store, a retriever is created, configured to perform similarity searches and return the top 5 most relevant results:


In [5]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 5})
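
Conceptually, a similarity search embeds the query, scores every stored chunk vector against it, and keeps the `k` best. The sketch below uses cosine similarity on toy vectors; the actual distance metric and index structure used by Chroma depend on its configuration, and `cosine` and `top_k` are hypothetical helpers.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=5):
    """Return (index, score) pairs for the k vectors most similar to query_vec."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


toy_vectors = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]  # made-up 2-D "embeddings"
print(top_k([1.0, 0.1], toy_vectors, k=2))
```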

### Usage example

In [6]:
retrieved_docs = retriever.invoke("Describe the use of harmonised standards")

print("Retrieved document number : " + str(len(retrieved_docs)))

for doc in retrieved_docs:
    print("page " + str(doc.metadata["page"] + 1) + ":", doc.page_content[:300])

Retrieved document number : 5
page 169: of those staff, in order to ensure that personnel who carry out and perform
assessment and verification operations are competent to fulfil the tasks
required of them.
page 196: conformity assessment procedures,
— identification of applicable general safety and performance
requirements and solutions to fulfil those requirements, taking
applicable CS and, where opted for, harmonised standards or
other adequate solutions into account,
— risk management as referred to in Secti
page 147: purpose, and shall include a justification, validation and verification of the
solutions adopted to meet those requirements. The demonstration of
conformity shall include:
(a) the general safety and performance requirements that apply to the device
and an explanation as to why others do not apply;
(
page 179: — carry out the appropriate examinations and tests in order to verify that
the solutions adopted by the manufacturer meet the general safety and
performance requ

## Prompt

We define a fixed template for the prompts sent to the LLM. The template supplies the retrieved context, limits the answer length, asks for page references, and instructs the model to say that it does not know rather than guess, which reduces the risk of hallucinations.

In [7]:
from langchain_core.prompts import PromptTemplate

template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible. Mention in which pages the answer is found.

Context: {context}

Question: {question}

Helpful Answer:"""
prompt = PromptTemplate.from_template(template)

### Usage example

In [8]:
example_messages = prompt.invoke(
    {"context": "filler context", "question": "filler question"}
).to_messages()
example_messages

print(example_messages[0].content)

Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible. Mention in which pages the answer is found.

Context: filler context

Question: filler question

Helpful Answer:


## LLM

The LLM used in this project is **Gemini 1.5 Flash**, a fast multimodal model from Google's Gemini family. Its context window of 1 million tokens allows it to process very long inputs.

In [9]:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

### Usage example

In [10]:
from IPython.display import Markdown

result = llm.invoke("What is an LLM?")

print(result.__dict__.keys())
Markdown(result.content)

dict_keys(['content', 'additional_kwargs', 'response_metadata', 'type', 'name', 'id', 'example', 'tool_calls', 'invalid_tool_calls', 'usage_metadata'])


LLM stands for **Large Language Model**. It's a type of artificial intelligence (AI) that excels at understanding and generating human-like text. 

Here's a breakdown:

**What is it?**

* **Deep learning model:** An LLM is a type of neural network, a complex mathematical structure inspired by the human brain. It's trained on massive datasets of text and code.
* **Text-based:** LLMs specialize in processing and generating textual information.
* **Generative:** They can create new text content, not just analyze existing text.

**How does it work?**

* **Training:** LLMs are trained on vast amounts of text data, learning patterns, relationships, and nuances of language. This allows them to understand context, grammar, and meaning.
* **Processing:** When you input text, the LLM analyzes the input and predicts the most likely next word or phrase based on its training data.
* **Output:** The LLM generates coherent and contextually relevant text, often mimicking human writing style.

**Examples of LLMs:**

* **GPT-3 (Generative Pre-trained Transformer 3):** Developed by OpenAI, known for its ability to write different kinds of creative content, translate languages, and answer your questions in an informative way.
* **LaMDA (Language Model for Dialogue Applications):** Developed by Google, designed for conversational AI, capable of engaging in open-ended, natural-sounding dialogue.
* **BERT (Bidirectional Encoder Representations from Transformers):** Developed by Google, excels at understanding the meaning of words in context, enabling it to perform various tasks like question answering and sentiment analysis.

**Applications of LLMs:**

* **Chatbots and virtual assistants:**  Providing engaging and informative conversations.
* **Content creation:** Generating articles, stories, poems, scripts, and more.
* **Translation:** Translating text between languages accurately and naturally.
* **Code generation:** Writing and debugging code in various programming languages.
* **Summarization:** Condensing large amounts of text into concise summaries.
* **Question answering:** Providing answers to questions based on a given text.

**Key takeaway:** LLMs are powerful AI models that are revolutionizing how we interact with text and data. They offer a wide range of applications, from creating engaging content to automating tasks.


## RAG chain

Putting it all together, we can now define a RAG chain that takes a question, retrieves relevant documents, constructs a prompt, passes it into a model, and parses the output.

In [11]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    formatted_docs = []
    for doc in docs:
        page_number = doc.metadata["page"] + 1 
        content_with_page = f"Page {page_number}:\n{doc.page_content}"
        formatted_docs.append(content_with_page)
    return "\n\n".join(formatted_docs)

In [12]:
rag_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough(),
    }
    | prompt
    | llm
    | StrOutputParser()
)
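
The same data flow can be written as plain function calls, which may help when reading the pipe syntax. The retriever and model below are stand-ins (`fake_retrieve`, `fake_llm`) invented for illustration; they are not LangChain objects and the chunk text is a placeholder.

```python
def fake_retrieve(question: str) -> list[dict]:
    # Stand-in for the retriever: returns chunks with zero-based page metadata.
    return [{"page": 15, "text": "example chunk text"}]


def format_chunks(chunks: list[dict]) -> str:
    # Mirrors format_docs: prefix each chunk with its one-based page number.
    return "\n\n".join(f"Page {c['page'] + 1}:\n{c['text']}" for c in chunks)


def fake_llm(prompt_text: str) -> str:
    # Stand-in for the model: reports the prompt size instead of generating an answer.
    return f"(model received {len(prompt_text)} characters)"


def answer(question: str) -> str:
    context = format_chunks(fake_retrieve(question))  # "context": retriever | format_docs
    prompt_text = f"Context: {context}\n\nQuestion: {question}\n\nHelpful Answer:"  # | prompt
    return fake_llm(prompt_text)  # | llm | StrOutputParser()


print(answer("What are harmonised standards?"))
```

The dictionary at the head of the real chain plays the role of the first two lines of `answer`: it sends the same input string to the retriever branch and, unchanged, to the `question` slot.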

### Usage example

In [13]:
query = "What is the medical devices regulation?"
Markdown(rag_chain.invoke(query))

The Medical Devices Regulation (MDR) is a European Union regulation that establishes requirements for the safety and performance of medical devices. It is found in Regulation (EU) 2017/746. The MDR aims to ensure that medical devices placed on the market in the EU meet high standards of safety and performance, and to protect public health. 


## Evaluation metrics and comparison

### Tuning parameters

In [14]:
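temperatures = [0.1, 0.5, 1.0]
top_ps = [0.1, 0.5, 0.9]
query = "What is harmonised standards?"

results = "| Temperature | Top P | Response |\n" + "|-------------|-------|----------|\n"

for temperature in temperatures:
    for top_p in top_ps:
        # Rebuild the chain around the tuned model: rag_chain above is bound to the
        # original llm object, so reassigning llm alone would not change its output.
        tuned_llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=temperature, top_p=top_p)
        tuned_chain = (
            {"context": retriever | format_docs, "question": RunnablePassthrough()}
            | prompt
            | tuned_llm
            | StrOutputParser()
        )
        response = tuned_chain.invoke(query)

        # Collapse whitespace so each response stays on a single table row
        results += f"| {temperature} | {top_p} | {' '.join(response.split())} |\n"

Markdown(results)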

| Temperature | Top P | Response |
|-------------|-------|----------|
| 0.1 | 0.1  | Harmonized standards are standards that have been published in the Official Journal of the European Union and are presumed to be in conformity with the requirements of the Regulation. This is found on page 16. 
| 0.1 | 0.5  | Harmonized standards are standards that have been published in the Official Journal of the European Union. These standards are presumed to be in conformity with the requirements of the Regulation. This information is found on page 16. 
| 0.1 | 0.9  | Harmonised standards are standards that have been published in the Official Journal of the European Union and are presumed to be in conformity with the requirements of the Regulation. This information can be found on page 16. 
| 0.5 | 0.1  | Harmonised standards are standards published in the Official Journal of the European Union that are presumed to be in conformity with the requirements of the Regulation. These standards cover system or process requirements for economic operators or sponsors, including quality management systems, risk management, and post-market surveillance systems.  This information is found on page 16 of the document. 
| 0.5 | 0.5  | Harmonised standards are standards that have been published in the Official Journal of the European Union and are presumed to be in conformity with the requirements of the Regulation. This is mentioned on page 16. 
| 0.5 | 0.9  | Harmonised standards are standards that have been published in the Official Journal of the European Union and are presumed to be in conformity with the requirements of the Regulation. These standards can be used to demonstrate conformity with general safety and performance requirements. This information is found on page 16. 
| 1.0 | 0.1  | Harmonised standards are standards that have been published in the Official Journal of the European Union. They are presumed to be in conformity with the requirements of the regulation. This information is found on page 16. 
| 1.0 | 0.5  | Harmonised standards are standards that have been published in the Official Journal of the European Union and are presumed to be in conformity with the requirements of the Regulation. This information is found on page 16. 
| 1.0 | 0.9  | Harmonised standards are standards that have been published in the Official Journal of the European Union. These standards are presumed to be in conformity with the requirements of the Regulation. (Page 16) 
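
For orientation, temperature rescales the model's token scores before sampling, and top-p (nucleus sampling) keeps only the most likely tokens whose cumulative probability reaches `p`. The sketch below shows the common textbook formulation on made-up scores; how Gemini applies these settings internally is not covered by this notebook, so treat it as an assumption.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus(probs: list[float], top_p: float) -> list[int]:
    """Return indices of the smallest high-probability set whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return kept


toy_logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.1, 1.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs], nucleus(probs, top_p=0.9))
```

With a low temperature almost all probability mass sits on one token, so the top-p cutoff has little left to remove.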


### Gemini vs GPT2

In [15]:
from langchain_huggingface.llms import HuggingFacePipeline
from transformers import pipeline

generator = pipeline('text-generation', model='gpt2', max_length=1000, pad_token_id=50256, return_full_text=False)
gpt2 = HuggingFacePipeline(pipeline=generator)

In [16]:
rag_chain_gpt2 = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough(),
    }
    | prompt
    | gpt2
    | StrOutputParser()
)

query = "What is the medical devices regulation?"
Markdown(rag_chain_gpt2.invoke(query))

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.




The technical and legal definition provided by paragraph (e) of Article 2 shall be found in the 'Technical and Legal' annex

of this Regulation or in the technical and legal annex to this Regulation.

The following questions, which in themselves form an attempt to answer all questions raised here:

Question 1:

Who can request or receive a sample for testing purposes while holding a

medical device for testing purposes?

What constitutes a sample

of the device's manufacturer's specifications?

Question 2:

Who can determine the quality from which

the device is tested? What will

the quality test be in fact?

Question 3:

Do not use to obtain or administer a device by

in vitro testing or by a drug test that requires the

treatment with any medicine whatsoever, even if to achieve

this requirement, if it is not possible to test the device

and if, by reason of this failure to comply with this restriction and the

concern in the preceding paragraphs,

It becomes necessary to obtain

and administer samples

### Prompt tuning

In [17]:
template = """
{context}

Question: {question}

Helpful Answer:"""
simplified_prompt = PromptTemplate.from_template(template)

In [18]:
query = "What is a LLM?"
rag_chain_prompt_tuning = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough(),
    }
    | simplified_prompt
    | llm
    | StrOutputParser()
)
Markdown(rag_chain_prompt_tuning.invoke(query))

The provided text snippets don't contain information about LLMs (Large Language Models).  LLMs are a type of artificial intelligence that are trained on massive datasets of text and code.  They are capable of generating human-like text, translating languages, writing different kinds of creative content, and answering your questions in an informative way.

The text snippets you provided seem to be from various EU regulations and legal documents focusing on topics like:

* **Competency requirements for personnel performing assessments and verifications.** (Page 169)
* **Data protection measures.** (Page 81)
* **Risk assessment for medical devices.** (Page 178)
* **EU regulations related to medical devices.** (Page 1)
* **Communication requirements for companies.** (Page 15)

These topics are not directly related to LLMs. To find information about LLMs, you might need to look for resources that specifically discuss artificial intelligence, machine learning, or natural language processing. 


## Chat history

In [19]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnableLambda
from langchain.memory import ChatMessageHistory

chat_history = ChatMessageHistory()

system_template = """
You are a Q&A chatbot that helps to answer the user's questions about a given document. Always follow these rules to answer the question:

Use the following pieces of context to answer the questions.
If the question is not related to the context, just say it is not related.
If you don't know the answer to any of the questions, just say that you don't know, don't try to make up an answer.
Always mention in which pages the information you give are found.

<context>
{context}
</context>
"""

question_answering_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            system_template,
        ),
        MessagesPlaceholder(variable_name="chat_history"),
    ]
)

question_runnable = RunnableLambda(lambda input: input["question"])
chat_history_runnable = RunnableLambda(lambda input: input["chat_history"])


rag_chain = (
    {
        "context": question_runnable | retriever | format_docs,
        "question": question_runnable,
        "chat_history": chat_history_runnable,
    }
    | question_answering_prompt
    | llm
    | StrOutputParser()
)

In [20]:
def QuestionAnswerLoop():
    print("Enter your question (type 'quit' to exit): ")
    while True:
        user_input = input("Enter your question (type 'quit' to exit): ")
        if user_input.lower() == 'quit':
            print("Exiting Q&A chat. Goodbye!")
            break
        else:
            chat_history.add_user_message(user_input)
            response = rag_chain.invoke(
                {
                    "question": user_input, 
                    "chat_history": chat_history.messages
                }
            )

            # Add the AI's response to the chat history
            chat_history.add_ai_message(response)

            # Print the response
            print("Question: " + user_input)
            print("Answer: " + response)


In [21]:
QuestionAnswerLoop()

Enter your question (type 'quit' to exit): 
Question: What is the medical devices regulation?
Answer: The Medical Devices Regulation is a regulation that defines the requirements for medical devices and their use in the EU. It is intended to ensure that medical devices are safe and effective, and that they are used correctly. This information is found on page 5. 

Exiting Q&A chat. Goodbye!


## Basic RAG limitations

A basic RAG architecture retrieves chunks from the database by their similarity to the user's question. This breaks down when the question refers to the structure of the document or to the conversation itself rather than to the content, e.g., "What is in the first page?" or "Can you further explain the first question?".

To address this, we add an initial step that passes the question and the chat history to the LLM, which rewrites them into the standalone question that is then searched in the database.

In [22]:
from langchain.retrievers import SelfQueryRetriever
from langchain.chains.query_constructor.schema import AttributeInfo
from langchain.callbacks.base import BaseCallbackHandler
from langchain.schema import LLMResult

class PrintLLMOutputCallbackHandler(BaseCallbackHandler):   # This is used so we can see what's going on behind the scenes
    def on_llm_end(self, response: LLMResult, **kwargs):
        print("LLM Generated:", response.generations[0][0].text)

llm_with_logging = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash", callbacks=[PrintLLMOutputCallbackHandler()]
)

metadata_field_info = [ # Setup the metadata that the LLM will be able to filter by
    AttributeInfo(
        name="page",
        type="int",
        description="The page number of the document",
    )
]

document_content_description = "Medical devices regulation"

query_with_history_template = """
You are an AI assistant that helps a user query a document.
The user makes some questions and you create the queries to find the parts of document that are most relevant to the questions.
The search will be performed by similarity so you need to provide a query similar to the contents that are in the document.
You do not have the context of the document. You will be making the queries based on the questions and history to get the context from the document.
This is the chat history of the conversation until now:
<history>
{chat_history}
</history>

Take into account the context of the chat history when preparing the query for the following question:
Prepare a query for this question. Output only the query as a natural language question.
Question: {question}"""

retriever_prompt_template = PromptTemplate.from_template(query_with_history_template)

improved_retriever = SelfQueryRetriever.from_llm(
    llm_with_logging,
    vectorstore,
    document_content_description,
    metadata_field_info,
    verbose=True,
)

# Ask the LLM to generate a question based on the chat history and the user question
# And ask the LLM to perform a search (can filter) based on the question generated
improved_retriever_chain = retriever_prompt_template | llm_with_logging | StrOutputParser()

# Normal QA RAG chain with the documents from the retrieval
improved_rag_chain = question_answering_prompt | llm_with_logging | StrOutputParser()


In [23]:
def formatted_chat_history():
    # Format the chat history to be used in the initial prompt (that will generate the database question)
    result = ""
    question_id = 1
    for message in chat_history.messages:
        if message.type == "human":
            result += f"{question_id}. Human: {message.content}\n"
            question_id += 1
        else:
            result += f"AI: {message.content}\n"
    return result

def invoke_improved_rag_chain(user_input):
    llm_db_question = improved_retriever_chain.invoke({"question": user_input, "chat_history": formatted_chat_history()})
    docs = improved_retriever.invoke(llm_db_question)
    return improved_rag_chain.invoke(
        {
            "context": docs,
            "question": llm_db_question,
            "chat_history": chat_history.messages,
        }
    )


In [24]:
queries = [
    "What is in the first page of the document?",
    "Can you further explain the first question?",
]

In [25]:
chat_history = ChatMessageHistory() # reset the chat history

print("Normal RAG Chain\n\n")
for user_input in queries:
    chat_history.add_user_message(user_input)
    response = rag_chain.invoke(
        {"question": user_input, "chat_history": chat_history.messages}
    )
    chat_history.add_ai_message(response)
    print("Question: " + user_input)
    print("Answer: " + response)

Normal RAG Chain


Question: What is in the first page of the document?
Answer: I'm sorry, but the context provided doesn't contain information about the first page of the document. 

Question: Can you further explain the first question?
Answer: You asked: "What is in the first page of the document?"

I understand you are asking for the content of the first page of the document I am using as context.  However, the context I have only provides information from specific pages (230, 231, 232, and 169).  There is no information about the first page. 



Each query produces three `LLM Generated` outputs:
- The first is the standalone question to pass to the database, rewritten so that it incorporates the chat history.
- The second is the structured query (search text plus an optional metadata filter) that the LLM generates for the database.
- The third is the final answer from the LLM to the user.
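
The order of the three calls can be traced with a small stand-alone sketch. Every function here is a hypothetical stub that only records which step ran; none of them calls LangChain or Gemini.

```python
calls = []


def rewrite_question(question: str, history: list[str]) -> str:
    # Call 1: fold the chat history into a standalone search question.
    calls.append("rewrite")
    return f"{question} (history turns: {len(history)})"


def build_structured_query(standalone: str) -> dict:
    # Call 2: turn the question into search text plus an optional metadata filter.
    calls.append("structured_query")
    return {"query": standalone, "filter": None}


def answer_from_context(standalone: str, context: list[str]) -> str:
    # Call 3: generate the final answer from the retrieved context.
    calls.append("answer")
    return f"answer based on {len(context)} chunk(s)"


def improved_flow(question: str, history: list[str]) -> str:
    standalone = rewrite_question(question, history)
    structured = build_structured_query(standalone)
    context = [f"chunk matching: {structured['query']}"]  # stand-in for the vector store lookup
    return answer_from_context(standalone, context)


print(improved_flow("Can you further explain the first question?", ["What is in the first page?"]))
print(calls)
```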

In [26]:
chat_history = ChatMessageHistory()  # reset the chat history

print("Improved RAG Chain")
for user_input in queries:
    print("\n\n---")
    print("Question: " + user_input)
    chat_history.add_user_message(user_input)
    response = invoke_improved_rag_chain(user_input)
    chat_history.add_ai_message(response)


Improved RAG Chain


---
Question: What is in the first page of the document?
LLM Generated: What is the content of the first page of the document? 

LLM Generated: ```json
{
    "query": "Medical devices regulation",
    "filter": "eq(\"page\", 1)"
}
```
LLM Generated: The first page of the document contains the title of the document "REGULATION (EU) 2017/745 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL" and the subject matter and scope of the regulation. It also includes the definition of "medical devices" and "accessories for medical devices". This information is found on page 1 of the document. 



---
Question: Can you further explain the first question?
LLM Generated: What is the content of the first page of the document "REGULATION (EU) 2017/745 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL"? 

LLM Generated: ```json
{
    "query": "REGULATION (EU) 2017/745 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL",
    "filter": "eq(\"page\", 1)"
}
```
LLM Generated: The first page of the

### Conclusions
This architecture does not succeed on every question, but it is a clear improvement over the basic RAG architecture.

It should be able to answer every question that the basic RAG architecture can answer, as well as some that the basic architecture cannot, such as questions that refer to a specific page or to an earlier turn of the conversation.