# Building RAG Chatbots for Technical Documentation

## Table of contents

- [Introduction](#introduction)
- [Environment Setup](#environment-setup)
- [Embeddings](#embeddings)
- [Indexing](#indexing)
- [Retriever](#retriever)
- [Prompt](#prompt)
- [LLM](#llm)
- [RAG Chain](#rag-chain)
- [Evaluation metrics and comparison](#evaluation-metrics-and-comparison)
- [Chat history](#chat-history)
- [Basic RAG limitations](#basic-rag-limitations)

## Introduction 

This project implements retrieval-augmented generation (RAG) with *LangChain* to create a chatbot that answers questions about technical documentation. The document chosen for this assignment is **The European Union Medical Device Regulation - Regulation (EU) 2017/745 (EU MDR)**.

## Environment Setup

Install the packages and dependencies to be used:

In [167]:
# Install required libraries
%pip install -qU langchain langchain-community langchain-chroma langchain-text-splitters unstructured sentence_transformers langchain-huggingface huggingface_hub pdfplumber langchain-google-genai python-dotenv lark

Note: you may need to restart the kernel to use updated packages.


Because the notebook calls Google's generative AI models, store the ``GOOGLE_API_KEY`` in a ``.env`` file (kept out of version control); ``load_dotenv()`` then loads it into the environment.

In [168]:
from dotenv import load_dotenv

load_dotenv()

True
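
A missing key otherwise only surfaces later, when the first model call fails. As a minimal sketch, a small helper can check for it right after `load_dotenv()`. The name `require_env` is hypothetical and not part of `python-dotenv` or LangChain:

```python
import os


def require_env(name, env=None):
    """Return the value of an environment variable, or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Typical use in the notebook, after load_dotenv():
# require_env("GOOGLE_API_KEY")
```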

## Embeddings

We start by connecting to Google's generative AI embeddings model. Google's **text-embedding-004** model generates the embeddings, with the task_type set to *retrieval_document* to optimize them for retrieval tasks.

In [169]:
from langchain_chroma import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004", task_type="retrieval_document")

## Indexing

In the indexing stage, we load the PDF document and split it into manageable chunks. To avoid re-indexing the document on every run, we persist the vector store locally in a folder named `db` and reload it whenever that folder already exists.

In [170]:
import os
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PDFPlumberLoader

if os.path.exists("db"): 
    vectorstore = Chroma(persist_directory="db", embedding_function=embeddings)
else:
    loader = PDFPlumberLoader("document.pdf")
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        add_start_index=True,
        separators=["\n\n", "\n", " ", ""],
    )
    pages = loader.load_and_split(text_splitter)
    vectorstore = Chroma.from_documents(
        documents=pages, embedding=embeddings, persist_directory="db"
    )
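
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch in plain Python. It uses fixed-size windows only; `RecursiveCharacterTextSplitter` additionally tries to break on the configured separators, which this sketch does not attempt. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({"start_index": start, "text": text[start:start + chunk_size]})
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# A 2500-character sample yields three windows starting at 0, 800 and 1600
sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample)
for chunk in chunks:
    print(chunk["start_index"], len(chunk["text"]))
```

Each window repeats the last 200 characters of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk.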

## Retriever

From the vector store, a retriever is created, configured to perform similarity searches and return the top 5 most relevant results:


In [171]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 5})
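
Conceptually, a similarity search ranks the stored chunk embeddings against the query embedding and keeps the `k` best. The sketch below illustrates this with toy two-dimensional vectors and cosine similarity. Both are assumptions for illustration: the distance metric and index that Chroma actually uses are not shown here, and the helper names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=5):
    """Return the ids of the k vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in indexed.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best score first
    return [doc_id for _, doc_id in scored[:k]]


# Toy index: in the notebook these vectors would come from the embeddings model
toy_index = {"chunk-a": [1.0, 0.0], "chunk-b": [0.7, 0.7], "chunk-c": [0.0, 1.0]}
print(top_k([1.0, 0.1], toy_index, k=2))
```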

### Usage example

In [172]:
retrieved_docs = retriever.invoke("Describe the use of harmonised standards")

print("Retrieved document number : " + str(len(retrieved_docs)))

for doc in retrieved_docs:
    print("page " + str(doc.metadata["page"] + 1) + ":", doc.page_content[:300])

Retrieved document number : 5
page 169: of those staff, in order to ensure that personnel who carry out and perform
assessment and verification operations are competent to fulfil the tasks
required of them.
page 196: conformity assessment procedures,
— identification of applicable general safety and performance
requirements and solutions to fulfil those requirements, taking
applicable CS and, where opted for, harmonised standards or
other adequate solutions into account,
— risk management as referred to in Secti
page 147: purpose, and shall include a justification, validation and verification of the
solutions adopted to meet those requirements. The demonstration of
conformity shall include:
(a) the general safety and performance requirements that apply to the device
and an explanation as to why others do not apply;
(
page 153: description of the conformity assessment procedure performed and identifi­
cation of the certificate or certificates issued;
9. Where applicable, additional info

## Prompt

We establish a structured format for the prompts sent to the LLM. This prompt format conveys the context while instructing the LLM to refrain from answering when it lacks confidence, thereby minimizing the risk of hallucinations.

In [173]:
from langchain_core.prompts import PromptTemplate

template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible. Mention in which pages the answer is found.

Context: {context}

Question: {question}

Helpful Answer:"""
prompt = PromptTemplate.from_template(template)

### Usage example

In [174]:
example_messages = prompt.invoke(
    {"context": "filler context", "question": "filler question"}
).to_messages()
example_messages

print(example_messages[0].content)

Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible. Mention in which pages the answer is found.

Context: filler context

Question: filler question

Helpful Answer:


## LLM

The LLM used in this project is **Gemini 1.5 Flash**, a fast multimodal model from Google's Gemini family. Its context window of 1 million tokens lets it process very long inputs.

In [175]:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

### Usage example

In [176]:
from IPython.display import Markdown

result = llm.invoke("What is an LLM?")

print(result.__dict__.keys())
Markdown(result.content)

dict_keys(['content', 'additional_kwargs', 'response_metadata', 'type', 'name', 'id', 'example', 'tool_calls', 'invalid_tool_calls', 'usage_metadata'])


LLM stands for **Large Language Model**. It's a type of artificial intelligence (AI) that excels at understanding and generating human-like text. 

Here's a breakdown:

**What it is:**

* **A complex statistical model:** LLMs are trained on massive datasets of text and code, learning patterns and relationships within the data. 
* **Predictive in nature:** They predict the next word in a sequence, based on the context of the preceding words. This allows them to generate coherent and contextually relevant text.
* **Capable of various tasks:** LLMs can perform a wide range of natural language processing (NLP) tasks, including:
    * **Text generation:** Writing stories, poems, articles, code, etc.
    * **Translation:** Translating text between languages.
    * **Summarization:** Condensing large amounts of text into concise summaries.
    * **Question answering:** Providing answers to questions based on given text.
    * **Dialogue generation:** Engaging in conversations with humans.
    * **Code generation:** Writing computer code.

**Examples:**

* **GPT-3 (Generative Pre-trained Transformer 3):** Developed by OpenAI, it's known for its impressive text generation abilities and has been used in various applications, including content creation, chatbot development, and code writing.
* **LaMDA (Language Model for Dialogue Applications):** Developed by Google, it focuses on conversational AI and is designed to engage in natural, human-like dialogues.
* **BERT (Bidirectional Encoder Representations from Transformers):** Developed by Google, it excels at understanding the meaning of words in context and is widely used in tasks like sentiment analysis and question answering.

**Key features:**

* **Scale:** LLMs are trained on massive datasets, often billions of words or more.
* **Transformers:** Many LLMs use transformer architecture, a powerful neural network architecture that has revolutionized NLP.
* **Pre-training:** LLMs are typically pre-trained on a large dataset before being fine-tuned for specific tasks.

**Impact and future:**

LLMs are rapidly evolving and have the potential to revolutionize various industries, from content creation and customer service to education and research. However, ethical concerns regarding bias, misinformation, and job displacement need to be addressed as these models become more powerful.

**In essence, LLMs are powerful AI tools that can understand and generate human-like text, opening up exciting possibilities but also raising important ethical questions.**


## RAG chain

Putting it all together, we can now define a RAG chain that takes a question, retrieves relevant documents, constructs a prompt, passes it into a model, and parses the output.

In [177]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    formatted_docs = []
    for doc in docs:
        page_number = doc.metadata["page"] + 1 
        content_with_page = f"Page {page_number}:\n{doc.page_content}"
        formatted_docs.append(content_with_page)
    return "\n\n".join(formatted_docs)

In [178]:
rag_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough(),
    }
    | prompt
    | llm
    | StrOutputParser()
)

### Usage example

In [179]:
query = "What is the medical devices regulation?"
Markdown(rag_chain.invoke(query))

The Medical Devices Regulation (Regulation (EU) 2017/746) is a regulation of the European Union that governs the placing on the market and putting into service of medical devices. This regulation defines a medical device as any instrument, apparatus, appliance, software, implant, reagent, material or other article intended by the manufacturer to be used, alone or in combination, for human beings for a specific medical purpose. This definition is found on page 5 of the document. 
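
To show how data flows through the chain, the sketch below replays the same steps with plain functions. Every function is a hypothetical stand-in (no retrieval or model call happens), and the chunk texts are placeholders rather than content from the regulation.

```python
def fake_retriever(question):
    """Stand-in for the retriever: returns chunks with a 0-based page index."""
    return [{"page": 0, "text": "first chunk"}, {"page": 4, "text": "second chunk"}]


def format_chunks(chunks):
    """Mirror of format_docs: label each chunk with its 1-based page number."""
    return "\n\n".join(f"Page {c['page'] + 1}:\n{c['text']}" for c in chunks)


def build_prompt(inputs):
    """Fill a shortened version of the prompt template."""
    return f"Context: {inputs['context']}\n\nQuestion: {inputs['question']}\n\nHelpful Answer:"


def fake_llm(prompt_text):
    """Stand-in for the model: returns a message-like dict."""
    return {"content": f"(model reply to a {len(prompt_text)}-character prompt)"}


def parse_output(message):
    """Stand-in for StrOutputParser: keep only the text of the reply."""
    return message["content"]


def run_chain(question):
    # The dict step sends the question down two branches: one retrieves and
    # formats the context, the other passes the question through unchanged.
    inputs = {"context": format_chunks(fake_retriever(question)), "question": question}
    return parse_output(fake_llm(build_prompt(inputs)))


print(run_chain("What is the medical devices regulation?"))
```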


## Evaluation metrics and comparison

### Tuning parameters

In [180]:
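temperatures = [0.1, 0.5, 1.0]
top_ps = [0.1, 0.5, 0.9]
query = "What is harmonised standards?"

results = "| Temperature | Top P | Response |\n" + "|-------------|-------|----------|\n"

for temperature in temperatures:
    for top_p in top_ps:
        # Use a separate name so the default `llm` is not overwritten
        tuned_llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=temperature, top_p=top_p)

        # Rebuild the chain: `rag_chain` keeps a reference to the original `llm`,
        # so reassigning the variable alone would not change the sampling parameters
        tuned_chain = (
            {"context": retriever | format_docs, "question": RunnablePassthrough()}
            | prompt
            | tuned_llm
            | StrOutputParser()
        )
        response = tuned_chain.invoke(query)

        # One table row per parameter combination
        results += f"| {temperature} | {top_p} | {response.strip()} |\n"

Markdown(results)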

| Temperature | Top P | Response |
|-------------|-------|----------|
| 0.1 | 0.1  | Harmonized standards are a set of standards that are applied to demonstrate conformity with general safety and performance requirements. They are mentioned in the context of demonstrating conformity with the general safety and performance requirements of a device. This information is found on page 147. 
| 0.1 | 0.5  | Harmonized standards are standards that are applied to demonstrate conformity with general safety and performance requirements. This is mentioned on page 147, under point (c) of the demonstration of conformity. These standards are used alongside other solutions like CS or other methods to ensure compliance. 
| 0.1 | 0.9  | Harmonized standards are specific standards used to demonstrate conformity with general safety and performance requirements. They are mentioned alongside other solutions like Conformity Assessment Schemes (CS) on Page 147. The demonstration of conformity must include the precise identity of the controlled documents offering evidence of conformity with each harmonized standard. 
| 0.5 | 0.1  | Harmonized standards are standards that are used to demonstrate conformity with general safety and performance requirements. They are mentioned in the context of demonstrating conformity with the general safety and performance requirements of a device. This information is found on page 147. 
| 0.5 | 0.5  | Harmonised standards are a set of standards that are used to demonstrate conformity with general safety and performance requirements. These standards are referenced in the technical documentation to offer evidence of conformity. This information can be found on page 147. 
| 0.5 | 0.9  | Harmonized standards are standards that are applied to demonstrate conformity with general safety and performance requirements. This information is found on Page 147, specifically in point (c) of the "demonstration of conformity" section. 
| 1.0 | 0.1  | Harmonized standards are a set of standards that manufacturers can apply to demonstrate conformity with general safety and performance requirements. This information is found on page 147, specifically in section (c).  Harmonized standards are one of the solutions used to demonstrate conformity, along with other methods like CS or other solutions. 
| 1.0 | 0.5  | Harmonised standards are standards that are applied to demonstrate conformity with general safety and performance requirements. This information is found on page 147, specifically in point (c) of the demonstration of conformity. These standards are used to ensure the safety and performance of medical devices throughout their lifetime. 
| 1.0 | 0.9  | Harmonised standards are standards that are applied to demonstrate conformity with the general safety and performance requirements of a device. They are mentioned in the context of the demonstration of conformity, which includes the method or methods used to demonstrate conformity with each applicable general safety and performance requirement. This information is found on page 147 of the document. 
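
For intuition about the two parameters, the sketch below applies the standard textbook definitions of temperature scaling and top-p (nucleus) filtering to a toy next-token distribution. It is an illustration under those assumptions only: how Gemini implements sampling internally is not documented in this notebook, and the function names are hypothetical.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits divided by the temperature (lower = more peaked)."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def nucleus(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p, renormalised."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}


toy_logits = [2.0, 1.0, 0.0]  # three candidate tokens
for temperature in (0.1, 1.0):
    probs = apply_temperature(toy_logits, temperature)
    for top_p in (0.1, 0.9):
        print(temperature, top_p, sorted(nucleus(probs, top_p)))
```

With a low temperature or a low top-p, almost all probability mass sits on the most likely token, so sampling becomes close to deterministic.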


### Gemini vs GPT-2

In [181]:
from langchain_huggingface.llms import HuggingFacePipeline
from transformers import pipeline

generator = pipeline('text-generation', model='gpt2', max_length=1000, pad_token_id=50256, return_full_text=False)
gpt2 = HuggingFacePipeline(pipeline=generator)

In [182]:
rag_chain_gpt2 = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough(),
    }
    | prompt
    | gpt2
    | StrOutputParser()
)

query = "What is the medical devices regulation?"
Markdown(rag_chain_gpt2.invoke(query))

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.




This Regulation provides for prescribing devices with a prescribed medical and

medical practice, particularly for an accident or serious injury. The following are

1) the technical conditions which are intended to apply after approval from the Technical (Parties)

(Chapter 1) Commission.

. 2) the conditions and the procedures that will in time be prescribed before receiving approval from the Commission. In

concealed agreements in force between technical bodies concerning any matters relating

to human life‑related health. These agreements may have

related provisions covering the protection of persons at the front and

back when working on their device. The legal and

legal consequences of taking that action, with respect to personal

health, including the risk it might, would

have had to a person when they became sick or injured.

3) the conditions by which the regulations apply to device manufacturers,

especially in the case of medical devices. It shall be

unlawful (Article 6 of Regulation (EU) 2016/39/EC) to have any

device device manufacturer or any other organisation

### Prompt tuning

In [183]:
template = """
{context}

Question: {question}

Helpful Answer:"""
prompt = PromptTemplate.from_template(template)

In [184]:
query = "What is a LLM?"
rag_chain_prompt_tuning = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough(),
    }
    | prompt
    | llm
    | StrOutputParser()
)
Markdown(rag_chain_prompt_tuning.invoke(query))

The provided text snippets don't contain information about what an LLM is.  

LLM stands for **Large Language Model**.  It's a type of artificial intelligence that is trained on massive amounts of text data to understand and generate human-like text. 

The provided text is from various EU regulations and directives, focusing on things like:

* **Competency requirements for staff:** Ensuring personnel involved in assessments and verifications are qualified. 
* **Data protection:** Implementing measures to protect information and personal data.
* **Communication requirements:**  Regulations regarding communication with individuals and entities within the EU.
* **References to other regulations:**  Listing specific articles from other directives and regulations. 

These topics are relevant to various fields, but they don't provide any definition or information about LLMs. 


## Chat history

In [185]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnableLambda
from langchain_community.chat_message_histories import ChatMessageHistory

chat_history = ChatMessageHistory()

system_template = """
You are a Q&A chatbot that helps to answer the user's questions about a given document. Always follow these rules to answer the question:

Use the following pieces of context to answer the questions.
If the question is not related to the context, just say it is not related.
If you don't know the answer to any of the questions, just say that you don't know, don't try to make up an answer.
Always mention in which pages the information you give are found.

<context>
{context}
</context>
"""

question_answering_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            system_template,
        ),
        MessagesPlaceholder(variable_name="chat_history"),
    ]
)

question_runnable = RunnableLambda(lambda input: input["question"])
chat_history_runnable = RunnableLambda(lambda input: input["chat_history"])


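# Use the chat prompt defined above so the system rules and the chat history reach the model
rag_chain = (
    {
        "context": question_runnable | retriever | format_docs,
        "question": question_runnable,
        "chat_history": chat_history_runnable,
    }
    | question_answering_prompt
    | llm
    | StrOutputParser()
)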

In [186]:
def QuestionAnswerLoop():
    print("Enter your question (type 'quit' to exit): ")
    while True:
        user_input = input("Enter your question (type 'quit' to exit): ")
        if user_input.lower() == 'quit':
            print("Exiting Q&A chat. Goodbye!")
            break
        else:
            chat_history.add_user_message(user_input)
            response = rag_chain.invoke(
                {
                    "question": user_input, 
                    "chat_history": chat_history.messages
                }
            )

            # Add the AI's response to the chat history
            chat_history.add_ai_message(response)

            # Print the response
            print("Question: " + user_input)
            print("Answer: " + response)


In [187]:
QuestionAnswerLoop()

Enter your question (type 'quit' to exit): 
Question: What is the medical devices regulation?
Answer: The provided text is an excerpt from the **Medical Devices Regulation (EU) 2017/745**. 

This regulation sets out the rules for the **safety and performance** of medical devices sold in the European Union. It aims to:

* **Ensure high standards of safety and performance** for medical devices.
* **Harmonize regulations** across all EU countries.
* **Improve transparency** and traceability of medical devices.
* **Increase patient safety** by providing better information and oversight.

The regulation covers a wide range of devices, including:

* Instruments
* Apparatus
* Appliances
* Software
* Implants
* Reagents
* Materials
* Other articles

The text excerpts highlight key aspects of the regulation, such as:

* **Scope:** The regulation does not affect national laws regarding healthcare provision, but it sets rules for medical devices themselves. 
* **Definitions:** The definition of a

## Basic RAG limitations

A basic RAG architecture fetches documents from the database based on their similarity to the user's question. However, the question may not resemble the document's content at all, e.g., "What is in the first page?" or "Can you further explain the first question?".

To solve this, we can pass the question and the chat history to the LLM in an initial step to create the question that will be searched in the database.
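
The two-step idea can be sketched independently of any library. In the sketch below, `rewrite_llm`, `retrieve` and `answer_llm` are hypothetical callables standing in for the model and the retriever, and the rewrite prompt is a shortened illustration rather than the template used in this notebook.

```python
def render_history(messages):
    """messages: list of (role, text) tuples, where role is 'human' or 'ai'."""
    return "\n".join(f"{role}: {text}" for role, text in messages)


def build_rewrite_prompt(question, messages):
    """Ask the model for a question that can be understood without the history."""
    return (
        "Rewrite the question so it can be understood without the chat history.\n"
        f"<history>\n{render_history(messages)}\n</history>\n"
        f"Question: {question}"
    )


def answer_with_rewrite(question, messages, rewrite_llm, retrieve, answer_llm):
    # Step 1: turn the follow-up into a standalone question
    standalone = rewrite_llm(build_rewrite_prompt(question, messages))
    # Step 2: search the database with the standalone question, not the raw input
    context = retrieve(standalone)
    # Step 3: answer from the retrieved context
    return answer_llm(context, standalone)


# Stub callables make the data flow visible without calling a real model
history = [("human", "What is in the first page?"), ("ai", "The title and scope.")]
result = answer_with_rewrite(
    "Can you further explain the first question?",
    history,
    rewrite_llm=lambda prompt_text: "STANDALONE QUESTION",
    retrieve=lambda query: f"context for {query}",
    answer_llm=lambda context, query: (context, query),
)
print(result)
```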

In [188]:
from langchain.retrievers import SelfQueryRetriever
from langchain.chains.query_constructor.schema import AttributeInfo
from langchain.callbacks.base import BaseCallbackHandler
from langchain.schema import LLMResult

class PrintLLMOutputCallbackHandler(BaseCallbackHandler):   # This is used so we can see what's going on behind the scenes
    def on_llm_end(self, response: LLMResult, **kwargs):
        print("LLM Generated:", response.generations[0][0].text)

llm_with_logging = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash", callbacks=[PrintLLMOutputCallbackHandler()]
)

metadata_field_info = [ # Setup the metadata that the LLM will be able to filter by
    AttributeInfo(
        name="page",
        type="int",
        description="The page number of the document",
    )
]

document_content_description = "Medical devices regulation"

query_with_history_template = """
You are an AI assistant that helps a user query a document.
The user makes some questions and you create the queries to find the parts of document that are most relevant to the questions.
The search will be performed by similarity so you need to provide a query similar to the contents that are in the document.
You do not have the context of the document. You will be making the queries based on the questions and history to get the context from the document.
This is the chat history of the conversation until now:
<history>
{chat_history}
</history>

Take into account the context of the chat history when preparing the query for the following question:
Prepare a query for this question. Output only the query as a natural language question.
Question: {question}"""

retriever_prompt_template = PromptTemplate.from_template(query_with_history_template)

improved_retriever = SelfQueryRetriever.from_llm(
    llm_with_logging,
    vectorstore,
    document_content_description,
    metadata_field_info,
    verbose=True,
)

# Ask the LLM to generate a question based on the chat history and the user question
# And ask the LLM to perform a search (can filter) based on the question generated
improved_retriever_chain = retriever_prompt_template | llm_with_logging | StrOutputParser()

# Normal QA RAG chain with the documents from the retrieval
improved_rag_chain = question_answering_prompt | llm_with_logging | StrOutputParser()


In [189]:
def formatted_chat_history():
    """Format the chat history for the initial prompt that generates the database question."""
    result = ""
    question_id = 1
    for message in chat_history.messages:
        if message.type == "human":
            result += f"{question_id}. Human: {message.content}\n"
            question_id += 1
        else:
            result += f"AI: {message.content}\n"
    return result

def invoke_improved_rag_chain(user_input):
    # Step 1: rewrite the user question into a standalone question using the chat history
    llm_db_question = improved_retriever_chain.invoke({"question": user_input, "chat_history": formatted_chat_history()})
    # Step 2: let the self-query retriever build a (possibly filtered) search from that question
    docs = improved_retriever.invoke(llm_db_question)
    # Step 3: answer the question from the retrieved documents
    return improved_rag_chain.invoke(
        {
            "context": docs,
            "question": llm_db_question,
            "chat_history": chat_history.messages,
        }
    )


In [190]:
queries = [
    "What is in the first page of the document?",
    "Can you further explain the first question?",
]

In [191]:
chat_history = ChatMessageHistory() # reset the chat history

print("Normal RAG Chain\n\n")
for user_input in queries:
    chat_history.add_user_message(user_input)
    response = rag_chain.invoke(
        {"question": user_input, "chat_history": chat_history.messages}
    )
    chat_history.add_ai_message(response)
    print("Question: " + user_input)
    print("Answer: " + response)

Normal RAG Chain


Question: What is in the first page of the document?
Answer: Unfortunately, I can't tell you what's on the first page of the document. You've provided me with snippets from different pages, but not the actual first page. 

To find out what's on the first page, you'll need to:

1. **Locate the full document:**  Do you have a PDF or physical copy of this document?
2. **Open the document:**  Find the beginning of the document, usually marked by a page number "1."

Once you have the full document, you'll be able to see the contents of the first page. 

Question: Can you further explain the first question?
Answer: Please provide me with the first question you are referring to. I need the actual question to understand what you are asking for further explanation. 

The text you provided seems to be a correlation table between articles in different regulations. Without the actual question, I cannot provide a helpful answer. 

Please share the question, and I will be happy to

Each query produces three `LLM Generated` outputs:
- The first is the standalone question, rewritten from the user's question and the chat history
- The second is the structured query (search text plus an optional metadata filter) that the `SelfQueryRetriever` sends to the database
- The third is the answer returned to the user

In [192]:
chat_history = ChatMessageHistory()  # reset the chat history

print("Improved RAG Chain\n\n")
for user_input in queries:
    print("Question: " + user_input)
    print("---")
    chat_history.add_user_message(user_input)
    response = invoke_improved_rag_chain(user_input)
    chat_history.add_ai_message(response)
    print("---")


Improved RAG Chain


Question: What is in the first page of the document?
---
LLM Generated: What is the content of the first page of the document? 

LLM Generated: ```json
{
    "query": "Medical devices regulation",
    "filter": "eq(\"page\", 1)"
}
```
LLM Generated: The first page of the document contains the title of the document, which is "REGULATION (EU) 2017/745 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL", and the subject matter and scope of the regulation, which is to lay down rules concerning the placing on the market, making available on the market or putting into service of medical devices for human use and accessories for such devices in the Union. It also applies to clinical investigations concerning such medical devices and accessories conducted in the Union. This information is found on page 1. 

---
Question: Can you further explain the first question?
---
LLM Generated: What is the purpose and scope of the REGULATION (EU) 2017/745 OF THE EUROPEAN PARLIAMENT AND OF 

### Conclusions
This architecture does not succeed on every question, but it is a clear improvement over the basic RAG architecture.

It should be able to answer every question the basic RAG architecture can answer, plus some it cannot, such as questions that refer to the chat history or to a specific page.