# Building RAG Chatbots for Technical Documentation

## Table of contents

- [Introduction](#introduction)
- [Environment Setup](#environment-setup)
- [Indexing](#indexing)
- [Retriever](#retriever)
- [Prompt](#prompt)
- [LLM](#llm)
- [RAG Chain](#rag-chain)
- [Comparisons](#evaluation-metrics-and-comparison)
- [Chat history](#chat-history)

## Introduction 

This project involves implementing a retrieval augmented generation (RAG) with *LangChain* to create a chatbot for
answering questions about technical documentation. The document chosen for this assignment was the following: **The European Union Medical Device Regulation - Regulation (EU) 2017/745 (EU MDR)**. 

## Environment Setup

Install the packages and dependencies to be used:

In [30]:
# Install required libraries
%pip install -qU langchain langchain-community langchain-chroma langchain-text-splitters unstructured sentence_transformers langchain-huggingface huggingface_hub pdfplumber langchain-google-genai ipywidgets python-dotenv lark chainlit

Note: you may need to restart the kernel to use updated packages.


As Google's generative AI model is being used, ensure that the ``GOOGLE_API_KEY`` is securely stored in the ``.env`` file.

In [1]:
from dotenv import load_dotenv

load_dotenv()

True

## Embeddings

Firstly, we start by connecting to Google's generative AI embeddings model. The **Text Embeddings 004** model from Gemini is employed for the embedding generation, with the task_type set to *retrieval_document* to optimize embeddings for retrieval tasks. 

In [2]:
from langchain_chroma import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004", task_type="retrieval_document")

## Indexing

In the indexing stage, we start by loading the PDF document and splitting it into manageable sections. To optimize execution time and improve efficiency, we store the vector store locally in a folder named "db." This allows us to quickly access previously processed data without having to re-index the document each time.

In [3]:
import os
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PDFPlumberLoader

if os.path.exists("db"):
    vectorstore = Chroma(persist_directory="db", embedding_function=embeddings)
else:
    loader = PDFPlumberLoader("document.pdf")
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        add_start_index=True,
        separators=["\n\n", "\n", " ", ""],
    )
    pages = loader.load_and_split(text_splitter)
    vectorstore = Chroma.from_documents(
        documents=pages, embedding=embeddings, persist_directory="db"
    )

## Retriever

From the vector store, a retriever is created, configured to perform similarity searches and return the top 5 most relevant results:


In [4]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 5})

### Usage example

In [5]:
retrieved_docs = retriever.invoke("Describe the use of harmonised standards")

print("Retrieved document number : " + str(len(retrieved_docs)))

for doc in retrieved_docs:
    print("page " + str(doc.metadata["page"] + 1) + ":", doc.page_content[:300])

Retrieved document number : 0


## Prompt

We establish a structured format for the prompts sent to the LLM. This prompt format conveys the context while instructing the LLM to refrain from answering when it lacks confidence, thereby minimizing the risk of hallucinations.

In [7]:
from langchain_core.prompts import PromptTemplate

template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible. Mention in which pages the answer is found.

Context: {context}

Question: {question}

Helpful Answer:"""
prompt = PromptTemplate.from_template(template)

### Usage example

In [8]:
example_messages = prompt.invoke(
    {"context": "filler context", "question": "filler question"}
).to_messages()
example_messages

print(example_messages[0].content)

Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible. Mention in which pages the answer is found.

Context: filler context

Question: filler question

Helpful Answer:


# LLM

The LLM utilized in this project is **Gemini 1.5 Flash**, recognized as Google Gemini’s fastest multimodal model. It boasts an impressive context window of 1 million tokens, allowing for comprehensive understanding and processing of extensive inputs.

In [9]:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

### Usage example

In [10]:
from IPython.display import Markdown

result = llm.invoke("What is an LLM?")

print(result.__dict__.keys())
Markdown(result.content)

dict_keys(['content', 'additional_kwargs', 'response_metadata', 'type', 'name', 'id', 'example', 'tool_calls', 'invalid_tool_calls', 'usage_metadata'])


## What is an LLM?

LLM stands for **Large Language Model**. It's a type of artificial intelligence (AI) that excels at understanding and generating human-like text. 

Here's a breakdown:

**What it is:**

* **A complex algorithm:** LLMs are trained on massive datasets of text and code. They learn patterns and relationships within this data, enabling them to understand and generate text in a way that mimics human language.
* **A powerful tool:** They can be used for various tasks, including:
    * **Text generation:** Writing stories, articles, poems, and even code.
    * **Translation:** Converting text from one language to another.
    * **Summarization:** Condensing large amounts of text into concise summaries.
    * **Question answering:** Providing answers to questions based on given information.
    * **Dialogue generation:** Creating realistic and engaging chatbot conversations.

**Key features:**

* **Vast knowledge:** Trained on massive datasets, LLMs possess a broad understanding of various topics and domains.
* **Contextual understanding:** They can analyze text in context, understanding the meaning of words and phrases based on their surroundings.
* **Generative capabilities:** LLMs can generate new, creative text, often mimicking human writing styles.

**Examples:**

* **GPT-3 (Generative Pre-trained Transformer 3):** Developed by OpenAI, GPT-3 is one of the most well-known and powerful LLMs. It's used in various applications, including text generation, translation, and code completion.
* **BERT (Bidirectional Encoder Representations from Transformers):** Developed by Google, BERT is another popular LLM used for tasks like question answering and sentiment analysis.
* **LaMDA (Language Model for Dialogue Applications):** Developed by Google, LaMDA is specifically designed for conversational AI and powers Google's AI chatbot Bard.

**Potential impact:**

LLMs are revolutionizing the way we interact with technology and information. They have the potential to:

* **Improve communication:** Facilitate more natural and efficient communication between humans and machines.
* **Boost creativity:** Enable new forms of creative expression and content creation.
* **Automate tasks:** Streamline tasks that require text processing and understanding, like customer service and content moderation.

**However, LLMs also come with challenges:**

* **Bias and misinformation:** Trained on vast amounts of data, LLMs can reflect existing biases and generate misinformation.
* **Ethical concerns:** Questions arise about the responsible use of LLMs and their potential impact on society.

**In conclusion:**

LLMs are a powerful and rapidly evolving technology with the potential to revolutionize various aspects of our lives. As LLMs continue to develop, it's essential to understand their capabilities and limitations to ensure their responsible and ethical use.


## RAG chain

Putting it all together, we can now define a RAG chain that takes a question, retrieves relevant documents, constructs a prompt, passes it into a model, and parses the output.

In [11]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    formatted_docs = []
    for doc in docs:
        page_number = doc.metadata["page"] + 1 
        content_with_page = f"Page {page_number}:\n{doc.page_content}"
        formatted_docs.append(content_with_page)
    return "\n\n".join(formatted_docs)

In [12]:
rag_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough(),
    }
    | prompt
    | llm
    | StrOutputParser()
)

### Usage example

In [13]:
query = "What is the medical devices regulation?"
Markdown(rag_chain.invoke(query))

The Medical Devices Regulation (Regulation (EU) 2017/746) is a European Union regulation that establishes the rules for the safety and performance of medical devices. It sets out the requirements for manufacturers, importers, distributors, and other economic operators involved in the supply chain of medical devices. This regulation is found across the provided document, but specifically defined on pages 2 and 5. 


## Evaluation metrics and comparison

### Tuning parameters

In [14]:
temperatures = [0.1, 0.5, 1.0]
top_ps = [0.1, 0.5, 0.9]

results = "| Temperature | Top P | Response |\n" + "|-------------|-------|----------|\n"

for temperature in temperatures:
    for top_p in top_ps:
        llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=temperature, top_p=top_p)

        query = "What is harmonised standards?"
        response = rag_chain.invoke(query)

        results += f"| {temperature} | {top_p}  | {response}"

Markdown(results)

| Temperature | Top P | Response |
|-------------|-------|----------|
| 0.1 | 0.1  | Harmonised standards are standards that have been published in the Official Journal of the European Union. These standards are presumed to be in conformity with the requirements of the Regulation. This information is found on Page 16. 
| 0.1 | 0.5  | Harmonised standards are standards that have been published in the Official Journal of the European Union and are presumed to be in conformity with the requirements of the Regulation. This information is found on Page 16. 
| 0.1 | 0.9  | Harmonised standards are standards that have been published in the Official Journal of the European Union. These standards are presumed to be in conformity with the requirements of the Regulation.  This information is found on page 16. 
| 0.5 | 0.1  | Harmonised standards are standards that have been published in the Official Journal of the European Union. These standards are presumed to be in conformity with the requirements of the Regulation.  This information is found on page 16. 
| 0.5 | 0.5  | Harmonised standards are standards that have been published in the Official Journal of the European Union and are presumed to be in conformity with the requirements of the Regulation. This information is found on page 16. 
| 0.5 | 0.9  | Harmonised standards are standards published in the Official Journal of the European Union that are presumed to be in conformity with the requirements of the Regulation. These standards can be applied to demonstrate conformity with general safety and performance requirements. This information is found on pages 16 and 147. 
| 1.0 | 0.1  | Harmonized standards are standards that have been published in the Official Journal of the European Union and are presumed to be in conformity with the requirements of the Regulation. This information is found on page 16. 
| 1.0 | 0.5  | Harmonised standards are standards that have been published in the Official Journal of the European Union. These standards are presumed to be in conformity with the requirements of the Regulation. This information is found on page 16. 
| 1.0 | 0.9  | Harmonised standards are standards that have been published in the Official Journal of the European Union. They are presumed to be in conformity with the requirements of the Regulation. This information can be found on page 16. 


### Gemini vs GPT2

In [15]:
from langchain_huggingface.llms import HuggingFacePipeline
from transformers import pipeline

generator = pipeline('text-generation', model='gpt2', max_length=1000, pad_token_id=50256, return_full_text=False)
gpt2 = HuggingFacePipeline(pipeline=generator)

In [16]:
rag_chain_gpt2 = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough(),
    }
    | prompt
    | gpt2
    | StrOutputParser()
)

query = "What is the medical devices regulation?"
Markdown(rag_chain_gpt2.invoke(query))

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.




1. The regulations apply to general purposes to any medical device which is intended to be used

for all purposes of the protection of human health, except insofar as such use

may cause serious injury or disability in a person or in the process of a medical situation as

part of an emergency.

2. The regulations must also apply to, or in relation to, any specific use thereof that is to be limited

to certain specific uses. In addition, they may specify the exact circumstances under which

a specific use is likely to be prohibited, the procedures that are used as a control

over incidental uses, and the general methods and activities employed.

3. The regulation may also specify the specific conditions for

presuming or reducing the harmful effects of substances or substances containing

particles.

4. The rules shall exclude the following situations:

— where a specified percentage of this Article applies to other uses, if (1)

the quantity of the used substances or the particular

specific use is not a specific danger to the human health,

(2

### Prompt tuning

In [None]:
template = """
{context}

Question: {question}

Helpful Answer:"""
simplified_prompt = PromptTemplate.from_template(template)

In [None]:
query = "What is a LLM?"
rag_chain_prompt_tuning = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough(),
    }
    | simplified_prompt
    | llm
    | StrOutputParser()
)
Markdown(rag_chain_prompt_tuning.invoke(query))

The provided text snippets don't contain information about LLMs (Large Language Models). Therefore, it's impossible to answer the question from this provided text. 

LLMs are a type of artificial intelligence (AI) that are trained on massive amounts of text data. They are capable of generating human-like text, translating languages, writing different kinds of creative content, and answering your questions in an informative way. 

If you would like to learn more about LLMs, please let me know! I can provide you with information about their capabilities, applications, and limitations. 


## Chat history

In [None]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnableLambda
from langchain.memory import ChatMessageHistory

chat_history = ChatMessageHistory()

system_template = """
You are a Q&A chatbot that helps to answer the user's questions about a given document. Always follow these rules to answer the question:

Use the following pieces of context to answer the questions.
If the question is not related to the context, just say it is not related.
If you don't know the answer to any of the questions, just say that you don't know, don't try to make up an answer.
Always mention in which pages the information you give are found.

<context>
{context}
</context>
"""

question_answering_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            system_template,
        ),
        MessagesPlaceholder(variable_name="chat_history"),
    ]
)

question_runnable = RunnableLambda(lambda input: input["question"])
chat_history_runnable = RunnableLambda(lambda input: input["chat_history"])


rag_chain = (
    {
        "context": question_runnable | retriever | format_docs,
        "question": question_runnable,
        "chat_history": chat_history_runnable,
    }
    | question_answering_prompt
    | llm
    | StrOutputParser()
)

In [20]:
def QuestionAnswerLoop():
    print("Enter your question (type 'quit' to exit): ")
    while True:
        user_input = input("Enter your question (type 'quit' to exit): ")
        if user_input.lower() == 'quit':
            print("Exiting Q&A chat. Goodbye!")
            break
        else:
            chat_history.add_user_message(user_input)
            response = rag_chain.invoke(
                {
                    "question": user_input, 
                    "chat_history": chat_history.messages
                }
            )

            # Add the AI's response to the chat history
            chat_history.add_ai_message(response)

            # Print the response
            print("Question: " + user_input)
            print("Answer: " + response)


In [21]:
QuestionAnswerLoop()

Enter your question (type 'quit' to exit): 
Question: What is the medical devices regulation?
Answer: The Medical Devices Regulation (MDR) is a comprehensive set of rules governing the safety and performance of medical devices in the European Union.  It's a complex piece of legislation, but here's a breakdown of key points based on your provided text snippets:

**Purpose:**

* The MDR aims to ensure the safety and effectiveness of medical devices used on humans, while promoting innovation and patient safety.
* It covers a wide range of medical devices, from simple instruments to complex implants and software.

**Key Elements:**

* **Definition of "medical device":**  The text defines a medical device as any instrument, apparatus, appliance, software, implant, etc. intended by the manufacturer to be used for a specific medical purpose. Examples include:
    * **Diagnosis:** Identifying a disease
    * **Prevention:**  Stopping a disease from developing
    * **Monitoring:** Tracking a p

## Basic RAG limitations

A basic RAG architecture will fetch documents from the database based on the similarity to the given user question. However, the user may not reference the document directly, e.g., "What is in the first page?" or "Can you further explain the first question?".

To solve this, we can pass the question and the chat history to the LLM in an initial step to create the question that will be searched in the database.

In [22]:
from langchain.retrievers import SelfQueryRetriever
from langchain.chains.query_constructor.schema import AttributeInfo
from langchain.callbacks.base import BaseCallbackHandler
from langchain.schema import LLMResult

class PrintLLMOutputCallbackHandler(BaseCallbackHandler):   # This is used so we can see what's going on behind the scenes
    def on_llm_end(self, response: LLMResult, **kwargs):
        print("LLM Generated:", response.generations[0][0].text)

llm_with_logging = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash", callbacks=[PrintLLMOutputCallbackHandler()]
)

metadata_field_info = [ # Setup the metadata that the LLM will be able to filter by
    AttributeInfo(
        name="page",
        type="int",
        description="The page number of the document",
    )
]

document_content_description = "Medical devices regulation"

query_with_history_template = """
You are an AI assistant that helps a user query a document.
The user makes some questions and you create the queries to find the parts of document that are most relevant to the questions.
The search will be performed by similarity so you need to provide a query similar to the contents that are in the document.
You do not have the context of the document. You will be making the queries based on the questions and history to get the context from the document.
This is the chat history of the conversation until now:
<history>
{chat_history}
</history>

Take into account the context of the chat history when preparing the query for the following question:
Prepare a query for this question. Output only the query as a natural language question.
Question: {question}"""

retriever_prompt_template = PromptTemplate.from_template(query_with_history_template)

improved_retriever = SelfQueryRetriever.from_llm(
    llm_with_logging,
    vectorstore,
    document_content_description,
    metadata_field_info,
    verbose=True,
)

# Ask the LLM to generate a question based on the chat history and the user question
# And ask the LLM to perform a search (can filter) based on the question generated
improved_retriever_chain = retriever_prompt_template | llm_with_logging | StrOutputParser()

# Normal QA RAG chain with the documents from the retrieval
improved_rag_chain = question_answering_prompt | llm_with_logging | StrOutputParser()


In [23]:
def formatted_chat_history():   # Format the chat history to be used in the initial prompt (that will generate the database question)
        result = ""
        question_id = 1
        for message in chat_history.messages:
            if message.type == "human":
                result += f"{question_id}. Human: {message.content}\n"
                question_id += 1
            else:
                result += f"AI: {message.content}\n"
        return result

def invoke_improved_rag_chain(user_input):
    llm_db_question = improved_retriever_chain.invoke({"question": user_input, "chat_history": formatted_chat_history()})
    docs = improved_retriever.invoke(llm_db_question)
    return improved_rag_chain.invoke(
        {
            "context": docs,
            "question": llm_db_question,
            "chat_history": chat_history.messages,
        }
    )


In [24]:
queries = [
    "What is in the first page of the document?",
    "Can you further explain the first question?",
]

In [25]:
chat_history = ChatMessageHistory() # reset the chat history

print("Normal RAG Chain\n\n")
for user_input in queries:
    chat_history.add_user_message(user_input)
    response = rag_chain.invoke(
        {"question": user_input, "chat_history": chat_history.messages}
    )
    chat_history.add_ai_message(response)
    print("Question: " + user_input)
    print("Answer: " + response)

Normal RAG Chain


Question: What is in the first page of the document?
Answer: Unfortunately, without the context of the entire document, it's impossible to know what's on the first page. The provided snippets seem to be from different sections within a larger document, likely a legal or technical regulation. 

To determine the content of the first page, you'll need the full document. 

Question: Can you further explain the first question?
Answer: It seems you are asking about the "Correlation Table" in Annex XVII of Regulation 02017R0745. This table shows how provisions from two older directives (Council Directive 90/385/EEC and Council Directive 93/42/EEC) are incorporated into this newer regulation. 

**Understanding the Table:**

* **Columns:** The table has three columns:
    * **Council Directive 90/385/EEC:**  Lists articles from the first directive.
    * **Council Directive 93/42/EEC:** Lists articles from the second directive.
    * **This Regulation:** Lists articles from t

Each query will generate 3 `LLM Generated` answers.
- The first one is the actual question to pass to the database, which serves as a way to include the chat history in this question
- The second is the LLM generated query to the database
- The third is the actual answer from the LLM to the user

In [29]:
chat_history = ChatMessageHistory()  # reset the chat history

print("Improved RAG Chain")
for user_input in queries:
    print("\n\n---")
    print("Question: " + user_input)
    chat_history.add_user_message(user_input)
    response = invoke_improved_rag_chain(user_input)
    chat_history.add_ai_message(response)


Improved RAG Chain


---
Question: What is in the first page of the document?
LLM Generated: What is on the first page of this document? 

LLM Generated: ```json
{
    "query": "",
    "filter": "eq(\"page\", 1)"
}
```
LLM Generated: The first page of the document contains the title of the document, which is "REGULATION (EU) 2017/745 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 5 April 2017 on medical devices, amending Directive 2001/83/EC, Regulation (EC) No 178/2002 and Regulation (EC) No 1223/2009 and repealing Council Directives 90/385/EEC and 93/42/EEC (Text with EEA relevance)". It also includes information about the subject matter and scope of the regulation. 
This information can be found on page 1 of the document. 



---
Question: Can you further explain the first question?
LLM Generated: What is the subject matter and scope of the regulation "REGULATION (EU) 2017/745 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 5 April 2017 on medical devices, amending Directive 2001

### Conclusions
This different architecture does not work 100% of the time, but it is a good improvement to the basic RAG architecture.

In fact, this new architecture should be able to answer all of the questions that the basic RAG architecture can answer, and also some additional questions that the basic RAG architecture cannot answer.