# Retrieval-Augmented Generation (RAG)

- RAG architectures inject additional information into a query, augmenting the LLM's built-in knowledge with domain-specific knowledge.
- This is often helpful when the pretraining corpus did not contain sufficient knowledge about the domain.
- Alternatively, the model can be fine-tuned on the new data, but this is often resource-intensive and requires large, representative datasets for the domain.
- In this notebook we build a simple RAG architecture to demonstrate the concept and then extend it to query DORA regulations.

![title](https://github.com/blueraincloud/blueraincloud.github.io/blob/main/images/RAG/rag.png?raw=true)
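
Before bringing in any libraries, the retrieve-then-augment idea can be sketched in plain Python. This is only a toy illustration: the word-overlap scoring, the `knowledge_base` list and the helper names are assumptions made for this sketch, whereas the notebook itself uses embeddings and a FAISS vector store.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, k=1):
    """Return the k documents sharing the most words with the question."""
    q_words = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q_words & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, documents, k=1):
    """Prepend the retrieved passages to the question before it reaches the LLM."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Documents:\n{context}\n\nQuestion: {question}\nAnswer:"

# Hypothetical knowledge base, for illustration only
knowledge_base = [
    "Clorkimn is the color of time.",
    "FAISS is a library for similarity search over dense vectors.",
]
augmented_prompt = build_prompt("What is clorkimn?", knowledge_base)
print(augmented_prompt)
```

The LLM never changes in this scheme; only the prompt it receives is enriched with the retrieved passage.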

## Imports and dependencies

In [1]:
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.messages import AIMessage
import bs4

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [None]:
!pip install langchain
!pip install langchain-community
!pip install langchain-ollama
!pip install sentence-transformers
!pip install faiss-cpu
!pip install tiktoken
!pip install bs4
!pip install pypdf
!pip install langchain-groq

## Generate a document for the RAG

First, we ask ChatGPT to generate a document that pretends prompt-engineering has a different definition from its original meaning.
The term is given a completely bogus meaning, which lets us check the effect of the RAG on the system.
Additionally, a made-up word, "clorkimn", is given a definition.
The generated document is located here:

https://github.com/blueraincloud/blueraincloud.github.io/blob/main/misc/rag-text.txt

## Fetch the document for the RAG

In [3]:
# Document location
urls = [
    "https://github.com/blueraincloud/blueraincloud.github.io/blob/main/misc/rag-text.txt"
]

# Loader subclass that sends a browser-like User-Agent header
class CustomWebBaseLoader(WebBaseLoader):
    def __init__(self, url):
        super().__init__(url, requests_kwargs={"headers": {"User-Agent": "Mozilla/5.0"}})

# Load document
docs = [CustomWebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]

In [4]:
# Text_splitter
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=200, chunk_overlap=0
)
# Split the documents into chunks
doc_splits = text_splitter.split_documents(docs_list)
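
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of fixed-size chunking. It is an assumption-laden stand-in: it counts whitespace-separated words, while the splitter above counts tiktoken tokens and prefers to break on natural separators. The function name `chunk_words` is hypothetical.

```python
def chunk_words(text, chunk_size=200, chunk_overlap=0):
    """Split text into windows of at most chunk_size words.

    Consecutive windows share chunk_overlap words.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks

# Ten dummy words: w0 ... w9
sample = " ".join(f"w{i}" for i in range(10))
no_overlap = chunk_words(sample, chunk_size=4, chunk_overlap=0)
with_overlap = chunk_words(sample, chunk_size=4, chunk_overlap=2)
print(no_overlap)
print(with_overlap)
```

With no overlap, a sentence that straddles a boundary is cut in two; a non-zero overlap repeats some words in neighbouring chunks so that such a sentence is more likely to appear whole in at least one of them.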

In [None]:
vectorstore = FAISS.from_documents(doc_splits, HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2"))
# Return the 4 most similar chunks; k is passed via search_kwargs
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
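
Conceptually, the retriever embeds the question and returns the chunks whose vectors lie closest to it. The sketch below illustrates that nearest-neighbour lookup with cosine similarity on tiny hand-written vectors. The 3-dimensional embeddings and the `top_k` helper are hypothetical, and the actual FAISS index may rank by a different measure, such as Euclidean distance, over much higher-dimensional vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vector, indexed_vectors, k=4):
    """Return (chunk_id, score) pairs for the k most similar vectors."""
    scored = [
        (chunk_id, cosine_similarity(query_vector, vector))
        for chunk_id, vector in indexed_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical 3-dimensional embeddings, for illustration only
index = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.7, 0.7, 0.0],
    "chunk_c": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.1, 0.0], index, k=2)
print(results)
```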

In [6]:
from langchain_ollama import ChatOllama
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
# Define the prompt template for the LLM
prompt = PromptTemplate(
    template="""You are an assistant for question-answering tasks.
    Use the following documents to answer the question.
    If you don't know the answer, just say that you don't know.
    Use three sentences maximum and keep the answer concise:
    Question: {question}
    Documents: {documents}
    Answer:
    """,
    input_variables=["question", "documents"],
)

In [7]:
# Initialize Llama 3.1
llm = ChatOllama(
    model="llama3.1",
    temperature=0,
)

## Testing base model without RAG

In [8]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are an assistant for question-answering tasks.
            Use short sentences and keep the answer to a max of three sentences.
            Question: {question}
            Answer:""",
        ),
        ("human", "{question}"),
    ]
)

chain = prompt | llm
ai_msg = chain.invoke(
    {
        "question": "What is prompt-engineering?",
    }
)
ai_msg.content

"Prompt-engineering is the process of designing and crafting input prompts to elicit specific, accurate, and relevant responses from language models or AI systems. It involves understanding how to phrase questions, statements, or tasks in a way that maximizes the model's ability to provide helpful and informative answers. Effective prompt-engineering can significantly improve the quality and reliability of AI-generated output."

In [9]:
ai_msg = chain.invoke(
    {
        "question": "What is clorkimn?",
    }
)
ai_msg.content

'I couldn\'t find any information on "clorkimn". It\'s possible it\'s a misspelling or not a widely known term.'

As the two prompts above show, the base model gives the standard definition of prompt-engineering and does not know what the made-up word "clorkimn" means.

## Testing with RAG

In [10]:
from langchain_ollama import ChatOllama
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
# Define the prompt template for the LLM
prompt = PromptTemplate(
    template="""You are an assistant for question-answering tasks.
    Use the documents in question and fall back on the main corpus of knowledge if the documents are insufficient.
    Do not mention when you fall back or if it comes from the documents or main corpus of knowledge.
    Use short sentences and keep the answer to a max of three sentences.
    Question: {question}
    Documents: {documents}
    Answer:
    """,
    input_variables=["question", "documents"],
)

In [11]:
# Create a chain combining the prompt template and LLM
rag_chain = prompt | llm | StrOutputParser()

In [12]:
# RAG application
class RAGApplication:
    def __init__(self, retriever, rag_chain):
        self.retriever = retriever
        self.rag_chain = rag_chain
    def run(self, question):
        # Retrieve documents
        documents = self.retriever.invoke(question)
        # Extract content documents
        doc_texts = "\n".join([doc.page_content for doc in documents])
        # Invoke LLM
        answer = self.rag_chain.invoke({"question": question, "documents": doc_texts})
        return answer
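
The class only needs a retriever and a chain that each expose an `invoke` method, so the flow can be exercised without a running model. The sketch below does that with stand-in objects; `FakeDocument`, `FakeRetriever` and `EchoChain` are hypothetical helpers written for this illustration, and the class is repeated (joining passages with a newline) so that the snippet is self-contained.

```python
from dataclasses import dataclass

@dataclass
class FakeDocument:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str

class FakeRetriever:
    """Hypothetical retriever that always returns the same passages."""
    def __init__(self, passages):
        self.passages = passages

    def invoke(self, question):
        return [FakeDocument(p) for p in self.passages]

class EchoChain:
    """Hypothetical chain that records its input instead of calling an LLM."""
    def __init__(self):
        self.last_input = None

    def invoke(self, inputs):
        self.last_input = inputs
        n_passages = len(inputs["documents"].splitlines())
        return f"answered using {n_passages} passages"

class RAGApplication:
    """Retrieve-then-generate flow mirroring the notebook's class."""
    def __init__(self, retriever, rag_chain):
        self.retriever = retriever
        self.rag_chain = rag_chain

    def run(self, question):
        documents = self.retriever.invoke(question)
        doc_texts = "\n".join(doc.page_content for doc in documents)
        return self.rag_chain.invoke({"question": question, "documents": doc_texts})

echo_chain = EchoChain()
app = RAGApplication(FakeRetriever(["first passage", "second passage"]), echo_chain)
offline_answer = app.run("What is clorkimn?")
print(offline_answer)
```

Inspecting `echo_chain.last_input` shows exactly which text would be handed to the prompt template, which can help when debugging retrieval separately from generation.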

In [13]:
# Initialize the RAG application
rag_application = RAGApplication(retriever, rag_chain)
# Example usage
question = "What is Prompt engineering"
answer = rag_application.run(question)
print("Question:", question)
print("Answer:", answer)

Question: What is Prompt engineering
Answer: Prompt engineering is a way of life that cultivates awareness, fosters creativity, and connects people through shared experiences. It involves harnessing the power of thoughts and feelings to connect with the universe by crafting effective questions. This practice unlocks the treasures of one's mind and reveals the richness of their thoughts and feelings.


In [14]:
question = "What is clorkimn?"
answer = rag_application.run(question)
print("Question:", question)
print("Answer:", answer)

Question: What is clorkimn?
Answer: Clorkimn is the color of time. It originates from the English words clorkimness, which is not a made-up term. Clorkimn is an uncommon word that was lost in translation but holds importance.


## DORA

In [None]:
import io
import requests
from PyPDF2 import PdfReader
headers = {'User-Agent': 'Mozilla/5.0 (X11; Windows; Windows x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36'}

url = 'https://www.url_of_pdf_file.com/sample.pdf'
response = requests.get(url=url, headers=headers, timeout=120)
on_fly_mem_obj = io.BytesIO(response.content)
pdf_file = PdfReader(on_fly_mem_obj)

In [36]:
# importing required classes
import requests
import io
from pypdf import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document  # Import the Document class
headers = {'User-Agent': 'Mozilla/5.0 (X11; Windows; Windows x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36'}

doc_names = [
    "https://github.com/blueraincloud/blueraincloud.github.io/blob/main/resources/RAG/a.pdf?raw=true",
    "https://github.com/blueraincloud/blueraincloud.github.io/blob/main/resources/RAG/b.pdf?raw=true",
    "https://github.com/blueraincloud/blueraincloud.github.io/blob/main/resources/RAG/c.pdf?raw=true",
    "https://github.com/blueraincloud/blueraincloud.github.io/blob/main/resources/RAG/d.pdf?raw=true",
    "https://github.com/blueraincloud/blueraincloud.github.io/blob/main/resources/RAG/e.pdf?raw=true",
    "https://github.com/blueraincloud/blueraincloud.github.io/blob/main/resources/RAG/f.pdf?raw=true",
    "https://github.com/blueraincloud/blueraincloud.github.io/blob/main/resources/RAG/g.pdf?raw=true",
    "https://github.com/blueraincloud/blueraincloud.github.io/blob/main/resources/RAG/h.pdf?raw=true",
    "https://github.com/blueraincloud/blueraincloud.github.io/blob/main/resources/RAG/i.pdf?raw=true",
    "https://github.com/blueraincloud/blueraincloud.github.io/blob/main/resources/RAG/j.pdf?raw=true"
]

corpus = []
for doc_url in doc_names:
    # Fetch the PDF and fail early on HTTP errors
    response = requests.get(url=doc_url, headers=headers, timeout=120)
    response.raise_for_status()

    # Wrap the downloaded bytes in a file-like object for PdfReader
    on_fly_mem_obj = io.BytesIO(response.content)
    reader = PdfReader(on_fly_mem_obj)

    # Extract text, creating one Document per PDF page
    docs_list = [Document(page_content=page.extract_text()) for page in reader.pages]
    corpus.extend(docs_list)
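
A download can succeed and still return something other than a PDF, for example an HTML page when a URL lacks `?raw=true`. A small guard on the payload's first bytes may catch this before parsing. The helper names below are hypothetical and the payloads are dummy byte strings used only for illustration.

```python
def looks_like_pdf(payload: bytes) -> bool:
    """Return True when the payload starts with the PDF header marker."""
    return payload.startswith(b"%PDF-")

def split_payloads(payloads):
    """Split a {url: bytes} mapping into parsable PDFs and rejected URLs."""
    accepted = {url: data for url, data in payloads.items() if looks_like_pdf(data)}
    rejected = [url for url in payloads if url not in accepted]
    return accepted, rejected

# Dummy payloads standing in for downloaded responses
downloads = {
    "a.pdf": b"%PDF-1.7\n...",
    "b.pdf": b"<!DOCTYPE html><html>...</html>",
}
accepted, rejected = split_payloads(downloads)
print(sorted(accepted), rejected)
```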
    

In [37]:
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=200, chunk_overlap=0
)
# Split the documents into chunks
doc_splits = text_splitter.split_documents(corpus)

In [38]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
import bs4

vectorstore = FAISS.from_documents(doc_splits, HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2"))
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

In [39]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an assistant for question-answering tasks. \
            Use the following documents to answer the question.\
            If you don't know the answer, just say that you don't know.\
            In general you are dealing with regulatory documents and often it is verbose. Giving a summary would be helpful.\
            Each document typically has a background and rationale and a technical standards section.\
            When giving an answer, focus on the content in the technical standards section.\
            The background and rationale and other sections are not as important:\
            Question: {question}\
            Answer:",
        ),
        ("human", "{question}"),
    ]
)

chain = prompt | llm
ai_msg = chain.invoke(
    {
        "question": "What is DORA act about?",
    }
)
ai_msg.content

'I don\'t have any documents related to a "DORA act". Could you please provide more context or information about what DORA act refers to? I\'ll do my best to find relevant documents and answer your question. \n\nHowever, after some research, I found that the Digital Operational Resilience Act (DORA) is a proposed EU regulation aimed at strengthening the operational resilience of financial institutions. If this is the correct context, please let me know and I can try to provide more information based on available documents.\n\nPlease note that my previous response was incorrect, and I\'m trying to correct it now.'

In [40]:
prompt = PromptTemplate(
    template="""You are an assistant for question-answering tasks. \
            Use the following documents to answer the question.\
            If you don't know the answer, just say that you don't know.\
            In general you are dealing with regulatory documents and often it is verbose. Giving a summary would be helpful.\
            Each document typically has a section where respondents give their feedback and a section where the actual draft standards are given.\
            When summarizing this information only consider the actual standards that are set and ignore the other sections:\
    Question: {question}
    Documents: {documents}
    Answer:
    """,
    input_variables=["question", "documents"],
)

In [41]:
# Create a chain combining the prompt template and LLM
rag_chain = prompt | llm | StrOutputParser()

rag_application = RAGApplication(retriever, rag_chain)
# Example usage
question = "What is DORA act about?"
answer = rag_application.run(question)
print("Question:", question)
print("Answer:", answer)

Question: What is DORA act about?
Answer: Based on the provided documents, I can summarize that DORA is about regulations related to the oversight of financial entities and ICT third-party service providers.

The actual standards set by DORA include:

* Tests organized at the level of a financial entity by the TLPT authority of its home Member State (point 75)
* Requirements for competent authorities in relation to the joint examination team (point c)

These regulations are outlined in two separate Regulatory Technical Standards (RTS) under DORA.

As for what DORA is about, I can provide a brief summary:

DORA appears to be an act that regulates the oversight of financial entities and ICT third-party service providers. It sets standards for tests and examinations to be conducted by competent authorities, with a focus on ensuring the stability and security of the financial system.


In [42]:
# Example usage
question = "Can you give me a one page summary of the documents"
answer = rag_application.run(question)
print("Question:", question)
print("Answer:", answer)

Question: Can you give me a one page summary of the documents
Answer: Here is a one-page summary of the documents:

**Summary**

The European Securities and Markets Authority (ESAs) has made some changes to provide more clarity in the draft regulatory technical standards (RTS). The main points are:

* **Electronic format**: The report must be in a searchable electronic format, but no specific document type is mandated.
* **Content requirements**: The report should include minimum elements, but entities can add other useful information as long as they cover the required content.
* **Contractual structure and documentation**: Option A has been retained, which prescribes fields for contractual structure (documentation management).

**Key Changes**

The ESAs have introduced some changes to provide more clarity, including:

* Deleting unnecessary text
* Providing more flexibility in electronic format requirements
* Emphasizing that the report is not an exhaustive list, but rather a minimum 

In [43]:
# Example usage
question = "What is the register of information"
answer = rag_application.run(question)
print("Question:", question)
print("Answer:", answer)

Question: What is the register of information
Answer: Based on the documents provided, here is a summary of what I found regarding the "register of information":

**Summary:** The register of information is composed of 15 templates that are linked together using relational keys. The templates cover three purposes: (i) ICT risk management; (ii) reporting and disclosure; and (iii) supervision.

**Key Components:**

1. **Templates**: There are 15 templates in total, which are linked to each other using relational keys.
2. **Relational Keys**: Some of the relational keys used include:
	* Contractual arrangement reference number
	* LEI (Legal Entity Identifier) of the entity making use of ICT services
	* ICT third-party service provider identifier
	* Function identifier
	* Type of ICT services (provided in Annex III)
3. **Purpose**: The register of information serves three purposes:
	* ICT risk management
	* Reporting and disclosure
	* Supervision

**Standards:**

1. Financial entities must

In [None]:
# Example usage
question = "What can you tell me about the standards on classifying ICT-related incidents"
answer = rag_application.run(question)
print("Question:", question)
print("Answer:", answer)

In [None]:
# Example usage
question = "What can you tell me about the RTS on ICT services supporting critical or important functions"
answer = rag_application.run(question)
print("Question:", question)
print("Answer:", answer)

In [None]:
# Example usage
question = "What can you tell me about the risk management framework and simplified risk management framework. What are the main differences? Ignore the correspondence and give me the standards"
answer = rag_application.run(question)
print("Question:", question)
print("Answer:", answer)

In [None]:
# Example usage
question = "TLPT is threat led penetration testing. What can you tell me about the TLPT. What are the main differences?"
answer = rag_application.run(question)
print("Question:", question)
print("Answer:", answer)

In [None]:
# Example usage
question = "How does DORA treat estimation of aggregated annual costs and losses caused by major ICT-related incidents?"
answer = rag_application.run(question)
print("Question:", question)
print("Answer:", answer)




In [None]:
# Example usage
question = "What does DORA say about third-party risk management and contract management?"
answer = rag_application.run(question)
print("Question:", question)
print("Answer:", answer)




## Next steps

* Evaluation metrics
* Fine-tuning comparison
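
As one possible starting point for the evaluation item, a keyword-recall check can score whether an answer contains the terms expected from the injected document. This is only a rough sketch: the metric, the `keyword_recall` name and the expected keywords are assumptions chosen for illustration, and the two sample answers are shortened from the outputs shown earlier.

```python
import re

def keyword_recall(answer, expected_keywords):
    """Fraction of expected keywords that appear as whole words in the answer."""
    if not expected_keywords:
        raise ValueError("expected_keywords must not be empty")
    words = set(re.findall(r"[a-z0-9]+", answer.lower()))
    hits = sum(1 for keyword in expected_keywords if keyword.lower() in words)
    return hits / len(expected_keywords)

# Keywords assumed to characterise the definition in the injected document
expected = ["color", "time"]
rag_score = keyword_recall("Clorkimn is the color of time.", expected)
base_score = keyword_recall('I couldn\'t find any information on "clorkimn".', expected)
print(rag_score, base_score)
```

A check like this only measures surface overlap, so it would likely need to be complemented by other measures, such as human review, before drawing conclusions.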

## References

 - https://github.com/meta-llama/llama-recipes/blob/main/recipes/quickstart/Getting_to_know_Llama.ipynb
 - https://www.datacamp.com/tutorial/llama-3-1-rag