# Retrieval Augmented Generation (RAG)

This notebook demonstrates how to use [LangChain](https://www.langchain.com/) to develop a Retrieval Augmented Generation (RAG) pattern. It uses Azure AI Document Intelligence as the document loader, which can extract tables, paragraphs, and layout information from PDF, image, Office, and HTML files. The loader's markdown output can be passed to LangChain's markdown header splitter, which enables semantic chunking of the documents. The chunked documents are then indexed into an Azure AI Search vector store. Given a user query, the notebook uses Azure AI Search to retrieve the relevant chunks, then feeds them as context into the prompt together with the query to generate the answer.

![Semantic chunking in RAG](../media/semantic-chunking-rag.png)


## Prerequisites
- An Azure AI Document Intelligence resource in one of the 3 preview regions: **East US**, **West US 2**, or **West Europe**. Follow [this document](https://learn.microsoft.com/azure/ai-services/document-intelligence/create-document-intelligence-resource?view=doc-intel-4.0.0) to create one if you don't have one yet.
- An Azure AI Search resource. Follow [this document](https://learn.microsoft.com/azure/search/search-create-service-portal) to create one if you don't have one yet.
- An Azure OpenAI resource with deployments for an embeddings model and a chat model. Follow [this document](https://learn.microsoft.com/azure/ai-services/openai/how-to/create-resource?pivots=web-portal) to create one if you don't have one yet.

## Setup

```python
# pip install python-dotenv langchain langchain-community langchain-openai langchainhub openai tiktoken azure-ai-documentintelligence azure-identity azure-search-documents==11.4.0b8
```

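```python
"""
Load environment variables from a `.env` file using the `dotenv` library.
`load_dotenv()` looks for `.env` starting from the notebook's working directory
and copies its values into os.environ, where the Azure OpenAI clients read them.
"""
import os
from dotenv import load_dotenv

# Existing environment variables are not overridden by values from .env.
load_dotenv()

# AzureChatOpenAI and AzureOpenAIEmbeddings read these two variables from os.environ.
# Re-assigning os.getenv(...) into os.environ raises a TypeError when a variable is
# missing, so check for them explicitly instead.
for name in ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY"):
    if not os.getenv(name):
        raise RuntimeError(f"{name} is not set; add it to your .env file")

doc_intelligence_endpoint = os.getenv("AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT")
doc_intelligence_key = os.getenv("AZURE_DOCUMENT_INTELLIGENCE_KEY")
```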
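Across its cells, the notebook reads six settings: two in the cell above, the Document Intelligence pair, and the Azure AI Search endpoint and admin key used later. The hypothetical helper `missing_settings` below is a minimal sketch that is not part of the original notebook. It lists any of these settings that are unset or empty before an Azure call is made, but it only checks that each value is present, not that it is valid.

```python
import os
from collections.abc import Mapping

# Environment variables read by the cells of this notebook.
REQUIRED_VARS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT",
    "AZURE_DOCUMENT_INTELLIGENCE_KEY",
    "AZURE_SEARCH_ENDPOINT",
    "AZURE_SEARCH_ADMIN_KEY",
)


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Report (rather than raise) so the check is safe to run at any time.
print("Missing settings:", missing_settings() or "none")
```

Passing a plain dict instead of `os.environ` makes the check easy to try without touching the real environment.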


```python
from langchain import hub
from langchain.text_splitter import MarkdownHeaderTextSplitter
from langchain_community.document_loaders import AzureAIDocumentIntelligenceLoader
# langchain_core / langchain_community paths replace the deprecated
# langchain.schema and langchain.vectorstores imports.
from langchain_community.vectorstores.azuresearch import AzureSearch
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
```

## Load a document

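```python
# Analyze the PDF with the prebuilt layout model. In the default markdown mode the
# loader returns the whole document as markdown in a single Document.
loader = AzureAIDocumentIntelligenceLoader(
    file_path="data/BD-D100_D120GV_XGV.pdf",
    api_key=doc_intelligence_key,
    api_endpoint=doc_intelligence_endpoint,
    api_model="prebuilt-layout",
)
docs = loader.load()
```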

## Split it into semantic chunks

```python
# Split the document into chunks based on markdown headers.
headers_to_split_on = [
    ("#", "Header 1"),
    ("##", "Header 2"),
    ("###", "Header 3"),
]
text_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)

# The loader produced a single markdown Document, so split its full content.
docs_string = docs[0].page_content
splits = text_splitter.split_text(docs_string)

print(f"Length of splits: {len(splits)}")
```

```text
Length of splits: 107
```
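`MarkdownHeaderTextSplitter` starts a new chunk at each configured header and stores the headers in scope as the chunk's metadata, which is why the results later in this notebook carry keys like `'Header 2'`. The function below is a simplified, stdlib-only sketch of that idea, not the library's implementation. The name `split_on_headers` and the toy markdown input are hypothetical, and the real splitter handles more edge cases, such as header-like lines inside code blocks.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    content: str
    metadata: dict = field(default_factory=dict)


def split_on_headers(markdown: str, headers: list[tuple[str, str]]) -> list[Chunk]:
    """Simplified header-based splitter (illustration only)."""
    levels = {name: len(sep) for sep, name in headers}  # "Header 2" -> 2
    active: dict = {}  # header name -> header text currently in scope
    chunks: list[Chunk] = []
    buffer: list[str] = []

    def flush() -> None:
        # Emit the lines collected so far as one chunk tagged with the active headers.
        text = "\n".join(buffer).strip()
        if text:
            chunks.append(Chunk(text, dict(active)))
        buffer.clear()

    for line in markdown.splitlines():
        stripped = line.strip()
        for sep, name in headers:
            if stripped.startswith(sep + " "):  # "## Foo" matches "##" but not "#"
                flush()
                # A new header closes every header at the same or a deeper level.
                for other in [n for n in active if levels[n] >= len(sep)]:
                    del active[other]
                active[name] = stripped[len(sep):].strip()
                break
        else:  # not a header line: it belongs to the current chunk
            buffer.append(line)
    flush()
    return chunks


# Toy input; only the safety sentence is taken from the manual.
sample = """# Washer dryer
Intro text.
## General safety
Do not dismantle, repair or modify the appliance.
## Installation
### Water supply
Toy text about the water supply hose.
"""
chunks = split_on_headers(sample, [("#", "Header 1"), ("##", "Header 2"), ("###", "Header 3")])
for chunk in chunks:
    print(chunk.metadata, "->", chunk.content)
```

Each chunk keeps the full header path (`Header 1` through `Header 3`) in its metadata. That context is what makes header-based chunks more self-describing than fixed-size chunks.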


## Embed and index the chunks

```python
# Embed the split documents and insert them into the Azure AI Search vector store.

aoai_embeddings = AzureOpenAIEmbeddings(
    azure_deployment="text-embedding-ada-002",  # name of your embeddings deployment
    openai_api_version="2024-02-15-preview",  # e.g., "2023-12-01-preview"
)

vector_store_address: str = os.getenv("AZURE_SEARCH_ENDPOINT")
vector_store_password: str = os.getenv("AZURE_SEARCH_ADMIN_KEY")

index_name: str = "washer-index-base"
vector_store: AzureSearch = AzureSearch(
    azure_search_endpoint=vector_store_address,
    azure_search_key=vector_store_password,
    index_name=index_name,
    embedding_function=aoai_embeddings.embed_query,
)
```

```python
vector_store.add_documents(documents=splits)
```

```text
['YjdjZTBmZDgtYzIyNC00ZTRiLWI4NWQtM2RmNTc0Nzg3YTFh',
 'YWZmNmY2OTEtOTY2MS00YmMxLWEwNDEtMWEyMjAwM2IwNzEz',
 'YmE3ZjJmYmYtN2QzMS00Yjc5LTkzOGItY2EyNDlhMjQ0ZmQy',
 'NjdkMDkxMWItZWRkNy00YTFhLWIyM2ItZDgyZmY1MmQyMmY2',
 'MzZhMzU5NGMtZDVmNC00YWZjLWJmNjgtMjkyNDFkNzYyYWJi',
 'NzUzYTFkZGItN2M1OS00NDBlLWJjY2ItZTA5NTBmODVjNmMx',
 'YTBlNjZkZWUtNzdmNi00ZmIyLWFkMzgtMjNmZjAyOGJkNTA1',
 'MjA0MWE3N2YtMzcwMi00OWIzLWIwMDktZmIyNmIwYTYzZDhi',
 'ODY5NDAwOGYtYzYzMy00ZmVkLWI2NmItZmE3YTI3ZjYxMmI5',
 'NGZjZjU0MGYtMDY5NC00NGE4LWI5ZGQtM2U4MDFjNDIzNDg4',
 'Njc1ODEyZmQtOThjMy00NWM3LTk3MjItMTNlOWUwOWRhZDlk',
 'MjE5NGNhZTUtZjNlNy00OTM2LWExZDEtYjI0NDdiNDNjOTU1',
 'YzVmMjBjNTAtOTc2NS00YzRiLTg2ZmUtNmRkZDU3ODM1YWJk',
 'YTA5MjUwMjktMTc2ZS00NTE4LWEzNjctMjE2ZTZiNWI4MTNh',
 'MDkzYjU4NTQtZDQxMi00Mzk0LTliYTgtOGJhNTY5ZjdhZDJj',
 'ZmVmM2U5NWMtY2QzYy00NDJhLTgxZDItMWMwMDAwYzg1YTE4',
 'MjgxYjRiYjUtZWM3ZC00YTM5LWIwODEtZTIwZjNhMDUzMDY5',
 'ODQzNDBlNGUtYjE5MS00YjhiLWE5YWMtMWVkYmYwZTY5OGY2',
 'ZDY2MGIzOTQtNTI5OS00NzU5LThmOTAtNDA0YmQ5MGVk
```
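`add_documents` returns one key per indexed chunk. These keys appear to be URL-safe base64 encodings of UUID strings. One plausible reason, which is an assumption not checked against the LangChain source here, is that Azure AI Search document keys may only contain letters, digits, `_`, `-`, and `=`, so the store encodes each generated ID. The snippet below decodes the first key from the output above to show this. It is purely illustrative and is not needed to run the pipeline.

```python
import base64
import uuid

# First key returned by vector_store.add_documents(...) above.
key = "YjdjZTBmZDgtYzIyNC00ZTRiLWI4NWQtM2RmNTc0Nzg3YTFh"

decoded = base64.urlsafe_b64decode(key).decode("utf-8")
print(decoded)                     # b7ce0fd8-c224-4e4b-b85d-3df574787a1a
print(uuid.UUID(decoded).version)  # 4 (a random UUID)

# Re-encoding the UUID string gives back the original key.
assert base64.urlsafe_b64encode(decoded.encode("utf-8")).decode("ascii") == key
```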

## Retrieve relevant chunks based on a question

```python
# Retrieve the 3 chunks most similar to the question.

retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 3})

# Retrievers are Runnables; invoke() replaces the deprecated get_relevant_documents().
retrieved_docs = retriever.invoke("can i repair the appliance on my own?")

print(retrieved_docs[0].page_content)
```

```text
Do not dismantle, repair or modify the appliance.
\> This could cause malfunction, fire, electric shock or injury.
\> For repair, please contact your local service center.
Do not put fire sources or anything inflammable in or near the washer dryer (gas, diesel, petrol, thinner, alcohol or clothes stained with these substances).
\> This could cause an explosion or fire.
Do not pour water on the machine when it is running.
\> This could cause an electric shock.
Do not climb on the washer dryer.
\> This may cause injury.
Do not put heavy objects on the washer dryer.
\> This may damage the surface of the worktop.
It is a malfunction if the drum is still rotating when the door is open. In this case never touch the drum nor laundry inside and stop using the washer dryer immediately, and contact your local service center for repair.
This appliance is not intended for use by persons (including children) with reduced physical, sensory or mental capabilities, or lack of e
```
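With `search_type="similarity"` and `k=3`, the retriever returns the three chunks whose embeddings are closest to the query embedding. The sketch below illustrates that ranking with a brute-force cosine-similarity scan, assuming a cosine metric, over made-up 3-dimensional vectors keyed by header names borrowed from this notebook's output. Real `text-embedding-ada-002` vectors have 1536 dimensions. The sketch shows only the scoring idea, not how Azure AI Search executes the query, since a search service typically uses an approximate nearest-neighbour index instead of scanning every vector.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], indexed: dict, k: int = 3) -> list[tuple[str, float]]:
    """Return the k (chunk_id, score) pairs most similar to query_vec, best first."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in indexed.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings; the values are invented for illustration.
index = {
    "General safety": [0.9, 0.1, 0.0],
    "Customer Service": [0.7, 0.3, 0.1],
    "Installation": [0.4, 0.2, 0.8],
    "Use with smartphones": [0.0, 0.9, 0.3],
}
query = [1.0, 0.2, 0.0]  # pretend embedding of "can i repair the appliance on my own?"

for chunk_id, score in top_k(query, index, k=3):
    print(f"{chunk_id}: {score:.3f}")
```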

## Document Q&A

```python
# Ask a question about the document

# Use a prompt for RAG that is checked into the LangChain prompt hub (https://smith.langchain.com/hub/rlm/rag-prompt?organizationId=989ad331-949f-4bac-9694-660074a208a7)
prompt = hub.pull("rlm/rag-prompt")
llm = AzureChatOpenAI(
    openai_api_version="2024-02-15-preview",  # e.g., "2023-12-01-preview"
    azure_deployment="gpt-4-turbo",  # name of your chat model deployment
    temperature=0,
)


def format_docs(docs):
    """Concatenate the retrieved chunks into a single context string."""
    return "\n\n".join(doc.page_content for doc in docs)


# The dict sends the same question to both branches: it is passed to the retriever
# (whose results are formatted as context) and also passed through unchanged.
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("can i repair the appliance on my own?")
```

```text
'Based on the provided context, you should not attempt to repair the appliance on your own. The instructions explicitly advise against dismantling, repairing, or modifying the appliance due to risks such as malfunction, fire, electric shock, or injury. For repairs, you are instructed to contact your local service center.'
```
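The LCEL chain above reduces to a few function calls: retrieve, format the chunks, fill the prompt template, call the model, and return the text. The stdlib sketch below mirrors that flow with plain functions so the data passed between steps is visible. `TEMPLATE`, `join_chunks`, and `make_rag_chain` are hypothetical names. The template is a stand-in, since the real `rlm/rag-prompt` text is pulled from the hub at runtime and may differ. LangChain can run the dict branches in parallel, whereas this sketch runs them sequentially.

```python
from typing import Callable

# Hypothetical stand-in for the prompt pulled from "rlm/rag-prompt".
TEMPLATE = (
    "Use the following retrieved context to answer the question. "
    "If you don't know the answer, say that you don't know.\n"
    "Question: {question}\nContext: {context}\nAnswer:"
)


def join_chunks(chunks: list[str]) -> str:
    """Join chunk texts with blank lines, like format_docs above."""
    return "\n\n".join(chunks)


def make_rag_chain(
    retrieve: Callable[[str], list[str]], llm: Callable[[str], str]
) -> Callable[[str], str]:
    """Compose retrieve -> format -> prompt -> llm as plain functions."""

    def chain(question: str) -> str:
        # {"context": retriever | format_docs, "question": RunnablePassthrough()}
        inputs = {"context": join_chunks(retrieve(question)), "question": question}
        # | prompt
        prompt_text = TEMPLATE.format(**inputs)
        # | llm | StrOutputParser()
        return llm(prompt_text)

    return chain


def fake_retrieve(question: str) -> list[str]:
    # Toy retriever returning two sentences from the safety section.
    return [
        "Do not dismantle, repair or modify the appliance.",
        "For repair, please contact your local service center.",
    ]


def echo_llm(prompt_text: str) -> str:
    # Toy "LLM" that echoes its prompt so the assembled input can be inspected.
    return prompt_text


chain = make_rag_chain(fake_retrieve, echo_llm)
print(chain("can i repair the appliance on my own?"))
```

Swapping `echo_llm` for a real model call is the only change needed to turn the echo into an answer. This separation is what lets the notebook reuse the same `prompt` and `llm` in the chain with references.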

## Document Q&A with references

```python
# Return the retrieved documents (or selected source metadata) along with the answer

from operator import itemgetter

from langchain_core.runnables import RunnableParallel

# Answer from documents that were already retrieved, instead of retrieving again.
rag_chain_from_docs = (
    {
        "context": lambda x: format_docs(x["documents"]),
        "question": itemgetter("question"),
    }
    | prompt
    | llm
    | StrOutputParser()
)
# Retrieve once, then return both the source metadata and the generated answer.
rag_chain_with_source = RunnableParallel(
    {"documents": retriever, "question": RunnablePassthrough()}
) | {
    "documents": lambda x: [doc.metadata for doc in x["documents"]],
    "answer": rag_chain_from_docs,
}

rag_chain_with_source.invoke("can i repair the appliance on my own?")
```

```text
{'documents': [{'Header 2': 'General safety'},
  {'Header 2': 'Customer Service'},
  {'Header 2': 'Installation'}],
 'answer': 'Based on the provided context, you should not attempt to repair the appliance on your own. The instructions explicitly state not to dismantle, repair, or modify the appliance and to contact your local service center for repairs. Attempting to repair it yourself could cause malfunction, fire, electric shock, or injury.'}
```
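The chain with references runs retrieval once, keeps the retrieved documents in its intermediate state, and then fans out. One branch keeps only each document's metadata (the header path) and the other generates the answer from the same documents. The stdlib sketch below mirrors that shape with plain functions and dicts. `make_chain_with_sources`, the document dict layout, and the toy chunk texts are hypothetical, and only the metadata values are borrowed from the output above.

```python
from typing import Any, Callable

Doc = dict[str, Any]  # hypothetical shape: {"page_content": str, "metadata": dict}


def make_chain_with_sources(
    retrieve: Callable[[str], list[Doc]],
    answer_from_docs: Callable[[dict], str],
) -> Callable[[str], dict]:
    """Retrieve once, then return source metadata plus an answer built from the same docs."""

    def chain(question: str) -> dict:
        # RunnableParallel({"documents": retriever, "question": RunnablePassthrough()})
        state = {"documents": retrieve(question), "question": question}
        # | {"documents": <metadata only>, "answer": rag_chain_from_docs}
        return {
            "documents": [doc["metadata"] for doc in state["documents"]],
            "answer": answer_from_docs(state),
        }

    return chain


calls: list[str] = []  # records each retrieval so we can check it happens once


def fake_retrieve(question: str) -> list[Doc]:
    calls.append(question)
    return [
        {"page_content": "toy chunk 1", "metadata": {"Header 2": "General safety"}},
        {"page_content": "toy chunk 2", "metadata": {"Header 2": "Customer Service"}},
    ]


def fake_answer(state: dict) -> str:
    # Stand-in for rag_chain_from_docs: report what it received.
    return f"answered {state['question']!r} from {len(state['documents'])} chunks"


chain = make_chain_with_sources(fake_retrieve, fake_answer)
result = chain("can i repair the appliance on my own?")
print(result)
```

Because both outputs come from one retrieval, the listed sources are exactly the chunks the answer was generated from. That is why this pattern suits showing references.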

```python
rag_chain_with_source.invoke("can i lift the appliance with one hand?")
```

```text
{'documents': [{'Header 2': 'General safety'},
  {'Header 2': 'Safety Instructions'},
  {'Header 2': 'Interior lighting setting (For model BD-D120XGV/BD-D100XGV)',
   'Header 3': 'Emergency door release.'}],
 'answer': "The provided context does not specify the weight of the appliance or whether it is designed to be lifted with one hand. Therefore, based on the information given, I cannot determine if you can lift the appliance with one hand. It is generally recommended to follow the manufacturer's guidelines for moving appliances to avoid injury or damage."}
```

```python
rag_chain_with_source.invoke("where is water supply hose located?")
```

```text
{'documents': [{'Header 2': 'Hose and cable length'},
  {'Header 2': '. Connection on the right side'},
  {'Header 2': '1', 'Header 3': '2'}],
 'answer': 'The water supply hose is located at the rear panel of the washer dryer. It should be connected to a horizontal tap and firmly attached to ensure water-tightness. The hose connector and rubber seal are integral parts of the connection to the rear panel.'}
```

```python
rag_chain_with_source.invoke("How do i pair the appliance with phone?")
```

```text
{'documents': [{'Header 2': 'Use with smartphones',
   'Header 3': 'About the App function'},
  {'Header 2': '5\\. Proceed to connect (pairing) the washer dryer to your smartphone.'},
  {'Header 2': 'Operate the application on the smartphone.'}],
 'answer': 'To pair the appliance with your phone, you can choose either the "Simple connection" or "Manual connection" method. For the Simple connection, operate the app on your smartphone, check your wireless LAN router connection, tap "Pairing" followed by "Simple connection," and then follow the on-screen instructions to connect and set up the appliance. For the Manual connection, tap "Pairing" followed by "Manual connection," enter your router\'s password, and follow the on-screen instructions to connect and set up the appliance.'}
```

```python
rag_chain_with_source.invoke("what is max load for AI wash program for washing and drying?")
```

```text
{'documents': [{'Header 2': 'To set and unset fabric softener'},
  {'Header 2': 'Hints and Tips for Eco-friendly Washing'},
  {'Header 2': 'Operation:'}],
 'answer': 'The maximum load for the AI Wash program for washing is 12 kg for model BD-D120XGV/BD-D120GV and 10 kg for model BD-D100XGV/BD-D100GV. For washing and drying, the maximum load is 8 kg for model BD-D120XGV/BD-D120GV and 7 kg for model BD-D100XGV/BD-D100GV, with a lower temperature limit of 4 kg for both models.'}
```

```python
rag_chain_with_source.invoke("what type of laundry i can use for AI wash program?")
```

```text
{'documents': [{'Header 2': 'To set and unset fabric softener'},
  {'Header 2': 'Hints and Tips for Eco-friendly Washing'},
  {'Header 2': 'Operation:'}],
 'answer': 'The AI Wash program is designed for washing your usual laundry. It automatically adjusts the washing method and operating time based on various conditions such as the type of detergent, type of clothing, water hardness, and load size. You should keep to the load limit specified for your model, which is 12 kg for the BD-D120XGV/BD-D120GV model and 10 kg for the BD-D100XGV/BD-D100GV model.'}
```

```python
rag_chain_with_source.invoke("For AI wash program what is suggested spin setting for washing?")
```

```text
{'documents': [{'Header 2': 'To set and unset fabric softener'},
  {'Header 2': 'Hints and Tips for Eco-friendly Washing'},
  {'Header 2': 'Operation:'}],
 'answer': 'The provided context does not specify the exact suggested spin setting for the AI Wash program. However, it mentions that the washer dryer has multiple sensors to automatically control the washing method and operating time based on various conditions. To set the program, one would select the AI Wash program and start the operation, allowing the machine to adjust settings automatically.'}
```