# Implementing Basic RAG QA with LangChain
This notebook demonstrates how to implement a basic Retrieval-Augmented Generation (RAG) chain over our FAQ data. The overall approach is as follows:
1. Load the FAQ data and structure its metadata.
2. Create embeddings for the FAQ data using Amazon's Titan Embeddings model, and save them locally using Chroma.
3. Create a question-answering (QA) chain that retrieves the FAQ entries most similar to the user's question from Chroma and passes them as context to an LLM on Amazon Bedrock (Anthropic's Claude 2.1 in the code below) to answer the question.
4. Format the response so that source materials can be cited.

This implementation mostly follows this LangChain tutorial:
- https://python.langchain.com/docs/modules/data_connection/document_loaders/json#using-jsonloader

In [1]:
# Define metadata extraction function so we can filter on sections and return deep links as sources
def metadata_func(record: dict, metadata: dict) -> dict:
    metadata["section"] = record.get("section")
    metadata["source"] = record.get("deep_link")
    
    return metadata
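
To make the effect of `metadata_func` concrete, here is a minimal sketch that calls it on a hand-written record. The record contents and the default metadata (`source` set to a file path, plus `seq_num`) are assumptions for illustration: the real FAQ file is not shown in this notebook, and the defaults reflect how `JSONLoader` is understood to pre-populate metadata before calling the function.

```python
def metadata_func(record: dict, metadata: dict) -> dict:
    metadata["section"] = record.get("section")
    metadata["source"] = record.get("deep_link")
    return metadata

# Hypothetical FAQ record; the field values are illustrative only.
record = {
    "question": "How do I register?",
    "answer": "Visit the provincial office.",
    "section": "registration",
    "deep_link": "https://example.org/faq?question=1",
}

# Assumed loader defaults: the file path as "source" and a sequence number.
default_metadata = {"source": "/tmp/faq.json", "seq_num": 1}

# The file path in "source" is overwritten by the record's deep link.
enriched = metadata_func(record, dict(default_metadata))
print(enriched)

# dict.get returns None for a missing key, so a record without
# "deep_link" ends up with source=None rather than raising an error.
incomplete = metadata_func({"answer": "..."}, dict(default_metadata))
print(incomplete["source"])
```

The second call shows why any code that later reads `metadata["source"]` should be prepared for a missing link.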

In [2]:
# Load the FAQ JSON file; each array element becomes one document whose text is its "answer" field
from langchain_community.document_loaders import JSONLoader

file_path = '../data/processed/faq_data/PS_NSYR.json'

loader = JSONLoader(
    file_path=file_path,
    jq_schema=".[]",
    content_key="answer",
    metadata_func=metadata_func
)

data = loader.load()
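
The combination of `jq_schema=".[]"` and `content_key="answer"` can be pictured with a plain-Python sketch. This is not the loader's implementation, only an approximation of its result under the assumption that the file is a top-level JSON array of records with `answer`, `section`, and `deep_link` fields. The sample records and the `load_faq_records` helper are hypothetical.

```python
import json

# Hypothetical stand-in for the FAQ file: a top-level JSON array of records.
raw = """[
  {"question": "Q1", "answer": "A1", "section": "s1", "deep_link": "https://example.org/1"},
  {"question": "Q2", "answer": "A2", "section": "s2", "deep_link": "https://example.org/2"}
]"""

def load_faq_records(text, content_key="answer"):
    """Approximate jq_schema='.[]': build one document per array element."""
    docs = []
    for seq_num, record in enumerate(json.loads(text), start=1):
        docs.append({
            # Only the chosen key becomes the document text.
            "page_content": record[content_key],
            "metadata": {
                "seq_num": seq_num,
                "section": record.get("section"),
                "source": record.get("deep_link"),
            },
        })
    return docs

docs = load_faq_records(raw)
for doc in docs:
    print(doc["page_content"], doc["metadata"]["source"])
```

One consequence worth noting: with `content_key="answer"`, only the answer text is embedded, so the FAQ question itself does not contribute to retrieval.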

In [3]:
from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = BedrockEmbeddings(
    model_id="amazon.titan-embed-text-v1", region_name="us-east-1"
)
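# Note: this path starts at './', whereas file_path above starts at '../'; adjust if both should share a root
persistDirectory = './data/processed/faq_data/vectordata/PS_NSYR'
vectorstore = Chroma.from_documents(
    documents=data, embedding=embeddings, persist_directory=persistDirectory)
# An explicit persist() is needed on older Chroma releases; newer ones write to disk automatically
vectorstore.persist()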

If the embeddings have already been generated, skip the previous cell and start from the cell below, which reloads them from disk.

In [4]:
from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = BedrockEmbeddings(
    model_id="amazon.titan-embed-text-v1", region_name="us-east-1"
)

persistDirectory = './data/processed/faq_data/vectordata/PS_NSYR'
# Load the persisted Chroma store and expose it as a retriever for the QA chain
vectordb = Chroma(persist_directory=persistDirectory,
                  embedding_function=embeddings)

retriever = vectordb.as_retriever()
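
Conceptually, the retriever embeds the question and returns the stored documents whose vectors are closest to it (four by default, as far as the LangChain defaults are understood here). The sketch below illustrates that idea with toy 3-dimensional vectors and cosine similarity. It is an illustration only: Chroma uses its own index and distance metric, and the `top_k` helper and sample texts are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, indexed, k=4):
    """Return the k (score, text) pairs most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

# Toy vectors standing in for real Titan embeddings.
indexed = [
    ("registration steps", [0.9, 0.1, 0.0]),
    ("housing and rent", [0.0, 0.2, 0.9]),
    ("deportation rights", [0.7, 0.6, 0.1]),
]

results = top_k([1.0, 0.0, 0.0], indexed, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Because retrieval always returns the nearest neighbours, loosely related entries can appear among the sources when few documents match the question well.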

In [5]:
from utils import CreateInferenceModifier  # Project-local helper (utils.py); used below to build model_kwargs
from langchain_community.llms import Bedrock
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate

prompt_template = """Drawing from your knowledge base, answer the question below.
If you don't know the answer from the provided context, tell me that your training materials don't include this information.
Keep the answer as concise and relevant to the question as possible.

Knowledge Base: {context}

Question: {question}

Helpful Answer:"""
custom_rag_prompt = PromptTemplate.from_template(prompt_template)


# Inference parameters, translated into model-specific kwargs by CreateInferenceModifier
modifiers = {
    "max_tokens": 4000,
    "temperature": 0.2,
    "top_k": 250,
    "top_p": 1,
    "stop_sequences": ["\n\nHuman"],
}


llm = Bedrock(model_id="anthropic.claude-v2:1",
              model_kwargs=CreateInferenceModifier("claude", params=modifiers))

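def format_docs(docs):
    # Join the text of the retrieved documents into a single context string
    return "\n\n".join(doc.page_content for doc in docs)

from langchain_core.runnables import RunnableParallel

# Generate an answer from documents that have already been retrieved
rag_chain_from_docs = (
    RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))
    | custom_rag_prompt
    | llm
    | StrOutputParser()
)

# Retrieve documents, then keep them alongside the answer so sources can be cited
rag_chain_with_source = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)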
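
The data flow through `rag_chain_with_source` can be traced with a plain-Python sketch that swaps the retriever and the Bedrock model for stubs. Everything named `fake_*`, the shortened `TEMPLATE`, and `run_chain` are hypothetical stand-ins, and documents are plain dicts rather than LangChain `Document` objects. The aim is only to show which keys exist at each step.

```python
def fake_retriever(question):
    # Stand-in for the Chroma retriever; returns hypothetical documents.
    return [
        {"page_content": "Register with the provincial office.",
         "source": "https://example.org/1"},
        {"page_content": "Applicants cannot be deported while under review.",
         "source": "https://example.org/2"},
    ]

def fake_llm(prompt):
    # Stand-in for the model call; reports the prompt size instead of generating text.
    return f"(answer based on a {len(prompt)}-character prompt)"

TEMPLATE = "Knowledge Base: {context}\n\nQuestion: {question}\n\nHelpful Answer:"

def format_docs(docs):
    return "\n\n".join(doc["page_content"] for doc in docs)

def run_chain(question):
    # Step 1 (the RunnableParallel part): gather retrieved docs and the raw question.
    state = {"context": fake_retriever(question), "question": question}
    # Step 2 (the .assign part): compute the answer from a formatted copy of the
    # context, leaving the original documents in place for later citation.
    prompt = TEMPLATE.format(
        context=format_docs(state["context"]), question=state["question"]
    )
    state["answer"] = fake_llm(prompt)
    return state

result = run_chain("What steps are required?")
print(sorted(result))
```

The returned dictionary carries `context`, `question`, and `answer` together, which is what lets the next cell read source URLs from the same response that holds the answer.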

In [6]:
# Define a function to extract the unique source URLs from the retrieved documents.
from IPython.display import Markdown

def extract_unique_urls(response):
    unique_urls = set()  # A set drops duplicate URLs (its iteration order is arbitrary)

    # Collect the 'source' URL of each retrieved document in 'context'
    for document in response['context']:
        source_url = document.metadata.get('source')
        if source_url:  # Skip documents whose record had no deep link
            unique_urls.add(source_url)

    # Join the unique URLs into a single '; '-separated string
    urls_string = '; '.join(unique_urls)

    return urls_string


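# Invoke the chain and render the answer and its sources.
# The query is in Pashto; approximate English: "What steps must I take to apply for asylum in Turkey?"
response = rag_chain_with_source.invoke(
    "زه به د ترکیې پناه غوښتنې لپاره کوم ګامونه اخیستل ضروري دي؟")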
answer = response["answer"]
prettyResponse = f"## Answer: \n\n {answer} \n\n ## Sources: \n\n {extract_unique_urls(response)}"
Markdown(prettyResponse)

## Answer: 

  زما د زده کړې موادو له مخې، په ترکیه کې د پناه غوښتنې لپاره باید دا ګامونه واخیستل شي:

<p><span style="font-weight: 400;">که تاسو د سوریې هیواد اتباع یاست نو د لنډمهاله خوندیتوب لپاره ځان راجستر کړئ او که چیرته تاسو د سوریې پرته د کوم بل هیواد اتباع یاست نو باید د نړیوال خوندیتوب لپاره ځان راجستر کړئ.</span></p>

<p><span style="font-weight: 400;">په ترکیه کې د لنډمهاله خوندیتوب یا نړیوال خوندیتوب لپاره د غوښتنلیک سپارلو وروسته، تاسو حق لرئ چې په جبري توګه بیرته خپل اصلي هیواد ته نه واستول شئ (یا ډیپورټ نه شئ). دا پداسې حال کې ده چې ستاسو غوښتنلیک ته به بیاکتنه وشي او تر هغه چې دا په پای کې رد شوی نه وي.</span></p>

<p><span style="font-weight: 400;">په ورته ډول، که تاسو د لنډمهاله محافظت یا نړیوال محافظت لاندې نوم لیکنه وکړئ تاسو حق لرئ چې په قانوني توګه په ترکیه کې د لنډمهاله محافظت پیژندنې سند یا د نړیوال محافظت غوښتونکي سند سره پاتې شئ.</span></p> 

 ## Sources: 

 https://multecihaklari.info/ps/services/%d8%af-%d8%a8%db%90-%d8%b3%d8%b1%d9%be%d8%b1%d8%b3%d8%aa%d9%87-%da%a9%d9%88%da%86%d9%86%db%8c%d8%a7%d9%86%d9%88-%d9%84%d9%be%d8%a7%d8%b1%d9%87-%d8%ad%d9%82%d9%88%d9%82-%d8%a7%d9%88-%d9%be%d8%b1%d9%88/?section=questions&question=18; https://multecihaklari.info/ps/services/%d8%af-%d8%a8%db%90-%d8%b3%d8%b1%d9%be%d8%b1%d8%b3%d8%aa%d9%87-%da%a9%d9%88%da%86%d9%86%db%8c%d8%a7%d9%86%d9%88-%d9%84%d9%be%d8%a7%d8%b1%d9%87-%d8%ad%d9%82%d9%88%d9%82-%d8%a7%d9%88-%d9%be%d8%b1%d9%88/?section=questions&question=21; https://multecihaklari.info/ps/services/%d8%af-%d8%a8%db%90-%d8%b3%d8%b1%d9%be%d8%b1%d8%b3%d8%aa%d9%87-%da%a9%d9%88%da%86%d9%86%db%8c%d8%a7%d9%86%d9%88-%d9%84%d9%be%d8%a7%d8%b1%d9%87-%d8%ad%d9%82%d9%88%d9%82-%d8%a7%d9%88-%d9%be%d8%b1%d9%88/?section=questions&question=11; https://multecihaklari.info/ps/services/rent-and-property-6/?section=questions&question=17

In [7]:
def query_asylum_system(question):
    """Run the RAG chain on a question and return a Markdown string with the answer and its sources."""
    # Invoke the RAG chain with the provided question
    response = rag_chain_with_source.invoke(question)

    # Extract the answer from the response
    answer = response["answer"]

    # Format the answer and its sources as Markdown
    pretty_response = f"## Answer: \n\n{answer}\n\n## Sources:\n\n{extract_unique_urls(response)}"
    return pretty_response
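
The formatting step can be exercised without calling Bedrock by feeding it a hand-built response. The sketch below is a variant for illustration, not the notebook's own function: it de-duplicates with `dict.fromkeys` so that URLs keep their retrieval order, and it skips missing links. The `stub_response` contents are hypothetical, and `SimpleNamespace` stands in for a LangChain `Document`.

```python
from types import SimpleNamespace

def extract_unique_urls(response):
    # dict.fromkeys removes duplicates while preserving first-seen order.
    urls = dict.fromkeys(
        doc.metadata.get("source") for doc in response["context"]
    )
    # Drop entries whose source is missing (None or empty).
    return "; ".join(url for url in urls if url)

# Hypothetical chain response with a duplicate and a missing source.
stub_response = {
    "question": "example question",
    "answer": "example answer",
    "context": [
        SimpleNamespace(metadata={"source": "https://example.org/a"}),
        SimpleNamespace(metadata={"source": "https://example.org/b"}),
        SimpleNamespace(metadata={"source": "https://example.org/a"}),
        SimpleNamespace(metadata={"source": None}),
    ],
}

sources = extract_unique_urls(stub_response)
pretty = f"## Answer: \n\n{stub_response['answer']}\n\n## Sources:\n\n{sources}"
print(pretty)
```

Keeping retrieval order means the most similar document's link is listed first, which may be preferable to the arbitrary order of a set when the sources are shown to users.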

In [9]:
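# Pashto query; approximate English: "How long does the asylum application process in Turkey usually take?"
Markdown(query_asylum_system(
    "په ترکیه کې د پناه غوښتنلیک پروسه عموما څومره وخت نیسي؟"))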

## Answer: 

 <p><span style="font-weight: 400;">په ترکیه کې د پناه غوښتنلیک پروسه کېدای شي چې څو میاشتې یا کلونه ونیسي، په تابع د ستاسو د هیواد او شخصي حالت سره.</span></p>

<p><span style="font-weight: 400;">که چیرې تاسو د سوریې اتباع یاست، تاسې به د لنډمهاله خوندیتوب لپاره غوښتنلیک وړاندې کړئ. د لنډمهاله خوندیتوب د غوښتنلیک پروسه کېدای شي چې څو میاشتې ونیسي.</span></p>

<p><span style="font-weight: 400;">که چیرې تاسو د سوریې پرته د نورو هیوادونو اتباع یاست، تاسو به د نړیوال خوندیتوب لپاره غوښتنلیک وړاندې کړئ. د نړیوال خوندیتوب د غوښتنلیک پروسه کېدای شي چې څو کلونه ونیسي.</span></p>

<p><span style="font-weight: 400;">په عمومي توګه، د پناه غوښتنې پروسه کېدای شي چې لنډه وي که چیرې ستاسو غوښتنلیک ډېر قوي مدارک ولري او ستاسو د هیواد حالت داسې وي چې تاسو ته د پناه ورکولو د کنوانسیون د معیارونو سره سمون وکړي. برعکس، که چیرې ستاسو حالت پیچلی وي او/یا ستاسو غوښتنلیک کمزوری مدارک ولري، د پروسې موده کېدای شي چې اوږده وي.</span></p>

<p><span style="font-weight: 400;">په هر صورت کې، تاسو کولای شئ د خپلې پروسې پر مهال د خپل مشاور څخه منظمه معلومات ترلاسه کړئ. دوی کولی شي ستاسو سره مرسته وکړي چې پوه شئ ستاسو پروسه چیرته پر مخ وړي او کوم احتمالي ګامونه باید واخیستل شي.</span></p>

## Sources:

https://multecihaklari.info/ps/services/%d8%af-%d8%a8%db%90-%d8%b3%d8%b1%d9%be%d8%b1%d8%b3%d8%aa%d9%87-%da%a9%d9%88%da%86%d9%86%db%8c%d8%a7%d9%86%d9%88-%d9%84%d9%be%d8%a7%d8%b1%d9%87-%d8%ad%d9%82%d9%88%d9%82-%d8%a7%d9%88-%d9%be%d8%b1%d9%88/?section=questions&question=18; https://multecihaklari.info/ps/services/%d8%af-%d8%a8%db%90-%d8%b3%d8%b1%d9%be%d8%b1%d8%b3%d8%aa%d9%87-%da%a9%d9%88%da%86%d9%86%db%8c%d8%a7%d9%86%d9%88-%d9%84%d9%be%d8%a7%d8%b1%d9%87-%d8%ad%d9%82%d9%88%d9%82-%d8%a7%d9%88-%d9%be%d8%b1%d9%88/?section=questions&question=21; https://multecihaklari.info/ps/services/%d8%af-%d8%a8%db%90-%d8%b3%d8%b1%d9%be%d8%b1%d8%b3%d8%aa%d9%87-%da%a9%d9%88%da%86%d9%86%db%8c%d8%a7%d9%86%d9%88-%d9%84%d9%be%d8%a7%d8%b1%d9%87-%d8%ad%d9%82%d9%88%d9%82-%d8%a7%d9%88-%d9%be%d8%b1%d9%88/?section=questions&question=15; https://multecihaklari.info/ps/services/%d8%af-%d8%a8%db%90-%d8%b3%d8%b1%d9%be%d8%b1%d8%b3%d8%aa%d9%87-%da%a9%d9%88%da%86%d9%86%db%8c%d8%a7%d9%86%d9%88-%d9%84%d9%be%d8%a7%d8%b1%d9%87-%d8%ad%d9%82%d9%88%d9%82-%d8%a7%d9%88-%d9%be%d8%b1%d9%88/?section=questions&question=12