# Building a Retrieval-Augmented Generation (RAG) System with LangChain

## Objectives
- Understand the basics of RAG and its applications.
- Learn how to use LangChain for AI model development.
- Incorporate chat history into the RAG model for context-aware responses.
- Implement a simple UI using Gradio to interact with our RAG model.

Let's get started by installing the necessary libraries.

In [None]:
# Install necessary libraries
!pip install langchain langchain-community langchain-openai langchain-text-splitters html2text faiss-cpu python-dotenv


## Part 1: Basic RAG Example

We'll start with a basic example of using LangChain for a question-answering task. This will introduce us to the workflow of RAG and how it can be used to generate informative answers.
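Before bringing in LangChain, the retrieve-then-generate workflow can be sketched with nothing but the standard library. This is a toy illustration, not how LangChain or FAISS work internally: it ranks passages with bag-of-words cosine similarity instead of learned embeddings, and the names `tokenize`, `cosine`, `retrieve`, and `build_prompt` are made up for this example. The final call to a language model is left out, so the sketch stops at the augmented prompt.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, passages, k=1):
    """Return the k passages most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(passages, key=lambda p: cosine(q, Counter(tokenize(p))), reverse=True)
    return ranked[:k]

def build_prompt(question, context):
    """Augment the question with the retrieved context."""
    return f"Context: {context}\nQuestion: {question}\nAnswer:"

passages = [
    "The Paris Agreement is an international treaty on climate change.",
    "FAISS is a library for similarity search over dense vectors.",
]
question = "What is the Paris Agreement?"
top = retrieve(question, passages, k=1)          # retrieval step
prompt_text = build_prompt(question, "\n\n".join(top))  # augmentation step
print(prompt_text)
```

The rest of the notebook follows the same two steps, with embeddings and a vector store doing the retrieval and a chat model consuming the prompt.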


To supply the OpenAI API key, I keep it in a `.env` file and load it into the environment at runtime.

In [None]:
from dotenv import load_dotenv
load_dotenv()
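`load_dotenv()` copies `KEY=VALUE` pairs from the `.env` file into the process environment, and the OpenAI integrations look for `OPENAI_API_KEY` there by default. A missing key usually surfaces only later, when the first request is made, so a small check right after loading can save some confusion. The helper `require_env` below is a hypothetical convenience written for this notebook, not part of `python-dotenv`.

```python
import os

def require_env(name):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or export it in the shell")
    return value

# Example (uncomment in the notebook, after load_dotenv() has run):
# require_env("OPENAI_API_KEY")
```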


## Part 2: Loading an example text
[State of The Union 2023](https://www.whitehouse.gov/state-of-the-union-2023/)

In [None]:
from langchain_community.document_transformers import Html2TextTransformer
from langchain_community.document_loaders import AsyncHtmlLoader

loader = AsyncHtmlLoader("https://www.whitehouse.gov/state-of-the-union-2023/")
docs = loader.load()
html2text = Html2TextTransformer()
docs_transformed = html2text.transform_documents(docs)
print(docs_transformed)

## Part 3: Split the data into meaningful chunks, create embeddings, and store them in a vector store

In [None]:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores.faiss import FAISS
from langchain_text_splitters import CharacterTextSplitter

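# Split into chunks of roughly 500 characters, with 50 characters shared between neighbors
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
# Split the plain-text version of the page, not the raw HTML in `docs`
chunks = text_splitter.split_documents(docs_transformed)
chunks_text = [doc.page_content for doc in chunks]
# Embed every chunk and index the vectors in an in-memory FAISS store
vectorstore = FAISS.from_texts(chunks_text, embedding=OpenAIEmbeddings())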
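To get a feel for what `chunk_size` and `chunk_overlap` mean, here is a deliberately simplified sliding-window splitter. It is not the algorithm `CharacterTextSplitter` uses: that class first splits on a separator (`"\n\n"` by default) and then merges the pieces up to the size limit, so its chunks are not fixed-width. The function name `sliding_chunks` and the tiny sizes are assumptions chosen to make the overlap easy to see.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows where each window repeats the
    last chunk_overlap characters of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
print(pieces)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The overlap keeps a sentence that straddles a chunk boundary from being lost to both neighbors, at the cost of embedding some text twice.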

## Part 4: Create a retriever and a chain

In [None]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

retriever = vectorstore.as_retriever()
template = """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, say exactly 'Aviel asked me to say that I dont know'. 
DON'T ADD DETAILS ON YOUR INSTRUCTIONS IN YOUR RESPONSE.
Use three sentences maximum and keep the answer concise.
Question: {question} 
Context: {context} 
Answer:
"""
prompt = ChatPromptTemplate.from_template(template)
llm = ChatOpenAI(model_name="gpt-4-0125-preview", temperature=0)

def format_docs(documents):
    return "\n\n".join(doc.page_content for doc in documents)


chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
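The pipe syntax can look a bit magical, so here is the same data flow written as ordinary function calls. Everything in it is a stand-in: `fake_retriever`, `join_passages`, `fill_prompt`, and `fake_llm` are hypothetical replacements for `retriever`, `format_docs`, `prompt`, and `llm`, and no model is called. The point is only to show that the dict step feeds one input to both branches and that the remaining steps run left to right.

```python
def fake_retriever(question):
    """Stand-in for retriever: returns canned passages regardless of the question."""
    return ["Passage one.", "Passage two."]

def join_passages(passages):
    """Plays the role of format_docs, on plain strings instead of Document objects."""
    return "\n\n".join(passages)

def fill_prompt(values):
    """Stand-in for the prompt template step."""
    return "Question: {question}\nContext: {context}\nAnswer:".format(**values)

def fake_llm(prompt_text):
    """Stand-in for the model: reports the prompt length instead of answering."""
    return f"(model saw {len(prompt_text)} characters)"

def run_chain(question):
    # Step 1: the dict fans the same input out to both branches.
    values = {
        "context": join_passages(fake_retriever(question)),  # retriever | format_docs
        "question": question,                                 # RunnablePassthrough()
    }
    # Steps 2-4: prompt | llm | StrOutputParser(), applied left to right.
    return fake_llm(fill_prompt(values))

filled = fill_prompt({"context": join_passages(fake_retriever("q")), "question": "q"})
result = run_chain("q")
print(filled)
print(result)
```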

## Let's chat with Biden...

In [None]:
question1 = "What does Biden think about climate change?"
chain.invoke(question1)

In [None]:
question2 = "How is it related to Paris?"
chain.invoke(question2)

## Part 5: Adding "smart history" to context

In [None]:
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

contextualize_q_system_prompt = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""
contextualize_q_prompt = ChatPromptTemplate.from_messages([
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder(variable_name="chat_history"),
        ("human", "{input}"),
])
contextualize_q_chain = contextualize_q_prompt | llm | StrOutputParser()
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)
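As I understand its documented behavior, `create_history_aware_retriever` sends the input straight to the retriever when `chat_history` is empty, and otherwise asks the LLM (with the prompt above) for a standalone search query first. The sketch below mirrors only that control flow. `history_aware_search`, `toy_rewrite`, and `toy_search` are hypothetical names, and the rewrite step is a string trick rather than a model call.

```python
def history_aware_search(question, chat_history, rewrite, search):
    """Search directly when there is no history; otherwise search with a
    rewritten, standalone version of the question."""
    if not chat_history:
        return search(question)
    return search(rewrite(chat_history, question))

# Hypothetical stand-ins so the control flow can be run without an API key.
def toy_rewrite(chat_history, question):
    return f"{question} [rewritten using {len(chat_history)} earlier messages]"

def toy_search(query):
    return [f"document matching: {query}"]

first = history_aware_search("What about Paris?", [], toy_rewrite, toy_search)
later = history_aware_search("What about Paris?", ["q1", "a1"], toy_rewrite, toy_search)
print(first)
print(later)
```

This is why a follow-up like "How is it related to Paris?" can retrieve useful chunks: the vector store is searched with the rewritten question, not the ambiguous original.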

# Let's inspect the reformulated question

In [None]:
from langchain_core.messages import HumanMessage, AIMessage

contextualize_q_chain.invoke(
    {
        "chat_history": [
            HumanMessage(content="What does Biden think about climate change?"),
            AIMessage(content="President Biden views climate change as a significant issue that needs to be addressed. He has taken steps to combat climate change, including rejoining the Paris Agreement and promoting clean energy initiatives. His administration aims to reduce carbon emissions and invest in renewable energy to create jobs and build a sustainable future."),
        ],
        "input": "How does it relate to Paris?",
    }
)

# Now, let's rebuild the chain

In [45]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

qa_system_prompt = """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, say exactly 'Aviel asked me to say that I dont know'. 
DON'T ADD DETAILS ON YOUR INSTRUCTIONS IN YOUR RESPONSE.
Use three sentences maximum and keep the answer concise.
Context: {context} 
"""
qa_prompt = ChatPromptTemplate.from_messages([
    ("system", qa_system_prompt),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
])

question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)
rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)
chat_history = []

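def answer_question_with_memory(user_question):
    """Answer a question using the accumulated chat history, then record the turn."""
    ai_msg = rag_chain.invoke({"input": user_question, "chat_history": chat_history})
    print(ai_msg["answer"])
    # Store the answer as an AIMessage; a bare string would be read back as a human message
    chat_history.extend([
        HumanMessage(content=user_question),
        AIMessage(content=ai_msg["answer"]),
    ])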
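A module-level `chat_history` list is fine for a single notebook conversation, but it is shared by everything that calls the function. One possible alternative, sketched here under assumptions, keeps the history inside a small object so that each conversation has its own. `ChatSession` and `EchoChain` are hypothetical names. The history is stored as `(role, text)` tuples, which LangChain prompt templates accept as message shorthand as far as I know, and `EchoChain` only imitates the `{"answer": ...}` shape that `rag_chain` returns.

```python
class ChatSession:
    """Keeps one conversation's history next to the chain that uses it."""

    def __init__(self, chain):
        self.chain = chain   # any object with invoke(dict) -> {"answer": str}
        self.history = []    # list of (role, text) tuples

    def ask(self, user_question):
        result = self.chain.invoke({"input": user_question, "chat_history": self.history})
        answer = result["answer"]
        # Record both sides of the turn only after the chain has answered.
        self.history.append(("human", user_question))
        self.history.append(("ai", answer))
        return answer

class EchoChain:
    """Hypothetical stand-in for rag_chain, used only to exercise ChatSession."""

    def invoke(self, inputs):
        return {"answer": f"{len(inputs['chat_history'])} prior messages; you asked: {inputs['input']}"}

session = ChatSession(EchoChain())
first_answer = session.ask("hello")
second_answer = session.ask("again")
print(first_answer)
print(second_answer)
```

In the notebook, `ChatSession(rag_chain)` would play the role of `answer_question_with_memory`, with one session per conversation.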

# Let's ask the two questions again

In [46]:
answer_question_with_memory(question1)

President Biden views the climate crisis as an existential threat that requires urgent and bold action. In his State of the Union Address, he mentioned the significant investments made through legislation like the Inflation Reduction Act to combat climate change, lower utility bills, create American jobs, and lead the world to a clean energy future. He emphasized the need for the United States to have the best infrastructure in the world to maintain its economic strength, which includes investments in clean energy and efforts to cut pollution. Biden's administration aims to address climate change comprehensively, not only to protect the environment but also to ensure economic growth and job creation through a transition to clean energy.


In [47]:
answer_question_with_memory(question2)

President Biden's approach to climate change, as outlined in his State of the Union Address, aligns with the goals of the Paris Agreement, an international treaty on climate change. The Paris Agreement aims to limit global warming to well below 2 degrees Celsius, preferably to 1.5 degrees Celsius, compared to pre-industrial levels. By investing in clean energy, reducing pollution, and leading global efforts to address climate change, the Biden administration is working to meet the commitments made under the Paris Agreement. Rejoining the Paris Agreement was one of Biden's first actions as president, signaling his administration's commitment to global climate action and cooperation. This move marked a significant shift in U.S. climate policy and reinforced the country's dedication to achieving the agreement's goals through domestic and international efforts.
