# Retrieval Augmented Generation

Retrieval Augmented Generation (RAG) combines a retrieval step with a generative model: relevant passages are fetched from a document store and passed to the model as context, so that answers are grounded in the source documents. In this notebook, we use LangChain to build a conversational RAG system that stores PDF documents as embeddings in a vector store and retrieves them by similarity.

## Importing the Libraries

In [1]:
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_community.vectorstores import FAISS
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.chains import create_history_aware_retriever
from langchain_core.messages import HumanMessage
import os
import asyncio
import nest_asyncio

## Allow nested asynchronous loops

Jupyter notebooks already run an event loop in the background, so calling `asyncio.run()` from a cell normally raises a `RuntimeError`. `nest_asyncio.apply()` patches the loop to allow nested use, which lets asynchronous code run within a notebook cell even though the loop is already active.
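
The failure that `nest_asyncio` works around can be reproduced with the standard library alone. The following is a minimal sketch, not part of the notebook's pipeline: `inner` and `outer` are hypothetical names, and `outer` stands in for a notebook cell that executes while an event loop is already active.

```python
import asyncio

async def inner():
    return "done"

async def outer():
    # We are now inside a running event loop, as a notebook cell would be.
    coro = inner()
    try:
        asyncio.run(coro)  # nested call: rejected by plain asyncio
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)
    return "no error"

message = asyncio.run(outer())
print(message)
```

Run as a plain script, this prints the error text from the nested call rather than `no error`.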

In [2]:
nest_asyncio.apply()

## The `get_conversational_answer` Function

### Contextualizing the Question
- The function starts by defining `contextualize_q_system_prompt`, a system instruction that tells the model to reformulate the user's question using the chat history. This ensures that questions referencing earlier turns are rewritten as standalone questions that can be understood without that history.
- The prompt is then passed to a `ChatPromptTemplate`, which organizes the messages for the language model. The template contains the system message, a placeholder for the chat history, and the user input.
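
The message order the template produces can be sketched without LangChain. This is an illustrative stand-in only: `build_messages` is a hypothetical helper, and plain `(role, content)` tuples take the place of LangChain message objects.

```python
def build_messages(system_prompt, chat_history, user_input):
    """Return messages in the order: system, prior turns, latest question."""
    return [("system", system_prompt), *chat_history, ("human", user_input)]

history = [
    ("human", "What is FAISS?"),
    ("ai", "A library for vector similarity search."),
]
messages = build_messages(
    "Rewrite the latest question so it stands alone.",
    history,
    "Who develops it?",
)
for role, content in messages:
    print(f"{role}: {content}")
```

The chat history sits between the system instruction and the latest question, so the model sees the earlier turns before the question it is asked to rewrite.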

### Creating a History-Aware Retriever
- `mistral:7b` is initialized as the LLM.
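- `create_history_aware_retriever` is then called with this LLM, a retriever (a component that fetches relevant documents), and the contextualizing prompt. The resulting retriever first rewrites the question using the chat history and then fetches documents for the rewritten question, so follow-up questions retrieve the right context.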
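
The control flow of a history-aware retriever can be sketched in plain Python. This is a simplified model under stated assumptions, not LangChain's implementation: `history_aware_retrieve`, `fake_rewrite`, and `fake_retrieve` are hypothetical names, and the sketch assumes the question is passed through unchanged when there is no history yet.

```python
def history_aware_retrieve(rewrite, retrieve, user_input, chat_history):
    """Rewrite the question only when there is history to resolve against."""
    if chat_history:
        query = rewrite(chat_history, user_input)
    else:
        query = user_input
    return retrieve(query)

# Hypothetical stand-ins for the LLM rewrite step and the vector-store lookup.
def fake_rewrite(chat_history, user_input):
    first_question = chat_history[0][1]
    return f"{user_input} [context: {first_question}]"

def fake_retrieve(query):
    return [f"passage for: {query}"]

first = history_aware_retrieve(fake_rewrite, fake_retrieve, "What is FAISS?", [])
history = [("human", "What is FAISS?"), ("ai", "A similarity search library.")]
follow_up = history_aware_retrieve(fake_rewrite, fake_retrieve, "Who develops it?", history)
print(first)
print(follow_up)
```

The follow-up "Who develops it?" is ambiguous on its own; only the rewritten query carries enough information for the lookup to find relevant passages.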

### Setting Up the Question-Answering (QA) System Prompt
- The `qa_system_prompt` is another system message. It directs the assistant to answer concisely from the retrieved context and to say that it does not know when it cannot answer.
- A second `ChatPromptTemplate` is created to format these QA instructions, integrating context, chat history, and user input.

### Creating the RAG Chain
- A `question_answer_chain` is created using `create_stuff_documents_chain`, which combines the LLM and the QA prompt. This chain processes the retrieved documents (context) and provides answers.
- Next, `create_retrieval_chain` links the history-aware retriever and the question-answering chain to form a RAG pipeline. The pipeline retrieves relevant context from documents and uses it to generate concise and precise answers.
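
The two steps above can be sketched with stand-ins for the retriever and the LLM. This is a simplified illustration, not LangChain's code: `stuff_documents`, `rag_answer`, `fake_retrieve`, and `fake_generate` are hypothetical names, and the blank-line separator between passages is an assumption.

```python
QA_TEMPLATE = "Use the following context to answer the question.\n\n{context}"

def stuff_documents(docs, separator="\n\n"):
    """Join every retrieved passage into one context string ("stuffing")."""
    return separator.join(docs)

def rag_answer(retrieve, generate, user_input):
    docs = retrieve(user_input)  # retrieval step
    system_prompt = QA_TEMPLATE.format(context=stuff_documents(docs))
    answer = generate(system_prompt, user_input)  # generation step
    return {"input": user_input, "context": docs, "answer": answer}

# Hypothetical stand-ins for the retriever and the LLM.
def fake_retrieve(query):
    return ["Orders ship within two days.", "Returns are accepted for 30 days."]

def fake_generate(system_prompt, question):
    return f"(model reply to {question!r} given {len(system_prompt)} prompt characters)"

result = rag_answer(fake_retrieve, fake_generate, "How long do I have to return an item?")
print(result["answer"])
print(len(result["context"]), "passages retrieved")
```

"Stuffing" places every retrieved passage into a single prompt. That keeps the chain simple, but the combined passages must fit within the model's context window.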

### Generating the Answer
- The `rag_chain.invoke` method is called with the user input and chat history. It returns a dictionary (`ai_msg`) whose `"answer"` key holds the generated text, based on both the user's question and the retrieved documents; the retrieved documents themselves are available under `"context"`.


In [3]:
async def get_conversational_answer(retriever, input, chat_history):
    contextualize_q_system_prompt = """Given a chat history and the latest user question \
    which might reference context in the chat history, formulate a standalone question \
    which can be understood without the chat history. Do NOT answer the question, \
    just reformulate it if needed and otherwise return it as is."""
    contextualize_q_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", contextualize_q_system_prompt),
            MessagesPlaceholder("chat_history"),
            ("human", "{input}"),
        ]
    )

    llm = ChatOllama(model="mistral:7b")

    history_aware_retriever = create_history_aware_retriever(
        llm, retriever, contextualize_q_prompt
    )

    qa_system_prompt = """You are an assistant for question-answering tasks. \
    Use the following pieces of retrieved context to answer the question. \
    If you don't know the answer, just say that you don't know. \
    Use three sentences maximum and keep the answer concise.\
    Do not generate any additional text unless you are asked to.\
    Keep the answers really short and concise.\

    {context}"""
    qa_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", qa_system_prompt),
            MessagesPlaceholder("chat_history"),
            ("human", "{input}"),
        ]
    )

    question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)
    rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)
    ai_msg = rag_chain.invoke({"input": input, "chat_history": chat_history})
    return ai_msg

## The `main` Function 

The `main` function initiates the conversational question-answering chain based on PDF documents stored in the specified directory.

### PDF Document Loading
- The directory containing PDF files is specified (`./data`), and these documents are loaded using `PyPDFDirectoryLoader`.
- `documents` stores the loaded PDF data, which will be converted into embeddings for retrieval.

### Vector Store Initialization
- `OllamaEmbeddings` (using the "mistral:7b" model) is used to generate embeddings for the documents, which allows for semantic similarity searching.
- FAISS (Facebook AI Similarity Search) is an open-source library for efficient similarity search over dense vectors. It stores the document embeddings and returns those closest to the embedded user question.
- The vector store’s `as_retriever()` method provides a retriever object for retrieving relevant document chunks.
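
Embedding-based retrieval can be illustrated with a few hand-written vectors. This is a toy sketch under stated assumptions: the three-dimensional vectors are made up rather than produced by an embedding model, `top_k` is a hypothetical helper, and cosine similarity is used for clarity even though the FAISS index built by LangChain may rank by a different distance metric.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """Return the texts of the k stored vectors most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# (text, embedding) pairs; the numbers are invented for illustration.
store = [
    ("refund policy", [0.9, 0.1, 0.0]),
    ("shipping times", [0.1, 0.9, 0.0]),
    ("return window", [0.8, 0.2, 0.1]),
]
results = top_k([1.0, 0.0, 0.0], store, k=2)
print(results)
```

The linear scan here compares the query with every stored vector. Libraries such as FAISS exist to make the same lookup fast on large collections.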

### Conversation State Initialization
- `chat_history` is initialized as an empty list to store user inputs and assistant responses. This is later used for reformulating user questions to enable contextual question answering. 

### Interactive Question-Answer Loop
- A loop takes user input (prompt) to ask questions based on the uploaded PDF documents.
- The loop breaks if the user types "exit".

### Getting the AI Response
- The `get_conversational_answer` function is called using `asyncio.run()`, taking in the retriever, user prompt, and chat history to generate contextually relevant responses.
- The user's question and the AI's answer (`ai_msg["answer"]`) are appended to `chat_history` so that later questions can be interpreted in context.
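
How the history grows across turns can be sketched as follows. This is a simplified illustration: `record_turn` is a hypothetical helper, and `(role, content)` tuples stand in for message objects.

```python
def record_turn(chat_history, question, answer):
    """Append one question/answer exchange, keeping the roles explicit."""
    chat_history.extend([("human", question), ("ai", answer)])
    return chat_history

chat_history = []
record_turn(chat_history, "What is the refund policy?", "Refunds are issued within 30 days.")
record_turn(chat_history, "Does that include shipping?", "I don't know.")
print(len(chat_history), "messages stored")
```

Tagging each entry with its role matters: the model needs to tell its own earlier answers apart from the user's questions when it reformulates a follow-up.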

### Displaying the Assistant’s Response
- The assistant’s response is printed to the console.
- This loop continues until the user types in 'exit'. 


In [6]:
def main():
    # Specify the directory where the PDF is stored
    pdf_directory = "./data"

    # Load the PDF documents
    loader = PyPDFDirectoryLoader(pdf_directory)
    documents = loader.load()

    # Initialize the vector store using the embeddings model
    embed_model = OllamaEmbeddings(model='mistral:7b')
    vector_store = FAISS.from_documents(documents, embed_model)
    retriever = vector_store.as_retriever()

    # Initialize the conversation state
    chat_history = []

    while True:
        # Take user input for a question
        prompt = input("Ask your question based on the uploaded PDF (or type 'exit' to quit): ")

        if prompt.lower() == 'exit':
            print("Exiting the conversation.")
            break

        # Get the AI response using the retriever and chain
        ai_msg = asyncio.run(get_conversational_answer(retriever, prompt, chat_history))

        # Store the user input and AI response in the chat history
        # The answer is tagged with the "ai" role; a bare string would be read as a human message
        chat_history.extend([HumanMessage(content=prompt), ("ai", ai_msg["answer"])])

        # Display the assistant's response
        print("Assistant: ", ai_msg["answer"])

## Call the `main` function to initiate the chat

In [7]:
if __name__ == '__main__':
    main()

Ask your question based on the uploaded PDF (or type 'exit' to quit):  what are the things to know about customers



Assistant:  1. Understand Their Needs: Customers purchase products or services based on their needs, desires, and problems they want to solve. Understanding these needs is crucial for providing them with a product or service that meets their requirements.

2. Know Their Preferences: Every customer has unique preferences when it comes to products, services, and the way they are delivered. This can include everything from color choices, brand loyalty, payment methods, and more.

3. Demographics: Basic demographic information such as age, gender, income level, education level, occupation, and location can provide valuable insights into customer behavior, preferences, and purchasing power.

4. Psychographics: This refers to a customer's values, attitudes, interests, and lifestyle. Understanding these aspects can help businesses connect with customers on a deeper emotional level and tailor their marketing strategies accordingly.

5. Customer Journey: Understanding the steps a customer takes

Ask your question based on the uploaded PDF (or type 'exit' to quit):  exit


Exiting the conversation.


## Integration with Streamlit UI 

Run this cell to write the entire code to a Python file named `app.py`. Then open a new terminal and run `streamlit run app.py` to use the RAG system demonstrated above through an interactive UI.

In [None]:
%%writefile ./app.py

import streamlit as st
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_community.vectorstores import FAISS
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate,MessagesPlaceholder
from langchain.chains import create_history_aware_retriever
from langchain_core.messages import HumanMessage
import os
import asyncio


async def get_conversational_answer(retriever,input,chat_history):
    contextualize_q_system_prompt = """Given a chat history and the latest user question \
    which might reference context in the chat history, formulate a standalone question \
    which can be understood without the chat history. Do NOT answer the question, \
    just reformulate it if needed and otherwise return it as is."""
    contextualize_q_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", contextualize_q_system_prompt),
            MessagesPlaceholder("chat_history"),
            ("human", "{input}"),
        ]
    )


    llm = ChatOllama(model="mistral")

    history_aware_retriever = create_history_aware_retriever(
        llm, retriever, contextualize_q_prompt
    )

    qa_system_prompt = """You are an assistant for question-answering tasks. \
    Use the following pieces of retrieved context to answer the question. \
    If you don't know the answer, just say that you don't know. \
    Use three sentences maximum and keep the answer concise.\
    Do not generate any additional text unless you are asked to.\
    Keep the answers really short and concise.\

    {context}"""
    qa_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", qa_system_prompt),
            MessagesPlaceholder("chat_history"),
            ("human", "{input}"),
        ]
    )

    question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)
    rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)
    ai_msg = rag_chain.invoke({"input": input, "chat_history": chat_history})
    return  ai_msg


def main():
    st.header('Chat with your PDF')
    
    if "conversation" not in st.session_state:
        st.session_state.conversation = None

    if "activate_chat" not in st.session_state:
        st.session_state.activate_chat = False

    if "messages" not in st.session_state:
        st.session_state.messages = []
        st.session_state.chat_history=[]

    for message in st.session_state.messages:
        with st.chat_message(message["role"], avatar = message['avatar']):
            st.markdown(message["content"])

    embed_model = OllamaEmbeddings(model='mistral')

    with st.sidebar:
        st.subheader('Upload Your PDF File')
        docs = st.file_uploader('Upload your PDF & Click to process',accept_multiple_files = True, type=['pdf'])
        if st.button('Process'):
            # With accept_multiple_files=True the uploader returns a list, empty when nothing is uploaded
            if docs:
                os.makedirs('./data', exist_ok=True)
                for doc in docs:
                    save_path = os.path.join('./data', doc.name)
                    with open(save_path, 'wb') as f:
                        f.write(doc.getbuffer())
                    st.write(f'Processed file: {save_path}')

                with st.spinner('Processing'):
                    loader = PyPDFDirectoryLoader("./data")
                    documents = loader.load()
                    vector_store = FAISS.from_documents(documents, embed_model)
                    # Always replace the stored retriever so newly uploaded PDFs take effect
                    st.session_state.retriever = vector_store.as_retriever()
                    st.session_state.activate_chat = True

                # Delete uploaded PDF files after loading
                for doc in os.listdir('./data'):
                    os.remove(os.path.join('./data', doc))
            else:
                st.warning('Please upload at least one PDF first.')

    if st.session_state.activate_chat:
        if prompt := st.chat_input("Ask your question based on the uploaded PDF"):
            with st.chat_message("user", avatar = '👨🏻'):
                st.markdown(prompt)
            st.session_state.messages.append({"role": "user",  "avatar" :'👨🏻', "content": prompt})
            retriever = st.session_state.retriever

            ai_msg = asyncio.run(get_conversational_answer(retriever,prompt,st.session_state.chat_history))
            # The answer is tagged with the "ai" role; a bare string would be read as a human message
            st.session_state.chat_history.extend([HumanMessage(content=prompt), ("ai", ai_msg["answer"])])
            cleaned_response=ai_msg["answer"]
            with st.chat_message("assistant", avatar='🤖'):
                st.markdown(cleaned_response)
            st.session_state.messages.append({"role": "assistant",  "avatar" :'🤖', "content": cleaned_response})
    else:
        # Shown only until PDFs have been processed
        st.markdown('Upload your PDFs to chat')


if __name__ == '__main__':
    main()