# Conversational Agent with Retrieval Augmented Generation

### Step 1: Import necessary libraries

In [None]:
# Standard library imports
import os  # For interacting with the operating system, e.g., file paths
import asyncio  # For managing asynchronous tasks

# Third-party library imports
from dotenv import load_dotenv  # For loading environment variables from a .env file
from PyPDF2 import PdfReader  # For reading PDF files
import tqdm  # For displaying progress bars in loops

# LangChain imports - Core functionality
from langchain.text_splitter import RecursiveCharacterTextSplitter  # For splitting text into manageable chunks
from langchain.prompts import PromptTemplate  # For defining and managing prompt templates
from langchain.chains.combine_documents import create_stuff_documents_chain  # For combining retrieved documents into a coherent chain
from langchain.globals import set_debug  # For enabling debug mode in LangChain

# LangChain - Google Generative AI integrations
from langchain_google_genai import GoogleGenerativeAIEmbeddings  # For generating embeddings using Google Generative AI
from langchain_google_genai import ChatGoogleGenerativeAI  # For chat-based interactions with Google Generative AI

# LangChain - Vector store
from langchain_community.vectorstores import FAISS  # For storing and retrieving embeddings using the FAISS library

# LangChain - Advanced prompt management and messages
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder  # For creating structured chat prompts
from langchain_core.messages import HumanMessage, AIMessage  # For handling human and AI messages
from langchain_core.output_parsers import StrOutputParser  # For parsing string outputs from models
from langchain_core.runnables import RunnableBranch  # For creating branches in the chain of execution

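# Typing and runnable utilities
from typing import Dict  # For type-hinting dictionary inputs
from langchain_core.runnables import RunnablePassthrough  # For passing inputs through a chain while adding new keys

# Notebook async support and experiment tracking
import nest_asyncio  # For running asyncio code inside the notebook's already-running event loop
import weave  # For logging and tracking calls with Weave
nest_asyncio.apply()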


### Step 2: Setting Up Environment Variables and PDF Path

In this section, we:
1. **Load Environment Variables**: We use the `load_dotenv()` function to load key-value pairs from a `.env` file into the environment. This allows us to securely manage sensitive information such as API keys.
   - The API key for Google Generative AI is stored in an environment variable called `GOOGLE_API_KEY`.
2. **Define the PDF Path**: The `pdf_path` variable specifies the location of the PDF file that we will process in subsequent steps.
   - Ensure that the file exists at the specified path before proceeding.
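
The two checks described above can be sketched as a small helper. This is a minimal sketch, not part of the original notebook: the function name `check_setup` is hypothetical, and it only verifies that the key is set and that the file exists.

```python
import os
from pathlib import Path


def check_setup(env: dict, pdf_path: str, key_name: str = "GOOGLE_API_KEY") -> list:
    """Return a list of human-readable problems; an empty list means the setup looks fine."""
    problems = []
    if not env.get(key_name):
        problems.append(f"{key_name} is not set; add it to your .env file")
    if not Path(pdf_path).is_file():
        problems.append(f"PDF not found at {pdf_path}")
    return problems


# Usage: call this after load_dotenv() with the notebook's pdf_path.
for problem in check_setup(dict(os.environ), "data/nihms-1901028.pdf"):
    print("Setup problem:", problem)
```

Running the check before Step 3 surfaces a missing key or file early, rather than as an error in a later cell.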

In [None]:
load_dotenv()
key = os.getenv("GOOGLE_API_KEY")
pdf_path = "data/nihms-1901028.pdf"

### Step 3: Reading and Extracting Text from the PDF

In this step, we process the PDF file to extract its textual content.

1. **Open the PDF File**
 
2. **Initialize the PDF Reader**: We use the `PdfReader` class from the `PyPDF2` library to parse the PDF.

3. **Extract Text**: 
   - A generator expression iterates over all pages in the PDF, using `page.extract_text()` to extract the text content of each page.
   - Pages that do not contain text are skipped (`if page.extract_text()`).

4. **Combine Text**: The extracted text from all pages is concatenated into a single string using `"".join(...)`.

#### Notes:
- If the PDF is large, this approach might consume significant memory. For large PDFs, consider processing pages in smaller batches.
- The output variable `text` contains all the text extracted from the PDF and will be used in subsequent steps.
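
One possible way to process pages in smaller batches is sketched below. The helper `batched_page_text` and the `FakePage` stand-in are hypothetical names introduced for illustration; the only thing assumed about a real page object is the `extract_text()` method used in this step.

```python
from typing import Iterable, Iterator, List


def batched_page_text(pages: Iterable, batch_size: int = 10) -> Iterator[str]:
    """Yield the joined text of up to `batch_size` non-empty pages at a time."""
    batch: List[str] = []
    for page in pages:
        page_text = page.extract_text()  # call once per page and reuse the result
        if page_text:
            batch.append(page_text)
        if len(batch) == batch_size:
            yield "".join(batch)
            batch = []
    if batch:  # flush the final, possibly shorter, batch
        yield "".join(batch)


class FakePage:
    """Stand-in for a PDF page so the sketch runs without a file."""

    def __init__(self, text: str):
        self._text = text

    def extract_text(self) -> str:
        return self._text


pages = [FakePage("a"), FakePage(""), FakePage("b"), FakePage("c")]
print(list(batched_page_text(pages, batch_size=2)))  # ['ab', 'c']
```

Because the function is a generator, only one batch of text is held in memory at a time; each batch could be split and embedded before the next one is read.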

In [None]:
with open(pdf_path, "rb") as file:
    reader = PdfReader(file)
    # Extract text from all pages in the PDF
    text = "".join(page.extract_text() for page in reader.pages if page.extract_text())
    
# Display the extracted text
text

### Step 4: Splitting Text into Manageable Chunks

In this step, we divide the extracted text into smaller, overlapping chunks for better processing in later stages.

1. **Initialize the Text Splitter**:
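   - We use the `RecursiveCharacterTextSplitter` from LangChain, which splits large texts on a prioritized list of separators (paragraph breaks first, then line breaks, then spaces), so chunks tend to end at natural boundaries, although semantic coherence is not guaranteed.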
   - Parameters:
     - `chunk_size=10000`: Each chunk contains up to 10,000 characters.
     - `chunk_overlap=1000`: Adjacent chunks overlap by 1,000 characters. This overlap ensures context is maintained across chunks.

2. **Split the Text**:
   - The `split_text()` method splits the input text (from the previous step) into chunks based on the specified parameters.
   - The resulting `chunks` is a list of strings, each representing a section of the original text.

#### Notes:
- The choice of `chunk_size` and `chunk_overlap` depends on the use case and model constraints. Larger models can typically handle larger chunks.
- The `chunks` will be used in downstream tasks such as retrieval or generation.
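
To build intuition for how `chunk_size` and `chunk_overlap` interact, here is a simplified fixed-size sliding window. It is only an illustration: `sliding_window_chunks` is a hypothetical helper, and the real `RecursiveCharacterTextSplitter` prefers natural separators, so its chunk boundaries will differ.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Split `text` into fixed-size windows where neighbours share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # the window already reached the end
            break
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each window advances by `chunk_size - chunk_overlap` characters, so a larger overlap produces more chunks for the same text.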

In [None]:
# Initialize a text splitter with specified chunk size and overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=1000)
# Split the extracted text into manageable chunks
chunks = splitter.split_text(text)

# Display the resulting chunks
chunks

<p style="color:green; font-size: 16px;">
<b>Exercise: Experimenting with Text Chunking Parameters</b><br>
<b>Goal:</b> Develop an understanding of how chunking parameters affect text segmentation and its impact on downstream tasks.<br><br>

<b>Instructions:</b><br>
<ul style="color:green;">
<li>After completing <b>Step 4: Splitting Text into Manageable Chunks (Cell 4)</b>, adjust the <code>chunk_size</code> and <code>chunk_overlap</code> parameters.</li>
<li>Observe how these adjustments influence the number and size of the resulting chunks.</li>
</ul>

<b>Purpose:</b> This exercise helps you build intuition about text chunking processes, enabling you to see how parameter choices can affect tasks that depend on text segmentation.
</p>

### Step 5: Creating and Saving a Vector Store

In this step, we generate embeddings for the text chunks and store them in a vector database for efficient retrieval.

1. **Generate Embeddings**:
   - We use `GoogleGenerativeAIEmbeddings` to create embeddings for each text chunk.
   - The parameter `model="models/embedding-001"` specifies the embedding model to use. Ensure this model is available and properly configured in your environment.

2. **Create the FAISS Vector Store**:
   - FAISS (Facebook AI Similarity Search) is a library for efficient similarity search and clustering of dense vectors.

3. **Save the Vector Store**:
   - The `save_local("faiss_index")` method saves the FAISS index to a local folder named `faiss_index`.
   - This allows us to reuse the index in later sessions without re-processing the text or regenerating embeddings.

#### Notes:
- **Why Use FAISS?**
  - It is highly optimized for large-scale vector searches and enables quick retrieval of relevant chunks for a given query.
- **Next Steps**:
  - The stored vector database will be used to retrieve the most relevant chunks of text when querying the system.
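
Conceptually, a vector store keeps each chunk next to a numeric vector. The toy sketch below makes that concrete with word counts standing in for learned embeddings; `toy_embed` and `build_store` are hypothetical helpers and do not reflect how `GoogleGenerativeAIEmbeddings` or FAISS work internally.

```python
from collections import Counter


def toy_embed(text: str, vocabulary: list) -> list:
    """Count how often each vocabulary word occurs in `text` (a stand-in for a learned embedding)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def build_store(chunks: list) -> dict:
    """Return the shared vocabulary plus one {text, vector} entry per chunk."""
    vocabulary = sorted({word for chunk in chunks for word in chunk.lower().split()})
    entries = [{"text": chunk, "vector": toy_embed(chunk, vocabulary)} for chunk in chunks]
    return {"vocabulary": vocabulary, "entries": entries}


store = build_store(["protein diet", "diet and weight loss"])
print(store["vocabulary"])            # ['and', 'diet', 'loss', 'protein', 'weight']
print(store["entries"][0]["vector"])  # [0.0, 1.0, 0.0, 1.0, 0.0]
```

A real embedding model maps text to dense vectors that capture meaning rather than exact word overlap, but the stored structure (text paired with a vector) is the same idea.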

In [None]:
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vector_store = FAISS.from_texts(chunks, embedding=embeddings)
vector_store.save_local("faiss_index")


### Step 6: Initializing Weave for Project Tracking

In this step, we initialize **Weave**, a library designed for tracking and visualizing machine learning workflows and data flows.

1. **What is Weave?**
   - Weave is a tool for logging, monitoring, and debugging machine learning experiments and pipelines.
   - It allows you to visualize your project’s structure, metrics, and progress, which is especially useful in iterative development.

2. **Initialization**:
   - The `weave.init()` function initializes a new Weave project. The argument `"medical-data-chatbot"` specifies the project name.
   - This name will help organize and track this specific project in the Weave dashboard.

In [None]:
weave.init("medical-data-chatbot")

### Step 7: Setting Up the Retriever

In this step, we configure a **retriever**.

**Create a Retriever**:
   - The `vector_store.as_retriever()` method converts the FAISS vector store into a retriever object.
   - The argument `search_kwargs={"k": 4}` sets the maximum number of chunks to retrieve for each query, so only the 4 most similar chunks are returned.

#### Notes:
- **Why Limit the Results?**
  - Limiting the number of results ensures that the model processes only the most relevant information, which can improve efficiency and response quality.
  - The value of `k` can be adjusted based on the complexity of the query and the size of the text chunks.

- **Next Steps**:
  - The retriever will be used in conjunction with a generative model to form a **retrieval-augmented generation (RAG)** pipeline.
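
The idea of returning only the top `k` matches can be sketched with plain Python. This is an illustration under assumptions: the vectors are made up, cosine similarity is chosen for simplicity (the FAISS store may rank by a different distance measure), and `top_k` is a hypothetical helper.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector: list, entries: list, k: int = 4) -> list:
    """Return the texts of the `k` entries most similar to the query vector."""
    scored = [(cosine_similarity(query_vector, vector), text) for text, vector in entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


entries = [
    ("chunk about protein", [1.0, 0.0]),
    ("chunk about exercise", [0.0, 1.0]),
    ("chunk about both", [1.0, 1.0]),
]
print(top_k([1.0, 0.1], entries, k=2))
# ['chunk about protein', 'chunk about both']
```

Note that a ranking like this always returns `k` results when enough entries exist, even if none is a good match; this is worth keeping in mind when testing unrelated queries.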

In [None]:
retriever = vector_store.as_retriever(search_kwargs={"k": 4})

<p style="color:green; font-size: 16px;">
<b>Exercise: Exploring Retriever Functionality</b><br>
<b>Goal:</b> Gain an understanding of how a retriever operates and how it determines document relevance.<br><br>

<b>Instructions:</b><br>
<ul style="color:green;">
<li>After completing <b>Step 7: Setting Up the Retriever (Cell 7)</b>, test the retriever with three distinct queries:</li>
<ul style="color:green;">
    <li>A query closely related to the dataset content (relevant).</li>
    <li>A query with ambiguous wording (vague).</li>
    <li>A query entirely unrelated to the dataset (unrelated).</li>
</ul>
<li>For each query, analyze the documents retrieved and discuss the following:</li>
<ul style="color:green;">
    <li>Why were these specific documents selected?</li>
    <li>How well do the retrieved documents align with the query intent?</li>
    <li>What patterns or limitations do you observe in the retriever's behavior?</li>
</ul>
</ul>

<b>Purpose:</b> This exercise helps you understand the principles and limitations of similarity-based retrieval, fostering insight into its performance across different types of queries.
</p>

### Step 8: Demonstrating the Retriever in Action

In this step, we test the retriever by providing a sample query and observing the returned results.

1. **Purpose of this Demonstration**:
   - This step shows that the retriever works on its own: given a query, it fetches the most similar chunks of text.

2. **Query the Retriever**:
   - The `retriever.invoke()` method takes a query (in this case, `"What is the difference between high and medium protein-based diets?"`) and searches the vector store for the most relevant chunks.
   - The retriever returns the top `k=4` results, as configured earlier.

3. **Output**:
   - The `docs` variable contains the retrieved chunks as a list of `Document` objects; the text of each chunk is available in its `page_content` attribute.

In [None]:
docs = retriever.invoke("What is the difference between high and medium protein-based diets?")
docs

### Step 9: Creating the Question-Answering System

In this step, we set up the components needed to answer user questions based on the retrieved documents.

1. **Define the System Template**:
   - The `system_template` string specifies how the generative model should process the retrieved context.
   - It instructs the model to answer the user's question using the information inserted at the `{context}` placeholder, which is wrapped in `<context>` tags.

2. **Create a Prompt Template**:
   - The `PromptTemplate` wraps the `system_template` into a reusable object.
   - The `input_variables=["context"]` defines which variables need to be filled in when the prompt is used.

3. **Initialize the Generative Model**:
   - The `ChatGoogleGenerativeAI` class is used to instantiate a chat-based model.
   - Parameters:
     - `model="gemini-1.5-pro-latest"` specifies the version of the model to use.
     - `temperature=0.5` controls the randomness of the responses. A value of `0.5` balances creativity and determinism.

4. **Create the Document Chain**:
   - The `create_stuff_documents_chain()` function integrates the model and the prompt into a chain.

#### Notes:
- **Model Selection**:
   - The `"gemini-1.5-pro-latest"` model is used here, but it can be replaced with other compatible models if needed.
- **Customizable Prompt**:
   - The `system_template` can be adjusted to meet the requirements of different use cases.
- **Next Steps**:
   - Use this document chain to generate answers for specific queries in combination with the retriever.
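
The "stuff" strategy can be pictured as joining all retrieved chunks into one string and substituting it into the prompt. The sketch below shows that idea only; `stuff_documents` is a hypothetical helper, the chunks are placeholders, and the real `create_stuff_documents_chain()` also handles `Document` objects and the model call.

```python
def stuff_documents(template: str, documents: list, separator: str = "\n\n") -> str:
    """Join all document texts and substitute them into the template's {context} slot."""
    context = separator.join(documents)
    return template.format(context=context)


template = (
    "Answer the user's question based on the below context:\n"
    "<context> {context} </context>"
)
prompt_text = stuff_documents(template, ["First chunk.", "Second chunk."])
print(prompt_text)
```

Since every retrieved chunk ends up in a single prompt, the combined size of `k` chunks must fit within the model's context window.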

In [None]:
# Define the template for answering user questions based on a provided context
system_template = """
Answer the user's question based on the below context:
<context> {context} </context>
Say that you don't know the answer if the context is not relevant to the question.
"""
# Create a prompt template for the question-answering system
question_answering_prompt = PromptTemplate(template=system_template, input_variables=["context"])

# Initialize the generative model for question answering
model = ChatGoogleGenerativeAI(model="gemini-1.5-pro-latest", temperature=0.5)

# Create a document chain to handle the retrieval and response generation process
document_chain = create_stuff_documents_chain(llm=model, prompt=question_answering_prompt)


### Step 10: Testing the Document Chain for Question-Answering

In this step, we test the `document_chain` on its own by invoking it with the documents retrieved earlier and a user query. Retrieval is not yet part of the chain; it is added in Step 11.

1. **Purpose**:
   - This step demonstrates how the `document_chain` passes the retrieved context (`docs`) to the generative model to produce a response.

2. **Components of the Input**:
   - **Context**:
     - The `context` key is assigned the value of `docs`, which contains the top chunks retrieved by the retriever in **Step 8**.
     - This ensures the model has access to relevant information when answering the query.
   - **Messages**:
     - A list of messages simulates a conversational interaction.
     - The `HumanMessage` object represents the user query: `"What is the difference between high and medium protein-based diets?"`.
     - The Step 9 template has no slot for `messages`, so the question text is not inserted into the prompt at this stage; the chat prompt in Step 19 adds a `MessagesPlaceholder` for that purpose.

3. **Expected Output**:
   - The output is the model's response, generated from the retrieved context.

In [None]:
document_chain.invoke(
    {
        "context": docs,
        "messages": [
            HumanMessage(content="What is the difference between high and medium protein-based diets?")
        ],
    }
)

### Step 11: Building a Combined Retrieval Chain

In this step, we create a combined retrieval-augmented generation (RAG) chain that integrates retrieval and document generation into a seamless pipeline.

1. **Helper Function**:
   - The `parse_retriever_input` function extracts the latest user query from the `params` dictionary.
   - Specifically, it accesses the `"messages"` key and retrieves the content of the last message, which represents the most recent user query.

2. **RunnablePassthrough**:
   - A `RunnablePassthrough` is a utility that passes its input through unchanged; its `.assign()` method keeps the input dictionary and adds new keys computed from it.
   - We use its `.assign()` method to define the sequence of operations in the chain.

3. **Assigning Operations**:
   - The chain is composed of two key steps:
     - **Step 1: Retrieve Context**:
       - The `parse_retriever_input` function extracts the user query.
       - This query is passed through the `retriever` to fetch the relevant text chunks.
       - The result is assigned to the `context` key.
     - **Step 2: Generate Answer**:
       - The `document_chain` takes the retrieved context and generates an answer based on the query.
       - The result is assigned to the `answer` key.

4. **Output**:
   - The `retrieval_chain` object is now a runnable pipeline that combines retrieval and response generation in a single operation.
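
The behaviour of the two chained `.assign()` calls can be mimicked with plain dictionaries. This is a simplified sketch of the data flow only: `assign`, `fake_retriever`, and `fake_document_chain` are hypothetical stand-ins, not LangChain objects.

```python
def assign(data: dict, **steps) -> dict:
    """Return a copy of `data` with one new key per step; each step receives the input dict."""
    new_values = {key: step(data) for key, step in steps.items()}
    return {**data, **new_values}


def fake_retriever(query: str) -> list:
    """Stand-in for the retriever: returns one made-up chunk."""
    return [f"chunk matching '{query}'"]


def fake_document_chain(data: dict) -> str:
    """Stand-in for the document chain: reads the context added by the previous step."""
    return f"answer built from {len(data['context'])} chunk(s)"


state = {"messages": ["What is a high-protein diet?"]}
state = assign(state, context=lambda d: fake_retriever(d["messages"][-1]))
state = assign(state, answer=fake_document_chain)
print(sorted(state))  # ['answer', 'context', 'messages']
```

The two steps are chained rather than combined into one call because the answer step needs the `context` key that the first step adds.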

In [None]:
# Define a helper function to extract the latest user query from the input parameters
def parse_retriever_input(params: Dict):
    return params["messages"][-1].content

# Create a retrieval chain with a passthrough mechanism
retrieval_chain = RunnablePassthrough.assign(
    # First step: Extract the user query and use it to retrieve relevant context
    context=parse_retriever_input | retriever,
).assign(
    # Second step: Use the retrieved context to generate an answer
    answer=document_chain,
)

### Step 12: Testing the Combined Retrieval Chain

In this step, we test the more complex `retrieval_chain` pipeline to ensure that it seamlessly integrates the retrieval and generation steps.

1. **Purpose**:
   - This test validates that the chain can process user queries end-to-end:
     - Extracting the query.
     - Retrieving the relevant context using the `retriever`.
     - Generating a coherent response using the `document_chain`.

2. **Input Structure**:
   - The input is a dictionary with the key `"messages"`, which contains a list of messages.
   - Each message is represented as a `HumanMessage` object.

3. **Pipeline Execution**:
   - **Query Extraction**: The `parse_retriever_input` function extracts the query from the last message in the list.
   - **Context Retrieval**: The query is passed to the `retriever` to fetch the most relevant text chunks.
   - **Answer Generation**: The retrieved context is fed into the `document_chain`, which uses the prompt and generative model to produce the final answer.

4. **Expected Output**:
   - A dictionary containing:
     - `"context"`: The retrieved text chunks.
     - `"answer"`: The generated response to the user query.

In [None]:
retrieval_chain.invoke(
    {
        "messages": [
            HumanMessage(content="What is the difference between high and medium protein-based diets?")
        ],
    }
)

### Step 13: Testing the Retrieval Chain with a Follow-Up Query

This step demonstrates a limitation of the current retrieval pipeline when handling vague or follow-up queries without explicit reference to the context of the previous conversation.

1. **Purpose**:
   - To test how the retriever responds to a vague query such as `"Tell me more"`.
   - Highlight the challenge of maintaining conversational context in the current implementation.

2. **Current Behavior**:
   - The `retrieval_chain` processes the input query `"Tell me more"` independently, without considering previous queries or their context.
   - The retriever fetches documents that match the new query, but since `"Tell me more"` is nonspecific, the results may be irrelevant or nonsensical.

3. **Expected Behavior**:
   - Ideally, the system should infer that `"Tell me more"` is a continuation of the prior query (`"What is the difference between high and medium protein-based diets?"`).
   - The retrieved documents should provide additional information about the initial topic.

4. **Limitation**:
   - The current design does not track conversational context or incorporate previous messages into the retrieval process.

In [None]:
retrieval_chain.invoke(
    {
        "messages": [
            HumanMessage(content="Tell me more")
        ],
    }
)

### Step 14: Testing the Retriever Directly with a Vague Query

1. **Purpose**:
   - To show that the retriever matches documents against the query text alone, without any consideration of conversational history or prior context.
   - This highlights the challenge of vague queries in isolation.

2. **Current Behavior**:
   - The retriever processes the query `"Tell me more!"` independently and returns documents that match this phrase based on the embedding similarity.
   - Since `"Tell me more!"` lacks specific content, the results are likely to be generic or nonsensical unless a context happens to align by chance.


In [None]:
retriever.invoke("Tell me more!")

### Step 15: Adding Query Transformation to Improve Contextual Relevance

This step introduces a **query transformation prompt** to address the limitations of vague queries like `"Tell me more!"`. The goal is to reframe such queries in the context of the conversation, producing a more meaningful query for the retriever.

1. **Purpose**:
   - To improve the retrieval system by generating a context-aware search query that reflects the ongoing conversation.
   - This ensures that follow-up queries are relevant and meaningful, even if they are vague.

2. **Query Transformation Prompt**:
   - The `ChatPromptTemplate.from_messages()` method creates a prompt that uses all prior messages (`messages`) as context.
   - The prompt asks the model to:
     - Analyze the prior conversation.
     - Generate a search query tailored to the user's intent and the ongoing context.
     - Output **only** the transformed query for use with the retriever.

#### Notes:
- **Benefits**:
   - This approach bridges the gap between conversational input and the retriever's expectations for specific queries.

- **Limitations**:
   - The effectiveness depends on the quality of the generative model used for query transformation.
   - Ambiguous conversations might still produce suboptimal queries.
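
To see what the model receives, it can help to expand the prompt by hand: the placeholder is replaced by the conversation history, and the instruction is appended as the final user message. The sketch below uses plain `(role, content)` tuples instead of LangChain message objects; `expand_query_transform_prompt` is a hypothetical helper and the AI reply is a shortened placeholder.

```python
# The instruction text is the one used in query_transform_prompt.
INSTRUCTION = (
    "Given the above conversation, generate a search query to look up in order "
    "to get information relevant to the conversation. "
    "Only respond with the query, nothing else."
)


def expand_query_transform_prompt(history: list) -> list:
    """Return the (role, content) pairs the model would see: history first, instruction last."""
    return list(history) + [("user", INSTRUCTION)]


history = [
    ("human", "What is the difference between high and medium protein-based diets?"),
    ("ai", "Both diets improved body composition and glucose control."),
    ("human", "Tell me more!"),
]
for role, content in expand_query_transform_prompt(history):
    print(f"{role}: {content}")
```

Because the instruction comes last, the model reads the vague follow-up together with the earlier question and answer before writing the search query.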

In [None]:

query_transform_prompt = ChatPromptTemplate.from_messages(
    [
        MessagesPlaceholder(variable_name="messages"),
        (
            "user",
            "Given the above conversation, generate a search query to look up in order to get information relevant to the conversation. Only respond with the query, nothing else.",
        ),
    ]
)

<p style="color:green; font-size: 16px;">
<b>Exercise: Experimenting with Query Transformation</b><br>
<b>Goal:</b> Understand the role of query transformation in improving contextual relevance and retrieval accuracy.<br><br>

<b>Instructions:</b><br>
<ul style="color:green;">
<li>After completing <b>Step 15: Adding Query Transformation to Improve Contextual Relevance (Cell 15)</b>, modify the <code>query_transform_prompt</code> to tailor the generated search queries for a specific domain, such as:</li>
<ul style="color:green;">
    <li>Scientific research</li>
    <li>Customer support</li>
</ul>
<li>Test the modified prompt by providing multiple follow-up queries, such as:</li>
<ul style="color:green;">
    <li>"Tell me more"</li>
    <li>"Explain further"</li>
    <li>"Can you provide examples?"</li>
</ul>
<li>Observe and analyze the transformed queries for each example. Reflect on the following:</li>
<ul style="color:green;">
    <li>How well do the transformed queries align with the domain-specific context?</li>
    <li>Do the transformed queries improve retrieval accuracy for the intended domain?</li>
    <li>What potential improvements could be made to the transformation prompt?</li>
</ul>
</ul>

<b>Purpose:</b> This exercise demonstrates how query transformation can enhance retrieval performance by aligning queries more closely with domain-specific needs, helping to fine-tune the system for specialized use cases.
</p>


### Step 16: Adding a Model to the Query Transformation Chain

In this step, we enhance the query transformation process by integrating a generative model into the chain. This enables the system to dynamically generate refined, context-aware search queries.

1. **Purpose**:
   - To implement a **query transformation chain** that processes conversational context and user input to produce an optimized query for the retriever.
   - By combining the transformation prompt with the generative model, we create an end-to-end pipeline for query reformulation.

2. **Improved Retrieval**:
   - The transformed query should be more specific and meaningful now, improving the retriever's ability to fetch relevant results.

In [None]:
query_transformation_chain = query_transform_prompt | model

### Step 17: Testing the Query Transformation Chain

In this step, we test the **query transformation chain** to verify its ability to refine vague queries like `"Tell me more!"` into meaningful search queries using the context of the preceding conversation.

1. **Purpose**:
   - To validate that the `query_transformation_chain` can analyze the conversational context and generate a search query relevant to the user’s intent.
   - This test demonstrates how the chain integrates the conversational history when reformulating vague follow-up queries.

2. **Input Structure**:
   - **Messages**:
     - A list of messages simulating a conversation:
       - **First Message (Human)**: `"What is the difference between high and medium protein-based diets?"`
       - **Second Message (AI)**: A detailed response summarizing research findings on protein-based diets.
       - **Third Message (Human)**: `"Tell me more!"`, a vague follow-up query.
   - The chain uses this conversation history to generate a refined query.

In [None]:
query_transformation_chain.invoke(
    {
        "messages": [
            HumanMessage(content="What is the difference between high and medium protein-based diets?"),
            AIMessage(
                content="The study found that both high and normal protein diets improved body composition and glucose control in adults with type 2 diabetes. The lack of observed effects of dietary protein and red meat consumption on weight loss and improved cardiometabolic health suggest that achieved weight loss – rather than diet composition – should be the principal target of dietary interventions for T2D management."
            ),
            HumanMessage(content="Tell me more!"),
        ],
    }
)

### Step 18: Building the Query-Transforming Retriever Chain

In this step, we create a **query-transforming retriever chain** that dynamically adapts its behavior based on the structure of the input.

1. **Purpose**:
   - To handle two scenarios seamlessly:
     - **Single Message**: When there is only one message, pass the content directly to the retriever.
     - **Multiple Messages**: When there is a conversational history, use the query transformation chain to refine the query before passing it to the retriever.
   - This flexible chain improves the system's ability to handle both straightforward and context-dependent queries.

2. **Components**:
   - **RunnableBranch**:
     - Dynamically selects a branch to execute based on the input condition.
   - **Condition**:
     - The lambda function `lambda x: len(x.get("messages", [])) == 1` checks if the input contains only one message.
     - If `True`, the first branch is executed. Otherwise, the second branch is used.
   - **First Branch**:
     - If there is only one message:
       - Extract the content of the last message with `lambda x: x["messages"][-1].content`.
       - Pass this content directly to the `retriever` to fetch relevant documents.
   - **Second Branch**:
     - If there are multiple messages:
       - The input is passed through the query transformation pipeline:
         1. `query_transform_prompt`: Combines the conversation history with the instruction to produce a search query.
         2. `model`: Generates the refined query.
         3. `StrOutputParser()`: Parses the output string for compatibility with the retriever.
       - The transformed query is passed to the `retriever`.

3. **Configuration**:
   - The `with_config(run_name="chat_retriever_chain")` method assigns a unique name to this chain, making it easier to track during execution and debugging.
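
The routing decision itself is a simple conditional. The sketch below reproduces it with plain strings in place of message objects; `route_query` and `fake_transform` are hypothetical stand-ins, and the retriever call that follows in the real chain is omitted.

```python
def route_query(messages: list, transform_query) -> str:
    """Pick the search query: the message itself if it is the only one, else a transformed query."""
    if len(messages) == 1:
        return messages[-1]
    return transform_query(messages)


def fake_transform(messages: list) -> str:
    """Stand-in for query_transform_prompt | model | StrOutputParser()."""
    return f"search query derived from {len(messages)} messages"


print(route_query(["What is a high-protein diet?"], fake_transform))
print(route_query(["What is a high-protein diet?", "An answer.", "Tell me more!"], fake_transform))
```

With a single message no model call is needed to build the query, which saves one request for the first turn of a conversation.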

In [None]:
query_transforming_retriever_chain = RunnableBranch(
    (
        lambda x: len(x.get("messages", [])) == 1,
        # If only one message, then we just pass that message's content to retriever
        (lambda x: x["messages"][-1].content) | retriever,
    ),
    # If messages, then we pass inputs to LLM chain to transform the query, then pass to retriever
    query_transform_prompt | model | StrOutputParser() | retriever,
).with_config(run_name="chat_retriever_chain")

### Step 19: Finalizing the Conversational Retrieval-Augmented Generation (RAG) Pipeline

This step integrates all the components to build the final conversational RAG pipeline, which can handle multi-turn conversations, transform queries, retrieve relevant documents, and generate accurate answers.

1. **System Template**:
   - The `SYSTEM_TEMPLATE` defines the behavior of the answer generation system:
     - Instructs the model to base its answers solely on the provided context.
     - Explicitly directs the model to say `"I don't know"` if the context lacks relevant information, reducing the risk of hallucinations.

2. **Question-Answering Prompt**:
   - The `ChatPromptTemplate.from_messages()` creates a structured prompt for the system.
   - Components:
     - **System Message**: Sets the rules and behavior for answer generation.
     - **Messages Placeholder**: Captures the conversational context and query for generating the final response.

3. **Document Chain**:
   - The `create_stuff_documents_chain()` function combines the generative model (`model`) with the question-answering prompt.
   - This chain processes the retrieved context and conversational history to generate a coherent and relevant answer.

4. **Conversational Retrieval Chain**:
   - The `RunnablePassthrough.assign()` method is used to sequentially integrate:
     - **Context Retrieval**:
       - The `query_transforming_retriever_chain` retrieves relevant documents based on transformed queries or direct input.
     - **Answer Generation**:
       - The `document_chain` generates a final response based on the retrieved context and user input.

In [None]:
# Define the system template for generating answers
SYSTEM_TEMPLATE = """
Answer the user's questions based on the below context. 
If the context doesn't contain any relevant information to the question, don't make something up and just say "I don't know":

<context>
{context}
</context>
"""

# Create a prompt template for question answering (refer to Step 9 for prompt creation)
question_answering_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            SYSTEM_TEMPLATE,
        ),
        MessagesPlaceholder(variable_name="messages"),  # Inserts the conversation history, including the latest question
    ]
)

# Create a document chain for answering user questions (refer to Step 9)
document_chain = create_stuff_documents_chain(model, question_answering_prompt)

# Build the final conversational retrieval chain
# Combine the transformed query retrieval (Step 18) with the document chain (Step 9)
conversational_retrieval_chain = RunnablePassthrough.assign(
    # Assign the transformed query context to the retrieval chain (refer to Step 18)
    context=query_transforming_retriever_chain,
).assign(
    # Assign the answer generation process to the document chain (refer to Step 9)
    answer=document_chain,
)

### Step 20: Testing the Conversational Retrieval Chain with an Unrelated Query

This step tests the robustness of the final conversational retrieval-augmented generation (RAG) pipeline by providing a query unrelated to the available context in the documents.

#### Notes:
- **Relevance of the Test**:
   - Real-world systems often encounter queries beyond their knowledge scope, making this a critical behavior to validate.

In [None]:
conversational_retrieval_chain.invoke(
    {
        "messages": [
            HumanMessage(content="Can LangSmith help test my LLM applications?"),
        ]
    }
)

### Step 21: Verifying the Conversational Retrieval Chain in the Target Use Case

1. **Test Input**:
   - **Initial Query**:
     - `"What is the difference between high and medium protein-based diets?"`
   - **AI Response** (provided in the test input to simulate a prior response):
     - Summarizes a study about the effects of high and normal protein diets on body composition and glucose control.
   - **Follow-Up Query**:
     - `"Tell me more!"`—a vague request that relies on the system to infer and retrieve additional context.

In [None]:
conversational_retrieval_chain.invoke(
    {
        "messages": [
            HumanMessage(content="What is the difference between high and medium protein-based diets?"),
            AIMessage(
                content="The study found that both high and normal protein diets improved body composition and glucose control in adults with type 2 diabetes. The lack of observed effects of dietary protein and red meat consumption on weight loss and improved cardiometabolic health suggest that achieved weight loss – rather than diet composition – should be the principal target of dietary interventions for T2D management."
            ),
            HumanMessage(content="Tell me more!"),
        ],
    }
)

### Step 22: Wrapping the Conversational Retrieval Pipeline in a Function

This step wraps the entire conversational retrieval-augmented generation (RAG) pipeline in a single reusable function. The function appends each question to the conversation history, runs the chain, and stores the answer back in the history so that follow-up questions have context.

The function is decorated with `@weave.op()`, so each query and response is logged in **Weights & Biases** (Weave), enabling tracking, monitoring, and debugging of the conversational process.
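
The history handling can be seen in isolation with a stub in place of the real chain. This is a minimal sketch, not the notebook's implementation: `EchoChain` and `ask` are hypothetical names, and plain dictionaries stand in for `HumanMessage` and `AIMessage` so that the example runs without LangChain or an API key.

```python
class EchoChain:
    """Hypothetical stand-in for conversational_retrieval_chain."""

    def invoke(self, state: dict) -> dict:
        last_question = state["messages"][-1]["content"]
        return {"answer": f"(stub answer to: {last_question})"}


def ask(chain, question: str, conversation: dict) -> str:
    # Record the user's turn first so the chain sees it as the latest message
    conversation["messages"].append({"role": "human", "content": question})
    answer = chain.invoke(conversation)["answer"]
    # Record the reply so the next turn can refer back to it
    conversation["messages"].append({"role": "ai", "content": answer})
    return answer


conversation = {"messages": []}
ask(EchoChain(), "What is the difference between high and medium protein-based diets?", conversation)
ask(EchoChain(), "Tell me more!", conversation)
print(len(conversation["messages"]))  # two turns -> four stored messages
```

Because the same `conversation` dictionary is passed to every call, the second question is processed with the first exchange already in the history.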

**Example Usage** (top-level `await` works inside a notebook cell):
```python
conversation = {"messages": []}
print(await get_answer("What is the difference between high and medium protein-based diets?", conversation))
print(await get_answer("Tell me more!", conversation))
```

In [None]:
@weave.op()
async def get_answer(question: str, messages: dict):
    """
    Handles user queries by appending them to the conversation history, 
    processing the query through the conversational retrieval chain, 
    and appending the AI's response back to the messages.

    Parameters:
    - question (str): The user's input question.
    - messages (dict): A dictionary containing the conversation history 
                       with a "messages" key holding a list of message objects.

    Returns:
    - str: The generated answer from the system.
    """
    # Add the user's question to the conversation history
    messages["messages"].append(HumanMessage(content=question))
    
    # Process the query through the conversational retrieval chain
    answer = conversational_retrieval_chain.invoke(messages)
    
    # Add the system's response to the conversation history
    messages["messages"].append(AIMessage(content=answer["answer"]))
    
    # Return the generated answer
    return answer["answer"]

In [None]:

messages = {"messages": []} 
answer = asyncio.get_event_loop().run_until_complete(get_answer("What is the difference between high and medium protein-based diets?", messages))
print(answer) 

<p style="color:green; font-size: 16px;">
<b>Exercise: Testing the Full Retrieval-Augmented Generation (RAG) Pipeline</b><br>
<b>Goal:</b> Evaluate the performance of the complete RAG pipeline by testing it with diverse queries and identifying its strengths and weaknesses.<br><br>

<b>Instructions:</b><br>
<ul style="color:green;">
<li><b>Prepare a Set of Queries:</b></li>
<ul style="color:green;">
    <li>Create at least five queries that vary in nature, including:</li>
    <ul style="color:green;">
        <li>A highly specific query.</li>
        <li>A vague or open-ended query.</li>
        <li>A multi-part or follow-up query.</li>
    </ul>
</ul>
<li><b>Run the Pipeline:</b></li>
<ul style="color:green;">
    <li>Input each query into the full RAG pipeline.</li>
    <li>Observe and document the generated responses, including:</li>
    <ul style="color:green;">
        <li>The documents retrieved.</li>
        <li>The final generated output.</li>
    </ul>
</ul>
<li><b>Analyze Performance:</b></li>
<ul style="color:green;">
    <li>For each query, analyze:</li>
    <ul style="color:green;">
        <li><b>Strengths:</b> What aspects of the query were handled well (e.g., relevance, coherence, accuracy)?</li>
        <li><b>Weaknesses:</b> What challenges did the pipeline face (e.g., irrelevant documents, poor contextual understanding, incomplete answers)?</li>
        <li><b>Patterns:</b> Are there consistent issues or successes across the queries?</li>
    </ul>
</ul>
<li><b>Document Findings:</b></li>
<ul style="color:green;">
    <li>Check the following in Weave:</li>
    <ul style="color:green;">
        <li><b>Query type.</b></li>
        <li><b>Retrieved documents</b> (relevance and quality).</li>
        <li><b>Generated output</b> (clarity and accuracy).</li>
        <li><b>Overall evaluation</b> (what worked, what didn’t).</li>
    </ul>
</ul>
</ul>

<b>Purpose:</b> This exercise helps you critically evaluate the end-to-end functionality of the RAG pipeline, highlighting areas of success and identifying limitations that could guide future improvements.
</p>
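
One possible way to organize the exercise is to run a labelled set of queries in a loop and keep one record per query for later comparison. The sketch below is only a starting point: `run_queries`, the query labels, and the stub `answer_fn` are hypothetical, and the third query is an illustrative placeholder to replace with your own.

```python
from typing import Callable


def run_queries(queries: dict[str, str], answer_fn: Callable[[str], str]) -> list[dict]:
    """Run each labelled query and collect one record per query."""
    records = []
    for query_type, query in queries.items():
        try:
            output, error = answer_fn(query), None
        except Exception as exc:  # keep going so one failure does not hide the rest
            output, error = None, repr(exc)
        records.append(
            {"type": query_type, "query": query, "output": output, "error": error}
        )
    return records


# Example queries; the first two come from this notebook, the third is a placeholder
queries = {
    "specific": "What is the difference between high and medium protein-based diets?",
    "vague": "Tell me more!",
    "multi-part": "Which diets were compared, and what outcomes were measured?",
}

# Stub answer function so the sketch runs without the model
records = run_queries(queries, lambda q: f"stub answer ({len(q)} chars)")
for record in records:
    print(record["type"], "->", record["output"])
```

With the notebook's own function, `answer_fn` could be something like `lambda q: asyncio.get_event_loop().run_until_complete(get_answer(q, conversation))`, reusing a single `conversation` dictionary when testing follow-up queries.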