<a href="https://colab.research.google.com/github/AarifCha/RAG-HF-Langchain/blob/main/Gemini_RAG_for_Scientific_Papers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Gemini RAG for scientific papers.

In this notebook, we set up a retrieval-augmented generation (RAG) pipeline for scientific papers. We use the Gemini LLM (a large multimodal model, LMM) because it has a free tier that is well suited to application development. The prompt-to-answer pipeline is built with LangChain for its flexibility. We use LlamaParse (the GPT-4o version) to convert PDF files into text. We chose this particular parser because scientific papers contain equations and other complex structures; regular PDF readers such as pypdf cannot process these documents properly, which results in poorly performing RAG pipelines. A Chroma vector store is paired with the LlamaIndex library to create the index.

### API Keys

We need to set up API keys for both Google's Gemini and LlamaCloud. They can be obtained through these links:
https://aistudio.google.com/app/apikey

https://cloud.llamaindex.ai/api-key

In [8]:
import getpass
import os

os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter your Google AI API key: ")
os.environ["LLAMA_CLOUD_API_KEY"] = getpass.getpass("Enter your Llama Cloud API key: ")

Enter your Google AI API key: ··········
Enter your Llama Cloud API key: ··········
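
When the cell above is re-run, it prompts for both keys again. A minimal sketch of one way to avoid that is shown below, assuming we only want to prompt when the environment variable is missing. The helper name `ensure_api_key`, its `ask` parameter, and the variable `DEMO_API_KEY` are hypothetical and only serve the illustration.

```python
import getpass
import os


def ensure_api_key(name, prompt, ask=getpass.getpass):
    """Return os.environ[name], prompting for a value only if it is missing or empty."""
    if not os.environ.get(name):
        os.environ[name] = ask(prompt)
    return os.environ[name]


# The lambda stands in for interactive input so the example runs unattended.
# In the notebook, the default `ask=getpass.getpass` would be used instead.
key = ensure_api_key("DEMO_API_KEY", "Enter demo key: ", ask=lambda _: "dummy-value")
```

Calling `ensure_api_key("GOOGLE_API_KEY", "Enter your Google AI API key: ")` would then only prompt on the first run of a session.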


We also need to pip install these libraries to work in Google Colab.

In [2]:
%pip install -q google-generativeai
%pip install -q langchain-google-genai
%pip install -q llama-index-embeddings-langchain
%pip install -q langchain
%pip install -q langchain-community
%pip install -q tiktoken
%pip install -q chromadb
%pip install -q pypdf
%pip install -q llama-index-vector-stores-chroma
%pip install -q llama-index
%pip install -q llama-index-embeddings-gemini
%pip install -q llama-index-core llama-parse llama-index-readers-file python-dotenv


### Setting up Gemini

We set up three LLM instances to be used in our LangChain chain.

In [4]:
from langchain_google_genai import ChatGoogleGenerativeAI

# This llm is used on the initial prompt and chat history.
llm_prompt = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0,
    max_tokens=2000,
    timeout=None,
    max_retries=2,
)

# This llm is used to help with document/node retrieval.
llm_context = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0.4,
    max_tokens=1500,
    timeout=None,
    max_retries=2,
)

# This is the final llm used for answering the question asked.
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-pro",
    temperature=0,
    max_tokens=10000,
    timeout=None,
    max_retries=2,
)

Let's try the model on a question. We will also use display and Markdown to print the response neatly.

In [5]:
from IPython.display import Markdown, display

messages = [
    (
        "system",
        "You assist in answering questions for the question asked.",
    ),
    ("human", "How do multimodal models work?"),
]

ai_msg = llm_prompt.invoke(messages)

display(Markdown(ai_msg.content))

Multimodal models are a fascinating area of AI that are revolutionizing how we interact with information. Here's a breakdown of how they work:

**1. Combining Multiple Data Sources:**

* **The Key:** Multimodal models excel by processing and understanding information from multiple sources like text, images, audio, and even video. This allows them to grasp a more complete picture of the world.
* **Example:** Imagine a model analyzing a recipe. It can read the text instructions, see the ingredients in an image, and even hear the chef's voice explaining the steps. This multi-sensory input gives it a much richer understanding than a model that only reads text.

**2. Representation Learning:**

* **The Challenge:**  The different data types (text, images, etc.) have different structures and require different processing techniques.
* **The Solution:** Multimodal models use techniques like **embedding** to represent each data type in a common, shared space. This allows them to compare and relate information from different sources.
* **Example:**  Text and images can be converted into numerical vectors (embeddings) that capture their meaning. The model can then compare these vectors to understand how the text and image relate to each other.

**3. Fusion and Interaction:**

* **The Goal:**  Once the data is represented in a common space, the model needs to combine and interact with it.
* **The Methods:**  Various techniques are used for fusion, including:
    * **Early Fusion:** Combining data at the input level (e.g., concatenating image features with text features).
    * **Late Fusion:** Combining data at the output level (e.g., using separate models for each modality and then combining their predictions).
    * **Intermediate Fusion:** Combining data at different stages of processing.
* **The Outcome:**  The model learns to understand the relationships between different modalities and how they contribute to the overall meaning.

**4. Applications:**

* **Image Captioning:** Generating descriptions for images.
* **Video Understanding:** Analyzing and understanding the content of videos.
* **Machine Translation:** Translating text while considering the context of images or videos.
* **Question Answering:** Answering questions that require understanding both text and images.
* **Medical Diagnosis:** Analyzing medical images and patient records to assist in diagnosis.

**In a Nutshell:**

Multimodal models are like having multiple senses working together. They learn to understand the world by combining information from different sources, representing it in a common language, and then using that information to make predictions or generate outputs. This ability to integrate multiple data types makes them incredibly powerful for a wide range of applications.


### Parsing PDF using LlamaParse

We will use the LlamaCloud API to parse our PDF file with the openai-gpt4o model. If your document is mostly text, the default setting may be a better choice since it is much cheaper.

We first mount Google Drive.

In [6]:
from google.colab import drive
drive.mount("/content/drive", force_remount=True)

Mounted at /content/drive


Now that the drive is mounted, let's parse a file from the drive.

In [9]:
from llama_parse import LlamaParse
from llama_index.core import SimpleDirectoryReader
import nest_asyncio

nest_asyncio.apply()

# set up parser
parser = LlamaParse(
    result_type="markdown",
#    fast_mode=True, # Uncomment this to use fast parser that does not use any OCR (cheapest)
    use_vendor_multimodal_model=True,            # Comment these two lines out
    vendor_multimodal_model_name="openai-gpt4o", # if you want to use the default parser.
    parsing_instruction="The following is a research paper. Write the equations in latex format."
)

file_extractor = {".pdf": parser}
file_path = "/content/drive/MyDrive/ML_Colab_Notebooks/NLP_Projects/Test_Docs/GANs.pdf"
documents = SimpleDirectoryReader(input_files=[file_path], file_extractor=file_extractor).load_data()

Started parsing the file under job_id df0fa807-d4a4-4532-b49b-65cdcc9bcb12


Now that we have our document converted into text (Markdown format), we need to split it into chunks. To do so, we will use LangChain's Recursive Character Text Splitter to split the documents (pages) into nodes (chunks).

In [14]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
from llama_index.core.node_parser import LangchainNodeParser

# Characters used to determine where to split in order.
SEPARATORS = [
    "\n\n",
    "\n",
    " ",
    ".",
    "?",
    "!",
    ",",
    ""
]

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=3000,
    chunk_overlap=300,
    add_start_index=True,
    strip_whitespace=True,
    separators=SEPARATORS,
)

for i, doc in enumerate(documents):
    doc.metadata["page_number"] = "page " + str(i)

# Create a Node parser for LlamaIndex using langchain splitter.
text_parser = LangchainNodeParser(text_splitter)
nodes = text_parser.get_nodes_from_documents(documents)
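
To make the role of the separator list more concrete, here is a simplified pure-Python sketch of recursive splitting. It is not LangChain's implementation: it ignores `chunk_overlap` and `add_start_index`, and the function name `recursive_split` is made up for this illustration. It only shows the idea of trying separators in order and falling back to finer ones when a piece is still too long.

```python
def recursive_split(text, separators, chunk_size):
    """Split text into chunks of at most chunk_size characters (no overlap)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Nothing coarser left to try: fall back to hard cuts.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces together
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry it with the finer separators.
            chunks.extend(recursive_split(piece, finer, chunk_size))
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph about GANs.\n\nSecond paragraph.\nIt has two lines that are fairly long."
chunks = recursive_split(sample, ["\n\n", "\n", " ", ""], chunk_size=40)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

Paragraph breaks are tried first, so chunks tend to follow the structure of the paper; single characters are only used as a last resort.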

The nodes can now be embedded into vectors, and a vector store database (db) can be created using Chroma. We can also use a previously created and saved vector store to create our index.

In [15]:
from llama_index.embeddings.gemini import GeminiEmbedding
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

import chromadb

gemini_embeddings = GeminiEmbedding(model_name="models/text-embedding-004")

persist_directory = '/content/drive/MyDrive/ML_Colab_Notebooks/NLP_Projects/chroma/' # Replace with your desired directory path
# !rm -r /content/drive/MyDrive/ML_Colab_Notebooks/NLP_Projects/chroma/*   # Uncomment to remove old database files

db = chromadb.PersistentClient(path=persist_directory)
chroma_collection = db.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

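# Comment these lines and ...
# Passing the Chroma store through a StorageContext makes the index write its embeddings to it.
from llama_index.core import StorageContext

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex(nodes, storage_context=storage_context, embed_model=gemini_embeddings)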

# ... Uncomment these lines to use a previously created vector store instead.
# index = VectorStoreIndex.from_vector_store(
#     vector_store,
#     embed_model=gemini_embeddings
# )
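
Conceptually, the index answers a query by embedding it and returning the chunks whose vectors are most similar to the query vector. The sketch below illustrates that idea with brute-force cosine similarity over toy vectors. It is only an illustration: the real store uses its own index structure and distance settings, and the names `cosine_similarity`, `top_k`, and `toy_store` are made up here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k):
    """Return the k (score, text) pairs with the highest similarity to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real embedding vectors have hundreds of dimensions.
toy_store = [
    ("chunk about the discriminator", [0.9, 0.1, 0.0]),
    ("chunk about the generator", [0.1, 0.9, 0.0]),
    ("chunk about the references", [0.0, 0.1, 0.9]),
]
results = top_k([1.0, 0.2, 0.0], toy_store, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```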

Now let's test how our vector store performs. We will again use display to render the equations nicely :D

In [16]:
from llama_index.core.retrievers import VectorIndexRetriever

retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=3,
    mode = "cosine"
)

ret_nodes = retriever.retrieve("What is Algorithm 1 about?")
print(len(ret_nodes))
res = "\n".join([i.get_content() + "\n\n" + i.metadata["file_name"] +" - " + i.metadata["page_number"] +
                 "\n\n-----------------------------------------------------------------\n\n" for i in ret_nodes])


# The parsed documents use \[ \] and \( \) instead of $$ and $ respectively, so we convert them for proper rendering.
import re

res = re.sub(r'\\([\[\]()])', lambda m: '$$' if m.group(1) in '[]' else '$', res)
display(Markdown(f"<b>{res}</b>"))

3


<b>$$
C(G) = \max_D V(G, D)
$$

$$
= \mathbb{E}_{x \sim p_{\text{data}}} [\log D_G^*(x)] + \mathbb{E}_{x \sim p_g} [\log(1 - D_G^*(x))]
$$

$$
= \mathbb{E}_{x \sim p_{\text{data}}} \left[ \log \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)} \right] + \mathbb{E}_{x \sim p_g} \left[ \log \frac{p_g(x)}{p_{\text{data}}(x) + p_g(x)} \right] \tag{4}
$$

GANs.pdf - page 4

-----------------------------------------------------------------


In other words, $ D $ and $ G $ play the following two-player minimax game with value function $ V(G, D) $:

$$
\min_G \max_D V(D, G) = \mathbb{E}_{\mathbf{x} \sim p_{\text{data}}(\mathbf{x})} [\log D(\mathbf{x})] + \mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}}(\mathbf{z})} [\log(1 - D(G(\mathbf{z})))].
$$

(1)

In the next section, we present a theoretical analysis of adversarial nets, essentially showing that the training criterion allows one to recover the data generating distribution as $ G $ and $ D $ are given enough capacity, i.e., in the non-parametric limit. See Figure 1 for a less formal, more pedagogical explanation of the approach. In practice, we must implement the game using an iterative, numerical approach. Optimizing $ D $ to completion in the inner loop of training is computationally prohibitive, and on finite datasets would result in overfitting. Instead, we alternate between $ k $ steps of optimizing $ D $ and one step of optimizing $ G $. This results in $ D $ being maintained near its optimal solution, so long as $ G $ changes slowly enough. This strategy is analogous to the way that SML/PCD [31, 29] training maintains samples from a Markov chain from one learning step to the next in order to avoid burning in a Markov chain as part of the inner loop of learning. The procedure is formally presented in Algorithm 1.

In practice, equation 1 may not provide sufficient gradient for $ G $ to learn well. Early in learning, when $ G $ is poor, $ D $ can reject samples with high confidence because they are clearly different from the training data. In this case, $\log(1 - D(G(\mathbf{z})))$ saturates. Rather than training $ G $ to minimize $\log(1 - D(G(\mathbf{z})))$ we can train $ G $ to maximize $\log(D(G(\mathbf{z})))$. This objective function results in the same fixed point of the dynamics of $ G $ and $ D $ but provides much stronger gradients early in learning.

![Figure 1](https://example.com/figure1.png)

GANs.pdf - page 3

-----------------------------------------------------------------


**Algorithm 1** Minibatch stochastic gradient descent training of generative adversarial nets. The number of steps to apply to the discriminator, $ k $, is a hyperparameter. We used $ k = 1 $, the least expensive option, in our experiments.

for number of training iterations do

&nbsp;&nbsp;&nbsp;&nbsp;for $ k $ steps do

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;• Sample minibatch of $ m $ noise samples $\{z^{(1)}, \ldots, z^{(m)}\}$ from noise prior $ p_g(z) $.

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;• Sample minibatch of $ m $ examples $\{x^{(1)}, \ldots, x^{(m)}\}$ from data generating distribution $ p_{\text{data}}(x) $.

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;• Update the discriminator by ascending its stochastic gradient:

$$
\nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \left[ \log D \left( x^{(i)} \right) + \log \left( 1 - D \left( G \left( z^{(i)} \right) \right) \right) \right].
$$

&nbsp;&nbsp;&nbsp;&nbsp;end for

&nbsp;&nbsp;&nbsp;&nbsp;• Sample minibatch of $ m $ noise samples $\{z^{(1)}, \ldots, z^{(m)}\}$ from noise prior $ p_g(z) $.

&nbsp;&nbsp;&nbsp;&nbsp;• Update the generator by descending its stochastic gradient:

$$
\nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log \left( 1 - D \left( G \left( z^{(i)} \right) \right) \right).
$$

end for

The gradient-based updates can use any standard gradient-based learning rule. We used momentum in our experiments.

## 4.1 Global Optimality of $ p_g = p_{\text{data}} $

We first consider the optimal discriminator $ D $ for any given generator $ G $.

**Proposition 1.** For $ G $ fixed, the optimal discriminator $ D $ is

$$
D_G^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)} \tag{2}
$$

**Proof.** The training criterion for the discriminator $ D $, given any generator $ G $, is to maximize the quantity $ V(G, D) $

$$
V(G, D) = \int_x p_{\text{data}}(x) \log D(x) dx + \int_x p_g(z) \log(1 - D(g(z))) dz
$$

$$
= \int_x p_{\text{data}}(x) \log(D(x)) + p_g(x) \log(1 - D(x)) dx \tag{3}
$$

For any $ (a, b) \in \mathbb{R}^2 \setminus \{0, 0\} $, the function $ y \to a \log(y) + b \log(1 - y) $ achieves its maximum in $[0, 1]$ at $ \frac{a}{a+b} $. The discriminator does not need to be defined outside of $ \text{Supp}(p_{\text{data}}) \cup \text{Supp}(p_g) $, concluding the proof.

Note that the training objective for $ D $ can be interpreted as maximizing the log-likelihood for estimating the conditional probability $ P(Y = y|x) $, where $ Y $ indicates whether $ x $ comes from $ p_{\text{data}} $ (with $ y = 1 $) or from $ p_g $ (with $ y = 0 $). The minimax game in Eq. 1 can now be reformulated as:

$$
C(G) = \max_D V(G, D)
$$

$$
= \mathbb{E}_{x \sim p_{\text{data}}} [\log D_G^*(x)] + \mathbb{E}_{x \sim p_g} [\log(1 - D_G^*(x))]
$$

GANs.pdf - page 4

-----------------------------------------------------------------

</b>

As expected, the outputs look really good! You can try pypdf (LangChain: PyPDFLoader) and compare the results. They will be nowhere near as good!
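
The delimiter conversion used in the cell above can be checked in isolation. The sketch below wraps the same regular expression in a small function (the name `to_dollar_math` is ours) and runs it on a made-up sample string.

```python
import re


def to_dollar_math(text):
    r"""Replace \[ and \] with $$, and \( and \) with $, leaving other backslashes alone."""
    return re.sub(r'\\([\[\]()])', lambda m: '$$' if m.group(1) in '[]' else '$', text)


sample = r"The loss \(L\) is defined as \[L = -\log D(x)\]"
print(to_dollar_math(sample))
```

Only brackets that directly follow a backslash are replaced, so commands such as `\log` and plain parentheses like `D(x)` are left untouched. One caveat is that an escaped bracket outside of math would be converted as well.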

### Setting up the Chain

Now we will set up a not-so-basic LangChain chain. We will use function calling, structured output parsing, and prompt engineering. Let's go through this slowly.

In [17]:
from langchain.prompts import ChatPromptTemplate
from langchain.agents import tool

from pydantic import BaseModel, Field
from typing import Optional, List

# First we create a prompt to determine if we need any context. We correct the user query and
# add a set of topics to help with retrieval using the chat history. Read through the system
# prompt to understand what this first model call is used for. Note that we use a positive
# example (what to do) in order to help the model. Negative examples (what not to do) are not
# recommended.

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a research assistant that helps the user with understanding topics and "
    "finding information relevant to what they are asking. You will use the get_context function "
    "when necessary to retrieve the information you need to answer the question. If the question "
    "does not require any calls such as 'hello, how are you?', pass an empty string as the 'query' parameter to the get_context function. "
    "The user might ask a question based on the previous response. In such cases, you have been given the chat history "
    "to rephrase the question with missing details to have a better retrieval of relevant documents. "
    "Make sure the edits are as minimal as possible, such as replacing this, that etc. "
    "The chat history is surrounded by [[[...]]] and is given in a reverse order, meaning the latest output is at the top and the oldest at the bottom. "
    "The user and assistant messages in the history are labeled with a prefix h_, so h_user and h_assistant. "
    "Hence when asked about the previous question or response, use the top chat history to rephrase the question. "
    "If the question contains all the necessary information and does not rely on previous chats use the "
    "unaltered version of the question to call the get_context function. If there is a chat history relevant to the "
    "question asked, such as when the user asks for more details about the previous question, create a string of topics from that chat history. "
    "First have a thought, then output the question to be used for the function, and then finally the function call. "
    "Output the thought, question, and function parameter in a json format. Always set the k value to 2. "
    "If the query parameter is set to an empty string, set the value of the Question key as just the inputted question. "
    "If nothing is relevant in the chat history to the question created put an empty string for topics and do not try to make up or forcefully fit in topics. "
    "Do not repeat these topics in the rephrased question, and do not add the topic in the question to the string of topics.\n\n"
    "Example: user: What can you use that model used for?\n"
    "Chat History:[[[\nh_assistant: A generative adversarial network (GAN) is a deep learning architecture. "
    "It trains two neural networks to compete against each other to generate more authentic new data from a given training dataset.\n"
    "h_user: what is a GAN?]]]\n\n"
    """model:
    "Thought": "The question asks about uses of 'that model'. This is not an independent question hence it is
important to modify this question. From the chat history I can tell that the 'that model' corresponds to GANs/ Generative Adversarial Networks.
Hence I should replace the question 'What can you use a Generative Adversarial Network (GAN) for?' From the chat history I can see
that the topics 'Deep Learning' and 'Neural Networks' might be helpful.",
    "Question": "What can you use a Generative Adversarial Network (GAN) for?",
    "Function": "get_context",
    "Parameters":
        "query": "What can you use a Generative Adversarial Network (GAN) for?",
        "topics": "Deep Learning, Neural Networks",
        "k": 2
    """),
    ("user", "{input}\nChat History: [[[{chat_history}]]]"),
    ])


# Now we set up a pydantic class to help define our tool and parameters for the
# model to call. This function returns context to be used by the final model.
# We will also use a model call within the get_context function (unless the query is
# an empty string) to help retrieve more information.

class get_context_pydantic(BaseModel):
    query: str = Field(description="The query to search for.")
    topics: str = Field(description="The topics to search for.")
    k: int = Field(description="The number of documents to retrieve for each topic.")

@tool(args_schema=get_context_pydantic)
def get_context(query: str, topics: str, k: int = 2):
    """
    This function gets k most relevant documents for a given query
    and concatenates their content into a single string.

    Args:
        query (str): The query to search for.
        topics (str): The topics to search for.
        k (int): The number of documents to retrieve for each topic.

    Returns:
        str: A string containing the concatenated content of the retrieved documents.
    """

    # This helps reduce the model calling by 1 when no context is needed.
    if query == "":
        return ""

    # Again make sure to read this prompt to understand what this model call is supposed to do.
    message = [
        (
            "system",
            "You are a research assistant that helps elaborate the question asked by the user by asking more questions. "
            "Suggest five additional related questions to help them find the information they need, for the provided question. "
            "You are also given a list of topics, create "+str(k)+" questions for each topic as well. "
            "Try to make those questions be related to the provided question if possible. "
            "Suggest only short questions without compound sentences. Suggest a variety of questions that cover different aspects of the topics. "
            "Make sure they are complete questions, and that they are related to the original question. "
            "Output one question per line. Do not number the questions. Do not output anything other than the questions."
        ),
        (
            "user",
            "Question: "+query+"\n"+"Topics: "+topics
        )
    ]

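    # Keep the original query plus every non-empty generated question (blank lines are skipped).
    generated = [q.strip() for q in llm_prompt.invoke(message).content.splitlines() if q.strip()]
    multiple_query = [query] + generated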

    unique_documents = set()
    retriever = VectorIndexRetriever(
        index=index,
        similarity_top_k=k,
        mode="cosine",
        )

    for q in multiple_query:
        nodes = retriever.retrieve(q)

        for i,node in enumerate(nodes):
            unique_documents.add(node.get_content()+"\n"+"Metadata: "
                                    +str(node.metadata.get("file_name")) +" - "
                                    +str(node.metadata.get("page_number"))+"\n\n")

    context = ""
    context += "".join(
        [f"Document {i}: "+doc
         for i, doc in enumerate(unique_documents)])

    return context

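# Note: `bind` returns a new runnable instead of modifying `llm_prompt` in place, so the call
# below has no effect as written. The chain further down relies on the JSON that the system
# prompt asks for (parsed by `output_parser`) rather than on native function calling.
llm_prompt.bind(functions=[get_context], tool_choice="get_context")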


# We also need another function that uses the information/context to answer the question asked.
# We restrict it to answer only based on the context given and also to reference where
# that information was retrieved from, to make sure we are getting correct information.

class QA_answer_pydantic(BaseModel):
    query: str = Field(description="The question to be answered")
    doc_context: str = Field(description="The context to use for the answering the question")

@tool(args_schema=QA_answer_pydantic)
def QA_answer(query: str, doc_context: str):
    """
    This function returns an answer to a query using the QA LLM pipeline and context
    provided by the get_context method.

    Args:
        query (str): The question to be answered.
        doc_context (str): The context to use for the answering the question.
    Returns:
        str: The answer to the query.
    """

    messages = [
    (
        "system",
        """You are a research assistant that answers a question asked by the user based only on the documents given to you.
Explain with as much detail as possible to help the user with their research. Make sure the answers are elaborate and have a lot of detail.
If the user is having a simple chat such as "Hey!, How are you doing?", just respond normally. But if it is a question about some topic do not answer unless the documents contain the answer.
If the question was answered using the documents, use the metadata to give the page number and name of the book or paper used to answer the question.
If multiple books were relevant in answering the question list them all. Make sure to add
references but only using the metadata and not make up references. If there are many pages from the same book or pdf, reference the book/pdf once but with all the pages listed.
Format the response in a markdown (.md) format but keep the equations in latex format. Do not use ``` in the markdown text.""",
        # We tell the model to not use ``` since equations inside it are not printed properly.
    ),
    (
        "user",
        "Question: " + query
    ),
    (
        # LangChain has no "context" role, so the documents are sent as a second user message.
        "user",
        "Documents:\n" + doc_context
    )]

    answer = llm.invoke(messages).content

    return answer

# The first model, llm_prompt, outputs a JSON-formatted string. We need to parse it into a
# Python dictionary for function calling, so we provide a schema of the expected output.
from langchain.output_parsers import ResponseSchema, StructuredOutputParser

response_schemas = [ResponseSchema(name="Thought", description="The thought process used to find the correct question to be used."),
    ResponseSchema(name="Question",description="The question to be used to find the context needed to answer that question."),
    ResponseSchema(name="Function",description="The function to be called."),
    ResponseSchema(name="Parameters",description="A dictionary of parameters that will used in the function call.")]

output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

# This uses the parsed dictionary to call the get_context function with the parameters
# set by the model and passes those results as parameters to the QA_answer function.
def route(result):
    tools = {"get_context": get_context}
    doc_context = tools[result["Function"]].run(result["Parameters"])
    return {"query": result["Question"], "doc_context": doc_context}

# Now we use LCEL to chain our different parts into a pipeline
chain = prompt | llm_prompt | output_parser | route | QA_answer
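
The core of `get_context` is multi-query retrieval: the original question and several generated questions are each sent to the retriever, and duplicate chunks are dropped before building the context string. The sketch below isolates that logic with a stand-in retriever so it can be run without any API calls. The names `parse_questions`, `gather_context`, and `fake_index` are hypothetical. Unlike the `set` used in the notebook, this version keeps chunks in the order they were first retrieved, which makes the document numbering reproducible.

```python
def parse_questions(llm_reply):
    """Turn a one-question-per-line reply into a clean list, skipping blank lines."""
    return [line.strip() for line in llm_reply.splitlines() if line.strip()]


def gather_context(query, llm_reply, retrieve):
    """Retrieve chunks for the query and each generated question, keeping each chunk once."""
    seen = set()
    ordered_chunks = []
    for question in [query] + parse_questions(llm_reply):
        for chunk in retrieve(question):
            if chunk not in seen:
                seen.add(chunk)
                ordered_chunks.append(chunk)
    return "".join(f"Document {i}: {chunk}\n\n" for i, chunk in enumerate(ordered_chunks))


# Stand-in retriever: a fixed lookup table instead of a vector index.
fake_index = {
    "What is Algorithm 1?": ["chunk A", "chunk B"],
    "How is the discriminator updated?": ["chunk B", "chunk C"],
}
fake_reply = "How is the discriminator updated?\n\nWhat is k?\n"
context = gather_context("What is Algorithm 1?", fake_reply, lambda q: fake_index.get(q, []))
print(context)
```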

Now let's try the chain on a question.

In [18]:
Question = "Can you tell me about algorithm 1 and also write it out?"
result_of_q = chain.invoke({"input":Question, "chat_history":""})
res_formatted = re.sub(r'\\([\[\]()])', lambda m: '$$' if m.group(1) in '[]' else '$', result_of_q)
display(Markdown(res_formatted))

**Algorithm 1: Minibatch stochastic gradient descent training of generative adversarial nets.** The number of steps to apply to the discriminator, $k$, is a hyperparameter. We used $k = 1$, the least expensive option, in our experiments.

**for** number of training iterations **do**

&nbsp;&nbsp;&nbsp;&nbsp;**for** $k$ steps **do**

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;• Sample minibatch of $m$ noise samples $\{z^{(1)}, \ldots, z^{(m)}\}$ from noise prior $p_g(z)$.

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;• Sample minibatch of $m$ examples $\{x^{(1)}, \ldots, x^{(m)}\}$ from data generating distribution $p_{\text{data}}(x)$.

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;• Update the discriminator by ascending its stochastic gradient:

$$
\nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \left[ \log D \left( x^{(i)} \right) + \log \left( 1 - D \left( G \left( z^{(i)} \right) \right) \right) \right].
$$

&nbsp;&nbsp;&nbsp;&nbsp;**end for**

&nbsp;&nbsp;&nbsp;&nbsp;• Sample minibatch of $m$ noise samples $\{z^{(1)}, \ldots, z^{(m)}\}$ from noise prior $p_g(z)$.

&nbsp;&nbsp;&nbsp;&nbsp;• Update the generator by descending its stochastic gradient:

$$
\nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log \left( 1 - D \left( G \left( z^{(i)} \right) \right) \right).
$$

**end for**

The gradient-based updates can use any standard gradient-based learning rule. We used momentum in our experiments.

The Algorithm 1 discussed above is for training generative adversarial networks. The main idea of this algorithm is to use the two networks, generator $G$ and discriminator $D$, to learn the data generating distribution. The generator $G$ tries to generate data that looks similar to the real data and the discriminator $D$ tries to distinguish between the real data and the generated data. The algorithm uses stochastic gradient descent to update the parameters of the generator and discriminator networks. The number of steps to apply to the discriminator, $k$, is a hyperparameter. The authors used $k = 1$, the least expensive option, in their experiments.

**References:**

[1] GANs.pdf, pp. 3, 5. 


Wonderful! It answered the question with beautifully formatted equations and cited the pages! Let's put this response and the question into a chat history and see if it can answer a question that refers to Algorithm 1 indirectly.

In [19]:
# The prompt template already wraps {chat_history} in [[[...]]], so we do not add the brackets here.
chat_history = "\nh_assistant: " + res_formatted + "\n" + "h_user: " + Question
Question = "Can you tell me about its convergence?"
result_of_q = chain.invoke({"input":Question, "chat_history":chat_history})
res_formatted = re.sub(r'\\([\[\]()])', lambda m: '$$' if m.group(1) in '[]' else '$', result_of_q)
display(Markdown(res_formatted))

Proposition 2 in *"Generative Adversarial Nets"* (Goodfellow et al.) states that if the discriminator and generator networks have sufficient capacity, and during each iteration of Algorithm 1, the discriminator reaches its optimal state given the generator, and the generator is updated to improve the criterion:

$$
E_{x \sim p_{\text{data}}} [\log D_G^*(x)] + E_{x \sim p_g} [\log(1 - D_G^*(x))],
$$

then the generator's distribution ($p_g$) will converge to the true data distribution ($p_{\text{data}}$).

**Proof:**

1. **Convexity:** The proof begins by considering the value function ($V(G, D) = U(p_g, D)$) as a function of the generator's distribution ($p_g$). It highlights that $U(p_g, D)$ is convex with respect to $p_g$.

2. **Subderivatives and Gradient Descent:** The proof utilizes the property that the subderivatives of a supremum of convex functions encompass the derivative of the function at its maximum point. This implies that performing gradient descent updates for $p_g$ at the optimal discriminator ($D$) for a given generator ($G$) is equivalent to finding the subderivative of the supremum of $U(p_g, D)$.

3. **Unique Global Optima:** Theorem 1 in the paper establishes that the supremum of $U(p_g, D)$ is convex in $p_g$ and has a unique global optimum when $p_g = p_{\text{data}}$.

4. **Convergence:** Given the convexity and the unique global optima, with sufficiently small updates to $p_g$ in the direction provided by the gradient descent, $p_g$ will converge to $p_{\text{data}}$.

Therefore, under the specified conditions, Algorithm 1 ensures the convergence of the generator's distribution to the true data distribution.

**In simpler terms:**

Imagine training a counterfeiter (generator) and a detective (discriminator) to create realistic fake currency. The detective tries to distinguish real from fake, while the counterfeiter learns from the detective's feedback to improve their fakes. If the detective is good enough and the counterfeiter is constantly learning, eventually the fakes will become indistinguishable from the real currency. This convergence, where the fake distribution matches the real distribution, is what Proposition 2 guarantees under ideal conditions.

**References:**

- Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative adversarial nets. In *Advances in neural information processing systems* (pp. 2672-2680). (pages 3, 5) 


It was able to answer without a problem! Try changing these questions and using your own documents to test how well this works. Playing around with the prompts is also encouraged, since the results can vary drastically when the prompts are modified.
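
For conversations longer than two turns, the history string could be built by a small helper instead of by hand. The sketch below follows the format described in the system prompt: newest turn first, with `h_assistant` and `h_user` prefixes. The function name `format_chat_history` and the sample turns are made up for illustration. The surrounding `[[[...]]]` brackets are left out because the prompt template already adds them around `{chat_history}`.

```python
def format_chat_history(turns):
    """Format (user, assistant) turns newest-first with h_user / h_assistant prefixes."""
    lines = []
    for user_msg, assistant_msg in reversed(turns):
        lines.append(f"h_assistant: {assistant_msg}")
        lines.append(f"h_user: {user_msg}")
    return "\n" + "\n".join(lines) if lines else ""


# Turns are stored oldest-first, as they happen in the conversation.
turns = [
    ("What is a GAN?", "A GAN is a deep learning architecture."),
    ("Can you tell me about algorithm 1?", "Algorithm 1 trains the two networks."),
]
history = format_chat_history(turns)
print(history)
```

The result can be passed as the `chat_history` value in `chain.invoke`. Since full answers can be long, it may be worth keeping only the last few turns.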