## Step 1: Set up the NVIDIA API key and import chat models and embeddings

- Complete the [setup](https://python.langchain.com/v0.1/docs/integrations/text_embedding/nvidia_ai_endpoints/#setup) and get an API key.

- Run the code snippet below and enter your API key.

In [161]:
# import getpass
# import os

# ## API Key can be found by going to NVIDIA NGC -> AI Foundation Models -> (some model) -> Get API Code or similar.
# ## 10K free queries to any endpoint (which is a lot actually).

# # del os.environ['NVIDIA_API_KEY']  ## delete key and reset
# if os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
#     print("Valid NVIDIA_API_KEY already in environment. Delete to reset")
# else:
#     nvapi_key = getpass.getpass("NVAPI Key (starts with nvapi-): ")
#     assert nvapi_key.startswith("nvapi-"), f"{nvapi_key[:5]}... is not a valid key"
#     os.environ["NVIDIA_API_KEY"] = nvapi_key

Valid NVIDIA_API_KEY already in environment. Delete to reset


In [1]:
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

# get all available models
ChatNVIDIA.get_available_models()

ValueError: Failed to query model endpoint https://api.nvcf.nvidia.com/v2/nvcf/functions.
[401] Unauthorized
Bearer error="invalid_token"
error_description="Bearer token is malformed"
error_uri="https://tools.ietf.org/html/rfc6750#section-3.1"
Please check or regenerate your API key.
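
The `401 Unauthorized` / "Bearer token is malformed" response above suggests that the key the client sent was missing or not in the expected shape. A minimal sketch of a local sanity check that can be run before calling the endpoint, assuming (as the commented-out cell above does) that valid keys start with `nvapi-`. The helper name `check_nvidia_key` is hypothetical and not part of any NVIDIA or LangChain package.

```python
import os


def check_nvidia_key(env=None):
    """Return (ok, message) describing the NVIDIA_API_KEY found in `env`.

    Only the shape of the key is checked; nothing is sent over the network.
    """
    env = os.environ if env is None else env
    key = env.get("NVIDIA_API_KEY", "")
    if not key:
        return False, "NVIDIA_API_KEY is not set"
    if key != key.strip():
        return False, "NVIDIA_API_KEY has leading or trailing whitespace"
    if not key.startswith("nvapi-"):
        return False, "NVIDIA_API_KEY does not start with 'nvapi-'"
    return True, "NVIDIA_API_KEY looks well-formed"


ok, message = check_nvidia_key()
print(message)
```

A key that passes this check can still be rejected by the server (for example if it was revoked), so this only rules out the most common copy-and-paste mistakes.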

## Step 2: Initialize LLM model and embeddings
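- I will be using an open-source model hosted by NVIDIA together with an NVIDIA embedding model. The cell below selects `ai-mixtral-8x7b-instruct`; `ai-llama3-8b` is one of the alternatives listed in its comment.
- The same workflow also works with GPT-4o (see the commented-out OpenAI cells).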

In [22]:
%%capture --no-stderr
%pip install -U langgraph langchain langchain-openai

In [23]:
# os.environ["OPENAI_API_KEY"] = getpass.getpass()

In [99]:
# from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# model = ChatOpenAI(model="gpt-4o")
# embedder = OpenAIEmbeddings()

In [163]:
# Doesn't work with Open-Source models

llm = ChatNVIDIA(model='ai-mixtral-8x7b-instruct') #mistralai/mixtral-8x22b-instruct-v0.1, ai-mistral-large, ai-mixtral-8x7b-instruct, ai-llama3-8b

embedder = NVIDIAEmbeddings(model="ai-embed-qa-4")

## Step 3: load documents using document loaders provided by LangChain

- Here, we use `UnstructuredPDFLoader` to load a PDF from a local directory

- and `OnlinePDFLoader` to load an online PDF from the arXiv repository

In [164]:
# Both loaders live in langchain_community; the `langchain.document_loaders` path is deprecated
from langchain_community.document_loaders import OnlinePDFLoader, UnstructuredPDFLoader

# load document from directory
loader = UnstructuredPDFLoader("../Generative-AI-Book.pdf")
book = loader.load()

# load document from arxiv
loader_01 = OnlinePDFLoader("https://arxiv.org/pdf/2312.10997")
paper = loader_01.load()

## Step 4: Storing Embeddings in Vector Store

In [165]:
from langchain_community.vectorstores import FAISS  # `langchain.vectorstores` is deprecated
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)

# text splits for both documents
chunks_book = text_splitter.split_documents(book)
chunks_paper = text_splitter.split_documents(paper)

# storing in vector store
faiss_db_book = FAISS.from_documents(chunks_book, embedder)
faiss_db_paper = FAISS.from_documents(chunks_paper, embedder)
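
To make `chunk_size=1000` and `chunk_overlap=200` concrete, here is a simplified sliding-window chunker. It is not how `RecursiveCharacterTextSplitter` works internally (that class also tries to break on separators such as paragraph and line breaks); it only illustrates that consecutive chunks share their boundary text. The function name `sliding_chunks` is made up for this sketch.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split `text` into windows of `chunk_size` characters.

    Each window starts `chunk_size - chunk_overlap` characters after the
    previous one, so neighbouring chunks share `chunk_overlap` characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2500))
chunks = sliding_chunks(sample)
print([len(c) for c in chunks])            # [1000, 1000, 900]
print(chunks[0][-200:] == chunks[1][:200])  # True
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.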

## Step 5: Creating Retriever

In [166]:
retriever_book = faiss_db_book.as_retriever()
retriever_paper = faiss_db_paper.as_retriever()
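
Conceptually, a vector-store retriever embeds the query and returns the stored chunks whose vectors lie closest to it. The toy sketch below uses hand-made 2-D vectors and cosine similarity in place of real embeddings; the data, the function names, and the choice of metric are all illustrative assumptions, and FAISS's actual indexing and distance computation may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k=2):
    """Return the texts of the `k` entries most similar to `query_vec`."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# (text, vector) pairs standing in for embedded chunks
toy_store = [
    ("chunk about retrieval", [0.9, 0.1]),
    ("chunk about image generation", [0.1, 0.9]),
    ("chunk about RAG evaluation", [0.8, 0.3]),
]

print(top_k([1.0, 0.0], toy_store, k=2))
```

In the notebook, the number of returned chunks can be adjusted by passing `search_kwargs={"k": ...}` to `as_retriever()`.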

In [None]:
# docs = retriever_paper.invoke("what is Retrieval-Augmented Generation?")[0].page_content.split('|')
# response = '\n'.join(docs) #?
# print(response)

## Step 6: Construct retriever tools for the book and the arXiv paper

In [167]:
from langchain.tools import BaseTool

# Tool that retrieves passages from the book
class BookRetriever(BaseTool):
    # Type annotations keep the field overrides valid on newer pydantic versions
    name: str = 'Book'
    description: str = 'Useful for understanding the content of the Generative AI book'

    def _run(self, query: str) -> str:
        # Take the top hit and put each '|'-separated piece on its own line
        out = retriever_book.invoke(query)[0].page_content.split('|')
        output = '\n'.join(out)
        return output

    async def _arun(self, query: str) -> str:
        raise NotImplementedError("This tool does not support async")

# Tool that retrieves passages from the arXiv paper
class PaperRetriever(BaseTool):
    name: str = 'Paper'
    description: str = 'Useful for understanding the latest on RAG techniques'

    def _run(self, query: str) -> str:
        # Take the top hit and put each '|'-separated piece on its own line
        out = retriever_paper.invoke(query)[0].page_content.split('|')
        output = '\n'.join(out)
        return output

    async def _arun(self, query: str) -> str:
        raise NotImplementedError("This tool does not support async")


# initialize the retrievers
book_retriever = BookRetriever()
paper_retriever = PaperRetriever()
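
Both tools above return only the first retrieved chunk (`[0]`). If more context is wanted, several hits can be concatenated before they are handed back to the agent. A minimal sketch, using a stand-in `Doc` class so that it runs without LangChain; `Doc` and `join_top_documents` are hypothetical names, and the separator is an arbitrary choice.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Stand-in for a retrieved document; only `page_content` is modelled."""
    page_content: str


def join_top_documents(docs, k=3, separator="\n\n---\n\n"):
    """Concatenate the text of the first `k` documents, skipping empty ones."""
    texts = [d.page_content.strip() for d in docs[:k]]
    return separator.join(t for t in texts if t)


hits = [Doc("First passage."), Doc("  "), Doc("Second passage."), Doc("Third passage.")]
print(join_top_documents(hits, k=3))
```

Returning more text gives the model more evidence but also consumes more of its context window, so `k` is worth tuning per model.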

## Step 7: Load and initialize available tools in LangChain

- Arxiv

- Wikipedia

In [168]:
from langchain_community.utilities import ArxivAPIWrapper
from langchain_community.tools.wikipedia.tool import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

# arxiv tool
arxiv =  ArxivAPIWrapper()

# wikipedia tool
wikipedia = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())


In [169]:
from langchain.tools import Tool

## A clear name and a good description help the agent decide when to use each tool
wiki_tool = Tool.from_function(
    func=wikipedia.run,
    name="Wiki",
    description="useful for when you need to search certain topic on Wikipedia, aka wiki")

arxiv_tool = Tool.from_function(
    func=arxiv.run,
    name="Arxiv",
    description="useful for querying from arxiv repository")

book_tool = Tool.from_function(
    func=book_retriever.invoke,
    name='book',
    description='Useful for understanding the latest Generative AI book'
)

paper_tool = Tool.from_function(
    func=paper_retriever.invoke,
    name='paper',
    description='Useful for understanding the RAG from a specific paper'
)

# List of all tools
tools = [book_tool, paper_tool] # wiki_tool, arxiv_tool, book_tool, paper_tool

## Step 8: Wrap the tools in ToolExecutor

In [170]:
%pip install httpx
from langchain_core.agents import AgentFinish
from langgraph.prebuilt.tool_executor import ToolExecutor # reinstall langgraph if fails!

# This is a helper class that is useful for running tools
# It takes in an agent action, calls the matching tool, and returns the result
tool_executor = ToolExecutor(tools)

Note: you may need to restart the kernel to use updated packages.


## Step 9: Prompt Template for Agents

In [188]:
import os
from langchain.agents import AgentExecutor
from langchain.agents import initialize_agent
from langchain.prompts import MessagesPlaceholder
from langchain.memory import ConversationBufferMemory
from langchain.agents import AgentType, Agent, ConversationalAgent
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate

## set up memory
memory = ConversationBufferMemory(memory_key="chat_history", input_key='input', output_key="output")

prompt_template = """
### [INST]

Assistant is a large language model to answer questions on Generative AI.


Context:
------

Assistant has access to the following tools:

{tools}

To use a tool, please use the following format:

'''
Thought: Do I need to use a tool? Yes
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
'''

When you have a response to say to the Human, or if you do not need to use a tool, you MUST use the format:

'''
Thought: Do I need to use a tool? No
Final Answer: [your response here]
'''

Begin!

Previous conversation history:
{chat_history}

New input: {input}

Current Scratchpad:
{agent_scratchpad}

[/INST]
 """

# Create prompt from prompt template
prompt = PromptTemplate(
    input_variables=['agent_scratchpad', 'chat_history', 'input', 'tool_names', 'tools'],
    template=prompt_template,
)

prompt = prompt.partial(
    tools=[t.name for t in tools],
    tool_names=", ".join([t.name for t in tools]),
)
print("prompt ---> \n", prompt)

prompt ---> 
 input_variables=['agent_scratchpad', 'chat_history', 'input'] partial_variables={'tools': ['Search'], 'tool_names': 'Search'} template="\n### [INST]\n\nAssistant is a large language model to answer questions on Generative AI.\n\n\nContext:\n------\n\nAssistant has access to the following tools:\n\n{tools}\n\nTo use a tool, please use the following format:\n\n'''\nThought: Do I need to use a tool? Yes\nAction: the action to take, should be one of [{tool_names}]\nAction Input: the input to the action\nObservation: the result of the action\n'''\n\nWhen you have a response to say to the Human, or if you do not need to use a tool, you MUST use the format:\n\n'''\nThought: Do I need to use a tool? No\nFinal Answer: [your response here]\n'''\n\nBegin!\n\nPrevious conversation history:\n{chat_history}\n\nNew input: {input}\n\nCurrent Scratchpad:\n{agent_scratchpad}\n\n[/INST]\n "
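
In the cell above, the `{tools}` slot is filled with a plain list of tool names, so the model never sees the tool descriptions. The commented-out experiment cell further down uses LangChain's `render_text_description` for this purpose; the sketch below shows the same idea with a stand-in dataclass so that it runs on its own. `ToolInfo`, `render_tools`, and `render_tool_names` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToolInfo:
    """Stand-in for a tool; only the fields the prompt needs."""
    name: str
    description: str


def render_tools(tools):
    """One 'name: description' line per tool, for the {tools} slot."""
    return "\n".join(f"{t.name}: {t.description}" for t in tools)


def render_tool_names(tools):
    """Comma-separated tool names, for the {tool_names} slot."""
    return ", ".join(t.name for t in tools)


demo_tools = [
    ToolInfo("book", "Useful for understanding the latest Generative AI book"),
    ToolInfo("paper", "Useful for understanding the RAG from a specific paper"),
]

print(render_tools(demo_tools))
print(render_tool_names(demo_tools))
```

With the descriptions in the prompt, the model has something to base its choice of `Action` on besides the bare names.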


In [181]:
# os.environ['SERPAPI_API_KEY']=getpass.getpass()

In [194]:
## EXPERIMENT ## (to-be deleted) **does not work with open-source LLM model**
# from langchain import hub
# from langchain.agents import AgentExecutor, load_tools
# from langchain.agents.format_scratchpad import format_log_to_str
# from langchain.tools.render import render_text_description
# from langchain.agents.output_parsers import (
#     ReActJsonSingleInputOutputParser,
# )

# # setup tools
# tools = load_tools(["serpapi"], llm=llm)

# # prompt
# prompt = hub.pull("hwchase17/react-json")
# prompt = prompt.partial(
#     tools=render_text_description(tools),
#     tool_names=", ".join([t.name for t in tools]),
# )

# # define the agent
# chat_model = llm
# chat_model_with_stop = chat_model.bind(stop=["\nObservation"])
# agent = (
#     {
#         "input": lambda x: x["input"],
#         "agent_scratchpad": lambda x: format_log_to_str(x["intermediate_steps"]),
#     }
#     | prompt
#     | chat_model_with_stop
#     | ReActJsonSingleInputOutputParser()
# )

# instantiate AgentExecutor
# agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# # creating agent that has access to the tool list
# agent = create_tool_calling_agent(model, tools, prompt) #model

# agent_execute = AgentExecutor(agent=agent,
#                               tools=tools,
#                               verbose=True,
#                               max_iterations=3,
#                               early_stopping_method="generate"
#                               )

In [None]:
# agent_executor.invoke(
#     {
#         "input": "what is retrieval augmented generation?",
        
#     }
# )

In [173]:
from typing import Any, Optional, Sequence

from langchain_core._api import deprecated
from langchain_core.callbacks import BaseCallbackManager
from langchain_core.language_models import BaseLanguageModel
from langchain_core.tools import BaseTool

from langchain.agents.agent import AgentExecutor
from langchain.agents.agent_types import AgentType
from langchain.agents.loading import AGENT_TO_CLASS, load_agent

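# AGENT_TO_CLASS maps each AgentType to the class that implements it;
# CONVERSATIONAL_REACT_DESCRIPTION maps to ConversationalAgent (ReAct-style prompting with chat history)
agent_cls = AGENT_TO_CLASS[AgentType.CONVERSATIONAL_REACT_DESCRIPTION]
agent_kwargs = {}
# `llm` is the ChatNVIDIA model from Step 2; pass `model` instead if you enabled the GPT-4o cell
agent_obj = agent_cls.from_llm_and_tools(
    llm, tools, callback_manager=None, **agent_kwargs)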

agent_execute=AgentExecutor.from_agent_and_tools(
        agent=agent_obj,
        tools=tools,
        callback_manager=None,
        # handle_parsing_errors=True,
        handle_parsing_errors="Check your output and make sure it conforms, use the Action/Action Input syntax",
        verbose=True,
        output_key = "output",
        max_iterations=3,
        return_intermediate_steps=True,
        early_stopping_method="generate", # or use **force**
        memory=ConversationBufferMemory(memory_key="chat_history", input_key='input', output_key="output"),
        # reduce_k_below_max_tokens=True,
        # max_tokens = 1024
)

In [113]:
from typing import TypedDict, Annotated, List, Union
from langchain_core.agents import AgentAction, AgentFinish
from langchain_core.messages import BaseMessage
import operator


class AgentState(TypedDict):
    # The input string
    input: str
    # The list of previous messages in the conversation
    chat_history: list[BaseMessage]
    # The outcome of a given call to the agent
    # Needs `None` as a valid type, since this is what this will start as
    agent_outcome: Union[AgentAction, AgentFinish, None]
    # List of actions and corresponding observations
    # Here we annotate this with `operator.add` to indicate that operations to
    # this state should be ADDED to the existing values (not overwrite it)
    intermediate_steps: Annotated[list[tuple[AgentAction, str]], operator.add]
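
The `Annotated[..., operator.add]` marker on `intermediate_steps` says that a node's update should be combined with the existing value instead of replacing it. The plain-Python sketch below mimics that merge rule to show the effect; it is an illustration under that assumption, not LangGraph's implementation, and `DemoState` and `apply_update` are hypothetical names.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class DemoState(TypedDict):
    agent_outcome: str                                  # overwritten on update
    intermediate_steps: Annotated[list, operator.add]   # appended on update


def apply_update(state, update, schema=DemoState):
    """Merge `update` into `state`, using the annotated reducer when present."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated types expose their extra arguments via `__metadata__`
        reducer = next(iter(getattr(hints[key], "__metadata__", ())), None)
        if callable(reducer) and key in merged:
            merged[key] = reducer(merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"agent_outcome": "first", "intermediate_steps": [("a1", "obs1")]}
state = apply_update(
    state, {"agent_outcome": "second", "intermediate_steps": [("a2", "obs2")]}
)
print(state)
```

Here `agent_outcome` ends up as `"second"`, while `intermediate_steps` holds both tuples.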

In [1]:
# Define the agent
from langchain_core.agents import AgentActionMessageLog

def run_agent(data):
    inputs = data.copy()
    text = inputs['input']
    # chat_history = inputs['chat_history']
    agent_outcome = agent_execute.invoke({"input":text}) #, "chat_history": chat_history
    return {"agent_outcome": agent_outcome}

# Define the function to execute tools
def execute_tools(data):
    # Get the most recent agent_outcome - this is the key added in the `agent` above
    agent_output = data["agent_outcome"]
    if len(agent_output['intermediate_steps'])>=1 :
        agent_action = agent_output['intermediate_steps'][0][0]
        output = tool_executor.invoke(agent_action)
        return {"intermediate_steps": [(agent_action, str(output))]}
    else:
        return {"intermediate_steps":[]}

# Define logic that is used to determine which conditional edge to go down
def should_continue(data):
    # `agent_outcome` is the dict returned by `agent_execute.invoke`
    # If it already carries an output, we return the `end` string
    # This will be used when setting up the graph to define the flow
    if data["agent_outcome"]["output"] is not None:
        print(" **AgentFinish** ")
        return "end"
    # Otherwise, a tool still has to be run
    # Here we return the `continue` string
    # This will be used when setting up the graph to define the flow
    else:
        print(" **continue** ")
        return "continue"

In [143]:
from langgraph.graph import END, StateGraph

# Define a new graph
workflow = StateGraph(AgentState)

# Define the two nodes we will cycle between
workflow.add_node("agent", run_agent)
workflow.add_node("action", execute_tools)

# Set the entrypoint as `agent`
# This means that this node is the first one called
workflow.set_entry_point("agent")

# We now add a conditional edge
workflow.add_conditional_edges(
    # First, we define the start node. We use `agent`.
    # This means these are the edges taken after the `agent` node is called.
    "agent",
    # Next, we pass in the function that will determine which node is called next.
    should_continue,
    # Finally we pass in a mapping.
    # The keys are strings, and the values are other nodes.
    # END is a special node marking that the graph should finish.
    # What will happen is we will call `should_continue`, and then the output of that
    # will be matched against the keys in this mapping.
    # Based on which one it matches, that node will then be called.
    {
        # If `continue`, then we call the `action` node.
        "continue": "action",
        # Otherwise we finish.
        "end": END,
    },
)

# We now add a normal edge from `action` to `agent`.
# This means that after `action` is called, `agent` node is called next.
workflow.add_edge("action", "agent")

# Finally, we compile it!
# This compiles it into a LangChain Runnable,
# meaning you can use it as you would any other runnable
app = workflow.compile()
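
The control flow of the compiled graph can be pictured as a plain loop: run the `agent` node, ask the router, then either stop or run the `action` node and go back to `agent`. The toy sketch below uses stub nodes and does not depend on LangGraph; `run_graph` and the `stub_*` functions are hypothetical and exist only to make the cycle visible.

```python
def run_graph(state, agent_node, action_node, router, max_steps=10):
    """Cycle agent -> router -> action until the router returns 'end'."""
    for _ in range(max_steps):
        state = {**state, **agent_node(state)}
        if router(state) == "end":
            return state
        update = action_node(state)
        # Mimic the `operator.add` reducer: append rather than overwrite
        steps = state.get("intermediate_steps", []) + update.get("intermediate_steps", [])
        state = {**state, "intermediate_steps": steps}
    raise RuntimeError("graph did not finish within max_steps")


def stub_agent(state):
    # Answer once an observation exists; otherwise request the 'paper' tool
    if state.get("intermediate_steps"):
        last_observation = state["intermediate_steps"][-1][1]
        return {"agent_outcome": {"output": "answer based on " + last_observation}}
    return {"agent_outcome": {"output": None, "action": "paper"}}


def stub_action(state):
    return {"intermediate_steps": [(state["agent_outcome"]["action"], "retrieved text")]}


def stub_router(state):
    return "end" if state["agent_outcome"]["output"] is not None else "continue"


final = run_graph({"input": "What is RAG?"}, stub_agent, stub_action, stub_router)
print(final["agent_outcome"]["output"])
```

The `max_steps` guard plays a role similar to `max_iterations` on the `AgentExecutor`: it keeps a model that never produces a final answer from looping forever.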

In [177]:
inputs = {"input": "Explain Adaptive Retrieval methods"} # ,"chat_history": []
outputs = app.invoke(inputs)



> Entering new AgentExecutor chain...
Could not parse LLM output: ` Thought: Do I need to use a tool? Yes

Action: paper`
Observation: Check your output and make sure it conforms, use the Action/Action Input syntax
Thought:Could not parse LLM output: ` Do I need to use a tool? Yes

Action: paper

`
Observation: Check your output and make sure it conforms, use the Action/Action Input syntax
Thought:Could not parse LLM output: ` It seems like there was an issue with the previous output. To explain Adapt`
Observation: Check your output and make sure it conforms, use the Action/Action Input syntax
Thought:

OutputParserException: Could not parse LLM output: ` Thought: Do I need to use a tool? Yes

Action: paper`
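
The failure above appears to come from the model emitting `Action: paper` with no `Action Input:` line after it, which leaves the output parser with nothing to pass to the tool. The sketch below is a simplified stand-in for such a parser, following the format requested in the prompt template from Step 9. It is not LangChain's actual parser; the regular expression and the name `parse_react_output` are assumptions made for illustration.

```python
import re

# An action needs both an 'Action:' line and an 'Action Input:' line
ACTION_RE = re.compile(r"Action:\s*(.*?)\s*\n+\s*Action Input:\s*(.*)", re.DOTALL)


def parse_react_output(text):
    """Return ('finish', answer) or ('action', tool, tool_input).

    Raises ValueError when the text matches neither format.
    """
    if "Final Answer:" in text:
        return ("finish", text.split("Final Answer:", 1)[1].strip())
    match = ACTION_RE.search(text)
    if match is None:
        raise ValueError(f"Could not parse LLM output: {text!r}")
    return ("action", match.group(1).strip(), match.group(2).strip())


good = (
    "Thought: Do I need to use a tool? Yes\n"
    "Action: paper\n"
    "Action Input: Adaptive Retrieval methods"
)
bad = " Thought: Do I need to use a tool? Yes\n\nAction: paper"

print(parse_react_output(good))
try:
    parse_react_output(bad)
except ValueError as err:
    print("parse failed:", err)
```

A long free-form answer with neither marker fails in the same way, which matches the later runs where the model answers directly without following the format.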

In [155]:
inputs = {"input": "Can you explain Self-RAG?"} # ,"chat_history": []
outputs = app.invoke(inputs)



> Entering new AgentExecutor chain...
Could not parse LLM output: `Self-Retrieval-Augmented Generation (Self-RAG) is an advanced technique in generative AI designed to enhance a model's ability to retrieve relevant information autonomously during the text generation process. Here are the main components and functionalities of Self-RAG:

1. **Reflection Tokens**: These are special tokens such as "retrieve" and "critic" that the model uses to introspect and decide when to trigger the retrieval mechanism. The decision can also be influenced by a predefined threshold based on the probability of certain generated terms.

2. **Autonomous Retrieval Activation**: The model can independently determine when to activate the retrieval mechanism by analyzing the context and using reflection tokens. This allows the model to recognize the need for additional information to maintain coherence and relevance in the generated text.

3. **Fragment-level Beam Search**: During the ret

KeyboardInterrupt: 

In [156]:
inputs = {"input": "what is one advantage and one disadvantage of Self-RAG?"} # ,"chat_history": []
outputs = app.invoke(inputs)



> Entering new AgentExecutor chain...
```
Thought: Do I need to use a tool? No
AI: One advantage of Self-RAG is that it allows the model to autonomously determine when and how to retrieve relevant information, leading to more coherent and contextually accurate text generation. This reduces the need for manual intervention and makes the model more efficient.

One disadvantage of Self-RAG is the potential computational complexity and resource requirements involved in performing fragment-level beam searches and dynamic updates with critic scores. This can make the model slower and more resource-intensive, potentially limiting its scalability for large-scale applications.
```

> Finished chain.
 **AgentFinish** 


In [157]:
inputs = {"input": "explain the Scaling laws of RAG?"} # ,"chat_history": []
outputs = app.invoke(inputs)



> Entering new AgentExecutor chain...
Could not parse LLM output: `Scaling laws in the context of Retrieval-Augmented Generation (RAG) refer to how the performance of RAG models changes as a function of various factors such as model size, dataset size, and computational resources. These laws help researchers and practitioners understand the trade-offs and benefits of scaling up different components of the RAG system. Here are some key aspects of scaling laws in RAG:

1. **Model Size**: Increasing the size of the generative model typically leads to better performance in terms of generating coherent and relevant text. Larger models have more capacity to understand and generate complex patterns in the data. However, this comes at the cost of increased computational resources and memory requirements.

2. **Dataset Size**: The amount of data available for both training and retrieval is crucial. A larger dataset can provide more diverse and relevant information for the

In [159]:
inputs = {"input": "explain what is Flowise AI doing for production-ready RAG?"} # ,"chat_history": []
outputs = app.invoke(inputs)



> Entering new AgentExecutor chain...
Could not parse LLM output: `Flowise AI focuses on providing tools and frameworks to make Retrieval-Augmented Generation (RAG) production-ready, enabling seamless integration into real-world applications. Here are some key areas where Flowise AI contributes:

1. **Optimized Infrastructure**: Flowise AI offers optimized infrastructure to handle the computational demands of RAG models. This includes scalable cloud solutions and efficient data pipelines that can manage large datasets and high-throughput retrieval operations.

2. **Pre-trained Models**: Flowise AI provides access to pre-trained RAG models that are fine-tuned on diverse datasets. These models are ready to be deployed and can be further customized to specific use cases, reducing the time and effort required for model training.

3. **Efficient Retrieval Mechanisms**: Flowise AI implements advanced retrieval mechanisms such as vector search and fragment-level beam se

In [160]:
outputs['agent_outcome']['output']
# outputs

'Flowise AI focuses on providing tools and frameworks to make Retrieval-Augmented Generation (RAG) production-ready, enabling seamless integration into real-world applications. Here are some key areas where Flowise AI contributes:\n\n1. **Optimized Infrastructure**: Flowise AI offers optimized infrastructure to handle the computational demands of RAG models. This includes scalable cloud solutions and efficient data pipelines that can manage large datasets and high-throughput retrieval operations.\n\n2. **Pre-trained Models**: Flowise AI provides access to pre-trained RAG models that are fine-tuned on diverse datasets. These models are ready to be deployed and can be further customized to specific use cases, reducing the time and effort required for model training.\n\n3. **Efficient Retrieval Mechanisms**: Flowise AI implements advanced retrieval mechanisms such as vector search and fragment-level beam search, which are optimized for speed and accuracy. These mechanisms are designed to 