<a href="https://colab.research.google.com/github/anjelammcgraw/LangChain-Agentic-RAG-with-LangGraph-LangSmith/blob/main/13_Agentic_RAG_powered_by_LangChain_with_LangGraph_and_LangSmith.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LangGraph and LangSmith - Agentic RAG Powered by LangChain


## LangGraph - Building Cyclic Applications with LangChain


## Dependencies


In [None]:
!pip install -qU langchain langchain_openai langgraph arxiv duckduckgo-search


## Environment Variables


In [None]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

OpenAI API Key:··········


In [None]:
from uuid import uuid4

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = f"AIE1 - LangGraph - {uuid4().hex[0:8]}"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangSmith API Key: ")

LangSmith API Key: ··········


## Creating our Tool Belt



In [None]:
from langchain_community.tools.ddg_search import DuckDuckGoSearchRun
from langchain_community.tools.arxiv.tool import ArxivQueryRun

tool_belt = [
    ArxivQueryRun(),
    DuckDuckGoSearchRun()
]

### Actioning with Tools


In [None]:
from langgraph.prebuilt import ToolExecutor

tool_executor = ToolExecutor(tool_belt)

### Model


In [None]:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(temperature=0)

In [None]:
from langchain_core.utils.function_calling import convert_to_openai_function

functions = [convert_to_openai_function(t) for t in tool_belt]
model = model.bind_functions(functions)

#### ❓ Question #1:

How does the model determine which tool to use?

#### ✅ Answer:

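The model never sees the tools themselves, only the function schemas we attached with `bind_functions`: each tool's name, description, and parameter schema. Given the user's input and those descriptions, it infers the intent of the request and either emits a `function_call` naming the best-matching tool with JSON arguments, or answers directly when no tool seems necessary.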
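To make this concrete, here is a minimal sketch of what the model has to choose from. The schemas below are hand-written in the OpenAI function-calling format and are assumptions for illustration; the tool names match the ones seen in this notebook's outputs, but the description text is invented and is not the exact output of `convert_to_openai_function`.

```python
# Hypothetical, hand-written schemas in the OpenAI function-calling format.
# In the notebook, the real schemas come from convert_to_openai_function(tool);
# the description strings here are made up for illustration.
example_functions = [
    {
        "name": "arxiv",
        "description": "Search arXiv for scientific papers matching a query.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "duckduckgo_search",
        "description": "Search the web for current, general-purpose information.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
]


def describe_tools(functions):
    """Return the text the model effectively chooses from: name plus description."""
    return [f"{f['name']}: {f['description']}" for f in functions]


for line in describe_tools(example_functions):
    print(line)
```

Because the choice rests on these names and descriptions, a vague or misleading tool description is a likely cause when the model picks the wrong tool.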

## Putting the State in Stateful


1. We initialize our state object:
  - `{"messages" : []}`
2. User submits a query to our application.
  - New State: `HumanMessage(#1)`
  - `{"messages" : [HumanMessage(#1)]}`
3. We pass our state object to an Agent node which is able to read the current state. It will use the last `HumanMessage` as input. It gets some kind of output which it will add to the state.
  - New State: `AgentMessage(#1, additional_kwargs {"function_call" : "WebSearchTool"})`
  - `{"messages" : [HumanMessage(#1), AgentMessage(#1, ...)]}`
4. We pass our state object to a "conditional node" (more on this later) which reads the latest message in the state to decide whether we need to use a tool. It can decide reliably because the agent's message carries the `function_call` in its `additional_kwargs`.

In [None]:
from typing import TypedDict, Annotated, Sequence
import operator
from langchain_core.messages import BaseMessage

class AgentState(TypedDict):
  messages: Annotated[Sequence[BaseMessage], operator.add]
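The `operator.add` annotation is what makes each node's return value get appended to `messages` rather than overwrite it. The sketch below imitates that behaviour in plain Python so it can be run without LangGraph; `SketchState` and `apply_update` are hypothetical names, strings stand in for message objects, and this is not LangGraph's actual implementation.

```python
import operator
from typing import Annotated, Sequence, TypedDict, get_type_hints


class SketchState(TypedDict):
    # Plain strings stand in for BaseMessage objects in this sketch.
    messages: Annotated[Sequence[str], operator.add]


def apply_update(state, update):
    """Merge a node's partial update into the state using each key's reducer."""
    hints = get_type_hints(SketchState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # The reducer is the first piece of metadata in the Annotated type.
        reducer = hints[key].__metadata__[0]
        merged[key] = reducer(state[key], value)
    return merged


state = {"messages": ["human: What is RAG?"]}
state = apply_update(state, {"messages": ["ai: <function_call arxiv>"]})
state = apply_update(state, {"messages": ["function: <arxiv results>"]})
print(state["messages"])
```

This is why `call_model` and `call_tool` each return a one-element list: the reducer concatenates it onto the history that is already in the state.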

## It's Graphing Time!

![image](https://i.imgur.com/2NFLnIc.png)


In [None]:
from langgraph.prebuilt import ToolInvocation
import json
from langchain_core.messages import FunctionMessage

def call_model(state):
  messages = state["messages"]
  response = model.invoke(messages)
  return {"messages" : [response]}

def call_tool(state):
  last_message = state["messages"][-1]

  action = ToolInvocation(
      tool=last_message.additional_kwargs["function_call"]["name"],
      tool_input=json.loads(
          last_message.additional_kwargs["function_call"]["arguments"]
      )
  )

  response = tool_executor.invoke(action)

  function_message = FunctionMessage(content=str(response), name=action.tool)

  return {"messages" : [function_message]}

In [None]:
from langgraph.graph import StateGraph, END

workflow = StateGraph(AgentState)

workflow.add_node("agent", call_model)
workflow.add_node("action", call_tool)


![image](https://i.imgur.com/md7inqG.png)

In [None]:
workflow.set_entry_point("agent")

![image](https://i.imgur.com/wNixpJe.png)

In [None]:
def should_continue(state):
  last_message = state["messages"][-1]

  if "function_call" not in last_message.additional_kwargs:
    return "end"

  return "continue"

workflow.add_conditional_edges(
    "agent",
    should_continue,
    {
        "continue" : "action",
        "end" : END
    }
)



![image](https://i.imgur.com/8ZNwKI5.png)

In [None]:
workflow.add_edge("action", "agent")


![image](https://i.imgur.com/NWO7usO.png)

In [None]:
app = workflow.compile()

#### ❓ Question #2:

Is there any specific limit to how many times we can cycle?

If not, how could we impose a limit to the number of cycles?

#### ✅ Answer:

Our graph defines no limit of its own: the agent and action nodes keep alternating for as long as the model keeps returning a `function_call`. LangGraph does guard against runaway loops with a recursion limit, which can be set through the `recursion_limit` key of the config passed to `invoke`; a run that exceeds it raises an error rather than cycling forever. We can also impose a limit ourselves: track the number of cycles in the state and have the conditional edge return `"end"` once a maximum is reached, read that maximum from an environment variable or configuration file, enforce a maximum wall-clock time, or let the user set the limit at the start of the run.
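One way to impose a limit inside the graph itself is to have the conditional edge count how many function calls the agent has already requested. The sketch below is a variant of `should_continue` under stated assumptions: `MAX_TOOL_CALLS`, `StubMessage`, and `should_continue_bounded` are hypothetical names, and the stub models only the `additional_kwargs` attribute that the check reads.

```python
from dataclasses import dataclass, field

MAX_TOOL_CALLS = 3  # hypothetical limit, chosen only for illustration


@dataclass
class StubMessage:
    """Minimal stand-in for a LangChain message; only additional_kwargs is modeled."""
    content: str = ""
    additional_kwargs: dict = field(default_factory=dict)


def should_continue_bounded(state, max_tool_calls=MAX_TOOL_CALLS):
    messages = state["messages"]
    last_message = messages[-1]

    # No function call requested: the agent has produced its final answer.
    if "function_call" not in last_message.additional_kwargs:
        return "end"

    # Count every function call requested so far in this run, including this one.
    tool_calls = sum("function_call" in m.additional_kwargs for m in messages)
    if tool_calls > max_tool_calls:
        return "end"

    return "continue"
```

Note that stopping this way leaves an unanswered function call as the last message, so a real application would probably want to append a fallback reply before ending.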

In [None]:
from langchain_core.messages import HumanMessage

inputs = {"messages" : [HumanMessage(content="What is RAG in the context of Large Language Models? When did it break onto the scene?")]}

app.invoke(inputs)

{'messages': [HumanMessage(content='What is RAG in the context of Large Language Models? When did it break onto the scene?'),
  AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{"query":"RAG in the context of Large Language Models"}', 'name': 'duckduckgo_search'}}, response_metadata={'token_usage': {'completion_tokens': 25, 'prompt_tokens': 171, 'total_tokens': 196}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_b28b39ffa8', 'finish_reason': 'function_call', 'logprobs': None}),
  FunctionMessage(content='Large language models (LLMs) are incredibly powerful tools for processing and generating text. However, they inherently struggle to understand the broader context of information, especially when dealing with lengthy conversations or complex tasks. This is where large context windows and Retrieval-Augmented Generation (RAG) come into play. Retrieval Augmented Generation. Retrieval Augmented Generation (RAG) is based on research produced by the Meta tea



1. Our state object was populated with our request
2. The state object was passed into our entry point (agent node) and the agent node added an `AIMessage` to the state object and passed it along the conditional edge
3. The conditional edge received the state object, found the "function_call" `additional_kwarg`, and sent the state object to the action node
4. The action node ran the requested tool, added the tool's response to the state object as a `FunctionMessage`, and passed it along the edge to the agent node
5. The agent node added a response to the state object and passed it along the conditional edge
6. The conditional edge received the state object, could not find the "function_call" `additional_kwarg` and passed the state object to END where we see it output in the cell above!


In [None]:
inputs = {"messages" : [HumanMessage(content="What is QLoRA in Machine Learning? Are their any papers that could help me understand? Once you have that information, can you look up the bio of the first author on the QLoRA paper?")]}

app.invoke(inputs)

{'messages': [HumanMessage(content='What is QLoRA in Machine Learning? Are their any papers that could help me understand? Once you have that information, can you look up the bio of the first author on the QLoRA paper?'),
  AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{"query":"QLoRA in Machine Learning"}', 'name': 'arxiv'}}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 193, 'total_tokens': 212}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_b28b39ffa8', 'finish_reason': 'function_call', 'logprobs': None}),
  FunctionMessage(content="Published: 2023-05-23\nTitle: QLoRA: Efficient Finetuning of Quantized LLMs\nAuthors: Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer\nSummary: We present QLoRA, an efficient finetuning approach that reduces memory usage\nenough to finetune a 65B parameter model on a single 48GB GPU while preserving\nfull 16-bit finetuning task performance. QLoRA backpropagates gradients 

#### 🏗️ Activity #2:

Please write out the steps the agent took to arrive at the correct answer.

#### ✅ Answer:

Here's a breakdown of the steps:

1. Understood the user's request
2. Issued an initial query for QLoRA information
3. Fetched information from arXiv
4. Identified the first author
5. Conducted a secondary query for author information
6. Retrieved the author's background and contributions
7. Compiled and presented the information
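A quick way to check a breakdown like this against an actual run is to condense the message list into one line per step. The sketch below uses plain dicts as stand-ins for the message objects; `summarize_trace` and `example_trace` are hypothetical, and the trace is only shaped like the run above (the truncated output confirms the first `arxiv` call, while the second call is an assumption).

```python
def summarize_trace(messages):
    """Return one short line per message: its role and, for agent turns, the tool requested."""
    lines = []
    for message in messages:
        call = message.get("function_call")
        if call:
            lines.append(f"{message['role']} -> calls {call['name']}")
        else:
            lines.append(message["role"])
    return lines


# Hypothetical trace: dicts stand in for HumanMessage / AIMessage / FunctionMessage.
example_trace = [
    {"role": "human"},
    {"role": "ai", "function_call": {"name": "arxiv"}},
    {"role": "function"},
    {"role": "ai", "function_call": {"name": "duckduckgo_search"}},  # assumed second call
    {"role": "function"},
    {"role": "ai"},
]

for line in summarize_trace(example_trace):
    print(line)
```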

### Pre-processing for LangSmith

In [None]:
def convert_inputs(input_object):
  return {"messages" : [HumanMessage(content=input_object["question"])]}

def parse_output(input_state):
  return input_state["messages"][-1].content

agent_chain = convert_inputs | app | parse_output

In [None]:
agent_chain.invoke({"question" : "What is RAG?"})

"RAG stands for Retrieval-Augmented Generation. It is an AI framework for improving the quality of responses generated by Large Language Models (LLMs) by grounding the model on external sources of knowledge to supplement the model's internal representation of information. RAG works with pretrained LLMs and external data to generate more accurate and reliable responses. It helps in checking claims, clearing up ambiguity, and reducing hallucination in queries. RAG is continuously evolving and is categorized into three stages: Naive RAG, Advanced RAG, and Modular RAG."

## Creating An Evaluation Dataset


In [None]:
questions = [
    "What optimizer is used in QLoRA?",
    "What data type was created in the QLoRA paper?",
    "What is a Retrieval Augmented Generation system?",
    "Who authored the QLoRA paper?",
    "What is the most popular deep learning framework?",
    "What significant improvements does the LoRA system make?"
]

answers = [
    {"must_mention" : ["paged", "optimizer"]},
    {"must_mention" : ["NF4", "NormalFloat"]},
    {"must_mention" : ["ground", "context"]},
    {"must_mention" : ["Tim", "Dettmers"]},
    {"must_mention" : ["PyTorch", "TensorFlow"]},
    {"must_mention" : ["reduce", "parameters"]},
]

In [None]:
from langsmith import Client

client = Client()
dataset_name = f"Retrieval Augmented Generation - Evaluation Dataset - {uuid4().hex[0:8]}"

dataset = client.create_dataset(
    dataset_name=dataset_name,
    description="Questions about the QLoRA Paper to Evaluate RAG over the same paper."
)

client.create_examples(
    inputs=[{"question" : q} for q in questions],
    outputs=answers,
    dataset_id=dataset.id,
)

#### ❓ Question #3:

How are the correct answers associated with the questions?

> NOTE: Feel free to indicate if this is problematic or not

#### ✅ Answer:

The questions and answers are associated purely by list position: `create_examples` pairs `inputs[i]` with `outputs[i]`, so the first question gets the first `must_mention` entry, and so on. This is straightforward and efficient, but it assumes the two lists have the same length and the same order. Inserting, deleting, or reordering an item in one list without touching the other silently mismatches every pair after it. Where accuracy and data integrity matter, additional validation and a more robust structure that keeps each question together with its answer should be considered.
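A minimal sketch of one such safeguard: build the pairs explicitly and fail loudly if the lists drift apart. `pair_examples` is a hypothetical helper, not part of the LangSmith client; its output could then be split back into the `inputs` and `outputs` lists that `create_examples` expects.

```python
def pair_examples(questions, answers):
    """Pair each question with its answer, refusing mismatched list lengths."""
    if len(questions) != len(answers):
        raise ValueError(
            f"{len(questions)} questions but {len(answers)} answers; lists must align"
        )
    return [
        {"inputs": {"question": q}, "outputs": a}
        for q, a in zip(questions, answers)
    ]


sample_questions = [
    "What optimizer is used in QLoRA?",
    "Who authored the QLoRA paper?",
]
sample_answers = [
    {"must_mention": ["paged", "optimizer"]},
    {"must_mention": ["Tim", "Dettmers"]},
]

pairs = pair_examples(sample_questions, sample_answers)
for pair in pairs:
    print(pair["inputs"]["question"], "->", pair["outputs"]["must_mention"])
```

A length check cannot catch a reordering, so keeping each question and its answer in a single record from the start is the more robust design.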

## Adding Evaluators


In [None]:
from langsmith.evaluation import EvaluationResult, run_evaluator

@run_evaluator
def must_mention(run, example) -> EvaluationResult:
    prediction = run.outputs.get("output") or ""
    required = example.outputs.get("must_mention") or []
    score = all(phrase in prediction for phrase in required)
    return EvaluationResult(key="must_mention", score=score)

#### ❓ Question #4:

What are some ways you could improve this metric as-is?

> NOTE: Alternatively you can suggest where gaps exist in this method.

#### ✅ Answer:

The metric checks only that every required phrase appears as an exact, case-sensitive substring of the prediction, so "pytorch" would not count as a mention of "PyTorch", and a correct answer that paraphrases a required phrase fails outright. Normalizing case, or using fuzzy matching where typos or slight variations in wording are expected, would make the evaluation more flexible. The check also ignores the context in which a phrase is mentioned: a prediction that names a phrase only to deny it still passes. Finally, the score is all-or-nothing; a more granular score, such as the proportion of required phrases found in the prediction, would give more detailed feedback on the model's performance.
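Two of these improvements, case-insensitive matching and a proportional score, can be sketched in a few lines. `must_mention_fraction` is a hypothetical plain function operating on strings; wiring it into a LangSmith evaluator would follow the same pattern as `must_mention` above.

```python
def must_mention_fraction(prediction, required):
    """Fraction of required phrases found in the prediction, ignoring case."""
    if not required:
        # Nothing is required, so the prediction trivially satisfies the check.
        return 1.0
    text = prediction.lower()
    hits = sum(phrase.lower() in text for phrase in required)
    return hits / len(required)


print(must_mention_fraction("QLoRA uses paged optimizers.", ["paged", "optimizer"]))
print(must_mention_fraction("pytorch is widely used.", ["PyTorch", "TensorFlow"]))
```

Returning 1.0 for an empty requirement list mirrors the original metric, where `all([])` evaluates to `True`.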

In [None]:
from langchain.smith import RunEvalConfig, run_on_dataset

eval_config = RunEvalConfig(
    custom_evaluators=[must_mention],
    evaluators=[
        "criteria",
        "cot_qa",
    ],
)

In [None]:
client.run_on_dataset(
    dataset_name=dataset_name,
    llm_or_chain_factory=agent_chain,
    evaluation=eval_config,
    verbose=True,
    project_name=f"RAG Pipeline - Evaluation - {uuid4().hex[0:8]}",
    project_metadata={"version": "1.0.0"},
)

View the evaluation results for project 'RAG Pipeline - Evaluation - c6779e19' at:
https://smith.langchain.com/o/151fde86-5fb9-523a-b8a7-240c9ecb04b7/datasets/5c6566e7-4375-4d6f-b051-59bc3f5665f8/compare?selectedSessions=44120d62-de11-4620-b350-39668d73156e

View all tests for Dataset Retrieval Augmented Generation - Evaluation Dataset - b621fd0a at:
https://smith.langchain.com/o/151fde86-5fb9-523a-b8a7-240c9ecb04b7/datasets/5c6566e7-4375-4d6f-b051-59bc3f5665f8
[------------------------------------------------->] 6/6
 Experiment Results:
        feedback.helpfulness  feedback.COT Contextual Accuracy feedback.must_mention error  execution_time                                run_id
count                   6.00                              5.00                     6     0            6.00                                     6
unique                   NaN                               NaN                     2     0             NaN                                     6
top                  

{'project_name': 'RAG Pipeline - Evaluation - c6779e19',
 'results': {'af6f0887-27d0-4e48-ab77-960fe960f5e9': {'input': {'question': 'What optimizer is used in QLoRA?'},
   'feedback': [EvaluationResult(key='helpfulness', score=1, value='Y', comment='The criterion for this task is "helpfulness". \n\nThe submission provides information about QLoRA, including the fact that it uses 4-bit quantization to compress a pretrained language model and that it uses the Transformers library for handling pre-trained language models and fine-tuning. \n\nHowever, the question asked specifically about the optimizer used in QLoRA. The submission does not provide a direct answer to this question. Instead, it states that the optimizer is not explicitly mentioned in the search results. \n\nWhile the additional information provided might be useful in a broader context, it does not directly answer the question asked. Therefore, it may not be considered helpful in this specific context.\n\nY', correction=None