# LangGraph and LangSmith - Agentic RAG Powered by LangChain

In this notebook we'll complete the following tasks!

## Part 1: LangGraph - Building Cyclic Applications with LangChain

LangGraph is a tool that leverages LangChain Expression Language to build coordinated multi-actor and stateful applications that include cyclic behaviour.

### Why Cycles?

In essence, we can think of a cycle in our graph as a more robust and customizable loop. It allows us to keep our application agent-forward while still giving the powerful functionality of traditional loops.

Because we use cycles rather than loops, we can also compose rather complex flows through our graph in a much more readable and natural fashion, effectively allowing us to recreate application flowcharts in code in an almost 1-to-1 fashion.

### Why LangGraph?

Beyond the agent-forward approach - we can easily compose and combine traditional "DAG" (directed acyclic graph) chains with powerful cyclic behaviour due to the tight integration with LCEL. This means it's a natural extension to LangChain's core offerings!

## Task 1: Dependencies

We'll first install all our required libraries.

In [None]:
!pip install -qU langchain langchain_openai langchain_huggingface langchain-community langgraph arxiv duckduckgo_search==5.3.1b1

In [None]:
!pip install -qU pymupdf qdrant-client langchain_qdrant

## Task 2: Environment Variables

We'll want to set both our OpenAI API key and our LangSmith environment variables.

In [None]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")


In [None]:
from uuid import uuid4

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = f"AIE3 - LangGraph - {uuid4().hex[0:8]}"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangSmith API Key: ")


### Open Source RAG Tool

We'll leverage the previous tools we created to provide an Open Source RAG chain to be used as a tool.

In [None]:
from langchain_community.document_loaders import PyMuPDFLoader

documents = PyMuPDFLoader("https://www.courthousenews.com/wp-content/uploads/2024/02/musk-v-altman-openai-complaint-sf.pdf").load()

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 500,
    chunk_overlap = 50
)

eval_documents = text_splitter.split_documents(documents)
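
To build intuition for `chunk_size` and `chunk_overlap`, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is a simplification and not how `RecursiveCharacterTextSplitter` works internally: the real splitter also tries to break on separators such as paragraph and line boundaries, so its chunks will differ. The name `sliding_chunks` and the sample text are hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghij" * 120  # 1,200 characters of stand-in text
chunks = sliding_chunks(sample)
print([len(c) for c in chunks])  # [500, 500, 300]
```

Each chunk begins with the last 50 characters of the previous one, which helps keep sentences that straddle a boundary retrievable from either side.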

In [None]:
HF_EMBED_URL = "https://eb08ghvx9kufhyjj.us-east-1.aws.endpoints.huggingface.cloud"

In [None]:
os.environ["HF_TOKEN"] = getpass.getpass("Hugging Face Token: ")


In [None]:
from langchain_huggingface.embeddings import HuggingFaceEndpointEmbeddings

embedding_model = HuggingFaceEndpointEmbeddings(
    model=HF_EMBED_URL,
    task="feature-extraction",
    huggingfacehub_api_token=os.environ["HF_TOKEN"],
)

In [None]:
from langchain_community.vectorstores import Qdrant

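# Index the chunks in batches of 32: the first batch creates the in-memory
# collection, and every later batch is added to it.
for i in range(0, len(eval_documents), 32):
  if i == 0:
    vectorstore = Qdrant.from_documents(
        eval_documents[i:i+32],
        embedding_model,
        location=":memory:",
        collection_name="Elon's Complaint")
    continue
  vectorstore.add_documents(eval_documents[i:i+32])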
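
The batching pattern in the cell above can be looked at on its own. The sketch below is plain Python with stand-in strings instead of documents, and `batched` is a hypothetical helper name; it only illustrates how stepping through a list 32 items at a time covers every item exactly once.

```python
from typing import Iterator, Sequence

def batched(items: Sequence, batch_size: int = 32) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

stand_in_chunks = [f"chunk-{i}" for i in range(70)]  # hypothetical data
batch_sizes = [len(batch) for batch in batched(stand_in_chunks)]
print(batch_sizes)  # [32, 32, 6]
```

Note that the range must be computed over the same list that is being sliced, otherwise trailing items are silently skipped.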

In [None]:
retriever = vectorstore.as_retriever()

In [None]:
from langchain_core.prompts import ChatPromptTemplate

RAG_PROMPT = """\
Given a provided context and question, you must answer the question based only on context.

If you cannot answer the question based on the context - you must say "I don't know".

Context: {context}
Question: {question}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_PROMPT)

In [None]:
HF_LLM_URL = "https://meqlao8jlyg1d5w2.us-east-1.aws.endpoints.huggingface.cloud" + "/v1/"

In [None]:
from langchain_openai import ChatOpenAI

hf_llm = ChatOpenAI(
    model="tgi",
    openai_api_base=HF_LLM_URL,
    openai_api_key=os.environ["HF_TOKEN"]
)

In [None]:
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

rag_chain = (
    {"context": itemgetter("question") | retriever, "question": itemgetter("question")}
    | rag_prompt | hf_llm | StrOutputParser()
)

In [None]:
from typing import Annotated, List, Tuple, Union
from langchain_core.tools import tool

@tool
def retrieve_information(
    query: Annotated[str, "query to ask the retrieve information tool"]
    ):
  """Used to answer questions about the Elon Musk complaint against OpenAI."""
  return rag_chain.invoke({"question" : query})

## Task 3: Creating our Tool Belt

As is usually the case, we'll want to equip our agent with a toolbelt to help answer questions and add external knowledge.

There's a tonne of tools in the [LangChain Community Repo](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools) but we'll stick to a couple just so we can observe the cyclic nature of LangGraph in action!

We'll leverage:

- [Duck Duck Go Web Search](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools/ddg_search)
- [Arxiv](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools/arxiv)

In [None]:
from langchain_community.tools.ddg_search import DuckDuckGoSearchRun
from langchain_community.tools.arxiv.tool import ArxivQueryRun

tool_belt = [
    DuckDuckGoSearchRun(),
    ArxivQueryRun(),
    retrieve_information,
]

### Actioning with Tools

Now that we've created our tool belt - we need to create a process that will let us leverage them when we need them.

We'll use the built-in [`ToolExecutor`](https://github.com/langchain-ai/langgraph/blob/fab950acfbf5fea46c9313dca34ee2ae01f1728b/libs/langgraph/langgraph/prebuilt/tool_executor.py#L50) to do so.

In [None]:
from langgraph.prebuilt import ToolExecutor

tool_executor = ToolExecutor(tool_belt)

### Model

Now we can set up our model! We'll use the familiar OpenAI model suite for this example, but OpenAI isn't *required* by LangGraph. LangGraph works with any model, though you might not find success with smaller models, so the recommendation is to stick with:

- OpenAI's GPT-3.5 and GPT-4
- Anthropic's Claude
- Google's Gemini

> NOTE: Because we're leveraging the OpenAI function calling API - we'll need to use OpenAI *for this specific example* (or any other service that exposes an OpenAI-style function calling API).

In [None]:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model_name="gpt-4o",
    temperature=0.1,
    max_tokens=1024
)

Now that we have our model set up, let's "put on the tool belt", which is to say: we'll bind our LangChain formatted tools to the model in an OpenAI function calling format.

In [None]:
from langchain_core.utils.function_calling import convert_to_openai_function

functions = [convert_to_openai_function(t) for t in tool_belt]
model = model.bind_functions(functions)
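
For intuition about what "OpenAI function calling format" means: a function description is a JSON-serializable dictionary with a name, a description, and a JSON Schema for the parameters. The sketch below builds one by hand for a tool with a single string argument. `describe_tool` is a hypothetical helper written for this illustration, and the exact dictionary that `convert_to_openai_function` produces may contain additional fields.

```python
import json

def describe_tool(name: str, description: str, arg_name: str, arg_description: str) -> dict:
    """Hand-built OpenAI-style function description for a one-argument tool."""
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": {
                arg_name: {"type": "string", "description": arg_description},
            },
            "required": [arg_name],
        },
    }

schema = describe_tool(
    name="retrieve_information",
    description="Used to answer questions about the Elon Musk complaint against OpenAI.",
    arg_name="query",
    arg_description="query to ask the retrieve information tool",
)
print(json.dumps(schema, indent=2))
```

The model never runs the tool itself. It only sees descriptions like this one and replies with the name of a function plus a JSON string of arguments.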

## Task 4: Putting the State in Stateful

Earlier we used this phrasing:

`coordinated multi-actor and stateful applications`

So what does that "stateful" mean?

To put it simply - we want to have some kind of object which we can pass around our application that holds information about what the current situation (state) is. Since our system will be constructed of many parts moving in a coordinated fashion - we want to be able to ensure we have some commonly understood idea of that state.

LangGraph leverages a `StateGraph` which uses an `AgentState` object to pass information between the various nodes of the graph.

There are more options than what we'll see below - but this `AgentState` object is a `TypedDict` with the key `messages`, whose value is a sequence of `BaseMessage` objects that is appended to whenever the state changes.

Let's think about a simple example to help understand exactly what this means (we'll simplify a great deal to try and clearly communicate what state is doing):

1. We initialize our state object:
  - `{"messages" : []}`
2. Our user submits a query to our application.
  - New State: `HumanMessage(#1)`
  - `{"messages" : [HumanMessage(#1)]}`
3. We pass our state object to an Agent node which is able to read the current state. It will use the last `HumanMessage` as input. It gets some kind of output which it will add to the state.
  - New State: `AgentMessage(#1, additional_kwargs {"function_call" : "WebSearchTool"})`
  - `{"messages" : [HumanMessage(#1), AgentMessage(#1, ...)]}`
4. We pass our state object to a "conditional node" (more on this later) which reads the last state to determine if we need to use a tool - which it can determine properly because of our provided object!
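
The walkthrough above can be mimicked in a few lines of plain Python. This is a simplified sketch, not LangGraph's implementation: `append_messages` and `apply_update` are hypothetical names, messages are plain tuples instead of LangChain message objects, and the real reducer does more than list concatenation.

```python
def append_messages(current: list, update: list) -> list:
    """Simplified reducer: return a new list with the update appended."""
    return current + update

def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the state using the reducer."""
    return {"messages": append_messages(state["messages"], update["messages"])}

# 1. Initialize the state.
state = {"messages": []}
# 2. The user submits a query.
state = apply_update(state, {"messages": [("human", "What is LangGraph?")]})
# 3. The agent node responds with a request to call a tool.
state = apply_update(state, {"messages": [("ai", {"function_call": "WebSearchTool"})]})

# 4. A conditional step reads the last message to decide what happens next.
last_role, last_content = state["messages"][-1]
needs_tool = isinstance(last_content, dict) and "function_call" in last_content
print(len(state["messages"]), needs_tool)  # 2 True
```

The key point is that nodes return only the new messages, and the reducer is responsible for combining them with what is already in the state.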

In [None]:
from typing import TypedDict, Annotated
from langgraph.graph.message import add_messages
import operator
from langchain_core.messages import BaseMessage

class AgentState(TypedDict):
  messages: Annotated[list, add_messages]

## Task 5: It's Graphing Time!

Now that we have state, and we have tools, and we have an LLM - we can finally start making our graph!

Let's take a second to refresh ourselves about what a graph is in this context.

Graphs, also called networks in some circles, are a collection of connected objects.

The objects in question are typically called nodes, or vertices, and the connections are called edges.

Let's look at a simple graph.

![image](https://i.imgur.com/2NFLnIc.png)

Here, we're using the coloured circles to represent the nodes and the yellow lines to represent the edges. In this case, we're looking at a fully connected graph - where each node is connected by an edge to each other node.

If we were to think about nodes in the context of LangGraph - we would think of a function, or an LCEL runnable.

If we were to think about edges in the context of LangGraph - we might think of them as "paths to take" or "where to pass our state object next".
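
As a rough mental model, nodes can be pictured as functions that take the state and return an update, and edges as a lookup from one node name to the next. The sketch below uses only plain Python; the node names, the `run` helper, and the `log` key are all hypothetical and are not part of LangGraph.

```python
def add_greeting(state: dict) -> dict:
    return {"log": state["log"] + ["greeting"]}

def add_farewell(state: dict) -> dict:
    return {"log": state["log"] + ["farewell"]}

# Nodes are named functions; edges say where the state goes next.
nodes = {"greet": add_greeting, "farewell": add_farewell}
edges = {"greet": "farewell", "farewell": None}  # None marks the end

def run(entry: str, state: dict) -> dict:
    current = entry
    while current is not None:
        state = nodes[current](state)  # the node transforms the state
        current = edges[current]       # the edge picks the next node
    return state

final_state = run("greet", {"log": []})
print(final_state)  # {'log': ['greeting', 'farewell']}
```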

Let's create some nodes and expand on our diagram.

> NOTE: Due to the tight integration with LCEL - we can comfortably create our nodes in an async fashion!

In [None]:
from langgraph.prebuilt import ToolInvocation
import json
from langchain_core.messages import FunctionMessage

def call_model(state):
  messages = state["messages"]
  response = model.invoke(messages)
  print(response)
  return {"messages" : [response]}

def call_tool(state):
  last_message = state["messages"][-1]

  action = ToolInvocation(
      tool=last_message.additional_kwargs["function_call"]["name"],
      tool_input=json.loads(
          last_message.additional_kwargs["function_call"]["arguments"]
      )
  )

  response = tool_executor.invoke(action)

  function_message = FunctionMessage(content=str(response), name=action.tool)

  return {"messages" : [function_message]}

Now we have two total nodes. We have:

- `call_model` is a node that will...well...call the model
- `call_tool` is a node which will call a tool
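
The core of `call_tool` is turning the model's function call (a tool name plus a JSON string of arguments) into an actual Python call. The sketch below isolates that step in plain Python. The `word_count` tool, the `tools` registry, and the sample payload are hypothetical stand-ins, shaped like the `additional_kwargs` shown in this notebook's outputs.

```python
import json

def word_count(text: str) -> int:
    """Hypothetical tool: count whitespace-separated words."""
    return len(text.split())

# Registry mapping tool names to callables (stand-in for a tool executor).
tools = {"word_count": word_count}

# Shape of the payload an OpenAI-style function call carries.
additional_kwargs = {
    "function_call": {
        "name": "word_count",
        "arguments": '{"text": "cycles beat loops"}',  # arguments arrive as a JSON string
    }
}

call = additional_kwargs["function_call"]
arguments = json.loads(call["arguments"])   # decode the JSON string into a dict
result = tools[call["name"]](**arguments)   # dispatch to the named tool
print(result)  # 3
```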

Let's start adding nodes! We'll update our diagram along the way to keep track of what this looks like!


In [None]:
from langgraph.graph import StateGraph, END

workflow = StateGraph(AgentState)

workflow.add_node("agent", call_model)
workflow.add_node("action", call_tool)

Let's look at what we have so far:

![image](https://i.imgur.com/md7inqG.png)

Next, we'll add our entrypoint. All our entrypoint does is indicate which node is called first.

In [None]:
workflow.set_entry_point("agent")

![image](https://i.imgur.com/wNixpJe.png)

Now we want to build a "conditional edge" which will use the output state of a node to determine which path to follow.

We can help conceptualize this by thinking of our conditional edge as a conditional in a flowchart!

Notice how our function simply checks if there is a "function_call" kwarg present.

Then we create an edge where the origin node is our agent node and our destination node is *either* the action node or the END (finish the graph).

It's important to highlight that the dictionary passed in as the third parameter (the mapping) should be created with the possible outputs of our conditional function in mind. In this case `should_continue` outputs either `"continue"` or `"end"`, which are mapped to the action node and to END respectively.
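
The relationship between the conditional function and the mapping can be sketched with an ordinary dictionary lookup. This is plain Python for illustration only: `route`, `path_map`, and `next_node` are hypothetical names, and `END` here is a stand-in string rather than the constant imported from LangGraph.

```python
END = "__end__"  # stand-in sentinel for this sketch, not imported from LangGraph

def route(last_message: dict) -> str:
    """Return a label, not a node name."""
    return "continue" if "function_call" in last_message else "end"

# The mapping translates labels into destinations.
path_map = {"continue": "action", "end": END}

def next_node(last_message: dict) -> str:
    return path_map[route(last_message)]

print(next_node({"function_call": {"name": "arxiv"}}))  # action
print(next_node({"content": "final answer"}))           # __end__
```

In this sketch, a label that is missing from the mapping (for example `"END"` instead of `"end"`) raises a `KeyError`, so the labels a conditional function returns need to match the mapping keys exactly, including case.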

In [None]:
def should_continue(state):
  last_message = state["messages"][-1]

  print(last_message)

  if "function_call" not in last_message.additional_kwargs:
    return "end"

  return "continue"

workflow.add_conditional_edges(
    "agent",
    should_continue,
    {
        "continue" : "action",
        "end" : END
    }
)

Let's visualize what this looks like.

![image](https://i.imgur.com/8ZNwKI5.png)

Finally, we can add our last edge which will connect our action node to our agent node. This is because we *always* want our action node (which is used to call our tools) to return its output to our agent!

In [None]:
workflow.add_edge("action", "agent")

Let's look at the final visualization.

![image](https://i.imgur.com/NWO7usO.png)

All that's left to do now is to compile our workflow - and we're off!

In [None]:
app = workflow.compile()

#### Helper Function to print messages

In [None]:
def print_messages(messages):
  next_is_tool = False
  initial_query = True
  for message in messages["messages"]:
    if "function_call" in message.additional_kwargs:
      print()
      print(f'Tool Call - Name: {message.additional_kwargs["function_call"]["name"]} + Query: {message.additional_kwargs["function_call"]["arguments"]}')
      next_is_tool = True
      continue
    if next_is_tool:
      print(f"Tool Response: {message.content}")
      next_is_tool = False
      continue
    if initial_query:
      print(f"Initial Query: {message.content}")
      print()
      initial_query = False
      continue
    print()
    print(f"Agent Response: {message.content}")

## Using Our Graph

Now that we've created and compiled our graph - we can call it *just as we'd call any other* `Runnable`!

Let's try out a few examples to see how it fares:

In [None]:
from langchain_core.messages import HumanMessage

inputs = {"messages" : [HumanMessage(content="What is the Elon Musk complaint against OpenAI about?")]}

messages = app.invoke(inputs)

print_messages(messages)

content='' additional_kwargs={'function_call': {'arguments': '{"query":"Elon Musk complaint against OpenAI"}', 'name': 'retrieve_information'}} response_metadata={'token_usage': {'completion_tokens': 20, 'prompt_tokens': 193, 'total_tokens': 213}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_d33f7b429e', 'finish_reason': 'function_call', 'logprobs': None} id='run-da556236-0c22-4f06-87d8-8611a366530b-0' usage_metadata={'input_tokens': 193, 'output_tokens': 20, 'total_tokens': 213}

Let's look at what happened:

1. Our state object was populated with our request
2. The state object was passed into our entry point (agent node) and the agent node added an `AIMessage` to the state object and passed it along the conditional edge
3. The conditional edge received the state object, found the "function_call" `additional_kwarg`, and sent the state object to the action node
4. The action node ran the requested tool, added the tool's response to the state object as a `FunctionMessage`, and passed it along the edge to the agent node
5. The agent node added a response to the state object and passed it along the conditional edge
6. The conditional edge received the state object, could not find the "function_call" `additional_kwarg` and passed the state object to END where we see it output in the cell above!
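
The six steps above can be replayed with a toy simulation that needs no API keys. Everything here is a hypothetical stand-in: `fake_agent` pretends to be the model, `fake_action` pretends to be the tool call, and messages are plain dictionaries rather than LangChain message objects.

```python
def fake_agent(state: dict) -> dict:
    # Ask for a tool on the first pass; answer once a tool result is present.
    if any(m["role"] == "function" for m in state["messages"]):
        return {"role": "ai", "content": "final answer"}
    return {"role": "ai", "content": "", "function_call": "search"}

def fake_action(state: dict) -> dict:
    return {"role": "function", "content": "tool result"}

def run_cycle(question: str):
    state = {"messages": [{"role": "human", "content": question}]}
    visited = []
    node = "agent"  # entry point
    while node != "END":
        visited.append(node)
        if node == "agent":
            state["messages"].append(fake_agent(state))
            # Conditional edge: go to the action node only if a tool was requested.
            node = "action" if "function_call" in state["messages"][-1] else "END"
        else:
            state["messages"].append(fake_action(state))
            node = "agent"  # the action node always returns to the agent
    return visited, state

visited, state = run_cycle("What is the complaint about?")
print(visited)                                   # ['agent', 'action', 'agent']
print([m["role"] for m in state["messages"]])    # ['human', 'ai', 'function', 'ai']
```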

Now let's look at an example that shows multiple tool usage - all with the same flow!

In [None]:
inputs = {"messages" : [HumanMessage(content="Did Elon Musk ever beef with Mark Zuckerberg? Was there ever a legal complaint filed by Elon Musk? Use all your tools to answer these questions.")]}

messages = app.invoke(inputs)

print_messages(messages)

content='' additional_kwargs={'function_call': {'arguments': '{"query":"Elon Musk Mark Zuckerberg feud"}', 'name': 'duckduckgo_search'}} response_metadata={'token_usage': {'completion_tokens': 22, 'prompt_tokens': 211, 'total_tokens': 233}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_d33f7b429e', 'finish_reason': 'function_call', 'logprobs': None} id='run-114a71b7-beca-4496-916a-381367450882-0' usage_metadata={'input_tokens': 211, 'output_tokens': 22, 'total_tokens': 233}
content="Elo

## Part 1: LangSmith Evaluator

### Pre-processing for LangSmith

To do a little bit more preprocessing, let's wrap our LangGraph agent in a simple chain.

In [None]:
def convert_inputs(input_object):
  return {"messages" : [HumanMessage(content=input_object["question"])]}

def parse_output(input_state):
  return input_state["messages"][-1].content

agent_chain = convert_inputs | app | parse_output

In [None]:
agent_chain.invoke({"question" : "What is QLoRA?"})

content='QLoRA, or Quantized Low-Rank Adaptation, is a technique used in the field of machine learning, particularly in the context of fine-tuning large language models. The primary goal of QLoRA is to reduce the computational and memory requirements associated with fine-tuning large models, making it more efficient and accessible.\n\nHere are some key points about QLoRA:\n\n1. **Quantization**: QLoRA involves quantizing the weights of the neural network. Quantization is the process of reducing the precision of the weights, typically from 32-bit floating-point numbers to lower precision formats like 8-bit integers. This reduces the memory footprint and computational load.\n\n2. **Low-Rank Adaptation**: In addition to quantization, QLoRA employs low-rank adaptation techniques. This involves approximating the weight matrices of the neural network with low-rank matrices. By doing so, the number of parameters that need to be fine-tuned is significantly reduced.\n\n3. **Efficiency**: By com

'QLoRA, or Quantized Low-Rank Adaptation, is a technique used in the field of machine learning, particularly in the context of fine-tuning large language models. The primary goal of QLoRA is to reduce the computational and memory requirements associated with fine-tuning large models, making it more efficient and accessible.\n\nHere are some key points about QLoRA:\n\n1. **Quantization**: QLoRA involves quantizing the weights of the neural network. Quantization is the process of reducing the precision of the weights, typically from 32-bit floating-point numbers to lower precision formats like 8-bit integers. This reduces the memory footprint and computational load.\n\n2. **Low-Rank Adaptation**: In addition to quantization, QLoRA employs low-rank adaptation techniques. This involves approximating the weight matrices of the neural network with low-rank matrices. By doing so, the number of parameters that need to be fine-tuned is significantly reduced.\n\n3. **Efficiency**: By combining q

### Task 1: Creating An Evaluation Dataset

Just as we saw last week, we'll want to create a dataset to test our Agent's ability to answer questions.

In order to do this - we'll want to provide some questions and some answers. Let's look at how we can create such a dataset below.

#### 🏗️ Activity #3:

Please create a dataset in the above format with at least 5 questions.

In [None]:
questions = [
    "What optimizer is used in QLoRA?",
    "What data type was created in the QLoRA paper?",
    "What is a Retrieval Augmented Generation system?",
    "Who authored the QLoRA paper?",
    "What is the most popular deep learning framework?",
    "What significant improvements does the LoRA system make?"
]

answers = [
    {"must_mention" : ["paged", "optimizer"]},
    {"must_mention" : ["NF4", "NormalFloat"]},
    {"must_mention" : ["ground", "context"]},
    {"must_mention" : ["Tim", "Dettmers"]},
    {"must_mention" : ["PyTorch", "TensorFlow"]},
    {"must_mention" : ["reduce", "parameters"]},
]

Now we can add our dataset to our LangSmith project using the following code which we saw last Thursday!

In [None]:
from langsmith import Client

client = Client()
dataset_name = f"Retrieval Augmented Generation - Evaluation Dataset - {uuid4().hex[0:8]}"

dataset = client.create_dataset(
    dataset_name=dataset_name,
    description="Questions about the QLoRA Paper to Evaluate RAG over the same paper."
)

client.create_examples(
    inputs=[{"question" : q} for q in questions],
    outputs=answers,
    dataset_id=dataset.id,
)

### Task 2: Adding Evaluators

Now we can add a custom evaluator to see if our responses contain the expected information.

We'll be using a fairly naive exact-match process to determine if our response contains specific strings.
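
To see why this check is "naive", the matching logic can be tried in isolation. The sketch below is plain Python with a made-up prediction string: `mentions_all` mirrors a substring check over all required phrases, and `mentions_all_ignore_case` is a hypothetical variant shown only for comparison.

```python
def mentions_all(prediction: str, required: list[str]) -> bool:
    """True only if every required phrase appears verbatim (case-sensitive)."""
    return all(phrase in prediction for phrase in required)

def mentions_all_ignore_case(prediction: str, required: list[str]) -> bool:
    """Hypothetical variant that ignores letter case."""
    lowered = prediction.lower()
    return all(phrase.lower() in lowered for phrase in required)

prediction = "QLoRA was introduced by tim dettmers and colleagues."  # made-up output
required = ["Tim", "Dettmers"]

print(mentions_all(prediction, required))              # False
print(mentions_all_ignore_case(prediction, required))  # True
```

Two consequences are worth keeping in mind when writing `must_mention` lists: matching is case-sensitive, and every phrase in the list must appear for the example to pass.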

In [None]:
from langsmith.evaluation import EvaluationResult, run_evaluator

@run_evaluator
def must_mention(run, example) -> EvaluationResult:
    prediction = run.outputs.get("output") or ""
    required = example.outputs.get("must_mention") or []
    score = all(phrase in prediction for phrase in required)
    return EvaluationResult(key="must_mention", score=score)

Now that we have created our custom evaluator - let's initialize our `RunEvalConfig` with it, and a few others:

- `"criteria"` includes the default criteria which, in this case, means "helpfulness"
- `"cot_qa"` judges whether the response is correct, using a Chain of Thought prompt and the provided context.

In [None]:
from langchain.smith import RunEvalConfig, run_on_dataset

eval_config = RunEvalConfig(
    custom_evaluators=[must_mention],
    evaluators=[
        "criteria",
        "cot_qa",
    ],
)

### Task 3: Evaluating

All that is left to do is evaluate our agent's response!

In [None]:
client.run_on_dataset(
    dataset_name=dataset_name,
    llm_or_chain_factory=agent_chain,
    evaluation=eval_config,
    verbose=True,
    project_name=f"RAG Pipeline - Evaluation - {uuid4().hex[0:8]}",
    project_metadata={"version": "1.0.0"},
)

View the evaluation results for project 'RAG Pipeline - Evaluation - 34472ee0' at:
https://smith.langchain.com/o/340cd80b-3296-5752-9a9e-58582118073a/datasets/3f5732ac-645c-4180-b948-1429ee5f2878/compare?selectedSessions=b28f8953-8bd1-4c02-9145-ebfe310023c1

View all tests for Dataset Retrieval Augmented Generation - Evaluation Dataset - cd4b1eb9 at:
https://smith.langchain.com/o/340cd80b-3296-5752-9a9e-58582118073a/datasets/3f5732ac-645c-4180-b948-1429ee5f2878
[>                                                 ] 0/6content='' additional_kwargs={'function_call': {'arguments': '{"query":"QLoRA data type"}', 'name': 'arxiv'}} response_metadata={'token_usage': {'completion_tokens': 18, 'prompt_tokens': 194, 'total_tokens': 212}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_d33f7b429e', 'finish_reason': 'function_call', 'logprobs': None} id='run-1d3c08b0-502d-43d5-b77d-03151e53f656-0' usage_metadata={'input_tokens': 194, 'output_tokens': 18, 'total_tokens': 212}
content='' 

{'project_name': 'RAG Pipeline - Evaluation - 34472ee0',
 'results': {'6ea33375-e863-473f-bc37-ff1d476f8967': {'input': {'question': 'What optimizer is used in QLoRA?'},
   'feedback': [EvaluationResult(key='helpfulness', score=1, value='Y', comment='The criterion for this task is "helpfulness". \n\n1. The submission provides a direct answer to the question, stating that the AdamW optimizer is used in QLoRA. This is helpful as it directly addresses the query.\n\n2. The submission goes beyond just naming the optimizer, it also provides additional information about the AdamW optimizer, explaining that it is a variant of the Adam optimizer and includes weight decay for regularization. This is insightful as it provides more context about the optimizer.\n\n3. The submission also explains why the AdamW optimizer is used in QLoRA, stating that it is well-suited for training large language models and is commonly used in fine-tuning tasks, including those involving quantized models like QLoRA. 

## Part 2: LangGraph with Helpfulness

### Task 3: Adding Helpfulness Check and "Loop" Limits

Now that we've done evaluation - let's see if we can add an extra step where we review the content we've generated to confirm if it fully answers the user's query!

We're going to make a few key adjustments to account for this:

1. We're going to add an artificial limit on how many "loops" the agent can go through - this will help us to avoid the potential situation where we never exit the loop.
2. We'll add a custom node and conditional edge to determine if the response was helpful enough.

First, let's define our state again - we can check the length of the state object, so we don't need additional state for this.

In [None]:
class AgentState(TypedDict):
  messages: Annotated[list, add_messages]

We're going to add a custom helpfulness check here!

In [None]:
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

def check_helpfulness(state):
  initial_query = state["messages"][0]
  final_response = state["messages"][-1]

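  # Loop limit: stop once the conversation exceeds 10 messages.
  # The returned label must match a key in the conditional-edge mapping.
  if len(state["messages"]) > 10:
    return "end"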

  prompt_template = """\
  Given an initial query and a final response, determine if the final response is extremely helpful or not. Please indicate helpfulness with a 'Y' and unhelpfulness as an 'N'.

  Initial Query:
  {initial_query}

  Final Response:
  {final_response}"""

  prompt_template = PromptTemplate.from_template(prompt_template)

  helpfulness_check_model = ChatOpenAI(model="gpt-4")

  helpfulness_chain = prompt_template | helpfulness_check_model | StrOutputParser()

  helpfulness_response = helpfulness_chain.invoke({"initial_query" : initial_query.content, "final_response" : final_response.content})

  if "Y" in helpfulness_response:
    print("Helpful!")
    return "end"
  else:
    print("Not helpful!")
    return "continue"

def dummy_node(state):
  return
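
The control flow of the helpfulness check can be separated from the LLM call to make the loop limit easier to reason about. In this sketch the LLM verdict is replaced by a boolean argument; `decide` and `MAX_MESSAGES` are hypothetical names, and the returned labels are chosen to match the keys `"continue"` and `"end"` used in the conditional-edge mappings.

```python
MAX_MESSAGES = 10  # same threshold as the length check in the notebook

def decide(messages: list, is_helpful: bool) -> str:
    """Return the routing label for the helpfulness conditional edge."""
    if len(messages) > MAX_MESSAGES:
        return "end"        # loop limit reached: stop regardless of the verdict
    return "end" if is_helpful else "continue"

print(decide(["m"] * 3, is_helpful=False))   # continue
print(decide(["m"] * 3, is_helpful=True))    # end
print(decide(["m"] * 11, is_helpful=False))  # end
```

Because every pass through the agent adds at least one message, the message count acts as a simple proxy for the number of loops, which is why no extra counter is needed in the state.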

Now we can set our graph up! This process will be almost entirely the same - with the inclusion of one additional node/conditional edge!

#### 🏗️ Activity #5:

Please write markdown for the following cells to explain what each is doing.

##### YOUR MARKDOWN HERE

In [None]:
graph_with_helpfulness_check = StateGraph(AgentState)

graph_with_helpfulness_check.add_node("agent", call_model)
graph_with_helpfulness_check.add_node("action", call_tool)
graph_with_helpfulness_check.add_node("passthrough", dummy_node)

##### YOUR MARKDOWN HERE

In [None]:
graph_with_helpfulness_check.set_entry_point("agent")

##### YOUR MARKDOWN HERE

In [None]:
graph_with_helpfulness_check.add_conditional_edges(
    "agent",
    should_continue,
    {
        "continue" : "action",
        "end" : "passthrough"
    }
)

graph_with_helpfulness_check.add_conditional_edges(
    "passthrough",
    check_helpfulness,
    {
        "continue" : "agent",
        "end" : END
    }
)

##### YOUR MARKDOWN HERE

In [None]:
graph_with_helpfulness_check.add_edge("action", "agent")

##### YOUR MARKDOWN HERE

In [None]:
agent_with_helpfulness_check = graph_with_helpfulness_check.compile()

##### YOUR MARKDOWN HERE

In [None]:
inputs = {"messages" : [HumanMessage(content="What state did Elon Musk make a complaint against OpenAI? And did he really claim that OpenAI had achieved AGI?")]}

messages = agent_with_helpfulness_check.invoke(inputs)

print_messages(messages)

content='' additional_kwargs={'function_call': {'arguments': '{"query":"Elon Musk complaint against OpenAI state and AGI claim"}', 'name': 'retrieve_information'}} response_metadata={'token_usage': {'completion_tokens': 25, 'prompt_tokens': 207, 'total_tokens': 232}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_298125635f', 'finish_reason': 'function_call', 'logprobs': None} id='run-70b28434-3d03-41fd-9b8d-4852f4e5ae2d-0' usage_metadata={'input_tokens': 207, 'output_tokens': 25, 'total_tokens': 232}

### Task 4: LangGraph for the "Patterns" of GenAI

Let's ask our system about the 4 patterns of Generative AI:

1. Prompt Engineering
2. RAG
3. Fine-tuning
4. Agents

In [None]:
patterns = ["prompt engineering", "RAG", "fine-tuning", "LLM-based agents"]

In [None]:
for pattern in patterns:
  what_is_string = f"What is {pattern} and when did it break onto the scene??"
  inputs = {"messages" : [HumanMessage(content=what_is_string)]}
  messages = agent_with_helpfulness_check.invoke(inputs)
  print_messages(messages)
  print("\n\n")

content="Prompt engineering is a technique used primarily in the field of artificial intelligence (AI) and natural language processing (NLP) to design and refine prompts that elicit desired responses from language models. The goal is to craft inputs (prompts) that guide the model to generate useful, relevant, and accurate outputs. This involves understanding the model's behavior and iteratively adjusting the prompts to improve performance on specific tasks.\n\nPrompt engineering became more prominent with the advent of large-scale language models like OpenAI's GPT-3, which was released in June 2020. These models demonstrated significant capabilities in generating human-like text, answering questions, and performing various language tasks, but their performance heavily depended on how they were prompted. As a result, the practice of prompt engineering emerged as a crucial skill for leveraging the full potential of these models.\n\nThe concept of prompt engineering, however, has roots in