# LangGraph and LangSmith - Agentic RAG Powered by LangChain

In this notebook we'll complete the following tasks:

- 🤝 Breakout Room #1:
  1. Install required libraries
  2. Set Environment Variables
  3. Creating our Tool Belt
  4. Creating Our State
  5. Creating and Compiling A Graph!

- 🤝 Breakout Room #2:
  1. Evaluating the LangGraph Application with LangSmith
  2. Adding Helpfulness Check and "Loop" Limits
  3. LangGraph for the "Patterns" of GenAI

# 🤝 Breakout Room #1

## Part 1: LangGraph - Building Cyclic Applications with LangChain

LangGraph is a tool that leverages LangChain Expression Language to build coordinated multi-actor and stateful applications that include cyclic behaviour.

### Why Cycles?

In essence, we can think of a cycle in our graph as a more robust and customizable loop. It allows us to keep our application agent-forward while still giving the powerful functionality of traditional loops.

Because we use cycles rather than plain loops, we can also compose rather complex flows through our graph in a much more readable and natural fashion, effectively allowing us to recreate application flowcharts in code in an almost 1-to-1 fashion.

### Why LangGraph?

Beyond the agent-forward approach - we can easily compose and combine traditional "DAG" (directed acyclic graph) chains with powerful cyclic behaviour due to the tight integration with LCEL. This means it's a natural extension to LangChain's core offerings!

## Task 1:  Dependencies

We'll first install all our required libraries.

In [1]:
!poetry install

Installing dependencies from lock file

No dependencies to install or update


## Task 2: Environment Variables

We'll want to set both our OpenAI API key and our LangSmith environment variables.

In [2]:
import os
import getpass

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key: ")

In [3]:
from uuid import uuid4

os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGCHAIN_PROJECT"] = f"AIE4 - LangGraph - {uuid4().hex[0:8]}"
os.environ["LANGCHAIN_PROJECT"] = f"LangGraph - Intro"
if "LANGCHAIN_API_KEY" not in os.environ:
    os.environ["LANGCHAIN_API_KEY"] = getpass.getpass('Enter your LangSmith API key: ')

## Task 3: Creating our Tool Belt

As is usually the case, we'll want to equip our agent with a toolbelt to help answer questions and add external knowledge.

There's a tonne of tools in the [LangChain Community Repo](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools) but we'll stick to a couple just so we can observe the cyclic nature of LangGraph in action!

We'll leverage:

- [Duck Duck Go Web Search](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools/ddg_search)
- [Arxiv](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools/arxiv)

#### 🏗️ Activity #1:

Please add the tools to use into our toolbelt.

> NOTE: Each tool in our toolbelt should be a method.

In [4]:
from langchain_community.tools.ddg_search import DuckDuckGoSearchRun
from langchain_community.tools.arxiv.tool import ArxivQueryRun
from langchain_core.tools import BaseTool
from typing import List
tool_belt:List[BaseTool] = [
    DuckDuckGoSearchRun(),
    ArxivQueryRun(),
]

Alternatively, you can use the `load_tools` helper function:

In [5]:
from langchain_core.tools import BaseTool
from typing import Sequence,Union,Callable
from langchain.agents import load_tools 
# NOTE: load_tools returns `List[BaseTool]` while `ToolNode` expects
# `Sequence[Union[BaseTool, Callable]]`
tool_belt:Sequence[Union[BaseTool, Callable]] = load_tools([
    "arxiv",
    "ddg-search"
    ],
)

### Model

Now we can set-up our model! We'll leverage the familiar OpenAI model suite for this example - but it's not *necessary* to use with LangGraph. LangGraph supports all models - though you might not find success with smaller models - as such, they recommend you stick with:

- OpenAI's GPT-3.5 and GPT-4
- Anthropic's Claude
- Google's Gemini

> NOTE: Because we're leveraging the OpenAI function calling API - we'll need to use OpenAI *for this specific example* (or any other service that exposes an OpenAI-style function calling API).

In [6]:
from langchain_openai import ChatOpenAI
from langchain_core.language_models.chat_models import BaseChatModel
llm : BaseChatModel = ChatOpenAI(model="gpt-4o", temperature=0)

Now that we have our model set-up, let's "put on the tool belt", which is to say: We'll bind our LangChain formatted tools to the model in an OpenAI function calling format.

In [7]:

from langchain_core.runnables import Runnable
from langchain_core.language_models import LanguageModelInput
from langchain_core.messages import BaseMessage

runnable : Runnable[LanguageModelInput, BaseMessage] = llm.bind_tools(tool_belt)

#### ❓ Question #1:

How does the model determine which tool to use?

#### 🔍Answer #1:


The model determines which tool to use through a few key mechanisms:

1. Function calling capabilities: As mentioned in the OpenAI Developer Forum
   discussion, models like GPT have been fine-tuned on using functions. The
   model analyzes the user's input and determines if calling a particular
   function/tool would be helpful in generating a relevant response [1] [2].

2. Tool descriptions: The tools are described to the model, usually in the
   system message or as part of the function specifications. The model uses
   these descriptions to understand when each tool would be appropriate to use.
   Essentially, in the first API call to the model, the user includes a
   `description` field in objects under `functions` which tells the model what
   each function does [1] [2].

3. Perception of external tool utility: The model makes a judgment on whether an external tool can better satisfy the user's needs than its own knowledge alone. For example, it may decide to use a web search tool for very recent information or a code interpreter for mathematical calculations [1].

4. Training on tool usage patterns: Models are often trained on examples of when different tools are typically used, allowing them to learn appropriate usage patterns [3].

5. Prompt engineering: The way tools are presented to the model in the prompt can influence when and how it chooses to use them [4].



[1]: https://community.openai.com/t/how-does-the-assistants-api-select-tools/516221
[2]: https://platform.openai.com/docs/guides/structured-outputs/supported-schemas
[3]: https://www.ibm.com/topics/ai-model
[4]: https://viso.ai/deep-learning/ml-ai-models/
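
To make point 2 more concrete, here is a rough sketch of the kind of tool description a function-calling model receives. The tool names match the ones that appear in this notebook's trace output, but the description strings and parameter schemas are illustrative assumptions, not the exact text LangChain generates when binding tools.

```python
import json

# Hypothetical tool specs in the OpenAI "tools" format. The descriptions and
# parameter schemas are assumptions made for illustration only.
tool_specs = [
    {
        "type": "function",
        "function": {
            "name": "duckduckgo_search",
            "description": "Search the web for current events and recent facts.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "arxiv",
            "description": "Look up scientific papers on arXiv by topic or title.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
]

# The model only ever sees a serialized form of these specs, so the
# `description` text is its main signal for picking one tool over another.
payload = json.dumps(tool_specs, indent=2)
for spec in tool_specs:
    fn = spec["function"]
    print(f"{fn['name']}: {fn['description']}")
```

A question about a recent event matches the first description better, while a question about a paper matches the second, which is why clear, distinct descriptions matter.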


## Task 4: Putting the State in Stateful

Earlier we used this phrasing:

`coordinated multi-actor and stateful applications`

So what does that "stateful" mean?

To put it simply - we want to have some kind of object which we can pass around our application that holds information about what the current situation (state) is. Since our system will be constructed of many parts moving in a coordinated fashion - we want to be able to ensure we have some commonly understood idea of that state.

LangGraph leverages a `StateGraph` which uses an `AgentState` object to pass information between the various nodes of the graph.

There are more options than what we'll see below - but this `AgentState` object is one that is stored in a `TypedDict` with the key `messages` and the value is a `Sequence` of `BaseMessages` that will be appended to whenever the state changes.

Let's think about a simple example to help understand exactly what this means (we'll simplify a great deal to try and clearly communicate what state is doing):

1. We initialize our state object:
  - `{"messages" : []}`
2. Our user submits a query to our application.
  - New State: `HumanMessage(#1)`
  - `{"messages" : [HumanMessage(#1)]}`
3. We pass our state object to an Agent node which is able to read the current state. It will use the last `HumanMessage` as input. It gets some kind of output which it will add to the state.
  - New State: `AgentMessage(#1, additional_kwargs {"function_call" : "WebSearchTool"})`
  - `{"messages" : [HumanMessage(#1), AgentMessage(#1, ...)]}`
4. We pass our state object to a "conditional node" (more on this later) which reads the last state to determine if we need to use a tool - which it can determine properly because of our provided object!
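
The walkthrough above can be sketched in plain Python. The `Message` class and `append_update` helper below are hypothetical stand-ins invented for this illustration (they are not LangChain or LangGraph classes), and the sketch uses a `tool_calls` list where the walkthrough shows a `function_call` kwarg.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical stand-in for a LangChain message (not the real class)."""
    role: str
    content: str = ""
    tool_calls: list = field(default_factory=list)


def append_update(state, update):
    """Return a new state with the update's messages appended to the old ones."""
    return {"messages": state["messages"] + update["messages"]}


# Step 1: we initialize our state object.
state = {"messages": []}

# Step 2: the user submits a query.
state = append_update(state, {"messages": [Message("human", "Who won the game?")]})

# Step 3: the agent node reads the state and returns only its new message.
agent_update = {"messages": [Message("ai", tool_calls=[{"name": "WebSearchTool"}])]}
state = append_update(state, agent_update)

# Step 4: the conditional check inspects the last message to pick a path.
needs_tool = bool(state["messages"][-1].tool_calls)
print([m.role for m in state["messages"]], needs_tool)
```

Note that each step returns only what is new; the accumulated history lives in the state object itself.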

In [8]:
from typing import TypedDict, Annotated,List
from langgraph.graph.message import add_messages
import operator
from langchain_core.messages import BaseMessage
# NOTE: reference
# https://github.com/langchain-ai/langgraph/blob/a93775413281df9ddf6ba29cc388b2460d94b9af/libs/langgraph/langgraph/graph/state.py#L80
reducer = add_messages
class AgentState(TypedDict):
  # messages: Annotated[list, add_messages]
  messages: Annotated[List[BaseMessage], reducer]
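
As a rough illustration of how a reducer attached through `Annotated` can drive state updates, here is a standard-library-only sketch. `SketchState` and `apply_update` are hypothetical names for this example, and `operator.add` stands in for `add_messages`, which does more than concatenate (for instance, it can update an existing message that has a matching ID). LangGraph's actual implementation differs.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class SketchState(TypedDict):
    # operator.add concatenates lists; it stands in for add_messages here.
    messages: Annotated[list, operator.add]
    # No reducer attached: a new value simply overwrites the old one.
    last_node: str


def apply_update(state: dict, update: dict) -> dict:
    """Merge `update` into `state`, using each key's annotated reducer if any."""
    hints = get_type_hints(SketchState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated[...] types expose their extra arguments as __metadata__.
        metadata = getattr(hints[key], "__metadata__", ())
        if metadata and key in merged:
            merged[key] = metadata[0](merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"messages": ["human: hi"], "last_node": "start"}
state = apply_update(state, {"messages": ["ai: hello"], "last_node": "agent"})
print(state)
```

The `messages` key grows because it has a reducer, while `last_node` is replaced because it does not.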


## Task 5: It's Graphing Time!

Now that we have state, and we have tools, and we have an LLM - we can finally start making our graph!

Let's take a second to refresh ourselves about what a graph is in this context.

Graphs, also called networks in some circles, are a collection of connected objects.

The objects in question are typically called nodes, or vertices, and the connections are called edges.

Let's look at a simple graph.

![image](https://i.imgur.com/2NFLnIc.png)

Here, we're using the coloured circles to represent the nodes and the yellow lines to represent the edges. In this case, we're looking at a fully connected graph - where each node is connected by an edge to each other node.

If we were to think about nodes in the context of LangGraph - we would think of a function, or an LCEL runnable.

If we were to think about edges in the context of LangGraph - we might think of them as "paths to take" or "where to pass our state object next".

Let's create some nodes and expand on our diagram.

> NOTE: Due to the tight integration with LCEL - we can comfortably create our nodes in an async fashion!

Let's create our graph

In [9]:
from langgraph.graph import StateGraph

uncompiled_graph:StateGraph = StateGraph(AgentState)

Let's create and add our first node to the graph

In [10]:
from langchain_core.messages.base import BaseMessage
from langchain_core.runnables import Runnable
from typing import List

def call_model(state:AgentState,runnable: Runnable ) -> dict[str, list[BaseMessage]]:
  messages:List[BaseMessage] = state["messages"]
  response:BaseMessage = runnable.invoke(messages)
  return {"messages" : [response]}

uncompiled_graph.add_node(
    node = "agent",
    action = lambda state: call_model(state, runnable),
    metadata = {"purpose": "call LLM" }
)

Let's create and add our second node to the graph

In [11]:
from langgraph.prebuilt import ToolNode

tool_node: ToolNode = ToolNode(
  tools=tool_belt,
  tags=[ "arxiv", "ddg-search"],
  handle_tool_errors = True,
)
uncompiled_graph.add_node("action", tool_node)

Now we have two total nodes. We have:

- `agent`, a node that will...well...call the model (via `call_model`)
- `action`, a node (our `tool_node`) which can call a tool

Let's start connecting our nodes! We'll update our diagram along the way to keep track of what this looks like!


Let's look at what we have so far:

![image](https://i.imgur.com/md7inqG.png)

Next, we'll add our entrypoint. All our entrypoint does is indicate which node is called first.

In [12]:
uncompiled_graph.set_entry_point("agent")

![image](https://i.imgur.com/wNixpJe.png)

Now we want to build a "conditional edge" which will use the output state of a node to determine which path to follow.

We can help conceptualize this by thinking of our conditional edge as a conditional in a flowchart!

Notice how our function simply checks whether the last message contains any `tool_calls`.

Then we create an edge where the origin node is our agent node and our destination node is *either* the action node or the END (finish the graph).

It's important to highlight that we don't pass a mapping dictionary as the third parameter here, so the values returned by our conditional function are used directly as destinations. In this case `should_continue` returns either `"action"` or `END`, which route to the action node or finish the graph. If we did pass a mapping, its keys would need to cover every possible output of the conditional function.

In [13]:
from langgraph.graph import END

def should_continue(state: AgentState):
  # Route to the tool node if the model's last message requested any tool calls
  last_message = state["messages"][-1]
  if last_message.tool_calls:
    return "action"
  return END

uncompiled_graph.add_conditional_edges(
    "agent",
    should_continue
)

Let's visualize what this looks like.

![image](https://i.imgur.com/8ZNwKI5.png)

Finally, we can add our last edge which will connect our action node to our agent node. This is because we *always* want our action node (which is used to call our tools) to return its output to our agent!

In [14]:
uncompiled_graph.add_edge("action", "agent")

Let's look at the final visualization.

![image](https://i.imgur.com/NWO7usO.png)

All that's left to do now is to compile our workflow - and we're off!

In [15]:
compiled_graph = uncompiled_graph.compile()
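
To build intuition for what running this graph involves, here is a toy executor in plain Python. It is a conceptual sketch only, not how LangGraph is implemented: the `run` function, the string-based messages, and the `max_steps` guard are all assumptions made for illustration.

```python
END = "__end__"  # sentinel used by this sketch only


def agent(state):
    """Toy agent: requests a tool once, then gives a final answer."""
    used_tool = any(m.startswith("tool:") for m in state["messages"])
    return {"messages": ["ai: final answer" if used_tool else "ai: call tool"]}


def action(state):
    """Toy tool node: always returns a fixed result."""
    return {"messages": ["tool: result"]}


def should_continue(state):
    """Route to the tool node if the last message asks for a tool."""
    return "action" if state["messages"][-1] == "ai: call tool" else END


nodes = {"agent": agent, "action": action}
edges = {"action": "agent"}                     # fixed edge: action -> agent
conditional_edges = {"agent": should_continue}  # routing decided at runtime


def run(state, entry_point="agent", max_steps=10):
    """Walk the graph from the entry point until a route returns END."""
    current, visited = entry_point, []
    for _ in range(max_steps):
        update = nodes[current](state)
        state = {"messages": state["messages"] + update["messages"]}
        visited.append(current)
        if current in conditional_edges:
            current = conditional_edges[current](state)
        else:
            current = edges[current]
        if current == END:
            return state, visited
    raise RuntimeError(f"no END reached within {max_steps} steps")


final_state, visited = run({"messages": ["human: question"]})
print(visited)
```

The cycle shows up as the `agent` node being visited twice, with the `action` node in between.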

#### ❓ Question #2:

Is there any specific limit to how many times we can cycle?

If not, how could we impose a limit to the number of cycles?

#### 🔍Answer #2:

The default recursion limit is `25` [1]. You can change it by setting
the `recursion_limit` [2] config key when calling `invoke()` on the graph object.

Beyond that, you can use any of the following methods:


1. Add a counter to the state:
   We can modify the `AgentState` class to include a counter:

   ```python
   class AgentState(TypedDict):
     messages: Annotated[list, add_messages]
     cycle_count: int
   ```

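   Then increment this counter in a node such as `call_model` and check it against a maximum value in `should_continue`. The increment belongs in a node because only the updates a node returns are written back to the state; changes made inside a conditional-edge function are not persisted:

   ```python
   def call_model(state):
     response = runnable.invoke(state["messages"])
     # Returning the new count is what persists it in the state
     return {"messages": [response], "cycle_count": state.get("cycle_count", 0) + 1}

   def should_continue(state):
     # ... start of the function
     if state.get("cycle_count", 0) > MAX_CYCLES:
       return END
     # ... rest of the function
   ```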

2. Use the length of the messages list in the `should_continue` function:
   We can use the length of the messages list to limit cycles:

   ```python
   def should_continue(state):
     # ... start of the function
     # Return the END constant (not the string "END") unless a mapping translates it
     if len(state["messages"]) > MAX_MESSAGES:
       return END
     # ... rest of the function
   ```

   This approach is already implemented in the `tool_call_or_helpful` function, effectively limiting the number of cycles to 10.

3. Add a timeout:
   We could implement a timeout mechanism that ends the cycle after a certain
   amount of time has passed. We can add an `if` branch to the `should_continue`
   function, with `start_time` declared in `AgentState` and supplied in the
   initial input (values assigned inside a conditional-edge function are not
   persisted):

   ```python
   import time

   def should_continue(state):
     # ... start of the function
     # `start_time` comes from the initial input, e.g. {"messages": [...], "start_time": time.time()}
     if time.time() - state["start_time"] > MAX_EXECUTION_TIME:
       return END
     # ... rest of the function
   ```

By implementing any of these methods, we can ensure that the agent doesn't get stuck in an infinite loop and always terminates after a certain number of cycles or amount of time.

[1]: https://github.com/langchain-ai/langchain/blob/ef329f681915ee696a5ddbe0cf9bf87a4406cdec/libs/core/langchain_core/runnables/config.py#L126
[2]: https://github.com/langchain-ai/langgraph/blob/main/examples/recursion-limit.ipynb


## Using Our Graph

Now that we've created and compiled our graph - we can call it *just as we'd call any other* `Runnable`!

Let's try out a few examples to see how it fares:

In [16]:
from langchain_core.messages import HumanMessage
import json

inputs = {"messages" : [HumanMessage(content="Who is the current captain of the Winnipeg Jets?")]}

async for chunk in compiled_graph.astream(
    inputs,
    config={"recursion_limit": 10},
    stream_mode="updates"
):
    for node, values in chunk.items():
        print(f"Receiving update from node: '{node}'")
        print(json.dumps(
            values["messages"],
            indent=4,
            default=lambda x: x.__dict__,
            ensure_ascii=False
        ))
        print("\n\n")

Receiving update from node: 'agent'
[
    {
        "content": "",
        "additional_kwargs": {
            "tool_calls": [
                {
                    "id": "call_XZtbOyvhJTUCSfMyisfY26oq",
                    "function": {
                        "arguments": "{\"query\":\"current captain of the Winnipeg Jets 2023\"}",
                        "name": "duckduckgo_search"
                    },
                    "type": "function"
                }
            ],
            "refusal": null
        },
        "response_metadata": {
            "token_usage": {
                "completion_tokens": 25,
                "prompt_tokens": 156,
                "total_tokens": 181
            },
            "model_name": "gpt-4o-2024-05-13",
            "system_fingerprint": "fp_157b3831f5",
            "finish_reason": "tool_calls",
            "logprobs": null
        },
        "type": "ai",
        "name": null,
        "id": "run-03e75c5c-222e-451f-8f98-f46cca2334a5-0",
    

Let's look at what happened:

1. Our state object was populated with our request
2. The state object was passed into our entry point (agent node) and the agent node added an `AIMessage` to the state object and passed it along the conditional edge
3. The conditional edge received the state object, found the "tool_calls" `additional_kwarg`, and sent the state object to the action node
4. The action node ran the requested tool, added the tool's output to the state object, and passed it along the edge to the agent node
5. The agent node added a response to the state object and passed it along the conditional edge
6. The conditional edge received the state object, could not find the "tool_calls" `additional_kwarg` and passed the state object to END where we see it output in the cell above!

Now let's look at an example that shows multiple tool usage - all with the same flow!

In [17]:
from langchain_core.messages.human import HumanMessage


inputs: dict[str, list[HumanMessage]] = {"messages" : [HumanMessage(content="Search Arxiv for the QLoRA paper, then search each of the authors to find out their latest Tweet using DuckDuckGo.")]}

async for chunk in compiled_graph.astream(inputs, stream_mode="updates"):
    for node, values in chunk.items():
        print(f"Receiving update from node: '{node}'")
        if node == "action":
          print(f"Tool Used: {values['messages'][0].name}")
        print(json.dumps(
            values["messages"],
            indent=4,
            default=lambda x: x.__dict__,
            ensure_ascii=False
        ))
        print("\n\n")

Receiving update from node: 'agent'
[
    {
        "content": "",
        "additional_kwargs": {
            "tool_calls": [
                {
                    "id": "call_xDPHsqFo8MVkh9STsy2dWKbO",
                    "function": {
                        "arguments": "{\"query\": \"QLoRA\"}",
                        "name": "arxiv"
                    },
                    "type": "function"
                },
                {
                    "id": "call_C9i1lLV3KbtUlJERduDZwzlD",
                    "function": {
                        "arguments": "{\"query\": \"QLoRA paper authors\"}",
                        "name": "duckduckgo_search"
                    },
                    "type": "function"
                }
            ],
            "refusal": null
        },
        "response_metadata": {
            "token_usage": {
                "completion_tokens": 53,
                "prompt_tokens": 173,
                "total_tokens": 226
            },
            "mo

#### 🏗️ Activity #2:

Please write out the steps the agent took to arrive at the correct answer.

Here’s a breakdown of the steps the agent took to arrive at the correct answer based on the provided trace:

1. **Initial Query Handling**:
   - The agent received the query related to "QLoRA" and decided to perform two actions:
     - Search for relevant papers using the `arxiv` tool.
     - Search for additional information about the "QLoRA paper authors" using the `duckduckgo_search` tool.

2. **Tool Execution**:
   - The agent executed the `arxiv` tool with the query "QLoRA." This returned detailed information about multiple papers, including their titles, authors, and summaries, which provided crucial details about the QLoRA approach and its authors.
   - Simultaneously, the agent executed the `duckduckgo_search` tool with the query "QLoRA paper authors." This search provided additional context and potential author-related information from various sources.

3. **Additional Author Information Search**:
   - The agent then took the results from the first two searches and initiated another round of searches. This time, the focus was on the latest tweets from the authors of the QLoRA paper, using the `duckduckgo_search` tool:
     - Searched for Tim Dettmers' latest tweet.
     - Searched for Artidoro Pagnoni's latest tweet.
     - Searched for Ari Holtzman's latest tweet.
     - Searched for Luke Zettlemoyer's latest tweet.

4. **Collection of Author Tweets**:
   - The agent successfully retrieved recent tweets or related information about the authors from the search results. This included tweets discussing topics relevant to their research and professional activities.

5. **Final Compilation and Summary**:
   - The agent compiled the information obtained from all the searches:
     - Summarized the QLoRA paper, including the title, authors, and key points from the summary.
     - Provided details from the latest tweets of each author, giving additional context and recent activities or opinions related to the field.
   - The final response was a comprehensive summary, including both the technical details of the QLoRA paper and recent updates from the authors, ensuring a well-rounded and informed answer.

This sequence of actions shows how the agent iteratively gathered information using various tools, cross-referenced data, and compiled the final answer by combining technical details with the latest relevant information from the authors.

## Part 1: LangSmith Evaluator

### Pre-processing for LangSmith

To do a little bit more preprocessing, let's wrap our LangGraph agent in a simple chain.

In [18]:
def convert_inputs(input_object):
  return {"messages" : [HumanMessage(content=input_object["question"])]}

def parse_output(input_state):
  return input_state["messages"][-1].content

agent_chain = convert_inputs | compiled_graph | parse_output

In [19]:
response=agent_chain.invoke({"question" : "What is RAG?"})

In [20]:
response

'RAG stands for Retrieval-Augmented Generation. It is a technique used in natural language processing (NLP) and machine learning to improve the performance of language models by combining retrieval-based methods with generative models. Here’s a brief overview of how it works:\n\n1. **Retrieval**: In the first step, the system retrieves relevant documents or pieces of information from a large corpus or database. This is typically done using a retrieval model, such as BM25 or a dense retrieval model like DPR (Dense Passage Retrieval).\n\n2. **Augmentation**: The retrieved documents are then used to augment the input to the generative model. This means that the generative model has access to additional context or information that can help it produce more accurate and relevant responses.\n\n3. **Generation**: Finally, the generative model, such as GPT-3 or BERT, uses the augmented input to generate a response. The additional context provided by the retrieved documents helps the model gener

### Task 1: Creating An Evaluation Dataset

Just as we saw last week, we'll want to create a dataset to test our Agent's ability to answer questions.

In order to do this - we'll want to provide some questions and some answers. Let's look at how we can create such a dataset below.

```python
questions = [
    "What optimizer is used in QLoRA?",
    "What data type was created in the QLoRA paper?",
    "What is a Retrieval Augmented Generation system?",
    "Who authored the QLoRA paper?",
    "What is the most popular deep learning framework?",
    "What significant improvements does the LoRA system make?"
]

answers = [
    {"must_mention" : ["paged", "optimizer"]},
    {"must_mention" : ["NF4", "NormalFloat"]},
    {"must_mention" : ["ground", "context"]},
    {"must_mention" : ["Tim", "Dettmers"]},
    {"must_mention" : ["PyTorch", "TensorFlow"]},
    {"must_mention" : ["reduce", "parameters"]},
]
```

#### 🏗️ Activity #3:

Please create a dataset in the above format with at least 5 questions.

In [21]:
questions = [
    "What is the main innovation introduced in the QLoRA paper?",
    "How does QLoRA compare to full fine-tuning in terms of memory usage?",
    "What is the significance of the 4-bit NormalFloat (NF4) data type in QLoRA?",
    "Who are the authors of the QLoRA paper?",
    "What is the relationship between QLoRA and PEFT (Parameter-Efficient Fine-Tuning)?"
]

answers = [
    {"must_mention": ["4-bit quantization", "LoRA", "memory efficiency"]},
    {"must_mention": ["95% less memory", "comparable performance"]},
    {"must_mention": ["NF4", "information retention", "4-bit quantization"]},
    {"must_mention": ["Tim Dettmers", "Artidoro Pagnoni", "Ari Holtzman", "Luke Zettlemoyer"]},
    {"must_mention": ["PEFT technique", "LoRA adaptation", "quantization"]}
]

Now we can add our dataset to our LangSmith project using the following code which we saw last Thursday!

In [22]:
from langsmith import Client

client = Client()
dataset_name = f"Langgraph Intro - Evaluation Dataset"
existing_datasets = client.list_datasets()
existing_dataset = next((ds for ds in existing_datasets if ds.name == dataset_name), None)
if existing_dataset:
    dataset = existing_dataset
    print(f"Dataset '{dataset_name}' already exists. Using existing dataset.")
else:
    dataset = client.create_dataset(
        dataset_name=dataset_name, description="Questions about the QLoRA Paper to Evaluate RAG over the same paper."
    )
    print(f"Created new dataset: '{dataset_name}'")



Created new dataset: 'Langgraph Intro - Evaluation Dataset'


In [23]:


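from typing import Iterator

from langsmith.schemas import Example


def create_new_examples(client, dataset_id, questions, answers):
    # list_examples yields Example objects lazily, so annotate it as an iterator
    existing_examples: Iterator[Example] = client.list_examples(dataset_id=dataset_id)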
    existing_questions = set(example.inputs["question"] for example in existing_examples)
    
    new_inputs = []
    new_outputs = []
    
    for question, answer in zip(questions, answers):
        if question not in existing_questions:
            new_inputs.append({"question": question})
            new_outputs.append(answer)
    
    if new_inputs:
        client.create_examples(
            inputs=new_inputs,
            outputs=new_outputs,
            dataset_id=dataset_id,
        )
        print(f"Added {len(new_inputs)} new examples to the dataset.")
    else:
        print("No new examples to add. All questions already exist in the dataset.")

# Usage
create_new_examples(client, dataset.id, questions, answers)
# NOTE: the following is commented out as it does not skip when questions
# already exist in the dataset
# client.create_examples(
#     inputs=[{"question" : q} for q in questions],
#     outputs=answers,
#     dataset_id=dataset.id,
# )


Added 5 new examples to the dataset.


#### ❓ Question #3:

How are the correct answers associated with the questions?

> NOTE: Feel free to indicate if this is problematic or not

#### 🔍Answer #3:


The correct answers are associated with the questions through the `answers` list, which corresponds to the `questions` list. Each answer in the `answers` list is a dictionary with a "must_mention" key, containing a list of phrases or keywords that should be present in a correct response to the corresponding question.

For example:

```python
questions[0] = "What is the main innovation introduced in the QLoRA paper?"
answers[0] = {"must_mention": ["4-bit quantization", "LoRA", "memory efficiency"]}
```

This approach is somewhat simplistic and could be problematic for a few reasons:
1. It doesn't account for synonyms or paraphrasing.
2. It might miss correct answers that use different terminology.
3. It doesn't consider the context or completeness of the answer.

However, it provides a basic way to automatically check if key concepts are present in the response.
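
Because the pairing is purely positional, a list that is one item short or out of order goes unnoticed, and `zip` silently drops unmatched items. A minimal sketch of a guard follows; `pair_examples` is a hypothetical helper written for this illustration and is not part of LangSmith.

```python
def pair_examples(questions, answers):
    """Pair each question with the answer at the same index, refusing mismatched lengths."""
    if len(questions) != len(answers):
        raise ValueError(
            f"{len(questions)} questions but {len(answers)} answers; "
            "positional pairing would silently drop or misalign items"
        )
    return [
        {"inputs": {"question": question}, "outputs": answer}
        for question, answer in zip(questions, answers)
    ]


sample_questions = ["Who are the authors of the QLoRA paper?"]
sample_answers = [{"must_mention": ["Tim Dettmers"]}]

pairs = pair_examples(sample_questions, sample_answers)
print(pairs[0])
```

Keeping each question and its expected answer together in one record from the start would avoid the problem entirely.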


### Task 2: Adding Evaluators

Now we can add a custom evaluator to see if our responses contain the expected information.

We'll be using a fairly naive, case-sensitive substring-match process to determine if our response contains specific strings.

In [24]:
from langsmith.evaluation import EvaluationResult, run_evaluator

@run_evaluator
def must_mention(run, example) -> EvaluationResult:
    prediction = run.outputs.get("output") or ""
    required = example.outputs.get("must_mention") or []
    score = all(phrase in prediction for phrase in required)
    return EvaluationResult(key="must_mention", score=score)

#### ❓ Question #4:

What are some ways you could improve this metric as-is?

> NOTE: Alternatively you can suggest where gaps exist in this method.

#### 🔍Answer #4:

1. Use semantic similarity instead of exact matching, allowing for paraphrasing and synonyms.
2. Implement a scoring system based on how many key points are mentioned, rather than a binary pass/fail.
3. Use more sophisticated NLP techniques like named entity recognition or topic modeling.
4. Incorporate a human-in-the-loop system for edge cases or ambiguous responses.
5. Use a pre-trained language model to evaluate answer quality more holistically.
6. Include negative examples or common misconceptions to check if they're absent from the response.
7. Implement a weighted scoring system where some concepts are more important than others.
8. Use regular expressions to allow for variations in phrasing or formatting.
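
As one possible take on point 2, the following sketch returns the fraction of required phrases found and ignores case. `mention_score` is a hypothetical standalone function; it is not wired into LangSmith here, and it still relies on plain substring matching.

```python
def mention_score(prediction: str, required: list[str]) -> float:
    """Return the fraction of required phrases found in the prediction, ignoring case."""
    if not required:
        # Nothing is required, so treat the answer as fully satisfying the check.
        return 1.0
    text = prediction.casefold()
    hits = sum(1 for phrase in required if phrase.casefold() in text)
    return hits / len(required)


example_score = mention_score(
    "QLoRA uses 4-BIT quantization with LoRA adapters.",
    ["4-bit quantization", "LoRA", "memory efficiency"],
)
print(round(example_score, 2))
```

A partial score like this distinguishes an answer that misses one phrase from an answer that misses all of them, which the binary check cannot do.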

Now that we have created our custom evaluator - let's initialize our `RunEvalConfig` with it!

In [25]:
from langchain.smith import RunEvalConfig, run_on_dataset

eval_config = RunEvalConfig(
    custom_evaluators=[must_mention],
)

### Task 3: Evaluating

All that is left to do is evaluate our agent's response!

In [26]:
client.run_on_dataset(
    dataset_name=dataset_name,
    llm_or_chain_factory=agent_chain,
    evaluation=eval_config,
    verbose=True,
    project_name=f"Langgraph Intro - Evaluation - {uuid4().hex[0:8]}",
    project_metadata={"version": "1.0.0"},
)

View the evaluation results for project 'Langgraph Intro - Evaluation - 876b09af' at:
https://smith.langchain.com/o/33fd52fe-2e6b-54d3-84ea-a791c2c90840/datasets/768861e9-d0b8-44e5-b64f-88b833a23b1d/compare?selectedSessions=2df1751c-741e-4782-9275-697b90dd3dd7

View all tests for Dataset Langgraph Intro - Evaluation Dataset at:
https://smith.langchain.com/o/33fd52fe-2e6b-54d3-84ea-a791c2c90840/datasets/768861e9-d0b8-44e5-b64f-88b833a23b1d
[------------------------------------------------->] 5/5

{'project_name': 'Langgraph Intro - Evaluation - 876b09af',
 'results': {'67bf1436-415b-4699-a319-167f2594e05c': {'input': {'question': 'What is the main innovation introduced in the QLoRA paper?'},
   'feedback': [EvaluationResult(key='must_mention', score=False, value=None, comment=None, correction=None, evaluator_info={}, feedback_config=None, source_run_id=UUID('72209a60-c341-46da-874b-499ac4a6f529'), target_run_id=None)],
   'execution_time': 4.392089,
   'run_id': 'eb4b5eea-1e9b-4dc3-a3a5-9f7874b4b2bf',
   'output': 'The main innovation introduced in the QLoRA paper is the development of a novel method called IR-QLoRA (Information Retention Quantized Low-Rank Adaptation). This method aims to enhance the accuracy of quantized large language models (LLMs) that have been fine-tuned using Low-Rank Adaptation (LoRA). The key innovations in IR-QLoRA are:\n\n1. **Statistics-based Information Calibration Quantization**: This technique ensures that the quantized parameters of the LLM reta

## Part 2: LangGraph with Helpfulness

### Task 3: Adding Helpfulness Check and "Loop" Limits

Now that we've done evaluation - let's see if we can add an extra step where we review the content we've generated to confirm if it fully answers the user's query!

We're going to make a few key adjustments to account for this:

1. We're going to add an artificial limit on how many "loops" the agent can go through - this will help us to avoid the potential situation where we never exit the loop.
2. We'll add to our existing conditional edge to obtain the behaviour we desire.

First, let's define our state again - we can check the length of the `messages` list in the state, so we don't need an additional state field to count loops.

In [27]:
class AgentState(TypedDict):
  messages: Annotated[list, add_messages]
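The idea of limiting loops by message count can be sketched without any LangGraph dependency. This is a minimal illustration, not the library's implementation: `operator.add` stands in for `add_messages` (which does more than plain list concatenation), and `SketchState`, `apply_update`, `over_limit`, and `MAX_MESSAGES` are hypothetical names introduced only for this example.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class SketchState(TypedDict):
    # Assumption: operator.add (list concatenation) stands in for add_messages.
    messages: Annotated[list, operator.add]


def apply_update(state: dict, update: dict) -> dict:
    """Merge a partial update into the state using each key's annotated reducer."""
    hints = get_type_hints(SketchState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        reducer = hints[key].__metadata__[0]
        merged[key] = reducer(state.get(key, []), value)
    return merged


MAX_MESSAGES = 10  # hypothetical constant mirroring the limit used later on


def over_limit(state: dict) -> bool:
    """Return True once the message history is longer than the limit."""
    return len(state["messages"]) > MAX_MESSAGES


state = {"messages": ["user question"]}
for step in range(12):
    # Each pass appends one message, the way a node's update would.
    state = apply_update(state, {"messages": [f"message {step}"]})
    if over_limit(state):
        break
```

Because every node appends to `messages`, the list length grows with each pass through the graph, which is why it can double as a loop counter.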

Now we can set our graph up! This process will be almost entirely the same - with the inclusion of one additional node/conditional edge!

#### 🏗️ Activity #5:

Please write markdown for the following cells to explain what each is doing.

##### Initializing the Graph and Adding Nodes

This code initializes our LangGraph and adds two key nodes:

1. We create a new `StateGraph` called `graph_with_helpfulness_check`, using our previously defined `AgentState` as the state type.

2. We add two nodes to the graph:
   - An "agent" node that calls our language model (using the `call_model` function).
   - An "action" node that executes tool actions (using the `tool_node` function).

These nodes form the core of our agent's decision-making and action-execution process. The "agent" node will be responsible for deciding what to do next, while the "action" node will carry out the chosen actions.

This structure allows for a flexible, graph-based approach to agent design, as described in the Pinecone article on LangGraph and research agents [1]. By using a graph structure, we can more easily customize the agent's behavior and add additional nodes or edges as needed.

[1]: https://www.pinecone.io/learn/langgraph-research-agent/

In [28]:
graph_with_helpfulness_check = StateGraph(AgentState)

graph_with_helpfulness_check.add_node("agent", action = lambda state: call_model(state, runnable))
graph_with_helpfulness_check.add_node("action", tool_node)

##### Setting the Entry Point


This line of code sets the entry point of our LangGraph to the "agent" node. Here's what this means:

1. The entry point is the first node that will be executed when the graph is run.
2. By setting it to "agent", we're indicating that our graph execution should always start with the agent making a decision or taking an action.
3. This is a crucial step in defining the flow of our agent-based system, as it determines where the execution begins each time the graph is invoked.

Setting the entry point to "agent" aligns with the typical structure of an agent-based system, where the agent is the primary decision-maker that initiates actions or responses. This setup allows the agent to assess the current state and decide on the next steps, whether it's to use a tool, respond to the user, or take any other action defined in our graph.

In [29]:
graph_with_helpfulness_check.set_entry_point("agent")

##### Decision-making node


The `tool_call_or_helpful` function is a crucial part of the LangGraph implementation: it is the routing function for the conditional edge that leaves the "agent" node, so it decides where the graph goes next. Here's a breakdown of what's happening in this function:

1. State Examination:
   - It first checks the last message in the state for any tool calls.
   - If tool calls are present, it returns "action", indicating that a tool should be used.

2. Message Limit Check:
   - If the number of messages in the state exceeds 10, it routes to the end of the graph, preventing infinite loops.

3. Helpfulness Evaluation:
   - If no tool calls are needed and the message limit isn't reached, it proceeds to evaluate the helpfulness of the response.
   - It extracts the initial query and the final response from the state.

4. Prompt Creation:
   - A prompt template is defined to assess the helpfulness of the response.

5. Model Setup:
   - It initializes a ChatOpenAI model (GPT-4) for the helpfulness check.

6. Helpfulness Chain:
   - A chain is created using the prompt template, the GPT-4 model, and a string output parser.

7. Helpfulness Assessment:
   - The chain is invoked with the initial query and final response.
   - The model determines if the response is helpful, returning 'Y' for yes or 'N' for no.

8. Final Decision:
   - If the response is deemed helpful ('Y'), it returns "end", concluding the process.
   - If not helpful, it returns "continue", allowing the graph to cycle again for improvement.

This function embodies key concepts of LangGraph, such as state management, conditional logic, and the use of language models for decision-making within the graph structure.

It demonstrates how LangGraph can create more sophisticated, self-improving AI systems by incorporating feedback loops and decision points based on the quality of responses.

In [30]:
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langgraph.graph import END

def tool_call_or_helpful(state):
  last_message = state["messages"][-1]

  if last_message.tool_calls:
    return "action"

  initial_query = state["messages"][0]
  final_response = state["messages"][-1]

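  # Stop once the conversation exceeds 10 messages. The returned label must match
  # a key in the conditional-edge mapping ("end"), not the name of the END constant.
  if len(state["messages"]) > 10:
    return "end"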

  prompt_template = """\
  Given an initial query and a final response, determine if the final response is extremely helpful or not. Please indicate helpfulness with a 'Y' and unhelpfulness as an 'N'.

  Initial Query:
  {initial_query}

  Final Response:
  {final_response}"""

  prompt_template = PromptTemplate.from_template(prompt_template)

  helpfulness_check_model = ChatOpenAI(model="gpt-4")

  helpfulness_chain = prompt_template | helpfulness_check_model | StrOutputParser()

  helpfulness_response = helpfulness_chain.invoke({"initial_query" : initial_query.content, "final_response" : final_response.content})

  if "Y" in helpfulness_response:
    return "end"
  else:
    return "continue"
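Since the routing function calls a model, it is awkward to test directly. One possible approach, sketched below under stated assumptions, is to pass the helpfulness judge in as a parameter so the branching logic can be exercised without any API call. `FakeMessage`, `route`, and the `judge` callable are hypothetical names for this illustration. Note one deliberate difference from the cell above: the sketch checks whether the verdict starts with "Y" rather than merely contains it, so a reply such as "N, not yet" is not misread as helpful.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a chat message object."""
    content: str
    tool_calls: list = field(default_factory=list)


def route(state: dict, judge: Callable[[str, str], str], max_messages: int = 10) -> str:
    """Return the label of the next step: 'action', 'end', or 'continue'."""
    last_message = state["messages"][-1]
    if last_message.tool_calls:
        return "action"
    if len(state["messages"]) > max_messages:
        return "end"
    verdict = judge(state["messages"][0].content, last_message.content)
    # Stricter than a substring test: only a verdict that begins with Y counts.
    return "end" if verdict.strip().upper().startswith("Y") else "continue"


question = FakeMessage("What is LoRA?")
tool_request = FakeMessage("", tool_calls=[{"name": "arxiv"}])
answer = FakeMessage("LoRA is a low-rank adaptation method.")

wants_tool = route({"messages": [question, tool_request]}, judge=lambda q, a: "Y")
helpful = route({"messages": [question, answer]}, judge=lambda q, a: "Y")
unhelpful = route({"messages": [question, answer]}, judge=lambda q, a: "N")
too_long = route({"messages": [question] + [answer] * 10}, judge=lambda q, a: "N")
```

In real use the `judge` argument would wrap the helpfulness chain; in tests it can be any function that returns a canned verdict.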

##### Adding Conditional Logic

This code snippet adds conditional edges to the `graph_with_helpfulness_check` StateGraph. Here's a breakdown of what it does:

1. It starts from the "agent" node.

2. It uses the `tool_call_or_helpful` function to determine the next step in the graph. This function evaluates the agent's latest message and returns one of three labels:
   - Continue with the agent ("continue")
   - Perform an action ("action")
   - End the conversation ("end")

3. The conditional edges are defined with a dictionary mapping the possible outcomes to their corresponding nodes:
   - If the result is "continue", it loops back to the "agent" node.
   - If the result is "action", it moves to the "action" node, which executes the requested tool calls.
   - If the result is "end", it terminates the graph execution using the special END node.

This structure allows for a flexible conversation flow, where the agent can decide to continue thinking, use a tool, or conclude the interaction based on the current state of the conversation. It's a key component in implementing the cyclic behavior characteristic of LangGraph, as described in the LangGraph documentation [1].

[1]: https://langchain-ai.github.io/langgraph/how-tos/tool-calling/

In [31]:
graph_with_helpfulness_check.add_conditional_edges(
    "agent",
    tool_call_or_helpful,
    {
        "continue" : "agent",
        "action" : "action",
        "end" : END
    }
)
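The role of the mapping can be sketched as a plain dictionary lookup: the router returns a label, and the mapping turns that label into a destination. The snippet below is an illustration only, not LangGraph's internals; `END_SENTINEL`, `path_map`, and `resolve` are hypothetical names, and the `KeyError` stands in for whatever error the library raises when a label has no entry.

```python
END_SENTINEL = "<END>"  # hypothetical placeholder for LangGraph's END constant

path_map = {
    "continue": "agent",
    "action": "action",
    "end": END_SENTINEL,
}


def resolve(label: str) -> str:
    """Translate a router label into the name of the next node."""
    if label not in path_map:
        raise KeyError(f"router returned {label!r}, expected one of {sorted(path_map)}")
    return path_map[label]


next_after_tool_request = resolve("action")
next_after_helpful_answer = resolve("end")

# Labels are case-sensitive: "END" is not the same key as "end".
try:
    resolve("END")
    wrong_case_failed = False
except KeyError:
    wrong_case_failed = True
```

The practical takeaway is that every string the routing function can return needs a matching key in the mapping, with identical spelling and case.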

##### Adding an Edge from Action to Agent

This line of code adds a directed edge in our graph from the "action" node to the "agent" node. Here's what this means:

1. After an action is performed (e.g., using a tool), the graph will always return to the agent node.
2. This creates a cycle in our graph, allowing the agent to potentially use multiple tools in sequence if needed.
3. It ensures that after every action, the agent has a chance to evaluate the new state and decide on the next step.

In [32]:
graph_with_helpfulness_check.add_edge("action", "agent")
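To see how the fixed "action" to "agent" edge produces a cycle, here is a toy runner in plain Python. It is a simplified sketch, not how LangGraph executes graphs: the state is just a list of strings, and `agent`, `action`, `router`, and `run` are hypothetical functions in which the "agent" asks for a tool until two tool results are present.

```python
def agent(state: list) -> list:
    """Pretend model: request a tool until two tool results are available."""
    tool_results = [m for m in state if m.startswith("tool:")]
    if len(tool_results) < 2:
        return state + ["agent: call tool"]
    return state + ["agent: final answer"]


def action(state: list) -> list:
    """Pretend tool execution: append a numbered result."""
    count = sum(m.startswith("tool:") for m in state)
    return state + [f"tool: result {count + 1}"]


def router(state: list) -> str:
    """Conditional edge out of the agent node."""
    return "action" if state[-1] == "agent: call tool" else "end"


def run(state: list):
    """Walk the graph from the entry point and record the visited nodes."""
    visited = []
    node = "agent"  # entry point
    while node != "end":
        visited.append(node)
        if node == "agent":
            state = agent(state)
            node = router(state)
        else:
            state = action(state)
            node = "agent"  # fixed edge: action always returns to agent
    return state, visited


final_state, visited = run(["user: question"])
```

The `visited` list alternates between the two nodes, which is the cycle that lets the agent chain several tool calls before answering.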

##### Compiling the Graph

This line of code compiles the `graph_with_helpfulness_check` into an executable agent.

Compiling the graph is a crucial step in the LangGraph workflow. Here's what this does:

1. Validation: It performs basic checks on the structure of the graph, ensuring there are no orphaned nodes or other structural issues.

2. Optimization: It may perform some internal optimizations to improve the execution efficiency of the graph.

3. Finalization: It finalizes the graph structure, making it ready for execution.

4. Runnable Creation: It turns the graph into a LangChain Runnable object, which provides methods like `.invoke()`, `.stream()`, and `.batch()` for executing the graph with inputs.

5. Runtime Arguments: While not shown in this specific example, the `compile()` method is where you can specify runtime arguments like checkpointers and breakpoints if needed.

Compiling is a necessary step before you can use your graph. It transforms the graph definition into an executable form, preparing it for actual use in your application.

This step aligns with the LangGraph documentation, which states: 

> You **MUST** compile your graph before you can use it [1].

[1]: https://langchain-ai.github.io/langgraph/concepts/low_level

In [33]:
agent_with_helpfulness_check = graph_with_helpfulness_check.compile()
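As an illustration of what an "orphaned node" check means, the sketch below finds nodes that cannot be reached from the entry point using a breadth-first search. This is an assumption about the kind of check involved, written in plain Python; it is not LangGraph's actual validation code, and `unreachable_nodes` is a hypothetical helper.

```python
from collections import deque


def unreachable_nodes(nodes: set, edges: dict, entry: str) -> set:
    """Return the nodes that no path starting at `entry` can reach."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        current = queue.popleft()
        for target in edges.get(current, ()):
            if target in nodes and target not in seen:
                seen.add(target)
                queue.append(target)
    return set(nodes) - seen


# Shape of the graph built above: agent can loop, call a tool, or end.
nodes = {"agent", "action"}
edges = {"agent": ["agent", "action"], "action": ["agent"]}
orphans = unreachable_nodes(nodes, edges, "agent")

# Adding a node without any incoming edge makes it an orphan.
orphans_with_extra = unreachable_nodes(nodes | {"summarize"}, edges, "agent")
```

For the two-node graph in this notebook the result is empty, since both nodes are reachable from "agent".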

### Testing the Agent with Helpfulness Check

This code sets up a test for the agent we've created that includes a helpfulness check. Here's a breakdown of what it does:


1. It imports the `HumanMessage` class from LangChain, which represents a message from a human user.

2. It creates an input dictionary with a single message asking about LoRA, Tim Dettmers, and Attention in machine learning.

3. It uses the `astream` method of our `agent_with_helpfulness_check` to asynchronously stream the agent's response. The `stream_mode="updates"` parameter tells it to stream updates to the state after each step of the graph.

4. For each chunk of the streamed response:
   - It prints the name of the node that produced the update.
   - It prints the messages contained in that update, formatted as JSON for readability.

This setup allows us to see how the agent processes the query step by step, including any tool usage and the final response. It's particularly useful for debugging and understanding the agent's decision-making process.

The use of `astream` and the streaming setup aligns with LangGraph's support for streaming, as described in the LangGraph documentation [1]. This approach allows for real-time observation of the agent's thought process and actions.

[1]: https://langchain-ai.github.io/langgraph/concepts/low_level/#streaming
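The shape of this consumption loop can be sketched with the standard library alone. In the example below, `fake_astream` is a hypothetical async generator standing in for the compiled graph's `astream` call, and `FakeMessage` is a hypothetical message type; each chunk is a dictionary keyed by node name. The sketch also shows that outside a notebook, where top-level `async for` is not available, the loop needs to live in a coroutine run with `asyncio.run`.

```python
import asyncio
import json
from dataclasses import dataclass


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a chat message object."""
    content: str


async def fake_astream(inputs: dict):
    """Hypothetical stand-in for a stream of per-node state updates."""
    yield {"agent": {"messages": [FakeMessage("calling a tool")]}}
    await asyncio.sleep(0)
    yield {"action": {"messages": [FakeMessage("tool result")]}}
    await asyncio.sleep(0)
    yield {"agent": {"messages": [FakeMessage("final answer")]}}


async def collect() -> list:
    """Consume the stream and render each update as JSON."""
    seen = []
    async for chunk in fake_astream({"messages": []}):
        for node, values in chunk.items():
            # default= tells json how to serialize objects it does not know.
            rendered = json.dumps(values["messages"], default=lambda x: x.__dict__)
            seen.append((node, rendered))
    return seen


updates = asyncio.run(collect())
```

The `default=lambda x: x.__dict__` argument is what lets `json.dumps` handle message objects: any value it cannot serialize natively is replaced by its attribute dictionary.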

In [34]:
import json

from langchain_core.messages.human import HumanMessage

inputs: dict[str, list[HumanMessage]] = {"messages" : [HumanMessage(content="Related to machine learning, what is LoRA? Also, who is Tim Dettmers? Also, what is Attention?")]}

async for chunk in agent_with_helpfulness_check.astream(inputs, stream_mode="updates"):
    for node, values in chunk.items():
        print(f"Receiving update from node: '{node}'")
        print(json.dumps(
            values["messages"],
            indent=4,
            default=lambda x: x.__dict__,
            ensure_ascii=False
        ))
        print("\n\n")

Receiving update from node: 'agent'
[
    {
        "content": "",
        "additional_kwargs": {
            "tool_calls": [
                {
                    "id": "call_F2h2FWdsRoMYhwLjJEbKhvlt",
                    "function": {
                        "arguments": "{\"query\": \"LoRA machine learning\"}",
                        "name": "arxiv"
                    },
                    "type": "function"
                },
                {
                    "id": "call_3G2WsdcvLZ3BLS1KsSWeW4j9",
                    "function": {
                        "arguments": "{\"query\": \"Tim Dettmers\"}",
                        "name": "duckduckgo_search"
                    },
                    "type": "function"
                },
                {
                    "id": "call_ivKuAVxeaggFuHPZTUzhLfZY",
                    "function": {
                        "arguments": "{\"query\": \"Attention mechanism machine learning\"}",
                        "name": "arxiv"
    

### Task 4: LangGraph for the "Patterns" of GenAI

Let's ask our system about the 4 patterns of Generative AI:

1. Prompt Engineering
2. RAG
3. Fine-tuning
4. Agents

In [35]:
patterns = ["prompt engineering", "RAG", "fine-tuning", "LLM-based agents"]

In [36]:
for pattern in patterns:
  what_is_string = f"What is {pattern} and when did it break onto the scene??"
  inputs = {"messages" : [HumanMessage(content=what_is_string)]}
  messages = agent_with_helpfulness_check.invoke(inputs)
  print(messages["messages"][-1].content)
  print("\n\n")

### What is Prompt Engineering?

Prompt engineering is the process of creating and refining input prompts to guide generative AI models, especially those using natural language processing (NLP). This involves designing inputs that help AI systems understand tasks and generate accurate responses. The goal is to optimize the interaction between humans and AI by carefully crafting input prompts to help the language model understand the context and produce the desired output.

### Key Aspects of Prompt Engineering:
- **Refinement**: Continuously improving the prompts to get better results.
- **Structure and Language**: Using the proper structure and language to ensure the AI understands the task.
- **Optimization**: Enhancing the output of large language models (LLMs) like OpenAI's GPT-4.
- **Application**: Used in various generative AI models such as ChatGPT, DALL-E, and others.

### History of Prompt Engineering

Prompt engineering has evolved significantly over the years, particularly i

## References

- https://github.com/langchain-ai/langgraph/blob/main/libs/langgraph/langgraph/prebuilt/chat_agent_executor.py