# LangGraph and LangSmith - Agentic RAG Powered by LangChain

In this notebook we'll complete the following tasks:

- 🤝 Breakout Room #1:
  1. Install required libraries
  2. Set Environment Variables
  3. Creating our Tool Belt
  4. Creating Our State
  5. Creating and Compiling A Graph!

- 🤝 Breakout Room #2:
  1. Evaluating the LangGraph Application with LangSmith
  2. Adding Helpfulness Check and "Loop" Limits
  3. LangGraph for the "Patterns" of GenAI

# 🤝 Breakout Room #1

## Part 1: LangGraph - Building Cyclic Applications with LangChain

LangGraph is a tool that leverages LangChain Expression Language to build coordinated multi-actor and stateful applications that include cyclic behaviour.

### Why Cycles?

In essence, we can think of a cycle in our graph as a more robust and customizable loop. It allows us to keep our application agent-forward while still giving the powerful functionality of traditional loops.

Because we use cycles rather than loops, we can also compose rather complex flows through our graph in a much more readable and natural fashion, effectively allowing us to recreate application flowcharts in code in an almost 1-to-1 fashion.

### Why LangGraph?

Beyond the agent-forward approach - we can easily compose and combine traditional "DAG" (directed acyclic graph) chains with powerful cyclic behaviour due to the tight integration with LCEL. This means it's a natural extension to LangChain's core offerings!

## Task 1:  Dependencies


## Task 2: Environment Variables

We'll want to set our OpenAI, Tavily, and LangSmith API keys along with our LangSmith environment variables.

In [1]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

In [2]:
os.environ["TAVILY_API_KEY"] = getpass.getpass("TAVILY_API_KEY")

In [3]:
from uuid import uuid4

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = f"AIE8 - LangGraph - {uuid4().hex[0:8]}"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangSmith API Key: ")

## Task 3: Creating our Tool Belt

As is usually the case, we'll want to equip our agent with a toolbelt to help answer questions and add external knowledge.

There's a tonne of tools in the [LangChain Community Repo](https://github.com/langchain-ai/langchain-community/tree/main/libs/community) but we'll stick to a couple just so we can observe the cyclic nature of LangGraph in action!

We'll leverage:

- [Tavily Search Results](https://github.com/langchain-ai/langchain-community/blob/main/libs/community/langchain_community/tools/tavily_search/tool.py)
- [Arxiv](https://github.com/langchain-ai/langchain-community/blob/main/libs/community/langchain_community/tools/arxiv/tool.py)

#### 🏗️ Activity #1:

Please add the tools to use into our toolbelt.

> NOTE: Each tool in our toolbelt should be a method.

In [4]:
%pip install -qU duckduckgo-search langchain-community
%pip install -U ddgs

/Users/ashimamangla/AI_Makerspace/code/AI8/05_Our_First_Agent_with_LangGraph/.venv/bin/python: No module named pip
Note: you may need to restart the kernel to use updated packages.
/Users/ashimamangla/AI_Makerspace/code/AI8/05_Our_First_Agent_with_LangGraph/.venv/bin/python: No module named pip
Note: you may need to restart the kernel to use updated packages.


In [5]:
#from langchain_tavily import TavilySearchResults
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_community.tools.arxiv.tool import ArxivQueryRun
from langchain_community.tools import DuckDuckGoSearchRun

tavily_tool = TavilySearchResults(max_results=5)

tool_belt = [
    tavily_tool,
    ArxivQueryRun(),
    DuckDuckGoSearchRun(),
]


### Model

Now we can set up our model! We'll leverage the familiar OpenAI model suite for this example - but it's not *necessary* to use OpenAI with LangGraph. LangGraph supports all models - though you might not find success with smaller models - as such, they recommend you stick with:

- OpenAI's GPT-3.5 and GPT-4
- Anthropic's Claude
- Google's Gemini

> NOTE: Because we're leveraging the OpenAI function calling API - we'll need to use OpenAI *for this specific example* (or any other service that exposes an OpenAI-style function calling API).

In [6]:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4.1-nano", temperature=0)

Now that we have our model set-up, let's "put on the tool belt", which is to say: We'll bind our LangChain formatted tools to the model in an OpenAI function calling format.

In [7]:
model = model.bind_tools(tool_belt)

#### ❓ Question #1:

How does the model determine which tool to use?

##### ✅ Answer:
The model uses one of two approaches:
(I) Structured approach: function calling based on the prompt and the tool descriptions provided. A natural language query or prompt gets converted into a function call. A tool/function definition includes a name, a description of what it does, and its input parameters in JSON format. The model analyzes the input prompt and matches it against the tools in the tool belt, checking for (1) relevance: is a tool relevant to the input query, (2) fit: does the tool's functionality match the user request, and (3) inputs: does the query supply the input parameters the tool needs.
(II) (More freeform) ReAct prompting: the agent uses the model to reason about which tool it should use and what inputs to provide.
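
To make the structured approach more concrete, here is a minimal sketch of what the model is shown and what it sends back. The `word_count` tool, the `registry`, and the hand-written `fake_tool_call` are hypothetical stand-ins for illustration; the schema layout follows the OpenAI-style function calling format, and in this notebook `bind_tools` takes care of producing the equivalent from each tool's name, description, and arguments.

```python
import json

# Hypothetical tool used only for illustration.
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

# What the model sees: a name, a description, and a JSON Schema for the arguments.
word_count_schema = {
    "type": "function",
    "function": {
        "name": "word_count",
        "description": "Count the words in a piece of text.",
        "parameters": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    },
}

# What the model returns when it picks a tool: a name plus JSON-encoded arguments.
# This call is written by hand here; a real model would generate it.
fake_tool_call = {
    "name": "word_count",
    "arguments": json.dumps({"text": "agents call tools"}),
}

# The application (not the model) looks up the tool by name and runs it.
registry = {"word_count": word_count}

def dispatch(tool_call: dict):
    func = registry[tool_call["name"]]
    return func(**json.loads(tool_call["arguments"]))

result = dispatch(fake_tool_call)
print(result)
```

The point of the sketch is the division of labour: the model only chooses a name and fills in arguments, while the surrounding application executes the function.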

## Task 4: Putting the State in Stateful

Earlier we used this phrasing:

`coordinated multi-actor and stateful applications`

So what does that "stateful" mean?

To put it simply - we want to have some kind of object which we can pass around our application that holds information about what the current situation (state) is. Since our system will be constructed of many parts moving in a coordinated fashion - we want to be able to ensure we have some commonly understood idea of that state.

LangGraph leverages a `StateGraph` which uses an `AgentState` object to pass information between the various nodes of the graph.

There are more options than what we'll see below - but this `AgentState` object is stored as a `TypedDict` with the key `messages`, whose value is a `Sequence` of `BaseMessage` objects that is appended to whenever the state changes.

Let's think about a simple example to help understand exactly what this means (we'll simplify a great deal to try and clearly communicate what state is doing):

1. We initialize our state object:
  - `{"messages" : []}`
2. Our user submits a query to our application.
  - New State: `HumanMessage(#1)`
  - `{"messages" : [HumanMessage(#1)]}`
3. We pass our state object to an Agent node which is able to read the current state. It will use the last `HumanMessage` as input. It gets some kind of output which it will add to the state.
  - New State: `AgentMessage(#1, additional_kwargs {"function_call" : "WebSearchTool"})`
  - `{"messages" : [HumanMessage(#1), AgentMessage(#1, ...)]}`
4. We pass our state object to a "conditional node" (more on this later) which reads the last state to determine if we need to use a tool - which it can determine properly because of our provided object!

In [8]:
from typing import TypedDict, Annotated
from langgraph.graph.message import add_messages
import operator
from langchain_core.messages import BaseMessage

class AgentState(TypedDict):
  messages: Annotated[list, add_messages]
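
The `Annotated[list, add_messages]` annotation attaches a reducer to the `messages` key: when a node returns `{"messages": [...]}`, the new messages are merged into the existing list instead of replacing it. A minimal pure-Python sketch of that idea follows. The names `append_messages` and `apply_update` are hypothetical, messages are represented as plain tuples, and the real `add_messages` also handles message IDs, which this sketch ignores.

```python
def append_messages(existing: list, new: list) -> list:
    """Simplified stand-in for a reducer: combine old and new messages."""
    return existing + new

def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the state using the reducer."""
    merged = dict(state)
    merged["messages"] = append_messages(state["messages"], update.get("messages", []))
    return merged

state = {"messages": []}
state = apply_update(state, {"messages": [("human", "What is LangGraph?")]})
state = apply_update(state, {"messages": [("ai", "A library for building stateful graphs.")]})

for role, content in state["messages"]:
    print(f"{role}: {content}")
```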

## Task 5: It's Graphing Time!

Now that we have state, and we have tools, and we have an LLM - we can finally start making our graph!

Let's take a second to refresh ourselves about what a graph is in this context.

Graphs, also called networks in some circles, are a collection of connected objects.

The objects in question are typically called nodes, or vertices, and the connections are called edges.

Let's look at a simple graph.

![image](https://i.imgur.com/2NFLnIc.png)

Here, we're using the coloured circles to represent the nodes and the yellow lines to represent the edges. In this case, we're looking at a fully connected graph - where each node is connected by an edge to each other node.

If we were to think about nodes in the context of LangGraph - we would think of a function, or an LCEL runnable.

If we were to think about edges in the context of LangGraph - we might think of them as "paths to take" or "where to pass our state object next".
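
As a rough mental model (not LangGraph's actual implementation), nodes can be pictured as plain functions that take the state and return an update, and edges as a lookup table naming the next node. All names in this sketch are hypothetical.

```python
def greet(state: dict) -> dict:
    return {"log": state["log"] + ["greet"]}

def farewell(state: dict) -> dict:
    return {"log": state["log"] + ["farewell"]}

nodes = {"greet": greet, "farewell": farewell}
edges = {"greet": "farewell", "farewell": None}  # None marks the end of the walk

def run(entry_point: str, state: dict) -> dict:
    current = entry_point
    while current is not None:
        # The node reads the state and returns an update to merge in.
        state = {**state, **nodes[current](state)}
        # The edge decides where the state object goes next.
        current = edges[current]
    return state

final_state = run("greet", {"log": []})
print(final_state["log"])
```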

Let's create some nodes and expand on our diagram.

> NOTE: Due to the tight integration with LCEL - we can comfortably create our nodes in an async fashion!

In [9]:
from langgraph.prebuilt import ToolNode

def call_model(state):
  messages = state["messages"]
  response = model.invoke(messages)
  return {"messages" : [response]}

tool_node = ToolNode(tool_belt)

Now we have two total nodes. We have:

- `call_model` is a node that will...well...call the model
- `tool_node` is a node which can call a tool

Let's start adding nodes! We'll update our diagram along the way to keep track of what this looks like!


In [10]:
from langgraph.graph import StateGraph, END

uncompiled_graph = StateGraph(AgentState)

uncompiled_graph.add_node("agent", call_model)
uncompiled_graph.add_node("action", tool_node)

<langgraph.graph.state.StateGraph at 0x119682510>

Let's look at what we have so far:

![image](https://i.imgur.com/md7inqG.png)

Next, we'll add our entrypoint. All our entrypoint does is indicate which node is called first.

In [11]:
uncompiled_graph.set_entry_point("agent")

<langgraph.graph.state.StateGraph at 0x119682510>

![image](https://i.imgur.com/wNixpJe.png)

Now we want to build a "conditional edge" which will use the output state of a node to determine which path to follow.

We can help conceptualize this by thinking of our conditional edge as a conditional in a flowchart!

Notice how our function simply checks whether the last message contains any `tool_calls`.

Then we create an edge where the origin node is our agent node and our destination node is *either* the action node or the END (finish the graph).

It's important to highlight that `add_conditional_edges` also accepts an optional third parameter, a dictionary (the mapping) from the possible outputs of our conditional function to node names. We don't need it here because `should_continue` returns either `"action"` or `END` directly, and both are already valid destinations.
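
A toy sketch of how a conditional function's return value can be resolved into a destination: with a mapping, the returned label is looked up; without one, the return value itself is treated as the destination. The names `resolve_next`, `route_by_label`, and `route_by_node` are hypothetical, `END` here is a stand-in sentinel, and this is a mental model rather than LangGraph's internals.

```python
from typing import Callable, Optional

END = "__end__"  # stand-in sentinel for this sketch

def resolve_next(router: Callable[[dict], str], state: dict,
                 path_map: Optional[dict] = None) -> str:
    """Return the next node name chosen by a conditional edge."""
    label = router(state)
    if path_map is None:
        return label          # router output is already a node name (or END)
    return path_map[label]    # router output is a label that the map translates

def route_by_label(state: dict) -> str:
    return "continue" if state["pending_tool_calls"] else "end"

def route_by_node(state: dict) -> str:
    return "action" if state["pending_tool_calls"] else END

mapping = {"continue": "action", "end": END}

busy = {"pending_tool_calls": ["search"]}
idle = {"pending_tool_calls": []}

print(resolve_next(route_by_label, busy, mapping))
print(resolve_next(route_by_node, idle))
```

Either style leads to the same destinations; the mapping is mostly useful when the labels should read differently from the node names.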

In [12]:
def should_continue(state):
  last_message = state["messages"][-1]

  if last_message.tool_calls:
    return "action"

  return END

uncompiled_graph.add_conditional_edges(
    "agent",
    should_continue
)

<langgraph.graph.state.StateGraph at 0x119682510>

Let's visualize what this looks like.

![image](https://i.imgur.com/8ZNwKI5.png)

Finally, we can add our last edge which will connect our action node to our agent node. This is because we *always* want our action node (which is used to call our tools) to return its output to our agent!

In [13]:
uncompiled_graph.add_edge("action", "agent")

<langgraph.graph.state.StateGraph at 0x119682510>

Let's look at the final visualization.

![image](https://i.imgur.com/NWO7usO.png)

All that's left to do now is to compile our workflow - and we're off!

In [14]:
simple_agent_graph = uncompiled_graph.compile()

#### ❓ Question #2:

Is there any specific limit to how many times we can cycle?

If not, how could we impose a limit to the number of cycles?

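##### ✅ Answer: Yes. LangGraph stops a run once it exceeds the `recursion_limit`, which defaults to 25 steps and can be overridden in the config passed when the graph is invoked, e.g. `{"recursion_limit": 10}`. (The `max_iterations` parameter, with a default of 15, belongs to the legacy `AgentExecutor`, see https://python.langchain.com/api_reference/langchain/agents/langchain.agents.agent.AgentExecutor.html, and can likewise be overridden.)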
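
In LangGraph itself, the cap on the number of steps is controlled by `recursion_limit` in the config passed at invocation. The toy runner below sketches the same idea in plain Python; `run_with_limit`, `StepLimitExceeded`, and the node functions are hypothetical and are not LangGraph APIs.

```python
class StepLimitExceeded(RuntimeError):
    """Raised when the toy graph runs for too many steps."""

def run_with_limit(nodes: dict, router, entry: str, state: dict, step_limit: int) -> dict:
    current, steps = entry, 0
    while current is not None:
        if steps >= step_limit:
            raise StepLimitExceeded(f"stopped after {steps} steps")
        state = nodes[current](state)
        current = router(current, state)
        steps += 1
    return state

def agent(state: dict) -> dict:
    return {**state, "turns": state["turns"] + 1}

def action(state: dict) -> dict:
    return state

def always_loop(current: str, state: dict):
    # A router that never ends: agent -> action -> agent -> ...
    return "action" if current == "agent" else "agent"

try:
    run_with_limit({"agent": agent, "action": action}, always_loop,
                   "agent", {"turns": 0}, step_limit=6)
    hit_limit = False
except StepLimitExceeded as err:
    hit_limit = True
    print(f"Cycle cut short: {err}")
```

With the compiled graph from this notebook, the analogous move would be to pass something like `config={"recursion_limit": 10}` to `invoke` or `astream`.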


In [15]:
from IPython.display import HTML, display
def set_output_wrapping():
    display(HTML('''
    <style>
    pre {
        white-space: pre-wrap;
        word-wrap: break-word;
        max-width: 100%;
        overflow-x: hidden;
    }
    .output_area {
        white-space: pre-wrap;
        word-wrap: break-word;
        max-width: 100%;
        overflow-x: hidden;
    }
    .output_text {
        white-space: pre-wrap;
        word-wrap: break-word;
    }
    div.output {
        white-space: pre-wrap;
        word-wrap: break-word;
        max-width: 100%;
    }
    span {
        white-space: pre-wrap;
        word-wrap: break-word;
    }
    </style>
    '''))
set_output_wrapping()

## Using Our Graph

Now that we've created and compiled our graph - we can call it *just as we'd call any other* `Runnable`!

Let's try out a few examples to see how it fares:

In [16]:
from langchain_core.messages import HumanMessage
import sys

inputs = {"messages" : [HumanMessage(content="How are technical professionals using AI to improve their work?")]}
async for chunk in simple_agent_graph.astream(inputs, stream_mode="updates"):
    for node, values in chunk.items():
        print(f"Receiving update from node: '{node}'")
        print(values["messages"])
        print("\n\n")

Receiving update from node: 'agent'
[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_LQRu0qjR0hwKTMAey7hOxkI1', 'function': {'arguments': '{"query": "How are technical professionals using AI to improve their work?"}', 'name': 'tavily_search_results_json'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 44, 'prompt_tokens': 213, 'total_tokens': 257, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-nano-2025-04-14', 'system_fingerprint': 'fp_7c233bf9d1', 'id': 'chatcmpl-CLHATskkCtIRtTYB5sRYUPsLYUJ48', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run--51fde11c-bc4c-4838-83a2-36a7658f82e1-0', tool_calls=[{'name': 'tavily_search_results_json', 'args': {'query': 'How are technical professionals using AI to impr

Let's look at what happened:

1. Our state object was populated with our request
2. The state object was passed into our entry point (agent node) and the agent node added an `AIMessage` to the state object and passed it along the conditional edge
3. The conditional edge received the state object, found the "tool_calls" `additional_kwarg`, and sent the state object to the action node
4. The action node ran the requested tool, added the tool's result to the state object, and passed it along the edge to the agent node
5. The agent node added a response to the state object and passed it along the conditional edge
6. The conditional edge received the state object, could not find the "tool_calls" `additional_kwarg` and passed the state object to END where we see it output in the cell above!

Now let's look at an example that shows multiple tool usage - all with the same flow!

In [17]:
inputs = {"messages" : [HumanMessage(content="Search Arxiv for the A Comprehensive Survey of Deep Research paper, then search each of the authors to find out where they work now using Tavily!")]}

async for chunk in simple_agent_graph.astream(inputs, stream_mode="updates"):
    for node, values in chunk.items():
        print(f"Receiving update from node: '{node}'")
        #if node == "action":
          #print(f"Tool Used: {values['messages'][0].name}")
        print(values["messages"])

        print("\n\n")

Receiving update from node: 'agent'
[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_mCopQeuXQe6mAl4cFc0HODLx', 'function': {'arguments': '{"query": "A Comprehensive Survey of Deep Research"}', 'name': 'arxiv'}, 'type': 'function'}, {'id': 'call_A8ZpRPBOhSft80ftgeVxBsH2', 'function': {'arguments': '{"query": "A Comprehensive Survey of Deep Research paper"}', 'name': 'tavily_search_results_json'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 59, 'prompt_tokens': 232, 'total_tokens': 291, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-nano-2025-04-14', 'system_fingerprint': 'fp_7c233bf9d1', 'id': 'chatcmpl-CLHAeU9oTkrIFOtgdAvFHNOUXjXNZ', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run--8f274f47-f473-4a

#### 🏗️ Activity #2:

Please write out the steps the agent took to arrive at the correct answer.

##### ✅ Answer:
1. Our state object was populated with our request.
2. The state object was passed into our entry point (the agent node). The agent called the LLM, which determined it needed to call both tools. It added a call to `arxiv` with the name of the paper and a call to `tavily_search_results_json` with 'author of' plus the name of the paper.
3. The flow passed to the conditional edge, which determined that the actions described above needed to be taken and routed to the tools.
4. Each tool was executed, and the results were appended to the state object and returned to the agent.
5. The agent called the LLM to parse the results and determine the next course of action.
6. The LLM parsed out the two author names and added two calls to the state object, one `tavily_search_results_json` call per author.
7. The flow passed to the conditional edge, which determined that the actions described above needed to be taken and routed to `tavily_search_results_json` twice.
8. The two calls were executed, and the results were appended to the state and returned to the agent.
9. The agent LLM added no further tool calls to the state since it had the required information, and the conditional edge routed to END to finish the flow.

Hindsight: More prompt tuning may be needed.


In [18]:
inputs = {"messages" : [HumanMessage(content="First Search only on Arxiv for 'A Comprehensive Survey of Deep Research' paper to find the authors, then after authors are available use Tavily to search to find out where they work now!")]}

async for chunk in simple_agent_graph.astream(inputs, stream_mode="updates"):
    for node, values in chunk.items():
        print(f"Receiving update from node: '{node}'")
        #if node == "action":
          #print(f"Tool Used: {values['messages'][0].name}")
        print(values["messages"])

        print("\n\n")

Receiving update from node: 'agent'
[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_qBkJDzDtUv6qdarrdfqAPLyw', 'function': {'arguments': '{"query": "A Comprehensive Survey of Deep Research"}', 'name': 'arxiv'}, 'type': 'function'}, {'id': 'call_YTYsVBlQDsvF08Bftoj8dOX2', 'function': {'arguments': '{"query": "A Comprehensive Survey of Deep Research"}', 'name': 'tavily_search_results_json'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 58, 'prompt_tokens': 241, 'total_tokens': 299, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-nano-2025-04-14', 'system_fingerprint': 'fp_7c233bf9d1', 'id': 'chatcmpl-CLHArW9icN9korf6PAJlsVCT6Sb6G', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run--afdc0bae-2b77-430b-aeb

# 🤝 Breakout Room #2

## Part 1: LangSmith Evaluator

### Pre-processing for LangSmith

To do a little bit more preprocessing, let's wrap our LangGraph agent in a simple chain.

In [19]:
def convert_inputs(input_object):
  return {"messages" : [HumanMessage(content=input_object["text"])]}

def parse_output(input_state):
  return {"answer" : input_state["messages"][-1].content}

agent_chain_with_formatting = convert_inputs | simple_agent_graph | parse_output

agent_chain_with_formatting.invoke({"text" : "What is Deep Research?"})

{'answer': 'Deep Research is a tool designed to perform advanced, in-depth research and analysis across the web. It can independently discover, reason about, and consolidate insights from multiple sources to generate comprehensive reports on various topics within minutes. This tool is useful for conducting detailed literature reviews, case studies, and complex research tasks by browsing the web, synthesizing information, and providing precise and thorough reports.'}
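
The `|` operator works here because LangChain runnables overload it, and plain functions placed next to a runnable are wrapped automatically. A stripped-down sketch of that mechanism follows, using a hypothetical `Pipe` class and a fake graph in place of `simple_agent_graph`; it illustrates the operator overloading only and is not how LCEL is actually implemented.

```python
class Pipe:
    """Hypothetical minimal stand-in for a runnable that supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # pipe | something -> run self, then the right-hand side
        right = other if isinstance(other, Pipe) else Pipe(other)
        return Pipe(lambda value: right.invoke(self.invoke(value)))

    def __ror__(self, other):
        # plain_function | pipe -> wrap the function, then run self
        left = other if isinstance(other, Pipe) else Pipe(other)
        return Pipe(lambda value: self.invoke(left.invoke(value)))

def to_state(input_object: dict) -> dict:
    return {"messages": [input_object["text"]]}

def to_answer(state: dict) -> dict:
    return {"answer": state["messages"][-1]}

# Fake graph: appends a canned reply instead of calling a model.
fake_graph = Pipe(lambda state: {"messages": state["messages"] + ["canned reply"]})

chain = to_state | fake_graph | to_answer
output = chain.invoke({"text": "What is Deep Research?"})
print(output)
```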

### Task 1: Creating An Evaluation Dataset

Just as we saw last week, we'll want to create a dataset to test our Agent's ability to answer questions.

In order to do this - we'll want to provide some questions and some answers. Let's look at how we can create such a dataset below.

```python
questions = [
    {
        "inputs" : {"text" : "Who were the main authors on the 'A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications' paper?"},
        "outputs" : {"must_mention" : ["Peng", "Xu"]}   
    },
    ...,
    {
        "inputs" : {"text" : "Where do the authors of the 'A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications' work now?"},
        "outputs" : {"must_mention" : ["Zhejiang", "Liberty Mutual"]}
    }
]
```

#### 🏗️ Activity #3:

Please create a dataset in the above format with at least 5 questions that pertain to the cohort use-case (more information [here](https://www.notion.so/Session-4-RAG-with-LangGraph-OSS-Local-Models-Eval-w-LangSmith-26acd547af3d80838d5beba464d7e701#26acd547af3d81d08809c9c82a462bdd)), or the use-case you're hoping to tackle in your Demo Day project.

In [20]:
questions = [
    {
        "inputs" : {"text" : "What are the key benefits of using AI at work?"},
        "outputs" : {"must_mention" : ["efficiency", "speed"]}
    },
    {
        "inputs" : {"text" : "What are the most common ways people use AI in their work?"},
        "outputs" : {"must_mention" : ["Search", "Retrieve", "Generate"]}
    },
    {
        "inputs" : {"text" : "What are the various use cases per domain that AI can be leveraged to great effect?"},
        "outputs" : {"must_mention" : ["Healthcare", "finance", "education"]}
    },
    {
        "inputs" : {"text" : "What concerns or challenges do people have when using AI?"},
        "outputs" : {"must_mention" : ["privacy", "cost"]}
    },
    {
        "inputs" : {"text" : "How can these concerns people have about AI be mitigated?"},
        "outputs" : {"must_mention" : ["Regulations", "Ethics", "Transparency"]}
    }
]

Now we can add our dataset to our LangSmith project using the following code which we saw last Thursday!

In [21]:
from langsmith import Client

client = Client()

dataset_name = f"Simple Search Agent - Evaluation Dataset - {uuid4().hex[0:8]}"

dataset = client.create_dataset(
    dataset_name=dataset_name,
    description="Questions about the cohort use-case to evaluate the Simple Search Agent."
)

client.create_examples(
    dataset_id=dataset.id,
    examples=questions
)

{'example_ids': ['4f4b42eb-10a3-494a-88bf-c0edaed7e53b',
  'f69268b1-7132-42c3-b85e-4dd323bcdba3',
  '13d93509-ec51-48d6-9a86-9182c6a9a058',
  '342dc188-752e-4eef-8c2e-c421b733c4d5',
  '1bf75396-09a6-465a-a653-602dc9823c35'],
 'count': 5}

### Task 2: Adding Evaluators

Let's use the OpenEvals library to produce an evaluator that we can then pass into LangSmith!

> NOTE: Examine the `CORRECTNESS_PROMPT` below!

In [22]:
from openevals.prompts import CORRECTNESS_PROMPT
print(CORRECTNESS_PROMPT)

You are an expert data labeler evaluating model outputs for correctness. Your task is to assign a score based on the following rubric:

<Rubric>
  A correct answer:
  - Provides accurate and complete information
  - Contains no factual errors
  - Addresses all parts of the question
  - Is logically consistent
  - Uses precise and accurate terminology

  When scoring, you should penalize:
  - Factual errors or inaccuracies
  - Incomplete or partial answers
  - Misleading or ambiguous statements
  - Incorrect terminology
  - Logical inconsistencies
  - Missing key information
</Rubric>

<Instructions>
  - Carefully read the input and output
  - Check for factual accuracy and completeness
  - Focus on correctness of information rather than style or verbosity
</Instructions>

<Reminder>
  The goal is to evaluate factual correctness and completeness of the response.
</Reminder>

<input>
{inputs}
</input>

<output>
{outputs}
</output>

Use the reference outputs below to help you evaluate the

In [23]:
from openevals.llm import create_llm_as_judge

correctness_evaluator = create_llm_as_judge(
        prompt=CORRECTNESS_PROMPT,
        model="openai:o3-mini", # very impactful to the final score
        feedback_key="correctness",
    )

Let's also create a custom Evaluator for our created dataset above - we do this by first making a simple Python function!

In [None]:
def must_mention_improved(inputs: dict, outputs: dict, reference_outputs: dict) -> bool:
  """
  Improved version that returns True if ANY of the required phrases are found in the answer.
  Uses 'any()' instead of 'all()' to be more lenient and compatible with LangSmith boolean evaluators.
  """
  # Get the required phrases from reference_outputs
  required = reference_outputs.get("must_mention") or []
  
  # If no phrases are required, return True (perfect score)
  if not required:
    return True
  
  # Get the answer text (convert to lowercase for case-insensitive matching)
  answer_text = outputs["answer"].lower()
  
  # Return True if ANY of the required phrases are found in the answer
  return any(phrase.lower() in answer_text for phrase in required)

# Example usage and testing
test_inputs = {"text": "What are the main authors of the Deep Research paper?"}
test_outputs = {"answer": "The main authors are Renjun Xu and Jingwen Peng from Zhejiang University."}
test_reference = {"must_mention": ["Renjun Xu", "Jingwen Peng", "Zhejiang University"]}

# Test the improved function
result = must_mention_improved(test_inputs, test_outputs, test_reference)
print(f"Result: {result}")
print(f"Found at least one required phrase: {'Yes' if result else 'No'}")

# Test with partial match
test_outputs_partial = {"answer": "The author is Renjun Xu."}
result_partial = must_mention_improved(test_inputs, test_outputs_partial, test_reference)
print(f"Partial match result: {result_partial}")

# Test with no match
test_outputs_none = {"answer": "The authors are Smith and Jones."}
result_none = must_mention_improved(test_inputs, test_outputs_none, test_reference)
print(f"No match result: {result_none}")




In [25]:
def must_mention(inputs: dict, outputs: dict, reference_outputs: dict) -> bool:
  # determine if all the phrases in the reference_outputs are in the outputs
  # (exact, case-sensitive substring match)
  required = reference_outputs.get("must_mention") or []
  return all(phrase in outputs["answer"] for phrase in required)

#### ❓ Question #4:

What are some ways you could improve this metric as-is?

> NOTE: Alternatively you can suggest where gaps exist in this method.

##### ✅ Answer: 
Instead of `must_mention`, which is an exact word search, it should check for semantic similarity to the required words. Also, it seems to check that all of the words exist, yet the answer can still be correct with only some of them.

Other ways:
- Ask for cited sources so we can verify that the answer provided is indeed supported by the provided documents
- Check the relevance of answers to the query (uses an LLM again)
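
One of the suggestions above, giving credit for partial coverage, can be sketched as a score between 0 and 1 rather than a pass/fail. `must_mention_fraction` is a hypothetical name, and returning a float is an assumption about what the evaluation harness accepts. Matching is still plain substring search, so this does not address the semantic-similarity point.

```python
def must_mention_fraction(inputs: dict, outputs: dict, reference_outputs: dict) -> float:
    """Return the fraction of required phrases found in the answer (case-insensitive)."""
    required = reference_outputs.get("must_mention") or []
    if not required:
        return 1.0  # nothing required, so nothing can be missing
    answer_text = outputs["answer"].lower()
    found = sum(1 for phrase in required if phrase.lower() in answer_text)
    return found / len(required)

reference = {"must_mention": ["privacy", "cost"]}
full = must_mention_fraction({}, {"answer": "Privacy and cost are common concerns."}, reference)
partial = must_mention_fraction({}, {"answer": "Mostly privacy."}, reference)
none = must_mention_fraction({}, {"answer": "No concerns at all."}, reference)
print(full, partial, none)
```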


### Task 3: Evaluating

All that is left to do is evaluate our agent's response!

In [26]:
results = client.evaluate(
    agent_chain_with_formatting,
    data=dataset.name,
    #evaluators=[correctness_evaluator, must_mention],
    evaluators=[correctness_evaluator, must_mention_improved],
    experiment_prefix="simple_agent, baseline",  # optional, experiment name prefix
    description="Testing the baseline system.",  # optional, experiment description
    max_concurrency=4, # optional, add concurrency
)

View the evaluation results for experiment: 'simple_agent, baseline-10e5039f' at:
https://smith.langchain.com/o/4528ad4f-8895-4efc-95a1-834ac718fac4/datasets/98066dbc-11be-44e6-a641-5609074ff29f/compare?selectedSessions=f6af7213-629c-41d9-b8c4-a60657cc80b4





## Part 2: LangGraph with Helpfulness

### Task 3: Adding Helpfulness Check and "Loop" Limits

Now that we've done evaluation - let's see if we can add an extra step where we review the content we've generated to confirm if it fully answers the user's query!

We're going to make a few key adjustments to account for this:

1. We're going to add an artificial limit on how many "loops" the agent can go through - this will help us to avoid the potential situation where we never exit the loop.
2. We'll add to our existing conditional edge to obtain the behaviour we desire.

First, let's define our state again - we can check the length of the state object, so we don't need additional state for this.

In [27]:
class AgentState(TypedDict):
  messages: Annotated[list, add_messages]

Now we can set our graph up! This process will be almost entirely the same - with the inclusion of one additional node/conditional edge!

#### 🏗️ Activity #4:

Please write markdown for the following cells to explain what each is doing.

##### YOUR MARKDOWN HERE
### Creating the Enhanced Graph with Helpfulness Check

This code initializes a new `StateGraph` that includes an additional helpfulness checking mechanism to ensure the agent provides complete and useful responses.

**What this code does:**

1. **`graph_with_helpfulness_check = StateGraph(AgentState)`**
   - Creates a new state graph using the same `AgentState` structure as the basic agent
   - This graph will be enhanced with helpfulness checking capabilities

2. **`graph_with_helpfulness_check.add_node("agent", call_model)`**
   - Adds the "agent" node that calls the language model
   - This is the same as the basic agent - it processes messages and decides whether to use tools

3. **`graph_with_helpfulness_check.add_node("action", tool_node)`**
   - Adds the "action" node that executes tools when needed
   - This handles the actual tool execution (search, arxiv, etc.)

**Key Difference from Basic Agent:**
This enhanced graph will later include additional logic to:
- Check if the agent's response is helpful enough
- Add loop limits to prevent infinite cycles
- Provide more intelligent stopping conditions

The graph structure starts the same as the basic agent but will be enhanced with conditional edges that evaluate response quality before deciding whether to continue or stop.

In [28]:
graph_with_helpfulness_check = StateGraph(AgentState)

graph_with_helpfulness_check.add_node("agent", call_model)
graph_with_helpfulness_check.add_node("action", tool_node)

<langgraph.graph.state.StateGraph at 0x11c102fd0>

##### YOUR MARKDOWN HERE
### Setting the Entry Point for the Enhanced Graph

This line establishes where the graph execution begins when a new conversation or query is processed.

**What this code does:**

- **`set_entry_point("agent")`** - Designates the "agent" node as the starting point for all graph executions
- When a user sends a message, the graph will always begin by routing to the "agent" node first
- The agent node will then decide the next steps based on the input and current state

**Why start with the agent node?**

1. **Natural Flow**: The agent (LLM) is the "brain" that analyzes the user's request
2. **Decision Making**: The agent determines whether tools are needed or if it can respond directly
3. **State Management**: The agent processes the conversation history and maintains context
4. **Routing Logic**: Based on the agent's analysis, the graph will route to either:
   - The "action" node (if tools are needed)
   - The helpfulness check (if a response is ready)
   - The end (if the response is complete and helpful)

**Graph Flow:**

In [29]:
graph_with_helpfulness_check.set_entry_point("agent")

<langgraph.graph.state.StateGraph at 0x11c102fd0>

##### YOUR MARKDOWN HERE
### Conditional Edge Function: Tool Call or Helpfulness Check


This function is the routing function for the graph's conditional edge, not a node itself. After the agent node runs, it inspects the current state and returns a label that determines the next step.

**Function Purpose:**
The `tool_call_or_helpful` function implements intelligent routing logic that decides whether to:
1. Execute tools (if needed)
2. Check response helpfulness (if response is ready)
3. End the conversation (if response is helpful or max iterations reached)

**Step-by-Step Logic:**

1. **Tool Call Detection:**
   ```python
   if last_message.tool_calls:
       return "action"
   ```
   - If the last message contains tool calls, route to the "action" node
   - This ensures tools are executed when the agent decides they're needed

2. **Loop Limit Protection:**
   ```python
   if len(state["messages"]) > 10:
       return "end"
   ```
   - Prevents infinite loops by ending the run once the state holds more than 10 messages
   - The returned label must match a key in the conditional-edge mapping, so it is the lowercase `"end"` rather than `"END"`

3. **Helpfulness Evaluation:**
   ```python
   initial_query = state["messages"][0]      # Original user question
   final_response = state["messages"][-1]    # Most recent response
   ```
   - **Extracts the original user query** (first message in the conversation)
   - **Extracts the last response** (most recent message, which is the agent's current response)
   - Uses a separate LLM to evaluate if this last response adequately addresses the original query

4. **Response Processing:**
   ```python
   if "Y" in helpfulness_response:
       return "end"  # Last response is helpful, end conversation
   else:
       return "continue"  # Last response needs improvement, try again
   ```
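
One caveat on the substring test in step 4: `"Y" in helpfulness_response` is true for any reply that contains a capital Y, including a reply such as "N. You could add detail". The sketch below shows a stricter check. `parse_verdict` is a hypothetical helper, not part of the notebook, and it assumes the grader puts its `Y` or `N` verdict first in the reply.

```python
def parse_verdict(raw: str) -> bool:
    """Return True only when the grader's reply starts with 'Y'.

    Hypothetical helper: assumes the verdict letter is the first
    non-whitespace, non-quote character of the reply.
    """
    cleaned = raw.strip().strip("'\"").upper()
    return cleaned.startswith("Y")


# A reply that merely contains a capital Y is not treated as helpful.
print(parse_verdict("Y"))                        # True
print(parse_verdict("N. You could add detail"))  # False
print("Y" in "N. You could add detail")          # True (the substring test)
```

Whether this stricter parsing is worth it depends on how reliably the grading model follows the single-letter instruction.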

**Key Insight:**
The function compares the **original user question** against the **most recent agent response** to determine if the agent has provided a satisfactory answer. If not, it routes back to the agent to try generating a better response.
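
To make the order of the checks concrete, here is a minimal, dependency-free sketch of the same routing decision. `FakeMessage`, `route`, and the injected `is_helpful` callback are illustrative names (assumptions, not notebook code). The callback stands in for the LLM grader so the logic can be exercised without an API call. The loop-limit branch returns the lowercase `"end"` so that it matches the `"end"` key used in the conditional-edge mapping.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a LangChain message."""
    content: str
    tool_calls: List[dict] = field(default_factory=list)


def route(messages: List[FakeMessage],
          is_helpful: Callable[[str, str], bool],
          max_messages: int = 10) -> str:
    """Return the routing label: 'action', 'end', or 'continue'."""
    last = messages[-1]
    if last.tool_calls:                # 1. the agent asked for a tool
        return "action"
    if len(messages) > max_messages:   # 2. loop limit reached
        return "end"
    # 3. compare the first message (the query) with the latest response
    if is_helpful(messages[0].content, last.content):
        return "end"
    return "continue"


query = FakeMessage("What is RAG?")
always_no = lambda q, a: False  # grader stub that never approves

print(route([query, FakeMessage("", [{"name": "search"}])], always_no))  # action
print(route([query, FakeMessage("Short answer")], always_no))            # continue
print(route([query] * 11, always_no))                                    # end
```

Because the grader is passed in as a function, the loop limit can be tested deterministically: with a grader that never approves, only the message count stops the cycle.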

**Graph Flow:**

In [30]:
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

def tool_call_or_helpful(state):
  last_message = state["messages"][-1]

  if last_message.tool_calls:
    return "action"

  initial_query = state["messages"][0]
  final_response = state["messages"][-1]

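  # Loop limit: stop once the state holds more than 10 messages.
  # The label must match a key in the conditional-edge mapping ("end", not "END").
  if len(state["messages"]) > 10:
    return "end"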

  prompt_template = """\
  Given an initial query and a final response, determine if the final response is extremely helpful or not. Please indicate helpfulness with a 'Y' and unhelpfulness as an 'N'.

  Initial Query:
  {initial_query}

  Final Response:
  {final_response}"""

  helpfulness_prompt_template = PromptTemplate.from_template(prompt_template)

  helpfulness_check_model = ChatOpenAI(model="gpt-4.1-mini")

  helpfulness_chain = helpfulness_prompt_template | helpfulness_check_model | StrOutputParser()

  helpfulness_response = helpfulness_chain.invoke({"initial_query" : initial_query.content, "final_response" : final_response.content})

  if "Y" in helpfulness_response:
    return "end"
  else:
    return "continue"

##### YOUR MARKDOWN HERE
### Adding Conditional Edges to the Enhanced Graph

This code creates the intelligent routing system that determines the flow of the conversation based on the agent's responses and tool usage decisions.

**What this code does:**

- **`add_conditional_edges("agent", tool_call_or_helpful, {...})`** - Creates conditional routing from the "agent" node
- The `tool_call_or_helpful` function acts as the decision-maker
- The dictionary maps possible return values to destination nodes

**Routing Logic:**

1. **`"continue" : "agent"`**
   - When the helpfulness check determines the response needs improvement
   - Routes back to the agent node to generate a better response
   - Creates a feedback loop for response refinement

2. **`"action" : "action"`**
   - When the agent decides to use tools (tool_calls detected)
   - Routes to the action node to execute the required tools
   - Enables external data gathering and processing

3. **`"end" : END`**
   - When the response is deemed helpful and complete, or the message limit has been reached
   - Terminates the conversation flow
   - Prevents unnecessary iterations
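
The mapping behaves like a plain dictionary lookup from routing label to destination, which also means labels are case-sensitive. The sketch below illustrates the idea in pure Python. It is not LangGraph's internal code: `resolve` is a hypothetical helper, and the `END` value here is a placeholder sentinel for this sketch only.

```python
END = "__end__"  # placeholder sentinel for this sketch only

path_map = {"continue": "agent", "action": "action", "end": END}


def resolve(label: str) -> str:
    """Look up the destination for a routing label, failing loudly if unmapped."""
    if label not in path_map:
        raise KeyError(f"routing label {label!r} has no destination in path_map")
    return path_map[label]


print(resolve("continue"))  # agent
print(resolve("end"))       # __end__
try:
    resolve("END")          # labels are case-sensitive
except KeyError as err:
    print("unmapped:", err)
```

Every label the routing function can return needs a matching key, so it is worth checking each `return` statement against the dictionary.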

**Graph Flow Visualization:**

In [31]:
graph_with_helpfulness_check.add_conditional_edges(
    "agent",
    tool_call_or_helpful,
    {
        "continue" : "agent",
        "action" : "action",
        "end" : END
    }
)

<langgraph.graph.state.StateGraph at 0x11c102fd0>

##### YOUR MARKDOWN HERE
### Adding the Action-to-Agent Edge

This code creates a direct connection from the action node back to the agent node, completing the tool execution cycle.

**What this code does:**

- **`add_edge("action", "agent")`** - Creates a direct, unconditional edge from "action" to "agent"
- After tools are executed, the results always flow back to the agent for processing
- This ensures the agent can analyze tool outputs and decide the next steps

**Why this edge is essential:**

1. **Tool Result Processing**: After executing tools (search, arxiv, etc.), the results need to be analyzed
2. **Context Integration**: The agent incorporates tool outputs into its understanding
3. **Next Decision Point**: The agent decides whether to:
   - Use the tool results to answer the user
   - Execute additional tools if needed
   - End the conversation if satisfied
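
The cycle this edge closes can be traced with a small stand-in loop. Everything below is illustrative. The `agent` and `action` functions are scripted stubs (assumptions, not the notebook's `call_model` and `tool_node`), and `run` is a hand-written loop rather than LangGraph's executor. The helpfulness check is left out to keep the focus on the action-to-agent edge.

```python
from typing import Dict, List

State = Dict[str, List[dict]]


def agent(state: State) -> dict:
    """Scripted stand-in for the LLM: request a tool once, then answer."""
    has_tool_result = any(m["role"] == "tool" for m in state["messages"])
    if has_tool_result:
        return {"role": "ai", "content": "Answer based on tool output", "tool_calls": []}
    return {"role": "ai", "content": "", "tool_calls": ["search"]}


def action(state: State) -> dict:
    """Stand-in for the tool node: returns a fake tool result."""
    return {"role": "tool", "content": "search results"}


def run(question: str) -> List[str]:
    """Walk the agent/action cycle and return the order of visited nodes."""
    state: State = {"messages": [{"role": "human", "content": question}]}
    visited, node = [], "agent"
    while node != "end":
        visited.append(node)
        if node == "agent":
            msg = agent(state)
            state["messages"].append(msg)
            node = "action" if msg["tool_calls"] else "end"
        else:
            state["messages"].append(action(state))
            node = "agent"  # unconditional edge: action -> agent
    return visited


print(run("What are Deep Research Agents?"))  # ['agent', 'action', 'agent']
```

The second visit to `agent` is where the tool output gets read. Without the unconditional edge, the run would stop after the tool call with no answer.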

**Complete Graph Flow:**

In [32]:
graph_with_helpfulness_check.add_edge("action", "agent")

<langgraph.graph.state.StateGraph at 0x11c102fd0>

##### YOUR MARKDOWN HERE
### Compiling the Enhanced Graph into an Executable Agent

This line transforms the graph definition into a runnable agent that can process user inputs and execute the complete conversation flow.

**What this code does:**

- **`graph_with_helpfulness_check.compile()`** - Converts the graph structure into an executable agent
- **`agent_with_helpfulness_check =`** - Assigns the compiled agent to a variable for use
- The agent is now ready to process inputs and execute the full conversation logic

**What happens during compilation:**

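1. **Graph Validation**: Checks that an entry point is set and that every edge refers to a node that exists
2. **State Management**: Wires the `AgentState` schema into the runtime so each node's output is merged into the shared state
3. **Execution Engine**: Produces a runnable object that exposes `invoke()`, `stream()`, and their async variants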
4. **Error Handling**: Prepares the agent for robust execution

**Key Capabilities of the Compiled Agent:**

- **Streaming Support**: Can process inputs with `astream()` for real-time updates
- **State Handling**: Carries the accumulated messages through every node within a single run (keeping context across separate runs requires compiling with a checkpointer)
- **Tool Integration**: Seamlessly executes external tools when needed
- **Helpfulness Checking**: Automatically evaluates response quality
- **Loop Protection**: Prevents infinite conversations with built-in limits

**Usage:**
```python
# The agent is now ready to use
inputs = {"messages": [HumanMessage(content="Your question here")]}
async for chunk in agent_with_helpfulness_check.astream(inputs, stream_mode="updates"):
    print(chunk)  # each chunk maps a node name to that node's state update
```

**From Graph to Agent:**
- **Before Compilation**: Graph definition (nodes, edges, conditions)
- **After Compilation**: Executable agent with full conversation logic
- **Result**: A sophisticated AI agent that can reason, use tools, and self-evaluate

This compiled agent represents the complete implementation of an intelligent conversational AI with built-in quality control and tool usage capabilities.

In [33]:
agent_with_helpfulness_check = graph_with_helpfulness_check.compile()

##### YOUR MARKDOWN HERE
### Testing the Enhanced Agent with Helpfulness Check

This code demonstrates the enhanced agent in action, showing how it processes a query with built-in helpfulness evaluation and tool usage capabilities.

**What this code does:**

1. **Input Preparation:**
   ```python
   inputs = {"messages" : [HumanMessage(content="What are Deep Research Agents?")]}
   ```
   - Creates a conversation input with a specific question about Deep Research Agents
   - Wraps the question in a `HumanMessage` object for proper state management

2. **Streaming Execution:**
   ```python
   async for chunk in agent_with_helpfulness_check.astream(inputs, stream_mode="updates"):
   ```
   - Uses the compiled enhanced agent with helpfulness checking
   - Streams updates in real-time as the agent processes the request
   - `stream_mode="updates"` provides updates when each node completes

3. **Real-time Monitoring:**
   ```python
   for node, values in chunk.items():
       print(f"Receiving update from node: '{node}'")
       print(values["messages"])
       print("\n\n")
   ```
   - Displays which node is currently executing
   - Shows the messages being processed at each step
   - Provides visibility into the agent's decision-making process

**Expected Flow:**
1. **Agent Node**: Analyzes the question about Deep Research Agents
2. **Decision Point**: Determines if tools are needed or can answer directly
3. **Tool Execution** (if needed): Searches for relevant information
4. **Helpfulness Check**: Evaluates if the response adequately answers the question
5. **Final Response**: Either provides the answer or continues refining

**Key Benefits Demonstrated:**
- **Transparency**: You can see exactly what the agent is doing at each step
- **Quality Control**: The agent evaluates its own responses for helpfulness
- **Tool Integration**: Seamlessly uses external tools when needed
- **Iterative Improvement**: Can refine responses if not initially helpful enough

This execution showcases the sophisticated conversation flow of the enhanced agent with built-in quality assurance.
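
For readers who want to see the shape of the streamed chunks without calling the model, the sketch below mimics `stream_mode="updates"` with a plain async generator. `fake_astream` and `collect_node_order` are hypothetical names, and the yielded messages are placeholder strings rather than real `AIMessage` objects.

```python
import asyncio
from typing import AsyncIterator, Dict, List


async def fake_astream() -> AsyncIterator[Dict[str, dict]]:
    """Hypothetical stand-in that yields one {node_name: update} dict per step."""
    yield {"agent": {"messages": ["AI message requesting a tool"]}}
    yield {"action": {"messages": ["Tool result"]}}
    yield {"agent": {"messages": ["Final answer"]}}


async def collect_node_order() -> List[str]:
    """Consume the stream the same way the notebook cell does."""
    order = []
    async for chunk in fake_astream():
        for node, values in chunk.items():
            order.append(node)
            print(f"update from {node!r}: {values['messages']}")
    return order


node_order = asyncio.run(collect_node_order())
print(node_order)  # ['agent', 'action', 'agent']
```

Inside a notebook, where an event loop is already running, `await collect_node_order()` would replace the `asyncio.run(...)` call.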

In [34]:
inputs = {"messages" : [HumanMessage(content="What are Deep Research Agents?")]}

async for chunk in agent_with_helpfulness_check.astream(inputs, stream_mode="updates"):
    for node, values in chunk.items():
        print(f"Receiving update from node: '{node}'")
        print(values["messages"])
        print("\n\n")

Receiving update from node: 'agent'
[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_pCxqnm2cGFrWRQLXYOtJfOQt', 'function': {'arguments': '{"query": "Deep Research Agents"}', 'name': 'tavily_search_results_json'}, 'type': 'function'}, {'id': 'call_54Nbi9UchSrJozMymUalyGnc', 'function': {'arguments': '{"query": "Deep Research Agents"}', 'name': 'duckduckgo_search'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 55, 'prompt_tokens': 208, 'total_tokens': 263, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-nano-2025-04-14', 'system_fingerprint': 'fp_7c233bf9d1', 'id': 'chatcmpl-CLHBUJRSljzrnRnpqxwuj4vSMBcnK', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run--b206e62e-0b58-4ec8-b0b1-68cf3d2207df-0', tool_ca

## Part 3: LangGraph for the "Patterns" of GenAI

### Task 4: Helpfulness Check of Gen AI Pattern Descriptions

Let's ask our system about the 3 main patterns in Generative AI:

1. Context Engineering
2. Fine-tuning
3. Agents

In [35]:
patterns = ["Context Engineering", "Fine-tuning", "LLM-based agents"]

In [36]:
for pattern in patterns:
  what_is_string = f"What is {pattern} and when did it break onto the scene??"
  inputs = {"messages" : [HumanMessage(content=what_is_string)]}
  messages = agent_with_helpfulness_check.invoke(inputs)
  print(messages["messages"][-1].content)
  print("\n\n")

Context Engineering is a discipline focused on providing AI systems, such as large language models (LLMs), with the necessary information and tools to successfully complete specific tasks. It involves building dynamic systems that supply the right data in the appropriate format at the right time, enabling AI to perform reliably and effectively. 

The concept has gained prominence around mid-2025, with references to it breaking onto the scene in June and July of that year. It is considered a core skill for developing powerful AI applications, transforming AI from simple chatbots into more capable and context-aware tools.



Fine-tuning is a machine learning technique used to adapt a pre-trained model to a specific task or dataset. Instead of training a model from scratch, which can be resource-intensive and time-consuming, fine-tuning involves taking an existing model that has already learned general features from a large dataset and then further training it on a smaller, task-specific 