<h1>Using LangGraph with an Amazon SageMaker LLM Endpoint</h1>

<h2>Overview</h2>

<p>
    <a href="https://python.langchain.com/docs/langgraph">LangGraph</a> is a library for building stateful, multi-actor applications with LLMs, built on top of (and intended to be used with) <a href="https://github.com/langchain-ai/langchain">LangChain</a>. The LangGraph repository contains a number of <a href="https://github.com/langchain-ai/langgraph/tree/main/examples">useful examples</a> of how to use it, and they are a great way to get started.
</p>
<p>
    <a href="https://aws.amazon.com/pm/sagemaker/">Amazon SageMaker</a> has great support for Large Language Models, and <b>its</b> <a href="https://github.com/aws-samples/amazon-sagemaker-generativeai/">examples repository for generative AI</a> is again a great place for finding out how to use it.
</p>
<p>
    This Notebook merges the two, demonstrating how a <b>simple</b> LangGraph Agent, based on the LangGraph "Agent Executor" example, can be created to run with an LLM deployed in a SageMaker Endpoint. The model used could be replaced with any model deployed in SageMaker, but for this Notebook we are using <a href="https://mistral.ai/news/announcing-mistral-7b/">Mistral 7B</a> due to its high performance and low footprint.
</p>

<h2>Setup / Prerequisites</h2>

<h3>LLM</h3>
<p>
    It is always a good idea to separate the <b>creation</b> of your LLM from the <b>use</b> of it. As this Notebook is about <b>using</b> Mistral 7B with LangGraph, the <b>creation</b> of the SageMaker Endpoint for Mistral 7B is not done here. For details on how to set up Mistral 7B Instruct using SageMaker JumpStart, please refer to the <a href="https://docs.aws.amazon.com/sagemaker/latest/dg/studio-jumpstart.html">Developer Guide</a>, but it boils down to:
</p>
<ol>
    <li>Access SageMaker Studio through the AWS Console</li>
    <li>Inside Studio, navigate to "Classic"</li>
    <li>Inside Studio Classic, navigate to JumpStart</li>
    <li>Search for "Mistral 7B Instruct"</li>
    <li>Click "Deploy"</li>
</ol>
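<p>
    Once the deployment has been started, it can be handy to confirm from code that the endpoint is ready before pointing the Notebook at it. The following is a minimal sketch and not part of the original Notebook: <code>endpoint_is_in_service</code> and <code>FakeSageMakerClient</code> are hypothetical names, and the sketch assumes a boto3 SageMaker client whose <code>describe_endpoint</code> response reports an <code>EndpointStatus</code> of <code>InService</code> for a usable endpoint.
</p>

```python
def endpoint_is_in_service(sagemaker_client, endpoint_name: str) -> bool:
    """Return True when the named endpoint reports the status "InService"."""
    description = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    return description["EndpointStatus"] == "InService"


class FakeSageMakerClient:
    """Stand-in used only to show the call shape without touching AWS."""

    def __init__(self, status: str):
        self.status = status

    def describe_endpoint(self, EndpointName: str) -> dict:
        return {"EndpointName": EndpointName, "EndpointStatus": self.status}


# In a real session you would pass boto3.client("sagemaker") instead of the fake client.
print(endpoint_is_in_service(FakeSageMakerClient("InService"), "my-endpoint"))  # True
print(endpoint_is_in_service(FakeSageMakerClient("Creating"), "my-endpoint"))   # False
```

<p>
    Passing the client in as a parameter keeps the helper easy to try out without AWS credentials.
</p>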

<h3>Tavily</h3>
<p>
    Just like the example in the LangGraph examples repository, this Notebook uses <a href="https://tavily.com/">Tavily</a> for web search. This is easy to do, as the integration is already built into LangChain. To use Tavily you need an API key; for small or temporary use like this, you can <a href="https://app.tavily.com/sign-in">get a free one</a>. You will need to enter it into the Notebook when prompted. Please see the <b>README</b> in the main repository folder for details on using third-party services like Tavily.
</p>

<h2>Installing and updating to latest versions of SageMaker and LangChain/LangGraph</h2>

In [20]:
%pip install --upgrade pip --root-user-action=ignore --quiet #One should always get latest of pip, just in case...
%pip install sagemaker boto3 huggingface_hub --upgrade --root-user-action=ignore --quiet #These are used to interface with Amazon SageMaker
%pip install -U langchain langgraph langchainhub --root-user-action=ignore --quiet #And this is the LangGraph bit

Note: you may need to restart the kernel to use updated packages.
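<p>
    After restarting the kernel, a quick check of which versions actually got installed can save some debugging later. This is a minimal sketch using only the standard library; <code>installed_versions</code> is a hypothetical helper name, and the package list is simply the one from the install cell above.
</p>

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print(installed_versions(["sagemaker", "boto3", "langchain", "langgraph", "langchainhub"]))
```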


<h2>Some sample questions that we want answered</h2>

<p>
Throughout this Notebook, we'll use a set of questions to test our LangGraph setup. The idea is simple: we first put the questions to an unaided version of our LLM (maybe it is good enough to answer them purely from the knowledge it picked up during training?). We then run the same questions against the setup where the model is managed by LangGraph and has access to a tool for searching the web.
</p>
<p>
    <b>DISCLAIMER:</b> These questions were picked arbitrarily, purely because they are illustrative for this Notebook.
</p>

In [3]:
questions_and_answers = [
    {
        "question": "What is the recipe of mayonnaise?",
        "answers": {}
    },
    {
        "question": "What is the name of the latest storm to hit the UK, and when and where did it cause the most damage?",
        "answers": {}
    }
    ]

<h2>Setting up and testing access to Mistral 7B Instruct model running as a SageMaker endpoint</h2>

In [4]:
import json
import boto3

# Name of a SageMaker Inference Endpoint in an account that this notebook has access to.
# If you are running this in SageMaker Studio, access is implicit as long as the endpoint
# is in the same Domain and permissioned with defaults.
endpoint_name = input("SageMaker Endpoint Name:")

SageMaker Endpoint Name: jumpstart-dft-hf-llm-mistral-7b-instruct


<h3>Some helper functions for asking the LLM a simple question using plain boto3 APIs <b>without</b> using LangChain</h3>
<p>
    These are not required for the LangGraph implementation and are <b>very</b> boilerplate.
</p>

In [5]:
from typing import Dict, List

# Runtime client used to invoke SageMaker endpoints
client = boto3.client("sagemaker-runtime")


def query_endpoint(payload):
    """Send a JSON payload to the endpoint and return the parsed JSON response."""
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/json", Body=json.dumps(payload).encode("utf-8")
    )
    return json.loads(response["Body"].read().decode("utf-8"))


def format_instructions(instructions: List[Dict[str, str]]) -> str:
    """Format instructions where conversation roles must alternate user/assistant/user/assistant/..."""
    prompt: List[str] = []
    # Completed user/assistant exchanges come first, each wrapped in <s> ... </s>
    for user, answer in zip(instructions[::2], instructions[1::2]):
        prompt.extend(["<s>", "[INST] ", (user["content"]).strip(), " [/INST] ", (answer["content"]).strip(), "</s>"])
    # The final user message is left open for the model to answer
    prompt.extend(["<s>", "[INST] ", (instructions[-1]["content"]).strip(), " [/INST] "])
    return "".join(prompt)


def print_instructions(prompt: str, response: List[Dict[str, str]]) -> None:
    """Print the prompt and the generated text, with bold section titles."""
    bold, unbold = '\033[1m', '\033[0m'
    print(f"{bold}> Input{unbold}\n{prompt}\n\n{bold}> Output{unbold}\n{response[0]['generated_text']}\n")


def ask_question(question):
    """Ask the endpoint a single question and return the generated text."""
    instructions = [{"role": "user", "content": question}]

    prompt = format_instructions(instructions)
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": 500, "do_sample": True, "temperature": 0.001}
    }

    response = query_endpoint(payload)
    print_instructions(prompt, response)
    return response[0]['generated_text']
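
<p>
    The helper above only ever sends a single user message, but the same prompt layout also covers a conversation with earlier turns. The sketch below restates that layout in a standalone function so it can be run without an endpoint; <code>build_mistral_prompt</code> is a hypothetical name, and the conversation content is made up purely for illustration.
</p>

```python
from typing import Dict, List


def build_mistral_prompt(turns: List[Dict[str, str]]) -> str:
    """Join alternating user/assistant turns into one instruct prompt string."""
    parts: List[str] = []
    # Pair each user message with the assistant reply that followed it
    for user, answer in zip(turns[::2], turns[1::2]):
        parts.append(f"<s>[INST] {user['content'].strip()} [/INST] {answer['content'].strip()}</s>")
    # The last message must come from the user and is left open for the model
    parts.append(f"<s>[INST] {turns[-1]['content'].strip()} [/INST] ")
    return "".join(parts)


conversation = [
    {"role": "user", "content": "What is mayonnaise?"},
    {"role": "assistant", "content": "An emulsion of egg yolk and oil."},
    {"role": "user", "content": "How long does it keep?"},
]
# repr() makes the trailing space after the final [/INST] visible
print(repr(build_mistral_prompt(conversation)))
```

<p>
    The completed exchange is closed with <code>&lt;/s&gt;</code>, while the final question ends in an open <code>[/INST]</code> so that the model continues from there.
</p>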
    

<h3>Asking our questions to the LLM without LangChain or LangGraph</h3>
<p>
    This is where we test out our SageMaker endpoint <b>without</b> using LangGraph, to give us that baseline.
</p>

In [6]:
for question in questions_and_answers:
    question["answers"]["llm"] = ask_question(question["question"])

> Input
<s>[INST] What is the recipe of mayonnaise? [/INST] 

> Output
1. Gather all ingredients: 
- 2 large egg yolks
- 1 tablespoon Dijon mustard
- 1 tablespoon white wine vinegar
- 1 tablespoon lemon juice
- 1/2 teaspoon salt
- 1/4 teaspoon sugar
- 1 cup vegetable oil
- 1/4 cup olive oil
- 1 tablespoon chopped fresh herbs (optional)

2. In a medium bowl, whisk together egg yolks, mustard, vinegar, lemon juice, salt, and sugar until well combined.

3. Slowly drizzle in the vegetable oil and olive oil, whisking constantly, until the mixture thickens and emulsifies.

4. Taste and adjust seasoning as needed. Stir in chopped fresh herbs, if using.

5. Cover the bowl with plastic wrap and refrigerate for at least 30 minutes before serving to allow the flavors to develop.

6. Serve chilled and enjoy your homemade mayonnaise!

> Input
<s>[INST] What is the name of the latest storm to hit the UK, and when and where did it cause the most damage? [/INST] 

> Output

<h2>Setting up and Testing LangChain access to the SageMaker Endpoint</h2>

<h3>Hooking up the LLM to LangChain, and setting Content Handlers</h3>

In [7]:
from langchain.prompts import PromptTemplate
from langchain_community.llms import SagemakerEndpoint
from langchain_community.llms.sagemaker_endpoint import LLMContentHandler

class ContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: Dict) -> bytes:
        #print("transforming_input") #for debugging
        #print(prompt) #for debugging
        input_str = json.dumps({"inputs": prompt, "parameters": model_kwargs})
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        #print("transforming_output") #for debugging
        decoded_output = output.read().decode("utf-8")
        #print("decoded: " +  decoded_output)
        response_json = json.loads(decoded_output)
        response = response_json[0]["generated_text"]
        #print("response: " + response) #for debugging
        return response

content_handler = ContentHandler()

llm = SagemakerEndpoint(
    endpoint_name=endpoint_name,
    region_name="us-east-1",  # change this if your endpoint lives in a different region
    # Extending max_new_tokens is VITAL: otherwise the response is cut short, which breaks
    # the agent by not giving it access to the LLM's full answer. The value was picked empirically.
    model_kwargs={"max_new_tokens": 500, "do_sample": True, "temperature": 0.001},
    content_handler=content_handler
)
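
<p>
    To see what the content handler does without calling the endpoint, the two transformations can be tried on hand-written data. This is a minimal standalone sketch: <code>to_request_body</code> and <code>from_response_body</code> are hypothetical names that mirror <code>transform_input</code> and <code>transform_output</code>, and the response is a made-up example in the shape the handler above expects (a list with one object holding <code>generated_text</code>).
</p>

```python
import io
import json


def to_request_body(prompt: str, model_kwargs: dict) -> bytes:
    """Mirror of transform_input: wrap the prompt and parameters in a JSON body."""
    return json.dumps({"inputs": prompt, "parameters": model_kwargs}).encode("utf-8")


def from_response_body(body) -> str:
    """Mirror of transform_output: read the stream and pick out the generated text."""
    return json.loads(body.read().decode("utf-8"))[0]["generated_text"]


request = to_request_body("Hello", {"max_new_tokens": 500})
print(request)

# io.BytesIO stands in for the streaming body returned by the endpoint
fake_response = io.BytesIO(json.dumps([{"generated_text": "Hi there"}]).encode("utf-8"))
print(from_response_body(fake_response))  # Hi there
```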

<h3>Asking our questions to the LLM with LangChain but still without LangGraph</h3>
<p>
    This is where we test our Lang<b>Chain</b> setup, to see whether it makes any difference compared with the baseline (it shouldn't, apart from the random differences that come from the non-deterministic nature of the LLM...).
</p>

In [10]:
for question in questions_and_answers:
    print("Question: " + question["question"])
    answer = llm.invoke(question["question"])
    print("Answer: " + answer)
    question["answers"]["langchain"] = answer

Question: What is the recipe of mayonnaise?
Answer: 

Ingredients:
- 2 large egg yolks
- 1 tablespoon Dijon mustard
- 2 tablespoons white wine vinegar
- 1 tablespoon lemon juice
- 1/2 teaspoon salt
- 1/4 teaspoon sugar
- 1 cup vegetable oil or canola oil

Instructions:
1. In a medium bowl, whisk together the egg yolks, mustard, vinegar, lemon juice, salt, and sugar until well combined.
2. Slowly drizzle in the oil, whisking constantly, until the mixture thickens and emulsifies.
3. Taste and adjust seasoning as needed.
4. Cover the bowl with plastic wrap and refrigerate for at least 30 minutes before serving to allow the flavors to meld.
5. Mayonnaise can be stored in an airtight container in the refrigerator for up to 7 days.
Question: What is the name of the latest storm to hit the UK, and when and where did it cause the most damage?
Answer: 

The latest storm to hit the UK is Storm Dennis, which hit the UK on February 14th and 15th, 2020. The storm caused the most damage in the South

<h2>Setting up LangGraph</h2>
<p>
    For this first example, we will replicate the basics of the high-level "Agent Executor" example in the <a href="https://github.com/langchain-ai/langgraph/blob/main/examples/agent_executor/high-level.ipynb">LangGraph GitHub repository</a>.
</p>

In [14]:
import os
import getpass

os.environ["TAVILY_API_KEY"] = getpass.getpass("Tavily API Key:")

Tavily API Key: ········


<h2>Create the LangChain agent and Define the Graph</h2>

In [15]:
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain import hub
from langchain.agents import AgentExecutor, create_xml_agent
from langgraph.prebuilt import create_agent_executor

tools = [TavilySearchResults(max_results=1)]

# Get the prompt to use - you can modify this!
prompt = hub.pull("hwchase17/xml-agent-convo")
print(prompt)

# Construct the XML agent using the LLM that we defined earlier
agent_runnable = create_xml_agent(llm, tools, prompt)
app = create_agent_executor(agent_runnable, tools)

input_variables=['agent_scratchpad', 'input', 'tools'] partial_variables={'chat_history': ''} messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['agent_scratchpad', 'chat_history', 'input', 'tools'], template="You are a helpful assistant. Help the user answer any questions.\n\nYou have access to the following tools:\n\n{tools}\n\nIn order to use a tool, you can use <tool></tool> and <tool_input></tool_input> tags. You will then get back a response in the form <observation></observation>\nFor example, if you have a tool called 'search' that could run a google search, in order to search for the weather in SF you would respond:\n\n<tool>search</tool><tool_input>weather in SF</tool_input>\n<observation>64 degrees</observation>\n\nWhen you are done, respond with a final answer between <final_answer></final_answer>. For example:\n\n<final_answer>The weather in SF is 64 degrees</final_answer>\n\nBegin!\n\nPrevious Conversation:\n{chat_history}\n\nQuestion: {input}\n{a
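
<p>
    The prompt above asks the model to answer either with a <code>&lt;tool&gt;</code>/<code>&lt;tool_input&gt;</code> pair or with a <code>&lt;final_answer&gt;</code>. To make that protocol concrete, here is an illustrative parser for such output. It is a sketch under stated assumptions, not LangChain's own implementation: <code>parse_xml_step</code> is a hypothetical name, and it tolerates a missing closing tag because the agent trace in this Notebook shows tool calls without a closing <code>&lt;/tool_input&gt;</code>.
</p>

```python
import re
from typing import Optional, Tuple


def parse_xml_step(text: str) -> Tuple[str, Optional[str], str]:
    """Classify model output as ("finish", None, answer) or ("tool", name, tool_input)."""
    final = re.search(r"<final_answer>(.*?)(?:</final_answer>|$)", text, re.DOTALL)
    if final:
        return ("finish", None, final.group(1).strip())
    tool = re.search(r"<tool>(.*?)</tool>", text, re.DOTALL)
    if tool is None:
        raise ValueError("no <tool> or <final_answer> tag found")
    # Accept tool input that runs to the end of the text without a closing tag
    tool_input = re.search(r"<tool_input>(.*?)(?:</tool_input>|$)", text, re.DOTALL)
    return ("tool", tool.group(1).strip(), tool_input.group(1).strip() if tool_input else "")


print(parse_xml_step("\n<tool>tavily_search_results_json</tool><tool_input>recipe of mayonnaise"))
print(parse_xml_step("<final_answer>The weather in SF is 64 degrees</final_answer>"))
```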

<h2>Ask the Agent our questions, and trace the calls through the Graph</h2>

In [17]:
from langchain_core.agents import AgentFinish

def ask_langgraph_question(question):
    inputs = {"input": question, "chat_history": []}
    print("Asking question: " + question + "\n")

    stepIndex = 1
    agentFinish = None
    # Print every step that the graph streams back while it runs
    for s in app.stream(inputs):
        print("Step " + str(stepIndex) + ":")
        agentValues = list(s.values())[0]
        print(agentValues)
        print("----")
        stepIndex = stepIndex + 1
        # Remember the AgentFinish so that its output can be returned once the stream ends
        if 'agent_outcome' in agentValues and isinstance(agentValues['agent_outcome'], AgentFinish):
            agentFinish = agentValues['agent_outcome']

    print("Final Outcome:\n")
    print(agentFinish)
    print("----\n")
    # Guard against runs that end without an AgentFinish
    if agentFinish is None:
        return None
    return agentFinish.return_values["output"]
    
    
for question in questions_and_answers:
    question["answers"]["langgraph"] = ask_langgraph_question(question["question"])

Asking question: What is the recipe of mayonnaise?

Step 1:
{'agent_outcome': AgentAction(tool='tavily_search_results_json', tool_input='recipe of mayonnaise', log='\n<tool>tavily_search_results_json</tool><tool_input>recipe of mayonnaise')}
----
Step 2:
{'intermediate_steps': [(AgentAction(tool='tavily_search_results_json', tool_input='recipe of mayonnaise', log='\n<tool>tavily_search_results_json</tool><tool_input>recipe of mayonnaise'), '[{\'url\': \'https://downshiftology.com/recipes/how-to-make-homemade-mayonnaise/\', \'content\': "Homemade Mayonnaise Ingredients  Homemade Mayonnaise Tips  Tasty Recipes That Use Mayonnaise  How To Fix Broken MayonnaiseAug 6, 2023 — Aug 6, 2023Learn how to make mayonnaise at home with this easy, foolproof recipe! It\'s so fresh, creamy, and better than store-bought."}]')]}
----
Step 3:
{'agent_outcome': AgentFinish(return_values={'output': 'Here is a recipe for homemade mayonnaise: <a href="https://downshiftology.com/recipes/how-to-make-homemade-ma
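
<p>
    The trace above alternates between an agent step (which decides what to do), a tool step (which runs the search), and a final agent step that produces the answer. The following plain-Python sketch imitates that cycle to make the control flow explicit. It is a conceptual illustration only and does not use LangGraph: <code>run_agent_loop</code>, <code>scripted_agent</code> and <code>fake_tools</code> are hypothetical names, and the "agent" is a scripted stand-in rather than an LLM.
</p>

```python
from typing import Callable, Dict, List, Tuple


def run_agent_loop(decide: Callable[[str, List[Tuple[str, str]]], Dict[str, str]],
                   tools: Dict[str, Callable[[str], str]],
                   question: str, max_steps: int = 5) -> str:
    """Alternate between an agent step and a tool step until the agent finishes."""
    intermediate_steps: List[Tuple[str, str]] = []
    for _ in range(max_steps):
        outcome = decide(question, intermediate_steps)  # agent step
        if outcome["type"] == "finish":
            return outcome["output"]
        observation = tools[outcome["tool"]](outcome["tool_input"])  # tool step
        intermediate_steps.append((outcome["tool"], observation))
    raise RuntimeError("agent did not finish within max_steps")


def scripted_agent(question, steps):
    # Stand-in for the LLM: search once, then answer from the observation
    if not steps:
        return {"type": "tool", "tool": "search", "tool_input": question}
    return {"type": "finish", "output": "Based on the search: " + steps[-1][1]}


fake_tools = {"search": lambda query: "result for '" + query + "'"}
print(run_agent_loop(scripted_agent, fake_tools, "latest UK storm"))
# Based on the search: result for 'latest UK storm'
```

<p>
    The <code>max_steps</code> limit is a design choice of this sketch: it stops a model that keeps calling tools from looping forever.
</p>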

<h2>Compare our Approaches</h2>
<p>
We've done all the work! 
</p>
<p>
Now we just need to collect the answers in a form that lets us compare them, and hopefully see the benefits of using LangGraph over relying solely on the LLM's own knowledge!
</p>
<p>
As before, the LLM is non-deterministic, so your answers may not match mine, but for me, LangGraph:
    <ul>
        <li>gave me a link to a recipe for mayo instead of trying to list it</li>
        <li>correctly identified that the LLM's training data for the "latest" storms was outdated, instead of giving an incorrect and outdated answer</li>
    </ul>
</p>

In [18]:
import pandas as pd
from IPython.display import display, HTML

# The first row holds the column titles, followed by one row per question
display_table = [["Question", "LLM Only", "LangChain", "LangGraph"]]
for qa in questions_and_answers:
    display_table.append([qa["question"], qa["answers"]["llm"], qa["answers"]["langchain"], qa["answers"]["langgraph"]])

df = pd.DataFrame(display_table)
# Show the full answers instead of truncating long cells
pd.set_option('display.max_colwidth', None)

# Replace the escaped "\n" sequences in the generated HTML with <br> line breaks
display(HTML(df.to_html().replace("\\n","<br>")))




<table>
    <tr>
        <th>Question</th>
        <th>LLM Only</th>
        <th>LangChain</th>
        <th>LangGraph</th>
    </tr>
    <tr>
        <td>What is the recipe of mayonnaise?</td>
        <td>1. Gather all ingredients: - 2 large egg yolks - 1 tablespoon Dijon mustard - 1 tablespoon white wine vinegar - 1 tablespoon lemon juice - 1/2 teaspoon salt - 1/4 teaspoon sugar - 1 cup vegetable oil - 1/4 cup olive oil - 1 tablespoon chopped fresh herbs (optional) 2. In a medium bowl, whisk together egg yolks, mustard, vinegar, lemon juice, salt, and sugar until well combined. 3. Slowly drizzle in the vegetable oil and olive oil, whisking constantly, until the mixture thickens and emulsifies. 4. Taste and adjust seasoning as needed. Stir in chopped fresh herbs, if using. 5. Cover the bowl with plastic wrap and refrigerate for at least 30 minutes before serving to allow the flavors to develop. 6. Serve chilled and enjoy your homemade mayonnaise!</td>
        <td>Ingredients: - 2 large egg yolks - 1 tablespoon Dijon mustard - 2 tablespoons white wine vinegar - 1 tablespoon lemon juice - 1/2 teaspoon salt - 1/4 teaspoon sugar - 1 cup vegetable oil or canola oil Instructions: 1. In a medium bowl, whisk together the egg yolks, mustard, vinegar, lemon juice, salt, and sugar until well combined. 2. Slowly drizzle in the oil, whisking constantly, until the mixture thickens and emulsifies. 3. Taste and adjust seasoning as needed. 4. Cover the bowl with plastic wrap and refrigerate for at least 30 minutes before serving to allow the flavors to meld. 5. Mayonnaise can be stored in an airtight container in the refrigerator for up to 7 days.</td>
        <td>Here is a recipe for homemade mayonnaise: <a href="https://downshiftology.com/recipes/how-to-make-homemade-mayonnaise/">https://downshiftology.com/recipes/how-to-make-homemade-mayonnaise/</a></td>
    </tr>
    <tr>
        <td>What is the name of the latest storm to hit the UK, and when and where did it cause the most damage?</td>
        <td>1. Storm Arwen: This storm hit the UK on November 25th and 26th, 2021. It caused the most damage in Scotland, particularly in the north and northeast regions. The storm brought strong winds, heavy rainfall, and even snow in some areas. It caused widespread power outages, flooding, and damage to buildings and infrastructure.</td>
        <td>The latest storm to hit the UK is Storm Dennis, which hit the UK on February 14th and 15th, 2020. The storm caused the most damage in the South East of England, particularly in Kent and Sussex, where heavy rainfall and strong winds caused flooding, landslides, and damage to buildings and infrastructure.</td>
        <td>The name of the latest storm to hit the UK is Storm Henk, and it caused the most damage in the south-west of England.</td>
    </tr>
</table>
