# Persistence and Streaming

Based on [**this tutorial**](https://learn.deeplearning.ai/courses/ai-agents-in-langgraph/lesson/5/persistence-and-streaming)

# Setup

In [None]:
from dotenv import load_dotenv

In [None]:
_ = load_dotenv()

# Imports and Basic Implementation

In [1]:
import operator
from typing import Annotated, TypedDict

from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_core.messages import AnyMessage, HumanMessage, SystemMessage, ToolMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END

In [2]:
# Instantiate Search Tool
tool = TavilySearchResults(max_results=2)

In [3]:
# Define AgentState
class AgentState(TypedDict):
    messages: Annotated[list[AnyMessage], operator.add]
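The `Annotated[..., operator.add]` annotation means that a node's returned `messages` are appended to the existing list instead of replacing it. The snippet below is only a rough sketch of that idea, not LangGraph's actual implementation: `ToyState` and `apply_update` are hypothetical names, and plain strings stand in for message objects.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class ToyState(TypedDict):
    # Hypothetical stand-in for AgentState, with plain strings as messages.
    messages: Annotated[list[str], operator.add]


def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the state using each key's reducer."""
    hints = get_type_hints(ToyState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints[key], "__metadata__", ())
        if metadata and key in merged:
            reducer = metadata[0]  # here: operator.add
            merged[key] = reducer(merged[key], value)  # list + list
        else:
            merged[key] = value  # no reducer or no previous value: just set it
    return merged


state = {"messages": ["user: What is the weather in sf?"]}
state = apply_update(state, {"messages": ["ai: (tool call)"]})
state = apply_update(state, {"messages": ["tool: clear, 54.0F"]})
print(state["messages"])
```

This is why the nodes of the agent can return only their new messages (`{'messages': [message]}`) and still end up with the full history in the state.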

# Persistence

To handle **persistence**, LangGraph provides what's called a **checkpointer**.

It basically **checkpoints the state between every node**.

Here, we'll make use of a `SqliteSaver`, and use it "in memory".

In [4]:
from langgraph.checkpoint.sqlite import SqliteSaver

In [5]:
memory = SqliteSaver.from_conn_string(":memory:")
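To make "checkpointing the state between every node" more concrete, here is a minimal sketch built only on the standard library. It is an assumption-laden toy, not the `SqliteSaver` API: `ToyCheckpointer`, its table layout, and its `put` / `get_latest` methods are all hypothetical.

```python
import json
import sqlite3


class ToyCheckpointer:
    """Hypothetical, simplified stand-in for a checkpointer (not the SqliteSaver API)."""

    def __init__(self, conn_string: str = ":memory:"):
        self.conn = sqlite3.connect(conn_string)
        self.conn.execute(
            "CREATE TABLE checkpoints ("
            "thread_id TEXT, step INTEGER, state TEXT, "
            "PRIMARY KEY (thread_id, step))"
        )

    def put(self, thread_id: str, state: dict) -> None:
        """Save a snapshot of the state as the next step of this thread."""
        (last_step,) = self.conn.execute(
            "SELECT COALESCE(MAX(step), 0) FROM checkpoints WHERE thread_id = ?",
            (thread_id,),
        ).fetchone()
        self.conn.execute(
            "INSERT INTO checkpoints VALUES (?, ?, ?)",
            (thread_id, last_step + 1, json.dumps(state)),
        )

    def get_latest(self, thread_id: str) -> dict:
        """Return the most recent snapshot, or an empty state for a new thread."""
        row = self.conn.execute(
            "SELECT state FROM checkpoints WHERE thread_id = ? "
            "ORDER BY step DESC LIMIT 1",
            (thread_id,),
        ).fetchone()
        return json.loads(row[0]) if row else {"messages": []}


saver = ToyCheckpointer(":memory:")
state = saver.get_latest("1")
for node_output in (["ai: (tool call)"], ["tool: search result"], ["ai: final answer"]):
    state = {"messages": state["messages"] + node_output}
    saver.put("1", state)  # one checkpoint after every node

checkpoint_count = saver.conn.execute("SELECT COUNT(*) FROM checkpoints").fetchone()[0]
print(checkpoint_count, "checkpoints,", len(saver.get_latest("1")["messages"]), "messages")
```

With `":memory:"`, the database lives only as long as the process, which is convenient for a notebook but means nothing survives a kernel restart.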

It's then really easy to incorporate the checkpointer within our `Agent` class.

In [6]:
# Define Agent
class Agent:
    def __init__(self, model, tools, checkpointer, system=""):  # 👈 REFERENCE TO CHECKPOINTER
        self.system = system
        graph = StateGraph(AgentState)
        graph.add_node("llm", self.call_openai)
        graph.add_node("action", self.take_action)
        graph.add_conditional_edges("llm", self.exists_action, {True: "action", False: END})
        graph.add_edge("action", "llm")
        graph.set_entry_point("llm")
        self.graph = graph.compile(checkpointer=checkpointer)  # 👈 REFERENCE TO CHECKPOINTER
        self.tools = {t.name: t for t in tools}
        self.model = model.bind_tools(tools)

    def call_openai(self, state: AgentState):
        messages = state['messages']
        if self.system:
            messages = [SystemMessage(content=self.system)] + messages
        message = self.model.invoke(messages)
        return {'messages': [message]}

    def exists_action(self, state: AgentState):
        result = state['messages'][-1]
        return len(result.tool_calls) > 0

    def take_action(self, state: AgentState):
        tool_calls = state['messages'][-1].tool_calls
        results = []
        for t in tool_calls:
            print(f"Calling: {t}")
            result = self.tools[t['name']].invoke(t['args'])
            results.append(ToolMessage(tool_call_id=t['id'], name=t['name'], content=str(result)))
        print("Back to the model!")
        return {'messages': results}

> **NOTE**
>
> Other databases, or [**Redis**](https://redis.io/) for example, can also be used for data persistence.

In [7]:
prompt = """You are a smart research assistant. Use the search engine to look up information. \
You are allowed to make multiple calls (either together or in sequence). \
Only look up information when you are sure of what you want. \
If you need to look up some information before asking a follow up question, you are allowed to do that!
"""
model = ChatOpenAI(model="gpt-3.5-turbo")  # 👈 Change to -4o for POC
abot = Agent(model, [tool], system=prompt, checkpointer=memory)

# Streaming

In [8]:
messages = [HumanMessage(content="What is the weather in sf?")]

We'll now implement **threads**, in order to keep track of different conversations.

They can simply be configured in the following way:

In [9]:
thread = {"configurable": {"thread_id": "1"}}

We'll now call the graph, not with `invoke`, but with `stream`, passing:
- the same input dictionary we would pass to `invoke` (`{"messages": messages}`),
- `thread` as a second argument.

We then get back a **stream** of events.
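The shape of such a stream can be explored without calling any model. In the sketch below, `toy_stream` is a hypothetical generator that mimics what the cells in this notebook rely on: each event is a dictionary keyed by node name, whose value holds that node's `messages` update.

```python
from typing import Iterator


def toy_stream(user_message: str) -> Iterator[dict]:
    """Yield one {node_name: update} event per node, as each node finishes."""
    yield {"llm": {"messages": [f"ai: tool call for {user_message!r}"]}}
    yield {"action": {"messages": ["tool: search result"]}}
    yield {"llm": {"messages": ["ai: final answer"]}}


seen_nodes = []
for event in toy_stream("What is the weather in sf?"):
    for node_name, update in event.items():
        seen_nodes.append(node_name)
        print(node_name, "->", update["messages"])
```

Because the events arrive one at a time, the intermediate steps can be displayed while the rest of the graph is still running.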



The following cell temporarily works around an issue with LangSmith; there's no need to understand it for the moment.

In [11]:
# `!export` only affects a throwaway subshell; `%env` sets the variable in the notebook kernel
%env LANGCHAIN_TRACING_V2=false

## Initial Stream

In [12]:
for event in abot.graph.stream({"messages": messages}, thread):
    for v in event.values():
        print(v["messages"])

[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_eqDDo3U0wuFrWTitkRpMCIw4', 'function': {'arguments': '{"query":"weather in San Francisco"}', 'name': 'tavily_search_results_json'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 21, 'prompt_tokens': 1186, 'total_tokens': 1207}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-af070dca-58fc-42d1-9d5b-c2d081e72a86-0', tool_calls=[{'name': 'tavily_search_results_json', 'args': {'query': 'weather in San Francisco'}, 'id': 'call_eqDDo3U0wuFrWTitkRpMCIw4'}], usage_metadata={'input_tokens': 1186, 'output_tokens': 21, 'total_tokens': 1207})]
Calling: {'name': 'tavily_search_results_json', 'args': {'query': 'weather in San Francisco'}, 'id': 'call_eqDDo3U0wuFrWTitkRpMCIw4'}
Back to the model!
[AIMessage(content='The current weather in San Francisco is clear with a temperature of 54.0°F (12.2°C). The wind is blowing at 4.3 mph 

We get back a stream of events:

> - **First, we get an `AIMessage`, which is the first result from the language model**:

```python
AIMessage(
    content='',
    additional_kwargs={
        'tool_calls': [{
            'id': 'call_eqDDo3U0wuFrWTitkRpMCIw4',
            'function': {
                'arguments': '{"query":"weather in San Francisco"}',
                'name': 'tavily_search_results_json'
            },
            'type': 'function'
        }]
    },
    response_metadata={
        'token_usage': {'completion_tokens': 21, 'prompt_tokens': 1186, 'total_tokens': 1207}, 
        'model_name': 'gpt-3.5-turbo', 
        'system_fingerprint': None, 
        'finish_reason': 'tool_calls', 
        'logprobs': None
        }, 
    id='run-af070dca-58fc-42d1-9d5b-c2d081e72a86-0', 
    tool_calls=[{
        'name': 'tavily_search_results_json', 
        'args': {'query': 'weather in San Francisco'}, 
        'id': 'call_eqDDo3U0wuFrWTitkRpMCIw4'
        }], 
    usage_metadata={'input_tokens': 1186, 'output_tokens': 21, 'total_tokens': 1207}
)
```
> - **It tells us to call `tavily`, which is logged by this printed line**:

```python
Calling: {
    'name': 'tavily_search_results_json',
    'args': {'query': 'weather in San Francisco'},
    'id': 'call_eqDDo3U0wuFrWTitkRpMCIw4'
}
```
> - **Then the action is performed, which is logged by the `Back to the model!` line, and we get the following `ToolMessage` (left unparsed here, as parsing it wouldn't really improve readability). It is the result of calling `tavily` and, hence, the result of the search**:

```python
ToolMessage(
    content='[{\'url\': \'https://www.weatherapi.com/\', \'content\': "{\'location\': {\'name\': \'San Francisco\', \'region\': \'California\', \'country\': \'United States of America\', \'lat\': 37.78, \'lon\': -122.42, \'tz_id\': \'America/Los_Angeles\', \'localtime_epoch\': 1717751204, \'localtime\': \'2024-06-07 2:06\'}, \'current\': {\'last_updated_epoch\': 1717750800, \'last_updated\': \'2024-06-07 02:00\', \'temp_c\': 12.2, \'temp_f\': 54.0, \'is_day\': 0, \'condition\': {\'text\': \'Clear\', \'icon\': \'//cdn.weatherapi.com/weather/64x64/night/113.png\', \'code\': 1000}, \'wind_mph\': 4.3, \'wind_kph\': 6.8, \'wind_degree\': 10, \'wind_dir\': \'N\', \'pressure_mb\': 1011.0, \'pressure_in\': 29.84, \'precip_mm\': 0.0, \'precip_in\': 0.0, \'humidity\': 93, \'cloud\': 0, \'feelslike_c\': 10.7, \'feelslike_f\': 51.2, \'windchill_c\': 9.8, \'windchill_f\': 49.6, \'heatindex_c\': 11.4, \'heatindex_f\': 52.6, \'dewpoint_c\': 9.3, \'dewpoint_f\': 48.8, \'vis_km\': 14.0, \'vis_miles\': 8.0, \'uv\': 1.0, \'gust_mph\': 14.0, \'gust_kph\': 22.5}}"}, {\'url\': \'https://www.weather.gov/index.php/mtr/\', \'content\': \'Current Conditions showing NA; Customize Your Weather.gov. Enter Your City, ST or ZIP Code ... 2024 at 9:40:09 am PDT Watches, Warnings & Advisories. Zoom Out. Excessive Heat Warning. Gale Warning. Heat Advisory. Small Craft Advisory. ... National Weather Service San Francisco Bay Area, CA 21 Grace Hopper Ave, Stop 5 Monterey, CA 93943-5505\'}]', 
    name='tavily_search_results_json', 
    tool_call_id='call_eqDDo3U0wuFrWTitkRpMCIw4'
)
```
> - **Finally, there's an `AIMessage`, which is the result of the LLM, answering our question**:

```python
AIMessage(
    content='The current weather in San Francisco is clear with a temperature of 54.0°F (12.2°C). The wind is blowing at 4.3 mph from the north, and the humidity is at 93%.', 
    response_metadata={
        'token_usage': {'completion_tokens': 46, 'prompt_tokens': 1732, 'total_tokens': 1778}, 
        'model_name': 'gpt-3.5-turbo', 
        'system_fingerprint': None, 
        'finish_reason': 'stop', 
        'logprobs': None
    }, 
    id='run-de9261f7-2481-4194-b1ac-0e9e5cebaf7b-0', 
    usage_metadata={'input_tokens': 1732, 'output_tokens': 46, 'total_tokens': 1778}
)
```

With this `stream` method, we get back all of these intermediate results, which gives good visibility into exactly what is going on.
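If we do want to read the search result programmatically, the `ToolMessage` content shown above is the string form of a Python list of dictionaries (it was built with `str(result)` in `take_action`). A possible way to parse it is `ast.literal_eval`. The sample below is a shortened, illustrative stand-in for the real content, and it assumes the inner `content` field is itself a dictionary stored as a string, which is the case for the weatherapi.com entry but not for plain-text results such as the weather.gov one.

```python
import ast

# Shortened, illustrative stand-in for the ToolMessage content shown above.
raw_content = str([
    {
        "url": "https://www.weatherapi.com/",
        "content": "{'location': {'name': 'San Francisco'}, "
                   "'current': {'temp_f': 54.0, 'temp_c': 12.2}}",
    }
])

results = ast.literal_eval(raw_content)       # outer list of dicts
first = results[0]
weather = ast.literal_eval(first["content"])  # inner dict, also stored as a string

print(first["url"])
print(weather["location"]["name"], weather["current"]["temp_f"])
```

`ast.literal_eval` only accepts Python literals, so it is a safer choice than `eval` for text coming from an external service.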

## Follow-Up Question

In [13]:
messages = [HumanMessage(content="What about in la?")]
thread = {"configurable": {"thread_id": "1"}}
for event in abot.graph.stream({"messages": messages}, thread):
    for v in event.values():
        print(v)

{'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_qqHKrIxFv4OIpxO5p1JiekdC', 'function': {'arguments': '{"query":"weather in Los Angeles"}', 'name': 'tavily_search_results_json'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 21, 'prompt_tokens': 1790, 'total_tokens': 1811}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-bf5f9952-2066-46b4-9051-8985d832bc51-0', tool_calls=[{'name': 'tavily_search_results_json', 'args': {'query': 'weather in Los Angeles'}, 'id': 'call_qqHKrIxFv4OIpxO5p1JiekdC'}], usage_metadata={'input_tokens': 1790, 'output_tokens': 21, 'total_tokens': 1811})]}
Calling: {'name': 'tavily_search_results_json', 'args': {'query': 'weather in Los Angeles'}, 'id': 'call_qqHKrIxFv4OIpxO5p1JiekdC'}
Back to the model!
{'messages': [ToolMessage(content='[{\'url\': \'https://www.weatherapi.com/\', \'content\': "{\'location\': {\'name\': \'Los Ang

This **continues the previous conversation by asking a follow-up question**.

The query itself doesn't say that we are still talking about the weather, but the agent can work it out **because we pass the same `thread_id`**, so the checkpointer restores the earlier messages.
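The role of `thread_id` can be illustrated with a small toy model. Everything here is hypothetical (`histories`, `toy_send`); the only thing borrowed from the notebook is the shape of the config dictionary.

```python
from collections import defaultdict

# Hypothetical in-memory store: one message history per thread_id.
histories: dict[str, list[str]] = defaultdict(list)


def toy_send(message: str, config: dict) -> list[str]:
    """Append a message to the thread named in config; return what the model would see."""
    thread_id = config["configurable"]["thread_id"]
    histories[thread_id].append(message)
    return list(histories[thread_id])


thread_1 = {"configurable": {"thread_id": "1"}}
thread_2 = {"configurable": {"thread_id": "2"}}

toy_send("What is the weather in sf?", thread_1)
context_1 = toy_send("What about in la?", thread_1)
context_2 = toy_send("Which one is warmer?", thread_2)

print(context_1)  # both questions: the follow-up has context
print(context_2)  # only the last question: a new thread starts from scratch
```

In this toy model, a question sent on a fresh thread arrives without any of the earlier messages, so the model would have nothing to resolve "la" or "which one" against.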

## Comparing Previous Answers

In [14]:
messages = [HumanMessage(content="Which one is warmer?")]
thread = {"configurable": {"thread_id": "1"}}
for event in abot.graph.stream({"messages": messages}, thread):
    for v in event.values():
        print(v)

{'messages': [AIMessage(content='Los Angeles is slightly warmer than San Francisco. Los Angeles currently has a temperature of 60.1°F (15.6°C), while San Francisco has a temperature of 54.0°F (12.2°C).', response_metadata={'token_usage': {'completion_tokens': 46, 'prompt_tokens': 2364, 'total_tokens': 2410}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-576f4331-b701-4b2c-882c-deb54d982348-0', usage_metadata={'input_tokens': 2364, 'output_tokens': 46, 'total_tokens': 2410})]}


## Changing `thread_id`

> **NOTE**
> 
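> With a different `thread_id`, the agent has no access to the earlier conversation. The model still responds, but it has to guess what "one" refers to: in the output below, it searches for the temperature in Los Angeles and in New York City instead of comparing San Francisco and Los Angeles.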

In [16]:
messages = [HumanMessage(content="Which one is warmer?")]
thread = {"configurable": {"thread_id": "2"}}
for event in abot.graph.stream({"messages": messages}, thread):
    for v in event.values():
        print(v)

{'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_6pT9LoHPIkJHlQxXZBPSgCLT', 'function': {'arguments': '{"query": "temperature in Los Angeles"}', 'name': 'tavily_search_results_json'}, 'type': 'function'}, {'id': 'call_UgrAKwuct414HYa99uk6cxN8', 'function': {'arguments': '{"query": "temperature in New York City"}', 'name': 'tavily_search_results_json'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 58, 'prompt_tokens': 1171, 'total_tokens': 1229}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-7b73501c-14fb-4cdf-ab30-95e199de4fcc-0', tool_calls=[{'name': 'tavily_search_results_json', 'args': {'query': 'temperature in Los Angeles'}, 'id': 'call_6pT9LoHPIkJHlQxXZBPSgCLT'}, {'name': 'tavily_search_results_json', 'args': {'query': 'temperature in New York City'}, 'id': 'call_UgrAKwuct414HYa99uk6cxN8'}], usage_metadata={'input_tokens': 1171, 'output_tokens'

# Streaming Tokens

In [23]:
try:
    from langgraph.checkpoint.aiosqlite import AsyncSqliteSaver
except ImportError as e:
    print(e)

If the previous import fails, it can be fixed by installing [**aiosqlite**](https://pypi.org/project/aiosqlite/), as mentioned in [**AsyncSqliteSaver's documentation**](https://langchain-ai.github.io/langgraph/reference/checkpoints/#asyncsqlitesaver).

In [25]:
messages = [HumanMessage(content="What is the weather in SF?")]
thread = {"configurable": {"thread_id": "4"}}
async for event in abot.graph.astream_events({"messages": messages}, thread, version="v1"):
    kind = event["event"]
    if kind == "on_chat_model_stream":
        content = event["data"]["chunk"].content
        if content:
            # Empty content in the context of OpenAI means
            # that the model is asking for a tool to be invoked.
            # So we only print non-empty content
            print(content, end="|")

NotImplementedError: The SqliteSaver does not support async methods. Consider using AsyncSqliteSaver instead.
from langgraph.checkpoint.aiosqlite import AsyncSqliteSaver
Note: AsyncSqliteSaver requires the aiosqlite package to use.
Install with:
`pip install aiosqlite`
See https://langchain-ai.github.io/langgraph/reference/checkpoints/#asyncsqlitesaverfor more information.
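The error above comes from the fact that `abot` was compiled with the synchronous `SqliteSaver`; as the message itself suggests, the agent would need to be rebuilt with an `AsyncSqliteSaver` before `astream_events` can run. Independently of that fix, the token-filtering loop can be tried without any API call. The sketch below uses a hypothetical `toy_astream_events` async generator and plain strings in place of message chunks.

```python
import asyncio
from typing import AsyncIterator


async def toy_astream_events(tokens: list[str]) -> AsyncIterator[dict]:
    """Hypothetical event source: emits start, per-token stream, and end events."""
    yield {"event": "on_chat_model_start", "data": {}}
    for token in tokens:
        await asyncio.sleep(0)  # hand control back to the event loop between tokens
        yield {"event": "on_chat_model_stream", "data": {"chunk": token}}
    yield {"event": "on_chat_model_end", "data": {}}


async def collect_tokens(tokens: list[str]) -> list[str]:
    """Keep only non-empty streamed chunks, mirroring the filter in the cell above."""
    received = []
    async for event in toy_astream_events(tokens):
        if event["event"] == "on_chat_model_stream":
            content = event["data"]["chunk"]
            if content:  # skip empty chunks
                received.append(content)
                print(content, end="|")
    return received


received = asyncio.run(collect_tokens(["The", "", " weather", " is", " clear"]))
```

`asyncio.run` is meant for a plain script; inside a notebook, where an event loop is already running, `received = await collect_tokens([...])` would be used instead.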