# Filtering and trimming messages

## Review

Now, we have a deeper understanding of a few things:

* How to customize the graph state schema
* How to define custom state reducers
* How to use multiple graph state schemas

## Goals

Now, we can start using these concepts with models in LangGraph!

In the next few sessions, we'll build towards a chatbot that has long-term memory.

Because our chatbot will use messages, let's first talk a bit more about advanced ways to work with messages in graph state.

In [140]:
%%capture --no-stderr
%pip install --quiet -U langchain langchain_core langgraph langchain_google_genai python-dotenv


In [None]:
import os
from dotenv import load_dotenv, dotenv_values

# Load environment variables from ../.env.example (point this at your own .env file)
ENV_PATH = "../.env.example"
load_dotenv(ENV_PATH)

def mask(value):
    # Show only the last 4 characters so secrets don't leak into notebook output
    if not value:
        return None
    return "*" * max(len(value) - 4, 0) + value[-4:]

def debug_api_key(key_name):
    print(f"\nDebugging {key_name}:")

    # 1. Value already present in the environment
    env_value = os.getenv(key_name)
    print(f"1. Value from os.getenv('{key_name}'): {mask(env_value)}")

    # 2. Value parsed directly from the .env file
    dotenv_value = dotenv_values(ENV_PATH).get(key_name)
    print(f"2. Value from {ENV_PATH}: {mask(dotenv_value)}")

    # 3. Check that the file exists (its contents hold secrets, so they are not printed)
    print(f"3. {ENV_PATH} exists: {os.path.exists(ENV_PATH)}")

    # 4. Strip stray whitespace/quotes and export the cleaned value
    if dotenv_value:
        cleaned_value = dotenv_value.strip().strip("'").strip('"')
        os.environ[key_name] = cleaned_value
        print(f"4. Environment variable set. New value: {mask(os.getenv(key_name))}")
    else:
        print(f"4. Unable to parse {key_name} from {ENV_PATH}")

# Debug both API keys
debug_api_key("GOOGLE_API_KEY")
debug_api_key("LANGCHAIN_API_KEY")

print("\nFinal environment variable values (masked):")
print(f"GOOGLE_API_KEY: {mask(os.getenv('GOOGLE_API_KEY'))}")
print(f"LANGCHAIN_API_KEY: {mask(os.getenv('LANGCHAIN_API_KEY'))}")

# In your own code, read the keys with os.environ.get('GOOGLE_API_KEY')
# and os.environ.get('LANGCHAIN_API_KEY'); avoid printing them.

We'll use [LangSmith](https://docs.smith.langchain.com/) for [tracing](https://docs.smith.langchain.com/concepts/tracing).

We'll log to a project, `langraphlearning2.0`.

In [2]:
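import os
from langsmith import Client

# The API keys were loaded into the environment by the cell above.
# Re-assigning os.environ[key] = os.environ.get(key) is redundant and raises a TypeError if a key is missing.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_PROJECT"] = "langraphlearning2.0"

# Fail early with a clear message if a required key is missing
for key in ("LANGCHAIN_API_KEY", "GOOGLE_API_KEY"):
    if not os.environ.get(key):
        raise EnvironmentError(f"{key} is not set; check your .env file")

client = Client()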


In [None]:
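print(f"LANGCHAIN_TRACING_V2: {os.getenv('LANGCHAIN_TRACING_V2')}")
print(f"LANGCHAIN_PROJECT: {os.getenv('LANGCHAIN_PROJECT')}")
# Report only whether the key is set; never print the key itself
print(f"LANGCHAIN_API_KEY set: {bool(os.getenv('LANGCHAIN_API_KEY'))}")
print(f"LANGCHAIN_ENDPOINT: {os.getenv('LANGCHAIN_ENDPOINT')}")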

## Messages as state

First, let's define some messages.

In [None]:
from langchain_core.messages import AIMessage, HumanMessage

messages = [AIMessage("So you said you were researching LLMs?", name="AI")]
messages.append(HumanMessage("Yes, I know about GPT-4o. But which other LLMs should I learn about?", name="Shayan"))

for m in messages:
    m.pretty_print()

Recall we can pass them to a chat model.

In [9]:
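from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0,
)

# Pass the message list directly to the chat model
result = llm.invoke(messages)
result.pretty_print()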

We can run our chat model in a simple graph with `MessagesState`.

In [None]:
from IPython.display import Image, display
from langgraph.graph import MessagesState
from langgraph.graph import StateGraph, START, END
from langgraph.graph.state import CompiledStateGraph

# Node
def chat_model_node(state: MessagesState) -> MessagesState:
    return {"messages": llm.invoke(state["messages"])}

# Build graph
builder: StateGraph = StateGraph(MessagesState)
builder.add_node("chat_model", chat_model_node)
builder.add_edge(START, "chat_model")
builder.add_edge("chat_model", END)
graph: CompiledStateGraph = builder.compile()

# View
display(Image(graph.get_graph().draw_mermaid_png()))

In [None]:
output = graph.invoke({'messages': messages})
for m in output['messages']:
    m.pretty_print()

In [None]:
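# Stream the events emitted while the graph runs (node and chat model start/stream/end events).
# Top-level `async for` works in Jupyter; version="v2" selects the v2 event schema.
async for event in graph.astream_events({'messages': messages}, version="v2"):
    print(event)
    print("\n--------------\n")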

## Reducer

A practical challenge when working with messages is managing long-running conversations.

Long-running conversations result in high token usage and latency if we are not careful, because we pass a growing list of messages to the model.
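To see why a growing list matters, here is some toy arithmetic under simplifying assumptions: every message costs the same number of tokens `t`, and each turn adds one human message and one AI reply. On turn `k` the model re-reads `2k - 1` messages, so over `n` turns the total input is `t * n**2` tokens, versus `t * n` if only the latest message were sent. The helper names are hypothetical.

```python
# Toy arithmetic, assuming every message costs the same number of tokens and
# each turn adds one human message plus one AI reply to the history.

def total_input_tokens(tokens_per_message: int, num_turns: int) -> int:
    """Total input tokens over num_turns model calls when the full history is resent."""
    total = 0
    for turn in range(1, num_turns + 1):
        # On turn k the model sees k human messages and k - 1 earlier AI replies.
        messages_in_context = 2 * turn - 1
        total += messages_in_context * tokens_per_message
    return total

def total_input_tokens_last_only(tokens_per_message: int, num_turns: int) -> int:
    """Same, but only the latest human message is sent on each turn."""
    return num_turns * tokens_per_message

print(total_input_tokens(50, 10))            # 5000
print(total_input_tokens_last_only(50, 10))  # 500
```

Doubling the number of turns quadruples the first total but only doubles the second.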

We have a few ways to address this.

First, recall the trick we saw using `RemoveMessage` and the `add_messages` reducer.

In [None]:
from langchain_core.messages import RemoveMessage

# Nodes
def filter_messages(state: MessagesState) -> MessagesState:
    # Delete all but the 2 most recent messages
    delete_messages = [RemoveMessage(id=m.id) for m in state["messages"][:-2]]
    return {"messages": delete_messages}

def chat_model_node(state: MessagesState) -> MessagesState:
    return {"messages": [llm.invoke(state["messages"])]}

# Build graph
builder: StateGraph = StateGraph(MessagesState)
builder.add_node("filter", filter_messages)
builder.add_node("chat_model", chat_model_node)
builder.add_edge(START, "filter")
builder.add_edge("filter", "chat_model")
builder.add_edge("chat_model", END)
graph: CompiledStateGraph = builder.compile()

# View
display(Image(graph.get_graph().draw_mermaid_png()))

In [None]:
# Message list with a preamble
messages = [AIMessage("Hi.", name="AI", id="1")]
messages.append(HumanMessage("Hi.", name="Shayan", id="2"))
messages.append(AIMessage("So you said you were researching LLMs?", name="AI", id="3"))
messages.append(HumanMessage("Yes, I know about GPT-4o. But which other LLMs should I learn about?", name="Shayan", id="4"))

# Invoke
output = graph.invoke({'messages': messages})
for m in output['messages']:
    m.pretty_print()
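Why does the output contain only the messages with ids `3` and `4` plus the model's reply? The sketch below is a simplified, pure-Python model of how an `add_messages`-style reducer treats node updates; it is not LangGraph's implementation. `Msg`, `Remove`, and `toy_add_messages` are hypothetical stand-ins for the message classes, `RemoveMessage`, and the reducer. Updates with a new id are appended, updates with an existing id replace that message, and removal markers delete the matching id.

```python
from dataclasses import dataclass

@dataclass
class Msg:
    id: str
    content: str

@dataclass
class Remove:
    id: str  # marker asking the reducer to delete the message with this id

def toy_add_messages(left: list, right: list) -> list:
    """Toy reducer: append new ids, replace existing ids, drop ids named by Remove."""
    merged = list(left)
    index = {m.id: i for i, m in enumerate(merged)}
    to_remove = set()
    for m in right:
        if isinstance(m, Remove):
            to_remove.add(m.id)
        elif m.id in index:
            merged[index[m.id]] = m  # same id: overwrite in place
        else:
            index[m.id] = len(merged)
            merged.append(m)         # new id: append
    return [m for m in merged if m.id not in to_remove]

state = [Msg("1", "Hi."), Msg("2", "Hi."), Msg("3", "So you said..."), Msg("4", "Yes, ...")]
# "filter" node: remove everything except the 2 most recent messages
state = toy_add_messages(state, [Remove(m.id) for m in state[:-2]])
# "chat_model" node: append the model's reply
state = toy_add_messages(state, [Msg("5", "model reply")])
print([m.id for m in state])  # ['3', '4', '5']
```

The real reducer also handles details this toy ignores, such as assigning ids to messages that arrive without one.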

## Filtering messages

If you don't need or want to modify the graph state, you can just filter the messages you pass to the chat model.

For example, pass only the most recent message to the model with `llm.invoke(messages[-1:])`.
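The difference between filtering the model's input and modifying state can be seen without a model at all. In this toy, plain `(speaker, text)` tuples stand in for message objects and `model_input` is a hypothetical helper: slicing builds a new list for the model, so the list held in state keeps every message.

```python
# Toy illustration: (speaker, text) tuples stand in for LangChain message objects,
# and `model_input` is a hypothetical helper, not a LangGraph API.
state_messages = [
    ("AI", "Hi."),
    ("Shayan", "Hi."),
    ("AI", "So you said you were researching LLMs?"),
    ("Shayan", "Which other LLMs should I learn about?"),
]

def model_input(messages, last_n=1):
    """Return the last `last_n` messages for the model; `messages` is not modified."""
    if last_n <= 0:
        return []  # guard: messages[-0:] would return the whole list
    return messages[-last_n:]  # slicing builds a new list

sent = model_input(state_messages)
print(len(sent), len(state_messages))  # 1 4
print(sent[0][1])                      # Which other LLMs should I learn about?
```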

In [None]:
# Node
def chat_model_node(state: MessagesState):
    return {"messages": [llm.invoke(state["messages"][-1:])]}

# Build graph
builder: StateGraph = StateGraph(MessagesState)
builder.add_node("chat_model", chat_model_node)
builder.add_edge(START, "chat_model")
builder.add_edge("chat_model", END)
graph: CompiledStateGraph = builder.compile()

# View
display(Image(graph.get_graph().draw_mermaid_png()))

Let's take our existing list of messages, append the above LLM response, and append a follow-up question.

In [16]:
messages.append(output['messages'][-1])
messages.append(HumanMessage("Tell me more about LLMs!", name="Shayan"))

In [None]:
for m in messages:
    m.pretty_print()

In [None]:
# Invoke, using message filtering
output = graph.invoke({'messages': messages})
for m in output['messages']:
    m.pretty_print()

The state still contains all of the messages.

But the LangSmith trace shows that the model invocation only received the last message:

https://smith.langchain.com/public/75aca3ce-ef19-4b92-94be-0178c7a660d9/r

## Trim messages

Another approach is to [trim messages](https://python.langchain.com/v0.2/docs/how_to/trim_messages/#getting-the-last-max_tokens-tokens) based on a token budget.

This restricts the message history passed to the model to a specified number of tokens.

Filtering selects a subset of messages by position or content (for example, only the last message), whereas trimming keeps as many messages as fit within the token budget. Neither changes the graph state, and trimming limits the tokens in the model's *input*, not the number of tokens the model can use in its response.
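As a rough mental model of `strategy="last"`, the toy below walks backwards from the newest message and keeps messages while they fit within `max_tokens`, stopping at the first one that doesn't. It is a simplified sketch, not the library's algorithm: `toy_trim_last` is hypothetical, it treats each whitespace-separated word as one token (real tokenizers differ), and its `allow_partial` keeps the trailing words of the message that didn't fit, whereas the real `trim_messages` splits partial content with its own rules.

```python
def count_tokens(text: str) -> int:
    # Assumption: one "token" per whitespace-separated word (real tokenizers differ).
    return len(text.split())

def toy_trim_last(messages: list[str], max_tokens: int, allow_partial: bool = False) -> list[str]:
    """Keep the most recent messages whose combined token count fits in max_tokens."""
    kept: list[str] = []
    budget = max_tokens
    for text in reversed(messages):  # newest first
        cost = count_tokens(text)
        if cost <= budget:
            kept.append(text)
            budget -= cost
            continue
        if allow_partial and budget > 0:
            # Keep only the trailing words of the message that did not fit.
            kept.append(" ".join(text.split()[-budget:]))
        break  # stop at the first message that doesn't fully fit
    return list(reversed(kept))  # restore chronological order

history = ["one two three four", "five six", "seven eight nine"]
print(toy_trim_last(history, 5))                      # ['five six', 'seven eight nine']
print(toy_trim_last(history, 6, allow_partial=True))  # ['four', 'five six', 'seven eight nine']
```

Because the kept messages form a contiguous run ending at the newest message, a short older message is not pulled back in once a longer one in between has been dropped.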

Below, `trim_messages` is applied inside the chat model node before the model is called.

In [None]:
from langchain_core.messages import trim_messages

# Node
def chat_model_node(state: MessagesState):
    # Keep only the most recent messages that fit within 100 tokens
    trimmed = trim_messages(
        state["messages"],
        max_tokens=100,
        strategy="last",
        token_counter=llm,   # count tokens with the same Gemini model we call
        allow_partial=True,  # a message that doesn't fully fit may be cut down
    )
    return {"messages": [llm.invoke(trimmed)]}

# Build graph
builder = StateGraph(MessagesState)
builder.add_node("chat_model", chat_model_node)
builder.add_edge(START, "chat_model")
builder.add_edge("chat_model", END)
graph = builder.compile()

# View
display(Image(graph.get_graph().draw_mermaid_png()))

In [20]:
messages.append(output['messages'][-1])
messages.append(HumanMessage("In which AI domains does Google beat OpenAI?", name="Shayan"))

In [None]:
# Example of trimming messages
trim_messages(
    messages,
    max_tokens=100,
    strategy="last",
    token_counter=llm,
    allow_partial=False,  # drop a message entirely if it doesn't fit
)

In [None]:
# Invoke, using message trimming in the chat_model_node
messages_out_trim = graph.invoke({'messages': messages})
for m in messages_out_trim['messages']:
    m.pretty_print()

Let's look at the LangSmith trace to see the model invocation:

https://smith.langchain.com/public/b153f7e9-f1a5-4d60-8074-f0d7ab5b42ef/r