[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain-academy/blob/main/module-2/chatbot-summarization.ipynb) [![Open in LangChain Academy](https://cdn.prod.website-files.com/65b8cd72835ceeacd4449a53/66e9eba12c7b7688aa3dbb5e_LCA-badge-green.svg)](https://academy.langchain.com/courses/take/intro-to-langgraph/lessons/58239436-lesson-5-chatbot-w-summarizing-messages-and-memory)

# Chatbot with message summarization

## Review

We've covered how to customize the graph state schema and its reducers.

We've also shown a number of ways to trim or filter messages in graph state.

## Goals

Now, let's take it one step further!

Rather than just trimming or filtering messages, we'll show how to use LLMs to produce a running summary of the conversation.

This allows us to retain a compressed representation of the full conversation, rather than discarding older messages through trimming or filtering.

We'll incorporate this summarization into a simple chatbot.

And we'll equip that chatbot with memory, supporting long-running conversations without incurring high token cost or latency.
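
To make the cost argument concrete, here is a rough back-of-the-envelope sketch. All three constants are illustrative assumptions rather than measurements, and `MAX_MESSAGES_SENT` assumes a policy like the one used later in this notebook (summarize once the history exceeds 6 messages, then keep the last 3). It compares how many characters reach the model per turn when the full history is re-sent versus when a recap plus a bounded window is sent.

```python
# Illustrative assumptions only (not measured values)
CHARS_PER_MESSAGE = 100   # hypothetical average message length
RECAP_CHARS = 300         # hypothetical size of the running summary
MAX_MESSAGES_SENT = 7     # assumed cap on messages in the prompt under summarization


def full_history_chars(turn: int) -> int:
    """Prompt size at a given turn when nothing is ever removed."""
    # (turn - 1) earlier exchanges of 2 messages each, plus the new human message
    return (2 * (turn - 1) + 1) * CHARS_PER_MESSAGE


def summarized_chars_upper_bound() -> int:
    """Prompt size bound when a recap replaces older messages."""
    return RECAP_CHARS + MAX_MESSAGES_SENT * CHARS_PER_MESSAGE


for turn in (5, 50, 500):
    print(turn, full_history_chars(turn), summarized_chars_upper_bound())
```

Under these assumptions the full-history prompt grows linearly with the number of turns, while the summarized prompt stays below a fixed bound.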

In [None]:
%%capture --no-stderr
%pip install --quiet -U langchain_core langgraph langchain_openai

In [None]:
import os, getpass

def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")

_set_env("OPENAI_API_KEY")

We'll use [LangSmith](https://docs.smith.langchain.com/) for [tracing](https://docs.smith.langchain.com/concepts/tracing).

In [None]:
_set_env("LANGSMITH_API_KEY")
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_PROJECT"] = "langchain-academy"

In [None]:
# Import required modules
from langchain_openai import ChatOpenAI
import os, getpass

# --- Secure API Key Setup ---
# Ask user for the API key only if it's not already set in the environment
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OPENAI_API_KEY: ")

# --- Model Initialization ---
# Using a lighter model version for faster responses and reduced token cost.
# The variable is named `model` because the nodes below call `model.invoke(...)`.
model = ChatOpenAI(
    model="gpt-4o-mini",    # smaller, lower-latency model
    temperature=0
)

print("✅ Chat model initialized successfully.")


✅ Chat model initialized successfully.


We'll use `MessagesState`, as before.

In addition to the built-in `messages` key, we'll now include a custom key (`context_recap`) that holds the running summary.

In [None]:
from langgraph.graph import MessagesState

# Extended state schema: the built-in `messages` key plus the running summary.
# The nodes below read and write this key as `context_recap`.
class State(MessagesState):
    context_recap: str    # stores the running summary


We'll define a node to call our LLM that incorporates a summary, if it exists, into the prompt.

In [None]:
from langchain_core.messages import SystemMessage, HumanMessage, RemoveMessage

# Define the logic to call the model
def call_model(state: State):

    # Get recap if it exists
    context_recap = state.get("context_recap", "")

    # If there is a recap, include it to help the model recall prior context
    if context_recap:
        system_message = f"Brief recap of earlier chat: {context_recap}"
        # Combine recap with new user messages
        messages = [SystemMessage(content=system_message)] + state["messages"]
    else:
        messages = state["messages"]

    # Call the model with updated context
    response = model.invoke(messages)
    return {"messages": response}


We'll define a node to produce a summary.

Note that here we use `RemoveMessage` to prune older messages from the state once the summary has been produced.

In [None]:
def summarize_conversation(state: State):

    # Retrieve any existing recap
    context_recap = state.get("context_recap", "")

    # Create our recap prompt
    if context_recap:
        recap_prompt = (
            f"Here is the previous context recap:\n{context_recap}\n\n"
            "Now update it by adding key points from the latest conversation above. "
            "Keep it concise, in short bullet points."
        )
    else:
        recap_prompt = "Write a short context recap in bullet points for the conversation above."

    # Add recap prompt to history
    messages = state["messages"] + [HumanMessage(content=recap_prompt)]
    response = model.invoke(messages)

    # Keep only the 3 most recent messages; mark all earlier ones for removal
    delete_messages = [RemoveMessage(id=m.id) for m in state["messages"][:-3]]

    return {"context_recap": response.content, "messages": delete_messages}
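
The removal step can be pictured with a dependency-free sketch. `Msg`, `Remove`, and `apply_update` below are hypothetical stand-ins, not LangGraph classes; they imitate only the part of the message reducer's behavior that matters here (a removal marker deletes the stored message with the same ID, anything else is appended). The real reducer does more, such as replacing messages that share an ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    """Hypothetical stand-in for a chat message with an ID."""
    id: str
    content: str


@dataclass(frozen=True)
class Remove:
    """Hypothetical stand-in for a removal marker: carries only the target ID."""
    id: str


def apply_update(current: list, update: list) -> list:
    """Simplified imitation of a message reducer."""
    result = list(current)
    for item in update:
        if isinstance(item, Remove):
            # A removal marker deletes the stored message with the same ID
            result = [m for m in result if m.id != item.id]
        else:
            # Anything else is appended to the history
            result.append(item)
    return result


history = [Msg(str(i), f"message {i}") for i in range(1, 8)]  # 7 messages
removals = [Remove(m.id) for m in history[:-3]]               # all but the last 3
remaining = apply_update(history, removals)
print([m.id for m in remaining])  # ['5', '6', '7']
```

The node itself never mutates the history; it only returns removal markers and lets the reducer apply them.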


We'll add a conditional edge to determine whether to produce a summary based on the conversation length.

In [None]:
from langgraph.graph import END
from typing_extensions import Literal

# Determine whether to end or generate a context recap.
# "generate_recap" is the name under which `summarize_conversation`
# is registered as a node when the graph is built below.
def should_continue(state: State) -> Literal["generate_recap", END]:
    """Decide the next step based on conversation length and message size."""

    messages = state["messages"]

    # Total number of characters across all messages in the state
    total_chars = sum(len(m.content) for m in messages if hasattr(m, "content"))

    # Recap if more than 6 messages OR total text exceeds 800 characters
    if len(messages) > 6 or total_chars > 800:
        return "generate_recap"

    # Otherwise, end the flow
    return END
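
As a quick check of the two thresholds, here is a small sketch that applies the same rule to sample data. `Msg` and `needs_recap` are hypothetical helpers introduced only for this illustration; they need no LangGraph installation.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    """Hypothetical stand-in for a chat message."""
    content: str


def needs_recap(messages, max_messages=6, max_chars=800):
    """Same rule as above: too many messages OR too much text."""
    total_chars = sum(len(m.content) for m in messages)
    return len(messages) > max_messages or total_chars > max_chars


short_chat = [Msg("hi")] * 6                    # 6 messages, 12 characters
long_chat = [Msg("hi")] * 7                     # 7 messages: count threshold exceeded
wordy_chat = [Msg("x" * 500), Msg("y" * 301)]   # 2 messages, 801 characters

print(needs_recap(short_chat), needs_recap(long_chat), needs_recap(wordy_chat))
```

Either condition alone is enough to trigger a recap, so a couple of very long messages can trigger one well before the seventh message arrives.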


## Adding memory

Recall that [state is transient](https://github.com/langchain-ai/langgraph/discussions/352#discussioncomment-9291220) to a single graph execution.

This limits our ability to have multi-turn conversations with interruptions.

As introduced at the end of Module 1, we can use [persistence](https://langchain-ai.github.io/langgraph/how-tos/persistence/) to address this!

LangGraph can use a checkpointer to automatically save the graph state after each step.

This built-in persistence layer gives us memory, allowing LangGraph to pick up from the last state update.

As we previously showed, one of the easiest checkpointers to work with is `MemorySaver`, an in-memory key-value store for graph state.

All we need to do is compile the graph with a checkpointer, and our graph has memory!


## Threads

The checkpointer saves the state at each step as a checkpoint.

These saved checkpoints can be grouped into a `thread` of conversation.

Think of Slack as an analogy: different channels carry different conversations.

Threads are like Slack channels, capturing grouped collections of state (e.g., conversation).

Below, we use `configurable` to set a thread ID.
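
The idea of thread-scoped state can be sketched without LangGraph. `ToyCheckpointer` and `invoke` below are hypothetical stand-ins that keep only the latest state per thread; the only detail taken from this notebook is the shape of the `config` dictionary.

```python
class ToyCheckpointer:
    """Hypothetical in-memory store mapping thread_id -> latest state."""

    def __init__(self):
        self._store = {}

    def get(self, config):
        thread_id = config["configurable"]["thread_id"]
        # A thread that has never been used starts with an empty history
        return self._store.get(thread_id, {"messages": []})

    def put(self, config, state):
        self._store[config["configurable"]["thread_id"]] = state


def invoke(checkpointer, config, user_text):
    """Load the thread's state, append the new message, save it back."""
    state = checkpointer.get(config)
    new_state = {"messages": state["messages"] + [user_text]}
    checkpointer.put(config, new_state)
    return new_state


store = ToyCheckpointer()
lance = {"configurable": {"thread_id": "1"}}
other = {"configurable": {"thread_id": "2"}}

invoke(store, lance, "Hi! I'm Lance")
invoke(store, lance, "What's my name?")
invoke(store, other, "Hello from another thread")

thread_1 = store.get(lance)["messages"]
thread_2 = store.get(other)["messages"]
print(len(thread_1), len(thread_2))
```

Each thread ID accumulates its own history, so reusing the same `config` resumes a conversation and a new ID starts a fresh one.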

![state.jpg](https://cdn.prod.website-files.com/65b8cd72835ceeacd4449a53/66dbadf3b379c2ee621adfd1_chatbot-summarization1.png)

In [None]:
# ---------------------------
# 1️⃣ Imports
# ---------------------------
from langgraph.graph import MessagesState, StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
from langchain_core.messages import SystemMessage, HumanMessage, RemoveMessage

# ---------------------------
# 2️⃣ Mock LLM to avoid API errors
# ---------------------------
from langchain_core.messages import AIMessage

class MockModel:
    """Fake LLM that simulates responses without an API key"""
    def invoke(self, messages):
        # Echo the most recent message in the prompt
        last_msg = messages[-1].content if messages else ""
        # Return a one-element list holding an AI reply (a model reply is not a human message)
        return [AIMessage(content=f"(Mock response) {last_msg}")]

model = MockModel()  # Use the mock model: safe, no API calls

# ---------------------------
# 3️⃣ Define State
# ---------------------------
class State(MessagesState):
    context_recap: str  # Stores bullet-style recap

# ---------------------------
# 4️⃣ Define Nodes
# ---------------------------
def call_model(state: State):
    context_recap = state.get("context_recap", "")
    if context_recap:
        system_msg = f"Brief recap of earlier chat: {context_recap}"
        messages = [SystemMessage(content=system_msg)] + state["messages"]
    else:
        messages = state["messages"]
    response = model.invoke(messages)
    return {"messages": response}

def summarize_conversation(state: State):
    context_recap = state.get("context_recap", "")
    if context_recap:
        recap_prompt = f"Previous recap:\n{context_recap}\n\nUpdate it with latest conversation in bullet points."
    else:
        recap_prompt = "Write a short context recap in bullet points for the conversation above."
    messages = state["messages"] + [HumanMessage(content=recap_prompt)]
    response = model.invoke(messages)
    # Keep last 3 messages for memory
    delete_messages = [RemoveMessage(id=m.id) for m in state["messages"][:-3]]
    return {"context_recap": response[0].content, "messages": delete_messages}

def should_continue(state: State):
    messages = state["messages"]
    total_chars = sum(len(m.content) for m in messages if hasattr(m, "content"))
    # Trigger recap if conversation is long
    if len(messages) > 6 or total_chars > 800:
        return "generate_recap"
    return END

# ---------------------------
# 5️⃣ Build Graph
# ---------------------------
workflow = StateGraph(State)
workflow.add_node("chat_flow", call_model)
workflow.add_node("generate_recap", summarize_conversation)
workflow.add_edge(START, "chat_flow")
workflow.add_conditional_edges("chat_flow", should_continue)
workflow.add_edge("generate_recap", END)

memory = MemorySaver()
graph = workflow.compile(checkpointer=memory)

# ---------------------------
# 6️⃣ Conversation Thread
# ---------------------------
config = {"configurable": {"thread_id": "1"}}

messages_to_send = [
    "Hi! I'm Lance",
    "What's my name?",
    "I like the 49ers!"
]

for msg in messages_to_send:
    input_message = HumanMessage(content=msg)
    output = graph.invoke({"messages": [input_message]}, config)
    for m in output['messages'][-1:]:
        m.pretty_print()

# ---------------------------
# 7️⃣ Show Context Recap
# ---------------------------
print("\n--- Current Context Recap ---")
print(graph.get_state(config).values.get("context_recap", "No recap yet."))



(Mock response) Hi! I'm Lance

(Mock response) What's my name?

(Mock response) I like the 49ers!

--- Current Context Recap ---
No recap yet.


We don't yet have a summary because the state still holds only 6 messages and well under 800 characters of text.

These thresholds were set in `should_continue`.

```
    # Trigger recap if conversation is long
    if len(messages) > 6 or total_chars > 800:
        return "generate_recap"
```
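
The arithmetic behind the count threshold can be traced with a short simulation. This is a simplified sketch: `simulate` is a hypothetical helper that assumes every turn adds exactly one human message and one model reply, that a recap keeps the last 3 messages, and that the character threshold is never reached.

```python
def simulate(turns, max_messages=6, keep_last=3):
    """Return the number of messages left in state after each turn."""
    count = 0
    counts = []
    for _ in range(turns):
        count += 2  # one human message plus one model reply
        if count > max_messages:
            count = keep_last  # the recap step removes all but the last few
        counts.append(count)
    return counts


print(simulate(6))  # [2, 4, 6, 3, 5, 3]
```

Under these assumptions the first recap happens on the fourth turn, when the history reaches 8 messages, and every second turn after that.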

We can pick up the conversation because we have the thread.

In [70]:
graph.get_state(config).values.get("context_recap", "")


'(Mock response) Write a short context recap in bullet points for the conversation above.'

The `config` with thread ID allows us to proceed from the previously logged state!

In [72]:
# ---------------------------
# 1️⃣ Send a new message to the graph
# ---------------------------
new_message = "I like Nick Bosa! Is he currently the highest-paid defensive player?"
input_message = HumanMessage(content=new_message)

# Invoke the graph and get output
output = graph.invoke({"messages": [input_message]}, config)

# ---------------------------
# 2️⃣ Print all new responses neatly
# ---------------------------
print("\n--- Model Responses ---")
for idx, msg in enumerate(output['messages'][-1:], start=1):
    print(f"Response {idx}:")
    msg.pretty_print()

# ---------------------------
# 3️⃣ Display updated context recap in a clean format
# ---------------------------
recap = graph.get_state(config).values.get("context_recap", "No recap yet.")
print("\n--- Updated Context Recap (Bullet Points) ---")
print(recap)



--- Model Responses ---
Response 1:

(Mock response) I like Nick Bosa! Is he currently the highest-paid defensive player?

--- Updated Context Recap (Bullet Points) ---
(Mock response) Previous recap:
(Mock response) Write a short context recap in bullet points for the conversation above.

Update it with latest conversation in bullet points.


In [73]:
graph.get_state(config).values.get("context_recap", "")


'(Mock response) Previous recap:\n(Mock response) Write a short context recap in bullet points for the conversation above.\n\nUpdate it with latest conversation in bullet points.'

## LangSmith

Let's review the trace!