# Build a Chatbot

:::note

This tutorial previously used the [RunnableWithMessageHistory](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.history.RunnableWithMessageHistory.html) abstraction. You can access that version of the documentation in the [v0.2 docs](https://python.langchain.com/v0.2/docs/tutorials/chatbot/).

As of the v0.3 release of LangChain, we recommend that LangChain users take advantage of [LangGraph persistence](https://langchain-ai.github.io/langgraph/concepts/persistence/) to incorporate `memory` into new LangChain applications.

If your code already relies on `RunnableWithMessageHistory` or `BaseChatMessageHistory`, you do **not** need to make any changes. We do not plan on deprecating this functionality in the near future, as it works for simple chat applications, and any code that uses `RunnableWithMessageHistory` will continue to work as expected.

Please see [How to migrate to LangGraph Memory](/docs/versions/migrating_memory/) for more details.
:::

## Overview

We'll go over an example of how to design and implement an LLM-powered chatbot. 
This chatbot will be able to have a conversation and remember previous interactions with a [chat model](/docs/concepts/chat_models).


Note that the chatbot we build will use only the language model to hold a conversation.
There are several other related concepts that you may be looking for:

- [Conversational RAG](/docs/tutorials/qa_chat_history): Enable a chatbot experience over an external source of data
- [Agents](/docs/tutorials/agents): Build a chatbot that can take actions

This tutorial covers the basics that both of those more advanced topics build on, but feel free to skip directly to them if you prefer.

## Setup

### Jupyter Notebook

This guide (and most of the other guides in the documentation) uses [Jupyter notebooks](https://jupyter.org/) and assumes the reader does as well. Jupyter notebooks are well suited to learning how to work with LLM systems because things can often go wrong (unexpected output, API down, etc.), and going through guides in an interactive environment is a great way to better understand them.

See [here](https://jupyter.org/install) for instructions on how to install Jupyter.

### Installation

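For this tutorial we will need `langchain-core` and `langgraph`. This guide requires `langgraph >= 0.2.28`. The code cells below also import `langchain_openai` and `dotenv`, which are provided by the `langchain-openai` and `python-dotenv` packages.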

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from "@theme/CodeBlock";

<Tabs>
  <TabItem value="pip" label="Pip" default>
    <CodeBlock language="bash">pip install langchain-core "langgraph>=0.2.28"</CodeBlock>
  </TabItem>
  <TabItem value="conda" label="Conda">
    <CodeBlock language="bash">conda install langchain-core "langgraph>=0.2.28" -c conda-forge</CodeBlock>
  </TabItem>
</Tabs>



For more details, see our [Installation guide](/docs/how_to/installation).
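
To confirm that the installed `langgraph` meets the minimum version stated above, a small standard-library check can help. This is a minimal sketch: `parse_version`, `meets_minimum`, and `check_package` are hypothetical helper names, and the comparison only looks at the leading numeric parts of the version string.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.2.28' into (0, 2, 28); stops at the first part without leading digits."""
    parts = []
    for piece in text.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two version strings by their numeric parts only."""
    return parse_version(installed) >= parse_version(minimum)


def check_package(name: str, minimum: str) -> str:
    """Report whether `name` is installed and at least `minimum`."""
    try:
        installed = version(name)
    except PackageNotFoundError:
        return f"{name} is not installed"
    status = "OK" if meets_minimum(installed, minimum) else f"too old, need >= {minimum}"
    return f"{name} {installed}: {status}"


print(check_package("langgraph", "0.2.28"))
```

Pre-release suffixes such as `rc1` are ignored by this sketch, so treat the result as a quick sanity check rather than a full version comparison.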

In [4]:
import dotenv

dotenv.load_dotenv()

True

In [6]:
from langchain_openai import ChatOpenAI

# Initialize OpenAI's GPT-4o-mini model through LangChain
# gpt-4o-mini: Cost-effective, fast model suitable for most conversational tasks
model = ChatOpenAI(model="gpt-4o-mini")

Let's first use the model directly. `ChatModel`s are instances of LangChain "Runnables", which means they expose a standard interface for interacting with them. To call the model, we can pass a list of messages to the `.invoke` method.

In [10]:
from langchain_core.messages import HumanMessage

# Direct model invocation with a single message
# Creates a HumanMessage object and sends it to the model
# Returns AI response without any conversation history or context
model.invoke([HumanMessage(content="Hi! I'm Chadi")]).content

'Hi Chadi! How can I assist you today?'

The model on its own does not have any concept of state. For example, if you ask a follow-up question:

In [12]:
# Direct model call asking for the name - the model cannot recall the previous interaction
# No conversation history is passed, so the model doesn't know the user introduced themselves as "Chadi"
# Demonstrates stateless behavior: each invoke() call is independent
model.invoke([HumanMessage(content="What's my name?")]).content

"I'm sorry, but I don't have access to personal information about you unless you share it with me. How can I assist you today?"

Let's take a look at the example [LangSmith trace](https://smith.langchain.com/public/5c21cb92-2814-4119-bae9-d02b8db577ac/r).

We can see that it doesn't take the previous conversation turn into account, and cannot answer the question.
This makes for a terrible chatbot experience!

To get around this, we need to pass the entire [conversation history](/docs/concepts/chat_history) into the model. Let's see what happens when we do that:

In [16]:
from langchain_core.messages import AIMessage

# Manual conversation history management
# Includes full conversation context: introduction + AI response + current question
model.invoke(
    [
        HumanMessage(content="Hi! I'm Bob"),                           # Original introduction
        AIMessage(content="Hello Bob! How can I assist you today?"),   # Previous AI response
        HumanMessage(content="What's my name?"),                       # Current question
    ]
).content
# Model now has context and can answer "Bob" because full conversation is provided
# Demonstrates stateless model with manual state management

'Your name is Bob! How can I help you today?'

And now we can see that we get a good response!

This is the basic idea underpinning a chatbot's ability to interact conversationally.
So how do we best implement this?
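
One naive answer is to keep the list of messages by hand and resend it on every turn. The sketch below illustrates that idea without calling a real model: `Message`, `toy_model`, and `chat` are hypothetical stand-ins (not LangChain APIs), and the toy model can only "know" what appears in the list it is given.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "human" or "ai"
    content: str


def toy_model(messages: list[Message]) -> Message:
    """Hypothetical stand-in for a chat model: it only sees `messages`."""
    name = None
    for m in messages:
        if m.role == "human" and "I'm " in m.content:
            name = m.content.split("I'm ", 1)[1].rstrip(".!")
    if messages[-1].content == "What's my name?":
        reply = f"Your name is {name}." if name else "I don't know your name."
    else:
        reply = "Hello!"
    return Message("ai", reply)


history: list[Message] = []


def chat(user_text: str) -> str:
    """Append the user turn, call the model on the full history, store the reply."""
    history.append(Message("human", user_text))
    response = toy_model(history)
    history.append(response)
    return response.content


# Stateless call: no history, so the name is unknown.
print(toy_model([Message("human", "What's my name?")]).content)

# Manually managed history: the second turn can see the first.
print(chat("Hi! I'm Bob"))
print(chat("What's my name?"))
```

Managing the list yourself works for a single conversation, but it becomes tedious once there are several users or the history needs to survive between calls, which is what the next section addresses.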

## Message persistence

[LangGraph](https://langchain-ai.github.io/langgraph/) implements a built-in persistence layer, making it ideal for chat applications that support multiple conversational turns.

Wrapping our chat model in a minimal LangGraph application allows us to automatically persist the message history, simplifying the development of multi-turn applications.

LangGraph comes with a simple in-memory checkpointer, which we use below. See its [documentation](https://langchain-ai.github.io/langgraph/concepts/persistence/) for more detail, including how to use different persistence backends (e.g., SQLite or Postgres).

In [18]:
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

# Create a stateful conversation workflow
workflow = StateGraph(state_schema=MessagesState)

# Define function that processes conversation state
def call_model(state: MessagesState):
    # Send entire conversation history to model
    response = model.invoke(state["messages"])
    # Return response wrapped in messages format for state update
    return {"messages": response}

# Build workflow graph with single processing node
workflow.add_edge(START, "model")        # Route from start directly to model
workflow.add_node("model", call_model)   # Define model node that runs call_model function

# Add persistent memory capability
memory = MemorySaver()                   # Creates in-memory conversation storage
app = workflow.compile(checkpointer=memory)  # Compile workflow with memory checkpointing

# Result: Stateful chatbot that automatically manages conversation history

We now need to create a `config` that we pass into the runnable every time. This config contains information that is not part of the input directly, but is still useful. In this case, we want to include a `thread_id`. This should look like:

In [21]:
# Configuration for conversation thread management
# thread_id: "abc123" creates unique conversation session for memory persistence
config = {"configurable": {"thread_id": "abc123"}} 

This enables us to support multiple conversation threads with a single application, a common requirement when your application has multiple users.
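
Conceptually, the `thread_id` acts as a key into the stored histories. The following is a simplified mental model only, not how LangGraph's checkpointers are implemented: `ToyCheckpointer` and its `load` and `append` methods are hypothetical names invented for this sketch.

```python
from collections import defaultdict


class ToyCheckpointer:
    """Hypothetical in-memory store: one message list per thread_id."""

    def __init__(self) -> None:
        self._threads: dict[str, list[str]] = defaultdict(list)

    def load(self, config: dict) -> list[str]:
        """Return a copy of the history stored for this config's thread."""
        thread_id = config["configurable"]["thread_id"]
        return list(self._threads[thread_id])

    def append(self, config: dict, messages: list[str]) -> None:
        """Add new messages to the history of this config's thread."""
        thread_id = config["configurable"]["thread_id"]
        self._threads[thread_id].extend(messages)


store = ToyCheckpointer()
thread_a = {"configurable": {"thread_id": "abc123"}}
thread_b = {"configurable": {"thread_id": "abc234"}}

store.append(thread_a, ["Hi! I'm Chadi."])
store.append(thread_b, ["What's my name?"])
store.append(thread_a, ["What's my name?"])

print(store.load(thread_a))  # both messages from thread abc123
print(store.load(thread_b))  # only the message from thread abc234
```

Because each thread has its own list, a question asked in one thread never sees messages from another.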

We can then invoke the application:

In [24]:
# Execute workflow with user introduction
query = "Hi! I'm Chadi."                           # User's greeting message
input_messages = [HumanMessage(query)]             # Wrap the text in a HumanMessage
output = app.invoke({"messages": input_messages}, config)  # Run workflow with thread config

# Display AI's response
output["messages"][-1].pretty_print()              # Get last message (AI response) and format nicely
# Note: output contains full conversation state with both input and AI messages


Hi Chadi! How can I assist you today?


In [26]:
# Test memory functionality - ask for previously mentioned information
query = "What's my name?"                          # Question about earlier introduction
input_messages = [HumanMessage(query)]             # Wrap the text in a HumanMessage
output = app.invoke({"messages": input_messages}, config)  # Use same thread_id "abc123"

# Display AI's response - should remember "Chadi" from previous interaction
output["messages"][-1].pretty_print()              # AI can access stored conversation history
# Memory allows AI to recall user introduced themselves as "Chadi" in this thread


Your name is Chadi. How can I help you today, Chadi?


Great! Our chatbot now remembers things about us. If we change the config to reference a different `thread_id`, we can see that it starts the conversation fresh.

In [28]:
# Test thread isolation - use different thread_id with same question
config = {"configurable": {"thread_id": "abc234"}}  # New thread - no previous memory
input_messages = [HumanMessage(query)]              # Same question: "What's my name?"
output = app.invoke({"messages": input_messages}, config)  # Invoke with new thread

# AI won't know the name since this is a fresh conversation thread
output["messages"][-1].pretty_print()               # No access to "Chadi" from thread "abc123"
# Demonstrates memory isolation: each thread_id maintains separate conversation history


I'm sorry, but I don't know your name. If you'd like to share it, feel free!


However, we can always go back to the original conversation (since the checkpointer is persisting it):

In [30]:
# Switch back to original thread - memory should be restored
config = {"configurable": {"thread_id": "abc123"}}  # Return to thread where user said "Hi! I'm Chadi"
input_messages = [HumanMessage(query)]              # Same question: "What's my name?"
output = app.invoke({"messages": input_messages}, config)  # Use original thread config

# AI should remember "Chadi" again - memory persists across thread switches
output["messages"][-1].pretty_print()               # Access to conversation history restored
# Demonstrates persistent memory: returning to same thread_id retrieves stored context


Your name is Chadi. If there's anything else you'd like to talk about, feel free to let me know!


Right now, all we've done is add a simple persistence layer around the model. We can start to make the chatbot more complicated and personalized by adding in a prompt template.

## Prompt templates

[Prompt Templates](/docs/concepts/prompt_templates) help to turn raw user information into a format that the LLM can work with. In this case, the raw user input is just a message, which we are passing to the LLM. Let's now make that a bit more complicated. First, let's add in a system message with some custom instructions (but still taking messages as input). Next, we'll add in more input besides just the messages.

To add in a system message, we will create a `ChatPromptTemplate`. We will utilize `MessagesPlaceholder` to pass all the messages in.

In [32]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Create structured prompt template with pirate personality
prompt_template = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You talk like a pirate. Answer all questions to the best of your ability.",
        ),                                                    # System message sets AI behavior/personality
        MessagesPlaceholder(variable_name="messages"),        # Placeholder for conversation history
    ]
)

# Template structure when invoked:
# System: You talk like a pirate. Answer all questions...
# [Full conversation history inserted here]
# Results in pirate-themed responses while maintaining conversation context
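
As a rough mental model of what this template produces, the system message is placed first and the conversation is spliced in where the placeholder sits. The sketch below shows that shape with plain tuples; `render_prompt` is a hypothetical helper, not part of LangChain, and it skips everything else a real prompt template does.

```python
def render_prompt(
    system_text: str, messages: list[tuple[str, str]]
) -> list[tuple[str, str]]:
    """Hypothetical helper: put one system message in front of the conversation."""
    return [("system", system_text)] + list(messages)


conversation = [
    ("human", "Hi! I'm Jim."),
    ("ai", "Ahoy there, Jim!"),
    ("human", "What is my name?"),
]

prompt = render_prompt(
    "You talk like a pirate. Answer all questions to the best of your ability.",
    conversation,
)

for role, text in prompt:
    print(f"{role}: {text}")
```

The conversation list itself is left unchanged, so the system instructions are applied on every call without being stored in the history.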

We can now update our application to incorporate this template:

In [35]:
# Create workflow with prompt templating for personality injection
workflow = StateGraph(state_schema=MessagesState)

def call_model(state: MessagesState):
    # highlight-start
    prompt = prompt_template.invoke(state)        # Format conversation with pirate system message
    response = model.invoke(prompt)               # Send formatted prompt (not raw messages) to model
    # highlight-end
    return {"messages": response}

# Build workflow structure
workflow.add_edge(START, "model")                 # Route from start to model
workflow.add_node("model", call_model)            # Define model processing node

# Add persistent memory and compile
memory = MemorySaver()                            # Memory storage for conversation history
app = workflow.compile(checkpointer=memory)       # Compile with memory checkpointing

# Result: Stateful pirate chatbot that applies personality to all responses
# Key difference: prompt_template wraps conversation history with system instructions

We invoke the application in the same way:

In [38]:
# Test pirate-themed workflow with user introduction
config = {"configurable": {"thread_id": "abc345"}}  # New conversation thread
query = "Hi! I'm Jim."                              # User introduction
input_messages = [HumanMessage(query)]              # Wrap the text in a HumanMessage
output = app.invoke({"messages": input_messages}, config)  # Run pirate workflow

# Display AI's pirate-themed response to Jim's greeting
output["messages"][-1].pretty_print()               # Should respond like a pirate: "Ahoy there, Jim!" etc.
# Prompt template applies pirate personality while storing conversation in memory


Ahoy there, Jim! What brings ye to these waters today? Speak up, and I'll do me best to assist ye! Arrr!


In [40]:
# Test pirate workflow memory - ask for previously mentioned name
query = "What is my name?"                          # Question about earlier introduction
input_messages = [HumanMessage(query)]              # Wrap the text in a HumanMessage
output = app.invoke({"messages": input_messages}, config)  # Use same thread_id "abc345"

# AI should remember "Jim" AND respond in pirate style
output["messages"][-1].pretty_print()               # Expected: "Yer name be Jim, matey!" or similar
# Demonstrates both memory persistence and personality consistency


Yer name be Jim, savvy? What else can I do fer ye, matey? Arrr!


Awesome! Let's now make our prompt a little bit more complicated. Let's assume that the prompt template now looks something like this:

In [42]:
# Create prompt template with dynamic language support
prompt_template = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Answer all questions to the best of your ability in {language}.",
        ),                                                    # System message with language variable placeholder
        MessagesPlaceholder(variable_name="messages"),        # Placeholder for conversation history
    ]
)

# Template requires both 'messages' and 'language' parameters when invoked
# Example: prompt_template.invoke({"messages": [...], "language": "Spanish"})
# Results in multilingual responses while maintaining conversation context

Note that we have added a new `language` input to the prompt. Our application now has two parameters: the input `messages` and `language`. We should update our application's state to reflect this:

In [44]:
from typing import Sequence
from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages
from typing_extensions import Annotated, TypedDict

# Define a custom state schema: same `messages` field as MessagesState, plus a `language` field
# highlight-next-line
class State(TypedDict):
    # highlight-next-line
    messages: Annotated[Sequence[BaseMessage], add_messages]  # Conversation history with message aggregation
    # highlight-next-line
    language: str                                             # Language preference stored in state

# Create workflow with custom state schema
workflow = StateGraph(state_schema=State)

def call_model(state: State):
    prompt = prompt_template.invoke(state)                    # Uses both messages and language from state
    response = model.invoke(prompt)                           # Send formatted prompt to model
    return {"messages": [response]}                           # Return only messages (language persists in state)

# Build workflow structure
workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

# Compile with memory
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

# Result: Stateful multilingual chatbot that remembers both conversation history and language preference

In [46]:
# Execute multilingual workflow with language preference
config = {"configurable": {"thread_id": "abc456"}}  # New conversation thread
query = "Hi! I'm chadi."                              # User introduction
language = "arabic"                                # Language preference
input_messages = [HumanMessage(query)]              # Format user message

output = app.invoke(
    # highlight-next-line
    {"messages": input_messages, "language": language},  # Pass both messages AND language to state
    config,
)

# AI responds to Chadi's greeting in Arabic
output["messages"][-1].pretty_print()               # Expected: an Arabic greeting addressed to Chadi
# Language preference is now stored in thread state for future interactions


مرحبًا شادي! كيف يمكنني مساعدتك اليوم؟


Note that the entire state is persisted, so we can omit parameters like `language` if no changes are desired:

In [48]:
# Test memory persistence for both name and language preference
query = "What is my name?"                          # Ask for previously mentioned name
input_messages = [HumanMessage(query)]              # Format user message
output = app.invoke(
    {"messages": input_messages},                    # Only pass messages - no language specified
    config,
)

# AI should remember "Chadi" AND continue using Arabic from the previous state
output["messages"][-1].pretty_print()               # Expected: an Arabic response that states the name
# Language preference persists in thread memory even when not explicitly passed


اسمك هو شادي.


To help you understand what's happening internally, check out [this LangSmith trace](https://smith.langchain.com/public/15bd8589-005c-4812-b9b9-23e74ba4c3c6/r).
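
The behaviour seen in the two calls above (new messages are appended, while `language` keeps its last value when omitted) can be pictured with a small merge function. This is a simplified sketch under stated assumptions: `merge_state` is a hypothetical name, and the real `add_messages` reducer does more than plain concatenation (for example, it can replace messages that share an ID).

```python
def merge_state(saved: dict, update: dict) -> dict:
    """Hypothetical merge: append to `messages`, overwrite any other key."""
    merged = dict(saved)
    for key, value in update.items():
        if key == "messages":
            merged["messages"] = saved.get("messages", []) + list(value)
        else:
            merged[key] = value
    return merged


state: dict = {}

# First call: both keys are supplied.
state = merge_state(state, {"messages": ["Hi! I'm chadi."], "language": "arabic"})

# Second call: only messages are supplied, so `language` is left untouched.
state = merge_state(state, {"messages": ["What is my name?"]})

print(state["language"])
print(state["messages"])
```

Keys without a reducer behave like ordinary overwrites here, which is why passing `language` again would simply change the response language for later turns.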

## Managing Conversation History

One important concept to understand when building chatbots is how to manage conversation history. If left unmanaged, the list of messages will grow unbounded and potentially overflow the context window of the LLM. Therefore, it is important to add a step that limits the size of the messages you are passing in.

**Importantly, you will want to do this BEFORE the prompt template but AFTER you load previous messages from the persisted message history.**

We can do this by adding a simple step in front of the prompt that modifies the `messages` key appropriately, inside the graph node that calls the model.

LangChain comes with a few built-in helpers for [managing a list of messages](/docs/how_to/#messages). In this case we'll use the [trim_messages](/docs/how_to/trim_messages/) helper to reduce how many messages we're sending to the model. The trimmer allows us to specify how many tokens we want to keep, along with other parameters such as whether to always keep the system message and whether to allow partial messages:

In [50]:
from langchain_core.messages import SystemMessage, trim_messages

# Configure message trimmer for context window management
trimmer = trim_messages(
    max_tokens=65,           # Limit conversation to 65 tokens maximum
    strategy="last",         # Keep most recent messages, discard oldest
    token_counter=model,     # Use model's tokenizer for accurate token counting
    include_system=True,     # Always preserve system message regardless of token limit
    allow_partial=False,     # Keep only whole messages; never include a partially truncated one
    start_on="human",        # When trimming, ensure context begins with human message
)

# Example long conversation history (11 messages total)
messages = [
    SystemMessage(content="you're a good assistant"),  # System prompt (always kept)
    HumanMessage(content="hi! I'm bob"),               # Initial greeting
    AIMessage(content="hi!"),                          # AI response
    HumanMessage(content="I like vanilla ice cream"),  # User info
    AIMessage(content="nice"),                         # AI acknowledgment
    HumanMessage(content="whats 2 + 2"),              # Math question
    AIMessage(content="4"),                            # AI answer
    HumanMessage(content="thanks"),                    # User thanks
    AIMessage(content="no problem!"),                 # AI response
    HumanMessage(content="having fun?"),               # Recent question
    AIMessage(content="yes!"),                         # Latest response
]

# Apply trimming - returns only messages that fit within 65-token limit
# Keeps system message + most recent exchanges, discards older messages
trimmer.invoke(messages)

[SystemMessage(content="you're a good assistant", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='whats 2 + 2', additional_kwargs={}, response_metadata={}),
 AIMessage(content='4', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='thanks', additional_kwargs={}, response_metadata={}),
 AIMessage(content='no problem!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='having fun?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='yes!', additional_kwargs={}, response_metadata={})]
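
To make the "last" strategy more concrete, here is a simplified, library-free sketch of the same idea: keep the system message, then walk backwards from the newest message until the token budget is used up, and finally make sure the kept window starts on a human message. The names `count_tokens` and `trim_last` are hypothetical, and counting one token per whitespace-separated word is an assumption made for illustration; `trim_messages` with a real tokenizer will produce different counts.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def trim_last(
    messages: list[tuple[str, str]], max_tokens: int
) -> list[tuple[str, str]]:
    """Keep the leading system message plus the newest messages that fit the budget."""
    system = [messages[0]] if messages and messages[0][0] == "system" else []
    rest = messages[len(system):]
    budget = max_tokens - sum(count_tokens(text) for _, text in system)

    kept: list[tuple[str, str]] = []
    for role, text in reversed(rest):       # newest first
        cost = count_tokens(text)
        if cost > budget:
            break                           # whole messages only, no partial ones
        kept.append((role, text))
        budget -= cost
    kept.reverse()                          # restore chronological order

    while kept and kept[0][0] != "human":   # start the window on a human message
        kept.pop(0)
    return system + kept


toy_messages = [
    ("system", "you're a good assistant"),
    ("human", "hi! I'm bob"),
    ("ai", "hi!"),
    ("human", "I like vanilla ice cream"),
    ("ai", "nice"),
    ("human", "whats 2 + 2"),
    ("ai", "4"),
    ("human", "thanks"),
    ("ai", "no problem!"),
    ("human", "having fun?"),
    ("ai", "yes!"),
]

trimmed = trim_last(toy_messages, max_tokens=12)
for role, text in trimmed:
    print(f"{role}: {text}")
```

With this toy word counter and a budget of 12, the answer "4" would fit but its question would not, so the final step drops it to keep the window starting on a human turn.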

To use it in our chain, we just need to run the trimmer before we pass the `messages` input to our prompt.

In [52]:
# Create workflow with memory trimming and multilingual support
workflow = StateGraph(state_schema=State)

def call_model(state: State):
    # highlight-start
    trimmed_messages = trimmer.invoke(state["messages"])      # Trim conversation to fit token limit
    prompt = prompt_template.invoke(
        {"messages": trimmed_messages, "language": state["language"]}  # Format with trimmed messages + language
    )
    response = model.invoke(prompt)                           # Send optimized prompt to model
    # highlight-end
    return {"messages": [response]}

# Build workflow structure
workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

# Compile with persistent memory
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

# Result: Advanced chatbot with:
# - Per-thread conversation memory (held by the in-memory checkpointer)
# - Automatic context window management (token trimming)
# - Multilingual support with language preference memory
# - System message preservation during trimming

Now if we try asking the model our name, it won't know it since we trimmed that part of the chat history:

In [54]:
# Test trimming functionality with long conversation history
config = {"configurable": {"thread_id": "abc567"}}  # New conversation thread
query = "What is my name?"                          # Ask for name from conversation history
language = "arabic"                                # Set response language

# highlight-next-line
input_messages = messages + [HumanMessage(query)]   # Combine full 11-message history + new question (12 total)

output = app.invoke(
    {"messages": input_messages, "language": language},  # Pass long message history to workflow
    config,
)

# AI may not remember "bob" if trimmer discarded early messages due to token limit
output["messages"][-1].pretty_print()               # Response depends on what trimmer kept from 65-token limit
# Demonstrates automatic context management: older context gets trimmed to fit model limits


عذراً، لا أستطيع معرفة اسمك. لكنني هنا لمساعدتك في أي شيء تحتاجه!


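If we ask about information that is still inside the trimmed window, the model can answer. Whether a given exchange survives depends on the token budget, because the newly appended question also counts toward the 65 tokens. In the run below, the model's reply suggests that the math exchange was trimmed as well: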

In [56]:
# Test trimming with query about mid-conversation content
config = {"configurable": {"thread_id": "abc678"}}  # New conversation thread
query = "What math problem did I ask?"              # Ask about math question from message history
language = "English"                                # Set response language

input_messages = messages + [HumanMessage(query)]   # Combine 11-message history + new question (12 total)

output = app.invoke(
    {"messages": input_messages, "language": language},  # Pass long conversation to workflow
    config,
)

# The math exchange is recalled only if it survives trimming; the new question also counts toward the 65-token limit
output["messages"][-1].pretty_print()               # In this run the model did not recall "2 + 2"
# Demonstrates that the "last" strategy keeps only the most recent messages that fit the budget


It seems you haven't asked a math problem yet. If you have one in mind, feel free to share it, and I'll be happy to help!


If you take a look at LangSmith, you can see exactly what is happening under the hood in the [LangSmith trace](https://smith.langchain.com/public/04402eaa-29e6-4bb1-aa91-885b730b6c21/r).

## Streaming

Now we've got a functioning chatbot. However, one *really* important UX consideration for chatbot applications is streaming. LLMs can take a while to respond, so to improve the user experience most applications stream back each token as it is generated. This allows the user to see progress.

It's actually super easy to do this!

By default, `.stream` in our LangGraph application streams application steps (in this case, the single step of the model response). Setting `stream_mode="messages"` allows us to stream output tokens instead:

In [61]:
# Demonstrate streaming response functionality
config = {"configurable": {"thread_id": "abc789"}}  # New conversation thread
query = "Hi I'm Todd, please tell me a joke."       # User introduction + joke request
language = "English"                                # Set response language
input_messages = [HumanMessage(query)]              # Format user message

# highlight-next-line
for chunk, metadata in app.stream(                  # Stream response in real-time chunks
    {"messages": input_messages, "language": language},
    config,
    # highlight-next-line
    stream_mode="messages",                          # Stream LLM output token by token, as message chunks
):
    if isinstance(chunk, AIMessage):                 # Filter to only AI response chunks
        print(chunk.content, end="|")                # Print each chunk with "|" separator

# Result: Real-time streaming of AI joke response, showing text as it's generated
# Useful for long responses to provide immediate user feedback instead of waiting for completion

|Hi| Todd|!| Here's| another| joke| for| you|:

|Why| don't| scientists| trust| atoms|?

|Because| they| make| up| everything|!||
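
The consumer side of streaming is just a loop over an iterator that yields chunks as they become available. The sketch below imitates that pattern with a plain generator so it can run without a model: `fake_token_stream` is a hypothetical stand-in, and splitting on spaces is only an approximation of how real tokens are chunked.

```python
import time
from typing import Iterator


def fake_token_stream(text: str, delay: float = 0.0) -> Iterator[str]:
    """Hypothetical model stream: yield one word-sized chunk at a time."""
    for i, word in enumerate(text.split(" ")):
        time.sleep(delay)                   # simulate generation latency
        yield word if i == 0 else " " + word


received: list[str] = []
for chunk in fake_token_stream("Why don't scientists trust atoms?"):
    received.append(chunk)                  # keep chunks to rebuild the full reply
    print(chunk, end="|", flush=True)       # show progress as soon as a chunk arrives
print()

full_reply = "".join(received)
```

Collecting the chunks while printing them is a common pattern, since the complete reply is usually still needed once the stream ends.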

## Next Steps

Now that you understand the basics of how to create a chatbot in LangChain, some more advanced tutorials you may be interested in are:

- [Conversational RAG](/docs/tutorials/qa_chat_history): Enable a chatbot experience over an external source of data
- [Agents](/docs/tutorials/agents): Build a chatbot that can take actions

If you want to dive deeper on specifics, some things worth checking out are:

- [Streaming](/docs/how_to/streaming): streaming is *crucial* for chat applications
- [How to add message history](/docs/how_to/message_history): for a deeper dive into all things related to message history
- [How to manage large message history](/docs/how_to/trim_messages/): more techniques for managing a large chat history
- [LangGraph main docs](https://langchain-ai.github.io/langgraph/): for more detail on building with LangGraph