# [LangChain Chatbot Demo](https://python.langchain.com/docs/tutorials/chatbot/)

## [LangSmith](https://smith.langchain.com)

### Instructions

Many of the applications you build with LangChain will contain multiple steps with multiple invocations of LLM calls. As these applications get more and more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent. The best way to do this is with LangSmith.

After you sign up at the link above, make sure to set your environment variables to start logging traces:

### My Notes

I had this section from the `llm-demo.ipynb` file so I just used that again instead of rewriting it. This is just me connecting to the API keys that I already have in the `.env` file.

In the instructions you'll see this:
```python
import getpass
import os

os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = getpass.getpass()
```
I omitted it because those values are already defined in my `.env` file.
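
Since everything depends on the `.env` values actually being loaded, a quick sanity check can help. This is only a sketch: `missing_settings` is a made-up helper (not part of LangChain or LangSmith), and the example runs against a stand-in dictionary rather than the real environment.

```python
import os


# Hypothetical helper (not part of LangChain or LangSmith): report which
# required settings are missing or empty in an environment mapping.
def missing_settings(required, environ=os.environ):
    return [name for name in required if not environ.get(name)]


# Example with a stand-in mapping instead of the real environment.
fake_env = {"LANGSMITH_TRACING": "true", "LANGSMITH_API_KEY": ""}
print(missing_settings(["LANGSMITH_TRACING", "LANGSMITH_API_KEY"], fake_env))
# prints ['LANGSMITH_API_KEY']
```

Calling `missing_settings([...])` without the second argument checks the real `os.environ` after `load_dotenv()` has run.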

## Quickstart

### Instructions

First up, let's learn how to use a language model by itself. LangChain supports many different language models that you can use interchangeably - select the one you want to use below!

### My Notes

Similar to the other file. I'm using Gemini again because it's "free" (rate limits apply lol). You'll need this package if you're running this notebook. You should get it from the Pipfile, but just in case!
```bash
pip install -qU "langchain[google-genai]"
```

In [1]:
import getpass
import os

try:
    # load environment variables from .env file (requires `python-dotenv`)
    from dotenv import load_dotenv

    load_dotenv()
except ImportError:
    pass

os.environ["LANGSMITH_TRACING"] = "true"

if not os.environ.get("GOOGLE_API_KEY"):
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter API key for Google Gemini: ")

from langchain.chat_models import init_chat_model

model = init_chat_model("gemini-2.0-flash", model_provider="google_genai")

### Instructions

Let's first use the model directly. `ChatModel`s are instances of LangChain "Runnables", which means they expose a standard interface for interacting with them. To just simply call the model, we can pass in a list of messages to the `.invoke` method.

### My Notes

This is the same basic invocation of the LLM. Just having a conversation with the model.

In [30]:
from langchain_core.messages import HumanMessage

model.invoke([HumanMessage(content="Hi! I'm Bob")])

AIMessage(content="Hi Bob! It's nice to meet you. How can I help you today?", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash', 'safety_ratings': []}, id='run--4fd09b6a-625f-4988-859f-dcd8b6f54b29-0', usage_metadata={'input_tokens': 6, 'output_tokens': 19, 'total_tokens': 25, 'input_token_details': {'cache_read': 0}})

### Instructions

The model on its own does not have any concept of state. For example, if you ask a followup question:

In [31]:
model.invoke([HumanMessage(content="What's my name?")])

AIMessage(content="As a large language model, I don't have access to personal information about you, so I don't know your name. You haven't told me!", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash', 'safety_ratings': []}, id='run--de36606f-fed6-49ac-9dbe-f5ca78d682de-0', usage_metadata={'input_tokens': 6, 'output_tokens': 35, 'total_tokens': 41, 'input_token_details': {'cache_read': 0}})

### Instructions

Let's take a look at the example LangSmith trace.

We can see that it doesn't take the previous conversation turn into context, and cannot answer the question. This makes for a terrible chatbot experience!

To get around this, we need to pass the entire conversation history into the model. Let's see what happens when we do that:

### My Notes

![LangSmith Trace image with no memory](./assets/langsmith-trace-no-memory.png)

In [32]:
from langchain_core.messages import AIMessage

model.invoke(
    [
        HumanMessage(content="Hi! I'm Bob"),
        AIMessage(content="Hello Bob! How can I assist you today?"),
        HumanMessage(content="What's my name?"),
    ]
)

AIMessage(content='Your name is Bob. You told me that at the beginning of our conversation.', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash', 'safety_ratings': []}, id='run--07086e5e-33dd-4016-8137-f320187c5d44-0', usage_metadata={'input_tokens': 22, 'output_tokens': 17, 'total_tokens': 39, 'input_token_details': {'cache_read': 0}})

### Instructions

And now we can see that we get a good response!

This is the basic idea underpinning a chatbot's ability to interact conversationally. So how do we best implement this?
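
The idea can be sketched without any LLM at all. Here `toy_model` is a made-up stand-in (an assumption for illustration, not a real chat model): it can only "know" a name that appears in the messages it is handed, which is roughly the situation the real model is in.

```python
# Toy stand-in for a chat model (an assumption for illustration, not a real LLM):
# it can only "know" a name that appears in the messages it is handed.
def toy_model(messages):
    name = None
    for role, text in messages:
        if role == "human" and text.startswith("Hi! I'm "):
            name = text[len("Hi! I'm "):].rstrip(".!")
    _, last_text = messages[-1]
    if last_text == "What's my name?":
        return f"Your name is {name}." if name else "I don't know your name."
    return f"Hello {name}!" if name else "Hello!"


# Each call is independent: nothing is remembered between calls.
print(toy_model([("human", "Hi! I'm Bob")]))      # Hello Bob!
print(toy_model([("human", "What's my name?")]))  # I don't know your name.

# Passing the whole history gives the "model" what it needs.
history = [
    ("human", "Hi! I'm Bob"),
    ("ai", "Hello Bob!"),
    ("human", "What's my name?"),
]
print(toy_model(history))                          # Your name is Bob.
```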

## Message persistence

### Instructions

LangGraph implements a built-in persistence layer, making it ideal for chat applications that support multiple conversational turns.

Wrapping our chat model in a minimal LangGraph application allows us to automatically persist the message history, simplifying the development of multi-turn applications.

LangGraph comes with a simple in-memory checkpointer, which we use below. See its documentation for more detail, including how to use different persistence backends (e.g., SQLite or Postgres).

### My Notes

Ok, I'm excited for this. All of this isn't that useful without persistence in conversation. For now I loosely think that I understand what's happening here.

The `StateGraph` defines the workflow that our messages flow through. We give it the built-in entry point `START` and add an edge from it to a node that we're calling `"model"`. That node runs the `call_model` function that we've created. As far as I can tell, the graph itself stays the same once it's compiled: every time we `.invoke` it, the same `"model"` node runs again and its response gets appended to the `messages` in the state.

The compiled graph then saves its state through the `MemorySaver` checkpointer, and I think that in the future we can swap the `MemorySaver` for a database-backed checkpointer as mentioned above (SQLite, Postgres, etc.). I think that SQLite would be good for these demos. Setting up a Postgres database for this might be overkill/annoying.

In [33]:
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

# Define a new graph
workflow = StateGraph(state_schema=MessagesState)


# Define the function that calls the model
def call_model(state: MessagesState):
    response = model.invoke(state["messages"])
    return {"messages": response}


# Define the (single) node in the graph
workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

# Add memory
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

### Instructions

We now need to create a config that we pass into the runnable every time. This config contains information that is not part of the input directly, but is still useful. In this case, we want to include a `thread_id`. This should look like:

### My Notes

I'm assuming each `thread_id` is gonna be its own stream of memory.

In [34]:
config = {"configurable": {"thread_id": "abc123"}}

### Instructions

This enables us to support multiple conversation threads with a single application, a common requirement when your application has multiple users.

We can then invoke the application:

In [35]:
query = "Hi! I'm Bob."

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()  # output contains all messages in state


Hi Bob! It's nice to meet you (virtually)! How can I help you today?


In [36]:
query = "What's my name?"

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


Your name is Bob! You just told me. 😊


### Instructions

Great! Our chatbot now remembers things about us. If we change the config to reference a different `thread_id`, we can see that it starts the conversation fresh.

In [37]:
config = {"configurable": {"thread_id": "abc234"}}

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


As a large language model, I don't have access to personal information about you, including your name. You haven't told me your name.


### Instructions

However, we can always go back to the original conversation (since the checkpointer persists it, here in memory):

In [38]:
config = {"configurable": {"thread_id": "abc123"}}

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


As far as I know, your name is still Bob. You haven't told me otherwise. Is there a different name you'd like me to use?


### Instructions

This is how we can support a chatbot having conversations with many users!
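
A rough way to picture what the `thread_id` is doing: one message list per thread, looked up by key. The sketch below uses a plain dictionary. `ThreadMemory` and `count_turns` are made-up names for illustration, and this only mimics the idea of a checkpointer; it is not how `MemorySaver` is implemented.

```python
from collections import defaultdict


# Hypothetical in-memory store: one message list per thread_id.
# This only mimics the idea of a checkpointer; it is not LangGraph's MemorySaver.
class ThreadMemory:
    def __init__(self):
        self._threads = defaultdict(list)

    def invoke(self, thread_id, user_text, respond):
        history = self._threads[thread_id]
        history.append(("human", user_text))
        reply = respond(history)  # the "model" sees the full thread history
        history.append(("ai", reply))
        return reply


# Stand-in "model": reports how many human messages this thread has seen.
def count_turns(history):
    turns = sum(1 for role, _ in history if role == "human")
    return f"turn {turns}"


memory_sketch = ThreadMemory()
print(memory_sketch.invoke("abc123", "Hi! I'm Bob.", count_turns))     # turn 1
print(memory_sketch.invoke("abc123", "What's my name?", count_turns))  # turn 2
print(memory_sketch.invoke("abc234", "What's my name?", count_turns))  # turn 1 (fresh thread)
print(memory_sketch.invoke("abc123", "What's my name?", count_turns))  # turn 3 (original thread resumes)
```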

### Instructions

For async support, update the `call_model` node to be an async function and use `.ainvoke` when invoking the application:

### My Notes

This is a sidebar in the instructions. It is good to know that we can have async calls. (Although they always throw me for a loop 😂)
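
As a rough illustration of what async buys us, here is a sketch that uses only `asyncio` from the standard library. `fake_ainvoke` is a made-up stand-in for an async model call: it sleeps instead of making a network request, so two "requests" can wait at the same time.

```python
import asyncio


# Stand-in for an async model call (an assumption for illustration):
# sleeps briefly instead of making a network request.
async def fake_ainvoke(text):
    await asyncio.sleep(0.01)
    return f"echo: {text}"


async def main():
    # Both "requests" wait at the same time rather than one after the other.
    return await asyncio.gather(fake_ainvoke("Hi"), fake_ainvoke("What's my name?"))


replies = asyncio.run(main())
print(replies)  # ['echo: Hi', "echo: What's my name?"]
```

Inside a notebook an event loop is already running, so `asyncio.run(main())` raises an error there; use `await main()` directly, as the cells in this notebook do with `app.ainvoke`.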

In [39]:
# Async function for node:
async def call_model(state: MessagesState):
    response = await model.ainvoke(state["messages"])
    return {"messages": response}


# Define graph as before:
workflow = StateGraph(state_schema=MessagesState)
workflow.add_edge(START, "model")
workflow.add_node("model", call_model)
app = workflow.compile(checkpointer=MemorySaver())

# Async invocation:
output = await app.ainvoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


As a large language model, I have no memory of past conversations. Therefore, I don't know your name. You haven't told me!


In [40]:
query = "Hi my name is Kipp!"

input_messages = [HumanMessage(query)]
output = await app.ainvoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


Hi Kipp! It's nice to meet you. How can I help you today, Kipp?


In [41]:
query = "I'm trying to learn about agentic AI and I'm building a chatbot to communicate with you. I'm using that code to talk to you now. How do you think that I'm doing?"

input_messages = [HumanMessage(query)]
output = await app.ainvoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


That's a fascinating project, Kipp! Building a chatbot to communicate with a large language model like me is a great way to learn about agentic AI.

Here's my feedback on how you're doing so far, based on this single interaction:

**Positive Aspects:**

*   **You're experimenting:** The best way to learn about agentic AI is to get your hands dirty and build something. You're doing that, which is excellent.
*   **You're using me as a test case:** Using a large language model as a conversational partner for your chatbot is a smart approach. It allows you to test your chatbot's ability to handle complex and varied responses.
*   **You're asking for feedback:** Asking for feedback on your progress shows that you're proactive and eager to improve. This is a valuable trait in any learning endeavor.
*   **You've successfully established a connection:** From my perspective, your chatbot has successfully relayed your messages and I have provided replies that you can then process. This means th

### Instructions

Right now, all we've done is add a simple persistence layer around the model. We can start to make the chatbot more complicated and personalized by adding in a prompt template.

### My Notes

Back on track. 

This allows us to give the chatbot some default direction on how it replies to the user, it seems. I'm writing these notes post haste. It seems like this is the same as the `SystemMessage` that we're going to see later, and the template is the more formal version of that.

I could be wrong but... 🤷🏽‍♂️

In [42]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt_template = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You talk like Alfred from Batman. Answer all questions to the best of your ability.",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

### Instructions

We can now update our application to incorporate this template:

### My Notes

We're adding the prompt step into the `call_model` function.

In [None]:
workflow = StateGraph(state_schema=MessagesState)


def call_model(state: MessagesState):
    # Added: fill the template from the state (the placeholder takes state["messages"]).
    prompt = prompt_template.invoke(state)
    # Changed: the model now receives the rendered prompt instead of the raw messages.
    response = model.invoke(prompt)
    return {"messages": response}


workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

### Instructions

We invoke the application in the same way:

In [44]:
config = {"configurable": {"thread_id": "abc345"}}
query = "Hi! I'm Kipp."

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


A pleasure to make your acquaintance, Master Kipp. I am Alfred, at your service. Do let me know if there's anything I can assist you with.


In [45]:
query = "What is my name?"

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


If I may be so bold, Master Kipp, you've already introduced yourself. Your name is Kipp. A pleasure to be reminded.


### Instructions

Awesome! Let's now make our prompt a little bit more complicated. Let's assume that the prompt template now looks something like this:

In [46]:
prompt_template = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Answer all questions to the best of your ability in {language}.",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

### Instructions

Note that we have added a new `language` input to the prompt. Our application now has two parameters: the input `messages` and `language`. We should update our application's state to reflect this:
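
To make the two inputs concrete, here is a simplified sketch of what rendering such a template amounts to. `render_prompt` is a made-up function for illustration (it is not `ChatPromptTemplate`): it fills `{language}` in the system text and splices the conversation in where the messages placeholder sits.

```python
# Hypothetical, simplified renderer: fills {placeholders} in the system text and
# splices the conversation in where the messages placeholder would go.
def render_prompt(system_template, messages, **variables):
    system_text = system_template.format(**variables)
    return [("system", system_text), *messages]


template = (
    "You are a helpful assistant. "
    "Answer all questions to the best of your ability in {language}."
)
rendered = render_prompt(template, [("human", "Hi! I'm Kipp.")], language="French")
for role, text in rendered:
    print(f"{role}: {text}")
```

The output is a system line ending in "in French." followed by the human message, which is the message list the model ends up seeing.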

### My Notes

Ok nothing too crazy here. We added this `State` class. This seems like another form of passing the messages with the language piece added. This gets added to the workflow.

In [47]:
from typing import Sequence

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages
from typing_extensions import Annotated, TypedDict


class State(TypedDict):
    messages: Annotated[Sequence[BaseMessage], add_messages]
    language: str


workflow = StateGraph(state_schema=State)


def call_model(state: State):
    prompt = prompt_template.invoke(state)
    response = model.invoke(prompt)
    return {"messages": [response]}


workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

In [48]:
config = {"configurable": {"thread_id": "abc456"}}
query = "Hi! I'm Kipp."
language = "French"

input_messages = [HumanMessage(query)]
output = app.invoke(
    {"messages": input_messages, "language": language},
    config,
)
output["messages"][-1].pretty_print()


Bonjour Kipp ! Comment puis-je vous aider aujourd'hui ?


### Instructions

Note that the entire state is persisted, so we can omit parameters like `language` if no changes are desired:


In [49]:
query = "What is my name?"

input_messages = [HumanMessage(query)]
output = app.invoke(
    {"messages": input_messages},
    config,
)
output["messages"][-1].pretty_print()


Votre nom est Kipp.
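
Why the second call still answers in French can be sketched as a state merge: new messages are appended, while other keys keep their saved value unless the update supplies a new one. `merge_state` is a made-up function for illustration; the real behavior comes from the `add_messages` annotation on `State` and the checkpointer.

```python
# Hypothetical sketch of merging an update into persisted state:
# "messages" is appended to (like a reducer), other keys are overwritten
# only when the update supplies them.
def merge_state(saved, update):
    merged = dict(saved)
    for key, value in update.items():
        if key == "messages":
            merged["messages"] = saved.get("messages", []) + list(value)
        else:
            merged[key] = value
    return merged


state = merge_state({}, {"messages": ["Hi! I'm Kipp."], "language": "French"})
state = merge_state(state, {"messages": ["What is my name?"]})  # no language this time
print(state["language"])       # French
print(len(state["messages"]))  # 2
```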


### Instructions

To help you understand what's happening internally, check out this [LangSmith trace](https://smith.langchain.com/public/15bd8589-005c-4812-b9b9-23e74ba4c3c6/r).

## Managing Conversation History

### Instructions

One important concept to understand when building chatbots is how to manage conversation history. If left unmanaged, the list of messages will grow unbounded and potentially overflow the context window of the LLM. Therefore, it is important to add a step that limits the size of the messages you are passing in.

Importantly, you will want to do this BEFORE the prompt template but AFTER you load previous messages from Message History.

We can do this by adding a simple step in front of the prompt that modifies the `messages` key appropriately, and then wrap that new chain in the Message History class.

LangChain comes with a few built-in helpers for managing a list of messages. In this case we'll use the `trim_messages` helper to reduce how many messages we're sending to the model. The trimmer allows us to specify how many tokens we want to keep, along with other parameters such as whether to always keep the system message and whether to allow partial messages:

In [50]:
from langchain_core.messages import SystemMessage, trim_messages

trimmer = trim_messages(
    max_tokens=65,
    strategy="last",
    token_counter=model,
    include_system=True,
    allow_partial=False,
    start_on="human",
)

messages = [
    SystemMessage(content="you're a good assistant"),
    HumanMessage(content="hi! I'm bob"),
    AIMessage(content="hi!"),
    HumanMessage(content="I like vanilla ice cream"),
    AIMessage(content="nice"),
    HumanMessage(content="whats 2 + 2"),
    AIMessage(content="4"),
    HumanMessage(content="thanks"),
    AIMessage(content="no problem!"),
    HumanMessage(content="having fun?"),
    AIMessage(content="yes!"),
]

trimmer.invoke(messages)

[SystemMessage(content="you're a good assistant", additional_kwargs={}, response_metadata={}),
 HumanMessage(content="hi! I'm bob", additional_kwargs={}, response_metadata={}),
 AIMessage(content='hi!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='I like vanilla ice cream', additional_kwargs={}, response_metadata={}),
 AIMessage(content='nice', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='whats 2 + 2', additional_kwargs={}, response_metadata={}),
 AIMessage(content='4', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='thanks', additional_kwargs={}, response_metadata={}),
 AIMessage(content='no problem!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='having fun?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='yes!', additional_kwargs={}, response_metadata={})]
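
The trimming strategy itself can be sketched in plain Python. Everything here is an assumption for illustration: `trim_last` is a made-up function, and `count_tokens` just counts words, so its numbers will not match a real model's token counter. The sketch keeps the system message, keeps the most recent messages that fit the budget, and makes the window start on a human turn.

```python
# Hypothetical word-count "tokenizer"; a real token counter gives different numbers.
def count_tokens(message):
    _role, text = message
    return len(text.split())


def trim_last(messages, max_tokens):
    """Keep the system message plus the most recent messages that fit the budget,
    then drop leading non-human messages so the window starts on a human turn."""
    system = [m for m in messages[:1] if m[0] == "system"]
    rest = messages[len(system):]
    budget = max_tokens - sum(count_tokens(m) for m in system)
    kept = []
    for message in reversed(rest):  # walk backwards from the newest message
        cost = count_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    kept.reverse()
    while kept and kept[0][0] != "human":
        kept.pop(0)
    return system + kept


chat = [
    ("system", "you're a good assistant"),
    ("human", "hi! I'm bob"),
    ("ai", "hi!"),
    ("human", "whats 2 + 2"),
    ("ai", "4"),
    ("human", "thanks"),
    ("ai", "no problem!"),
]
trimmed = trim_last(chat, max_tokens=12)
print(trimmed)
```

With a budget of 12 words, the system message and the last four messages survive, while the introduction ("hi! I'm bob") is dropped. That is the same effect that makes a chatbot forget a name once it falls outside the window.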

### Instructions

To use it in our chain, we just need to run the trimmer before we pass the messages input to our prompt.

### My Notes

Another piece that was added to the `call_model`. This seems to be our home base for a lot of these model alterations that we'd like to see in our conversations with the bot.

In [51]:
workflow = StateGraph(state_schema=State)


def call_model(state: State):
    trimmed_messages = trimmer.invoke(state["messages"])
    prompt = prompt_template.invoke(
        {"messages": trimmed_messages, "language": state["language"]}
    )
    response = model.invoke(prompt)
    return {"messages": [response]}


workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

### Instructions

Now if we try asking the model our name, it won't know it since we trimmed that part of the chat history:

In [52]:
config = {"configurable": {"thread_id": "abc567"}}
query = "What is my name?"
language = "English"

input_messages = messages + [HumanMessage(query)]
output = app.invoke(
    {"messages": input_messages, "language": language},
    config,
)
output["messages"][-1].pretty_print()


As a large language model, I have no memory of past conversations. Therefore, I don't know your name. You haven't told me!


### Instructions

But if we ask about information that is within the last few messages, it remembers:

In [53]:
config = {"configurable": {"thread_id": "abc678"}}
query = "What math problem did I ask?"
language = "English"

input_messages = messages + [HumanMessage(query)]
output = app.invoke(
    {"messages": input_messages, "language": language},
    config,
)
output["messages"][-1].pretty_print()


You asked "what's 2 + 2".


### Instructions

If you take a look at LangSmith, you can see exactly what is happening under the hood in the [LangSmith trace](https://smith.langchain.com/public/04402eaa-29e6-4bb1-aa91-885b730b6c21/r).

## Streaming

Now we've got a functioning chatbot. However, one really important UX consideration for chatbot applications is streaming. LLMs can sometimes take a while to respond, so to improve the user experience most applications stream back each token as it is generated. This allows the user to see progress.

It's actually super easy to do this!

By default, `.stream` in our LangGraph application streams application steps (in this case, the single step of the model response). Setting `stream_mode="messages"` allows us to stream output tokens instead:

In [55]:
config = {"configurable": {"thread_id": "abc789"}}
query = "Hi I'm Todd, please tell me a joke."
language = "English"

input_messages = [HumanMessage(query)]
for chunk, metadata in app.stream(
    {"messages": input_messages, "language": language},
    config,
    stream_mode="messages",
):
    if isinstance(chunk, AIMessage):  # Filter to just model responses
        print(chunk.content, end="|")

Hi| Todd, nice to meet you! Here's a joke for you:

Why don'|t scientists trust atoms?

Because they make up everything!
|
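
The consuming side of streaming is just a loop over a generator. In this sketch `fake_stream` is a made-up stand-in for the model's token stream (it splits a fixed reply into words), so it only shows the printing pattern, not how LangGraph produces chunks.

```python
import time


# Stand-in token stream (an assumption for illustration): yields a reply piece by piece.
def fake_stream(reply, delay=0.0):
    for word in reply.split(" "):
        time.sleep(delay)  # pretend the model is still generating
        yield word + " "


pieces = []
for chunk in fake_stream("Why don't scientists trust atoms?"):
    pieces.append(chunk)
    print(chunk, end="|", flush=True)  # show each piece as soon as it arrives
print()
```

The `|` separator plays the same role as in the cell above: it marks where one chunk ends and the next begins.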