




# **Build a Chatbot**






***
The goal is to build an intelligent chatbot in Python that can hold a fluid conversation with memory, using LangChain v0.3 and LangGraph. The chatbot's behavior should also be customizable through dynamic prompts.

***



---

# **QUICK START**



---



In [1]:
import os

from dotenv import load_dotenv
from langchain.chat_models import init_chat_model
from langchain_core.caches import InMemoryCache
from langchain_core.globals import set_llm_cache

# Cache identical requests in memory so the same prompt is not sent twice
set_llm_cache(InMemoryCache())

# Load OLLAMA_API_KEY from a local .env file
load_dotenv()

ollama_api_key = os.environ["OLLAMA_API_KEY"]

# The remote Ollama server expects a bearer token in the request headers
headers = {
    "Authorization": f"Bearer {ollama_api_key}"
}

model = init_chat_model(
    "llama3.3:70b",
    model_provider="ollama",
    base_url="https://ollama.ccad.unc.edu.ar/ollama",
    client_kwargs={"headers": headers},  # passed to the underlying Ollama client
)


In [2]:
from langchain_core.messages import HumanMessage

model.invoke([HumanMessage(content="Hi! I'm Bob")])

AIMessage(content="Hello Bob! It's nice to meet you. Is there something I can help you with or would you like to chat?", additional_kwargs={}, response_metadata={'model': 'llama3.3:70b', 'created_at': '2025-06-10T21:10:13.782283039Z', 'done': True, 'done_reason': 'stop', 'total_duration': 48015950925, 'load_duration': 20477560, 'prompt_eval_count': 15, 'prompt_eval_duration': 2734006868, 'eval_count': 26, 'eval_duration': 45260409671, 'model_name': 'llama3.3:70b'}, id='run--ea15f658-f0a0-460b-aefb-f84db7ade4b5-0', usage_metadata={'input_tokens': 15, 'output_tokens': 26, 'total_tokens': 41})

In [3]:
print(model.invoke([HumanMessage(content="Hi! I'm Bob")]).content)

Hello Bob! It's nice to meet you. Is there something I can help you with or would you like to chat?


In [4]:
model.invoke([HumanMessage(content="What's my name?")])

AIMessage(content="I don't know your name. I'm a large language model, I don't have any information about you or your personal details. Each time you interact with me, it's a new conversation and I don't retain any context or information from previous conversations. If you'd like to share your name with me, I'd be happy to chat with you!", additional_kwargs={}, response_metadata={'model': 'llama3.3:70b', 'created_at': '2025-06-10T21:12:43.511477989Z', 'done': True, 'done_reason': 'stop', 'total_duration': 127505335315, 'load_duration': 19263574, 'prompt_eval_count': 15, 'prompt_eval_duration': 3390411150, 'eval_count': 73, 'eval_duration': 124094996114, 'model_name': 'llama3.3:70b'}, id='run--7d688936-fe79-4ed8-bfad-19d536b0d086-0', usage_metadata={'input_tokens': 15, 'output_tokens': 73, 'total_tokens': 88})

In [5]:
print(model.invoke([HumanMessage(content="What's my name?")]).content)

I don't know your name. I'm a large language model, I don't have any information about you or your personal details. Each time you interact with me, it's a new conversation and I don't retain any context or information from previous conversations. If you'd like to share your name with me, I'd be happy to chat with you!


***

On its own, the model has no memory: each call is independent, so it cannot recall the name given in the previous message.

***

In [6]:
from langchain_core.messages import AIMessage

model.invoke(
    [
        HumanMessage(content="Hi! I'm Bob"),
        AIMessage(content="Hello Bob! How can I assist you today?"),
        HumanMessage(content="What's my name?"),
    ]
)

AIMessage(content='Your name is Bob! You told me that when we started chatting.', additional_kwargs={}, response_metadata={'model': 'llama3.3:70b', 'created_at': '2025-06-10T21:39:25.294352029Z', 'done': True, 'done_reason': 'stop', 'total_duration': 15003876117, 'load_duration': 3450937216, 'prompt_eval_count': 40, 'prompt_eval_duration': 3214802017, 'eval_count': 15, 'eval_duration': 8335553423, 'model_name': 'llama3.3:70b'}, id='run--bf90c057-53d2-47ab-8d55-5e2f3b2e1f8c-0', usage_metadata={'input_tokens': 40, 'output_tokens': 15, 'total_tokens': 55})

In [7]:
print(model.invoke(
    [
        HumanMessage(content="Hi! I'm Bob"),
        AIMessage(content="Hello Bob! How can I assist you today?"),
        HumanMessage(content="What's my name?"),
    ]
).content)

Your name is Bob! You told me that when we started chatting.


***

When we pass the whole conversation (the previous human and AI messages) in a single call, the model can return the user's name. But this is not practical: the user will not resend all their previous messages every time they ask a question. The chatbot itself must remember what the user has already said.

***
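A dependency-free sketch of the idea (it does not use LangChain): the model behaves like a stateless function, so the application has to keep a history list and resend it on every call. `fake_model` and `chat` are hypothetical stand-ins, and the `(role, content)` tuples only imitate LangChain's `HumanMessage`/`AIMessage` objects.

```python
import re

Message = tuple[str, str]  # (role, content), e.g. ("human", "Hi! I'm Bob")


def fake_model(messages: list[Message]) -> str:
    """Hypothetical stateless model: it only knows the messages passed in."""
    question = messages[-1][1]
    if question == "What's my name?":
        # Look for an earlier introduction such as "I'm Bob" in the human turns.
        for role, content in messages[:-1]:
            match = re.search(r"I'm (\w+)", content)
            if role == "human" and match:
                return f"Your name is {match.group(1)}!"
        return "I don't know your name."
    return "Hello! How can I help you?"


def chat(history: list[Message], user_text: str) -> str:
    """Append the user turn, send the WHOLE history, then store the reply."""
    history.append(("human", user_text))
    reply = fake_model(history)
    history.append(("ai", reply))
    return reply


# Two independent calls: the second one never sees the introduction.
fake_model([("human", "Hi! I'm Bob")])
print(fake_model([("human", "What's my name?")]))  # I don't know your name.

# Keeping and resending the history fixes it.
history: list[Message] = []
chat(history, "Hi! I'm Bob")
print(chat(history, "What's my name?"))  # Your name is Bob!
```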



---

# **MESSAGE PERSISTENCE**

---



In [8]:
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

# Define a new graph
workflow = StateGraph(state_schema=MessagesState)


# Define the function that calls the model
def call_model(state: MessagesState):
    response = model.invoke(state["messages"])
    return {"messages": response}


# Define the (single) node in the graph
workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

# Add memory
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

***
We add a checkpointer, `MemorySaver`, which saves the graph state (here, the list of messages) at each step of the graph, so that the next call can resume the conversation. `MemorySaver` keeps everything in RAM: the history is lost when the Python process stops.

***
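As a rough mental model only (this is not how `MemorySaver` is implemented), a checkpointer behaves like a dictionary that stores the state after each run: the next run on the same thread reloads the saved messages, appends the new input (as the `add_messages` reducer behind `MessagesState` does) and passes the whole history to the model node. `ToyCheckpointer`, `run_graph` and `count_model` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyCheckpointer:
    """Hypothetical in-memory store mapping thread_id -> saved messages.
    Like MemorySaver, it lives in RAM and is lost when the process exits."""

    saved: dict[str, list[str]] = field(default_factory=dict)

    def load(self, thread_id: str) -> list[str]:
        return list(self.saved.get(thread_id, []))  # copy of the saved state

    def save(self, thread_id: str, messages: list[str]) -> None:
        self.saved[thread_id] = list(messages)


def run_graph(
    checkpointer: ToyCheckpointer,
    thread_id: str,
    new_messages: list[str],
    model: Callable[[list[str]], str],
) -> list[str]:
    messages = checkpointer.load(thread_id)  # 1. restore this thread's state
    messages += new_messages                 # 2. append the input (like add_messages)
    messages.append(model(messages))         # 3. the "model" node sees everything
    checkpointer.save(thread_id, messages)   # 4. persist for the next call
    return messages


def count_model(messages: list[str]) -> str:
    """Hypothetical model that just reports how much history it received."""
    return f"AI: I can see {len(messages)} message(s)"


memory = ToyCheckpointer()
run_graph(memory, "abc123", ["Human: Hi! I'm Bob."], count_model)
out = run_graph(memory, "abc123", ["Human: What's my name?"], count_model)
print(out[-1])  # AI: I can see 3 message(s)
```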

In [9]:
config = {"configurable": {"thread_id": "abc123"}}

***
We pass a `thread_id` in the `config` to identify the conversation. Each thread has its own saved history, so a single application can handle several independent conversations (for example, one per user) at the same time.

***
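LangGraph reads the identifier from `config["configurable"]["thread_id"]`. The toy sketch below (plain Python, no LangGraph) keys a dictionary on that same value to show why two threads never see each other's messages; `histories` and `invoke` are hypothetical, and only the user's turns are stored to keep it short.

```python
from collections import defaultdict

# Hypothetical stand-in for the checkpointer: one independent history per thread.
histories: defaultdict[str, list[str]] = defaultdict(list)


def invoke(user_text: str, config: dict) -> str:
    """Toy version of app.invoke; only the user's turns are stored."""
    thread_id = config["configurable"]["thread_id"]  # same place LangGraph reads it
    history = histories[thread_id]  # a new thread starts with an empty list
    history.append(user_text)
    if user_text == "What's my name?":
        for text in history:
            if text.startswith("Hi! I'm "):
                return "Your name is " + text.removeprefix("Hi! I'm ").rstrip(".")
        return "I don't know your name."
    return "Hello!"


bob = {"configurable": {"thread_id": "abc123"}}
other = {"configurable": {"thread_id": "abc234"}}

invoke("Hi! I'm Bob.", bob)
print(invoke("What's my name?", other))  # I don't know your name.
print(invoke("What's my name?", bob))    # Your name is Bob
```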

In [10]:
query = "Hi! I'm Bob."

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()  # output contains all messages in state


Hello Bob! It's nice to meet you. Is there something I can help you with or would you like to chat?


In [11]:
query = "What's my name?"

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


Your name is Bob! You told me that when we started chatting.


***
The chatbot now remembers the user's name, even though it was given in an earlier message and not in the current one.

***

In [12]:
config = {"configurable": {"thread_id": "abc234"}}

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


I don't know your name. I'm a large language model, I don't have the ability to retain information about individual users or their personal details, including names. Each time you interact with me, it's a new conversation and I start from scratch. If you'd like to share your name with me, I'd be happy to chat with you!


***

If we change the identifier, the chatbot can no longer answer: the new thread `abc234` starts with an empty history. This isolation between threads is what allows several users to use the chatbot at the same time.

***

In [13]:
config = {"configurable": {"thread_id": "abc123"}}

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


I remember! Your name is Bob.


***
If we ask for the name again with the original conversation identifier (`abc123`), the chatbot finds the answer again.

The conversation is therefore saved under the right identifier.

***



---

# **PROMPT TEMPLATES**

---



In [14]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt_template = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You talk like a pirate. Answer all questions to the best of your ability.",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

In [15]:
workflow = StateGraph(state_schema=MessagesState)


def call_model(state: MessagesState):
    prompt = prompt_template.invoke(state)
    response = model.invoke(prompt)
    return {"messages": response}


workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

***
Prompt templates turn the raw input sent by the user into a format the LLM can work with. Here we create a `ChatPromptTemplate` made of a system message with our instructions, followed by a `MessagesPlaceholder` that receives the conversation messages.

We then only need to update the function that calls the model, so that it applies the template to the state before invoking the model.

***
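To make the template's output concrete, this plain-Python sketch imitates what `prompt_template.invoke(state)` does with the pirate template: it puts the system message first and expands the placeholder into the stored messages. `render_prompt` is a hypothetical helper; the real call returns a `ChatPromptValue` holding message objects rather than tuples.

```python
Message = tuple[str, str]  # (role, content)

SYSTEM_TEMPLATE = "You talk like a pirate. Answer all questions to the best of your ability."


def render_prompt(state: dict) -> list[Message]:
    """Hypothetical equivalent of ChatPromptTemplate.from_messages(
    [("system", SYSTEM_TEMPLATE), MessagesPlaceholder("messages")])."""
    # Template variables would be filled from the state; this one has none.
    system = ("system", SYSTEM_TEMPLATE.format(**state))
    # The placeholder is replaced by every message stored under "messages".
    return [system, *state["messages"]]


state = {
    "messages": [
        ("human", "Hi! I'm Jim."),
        ("ai", "Ahoy, Jim!"),
        ("human", "What is my name?"),
    ]
}
for role, content in render_prompt(state):
    print(f"{role}: {content}")
```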

In [16]:
config = {"configurable": {"thread_id": "abc345"}}
query = "Hi! I'm Jim."

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


Ahoy, Jim me lad! 'Tis a grand day to be havin' a chat, don't ye think? What brings ye to these fair waters? Are ye lookin' fer treasure, or just passin' the time with a swashbucklin' pirate like meself?


***
The instruction in the system prompt is followed: the model answers like a pirate.

***

In [17]:
query = "What is my name?"

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


Yer name be Jim, me hearty! I remember ye introducin' yerself to me just a moment ago. Yer a landlubber with a fine moniker, if I do say so meself!


***
The model still behaves as expected: it remembers the name and keeps speaking like a pirate.

***

In [18]:
prompt_template = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Answer all questions to the best of your ability in {language}.",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

In [19]:
from typing import Sequence

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages
from typing_extensions import Annotated, TypedDict


class State(TypedDict):
    messages: Annotated[Sequence[BaseMessage], add_messages]
    language: str


workflow = StateGraph(state_schema=State)


def call_model(state: State):
    prompt = prompt_template.invoke(state)
    response = model.invoke(prompt)
    return {"messages": [response]}


workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

***
We make the prompt slightly more complex by adding a `{language}` variable that sets the language of the answer. Since the state must now carry this value in addition to the messages, `MessagesState` is replaced with a custom `State` schema that has both a `messages` key and a `language` key.

***
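In `State`, `messages` is annotated with the `add_messages` reducer, so each update is appended to the saved history, while `language` has no reducer, so a new value replaces the old one and an omitted key keeps its saved value. The sketch below mimics that merge rule in plain Python; `apply_update` is hypothetical, and it ignores a detail of the real `add_messages`, which also replaces an existing message when a new one has the same ID.

```python
def apply_update(saved: dict, update: dict) -> dict:
    """Hypothetical merge rule mirroring the State schema above."""
    merged = dict(saved)
    for key, value in update.items():
        if key == "messages":
            # Channel with a reducer (add_messages): append to the history.
            merged["messages"] = saved.get("messages", []) + value
        else:
            # Channel without a reducer (language): the new value wins.
            merged[key] = value
    return merged


state: dict = {}
state = apply_update(state, {"messages": ["Hi! I'm Bob."], "language": "Spanish"})
state = apply_update(state, {"messages": ["¡Hola Bob!"]})        # model reply
state = apply_update(state, {"messages": ["What is my name?"]})  # no language key
print(state["language"])       # Spanish
print(len(state["messages"]))  # 3
```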

In [20]:
config = {"configurable": {"thread_id": "abc456"}}
query = "Hi! I'm Bob."
language = "Spanish"

input_messages = [HumanMessage(query)]
output = app.invoke(
    {"messages": input_messages, "language": language},
    config,
)
output["messages"][-1].pretty_print()


¡Hola Bob! Encantado de conocerte. ¿En qué puedo ayudarte hoy?


In [21]:
query = "What is my name?"

input_messages = [HumanMessage(query)]
output = app.invoke(
    {"messages": input_messages},
    config,
)
output["messages"][-1].pretty_print()


Tu nombre es Bob. Me lo dijiste cuando te presentaste. ¿Te acuerdas?


***
Everything works.

The chatbot remembers the user's name even though it was given in an earlier message.

The chatbot answers in the language chosen by the user (Spanish), even though the user writes in English. Note that the second call does not pass `language` again: the checkpointer persists the whole state, including `language`, so the saved value is reused.

***



---
# **MANAGING CONVERSATION HISTORY**

---




In [23]:
from langchain_core.messages import SystemMessage, trim_messages

trimmer = trim_messages(
    max_tokens=65,
    strategy="last",
    token_counter=model,
    include_system=True,
    allow_partial=False,
    start_on="human",
)

messages = [
    SystemMessage(content="you're a good assistant"),
    HumanMessage(content="hi! I'm bob"),
    AIMessage(content="hi!"),
    HumanMessage(content="I like vanilla ice cream"),
    AIMessage(content="nice"),
    HumanMessage(content="whats 2 + 2"),
    AIMessage(content="4"),
    HumanMessage(content="thanks"),
    AIMessage(content="no problem!"),
    HumanMessage(content="having fun?"),
    AIMessage(content="yes!"),
]

trimmer.invoke(messages)

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.



[SystemMessage(content="you're a good assistant", additional_kwargs={}, response_metadata={}),
 HumanMessage(content="hi! I'm bob", additional_kwargs={}, response_metadata={}),
 AIMessage(content='hi!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='I like vanilla ice cream', additional_kwargs={}, response_metadata={}),
 AIMessage(content='nice', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='whats 2 + 2', additional_kwargs={}, response_metadata={}),
 AIMessage(content='4', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='thanks', additional_kwargs={}, response_metadata={}),
 AIMessage(content='no problem!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='having fun?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='yes!', additional_kwargs={}, response_metadata={})]

In [24]:
workflow = StateGraph(state_schema=State)


def call_model(state: State):
    trimmed_messages = trimmer.invoke(state["messages"])
    prompt = prompt_template.invoke(
        {"messages": trimmed_messages, "language": state["language"]}
    )
    response = model.invoke(prompt)
    return {"messages": [response]}


workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

***
We add a trimming step inside `call_model`, after the stored messages are loaded and before the prompt template is applied. This keeps the prompt sent to the model within a token budget (and within the model's context window). The full history is still stored by the checkpointer: only the copy sent to the model is trimmed.

`trim_messages` lets us set the maximum number of tokens to keep (`max_tokens`) along with several other options: which end of the history to keep (`strategy="last"`), whether to always keep the system message (`include_system`), whether a message may be cut in half (`allow_partial`), and which type of message the kept history must start with (`start_on`).

***
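A plain-Python imitation of the trimming options used above, with a crude word count standing in for the tokenizer: keep the system message (`include_system=True`), keep the most recent whole messages that fit the budget (`strategy="last"`, `allow_partial=False`), then drop leading messages until the kept history starts with a human message (`start_on="human"`). `count_tokens` and `trim_last` are hypothetical; since the real `trim_messages` counts with the `token_counter` you pass it, the budgets below are not comparable to `max_tokens=65`.

```python
Message = tuple[str, str]  # (role, content)


def count_tokens(message: Message) -> int:
    # Crude stand-in for a tokenizer: one token per whitespace-separated word.
    return len(message[1].split())


def trim_last(messages: list[Message], max_tokens: int) -> list[Message]:
    system = [m for m in messages[:1] if m[0] == "system"]  # include_system=True
    budget = max_tokens - sum(count_tokens(m) for m in system)
    kept: list[Message] = []
    # strategy="last": walk backwards from the newest message.
    for message in reversed(messages[len(system):]):
        cost = count_tokens(message)
        if cost > budget:
            break  # allow_partial=False: stop instead of cutting a message
        kept.insert(0, message)
        budget -= cost
    # start_on="human": drop leading messages until a human message comes first.
    while kept and kept[0][0] != "human":
        kept.pop(0)
    return system + kept


messages = [
    ("system", "you're a good assistant"),
    ("human", "hi! I'm bob"), ("ai", "hi!"),
    ("human", "I like vanilla ice cream"), ("ai", "nice"),
    ("human", "whats 2 + 2"), ("ai", "4"),
    ("human", "thanks"), ("ai", "no problem!"),
    ("human", "having fun?"), ("ai", "yes!"),
]

print(len(trim_last(messages, 30)))  # 11: everything fits, nothing is trimmed
for role, content in trim_last(messages, 15):
    print(role, "|", content)
```

With a generous budget nothing is removed; with a tighter one the oldest exchanges, including the user's introduction, disappear while the math question survives.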

In [25]:
config = {"configurable": {"thread_id": "abc567"}}
query = "What is my name?"
language = "English"

input_messages = messages + [HumanMessage(query)]
output = app.invoke(
    {"messages": input_messages, "language": language},
    config,
)
output["messages"][-1].pretty_print()


Your name is Bob. We established that earlier in our conversation!


***
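Contrary to what we might expect, the model still knows the user's name. The trimmer output above explains why: all eleven example messages fit within the 65-token budget, so none of them was removed and the introduction is still part of the prompt. Here `trim_messages` relies on the model's default token counter, which for `ChatOllama` falls back to a GPT-2 tokenizer from Hugging Face `transformers`, so the count only approximates what Llama 3.3 actually sees. Lowering `max_tokens` is enough to see the oldest messages, including the introduction, being dropped.
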
***

In [26]:
config = {"configurable": {"thread_id": "abc678"}}
query = "What math problem did I ask?"
language = "English"

input_messages = messages + [HumanMessage(query)]
output = app.invoke(
    {"messages": input_messages, "language": language},
    config,
)
output["messages"][-1].pretty_print()


You asked what 2 + 2 is, and the answer was 4.


***
The model also remembers the math problem. This is expected: with `strategy="last"`, the most recent messages are the last ones to be trimmed, so "whats 2 + 2" is kept longer than the introduction.

***



---

# **STREAMING**


---



In [27]:
config = {"configurable": {"thread_id": "abc789"}}
query = "Hi I'm Todd, please tell me a joke."
language = "English"

input_messages = [HumanMessage(query)]
for chunk, metadata in app.stream(
    {"messages": input_messages, "language": language},
    config,
    stream_mode="messages",
):
    if isinstance(chunk, AIMessage):  # Filter to just model responses
        print(chunk.content, end="|")

Hello| Todd|!| Here|'s| one| for| you|:| Why| couldn|'t| the| bicycle| stand| up| by| itself|?| Because| it| was| two|-t|ired|!| Hope| that| made| you| smile|!| Do| you| want| to| hear| another| one|?||