# Chatbot with Collection Schema 

## Review

We extended our chatbot to save semantic memories to a single [user profile](https://docs.langchain.com/oss/python/concepts/memory#profile). 

We also introduced a library, [Trustcall](https://github.com/hinthornw/trustcall), to update this schema with new information. 

## Goals

Sometimes we want to save memories to a [collection](https://docs.google.com/presentation/d/181mvjlgsnxudQI6S3ritg9sooNyu4AcLLFH1UK0kIuk/edit#slide=id.g30eb3c8cf10_0_200) rather than a single profile. 

Here we'll update our chatbot to [save memories to a collection](https://docs.langchain.com/oss/python/concepts/memory#collection).

We'll also show how to use Trustcall to update this collection. 


In [54]:
import uuid

from IPython.display import Image, Markdown, display
from pydantic import BaseModel, Field
from trustcall import create_extractor

from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, merge_message_runs
from langchain_core.runnables.config import RunnableConfig
from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, START, MessagesState, StateGraph
from langgraph.store.base import BaseStore
from langgraph.store.memory import InMemoryStore

## Defining a collection schema

Instead of storing user information in a fixed profile structure, we'll create a flexible collection schema to store memories about user interactions.

Each memory is stored as a separate entry with a single `content` field holding the information we want to remember.

This approach allows us to build an open-ended collection of memories that can grow and change as we learn more about the user.

We can define a collection schema as a [Pydantic](https://docs.pydantic.dev/latest/) model. 

In [2]:
class Memory(BaseModel):
    content: str = Field(
        description="The main content of the memory. For example: User expressed interest in learning about French."
    )

class MemoryCollection(BaseModel):
    memories: list[Memory] = Field(
        description="A list of memories about the user."
    )
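To see what the model is asked to produce, here is a dependency-free sketch that mirrors `Memory` and `MemoryCollection` with standard-library dataclasses and parses a hand-written JSON payload of the same shape. The `MemoryRecord`/`MemoryCollectionRecord` classes, the `parse_collection` helper, and the payload are illustrative assumptions (named differently so they don't shadow the Pydantic classes), not model output or part of LangChain.

```python
import json
from dataclasses import dataclass, field


@dataclass
class MemoryRecord:
    # Mirrors the Pydantic `Memory` schema: one free-text fact per entry
    content: str


@dataclass
class MemoryCollectionRecord:
    # Mirrors `MemoryCollection`: an open-ended list of memories
    memories: list[MemoryRecord] = field(default_factory=list)


def parse_collection(payload: str) -> MemoryCollectionRecord:
    """Hypothetical helper: turn a JSON payload into a collection of records."""
    data = json.loads(payload)
    return MemoryCollectionRecord(memories=[MemoryRecord(**m) for m in data["memories"]])


# A hand-written payload shaped like the structured output shown below
payload = json.dumps({
    "memories": [
        {"content": "User introduced themselves as Umer."},
        {"content": "User expressed a love for hiking."},
    ]
})

parsed = parse_collection(payload)
for record in parsed.memories:
    print(record.content)
```

Because every entry is independent, learning a new fact means appending an item rather than rewriting a fixed profile.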

We can use the [`with_structured_output`](https://docs.langchain.com/oss/python/langchain/models#structured-outputs) method of LangChain's [chat model](https://docs.langchain.com/oss/python/langchain/models) interface to enforce structured output.

In [3]:
model = ChatOpenAI(temperature=0, model_name="gpt-4o")

In [5]:
# Bind schema to model
model_structured = model.with_structured_output(MemoryCollection)

# Invoke the model to produce structured output that matches the schema
memory_collection = model_structured.invoke([
    HumanMessage(content="Hi, I'm Umer. I love hiking and wandering.")
])

In [6]:
memory_collection.memories

[Memory(content='User introduced themselves as Umer.'),
 Memory(content='User expressed a love for hiking.'),
 Memory(content='User enjoys wandering.')]

In [8]:
memory_collection.memories[0].model_dump()

{'content': 'User introduced themselves as Umer.'}

In [9]:
# Initialize the in-memory store
store = InMemoryStore()

# Namespace for the memory to save
user_id = str(1)
memory_namespace = (user_id, "memories")

# Save a memory to namespace as key and value
key = str(uuid.uuid4())
value = memory_collection.memories[0].model_dump()
store.put(memory_namespace, key, value)

key = str(uuid.uuid4())
value = memory_collection.memories[1].model_dump()
store.put(memory_namespace, key, value)
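The store addresses each memory by a `(namespace, key)` pair. The `ToyStore` class below is a hypothetical, dictionary-backed stand-in written only to illustrate those semantics (`put` overwrites, `get` looks up one key, `search` lists a namespace); it is not how `InMemoryStore` is implemented and it omits timestamps, filtering, and semantic search.

```python
import uuid


class ToyStore:
    """Hypothetical stand-in for a namespaced key-value memory store."""

    def __init__(self):
        # Maps a namespace tuple to a {key: value} dict
        self._data: dict[tuple[str, ...], dict[str, dict]] = {}

    def put(self, namespace: tuple[str, ...], key: str, value: dict) -> None:
        # Writing to an existing key replaces its value (an upsert)
        self._data.setdefault(namespace, {})[key] = value

    def get(self, namespace: tuple[str, ...], key: str):
        # Return the stored value, or None if the key is absent
        return self._data.get(namespace, {}).get(key)

    def search(self, namespace: tuple[str, ...]) -> list[tuple[str, dict]]:
        # List every (key, value) pair stored under the namespace
        return list(self._data.get(namespace, {}).items())


toy_store = ToyStore()
toy_namespace = ("1", "memories")

# Each memory gets its own random key, so new facts never collide
first_key = str(uuid.uuid4())
toy_store.put(toy_namespace, first_key, {"content": "User introduced themselves as Umer."})
toy_store.put(toy_namespace, str(uuid.uuid4()), {"content": "User expressed a love for hiking."})

print(toy_store.get(toy_namespace, first_key))
print(len(toy_store.search(toy_namespace)))
```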

Retrieve a single memory by key with `get`, or list every memory in a namespace with `search`. 

In [10]:
store.get(memory_namespace, key)

Item(namespace=['1', 'memories'], key='b1c294b8-f2a8-4d6a-a422-91b4728d94d7', value={'content': 'User expressed a love for hiking.'}, created_at='2026-01-06T15:34:28.885688+00:00', updated_at='2026-01-06T15:34:28.885688+00:00')

In [11]:
store.search(memory_namespace)

[Item(namespace=['1', 'memories'], key='60da373f-0f7a-4590-854f-b8735bd4ad14', value={'content': 'User introduced themselves as Umer.'}, created_at='2026-01-06T15:34:28.885688+00:00', updated_at='2026-01-06T15:34:28.885688+00:00', score=None),
 Item(namespace=['1', 'memories'], key='b1c294b8-f2a8-4d6a-a422-91b4728d94d7', value={'content': 'User expressed a love for hiking.'}, created_at='2026-01-06T15:34:28.885688+00:00', updated_at='2026-01-06T15:34:28.885688+00:00', score=None)]

In [12]:
for item in store.search(memory_namespace):
    print(item.dict())

{'namespace': ['1', 'memories'], 'key': '60da373f-0f7a-4590-854f-b8735bd4ad14', 'value': {'content': 'User introduced themselves as Umer.'}, 'created_at': '2026-01-06T15:34:28.885688+00:00', 'updated_at': '2026-01-06T15:34:28.885688+00:00', 'score': None}
{'namespace': ['1', 'memories'], 'key': 'b1c294b8-f2a8-4d6a-a422-91b4728d94d7', 'value': {'content': 'User expressed a love for hiking.'}, 'created_at': '2026-01-06T15:34:28.885688+00:00', 'updated_at': '2026-01-06T15:34:28.885688+00:00', 'score': None}


## Updating collection schema

We discussed the challenges with updating a profile schema in the last lesson. 

The same applies to collections! 

We want the ability to update the collection with new memories as well as update existing memories in the collection. 

Now we'll show that [Trustcall](https://github.com/hinthornw/trustcall) can also be used to update a collection. 

This enables both adding new memories and [updating existing memories in the collection](https://github.com/hinthornw/trustcall?tab=readme-ov-file#simultanous-updates--insertions).

Let's define a new extractor with Trustcall. 

As before, we provide the schema for each memory, `Memory`.  

We also supply `enable_inserts=True`, which allows the extractor to insert new memories into the collection. 

In [13]:
# define trustcall extractor
trustcall_extractor = create_extractor(
    model,
    tools=[Memory],
    tool_choice="Memory",
    enable_inserts=True
)

In [15]:
# Instruction
instruction = """Extract memories from the following conversation:"""

# Conversation
conversation = [
    HumanMessage(content="Hi, I'm Lance."), 
    AIMessage(content="Nice to meet you, Lance."), 
    HumanMessage(content="This morning I had a nice bike ride in San Francisco.")
]

# Run the extractor
memory_collection = trustcall_extractor.invoke(
    {"messages": [SystemMessage(content=instruction)] + conversation}
)

In [18]:
memory_collection.keys()

dict_keys(['messages', 'responses', 'response_metadata', 'attempts'])

In [20]:
memory_collection['messages']

[AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 108, 'total_tokens': 125, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_deacdd5f6f', 'id': 'chatcmpl-Cv3QBoxlnUyDYpliCoy8RigZk6VZe', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='lc_run--019b93f5-46f0-7b80-8da1-c8b4ea0cb31a-0', tool_calls=[{'name': 'Memory', 'args': {'content': 'Lance had a nice bike ride in San Francisco this morning.'}, 'id': 'call_kMXrQ7qCoN8bvNoySFrJdxIQ', 'type': 'tool_call'}], usage_metadata={'input_tokens': 108, 'output_tokens': 17, 'total_tokens': 125, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning':

In [21]:
# Messages contain the tool calls
for m in memory_collection['messages']:
    m.pretty_print()

Tool Calls:
  Memory (call_kMXrQ7qCoN8bvNoySFrJdxIQ)
 Call ID: call_kMXrQ7qCoN8bvNoySFrJdxIQ
  Args:
    content: Lance had a nice bike ride in San Francisco this morning.


In [23]:
# Metadata contains the tool call ID
for m in memory_collection["response_metadata"]: 
    print(m)

{'id': 'call_kMXrQ7qCoN8bvNoySFrJdxIQ'}


In [24]:
# Update the conversation
updated_conversation = [
    AIMessage(content="That's great, what did you do after?"), 
    HumanMessage(content="I went to Tartine and ate a croissant."),                        
    AIMessage(content="What else is on your mind?"),
    HumanMessage(content="I was thinking about my Japan, and going back this winter!")
]

# update the instruction
system_msg = """Update existing memories and create new ones based on the following conversation:"""

In [25]:
memory_collection['responses']

[Memory(content='Lance had a nice bike ride in San Francisco this morning.')]

In [28]:
# Format existing memories as (ID, tool name, value) triples for Trustcall
tool_name = "Memory"
existing_mem = [(str(i), tool_name, m) for i, m in enumerate(memory_collection['responses'])] if memory_collection['responses'] else []
existing_mem

[('0',
  'Memory',
  Memory(content='Lance had a nice bike ride in San Francisco this morning.'))]

In [31]:
# Invoke the extractor with our updated conversation and existing memories
memory_collection = trustcall_extractor.invoke({
    "messages": [SystemMessage(content=system_msg)] + updated_conversation, 
    "existing": existing_mem
})

In [32]:
# Messages from the model indicate two tool calls were made
for m in memory_collection['messages']:
    m.pretty_print()

Tool Calls:
  Memory (call_o4f8oSqVSIoc0W4sjLRYlXrp)
 Call ID: call_o4f8oSqVSIoc0W4sjLRYlXrp
  Args:
    content: Lance had a nice bike ride in San Francisco this morning.
    -: Lance went to Tartine and ate a croissant after his bike ride in San Francisco.
  Memory (call_mP92FQYT50Mopp8rQrJZ8eCK)
 Call ID: call_mP92FQYT50Mopp8rQrJZ8eCK
  Args:
    content: Lance is thinking about his trip to Japan and considering going back this winter.


In [33]:
# Responses contain the memories that adhere to the schema
for m in memory_collection["responses"]:
    print(m.content)
    print("="*50)

Lance had a nice bike ride in San Francisco this morning.
Lance is thinking about his trip to Japan and considering going back this winter.


In [35]:
# Metadata contains the tool call ID and, for updated memories, the json_doc_id
for m in memory_collection["response_metadata"]: 
    print(m)

{'id': 'call_o4f8oSqVSIoc0W4sjLRYlXrp', 'json_doc_id': '0'}
{'id': 'call_mP92FQYT50Mopp8rQrJZ8eCK'}
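The metadata is what distinguishes an update from an insertion: the first response carries `json_doc_id: '0'`, the ID we gave the existing memory, while the second has no `json_doc_id` and is therefore new. Below is a small, hypothetical helper (not part of Trustcall) that labels each response with that rule, run on dictionaries copied from the output above.

```python
def classify_responses(responses, metadata):
    """Hypothetical helper: pair each response with 'update' or 'insert'.

    A response whose metadata has a `json_doc_id` patches the existing
    memory with that ID; one without it is a brand-new memory.
    """
    labeled = []
    for response, meta in zip(responses, metadata):
        doc_id = meta.get("json_doc_id")
        action = "update" if doc_id is not None else "insert"
        labeled.append((action, doc_id, response))
    return labeled


# Values copied from the extractor output above
sample_responses = [
    {"content": "Lance had a nice bike ride in San Francisco this morning."},
    {"content": "Lance is thinking about his trip to Japan and considering going back this winter."},
]
sample_metadata = [
    {"id": "call_o4f8oSqVSIoc0W4sjLRYlXrp", "json_doc_id": "0"},
    {"id": "call_mP92FQYT50Mopp8rQrJZ8eCK"},
]

for action, doc_id, response in classify_responses(sample_responses, sample_metadata):
    print(action, doc_id, response["content"])
```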


In [45]:
for r, m in zip(memory_collection['responses'], memory_collection['response_metadata']):
    print(r.model_dump(mode="json"))
    print("="*50)
    print(m)

{'content': 'Lance had a nice bike ride in San Francisco this morning.'}
{'id': 'call_o4f8oSqVSIoc0W4sjLRYlXrp', 'json_doc_id': '0'}
{'content': 'Lance is thinking about his trip to Japan and considering going back this winter.'}
{'id': 'call_mP92FQYT50Mopp8rQrJZ8eCK'}


In [46]:
for r, m in zip(memory_collection['responses'], memory_collection['response_metadata']):
    store.put(
        memory_namespace, 
        m.get("json_doc_id", str(uuid.uuid4())), 
        r.model_dump(mode="json")
    )

In [47]:
memories = store.search(memory_namespace)
memories

[Item(namespace=['1', 'memories'], key='60da373f-0f7a-4590-854f-b8735bd4ad14', value={'content': 'User introduced themselves as Umer.'}, created_at='2026-01-06T15:34:28.885688+00:00', updated_at='2026-01-06T15:34:28.885688+00:00', score=None),
 Item(namespace=['1', 'memories'], key='b1c294b8-f2a8-4d6a-a422-91b4728d94d7', value={'content': 'User expressed a love for hiking.'}, created_at='2026-01-06T15:34:28.885688+00:00', updated_at='2026-01-06T15:34:28.885688+00:00', score=None),
 Item(namespace=['1', 'memories'], key='0', value={'content': 'Lance had a nice bike ride in San Francisco this morning.'}, created_at='2026-01-06T15:58:20.767299+00:00', updated_at='2026-01-06T15:58:20.767299+00:00', score=None),
 Item(namespace=['1', 'memories'], key='1c6ff609-275b-4210-a231-b3807ccaef19', value={'content': 'Lance is thinking about his trip to Japan and considering going back this winter.'}, created_at='2026-01-06T15:58:20.767299+00:00', updated_at='2026-01-06T15:58:20.767299+00:00', score=

In [58]:
display(Markdown("\n".join(mem.value['content'] for mem in memories)))

User introduced themselves as Umer.
User expressed a love for hiking.
Lance had a nice bike ride in San Francisco this morning.
Lance is thinking about his trip to Japan and considering going back this winter.

In [62]:
for mem in memories:
    print("Key ->", mem.key)
    print("Value ->", mem.value)
    print("="*50)

Key -> 60da373f-0f7a-4590-854f-b8735bd4ad14
Value -> {'content': 'User introduced themselves as Umer.'}
Key -> b1c294b8-f2a8-4d6a-a422-91b4728d94d7
Value -> {'content': 'User expressed a love for hiking.'}
Key -> 0
Value -> {'content': 'Lance had a nice bike ride in San Francisco this morning.'}
Key -> 1c6ff609-275b-4210-a231-b3807ccaef19
Value -> {'content': 'Lance is thinking about his trip to Japan and considering going back this winter.'}


In [63]:
[   
    (existing_itm.key, "Memory", existing_itm.value)
    for existing_itm in memories
] if memories else None

[('60da373f-0f7a-4590-854f-b8735bd4ad14',
  'Memory',
  {'content': 'User introduced themselves as Umer.'}),
 ('b1c294b8-f2a8-4d6a-a422-91b4728d94d7',
  'Memory',
  {'content': 'User expressed a love for hiking.'}),
 ('0',
  'Memory',
  {'content': 'Lance had a nice bike ride in San Francisco this morning.'}),
 ('1c6ff609-275b-4210-a231-b3807ccaef19',
  'Memory',
  {'content': 'Lance is thinking about his trip to Japan and considering going back this winter.'})]
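This list uses the store keys as the IDs passed to Trustcall, which matters because any `json_doc_id` that comes back is one of those IDs. In the earlier cell the IDs came from `enumerate`, so the updated memory was written under the new key `'0'`; had that memory already been in the store under a UUID, the update would have added a second entry instead of replacing it. The sketch below uses a plain dictionary in place of the store and a hand-written metadata list in place of a real extractor run (both assumptions) to show that keying `existing` by store key turns an update into an in-place overwrite.

```python
import uuid

# A plain dict standing in for one user's namespace: {store_key: value}
stored = {
    str(uuid.uuid4()): {"content": "User expressed a love for hiking."},
}

# Build Trustcall-style `existing` triples keyed by the *store* key
existing_triples = [(key, "Memory", value) for key, value in stored.items()]
hiking_key = existing_triples[0][0]

# Hand-written stand-in for extractor output: one update, one insert
fake_responses = [
    {"content": "User loves hiking, especially on weekends."},
    {"content": "User enjoys spending time alone."},
]
fake_metadata = [
    {"id": "call_a", "json_doc_id": hiking_key},  # refers back to a store key
    {"id": "call_b"},                             # no json_doc_id: new memory
]

for response, meta in zip(fake_responses, fake_metadata):
    # Reuse the store key for updates; mint a fresh UUID for inserts
    target_key = meta.get("json_doc_id", str(uuid.uuid4()))
    stored[target_key] = response

print(len(stored))
print(stored[hiking_key]["content"])
```

Only the insertion grows the collection; the update replaces the hiking entry under its original key.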

In [64]:
merge_message_runs(
    [SystemMessage(content="ABC")] + [HumanMessage(content="Hi, I'm Lance.")]
)

[SystemMessage(content='ABC', additional_kwargs={}, response_metadata={}),
 HumanMessage(content="Hi, I'm Lance.", additional_kwargs={}, response_metadata={})]
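`merge_message_runs` combines *consecutive* messages of the same type, which is why the system and human messages above stay separate. Below is a dependency-free sketch of that rule on `(role, content)` tuples. The tuple format and the newline separator are assumptions for illustration (check the separator your LangChain version uses), and LangChain's own function works on message objects with extra rules, for example for tool messages.

```python
def merge_runs(messages, separator="\n"):
    """Merge consecutive (role, content) pairs that share a role."""
    merged = []
    for role, content in messages:
        if merged and merged[-1][0] == role:
            # Same role as the previous message: append to its content
            prev_role, prev_content = merged[-1]
            merged[-1] = (prev_role, prev_content + separator + content)
        else:
            merged.append((role, content))
    return merged


# Two system messages in a row collapse into one; the human message stays
conversation_sketch = [
    ("system", "Reflect on the following interaction."),
    ("system", "Use the provided tools."),
    ("human", "Hi, I'm Lance."),
]
print(merge_runs(conversation_sketch))
```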

## Chatbot with collection schema updating

Now, let's bring Trustcall into our chatbot to create and update a memory collection.

In [36]:
# Initialize the model
model = ChatOpenAI(model="gpt-4o", temperature=0)

In [37]:
# Memory schema
class Memory(BaseModel):
    content: str = Field(
        description="The main content of the memory. For example: User expressed interest in learning about French."
    )

In [38]:
# Create the Trustcall extractor
trustcall_extractor = create_extractor(
    model,
    tools=[Memory],
    tool_choice="Memory",
    enable_inserts=True                 # This will allow the extractor to insert new memories to the collection.
)

In [39]:
# Chatbot instruction
MODEL_SYSTEM_MESSAGE = """
    You are a helpful chatbot. You are designed to be a companion to a user. 
    You have a long term memory which keeps track of information you learn about the user over time.
    Current Memory (may include updated memories from this conversation): {memory}
"""

# Trustcall instruction
TRUSTCALL_INSTRUCTION = """
    Reflect on the following interaction. 
    Use the provided tools to retain any necessary memories about the user. 
    Use parallel tool calling to handle updates and insertions simultaneously:
"""

In [59]:
def call_model(state: MessagesState, config: RunnableConfig, store: BaseStore):
    """Load memories from the store and use them to personalize the chatbot's response."""

    # Get the user ID from the config
    user_id = config["configurable"]["user_id"]

    # Retrieve memory from the store
    namespace = ("memories", user_id)
    memories = store.search(namespace)

    # Format the memories for the system prompt
    info = "\n".join(f"- {mem.value['content']}" for mem in memories)

    # system prompt
    system_msg = [SystemMessage(content=MODEL_SYSTEM_MESSAGE.format(memory=info))]

    # Respond using memory as well as the chat history
    response = model.invoke(system_msg + state["messages"])

    return {"messages": response}
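To see exactly what the model receives, here is the prompt-building step of `call_model` on its own, with two hard-coded memories standing in for `store.search`. Wrapping the template in `textwrap.dedent` is a suggested tweak, not something the original does; it strips the leading indentation that the triple-quoted string otherwise passes to the model.

```python
import textwrap

# Same wording as MODEL_SYSTEM_MESSAGE above
template = """
    You are a helpful chatbot. You are designed to be a companion to a user.
    You have a long term memory which keeps track of information you learn about the user over time.
    Current Memory (may include updated memories from this conversation): {memory}
"""

# Hard-coded stand-ins for the values returned by store.search(namespace)
memory_values = [
    {"content": "User's name is Umer."},
    {"content": "User loves to spend time alone."},
]

# Same bullet formatting that call_model uses
info = "\n".join(f"- {value['content']}" for value in memory_values)

# Dedent the template *before* formatting: the inserted memory lines have no
# indentation, so dedenting afterwards would find no common prefix to strip
system_prompt = textwrap.dedent(template).format(memory=info)
print(system_prompt)
```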

In [75]:
def write_memory(state: MessagesState, config: RunnableConfig, store: BaseStore):
    """Reflect on the chat history and update the memory collection."""

    # Get the user ID from the config
    user_id = config["configurable"]["user_id"]

    # Namespace for this user's memories (the same one call_model reads from)
    namespace = ("memories", user_id)

    # Retrieve existing memories
    memories = store.search(namespace)

    # Format existing memories for Trustcall as (ID, tool name, value) triples,
    # using the store key as the ID so updates overwrite the right entry
    tool_name = "Memory"
    existing_memories = (
        [(existing_item.key, tool_name, existing_item.value) for existing_item in memories]
        if memories
        else None
    )

    # Prepend the instruction to the chat history and merge consecutive same-type messages
    updated_messages = list(merge_message_runs(
        messages=[SystemMessage(content=TRUSTCALL_INSTRUCTION)] + state["messages"]
    ))

    # Invoke the extractor
    memory_collection = trustcall_extractor.invoke(
        {"messages": updated_messages, "existing": existing_memories}
    )

    # Save memories: a json_doc_id marks an update to an existing key; otherwise insert under a new UUID
    for r, m in zip(memory_collection["responses"], memory_collection["response_metadata"]):
        store.put(
            namespace,
            m.get("json_doc_id", str(uuid.uuid4())),
            r.model_dump(mode="json"),
        )
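Because `call_model` reads from `("memories", user_id)`, `write_memory` has to write to that same namespace, not to a global left over from an earlier cell. One way to keep the two nodes in sync is a single helper that derives the namespace from the config. The `memory_namespace_for`, `save_memory`, and `load_memories` functions and the dictionary store below are hypothetical illustrations, not LangGraph APIs.

```python
import uuid


def memory_namespace_for(config: dict) -> tuple[str, str]:
    """Hypothetical helper: the one place the memory namespace is defined."""
    return ("memories", config["configurable"]["user_id"])


# Dictionary standing in for the store: {namespace: {key: value}}
memory_dict: dict[tuple[str, str], dict[str, dict]] = {}


def save_memory(config: dict, value: dict) -> None:
    # Writer side (what write_memory does after extraction)
    memory_dict.setdefault(memory_namespace_for(config), {})[str(uuid.uuid4())] = value


def load_memories(config: dict) -> list[dict]:
    # Reader side (what call_model does before responding)
    return list(memory_dict.get(memory_namespace_for(config), {}).values())


user_one = {"configurable": {"thread_id": "1", "user_id": "1"}}
user_two = {"configurable": {"thread_id": "1", "user_id": "2"}}

# Hand-written example memory
save_memory(user_one, {"content": "User loves hiking."})
print(load_memories(user_one))
print(load_memories(user_two))  # empty: user 2 has its own namespace
```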

In [76]:
# Define the graph
builder = StateGraph(MessagesState)
builder.add_node("call_model", call_model)
builder.add_node("write_memory", write_memory)
builder.add_edge(START, "call_model")
builder.add_edge("call_model", "write_memory")
builder.add_edge("write_memory", END)

# Store for long-term (across-thread) memory
across_thread_memory = InMemoryStore()

# Checkpointer for short-term (within-thread) memory
within_thread_memory = MemorySaver()

# Compile the graph with the checkpointer and store
graph = builder.compile(checkpointer=within_thread_memory, store=across_thread_memory)

In [77]:
# # View
# display(Image(graph.get_graph(xray=1).draw_mermaid_png()))

In [78]:
# We supply a thread ID for short-term (within-thread) memory
# We supply a user ID for long-term (across-thread) memory 
config = {"configurable": {"thread_id": "1", "user_id": "1"}}

# User input 
input_messages = [HumanMessage(content="Hi, my name is Umer")]

# Run the graph
for chunk in graph.stream({"messages": input_messages}, config, stream_mode="values"):
    chunk["messages"][-1].pretty_print()


Hi, my name is Umer



Hello, Umer! It's nice to meet you. How can I assist you today?


In [79]:
# User input 
input_messages = [HumanMessage(content="I love to spend times alone.")]

# Run the graph
for chunk in graph.stream({"messages": input_messages}, config, stream_mode="values"):
    chunk["messages"][-1].pretty_print()


I love to spend times alone.

That sounds peaceful, Umer. Spending time alone can be a great way to relax and recharge. What do you enjoy doing during your alone time?


In [80]:
# Namespace for the memory to save
memory_namespace = ("memories", "1")

# Retrieve memory from the store
memories = across_thread_memory.search(memory_namespace)

In [81]:
memories

[Item(namespace=['memories', '1'], key='911d6871-9242-4fd0-b076-3deaec93f4ee', value={'content': "User's name is Umer."}, created_at='2026-01-06T16:17:56.761914+00:00', updated_at='2026-01-06T16:17:56.761914+00:00', score=None),
 Item(namespace=['memories', '1'], key='42303e7d-d8cc-47a1-9d03-5810b9dd9d1e', value={'content': 'User loves to spend time alone.'}, created_at='2026-01-06T16:18:00.476685+00:00', updated_at='2026-01-06T16:18:00.476685+00:00', score=None)]

In [83]:
for mem in memories:
    print("Key ->", mem.key)
    print("Value ->", mem.value)
    print("="*50)

Key -> 911d6871-9242-4fd0-b076-3deaec93f4ee
Value -> {'content': "User's name is Umer."}
Key -> 42303e7d-d8cc-47a1-9d03-5810b9dd9d1e
Value -> {'content': 'User loves to spend time alone.'}


In [84]:
# User input 
input_messages = [HumanMessage(content="I also love hiking.")]

# Run the graph
for chunk in graph.stream({"messages": input_messages}, config, stream_mode="values"):
    chunk["messages"][-1].pretty_print()


I also love hiking.

Hiking is a wonderful way to enjoy nature and get some exercise. Do you have any favorite trails or places you like to hike?


In [85]:
# We supply a thread ID for short-term (within-thread) memory
# We supply a user ID for long-term (across-thread) memory 
config = {"configurable": {"thread_id": "2", "user_id": "1"}}

# User input 
input_messages = [HumanMessage(content="What do you recommend for me to spend my time for myself?")]

# Run the graph
for chunk in graph.stream({"messages": input_messages}, config, stream_mode="values"):
    chunk["messages"][-1].pretty_print()


What do you recommend for me to spend my time for myself?

Since you enjoy spending time alone and love hiking, you might consider exploring new hiking trails or perhaps even planning a solo hiking trip to a place you've never been before. If you're looking for something different, you could try activities like photography during your hikes, journaling about your experiences, or even learning about the local flora and fauna. These activities can enhance your hiking experience and provide a deeper connection with nature. How does that sound?
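The answer in thread `"2"` draws on memories written in thread `"1"` because the two kinds of memory are keyed differently: the checkpointer keeps each thread's message history under its `thread_id`, while the store keeps memories under `("memories", user_id)`. The dictionaries and the `context_for` function below are a hypothetical model of that split, filled with values from the runs above; they are not how `MemorySaver` or `InMemoryStore` store data internally.

```python
# Short-term memory: message history per thread_id (user turns from thread 1)
thread_history: dict[str, list[str]] = {
    "1": ["Hi, my name is Umer", "I love to spend times alone.", "I also love hiking."],
}

# Long-term memory: facts per ("memories", user_id) namespace (from the store output above)
user_memories: dict[tuple[str, str], list[str]] = {
    ("memories", "1"): ["User's name is Umer.", "User loves to spend time alone."],
}


def context_for(config: dict) -> dict:
    """Collect what a node could see for a given config."""
    thread_id = config["configurable"]["thread_id"]
    user_id = config["configurable"]["user_id"]
    return {
        "history": thread_history.get(thread_id, []),
        "memories": user_memories.get(("memories", user_id), []),
    }


# A fresh thread has no history, but the user's memories are still available
new_thread = {"configurable": {"thread_id": "2", "user_id": "1"}}
print(context_for(new_thread))
```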
