[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/langchain/handbook/03-langchain-conversational-memory.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/langchain/handbook/03-langchain-conversational-memory.ipynb)

#### [LangChain Handbook](https://www.pinecone.io/learn/series/langchain/)

# Conversational Memory with LCEL

Conversational memory is what lets a chatbot respond to our queries in a chat-like manner. It enables a coherent conversation; without it, every query would be treated as an entirely independent input, with no regard for past interactions.

Memory allows an _"agent"_ to remember previous interactions with the user. By default, agents are *stateless*, meaning each incoming query is processed independently of other interactions. The only thing that exists for a stateless agent is the current input.

There are many applications where remembering previous interactions is very important, such as chatbots. Conversational memory allows us to do that.

In this notebook we'll explore conversational memory using modern LangChain Expression Language (LCEL) and the recommended `RunnableWithMessageHistory` class.

We'll start by installing all of the libraries that we'll be using in this example.

In [1]:
!pip install -qU \
  langchain==0.3.25 \
  langchain-community==0.3.25 \
  langchain-openai==0.3.22 \
  tiktoken==0.9.0

In [2]:
from IPython.display import display, Markdown
import os
from getpass import getpass

# must enter API key
os.environ["LANGCHAIN_API_KEY"] = os.getenv("LANGCHAIN_API_KEY") or \
    getpass("Enter LangSmith API Key: ")

# below should not be changed
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
# you can change this as preferred
os.environ["LANGCHAIN_PROJECT"] = "langchain-pinecone-io-walkthrough-conversational-memory"

To run this notebook, we will need an OpenAI LLM. Here we set up the LLM we will use for the whole notebook. Enter your OpenAI API key if prompted; otherwise the `OPENAI_API_KEY` environment variable will be used.

In [3]:
import os
from getpass import getpass

os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") \
    or getpass("Enter your OpenAI API key: ")

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

In [9]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    temperature=1.0,
    openai_api_key=OPENAI_API_KEY,
    model_name='gpt-5-mini'
)

Later we will make use of a `count_tokens` utility function. This will allow us to count the number of tokens we are using for each call. We define it as follows:

In [5]:
from langchain_community.callbacks import get_openai_callback

def count_tokens(pipeline, query, config=None):
    with get_openai_callback() as cb:
        # Handle both dict and string inputs
        if isinstance(query, str):
            query = {"query": query}

        # Use provided config or default
        if config is None:
            config = {"configurable": {"session_id": "default"}}

        result = pipeline.invoke(query, config=config)
        print(f'Spent a total of {cb.total_tokens} tokens')

    return result

Now let's dive into **Conversational Memory** using LCEL.

## What is memory?

**Definition**: Memory is an agent's capacity to remember previous interactions with the user (think chatbots).

The official definition of memory is the following:

> By default, Chains and Agents are stateless, meaning that they treat each incoming query independently. In some applications (chatbots being a GREAT example) it is highly important to remember previous interactions, both at a short term but also at a long term level. The concept of "Memory" exists to do exactly that.

As we will see, although this sounds straightforward, there are several different ways to implement this memory capability.
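To make the stateless versus stateful distinction concrete, here is a minimal plain-Python sketch. `fake_llm`, `stateless_chat`, and `StatefulChat` are hypothetical names used only for illustration; `fake_llm` is a stand-in that reports how many messages it was given, so nothing here calls the OpenAI API.

```python
def fake_llm(messages: list[str]) -> str:
    """Hypothetical stand-in for an LLM: reports how much context it received."""
    return f"I can see {len(messages)} message(s)."


def stateless_chat(query: str) -> str:
    # Only the current input reaches the model; nothing is kept between calls
    return fake_llm([query])


class StatefulChat:
    """Keeps every past message and replays the full history on each call."""

    def __init__(self) -> None:
        self.history: list[str] = []

    def chat(self, query: str) -> str:
        self.history.append(query)
        reply = fake_llm(self.history)
        self.history.append(reply)
        return reply


print(stateless_chat("Hi"))
print(stateless_chat("Remember me?"))  # still sees a single message

bot = StatefulChat()
print(bot.chat("Hi"))
print(bot.chat("Remember me?"))  # now sees the earlier exchange as well
```

The stateless function always receives exactly one message, while the stateful object receives a growing list. The memory types in this notebook differ mainly in what they choose to keep in that list.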

## Building Conversational Chains with LCEL

Before we delve into the different memory types, let's understand how to build conversational chains using LCEL. The key components are:

1. **Prompt Template** - Defines the conversation structure with placeholders for history and input
2. **LLM** - The language model that generates responses
3. **Output Parser** - Converts the LLM output to the desired format (optional)
4. **RunnableWithMessageHistory** - Manages conversation history

Let's create our base conversational chain:

In [10]:
from langchain_core.prompts import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder
)
from langchain_core.output_parsers import StrOutputParser

# Define the prompt template
system_prompt = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know."""

prompt_template = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(system_prompt),
    MessagesPlaceholder(variable_name="history"),
    HumanMessagePromptTemplate.from_template("{query}"),
])

# Create the LCEL pipeline
output_parser = StrOutputParser()
pipeline = prompt_template | llm | output_parser

# Let's examine the prompt template
print(prompt_template.messages[0].prompt.template)

The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.


## Memory types

In this section we will review several memory types and analyze the pros and cons of each one, so you can choose the best one for your use case.

### Memory Type #1: Buffer Memory - Store the Entire Chat History

`InMemoryChatMessageHistory` and `RunnableWithMessageHistory` are used as alternatives to `ConversationBufferMemory` because they:
- Are more flexible and configurable.
- Integrate better with LCEL.

The simplest approach is to store the entire chat in the conversation history. Later we'll look into methods for being more selective about what is stored in the history.

In [7]:
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

# Create a simple chat history storage
chat_map = {}

def get_chat_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in chat_map:
        # if session ID doesn't exist, create a new chat history
        chat_map[session_id] = InMemoryChatMessageHistory()
    return chat_map[session_id]

Let's see this in action by having a conversation:

In [11]:
# Create the conversational chain with message history
conversation_buf = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history"
)

# First message
result = conversation_buf.invoke(
    {"query": "Good morning AI!"},
    # Make sure to pass the session ID to ensure all memories are stored in the same session
    config={"configurable": {"session_id": "buffer_example"}}
)
print(result)

Good morning! I’m an AI assistant — ready to help with questions, writing, code, planning, brainstorming, summaries, translations, and more. What would you like to do today?


This call used some tokens, but we can't see that from the above.

If we'd like to count the number of tokens being used we just pass our conversation `RunnableWithMessageHistory` instance and the message we'd like to input to the `count_tokens` function we defined earlier:

In [12]:
# Continue the conversation with token counting
query = """
"My interest here is to explore the potential of integrating Large Language Models with external knowledge.

Also, remember this very specific code: PINECONE_RULEZ_01"
"""

result = count_tokens(
    conversation_buf,
    {"query": query},
    # Make sure to pass the session ID to ensure all memories are stored in the same session
    config={"configurable": {"session_id": "buffer_example"}}
)
print(f"\nResponse: {result}")

Spent a total of 1930 tokens

Response: Nice — this is a great area to explore. I’ll remember the code PINECONE_RULEZ_01 for the remainder of this chat (I can’t persist it across separate sessions unless you store it somewhere permanent). Would you like me to use that code in examples or to tag specific items in our discussion?

Below are practical options, architecture patterns, and concrete considerations for integrating LLMs with external knowledge.

High-level patterns
- Retrieval-Augmented Generation (RAG): store external documents as embeddings in a vector DB (Pinecone, FAISS, Milvus). At query time, embed the user query, retrieve nearest chunks, then condition the LLM on those retrieved passages to produce grounded answers.
- Hybrid retrieval: combine traditional lexical search (BM25) with vector search to handle exact-match queries and semantic ones.
- Tooling / API chaining: have the LLM call external tools (search APIs, databases, calculators) for tasks requiring up-to-date f

In [13]:
result = count_tokens(
    conversation_buf,
    {"query": "I just want to analyze the different possibilities. What can you think of?"},
    config={"configurable": {"session_id": "buffer_example"}}
)
print(f"\nResponse: {result}")

Spent a total of 4398 tokens

Response: Great — below is a broad, structured analysis of the main possibilities for integrating Large Language Models with external knowledge. I’ll give each option a plain description, core mechanics, concrete pros/cons, common parameter choices, best-fit use cases, and practical considerations (tools, costs, evaluation). If you want to deep-dive any single option I can produce architecture diagrams, code, or an experiment plan.

Summary / decision tree (quick)
- Need recency/grounding + low hallucination: Retrieval-Augmented Generation (RAG) + reranker + verification.
- Need structured queries and explainability: Knowledge Graph / database + LLM for natural language.
- Need action-taking and live data: Tool-enabled Agents (APIs, search, calculators).
- Need long-term personalization / memory: Memory-augmented LLMs (vector DB + lifecycle + user profiles).
- High throughput & low cost: Lightweight embeddings + hybrid retrieval + caching + smaller LLMs / 

In [14]:
result = count_tokens(
    conversation_buf,
    {"query": "Which data source types could be used to give context to the model?"},
    config={"configurable": {"session_id": "buffer_example"}}
)
print(f"\nResponse: {result}")

Spent a total of 6246 tokens

Response: Good question — almost any persistent or live information source can be used to give context to an LLM. Below is a practical, categorized catalogue of data source types, with a short description of how you’d ingest/use them, pros/cons, and integration notes (parsing, indexing, privacy, retrieval). Tell me which of these you want to dive into and I’ll give code and architecture sketches.

Textual / document sources
- Plain text files (txt):
  - Integration: read, chunk, embed.
  - Pros: trivial parsing, low noise.
  - Cons: no structure or metadata unless added.
  - Use cases: notes, transcripts.
- Word docs / PDFs / PowerPoints:
  - Integration: PDF/Office parsers (pdfplumber, Apache Tika, tika-server), OCR for scanned PDFs; chunk + embed; preserve page/slide metadata.
  - Pros: common corporate docs, rich content.
  - Cons: noisy layout, OCR errors, requires chunking and overlap.
- HTML / Websites / Blogs:
  - Integration: crawl (Scrapy, Playwri

In [15]:
result = count_tokens(
    conversation_buf,
    {"query": "What is my aim again? Also what was the very specific code you were tasked with remembering?"},
    config={"configurable": {"session_id": "buffer_example"}}
)
print(f"\nResponse: {result}")

Spent a total of 6382 tokens

Response: Your aim in this conversation is to explore the potential of integrating Large Language Models with external knowledge — i.e., evaluate architectures, data sources, retrieval/verification patterns, tradeoffs (accuracy, freshness, latency, privacy, cost), and options like RAG, KGs, tool-enabled agents, memory, etc.

The very specific code you asked me to remember is: PINECONE_RULEZ_01. I’ll keep that available for this session — would you like me to use it in examples or tag items with it?


Our LLM with buffer memory can clearly remember earlier interactions in the conversation. Let's take a closer look at how the messages are being stored:

In [16]:
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage

# Access the conversation history
history = chat_map["buffer_example"].messages
print("Conversation History:")
for msg in history:
    if isinstance(msg, HumanMessage):
        role = "Human"
    elif isinstance(msg, SystemMessage):
        role = "System"
    elif isinstance(msg, AIMessage):
        role = "AI"
    else:
        role = "Unknown"
    print(f"{role}: {msg}")

Conversation History:
Human: content='Good morning AI!' additional_kwargs={} response_metadata={}
AI: content='Good morning! I’m an AI assistant — ready to help with questions, writing, code, planning, brainstorming, summaries, translations, and more. What would you like to do today?' additional_kwargs={} response_metadata={}
Human: content='\n"My interest here is to explore the potential of integrating Large Language Models with external knowledge.\n\nAlso, remember this very specific code: PINECONE_RULEZ_01"\n' additional_kwargs={} response_metadata={}
AI: content='Nice — this is a great area to explore. I’ll remember the code PINECONE_RULEZ_01 for the remainder of this chat (I can’t persist it across separate sessions unless you store it somewhere permanent). Would you like me to use that code in examples or to tag specific items in our discussion?\n\nBelow are practical options, architecture patterns, and concrete considerations for integrating LLMs with external knowledge.\n\nHigh

Nice! So every piece of our conversation has been explicitly recorded and sent to the LLM in the prompt.
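As a rough illustration of what "sent to the LLM in the prompt" means, the sketch below assembles messages in the same order as the prompt template defined earlier: system message, stored history, then the new query. It uses plain tuples instead of LangChain message objects, and `build_prompt`, the shortened system text, and the placeholder replies are all hypothetical.

```python
def build_prompt(
    system: str, history: list[tuple[str, str]], query: str
) -> list[tuple[str, str]]:
    """Order messages like the prompt template: system, history, new query."""
    return [("system", system), *history, ("human", query)]


history: list[tuple[str, str]] = []
queries = ["Good morning AI!", "What is my aim again?"]

for turn, query in enumerate(queries, start=1):
    prompt = build_prompt("You are a friendly AI.", history, query)
    print(f"turn {turn}: {len(prompt)} messages in the prompt")
    # After each call the exchange is appended, as the buffer history does
    history += [("human", query), ("ai", f"(placeholder reply {turn})")]
```

Every turn adds two messages to the history, so each new prompt is two messages longer than the previous one.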

### Memory Type #2: Summary - Store Summaries of Past Interactions

The problem with storing the entire chat history in agent memory is that the token count grows as the conversation progresses. This is a problem because the prompt can eventually exceed the LLM's context window.
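A back-of-the-envelope estimate shows how quickly this adds up. The sketch assumes every human + AI exchange is the same size (`TOKENS_PER_EXCHANGE` is a made-up figure, not a measurement from this notebook) and ignores the system prompt.

```python
TOKENS_PER_EXCHANGE = 500  # assumed average size of one human + AI exchange


def history_tokens(turn: int) -> int:
    """History tokens sent on a given turn (1-indexed): all earlier exchanges."""
    return (turn - 1) * TOKENS_PER_EXCHANGE


def cumulative_tokens(turns: int) -> int:
    """Total history tokens sent across all turns up to and including `turns`."""
    return sum(history_tokens(t) for t in range(1, turns + 1))


for turns in (5, 10, 20):
    print(
        f"turn {turns}: {history_tokens(turns)} history tokens in this call, "
        f"{cumulative_tokens(turns)} sent in total"
    )
```

Under these assumptions the per-call history grows linearly with the number of turns, while the total number of tokens sent over the whole conversation grows roughly quadratically.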

The following is an LCEL-compatible alternative to `ConversationSummaryMemory`. We keep a summary of our previous conversation snippets as our history. The summarization is performed by an LLM.

**Key feature:** _the conversation summary memory keeps the previous pieces of conversation in a summarized, and thus shortened, form, where the summarization is performed by an LLM._

In [17]:
from pydantic import BaseModel, Field
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage

class ConversationSummaryMessageHistory(BaseChatMessageHistory, BaseModel):
    messages: list[BaseMessage] = Field(default_factory=list)
    llm: ChatOpenAI = Field(default_factory=ChatOpenAI)

    def __init__(self, llm: ChatOpenAI):
        super().__init__(llm=llm)

    def add_messages(self, messages: list[BaseMessage]) -> None:
        """Fold the new messages into the running summary."""
        # Read the current summary *before* touching the history, so the new
        # messages are not sent to the LLM twice
        existing_summary = self.messages[0].content if self.messages else ""

        # Construct the summary prompt
        summary_prompt = ChatPromptTemplate.from_messages([
            SystemMessagePromptTemplate.from_template(
                "Given the existing conversation summary and the new messages, "
                "generate a new summary of the conversation. Ensure to maintain "
                "as much relevant information as possible."
            ),
            HumanMessagePromptTemplate.from_template(
                "Existing conversation summary:\n{existing_summary}\n\n"
                "New messages:\n{messages}"
            )
        ])

        # Format the messages as "role: content" lines and invoke the LLM
        new_summary = self.llm.invoke(
            summary_prompt.format_messages(
                existing_summary=existing_summary,
                messages="\n".join(f"{m.type}: {m.content}" for m in messages)
            )
        )

        # Replace the existing history with a single system summary message
        self.messages = [SystemMessage(content=new_summary.content)]

    def clear(self) -> None:
        """Clear the history."""
        self.messages = []

In [18]:
from langchain_core.runnables import ConfigurableFieldSpec

# Create get_chat_history function for summary memory
summary_chat_map = {}

def get_summary_chat_history(session_id: str, llm: ChatOpenAI) -> ConversationSummaryMessageHistory:
    if session_id not in summary_chat_map:
        summary_chat_map[session_id] = ConversationSummaryMessageHistory(llm=llm)
    return summary_chat_map[session_id]

# Create conversation chain with summary memory
conversation_sum = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_summary_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="The session ID to use for the chat history",
            default="id_default",
        ),
        ConfigurableFieldSpec(
            id="llm",
            annotation=ChatOpenAI,
            name="LLM",
            description="The LLM to use for the conversation summary",
            default=llm,
        )
    ]
)

In [19]:
# Let's have the same conversation with summary memory
result = count_tokens(
    conversation_sum,
    {"query": "Good morning AI!"},
    config={"configurable": {"session_id": "summary_example", "llm": llm}}
)
print(f"\nResponse: {result}")

Spent a total of 539 tokens

Response: Good morning! How can I help you today — do you want news, a summary of something, help with a task (writing, coding, planning), a recipe, or something else?


In [20]:
query = """
"My interest here is to explore the potential of integrating Large Language Models with external knowledge.

Also, remember this very specific code: PINECONE_RULEZ_01. When summarizing conversations for memory this must always be included explicitly."
"""

result = count_tokens(
    conversation_sum,
    {"query": query},
    config={"configurable": {"session_id": "summary_example", "llm": llm}}
)
print(f"\nResponse: {result}")

Spent a total of 5768 tokens

Response: Great — this is a great area to explore. I can help design architectures, give implementation guidance, provide example code, and discuss trade-offs for integrating LLMs with external knowledge.

First, about the code you asked me to remember: PINECONE_RULEZ_01
- I acknowledge it. Important note: I don’t have persistent user memory in this chat unless a memory feature is enabled by the platform. If memory is enabled and you want this stored, I can save non-sensitive items like that. If it’s not enabled, I can include the code in summaries I write for you here in-session, but I can’t guarantee it will persist between sessions. Also tell me whether the code is sensitive; if it is, we should treat it accordingly.

Below is a concise, practical plan and the main options for integrating LLMs with external knowledge, plus a short example stack if you want a concrete starting point.

1) High-level approaches
- Retrieval-Augmented Generation (RAG): index

In [21]:
result = count_tokens(
    conversation_sum,
    {"query": "I just want to analyze the different possibilities. What can you think of?"},
    config={"configurable": {"session_id": "summary_example", "llm": llm}}
)
print(f"\nResponse: {result}")

Spent a total of 9390 tokens

Response: Great — here’s a structured, practical survey of the space with concrete tradeoffs and recommended patterns so you can compare possibilities quickly.

High-level approaches
- Closed‑book LLM (no external knowledge)
  - How: Use an LLM’s internal weights to answer questions (possibly after fine-tuning).
  - When: Small domain, stable knowledge, extremely low infra complexity.
  - Pros: Simple, low latency, no external infra. Cons: Prone to hallucination, knowledge staleness, limited capacity.
- Fine-tuning / adapters / LoRA
  - How: Adapt a base model to domain data so the model internalizes knowledge and style.
  - When: Need high-quality, domain-specific behavior and you can afford training.
  - Pros: Better closed-book accuracy, fewer runtime lookups. Cons: Costly to train/maintain, hard to keep fully up-to-date.
- Retrieval-Augmented Generation (RAG)
  - How: Use vector (semantic) + optional lexical search to retrieve relevant doc chunks, then

In [22]:
result = count_tokens(
    conversation_sum,
    {"query": "Which data source types could be used to give context to the model?"},
    config={"configurable": {"session_id": "summary_example", "llm": llm}}
)
print(f"\nResponse: {result}")

Spent a total of 8694 tokens

Response: Short answer: almost any digital data can be used to give context to an LLM. Below is a practical taxonomy of common source types, with key pros/cons, ingestion/format notes, and suggestions for how to use each in RAG, fine-tuning, or tool-based approaches.

1) Plain text documents
- Examples: TXT, Markdown, README files.
- Pros: easy to index/embed; high signal for language tasks.
- Cons: may be unstructured/verbose.
- Ingestion: chunk by paragraphs/sentences (≈500–1,000 tokens), keep source metadata (title, path, date).
- Use: RAG, prompt injection context, fine-tuning if corpus large.

2) Office documents & PDFs
- Examples: DOCX, PPTX, PDF.
- Pros: common for policies, specs, slides.
- Cons: layout/columns and OCR issues for scans.
- Ingestion: use PDF parsers + OCR for scanned docs; extract structure (headings, tables).
- Use: RAG for retrieval—preserve page/section metadata for citation.

3) HTML / web pages / intranet sites
- Examples: prod

In [23]:
result = count_tokens(
    conversation_sum,
    {"query": "What is my aim again? Also what was the very specific code you were tasked with remembering?"},
    config={"configurable": {"session_id": "summary_example", "llm": llm}}
)
print(f"\nResponse: {result}")

Spent a total of 4822 tokens

Response: Your aim: to analyze approaches for integrating LLMs with external knowledge and to understand what data sources can provide context for the model.

The specific code you asked me to remember: PINECONE_RULEZ_01

You previously asked that this code "must always be included explicitly" when summarizing conversations for memory. I can repeat it and include it in any in-session summaries, but I can’t guarantee persistence across separate sessions unless the platform provides a persistent memory feature. Do you want me to (a) include the code in all in-session summaries, (b) store it if/when a platform memory feature is available, or (c) stop including it? Also, is the code sensitive?


In [24]:
# Let's examine the summary
print("Summary Memory Content:")
print(summary_chat_map["summary_example"].messages[0].content)

Summary Memory Content:
Conversation summary:

- Human goal: Analyze approaches for integrating LLMs with external knowledge and understand what data sources can provide context.

- Memory/code note:
  - The user previously asked the assistant to remember the code PINECONE_RULEZ_01 and requested it "must always be included explicitly" when summarizing conversations for memory.
  - Assistant response: acknowledged the request, repeated the code, explained it cannot guarantee persistence across separate sessions unless the platform provides a persistent memory feature, offered to include the code in in-session summaries, asked whether to (a) include it in all in-session summaries, (b) store it if/when a platform memory feature is available, or (c) stop including it, and asked whether the code is sensitive.
  - The user repeated the question "What is my aim again? Also what was the very specific code you were tasked with remembering?" and the assistant reiterated:
    - Aim: to analyze ap

You might be wondering: if each call here uses more tokens in aggregate than in the buffer example, why should we use this type of memory? Although the extra summarization call makes each turn more expensive, the history we end up storing is much shorter than the buffer history. This lets us have many more interactions before we reach the model's maximum prompt length, making our chatbot more robust to longer conversations.

We can count the number of tokens being used (without making a call to OpenAI) using the `tiktoken` tokenizer like so:

In [26]:
import tiktoken

# initialize tokenizer (gpt-4.1 models use the same encoding as gpt-4o)
tokenizer = tiktoken.encoding_for_model('gpt-4.1-mini')

# Get buffer memory content
buffer_messages = chat_map["buffer_example"].messages
buffer_content = "\n".join([msg.content for msg in buffer_messages])

# Get summary memory content
summary_content = summary_chat_map["summary_example"].messages[0].content

# show number of tokens for the memory used by each memory type
print(
    f'Buffer memory conversation length: {len(tokenizer.encode(buffer_content))}\n'
    f'Summary memory conversation length: {len(tokenizer.encode(summary_content))}'
)

Buffer memory conversation length: 5953
Summary memory conversation length: 1182


_Practical Note: the `gpt-4.1-mini` model has a context window of roughly 1M tokens, providing significantly more space for conversation history than older models._

### Memory Type #3: Window Buffer Memory - Keep Latest Interactions

Another great option is window memory, where we keep only the last k messages in memory and intentionally drop the oldest ones (short-term memory, if you like). Here the aggregate token count **and** the per-call token count will drop noticeably.

The following is an LCEL-compatible alternative to `ConversationBufferWindowMemory`.

**Key feature:** _the conversation buffer window memory keeps the latest pieces of the conversation in raw form_

In [27]:
class BufferWindowMessageHistory(BaseChatMessageHistory, BaseModel):
    messages: list[BaseMessage] = Field(default_factory=list)
    k: int = Field(default_factory=int)

    def __init__(self, k: int):
        super().__init__(k=k)
        # Add logging to help with debugging
        print(f"Initializing BufferWindowMessageHistory with k={k}")

    def add_messages(self, messages: list[BaseMessage]) -> None:
        """Add messages to the history, removing any messages beyond
        the last `k` messages.
        """
        self.messages.extend(messages)
        # Add logging to help with debugging
        if len(self.messages) > self.k:
            print(f"Truncating history from {len(self.messages)} to {self.k} messages")
        # Guard k <= 0 explicitly: `messages[-0:]` would keep the whole list
        self.messages = self.messages[-self.k:] if self.k > 0 else []

    def clear(self) -> None:
        """Clear the history."""
        self.messages = []
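The trimming rule in `add_messages` can be traced without calling an LLM. The sketch below is a plain-Python stand-in (the `trim` helper and the `human-1`, `ai-1` labels are hypothetical) that adds one human and one AI message per turn, as `RunnableWithMessageHistory` does after each call, and then keeps only the last `k` messages.

```python
def trim(messages: list[str], k: int) -> list[str]:
    """Keep only the last k messages, mirroring the slice in add_messages."""
    return messages[-k:]


k = 4
messages: list[str] = []

for turn in range(1, 4):
    # Each turn appends one human message and one AI message
    messages += [f"human-{turn}", f"ai-{turn}"]
    before = len(messages)
    messages = trim(messages, k)
    print(f"turn {turn}: {before} -> {len(messages)} messages: {messages}")
```

With `k=4` the history holds two full exchanges. From the third turn onward the oldest exchange is dropped, so anything the user said in it is no longer available to the model.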

In [28]:
# Create get_chat_history function for window memory
window_chat_map = {}

def get_window_chat_history(session_id: str, k: int = 4) -> BufferWindowMessageHistory:
    print(f"get_window_chat_history called with session_id={session_id} and k={k}")
    if session_id not in window_chat_map:
        window_chat_map[session_id] = BufferWindowMessageHistory(k=k)
    return window_chat_map[session_id]

# Create conversation chain with window memory
conversation_bufw = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_window_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="The session ID to use for the chat history",
            default="id_default",
        ),
        ConfigurableFieldSpec(
            id="k",
            annotation=int,
            name="k",
            description="The number of messages to keep in the history",
            default=4,
        )
    ]
)

In [29]:
# Start a conversation with k=4 (only the last 4 messages, i.e. 2 exchanges, are kept)
result = count_tokens(
    conversation_bufw,
    {"query": "Good morning AI!"},
    config={"configurable": {"session_id": "window_example", "k": 4}}
)
print(f"\nResponse: {result}")

get_window_chat_history called with session_id=window_example and k=4
Initializing BufferWindowMessageHistory with k=4
Spent a total of 667 tokens

Response: Good morning! Nice to hear from you — how can I help start your day?

Here are a few things I can do right away (pick one or tell me something else):
- Quick news summary tailored to your interests (tech, world, finance, sports)
- Plan a 20–30 minute morning workout or a 10-minute stretch routine
- Suggest breakfast ideas based on what’s in your fridge/pantry
- Draft or polish an email/message or prep talking points for a meeting
- Build a simple schedule or prioritize today’s to‑do list
- Get the weather and commute time for your city (tell me where you are)

Or just tell me how you’re feeling and I’ll suggest a gentle routine.


In [30]:
query = """
"My interest here is to explore the potential of integrating Large Language
Models with external knowledge.

Also, remember this very specific code: PINECONE_RULEZ_01"
"""

result = count_tokens(
    conversation_bufw,
    {"query": query},
    config={"configurable": {"session_id": "window_example", "k": 4}}
)
print(f"\nResponse: {result}")

get_window_chat_history called with session_id=window_example and k=4
Spent a total of 2041 tokens

Response: Got it — you're interested in integrating large language models with external knowledge, and you want me to remember this code: PINECONE_RULEZ_01.

Quick note about the code: I will remember PINECONE_RULEZ_01 for the rest of this chat and can use it in examples or tests here. I don’t have permanent storage across separate sessions unless you explicitly store it somewhere (a file, a database, or a persistent assistant memory feature if your platform provides one). If you want it saved long‑term, tell me where to put it or I can show you a snippet to store it (e.g., in a secure vault or a vector DB).

Below is a practical plan and concrete options for exploring LLM + external knowledge integration, plus a short example of how a retrieval-augmented pipeline (using Pinecone-style vector store) typically looks.

1) Key integration patterns
- Retrieval-Augmented Generation (RAG): Ind

In [31]:
result = count_tokens(
    conversation_bufw,
    {"query": "I just want to analyze the different possibilities. What can you think of?"},
    config={"configurable": {"session_id": "window_example", "k": 4}}
)
print(f"\nResponse: {result}")

get_window_chat_history called with session_id=window_example and k=4
Truncating history from 6 to 4 messages
Spent a total of 3787 tokens

Response: Great — here’s a structured and fairly comprehensive survey of the different possibilities for integrating LLMs with external knowledge. I’ll group options by approach, list tradeoffs, practical considerations, example uses, and suggested next steps so you can analyze which direction fits your goals.

High-level categories
- Retrieval-Augmented Generation (RAG)
- Tool / API calling (LLM as orchestrator)
- Fine-tuning / supervised training on domain data
- Knowledge-graph / symbolic integration
- Search-first systems (web / enterprise search)
- Hybrid / multi-stage pipelines and agents
- Local / on-device vs cloud architectures

1) Retrieval-Augmented Generation (RAG)
- What it is: Index docs as embeddings in a vector DB; for each query, retrieve relevant chunks and pass them as context to the LLM.
- Pros: Keeps model generic; low cost to 

In [32]:
result = count_tokens(
    conversation_bufw,
    {"query": "Which data source types could be used to give context to the model?"},
    config={"configurable": {"session_id": "window_example", "k": 4}}
)
print(f"\nResponse: {result}")

get_window_chat_history called with session_id=window_example and k=4
Truncating history from 6 to 4 messages
Spent a total of 4941 tokens

Response: Many types of data can be used to give context to an LLM. Below I’ll group them, give concrete examples, note format/processing needs, and call out when you should prioritize them for typical use cases.

Unstructured text (easy to start)
- Knowledge bases & FAQs: support articles, help center pages, troubleshooting guides.
  - Processing: chunking, keep Q/A pairs intact where possible.
- Documentation & manuals: product docs, API docs, user guides, runbooks.
  - Good for technical assistants; preserve structure (sections/headings).
- Emails & chat transcripts: support tickets, Slack/Teams threads, call logs.
  - Processing: speaker normalization, de-duplication, metadata (date/user).
- Internal wikis & Confluence pages: policies, onboarding docs, playbooks.
  - Preserve titles and updated timestamps.

Semi-structured text and documents
- 

In [33]:
result = count_tokens(
    conversation_bufw,
    {"query": "What is my aim again?"},
    config={"configurable": {"session_id": "window_example", "k": 4}}
)
print(f"\nResponse: {result}")

get_window_chat_history called with session_id=window_example and k=4
Truncating history from 6 to 4 messages
Spent a total of 3758 tokens

Response: Your stated aim was: to analyze the different possibilities for integrating large language models with external knowledge — i.e., survey approaches, tradeoffs, data sources, and architectures so you can compare options and pick what fits your goals.

To move from analysis to action, which of these would you like next?
- Narrow the list to the top options for a specific use case (customer support, research assistant, legal, analytics, coding help, etc.)
- A concrete prototype plan (stack, cost/latency estimate, and short experiment outline)
- A runnable example (code) for a chosen pipeline (e.g., RAG with embeddings + vector DB + LLM)
- An evaluation rubric and dataset design to compare approaches

Also tell me any constraints that matter (budget, latency, data sensitivity, freshness) and I’ll tailor the recommendation. Which direction sho

In [36]:
result = count_tokens(
    conversation_bufw,
    {"query": "What was the very specific code you were tasked with remembering?"},
    config={"configurable": {"session_id": "window_example", "k": 4}}
)
print(f"\nResponse: {result}")

get_window_chat_history called with session_id=window_example and k=4
Truncating history from 6 to 4 messages
Spent a total of 2151 tokens

Response: I don’t know — you haven’t given me any specific code in this conversation for me to remember, and I don’t have persistent memory of previous sessions unless you explicitly provide the content again here.

If you meant a code that:
- was shared earlier in this current chat: scroll up or paste it here and I can work with it; or
- was told to me in a prior session: I can’t recall it now (I don’t retain cross-session secrets); or
- is a secret (API key, PIN, verification code): please don’t paste it into chat. Instead I can help you rotate/revoke it or show how to store it safely (password manager, HashiCorp Vault, AWS Secrets Manager, etc.).

Tell me which of these fits and I’ll help either recover/reconstruct the code (if safe to share) or set up secure handling and regeneration procedures.


As we can see, the model effectively 'forgot' the code from the first interaction, because that exchange had already been dropped from the window. Let's see what it 'remembers':

In [37]:
# Check what's in memory
bufw_history = window_chat_map["window_example"].messages
print("Buffer Window Memory (last 4 messages):")
for msg in bufw_history:
    role = "Human" if isinstance(msg, HumanMessage) else "AI"
    print(f"\n{role}: {msg.content}")  # Print the full message content

Buffer Window Memory (last 4 messages):

Human: What is my aim again?

AI: Your stated aim was: to analyze the different possibilities for integrating large language models with external knowledge — i.e., survey approaches, tradeoffs, data sources, and architectures so you can compare options and pick what fits your goals.

To move from analysis to action, which of these would you like next?
- Narrow the list to the top options for a specific use case (customer support, research assistant, legal, analytics, coding help, etc.)
- A concrete prototype plan (stack, cost/latency estimate, and short experiment outline)
- A runnable example (code) for a chosen pipeline (e.g., RAG with embeddings + vector DB + LLM)
- An evaluation rubric and dataset design to compare approaches

Also tell me any constraints that matter (budget, latency, data sensitivity, freshness) and I’ll tailor the recommendation. Which direction should we take?

Human: What was the very specific code you were tasked with r

We see four messages (two interactions) because we used `k=4`.
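
The windowing behaviour can be sketched without LangChain. This is a minimal illustration, assuming the history is a plain list and that the window simply keeps the last `k` entries. `truncate_to_window` and the shortened sample messages are hypothetical, not part of the notebook's code.

```python
def truncate_to_window(messages: list[str], k: int) -> list[str]:
    """Return only the last k messages (hypothetical helper)."""
    if k <= 0:
        return []
    return messages[-k:]


# Three interactions = six messages, mirroring "from 6 to 4" in the logs.
history = [
    "Human: remember this code: PINECONE_RULEZ_01",
    "AI: Got it.",
    "Human: Which data source types could be used?",
    "AI: Many types of data can be used.",
    "Human: What is my aim again?",
    "AI: Your stated aim was to analyze the possibilities.",
]

window = truncate_to_window(history, k=4)
# The first interaction (with the code) is no longer inside the window.
code_still_visible = any("PINECONE_RULEZ_01" in m for m in window)
print(len(window), code_still_visible)
```

Once the interaction containing the code falls outside the last `k` messages, nothing in the prompt mentions it any more, which matches the model's answer above.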

On the plus side, we are shortening our conversation length when compared to buffer memory _without_ a window:

In [38]:
# Get window memory content
window_content = "\n".join([msg.content for msg in bufw_history])

print(
    f'Buffer memory conversation length: {len(tokenizer.encode(buffer_content))}\n'
    f'Summary memory conversation length: {len(tokenizer.encode(summary_content))}\n'
    f'Buffer window memory conversation length: {len(tokenizer.encode(window_content))}'
)

Buffer memory conversation length: 5953
Summary memory conversation length: 1182
Buffer window memory conversation length: 374
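
To put the three counts in proportion, the short sketch below expresses each memory type's size as a percentage of the full buffer. The numbers are copied from the output above; the variable names are illustrative only.

```python
# Token counts copied from the printed output above
token_counts = {
    "buffer": 5953,
    "summary": 1182,
    "buffer_window": 374,
}

baseline = token_counts["buffer"]
# Size of each memory type relative to the full buffer, in percent
relative = {
    name: round(100 * count / baseline, 1)
    for name, count in token_counts.items()
}

for name, pct in relative.items():
    print(f"{name}: {pct}% of the full buffer")
```

The window figure depends on `k` and on how long the retained messages happen to be, so treat it as a snapshot of this particular conversation rather than a general ratio.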


_Practical note: we use `k=4` here for illustrative purposes. In most real-world applications you would need a higher value for `k`._

### More memory types!

Since we already understand how memory works, we'll present a few more memory types more briefly. A short description of each should be enough to understand its underlying functionality.

#### Window + Summary Hybrid

The following is a modern LCEL-compatible alternative to `ConversationSummaryBufferMemory`.

**Key feature:** _the conversation summary buffer memory keeps a summary of the earliest pieces of conversation while retaining a raw recollection of the latest interactions._

This combines the benefits of both summary and buffer window memory. Let's implement it:

In [39]:
class ConversationSummaryBufferMessageHistory(BaseChatMessageHistory, BaseModel):
    messages: list[BaseMessage] = Field(default_factory=list)
    llm: ChatOpenAI = Field(default_factory=ChatOpenAI)
    k: int = Field(default_factory=int)

    def __init__(self, llm: ChatOpenAI, k: int):
        super().__init__(llm=llm, k=k)

    def add_messages(self, messages: list[BaseMessage]) -> None:
        """Add messages to the history, removing any messages beyond
        the last `k` messages and summarizing the messages that we drop.
        """
        existing_summary = None
        old_messages = None

        # See if we already have a summary message
        if len(self.messages) > 0 and isinstance(self.messages[0], SystemMessage):
            existing_summary = self.messages.pop(0)

        # Add the new messages to the history
        self.messages.extend(messages)

        # Check if we have too many messages
        if len(self.messages) > self.k:
            print(
                f">> Found {len(self.messages)} messages, dropping "
                f"oldest {len(self.messages) - self.k} messages.")
            # Pull out the oldest messages...
            old_messages = self.messages[:-self.k]
            # ...and keep only the most recent messages
            self.messages = self.messages[-self.k:]

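        if old_messages is None:
            print(">> No old messages to update summary with.")
            # Nothing was dropped, so restore any existing summary (popped
            # above) before returning; otherwise it would be lost
            if existing_summary is not None:
                self.messages = [existing_summary] + self.messages
            return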

        # Construct the summary chat messages
        summary_prompt = ChatPromptTemplate.from_messages([
            SystemMessagePromptTemplate.from_template(
                "Given the existing conversation summary and the new messages, "
                "generate a new summary of the conversation. Ensure to maintain "
                "as much relevant information as possible."
            ),
            HumanMessagePromptTemplate.from_template(
                "Existing conversation summary:\n{existing_summary}\n\n"
                "New messages:\n{old_messages}"
            )
        ])

        # Format the messages and invoke the LLM
        new_summary = self.llm.invoke(
            summary_prompt.format_messages(
                existing_summary=existing_summary,
                old_messages=old_messages
            )
        )

        # Prepend the new summary to the history
        self.messages = [SystemMessage(content=new_summary.content)] + self.messages

    def clear(self) -> None:
        """Clear the history."""
        self.messages = []
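
To see the trimming logic in isolation, here is a minimal sketch that mirrors `add_messages` using plain strings and a stand-in summarizer instead of an LLM call. `SketchSummaryBuffer` and `fake_summarize` are hypothetical names for illustration only. Unlike the class above, the sketch keeps the summary in its own attribute rather than as a `SystemMessage` at index 0, and it assumes each exchange is added as one human and one AI message.

```python
from typing import Callable, Optional


def fake_summarize(existing: Optional[str], dropped: list[str]) -> str:
    """Stand-in for the LLM call: records how many messages were folded in."""
    previous = existing or "(no summary yet)"
    return f"SUMMARY[{previous} + {len(dropped)} dropped]"


class SketchSummaryBuffer:
    def __init__(self, k: int, summarize: Callable[[Optional[str], list[str]], str]):
        self.k = k
        self.summarize = summarize
        self.summary: Optional[str] = None  # kept apart from the raw messages
        self.messages: list[str] = []

    def add_messages(self, new: list[str]) -> None:
        self.messages.extend(new)
        if len(self.messages) <= self.k:
            return  # still within the window, nothing to summarize
        # Split into the messages we drop and the k most recent ones we keep
        dropped = self.messages[:-self.k]
        self.messages = self.messages[-self.k:]
        self.summary = self.summarize(self.summary, dropped)


memory = SketchSummaryBuffer(k=16, summarize=fake_summarize)
for turn in range(1, 10):  # nine exchanges, two messages each
    memory.add_messages([f"human {turn}", f"ai {turn}"])

print(memory.summary)
print(len(memory.messages), memory.messages[0])
```

With `k=16`, the first eight exchanges fit inside the window; only the ninth pushes the total to 18 messages, at which point the two oldest are folded into the summary.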

In [40]:
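# Note: this reuses (and overwrites) the `window_chat_map` and
# `get_window_chat_history` names from the buffer window example above.
window_chat_map = {}

def get_window_chat_history(
    session_id: str, llm: ChatOpenAI, k: int
) -> ConversationSummaryBufferMessageHistory:
    # Create the history once per session; later calls return the cached object
    if session_id not in window_chat_map:
        window_chat_map[session_id] = ConversationSummaryBufferMessageHistory(llm=llm, k=k)
    return window_chat_map[session_id]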
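
One consequence of caching histories by `session_id` is that `llm` and `k` only take effect when a session is first created. The sketch below illustrates this with a hypothetical `DummyHistory` class standing in for `ConversationSummaryBufferMessageHistory`; `chat_map` and `get_history` are likewise illustrative names.

```python
class DummyHistory:
    """Hypothetical stand-in that only records the window size."""

    def __init__(self, k: int):
        self.k = k


chat_map: dict[str, DummyHistory] = {}


def get_history(session_id: str, k: int) -> DummyHistory:
    # Same create-once-then-reuse pattern as the factory above
    if session_id not in chat_map:
        chat_map[session_id] = DummyHistory(k=k)
    return chat_map[session_id]


first = get_history("demo", k=16)
second = get_history("demo", k=4)    # k is ignored: the session already exists
other = get_history("another", k=4)  # a new session picks up the new k

print(first is second, second.k, other.k)
```

If you want a different window size for an existing conversation, you would need a new `session_id` or to remove the old entry from the map first.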

In [41]:
pipeline_with_history_and_summary = RunnableWithMessageHistory(
    pipeline,
    get_session_history = get_window_chat_history,
    input_messages_key = "query",
    history_messages_key = "history",
    history_factory_config = [
        ConfigurableFieldSpec(
            id = "session_id",
            annotation = str,
            name = "Session ID",
            description = "The session ID used for the chat history",
            default = "id_default"
        ),
        ConfigurableFieldSpec(
            id = "llm",
            annotation = ChatOpenAI,
            name = "LLM",
            description = "The LLM to use for the conversation summary",
            default = llm
        ),
        ConfigurableFieldSpec(
            id = "k",
            annotation = int,
            name = "k",
            description = "The number of messages to keep in the history",
            default = 4
        )
    ]
)

In [42]:
result = count_tokens(
    pipeline_with_history_and_summary,
    {"query": "Good morning dude. What's popping today?"},
    # config = {"configurable": {"session_id": "example_with_summary_and_window_history", "llm": llm, "k": 16}}
    config = {"session_id": "example_with_summary_and_window_history", "llm": llm, "k": 16}
)
print("Response:\n", result)

>> No old messages to update summary with.
Spent a total of 604 tokens
Response:
 Good morning! Not much — just here, fully caffeinated and ready to help. What’s popping with you?

If you want, I can:
- Give a quick plan for your morning (priorities, a 10-min workout, and a coffee pick)
- Brainstorm ideas (work project, date night, side hustle)
- Draft or polish messages, emails, or social posts
- Generate a 3-song playlist to start the day
- Suggest a quick recipe or meal plan
- Help with code, study plans, or summaries (paste text)
- Tell a joke or do a short creative piece

Which one sounds good, or tell me what you’re up to and I’ll jump in.


In [43]:
result = count_tokens(
    pipeline_with_history_and_summary,
    {"query": "Explain what calesthenics is and how it can be used to improve physical fitness. My name is Sagun and I am 25 years old."},
    # config = {"configurable": {"session_id": "example_with_summary_and_window_history", "llm": llm, "k": 16}}
    config = {"session_id": "example_with_summary_and_window_history", "llm": llm, "k": 16}
)
print("Response:\n", result)

>> No old messages to update summary with.
Spent a total of 1893 tokens
Response:
 Morning Sagun — nice to meet you. You wrote “calesthenics” but the common spelling is calisthenics. Here’s a clear, practical rundown of what calisthenics is and how you can use it to improve your fitness at 25.

What calisthenics is
- Bodyweight training that uses your own weight (and sometimes simple tools like a pull-up bar or resistance bands) to build strength, endurance, mobility, and coordination.
- Focuses on natural movement patterns: pushing, pulling, hinging/squatting, lunging and core/bracing.
- Ranges from basic (push‑ups, squats, planks) to advanced skills (muscle‑ups, planche, pistol squats).

Benefits
- Strength and muscular endurance (compound, functional strength)
- Better body control, balance and coordination
- Improved mobility and core stability
- Low cost and highly scalable — you can train anywhere
- Good for fat loss when combined with diet and conditioning work
- Lower joint str

In [44]:
result = count_tokens(
    pipeline_with_history_and_summary,
    {"query": "Okay focus on calestinics for muscle hypertrophy and stength training."},
    # config = {"configurable": {"session_id": "example_with_summary_and_window_history", "llm": llm, "k": 16}}
    config = {"session_id": "example_with_summary_and_window_history", "llm": llm, "k": 16}
)
print("Response:\n", result)

>> No old messages to update summary with.
Spent a total of 3917 tokens
Response:
 Nice — focusing on calisthenics for muscle hypertrophy + strength is a great combo. Below is a practical, actionable guide for you, Sagun, including principles, exercise choices, progressions, and a sample 4-week program you can start with. If you want it tailored to your exact level or schedule, tell me how many days/week you can train and your current ability (e.g., strict pull-ups, push-up count, pistol squats).

Key principles
- Specificity: train heavy, low-rep work for strength and moderate-rep work for hypertrophy. Do both in the same week.
- Progressive overload: increase difficulty by adding reps, better variations, slower tempo, pauses, or adding external load (vest/chain/dumbbell).
- Frequency: hit each major muscle group 2–3x/week for optimal hypertrophy and strength.
- Volume: aim for ~10–20+ effective sets per muscle group per week for hypertrophy (start lower and ramp up).
- Intensity & re

In [45]:
result = count_tokens(
    pipeline_with_history_and_summary,
    {"query": "Okay breaksown the chemical processes involved in muscle hypertrophy and strength training."},
    # config = {"configurable": {"session_id": "example_with_summary_and_window_history", "llm": llm, "k": 16}}
    config = {"session_id": "example_with_summary_and_window_history", "llm": llm, "k": 16}
)
print("Response:\n", result)

>> No old messages to update summary with.
Spent a total of 6039 tokens
Response:
 Nice — Sagun. Below is a focused breakdown of the main chemical and molecular processes that drive muscle hypertrophy and strength gains. I’ll explain what happens at the cellular/molecular level during and after resistance exercise, the key signaling pathways, the role of hormones and nutrients, and how catabolic signals and neural factors fit in. I’ll finish with practical implications you can use for training/nutrition.

1) The three mechanical triggers
- Mechanical tension: heavy load/stretch on muscle fibers creates structural strain that starts mechanotransduction signaling (the dominant stimulus for hypertrophy).
- Muscle damage: microtears and disruption of contractile proteins trigger inflammation and repair/remodeling.
- Metabolic stress: accumulation of metabolites (lactate, H+, inorganic phosphate, ADP) from intense work promotes anabolic signaling and cell swelling.

2) Mechanotransduction —

In [46]:
result = count_tokens(
    pipeline_with_history_and_summary,
    {"query": "This is great. I've recently been struggling with 'love handles' or excess fat around the sides and front of my abdomen and also relatively skinny arms. Can you tailor a stength + hypertrophy based workout focused this area?"},
    # config = {"configurable": {"session_id": "example_with_summary_and_window_history", "llm": llm, "k": 16}}
    config = {"session_id": "example_with_summary_and_window_history", "llm": llm, "k": 16}
)
print("Response:\n", result)

>> No old messages to update summary with.
Spent a total of 8252 tokens
Response:
 Nice — good goals, Sagun. I’ll give you a practical, science-backed plan: a training template you can use to build arm muscle (biceps/triceps/forearms) while attacking overall fat (including love handles) through full‑body resistance work + conditioning and nutrition. Quick reality checks first, then a detailed 8‑week-style program you can start right away.

Short realities
- You can’t “spot reduce” fat from love handles. Fat loss requires an overall calorie deficit and full‑body work. Local core/oblique training will improve strength and shape the area as fat comes off, but won’t directly burn only that fat.
- You can build arm size while losing fat, but progress is slower than doing a dedicated bulk. Best approach for both: high protein, progressive resistance training, and a modest calorie deficit (or body recomposition if you’re relatively new to progressive overload).

Big-picture strategy
- Train f

In [49]:
result = count_tokens(
    pipeline_with_history_and_summary,
    {"query": "Earlier when you were talking about calisthenics you mentioned muscle groups. Can you go into a little bit of detail about that."},
    # config = {"configurable": {"session_id": "example_with_summary_and_window_history", "llm": llm, "k": 16}}
    config = {"session_id": "example_with_summary_and_window_history", "llm": llm, "k": 16}
)
print("Response:\n", result)
# print(window_chat_map["example_with_summary_and_window_history"])

>> No old messages to update summary with.
Spent a total of 9796 tokens
Response:
 Nice — good follow-up, Sagun. Below is a practical breakdown of the main muscle groups you’ll be training with calisthenics, what each group does, key calisthenics exercises that hit them, useful progression options, coaching cues, and programming notes (rep ranges, volume, common errors, and mobility needs). Use this as a reference when you pick exercises and build sessions.

Overview — movement patterns (quick)
- Push (horizontal & vertical) — chest, triceps, anterior deltoids
- Pull (horizontal & vertical) — lats, rhomboids, biceps, posterior deltoids
- Hinge/single-leg — hamstrings, glutes
- Squat — quads, glutes, adductors
- Core/anti-rotation & anti-extension — rectus abdominis, obliques, transverse abdominis, erector spinae
- Shoulder stabilizers/scapular control — serratus anterior, traps, rotator cuff

1) Chest (pectoralis major & minor)
- Primary function: horizontal adduction of the arm (bring

In [50]:
result = count_tokens(
    pipeline_with_history_and_summary,
    {"query": "It's still a bit hard to visualize. Can you make a diagram that shows the different muscle groups?"},
    # config = {"configurable": {"session_id": "example_with_summary_and_window_history", "llm": llm, "k": 16}}
    config = {"session_id": "example_with_summary_and_window_history", "llm": llm, "k": 16}
)
print("Response:\n", result)

>> No old messages to update summary with.
Spent a total of 10928 tokens
Response:
 Good—I’ll make this visual and practical. Below are two simple diagrams (front and back) you can scan quickly, followed by a legend that names each numbered muscle group and lists the best calisthenics exercises that target it. If you want a cleaner graphic file (SVG/PNG) I can generate an SVG you can download and open in a browser — tell me and I’ll create it.

Front view (numbered)
      [H]
       O
      /|\
  [2]-/ | \-[3]
     /  |  \
    [1] | [4]
     |  |  |
    [10]| [11]
     |  |  |
    [12] [13]
     |      |
    / \    / \
   /   \  /   \
  [14] [15] [16]

Back view (numbered)
       O
      /|\
  [2]-/ | \-[3]
     /  |  \
    [5] | [6]
     |  |  |
   [9]  |  [8]
     |  |  |
    [12] [13]
     |      |
    / \    / \
   /   \  /   \
  [14] [15] [16]

Legend (numbers → muscle group + top calisthenics exercises)
- [1] Pectoralis major (chest)
  - Push-ups (incline → standard → decline), r

In [51]:
result = count_tokens(
    pipeline_with_history_and_summary,
    {"query": "Yes make an SVG."},
    # config = {"configurable": {"session_id": "example_with_summary_and_window_history", "llm": llm, "k": 16}}
    config = {"session_id": "example_with_summary_and_window_history", "llm": llm, "k": 16}
)
print("Response:\n", result)

>> No old messages to update summary with.
Spent a total of 13790 tokens
Response:
 Here’s an SVG you can open in a browser or save as a .svg file. It shows a simplified front (left) and back (right) figure with numbered markers and a legend mapping each number to the muscle group plus 1–2 example calisthenics exercises.

To use:
- Copy everything below into a text file and save it as muscles.svg
- Open with a browser (Chrome/Firefox) or an image viewer that supports SVG

<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" width="1100" height="1400" viewBox="0 0 1100 1400">
  <style>
    .title { font: bold 28px sans-serif; fill: #111; }
    .panel { font: 18px sans-serif; fill: #222; }
    .legend-title { font: bold 16px sans-serif; fill: #111; }
    .legend { font: 14px sans-serif; fill: #111; }
    .num { font: bold 12px sans-serif; fill: white; text-anchor: middle; dominant-baseline: central; }
    .marker { fill: #d9534f; stroke: #b73736; stroke-width: 2

In [52]:
result = count_tokens(
    pipeline_with_history_and_summary,
    {"query": "I copied the svg html. Make a script in Python that can open an svg with the earlier given html. I should be able to run the python script and a window should pop up where I can view the svg html."},
    config = {"session_id": "example_with_summary_and_window_history", "llm": llm, "k": 16}
)
print("Response:\n ", result)

>> Found 18 messages, dropping oldest 2 messages.
Spent a total of 18447 tokens
Response:
  Below is a ready-to-run Python script that opens the SVG you pasted earlier. It tries to use pywebview (a lightweight native web-view wrapper) so a simple window pops up with the SVG rendered. If pywebview is not available it falls back to opening the SVG in your default web browser.

1) Install pywebview (optional, recommended for an in-app window):
   pip install pywebview

2) Save the script below as view_svg.py and run:
   python view_svg.py

Script (save exactly, run with Python 3.7+):


```python
#!/usr/bin/env python3
"""
view_svg.py

Open an embedded SVG in a window. Tries to use pywebview for an in-app window.
Falls back to writing a temporary .svg file and opening it in the default browser.

Usage:
    python view_svg.py
"""

import os
import sys
import tempfile
import webbrowser

# Paste your SVG content here as a single triple-quoted string.
# I used the SVG content you copied earlie
```

In [54]:
import os
import sys
import tempfile
import webbrowser

# Paste your SVG content here as a single triple-quoted string.
# I used the SVG content you copied earlier.
SVG_CONTENT = """<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" width="1100" height="1400" viewBox="0 0 1100 1400">
  <style>
    .title { font: bold 28px sans-serif; fill: #111; }
    .panel { font: 18px sans-serif; fill: #222; }
    .legend-title { font: bold 16px sans-serif; fill: #111; }
    .legend { font: 14px sans-serif; fill: #111; }
    .num { font: bold 12px sans-serif; fill: white; text-anchor: middle; dominant-baseline: central; }
    .marker { fill: #d9534f; stroke: #b73736; stroke-width: 2; }
    .figure-stroke { stroke: #333; stroke-width: 4; fill: none; stroke-linecap: round; stroke-linejoin: round; }
    .figure-fill { fill: #f7f7f7; stroke: #555; stroke-width: 2; }
    .box { fill: #fff; stroke: #ccc; stroke-width: 1; }
  </style>

  <rect x="10" y="10" width="1080" height="1380" class="box" rx="10" ry="10"/>

  <text x="50" y="50" class="title">Calisthenics Muscle Map — Front & Back (numbered)</text>
  <text x="50" y="80" class="panel">Front view (left) and back view (right). Legend below maps numbers to muscle groups and example calisthenics exercises.</text>

  <!-- LEFT: Front figure -->
  <g transform="translate(50,120)">
    <text x="140" y="-10" class="panel">Front</text>
    <!-- head -->
    <circle cx="140" cy="60" r="30" class="figure-fill"/>
    <circle cx="140" cy="60" r="30" class="figure-stroke"/>
    <!-- torso -->
    <rect x="100" y="90" width="80" height="160" rx="30" ry="30" class="figure-fill"/>
    <rect x="100" y="90" width="80" height="160" rx="30" ry="30" class="figure-stroke"/>
    <!-- arms -->
    <path d="M100 110 L40 180" class="figure-stroke"/>
    <path d="M180 110 L240 180" class="figure-stroke"/>
    <!-- legs -->
    <path d="M120 250 L100 360" class="figure-stroke"/>
    <path d="M160 250 L180 360" class="figure-stroke"/>

    <!-- Markers front -->
    <!-- 1 Pectoralis -->
    <circle cx="140" cy="140" r="18" class="marker"/>
    <text x="140" y="140" class="num">1</text>

    <!-- 2 Anterior/medial deltoids (shoulder) -->
    <circle cx="92" cy="110" r="14" class="marker"/>
    <text x="92" y="110" class="num">2</text>

    <!-- 3 Lateral/posterior deltoid (front-right shoulder area for labeling consistency) -->
    <circle cx="188" cy="110" r="14" class="marker"/>
    <text x="188" y="110" class="num">3</text>

    <!-- 4 Biceps (front upper arm left) -->
    <circle cx="72" cy="160" r="12" class="marker"/>
    <text x="72" y="160" class="num">4</text>

    <!-- 10 Serratus (side of ribcage) -->
    <circle cx="112" cy="170" r="12" class="marker"/>
    <text x="112" y="170" class="num">10</text>

    <!-- 11 Obliques (side abs) -->
    <circle cx="170" cy="170" r="12" class="marker"/>
    <text x="170" y="170" class="num">11</text>

    <!-- 12 Rectus abdominis (front abs) -->
    <circle cx="140" cy="200" r="14" class="marker"/>
    <text x="140" y="200" class="num">12</text>

    <!-- 13 Transverse abdominis / deep core (lower front) -->
    <circle cx="140" cy="230" r="12" class="marker"/>
    <text x="140" y="230" class="num">13</text>

    <!-- 14 Glutes (left leg top) -->
    <circle cx="120" cy="300" r="12" class="marker"/>
    <text x="120" y="300" class="num">14</text>

    <!-- 15 Quadriceps (right leg) -->
    <circle cx="160" cy="300" r="12" class="marker"/>
    <text x="160" y="300" class="num">15</text>

    <!-- 16 Hamstrings/calves (right lower leg) -->
    <circle cx="175" cy="350" r="12" class="marker"/>
    <text x="175" y="350" class="num">16</text>
  </g>

  <!-- RIGHT: Back figure -->
  <g transform="translate(350,120)">
    <text x="140" y="-10" class="panel">Back</text>
    <!-- head -->
    <circle cx="140" cy="60" r="30" class="figure-fill"/>
    <circle cx="140" cy="60" r="30" class="figure-stroke"/>
    <!-- torso -->
    <rect x="100" y="90" width="80" height="160" rx="30" ry="30" class="figure-fill"/>
    <rect x="100" y="90" width="80" height="160" rx="30" ry="30" class="figure-stroke"/>
    <!-- arms -->
    <path d="M100 110 L40 180" class="figure-stroke"/>
    <path d="M180 110 L240 180" class="figure-stroke"/>
    <!-- legs -->
    <path d="M120 250 L100 360" class="figure-stroke"/>
    <path d="M160 250 L180 360" class="figure-stroke"/>

    <!-- Markers back -->
    <!-- 5 Trapezius & upper back -->
    <circle cx="140" cy="100" r="16" class="marker"/>
    <text x="140" y="100" class="num">5</text>

    <!-- 6 Latissimus dorsi -->
    <circle cx="188" cy="150" r="16" class="marker"/>
    <text x="188" y="150" class="num">6</text>

    <!-- 8 Rhomboids / mid-traps -->
    <circle cx="105" cy="150" r="12" class="marker"/>
    <text x="105" y="150" class="num">8</text>

    <!-- 9 Erector spinae (lower back) -->
    <circle cx="140" cy="200" r="14" class="marker"/>
    <text x="140" y="200" class="num">9</text>

    <!-- 13 Transverse abdominis / deep core (back side reference) -->
    <circle cx="140" cy="230" r="12" class="marker"/>
    <text x="140" y="230" class="num">13</text>

    <!-- 12 Rectus abdominis (lower front/back reference near pelvis) -->
    <circle cx="140" cy="260" r="12" class="marker"/>
    <text x="140" y="260" class="num">12</text>

    <!-- 14 Glutes -->
    <circle cx="120" cy="300" r="12" class="marker"/>
    <text x="120" y="300" class="num">14</text>

    <!-- 15 Quadriceps (front leg reference on back view) -->
    <circle cx="160" cy="300" r="12" class="marker"/>
    <text x="160" y="300" class="num">15</text>

    <!-- 16 Hamstrings/calves -->
    <circle cx="175" cy="350" r="12" class="marker"/>
    <text x="175" y="350" class="num">16</text>
  </g>

  <!-- Legend box -->
  <g transform="translate(50,520)">
    <rect x="0" y="0" width="1000" height="820" rx="8" ry="8" fill="#ffffff" stroke="#ddd" />
    <text x="18" y="30" class="legend-title">Legend — number : muscle group — example calisthenics exercises</text>

    <text x="18" y="60" class="legend">1 — Pectoralis major (chest): push-ups (incline → standard → decline), ring/parallel dips</text>
    <text x="18" y="90" class="legend">2 — Anterior / medial deltoids (front/mid shoulder): pike push-ups, handstand push-up progressions</text>
    <text x="18" y="120" class="legend">3 — Lateral / posterior deltoids (side/rear shoulder): band face pulls, rear-delt rows, Y/T/W</text>
    <text x="18" y="150" class="legend">4 — Biceps: chin-ups (supinated), ring curls, slow negatives</text>
    <text x="18" y="180" class="legend">5 — Trapezius & upper back: scapular pull-ups, shrugs (loaded carries), face pulls</text>
    <text x="18" y="210" class="legend">6 — Latissimus dorsi (lats): pull-ups, wide grip pull-ups, weighted pull-ups, muscle-up work</text>
    <text x="18" y="240" class="legend">8 — Rhomboids / mid-traps: inverted rows, ring rows, horizontal pulling emphasizing scapular retraction</text>
    <text x="18" y="270" class="legend">9 — Erector spinae (lower back): single-leg RDL, back extensions, supermans</text>

    <text x="18" y="310" class="legend">10 — Serratus anterior (side ribcage): push-up plus, scapular protraction on rings</text>
    <text x="18" y="340" class="legend">11 — Obliques (side abs): side plank dips, hanging oblique raises, windshield wipers</text>
    <text x="18" y="370" class="legend">12 — Rectus abdominis (front abs): hollow body holds, planks, hanging leg raises</text>
    <text x="18" y="400" class="legend">13 — Transverse abdominis / deep core: Pallof press, bracing drills, dead-bug</text>

    <text x="18" y="440" class="legend">14 — Gluteus maximus (glutes): single-leg hip thrusts, glute bridges, deep squats</text>
    <text x="18" y="470" class="legend">15 — Quadriceps (quads): squats, Bulgarian split squats, pistol progressions</text>
    <text x="18" y="500" class="legend">16 — Hamstrings & calves: Nordic hamstring curls, single-leg RDL, calf raises</text>

    <text x="18" y="540" class="legend-title">Programming notes</text>
    <text x="18" y="570" class="legend">- Pair push and pull volume to avoid imbalance.</text>
    <text x="18" y="595" class="legend">- Train major groups 2–3×/week for hypertrophy. Mix heavy (3–6 reps) and moderate (6–12) sets.</text>
    <text x="18" y="620" class="legend">- Core work: mix anti-extension (plank/hollow), anti-rotation (Pallof/side-plank) and dynamic hanging leg raises.</text>
    <text x="18" y="645" class="legend">- Include scapular/shoulder prehab (band face pulls, scapular pull-ups) for healthy pressing and pulling.</text>

    <text x="18" y="700" class="legend">Tip: Save or print this SVG and use it while choosing exercises—tap the numbered targets to find suitable progressions.</text>
  </g>

  <text x="50" y="1360" class="panel">If you want a different style (more anatomical accuracy, color-coded muscles, or a printable PDF), tell me which format and I’ll generate it.</text>
</svg>
"""

def open_with_pywebview(html_content: str):
    try:
        import webview  # pywebview
    except Exception as e:
        raise RuntimeError("pywebview is not available") from e

    # Create a simple HTML wrapper so the browser knows it's SVG content and scales nicely
    html_wrapper = f"""<!doctype html>
<html>
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>Calisthenics Muscle Map</title>
    <style>html,body{{height:100%;margin:0;background:#f3f4f6}} svg{{display:block;margin:auto;max-width:100%;height:auto}}</style>
  </head>
  <body>
    {html_content}
  </body>
</html>"""

    # Create and show window
    webview.create_window("Calisthenics Muscle Map", html=html_wrapper, width=1000, height=800)
    webview.start()

def open_in_browser(svg_text: str):
    # Write SVG to temporary file and open it in default browser
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".svg", prefix="calisthenics_")
    try:
        tmp.write(svg_text.encode("utf-8"))
        tmp.flush()
        tmp.close()
        path = os.path.abspath(tmp.name)
        url = "file://" + path
        print("Opening SVG in default web browser:", url)
        webbrowser.open(url)
    except Exception:
        # Cleanup on error
        try:
            os.unlink(tmp.name)
        except Exception:
            pass
        raise

def main():
    # Try pywebview first for an embedded window
    try:
        open_with_pywebview(SVG_CONTENT)
    except Exception as e:
        print("pywebview not available or failed to start:", e)
        print("Falling back to opening the SVG in your default web browser.")
        open_in_browser(SVG_CONTENT)

if __name__ == "__main__":
    main()

pywebview not available or failed to start: pywebview is not available
Falling back to opening the SVG in your default web browser.
Opening SVG in default web browser: file://C:\Users\rishi\AppData\Local\Temp\calisthenics_9f08d99j.svg


In [55]:
result = count_tokens(
    pipeline_with_history_and_summary,
    {"query": "What is my age to the power of the number of letters in my name? Also when my browser pulls up the temporary svg file I'm getting this error. How to fix: "
    "This page contains the following errors:"
    "error on line 17 at column 70: xmlParseEntityRef: no name"},
    # config = {"configurable": {"session_id": "example_with_summary_and_window_history", "llm": llm, "k": 16}}
    config = {"session_id": "example_with_summary_and_window_history", "llm": llm, "k": 16}
)
print("Response:\n", result)

>> Found 18 messages, dropping oldest 2 messages.
Spent a total of 21591 tokens
Response:
 Two quick answers:

1) The math
- Your age = 25, your name "Sagun" has 5 letters.  
- 25^5 = 9,765,625.

2) The SVG XML error and how to fix it
- The browser error "xmlParseEntityRef: no name" almost always means there is a raw ampersand (&) somewhere inside text in the SVG. In XML (and SVG) you must escape ampersands as &amp; (and also escape < and > if they appear in text nodes).
- In the SVG you pasted there is an unescaped "&" (for example in the title "Front & Back"). That produces the parsing error.

Two simple fixes:
A) Manual: open the .svg file in a text editor and replace any "&" used in text with "&amp;" (e.g., change "Front & Back" → "Front &amp; Back").

B) Safer automated fix in Python: sanitize ampersands that are not already part of an entity before writing the temp file. Below is an updated version of the viewing script that performs that sanitization and then opens the SVG in yo

In [58]:
print(window_chat_map["example_with_summary_and_window_history"])

System: Conversation so far:

- User opened casually (“Good morning dude. What's popping today?”). Assistant replied upbeat, offered several options (short morning plan, brainstorming, drafting/polishing messages, 3‑song playlist, quick recipe/meal plan, help with code/study/summaries, joke/creative piece) and asked which option the user wants or to describe what they’re up to.

- User then asked: “Explain what calesthenics is and how it can be used to improve physical fitness. My name is Sagun and I am 25 years old.”

- Assistant replied (addressing Sagun, and noting the spelling is usually “calisthenics”) with a detailed, practical overview covering:
  - Definition and focus (bodyweight training, movement patterns)
  - Benefits (strength, coordination, mobility, low cost, scalability)
  - Key training principles (progressive overload, specificity, consistency, technique, recovery)
  - Core exercise categories and examples (push, pull, legs, core, full‑body/skill)
  - How to structure

In the chat history above, you can see how the generated `SystemMessage` captures the gist of the earliest parts of the conversation, while the 16 most recent messages are kept verbatim. This lets the model answer with persistent memory while keeping the total number of tokens low.
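The summary-plus-window behaviour described above can be sketched without any LangChain dependency. This is a minimal illustration, not the class used in this notebook: `SummaryWindowHistory`, `fake_summarize`, and the `(role, content)` tuple format are hypothetical names, and the summarizer is a stand-in for the LLM call that would normally produce the summary.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content); an assumed, simplified message format


@dataclass
class SummaryWindowHistory:
    """Hypothetical stand-in for a summary-plus-window chat history."""
    summarize: Callable[[str, List[Message]], str]  # stand-in for an LLM call
    k: int = 16
    summary: str = ""
    messages: List[Message] = field(default_factory=list)

    def add_messages(self, new_messages: List[Message]) -> None:
        self.messages.extend(new_messages)
        overflow = len(self.messages) - self.k
        if overflow > 0:
            # Fold the oldest messages into the running summary,
            # keep only the k most recent messages verbatim.
            dropped = self.messages[:overflow]
            self.messages = self.messages[overflow:]
            self.summary = self.summarize(self.summary, dropped)

    def as_prompt_messages(self) -> List[Message]:
        # The summary, if any, is sent first as a single system message.
        head = [("system", self.summary)] if self.summary else []
        return head + self.messages


def fake_summarize(previous: str, dropped: List[Message]) -> str:
    # Placeholder summarizer: only counts how many messages were folded in.
    count = int(previous.split()[0]) if previous else 0
    return f"{count + len(dropped)} earlier messages summarized"


history = SummaryWindowHistory(summarize=fake_summarize, k=4)
for turn in range(3):
    history.add_messages([("human", f"question {turn}"), ("ai", f"answer {turn}")])

print(history.as_prompt_messages())
```

With `k=4` and three human/AI turns, the first turn is folded into the summary and the last two turns stay verbatim, which mirrors the "Found 18 messages, dropping oldest 2 messages." behaviour seen with `k=16` earlier.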

In [59]:
# Get window-and-summary memory content
window_and_summary_content = "\n".join([msg.content for msg in window_chat_map["example_with_summary_and_window_history"].messages])

print(
    f'Buffer memory conversation length: {len(tokenizer.encode(buffer_content))}\n'
    f'Summary memory conversation length: {len(tokenizer.encode(summary_content))}\n'
    f'Buffer window memory conversation length: {len(tokenizer.encode(window_content))}\n'
    f'Window and Summary memory conversation length: {len(tokenizer.encode(window_and_summary_content))}'
)

Buffer memory conversation length: 5953
Summary memory conversation length: 1182
Buffer window memory conversation length: 374
Window and Summary memory conversation length: 17219


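The window-and-summary figure looks large here only because that session ran for more turns, and with longer and more complex questions, than the other three. The numbers are therefore not a like-for-like comparison. By design, window-and-summary memory keeps only a summary plus the `k` most recent messages, whereas buffer memory keeps every message, so over a long conversation the size of buffer memory will far exceed that of window-and-summary memory.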
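To make the growth argument concrete, here is a rough back-of-the-envelope sketch. It rests on assumptions that are not measured in this notebook: every message costs a fixed `tokens_per_message`, each turn adds two messages, and the summary stays under a hypothetical `summary_cap`. Real conversations vary in message length, and a summary only stays bounded if the summarization prompt keeps it short.

```python
def buffer_tokens(turns: int, tokens_per_message: int = 50) -> int:
    # Buffer memory keeps every message, so it grows linearly with turns.
    return 2 * turns * tokens_per_message


def window_tokens(turns: int, k: int = 16, tokens_per_message: int = 50) -> int:
    # Window memory keeps at most the k most recent messages.
    return min(2 * turns, k) * tokens_per_message


def window_and_summary_tokens(
    turns: int,
    k: int = 16,
    tokens_per_message: int = 50,
    summary_cap: int = 300,  # assumed upper bound on the summary size
) -> int:
    # The k most recent messages, plus a summary once anything has been dropped.
    kept = min(2 * turns, k) * tokens_per_message
    summary = summary_cap if 2 * turns > k else 0
    return kept + summary


for turns in (5, 20, 100):
    print(
        turns,
        buffer_tokens(turns),
        window_tokens(turns),
        window_and_summary_tokens(turns),
    )
```

Under these assumptions all three strategies cost the same while the conversation fits inside the window. Past that point only the buffer keeps growing.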

## What else can we do with memory?

There are several cool things we can do with memory in LangChain:
* Implement our own custom memory modules (as we've done above)
* Use multiple memory modules in the same chain
* Combine agents with memory and other tools
* Integrate knowledge graphs

