# What is LangChain?

**LangChain** is a framework that helps you build AI applications using Large Language Models (LLMs) in a fast, stable and structured way.

*In short*: LangChain = A toolkit that helps you connect LLMs with data + logic + external tools to create a complete AI App.

# When to use LangChain

Large Language Models (LLMs), despite their impressive capabilities, fundamentally behave as **“text → text” functions**. By themselves, they cannot:
- access external data,
- store long-term memory,
- execute multi-step workflows,
- call tools or APIs,
- maintain state, or
- carry out reliable, complex logic.

Engineering teams quickly discovered that pure LLM usage is enough for demos, but **insufficient for real products** such as enterprise chatbots, document-grounded Q&A systems, data-analysis assistants, or automated workflows.

This gap is precisely where **LangChain** emerged: a framework designed to extend LLMs into practical, production-grade systems.

### Connecting LLMs to External Data

LangChain acts as a middleware layer between the model and the real world. It allows LLMs to read:
- documents
- databases
- APIs
- vector stores
- computational tools

This capability is essential for *RAG (Retrieval-Augmented Generation)*, where applications must provide accurate, up-to-date, contextual information instead of relying on the model’s internal guesses.
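
To make the retrieval step concrete, here is a minimal plain-Python sketch of the RAG idea: pick the stored passage most relevant to a question and place it in the prompt. The word-overlap scoring, the `DOCS` list, and the `retrieve` and `build_prompt` helpers are assumptions made for illustration. A real application would typically use embeddings and a vector store, and none of this is LangChain's API.

```python
import re

# Hypothetical in-memory "knowledge base"; a real system would use a vector store.
DOCS = [
    "The refund window is 30 days from the date of purchase.",
    "Support is available by email from Monday to Friday.",
    "Shipping to Europe usually takes five business days.",
]


def tokenize(text):
    """Lowercase a string and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, docs, k=1):
    """Return the k documents sharing the most words with the question."""
    q_words = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(q_words & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, docs):
    """Place the retrieved context ahead of the question."""
    context = "\n".join(retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


prompt = build_prompt("How many days is the refund window?", DOCS)
print(prompt)
```

The model only ever sees the final `prompt` string, which is why the quality of the retrieval step largely determines the quality of the answer.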

### Multi-Step Workflows

LLMs do not inherently understand step-by-step procedures or stateful tasks. LangChain provides:
- Chains for simple multi-step logic
- LangGraph for complex workflows such as:
    - branching
    - looping
    - retrying
    - validation
    - deterministic state machines

These are the building blocks of AI pipelines, document processing systems, and multi-stage reasoning assistants.
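
As a rough illustration of retrying and validation in a workflow, the sketch below wraps a step in a loop that re-runs it until its output passes a check. It is plain Python, not LangGraph's API, and `run_with_retry` and `flaky_step` are hypothetical names chosen for this example.

```python
def run_with_retry(step, validate, max_attempts=3):
    """Call `step` until `validate` accepts its output or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        result = step(attempt)
        if validate(result):
            return {"status": "ok", "attempts": attempt, "result": result}
    return {"status": "failed", "attempts": max_attempts, "result": None}


# Stand-in for an LLM call: it only produces a valid answer on the second try.
def flaky_step(attempt):
    return "42" if attempt >= 2 else "not a number"


outcome = run_with_retry(flaky_step, validate=str.isdigit)
print(outcome)
```

Keeping the loop and the validation outside the model call is what makes the workflow predictable even when an individual generation is not.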

### Agents and Tool-Use Capabilities

LLMs cannot decide on their own: when to use a tool, which tool to select, or how to integrate tool outputs.

LangChain’s agents introduce this capability. They help the model interact with multiple types of tools, or even operate in a working environment that simulates a user's computer.

Whenever you want an AI assistant that can act, not just respond, agents become essential.
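
The sketch below shows the basic shape of tool use under simplifying assumptions: tools are ordinary functions registered by name, and a stand-in `fake_model` function plays the role of the LLM that picks a tool and its arguments. All names here are hypothetical, and no real model or LangChain agent is involved.

```python
# Hypothetical tools: ordinary Python functions registered by name.
def add(a, b):
    return a + b


def word_count(text):
    return len(text.split())


TOOLS = {"add": add, "word_count": word_count}


def fake_model(query):
    """Stand-in for an LLM that decides which tool to call and with what input."""
    if "+" in query:
        left, right = query.split("+")
        return {"tool": "add", "args": {"a": int(left), "b": int(right)}}
    return {"tool": "word_count", "args": {"text": query}}


def run_agent_step(query):
    """Look up the requested tool, run it, and return the observation."""
    decision = fake_model(query)
    tool = TOOLS[decision["tool"]]
    return tool(**decision["args"])


print(run_agent_step("2 + 3"))
print(run_agent_step("agents can act"))
```

In a real agent the decision comes from the model's output, and the observation is usually passed back to the model for another round of reasoning.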

# Getting Started with LangChain

In [None]:
!uv pip install langchain-core langchain-openai langchain-community

In [8]:
!uv pip install scikit-image matplotlib

In this example, we introduce LangChain by building a simple LLM-powered assistant. We'll provide examples for both OpenAI's `gpt-4o-mini` *and* Meta's `llama3.2` via Ollama!

### Initializing OpenAI's gpt-4o-mini

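We start by initializing our LLM. The cells below send requests through OpenRouter's OpenAI-compatible endpoint, so they read an `OPENROUTER_API_KEY` and prompt for one if it is not set. Two model identifiers are defined, `openai/gpt-4o-mini` and `openai/gpt-oss-120b:free`, and the second is the one passed to `ChatOpenAI`. If you would rather call OpenAI's `gpt-4o-mini` directly, you can get an API key from [OpenAI's website](https://platform.openai.com/settings/organization/api-keys).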

In [None]:
import os
from getpass import getpass
from os import getenv


openai_model_auth = "openai/gpt-4o-mini"
openai_model = "openai/gpt-oss-120b:free"
os.environ["OPENROUTER_API_KEY"] = os.getenv("OPENROUTER_API_KEY") or getpass(
    "Enter OpenRouter API Key: "
)

In [None]:
from langchain_openai import ChatOpenAI

# For normal accurate responses
llm = ChatOpenAI(temperature=0.0, model=openai_model, api_key=getenv("OPENROUTER_API_KEY"), base_url="https://openrouter.ai/api/v1")

# For unique creative responses
creative_llm = ChatOpenAI(temperature=0.9, model=openai_model, api_key=getenv("OPENROUTER_API_KEY"), base_url="https://openrouter.ai/api/v1")

We will be taking an `article` _draft_ and using LangChain to generate various useful items around this article. We'll be creating:

1. An article title
2. An article description
3. Editor advice where we will insert an additional paragraph in the article
4. A thumbnail / hero image for our article.

Here we input our article to start with. Currently this is using an article from the Aurelio AI learning page.

In [52]:
article = """
\
We believe AI's short—to mid-term future belongs to agents and that the long-term future of *AGI* may evolve from agentic systems. Our definition of agents covers any neuro-symbolic system in which we merge neural AI (such as an LLM) with semi-traditional software.

With agents, we allow LLMs to integrate with code — allowing AI to search the web, perform math, and essentially integrate into anything we can build with code. It should be clear the scope of use cases is phenomenal where AI can integrate with the broader world of software.

In this introduction to AI agents, we will cover the essential concepts that make them what they are and why that will make them the core of real-world AI in the years to come.

---

## Neuro-Symbolic Systems

Neuro-symbolic systems consist of both neural and symbolic computation, where:

- Neural refers to LLMs, embedding models, or other neural network-based models.
- Symbolic refers to logic containing symbolic logic, such as code.

Both neural and symbolic AI originate from the early philosophical approaches to AI: connectionism (now neural) and symbolism. Symbolic AI is the more traditional AI. Diehard symbolists believed they could achieve true AGI via written rules, ontologies, and other logical functions.

The other camp were the connectionists. Connectionism emerged in 1943 with a theoretical neural circuit but truly kicked off with Rosenblatt's perceptron paper in 1958 [1][2]. Both of these approaches to AI are fascinating but deserve more time than we can give them here, so we will leave further exploration of these concepts for a future chapter.

Most important to us is understanding where symbolic logic outperforms neural-based compute and vice-versa.

| Neural | Symbolic |
| --- | --- |
| Flexible, learned logic that can cover a huge range of potential scenarios. | Mostly hand-written rules which can be very granular and fine-tuned but hard to scale. |
| Hard to interpret why a neural system does what it does. Very difficult or even impossible to predict behavior. | Rules are written and can be understood. When unsure why a particular ouput was produced we can look at the rules / logic to understand. |
| Requires huge amount of data and compute to train state-of-the-art neural models, making it hard to add new abilities or update with new information. | Code is relatively cheap to write, it can be updated with new features easily, and latest information can often be added often instantaneously. |
| When trained on broad datasets can often lack performance when exposed to unique scenarios that are not well represented in the training data. | Easily customized to unique scenarios. |
| Struggles with complex computations such as mathematical operations. | Perform complex computations very quickly and accurately. |

Pure neural architectures struggle with many seemingly simple tasks. For example, an LLM *cannot* provide an accurate answer if we ask it for today's date.

Retrieval Augmented Generation (RAG) is commonly used to provide LLMs with up-to-date knowledge on a particular subject or access to proprietary knowledge.

### Giving LLMs Superpowers

By 2020, it was becoming clear that neural AI systems could not perform tasks symbolic systems typically excelled in, such as arithmetic, accessing structured DB data, or making API calls. These tasks require discrete input parameters that allow us to process them reliably according to strict written logic.

In 2022, researchers at AI21 developed Jurassic-X, an LLM-based "neuro-symbolic architecture." Neuro-symbolic refers to merging the "neural computation" of large language models (LLMs) with more traditional (i.e. symbolic) computation of code.

Jurassic-X used the Modular Reasoning, Knowledge, and Language (MRKL) system [3]. The researchers developed MRKL to solve the limitations of LLMs, namely:

- Lack of up-to-date knowledge, whether that is the latest in AI or something as simple as today's date.
- Lack of proprietary knowledge, such as internal company docs or your calendar bookings.
- Lack of reasoning, i.e. the inability to perform operations that traditional software is good at, like running complex mathematical operations.
- Lack of ability to generalize. Back in 2022, most LLMs had to be fine-tuned to perform well in a specific domain. This problem is still present today but far less prominent as the SotA models generalize much better and, in the case of MRKL, are able to use tools relatively well (although we could certainly take the MRKL solution to improve tool use performance even today).

MRKL represents one of the earliest forms of what we would now call an agent; it is an LLM (neural computation) paired with executable code (symbolic computation).

## ReAct and Tools

There is a misconception in the broader industry that an AI agent is an LLM contained within some looping logic that can generate inputs for and execute code functions. This definition of agents originates from the huge popularity of the ReAct agent framework and the adoption of a similar structure with function/tool calling by LLM providers such as OpenAI, Anthropic, and Ollama.

![ReAct agent flow with the Reasoning-Action loop [4]. When the action chosen specifies to use a normal tool, the tool is used and the observation returned for another iteration through the Reasoning-Action loop. To return a final answer to the user the LLM must choose action "answer" and provide the natural language response, finishing the loop.](/images/posts/ai-agents/ai-agents-00.png)

<small>ReAct agent flow with the Reasoning-Action loop [4]. When the action chosen specifies to use a normal tool, the tool is used and the observation returned for another iteration through the Reasoning-Action loop. To return a final answer to the user the LLM must choose action "answer" and provide the natural language response, finishing the loop.</small>

Our "neuro-symbolic" definition is much broader but certainly does include ReAct agents and LLMs paired with tools. This agent type is the most common for now, so it's worth understanding the basic concept behind it.

The **Re**ason **Act**ion (ReAct) method encourages LLMs to generate iterative *reasoning* and *action* steps. During *reasoning,* the LLM describes what steps are to be taken to answer the user's query. Then, the LLM generates an *action,* which we parse into an input to some executable code, which we typically describe as a tool/function call.

![ReAct method. Each iteration includes a Reasoning step followed by an Action (tool call) step. The Observation is the output from the previous tool call. During the final iteration the agent calls the answer tool, meaning we generate the final answer for the user.](/images/posts/ai-agents/ai-agents-01.png)

<small>ReAct method. Each iteration includes a Reasoning step followed by an Action (tool call) step. The Observation is the output from the previous tool call. During the final iteration the agent calls the answer tool, meaning we generate the final answer for the user.</small>

Following the reason and action steps, our action tool call returns an observation. The logic returns the observation to the LLM, which is then used to generate subsequent reasoning and action steps.

The ReAct loop continues until the LLM has enough information to answer the original input. Once the LLM reaches this state, it calls a special *answer* action with the generated answer for the user.

## Not only LLMs and Tool Calls

LLMs paired with tool calling are powerful but far from the only approach to building agents. Using the definition of neuro-symbolic, we cover architectures such as:

- Multi-agent workflows that involve multiple LLM-tool (or other agent structure) combinations.
- More deterministic workflows where we may have set neural model-tool paths that may fork or merge as the use case requires.
- Embedding models that can detect user intents and decide tool-use or LLM selection-based selection in vector space.

These are just a few high-level examples of alternative agent structures. Far from being designed for niche use cases, we find these alternative options to frequently perform better than the more common ReAct or Tool agents. We will cover all of these examples and more in future chapters.

---

Agents are fundamental to the future of AI, but that doesn't mean we should expect that future to come from agents in their most popular form today. ReAct and Tool agents are great and handle many simple use cases well, but the scope of agents is much broader, and we believe thinking beyond ReAct and Tools is key to building future AI.

---

You can sign up for the [Aurelio AI newsletter](https://b0fcw9ec53w.typeform.com/to/w2BDHVK7) to stay updated on future releases in our comprehensive course on agents.

---

## References

[1] The curious case of Connectionism (2019) [https://www.degruyter.com/document/doi/10.1515/opphil-2019-0018/html](https://www.degruyter.com/document/doi/10.1515/opphil-2019-0018/html)

[2] F. Rosenblatt, [The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain](https://www.ling.upenn.edu/courses/cogs501/Rosenblatt1958.pdf) (1958), Psychological Review

[3] E. Karpas et al. [MRKL Systems: A Modular, Neuro-Symbolic Architecture That Combines Large Language Models, External Knowledge Sources and Discrete Reasoning](https://arxiv.org/abs/2205.00445) (2022), AI21 Labs
"""

### Preparing our Prompts

LangChain comes with several prompt classes and methods for organizing or constructing our prompts. We will cover these in more detail in later examples, but for now we'll cover the essentials that we need here.

Prompts for chat agents are, at a minimum, broken up into three components:

* System prompt: this provides the instructions to our LLM on how it must behave, what its objective is, etc.

* User prompt: this is the user-written input.

* AI prompt: this is the AI-generated output. When representing a conversation, previous generations are inserted back into the next prompt and become part of the broader _chat history_.

```
You are a helpful AI assistant, you will do XYZ.    | SYSTEM PROMPT

User: Hi, what is the capital of Australia?         | USER PROMPT
AI: It is Canberra                                  | AI PROMPT
User: When is the best time to visit?               | USER PROMPT
```

LangChain provides us with _templates_ for each of these prompt types. By using templates we can insert different inputs to the template, modifying the prompt based on the provided inputs.
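
As a rough illustration of what a template does, the sketch below uses plain Python string formatting rather than LangChain's classes. The `TEMPLATES` dictionary and the `render` helper are made up for this example.

```python
# Hypothetical role -> template mapping; curly braces mark the dynamic inputs.
TEMPLATES = {
    "system": "You are a helpful AI assistant, you will answer questions about {topic}.",
    "user": "{question}",
}


def render(templates, **inputs):
    """Fill every template with the values it names, keeping the role alongside the text."""
    return [(role, text.format(**inputs)) for role, text in templates.items()]


messages = render(TEMPLATES, topic="Australia", question="What is the capital?")
for role, text in messages:
    print(f"{role}: {text}")
```

The same templates can be rendered again with different inputs, which is the main reason to separate the fixed wording of a prompt from its dynamic parts.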

Let's initialize our system and user prompt first:

In [None]:
from langchain_core.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)

# Defining the system prompt (how the AI should act)
system_prompt = SystemMessagePromptTemplate.from_template(
    "You are an AI assistant that helps generate article titles."
)

# the user prompt is provided by the user, in this case however the only dynamic
# input is the article
user_prompt = HumanMessagePromptTemplate.from_template(
    """You are tasked with creating a name for a article.
The article is here for you to examine: 

---

{article}

---

The name should be based of the context of the article.
Be creative, but make sure the names are clear, catchy,
and relevant to the theme of the article.

Only output the article name, no other explanation or
text can be provided.""",
    input_variables=["article"]
)

We can display what our formatted human prompt would look like after inserting a value into the `article` parameter:

In [54]:
user_prompt.format(article="TEST STRING")

HumanMessage(content='You are tasked with creating a name for a article.\nThe article is here for you to examine: \n\n---\n\nTEST STRING\n\n---\n\nThe name should be based of the context of the article.\nBe creative, but make sure the names are clear, catchy,\nand relevant to the theme of the article.\n\nOnly output the article name, no other explanation or\ntext can be provided.', additional_kwargs={}, response_metadata={})

Now that we have our system and user prompts, we can merge both into our full chat prompt using the `ChatPromptTemplate`:

In [None]:
first_prompt = ChatPromptTemplate.from_messages([system_prompt, user_prompt])

By default, the `ChatPromptTemplate` will read the `input_variables` from each of the prompt templates inserted and allow us to use those input variables when formatting the full chat prompt template:

In [56]:
print(first_prompt.format(article="TEST STRING"))

System: You are an AI assistant that helps generate article titles.
Human: You are tasked with creating a name for a article.
The article is here for you to examine: 

---

TEST STRING

---

The name should be based of the context of the article.
Be creative, but make sure the names are clear, catchy,
and relevant to the theme of the article.

Only output the article name, no other explanation or
text can be provided.


`ChatPromptTemplate` also prefixes each individual message with its role, i.e. `System:`, `Human:`, or `AI:`.
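
To show the two behaviors just described in isolation (collecting input variables from several message templates, and prefixing each message with its role), here is a plain-Python sketch built on the standard library's `string.Formatter`. The helper names and the shortened templates are assumptions for illustration; this is not how `ChatPromptTemplate` is implemented.

```python
from string import Formatter


def input_variables(template):
    """List the placeholder names found in a format-style template."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


# Hypothetical (role, template) pairs standing in for message templates.
chat_template = [
    ("System", "You are an AI assistant that helps generate article titles."),
    ("Human", "Create a name for this article: {article}"),
]


def all_variables(messages):
    """Collect the variables of every message, without duplicates, in order."""
    seen = []
    for _, template in messages:
        for name in input_variables(template):
            if name not in seen:
                seen.append(name)
    return seen


def format_chat(messages, **inputs):
    """Fill each template and prefix it with its role."""
    return "\n".join(f"{role}: {t.format(**inputs)}" for role, t in messages)


print(all_variables(chat_template))
print(format_chat(chat_template, article="TEST STRING"))
```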

We can chain together our `first_prompt` template and the `llm` object we defined earlier to create a simple LLM chain. This chain will perform the steps **prompt formatting > llm generation > get output**.

We'll be using **L**ang**C**hain **E**xpression **L**anguage (LCEL) to construct our chain. This syntax can look a little strange, but we will cover it in detail later in the course. For now, all we need to know is that the first dictionary segment (i.e. `{"article": lambda x: x["article"]}`) defines our inputs, and the pipe operator (`|`) feeds the output from its left side into the input on its right side.
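
If the pipe syntax feels opaque, the following plain-Python sketch mimics the behavior described above by overloading the `|` operator. The `Step` class and `to_step` helper are hypothetical and far simpler than LangChain's runnables; the `prompt` and `model` objects are stand-ins that never call an LLM.

```python
class Step:
    """Wrap a function so that `a | b` builds a new step running a, then b."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        other = to_step(other)
        return Step(lambda x: other.invoke(self.invoke(x)))

    def __ror__(self, other):
        # Called for `dict | Step`, where the left operand is a plain dict.
        return to_step(other) | self

    def invoke(self, x):
        return self.func(x)


def to_step(obj):
    """Turn a dict of functions into a step that builds a dict of results."""
    if isinstance(obj, Step):
        return obj
    if isinstance(obj, dict):
        return Step(lambda x: {key: fn(x) for key, fn in obj.items()})
    return Step(obj)


# Stand-ins for the prompt and the model; neither calls a real LLM.
prompt = Step(lambda d: f"Title for: {d['article']}")
model = Step(lambda text: text.upper())

chain = {"article": lambda x: x["article"]} | prompt | model | {"article_title": lambda s: s}
print(chain.invoke({"article": "agents"}))
```

Each `|` simply produces a new step whose `invoke` runs the left side first and hands the result to the right side, which is the mental model to keep when reading LCEL chains.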

In [57]:
chain_one = (
    {"article": lambda x: x["article"]}
    | first_prompt
    | creative_llm
    | {"article_title": lambda x: x.content}
)

Our first chain creates the article title. Note that we can run each of these chains individually:

In [58]:
article_title_msg = chain_one.invoke({"article": article})
article_title_msg

{'article_title': '"Unlocking AI\'s Potential: The Rise of Neuro-Symbolic Agents"'}

But we will actually chain this step with several other LCEL chains. To continue, our next step is to summarize the article using both the `article` and the newly generated `article_title` values, from which we will output a new `summary` variable:

In [59]:
second_user_prompt = HumanMessagePromptTemplate.from_template(
    """You are tasked with creating a description for
the article. The article is here for you to examine:

---

{article}

---

Here is the article title '{article_title}'.

Output the SEO friendly article description. Do not output
anything other than the description.""",
    input_variables=["article", "article_title"]
)

second_prompt = ChatPromptTemplate.from_messages([
    system_prompt,
    second_user_prompt
])

In [60]:
chain_two = (
    {
        "article": lambda x: x["article"],
        "article_title": lambda x: x["article_title"]
    }
    | second_prompt
    | llm
    | {"summary": lambda x: x.content}
)

In [61]:
article_description_msg = chain_two.invoke({
    "article": article,
    "article_title": article_title_msg["article_title"]
})
article_description_msg

{'summary': "Explore the transformative future of AI with neuro-symbolic agents, where neural networks and symbolic logic converge. This article delves into the essential concepts of AI agents, their capabilities, and the groundbreaking integration of large language models (LLMs) with traditional software. Discover how these innovative systems enhance AI's functionality, enabling complex tasks and real-world applications. Join us as we uncover the potential of neuro-symbolic architectures and their role in shaping the next generation of artificial intelligence."}

The third step will consume our first `article` variable and provide several output fields, focusing on helping the user improve a part of their writing. As we are outputting multiple fields we can specify for the LLM to use structured outputs, keeping the generated fields aligned with our requirements.
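
In general terms, structured output means the model's reply is parsed into named fields and rejected if it does not match the expected shape. The sketch below illustrates that idea with the standard library only; `ParagraphSketch` and `parse_paragraph` are hypothetical names, the JSON string is a stand-in for a model response, and LangChain's own mechanism differs in its details.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class ParagraphSketch:
    original_paragraph: str
    edited_paragraph: str
    feedback: str


def parse_paragraph(raw_json):
    """Parse model output as JSON and check it has exactly the expected string fields."""
    data = json.loads(raw_json)
    expected = {f.name for f in fields(ParagraphSketch)}
    if set(data) != expected:
        raise ValueError(f"expected fields {sorted(expected)}, got {sorted(data)}")
    if not all(isinstance(v, str) for v in data.values()):
        raise ValueError("every field must be a string")
    return ParagraphSketch(**data)


# Stand-in for a model response; no LLM is called here.
raw = '{"original_paragraph": "old text", "edited_paragraph": "new text", "feedback": "be concise"}'
print(parse_paragraph(raw))
```

Failing loudly on a malformed reply is the point: downstream steps can then rely on every field being present.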

In [62]:
third_user_prompt = HumanMessagePromptTemplate.from_template(
    """You are tasked with creating a new paragraph for the
article. The article is here for you to examine:

---

{article}

---

Choose one paragraph to review and edit. During your edit
ensure you provide constructive feedback to the user so they
can learn where to improve their own writing.""",
    input_variables=["article"]
)

# prompt template 3: creating a new paragraph for the article
third_prompt = ChatPromptTemplate.from_messages([
    system_prompt,
    third_user_prompt
])

We create a Pydantic model describing the output format we need. This format description is then passed to our model using the `with_structured_output` method:

In [63]:
from pydantic import BaseModel, Field

class Paragraph(BaseModel):
    original_paragraph: str = Field(description="The original paragraph")
    edited_paragraph: str = Field(description="The improved edited paragraph")
    feedback: str = Field(description=(
        "Constructive feedback on the original paragraph"
    ))

structured_llm = creative_llm.with_structured_output(Paragraph)

Now we put all of this together in another chain:

In [64]:
# chain 3: inputs: article / output: article_para
chain_three = (
    {"article": lambda x: x["article"]}
    | third_prompt
    | structured_llm
    | {
        "original_paragraph": lambda x: x.original_paragraph,
        "edited_paragraph": lambda x: x.edited_paragraph,
        "feedback": lambda x: x.feedback
    }
)

In [65]:
out = chain_three.invoke({"article": article})
out

{'original_paragraph': 'Most important to us is understanding where symbolic logic outperforms neural-based compute and vice-versa.',
 'edited_paragraph': 'It is crucial for us to comprehend the specific scenarios in which symbolic logic excels compared to neural-based computation, and vice versa.',
 'feedback': 'The original sentence is clear but could be more engaging and precise. Instead of "Most important to us is understanding," use a more active construction like "It is crucial for us to comprehend." This not only sharpens the focus but also enhances the overall readability. Additionally, replacing \'outperform\' with \'excels\' adds a more positive connotation.'}

### Generate Image

In [66]:
from langchain_community.utilities.dalle_image_generator import DallEAPIWrapper
from langchain_core.prompts import PromptTemplate

image_prompt = PromptTemplate(
    input_variables=["article"],
    template=(
        "Generate a prompt with less than 500 characters to generate an image "
        "based on the following article: {article}"
    )
)

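The `generate_and_display_image` function generates the article image from the prompt text it receives and then displays it. `DallEAPIWrapper` calls OpenAI's image API, so it needs an OpenAI API key (typically read from the `OPENAI_API_KEY` environment variable), which is separate from the OpenRouter key set earlier.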

In [67]:
from skimage import io
import matplotlib.pyplot as plt
from langchain_core.runnables import RunnableLambda

def generate_and_display_image(image_prompt):
    # Ask DALL-E 3 for an image; the wrapper returns a URL to the result
    image_url = DallEAPIWrapper(model="dall-e-3").run(image_prompt)
    # skimage can read an image directly from a URL
    image_data = io.imread(image_url)

    # Display the image without axes
    plt.imshow(image_data)
    plt.axis('off')
    plt.show()

# we wrap this in a RunnableLambda for use with LCEL
image_gen_runnable = RunnableLambda(generate_and_display_image)

With all of our image generation components ready, we chain them together, again with LCEL:

In [74]:
# chain 4: inputs: article / output: None (the generated image is displayed)
chain_four = (
    {"article": lambda x: x["article"]}
    | image_prompt
    | llm
    | (lambda x: x.content)
    | image_gen_runnable
)

And now, we `invoke` our final chain:

In [None]:
chain_four.invoke({"article": article})

# Chat Memory

Chat memory is a mechanism that stores the conversation history between the user and the AI assistant. It acts as short-term memory, allowing the language model to access previous messages in the conversation.

There are four main types of memory:
- Buffer Memory: stores the entire conversation as raw messages
- Summary Memory: stores a running summary of the conversation
- Window Memory: stores only the N most recent exchanges
- Entity Memory: stores key facts about entities mentioned in the conversation
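
The difference between the first and third types can be sketched in a few lines of plain Python. `BufferMemory` and `WindowMemory` below are hypothetical classes written for this illustration, not LangChain's `ConversationBufferMemory` or `ConversationBufferWindowMemory`.

```python
from collections import deque


class BufferMemory:
    """Keep every message of the conversation."""

    def __init__(self):
        self.messages = []

    def save(self, user, ai):
        self.messages.append(("human", user))
        self.messages.append(("ai", ai))


class WindowMemory:
    """Keep only the last k exchanges; older ones are dropped automatically."""

    def __init__(self, k):
        # Each exchange is two messages, so the deque holds 2 * k items.
        self.messages = deque(maxlen=2 * k)

    def save(self, user, ai):
        self.messages.append(("human", user))
        self.messages.append(("ai", ai))


buffer, window = BufferMemory(), WindowMemory(k=1)
for mem in (buffer, window):
    mem.save("Hi, my name is Dat", "Hey Dat!")
    mem.save("What is my name?", "Your name is Dat.")

print(len(buffer.messages))   # all four messages are kept
print(list(window.messages))  # only the most recent exchange remains
```

The trade-off is the usual one: the buffer never forgets but grows with every turn, while the window keeps the prompt small at the cost of losing older context.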

### Buffer Memory

`ConversationBufferMemory` is the simplest form of conversational memory: it is literally just a place where we store messages and from which we feed them into our LLM.

Let's start with LangChain's original `ConversationBufferMemory` object. We set `return_messages=True` so that the history is returned as a list of message objects rather than as a single string. Unless you are using a non-chat model, always set this to `True`: passing the history as one string can lead to unexpected behavior from chat LLMs.

In [None]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(return_messages = True)

There are several ways to add messages to our memory. Using the `save_context` method, we can add a user query (via the `input` key) and the AI's response (via the `output` key).

For example, we can create a conversation and save it directly:

In [None]:
memory.save_context(
    {"input": "Hi, my name is Dat"},  # user message
    {"output": "Hey Dat, what's up? I'm an AI model."}  # AI response
)
memory.save_context(
    {"input": "I'm researching the different types of conversational memory."},  # user message
    {"output": "That's interesting, what type do you want to see?"}  # AI response
)
memory.save_context(
    {"input": "I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory."},  # user message
    {"output": "That's interesting, ConversationBufferMemory is the simplest form of conversational memory in LangChain. It stores the entire conversation history as a buffer, allowing the LLM to access all previous messages for context. It is useful for chatbots and agents that need to remember the full conversation. In contrast, the window version limits what the model needs to remember."}  # AI response
)
memory.save_context(
    {"input": "Buffer memory just stores the entire conversation, right?"},  # user message
    {"output": "Yes"}  # AI response
)
memory.save_context(
    {"input": "Buffer window memory stores the last k messages, dropping the rest."},  # user message
    {"output": "Also right!"}  # AI response
)

Before using the memory, we need to load in any variables for that memory type — in this case, there are none, so we just pass an empty dictionary:

In [None]:
memory.load_memory_variables({})

With that, we've created our buffer memory. Before feeding it into our LLM, let's quickly view the alternative method for adding messages to our memory. With this other method, we pass individual user and AI messages via the `add_user_message` and `add_ai_message` methods of `memory.chat_memory`. To reproduce what we did above, we do:

In [None]:
memory = ConversationBufferMemory(return_messages=True)

memory.chat_memory.add_user_message("Hello, my name is Dat.")
memory.chat_memory.add_ai_message("Hey Dat, what's up? I'm an AI model.")
memory.chat_memory.add_user_message("I'm researching the different types of conversational memory.")
memory.chat_memory.add_ai_message("That's interesting, what type do you want to see?")
memory.chat_memory.add_user_message("I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.")
memory.chat_memory.add_ai_message("That's interesting, ConversationBufferMemory is the simplest form of conversational memory in LangChain. It stores the entire conversation history as a buffer, allowing the LLM to access all previous messages for context. It is useful for chatbots and agents that need to remember the full conversation. In contrast, the window version limits what the model needs to remember.")
memory.chat_memory.add_user_message("Buffer memory just stores the entire conversation, right?")
memory.chat_memory.add_ai_message("Yes")
memory.chat_memory.add_user_message("Buffer window memory stores the last k messages, dropping the rest.")
memory.chat_memory.add_ai_message("Also right!")

memory.load_memory_variables({})

The outcome is exactly the same in either case. To pass this onto our LLM, we need to create a `ConversationChain` object — which is already deprecated in favor of the `RunnableWithMessageHistory` class, which we will cover in a moment.

In [None]:
from langchain.chains import ConversationChain

chain = ConversationChain(
    llm = llm,
    memory = memory,
    verbose = True
)

In [None]:
chain.invoke({"input": "what is my name again?"})

##### `RunnableWithMessageHistory` in Buffer

The `ConversationBufferMemory` type is due for deprecation. Instead, we can use the `RunnableWithMessageHistory` class to implement the same functionality.

When implementing `RunnableWithMessageHistory` we will use LangChain Expression Language (LCEL) and for this we need to define our prompt template and LLM components. Our llm has already been defined, so now we just define a `ChatPromptTemplate` object.


In [None]:
from langchain.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
    SystemMessagePromptTemplate,
)

system_prompt = "You are a helpful AI assistant."

prompt_template = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(system_prompt),
    MessagesPlaceholder(variable_name = "history"),
    HumanMessagePromptTemplate.from_template("{query}"),
])

We can link our `prompt_template` and our `llm` together to create a pipeline via LCEL.

In [None]:
pipeline = prompt_template | llm

To add message history, we wrap our pipeline in a `RunnableWithMessageHistory` object. This object requires a few input parameters. One of those is `get_session_history`, which takes a function that returns a chat message history object (here an `InMemoryChatMessageHistory`) for a given session ID. We define this function ourselves:

In [None]:
from langchain_core.chat_history import InMemoryChatMessageHistory

chat_map = {}
def get_chat_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in chat_map:
        # if session ID doesn't exist, create a new chat history
        chat_map[session_id] = InMemoryChatMessageHistory()
    return chat_map[session_id]

We also need to tell our runnable which variable name to use for the chat history (i.e. `history`) and which to use for the user's query (i.e. `query`).

In [None]:
from langchain_core.runnables.history import RunnableWithMessageHistory

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history = get_chat_history,
    input_messages_key = "query",
    history_messages_key = "history"
)

In [None]:
pipeline_with_history.invoke(
    {"query": "Hi, my name is Dat"},
    config={"configurable": {"session_id": "id_123"}}
)

In [None]:
pipeline_with_history.invoke(
    {"query": "What is my name again?"},
    config={"configurable": {"session_id": "id_123"}}
)
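
To make the role of `get_session_history` more concrete, here is a minimal plain-Python sketch of the bookkeeping a history wrapper performs around each call: look up the session's history, hand it to the model together with the new query, then record the query and the reply. This is an illustration only and not LangChain's implementation; `histories`, `get_history`, `fake_model`, `invoke_with_history`, and the tuple message format are hypothetical names introduced for the example.

```python
from typing import Callable

# session_id -> list of (role, content) tuples; a stand-in for chat_map
histories: dict[str, list[tuple[str, str]]] = {}


def get_history(session_id: str) -> list[tuple[str, str]]:
    """Return the history for a session, creating it on first use."""
    return histories.setdefault(session_id, [])


def fake_model(history: list[tuple[str, str]], query: str) -> str:
    """Hypothetical stand-in for an LLM: reports how many prior messages it saw."""
    return f"I can see {len(history)} earlier messages."


def invoke_with_history(
    model: Callable[[list[tuple[str, str]], str], str],
    query: str,
    session_id: str,
) -> str:
    history = get_history(session_id)
    # the model sees the history as it was before this turn
    reply = model(list(history), query)
    # only after the call do we record the new human/AI pair
    history.append(("human", query))
    history.append(("ai", reply))
    return reply


print(invoke_with_history(fake_model, "Hi, my name is Dat", "id_123"))      # 0 earlier messages
print(invoke_with_history(fake_model, "What is my name again?", "id_123"))  # 2 earlier messages
print(invoke_with_history(fake_model, "Hello", "id_456"))                   # 0 earlier messages
```

Each session ID maps to its own list, which is why a second call with the same ID sees the first exchange while a new ID starts from an empty history.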

### Window Memory

The `ConversationBufferWindowMemory` type is similar to `ConversationBufferMemory`, but only returns the most recent part of the conversation: the last `k` interactions, where each interaction is a human message plus the AI response (so up to `2 * k` messages). There are a few reasons why we would want to keep only the most recent messages:
- More messages mean more tokens are sent with each request, and more tokens increase latency and cost.
- LLMs tend to perform worse when given more tokens, making them more likely to deviate from instructions, hallucinate, or "forget" information provided to them. Conciseness is key to high performing LLMs.
- If we keep all messages we will eventually hit the LLM's context window limit. Capping the history with a window size `k` makes this far less likely, although very long individual messages can still add up.

The buffer window solves many problems that we encounter with the standard buffer memory, while still being a very simple and intuitive form of conversational memory.
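
As a rough illustration of the windowing idea, the sketch below trims a plain list of messages to the last `k` interactions, assuming each interaction is one human message followed by one AI message (so `2 * k` messages in total). The `last_k_interactions` helper and the tuple message format are hypothetical, introduced only for this example.

```python
def last_k_interactions(messages: list[tuple[str, str]], k: int) -> list[tuple[str, str]]:
    """Keep the last k human/AI pairs (2 * k messages); k <= 0 keeps nothing."""
    if k <= 0:
        return []
    return messages[-2 * k:]


conversation = [
    ("human", "Hello, my name is Dat."),
    ("ai", "Hey Dat, what's up? I'm an AI model."),
    ("human", "I'm researching the different types of conversational memory."),
    ("ai", "That's interesting, what type do you want to see?"),
    ("human", "Buffer memory just stores the entire conversation, right?"),
    ("ai", "Yes"),
]

window = last_k_interactions(conversation, k=2)
print(len(window))   # 4
print(window[0][1])  # the first remaining message; the name introduction is gone
```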


In [None]:
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k = 4, return_messages = True)

In [None]:
memory.chat_memory.add_user_message("Hello, my name is Dat.")
memory.chat_memory.add_ai_message("Hey Dat, what's up? I'm an AI model.")
memory.chat_memory.add_user_message("I'm researching the different types of conversational memory.")
memory.chat_memory.add_ai_message("That's interesting, what type do you want to see?")
memory.chat_memory.add_user_message("I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.")
memory.chat_memory.add_ai_message("That's interesting, ConversationBufferMemory is the simplest form of conversational memory in LangChain. It stores the entire conversation history as a buffer, allowing the LLM to access all previous messages for context. It is useful for chatbots and agents that need to remember the full conversation. Beside The Window version will limited to what model need to remember instead")
memory.chat_memory.add_user_message("Buffer memory just stores the entire conversation, right?")
memory.chat_memory.add_ai_message("Yes")
memory.chat_memory.add_user_message("Buffer window memory stores the last k messages, dropping the rest.")
memory.chat_memory.add_ai_message("Also right!")

memory.load_memory_variables({})

As before, we use the `ConversationChain` object (again, this is deprecated and we will rewrite it with `RunnableWithMessageHistory` in a moment).

In [None]:
chain = ConversationChain(
    llm = llm,
    memory = memory,
    verbose = True
)

In [None]:
chain.invoke({"input": "what is my name again?"})

The LLM can no longer remember our name because we set the `k` parameter to 4, meaning that only the last 4 interactions (8 messages) are returned from memory. As we can see above, this does not include the first interaction, where we introduced ourselves.

Based on the agent forgetting our name, we might wonder why we would ever use this memory type compared to the standard buffer memory. Well, as with most things in AI, it is always a trade-off. Here we are able to support much longer conversations, use fewer tokens, and improve latency, but these come at the cost of forgetting non-recent messages.
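
The saving can be made concrete with a little arithmetic. The sketch below counts how many history messages are sent over a conversation, assuming one human and one AI message per turn and a window of `k` interactions. Message counts stand in for tokens here, which is a simplification, and `history_messages_sent` is a hypothetical helper.

```python
from typing import Optional


def history_messages_sent(turns: int, k: Optional[int] = None) -> int:
    """Total history messages sent across `turns` requests.

    Before turn t there are 2 * (t - 1) stored messages; with a window of
    k interactions at most 2 * k of them are sent.
    """
    total = 0
    for t in range(1, turns + 1):
        stored = 2 * (t - 1)
        total += stored if k is None else min(stored, 2 * k)
    return total


print(history_messages_sent(20))       # full buffer: 380
print(history_messages_sent(20, k=4))  # window of 4 interactions: 140
```

With the full buffer the total grows quadratically with the number of turns, while the window makes it grow linearly once the window is full.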

##### `RunnableWithMessageHistory` in Window Buffer

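To implement this memory type using the `RunnableWithMessageHistory` class, we follow the same pattern as before, but this time we define our own chat history class that keeps only the last `k` messages. Note that `k` here counts individual messages rather than human/AI pairs.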

In [4]:
from pydantic import BaseModel, Field
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage

class BufferWindowMessageHistory(BaseChatMessageHistory, BaseModel):
    messages: list[BaseMessage] = Field(default_factory=list)
    k: int = Field(default_factory = int)

    def __init__(self, k: int):
        super().__init__(k = k)
        print(f"Initializing BufferWindowMessageHistory with k={k}")

    def add_messages(self, messages: list[BaseMessage]) -> None:
        """Add messages to the history, removing any messages beyond
        the last `k` messages.
        """
        self.messages.extend(messages)
        self.messages = self.messages[-self.k:]

    def clear(self) -> None:
        """Clear the history."""
        self.messages = []
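
As a side note, the "keep only the last `k` items" behaviour can also be expressed with a bounded queue from the standard library. The sketch below is an alternative shown for comparison, not part of the class above.

```python
from collections import deque

k = 4
window = deque(maxlen=k)  # once full, appending discards the oldest item

for i in range(1, 11):
    window.append(f"message {i}")

print(list(window))  # ['message 7', 'message 8', 'message 9', 'message 10']
```

One difference worth knowing: with `k = 0`, the slice `messages[-0:]` returns the whole list, whereas `deque(maxlen=0)` always stays empty.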

In [None]:
chat_map = {}
def get_chat_history(session_id: str, k: int = 4) -> BufferWindowMessageHistory:
    print(f"get_chat_history called with session_id={session_id} and k={k}")
    if session_id not in chat_map:
        # if session ID doesn't exist, create a new chat history
        chat_map[session_id] = BufferWindowMessageHistory(k = k)
    # note: `k` is only used when the session is first created
    return chat_map[session_id]

In [None]:
from langchain_core.runnables import ConfigurableFieldSpec

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history = get_chat_history,
    input_messages_key = "query",
    history_messages_key = "history",
    history_factory_config = [
        ConfigurableFieldSpec(
            id = "session_id",
            annotation = str,
            name = "Session ID",
            description = "The session ID to use for the chat history",
            default = "id_default",
        ),
        ConfigurableFieldSpec(
            id = "k",
            annotation = int,
            name = "k",
            description = "The number of messages to keep in the history",
            default = 4,
        )
    ]
)



Now we invoke our runnable, this time passing a `k` parameter via the config parameter.

In [None]:
pipeline_with_history.invoke(
    {"query": "Hi, my name is Dat"},
    config = {"configurable": {"session_id": "id_k4", "k": 4}}
)

We can also modify the messages that are stored in memory by modifying the records inside the chat_map dictionary directly.

In [None]:
chat_map["id_k4"].clear()  # clear the history

chat_map["id_k4"].add_user_message("Hello, my name is Dat.")
chat_map["id_k4"].add_ai_message("Hey Dat, what's up? I'm an AI model.")
chat_map["id_k4"].add_user_message("I'm researching the different types of conversational memory.")
chat_map["id_k4"].add_ai_message("That's interesting, what type do you want to see?")
chat_map["id_k4"].add_user_message("I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.")
chat_map["id_k4"].add_ai_message("That's interesting, ConversationBufferMemory is the simplest form of conversational memory in LangChain. It stores the entire conversation history as a buffer, allowing the LLM to access all previous messages for context. It is useful for chatbots and agents that need to remember the full conversation. Beside The Window version will limited to what model need to remember instead")
chat_map["id_k4"].add_user_message("Buffer memory just stores the entire conversation, right?")
chat_map["id_k4"].add_ai_message("Yes")
chat_map["id_k4"].add_user_message("Buffer window memory stores the last k messages, dropping the rest.")
chat_map["id_k4"].add_ai_message("Also right!")

chat_map["id_k4"].messages  # should contain only the last 4 messages

In [None]:
# Test again
pipeline_with_history.invoke(
    {"query": "what is my name again?"},
    config={"configurable": {"session_id": "id_k4", "k": 4}}
)

Now let's start a new session with a larger `k`:

In [None]:
pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config = {"configurable": {"session_id": "id_k14", "k": 14}}
)

In [None]:
chat_map["id_k14"].add_user_message("Hello, my name is Dat.")
chat_map["id_k14"].add_ai_message("Hey Dat, what's up? I'm an AI model.")
chat_map["id_k14"].add_user_message("I'm researching the different types of conversational memory.")
chat_map["id_k14"].add_ai_message("That's interesting, what type do you want to see?")
chat_map["id_k14"].add_user_message("I've been looking at ConversationBufferMemory and ConversationBufferWindowMemory.")
chat_map["id_k14"].add_ai_message("That's interesting, ConversationBufferMemory is the simplest form of conversational memory in LangChain. It stores the entire conversation history as a buffer, allowing the LLM to access all previous messages for context. It is useful for chatbots and agents that need to remember the full conversation. Beside The Window version will limited to what model need to remember instead")
chat_map["id_k14"].add_user_message("Buffer memory just stores the entire conversation, right?")
chat_map["id_k14"].add_ai_message("Yes")
chat_map["id_k14"].add_user_message("Buffer window memory stores the last k messages, dropping the rest.")
chat_map["id_k14"].add_ai_message("Also right!")

In [None]:
chat_map["id_k14"].messages  # should contain all messages since k = 14

### Summary Memory

This memory type keeps track of a summary of the conversation rather than the entire conversation. This is useful for long conversations where we don't need to keep track of the entire conversation, but we do want to keep some thread of the full conversation.

In [None]:
from langchain.memory import ConversationSummaryMemory

memory = ConversationSummaryMemory(llm = llm)

Unlike with the previous memory types, we need to provide an LLM to initialize `ConversationSummaryMemory`, because an LLM is what generates the conversation summaries.

Beyond this small tweak, using `ConversationSummaryMemory` is the same as with our previous memory types when using the deprecated `ConversationChain` object.


In [None]:
chain = ConversationChain(
    llm = llm,
    memory = memory,
    verbose = True
)

In [None]:
chain.invoke({"input": "hello there my name is Dat"})
chain.invoke({"input": "i am researching different types of AI model"})
chain.invoke({"input": "what have we talked about so far?"})

As this information was stored in the summary, the LLM successfully recalled our name. This may not always be the case: by summarizing the conversation we inevitably compress the information, so we may occasionally lose key details. Nonetheless, this is a great memory type for long conversations where we only need to retain some key information.
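
To show the shape of the update without calling a model, here is a plain-Python sketch of a rolling summary. The `summarize` callable is a hypothetical stand-in for the LLM call. The toy version below just keeps the first sentence of each message, which is enough to show that the stored state stays a single string while detail is lost.

```python
from typing import Callable


def update_summary(
    existing_summary: str,
    new_messages: list[str],
    summarize: Callable[[str, list[str]], str],
) -> str:
    """Fold new messages into the running summary and return the new summary."""
    return summarize(existing_summary, new_messages)


def toy_summarize(existing_summary: str, new_messages: list[str]) -> str:
    """Hypothetical summarizer: keep only the first sentence of each message."""
    first_sentences = [m.split(".")[0].strip() + "." for m in new_messages]
    return " ".join(([existing_summary] if existing_summary else []) + first_sentences)


summary = ""
summary = update_summary(summary, ["My name is Dat. I like long walks."], toy_summarize)
summary = update_summary(summary, ["I am researching AI models. Mostly LLMs."], toy_summarize)
print(summary)  # My name is Dat. I am researching AI models.
```

The details "I like long walks" and "Mostly LLMs" never reach the summary, which mirrors the kind of loss a real summarizer can introduce.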

##### `RunnableWithMessageHistory` in Summary

As with the window buffer memory, we need to define a custom subclass of `BaseChatMessageHistory`. We'll call this one `ConversationSummaryMessageHistory`.

In [None]:
from langchain_core.messages import SystemMessage


class ConversationSummaryMessageHistory(BaseChatMessageHistory, BaseModel):
    messages: list[BaseMessage] = Field(default_factory = list)
    llm: ChatOpenAI = Field(default_factory = ChatOpenAI)

    def __init__(self, llm: ChatOpenAI):
        super().__init__(llm = llm)

    def add_messages(self, messages: list[BaseMessage]) -> None:
        """Fold the new messages into the running summary, which then
        replaces the stored history.
        """
        # the history holds at most one message: the previous summary
        existing_summary = self.messages[0].content if self.messages else "None"
        # construct the summary chat messages
        summary_prompt = ChatPromptTemplate.from_messages([
            SystemMessagePromptTemplate.from_template(
                "Given the existing conversation summary and the new messages, "
                "generate a new summary of the conversation. Make sure to retain "
                "as much relevant information as possible."
            ),
            HumanMessagePromptTemplate.from_template(
                "Existing conversation summary:\n{existing_summary}\n\n"
                "New messages:\n{messages}"
            )
        ])
        # format the messages and invoke the LLM
        new_summary = self.llm.invoke(
            summary_prompt.format_messages(
                existing_summary = existing_summary,
                messages = "\n".join(f"{x.type}: {x.content}" for x in messages)
            )
        )
        # replace the existing history with a single system summary message
        self.messages = [SystemMessage(content = new_summary.content)]

    def clear(self) -> None:
        """Clear the history."""
        self.messages = []

In [None]:
chat_map = {}
def get_chat_history(session_id: str, llm: ChatOpenAI) -> ConversationSummaryMessageHistory:
    if session_id not in chat_map:
        # if session ID doesn't exist, create a new chat history
        chat_map[session_id] = ConversationSummaryMessageHistory(llm = llm)
    # return the chat history
    return chat_map[session_id]

In [None]:
pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history = get_chat_history,
    input_messages_key = "query",
    history_messages_key = "history",
    history_factory_config = [
        ConfigurableFieldSpec(
            id = "session_id",
            annotation = str,
            name = "Session ID",
            description = "The session ID to use for the chat history",
            default = "id_default",
        ),
        ConfigurableFieldSpec(
            id = "llm",
            annotation = ChatOpenAI,
            name = "LLM",
            description = "The LLM to use for the conversation summary",
            default = llm,
        )
    ]
)

Now we invoke our runnable, this time passing an `llm` parameter via the `config` parameter.

In [None]:
pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"configurable": {"session_id": "id_123", "llm": llm}}
)

In [None]:
chat_map["id_123"].messages

Let's continue the conversation and see if the summary is updated:

In [None]:
for msg in [
    "I have been looking for specific information on Large Language Models.",
    "Can you help me understand what they are?",
    "What are some common use cases for LLMs?",
    "How do LLMs handle context in conversations?",
    "What are the limitations of LLMs?",
    "How can I improve the performance of an LLM for my specific application?"
]:
    pipeline_with_history.invoke(
        {"query": msg},
        config={"configurable": {"session_id": "id_123", "llm": llm}}
    )

In [None]:
chat_map["id_123"].messages

The summary still retains the earlier information. Let's check that the model can recall our name:

In [None]:
pipeline_with_history.invoke(
    {"query": "What is my name again?"},
    config={"configurable": {"session_id": "id_123", "llm": llm}}
)

### Entity Memory

Entity memory is a more sophisticated type of conversational memory that stores and tracks specific entities mentioned in the conversation. Rather than storing the entire conversation or just a summary, entity memory extracts key information about important entities (people, places, concepts, etc.) and maintains a store of what is known about each one.

This is particularly useful when:
- You need to remember specific facts about entities across long conversations
- You want to build up knowledge about entities over time
- You need to recall specific details about entities mentioned earlier in the conversation

For example, if a user mentions "My friend John is a software engineer in Seattle", entity memory would extract and store: John (person) → job: software engineer, location: Seattle.

In [None]:
from langchain.memory import ConversationEntityMemory

memory = ConversationEntityMemory(llm = llm)

Like `ConversationSummaryMemory`, we need to provide an LLM to `ConversationEntityMemory` because it uses the LLM to extract entities and their attributes from the conversation. Let's add some conversations to see how it tracks entities.

In [None]:
turns = [
    (
        "Hi, my name is Dat and I work as a machine learning engineer.",
        "Nice to meet you Dat! It's great to connect with an ML engineer. How can I help you today?"
    ),
    (
        "I'm currently working on a project with my colleague Sarah, she's a data scientist.",
        "That sounds like an interesting collaboration. What kind of project are you and Sarah working on?"
    ),
    (
        "We're building a recommendation system for an e-commerce platform called TechMart.",
        "A recommendation system for TechMart sounds like a valuable project! Those can really improve customer experience."
    ),
    (
        "Yes, the project is based in Singapore and we're using LangChain for the AI components.",
        "Singapore is a great tech hub! And LangChain is an excellent choice for building AI applications."
    ),
]

for user_input, ai_output in turns:
    # entity names are extracted in load_memory_variables, and save_context
    # only updates the entities found there, so we call it before each save
    memory.load_memory_variables({"input": user_input})
    memory.save_context({"input": user_input}, {"output": ai_output})

Now let's load the memory variables and see what entities have been extracted. Entity memory stores information about specific entities in a structured format.

In [None]:
memory.load_memory_variables({"input": "What do you know about Dat?"})

Notice that when we query for a specific entity ("Dat"), entity memory returns information about that entity along with the conversation history. We can also check what the entity store looks like directly:

In [None]:
# View all entities stored
memory.entity_store.store

Now let's use this memory with a `ConversationChain` to see how it works in practice.

In [None]:
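from langchain.memory.prompt import ENTITY_MEMORY_CONVERSATION_TEMPLATE

chain = ConversationChain(
    llm = llm,
    memory = memory,
    # entity memory supplies `entities` as well as `history`, so the prompt must accept both
    prompt = ENTITY_MEMORY_CONVERSATION_TEMPLATE,
    verbose = True
)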

In [None]:
chain.invoke({"input": "What is Sarah's role?"})

The LLM successfully recalled that Sarah is a data scientist. Entity memory allows the model to retrieve specific information about entities even across long conversations. This is particularly powerful when you need to maintain detailed knowledge about multiple entities.

##### `RunnableWithMessageHistory` in Entity

To implement entity memory using the modern `RunnableWithMessageHistory` approach, we need to create a custom `ConversationEntityMessageHistory` class that tracks entities and their attributes throughout the conversation.

In [None]:
class ConversationEntityMessageHistory(BaseChatMessageHistory, BaseModel):
    messages: list[BaseMessage] = Field(default_factory = list)
    llm: ChatOpenAI = Field(default_factory = ChatOpenAI)
    entity_store: dict = Field(default_factory = dict)
    
    def __init__(self, llm: ChatOpenAI):
        super().__init__(llm = llm)
    
    def add_messages(self, messages: list[BaseMessage]) -> None:
        """Add messages to the history and extract entities."""
        self.messages.extend(messages)
        
        # Extract entities from the new messages
        entity_extraction_prompt = ChatPromptTemplate.from_messages([
            SystemMessagePromptTemplate.from_template(
                "You are an entity extraction system. Extract entities (people, places, "
                "organizations, concepts) and their attributes from the conversation. "
                "Return the entities in a structured format: Entity Name: description/attributes."
            ),
            HumanMessagePromptTemplate.from_template(
                "Extract entities from these messages:\n{messages}\n\n"
                "Existing entities:\n{existing_entities}\n\n"
                "Return ONLY new or updated entity information in the format 'Entity: description'."
            )
        ])
        
        # Format existing entities
        existing_entities_str = "\n".join([
            f"{entity}: {info}" for entity, info in self.entity_store.items()
        ]) if self.entity_store else "None"
        
        # Extract entities using the LLM
        entity_response = self.llm.invoke(
            entity_extraction_prompt.format_messages(
                messages = "\n".join([f"{m.type}: {m.content}" for m in messages]),
                existing_entities = existing_entities_str
            )
        )
        
        # Parse and store entities (simplified parsing)
        entity_lines = entity_response.content.strip().split("\n")
        for line in entity_lines:
            if ":" in line:
                entity, description = line.split(":", 1)
                entity = entity.strip()
                description = description.strip()
                if entity and description:
                    # Update or add entity
                    if entity in self.entity_store:
                        self.entity_store[entity] += f" {description}"
                    else:
                        self.entity_store[entity] = description
    
    def clear(self) -> None:
        """Clear the history and entity store."""
        self.messages = []
        self.entity_store = {}
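
The parsing step inside `add_messages` can be looked at in isolation. The sketch below applies the same "Entity: description" line parsing to a made-up response string, so no LLM call is needed. `merge_entities` is a hypothetical helper name, and the sample response is invented for illustration rather than real model output.

```python
def merge_entities(store: dict[str, str], llm_response: str) -> dict[str, str]:
    """Parse 'Entity: description' lines and merge them into the store."""
    for line in llm_response.strip().split("\n"):
        if ":" not in line:
            continue  # skip lines that don't follow the expected format
        entity, description = (part.strip() for part in line.split(":", 1))
        if not entity or not description:
            continue
        if entity in store:
            store[entity] += f" {description}"  # append to what we already know
        else:
            store[entity] = description
    return store


store = {"Alex": "software engineer"}
response = "Alex: works at Google\nMaria: product manager\nno entities here"
print(merge_entities(store, response))
# {'Alex': 'software engineer works at Google', 'Maria': 'product manager'}
```

This line-based format is fragile: a model that answers with bullet points or extra commentary will produce odd entity names. Asking the model for structured output such as JSON is likely to be more robust, at the cost of a slightly more involved prompt.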

In [None]:
chat_map = {}
def get_chat_history(session_id: str, llm: ChatOpenAI) -> ConversationEntityMessageHistory:
    if session_id not in chat_map:
        # if session ID doesn't exist, create a new chat history
        chat_map[session_id] = ConversationEntityMessageHistory(llm = llm)
    # return the chat history
    return chat_map[session_id]

Now we need to modify our prompt template to include entity information. This will help the LLM access entity data when needed.

In [None]:
entity_prompt_template = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(
        "You are a helpful AI assistant. Use the entity information provided "
        "to answer questions accurately."
    ),
    SystemMessagePromptTemplate.from_template(
        "Known entities:\n{entities}"
    ),
    MessagesPlaceholder(variable_name = "history"),
    HumanMessagePromptTemplate.from_template("{query}"),
])

# Create a custom pipeline that includes entity information
def add_entity_context(inputs):
    session_id = inputs.get("session_id", "default")
    if session_id in chat_map:
        entity_store = chat_map[session_id].entity_store
        entities_str = "\n".join([
            f"- {entity}: {info}" for entity, info in entity_store.items()
        ]) if entity_store else "None"
    else:
        entities_str = "None"
    
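    return {
        "query": inputs["query"],
        "entities": entities_str,
        # pass the injected chat history through to the prompt's MessagesPlaceholder
        "history": inputs.get("history", [])
    }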

entity_pipeline = add_entity_context | entity_prompt_template | llm

In [None]:
pipeline_with_history = RunnableWithMessageHistory(
    entity_pipeline,
    get_session_history = get_chat_history,
    input_messages_key = "query",
    history_messages_key = "history",
    history_factory_config = [
        ConfigurableFieldSpec(
            id = "session_id",
            annotation = str,
            name = "Session ID",
            description = "The session ID to use for the chat history",
            default = "id_default",
        ),
        ConfigurableFieldSpec(
            id = "llm",
            annotation = ChatOpenAI,
            name = "LLM",
            description = "The LLM to use for entity extraction",
            default = llm,
        )
    ]
)

Now let's test our entity memory implementation. We'll add some conversations and see how entities are tracked.

In [None]:
pipeline_with_history.invoke(
    {"query": "Hi, my name is Alex and I work as a software engineer at Google.", "session_id": "entity_test"},
    config = {"configurable": {"session_id": "entity_test", "llm": llm}}
)

In [None]:
pipeline_with_history.invoke(
    {"query": "I'm working on a project with my colleague Maria. She's a product manager.", "session_id": "entity_test"},
    config = {"configurable": {"session_id": "entity_test", "llm": llm}}
)

Let's check what entities have been stored:

In [None]:
chat_map["entity_test"].entity_store

Now let's test if the system can recall information about specific entities:

In [None]:
pipeline_with_history.invoke(
    {"query": "What do you know about Maria?", "session_id": "entity_test"},
    config = {"configurable": {"session_id": "entity_test", "llm": llm}}
)

In [None]:
pipeline_with_history.invoke(
    {"query": "Where do I work?", "session_id": "entity_test"},
    config = {"configurable": {"session_id": "entity_test", "llm": llm}}
)

The entity memory successfully tracks and recalls information about specific entities mentioned in the conversation. This approach is particularly useful for:

- **Customer support systems** - Remembering customer details, preferences, and history
- **Personal assistants** - Tracking information about people, places, and things mentioned by the user
- **Knowledge management** - Building up a store of entities and their attributes over time
- **Multi-turn conversations** - Maintaining context about entities across long conversations

Entity memory provides a middle ground between buffer memory (stores everything) and summary memory (compresses everything), by extracting and storing only the key entity information.

# Agent and Tools

Tools are a way to augment our LLMs with code execution. A tool is simply a function formatted so that our agent can understand how to use it and then execute it.

We can use the `@tool` decorator to create an LLM-compatible tool from a standard Python function. For best results, this function should include a few things:
- A docstring describing what the tool does and when it should be used. Our LLM/agent reads this to decide when and how to use the tool.
- Clear parameter names that ideally tell the LLM what each parameter is. If a name isn't clear, the docstring should explain what the parameter is for and how to use it.
- Both parameter and return type annotations.


In [None]:
from langchain_core.tools import tool

@tool
def add(x: float, y: float) -> float:
    """Add 'x' and 'y'."""
    return x + y

@tool
def multiply(x: float, y: float) -> float:
    """Multiply 'x' and 'y'."""
    return x * y

@tool
def exponentiate(x: float, y: float) -> float:
    """Raise 'x' to the power of 'y'."""
    return x ** y

@tool
def subtract(x: float, y: float) -> float:
    """Subtract 'x' from 'y'."""
    return y - x

With the `@tool` decorator our function is turned into a `StructuredTool` object, which we can see below:

In [None]:
add

We can see the tool name, description, and arg schema:

In [None]:
print(f"{add.name=}\n{add.description=}")

In [None]:
add.args_schema.model_json_schema()

In [None]:
exponentiate.args_schema.model_json_schema()
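
To get a feel for why docstrings and type annotations matter, the sketch below builds a small tool description from a plain function using only the standard library. It is an illustration of the idea and not how LangChain constructs its schema internally. `describe_function` and `exponentiate_plain` are hypothetical names, and the function is defined without the `@tool` decorator.

```python
import inspect


def describe_function(func) -> dict:
    """Build a small tool description from a function's signature and docstring."""
    signature = inspect.signature(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            name: param.annotation.__name__
            for name, param in signature.parameters.items()
        },
    }


def exponentiate_plain(x: float, y: float) -> float:
    """Raise 'x' to the power of 'y'."""
    return x ** y


print(describe_function(exponentiate_plain))
# {'name': 'exponentiate_plain', 'description': "Raise 'x' to the power of 'y'.", 'parameters': {'x': 'float', 'y': 'float'}}
```

Everything the model learns about the tool comes from the name, the docstring, and the annotated parameters, so a missing docstring leaves the description empty.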

When invoking the tool, a JSON string output by the LLM will be parsed into a dictionary and then consumed as kwargs, similar to the below:

In [None]:
import json

llm_output_string = "{\"x\": 5, \"y\": 2}"  # this is the output from the LLM
llm_output_dict = json.loads(llm_output_string)  # load as dictionary
llm_output_dict

This is then passed into the tool function as `kwargs` (keyword arguments) as indicated by the `**` operator - the `**` operator is used to unpack the dictionary into keyword arguments.

In [None]:
exponentiate.func(**llm_output_dict)  # call the function with unpacked args
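
When several tools are available, the tool name also has to be mapped to the right function. The sketch below shows one simple way to do that with a dictionary lookup. The `plain_tools` registry, the `run_tool_call` helper, and the JSON shape with `name` and `args` keys are simplified assumptions for illustration, not the format a specific model or LangChain uses.

```python
import json

# name -> plain Python callable; stand-ins for the decorated tools above
plain_tools = {
    "add": lambda x, y: x + y,
    "multiply": lambda x, y: x * y,
    "exponentiate": lambda x, y: x ** y,
}

# hypothetical tool call in a simplified, made-up format
llm_tool_call = '{"name": "exponentiate", "args": {"x": 5, "y": 2}}'


def run_tool_call(raw: str) -> float:
    call = json.loads(raw)
    func = plain_tools[call["name"]]  # raises KeyError if the model names an unknown tool
    return func(**call["args"])       # unpack the arguments as keyword arguments


print(run_tool_call(llm_tool_call))  # 25
```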

This covers the basics of tools and how they work. Let's move on to creating the agent itself.

### Creating an Agent

We need this agent to remember previous interactions within the conversation. To do that, we will use the `ChatPromptTemplate` with a system message, a placeholder for our chat history, a placeholder for the user query, and finally a placeholder for the agent scratchpad.

The agent scratchpad is where the agent will write its "notes" as it is working through multiple internal thought and tool-use steps to produce a final output to the user.

In [None]:
prompt = ChatPromptTemplate.from_messages([
    ("system", "you're a helpful assistant"),
    MessagesPlaceholder(variable_name = "chat_history"),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

When creating an agent we need to add conversational memory to make the agent remember previous interactions. We'll be using the older `ConversationBufferMemory` class rather than the newer `RunnableWithMessageHistory`, because we will also be using the older `create_tool_calling_agent` function and `AgentExecutor` class.

In [None]:
memory = ConversationBufferMemory(
    memory_key = "chat_history",  # must align with MessagesPlaceholder variable_name
    return_messages = True  # to return Message objects
)

In [None]:
from langchain.agents import create_tool_calling_agent

tools = [add, subtract, multiply, exponentiate]

agent = create_tool_calling_agent(
    llm = llm, tools = tools, prompt = prompt
)

Our agent by itself performs just one step of the agent execution loop. If we call the `agent.invoke` method, the LLM generates a single response and goes no further: no tools are executed and no further iterations are performed.

We can see this by asking a query that should trigger a tool call:

In [None]:
agent.invoke({
    "input": "what is 10.7 multiplied by 7.68?",
    "chat_history": memory.chat_memory.messages,
    "intermediate_steps": []  # agent will append its internal steps here
})

Here, we can see the LLM has decided to use the `multiply` tool with the input `{"x": 10.7, "y": 7.68}`. However, the tool is not executed. For that to happen we need an agent execution loop, which will handle the multiple iterations of generation to tool calling to generation, etc.

We use the `AgentExecutor` class to handle the execution loop:

In [None]:
from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(
    agent = agent,
    tools = tools,
    memory = memory,
    verbose = True
)

Now let's try the same query with the executor. Note that the `intermediate_steps` parameter that we added before is no longer needed, as the executor handles it internally.

In [None]:
agent_executor.invoke({
    "input": "what is 10.7 multiplied by 7.68?",
    "chat_history": memory.chat_memory.messages,
})

We can see that the multiply tool was invoked, producing the observation of 82.175999.... After the observation was provided, we can see that the LLM then generated a final response of:

```
10.7 multiplied by 7.68 is approximately 82.18.
```

This final response was generated from the original query and the tool output (i.e., the observation). We can also confirm that this answer is accurate:

In [5]:
10.7*7.68

82.17599999999999
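
The trailing `...99999` digits come from binary floating-point representation rather than from the tool itself. As a small illustrative check (not part of the original notebook), the standard-library `decimal` module shows the exact decimal product and confirms that the rounded answer matches the agent's response:

```python
from decimal import Decimal

# Binary floating point cannot represent 10.7 or 7.68 exactly.
float_product = 10.7 * 7.68

# Decimal arithmetic on the string literals gives the exact decimal product.
exact_product = Decimal("10.7") * Decimal("7.68")

print(float_product)             # the float result, with representation error
print(exact_product)             # the exact decimal result
print(round(float_product, 2))   # rounded to two places, as in the agent's reply
```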

Let's test memory and tool use together. We will tell the agent our name, perform a few tool calls, and then check whether it can still recall our name.

First, give the agent our name:

In [None]:
agent_executor.invoke({
    "input": "My name is Dat",
    "chat_history": memory.chat_memory.messages  # pass the message list, as before
})

Now let's try to get the agent to perform multiple tool calls within a single execution loop:

In [None]:
agent_executor.invoke({
    "input": "What is nine plus 10, minus 4 * 2, to the power of 3",
    "chat_history": memory.chat_memory.messages  # pass the message list, as before
})

In [None]:
9+10-(4*2)**3
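
For reference, the same calculation can be broken into the individual tool calls an agent might make. This is only a sketch: the `plain_*` functions are hypothetical plain-Python stand-ins for the `add`, `subtract`, `multiply`, and `exponentiate` tools, and the order shown is one reading of the query that matches the operator precedence of the check above. The agent is free to interpret the query differently.

```python
def plain_add(x: float, y: float) -> float:
    return x + y


def plain_subtract(x: float, y: float) -> float:
    return x - y


def plain_multiply(x: float, y: float) -> float:
    return x * y


def plain_exponentiate(x: float, y: float) -> float:
    return x ** y


total = plain_add(9, 10)                 # nine plus 10
product = plain_multiply(4, 2)           # 4 * 2
power = plain_exponentiate(product, 3)   # (4 * 2) to the power of 3
result = plain_subtract(total, power)    # the sum minus the power

print(total, product, power, result)
```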

Perfect, now let's see if the agent can still recall our name:

In [None]:
agent_executor.invoke({
    "input": "What is my name",
    "chat_history": memory.chat_memory.messages  # pass the message list, as before
})

For tools provided by third parties, we can use the `load_tools` function that LangChain supports:

In [None]:
from langchain.agents import load_tools
from IPython.display import display, Markdown

# Define tool functions with the @tool decorator, as above
# Invoke the agent, passing in the tools list
# For better display in Jupyter notebooks, use Markdown to format the output

# Agent Executor

When we talk about agents, a significant part of an "agent" is simple code logic, iteratively rerunning LLM calls and processing their output. The exact logic varies significantly, but one well-known example is the ReAct agent.

Reasoning + Acting (ReAct) agents interleave reasoning and action steps to combine chain-of-thought with tool use. During the reasoning step, the LLM generates the steps to take to answer the query. Next, the LLM generates the action input, which our code logic parses into a tool call.

Following our action step, we get an observation from the tool call. Then, we feed the observation back into the agent executor logic for a final answer or further reasoning and action steps.

The agent and agent executor we will be building will follow this pattern.
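
The reason, act, observe cycle described above can be sketched in plain Python. Everything here is an assumption made for illustration: `scripted_llm` is a hard-coded stand-in for a real model, `TOOLS` is a hypothetical tool registry, and the dictionary keys (`thought`, `action`, `action_input`, `final_answer`) are invented for this sketch rather than taken from LangChain.

```python
from typing import Callable

# Hypothetical tool registry for the sketch.
TOOLS: dict[str, Callable[..., float]] = {
    "add": lambda x, y: x + y,
    "multiply": lambda x, y: x * y,
}


def scripted_llm(query: str, scratchpad: list[dict]) -> dict:
    """Stand-in for an LLM: request a tool first, then answer from the observation."""
    if not scratchpad:
        return {
            "thought": "I need to multiply the two numbers.",
            "action": "multiply",
            "action_input": {"x": 10.7, "y": 7.68},
        }
    observation = scratchpad[-1]["observation"]
    return {
        "thought": "The observation contains the answer.",
        "final_answer": f"The result is about {observation:.2f}.",
    }


def run_react_loop(query: str, max_iterations: int = 3) -> str:
    scratchpad: list[dict] = []
    for _ in range(max_iterations):
        step = scripted_llm(query, scratchpad)  # reasoning step and action choice
        if "final_answer" in step:
            return step["final_answer"]
        # Action step: execute the chosen tool with the generated input.
        observation = TOOLS[step["action"]](**step["action_input"])
        # Observation step: record the result for the next iteration.
        scratchpad.append({**step, "observation": observation})
    raise RuntimeError("No final answer within the iteration limit")


answer = run_react_loop("what is 10.7 multiplied by 7.68?")
print(answer)
```

The `max_iterations` guard is a design choice worth keeping in any real loop, since a model that never produces a final answer would otherwise run forever.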

In [None]:
prompt = ChatPromptTemplate.from_messages([
    ("system", (
        "You're a helpful assistant. When answering a user's question "
        "you should first use one of the tools provided. After using a "
        "tool the tool output will be provided in the "
        "'scratchpad' below. If you have an answer in the "
        "scratchpad you should not use any more tools and "
        "instead answer directly to the user."
    )),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

In [None]:
from langchain_core.runnables.base import RunnableSerializable

# define the agent runnable
agent: RunnableSerializable = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x["chat_history"],
        "agent_scratchpad": lambda x: x.get("agent_scratchpad", [])
    }
    | prompt
    | llm.bind_tools(tools, tool_choice = "any")
)

We invoke the agent with the `invoke` method, passing in the input and chat history.

In [None]:
tool_call = agent.invoke({"input": "What is 10 + 10", "chat_history": []})
tool_call

Because we set `tool_choice = "any"` to force tool use, the usual `content` field will be empty, as that field is used for natural language output, i.e., the final answer of the LLM. To find the tool call, we need to look at the `tool_calls` field:

In [None]:
tool_call.tool_calls

From here, we have the name of the tool that our LLM wants to use and the `args` that it wants to pass to that tool. We can see that the `add` tool is being used with the arguments `x = 10` and `y = 10`. The `agent.invoke` method has not executed the tool function; we need to write that part of the agent code ourselves.

Executing the tool code requires two steps:
- Map the tool name to the tool function.
- Execute the tool function with the generated args.

In [None]:
# create tool name to function mapping
name2tool = {tool.name: tool.func for tool in tools}

Now execute to get our answer:

In [None]:
tool_exec_content = name2tool[tool_call.tool_calls[0]["name"]](
    **tool_call.tool_calls[0]["args"]
)
tool_exec_content

That is our answer and tool execution logic. We feed this back into our LLM via the `agent_scratchpad` placeholder.
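
Conceptually, the scratchpad is an ordered list of messages in which each tool result is tied to the tool call that requested it through a shared id. The sketch below uses plain dictionaries rather than LangChain message objects; the `role` key, the `call_abc` id, and the `unanswered_tool_calls` helper are all hypothetical and exist only to illustrate the pairing.

```python
def unanswered_tool_calls(scratchpad: list[dict]) -> list[str]:
    """Return the ids of tool calls that have no matching tool message."""
    requested = [
        call["id"]
        for msg in scratchpad
        if msg["role"] == "ai"
        for call in msg.get("tool_calls", [])
    ]
    answered = {
        msg["tool_call_id"] for msg in scratchpad if msg["role"] == "tool"
    }
    return [call_id for call_id in requested if call_id not in answered]


sketch_scratchpad = [
    {
        "role": "ai",
        "content": "",
        "tool_calls": [{"name": "add", "args": {"x": 10, "y": 10}, "id": "call_abc"}],
    },
    {
        "role": "tool",
        "content": "The add tool returned 20",
        "tool_call_id": "call_abc",
    },
]

print(unanswered_tool_calls(sketch_scratchpad))      # every call has a result
print(unanswered_tool_calls(sketch_scratchpad[:1]))  # the result is missing
```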

In [None]:
from langchain_core.messages import ToolMessage

tool_exec = ToolMessage(
    content=f"The {tool_call.tool_calls[0]['name']} tool returned {tool_exec_content}",
    tool_call_id=tool_call.tool_calls[0]["id"]
)

out = agent.invoke({
    "input": "What is 10 + 10",
    "chat_history": [],
    "agent_scratchpad": [tool_call, tool_exec]
})
out

Despite having the answer in our `agent_scratchpad`, the LLM still tries to use the tool again. This behaviour occurs because we bound the tools to the LLM with `tool_choice = "any"`. When we set `tool_choice` to `"any"` or `"required"`, we tell the LLM that it MUST use a tool, i.e., it cannot provide a final answer.

There are two options to fix this:
- Set `tool_choice = "auto"` to tell the LLM that it can choose to use a tool or provide a final answer.
- Create a `final_answer` tool, which we'll explain shortly.


In [None]:
# Option 1
agent: RunnableSerializable = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x["chat_history"],
        "agent_scratchpad": lambda x: x.get("agent_scratchpad", [])
    }
    | prompt
    | llm.bind_tools(tools, tool_choice="auto")
)

We'll start again from the beginning, so `agent_scratchpad` is empty:

In [None]:
tool_call = agent.invoke({"input": "What is 10 + 10", "chat_history": []})
tool_call

In [None]:
tool_output = name2tool[tool_call.tool_calls[0]["name"]](
    **tool_call.tool_calls[0]["args"]
)

tool_exec = ToolMessage(
    content=f"The {tool_call.tool_calls[0]['name']} tool returned {tool_output}",
    tool_call_id=tool_call.tool_calls[0]["id"]
)

out = agent.invoke({
    "input": "What is 10 + 10",
    "chat_history": [],
    "agent_scratchpad": [tool_call, tool_exec]
})
out

We now have the final answer in the `content` field! This method is perfectly functional; however, we recommend option 2 as it provides more control over the agent's output.

Option 2 provides more control for two reasons:
- It removes the possibility of an agent using the direct content field when it is not appropriate; for example, some LLMs (particularly smaller ones) may try to use the content field when using a tool.
- We can enforce a specific structured output in our answers. Structured outputs are handy when we require particular fields for downstream code or multi-part answers. For example, a RAG agent may return a natural language answer and a list of sources used to generate that answer.
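
As a rough illustration of the second point, downstream code can validate the arguments of a final-answer tool call before using them. The `RagAnswer` dataclass, its `sources` field, and the `parse_final_answer` helper are hypothetical names chosen for this sketch, and the example values are made up.

```python
from dataclasses import dataclass


@dataclass
class RagAnswer:
    """Hypothetical structured output: an answer plus the sources behind it."""
    answer: str
    sources: list[str]


def parse_final_answer(args: dict) -> RagAnswer:
    # Fail early if the LLM omitted a field that downstream code relies on.
    missing = {"answer", "sources"} - set(args)
    if missing:
        raise ValueError(f"final answer is missing fields: {sorted(missing)}")
    return RagAnswer(answer=args["answer"], sources=list(args["sources"]))


parsed = parse_final_answer({
    "answer": "LangChain connects LLMs to tools and data.",
    "sources": ["intro.md"],
})
print(parsed.answer)
print(parsed.sources)
```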

To implement option 2, we must create a `final_answer` tool. We will add a `tools_used` field to give our output some structure. In a real-world use case we probably wouldn't want to generate this field, but it's useful for our example here.


In [None]:
@tool
def final_answer(answer: str, tools_used: list[str]) -> dict:
    """Use this tool to provide a final answer to the user.
    The answer should be in natural language as this will be provided
    to the user directly. The tools_used must include a list of tool
    names that were used within the `scratchpad`.
    """
    return {"answer": answer, "tools_used": tools_used}

Our `final_answer` tool doesn't necessarily need to do anything; in this example, we're using it purely to structure our final response. We can now add this tool to our agent:

In [None]:
tools = [final_answer, add, subtract, multiply, exponentiate]

# we need to update our name2tool mapping too
name2tool = {tool.name: tool.func for tool in tools}

agent: RunnableSerializable = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x["chat_history"],
        "agent_scratchpad": lambda x: x.get("agent_scratchpad", [])
    }
    | prompt
    | llm.bind_tools(tools, tool_choice="any")  # we're forcing tool use again
)

In [None]:
tool_call = agent.invoke({"input": "What is 10 + 10", "chat_history": []})
tool_call.tool_calls

We execute the tool and provide its output to the agent again:

In [None]:
tool_out = name2tool[tool_call.tool_calls[0]["name"]](
    **tool_call.tool_calls[0]["args"]
)

tool_exec = ToolMessage(
    content=f"The {tool_call.tool_calls[0]['name']} tool returned {tool_out}",
    tool_call_id=tool_call.tool_calls[0]["id"]
)

out = agent.invoke({
    "input": "What is 10 + 10",
    "chat_history": [],
    "agent_scratchpad": [tool_call, tool_exec]
})
out

We see that `content` remains empty because we force tool use. But we now have a `final_answer` tool call, which the LLM returns via the `tool_calls` field:

In [None]:
out.tool_calls[0]["args"]
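
With tool use forced, a loop built around the `final_answer` tool can stop as soon as that tool is requested. The following is a minimal sketch under stated assumptions: `scripted_model` is a hard-coded stand-in for the tool-bound LLM, `sketch_tools` is a hypothetical registry, and the scratchpad entries are simplified dictionaries rather than LangChain messages.

```python
def scripted_model(scratchpad: list[dict]) -> dict:
    """Stand-in for a tool-forced LLM: always returns exactly one tool call."""
    if not scratchpad:
        return {"name": "add", "args": {"x": 10, "y": 10}}
    return {
        "name": "final_answer",
        "args": {
            "answer": f"10 + 10 = {scratchpad[-1]['output']}",
            "tools_used": [step["name"] for step in scratchpad],
        },
    }


# Hypothetical registry of executable tools; final_answer is handled by the loop.
sketch_tools = {"add": lambda x, y: x + y}


def run_until_final_answer(max_iterations: int = 3) -> dict:
    scratchpad: list[dict] = []
    for _ in range(max_iterations):
        call = scripted_model(scratchpad)
        if call["name"] == "final_answer":
            # The arguments already form the structured answer, so stop here.
            return call["args"]
        output = sketch_tools[call["name"]](**call["args"])
        scratchpad.append({"name": call["name"], "output": output})
    raise RuntimeError("final_answer was not called within the iteration limit")


final = run_until_final_answer()
print(final)
```

Treating `final_answer` as the stop signal means the loop never has to inspect the `content` field at all.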

# LangChain Expression Language