# 🗣️ Conversational RAG using a Chat Message Store

_by [Sebastian Husch Lee](https://www.linkedin.com/in/sebastian-husch-lee) and [Vladimir Blagojevic](https://www.linkedin.com/in/blagojevicvladimir)_

In this notebook, we'll show how to incorporate a conversational history into a RAG pipeline to enable multi-turn conversations with our documents, using our experimental components: `InMemoryChatMessageStore`, `ChatMessageRetriever`, and `ChatMessageWriter`.

## Installation

Install Haystack, `haystack-experimental` and `datasets` with pip:

In [None]:
!pip install -U "haystack-experimental>=0.15.0" datasets

## Enter OpenAI API key

In [29]:
import os
from getpass import getpass

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key:")

## Conversational Pipeline

- This section shows a basic setup that combines the new components with a chat generator to hold multi-turn conversations.
- It also shows how the `chat_history_id` can be used to manage separate session histories.

### Create a Chat Message Store

The conversation history is saved as `ChatMessage` objects in an `InMemoryChatMessageStore`. You can retrieve the conversation history from the chat message store using `ChatMessageRetriever`.

To store the conversation history, initialize an `InMemoryChatMessageStore`, a `ChatMessageRetriever` and a `ChatMessageWriter`. Import these components from the [`haystack-experimental`](https://github.com/deepset-ai/haystack-experimental) package:

In [30]:
from haystack_experimental.chat_message_stores.in_memory import InMemoryChatMessageStore
from haystack_experimental.components.retrievers import ChatMessageRetriever
from haystack_experimental.components.writers import ChatMessageWriter

# Chat History components
# The store is named `memory_store` because the pipelines below refer to it by that name
memory_store = InMemoryChatMessageStore()
message_retriever = ChatMessageRetriever(memory_store)
message_writer = ChatMessageWriter(memory_store)
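
Conceptually, a chat message store keeps one list of messages per `chat_history_id`. The sketch below is a simplified stand-in in plain Python that illustrates the idea. The class `ToyMessageStore` and its methods are hypothetical and are not the `InMemoryChatMessageStore` API, and messages are plain strings rather than `ChatMessage` objects.

```python
from collections import defaultdict


class ToyMessageStore:
    """Hypothetical stand-in: one message list per chat_history_id."""

    def __init__(self):
        self._histories = defaultdict(list)

    def write(self, chat_history_id, messages):
        # Append new messages to the history of the given session only.
        self._histories[chat_history_id].extend(messages)

    def retrieve(self, chat_history_id):
        # Return a copy so callers cannot mutate the stored history.
        return list(self._histories[chat_history_id])


store = ToyMessageStore()
store.write("user_123_session_1", ["user: Describe Haystack.", "assistant: A framework."])
print(store.retrieve("user_123_session_1"))  # both messages of session 1
print(store.retrieve("user_123_session_2"))  # [] because session 2 has no history yet
```

Keying the history by an identifier is what allows one store to serve many independent conversations.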

### Build the Pipeline

In [38]:
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.converters import OutputAdapter
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage


pipeline = Pipeline()

# components for RAG
pipeline.add_component(
    "prompt_builder",
    ChatPromptBuilder(
        template=[
            ChatMessage.from_system("You are a helpful AI assistant that answers users' questions."),
            ChatMessage.from_user("{{query}}")
        ],
        required_variables="*"
    )
)
pipeline.add_component("llm", OpenAIChatGenerator())

# components for chat history retrieval and storage
pipeline.add_component("message_retriever", ChatMessageRetriever(memory_store))
pipeline.add_component("message_writer", ChatMessageWriter(memory_store))
pipeline.add_component(
    "message_joiner",
    OutputAdapter(template="{{ prompt + replies }}", output_type=list[ChatMessage], unsafe=True)
)

# connections
pipeline.connect("prompt_builder.prompt", "message_retriever.current_messages")
pipeline.connect("prompt_builder.prompt", "message_joiner.prompt")
pipeline.connect("message_retriever.messages", "llm.messages")
pipeline.connect("llm.replies", "message_joiner.replies")
pipeline.connect("message_joiner", "message_writer.messages")

Unsafe mode is enabled. This allows execution of arbitrary code in the Jinja template. Use this only if you trust the source of the template.


<haystack.core.pipeline.pipeline.Pipeline object at 0x15d3e9b20>
🚅 Components
  - prompt_builder: ChatPromptBuilder
  - llm: OpenAIChatGenerator
  - message_retriever: ChatMessageRetriever
  - message_writer: ChatMessageWriter
  - message_joiner: OutputAdapter
🛤️ Connections
  - prompt_builder.prompt -> message_retriever.current_messages (list[ChatMessage])
  - prompt_builder.prompt -> message_joiner.prompt (list[ChatMessage])
  - llm.replies -> message_joiner.replies (list[ChatMessage])
  - message_retriever.messages -> llm.messages (list[ChatMessage])
  - message_joiner.output -> message_writer.messages (list[ChatMessage])
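
To make the data flow of these connections concrete, here is a plain-Python simulation of a single turn. It is a sketch under assumptions: `fake_llm`, `run_turn`, and the dictionary-based `histories` are hypothetical stand-ins, and messages are plain strings rather than `ChatMessage` objects. The real components may order or filter messages (for example system messages) differently.

```python
histories = {}  # hypothetical store: chat_history_id -> list of messages


def fake_llm(messages):
    # Stand-in for the chat generator: reports how many messages it received.
    return [f"assistant: I saw {len(messages)} message(s)."]


def run_turn(question, chat_history_id):
    prompt = ["system: You are a helpful AI assistant.", f"user: {question}"]
    # message_retriever: stored history combined with the current prompt (simplified)
    messages = histories.get(chat_history_id, []) + prompt
    # llm: generate replies from the full message list
    replies = fake_llm(messages)
    # message_joiner: same idea as the "{{ prompt + replies }}" template
    joined = prompt + replies
    # message_writer: persist this turn under the session id
    histories.setdefault(chat_history_id, []).extend(joined)
    return replies


first = run_turn("Describe Haystack by deepset in a few words.", "user_123_session_1")
second = run_turn("What do people use it for?", "user_123_session_1")
print(first[0])   # assistant: I saw 2 message(s).
print(second[0])  # assistant: I saw 5 message(s).
```

The second turn sees more messages than the first because the joined prompt and reply from the first turn were written back to the history.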

### Visualize the pipeline

Visualize the pipeline with the [`show()`](https://docs.haystack.deepset.ai/docs/visualizing-pipelines) method to confirm the connections are correct.

In [37]:
# pipeline.show()

### Run the Pipeline

- Test the pipeline with some queries. 
- Ensure that in every request we add a `chat_history_id` parameter so that we know which conversational history we'd like to retrieve and write to.
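
Because the same `chat_history_id` has to reach both the retriever and the writer, it can help to build the `data` dictionary in one place. The helper name `build_run_data` is hypothetical; the component and input names match the pipeline built in this section.

```python
def build_run_data(question, chat_history_id):
    """Build the `data` argument of `pipeline.run` for one user turn."""
    return {
        "prompt_builder": {"query": question},
        # Read from and write to the same session history.
        "message_retriever": {"chat_history_id": chat_history_id},
        "message_writer": {"chat_history_id": chat_history_id},
    }


data = build_run_data("What do people use it for?", "user_123_session_1")
print(data["message_retriever"] == data["message_writer"])  # True
```

The result would then be passed along as `pipeline.run(data=data, include_outputs_from={"llm"})`.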

Here are example queries you can try:

* *Describe Haystack by deepset in a few words.*
* *What do people use it for?*

In [40]:
chat_history_id = "user_123_session_1"

while True:
    question = input("Enter your question or Q to exit.\n🧑 ")
    if question == "Q":
        break

    res = pipeline.run(
        data={
            "prompt_builder": {"query": question},
            "message_retriever": {"chat_history_id": chat_history_id},
            "message_writer": {"chat_history_id": chat_history_id},
        },
        include_outputs_from={"llm"}
    )
    print(f'🤖 {res["llm"]["replies"][0].text}')

Enter your question or Q to exit.
🧑  Describe Haystack by deepset in a few words.


🤖 Open-source framework for document-based semantic search and question answering.


Enter your question or Q to exit.
🧑  What do people use it for?


🤖 - Building semantic search over documents (PDFs, Word, HTML, databases) so users can find relevant passages, not just keyword matches.  
- Question answering systems that retrieve relevant contexts and generate concise answers from corpora (RAG pipelines).  
- Chatbots and conversational assistants that answer domain-specific questions using company knowledge bases.  
- Enterprise knowledge bases and internal help desks (employee onboarding, policy lookup, support).  
- Document-based summarization and long-document understanding (extract key points, generate summaries).  
- Information extraction and QA over structured and unstructured content (contracts, medical records, invoices).  
- Prototyping and deploying production retrieval pipelines with vector stores, embeddings, and large language models.  
- Experimenting with and evaluating different retrievers, readers, and rerankers for research and development.


Enter your question or Q to exit.
🧑  Q


### Switching to a New Chat Session
- Now we can update the `chat_history_id` to change to a new chat session with an empty chat history

Here are example queries you can try:

* *When was it published?*
* *When was Haystack the open source framework by deepset published?*

In [41]:
# Update the chat history ID
chat_history_id = "user_123_session_2"

while True:
    question = input("Enter your question or Q to exit.\n🧑 ")
    if question == "Q":
        break

    res = pipeline.run(
        data={
            "prompt_builder": {"query": question},
            "message_retriever": {"chat_history_id": chat_history_id},
            "message_writer": {"chat_history_id": chat_history_id},
        },
        include_outputs_from={"llm"}
    )
    print(f'🤖 {res["llm"]["replies"][0].text}')

Enter your question or Q to exit.
🧑  How old is it?


🤖 I’m missing what “it” refers to - can you tell me what object or thing you mean? 

Here are quick ways to find the age for common items; tell me which one matches and any details (photo, serial/model number, label, date stamp, location, etc.) and I’ll help more precisely:

- Electronics (phones, laptops, appliances): check model/serial number, manufacture date on label, warranty/receipt, or EXIF/firmware info.
- Cars/motorcycles: decode the VIN (manufacture year is embedded) or check title/registration.
- Clothing/shoes: look for care tags, brand seasonal codes, or provenance/receipt.
- Furniture/antiques: maker’s mark/stamps, construction techniques, joinery, or consult an appraiser.
- Documents/photos: check metadata/EXIF or printed/handwritten dates; paper/ink analysis for older items.
- People/pets: birth certificate, vaccination records, or vet exam/teeth estimate for animals.
- Trees/wood: count rings in a cross-section or use an increment borer; dendrochronology

Enter your question or Q to exit.
🧑  How old is Haystack?


🤖 Which "Haystack" do you mean? A few common ones:

- Haystack Rock (the sea stack at Cannon Beach, OR)
- Haystack Observatory (MIT Haystack Observatory in Massachusetts)
- Haystack Mountain (there are several - e.g., Vermont, Colorado)
- Haystack (a company, app, or other organization)
- Something else (a person, pet, building, etc.)

Tell me which one or paste a photo/link or any other detail and I’ll give the age or how to find it.


Enter your question or Q to exit.
🧑  Haystack by deepset


🤖 Do you mean the open-source Haystack project from deepset (the NLP/RAI framework)? If so - I can show you exactly how to find its creation/release date. Quick summary: Haystack was first published around 2019–2020, so it’s roughly 5–6 years old as of late 2025.

If you want the exact date, run one of these:

- GitHub API (gives repo creation timestamp)
  curl -s https://api.github.com/repos/deepset-ai/haystack | jq .created_at

- Or clone and check the first commit date (shows when development started)
  git clone https://github.com/deepset-ai/haystack.git
  cd haystack
  git log --reverse --format=%ai | head -n1

- Or check the first PyPI release date (if you care about the first published package)
  curl -s https://pypi.org/pypi/farm-haystack/json | jq '.releases | keys[0]'

Tell me if you want me to look up the exact date for you (and whether you mean the GitHub repo, the first PyPI release, or something else) - I’ll give the precise age.


Enter your question or Q to exit.
🧑  Q


## Conversational RAG Pipeline

- This section shows how to incorporate RAG into the conversational pipeline.

### Create a Document Store and Index Documents

Create an index with the [seven-wonders](https://huggingface.co/datasets/bilgeyucel/seven-wonders) dataset:

In [42]:
from haystack import Document
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from datasets import load_dataset

dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
docs = [Document(content=doc["content"], meta=doc["meta"]) for doc in dataset]

document_store = InMemoryDocumentStore()
document_store.write_documents(documents=docs)

151

### Build the Pipeline

- Add components for RAG and for chat history retrieval and storage to build your pipeline.
- Incorporate an `OutputAdapter` component into your pipeline to join the user's prompt with the LLM's replies so that both are written to the chat message store.
- The previous conversation history will be retrieved by `ChatMessageRetriever` from the `InMemoryChatMessageStore` given a `chat_history_id`.

In [43]:
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.converters import OutputAdapter
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage


pipeline = Pipeline()

# Create the system and user message
system_message = ChatMessage.from_system(
    "You are a helpful AI assistant that answers users' questions grounded in a set of supporting documents."
)
user_message_template = """Give a brief answer to the question grounded in the supporting documents.
If the question can't be answered from the supporting documents, say so.

Supporting documents:
{%- if documents|length > 0 %}
{%- for doc in documents %}
Document [{{ loop.index }}] :
{{ doc.content }}
{% endfor -%}
{%- else %}
No relevant documents found.
{% endif %}

Question: {{query}}
Answer:
"""
user_message = ChatMessage.from_user(user_message_template)

# components for RAG
pipeline.add_component("doc_retriever", InMemoryBM25Retriever(document_store=document_store, top_k=3))
pipeline.add_component(
    "prompt_builder", ChatPromptBuilder(template=[system_message, user_message], required_variables="*")
)
pipeline.add_component("llm", OpenAIChatGenerator())

# components for chat history retrieval and storage
pipeline.add_component("message_retriever", ChatMessageRetriever(memory_store))
pipeline.add_component("message_writer", ChatMessageWriter(memory_store))
pipeline.add_component(
    "message_joiner",
    OutputAdapter(template="{{ prompt + replies }}", output_type=list[ChatMessage], unsafe=True)
)

# connections
pipeline.connect("doc_retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "message_retriever.current_messages")
pipeline.connect("prompt_builder.prompt", "message_joiner.prompt")
pipeline.connect("message_retriever.messages", "llm.messages")
pipeline.connect("llm.replies", "message_joiner.replies")
pipeline.connect("message_joiner", "message_writer.messages")

Unsafe mode is enabled. This allows execution of arbitrary code in the Jinja template. Use this only if you trust the source of the template.


<haystack.core.pipeline.pipeline.Pipeline object at 0x15d7139b0>
🚅 Components
  - doc_retriever: InMemoryBM25Retriever
  - prompt_builder: ChatPromptBuilder
  - llm: OpenAIChatGenerator
  - message_retriever: ChatMessageRetriever
  - message_writer: ChatMessageWriter
  - message_joiner: OutputAdapter
🛤️ Connections
  - doc_retriever.documents -> prompt_builder.documents (list[Document])
  - prompt_builder.prompt -> message_retriever.current_messages (list[ChatMessage])
  - prompt_builder.prompt -> message_joiner.prompt (list[ChatMessage])
  - llm.replies -> message_joiner.replies (list[ChatMessage])
  - message_retriever.messages -> llm.messages (list[ChatMessage])
  - message_joiner.output -> message_writer.messages (list[ChatMessage])

### Run the Pipeline

Test the pipeline with some queries. Here are example queries you can try:

* *What does Rhodes Statue look like?*
* *Who built it?*

In [27]:
chat_history_id = "user_123_session_3"

while True:
    question = input("Enter your question or Q to exit.\n🧑 ")
    if question == "Q":
        break

    res = pipeline.run(
        data={
            "doc_retriever": {"query": question},
            "prompt_builder": {"query": question},
            "message_retriever": {"chat_history_id": chat_history_id},
            "message_writer": {"chat_history_id": chat_history_id},
        },
        include_outputs_from={"llm"}
    )
    print(f'🤖 {res["llm"]["replies"][0].text}')

Enter your question or Q to exit.
🧑  What does Rhodes Statue look like?


🤖 Scholars do not know the Colossus’ full appearance. It represented Helios; the head and face are thought to have had curly hair with evenly spaced bronze or silver spikes (a radiating “sun” crown) like images on contemporary Rhodian coins. Anecdotes of it straddling the harbour lack historical/scientific support.


Enter your question or Q to exit.
🧑  Who built it?


🤖 Which monument do you mean?

- Hanging Gardens of Babylon: legend credits Neo-Babylonian King Nebuchadnezzar II.  
- Mausoleum at Halicarnassus: it was the tomb of Mausolus - construction was begun by/for Mausolus (continued after his death) and, per Vitruvius, built by the architects Satyros and Pytheus (and traditionally finished by his wife).


Enter your question or Q to exit.
🧑  Q


⚠️ If you followed the example queries, you'll notice that the second question was answered incorrectly. This happened because the retrieved documents weren't relevant to the user's query. The retrieval was based on the query "*Who built it?*", which doesn't have enough context to retrieve the relevant documents.

Let's fix this by using an **Agent** equipped with a RAG tool!
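
Before moving on, the retrieval failure described above can be reproduced with a toy keyword-overlap scorer. This is a deliberately simplified stand-in for BM25, and the two short documents are made up for illustration rather than taken from the dataset.

```python
import re

# Made-up miniature corpus (not from the seven-wonders dataset).
docs = {
    "colossus": "The Colossus of Rhodes was a statue erected by Chares of Lindos.",
    "mausoleum": "The Mausoleum at Halicarnassus was a tomb. It was built by the architects Satyros and Pytheus.",
}


def tokens(text):
    # Lowercase the text and keep alphabetic words only.
    return set(re.findall(r"[a-z]+", text.lower()))


def best_match(query):
    # Score each document by the number of distinct query words it shares.
    scores = {name: len(tokens(query) & tokens(text)) for name, text in docs.items()}
    return max(scores, key=scores.get)


print(best_match("Who built it?"))                      # mausoleum
print(best_match("Who built the Colossus of Rhodes?"))  # colossus
```

The bare follow-up question shares only generic words with the corpus, so it matches the wrong document. Once the query names the Colossus explicitly, the right one wins.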

## Conversational Agent with a RAG Tool

### Create RAG Tool

In conversational systems, simply prepending the chat history to the new user message is not enough to perform RAG effectively. There needs to be a mechanism to rephrase the user's query based on the conversation history to ensure relevant documents are retrieved. For instance, if the first user query is "*What's the first name of Einstein?*" and the second query is "*Where was he born?*", the system should understand that "he" refers to Einstein. The rephrasing mechanism should then modify the second query to "*Where was Einstein born?*" to retrieve the correct documents.

We can use an Agent to call its RAG tool with a rephrased version of the user's query. 
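
As a rough sketch of the rephrasing idea, a prompt for a rewriting model can be assembled from the conversation history plus a rewrite instruction. Everything here is an assumption made for illustration: `build_rephrase_messages` is a hypothetical helper, messages are plain `(role, text)` tuples instead of `ChatMessage` objects, the instruction is a shortened version of this section's `query_rephrase_template`, and the `{{query}}` placeholder is filled by string replacement rather than Jinja rendering.

```python
# Shortened rewrite instruction so the sketch runs on its own.
template = """Rewrite the question for search while keeping its meaning and key terms intact.
If no changes are needed, output the current question as is.

User Query: {{query}}
Rewritten Query:
"""


def build_rephrase_messages(history, query):
    # Earlier turns come first so the model can resolve references like "he".
    return list(history) + [("user", template.replace("{{query}}", query))]


history = [
    ("user", "What's the first name of Einstein?"),
    ("assistant", "Albert."),
]
messages = build_rephrase_messages(history, "Where was he born?")
print(messages[-1][1])
```

A chat model receiving these messages has the context it needs to rewrite the follow-up as a standalone question.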

In [11]:
query_rephrase_template = """
Rewrite the question for search while keeping its meaning and key terms intact.
If there is no conversation history, DO NOT change the query.
Use conversation history only if necessary, and avoid extending the query with your own knowledge.
If no changes are needed, output the current question as is.

User Query: {{query}}
Rewritten Query:
"""

### Build the Conversational Agent

In [61]:
from haystack import Pipeline
from haystack.components.agents import Agent
from haystack.components.builders import ChatPromptBuilder, PromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.tools import PipelineTool
from haystack.components.generators.utils import print_streaming_chunk

# Build the RAG Tool
rag_pipeline = Pipeline()

rag_pipeline.add_component(
    "doc_retriever", InMemoryBM25Retriever(document_store=document_store, top_k=3)
)
rag_pipeline.add_component(
    "builder",
    PromptBuilder(
        template="""Supporting documents:
{%- if documents|length > 0 %}
{%- for doc in documents %}
Document [{{ loop.index }}] :
{{ doc.content }}
{% endfor -%}
{%- else %}
No relevant documents found.
{% endif %}""",
        required_variables="*"
    )
)

rag_pipeline.connect("doc_retriever.documents", "builder.documents")

rag_tool = PipelineTool(
    pipeline=rag_pipeline,
    name="rag_tool",
    description="A tool for fetching information on the seven wonders of the ancient world.",
    input_mapping={"query": ["doc_retriever.query"]},
    output_mapping={"builder.prompt": "rag_output"},
)

# Build the Agent
conversational_agent = Pipeline()

conversational_agent.add_component(
    "agent",
    Agent(
        system_prompt="""You are a helpful AI assistant that answers users' questions grounded in a set of supporting documents.
If any questions are asked about the seven wonders always use the `rag_tool` to fetch supporting documents.
Stay concise in your answers.
""",
        chat_generator=OpenAIChatGenerator(),
        tools=[rag_tool],
        streaming_callback=print_streaming_chunk,
    )
)

# components for chat history storage and retrieval
conversational_agent.add_component("message_retriever", ChatMessageRetriever(memory_store))
conversational_agent.add_component("message_writer", ChatMessageWriter(memory_store))

# connections for Agent
conversational_agent.connect("message_retriever.messages", "agent.messages")
conversational_agent.connect("agent.messages", "message_writer")

<haystack.core.pipeline.pipeline.Pipeline object at 0x15e39af30>
🚅 Components
  - agent: Agent
  - message_retriever: ChatMessageRetriever
  - message_writer: ChatMessageWriter
🛤️ Connections
  - agent.messages -> message_writer.messages (list[ChatMessage])
  - message_retriever.messages -> agent.messages (list[ChatMessage])
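
The `input_mapping` and `output_mapping` arguments of the RAG tool translate between the tool's flat arguments and the pipeline's nested inputs and outputs. The functions below are a plain-Python illustration of that translation, assuming it behaves as the tool calls and tool results recorded in this notebook suggest. `map_inputs` and `map_outputs` are hypothetical names and are not part of Haystack.

```python
def map_inputs(tool_args, input_mapping):
    """Route flat tool arguments to 'component.input' targets."""
    data = {}
    for arg_name, targets in input_mapping.items():
        for target in targets:
            component, input_name = target.split(".", 1)
            data.setdefault(component, {})[input_name] = tool_args[arg_name]
    return data


def map_outputs(pipeline_result, output_mapping):
    """Expose selected 'component.output' values under new names."""
    exposed = {}
    for source, new_name in output_mapping.items():
        component, output_name = source.split(".", 1)
        exposed[new_name] = pipeline_result[component][output_name]
    return exposed


run_data = map_inputs(
    {"query": "Who built the Colossus of Rhodes?"},
    {"query": ["doc_retriever.query"]},
)
print(run_data)  # {'doc_retriever': {'query': 'Who built the Colossus of Rhodes?'}}

tool_result = map_outputs(
    {"builder": {"prompt": "Supporting documents: ..."}},
    {"builder.prompt": "rag_output"},
)
print(tool_result)  # {'rag_output': 'Supporting documents: ...'}
```

With this mapping the agent only has to supply a single `query` string, which is where it can place a rephrased, self-contained version of the user's question.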

### Let's have a conversation 😀

Now, run the pipeline with the relevant inputs.

Here are some example queries and follow ups you can try:

* *What does Rhodes Statue look like?* - *Who built it?* - *Did he destroy it?*
* *Where is Gardens of Babylon?* - *When was it built?*

In [62]:
chat_history_id = "user_123_session_4"

while True:
    question = input("Enter your question or Q to exit.\n🧑 ")
    if question == "Q":
        break

    conversational_agent.run(
        data={
            "message_retriever": {"current_messages": [ChatMessage.from_user(question)], "chat_history_id": chat_history_id},
            "message_writer": {"chat_history_id": chat_history_id}
        }
    )
    # No need to print the output since we are streaming it

Enter your question or Q to exit.
🧑  What does Rhodes Statue look like?


[TOOL CALL]
Tool: rag_tool 
Arguments: {"query":"Colossus of Rhodes appearance description Helios statue pose torch crown straddling harbour ancient sources"}

[TOOL RESULT]
{'rag_output': 'Supporting documents:\nDocument [1] :\nAlso, the fallen statue would have blocked the harbour, and since the ancient Rhodians did not have the ability to remove the fallen statue from the harbour, it would not have remained visible on land for the next 800 years, as discussed above. Even neglecting these objections, the statue was made of bronze, and engineering analyses indicate that it could not have been built with its legs apart without collapsing under its own weight.[29]\nMany researchers have considered alternative positions for the statue which would have made it more feasible for actual construction by the ancients.[29][30] There is also no evidence that the statue held a torch aloft; the records simply say that after completion, the Rhodians kindled the "torch of freedom". A relief in a ne

Enter your question or Q to exit.
🧑  Who built it?


[TOOL CALL]
Tool: rag_tool 
Arguments: {"query":"Who built the Colossus of Rhodes Chares of Lindos builder 280 BC"}

[TOOL RESULT]
{'rag_output': "Supporting documents:\nDocument [1] :\nThe Colossus of Rhodes (Ancient Greek: ὁ Κολοσσὸς Ῥόδιος, romanized:\xa0ho Kolossòs Rhódios Greek: Κολοσσός της Ρόδου, romanized:\xa0Kolossós tes Rhódou)[a] was a statue of the Greek sun-god Helios, erected in the city of Rhodes, on the Greek island of the same name, by Chares of Lindos in 280\xa0BC. One of the Seven Wonders of the Ancient World, it was constructed to celebrate the successful defence of Rhodes city against an attack by Demetrius Poliorcetes, who had besieged it for a year with a large army and navy.\nAccording to most contemporary descriptions, the Colossus stood approximately 70 cubits, or 33 metres (108 feet) high – approximately the height of the modern Statue of Liberty from feet to crown – making it the tallest statue in the ancient world.[

Enter your question or Q to exit.
🧑  Q


✅ Notice that this time, with the help of query rephrasing, we've built a conversational RAG pipeline that can handle follow-up queries and retrieve the relevant documents.