# Chatbot with Memory using LangChain
* Notebook by Adam Lang
* Date: 1/23/2025

# Overview
* In this notebook I demonstrate how to design and build an LLM-powered chatbot with conversational memory using LangChain.
* "Memory" lets a chatbot store the messages exchanged with a user and pass them back to the model, so that it can refer to earlier turns of the conversation.

* This will NOT include:
    * Conversational RAG: a chatbot over an external data source.
    * Agents: chatbots that can take actions.

* **This notebook can serve as a basic "template" for building both simple and more advanced chatbots.**

# Install Dependencies
* You can put this in a `requirements.txt` file or run the cell below.

In [1]:
%%capture
!pip install langchain langchain_community langchain_groq langsmith langchain_core langchain-chroma langchain-huggingface

In [2]:
%%capture
!pip install --upgrade transformers
!pip install --upgrade sentence-transformers

# Setup Environment Variables

In [3]:
import os
from getpass import getpass

GROQ_API_KEY = getpass("Enter your GROQ API Key: ")

Enter your GROQ API Key: ··········


In [4]:
## set GROQ environment
os.environ["GROQ_API_KEY"] = GROQ_API_KEY

# Setup GROQ Open Source LLM

In [5]:
from langchain_groq import ChatGroq

## init GROQ model
model = ChatGroq(model="gemma2-9b-it", groq_api_key=GROQ_API_KEY)
model

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x7eae8697ef10>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x7eae866c9c90>, model_name='gemma2-9b-it', model_kwargs={}, groq_api_key=SecretStr('**********'))

# Experiment 1 - without message history
* This experiment shows what happens when no built-in conversational memory is used: each call to the model is independent, so any earlier turns have to be passed in by hand.


In [6]:
from langchain_core.messages import HumanMessage

## invoke model human message
model.invoke([HumanMessage(content="Hi, My name is Tom Brady and I'm the greatest QB of all time!")])


AIMessage(content='Hey Tom! \n\nIt\'s great to meet you.  \n\nWhile many people consider you one of the greatest QBs of all time, the "greatest" title is always subjective and up for debate.  What\'s undeniable is your incredible career, the seven Super Bowl wins, and the countless records you\'ve broken. \n\nYou\'ve definitely earned a place in football history!  \n\nWhat are you working on these days?  \n\n', additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 96, 'prompt_tokens': 27, 'total_tokens': 123, 'completion_time': 0.174545455, 'prompt_time': 0.00014059, 'queue_time': 0.01824804, 'total_time': 0.174686045}, 'model_name': 'gemma2-9b-it', 'system_fingerprint': 'fp_10c08bf97d', 'finish_reason': 'stop', 'logprobs': None}, id='run-86bfb544-17be-4ece-8a03-b6ae3ba6ec37-0', usage_metadata={'input_tokens': 27, 'output_tokens': 96, 'total_tokens': 123})

In [7]:
from langchain_core.messages import AIMessage

## invoke model with AIMessage
model.invoke(
    [
        HumanMessage(content="Hi, My name is Tom Brady and I'm the greatest QB of all time!"),
        AIMessage(content="'That\'s a bold statement, Tom!  \n\nYou\'ve certainly built a strong case for yourself with all your Super Bowl wins, records, and accolades.  The 'GOAT' title is always a hot debate among football fans, though.  \n\nWho do you think your biggest competition for that title is?  \n\n'"),
        HumanMessage(content="Hey what is my name and what do I do?"),
    ]
)

AIMessage(content="You're Tom Brady, and you're widely considered one of the greatest quarterbacks in NFL history!  \n\nYou've won seven Super Bowls, a record for any player, and hold numerous other passing records.  You're known for your incredible accuracy, longevity, and competitive drive.  \n\n\n\n\n", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 65, 'prompt_tokens': 116, 'total_tokens': 181, 'completion_time': 0.118181818, 'prompt_time': 0.003655929, 'queue_time': 0.02196207, 'total_time': 0.121837747}, 'model_name': 'gemma2-9b-it', 'system_fingerprint': 'fp_10c08bf97d', 'finish_reason': 'stop', 'logprobs': None}, id='run-17763519-a7ee-46e8-8f12-145147202586-0', usage_metadata={'input_tokens': 116, 'output_tokens': 65, 'total_tokens': 181})

Summary
* This is the "manual way" to introduce conversational memory into a chatbot: the earlier messages are hard-coded into each call.
* Next, we will use LangChain components to do this without manual hard-coding.
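
To make the mechanism explicit without calling an API, here is a minimal sketch in plain Python. `fake_model` is a hypothetical stand-in for a stateless chat model (it is not part of LangChain or Groq): it only sees the messages passed in on that call, so the caller has to resend the earlier turns.

```python
import re

def fake_model(messages):
    """Hypothetical stateless 'model': answers only from the messages it receives."""
    transcript = " ".join(text for role, text in messages if role == "human")
    match = re.search(r"[Mm]y name is (\w+)", transcript)
    return f"Your name is {match.group(1)}." if match else "I don't know your name."

# Call 1: the question alone, so the model has nothing to go on
no_history = fake_model([("human", "What is my name?")])

# Call 2: the caller manually resends the earlier turns along with the new question
history = [
    ("human", "Hi, my name is Tom."),
    ("ai", "Nice to meet you, Tom!"),
]
with_history = fake_model(history + [("human", "What is my name?")])

print(no_history)    # I don't know your name.
print(with_history)  # Your name is Tom.
```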

# Message History
* Here we demonstrate how to use LangChain's `RunnableWithMessageHistory` class to wrap a model and make it "stateful".
* The wrapper keeps track of the inputs and outputs of the model and stores them in a datastore (here, an in-memory dict keyed by `session_id`) as the user interacts with the chatbot.
* Each later interaction loads the stored messages and passes them into the chain as part of the input.
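
The load, call, save cycle described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not LangChain's implementation: `HistoryWrapper` and `count_turns` are hypothetical names, and the "model" is a function that reports how many human messages it can see.

```python
from collections import defaultdict

class HistoryWrapper:
    """Toy history wrapper: loads, extends, and saves messages per session."""

    def __init__(self, model_fn):
        self.model_fn = model_fn
        self.store = defaultdict(list)   # session_id -> list of (role, text)

    def invoke(self, text, session_id):
        history = self.store[session_id]
        history.append(("human", text))        # record the new input
        reply = self.model_fn(list(history))   # the model sees the whole session so far
        history.append(("ai", reply))          # record the output
        return reply

def count_turns(messages):
    """Hypothetical model: reports how many human messages it can see."""
    humans = sum(1 for role, _ in messages if role == "human")
    return f"I can see {humans} human message(s)."

chat = HistoryWrapper(count_turns)
first = chat.invoke("hello", session_id="chat1")
second = chat.invoke("still there?", session_id="chat1")
other = chat.invoke("hello", session_id="chat2")   # a new session starts empty

print(first)   # I can see 1 human message(s).
print(second)  # I can see 2 human message(s).
print(other)   # I can see 1 human message(s).
```

Keying the store by session ID is what keeps separate conversations from leaking into each other.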

In [8]:
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

## store session history in dict
store = {}

## return (and create if needed) the history for a session
def get_session_history(session_id: str) -> BaseChatMessageHistory:
    """Return the chat history for `session_id`, creating an empty one on first use."""

    ## first time we see this session_id: start an empty history
    if session_id not in store:
        store[session_id] = ChatMessageHistory()

    return store[session_id]


## wrap the model so each call loads and saves history via get_session_history
with_message_history = RunnableWithMessageHistory(model, get_session_history)


In [9]:
## config for session_id
config={"configurable":{"session_id":"chat1"}}

## Create conversation with history


In [10]:
## invoke a conversation with history
response=with_message_history.invoke(
    [HumanMessage(content="Hi my name is Joe and I am a Data Scientist.")],
    config=config,
)

In [11]:
## invoke response
response.content

"Hello Joe, it's nice to meet you!\n\nAs a large language model, I'm always interested in learning more about people in different fields. What kind of data science work do you do?\n"

In [12]:
## continue conversation
response=with_message_history.invoke(
    [HumanMessage(content="What is my job?")],
    config=config,
)
response.content

"You told me!  You said your name is Joe and you are a Data Scientist.  \n\nIs there anything else you'd like to tell me about your work as a Data Scientist? 😊  Perhaps what kind of projects you're working on or what tools you use?  I'm eager to learn more! \n"

Experiment - change config to a different session_id

In [13]:
### lets try changing the config --> session_id
config1={"configurable":{"session_id":"chat2"}} ## changed session_id!!
response=with_message_history.invoke(
    [HumanMessage(content="What is my name?")],
    config=config1,
)
response.content

"As an AI, I have no memory of past conversations and do not know your name. If you'd like to tell me, I'm happy to learn it! 😊\n"

Summary
* If we change the `session_id`, the LLM has no memory of our original conversation, because each session has its own separate history.

In [14]:
## change the humanmessage content
response=with_message_history.invoke(
    [HumanMessage(content="Hey My name is Jay")],
    config=config1,
)
response.content

"Hi Jay! It's nice to meet you.  What can I do for you today? 😊  \n"

In [15]:
## now lets test the memory
response=with_message_history.invoke(
    [HumanMessage(content="What is my name?")],
    config=config1,
)
response.content

'Your name is Jay! 😊  I remembered from our last conversation. \n\nIs there anything else I can help you with?\n'

# Prompt Templates

* Prompt templates turn raw user input into the format an LLM expects.
* So far, the raw user input has been a single message passed straight to the LLM.

* Now we will increase the complexity:
  * First we add a `SystemMessage` with custom instructions.
    * We will still take messages as input.
  * Then we add more input variables besides the messages.
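
As a rough mental model, a chat prompt template fills in the variables of the system instruction and then splices the conversation messages in where the placeholder sits. The sketch below shows that idea in plain Python. `render_chat_prompt` is a hypothetical helper, not a LangChain function, and messages are simplified to `(role, text)` tuples.

```python
def render_chat_prompt(system_template, messages, **variables):
    """Fill the system template, then append the conversation messages after it."""
    system_text = system_template.format(**variables)
    return [("system", system_text)] + list(messages)

rendered = render_chat_prompt(
    "You are a helpful assistant. Answer in {language}.",
    [("human", "Hi, my name is Tom.")],
    language="French",
)

for role, text in rendered:
    print(f"{role}: {text}")
```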


In [16]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.messages import SystemMessage
from langchain_core.prompts import MessagesPlaceholder

## create prompt
prompt = ChatPromptTemplate.from_messages(
    [
      SystemMessage(content="You are a helpful chatbot assistant. Answer all questions to the best of your ability."),
      MessagesPlaceholder(variable_name="messages")
    ]
)

## prompt chain
chain = prompt | model

In [17]:
## call/invoke chain
chain.invoke({"messages":[HumanMessage(content="Hi my name is Tom.")]})

AIMessage(content="Hello Tom, it's nice to meet you!  \n\nHow can I help you today? 😊  \n\n", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 26, 'prompt_tokens': 32, 'total_tokens': 58, 'completion_time': 0.047272727, 'prompt_time': 0.000337379, 'queue_time': 0.02044788, 'total_time': 0.047610106}, 'model_name': 'gemma2-9b-it', 'system_fingerprint': 'fp_10c08bf97d', 'finish_reason': 'stop', 'logprobs': None}, id='run-54719f62-9937-46a6-bdc4-53fab9d35147-0', usage_metadata={'input_tokens': 32, 'output_tokens': 26, 'total_tokens': 58})

In [18]:
## invoke with history
with_message_history=RunnableWithMessageHistory(chain, get_session_history)


In [19]:
## create session_id config
config = {"configurable": {"session_id":"chat3"}}
response=with_message_history.invoke(
    [HumanMessage(content="Hi my name is Tom.")],
    config=config,
)
response.content

"Hello Tom! It's nice to meet you. How can I help you today? 😊  \n\n"

In [20]:
## add more complexity -- setup prompt template
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Answer all questions to the best of your ability in {language}."
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)
## setup chain
chain = prompt | model

In [21]:
## invoke response
response = chain.invoke({"messages":[HumanMessage(content="Hi, my name is Tom")], "language":"French"})
response.content

"Bonjour Tom, enchanté de te rencontrer ! Comment puis-je t'aider aujourd'hui ? \n"

## Wrap the more complex chain in `RunnableWithMessageHistory`
* Because the input now has multiple keys, we need to specify which key holds the messages to save in the chat history.
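
A small sketch of why the key matters, in plain Python. `save_turn` is a hypothetical helper written for illustration only: when the input is a dict, only the value under the messages key belongs in the history, while other variables such as `language` do not.

```python
def save_turn(store, session_id, inputs, output, input_messages_key=None):
    """Append the new input messages and the model output to the session history."""
    # With a key, pull the messages out of the dict; without one, the input is the messages
    new_messages = inputs[input_messages_key] if input_messages_key else inputs
    store.setdefault(session_id, []).extend(list(new_messages) + [output])

store = {}
save_turn(
    store,
    "chat4",
    {"messages": [("human", "Hi, I am Tom")], "language": "French"},
    ("ai", "Bonjour Tom"),
    input_messages_key="messages",
)
print(store["chat4"])
```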

In [22]:
## wrap chain in MessageHistory
with_message_history=RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="messages",
)

In [23]:
## set config for session_id
config={"configurable":{"session_id":"chat4"}}
response=with_message_history.invoke(
    {'messages': [HumanMessage(content="Hi, I am Tom")], "language": "French"},
    config=config,
)
response.content

"Bonjour Tom, \n\nEnchanté de faire ta connaissance ! Comment puis-je t'aider aujourd'hui ? \n\n"

In [24]:
## continue conversation with memory
response=with_message_history.invoke(
    {'messages': [HumanMessage(content="What is my name?")], "language": "Chinese"},
    config=config,
)
response.content

'Ton nom est Tom. 😊 \n'

# Manage Conversation History
* An important concept to understand when building chatbots is how to manage the conversation history.
* If the history is left unmanaged, the list of messages grows without bound and can eventually **overflow the context window of the LLM.**
  * Even below that limit, long contexts can hurt quality: in the "lost in the middle" effect, models tend to use information buried in the middle of a long context less reliably (the "needle in a haystack" problem), so we need to manage the context window as the conversation grows.

* **Thus, it is very important to add a step that limits the size of the messages we pass in.**

## What is `trim_messages`
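* `trim_messages` reduces the number of messages we send to the model.
* It lets us cap the number of tokens (`max_tokens`), choose which end of the conversation to keep (`strategy`), always keep the system message (`include_system`), and more.
* In the run below, the whole sample conversation fits within `max_tokens=65` under this model's token counter, so the trimmer returns every message unchanged.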
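
To see what trimming does when the budget is actually exceeded, here is a minimal plain-Python sketch of a "keep the last messages" strategy. It is one plausible reading of the behavior, not LangChain's implementation. `trim_last` and `count_tokens` are hypothetical names, and the token counter is an assumption (one token per whitespace-separated word) rather than a real tokenizer.

```python
def count_tokens(message):
    """Assumed token counter: one token per whitespace-separated word."""
    _, text = message
    return len(text.split())

def trim_last(messages, max_tokens):
    """Keep the system message plus the most recent messages that fit the budget."""
    system = [m for m in messages[:1] if m[0] == "system"]
    rest = messages[len(system):]
    budget = max_tokens - sum(count_tokens(m) for m in system)

    kept = []
    for message in reversed(rest):           # walk from newest to oldest
        cost = count_tokens(message)
        if cost > budget:
            break                            # whole messages only, no partial cuts
        kept.insert(0, message)
        budget -= cost

    while kept and kept[0][0] != "human":    # the kept window should open on a human turn
        kept.pop(0)
    return system + kept

messages = [
    ("system", "you're a good assistant"),
    ("human", "hey! I am Bob."),
    ("ai", "hi!"),
    ("human", "what is 2 + 2"),
    ("ai", "4"),
    ("human", "having fun?"),
    ("ai", "yes!"),
]
trimmed = trim_last(messages, max_tokens=10)
print(trimmed)
# [('system', "you're a good assistant"), ('human', 'having fun?'), ('ai', 'yes!')]
```

With a budget of 10 words, the system message takes 4, the last three messages fit in the remainder, and the leading `ai` message is then dropped so the window starts on a human turn. With a generous budget the list comes back unchanged.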

In [25]:
from langchain_core.messages import SystemMessage, trim_messages

## init the trimmer
trimmer=trim_messages(
    max_tokens=65,
    strategy="last", # keep the most recent messages
    token_counter=model,
    include_system=True,
    allow_partial=False,
    start_on="human", # trimmed history must start on a human message
)

## set of messages
messages = [
    SystemMessage(content="you're a good assistant"),
    HumanMessage(content="hey! I am Bob."),
    AIMessage(content="hi!"),
    HumanMessage(content="I like chocolate ice cream"),
    AIMessage(content="nice"),
    HumanMessage(content="what is 2 + 2"),
    AIMessage(content="4"),
    HumanMessage(content="thanks"),
    AIMessage(content="no problem!"),
    HumanMessage(content="having fun?"),
    AIMessage(content="yes!"),
]
## invoke trimmer
trimmer.invoke(messages)

[SystemMessage(content="you're a good assistant", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='hey! I am Bob.', additional_kwargs={}, response_metadata={}),
 AIMessage(content='hi!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='I like chocolate ice cream', additional_kwargs={}, response_metadata={}),
 AIMessage(content='nice', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='what is 2 + 2', additional_kwargs={}, response_metadata={}),
 AIMessage(content='4', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='thanks', additional_kwargs={}, response_metadata={}),
 AIMessage(content='no problem!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='having fun?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='yes!', additional_kwargs={}, response_metadata={})]

In [26]:
from operator import itemgetter
from langchain_core.runnables import RunnablePassthrough

## create chain
chain = (
    RunnablePassthrough.assign(messages=itemgetter("messages")|trimmer) #chain trimmer
    | prompt
    | model
)
## invoke chain
response = chain.invoke(
    {
    "messages":messages + [HumanMessage(content="What math problem did i ask about?")],
    "language":"English",
    }
)
## invoke response
response.content

'You asked about 2 + 2.  😊  \n'

## Wrap in Message History

In [27]:
## wrapping it in message history
with_message_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="messages",
)
## set session_id config
config={"configurable":{"session_id":"chat5"}}

In [28]:
## add more to the conversation
response = with_message_history.invoke(
    {
        "messages": messages + [HumanMessage(content="what is my name?")],
        "language": "English",
    },
    ## set session_id config
    config=config,
)
response.content

'Your name is Bob. 😊  \n\n\n\nHow can I help you further?\n'

In [29]:
## add more to the conversation
response = with_message_history.invoke(
    {
        "messages": messages + [HumanMessage(content="what dessert did i mention?")],
        "language": "English",
    },
    ## set session_id config
    config=config,
)
response.content

'You mentioned chocolate ice cream!  🍨  😊'

# Memory with Vector Stores and Retrievers
* Here we use LangChain's vector store and retriever abstractions.
* These abstractions support retrieving data from vector databases and other sources (e.g. graph DBs) for integration with LLM workflows.
* They matter for applications that fetch data to be reasoned over during model inference, the most common case being **Retrieval Augmented Generation (RAG)**.

## Below we will implement
1. Documents
2. Vector Stores
3. Retrievers

### Documents
* LangChain implements a Document abstraction which is intended to represent a unit of text and its associated metadata. It has 2 main attributes:

1. **page_content**: a string representing the content;

2. **metadata**: a dict containing arbitrary metadata.

The metadata attribute can capture information about the:
  * **source of the document**
  * **relationship to other documents**
  * **other useful information (e.g. keywords, topics, custom metadata)**

Note that an individual Document object often represents a CHUNK of a larger document.
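
To illustrate the chunking idea, here is a small plain-Python sketch. `Doc` and `chunk_text` are hypothetical stand-ins (not LangChain classes), and fixed-size character chunks are an assumption chosen for simplicity; real splitters usually respect sentence or token boundaries.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Hypothetical stand-in for a document: text plus arbitrary metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def chunk_text(text, source, chunk_size):
    """Split text into fixed-size character chunks, recording where each one came from."""
    return [
        Doc(page_content=text[start:start + chunk_size],
            metadata={"source": source, "start": start})
        for start in range(0, len(text), chunk_size)
    ]

text = "Dogs are loyal. Cats are independent."
chunks = chunk_text(text, source="pets-doc", chunk_size=16)
for chunk in chunks:
    print(repr(chunk.page_content), chunk.metadata)
```

Each chunk keeps the `source` of the parent text in its metadata, which is how a retrieved chunk can be traced back to its document.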

Here we will generate **sample documents:**

In [30]:
from langchain_core.documents import Document ## document class

## sample documents
documents = [
    Document(
        page_content="Dogs are great companions, known for their loyalty and friendliness.",
        metadata={"source": "mammal-pets-doc"},
    ),
    Document(
        page_content="Cats are independent pets that often enjoy their own space.",
        metadata={"source": "mammal-pets-doc"},
    ),
    Document(
        page_content="Goldfish are popular pets for beginners, requiring relatively simple care.",
        metadata={"source": "fish-pets-doc"},
    ),
    Document(
        page_content="Parrots are intelligent birds capable of mimicking human speech.",
        metadata={"source": "birds-pets-doc"},
    ),
    Document(
        page_content="Rabbits are social animals that need plenty of space to hop around.",
        metadata={"source": "mammal-pets-doc"},
    )
]

In [31]:
## view documents
documents

[Document(metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.'),
 Document(metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space.'),
 Document(metadata={'source': 'fish-pets-doc'}, page_content='Goldfish are popular pets for beginners, requiring relatively simple care.'),
 Document(metadata={'source': 'birds-pets-doc'}, page_content='Parrots are intelligent birds capable of mimicking human speech.'),
 Document(metadata={'source': 'mammal-pets-doc'}, page_content='Rabbits are social animals that need plenty of space to hop around.')]

## VectorStores

### Set up a different LLM on Groq
* The Groq API key was already set above.

In [32]:
import os
from langchain_groq import ChatGroq

## init new llm from GROQ --> llama3
llm=ChatGroq(groq_api_key=GROQ_API_KEY, model="Llama3-8b-8192")
llm

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x7eae8659ce90>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x7eadbd971d90>, model_name='Llama3-8b-8192', model_kwargs={}, groq_api_key=SecretStr('**********'))

### Create Embeddings
* We will use a SentenceTransformer model: sentence-transformers/all-MiniLM-L6-v2
* Model Card: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

In [33]:
from langchain_huggingface.embeddings import HuggingFaceEmbeddings
from sentence_transformers import SentenceTransformer, util

## create embeddings
embeddings=HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")


### Create Vector Store

In [34]:
from langchain_chroma import Chroma ## vector DB

## create vector store
vectorstore = Chroma.from_documents(documents, embedding=embeddings)
vectorstore

<langchain_chroma.vectorstores.Chroma at 0x7ead5f75f490>

In [36]:
## 1. similarity search in vectorstore
vectorstore.similarity_search("bird")

[Document(id='bcd799a2-0530-48ca-b1a1-1649873317ae', metadata={'source': 'birds-pets-doc'}, page_content='Parrots are intelligent birds capable of mimicking human speech.'),
 Document(id='b323bac9-6751-434e-8f8e-5be29a99ddea', metadata={'source': 'mammal-pets-doc'}, page_content='Rabbits are social animals that need plenty of space to hop around.'),
 Document(id='c559eba9-f874-4a81-8c64-a9cb7f63dd69', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space.'),
 Document(id='75b15e0b-ce7d-474a-bc74-d74c963188e5', metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.')]

In [37]:
### 2. async queries
await vectorstore.asimilarity_search("cat")

[Document(id='c559eba9-f874-4a81-8c64-a9cb7f63dd69', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space.'),
 Document(id='75b15e0b-ce7d-474a-bc74-d74c963188e5', metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.'),
 Document(id='b323bac9-6751-434e-8f8e-5be29a99ddea', metadata={'source': 'mammal-pets-doc'}, page_content='Rabbits are social animals that need plenty of space to hop around.'),
 Document(id='bcd799a2-0530-48ca-b1a1-1649873317ae', metadata={'source': 'birds-pets-doc'}, page_content='Parrots are intelligent birds capable of mimicking human speech.')]

In [38]:
### 3. similarity search with score
vectorstore.similarity_search_with_score("cat")

[(Document(id='c559eba9-f874-4a81-8c64-a9cb7f63dd69', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space.'),
  0.935105562210083),
 (Document(id='75b15e0b-ce7d-474a-bc74-d74c963188e5', metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.'),
  1.5740898847579956),
 (Document(id='b323bac9-6751-434e-8f8e-5be29a99ddea', metadata={'source': 'mammal-pets-doc'}, page_content='Rabbits are social animals that need plenty of space to hop around.'),
  1.5956902503967285),
 (Document(id='bcd799a2-0530-48ca-b1a1-1649873317ae', metadata={'source': 'birds-pets-doc'}, page_content='Parrots are intelligent birds capable of mimicking human speech.'),
  1.665792465209961)]
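
Note that the scores above behave like distances rather than similarities: the best match ("Cats ...") has the smallest value. To my knowledge Chroma's default metric is squared L2 distance, but treat that as an assumption and check your collection's settings. The sketch below uses made-up 2-D unit vectors (not real embeddings) to show how squared L2 distance relates to cosine similarity for normalized vectors.

```python
import math

def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

query = (1.0, 0.0)
close = (0.8, 0.6)   # unit vector pointing nearly the same way as the query
far = (0.0, 1.0)     # unit vector orthogonal to the query

d_close = squared_l2(query, close)
d_far = squared_l2(query, far)

# Lower distance means more similar; for unit vectors, distance = 2 - 2 * cosine
print(d_close < d_far)
print(math.isclose(d_close, 2 - 2 * cosine(query, close)))
```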

## Retrievers
* LangChain VectorStore objects DO NOT subclass Runnable.
  * Therefore you CANNOT immediately integrate them into the LangChain expression chains (LCEL).

* LangChain Retrievers are RUNNABLES.
  * This means they implement a standard set of methods including:
    * synchronous invoke
    * asynchronous invoke
    * batch operations
  * They are designed to be incorporated with LCEL chains.

* We can create a simple version of this ourselves WITHOUT having to subclass Retriever.
  * Once we choose which method to use for retrieving documents, we can easily wrap it in a runnable.
  * Below we will build a runnable around the `similarity_search` method.
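
Conceptually, wrapping a function in a runnable gives it a uniform `invoke`/`batch` interface and lets us pre-bind arguments such as `k`. The toy class below sketches that idea in plain Python. `SimpleRunnable` and `keyword_search` are hypothetical names for illustration, and a keyword match stands in for vector similarity.

```python
class SimpleRunnable:
    """Toy runnable: wraps a function and exposes invoke/batch with pre-bound kwargs."""

    def __init__(self, func, **bound):
        self.func = func
        self.bound = bound

    def bind(self, **kwargs):
        # Return a new runnable with the extra keyword arguments fixed in place
        return SimpleRunnable(self.func, **{**self.bound, **kwargs})

    def invoke(self, value):
        return self.func(value, **self.bound)

    def batch(self, values):
        return [self.invoke(v) for v in values]

PETS = ["Cats enjoy their own space.", "Dogs are loyal companions.", "Parrots mimic speech."]

def keyword_search(query, k=4):
    """Hypothetical search: return up to k sentences containing the query word."""
    hits = [s for s in PETS if query.lower() in s.lower()]
    return hits[:k]

toy_retriever = SimpleRunnable(keyword_search).bind(k=1)
results = toy_retriever.batch(["cat", "dog"])
print(results)
# [['Cats enjoy their own space.'], ['Dogs are loyal companions.']]
```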

In [39]:
from typing import List

from langchain_core.documents import Document
from langchain_core.runnables import RunnableLambda ##Wrapping a callable in a RunnableLambda makes the callable usable within either a sync or async context.

## init retriever
retriever=RunnableLambda(vectorstore.similarity_search).bind(k=1)
retriever.batch(["cat","dog"])

[[Document(id='c559eba9-f874-4a81-8c64-a9cb7f63dd69', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space.')],
 [Document(id='75b15e0b-ce7d-474a-bc74-d74c963188e5', metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.')]]

Note:
* Vector stores implement an `as_retriever` method that generates a retriever, specifically a `VectorStoreRetriever`.
* These retrievers have `search_type` and `search_kwargs` attributes that identify which method of the underlying vector store to call and how to parameterize it.
  * For example, we can replicate what we did above with:

In [40]:
## using vectorstore with `as_retriever`
retriever=vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k":1}
)
retriever.batch(["cat","dog"])

[[Document(id='c559eba9-f874-4a81-8c64-a9cb7f63dd69', metadata={'source': 'mammal-pets-doc'}, page_content='Cats are independent pets that often enjoy their own space.')],
 [Document(id='75b15e0b-ce7d-474a-bc74-d74c963188e5', metadata={'source': 'mammal-pets-doc'}, page_content='Dogs are great companions, known for their loyalty and friendliness.')]]

# Put it all together: Retriever with Chain - "Basic RAG"
* Below is a basic RAG chain.

In [41]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

## message
message = """
Answer this question using the provided context only.

{question}

Context:
{context}
"""
prompt=ChatPromptTemplate.from_messages([("human", message)])

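## create RAG chain: the dict sends the question to the retriever ("context")
## and passes it through unchanged ("question") before filling the prompt
rag_chain = {"context": retriever, "question": RunnablePassthrough()} | prompt | llm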

## invoke chain
response=rag_chain.invoke("Tell me about cats")
print(response.content)

According to the provided context, cats are independent pets that often enjoy their own space.
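
The data flow of the chain above can be traced in plain Python. This is a sketch under stated assumptions: `retrieve`, `rag_inputs`, and `build_prompt` are hypothetical helpers, word overlap stands in for vector similarity, and the final LLM call is left out.

```python
def retrieve(question):
    """Hypothetical retriever: pick the stored sentence sharing the most words with the question."""
    corpus = [
        "Cats are independent pets that often enjoy their own space.",
        "Dogs are great companions, known for their loyalty and friendliness.",
    ]
    words = set(question.lower().split())
    return max(corpus, key=lambda doc: len(words & set(doc.lower().split())))

def rag_inputs(question):
    # Same shape as the dict step: the retriever fills "context", the question passes through
    return {"context": retrieve(question), "question": question}

def build_prompt(inputs):
    """Fill the prompt template with the question and the retrieved context."""
    return (
        "Answer this question using the provided context only.\n\n"
        f"{inputs['question']}\n\nContext:\n{inputs['context']}"
    )

prompt_text = build_prompt(rag_inputs("Tell me about cats"))
print(prompt_text)
```

The printed text is what the model would receive, which makes it easy to check that the right context was retrieved before spending an API call.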
