### Managing the Conversation History
One important concept to understand when building chatbots is how to manage conversation history. If left unmanaged, the list of messages will grow unbounded and potentially overflow the context window of the LLM. Therefore, it is important to add a step that limits the size of the messages you are passing in.
LangChain provides a 'trim_messages' helper to reduce how many messages we send to the model. The trimmer lets us specify how many tokens to keep, along with other parameters such as whether to always keep the system message and whether to allow partial messages.
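
To make these parameters concrete, here is a minimal pure-Python sketch of the "keep the last N tokens" idea. It is not LangChain's implementation: `trim_last`, the `(role, text)` tuples, and the whitespace-based `count_tokens` are hypothetical stand-ins chosen for illustration, and a real tokenizer will count tokens differently.

```python
# Illustrative sketch only; not LangChain's implementation.

def count_tokens(text: str) -> int:
    """Toy token counter (assumption): one token per whitespace-separated word."""
    return len(text.split())


def trim_last(messages, max_tokens, include_system=True, start_on="human"):
    """Keep the most recent messages that fit in max_tokens.

    messages is a list of (role, text) tuples, oldest first.
    """
    system = None
    rest = list(messages)
    # Optionally reserve room for a leading system message.
    if include_system and rest and rest[0][0] == "system":
        system = rest.pop(0)
    budget = max_tokens - (count_tokens(system[1]) if system else 0)

    # Walk backwards from the newest message until the budget is spent.
    kept = []
    for role, text in reversed(rest):
        cost = count_tokens(text)
        if cost > budget:
            break
        kept.append((role, text))
        budget -= cost
    kept.reverse()

    # Drop leading messages until the window starts on the requested role.
    while kept and start_on and kept[0][0] != start_on:
        kept.pop(0)

    return ([system] if system else []) + kept


history = [
    ("system", "you're a good assistant"),
    ("human", "hi! I'm bob"),
    ("ai", "hi!"),
    ("human", "I like vanilla ice cream"),
    ("ai", "nice"),
    ("human", "whats 2 + 2"),
    ("ai", "4"),
]

trimmed = trim_last(history, max_tokens=14)
for role, text in trimmed:
    print(f"{role}: {text}")
```

With this toy counter and a budget of 14, the system message is kept and only the final human/AI pair survives; the older turns are dropped, and the window is forced to begin on a human message.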

- Importing environment variables

In [90]:
import os
from dotenv import load_dotenv
load_dotenv()

groq_api_key = os.getenv('GROQ_API_KEY')
groq_api_key is not None   ### Confirm the key loaded without echoing the secret into the cell output

True
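
Because anything echoed in a notebook cell is saved with the notebook, it can help to validate the key without displaying it. The sketch below shows one way to do that with the standard library only; `require_env`, `mask`, and the `DEMO_API_KEY` variable are hypothetical names introduced for illustration, and the demo value is a placeholder rather than a real key.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or shell")
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Show only the first few characters so the full key never lands in cell output."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Demo value (placeholder, not a real key) so the sketch runs on its own.
os.environ.setdefault("DEMO_API_KEY", "gsk_example_placeholder")
print(mask(require_env("DEMO_API_KEY")))
```

In the notebook itself, the same pattern would be applied to 'GROQ_API_KEY' after `load_dotenv()` has run.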

In [91]:
from langchain_groq import ChatGroq

In [92]:
model = ChatGroq(model = 'gemma2-9b-it' , groq_api_key = groq_api_key)
model

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x00000258373FE690>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x0000025838444A90>, model_name='gemma2-9b-it', model_kwargs={}, groq_api_key=SecretStr('**********'))

In [93]:
from langchain_core.messages import SystemMessage, trim_messages

In [94]:
trimmer = trim_messages(
    max_tokens=45,         ### Max token count of the trimmed messages
    # strategy="last",     ### Strategy for trimming. "first": keep the first <= n_count tokens of the messages. "last": keep the last <= n_count tokens of the messages. Default is "last".
    token_counter=model,   ### Function or LLM used for counting tokens
    include_system=True,   ### Whether to keep the SystemMessage if there is one at index 0. Should only be specified if strategy="last". Default is False.
    allow_partial=False,   ### Whether to split a message if only part of it can be included. With strategy="last" the last partial contents are kept; with strategy="first" the first partial contents are kept. Default is False.
    start_on="human",      ### The message type to start on. Should only be specified if strategy="last". Every message before the first occurrence of this type is ignored.
)

In [95]:
trimmer

RunnableLambda(...)

In [96]:
from langchain_core.messages import SystemMessage , HumanMessage , AIMessage

messages = [
    SystemMessage(content="you're a good assistant"),
    HumanMessage(content="hi! I'm bob"),
    AIMessage(content="hi!"),
    HumanMessage(content="I like vanilla ice cream"),
    AIMessage(content="nice"),
    HumanMessage(content="whats 2 + 2"),
    AIMessage(content="4"),
    HumanMessage(content="thanks"),
    AIMessage(content="no problem!"),
    HumanMessage(content="having fun?"),
    AIMessage(content="yes!"),
]

In [97]:
### To run the cell below, the transformers package must be installed (pip install transformers, or list it in requirements.txt)

### Because max_tokens is small, the oldest human/AI pair ("hi! I'm bob" / "hi!") is dropped from the top of the list

trimmer.invoke(messages)

[SystemMessage(content="you're a good assistant", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='I like vanilla ice cream', additional_kwargs={}, response_metadata={}),
 AIMessage(content='nice', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='whats 2 + 2', additional_kwargs={}, response_metadata={}),
 AIMessage(content='4', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='thanks', additional_kwargs={}, response_metadata={}),
 AIMessage(content='no problem!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='having fun?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='yes!', additional_kwargs={}, response_metadata={})]

- The next step is to use the trimmer so that only the most recent messages are passed through the prompt to the chat model

In [98]:
### Create the structure of the prompt

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder


prompt = ChatPromptTemplate.from_messages([
    ('system' , 'You are an helpful AI assistant. Answer all question to the best of your ability in {language}'),
    MessagesPlaceholder(variable_name='messages')])

In [99]:
from operator import itemgetter
from langchain_core.runnables import RunnablePassthrough

In [100]:
chain=(
    RunnablePassthrough.assign(messages=itemgetter("messages")|trimmer)
    | prompt
    | model
    
)
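
The first stage of the chain can be hard to read at a glance. The sketch below mimics, in plain Python, what `RunnablePassthrough.assign(messages=itemgetter("messages") | trimmer)` does conceptually: it copies the input dict and overwrites one key with a computed value, leaving other keys such as "language" untouched. `assign` and `keep_last_two` are hypothetical stand-ins written for this illustration, not LangChain APIs.

```python
from operator import itemgetter


def keep_last_two(messages):
    """Stand-in for the trimmer (assumption): keep only the two newest messages."""
    return messages[-2:]


def assign(inputs: dict, **computed) -> dict:
    """Return a copy of inputs with each keyword replaced by fn(inputs)."""
    updated = dict(inputs)
    for key, fn in computed.items():
        updated[key] = fn(inputs)
    return updated


payload = {"messages": ["m1", "m2", "m3", "m4"], "language": "English"}

# Equivalent in spirit to itemgetter("messages") | trimmer: extract, then trim.
result = assign(payload, messages=lambda d: keep_last_two(itemgetter("messages")(d)))
print(result)  # {'messages': ['m3', 'm4'], 'language': 'English'}
```

The prompt that follows in the chain therefore receives both the trimmed "messages" and the original "language" value.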

In [101]:
response=chain.invoke(
    {
    "messages":messages + [HumanMessage(content="What ice cream do i like")],
    "language":"English"
    }
)

In [102]:
response.content

"As an AI, I don't have access to your personal information, including your favorite ice cream! \n\nWhat's your favorite flavor? 😋\n"

The chat model cannot recall the answer because max_tokens is set to 45: once the new question is appended, the earlier message "I like vanilla ice cream" no longer fits within the limit and is trimmed from the history that reaches the model.

We can try a different question:

In [103]:
chain.invoke(
    {
    "messages":messages + [HumanMessage(content="What math problem did I asked?")],
    "language":"English"
    }
)

AIMessage(content='You asked:  "whats 2 + 2" \n\n\nLet me know if you\'d like to try another one! 😊 \n', additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 32, 'prompt_tokens': 87, 'total_tokens': 119, 'completion_time': 0.058181818, 'prompt_time': 0.003157387, 'queue_time': 0.08558302300000001, 'total_time': 0.061339205}, 'model_name': 'gemma2-9b-it', 'system_fingerprint': 'fp_10c08bf97d', 'finish_reason': 'stop', 'logprobs': None}, id='run-0112ce87-94ea-4b31-9820-45bfb18346e7-0', usage_metadata={'input_tokens': 87, 'output_tokens': 32, 'total_tokens': 119})

- The next step is to wrap the chain in a message history so that the chat model remembers the messages for each user session

In [104]:
### Chat message history stores a history of the message interactions in a chat.
from langchain_community.chat_message_histories import ChatMessageHistory 


###  Abstract base class for storing chat message history.
from langchain_core.chat_history import BaseChatMessageHistory  

store = {}

### Function that retrieves the chat history for a specific session id

def get_session_history(session_id:str)->BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

RunnableWithMessageHistory wraps another Runnable and manages the chat message history for it; it is responsible for reading and updating the chat message history
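
The read-then-update cycle can be sketched in plain Python. This is only an illustration of the idea under simplifying assumptions (messages are plain strings, and the "model" is a toy function); `with_history`, `get_history`, `sessions`, and `echo_model` are hypothetical names, and the real class handles considerably more, such as configuration and message types.

```python
from typing import Callable, Dict, List

# session_id -> list of messages, mirroring the `store` dict above.
sessions: Dict[str, List[str]] = {}


def get_history(session_id: str) -> List[str]:
    """Create the history on first use, then return the same list each time."""
    return sessions.setdefault(session_id, [])


def with_history(fn: Callable[[List[str]], str]) -> Callable[[str, str], str]:
    """Wrap fn so it sees past messages and its output is saved afterwards."""
    def wrapped(user_message: str, session_id: str) -> str:
        history = get_history(session_id)
        reply = fn(history + [user_message])   # read: prepend stored history
        history.extend([user_message, reply])  # update: save the new turn
        return reply
    return wrapped


def echo_model(messages: List[str]) -> str:
    """Toy model (assumption): reports how many messages it was given."""
    return f"I received {len(messages)} message(s)"


chat = with_history(echo_model)
print(chat("hi", session_id="chat3"))            # I received 1 message(s)
print(chat("still there?", session_id="chat3"))  # I received 3 message(s)
print(chat("hi", session_id="other"))            # I received 1 message(s)
```

The second call in session "chat3" sees three messages because the first turn (input and reply) was saved, while the "other" session starts from an empty history.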

In [105]:
### This time we use a new parameter, input_messages_key, which tells the wrapper which input key holds the messages to save in the chat history

from langchain_core.runnables.history import RunnableWithMessageHistory 

with_message_history = RunnableWithMessageHistory(chain,get_session_history=get_session_history,input_messages_key='messages')


### Creating a new config for a new session
config_3 = {'configurable' : {'session_id' : 'chat3'}}

In [106]:
with_message_history.invoke({'messages' : messages + [HumanMessage(content='what is my name?')], 'language' : 'English'} ,
                            config=config_3)

AIMessage(content="As an AI, I don't have access to any personal information about you, including your name.  \n\nIf you'd like to tell me your name, I'd be happy to use it! 😊\n", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 47, 'prompt_tokens': 85, 'total_tokens': 132, 'completion_time': 0.085454545, 'prompt_time': 0.00226773, 'queue_time': 0.012029749999999999, 'total_time': 0.087722275}, 'model_name': 'gemma2-9b-it', 'system_fingerprint': 'fp_10c08bf97d', 'finish_reason': 'stop', 'logprobs': None}, id='run-72eb2d59-d1f7-4c55-aa75-a3b7a403bb34-0', usage_metadata={'input_tokens': 85, 'output_tokens': 47, 'total_tokens': 132})

The chat model cannot recall the name Bob, even though it appears earlier in messages, because max_tokens is set to 45 and the message containing the name is trimmed away. If you increase the limit, say to 90, the trimmer would keep the entire list.