# LangChain Conversation History Management
In the [LangChain Expression Language (LCEL)](./lcel.ipynb) notebook, we covered LCEL at a high level, demonstrating specifically how to chain an engineered chat prompt with an LLM running on MLX. What I did not demonstrate in that notebook was how to think about memory (aka chat conversation management), and to be honest, I underestimated how involved this would turn out to be! 😅 To be clear, what we cover in this notebook is less a technical concern and more a business logic concern.

While LangChain offers many mechanisms for handling chat conversations (aka memory), I found some of the higher-level ones unsatisfactory for our purposes. Specifically, since I want us to adhere to a fixed schema, the high-level abstractions provided by LangChain don't operate the way we need them to. No worries! We can work around this without abandoning LangChain. It just takes some custom handling throughout this notebook.

## High Level Flow
Before we get into the code itself, let's talk about how we want to think about the flow. For simplicity's sake, we will ultimately save the chat history as a JSON file that follows the schema defined in `data/schema.json`.

Let's say that the user is loading the MLX Gradio UI, either for the first time ever or as a returning user. Here is how we should think about the flow of our data:

1. **Loading the conversation history from file**: Just as it sounds, we load the conversation history from file so that the user can interact with their historical conversations if they would like. It's possible that this is the user's first time interacting with the chatbot, in which case we need to create this file from scratch!
2. **Setting a new conversation ID**: Regardless of whether the user is new or returning, we assume that the user wants to begin with a new conversation. This means we need to generate a new conversation ID so that we can keep appending new interactions to that same conversation thread.
3. **Managing conversation back-and-forth**: As the conversation proceeds, we continually update our conversation schema with each new human and AI interaction, and autosave it to file for the user's convenience.
4. **Starting a new conversation / loading an existing conversation**: At any point, the user may want to pivot from their current conversation to either a new conversation or a historical conversation loaded from our file in step 1. If so, we need to ensure that our backend references the correct conversation.
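
Steps 1 and 2 above can be sketched with nothing but the standard library. This is only a minimal sketch under stated assumptions: the helper names (`new_conversation`, `load_or_create_history`, `save_history`) are hypothetical, and the dictionary keys mirror the base schema we build later in this notebook rather than the full contents of `data/schema.json`.

```python
import json
import uuid
from pathlib import Path

DEFAULT_SYSTEM_PROMPT = 'You are a helpful assistant.'


def new_conversation(system_prompt = DEFAULT_SYSTEM_PROMPT):
    # A fresh, empty conversation thread with its own unique ID
    return {
        'conversation_id': str(uuid.uuid4()),
        'summary_title': '',
        'system_prompt': system_prompt,
        'conversation': []
    }


def load_or_create_history(path, user_id = 'default_username'):
    # Step 1: load the saved history if the file exists, else start from scratch
    path = Path(path)
    if path.exists():
        with path.open('r', encoding = 'utf-8') as f:
            history = json.load(f)
    else:
        history = {'user_id': user_id, 'conversation_history': []}

    # Step 2: always begin the session on a brand new conversation
    conversation = new_conversation()
    history['conversation_history'].append(conversation)
    return history, conversation['conversation_id']


def save_history(history, path):
    # Persisting the full history so it can be reloaded next session
    with Path(path).open('w', encoding = 'utf-8') as f:
        json.dump(history, f, indent = 2)
```

Returning the new conversation ID alongside the history keeps the "current conversation" explicit, which is what step 4 relies on when the user switches threads.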

## Notebook Setup
In this section, we'll do all our usual setup. We'll also set up the LangChain MLX model using the new ChatMLX implementation. These are all things we've already explored in other notebooks.

In [1]:
# Importing the necessary Python libraries
import os
import json
import uuid
import pandas as pd
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate, MessagesPlaceholder
from langchain_community.llms.mlx_pipeline import MLXPipeline
from langchain_community.chat_models.mlx import ChatMLX
from langchain_core.runnables import RunnableLambda
from langchain_community.chat_message_histories.in_memory import ChatMessageHistory
from langchain_core.messages.human import HumanMessage
from langchain_core.messages.ai import AIMessage
from langchain_core.messages.system import SystemMessage

In [2]:
# Setting a default system prompt
DEFAULT_SYSTEM_PROMPT = 'You are a helpful assistant.'

class MLXModelParameters():

    def __init__(self, temp = 0.7, max_tokens = 1000, system_prompt = DEFAULT_SYSTEM_PROMPT):
        self.temp = temp
        self.max_tokens = max_tokens
        self.system_prompt = system_prompt

    def __str__(self):
        return f'Temperature: {self.temp}\nMax Tokens: {self.max_tokens}\nSystem Prompt: {self.system_prompt}'
    
    def __repr__(self):
        return f'Temperature: {self.temp}\nMax Tokens: {self.max_tokens}\nSystem Prompt: {self.system_prompt}'

    def update_temp(self, new_temp):
        self.temp = new_temp

    def update_max_tokens(self, new_max_tokens):
        self.max_tokens = new_max_tokens

    def update_system_prompt(self, new_system_prompt):
        self.system_prompt = new_system_prompt

    def to_json(self):
        return { 'temp': self.temp, 'max_tokens': self.max_tokens }
    
mlx_model_parameters = MLXModelParameters()

In [3]:
# Setting a list of models that we'll need to check against
NO_SYSTEM_MODEL_PROVIDERS = ['mistralai', 'meta-llama']

In [4]:
# Setting constant values to represent model name and directory
MODEL_NAME = 'mistralai/Mistral-7B-Instruct-v0.2'
BASE_DIRECTORY = '../models'
MLX_DIRECTORY = f'{BASE_DIRECTORY}/mlx'
mlx_model_directory = f'{MLX_DIRECTORY}/{MODEL_NAME}'

In [5]:
# Setting up the LangChain MLX LLM
llm = MLXPipeline.from_model_id(
    model_id = mlx_model_directory,
    pipeline_kwargs = {
        'temp': mlx_model_parameters.temp,
        'max_tokens': mlx_model_parameters.max_tokens,
    }
)

# Setting up the LangChain MLX Chat Model with the LLM above
chat_model = ChatMLX(llm = llm)


## Setting up the LangChain pipeline
To use LangChain's preferred implementation of memory management, we first need to establish our LangChain pipeline. We've done similar things in other notebooks, but this implementation needs one specific adjustment. Because we are now using the LangChain Community implementation of MLX, we need to add our own metadata manually. To do this seamlessly, we use LCEL's **RunnableLambda**, which wraps a custom function of our own so that it can run as a step in the chain.

Also note that when we set up our chat prompt, we need to add an extra entry called **MessagesPlaceholder**. As the name implies, it serves as a placeholder so that we can keep passing the history back through the model.
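
To make the placeholder idea concrete, here is a conceptual stand-in written in plain Python. It is not LangChain's API: the `render_prompt` function and the role / content dictionaries are assumptions made purely for illustration. The point is only that the prior messages are spliced in unchanged, with the new human input appended after them.

```python
def render_prompt(history, user_input):
    # The placeholder position: prior messages are spliced in unchanged
    messages = list(history)

    # The templated human turn ("{input}") is appended after the history
    messages.append({'role': 'human', 'content': user_input})
    return messages


history = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'human', 'content': 'What is the capital of Illinois?'},
    {'role': 'ai', 'content': 'The capital of Illinois is Springfield.'}
]

rendered = render_prompt(history, 'What is the largest city in that state?')
```

Because the history is copied rather than mutated, the same list can be reused and extended turn by turn.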

In [6]:
# Setting up the Chat prompt template
human_message_prompt = HumanMessagePromptTemplate.from_template(template = "{input}")
chat_prompt = ChatPromptTemplate.from_messages(messages = [
    MessagesPlaceholder(variable_name = 'history'),
    human_message_prompt
])

In [7]:
def update_ai_response_metadata(ai_message):
    '''
    Updates the metadata on the AI response

    Inputs:
        - ai_message (LangChain AIMessage): The AI message produced by the model

    Returns:
        - ai_message (LangChain AIMessage): The AI message produced by the model, except now with the appropriate metadata intact
    '''

    # Referencing global variables
    global mlx_model_parameters
    global MODEL_NAME

    # Creating a dictionary of the metadata that we will be adding to the AI message
    metadata = {
        'model_name': MODEL_NAME,
        'timestamp': str(pd.Timestamp.utcnow()),
        'like_data': None,
        'hyperparameters': mlx_model_parameters.to_json()
    }

    # Applying the metadata to the AI response
    ai_message.response_metadata = metadata

    return ai_message

In [8]:
def correct_for_llama_and_mistral(chat_history):
    '''
    Corrects for the issue where Llama / Mistral models are unable to accept LangChain system messages

    Inputs:
        - chat_history (LangChain ChatPromptValue): The current chat history with no alterations

    Returns:
        - chat_history (LangChain ChatPromptValue): The chat history with alterations (if needed)
    '''
    # Referencing global variables
    global MODEL_NAME
    global NO_SYSTEM_MODEL_PROVIDERS

    # Checking if the correction needs to be made if the model is Llama or Mistral
    needs_correction = MODEL_NAME.split('/')[0] in NO_SYSTEM_MODEL_PROVIDERS

    # Only correcting when the first message really is a system message followed by another message
    if needs_correction and len(chat_history.messages) > 1 and isinstance(chat_history.messages[0], SystemMessage):

        # Popping the system message out of the queue of messages
        system_message = chat_history.messages.pop(0)

        # Getting the first human message
        first_human_message = chat_history.messages[0].content

        # Joining the system message with the first human message as a new human message
        new_human_message = f'{system_message.content}\n\n{first_human_message}'

        # Setting this new human message as the content in the first human message in the chat history
        chat_history.messages[0].content = new_human_message

    return chat_history
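
The folding technique used above can also be seen in isolation with plain dictionaries. This is a minimal sketch, assuming a hypothetical `fold_system_into_first_human` helper and a role / content message layout that are not part of LangChain. Unlike the function above, it works on copies so the caller's messages are left untouched.

```python
def fold_system_into_first_human(messages):
    # Copying each message so the caller's list and dictionaries are not modified
    messages = [dict(message) for message in messages]

    # Folding only when a system message is directly followed by a human message
    if len(messages) >= 2 and messages[0]['role'] == 'system' and messages[1]['role'] == 'human':
        system_message = messages.pop(0)
        messages[0]['content'] = f"{system_message['content']}\n\n{messages[0]['content']}"

    return messages


original = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'human', 'content': 'What is the capital of Illinois?'}
]

folded = fold_system_into_first_human(original)
```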

In [9]:
# Creating the inference chain by chaining together the chat prompt, chat model, and custom function to update metadata
inference_chain = chat_prompt | RunnableLambda(correct_for_llama_and_mistral) | chat_model | RunnableLambda(update_ai_response_metadata)

In [10]:
# Testing the inference chain with a fake chat history
fake_chat_history = ChatMessageHistory(messages = [
    SystemMessage(content = 'You are a helpful assistant.'),
    HumanMessage(content = 'What is the capital of Illinois?'),
    AIMessage(content = 'The capital of Illinois is Springfield.')
])

# Generating the test response with the fake chat history
test_response = inference_chain.invoke({
    'history': fake_chat_history.messages,
    'input': 'What is the largest city in that state?'
})

print(test_response.content)
print(test_response.response_metadata)


The largest city in Illinois is Chicago. While Springfield is the state capital, Chicago is the most populous city in Illinois.
{'model_name': 'mistralai/Mistral-7B-Instruct-v0.2', 'timestamp': '2024-04-20 19:19:06.012664+00:00', 'like_data': None, 'hyperparameters': {'temp': 0.7, 'max_tokens': 1000}}


## Starting the Conversation History Schema
As mentioned before, we are going to emulate the structure of the schema defined in `data/schema.json`. In this notebook, we will pretend that the user is brand new, so we need to set up the conversation history schema from scratch.

In [11]:
# Generating a current conversation ID
current_conversation_id = str(uuid.uuid4())
current_conversation_id

'1ab6d990-db3f-4e47-81d2-31fe2a76d1f1'

In [12]:
# Creating the base conversation history schema per a single user
BASE_CONVERSATION_HISTORY_SCHEMA = {
    'user_id': 'default_username',
    'conversation_history': [
        {
            'conversation_id': current_conversation_id,
            'summary_title': '',
            'system_prompt': DEFAULT_SYSTEM_PROMPT,
            'conversation': []
        }
    ]
}

BASE_CONVERSATION_HISTORY_SCHEMA

{'user_id': 'default_username',
 'conversation_history': [{'conversation_id': '1ab6d990-db3f-4e47-81d2-31fe2a76d1f1',
   'summary_title': '',
   'system_prompt': 'You are a helpful assistant.',
   'conversation': []}]}
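
With the base schema in place, step 3 of the flow (appending each interaction and autosaving) might look like the sketch below. The `append_interaction` and `autosave` helpers are hypothetical, and the layout of each entry inside `conversation` is an assumption made for illustration, since the full contents of `data/schema.json` are not shown here.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_interaction(history, conversation_id, human_text, ai_text, ai_metadata = None):
    # Finding the conversation thread that matches the current conversation ID
    for conversation in history['conversation_history']:
        if conversation['conversation_id'] == conversation_id:
            # Hypothetical entry layout: one human / AI pair per interaction
            conversation['conversation'].append({
                'human': {'content': human_text, 'timestamp': datetime.now(timezone.utc).isoformat()},
                'ai': {'content': ai_text, 'metadata': ai_metadata or {}}
            })
            return conversation

    raise KeyError(f'Unknown conversation ID: {conversation_id}')


def autosave(history, path):
    # Writing the whole history back to disk after every interaction
    Path(path).write_text(json.dumps(history, indent = 2), encoding = 'utf-8')
```

Looking the thread up by ID on every call, rather than holding a reference to it, keeps the helper correct even after the user switches to a different conversation.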

In [14]:
from langchain_community.chat_message_histories.in_memory import ChatMessageHistory
from langchain_core.messages.system import SystemMessage

In [15]:
# Starting an in-memory chat history seeded with the default system prompt
chat_history = ChatMessageHistory(
    messages = [
        SystemMessage(content = DEFAULT_SYSTEM_PROMPT)
    ]
)

In [16]:
from langchain_core.messages.ai import AIMessage

ai_message = AIMessage(content = 'The capital of Illinois is Springfield')

In [17]:
# Confirming that response metadata can be attached to an AI message by hand
ai_message.response_metadata = {'temp': 0.7}

In [18]:
# Invoking the chat model directly (without the inference chain) for comparison
response = chat_model.invoke('What is the capital of Illinois?')

In [19]:
from langchain_core.messages.human import HumanMessage

# Rebuilding the chat history from scratch and adding a human / AI exchange in one call
chat_history = ChatMessageHistory()
chat_history.add_messages(messages = [
    HumanMessage(content = 'What is the capital of Illinois?'),
    AIMessage(content = 'The capital of Illinois is Springfield.')
])

In [20]:
# Inspecting schema_json (referenced without calling it, so the bound method itself is displayed)
ChatMessageHistory.schema_json

<bound method BaseModel.schema_json of <class 'langchain_community.chat_message_histories.in_memory.ChatMessageHistory'>>