# The Whole Process of Creating the Chatbot

This walkthrough follows the chatbot tutorial in the LangChain documentation. I reproduce it here because I want to learn it by working through it first-hand.

# Installation

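I installed a version of langgraph newer than 0.2.27. The requirement is quoted so that the shell does not treat the > character as an output redirect.

In [None]:
pip install langchain-core "langgraph>0.2.27"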

# LangSmith
### Setting up LangSmith environment variables
Applications built with LangChain make many LLM calls. As a project grows larger and more complex, it becomes important to see what is going on inside a chain or agent. The best way to do this is with <b>LangSmith</b>, which traces those calls.

In [1]:
import getpass
import os

os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = getpass.getpass()
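
If the cell above is re-run, it asks for the key again. A minimal sketch of a helper that prompts only when the variable is missing is shown below. The names `set_if_missing` and `reader` are hypothetical and are not part of LangSmith or LangChain.

```python
import getpass
import os


def set_if_missing(name, prompt, reader=getpass.getpass):
    """Set os.environ[name] from reader(prompt) unless it already has a value."""
    if not os.environ.get(name):
        # Only ask the user when nothing is stored yet
        os.environ[name] = reader(prompt)
    return os.environ[name]


# Usage (prompts only on the first run of the session):
# set_if_missing("LANGSMITH_API_KEY", "Enter API key for LangSmith: ")
```

The `reader` parameter exists only so the helper can be tried without typing a real key.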

# Quickstart

Installing LangChain with Groq as the selected chat model.

1. I first created a Python virtual environment because I was having compatibility conflicts with my global Python environment.

In [None]:
python -m venv myenv

In [None]:
myenv\Scripts\activate

2. Then I installed the required packages.

In [None]:
pip install numpy==1.26.0
pip install -qU "langchain[groq]"

# Using the model directly

In [23]:
import getpass
import os

if not os.environ.get("GROQ_API_KEY"):
  os.environ["GROQ_API_KEY"] = getpass.getpass("Enter API key for Groq: ")

from langchain.chat_models import init_chat_model

model = init_chat_model("llama-3.3-70b-versatile", model_provider="groq")

- Chat models (such as the <code>model</code> created above) are instances of what LangChain calls <b>"Runnables"</b>: objects that simplify how we chain together the different components or steps of our application. Think of them as building blocks that we can easily connect to create a smooth workflow for processing text, data, or other tasks.
  - A Runnable takes an input, performs some operation on it, and returns an output.
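
To make the building-block idea concrete, here is a toy sketch in plain Python. `ToyRunnable` is a hypothetical class written only for illustration. It is not LangChain's implementation, although real Runnables do expose `.invoke` and can be chained with `|`.

```python
class ToyRunnable:
    """A toy stand-in for the Runnable idea (not LangChain's implementation)."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        # Take an input, perform an operation, return an output
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a new step that feeds a's output into b
        return ToyRunnable(lambda value: other.invoke(self.invoke(value)))


strip = ToyRunnable(str.strip)
shout = ToyRunnable(str.upper)
chain = strip | shout

print(chain.invoke("  hello  "))  # HELLO
```

Each step knows nothing about the others, which is what makes the pieces easy to rearrange.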

In [24]:
from langchain_core.messages import HumanMessage

model.invoke([HumanMessage(content="Hi! I'm Kian.")])

AIMessage(content="Hello Kian! It's nice to meet you. Is there something I can help you with or would you like to chat?", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 27, 'prompt_tokens': 42, 'total_tokens': 69, 'completion_time': 0.098181818, 'prompt_time': 0.008165849, 'queue_time': 1.158250161, 'total_time': 0.106347667}, 'model_name': 'llama-3.3-70b-versatile', 'system_fingerprint': 'fp_3884478861', 'finish_reason': 'stop', 'logprobs': None}, id='run-5de2b972-f44e-4f72-8787-3542ba432035-0', usage_metadata={'input_tokens': 42, 'output_tokens': 27, 'total_tokens': 69})

- To call the model, we pass a list of messages to the <code>.invoke</code> method.

- A <code>HumanMessage</code> is a message passed from the human to the model.

- The model on its own has no concept of state. If we ask a follow-up question, it cannot take our previous prompt into account. An example is shown below:

In [25]:
model.invoke([HumanMessage(content="What's my name?")])

AIMessage(content="I don't know your name. I'm a large language model, I don't have the ability to know your personal information or recall previous conversations. Each time you interact with me, it's a new conversation. If you'd like to share your name, I'd be happy to chat with you!", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 62, 'prompt_tokens': 40, 'total_tokens': 102, 'completion_time': 0.225454545, 'prompt_time': 0.008362109, 'queue_time': 0.019013239, 'total_time': 0.233816654}, 'model_name': 'llama-3.3-70b-versatile', 'system_fingerprint': 'fp_3884478861', 'finish_reason': 'stop', 'logprobs': None}, id='run-7e751b69-88da-4e9e-8215-72623a5344a9-0', usage_metadata={'input_tokens': 40, 'output_tokens': 62, 'total_tokens': 102})

To solve that problem, we need to pass the entire conversation history to the model. That would require doing the following:

In [26]:
from langchain_core.messages import AIMessage

model.invoke(
    [
        HumanMessage(content="Hi! I'm Kian"),
        AIMessage(content="Hello Kian! How can I assist you today?"),
        HumanMessage(content="What's my name?"),
    ]
)

AIMessage(content='Your name is Kian.', additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 7, 'prompt_tokens': 67, 'total_tokens': 74, 'completion_time': 0.026492835, 'prompt_time': 0.006854816, 'queue_time': 0.27002501, 'total_time': 0.033347651}, 'model_name': 'llama-3.3-70b-versatile', 'system_fingerprint': 'fp_7b42aeb9fa', 'finish_reason': 'stop', 'logprobs': None}, id='run-2e088c54-0bb9-4880-94b5-feb370cf6d92-0', usage_metadata={'input_tokens': 67, 'output_tokens': 7, 'total_tokens': 74})

- As we can see, rebuilding and resending the whole history by hand on every turn is tedious and would not work when building a proper chatbot. Luckily, we can solve this by using something called <code>Message persistence</code>.

# Message persistence

A tool called <code>LangGraph</code> has built-in message persistence, making it ideal for chat applications that have multiple conversational turns.

Wrapping our chat model inside a minimal <code>LangGraph</code> application automatically persists the message history.

<code>LangGraph</code> comes with a simple in-memory checkpointer.

In [27]:
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

# Define a new graph
workflow = StateGraph(state_schema=MessagesState)


# Define the function that calls the model
def call_model(state: MessagesState):
    response = model.invoke(state["messages"])
    return {"messages": response}


# Define the (single) node in the graph
workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

# Add memory
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

- We now need to create a <code>config</code> that we pass to the runnable on every call. This <code>config</code> carries information that is not part of the input itself. In this case, we want to include a <code>thread_id</code>, an ID that identifies the conversation we are in. The moment this <code>thread_id</code> changes, the conversation starts fresh. This is very useful when there are multiple conversation threads, which is expected when a chatbot has multiple users. <u>See the following examples:</u>

In [28]:
config = {"configurable": {"thread_id": "abc1"}}

In [29]:
query = "Hi! I'm Kian."

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()  # output contains all messages in state


Hello Kian! It's nice to meet you. Is there something I can help you with or would you like to chat?


In [30]:
query = "What's my name?"

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


Your name is Kian. You told me that when you first said hello!


If we change the config to a different <code>thread_id</code>, the conversation starts fresh.

In [31]:
config = {"configurable": {"thread_id": "abc2"}}

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


I don't know your name. I'm a large language model, I don't have the ability to access personal information about you, including your name. I can only respond based on the text you provide to me. If you'd like to share your name, I'd be happy to chat with you!


However, if we go back to our old <code>thread_id</code>, we'll go back to our old conversation.

In [32]:
config = {"configurable": {"thread_id": "abc1"}}

input_messages = [HumanMessage(query)]
output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


Still Kian! You mentioned it earlier when we started chatting.
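
The thread behaviour above can be pictured with a toy in-memory store keyed by <code>thread_id</code>. This is only a sketch of the idea. `histories`, `toy_model`, and `chat` are hypothetical names, and the real checkpointer stores graph state rather than a plain list.

```python
from collections import defaultdict

# thread_id -> list of (role, text) pairs; a toy stand-in for a checkpointer
histories = defaultdict(list)


def toy_model(messages):
    """Hypothetical fake model: it can only answer from the history it is given."""
    for role, text in messages:
        if role == "human" and text.startswith("Hi! I'm "):
            name = text[len("Hi! I'm "):].rstrip(".")
            return f"Your name is {name}."
    return "I don't know your name."


def chat(thread_id, text):
    history = histories[thread_id]   # load the saved messages for this thread
    history.append(("human", text))  # add the new user message
    reply = toy_model(history)       # the model sees the whole thread
    history.append(("ai", reply))    # save the reply for the next turn
    return reply


chat("abc1", "Hi! I'm Kian.")
print(chat("abc1", "What's my name?"))  # Your name is Kian.
print(chat("abc2", "What's my name?"))  # I don't know your name.
```

Because every thread has its own list, switching back to an earlier <code>thread_id</code> picks up exactly where that conversation stopped.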


### Tip
For <code>async</code> support, update the <code>call_model</code> node to be an <code>async function</code> and use <code>.ainvoke</code> when invoking the application:

In [None]:
# Async function for node:
async def call_model(state: MessagesState):
    response = await model.ainvoke(state["messages"])
    return {"messages": response}


# Define graph as before:
workflow = StateGraph(state_schema=MessagesState)
workflow.add_edge(START, "model")
workflow.add_node("model", call_model)
app = workflow.compile(checkpointer=MemorySaver())

# Async invocation:
output = await app.ainvoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()

# Prompt Template

<code>Prompt Template</code> gives the LLM an instruction on how it needs to respond to the user inputs. To put it simply, it gives the LLM a template or a format that it can work with.

- We will create a <code>ChatPromptTemplate</code> to add in a system message.
- We will use the <code>MessagesPlaceholder</code> to pass all the messages in.

In [38]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt_template = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Act like a nutritionist or gym instructor (depending on the type of question). Answer with the best of your ability."
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

Since I'm aiming to build a personal nutritionist/gym instructor chatbot, I'm giving it a template to act like one.
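
To see what the template contributes, here is a plain-Python sketch of what invoking it conceptually produces: the system message first, followed by whatever messages fill the placeholder. `render_prompt` and `SYSTEM_TEXT` are hypothetical names, not LangChain functions, and messages are simplified to (role, text) pairs.

```python
SYSTEM_TEXT = "Act like a nutritionist or gym instructor."


def render_prompt(messages):
    """Return the full message list the model would receive."""
    # The system instruction always comes first; the conversation follows it
    return [("system", SYSTEM_TEXT), *messages]


conversation = [
    ("human", "Hi! I'm Ki."),
    ("ai", "Hello Ki."),
    ("human", "What is my name?"),
]

for role, text in render_prompt(conversation):
    print(f"{role}: {text}")
```

The conversation itself is unchanged. The template only decides what surrounds it.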

Update the application to <code>invoke</code> the template.

In [39]:
workflow = StateGraph(state_schema=MessagesState)


def call_model(state: MessagesState):
    prompt = prompt_template.invoke(state)
    response = model.invoke(prompt)
    return {"messages": response}


workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

Then <code>invoke</code> the application the same way.

In [None]:
config = {
    "configurable" : {
        "thread_id" : "abc3"
    }
}
query = "Hi! I'm Ki."

input_messages = [HumanMessage(content=query)]

output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


Hello Ki. It's great to meet you. I'm a nutritionist and fitness expert, here to help you with any questions or concerns you may have about healthy eating, exercise, or wellness. What brings you here today? Are you looking to start a new fitness routine, seeking nutrition advice, or just wanting to learn more about living a healthy lifestyle?


In [41]:
query = "What is my name? And what rep range do you recommend for muscle hypertrophy?"

input_messages = [HumanMessage(query)]

output = app.invoke({"messages": input_messages}, config)
output["messages"][-1].pretty_print()


Your name is Ki. 

For muscle hypertrophy, I recommend a rep range of 8-12 reps per set. This range allows for a good balance between muscle damage and metabolic stress, which are both important factors for building muscle. However, it's worth noting that some research suggests that a slightly wider rep range of 6-15 reps can also be effective for hypertrophy.

In particular, I would recommend:

* 3-4 sets per exercise
* 8-12 reps per set
* Rest for 60-90 seconds between sets
* Choose a weight that allows you to complete the given number of reps with proper form, but still challenges you

Keep in mind that everyone's body is different, and you may need to adjust the rep range and weight based on your individual needs and goals. But as a general guideline, 8-12 reps is a good starting point for building muscle.


# Managing Conversation History

One important part of building a chatbot is managing the conversation history. If the accumulated messages grow too large, they can overflow the context window of the LLM, which can cause problems. Therefore, it is important to include a step that limits the size of the messages we pass in.

<b>Importantly, you will want to do this BEFORE the prompt template, but AFTER loading the previous messages from the message history.</b>

We can do this by adding a step in front of the prompt that modifies the messages key appropriately, and then wrapping that new chain in the Message History class.

<code>LangChain</code> has a few built-in helpers for managing a list of messages. For simplicity, we can go with the <code>trim_messages</code> helper to reduce the number of messages that we send to the model.

The trimmer lets us limit the number of tokens we keep, along with other parameters that control whether to always keep the system message and whether to allow partial messages.

In [None]:
from langchain_core.messages import SystemMessage, trim_messages

trimmer = trim_messages(
    max_tokens=65,
    strategy="last",
    token_counter=model,
    include_system=True,
    allow_partial=False,
    start_on="human",
)

messages = [
    SystemMessage(content="you're a good assistant"),
    HumanMessage(content="hi! I'm bob"),
    AIMessage(content="hi!"),
    HumanMessage(content="I like vanilla ice cream"),
    AIMessage(content="nice"),
    HumanMessage(content="whats 2 + 2"),
    AIMessage(content="4"),
    HumanMessage(content="thanks"),
    AIMessage(content="no problem!"),
    HumanMessage(content="having fun?"),
    AIMessage(content="yes!"),
]

trimmer.invoke(messages)
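
As a rough picture of what the "last" strategy does, here is a simplified plain-Python sketch. It is not how <code>trim_messages</code> is implemented. `count_tokens` and `trim_last` are hypothetical names, messages are (role, text) pairs, and counting one token per word is an assumption made only to keep the example small.

```python
def count_tokens(message):
    """Crude token count: one token per whitespace-separated word (an assumption)."""
    role, text = message
    return len(text.split())


def trim_last(messages, max_tokens):
    """Keep the system message plus the newest whole messages that fit in max_tokens."""
    system = [m for m in messages[:1] if m[0] == "system"]
    rest = messages[len(system):]
    budget = max_tokens - sum(count_tokens(m) for m in system)

    kept = []
    for message in reversed(rest):         # walk from newest to oldest
        cost = count_tokens(message)
        if cost > budget:
            break                          # never keep a partial message
        kept.append(message)
        budget -= cost
    kept.reverse()                         # restore chronological order

    while kept and kept[0][0] != "human":  # the history should start on a human turn
        kept.pop(0)
    return system + kept


history = [
    ("system", "you're a good assistant"),
    ("human", "hi! I'm bob"),
    ("ai", "hi!"),
    ("human", "I like vanilla ice cream"),
    ("ai", "nice"),
    ("human", "whats 2 + 2"),
    ("ai", "4"),
]

for role, text in trim_last(history, max_tokens=10):
    print(f"{role}: {text}")
```

With a budget of 10 word-tokens, only the system message and the final question-and-answer pair survive, so the toy model would no longer know the user's name. The same trade-off applies to the real trimmer.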

To use it in our chain, just run the trimmer before passing the messages input to our prompt.

In [None]:
# Reuse MessagesState (imported earlier from langgraph.graph) as the state schema
workflow = StateGraph(state_schema=MessagesState)


def call_model(state: MessagesState):
    trimmed_messages = trimmer.invoke(state["messages"])
    prompt = prompt_template.invoke(
        {"messages": trimmed_messages}
    )
    response = model.invoke(prompt)
    return {"messages": [response]}


workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

# Streaming

By now the chatbot should be functioning properly, and the only thing left is to improve the user experience. Generating a response can take a while, so one way to improve the experience is <code>streaming</code>: sending back each token as it is generated.

By default, <code>.stream</code> in our LangGraph application streams application steps, which in this case is the single step of the model response. Setting <code>stream_mode="messages"</code> lets us stream the output tokens instead.

In [None]:
config = {"configurable": {"thread_id": "abc789"}}
query = "Hi Kian, please tell me a joke."

input_messages = [HumanMessage(query)]
for chunk, metadata in app.stream(
    {"messages": input_messages},  # the state schema used here has no "language" key
    config,
    stream_mode="messages",
):
    if isinstance(chunk, AIMessage):  # Filter to just model responses
        print(chunk.content, end="|")
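
The effect of token streaming can be pictured with a plain generator. This is a toy sketch. `fake_token_stream` is a hypothetical stand-in for a model and yields whole words rather than real tokens.

```python
def fake_token_stream(text):
    """Hypothetical stand-in for a model: yield the reply one word at a time."""
    for word in text.split():
        yield word + " "


pieces = []
for token in fake_token_stream("Streaming shows each piece as it arrives"):
    print(token, end="|", flush=True)  # display each piece immediately
    pieces.append(token)

# The complete reply is just the pieces joined back together
full_reply = "".join(pieces).strip()
```

The user starts reading after the first piece instead of waiting for the whole reply, even though the total generation time is the same.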

# End

I included only the things I need to learn in order to finish my chatbot project, and left out some features, such as the language feature. For reference, here is the link to the documentation: https://python.langchain.com/docs/tutorials/chatbot/#overview