# Conversational Interface - Context-aware Chatbot with Claude-3 LLM
In this notebook we will build a chatbot that stores the conversation history in DynamoDB and retrieves context from an in-memory vector database (FAISS).

In [27]:
%pip install pypdf
# quote the version specifier so the shell does not treat < and > as redirections
%pip install "faiss-cpu>=1.7,<2"

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


In [28]:
import boto3
import botocore
import botocore.exceptions
import pprint
import os

pp = pprint.PrettyPrinter(indent=2)

### Set up

In [29]:
boto3_session = boto3.session.Session()
llm_region = db_region = boto3_session.region_name

# the statements below can be used to override the region from the session; uncomment them only if you need to override it
#db_region = "us-west-2"
#llm_region = "us-west-2"

table_name = "ConversationHistory"

Create the DynamoDB table that will store the conversation history. Note that by default, DynamoDBChatMessageHistory
expects a primary key named *SessionId* (as used in the cell below); however, you can change the name of the key or use a composite key.
A general discussion of DynamoDBChatMessageHistory can be found here:
https://python.langchain.com/v0.2/docs/integrations/memory/aws_dynamodb/
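
As a rough sketch of the composite-key option mentioned above, the helper below builds the `KeySchema` and `AttributeDefinitions` arguments in the shape that `create_table` expects. The helper name `build_key_arguments` and the sort-key attribute `UserId` are assumptions made for illustration; only `SessionId` comes from this notebook.

```python
def build_key_arguments(partition_key, sort_key=None):
    """Return (KeySchema, AttributeDefinitions) for string-typed key attributes."""
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    attribute_definitions = [{"AttributeName": partition_key, "AttributeType": "S"}]
    if sort_key is not None:
        # A composite key adds a RANGE (sort) key next to the HASH (partition) key.
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
        attribute_definitions.append({"AttributeName": sort_key, "AttributeType": "S"})
    return key_schema, attribute_definitions


# "UserId" is a hypothetical sort key, not something this notebook creates.
key_schema, attribute_definitions = build_key_arguments("SessionId", "UserId")
print(key_schema)
print(attribute_definitions)
```

If a table were created this way, the history object would also need the full key for each session; the LangChain page linked above describes a `key` argument for that purpose.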

In [30]:
# create the service client.
dynamodb = boto3.client("dynamodb",region_name=db_region)

# check if the table exists; if it doesn't, then create the table
try:
    response = dynamodb.describe_table(TableName=table_name)
    print(f'Table <{table_name}> already exists ...')
except botocore.exceptions.ClientError as error:
    if error.response['Error']['Code'] == 'ResourceNotFoundException':
        print(f'Table <{table_name}> does not exist, creating ...')
        table = dynamodb.create_table(
            TableName=table_name,
            KeySchema=[{"AttributeName": "SessionId", "KeyType": "HASH"}],
            AttributeDefinitions=[{"AttributeName": "SessionId", "AttributeType": "S"}],
            BillingMode="PAY_PER_REQUEST",
        )

        # Wait until the table exists.
        dynamodb.get_waiter("table_exists").wait(TableName=table_name)

        # Confirm that the table is ready.
        print(f'Table <{table_name}> has been created')
    else:
        # Any other error (for example missing permissions) should not be silently ignored.
        raise

Table <ConversationHistory> already exists ...


### Create and Run the chatbot
Here we create a simple chatbot that translates the input from English into the language specified in the system prompt (you are welcome to change the language).
The history of the conversation is stored in the DynamoDB table we created above; you can see how the conversation is stored by opening the table in the AWS console.

In [31]:
#Set up parameters for the model
model = "anthropic.claude-3-sonnet-20240229-v1:0"
temperature = 0.1

In [32]:
# create the Chat model
from langchain_aws.chat_models import ChatBedrock

llm_chat = ChatBedrock(
    model_id=model, 
    model_kwargs={"temperature": temperature},
    region_name=llm_region
)

In [34]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import DynamoDBChatMessageHistory

prompt = ChatPromptTemplate.from_messages([
    ("system","You're an assistant who speaks in {language}. Translate the user input"),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{question}"),
])

chain = prompt | llm_chat

"""
DynamoDBChatMessageHistory internally creates a DynamoDB resource in the default region from .aws/config, which is a problem
if the table you created is in a different region. In theory you could pass the *endpoint_url* parameter to the
DynamoDBChatMessageHistory constructor, but doing so results in an InvalidSignatureException; answers on Stack Overflow
suggest using endpoint_url only for local services such as LocalStack or for non-standard endpoints.
So to make the history use a table in a specific region, the workaround used here is to set the AWS_DEFAULT_REGION environment variable.
"""
os.environ["AWS_DEFAULT_REGION"] = db_region

chain_with_history = RunnableWithMessageHistory(
    chain,
    lambda session_id: DynamoDBChatMessageHistory(
        table_name=table_name, session_id=session_id
    ),
    input_messages_key="question",
    history_messages_key="history",
)
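
To make the role of the history wrapper more concrete, here is a minimal in-memory sketch of the idea: load the earlier turns for a session, pass them to the model together with the new question, then persist both new turns. It is not how `RunnableWithMessageHistory` is implemented; the `store` dictionary stands in for the DynamoDB table, and `fake_llm` and `invoke_with_history` are hypothetical names used only for illustration.

```python
from collections import defaultdict

# In-memory stand-in for the DynamoDB table: session id -> list of (role, text).
store = defaultdict(list)


def fake_llm(messages):
    """Hypothetical model stub: reports how many messages it was given."""
    return f"(reply after seeing {len(messages)} messages)"


def invoke_with_history(session_id, question):
    history = store[session_id]                  # 1. load the earlier turns
    messages = history + [("human", question)]   # 2. history plus the new question
    answer = fake_llm(messages)                  # 3. call the model
    history.append(("human", question))          # 4. persist both new turns
    history.append(("ai", answer))
    return answer


first = invoke_with_history("zzz", "Hi my name is John")
second = invoke_with_history("zzz", "What is my name?")
other = invoke_with_history("other-session", "What is my name?")
print(first, second, other, sep="\n")
```

The second call in session `zzz` sees the first exchange, while the call in `other-session` starts from an empty history; this is why the session id passed in `config` matters.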

In [35]:
lang = "French"
session_id = "zzz"

response = chain_with_history.invoke(
                    {"language": lang, "question": "Hi my name is John"},
                    config={"configurable": {"session_id": session_id}}
                    )
pp.pprint(response)

AIMessage(content="Bonjour, je m'appelle John.", additional_kwargs={'usage': {'prompt_tokens': 26, 'completion_tokens': 15, 'total_tokens': 41}, 'stop_reason': 'end_turn', 'model_id': 'anthropic.claude-3-sonnet-20240229-v1:0'}, response_metadata={'usage': {'prompt_tokens': 26, 'completion_tokens': 15, 'total_tokens': 41}, 'stop_reason': 'end_turn', 'model_id': 'anthropic.claude-3-sonnet-20240229-v1:0'}, id='run-056ac0c0-911e-451b-993b-5d9c8bc7bc17-0', usage_metadata={'input_tokens': 26, 'output_tokens': 15, 'total_tokens': 41})


In [36]:
response = chain_with_history.invoke(
    {"language": lang, "question": "What is my name?"},
    config={"configurable": {"session_id": session_id}}
    )
pp.pprint(response)

AIMessage(content='Votre nom est John.', additional_kwargs={'usage': {'prompt_tokens': 49, 'completion_tokens': 10, 'total_tokens': 59}, 'stop_reason': 'end_turn', 'model_id': 'anthropic.claude-3-sonnet-20240229-v1:0'}, response_metadata={'usage': {'prompt_tokens': 49, 'completion_tokens': 10, 'total_tokens': 59}, 'stop_reason': 'end_turn', 'model_id': 'anthropic.claude-3-sonnet-20240229-v1:0'}, id='run-339b4edc-0434-4654-957b-a3a099edf5a2-0', usage_metadata={'input_tokens': 49, 'output_tokens': 10, 'total_tokens': 59})


Now let's take a look at what is stored in the message history.

In [37]:
h = chain_with_history.get_session_history(session_id=session_id)
h.messages

[HumanMessage(content='Hi my name is John'),
 AIMessage(content="Bonjour, je m'appelle John.", additional_kwargs={'stop_reason': 'end_turn', 'usage': {'total_tokens': Decimal('41'), 'completion_tokens': Decimal('15'), 'prompt_tokens': Decimal('26')}, 'model_id': 'anthropic.claude-3-sonnet-20240229-v1:0'}, response_metadata={'stop_reason': 'end_turn', 'usage': {'total_tokens': Decimal('41'), 'completion_tokens': Decimal('15'), 'prompt_tokens': Decimal('26')}, 'model_id': 'anthropic.claude-3-sonnet-20240229-v1:0'}, id='run-056ac0c0-911e-451b-993b-5d9c8bc7bc17-0', usage_metadata={'input_tokens': 26, 'output_tokens': 15, 'total_tokens': 41}),
 HumanMessage(content='What is my name?'),
 AIMessage(content='Votre nom est John.', additional_kwargs={'stop_reason': 'end_turn', 'usage': {'total_tokens': Decimal('59'), 'completion_tokens': Decimal('10'), 'prompt_tokens': Decimal('49')}, 'model_id': 'anthropic.claude-3-sonnet-20240229-v1:0'}, response_metadata={'stop_reason': 'end_turn', 'usage': {

In [38]:
pp.pprint(h.messages[0])
pp.pprint(h.messages[1])
pp.pprint(h.messages[2])
pp.pprint(h.messages[3])

HumanMessage(content='Hi my name is John')
AIMessage(content="Bonjour, je m'appelle John.", additional_kwargs={'stop_reason': 'end_turn', 'usage': {'total_tokens': Decimal('41'), 'completion_tokens': Decimal('15'), 'prompt_tokens': Decimal('26')}, 'model_id': 'anthropic.claude-3-sonnet-20240229-v1:0'}, response_metadata={'stop_reason': 'end_turn', 'usage': {'total_tokens': Decimal('41'), 'completion_tokens': Decimal('15'), 'prompt_tokens': Decimal('26')}, 'model_id': 'anthropic.claude-3-sonnet-20240229-v1:0'}, id='run-056ac0c0-911e-451b-993b-5d9c8bc7bc17-0', usage_metadata={'input_tokens': 26, 'output_tokens': 15, 'total_tokens': 41})
HumanMessage(content='What is my name?')
AIMessage(content='Votre nom est John.', additional_kwargs={'stop_reason': 'end_turn', 'usage': {'total_tokens': Decimal('59'), 'completion_tokens': Decimal('10'), 'prompt_tokens': Decimal('49')}, 'model_id': 'anthropic.claude-3-sonnet-20240229-v1:0'}, response_metadata={'stop_reason': 'end_turn', 'usage': {'total_

### Now let's add context using the RAG pattern

In a typical Q&A situation, when we want to add context with the RAG pattern, we first use the user prompt to run a similarity search against the vector database that stores our documents. We then "augment" the prompt with the chunks returned by the search, and finally pass the "new" prompt to the LLM to generate its answer.
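
The retrieve-then-augment steps can be sketched with nothing but the standard library. This is a toy illustration under loud assumptions: `embed` is a bag-of-words stand-in for a real embedding model, the three `chunks` are made up, and a plain list replaces the vector database.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up chunks standing in for the document store.
chunks = [
    "graviton chips are designed by aws",
    "prime video streams live sports",
    "kuiper satellites provide broadband",
]


def retrieve(query, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def augment(query, k=1):
    """Build the 'new' prompt: retrieved context followed by the question."""
    context = "\n".join(retrieve(query, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


print(augment("what is graviton"))
```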

However, in a conversational setting, the user query might require conversational context to be understood. For example, consider this exchange:

> **Human:** "What is Graviton?"
> 
> **AI:** some answer ...

> **Human:** "How much better price-performance do they deliver?"

Clearly the user's second question still relates to Graviton, so the similarity search needs to take that into account; with the simple approach, the search sees only the follow-up question and is unlikely to return the most relevant chunks.

The solution to this problem is to define a sub-chain that takes the historical messages and the latest user question, and reformulates the question if it refers to anything in the chat history. This approach is depicted in the following diagram:

![alt text](./images/conversational_retrieval_chain.png)
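
The control flow of such a sub-chain can be sketched as follows. The function `reformulate` is a hypothetical stand-in for the LLM call: it naively appends the first human message as the topic, whereas a real model would write a properly worded standalone question.

```python
def reformulate(chat_history, question):
    """Hypothetical stand-in for the LLM call that rewrites the question."""
    topic = next((text for role, text in chat_history if role == "human"), "")
    return f"{question} (in the context of: {topic})"


def history_aware_query(chat_history, question):
    """Return the query that should be sent to the similarity search."""
    # With no history there is nothing to resolve, so the question is used as is.
    if not chat_history:
        return question
    return reformulate(chat_history, question)


history = [("human", "What is Graviton?"), ("ai", "some answer ...")]
first_query = history_aware_query([], "What is Graviton?")
follow_up_query = history_aware_query(
    history, "How much better price-performance do they deliver?"
)
print(first_query)
print(follow_up_query)
```

The point of the design is that the retriever always receives a query that can be understood on its own, so the similarity search for the follow-up still has "Graviton" to match against.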

First let's load the 2022 shareholder letter from Andy Jassy (PDF) and split it into chunks.

In [40]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

file_name = "./rag_data/2022-Shareholder-Letter.pdf"
loader = PyPDFLoader(file_name)
pages = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(pages)
pp.pprint(splits[0].page_content) # print the first chunk of the document

('Dear shareholders:\n'
 'As I sit down to write my second annual shareholder letter as CEO, I find '
 'myself optimistic and energized\n'
 'by what lies ahead for Amazon. Despite 2022 being one of the harder '
 'macroeconomic years in recent memory,and with some of our own operating '
 'challenges to boot, we still found a way to grow demand (on top ofthe '
 'unprecedented growth we experienced in the first half of the pandemic). We '
 'innovated in our largestbusinesses to meaningfully improve customer '
 'experience short and long term. And, we made importantadjustments in our '
 'investment decisions and the way in which we’ll invent moving forward, while '
 'stillpreserving the long-term investments that we believe can change the '
 'future of Amazon for customers,\n'
 'shareholders, and employees.\n'
 'While there were an unusual number of simultaneous challenges this past '
 'year, the reality is that if you')
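
To illustrate what `chunk_size` and `chunk_overlap` mean, here is a simplified sketch that cuts fixed-width character windows. It is an assumption-laden simplification: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph breaks, newlines and spaces, so its real chunks vary in length. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Fixed-width character windows; consecutive windows share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    # Stop once the remaining text would already be covered by the previous window.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice.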


Now we will create an in-memory FAISS db to hold the chunks from the previous step, which we will use to retrieve the relevant context (RAG pattern) based on the user query

In [None]:
from langchain_community.vectorstores import FAISS
from langchain_aws import BedrockEmbeddings

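bedrock_client = boto3.client(service_name='bedrock-runtime', region_name=llm_region)
# use a separate variable so the chat model id stored in `model` is not overwritten
embedding_model = "amazon.titan-embed-text-v2:0"
bedrock_embeddings = BedrockEmbeddings(model_id=embedding_model, client=bedrock_client)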

vectorstore = FAISS.from_documents(
    documents=splits,
    embedding = bedrock_embeddings
)
retriever = vectorstore.as_retriever()

At this point we will use the helper function *create_history_aware_retriever()*, which constructs a chain that accepts the keys **input** and **chat_history** as input and has the same output schema as a retriever. This effectively implements the sub-chain for the "contextual" similarity search.

In [None]:
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder

contextual_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)

# use a distinct name so the translation prompt defined earlier is not overwritten
contextual_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextual_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
history_aware_retriever = create_history_aware_retriever(
    llm_chat, retriever, contextual_prompt
)

Finally, we will build our full QA chain using two helper functions:
1. *create_stuff_documents_chain()* specifies how retrieved context is fed into a prompt and LLM; it will include all retrieved context without any summarization or other processing and will generate an answer using the retrieved context and query.
2. *create_retrieval_chain()* adds the retrieval step and propagates the retrieved context through the chain, providing it alongside the final answer
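
The division of labour between the two helpers can be sketched in plain Python. This is a conceptual illustration only, not the LangChain implementation: `stuff_documents`, `retrieval_chain`, the template text, and the two lambda stubs are all hypothetical.

```python
def stuff_documents(docs, question, template):
    """Concatenate every retrieved chunk into the prompt, unmodified."""
    context = "\n\n".join(docs)
    return template.format(context=context, input=question)


def retrieval_chain(question, retriever, answer_fn, template):
    """Retrieve, build the prompt, and return the context alongside the answer."""
    docs = retriever(question)
    prompt_text = stuff_documents(docs, question, template)
    return {"input": question, "context": docs, "answer": answer_fn(prompt_text)}


template = "Answer using this context:\n{context}\n\nQuestion: {input}"
result = retrieval_chain(
    "What is Graviton?",
    retriever=lambda q: ["chunk one", "chunk two"],  # hypothetical retriever stub
    answer_fn=lambda p: f"(answer based on {len(p)} prompt characters)",  # hypothetical LLM stub
    template=template,
)
print(sorted(result))
print(result["context"])
```

Returning the retrieved chunks next to the answer is what makes it possible to inspect or cite the sources behind a response.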

In [None]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)


question_answer_chain = create_stuff_documents_chain(llm_chat, qa_prompt)
# note that create_stuff_documents_chain() returns an LCEL Runnable; the input is a dictionary that 
# must have a “context” key that maps to a List[Document], and any other input variables expected in the prompt

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

Now let's ask the chatbot some questions!

In [None]:
session_id = "abc"

conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    lambda session_id: DynamoDBChatMessageHistory(
        table_name=table_name, session_id=session_id
    ),
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)

In [None]:
conversational_rag_chain.invoke(
    {"input": "What is Graviton?"},
    config={
        "configurable": {"session_id": session_id}
    },  # the history is stored in DynamoDB under this session id ("abc")
)["answer"]

In [None]:
conversational_rag_chain.invoke(
    {"input": "How much better price-performance do they deliver?"},
    config={
        "configurable": {"session_id": session_id}
    },  # same session id, so the previous question is available as chat history
)["answer"]