# Conversational Interface - Medical Clinic

> *This notebook should work well with the **`Data Science 3.0`** kernel in SageMaker Studio*

In this notebook, we will build a chatbot using the Foundation Models (FMs) in Amazon Bedrock. For our use case, we use Claude 3 Sonnet as our foundation model. For more details, refer to the [documentation](https://aws.amazon.com/bedrock/claude/). Claude 3 Sonnet balances intelligence and speed, particularly for enterprise workloads. It excels at complex reasoning, nuanced content creation, scientific queries, math, and coding. Data teams can use Sonnet for RAG, as well as search and retrieval across vast amounts of information, while sales teams can use it for product recommendations, forecasting, and targeted marketing. Note that the code cells below currently set `modelId` to `meta.llama3-8b-instruct-v1:0`, with the Claude model ID left commented out.

## Overview

Conversational interfaces such as chatbots and virtual assistants can be used to enhance the user experience for your customers. Chatbots use natural language processing (NLP) and machine learning algorithms to understand and respond to user queries. Chatbots can be used in a variety of applications, such as customer service, sales, and e-commerce, to provide quick and efficient responses to users. They can be accessed through various channels such as websites, social media platforms, and messaging apps.


## Chatbot using Amazon Bedrock

![Amazon Bedrock - Conversational Interface](./images/chatbot_bedrock.png)


## Use Cases

1. **Chatbot (Basic)** - Zero-shot chatbot with an FM
2. **Chatbot using prompt template (LangChain)** - Chatbot with some context provided in the prompt template
3. **Chatbot with persona** - Chatbot with defined roles, e.g. Career Coach and Human interactions
4. **Context-aware chatbot** - Passing in context from an external file by generating embeddings

## Langchain framework for building Chatbot with Amazon Bedrock
In conversational interfaces such as chatbots, it is important to remember previous interactions, both in the short term and in the long term.

LangChain provides memory components in two forms. First, it provides helper utilities for managing and manipulating previous chat messages. These are designed to be modular and useful regardless of how they are used. Second, it provides easy ways to incorporate these utilities into chains.
These abstractions make it easier to build powerful chatbots.

## Building Chatbot with Context - Key Elements

The first process in building a context-aware chatbot is to **generate embeddings** for the context. Typically, you will have an ingestion process that runs your documents through an embedding model and stores the resulting embeddings in a vector store. In this example, we use the Titan Embeddings model for this.

![Embeddings](./images/embeddings_lang.png)

The second process is user request orchestration: handling the interaction, invoking the model, and returning the results.

![Chatbot](./images/chatbot_lang.png)
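The two processes above can be sketched in plain Python. This is only a toy illustration: the bag-of-words `embed` function is a stand-in for a real embedding model such as Titan Embeddings, the in-memory list stands in for a vector store, and the clinic sentences are made-up sample data.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Process 1: ingestion. Embed each document and keep it in a "vector store".
documents = [
    "The clinic is open Monday to Friday from 9am to 5pm.",
    "Flu shots are available without an appointment.",
    "Parking is free for patients in the north lot.",
]
vector_store = [(doc, embed(doc)) for doc in documents]


# Process 2: orchestration. Embed the question, retrieve, and build the prompt.
def retrieve(question: str, k: int = 1) -> List[str]:
    query_vector = embed(question)
    ranked = sorted(vector_store, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}"


prompt = build_prompt("When is the clinic open?")
print(prompt)
```

In a real deployment the prompt built in the last step would be sent to the model; here it is only printed.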

## Architecture [Context Aware Chatbot]
![4](./images/context-aware-chatbot.png)


## Setup

⚠️ ⚠️ ⚠️ Before running this notebook, ensure you've run the [Bedrock boto3 setup notebook](../00_Prerequisites/bedrock_basics.ipynb). ⚠️ ⚠️ ⚠️ Then run the installs below.

**Please note**

We are tracking a warning that appears when using `RunnableWithMessageHistory` ([Runnable History Issue](https://github.com/langchain-ai/langchain-aws/issues/150)). Please ignore the warning messages for now.


In [None]:
# %pip install -U langchain-community==0.2.12
# %pip install -U --no-cache-dir  \
#     "langchain>=0.2.12" \
#     sqlalchemy -U \
#     "faiss-cpu>=1.7,<2" \
#     "pypdf>=3.8,<4" \
#     "pinecone-client>=5.0.1" \
#     "tiktoken>=0.7.0" \
#     "ipywidgets>=7,<8" \
#     "matplotlib>=3.9.0" \
#     "anthropic>=0.32.0" \
#     "langchain-aws>=0.1.15"
# - boto3-1.34.162 botocore-1.34.162 langchain-0.2.14 langchain-aws-0.1.17 langchain-core-0.2.34 langchain-community-0.2.12
#%pip install -U --no-cache-dir transformers
#%pip install -U --no-cache-dir boto3


In [1]:
import warnings

from io import StringIO
import sys
import textwrap
import os
from typing import Optional

# External Dependencies:
import boto3
from botocore.config import Config

warnings.filterwarnings('ignore')

def print_ww(*args, width: int = 100, **kwargs):
    """Like print(), but wraps output to `width` characters (default 100)"""
    buffer = StringIO()
    try:
        _stdout = sys.stdout
        sys.stdout = buffer
        print(*args, **kwargs)
        output = buffer.getvalue()
    finally:
        sys.stdout = _stdout
    for line in output.splitlines():
        print("\n".join(textwrap.wrap(line, width=width)))
        



def get_bedrock_client(
    assumed_role: Optional[str] = None,
    region: Optional[str] = None,
    runtime: Optional[bool] = True,
):
    """Create a boto3 client for Amazon Bedrock, with optional configuration overrides

    Parameters
    ----------
    assumed_role :
        Optional ARN of an AWS IAM role to assume for calling the Bedrock service. If not
        specified, the current active credentials will be used.
    region :
        Optional name of the AWS Region in which the service should be called (e.g. "us-east-1").
        If not specified, AWS_REGION or AWS_DEFAULT_REGION environment variable will be used.
    runtime :
        Optional choice of getting different client to perform operations with the Amazon Bedrock service.
    """
    if region is None:
        target_region = os.environ.get("AWS_REGION", os.environ.get("AWS_DEFAULT_REGION"))
    else:
        target_region = region

    print(f"Create new client\n  Using region: {target_region}")
    session_kwargs = {"region_name": target_region}
    client_kwargs = {**session_kwargs}

    profile_name = os.environ.get("AWS_PROFILE")
    if profile_name:
        print(f"  Using profile: {profile_name}")
        session_kwargs["profile_name"] = profile_name

    retry_config = Config(
        region_name=target_region,
        retries={
            "max_attempts": 10,
            "mode": "standard",
        },
    )
    session = boto3.Session(**session_kwargs)

    if assumed_role:
        print(f"  Using role: {assumed_role}", end='')
        sts = session.client("sts")
        response = sts.assume_role(
            RoleArn=str(assumed_role),
            RoleSessionName="langchain-llm-1"
        )
        print(" ... successful!")
        client_kwargs["aws_access_key_id"] = response["Credentials"]["AccessKeyId"]
        client_kwargs["aws_secret_access_key"] = response["Credentials"]["SecretAccessKey"]
        client_kwargs["aws_session_token"] = response["Credentials"]["SessionToken"]

    if runtime:
        service_name='bedrock-runtime'
    else:
        service_name='bedrock'

    bedrock_client = session.client(
        service_name=service_name,
        config=retry_config,
        **client_kwargs
    )

    print("boto3 Bedrock client successfully created!")
    print(bedrock_client._endpoint)
    return bedrock_client

In [2]:
import json
import os
import sys

import boto3




# ---- ⚠️ Un-comment and edit the below lines as needed for your AWS setup ⚠️ ----

# os.environ["AWS_DEFAULT_REGION"] = "<REGION_NAME>"  # E.g. "us-east-1"
# os.environ["AWS_PROFILE"] = "<YOUR_PROFILE>"
# os.environ["BEDROCK_ASSUME_ROLE"] = "<YOUR_ROLE_ARN>"  # E.g. "arn:aws:..."


boto3_bedrock = get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    region='us-west-2' #os.environ.get("AWS_DEFAULT_REGION", None)
)

Create new client
  Using region: us-west-2
boto3 Bedrock client successfully created!
bedrock-runtime(https://bedrock-runtime.us-west-2.amazonaws.com)


In [3]:
models_list = get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    region='us-west-2', #os.environ.get("AWS_DEFAULT_REGION", None),
    runtime=False
).list_foundation_models()

#[models['modelId'] for models in models_list['modelSummaries']]

Create new client
  Using region: us-west-2
boto3 Bedrock client successfully created!
bedrock(https://bedrock.us-west-2.amazonaws.com)


In [4]:
# boto3.Session().client("s3").list_buckets()

## Chatbot (Basic - without context)

We use [ConversationChain](https://python.langchain.com/en/latest/modules/models/llms/integrations/bedrock.html?highlight=ConversationChain#using-in-a-conversation-chain) from LangChain to start the conversation. We also use [ConversationBufferMemory](https://python.langchain.com/en/latest/modules/memory/types/buffer.html) for storing the messages. We can also get the history as a list of messages (this is very useful in a chat model).

Chatbots need to remember previous interactions. Conversational memory allows us to do that. There are several ways to implement conversational memory. In the context of LangChain, they are all built on top of the ConversationChain.

**Note:** The model outputs are non-deterministic
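As a rough illustration of what conversational memory does, the sketch below keeps every turn in a buffer and replays the whole buffer on each call. The `ConversationBuffer` class, the `chat` helper, and the `echo_model` stub are hypothetical names made up for this example, not LangChain or Bedrock APIs; the message shape mirrors the `role`/`content` dicts used with the Converse API in this notebook.

```python
from typing import Callable, Dict, List

Message = Dict[str, object]


class ConversationBuffer:
    """Keeps every turn as {"role": ..., "content": [{"text": ...}]} dicts."""

    def __init__(self) -> None:
        self.messages: List[Message] = []

    def add(self, role: str, text: str) -> None:
        self.messages.append({"role": role, "content": [{"text": text}]})


def chat(buffer: ConversationBuffer, user_text: str,
         model: Callable[[List[Message]], str]) -> str:
    """Record the user turn, call the model with the full history, record the reply."""
    buffer.add("user", user_text)
    reply = model(buffer.messages)
    buffer.add("assistant", reply)
    return reply


def echo_model(messages: List[Message]) -> str:
    """Hypothetical stand-in for a model call; reports how many messages it was shown."""
    return f"I can see {len(messages)} message(s) so far."


buffer = ConversationBuffer()
first = chat(buffer, "What is quantum mechanics?", echo_model)
second = chat(buffer, "Can you explain a bit more?", echo_model)
print(first)
print(second)
```

To talk to a real model, `echo_model` could be swapped for a function that passes the message list to `boto3_bedrock.converse(...)` and returns `response['output']['message']['content'][0]['text']`.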

In [5]:
#modelId = "anthropic.claude-3-sonnet-20240229-v1:0" #"anthropic.claude-v2"
modelId = 'meta.llama3-8b-instruct-v1:0'

messages_list=[
    { 
        "role":'user', 
        "content":[{
            'text': "What is quantum mechanics? "
        }]
    },
    { 
        "role":'assistant', 
        "content":[{
            'text': "It is a branch of physics that describes how matter and energy interact with discrete energy values "
        }]
    },
    { 
        "role":'user', 
        "content":[{
            'text': "Can you explain a bit more about discrete energies?"
        }]
    }
]

    
response = boto3_bedrock.converse(
    messages=messages_list, 
    modelId=modelId,
    inferenceConfig={
        "temperature": 0.5,
        "maxTokens": 100,
        "topP": 0.9
    }
)
response_body = response['output']['message']['content'][0]['text'] \
        + '\n--- Latency: ' + str(response['metrics']['latencyMs']) \
        + 'ms - Input tokens:' + str(response['usage']['inputTokens']) \
        + ' - Output tokens:' + str(response['usage']['outputTokens']) + ' ---\n'

print(response_body)


def invoke_meta_converse(prompt_str, boto3_bedrock):
    modelId = "meta.llama3-8b-instruct-v1:0"
    messages_list=[{ 
        "role":'user', 
        "content":[{
            'text': prompt_str
        }]
    }]
  
    response = boto3_bedrock.converse(
        messages=messages_list, 
        modelId=modelId,
        inferenceConfig={
            "temperature": 0.5,
            "maxTokens": 100,
            "topP": 0.9
        }
    )
    response_body = response['output']['message']['content'][0]['text']
    return response_body


invoke_meta_converse("what is quantum mechanics", boto3_bedrock)   



In classical physics, energy is often thought of as being continuous, meaning it can take on any value within a certain range. For example, the energy of a rolling ball can be thought of as being anywhere from 0 to infinity, with any value in between being possible.

In contrast, quantum mechanics introduces the concept of discrete energy levels or states. This means that energy is not continuous, but rather comes in specific, distinct packets or quanta. These quanta are separated by gaps, and
--- Latency: 1358ms - Input tokens:58 - Output tokens:100 ---



'\n\nQuantum mechanics is a fundamental theory in physics that describes the behavior of matter and energy at the smallest scales, such as atoms and subatomic particles. It provides a new and different framework for understanding physical phenomena, and it has been incredibly successful in explaining a wide range of experimental results.\n\nThe core idea of quantum mechanics is that, at the atomic and subatomic level, particles do not have definite positions, velocities, or properties until they are measured. Instead, they exist in a state of super'

#### Introduction to ChatBedrock

**Supports the following**
1. Multiple Models from Bedrock 
2. Converse API
3. Ability to do tool binding
4. Ability to plug with LangGraph flows

### Ask the question Meta Llama models

**please make sure you have the models enabled**

In [6]:
from langchain_aws.chat_models.bedrock import ChatBedrock
from langchain_core.messages import HumanMessage, SystemMessage

model_parameter = {"temperature": 0.0, "top_p": .5, "max_tokens_to_sample": 200}
modelId = "meta.llama3-8b-instruct-v1:0"
bedrock_llm = ChatBedrock(
    model_id=modelId,
    client=boto3_bedrock,
    model_kwargs=model_parameter, 
    beta_use_converse_api=True
)

messages = [
    HumanMessage(
        content="what is the weather like in Seattle WA"
    )
]
bedrock_llm.invoke(messages)


AIMessage(content="\n\nSeattle, Washington is known for its mild and wet climate, with significant rainfall throughout the year. Here's a breakdown of the typical weather patterns in Seattle:\n\n1. Rainfall: Seattle is famous for its rain, with an average annual rainfall of around 37 inches (94 cm). The rainiest months are November to March, with an average of 15-20 rainy days per month.\n2. Temperature: Seattle's average temperature ranges from 35°F (2°C) in January (the coldest month) to 77°F (25°C) in July (the warmest month). The average temperature is around 50°F (10°C) throughout the year.\n3. Sunshine: Seattle gets an average of 154 sunny days per year, with the sunniest months being July and August. However, the sun can be obscured by clouds and fog, reducing the amount of direct sunlight.\n4. Fog: Seattle is known for its fog, especially during the winter months. The city can experience fog for several days at a time, especially in the mornings.\n5. Wind: Seattle is known for 

#### Due to the Converse API flag, this class formulates the messages correctly

so we can pass plain string messages directly

In [7]:
bedrock_llm.invoke("what is the weather like in Seattle WA?")

AIMessage(content="\n\nSeattle, Washington is known for its mild and wet climate, with significant rainfall throughout the year. Here's a breakdown of the typical weather patterns in Seattle:\n\n1. Rainfall: Seattle is famous for its rain, with an average annual rainfall of around 37 inches (94 cm). The rainiest months are November to March, with an average of 15-20 rainy days per month.\n2. Temperature: Seattle's average temperature ranges from 35°F (2°C) in January (the coldest month) to 77°F (25°C) in July (the warmest month). The average temperature is around 50°F (10°C) throughout the year.\n3. Sunshine: Seattle gets an average of 154 sunny days per year, with the sunniest months being July and August. However, the sun can be obscured by clouds and fog, reducing the amount of direct sunlight.\n4. Fog: Seattle is known for its fog, especially during the winter months. The city can experience fog for several days at a time, especially in the mornings.\n5. Wind: Seattle is known for 
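Conceptually, accepting a bare string means wrapping it as a single user message before the request is sent. The sketch below shows that idea only; `to_converse_messages` is a hypothetical helper written for this example and is not how `ChatBedrock` is implemented internally.

```python
from typing import Dict, List, Union


def to_converse_messages(prompt: Union[str, List[Dict]]) -> List[Dict]:
    """Wrap a bare string as one user message; pass message lists through as a copy."""
    if isinstance(prompt, str):
        return [{"role": "user", "content": [{"text": prompt}]}]
    return list(prompt)


messages = to_converse_messages("what is the weather like in Seattle WA?")
print(messages)
```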

#### Ask a follow-on

Because we have not plugged in any history, context, or APIs, the model will not be able to answer the question.

In [8]:
bedrock_llm.invoke("is it warm in summers?")

AIMessage(content='\n\nThe warmth of summers depends on the location and climate. In general, summer is the warmest season in many parts of the world, especially near the equator.\n\nIn tropical regions, such as near the equator, summers are often extremely hot and humid. Temperatures can soar above 90°F (32°C) and even reach as high as 100°F (38°C) or more in some areas.\n\nIn temperate regions, such as in the Northern Hemisphere, summers are usually warm but not as hot as in tropical regions. Temperatures can range from the mid-70s to the mid-80s Fahrenheit (23-30°C).\n\nIn some regions, such as in the Southern Hemisphere, summers can be quite mild, especially in areas with a Mediterranean climate. Temperatures may range from the mid-60s to the mid-70s Fahrenheit (18-24°C).\n\nSome examples of warm summer temperatures in different parts of the world include:\n\n* In the United States, temperatures in the summer can range from 80°F (27°C) in the Northeast to 100°F (38°C) in the Southw

In [9]:
from langchain_aws.chat_models.bedrock import ChatBedrock
from langchain_core.messages import HumanMessage, SystemMessage

model_parameter = {"temperature": 0.0, "top_p": .5, "max_tokens_to_sample": 2000}
modelId = "meta.llama3-8b-instruct-v1:0"
bedrock_llm = ChatBedrock(
    model_id=modelId,
    client=boto3_bedrock,
    model_kwargs=model_parameter, 
    beta_use_converse_api=True
)

messages = [
    HumanMessage(
        content="what is the weather like in Seattle WA"
    )
]
bedrock_llm.invoke(messages)


AIMessage(content="\n\nSeattle, Washington is known for its mild and wet climate, with significant rainfall throughout the year. Here's a breakdown of the typical weather patterns in Seattle:\n\n1. Rainfall: Seattle is famous for its rain, with an average annual rainfall of around 37 inches (94 cm). The rainiest months are November to March, with an average of 15-20 rainy days per month.\n2. Temperature: Seattle's average temperature ranges from 35°F (2°C) in January (the coldest month) to 77°F (25°C) in July (the warmest month). The average temperature is around 50°F (10°C) throughout the year.\n3. Sunshine: Seattle gets an average of 154 sunny days per year, with the sunniest months being July and August. However, the sun can be obscured by clouds and fog, reducing the amount of direct sunlight.\n4. Fog: Seattle is known for its fog, especially during the winter months. The city can experience fog for several days at a time, especially in the mornings.\n5. Wind: Seattle is known for 

### Adding prompt templates 

1. You can define prompts as a list of messages. All models expect a `SystemMessage` first, followed by alternating `HumanMessage` and `AIMessage` entries.
2. This means the context needs to be part of the system message.
3. The chat history needs to come right after the system message, as a `MessagesPlaceholder` that holds a list of alternating Human/AI messages.
4. The variables defined in the chat template need to be sent into the chain as a dict whose keys are the variable names.
5. You can define each template entry as a tuple such as `("system", "message")` or with a message class such as `SystemMessage`.
6. `invoke` returns an object of type `langchain_core.prompt_values.ChatPromptValue` with the variables substituted by their values.
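The ordering and substitution rules in the list above can be illustrated without LangChain. The `render_prompt` function below is a hypothetical plain-Python sketch of the behaviour, not LangChain's implementation: ordinary entries get their variables filled in, and a placeholder entry is replaced by the list of messages stored under that variable name.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (role, text)


def render_prompt(template: List[Message], variables: Dict[str, object]) -> List[Message]:
    """Expand a list of (role, text) entries into a concrete message list."""
    rendered: List[Message] = []
    for role, text in template:
        if role == "placeholder":
            # "{chat_history}" names a variable holding a list of messages;
            # splice those messages in at this position (empty if not supplied).
            key = text.strip("{}")
            rendered.extend(variables.get(key, []))
        else:
            # Ordinary entries get their {variables} filled in from the dict.
            rendered.append((role, text.format(**variables)))
    return rendered


template = [
    ("system", "You are an assistant. Context: {context}"),
    ("placeholder", "{chat_history}"),
    ("human", "{input}"),
]
chat_history = [
    ("human", "What is the weather like in Seattle WA?"),
    ("ai", "Ahoy matey! I've heard tales of the weather in Seattle."),
]
messages = render_prompt(
    template,
    {"context": "this is a test context", "chat_history": chat_history, "input": "test_input"},
)
for role, text in messages:
    print(f"{role}: {text}")
```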

In [10]:
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory

from langchain_core.messages import HumanMessage, SystemMessage, AIMessage

chat_history_messages = [
        HumanMessage("What is the weather like in Seattle WA?"), # - normal string converts it to a Human message always but we need ai/human pairs
        AIMessage("Ahoy matey! As a pirate, I don't spend much time on land, but I've heard tales of the weather in Seattle.")
]

prompt = ChatPromptTemplate.from_messages( # can create either as System Message Object or as TUPLE -- system, message
    [
        ("system", "You are a pirate. Answer the following questions as best you can."),
        ("placeholder", "{chat_history}"), # this assumes the messages are in list of messages format and this becomes MessagePlaceholder object
        ("human", "{input}"),
    ]
)
#- variable chat_history should be a list of base messages, got test_chat_history of type <class 'str'>
#- this gets converted as a LIST of messages -- with each of the TUPLE or Object being executed with the variables when invoked
print_ww(prompt.invoke({"input":"test_input", "chat_history": chat_history_messages}))

# -- condense question prompt with CONTEXT
condense_question_system_template = (
    """
    You are an assistant for question-answering tasks. ONLY Use the following pieces of retrieved context to answer the question.
    If the answer is not in the context below , just say you do not have enough context. 
    If you don't know the answer, just say that you don't know. 
    Use three sentences maximum and keep the answer concise.
    Context: {context} 
    """
)

condense_question_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", condense_question_system_template),
        ("human", "{input}"),
    ]
)
#- missing variables {'context'}. chat history will get ignored - variables are passed in as keys in the dict
print("\n")
print_ww(condense_question_prompt.invoke({"input":"test_input", "chat_history": chat_history_messages, "context": "this is a test context"}))

# - Chat prompt template with Place holders
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
    
)

qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        # {context} is already substituted in the system prompt above, so no extra placeholder is needed
        MessagesPlaceholder("chat_history"),
        ("human", "Explain this  {input}."),
    ]
)

print("\n")
print_ww(qa_prompt.invoke({"input":"test_input", "chat_history": chat_history_messages, "context": "this is a test context"}))

print("\n")
print(type(qa_prompt.invoke({"input":"test_input", "chat_history": chat_history_messages, "context": "this is a test context"})))

messages=[SystemMessage(content='You are a pirate. Answer the following questions as best you
can.'), HumanMessage(content='What is the weather like in Seattle WA?'), AIMessage(content="Ahoy
matey! As a pirate, I don't spend much time on land, but I've heard tales of the weather in
Seattle."), HumanMessage(content='test_input')]


messages=[SystemMessage(content="\n    You are an assistant for question-answering tasks. ONLY Use
the following pieces of retrieved context to answer the question.\n    If the answer is not in the
context below , just say you do not have enough context. \n    If you don't know the answer, just
say that you don't know. \n    Use three sentences maximum and keep the answer concise.\n
Context: this is a test context \n    "), HumanMessage(content='test_input')]


messages=[SystemMessage(content="You are an assistant for question-answering tasks. Use the
following pieces of retrieved context to answer the question. If you don't know the answer, say that
you don'

In [11]:
ChatPromptTemplate.from_messages(
    [
        ("system", "You are a pirate. Answer the following questions as best you can."),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
    ]
).invoke({'input': 'test_input', 'chat_history' : chat_history_messages})


ChatPromptValue(messages=[SystemMessage(content='You are a pirate. Answer the following questions as best you can.'), HumanMessage(content='What is the weather like in Seattle WA?'), AIMessage(content="Ahoy matey! As a pirate, I don't spend much time on land, but I've heard tales of the weather in Seattle."), HumanMessage(content='test_input')])

#### Simple Conversation chain 

**Uses the In memory Chat Message History**

The above example uses the same history for all sessions. The example below shows how to use a different chat history for each session.

**Note**
1. `chat_history` is a placeholder variable in the prompt template; it holds the alternating Human/AI messages.
2. The human query is the final question, passed as the `input` variable.
3. The config has the form `{"configurable": {"session_id_variable": "value", ...other keys}}`. It is passed into every Runnable and Runnable wrapper.
4. `RunnableWithMessageHistory` is the class we wrap the `chain` in to run it with history. See the [docs](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.history.RunnableWithMessageHistory.html#).
5. For production use cases, you will want to use a persistent implementation of chat message history, such as `RedisChatMessageHistory`.
6. This class needs a dict as input.
7. The chain exposes `.input_schema.schema()` to get the JSON schema describing how to pass in the input.
8. Configuration is passed in as `invoke({dict}, config={"configurable": {"session_id": "abc123"}})` and is converted to a `RunnableConfig`, which is passed into every `invoke` method. To access it, we need to extend the `Runnable` class or use a function that accepts a `config` argument.
9. The chain usually processes the inputs as a dict object.


Wrap the rag_chain with `RunnableWithMessageHistory` to automatically handle chat history:

Any chain wrapped with `RunnableWithMessageHistory` will manage the chat history variables appropriately; however, the chat prompt template must include the placeholder for the history.
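A common way to give each session its own history is a factory function keyed by session ID. The sketch below shows that pattern in plain Python; `SimpleHistory` is a hypothetical stand-in for a chat message history class, and the session IDs are arbitrary examples. With LangChain, the factory would typically return an `InMemoryChatMessageHistory` and be passed to `RunnableWithMessageHistory`, with the session ID supplied through `config`.

```python
from typing import Dict, List, Tuple


class SimpleHistory:
    """Hypothetical stand-in for a chat message history class."""

    def __init__(self) -> None:
        self.messages: List[Tuple[str, str]] = []

    def add_user_message(self, text: str) -> None:
        self.messages.append(("human", text))

    def add_ai_message(self, text: str) -> None:
        self.messages.append(("ai", text))


# One history object per session ID, created on first use.
_store: Dict[str, SimpleHistory] = {}


def get_session_history(session_id: str) -> SimpleHistory:
    if session_id not in _store:
        _store[session_id] = SimpleHistory()
    return _store[session_id]


config_a = {"configurable": {"session_id": "abc123"}}
config_b = {"configurable": {"session_id": "xyz789"}}

history_a = get_session_history(config_a["configurable"]["session_id"])
history_a.add_user_message("what is the weather like in Seattle WA?")
history_a.add_ai_message("Arrr, it be rainy, matey!")

history_b = get_session_history(config_b["configurable"]["session_id"])
history_b.add_user_message("How is it in winters?")

print(len(history_a.messages), len(history_b.messages))
```

Because the store is an in-memory dict, the histories are lost when the process exits; a persistent implementation would be needed for production use.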

### Implement the same manually by configuring the chain so that the chat history is added and invoked automatically

If we configure the chain manually, not all variables need to be included in the inputs. If a variable is used or accessed, the chain will provide it.

1. For runnable we can either extend the runnable class
2. Or we can define a method and create a runnable lambda

In [13]:
from langchain_core.chat_history import BaseChatMessageHistory, InMemoryChatMessageHistory
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import Runnable, RunnableLambda, RunnablePassthrough
from langchain_core.runnables.config import RunnableConfig
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_aws.chat_models.bedrock import ChatBedrock

prompt_with_history = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a pirate. Answer the following questions as best you can."),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
    ]
)

history = InMemoryChatMessageHistory()

def get_history():
    return history


model_parameter = {"temperature": 0.0, "top_p": .5, "max_tokens_to_sample": 2000}
modelId = "meta.llama3-8b-instruct-v1:0" #"anthropic.claude-v2"
chatbedrock_llm = ChatBedrock(
    model_id=modelId,
    client=boto3_bedrock,
    model_kwargs=model_parameter, 
    beta_use_converse_api=True
)

# - add the history to the in-memory chat history
class ChatHistoryAdd(Runnable):
    def __init__(self, chat_history):
        self.chat_history = chat_history

    def invoke(self, input: str, config: RunnableConfig = None) -> str:
        try:
            #print_ww(f"ChatHistoryAdd::config={config}::history_object={self.chat_history}::input={input}::")
            
            self.chat_history.add_ai_message(input.content)
            return input
        except Exception as e:
            return f"Error processing input: {str(e)}"

# Usage
chat_add = ChatHistoryAdd(get_history())

#- second way to create a callback runnable function--
def ChatUserInputAdd(input_dict: dict, config: RunnableConfig) -> dict:
    #print_ww(f"ChatUserAdd::input_dict:{input_dict}::config={config}") #- if we do dict at start of chain -- {'input': {'input': 'what is the weather like in Seattle WA?', 'chat_history':
    get_history().add_user_message(input_dict['input']) 
    return input_dict # return the text as is

chat_user_add = RunnableLambda(ChatUserInputAdd)


history_chain = (
    #- Expected a Runnable, callable or dict. If we use a dict here make sure every element is a runnable. And further access is via 'input'.'input'
    # { # make sure all variable in the prompt template are in this dict
    #     "input": RunnablePassthrough(),
    #     "chat_history": get_history().messages
    # }
    RunnablePassthrough() # passes in the full dict as is -- since we have the variables defined in the INVOKE call itself
    | chat_user_add
    | prompt_with_history
    | chatbedrock_llm
    | chat_add
    | StrOutputParser()
)


print_ww(history_chain.invoke( # here the variable matches the chat prompt template
    {"input": "what is the weather like in Seattle WA?", "chat_history": get_history().messages}, 
    config={"configurable": {"session_id": "abc123"}})
)

print(f"\n\n chat_history after invocation is -- >{get_history()}")

#- ask a follow on question
print_ww(history_chain.invoke(
    {"input": "How is it in winters?", "chat_history": get_history().messages}, 
    config={"configurable": {"session_id": "abc123"}})
)




Arrr, shiver me timbers! As a pirate, I've had me share o' sailin' the seven seas, but I've never
set foot in Seattle, Washington. But I've heard tales o' the Emerald City's weather from me mateys
who've sailed those waters.

From what I've gathered, Seattle's weather be as unpredictable as a barnacle on a ship's hull. It's
known for bein' rainy and gray, with overcast skies most o' the time. The city gets a fair amount o'
precipitation, with an average o' 226 days o' rain per year! That be a lot o' wet weather, matey!

But don't ye worry, there be some sunshine to be had, too. The summer months o' June, July, and
August be the driest, with an average o' 15-20 days o' sunshine. And in the winter, the days be
shorter, but the sun still shines bright, even if it be through the clouds.

So, if ye be plannin' a trip to Seattle, be prepared for some rain, but don't let it dampen yer
spirits, matey! Just grab yer trusty umbrella and a good pair o' boots, and ye'll be ready to take
on the E

### Alternate way of invoking 

1. Here, only the user input is sent in, as a string
2. The chain takes care of adding the chat history to the full prompt
3. We create a new chain, `but we are re-using the same History object`, and hence it has the previous conversations

In [14]:
#- second way to create a callback runnable function--
def get_chat_history(input_dict: dict, config: RunnableConfig) -> dict:
    print(f"get_chat_history::input_dict:{input_dict}::config={config}") #- if we do dict at start of chain -- {'input': {'input': 'what is the weather like in Seattle WA?', 'chat_history':
    return get_history().messages # return the text as is

chat_history_get = RunnableLambda(get_chat_history)

history_chain = (
    #- Expected a Runnable, callable or dict. If we use a dict here make sure every element is a runnable. And further access is via 'input'.'input'
    { # make sure all variable in the prompt template are in this dict
        "input": RunnablePassthrough(),
        "chat_history": chat_history_get
    }
    | chat_user_add
    | prompt_with_history
    | chatbedrock_llm
    | chat_add
    | StrOutputParser()
)


history_chain.invoke( # here the variable matches the chat prompt template
    "what is it like in autumn?", 
    config={"configurable": {"session_id": "abc123"}}
)


get_chat_history::input_dict:what is it like in autumn?::config={'tags': [], 'metadata': {'session_id': 'abc123'}, 'callbacks': <langchain_core.callbacks.manager.CallbackManager object at 0x11a232650>, 'recursion_limit': 25, 'configurable': {'session_id': 'abc123'}}


"\n\nAutumn in Seattle, matey! It be a grand time o' year, indeed! The Pacific Northwest's autumn season be a time o' transition, when the summer's warmth gives way to the winter's chill. And Seattle, being the Emerald City, be a sight to behold during this time o' year.\n\nIn autumn, the days be gettin' shorter, with the sun risin' later and setin' earlier. But the weather be mild, with average highs in the mid-50s to low 60s Fahrenheit (13-18°C). It be a perfect time to get out and about, takin' in the sights and sounds o' the city.\n\nThe rain, which be a constant companion in Seattle, starts to pick up in October and November, but it be a gentle, misty rain that adds to the autumnal atmosphere. And the leaves, oh the leaves! The trees in Seattle's parks and gardens be ablaze with color, turnin' shades o' gold, orange, and red. It be a pirate's delight, matey!\n\nBut autumn in Seattle be more than just the weather and the scenery. It be a time o' harvest and celebration, with the ci

#### Now use the built-in helper methods to continue

1. `RunnableWithMessageHistory` adds both the user message and the AI message to the history automatically at the appropriate points.
2. The `history_messages_key` must match the variable name used in the prompt template (here `chat_history`).
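
To make the two points above concrete, here is a minimal plain-Python sketch of what a history wrapper does conceptually. It is not LangChain's implementation: `with_history`, `fake_chain`, and the tuple-based message format are hypothetical names chosen only for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, text); a simplified stand-in for chat message objects


def with_history(
    chain: Callable[[Dict[str, Any]], str],
    history: List[Message],
    history_key: str = "chat_history",
    input_key: str = "input",
) -> Callable[[Dict[str, Any]], str]:
    """Wrap `chain` so history is injected before the call and updated after it."""

    def invoke(inputs: Dict[str, Any]) -> str:
        # Inject a copy of the stored messages under the key the prompt expects.
        enriched = {**inputs, history_key: list(history)}
        answer = chain(enriched)
        # Record the human turn first, then the AI turn.
        history.append(("human", str(inputs[input_key])))
        history.append(("ai", answer))
        return answer

    return invoke


def fake_chain(inputs: Dict[str, Any]) -> str:
    # Hypothetical stand-in for `prompt | llm | parser`: reports how much history it saw.
    return f"seen {len(inputs['chat_history'])} earlier messages"


history: List[Message] = []
chat = with_history(fake_chain, history)
first = chat({"input": "what is the weather like in Seattle WA?"})
second = chat({"input": "what is it like in autumn?"})
```

If `history_key` did not match the name the chain reads (`chat_history` in this sketch), the chain would not find the earlier messages, which is the reason the key has to agree with the prompt template.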

In [15]:
from langchain_core.chat_history import BaseChatMessageHistory, InMemoryChatMessageHistory
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_aws.chat_models.bedrock import ChatBedrock

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a pirate. Answer the following questions as best you can."),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
    ]
)

history = InMemoryChatMessageHistory()

def get_history():
    return history


model_parameter = {"temperature": 0.0, "top_p": .5, "max_tokens_to_sample": 2000}
modelId = "meta.llama3-8b-instruct-v1:0" #"anthropic.claude-v2"
chatbedrock_llm = ChatBedrock(
    model_id=modelId,
    client=boto3_bedrock,
    model_kwargs=model_parameter, 
    beta_use_converse_api=True
)

chain = prompt | chatbedrock_llm | StrOutputParser()

wrapped_chain = RunnableWithMessageHistory(
    chain,
    get_history,
    history_messages_key="chat_history",
)

print_ww(wrapped_chain.invoke({"input": "what is the weather like in Seattle WA?"}))


print_ww(f"\nINPUT_SCHEMA::{wrapped_chain.input_schema.schema()}")
print_ww(f"\nCHAIN:SCHEMA::{wrapped_chain.schema()}")
print_ww(f"\nOUTPUT_SCHEMA::{wrapped_chain.output_schema.schema()}")


print("\n\nThe next example shows how to use a different chat history for each session.")




Arrr, shiver me timbers! Seattle, ye say? Well, matey, I've had me share o' adventures on the high
seas, but I've never set foot in that damp and drizzly place. But I've heard tell from me mateys
who've sailed those waters that Seattle's weather be as unpredictable as a barnacle on a ship's
hull!

From what I've gathered, Seattle's got a reputation for bein' a soggy place, with rain comin' down
like a stormy sea on most days o' the year. The clouds be gray and thick, like a pirate's beard
after a long voyage at sea. And don't even get me started on the wind, matey! It be as fierce as a
sea monster, blowin' in from the Pacific and makin' ye want to tie yerself to the mast!

But, I've also heard that when the sun does come out, it be as bright as a chest overflowin' with
gold doubloons! So, if ye be lookin' for a bit o' sunshine, ye might want to keep yer eye on the
forecast, matey!

So, there ye have it, me take on the weather in Seattle, WA. Now, if ye'll excuse me, I've got to
get b

In [16]:
print(history)
# history.add_ai_message
# history.add_user_message

Human: what is the weather like in Seattle WA?
AI: 

Arrr, shiver me timbers! Seattle, ye say? Well, matey, I've had me share o' adventures on the high seas, but I've never set foot in that damp and drizzly place. But I've heard tell from me mateys who've sailed those waters that Seattle's weather be as unpredictable as a barnacle on a ship's hull!

From what I've gathered, Seattle's got a reputation for bein' a soggy place, with rain comin' down like a stormy sea on most days o' the year. The clouds be gray and thick, like a pirate's beard after a long voyage at sea. And don't even get me started on the wind, matey! It be as fierce as a sea monster, blowin' in from the Pacific and makin' ye want to tie yerself to the mast!

But, I've also heard that when the sun does come out, it be as bright as a chest overflowin' with gold doubloons! So, if ye be lookin' for a bit o' sunshine, ye might want to keep yer eye on the forecast, matey!

So, there ye have it, me take on the weather in Seat

#### Use multiple session IDs with in-memory conversations

In [17]:
### The code below keeps a separate in-memory history for each session ID
store = {}
def get_session_history(session_id: str) -> BaseChatMessageHistory:
    #print(session_id)
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

chain = prompt | chatbedrock_llm | StrOutputParser()

wrapped_chain = RunnableWithMessageHistory(
    chain,
    get_session_history,
    history_messages_key="chat_history",
)

print_ww(wrapped_chain.invoke(
    {"input": "what is the weather like in Seattle WA"},
    config={"configurable": {"session_id": "abc123"}},
))

print("\n\n now ask another question and we will see the History conversation was maintained")
print_ww(wrapped_chain.invoke(
    {"input": "Ok what are benefits of this weather in 100 words?"},
    config={"configurable": {"session_id": "abc123"}},
))

print("\n\n now check the history")
print(store["abc123"])  # the session history lives in `store`, not in the earlier global `history`



Arrr, shiver me timbers! As a pirate, I be more familiar with the high seas than the landlubbers'
weather forecasts. But, I've heard tell of Seattle, Washington bein' a damp and drizzly place,
especially in the winter months. They call it the "Emerald City" due to its lush greenery, but I
reckon it's more like the "Grey City" with all the overcast skies!

In the summer, the weather be mild and pleasant, with temperatures rangein' from 65 to 85 degrees
Fahrenheit (18 to 30 degrees Celsius). But don't ye be thinkin' it's all sunshine and rainbows,
matey! The Pacific Northwest be known for its rain, and Seattle gets its fair share o'
precipitation, even in the summer. So, pack yer waterproof gear and a good sense o' humor!

In the winter, it be a different story altogether. The temperatures drop, and the rain turns to snow
and ice. It be a good idea to keep yer wits about ye and yer sea legs steady, or ye might find
yerself walkin' the plank into a puddle o' slush!

So, there ye have it

#### Now we build a conversational chain with history and add a retriever to it


[Docs link](https://python.langchain.com/v0.2/docs/versions/migrating_chains/conversation_retrieval_chain/)

**Chat history must be a list of messages, because this is a messages API; the entries alternate between human and AI messages.**

The `ConversationalRetrievalChain` was an all-in-one chain that combined retrieval-augmented generation with chat history, allowing you to "chat with" your documents.

Advantages of switching to the LCEL implementation are similar to those for `RetrievalQA`:

1. Clearer internals. The `ConversationalRetrievalChain` hides an entire question-rephrasing step, which dereferences the initial query against the chat history. This means the class contains two sets of configurable prompts, LLMs, etc.
2. Source documents are easier to return.
3. Support for runnable methods like streaming and async operations.

**Below are the key classes to be used**

1. We create a QA chain as `qa_chain = create_stuff_documents_chain(chatbedrock_llm, qa_prompt)`
2. Then we create the retrieval chain with history using `create_retrieval_chain(history_aware_retriever, qa_chain)`
3. The retriever is wrapped with `create_history_aware_retriever`
4. `{context}` goes into the system prompt of the prompt template
5. Chat history goes into the prompt template as `("placeholder", "{chat_history}")`

The LCEL implementation exposes the internals of what's happening around retrieving, formatting documents, and passing them through a prompt to the LLM, but it is more verbose. You can customize and wrap this composition logic in a helper function, or use the higher-level `create_retrieval_chain` and `create_stuff_documents_chain` helper methods.
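
The composition described above (retrieve, format the documents, pass them through a prompt to the LLM) can be sketched in plain Python. This is only an illustration of the data flow, not how the LangChain helpers are implemented; `make_retrieval_chain`, `toy_retriever`, and `toy_llm` are hypothetical names, and documents are simplified to plain strings.

```python
from typing import Any, Callable, Dict, List


def format_documents(documents: List[str]) -> str:
    # "Stuff" strategy: concatenate every retrieved document into one context string.
    return "\n\n".join(documents)


def make_retrieval_chain(
    retriever: Callable[[str], List[str]],
    llm: Callable[[str], str],
    system_template: str,
) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    def invoke(inputs: Dict[str, Any]) -> Dict[str, Any]:
        documents = retriever(inputs["input"])   # 1. retrieve
        context = format_documents(documents)    # 2. format
        prompt = system_template.format(context=context) + "\nHuman: " + inputs["input"]
        answer = llm(prompt)                     # 3. generate
        # Keep the retrieved documents next to the answer so sources can be inspected.
        return {**inputs, "context": documents, "answer": answer}

    return invoke


# Hypothetical stand-ins for a vector store retriever and a chat model.
def toy_retriever(query: str) -> List[str]:
    corpus = ["SageMaker Clarify reports feature importance.", "FAISS is a vector index."]
    words = query.lower().split()
    return [doc for doc in corpus if any(word in doc.lower() for word in words)]


def toy_llm(prompt: str) -> str:
    return f"(answer based on {len(prompt)} prompt characters)"


chain = make_retrieval_chain(toy_retriever, toy_llm, "Use this context:\n{context}")
result = chain({"input": "clarify"})
```

Returning the retrieved documents under their own key next to the answer is what makes source documents easy to inspect in this style of chain.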

#### FAISS as VectorStore

To use embeddings for search, we need a store that can efficiently perform vector similarity searches. In this notebook we use FAISS, which is an in-memory store. To store vectors permanently, one can use pgVector, Pinecone or Chroma.

The LangChain VectorStore APIs are available [here](https://python.langchain.com/en/harrison-docs-refactor-3-24/reference/modules/vectorstore.html).

To learn more about the FAISS vector store, please refer to this [document](https://arxiv.org/pdf/1702.08734.pdf).
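
As a rough illustration of what "vector similarity search" means, the sketch below performs an exhaustive nearest-neighbour scan using L2 distance. It does not use the FAISS API, and the function names and toy vectors are made up for this example; a library such as FAISS exists precisely to do this far more efficiently on large collections.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(
    index: List[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[int, float]]:
    """Return (position, distance) for the k stored vectors closest to `query`."""
    scored = [(position, l2_distance(vector, query)) for position, vector in enumerate(index)]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy 2-dimensional "embeddings"; real embedding vectors have many more dimensions.
index = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.2], [5.0, 5.0]]
hits = nearest(index, query=[1.0, 1.1], k=2)
```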

#### Titan embeddings Model

Embeddings are a way to represent words, phrases or any other discrete items as vectors in a continuous vector space. This allows machine learning models to perform mathematical operations on these representations and capture semantic relationships between them.

Embeddings are used, for example, in the RAG [document search capability](https://labelbox.com/blog/how-vector-similarity-search-works/).


In [18]:
from langchain_community.document_loaders import CSVLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.vectorstores import FAISS

from langchain_aws import BedrockEmbeddings

br_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1", client=boto3_bedrock)

s3_path = "s3://jumpstart-cache-prod-us-east-2/training-datasets/Amazon_SageMaker_FAQs/Amazon_SageMaker_FAQs.csv"
!aws s3 cp $s3_path ./rag_data/Amazon_SageMaker_FAQs.csv

loader = CSVLoader("./rag_data/Amazon_SageMaker_FAQs.csv") # each row consists of a question column and an answer column
documents_aws = loader.load()
print(f"Number of documents={len(documents_aws)}")

docs = CharacterTextSplitter(chunk_size=2000, chunk_overlap=400, separator=",").split_documents(documents_aws)

print(f"Number of documents after split and chunking={len(docs)}")
vectorstore_faiss_aws = FAISS.from_documents(
    documents=docs,
    embedding=br_embeddings,
)

print(f"vectorstore_faiss_aws: number of elements in the index={vectorstore_faiss_aws.index.ntotal}::")




download: s3://jumpstart-cache-prod-us-east-2/training-datasets/Amazon_SageMaker_FAQs/Amazon_SageMaker_FAQs.csv to rag_data/Amazon_SageMaker_FAQs.csv
Number of documents=153
Number of documents after split and chunking=154
vectorstore_faiss_aws: number of elements in the index=154::


#### First we do the simple retrieval QA chain: no chat history, but with a retriever
[Docs link](https://python.langchain.com/v0.2/docs/versions/migrating_chains/retrieval_qa/)

Key points
1. The QA chain passes the user query in under a single variable name, which can be `input` or `question`, so the human message in the prompt template has to use that same variable.
2. This chain calls the retriever with the question and then answers using the retrieved context.
3. Our prompt template tells the model to say it lacks context when the retrieved documents do not contain the answer, so unrelated questions get no real answer.
4. The retrieved context goes into the system prompt.

In [19]:
ChatPromptTemplate.from_messages(
    [
        ("system", "You are a pirate. Answer the following questions as best you can."),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
    ]
).invoke({'input': 'test_input', 'chat_history' : chat_history_messages})

ChatPromptValue(messages=[SystemMessage(content='You are a pirate. Answer the following questions as best you can.'), HumanMessage(content='What is the weather like in Seattle WA?'), AIMessage(content="Ahoy matey! As a pirate, I don't spend much time on land, but I've heard tales of the weather in Seattle."), HumanMessage(content='test_input')])

In [20]:
vectorstore_faiss_aws.as_retriever()

VectorStoreRetriever(tags=['FAISS', 'BedrockEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x119a1ddd0>)

### The retriever is invoked with the user input

1. The retriever fetches the context, which is then added to the inputs as a string.
2. The chain passes it to the prompt as `context`, matching the variable name in the prompt template.
3. The same approach could have been used for the memory if we wanted to send a plain string as input.

The input is a string because we convert it to a dict as the very first step of the chain.
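
The "string in, dict out" first step can be pictured with a small plain-Python sketch: every entry in the mapping is called with the same input, and the results are collected under the matching keys. `run_parallel`, `toy_retriever`, and `passthrough` are hypothetical helpers for illustration, not LangChain functions.

```python
from typing import Any, Callable, Dict


def run_parallel(steps: Dict[str, Callable[[Any], Any]], value: Any) -> Dict[str, Any]:
    """Call every step with the same input and collect the results under each key."""
    return {key: step(value) for key, step in steps.items()}


def toy_retriever(question: str) -> str:
    # Hypothetical retriever that returns already-formatted context text.
    return f"[documents matching: {question}]"


def passthrough(value: Any) -> Any:
    # Forwards the original input unchanged.
    return value


prompt_inputs = run_parallel(
    {"context": toy_retriever, "input": passthrough},
    "What is SageMaker used for?",
)
```

The resulting dict has exactly the keys that the prompt template expects (`context` and `input`), which is why the chain itself can be invoked with a plain string.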

In [21]:
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.runnables import Runnable, RunnableLambda, RunnablePassthrough
from langchain_core.runnables.config import RunnableConfig

condense_question_system_template = (
    """
    You are an assistant for question-answering tasks. ONLY Use the following pieces of retrieved context to answer the question.
    If the answer is not in the context below , just say you do not have enough context. 
    If you don't know the answer, just say that you don't know. 
    Use three sentences maximum and keep the answer concise.
    Context: {context} 
    """
)

condense_question_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", condense_question_system_template),
        ("human", "{input}"), # expected by the qa chain as it sends in question as the variable
    ]
)

model_parameter = {"temperature": 0.0, "top_p": .5, "max_tokens_to_sample": 2000}
modelId = "meta.llama3-8b-instruct-v1:0" #"anthropic.claude-v2"
chatbedrock_llm = ChatBedrock(
    model_id=modelId,
    client=boto3_bedrock,
    model_kwargs=model_parameter, 
    beta_use_converse_api=True
)


def format_docs(docs):
    #print(docs)
    return "\n\n".join(doc.page_content for doc in docs)

#- second way to create a callback runnable function--
def debug_inputs(input_dict: dict, config: RunnableConfig) -> dict:
    #print_ww(f"debug_inputs::input_dict:{type(input_dict)}::value::{input_dict}::config={config}") #- if we do dict at start of chain -- {'input': {'input': 'what is the weather like in Seattle WA?', 'chat_history':
    return input_dict # return the text as is

chat_user_debug = RunnableLambda(debug_inputs)

# The chain 
qa_chain = (
    {
        "context": vectorstore_faiss_aws.as_retriever() | format_docs, # can work even without the format
        "input": RunnablePassthrough(),
    }
    | chat_user_debug
    | condense_question_prompt
    | chatbedrock_llm
    | StrOutputParser()
)

print_ww(qa_chain.invoke(input="What are autonomous agents?")) # pass a plain string here: the first step of the chain builds the dict

print_ww(qa_chain.invoke(input="What is SageMaker used for?")) # again a plain string, not a dict



I don't have enough context to answer this question. The provided context only mentions autonomous
vehicles and HVAC as examples of applications where reinforcement learning can be used, but it does
not define what autonomous agents are.


Amazon SageMaker is a fully managed service to prepare data and build, train, and deploy machine
learning (ML) models for any use case with fully managed infrastructure, tools, and workflows.


#### Alternative way of creating the chain with a retriever, asking a valid question (no chat history)

1. Now we get a real answer, because the retriever supplies relevant context

2. Use the helper methods to create the retriever QA chain

In [24]:
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

condense_question_system_template = (
    """
    You are an assistant for question-answering tasks. ONLY Use the following pieces of retrieved context to answer the question.
    If the answer is not in the context below , just say you do not have enough context. 
    If you don't know the answer, just say that you don't know. 
    Use three sentences maximum and keep the answer concise.
    Context: {context} 
    """
)

condense_question_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", condense_question_system_template),
        ("human", "{input}"),
    ]
)
model_parameter = {"temperature": 0.0, "top_p": .5, "max_tokens_to_sample": 2000}
modelId = "meta.llama3-8b-instruct-v1:0" #"anthropic.claude-v2"
chatbedrock_llm = ChatBedrock(
    model_id=modelId,
    client=boto3_bedrock,
    model_kwargs=model_parameter, 
    beta_use_converse_api=True
)
qa_chain = create_stuff_documents_chain(chatbedrock_llm, condense_question_prompt)

convo_qa_chain = create_retrieval_chain(vectorstore_faiss_aws.as_retriever(), qa_chain)

# - view the keys

print_ww(convo_qa_chain.invoke(
    {'input':"What are the options for model explainability in SageMaker?", 
      'config':{"configurable": {"session_id": "abc123"}}, # passed as an ordinary input key; this chain ignores it
    }).keys()) # the input must be a dict here, with the question under 'input'

# view the actual output
print("\n return values\n")
print_ww(convo_qa_chain.invoke(
    {'input':"What are the options for model explainability in SageMaker?", 
      'config':{"configurable": {"session_id": "abc123"}}, # this param is not used in this chain
    })) # the input must be a dict here



dict_keys(['input', 'config', 'context', 'answer'])

 return values

{'input': 'What are the options for model explainability in SageMaker?', 'config': {'configurable':
{'session_id': 'abc123'}}, 'context': [Document(metadata={'source':
'./rag_data/Amazon_SageMaker_FAQs.csv', 'row': 11}, page_content='\ufeffWhat is Amazon SageMaker?:
How does Amazon SageMaker Clarify improve model explainability?\nAmazon SageMaker is a fully managed
service to prepare data and build, train, and deploy machine learning (ML) models for any use case
with fully managed infrastructure, tools, and workflows.: Amazon SageMaker Clarify is integrated
with Amazon SageMaker Experiments to provide a feature importance graph detailing the importance of
each input for your model’s overall decision-making process after the model has been trained. These
details can help determine if a particular model input has more influence than it should on overall
model behavior. SageMaker Clarify also makes explanations for indiv

#### View the Chain

In [25]:
convo_qa_chain

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['FAISS', 'BedrockEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x119a1ddd0>), config={'run_name': 'retrieve_documents'})
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), config={'run_name': 'format_inputs'})
            | ChatPromptTemplate(input_variables=['context', 'input'], messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], template="\n    You are an assistant for question-answering tasks. ONLY Use the following pieces of retrieved context to answer the question.\n    If the answer is not in the context below , just say you do not have enough context. \n    If you don't know the answer, just say that you don't know. \n    Use three

#### Now we create a chat conversation that has both history and retrieval context: first a plain history chain, then the advanced option of rewriting the query
So we use the history-aware retriever and create a chain

1. We create a stuff-documents chain
2. Then we pass it to `create_retrieval_chain`; we could have used LCEL to create the chain as well
3. If we need the advanced option of first reformulating the question against the chat history with an LLM call, we use `create_history_aware_retriever`

**However, to actually record the history we need to wrap the chain with `RunnableWithMessageHistory`**
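
The branching behaviour of a history-aware retriever can be sketched in plain Python: with no earlier turns the question goes straight to the retriever, otherwise it is first rewritten into a standalone question. This is a simplified illustration under assumed names (`make_history_aware_retriever`, `toy_rewrite`, `toy_retriever`), with history reduced to a list of strings and the LLM rewrite replaced by string concatenation.

```python
from typing import Any, Callable, Dict, List


def make_history_aware_retriever(
    rewrite: Callable[[List[str], str], str],
    retriever: Callable[[str], List[str]],
) -> Callable[[Dict[str, Any]], List[str]]:
    def invoke(inputs: Dict[str, Any]) -> List[str]:
        history = inputs.get("chat_history") or []
        if not history:
            # No earlier turns: search with the question as typed, no extra LLM call.
            return retriever(inputs["input"])
        # Earlier turns exist: first turn the follow-up into a standalone question.
        return retriever(rewrite(history, inputs["input"]))

    return invoke


queries_seen: List[str] = []


def toy_retriever(query: str) -> List[str]:
    queries_seen.append(query)  # record what the retriever was actually asked
    return [f"doc for: {query}"]


def toy_rewrite(history: List[str], question: str) -> str:
    # Hypothetical stand-in for the LLM rewrite: attach the first earlier message.
    return f"{question} (about: {history[0]})"


retrieve = make_history_aware_retriever(toy_rewrite, toy_retriever)
retrieve({"input": "What kind of bias can SageMaker detect?", "chat_history": []})
retrieve({
    "input": "Will it help?",
    "chat_history": ["What kind of bias can SageMaker detect?", "Clarify detects statistical bias."],
})
```

The second call shows why the rewrite matters: "Will it help?" on its own gives the retriever nothing to search for, while the rewritten query carries the topic of the conversation.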

In [26]:
from langchain import hub
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import Runnable, RunnableLambda, RunnablePassthrough
from langchain_core.runnables.config import RunnableConfig


### The code below keeps a separate in-memory history for each session ID
store = {}
def get_session_history(session_id: str) -> BaseChatMessageHistory:
    #print(session_id)
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

contextualized_question_system_template = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)

contextualized_question_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualized_question_system_template),
        ("human", "{input}"),
    ]
)
history_aware_retriever = create_history_aware_retriever(
    chatbedrock_llm, vectorstore_faiss_aws.as_retriever(), contextualized_question_prompt
)

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
    
)

qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("placeholder", "{chat_history}"),
        ("human", "Explain this  {input}."),
    ]
)

qa_chain = create_stuff_documents_chain(chatbedrock_llm, qa_prompt)

convo_qa_chain = create_retrieval_chain(
    history_aware_retriever, 
    #vectorstore_faiss_aws.as_retriever(),
    qa_chain
)

print_ww(f"\n{convo_qa_chain}::\n")

convo_qa_chain.invoke(
    {
        "input": "What are the options for model explainability in SageMaker?",
        "chat_history": [],
    }
)


bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableBranch(branches=[(RunnableLambda(lambda x: not
x.get('chat_history', False)), RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['FAISS', 'BedrockEmbeddings'],
vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x119a1ddd0>))],
default=ChatPromptTemplate(input_variables=['input'],
messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='Given a
chat history and the latest user question which might reference context in the chat history,
formulate a standalone question which can be understood without the chat history. Do NOT answer the
question, just reformulate it if needed and otherwise return it as is.')),
HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], template='{input}'))])
           | ChatBedrock(client=<botocore.client.BedrockRuntime object at 0x1141585d0>,
model_id='meta.llama3-8b-instruct-v1:0', model_kwa

{'input': 'What are the options for model explainability in SageMaker?',
 'chat_history': [],
 'context': [Document(metadata={'source': './rag_data/Amazon_SageMaker_FAQs.csv', 'row': 11}, page_content='\ufeffWhat is Amazon SageMaker?: How does Amazon SageMaker Clarify improve model explainability?\nAmazon SageMaker is a fully managed service to prepare data and build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows.: Amazon SageMaker Clarify is integrated with Amazon SageMaker Experiments to provide a feature importance graph detailing the importance of each input for your model’s overall decision-making process after the model has been trained. These details can help determine if a particular model input has more influence than it should on overall model behavior. SageMaker Clarify also makes explanations for individual predictions available via an API.'),
  Document(metadata={'source': './rag_data/Amazon_SageMake

#### Automatically add the history to the chat with retriever

Wrap the chain with `RunnableWithMessageHistory`, keyed by session ID, and run the chat conversation

![Amazon Bedrock - Conversational Interface](./images/context_aware_history_retriever.png)

Image borrowed from https://github.com/langchain-ai/langchain

In [34]:
from langchain import hub
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder  # MessagesPlaceholder is used in the prompts below
from langchain_core.runnables import Runnable, RunnableLambda, RunnablePassthrough
from langchain_core.runnables.config import RunnableConfig


### The code below keeps a separate in-memory history for each session ID
store = {}
def get_session_history(session_id: str) -> BaseChatMessageHistory:
    #print(session_id)
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

model_parameter = {"temperature": 0.0, "top_p": .5, "max_tokens_to_sample": 2000}
modelId = "meta.llama3-8b-instruct-v1:0" #"anthropic.claude-v2"
chatbedrock_llm = ChatBedrock(
    model_id=modelId,
    client=boto3_bedrock,
    model_kwargs=model_parameter, 
    beta_use_converse_api=True
)

contextualized_question_system_template = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)

contextualized_question_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualized_question_system_template),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

#- we will not use this below
# history_aware_retriever = create_history_aware_retriever(
#     chatbedrock_llm, vectorstore_faiss_aws.as_retriever(), contextualized_question_prompt
# )


qa_system_prompt = """You are an assistant for question-answering tasks. \
Use the following pieces of retrieved context to answer the question. \
If the answer is not present in the context, just say you do not have enough context to answer. \
If the input is not present in the context, just say you do not have enough context to answer. \
If the question is not present in the context, just say you do not have enough context to answer. \
If you don't know the answer, just say that you don't know. \
Use three sentences maximum and keep the answer concise.\

{context}"""

qa_prompt = ChatPromptTemplate.from_messages([
    ("system", qa_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}")
])
question_answer_chain = create_stuff_documents_chain(chatbedrock_llm, qa_prompt)

#rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain) # - this works but adds a call to the LLM to rewrite the question
rag_chain = create_retrieval_chain(vectorstore_faiss_aws.as_retriever(), question_answer_chain) # - plain retriever: no extra LLM call, the raw input is used as the search query

#- Wrap the rag_chain with RunnableWithMessageHistory to automatically handle chat history:

chain_with_history = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)


In [35]:
result = chain_with_history.invoke(
    {"input": "What kind of bias can SageMaker detect?"},
    config={"configurable": {"session_id": "session_1"}}
)
result

{'input': 'What kind of bias can SageMaker detect?',
 'chat_history': [],
 'context': [Document(metadata={'source': './rag_data/Amazon_SageMaker_FAQs.csv', 'row': 10}, page_content="\ufeffWhat is Amazon SageMaker?: What kind of bias does Amazon SageMaker Clarify detect?\nAmazon SageMaker is a fully managed service to prepare data and build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows.: Measuring bias in ML models is a first step to mitigating bias. Bias may be measured before training and after training, as well as for inference for a deployed model. Each measure of bias corresponds to a different notion of fairness. Even considering simple notions of fairness leads to many different measures applicable in various contexts. You need to choose bias notions and metrics that are valid for the application and the situation under investigation. SageMaker currently supports the computation of different bias metrics f

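### As a follow-on question

1. With the history-aware retriever, a phrase such as `it` or `this` would be rewritten into a standalone question based on the chat history. The chain above uses the plain retriever, so the follow-up text is used as the search query unchanged.
2. The chat history is still passed to the answering prompt, so the model sees the earlier turns alongside the retrieved context.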

In [36]:
follow_up_result = chain_with_history.invoke(
    {"input": "What are common ways of implementing this?"},
    config={"configurable": {"session_id": "session_1"}}
)
print_ww(follow_up_result)

{'input': 'What are common ways of implementing this?', 'chat_history': [HumanMessage(content='What
kind of bias can SageMaker detect?'), AIMessage(content='\n\nAmazon SageMaker Clarify detects
statistical bias across the entire ML workflow, including imbalances during data preparation, after
training, and ongoing over time.')], 'context': [Document(metadata={'source':
'./rag_data/Amazon_SageMaker_FAQs.csv', 'row': 105}, page_content='\ufeffWhat is Amazon SageMaker?:
When should I use reinforcement learning?\nAmazon SageMaker is a fully managed service to prepare
data and build, train, and deploy machine learning (ML) models for any use case with fully managed
infrastructure, tools, and workflows.: While the goal of supervised learning techniques is to find
the right answer based on the patterns in the training data, the goal of unsupervised learning
techniques is to find similarities and differences between data points. In contrast, the goal of
reinforcement learning (RL) techniques i

In [37]:
follow_up_result = chain_with_history.invoke(
    {"input": "Will it help?"},
    config={"configurable": {"session_id": "session_1"}}
)
print_ww(follow_up_result)

{'input': 'Will it help?', 'chat_history': [HumanMessage(content='What kind of bias can SageMaker
detect?'), AIMessage(content='\n\nAmazon SageMaker Clarify detects statistical bias across the
entire ML workflow, including imbalances during data preparation, after training, and ongoing over
time.'), HumanMessage(content='What are common ways of implementing this?'),
AIMessage(content="\n\nI don't have enough context to answer. The provided context does not mention
specific ways of implementing bias detection in Amazon SageMaker.")], 'context':
[Document(metadata={'source': './rag_data/Amazon_SageMaker_FAQs.csv', 'row': 101},
page_content='\ufeffWhat is Amazon SageMaker?: How do I decide to use Amazon SageMaker Autopilot or
Automatic Model Tuning?\nAmazon SageMaker is a fully managed service to prepare data and build,
train, and deploy machine learning (ML) models for any use case with fully managed infrastructure,
tools, and workflows.: Amazon SageMaker Autopilot automates everything i

#### Now ask an unrelated question

In [None]:
follow_up_result = chain_with_history.invoke(
    {"input": "Give me a few tips on how to plant a  new garden."},
    config={"configurable": {"session_id": "session_1"}}
)
follow_up_result

Let's see how the semantic search works:
1. First we calculate the embeddings vector for the query, and
2. then we use this vector to do a similarity search on the store

In [None]:
v = br_embeddings.embed_query("R in SageMaker")
print(v[0:10])
results = vectorstore_faiss_aws.similarity_search_by_vector(v, k=4)
for r in results:
    print_ww(r.page_content)
    print('----')
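
To make the two steps above concrete, here is a minimal pure-Python sketch of what a similarity search by vector does conceptually. It is not the FAISS implementation: the toy 3-dimensional vectors, the document texts, and the helper names `l2_distance` and `toy_similarity_search` are all made up for illustration, and it assumes Euclidean (L2) distance as the metric, which may differ from how your index is configured.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def toy_similarity_search(query_vec, store, k=4):
    """Return the k texts whose vectors are closest to query_vec.

    `store` is a list of (text, vector) pairs: a toy stand-in for a vector store.
    """
    ranked = sorted(store, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]

# Hypothetical documents with made-up 3-dimensional "embeddings".
toy_store = [
    ("doc about R in SageMaker", [0.9, 0.1, 0.0]),
    ("doc about bias detection", [0.1, 0.9, 0.0]),
    ("doc about gardening", [0.0, 0.1, 0.9]),
]

# In the notebook this vector would come from br_embeddings.embed_query(...).
query_vector = [0.8, 0.2, 0.1]
top_matches = toy_similarity_search(query_vector, toy_store, k=2)
print(top_matches)
```

A real vector store does the same ranking over much larger vectors and uses an index so that it does not have to compare the query against every stored vector one by one.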

#### Memory
In any chatbot, we need a QA chain with options customized for the use case. A chatbot must also always keep the history of the conversation so that the model can take it into account when answering. In this example we use the [ConversationalRetrievalChain](https://python.langchain.com/docs/modules/chains/popular/chat_vector_db) from LangChain, together with a `ConversationBufferMemory`, to keep the history of the conversation.

Source: https://python.langchain.com/docs/modules/chains/popular/chat_vector_db

Set `verbose` to `True` to see what is going on behind the scenes.

In [None]:
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT

print_ww(CONDENSE_QUESTION_PROMPT.template)

#### Parameters used for ConversationalRetrievalChain
* **retriever**: We used `VectorStoreRetriever`, which is backed by a `VectorStore`. To retrieve text, there are two search types you can choose: `"similarity"` or `"mmr"`. `search_type="similarity"` uses similarity search in the retriever object where it selects text chunk vectors that are most similar to the question vector.

* **memory**: The memory object that stores the conversation history.

* **condense_question_prompt**: Given a question from the user, we combine it with the previous conversation to produce a standalone question.

* **chain_type**: If the chat history and retrieved text are too long to fit in the model's context window, use this parameter to choose how the text is combined. The options are `stuff`, `refine`, `map_reduce`, and `map_rerank`.

If the question asked is outside the scope of the context, the model will reply that it doesn't know the answer.

**Note**: if you are curious how the chain works, uncomment the `verbose=True` line.
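
The flow described by these parameters (condense the follow-up into a standalone question, retrieve, answer, then store the turn in memory) can be sketched in plain Python. This is only an illustration of the idea, not LangChain's implementation: `condense_question`, `toy_retrieve`, `toy_answer`, and `chat_turn` are hypothetical stand-ins, and the two places where a real chain would call the LLM are replaced by simple string operations.

```python
def condense_question(chat_history, question):
    """Stand-in for the LLM call that rewrites a follow-up as a standalone question.

    A real chain would send the history and the question to the model using the
    condense-question prompt; here we only splice in the last human turn.
    """
    if not chat_history:
        return question
    last_human_turn = chat_history[-1][0]
    return f"{question} (in the context of: {last_human_turn})"

def toy_retrieve(standalone_question, documents, k=2):
    """Stand-in retriever: rank documents by the number of shared lowercase words."""
    query_words = set(standalone_question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def toy_answer(standalone_question, context_docs):
    """Stand-in for the LLM call that answers from the retrieved context."""
    return f"Answer based on {len(context_docs)} retrieved document(s)."

def chat_turn(question, chat_history, documents):
    """Run one turn and append (question, answer) to the history (the 'memory')."""
    standalone = condense_question(chat_history, question)
    docs = toy_retrieve(standalone, documents)
    answer = toy_answer(standalone, docs)
    chat_history.append((question, answer))
    return standalone, docs, answer

# Hypothetical mini knowledge base and an empty conversation memory.
documents = ["clarify can detect bias", "autopilot automates model building", "notebooks support r"]
history = []

first_standalone, _, _ = chat_turn("What kind of bias can SageMaker detect?", history, documents)
second_standalone, second_docs, _ = chat_turn("Will it help?", history, documents)
print(second_standalone)
```

The second question, "Will it help?", is ambiguous on its own; it is the condensing step that gives the retriever something meaningful to search for.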

#### Do some prompt engineering

You can "tune" your prompt to get more or less verbose answers. For example, try changing the number of sentences, or remove that instruction altogether. You might also need to change the value of `max_tokens` (e.g. 1000 or 2000) to get the full answer.
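
As a rough sketch of this kind of tuning, the helper below builds a prompt with an optional sentence limit. The function name `build_prompt`, the template wording, and the placeholder context are assumptions made for illustration; they are not the prompt used elsewhere in this notebook.

```python
def build_prompt(context, question, max_sentences=None):
    """Assemble a QA prompt; max_sentences=None drops the length instruction."""
    length_rule = f"Answer in at most {max_sentences} sentences. " if max_sentences else ""
    return (
        "Use the following context to answer the question. "
        f"{length_rule}"
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context: {context}\n\nQuestion: {question}\n\nAnswer:"
    )

# Same question, with and without the length instruction.
concise_prompt = build_prompt("<retrieved text>", "What is SageMaker Clarify?", max_sentences=3)
open_prompt = build_prompt("<retrieved text>", "What is SageMaker Clarify?")
print(concise_prompt)
```

If you remove the length instruction, answers tend to get longer, which is when a larger `max_tokens` value becomes relevant.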

### In this demo we used the Claude V3 Sonnet LLM to create a conversational interface with the following patterns:

1. Chatbot (Basic - without context)

2. Chatbot using prompt template (LangChain)

3. Chatbot with personas

4. Chatbot with context