# AIM329 - Build a chat assistant with Amazon Bedrock

Welcome to session AIM329 of AWS re:Invent 2023 - Build a chat assistant with Amazon Bedrock.

This notebook will walk you through the process of building a chat assistant using a Large Language Model (LLM) hosted on [Amazon Bedrock](https://aws.amazon.com/bedrock/). We will use the [Retrieval Augmented Generation (RAG)](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-customize-rag.html) architecture with an Embeddings Model hosted on Amazon Bedrock to convert raw text to vectors and store and search them in an [Amazon OpenSearch Serverless](https://aws.amazon.com/opensearch-service/features/serverless/) collection.

<div class="alert alert-block alert-info">
<b>Note:</b>
    <ul>
        <li>This notebook should only be run from within an <a href="https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html">Amazon SageMaker Notebook instance</a> or within an <a href="https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks.html">Amazon SageMaker Studio Notebook</a> and in an AWS Region that supports <a href="https://aws.amazon.com/opensearch-service/features/serverless/">Amazon OpenSearch Serverless</a>.</li>
        <li>At the time of writing this notebook, Amazon Bedrock was only available in <a href="https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html#bedrock-regions">these supported AWS Regions</a>. If you are running this notebook from any other AWS Region, then you have to change the Amazon Bedrock client's region and/or endpoint URL parameters to one of those supported AWS Regions. Follow the guidance in the <i>Organize imports</i> section of this notebook.</li>
        <li>This notebook is recommended to be run with a minimum instance size of <i>ml.m5.xlarge</i> and
            <ul>
                <li>With <i>Amazon Linux 2, Jupyter Lab 3</i> as the platform identifier on an Amazon SageMaker Notebook instance.</li>
                <li> (or)</li>
                <li>With <i>Data Science 3.0</i> as the image on an Amazon SageMaker Studio Notebook.</li>
            </ul>
        </li>
        <li>At the time of writing, the most suitable kernel for running this notebook,
            <ul>
                <li>On an Amazon SageMaker Notebook instance was <i>conda_python3</i></li>
                <li>On an Amazon SageMaker Studio Notebook was <i>Python 3</i></li>
            </ul>
        </li>
    </ul>
</div>

**Table of Contents:**

1. [Complete prerequisites](#Complete%20prerequisites)

    1. [Check and configure access to the Internet](#Check%20and%20configure%20access%20to%20the%20Internet)

    2. [Install required software libraries](#Install%20required%20software%20libraries)
    
    3. [Configure logging](#Configure%20logging)
        
        1. [System logs](#Configure%20system%20logs)
        
        2. [Application logs](#Configure%20application%20logs)
    
    4. [Organize imports](#Organize%20imports)
    
    5. [Set AWS Region and boto3 config](#Set%20AWS%20Region%20and%20boto3%20config)
    
    6. [Check and create an Amazon OpenSearch Serverless collection](#Check%20and%20create%20an%20Amazon%20OpenSearch%20Serverless%20collection)
    
    7. [Enable model access in Amazon Bedrock](#Enable%20model%20access%20in%20Amazon%20Bedrock)
    
    8. [Check and configure security permissions](#Check%20and%20configure%20security%20permissions)

    9. [Create common objects](#Create%20common%20objects)
    
    10. [Create an index in the Amazon OpenSearch Serverless collection](#Create%20index%20in%20collection)
    
 2. [Build the chat assistant](#Build%20the%20chat%20assistant)

    1. [Architecture](#Architecture)
    
    2. [Step 0a: Prepare to load data into the vector database](#Step0a)
        
        1. [Initialize the text splitter](#Initialize%20the%20text%20splitter)
        
        2. [Prepare HTML files for loading](#Prepare%20HTML%20files%20for%20loading)
        
        3. [Prepare PDF files for loading](#Prepare%20PDF%20files%20for%20loading)
    
    3. [Step 0b and 0c: Create the embeddings](#Step0band0c)
    
    4. [Step 0d: Store the embeddings in the vector database](#Step0d)
    
    5. [Step 1 to 6: Build the chat steps](#Step1to6)
    
 3. [Chat with the assistant](#Chat%20with%20the%20assistant)
 
 4. [Cleanup](#Cleanup)
 
 5. [Conclusion](#Conclusion)
 
 6. [Frequently Asked Questions (FAQs)](#FAQs)

##  1. Complete prerequisites <a id ='Complete%20prerequisites'> </a>

Check and complete the prerequisites.

###  A. Check and configure access to the Internet <a id ='Check%20and%20configure%20access%20to%20the%20Internet'> </a>
This notebook requires outbound access to the Internet to download the required software updates and the dataset. You can either provide direct Internet access (default) or provide Internet access through an [Amazon VPC](https://aws.amazon.com/vpc/). For more information, refer to [this guide](https://docs.aws.amazon.com/sagemaker/latest/dg/appendix-notebook-and-internet-access.html).

<div class="alert alert-block alert-info">
    <b>Note:</b> During the AIM329 session, by default, outbound Internet access will be enabled for this notebook.
</div>

### B. Install required software libraries <a id ='Install%20required%20software%20libraries'> </a>
This notebook requires the following libraries:
* [SageMaker Python SDK version 2.x](https://sagemaker.readthedocs.io/en/stable/v2.html)
* [Python 3.10.x](https://www.python.org/downloads/release/python-3100/)
* [Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html)
* [LangChain](https://www.langchain.com/)
* [Unstructured](https://pypi.org/project/unstructured/)
* [OpenSearch Python Client](https://pypi.org/project/opensearch-py/)
* [PyPDF2](https://pypi.org/project/PyPDF2/)
* [AWS v4 authentication for the Python Requests library](https://pypi.org/project/requests-aws4auth/)

Run the following cell to install the required libraries.

<div class="alert alert-block alert-warning">  
    <b>Note:</b> At the end of the installation, the Kernel will be forcefully restarted immediately. Please wait 10 seconds for the kernel to come back before running the next cell.
</div>

In [None]:
!pip install boto3==1.29.4
!pip install langchain==0.0.339
!pip install unstructured==0.11.0
!pip install opensearch-py==2.4.2
!pip install PyPDF2==3.0.1
!pip install requests-aws4auth==1.2.3

import IPython

IPython.Application.instance().kernel.do_shutdown(True)

### C. Configure logging <a id ='Configure%20logging'> </a>

####  a. System logs <a id='Configure%20system%20logs'></a>

System logs refer to the logs generated by the notebook's interactions with the underlying notebook instance. Some examples of these are the logs generated when loading or saving the notebook.

These logs are automatically set up when the notebook instance is launched.

These logs can be accessed through the [Amazon CloudWatch Logs](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html) console in the same AWS Region where this notebook is running.
* When running this notebook in an Amazon SageMaker Notebook instance, navigate to the following location,
    * <i>CloudWatch > Log groups > /aws/sagemaker/NotebookInstances > {notebook-instance-name}/jupyter.log</i>
* When running this notebook in an Amazon SageMaker Studio Notebook, navigate to the following locations,
    * <i>CloudWatch > Log groups > /aws/sagemaker/studio > {sagemaker-domain-name}/{user-name}/KernelGateway/{notebook-instance-name}</i>
    * <i>CloudWatch > Log groups > /aws/sagemaker/studio > {sagemaker-domain-name}/{user-name}/JupyterServer/default</i>

Run the following cell to print the name of the underlying notebook instance.

In [None]:
import json

notebook_name = ''
resource_metadata_path = '/opt/ml/metadata/resource-metadata.json'
with open(resource_metadata_path, 'r') as metadata:
    notebook_name = (json.load(metadata))['ResourceName']
print("Notebook instance name: '{}'".format(notebook_name))

####  b. Application logs <a id='Configure%20application%20logs'></a>

Application logs refer to the logs generated by running the various code cells in this notebook. To set this up, instantiate the [Python logging service](https://docs.python.org/3/library/logging.html) by running the following cell. You can configure the default log level and format as required.

By default, this notebook will only print the logs to the corresponding cell's output console.

In [None]:
import logging
import os

# Set the logging level and format
log_level = logging.INFO
log_format = '%(asctime)s - %(levelname)s - %(message)s'
logging.basicConfig(level=log_level, format=log_format)

# Save these in the environment variables for use in the helper scripts
os.environ['LOG_LEVEL'] = str(log_level)
os.environ['LOG_FORMAT'] = log_format
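
The helper scripts themselves are not shown in this notebook, so the following is only a sketch of how a script might read these two environment variables back. The function name `get_logger_from_env` is hypothetical and is not claimed to exist in `helper_functions.py`.

```python
import logging
import os


def get_logger_from_env(name):
    """Build a logger from the LOG_LEVEL and LOG_FORMAT environment variables.

    Hypothetical helper: falls back to INFO and a basic format when unset.
    """
    # LOG_LEVEL is stored as a string such as '20', so convert it back to an int
    level = int(os.environ.get('LOG_LEVEL', logging.INFO))
    fmt = os.environ.get('LOG_FORMAT', '%(asctime)s - %(levelname)s - %(message)s')

    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Do not pass records to the root logger, which would print them twice
    logger.propagate = False
    # Attach a handler only once so repeated calls do not duplicate output
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(fmt))
        logger.addHandler(handler)
    return logger


# Example usage: simulate the values that the cell above stores
os.environ['LOG_LEVEL'] = str(logging.INFO)
os.environ['LOG_FORMAT'] = '%(levelname)s - %(message)s'
helper_logger = get_logger_from_env('helper_functions')
helper_logger.info("Logger configured from environment variables.")
```

Storing the level as a string is necessary because environment variables can only hold text; the consumer has to convert it back before use.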

###  D. Organize imports <a id ='Organize%20imports'> </a>

Organize all the library and module imports for later use.

In [None]:
import boto3
import langchain
import opensearchpy
import PyPDF2
import requests
import sagemaker
import sys
from botocore.config import Config
from tqdm.contrib.concurrent import process_map
from langchain.chains import RetrievalQA, ConversationalRetrievalChain
from langchain.embeddings import BedrockEmbeddings
from langchain.llms import Bedrock
from langchain.memory import ConversationBufferMemory
from langchain.prompts.prompt import PromptTemplate
from langchain.vectorstores import OpenSearchVectorSearch
from IPython.core.display import HTML

# Import the helper functions from the 'scripts' folder
sys.path.append(os.path.join(os.getcwd(), "scripts"))
#logging.info("Updated sys.path: {}".format(sys.path))
from helper_functions import *

Print the installed versions of some of the important libraries.

In [None]:
logging.info("Python version : {}".format(sys.version))
logging.info("Boto3 version : {}".format(boto3.__version__))
logging.info("SageMaker Python SDK version : {}".format(sagemaker.__version__))
logging.info("LangChain version : {}".format(langchain.__version__))
logging.info("OpenSearch Python Client version : {}".format(opensearchpy.__version__))
logging.info("PyPDF2 version : {}".format(PyPDF2.__version__))

###  E. Set AWS Region and boto3 config <a id ='Set%20AWS%20Region%20and%20boto3%20config'> </a>

Get the current AWS Region (where this notebook is running) and the boto3 session. These will be used to create clients for some of the AWS services through the boto3 APIs.

<div class="alert alert-block alert-info">
    <b>Note:</b> All the AWS services used by this notebook except Amazon Bedrock will use the current AWS Region. For Bedrock, follow the guidance in the next cell.
</div>

<div class="alert alert-block alert-warning">  
<b>Note:</b> At the time of writing this notebook, Amazon Bedrock was only available in <a href="https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html#bedrock-regions">these supported AWS Regions</a>. If you are running this notebook from any other AWS Region, then you have to change the Amazon Bedrock client's region and/or endpoint URL parameters to one of those supported AWS Regions. In order to do this, this notebook will use the value specified in the environment variable named <mark>AMAZON_BEDROCK_REGION</mark>. If this is not specified, then the notebook will default to <mark>us-west-2 (Oregon)</mark> for Amazon Bedrock.
</div>



In [None]:
# Get the boto3 session, IAM role and AWS Region references
my_session = boto3.session.Session()
logging.info("Boto3 Session: {}".format(my_session))
my_iam_role = sagemaker.get_execution_role()
logging.info("Notebook IAM Role: {}".format(my_iam_role))
my_region = my_session.region_name
logging.info("Current AWS Region: {}".format(my_region))

# Explicitly set the AWS Region for Amazon Bedrock clients;
# fall back to the default when the environment variable is unset or empty
AMAZON_BEDROCK_DEFAULT_REGION = "us-west-2"
br_region = os.environ.get('AMAZON_BEDROCK_REGION') or AMAZON_BEDROCK_DEFAULT_REGION
logging.info("AWS Region for Amazon Bedrock: {}".format(br_region))

Set the timeout and retry configurations that will be applied to all the boto3 clients used in this notebook.

In [None]:
# Increase the standard time out limits in the boto3 client from 1 minute to 3 minutes
# and set the retry limits
my_boto3_config = Config(
    connect_timeout = (60 * 3),
    read_timeout = (60 * 3),
    retries = {
        'max_attempts': 10,
        'mode': 'standard'
    }
)

###  F. Check and create an Amazon OpenSearch Serverless collection <a id ='Check%20and%20create%20an%20Amazon%20OpenSearch%20Serverless%20collection'> </a>

This notebook uses an [Amazon OpenSearch Serverless (AOSS) collection](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-collections.html) of type [Vector search](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-overview.html#serverless-usecase) as the vector database that will be used by the chat assistant.

Run the following cells to check and create an AOSS collection if it does not exist.

In [None]:
# Set the flags to identify if an AOSS collection exists and if it is created through this notebook
aoss_collection_exists = False
aoss_collection_created = False

<div class="alert alert-block alert-info">
<b>Note:</b> For the purpose of running this notebook, it is preferable to have an empty collection. During the AIM329 session, by default, an empty collection of type <i>Vector search</i> will be pre-created and ready to use.
</div>

Run the following code cell to retrieve the details of the first available AOSS collection.

In [None]:
# Create the AOSS client
aoss_client = boto3.client("opensearchserverless", config = my_boto3_config)

# Check and create a collection if none is found
collection_id = ''
collections = aoss_client.list_collections()['collectionSummaries']
if len(collections) == 0:
    aoss_collection_exists = False
    logging.info("No AOSS collections exist.")
else:
    aoss_collection_exists = True
    logging.info("Found an AOSS collection.")
    first_collection = collections[0]
    collection_id = first_collection["id"]
    collection_name = first_collection["name"]

<div class="alert alert-block alert-info">
<b>Note:</b> If you are running this notebook outside of the AIM329 session and would like to create an AOSS collection through this notebook, then, run the following cell.
</div>

In [None]:
### Note: It may take 8 to 10 minutes to create the AOSS collection.

# The helper function 'create_aoss_collection' (available through ./scripts/helper_functions.py) creates the specified
# AOSS collection with the following policies:
# Data access policy: provides full access to the IAM role associated with this notebook instance.
# Encryption policy: encrypts with AWS owned key.
# Network policy: provides public network access to the collection.

if aoss_collection_exists:
    logging.info("Skipping AOSS collection creation.")
else:
    collection_name = "aim329"
    data_access_policy_name = "aim329-dap"
    encryption_policy_name = "aim329-ep"
    network_policy_name = "aim329-np"
    response = create_aoss_collection(aoss_client, collection_name, data_access_policy_name,
                                      encryption_policy_name, network_policy_name, my_iam_role)
    collection_id = response["id"]
    collection_name = response["name"]
    aoss_collection_created = True

Run the following cell to print the details of the AOSS collection that will be used.

In [None]:
if len(collection_id) == 0:
    aoss_collection_exists = False
    logging.info("No AOSS collections exist.")
else:
    aoss_collection_exists = True
    logging.info("The following AOSS collection will be used:\nCollection id: {}; Collection name: {}"
                 .format(collection_id, collection_name))
    # Print the AWS console URL to the AOSS collection
    collection_aws_console_url = "https://{}.console.aws.amazon.com/aos/home?region={}#opensearch/collections/{}"\
    .format(my_region, my_region, collection_name)
    logging.info("If you would like to take a look at this collection, visit {}".format(collection_aws_console_url))

###  G. Enable model access in Amazon Bedrock <a id ='Enable%20model%20access%20in%20Amazon%20Bedrock'> </a>

<div class="alert alert-block alert-danger">
    <b>Note:</b> Before invoking any model in Amazon Bedrock, enable access to that model by following the instructions <a href="https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html">here</a>. In addition, for Anthropic models, you need to submit the use case details. Otherwise, you will get an authorization error.
</div>

Run the following cell to print the Amazon Bedrock model access page URL for the AWS Region that was selected earlier.

In [None]:
# Print the Amazon Bedrock model access page URL
logging.info("Amazon Bedrock model access page - https://{}.console.aws.amazon.com/bedrock/home?region={}#/modelaccess"
             .format(br_region, br_region))

<div class="alert alert-block alert-warning">  
<b>Note:</b> You will have to do this manually after reading the End User License Agreement (EULA) for each of the models that you want to enable. Unless you explicitly disable it, this is a one-time setup for each model in an AWS account.
</div>

###  H. Check and configure security permissions <a id ='Check%20and%20configure%20security%20permissions'> </a>
This notebook uses the IAM role attached to the underlying notebook instance.  To view the name of this role, run the following cell.

This IAM role should have the following permissions,

1. Full access to invoke Large Language Models (LLMs) on Amazon Bedrock.
2. Full access to read and write to the Amazon OpenSearch Serverless collection created in the previous step.
3. Access to write to Amazon CloudWatch Logs.

In addition, [data access control](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-data-access.html) should be set up on the Amazon OpenSearch Serverless collection to provide create, read and write access to the IAM role associated with this notebook instance.

<div class="alert alert-block alert-info">
<b>Note:</b> During the AIM329 session, by default, all these permissions will be set up.
</div>

Run the following cell to print the details of the IAM role attached to the underlying notebook instance.

In [None]:
# Print the IAM role ARN and console URL
logging.info("This notebook's IAM role is '{}'".format(my_iam_role))
arn_parts = my_iam_role.split('/')
logging.info("Details of this IAM role are available at https://{}.console.aws.amazon.com/iamv2/home?region={}#/roles/details/{}?section=permissions"
             .format(my_region, my_region, arn_parts[len(arn_parts) - 1]))

###  I. Create common objects <a id='Create%20common%20objects'></a>

To begin with, list all the available models in Amazon Bedrock by running the following cell. This will help you pick an LLM and the Embeddings model within Amazon Bedrock that you will be using in this notebook. By default, both will use the On-Demand pricing model.

In [None]:
# List all the available foundation models in Amazon Bedrock
bedrock_client = boto3.client("bedrock", region_name = br_region, endpoint_url = "https://bedrock.{}.amazonaws.com"
                              .format(br_region), config = my_boto3_config)
response = bedrock_client.list_foundation_models()
model_summaries = response["modelSummaries"]

# Build a fixed-width table; the header, each row and each separator go on their own line
row_format = "{:<15} {:<30} {:<20} {:<20} {:<40}"
separator = "-" * 125
table_lines = ["", separator,
               row_format.format("Provider Name", "Model Name", "Input Modalities", "Output Modalities", "Model Id"),
               separator]
for model_summary in model_summaries:
    table_lines.append(row_format.format(model_summary["providerName"],
                                         model_summary["modelName"],
                                         "|".join(model_summary["inputModalities"]),
                                         "|".join(model_summary["outputModalities"]),
                                         model_summary["modelId"]))
table_lines.append(separator)
models_info = "\n".join(table_lines) + "\n"
logging.info("Displaying available models in the '{}' Region:".format(br_region) + models_info)

From the results of running the above cell,

1. Pick the model-id that corresponds to the LLM that you want and set it as the value of the `llm_model_id` variable in the following cell.
2. (Optional) Specify the [LLM-specific inference parameters](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters.html) in the `model_kwargs` parameter.
3. Pick the model-id that corresponds to the Embeddings model that you want and set it as the value of the `embeddings_model_id` variable in the following cell.
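
As an aid to picking the model-ids in the steps above, one option is to separate the listed models by output modality. The sketch below runs on a small illustrative sample shaped like the `modelSummaries` list rather than on a live response; the helper name `split_by_output_modality` is hypothetical, and the modality strings `TEXT` and `EMBEDDING` are assumed to match what the listing prints in your Region.

```python
# Illustrative sample shaped like the 'modelSummaries' list; in the notebook,
# pass the real 'model_summaries' variable instead
sample_model_summaries = [
    {"modelId": "anthropic.claude-instant-v1", "outputModalities": ["TEXT"]},
    {"modelId": "amazon.titan-embed-text-v1", "outputModalities": ["EMBEDDING"]},
]


def split_by_output_modality(summaries):
    """Return (text_model_ids, embedding_model_ids) from a list of model summaries."""
    text_ids = [s["modelId"] for s in summaries if "TEXT" in s["outputModalities"]]
    embedding_ids = [s["modelId"] for s in summaries if "EMBEDDING" in s["outputModalities"]]
    return text_ids, embedding_ids


llm_candidates, embeddings_candidates = split_by_output_modality(sample_model_summaries)
print("LLM candidates:", llm_candidates)
print("Embeddings candidates:", embeddings_candidates)
```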

Now, run the following cell to create the common objects to be used in future steps in this notebook.

<div class="alert alert-block alert-info">
<b>Note:</b> This notebook was tested with the following Amazon Bedrock models:
    <ul>
        <li>LLMs: anthropic.claude-v2, anthropic.claude-instant-v1, cohere.command-text-v14, ai21.j2-ultra-v1</li>
        <li>Embedding model(s): amazon.titan-embed-text-v1</li>
    </ul>
</div>

<div class="alert alert-block alert-danger">
<b>Note:</b> During the AIM329 session, it is recommended to ONLY use the following Amazon Bedrock models:
    <ul>
        <li>LLMs: anthropic.claude-instant-v1</li>
        <li>Embedding model(s): amazon.titan-embed-text-v1</li>
    </ul>
</div>

In [None]:
##### Specify the model-ids along with their inference parameters
# Model-id of the LLM to be used in the chat assistant
llm_model_id = "anthropic.claude-instant-v1"
temperature = 0.5
max_response_token_length = 300
# Model-id of the Embeddings model to be used in the chat assistant
embeddings_model_id = "amazon.titan-embed-text-v1"

##### LLM related objects
# Create the Amazon Bedrock runtime client
bedrock_rt_client = boto3.client("bedrock-runtime", region_name = br_region, config = my_boto3_config)

# Create the LangChain client for the LLM using the Bedrock client created above.
llm = Bedrock(
    model_id = llm_model_id,
    model_kwargs = get_model_specific_inference_params(llm_model_id,
                                                       temperature,
                                                       max_response_token_length),
    client = bedrock_rt_client
)

##### Embeddings related objects
# Use the LangChain BedrockEmbeddings class to create the Embeddings client.
br_embeddings = BedrockEmbeddings(client = bedrock_rt_client, model_id = embeddings_model_id, region_name = br_region)

##### Amazon OpenSearch Serverless (AOSS) related objects
# Create the AOSS Python client using the helper function
# (available through ./scripts/helper_functions.py)
aoss_py_client = auth_opensearch(host = "{}.{}.aoss.amazonaws.com".format(collection_id, my_region),
                            service = 'aoss', region = my_region)
# Specify the name of the index in the AOSS collection; this will be created later in the notebook
index_name = "aim329-index"
# Specify the max workers for loading data in parallel into the index
max_workers = 8
# To access an OpenSearch collection using LangChain, we can use the OpenSearchVectorSearch class.
doc_search = OpenSearchVectorSearch(
    opensearch_url = "{}.{}.aoss.amazonaws.com".format(collection_id, my_region),
    index_name = index_name,
    embedding_function = br_embeddings)
# Set the doc search client to the AOSS Python client
doc_search.client = aoss_py_client

##### File related objects
# Specify the path to the directory that will contain the RAG data
rag_dir = os.path.join(os.getcwd(), "data/rag")
# Create the directory if it doesn't exist
os.makedirs(rag_dir, exist_ok = True)

###  J. Create an index in the Amazon OpenSearch Serverless collection <a id='Create%20index%20in%20collection'></a>

To create an index in the Amazon OpenSearch Serverless (AOSS) collection, we first need to define a schema for our index. AOSS allows users to specify a simple search index, which utilizes keyword matching, or the vector search feature, which utilizes [k-Nearest Neighbor (k-NN) search](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/knn.html). Vector search differs from standard search in that instead of using a typical keyword matching or fuzzy matching algorithm, vector search compares [embeddings](https://en.wikipedia.org/wiki/Word_embedding) of two pieces of text. An embedding is a numerical representation of a piece of information, like text, that we can compare against other embeddings. To learn more about embeddings, take a look at [this blog](https://huggingface.co/blog/getting-started-with-embeddings). The vector search feature allows us to search for documents that are semantically similar to the questions that our end users send to our chat assistant. This can improve the context that we then give to our LLM to answer the user's questions.
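
To make the idea of comparing embeddings concrete, here is a minimal pure-Python sketch of a k-NN lookup. It assumes cosine similarity as the comparison measure, which is only one common choice; the measure actually used by an index depends on how that index is configured. The three-dimensional vectors and document names are made up for illustration, whereas the index defined below uses 1536 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings" keyed by a hypothetical document name
toy_index = {
    "doc-about-pricing": [0.9, 0.1, 0.0],
    "doc-about-agents": [0.1, 0.9, 0.2],
    "doc-about-regions": [0.0, 0.2, 0.9],
}
query_embedding = [0.8, 0.2, 0.1]

# k-NN with k = 2: rank documents by similarity to the query and keep the top 2
ranked = sorted(
    toy_index,
    key=lambda doc_id: cosine_similarity(query_embedding, toy_index[doc_id]),
    reverse=True,
)
top_k = ranked[:2]
print(top_k)
```

A production vector database avoids scoring every document like this and relies on approximate nearest-neighbor data structures instead, but the ranking idea is the same.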

In [None]:
# Define the schema for the index with a k-NN type vector as the embedding
knn_index = {
    "settings": {
        "index.knn": True,
    },
    "mappings": {
        "properties": {
           "content-embedding": { 
                "type": "knn_vector",
                "dimension": 1536 # can have dimension up to 10k
            },
            "content": {
                "type": "text"
            },
            "title": {
                "type": "text"
            },
            "source": {
                "type": "text"
            }
        }
    }
}

# Delete the index
#aoss_py_client.indices.delete(index = index_name)

# Create the index if it does not exist
if aoss_py_client.indices.exists(index = index_name):
    logging.info("AOSS index '{}' already exists.".format(index_name))
else:
    logging.info("Creating AOSS index '{}'...".format(index_name))
    logging.info(aoss_py_client.indices.create(index = index_name, body = knn_index, ignore = 400))

In [None]:
# Print the AWS console URL to the AOSS index
index_aws_console_url = collection_aws_console_url + "/" + index_name
logging.info("If you would like to take a look at this index, visit {}".format(index_aws_console_url))

## 2. Build the chat assistant <a id ='Build%20the%20chat%20assistant'> </a>

Large language models (LLMs) have a tendency to [hallucinate](https://en.wikipedia.org/wiki/Hallucination_(artificial_intelligence)). In an LLM context, hallucination means the model gives a confident but factually incorrect response, often telling us what it thinks we want to hear regardless of whether it is actually the correct answer. One way to reduce the chance of LLMs giving us incorrect information is to use a [Retrieval Augmented Generation (RAG)](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-customize-rag.html) mechanism.

RAG allows us to provide our model with relevant context information that it can use to ground its response in facts, instead of relying on what it remembers from its training data. To set up RAG, we need a document database that we can use to provide our model with related source documents. There are many ways to set up a document database. In this notebook, we will use an [Amazon OpenSearch Serverless](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-overview.html) collection of type 'Vector search'.

We will use [LangChain](https://www.langchain.com/) to orchestrate the sequence of events performed by the chat assistant. LangChain is a framework designed to simplify the creation of LLM applications. It provides flexible abstractions and an extensive toolkit.
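
Before looking at the architecture, the RAG flow described above can be sketched in a few lines of plain Python. This is a toy illustration only: the word-overlap `retrieve` function stands in for the vector search against the AOSS index, the prompt wording is made up, and no LLM is called. None of these names come from LangChain or from this notebook's helper scripts.

```python
def retrieve(question, documents, k=1):
    """Toy retriever: rank documents by the number of words shared with the question."""
    question_words = set(question.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(question_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]


def build_prompt(question, context_docs):
    """Augment the question with the retrieved context before calling the LLM."""
    context = "\n".join(context_docs)
    return (
        "Use only the following context to answer the question.\n\n"
        "Context:\n{}\n\nQuestion: {}\nAnswer:".format(context, question)
    )


documents = [
    "Amazon Bedrock offers foundation models through an API.",
    "Amazon OpenSearch Serverless collections can store vector embeddings.",
]
question = "Where can vector embeddings be stored?"
prompt = build_prompt(question, retrieve(question, documents))
print(prompt)  # In the real flow, this prompt would be sent to the LLM
```

The key point is that the LLM only sees the question together with the retrieved passages, so the quality of the retrieval step directly bounds the quality of the answer.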

###  A. Architecture <a id='Architecture'></a>

![Architecture](./images/architecture.png)





###  B. Step 0a: Prepare to load data into the vector database <a id='Step0a'></a>

An Amazon OpenSearch Serverless (AOSS) collection is a logical grouping of one or more indexes that work together to support a specific workload or use case.

This notebook will use a vector index for indexing documents in the AOSS collection. To demonstrate the versatility of the data that can be processed, we will ingest data from some HTML and PDF files. During the ingestion process, we will extract text from these files and split them into smaller chunks.

####  a. Initialize the text splitter <a id='Initialize%20the%20text%20splitter'></a>

When we are indexing documents for information retrieval, providing an entire document to an LLM as context can be overwhelming for the LLM, especially for very long documents. A best practice is to divide the document into easier-to-consume, partially overlapping chunks. Dividing the document in this way also tends to improve search result relevance, as the answer we are looking for is often contained within a specific passage of a document and providing the entire document is unnecessary.

Let's use LangChain's [RecursiveCharacterTextSplitter](https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/recursive_text_splitter) to create a text splitting object that we will use to split the content before loading it into the vector database. Here, we will use the simple `fixed-size chunking` strategy where we will set the size of each chunk and the number of overlapping characters between two consecutive chunks.

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 8000,
    chunk_overlap  = 100,
    length_function = len,
    add_start_index = True,
)
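
To illustrate what `chunk_size` and `chunk_overlap` mean, here is a minimal pure-Python sketch of fixed-size chunking with overlap. It is not how `RecursiveCharacterTextSplitter` is implemented: that class first tries to split on separators such as paragraph breaks, line breaks and spaces, so its chunk boundaries will generally differ. The function name `fixed_size_chunks` is hypothetical.

```python
def fixed_size_chunks(text, chunk_size, chunk_overlap):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters. Each result is a
    (start_index, chunk) pair, similar in spirit to add_start_index above.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        # Stop once the current chunk reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample_text = "abcdefghijklmnopqrstuvwxyz"
for start_index, chunk in fixed_size_chunks(sample_text, chunk_size=10, chunk_overlap=3):
    print(start_index, chunk)
```

With these toy settings, each chunk begins with the last three characters of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk when the overlap is large enough.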

####  b. Prepare HTML files for loading <a id='Prepare%20HTML%20files%20for%20loading'></a>

Run the following cells to populate some documentation on Amazon Bedrock so our chat assistant can answer questions about Amazon Bedrock with factually correct information.

This documentation is available as HTML files. As a first step, download these files to the local directory named `./data/rag/amazon-bedrock-docs/`. The required directory structure will be created if it doesn't exist.

In [None]:
# A simple list of Amazon Bedrock documentation pages to index
html_link_list = [
    "https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-service.html",
    "https://docs.aws.amazon.com/bedrock/latest/userguide/setting-up.html",
    "https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html",
    "https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters.html",
    "https://docs.aws.amazon.com/bedrock/latest/userguide/embeddings.html",
    "https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-prepare.html",
    "https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-guidelines.html",
    "https://docs.aws.amazon.com/bedrock/latest/userguide/prov-throughput.html",
    "https://docs.aws.amazon.com/bedrock/latest/userguide/agents.html",
    "https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html"
]

# Set the sub-directory to store these HTML files
html_files_dir = "{}/{}".format(rag_dir, "amazon-bedrock-docs")
# Create the directory if it doesn't exist
os.makedirs(html_files_dir, exist_ok = True)

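# Download and store each HTML file; overwrite existing file
for html_link in html_link_list:
    html_response = requests.get(html_link, timeout = 60)
    # Fail fast on HTTP errors instead of saving an error page
    html_response.raise_for_status()
    html_content = html_response.content.decode('utf-8')
    file_name = html_link.split("/")[-1]
    # Write with an explicit encoding so the result does not depend on the platform default
    with open('{}/{}'.format(html_files_dir, file_name), "w", encoding = 'utf-8') as f:
        f.write(html_content)
    logging.info("Downloaded file '{}' to '{}'".format(file_name, html_files_dir))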

logging.info("A total of {} HTML files were downloaded to '{}'.".format(len(html_link_list), html_files_dir))
        
# Display the last downloaded HTML file
#HTML(html_content)
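
The download loop above derives each local file name with `html_link.split("/")[-1]`, which works for the links listed here. As a minimal sketch, assuming you later add links that carry query strings or fragments, the hypothetical helper `local_path_for` below (not part of this notebook or its helper scripts) derives the file name from the URL path only.

```python
from pathlib import Path
from urllib.parse import urlparse


def local_path_for(link: str, target_dir: str) -> Path:
    """Map a URL to a local file path using only the last segment of the URL path."""
    # urlparse separates the path from any '?query' or '#fragment' suffix
    file_name = Path(urlparse(link).path).name
    if not file_name:
        raise ValueError("Cannot derive a file name from '{}'".format(link))
    return Path(target_dir) / file_name


# Example usage with one of the documentation links plus a fragment
print(local_path_for(
    "https://docs.aws.amazon.com/bedrock/latest/userguide/agents.html#top",
    "./data/rag/amazon-bedrock-docs"))
```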

Now that we have downloaded all of these HTML documents, let's go ahead and load them using an HTML loader. We are going to use LangChain's [Unstructured HTML Loader](https://python.langchain.com/docs/modules/data_connection/document_loaders/html). This will parse the raw HTML files and put their content in a format that we can then give to our LLMs. Finally, we will split each document as per the splitter configuration defined above.

In [None]:
from langchain.document_loaders import UnstructuredHTMLLoader

# Initialize
html_doc_data_list = []
html_doc_link_list = []

# Loop through the downloaded HTML files in the directory
for html_link in html_link_list:
    file_name = html_link.split("/")[-1]
    file_path = '{}/{}'.format(html_files_dir, file_name)
    
    # Load the file content
    loader = UnstructuredHTMLLoader(file_path)
    data = loader.load()
    
    # Remove irrelevant text
    html_doc = data[0].page_content.replace("""Did this page help you? - Yes

Thanks for letting us know we're doing a good job!

If you've got a moment, please tell us what we did right so we can do more of it.

Did this page help you? - No

Thanks for letting us know this page needs work. We're sorry we let you down.

If you've got a moment, please tell us how we can make the documentation better.""", "").replace("""Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.""", "")
    
    # Split our document into chunks
    texts = text_splitter.create_documents([html_doc])
    
    # Create a list of document chunks as well as a list of links
    for text in texts:
        html_doc_data_list.append(text.page_content)
        html_doc_link_list.append(html_link)

logging.info("Created {} chunks from the {} downloaded HTML files.".format(len(html_doc_data_list), len(html_link_list)))        

# Print the first chunk
#logging.info(html_doc_data_list[0])
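
To build intuition for what the splitter configuration does, the following is a simplified sketch of fixed-size chunking with overlap. It is not LangChain's algorithm, which, depending on the splitter, also tries to break on separators such as paragraphs and sentences. The function `chunk_with_overlap` and the sizes used are illustrative assumptions. The `start_index` it records plays a similar role to the offset that `add_start_index = True` asks the splitter to keep.

```python
from typing import Dict, List, Union


def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[Dict[str, Union[int, str]]]:
    """Split text into fixed-size windows where consecutive windows share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({"start_index": start, "content": text[start:start + chunk_size]})
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


for chunk in chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1):
    print(chunk)
```

A larger overlap makes it less likely that a sentence relevant to a query is cut in half at a chunk boundary, at the cost of storing and embedding more text.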

####  c. Prepare PDF files for loading <a id='Prepare%20PDF%20files%20for%20loading'></a>

Run the following cells to populate some Generative AI related whitepapers so our chat assistant can answer questions about them with factually correct information.

These whitepapers are available as PDF files. As a first step, download these files to the local directory named `./data/rag/gen-ai-whitepapers/`. The required directory structure will be created if it doesn't exist.

In [None]:
# A simple list of Gen AI whitepapers to index
pdf_link_list = [
    "https://www-cdn.anthropic.com/bd2a28d2535bfb0494cc8e2a3bf135d2e7523226/Model-Card-Claude-2.pdf"
]

# Set the sub-directory to store these PDF files
pdf_files_dir = "{}/{}".format(rag_dir, "gen-ai-whitepapers")
# Create the directory if it doesn't exist
os.makedirs(pdf_files_dir, exist_ok = True)

# Download and store each PDF file; overwrite existing file
for pdf_link in pdf_link_list:
    http_response = requests.get(pdf_link, timeout = 30)
    # Fail fast on HTTP errors instead of saving an error page as a PDF
    http_response.raise_for_status()
    pdf_content = http_response.content
    file_name = pdf_link.split("/")[-1]
    with open('{}/{}'.format(pdf_files_dir, file_name), "wb") as f:
        f.write(pdf_content)
        logging.info("Downloaded file '{}' to '{}'".format(file_name, pdf_files_dir))

logging.info("A total of {} PDF files were downloaded to '{}'.".format(len(pdf_link_list), pdf_files_dir))

Now that we have downloaded all of these PDF documents, let's go ahead and process them. We are going to extract the text from them using the [PyPDF2](https://pypdf2.readthedocs.io/en/3.0.0/) library. Finally, we will split each document as per the splitter configuration defined above.

In [None]:
from PyPDF2 import PdfReader

# Initialize
pdf_doc_data_list = []
pdf_doc_link_list = []

# Loop through the downloaded PDF files in the directory
for pdf_link in pdf_link_list:
    pdf_doc = ''
    file_name = pdf_link.split("/")[-1]
    file_path = '{}/{}'.format(pdf_files_dir, file_name)
    
    # Load the file content
    reader = PdfReader(file_path)
    
    # Loop through the pages
    for page in reader.pages:
        pdf_doc = pdf_doc + page.extract_text()
    
    # Split our document into chunks
    texts = text_splitter.create_documents([pdf_doc])
    
    # Create a list of document chunks as well as a list of links
    for text in texts:
        pdf_doc_data_list.append(text.page_content)
        pdf_doc_link_list.append(pdf_link)

logging.info("Created {} chunks from the {} downloaded PDF files.".format(len(pdf_doc_data_list), len(pdf_link_list)))        

# Print the first chunk
#logging.info(pdf_doc_data_list[0])

###  C. Step 0b and 0c: Create the embeddings <a id='Step0band0c'></a>

Now that our document chunks are ready, let us vectorize them to create the embeddings. Along with this, we will prepare the documents to be inserted into the AOSS collection's index.

Each of those prepared documents will contain the following fields:
- `content` - contains the actual text of the document chunk.
- `content-embedding` - contains the corresponding document chunk's embedding.
- `title` - a reference to the name of the location from where the content was ingested.
- `source` - a reference to the location from where the content was ingested.
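
As a rough sketch of the shape of these documents, the snippet below builds one dictionary per chunk. The function `build_index_documents` and the toy `toy_embed` function are hypothetical stand-ins for illustration only; the actual `prepare_index_document_list` helper in `./scripts/helper_functions.py` may work differently, and deriving `title` from the last segment of the link is an assumption.

```python
from typing import Callable, Dict, List, Sequence


def build_index_documents(embed: Callable[[str], List[float]],
                          chunks: Sequence[str],
                          links: Sequence[str]) -> List[Dict[str, object]]:
    """Pair every chunk with its embedding and source link."""
    if len(chunks) != len(links):
        raise ValueError("chunks and links must have the same length")
    documents = []
    for chunk, link in zip(chunks, links):
        documents.append({
            "content": chunk,
            "content-embedding": embed(chunk),
            # Assumption: use the file name at the end of the link as the title
            "title": link.split("/")[-1],
            "source": link,
        })
    return documents


def toy_embed(text: str) -> List[float]:
    """Toy 2-dimensional 'embedding' used only to keep this sketch self-contained."""
    return [float(len(text)), float(text.count(" "))]


docs = build_index_documents(toy_embed, ["hello world"],
                             ["https://docs.aws.amazon.com/bedrock/latest/userguide/agents.html"])
print(docs[0])
```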

In [None]:
# Create the embeddings from the document chunks and prepare documents to index
# by using the 'prepare_index_document_list' helper function (available through ./scripts/helper_functions.py)

# Process the chunked HTML documents
html_doc_list = prepare_index_document_list(br_embeddings, 'HTML', html_doc_data_list, html_doc_link_list)

# Process the chunked PDF documents
pdf_doc_list = prepare_index_document_list(br_embeddings, 'PDF', pdf_doc_data_list, pdf_doc_link_list)

###  D. Step 0d: Store the embeddings in the vector database <a id='Step0d'></a>

Run the following cell to upload the prepared documents into the index of the AOSS collection that we created. The cell uses a parallel processing function to upload the documents into the index. The number of parallel workers is controlled by the `max_workers` variable.

<div class="alert alert-block alert-info">
    <b>Note:</b> After executing the below cell, it may take up to 30 seconds for the data to be available for reading. If there are any boto3 errors, then the API calls will be retried automatically based on the settings in the earlier step.
</div>

<div class="alert alert-block alert-warning">  
    <b>Note:</b> At the time of writing this notebook, AOSS did not support ingestion with <i>id</i> for the <i>Vector search</i> collection type. As a result, running the following cell more than once will result in duplicate documents being created in the AOSS index. This is acceptable for the purpose of running this notebook.
</div>

In [None]:
# Define the function to import into the AOSS index.
def os_import(article):
    """
    This function imports the documents and their metadata into the AOSS index.
    """
    aoss_py_client.index(index = index_name,
                         body={
                                "content-embedding": article['content-embedding'],
                                "content": article['content'],
                                "title": article['title'],
                                "source": article['source'],
                              }
                        )
    
# Parallelize and populate the AOSS Collection's index with HTML data
process_map(os_import, html_doc_list, max_workers = max_workers)

# Parallelize and populate the AOSS Collection's index with PDF data
process_map(os_import, pdf_doc_list, max_workers = max_workers) 
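
The fan-out pattern used above can be sketched with the standard library alone. This is a minimal illustration, assuming a thread pool and a stand-in `fake_index` function in place of the real AOSS client call; `upload_in_parallel` is a hypothetical name and is not the `process_map` function used in the cell.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def upload_in_parallel(index_one: Callable[[Dict[str, Any]], Any],
                       documents: List[Dict[str, Any]],
                       max_workers: int = 4) -> List[Any]:
    """Call index_one for every document using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps the input order and re-raises a worker's exception
        # when its result is read
        return list(pool.map(index_one, documents))


def fake_index(document: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for an indexing call; a real call would send the document to the index."""
    return {"result": "created", "title": document["title"]}


sample_docs = [{"title": "doc-{}".format(i)} for i in range(5)]
results = upload_in_parallel(fake_index, sample_docs, max_workers=2)
print(len(results))
```

Threads suit this kind of work because each upload mostly waits on the network rather than on the CPU.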

###  E. Step 1 to 6: Build the chat steps <a id='Step1to6'></a>

To demonstrate the usefulness of the [Retrieval Augmented Generation (RAG)](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-customize-rag.html) architecture, let us first call the LLM directly, without any search on the vector database. Here, we will ask a question about Agents for Amazon Bedrock or about Anthropic Claude models' performance. We know the LLM will not be aware of these topics because this information was not available when the LLM was trained.

In [None]:
# Set the query
query = "What are Agents in Amazon Bedrock?"
#query = "How did Claude 2 do on the Graduate Record Exam?"
query_char_count, query_word_count = get_counts_from_text(query)
# Invoke the LLM
output = llm.generate([query])
# Read the response
query_response = output.generations[0][0].text
query_resp_char_count, query_resp_word_count = get_counts_from_text(query_response)

# Print the details
logging.info("\n\nHuman:\n{}\n\nAssistant:\n{}\n".format(query, query_response))

# Print the stats
logging.info("Query (prompt) stats: Character count = {}; Word count = {}"
             .format(query_char_count, query_word_count))
logging.info("Response stats: Character count = {}; Word count = {}"
             .format(query_resp_char_count, query_resp_word_count))

While this response may seem plausible, it is actually incorrect. LLMs will generally try to answer your question, but in this case the model has hallucinated because it did not have the information needed to answer correctly. This is where a RAG architecture provides the remedy.

Let us see if we can find a document that contains information about Agents for Amazon Bedrock or about Anthropic Claude models' performance in our document index. We can do this in two ways:

Method 1: Using the AOSS Python client's `search` directly.

In [None]:
query = "What are Agents in Amazon Bedrock?"
#query = "How did Claude 2 do on the Graduate Record Exam?"
temp_embedding = br_embeddings.embed_query(text = query)
search_query = {"query": {"knn": {"content-embedding": {"vector": temp_embedding, "k": 5}}}}
results = aoss_py_client.search(index = index_name, body = search_query)
hits = results["hits"]["hits"]
logging.info("Found {} hit(s).".format(len(hits)))
for hit in hits:
    logging.info(hit["_source"]["source"])

Method 2: Using LangChain's `similarity_search` which will use the AOSS Python client under the covers. 

In [None]:
max_results = 5
query = "What are Agents in Amazon Bedrock?"
#query = "How did Claude 2 do on the Graduate Record Exam?"
docs = doc_search.similarity_search(
    # Our text query
    query = query,
    # The name of the field that contains our vector
    vector_field = "content-embedding",
    # The actual text field we are looking for
    text_field = "content",
    # The number of results we want to return
    k = max_results
)
logging.info("Specified {} max results. Found {} hit(s).".format(max_results, len(docs)))
for doc in docs:
    logging.info(doc.metadata['source'])

It looks like we do have some information on Agents for Amazon Bedrock or about Anthropic Claude models' performance in the documents that we prepared and stored in the AOSS collection's index. Now that we know we have the right information in our document index, let us set up a [RetrievalQA chain](https://python.langchain.com/docs/use_cases/question_answering/vector_db_qa). This chain allows us to supply a [prompt template](https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/), our LLM, and our document index to form a question answering chain that answers questions based on the retrieved context documents. We will use the [stuff](https://python.langchain.com/docs/modules/chains/document/stuff) chain type.

[Prompt templates](https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/) are pre-defined recipes for generating prompts for language models. In the one we create below, we specify the `context` and `question` input variables, which our RetrievalQA chain will fill in with the retrieved source documents and the query, respectively.

In [None]:
# Create the prompt template
prompt_template = """\n\nHuman: Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Don't include harmful content.

<context>
{context}
</context>

Question: {question}
\n\nAssistant:"""

PROMPT = PromptTemplate(
    template = prompt_template, input_variables = ["context", "question"]
)

# Create the Retrieval QA chain
qa = RetrievalQA.from_chain_type(llm = llm, 
                                 chain_type = "stuff", 
                                 retriever = doc_search.as_retriever(search_kwargs = {
                                     "vector_field": "content-embedding",
                                     "text_field": "content",
                                     "k": 5}),
                                 return_source_documents = True,
                                 chain_type_kwargs = {"prompt": PROMPT, "verbose": False},
                                 verbose = False)

# Ask the question to the LLM and print the response along with the references from the source
question = "What are Agents in Amazon Bedrock?"
#question = "How did Claude 2 do on the Graduate Record Exam?"
# Invoke the embedding search, LLM etc. through the LangChain RetrievalQA chain
response = qa(question, return_only_outputs = True)
# Parse the output
question, answer, context, title, source = parse_rqa_prompt_output(question, response)

# Print the details
prompt_template_with_result = prompt_template + answer + "\n"
logging.info(prompt_template_with_result.format(context=context, question=question))
logging.info("The context for the above question was retrieved from here --> Title: {}; Source: {}"
             .format(title, source))

# Print the stats
query_char_count, query_word_count = get_counts_from_text(question)
logging.info("Question stats: Character count = {}; Word count = {}"
             .format(query_char_count, query_word_count))
query_resp_char_count, query_resp_word_count = get_counts_from_text(answer)
logging.info("Answer stats: Character count = {}; Word count = {}"
             .format(query_resp_char_count, query_resp_word_count))
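
The `stuff` chain type is the simplest way to combine documents: the retrieved chunks are concatenated and placed into the `{context}` slot of the prompt. The following is a minimal sketch of that idea, assuming a hypothetical `stuff_prompt` function, a shortened template, and a blank-line separator between chunks. LangChain's internal implementation is not reproduced here.

```python
from typing import List

# Shortened, illustrative template with the same two input variables
SKETCH_TEMPLATE = "<context>\n{context}\n</context>\n\nQuestion: {question}"


def stuff_prompt(template: str, chunks: List[str], question: str, separator: str = "\n\n") -> str:
    """Join all retrieved chunks into one context string and fill the template."""
    context = separator.join(chunks)
    return template.format(context=context, question=question)


print(stuff_prompt(SKETCH_TEMPLATE,
                   ["First retrieved chunk.", "Second retrieved chunk."],
                   "What are Agents in Amazon Bedrock?"))
```

Because every retrieved chunk is placed in a single prompt, the chunk size and the number of results `k` together determine how much of the model's context window the prompt consumes.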

Let's now take it one step further with a [ConversationalRetrievalChain](https://python.langchain.com/docs/expression_language/cookbook/retrieval#conversational-retrieval-chain). 

LLMs on their own will not remember the last input you provided them. So we need a mechanism to remember and supply our previous conversation information back to our LLM. We can do this by pairing with a [ConversationBufferMemory](https://python.langchain.com/docs/modules/memory/adding_memory) class. This will allow us to hold a conversation with our LLM and retain the previous conversation in memory.

The following cells add a conversational element to the retrieval chain and allow us to add chat memory to the retrieval. This chain uses an LLM call prior to the document retrieval that condenses the conversation history and the current question into a single new question to improve document retrieval.

To begin with, instantiate the chain along with its conversation memory.

In [None]:
# Create the Conversational Retrieval chain
cqa = ConversationalRetrievalChain.from_llm(llm = llm, 
                                            chain_type = "stuff", 
                                            condense_question_llm = llm,
                                            retriever = doc_search.as_retriever(search_kwargs = {
                                                "vector_field": "content-embedding",
                                                "text_field": "content",
                                                "k": 5}),
                                            return_source_documents = True,
                                            memory = ConversationBufferMemory(input_key = "question",
                                                                              output_key = "answer",
                                                                              memory_key = "chat_history",
                                                                              return_messages = True),
                                            verbose = False)
chat_history = []
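
To illustrate the condensing step described above, the sketch below flattens prior turns and the follow-up question into one prompt that an LLM could rewrite into a standalone question. The template wording and the `build_condense_prompt` function are assumptions for illustration, not LangChain's exact prompt.

```python
from typing import List, Tuple

# Illustrative wording only
CONDENSE_TEMPLATE = (
    "Given the following conversation and a follow-up question, "
    "rephrase the follow-up question to be a standalone question.\n\n"
    "Chat history:\n{chat_history}\n"
    "Follow-up question: {question}\n"
    "Standalone question:"
)


def build_condense_prompt(history: List[Tuple[str, str]], question: str) -> str:
    """Render (human, assistant) turns as text and combine them with the new question."""
    lines = []
    for human, assistant in history:
        lines.append("Human: {}".format(human))
        lines.append("Assistant: {}".format(assistant))
    return CONDENSE_TEMPLATE.format(chat_history="\n".join(lines), question=question)


print(build_condense_prompt(
    [("What is Amazon Bedrock?", "A managed service for foundation models.")],
    "Does it support agents?"))
```

The rewritten standalone question, rather than the raw follow-up, is what gets embedded for retrieval, so a follow-up such as "Does it support agents?" can still match the right documents.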

Ask the first question.

In [None]:
question = "How did Claude 2 do on the Graduate Record Exam?"
# Invoke the embedding search, LLM etc. through the LangChain ConversationalRetrievalChain chain
response = cqa.invoke({"question": question, "chat_history": chat_history})
# Parse and print the output
logging.info(convert_crc_chat_history_to_text(response))

Ask the second question. Now when the response is printed, the previous message exchange will also be printed as it is stored in the chat history in memory.

In [None]:
question = "What are Agents in Amazon Bedrock?"
# Invoke the embedding search, LLM etc. through the LangChain ConversationalRetrievalChain chain
response = cqa.invoke({"question": question, "chat_history": chat_history})
# Parse and print the output
logging.info(convert_crc_chat_history_to_text(response))

## 3. Chat with the assistant <a id='Chat%20with%20the%20assistant'></a>

Now we are going to put it all together in a single browser interface. The below call will initiate a simple conversational UI inside of our notebook. Run it and start asking questions!

In [None]:
# The class 'ChatUX' is defined in ./scripts/helper_functions.py
# Instantiate it and start the interactive chat with the assistant

chatux = ChatUX(cqa)
chatux.start_chat()

## 4. Cleanup <a id='Cleanup'></a>

As a best practice, you should delete AWS resources that are no longer required. This will help you avoid incurring unnecessary costs.

<div class="alert alert-block alert-info">
<b>Note:</b> During the AIM329 session, by default, all resources will be cleaned up at the end of the session. If you are running this notebook outside of the AIM329 session, you can clean up the AOSS resources created through this notebook by running the following cell.
</div>

In [None]:
# The helper function 'delete_aoss_collection' (available through ./scripts/helper_functions.py) deletes the specified
# AOSS collection along with all the indexes in it. In addition, it also deletes the specified data access policy,
# encryption policy and network policy.

if aoss_collection_created:
    delete_aoss_collection(aoss_client, collection_id, data_access_policy_name,
                           encryption_policy_name, network_policy_name)
else:
    logging.info("Skipping AOSS collection deletion.")

## 5. Conclusion <a id='Conclusion'></a>

We have now seen how to build a chat assistant using a Large Language Model (LLM) hosted on Amazon Bedrock. In the process, we also demonstrated how a Retrieval Augmented Generation (RAG) mechanism can help reduce hallucinations. While using RAG, we showed you how to use an Embeddings Model hosted on Amazon Bedrock to convert raw text to vectors and how to store and search them in an Amazon OpenSearch Serverless collection.

## 6. Frequently Asked Questions (FAQs) <a id='FAQs'></a>

**Q: What AWS services are used in this notebook?**

Amazon Bedrock, Amazon OpenSearch Serverless, AWS Identity and Access Management (IAM), Amazon CloudWatch, and Amazon SageMaker Notebook instance (or) Amazon SageMaker Studio Notebook depending on what you use to run the notebook.

**Q: What is the difference between OpenSearch, Amazon OpenSearch Serverless, and Amazon OpenSearch Service?**

OpenSearch is a fully open-source search and analytics engine for use cases such as log analytics, real-time application monitoring, and clickstream analysis. For more information, see the [OpenSearch documentation](https://opensearch.org/docs/latest/).

Amazon OpenSearch Service provisions all the resources for your OpenSearch cluster and launches it. It also automatically detects and replaces failed OpenSearch Service nodes, reducing the overhead associated with self-managed infrastructures. You can scale your cluster with a single API call or a few clicks in the console.

Amazon OpenSearch Serverless is an on-demand serverless configuration for Amazon OpenSearch Service. Serverless removes the operational complexities of provisioning, configuring, and tuning your OpenSearch clusters. It's a good option for organizations that don't want to self-manage their OpenSearch clusters, or organizations that don't have the dedicated resources or expertise to operate large clusters. With OpenSearch Serverless, you can easily search and analyze a large volume of data without having to worry about the underlying infrastructure and data management.

**Q: How does Amazon OpenSearch Serverless manage capacity?**

With Amazon OpenSearch Serverless, you don't have to manage capacity yourself. OpenSearch Serverless automatically scales compute capacity for your account based on the current workload. Serverless compute capacity is measured in OpenSearch Compute Units (OCUs). Each OCU is a combination of 6 GiB of memory and corresponding virtual CPU (vCPU), as well as data transfer to Amazon S3. For more information about the decoupled architecture in OpenSearch Serverless, see [How it works](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-overview.html#serverless-process).

**Q: Will Amazon Bedrock capture and store my data?**

Amazon Bedrock doesn't use your prompts and continuations to train any AWS models or distribute them to third parties. Your training data isn't used to train the base Amazon Titan models or distributed to third parties. Other usage data, such as usage timestamps, logged account IDs, and other information logged by the service, is also not used to train the models.

Amazon Bedrock uses the fine tuning data you provide only for fine tuning an Amazon Titan model. Amazon Bedrock doesn't use fine tuning data for any other purpose, such as training base foundation models.

Each model provider has an escrow account that they upload their models to. The Amazon Bedrock inference account has permissions to call these models, but the escrow accounts themselves don't have outbound permissions to Amazon Bedrock accounts. Additionally, model providers don't have access to Amazon Bedrock logs or access to customer prompts and continuations.

Amazon Bedrock doesn’t store or log your data in its service logs.

**Q: What models are supported by Amazon Bedrock?**

Go [here](https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html#models-supported).

**Q: What is the difference between On-demand and Provisioned Throughput in Amazon Bedrock?**

With the On-Demand mode, you only pay for what you use, with no time-based term commitments. For text generation models, you are charged for every input token processed and every output token generated. For embeddings models, you are charged for every input token processed. A token comprises a few characters and is the basic unit that a model uses to understand user input and prompts and to generate results. For image generation models, you are charged for every image generated.

With the Provisioned Throughput mode, you can purchase model units for a specific base or custom model. The Provisioned Throughput mode is primarily designed for large consistent inference workloads that need guaranteed throughput. Custom models can only be accessed using Provisioned Throughput. A model unit provides a certain throughput, which is measured by the maximum number of input or output tokens processed per minute. With this Provisioned Throughput pricing, charged by the hour, you have the flexibility to choose between 1-month or 6-month commitment terms.
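
As a worked example of the On-Demand arithmetic for a text generation model, the sketch below multiplies token counts by per-token prices, assuming prices are quoted per 1,000 tokens. The prices are made-up placeholders, not actual Amazon Bedrock prices, and `on_demand_cost` is a hypothetical helper; refer to the pricing page for real figures.

```python
def on_demand_cost(input_tokens: int, output_tokens: int,
                   price_per_1k_input: float, price_per_1k_output: float) -> float:
    """Estimate the charge for one request from its input and output token counts."""
    input_cost = (input_tokens / 1000) * price_per_1k_input
    output_cost = (output_tokens / 1000) * price_per_1k_output
    return input_cost + output_cost


# Placeholder prices for illustration only
estimate = on_demand_cost(input_tokens=1000, output_tokens=500,
                          price_per_1k_input=0.01, price_per_1k_output=0.03)
print("Estimated cost: {:.4f}".format(estimate))
```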

**Q: Where can I find customer references for Amazon Bedrock?**

Go [here](https://aws.amazon.com/bedrock/testimonials/).

**Q: Where can I find resources for prompt engineering?**

[Prompt Engineering Guide](https://www.promptingguide.ai/).

**Q: Is LangChain mandatory to use Amazon Bedrock?**

No. You can interact with Amazon Bedrock using the [Bedrock API](https://docs.aws.amazon.com/bedrock/latest/APIReference/welcome.html) or language-specific [AWS SDKs](https://aws.amazon.com/developer/tools/). Using LangChain will simplify the orchestration of the steps involved in the interactions between various components involved in this architecture. 
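
As a minimal sketch of calling Amazon Bedrock without LangChain, the snippet below uses the `invoke_model` operation of the boto3 `bedrock-runtime` client. The model ID `anthropic.claude-v2`, the Region, and the request body fields (which follow the Anthropic Claude text completion format) are assumptions; adjust them to the model you have access to. Running `invoke_claude` requires boto3, AWS credentials, and model access, so only the request-building part is exercised here.

```python
import json


def build_claude_request(question: str, max_tokens: int = 300) -> str:
    """Build the JSON request body, assuming the Claude text completion format."""
    return json.dumps({
        "prompt": "\n\nHuman: {}\n\nAssistant:".format(question),
        "max_tokens_to_sample": max_tokens,
        "temperature": 0.0,
    })


def invoke_claude(question: str,
                  region_name: str = "us-east-1",
                  model_id: str = "anthropic.claude-v2") -> str:
    """Send the question to Amazon Bedrock and return the generated text."""
    # Imported lazily so the request builder above works without boto3 installed
    import boto3

    client = boto3.client("bedrock-runtime", region_name=region_name)
    response = client.invoke_model(
        modelId=model_id,
        body=build_claude_request(question),
        accept="application/json",
        contentType="application/json",
    )
    # The response body is a stream containing a JSON document
    return json.loads(response["body"].read())["completion"]


print(build_claude_request("What are Agents in Amazon Bedrock?"))
```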

**Q: How do I get started with LangChain?**

Go [here](https://python.langchain.com/docs/get_started/introduction).

**Q: Where can I find pricing information for the AWS services used in this notebook?**

- Amazon Bedrock pricing - go [here](https://aws.amazon.com/bedrock/pricing/).
- Amazon OpenSearch Serverless pricing - go [here](https://aws.amazon.com/opensearch-service/pricing/) and navigate to the <i>Serverless</i> section.
- AWS Identity and Access Management (IAM) pricing - free.
- Amazon CloudWatch pricing - go [here](https://aws.amazon.com/cloudwatch/pricing/).
- Amazon SageMaker Notebook instance (or) Amazon SageMaker Studio Notebook pricing - go [here](https://aws.amazon.com/sagemaker/pricing/).