# Maximizing AI Potential: Leveraging Foundation Models from Amazon Bedrock and Amazon OpenSearch Serverless as a Vector Engine

### Context
Amazon Bedrock is a fully managed service that provides access to foundation models (FMs) from third-party providers and Amazon through an API. With Bedrock, you can choose from a variety of models to find the one best suited for your use case. On one hand, Amazon Bedrock can generate vector embeddings as well as summarize text; on the other hand, the vector engine for Amazon OpenSearch Serverless complements it by providing a mechanism to store those vectors and run semantic search against them.

In this sample notebook you will explore some of the most common usage patterns we are seeing with our customers for Generative AI such as generating text and images, creating value for organizations by improving productivity. This is achieved by leveraging foundation models to help in composing emails, summarizing text, answering questions, building chatbots, and creating images.

### Challenges
- How to manage large document(s) that exceed the token limit
- How to find the document(s) relevant to the question being asked

### Proposal
To address the above challenges, this notebook proposes the following strategy:
#### Prepare documents
![Embeddings](./images/Embeddings_lang.png)

Before the questions can be answered, the documents must be processed and stored in a document store index:
- Load the documents
- Process and split them into smaller chunks
- Create a numerical vector representation of each chunk using the Amazon Bedrock Titan Embeddings model
- Create an index using the chunks and the corresponding embeddings, and store it in OpenSearch Serverless
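The preparation steps above can be sketched in plain Python. This is a minimal, self-contained illustration only: `toy_embed` is a hypothetical stand-in for the Titan Embeddings model (it hashes words into a small fixed-size vector), and the `title`/`v_title` record keys simply mirror the field names used later in this notebook.

```python
import hashlib
import math
from typing import Dict, List

DIM = 16  # toy dimension chosen for readability; real embedding vectors are much longer


def toy_embed(text: str, dim: int = DIM) -> List[float]:
    """Hypothetical stand-in for an embeddings model: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # md5 gives a stable bucket per word across runs (unlike Python's built-in hash)
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def prepare_records(chunks: List[str]) -> List[Dict]:
    """Pair each chunk with its embedding, mirroring the title/v_title fields used later."""
    return [{"title": chunk, "v_title": toy_embed(chunk)} for chunk in chunks]


records = prepare_records(["first example chunk of text", "second example chunk of text"])
print(len(records), len(records[0]["v_title"]))  # 2 16
```

A real embeddings model captures meaning rather than word overlap; the point here is only the shape of the data that ends up in the index: one text chunk paired with one fixed-length vector.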
#### Ask question
![Question](./images/Chatbot_lang.png)

Once the document index is prepared, you are ready to ask questions, and relevant documents will be fetched based on the question being asked. The following steps are executed:
- Create an embedding of the input question
- Compare the question embedding with the embeddings stored in OpenSearch Serverless
- Fetch the (top N) relevant document chunks using vector engine
- Add those chunks as part of the context in the prompt
- Send the prompt to the model under Amazon Bedrock
- Get the contextual answer based on the documents retrieved
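The question-answering steps above boil down to a nearest-neighbour lookup followed by prompt assembly. The sketch below uses cosine similarity over a small in-memory list with made-up 3-dimensional vectors; in this notebook the comparison is performed by the OpenSearch Serverless vector engine instead, and the helper names (`cosine`, `top_n`, `build_prompt`) are hypothetical.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_n(query_vec: List[float], records: List[Dict], n: int = 3) -> List[str]:
    """Return the texts of the n records whose vectors are most similar to the query."""
    ranked = sorted(records, key=lambda r: cosine(query_vec, r["vector"]), reverse=True)
    return [r["text"] for r in ranked[:n]]


def build_prompt(question: str, chunks: List[str]) -> str:
    """Add the retrieved chunks to the prompt as context."""
    context = "\n\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


records = [
    {"text": "chunk about cloud", "vector": [1.0, 0.0, 0.0]},
    {"text": "chunk about retail", "vector": [0.0, 1.0, 0.0]},
    {"text": "chunk about cloud and retail", "vector": [0.7, 0.7, 0.0]},
]
best = top_n([0.9, 0.1, 0.0], records, n=2)
print(best)  # ['chunk about cloud', 'chunk about cloud and retail']
print(build_prompt("How has AWS evolved?", best))
```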

## Use case
#### Dataset
To explain this architecture pattern we use the Amazon shareholder letters from several years.

The model will try to answer questions from these documents in plain language.


## Implementation
To follow the Retrieval Augmented Generation (RAG) approach, this notebook uses the LangChain framework, which provides integrations with different services and tools that make it efficient to build patterns such as RAG. We will use the following tools:

- **LLM (Large Language Model)**: Anthropic Claude V1 available through Amazon Bedrock

  This model will be used to understand the document chunks and provide an answer in a human-friendly manner.
- **Vector Store**: vector engine for Amazon OpenSearch Serverless

  In this notebook we use OpenSearch Serverless as the vector store for both the embeddings and the document text.
- **Index**: VectorIndex

  The index is used to compare the input embedding with the document embeddings to find the relevant documents.
- **Embeddings Model**: Amazon Titan Embeddings available through Amazon Bedrock

  This model will be used to generate a numerical representation of the textual documents.
- **Document Loader**: PDF Loader available through LangChain

  This loader reads the documents from a source. In this example the PDF files are loaded and split into chunks, and the vector embeddings generated from those chunks are loaded into OpenSearch Serverless.
- **Wrapper**: wraps the index, vector store, embeddings model and LLM to abstract away the logic from the user.
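The **Wrapper** idea can be illustrated with a small class that hides the embed, search and generate sequence behind one call. This is a hypothetical sketch, not a LangChain API: `RagWrapper` and its fields are invented for illustration, and each component is injected as a plain callable so that the Bedrock embeddings, the OpenSearch query and the LLM could be plugged in, or replaced by fakes for testing.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RagWrapper:
    """Hypothetical wrapper combining an embeddings model, a vector search and an LLM."""
    embed: Callable[[str], List[float]]              # e.g. bedrock_embeddings.embed_query
    search: Callable[[List[float], int], List[str]]  # returns the texts of the top-k chunks
    generate: Callable[[str], str]                   # e.g. claude_llm
    k: int = 3

    def ask(self, question: str) -> str:
        vector = self.embed(question)
        chunks = self.search(vector, self.k)
        context = "\n\n".join(chunks)
        prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        return self.generate(prompt)


# Usage with fake components (no AWS calls)
wrapper = RagWrapper(
    embed=lambda text: [float(len(text))],
    search=lambda vec, k: [f"chunk {i}" for i in range(k)],
    generate=lambda prompt: f"answered using {prompt.count('chunk')} chunks",
)
print(wrapper.ask("How has AWS evolved?"))  # answered using 3 chunks
```

Injecting callables keeps the wrapper independent of any particular vector store or model, which matches the take-away of experimenting with different vector stores and models.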

### Setup
To run this notebook you need to install dependencies such as [PyPDF](https://pypi.org/project/pypdf/), as well as `opensearch-py` and `requests_aws4auth` for connecting to OpenSearch Serverless.



Then instantiate the LLM and the embeddings model. Here we use Amazon Titan and Anthropic Claude to demonstrate the use case.

Note: You can choose other models available in Bedrock by replacing the `model_id`, for example:

`llm = Bedrock(model_id="amazon.titan-tg1-large")`

Available models under Bedrock have the following IDs:
- `amazon.titan-tg1-large`
- `ai21.j2-grande-instruct`
- `ai21.j2-jumbo-instruct`
- `anthropic.claude-instant-v1`
- `anthropic.claude-v1`
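Different providers expect different request bodies, which is why the `model_id` prefix (`amazon.`, `anthropic.`, `ai21.`) matters. The sketch below builds JSON bodies for the two providers used in this notebook: the Titan keys match the `parameters` dictionary defined later, and the Claude key `max_tokens_to_sample` matches the `model_kwargs` used when creating `claude_llm`. The helper `build_request_body` is hypothetical; check any other request fields against the Bedrock documentation for the model version you use.

```python
import json


def build_request_body(model_id: str, prompt: str, max_tokens: int = 200) -> str:
    """Hypothetical helper: build an InvokeModel JSON body based on the provider prefix."""
    provider = model_id.split(".")[0]
    if provider == "anthropic":
        # Claude prompts are expected in the "\n\nHuman: ...\n\nAssistant:" format
        body = {"prompt": prompt, "max_tokens_to_sample": max_tokens}
    elif provider == "amazon":
        body = {
            "inputText": prompt,
            "textGenerationConfig": {"maxTokenCount": max_tokens, "temperature": 0, "topP": 0.9},
        }
    else:
        raise ValueError(f"provider not handled in this sketch: {provider}")
    return json.dumps(body)


print(build_request_body("amazon.titan-tg1-large", "Hello"))
```

The resulting string is the kind of payload the Bedrock client's `invoke_model` call takes as its `body`; the LangChain `Bedrock` class builds it for you.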

#### ⚠️⚠️⚠️ Execute the following setup cells before running the rest of this notebook ⚠️⚠️⚠️

For a detailed description of what the following cells do, refer to the [Bedrock boto3 setup](../00_Intro/bedrock_boto3_setup.ipynb) notebook.

In [None]:
!cd .. && ./download-dependencies.sh

In [None]:
import glob
import subprocess
import sys

botocore_whl_filename = glob.glob("../dependencies/botocore-*-py3-none-any.whl")[0]
boto3_whl_filename = glob.glob("../dependencies/boto3-*-py3-none-any.whl")[0]

# Install into the notebook's own Python environment and wait for pip to finish
subprocess.run([sys.executable, '-m', 'pip', 'install', botocore_whl_filename, boto3_whl_filename, '--force-reinstall'], check=True)

In [None]:
# Make sure you run `download-dependencies.sh` from the root of the repository to download the dependencies before running this cell
%pip install langchain==0.0.245 --quiet
%pip install pypdf==3.8.1 faiss-cpu==1.7.4 --quiet
%pip install requests_aws4auth opensearch-py

In [None]:
# Uncomment the following lines to run from your local environment outside of the AWS account with Bedrock access

# import os
# os.environ['BEDROCK_ASSUME_ROLE'] = '<enter role>'
# os.environ['AWS_PROFILE'] = '<aws-profile>'

In [None]:
import boto3
import json
import os
import sys

module_path = ".."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww

os.environ['AWS_DEFAULT_REGION'] = 'us-west-2'
boto3_bedrock = bedrock.get_bedrock_client(os.environ.get('BEDROCK_ASSUME_ROLE', None))
print (f"bedrock client {boto3_bedrock}")

In [None]:
## Set up the OpenSearch Serverless client
from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth

# OpenSearch Serverless collection endpoint, without the https:// prefix.
# Replace it with the endpoint of your own collection.
host = 'nas3j63bxdjwni0aty0k.us-west-2.aoss.amazonaws.com'

region = 'us-west-2'
service = 'aoss'  # SigV4 service name for OpenSearch Serverless

# Sign requests with the current AWS credentials
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service,
                   session_token=credentials.token)

# Create an OpenSearch client
client = OpenSearch(
    hosts = [{'host': host, 'port': 443}],
    http_auth = awsauth,
    timeout = 300,
    use_ssl = True,
    verify_certs = True,
    connection_class = RequestsHttpConnection
)
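The `host` value must be the bare collection endpoint (no `https://` prefix), and SigV4 signing uses the service name `aoss` together with the collection's Region. A small, hypothetical helper like `parse_aoss_endpoint` below can normalize an endpoint copied from the console and derive the Region from it, assuming the `<collection-id>.<region>.aoss.amazonaws.com` format seen in this notebook.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_aoss_endpoint(endpoint: str) -> Tuple[str, str]:
    """Hypothetical helper: return (host, region) for an OpenSearch Serverless endpoint.

    Accepts a bare host or a URL such as https://<id>.<region>.aoss.amazonaws.com
    """
    # urlparse only fills in netloc when a scheme is present
    host = urlparse(endpoint).netloc if "://" in endpoint else endpoint
    host = host.split(":")[0].rstrip("/")  # drop any port or trailing slash
    parts = host.split(".")
    if len(parts) < 5 or parts[-3:] != ["aoss", "amazonaws", "com"]:
        raise ValueError(f"not an OpenSearch Serverless endpoint: {endpoint}")
    return host, parts[1]


host_check, region_check = parse_aoss_endpoint("https://nas3j63bxdjwni0aty0k.us-west-2.aoss.amazonaws.com")
print(host_check, region_check)  # nas3j63bxdjwni0aty0k.us-west-2.aoss.amazonaws.com us-west-2
```

Deriving the Region from the endpoint avoids signing requests for one Region while calling a collection in another, which typically surfaces as an authorization error.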

### Set up LangChain

We create instances of the Bedrock classes for the LLMs and the embeddings model. This example uses the Titan model from Amazon and the Claude model from Anthropic.

In [None]:
from langchain.embeddings import BedrockEmbeddings
from langchain.llms.bedrock import Bedrock

# Create the Anthropic Claude and Amazon Titan LLMs
claude_llm = Bedrock(model_id="anthropic.claude-v1", client=boto3_bedrock, model_kwargs={'max_tokens_to_sample':200})
titan_llm = Bedrock(model_id="amazon.titan-tg1-large", client=boto3_bedrock)
# We will be using the Titan Embeddings model to generate our embeddings
bedrock_embeddings = BedrockEmbeddings(client=boto3_bedrock)

### Data Preparation
Let's first download some of the files to build our document store. For this example we will be using public Amazon Shareholder Letters.

In [None]:
!mkdir -p ./data

from urllib.request import urlretrieve
urls = [
    'https://s2.q4cdn.com/299287126/files/doc_financials/2023/ar/2022-Shareholder-Letter.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2022/ar/2021-Shareholder-Letter.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2021/ar/Amazon-2020-Shareholder-Letter-and-1997-Shareholder-Letter.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2020/ar/2019-Shareholder-Letter.pdf'
]

filenames = [
    'AMZN-2022-Shareholder-Letter.pdf',
    'AMZN-2021-Shareholder-Letter.pdf',
    'AMZN-2020-Shareholder-Letter.pdf',
    'AMZN-2019-Shareholder-Letter.pdf'
]

metadata = [
    dict(year=2022, source=filenames[0]),
    dict(year=2021, source=filenames[1]),
    dict(year=2020, source=filenames[2]),
    dict(year=2019, source=filenames[3])]

data_root = "./data/"

for idx, url in enumerate(urls):
    file_path = data_root + filenames[idx]
    urlretrieve(url, file_path)

As part of Amazon's culture, the CEO always includes a copy of the 1997 Letter to Shareholders with every new letter. This causes repetition, makes generating embeddings take longer, and may skew your results. In the next cell you will take the downloaded data, trim the 1997 letter (the last 3 pages) and overwrite the files with the processed versions. Run that cell only once: every run removes another 3 pages.

In [None]:
from pypdf import PdfReader, PdfWriter

local_pdfs = glob.glob(data_root + '*.pdf')

# Drop the last 3 pages (the 1997 letter) of each PDF and overwrite the file
for local_pdf in local_pdfs:
    pdf_reader = PdfReader(local_pdf)
    pdf_writer = PdfWriter()
    for pagenum in range(len(pdf_reader.pages) - 3):
        pdf_writer.add_page(pdf_reader.pages[pagenum])

    # Opening in 'wb' mode truncates the file before the trimmed PDF is written
    with open(local_pdf, 'wb') as new_file:
        pdf_writer.write(new_file)

After downloading, we can load the documents with the help of [PyPDFLoader available under LangChain](https://python.langchain.com/en/latest/reference/modules/document_loaders.html) and split them into smaller chunks.

Note: The retrieved document/text should be large enough to contain enough information to answer a question, but small enough to fit into the LLM prompt. Also, the embeddings model limits the input length to 512 tokens, which roughly translates to ~2000 characters. For this use case we create chunks of roughly 1000 characters with an overlap of 100 characters using [RecursiveCharacterTextSplitter](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/recursive_text_splitter.html).
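To make the chunk-size arithmetic concrete: with the rough ratio stated above (512 tokens ≈ 2,000 characters, i.e. about 4 characters per token), a 1,000-character chunk is roughly 250 tokens, comfortably under the embedding limit. The sketch below is a simplified fixed-window splitter with overlap, written only to show how `chunk_size` and `chunk_overlap` interact; it is not LangChain's `RecursiveCharacterTextSplitter`, which additionally tries to break on paragraph, line and word boundaries. Both helper names are hypothetical.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> List[str]:
    """Simplified splitter: each chunk starts chunk_size - chunk_overlap characters after the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from the ~4 characters/token ratio (an approximation, not a tokenizer)."""
    return round(len(text) / chars_per_token)


sample = "x" * 2500
chunks = split_with_overlap(sample)
print([len(c) for c in chunks])    # [1000, 1000, 700]
print(estimate_tokens(chunks[0]))  # 250
```

Consecutive chunks share 100 characters, so a sentence cut at a chunk boundary still appears whole in at least one chunk as long as it is shorter than the overlap.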

In [None]:
import numpy as np
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader

documents = []

for idx, file in enumerate(filenames):
    loader = PyPDFLoader(data_root + file)
    document = loader.load()  # one Document per PDF page
    for document_fragment in document:
        # Copy the dict so that pages do not share the same metadata object
        document_fragment.metadata = dict(metadata[idx])

    print(f'{file}: {len(document)} pages')
    documents += document

# - in our testing Character split works better with this PDF data set
text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size = 1000,
    chunk_overlap  = 100,
)

docs = text_splitter.split_documents(documents)

Before proceeding, let's look at some statistics about the document preprocessing we just performed:

In [None]:
avg_doc_length = lambda documents: sum([len(doc.page_content) for doc in documents])//len(documents)
print(f'Average length among {len(documents)} documents loaded is {avg_doc_length(documents)} characters.')
print(f'After the split we have {len(docs)} documents as opposed to the original {len(documents)}.')
print(f'Average length among {len(docs)} documents (after split) is {avg_doc_length(docs)} characters.')

In [None]:
# !mkdir data

# from urllib.request import urlretrieve
# files = [
#     'https://www.irs.gov/pub/irs-pdf/p1544.pdf',
#     'https://www.irs.gov/pub/irs-pdf/p15.pdf',
#     'https://www.irs.gov/pub/irs-pdf/p1212.pdf'
# ]
# for url in files:
#     file_path = './data/' + url.split('/')[-1]
#     urlretrieve(url, file_path)

After downloading we can load the documents with the help of [DirectoryLoader from PyPDF available under LangChain](https://python.langchain.com/en/latest/reference/modules/document_loaders.html) and splitting them into smaller chunks.

Note: The retrieved document/text should be large enough to contain enough information to answer a question; but small enough to fit into the LLM prompt. Also the embeddings model has a limit of the length of input tokens limited to 512 tokens, which roughly translates to ~2000 characters. For the sake of this use-case we are creating chunks of roughly 1000 characters with an overlap of 100 characters using [RecursiveCharacterTextSplitter](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/recursive_text_splitter.html).

In [None]:
# import numpy as np
# from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
# from langchain.document_loaders import PyPDFLoader, PyPDFDirectoryLoader

# loader = PyPDFDirectoryLoader("./data/")

# documents = loader.load()
# # - in our testing Character split works better with this PDF data set
# text_splitter = RecursiveCharacterTextSplitter(
#     # Set a really small chunk size, just to show.
#     chunk_size = 1000,
#     chunk_overlap  = 100,
# )
# docs = text_splitter.split_documents(documents)

In [None]:
# avg_doc_length = lambda documents: sum([len(doc.page_content) for doc in documents])//len(documents)
# avg_char_count_pre = avg_doc_length(documents)
# avg_char_count_post = avg_doc_length(docs)
# print(f'Average length among {len(documents)} documents loaded is {avg_char_count_pre} characters.')
# print(f'After the split we have {len(docs)} documents more than the original {len(documents)}.')
# print(f'Average length among {len(docs)} documents (after split) is {avg_char_count_post} characters.')

We had 4 PDF documents, which have been split into ~500 smaller chunks.

Now let's see what a sample embedding looks like for one of those chunks:

In [None]:
sample_embedding = np.array(bedrock_embeddings.embed_query(docs[0].page_content))
sample_embedding

 
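The commented-out cell below shows how LangChain's `OpenSearchVectorSearch.from_documents` would establish a connection with OpenSearch Serverless, create a new index, create embeddings for the documents and then store the embeddings in OpenSearch Serverless. Because this integration timed out with LangChain 0.0.245, the cells after it perform the same steps directly with the `opensearch-py` client. For details, refer to the documentation: https://python.langchain.com/docs/integrations/vectorstores/opensearch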

*Note: After the cells below finish executing, wait a minute or two before querying the new index.*

In [None]:
# TODO: The direct LangChain integration with version 0.0.245 gives a timeout error, so the following code is commented out.

# from langchain.vectorstores import OpenSearchVectorSearch

# docsearch = OpenSearchVectorSearch.from_documents(
#     docs,
#     bedrock_embeddings,
#     opensearch_url=host,
#     http_auth=awsauth,
#     timeout = 300,
#     use_ssl = True,
#     verify_certs = True,
#     connection_class = RequestsHttpConnection,
#     index_name="bedrock-aos-irs-index2",
#     engine="faiss",
#     bulk_size=len(docs)
# )

In [None]:
index_name = "bedrock-opensearch-serverless-amazon-shareholder"
# Must equal the length of the vectors returned by bedrock_embeddings
vector_size = 4096

In [None]:
# Create a new k-NN index
index_body = {
    "settings": {
        "index.knn": True
    },
    "mappings": {
        "properties": {
            # Chunk text; the `title.keyword` sub-field stores the same value as a keyword for exact matching
            "title": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
            # Embedding of the chunk text
            "v_title": {"type": "knn_vector", "dimension": vector_size},
        }
    }
}

client.indices.create(
    index=index_name,
    body=index_body
)
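The mapping above relies on the k-NN defaults for the vector field. If you want to control the approximate nearest-neighbour algorithm explicitly, the OpenSearch k-NN plugin also accepts a `method` block on a `knn_vector` field. The hypothetical helper `build_knn_index_body` below builds the same mapping as above and optionally adds such a block; the HNSW engine and space type shown are assumptions, so check which combinations your OpenSearch Serverless collection supports before using them.

```python
from typing import Optional


def build_knn_index_body(dimension: int, text_field: str = "title",
                         vector_field: str = "v_title",
                         method: Optional[dict] = None) -> dict:
    """Hypothetical helper: index body with a text field and a k-NN vector field."""
    vector_mapping = {"type": "knn_vector", "dimension": dimension}
    if method is not None:
        vector_mapping["method"] = method
    return {
        "settings": {"index.knn": True},
        "mappings": {
            "properties": {
                # Full-text field plus a keyword sub-field for exact matches
                text_field: {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
                vector_field: vector_mapping,
            }
        },
    }


# Same mapping as the cell above, plus an explicit HNSW method (engine/space type are assumptions)
hnsw = {"name": "hnsw", "engine": "nmslib", "space_type": "cosinesimil"}
body = build_knn_index_body(4096, method=hnsw)
print(body["mappings"]["properties"]["v_title"])
```

The resulting dictionary could be passed as `body` to `client.indices.create`, exactly like `index_body` in the cell above.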

In [None]:
# View the mapping (schema) of the new OpenSearch Serverless index
client.indices.get_mapping(index_name)

In [None]:
actions = []
bulk_size = 0
# Bulk API action line that precedes every document
action = {"index": {"_index": index_name}}

In [None]:
# Use the Bulk API to ingest documents into OpenSearch Serverless.
# Ingesting the ~500 chunks takes about 5 minutes.
actions = []
bulk_size = 0
for document in docs:
    # embed_query returns a plain list of floats, which serializes directly to JSON
    embedding = bedrock_embeddings.embed_query(document.page_content)
    actions.append(action)
    actions.append({
        "title": document.page_content,
        "v_title": embedding
    })
    bulk_size += 1
    if bulk_size >= 200:
        client.bulk(body=actions)
        print(f"bulk request sent with size: {bulk_size}")
        # Start a fresh batch so that documents are not sent twice
        actions = []
        bulk_size = 0

# Ingest the remaining documents
print("remaining documents:", bulk_size)
if actions:
    client.bulk(body=actions)

Following this pattern, embeddings can be generated for the entire corpus and stored in the vector store.
**⚠️⚠️⚠️ NOTE: the ingestion cell above might take a few minutes to run ⚠️⚠️⚠️**
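The Bulk API expects alternating action and document lines, and each request should contain only documents that have not been sent yet. The sketch below isolates that batching logic as a pure function so it can be checked without an OpenSearch cluster; `build_bulk_batches` is a hypothetical helper, and `embed` stands in for `bedrock_embeddings.embed_query`.

```python
from typing import Callable, Iterable, List


def build_bulk_batches(texts: Iterable[str], embed: Callable[[str], List[float]],
                       index_name: str, batch_size: int = 200) -> List[list]:
    """Group documents into Bulk API bodies of at most batch_size documents each."""
    batches, current = [], []
    for text in texts:
        current.append({"index": {"_index": index_name}})       # action line
        current.append({"title": text, "v_title": embed(text)})  # document line
        if len(current) // 2 == batch_size:
            batches.append(current)
            current = []  # fresh batch, so nothing is sent twice
    if current:
        batches.append(current)  # remaining documents
    return batches


batches = build_bulk_batches([f"chunk {i}" for i in range(5)], lambda t: [0.0, 1.0],
                             "demo-index", batch_size=2)
print([len(b) // 2 for b in batches])  # [2, 2, 1]
```

Each batch could then be sent with `client.bulk(body=batch)`.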

### Question Answering

Now that we have our vector store in place, we can start asking questions.

In [None]:
query = "How has AWS evolved?"

The first step is to create an embedding of the query so that it can be compared with the document embeddings.

In [None]:
# embed_query returns a plain list of floats, which can be sent to OpenSearch as JSON
query_embedding = bedrock_embeddings.embed_query(query)
np.array(query_embedding)

In [None]:
index_name

In [None]:
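# Retrieve the 3 chunks closest to the query embedding.
# `k` is the number of nearest neighbours, not the vector dimension.
query_os = {
  "size": 3,
  "fields": ["title"],
  "_source": False,
  "query": {
    "knn": {
      "v_title": {
        "vector": query_embedding,
        "k": 3
      }
    }
  }
}

relevant_documents = client.search(
    body=query_os,
    index=index_name
)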
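In a k-NN query, `k` is the number of nearest neighbours to retrieve and is unrelated to the vector dimension, while `size` limits how many hits are returned. The hypothetical helper `build_knn_query` below builds a query body of the same shape as the cell above with the two values tied together, which keeps the request easy to reason about.

```python
from typing import List


def build_knn_query(vector: List[float], k: int = 3, vector_field: str = "v_title",
                    text_field: str = "title") -> dict:
    """Hypothetical helper: k-NN search body returning only the text field of the top-k hits."""
    return {
        "size": k,               # number of hits returned
        "fields": [text_field],  # return the chunk text through the fields API
        "_source": False,
        "query": {"knn": {vector_field: {"vector": list(vector), "k": k}}},  # copy into a plain list
    }


q = build_knn_query([0.1, 0.2, 0.3], k=3)
print(q["query"]["knn"]["v_title"]["k"], q["size"])  # 3 3
```

With the notebook's objects, `client.search(body=build_knn_query(query_embedding), index=index_name)` would issue an equivalent search.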

In [None]:
relevant_documents

We used the embedding of the query to fetch the relevant documents: because the query is represented as an embedding, we can run a similarity search against the data store to retrieve the most relevant information. The next cell prints the retrieved chunks and combines them into the context for the prompt.

In [None]:
hits = relevant_documents["hits"]["hits"]
print(len(hits))
print("--------------------")
context = ""
for i, rel_doc in enumerate(hits):
    chunk_text = rel_doc["fields"]["title"][0]
    print_ww(f'## Document {i+1}: {chunk_text}.......')
    print('---')
    # Separate chunks with blank lines so sentences from different chunks do not run together
    context += chunk_text + "\n\n"
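Very long contexts may not fit the model's prompt, so it can help to cap the amount of retrieved text. The hypothetical helper `build_context` below extracts the `title` field from a search response in the shape used above (`hits.hits[i].fields.title[0]`), joins chunks with blank lines, and stops adding chunks once an assumed character budget (`max_chars`) is reached.

```python
def build_context(search_response: dict, field: str = "title", max_chars: int = 4000) -> str:
    """Join retrieved chunk texts with blank lines, staying within max_chars."""
    parts, used = [], 0
    for hit in search_response["hits"]["hits"]:
        text = hit["fields"][field][0]
        extra = len(text) + (2 if parts else 0)  # account for the "\n\n" separator
        if used + extra > max_chars:
            break
        parts.append(text)
        used += extra
    return "\n\n".join(parts)


fake_response = {"hits": {"hits": [
    {"fields": {"title": ["first chunk"]}},
    {"fields": {"title": ["second chunk"]}},
]}}
print(build_context(fake_response))
```

Because hits are ordered by similarity, stopping early drops the least relevant chunks first.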

In [None]:
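# Titan text-generation parameters. Note: this dictionary is not passed to `titan_llm` below;
# to apply it, pass it as `model_kwargs` when creating the Titan `Bedrock` LLM.
parameters = {
    "maxTokenCount": 512,
    "stopSequences": [],
    "temperature": 0,
    "topP": 0.9
}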

In [None]:
query

In [None]:
prompt_data_claude = f"""

H: Answer the question based only on the information provided. If the answer is not in the context, say "I don't know, answer not found in the documents." Provide a quote from the document.
<context>
{context}
</context>
<question>
{query}
</question>

A:"""
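Anthropic's Claude models use a turn-based text format in which the prompt starts with `\n\nHuman:` and ends with `\n\nAssistant:`. The hypothetical helper `build_claude_prompt` below builds a prompt equivalent to the one above in that format, with the refusal text properly quoted, so the turn boundaries can be checked programmatically before calling the model.

```python
def build_claude_prompt(context: str, question: str) -> str:
    """Hypothetical helper: Claude-style prompt with context and question in XML-like tags."""
    instructions = (
        "Answer the question based only on the information provided. "
        'If the answer is not in the context, say "I don\'t know, answer not found in the documents." '
        "Provide a quote from the document."
    )
    return (
        f"\n\nHuman: {instructions}\n"
        f"<context>\n{context}\n</context>\n"
        f"<question>\n{question}\n</question>"
        "\n\nAssistant:"
    )


prompt = build_claude_prompt("Example context chunk.", "How has AWS evolved?")
print(prompt)
```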



In [None]:
output_text_claude = claude_llm(prompt_data_claude)

print ("########## Output from Claude Model #################\n")
print(output_text_claude)



In [None]:
prompt_data_titan = f"""Answer the below question based on the context provided. If the answer is not in the context, say "I don't know, answer not found in the documents".
{context}
{query}
"""

In [None]:
output_text_titan = titan_llm(prompt_data_titan)
print ("########## Output from Titan Model ################\n")
print(output_text_titan)

## Conclusion
Congratulations on completing this module on retrieval augmented generation! This is an important technique that combines the power of large language models with the precision of retrieval methods. By augmenting generation with relevant retrieved examples, the responses we received become more coherent, consistent and grounded. You should feel proud of learning this innovative approach. I'm sure the knowledge you've gained will be very useful for building creative and engaging language generation systems. Well done!

In the above implementation of RAG-based question answering, we explored the following concepts and how to implement them using Amazon Bedrock, its LangChain integration, and the vector engine for Amazon OpenSearch Serverless.

- Loading documents and generating embeddings to create a vector store
- Retrieving documents relevant to the question
- Preparing a prompt which goes as input to the LLM
- Presenting an answer in a human-friendly manner

### Take-aways
- Experiment with different vector stores
- Leverage the various models available under Amazon Bedrock to see alternate outputs
- Explore options such as persistent storage of embeddings and document chunks
- Integrate with enterprise data stores

# Thank You