# Retrieval Augmented Question & Answering with Amazon Bedrock

Using LangChain & Pinecone Vector DB

STEPS:

1. Prepare documents -> split them into chunks -> create embeddings with the Amazon Bedrock Titan Embeddings model -> save them in the Pinecone vector DB.
2. When a user submits a question -> convert the question to an embedding vector -> find the document chunk(s) most relevant to the question -> feed them to the LLM with a prompt -> return the answer.
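
The two phases above can be sketched end to end without any cloud services. This is only a toy illustration: the word-count `embed` function stands in for Titan Embeddings, the in-memory list stands in for Pinecone, and the sample chunks and all function names are made up for the example.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a word-count vector (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(chunks):
    """Phase 1: embed every chunk and keep (chunk, vector) pairs in memory."""
    return [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(index, question, k=1):
    """Phase 2a: embed the question and return the k most similar chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(context_chunks, question):
    """Phase 2b: combine the retrieved chunks and the question into one prompt."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Made-up sample chunks, for illustration only.
chunks = [
    "The annual report discusses net sales.",
    "The company is headquartered in Seattle.",
]
question = "Where is the company headquartered?"

index = build_index(chunks)
top = retrieve(index, question, k=1)
prompt = build_prompt(top, question)
print(prompt)  # this string is what would be sent to the LLM
```

The rest of this notebook replaces each toy piece with a real one: Titan for `embed`, Pinecone for the index, and a LangChain chain for prompt construction and the LLM call.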

# Set up Amazon Bedrock

In [3]:
!pip3 install -U langchain pypdf pinecone-client apache-beam datasets tiktoken fastapi kaleido python-multipart uvicorn cohere openai --force-reinstall -q


In [4]:
%pip install pydantic==1.10.13 --force-reinstall --quiet
%pip install sqlalchemy==2.0.21 --force-reinstall --quiet
%pip install boto3 --force-reinstall --quiet


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0[0m[39;49m -> [0m[32;49m24.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3 -m pip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0[0m[39;49m -> [0m[32;49m24.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3 -m pip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0[0m[39;49m -> [0m[32;49m24.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3 -m pip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [40]:
import boto3
import json
import os
import sys

module_path = ".."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww

bedrock_client = bedrock.get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    region="us-west-2",
    runtime=True # Default. Needed for invoke_model() from the data plane
)

Create new client
  Using region: us-west-2
boto3 Bedrock client successfully created!
bedrock-runtime(https://bedrock-runtime.us-west-2.amazonaws.com)


# Configure LangChain
We begin by instantiating the LLM and the embeddings model. Here we use Amazon Titan Text Express for text generation and Amazon Titan Embeddings for text embedding.

Note: Other models available through Bedrock can be used instead. To switch models, replace the model_id values in the cell below.

In [37]:
# We will be using the Titan Embeddings Model to generate our Embeddings.
from langchain.embeddings import BedrockEmbeddings
from langchain.llms.bedrock import Bedrock

# - create the Titan Model
llm = Bedrock(
    model_id="amazon.titan-text-express-v1", 
    client=bedrock_client
)
bedrock_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1",
                                       client=bedrock_client)

# Split the PDF into chunks

In [8]:
import numpy as np
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader

filenames=['AMZ-2023-10-k.pdf']
metadata = [dict(year=2023, source=filenames[0]) ]
data_root = "./data/"
documents = []

for idx, file in enumerate(filenames):
    loader = PyPDFLoader(data_root + file)
    document = loader.load()
    for document_fragment in document:
        document_fragment.metadata = metadata[idx]
        
    print(f'{len(document)} {document}\n')
    documents += document

# - in our testing, character-based splitting works better with this PDF data set
text_splitter = RecursiveCharacterTextSplitter(
    # At most 1000 characters per chunk, with up to 100 characters of overlap between neighbouring chunks.
    chunk_size=1000,
    chunk_overlap=100,
)

docs = text_splitter.split_documents(documents)

94 [Document(page_content='Table of Contents\nUNITED STATES\nSECURITIES AND EXCHANGE COMMISSION\nWashington, D.C. 20549\n ____________________________________\nFORM 10-K\n____________________________________ \n(Mark One)\n☒ ANNUAL  REPOR T PURSUANT  TO SECTION 13 OR 15(d) OF  THE SECURITIES EXCHANGE ACT  OF 1934\nFor the fiscal year ended December 31, 2023\nor\n☐ TRANSITION REPOR T PURSUANT  TO SECTION 13 OR 15(d) OF  THE SECURITIES EXCHANGE ACT  OF 1934\nFor the transition period from            to             .\nCommission File No. 000-22513\n____________________________________\nAMAZON .COM, INC.\n(Exact name of registrant as specified in its charter)\nDelaware  91-1646860\n(State or other jurisdiction of\nincorporation or organization)  (I.R.S. Employer\nIdentification No.)\n410 Terry Avenue North\nSeattle, Washington 98109-5210\n(206) 266-1000\n(Addr ess and telephone number , including ar ea code, of r egistrant’ s principal executive offices)\nSecurities registered pursuant to S

In [9]:
def avg_doc_length(documents):
    """Average page_content length in characters (integer division)."""
    return sum(len(doc.page_content) for doc in documents) // len(documents)

print(f'Average length among {len(documents)} documents loaded is {avg_doc_length(documents)} characters.')
print(f'After the split we have {len(docs)} documents as opposed to the original {len(documents)}.')
print(f'Average length among {len(docs)} documents (after split) is {avg_doc_length(docs)} characters.')

Average length among 94 documents loaded is 3466 characters.
After the split we have 395 documents as opposed to the original 94.
Average length among 395 documents (after split) is 836 characters.
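
To make the roles of chunk_size and chunk_overlap concrete, here is a minimal fixed-window splitter. It is a simplified sketch, not the algorithm RecursiveCharacterTextSplitter uses: the recursive splitter prefers to break on separators such as paragraph and line breaks, which is why the average chunk above (836 characters) is shorter than the 1000-character limit. The function name and sample string are made up for the example.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(pieces)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each piece starts with the last three characters of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk when the overlap is large enough.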


In [10]:
docs[0]

Document(page_content='Table of Contents\nUNITED STATES\nSECURITIES AND EXCHANGE COMMISSION\nWashington, D.C. 20549\n ____________________________________\nFORM 10-K\n____________________________________ \n(Mark One)\n☒ ANNUAL  REPOR T PURSUANT  TO SECTION 13 OR 15(d) OF  THE SECURITIES EXCHANGE ACT  OF 1934\nFor the fiscal year ended December 31, 2023\nor\n☐ TRANSITION REPOR T PURSUANT  TO SECTION 13 OR 15(d) OF  THE SECURITIES EXCHANGE ACT  OF 1934\nFor the transition period from            to             .\nCommission File No. 000-22513\n____________________________________\nAMAZON .COM, INC.\n(Exact name of registrant as specified in its charter)\nDelaware  91-1646860\n(State or other jurisdiction of\nincorporation or organization)  (I.R.S. Employer\nIdentification No.)\n410 Terry Avenue North\nSeattle, Washington 98109-5210\n(206) 266-1000\n(Addr ess and telephone number , including ar ea code, of r egistrant’ s principal executive offices)\nSecurities registered pursuant to Secti

In [14]:
sample_embedding = np.array(bedrock_embeddings.embed_query(docs[0].page_content))
print("Sample embedding of a document chunk: ", sample_embedding)
print("Size of the embedding: ", sample_embedding.shape)

Sample embedding of a document chunk:  [ 0.734375   -0.10742188 -0.08251953 ...  0.359375   -0.19042969
 -0.28320312]
Size of the embedding:  (1536,)


# Store documents in a vector store: Pinecone

In [22]:
from pinecone import Pinecone, ServerlessSpec
import time
import os

# add index name from pinecone.io
index_name = 'amz-2023-10k'
# add Pinecone API key from app.pinecone.io
api_key = os.environ.get("PINECONE_API_KEY") 

pc = Pinecone(api_key=api_key)

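# Delete any existing index with this name so we start from a clean state.
# list_indexes() returns index descriptions, so compare against their names.
if index_name in pc.list_indexes().names():
    pc.delete_index(index_name)

pc.create_index(
    name=index_name,
    dimension=sample_embedding.shape[0],  # must match the embedding size (1536 here)
    metric="dotproduct",
    spec=ServerlessSpec(cloud="aws", region="us-west-2"),
)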
# wait for index to finish initialization
while not pc.describe_index(index_name).status["ready"]:
    time.sleep(1)
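
The index above is created with metric="dotproduct". The sketch below shows how dot product differs from cosine similarity: dot product grows with vector length, while cosine only measures direction. The two-dimensional vectors are made up for illustration and say nothing about how Titan embeddings are scaled.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity: dot product of the vectors after scaling both to unit length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

query = [1.0, 0.0]
aligned_doc = [1.0, 0.0]  # same direction as the query, short vector
long_doc = [3.0, 3.0]     # 45 degrees off the query, but much longer

print(dot(query, aligned_doc), dot(query, long_doc))        # 1.0 3.0
print(cosine(query, aligned_doc), cosine(query, long_doc))  # second value is about 0.707
```

With dot product the longer vector ranks first; with cosine the better-aligned one does. If all stored vectors are normalized to unit length, the two metrics produce the same ranking.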

In [23]:
index = pc.Index(index_name)
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}

In [25]:
%%time

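# Import under an alias so it does not shadow pinecone.Pinecone from the earlier cell.
from langchain.vectorstores import Pinecone as PineconeVectorStore

docsearch = PineconeVectorStore.from_documents(docs, bedrock_embeddings, index_name=index_name)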

CPU times: user 3.14 s, sys: 264 ms, total: 3.4 s
Wall time: 1min 16s


In [26]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}
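
The count above is still 0 even though the upload cell has finished. One possible explanation, which this notebook does not verify, is that index statistics lag behind writes for a short time. A generic polling helper like the one below can wait until the expected count becomes visible. The helper name and the fake counter are made up for the example; no Pinecone call is made here.

```python
import time

def wait_for_vector_count(get_count, expected, timeout=60.0, interval=1.0):
    """Poll get_count() until it reports at least `expected` vectors, or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        count = get_count()
        if count >= expected:
            return count
        if time.monotonic() >= deadline:
            raise TimeoutError(f"only {count} of {expected} vectors visible after {timeout}s")
        time.sleep(interval)

# Stand-in for the real stats call: reports 0 twice, then the full count.
fake_counts = iter([0, 0, 395])
final_count = wait_for_vector_count(
    lambda: next(fake_counts), expected=395, timeout=5.0, interval=0.01
)
print(final_count)  # 395
```

In the notebook, get_count would presumably read the 'total_vector_count' field from index.describe_index_stats(), with expected set to len(docs).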

# LangChain Vector Store and Query 


In [27]:
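# Import under an alias so it does not shadow pinecone.Pinecone from the earlier cell.
from langchain.vectorstores import Pinecone as PineconeVectorStore

# Name of the metadata field that holds each chunk's raw text
text_field = "text"

vectorstore = PineconeVectorStore(index, bedrock_embeddings, text_field)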



In [48]:
query = "Does amazon have a clear strategy for growth and innovation? please elaborate"

vectorstore.similarity_search(query, k=3)

[Document(page_content='economic conditions and customer demand and spending, inflation, interest rates, regional labor market constraints, world events, the rate of growth of the\ninternet, online commerce, cloud services, and new and emerging technologies, the amount that Amazon.com invests in new business opportunities and the\ntiming of those investments, the mix of products and services sold to customers, the mix of net sales derived from products as compared with services, the extent\nto which we owe income or other taxes, competition, management of growth, potential fluctuations in operating results, international growth and expansion,\nthe outcomes of claims, litigation, government investigations, and other proceedings, fulfillment, sortation, delivery, and data center optimization, risks of\ninventory management, variability in demand, the degree to which we enter into, maintain, and develop commercial agreements, proposed and completed', metadata={'source': 'AMZ-2023-10-k.pdf

# Generative Question Answering

In [44]:
from langchain.chains import RetrievalQAWithSourcesChain


qa_with_sources = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,
)
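
The "stuff" chain type places all retrieved chunks into a single prompt. The sketch below illustrates that idea with plain string formatting. It is not LangChain's implementation; the template text, function name, and sample chunks are made up for the example.

```python
# Illustrative template; the chain's built-in prompt is different.
PROMPT_TEMPLATE = (
    "Use the following pieces of context to answer the question.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)

def stuff_prompt(chunks, question, separator="\n\n"):
    """Join every retrieved chunk into one context block and fill the template once."""
    context = separator.join(chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)

retrieved = [
    "First retrieved chunk.",
    "Second retrieved chunk.",
    "Third retrieved chunk.",
]
prompt = stuff_prompt(retrieved, "What does the filing say?")
print(prompt)
```

Because every chunk goes into one prompt, the combined length of the retrieved chunks plus the question must fit within the model's context window. This is one reason to keep chunk_size and the number of retrieved chunks modest.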

In [45]:
qa_with_sources(query)

{'question': 'Does amazon have a clear strategy for growth and innovation? please elaborate',
 'answer': '\n\nYes, Amazon has a clear strategy for growth and innovation.\n\n',
 'sources': '',
 'source_documents': [Document(page_content='economic conditions and customer demand and spending, inflation, interest rates, regional labor market constraints, world events, the rate of growth of the\ninternet, online commerce, cloud services, and new and emerging technologies, the amount that Amazon.com invests in new business opportunities and the\ntiming of those investments, the mix of products and services sold to customers, the mix of net sales derived from products as compared with services, the extent\nto which we owe income or other taxes, competition, management of growth, potential fluctuations in operating results, international growth and expansion,\nthe outcomes of claims, litigation, government investigations, and other proceedings, fulfillment, sortation, delivery, and data center
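
In the result above, 'sources' is empty, but the retrieved chunks are still returned under 'source_documents' because return_source_documents=True was set. A small helper can collect the distinct sources from their metadata. The `Doc` dataclass below is a hypothetical stand-in for LangChain's Document so that the sketch runs on its own, and the sample result is made up apart from the file name used earlier in this notebook.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (page_content plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def unique_sources(source_documents):
    """Return the distinct 'source' metadata values, in first-seen order."""
    seen = []
    for doc in source_documents:
        source = doc.metadata.get("source")
        if source is not None and source not in seen:
            seen.append(source)
    return seen

# Made-up result with the same shape as the chain output shown above.
result = {
    "answer": "...",
    "sources": "",
    "source_documents": [
        Doc("chunk one", {"source": "AMZ-2023-10-k.pdf", "year": 2023}),
        Doc("chunk two", {"source": "AMZ-2023-10-k.pdf", "year": 2023}),
    ],
}
sources = unique_sources(result["source_documents"])
print(sources)  # ['AMZ-2023-10-k.pdf']
```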

# Customizable option with RetrievalQA

In [46]:
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

prompt_template = """

Human: Use the following pieces of context to provide a concise answer to the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}

Assistant:"""

PROMPT = PromptTemplate(template=prompt_template, input_variables=["context", "question"])

query = "What is the strategy for growth and innovation of Amazon?"

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT},
)
result = qa({"query": query})
print_ww(result["result"])

 To invest efficiently in numerous areas of technology and infrastructure so we may continue to
enhance the customer experience and improve our process efficiency through rapid technology
developments, while operating at an ever increasing scale.
