# Retrieval Augmented Generation (RAG) Application with Llama3-8B on Bedrock using LangChain

RAG Application use cases with Llama3-8B on Bedrock

In this notebook, we demonstrate how to combine [Llama3-8B](https://huggingface.co/meta-llama/Llama-2-13b) text generation with the [Amazon Titan Embedding v2](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-titan-embed-text.html) embedding model to construct a Retrieval Augmented Generation (RAG) QnA system on a SageMaker Notebook. The notebook runs on an `ml.t3.medium` instance and uses LLMs deployed on [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html). These models are called through an API, which we use to build, experiment with, and compare Advanced RAG techniques using [LangChain](https://www.langchain.com/). We also show how a [FAISS](https://github.com/facebookresearch/faiss) vector store can be used to store and retrieve embeddings as part of your RAG workflow.

## Prerequisites

---
This Jupyter Notebook can be run on a t3.medium instance (ml.t3.medium). Other items needed to run this notebook are:

1. Configure AWS CLI in the notebook instance with your AWS credentials
2. Add BedrockAccess policy to your AWS user or role
3. Follow the instructions in the pre-requisites document to set up your Jupyter space instance

## Contents
---

1. [Requirements](#Requirements)
1. [Model Deployment](#00.-Model-Deployment)
1. [Setup LangChain](#01.-Setup-LangChain)
1. [Data Preparation](#Data-Preparation)
1. [Question Answering](#Question-Answering)
1. [Regular Retriever Chain](#Regular-Retriever-Chain)
1. [Parent Document Retriever Chain](#Parent-Document-Retriever-Chain)
1. [Contextual Compression Chain](#Contextual-Compression-Chain)
1. [Conclusion](#Conclusion)
1. [Clean Up Resources](#Clean-Up-Resources)

## Requirements
---

1. Create an Amazon SageMaker Notebook Instance - [Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-setup-working-env.html)
    - For Notebook Instance type, choose ml.t3.medium.
2. For Select Kernel, choose [conda_python3](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-prepare.html).
3. Install the required packages.

<div class="alert alert-block alert-info"> 

<b>NOTE:</b>

- For <a href="https://aws.amazon.com/sagemaker/studio/" target="_blank">Amazon SageMaker Studio</a>, select Kernel "<span style="color:green;">Python 3 (ipykernel)</span>".

- For <a href="https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html" target="_blank">Amazon SageMaker Studio Classic</a>, select Image "<span style="color:green;">Base Python 3.0</span>" and Kernel "<span style="color:green;">Python 3</span>".

</div>

To run this notebook, you need to install the following dependencies:

In [1]:
%%writefile requirements.txt
langchain==0.1.14
pypdf==4.1.0
faiss-cpu==1.8.0
boto3==1.34.58
sqlalchemy==2.0.29

Overwriting requirements.txt


In [2]:
!pip install -U -r requirements.txt --quiet

<div class="alert alert-block alert-warning"> 

<b>NOTE:</b>

Before proceeding, please verify that you have the correct version of the SQLAlchemy library installed. This notebook requires SQLAlchemy >= 2.0.0.

To check your installed SQLAlchemy version, you can run the following code:

```python
import sqlalchemy
print(sqlalchemy.__version__)
```

If the version displayed is less than 2.0.0, and you have already installed the correct version using `pip`, you may need to "<span style="color:green;">restart</span>" or "<span style="color:green;">shutdown</span>" the Jupyter Notebook kernel to load the updated library.

To restart the kernel, go to the "Kernel" menu and select "Restart Kernel". If that doesn't work, try shutting down the notebook completely and relaunching it.

Restarting or shutting down the kernel will resolve any dependency issues and ensure that the correct SQLAlchemy version is loaded.

If you haven't installed SQLAlchemy >= 2.0.0 yet, you can do so by running the following command in your terminal or command prompt:

```
pip install "sqlalchemy>=2.0.29"
```

Once the installation is complete, restart or shutdown the Jupyter Notebook kernel as described above.

</div>
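The version check described in the note can also be scripted. The following is a minimal sketch, not part of the original notebook: the helper names `parse_version`, `version_at_least`, and `check_package` are hypothetical, and the parser only handles plain dotted numeric versions. Note that `importlib.metadata` reports the version installed on disk, so a kernel that imported the library before an upgrade can still hold the older module until it is restarted.

```python
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn '2.0.29' into (2, 0, 29); parsing stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def version_at_least(installed: str, minimum: str) -> bool:
    """Return True when `installed` is the same as or newer than `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '2.0' compares equal to '2.0.0'
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_package(name: str, minimum: str) -> str:
    """Describe whether the installed distribution `name` meets `minimum`."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return f"{name} is not installed"
    if version_at_least(installed, minimum):
        return f"{name} {installed}: OK"
    return f"{name} {installed}: too old, upgrade and restart the kernel"


print(check_package("sqlalchemy", "2.0.0"))
```

If the message reports an old version after you have upgraded, restart the kernel as described in the note above.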

In [3]:
import sqlalchemy
print(sqlalchemy.__version__)

2.0.29


In [4]:
import langchain
print(langchain.__version__)

0.1.14


In [5]:
try:
    import sagemaker
except ImportError:
    !pip install sagemaker --quiet

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml


## 01. Setup LangChain
---

In [9]:
import boto3
import pprint
from botocore.client import Config
import json

pp = pprint.PrettyPrinter(indent=2)
session = boto3.session.Session()
region = session.region_name
bedrock_config = Config(connect_timeout=120, read_timeout=120, retries={'max_attempts': 0})
bedrock_client = boto3.client('bedrock-runtime', region_name = region)
bedrock_agent_client = boto3.client("bedrock-agent-runtime",
                              config=bedrock_config, region_name = region)
print(region)

us-west-2


In [10]:
import json
import sagemaker

from langchain_core.prompts import PromptTemplate
from langchain_community.llms import Bedrock
from langchain_community.embeddings import BedrockEmbeddings

Instantiate the LLM with Bedrock and LangChain

In [11]:
model_kwargs_llama = {
    "temperature": 0,
    "top_p": 0.9,
    "max_gen_len": 2048
}

llm = Bedrock(model_id="meta.llama3-8b-instruct-v1:0",
              model_kwargs=model_kwargs_llama,
              client = bedrock_client,)



Instantiate the embedding model with Bedrock and LangChain

In [12]:
embeddings_doc = BedrockEmbeddings(model_id='amazon.titan-embed-text-v2:0',                               
              client = bedrock_client)

## Data Preparation
---

Let's first download some of the files to build our document store.

In this example, you will use several years of Amazon's 10-K annual reports as a text corpus to perform Q&A on.

In [13]:
!mkdir -p ./data

from urllib.request import urlretrieve
urls = [
    'https://d18rn0p25nwr6d.cloudfront.net/CIK-0001018724/c7c14359-36fa-40c3-b3ca-5bf7f3fa0b96.pdf',
    'https://d18rn0p25nwr6d.cloudfront.net/CIK-0001018724/d2fde7ee-05f7-419d-9ce8-186de4c96e25.pdf',
    'https://d18rn0p25nwr6d.cloudfront.net/CIK-0001018724/f965e5c3-fded-45d3-bbdb-f750f156dcc9.pdf',
    'https://d18rn0p25nwr6d.cloudfront.net/CIK-0001018724/336d8745-ea82-40a5-9acc-1a89df23d0f3.pdf'
]

filenames = [
    'AMZN-2024-10-K-Annual-Report.pdf',
    'AMZN-2023-10-K-Annual-Report.pdf',
    'AMZN-2022-10-K-Annual-Report.pdf',
    'AMZN-2021-10-K-Annual-Report.pdf'
]

metadata = [
    dict(year=2024, source=filenames[0]),
    dict(year=2023, source=filenames[1]),
    dict(year=2022, source=filenames[2]),
    dict(year=2021, source=filenames[3])]

data_root = "./data/"

for idx, url in enumerate(urls):
    file_path = data_root + filenames[idx]
    urlretrieve(url, file_path)

If you look into the Amazon 10-Ks, the first 4 pages are all very similar and may skew the responses if they are kept in the embeddings. Keeping them causes repetition and makes embedding generation take longer. In the next section you will take the downloaded data, trim the first 4 pages from all but one of the 10-Ks, and overwrite the files with the processed versions.

In [14]:
from pypdf import PdfReader, PdfWriter
import glob

local_pdfs = glob.glob(data_root + '*.pdf')

# Iterate over each PDF file
for idx, local_pdf in enumerate(local_pdfs):
    pdf_reader = PdfReader(local_pdf)
    pdf_writer = PdfWriter()
    
    if idx == 0:
        # Keep every page of the first document, so one copy of the shared front matter remains
        for pagenum in range(len(pdf_reader.pages)):
            page = pdf_reader.pages[pagenum]
            pdf_writer.add_page(page)
    else:
        # Drop the first 4 pages of every other document
        for pagenum in range(4, len(pdf_reader.pages)):
            page = pdf_reader.pages[pagenum]
            pdf_writer.add_page(page)

    # Write the modified content to a new file
    with open(local_pdf, 'wb') as new_file:
        new_file.seek(0)
        pdf_writer.write(new_file)
        new_file.truncate()

After downloading, we can load the documents with the help of [PyPDFLoader, available under LangChain](https://python.langchain.com/en/latest/reference/modules/document_loaders.html), and split them into smaller chunks.

Note: The retrieved text should be large enough to contain the information needed to answer a question, but small enough to fit into the LLM prompt. The embeddings model also limits the length of its input (Titan Text Embeddings V2 accepts up to 8,192 tokens), so each chunk must stay below that limit. For this use case we create chunks of roughly 1000 characters with an overlap of 100 characters using [RecursiveCharacterTextSplitter](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/recursive_text_splitter.html).
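To make the effect of chunk size and overlap concrete, here is a minimal sketch of fixed-width chunking with overlap. It is a simplified stand-in, not the algorithm used by `RecursiveCharacterTextSplitter` (which prefers to break on separators such as paragraphs and sentences); the function name `chunk_text` and the sample text are made up for illustration.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list:
    """Cut `text` into windows of `chunk_size` characters that overlap by `chunk_overlap`."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


# A synthetic 2500-character string stands in for one loaded page
sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample)
print([len(c) for c in chunks])
# The tail of one chunk is repeated at the head of the next one
print(chunks[0][-100:] == chunks[1][:100])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.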

In [15]:
import numpy as np
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

documents = []

for idx, file in enumerate(filenames):
    loader = PyPDFLoader(data_root + file)
    document = loader.load()
    for document_fragment in document:
        document_fragment.metadata = metadata[idx]

    documents += document

# In our testing, the recursive character splitter worked well with this PDF data set
text_splitter = RecursiveCharacterTextSplitter(
    # Chunks of roughly 1000 characters, with 100 characters shared between neighbours
    chunk_size=1000,
    chunk_overlap=100,
)

docs = text_splitter.split_documents(documents)
print(docs[100])

page_content='Table of Contents\nItem 7. Management’s Discussion and Analysis of Financial Condition and Results of Operations\nForward-Looking Statements\nThis Annual Report on Form 10-K includes forward-looking statements within the meaning of the Private Securities Litigation Reform Act of 1995. All\nstatements other than statements of historical fact, including statements regarding guidance, industry prospects, or future results of operations or financial\nposition, made in this Annual Report on Form 10-K are forward-looking. We use words such as anticipates, believes, expects, future, intends, and similar\nexpressions to identify forward-looking statements. Forward-looking statements reflect management’s current expectations and are inherently uncertain. Actual\nresults and outcomes could differ materially for a variety of reasons, including, among others, fluctuations in foreign exchange rates, changes in global' metadata={'year': 2024, 'source': 'AMZN-2024-10-K-Annual-Report.pdf

Before proceeding, let's look at some statistics about the document preprocessing we just performed:

In [16]:
avg_doc_length = lambda documents: sum([len(doc.page_content) for doc in documents])//len(documents)

print(f'Average length among {len(documents)} documents loaded is {avg_doc_length(documents)} characters.')
print(f'After the split we have {len(docs)} documents as opposed to the original {len(documents)}.')
print(f'Average length among {len(docs)} documents (after split) is {avg_doc_length(docs)} characters.')

Average length among 437 documents loaded is 3348 characters.
After the split we have 1801 documents as opposed to the original 437.
Average length among 1801 documents (after split) is 823 characters.


We had 4 PDF documents (437 pages in total), which have been split into 1,801 smaller chunks.

Now we can see what a sample embedding looks like for one of those chunks.

In [17]:
sample_embedding = np.array(embeddings_doc.embed_query(docs[0].page_content))
print("Sample embedding of a document chunk: ", sample_embedding)
print("Size of the embedding: ", sample_embedding.shape)

Sample embedding of a document chunk:  [-0.09360763  0.00087276  0.02369082 ... -0.01300106 -0.01781627
  0.02850603]
Size of the embedding:  (1024,)


Next, we store the chunk embeddings in a vector store. This can be done easily with the [FAISS](https://github.com/facebookresearch/faiss) implementation inside [LangChain](https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/faiss.html), which takes the embeddings model and the documents as input and creates the entire vector store. Using the index wrapper, we can abstract away most of the heavy lifting, such as creating the prompt, getting the embedding of the query, fetching the relevant documents, and calling the LLM. [VectorStoreIndexWrapper](https://python.langchain.com/en/latest/modules/indexes/getting_started.html#one-line-index-creation) helps us with that.
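Conceptually, a vector store answers the question "which stored vectors are closest to this query vector?". The sketch below shows an exhaustive nearest-neighbour search over a tiny hand-written index using Euclidean distance. It is an illustration only: the three-dimensional vectors and chunk labels are invented, the helper names are hypothetical, and FAISS itself uses optimized index structures rather than this Python loop.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_chunks(query_vector, index, k=2):
    """Return the text of the k entries whose vectors are closest to the query.

    `index` is a list of (chunk_text, vector) pairs.
    """
    ranked = sorted(index, key=lambda item: l2_distance(query_vector, item[1]))
    return [text for text, _ in ranked[:k]]


# Toy index: real Titan v2 embeddings have 1024 dimensions, these have 3
toy_index = [
    ("AWS sales grew", [0.9, 0.1, 0.0]),
    ("Fulfillment network", [0.1, 0.9, 0.0]),
    ("Legal proceedings", [0.0, 0.1, 0.9]),
]

print(nearest_chunks([0.8, 0.2, 0.0], toy_index, k=2))
```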

In [18]:
%%time
from langchain_community.vectorstores import FAISS
from langchain.indexes.vectorstore import VectorStoreIndexWrapper

vectorstore_faiss = FAISS.from_documents(
    docs,
    embeddings_doc,
)
wrapper_store_faiss = VectorStoreIndexWrapper(vectorstore=vectorstore_faiss)

CPU times: user 4.54 s, sys: 372 ms, total: 4.91 s
Wall time: 2min 22s


## Question Answering with LangChain Vector Store Wrapper
---

We use the wrapper provided by LangChain, which wraps around the vector store and takes the LLM as input. This wrapper performs the following steps behind the scenes:

- Takes the question as input
- Creates the question embedding
- Fetches the relevant documents
- Stuffs the documents and the question into a prompt
- Invokes the model with the prompt and generates the answer in a human-readable manner
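The steps above can be sketched as one small function. This is a hedged illustration of the flow, not LangChain's implementation: `answer_question` and the three `fake_*` callables are hypothetical stand-ins for the embeddings model, the vector store search, and the LLM call, and the prompt wording is made up.

```python
def answer_question(question, embed, search, generate, k=4):
    """Embed the question, fetch context, stuff it into a prompt, and call the model."""
    query_vector = embed(question)             # create the question embedding
    context_chunks = search(query_vector, k)   # fetch the k most relevant chunks
    context = "\n\n".join(context_chunks)      # "stuff" all chunks into one block
    prompt = (
        "Use the following pieces of context to answer the question.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)                    # invoke the model with the prompt


# Toy stand-ins so the sketch runs without Bedrock access
def fake_embed(text):
    return [float(len(text))]


def fake_search(vector, k):
    return ["AWS sales increased 37% in 2021."][:k]


def fake_generate(prompt):
    return prompt  # echo the prompt so the stuffed context is visible


print(answer_question("How did AWS perform in 2021?", fake_embed, fake_search, fake_generate))
```

Passing the three steps in as callables keeps the flow visible and makes it easy to swap a real retriever or model in for the toy versions.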

*Note: In this example we are using `Llama 3 8B Instruct` as the LLM on Amazon Bedrock. This model performs best when the inputs are provided in the form `<|begin_of_text|><|start_header_id|>system<|end_header_id|>`, `{{system_message}}`, `<|eot_id|><|start_header_id|>user<|end_header_id|>`, `{{user_message}}`, and the model is asked to generate its output after `<|eot_id|><|start_header_id|>assistant<|end_header_id|>`. The cell below shows an example of how to control the prompt so that the LLM stays grounded and doesn't answer outside the context.*

In [19]:
prompt_template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful assistant.
<|eot_id|><|start_header_id|>user<|end_header_id|>
{query}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["query"]
)

In [20]:
query = "How did AWS perform in 2021?"

In [21]:
answer = wrapper_store_faiss.query(question=PROMPT.format(query=query), llm=llm)
print(answer)

 According to the provided context, AWS operating income increased in absolute dollars in 2021, compared to the prior year, primarily due to increased customer usage and cost structure productivity, partially offset by increased spending on technology infrastructure and payroll and related expenses, all of which were primarily driven by additional investments to support the business growth, and reduced prices for our customers. AWS sales increased 37% in 2021, compared to the prior year. The sales growth primarily reflects increased customer usage, partially offset by pricing changes. Pricing changes were driven largely by our continued efforts to reduce prices for our customers.


We can ask another question.

In [22]:
query_2 = "How much square footage did Amazon have in North America in 2023?"

In [23]:
answer = wrapper_store_faiss.query(question=PROMPT.format(query=query_2), llm=llm)
print(answer)

 According to the provided context, as of December 31, 2023, Amazon operated the following facilities in North America:

* Leased Square Footage: 424,145
* Owned Square Footage: 15,438

Total Square Footage in North America: 424,145 + 15,438 = 439,583


## Regular Retriever Chain
---
In the above scenario you explored the quick and easy way to get a context-aware answer to your question. Now let's look at a more customizable option, [RetrievalQA](https://docs.smith.langchain.com/cookbook/hub-examples/retrieval-qa-chain), where the `chain_type` parameter controls how the fetched documents are added to the prompt. The cell below uses `chain_type="stuff"`, which places all retrieved documents into a single prompt; LangChain also provides `map_reduce`, `refine`, and `map_rerank` for cases where the documents do not fit into one prompt. To control how many relevant documents are retrieved, change the `k` parameter in the cell below and compare the outputs. In many scenarios you will also want to know which source documents the LLM used to generate the answer; setting `return_source_documents` returns the documents that were added to the context of the LLM prompt. `RetrievalQA` also accepts a custom [prompt template](https://python.langchain.com/docs/modules/model_io/prompts/quick_start/) that can be specific to the model.

In [24]:
%%time
from langchain.chains import RetrievalQA

prompt_template = """
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

This is a conversation between an AI assistant and a Human.

<|eot_id|><|start_header_id|>user<|end_header_id|>

Use the following pieces of context to provide a concise answer to the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
#### Context ####
{context}
#### End of Context ####

Question: {question}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore_faiss.as_retriever(
        search_type="similarity", search_kwargs={"k": 3}
    ),
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT}
)

CPU times: user 461 µs, sys: 9 µs, total: 470 µs
Wall time: 515 µs


Let's start asking questions:

In [25]:
query = "How did AWS perform in 2023?"
result = qa({"query": query})
print(result['result'])

print(f"\n{result['source_documents']}")



According to the provided context, AWS operating income was $24.631 million in 2023, which is an increase from $22.841 million in 2022.

[Document(page_content='Table of Contents\nAWS sales increased 13% in 2023, compared to the prior year. The sales growth primarily reflects increased customer usage, partially offset by pricing\nchanges, primarily driven by long-term customer contracts.\nOperating Income (Loss)\nOperating income (loss) by segment is as follows (in millions):\nYear Ended December 31,\n2022 2023\nOperating Income (Loss)\nNorth America $ (2,847)$ 14,877 \nInternational (7,746) (2,656)\nAWS 22,841 24,631 \nConsolidated $ 12,248 $ 36,852 \nOperating income was $12.2 billion and $36.9 billion for 2022 and 2023. We believe that operating income is a more meaningful measure than gross\nprofit and gross margin due to the diversity of our product categories and services.\nThe North America operating income in 2023, as compared to the operating loss in the prior year, is primari
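Printing `result['source_documents']` directly produces long `Document` representations that are hard to scan. The following sketch shows one way to summarize them, assuming each source exposes `page_content` and `metadata` attributes as LangChain documents do. `FakeDocument` and `summarize_sources` are hypothetical names, and the sample document is invented so the snippet runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in with the same two attributes this sketch reads from a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_sources(source_documents, preview_chars=80):
    """Return one short line per source: position, file, year, and a text preview."""
    lines = []
    for position, doc in enumerate(source_documents, start=1):
        # Collapse newlines and repeated spaces before truncating the preview
        preview = " ".join(doc.page_content.split())[:preview_chars]
        source = doc.metadata.get("source", "unknown")
        year = doc.metadata.get("year", "n/a")
        lines.append(f"{position}. {source} ({year}): {preview}")
    return lines


# Invented example; in the notebook you would pass result['source_documents']
demo_sources = [
    FakeDocument(
        "Table of Contents\nCONSOLIDATED STATEMENTS OF OPERATIONS",
        {"year": 2022, "source": "AMZN-2022-10-K-Annual-Report.pdf"},
    )
]
for line in summarize_sources(demo_sources):
    print(line)
```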

In [26]:
query = "What are some of the risk factors associated to Amazon?"
result = qa({"query": query})
print(result['result'])

print(f"\n{result['source_documents']}")

Based on the provided context, some of the risk factors associated with Amazon include:

1. Competition
2. Management of growth
3. Potential fluctuations in operating results
4. International growth and expansion
5. Outcomes of claims, litigation, government investigations, and other proceedings
6. Fulfillment, sortation, delivery, and data center optimization
7. Risks of inventory management
8. Variability in demand
9. Degree to which they enter into, maintain, and develop commercial agreements
10. Payments risks
11. Risks of fulfillment throughput and productivity
12. Global economic climate and additional or unforeseen economic conditions
13. Customer demand and spending
14. Inflation
15. Interest rates
16. Regional labor market and global supply chain constraints
17. World events
18. Rate of growth of the Internet, online commerce, and cloud services
19. Amount and timing of investments in new business opportunities
20. Mix of products and services sold to customers
21. Mix of net 

In [27]:
query = "Was Amazon involved in any lawsuits in 2022? What were they?"
result = qa({"query": query})
print(result['result'])

print(f"\n{result['source_documents']}")

According to the provided context, Amazon was involved in the following lawsuits in 2022:

1. Frame-Wilson: Amazon's motions to dismiss were granted in part and denied in part in March 2022.
2. De Coster v. Amazon.com, Inc. (W.D. Wash.): Amazon's motions to dismiss were granted in part and denied in part in March 2022.
3. DC Attorney General's lawsuit: The DC Superior Court dismissed the lawsuit in its entirety in March 2022, and the dismissal is under appeal as of January 2023.

These lawsuits were related to allegations of price fixing, monopolization, and consumer protection claims.

[Document(page_content='injunctive and structural relief, an unspecified amount of damages, and costs. Amazon’s motions to dismiss were granted in part and denied in part in Frame-\nWilson in March 2022 and March 2023, De Coster v. Amazon.com, Inc. (W.D. Wash.) in January 2023, and the California Attorney General’s lawsuit in March\n2023. All three courts dismissed claims alleging that Amazon’s pricing 

In [28]:
query = "What was Amazon's revenue in 2021?"

result = qa({"query": query})

print(result['result'])

print(f"\n{result['source_documents']}")

According to the provided financial statements, Amazon's total net sales in 2021 were $469,822 million.

[Document(page_content='Table of Contents\nAMAZON.COM, INC.\nCONSOLIDATED STATEMENTS OF OPERATIONS\n(in millions, except per share data)\n  Year Ended December 31,\n 2020 2021 2022\nNet product sales $ 215,915 $ 241,787 $ 242,901 \nNet service sales 170,149 228,035 271,082 \nTotal net sales 386,064 469,822 513,983 \nOperating expenses:\nCost of sales 233,307 272,344 288,831 \nFulfillment 58,517 75,111 84,299 \nTechnology and content 42,740 56,052 73,213 \nSales and marketing 22,008 32,551 42,238 \nGeneral and administrative 6,668 8,823 11,891 \nOther operating expense (income), net (75) 62 1,263 \nTotal operating expenses 363,165 444,943 501,735 \nOperating income 22,899 24,879 12,248 \nInterest income 555 448 989 \nInterest expense (1,647) (1,809) (2,367)\nOther income (expense), net 2,371 14,633 (16,806)\nTotal non-operating income (expense) 1,279 13,272 (18,184)\nIncome (loss) be

## Parent Document Retriever Chain
---

In this scenario, let's look at a more advanced RAG option, the [ParentDocumentRetriever](https://python.langchain.com/docs/modules/data_connection/retrievers/parent_document_retriever). When working with document retrieval, you may encounter a trade-off between storing small chunks of a document for accurate embeddings and storing larger documents to preserve more context. The `ParentDocumentRetriever` strikes that balance by indexing small chunks while returning the larger chunks they came from.

First, a `parent_splitter` is used to divide the original documents into larger chunks called `parent documents`. These parent documents preserve a reasonable amount of context for the LLM to draw on.

Next, a `child_splitter` is applied to each parent document to create smaller `child documents`. These child documents allow the embeddings to reflect their meaning more accurately.

The child documents are then indexed in a vectorstore using embeddings. This enables efficient retrieval of relevant child documents based on similarity.

To retrieve relevant information, the `ParentDocumentRetriever` first fetches the child documents from the vectorstore. It then looks up the parent IDs for those child documents and returns the corresponding larger parent documents.

The `ParentDocumentRetriever` uses an [InMemoryStore](https://api.python.langchain.com/en/v0.1.4/storage/langchain.storage.in_memory.InMemoryBaseStore.html) to store and manage the parent documents. By working with both parent and child documents, this approach aims to balance accurate embeddings with contextual information, providing more meaningful and relevant retrieval results.
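The parent and child bookkeeping described above can be sketched in plain Python. This is an illustration under simplifying assumptions, not LangChain's implementation: text is cut at fixed character offsets, keyword overlap stands in for embedding similarity, and all names and toy documents are hypothetical.

```python
def split(text, size):
    """Cut text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def build_parent_child_index(documents, parent_size=2000, child_size=400):
    """Return (parents, children): parents maps id -> text, children holds (text, parent_id)."""
    parents = {}
    children = []
    for document in documents:
        for parent_text in split(document, parent_size):
            parent_id = len(parents)
            parents[parent_id] = parent_text  # plays the role of the docstore
            for child_text in split(parent_text, child_size):
                children.append((child_text, parent_id))  # plays the role of the vector store
    return parents, children


def retrieve_parents(query, parents, children, k=4):
    """Rank children against the query, then return their distinct parents in rank order."""
    query_words = set(query.lower().split())

    def score(child):
        # Keyword overlap is a stand-in for embedding similarity
        return len(query_words & set(child[0].lower().split()))

    top_children = sorted(children, key=score, reverse=True)[:k]
    parent_ids = []
    for _, parent_id in top_children:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parents[parent_id] for parent_id in parent_ids]


toy_documents = ["revenue " * 30 + "net sales grew " * 30, "lawsuit " * 60]
parents, children = build_parent_child_index(toy_documents, parent_size=200, child_size=50)
hits = retrieve_parents("lawsuit", parents, children, k=3)
print(len(hits), len(hits[0]))
```

The matching is done on the short child chunks, but what comes back is the longer parent chunk, which is the behavior the retriever relies on to give the LLM more context.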

In [29]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore

Sometimes, the full documents are too big to retrieve as is. In that case, we first split the raw documents into larger chunks and then split those into smaller chunks. We index the smaller chunks, but on retrieval we return the larger chunks (still not the full documents).

In [30]:
%%time
# This text splitter is used to create the parent documents
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)

# This text splitter is used to create the child documents
# It should create documents smaller than the parent
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)

# The vectorstore to use to index the child chunks
vectorstore_faiss = FAISS.from_documents(
    child_splitter.split_documents(documents),
    embeddings_doc,
)

# The storage layer for the parent documents
store = InMemoryStore()

CPU times: user 19.3 s, sys: 1.32 s, total: 20.6 s
Wall time: 10min 2s


In [31]:
%%time
# The storage layer for the parent documents
store = InMemoryStore()
retriever = ParentDocumentRetriever(
    vectorstore=vectorstore_faiss,
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)

CPU times: user 111 µs, sys: 0 ns, total: 111 µs
Wall time: 115 µs


In [32]:
retriever.add_documents(documents, ids=None)

Let’s now call the vector store search functionality - we should see that it returns small chunks (since we’re storing the small chunks).

In [33]:
sub_docs = vectorstore_faiss.similarity_search("What was Amazon's revenue in 2021?")

In [34]:
len(sub_docs[0].page_content)

367

In [35]:
print(sub_docs[0].page_content)

Table of Contents
AMAZON.COM, INC.
CONSOLIDATED STATEMENTS OF CASH FLOWS
(in millions)
  Year Ended December 31,
 2020 2021 2022
CASH, CASH EQUIV ALENTS, AND RESTRICTED CASH, BEGINNING OF PERIOD $ 36,410 $ 42,377 $ 36,477 
OPERA TING ACTIVITIES:
Net income (loss) 21,331 33,364 (2,722)
Adjustments to reconcile net income (loss) to net cash from operating activities:


Let’s now retrieve from the overall retriever. This should return large documents - since it returns the documents where the smaller chunks are located.

In [36]:
retrieved_docs = retriever.get_relevant_documents("What was Amazon's revenue in 2021?")



In [37]:
len(retrieved_docs[0].page_content)

1918

In [38]:
print(retrieved_docs[0].page_content)

Table of Contents
AMAZON.COM, INC.
CONSOLIDATED STATEMENTS OF CASH FLOWS
(in millions)
  Year Ended December 31,
 2020 2021 2022
CASH, CASH EQUIV ALENTS, AND RESTRICTED CASH, BEGINNING OF PERIOD $ 36,410 $ 42,377 $ 36,477 
OPERA TING ACTIVITIES:
Net income (loss) 21,331 33,364 (2,722)
Adjustments to reconcile net income (loss) to net cash from operating activities:
Depreciation and amortization of property and equipment and capitalized content costs, operating lease
assets, and other 25,180 34,433 41,921 
Stock-based compensation 9,208 12,757 19,621 
Other expense (income), net (2,582) (14,306) 16,966 
Deferred income taxes (554) (310) (8,148)
Changes in operating assets and liabilities:
Inventories (2,849) (9,487) (2,592)
Accounts receivable, net and other (8,169) (18,163) (21,897)
Accounts payable 17,480 3,602 2,945 
Accrued expenses and other 5,754 2,123 (1,558)
Unearned revenue 1,265 2,314 2,216 
Net cash provided by (used in) operating activities 66,064 46,327 46,752 
INVESTING AC

Now, let's initialize the chain using the `ParentDocumentRetriever`. We will pass the prompt in via the chain_type_kwargs argument.

In [39]:
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT}
)

Let's start asking questions:

In [40]:
query = "How did AWS perform in 2023?"
result = qa({"query": query})
print(result['result'])

print(f"\n{result['source_documents']}")

According to the provided context, AWS sales increased 13% in 2023, compared to the prior year, primarily reflecting increased customer usage, partially offset by pricing changes, primarily driven by long-term customer contracts. Additionally, AWS operating income increased by $1.79 billion in 2023, compared to the prior year, primarily due to increased sales, partially offset by increased payroll and related expenses and spending on technology infrastructure.

[Document(page_content='Table of Contents\nAWS sales increased 13% in 2023, compared to the prior year. The sales growth primarily reflects increased customer usage, partially offset by pricing\nchanges, primarily driven by long-term customer contracts.\nOperating Income (Loss)\nOperating income (loss) by segment is as follows (in millions):\nYear Ended December 31,\n2022 2023\nOperating Income (Loss)\nNorth America $ (2,847)$ 14,877 \nInternational (7,746) (2,656)\nAWS 22,841 24,631 \nConsolidated $ 12,248 $ 36,852 \nOperating 

In [41]:
query = "What are some of the risk factors associated to Amazon?"
result = qa({"query": query})
print(result['result'])

print(f"\n{result['source_documents']}")

Based on the provided context, some of the risk factors associated with Amazon include:

1. Inventory risk factors:
	* Overstocking or understocking products
	* Difficulty in establishing vendor relationships and forecasting demand for new products
	* Significant lead-time and prepayment required for certain inventory or components
	* Inventory levels of certain products, such as consumer electronics, may not be sold in sufficient quantities or meet demand during relevant selling seasons
2. Payments-related risks:
	* Compliance with regulations and requirements for various payment options, including enhanced authentication obligations

Please note that this is not an exhaustive list, and Amazon may face other risk factors not mentioned in the provided context.

[Document(page_content='eavor to accurately predict these trends and avoid overstocking or understocking products we manufacture and/or sell. Demand forproducts, however, can change signi\nficantly between the time inventory or 

In [42]:
query = "Was Amazon involved in any lawsuits in 2022? What were they?"
result = qa({"query": query})
print(result['result'])

print(f"\n{result['source_documents']}")

According to the provided context, Amazon was involved in the following lawsuits in 2022:

1. In March 2022, a case was stayed pending resolution of review petitions filed with the United States Patent and Trademark Office in Kove IO, Inc. v. Amazon Web Services, Inc. The case was stayed in March 2022 and the stay was lifted in November 2022.
2. In September 2022, the California Attorney General brought a suit against Amazon in the California Superior Court for the County of San Francisco, alleging violations of federal and state antitrust and consumer protection laws.

Note that there may be other lawsuits mentioned in the context that are not specifically dated to 2022, but these are the two cases that are explicitly mentioned as occurring in that year.

[Document(page_content='Photos, Alexa, AWS cloud services, Ring, Amazon Connect, Amazon’s Flex driver app, and Amazon’s virtual try-on technology. The complaints seek\ncertification as class actions, unspecified amounts of damages, i

In [43]:
query = "What was the net sales change from 2022 to 2023?"
result = qa({"query": query})
print(result['result'])

print(f"\n{result['source_documents']}")

According to the provided context, the net sales change from 2022 to 2023 was a 12% increase, from $574,785 million to $513,983 million.

[Document(page_content='Table of Contents\nResults of Operations\nWe have organized our operations into three segments: North America, International, and AWS. These segments reflect the way the Company evaluates\nits business performance and manages its operations. See Item 8 of Part II, “Financial Statements and Supplementary Data — Note 10 — Segment\nInformation.”\nOverview\nMacroeconomic factors, including inflation, increased interest rates, significant capital market and supply chain volatility, and global economic and\ngeopolitical developments, have direct and indirect impacts on our results of operations that are difficult to isolate and quantify. In addition, changes in fuel,\nutility, and food costs, interest rates, and economic outlook may impact customer demand and our ability to forecast consumer spending patterns. We also\nexpect the cu

## Conclusion
---

Congratulations on completing the advanced retrieval augmented generation notebook with `Llama3 8b`! These are important techniques that combine the power of large language models with the precision of retrieval methods. Comparing the different techniques, we see that in contexts such as detailing AWS's transition from a simple service to a complex, multi-billion-dollar entity, or explaining Amazon's strategic successes, the Regular Retriever Chain lacks the precision that the more sophisticated techniques offer, leading to less targeted information. While few differences are visible between the advanced techniques discussed, they are far more informative than the Regular Retriever Chain. For customers in industries such as HCLS, Telecommunications, and FSI who are looking to implement RAG in their applications, the limitations of the Regular Retriever Chain in providing precision, avoiding redundancy, and effectively compressing information make it less suited to these needs than the more advanced Parent Document Retriever and Contextual Compression techniques, which distill vast amounts of information into the concentrated, impactful insights that customers need while helping improve price performance.

In the above implementation of Advanced RAG based Question Answering, we explored the following concepts and how to implement them using Amazon Bedrock and its LangChain integration.

- Setting up `Llama3-8b` and `Titan Embedding Text v2` with Bedrock and LangChain
- Loading documents of different kinds and generating embeddings to create a vector store
- Retrieving documents relevant to the question using the following approaches from LangChain
    - Regular Retrieval Chain
    - Parent Document Retriever Chain
- Preparing a prompt which goes as input to the LLM
- Presenting an answer in a human-friendly manner


### Take-aways
---
- Experiment with different retrieval techniques
- Leverage `Llama3-8b` and `Amazon Titan Embedding Text v2` models available under Amazon Bedrock
- Explore options such as persistent storage of embeddings and document chunks
- Integration with enterprise data stores

# Thank You!