# Improving accuracy for RAG based applications using Langchain, OpenSearch Serverless, Amazon Bedrock and a Re-ranking model 

In this notebook, we are going to demonstrate how to build a RAG solution that utilises Langchain, OpenSearch Serverless, Amazon Bedrock and a Re-ranking model. Additionally, we'll also evaluate the performance of each approach (i.e. standard RAG approach vs RAG + reranking model) and perform analysis and share the results.


## Overview

When it comes to building a chatbot using GenAI LLMs, RAG is a popular architectural choice. It combines the strengths of knowledge base retrieval and generative models for text generation. Using RAG approach for building a chatbot has many advantages. For example, retrieving responses from its database before generating a response could provide more relevant and coherent responses. This helps improve the conversational flow. RAG also scales better with more data compared to pure generative models, and it doesn’t require fine-tuning of the model when new data is added to the knowledge base. Additionally, the retrieval component enables the model to incorporate external knowledge by retrieving relevant background information from its database. This approach helps provide factual, in-depth and knowledgeable responses.

## RAG Challenges
Despite clear advantages of using RAG for building Chatbots, there are some challenges when it comes to applying it for practical use. 
In order to find an answer, RAG takes an approach that uses vector search across the documents. The advantage of using vector search is the speed and scalability. Rather than scanning every single document to find the answer, using RAG approach, we would turn the texts (knowledge base) into embeddings and store these embeddings in the database. The embeddings are compressed version of the documents, represented by array of numerical values. After the embeddings are stored,  vector search queries the vector database to find the similarity based on the vectors associated with the documents. Typically, vector search will return the top k most relevant documents based on the user question, and return the k results. However, since the similarity algorithm in vector database works on vectors and not documents, vector search does not always return the most relevant information in the top k results. This directly impacts the accuracy of the response if the most relevant contexts are not available to the LLM. 

A proposed solution to address the challenge of RAG approach is called Reranking. Reranking is a technique that can further improve the responses by selecting the best option out of several candidate responses. Here is how reranking could work, described in the sequential order:

1. The chatbot generates its top k response candidates using RAG.
2. These candidates are fed into a reranking model. This model scores each response based on how relevant, natural and informative they are.
3. The response with the highest reranking score is selected as the context to feed the LLM in generating a response .

In summary, reranking allows the chatbot to filter out poor responses and pick the best one to send back. This further improves the quality and consistency of the conversations.

Reference links for research papers and Bedrock documentation:
- https://arxiv.org/pdf/2404.07221
- https://arxiv.org/pdf/2409.07691
- https://docs.aws.amazon.com/bedrock/latest/userguide/rerank.html

## Improve the relevance of query responses with a reranker model in Amazon Bedrock

Amazon Bedrock provides access to reranker models that you can use when querying to improve the relevance of the retrieved results. A reranker model calculates the relevance of chunks to a query and reorders the results based on the scores that it calculates. By using a reranker model, you can return responses that are better suited to answering the query. Or, you can include the results in a prompt when running model inference to generate more pertinent and accurate responses. With a reranker model, you can retrieve fewer, but more relevant, results. By feeding these results to the foundation model that you use to generate a response, you can also decrease cost and latency.

Reranker models are trained to identify relevance signals based on a query and then use those signals to rank documents. Because of this, the models can provide more relevant, more accurate results.

## Prerequisites
A user requires the following permissions to use reranking:

Access to the reranking models that they plan to use. For more information, see https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html.

Permissions for their role: bedrock:Rerank and bedrock:InvokeModel

#### Install Required Dependencies
This cell installs all the necessary Python packages listed in the `requirements.txt` file. The `--quiet` flag reduces the output verbosity.

Note: Make sure the `requirements.txt` file is present in your working directory before running this cell.






In [29]:
%pip install -r requirements.txt --quiet

Note: you may need to restart the kernel to use updated packages.


#### Configuration Constants
This cell defines essential configuration variables for Amazon OpenSearch Serverless (AOSS) setup.

These constants will be used throughout the notebook for consistent resource naming and configuration.





In [1]:
#const
region_name = 'us-west-2'
encryption_policy_name = f"rag-ep-oss"
network_policy_name = f"rag-np-oss"
access_policy_name = f"rag-ap-oss"
vector_store_name = f'vector-store-oss'
index_name = f"index-name-oss"
service = "aoss"

## Dataset
To demonstrate RAG semantic search, we'll need to first ingest documents into a vector database. For this example, we'll ingest 4 Amazon Shareholder Letters
#### PDF Document Loading and Processing
We will Import required libraries, set up document processing, process PDF files
Note: Ensure PDF files are placed in the "./data/" directory before running this cell.

In [2]:
from langchain.document_loaders import PyPDFLoader
import os

data_root = "./data/"
folder_path = data_root
documents = []


# Loop through all files in the folder
for filename in os.listdir(folder_path):
    if filename.lower().endswith('.pdf'):
     file_path = os.path.join(folder_path, filename)
     loader = PyPDFLoader(file_path)
     # Load the PDF data
     data = loader.load()
     # Add the loaded data to the documents list
     documents.extend(data)

# Print the text of the first page of the first document
if documents:
    print(documents[0].page_content)
else:
    print("No PDF files found in the folder.")

To our shareowners:
In Amazon’s 1997 letter to shareholders, our first, I talked about our hope to create an “enduring franchise,”
one that would reinvent what it means to serve customers by unlocking the internet’s power. I noted that
Amazon had grown from having 158 employees to 614, and that we had surpassed 1.5 million customer
accounts. We had just gone public at a split-adjusted stock price of $1.50 per share. I wrote that it was Day 1.
We’ve come a long way since then, and we are working harder than ever to serve and delight customers.
Last year, we hired 500,000 employees and now directly employ 1.3 million people around the world. We have
more than 200 million Prime members worldwide. More than 1.9 million small and medium-sized businesses
sell in our store, and they make up close to 60% of our retail sales. Customers have connected more than
100 million smart home devices to Alexa. Amazon Web Services serves millions of customers and ended 2020
with a $50 billion annualized r

This cell sets up the necessary AWS SDK (boto3) components and utilities:
1. Creates a boto3 session with the specified region
2. Initializes an Amazon OpenSearch Serverless client

In [3]:
import boto3


boto3_session = boto3.session.Session(region_name=region_name)
aoss_client = boto3_session.client('opensearchserverless')


This cell creates the required security policies for an OpenSearch Serverless collection:

1. Gets the AWS caller identity ARN using STS client

2. Defines a function `create_policies_in_oss()` that creates 3 types of policies:

   - **Encryption Policy**: Configures encryption settings using AWS owned KMS keys
   - **Network Policy**: Controls network access (allows public access in this case) 
   - **Access Policy**: Sets up data access permissions for:
     - Collection operations (create, delete, update, describe)
     - Index operations (create, delete, update, describe)
     - Document operations (read, write)

The policies are scoped to the specified vector store collection and the caller's IAM identity.

Parameters:
- vector_store_name: Name of the OpenSearch collection
- aoss_client: OpenSearch Serverless client instance

Returns tuple of created policies (encryption_policy, network_policy, access_policy)

In [6]:
import json

sts_client = boto3.client("sts")
identity = sts_client.get_caller_identity()["Arn"]



def create_policies_in_oss(
    vector_store_name, aoss_client
):
    encryption_policy = aoss_client.create_security_policy(
        name=encryption_policy_name,
        policy=json.dumps(
            {
                "Rules": [
                    {
                        "Resource": ["collection/" + vector_store_name],
                        "ResourceType": "collection",
                    }
                ],
                "AWSOwnedKey": True,
            }
        ),
        type="encryption",
    )

    network_policy = aoss_client.create_security_policy(
        name=network_policy_name,
        policy=json.dumps(
            [
                {
                    "Rules": [
                        {
                            "Resource": ["collection/" + vector_store_name],
                            "ResourceType": "collection",
                        }
                    ],
                    "AllowFromPublic": True,
                }
            ]
        ),
        type="network",
    )
    access_policy = aoss_client.create_access_policy(
        name=access_policy_name,
        policy=json.dumps(
            [
                {
                    "Rules": [
                        {
                            "Resource": ["collection/" + vector_store_name],
                            "Permission": [
                                "aoss:CreateCollectionItems",
                                "aoss:DeleteCollectionItems",
                                "aoss:UpdateCollectionItems",
                                "aoss:DescribeCollectionItems",
                            ],
                            "ResourceType": "collection",
                        },
                        {
                            "Resource": ["index/" + vector_store_name + "/*"],
                            "Permission": [
                                "aoss:CreateIndex",
                                "aoss:DeleteIndex",
                                "aoss:UpdateIndex",
                                "aoss:DescribeIndex",
                                "aoss:ReadDocument",
                                "aoss:WriteDocument",
                            ],
                            "ResourceType": "index",
                        },
                    ],
                    "Principal": [identity],
                    "Description": "Easy data policy",
                }
            ]
        ),
        type="data",
    )
    return encryption_policy, network_policy, access_policy

This cell performs two key operations:

1. **Create Security Policies**:
   - Creates three essential security policies for the OpenSearch vector store:
     - Encryption policy
     - Network policy
     - Access policy
   - The `create_policies_in_oss()` function handles the creation of these policies
   - Policies are associated with the specified vector store name

2. **Create Vector Search Collection**:
   - Initializes a new collection in OpenSearch using the AWS OpenSearch client
   - Collection is configured for vector search operations
   - Named according to the `vector_store_name` parameter
   - Type is set to "VECTORSEARCH" for vector similarity search capabilities

These operations are fundamental setup steps for implementing vector search functionality in OpenSearch.

In [8]:
encryption_policy, network_policy, access_policy = create_policies_in_oss(
    vector_store_name=vector_store_name,
    aoss_client=aoss_client,
)
collection = aoss_client.create_collection(name=vector_store_name, type="VECTORSEARCH")


This cell sets up the necessary connection parameters for Amazon OpenSearch:

1. Extracts the collection ID from the previously created collection
2. Constructs the OpenSearch host URL using the collection ID and AWS region
3. Creates the full HTTPS endpoint URL with port 443

The host URL follows the format: `<collection_id>.<region>.aoss.amazonaws.com`

The final OpenSearch URL will be: `https://<host>:443`

This URL will be used to connect to the OpenSearch service in subsequent steps.

In [9]:
collection_id = collection["createCollectionDetail"]["id"]
host = collection_id + "." + region_name + ".aoss.amazonaws.com"
print(host)
opensearch_url="https://" + host +":443",
print(opensearch_url)

680rk9clnq24bkyx4sx9.us-west-2.aoss.amazonaws.com
('https://680rk9clnq24bkyx4sx9.us-west-2.aoss.amazonaws.com:443',)


This cell performs two essential tasks for document processing:

1. **Text Splitting**
   - Uses `RecursiveCharacterTextSplitter` to break documents into manageable chunks
   - Sets `chunk_size=1000` (characters per chunk)
   - Uses `chunk_overlap=200` to maintain context between chunks
   - Splits the input documents into smaller segments

2. **Embedding Model Configuration**
   - Initializes Amazon Bedrock client for the specified region
   - Sets up the Bedrock embeddings model using `BedrockEmbeddings`
   - Uses Amazon's Titan text embedding model (version 2.0)

This preprocessing step is crucial for:
- Making large documents more manageable
- Preparing text for vector embeddings
- Enabling efficient similarity searches

In [10]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_aws.embeddings.bedrock import BedrockEmbeddings

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(documents)


bedrock_client = boto3.client("bedrock-runtime", region_name=region_name)

embeddings_model = BedrockEmbeddings(
    client=bedrock_client, model_id="amazon.titan-embed-text-v2:0"
)

This cell configures and initializes an OpenSearch vector store using AWS authentication:

1. Imports required dependencies for AWS authentication and OpenSearch vector storage
2. Creates AWS authentication credentials using AWS4Auth
3. Initializes OpenSearchVectorSearch with:
   - Document embeddings
   - OpenSearch endpoint URL
   - AWS authentication
   - SSL/TLS security settings
   - Connection and timeout configurations
   - Index name and vector engine (FAISS)

The vector store will be used to index and search document embeddings for semantic similarity.

!NB it can take a few minutes for collection to be created and ready for indexing. Security policy updates in OpenSearch Serverless may need time to take effect. If you get AuthenticationException, please retry in a couple of minutes.

In [12]:
from opensearchpy import RequestsHttpConnection
from requests_aws4auth import AWS4Auth
from langchain_community.vectorstores import OpenSearchVectorSearch



credentials = boto3_session.get_credentials()
#service = 'aoss'
awsauth = AWS4Auth(region=region_name, service=service, refreshable_credentials=credentials)

docsearch = OpenSearchVectorSearch.from_documents(
    documents,
    embeddings_model,
    opensearch_url=opensearch_url,#"https://813704av6f47ieeyku2k.us-west-2.aoss.amazonaws.com:443",#host,
    http_auth=awsauth,#AWSV4SignerAuth(boto3_session.get_credentials(), region_name),#awsauth,
    timeout=300,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    index_name=index_name,
    engine="faiss",
    bulk_size=100,

)

#### !NB - Wait for indexing to complete - it can take about 2-5 minutes

#### Test vectore store
This cell performs a similarity search in the document collection to find relevant text passages about Amazon's investment philosophy. 
The `similarity_search()` method will return documents that most closely match the query "The key elements of Amazon's investment philosophy".

If print return 0, it means indexing is not completed. Wait a  bit longer and try again.

In [14]:
example_query = "The key elements of Amazon's investment philosophy"
docs = docsearch.similarity_search(example_query)
print (len(docs))

4


This cell counts and displays metadata for each document chunk:

1. First, we count the total number of chunks in our `docs` collection
2. Then, we iterate through each chunk and print its associated metadata

This helps us verify:
- The total number of document chunks created
- The metadata properties of each chunk (e.g., source, page numbers)
- Proper document segmentation

In [15]:
number_of_chunks = len(docs)
print(number_of_chunks)
for i in range(number_of_chunks):
    print(docs[i].metadata)


4
{'source': './data/AMZN-1997-Shareholder-Letter.pdf', 'page': 0}
{'source': './data/AMZN-1997-Shareholder-Letter.pdf', 'page': 2}
{'source': './data/AMZN-2022-Shareholder-Letter.pdf', 'page': 0}
{'source': './data/AMZN-2022-Shareholder-Letter.pdf', 'page': 6}


As you can see we got 4 chunks in the order: 

0-letter from 1997, page 1

1-letter from 1997, page 3

2-letter from 2022, page 1

3-letter from 2022, page 7

(if you use different chunking strategy, different embedding model, or with changes over time the chunks and their order  can be different)

## Reranking

#### Now let's have a look for order of chunks if we use reranking model.

This code cell:
This cell initializes a client for Amazon Bedrock Agent Runtime service
It creates a connection to AWS Bedrock Agent Runtime using boto3, AWS's SDK for Python
The client is configured for the specified AWS region (stored in region_name variable)
This client will be used to make API calls to the Bedrock Agent Runtime service for reranking operations


In [16]:
# Initialize the Bedrock Agent Runtime client
rerank_client = boto3.client('bedrock-agent-runtime', region_name=region_name)

The cell below:

- Define the Bedrock model ID and construct the model package ARN

- Specify the model ID for Amazon's reranking model (we will use Amazon Rerank model, alternatively can use Cohere's reranking model)

- Construct the complete ARN (Amazon Resource Name) for the Bedrock foundation model.
This combines the AWS region, model ID, and required ARN format for Bedrock models

In [17]:
modelId = "amazon.rerank-v1:0"#"cohere.rerank-v3-5:0"
model_package_arn = f"arn:aws:bedrock:{region_name}::foundation-model/{modelId}"

This function, rerank_text, is designed to rerank a list of text sources based on their relevance to a given query using Amazon Bedrock's reranking service. Here's what the function does:

1. It takes four parameters:

- text_query: The search query text
- text_sources: A list of text documents to be reranked
- num_results: The number of results to return
- model_package_arn: The Amazon Resource Name (ARN) of the Bedrock reranking model to use
2. It calls the rerank client's rerank method with:

  - A query configuration specifying the text query
  - The source documents to be reranked
  - Reranking configuration including:
    + The type of reranking model (Bedrock)
    + Number of results to return
    + The specific model to use (via ARN)
3. Finally, it returns the 'results' portion of the API response, which contains the reranked documents in order of relevance.

This function is useful for improving search results by reordering documents based on their semantic relevance to the query, rather than just keyword matching.

In [18]:
def rerank_text(text_query, text_sources, num_results, model_package_arn):
    api_response = rerank_client.rerank(
        queries=[
            {
                "type": "TEXT",
                "textQuery": {
                    "text": text_query
                }
            }
        ],
        sources=text_sources,
        rerankingConfiguration={
            "type": "BEDROCK_RERANKING_MODEL",
            "bedrockRerankingConfiguration": {
                "numberOfResults": num_results,
                "modelConfiguration": {
                    "modelArn": model_package_arn,
                }
            }
        }
    )
    return api_response['results']

This code snippet creates a list called text_sources by iterating through a collection of documents (docs). For each text document, it creates a dictionary with a specific structure that appears to be formatting document sources in a particular way. The structure includes:

1. A "type" field set to "INLINE"
2. An "inlineDocumentSource" object containing:
- A "type" field set to "TEXT"
- A "textDocument" object with the actual text content from the document

This formatting might be used for preparing documents for processing in a document management system, API request, or text analysis tool.

In [19]:
text_sources = []
for text in docs:
    text_sources.append({
        "type": "INLINE",
        "inlineDocumentSource": {
            "type": "TEXT",
            "textDocument": {
                "text": text.page_content,
            }
        }
    })

This cell is executing a text reranking operation using a rerank_text function. Here's what it does:

1. Takes an example query (a search query or question)
2. Uses text_sources (a collection of text documents or passages)
3. Specifies '3' as a parameter (the number of results to return)
4. Uses a specific model_package_arn (to specify which reranking model to use)
5. Prints the reranked results (higher relevance score means the text is more relevant to the query)

The function reorders the text_sources based on their relevance to the example_query, using a reranking model specified by the model_package_arn, and returns the top 3 most relevant results.


In [20]:
response = rerank_text(example_query, text_sources, 3, model_package_arn)
print(response)

[{'index': 1, 'relevanceScore': 0.01833445206284523}, {'index': 0, 'relevanceScore': 0.0028449096716940403}, {'index': 2, 'relevanceScore': 0.0020190139766782522}]


As you can see order of documents changed:

1-letter from 1997, page 3

0-letter from 1997, page 1

2-letter from 2022, page 1




You can use reranking model to narrow down amount of chunks to reduce context side to LLM and use most relevant chunks.
