# Re-ranking

Amazon Bedrock provides access to reranker models that you can use at query time to improve the relevance of retrieved results. A reranker model calculates the relevance of each chunk to a query and reorders the results based on the scores that it calculates. By using a reranker model, you can return responses that are better suited to answering the query.

Reranker models are trained to identify relevance signals based on a query and then use those signals to rank documents. Because of this, the models can provide more relevant, more accurate results.

If you're using Amazon Bedrock Knowledge Bases to build your Retrieval Augmented Generation (RAG) application, use a reranker model when calling the `Retrieve` or `RetrieveAndGenerate` operation. The results from reranking override the default ranking that Amazon Bedrock Knowledge Bases determines.

This notebook demonstrates how to use a **reranking model** with Amazon Bedrock Knowledge Bases through the Rerank API, which can further improve the accuracy and relevance of RAG applications. With a reranker model, you can retrieve fewer, but more relevant, results. Feeding fewer results to the foundation model that generates the response can also decrease cost and latency.
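As a rough illustration of what a direct Rerank call might look like, the sketch below only assembles the request payload so it can be inspected without AWS credentials. The field names follow the `rerank` operation of the `bedrock-agent-runtime` client as we understand it from the boto3 documentation; please verify them against the current API reference. The helper name `build_rerank_request`, the sample documents, and the `us-west-2` Region in the ARN are assumptions made for this example.

```python
def build_rerank_request(query, documents, model_arn, top_n=3):
    """Assemble keyword arguments for a bedrock-agent-runtime `rerank` call.

    Hypothetical helper for illustration; it does not call AWS.
    """
    return {
        "queries": [{"type": "TEXT", "textQuery": {"text": query}}],
        "sources": [
            {
                "type": "INLINE",
                "inlineDocumentSource": {
                    "type": "TEXT",
                    "textDocument": {"text": doc},
                },
            }
            for doc in documents
        ],
        "rerankingConfiguration": {
            "type": "BEDROCK_RERANKING_MODEL",
            "bedrockRerankingConfiguration": {
                # Never ask for more results than there are documents
                "numberOfResults": min(top_n, len(documents)),
                "modelConfiguration": {"modelArn": model_arn},
            },
        },
    }


# Assumed Region and sample documents, for illustration only
example_model_arn = "arn:aws:bedrock:us-west-2::foundation-model/amazon.rerank-v1:0"
request = build_rerank_request(
    query="What were Amazon's net sales in 2023?",
    documents=[
        "Risk factors related to international operations",
        "Net sales by segment for 2023",
        "Employee headcount and benefits",
    ],
    model_arn=example_model_arn,
    top_n=2,
)

# With credentials configured, the call would look roughly like this:
# client = boto3.client("bedrock-agent-runtime")
# response = client.rerank(**request)
# for item in response["results"]:
#     print(item["index"], item["relevanceScore"])
```

Each returned item is expected to carry the index of the source document and its relevance score, so the caller can reorder the original list.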

Let's explore how to implement and utilize reranking models with Amazon Bedrock Knowledge Bases for an example use case.

## Prerequisites
You must run the [`workshop_setup.ipynb`](../lab00-setup/workshop_setup.ipynb) notebook in `lab00-setup` before starting this lab.

In [None]:
import warnings
warnings.warn("Warning: if you did not run lab00-setup, please go back and run the lab00 notebook") 

### Load the parameters

In [None]:
print("load the data parameters....\n")
# bucket and data parameters stored by the initial setup notebook (lab00-setup)
%store -r root_dir
%store -r jsonl_files
%store -r bucket
%store -r amzn10k_prefix
%store -r amzn10k_s3_path

## check all 5 values are printed and do not fail
print(root_dir)
print(jsonl_files)
print(bucket)
print(amzn10k_prefix)
print(amzn10k_s3_path)

print("\nload the vector db parameters....\n")

# vector parameters stored from Initial setup
%store -r vector_host
%store -r vector_collection_arn
%store -r vector_collection_id
%store -r bedrock_kb_execution_role_arn

## check all 4 values are printed and do not fail
print(vector_host)
print(vector_collection_arn)
print(vector_collection_id)
print(bedrock_kb_execution_role_arn)

### Initialize other parameters

In [None]:
import os
import sys
import time
import boto3
import logging
import pprint
import json
import random

sys.path.append('../lab00-setup')

from knowledge_base import BedrockKnowledgeBase

# Create a boto3 session and resolve the AWS Region
boto3_session = boto3.Session()
region_name = boto3_session.region_name
# Runtime client used for the Retrieve and RetrieveAndGenerate APIs
bedrock_agent_runtime_client = boto3.client("bedrock-agent-runtime", region_name=region_name)
model_id = "us.anthropic.claude-sonnet-4-20250514-v1:0"
model_arn = f'arn:aws:bedrock:{region_name}::foundation-model/{model_id}'

## Create a knowledge base with a fixed-size chunking strategy
Let's start by creating an [Amazon Bedrock knowledge base](https://aws.amazon.com/bedrock/knowledge-bases/) to store the Amazon 10-K financial documents. Knowledge Bases can integrate with different vector databases, including [Amazon OpenSearch Serverless](https://aws.amazon.com/opensearch-service/features/serverless/), [Amazon Aurora](https://aws.amazon.com/rds/aurora/), [Pinecone](http://app.pinecone.io/bedrock-integration), Redis Enterprise and MongoDB Atlas. For this example, we will integrate the knowledge base with Amazon OpenSearch Serverless. To do so, we will use the helper class `BedrockKnowledgeBase`, which will create the knowledge base and all of its prerequisites:
1. IAM roles and policies
2. S3 bucket
3. Amazon OpenSearch Serverless encryption, network and data access policies
4. Amazon OpenSearch Serverless collection
5. Amazon OpenSearch Serverless vector index
6. Knowledge base
7. Knowledge base data source

We will create a knowledge base using the fixed-size chunking strategy.

You can choose a different chunking strategy by changing the parameter value below:
```
"chunkingStrategy": "FIXED_SIZE | NONE | HIERARCHICAL | SEMANTIC"
```
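To make the idea of fixed-size chunking concrete, here is a minimal sketch that splits text into overlapping windows. It is a toy stand-in: it counts whitespace-separated words, whereas the managed `FIXED_SIZE` strategy works on tokens, and the function name and parameters (`fixed_size_chunks`, `max_words`, `overlap_words`) are assumptions for illustration rather than anything the `BedrockKnowledgeBase` helper exposes.

```python
def fixed_size_chunks(text, max_words=8, overlap_words=2):
    """Split text into windows of at most `max_words` words.

    Consecutive windows share `overlap_words` words so that content cut
    at a boundary still appears intact in one of the chunks.
    """
    if not 0 <= overlap_words < max_words:
        raise ValueError("overlap_words must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap_words
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once a window has reached the end of the text
        if start + max_words >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(1, 21))  # 20 placeholder words
chunks = fixed_size_chunks(sample, max_words=8, overlap_words=2)
for chunk in chunks:
    print(chunk)
```

Smaller chunks tend to give more precise matches but less surrounding context per result, which is one reason the choice of strategy can affect what the reranker has to work with.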

In [None]:
suffix = random.randrange(200, 900)
kb_name = f'reranking-kb-{suffix}'
index_name = f"{kb_name}-index-{suffix}"
description = "This knowledge base contains Amazon 10-K financial documents from 2022 and 2023"

knowledge_base = BedrockKnowledgeBase(
    kb_name=kb_name,
    kb_description=description,
    data_bucket_name=bucket,
    data_prefix=[amzn10k_prefix],
    vector_collection_arn=vector_collection_arn,
    vector_collection_id=vector_collection_id,
    vector_host=vector_host,
    bedrock_kb_execution_role_arn=bedrock_kb_execution_role_arn,
    index_name=index_name,
    suffix=suffix,
    chunking_strategy="FIXED_SIZE"
)

In [None]:
# ensure that the kb is available
time.sleep(30)
# Start the data ingestion
knowledge_base.start_ingestion_job()
kb_id = knowledge_base.get_knowledge_base_id()
print(f"Knowledge Base Id: {kb_id}")

### Prompt to test

We use the same prompt for both runs, with and without reranking, so that the results can be compared directly. You can also try other prompts, for example:

1. "What is Amazon doing in the field of entertainment, movies and cinema?"
2. "Key challenges faced by Amazon in year 2022 and 2023"

In [None]:
prompt = "How does the 2023 net sales mix reflect Amazon's global priorities and strategic investments across segments?"

### Generate and render the response: reranking vs. no reranking

In [None]:
kb_mapping = dict()

for reranking_enabled in [False, True]:
    
    print("========================================================================================")
    print(f"Generate a response with reranking {'enabled' if reranking_enabled else 'disabled'}")
    
    retrieval_config = {
        "vectorSearchConfiguration": {
            "numberOfResults": 30 if reranking_enabled else 3
        }
    }
    
    # Add reranking configuration if enabled
    if reranking_enabled:
        retrieval_config["vectorSearchConfiguration"]["rerankingConfiguration"] = {
            "type": "BEDROCK_RERANKING_MODEL",
            "bedrockRerankingConfiguration": {
                "modelConfiguration": {
                    "modelArn": f'arn:aws:bedrock:{region_name}::foundation-model/amazon.rerank-v1:0',
                },
                "numberOfRerankedResults": 3
            }
        }
    
    response_ret = bedrock_agent_runtime_client.retrieve_and_generate(
        input={
            "text": prompt
        },
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                'knowledgeBaseId': kb_id, 
                "modelArn": model_id,
                "retrievalConfiguration": retrieval_config
            }
        }
    )
    
    # Store the response
    result_key = "with_reranking" if reranking_enabled else "without_reranking"
    # Check for the dictionary key (hasattr looks up attributes, not keys)
    if result_key not in kb_mapping:
        kb_mapping[result_key] = {}
    kb_mapping[result_key]["Response"] = response_ret['output']['text']
    
    # Perform separate retrieval
    response_ret = bedrock_agent_runtime_client.retrieve(
        knowledgeBaseId=kb_id,  # Use your knowledge base ID here
        retrievalQuery={
            'text': prompt
        },
        retrievalConfiguration=retrieval_config
    )
    
    # Store the search results
    kb_mapping[result_key]["SearchResults"] = response_ret['retrievalResults']
    
    print("========================================================================================")

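Before rendering the full comparison, it can help to look at a compact summary of each result list. The sketch below assumes each entry has the shape of an item in `retrievalResults` (a `content.text` field and an optional `score`); the helper name `summarize_results` and the mock data are made up for illustration.

```python
def summarize_results(results, preview_chars=80):
    """Return rank, score and a short text preview for each retrieval result."""
    rows = []
    for rank, result in enumerate(results, start=1):
        text = result["content"]["text"]
        rows.append({
            "rank": rank,
            "score": result.get("score"),  # may be absent, hence .get
            "preview": text[:preview_chars].replace("\n", " "),
        })
    return rows


# Mock results shaped like `retrievalResults` entries (values are made up)
mock_results = [
    {"content": {"text": "Net sales by segment\nNorth America, International, AWS"}, "score": 0.71},
    {"content": {"text": "Risk factors for international operations"}, "score": 0.42},
]

rows = summarize_results(mock_results)
for row in rows:
    print(row)

# In this notebook the same helper could be applied to the stored results:
# summarize_results(kb_mapping["with_reranking"]["SearchResults"])
```

Scores from the two runs are likely produced by different scoring functions (vector similarity versus the reranker), so it is safer to compare the ordering and content of the results than the raw numbers.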
In [None]:
import pandas as pd
from IPython.display import display, HTML

# First, determine the maximum length needed
max_length = 0
for key in kb_mapping:
    current_length = 2 + len(kb_mapping[key]["SearchResults"])
    max_length = max(max_length, current_length)

display_map = dict()
        
# reformat results
for key in kb_mapping:
    display_map[key] = []
    
    # Add response
    response = kb_mapping[key]["Response"]
    display_map[key].append(response)
    
    # Add separator
    display_map[key].append("======[Search Results]======")
    
    # Add search results
    for result in kb_mapping[key]["SearchResults"]:
        display_map[key].append(f'{result["content"]["text"][:1000]}...')
    
    # Pad with empty strings if needed
    while len(display_map[key]) < max_length:
        display_map[key].append("")

# Create DataFrame
df = pd.DataFrame(display_map)

# Create custom CSS
custom_css = """
<style>
    table {
        width: 100% !important;
        table-layout: fixed !important;
    }
    td {
        width: 50% !important;
        word-wrap: break-word !important;
        padding: 8px !important;
        vertical-align: top !important;
    }
    th {
        width: 50% !important;
        word-wrap: break-word !important;
        padding: 8px !important;
    }
</style>
"""

# Combine CSS with DataFrame HTML
output = custom_css + df.style.hide()._repr_html_()
output += "&nbsp;"

display(HTML(output))

Reranking can improve results because it uses a two-stage retrieval process. Without reranking, the knowledge base simply returns the top 3 results by vector similarity. With reranking, it first casts a wider net by retrieving 30 candidates, then uses the `amazon.rerank-v1:0` model to select the best 3 from this larger pool. The larger candidate pool (30 vs. 3), combined with the additional relevance scoring, lets the reranking approach surface pertinent chunks that vector similarity alone might rank lower, which tends to produce more comprehensive and accurate answers. Compare the two columns above to see how the results differ for your prompt.
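The two-stage idea can be sketched with toy scorers. In the example below, a word-overlap count stands in for the first-stage vector search and a Jaccard similarity stands in for the reranker; neither resembles the real models, and the corpus, query, and function names are all invented for illustration. The point is only that re-scoring a wider candidate pool can change which results end up in the final top N.

```python
import re


def tokens(text):
    """Lowercased set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def first_stage(query, corpus, k):
    """Toy candidate retrieval: rank by number of shared words, keep top k."""
    q = tokens(query)
    return sorted(corpus, key=lambda doc: len(q & tokens(doc)), reverse=True)[:k]


def rerank(query, candidates, n):
    """Toy second stage: re-score candidates by Jaccard similarity, keep top n."""
    q = tokens(query)

    def jaccard(doc):
        d = tokens(doc)
        return len(q & d) / len(q | d) if (q | d) else 0.0

    return sorted(candidates, key=jaccard, reverse=True)[:n]


corpus = [
    "Amazon net sales grew in 2023 while Amazon also expanded devices, "
    "studios, logistics and many other programs",
    "Net sales 2023 by segment",
    "Amazon 2023 net sales mix",
    "Employee headcount and benefits",
    "Risk factors for 2023",
]
query = "amazon net sales 2023"

without_reranking = first_stage(query, corpus, k=2)
with_reranking = rerank(query, first_stage(query, corpus, k=4), n=2)

print("Top 2 without reranking:", without_reranking)
print("Top 2 with reranking:   ", with_reranking)
```

Here the long first document matches many query words and wins the first stage, but the second-stage scorer prefers the shorter, more focused passages once it sees the wider pool of four candidates.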

### Clean up

In [None]:
# Uncomment the next line to delete the knowledge base when you are finished
# knowledge_base.delete_kb()