# RAG Post-Retrieval Optimization: Reranking with Amazon Bedrock and LlamaIndex

Reranking is a crucial post-retrieval process that refines the initial set of retrieved documents to improve the relevance and quality of information used for generating responses. After the initial retrieval step, which typically uses methods like vector similarity search, reranking takes the retrieved documents and re-scores their relevance to the query using more sophisticated techniques. These may include cross-encoders, BERT-based models, or other advanced algorithms that can better capture semantic nuances and context. Based on these scores, the documents are reordered, pushing the most relevant and informative ones to the top, thereby enhancing the accuracy and contextual appropriateness of the final response.
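The two-stage idea is easier to see without any services involved. The sketch below is a hypothetical, dependency-free illustration rather than how LlamaIndex or Bedrock implement it. It assumes precomputed toy vectors for the first stage, and it uses a made-up `rerank_score` function (share of query terms found in the text) as a stand-in for a cross-encoder or LLM judgment.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank_score(query: str, text: str) -> float:
    """Hypothetical stand-in for a cross-encoder/LLM: share of query terms found in text."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def retrieve_then_rerank(query, query_vec, docs, top_k=2, top_n=1):
    """docs is a list of (text, vector) pairs; returns the top_n texts after reranking."""
    # Stage 1: cheap vector similarity over the whole corpus, keep top_k candidates
    candidates = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)[:top_k]
    # Stage 2: re-score only those candidates with the more expensive scorer
    reranked = sorted(candidates, key=lambda d: rerank_score(query, d[0]), reverse=True)
    return [text for text, _ in reranked[:top_n]]


docs = [
    ("amazon revenue grew in 2022", [0.9, 0.1]),
    ("covid risks disrupted amazon fulfillment", [0.8, 0.3]),
    ("weather report for seattle", [0.1, 0.9]),
]
print(retrieve_then_rerank("amazon covid risks", [1.0, 0.2], docs))
```

With these toy vectors, vector similarity alone ranks the revenue sentence first, while the second-stage scorer promotes the COVID sentence to the top. The expensive scorer only ever sees `top_k` candidates, which is what keeps reranking affordable.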

In this lab, we show how to use an existing LLM (here, Amazon Bedrock Claude 3 Haiku) as a reranker to improve the relevance of documents retrieved from Amazon's SEC filings. The RAG system also uses Bedrock Claude 3 Sonnet to generate responses and Bedrock Titan Text Embeddings v2 to embed documents. The key steps are: ingesting PDF documents into a vector index, retrieving the most relevant nodes for a query with semantic search, reranking the retrieved nodes with Claude 3 Haiku, and synthesizing a final response from the reranked nodes. The image below shows the architecture.

- Vector store (LlamaIndex default in-memory store, local to the notebook for this demo)
- LLM (Amazon Bedrock Claude 3 Sonnet)
- Embedding model (Amazon Bedrock Titan Text Embeddings v2)
- Reranking model (Amazon Bedrock Claude 3 Haiku)
- LlamaIndex for orchestration (ingestion, retrieval, reranking, and final response synthesis)
- Dataset (Amazon SEC 10-K filings for 2022 and 2023)

<img src="reranking-bedrock.png" width="800" height="400">

## Prerequisites
You must run the [`workshop_setup.ipynb`](../lab00-setup/workshop_setup.ipynb) notebook in `lab00-setup` before starting this lab.

In [None]:
import warnings
warnings.warn("Warning: if you did not run lab00-setup, please go back and run the lab00 notebook") 


### > Setup
We start by importing the necessary LlamaIndex libraries.

In [None]:
# Core LlamaIndex building blocks used in this lab
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.bedrock import BedrockEmbedding

We use Anthropic Claude 3 Sonnet as the LLM and Amazon Titan Text Embeddings v2 as the embedding model, and set the chunk size to 512 tokens for this example.

In [None]:
import nest_asyncio
from llama_index.llms.bedrock import Bedrock
from llama_index.embeddings.bedrock import BedrockEmbedding

# Claude 3 Sonnet generates answers; Titan Text Embeddings v2 embeds chunks and queries
llm = Bedrock(model="anthropic.claude-3-sonnet-20240229-v1:0")
embed_model = BedrockEmbedding(model="amazon.titan-embed-text-v2:0")

# Global defaults picked up by the index and query engines below
Settings.llm = llm
Settings.embed_model = embed_model
Settings.chunk_size = 512

# Allow nested event loops so use_async=True works inside Jupyter
nest_asyncio.apply()
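`Settings.chunk_size = 512` controls how documents are split into nodes before they are embedded. LlamaIndex counts tokens and prefers sentence boundaries; the sketch below is a simplified, hypothetical approximation that counts whitespace-separated words and uses an assumed 50-word overlap, just to show how chunk size and overlap interact.

```python
def chunk_words(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Split text into windows of at most chunk_size words, overlapping by chunk_overlap words.

    Hypothetical helper: real splitters count tokens and respect sentence boundaries.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each new window moves forward
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # this window reached the end of the text
            break
    return chunks


document = " ".join(f"w{i}" for i in range(1200))
chunks = chunk_words(document, chunk_size=512, chunk_overlap=50)
print(len(chunks), [len(c.split()) for c in chunks])
```

Smaller chunks give the retriever finer-grained units to match, at the cost of more nodes to embed and store; the overlap keeps a sentence that straddles a boundary available in both neighbouring chunks.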

### > Document Ingestion
We load and index the documents stored in the data directory. The `amazon` folder contains Amazon's SEC 10-K filings for 2022 and 2023.

In [None]:
# load data
amazon_secfiles = SimpleDirectoryReader(input_dir="../data/lab03/amazon/").load_data()

# build index
amazon_index = VectorStoreIndex.from_documents(
    amazon_secfiles,
    use_async=True,
)
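`VectorStoreIndex.from_documents` splits the documents into nodes, embeds each node with `Settings.embed_model`, and keeps the vectors in memory by default; a retriever then returns the `similarity_top_k` nodes whose vectors are closest to the query vector. The class below is a hypothetical, dependency-free sketch of that idea, with bag-of-words count vectors standing in for Titan embeddings. It is not LlamaIndex's implementation.

```python
import math
from collections import Counter


class ToyVectorIndex:
    """Hypothetical in-memory vector index: bag-of-words vectors plus cosine top-k."""

    def __init__(self, chunks: list[str]):
        self.chunks = chunks
        # A corpus vocabulary stands in for a learned embedding space
        self.vocab = sorted({w for c in chunks for w in c.lower().split()})
        self.vectors = [self._embed(c) for c in chunks]

    def _embed(self, text: str) -> list[float]:
        counts = Counter(text.lower().split())
        vec = [float(counts[w]) for w in self.vocab]  # words outside the vocab are dropped
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec

    def retrieve(self, query: str, similarity_top_k: int = 2) -> list[tuple[float, str]]:
        q = self._embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity
        scored = [(sum(a * b for a, b in zip(q, v)), c) for v, c in zip(self.vectors, self.chunks)]
        scored.sort(key=lambda s: s[0], reverse=True)
        return scored[:similarity_top_k]


index = ToyVectorIndex(["covid disrupted fulfillment", "aws revenue grew", "covid increased costs"])
for score, chunk in index.retrieve("covid costs", similarity_top_k=2):
    print(f"{score:.3f}  {chunk}")
```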

### > Helper functions
The helper functions below retrieve the most relevant nodes from the vector index for a given query, optionally rerank them with an LLM (here, Claude 3 Haiku), and display the results as an HTML table.

In [None]:
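from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core import QueryBundle
# RankGPTRerank is used inside get_retrieved_nodes when with_reranker=True
from llama_index.postprocessor.rankgpt_rerank import RankGPTRerank
import pandas as pd
from IPython.display import display, HTML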

def get_retrieved_nodes(
    query_str, vector_top_k=5, reranker_top_n=3, with_reranker=False
):
    query_bundle = QueryBundle(query_str)
    # configure retriever
    retriever = VectorIndexRetriever(
        index=amazon_index,
        similarity_top_k=vector_top_k,
    )
    retrieved_nodes = retriever.retrieve(query_bundle)

    if with_reranker:
        # configure reranker
        reranker = RankGPTRerank(
            llm=Bedrock(model = "anthropic.claude-3-haiku-20240307-v1:0"),
            top_n=reranker_top_n,
            verbose=True,
        )
        retrieved_nodes = reranker.postprocess_nodes(
            retrieved_nodes, query_bundle
        )

    return retrieved_nodes


def pretty_print(df):
    return display(HTML(df.to_html().replace("\\n", "")))


def visualize_retrieved_nodes(nodes) -> None:
    result_dicts = []
    for node in nodes:
        result_dict = {"Score": node.score, "Text": node.node.get_text()}
        result_dicts.append(result_dict)

    pretty_print(pd.DataFrame(result_dicts))

### > Test retrieval without reranking

In [None]:
new_nodes = get_retrieved_nodes(
    "Describe key business risks for Amazon during covid",
    vector_top_k=10,
    with_reranker=False,
)

In [None]:
visualize_retrieved_nodes(new_nodes)

### > Add reranking
We will use a LlamaIndex module called `RankGPTRerank`. It implements RankGPT, a zero-shot listwise passage reranking method that prompts an LLM (here, Anthropic Claude 3 Haiku) to reorder the retrieved passages by their relevance to the query.
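"Listwise" means the model sees all candidate passages at once and returns a single ordering, instead of scoring each passage separately. The sketch below shows that shape under stated assumptions: the prompt wording is hypothetical (the wording `RankGPTRerank` actually sends differs), no model is called, and the LLM reply is a hard-coded string. Parsing keeps the first valid mention of each identifier and appends any passages the model left out, so no candidate is silently dropped.

```python
import re


def build_listwise_prompt(query: str, passages: list[str]) -> str:
    """Hypothetical RankGPT-style prompt: number every passage and ask for one ordering."""
    lines = [f"I will provide you with {len(passages)} passages, each indicated by a numerical identifier []."]
    for i, passage in enumerate(passages, start=1):
        lines.append(f"[{i}] {passage}")
    lines.append(f"Search Query: {query}")
    lines.append("Rank the passages above by relevance to the query. "
                 "Answer only with identifiers in descending order, e.g. [2] > [1].")
    return "\n".join(lines)


def parse_permutation(response: str, num_passages: int) -> list[int]:
    """Turn a reply like '[3] > [1] > [2]' into 0-based indices.

    Out-of-range and duplicate ids are ignored; omitted passages are appended
    in their original order.
    """
    order = []
    for token in re.findall(r"\d+", response):
        idx = int(token) - 1
        if 0 <= idx < num_passages and idx not in order:
            order.append(idx)
    order.extend(i for i in range(num_passages) if i not in order)
    return order


def listwise_rerank(passages: list[str], response: str, top_n: int) -> list[str]:
    """Reorder passages according to the model's permutation and keep top_n."""
    return [passages[i] for i in parse_permutation(response, len(passages))][:top_n]


passages = ["AWS revenue grew", "COVID-19 disrupted fulfillment", "Weather report"]
print(build_listwise_prompt("Describe key business risks for Amazon during covid", passages))
fake_reply = "[2] > [2] > [1] > [9]"  # hard-coded stand-in for the LLM's answer
print(listwise_rerank(passages, fake_reply, top_n=2))
```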

In [None]:
from llama_index.postprocessor.rankgpt_rerank import RankGPTRerank

new_nodes = get_retrieved_nodes(
    "Describe key business risks for Amazon during covid",
    vector_top_k=10,
    reranker_top_n=5,
    with_reranker=True,
)

In [None]:
visualize_retrieved_nodes(new_nodes)

### > Apply reranking to final response generation

In [None]:
query_engine_naive = amazon_index.as_query_engine(similarity_top_k=10)

In [None]:
response_naive = query_engine_naive.query(
    "Describe key business risks for Amazon during covid"
)
print(response_naive)

In [None]:
reranker = RankGPTRerank(
    llm=Bedrock(model="anthropic.claude-3-haiku-20240307-v1:0"),
    top_n=5,
    verbose=True,
)

In [None]:
# node_postprocessors (plural) is the keyword the query engine applies after retrieval
query_engine_rerank = amazon_index.as_query_engine(similarity_top_k=10, node_postprocessors=[reranker])
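Conceptually, the query engine retrieves `similarity_top_k` nodes, passes them through each object in `node_postprocessors` in order, and hands the survivors to the response synthesizer. The sketch below mirrors that flow with plain functions; every name is hypothetical, the "reranker" is a keyword-overlap stand-in for Claude 3 Haiku, and the synthesis step only builds the prompt string instead of calling Bedrock.

```python
from typing import Callable

Node = tuple[str, float]  # hypothetical stand-in for LlamaIndex's NodeWithScore
Postprocessor = Callable[[list[Node], str], list[Node]]


def build_prompt(query: str,
                 retrieve: Callable[[str, int], list[Node]],
                 postprocessors: list[Postprocessor],
                 similarity_top_k: int = 10) -> str:
    """Retrieve, apply each postprocessor in order, then put the survivors in a prompt."""
    nodes = retrieve(query, similarity_top_k)
    for postprocess in postprocessors:
        nodes = postprocess(nodes, query)
    context = "\n".join(text for text, _ in nodes)
    # A real engine would send this prompt to the LLM; here we just return it
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def keep_top_n(top_n: int) -> Postprocessor:
    """Hypothetical reranker: order by query-term overlap and keep top_n nodes."""
    def postprocess(nodes: list[Node], query: str) -> list[Node]:
        terms = set(query.lower().split())
        ranked = sorted(nodes, key=lambda n: len(terms & set(n[0].lower().split())),
                        reverse=True)
        return ranked[:top_n]
    return postprocess


corpus = [f"filler chunk {i}" for i in range(8)] + ["covid risks to fulfillment", "covid labor costs"]


def retrieve(query: str, k: int) -> list[Node]:
    """Hypothetical retriever that returns the first k chunks with a flat score."""
    return [(text, 0.5) for text in corpus[:k]]


naive = build_prompt("covid risks", retrieve, [])
reranked = build_prompt("covid risks", retrieve, [keep_top_n(2)])
print(reranked)
```

With an empty postprocessor list, all ten retrieved chunks (eight of them filler) reach the prompt; with the reranker in place, only the two relevant chunks do, which is the effect the reranked query engine is meant to have on the final answer.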

In [None]:
response = query_engine_rerank.query(
    "Describe key business risks for Amazon during covid"
)

### > Final response

In [None]:
print(response)