# Improve Retrieval Augmented Generation using Reranking with Elastic, Anthropic Claude 3.7, Amazon Bedrock, Langchain, RAGAS and FlashRank

## Introduction

In this notebook we will show you how to use reranking, and how to evaluate its effect, to improve the output of a RAG solution built using Langchain, Anthropic Claude 3.7, Elastic, RAGAS and FlashRank.


#### Use case

Improving a RAG (Retrieval-Augmented Generation) Application using Reranking


#### Persona
As an analyst at Anycompany, Bob wants to improve the responses of a RAG solution. He has observed that not all of the retrieved contexts are relevant to the question. He is looking for a mechanism to reorder and filter the documents, placing the relevant ones at the forefront, thereby enhancing the effectiveness of RAG.

#### Implementation
To fulfill this use case, this notebook shows how to use reranking to improve a RAG application. We will use the Anthropic Claude 3.7 Sonnet foundation model, Elastic, Langchain, RAGAS and FlashRank.

#### Python 3.10

⚠  This lab requires the notebook to run on a Python 3.10 runtime. ⚠


## Installation

To run this notebook, install the following dependencies: boto3, botocore, langchain, langchain_aws, langchain-elasticsearch, elasticsearch, pypdf, ragas, flashrank and nltk.

In [None]:
%pip install --upgrade pip
%pip install boto3 --force-reinstall --quiet
%pip install botocore --force-reinstall --quiet
%pip install langchain --force-reinstall --quiet
%pip install langchain_aws --force-reinstall --quiet
%pip install langchain-elasticsearch --force-reinstall --quiet
%pip install elasticsearch==8.18.0 --force-reinstall --quiet
%pip install pypdf --force-reinstall --quiet
%pip install ragas==0.2.6 --force-reinstall --quiet
%pip install flashrank --force-reinstall --quiet
%pip install nltk --force-reinstall --quiet

Note: you may need to restart the kernel to use updated packages.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
awscli 1.33.13 requires botocore==1.34.131, but you have botocore 1.35.76 which is incompatible.
sagemaker 2.224.1 requires attrs<24,>=23.1.0, but you have attrs 24.2.0 which is incompatible.
sagemaker 2.224.1 requires numpy<2.0,>=1.9.0, but you have numpy 2.1.3 which is incompatible.
sagemaker 2.224.1 requires protobuf<5.0,>=3.12, but you have protobuf 5.29.1 which is incompatible.
sparkmagic 0.20.4 requires nest-asyncio==1.5.5, but you have nest-asyncio 1.6.0 which is incompatible.
datasets 3.1.0 requires fsspec[http]<=2024.9.0,>=2023.1.0, but you have fsspec 2024.10.0 which is incompatible.
langchain-aws 0.2.9 requires numpy<2,>=1; python_version < "3.12", but you have numpy 2.1.3 which is incompatible.
langchain-community 0.3.9 requires

## Kernel Restart

Restart the kernel so that it picks up the updated packages installed above.

In [3]:
# restart kernel
from IPython.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")

## Setup 

Import the necessary libraries

In [4]:
import json
import os
import sys
import boto3
import botocore
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_aws import ChatBedrockConverse
from langchain_aws import AmazonKnowledgeBasesRetriever
from langchain_aws import BedrockEmbeddings
from botocore.client import Config
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_elasticsearch import ElasticsearchStore
from elasticsearch import Elasticsearch
from langchain.schema.runnable import RunnablePassthrough
from langchain.chains import RetrievalQA
from getpass import getpass
from langchain.prompts import PromptTemplate
from langchain.document_loaders import PyPDFLoader,TextLoader
from pathlib import Path
from datasets import Dataset
import pandas as pd

## Initialization

Initialize the Bedrock Runtime client, the chat model (`ChatBedrockConverse`) and the embeddings model (`BedrockEmbeddings`).

In [None]:
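# Client configuration: 120 s connect/read timeouts, botocore retries disabled
bedrock_config = Config(connect_timeout=120, read_timeout=120, retries={'max_attempts': 0})
bedrock_client = boto3.client('bedrock-runtime', config=bedrock_config)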

modelId = 'us.anthropic.claude-3-7-sonnet-20250219-v1:0' # change this to use a different version from the model provider
embeddingmodelId = 'amazon.titan-embed-text-v2:0' # change this to use a different embedding model

llm = ChatBedrockConverse(model_id=modelId, client=bedrock_client)
embeddings = BedrockEmbeddings(model_id=embeddingmodelId,client=bedrock_client)

## Read files from directory

Download the NLTK resources, then load the State of the Union text file from the `media` directory.

In [6]:
import nltk
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger_eng is already up-to-
[nltk_data]       date!


True

In [7]:
# publicly available dataset: https://github.com/CloudPak-Outcomes/Outcomes-Projects/blob/main/L4assets/watsonx.ai-Assets/Documents/state_of_the_union.txt
TMP_DIR = os.path.join(os.path.dirname(os.path.realpath('__file__')), 'media/state_of_the_union.txt')
loader = TextLoader(TMP_DIR, encoding='ISO-8859-1')
documents = loader.load()

## Split Documents

Chunk documents into passages in order to improve the retrieval specificity and to ensure that we can provide multiple passages within the context window of the final question answering prompt.

Here we chunk the documents into passages of at most 1,000 characters with no overlap. Note that `chunk_size` and `chunk_overlap` are measured in characters by default, not tokens.

We use the Recursive Character Text Splitter here, but Langchain offers more advanced splitters that reduce the chance of context being lost.

In [8]:
text_splitter = RecursiveCharacterTextSplitter(
        separators=['\n\n', '\n', '.', ','],
        chunk_size=1000,
        chunk_overlap=0
        )
texts = text_splitter.split_documents(documents)
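
To make the splitting strategy more concrete, here is a minimal pure-Python sketch of the idea behind recursive character splitting: try the coarsest separator first, merge pieces while they fit within the size limit, and fall back to finer separators only for pieces that are still too long. This is a simplified illustration under our own assumptions, not Langchain's implementation; the function name `recursive_split` is hypothetical, and details such as overlap handling are omitted.

```python
def recursive_split(text, separators, chunk_size):
    """Split text into chunks of at most chunk_size characters (simplified sketch)."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separator left to try: fall back to hard cuts every chunk_size characters
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    parts = text.split(sep)
    if len(parts) == 1:
        # Separator not present: try the next, finer separator
        return recursive_split(text, finer, chunk_size)

    chunks, current = [], ""
    for part in parts:
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging while the chunk still fits
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= chunk_size:
            current = part
        else:
            # A single piece is too long: split it with the finer separators
            chunks.extend(recursive_split(part, finer, chunk_size))
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa\n\nbbbb\n\ncccc dddd"
sample_chunks = recursive_split(sample, ["\n\n", "\n", " "], chunk_size=10)
print(sample_chunks)
```

Because coarse separators such as blank lines are tried first, paragraphs tend to stay together, which is why this strategy is a reasonable default for prose like a speech transcript.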

## Connect to Elasticsearch

Because we are using an Elastic Cloud deployment, we'll use the Cloud ID to identify it. To find the Cloud ID for your deployment, go to [Cloud ID](https://cloud.elastic.co/deployments) and select your deployment.

We will use `ElasticsearchStore` to connect to our Elastic Cloud deployment. It makes it easy to create the index and to index data into it.

In [9]:
cloud_id = getpass("Elastic deployment Cloud ID: ")
cloud_api_key = getpass("Elastic deployment API Key: ")
index_name= "new-index-1"

vector_store = ElasticsearchStore(
        es_cloud_id=cloud_id,  
        index_name= index_name, 
        embedding=embeddings,
        es_api_key=cloud_api_key)

Elastic deployment Cloud ID:  ········
Elastic deployment API Key:  ········


## Index data into Elasticsearch and initialize retriever

Next, we will index the data into Elasticsearch using `ElasticsearchStore.from_documents`. We will use the Cloud ID, API key and index name values set in the previous step, and pass `BedrockEmbeddings` as the embedding to embed the texts.

In [10]:
vectordb = vector_store.from_documents(
        texts, 
        embeddings,
        index_name=index_name,
        es_cloud_id=cloud_id,
        es_api_key=cloud_api_key
        )

retriever = vectordb.as_retriever()
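
Conceptually, a vector-store retriever embeds the query and returns the stored passages whose embeddings are closest to it. The sketch below illustrates that idea with toy two-dimensional vectors and cosine similarity. It is an assumption-laden illustration only: the helper names `cosine` and `top_k` are hypothetical, the vectors are made up, and the actual search performed by Elasticsearch (approximate kNN over much higher-dimensional embeddings) differs in its details.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, doc_vecs, k=4):
    """Return the ids of the k documents most similar to the query vector."""
    scored = sorted(
        ((cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()),
        reverse=True,
    )
    return [doc_id for _, doc_id in scored[:k]]


# Toy "embeddings" (hypothetical values, for illustration only)
toy_index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
nearest = top_k([1.0, 0.1], toy_index, k=2)
print(nearest)
```

Similarity in embedding space is only a proxy for relevance to the question, which is why the retrieved order is not always the most useful order for the LLM.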

## Model Invocation and Response Generation using RetrievalQA chain

Now that the passages are stored in Elasticsearch and the LLM is initialized, we can ask a question and generate an answer from the relevant passages.

In [11]:
query = "What did the president say about Ketanji Brown Jackson?"


prompt_template = """
    Human: You will be acting as an advisor named Poly created by the company Polymath. 
    Your goal is to give advice from the derived context. 
    You will be replying to users who are on the Polymath site and who will be confused if you don't respond in the character of Poly.
    
    You should maintain a friendly customer service tone.

    Here is the document you should reference when answering the user: <context>{context}</context>

    Here are some important rules for the interaction:
    - Always stay in character, as Poly, a Machine Learning advisor on complex Mechanical systems
    - If you are unsure how to respond, say “Sorry, I didn’t understand that. Could you repeat the question?”
    - If someone asks something irrelevant, say, “Sorry, I don't know.”


    Here is the user’s question: <question> {question} </question>

    How do you respond to the user’s question?
    Think about your answer first before you respond. Put your response in <response></response> tags.
    Assistant: <response>"""

prompt = PromptTemplate(template=prompt_template, input_variables=["context","question"])
qa_chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff",retriever=retriever, return_source_documents=True, chain_type_kwargs={"prompt": prompt})
response = qa_chain.invoke(query)
print(response["result"])

According to the document, President Biden said that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court. He described her as "One of our nation's top legal minds, who will continue Justice Breyer's legacy of excellence." The President highlighted that nominating Supreme Court Justices is one of the most serious constitutional responsibilities a President has.

</response>


## Preparing the Evaluation Data

As RAGAS aims to be a reference-free evaluation framework, the required preparations of the evaluation dataset are minimal. You will need to prepare `questions` and `references` pairs from which you can prepare the remaining information through inference as shown below. If you are not interested in the `context_recall` metric, you don’t need to provide the `references` information. In this case, all you need to prepare are the questions.

In [12]:
from ragas import SingleTurnSample, EvaluationDataset

questions = ["What did the president say about Ketanji Brown Jackson?"]

references = ["The president described Ketanji Brown a one of the nation's top legal minds"]

samples = []

for idx, query in enumerate(questions):
    samples.append(
        SingleTurnSample(
            user_input=query,
            retrieved_contexts=[docs.page_content for docs in retriever.invoke(query)],
            response=qa_chain.invoke(query)["result"],
            reference=references[idx]
        )
    )

dataset = EvaluationDataset(samples=samples)

## Evaluating the RAG application

First, import all the metrics you want to use from `ragas.metrics`. Then call the `evaluate()` function, passing in the relevant metrics and the prepared dataset. Below is a brief description of the metrics:

* **Faithfulness**: This measures the factual consistency of the generated answer against the given context. It is calculated from the answer and the retrieved context. The score is scaled to the (0, 1) range; higher is better.
* **Response Relevancy**: This metric assesses how pertinent the generated answer is to the given prompt. A lower score is assigned to answers that are incomplete or contain redundant information, and higher scores indicate better relevancy. This metric is computed using the question, the context and the answer. Please note that even though in practice the score will range between 0 and 1 most of the time, this is not mathematically guaranteed, because cosine similarity ranges from -1 to 1.
* **Context Precision**: Context Precision is a metric that evaluates whether all of the ground-truth relevant items present in the contexts are ranked higher or not. Ideally all the relevant chunks must appear at the top ranks. This metric is computed using the question, ground_truth and the contexts, with values ranging between 0 and 1, where higher scores indicate better precision.
* **Context Recall**: Context recall measures the extent to which the retrieved context aligns with the annotated answer, treated as the ground truth. It is computed based on the ground truth and the retrieved context, and the values range between 0 and 1, with higher values indicating better performance.
* **Context entities recall**: This metric gives the measure of recall of the retrieved context, based on the number of entities present in both ground_truths and contexts relative to the number of entities present in the ground_truths alone. Simply put, it is a measure of what fraction of entities are recalled from ground_truths. This metric is useful in fact-based use cases like tourism help desk, historical QA, etc. This metric can help evaluate the retrieval mechanism for entities, based on comparison with entities present in ground_truths, because in cases where entities matter, we need the contexts which cover them.
* **Answer Semantic Similarity**: The concept of Answer Semantic Similarity pertains to the assessment of the semantic resemblance between the generated answer and the ground truth. This evaluation is based on the ground truth and the answer, with values falling within the range of 0 to 1. A higher score signifies a better alignment between the generated answer and the ground truth.
* **Answer Correctness**: The assessment of Answer Correctness involves gauging the accuracy of the generated answer when compared to the ground truth. This evaluation relies on the ground truth and the answer, with scores ranging from 0 to 1. A higher score indicates a closer alignment between the generated answer and the ground truth, signifying better correctness. Answer correctness encompasses two critical aspects: semantic similarity between the generated answer and the ground truth, as well as factual similarity. These aspects are combined using a weighted scheme to formulate the answer correctness score. Users also have the option to employ a ‘threshold’ value to round the resulting score to binary, if desired.
* **Aspect Critique**: This is designed to assess submissions based on predefined aspects such as harmfulness, maliciousness, coherence, and conciseness. The output of aspect critiques is binary, indicating whether the submission aligns with the defined aspect or not. This evaluation is performed using the ‘answer’ as input.
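
As a rough illustration of the Context Precision bullet above, the sketch below computes a rank-weighted score from a list of per-chunk relevance verdicts: the mean of precision@k over the ranks k that hold a relevant chunk. This follows the formula as we understand it from the RAGAS documentation. In RAGAS itself the verdicts come from an LLM judge, whereas here they are supplied by hand, and the function name `context_precision_at_k` is hypothetical.

```python
def context_precision_at_k(verdicts):
    """Mean of precision@k over the ranks that hold a relevant chunk.

    verdicts: list of 0/1 relevance flags for the retrieved chunks, in rank order.
    """
    hits = 0
    weighted_sum = 0.0
    for rank, relevant in enumerate(verdicts, start=1):
        if relevant:
            hits += 1
            weighted_sum += hits / rank  # precision@rank, counted only at relevant ranks
    return weighted_sum / hits if hits else 0.0


# One relevant chunk out of four, at rank 3 versus at rank 1
relevant_at_rank_3 = context_precision_at_k([0, 0, 1, 0])
relevant_at_rank_1 = context_precision_at_k([1, 0, 0, 0])
print(round(relevant_at_rank_3, 6), relevant_at_rank_1)
```

With one relevant chunk out of four, the score is 1/3 when that chunk sits at rank 3 and 1.0 when it sits at rank 1, which is the kind of shift reranking aims for.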

In [13]:
from ragas.metrics import (
        LLMContextRecall, 
        Faithfulness, 
        LLMContextPrecisionWithReference, 
        AnswerCorrectness, 
        ResponseRelevancy, 
        SemanticSimilarity, 
        AspectCritic
    )
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas import evaluate 

#You can also choose a different model for evaluation
llm_for_evaluation = LangchainLLMWrapper(ChatBedrockConverse(model_id=modelId, client=bedrock_client))
bedrock_embeddings = LangchainEmbeddingsWrapper(BedrockEmbeddings(model_id="amazon.titan-embed-text-v2:0",client=bedrock_client))

#specify the metrics here
metrics = [
    LLMContextRecall(llm=llm_for_evaluation), 
    LLMContextPrecisionWithReference(llm=llm_for_evaluation),
    AnswerCorrectness(llm=llm_for_evaluation, embeddings=bedrock_embeddings), 
    ResponseRelevancy(llm=llm_for_evaluation, embeddings=bedrock_embeddings),
    Faithfulness(llm=llm_for_evaluation),
    SemanticSimilarity(embeddings=bedrock_embeddings),
    AspectCritic(name="harmfulness", 
         definition="Does the submission cause or have the potential to cause harm to individuals, groups, or society at large?", 
         llm=llm_for_evaluation
        ),
    AspectCritic(name="maliciousness", 
                 definition="Is the submission intended to harm, deceive, or exploit users?", 
                 llm=llm_for_evaluation
                ),
    AspectCritic(name="coherence", 
             definition="Is the submission logical, relevant, and informative along with clear structure?", 
             llm=llm_for_evaluation
            ),
    AspectCritic(name="conciseness", 
         definition="Is the submission brief, direct, and avoids unnecessary wordiness while conveying intended meaning?", 
         llm=llm_for_evaluation
        )
    ]

result = evaluate(
    dataset = dataset, 
    metrics=metrics
)

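df = result.to_pandas()

# Show full cell contents, then display the table with left-aligned cells and headers.
# The Styler must be the last expression in the cell, otherwise the styling is discarded.
pd.options.display.max_colwidth = 8000
df.style.set_properties(**{'text-align': 'left'}).set_table_styles([ dict(selector='th', props=[('text-align', 'left')] ) ])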

Evaluating:   0%|          | 0/10 [00:00<?, ?it/s]

Unnamed: 0,user_input,retrieved_contexts,response,reference,context_recall,llm_context_precision_with_reference,answer_correctness,answer_relevancy,faithfulness,semantic_similarity,harmfulness,maliciousness,coherence,conciseness
0,What did the president say about Ketanji Brown Jackson?,"[Vice President Harris and I ran for office with a new economic vision for America. \n\nInvest in America. Educate Americans. Grow the workforce. Build the economy from the bottom up \nand the middle out, not from the top down. \n\nBecause we know that when the middle class grows, the poor have a ladder up and the wealthy do very well. \n\nAmerica used to have the best roads, bridges, and airports on Earth. \n\nNow our infrastructure is ranked 13th in the world. \n\nWe wont be able to compete for the jobs of the 21st Century if we dont fix that. \n\nThats why it was so important to pass the Bipartisan Infrastructure Lawthe most sweeping investment to rebuild America in history. \n\nThis was a bipartisan effort, and I want to thank the members of both parties who worked to make it happen. \n\nWere done talking about infrastructure weeks. \n\nWere going to have an infrastructure decade., And for our LGBTQ+ Americans, lets finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \n\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \n\nWhile it often appears that we never agree, that isnt true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \n\nAnd soon, well strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \n\nSo tonight Im offering a Unity Agenda for the Nation. Four big things we can do together. \n\nFirst, beat the opioid epidemic., Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. 
And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyeran Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence., And my report is this: the State of the Union is strongbecause you, the American people, are strong. \n\nWe are stronger today than we were a year ago. \n\nAnd we will be stronger a year from now than we are today. \n\nNow is our moment to meet and overcome the challenges of our time. \n\nAnd we will, as one people. \n\nOne America. \n\nThe United States of America. \n\nMay God bless you all. May God protect our troops.]","The president stated that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson for the vacant seat on the United States Supreme Court to continue the legacy of retiring Justice Stephen Breyer. He described Judge Jackson as ""one of our nation's top legal minds"" who ""will continue Justice Breyer's legacy of excellence"" on the Supreme Court.\n\n</response>",The president described Ketanji Brown a one of the nation's top legal minds,1.0,0.333333,0.140029,0.429117,1.0,0.560116,0,0,1,1


## Helper Function to print documents

In [14]:
# Helper function for printing docs
def pretty_print_docs(docs):
    print(
        f"\n{'-' * 100}\n".join(
            [
                f"Document {i+1}:\n\n{d.page_content}\nMetadata: {d.metadata}"
                for i, d in enumerate(docs)
            ]
        )
    )

## Visualize the order of retrieved outputs

In [15]:
docs = retriever.invoke(query)
pretty_print_docs(docs)

Document 1:

Vice President Harris and I ran for office with a new economic vision for America. 

Invest in America. Educate Americans. Grow the workforce. Build the economy from the bottom up  
and the middle out, not from the top down.  

Because we know that when the middle class grows, the poor have a ladder up and the wealthy do very well. 

America used to have the best roads, bridges, and airports on Earth. 

Now our infrastructure is ranked 13th in the world. 

We wont be able to compete for the jobs of the 21st Century if we dont fix that. 

Thats why it was so important to pass the Bipartisan Infrastructure Lawthe most sweeping investment to rebuild America in history. 

This was a bipartisan effort, and I want to thank the members of both parties who worked to make it happen. 

Were done talking about infrastructure weeks. 

Were going to have an infrastructure decade.
Metadata: {'source': '/root/aws-generativeai-partner-samples/elastic/samples/media/state_of_the_union

## Reranking

In the context of a Retrieval-Augmented Generation (RAG) application, reranking refers to a technique used to improve the quality of the retrieved passages or documents from a large corpus, before they are used for downstream tasks such as question answering or text generation.

The reranking process typically involves the following steps:

1. Initial Retrieval: The RAG model first retrieves a set of top-k relevant passages or documents from the corpus, based on some initial retrieval method.

2. Reranking: The initially retrieved passages are then reranked using a reranking model, which takes into account additional information or signals, such as the question or input text, to better assess the relevance of each retrieved passage.

3. Top Passage Selection: After reranking, the top-N passages or documents with the highest reranked scores are selected for use in the downstream task, such as being passed to the generation component of the RAG model.

The reranking model is typically a neural network model that is trained to score or rank the relevance of each retrieved passage, given the input query or context. It can leverage various types of information, such as the textual similarity between the query and the passage, passage quality scores, or even external knowledge sources.
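
The three steps above can be sketched in a few lines of Python. This is a toy illustration, not how FlashRank scores passages: a simple word-overlap function stands in for the neural reranking model, and the names `overlap_score` and `rerank` are hypothetical.

```python
import re


def overlap_score(query, passage):
    """Fraction of query words that also appear in the passage (toy relevance score)."""
    query_words = set(re.findall(r"\w+", query.lower()))
    passage_words = set(re.findall(r"\w+", passage.lower()))
    return len(query_words & passage_words) / len(query_words) if query_words else 0.0


def rerank(query, passages, top_n=3, score_fn=overlap_score):
    """Score each initially retrieved passage against the query and keep the top_n."""
    scored = [(score_fn(query, passage), passage) for passage in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest score first
    return scored[:top_n]


# Step 1: passages as they might come back from the initial retrieval
retrieved = [
    "Now our infrastructure is ranked 13th in the world.",
    "I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson.",
    "May God protect our troops.",
]
# Steps 2 and 3: rerank against the query and keep the two best passages
question = "What did the president say about Ketanji Brown Jackson?"
reranked = rerank(question, retrieved, top_n=2)
for score, passage in reranked:
    print(f"{score:.2f}  {passage}")
```

A real reranker replaces `overlap_score` with a model that reads the query and passage together, which lets it capture relevance that word overlap or embedding distance alone can miss.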

## Visualize the retrieved output after Reranking

Now let’s wrap our base retriever with a `ContextualCompressionRetriever`, using `FlashrankRerank` as the compressor, and then visualize the output. You will see that the document most relevant to Ketanji Brown Jackson is assigned the highest score and appears at the top.

In [16]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import FlashrankRerank

compressor = FlashrankRerank()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

compressed_docs = compression_retriever.invoke(query)
pretty_print_docs(compressed_docs)

INFO:elastic_transport.transport:POST https://58c8f86a2a884bb3860d278e8e0a8e81.us-east-1.aws.found.io:443/new-index-1/_search?_source_includes=metadata,text [status:200 duration:0.011s]


Document 1:

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyeran Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.
Metadata: {'id': 2, 'relevance_score': np.float32(0.99844295), 'source': '/root/aws-generativeai-partner-samples/elastic/samples/media/state_of_the_union.txt'}
---------------------------------------
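
The reranked documents carry a `relevance_score` in their metadata, as the output above shows. Besides reordering, that score can also be used to filter out weak matches before they reach the LLM. The sketch below shows one way to do this; the `Doc` class is a hypothetical stand-in for a Langchain document so that the example runs on its own, and both the 0.5 threshold and the second document's score are arbitrary assumptions that would need tuning on real data.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document with page_content and metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def filter_by_score(docs, min_score=0.5):
    """Keep only documents whose reranker relevance_score is at least min_score."""
    return [
        doc for doc in docs
        if float(doc.metadata.get("relevance_score", 0.0)) >= min_score
    ]


example_docs = [
    Doc("Passage about the Supreme Court nomination.", {"relevance_score": 0.998}),
    Doc("Passage about infrastructure.", {"relevance_score": 0.02}),  # made-up score
]
kept = filter_by_score(example_docs, min_score=0.5)
print([doc.page_content for doc in kept])
```

Filtering trades recall for precision: a threshold that is too high can drop a passage the answer depends on, so it is worth checking the retrieval metrics after changing it.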

## Preparing Evaluation Data with Reranked Retriever

In [17]:
questions = ["What did the president say about Ketanji Brown Jackson?"]

references = ["The president described Ketanji Brown a one of the nation's top legal minds"]

samples = []

for idx, query in enumerate(questions):
    samples.append(
        SingleTurnSample(
            user_input=query,
            retrieved_contexts=[docs.page_content for docs in compression_retriever.invoke(query)],
            response=qa_chain.invoke(query)["result"],
            reference=references[idx]
        )
    )

dataset = EvaluationDataset(samples=samples)

INFO:elastic_transport.transport:POST https://58c8f86a2a884bb3860d278e8e0a8e81.us-east-1.aws.found.io:443/new-index-1/_search?_source_includes=metadata,text [status:200 duration:0.012s]
INFO:elastic_transport.transport:POST https://58c8f86a2a884bb3860d278e8e0a8e81.us-east-1.aws.found.io:443/new-index-1/_search?_source_includes=metadata,text [status:200 duration:0.011s]


## Evaluate RAG after Reranking

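Reranking changes the order and number of the retrieved contexts, so the clearest effect is expected on `context precision`. In the run recorded below, `llm_context_precision_with_reference` rises from 0.333 to 1.0 while `context_recall` stays at 1.0. The answer metrics (`answer_correctness`, `answer_relevancy`, `semantic_similarity`) move only slightly. Note that the `response` in the samples above is still generated by `qa_chain`, which was built on the base retriever, so the answers themselves do not reflect the reranked contexts. To measure the effect on the answers, rebuild the chain with `compression_retriever` as its retriever.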

In [18]:
#You can also choose a different model for evaluation
llm_for_evaluation = LangchainLLMWrapper(ChatBedrockConverse(model_id=modelId, client=bedrock_client))
bedrock_embeddings = LangchainEmbeddingsWrapper(BedrockEmbeddings(model_id="amazon.titan-embed-text-v2:0",client=bedrock_client))

#specify the metrics here
metrics = [
    LLMContextRecall(llm=llm_for_evaluation), 
    LLMContextPrecisionWithReference(llm=llm_for_evaluation),
    AnswerCorrectness(llm=llm_for_evaluation, embeddings=bedrock_embeddings), 
    ResponseRelevancy(llm=llm_for_evaluation, embeddings=bedrock_embeddings),
    Faithfulness(llm=llm_for_evaluation),
    SemanticSimilarity(embeddings=bedrock_embeddings),
    AspectCritic(name="harmfulness", 
         definition="Does the submission cause or have the potential to cause harm to individuals, groups, or society at large?", 
         llm=llm_for_evaluation
        ),
    AspectCritic(name="maliciousness", 
                 definition="Is the submission intended to harm, deceive, or exploit users?", 
                 llm=llm_for_evaluation
                ),
    AspectCritic(name="coherence", 
             definition="Is the submission logical, relevant, and informative along with clear structure?", 
             llm=llm_for_evaluation
            ),
    AspectCritic(name="conciseness", 
         definition="Is the submission brief, direct, and avoids unnecessary wordiness while conveying intended meaning?", 
         llm=llm_for_evaluation
        )
    ]

result = evaluate(
    dataset = dataset, 
    metrics=metrics
)

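df = result.to_pandas()

# Show full cell contents, then display the table with left-aligned cells and headers.
# The Styler must be the last expression in the cell, otherwise the styling is discarded.
pd.options.display.max_colwidth = 8000
df.style.set_properties(**{'text-align': 'left'}).set_table_styles([ dict(selector='th', props=[('text-align', 'left')] ) ])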

Evaluating:   0%|          | 0/10 [00:00<?, ?it/s]

Unnamed: 0,user_input,retrieved_contexts,response,reference,context_recall,llm_context_precision_with_reference,answer_correctness,answer_relevancy,faithfulness,semantic_similarity,harmfulness,maliciousness,coherence,conciseness
0,What did the president say about Ketanji Brown Jackson?,"[Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyeran Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence., Vice President Harris and I ran for office with a new economic vision for America. \n\nInvest in America. Educate Americans. Grow the workforce. Build the economy from the bottom up \nand the middle out, not from the top down. \n\nBecause we know that when the middle class grows, the poor have a ladder up and the wealthy do very well. \n\nAmerica used to have the best roads, bridges, and airports on Earth. \n\nNow our infrastructure is ranked 13th in the world. \n\nWe wont be able to compete for the jobs of the 21st Century if we dont fix that. \n\nThats why it was so important to pass the Bipartisan Infrastructure Lawthe most sweeping investment to rebuild America in history. \n\nThis was a bipartisan effort, and I want to thank the members of both parties who worked to make it happen. \n\nWere done talking about infrastructure weeks. \n\nWere going to have an infrastructure decade., And my report is this: the State of the Union is strongbecause you, the American people, are strong. \n\nWe are stronger today than we were a year ago. \n\nAnd we will be stronger a year from now than we are today. 
\n\nNow is our moment to meet and overcome the challenges of our time. \n\nAnd we will, as one people. \n\nOne America. \n\nThe United States of America. \n\nMay God bless you all. May God protect our troops.]","According to the context provided, the president said that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court. Specifically, he stated:\n\n""One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation's top legal minds, who will continue Justice Breyer's legacy of excellence.""\n\nThe president praised Judge Jackson as one of the nation's top legal minds and said she will continue the legacy of excellence set by the retiring Justice Stephen Breyer.\n\n</response>",The president described Ketanji Brown a one of the nation's top legal minds,1.0,1.0,0.150387,0.400471,1.0,0.601547,0,0,1,1
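
To compare the two runs side by side, the numeric scores from the two result tables in this notebook can be placed in dictionaries and subtracted. The sketch below hard-codes the values recorded above; with live results you would read them from the two DataFrames instead. The variable names are our own.

```python
# Scores copied from the recorded outputs above (before and after reranking)
before = {
    "context_recall": 1.0,
    "llm_context_precision_with_reference": 0.333333,
    "answer_correctness": 0.140029,
    "answer_relevancy": 0.429117,
    "faithfulness": 1.0,
    "semantic_similarity": 0.560116,
}
after = {
    "context_recall": 1.0,
    "llm_context_precision_with_reference": 1.0,
    "answer_correctness": 0.150387,
    "answer_relevancy": 0.400471,
    "faithfulness": 1.0,
    "semantic_similarity": 0.601547,
}

# Positive delta means the metric improved after reranking
deltas = {name: after[name] - before[name] for name in before}
for name, delta in deltas.items():
    print(f"{name:40s} {before[name]:.6f} -> {after[name]:.6f} ({delta:+.6f})")
```

Keep in mind that this is a single question evaluated once with an LLM judge, so small differences in the answer metrics are within run-to-run noise; a larger question set gives a more reliable picture.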


## Delete Elasticsearch Index

Delete the Elasticsearch index to clean up the resources created by this notebook.

In [19]:
es = Elasticsearch(cloud_id=cloud_id, api_key=cloud_api_key)
es.options(ignore_status=[400,404]).indices.delete(index=index_name)

INFO:elastic_transport.transport:DELETE https://58c8f86a2a884bb3860d278e8e0a8e81.us-east-1.aws.found.io:443/new-index-1 [status:200 duration:0.073s]


ObjectApiResponse({'acknowledged': True})

## Conclusion
You have now experimented with reranking to improve the output of a RAG Application.

### Takeaways
- Adapt this notebook to experiment with different Claude 3 models available through Amazon Bedrock.
- Change the prompts to suit your specific use case and evaluate the output of different models.
- Play with the token length to understand the latency and responsiveness of the service.
- Apply different prompt engineering principles to get better outputs.

## Thank You