# Building RAG-based radiology report summarization using Knowledge Bases for Amazon Bedrock - RetrieveAndGenerate API


With knowledge bases, you can securely connect foundation models (FMs) in Amazon Bedrock to your company data for Retrieval Augmented Generation (RAG). Access to additional data helps the model generate more relevant, context-specific, and accurate responses without continuously retraining the FM. All information retrieved from knowledge bases comes with source attribution to improve transparency and minimize hallucinations. For more information on creating a knowledge base using the console, please refer to this post.

In this notebook, we build radiology report summarization using the RetrieveAndGenerate API provided by Knowledge Bases for Amazon Bedrock. The API queries the knowledge base to retrieve the desired number of document chunks by similarity search, then passes them to a large language model (LLM) to generate the answer.

Pattern
We implement the solution using the Retrieval Augmented Generation (RAG) pattern. RAG retrieves data from outside the language model (non-parametric) and augments the prompt by adding the relevant retrieved data as context. Here, we perform RAG on the knowledge base created in the previous notebook or through the console.
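
The retrieve-then-augment idea can be sketched locally without any AWS calls. The example below is only an illustration: the `embed`, `cosine`, `retrieve`, and `augment` helpers are hypothetical names, and the bag-of-words "embedding" is a stand-in for the real embedding model that the knowledge base uses.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def augment(query, retrieved):
    """Build a prompt that places the retrieved text next to the query."""
    numbered = "\n".join(f"{i}. {doc}" for i, doc in enumerate(retrieved, start=1))
    return f"Findings: {query}\n\nSearch results:\n{numbered}"


documents = [
    "Stable bilateral pleural effusion with atelectasis.",
    "No acute fracture of the left wrist.",
    "Small right pleural effusion, no pneumothorax.",
]
top = retrieve("pleural effusion with atelectasis", documents, k=2)
prompt = augment("pleural effusion with atelectasis", top)
print(prompt)
```

In the actual notebook, both steps are handled by the managed service; the sketch only shows where the retrieved text ends up relative to the query.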

Pre-requisite
The sample reports must be processed and stored in a knowledge base.

Load the documents into the knowledge base by connecting your S3 bucket (data source).
The knowledge base splits them into smaller chunks (based on the chunking strategy selected), generates embeddings, and stores them in the associated vector store.
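
To make the chunking step concrete, here is a minimal sketch of fixed-size chunking with overlap. It is an assumption-laden illustration: `chunk_text` is a hypothetical helper that splits on whitespace-separated words, whereas the knowledge base performs its own chunking according to the strategy you select.

```python
def chunk_text(text, chunk_size=20, overlap=5):
    """Split text into word-based chunks; neighbouring chunks share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_text(sample, chunk_size=4, overlap=1):
    print(chunk)
```

The overlap keeps a sentence that straddles a chunk boundary partly visible in both chunks, which can help retrieval at the cost of storing some text twice.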

Notebook Walkthrough
In this notebook we use the RetrieveAndGenerate API provided by Knowledge Bases for Amazon Bedrock, which converts the user query into embeddings, searches the knowledge base, retrieves the relevant results, augments the custom prompt with them, and then invokes an LLM to generate the response.



Make sure the right SDK versions are used
⚠ For this lab, run the notebook on a Python 3.10 runtime with Boto3 version 1.34.79 or later ⚠

Setup
Install the following packages.


In [None]:
%pip install --no-build-isolation --force-reinstall \
    "boto3>=1.34.79" \
    "awscli>=1.29.57" \
    "botocore>=1.31.57"

In [None]:
import boto3
import pprint
from botocore.client import Config

pp = pprint.PrettyPrinter(indent=2)

bedrock_config = Config(connect_timeout=120, read_timeout=120, retries={'max_attempts': 0})
bedrock_client = boto3.client('bedrock-runtime')
bedrock_agent_client = boto3.client("bedrock-agent-runtime",
                              config=bedrock_config)
boto3_session = boto3.session.Session()
region_name = boto3_session.region_name

kb_id = "XXXX" # replace with your Knowledge Base ID
model_id = "anthropic.claude-3-sonnet-20240229-v1:0"
region_id = region_name # replace with the region where you are running the SageMaker notebook


In [32]:
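def retrieveAndGenerate(input, kbId, sessionId=None, model_id = "anthropic.claude-3-sonnet-20240229-v1:0", region_id = "us-east-1"):
    # Build the foundation model ARN expected by the RetrieveAndGenerate API
    model_arn = f'arn:aws:bedrock:{region_id}::foundation-model/{model_id}'
    # $query$ and $search_results$ are placeholders filled in by the service, so a plain string is enough
    promptTemplate = """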
    You have to generate radiology report impressions based on the following findings. Your job is to generate impression using only information from the search results.
    Return only a single sentence and do not return the findings given.
   
    Findings: $query$
                          
    Here are the search results in numbered order:
    $search_results$ """
    
  
    return bedrock_agent_client.retrieve_and_generate(
        input={
            'text': input
        },
        retrieveAndGenerateConfiguration={
            'knowledgeBaseConfiguration': {
                'generationConfiguration': {
                    'promptTemplate': {
                    'textPromptTemplate': promptTemplate
                    }
                },
                'knowledgeBaseId': kbId,
                'modelArn': model_arn,
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'numberOfResults': 3,
                        'overrideSearchType': 'HYBRID'
                        }
                }
               
            },
            'type': 'KNOWLEDGE_BASE'
            
        },
    )
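
To preview roughly what the model receives, the two placeholders in the prompt template can be substituted locally. This is only an approximation under stated assumptions: the real substitution happens inside the service, and `preview_prompt` and the numbering format of the search results are hypothetical.

```python
def preview_prompt(template, query, search_results):
    """Approximate the final prompt by filling the $query$ and $search_results$ placeholders."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(search_results, start=1))
    return template.replace("$query$", query).replace("$search_results$", numbered)


template = "Findings: $query$\n\nHere are the search results in numbered order:\n$search_results$"
print(preview_prompt(
    template,
    "Stable bilateral pleural effusion.",
    ["Impression: unchanged effusion.", "Impression: no pneumothorax."],
))
```

Printing the preview is a quick way to check that the instructions, the findings, and the retrieved passages appear in the intended order before spending API calls.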
    

In [None]:
query = "Stability of the severe bilateral pleural effusion with compressive atelectasis. There is no visible pneumothorax. The tracheostomy and left-sided subclavian line is unchanged. The mediastinal and cardiac contour are stable. The nasogastric tube and feeding tube has been removed since the previous exam."
response = retrieveAndGenerate(query, kb_id, model_id=model_id, region_id=region_id)  # pass model_id by keyword so it is not bound to sessionId
generated_text = response['output']['text']
pp.pprint(generated_text)

In [None]:
citations = response["citations"]
contexts = []
for citation in citations:
    retrievedReferences = citation["retrievedReferences"]
    for reference in retrievedReferences:
        contexts.append(reference["content"]["text"])

pp.pprint(contexts)
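
Since source attribution is one of the reasons to use a knowledge base, it can be useful to keep the source location next to each retrieved passage. The sketch below assumes each retrieved reference carries an S3 location under `location.s3Location.uri`; `extract_sources` is a hypothetical helper, and the `mock_response` with its bucket name is invented test data rather than real API output.

```python
def extract_sources(response):
    """Collect the text and source URI of every retrieved reference in a response."""
    sources = []
    for citation in response.get("citations", []):
        for reference in citation.get("retrievedReferences", []):
            text = reference.get("content", {}).get("text", "")
            # The URI is None when the reference has no S3 location
            uri = reference.get("location", {}).get("s3Location", {}).get("uri")
            sources.append({"text": text, "uri": uri})
    return sources


# Invented data that mimics the shape of the response, for illustration only
mock_response = {
    "citations": [
        {
            "retrievedReferences": [
                {
                    "content": {"text": "Stable bilateral pleural effusion."},
                    "location": {"type": "S3", "s3Location": {"uri": "s3://example-bucket/report1.txt"}},
                },
                {"content": {"text": "No pneumothorax."}},
            ]
        }
    ]
}

for source in extract_sources(mock_response):
    print(source["uri"], "->", source["text"])
```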

### Model Evaluation
Dev 1

In [None]:
import pandas as pd
dev1 = pd.read_csv('dev1.csv')
dev1.head()

In [None]:
query_list_dev1 = dev1.iloc[:,2].to_list()
len(query_list_dev1)

In [36]:
from botocore.exceptions import ClientError

def generate_reports(query_list):
    results = []
    for query in query_list:
        generated_text = ""  # fall back to an empty impression if the call fails
        try:
            response = retrieveAndGenerate(query, kb_id, model_id=model_id, region_id=region_id)
            generated_text = response['output']['text']
        except ClientError as e:
            print(f'Error generating impression: {e}')
        results.append(generated_text)
    return results
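
The client configuration earlier in the notebook sets `max_attempts` to 0, which appears to disable automatic retries, so a throttled call ends up as an empty impression in the results. One possible mitigation is a small retry wrapper with exponential backoff. This is a sketch, not part of the original notebook: `call_with_retries` and its parameters are hypothetical, and the delays are arbitrary.

```python
import time


def call_with_retries(fn, max_attempts=3, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on a listed exception, wait base_delay * 2**(attempt - 1) seconds and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            # Re-raise once the final attempt has failed
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Inside `generate_reports`, the API call could then be wrapped as `call_with_retries(lambda: retrieveAndGenerate(query, kb_id, model_id=model_id, region_id=region_id), retry_on=(ClientError,))`, leaving the existing `except ClientError` branch to handle the case where every attempt fails.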


In [None]:
result_list_dev1 = generate_reports(query_list_dev1)

In [39]:
dev1['rag_claude3_impressions'] = result_list_dev1

In [None]:
dev1['rag_claude3_impressions'] = dev1['rag_claude3_impressions'].str.replace('Impression:', '')
dev1

In [None]:
%pip install evaluate rouge_score

In [42]:
dev1.to_csv("RAG_results/dev1_3rag_bedrock_kb.csv", index = False)

In [43]:
import pandas as pd
import matplotlib.pyplot as plt
dev1 = pd.read_csv("RAG_results/dev1_3rag_bedrock_kb.csv")

In [None]:
import evaluate

# Load the ROUGE metric from the Hugging Face evaluate library
rouge_score = evaluate.load("rouge")
# Aggregate ROUGE scores over the whole Dev1 set
result_RAGClaude3_dev1 = rouge_score.compute(
    predictions=list(dev1['rag_claude3_impressions']),
    references=list(dev1["impression"]),
    use_aggregator=True,
)
print("ROUGE Score for RAG Implementation with Claude 3 Model on Dev1 Set:")
print(result_RAGClaude3_dev1)
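
For intuition about what the reported numbers measure, ROUGE-1 F1 can be computed by hand as unigram overlap between a generated and a reference impression. The sketch below is a simplified illustration: `rouge1_f1` is a hypothetical helper that uses plain whitespace tokenization, so its values may differ slightly from those of the `evaluate` library, which applies its own tokenization.

```python
from collections import Counter


def rouge1_f1(prediction, reference):
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both texts
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("stable pleural effusion", "stable bilateral pleural effusion")
print(round(score, 3))
```

Here all three predicted words appear in the reference (precision 1.0) while one reference word is missed (recall 0.75), so a short but accurate impression is still penalized for omitted content.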

In [None]:
results_RAGClaude3_dev1_all = rouge_score.compute(predictions=list(dev1['rag_claude3_impressions']), references=list(dev1["impression"]), use_aggregator=False)
results_RAGClaude3_dev1_all_df = pd.DataFrame(results_RAGClaude3_dev1_all)
results_RAGClaude3_dev1_all_df.plot(kind='box', color = 'red')

plt.savefig('RAG_dev1_KB.png', bbox_inches='tight', dpi = 300)

In [None]:
results_RAGClaude3_dev1_all_df.describe()

### Model Evaluation
Dev 2

In [None]:
dev2 = pd.read_csv('dev2.csv')
dev2.head()

In [None]:
query_list_dev2 = dev2.iloc[:,2].to_list()
len(query_list_dev2)
result_list_dev2 = generate_reports(query_list_dev2)

In [None]:
dev2['rag_claude3_impressions'] = result_list_dev2
dev2['rag_claude3_impressions'] = dev2['rag_claude3_impressions'].str.replace('Impression:', '')
dev2

In [52]:
dev2.to_csv("RAG_results/dev2_3rag_bedrock_kb.csv", index = False)

In [53]:
import pandas as pd
import matplotlib.pyplot as plt
dev2 = pd.read_csv("RAG_results/dev2_3rag_bedrock_kb.csv")

In [None]:
import evaluate

# Load the ROUGE metric from the Hugging Face evaluate library
rouge_score = evaluate.load("rouge")
# Aggregate ROUGE scores over the whole Dev2 set (references must come from dev2, not dev1)
result_RAGClaude3_dev2 = rouge_score.compute(
    predictions=list(dev2['rag_claude3_impressions']),
    references=list(dev2["impression"]),
    use_aggregator=True,
)
print("ROUGE Score for RAG Implementation with Claude 3 Model on Dev2 Set:")
print(result_RAGClaude3_dev2)

In [None]:
results_RAGClaude3_dev2_all = rouge_score.compute(predictions=list(dev2['rag_claude3_impressions']), references=list(dev2["impression"]), use_aggregator=False)
results_RAGClaude3_dev2_all_df = pd.DataFrame(results_RAGClaude3_dev2_all)
results_RAGClaude3_dev2_all_df.plot(kind='box', color = 'red')

plt.savefig('RAG_dev2_KB.png', bbox_inches='tight', dpi = 300)

In [None]:
results_RAGClaude3_dev2_all_df.describe()