## Custom JSON Processing with Transformation Functions in Amazon Bedrock Knowledge Bases

In modern RAG applications, the ability to effectively process and transform data before it reaches your foundation models is crucial for good performance. While standard JSON processing works for many use cases, complex enterprise applications often need finer control over how their data is structured and presented.

Just as query reformulation breaks down complex queries for better retrieval, transformation functions let you reshape and refine your JSON data to better serve your specific use case. This capability is particularly valuable when you work with varied data sources or need to standardize information across different formats. By customizing how your JSON data is processed, you can improve the quality of your RAG application's responses while maintaining efficiency and scalability.

This example explores how to use transformation functions in Amazon Bedrock Knowledge Bases to optimize your JSON processing pipeline and get more precise and relevant results from your generative AI applications.

### Dataset

This example processes JSON files from the FootballData dataset for the UEFA European Championship 2016.

We don't use the entire dataset; we filtered it down to a couple of teams for this example.

The dataset is available in [this repository](https://github.com/jokecamp/FootballData/tree/master/UEFA_European_Championship/Euro%202016/players_json) under the MIT license.

### 1. Import the needed libraries

The first step is to install the prerequisite packages.

In [None]:
%pip install --upgrade pip --quiet
%pip install -r ../requirements.txt --no-deps --quiet
%pip install -r ../requirements.txt --upgrade --quiet

In [None]:
# Uncomment to restart kernel

#import IPython

#IPython.Application.instance().kernel.do_shutdown(True)

In [None]:
import sys
import boto3
import json
from datetime import datetime


sys.path.insert(0, ".")
sys.path.insert(1, "..")


from utils.knowledge_base import BedrockKnowledgeBase

The following clients and variables are used throughout this example:

In [None]:
#Clients
s3_client = boto3.client('s3')
sts_client = boto3.client('sts')
session = boto3.session.Session()
region = session.region_name
account_id = sts_client.get_caller_identity()["Account"]
bedrock_client = boto3.client('bedrock') 
bedrock_agent_client = boto3.client('bedrock-agent')
bedrock_agent_runtime_client = boto3.client('bedrock-agent-runtime')
iam_client = boto3.client('iam')


region, account_id

In [None]:
import time

# Get the current timestamp
current_time = time.time()

# Format the timestamp as a string
timestamp_str = time.strftime("%Y%m%d%H%M%S", time.localtime(current_time))[-7:]
# Create the suffix using the timestamp
suffix = f"{timestamp_str}"

knowledge_base_name_custom = 'custom-chunking-kb'
knowledge_base_description = "Knowledge Base containing complex Json"
bucket_name = f'{knowledge_base_name_custom}-{suffix}'
intermediate_bucket_name = f'{knowledge_base_name_custom}-intermediate-{suffix}'
lambda_function_name = f'{knowledge_base_name_custom}-lambda-{suffix}'
foundation_model = "anthropic.claude-3-sonnet-20240229-v1:0"

# Define data sources
data_source=[{"type": "S3", "bucket_name": bucket_name}]

### 2 - Create Lambda Function

The following custom Lambda function acts as a transformation function: it parses the JSON elements in each input file and splits them into separate documents before they are ingested into the vector database.


In [None]:
%%writefile lambda_function.py
import json
import logging

import boto3


logger = logging.getLogger()
logger.setLevel(logging.INFO)


def read_s3_file(s3_client, bucket, key):
    """Read an intermediate file written by Bedrock and parse it as JSON."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return json.loads(response['Body'].read().decode('utf-8'))


def write_to_s3(s3_client, bucket, key, content):
    """Serialize `content` as JSON and write it to s3://bucket/key."""
    s3_client.put_object(Bucket=bucket, Key=key, Body=json.dumps(content))


def lambda_handler(event, context):
    logger.info('input=%s', json.dumps(event))
    s3 = boto3.client('s3')

    # Extract the batch files and the intermediate bucket from the input event
    input_files = event.get('inputFiles')
    input_bucket = event.get('bucketName')

    if not all([input_files, input_bucket]):
        raise ValueError("Missing required input parameters")

    output_files = []

    for input_file in input_files:
        logger.info('input file=%s', input_file)
        content_batches = input_file.get('contentBatches', [])
        original_file_location = input_file.get('originalFileLocation', {})

        processed_batches = []

        for batch in content_batches:
            input_key = batch.get('key')
            if not input_key:
                raise ValueError("Missing key in content batch")

            file_content = read_s3_file(s3, input_bucket, input_key)

            # The original JSON document is embedded as a string in the first content entry
            file_key = "sheets"
            json_content = json.loads(file_content['fileContents'][0]['contentBody'])

            # Each file holds either a list of players or a list of teams
            if 'Players' in json_content[file_key]:
                sec_key = 'Players'
            elif 'Teams' in json_content[file_key]:
                sec_key = 'Teams'
            else:
                raise ValueError("Neither 'Players' nor 'Teams' found under 'sheets'")

            # Derive a short source name from the batch key (same for every record in it)
            filename_key = input_key.split('/')[-1].replace('-players_1.JSON', '')

            # Emit one output file (and therefore one chunk) per JSON element
            for record in json_content[file_key][sec_key]:
                record['filename'] = filename_key
                id_key = 'name' if 'name' in record else 'Team'
                output_key = "output/{}_{}.json".format(filename_key, record[id_key])
                logger.info('writing %s', output_key)

                processed_content = {
                    'fileContents': [{
                        'contentType': 'json',
                        'contentBody': json.dumps(record)
                    }]
                }

                # Write processed content back to the intermediate bucket
                write_to_s3(s3, input_bucket, output_key, processed_content)

                # Tell Bedrock where to find the processed batch
                processed_batches.append({'key': output_key})

        output_files.append({
            'originalFileLocation': original_file_location,
            'contentBatches': processed_batches
        })

    return {'outputFiles': output_files}
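To exercise the splitting logic without deploying the function or touching S3, the sketch below pulls the per-record loop of `lambda_handler` out into a pure function. The name `split_player_records` and the sample records (including the `position` field) are hypothetical, made up for illustration; the real dataset files may carry different fields.

```python
import json


def split_player_records(json_content, input_key):
    """Split one parsed sheet into one output document per JSON element.

    Mirrors the per-record logic of lambda_handler, minus the S3 I/O.
    Returns a list of (output_key, processed_content) tuples.
    """
    sheet = json_content["sheets"]
    if "Players" in sheet:
        sec_key = "Players"
    elif "Teams" in sheet:
        sec_key = "Teams"
    else:
        raise ValueError("Neither 'Players' nor 'Teams' found under 'sheets'")

    filename_key = input_key.split("/")[-1].replace("-players_1.JSON", "")
    outputs = []
    for record in sheet[sec_key]:
        # Copy the record instead of mutating the caller's data
        record = {**record, "filename": filename_key}
        id_key = "name" if "name" in record else "Team"
        output_key = "output/{}_{}.json".format(filename_key, record[id_key])
        processed_content = {
            "fileContents": [{"contentType": "json", "contentBody": json.dumps(record)}]
        }
        outputs.append((output_key, processed_content))
    return outputs


# Hypothetical input shaped like the structure the Lambda expects
sample = {"sheets": {"Players": [{"name": "Casillas", "position": "Goalkeeper"},
                                 {"name": "Silva", "position": "Midfielder"}]}}
for key, _ in split_player_records(sample, "some/prefix/spain-players_1.JSON"):
    print(key)
```

Because it has no AWS dependencies, a function like this can be unit tested locally before you redeploy the Lambda.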

### 3 - Create Knowledge Base with custom chunking strategy

Let's start by creating an Amazon Bedrock knowledge base to store a couple of datasets (from the [dataset/](../dataset/) folder):

- `<team>-players.json`: data about a specific national team. For example, `italy-players.json` contains information about the Italy squad at Euro 2016.

**As mentioned in the [Dataset](#Dataset) section, this is an open dataset under the MIT license.**

Knowledge Bases can integrate with different vector databases, including Amazon OpenSearch Serverless, Amazon Aurora, Pinecone, Redis Enterprise and MongoDB Atlas. For this example, we integrate the knowledge base with Amazon OpenSearch Serverless. To do so, we use the helper class `BedrockKnowledgeBase`, which creates the knowledge base and all of its prerequisites:

1. IAM roles and policies
1. S3 bucket
1. Amazon OpenSearch Serverless encryption, network and data access policies
1. Amazon OpenSearch Serverless collection
1. Amazon OpenSearch Serverless vector index
1. Knowledge base
1. Knowledge base data source
1. `CUSTOM` chunking strategy for the knowledge base
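For reference, a custom transformation comes down to a data source ingestion configuration that points Bedrock at the transformation Lambda and an intermediate bucket. The sketch below builds such a `vectorIngestionConfiguration` dict for the `bedrock-agent` `create_data_source` call. It is an assumption about what `BedrockKnowledgeBase` sets up internally when `chunking_strategy="CUSTOM"`, so check the helper's source before relying on it; `build_custom_transformation_config` is a hypothetical name, and the ARN and bucket in the usage are placeholders.

```python
def build_custom_transformation_config(lambda_arn, intermediate_bucket_name):
    """Return a vectorIngestionConfiguration dict for a custom Lambda transformation.

    Assumption: this approximates what the BedrockKnowledgeBase helper passes to
    bedrock-agent create_data_source for the CUSTOM chunking strategy.
    """
    return {
        # No built-in chunking: the Lambda decides how documents are split
        "chunkingConfiguration": {"chunkingStrategy": "NONE"},
        "customTransformationConfiguration": {
            # Bedrock writes intermediate files here; the Lambda reads and writes them
            "intermediateStorage": {
                "s3Location": {"uri": f"s3://{intermediate_bucket_name}/"}
            },
            "transformations": [
                {
                    "stepToApply": "POST_CHUNKING",
                    "transformationFunction": {
                        "transformationLambdaConfiguration": {"lambdaArn": lambda_arn}
                    },
                }
            ],
        },
    }


# Placeholder ARN and bucket name, for illustration only
config = build_custom_transformation_config(
    "arn:aws:lambda:us-east-1:123456789012:function:example", "my-intermediate-bucket"
)
print(config["customTransformationConfiguration"]["transformations"][0]["stepToApply"])
```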

In [None]:
knowledge_base_custom = BedrockKnowledgeBase(
    kb_name=f'{knowledge_base_name_custom}-{suffix}',
    kb_description=knowledge_base_description,
    data_sources=data_source,
    lambda_function_name=lambda_function_name,
    intermediate_bucket_name=intermediate_bucket_name,
    chunking_strategy="CUSTOM",
    suffix=f'{suffix}-c',
)

### 4 - Upload datasets to S3 and start ingestion job

After the knowledge base is created, let's upload the dataset files to the S3 bucket.

In [None]:
import glob

file_list = glob.glob('../dataset/*.json') 
file_list

In [None]:
for f in file_list:
    s3_client.upload_file(f, bucket_name, f.split('/')[-1])

Now, let's start the ingestion job to process those files.

To check the processing logs, open the Lambda function attached to your knowledge base in the AWS Lambda console, go to the **Monitor** tab, and follow the link to CloudWatch Logs.
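If you prefer to read the logs from the notebook, a small wrapper around the CloudWatch Logs `filter_log_events` call can do it. This sketch assumes the default Lambda log group name `/aws/lambda/<function-name>` and that the helper deployed the function under `lambda_function_name`; `lambda_log_messages` is a hypothetical name.

```python
import time


def lambda_log_messages(logs_client, function_name, lookback_minutes=30, limit=50):
    """Return up to `limit` log messages from the last `lookback_minutes` minutes.

    `logs_client` is expected to be boto3.client("logs"). Only the first page of
    results is read; filter_log_events is paginated via nextToken.
    """
    start_ms = int((time.time() - lookback_minutes * 60) * 1000)  # epoch milliseconds
    response = logs_client.filter_log_events(
        logGroupName=f"/aws/lambda/{function_name}",
        startTime=start_ms,
        limit=limit,
    )
    return [event["message"] for event in response.get("events", [])]
```

Once the ingestion job has run, you could call it as `lambda_log_messages(boto3.client('logs'), lambda_function_name)` and print the returned lines.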

In [None]:
# ensure that the kb is available
time.sleep(30)
# sync knowledge base
knowledge_base_custom.start_ingestion_job()

### 5 - Test Knowledge Base

Now that the knowledge base is available, we can test it using the [retrieve](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-agent-runtime/client/retrieve.html) and [retrieve_and_generate](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-agent-runtime/client/retrieve_and_generate.html) functions.

First, let's retrieve the knowledge base ID and store it:

In [None]:
kb_id_custom = knowledge_base_custom.get_knowledge_base_id()

#### 5.1 Testing Knowledge Base with Retrieve and Generate API

Now, let's start with a simple question about Spain's main goalkeeper.

The answer should come from the Spain players' data in our dataset; the expected answer is Casillas.

In [None]:
query = "Who is Spain's main goalkeeper?"
# Expected: Casillas

In [None]:
response = bedrock_agent_runtime_client.retrieve_and_generate(
    input={
        "text": query
    },
    retrieveAndGenerateConfiguration={        
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            'knowledgeBaseId': kb_id_custom,
            "modelArn": "arn:aws:bedrock:{}::foundation-model/{}".format(region, foundation_model),
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults": 10
                } 
            }
        }
    }
)

print(response['output']['text'],end='\n'*2)

Now let's ask another question, this time about which Spain midfielder has scored the most goals. We then look at both APIs (RetrieveAndGenerate and Retrieve) to see the data returned by the knowledge base and how the model uses it in its answer.

In [None]:
query = "Which Spain midfielder has scored the most goals?"
# Expected: David Silva

In [None]:
response = bedrock_agent_runtime_client.retrieve_and_generate(
    input={
        "text": query
    },
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            'knowledgeBaseId': kb_id_custom,
            "modelArn": "arn:aws:bedrock:{}::foundation-model/{}".format(region, foundation_model),
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults": 20
                } 
            }
        }
    }
)

print(response['output']['text'],end='\n'*2)

As you can see, with the retrieve and generate API, we get the final response directly. Now let's observe the citations for the RetrieveAndGenerate API.

Our primary focus in this notebook is observing the retrieved chunks and the citations the model returns while generating the response. When we provide relevant context to the foundation model along with the query, it is more likely to generate a high-quality response.
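Indexing `response['citations'][0]` only covers the first cited span of the answer. To see every source the model drew on, you can walk all citations instead. The sketch below (`cited_sources` is a hypothetical helper) collects the unique S3 URIs from the `location.s3Location.uri` field of each retrieved reference, keeping first-seen order.

```python
def cited_sources(rag_response):
    """Collect the unique S3 URIs cited anywhere in a retrieve_and_generate response."""
    uris = []
    for citation in rag_response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in uris:
                uris.append(uri)
    return uris
```

After a `retrieve_and_generate` call, `cited_sources(response)` returns the list of distinct source files behind the answer.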

In [None]:
def citations_rag_print(response_ret):
    # response_ret: list of retrievedReferences; each has content, location and metadata
    for num, chunk in enumerate(response_ret, 1):
        print(f'Chunk {num}: ', chunk['content']['text'], end='\n'*2)
        print(f'Chunk {num} Location: ', chunk['location'], end='\n'*2)
        print(f'Chunk {num} Metadata: ', chunk['metadata'], end='\n'*2)

In [None]:
response_custom = response['citations'][0]['retrievedReferences']
print("# of citations or chunks used to generate the response: ", len(response_custom))
citations_rag_print(response_custom)

#### 5.2 Testing Knowledge Base with Retrieve API

If you need an extra layer of control, you can retrieve the chunks that best match your query using the Retrieve API. In this setup, you can configure the desired number of results and build the final answer with your own application logic. The API returns the matching content, its S3 location, the similarity score and the chunk metadata.
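As one example of such application logic, the hypothetical helper below keeps only the chunks whose score clears a threshold, orders them best first, and joins them into a single context string you could pass to a model yourself. The threshold is an assumption; suitable values depend on the embedding model and vector store.

```python
def build_context(retrieval_results, min_score=0.0, max_chunks=5):
    """Join the best-scoring chunks at or above `min_score` into one context string."""
    kept = [r for r in retrieval_results if r.get("score", 0.0) >= min_score]
    kept.sort(key=lambda r: r.get("score", 0.0), reverse=True)
    return "\n\n".join(r["content"]["text"] for r in kept[:max_chunks])
```

With the results of the Retrieve call below, this could be used as `build_context(response_custom_ret['retrievalResults'], min_score=0.5)`.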

In [None]:
def response_print(response_ret):
    # structure 'retrievalResults': list of contents. Each list has content, location, score, metadata
    for num, chunk in enumerate(response_ret['retrievalResults'], 1):
        print(f'Chunk {num}: ', chunk['content']['text'], end='\n'*2)
        print(f'Chunk {num} Location: ', chunk['location'], end='\n'*2)
        print(f'Chunk {num} Score: ', chunk['score'], end='\n'*2)
        print(f'Chunk {num} Metadata: ', chunk['metadata'], end='\n'*2)

In [None]:
# nextToken is only needed to fetch a further page of results, so it is omitted here
response_custom_ret = bedrock_agent_runtime_client.retrieve(
    knowledgeBaseId=kb_id_custom,
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": 10,
        }
    },
    retrievalQuery={
        'text': query
    }
)
print("# of retrieved chunks: ", len(response_custom_ret['retrievalResults']))
response_print(response_custom_ret)

With CUSTOM chunking, we get up to 10 results, as requested through `numberOfResults`, ranked by semantic similarity.

Each of these references is stored as a separate record in the vector database, one per JSON element, even though several of them come from the same source file. Because every chunk holds one complete record instead of an arbitrary slice of the file, the model receives coherent context, which helps it return better responses.
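To make that difference concrete, the sketch below compares naive fixed-size chunking with one chunk per record on a tiny, made-up file in the same `sheets`/`Players` shape the Lambda expects. It is a simplification: Bedrock's fixed-size chunking counts tokens and can add overlap, whereas this splits on characters. The effect is the same in kind, though: fixed-size chunks can cut a record in half.

```python
import json

# Hypothetical records in the same "sheets" / "Players" shape the Lambda expects
players = [
    {"name": "Casillas", "position": "Goalkeeper"},
    {"name": "Silva", "position": "Midfielder"},
]
file_text = json.dumps({"sheets": {"Players": players}})


def fixed_size_chunks(text, size):
    """Naive fixed-size chunking by character count, with no overlap."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def per_record_chunks(records):
    """One chunk per JSON element, as the transformation Lambda produces."""
    return [json.dumps(record) for record in records]


def is_valid_json(text):
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


fixed = fixed_size_chunks(file_text, 60)
records = per_record_chunks(players)
print(sum(is_valid_json(c) for c in fixed), "of", len(fixed), "fixed-size chunks parse as JSON")
print(sum(is_valid_json(c) for c in records), "of", len(records), "per-record chunks parse as JSON")
```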

### 6 - Create Knowledge Base without custom chunking

In this optional step, you create another knowledge base that uses a fixed-size chunking strategy.

It serves as the baseline to compare against the knowledge base created earlier with custom JSON chunking.
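For comparison with the custom configuration, fixed-size chunking is expressed through `chunkingConfiguration` alone, with no transformation Lambda. The sketch below builds that dict; the 300-token and 20% overlap defaults are illustrative assumptions, not necessarily what `BedrockKnowledgeBase` uses when `chunking_strategy` is omitted, and `fixed_size_chunking_config` is a hypothetical name.

```python
def fixed_size_chunking_config(max_tokens=300, overlap_percentage=20):
    """Return a vectorIngestionConfiguration dict for built-in fixed-size chunking."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    # The API restricts overlapPercentage to the range 1-99
    if not 1 <= overlap_percentage <= 99:
        raise ValueError("overlap_percentage must be between 1 and 99")
    return {
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {
                "maxTokens": max_tokens,
                "overlapPercentage": overlap_percentage,
            },
        }
    }
```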

In [None]:
# Get the current timestamp
current_time = time.time()

# Format the timestamp as a string
timestamp_str = time.strftime("%Y%m%d%H%M%S", time.localtime(current_time))[-7:]
# Create the suffix using the timestamp
suffix = f"{timestamp_str}"

In [None]:
knowledge_base_name_regular = 'fixed-chunk'

knowledge_base_regular = BedrockKnowledgeBase(
    kb_name=f'{knowledge_base_name_regular}-{suffix}',
    #kb_description=knowledge_base_description,
    data_sources=data_source,
    suffix=f'{suffix}-c',
    #chunking_strategy="FIXED"
)

In [None]:
# ensure that the kb is available
time.sleep(30)
# sync knowledge base
knowledge_base_regular.start_ingestion_job()

In [None]:
kb_id_fixed = knowledge_base_regular.get_knowledge_base_id()

Run a quick test to check that the fixed-size chunking knowledge base is working:

In [None]:
query = "Which Spain midfielder has scored the most goals?"
# Expected: David Silva

In [None]:
response = bedrock_agent_runtime_client.retrieve_and_generate(
    input={
        "text": query
    },
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            'knowledgeBaseId': kb_id_fixed,
            "modelArn": "arn:aws:bedrock:{}::foundation-model/{}".format(region, foundation_model),
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults": 20
                } 
            }
        }
    }
)

print(response['output']['text'],end='\n'*2)

### 7 - Evaluating RAG Performance

Now, we use the Amazon Bedrock evaluation feature for RAG to measure how accurate our knowledge base is and compare it with the fixed-size chunking knowledge base.

The [ground truth file](ground_truth.jsonl) contains 15 examples, each with a sample answer based on our dataset.
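Each line of the file is one JSON record. The sketch below shows how a single prompt and reference answer can be serialized in the conversation-turn format that Bedrock RAG evaluation jobs accept for retrieve-and-generate datasets; compare it with `ground_truth.jsonl` before generating your own records. The helper name `ground_truth_line` is hypothetical, and the example question is the one used earlier in this notebook.

```python
import json


def ground_truth_line(prompt, reference_answer):
    """Serialize one prompt/reference pair as a single JSONL line for RAG evaluation."""
    record = {
        "conversationTurns": [
            {
                "prompt": {"content": [{"text": prompt}]},
                "referenceResponses": [{"content": [{"text": reference_answer}]}],
            }
        ]
    }
    # json.dumps escapes embedded newlines, so the record stays on one line
    return json.dumps(record)


print(ground_truth_line("Who is Spain's main goalkeeper?", "Casillas"))
```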

Create a bucket to store the evaluation dataset and the evaluation outputs:

In [None]:
eval_bucket_name = f'eval-rag-bucket-{suffix}'
eval_role_name = f'Amazon-Bedrock-Eval-Role-{suffix}'

kb_retrieve_policy_name = f'Bedrock-Eval-Policy-Retrieve-{suffix}'
kb_invoke_model_policy_name = f'Bedrock-Eval-Policy-Invoke-{suffix}'
kb_bucket_policy_name = f'Bedrock-Eval-Policy-S3-{suffix}'


eval_bucket_name, eval_role_name

In [None]:
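bucket_config = {} if region == 'us-east-1' else {'CreateBucketConfiguration': {'LocationConstraint': region}}  # required outside us-east-1
eval_bucket = s3_client.create_bucket(Bucket=eval_bucket_name, **bucket_config)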

#### 7.1 Create Policy and Role

In this step, we create the IAM role and policies required to run the RAG evaluation.

In [None]:
kb_retrieve_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowKBCombinedCallOnKnowledgeBaseInstance",
            "Effect": "Allow",
            "Action": [
                "bedrock:Retrieve",
                "bedrock:RetrieveAndGenerate"
            ],
            "Resource": 
            [f"arn:aws:bedrock:{region}:{account_id}:knowledge-base/{kb_id_custom}",
             f"arn:aws:bedrock:{region}:{account_id}:knowledge-base/{kb_id_fixed}"]
        }
    ]
}

kb_invoke_model_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowAccessToBedrockResources",
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                    "bedrock:CreateModelInvocationJob",
                    "bedrock:StopModelInvocationJob",
                    "bedrock:GetProvisionedModelThroughput",
                    "bedrock:GetInferenceProfile",
                    "bedrock:ListInferenceProfiles",
                    "bedrock:GetImportedModel",
                    "bedrock:GetPromptRouter",
                    "sagemaker:InvokeEndpoint"
                ],
                "Resource": [
                    "arn:aws:bedrock:*::foundation-model/*",
                    f"arn:aws:bedrock:*:{account_id}:inference-profile/*",
                    f"arn:aws:bedrock:*:{account_id}:provisioned-model/*",
                    f"arn:aws:bedrock:*:{account_id}:imported-model/*",
                    f"arn:aws:bedrock:*:{account_id}:application-inference-profile/*",
                    f"arn:aws:bedrock:*:{account_id}:default-prompt-router/*",
                    f"arn:aws:sagemaker:*:{account_id}:endpoint/*",
                    f"arn:aws:bedrock:*:{account_id}:marketplace/model-endpoint/all-access"
                ]
            }
        ]
}

kb_bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "FetchInputBuckets",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                f"arn:aws:s3:::{eval_bucket_name}",
                f"arn:aws:s3:::{eval_bucket_name}/*"
            ]
        },
        {
            "Sid": "FetchAndUpdateOutputBucket",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket",
                "s3:PutObject",
                "s3:GetBucketLocation",
                "s3:AbortMultipartUpload",
                "s3:ListBucketMultipartUploads"
            ],
            "Resource": [
                f"arn:aws:s3:::{eval_bucket_name}",
                f"arn:aws:s3:::{eval_bucket_name}/*"
            ]
        }
    ]
}

assume_role_policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowBedrockToAssumeRole",
            "Effect": "Allow",
            "Principal": {
                "Service": "bedrock.amazonaws.com"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": f"{account_id}"
                },
                "ArnEquals": {
                    "aws:SourceArn": f"arn:aws:bedrock:{region}:{account_id}:evaluation-job/*"
                }
            }
        }
    ]
}

kb_retrieve_policy_json = json.dumps(kb_retrieve_policy)
kb_invoke_model_policy_json = json.dumps(kb_invoke_model_policy)
kb_bucket_policy_json = json.dumps(kb_bucket_policy)

assume_role_policy_document_json = json.dumps(assume_role_policy_document)

Create the evaluation role:

In [None]:
kb_eval_iam_role = iam_client.create_role(
    RoleName=eval_role_name,
    AssumeRolePolicyDocument=assume_role_policy_document_json
)

In [None]:
kb_eval_iam_role['Role']['RoleName']

Create the policies and attach them to the evaluation role:

In [None]:
kb_eval_retrieve_response = iam_client.create_policy(
    PolicyName=kb_retrieve_policy_name,
    PolicyDocument=kb_retrieve_policy_json
)

iam_client.attach_role_policy(
    RoleName=kb_eval_iam_role['Role']['RoleName'],
    PolicyArn=kb_eval_retrieve_response['Policy']['Arn']
)

In [None]:
kb_invoke_model_policy_response = iam_client.create_policy(
    PolicyName=kb_invoke_model_policy_name,
    PolicyDocument=kb_invoke_model_policy_json
)

iam_client.attach_role_policy(
    RoleName=kb_eval_iam_role['Role']['RoleName'],
    PolicyArn=kb_invoke_model_policy_response['Policy']['Arn']
)

In [None]:
kb_bucket_policy_response = iam_client.create_policy(
    PolicyName=kb_bucket_policy_name,
    PolicyDocument=kb_bucket_policy_json
)

iam_client.attach_role_policy(
    RoleName=kb_eval_iam_role['Role']['RoleName'],
    PolicyArn=kb_bucket_policy_response['Policy']['Arn']
)

#### 7.2 Create Evaluation Jobs

We create two evaluation jobs: the first for the knowledge base with custom JSON chunking, and the second for the fixed-size chunking baseline, which doesn't use the custom transformation function.

In [None]:
eval_file_name = 'ground_truth.jsonl'
s3_client.upload_file(eval_file_name, eval_bucket_name, f'evaluation/{eval_file_name}')

In [None]:
job_name_chuck = f"kb-evaluation-json-{datetime.now().strftime('%Y-%m-%d-%H-%M-%S')}"
job_name_fixed = f"kb-evaluation-fixed-{datetime.now().strftime('%Y-%m-%d-%H-%M-%S')}"
input_uri = f's3://{eval_bucket_name}/evaluation/{eval_file_name}'
output_uri = f's3://{eval_bucket_name}/output/'
#eval_model = 'anthropic.claude-3-5-sonnet-20240620-v1:0'
eval_model = 'anthropic.claude-3-haiku-20240307-v1:0'
generator_model = 'us.anthropic.claude-3-5-haiku-20241022-v1:0'

# Configure retrieval settings
num_results = 20
#search_type = "HYBRID"
search_type = "SEMANTIC"

input_uri, output_uri

This job will evaluate two metrics:

**Builtin.Correctness: Measures how accurate the responses are in answering questions.**

Why is this important? This metric is crucial to surfacing the issue of the responses not using accurate information to answer the questions. A lower score indicates that there's an issue.

How does scoring of 0-1 work? 1 indicates fully correct answers, 0 indicates incorrect answers. The higher the score the more correct the answers.


**Builtin.Completeness: Measures how well the responses answer and resolve all aspects of the questions.**

Why is this important? This metric is crucial to surfacing the issue of the responses not addressing all of the requirements of the questions. A lower score indicates that there's an issue.

How does scoring of 0-1 work? 1 indicates fully complete answers, 0 indicates entirely incomplete answers. The higher the score the more complete the answers.

In [None]:
def evaluate_rag(job_name, kb_id, num_results, search_type):
    resp = bedrock_client.create_evaluation_job(
        jobName=job_name,
        jobDescription="Evaluate retrieval performance",
        roleArn=kb_eval_iam_role['Role']['Arn'],
        applicationType="RagEvaluation",
        inferenceConfig={
            "ragConfigs": [{
            "knowledgeBaseConfig": {
                "retrieveAndGenerateConfig": {
                    "type": "KNOWLEDGE_BASE",
                    "knowledgeBaseConfiguration": {
                        "knowledgeBaseId": kb_id,
                        "modelArn": generator_model,
                        "retrievalConfiguration": {
                            "vectorSearchConfiguration": {
                                "numberOfResults": num_results,
                                "overrideSearchType": search_type
                                }
                            }
                        }
                    }
                }
            }]
        },
        outputDataConfig={
            "s3Uri": output_uri
        },
        evaluationConfig={
            "automated": {
                "datasetMetricConfigs": [{
                    "taskType": "Generation",
                    "dataset": {
                        "name": "RagDataset",
                        "datasetLocation": {
                            "s3Uri": input_uri
                        }
                    },
                    "metricNames": [
                        "Builtin.Correctness",
                        "Builtin.Completeness"
                    ]
                }],
                "evaluatorModelConfig": {
                    "bedrockEvaluatorModels": [{
                        "modelIdentifier": eval_model
                    }]
                }
            }
        }
    )
    return resp

In [None]:
def check_eval_job(job_arn):
    # Fetch the current state of the evaluation job and report its status
    response = bedrock_client.get_evaluation_job(
        jobIdentifier=job_arn
    )
    print(f"Job Status: {response['status']}")

    return response

In [None]:
custom_rag_eval = evaluate_rag(job_name_chuck, kb_id_custom, num_results, search_type)

In [None]:
custom_job_resp = check_eval_job(custom_rag_eval['jobArn'])

In [None]:
fixed_rag_eval = evaluate_rag(job_name_fixed, kb_id_fixed, num_results, search_type)

In [None]:
fixed_job_resp = check_eval_job(fixed_rag_eval['jobArn'])

#### 7.3 Read evaluation files and compare results

This step downloads the generated output files to a local folder so we can compare the results.
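Evaluation jobs run asynchronously and can take a while, so the output files only exist once both jobs have finished. Here is a minimal polling sketch, assuming `Completed`, `Failed` and `Stopped` are the terminal statuses reported by `get_evaluation_job`. `wait_for_eval_job`, its poll interval and its timeout are hypothetical choices.

```python
import time

TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_eval_job(bedrock_client, job_arn, poll_seconds=60, timeout_seconds=3 * 60 * 60):
    """Poll get_evaluation_job until the job reaches a terminal status, then return the response."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = bedrock_client.get_evaluation_job(jobIdentifier=job_arn)
        status = response["status"]
        if status in TERMINAL_STATUSES:
            return response
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_arn} still {status} after {timeout_seconds} s")
        time.sleep(poll_seconds)
```

For example, `custom_job_resp = wait_for_eval_job(bedrock_client, custom_rag_eval['jobArn'])`, and likewise for `fixed_rag_eval`; check that `status` is `Completed` before downloading the outputs.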

In [None]:
out_dir_resp = f"output/{custom_job_resp['jobName']}/{custom_job_resp['jobArn'].split('/')[-1]}/inference_configs/0/datasets/RagDataset/"
#out_dir_resp
s3_files_resp = s3_client.list_objects_v2(Bucket=eval_bucket_name, Prefix=out_dir_resp)

In [None]:
s3_client.download_file(eval_bucket_name, s3_files_resp['Contents'][0]['Key'], 'custom_kb.jsonl')

In [None]:
out_dir_resp = f"output/{fixed_job_resp['jobName']}/{fixed_job_resp['jobArn'].split('/')[-1]}/inference_configs/0/datasets/RagDataset/"

#out_dir_resp
s3_files_resp = s3_client.list_objects_v2(Bucket=eval_bucket_name, Prefix=out_dir_resp)

In [None]:
s3_client.download_file(eval_bucket_name, s3_files_resp['Contents'][0]['Key'], 'fixed_kb.jsonl')

Now, let's process the downloaded files and collect both metrics for each knowledge base:

In [None]:
# Custom KB with JSON chunking
import pandas as pd

custom_kb_df = pd.read_json('custom_kb.jsonl', lines=True)
json_objects = custom_kb_df.to_dict(orient='records')

ans_correctness_custom = []
ans_completeness_custom = []

for obj in json_objects:
    for i in obj['conversationTurns'][0]['results']:
        # A missing score (None) is counted as 0
        score = 0 if i.get("result") is None else i["result"]
        if i['metricName'] == "Builtin.Correctness":
            ans_correctness_custom.append(score)
        elif i['metricName'] == "Builtin.Completeness":
            ans_completeness_custom.append(score)
        else:
            raise ValueError(f"Unexpected metric: {i['metricName']}")

In [None]:
# Regular KB with fixed chunking
import pandas as pd

fixed_kb_df = pd.read_json('fixed_kb.jsonl', lines=True)
json_objects = fixed_kb_df.to_dict(orient='records')

ans_correctness_fixed = []
ans_completeness_fixed = []

for obj in json_objects:
    for i in obj['conversationTurns'][0]['results']:
        # A missing score (None) is counted as 0
        score = 0 if i.get("result") is None else i["result"]
        if i['metricName'] == "Builtin.Correctness":
            ans_correctness_fixed.append(score)
        elif i['metricName'] == "Builtin.Completeness":
            ans_completeness_fixed.append(score)
        else:
            raise ValueError(f"Unexpected metric: {i['metricName']}")

Run the next cell to compare the results:

In [None]:
custom_correctness_final = round((sum(ans_correctness_custom)/len(ans_correctness_custom)) * 100, 2)
custom_completeness_final = round((sum(ans_completeness_custom)/len(ans_completeness_custom)) * 100, 2)
fixed_correctness_final = round((sum(ans_correctness_fixed)/len(ans_correctness_fixed)) * 100, 2)
fixed_completeness_final = round((sum(ans_completeness_fixed)/len(ans_completeness_fixed)) * 100, 2)

print(f"Fixed Chunking Strategy - Correctness:{fixed_correctness_final}, Json Custom Chunking - Correctness:{custom_correctness_final}")
print(f"Fixed Chunking Strategy - Completeness:{fixed_completeness_final}, Json Custom Chunking - Completeness:{custom_completeness_final}")

#### 7.4 Conclusions

For a detailed comparison, open the Amazon Bedrock console, go to **Evaluations**, and select the **RAG** tab.

There you can see the evaluation jobs that have run. Select your two jobs and choose **Compare**:

![Evaluations](../images/eval-models.png)

You will see comparison metrics like the following:

![Evaluations](../images/model-eval-1.png)

![Evaluations](../images/model-eval-2.png)

As the report generated in the AWS console shows, JSON custom chunking achieved a 7.1% relative improvement in Correctness (0.75 vs. 0.70) and a 15.6% relative improvement in Completeness (0.93 vs. 0.80), using 15 data samples for this evaluation job.
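These percentages are relative improvements over the fixed-size baseline, computed as (custom - fixed) / fixed. A small sketch to reproduce them from the scores (`relative_improvement` is a hypothetical helper):

```python
def relative_improvement(new, baseline):
    """Percentage change of `new` over `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100


print(round(relative_improvement(0.75, 0.70), 1))  # 7.1
```

Using the rounded 0.93 for Completeness gives 16.25%; the reported 15.6% corresponds to 0.925 vs. 0.80, so the 0.93 shown is presumably a rounded value.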

### 8 - Clean Up

To clean up the resources, run the following method from the helper class:

In [None]:
knowledge_base_custom.delete_kb(delete_s3_bucket=True, delete_lambda_function=True)

In [None]:
knowledge_base_regular.delete_kb(delete_s3_bucket=True, delete_lambda_function=True)

In [None]:
eval_bucket_name

In [None]:
kb_eval_iam_role['Role']['RoleName']

In [None]:
iam_client.detach_role_policy(RoleName=kb_eval_iam_role['Role']['RoleName'],
                              PolicyArn=kb_invoke_model_policy_response['Policy']['Arn'])

iam_client.detach_role_policy(RoleName=kb_eval_iam_role['Role']['RoleName'],
                              PolicyArn=kb_bucket_policy_response['Policy']['Arn'])

iam_client.detach_role_policy(RoleName=kb_eval_iam_role['Role']['RoleName'],
                              PolicyArn=kb_eval_retrieve_response['Policy']['Arn'])

In [None]:
iam_client.delete_role(RoleName=kb_eval_iam_role['Role']['RoleName'])

In [None]:
iam_client.delete_policy(PolicyArn=kb_eval_retrieve_response['Policy']['Arn'])
iam_client.delete_policy(PolicyArn=kb_invoke_model_policy_response['Policy']['Arn'])
iam_client.delete_policy(PolicyArn=kb_bucket_policy_response['Policy']['Arn'])

#### Delete Evaluation Bucket (Optional)

This action deletes the generated evaluation reports, so they will also no longer be viewable in the AWS console.

In [None]:
s3 = boto3.resource('s3')

bucket = s3.Bucket(eval_bucket_name)
if bucket in s3.buckets.all():
    print(f"Found bucket {eval_bucket_name}")
    # Delete all objects including versions (if versioning enabled)
    bucket.object_versions.delete()
    bucket.objects.all().delete()
    print(f"Deleted all objects in bucket {eval_bucket_name}")
    
    # Delete the bucket
    bucket.delete()
    print(f"Deleted bucket {eval_bucket_name}")