# Using Bedrock: Evaluating the Ideal Chunk Size for a RAG System using LlamaIndex

- https://blog.llamaindex.ai/evaluating-the-ideal-chunk-size-for-a-rag-system-using-llamaindex-6207e5d3fec5
- Colab
    - https://colab.research.google.com/drive/1LPvJyEON6btMpubYdwySfNs0FuNR9nza?usp=sharing

In [2]:
%load_ext autoreload
%autoreload 2

import sys, os
# module_path = "../../../utils"
# sys.path.append(os.path.abspath(module_path))
# print(os.path.abspath(module_path))


module_path = "../utils"
sys.path.append(os.path.abspath(module_path))
print(os.path.abspath(module_path))

module_path = "../"
sys.path.append(os.path.abspath(module_path))
print(os.path.abspath(module_path))

/root/aws-ai-ml-workshop-kr/genai/aws-gen-ai-kr/20_applications/02_qa_chatbot/01_preprocess_docs/utils
/root/aws-ai-ml-workshop-kr/genai/aws-gen-ai-kr/20_applications/02_qa_chatbot/01_preprocess_docs


# 1. Setup

In [3]:
!pip install llama-index pypdf

In [3]:
import nest_asyncio

nest_asyncio.apply()

from llama_index import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    ServiceContext,
)
from llama_index.evaluation import (
    DatasetGenerator,
    FaithfulnessEvaluator,
    RelevancyEvaluator
)
import time


# 2. Download Data

In [4]:
!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/jerryjliu/llama_index/main/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'

--2023-11-18 11:36:22--  https://raw.githubusercontent.com/jerryjliu/llama_index/main/docs/examples/data/10k/uber_2021.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1880483 (1.8M) [application/octet-stream]
Saving to: ‘data/10k/uber_2021.pdf’


2023-11-18 11:36:23 (167 MB/s) - ‘data/10k/uber_2021.pdf’ saved [1880483/1880483]



# 3. Load Data

In [5]:
# Load Data
reader = SimpleDirectoryReader("./data/10k/")
documents = reader.load_data()

In [6]:
print(len(documents))

307


# 4. Question Generation

In [7]:
import json
import boto3
from pprint import pprint
from termcolor import colored
from utils import bedrock, print_ww
from utils.bedrock import bedrock_info

# ---- ⚠️ Un-comment and edit the below lines as needed for your AWS setup ⚠️ ----

# os.environ["AWS_DEFAULT_REGION"] = "<REGION_NAME>"  # E.g. "us-east-1"
# os.environ["AWS_PROFILE"] = "<YOUR_PROFILE>"
# os.environ["BEDROCK_ASSUME_ROLE"] = "<YOUR_ROLE_ARN>"  # E.g. "arn:aws:..."
# os.environ["BEDROCK_ENDPOINT_URL"] = "<YOUR_ENDPOINT_URL>"  # E.g. "https://..."


boto3_bedrock = bedrock.get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    endpoint_url=os.environ.get("BEDROCK_ENDPOINT_URL", None),
    region=os.environ.get("AWS_DEFAULT_REGION", None),
)

print(colored("\n== FM lists ==", "green"))
pprint(bedrock_info.get_list_fm_models())

Create new client
  Using region: us-east-1
  Using profile: None
boto3 Bedrock client successfully created!
bedrock-runtime(https://bedrock-runtime.us-east-1.amazonaws.com)

== FM lists ==
{'Claude-Instant-V1': 'anthropic.claude-instant-v1',
 'Claude-V1': 'anthropic.claude-v1',
 'Claude-V2': 'anthropic.claude-v2',
 'Command': 'cohere.command-text-v14',
 'Jurassic-2-Mid': 'ai21.j2-mid-v1',
 'Jurassic-2-Ultra': 'ai21.j2-ultra-v1',
 'Llama2-13b-Chat': 'meta.llama2-13b-chat-v1',
 'Titan-Embeddings-G1': 'amazon.titan-embed-text-v1',
 'Titan-Text-G1': 'TBD'}


In [8]:
from langchain.llms.bedrock import Bedrock
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

llm_claude_v2 = Bedrock(
    model_id=bedrock_info.get_model_id(model_name="Claude-V2"),
    client=boto3_bedrock,
    model_kwargs={
        "max_tokens_to_sample": 1024
    },
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()]
)

llm_claude_v2

Bedrock(client=<botocore.client.BedrockRuntime object at 0x7f86e7fcf2e0>, model_id='anthropic.claude-v2', model_kwargs={'max_tokens_to_sample': 1024}, streaming=True, callbacks=[<langchain.callbacks.streaming_stdout.StreamingStdOutCallbackHandler object at 0x7f86e7baff40>])

In [9]:
# llm_llama2_chat = Bedrock(
#     model_id=bedrock_info.get_model_id(model_name="Llama2-13b-Chat"),
#     client=boto3_bedrock,
#     model_kwargs={
#         "max_tokens_to_sample": 1024
#     },
#     streaming=True,
#     callbacks=[StreamingStdOutCallbackHandler()]
# )


In [10]:
from llama_index.embeddings import LangchainEmbedding
from langchain.embeddings import BedrockEmbeddings

# create embeddings
bedrock_embedding = BedrockEmbeddings(
    client=boto3_bedrock,
    model_id="amazon.titan-embed-text-v1",
)

# load in Bedrock embedding model from langchain
embed_model = LangchainEmbedding(bedrock_embedding)

#####################################################################
# Service Context
#####################################################################

from llama_index import ServiceContext, set_global_service_context

service_context_claude_v2 = ServiceContext.from_defaults(
  llm=llm_claude_v2,
  embed_model=embed_model,
  system_prompt="You are an AI assistant answering questions."
)

## Generate questions for one page

In [11]:
# To evaluate each chunk size, first generate a set of evaluation questions.
# This run uses only the first page and requests up to 20 questions.
eval_documents = documents[:1]
data_generator = DatasetGenerator.from_documents(eval_documents, service_context=service_context_claude_v2)
eval_questions = data_generator.generate_questions_from_nodes(num=20)

 Here are 10 potential quiz questions based on the context information provided:

1. What is the name of the company filing this 10-K report?

2. What is the Commission File Number listed for this company?

3. In what state is this company incorporated? 

4. What is the company's I.R.S. Employer Identification Number?

5. What is the company's trading symbol listed on the New York Stock Exchange?

6. What form is this company filing with the SEC? 

7. What act requires the filing of this form?

8. Is this company considered a "well-known seasoned issuer" according to the context provided?

9. What is the address listed for this company's principal executive offices?

10. Is this company required to file reports pursuant to Section 13 or Section 15(d) of the Act according to the context?
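
The generated text above begins with a preamble line ("Here are 10 potential quiz questions..."), and the evaluation output later in this notebook shows that line being treated as a question. Below is a minimal sketch, assuming the model returns one numbered question per line as shown above, of filtering raw output down to the actual questions. `extract_numbered_questions` is a hypothetical helper for illustration and is not part of LlamaIndex.

```python
import re

# Hypothetical helper (not part of LlamaIndex): keep only lines that look like
# "1. Some question?" and drop preamble or answer text.
_NUMBERED_QUESTION = re.compile(r"^\s*\d+[.)]\s+(.*\?)\s*$")


def extract_numbered_questions(raw_output):
    """Return the question text from each numbered line that ends in '?'."""
    questions = []
    for line in raw_output.splitlines():
        match = _NUMBERED_QUESTION.match(line)
        if match:
            questions.append(match.group(1).strip())
    return questions


# Sample text copied from the first lines of the output above.
sample_output = """ Here are 10 potential quiz questions based on the context information provided:

1. What is the name of the company filing this 10-K report?

2. What is the Commission File Number listed for this company?

3. In what state is this company incorporated? 
"""

for q in extract_numbered_questions(sample_output):
    print(q)
```

A filter like this is deliberately strict: a numbered line with extra text fused after the question mark would be dropped, so the pattern may need loosening for other models or prompts.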

# 5. Setting Up Evaluation


In [12]:
# Define Faithfulness and Relevancy evaluators, both backed by Claude v2 on Bedrock
faithfulness_claude_v2 = FaithfulnessEvaluator(service_context=service_context_claude_v2)
relevancy_claude_v2 = RelevancyEvaluator(service_context=service_context_claude_v2)


# 6. Response Evaluation For A Chunk Size

In [13]:
# Define a function that calculates the average response time, faithfulness, and relevancy for a given chunk size.
# Claude v2 on Bedrock both generates the responses and evaluates them.
def evaluate_response_time_and_accuracy(service_context, chunk_size, eval_questions):
    """
    Evaluate the average response time, faithfulness, and relevancy of responses generated by Claude v2 for a given chunk size.

    Parameters:
    service_context (ServiceContext): The LLM and embedding configuration used to build the index and answer queries.
    chunk_size (int): The size of data chunks being processed.
    eval_questions (list[str]): The questions to run through the query engine.

    Returns:
    tuple: A tuple containing the average response time, faithfulness, and relevancy metrics.
    """

    total_response_time = 0
    total_faithfulness = 0
    total_relevancy = 0

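    # create vector index
    # Derive a service context that applies the requested chunk size while reusing
    # the LLM and embedding model from the service context passed in.
    # (Assumes a llama_index version that provides ServiceContext.from_service_context.)
    chunk_service_context = ServiceContext.from_service_context(
        service_context, chunk_size=chunk_size
    )
    vector_index = VectorStoreIndex.from_documents(
        eval_documents, service_context=chunk_service_context
    )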
    # build query engine
    # By default, similarity_top_k is set to 2. To experiment with different values, pass it as an argument to as_query_engine()
    query_engine = vector_index.as_query_engine()
    num_questions = len(eval_questions)

    # Iterate over each question in eval_questions to compute metrics.
    # While BatchEvalRunner can be used for faster evaluations (see: https://docs.llamaindex.ai/en/latest/examples/evaluation/batch_eval.html),
    # we're using a loop here to specifically measure response time for different chunk sizes.
    for question in eval_questions:
        print("question: ", question)
        start_time = time.time()
        response_vector = query_engine.query(question)
        elapsed_time = time.time() - start_time

        faithfulness_result = faithfulness_claude_v2.evaluate_response(
            response=response_vector
        ).passing

        relevancy_result = relevancy_claude_v2.evaluate_response(
            query=question, response=response_vector
        ).passing

        
        total_response_time += elapsed_time
        total_faithfulness += faithfulness_result
        total_relevancy += relevancy_result

    average_response_time = total_response_time / num_questions
    average_faithfulness = total_faithfulness / num_questions
    average_relevancy = total_relevancy / num_questions

    return average_response_time, average_faithfulness, average_relevancy
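
The averaging in the function above works because each `passing` value is a boolean, so summing them counts the passes. Below is a minimal self-contained sketch of the same timing-and-averaging pattern. `average_metrics`, `fake_query`, and `fake_judge` are hypothetical stand-ins for illustration (they replace `query_engine.query` and an evaluator's `passing` flag), and `time.perf_counter` is used here as an assumption on my part, since it is a monotonic clock suited to measuring intervals.

```python
import time


def average_metrics(questions, query_fn, judge_fn):
    """Return (mean seconds per query, fraction of responses judged passing)."""
    if not questions:
        raise ValueError("questions must not be empty")

    total_time = 0.0
    passes = 0
    for question in questions:
        # Time only the query, not the evaluation that follows it.
        start = time.perf_counter()
        response = query_fn(question)
        total_time += time.perf_counter() - start
        # bool is a subclass of int, so True adds 1 and False adds 0.
        passes += bool(judge_fn(question, response))

    return total_time / len(questions), passes / len(questions)


# Hypothetical stand-ins for query_engine.query and an evaluator's `.passing` flag.
def fake_query(question):
    return question.upper()


def fake_judge(question, response):
    return "UBER" in response


avg_time, pass_rate = average_metrics(
    ["What is Uber's ticker?", "Where is the company based?"],
    fake_query,
    fake_judge,
)
print(f"avg time: {avg_time:.6f}s, pass rate: {pass_rate:.2f}")
```

Guarding against an empty question list avoids a division by zero, which the loop-based version would hit if question generation returned nothing.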

In [14]:
# Iterate over different chunk sizes and compare the metrics to help choose a chunk size.

# for chunk_size in [128, 256, 512, 1024, 2048]:
for chunk_size in [256]:
    avg_response_time, avg_faithfulness, avg_relevancy = evaluate_response_time_and_accuracy(
        service_context_claude_v2, chunk_size, eval_questions
    )
    print(f"Chunk size {chunk_size} - Average Response time: {avg_response_time:.2f}s, Average Faithfulness: {avg_faithfulness:.2f}, Average Relevancy: {avg_relevancy:.2f}")

question:  Here are 10 potential quiz questions based on the context information provided:
 Here are 10 potential quiz questions based on the context:

1. What is the full name of the company filing this Annual Report?

2. What is the fiscal year end date for this Annual Report? 

3. What city and state is Uber Technologies, Inc. headquartered in?

4. What stock exchange does Uber trade its common stock on?

5. What is Uber's SEC file number?

6. What form type is this filing? 

7. What is Uber's IRS Employer Identification Number?

8. Is Uber required to file reports under Section 13 or 15(d) of the Securities Exchange Act?

9. Has Uber filed all reports required under Section 13 or 15(d) of the Securities Exchange Act over the past 12 months?

10. What is Uber's trading symbol on the stock exchange? Here are the answers to the 10 potential quiz questions:

1. YES - Uber Technologies, Inc.
2. YES - December 31, 2021
3. YES - San Francisco, California  
4. YES - New York Stock Exchange