## Step 1: Invoke Bedrock Models to get _Inferences_ on a user provided Dataset
---

This notebook does as follows:

1. Generates inferences on a user provided dataset, using Foundation models on Amazon Bedrock

1. Uses [Litellm](https://www.litellm.ai/) as an interface to interact with the Bedrock API

1. Uses `Ray`, which is used to run inferences in an asynchronous manner

1. Records metrics like the `p90, p95` latency, `prompt token counts`, `completion token counts`, and more.

1. Saves all the combined _model responses_ to user questions from the source dataset in a `all_results.csv` that is used later in the _evaluation step_ for the evaluation process.

In [25]:
# import the libraries
import os
import ray
import json
import yaml
import time
import boto3
import logging
import textwrap
import pandas as pd
from pathlib import Path
from functools import reduce
from litellm import completion
from typing import Dict, List, Optional

In [26]:
# set a logger
logging.basicConfig(format='[%(asctime)s] p%(process)s {%(filename)s:%(lineno)d} %(levelname)s - %(message)s', level=logging.INFO)
logger = logging.getLogger(__name__)

In [27]:
# initialize the ray service to run async calls in parallel to bedrock easily
if ray.is_initialized():
    ray.shutdown()
ray.init()

KeyboardInterrupt: 

In [None]:
# global constants
CONFIG_FILE_PATH = "config.yaml"

# read the config yaml file
fpath = CONFIG_FILE_PATH
with open(fpath, 'r') as yaml_in:
    config = yaml.safe_load(yaml_in)
logger.info(f"config read from {fpath} -> {json.dumps(config, indent=2)}")

[2024-06-07 12:36:25,086] p90233 {2927026569.py:8} INFO - config read from config.yaml -> {
  "app_name": "llm-as-a-judge-eval-pipeline",
  "aws": {
    "region": "us-east-1"
  },
  "run_steps": {
    "1_get_inference.ipynb": true,
    "2_get_llm_as_a_judge_eval.ipynb": true
  },
  "pdf_dir_info": {
    "data_dir": "data",
    "dataset_dir": "source_data",
    "dataset_file_name": "data.csv",
    "metrics": "results",
    "llm_as_a_judge_dir": "eval_completions",
    "prompt_dir": "prompt_template",
    "llm_as_a_judge_completions": "llm_as_a_judge_completions.csv",
    "raw_llm_as_a_judge_completions": "raw_llm_responses.csv",
    "llm_as_a_judge_comparisons": "llm_as_a_judge_comparisons.csv",
    "llm_comparisons_txt": "llm_as_a_judge_comparisons.txt",
    "llm_as_a_judge_pick_rate": "llm_as_a_judge_pick_rate.csv",
    "eval_prompt_template": "llama3_eval_prompt.txt",
    "prompt_template": "prompt_template.txt",
    "processed_eval_prompts": "processed_eval_prompts.csv",
    "infere

In [None]:
# initialize all global variables that are used across this notebook hydrated from the `config.yaml` file

# name of your source xlsx/xls/csv file 
FILE_NAME: str = config['pdf_dir_info']['dataset_file_name']
# data directory
DATA_DIR: str = config['pdf_dir_info']['data_dir']
FILE_RELATIVE_PATH: str = os.path.join(config['pdf_dir_info']['dataset_dir'], FILE_NAME)
INPUT_FPATH: str = os.path.join(DATA_DIR, FILE_RELATIVE_PATH)
USER_PROMPT_COL: str = config['dataset_info']['user_question_col']
SYSTEM_PROMPT_COL: str = config['dataset_info']['system_prompt_col']
INFERENCE_PARAMETERS: Dict = config['inference_parameters']
LIST_INPUTS = list(filter(None, [USER_PROMPT_COL, 
                            SYSTEM_PROMPT_COL]))

# result files
ALL_RESULTS_FPATH = os.path.join(DATA_DIR, config['pdf_dir_info']['metrics'])
INFERENCE_LATENCY_SUMMARY_FPATH = os.path.join(ALL_RESULTS_FPATH, config['pdf_dir_info']['inference_latency_summary_fname'])
bedrock_model_ids: List[str] = config['bedrock_fms_to_test']

### Get Inference for the given dataset
---

This portion of the notebook gets inference using `Ray` (which is used to handle asynchronous calls to `Litellm`) to get inferences from the user questions in the given dataset

In [28]:
def generate_task_inference(model_id: str, 
                            user_prompt: str, 
                            system_prompt: str) -> Dict:
    """
    This function takes in a dictionary (which contains information on the user data and prompts) 
    to generate inference using a bedrock model id, and returns a dictionary containing the model 
    completion, and latency (in seconds).
    """
    print(f"user_prompt: {user_prompt}")
    print(f"system_prompt: {system_prompt}")
    # represents the service name
    service_name: str = "bedrock"
    inference_parameters = config['inference_parameters']
    temperature = inference_parameters.get('temperature', 0.1)
    caching = inference_parameters.get('caching', False)
    max_tokens = inference_parameters.get("max_tokens", 500)
    # represents creating the bedrock model to invoke the litellm api for response for titan, llama and claude
    bedrock_model: str = f"{service_name}/{model_id}"
    # represents the current aws region
    aws_region = boto3.Session().region_name
    # initialize the response dict
    ret = dict(user_prompt=user_prompt,
               system_prompt=system_prompt,
               completion=None,
               model_id=model_id,
               time_taken_in_seconds=None,
               prompt_token_count=None,
               completion_token_count=None,
               exception=None)
    # custom messages formatting for when the user/system roles are given together
    if config['dataset_info']['system_prompt_col'] is not None:
        messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
        ]
        print(f"messages: {messages}")
    else:
        body = ret['user_prompt']
        messages=[{ "content": body, "role": "user"}]
    # set the env var for aws_region
    os.environ["AWS_REGION_NAME"] = aws_region 
    try:
        print(f"Invoking {bedrock_model}......")
        response = completion(model=bedrock_model,
                              messages=messages,
                              temperature=temperature,
                              max_tokens=max_tokens,
                              caching=caching)
        # iterate through the entire model response
        for idx, choice in enumerate(response.choices):
            print(f"choice {idx+1} of {len(response.choices)} ")
            # extract the message and the message's content from litellm
            if choice.message and choice.message.content:
                # extract the response from the dict
                ret["completion"] = choice.message.content.strip()
        # Commenting out the code below that records the number of input and output tokens.
        # Extract number of input and completion prompt tokens (this is the same structure for embeddings and text generation models on Amazon Bedrock)
        ret['prompt_token_count'] = response.usage.prompt_tokens
        ret['completion_token_count'] = response.usage.completion_tokens
        # Extract latency in seconds
        latency_ms = response._response_ms
        ret['time_taken_in_seconds']  = latency_ms / 1000
    except Exception as e:
        print(f"Exception occurred during invoking {model_id}, exception={e}")
        ret['exception'] = e
    return ret

In [29]:
@ray.remote
def async_generate_task_inference(input: Dict, model_id: str) -> Dict:
    resp = generate_task_inference(model_id, input.get('user_prompt'), input.get('system_prompt'))
    resp_this_model = {"model_id": model_id,
                       f"{model_id}-response": resp['completion'],
                       f"{model_id}-time_taken_in_seconds": resp['time_taken_in_seconds'],
                       f"{model_id}-prompt_token_count": resp['prompt_token_count'],
                       f"{model_id}-completion_token_count": resp['completion_token_count'],
                       f"{model_id}-exception": resp['exception']}
    return input | resp_this_model

In [30]:
logger.info(f"File name to be processed: {INPUT_FPATH}")
data_file = Path(INPUT_FPATH)
if data_file.suffix == '.csv':
    logger.info(f"processing the csv file: {data_file}")
    original_eval_df = pd.read_csv(data_file)
elif data_file.suffix in ['.xls', '.xlsx']:
    logger.info(f"processing the xls/xlsx file: {data_file}")
    original_eval_df = pd.read_excel(data_file)
else:
    raise ValueError(f"Unsupported file format: {data_file.suffix}")
logger.info(f"input data frame shape is {original_eval_df.shape}")
# drop the columns that have all 'NaN' values
original_eval_df = original_eval_df.dropna(axis=1, how='all')
original_eval_df.head(10)

[2024-06-07 12:44:26,234] p90233 {3661773823.py:1} INFO - File name to be processed: data/source_data/data.csv
[2024-06-07 12:44:26,236] p90233 {3661773823.py:4} INFO - processing the csv file: data/source_data/data.csv
[2024-06-07 12:44:26,254] p90233 {3661773823.py:11} INFO - input data frame shape is (10, 2)


Unnamed: 0,user_input,model_1
0,Human: You are an assistant for question-answe...,The Heisenberg uncertainty principle states th...
1,Human: You are an assistant for question-answe...,The Schrödinger equation is a fundamental equa...
2,Human: You are an assistant for question-answe...,The greenhouse effect is a natural process tha...
3,Human: You are an assistant for question-answe...,"When light shines on a metal, electrons can be..."
4,Human: You are an assistant for question-answe...,"Modern atomic models, based on quantum mechani..."
5,Human: You are an assistant for question-answe...,A catalyst is a substance that can be added to...
6,Human: You are an assistant for question-answe...,The second law of thermodynamics states that t...
7,Human: You are an assistant for question-answe...,The phenomenon of nuclear fission. Fission occ...
8,Human: You are an assistant for question-answe...,Classical mechanics describes the physics of m...
9,Human: You are an assistant for question-answe...,If you touch a container that holds an endothe...


In [31]:
original_eval_list = json.loads(original_eval_df.to_json(orient='records'))
original_eval_list

[{'user_input': 'Human: You are an assistant for question-answering tasks. Use the following pieces of retrieved context in the section demarcated by "```" to answer the question. If you don\'t know the answer just say that you don\'t know. Use three sentences maximum and keep the answer concise.\n\n```\nThe Heisenberg uncertainty principle is a fundamental principle in quantum mechanics that states that there is a fundamental limit to the precision with which certain pairs of physical properties of a particle, such as position and momentum, can be known simultaneously. This principle arises from the wave-particle duality of quantum particles and has profound implications for our understanding of the behavior of matter at the atomic and subatomic scales\n```\n\nQuestion: What is the Heisenberg uncertainty principle?\n\nAssistant:',
  'model_1': 'The Heisenberg uncertainty principle states that the position and momentum of a particle cannot be measured precisely at the same time, reflec

In [32]:
# list of the bedrock model ids that are used in generating inferences
bedrock_model_ids: List[str] =[d['model_id'] for d in config['bedrock_fms_to_test']]
bedrock_model_ids

['anthropic.claude-3-haiku-20240307-v1:0',
 'anthropic.claude-3-sonnet-20240229-v1:0']

### Run the inferences to get model responses in parallel using `Ray`

In [33]:
erroneous_count: int = 0
resp_list = []
n: int = config['parallel_inference_count']
st_overall = time.perf_counter()
# Iterate over each bedrock model ID
for model_id in bedrock_model_ids:
    logger.info(f"going to get inference from model={model_id}")
    list_of_lists = [original_eval_list[i * n:(i + 1) * n] for i in range((len(original_eval_list) + n - 1) // n )]
    st = time.perf_counter()
    for idx, sublist in enumerate(list_of_lists):
        logger.info(f"processing sublist={idx+1}/{len(list_of_lists)} for model_id={model_id}")
        for input in sublist:
            print(f"input logged: {input}")
            try:
                input_dict = dict(user_prompt=input.get(USER_PROMPT_COL), system_prompt=input.get(SYSTEM_PROMPT_COL))
                result = ray.get(async_generate_task_inference.remote(input_dict, model_id))
                resp_list.append(result)
            except Exception as e:
                logger.error(f"Error processing input: {input} for model_id={model_id}, error: {e}")
                erroneous_count += 1

    elapsed_time = time.perf_counter() - st
    logger.info(f"total time taken for {len(original_eval_list)} with model={model_id} is {elapsed_time:0.2f}")
elapsed_time = time.perf_counter() - st_overall
logger.info(f"total time taken for {len(original_eval_list)} with models={bedrock_model_ids} is {elapsed_time:0.2f}")
logger.info(f"total erroneous count: {erroneous_count}")

[2024-06-07 12:44:29,925] p90233 {1988596990.py:7} INFO - going to get inference from model=anthropic.claude-3-haiku-20240307-v1:0
[2024-06-07 12:44:29,949] p90233 {1988596990.py:11} INFO - processing sublist=1/2 for model_id=anthropic.claude-3-haiku-20240307-v1:0


input logged: {'user_input': 'Human: You are an assistant for question-answering tasks. Use the following pieces of retrieved context in the section demarcated by "```" to answer the question. If you don\'t know the answer just say that you don\'t know. Use three sentences maximum and keep the answer concise.\n\n```\nThe Heisenberg uncertainty principle is a fundamental principle in quantum mechanics that states that there is a fundamental limit to the precision with which certain pairs of physical properties of a particle, such as position and momentum, can be known simultaneously. This principle arises from the wave-particle duality of quantum particles and has profound implications for our understanding of the behavior of matter at the atomic and subatomic scales\n```\n\nQuestion: What is the Heisenberg uncertainty principle?\n\nAssistant:', 'model_1': 'The Heisenberg uncertainty principle states that the position and momentum of a particle cannot be measured precisely at the same t

2024-06-07 12:44:33,504	INFO worker.py:1749 -- Started a local Ray instance.


[36m(async_generate_task_inference pid=90852)[0m user_prompt: Human: You are an assistant for question-answering tasks. Use the following pieces of retrieved context in the section demarcated by "```" to answer the question. If you don't know the answer just say that you don't know. Use three sentences maximum and keep the answer concise.
[36m(async_generate_task_inference pid=90852)[0m 
[36m(async_generate_task_inference pid=90852)[0m ```
[36m(async_generate_task_inference pid=90852)[0m The Heisenberg uncertainty principle is a fundamental principle in quantum mechanics that states that there is a fundamental limit to the precision with which certain pairs of physical properties of a particle, such as position and momentum, can be known simultaneously. This principle arises from the wave-particle duality of quantum particles and has profound implications for our understanding of the behavior of matter at the atomic and subatomic scales
[36m(async_generate_task_inference pid=9

[33m(raylet)[0m [2024-06-07 12:44:42,356 E 90836 5026861] (raylet) file_system_monitor.cc:111: /tmp/ray/session_2024-06-07_12-44-30_194135_90233 is over 95% full, available space: 54538240; capacity: 245107195904. Object creation will fail if spilling is required.


input logged: {'user_input': 'Human: You are an assistant for question-answering tasks. Use the following pieces of retrieved context in the section demarcated by "```" to answer the question. If you don\'t know the answer just say that you don\'t know. Use three sentences maximum and keep the answer concise.\n\n```\nThe structure of the atom has been a fundamental area of study in physics and chemistry, and our understanding of it has evolved over time through various experiments and theoretical models.\n\nIn the early 20th century, the plum pudding model proposed by J.J. Thomson suggested that atoms were composed of a uniform positive charge with negatively charged electrons embedded within it. However, this model was challenged by the famous Rutherford gold foil experiment in 1911, where Hans Geiger and Ernest Marsden, under the guidance of Ernest Rutherford, bombarded a thin gold foil with alpha particles.\n\nThe results of the Rutherford experiment showed that most of the alpha pa

[2024-06-07 12:44:44,707] p90233 {1988596990.py:11} INFO - processing sublist=2/2 for model_id=anthropic.claude-3-haiku-20240307-v1:0


input logged: {'user_input': 'Human: You are an assistant for question-answering tasks. Use the following pieces of retrieved context in the section demarcated by "```" to answer the question. If you don\'t know the answer just say that you don\'t know. Use three sentences maximum and keep the answer concise.\n\n```\nCatalysts are substances that increase the rate of a chemical reaction without being consumed or altered in the process. They play a crucial role in many chemical processes by providing an alternative pathway with a lower activation energy, which is the minimum energy required for a reaction to occur.\n\nThe role of catalysts can be understood by considering the energy profile of a chemical reaction. In an uncatalyzed reaction, reactants must overcome a high activation energy barrier to form the products. This energy barrier is determined by the strength of the bonds that need to be broken and formed during the reaction.\n\nCatalysts provide an alternative pathway with a l

KeyboardInterrupt: 

[33m(raylet)[0m [2024-06-07 12:44:52,360 E 90836 5026861] (raylet) file_system_monitor.cc:111: /tmp/ray/session_2024-06-07_12-44-30_194135_90233 is over 95% full, available space: 54067200; capacity: 245107195904. Object creation will fail if spilling is required.


In [None]:
# view some responses generated
resp_list[:5]

[{'user_input': 'user_input',
  'model_id': 'anthropic.claude-3-haiku-20240307-v1:0',
  'anthropic.claude-3-haiku-20240307-v1:0-response': 'I\'m sorry, but I don\'t have enough context to provide a meaningful response to "user_input". Could you please provide more details about what you\'re asking or what kind of information you\'re looking for? I\'d be happy to try to assist you further if you can give me some more context.',
  'anthropic.claude-3-haiku-20240307-v1:0-time_taken_in_seconds': 1.888985,
  'anthropic.claude-3-haiku-20240307-v1:0-prompt_token_count': 10,
  'anthropic.claude-3-haiku-20240307-v1:0-completion_token_count': 65,
  'anthropic.claude-3-haiku-20240307-v1:0-exception': None},
 {'user_input': 'user_input',
  'model_id': 'anthropic.claude-3-haiku-20240307-v1:0',
  'anthropic.claude-3-haiku-20240307-v1:0-response': 'I\'m sorry, but I don\'t have enough context to understand what "user_input" means. Could you please provide more details about what you\'re asking? I\'d 

[33m(raylet)[0m [2024-06-07 12:31:32,987 E 89336 5008502] (raylet) file_system_monitor.cc:111: /tmp/ray/session_2024-06-07_12-27-09_695272_89326 is over 95% full, available space: 153829376; capacity: 245107195904. Object creation will fail if spilling is required.
[33m(raylet)[0m [2024-06-07 12:31:42,994 E 89336 5008502] (raylet) file_system_monitor.cc:111: /tmp/ray/session_2024-06-07_12-27-09_695272_89326 is over 95% full, available space: 161935360; capacity: 245107195904. Object creation will fail if spilling is required.
[33m(raylet)[0m [2024-06-07 12:31:53,085 E 89336 5008502] (raylet) file_system_monitor.cc:111: /tmp/ray/session_2024-06-07_12-27-09_695272_89326 is over 95% full, available space: 161984512; capacity: 245107195904. Object creation will fail if spilling is required.
[33m(raylet)[0m [2024-06-07 12:32:03,104 E 89336 5008502] (raylet) file_system_monitor.cc:111: /tmp/ray/session_2024-06-07_12-27-09_695272_89326 is over 95% full, available space: 161779712; cap

In [None]:
df_list = []
for model_id in bedrock_model_ids:
    df_list.append(pd.DataFrame([r for r in resp_list if r['model_id'] == model_id]).drop(['model_id'], axis=1))    
from functools import reduce
on_list = list(filter(None, [config['dataset_info']['user_question_col'], 
                            config['dataset_info']['system_prompt_col'], 
                            config['dataset_info']['pre_existing_response_col']]))
logger.info(f"on_list: {on_list}")
try:
    # if the system prompt is separately provided, merge on that column too else, just use the user
    # column for the merge
    df_resp = reduce(lambda x, y: pd.merge(x, y, on=on_list), 
                    df_list)
except Exception as e:
    logger.error(f"df was not merged: {e}")
logger.info(f"shape of response data frame={df_resp.shape}")

[2024-06-07 12:05:06,269] p86800 {3398647774.py:8} INFO - on_list: ['user_input', 'model_1']
[2024-06-07 12:05:06,286] p86800 {3398647774.py:15} ERROR - df was not merged: 'user_input'


NameError: name 'df_resp' is not defined

[33m(raylet)[0m [2024-06-07 12:05:08,737 E 86810 4977099] (raylet) file_system_monitor.cc:111: /tmp/ray/session_2024-06-07_11-56-44_145662_86800 is over 95% full, available space: 110972928; capacity: 245107195904. Object creation will fail if spilling is required.
[33m(raylet)[0m [2024-06-07 12:05:18,740 E 86810 4977099] (raylet) file_system_monitor.cc:111: /tmp/ray/session_2024-06-07_11-56-44_145662_86800 is over 95% full, available space: 111243264; capacity: 245107195904. Object creation will fail if spilling is required.
[33m(raylet)[0m [2024-06-07 12:05:28,746 E 86810 4977099] (raylet) file_system_monitor.cc:111: /tmp/ray/session_2024-06-07_11-56-44_145662_86800 is over 95% full, available space: 111067136; capacity: 245107195904. Object creation will fail if spilling is required.
[33m(raylet)[0m [2024-06-07 12:05:38,784 E 86810 4977099] (raylet) file_system_monitor.cc:111: /tmp/ray/session_2024-06-07_11-56-44_145662_86800 is over 95% full, available space: 110862336; cap

In [None]:
# view the data frame
df_resp.head(10)

Unnamed: 0,user_input,model_1,anthropic.claude-3-haiku-20240307-v1:0-response,anthropic.claude-3-haiku-20240307-v1:0-time_taken_in_seconds,anthropic.claude-3-haiku-20240307-v1:0-prompt_token_count,anthropic.claude-3-haiku-20240307-v1:0-completion_token_count,anthropic.claude-3-haiku-20240307-v1:0-exception,anthropic.claude-3-sonnet-20240229-v1:0-response,anthropic.claude-3-sonnet-20240229-v1:0-time_taken_in_seconds,anthropic.claude-3-sonnet-20240229-v1:0-prompt_token_count,anthropic.claude-3-sonnet-20240229-v1:0-completion_token_count,anthropic.claude-3-sonnet-20240229-v1:0-exception
0,Human: You are an assistant for question-answe...,The Heisenberg uncertainty principle states th...,The Heisenberg uncertainty principle states th...,1.189661,170,77,,The Heisenberg uncertainty principle is a fund...,1.660078,170,77,
1,Human: You are an assistant for question-answe...,The Heisenberg uncertainty principle states th...,The Heisenberg uncertainty principle states th...,1.189661,170,77,,The Schrödinger equation is a fundamental equa...,2.299184,661,117,
2,Human: You are an assistant for question-answe...,The Heisenberg uncertainty principle states th...,The Heisenberg uncertainty principle states th...,1.189661,170,77,,The greenhouse effect is a natural process whe...,2.246156,604,99,
3,Human: You are an assistant for question-answe...,The Heisenberg uncertainty principle states th...,The Heisenberg uncertainty principle states th...,1.189661,170,77,,The photoelectric effect is a phenomenon where...,2.467869,588,120,
4,Human: You are an assistant for question-answe...,The Heisenberg uncertainty principle states th...,The Heisenberg uncertainty principle states th...,1.189661,170,77,,The structure of the atom was determined throu...,2.117565,582,129,
5,Human: You are an assistant for question-answe...,The Heisenberg uncertainty principle states th...,The Heisenberg uncertainty principle states th...,1.189661,170,77,,The role of catalysts in chemical reactions is...,2.80695,544,85,
6,Human: You are an assistant for question-answe...,The Heisenberg uncertainty principle states th...,The Heisenberg uncertainty principle states th...,1.189661,170,77,,The second law of thermodynamics states that i...,3.982463,599,107,
7,Human: You are an assistant for question-answe...,The Heisenberg uncertainty principle states th...,The Heisenberg uncertainty principle states th...,1.189661,170,77,,The main difference between nuclear fission an...,1.947344,472,75,
8,Human: You are an assistant for question-answe...,The Heisenberg uncertainty principle states th...,The Heisenberg uncertainty principle states th...,1.189661,170,77,,The main differences between classical mechani...,5.042923,702,94,
9,Human: You are an assistant for question-answe...,The Heisenberg uncertainty principle states th...,The Heisenberg uncertainty principle states th...,1.189661,170,77,,The main difference between endothermic and ex...,2.499933,679,92,


[33m(raylet)[0m [2024-06-07 11:33:21,395 E 85036 4952511] (raylet) file_system_monitor.cc:111: /tmp/ray/session_2024-06-07_11-32-08_807355_85026 is over 95% full, available space: 52039680; capacity: 245107195904. Object creation will fail if spilling is required.


In [None]:
# get the original/target responses if any and merge it with the current df
try: 
    if df_resp is not None and config['dataset_info']['pre_existing_response_col'] is not None:
        if config['dataset_info']['system_prompt_col'] is not None:
            df_resp_all = pd.merge(left=df_resp, right=original_eval_df, how="left",
                                left_on=[config['dataset_info']['user_question_col'], 
                                            config['dataset_info']['system_prompt_col'], 
                                            config['dataset_info']['pre_existing_response_col']], 
                                right_on=[config['dataset_info']['user_question_col'], 
                                            config['dataset_info']['system_prompt_col'], 
                                            config['dataset_info']['pre_existing_response_col']])
        else:
            df_resp_all = pd.merge(left=df_resp, right=original_eval_df, how="left", 
                                left_on=[config['dataset_info']['user_question_col'], 
                                            config['dataset_info']['pre_existing_response_col']], 
                                right_on=[config['dataset_info']['user_question_col'], 
                                            config['dataset_info']['pre_existing_response_col']])
except Exception as e:
    logger.error(f"Could not perform the merge with the original data frame: {e}")


In [None]:
# view the current data in the df
df_resp_all.head(10)

Unnamed: 0,user_input,model_1,anthropic.claude-3-haiku-20240307-v1:0-response,anthropic.claude-3-haiku-20240307-v1:0-time_taken_in_seconds,anthropic.claude-3-haiku-20240307-v1:0-prompt_token_count,anthropic.claude-3-haiku-20240307-v1:0-completion_token_count,anthropic.claude-3-haiku-20240307-v1:0-exception,anthropic.claude-3-sonnet-20240229-v1:0-response,anthropic.claude-3-sonnet-20240229-v1:0-time_taken_in_seconds,anthropic.claude-3-sonnet-20240229-v1:0-prompt_token_count,anthropic.claude-3-sonnet-20240229-v1:0-completion_token_count,anthropic.claude-3-sonnet-20240229-v1:0-exception
0,Human: You are an assistant for question-answe...,The Heisenberg uncertainty principle states th...,The Heisenberg uncertainty principle states th...,1.189661,170,77,,The Heisenberg uncertainty principle is a fund...,1.660078,170,77,
1,Human: You are an assistant for question-answe...,The Heisenberg uncertainty principle states th...,The Heisenberg uncertainty principle states th...,1.189661,170,77,,The Schrödinger equation is a fundamental equa...,2.299184,661,117,
2,Human: You are an assistant for question-answe...,The Heisenberg uncertainty principle states th...,The Heisenberg uncertainty principle states th...,1.189661,170,77,,The greenhouse effect is a natural process whe...,2.246156,604,99,
3,Human: You are an assistant for question-answe...,The Heisenberg uncertainty principle states th...,The Heisenberg uncertainty principle states th...,1.189661,170,77,,The photoelectric effect is a phenomenon where...,2.467869,588,120,
4,Human: You are an assistant for question-answe...,The Heisenberg uncertainty principle states th...,The Heisenberg uncertainty principle states th...,1.189661,170,77,,The structure of the atom was determined throu...,2.117565,582,129,
5,Human: You are an assistant for question-answe...,The Heisenberg uncertainty principle states th...,The Heisenberg uncertainty principle states th...,1.189661,170,77,,The role of catalysts in chemical reactions is...,2.80695,544,85,
6,Human: You are an assistant for question-answe...,The Heisenberg uncertainty principle states th...,The Heisenberg uncertainty principle states th...,1.189661,170,77,,The second law of thermodynamics states that i...,3.982463,599,107,
7,Human: You are an assistant for question-answe...,The Heisenberg uncertainty principle states th...,The Heisenberg uncertainty principle states th...,1.189661,170,77,,The main difference between nuclear fission an...,1.947344,472,75,
8,Human: You are an assistant for question-answe...,The Heisenberg uncertainty principle states th...,The Heisenberg uncertainty principle states th...,1.189661,170,77,,The main differences between classical mechani...,5.042923,702,94,
9,Human: You are an assistant for question-answe...,The Heisenberg uncertainty principle states th...,The Heisenberg uncertainty principle states th...,1.189661,170,77,,The main difference between endothermic and ex...,2.499933,679,92,


### Record the `p50` and `p95` inference latencies in a `txt` file

In [None]:
time_taken_in_seconds_cols = [c for c in df_resp_all.columns if 'time_taken_in_seconds' in c]
Latency_cols = [c for c in df_resp_all.columns if 'Latency ' in c]
all_latency_cols_of_interest = time_taken_in_seconds_cols + Latency_cols
summary = ""
for c in all_latency_cols_of_interest:
    quantiles = list(round(df_resp_all[c].quantile([0.5, 0.95]), 2))
    s = f"[p50, p95] for {c}={quantiles}\n"
    summary += s
    logger.info(s)
Path(INFERENCE_LATENCY_SUMMARY_FPATH).write_text(summary)

[2024-06-07 11:33:27,588] p85026 {4188724700.py:9} INFO - [p50, p95] for anthropic.claude-3-haiku-20240307-v1:0-time_taken_in_seconds=[1.36, 1.91]

[2024-06-07 11:33:27,589] p85026 {4188724700.py:9} INFO - [p50, p95] for anthropic.claude-3-sonnet-20240229-v1:0-time_taken_in_seconds=[2.38, 5.04]



179

### Upload the overall results to a `results.csv` file

In [None]:
def wrap_text(df, width):
    """
    This function wraps the text in a specific cell to a given
    width
    """
    for col in df.columns:
        df[col] = df[col].apply(lambda x: '\n'.join(textwrap.wrap(str(x), width)))
    return df

In [None]:
df_resp_all = wrap_text(df_resp_all, width=40)
cols = list(df_resp_all.columns)
user_input_index = df_resp_all.columns.get_loc(config['dataset_info']['user_question_col'])
response_cols = [col for col in df_resp_all.columns if col.endswith('-response')]
for col in response_cols:
    cols.pop(cols.index(col))
# Reinsert the response columns right after the user_input column
for col in reversed(response_cols):
    cols.insert(user_input_index + 1, col)
df_resp_all = df_resp_all[cols]

# Save the DataFrame to a CSV file
os.makedirs(ALL_RESULTS_FPATH, exist_ok=True)
all_results_csv_fpath: str = os.path.join(ALL_RESULTS_FPATH, 
                                          config['pdf_dir_info']['all_results_file_name'])
df_resp_all.to_csv(all_results_csv_fpath, index=False)
df_resp_all.head(10)

Unnamed: 0,user_input,anthropic.claude-3-haiku-20240307-v1:0-response,anthropic.claude-3-sonnet-20240229-v1:0-response,model_1,anthropic.claude-3-haiku-20240307-v1:0-time_taken_in_seconds,anthropic.claude-3-haiku-20240307-v1:0-prompt_token_count,anthropic.claude-3-haiku-20240307-v1:0-completion_token_count,anthropic.claude-3-haiku-20240307-v1:0-exception,anthropic.claude-3-sonnet-20240229-v1:0-time_taken_in_seconds,anthropic.claude-3-sonnet-20240229-v1:0-prompt_token_count,anthropic.claude-3-sonnet-20240229-v1:0-completion_token_count,anthropic.claude-3-sonnet-20240229-v1:0-exception
0,Human: You are an assistant for\nquestion-answ...,The Heisenberg uncertainty principle\nstates t...,The Heisenberg uncertainty principle is\na fun...,The Heisenberg uncertainty principle\nstates t...,1.189661,170,77,,1.660078,170,77,
1,Human: You are an assistant for\nquestion-answ...,The Heisenberg uncertainty principle\nstates t...,The Schrödinger equation is a\nfundamental equ...,The Heisenberg uncertainty principle\nstates t...,1.189661,170,77,,2.299184,661,117,
2,Human: You are an assistant for\nquestion-answ...,The Heisenberg uncertainty principle\nstates t...,The greenhouse effect is a natural\nprocess wh...,The Heisenberg uncertainty principle\nstates t...,1.189661,170,77,,2.246156,604,99,
3,Human: You are an assistant for\nquestion-answ...,The Heisenberg uncertainty principle\nstates t...,The photoelectric effect is a phenomenon\nwher...,The Heisenberg uncertainty principle\nstates t...,1.189661,170,77,,2.467869,588,120,
4,Human: You are an assistant for\nquestion-answ...,The Heisenberg uncertainty principle\nstates t...,The structure of the atom was determined\nthro...,The Heisenberg uncertainty principle\nstates t...,1.189661,170,77,,2.117565,582,129,
5,Human: You are an assistant for\nquestion-answ...,The Heisenberg uncertainty principle\nstates t...,The role of catalysts in chemical\nreactions i...,The Heisenberg uncertainty principle\nstates t...,1.189661,170,77,,2.80695,544,85,
6,Human: You are an assistant for\nquestion-answ...,The Heisenberg uncertainty principle\nstates t...,The second law of thermodynamics states\nthat ...,The Heisenberg uncertainty principle\nstates t...,1.189661,170,77,,3.982463,599,107,
7,Human: You are an assistant for\nquestion-answ...,The Heisenberg uncertainty principle\nstates t...,The main difference between nuclear\nfission a...,The Heisenberg uncertainty principle\nstates t...,1.189661,170,77,,1.947344,472,75,
8,Human: You are an assistant for\nquestion-answ...,The Heisenberg uncertainty principle\nstates t...,The main differences between classical\nmechan...,The Heisenberg uncertainty principle\nstates t...,1.189661,170,77,,5.042923,702,94,
9,Human: You are an assistant for\nquestion-answ...,The Heisenberg uncertainty principle\nstates t...,The main difference between endothermic\nand e...,The Heisenberg uncertainty principle\nstates t...,1.189661,170,77,,2.499933,679,92,


[33m(raylet)[0m [2024-06-07 11:33:31,396 E 85036 4952511] (raylet) file_system_monitor.cc:111: /tmp/ray/session_2024-06-07_11-32-08_807355_85026 is over 95% full, available space: 50958336; capacity: 245107195904. Object creation will fail if spilling is required.
[33m(raylet)[0m [2024-06-07 11:33:41,396 E 85036 4952511] (raylet) file_system_monitor.cc:111: /tmp/ray/session_2024-06-07_11-32-08_807355_85026 is over 95% full, available space: 51552256; capacity: 245107195904. Object creation will fail if spilling is required.
[33m(raylet)[0m [*** LOG ERROR #0001 ***] [2024-06-07 11:33:51] [log.event.reporter/tmp/ray/session_2024-06-07_11-32-08_807355_85026/logs/events/event_RAYLET.log] Failed flush to file /tmp/ray/session_2024-06-07_11-32-08_807355_85026/logs/events/event_RAYLET.log: No space left on device
[33m(raylet)[0m [2024-06-07 11:33:51,498 E 85036 4952511] (raylet) file_system_monitor.cc:111: /tmp/ray/session_2024-06-07_11-32-08_807355_85026 is over 95% full, available s