## Evaluating Claude v2 on factual knowledge using Bedrock

In this notebook, we use the FMEval library to evaluate the Claude v2 model (available through Bedrock) on factual knowledge.

Environment:
- Base Python 3.0 kernel
- Studio Notebook instance type: ml.c5.xlarge

### Setup

In [None]:
# Install the fmeval package

!rm -Rf ~/.cache/pip/*
!pip3 install fmeval --upgrade-strategy only-if-needed --force-reinstall

In [None]:
import boto3
import os

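# Bedrock clients: `bedrock` is the control-plane client (e.g. for listing models);
# `bedrock_runtime` is the client used for model inference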
bedrock = boto3.client(service_name='bedrock')
bedrock_runtime = boto3.client(service_name='bedrock-runtime')

In [None]:
import glob

# Check that the dataset file to be used by the evaluation is present
if not glob.glob("trex_sample.jsonl"):
    print("ERROR - please make sure the file, trex_sample.jsonl, exists.")

### Sample Bedrock model invocation

In [None]:
import json

# We use Claude v2 in this example notebook.
# See https://docs.anthropic.com/claude/reference/claude-on-amazon-bedrock#list-available-models
# for instructions on how to list the model IDs for all available Claude model variants.
model_id = 'anthropic.claude-v2'
accept = "application/json"
contentType = "application/json"


# `prompt_data` is structured in the format that the Claude model expects, as documented here:
# https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-claude.html#model-parameters-claude-request-body
prompt_data = """Human: Who is Barack Obama?

Assistant:
"""

# For more details on parameters that can be included in `body` (such as "max_tokens_to_sample"),
# see https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-claude.html#model-parameters-claude-request-body
body = json.dumps({"prompt": prompt_data, "max_tokens_to_sample": 500})

# Invoke the model
response = bedrock_runtime.invoke_model(
    body=body, modelId=model_id, accept=accept, contentType=contentType
)

# Parse the invocation response
response_body = json.loads(response.get("body").read())
print(response_body.get("completion"))

### FMEval Setup

In [None]:
from fmeval.data_loaders.data_config import DataConfig
from fmeval.model_runners.bedrock_model_runner import BedrockModelRunner
from fmeval.constants import MIME_TYPE_JSONLINES
from fmeval.eval_algorithms.factual_knowledge import FactualKnowledge, FactualKnowledgeConfig

#### Data Config Setup

Below, we create a `DataConfig` for the local dataset file, trex_sample.jsonl.
- `dataset_name` is an identifier for your own reference.
- `dataset_uri` is either a local file path or an S3 URI.
- `dataset_mime_type` is the MIME type of the dataset. Currently, JSON and JSON Lines are supported.
- `model_input_location` and `target_output_location` are JMESPath queries that locate the model inputs and target outputs within the dataset. The values you specify depend on the structure of the dataset itself; open trex_sample.jsonl to see where "question" and "answers" appear.
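
To make the two location fields concrete, the sketch below parses one JSON Lines record and pulls out the fields that `model_input_location` and `target_output_location` point at. The record is a made-up example, not a line copied from trex_sample.jsonl. For a top-level key such as `question`, a JMESPath query amounts to a plain key lookup, which is all this stdlib-only sketch does.

```python
import json

# Hypothetical record, shaped like one line of a JSON Lines dataset.
# The field names match the DataConfig in this notebook; the values are invented.
sample_line = '{"question": "The capital of France is", "answers": "Paris<OR>City of Paris"}'

record = json.loads(sample_line)

# For a top-level key, a JMESPath query such as "question" is a key lookup.
model_input = record["question"]
target_output = record["answers"]

print(model_input)
print(target_output)
```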

In [None]:
config = DataConfig(
    dataset_name="trex_sample",
    dataset_uri="trex_sample.jsonl",
    dataset_mime_type=MIME_TYPE_JSONLINES,
    model_input_location="question",
    target_output_location="answers"
)

#### Model Runner Setup

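The model runner we create below performs inference on every sample in the dataset. `content_template` is the request body sent to Bedrock, with `$prompt` standing in for the prompt; `output` names the response field that holds the generated text (here `completion`, as in the sample invocation above).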

In [None]:
bedrock_model_runner = BedrockModelRunner(
    model_id=model_id,
    output='completion',
    content_template='{"prompt": $prompt, "max_tokens_to_sample": 500}'
)
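
One way to picture what `content_template` does: the `$prompt` placeholder is replaced with the JSON-encoded prompt, producing the same kind of request body that the sample invocation built by hand with `json.dumps`. The sketch below uses Python's `string.Template` for the substitution; that choice is an assumption made for illustration, not a statement about FMEval's internals.

```python
import json
from string import Template

content_template = '{"prompt": $prompt, "max_tokens_to_sample": 500}'
prompt = "Human: Who is Barack Obama?\n\nAssistant:\n"

# JSON-encode the prompt first so quotes and newlines are escaped,
# then substitute it for the $prompt placeholder.
body = Template(content_template).substitute(prompt=json.dumps(prompt))

# The result is valid JSON with the same shape as the hand-built request body.
parsed = json.loads(body)
print(parsed["max_tokens_to_sample"])
```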

### Run Evaluation

In [None]:
eval_algo = FactualKnowledge(FactualKnowledgeConfig(target_output_delimiter="<OR>"))
eval_output = eval_algo.evaluate(
    model=bedrock_model_runner,
    dataset_config=config,
    prompt_template="Human: $feature\n\nAssistant:\n",
    save=True,
)
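
The `target_output_delimiter` lets a single target string list several acceptable answers. A simplified sketch of the idea: split the target on `<OR>` and assign a score of 1 if any alternative appears in the model output, 0 otherwise. The helper name `simple_factual_score` and the case-insensitive substring match are assumptions for illustration; FMEval's own scoring may normalize text differently.

```python
def simple_factual_score(model_output: str, target_output: str, delimiter: str = "<OR>") -> int:
    """Return 1 if any delimited target appears in the model output, else 0."""
    # Split the target into alternatives, dropping empty pieces.
    targets = [t.strip().lower() for t in target_output.split(delimiter) if t.strip()]
    output = model_output.lower()
    return int(any(t in output for t in targets))


# Invented examples, not real model outputs.
print(simple_factual_score("The capital is Paris.", "Paris<OR>City of Paris"))
print(simple_factual_score("I am not sure.", "Paris<OR>City of Paris"))
```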

#### Parse Evaluation Results

In [None]:
# Pretty-print the evaluation output (notice the score).
import json
print(json.dumps(eval_output, default=vars, indent=4))

In [None]:
# Create a Pandas DataFrame to visualize the results
import pandas as pd

data = []

# The path below comes from the "output_path" field printed by the previous cell
with open("/tmp/eval_results/factual_knowledge_trex_sample.jsonl", "r") as file:
    for line in file:
        data.append(json.loads(line))
df = pd.DataFrame(data)
df['eval_algo'] = df['scores'].apply(lambda x: x[0]['name'])
df['eval_score'] = df['scores'].apply(lambda x: x[0]['value'])
df
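
The per-record scores can also be summarized without pandas. The sketch below averages the first score across a few in-memory records that mimic the `scores` layout read above (a list of dicts with `name` and `value`). The records and the score name are invented for illustration and are not real evaluation results.

```python
import io
import json

# Invented records mimicking the layout read above: scores[0] has "name" and "value".
fake_results = io.StringIO(
    '{"scores": [{"name": "factual_knowledge", "value": 1}]}\n'
    '{"scores": [{"name": "factual_knowledge", "value": 0}]}\n'
    '{"scores": [{"name": "factual_knowledge", "value": 1}]}\n'
    '{"scores": [{"name": "factual_knowledge", "value": 1}]}\n'
)

# Collect the first score of each record and average them.
values = [json.loads(line)["scores"][0]["value"] for line in fake_results if line.strip()]
mean_score = sum(values) / len(values)
print(mean_score)
```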