#### Computing custom prompt-based metrics in the AzureML SDK

This sample notebook shows how to compute custom prompt-based metrics with the AzureML SDK. It walks through the following steps:

<pre>
1) Define the custom prompt template
2) Initialize the custom prompt metric parameters
3) Compute the custom prompt-based metrics and get the results
</pre>

#### Prerequisites

1) Install the latest version of the azureml-metrics package (with the text extras) using the following command. The quotes keep shells such as zsh from interpreting the square brackets:

``` $ pip install --upgrade "azureml-metrics[text]" ```

For more details on the azureml-metrics package, see https://aka.ms/azureml-metrics-quick-start
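
To confirm the installation before running the rest of the notebook, a small check along these lines may help. This is a minimal sketch that uses only the Python standard library; `installed_version` is a hypothetical helper name introduced here, not part of azureml-metrics.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent.

    Hypothetical helper for illustration; not part of azureml-metrics.
    """
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


found = installed_version("azureml-metrics")
print(found if found else "azureml-metrics is not installed in this environment")
```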

##### 1) Define the custom prompt template

<pre>
a) Retrieve the existing prompt template for any built-in metric using the list_prompts API.
b) Modify the prompt template to suit your requirements.
</pre>

In [12]:
# a) Retrieve the existing prompt template for gpt_coherence metric.
from azureml.metrics import list_prompts, constants

coherence_prompt = list_prompts(task_type=constants.Tasks.QUESTION_ANSWERING,
                                metric="gpt_coherence")
print(coherence_prompt)

Coherence of an answer is measured by how well all the sentences fit together and sound naturally as a whole. Consider the overall quality of the answer when evaluating coherence. Given the question and answer, score the coherence of answer between one to five stars using the following rating scale:
One star: the answer completely lacks coherence
Two stars: the answer mostly lacks coherence
Three stars: the answer is partially coherent
Four stars: the answer is mostly coherent
Five stars: the answer has perfect coherency

This rating value should always be an integer between 1 and 5. So the rating produced should be 1 or 2 or 3 or 4 or 5.

question: What is your favorite indoor activity and why do you enjoy it?
answer: I like pizza. The sun is shining.
stars: 1

question: Can you describe your favorite movie without giving away any spoilers?
answer: It is a science fiction movie. There are dinosaurs. The actors eat cake. People must stop the villain.
stars: 2

question: What are some b

In [13]:
# b) Modify the existing prompt template for gpt_coherence metric based on our requirements.
custom_coherence_prompt = 'Coherence of an answer is measured by how well all the sentences fit together and sound naturally as a whole. Consider the overall quality of the answer when evaluating coherence. Given the question and answer, score the coherence of answer between one to five stars using the following rating scale:\nOne star: the answer completely lacks coherence\nTwo stars: the answer mostly lacks coherence\nThree stars: the answer is partially coherent\nFour stars: the answer is mostly coherent\nFive stars: the answer has perfect coherency\n\nThis rating value should always be an integer between 1 and 5. So the rating produced should be 1 or 2 or 3 or 4 or 5.\n\nquestion: What is your favorite indoor activity and why do you enjoy it?\nanswer: I like pizza. The sun is shining.\nstars: 1\n\nquestion: Can you describe your favorite movie without giving away any spoilers?\nanswer: It is a science fiction movie. There are dinosaurs. The actors eat cake. People must stop the villain.\nstars: 2\n\nquestion: What are some benefits of regular exercise?\nanswer: Regular exercise improves your mood. A good workout also helps you sleep better. Trees are green.\nstars: 3\n\nquestion: How do you cope with stress in your daily life?\nanswer: I usually go for a walk to clear my head. Listening to music helps me relax as well. Stress is a part of life, but we can manage it through some activities.\nstars: 4\n\nquestion: What can you tell me about climate change and its effects on the environment?\nanswer: Climate change has far-reaching effects on the environment. Rising temperatures result in the melting of polar ice caps, contributing to sea-level rise. Additionally, more frequent and severe weather events, such as hurricanes and heatwaves, can cause disruption to ecosystems and human societies alike.\nstars: 5\n\nquestion: {{question}}\nanswer: {{prediction}}\nstars:'
print(custom_coherence_prompt)

Coherence of an answer is measured by how well all the sentences fit together and sound naturally as a whole. Consider the overall quality of the answer when evaluating coherence. Given the question and answer, score the coherence of answer between one to five stars using the following rating scale:
One star: the answer completely lacks coherence
Two stars: the answer mostly lacks coherence
Three stars: the answer is partially coherent
Four stars: the answer is mostly coherent
Five stars: the answer has perfect coherency

This rating value should always be an integer between 1 and 5. So the rating produced should be 1 or 2 or 3 or 4 or 5.

question: What is your favorite indoor activity and why do you enjoy it?
answer: I like pizza. The sun is shining.
stars: 1

question: Can you describe your favorite movie without giving away any spoilers?
answer: It is a science fiction movie. There are dinosaurs. The actors eat cake. People must stop the villain.
stars: 2

question: What are some b
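
The custom template ends with the `{{question}}` and `{{prediction}}` placeholders, which are filled in for each example at evaluation time. The notebook does not show how the library performs that substitution, so the following is only a sketch of the idea using plain regular expressions. `render_prompt` is a hypothetical helper, and the short template below stands in for the tail of the full prompt.

```python
import re


def render_prompt(template, values):
    """Fill {{name}} placeholders with values[name].

    Hypothetical helper for illustration; the library's own rendering
    logic may differ.
    """
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder '{name}'")
        return str(values[name])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Stand-in for the last three lines of the custom coherence prompt.
template_tail = "question: {{question}}\nanswer: {{prediction}}\nstars:"
rendered = render_prompt(
    template_tail,
    {"question": "What is 2 + 2?", "prediction": "4"},
)
print(rendered)
# question: What is 2 + 2?
# answer: 4
# stars:
```

Leaving the prompt open after `stars:` is what invites the model to complete it with a single rating.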

##### 2) Initialize the custom prompt metric parameters

In [14]:
import os

context = "In 2018, a group of scientists discovered a new type of deep-sea fish that has a transparent head. The fish, named Barreleye, has tubular eyes that can rotate to look either upward or forward, allowing it to see potential prey and predators in the dark depths of the ocean."
question = "What is the name of the deep-sea fish discovered by scientists in 2018, and what is unique about its head?"
coherent_answer = "The deep-sea fish discovered by scientists in 2018 is called Barreleye, and it has a transparent head. The fish has tubular eyes that can rotate to look either upward or forward, allowing it to see potential prey and predators in the dark depths of the ocean."
incoherent_answer = "The scientists who made the discovery in 2018 were actually studying coral reefs, not deep-sea fish. However, they did come across an unusual creature that they couldn't identify. It turned out to be a type of sea cucumber that has a strange, tube-like shape."

# Note: Please replace the values for the following variables with your own values.
openai_params = {
    "api_version" : os.environ["OPENAI_API_VERSION"],
    "api_base" : os.environ["OPENAI_API_BASE"],
    "api_type" : os.environ["OPENAI_API_TYPE"],
    "api_key" : os.environ["OPENAI_API_KEY"],
    "deployment_id" : "<deployment_name>"
}

metric_name = "custom_coherence"
metric_description = "Computing custom coherence prompt based metric"
user_prompt_template = custom_coherence_prompt
# These names match the {{question}} and {{prediction}} placeholders in the prompt template.
input_vars = ["question", "prediction"]
# One entry per example: the same question is paired with each answer.
question_list = [question, question]
prediction_list = [coherent_answer, incoherent_answer]

custom_prompt_config = {
    "input_vars" : input_vars,
    "openai_params" : openai_params,
    "metric_name" : metric_name,
    "metric_description" : metric_description,
    "user_prompt_template" : user_prompt_template,
}
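
Because `input_vars` and the template placeholders have to agree, it can be worth checking them before calling the service. The sketch below is an assumption-labeled sanity check, not something the library is known to provide: `missing_placeholders` is a hypothetical helper, and `demo_template` stands in for the full prompt.

```python
import re


def missing_placeholders(template, input_vars):
    """Return the input variables that have no {{name}} placeholder in template.

    Hypothetical helper for illustration; not part of azureml-metrics.
    """
    found = set(re.findall(r"\{\{\s*(\w+)\s*\}\}", template))
    return [name for name in input_vars if name not in found]


# Stand-in for the tail of the custom coherence prompt.
demo_template = "question: {{question}}\nanswer: {{prediction}}\nstars:"

print(missing_placeholders(demo_template, ["question", "prediction"]))  # []
print(missing_placeholders(demo_template, ["question", "context"]))     # ['context']
```

An empty list means every declared input variable appears in the template.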

##### 3) Compute the custom prompt-based metrics and get the results

In [15]:
from pprint import pprint
from azureml.metrics import AzureMLCustomPromptMetric

custom_coherence_metric = AzureMLCustomPromptMetric(**custom_prompt_config)
result = custom_coherence_metric.compute(question=question_list,
                                         prediction=prediction_list)
pprint("Custom Coherence Result : \n" + str(result))
print("\n")
print("-"*60, "Custom Coherence Metric Details", "-"*60, sep="\n")
pprint(custom_coherence_metric.get_custom_metric_details())

100%|██████████| 2/2 [00:01<00:00,  1.67it/s]

('Custom Coherence Result : \n'
 "{'metrics': {'mean_custom_coherence': 4.0, 'median_custom_coherence': 4.0}, "
 "'artifacts': {'custom_coherence': ['5', '3']}}")


------------------------------------------------------------
Custom Coherence Metric Details
------------------------------------------------------------
{'custom_metric_computation_status': 'SUCCESS',
 'custom_metric_description': 'Computing custom coherence prompt based metric',
 'custom_metric_name': 'custom_coherence',
 'custom_metric_results': {'artifacts': {'custom_coherence': ['5', '3']},
                           'metrics': {'mean_custom_coherence': 4.0,
                                       'median_custom_coherence': 4.0}}}
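
The per-example ratings come back as strings in `artifacts`, while `metrics` holds their aggregates. The sketch below reproduces the reported mean and median from the raw scores shown above. It assumes the aggregates are a plain mean and median over the numeric ratings; skipping non-numeric responses is a further assumption, and `summarize_scores` is a hypothetical helper rather than a library function.

```python
from statistics import mean, median


def summarize_scores(raw_scores, metric_name):
    """Aggregate string ratings into mean and median values.

    Hypothetical helper for illustration. Ratings that cannot be parsed
    as numbers are skipped, which is an assumption about desired behavior.
    """
    numeric = []
    for score in raw_scores:
        try:
            numeric.append(float(score))
        except (TypeError, ValueError):
            continue
    if not numeric:
        return {}
    return {
        f"mean_{metric_name}": mean(numeric),
        f"median_{metric_name}": median(numeric),
    }


summary = summarize_scores(["5", "3"], "custom_coherence")
print(summary)
# {'mean_custom_coherence': 4.0, 'median_custom_coherence': 4.0}
```

Here the coherent answer was rated 5 and the incoherent one 3, which gives the mean and median of 4.0 seen in the output.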



