## Evaluating the Answers

### 1. Set Up the Environment

In [None]:
ds_name = "output_data_1000"
doc_path = "../../sample_data/"
ds_path = f"../../{ds_name}"
print("Using dataset: " + ds_name)

raft_arrow_file = f"{ds_path}/data-00000-of-00001.arrow"
dataset_path = f"{ds_path}-files/{ds_name}-full.jsonl"
dataset_path_hf = f"{ds_path}-files/{ds_name}-hf.full.jsonl"

dataset_path_hf_train = f"{ds_path}-files/{ds_name}-hf.train.jsonl"
dataset_path_hf_valid = f"{ds_path}-files/{ds_name}-hf.valid.jsonl"
dataset_path_hf_eval = f"{ds_path}-files/{ds_name}-hf.eval.jsonl"

dataset_path_ft_train = f"{ds_path}-files/{ds_name}-ft.train.jsonl"
dataset_path_ft_train_filtered = f"{ds_path}-files/{ds_name}-ft.train.filtered.jsonl"
dataset_path_ft_valid = f"{ds_path}-files/{ds_name}-ft.valid.jsonl"
dataset_path_ft_valid_filtered = f"{ds_path}-files/{ds_name}-ft.valid.filtered.jsonl"
dataset_path_ft_eval = f"{ds_path}-files/{ds_name}-ft.eval.jsonl"
dataset_path_ft_eval_5 = f"{ds_path}-files/{ds_name}-ft.eval-5.jsonl"
dataset_path_ft_answer = f"{ds_path}-files/{ds_name}-ft.answer.jsonl"
dataset_path_local_results = f"{ds_path}-files/{ds_name}.pf-eval-local-results.jsonl"
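
Before running the later steps, it can help to confirm that the input files exist. The following is a minimal sketch, not part of the original notebook: `missing_files` is a hypothetical helper, and the two paths are rebuilt from the same pattern as `dataset_path_ft_eval` and `dataset_path_hf_eval` above so the block is self-contained.

```python
from pathlib import Path

# Rebuilt here so the sketch runs on its own; same values as the cell above.
ds_name = "output_data_1000"
ds_path = f"../../{ds_name}"
eval_inputs = [
    f"{ds_path}-files/{ds_name}-ft.eval.jsonl",  # input to answer.py
    f"{ds_path}-files/{ds_name}-hf.eval.jsonl",  # input to eval.py
]


def missing_files(paths):
    """Return the paths that do not exist on disk, in input order."""
    return [p for p in paths if not Path(p).exists()]


# Report anything missing instead of failing later inside a script.
for path in missing_files(eval_inputs):
    print("Missing input file: " + path)
```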


### 2. Set the Model Name for Evaluation

In [None]:
model_name = "gpt-4-0613.ft-88d65450f4204c35b7857e330659e247"
deployment_name = "gpt-4-0613-ft-88d65450f4204c35b7857e330659e247"

### 3a. Generate Answers from the Fine-Tuned Chat Completions Model
This step uses the fine-tuned model specified above to answer every question in the evaluation data and saves the answers to the answers file.
That file can then be uploaded to the Evaluation tool in Azure AI Studio to run the quality checks remotely.

The system prompt will be loaded from the template file based on the key specified. The default is "gpt" and is loaded from gpt_template.txt.
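
The key-to-file mapping might look like the sketch below. It assumes the file name follows the pattern `<key>_template.txt`, which is inferred only from the single `gpt` example above; the function names and the `custom` key are hypothetical, and the actual lookup in answer.py is not shown here.

```python
from pathlib import Path


def template_path(templates_dir, key="gpt"):
    """Build the path of the template file for a key.

    Assumption: files are named '<key>_template.txt', as in the
    'gpt' -> 'gpt_template.txt' example.
    """
    return Path(templates_dir) / f"{key}_template.txt"


def load_system_prompt(templates_dir, key="gpt"):
    """Read the template text that would be used as the system prompt."""
    return template_path(templates_dir, key).read_text(encoding="utf-8")


# The default key resolves to gpt_template.txt in the given directory.
print(template_path("./"))
```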

To answer only a random subset of questions, pass the --count parameter to the script. The random subset is limited to questions that do not start with a #.
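
As an illustration of that selection rule, the sketch below filters out questions that start with `#` and then draws a random sample. The function name, the `seed` argument, and the use of plain strings are assumptions made for the example; how answer.py implements the selection is not shown in this notebook.

```python
import random


def sample_questions(questions, count, seed=None):
    """Pick up to `count` random questions, skipping ones that start with '#'."""
    eligible = [q for q in questions if not q.startswith("#")]
    rng = random.Random(seed)  # a seed makes the sample reproducible
    # Never ask for more items than are eligible.
    return rng.sample(eligible, min(count, len(eligible)))


# Made-up questions for demonstration only.
questions = ["# skipped question", "What is RAFT?", "How is the data split?"]
print(sample_questions(questions, count=1, seed=0))
```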

The following environment variables must be set:
- EVAL_AZURE_OPENAI_ENDPOINT
- EVAL_AZURE_OPENAI_API_KEY
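
A quick check that these variables are set can catch configuration problems before the script starts. This is a minimal sketch; `missing_env` is a hypothetical helper, and the same function can be reused with the longer variable lists in the later steps.

```python
import os


def missing_env(names, environ=None):
    """Return the variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


required = ["EVAL_AZURE_OPENAI_ENDPOINT", "EVAL_AZURE_OPENAI_API_KEY"]
missing = missing_env(required)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```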

In [None]:
! python3 ../answer.py \
--input $dataset_path_ft_eval \
--output $dataset_path_ft_answer \
--model $model_name \
--deployment $deployment_name \
--templates ./ \
--count 50

### 4a. Run the Azure AI Studio Evaluation Tool Locally with the Answers File: Run pfeval-local.py
This runs the prompt flow evaluation on the answers file generated in the previous step. It is an alternative to uploading the file to Azure AI Studio.

The following environment variables must be set:
- SCORE_AZURE_OPENAI_ENDPOINT
- SCORE_AZURE_OPENAI_API_KEY
- SCORE_AZURE_OPENAI_DEPLOYMENT
- GROUNDEDNESS_SUB_ID
- GROUNDEDNESS_GROUP
- GROUNDEDNESS_PROJECT_NAME
- REPORT_SUB_ID
- REPORT_GROUP
- REPORT_PROJECT_NAME

In [None]:
! python3 ../pfeval-local.py \
--input $dataset_path_ft_answer \
--output $dataset_path_local_results \
--model_source $model_name 
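
Once the run finishes, the results file can be inspected from Python. The sketch below assumes only that the file holds one JSON object per line, as the `.jsonl` extension suggests; the field names written by pfeval-local.py are not documented here, so the example uses made-up records and prints whatever keys are present.

```python
import json


def parse_jsonl(lines):
    """Parse an iterable of JSON Lines strings, ignoring blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def read_jsonl(path):
    """Load every record from a .jsonl file."""
    with open(path, encoding="utf-8") as handle:
        return parse_jsonl(handle)


# Made-up records for demonstration; with real output, call
# read_jsonl(dataset_path_local_results) instead.
sample = ['{"id": 1}\n', '\n', '{"id": 2}\n']
records = parse_jsonl(sample)
print(len(records), "records; keys of first:", sorted(records[0]))
```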

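### 3b. Run the Azure AI Studio Evaluation Tool while Generating New Target Answers for a Chat Completions Model: Run pfeval-chat.py
This runs the prompt flow evaluation while retrieving a new answer from the model for each question in the evaluation file. The command below references score_model_name, which is not defined in the cells above and must be set before this cell is run.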

The following environment variables must be set:
- EVAL_AZURE_OPENAI_ENDPOINT
- EVAL_AZURE_OPENAI_API_KEY
- SCORE_AZURE_OPENAI_ENDPOINT
- SCORE_AZURE_OPENAI_API_KEY
- SCORE_OPENAI_API_VERSION
- SCORE_AZURE_OPENAI_DEPLOYMENT
- GROUNDEDNESS_SUB_ID
- GROUNDEDNESS_GROUP
- GROUNDEDNESS_PROJECT_NAME
- REPORT_SUB_ID
- REPORT_GROUP
- REPORT_PROJECT_NAME

In [None]:
! python3 ../pfeval-chat.py \
--input $dataset_path_ft_eval \
--output $dataset_path_local_results \
--score-model $score_model_name \
--model $model_name \
--deployment $deployment_name \
--templates "../"

### 3c. For Completions Models: Run eval.py to Generate Answers and Run the Comparison Locally
This runs the evaluation locally for a Completions model.

The following environment variables must be set:
- EVAL_AZURE_OPENAI_ENDPOINT
- EVAL_AZURE_OPENAI_API_KEY

In [None]:
! python3 ../eval.py \
--question-file $dataset_path_hf_eval \
--answer-file $dataset_path_ft_answer \
--model $model_name

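### 3d. Run the Azure AI Studio Evaluation Tool while Generating New Target Answers for a Completions Model: Run pfeval.py
This runs the prompt flow evaluation while retrieving a new answer from the model for each question in the evaluation file. It is an alternative to the eval.py script. The command below references score_model_name, which is not defined in the cells above and must be set before this cell is run.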

The following environment variables must be set:
- EVAL_AZURE_OPENAI_ENDPOINT
- EVAL_AZURE_OPENAI_API_KEY
- SCORE_AZURE_OPENAI_ENDPOINT
- SCORE_AZURE_OPENAI_API_KEY
- SCORE_OPENAI_API_VERSION
- SCORE_AZURE_OPENAI_DEPLOYMENT
- GROUNDEDNESS_SUB_ID
- GROUNDEDNESS_GROUP
- GROUNDEDNESS_PROJECT_NAME
- REPORT_SUB_ID
- REPORT_GROUP
- REPORT_PROJECT_NAME

In [None]:
! python3 ../pfeval.py \
--input $dataset_path_ft_eval \
--output $dataset_path_local_results \
--score-model $score_model_name \
--model $model_name \
--deployment $deployment_name \
--templates "../"