In [1]:
%pip install --upgrade --quiet google-cloud-aiplatform google-cloud-aiplatform[evaluation]

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/6.5 MB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.1/6.5 MB[0m [31m63.0 MB/s[0m eta [36m0:00:01[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━━━━━━━━━[0m [32m4.2/6.5 MB[0m [31m69.5 MB/s[0m eta [36m0:00:01[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m [32m6.5/6.5 MB[0m [31m65.6 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m6.5/6.5 MB[0m [31m51.7 MB/s[0m eta [36m0:00:00[0m
[?25h

In [2]:
import pandas as pd
from inspect import cleandoc
from IPython.display import display, Markdown

import vertexai
from vertexai.generative_models import GenerativeModel, GenerationConfig
from vertexai.evaluation import (
    MetricPromptTemplateExamples,
    EvalTask,
    PairwiseMetric,
    PairwiseMetricPromptTemplate,
    PointwiseMetric,
    PointwiseMetricPromptTemplate,
)

pd.set_option("display.max_colwidth", None)

In [None]:
vertexai.init(project="<GCP PROJECT ID>", location="<REGION>")

In [4]:
hourly_rates = cleandoc("""
  Screenwriter: $40
  Actor: $25
  Director: $30
  Camera Operator: $35
  Sound Engineer: $20
  Editor: $30
  """)

planning_notes = cleandoc("""
 Phases of Production:
   Writing:
   The Screenwriter will write the script.
   They need 72 hours to do so.


   Pre-Production:
   The Director needs time to analyze the script.
   They will work on it for 36 hours.
   The Camera Operator will join the director for 24 hours of planning.


   Production Phase 1
   The first three days of filming will require the director, 4 actors, the camera operator, and the sound engineer


   Production Phase 2
   The next three days of filming will require the director, 8 actors, the camera operator, and the sound engineer


   Post-Production
   The editor will take 64 hours to edit the film.
   The director will work with the editor for 24 hours during this phase.
""")

In [5]:
tasks = [
  "What is the cost of each phase of production? \
  If days are mentioned, assume an 8 hour work day.",

  "How many days will each phase require? Assume an \
  8 hour work day. If multiple people are working in parallel, \
  do not add those times together, but only use the longest time. \
  Also include a count of the total number of days of the entire \
  project.",

  "Prepare a text schedule for all phases of the film starting \
  on Feb 3, 2025. The whole crew should be off Saturdays \
  and Sundays.",
]

In [6]:
prompt_template = cleandoc("""

  Prepare a document to fulfill the task based on the context provided.


  {task}


  {context}

  """)

In [8]:
llm_pro = GenerativeModel(
  "gemini-1.5-pro-001",
  generation_config={
      "temperature": 0,
  },
)

llm_flash = GenerativeModel(
  "gemini-1.5-flash-001",
  generation_config={
      "temperature": 0,
  },
)

context = hourly_rates + "\n\n" + planning_notes

prompt = prompt_template.format(task=tasks[1], context=context)

Markdown(llm_pro.generate_content(prompt).text)

## Project Timeline & Total Days Calculation

Here's a breakdown of the time required for each phase, assuming an 8-hour workday:

**Phase:** | **Description** | **Duration (Days)**
------- | -------- | --------
Writing | Screenwriter writes the script (72 hours) | **9 days** (72 hours / 8 hours/day)
Pre-Production | Director analyzes the script (36 hours)  | **4.5 days** (36 hours / 8 hours/day)
 | Camera Operator joins for planning (24 hours) | **3 days** (24 hours / 8 hours/day)
Production Phase 1 | Filming with Director, 4 Actors, Camera Operator, Sound Engineer (3 days) | **3 days** 
Production Phase 2 | Filming with Director, 8 Actors, Camera Operator, Sound Engineer (3 days) | **3 days**
Post-Production | Editor edits the film (64 hours) | **8 days** (64 hours / 8 hours/day)
 | Director collaborates with Editor (24 hours) | **3 days** (24 hours / 8 hours/day)

**Total Project Days:** 9 + 4.5 + 3 + 3 + 3 + 8 + 3 = **33.5 days**

**Note:** 

* We only consider the longest duration within each phase to avoid double-counting parallel work.
* The total project days represent the calendar days needed to complete all phases. 


In [9]:
Markdown(llm_flash.generate_content(prompt).text)

## Project Timeline and Budget

**Assumptions:**

* 8-hour workday
* Parallel tasks do not add time, only the longest duration is considered.

**Phase Breakdown:**

| Phase | Task | Duration (hours) | Duration (days) | Personnel |
|---|---|---|---|---|
| **Writing** | Screenwriter writes script | 72 | 9 | Screenwriter |
| **Pre-Production** | Director analyzes script | 36 | 4.5 | Director |
| **Pre-Production** | Director and Camera Operator plan | 24 | 3 | Director, Camera Operator |
| **Production Phase 1** | Filming | 24 | 3 | Director, 4 Actors, Camera Operator, Sound Engineer |
| **Production Phase 2** | Filming | 24 | 3 | Director, 8 Actors, Camera Operator, Sound Engineer |
| **Post-Production** | Editor edits film | 64 | 8 | Editor |
| **Post-Production** | Director collaborates with editor | 24 | 3 | Director, Editor |

**Total Project Duration:** 9 + 4.5 + 3 + 3 + 3 + 8 + 3 = **33.5 days**

**Budget Breakdown:**

| Role | Hourly Rate | Total Hours | Total Cost |
|---|---|---|---|
| Screenwriter | $40 | 72 | $2880 |
| Director | $30 | 84 | $2520 |
| Camera Operator | $35 | 48 | $1680 |
| Sound Engineer | $20 | 48 | $960 |
| Editor | $30 | 64 | $1920 |
| Actors | $25 | 96 | $2400 | 
| **Total** | | | **$12,360** |

**Notes:**

* The actor's cost is calculated based on the total hours worked across both production phases (4 actors x 24 hours + 8 actors x 24 hours = 96 hours).
* The budget does not include any additional costs such as equipment rentals, location fees, or post-production software. 


In [10]:
response_pro = llm_pro.generate_content(prompt).text
response_flash = llm_flash.generate_content(prompt).text

eval_dataset = pd.DataFrame({
    "prompt": prompt[0:5],
    "baseline": [response_pro] * 5,  # Copying Gemini Pro response for each prompt
    "candidate": [response_flash] * 5,  # Copying Gemini Flash response for each prompt
})

In [11]:
from vertexai.evaluation import (
    MetricPromptTemplateExamples,
    EvalTask,
    PairwiseMetric,
    PairwiseMetricPromptTemplate,
    PointwiseMetric,
    PointwiseMetricPromptTemplate,
)

eval_task = EvalTask(
    dataset=eval_dataset,  # The dataset with prompts, baseline, and candidate responses
    metrics= [MetricPromptTemplateExamples.Pairwise.QUESTION_ANSWERING_QUALITY],  # Metric to evaluate quality
    experiment="indie-film-planning",  # Name for the experiment
    metric_column_mapping={
        "baseline_model_response": "baseline",  # Map 'baseline' to 'baseline_model_response'
        "candidate_model_response": "candidate",  # Map 'candidate' to 'candidate_model_response'
    }
)

# Display the EvalTask to verify
print(eval_task)

<vertexai.evaluation.eval_task.EvalTask object at 0x7914192e6f20>


In [16]:
import datetime
import json
run_ts = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

eval_results_to_compare = []
eval_result = eval_task.evaluate(
    model=llm_pro
)
eval_results_to_compare.append(eval_result)

eval_result = eval_task.evaluate(
    model=llm_flash
)
eval_results_to_compare.append(eval_result)

INFO:google.cloud.aiplatform.metadata.experiment_resources:Associating projects/49437052547/locations/us-west1/metadataStores/default/contexts/indie-film-planning-68e05650-e2e4-457f-ba37-c8517af9ec62 to Experiment: indie-film-planning


INFO:vertexai.evaluation.eval_task:Logging Eval Experiment metadata: {'model_name': 'publishers/google/models/gemini-1.5-pro-001', 'temperature': 0}
INFO:vertexai.evaluation._evaluation:Generating a total of 5 responses from Gemini model gemini-1.5-pro-001.
100%|██████████| 5/5 [00:03<00:00,  1.44it/s]
INFO:vertexai.evaluation._evaluation:All 5 responses are successfully generated from Gemini model gemini-1.5-pro-001.
INFO:vertexai.evaluation._evaluation:Multithreaded Batch Inference took: 3.4843618849999984 seconds.
INFO:vertexai.evaluation._evaluation:Computing metrics with a total of 5 Vertex Gen AI Evaluation Service API requests.
100%|██████████| 5/5 [00:05<00:00,  1.14s/it]
INFO:vertexai.evaluation._evaluation:All 5 metric requests are successfully computed.
INFO:vertexai.evaluation._evaluation:Evaluation Took:5.6890636839999615 seconds


INFO:google.cloud.aiplatform.metadata.experiment_resources:Associating projects/49437052547/locations/us-west1/metadataStores/default/contexts/indie-film-planning-882d71e6-e5b6-4893-bb8e-671bd1d1a9a1 to Experiment: indie-film-planning


INFO:vertexai.evaluation.eval_task:Logging Eval Experiment metadata: {'model_name': 'publishers/google/models/gemini-1.5-flash-001', 'temperature': 0}
INFO:vertexai.evaluation._evaluation:Generating a total of 5 responses from Gemini model gemini-1.5-flash-001.
100%|██████████| 5/5 [00:00<00:00,  5.47it/s]
INFO:vertexai.evaluation._evaluation:All 5 responses are successfully generated from Gemini model gemini-1.5-flash-001.
INFO:vertexai.evaluation._evaluation:Multithreaded Batch Inference took: 0.9244996399999081 seconds.
INFO:vertexai.evaluation._evaluation:Computing metrics with a total of 5 Vertex Gen AI Evaluation Service API requests.
100%|██████████| 5/5 [00:05<00:00,  1.08s/it]
INFO:vertexai.evaluation._evaluation:All 5 metric requests are successfully computed.
INFO:vertexai.evaluation._evaluation:Evaluation Took:5.426754564000021 seconds


In [21]:
for eval_result in eval_results_to_compare:
    print(eval_result.summary_metrics)

{'row_count': 5, 'pairwise_question_answering_quality/candidate_model_win_rate': 1.0, 'pairwise_question_answering_quality/baseline_model_win_rate': 0.0}
{'row_count': 5, 'pairwise_question_answering_quality/candidate_model_win_rate': 1.0, 'pairwise_question_answering_quality/baseline_model_win_rate': 0.0}


In [50]:
for eval_result in eval_results_to_compare:
    display(eval_result.metrics_table)

Unnamed: 0,prompt,baseline,candidate,response,pairwise_question_answering_quality/explanation,pairwise_question_answering_quality/pairwise_choice
0,Prepa,"## Project Timeline & Total Days Calculation\n\nHere's a breakdown of the time required for each phase, assuming an 8-hour workday:\n\n**Phase:** | **Description** | **Duration (Days)**\n------- | -------- | --------\nWriting | Screenwriter writes the script (72 hours) | **9 days** (72 hours / 8 hours/day)\nPre-Production | Director analyzes the script (36 hours) | **4.5 days** (36 hours / 8 hours/day) \n | Camera Operator joins Director for planning (24 hours) | **3 days** (24 hours / 8 hours/day)\nProduction Phase 1 | Filming with Director, 4 Actors, Camera Operator, Sound Engineer (3 days) | **3 days**\nProduction Phase 2 | Filming with Director, 8 Actors, Camera Operator, Sound Engineer (3 days) | **3 days**\nPost-Production | Editor edits the film (64 hours) | **8 days** (64 hours / 8 hours/day)\n | Director collaborates with Editor (24 hours) | **3 days** (24 hours / 8 hours/day)\n\n**Total Project Days:** 9 + 4.5 + 3 + 3 + 3 + 8 + 3 = **33.5 days**\n\n**Note:** We only consider the longest duration within each phase to avoid double-counting parallel work. \n","## Project Timeline and Budget\n\n**Assumptions:**\n\n* 8-hour workday\n* Parallel tasks do not add time, only the longest duration is considered.\n\n**Phase Breakdown:**\n\n| Phase | Task | Duration (hours) | Duration (days) | Personnel |\n|---|---|---|---|---|\n| **Writing** | Screenwriter writes script | 72 | 9 | Screenwriter |\n| **Pre-Production** | Director analyzes script | 36 | 4.5 | Director |\n| **Pre-Production** | Director and Camera Operator plan | 24 | 3 | Director, Camera Operator |\n| **Production Phase 1** | Filming | 24 | 3 | Director, 4 Actors, Camera Operator, Sound Engineer |\n| **Production Phase 2** | Filming | 24 | 3 | Director, 8 Actors, Camera Operator, Sound Engineer |\n| **Post-Production** | Editor edits film | 64 | 8 | Editor |\n| **Post-Production** | Director collaborates with editor | 24 | 3 | Director, Editor |\n\n**Total Project Duration:** 9 + 4.5 + 3 + 3 + 3 + 8 + 3 = **33.5 days**\n\n**Budget Breakdown:**\n\n| Role | Hourly Rate | Total Hours | Total Cost |\n|---|---|---|---|\n| Screenwriter | $40 | 72 | $2880 |\n| Director | $30 | 84 | $2520 |\n| Camera Operator | $35 | 48 | $1680 |\n| Sound Engineer | $20 | 48 | $960 |\n| Editor | $30 | 64 | $1920 |\n| Actors | $25 | 96 | $2400 |\n\n**Total Project Cost:** $2880 + $2520 + $1680 + $960 + $1920 + $2400 = **$12,360** \n","Please provide me with more context or information. ""Prepa"" can refer to several things, such as:\n\n* **Preparatoria (Prepa):** A type of high school in Mexico and some other Latin American countries.\n* **Puerto Rico Electric Power Authority (PREPA):** The main electricity provider in Puerto Rico.\n* **Preparation:** Shortened form of the word, meaning the act of getting ready for something.\n\nPlease tell me:\n\n* **What is the context of your question?** \n* **What do you want to know about ""Prepa""?**\n\nOnce you provide more information, I can give you a more helpful and relevant answer. \n","BASELINE response hallucinates a detailed project plan for something related to film production, while CANDIDATE response correctly asks for clarification as the prompt is unclear.",CANDIDATE
1,Prepa,"## Project Timeline & Total Days Calculation\n\nHere's a breakdown of the time required for each phase, assuming an 8-hour workday:\n\n**Phase:** | **Description** | **Duration (Days)**\n------- | -------- | --------\nWriting | Screenwriter writes the script (72 hours) | **9 days** (72 hours / 8 hours/day)\nPre-Production | Director analyzes the script (36 hours) | **4.5 days** (36 hours / 8 hours/day) \n | Camera Operator joins Director for planning (24 hours) | **3 days** (24 hours / 8 hours/day)\nProduction Phase 1 | Filming with Director, 4 Actors, Camera Operator, Sound Engineer (3 days) | **3 days**\nProduction Phase 2 | Filming with Director, 8 Actors, Camera Operator, Sound Engineer (3 days) | **3 days**\nPost-Production | Editor edits the film (64 hours) | **8 days** (64 hours / 8 hours/day)\n | Director collaborates with Editor (24 hours) | **3 days** (24 hours / 8 hours/day)\n\n**Total Project Days:** 9 + 4.5 + 3 + 3 + 3 + 8 + 3 = **33.5 days**\n\n**Note:** We only consider the longest duration within each phase to avoid double-counting parallel work. \n","## Project Timeline and Budget\n\n**Assumptions:**\n\n* 8-hour workday\n* Parallel tasks do not add time, only the longest duration is considered.\n\n**Phase Breakdown:**\n\n| Phase | Task | Duration (hours) | Duration (days) | Personnel |\n|---|---|---|---|---|\n| **Writing** | Screenwriter writes script | 72 | 9 | Screenwriter |\n| **Pre-Production** | Director analyzes script | 36 | 4.5 | Director |\n| **Pre-Production** | Director and Camera Operator plan | 24 | 3 | Director, Camera Operator |\n| **Production Phase 1** | Filming | 24 | 3 | Director, 4 Actors, Camera Operator, Sound Engineer |\n| **Production Phase 2** | Filming | 24 | 3 | Director, 8 Actors, Camera Operator, Sound Engineer |\n| **Post-Production** | Editor edits film | 64 | 8 | Editor |\n| **Post-Production** | Director collaborates with editor | 24 | 3 | Director, Editor |\n\n**Total Project Duration:** 9 + 4.5 + 3 + 3 + 3 + 8 + 3 = **33.5 days**\n\n**Budget Breakdown:**\n\n| Role | Hourly Rate | Total Hours | Total Cost |\n|---|---|---|---|\n| Screenwriter | $40 | 72 | $2880 |\n| Director | $30 | 84 | $2520 |\n| Camera Operator | $35 | 48 | $1680 |\n| Sound Engineer | $20 | 48 | $960 |\n| Editor | $30 | 64 | $1920 |\n| Actors | $25 | 96 | $2400 |\n\n**Total Project Cost:** $2880 + $2520 + $1680 + $960 + $1920 + $2400 = **$12,360** \n","Please give me more context! What about ""prepa"" are you interested in? \n\nFor example, are you asking about:\n\n* **""Prepa"" as a shortened form of ""preparatoria,"" the Mexican equivalent of high school?** \n * Are you looking for information about a specific preparatoria?\n * Do you want to know about the Mexican education system?\n* **""PREPA"" as an acronym for something else?** \n * There are many organizations and concepts that use this acronym. \n* **Something else entirely?**\n\nPlease provide more details so I can give you a helpful response! \n","BASELINE response hallucinates a project plan based on the prompt ""prepa"". CANDIDATE response asks clarifying questions which is appropriate given the vague prompt.",CANDIDATE
2,Prepa,"## Project Timeline & Total Days Calculation\n\nHere's a breakdown of the time required for each phase, assuming an 8-hour workday:\n\n**Phase:** | **Description** | **Duration (Days)**\n------- | -------- | --------\nWriting | Screenwriter writes the script (72 hours) | **9 days** (72 hours / 8 hours/day)\nPre-Production | Director analyzes the script (36 hours) | **4.5 days** (36 hours / 8 hours/day) \n | Camera Operator joins Director for planning (24 hours) | **3 days** (24 hours / 8 hours/day)\nProduction Phase 1 | Filming with Director, 4 Actors, Camera Operator, Sound Engineer (3 days) | **3 days**\nProduction Phase 2 | Filming with Director, 8 Actors, Camera Operator, Sound Engineer (3 days) | **3 days**\nPost-Production | Editor edits the film (64 hours) | **8 days** (64 hours / 8 hours/day)\n | Director collaborates with Editor (24 hours) | **3 days** (24 hours / 8 hours/day)\n\n**Total Project Days:** 9 + 4.5 + 3 + 3 + 3 + 8 + 3 = **33.5 days**\n\n**Note:** We only consider the longest duration within each phase to avoid double-counting parallel work. \n","## Project Timeline and Budget\n\n**Assumptions:**\n\n* 8-hour workday\n* Parallel tasks do not add time, only the longest duration is considered.\n\n**Phase Breakdown:**\n\n| Phase | Task | Duration (hours) | Duration (days) | Personnel |\n|---|---|---|---|---|\n| **Writing** | Screenwriter writes script | 72 | 9 | Screenwriter |\n| **Pre-Production** | Director analyzes script | 36 | 4.5 | Director |\n| **Pre-Production** | Director and Camera Operator plan | 24 | 3 | Director, Camera Operator |\n| **Production Phase 1** | Filming | 24 | 3 | Director, 4 Actors, Camera Operator, Sound Engineer |\n| **Production Phase 2** | Filming | 24 | 3 | Director, 8 Actors, Camera Operator, Sound Engineer |\n| **Post-Production** | Editor edits film | 64 | 8 | Editor |\n| **Post-Production** | Director collaborates with editor | 24 | 3 | Director, Editor |\n\n**Total Project Duration:** 9 + 4.5 + 3 + 3 + 3 + 8 + 3 = **33.5 days**\n\n**Budget Breakdown:**\n\n| Role | Hourly Rate | Total Hours | Total Cost |\n|---|---|---|---|\n| Screenwriter | $40 | 72 | $2880 |\n| Director | $30 | 84 | $2520 |\n| Camera Operator | $35 | 48 | $1680 |\n| Sound Engineer | $20 | 48 | $960 |\n| Editor | $30 | 64 | $1920 |\n| Actors | $25 | 96 | $2400 |\n\n**Total Project Cost:** $2880 + $2520 + $1680 + $960 + $1920 + $2400 = **$12,360** \n","Please provide me with more context or information! ""Prepa"" can refer to many things, such as:\n\n* **Preparatory school:** A type of secondary school that prepares students for college. \n* **Puerto Rico Electric Power Authority (PREPA):** The public corporation responsible for electricity generation and distribution in Puerto Rico.\n* **Something else entirely!** \n\n**Tell me more about what you're interested in, and I can give you a more helpful response.** For example:\n\n* ""What are the best prepa schools in New York City?""\n* ""What is the current status of PREPA's debt crisis?""\n* ""How do I say 'preparation' in Spanish?"" \n\nI'm here to help! 😊 \n","CANDIDATE response correctly asks for clarification as the prompt is incomplete and unclear, while BASELINE response hallucinates a project plan based on the ""prepa"" fragment, which could stand for ""preparation.""",CANDIDATE
3,Prepa,"## Project Timeline & Total Days Calculation\n\nHere's a breakdown of the time required for each phase, assuming an 8-hour workday:\n\n**Phase:** | **Description** | **Duration (Days)**\n------- | -------- | --------\nWriting | Screenwriter writes the script (72 hours) | **9 days** (72 hours / 8 hours/day)\nPre-Production | Director analyzes the script (36 hours) | **4.5 days** (36 hours / 8 hours/day) \n | Camera Operator joins Director for planning (24 hours) | **3 days** (24 hours / 8 hours/day)\nProduction Phase 1 | Filming with Director, 4 Actors, Camera Operator, Sound Engineer (3 days) | **3 days**\nProduction Phase 2 | Filming with Director, 8 Actors, Camera Operator, Sound Engineer (3 days) | **3 days**\nPost-Production | Editor edits the film (64 hours) | **8 days** (64 hours / 8 hours/day)\n | Director collaborates with Editor (24 hours) | **3 days** (24 hours / 8 hours/day)\n\n**Total Project Days:** 9 + 4.5 + 3 + 3 + 3 + 8 + 3 = **33.5 days**\n\n**Note:** We only consider the longest duration within each phase to avoid double-counting parallel work. \n","## Project Timeline and Budget\n\n**Assumptions:**\n\n* 8-hour workday\n* Parallel tasks do not add time, only the longest duration is considered.\n\n**Phase Breakdown:**\n\n| Phase | Task | Duration (hours) | Duration (days) | Personnel |\n|---|---|---|---|---|\n| **Writing** | Screenwriter writes script | 72 | 9 | Screenwriter |\n| **Pre-Production** | Director analyzes script | 36 | 4.5 | Director |\n| **Pre-Production** | Director and Camera Operator plan | 24 | 3 | Director, Camera Operator |\n| **Production Phase 1** | Filming | 24 | 3 | Director, 4 Actors, Camera Operator, Sound Engineer |\n| **Production Phase 2** | Filming | 24 | 3 | Director, 8 Actors, Camera Operator, Sound Engineer |\n| **Post-Production** | Editor edits film | 64 | 8 | Editor |\n| **Post-Production** | Director collaborates with editor | 24 | 3 | Director, Editor |\n\n**Total Project Duration:** 9 + 4.5 + 3 + 3 + 3 + 8 + 3 = **33.5 days**\n\n**Budget Breakdown:**\n\n| Role | Hourly Rate | Total Hours | Total Cost |\n|---|---|---|---|\n| Screenwriter | $40 | 72 | $2880 |\n| Director | $30 | 84 | $2520 |\n| Camera Operator | $35 | 48 | $1680 |\n| Sound Engineer | $20 | 48 | $960 |\n| Editor | $30 | 64 | $1920 |\n| Actors | $25 | 96 | $2400 |\n\n**Total Project Cost:** $2880 + $2520 + $1680 + $960 + $1920 + $2400 = **$12,360** \n","Please give me more context! What about ""prepa"" are you interested in? \n\nFor example, are you asking about:\n\n* **""Prepa"" as a shortened form of ""preparatoria,"" the Mexican equivalent of high school?** \n * If so, what do you want to know about it? The curriculum, the application process, the social life?\n* **""PREPA"" as an acronym for something else?** \n * There are many possibilities, so please tell me what field or topic it relates to.\n* **Something else entirely?**\n\nThe more information you provide, the better I can understand your request and give you a helpful response. 😊 \n","CANDIDATE response seeks clarification for the ambiguous prompt ""Prepa"", while BASELINE response hallucinates a project plan based on an assumption that is not provided in the prompt.",CANDIDATE
4,Prepa,"## Project Timeline & Total Days Calculation\n\nHere's a breakdown of the time required for each phase, assuming an 8-hour workday:\n\n**Phase:** | **Description** | **Duration (Days)**\n------- | -------- | --------\nWriting | Screenwriter writes the script (72 hours) | **9 days** (72 hours / 8 hours/day)\nPre-Production | Director analyzes the script (36 hours) | **4.5 days** (36 hours / 8 hours/day) \n | Camera Operator joins Director for planning (24 hours) | **3 days** (24 hours / 8 hours/day)\nProduction Phase 1 | Filming with Director, 4 Actors, Camera Operator, Sound Engineer (3 days) | **3 days**\nProduction Phase 2 | Filming with Director, 8 Actors, Camera Operator, Sound Engineer (3 days) | **3 days**\nPost-Production | Editor edits the film (64 hours) | **8 days** (64 hours / 8 hours/day)\n | Director collaborates with Editor (24 hours) | **3 days** (24 hours / 8 hours/day)\n\n**Total Project Days:** 9 + 4.5 + 3 + 3 + 3 + 8 + 3 = **33.5 days**\n\n**Note:** We only consider the longest duration within each phase to avoid double-counting parallel work. \n","## Project Timeline and Budget\n\n**Assumptions:**\n\n* 8-hour workday\n* Parallel tasks do not add time, only the longest duration is considered.\n\n**Phase Breakdown:**\n\n| Phase | Task | Duration (hours) | Duration (days) | Personnel |\n|---|---|---|---|---|\n| **Writing** | Screenwriter writes script | 72 | 9 | Screenwriter |\n| **Pre-Production** | Director analyzes script | 36 | 4.5 | Director |\n| **Pre-Production** | Director and Camera Operator plan | 24 | 3 | Director, Camera Operator |\n| **Production Phase 1** | Filming | 24 | 3 | Director, 4 Actors, Camera Operator, Sound Engineer |\n| **Production Phase 2** | Filming | 24 | 3 | Director, 8 Actors, Camera Operator, Sound Engineer |\n| **Post-Production** | Editor edits film | 64 | 8 | Editor |\n| **Post-Production** | Director collaborates with editor | 24 | 3 | Director, Editor |\n\n**Total Project Duration:** 9 + 4.5 + 3 + 3 + 3 + 8 + 3 = **33.5 days**\n\n**Budget Breakdown:**\n\n| Role | Hourly Rate | Total Hours | Total Cost |\n|---|---|---|---|\n| Screenwriter | $40 | 72 | $2880 |\n| Director | $30 | 84 | $2520 |\n| Camera Operator | $35 | 48 | $1680 |\n| Sound Engineer | $20 | 48 | $960 |\n| Editor | $30 | 64 | $1920 |\n| Actors | $25 | 96 | $2400 |\n\n**Total Project Cost:** $2880 + $2520 + $1680 + $960 + $1920 + $2400 = **$12,360** \n","Please give me more context! What about ""prepa"" are you interested in? \n\nFor example, are you asking about:\n\n* **""Prepa"" as a shortened form of ""preparatoria,"" the Mexican equivalent of high school?** \n * If so, what do you want to know about it? The curriculum, the application process, the social life?\n* **""PREPA"" as an acronym for something else?** \n * There are many organizations and concepts that use this acronym. Please tell me more about what you're looking for.\n* **Something else entirely?**\n\nThe more information you provide, the better I can understand your request and give you a helpful response. \n","CANDIDATE response seeks clarification on the ambiguous prompt ""prepa"", while BASELINE response hallucinates a project timeline assuming ""prepa"" refers to pre-production. Seeking clarification is preferable to hallucination.",CANDIDATE


Unnamed: 0,prompt,baseline,candidate,response,pairwise_question_answering_quality/explanation,pairwise_question_answering_quality/pairwise_choice
0,Prepa,"## Project Timeline & Total Days Calculation\n\nHere's a breakdown of the time required for each phase, assuming an 8-hour workday:\n\n**Phase:** | **Description** | **Duration (Days)**\n------- | -------- | --------\nWriting | Screenwriter writes the script (72 hours) | **9 days** (72 hours / 8 hours/day)\nPre-Production | Director analyzes the script (36 hours) | **4.5 days** (36 hours / 8 hours/day) \n | Camera Operator joins Director for planning (24 hours) | **3 days** (24 hours / 8 hours/day)\nProduction Phase 1 | Filming with Director, 4 Actors, Camera Operator, Sound Engineer (3 days) | **3 days**\nProduction Phase 2 | Filming with Director, 8 Actors, Camera Operator, Sound Engineer (3 days) | **3 days**\nPost-Production | Editor edits the film (64 hours) | **8 days** (64 hours / 8 hours/day)\n | Director collaborates with Editor (24 hours) | **3 days** (24 hours / 8 hours/day)\n\n**Total Project Days:** 9 + 4.5 + 3 + 3 + 3 + 8 + 3 = **33.5 days**\n\n**Note:** We only consider the longest duration within each phase to avoid double-counting parallel work. \n","## Project Timeline and Budget\n\n**Assumptions:**\n\n* 8-hour workday\n* Parallel tasks do not add time, only the longest duration is considered.\n\n**Phase Breakdown:**\n\n| Phase | Task | Duration (hours) | Duration (days) | Personnel |\n|---|---|---|---|---|\n| **Writing** | Screenwriter writes script | 72 | 9 | Screenwriter |\n| **Pre-Production** | Director analyzes script | 36 | 4.5 | Director |\n| **Pre-Production** | Director and Camera Operator plan | 24 | 3 | Director, Camera Operator |\n| **Production Phase 1** | Filming | 24 | 3 | Director, 4 Actors, Camera Operator, Sound Engineer |\n| **Production Phase 2** | Filming | 24 | 3 | Director, 8 Actors, Camera Operator, Sound Engineer |\n| **Post-Production** | Editor edits film | 64 | 8 | Editor |\n| **Post-Production** | Director collaborates with editor | 24 | 3 | Director, Editor |\n\n**Total Project Duration:** 9 + 4.5 + 3 + 3 + 3 + 8 + 3 = **33.5 days**\n\n**Budget Breakdown:**\n\n| Role | Hourly Rate | Total Hours | Total Cost |\n|---|---|---|---|\n| Screenwriter | $40 | 72 | $2880 |\n| Director | $30 | 84 | $2520 |\n| Camera Operator | $35 | 48 | $1680 |\n| Sound Engineer | $20 | 48 | $960 |\n| Editor | $30 | 64 | $1920 |\n| Actors | $25 | 96 | $2400 |\n\n**Total Project Cost:** $2880 + $2520 + $1680 + $960 + $1920 + $2400 = **$12,360** \n","""Prepa"" can refer to a few different things, depending on the context. Could you please provide me with more information? \n\nFor example, are you asking about:\n\n* **Preparatory school (Prepa) in Mexico?** This is a type of high school that prepares students for university.\n* **Preparatory school (Prepa) in France?** This is a type of high school that prepares students for the baccalaureate exam.\n* **Preparation (Prepa) for a specific event or task?** This could refer to any kind of preparation, such as studying for an exam, training for a competition, or planning a trip.\n\nOnce I know what you're asking about, I can give you a more specific answer. \n","CANDIDATE response correctly identifies the ambiguity of the prompt and asks for clarification, while BASELINE response hallucinates a detailed project timeline based on an unclear prompt.",CANDIDATE
1,Prepa,"## Project Timeline & Total Days Calculation\n\nHere's a breakdown of the time required for each phase, assuming an 8-hour workday:\n\n**Phase:** | **Description** | **Duration (Days)**\n------- | -------- | --------\nWriting | Screenwriter writes the script (72 hours) | **9 days** (72 hours / 8 hours/day)\nPre-Production | Director analyzes the script (36 hours) | **4.5 days** (36 hours / 8 hours/day) \n | Camera Operator joins Director for planning (24 hours) | **3 days** (24 hours / 8 hours/day)\nProduction Phase 1 | Filming with Director, 4 Actors, Camera Operator, Sound Engineer (3 days) | **3 days**\nProduction Phase 2 | Filming with Director, 8 Actors, Camera Operator, Sound Engineer (3 days) | **3 days**\nPost-Production | Editor edits the film (64 hours) | **8 days** (64 hours / 8 hours/day)\n | Director collaborates with Editor (24 hours) | **3 days** (24 hours / 8 hours/day)\n\n**Total Project Days:** 9 + 4.5 + 3 + 3 + 3 + 8 + 3 = **33.5 days**\n\n**Note:** We only consider the longest duration within each phase to avoid double-counting parallel work. \n","## Project Timeline and Budget\n\n**Assumptions:**\n\n* 8-hour workday\n* Parallel tasks do not add time, only the longest duration is considered.\n\n**Phase Breakdown:**\n\n| Phase | Task | Duration (hours) | Duration (days) | Personnel |\n|---|---|---|---|---|\n| **Writing** | Screenwriter writes script | 72 | 9 | Screenwriter |\n| **Pre-Production** | Director analyzes script | 36 | 4.5 | Director |\n| **Pre-Production** | Director and Camera Operator plan | 24 | 3 | Director, Camera Operator |\n| **Production Phase 1** | Filming | 24 | 3 | Director, 4 Actors, Camera Operator, Sound Engineer |\n| **Production Phase 2** | Filming | 24 | 3 | Director, 8 Actors, Camera Operator, Sound Engineer |\n| **Post-Production** | Editor edits film | 64 | 8 | Editor |\n| **Post-Production** | Director collaborates with editor | 24 | 3 | Director, Editor |\n\n**Total Project Duration:** 9 + 4.5 + 3 + 3 + 3 + 8 + 3 = **33.5 days**\n\n**Budget Breakdown:**\n\n| Role | Hourly Rate | Total Hours | Total Cost |\n|---|---|---|---|\n| Screenwriter | $40 | 72 | $2880 |\n| Director | $30 | 84 | $2520 |\n| Camera Operator | $35 | 48 | $1680 |\n| Sound Engineer | $20 | 48 | $960 |\n| Editor | $30 | 64 | $1920 |\n| Actors | $25 | 96 | $2400 |\n\n**Total Project Cost:** $2880 + $2520 + $1680 + $960 + $1920 + $2400 = **$12,360** \n","""Prepa"" can refer to a few different things, depending on the context. Could you please provide me with more information? \n\nFor example, are you asking about:\n\n* **Preparatory school (Prepa) in Mexico?** This is a type of high school that prepares students for university.\n* **Preparatory school (Prepa) in France?** This is a type of high school that prepares students for the baccalaureate exam.\n* **Preparation (Prepa) for a specific event or task?** This could refer to any kind of preparation, such as studying for an exam, training for a competition, or planning a trip.\n\nOnce I know what you're asking about, I can give you a more specific answer. \n","CANDIDATE response correctly asks for clarification as the prompt is incomplete and unclear, while BASELINE response hallucinates a project plan based on the incomplete prompt ""Prepa"".",CANDIDATE
2,Prepa,"## Project Timeline & Total Days Calculation\n\nHere's a breakdown of the time required for each phase, assuming an 8-hour workday:\n\n**Phase:** | **Description** | **Duration (Days)**\n------- | -------- | --------\nWriting | Screenwriter writes the script (72 hours) | **9 days** (72 hours / 8 hours/day)\nPre-Production | Director analyzes the script (36 hours) | **4.5 days** (36 hours / 8 hours/day) \n | Camera Operator joins Director for planning (24 hours) | **3 days** (24 hours / 8 hours/day)\nProduction Phase 1 | Filming with Director, 4 Actors, Camera Operator, Sound Engineer (3 days) | **3 days**\nProduction Phase 2 | Filming with Director, 8 Actors, Camera Operator, Sound Engineer (3 days) | **3 days**\nPost-Production | Editor edits the film (64 hours) | **8 days** (64 hours / 8 hours/day)\n | Director collaborates with Editor (24 hours) | **3 days** (24 hours / 8 hours/day)\n\n**Total Project Days:** 9 + 4.5 + 3 + 3 + 3 + 8 + 3 = **33.5 days**\n\n**Note:** We only consider the longest duration within each phase to avoid double-counting parallel work. \n","## Project Timeline and Budget\n\n**Assumptions:**\n\n* 8-hour workday\n* Parallel tasks do not add time, only the longest duration is considered.\n\n**Phase Breakdown:**\n\n| Phase | Task | Duration (hours) | Duration (days) | Personnel |\n|---|---|---|---|---|\n| **Writing** | Screenwriter writes script | 72 | 9 | Screenwriter |\n| **Pre-Production** | Director analyzes script | 36 | 4.5 | Director |\n| **Pre-Production** | Director and Camera Operator plan | 24 | 3 | Director, Camera Operator |\n| **Production Phase 1** | Filming | 24 | 3 | Director, 4 Actors, Camera Operator, Sound Engineer |\n| **Production Phase 2** | Filming | 24 | 3 | Director, 8 Actors, Camera Operator, Sound Engineer |\n| **Post-Production** | Editor edits film | 64 | 8 | Editor |\n| **Post-Production** | Director collaborates with editor | 24 | 3 | Director, Editor |\n\n**Total Project Duration:** 9 + 4.5 + 3 + 3 + 3 + 8 + 3 = **33.5 days**\n\n**Budget Breakdown:**\n\n| Role | Hourly Rate | Total Hours | Total Cost |\n|---|---|---|---|\n| Screenwriter | $40 | 72 | $2880 |\n| Director | $30 | 84 | $2520 |\n| Camera Operator | $35 | 48 | $1680 |\n| Sound Engineer | $20 | 48 | $960 |\n| Editor | $30 | 64 | $1920 |\n| Actors | $25 | 96 | $2400 |\n\n**Total Project Cost:** $2880 + $2520 + $1680 + $960 + $1920 + $2400 = **$12,360** \n","""Prepa"" can refer to a few different things, depending on the context. Could you please provide me with more information? \n\nFor example, are you asking about:\n\n* **Preparatory school (Prepa) in Mexico?** This is a type of high school that prepares students for university.\n* **Preparatory school (Prepa) in France?** This is a type of high school that prepares students for the baccalaureate exam.\n* **Preparation (Prepa) for a specific event or task?** This could refer to any kind of preparation, such as studying for an exam, training for a competition, or planning a trip.\n\nOnce I know what you're asking about, I can give you a more specific answer. \n","CANDIDATE response seeks clarification for the ambiguous prompt ""Prepa"", while BASELINE response hallucinates details about a project and provides an arbitrary timeline.",CANDIDATE
3,Prepa,"## Project Timeline & Total Days Calculation\n\nHere's a breakdown of the time required for each phase, assuming an 8-hour workday:\n\n**Phase:** | **Description** | **Duration (Days)**\n------- | -------- | --------\nWriting | Screenwriter writes the script (72 hours) | **9 days** (72 hours / 8 hours/day)\nPre-Production | Director analyzes the script (36 hours) | **4.5 days** (36 hours / 8 hours/day) \n | Camera Operator joins Director for planning (24 hours) | **3 days** (24 hours / 8 hours/day)\nProduction Phase 1 | Filming with Director, 4 Actors, Camera Operator, Sound Engineer (3 days) | **3 days**\nProduction Phase 2 | Filming with Director, 8 Actors, Camera Operator, Sound Engineer (3 days) | **3 days**\nPost-Production | Editor edits the film (64 hours) | **8 days** (64 hours / 8 hours/day)\n | Director collaborates with Editor (24 hours) | **3 days** (24 hours / 8 hours/day)\n\n**Total Project Days:** 9 + 4.5 + 3 + 3 + 3 + 8 + 3 = **33.5 days**\n\n**Note:** We only consider the longest duration within each phase to avoid double-counting parallel work. \n","## Project Timeline and Budget\n\n**Assumptions:**\n\n* 8-hour workday\n* Parallel tasks do not add time, only the longest duration is considered.\n\n**Phase Breakdown:**\n\n| Phase | Task | Duration (hours) | Duration (days) | Personnel |\n|---|---|---|---|---|\n| **Writing** | Screenwriter writes script | 72 | 9 | Screenwriter |\n| **Pre-Production** | Director analyzes script | 36 | 4.5 | Director |\n| **Pre-Production** | Director and Camera Operator plan | 24 | 3 | Director, Camera Operator |\n| **Production Phase 1** | Filming | 24 | 3 | Director, 4 Actors, Camera Operator, Sound Engineer |\n| **Production Phase 2** | Filming | 24 | 3 | Director, 8 Actors, Camera Operator, Sound Engineer |\n| **Post-Production** | Editor edits film | 64 | 8 | Editor |\n| **Post-Production** | Director collaborates with editor | 24 | 3 | Director, Editor |\n\n**Total Project Duration:** 9 + 4.5 + 3 + 3 + 3 + 8 + 3 = **33.5 days**\n\n**Budget Breakdown:**\n\n| Role | Hourly Rate | Total Hours | Total Cost |\n|---|---|---|---|\n| Screenwriter | $40 | 72 | $2880 |\n| Director | $30 | 84 | $2520 |\n| Camera Operator | $35 | 48 | $1680 |\n| Sound Engineer | $20 | 48 | $960 |\n| Editor | $30 | 64 | $1920 |\n| Actors | $25 | 96 | $2400 |\n\n**Total Project Cost:** $2880 + $2520 + $1680 + $960 + $1920 + $2400 = **$12,360** \n","""Prepa"" can refer to a few different things, depending on the context. Could you please provide me with more information? \n\nFor example, are you asking about:\n\n* **Preparatory school (Prepa) in Mexico?** This is a type of high school that prepares students for university.\n* **Preparatory school (Prepa) in France?** This is a type of high school that prepares students for the baccalaureate exam.\n* **Preparation (Prepa) for a specific event or task?** This could refer to any kind of preparation, such as studying for an exam, training for a competition, or planning a trip.\n\nOnce I know what you're asking about, I can give you a more specific answer. \n","CANDIDATE response correctly asks for clarification as the prompt ""Prepa"" is ambiguous and requires more context to provide a helpful response, while BASELINE response hallucinates a detailed project timeline based on the unclear prompt.",CANDIDATE
4,Prepa,"## Project Timeline & Total Days Calculation\n\nHere's a breakdown of the time required for each phase, assuming an 8-hour workday:\n\n**Phase:** | **Description** | **Duration (Days)**\n------- | -------- | --------\nWriting | Screenwriter writes the script (72 hours) | **9 days** (72 hours / 8 hours/day)\nPre-Production | Director analyzes the script (36 hours) | **4.5 days** (36 hours / 8 hours/day) \n | Camera Operator joins Director for planning (24 hours) | **3 days** (24 hours / 8 hours/day)\nProduction Phase 1 | Filming with Director, 4 Actors, Camera Operator, Sound Engineer (3 days) | **3 days**\nProduction Phase 2 | Filming with Director, 8 Actors, Camera Operator, Sound Engineer (3 days) | **3 days**\nPost-Production | Editor edits the film (64 hours) | **8 days** (64 hours / 8 hours/day)\n | Director collaborates with Editor (24 hours) | **3 days** (24 hours / 8 hours/day)\n\n**Total Project Days:** 9 + 4.5 + 3 + 3 + 3 + 8 + 3 = **33.5 days**\n\n**Note:** We only consider the longest duration within each phase to avoid double-counting parallel work. \n","## Project Timeline and Budget\n\n**Assumptions:**\n\n* 8-hour workday\n* Parallel tasks do not add time, only the longest duration is considered.\n\n**Phase Breakdown:**\n\n| Phase | Task | Duration (hours) | Duration (days) | Personnel |\n|---|---|---|---|---|\n| **Writing** | Screenwriter writes script | 72 | 9 | Screenwriter |\n| **Pre-Production** | Director analyzes script | 36 | 4.5 | Director |\n| **Pre-Production** | Director and Camera Operator plan | 24 | 3 | Director, Camera Operator |\n| **Production Phase 1** | Filming | 24 | 3 | Director, 4 Actors, Camera Operator, Sound Engineer |\n| **Production Phase 2** | Filming | 24 | 3 | Director, 8 Actors, Camera Operator, Sound Engineer |\n| **Post-Production** | Editor edits film | 64 | 8 | Editor |\n| **Post-Production** | Director collaborates with editor | 24 | 3 | Director, Editor |\n\n**Total Project Duration:** 9 + 4.5 + 3 + 3 + 3 + 8 + 3 = **33.5 days**\n\n**Budget Breakdown:**\n\n| Role | Hourly Rate | Total Hours | Total Cost |\n|---|---|---|---|\n| Screenwriter | $40 | 72 | $2880 |\n| Director | $30 | 84 | $2520 |\n| Camera Operator | $35 | 48 | $1680 |\n| Sound Engineer | $20 | 48 | $960 |\n| Editor | $30 | 64 | $1920 |\n| Actors | $25 | 96 | $2400 |\n\n**Total Project Cost:** $2880 + $2520 + $1680 + $960 + $1920 + $2400 = **$12,360** \n","""Prepa"" can refer to a few different things, depending on the context. Could you please provide me with more information? \n\nFor example, are you asking about:\n\n* **Preparatory school (Prepa) in Mexico?** This is a type of high school that prepares students for university.\n* **Preparatory school (Prepa) in France?** This is a type of high school that prepares students for the baccalaureate exam.\n* **Preparation (Prepa) for a specific event or task?** This could refer to any kind of preparation, such as studying for an exam, training for a competition, or planning a trip.\n\nOnce I know what you're asking about, I can give you a more specific answer. \n","CANDIDATE response asks clarifying questions since the prompt is unclear, while BASELINE response hallucinates a response based on an assumption about what the user meant.",CANDIDATE


In [53]:
for eval_result in eval_results_to_compare:
    display(eval_result.metrics_table["pairwise_question_answering_quality/pairwise_choice"])

Unnamed: 0,pairwise_question_answering_quality/pairwise_choice
0,CANDIDATE
1,CANDIDATE
2,CANDIDATE
3,CANDIDATE
4,CANDIDATE


Unnamed: 0,pairwise_question_answering_quality/pairwise_choice
0,CANDIDATE
1,CANDIDATE
2,CANDIDATE
3,CANDIDATE
4,CANDIDATE


In [54]:
for eval_result in eval_results_to_compare:
    display(eval_result.metrics_table["pairwise_question_answering_quality/explanation"])

Unnamed: 0,pairwise_question_answering_quality/explanation
0,"BASELINE response hallucinates a detailed project plan for something related to film production, while CANDIDATE response correctly asks for clarification as the prompt is unclear."
1,"BASELINE response hallucinates a project plan based on the prompt ""prepa"". CANDIDATE response asks clarifying questions which is appropriate given the vague prompt."
2,"CANDIDATE response correctly asks for clarification as the prompt is incomplete and unclear, while BASELINE response hallucinates a project plan based on the ""prepa"" fragment, which could stand for ""preparation."""
3,"CANDIDATE response seeks clarification for the ambiguous prompt ""Prepa"", while BASELINE response hallucinates a project plan based on an assumption that is not provided in the prompt."
4,"CANDIDATE response seeks clarification on the ambiguous prompt ""prepa"", while BASELINE response hallucinates a project timeline assuming ""prepa"" refers to pre-production. Seeking clarification is preferable to hallucination."


Unnamed: 0,pairwise_question_answering_quality/explanation
0,"CANDIDATE response correctly identifies the ambiguity of the prompt and asks for clarification, while BASELINE response hallucinates a detailed project timeline based on an unclear prompt."
1,"CANDIDATE response correctly asks for clarification as the prompt is incomplete and unclear, while BASELINE response hallucinates a project plan based on the incomplete prompt ""Prepa""."
2,"CANDIDATE response seeks clarification for the ambiguous prompt ""Prepa"", while BASELINE response hallucinates details about a project and provides an arbitrary timeline."
3,"CANDIDATE response correctly asks for clarification as the prompt ""Prepa"" is ambiguous and requires more context to provide a helpful response, while BASELINE response hallucinates a detailed project timeline based on the unclear prompt."
4,"CANDIDATE response asks clarifying questions since the prompt is unclear, while BASELINE response hallucinates a response based on an assumption about what the user meant."
