# Task 1. Set up

In [1]:
%pip install --upgrade --quiet google-genai nest-asyncio==1.5.9

In [1]:
import pandas as pd
from inspect import cleandoc
from IPython.display import display, Markdown

import vertexai
from vertexai.generative_models import GenerativeModel, GenerationConfig
from vertexai.evaluation import (
    MetricPromptTemplateExamples,
    EvalTask,
    PairwiseMetric,
    PairwiseMetricPromptTemplate,
    PointwiseMetric,
    PointwiseMetricPromptTemplate,
)

pd.set_option("display.max_colwidth", None)

In [2]:
PROJECT_ID = "qwiklabs-gcp-01-6cd835eecace"
LOCATION = "us-central1"
vertexai.init(project=PROJECT_ID, location=LOCATION)

# Task 2. Explore example data and generate a document

In [3]:
hourly_rates = cleandoc("""
  Screenwriter: $40
  Actor: $25
  Director: $30
  Camera Operator: $35
  Sound Engineer: $20
  Editor: $30
  """)

planning_notes = cleandoc("""
 Phases of Production:
   Writing:
   The Screenwriter will write the script.
   They need 72 hours to do so.


   Pre-Production:
   The Director needs time to analyze the script.
   They will work on it for 36 hours.
   The Camera Operator will join the director for 24 hours of planning.


   Production Phase 1
   The first three days of filming will require the director, 4 actors, the camera operator, and the sound engineer


   Production Phase 2
   The next three days of filming will require the director, 8 actors, the camera operator, and the sound engineer


   Post-Production
   The editor will take 64 hours to edit the film.
   The director will work with the editor for 24 hours during this phase.
""")

In [4]:
tasks = [
    """What is the cost of each phase of production?
    If days are mentioned, assume an 8 hour work day.""",

    """How many days will each phase require? Assume an
    8 hour work day. If multiple people are working in parallel,
    do not add those times together, but only use the longest time.
    Also include a count of the total number of days of the entire
    project.""",

    """Prepare a text schedule for all phases of the film starting
    on Feb 3, 2025. The whole crew should be off Saturdays
    and Sundays."""
]

In [5]:
# Every line shares the same indentation so cleandoc() strips it uniformly.
prompt_template = cleandoc("""
  <instructions>
  Prepare a document to fulfill the task based on the context provided.
  </instructions>
  <task>
  {task}
  </task>
  <context>
  {context}
  </context>
  """)

### Set Model

In [6]:
llm_pro = GenerativeModel(
  "gemini-2.5-pro-preview-05-06",
  generation_config={
      "temperature": 0,
  },
)

llm_flash = GenerativeModel(
  "gemini-2.0-flash-001",
  generation_config={
      "temperature": 0,
  },
)

Combine hourly_rates and planning_notes (with a pair of line breaks as a separator) to form a context chunk.

In [7]:
context = hourly_rates + "\n\n" + planning_notes

Using the prompt template and the context, generate a response to the second task (tasks[1]) for each model (llm_pro and llm_flash).

In [10]:
prompt = prompt_template.format(task=tasks[1], context=context)

response_pro = llm_pro.generate_content(prompt)
response_flash = llm_flash.generate_content(prompt)
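The same template can be filled in for every entry in `tasks`, not only `tasks[1]`. The sketch below shows the formatting step on its own, with no model calls. The shortened template, `example_tasks`, `example_context`, and the helper `build_prompts` are illustrative stand-ins, not part of the lab.

```python
from inspect import cleandoc

# Shortened stand-in for the notebook's prompt_template (assumption).
template = cleandoc("""
    <task>
    {task}
    </task>
    <context>
    {context}
    </context>
    """)


def build_prompts(template, tasks, context):
    """Return one formatted prompt per task, all sharing the same context."""
    return [template.format(task=task, context=context) for task in tasks]


# Illustrative inputs; the notebook would pass its own tasks and context.
example_tasks = [
    "How many days will each phase require?",
    "What is the cost of each phase of production?",
]
example_context = "Editor: $30\n\nThe editor will take 64 hours to edit the film."

prompts = build_prompts(template, example_tasks, example_context)
print(prompts[0])
```

In the notebook, `build_prompts(prompt_template, tasks, context)` would give three prompts that could each be passed to `generate_content`.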

Wrap each response's text in the Markdown() class imported from IPython.display so that Gemini's responses, which are often formatted as Markdown strings, render properly.

In [11]:
display(Markdown("# Gemini Pro Response\n\n" + response_pro.text))
display(Markdown("# Gemini Flash Response\n\n" + response_flash.text))

# Gemini Pro Response

Okay, let's break down the project timeline phase by phase, assuming an 8-hour workday.

**Phase Durations:**

1.  **Writing:**
    *   Screenwriter: 72 hours
    *   Days: 72 hours / 8 hours/day = **9 days**

2.  **Pre-Production:**
    *   Director: 36 hours
    *   Camera Operator (with Director): 24 hours
    *   Since these activities happen in parallel (or the Camera Operator's time is within the Director's), we take the longest duration.
    *   Longest duration: 36 hours
    *   Days: 36 hours / 8 hours/day = **4.5 days**

3.  **Production Phase 1:**
    *   The context explicitly states: "The first three days of filming..."
    *   Days: **3 days**

4.  **Production Phase 2:**
    *   The context explicitly states: "The next three days of filming..."
    *   Days: **3 days**

5.  **Post-Production:**
    *   Editor: 64 hours
    *   Director (with Editor): 24 hours
    *   Since these activities happen in parallel (the Director works *with* the Editor), we take the longest duration.
    *   Longest duration: 64 hours
    *   Days: 64 hours / 8 hours/day = **8 days**

**Summary of Days per Phase:**

*   **Writing:** 9 days
*   **Pre-Production:** 4.5 days
*   **Production Phase 1:** 3 days
*   **Production Phase 2:** 3 days
*   **Post-Production:** 8 days

**Total Number of Days for the Entire Project:**

Total Days = 9 (Writing) + 4.5 (Pre-Production) + 3 (Production 1) + 3 (Production 2) + 8 (Post-Production)
Total Days = **27.5 days**

# Gemini Flash Response

**Project Timeline Breakdown**

Here's a breakdown of the project timeline, assuming an 8-hour workday:

**Phase Breakdown:**

*   **Writing:**
    *   Screenwriter: 72 hours
    *   Days Required: 72 hours / 8 hours/day = 9 days

*   **Pre-Production:**
    *   Director: 36 hours
    *   Camera Operator: 24 hours
    *   Since the director and camera operator are working in parallel, we take the longest time.
    *   Days Required: 36 hours / 8 hours/day = 4.5 days

*   **Production Phase 1:**
    *   Director, 4 Actors, Camera Operator, Sound Engineer: 3 days
    *   Days Required: 3 days

*   **Production Phase 2:**
    *   Director, 8 Actors, Camera Operator, Sound Engineer: 3 days
    *   Days Required: 3 days

*   **Post-Production:**
    *   Editor: 64 hours
    *   Director: 24 hours
    *   Since the editor and director are working in parallel, we take the longest time.
    *   Days Required: 64 hours / 8 hours/day = 8 days

**Total Project Days:**

9 days (Writing) + 4.5 days (Pre-Production) + 3 days (Production Phase 1) + 3 days (Production Phase 2) + 8 days (Post-Production) = **27.5 days**
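
Both responses arrive at 27.5 days. As a sanity check on that arithmetic, the sketch below recomputes the figure from the planning notes, using the rule stated in the task: an 8-hour work day, and only the longest duration when people work in parallel. The names `phase_hours` and `phase_days` are illustrative and do not appear in the lab.

```python
HOURS_PER_DAY = 8

# Hours per contributor in each phase, taken from the planning notes.
# The production phases are stated in days, so they are converted to hours.
phase_hours = {
    "Writing": [72],
    "Pre-Production": [36, 24],
    "Production Phase 1": [3 * HOURS_PER_DAY],
    "Production Phase 2": [3 * HOURS_PER_DAY],
    "Post-Production": [64, 24],
}


def phase_days(hours_in_parallel, hours_per_day=HOURS_PER_DAY):
    """Length of a phase in days: parallel work counts only the longest time."""
    return max(hours_in_parallel) / hours_per_day


days = {phase: phase_days(hours) for phase, hours in phase_hours.items()}
total_days = sum(days.values())

for phase, value in days.items():
    print(f"{phase}: {value} days")
print(f"Total: {total_days} days")  # Total: 27.5 days
```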


# Task 3. Prepare the Evaluation Dataset and EvalTask

Prepare Evaluation DataFrame

In [22]:
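# Prepare evaluation data. The column names match the placeholders that the
# built-in pairwise template is expected to fill: prompt,
# baseline_model_response, and response.
eval_dataset = pd.DataFrame({
    "prompt": [prompt],  # the formatted prompt from earlier
    "baseline_model_response": [response_pro.text],  # Response A (Pro)
    "response": [response_flash.text],               # Response B (Flash)
})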

Define Pairwise Metric & EvalTask

In [27]:
eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=[
        MetricPromptTemplateExamples.Pairwise.QUESTION_ANSWERING_QUALITY
    ],
    experiment="indie-film-planning"
)

In [24]:
print(MetricPromptTemplateExamples.get_prompt_template('groundedness'))


# Instruction
You are an expert evaluator. Your task is to evaluate the quality of the responses generated by AI models.
We will provide you with the user input and an AI-generated response.
You should first read the user input carefully for analyzing the task, and then evaluate the quality of the responses based on the criteria provided in the Evaluation section below.
You will assign the response a rating following the Rating Rubric and Evaluation Steps. Give step by step explanations for your rating, and only choose ratings from the Rating Rubric.


# Evaluation
## Metric Definition
You will be assessing groundedness, which measures the ability to provide or reference information included only in the user prompt.

## Criteria
Groundedness: The response contains information included only in the user prompt. The response does not reference any outside information.

## Rating Rubric
1: (Fully grounded). All aspects of the response are attributable to the context.
0: (Not fully grounde

In [26]:
from vertexai.evaluation import run_eval_task

ImportError: cannot import name 'run_eval_task' from 'vertexai.evaluation' (/usr/local/lib/python3.11/dist-packages/vertexai/evaluation/__init__.py)

In [25]:
eval_response = eval_task()

TypeError: 'EvalTask' object is not callable
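
The TypeError shows that an `EvalTask` instance cannot be called like a function. In the `vertexai.evaluation` SDK, the run is normally started with the task's `evaluate()` method, which returns a result object. The sketch below wraps that call. `run_pairwise_eval` is a hypothetical helper name, and the optional `experiment_run_name` argument is assumed to be accepted by `evaluate()` in the installed SDK version.

```python
def run_pairwise_eval(eval_task, run_name=None):
    """Run an evaluation task and return its result object.

    Assumes eval_task is a vertexai.evaluation.EvalTask (or any object with a
    compatible evaluate() method) whose dataset already has the columns that
    the chosen metric's prompt template needs.
    """
    # evaluate() is a method on the instance; the task itself is not callable.
    if run_name is None:
        return eval_task.evaluate()
    # Assumption: evaluate() accepts experiment_run_name to label the run.
    return eval_task.evaluate(experiment_run_name=run_name)
```

With the task defined earlier, `eval_response = run_pairwise_eval(eval_task)` would replace the failing `eval_task()` call.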

In [29]:
eval_task.summary_table
eval_task.metrics_table
eval_task.metrics_table["preferred_response"]
eval_task.metrics_table["explanation"]

AttributeError: 'EvalTask' object has no attribute 'summary_table'
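
The AttributeError indicates that the results are not stored on the `EvalTask` itself. They are expected on the object returned by the evaluation run, which in the `vertexai.evaluation` SDK should expose `summary_metrics` (a dict) and `metrics_table` (a pandas DataFrame). Column names such as `preferred_response` are not confirmed anywhere in this notebook. The sketch below assumes per-metric columns are named `<metric_name>/<field>` and selects them by prefix instead of hard-coding field names. `metric_column_names` is a hypothetical helper.

```python
def metric_column_names(columns, metric_name):
    """Return the column names that belong to one metric.

    Assumes per-metric columns are named "<metric_name>/<field>"; verify this
    against the actual metrics_table before relying on it.
    """
    prefix = metric_name + "/"
    return [str(col) for col in columns if str(col).startswith(prefix)]


# Illustrative column names only; real names come from the evaluation result.
example_columns = [
    "prompt",
    "response",
    "example_metric/choice",
    "example_metric/explanation",
]
print(metric_column_names(example_columns, "example_metric"))
```

Given a result object `eval_response`, the per-row judgments could then be viewed with `eval_response.metrics_table[metric_column_names(eval_response.metrics_table.columns, name)]`, where `name` is the metric's name as it appears in the table's column headers.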

In [32]:
from vertexai.evaluation.evaluation import EvaluationRunner

runner = EvaluationRunner()
eval_response = runner.evaluate(eval_task)

ModuleNotFoundError: No module named 'vertexai.evaluation.evaluation'