In [None]:
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Get Started with Vertex AI Prompt Optimizer

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/prompts/prompt_optimizer/get_started_with_vertex_ai_prompt_optimizer.ipynb">
      <img width="32px" src="https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg" alt="Google Colaboratory logo"><br> Open in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fprompts%2Fprompt_optimizer%2Fget_started_with_vertex_ai_prompt_optimizer.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Open in Colab Enterprise
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/prompts/prompt_optimizer/get_started_with_vertex_ai_prompt_optimizer.ipynb">
      <img src="https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/prompts/prompt_optimizer/get_started_with_vertex_ai_prompt_optimizer.ipynb">
      <img width="32px" src="https://www.svgrepo.com/download/217753/github.svg" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
</table>

<div style="clear: both;"></div>


| Author(s) |
| --- |
| [Ivan Nardini](https://github.com/inardini) |

## Overview

When developing with large language models, crafting the perfect prompt—a process known as prompt engineering—is both an art and a science. It can be time-consuming and challenging to write prompts that consistently produce the desired results. Furthermore, as new and improved models are released, prompts that worked well before may need to be updated.

To address these challenges, Vertex AI offers the **Prompt Optimizer**, a tool that automatically refines and improves your prompts. This notebook is a comprehensive guide to both of its approaches: the **Zero-Shot Optimizer** and the **Data-Driven Optimizer**.

### The two approaches to prompt optimization

#### 1\. Zero-Shot Optimizer

This is your go-to tool for rapid prompt refinement and generation *without* needing an evaluation dataset.

  * **Generate from Scratch**: Simply describe a task in plain language, and it will generate a complete, well-structured system instruction for you.
  * **Refine Existing Prompts**: Provide an existing prompt, and it will rewrite it based on established best practices for clarity, structure, and effectiveness.

#### 2\. Data-Driven Optimizer

This tool performs a deep, performance-based optimization that uses your data to measure success.

  * **Tune for Performance**: You provide a dataset of sample inputs and expected outputs, and it systematically tests and rewrites your system instructions to find the version that scores highest on the evaluation metrics you define.
  * **Task-Specific**: It's the ideal choice when you want to fine-tune a prompt for a specific task and have data to prove what "better" looks like.
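The core idea of data-driven optimization, testing candidate instructions against your data and keeping the highest scorer, can be sketched in a few lines. This is only a conceptual illustration, not VAPO's actual search (the service uses an optimizer model to propose rewrites and Vertex AI metrics to score them). The candidate instructions, the `fake_model` stand-in for a target model, and the toy `exact_match` metric are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def exact_match(prediction: str, target: str) -> float:
    """Toy metric: 1.0 if the normalized strings match, else 0.0."""
    return float(prediction.strip().lower() == target.strip().lower())


def select_best_instruction(
    candidates: List[str],
    dataset: List[Dict[str, str]],
    generate: Callable[[str, str], str],
    metric: Callable[[str, str], float] = exact_match,
) -> Tuple[str, float]:
    """Score every candidate instruction on the dataset and return the best one."""
    best_instruction, best_score = "", float("-inf")
    for instruction in candidates:
        scores = [metric(generate(instruction, row["question"]), row["target"]) for row in dataset]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_instruction, best_score = instruction, mean_score
    return best_instruction, best_score


# Hypothetical stand-in for the target model: it only answers tersely
# when the instruction asks it to be concise.
def fake_model(instruction: str, question: str) -> str:
    answer = {"What is 2 + 2?": "4", "Capital of France?": "Paris"}[question]
    return answer if "concise" in instruction else f"The answer is {answer}."


data = [
    {"question": "What is 2 + 2?", "target": "4"},
    {"question": "Capital of France?", "target": "paris"},
]
best, score = select_best_instruction(
    ["Answer the question.", "Answer the question. Be concise."], data, fake_model
)
print(best, score)
```

The metric decides what "better" means: with exact match, the terse instruction wins because verbose answers never match the targets.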

In this tutorial, we will walk through both methods. First, we'll explore the **Zero-Shot Optimizer** for quick, data-free improvements. Then, we'll dive deep into the **Data-Driven Optimizer**, learning how to leverage a dataset to achieve the best possible performance for a specific task.


## Get started

Before we can start optimizing, we need to set up our Python environment and configure our Google Cloud project.


### Install required packages

This command installs the necessary Python libraries.


In [None]:
%pip install "google-cloud-aiplatform>=1.108.0" "pydantic" "etils" "protobuf==4.25.3" "gradio" --force-reinstall --quiet

### Authenticate your notebook environment (Colab only)

If you are running this notebook in Google Colab, this cell handles authentication, allowing the notebook to securely access your Google Cloud resources.

In [None]:
import sys

if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()

### Set Google Cloud project information

Here, we define the essential variables for your Google Cloud project. The Prompt Optimizer job runs within a Google Cloud project, so you need to [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com) and provide a Cloud Storage bucket from which the job reads input data and to which it writes results.

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [None]:
# Use the environment variable if the user doesn't provide Project ID.
import os

PROJECT_ID = "[your-project-id]"  # @param {type: "string", placeholder: "[your-project-id]", isTemplate: true}
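if not PROJECT_ID or PROJECT_ID == "[your-project-id]":
    PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT", "")
if not PROJECT_ID:
    raise ValueError("Set PROJECT_ID or the GOOGLE_CLOUD_PROJECT environment variable.")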

PROJECT_NUMBER = !gcloud projects describe {PROJECT_ID} --format="get(projectNumber)"
PROJECT_NUMBER = PROJECT_NUMBER[0]

LOCATION = os.environ.get("GOOGLE_CLOUD_REGION", "global")

BUCKET_NAME = "[your-bucket-name]"  # @param {type: "string", placeholder: "[your-bucket-name]", isTemplate: true}
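BUCKET_URI = f"gs://{BUCKET_NAME}"

# Cloud Storage buckets need a concrete location; "global" (the LOCATION default above) is not a valid bucket location.
BUCKET_LOCATION = "us-central1"  # Assumption: change this to the region you want for your bucket.
! gsutil mb -l {BUCKET_LOCATION} -p {PROJECT_ID} {BUCKET_URI}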

import vertexai

client = vertexai.Client(project=PROJECT_ID, location=LOCATION)

### Service account and permissions

The Prompt Optimizer runs as a backend job that needs permission to perform actions on your behalf. We grant the necessary IAM roles to the default Compute Engine service account, which the job uses to operate.

  * `Vertex AI User`: Allows the job to call Vertex AI models.
  * `Storage Object Admin`: Allows the job to read your dataset from and write results to your GCS bucket.
  * `Artifact Registry Reader`: Allows the job to download necessary components.

[Check out the documentation](https://cloud.google.com/iam/docs/manage-access-service-accounts#iam-view-access-sa-gcloud) to learn how to grant those permissions to a single service account.
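If you want to review the exact bindings before granting them (for example, to hand them to a project administrator), a dry-run sketch like the following builds the same `gcloud projects add-iam-policy-binding` commands that the loop in this section runs, without executing anything. The project ID and project number in the usage line are placeholders.

```python
from typing import List

# Roles granted in this section.
ROLES = ["aiplatform.user", "storage.objectAdmin", "artifactregistry.reader"]


def default_compute_service_account(project_number: str) -> str:
    """Build the email of a project's default Compute Engine service account."""
    return f"{project_number}-compute@developer.gserviceaccount.com"


def iam_binding_commands(project_id: str, project_number: str, roles: List[str] = ROLES) -> List[str]:
    """Return (without running) one gcloud command per role binding."""
    member = f"serviceAccount:{default_compute_service_account(project_number)}"
    return [
        f"gcloud projects add-iam-policy-binding {project_id} "
        f"--member={member} --role=roles/{role} --condition=None"
        for role in roles
    ]


# Placeholder project values, for illustration only.
for cmd in iam_binding_commands("my-project", "123456789"):
    print(cmd)
```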

In [None]:
SERVICE_ACCOUNT = f"{PROJECT_NUMBER}-compute@developer.gserviceaccount.com"

for role in ['aiplatform.user', 'storage.objectAdmin', 'artifactregistry.reader']:

    ! gcloud projects add-iam-policy-binding {PROJECT_ID} \
      --member=serviceAccount:{SERVICE_ACCOUNT} \
      --role=roles/{role} --condition=None

### Import libraries

In [None]:
import io
import json
import re
from typing import Any, Dict, List, Tuple, Optional
from pydantic import BaseModel, Field
from etils import epath
import pandas as pd
from google.cloud import storage
import gradio as gr
import logging
logging.basicConfig(level=logging.INFO, force=True)
from IPython.display import display, Markdown

### Helpers

In [None]:
# @title
def get_best_vapo_results(
    base_path: str,
    metric_name: Optional[str] = None
) -> Tuple[str, List[str]]:
    """Get the best system instruction and demonstrations across all VAPO runs."""
    # Find all valid runs
    required_files = ["eval_results.json", "templates.json"]
    runs = find_directories_with_files(base_path, required_files)

    if not runs:
        raise ValueError(f"No valid runs found in {base_path}")

    best_score = float('-inf')
    best_instruction = ""
    best_demonstrations = []

    for run_path in runs:
        try:
            # Check main templates.json first
            templates_path = f"{run_path}/templates.json"
            with epath.Path(templates_path).open("r") as f:
                templates_data = json.load(f)

            if templates_data:
                df = pd.json_normalize(templates_data)

                # Find metric column
                metric_columns = [col for col in df.columns if "metric" in col and "mean" in col]
                if metric_columns:
                    # Select appropriate metric
                    if metric_name:
                        metric_col = next((col for col in metric_columns if metric_name in col), None)
                    else:
                        composite_cols = [col for col in metric_columns if "composite_metric" in col]
                        metric_col = composite_cols[0] if composite_cols else metric_columns[0]

                    if metric_col:
                        best_idx = df[metric_col].argmax()
                        score = float(df.iloc[best_idx][metric_col])

                        if score > best_score:
                            best_score = score
                            best_row = df.iloc[best_idx]

                            # Extract instruction if present
                            if 'prompt' in best_row or 'instruction' in best_row:
                                instruction = best_row.get('prompt', best_row.get('instruction', ''))
                                if instruction:
                                    instruction = instruction.replace("store('answer', llm())", "{{llm()}}")
                                    best_instruction = instruction

                            # Extract demonstrations if present
                            if 'demonstrations' in best_row or 'demo_set' in best_row:
                                demos = best_row.get('demonstrations', best_row.get('demo_set', []))
                                best_demonstrations = format_demonstrations(demos)

            # Check instruction-specific optimization
            instruction_path = f"{run_path}/instruction/templates.json"
            try:
                with epath.Path(instruction_path).open("r") as f:
                    instruction_data = json.load(f)

                if instruction_data:
                    inst_df = pd.json_normalize(instruction_data)
                    metric_columns = [col for col in inst_df.columns if "metric" in col and "mean" in col]

                    if metric_columns:
                        if metric_name:
                            metric_col = next((col for col in metric_columns if metric_name in col), None)
                        else:
                            composite_cols = [col for col in metric_columns if "composite_metric" in col]
                            metric_col = composite_cols[0] if composite_cols else metric_columns[0]

                        if metric_col and metric_col in inst_df.columns:
                            inst_best_idx = inst_df[metric_col].argmax()
                            inst_score = float(inst_df.iloc[inst_best_idx][metric_col])

                            if inst_score > best_score:
                                best_score = inst_score
                                best_row = inst_df.iloc[inst_best_idx]

                                instruction = best_row.get('prompt', best_row.get('instruction', ''))
                                if instruction:
                                    instruction = instruction.replace("store('answer', llm())", "{{llm()}}")
                                    best_instruction = instruction
                                # In instruction-only mode, there might not be demonstrations
                                if 'demonstrations' not in best_row and 'demo_set' not in best_row:
                                    best_demonstrations = []
            except FileNotFoundError:
                pass

            # Check demonstration-specific optimization
            demo_path = f"{run_path}/demonstration/templates.json"
            try:
                with epath.Path(demo_path).open("r") as f:
                    demo_data = json.load(f)

                if demo_data:
                    demo_df = pd.json_normalize(demo_data)
                    metric_columns = [col for col in demo_df.columns if "metric" in col and "mean" in col]

                    if metric_columns:
                        if metric_name:
                            metric_col = next((col for col in metric_columns if metric_name in col), None)
                        else:
                            composite_cols = [col for col in metric_columns if "composite_metric" in col]
                            metric_col = composite_cols[0] if composite_cols else metric_columns[0]

                        if metric_col and metric_col in demo_df.columns:
                            demo_best_idx = demo_df[metric_col].argmax()
                            demo_score = float(demo_df.iloc[demo_best_idx][metric_col])

                            if demo_score > best_score:
                                best_score = demo_score
                                best_row = demo_df.iloc[demo_best_idx]

                                demos = best_row.get('demonstrations', best_row.get('demo_set', []))
                                best_demonstrations = format_demonstrations(demos)
                                # In demo-only mode, there might not be an instruction
                                if 'prompt' not in best_row and 'instruction' not in best_row:
                                    best_instruction = ""
                                else:
                                    instruction = best_row.get('prompt', best_row.get('instruction', ''))
                                    if instruction:
                                        instruction = instruction.replace("store('answer', llm())", "{{llm()}}")
                                        best_instruction = instruction
            except FileNotFoundError:
                pass

        except Exception as e:
            print(f"Error processing run {run_path}: {e}")
            continue

    if best_score == float('-inf'):
        raise ValueError("Could not find any valid results")

    return best_instruction, best_demonstrations

def format_demonstrations(demos: Any) -> List[str]:
    """Format demonstrations into list of strings."""
    if isinstance(demos, str):
        try:
            demos = json.loads(demos)
        except json.JSONDecodeError:
            return []

    if not isinstance(demos, list):
        return []

    formatted_demos = []
    for demo in demos:
        if isinstance(demo, dict):
            # Format dict as "key: value" pairs
            demo_str = "\n".join([f"{k}: {v}" for k, v in demo.items()])
            formatted_demos.append(demo_str)
        else:
            formatted_demos.append(str(demo))

    return formatted_demos

def split_gcs_path(gcs_path: str) -> tuple[str, str]:
    """Split GCS path into bucket name and prefix."""
    if not gcs_path.startswith("gs://"):
        raise ValueError(f"Invalid GCS path {gcs_path!r}. Must start with gs://")

    path = gcs_path[len("gs://"):]
    parts = path.split("/", 1)
    return parts[0], parts[1] if len(parts) > 1 else ""


def list_gcs_objects(gcs_path: str) -> List[str]:
    """List all objects under given GCS path."""
    bucket_name, prefix = split_gcs_path(gcs_path)

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blobs = bucket.list_blobs(prefix=prefix)

    return [blob.name for blob in blobs]


def find_directories_with_files(base_path: str, required_files: List[str]) -> List[str]:
    """Find directories containing all required files."""
    bucket_name, prefix = split_gcs_path(base_path)
    all_paths = list_gcs_objects(base_path)

    # Group files by directory
    directories = {}
    for path in all_paths:
        dir_path = "/".join(path.split("/")[:-1])
        filename = path.split("/")[-1]

        if dir_path not in directories:
            directories[dir_path] = set()
        directories[dir_path].add(filename)

    # Find directories with all required files
    matching_dirs = []
    for dir_path, files in directories.items():
        if all(req_file in files for req_file in required_files):
            matching_dirs.append(f"gs://{bucket_name}/{dir_path}")

    return matching_dirs


class VAPOResultsViewer:
    """Gradio-based viewer for VAPO optimization results."""

    def __init__(self):
        self.base_path = None
        self.runs = []
        self.templates = []
        self.eval_results = []
        self.current_run = None

    def clear_all(self) -> Tuple[str, gr.Dropdown, gr.Dropdown, pd.DataFrame, pd.DataFrame]:
        """Clear all data and reset the interface."""
        self.base_path = None
        self.runs = []
        self.templates = []
        self.eval_results = []
        self.current_run = None

        return (
            "",  # Clear base path input
            gr.Dropdown(choices=[], value=None),  # Clear run dropdown
            gr.Dropdown(choices=[], value=None),  # Clear template dropdown
            pd.DataFrame(),  # Clear template display
            pd.DataFrame()   # Clear eval display
        )

    def load_runs(self, base_path: str) -> gr.Dropdown:
        """Load available runs from the base path."""
        if not base_path:
            return gr.Dropdown(choices=[], value=None)

        try:
            self.base_path = base_path
            required_files = ["eval_results.json", "templates.json"]
            self.runs = find_directories_with_files(base_path, required_files)

            if not self.runs:
                return gr.Dropdown(choices=[], value=None)

            return gr.Dropdown(choices=self.runs, value=self.runs[0])

        except Exception as e:
            print(f"Error loading runs: {e}")
            return gr.Dropdown(choices=[], value=None)

    def load_run_data(self, run_path: str) -> Tuple[gr.Dropdown, pd.DataFrame, pd.DataFrame]:
        """Load data for a specific run."""
        if not run_path:
            return gr.Dropdown(choices=[], value=None), pd.DataFrame(), pd.DataFrame()

        try:
            self.current_run = run_path

            # Load templates
            templates_path = f"{run_path}/templates.json"
            with epath.Path(templates_path).open("r") as f:
                templates_data = json.load(f)

            # Load evaluation results
            eval_path = f"{run_path}/eval_results.json"
            with epath.Path(eval_path).open("r") as f:
                eval_data = json.load(f)

            # Process data
            self.templates = [pd.json_normalize(t) for t in templates_data]
            self.eval_results = [self._process_eval_result(r) for r in eval_data]

            # Handle potential mismatch
            if len(self.templates) == len(self.eval_results) + 1:
                self.templates = self.templates[1:]
            elif len(self.templates) != len(self.eval_results):
                raise ValueError(
                    f"Mismatch: {len(self.templates)} templates vs "
                    f"{len(self.eval_results)} results"
                )

            # Create template options
            template_options = self._create_template_options()

            # Load first template by default
            if template_options:
                template_df, eval_df = self._get_template_data(0)
                return (
                    gr.Dropdown(choices=template_options, value=template_options[0]),
                    template_df,
                    eval_df
                )

            return gr.Dropdown(choices=[], value=None), pd.DataFrame(), pd.DataFrame()

        except Exception as e:
            print(f"Error loading run data: {e}")
            return gr.Dropdown(choices=[], value=None), pd.DataFrame(), pd.DataFrame()

    def _process_eval_result(self, result: Dict[str, Any]) -> pd.DataFrame:
        """Process evaluation result for display."""
        df = pd.read_json(io.StringIO(result["metrics_table"]))

        # Remove potentially confusing columns
        columns_to_drop = [
            col for col in df.columns
            if any(term in col for term in ["confidence", "raw_eval_resp", "instruction", "context"])
        ]

        return df.drop(columns=columns_to_drop, errors="ignore")

    def _create_template_options(self) -> List[str]:
        """Create dropdown options for templates."""
        options = []

        for i, template_df in enumerate(self.templates):
            # Extract metrics for display
            metrics = []
            for col in template_df.columns:
                if "metric" in col and "mean" in col:
                    value = template_df[col].iloc[0]
                    metric_name = self._extract_metric_name(col)
                    metrics.append(f"{metric_name}: {value:.3f}")

            metrics_str = " | ".join(metrics) if metrics else "No metrics"
            options.append(f"Template {i} - {metrics_str}")

        return options

    def _extract_metric_name(self, column: str) -> str:
        """Extract clean metric name from column."""
        match = re.search(r"\.(\w+)/", column)
        if match:
            return match.group(1)

        parts = column.split(".")
        return parts[-1].split("/")[0] if parts else column

    def _get_template_data(self, index: int) -> Tuple[pd.DataFrame, pd.DataFrame]:
        """Get template and evaluation data for a specific index."""
        if 0 <= index < len(self.templates):
            # Transpose template data for better display
            template_df = self.templates[index].T.reset_index()
            template_df.columns = ["Field", "Value"]

            eval_df = self.eval_results[index]
            return template_df, eval_df

        return pd.DataFrame(), pd.DataFrame()

    def display_template(self, template_selection: str) -> Tuple[pd.DataFrame, pd.DataFrame]:
        """Display selected template and evaluation results."""
        if not template_selection:
            return pd.DataFrame(), pd.DataFrame()

        try:
            # Extract index from selection
            index = int(template_selection.split()[1])
            return self._get_template_data(index)
        except (IndexError, ValueError):
            return pd.DataFrame(), pd.DataFrame()

    def create_interface(self) -> gr.Blocks:
        """Create the Gradio interface."""
        with gr.Blocks(title="VAPO Results Viewer", theme=gr.themes.Soft()) as interface:
            gr.Markdown("# VAPO Results Viewer")
            gr.Markdown("View and analyze Vertex AI Prompt Optimizer (VAPO) results")

            with gr.Row():
                with gr.Column(scale=3):
                    base_path_input = gr.Textbox(
                        label="GCS Base Path",
                        placeholder="gs://your-bucket/vapo-results",
                        info="Enter the base GCS path containing VAPO runs"
                    )
                with gr.Column(scale=1):
                    with gr.Row():
                        load_btn = gr.Button("Load Runs", variant="primary")
                        clear_btn = gr.Button("Clear", variant="secondary")

            with gr.Row():
                run_dropdown = gr.Dropdown(
                    label="Select Run",
                    choices=[],
                    interactive=True
                )
                template_dropdown = gr.Dropdown(
                    label="Select Template",
                    choices=[],
                    interactive=True
                )

            with gr.Tabs():
                with gr.Tab("Template Details"):
                    template_display = gr.DataFrame(
                        label="Template Information",
                        wrap=True,
                        interactive=False
                    )

                with gr.Tab("Evaluation Results"):
                    eval_display = gr.DataFrame(
                        label="Evaluation Metrics",
                        wrap=True,
                        interactive=False
                    )

            # Event handlers
            load_btn.click(
                fn=self.load_runs,
                inputs=[base_path_input],
                outputs=[run_dropdown]
            )

            clear_btn.click(
                fn=self.clear_all,
                inputs=[],
                outputs=[base_path_input, run_dropdown, template_dropdown, template_display, eval_display]
            )

            run_dropdown.change(
                fn=self.load_run_data,
                inputs=[run_dropdown],
                outputs=[template_dropdown, template_display, eval_display]
            )

            template_dropdown.change(
                fn=self.display_template,
                inputs=[template_dropdown],
                outputs=[template_display, eval_display]
            )

        return interface


def launch_app(share: bool = False, server_port: int = 7860, server_name: str = "0.0.0.0"):
    """Launch the Gradio application.

    Args:
        share: Whether to create a public share link
        server_port: Port to run the server on
        server_name: Server name/IP to bind to
    """
    viewer = VAPOResultsViewer()
    interface = viewer.create_interface()
    interface.launch(
        share=share,
        server_port=server_port,
        server_name=server_name
    )
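Because these helpers read from Cloud Storage, they are hard to try before you have finished runs. The following stdlib-only sketch re-implements two of their core steps on in-memory data: grouping object names into run directories (as `find_directories_with_files` does) and choosing which metric column to rank templates by (as `get_best_vapo_results` and `_extract_metric_name` do). The bucket name, object names, and column names are hypothetical; real column names come from the `templates.json` files that VAPO writes.

```python
import re
from typing import Dict, List, Optional, Set


def group_run_dirs(bucket: str, object_names: List[str], required: List[str]) -> List[str]:
    """Return gs:// directories that contain every required file."""
    dirs: Dict[str, Set[str]] = {}
    for name in object_names:
        dir_path, _, filename = name.rpartition("/")
        dirs.setdefault(dir_path, set()).add(filename)
    return [f"gs://{bucket}/{d}" for d, files in dirs.items() if set(required) <= files]


def pick_metric_column(columns: List[str], metric_name: Optional[str] = None) -> Optional[str]:
    """Pick the ranking column, preferring the composite metric when no name is given."""
    metric_cols = [c for c in columns if "metric" in c and "mean" in c]
    if not metric_cols:
        return None
    if metric_name:
        return next((c for c in metric_cols if metric_name in c), None)
    composite = [c for c in metric_cols if "composite_metric" in c]
    return composite[0] if composite else metric_cols[0]


def short_metric_name(column: str) -> str:
    """Extract the metric name with the same regex the viewer uses."""
    match = re.search(r"\.(\w+)/", column)
    return match.group(1) if match else column.split(".")[-1].split("/")[0]


objects = [
    "results/run_1/templates.json",
    "results/run_1/eval_results.json",
    "results/run_2/templates.json",  # Incomplete run: no eval_results.json.
]
print(group_run_dirs("my-bucket", objects, ["eval_results.json", "templates.json"]))

cols = ["prompt", "metrics.composite_metric/mean", "metrics.bleu/mean"]
col = pick_metric_column(cols)
print(col, short_metric_name(col))
```

Incomplete runs are skipped, which is why `get_best_vapo_results` raises `No valid runs found` when a job has not yet written both files.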

## **Part 1: Zero-Shot Optimizer**

We'll begin with the zero-shot approach. The following section will guide you through the process of optimizing your prompt without providing additional examples.

### Run a Zero-shot optimization job

To run a zero-shot optimization job, call the `optimize_prompt` method. The service uses a research-based metaprompt to optimize your initial prompt.

In [None]:
prompt = "Generate system instructions for a question-answering assistant"
response = client.prompt_optimizer.optimize_prompt(prompt=prompt)

In [None]:
display(Markdown(response.suggested_prompt))
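When you use the zero-shot optimizer to refine an existing prompt, it can help to see exactly what changed. A minimal sketch using Python's standard `difflib` module is shown below; it assumes `response.suggested_prompt` is a plain string (as its use with `Markdown` above suggests). The two example prompts are hypothetical; in this notebook you could pass `prompt` and `response.suggested_prompt` instead.

```python
import difflib


def prompt_diff(original: str, optimized: str) -> str:
    """Return a unified, line-by-line diff between two prompts."""
    return "\n".join(
        difflib.unified_diff(
            original.splitlines(),
            optimized.splitlines(),
            fromfile="original",
            tofile="optimized",
            lineterm="",
        )
    )


# Hypothetical prompts for illustration.
before = "You are an assistant.\nAnswer questions."
after = "You are a question-answering assistant.\nAnswer questions.\nCite the context."
print(prompt_diff(before, after))
```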

## **Part 2: The Data-Driven Optimizer**

The following sections walk you through preparing your data, configuring the job, and running it to find a better prompt with the data-driven optimizer.

### The optimization configuration

This is the most critical part of setting up the optimization job.

The `OptimizationConfig` class, built using `pydantic`, acts as a structured and validated blueprint for our optimization task. It ensures all necessary parameters are defined before we submit the job.


In [None]:
class OptimizationConfig(BaseModel):
    """
    A comprehensive prompt optimization configuration model.
    """
    # Basic Configuration
    system_instruction: str = Field(..., description="System instructions for the target model. String. This field is required.")
    prompt_template: str = Field(..., description="Template for prompts. String. This field is required.")
    target_model: str = Field("gemini-2.5-flash", description='Target model for optimization. Supported models: "gemini-2.5-flash", "gemini-2.5-pro"')
    thinking_budget: int = Field(-1, description="Thinking budget for thinking models. -1 means auto/no thinking. Integer.")
    optimization_mode: str = Field("instruction", description='Optimization mode. Supported modes: "instruction", "demonstration", "instruction_and_demo".')
    project: str = Field(..., description="Google Cloud project ID. This field is required.")

    # Evaluation Settings
    # custom_metric_name="custom_engagement_personalization_score",  # Metric name, as defined by the key that corresponds in the dictionary returned from Cloud function. String.
    # custom_metric_cloud_function_name="custom_engagement_personalization_metric",  # Cloud Run function name you previously deployed. String.
    eval_metrics_types: List[str] = Field(
        description='List of evaluation metrics. E.g., "bleu", "rouge_l", "safety".'
    )
    eval_metrics_weights: List[float] = Field(
        description="Weights for evaluation metrics. Length must match eval_metrics_types and should sum to 1."
    )
    aggregation_type: str = Field("weighted_sum", description='Aggregation type for metrics. Supported: "weighted_sum", "weighted_average".')

    # Data and I/O Paths
    input_data_path: str = Field(..., description="Cloud Storage URI to input optimization data. This field is required.")
    output_path: str = Field(..., description="Cloud Storage URI to save optimization results. This field is required.")

    # (Optional) Advanced Configuration
    num_steps: int = Field(10, ge=10, le=20, description="Number of iterations in instruction optimization mode. Integer between 10 and 20.")
    num_demo_set_candidates: int = Field(10, ge=10, le=30, description="Number of candidate demonstration sets evaluated. Integer between 10 and 30.")
    demo_set_size: int = Field(3, ge=3, le=6, description="Number of demonstrations generated per prompt. Integer between 3 and 6.")


    # (Optional) Model Locations and QPS
    target_model_location: str = Field("us-central1", description="Location of the target model. Default us-central1.")
    target_model_qps: int = Field(1, ge=1, description="QPS for the target model. Integer >= 1, based on your quota.")
    optimizer_model_location: str = Field("us-central1", description="Location of the optimizer model. Default us-central1.")
    optimizer_model_qps: int = Field(1, ge=1, description="QPS for the optimization model. Integer >= 1, based on your quota.")
    source_model: str = Field("", description="Google model previously used with these prompts. Not needed if providing a target column.")
    source_model_location: str = Field("us-central1", description="Location of the source model. Default us-central1.")
    source_model_qps: Optional[int] = Field(None, ge=1, description="Optional QPS for the source model. Integer >= 1.")
    eval_qps: int = Field(1, ge=1, description="QPS for the eval model. Integer >= 1, based on your quota.")

    # (Optional) Response, Language, and Data Handling
    response_mime_type: str = Field("text/plain", description="MIME response type from the target model. E.g., 'text/plain', 'application/json'.")
    response_schema: str = Field("", description="The Vertex AI Controlled Generation response schema.")
    language: str = Field("English", description='Language of the system instructions. E.g., "English", "Japanese".')
    placeholder_to_content: Dict[str, Any] = Field({}, description="Dictionary of placeholders to replace parameters in the system instruction.")
    data_limit: int = Field(10, ge=5, le=100, description="Amount of data used for validation. Integer between 5 and 100.")
    translation_source_field_name: str = Field("", description="Field name for source text if using translation metrics (Comet, MetricX).")
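The `eval_metrics_weights` description states two cross-field rules: the list must be as long as `eval_metrics_types`, and the weights should sum to 1. Per-field `Field` constraints such as `ge`/`le` check one value at a time, so they can't express a rule that spans two fields. The minimal stdlib sketch below shows such a check. `check_metric_weights` is a hypothetical helper, not part of the Vertex AI SDK, and the `1e-6` tolerance is an assumption chosen to absorb floating-point rounding.

```python
import math
from typing import Sequence


def check_metric_weights(
    metric_types: Sequence[str],
    metric_weights: Sequence[float],
    tol: float = 1e-6,  # Assumed tolerance for floating-point rounding.
) -> None:
    """Raise ValueError if the weights don't line up with the metrics.

    Hypothetical helper mirroring the rules stated in the
    `eval_metrics_weights` field description.
    """
    if len(metric_types) != len(metric_weights):
        raise ValueError(
            f"Got {len(metric_weights)} weights for {len(metric_types)} metrics."
        )
    # math.fsum avoids accumulating rounding error across many weights.
    total = math.fsum(metric_weights)
    if not math.isclose(total, 1.0, abs_tol=tol):
        raise ValueError(f"Weights sum to {total}, expected 1.")


# The metric/weight pair used in this notebook's configuration passes.
check_metric_weights(["question_answering_correctness", "fluency"], [0.8, 0.2])

# A length mismatch is rejected.
try:
    check_metric_weights(["bleu", "rouge_l"], [1.0])
except ValueError as err:
    print(err)
```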

### Preparing the Data and Running the Job

#### The dataset

The optimizer's performance depends heavily on the quality of your sample data.

For this example, we use a question-answering dataset in which each row contains a `question`, a context passage (`ctx`), and a ground-truth `target` answer. The `{target}` variable is crucial for metrics that score responses against a reference answer, such as `question_answering_correctness`.
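Each `{...}` placeholder in a prompt template names a dataset column. To illustrate the mapping, the sketch below fills the placeholder lines of this notebook's `prompt_template` from a single row using Python's `str.format_map`, after checking that the row has every column the template asks for. It assumes the optimizer substitutes placeholders by column name in a similar way; `render_prompt` is a hypothetical helper, and the sample row is made up for demonstration rather than taken from the dataset.

```python
import string

# Placeholder lines from this notebook's prompt template.
TEMPLATE = "Question: {question}\nContext: {ctx}\nAnswer: {target}"


def render_prompt(template: str, row: dict) -> str:
    """Fill every {placeholder} in `template` from the same-named key in `row`."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion).
    fields = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    missing = fields - set(row)
    if missing:
        raise KeyError(f"Row is missing columns: {sorted(missing)}")
    return template.format_map(row)


# Hypothetical row for illustration only; not taken from the tutorial dataset.
row = {
    "question": "What color is the sky on a clear day?",
    "ctx": "On a clear day, the sky appears blue.",
    "target": "Blue",
}
print(render_prompt(TEMPLATE, row))
```

A row without a `target` value raises `KeyError` here, which is a quick way to spot gaps in the data before launching a job.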


In [None]:
input_data_path = "gs://github-repo/prompts/prompt_optimizer/rag_qa_dataset.jsonl"
prompt_optimization_df = pd.read_json(input_data_path, lines=True)
prompt_optimization_df.head()
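`lines=True` tells pandas the file is JSON Lines: one JSON object per line, with each object becoming a row. In pandas, a key that is missing from a record simply shows up as `NaN` in that column instead of raising an error. The stdlib sketch below parses the same format from an in-memory string and reports which records lack the `question`, `ctx`, or `target` keys. The two records are placeholders for illustration, `find_incomplete_rows` is a hypothetical helper, and `REQUIRED_COLUMNS` reflects this tutorial's dataset rather than a documented requirement of the optimizer.

```python
import io
import json

# Columns this tutorial's template and metrics rely on (an assumption for other datasets).
REQUIRED_COLUMNS = {"question", "ctx", "target"}

# Two illustrative JSON Lines records; the second lacks a "target" key.
jsonl_text = (
    '{"question": "Q1?", "ctx": "Context 1.", "target": "A1"}\n'
    '{"question": "Q2?", "ctx": "Context 2."}\n'
)


def find_incomplete_rows(stream) -> list[tuple[int, list[str]]]:
    """Return (line_number, missing_columns) for each incomplete record."""
    problems = []
    for line_no, line in enumerate(stream, start=1):
        if not line.strip():
            continue  # Skip blank lines.
        record = json.loads(line)  # One JSON object per line.
        missing = sorted(REQUIRED_COLUMNS - set(record))
        if missing:
            problems.append((line_no, missing))
    return problems


print(find_incomplete_rows(io.StringIO(jsonl_text)))
# → [(2, ['target'])]
```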

#### Set optimization configuration

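Next, we create a dictionary with our settings and unpack it into our `OptimizationConfig` class. Pydantic validates each value against the field definitions above (for example, the `ge`/`le` bounds), fills in the defaults for any field we omit, and `model_dump()` converts the validated model back into a plain dictionary that we can serialize.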


In [None]:
output_path = f"{BUCKET_URI}/optimization_results/"

vapo_data_settings = {
    "system_instruction": "You are a helpful assistant. Given a question with context, provide the correct answer to the question.",
    "prompt_template": "Some examples of correct answers to a question are:\nQuestion: {question}\nContext: {ctx}\nAnswer: {target}",
    "target_model": "gemini-2.5-flash",
    "thinking_budget": -1,
    "optimization_mode": "instruction",
    "eval_metrics_types": ["question_answering_correctness", "fluency"],
    "eval_metrics_weights": [0.8, 0.2],
    "aggregation_type": "weighted_sum",
    "input_data_path": input_data_path,
    "output_path": output_path,
    "project": PROJECT_ID,
}

vapo_data_config = OptimizationConfig(**vapo_data_settings)
vapo_data_config_json = vapo_data_config.model_dump()
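With `eval_metrics_weights` set to `[0.8, 0.2]`, `question_answering_correctness` dominates the combined score. The sketch below shows one common reading of the two `aggregation_type` options: a weighted sum, and a weighted average normalized by the total weight. This interpretation, the `aggregate` helper, and the per-metric scores are assumptions for illustration, not the optimizer's documented internals.

```python
from typing import Sequence


def aggregate(scores: Sequence[float], weights: Sequence[float], how: str) -> float:
    """Combine per-metric scores. Assumed semantics, not the service's code."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    weighted = sum(s * w for s, w in zip(scores, weights))
    if how == "weighted_sum":
        return weighted
    if how == "weighted_average":
        # Normalize by the total weight so the result stays on the score scale.
        return weighted / sum(weights)
    raise ValueError(f"Unsupported aggregation type: {how}")


# Hypothetical per-metric scores for one candidate instruction.
scores = [0.5, 1.0]   # question_answering_correctness, fluency
weights = [0.8, 0.2]  # eval_metrics_weights from the configuration above

print(aggregate(scores, weights, "weighted_sum"))
print(aggregate(scores, weights, "weighted_average"))
```

Under this reading, the two options agree whenever the weights sum to 1, as the `eval_metrics_weights` description asks; they diverge only for weights that don't.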

#### Upload configuration to Cloud Storage

Write the Prompt Optimizer configuration to a JSON file in your Cloud Storage bucket.


In [None]:
config_path = f"{BUCKET_URI}/config.json"

# The `with` block closes the file when it exits, so no explicit close() is needed.
with epath.Path(config_path).open("w") as config_file:
    json.dump(vapo_data_config_json, config_file)
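`model_dump()` returns a plain Python dictionary, so the uploaded file is ordinary JSON keyed by the field names of `OptimizationConfig`. The local sketch below writes a hypothetical subset of such a dictionary to a temporary file and reads it back to confirm the values survive serialization unchanged. It uses only the standard library, so it runs without Cloud Storage access; the subset's keys and values are copied from this notebook's settings and defaults.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical subset of the dict returned by `vapo_data_config.model_dump()`.
config_subset = {
    "target_model": "gemini-2.5-flash",
    "optimization_mode": "instruction",
    "eval_metrics_types": ["question_answering_correctness", "fluency"],
    "eval_metrics_weights": [0.8, 0.2],
    "num_steps": 10,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "config.json"
    with path.open("w") as f:
        json.dump(config_subset, f, indent=2)  # indent only affects readability.
    with path.open() as f:
        round_tripped = json.load(f)

print(round_tripped == config_subset)
# → True
```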

#### Run the prompt optimization job

This is the final step. We pass the path to our configuration file and the service account to the Vertex AI client. The `optimize` method starts the custom job on the Vertex AI backend. We set `wait_for_completion` to `True` so the script will pause until the job is finished.


In [None]:
vapo_data_run_config = {
    "config_path": config_path,
    "wait_for_completion": True,
    "service_account": SERVICE_ACCOUNT
}

result = client.prompt_optimizer.optimize(method="vapo", config=vapo_data_run_config)
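Setting `wait_for_completion` to `True` makes the call block until the job finishes. If you submit with it set to `False` instead, you would typically check on the job yourself. The sketch below shows a generic polling loop with a timeout. `wait_for_job` and `get_state` are hypothetical: `get_state` stands in for however you look up the job's status, and the state names and timings are assumptions for illustration, not values returned by `client.prompt_optimizer`.

```python
import time
from typing import Callable

# Assumed terminal state names, for illustration only.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_job(
    get_state: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
) -> str:
    """Poll `get_state` until it reports a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still {state!r} after {timeout_seconds} s")
        time.sleep(poll_seconds)


# Usage with a fake status source that finishes on the third poll.
states = iter(["PENDING", "RUNNING", "SUCCEEDED"])
print(wait_for_job(lambda: next(states), poll_seconds=0.01))
# → SUCCEEDED
```

`time.monotonic()` is used for the deadline because, unlike wall-clock time, it can't jump backward if the system clock is adjusted.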

### Visualize results with the interactive app

The tutorial includes a helper function that launches a Gradio-based web interface. Use it to browse every instruction the optimizer generated and compare their evaluation scores side by side.

In [None]:
launch_app(share=True, server_port=7861, server_name="0.0.0.0")

### Get and use the best prompt programmatically

For use in an application, you can programmatically retrieve the top-performing instruction from the output files stored in GCS.


In [None]:
best_instruction, _ = get_best_vapo_results(output_path)
print("The optimized instruction is:", best_instruction)
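`get_best_vapo_results` is a helper from this tutorial, and its internals aren't shown in this section. Conceptually, picking the best prompt means ranking the candidate instructions by their aggregated evaluation score. The sketch below does that over a hypothetical list of candidates; the `Candidate` record layout, the `pick_best` helper, and the scores are illustrative assumptions, not the format of the files the optimizer writes to `output_path`.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Hypothetical record for one evaluated instruction."""

    instruction: str
    score: float


def pick_best(candidates: list[Candidate]) -> Candidate:
    """Return the highest-scoring candidate; on ties, max() keeps the earliest."""
    if not candidates:
        raise ValueError("No candidates to choose from.")
    return max(candidates, key=lambda c: c.score)


# Illustrative candidates and scores, not real optimizer output.
candidates = [
    Candidate("You are a helpful assistant. Answer the question.", 0.62),
    Candidate("Answer using only facts stated in the context.", 0.71),
    Candidate("Give a short, direct answer.", 0.58),
]

best = pick_best(candidates)
print(best.instruction, best.score)
# → Answer using only facts stated in the context. 0.71
```

Replacing `max` with `sorted(candidates, key=lambda c: c.score, reverse=True)` gives the full ranking, similar to the side-by-side comparison in the interactive app.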