In [None]:
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Tutorial for Running Prompt Management and Evaluation

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/tools/llmevalkit/prompt-management-tutorial.ipynb">
      <img width="32px" src="https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg" alt="Google Colaboratory logo"><br> Open in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Ftools%2Fllmevalkit%2Fprompt-management-tutorial.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Open in Colab Enterprise
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/tools/llmevalkit/prompt-management-tutorial.ipynb">
      <img src="https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/tools/llmevalkit/prompt-management-tutorial.ipynb">
      <img width="32px" src="https://www.svgrepo.com/download/217753/github.svg" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
</table>

<div style="clear: both;"></div>

| Author(s) |
| --- |
| [Mike Santoro](https://github.com/Michael-Santoro) |

## 1. Overview

This tutorial walks through the full prompt engineering lifecycle, from creating a prompt to evaluating and optimizing it. The app is organized into the following sections:

1. **Prompt Management:** This section focuses on the core tasks of creating, editing, and managing prompts. You can: 
    - **Create new prompts:** Define the prompt's name, text, the model it's designed for, and any system instructions. 
    - **Load and edit existing prompts:** Browse a library of saved prompts, load a specific version, and make modifications.
    - **Test prompts:** Before saving, you can provide sample input and generate a response to see how the prompt performs.
    - **Versioning:** Each time you save a change to a prompt, a new version is created, allowing you to track its evolution and compare different iterations.

2. **Dataset Creation:** A crucial part of prompt engineering is having good data to test and evaluate your prompts. This section allows you to:

    - **Create new datasets:** A dataset is essentially a folder in Google Cloud Storage where you can group related files.
    - **Upload data:** You can upload files in CSV, JSON, or JSONL format to your datasets. This data will be used for evaluating your prompts.

3. **Evaluation:** Once you have a prompt and a dataset, you need to see how well the prompt performs. The evaluation section helps you with this by:

    - **Running evaluations:** You can select a prompt and a dataset and run an evaluation. This will generate responses from the model for each item in your dataset.
    - **Human-in-the-loop rating:** For a more nuanced evaluation, you can manually review the model's responses and rate them.
    - **Automated metrics:** The tutorial also supports automated evaluation metrics to get a quantitative measure of your prompt's performance.

4. **Prompt Optimization:** This section helps you automatically improve your prompts. It uses Vertex AI's prompt optimization capabilities to:

    - **Configure and launch optimization jobs:** You can set up and run a job that will take your prompt and a dataset and try to find a better-performing version of the prompt.

5. **Prompt Optimization Results:** After an optimization job has run, this section allows you to:

    - **View the results:** You can see the different prompt versions that the optimizer came up with and how they performed.
    - **Compare versions:** The results are presented in a way that makes it easy to compare the different optimized prompts and choose the best one.

6. **Prompt Records:** This is a leaderboard that shows you the evaluation results of all your different prompt versions. It helps you to:

    - **Track performance over time:** See how your prompts have improved with each new version.
    - **Compare different prompts:** You can compare the performance of different prompts for the same task.

Together, these sections cover the whole workflow in one place: create a prompt, build a dataset, evaluate, optimize, and compare versions.


## 2. Before you start

### Clone the GitHub Repo

In [None]:
! git clone https://github.com/GoogleCloudPlatform/generative-ai.git

In [None]:
! gsutil cp gs://github-repo/prompts/prompt_optimizer/mathvista_dataset/mathvista_input.jsonl mathvista_input.jsonl

### Install Python Dependencies

In [None]:
%pip install -r requirements.txt

### Authenticate your notebook environment (Colab only)

Authenticate your environment on Google Colab.

In [None]:
import sys

if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()

### Alternative authentication (outside Colab)

In [None]:
# fmt: off
PROJECT_ID = "[your-project-id]"  # @param {type: "string", placeholder: "[your-project-id]", isTemplate: true}
LOCATION = "[your-project-region]"  # @param {type: "string", placeholder: "[your-project-region]", isTemplate: true}
# fmt: on

! gcloud auth application-default login
! gcloud config set project {PROJECT_ID}

### Set Google Cloud project information

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the following APIs](https://console.cloud.google.com/flows/enableapi?apiid=cloudresourcemanager.googleapis.com,aiplatform.googleapis.com,cloudfunctions.googleapis.com,run.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [None]:
! cp src/.env.example src/.env

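### Copy `.env.example` to `.env` and modify it

The cell above copies `src/.env.example` to `src/.env`. Edit `src/.env` and set the following values:

- `BUCKET_NAME`: an existing bucket, or a new one created below
- `PROJECT_ID`: your Google Cloud project ID
- `SERVICE_ACCOUNT`: the service account configured below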
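
Before launching the app, it can help to confirm that these three values are actually set. The following is a minimal sketch using only the standard library. It assumes `src/.env` contains plain `KEY=value` lines; the helpers `parse_env` and `missing_keys` are hypothetical and are not part of the app.

```python
from pathlib import Path

# Keys listed in this tutorial; the real .env file may contain more.
REQUIRED_KEYS = ("BUCKET_NAME", "PROJECT_ID", "SERVICE_ACCOUNT")


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


env_path = Path("src/.env")  # path assumed from the cp command above
if env_path.exists():
    print("Missing keys:", missing_keys(parse_env(env_path.read_text())))
```

An empty list means all three keys have a value; it does not verify that the bucket or service account actually exists.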


#### Create a New Bucket (not required if using an existing bucket)

In [None]:
# fmt: off
BUCKET_NAME = "[your-bucket-name]"  # @param {type: "string", placeholder: "[your-bucket-name]", isTemplate: true}
# fmt: on

BUCKET_URI = f"gs://{BUCKET_NAME}"


! gsutil mb -l {LOCATION} {BUCKET_URI}

#### Grant Roles to the Default Compute Engine Service Account

In [None]:
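# Look up the project number; the shell capture returns a list of output lines.
PROJECT_NUMBER = !gcloud projects describe {PROJECT_ID} --format="get(projectNumber)"
PROJECT_NUMBER = PROJECT_NUMBER[0]
# Default Compute Engine service account for this project.
SERVICE_ACCOUNT = f"{PROJECT_NUMBER}-compute@developer.gserviceaccount.com"

# Grant the roles needed for Vertex AI and Cloud Storage access.
for role in ["aiplatform.user", "storage.objectAdmin"]:
    ! gcloud projects add-iam-policy-binding {PROJECT_ID} \
      --member=serviceAccount:{SERVICE_ACCOUNT} \
      --role=roles/{role} --condition=None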

## 3. Run the App

In [None]:
! cd generative-ai/tools/llmevalkit && streamlit run index.py & npx localtunnel --port 8501

Click the link that localtunnel prints and enter only the external IP address as the tunnel password.

📝 **Note:** You can run `wget -q -O - https://loca.lt/mytunnelpassword` to get the external IP (e.g., 35.194.128.20).
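
If `wget` is not available, the same request can be made from Python. This is a minimal sketch using only the standard library; `fetch_tunnel_password` is a hypothetical helper, and the `opener` parameter exists only so the function can be exercised without network access.

```python
import urllib.request

TUNNEL_PASSWORD_URL = "https://loca.lt/mytunnelpassword"


def fetch_tunnel_password(url=TUNNEL_PASSWORD_URL, opener=urllib.request.urlopen):
    """Return the text served at `url` (the external IP used as the password)."""
    with opener(url) as response:
        return response.read().decode("utf-8").strip()
```

Calling `print(fetch_tunnel_password())` in a notebook cell should show the same value as the `wget` command.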

📝 **Note:** If you are having issues displaying the app, clear your browser cache.

![image.png](assets/welcome_page.png)

## 4. Work with the App

### 1. Prompt Management

In the Prompt Name field enter:

```
math_prompt_test
```

In the Prompt Data field enter:

```
Problem: {{query}}
Image: {{image}} @@@image/jpeg
Answer: {{target}}
```

In the Model Name field enter:
```
gemini-2.0-flash-001
```

In the System Instructions field enter:
```
Solve the problem given the image.
```

🖱️ Click `Save`.

Copy the following sample input to test the prompt:

```
{"query": "Hint: Please answer the question and provide the correct option letter, e.g., A, B, C, D, at the end.\nQuestion: As shown in the figure, CD is the diameter of \u2299O, chord DE \u2225 OA, if the degree of \u2220D is 50.0, then the degree of \u2220C is ()\nChoices:\n(A) 25\u00b0\n(B) 30\u00b0\n(C) 40\u00b0\n(D) 50\u00b0", "image": "gs://github-repo/prompts/prompt_optimizer/mathvista_dataset/images/643.jpg", "target": "25\u00b0"}
```

🖱️ Click `Generate`.
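
To see how the sample input lines up with the prompt, the sketch below substitutes each `{{name}}` placeholder with the matching JSON field. It only illustrates the mapping: the app's actual rendering logic is not shown in this tutorial, the `@@@image/jpeg` suffix is left untouched, and `fill_template` is a hypothetical helper.

```python
import json
import re

TEMPLATE = """Problem: {{query}}
Image: {{image}} @@@image/jpeg
Answer: {{target}}"""


def fill_template(template: str, record: dict) -> str:
    """Replace each {{name}} with record[name]; unknown names are left as-is."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return str(record.get(name, match.group(0)))

    return re.sub(r"\{\{(\w+)\}\}", substitute, template)


# A shortened stand-in for the sample input above (same keys, abbreviated query).
record = json.loads(
    '{"query": "Question: ... Choices: (A) 25\\u00b0 (B) 30\\u00b0", '
    '"image": "gs://github-repo/prompts/prompt_optimizer/mathvista_dataset/images/643.jpg", '
    '"target": "25\\u00b0"}'
)
print(fill_template(TEMPLATE, record))
```

A key that is missing from the record leaves its placeholder visible in the output, which makes mismatched field names easy to spot.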


### 2. Dataset Creation

Download a copy of the dataset using the cell below, then upload the file in the application.

**Dataset Name:** `mathvista`

You can preview the dataset at the bottom of the page.

In [None]:
! gsutil cp gs://github-repo/prompts/prompt_optimizer/mathvista_dataset/mathvista_input.jsonl .
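
JSONL stores one JSON object per line. As a quick sanity check before uploading, the sketch below reads the first few records and lists their keys. It assumes each record carries the `query`, `image`, and `target` fields seen in the sample input from the Prompt Management step; `read_jsonl` is a hypothetical helper.

```python
import json
from itertools import islice
from pathlib import Path


def read_jsonl(path, limit=None):
    """Yield up to `limit` parsed records from a JSONL file, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        records = (json.loads(line) for line in handle if line.strip())
        yield from islice(records, limit)


dataset_path = Path("mathvista_input.jsonl")
if dataset_path.exists():
    for record in read_jsonl(dataset_path, limit=3):
        # Show which fields each record has and its ground-truth answer.
        print(sorted(record), "->", record.get("target"))
```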

### 3. Evaluation

Run an evaluation before making any changes to the prompt, to establish a baseline.

- **Existing Dataset:** 'mathvista'
- **Dataset File:** 'mathvista_input.jsonl'
- **Number of Samples:** '100'
- **Ground Truth Column Name:** 'target'
- **Existing Prompt:** 'math_prompt_test'
- **Version:** '1'

Click Load Prompt, then Upload and Get Response. ⏰ Wait for the responses to be generated.

Review the responses.

- **Model-Based:** 'question-answering-quality'

Launch the evaluation. ⏰ Wait for it to finish.

View the evaluation results and save them to the prompt records. This stores the initial version as the baseline.
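
The model-based `question-answering-quality` metric is computed by the app. For an independent and much cruder sanity check against the `target` column, one could compute a normalized exact-match rate over exported responses. The sketch below shows that substitute metric, not what the app computes; the `response` field name and both helper names are assumptions.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def exact_match_rate(rows, response_key="response", target_key="target"):
    """Fraction of rows whose normalized response equals the normalized target."""
    rows = list(rows)
    if not rows:
        return 0.0
    hits = sum(
        normalize(row[response_key]) == normalize(row[target_key]) for row in rows
    )
    return hits / len(rows)
```

Exact match is strict: a response that contains the right answer inside a longer explanation still counts as a miss, so this number is best read alongside the model-based score rather than instead of it.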


### 4. Prompt Optimization

🔧 Set up prompt optimization.

- **Target Model:** 'gemini-2.0-flash-001'
- **Existing Prompt:** 'math_prompt_test'
- **Version:** '1'

🖱️ Click Load Prompt.

- **Select Existing Dataset:** 'mathvista'
- **Select the File:** 'mathvista_input.jsonl'

🖱️ Click Load Dataset.

Preview the dataset.

🖱️ Click Start Optimization.

**Note:** To view the job's progress, navigate to https://console.cloud.google.com/vertex-ai/training/custom-jobs

⏰ Wait! This step takes about 20 minutes to run.

### 5. Prompt Optimization Results

View the Optimization Results.

The most recent run is shown at the top of the screen. Select it from the dropdown menu:

![image.png](assets/prompt_optimization_result.png)

Review the results, select the highest-scoring version, and copy its instruction.

### 6. Return to Prompt Management for a New Version

Load the existing prompt, `math_prompt_test`, from before.

📋 Paste the new instruction from the prompt optimizer and save it as a new version.
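
Each save creates a new version instead of overwriting the previous one, which is what makes the before-and-after comparison in the next steps possible. A minimal in-memory sketch of that append-only idea follows; `PromptStore`, `PromptVersion`, and their fields are hypothetical and do not describe how the app stores prompts.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PromptVersion:
    version: int
    prompt_data: str
    system_instructions: str


@dataclass
class PromptStore:
    """Append-only history: every save adds a version, nothing is overwritten."""

    name: str
    versions: list = field(default_factory=list)

    def save(self, prompt_data: str, system_instructions: str) -> PromptVersion:
        # Version numbers start at 1 and increase by one per save.
        new = PromptVersion(len(self.versions) + 1, prompt_data, system_instructions)
        self.versions.append(new)
        return new

    def load(self, version: int) -> PromptVersion:
        return self.versions[version - 1]


store = PromptStore("math_prompt_test")
store.save("Problem: {{query}}", "Solve the problem given the image.")
store.save("Problem: {{query}}", "<optimized instruction pasted here>")
print([v.version for v in store.versions])
```

Because earlier versions stay loadable, the baseline evaluation recorded for version 1 remains tied to the exact text that produced it.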

### 7. Run a New Evaluation

Repeat step 3 (Evaluation) with the new prompt version.

### 8. View the Records

Navigate to the leaderboard and load the results.