### Objective

In this tutorial, you will learn how to use `Vertex AI` to tune a `Gemini 2.5 Flash` model.


This tutorial uses the following Google Cloud ML services:

- `Vertex AI`


The steps performed include:

- Prepare and load the dataset
- Load the `gemini-2.5-flash` model
- Evaluate the model before tuning
- Tune the model
  - This automatically creates a Vertex AI endpoint and deploys the model to it
- Make a prediction using the tuned model
- Evaluate the model after tuning

### Costs

This tutorial uses billable components of Google Cloud:

* Vertex AI
* Cloud Storage

Learn about [Vertex AI
pricing](https://cloud.google.com/vertex-ai/pricing), [Cloud Storage
pricing](https://cloud.google.com/storage/pricing), and use the [Pricing
Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

### Install Gen AI SDK and other required packages

The new Google Gen AI SDK provides a unified interface to Gemini through both the Gemini Developer API and the Gemini API on Vertex AI. With a few exceptions, code that runs on one platform will run on both. This means that you can prototype an application using the Developer API and then migrate the application to Vertex AI without rewriting your code.


In [None]:
%pip install --upgrade --user --quiet google-genai google-cloud-aiplatform rouge_score plotly jsonlines


## Step0: Authenticate your notebook environment (Colab only)

If you are running this notebook on Google Colab, run the cell below to authenticate your environment.

In [None]:
import sys

if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()

In [None]:
!gcloud auth application-default login --quiet


In [None]:
!gcloud auth login --quiet

In [None]:
!gcloud config set project golden-cove-474806-r1

In [None]:
!gcloud auth list

In [None]:
!gcloud config set account dschoi@snclab.kr

In [None]:
!gcloud auth application-default login --account=dschoi@snclab.kr --quiet

- If you are running this notebook in a local development environment:
  - Install the [Google Cloud SDK](https://cloud.google.com/sdk).
  - Obtain authentication credentials. Create local credentials by running the following command and following the oauth2 flow (read more about the command [here](https://cloud.google.com/sdk/gcloud/reference/beta/auth/application-default/login)):

    ```bash
    gcloud auth application-default login
    ```

## Step1: Import Libraries

In [None]:
# Install required packages
!pip install google-genai google-cloud-aiplatform jsonlines plotly pandas tqdm rouge-score

In [None]:
!pip install jsonlines

In [None]:
import time

# For data handling.
# import jsonlines
import pandas as pd

# For visualization.
import plotly.graph_objects as go

# For fine tuning Gemini model.
import vertexai
from google import genai

# For extracting vertex experiment details.
from google.cloud import aiplatform
from google.cloud.aiplatform.metadata import context
from google.cloud.aiplatform.metadata import utils as metadata_utils
from google.genai import types
from plotly.subplots import make_subplots

## Step2: Set Google Cloud project information and initialize Vertex AI and Gen AI SDK

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).


In [None]:
PROJECT_ID = "golden-cove-474806-r1"  # @param {type:"string"}
REGION = "us-central1"  # @param {type:"string"}

In [None]:
vertexai.init(project=PROJECT_ID, location=REGION)

client = genai.Client(vertexai=True, project=PROJECT_ID, location=REGION)

## Step3: Create Dataset in correct format

The dataset used to tune a foundation model needs to include examples that align with the task that you want the model to perform. Structure your training dataset in a text-to-text format. Each record, or row, in the dataset contains the input text (also referred to as the prompt) which is paired with its expected output from the model. Supervised tuning uses the dataset to teach the model to mimic a behavior, or task, you need by giving it hundreds of examples that illustrate that behavior.

Your dataset size depends on the task, and follows the recommendation mentioned in the `Overview` section. The more examples you provide in your dataset, the better the results.

### Dataset format

Training data should be structured within a JSONL file located at a Google Cloud Storage (GCS) URI. Each line (or row) of the JSONL file must adhere to a specific schema: It should contain a `contents` array, with objects inside defining a `role` (either "user" for user input or "model" for model output) and `parts`, containing the input data. For example, a valid data row would look like this:


```json
{
  "contents": [
    {
      "role": "user", # This indicates input content
      "parts": [
        {
          "text": "How are you?"
        }
      ]
    },
    {
      "role": "model", # This indicates target content
      "parts": [ # text only
        {
          "text": "I am good, thank you!"
        }
      ]
    }
  ] #  ... repeat "user", "model" for multi turns.
}
```

Refer to the public [documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini-supervised-tuning-prepare#about-datasets) for more details.
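
Before uploading a JSONL file, it can help to check each line against the schema described above. The following is a minimal sketch using only the standard library; the helper names `validate_record` and `validate_jsonl_lines` are hypothetical, and the checks cover only the `contents` / `role` / `parts` structure shown in this section, not every rule the tuning service may enforce.

```python
import json

VALID_ROLES = {"user", "model"}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one tuning record (empty if none)."""
    contents = record.get("contents")
    if not isinstance(contents, list) or not contents:
        return ["'contents' must be a non-empty list"]
    problems = []
    for i, turn in enumerate(contents):
        if not isinstance(turn, dict):
            problems.append(f"contents[{i}] must be an object")
            continue
        if turn.get("role") not in VALID_ROLES:
            problems.append(f"contents[{i}].role must be 'user' or 'model'")
        parts = turn.get("parts")
        if not isinstance(parts, list) or not parts:
            problems.append(f"contents[{i}].parts must be a non-empty list")
    return problems


def validate_jsonl_lines(lines) -> dict[int, list[str]]:
    """Map 1-based line numbers to the problems found on each invalid line."""
    report = {}
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            report[lineno] = [f"invalid JSON: {exc.msg}"]
            continue
        if not isinstance(record, dict):
            report[lineno] = ["record must be a JSON object"]
            continue
        problems = validate_record(record)
        if problems:
            report[lineno] = problems
    return report
```

To check a local file, pass an open file handle, for example `validate_jsonl_lines(open("out/train_image_tuning.jsonl", encoding="utf-8"))`; an empty dictionary means no structural problems were found.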

To run a tuning job, you need to upload one or more datasets to a Cloud Storage bucket. You can either create a new Cloud Storage bucket or use an existing one to store dataset files. The region of the bucket doesn't matter, but we recommend that you use a bucket that's in the same Google Cloud project where you plan to tune your model.

### Step3 [a]: Create a Cloud Storage bucket

Create a storage bucket to store intermediate artifacts such as datasets.


In [None]:
# Provide a bucket name
BUCKET_NAME = "a11y-error-dataset"  # @param {type:"string"}
BUCKET_URI = f"gs://{BUCKET_NAME}"

Only if your bucket doesn't already exist: Run the following cell to create your Cloud Storage bucket.


In [None]:
! gsutil mb -l {REGION} -p {PROJECT_ID} {BUCKET_URI}

Creating gs://a11y-error-dataset/...


### Step3 [b]: Upload tuning data to Cloud Storage

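- The data used in this notebook is loaded from the Hugging Face dataset `doodoo77/a11y-error-dataset-kor`.
- The cell below converts each sample into the Gemini tuning dataset format and uploads the images and JSONL files to the Cloud Storage bucket.

#### Convert the dataset to the Gemini 2.5 tuning dataset format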

In [None]:
!pip install datasets google-cloud-storage pillow jsonlines tqdm langchain --quiet

In [None]:
# !pip install datasets google-cloud-storage pillow tqdm langchain --quiet

import io, os, hashlib, mimetypes, random, json
from pathlib import Path
from PIL import Image
from datasets import load_dataset
from google.cloud import storage
from tqdm import tqdm

# ---------------------------
# 1) Prompts kept as-is (provided by the user)
# ---------------------------
from langchain.prompts import PromptTemplate

user_prompt = PromptTemplate(
    template="""
    너는 접근성 평가 전문가야.

    진단할 페이지의 전체 및 오류로 의심되는 영역의 스크린샷과 오류 영역 코드를 주면,
    너는 접근성 진단 결과(검사항목/오류유형, 문제점 및 개선방안)를 도출하면 돼.

    해당 오류 영역이 위반한 검사항목/오류유형을 작성하고,
    문제점 및 개선방안은 해당 검사항목/오류유형을 준수하기 위해 사용자들에게 설명하는 설명문 혹은 코드를 작성해.

    또 위와 같은 진단 결과를 내기 전에 왜 그러한 진단이 나와야 하는지에 대해 추론해야 해.
    아래와 같은 절차를 따라 추론해.
    [추론 지침]
    1. 전체 페이지 스크린샷을 통해 해당 페이지의 목적을 파악하고,
    2. 페이지 목적을 참고할 때, 오류 영역 스크린샷에 드러난 진단 콘텐츠의 역할을 파악.
    3. 이제 오류 영역 코드까지 함께 고려할 때, 진단 결과 작성

    오류 영역 코드: {error_code}
        """,
    input_variables=["error_code"],
)

assistant_prompt = PromptTemplate(
    template="""
    <|begin_of_thought|>
    {rationale}
    <|end_of_thought|>

    <|begin_of_solution|>
    [검사항목]: {test_item}
    [오류유형]: {error_type}
    [문제점 및 개선방안_텍스트]: {text}
    [문제점 및 개선방안_코드]: {code}
    <|end_of_solution|>
    """,
    input_variables=["rationale", "test_item", "error_type", "text", "code"]
)

# ---------------------------
# 2) Environment settings
# ---------------------------
GCS_PREFIX  = "image-tuning"      # prefix inside the bucket
DATASET_ID  = "doodoo77/a11y-error-dataset-kor"
HF_SPLIT    = "train"
SEED        = 42
VAL_RATIO   = 0.1                  # validation split ratio

random.seed(SEED)

# GCS client
storage_client = storage.Client(project=PROJECT_ID)
bucket = storage_client.bucket(BUCKET_NAME)

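# ---------------------------
# 3) Utilities: convert images to bytes & upload
# ---------------------------
def to_image_bytes_and_mime(img):
    """Normalize a Hugging Face image value (bytes/PIL/dict/path) to bytes plus a MIME type.

    PIL images are re-encoded as JPEG; raw bytes are assumed to be JPEG already.
    """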
    if isinstance(img, bytes):
        b = img
        mime = "image/jpeg"
    elif isinstance(img, Image.Image):
        pil = img
        if pil.mode not in ("RGB", "L"):
            pil = pil.convert("RGB")
        buf = io.BytesIO()
        pil.save(buf, format="JPEG", quality=95)
        b = buf.getvalue()
        mime = "image/jpeg"
    elif isinstance(img, dict) and "bytes" in img:
        b = img["bytes"]
        mime = "image/jpeg"
    elif isinstance(img, str) and os.path.exists(img):
        with open(img, "rb") as f:
            b = f.read()
        mime, _ = mimetypes.guess_type(img)
        mime = mime or "application/octet-stream"
    else:
        raise ValueError("Unsupported image type.")
    return b, mime

def upload_image_to_gcs(img_bytes: bytes, rel_path: str, mime: str = "image/jpeg") -> str:
    """Upload bytes to GCS and return the gs:// URI."""
    blob = bucket.blob(rel_path)
    blob.upload_from_string(img_bytes, content_type=mime)
    return f"gs://{BUCKET_NAME}/{rel_path}"

# ---------------------------
# 4) Load the Hugging Face dataset
# ---------------------------
dataset = load_dataset(DATASET_ID, split=HF_SPLIT)

# ---------------------------
# 5) One sample -> Gemini 2.5 (image tuning) record
# ---------------------------
def build_record(sample, index):
    """
    Assumed sample structure:
      sample["images"] : a single image or a list of images (screenshots)
      sample["output"] : {"추론","검사항목","오류유형","문제점 및 개선방안_텍스트","문제점 및 개선방안_코드","문제점"}
    """
    imgs = sample.get("images", [])
    if not isinstance(imgs, list):
        imgs = [imgs]

    # user.parts: several fileData entries, followed by the text prompt
    user_parts = []
    for j, img in enumerate(imgs):
        img_bytes, _ = to_image_bytes_and_mime(img)
        sha = hashlib.sha1(img_bytes).hexdigest()[:12]
        rel = f"{GCS_PREFIX}/images/sample_{index:08d}_{j:02d}_{sha}.jpg"
        gs_uri = upload_image_to_gcs(img_bytes, rel, mime="image/jpeg")
        user_parts.append({
            "fileData": {"mimeType": "image/jpeg", "fileUri": gs_uri}
        })

    # User text
    user_text = user_prompt.format(error_code=sample["output"]["문제점"])
    user_parts.append({"text": user_text})

    # Model text
    model_text = assistant_prompt.format(
        rationale = sample["output"]["추론"],
        test_item = sample["output"]["검사항목"],
        error_type= sample["output"]["오류유형"],
        text      = sample["output"]["문제점 및 개선방안_텍스트"],
        code      = sample["output"]["문제점 및 개선방안_코드"],
    )

    return {
        "contents": [
            {"role": "user",  "parts": user_parts},
            {"role": "model", "parts": [{"text": model_text}]}
        ]
    }

# ---------------------------
# 6) Convert & split into train/val
# ---------------------------
records = []
for i, sample in tqdm(enumerate(dataset), total=len(dataset)):
    try:
        rec = build_record(sample, i)
        records.append(rec)
    except Exception as e:
        print(f"[WARN] conversion failed for sample {i}: {e}")

random.shuffle(records)
n_total = len(records)
n_val = max(1, int(n_total * VAL_RATIO))
val_records = records[:n_val]
train_records = records[n_val:]

# ---------------------------
# 7) Save as JSONL (standard library only)
# ---------------------------
def write_jsonl(path, items):
    with open(path, "w", encoding="utf-8") as f:
        for obj in items:
            f.write(json.dumps(obj, ensure_ascii=False))
            f.write("\n")

Path("out").mkdir(exist_ok=True)
train_jsonl = "out/train_image_tuning.jsonl"
val_jsonl   = "out/valid_image_tuning.jsonl"

write_jsonl(train_jsonl, train_records)
write_jsonl(val_jsonl,   val_records)

print(f"train: {len(train_records)}, valid: {len(val_records)}")

# ---------------------------
# 8) Upload the JSONL files to GCS
# ---------------------------
train_gcs_rel = f"{GCS_PREFIX}/train/train_image_tuning.jsonl"
val_gcs_rel   = f"{GCS_PREFIX}/valid/valid_image_tuning.jsonl"

bucket.blob(train_gcs_rel).upload_from_filename(train_jsonl, content_type="application/json")
bucket.blob(val_gcs_rel).upload_from_filename(val_jsonl,   content_type="application/json")

TRAIN_URI = f"gs://{BUCKET_NAME}/{train_gcs_rel}"
VAL_URI   = f"gs://{BUCKET_NAME}/{val_gcs_rel}"

print("GCS JSONL paths")
print("  train:", TRAIN_URI)
print("  valid:", VAL_URI)




train: 691, valid: 76
GCS JSONL paths
  train: gs://a11y-error-dataset/image-tuning/train/train_image_tuning.jsonl
  valid: gs://a11y-error-dataset/image-tuning/valid/valid_image_tuning.jsonl
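
The split sizes printed above follow directly from the `VAL_RATIO` arithmetic in the conversion cell. As a small worked sketch (the function name `split_counts` is hypothetical and not part of the notebook), the same rule can be written as:

```python
def split_counts(n_total: int, val_ratio: float) -> tuple[int, int]:
    """Return (n_train, n_val) using the rule n_val = max(1, int(n_total * val_ratio))."""
    if n_total < 1:
        raise ValueError("n_total must be at least 1")
    n_val = max(1, int(n_total * val_ratio))
    return n_total - n_val, n_val


# 767 converted records with a 0.1 validation ratio
print(split_counts(767, 0.1))  # (691, 76)
```

Because `int()` truncates, 76.7 becomes 76 validation records, which leaves 691 for training.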


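## Step4: Initialize model

The following Gemini text model supports supervised tuning:

* `gemini-2.5-flash`

The cell below sets `base_model` to `gemini-2.5-pro`; check the supervised tuning documentation to confirm that the model you choose is supported.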

In [None]:
base_model = "gemini-2.5-pro"

## Step7: Fine-tune the Model

 - `source_model`: Specifies the base Gemini model version you want to fine-tune.
 - `train_dataset`: Path to your training data in JSONL format.

  *Optional parameters*
 - `validation_dataset`: If provided, this data is used to evaluate the model during tuning.
 - `tuned_model_display_name`: Display name for the tuned model.
 - `epochs`: The number of training epochs to run.
 - `learning_rate_multiplier`: A value to scale the learning rate during training.
 - `adapter_size`: Gemini 2.5 Flash supports adapter sizes [1, 2, 4, 8]; the default value is 4.

**Note: The default hyperparameter settings are based on rigorous testing and are recommended for initial use. You can customize these parameters to address specific performance requirements.**
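
If you do override the defaults, a small check before submitting the job can catch invalid values early. The sketch below is illustrative only: `build_hyperparameters` is a hypothetical helper, the dictionary keys simply mirror the parameter names in the list above, and mapping them onto the actual fields of `types.CreateTuningJobConfig` is left to you because the SDK field names may differ.

```python
ALLOWED_ADAPTER_SIZES = (1, 2, 4, 8)  # values listed above for Gemini 2.5 Flash


def build_hyperparameters(epochs=None, learning_rate_multiplier=None, adapter_size=None) -> dict:
    """Collect only the hyperparameters that were set, rejecting invalid values."""
    params = {}
    if epochs is not None:
        if not isinstance(epochs, int) or epochs < 1:
            raise ValueError("epochs must be a positive integer")
        params["epochs"] = epochs
    if learning_rate_multiplier is not None:
        if learning_rate_multiplier <= 0:
            raise ValueError("learning_rate_multiplier must be positive")
        params["learning_rate_multiplier"] = learning_rate_multiplier
    if adapter_size is not None:
        if adapter_size not in ALLOWED_ADAPTER_SIZES:
            raise ValueError(f"adapter_size must be one of {ALLOWED_ADAPTER_SIZES}")
        params["adapter_size"] = adapter_size
    return params
```

Leaving an argument as `None` omits it from the result, so the service default applies for that parameter.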

In [None]:
tuned_model_display_name = "gemini-2.5-pro"  # @param {type:"string"}

In [None]:
!gcloud auth application-default set-quota-project golden-cove-474806-r1


Credentials saved to file: [/content/.config/application_default_credentials.json]

These credentials will be used by any library that requests Application Default Credentials (ADC).

Quota project "golden-cove-474806-r1" was added to ADC which can be used by Google client libraries for billing and quota. Note that some services may still bill the project owning the resource.


In [None]:
!gcloud beta billing projects describe golden-cove-474806-r1

billingAccountName: billingAccounts/015528-8BD5C2-18F2DD
billingEnabled: true
name: projects/golden-cove-474806-r1/billingInfo
projectId: golden-cove-474806-r1


In [None]:
!gcloud services enable aiplatform.googleapis.com storage.googleapis.com compute.googleapis.com \
  --project=golden-cove-474806-r1

Operation "operations/acf.p2-762759016971-eb0ac1a0-663a-4b26-baf6-586799f29c09" finished successfully.


In [None]:
training_dataset = {
    "gcs_uri": f"{BUCKET_URI}/image-tuning/train/train_image_tuning.jsonl",
}

validation_dataset = types.TuningValidationDataset(
    gcs_uri=f"{BUCKET_URI}/image-tuning/valid/valid_image_tuning.jsonl"
)

client = genai.Client(vertexai=True, project=PROJECT_ID, location=REGION)

# Tune a model using `tune` method.
sft_tuning_job = client.tunings.tune(
    base_model=base_model,
    training_dataset=training_dataset,
    config=types.CreateTuningJobConfig(
        tuned_model_display_name=tuned_model_display_name,
        validation_dataset=validation_dataset,
    ),
)

In [None]:
# Get the tuning job info.
tuning_job = client.tunings.get(name=sft_tuning_job.name)
tuning_job

**Note: Tuning time depends on several factors, such as training data size, number of epochs, learning rate multiplier, etc.**

<div class="alert alert-block alert-warning">
<b>⚠️ It will take ~15 mins for the model tuning job to complete on the provided dataset and set configurations/hyperparameters. ⚠️</b>
</div>

### [Optional] Cancel Tuning Job

- Uncomment the code below to cancel the tuning job.

In [None]:
## Cancel the tuning job
# tuning_job = client.tunings.cancel(name=sft_tuning_job.name)
# tuning_job

### Status Check

In [None]:
%%time
# Wait for job completion

running_states = [
    "JOB_STATE_PENDING",
    "JOB_STATE_RUNNING",
]

while tuning_job.state.name in running_states:
    print(".", end="")
    tuning_job = client.tunings.get(name=tuning_job.name)
    time.sleep(10)
print()
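
The loop above polls until the job leaves the pending and running states, with no upper bound on the wait. A minimal variant with a timeout could look like the sketch below. `wait_for_state` is a hypothetical helper; it takes any zero-argument callable that returns the current state name, so it does not depend on the SDK itself.

```python
import time

RUNNING_STATES = {"JOB_STATE_PENDING", "JOB_STATE_RUNNING"}


def wait_for_state(get_state, poll_seconds=10.0, timeout_seconds=3600.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns a non-running state, or raise TimeoutError."""
    deadline = clock() + timeout_seconds
    state = get_state()
    while state in RUNNING_STATES:
        if clock() >= deadline:
            raise TimeoutError(f"job still in {state} after {timeout_seconds} seconds")
        sleep(poll_seconds)
        state = get_state()
    return state


# Possible usage with the objects defined in this notebook:
# final_state = wait_for_state(
#     lambda: client.tunings.get(name=tuning_job.name).state.name
# )
```

Passing `sleep` and `clock` as parameters keeps the helper easy to test without real delays.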

In [None]:
tuned_model = tuning_job.tuned_model.endpoint
experiment_name = tuning_job.experiment

print("Tuned model experiment", experiment_name)
print("Tuned model endpoint resource name:", tuned_model)

### Step7 [a]: Tuning and evaluation metrics

#### Model tuning metrics

- `/train_total_loss`: Loss for the tuning dataset at a training step.
- `/train_fraction_of_correct_next_step_preds`: The token accuracy at a training step. A single prediction consists of a sequence of tokens. This metric measures the accuracy of the predicted tokens when compared to the ground truth in the tuning dataset.
- `/train_num_predictions`: Number of predicted tokens at a training step

#### Model evaluation metrics:

- `/eval_total_loss`: Loss for the evaluation dataset at an evaluation step.
- `/eval_fraction_of_correct_next_step_preds`: The token accuracy at an evaluation step. A single prediction consists of a sequence of tokens. This metric measures the accuracy of the predicted tokens when compared to the ground truth in the evaluation dataset.
- `/eval_num_predictions`: Number of predicted tokens at an evaluation step.

The metrics visualizations are available after the model tuning job completes. If you don't specify a validation dataset when you create the tuning job, only the visualizations for the tuning metrics are available.
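
To make the token accuracy metrics concrete, the sketch below computes the fraction of predicted tokens that match the ground truth for one sequence. It is an illustration of the definition given above, not the service's implementation; the function name and the use of plain token lists are assumptions.

```python
def fraction_of_correct_next_step_preds(predicted: list, target: list) -> float:
    """Share of positions where the predicted token equals the ground-truth token."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    if not target:
        raise ValueError("sequences must not be empty")
    correct = sum(p == t for p, t in zip(predicted, target))
    return correct / len(target)


# Three of four tokens match the ground truth
print(fraction_of_correct_next_step_preds([1, 2, 0, 4], [1, 2, 3, 4]))  # 0.75
```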


In [None]:
# Locate Vertex AI Experiment and Vertex AI Experiment Run
experiment = aiplatform.Experiment(experiment_name=experiment_name)
filter_str = metadata_utils._make_filter_string(
    schema_title="system.ExperimentRun",
    parent_contexts=[experiment.resource_name],
)
experiment_run = context.Context.list(filter_str)[0]

In [None]:
# Read data from Tensorboard
tensorboard_run_name = f"{experiment.get_backing_tensorboard_resource().resource_name}/experiments/{experiment.name}/runs/{experiment_run.name.replace(experiment.name, '')[1:]}"
tensorboard_run = aiplatform.TensorboardRun(tensorboard_run_name)
metrics = tensorboard_run.read_time_series_data()

In [None]:
def get_metrics(metric: str = "/train_total_loss"):
    """Get metrics from Tensorboard.

    Args:
      metric: metric name, eg. /train_total_loss or /eval_total_loss.

    Returns:
      steps: list of steps.
      steps_loss: list of loss values.
    """
    loss_values = metrics[metric].values
    steps_loss = []
    steps = []
    for loss in loss_values:
        steps_loss.append(loss.scalar.value)
        steps.append(loss.step)
    return steps, steps_loss

In [None]:
# Get Train and Eval Loss
train_loss = get_metrics(metric="/train_total_loss")
eval_loss = get_metrics(metric="/eval_total_loss")
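
Since `get_metrics` returns a `(steps, losses)` pair, one simple use is to find the step with the lowest evaluation loss, which can hint at where overfitting starts. The helper below is a hypothetical sketch that works on plain lists.

```python
def best_step(steps: list, losses: list) -> tuple:
    """Return (step, loss) for the smallest loss value."""
    if len(steps) != len(losses) or not steps:
        raise ValueError("steps and losses must be non-empty and of equal length")
    idx = min(range(len(losses)), key=losses.__getitem__)
    return steps[idx], losses[idx]


# Possible usage with the tuples from the cell above:
# step, loss = best_step(*eval_loss)
```

If the lowest evaluation loss occurs well before the final step while the training loss keeps falling, fewer epochs may be worth trying.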

### Step7 [b]: Plot the metrics

In [None]:
# Plot the train and eval loss metrics using Plotly python library

fig = make_subplots(
    rows=1, cols=2, shared_xaxes=True, subplot_titles=("Train Loss", "Eval Loss")
)

# Add traces
fig.add_trace(
    go.Scatter(x=train_loss[0], y=train_loss[1], name="Train Loss", mode="lines"),
    row=1,
    col=1,
)
fig.add_trace(
    go.Scatter(x=eval_loss[0], y=eval_loss[1], name="Eval Loss", mode="lines"),
    row=1,
    col=2,
)

# Add figure title
fig.update_layout(title="Train and Eval Loss", xaxis_title="Steps", yaxis_title="Loss")

# Set x-axis title
fig.update_xaxes(title_text="Steps")

# Set y-axes titles
fig.update_yaxes(title_text="Loss")

# Show plot
fig.show()

## Step8: Load the Tuned Model

 - Call the fine-tuned model through the Gen AI SDK client by passing the tuning job's endpoint name as `model` to `client.models.generate_content`.

 - Test the tuned model with the following prompt

In [None]:
from google import genai
from google.genai.types import HttpOptions

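# Pass the Vertex AI settings explicitly so the client does not depend on environment variables.
client = genai.Client(
    vertexai=True,
    project=PROJECT_ID,
    location=REGION,
    http_options=HttpOptions(api_version="v1"),
)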

responses = client.tunings.list()
for response in responses:
    print(response.name)
    # Example response:
    # projects/123456789012/locations/us-central1/tuningJobs/123456789012345

In [None]:
from google import genai
from google.genai.types import HttpOptions

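# Pass the Vertex AI settings explicitly so the client does not depend on environment variables.
client = genai.Client(
    vertexai=True,
    project=PROJECT_ID,
    location=REGION,
    http_options=HttpOptions(api_version="v1"),
)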

# Get the tuning job and the tuned model.
# Eg. tuning_job_name = "projects/123456789012/locations/us-central1/tuningJobs/123456789012345"
tuning_job_name = "projects/123456789012/locations/us-central1/tuningJobs/1824796776457043968"

tuning_job = client.tunings.get(name=tuning_job_name)

print(tuning_job.tuned_model.model)
print(tuning_job.tuned_model.endpoint)
print(tuning_job.experiment)
# Example response:
# projects/123456789012/locations/us-central1/models/1234567890@1
# projects/123456789012/locations/us-central1/endpoints/123456789012345
# projects/123456789012/locations/us-central1/metadataStores/default/contexts/tuning-experiment-2025010112345678

In [None]:
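# `contents` was left empty in the original cell. Set it to your own prompt
# (for example, screenshot parts followed by the text prompt) before running.
contents = None  # placeholder; replace before calling generate_content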

response = client.models.generate_content(
    model=tuning_job.tuned_model.endpoint,
    contents=contents,
)
print(response.text)
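
A prompt for the tuned model would typically mirror the layout of the training records: image parts first, then the text prompt. The sketch below builds such a structure as plain dictionaries. `build_contents` is a hypothetical helper, the example URIs are placeholders, and it assumes the SDK accepts dictionary-form content with `file_data` / `file_uri` / `mime_type` keys; verify this against the SDK version you have installed.

```python
def build_contents(image_uris: list[str], prompt_text: str, mime_type: str = "image/jpeg") -> list[dict]:
    """Build a single user turn: one file_data part per image, then the text prompt."""
    parts = [
        {"file_data": {"file_uri": uri, "mime_type": mime_type}}
        for uri in image_uris
    ]
    parts.append({"text": prompt_text})
    return [{"role": "user", "parts": parts}]


# Placeholder URIs; replace with screenshots in your own bucket.
example = build_contents(
    ["gs://your-bucket/full_page.jpg", "gs://your-bucket/error_area.jpg"],
    "your prompt text",
)
```

With the prompt template defined earlier, the text could come from `user_prompt.format(error_code=...)`, and the result could then be passed as `contents`.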

- We can clearly see the difference between the summaries generated before and after tuning: the tuned summary is more in line with the ground truth format. (**Note**: Pre- and post-tuning outputs may vary based on the parameters you set.)

  - *Pre*: `This article describes a method for applying lotion to your back using your forearms as applicators. By squeezing lotion onto your forearms and then reaching behind your back, you can use a windshield wiper motion to spread the lotion across your back. The method acknowledges potential limitations for those with shoulder pain or limited flexibility.`
  - *Post*: `Squeeze a line of lotion on your forearm. Reach behind you and rub your back.`
  - *Ground Truth*: `Squeeze a line of lotion onto the tops of both forearms and the backs of your hands. Place your arms behind your back. Move your arms in a windshield wiper motion.`

## Step9: Evaluation post model tuning

<div class="alert alert-block alert-warning">
<b>⚠️ It will take ~5 mins for the evaluation on the provided batch. ⚠️</b>
</div>

In [None]:
# run evaluation
evaluation_df_post_tuning = run_evaluation(tuned_model, corpus_batch)

In [None]:
evaluation_df_post_tuning.head()

In [None]:
evaluation_df_post_tuning_stats = evaluation_df_post_tuning.dropna().describe()

In [None]:
# Statistics of the evaluation dataframe post model tuning.
evaluation_df_post_tuning_stats

In [None]:
print(
    "Mean rougeL_precision is", evaluation_df_post_tuning_stats.rougeL_precision["mean"]
)
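
For intuition about what `rougeL_precision` measures, the sketch below computes a simplified ROUGE-L precision: the length of the longest common subsequence (LCS) of tokens divided by the number of tokens in the candidate. It assumes lowercase whitespace tokenization, so its values can differ from those of the `rouge_score` package, which applies its own tokenization.

```python
def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_precision(reference: str, candidate: str) -> float:
    """LCS length divided by the number of candidate tokens (0.0 for an empty candidate)."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    return lcs_length(ref_tokens, cand_tokens) / len(cand_tokens)


print(rouge_l_precision("the cat sat", "the cat sat on the mat"))  # 0.5
```

A long candidate that contains the reference plus extra text scores lower on precision, which is why shorter, format-matching outputs after tuning tend to raise this metric.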

#### Improvement

In [None]:
improvement = round(
    (
        (
            evaluation_df_post_tuning_stats.rougeL_precision["mean"]
            - evaluation_df_stats.rougeL_precision["mean"]
        )
        / evaluation_df_stats.rougeL_precision["mean"]
    )
    * 100,
    2,
)
print(
    f"Model tuning has improved the rougeL_precision by {improvement}% (result might differ based on each tuning iteration)"
)
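
The percentage above is a relative change with respect to the pre-tuning mean. As a standalone sketch of the same arithmetic (the function name `relative_improvement` is hypothetical):

```python
def relative_improvement(before: float, after: float, ndigits: int = 2) -> float:
    """Percentage change from `before` to `after`, rounded to `ndigits` places."""
    if before == 0:
        raise ValueError("before must be non-zero to compute a relative change")
    return round((after - before) / before * 100, ndigits)


# A mean precision that rises from 0.5 to 0.75 is a 50% relative improvement
print(relative_improvement(0.5, 0.75))  # 50.0
```

A negative result means the metric dropped after tuning.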

## Conclusion

Performance could be further improved:
- By adding more training samples. In general, improve your training data quality and/or quantity towards getting a more diverse and comprehensive dataset for your task
- By tuning the hyperparameters, such as epochs and learning rate multiplier
  - To find the optimal number of epochs for your dataset, we recommend experimenting with different values. While increasing epochs can lead to better performance, it's important to be mindful of overfitting, especially with smaller datasets. If you see signs of overfitting, reducing the number of epochs can help mitigate the issue
- You may try different prompt structures/formats and opt for the one with better performance

## Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.


Otherwise, you can delete the individual resources you created in this tutorial.

Refer to these [instructions](https://cloud.google.com/vertex-ai/docs/tutorials/image-classification-custom/cleanup#delete_resources) to delete the resources from the console.

In [None]:
# Delete Experiment.
delete_experiments = True
if delete_experiments:
    experiments_list = aiplatform.Experiment.list()
    for experiment in experiments_list:
        if experiment.resource_name == experiment_name:
            print(experiment.resource_name)
            experiment.delete()
            break

print("***" * 10)

# Delete Endpoint.
delete_endpoint = True
# If force is set to True, all deployed models on this
# Endpoint will be first undeployed.
if delete_endpoint:
    for endpoint in aiplatform.Endpoint.list():
        if endpoint.resource_name == tuned_model:
            print(endpoint.resource_name)
            endpoint.delete(force=True)
            break

print("***" * 10)

# Delete Cloud Storage Bucket.
delete_bucket = True
if delete_bucket:
    ! gsutil -m rm -r $BUCKET_URI