| Author(s) |
| --- |
| [Erwin Huizenga](https://github.com/erwinh85) |

## Overview

**Gemini** is a family of generative AI models developed by Google DeepMind and designed for multimodal use cases. The Gemini API gives you access to the various Gemini models, such as Gemini 2.0 Flash.

This notebook demonstrates fine-tuning Gemini 2.0 Flash using the Vertex AI Supervised Tuning feature. Supervised Tuning lets you use your own labeled training data to refine the base model's capabilities for your specific tasks. Each labeled example demonstrates the output you want from the model during inference.

The workflow has three parts:

- **Data:** Ensure your training data is high quality, well labeled, and directly relevant to the target task. Low-quality data can degrade performance and introduce bias in the fine-tuned model.
- **Training:** Experiment with different configurations to optimize the model's performance on the target task.
- **Evaluation:**
  - **Metric:** Choose evaluation metrics that accurately reflect the success of the fine-tuned model on your specific task.
  - **Evaluation set:** Use a separate set of data to evaluate the model's performance.


Refer to public [documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini-supervised-tuning) for more details.

<hr/>

Before running this notebook, ensure you have:

- A Google Cloud project: Provide your project ID in the `PROJECT_ID` variable.

- Authenticated your Colab environment: Run the authentication code block at the beginning.

- Prepared training data (Test with your own data or use the one in the notebook): Data should be formatted in JSONL with prompts and corresponding completions.

### Costs

This tutorial uses billable components of Google Cloud:

* Vertex AI
* Cloud Storage

Learn about [Vertex AI
pricing](https://cloud.google.com/vertex-ai/pricing), [Cloud Storage
pricing](https://cloud.google.com/storage/pricing), and use the [Pricing
Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

To estimate token costs, see this [notebook](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/tuning/vertexai_supervised_tuning_token_count_and_cost_estimation.ipynb).

## Get started

### Install the Google Gen AI SDK and other required packages

The new Google Gen AI SDK provides a unified interface to Gemini through both the Gemini Developer API and the Gemini API on Vertex AI. With a few exceptions, code that runs on one platform will run on both. This means that you can prototype an application using the Developer API and then migrate the application to Vertex AI without rewriting your code.


In [None]:
%pip install --upgrade --quiet google-cloud-aiplatform google-genai

### Restart runtime

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which restarts the current kernel.

The restart might take a minute or longer. After it's restarted, continue to the next step.

In [None]:
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️</b>
</div>


### Authenticate your notebook environment (Colab only)

If you're running this notebook on Google Colab, run the cell below to authenticate your environment.

In [None]:
import sys

if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()

In [None]:
# Shell commands need a leading "!" in a notebook cell, and this command takes the project ID as an argument.
!gcloud auth application-default set-quota-project "nodal-album-456823-v9"

### Set the Google Cloud project information and initialize the Google Gen AI SDK

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [None]:

import os

from google import genai
from google.genai import types

# Set your Google Cloud project ID; if it is left empty or as the placeholder, fall back to the environment variable
PROJECT_ID = "nodal-album-456823-v9"  # @param {type:"string", isTemplate: true}
if not PROJECT_ID or PROJECT_ID == "[your-project-id]":
    PROJECT_ID = str(os.environ.get("GOOGLE_CLOUD_PROJECT"))

LOCATION = os.environ.get("GOOGLE_CLOUD_REGION", "us-central1")

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)

### Import libraries

In [None]:
from collections import Counter
import json
import random

# Vertex AI SDK
from google.cloud import aiplatform
from google.cloud.aiplatform.metadata import context
from google.cloud.aiplatform.metadata import utils as metadata_utils
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import vertexai

vertexai.init(project=PROJECT_ID, location=LOCATION)

### Data

#### SQuAD dataset
Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.

You can find more information on the SQuAD [GitHub page](https://rajpurkar.github.io/SQuAD-explorer/).


```bibtex
@inproceedings{rajpurkar-etal-2016-squad,
    title = "{SQ}u{AD}: 100,000+ Questions for Machine Comprehension of Text",
    author = "Rajpurkar, Pranav  and
      Zhang, Jian  and
      Lopyrev, Konstantin  and
      Liang, Percy",
    editor = "Su, Jian  and
      Duh, Kevin  and
      Carreras, Xavier",
    booktitle = "Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2016",
    address = "Austin, Texas",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D16-1264",
    doi = "10.18653/v1/D16-1264",
    pages = "2383--2392",
    eprint={1606.05250},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
}
```

First update the `BUCKET_NAME` parameter below. You can either use an existing bucket or create a new one.

In [None]:
# Provide a bucket name
BUCKET_NAME = "mathinmind"  # @param {type:"string"}
BUCKET_URI = f"gs://{BUCKET_NAME}"
print(BUCKET_URI)

gs://mathinmind


Install the Hugging Face `datasets` library, which the next cell uses to download the MATH-lighteval and MetaMathQA datasets.

In [None]:
!pip install datasets


In [None]:
from datasets import load_dataset

# Load both datasets
math_ds = load_dataset("DigitalLearningGmbH/MATH-lighteval", "default")
meta_ds = load_dataset("meta-math/MetaMathQA")

# Save all splits to CSVs for MATH-lighteval
for split in math_ds:
    math_ds[split].to_csv(f"math_lighteval_{split}.csv", index=False)

# Save all splits to CSVs for MetaMathQA
for split in meta_ds:
    meta_ds[split].to_csv(f"metamathqa_{split}.csv", index=False)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [None]:
# Only run this command if you want to create a new Cloud Storage bucket.
# ! gsutil mb -l {LOCATION} -p {PROJECT_ID} {BUCKET_URI}

Next, copy the dataset CSV files from the bucket to the local runtime.

In [None]:
!gsutil cp gs://mathinmind/datasets/math_lighteval_test.csv .
!gsutil cp gs://mathinmind/datasets/math_lighteval_train.csv .
!gsutil cp gs://mathinmind/datasets/metamathqa_train.csv .

Copying gs://mathinmind/datasets/math_lighteval_test.csv...
/ [1 files][  3.5 MiB/  3.5 MiB]                                                
Operation completed over 1 objects/3.5 MiB.                                      
Copying gs://mathinmind/datasets/math_lighteval_train.csv...
/ [1 files][  5.7 MiB/  5.7 MiB]                                                
Operation completed over 1 objects/5.7 MiB.                                      
Copying gs://mathinmind/datasets/metamathqa_train.csv...
==> NOTE: You are downloading one or more large file(s), which would
run significantly faster if you enabled sliced object downloads. This
feature is enabled by default but requires that compiled crcmod be
installed (see "gsutil help crcmod").

/ [1 files][349.7 MiB/349.7 MiB]                                                
Operation completed over 1 objects/349.7 MiB.                                    


### Baseline

Next, prepare some data to establish a baseline. This means evaluating the out-of-the-box model on a representative sample of your dataset before any fine-tuning. A baseline lets you quantify the improvements achieved through fine-tuning.

In [None]:
test_df = pd.read_csv("math_lighteval_test.csv")
test_df.head(2)

Unnamed: 0,problem,level,solution,type
0,How many vertical asymptotes does the graph of...,Level 3,The denominator of the rational function facto...,Algebra
1,What is the positive difference between $120\%...,Level 1,One hundred twenty percent of 30 is $120\cdot3...,Algebra


First, load the data you need to evaluate the out-of-the-box model and set a baseline. For preprocessing, lowercase the text and remove extra whitespace, but preserve newlines.
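
The cells in this notebook load the CSV files but do not show the normalization step itself. A minimal sketch of what it could look like follows; the function name `normalize_text` is hypothetical and not part of the original notebook. Note that lowercasing can change the meaning of LaTeX commands (for example `\Delta` versus `\delta`), so treat this as an illustration rather than a recommended preprocessing step for math data.

```python
import re


def normalize_text(text: str) -> str:
    """Lowercase text and collapse runs of spaces/tabs, keeping newlines."""
    lines = text.lower().split("\n")
    # Collapse horizontal whitespace inside each line, then trim the line.
    cleaned = [re.sub(r"[ \t]+", " ", line).strip() for line in lines]
    return "\n".join(cleaned)
```

With pandas, it could be applied to a column with something like `math_test["problem"].map(normalize_text)`.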

In [None]:
math_train = pd.read_csv("math_lighteval_train.csv")
math_test = pd.read_csv("math_lighteval_test.csv")
meta_train = pd.read_csv("metamathqa_train.csv")

In [None]:
math_train.head(2)

Unnamed: 0,problem,level,solution,type,input
0,"Let \[f(x) = \left\{\n\begin{array}{cl} ax+3, ...",Level 5,"For the piecewise function to be continuous, t...",Algebra,### Question:\nLet \[f(x) = \left\{\n\begin{ar...
1,A rectangular band formation is a formation wi...,Level 5,Let $x$ be the number of band members in each ...,Algebra,### Question:\nA rectangular band formation is...


In [None]:
meta_train.head()

Unnamed: 0,type,query,original_question,response,input
0,MATH_AnsAug,Gracie and Joe are choosing numbers on the com...,Gracie and Joe are choosing numbers on the com...,"The distance between two points $(x_1,y_1)$ an...",### Question:\nGracie and Joe are choosing num...
1,GSM_Rephrased,What is the total cost of purchasing equipment...,The treasurer of a football team must buy equi...,"Each player requires a $25 jersey, a $15.20 pa...",### Question:\nWhat is the total cost of purch...
2,GSM_SV,Diego baked 12 cakes for his sister's birthday...,Diego baked 12 cakes for his sister's birthday...,"To solve this problem, we need to determine th...",### Question:\nDiego baked 12 cakes for his si...
3,MATH_AnsAug,Convert $10101_3$ to a base 10 integer.,Convert $10101_3$ to a base 10 integer.,$10101_3 = 1 \cdot 3^4 + 0 \cdot 3^3 + 1 \cdot...,### Question:\nConvert $10101_3$ to a base 10 ...
4,GSM_FOBAR,"Sue works in a factory and every 30 minutes, a...","Sue works in a factory and every 30 minutes, a...","We know that every 30 minutes, a machine produ...",### Question:\nSue works in a factory and ever...


In [None]:
import pandas as pd
import json

# Load both CSV files
mathleval_df = pd.read_csv("math_lighteval_train.csv")
metaqa_df = pd.read_csv("metamathqa_train.csv")

# Extract required columns and rename for consistency
mathleval_df = mathleval_df[["problem", "solution"]].rename(columns={
    "problem": "input_text",
    "solution": "output_text"
})

metaqa_df = metaqa_df[["original_question", "response"]].rename(columns={
    "original_question": "input_text",
    "response": "output_text"
})

# Combine the two datasets
combined_df = pd.concat([mathleval_df, metaqa_df], ignore_index=True)

# Optional: Drop rows with missing values
combined_df.dropna(subset=["input_text", "output_text"], inplace=True)

# Save to .jsonl
with open("math_qa_train.jsonl", "w", encoding="utf-8") as f:
    for _, row in combined_df.iterrows():
        json.dump({
            "input_text": row["input_text"],
            "output_text": row["output_text"]
        }, f)
        f.write("\n")

print("✅ Saved as 'math_qa_train.jsonl'")

✅ Saved as 'math_qa_train.jsonl'


In [None]:
from sklearn.model_selection import train_test_split

train_df, valid_df = train_test_split(combined_df, test_size=0.1, random_state=42)

# Save train
with open("math_qa_train.jsonl", "w", encoding="utf-8") as f:
    for _, row in train_df.iterrows():
        json.dump({"input_text": row["input_text"], "output_text": row["output_text"]}, f)
        f.write("\n")

# Save validation
with open("math_qa_valid.jsonl", "w", encoding="utf-8") as f:
    for _, row in valid_df.iterrows():
        json.dump({"input_text": row["input_text"], "output_text": row["output_text"]}, f)
        f.write("\n")

print("✅ Train/Validation files ready")

✅ Train/Validation files ready


In [None]:
!gsutil cp math_qa_train.jsonl gs://mathinmind/
!gsutil cp math_qa_valid.jsonl gs://mathinmind/

Copying file://math_qa_train.jsonl [Content-Type=application/octet-stream]...
/ [0 files][    0.0 B/258.3 MiB]                                                ==> NOTE: You are uploading one or more large file(s), which would run
significantly faster if you enable parallel composite uploads. This
feature can be enabled by editing the
"parallel_composite_upload_threshold" value in your .boto
configuration file. However, note that if you do this large files will
be uploaded as `composite objects
<https://cloud.google.com/storage/docs/composite-objects>`_,which
means that any user who downloads such objects will need to have a
compiled crcmod installed (see "gsutil help crcmod"). This is because
without a compiled crcmod, computing checksums on composite objects is
so slow that gsutil disables downloads of composite objects.

-
Operation completed over 1 objects/258.3 MiB.                                    
Copying file://math_qa_valid.jsonl [Content-Type=application/octet-stream]...
- [

In [None]:
BUCKET_URI = "gs://mathinmind"
train_dataset = f"{BUCKET_URI}/math_qa_train.jsonl"
validation_dataset = f"{BUCKET_URI}/math_qa_valid.jsonl"

In [None]:
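# NOTE: This attempt fails. `vertexai.preview.language_models` does not export
# `TuningDataset`, so the import raises the ImportError shown below this cell.
# The tuning job that succeeds later in this notebook uses `client.tunings.tune`.
from vertexai.preview.language_models import (
    TuningDataset,
    TuningValidationDataset,
    TuningJob,
)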

project = "nodal-album-456823-v9"
location = "us-central1"  # Or your region
tuned_model_display_name = "gemini-math-qa-tuned"

# Dataset URIs
train_dataset = f"{BUCKET_URI}/math_qa_train.jsonl"
validation_dataset = f"{BUCKET_URI}/math_qa_valid.jsonl"

# Create dataset objects
training_dataset = TuningDataset(gcs_uri=train_dataset)
validation_dataset = TuningValidationDataset(gcs_uri=validation_dataset)

# Launch finetuning
job = TuningJob.create(
    display_name=tuned_model_display_name,
    model="models/gemini-1.5-flash",  # Or "models/gemini-1.5-pro"
    training_data=training_dataset,
    validation_data=validation_dataset,
    project=project,
    location=location,
)

print("🎉 Finetuning started! Job ID:", job.name)

ImportError: cannot import name 'TuningDataset' from 'vertexai.preview.language_models' (/usr/local/lib/python3.11/dist-packages/vertexai/preview/language_models.py)

In [None]:
!pip install --upgrade google-cloud-aiplatform google-genai



In [None]:
df = pd.read_csv("/content/math_lighteval_train.csv")

# Function to convert each row into a JSONL format and save to a file
def convert_to_jsonl(input_df, output_file):
    with open(output_file, 'w') as jsonl_file:
        for index, row in input_df.iterrows():
            # Prepare the example for JSONL format
            example = {
                "question": row["problem"],
                "answer": row["solution"],
            }
            # Write each line as a JSON object
            jsonl_file.write(json.dumps(example) + "\n")

# Convert and save the JSONL file
convert_to_jsonl(df, "mathleval_train1.jsonl")

In [None]:
import pandas as pd
import json

# Load the CSV file
df = pd.read_csv("/content/math_lighteval_train.csv")

# Function to convert each row into a JSONL format and save to a file
def convert_to_jsonl(input_df, output_file):
    with open(output_file, 'w') as jsonl_file:
        for index, row in input_df.iterrows():
            # Create a structured dictionary for each example
            example = {
                "input": {
                    "question": row["problem"],
                    "difficulty_level": row["level"],
                    "problem_type": row["type"]
                },
                "output": {
                    "solution": row["solution"]
                }
            }
            # Write each example as a JSON object on a new line
            jsonl_file.write(json.dumps(example) + "\n")

# Convert and save as JSONL
convert_to_jsonl(df, "mathleval_train2.jsonl")

In [None]:
import pandas as pd
import json

# Load the CSV file
df = pd.read_csv("/content/math_lighteval_train.csv")

# Convert to JSONL with `input_text`/`output_text` fields (a later cell switches to the `contents` format used for the tuning job)
def convert_to_vertexai_jsonl(input_df, output_file):
    with open(output_file, 'w') as jsonl_file:
        for _, row in input_df.iterrows():
            question_context = f"Question: {row['problem']}\nLevel: {row['level']}\nType: {row['type']}"
            example = {
                "input_text": question_context,
                "output_text": row["solution"]
            }
            jsonl_file.write(json.dumps(example) + "\n")

# Save the converted file
convert_to_vertexai_jsonl(df, "mathleval_vertexai_train.jsonl")

In [None]:
!gsutil cp /content/mathleval_vertexai_train.jsonl gs://mathinmind/mathleval_vertexai_train.jsonl

Copying file:///content/mathleval_vertexai_train.jsonl [Content-Type=application/octet-stream]...
/ [1 files][  6.3 MiB/  6.3 MiB]                                                
Operation completed over 1 objects/6.3 MiB.                                      


In [None]:
base_model = "gemini-1.5-flash"  # Or whatever model you're using

# Define the GCS path to your training data


# Try this format for the training dataset
train_dataset = {
    "training_file_path": "gs://mathinmind/mathleval_train2.jsonl"
}
# Or alternatively, try:
# train_dataset = {
#     "data_path": gcs_uri
# }

# Create the fine-tuning job
try:
    sft_tuning_job = client.tunings.tune(
        base_model=base_model,
        training_dataset=train_dataset,
        # Add other parameters as needed
    )
    print("Job created successfully!")
except Exception as e:
    print(f"Error details: {str(e)}")

Error details: 1 validation error for _CreateTuningJobParameters
training_dataset.training_file_path
  Extra inputs are not permitted [type=extra_forbidden, input_value='gs://mathinmind/mathleval_train2.jsonl', input_type=str]
    For further information visit https://errors.pydantic.dev/2.11/v/extra_forbidden


In [None]:
# Load the CSV file
df = pd.read_csv("/content/math_lighteval_train.csv")

# Convert and save in Vertex AI-compatible JSONL format
def convert_to_vertexai_jsonl(input_df, output_file):
    with open(output_file, 'w') as jsonl_file:
        for _, row in input_df.iterrows():
            # Create the "contents" field with user and model roles
            example = {
                "contents": [
                    {
                        "role": "user",
                        "parts": [{"text": row["problem"]}]  # Use "problem" as the input
                    },
                    {
                        "role": "model",
                        "parts": [{"text": row["solution"]}]  # Use "solution" as the output
                    }
                ]
            }
            jsonl_file.write(json.dumps(example) + "\n")

# Save the converted file
convert_to_vertexai_jsonl(df, "mathleval_vertexai_train.jsonl")
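
Before uploading, it can help to sanity-check the generated file. The sketch below checks only the structure that the cell above writes (a `contents` list of `user`/`model` turns, each with non-empty text parts); it is not an exhaustive schema for supervised tuning data, and the function name `validate_tuning_record` is hypothetical.

```python
import json


def validate_tuning_record(line: str) -> list[str]:
    """Return a list of problems found in one JSONL line (empty if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    contents = record.get("contents") if isinstance(record, dict) else None
    if not isinstance(contents, list) or not contents:
        return ["missing or empty 'contents' list"]

    problems = []
    for i, turn in enumerate(contents):
        if not isinstance(turn, dict):
            problems.append(f"turn {i}: not a JSON object")
            continue
        if turn.get("role") not in ("user", "model"):
            problems.append(f"turn {i}: unexpected role {turn.get('role')!r}")
        parts = turn.get("parts")
        if not isinstance(parts, list) or not parts:
            problems.append(f"turn {i}: missing or empty 'parts'")
        elif not all(
            isinstance(p, dict)
            and isinstance(p.get("text"), str)
            and p["text"].strip()
            for p in parts
        ):
            # Catches missing values too, e.g. a NaN written by pandas.
            problems.append(f"turn {i}: every part needs non-empty 'text'")

    last = contents[-1]
    if isinstance(last, dict) and last.get("role") != "model":
        problems.append("last turn should have role 'model'")
    return problems
```

To check a whole file, iterate over its lines and collect any line numbers for which the returned list is non-empty.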

In [None]:
BUCKET_URI = "gs://mathinmind"

In [None]:
!gsutil cp /content/mathleval_vertexai_train.jsonl gs://mathinmind/mathleval_vertexai_train.jsonl

Copying file:///content/mathleval_vertexai_train.jsonl [Content-Type=application/octet-stream]...
/ [1 files][  6.4 MiB/  6.4 MiB]                                                
Operation completed over 1 objects/6.4 MiB.                                      


In [None]:
train_dataset = f"""{BUCKET_URI}/mathleval_vertexai_train.jsonl"""


training_dataset = {
    "gcs_uri": train_dataset,
}


In [None]:
base_model = "gemini-2.0-flash-001"

# Set up the fine-tuning configuration
tuning_config = types.CreateTuningJobConfig(
    adapter_size="ADAPTER_SIZE_EIGHT",  # Adapter size can be adjusted
    epoch_count=1,  # A single epoch keeps tuning quick and low cost
    # Display name for the tuned model (the "1.5" is only a label; the base model is Gemini 2.0 Flash)
    tuned_model_display_name="gemini-flash-1.5-math-qa",
)

# Create and start the fine-tuning job
sft_tuning_job = client.tunings.tune(
    base_model=base_model,
    training_dataset=training_dataset,
    config=tuning_config,
)

# Display the job to check its status
sft_tuning_job

TuningJob(name='projects/895619341014/locations/us-central1/tuningJobs/1095056561279074304', state=<JobState.JOB_STATE_PENDING: 'JOB_STATE_PENDING'>, create_time=datetime.datetime(2025, 4, 20, 18, 27, 40, 731561, tzinfo=TzInfo(UTC)), start_time=None, end_time=None, update_time=datetime.datetime(2025, 4, 20, 18, 27, 40, 731561, tzinfo=TzInfo(UTC)), error=None, description=None, base_model='gemini-2.0-flash-001', tuned_model=None, supervised_tuning_spec=SupervisedTuningSpec(hyper_parameters=SupervisedHyperParameters(adapter_size=<AdapterSize.ADAPTER_SIZE_EIGHT: 'ADAPTER_SIZE_EIGHT'>, epoch_count=1, learning_rate_multiplier=None), training_dataset_uri='gs://mathinmind/mathleval_vertexai_train.jsonl', validation_dataset_uri=None), tuning_data_stats=None, encryption_spec=None, partner_model_tuning_spec=None, distillation_spec=None, experiment=None, labels=None, pipeline_job=None, tuned_model_display_name='gemini-flash-1.5-math-qa')

In [None]:
sft_tuning_job.state

<JobState.JOB_STATE_PENDING: 'JOB_STATE_PENDING'>
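
The job starts in `JOB_STATE_PENDING`, so it needs to be polled until it leaves the active states. A minimal sketch of such a loop is below. It assumes the job can be refreshed with `client.tunings.get(name=...)` from the Google Gen AI SDK; the helper names, the polling interval, and the set of states treated as active are illustrative choices, not part of the original notebook.

```python
import time

# States treated as "still in progress" (an assumption for this sketch).
ACTIVE_STATES = {"JOB_STATE_QUEUED", "JOB_STATE_PENDING", "JOB_STATE_RUNNING"}


def state_name(state) -> str:
    """Return the plain state string for an enum member or a string."""
    return str(getattr(state, "value", state))


def wait_for_tuning_job(genai_client, job_name, poll_seconds=60, sleep=time.sleep):
    """Poll a tuning job until it is no longer queued, pending, or running."""
    job = genai_client.tunings.get(name=job_name)
    while state_name(job.state) in ACTIVE_STATES:
        sleep(poll_seconds)
        job = genai_client.tunings.get(name=job_name)
    return job
```

In this notebook it could be called as `sft_tuning_job = wait_for_tuning_job(client, sft_tuning_job.name)`. The `sleep` parameter exists only so the wait can be replaced in tests.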

In [None]:
!pip install --upgrade google-cloud-aiplatform google-genai --user



Next, set the model that you will use. In this example you will use `"gemini-2.0-flash-001"`, a multimodal model that is designed for high-volume, cost-effective applications, and which delivers speed and efficiency to build fast, lower-cost applications that don't compromise on quality.

For the latest Gemini models and versions, please have a look at our [documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models).


In [None]:
base_model = "gemini-2.0-flash-001"

In [None]:
# Set your Google Cloud project ID; if it is left empty or as the placeholder, fall back to the environment variable
PROJECT_ID = "nodal-album-456823-v9"  # @param {type:"string", isTemplate: true}
if not PROJECT_ID or PROJECT_ID == "[your-project-id]":
    PROJECT_ID = str(os.environ.get("GOOGLE_CLOUD_PROJECT"))

In [None]:
# Reference solutions and the formatted prompts from the test set
y_true = math_test["solution"].values
y_pred_question = math_test["input_question"].values

# Print the first two prompt/solution pairs.
for i in range(2):  # Loop through the first two indices
    print(f"Pair {i+1}:")
    print(f"  True Answer: {y_true[i]}")
    print(f"  Predicted Question: {y_pred_question[i]}")

Pair 1:
  True Answer: The denominator of the rational function factors into $x^2+x-6=(x-2)(x+3)$. Since the numerator is always nonzero, there is a vertical asymptote whenever the denominator is $0$, which occurs for $x = 2$ and $x = -3$.  Therefore, the graph has $\boxed{2}$ vertical asymptotes.
  Predicted Question: 

 **Below the question with context that you need to answer**
 Context: How many vertical asymptotes does the graph of $y=\frac{2}{x^2+x-6}$ have?
 Question: What is the solution?
Pair 2:
  True Answer: One hundred twenty percent of 30 is $120\cdot30\cdot\frac{1}{100}=36$, and $130\%$ of 20 is $ 130\cdot 20\cdot\frac{1}{100}=26$.  The difference between 36 and 26 is $\boxed{10}$.
  Predicted Question: 

 **Below the question with context that you need to answer**
 Context: What is the positive difference between $120\%$ of 30 and $130\%$ of 20?
 Question: What is the solution?


Next, let's take a question and get a prediction from Gemini that we can compare to the actual answer.

In [None]:
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
# Note: `%env` stores the value literally, so do not wrap it in quotes.
%env GOOGLE_CLOUD_PROJECT=nodal-album-456823-v9
%env GOOGLE_CLOUD_LOCATION=us-central1
%env GOOGLE_GENAI_USE_VERTEXAI=True

env: GOOGLE_CLOUD_PROJECT=nodal-album-456823-v9
env: GOOGLE_CLOUD_LOCATION=us-central1
env: GOOGLE_GENAI_USE_VERTEXAI=True


In [None]:
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/content/nodal-album-456823-v9-e43da639607e.json"

In [None]:
import os
print(os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"))

/content/nodal-album-456823-v9-e43da639607e.json


In [None]:
!gcloud auth list

     Credentialed Accounts
ACTIVE  ACCOUNT
*       marouazoubir00@gmail.com

To set the active account, run:
    $ gcloud config set account `ACCOUNT`



In [None]:
!gcloud config set account marouazoubir00@gmail.com

Updated property [core/account].


In [None]:
from google.colab import auth
auth.authenticate_user()

In [None]:
import os
os.environ['GOOGLE_CLOUD_PROJECT'] = 'nodal-album-456823-v9'

In [None]:
import os
import google.generativeai as genai
# Read the key from the environment (assumes GOOGLE_API_KEY is set) instead of hardcoding it.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

In [None]:
!gcloud auth application-default login


You are running on a Google Compute Engine virtual machine.
The service credentials associated with this virtual machine
will automatically be used by Application Default
Credentials, so it is not necessary to use this command.

If you decide to proceed anyway, your user credentials may be visible
to others with access to this virtual machine. Are you sure you want
to authenticate with your personal account?

Do you want to continue (Y/n)?  

Command killed by keyboard interrupt

^C


In [None]:
import os
import google.generativeai as genai

# Set the environment variable for the project ID
os.environ["GOOGLE_CLOUD_PROJECT"] = "nodal-album-456823-v9"

# Configure the API key from the environment (assumes GOOGLE_API_KEY is set) rather than hardcoding it
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

In [None]:
from google import genai

client = genai.Client(project="nodal-album-456823-v9", location="us-central1", vertexai=True)
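
The next cell calls `get_predictions(test_question, base_model)`, but no definition of that helper appears in the cells shown here. A minimal sketch of what it might look like is below. It assumes the notebook-level `client` created above and uses `client.models.generate_content` from the Google Gen AI SDK; the `genai_client` parameter and the temperature value are assumptions made for illustration.

```python
def get_predictions(question: str, model_version: str, genai_client=None) -> str:
    """Send one prompt to the given model and return the response text."""
    # Fall back to the notebook-level `client` when none is passed in.
    active_client = genai_client if genai_client is not None else client
    response = active_client.models.generate_content(
        model=model_version,
        contents=question,
        config={"temperature": 0.3},  # assumed value; tune for your task
    )
    return response.text
```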

In [None]:
# Use the first row of the test set. Assumes a `get_predictions(question, model)` helper is already defined.
test_question = math_test["input_question"].iloc[0]
test_answer = math_test["solution"].iloc[0]

response = get_predictions(test_question, base_model)

print(f"Gemini response: {response}")
print(f"Actual answer: {test_answer}")

Gemini response: $y=\frac{2}{x^2+x-6}$ have two vertical asymptotes
Actual answer: The denominator of the rational function factors into $x^2+x-6=(x-2)(x+3)$. Since the numerator is always nonzero, there is a vertical asymptote whenever the denominator is $0$, which occurs for $x = 2$ and $x = -3$.  Therefore, the graph has $\boxed{2}$ vertical asymptotes.
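
The reference solutions mark their final answer with `\boxed{...}`, which suggests one possible evaluation metric: compare the boxed answer in the reference with a boxed answer in the model response. The sketch below is only an illustration of that idea, and both function names are hypothetical. It would count the sample response above as a miss, because that response contains no `\boxed{}`; the model would need to be instructed to box its final answer for this metric to be meaningful.

```python
def extract_boxed(text: str):
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 1  # we are inside the opening brace of \boxed{
    for i in range(begin, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i].strip()
    return None  # unbalanced braces


def answer_matches(response: str, solution: str) -> bool:
    """Exact match between the boxed answers of response and reference."""
    expected = extract_boxed(solution)
    predicted = extract_boxed(response)
    return expected is not None and predicted == expected
```

Exact string matching is strict: equivalent forms such as `0.5` and `\frac{1}{2}` would not match, so a real evaluation would likely need answer normalization on top of this.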


In [None]:

!gcloud config set project "nodal-album-456823-v9"

Updated property [core/project].


In [None]:
!gcloud services enable aiplatform.googleapis.com
!gcloud services enable generativelanguage.googleapis.com

ERROR: (gcloud.services.enable) The required property [project] is not currently set.
It can be set on a per-command basis by re-running your command with the [--project] flag.

You may set it for your current workspace by running:

  $ gcloud config set project VALUE

or it can be set temporarily by the environment variable [CLOUDSDK_CORE_PROJECT]
ERROR: (gcloud.services.enable) The required property [project] is not currently set.
It can be set on a per-command basis by re-running your command with the [--project] flag.

You may set it for your current workspace by running:

  $ gcloud config set project VALUE

or it can be set temporarily by the environment variable [CLOUDSDK_CORE_PROJECT]


In [None]:
!gcloud services enable aiplatform.googleapis.com storage.googleapis.com

Operation "operations/acat.p2-895619341014-9f98764e-4912-4488-ae54-b72e80892dfd" finished successfully.


In [None]:
!gcloud services enable aiplatform.googleapis.com storage.googleapis.com --project nodal-album-456823-v9

Operation "operations/acat.p2-895619341014-a09a917a-c738-4439-bded-2e4a69c0fc48" finished successfully.


In [None]:
!gcloud iam service-accounts list --project nodal-album-456823-v9

DISPLAY NAME                            EMAIL                                                                    DISABLED
Compute Engine default service account  895619341014-compute@developer.gserviceaccount.com                       False
nodal-album-456823-v9                   nodal-album-456823-v9@nodal-album-456823-v9.iam.gserviceaccount.com      False
marouazoubir00@gmail.com                nodal-album-456823-v9-864@nodal-album-456823-v9.iam.gserviceaccount.com  False


In [None]:
import google.auth

credentials, project_id = google.auth.default()

# Check if the credentials object has the service_account_email attribute
if hasattr(credentials, 'service_account_email'):
    print(f"Account in use: {credentials.service_account_email}")
else:
    print("Service account email not available. You may be using user authentication.")
    # If needed, you can try to obtain the user's email address using
    # `google.auth.transport.requests.Request()`.

Account in use: default


In [None]:
from google.colab import auth
auth.authenticate_user()  # This opens a popup for you to sign in


In [None]:
!gcloud auth application-default set-quota-project "nodal-album-456823-v9"

ERROR: (gcloud.auth.application-default.set-quota-project) Application default credentials have not been set up. Run $ gcloud auth application-default login to set it up first.


In [None]:
math_test.head()

Unnamed: 0,problem,level,solution,type,input,systemInstruct,input_question
0,How many vertical asymptotes does the graph of...,Level 3,The denominator of the rational function facto...,Algebra,### Question:\nHow many vertical asymptotes do...,Answer the question with a concise extract fro...,\n\n **Below the question with context that yo...
1,What is the positive difference between $120\%...,Level 1,One hundred twenty percent of 30 is $120\cdot3...,Algebra,### Question:\nWhat is the positive difference...,Answer the question with a concise extract fro...,\n\n **Below the question with context that yo...
2,Find $x$ such that $\lceil x \rceil + x = \dfr...,Level 4,"First, we note that $x$ must be positive, sinc...",Algebra,### Question:\nFind $x$ such that $\lceil x \r...,Answer the question with a concise extract fro...,\n\n **Below the question with context that yo...
3,Evaluate $i^5+i^{-25}+i^{45}$.,Level 5,We have $i^5 = i^4\cdot i = 1\cdot (i) = i$. ...,Algebra,### Question:\nEvaluate $i^5+i^{-25}+i^{45}$.\...,Answer the question with a concise extract fro...,\n\n **Below the question with context that yo...
4,"If $2^8=4^x$, what is the value of $x$?",Level 1,Rewrite $4$ as $2^2$ to find $4^x=2^{2x}$. Si...,Algebra,"### Question:\nIf $2^8=4^x$, what is the value...",Answer the question with a concise extract fro...,\n\n **Below the question with context that yo...


Sometimes an answer from Gemini is longer than you need. However, answers in the SQuAD dataset are typically concise and clear.

Fine-tuning is a great way to control the type of output your use case requires. In this instance, you would want the model to provide short, clear answers.

In [None]:
math_test.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4997 entries, 0 to 4999
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   problem         4997 non-null   object
 1   level           4997 non-null   object
 2   solution        4997 non-null   object
 3   type            4997 non-null   object
 4   input           4997 non-null   object
 5   systemInstruct  4997 non-null   object
 6   input_question  4997 non-null   object
dtypes: object(7)
memory usage: 312.3+ KB


In [None]:
!pip install swifter

Collecting swifter
  Downloading swifter-1.4.0.tar.gz (1.2 MB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: swifter
  Building wheel for swifter (setup.py) ... done
  Created wheel for swifter: filename=swifter-1.4.0-py3-none-any.whl size=16505 sha256=9627ed0ec6c2a370e872f9c9eb74949d7e454c12b973c8c30fffaac6254e6512
  Stored in directory: /root/.cache/pip/wheels/ef/7f/bd/9bed48f078f3ee1fa75e0b29b6e0335ce1cb03a38d3443b3a3
Successfully built swifter
Installing collected packages: swifter
Successfully installed swifter-1.4.0


In [None]:
# swifter is installed in the previous cell; importing it registers the
# `.swifter` accessor on DataFrames.
import swifter

# Apply get_predictions() to the 'problem' column using the base model.
math_test["predicted_answer"] = math_test.swifter.apply(
    lambda row: get_predictions(row["problem"], base_model), axis=1
)

KeyboardInterrupt: 

In [None]:
math_test.head()

Unnamed: 0,problem,level,solution,type,input,systemInstruct,input_question
0,How many vertical asymptotes does the graph of...,Level 3,The denominator of the rational function facto...,Algebra,### Question:\nHow many vertical asymptotes do...,Answer the question with a concise extract fro...,\n\n **Below the question with context that yo...
1,What is the positive difference between $120\%...,Level 1,One hundred twenty percent of 30 is $120\cdot3...,Algebra,### Question:\nWhat is the positive difference...,Answer the question with a concise extract fro...,\n\n **Below the question with context that yo...
2,Find $x$ such that $\lceil x \rceil + x = \dfr...,Level 4,"First, we note that $x$ must be positive, sinc...",Algebra,### Question:\nFind $x$ such that $\lceil x \r...,Answer the question with a concise extract fro...,\n\n **Below the question with context that yo...
3,Evaluate $i^5+i^{-25}+i^{45}$.,Level 5,We have $i^5 = i^4\cdot i = 1\cdot (i) = i$. ...,Algebra,### Question:\nEvaluate $i^5+i^{-25}+i^{45}$.\...,Answer the question with a concise extract fro...,\n\n **Below the question with context that yo...
4,"If $2^8=4^x$, what is the value of $x$?",Level 1,Rewrite $4$ as $2^2$ to find $4^x=2^{2x}$. Si...,Algebra,"### Question:\nIf $2^8=4^x$, what is the value...",Answer the question with a concise extract fro...,\n\n **Below the question with context that yo...


You also need to make sure that the predicted answer is in the same format.

In [None]:
test_df["predicted_answer"] = test_df["predicted_answer"].apply(normalize_answer)
test_df.head(4)

Next, let's establish a baseline using evaluation metrics.

Evaluating the performance of a Question Answering (QA) system requires specific metrics. Two commonly used metrics are Exact Match (EM) and F1 score.

EM is a strict measure that only considers an answer correct if it perfectly matches the ground truth, even down to the punctuation. It's a binary metric - either 1 for a perfect match or 0 otherwise. This makes it sensitive to minor variations in phrasing.

F1 score is more flexible. It considers the overlap between the predicted answer and the true answer in terms of individual words or tokens. It calculates the harmonic mean of precision (proportion of correctly predicted words out of all predicted words) and recall (proportion of correctly predicted words out of all true answer words). This allows for partial credit and is less sensitive to minor wording differences.

In practice, EM is useful when exact wording is crucial, while F1 is more suitable when evaluating the overall understanding and semantic accuracy of the QA system. Often, both metrics are used together to provide a comprehensive evaluation.
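
To make the difference between the two metrics concrete, here is a minimal sketch on a single made-up prediction. The `simple_tokens` helper is a hypothetical stand-in for the notebook's `normalize_answer()` (it only lowercases and splits on whitespace), and the example strings are invented for illustration.

```python
from collections import Counter


def simple_tokens(text: str) -> list[str]:
    # Hypothetical stand-in for normalize_answer(): lowercase, split on whitespace.
    return text.lower().split()


def exact_match(prediction: str, ground_truth: str) -> bool:
    # 1/0 style metric: the token sequences must be identical.
    return simple_tokens(prediction) == simple_tokens(ground_truth)


def token_f1(prediction: str, ground_truth: str) -> float:
    pred = simple_tokens(prediction)
    truth = simple_tokens(ground_truth)
    # Multiset intersection: tokens shared by both answers.
    overlap = sum((Counter(pred) & Counter(truth)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)   # shared tokens / predicted tokens
    recall = overlap / len(truth)     # shared tokens / ground-truth tokens
    return 2 * precision * recall / (precision + recall)


prediction = "the answer is 10"   # made-up model output
ground_truth = "10"               # made-up reference answer

print(exact_match(prediction, ground_truth))         # False
print(round(token_f1(prediction, ground_truth), 2))  # 0.4
```

The verbose prediction contains the right answer, so recall is 1.0, but only one of its four tokens matches, so precision is 0.25 and F1 is 0.4, while EM is 0. This is the kind of gap that tuning toward concise answers is meant to close.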

In [None]:
from collections import Counter


def f1_score_squad(prediction, ground_truth):
    """Token-level F1 between a prediction and the ground truth."""
    prediction_tokens = normalize_answer(prediction).split()
    ground_truth_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection: each shared token counts as often as it appears in both.
    common = Counter(prediction_tokens) & Counter(ground_truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0
    precision = 1.0 * num_same / len(prediction_tokens)
    recall = 1.0 * num_same / len(ground_truth_tokens)
    f1 = (2 * precision * recall) / (precision + recall)
    return f1


def exact_match_score(prediction, ground_truth):
    """True when both answers are identical after normalization."""
    return normalize_answer(prediction) == normalize_answer(ground_truth)


def calculate_em_and_f1(y_true, y_pred):
    """Calculates EM and F1 scores for DataFrame columns."""

    # Ensure inputs are Series
    if not isinstance(y_true, pd.Series):
        y_true = pd.Series(y_true)
    if not isinstance(y_pred, pd.Series):
        y_pred = pd.Series(y_pred)

    em = np.mean(y_true.combine(y_pred, exact_match_score))
    f1 = np.mean(y_true.combine(y_pred, f1_score_squad))

    # # Print non-matching pairs (using index for clarity)
    # for i, (t, p) in enumerate(zip(y_true, y_pred)):
    #     if not exact_match_score(p, t):
    #         print(f"No EM Match at index {i}:\nTrue: {t}\nPred: {p}\n")

    return em, f1

In [None]:
em, f1 = calculate_em_and_f1(test_df["answers"], test_df["predicted_answer"])
print(f"EM score: {em}")
print(f"F1 score: {f1}")

### Prepare the data for fine-tuning

To optimize the supervised fine-tuning process for a foundation model, ensure your dataset includes examples that reflect the desired task. Each record in the dataset pairs an input text (or prompt) with its corresponding expected output. This supervised tuning approach uses the dataset to effectively teach the model the specific behavior or task you need it to perform, by providing numerous illustrative examples.

The size of your dataset will vary depending on the complexity of the task, but as a general rule, more high-quality examples lead to better model performance. For fine-tuning Gemini on Vertex AI, the minimum number of examples is 100.

#### Dataset Format
Your training data should be structured in a JSONL file and stored at a Google Cloud Storage (GCS) URI.  Each line in the JSONL file must adhere to the following schema:

A `contents` array containing objects that define:
- A `role` ("user" for user input or "model" for model output)
- `parts` containing the input data.

```json
{
   "contents":[
      {
         "role":"user",  # This indicates input content
         "parts":[
            {
               "text":"How are you?"
            }
         ]
      },
      {
         "role":"model", # This indicates target content
         "parts":[ # text only
            {
               "text":"I am good, thank you!"
            }
         ]
      }
      #  ... repeat "user", "model" for multiple turns.
   ]
}
```

Refer to the public [documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini-supervised-tuning-prepare#about-datasets) for more details.
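
Before uploading a file, it can help to check each line against the schema described above. The following is a minimal sketch, not an official validator: `validate_record` is a hypothetical helper, and the rules it enforces (turns alternate `user`, `model`, and `model` parts are text-only) are assumptions read off the example above rather than the full Vertex AI specification.

```python
import json


def validate_record(line: str) -> list[str]:
    """Return a list of problems found in one JSONL line (empty means OK)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    contents = record.get("contents") if isinstance(record, dict) else None
    if not isinstance(contents, list) or not contents:
        return ["missing or empty 'contents' array"]

    problems = []
    for i, turn in enumerate(contents):
        if not isinstance(turn, dict):
            problems.append(f"turn {i}: expected an object")
            continue
        # Assumption: turns alternate user, model, user, model, ...
        expected_role = "user" if i % 2 == 0 else "model"
        if turn.get("role") != expected_role:
            problems.append(f"turn {i}: expected role '{expected_role}'")
        parts = turn.get("parts")
        if not isinstance(parts, list) or not parts:
            problems.append(f"turn {i}: 'parts' must be a non-empty list")
        elif expected_role == "model" and not all(
            isinstance(p, dict) and isinstance(p.get("text"), str) for p in parts
        ):
            problems.append(f"turn {i}: model parts must be text-only")
    if len(contents) % 2 == 1:
        problems.append("conversation should end with a 'model' turn")
    return problems


good = json.dumps(
    {
        "contents": [
            {"role": "user", "parts": [{"text": "How are you?"}]},
            {"role": "model", "parts": [{"text": "I am good, thank you!"}]},
        ]
    }
)
print(validate_record(good))            # []
print(validate_record('{"contents": []}'))
```

Running this over every line of a JSONL file and counting the lines that return an empty list also gives a quick check against the minimum dataset size.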

In [None]:
# Load the training and validation splits.
train_df = pd.read_csv("squad_train.csv")
validation_df = pd.read_csv("squad_validation.csv")

In [None]:
# Combine the context and question into one prompt column.
train_df["input_question"] = (
    "\n\n **Below the question with context that you need to answer**"
    + "\n Context: "
    + train_df["context"]
    + "\n Question: "
    + train_df["question"]
)
validation_df["input_question"] = (
    "\n\n **Below the question with context that you need to answer**"
    + "\n Context: "
    + validation_df["context"]
    + "\n Question: "
    + validation_df["question"]
)

In [None]:
import json


def df_to_jsonl(df, output_file):
    """Converts a Pandas DataFrame to JSONL format and saves it to a file.

    Each row must provide `input_question` and `answers` columns. The system
    instruction is read from the module-level `systemInstruct` variable, which
    must be defined before this function is called.

    Args:
      df: The DataFrame to convert.
      output_file: The name of the output file.
    """

    with open(output_file, "w") as f:
        for row in df.itertuples(index=False):
            jsonl_obj = {
                "systemInstruction": {"parts": [{"text": f"{systemInstruct}"}]},
                "contents": [
                    {
                        "role": "user",
                        "parts": [{"text": f"{row.input_question}"}],
                    },
                    {"role": "model", "parts": [{"text": row.answers}]},
                ],
            }
            # One JSON object per line.
            f.write(json.dumps(jsonl_obj) + "\n")


# Process the DataFrames
df_to_jsonl(train_df, "squad_train.jsonl")
df_to_jsonl(validation_df, "squad_validation.jsonl")

print("JSONL data written to squad_train.jsonl")
print("JSONL data written to squad_validation.jsonl")

Next, copy the files into your Cloud Storage bucket.

In [None]:
!gsutil cp ./squad_train.jsonl {BUCKET_URI}
!gsutil cp ./squad_validation.jsonl {BUCKET_URI}

### Start fine-tuning job
Next you can start the fine-tuning job.

- `base_model`: The base Gemini model version you want to fine-tune.
- `training_dataset`: The Cloud Storage URI of your training data in JSONL format.

  *Optional parameters (set in `config`)*
- `validation_dataset`: If provided, this data is used to evaluate the model during tuning.
- `tuned_model_display_name`: Display name for the tuned model.
- `epoch_count`: The number of training epochs to run.
- `learning_rate_multiplier`: A value that scales the learning rate during training.
- `adapter_size`: Gemini 2.0 supports Adapter length [1, 4], default value is 4.

**Important**: The default hyperparameter settings are based on rigorous testing and are recommended for initial use. You can customize these parameters to address specific performance requirements.

In [None]:
train_dataset = f"{BUCKET_URI}/squad_train.jsonl"
# Point validation at the validation split, not the training file.
validation_dataset_uri = f"{BUCKET_URI}/squad_validation.jsonl"

training_dataset = {
    "gcs_uri": train_dataset,
}

validation_dataset = types.TuningValidationDataset(gcs_uri=validation_dataset_uri)

In [None]:
sft_tuning_job = client.tunings.tune(
    base_model=base_model,
    training_dataset=training_dataset,
    config=types.CreateTuningJobConfig(
        adapter_size="ADAPTER_SIZE_EIGHT",
        epoch_count=1,  # set to one to keep time and cost low
        tuned_model_display_name="gemini-flash-1.5-qa",
        # Needed for the /eval_* metrics read later in this notebook.
        validation_dataset=validation_dataset,
    ),
)
sft_tuning_job

**Important:** Tuning time depends on several factors, such as training data size, number of epochs, learning rate multiplier, etc.

<div class="alert alert-block alert-warning">
<b>⚠️ It will take ~30 mins for the model tuning job to complete on the provided dataset and set configurations/hyperparameters. ⚠️</b>
</div>

In [None]:
sft_tuning_job.state

In [None]:
tuning_job = client.tunings.get(name=sft_tuning_job.name)
tuning_job
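
Because the job runs for a while, you may want to poll its state instead of re-running the cell by hand. The sketch below is a generic polling loop, not part of the SDK: `wait_until_done` and `fetch_state` are hypothetical names, and the state strings in `RUNNING_STATES` are assumptions that you should check against the SDK version you use.

```python
import time
from typing import Callable

# Assumed names of the non-terminal job states.
RUNNING_STATES = {"JOB_STATE_PENDING", "JOB_STATE_RUNNING"}


def wait_until_done(
    fetch_state: Callable[[], str],
    poll_seconds: float = 60.0,
    max_polls: int = 120,
) -> str:
    """Call fetch_state() until it returns a state outside RUNNING_STATES."""
    for _ in range(max_polls):
        state = fetch_state()
        if state not in RUNNING_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"job still running after {max_polls} polls")


# Demo with a fake sequence of states instead of a real tuning job.
fake_states = iter(["JOB_STATE_PENDING", "JOB_STATE_RUNNING", "JOB_STATE_SUCCEEDED"])
print(wait_until_done(lambda: next(fake_states), poll_seconds=0))
```

In this notebook, `fetch_state` could wrap `client.tunings.get(name=sft_tuning_job.name)` and return its `state` as a string; how the state is represented may depend on the SDK version.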

#### Model tuning metrics

- `/train_total_loss`: Loss for the tuning dataset at a training step.
- `/train_fraction_of_correct_next_step_preds`: The token accuracy at a training step. A single prediction consists of a sequence of tokens. This metric measures the accuracy of the predicted tokens when compared to the ground truth in the tuning dataset.
- `/train_num_predictions`: Number of predicted tokens at a training step.

#### Model evaluation metrics

- `/eval_total_loss`: Loss for the evaluation dataset at an evaluation step.
- `/eval_fraction_of_correct_next_step_preds`: The token accuracy at an evaluation step. A single prediction consists of a sequence of tokens. This metric measures the accuracy of the predicted tokens when compared to the ground truth in the evaluation dataset.
- `/eval_num_predictions`: Number of predicted tokens at an evaluation step.

The metrics visualizations are available after the model tuning job completes. If you don't specify a validation dataset when you create the tuning job, only the visualizations for the tuning metrics are available.
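
To illustrate what a "fraction of correct next step predictions" metric measures, here is a small sketch that compares predicted tokens with ground-truth tokens position by position. It only illustrates the idea; it is not how Vertex AI computes the metric, and the `fraction_correct` helper and the token IDs are made up.

```python
def fraction_correct(predicted: list[int], target: list[int]) -> float:
    """Share of positions where the predicted token equals the target token."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    if not target:
        return 0.0
    correct = sum(p == t for p, t in zip(predicted, target))
    return correct / len(target)


# Made-up token IDs: three of the four predictions match the target.
predicted = [12, 7, 99, 3]
target = [12, 7, 42, 3]
print(fraction_correct(predicted, target))  # 0.75
```

In this picture, the number of compared positions (4 here) plays the role of the number of predictions at a step.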

In [None]:
experiment_name = tuning_job.experiment
experiment_name

In [None]:
# Locate Vertex AI Experiment and Vertex AI Experiment Run
experiment = aiplatform.Experiment(experiment_name=experiment_name)
filter_str = metadata_utils._make_filter_string(
    schema_title="system.ExperimentRun",
    parent_contexts=[experiment.resource_name],
)
experiment_run = context.Context.list(filter_str)[0]

In [None]:
# Read data from Tensorboard
tensorboard_run_name = f"{experiment.get_backing_tensorboard_resource().resource_name}/experiments/{experiment.name}/runs/{experiment_run.name.replace(experiment.name, '')[1:]}"
tensorboard_run = aiplatform.TensorboardRun(tensorboard_run_name)
metrics = tensorboard_run.read_time_series_data()

In [None]:
def get_metrics(metric: str = "/train_total_loss"):
    """
    Get metrics from Tensorboard.

    Args:
      metric: metric name, e.g. /train_total_loss or /eval_total_loss.
    Returns:
      steps: list of steps.
      steps_loss: list of loss values.
    """
    loss_values = metrics[metric].values
    steps_loss = []
    steps = []
    for loss in loss_values:
        steps_loss.append(loss.scalar.value)
        steps.append(loss.step)
    return steps, steps_loss

In [None]:
# Get Train and Eval Loss
train_loss = get_metrics(metric="/train_total_loss")
eval_loss = get_metrics(metric="/eval_total_loss")

In [None]:
# Plot the train and eval loss metrics using Plotly python library
fig = make_subplots(
    rows=1, cols=2, shared_xaxes=True, subplot_titles=("Train Loss", "Eval Loss")
)

# Add traces
fig.add_trace(
    go.Scatter(x=train_loss[0], y=train_loss[1], name="Train Loss", mode="lines"),
    row=1,
    col=1,
)
fig.add_trace(
    go.Scatter(x=eval_loss[0], y=eval_loss[1], name="Eval Loss", mode="lines"),
    row=1,
    col=2,
)

# Add figure title
fig.update_layout(title="Train and Eval Loss", xaxis_title="Steps", yaxis_title="Loss")

# Set x-axis title
fig.update_xaxes(title_text="Steps")

# Set y-axes titles
fig.update_yaxes(title_text="Loss")

# Show plot
fig.show()

### Use the fine-tuned model and evaluate it

In [None]:
prompt = """
Answer the question based on the context

Context: In the 1840s and 50s, there were attempts to overcome this problem by means of various patent valve gears with a separate, variable cutoff expansion valve riding on the back of the main slide valve; the latter usually had fixed or limited cutoff.
The combined setup gave a fair approximation of the ideal events, at the expense of increased friction and wear, and the mechanism tended to be complicated.
The usual compromise solution has been to provide lap by lengthening rubbing surfaces of the valve in such a way as to overlap the port on the admission side, with the effect that the exhaust side remains open for a longer period after cut-off on the admission side has occurred.
This expedient has since been generally considered satisfactory for most purposes and makes possible the use of the simpler Stephenson, Joy and Walschaerts motions.
Corliss, and later, poppet valve gears had separate admission and exhaust valves driven by trip mechanisms or cams profiled so as to give ideal events; most of these gears never succeeded outside of the stationary marketplace due to various other issues including leakage and more delicate mechanisms.

Question: How is lap provided by overlapping the admission side port?
"""

In [None]:
tuned_model = tuning_job.tuned_model.endpoint
tuned_model

In [None]:
get_predictions(prompt, tuned_model)

In [None]:
# Apply get_predictions() with the tuned model to the 'input_question' column
test_df["predicted_answer"] = test_df["input_question"].apply(
    lambda question: get_predictions(question, tuned_model)
)
test_df.head(2)

In [None]:
test_df["predicted_answer"] = test_df["predicted_answer"].apply(normalize_answer)

After fine-tuning, the model generally performs better on this use case; compare the EM and F1 scores below with the baseline scores. Results will differ depending on factors such as the use case and the quality of the training data.

In [None]:
em, f1 = calculate_em_and_f1(test_df["answers"], test_df["predicted_answer"])
print(f"EM score: {em}")
print(f"F1 score: {f1}")