# Operationalize product data and content enrichment using GenAI and AutoSxS

### Overview

This series of notebooks showcases an end-to-end workflow for improving product catalog data using Generative AI and MLOps. The core focus is to operationalize the process for enriching product descriptions, a key element for effective product discovery and recommendation systems.

We begin in this notebook by enhancing product descriptions using the open-source GenAI model, Gemma. Next, we analyze and compare these results with those from Google's base models such as text-bison.

The subsequent notebook delves into operationalizing this process, building a robust and scalable workflow orchestrated by Vertex AI Pipelines.

### Objective

The steps performed include:

* Define parameters, variables, and helper functions
* Import product feed data and define prompts
* Deploy the open-source Gemma model and generate enriched product descriptions
* Compare the quality of descriptions generated by Gemma and text-bison-32k using the AutoSxS evaluation framework
* (optional) Provide human preference labels as ground truth data to review the evaluation with respect to human-aligned judgments

## Install additional packages
Install the following packages required to execute this notebook.

In [None]:
# # Install the packages
# ! pip3 install --upgrade --quiet google-cloud-aiplatform \
#                                  google-cloud-storage \
#                                  kfp \
#                                  google-cloud-pipeline-components

## Import Libraries & Define Parameters

In [1]:
import pandas as pd
import pandas_gbq
import os
import sys
from datetime import datetime
from typing import Tuple
import time
import random
import string

from google.cloud import aiplatform, language
from google.cloud import bigquery
from google.cloud import storage
from google_cloud_pipeline_components.preview import model_evaluation
from kfp import compiler



### Set Parameters and Variables

* `VERTEX_MODEL_GARDEN_GEMMA`: The signed URL to the Gemma model artifacts. Obtaining it requires accepting the Gemma agreement in Vertex AI Model Garden.
* `PROJECT_ID`: The ID of your Google Cloud project where the pipeline and resources reside
* `REGION`: The Google Cloud region where pipeline components are executed and resources are located
* `BUCKET_URI`: The URI of the Google Cloud Storage bucket where data and pipeline artifacts are stored (e.g., "gs://passage-gen-test").
* `BUCKET_NAME`: The bucket portion of BUCKET_URI, including the `gs://` prefix (the helper functions strip the prefix before looking up the bucket)
* `MODEL_RESOURCE`: The resource name of the baseline language model used for comparison in AutoSxS (e.g., "publishers/google/models/text-bison-32k@002").
* `input_feed_data`: The GCS path to the input CSV file containing product data for generating descriptions
* `evaluation_dataset_name`: The base name for the generated evaluation dataset files (without extension), used for both CSV and JSONL formats.
* `SERVICE_ACCOUNT`: The service account used for running the pipeline components and accessing Google Cloud resources.
* `display_name`: The display name for the pipeline run, built from a fixed prefix and a random string. It is set in the AutoSxS cells later in this notebook.
* `DATASET_ID`: The ID of the BigQuery dataset where evaluation results or other data might be stored.
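
Because `BUCKET_NAME` keeps the `gs://` prefix, the helper functions later in this notebook remove it by slicing off the first five characters. As a minimal sketch of a more explicit alternative, the hypothetical helper `split_gcs_uri` below (not part of the original notebook) separates a GCS URI into its bucket and object path. The URI in the example is made up for illustration.

```python
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/blob' into ('bucket', 'path/to/blob')."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not a GCS URI: {uri!r}")
    # partition() returns an empty path when the URI names only a bucket.
    bucket, _, blob_path = uri[len(prefix):].partition("/")
    return bucket, blob_path


# Hypothetical URI, for illustration only.
bucket, blob_path = split_gcs_uri("gs://my-project-passage-gen-test/data/evaluation_dataset.csv")
print(bucket)     # my-project-passage-gen-test
print(blob_path)  # data/evaluation_dataset.csv
```

The bucket name returned here is the form that `storage.Client().bucket(...)` expects.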


In [2]:
PROJECT_ID = "sandbox-401718"  # @param {type:"string"}
REGION = "us-central1"  # @param {type:"string"}

BUCKET_URI = f"gs://{PROJECT_ID}-passage-gen-test"  # @param {type:"string"}
BUCKET_NAME = "/".join(BUCKET_URI.split("/")[:3])
STAGING_BUCKET = os.path.join(BUCKET_URI, "transient")
MODEL_BUCKET = os.path.join(BUCKET_URI, "gemma")
MODEL_RESOURCE = "publishers/google/models/text-bison-32k@002"
VERTEX_MODEL_GARDEN_GEMMA = "https://storage.googleapis.com/vertex-ai/generative-ai/model-garden/gemma.tar.gz?GoogleAccessId=service-689411112969@gcp-sa-aiplatform.iam.gserviceaccount.com&Expires=1711731130&Signature=fQZ%252FULLGg0LlGl4ot%252Fw9xW1hhask%252F3y4Kb1eut3NNtMzStSpALR5MvGkMyh71uiMJci1c0j5DwNkcBv1q52YmD8WV2oq5yY8X5IQqqoHD9UeaC5jjor3fdmDEpvaFvL8Plk4DK4uW1X8kegkFLewQSdfUdDN19naRikyX6j34FGfr5MvfzyLXGMiQ483DgsLaIXiamDMjMOpScFRQKWGxUNgwls3lq%252Fv7vnVjbrdTll2Jzayv51wulMDNcd7EwYbIa9Dc5GFdmE8C07cTa5y84RTA1G%252FAC9TX5mkfxB0HMqUgT8Xkh7%252FYDpDN5kB5IFiO0NFK0z0AWsCmkvUG1VHhQ%253D%253D"  # @param {type:"string", isTemplate:true} #HTTP address

SERVICE_ACCOUNT = (
    "757654702990-compute@developer.gserviceaccount.com"  # @param {type:"string"}
)
input_feed_data = "gs://passage-gen-test/FeedGen-Input-Feed.csv"  # @param {type:"string"}
evaluation_dataset_name = "evaluation_dataset"  # @param {type:"string"}

DATASET_ID = "passage_gen_autosxs"
BQ_TABLE_EVAL = f"{DATASET_ID}.eval_table"

**Only if your bucket doesn't already exist:** Run the following cell to create your Cloud Storage bucket

In [5]:
# ! gsutil mb -l {REGION} -p {PROJECT_ID} {BUCKET_URI}

Creating gs://sandbox-401718-passage-gen-test/...


## Import data set

The 'Input Feed' data is a 1,000-row random sample from the public BigQuery dataset 'theLook eCommerce'. The data, including text and enriched attributes, was generated by [FeedGen](https://github.com/google-marketing-solutions/feedgen) and extracted as a CSV from the FeedGen [input feed - template sheet](https://docs.google.com/spreadsheets/d/19eKTJrbZaUfipAvL5ZQmq_hoxEbLQIlDqURKFJA2OBU/edit#gid=1661242997).

For the purposes of this work, only the first 50 rows are used.

In [3]:
# Import data
df = pd.read_csv(input_feed_data) \
      .drop(['Link', 'Image Link'], axis=1) \
      .head(50)

df.head() 

Unnamed: 0,Item ID,Title,Description,Brand,Gender,Category,Size,Color,Material
0,2480,ASICS Women's Performance Running Capri Tight,"ASICS Women's Performance Running Capri Tight,...",ASICS,Women's,Active,S,White,"Cotton, Polyester"
1,21084,Agave Men's Waterman Relaxed Grey Jean,"Agave Men's Waterman Relaxed Grey Jean, Relaxe...",Agave,Men's,Jeans,33x30,,Denim
2,27569,2XU Men's Swim Compression Long Sleeve Top,"2XU Men's Swim Compression Long Sleeve Top, Li...",2XU,Men's,Swim,M,,PWX Fabric
3,8089,(6249-2) Smart Satin Evening Suit with Flute S...,Smart Satin Evening Suit with Flute Skirt Beig...,Ice,Women's,Suits,L,Beige,Satin
4,1,Seven7 Women's Long Sleeve Stripe Belted Top,"Seven7 Women's Long Sleeve Stripe Belted Top, ...",Seven7,Women's,Tops & Tees,M,Black,


## Set-up Prompts

Product descriptions are generated using a prompt inspired by [FeedGen](https://github.com/google-marketing-solutions/feedgen). The same prompt also serves as the reference point when the quality of the descriptions is evaluated during model comparison. The prompt is dynamic: it is filled in for each product with the attributes of that product's row in the input feed dataset.
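
The notebook passes each row to the prompt as a plain dictionary, so empty feed cells appear in the rendered prompt as `nan`. As a hedged illustration of how the per-product context could instead be rendered, the hypothetical helper `format_context` below (not part of the original notebook) turns a product dictionary into `key: value` lines and skips missing values. The attribute values are taken from one row of the sample output shown earlier.

```python
import math
from typing import Any, Dict


def format_context(product: Dict[str, Any]) -> str:
    """Render product attributes as 'key: value' lines, skipping missing values."""
    lines = []
    for key, value in product.items():
        # pandas represents empty CSV cells as float NaN.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


row = {
    "Item ID": 21084,
    "Title": "Agave Men's Waterman Relaxed Grey Jean",
    "Brand": "Agave",
    "Size": "33x30",
    "Color": float("nan"),  # missing in the feed
    "Material": "Denim",
}
print(format_context(row))
```

Whether dropping missing attributes improves the generated descriptions is untested here; the sketch only shows one way to keep `nan` out of the prompt text.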


In [4]:
# Predictions

def prompt_func(prompt_input: dict) -> str:
    """Builds the product description enrichment prompt from one product's attributes."""
    prompt = f"""
        You are a leading digital marketer working for a top retail organization. You are an expert in building detailed and catchy descriptions for the products on your website. 

        Context: {prompt_input}

        Generate ONLY the product description in English that highlights the product's features using the above "Context" information. 
        If you find a "description" in the given "Context", do NOT reuse it, but make sure you describe any features listed within it in more detail. 
        Do NOT repeat sentences. The generated description should strictly be about the provided product. 
        Correct product type, number of items contained in the product as well as product features such as color should be followed. 
        Any product features that are not present in the input should not be present in the generated description.
        Hyperbolic text, over promising or guarantees are to be avoided.
        The generated description should be at least 50 words long, preferably at least 150. 
        The generated description MUST NOT use special characters or any Markdown or JSON syntax. 

        New Detailed Product Description:"""
    return prompt


In [5]:
# Gemma deployment

def get_job_name_with_datetime(prefix: str) -> str:
    """Gets the job name with date time when triggering deployment jobs."""
    return prefix + datetime.now().strftime("_%Y%m%d_%H%M%S")


def deploy_model_vllm(
    model_name: str,
    model_id: str,
    service_account: str,
    machine_type: str = "g2-standard-12",
    accelerator_type: str = "NVIDIA_L4",
    accelerator_count: int = 1,
    max_model_len: int = 8192,
    dtype: str = "bfloat16",
) -> Tuple[aiplatform.Model, aiplatform.Endpoint]:
    """Deploys models with vLLM on GPU in Vertex AI."""
    endpoint = aiplatform.Endpoint.create(display_name=f"{model_name}-endpoint")

    vllm_args = [
        "--host=0.0.0.0",
        "--port=7080",
        f"--model={model_id}",
        f"--tensor-parallel-size={accelerator_count}",
        "--swap-space=16",
        "--gpu-memory-utilization=0.9",
        f"--max-model-len={max_model_len}",
        f"--dtype={dtype}",
        "--disable-log-stats",
    ]

    env_vars = {
        "MODEL_ID": model_id,
    }
    # if HF_TOKEN:
    #     env_vars["HF_TOKEN"] = HF_TOKEN

    model = aiplatform.Model.upload(
        display_name=model_name,
        serving_container_image_uri=VLLM_DOCKER_URI,
        serving_container_command=["python", "-m", "vllm.entrypoints.api_server"],
        serving_container_args=vllm_args,
        serving_container_ports=[7080],
        serving_container_predict_route="/generate",
        serving_container_health_route="/ping",
        serving_container_environment_variables=env_vars,
        serving_container_shared_memory_size_mb=(16 * 1024),  # 16 GB
        serving_container_deployment_timeout=7200,
    )

    model.deploy(
        endpoint=endpoint,
        machine_type=machine_type,
        accelerator_type=accelerator_type,
        accelerator_count=accelerator_count,
        deploy_request_timeout=1800,
        service_account=service_account,
        sync=True,
        enable_access_logging=True,
    )
    return model, endpoint

## UUID

# Generate a random identifier of a specified length (default=8)
def generate_uuid(length: int = 8) -> str:
    return "".join(random.choices(string.ascii_lowercase + string.digits, k=length))

UUID = generate_uuid()


def save_csv_gcs(BUCKET_NAME: str, evaluation_dataset_name: str):
    """
    Saves a CSV file to a Google Cloud Storage bucket.

    Args:
        BUCKET_NAME (str):  The bucket URI including the 'gs://' prefix, which is stripped before the bucket lookup.
        evaluation_dataset_name (str):  The filename (without extension) to use for the saved CSV.

    """
    
    # save to GCS
    storage_client = storage.Client()
    bucket = storage_client.bucket(BUCKET_NAME[5:])
    blob = bucket.blob(f"data/{evaluation_dataset_name}.csv")
    blob.upload_from_filename(f"{evaluation_dataset_name}.csv")

    print(f"File uploaded to cloud storage in {BUCKET_NAME}/data/{evaluation_dataset_name}.csv")
    
    
    
def save_jsonl_gcs(BUCKET_NAME: str, evaluation_dataset_name: str):
    """
    Saves a JSON Lines (.jsonl) file to a Google Cloud Storage bucket.

    Args:
        BUCKET_NAME (str):  The bucket URI including the 'gs://' prefix, which is stripped before the bucket lookup.
        evaluation_dataset_name (str):  The filename (without extension) to use for the saved JSONL file.
    """
    
    # save to GCS 
    storage_client = storage.Client()
    bucket = storage_client.bucket(BUCKET_NAME[5:])
    blob = bucket.blob(f"data/{evaluation_dataset_name}.jsonl")
    blob.upload_from_filename(f"{evaluation_dataset_name}.jsonl")

    print(f"File uploaded to cloud storage in {BUCKET_NAME}/data/{evaluation_dataset_name}.jsonl")

## Deploy Gemma and generate responses

Deploy Gemma on GPU using [vLLM](https://github.com/vllm-project/vllm), an open-source library for high-throughput LLM serving on GPUs.

Note: to use Gemma, you must accept the agreement in Vertex AI Model Garden and obtain the URL to the Gemma model artifacts.

In [6]:
# Initialize Vertex AI API.
aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=STAGING_BUCKET)

### Download Gemma, Extract locally, and save to Cloud Storage

In [7]:
assert (
    VERTEX_MODEL_GARDEN_GEMMA
), "Please click the agreement of Gemma in Vertex AI Model Garden, and get the URL to Gemma model artifacts."

# Only use the last part in case a full command is pasted.
signed_url = VERTEX_MODEL_GARDEN_GEMMA.split(" ")[-1].strip('"')

! mkdir -p ./gemma
! curl -X GET "{signed_url}" | tar -xzvf - -C ./gemma/
! gsutil -m cp -R ./gemma/* {MODEL_BUCKET}

model_path_prefix = MODEL_BUCKET

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0gemma/
gemma/gemma-2b-it/
gemma/gemma-2b-it/config.json
gemma/gemma-2b-it/tokenizer.model
gemma/gemma-2b-it/generation_config.json
gemma/gemma-2b-it/model.safetensors.index.json
gemma/gemma-2b-it/model-00001-of-00002.safetensors
  0 32.7G    0  297M    0     0  49.6M      0  0:11:14  0:00:05  0:11:09 52.0M^C
Copying file://./gemma/gemma/gemma-2b-it/tokenizer.model [Content-Type=application/octet-stream]...
Copying file://./gemma/gemma/gemma-2b-it/generation_config.json [Content-Type=application/json]...
^C


### Define docker images

In [8]:
# Serving docker images.
VLLM_DOCKER_URI = "us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20240220_0936_RC01"

### Deploy Gemma models with vLLM on GPU

vLLM is a high-performance library for serving large language models (LLMs) on GPUs, offering optimizations like paged attention and continuous batching. The following cells deploy a Gemma model with the vLLM serving container. The machine type, accelerator type, and accelerator count are chosen according to model size, and alternative GPU configurations are listed as comments so that performance and cost can be traded off.
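
When choosing a GPU configuration, a rough first check is whether the model weights fit in accelerator memory for the chosen dtype. The sketch below is a back-of-the-envelope estimate only. The parameter counts are approximations assumed for illustration (check the model cards for exact figures), and the estimate ignores the KV cache, activations, and framework overhead, all of which need additional memory.

```python
# Approximate parameter counts (assumption: roughly 2.5B and 8.5B parameters
# for Gemma 2B and Gemma 7B; consult the model cards for exact figures).
APPROX_PARAMS = {"gemma-2b": 2.5e9, "gemma-7b": 8.5e9}
BYTES_PER_PARAM = {"bfloat16": 2, "float16": 2, "float32": 4}


def weight_memory_gb(model: str, dtype: str) -> float:
    """Estimate memory for the weights alone (no KV cache or activations)."""
    return APPROX_PARAMS[model] * BYTES_PER_PARAM[dtype] / 1e9


for model in APPROX_PARAMS:
    for dtype in ("bfloat16", "float32"):
        print(f"{model} {dtype}: ~{weight_memory_gb(model, dtype):.0f} GB")
```

Under these assumptions, Gemma 7B in `bfloat16` needs about 17 GB for weights, which leaves limited headroom on a single 24 GB L4, while `float32` needs about 34 GB and therefore has to be split across several GPUs with tensor parallelism.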

In [9]:
MODEL_ID = "gemma-7b"  # @param ["gemma-2b", "gemma-2b-it", "gemma-7b", "gemma-7b-it"]
model_id = os.path.join(MODEL_BUCKET, MODEL_ID)

In [10]:
# Finds Vertex AI prediction supported accelerators and regions in
# https://cloud.google.com/vertex-ai/docs/predictions/configure-compute.

if "2b" in MODEL_ID:
    # Sets 1 L4 (24G) to deploy Gemma 2B models.
    machine_type = "g2-standard-8"
    accelerator_type = "NVIDIA_L4"
    accelerator_count = 1
    vllm_dtype = "bfloat16"
else:
    # Sets 1 L4 (24G) to deploy Gemma 7B models.
    machine_type = "g2-standard-12"
    accelerator_type = "NVIDIA_L4"
    accelerator_count = 1
    vllm_dtype = "bfloat16"

# Alternative hardware configurations:

# Sets 1 A100 (40G) to deploy Gemma 2B and Gemma 7B models.
# machine_type = "a2-highgpu-1g"
# accelerator_type = "NVIDIA_TESLA_A100"
# accelerator_count = 1
# vllm_dtype = "bfloat16"

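# Active override: sets 8 L4 (24G each) on g2-standard-96 with float32 weights.
# These assignments take precedence over the selection made above.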
machine_type = "g2-standard-96"
accelerator_type = "NVIDIA_L4"
accelerator_count = 8
vllm_dtype = "float32"

# Note that a larger max_model_len will require more GPU memory.
max_model_len = 2048

model_vllm, endpoint_vllm = deploy_model_vllm(
    model_name=get_job_name_with_datetime(prefix="gemma-serve-vllm"),
    model_id=model_id,
    service_account=SERVICE_ACCOUNT,
    machine_type=machine_type,
    accelerator_type=accelerator_type,
    accelerator_count=accelerator_count,
    max_model_len=max_model_len,
    dtype=vllm_dtype,
)

Creating Endpoint
Create Endpoint backing LRO: projects/757654702990/locations/us-central1/endpoints/7117910623756746752/operations/7910200595150012416
Endpoint created. Resource name: projects/757654702990/locations/us-central1/endpoints/7117910623756746752
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/757654702990/locations/us-central1/endpoints/7117910623756746752')
Creating Model
Create Model backing LRO: projects/757654702990/locations/us-central1/models/1805751036040708096/operations/668412394338254848
Model created. Resource name: projects/757654702990/locations/us-central1/models/1805751036040708096@1
To use this Model in another session:
model = aiplatform.Model('projects/757654702990/locations/us-central1/models/1805751036040708096@1')
Deploying model to Endpoint : projects/757654702990/locations/us-central1/endpoints/7117910623756746752
Deploy Endpoint model backing LRO: projects/757654702990/locations/us-central1/endpoints/711791062375674

### Test Endpoint

In [11]:
# # Loads an existing endpoint instance using the endpoint name:
# endpoint_name = ""  # @param {type:"string"}
# aip_endpoint_name = (
#     f"projects/{PROJECT_ID}/locations/{REGION}/endpoints/{endpoint_name}"
# )
# endpoint_vllm = aiplatform.Endpoint(aip_endpoint_name)

instances = [
    {
        "prompt": "What is a car?",
        "max_tokens": 50,
        "temperature": 1.0,
        "top_p": 1.0,
        "top_k": 10,
        "raw_response": False,
    },
]
response = endpoint_vllm.predict(instances=instances)

prediction = response.predictions[0]
print(prediction)

Prompt:
What is a car?
Output:
 It’s a machine that can carry people and goods from one place to another, and is usually powered by an internal combustion engine running on petrol or diesel. The internal combustion engine is connected to the transmission and drives the wheels, which are connected to


## Run predictions

Generate a product description for each row of the input feed by sending the rendered prompt to the deployed Gemma endpoint.
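
Online prediction calls can fail transiently, and a single failure would otherwise abort the whole loop. The sketch below shows one common mitigation, retrying with exponential backoff. The helper `call_with_retries` and the stand-in function `flaky_predict` are hypothetical and are not part of the original notebook or of the Vertex AI SDK.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], max_attempts: int = 3, base_delay_s: float = 1.0) -> T:
    """Call fn(), retrying on any exception with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            time.sleep(base_delay_s * 2 ** (attempt - 1))
    raise RuntimeError("max_attempts must be at least 1")


# Stand-in for an endpoint call that fails once before succeeding.
calls = {"count": 0}


def flaky_predict() -> str:
    calls["count"] += 1
    if calls["count"] < 2:
        raise ConnectionError("transient failure")
    return "generated description"


result = call_with_retries(flaky_predict, base_delay_s=0.01)
print(result, calls["count"])
```

In the prediction loop, the endpoint call could be wrapped as `call_with_retries(lambda: endpoint_vllm.predict(instances=instances))`. Retrying on every exception type is a simplification; a production version would likely retry only on errors known to be transient.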

In [None]:
eval_df = []

# Generate description for each product row
for index, row in df.iterrows():
    if index % 5 == 0:
        print("Processing row:", index+1)

    # prompt_input = row.result
    prompt_input = row.to_dict()
    prompt = prompt_func(prompt_input)
    instances = [
        {
            "prompt": prompt,
            "max_tokens": 1000,
            "temperature": 0.5,
            "top_p": 0.5,
            "top_k": 10,
            "raw_response": True,
        },
    ]
    response = endpoint_vllm.predict(instances=instances)
    prediction = response.predictions[0]

    # Append results
    eval_df.append(
        {
            "prompt_id": prompt_input,
            "prompt": prompt,
            "response_a": prediction,
            "name": prompt_input["Title"],
            "id": prompt_input["Item ID"],
        }
    )


### Save responses and prompts to Cloud Storage

In [14]:
eval_df_ = pd.DataFrame(eval_df)

# Syntax cleanup 
def remove_newlines(text):
    return text.replace('\n', ' ')

eval_df_.loc[:, 'response_a'] = eval_df_['response_a'].apply(remove_newlines)

# Save CSV
eval_df_.to_csv(f"{evaluation_dataset_name}.csv", index=False)
save_csv_gcs(BUCKET_NAME, evaluation_dataset_name)

# Save JSON
eval_df_.to_json(f"{evaluation_dataset_name}.jsonl", orient="records", lines=True)
save_jsonl_gcs(BUCKET_NAME, evaluation_dataset_name)

File uploaded to cloud storage in gs://sandbox-401718-passage-gen-test/data/evaluation_dataset.csv
File uploaded to cloud storage in gs://sandbox-401718-passage-gen-test/data/evaluation_dataset.jsonl


In [15]:
eval_df_.head()

Unnamed: 0,prompt_id,prompt,response_a,name,id
0,"{'Item ID': 2480, 'Title': 'ASICS Women's Perf...",\n You are a leading digital marketer w...,ASICS Women's Performance Running Ca...,ASICS Women's Performance Running Capri Tight,2480
1,"{'Item ID': 21084, 'Title': 'Agave Men's Water...",\n You are a leading digital marketer w...,Agave Men's Waterman Relaxed Grey Je...,Agave Men's Waterman Relaxed Grey Jean,21084
2,"{'Item ID': 27569, 'Title': '2XU Men's Swim Co...",\n You are a leading digital marketer w...,2XU Men's Swim Compression Long Slee...,2XU Men's Swim Compression Long Sleeve Top,27569
3,"{'Item ID': 8089, 'Title': '(6249-2) Smart Sat...",\n You are a leading digital marketer w...,Smart Satin Evening Suit with Flute ...,(6249-2) Smart Satin Evening Suit with Flute S...,8089
4,"{'Item ID': 1, 'Title': 'Seven7 Women's Long S...",\n You are a leading digital marketer w...,Seven7 Women's Long Sleeve Stripe Be...,Seven7 Women's Long Sleeve Stripe Belted Top,1


## Run AutoSxS

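Compare the descriptions produced by a Google-published model, text-bison (Model A), with the responses already generated by Gemma (Model B). Note that the Gemma responses are stored in the evaluation dataset's `response_a` column, which is passed to the pipeline as `response_column_b`.

The `autorater_prompt_parameters` argument expects `inference_instruction` (details on how to perform a task) and `inference_context` (content to reference to perform the task). As an example, `{'inference_context': {'column': 'my_prompt'}}` uses the evaluation dataset's `my_prompt` column for the AutoRater's context.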
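
Because the pipeline parameters refer to dataset columns by name, a misspelled column only surfaces after the pipeline has started. As a minimal sketch, the hypothetical helpers below (not part of the original notebook or of the pipeline components library) collect the column names referenced in a parameter dictionary and check them against the first record of a JSONL evaluation dataset. Only parameter keys that appear in this notebook are inspected, and the sample record is abbreviated.

```python
import json
from typing import Any, Dict, Set


def referenced_columns(parameters: Dict[str, Any]) -> Set[str]:
    """Collect every dataset column name that the parameters refer to."""
    columns = set(parameters.get("id_columns", []))
    for key in ("autorater_prompt_parameters", "model_a_prompt_parameters"):
        for spec in parameters.get(key, {}).values():
            if "column" in spec:
                columns.add(spec["column"])
    for key in ("response_column_a", "response_column_b", "human_preference_column"):
        if parameters.get(key):
            columns.add(parameters[key])
    return columns


def missing_columns(jsonl_text: str, parameters: Dict[str, Any]) -> Set[str]:
    """Return referenced columns that are absent from the first JSONL record."""
    first_record = json.loads(jsonl_text.splitlines()[0])
    return referenced_columns(parameters) - set(first_record)


# Abbreviated record with the same columns as the evaluation dataset.
sample_jsonl = json.dumps(
    {"prompt_id": "...", "prompt": "...", "response_a": "...", "name": "...", "id": 2480}
)
sample_parameters = {
    "id_columns": ["prompt"],
    "autorater_prompt_parameters": {
        "inference_context": {"column": "name"},
        "inference_instruction": {"column": "prompt"},
    },
    "model_a_prompt_parameters": {"prompt": {"column": "prompt"}},
    "response_column_b": "response_a",
}
print(missing_columns(sample_jsonl, sample_parameters))  # set()
```

An empty set means every referenced column exists in the sample record.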




In [16]:
template_uri = 'pipeline.yaml'
compiler.Compiler().compile(
    pipeline_func=model_evaluation.autosxs_pipeline,
    package_path=template_uri,
)

In [17]:
UUID = generate_uuid()
display_name = f"examples-resp-model-full-32k-{UUID}"
context_column = "name"
question_column = "prompt"
response_column = "response_a"
model_prompt = "prompt"
model_resource = MODEL_RESOURCE

parameters = {
    # Read the evaluation dataset from the location it was uploaded to above.
    "evaluation_dataset": f"{BUCKET_NAME}/data/{evaluation_dataset_name}.jsonl",
    "id_columns": [question_column],
    "autorater_prompt_parameters": {
        "inference_context": {"column": context_column},
        "inference_instruction": {"column": question_column},
    },
    "task": "question_answering@001",
    "model_a": model_resource,
    "model_a_prompt_parameters": {"prompt": {"column": model_prompt}},
    "response_column_b": response_column,
}

aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)
job = aiplatform.PipelineJob(
    job_id=display_name,
    display_name=display_name,
    pipeline_root=os.path.join(BUCKET_URI, display_name),
    template_path=template_uri,
    parameter_values=parameters,
    enable_caching=False,
)
job.run(sync=True)


Creating PipelineJob
PipelineJob created. Resource name: projects/757654702990/locations/us-central1/pipelineJobs/examples-resp-model-full-32k-3qt9dw7z
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/757654702990/locations/us-central1/pipelineJobs/examples-resp-model-full-32k-3qt9dw7z')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/examples-resp-model-full-32k-3qt9dw7z?project=757654702990
PipelineJob projects/757654702990/locations/us-central1/pipelineJobs/examples-resp-model-full-32k-3qt9dw7z current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/757654702990/locations/us-central1/pipelineJobs/examples-resp-model-full-32k-3qt9dw7z current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/757654702990/locations/us-central1/pipelineJobs/examples-resp-model-full-32k-3qt9dw7z current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/75765470

### Fetch the AutoSxS judgments and metrics

In [18]:
# To use an existing pipeline, override job using the line below.
# job = aiplatform.PipelineJob.get('projects/[PROJECT_NUMBER]/locations/[REGION]/pipelineJobs/[PIPELINE_RUN_NAME]')

for details in job.task_details:
    if details.task_name == "online-evaluation-pairwise":
        break

# Judgments
judgements_uri = details.outputs["judgments"].artifacts[0].uri
judgements_df = pd.read_json(judgements_uri, lines=True)
judgements_df.head()

Unnamed: 0,prompt,inference_instruction,inference_context,response_a,response_b,choice,explanation,confidence
0,\n You are a leading digital marketer w...,\n You are a leading digital marketer w...,ASICS Women's Performance Running Capri Tight,Unleash your athletic potential with our ASIC...,ASICS Women's Performance Running Ca...,A,Response (B) is grounded and fully answers the...,1
1,\n You are a leading digital marketer w...,\n You are a leading digital marketer w...,Unisex Chequered Arab Arafat Shemagh Kafiyah D...,Discover our stunning collection of 17 gorgeo...,"""This unisex chequered Arab Arafat s...",B,Both responses are well-written and capture th...,1
2,\n You are a leading digital marketer w...,\n You are a leading digital marketer w...,RetroFit Men's Long Sleeve Pullover Hoodie Swe...,Slip into cozy comfort with our RetroFit Men'...,"""RetroFit Men's Long Sleeve Pullover...",A,Response (A) provides a detailed description o...,1
3,\n You are a leading digital marketer w...,\n You are a leading digital marketer w...,City Hunter Soft Nylon Russian/Trapper/Trooper...,Stay warm and protected from the harsh winter...,City Hunter Soft Nylon Russian/Trapp...,A,Response (A) is well-written and provides addi...,1
4,\n You are a leading digital marketer w...,\n You are a leading digital marketer w...,Rago Pull On Open Girdle,Experience unparalleled support and comfort w...,"Rago Pull On Open Girdle, Firm contr...",A,Response (A) provides a detailed description o...,1


In [19]:
# Aggregate Metrics
for details in job.task_details:   #full job details
    if details.task_name == "model-evaluation-text-generation-pairwise":
        break
pd.DataFrame([details.outputs["autosxs_metrics"].artifacts[0].metadata])

Unnamed: 0,autosxs_model_a_win_rate,autosxs_model_b_win_rate
0,0.98,0.02
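
As a sanity check, win rates can be recomputed from the `choice` column of the judgments. The sketch below assumes a win rate is simply the fraction of rows in which the autorater chose that model; the pipeline's own aggregation may differ in detail. The helper `win_rates` is hypothetical, and the input is the `choice` column of the five judgment rows displayed earlier, not the full evaluation set.

```python
from collections import Counter
from typing import Dict, Iterable


def win_rates(choices: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of rows won by model A and by model B."""
    counts = Counter(choices)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("No judgments to aggregate")
    return {label: counts[label] / total for label in ("A", "B")}


# `choice` values from the five judgment rows displayed earlier.
print(win_rates(["A", "B", "A", "A", "A"]))  # {'A': 0.8, 'B': 0.2}
```

The same function accepts a pandas column directly, for example `win_rates(judgements_df["choice"])`.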


In [20]:
details.outputs["autosxs_metrics"]

artifacts {
  name: "projects/757654702990/locations/us-central1/metadataStores/default/artifacts/13809795646850762148"
  display_name: "autosxs_metrics"
  etag: "1715016211631"
  create_time {
    seconds: 1715016087
    nanos: 568000000
  }
  update_time {
    seconds: 1715016211
    nanos: 631000000
  }
  state: LIVE
  schema_title: "system.Metrics"
  schema_version: "0.0.1"
  metadata {
    fields {
      key: "autosxs_model_b_win_rate"
      value {
        number_value: 0.02
      }
    }
    fields {
      key: "autosxs_model_a_win_rate"
      value {
        number_value: 0.98
      }
    }
  }
}

### Investigate AutoSxS comparison judgements and explanations

In [28]:
print(f"Response A: {judgements_df['response_a'][6]}\n")
print(f"Response B: {judgements_df['response_b'][6]}\n")
print(f"Explanation: {judgements_df['explanation'][6]}\n")

Response A:  Hit the beach in style with the Billabong Gettin Jiggy Board Shorts, a pair of vibrant and functional water shorts designed for men who love to make a statement. These board shorts come in a pack of one and boast a captivating all-over tie-dye print in shades of orange, sure to turn heads wherever you go.

Made with water-repellent fabric, these shorts dry quickly, ensuring you stay comfortable and fresh even after taking a dip in the ocean or lounging by the

Response B:           "Billabong Gettin Jiggy Board Short - Men's, 5-inch water shorts with all-over tie-dye print, Elastic waistband with drawstring, Water-repellent fabric for quick drying, Comfortable and stylish"

In [26]:
# Save Judgements to GCS
judgements_df.to_csv("judgements_df.csv", index=False)
save_csv_gcs(BUCKET_NAME, "judgements_df")

File uploaded to cloud storage in gs://sandbox-401718-passage-gen-test/data/judgements_df.csv


## (optional) Ground Truth human preferences

Users can optionally specify a column that records the human preference between pre-generated responses. AutoSxS then reports human-alignment metrics that quantify the agreement between the autorater's choices and the human judgments. High agreement builds trust in the autorater's ability to judge responses accurately, so that users can rely on AutoSxS to identify the model that generates the best product descriptions for the product catalog.

![human-pref.png](./imgs/check-alignment.png)

In [29]:
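# Load the evaluation dataset and label every row with a placeholder human preference of "A"
eval_df_ = pd.read_csv(f"{BUCKET_NAME}/data/{evaluation_dataset_name}.csv")
eval_df_["human_preferences"] = "A"

# Join the AutoSxS judgments onto the evaluation dataset.
# Both frames contain a `response_a` column, so pandas suffixes them:
# `response_a_x` (Gemma, from the evaluation dataset) and
# `response_a_y` (text-bison, from the judgments).
judgements_df = pd.read_csv(f"{BUCKET_NAME}/data/judgements_df.csv")
eval_df_human_pref = pd.merge(eval_df_, judgements_df, on='prompt', how='left')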

# Save JSON to GCS
eval_df_human_pref.to_json("evaluation_dataset_human_pref.jsonl", orient="records", lines=True)
save_jsonl_gcs(BUCKET_NAME, "evaluation_dataset_human_pref")

File uploaded to cloud storage in gs://sandbox-401718-passage-gen-test/data/evaluation_dataset_human_pref.jsonl


### Run AutoSxS w/ Human Preference

In [36]:
template_uri = 'pipeline.yaml'
compiler.Compiler().compile(
    pipeline_func=model_evaluation.autosxs_pipeline,
    package_path=template_uri,
)

In [37]:
UUID = generate_uuid()
display_name = f"examples-resp-model-full-32k-human-pref-{UUID}-pref-a"
context_column = "name"
question_column = "prompt"
response_column_a = "response_a_y"
response_column_b = "response_b"
human_preference_column = "human_preferences"

parameters = {
    "evaluation_dataset": f"{BUCKET_NAME}/data/evaluation_dataset_human_pref.jsonl",
    "id_columns": [question_column],
    "autorater_prompt_parameters": {
        "inference_context": {"column": context_column},
        "inference_instruction": {"column": question_column},
    },
    "task": "question_answering@001",
    "response_column_a": response_column_a,
    "response_column_b": response_column_b,
    "human_preference_column": human_preference_column,

}

aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)
job = aiplatform.PipelineJob(
    job_id=display_name,
    display_name=display_name,
    pipeline_root=os.path.join(BUCKET_URI, display_name),
    template_path=template_uri,
    parameter_values=parameters,
    enable_caching=False,
)
# Block until the pipeline finishes, because the next cell reads its outputs.
job.run(sync=True)

Creating PipelineJob
PipelineJob created. Resource name: projects/757654702990/locations/us-central1/pipelineJobs/examples-resp-model-full-32k-human-pref-sdazv6of-pref-a
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/757654702990/locations/us-central1/pipelineJobs/examples-resp-model-full-32k-human-pref-sdazv6of-pref-a')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/examples-resp-model-full-32k-human-pref-sdazv6of-pref-a?project=757654702990
PipelineJob projects/757654702990/locations/us-central1/pipelineJobs/examples-resp-model-full-32k-human-pref-sdazv6of-pref-a current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/757654702990/locations/us-central1/pipelineJobs/examples-resp-model-full-32k-human-pref-sdazv6of-pref-a current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/757654702990/locations/us-central1/pipelineJobs/examples-resp-model-full-32k-

In [38]:
for details in job.task_details:
    if details.task_name == "online-evaluation-pairwise":
        break

# Judgments
judgements_uri = details.outputs["judgments"].artifacts[0].uri
judgements_df = pd.read_json(judgements_uri, lines=True)
judgements_df.head()

Unnamed: 0,prompt,inference_instruction,inference_context,response_a,response_b,choice,explanation,confidence,human_preference
0,\n You are a leading digital marketer w...,\n You are a leading digital marketer w...,Nautica Men's Castaway Stripe Hoodie,Slip into effortless style with our Nautica M...,"""Nautica Men's Castaway Stripe Hoodi...",A,Response (B) is a repetition of the input desc...,1,A
1,\n You are a leading digital marketer w...,\n You are a leading digital marketer w...,Unisex Chequered Arab Arafat Shemagh Kafiyah D...,Discover our stunning collection of 17 gorgeo...,"""This unisex chequered Arab Arafat s...",B,Both responses are well-written and capture th...,1,A
2,\n You are a leading digital marketer w...,\n You are a leading digital marketer w...,ASICS Women's Performance Running Capri Tight,Unleash your athletic potential with our ASIC...,ASICS Women's Performance Running Ca...,A,Response (B) is grounded and fully answers the...,1,A
3,\n You are a leading digital marketer w...,\n You are a leading digital marketer w...,City Hunter Soft Nylon Russian/Trapper/Trooper...,Stay warm and protected from the harsh winter...,City Hunter Soft Nylon Russian/Trapp...,A,Response (A) is well-written and provides addi...,1,A
4,\n You are a leading digital marketer w...,\n You are a leading digital marketer w...,(6249-2) Smart Satin Evening Suit with Flute S...,Indulge in timeless elegance with our exquisi...,Smart Satin Evening Suit with Flute ...,A,Response (A) is a well-written product descrip...,1,A


In [39]:
# Aggregate Metrics
for details in job.task_details:   #full job details
    if details.task_name == "model-evaluation-text-generation-pairwise":
        break
pd.DataFrame([details.outputs["autosxs_metrics"].artifacts[0].metadata])

Unnamed: 0,accuracy,autosxs_model_a_win_rate,autosxs_model_b_win_rate,cohens_kappa,f1,fn,fp,human_preference_model_a_win_rate,human_preference_model_b_win_rate,precision,recall,tn,tp
0,0.9,0.9,0.1,0.0,0.947368,1.0,0.0,1.0,0.0,1.0,0.9,0.0,9.0
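
The alignment metrics in the table above can be reproduced from the four confusion counts it reports (`tp`, `fp`, `fn`, `tn`). The sketch below uses the standard textbook definitions and assumes that "A" is treated as the positive class; it is an illustration, not the pipeline's actual implementation. The helper name `agreement_metrics` is hypothetical.

```python
from typing import Dict


def agreement_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute agreement between autorater choices and human preferences."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    p_human_a = (tp + fn) / n  # share of rows where the human preferred A
    p_auto_a = (tp + fp) / n   # share of rows where the autorater chose A
    p_chance = p_human_a * p_auto_a + (1 - p_human_a) * (1 - p_auto_a)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "cohens_kappa": kappa,
    }


# Confusion counts from the metrics table above.
metrics = agreement_metrics(tp=9, fp=0, fn=1, tn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

With these counts, Cohen's kappa is 0 even though accuracy is 0.9. Because every human label is "A", the agreement expected by chance is also 0.9, so the autorater shows no agreement beyond chance. Meaningful alignment metrics therefore require human labels that include both preferences.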


## Downstream retuning of incumbent model

Based on evaluation results, the model can be fine-tuned to improve performance.

Reference notebook for fine-tuning Gemma from the Google Cloud repository: [model_garden_gemma_finetuning_on_vertex.ipynb](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/model_garden/model_garden_gemma_finetuning_on_vertex.ipynb)

## Clean-up

* Undeploy and delete endpoint

In [41]:
# Undeploy model and delete endpoint
endpoint_vllm.delete(force=True)

Undeploying Endpoint model: projects/757654702990/locations/us-central1/endpoints/7117910623756746752
Undeploy Endpoint model backing LRO: projects/757654702990/locations/us-central1/endpoints/7117910623756746752/operations/423810639576694784
Endpoint model undeployed. Resource name: projects/757654702990/locations/us-central1/endpoints/7117910623756746752
Deleting Endpoint : projects/757654702990/locations/us-central1/endpoints/7117910623756746752
Delete Endpoint  backing LRO: projects/757654702990/locations/us-central1/operations/1335789564119220224
Endpoint deleted. . Resource name: projects/757654702990/locations/us-central1/endpoints/7117910623756746752
