## **Testing and evaluation**

Importing the Vertex AI generative model package

In [139]:
from vertexai import init
from vertexai.preview.generative_models import GenerativeModel

Initializing Vertex AI

In [167]:
init(project="qwiklabs-gcp-00-1c0ebb19fb7c", location="global")


Function 1: category classifier

In [144]:
def classify_question_with_gemini(question: str) -> str:
    """Classify a question into one of the listed categories using Gemini."""
    categories = ["Employment", "General Information", "Emergency Services", "Tax Related"]
    prompt = (
        "You are a good content classifier tool.\n"
        "Classify the user's question into one of the following categories:\n"
        f"{', '.join(categories)}.\n\n"
        "If the question doesn't come under any category, just say: No relevant category.\n"
        f"Return only the matched category name exactly as listed above (i.e., {', '.join(categories)}).\n"
        "Do not enclose the result/category with single or double quotes.\n\n"
        f"Question: {question}\n"
        "Category:"
    )

    gemini_model = GenerativeModel("gemini-2.5-pro-preview-06-05")
    response = gemini_model.generate_content(prompt)
    result = response.text.strip()

    if result in categories:
        return result
    else:
        return "No relevant category"

In [145]:
classify_question_with_gemini("Best tourist places?")

'General Information'
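
The exact-match check `result in categories` sends any reply that differs only in case, surrounding quotes, or a trailing period to the fallback. A minimal sketch of a more forgiving matcher follows. `normalize_category` is a hypothetical helper, not part of the notebook, and the cleanup rules are assumptions about how the model's reply might drift.

```python
CATEGORIES = ["Employment", "General Information", "Emergency Services", "Tax Related"]
FALLBACK = "No relevant category"


def normalize_category(raw: str, categories=CATEGORIES, fallback=FALLBACK) -> str:
    """Map a raw model reply to a canonical category name (hypothetical helper)."""
    # Drop surrounding whitespace, quotes, and a trailing period.
    cleaned = raw.strip().strip("'\"").strip().rstrip(".").strip()
    # Compare case-insensitively but return the canonical spelling.
    lookup = {name.lower(): name for name in categories}
    return lookup.get(cleaned.lower(), fallback)


print(normalize_category('"tax related"'))
print(normalize_category("Weather"))
```

In the classifier above, the final `if result in categories` branch could be swapped for a call like this if stricter matching proves too brittle.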

Installing the pytest and google-cloud-aiplatform libraries

In [146]:
!pip install -q google-cloud-aiplatform pytest

Test case for category classifier function

In [168]:
import pytest

def test_classify_question(testcases):
    for test in testcases:
        result = classify_question_with_gemini(test['query'])
        print("Result:", result)

        assert result is not None, "Function returned None"
        assert result.strip().lower() == test['category'].lower(), "Classification did not match"
        print(f"Tax test case passed for category - {test['category']}")


In [169]:
testcases = [
    {'query': "accident helpline number", 'category': "Emergency Services"},
    {'query': "Best tourist places", 'category': "General Information"},
    {'query': "How to apply for a job?", 'category': "Employment"},
    {'query': "What are the steps to file my income tax?", 'category': "Tax Related"},
]


In [170]:
test_classify_question(testcases)

Result: Emergency Services
Tax test case passed for category - Emergency Services
Result: General Information
Tax test case passed for category - General Information
Result: Employment
Tax test case passed for category - Employment
Result: Tax Related
Tax test case passed for category - Tax Related
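
Because the assertions sit inside the `for` loop, the first mismatch stops the run and hides any later failures. One alternative, sketched below, collects every mismatch first. `collect_mismatches` and the keyword-based `fake_classifier` are hypothetical stand-ins so the example runs without calling Gemini.

```python
def collect_mismatches(classify, testcases):
    """Return (query, expected, actual) for every case the classifier gets wrong."""
    mismatches = []
    for case in testcases:
        actual = classify(case["query"])
        if actual is None or actual.strip().lower() != case["category"].lower():
            mismatches.append((case["query"], case["category"], actual))
    return mismatches


def fake_classifier(question: str) -> str:
    # Stand-in for classify_question_with_gemini; keyword rules are illustrative only.
    text = question.lower()
    if "tax" in text:
        return "Tax Related"
    if "job" in text:
        return "Employment"
    return "General Information"


cases = [
    {"query": "How to apply for a job?", "category": "Employment"},
    {"query": "accident helpline number", "category": "Emergency Services"},
]
failures = collect_mismatches(fake_classifier, cases)
print(failures)
```

A single `assert not failures, failures` at the end would then report all wrong classifications at once.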


Function 2: social media post generator

In [151]:
def generate_announcement_post(event: str) -> str:
    prompt = (
        "You are a professional social media content writer.\n"
        "Write a short, friendly social media post for the following government announcement:\n"
        f"{event}\n\n"
        "Make it suitable for Twitter or Facebook."
    )

    gemini_model = GenerativeModel("gemini-2.5-pro-preview-06-05")
    response = gemini_model.generate_content(prompt)
    return response.text


In [152]:
generate_announcement_post("New year wishes")

'Of course! Here are a few options, from casual to slightly more formal, all designed to be short, friendly, and engaging.\n\n### Option 1: (Warm & Casual)\n\nHappy New Year! 🎉 Wishing you and your family a year filled with joy, health, and happiness. We\'re excited for all the great things to come for our community in 2024. Cheers to a fresh start!\n\n#HappyNewYear #NewYear2024 #[YourCity/StateName]\n\n---\n\n### Option 2: (Community-Focused)\n\nWishing a very Happy New Year to everyone in our wonderful community! May the coming year bring new opportunities, success, and prosperity for all. We look forward to another year of serving you and building a better future, together.\n\n#HappyNewYear #Community #LookingForward\n\n---\n\n### Option 3: (Short & Sweet for Twitter)\n\nHappy New Year! 🎆 From all of us here, we wish you a safe, healthy, and prosperous 2024. Let\'s make it a great one!\n\n#NewYearWishes #2024\n\n---\n\n**Pro-Tips for the Post:**\n\n*   **Add a Visual:** Include a hi

Function to evaluate the relevance of the generated social media post

In [153]:
def evaluate_post_relevance_with_gemini(input_text: str, generated_post: str) -> str:
    model = GenerativeModel("gemini-2.5-pro-preview-06-05")

    prompt = f"""
    You are an AI assistant helping to check if a social media post is relevant to the original announcement.

    Original Announcement:
    "{input_text}"

    Generated Post:
    "{generated_post}"

    Is the generated post relevant to the original announcement?
    Reply only with "Yes" or "No".
    """

    response = model.generate_content(prompt)
    return response.text.strip()
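
The function returns the model's reply with only whitespace stripped, so a reply such as `Yes.` or `**Yes**` would fail a strict comparison with `yes`. A small sketch of a more tolerant parser follows, assuming the reply begins with the verdict. `parse_yes_no` is a hypothetical helper, not part of the notebook.

```python
import re


def parse_yes_no(reply: str):
    """Return True for 'yes', False for 'no', None if the reply is neither."""
    # Take the first alphabetic word, ignoring quotes, markdown, and punctuation.
    match = re.search(r"[A-Za-z]+", reply)
    if match is None:
        return None
    word = match.group(0).lower()
    if word == "yes":
        return True
    if word == "no":
        return False
    return None


print(parse_yes_no("Yes."))
print(parse_yes_no('"No"'))
```

Returning `None` for anything else lets a test distinguish an irrelevant post from an unusable judge reply.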


Pytest test for the social media post generator

In [154]:
import pytest

def test_announcement_post_relevance_with_gemini():
    input_text = "All schools will be closed due to snow on Monday."
    post = generate_announcement_post(input_text)

    relevance = evaluate_post_relevance_with_gemini(input_text, post)

    print("Generated Post:\n", post)
    print("Gemini Relevance Check:", relevance)

    assert relevance.lower() == "yes", "The post is not relevant"


In [155]:
test_announcement_post_relevance_with_gemini()

Generated Post:
 Of course! Here are a few options, from the most straightforward and friendly to a slightly more playful one.

### Option 1 (Friendly & Direct - Great for Facebook/Twitter)

**Heads Up, Parents & Students!** ❄️

Due to the heavy snow, all [Your City/County] schools will be closed on Monday, [Date].

Please stay safe and warm out there! We'll share any further updates right here.

#SnowDay #SchoolClosure #[YourCityName]

---

### Option 2 (Safety-Focused - A bit more formal)

**Important Weather Announcement:**

For the safety of our students and staff, all schools will be closed on Monday, [Date], due to the snow.

Stay home if you can and please be careful on the roads. For more information, visit our website: [Link]

#SafetyFirst #WeatherUpdate #[YourCounty]

---

### Option 3 (Playful & Fun - Great for a school district's page)

**Get the hot chocolate ready!** ☕️☃️

It's official: Monday is a SNOW DAY! All our schools will be closed due to the winter weather.

Enjo
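
The output above contains several options, bracketed placeholders, and a conversational preamble, yet a relevance judge can still answer "Yes". A deterministic format check could complement it. The sketch below shows one possible set of rules. `find_post_issues` and its rules are hypothetical, and the 280-character limit is an assumed budget for a Twitter-style post.

```python
import re


def find_post_issues(post: str, max_chars: int = 280):
    """Return a list of format problems found in a generated post."""
    issues = []
    if len(post) > max_chars:
        issues.append(f"longer than {max_chars} characters")
    # Headings such as "### Option 1" suggest several drafts instead of one post.
    if re.search(r"^#+\s*Option\s+\d+", post, flags=re.MULTILINE):
        issues.append("contains multiple options")
    # Bracketed text such as "[Date]" is an unfilled placeholder.
    if re.search(r"\[[^\]]+\]", post):
        issues.append("contains unfilled placeholders")
    return issues


sample = "### Option 1\n\nAll [Your City] schools will be closed on Monday, [Date]."
print(find_post_issues(sample))
```

An empty list means the post passed all three checks.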

# **Google Evaluation API**

Test dataset for evaluation

In [156]:
test_dataset = [
    {"query": "How do I apply for a government job?", "category": "Employment"},
    {"query": "What is the income tax filing deadline?", "category": "Tax Related"},
    {"query": "Is there a helpline for ambulance services?", "category": "Emergency Services"},
    {"query": "What are the best places to visit in summer?", "category": "General Information"},
    {"query": "How do I upload my resume to the job portal?", "category": "Employment"},
    {"query": "Where can I pay my property tax?", "category": "Tax Related"},
    {"query": "Fire emergency contact number?", "category": "Emergency Services"},
    {"query": "Where can I find tourist guides for my city?", "category": "General Information"},
    {"query": "Steps to register for unemployment benefits?", "category": "Employment"},
    {"query": "Do we have school tomorrow?", "category": "General Information"},
    {"query": "Police station near me?", "category": "Emergency Services"},
    {"query": "I want to know about tax rebates.", "category": "Tax Related"},
    {"query": "What time does the post office open?", "category": "General Information"},
    {"query": "Can I file my taxes online?", "category": "Tax Related"},
    {"query": "Help! There’s a gas leak in my house!", "category": "Emergency Services"},
]
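
A labelled dataset like this can also be used to score the classifier directly. The sketch below computes per-category accuracy. `accuracy_by_category` and the stub `stub_predict` are hypothetical; in the notebook, the `predict` argument would presumably be `classify_question_with_gemini`.

```python
from collections import defaultdict


def accuracy_by_category(predict, dataset):
    """Return {category: fraction of queries classified correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in dataset:
        total[row["category"]] += 1
        if predict(row["query"]) == row["category"]:
            correct[row["category"]] += 1
    return {category: correct[category] / total[category] for category in total}


def stub_predict(query: str) -> str:
    # Illustrative stand-in for the Gemini call.
    return "Tax Related" if "tax" in query.lower() else "General Information"


sample = [
    {"query": "Where can I pay my property tax?", "category": "Tax Related"},
    {"query": "Can I file my taxes online?", "category": "Tax Related"},
    {"query": "Police station near me?", "category": "Emergency Services"},
    {"query": "What time does the post office open?", "category": "General Information"},
]
print(accuracy_by_category(stub_predict, sample))
```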


In [157]:
import pandas as pd
from vertexai.preview.evaluation import EvalTask, MetricPromptTemplateExamples


Configuring evaluation dataset

In [158]:
categories = ["Employment", "General Information", "Emergency Services", "Tax Related"]
eval_dataset = pd.DataFrame([
    {
        "instruction": "You are a good content classifier tool.\n"
                        "Classify the user's question into one of the following categories:\n"
                        f"{', '.join(categories)}.\n\n"
                        "If the question doesn't come under any category, just say: No relevant category.\n"
                        f"Return only the matched category name exactly as listed above (i.e., {', '.join(categories)}).\n"
                        "Do not enclose the result/category with single or double quotes.\n\n",
        "context": f"category: {test_data['query']}",
        "response": test_data["category"],
    } for test_data in test_dataset
])

In [159]:
import datetime
run_ts = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=[
        MetricPromptTemplateExamples.Pointwise.INSTRUCTION_FOLLOWING,
        MetricPromptTemplateExamples.Pointwise.GROUNDEDNESS,
        MetricPromptTemplateExamples.Pointwise.VERBOSITY,
        MetricPromptTemplateExamples.Pointwise.SUMMARIZATION_QUALITY
    ],
    experiment=f"classify-question-with-gemini-{run_ts}"
)

Re-initializing Vertex AI with the us-central1 location before running the evaluation


In [164]:
init(project="qwiklabs-gcp-00-1c0ebb19fb7c", location="us-central1")

Running the evaluation task

In [165]:
prompt_template = (
    "Instruction: {instruction}. query: {context}. category: {response}"
)
result = eval_task.evaluate(
      prompt_template=prompt_template,
      experiment_run_name=f"classify-question-with-gemini-{run_ts}"
)
evaluation_results = [result]

INFO:vertexai.preview.evaluation.eval_task:Logging Eval experiment evaluation metadata: {'prompt_template': 'Instruction: {instruction}. query: {context}. category: {response}'}
INFO:vertexai.preview.evaluation._evaluation:Assembling prompts from the `prompt_template`. The `prompt` column in the `EvalResult.metrics_table` has the assembled prompts used for model response generation.
INFO:vertexai.preview.evaluation._evaluation:Computing metrics with a total of 60 Vertex Gen AI Evaluation Service API requests.
100%|██████████| 60/60 [00:06<00:00,  9.13it/s]
INFO:vertexai.preview.evaluation._evaluation:All 60 metric requests are successfully computed.
INFO:vertexai.preview.evaluation._evaluation:Evaluation Took:6.581563384002948 seconds
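
The log notes that prompts are assembled from `prompt_template`. Assuming the placeholders are filled from the dataset columns in the same way as Python's `str.format` (an assumption about the SDK's behavior, not something the notebook confirms), the assembled prompt for one row can be previewed locally:

```python
prompt_template = "Instruction: {instruction}. query: {context}. category: {response}"

# One row shaped like eval_dataset; the instruction is shortened for readability.
row = {
    "instruction": "Classify the user's question.",
    "context": "category: Where can I pay my property tax?",
    "response": "Tax Related",
}

assembled = prompt_template.format(**row)
print(assembled)
```

A preview like this makes label collisions easy to spot: the `context` value already carries a `category:` prefix, so the query ends up labelled twice in the assembled prompt.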


Displaying the evaluation results

In [166]:
from vertexai.preview.evaluation import notebook_utils
notebook_utils.display_eval_result(eval_result=result)

### Summary Metrics

| | row_count | instruction_following/mean | instruction_following/std | groundedness/mean | groundedness/std | verbosity/mean | verbosity/std | summarization_quality/mean | summarization_quality/std |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 15.0 | 5.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 5.0 | 0.0 |
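
Summary metrics such as these can be turned into a pass/fail gate. The sketch below copies the mean values from the summary row. The thresholds and the `failed_metrics` helper are hypothetical, chosen only for illustration.

```python
summary = {
    "instruction_following/mean": 5.0,
    "groundedness/mean": 1.0,
    "verbosity/mean": 0.0,
    "summarization_quality/mean": 5.0,
}

# Hypothetical minimum acceptable mean per metric.
thresholds = {
    "instruction_following/mean": 4.0,
    "groundedness/mean": 1.0,
    "summarization_quality/mean": 4.0,
}


def failed_metrics(summary, thresholds):
    """Return the metrics whose mean is below its threshold."""
    return {
        name: summary[name]
        for name, minimum in thresholds.items()
        if summary[name] < minimum
    }


print(failed_metrics(summary, thresholds))
```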


### Row-based Metrics

Unnamed: 0,instruction,context,response,prompt,instruction_following/explanation,instruction_following/score,groundedness/explanation,groundedness/score,verbosity/explanation,verbosity/score,summarization_quality/explanation,summarization_quality/score
0,You are a good content classifier tool.\nClass...,category: How do I apply for a government job?,Employment,Instruction: You are a good content classifier...,The response accurately classifies the user's ...,5.0,The response 'Employment' is directly attribut...,1.0,"The response is perfectly concise, providing a...",0.0,The model correctly classified the query into ...,5.0
1,You are a good content classifier tool.\nClass...,category: What is the income tax filing deadline?,Tax Related,Instruction: You are a good content classifier...,The response correctly classified the user's q...,5.0,The AI model correctly classifies the user's q...,1.0,"The response is perfectly concise, providing t...",0.0,The response correctly identifies the category...,5.0
2,You are a good content classifier tool.\nClass...,category: Is there a helpline for ambulance se...,Emergency Services,Instruction: You are a good content classifier...,The model correctly classifies the user query ...,5.0,The response 'Emergency Services' is directly ...,1.0,"The response is perfectly concise, providing a...",0.0,The response correctly classifies the query in...,5.0
3,You are a good content classifier tool.\nClass...,category: What are the best places to visit in...,General Information,Instruction: You are a good content classifier...,The response follows all the instructions incl...,5.0,"The response, 'General Information', accuratel...",1.0,"The response is perfectly concise, providing a...",0.0,The response correctly classifies the user's q...,5.0
4,You are a good content classifier tool.\nClass...,category: How do I upload my resume to the job...,Employment,Instruction: You are a good content classifier...,The response correctly identifies the category...,5.0,The AI's response is fully grounded in the use...,1.0,The response is perfectly concise and provides...,0.0,The response accurately classifies the user's ...,5.0
5,You are a good content classifier tool.\nClass...,category: Where can I pay my property tax?,Tax Related,Instruction: You are a good content classifier...,The response correctly classifies the query in...,5.0,The response 'Tax Related' is fully grounded i...,1.0,"The response is perfectly concise, providing a...",0.0,The model correctly classified the query into ...,5.0
6,You are a good content classifier tool.\nClass...,category: Fire emergency contact number?,Emergency Services,Instruction: You are a good content classifier...,The response correctly classified the query in...,5.0,The response is fully grounded as it only uses...,1.0,The response is appropriately concise as it di...,0.0,The response correctly classifies the user que...,5.0
7,You are a good content classifier tool.\nClass...,category: Where can I find tourist guides for ...,General Information,Instruction: You are a good content classifier...,The response completely fulfills the instructi...,5.0,The response correctly identifies the category...,1.0,The response is perfectly concise as it return...,0.0,The response correctly classifies the query in...,5.0
8,You are a good content classifier tool.\nClass...,category: Steps to register for unemployment b...,Employment,Instruction: You are a good content classifier...,The model accurately classifies the user quest...,5.0,The response is fully grounded as it accuratel...,1.0,"The response is perfectly concise, providing a...",0.0,"The response follows all instructions, correct...",5.0
9,You are a good content classifier tool.\nClass...,category: Do we have school tomorrow?,General Information,Instruction: You are a good content classifier...,The response correctly classifies the user que...,5.0,The response correctly identified the user's q...,1.0,"The response is perfectly concise, providing a...",0.0,The response correctly classifies the user's q...,5.0
