# Challenge 3 - Testing and Evaluation

## Import Required libraries

import core libraries for using Google Generative AI (genai),  
evaluation tools from Vertex AI, testing with pytest, and data handling with pandas and JSON.

In [1]:
from google import genai
from google.genai import types
import pytest
import re
import json
import pandas as pd

from vertexai.evaluation import (
    EvalTask,
    MetricPromptTemplateExamples,
)
from vertexai.preview.evaluation import notebook_utils
from datetime import datetime

## Define Prompts and Configuration

Store the project ID and model name, and define three system prompts:  
1. a classification assistant to categorize user queries into predefined government-related categories,  
2. a social media post generator that produces officially formatted government announcements with strict prefix/suffix rules,  
3. a quality control prompt to validate that generated posts are relevant, clear, concise, and adhere to the required format.


In [2]:
PROJECT_ID = "qwiklabs-gcp-03-be6b72ae4b0d"

MODEL = "gemini-2.0-flash-001"

question_classifier_system_prompt = """
You are an intelligent classification assistant that categorizes user questions into one of the following categories:

1. Employment
2. General Information
3. Emergency Services
4. Tax Related

## Output Format:
Respond only with the category name as a single word or phrase: "Employment", "General Information", "Emergency Services", or "Tax Related".

Do not include explanations, markdown, or additional text.

## Example:
Input: Is there any vaccancy available in post office?
Output: Employment

Input: Any tax rate changes in upcoming financial year?
Output: Tax Related
"""

gov_announcement_system_prompt = """
You are a social media manager for a government agency.

Your task is to write clear, concise, and official social media posts for government announcements.

## MANDATORY FORMAT REQUIREMENTS:
- The post MUST start with: OFFICIAL ANNOUNCEMENT:
- The post MUST end with: #govAlert
- If the post does NOT follow this exact format, it will be REJECTED.
- The post should be under 280 characters.
- Tone must be calm, informative, and sound like an official government communication.
- Do NOT include quotes, markdown, or any explanations. Output ONLY the post text.

## Output Format (strict):
Only return the social media post as a single line. No "Post:" prefix. No extra comments.

## Examples:

Announcement: Severe weather warning for Chennai on June 18.
OFFICIAL ANNOUNCEMENT: Heavy rain and strong winds expected in Chennai on June 18. Stay indoors and follow local advisories. #govAlert

Announcement: Income tax filing deadline extended to August 31.
OFFICIAL ANNOUNCEMENT: The income tax filing deadline has been extended to August 31. Visit the official portal for details. #govAlert
"""


test_gov_announcement_system_prompt = """
You are a quality control assistant evaluating government social media posts.

You will be given:
- The original announcement
- The generated social media post

Your task is to verify if the post:
1. Reflects the original announcement accurately (event, date, location)
2. Is clear and easy to understand
3. Is concise (under 280 characters)
4. Uses an appropriate, official tone
5. STRICTLY follows this format:
   - Starts with "OFFICIAL ANNOUNCEMENT:"
   - Ends with "#govAlert"

If any rule is broken, the result is "No".

Respond ONLY with JSON:
{
  "result": "Yes" or "No",
  "reason": "Explain why"
}
"""


## Initialize Gemini AI Client

initialize the Gemini client configured to use Vertex AI with the specified project ID and global location. this client will be used to send prompts and receive model-generated responses.


In [3]:
gemini_client = genai.Client(
    vertexai=True,
    project=PROJECT_ID,
    location="global",
)

## Define JSON Extraction Util

define a helper function that searches for and parses the first JSON object found within a string.  
used to safely extract model-generated JSON responses for validation or processing.


In [4]:
def extract_json(text):
    """
    Extracts the JSON object found in the input text string.

    Parameters:
        text (str): The string containing a JSON object.

    Returns:
        dict or None: Parsed JSON object as a Python dictionary if successful,
                      None if no valid JSON is found or parsing fails.
    """
    # Regular expression to find the first JSON object in the text
    match = re.search(r'\{.*?\}', text, re.DOTALL)
    if match:
        try:
            return json.loads(match.group())
        except json.JSONDecodeError:
            print("Invalid JSON format")
            return None
    else:
        print("No JSON found")
        return None

## Classifier Function

define a function that uses the Gemini model to classify a user's question into one of the predefined categories.  
the model is guided by a system prompt and returns the predicted category as plain text.

In [5]:
# Function to classify the user's question
def classify_question(user_input, system_prompt=question_classifier_system_prompt):
  response = gemini_client.models.generate_content(
    model = MODEL,
    contents = [f"Input Prompt: ${user_input}"],
    config = types.GenerateContentConfig(
        temperature = 1,
        top_p = 1,
        system_instruction=[types.Part.from_text(text=system_prompt)]
    )
  )
  return response.text

classify_question("Are there job openings available in the finance department?")

'Employment\n'

## Generate Social Media Post Function

define a function that uses the Gemini model to generate a concise and officially formatted social media post based on a government announcement.  
the output adheres to tone, format, and length constraints defined in the system prompt.


In [6]:
def create_social_media_post(user_input, system_prompt=gov_announcement_system_prompt):
  response = gemini_client.models.generate_content(
    model = MODEL,
    contents = [f"Input Prompt: ${user_input}"],
    config = types.GenerateContentConfig(
        temperature = 1,
        top_p = 1,
        system_instruction=[types.Part.from_text(text=system_prompt)]
    )
  )
  return response.text

create_social_media_post("Severe thuderstrom! Red Alert in chennai!!")



## Pytest for Question Classification

define and execute test cases to validate that the question classifier function correctly categorizes various user queries into the expected government-related categories.  
each test compares the model output against the expected category and raises an assertion error on mismatch.

In [7]:
# Pytest for question classifier AI
test_cases = [
    ("Are there job openings available in the finance department?", "Employment"),
    ("How do I pay my taxes online?", "Tax Related"),
    ("What should I do if there's an earthquake?", "Emergency Services"),
    ("Where is the city library located?", "General Information"),
]

# Run test cases
for user_input, expected_category in test_cases:
    result = classify_question(user_input)
    print(f"Query: {user_input}")
    print(f"Expected: {expected_category}, Actual: {result.strip()}\n")
    assert result.strip() == expected_category, f"Failed for input: {user_input}"

Query: Are there job openings available in the finance department?
Expected: Employment, Actual: Employment

Query: How do I pay my taxes online?
Expected: Tax Related, Actual: Tax Related

Query: What should I do if there's an earthquake?
Expected: Emergency Services, Actual: Emergency Services

Query: Where is the city library located?
Expected: General Information, Actual: General Information



## Pytest for Social Media Post Generator

run a series of test announcements through the social media post generator and evaluate the generated output using a quality control prompt.  
each post is validated for format, tone, and content accuracy, and assertions are raised if the response does not meet the defined quality standards.


In [8]:
# Pytest for social media post generator AI
# Sample government announcements and what you'd expect in the post
test_announcements = [
    "Fire reported at T. Nagar market area, emergency services on-site",
    "Adyar River overflow warning issued due to heavy rainfall in Chennai",
    "Scheduled power shutdown in Velachery on June 19 from 9 AM to 4 PM for maintenance",
    "COVID-19 vaccination camp to be held at Marina Beach on June 22",
    "Traffic diversion in Anna Salai due to metro construction starting June 21"
]

def test_create_social_media_post(announcement_content, generated_post):
  response = gemini_client.models.generate_content(
    model = MODEL,
    contents = [f"Announcement: {announcement_content}\n Generated Post: {generated_post}"],
    config = types.GenerateContentConfig(
        temperature = 0,
        top_p = 1,
        system_instruction=[types.Part.from_text(text=test_gov_announcement_system_prompt)]
    )
  )
  clean_json = extract_json(response.text)
  print(f"Testing Annoucement : {announcement_content}\nGenerated Post: {generated_post}\nResponse: {clean_json['result']}\n")
  assert clean_json['result'] == 'Yes', f"Failed for announcement: {announcement_content} because {clean_json['reason']}"

for announcement in test_announcements:
  post = create_social_media_post(announcement)
  test_create_social_media_post(announcement, post)

Testing Annoucement : Fire reported at T. Nagar market area, emergency services on-site
Generated Post: OFFICIAL ANNOUNCEMENT: Fire reported at T. Nagar market area. Emergency services are responding. Avoid the area. Follow instructions from officials. #govAlert

Response: Yes


Response: Yes

Testing Annoucement : Scheduled power shutdown in Velachery on June 19 from 9 AM to 4 PM for maintenance
Generated Post: OFFICIAL ANNOUNCEMENT: Scheduled power shutdown in Velachery on June 19 from 9 AM to 4 PM for maintenance. #govAlert

Response: Yes

Testing Annoucement : COVID-19 vaccination camp to be held at Marina Beach on June 22
Generated Post: OFFICIAL ANNOUNCEMENT: A COVID-19 vaccination camp will be held at Marina Beach on June 22. All are welcome. #govAlert

Response: Yes

Testing Annoucement : Traffic diversion in Anna Salai due to metro construction starting June 21
Generated Post: OFFICIAL ANNOUNCEMENT: Traffic diversion in Anna Salai due to metro construction starting June 21. Pl

## Evaluation

prepare a dataset of sample government announcements and their corresponding AI-generated social media posts.  
the prompts and responses are structured into a DataFrame for further evaluation or visualization.


In [9]:
prompts = [
    (
        "Fire reported at T. Nagar market area, emergency services on-site",
    ),
    (
         "Adyar River overflow warning issued due to heavy rainfall in Chennai",
    ),
    (
         "Scheduled power shutdown in Velachery on June 19 from 9 AM to 4 PM for maintenance",
    )
]

responses = [
    (
        create_social_media_post(prompts[0])
    ),
    (
        create_social_media_post(prompts[1])
    ),
    (
        create_social_media_post(prompts[2])
    )
]

eval_dataset = pd.DataFrame({
    "prompt": prompts,
    "response": responses
})

## Run Evaluation Task

create and execute an evaluation task using predefined metrics such as groundedness, verbosity, instruction following, and safety.  
the results are associated with a timestamped experiment name and stored for review.


In [10]:
ts = datetime.now().strftime("%Y%m%d%H%M%S")
eval_task = EvalTask(
    dataset = eval_dataset,
    metrics=[
        MetricPromptTemplateExamples.Pointwise.GROUNDEDNESS,
        MetricPromptTemplateExamples.Pointwise.VERBOSITY,
        MetricPromptTemplateExamples.Pointwise.INSTRUCTION_FOLLOWING,
        MetricPromptTemplateExamples.Pointwise.SAFETY
    ],
    experiment=f"government-official-annoucement-generator-{ts}"
)

result = eval_task.evaluate(
    prompt_template="Prompt: {prompt}. response: {response}",
    experiment_run_name=f"government-official-annoucement-generator-{ts}"
)
eval_results = []
eval_results.append(result)

notebook_utils.display_eval_result(eval_result=result)

INFO:vertexai.evaluation.eval_task:Logging Eval Experiment metadata: {'prompt_template': 'Prompt: {prompt}. response: {response}'}
INFO:vertexai.evaluation._evaluation:Assembling prompts from the `prompt_template`. The `prompt` column in the `EvalResult.metrics_table` has the assembled prompts used for model response generation.
INFO:vertexai.evaluation._evaluation:Computing metrics with a total of 12 Vertex Gen AI Evaluation Service API requests.
100%|██████████| 12/12 [00:01<00:00,  6.87it/s]
INFO:vertexai.evaluation._evaluation:All 12 metric requests are successfully computed.
INFO:vertexai.evaluation._evaluation:Evaluation Took:1.7591539709974313 seconds


### Summary Metrics

Unnamed: 0,row_count,groundedness/mean,groundedness/std,verbosity/mean,verbosity/std,instruction_following/mean,instruction_following/std,safety/mean,safety/std
0,3.0,1.0,0.0,0.0,0.0,5.0,0.0,1.0,0.0


### Row-based Metrics

Unnamed: 0,prompt,response,groundedness/explanation,groundedness/score,verbosity/explanation,verbosity/score,instruction_following/explanation,instruction_following/score,safety/explanation,safety/score
0,Prompt: ('Fire reported at T. Nagar market are...,OFFICIAL ANNOUNCEMENT: Fire reported at T. Nag...,The response is completely grounded in the use...,1.0,The response is perfectly concise and provides...,0.0,The response perfectly fulfills the prompt by ...,5.0,The response is safe as it does not contain an...,1.0
1,Prompt: ('Adyar River overflow warning issued ...,OFFICIAL ANNOUNCEMENT: Adyar River overflow wa...,The response is fully grounded because it cont...,1.0,"The response is perfectly concise, providing a...",0.0,The response accurately conveys the warning me...,5.0,The response is safe as it provides a warning ...,1.0
2,Prompt: ('Scheduled power shutdown in Velacher...,OFFICIAL ANNOUNCEMENT: Scheduled power shutdow...,The response only contains the information inc...,1.0,"The response is appropriately concise, providi...",0.0,The response directly uses the prompt and adds...,5.0,The response is safe because it does not conta...,1.0
