## AIM OF THE FOLLOWING APP

The app built in this notebook answers questions about the Alaska Department of Snow (ADS), as well as questions about how much snowfall is expected in Alaska on the current day.

## DOWNLOADING THE NECESSARY PACKAGES

In [64]:
%pip install --upgrade --quiet google-genai

In [84]:
!pip install --quiet ipytest

In [65]:

from IPython.display import Markdown, display
from google import genai
from google.genai.types import GenerateContentConfig
from google.genai import types

## SETTING UP THE BIGQUERY TABLE THAT WILL COMPRISE OUR RAG

We first copy the data from Google Cloud Storage into the local environment so that it can be loaded into a dataframe.

In [5]:
!gsutil cp gs://labs.roitraining.com/alaska-dept-of-snow/* .

Copying gs://labs.roitraining.com/alaska-dept-of-snow/.DS_Store...
Copying gs://labs.roitraining.com/alaska-dept-of-snow/alaska-dept-of-snow-faqs.csv...
Copying gs://labs.roitraining.com/alaska-dept-of-snow/faq-01.txt...
Copying gs://labs.roitraining.com/alaska-dept-of-snow/faq-02.txt...
- [4 files][ 16.2 KiB/ 16.2 KiB]                                                
==> NOTE: You are performing a sequence of gsutil operations that may
run significantly faster if you instead use gsutil -m cp ... Please
see the -m section under "gsutil help options" for further information
about when gsutil -m can be advantageous.

Copying gs://labs.roitraining.com/alaska-dept-of-snow/faq-03.txt...
Copying gs://labs.roitraining.com/alaska-dept-of-snow/faq-04.txt...
Copying gs://labs.roitraining.com/alaska-dept-of-snow/faq-05.txt...
Copying gs://labs.roitraining.com/alaska-dept-of-snow/faq-06.txt...
Copying gs://labs.roitraining.com/alaska-dept-of-snow/faq-07.txt...
Copying gs://labs.roitraining.com/alas

In [7]:
!ls

alaska-dept-of-snow-faqs.csv  faq-09.txt  faq-18.txt  faq-27.txt  faq-36.txt  faq-45.txt
faq-01.txt		      faq-10.txt  faq-19.txt  faq-28.txt  faq-37.txt  faq-46.txt
faq-02.txt		      faq-11.txt  faq-20.txt  faq-29.txt  faq-38.txt  faq-47.txt
faq-03.txt		      faq-12.txt  faq-21.txt  faq-30.txt  faq-39.txt  faq-48.txt
faq-04.txt		      faq-13.txt  faq-22.txt  faq-31.txt  faq-40.txt  faq-49.txt
faq-05.txt		      faq-14.txt  faq-23.txt  faq-32.txt  faq-41.txt  faq-50.txt
faq-06.txt		      faq-15.txt  faq-24.txt  faq-33.txt  faq-42.txt
faq-07.txt		      faq-16.txt  faq-25.txt  faq-34.txt  faq-43.txt
faq-08.txt		      faq-17.txt  faq-26.txt  faq-35.txt  faq-44.txt


Next we collect the list of files to process for our RAG. We assume here that only the faq- text files are needed, not the CSV.

In [30]:
import os

relevant_files = []

for file in os.listdir():
  if "faq-" in file:
    relevant_files.append(file)
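
The substring check above can be tightened into a filename pattern. The following is a minimal sketch, assuming the FAQ files follow the faq-NN.txt naming seen in the directory listing; `select_faq_files` is a hypothetical helper and not part of the original notebook.

```python
from fnmatch import fnmatch


def select_faq_files(filenames):
    """Return the names matching faq-*.txt, sorted for a stable processing order."""
    return sorted(name for name in filenames if fnmatch(name, "faq-*.txt"))


# Hand-written directory listing, for illustration only
listing = ["faq-02.txt", "alaska-dept-of-snow-faqs.csv", ".DS_Store", "faq-01.txt"]
print(select_faq_files(listing))
```

Sorting is optional, but it makes the row order of the resulting dataframe reproducible between runs.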

We now load the files into a pandas dataframe, which we will subsequently use to create a BigQuery table.

In [31]:
import pandas as pd

preload_list = []

for file in relevant_files:
  # Each FAQ file holds the question on line 1 and the answer on line 3
  with open(file) as f:
    raw_data = [row.strip('\n') for row in f.readlines()]
  raw_diction = {"question":raw_data[0],"answer":raw_data[2]}
  # Loading the dictionary into the list that will be used to create our pandas dataframe
  preload_list.append(raw_diction)

rag_pandas = pd.DataFrame(preload_list)
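
The loop above relies on the question sitting on line 1 and the answer on line 3 of each file. A slightly more tolerant parser could look like the sketch below, which assumes only that the first non-empty line is the question and everything after it is the answer. `parse_faq_text` is a hypothetical helper, and the sample text is made up rather than taken from the real FAQ files.

```python
def parse_faq_text(text):
    """Split FAQ text into a question (first non-empty line) and an answer (the rest)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected a question line followed by an answer line")
    return {"question": lines[0], "answer": " ".join(lines[1:])}


# Illustrative input only; the wording is not taken from the real FAQ files
sample = "What does ADS stand for?\n\nAlaska Department of Snow.\n"
print(parse_faq_text(sample))
```

Raising on malformed input makes a bad file visible at load time instead of producing a row with a missing answer.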

We now load our pandas dataframe into BigQuery. For the purpose of this exercise, we assume this is a one-off setup.

In [32]:
import pandas_gbq

project_id ="qwiklabs-gcp-02-cf6490c204fb"
table_id = "RAG.ads_questions_raw"

pandas_gbq.to_gbq(rag_pandas, table_id, project_id=project_id)

100%|██████████| 1/1 [00:00<00:00, 9058.97it/s]


## CREATING OUR VECTOR DATABASE IN BIGQUERY

We begin by defining the embedding model that we will use for the vector search. Here, it is assumed that this is a one-off setup.

In [33]:
from google.cloud import bigquery

bq_client = bigquery.Client(project=project_id)

query = """CREATE OR REPLACE MODEL `RAG.embedding_model`
REMOTE WITH CONNECTION `us.rag_embeddings`
OPTIONS (ENDPOINT = 'text-embedding-005');
"""

query_job = bq_client.query(query)
query_job.result()

<google.cloud.bigquery.table._EmptyRowIterator at 0x7e82cfb94280>

We now create the embeddings table that will act as our data store for the RAG

In [34]:
query = """
CREATE OR REPLACE TABLE `qwiklabs-gcp-02-cf6490c204fb.RAG.ads_questions_embeddings` AS
SELECT *
FROM ML.GENERATE_EMBEDDING(
    MODEL `RAG.embedding_model`,
    (SELECT question, answer, concat('Q: ',question, ' A: ',answer) AS content FROM `qwiklabs-gcp-02-cf6490c204fb.RAG.ads_questions_raw`)
);
"""
query_job = bq_client.query(query)
query_job.result()

<google.cloud.bigquery.table._EmptyRowIterator at 0x7e82984cf580>

The query below demonstrates a lookup against our vector database: it returns the three stored entries closest to a sample question, together with their distances.

In [209]:
query = """
  SELECT base.question as question, base.answer as answer, distance
FROM VECTOR_SEARCH(
  TABLE `RAG.ads_questions_embeddings`, 'ml_generate_embedding_result',
  (
  SELECT text_embedding, content AS query
  FROM ML.GENERATE_TEXT_EMBEDDING(
  MODEL `RAG.embedding_model`,
  (SELECT 'What is the ADS?' AS content))
  ),
  top_k => 3, options => '{"fraction_lists_to_search": 0.01}')
  ;
"""
query_job = bq_client.query(query)
rows = query_job.result()

for row in rows:
    print(row)

Row(('How can I stay informed about ADS news and updates?', 'Subscribe to the ADS newsletter on the official website or follow the department’s social media channels for ongoing updates and announcements.', 0.8017229427683615), {'question': 0, 'answer': 1, 'distance': 2})
Row(('Who is the CFO of ADS?', 'The current CFO is Janet Kirk, appointed in 2022. She oversees all financial operations, including cost management and budget forecasting.', 0.8274179311401391), {'question': 0, 'answer': 1, 'distance': 2})
Row(('How does ADS handle interagency communication?', 'ADS uses a centralized coordination system shared with local governments, school districts, and emergency services to streamline announcements and responses.', 0.8417617752076187), {'question': 0, 'answer': 1, 'distance': 2})
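
The `distance` column in the rows above is a distance rather than a similarity, so smaller values indicate closer matches. The sketch below illustrates this with plain Euclidean distance, which is, to the best of my knowledge, what VECTOR_SEARCH uses when no distance type is specified. The two-dimensional vectors are toy values, not real embeddings.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
near = [0.8, 0.6]   # points in a similar direction to the query
far = [-1.0, 0.0]   # points in the opposite direction

print(euclidean_distance(query, near))
print(euclidean_distance(query, far))
```

The vector pointing in a similar direction yields the smaller number, which is why the best match appears first in the output above with the lowest distance.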


## ACCESSING API FOR FURTHER BACKEND FUNCTIONALITY

Here we define a function that will leverage Vertex AI's Tool and FunctionDeclaration, along with https://api.weather.gov, to determine how much snow is expected in Alaska on a given day

In [171]:
def get_snowfall_info():

  # Importing the necessary packages
  import requests
  import vertexai
  from vertexai.generative_models import (
    Content,
    FunctionDeclaration,
    GenerationConfig,
    GenerativeModel,
    Part,
    Tool,
    ToolConfig
  )

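  project_id ="qwiklabs-gcp-02-cf6490c204fb"
  location="us-central1"
  model_name = "gemini-2.5-pro-preview-05-06"

  # Initialising Vertex AI for our project and region
  vertexai.init(project=project_id, location=location)

  # Initialising the Gemini Model
  gemini_model = GenerativeModel(model_name = model_name)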

  # Defining our FunctionDeclaration
  get_snow_func = FunctionDeclaration(
      name = "get_snowfall",
      description = "Get the expected snowfall in Alaska",
      parameters = {
          "type":"object",
          "properties": {
              "location": {
                  "type":"string",
                  "description": "The city name of the location for which to get the weather.",
                  "default": {
                      "string_value": "Anchorage, AK"
                  }
              }
          }
      }
  )

  # Defining the base prompt
  user_prompt_content = Content(
      role = "user",
      parts = [
          Part.from_text("How much snow is expected to fall in Alaska today?")
      ]
  )

  # Defining our tool
  tool = Tool(function_declarations=[get_snow_func])

  # Getting our initial response
  initial_response =gemini_model.generate_content(
    user_prompt_content,
    generation_config=GenerationConfig(temperature=0),
    tools=[tool],
    tool_config=ToolConfig(
        function_calling_config=ToolConfig.FunctionCallingConfig(
            # ANY mode forces the model to predict only function calls
            mode=ToolConfig.FunctionCallingConfig.Mode.ANY,
            # Allowed function calls to predict when the mode is ANY. If empty, any  of
            # the provided function calls will be predicted.
            allowed_function_names=["get_snowfall"],
        )
    )
)

  # Getting the API responses that the model will reference against.
  lat, lon = 61.2174, -149.8631  # Fixing this to Mount Mackenzie, Alaska for

  headers = {
      'User-Agent':'email',
      'Accept':'application/ld+json'
  }

  point_url = f'https://api.weather.gov/points/{lat},{lon}'

  point_response = requests.get(point_url, headers=headers)
  forecast_url = point_response.json()['forecast']
  forecast_response = requests.get(forecast_url, headers=headers)
  api_response = forecast_response.json()

  # Retrieving our response that also includes info from the weather api
  final_response = gemini_model.generate_content(
      [
          user_prompt_content,  # User prompt
          initial_response.candidates[0].content,  # Function call response
          Content(
              parts=[
                  Part.from_function_response(
                      name="get_snowfall",
                      response={
                          "content": api_response,  # Return the API response to Gemini
                      },
                  )
              ],
          ),
      ],
      tools=[tool],
  )
  # Get the model summary response
  summary = final_response.text

  return summary

In [172]:
get_snowfall_info()

'It looks like there is no snow expected in Anchorage, Alaska today. The forecast is mostly sunny with a high near 52 degrees Fahrenheit.'
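
get_snowfall_info() hands the whole forecast payload to Gemini. If a smaller payload is preferred, the forecast could be reduced to the periods that mention snow before it is passed on. The sketch below assumes the forecast contains a list of periods with `name` and `detailedForecast` fields, either at the top level or nested under `properties`; that matches my understanding of the api.weather.gov format but should be checked against a real response. `snow_periods` is a hypothetical helper and the payload is hand-written.

```python
def snow_periods(forecast):
    """Return (name, detailedForecast) pairs for periods whose text mentions snow."""
    # Periods may sit at the top level or be nested under "properties"
    periods = forecast.get("periods") or forecast.get("properties", {}).get("periods", [])
    hits = []
    for period in periods:
        text = period.get("detailedForecast", "")
        if "snow" in text.lower():
            hits.append((period.get("name", ""), text))
    return hits


# Hand-written payload for illustration; not a real API response
sample_forecast = {
    "periods": [
        {"name": "Today", "detailedForecast": "Mostly sunny, with a high near 52."},
        {"name": "Tonight", "detailedForecast": "Snow showers likely after midnight."},
    ]
}
print(snow_periods(sample_forecast))
```

An empty result means that no period mentions snow, which corresponds to the "no snow expected" answer shown above.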

## DEFINING OUR RAG FUNCTIONS

The functions below compare the embedding of a question against the entries in the vector database. Only entries whose score passes a threshold are added to the prompt, to minimise the number of irrelevant examples. Note that the score returned by VECTOR_SEARCH is a distance, for which smaller values indicate closer matches, even though the code stores it under the name similarity.

In [50]:
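def perform_bq_search(client,question,k):
  # Defining our base query. The question is passed as a query parameter (@question)
  # rather than being pasted into the SQL, so quotes in the user's input cannot break the query.
  base_query = """
  SELECT base.content, distance
FROM VECTOR_SEARCH(
  TABLE `RAG.ads_questions_embeddings`, 'ml_generate_embedding_result',
  (
  SELECT text_embedding, content AS query
  FROM ML.GENERATE_TEXT_EMBEDDING(
  MODEL `RAG.embedding_model`,
  (SELECT @question AS content))
  ),
  top_k => {}, options => '{{"fraction_lists_to_search": 0.01}}')
  """
  # customised query with the requested number of matches.
  customised_query = base_query.format(int(k))

  job_config = bigquery.QueryJobConfig(
      query_parameters=[bigquery.ScalarQueryParameter("question", "STRING", question)]
  )

  query_job = client.query(customised_query, job_config=job_config)
  output = []
  for row in query_job:
    # Note: VECTOR_SEARCH returns a distance; it is kept under the "similarity" key
    output.append({"content":row.content,"similarity":row.distance})

  return output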

In [58]:
def build_prompt_with_examples(question, example_list,min_similarity):

  relevant_examples = ""

  # Only apply RAG entries that exceed the minimum similarity threshold
  for entry in example_list:
    example = entry["content"]
    similarity_score = entry["similarity"]

    if similarity_score > min_similarity:
      relevant_examples += example + '\n'

  # return prompt based on the presence of valid RAG examples
  if len(relevant_examples) > 0:
    prompt = f"""
    Instructions: Answer the following question using the provided context.

    Question: {question}

    Context: {relevant_examples}
    """
  else:
    prompt = f"""
    Instructions: Answer the following question .

    Question: {question}
    """

  return prompt

We now test our RAG retrieval setup. The first call below uses a low threshold, so the retrieved examples are added to the prompt. The second uses a high threshold that no example passes, so only the plain question is sent to the Gen AI model.

In [59]:
test_out = perform_bq_search(bq_client,"what is the ads and what do they do?",3)

build_prompt_with_examples("what is the ads and what do they do?", test_out, 0.5)

'\n    Instructions: Answer the following question using the provided context.\n    \n    Question: what is the ads and what do they do?\n\n    Context: Q: How can I stay informed about ADS news and updates? A: Subscribe to the ADS newsletter on the official website or follow the department’s social media channels for ongoing updates and announcements.\nQ: Who is the CFO of ADS? A: The current CFO is Janet Kirk, appointed in 2022. She oversees all financial operations, including cost management and budget forecasting.\nQ: What concerns does the CFO have about ADS operations? A: The CFO is primarily concerned about controlling operational costs, especially regarding cloud-based technology solutions for data management.\n\n    '

In [61]:
test_out = perform_bq_search(bq_client,"what is the ads and what do they do?",3)

build_prompt_with_examples("what is the ads and what do they do?", test_out, 0.96)

'\n    Instructions: Answer the following question .\n    \n    Question: what is the ads and what do they do?\n    '
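
Because the score returned by the vector search is a distance, an alternative to the greater-than comparison is to keep only matches at or below a maximum distance. The following is a sketch of that variant and not the notebook's own logic; `select_context`, the `distance` key and the sample values are assumptions made for illustration.

```python
def select_context(matches, max_distance):
    """Keep the content of matches whose distance is at or below max_distance."""
    kept = [m for m in matches if m["distance"] <= max_distance]
    kept.sort(key=lambda m: m["distance"])  # closest matches first
    return [m["content"] for m in kept]


# Made-up matches; the distances are illustrative, not query results
matches = [
    {"content": "Q: Who is the CFO of ADS? A: ...", "distance": 0.83},
    {"content": "Q: How can I stay informed? A: ...", "distance": 0.80},
    {"content": "Q: Unrelated entry A: ...", "distance": 1.20},
]
print(select_context(matches, max_distance=0.85))
```

With this variant, lowering the threshold makes the filter stricter, which is the opposite of how the min_similarity argument behaves in build_prompt_with_examples().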

## HANDLING SECURITY FOR THE PROMPTS

The cells below define the safety settings and the system instruction that keep the model's responses within the scope of the app.

In [173]:
safety_settings = [types.SafetySetting(
      category="HARM_CATEGORY_HATE_SPEECH",
      threshold="BLOCK_LOW_AND_ABOVE"
    ),types.SafetySetting(
      category="HARM_CATEGORY_DANGEROUS_CONTENT",
      threshold="BLOCK_LOW_AND_ABOVE"
    ),types.SafetySetting(
      category="HARM_CATEGORY_SEXUALLY_EXPLICIT",
      threshold="BLOCK_LOW_AND_ABOVE"
    ),types.SafetySetting(
      category="HARM_CATEGORY_HARASSMENT",
      threshold="BLOCK_LOW_AND_ABOVE"
    )]

In [174]:
system_instruction_context = """
You are a customer facing AI chatbot.
Your mission is to provide answers about the Alaska Department of Snow, or ADS for short.
Remember that before you answer a question, you must check to see if it complies with your mission.
If not, you can say Sorry, I can't answer that question.
"""

In [175]:
def define_config(client, system_instructions, safety, temp, top_p, tokens):

  generated_config = types.GenerateContentConfig(
    temperature = temp,
    top_p = top_p,
    max_output_tokens = tokens,
    response_modalities = ["TEXT"],
    safety_settings = safety,
    system_instruction=[types.Part.from_text(text=system_instructions)],
  )

  return generated_config


We now create a config that is based on our security settings

In [75]:
# Defining our filters for our prompt. Here we set temperature to a low value, to ensure consistency in the responses.
top_p = 0.95
temperature = 0.1
max_output_tokens = 8192

client = genai.Client(
      vertexai=True,
      project="qwiklabs-gcp-02-cf6490c204fb",
      location="us-central1",
)

# Generating our config for the Gen AI Setup
generated_content_config = define_config(client, system_instruction_context, safety_settings, temperature, top_p, max_output_tokens)

## FUNCTION TO APPLY OUR RAG SETUP

The function below applies the RAG setup defined above: it retrieves the closest matches from BigQuery, builds the prompt and sends it to the model.

In [81]:
def apply_rag(project,location,model,config,question,k,similarity_score):

  from google import genai
  from google.genai.types import GenerateContentConfig
  from google.genai import types

  client = genai.Client(
      vertexai=True,
      project=project,
      location=location,
  )

  bq_client = bigquery.Client(project=project)

  # Finding our closest matches from BQ
  rag_matches = perform_bq_search(bq_client,question,k)

  # Defining our prompt with the RAG outputs
  rag_amended_prompt = build_prompt_with_examples(question, rag_matches,similarity_score)

  # Getting our response
  chat = client.chats.create(model=model,config=config)
  response = chat.send_message(rag_amended_prompt)

  # Returning our result
  return response.text

In [82]:
location="us-central1"
model = "gemini-2.5-pro-preview-05-06"

apply_rag(project_id,location,model,generated_content_config,"What is the ADS?",3,0.85)

'The ADS stands for the Alaska Department of Snow. We are the official organization in Alaska dedicated to managing and providing information about snow. This includes things like snowpack monitoring, avalanche safety programs, and public advisories related to snow conditions.'

## UNIT TESTS OF AGENT FUNCTIONALITY

In [97]:
def test_rag_theme():

  # Boolean we will assess our outputs against
  passed = True

  location="us-central1"
  model = "gemini-2.5-pro-preview-05-06"
  project_id ="qwiklabs-gcp-02-cf6490c204fb"

  question_list = [("What is the ADS?","the Alaska Department of Snow")]

  for item in question_list:
    question = item[0]
    template_answer = item[1]

    gen_ai_answer = apply_rag(project_id,location,model,generated_content_config,question,3,0.85)

    check_question = f"""
    Are these responses covering the same topics? Answer with a yes or no

    response1: {template_answer}
    response2: {gen_ai_answer}
    """

    check_client = genai.Client(
      vertexai=True,
      project=project_id,
      location=location
    )

    checker = check_client.models.generate_content(model=model, contents=check_question)

    if "yes" not in checker.text.lower():
      print(gen_ai_answer)
      print("------")
      print(template_answer)
      passed = False

    assert passed == True



In [99]:
import pytest
import ipytest
ipytest.autoconfig()
ipytest.run()

.                                                                                            [100%]
../usr/local/lib/python3.10/dist-packages/_pytest/config/__init__.py:1277
    self._mark_plugins_for_rewrite(hook)



<ExitCode.OK: 0>

## TESTS WITH GOOGLE EVALUATION SERVICE API

In [100]:
project_id ="qwiklabs-gcp-02-cf6490c204fb"
location="us-central1"
model = "gemini-2.5-pro-preview-05-06"
similarity_score = 0.85
k = 3

contexts = ["What is the ADS?","Who is in charge of the ads?"]

base_prompt = """
Instructions: Answer the following question .

Question: {}
"""

full_prompts = [base_prompt.format(i) for i in contexts]
content = [apply_rag(project_id,location,model,generated_content_config,text,k,similarity_score) for text in contexts]

eval_dataset = pd.DataFrame(
{
"response": content,
"context": full_prompts,
"instruction":full_prompts
}
)

In [101]:
eval_dataset.head()

| | response | context | instruction |
|---|---|---|---|
| 0 | The ADS stands for the Alaska Department of Sn... | \nInstructions: Answer the following question ... | \nInstructions: Answer the following question ... |
| 1 | The provided context does not specify who is i... | \nInstructions: Answer the following question ... | \nInstructions: Answer the following question ... |


In [127]:
import datetime
from vertexai.evaluation import (
    MetricPromptTemplateExamples,
    EvalTask,
    PairwiseMetric,
    PairwiseMetricPromptTemplate,
    PointwiseMetric,
    PointwiseMetricPromptTemplate
)
import vertexai

eval_classify_users = EvalTask(
    dataset=eval_dataset,
    metrics=[MetricPromptTemplateExamples.Pointwise.FLUENCY,
        MetricPromptTemplateExamples.Pointwise.VERBOSITY]
)

prompt_template = (
    "Instruction: {instruction}\n"
    "context: {context}\n"
    "response: {response}"
)
result = eval_classify_users.evaluate(prompt_template=prompt_template)

result.summary_metrics

INFO:vertexai.evaluation._evaluation:Assembling prompts from the `prompt_template`. The `prompt` column in the `EvalResult.metrics_table` has the assembled prompts used for model response generation.
INFO:vertexai.evaluation._evaluation:Computing metrics with a total of 4 Vertex Gen AI Evaluation Service API requests.
100%|██████████| 4/4 [00:03<00:00,  1.05it/s]
INFO:vertexai.evaluation._evaluation:All 4 metric requests are successfully computed.
INFO:vertexai.evaluation._evaluation:Evaluation Took:3.8154398859996945 seconds


{'row_count': 2,
 'fluency/mean': 5.0,
 'fluency/std': 0.0,
 'verbosity/mean': 0.0,
 'verbosity/std': 0.0}

## BRINGING EVERYTHING TOGETHER

We now create a function, apply_agent(), that will either provide information about Alaska's current weather or answers about the ADS. This is the function that will be hosted on the website.

In [176]:
def apply_agent(user_input, config):

  # Setting our configuration
  project_id ="qwiklabs-gcp-02-cf6490c204fb"
  location="us-central1"
  model = "gemini-2.5-pro-preview-05-06"
  similarity_score = 0.85
  k = 3

  # We leverage Gemini to do an initial screen of whether the user is asking about snowfall
  snowfall = False

  screening_client = genai.Client(
      vertexai=True,
      project=project_id,
      location=location
  )

  screening_question = f"""
  Is the following question about how much snowfall is expected in Alaska? Answer with a yes or no

  Question:
  {user_input}
  """

  screening_response = screening_client.models.generate_content(model=model, contents=screening_question)

  if "yes" in screening_response.text.lower():
    snowfall = True

  # Apply the get_snowfall_info() if the question is about snowfall
  if snowfall:
    return get_snowfall_info()

  # Use our RAG setup in all other instances
  return apply_rag(project_id, location, model, config, user_input, k, similarity_score)
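
The screening step above treats any reply containing the substring "yes" as affirmative, so a reply such as "No, but yesterday..." would also count as a yes. A stricter check is sketched below, assuming the model starts its reply with a yes or a no as the prompt requests. `is_affirmative` is a hypothetical helper.

```python
import re


def is_affirmative(reply):
    """Return True only when the first word of the reply is 'yes'."""
    words = re.findall(r"[a-z]+", reply.lower())
    return bool(words) and words[0] == "yes"


print(is_affirmative("Yes."))
print(is_affirmative("No, but yesterday's forecast mentioned snow."))
```

The same helper could replace the substring check in the unit test, where the checker model is also asked for a yes or no answer.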

We now test the finalised function that will be used by the website. As shown below, the model responds only to the questions that it should (i.e. snowfall and ADS details) and declines anything else.

In [177]:
apply_agent("What is the ADS?", generated_content_config)

'The ADS stands for the Alaska Department of Snow. We are a (fictional) government agency dedicated to managing snow-related resources, safety, and research in Alaska.'

In [178]:
apply_agent("How much snow will be in Alaska today?", generated_content_config)

'There is no snow expected in Anchorage, AK today. The forecast is mostly sunny with a high near 52 degrees Fahrenheit.'

In [179]:
apply_agent("How much wood could a woodchuck chuck if a woodchuck could chuck wood?", generated_content_config)

"Sorry, I can't answer that question."

## CREATING A WEBSITE TO HOST THIS INFO

Here we attempt to use Dash as a proxy for a website

In [182]:
!pip install dash
!pip install dash-core-components
!pip install dash-html-components
!pip install jupyter-dash



In [199]:
# Importing some base python packages
import sys
import subprocess
import itertools
from datetime import datetime
import os

import dash
# dcc and html ship with Dash 2.x, so the standalone packages are no longer needed
from dash import dcc, html
from dash.dependencies import Input, Output, State
import plotly.graph_objs as go

import numpy as np
import pandas as pd
from google.cloud import bigquery
from jupyter_dash import JupyterDash


In [205]:
app = dash.Dash()

# Requires Dash 2.17.0 or later
app.layout = html.Div(id = "main_div", children=[
    html.H4("Enter Your Question Here"),
    dcc.Textarea(
        id="user_prompt",
        value="prompt specified by the user"
    ),
    html.Br(),
    html.H4("Gen AI Response"),
    dcc.Textarea(
        id="genai_response",
        value="Gen AI Response"
    ),
    html.Br(),
     html.Button('Submit Question', id='submit-to-genai', n_clicks=0),
    ]
)

@app.callback(
   Output("genai_response","value"),
   Input("submit-to-genai","n_clicks"),
   State("user_prompt","value")
)
def genAIGenerate(clicks,query):
  # Only call the agent once the button has been clicked; otherwise leave the output unchanged
  if clicks is not None and clicks > 0:
    return apply_agent(query, generated_content_config)
  return dash.no_update

app.run()
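
The decision made inside the callback can be kept in a plain function, which makes it testable without starting a Dash server. The following is a minimal sketch under that assumption; `answer_on_click` and `fake_agent` are hypothetical names, and in the app the `agent` argument would wrap apply_agent().

```python
def answer_on_click(clicks, query, agent, placeholder=""):
    """Return the agent's answer after a click, or a placeholder before any click."""
    if not clicks:
        return placeholder
    query = (query or "").strip()
    if not query:
        return "Please enter a question."
    return agent(query)


# Stand-in agent so the sketch runs without any cloud access
def fake_agent(question):
    return f"echo: {question}"


print(answer_on_click(0, "What is the ADS?", fake_agent))
print(answer_on_click(1, "  What is the ADS?  ", fake_agent))
```

Guarding against an empty question avoids a model call, and its cost, when the button is pressed on an empty text area.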

<IPython.core.display.Javascript object>