In [None]:
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Semantic Analysis in BigQuery with AI Functions

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/applying-llms-to-data/bigquery_ai_operators.ipynb">
      <img width="32px" src="https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg" alt="Google Colaboratory logo"><br> Open in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fuse-cases%2Fapplying-llms-to-data%2Fbigquery_ai_operators.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Open in Colab Enterprise
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/use-cases/applying-llms-to-data/bigquery_ai_operators.ipynb">
      <img src="https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/bigquery/import?url=https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/applying-llms-to-data/bigquery_ai_operators.ipynb">
      <img src="https://www.gstatic.com/images/branding/gcpiconscolors/bigquery/v1/32px.svg" alt="BigQuery Studio logo"><br> Open in BigQuery Studio
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/applying-llms-to-data/bigquery_ai_operators.ipynb">
      <img width="32px" src="https://www.svgrepo.com/download/217753/github.svg" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
</table>

<div style="clear: both;"></div>

| Author(s) |
| --- |
| [Alicia Williams](https://github.com/aliciawilliams) |

## Overview

This tutorial walks you through the AI functions available in BigQuery. You'll get hands-on experience with two collections of functions that call Gemini models directly, so you can run AI-driven analysis on your data from within your familiar SQL environment.

1.  **Managed AI functions ([`AI.SCORE`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-score), [`AI.CLASSIFY`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-classify), [`AI.IF`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-if))**: These are high-level, easy-to-use functions for data analysts who are not necessarily prompt engineers or ML practitioners. BigQuery manages the model and parameter choices, handles prompt engineering, and takes care of scalability, so you get high-quality results for common tasks like semantic filtering, joining, ranking, and classification.

2.  **General-purpose AI functions ([`AI.GENERATE_BOOL`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-generate-bool), [`AI.GENERATE_DOUBLE`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-generate-double), [`AI.GENERATE_INT`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-generate-int))**: These are inference functions for power users who want full control over the prompt. They are well suited to row-level AI tasks, especially enriching data in a `SELECT` clause, and they return a specific data type (`BOOL`, `DOUBLE`, or `INT`).

### Objectives

We'll cover how to:

* **Perform semantic ranking** with the managed `AI.SCORE()` function.
* **Perform classification** with the managed `AI.CLASSIFY()` function.
* **Perform semantic filtering** with the managed `AI.IF()` function.
* **Perform semantic joins** with the managed `AI.IF()` function.
* **Perform powerful, row-level analysis** using general-purpose inference functions like `AI.GENERATE_BOOL`, `AI.GENERATE_DOUBLE`, and `AI.GENERATE_INT`.

### Services and Costs

This tutorial uses the following billable components of Google Cloud:

* **BigQuery**: [Pricing](https://cloud.google.com/bigquery/pricing)

* **BigQuery ML**: [Pricing](https://cloud.google.com/bigquery/pricing#bqml)

* **Vertex AI**: [Pricing](https://cloud.google.com/vertex-ai/generative-ai/pricing)

You can use the [Pricing Calculator](https://cloud.google.com/products/calculator) to generate a cost estimate based on your projected usage.

---

## Before you begin

### Setting up your Google Cloud project
**The following steps are required, regardless of your notebook environment.**

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.

2. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).

3. [Enable the BigQuery, BigQuery Connection, and Vertex AI APIs](https://console.cloud.google.com/flows/enableapi?apiid=bigquery.googleapis.com,bigqueryconnection.googleapis.com,aiplatform.googleapis.com).

4. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).

### Setting your project ID

In [None]:
PROJECT_ID = ""  # @param {type:"string"}

# Set the project id
! gcloud config set project {PROJECT_ID}
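
Because `PROJECT_ID` starts out as an empty string, every later cell fails in a confusing way if it is never filled in. The following is a minimal sketch of a fail-fast check, assuming the standard project ID format (6 to 30 characters; lowercase letters, digits, and hyphens; starts with a letter; does not end with a hyphen). The helper name `check_project_id` is hypothetical, and legacy domain-scoped IDs would be rejected by this pattern.

```python
import re

# Assumed format: 6-30 characters, lowercase letters, digits, and hyphens;
# must start with a letter and must not end with a hyphen.
_PROJECT_ID_PATTERN = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")


def check_project_id(project_id: str) -> str:
    """Return project_id unchanged, or raise ValueError if it looks malformed."""
    if not project_id:
        raise ValueError("PROJECT_ID is empty; set it before running later cells.")
    if not _PROJECT_ID_PATTERN.fullmatch(project_id):
        raise ValueError(f"PROJECT_ID {project_id!r} does not look like a project ID.")
    return project_id
```

Calling `check_project_id(PROJECT_ID)` right after the assignment would surface a missing value immediately rather than in a later `gcloud` or `bq` command.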

### Authenticating to your Google Cloud account

Depending on your Jupyter environment, you may have to manually authenticate. Follow the relevant instructions below.

**1. Colab Enterprise or BigQuery Studio Notebooks**
* Do nothing as you are already authenticated.

**2. Colab**
* Uncomment and run the following cell:

In [None]:
# from google.colab import auth
#
# auth.authenticate_user()

### Creating a BigQuery Cloud resource connection

You will need to create a [Cloud resource connection](https://cloud.google.com/bigquery/docs/create-cloud-resource-connection) to enable BigQuery to interact with Vertex AI services.

In [None]:
!bq mk --connection --location=us \
    --connection_type=CLOUD_RESOURCE test_connection

### Setting permissions for the service account

The resource connection service account requires certain project-level permissions to interact with Vertex AI.

In [None]:
SERVICE_ACCT = !bq show --format=prettyjson --connection us.test_connection | grep "serviceAccountId" | cut -d '"' -f 4
SERVICE_ACCT_EMAIL = SERVICE_ACCT[-1]
print(SERVICE_ACCT_EMAIL)
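
The `grep` and `cut` pipeline above depends on the exact line layout of the `bq show` output. As an alternative, here is a minimal sketch that parses the JSON instead. It assumes the service account ID is nested under `cloudResource.serviceAccountId`; the function name and the sample payload are hypothetical and shortened for illustration.

```python
import json


def extract_service_account(connection_json: str) -> str:
    """Return the service account ID from a connection description in JSON."""
    connection = json.loads(connection_json)
    # Assumption: the ID sits under cloudResource.serviceAccountId.
    return connection["cloudResource"]["serviceAccountId"]


# Hypothetical, shortened output for illustration only.
sample = """
{
  "cloudResource": {
    "serviceAccountId": "bqcx-123456789-abcd@gcp-sa-bigquery-condel.iam.gserviceaccount.com"
  },
  "name": "projects/123456789/locations/us/connections/test_connection"
}
"""
print(extract_service_account(sample))
```

In the notebook, the lines captured from `!bq show --format=prettyjson --connection us.test_connection` could be joined with `"\n".join(...)` and passed to this function.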

In [None]:
import time

!gcloud projects add-iam-policy-binding --format=none $PROJECT_ID --member=serviceAccount:$SERVICE_ACCT_EMAIL --role='roles/bigquery.connectionUser'
!gcloud projects add-iam-policy-binding --format=none $PROJECT_ID --member=serviceAccount:$SERVICE_ACCT_EMAIL --role='roles/aiplatform.user'

# Wait 60 seconds to give the IAM updates time to propagate; otherwise the following cells may fail.
time.sleep(60)
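
A fixed 60-second wait is simple, but propagation can take less or more time. One alternative is to retry the first dependent step with increasing delays. The sketch below is a generic helper, not part of any Google Cloud library; the name `retry_with_backoff` and its parameters are assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    action: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * 2**attempt)
    raise ValueError("attempts must be at least 1")
```

Here `action` would be any zero-argument callable that raises while the permissions are still propagating, for example a function that runs a small query through the connection.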

### Creating a helper function to view images
This utility function is used later in the tutorial. It takes query results stored in a pandas DataFrame and displays the image referenced by each row, together with that row's other columns, in a grid of cards.

In [None]:
## Code created with Gemini

from IPython.display import display, HTML
import pandas as pd
import html

def display_image_grid(df: pd.DataFrame, url_column: str = 'signed_url', image_width: int = 220):
    """
    Renders a grid of cards, each with an image and its corresponding
    DataFrame row details.

    Args:
        df (pd.DataFrame): DataFrame containing image URLs and other metadata.
        url_column (str): The name of the column that contains the image URLs.
        image_width (int): The width of each card in pixels.
    """
    # --- Validate Input ---
    if not isinstance(df, pd.DataFrame) or df.empty:
        print("Input is not a valid or non-empty DataFrame. Nothing to display.")
        return
    if url_column not in df.columns:
        print(f"Error: Column '{url_column}' not found in the DataFrame.")
        return

    # Get a list of all columns that are NOT the url_column
    detail_columns = [col for col in df.columns if col != url_column]

    # --- Build HTML for each card ---
    card_html_list = []
    for _, row in df.iterrows():
        # Strip any leading/trailing single or double quotes from the URL
        url = str(row[url_column]).strip('\'"')

        # Create an HTML block for the other details
        details_html = ""
        for col in detail_columns:
            # Escape data to prevent HTML rendering issues
            value = html.escape(str(row[col]))
            col_name = html.escape(col.replace('_', ' ').title())
            details_html += f'<p style="margin: 4px 0; font-size: 14px;"><strong>{col_name}:</strong> {value}</p>'

        # Assemble the full card
        card_html_list.append(f'''
            <div style="width: {image_width}px; margin: 10px; border-radius: 8px;
                        box-shadow: 0 4px 8px 0 rgba(0,0,0,0.2); font-family: sans-serif;
                        text-align: left; background-color: white;">
                <img src="{url}"
                     alt="Product Image"
                     style="width: 100%; height: {image_width - 40}px; object-fit: cover;
                            border-top-left-radius: 8px; border-top-right-radius: 8px;"
                     onerror="this.onerror=null;this.src='https://placehold.co/{image_width}x{image_width-40}/eee/ccc?text=Image+Expired';"
                >
                <div style="padding: 10px 15px;">
                    {details_html}
                </div>
            </div>
        ''')

    # --- Display the final grid ---
    all_cards_string = ''.join(card_html_list)
    final_html = f'''
        <p><strong>Displaying {len(card_html_list)} result(s):</strong></p>
        <div style="display: flex; flex-wrap: wrap; justify-content: flex-start;">
            {all_cards_string}
        </div>
    '''
    display(HTML(final_html))

### Loading and viewing the sample data

This tutorial will use data from a fictional e-commerce pet supply company called **Cymbal Pets**.

The following query performs two main actions:
* creates a BigQuery dataset called `cymbal_pets`, and
* creates two tables using data stored in Google Cloud Storage (GCS); one standard table containing product information (`products`) and one object table containing product images (`product_images`).

In [None]:
%%bigquery --project {PROJECT_ID}

CREATE SCHEMA IF NOT EXISTS cymbal_pets;

-- Load a non-object table
LOAD DATA OVERWRITE cymbal_pets.products
FROM
  FILES(
    format = 'avro',
    uris = [
      'gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/tables/products/products_*.avro']);


-- Create an object table
CREATE OR REPLACE EXTERNAL TABLE cymbal_pets.product_images
  WITH CONNECTION `us.test_connection`
  OPTIONS (
    object_metadata = 'SIMPLE',
    uris = ['gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/images/*.png']);

Let's take a peek at a few rows of each table to get familiar with the data.

In [None]:
%%bigquery --project {PROJECT_ID}

SELECT *
FROM cymbal_pets.products
LIMIT 2

In [None]:
%%bigquery --project {PROJECT_ID}

SELECT *
FROM cymbal_pets.product_images
LIMIT 2

---

## BigQuery managed AI functions

The [**BigQuery managed AI functions**](https://docs.cloud.google.com/bigquery/docs/generative-ai-overview#managed_ai_functions) ([`AI.SCORE`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-score), [`AI.CLASSIFY`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-classify), and [`AI.IF`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-if)) extend traditional SQL with natural language conditions.

These functions are designed to be accessible to all users. They enhance quality by applying **prompt rewrite strategies** automatically, allowing you to write simple prompts while achieving accurate results.

We'll run through a few examples of using these functions for analysis with the **`cymbal_pets`** dataset.

### Using `AI.SCORE`: Ranking by "giftability"

[`AI.SCORE`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-score) is a function that accepts text input and uses a Vertex AI Gemini model to rate those inputs based on a scoring system that you describe as part of the prompt. If you do not provide a scoring system, the function automatically rewrites your prompt to generate a scoring rubric.

It is perfect for ranking items based on semantic criteria that are not explicitly in the data. Let's apply it to a potential marketing use case: determining how suitable a product is as a gift for a pet owner.

In [None]:
%%bigquery --project {PROJECT_ID}

SELECT
  product_name,
  description,
  AI.SCORE((
    'How "giftable" is this product for a pet owner? ', description,
    ' Use a scale from 1-10.'),
  connection_id => 'us.test_connection') AS giftability_score
FROM
  `cymbal_pets.products`
ORDER BY
  giftability_score DESC
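
The prompt in this query is a tuple that mixes string literals with column values. Assuming the parts are combined in order without any added separator (much like a SQL `CONCAT`), whitespace between parts has to be written explicitly. The sketch below imitates that behavior in plain Python; `render_prompt` and the sample description are hypothetical and only illustrate the concatenation.

```python
def render_prompt(*parts: object) -> str:
    """Join prompt parts in order with no separator, like SQL CONCAT."""
    return "".join(str(part) for part in parts)


description = "A squeaky plush toy."  # hypothetical column value

without_space = render_prompt("How giftable is this product?", description)
with_space = render_prompt("How giftable is this product? ", description)

print(without_space)
print(with_space)
```

The first result runs the question straight into the description, while the second keeps them apart, which is why a trailing space or an explicit `' '` part between values is worth including.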

### Using `AI.CLASSIFY`: Classifying by intended animal

[`AI.CLASSIFY`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-classify) uses a Vertex AI Gemini model to classify inputs into categories that you provide. `AI.CLASSIFY` accepts multimodal input (text, images, video, and so on) and can be used for tasks such as classifying reviews by sentiment, classifying support tickets or emails by topic, or classifying an image by its style or contents.

Let's use it to assign each toy product to an animal type using the product name and description. We'll define a set of target categories and ask the model to assign each product to the most appropriate one. Notice we include a fallback "All Pets" category.

In [None]:
%%bigquery --project {PROJECT_ID}

SELECT
  product_name,
  AI.CLASSIFY(
    ('What animal is this product for? ', product_name, ' ', description),
    categories => ["Dog", "Cat", "Bird", "Fish", "Small Animal", "All Pets"],
    connection_id => 'us.test_connection') AS animal_type
FROM
  `cymbal_pets.products`
WHERE category = "Toys"
LIMIT 20
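
Once the classified rows are available in Python, a quick tally helps confirm that every label falls inside the category list. This is a minimal sketch using hypothetical rows in place of the real query result. It assumes a row can come back with a missing (`NULL`) label, which is treated as unexpected here.

```python
from collections import Counter

CATEGORIES = ["Dog", "Cat", "Bird", "Fish", "Small Animal", "All Pets"]

# Hypothetical rows standing in for the query result.
rows = [
    {"product_name": "Squeaky Bone", "animal_type": "Dog"},
    {"product_name": "Feather Wand", "animal_type": "Cat"},
    {"product_name": "Rope Tug", "animal_type": "Dog"},
    {"product_name": "Mystery Toy", "animal_type": None},
]

# Count how many products landed in each category, skipping missing labels.
counts = Counter(row["animal_type"] for row in rows if row["animal_type"] is not None)

# Collect rows whose label is missing or not one of the requested categories.
unexpected = [row for row in rows if row["animal_type"] not in CATEGORIES]

print(counts.most_common())
print(unexpected)
```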

### Using `AI.IF`: Filtering product images

[`AI.IF`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-if) uses a Vertex AI Gemini model to evaluate a condition described in natural language and returns a `BOOL`. Similar to `AI.CLASSIFY`, it can also accept multimodal input. If you haven't yet worked with multimodal data in BigQuery, you can learn more in the [Analyzing Multimodal Data in BigQuery](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/applying-llms-to-data/multimodal-analysis-bigquery/analyze_multimodal_data_bigquery.ipynb) notebook.

Let's use `AI.IF` to perform a visual filtering task on our `product_images` table. We'll use it to find all product images that contain a ball by using `AI.IF` within the `WHERE` clause.

In [None]:
%%bigquery images_df --project {PROJECT_ID}

SELECT
  STRING(OBJ.GET_ACCESS_URL(ref, 'r').access_urls.read_url) AS signed_url,
  uri,
  metadata
FROM
  `cymbal_pets.product_images`
WHERE
  AI.IF((
    'Does this product image contain a ball? ',ref),
    connection_id => 'us.test_connection');

We used [`OBJ.GET_ACCESS_URL`](https://cloud.google.com/bigquery/docs/reference/standard-sql/objectref_functions#objget_access_url) in the `SELECT` to produce a read-only URL containing the product image.

The results of the previous query are now stored in a pandas DataFrame called `images_df`, the destination variable named after the [`%%bigquery` magic](https://googleapis.dev/python/bigquery-magics/latest/) in the prior cell.

Let's take a look at the results.

In [None]:
images_df

Now, let's view these images using the helper function created in the **Before you begin** section of this notebook.

In [None]:
display_image_grid(images_df)

### Using `AI.IF`: Performing a semantic join of product images with product table

We can also use [`AI.IF`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-if) to perform semantic joins. In this next query, we will use it within the `JOIN` clause to join the `products` table (text) with the `product_images` table (image). A product row and an image row are joined only when the model judges that the image *semantically matches* the product name and description.

In [None]:
%%bigquery join_df --project {PROJECT_ID}

SELECT
  product_name,
  description,
  brand,
  STRING(OBJ.GET_ACCESS_URL(ref, 'r').access_urls.read_url) AS signed_url
FROM
  `cymbal_pets.products` as products
INNER JOIN
  `cymbal_pets.product_images` as images
ON
  AI.IF((
      'You will be provided an image of a pet product. ',
      'Determine if the image is of the following pet toy: ',
      products.product_name,
      ' ',
      products.description,
      images.ref
      ),
    connection_id => 'us.test_connection')
WHERE
  products.category = "Toys" AND
  products.brand = "Fluffy Buns"
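
A join whose only condition is a model call has no equality key to narrow the match, so conceptually every candidate pair of rows has to be checked. The arithmetic below is a rough upper-bound sketch with hypothetical row counts. Whether the `WHERE` filters are applied before the model is called depends on the query planner, so treat the numbers as an illustration of why narrowing one side of the join matters.

```python
def candidate_pairs(left_rows: int, right_rows: int) -> int:
    """Number of row pairs a join must consider when it has no equality key."""
    return left_rows * right_rows


# Hypothetical row counts, not taken from the Cymbal Pets tables.
all_products, toy_products, images = 200, 12, 100

print(candidate_pairs(all_products, images))  # 20000
print(candidate_pairs(toy_products, images))  # 1200
```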

Let's take a look at the resulting DataFrame.

In [None]:
join_df

Now, let's view the results including images using the helper function.

In [None]:
display_image_grid(join_df)

---

## General-purpose AI functions

BigQuery's general-purpose AI functions ([`AI.GENERATE_BOOL`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-generate-bool), [`AI.GENERATE_DOUBLE`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-generate-double), and [`AI.GENERATE_INT`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-generate-int)) are another set of capabilities that bring the power of LLMs for AI-driven data extraction and inference tasks directly within your SQL queries.

These functions are considered "general-purpose" because they give you full control over the prompt and are designed for power users. Like the managed functions, they can be used alongside standard SQL in `SELECT` and `WHERE` clauses, so you can run complex analysis driven by natural language.

### Using `AI.GENERATE_BOOL`: Enriching product details

The [`AI.GENERATE_BOOL`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-generate-bool) function lets you analyze any combination of text and unstructured data. For each row in the table, it returns a `STRUCT` whose `result` field holds a `BOOL` value.

Let's use `AI.GENERATE_BOOL` for a data enrichment task. We'll find the products in our catalog that require a power source (like electricity or batteries) to operate and add a clear attribute that isn't already available in our `products` table.

In [None]:
%%bigquery --project {PROJECT_ID}

SELECT
  product_name,
  category,
  description,
  AI.GENERATE_BOOL(
    ('Does this product require electricity, batteries, ',
     'or a power source to operate? ', product_name, ' ', description),
    connection_id => 'us.test_connection',
    endpoint => 'gemini-2.5-flash').* EXCEPT(full_response)
FROM
  `cymbal_pets.products`

A few items to notice from the query text:
* `AI.GENERATE_BOOL` accepts an [`endpoint` argument](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-generate-bool#arguments). This allows you to specify any [generally available](https://cloud.google.com/vertex-ai/generative-ai/docs/models#generally_available_models) or [preview](https://cloud.google.com/vertex-ai/generative-ai/docs/models#preview_models) Gemini model. If you don't specify an `endpoint` value, BigQuery selects a recent stable version of Gemini to use.
* `AI.GENERATE_BOOL` returns a `STRUCT` with the following [output fields](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-generate-bool#output), which you can select individually:
  * `result` - the `BOOL` value containing the model's response to the prompt
  * `full_response` - a JSON value containing all fields from the [response](https://cloud.google.com/vertex-ai/docs/reference/rest/v1/GenerateContentResponse)
  * `status` - a `STRING` value that contains the API response status for the corresponding row (this value is empty if the operation was successful)

In this query, we chose Gemini 2.5 Flash as the `endpoint` and returned only the `result` and `status` fields, by using the `*` wildcard after the function call and adding `EXCEPT` to drop `full_response`.
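
Since `status` is empty for successful rows, it can be used to separate usable results from rows that need a retry. The following is a minimal sketch over hypothetical rows. The helper name `split_by_status` and the error text are invented for illustration.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def split_by_status(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Split rows into (succeeded, failed) based on a non-empty status."""
    succeeded: List[Row] = []
    failed: List[Row] = []
    for row in rows:
        # An empty or missing status means the call succeeded for this row.
        (failed if row.get("status") else succeeded).append(row)
    return succeeded, failed


# Hypothetical rows standing in for the query result.
rows = [
    {"product_name": "Aquarium Heater", "result": True, "status": ""},
    {"product_name": "Rope Toy", "result": False, "status": ""},
    {"product_name": "Cat Fountain", "result": None, "status": "hypothetical error text"},
]

succeeded, failed = split_by_status(rows)
print(len(succeeded), len(failed))  # 2 1
```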

---
### Comparison: `AI.GENERATE_BOOL` vs. `AI.IF`

At first glance, [`AI.GENERATE_BOOL`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-generate-bool) and [`AI.IF`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-if) seem to do similar jobs. However, they are designed for different users and different tasks.

| Feature | `AI.GENERATE_BOOL` | `AI.IF` |
| :--- | :--- | :--- |
| **Primary Use Case** | **Data enrichment** in the `SELECT` clause. | **Filtering** in the `WHERE` clause and **joining** with the `JOIN ON` clause. |
| **Prompting** | **Direct Passthrough**<br>The model sees your exact prompt. You are responsible for all prompt engineering. | **Automatic Prompt Optimization**<br>BigQuery enhances your prompt to be more effective. |
| **Model Option** | Adjustable via the `endpoint` parameter. | A model is chosen for you. |
| **Error handling** | Records errors in its output. | Returns `NULL` for any rows with errors. |

`AI.IF` is generally recommended for semantic filtering because its built-in prompt enhancements mean you don't have to be a prompt engineer to get good results.
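
One consequence of the error-handling row in the table is worth spelling out: a SQL `WHERE` clause keeps a row only when its condition is `TRUE`, so a row for which `AI.IF` returns `NULL` is dropped just like a `FALSE` row, with no error recorded. The sketch below imitates that three-valued filtering in Python, with `None` standing in for `NULL`. The file names are hypothetical.

```python
from typing import Optional


def where_keeps(condition: Optional[bool]) -> bool:
    """Mimic SQL WHERE: a row is kept only when the condition is TRUE."""
    return condition is True


# Hypothetical per-image outcomes: True, False, or None (NULL after an error).
results = {"ball.png": True, "bowl.png": False, "corrupt.png": None}

kept = [name for name, matched in results.items() if where_keeps(matched)]
print(kept)  # ['ball.png']
```

If silently dropped rows matter for a use case, `AI.GENERATE_BOOL` and its `status` field make the failures visible.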

**Key Takeaway:**
* Use `AI.GENERATE_BOOL` for use cases where you want full control, or when using it to augment data in the `SELECT` clause.
* Use `AI.IF` for smart, semantic filtering in the `WHERE` clause or for joining with the `JOIN ON` clause.

---

### Using `AI.GENERATE_INT`: Counting mentioned ingredients

[`AI.GENERATE_INT`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-generate-int) is similar to `AI.GENERATE_BOOL`, except that its `result` field holds an integer value.

Let's use `AI.GENERATE_INT` to perform an extraction task, specifically to count how many distinct ingredients or food items it can identify in the food product descriptions. While the descriptions are high-level marketing text and not detailed ingredient lists, this will test the model's ability to infer ingredients from a product's name and general description.

In [None]:
%%bigquery --project {PROJECT_ID}

SELECT
  product_name,
  description,
  AI.GENERATE_INT(
    ('Based on the text, how many distinct food ingredients ',
     'can you identify? If none are listed, return 0. ',
     product_name,' ',description),
    connection_id => 'us.test_connection',
    endpoint => 'gemini-2.5-flash').result AS ingredient_count
FROM
  `cymbal_pets.products`
WHERE
  category = 'Food'
ORDER BY
  ingredient_count DESC

### Using `AI.GENERATE_DOUBLE`: Estimating shipping weight

[`AI.GENERATE_DOUBLE`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-generate-double) can be used for tasks similar to those for `AI.GENERATE_INT`, except that its `result` field holds a floating-point value. Both functions can be used for *extraction* tasks, as we saw in the last example. In this example, we will see an *inference* task.

Since our product data doesn't include shipping weights, let's ask the model to estimate the weight in pounds with decimal precision. It will have to infer this based on the product's name and description (e.g., a "50 Gallon Aquarium" is much heavier than a "Dog Bone"). This is a great way to generate missing data.

In [None]:
%%bigquery --project {PROJECT_ID}

SELECT
  product_name,
  description,
  AI.GENERATE_DOUBLE(
    ('Based on this product description, what is a rough ',
     'estimated weight of the product for shipping in pounds (lbs)? ',
     product_name,' ',description),
    connection_id => 'us.test_connection',
    endpoint => 'gemini-2.5-flash').result AS estimated_shipping_weight
FROM
  `cymbal_pets.products`
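
Because these weights are inferred rather than measured, it can help to flag implausible values before using them downstream. This is a minimal sketch of such a check. The function name and the 150 lb ceiling are assumptions chosen for illustration, not values from the dataset.

```python
from typing import Optional


def is_plausible_weight(pounds: Optional[float], max_pounds: float = 150.0) -> bool:
    """Return True for a present, positive weight at or below max_pounds."""
    return pounds is not None and 0 < pounds <= max_pounds


# Hypothetical estimates standing in for the query result.
estimates = {"Dog Bone": 0.5, "50 Gallon Aquarium": 95.0, "Broken Row": None, "Typo Row": -3.0}

flagged = [name for name, pounds in estimates.items() if not is_plausible_weight(pounds)]
print(flagged)  # ['Broken Row', 'Typo Row']
```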

## Recap

In this notebook, you explored BigQuery's AI functions and used them to analyze data with natural-language prompts.

You learned how to:
* **Perform semantic analysis with managed AI functions**, which are designed for all users:
    * Use [`AI.SCORE`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-score) to **rank** products based on a subjective quality like "giftability."
    * Use [`AI.CLASSIFY`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-classify) to **categorize** products into predefined animal types.
    * Use [`AI.IF`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-if) for **multimodal filtering** (finding balls in product images) and **semantic joins** (matching products to their images).
* **Execute row-level tasks with general-purpose AI functions**, which give power users full control:
    * Use [`AI.GENERATE_BOOL`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-generate-bool) to **enrich data** with a yes/no attribute indicating which products require power.
    * Use [`AI.GENERATE_INT`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-generate-int) to **extract** structured data (a count of ingredients) from unstructured text.
    * Use [`AI.GENERATE_DOUBLE`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-generate-double) to **infer** new data points (estimated shipping weights) that were not present in the original dataset.


## Next Steps
Continue your learning with the following notebooks:
* [Introduction to Generative AI functions in BigQuery](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/applying-llms-to-data/bigquery_generative_ai_intro.ipynb)
* [Analyze Multimodal Data in BigQuery](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/applying-llms-to-data/multimodal-analysis-bigquery/analyze_multimodal_data_bigquery.ipynb)
* [Text + multimodal embedding generation and vector search in BigQuery](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/applying-llms-to-data/bigquery_embeddings_vector_search.ipynb)

Take a look at the product documentation:
* the [Generative AI Overview](https://cloud.google.com/bigquery/docs/generative-ai-overview) landing page
* the [`AI.SCORE`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-score) function documentation
* the [`AI.CLASSIFY`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-classify) function documentation
* the [`AI.IF`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-if) function documentation
* the [`AI.GENERATE_BOOL`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-generate-bool) function documentation
* the [`AI.GENERATE_INT`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-generate-int) function documentation
* the [`AI.GENERATE_DOUBLE`](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-generate-double) function documentation


## Cleaning Up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

In [None]:
# Delete the BigQuery Connection
! bq rm --connection --project_id=$PROJECT_ID --location=us test_connection

# Delete the BigQuery dataset
! bq rm -r -f $PROJECT_ID:cymbal_pets
