# Challenge Task 12: Build Multimodal Vector Search for Cymbal Retail

## Scenario
The **Lumiki Holiday Campaign** is emphasizing "Smart Home" visualization. Customers often search using vague terms like "modern vibe" or want to find products that *look* like a photo they uploaded. Keyword search isn't enough.

## Your Mission
You must build a Semantic Search Engine in BigQuery. You will:
1.  **Generate Assets:** Create product images and manuals (Code provided).
2.  **Text Search:** Generate text embeddings and perform vector search.
3.  **Multimodal Search:** Generate multimodal embeddings from images and perform text-to-image search.
4.  **Automate:** Use BigQuery's new `AI.EMBED` to automate the process.

## Pre-requisites
* Completion of Task 13 (Table `cymbal_product_augmented` must exist).
* Connection `cymbal_cloud_resource_connection_usc` must be configured.

## 1. Setup & Asset Generation
Run the following cells to initialize your environment and generate the visual assets required for multimodal search. 

**Note:** You do not need to write code for Section 1. Just execute the cells.

In [None]:
PROJECT_ID_LIST=!gcloud config list --format "value(core.project)" 2>/dev/null
PROJECT_ID=PROJECT_ID_LIST[0]
PROJECT_NBR_LIST=!gcloud projects describe $PROJECT_ID --format="value(projectNumber)"
PROJECT_NBR=PROJECT_NBR_LIST[0]
LOCATION="us-central1"

# Using the dataset created in Task 13
DATASET_ID="cymbal_retail_ai_ds"
TABLE_ID="cymbal_product_augmented"
BUCKET_NAME=f"cymbal-multimodal-assets-{PROJECT_NBR}"

print(f"Project: {PROJECT_ID}")
print(f"Bucket: {BUCKET_NAME}")

In [None]:
# Create Storage Bucket for Images
!gcloud storage buckets create gs://{BUCKET_NAME} --location={LOCATION}

In [None]:
# Install libraries for Image Generation
%pip install --upgrade --quiet google-genai google-cloud-storage google-cloud-aiplatform reportlab

In [None]:
# Extract the connection's service account ID
import json

CONNECTION_PATH = f"{PROJECT_ID}.{LOCATION}.cymbal_cloud_resource_connection_usc"
DESC_CONN = !bq show --format=prettyjson --connection {CONNECTION_PATH}
CONN_DATA = json.loads("".join(DESC_CONN))
SA_EMAIL = CONN_DATA['cloudResource']['serviceAccountId']

print(f"Authorizing Service Account: {SA_EMAIL}")

In [None]:
# Apply IAM Policy Bindings
!gcloud projects add-iam-policy-binding {PROJECT_ID} --member=serviceAccount:{SA_EMAIL} --role='roles/bigquery.connectionUser' --format=none --condition=None
!gcloud projects add-iam-policy-binding {PROJECT_ID} --member=serviceAccount:{SA_EMAIL} --role='roles/aiplatform.user' --format=none --condition=None
!gcloud projects add-iam-policy-binding {PROJECT_ID} --member=serviceAccount:{SA_EMAIL} --role='roles/storage.objectViewer' --format=none --condition=None

import time
print("Waiting for IAM propagation (80s)...")
time.sleep(80)

### 1.1 Execute Image Generation Script
Run the cell below. It will read your `cymbal_product_augmented` table, call Vertex AI to generate images for the products, and save them to GCS. 

**Wait for this to complete before proceeding.**

In [None]:
import pandas_gbq
import time
from io import BytesIO
from IPython.display import Image, Markdown, display
from google import genai
from google.genai.types import FinishReason, GenerateContentConfig, ImageConfig
from google.cloud import storage

# Setup Clients
genai_client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)
storage_client = storage.Client(project=PROJECT_ID)
model_flavor = "gemini-2.5-flash-image"

def generate_and_persist_image(product_id, product_nm, product_desc):
    prompt = f"Generate a high-resolution commercial product image for {product_nm}. Description: {product_desc}. White background, studio lighting."
    output_path = f"generated_images/{product_id}.png"

    try:
        response = genai_client.models.generate_content(
            model=model_flavor, contents=prompt,
            config=GenerateContentConfig(
                response_modalities=["IMAGE"],
                image_config=ImageConfig(aspect_ratio="1:1"),
                candidate_count=1
            )
        )

        # Error Handling
        if not response.candidates or response.candidates[0].finish_reason != FinishReason.STOP:
            reason = response.candidates[0].finish_reason if response.candidates else "No candidates"
            print(f"Error {product_id}: Prompt Content Error: {reason}")
            return None

        # Processing & Uploading
        for part in response.candidates[0].content.parts:
            if part.inline_data:
                bucket = storage_client.bucket(BUCKET_NAME)
                blob = bucket.blob(output_path)
                blob.upload_from_file(
                    BytesIO(part.inline_data.data),
                    content_type=part.inline_data.mime_type
                )
                print(f"Generated: {output_path}")
                return f"gs://{BUCKET_NAME}/{output_path}"
    except Exception as e:
        print(f"Error {product_id}: {e}")
    return None

# Fetch Data
sql = f"SELECT product_id, product_nm, product_description FROM `{PROJECT_ID}.{DATASET_ID}.{TABLE_ID}`"
df = pandas_gbq.read_gbq(sql, project_id=PROJECT_ID)

# Run Generation
for index, row in df.iterrows():
    uri = generate_and_persist_image(row['product_id'], row['product_nm'], row['product_description'])
    df.at[index, 'product_image_gcs_uri'] = uri
    time.sleep(2)

### 1.2 Update BigQuery with Image URIs
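Now that the images are in GCS, update the table so that each product references its image. The statement below derives the URI from `product_id` for every row, so a product whose image generation failed will reference an object that does not exist.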
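
A minimal sketch, assuming generation results are available as plain `(product_id, uri)` pairs with `uri` set to `None` on failure, of how failed products could be listed before the table is updated. The `split_results` helper and the sample IDs are hypothetical and not part of the lab code.

```python
def split_results(results):
    """Separate successful uploads from failures.

    `results` is an iterable of (product_id, uri) pairs; uri is None
    when generation or upload failed.
    """
    succeeded = {pid: uri for pid, uri in results if uri}
    failed = [pid for pid, uri in results if not uri]
    return succeeded, failed


# Made-up sample data for illustration only.
sample = [
    ("P001", "gs://example-bucket/generated_images/P001.png"),
    ("P002", None),
]
succeeded, failed = split_results(sample)
print(f"{len(succeeded)} generated, {len(failed)} failed: {failed}")
```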

In [None]:
from google.cloud import bigquery
bq_client = bigquery.Client()

update_query = f"""
    UPDATE `{DATASET_ID}.{TABLE_ID}`
    SET product_image_gcs_uri = CONCAT('gs://{BUCKET_NAME}/generated_images/', product_id, '.png')
    WHERE TRUE
"""
query_job = bq_client.query(update_query)
query_job.result()
print("Table updated with Image URIs.")

## 2. Challenge: Text-to-Text Vector Search
Now the real work begins. You need to enable semantic search so users can find products by meaning, not just keywords.

### 2.1 Create Text Embedding Model
**TODO:** Create a remote model named `cymbal_text_embedding_model` that points to `gemini-embedding-001`.

In [None]:
%%bigquery --project {PROJECT_ID}

CREATE OR REPLACE MODEL `cymbal_retail_ai_ds.cymbal_text_embedding_model`
REMOTE WITH CONNECTION `us-central1.cymbal_cloud_resource_connection_usc`
OPTIONS (ENDPOINT = 'gemini-embedding-001');

### 2.2 Generate Text Embeddings
**TODO:** 
1. Add a column `text_embedding` (ARRAY<FLOAT64>) to `cymbal_product_augmented`.
2. Use `ML.GENERATE_EMBEDDING` to populate it using the product name and description.
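
In BigQuery, `CONCAT` returns `NULL` when any argument is `NULL`, so a product with a missing description would yield no text to embed. The Python sketch below only illustrates the NULL-safe combination logic; the `build_content` helper is hypothetical, and in SQL a similar effect could be achieved with `IFNULL` or `COALESCE`.

```python
def build_content(name, description):
    """Join name and description, skipping missing or blank parts."""
    parts = [p.strip() for p in (name, description) if p and p.strip()]
    return " ".join(parts)


# Made-up product text for illustration only.
print(build_content("Garment Steamer", "Handheld steamer for delicate fabrics."))
print(build_content("Garment Steamer", None))
```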

In [None]:
%%bigquery --project {PROJECT_ID}

-- TODO: ALTER TABLE to add text_embedding column
ALTER TABLE `cymbal_retail_ai_ds.cymbal_product_augmented`
ADD COLUMN IF NOT EXISTS text_embedding ARRAY<FLOAT64>;

In [None]:
%%bigquery --project {PROJECT_ID}

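-- Embed every product once, then join the results back on product_id
-- (assumes product_id is unique in the table).
UPDATE `cymbal_retail_ai_ds.cymbal_product_augmented` AS t
SET text_embedding = e.ml_generate_embedding_result
FROM (
    SELECT product_id, ml_generate_embedding_result
    FROM ML.GENERATE_EMBEDDING(
        MODEL `cymbal_retail_ai_ds.cymbal_text_embedding_model`,
        (SELECT product_id, CONCAT(product_nm, ' ', product_description) AS content
         FROM `cymbal_retail_ai_ds.cymbal_product_augmented`)
    )
) AS e
WHERE t.product_id = e.product_id;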

### 2.3 Perform Vector Search
**TODO:** Perform a vector search to find products similar to the query: **"fabric steamer"**.
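
Conceptually, a brute-force vector search embeds the query, computes a distance to every stored embedding, and returns the closest rows. The sketch below shows that idea in plain Python. It assumes cosine distance (the metric `VECTOR_SEARCH` uses is configurable), and the `cosine_distance` and `top_k` helpers and the three-dimensional vectors are made up for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, rows, k=3):
    """Rank (name, vector) rows by ascending distance to the query."""
    scored = [(name, cosine_distance(query, vec)) for name, vec in rows]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Made-up 3-dimensional vectors; real embeddings are far larger.
catalog = [
    ("Garment steamer", [0.9, 0.1, 0.0]),
    ("Steam iron", [0.8, 0.3, 0.1]),
    ("Espresso machine", [0.1, 0.9, 0.2]),
    ("Desk lamp", [0.0, 0.2, 0.9]),
]
results = top_k([1.0, 0.1, 0.0], catalog, k=3)
for name, dist in results:
    print(f"{name}: {dist:.3f}")
```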

In [None]:
%%bigquery --project {PROJECT_ID}

SELECT base.product_nm, base.product_description, distance
FROM VECTOR_SEARCH(
    TABLE `cymbal_retail_ai_ds.cymbal_product_augmented`,
    'text_embedding',
    (SELECT ml_generate_embedding_result
     FROM ML.GENERATE_EMBEDDING(
         MODEL `cymbal_retail_ai_ds.cymbal_text_embedding_model`,
         (SELECT "fabric steamer" AS content)
     )
    ),
    top_k => 3
)
ORDER BY distance ASC;

#### Task Validation
Task Complete! 
> **Note:** Go back to the lab guide page and click **Check my progress** for **AT ID: 7141 Text-to-Text Vector Search**.

## 3. Challenge: Multimodal Search (Text-to-Image)
Cymbal wants users to search for products based on how they *look*.

### 3.1 Create Multimodal Model
**TODO:** Create `cymbal_multimodal_model` pointing to `multimodalembedding@001`.

In [None]:
%%bigquery --project {PROJECT_ID}

CREATE OR REPLACE MODEL `cymbal_retail_ai_ds.cymbal_multimodal_model`
REMOTE WITH CONNECTION `us-central1.cymbal_cloud_resource_connection_usc`
OPTIONS (ENDPOINT = 'multimodalembedding@001');

### 3.2 Generate Image Embeddings
**TODO:** 
1. Add column `mm_embedding` (ARRAY<FLOAT64>).
2. Update the table. This is the hardest part. You must chain `OBJ.MAKE_REF`, `OBJ.FETCH_METADATA`, and `OBJ.GET_ACCESS_URL` to convert the GCS URI into something the model can read.

In [None]:
%%bigquery --project {PROJECT_ID}

ALTER TABLE `cymbal_retail_ai_ds.cymbal_product_augmented`
ADD COLUMN IF NOT EXISTS mm_embedding ARRAY<FLOAT64>;

In [None]:
%%bigquery --project {PROJECT_ID}

UPDATE `cymbal_retail_ai_ds.cymbal_product_augmented` AS t
SET mm_embedding = e.ml_generate_embedding_result
FROM (
    SELECT product_id, ml_generate_embedding_result
    FROM ML.GENERATE_EMBEDDING(
        MODEL `cymbal_retail_ai_ds.cymbal_multimodal_model`,
        -- Build a read ('r') access URL for each image through the connection.
        (SELECT product_id,
                OBJ.GET_ACCESS_URL(
                    OBJ.FETCH_METADATA(
                        OBJ.MAKE_REF(product_image_gcs_uri,
                                     'us-central1.cymbal_cloud_resource_connection_usc')),
                    'r') AS content
         FROM `cymbal_retail_ai_ds.cymbal_product_augmented`)
    )
) AS e
WHERE t.product_id = e.product_id;

### 3.3 Perform Text-to-Image Search
**TODO:** Search for **"modern white appliances"**. This time, the search must match against the **Image Embeddings (`mm_embedding`)**, not the text.

In [None]:
%%bigquery --project {PROJECT_ID}

SELECT base.product_nm, base.product_image_gcs_uri, distance
FROM VECTOR_SEARCH(
    TABLE `cymbal_retail_ai_ds.cymbal_product_augmented`,
    'mm_embedding',
    (SELECT ml_generate_embedding_result
     FROM ML.GENERATE_EMBEDDING(
         MODEL `cymbal_retail_ai_ds.cymbal_multimodal_model`,
         (SELECT "modern white appliances" AS content)
     )
    ),
    top_k => 3
)
ORDER BY distance ASC;

#### Task Validation
Task Complete! 
> **Note:** Go back to the lab guide page and click **Check my progress** for **AT ID: 7142 Multimodal Search (Text-to-Image)**.

## 4. Challenge: Automated Embeddings (AI.EMBED)
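Re-running `UPDATE` statements by hand whenever products change is slow and easy to forget. Instead, define the embedding as a `GENERATED ALWAYS AS` column that calls `AI.EMBED`, so BigQuery maintains it for you.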

### 4.1 Create Auto-Embedding Table
**TODO:** Create a table `cymbal_product_auto_embedding` that automatically generates embeddings for `product_description` using `text-embedding-005`.

In [None]:
%%bigquery --project {PROJECT_ID}

CREATE OR REPLACE TABLE `cymbal_retail_ai_ds.cymbal_product_auto_embedding` (
    product_id STRING,
    product_nm STRING,
    product_description STRING,
    embedding STRUCT<result ARRAY<FLOAT64>, status STRING>
        GENERATED ALWAYS AS (
            AI.EMBED(
                product_description,
                connection_id => 'us-central1.cymbal_cloud_resource_connection_usc',
                endpoint => 'text-embedding-005'
            )
        )
        STORED
        OPTIONS(asynchronous = TRUE)
);

### 4.2 Insert and Verify
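**TODO:** Insert data from the augmented table into the auto-embedding table and verify that the embeddings are generated. Because the column is declared with `asynchronous = TRUE`, the embeddings may not be populated immediately; re-run the validation query if they are still missing.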
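
Since the embedding column is filled in the background, a small polling helper can make verification less manual. This is a minimal sketch: `wait_until` is a hypothetical helper, and `fake_check` stands in for a real check such as a query that counts rows whose embedding is still missing.

```python
import time


def wait_until(check, timeout_s=300.0, interval_s=10.0, sleep=time.sleep):
    """Call check() until it returns True or the timeout elapses.

    Returns True if check() succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Stand-in for a real readiness check; this fake succeeds on the third call.
calls = {"n": 0}


def fake_check():
    calls["n"] += 1
    return calls["n"] >= 3


ready = wait_until(fake_check, timeout_s=5.0, interval_s=0.01)
print(f"ready={ready} after {calls['n']} checks")
```

Passing the check as a callable keeps the helper independent of any particular client library, so it can be exercised with a fake as shown.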

In [None]:
%%bigquery --project {PROJECT_ID}

INSERT INTO `cymbal_retail_ai_ds.cymbal_product_auto_embedding` (product_id, product_nm, product_description)
SELECT product_id, product_nm, product_description
FROM `cymbal_retail_ai_ds.cymbal_product_augmented`;

In [None]:
%%bigquery --project {PROJECT_ID}

-- Validate
SELECT * FROM `cymbal_retail_ai_ds.cymbal_product_auto_embedding` LIMIT 5;

#### Task Validation
Task Complete! 
> **Note:** Go back to the lab guide page and click **Check my progress** for **AT ID: 7143 Automated Embeddings (AI.EMBED)**.