In [None]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Introduction to Multimodal Embeddings on Vertex AI

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/embeddings/intro_multimodal_embeddings.ipynb">
      <img width="32px" src="https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg" alt="Google Colaboratory logo"><br> Run in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fembeddings%2Fintro_multimodal_embeddings.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Run in Colab Enterprise
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/embeddings/intro_multimodal_embeddings.ipynb">
      <img width="32px" src="https://www.svgrepo.com/download/217753/github.svg" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/embeddings/intro_multimodal_embeddings.ipynb">
      <img src="https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://goo.gle/4fVmfkB">
      <img width="32px" src="https://cdn.qwiklabs.com/assets/gcp_cloud-e3a77215f0b8bfa9b3f611c0d2208c7e8708ed31.svg" alt="Google Cloud logo"><br> Open in Cloud Skills Boost
    </a>
  </td>
</table>

<div style="clear: both;"></div>


| Authors |
| --- |
| [Lavi Nigam](https://github.com/lavinigam-gcp) |
| [Kaz Sato](https://github.com/kazunori279) |

### Objectives

In this notebook, you will explore:
* The Vertex AI Multimodal Embeddings API (text, images, and video)
* Building a simple search over e-commerce data
    - Find products based on a text query
    - Find products based on an image
    - Find videos based on a video

## Multimodal Embeddings

The [Vertex AI Multimodal Embeddings API](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-multimodal-embeddings) generates vectors with `128`, `256`, `512`, or `1408` (default) dimensions from the input you provide, which can include any combination of image, text, and video data. The embedding vectors can then be used for downstream tasks such as image classification or video content moderation.
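
The choice of dimension affects how much storage each vector needs. As a rough back-of-the-envelope sketch, assuming each value is stored as a 4-byte float32 (an assumption about your storage layer, not something the API dictates), the per-vector footprint for each supported dimension can be computed like this. The `vector_size_bytes` helper is hypothetical and exists only for this illustration.

```python
# Output dimensions listed in the paragraph above.
SUPPORTED_DIMENSIONS = (128, 256, 512, 1408)
BYTES_PER_FLOAT32 = 4  # assumption: embeddings are stored as float32


def vector_size_bytes(dimension: int) -> int:
    """Return the raw size of one embedding vector in bytes."""
    if dimension not in SUPPORTED_DIMENSIONS:
        raise ValueError(f"dimension must be one of {SUPPORTED_DIMENSIONS}")
    return dimension * BYTES_PER_FLOAT32


for dim in SUPPORTED_DIMENSIONS:
    print(f"{dim:>5} dims -> {vector_size_bytes(dim):>5} bytes per vector")
```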

The image and text embedding vectors generated with this API share the same semantic space. Consequently, they can be used interchangeably for use cases such as searching images by text or searching videos by image.

**Use cases**

**Image and text:**

* Image classification: Take an image as input and predict one or more classes (labels).
* Image search: Search for relevant or similar images.
* Recommendations: Generate product or ad recommendations based on images.

**Image, text, and video:**

* Recommendations: Generate product or advertisement recommendations based on videos (similarity search).
* Video content search
    * Using semantic search: Take text as input and return a set of ranked frames matching the query.
    * Using similarity search:
        * Take a video as input and return a set of videos matching the query.
        * Take an image as input and return a set of videos matching the query.
* Video classification: Take a video as input and predict one or more classes.
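
Because image and text vectors share one space, classification can be approached without training: embed each candidate label as text and keep the labels whose vectors are closest to the image vector. The sketch below uses made-up 3-dimensional vectors in place of real API output, and the `classify` helper and its `threshold` value are hypothetical illustrations rather than part of the Vertex AI SDK.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify(
    image_emb: list[float],
    label_embs: dict[str, list[float]],
    threshold: float = 0.8,
) -> list[str]:
    """Return labels whose similarity to the image meets the threshold, best first."""
    scored = [(cosine(image_emb, emb), label) for label, emb in label_embs.items()]
    return [label for score, label in sorted(scored, reverse=True) if score >= threshold]


# Toy vectors standing in for real embeddings (assumed values, not API output).
label_embs = {
    "dog": [1.0, 0.0, 0.0],
    "cat": [0.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
}
image_emb = [0.9, 0.1, 0.0]  # pretend this came from a photo of a dog

print(classify(image_emb, label_embs))
```

With real data, the label vectors would come from a text embedding call and the image vector from an image embedding call. A sensible threshold depends on the data and has to be tuned.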

## Getting Started

### Install Vertex AI SDK for Python and other dependencies

In [None]:
%pip install --upgrade --quiet google-cloud-aiplatform numpy pandas seaborn scikit-learn

### Authenticate your notebook environment (Colab only)

If you are running this notebook on Google Colab, run the following cell to authenticate your environment. This step is not required if you are using [Vertex AI Workbench](https://cloud.google.com/vertex-ai-workbench).

In [None]:
import sys

# Additional authentication is required for Google Colab
if "google.colab" in sys.modules:
    # Authenticate user to Google Cloud
    from google.colab import auth

    auth.authenticate_user()

### Set Google Cloud project information and initialize Vertex AI SDK

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [None]:
# Use the environment variable if the user doesn't provide Project ID.
import os

PROJECT_ID = "[your-project-id]"  # @param {type: "string", placeholder: "[your-project-id]", isTemplate: true}
if not PROJECT_ID or PROJECT_ID == "[your-project-id]":
    PROJECT_ID = str(os.environ.get("GOOGLE_CLOUD_PROJECT"))

LOCATION = os.environ.get("GOOGLE_CLOUD_REGION", "us-central1")

# Initialize Vertex AI
import vertexai

vertexai.init(project=PROJECT_ID, location=LOCATION)
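
Note that `str(os.environ.get(...))` turns an unset variable into the literal string `"None"`, which is then passed to `vertexai.init` as if it were a real project ID. A stricter variant, sketched here with a hypothetical `resolve_project_id` helper that is not part of the SDK, fails early with a clear message instead:

```python
import os
from typing import Mapping, Optional

PLACEHOLDER = "[your-project-id]"


def resolve_project_id(
    explicit: Optional[str], env: Optional[Mapping[str, str]] = None
) -> str:
    """Return the explicit project ID, else GOOGLE_CLOUD_PROJECT, else raise."""
    env = os.environ if env is None else env
    if explicit and explicit != PLACEHOLDER:
        return explicit
    project = env.get("GOOGLE_CLOUD_PROJECT")
    if not project:
        raise ValueError(
            "Set PROJECT_ID in the cell above or export GOOGLE_CLOUD_PROJECT."
        )
    return project


# Example with an explicit mapping so the sketch runs anywhere.
print(resolve_project_id(PLACEHOLDER, env={"GOOGLE_CLOUD_PROJECT": "demo-project"}))
```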

### Import libraries

In [None]:
# for data processing
import numpy as np
import pandas as pd
import seaborn as sns

pd.options.mode.chained_assignment = None  # default='warn'

# for showing images and videos
from IPython.display import HTML
from IPython.display import Image as ImageByte
from IPython.display import display
from sklearn.metrics.pairwise import cosine_similarity

# vertex ai sdk
from vertexai.vision_models import Image as VMImage
from vertexai.vision_models import MultiModalEmbeddingModel
from vertexai.vision_models import Video as VMVideo
from vertexai.vision_models import VideoSegmentConfig

### Load Vertex AI Multimodal Embeddings

In [None]:
mm_embedding_model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")

### Helper functions

In [None]:
def get_text_embedding(
    text: str = "banana muffins",
    dimension: int | None = 1408,
) -> list[float]:
    embedding = mm_embedding_model.get_embeddings(
        contextual_text=text,
        dimension=dimension,
    )
    return embedding.text_embedding


def get_image_embedding(
    image_path: str,
    dimension: int | None = 1408,
) -> list[float]:
    image = VMImage.load_from_file(image_path)
    embedding = mm_embedding_model.get_embeddings(
        image=image,
        dimension=dimension,
    )
    return embedding.image_embedding


def get_video_embedding(
    video_path: str,
    dimension: int | None = 1408,
    video_segment_config: VideoSegmentConfig | None = None,
) -> list[list[float]]:
    video = VMVideo.load_from_file(video_path)
    embedding = mm_embedding_model.get_embeddings(
        video=video,
        dimension=dimension,
        video_segment_config=video_segment_config,
    )
    return [video_emb.embedding for video_emb in embedding.video_embeddings]


def get_public_url_from_gcs(gcs_uri: str) -> str:
    return gcs_uri.replace("gs://", "https://storage.googleapis.com/").replace(
        " ", "%20"
    )


def display_video_from_gcs(gcs_uri: str) -> None:
    display(
        HTML(
            f"""
    <video width="640" height="480" controls>
        <source src="{get_public_url_from_gcs(gcs_uri)}" type="video/mp4">
        Your browser does not support the video tag.
    </video>
    """
        )
    )


def print_similar_images(query_emb: list[float], data_frame: pd.DataFrame) -> None:
    import ast  # local import keeps this helper self-contained

    # Embeddings are stored as strings; parse them with literal_eval
    # (safer than eval) and score each one with a dot product.
    image_embs = data_frame["image_embeddings"]
    scores = [
        np.dot(ast.literal_eval(image_emb), query_emb) for image_emb in image_embs
    ]
    data_frame["score"] = scores
    data_frame = data_frame.sort_values(by="score", ascending=False)

    # print the top five results and show their images
    print(data_frame.head()[["score", "title"]])
    for url in data_frame.head()["img_url"]:
        display(ImageByte(url=url, width=200, height=200))


def print_similar_videos(query_emb: list[float], data_frame: pd.DataFrame) -> None:
    import ast  # local import keeps this helper self-contained

    # Embeddings are stored as strings; parse them with literal_eval
    # (safer than eval) and score each one with a dot product.
    video_embs = data_frame["video_embeddings"]
    scores = [
        np.dot(ast.literal_eval(video_emb), query_emb) for video_emb in video_embs
    ]
    data_frame["score"] = scores
    data_frame = data_frame.sort_values(by="score", ascending=False)

    # print the top five results and play the best match
    print(data_frame.head()[["score", "file_name"]])
    url = data_frame.iloc[0]["gcs_path"]
    display_video_from_gcs(url)
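
The precomputed CSV files used later in this notebook store each embedding as a string, so it has to be parsed before scoring. The sketch below shows the parse-and-rank step on toy data using `ast.literal_eval`, which accepts only Python literals. The `rank_by_dot_product` helper and the sample rows are hypothetical and stand in for the real DataFrame.

```python
import ast


def rank_by_dot_product(
    query_emb: list[float], rows: list[tuple[str, str]], top_k: int = 2
) -> list[tuple[float, str]]:
    """Score (title, stringified embedding) rows against the query, best first."""
    scored = []
    for title, emb_str in rows:
        emb = ast.literal_eval(emb_str)  # parses literals only, never runs code
        score = sum(q * e for q, e in zip(query_emb, emb))
        scored.append((score, title))
    return sorted(scored, reverse=True)[:top_k]


# Toy rows standing in for the CSV contents (assumed values).
rows = [
    ("dino t-shirt", "[0.9, 0.1]"),
    ("checkered socks", "[0.1, 0.9]"),
    ("plain mug", "[0.5, 0.5]"),
]
print(rank_by_dot_product([1.0, 0.0], rows))
```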

## Generate Text Embeddings

In [None]:
text_emb = get_text_embedding(text="What is life?")
print("length of embedding: ", len(text_emb))
print("First five values are: ", text_emb[:5])

#### Embeddings and Pandas DataFrames

If your text is stored in a column of a DataFrame, you can add a new column containing the embeddings, as in the example below.

In [None]:
text = [
    "i really enjoyed the movie last night",
    "so many amazing cinematic scenes yesterday",
    "had a great time writing my Python scripts a few days ago",
    "huge sense of relief when my .py script finally ran without error",
    "O Romeo, Romeo, wherefore art thou Romeo?",
]

df = pd.DataFrame(text, columns=["text"])
df

Create a new column, `embedding`, using the [`apply()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html) function in pandas with the embeddings model.

In [None]:
df["embedding"] = df.apply(lambda x: get_text_embedding(x.text), axis=1)
df
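
`apply()` calls the embedding function once per row, so repeated strings trigger repeated API requests. One way to avoid that, sketched here with a stand-in `fake_embed` function instead of the real model (both helper names are hypothetical), is to embed each distinct string once and look the results up:

```python
from typing import Callable


def embed_unique(
    texts: list[str], embed_fn: Callable[[str], list[float]]
) -> list[list[float]]:
    """Embed each distinct text once and return embeddings in the input order."""
    cache: dict[str, list[float]] = {}
    for text in texts:
        if text not in cache:
            cache[text] = embed_fn(text)
    return [cache[text] for text in texts]


calls = []


def fake_embed(text: str) -> list[float]:
    """Stand-in for get_text_embedding that records how often it is called."""
    calls.append(text)
    return [float(len(text)), 0.0]


embeddings = embed_unique(["red sock", "blue sock", "red sock"], fake_embed)
print(len(embeddings), "embeddings from", len(calls), "calls")
```

In the notebook, `get_text_embedding` could be passed as `embed_fn`, with the returned list assigned to the DataFrame column.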

#### Comparing similarity of text examples using cosine similarity

In [None]:
cos_sim_array = cosine_similarity(list(df.embedding.values))

# display as DataFrame
df = pd.DataFrame(cos_sim_array, index=text, columns=text)
df

To make this easier to read, you can use a heatmap. Naturally, two texts are most similar when they are identical (score of 1.0). The next-highest scores occur when sentences are semantically similar, and the lowest when they differ substantially in meaning.
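
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths, which is why a vector compared with itself always scores 1.0. A plain-Python sketch of the pairwise matrix on toy vectors (made-up values, not model output, and a hypothetical `cosine_matrix` helper) looks like this:

```python
import math


def cosine_matrix(vectors: list[list[float]]) -> list[list[float]]:
    """Pairwise cosine similarity between all rows of `vectors`."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [
            sum(a * b for a, b in zip(u, v)) / (norms[i] * norms[j])
            for j, v in enumerate(vectors)
        ]
        for i, u in enumerate(vectors)
    ]


# Toy 2-dimensional vectors (assumed values).
vectors = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
for row in cosine_matrix(vectors):
    print([round(score, 2) for score in row])
```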

In [None]:
ax = sns.heatmap(df, annot=True, cmap="crest")
ax.xaxis.tick_top()
ax.set_xticklabels(text, rotation=90)

## Generate Image Embeddings

In [None]:
# Image embeddings with default 1408 dimension
image_path = "gs://github-repo/embeddings/getting_started_embeddings/gms_images/GGOEACBA104999.jpg"
print(get_public_url_from_gcs(image_path))

image_emb = get_image_embedding(
    image_path=image_path,
)
print("length of embedding: ", len(image_emb))
print("First five values are: ", image_emb[:5])

### Find product images based on text search query

In [None]:
# get product list with pre-computed image embeddings
product_image_list = pd.read_csv(
    "https://storage.googleapis.com/github-repo/embeddings/getting_started_embeddings/image_data_with_embeddings.csv"
)
product_image_list.head()

In [None]:
# calc_scores for a text query
query_emb = get_text_embedding("something related to dinosaurs theme")
print_similar_images(query_emb, product_image_list)

In [None]:
query_emb = get_text_embedding("Socks in checkered patterns")
print_similar_images(query_emb, product_image_list)

## Generate Video Embeddings

In [None]:
# Video embeddings with 1408 dimension
video_path = "gs://github-repo/embeddings/getting_started_embeddings/UCF-101-subset/BrushingTeeth/v_BrushingTeeth_g01_c02.mp4"
display_video_from_gcs(video_path)

video_emb = get_video_embedding(
    video_path=video_path,
)

print("length of embedding: ", len(video_emb[0]))
print("First five values of the first segment are: ", video_emb[0][:5])
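
`get_video_embedding` returns one vector per video segment rather than a single vector for the whole clip. Two simple ways to compare a query against a multi-segment video are to average the segment vectors or to take the best-scoring segment. The sketch below shows both on toy vectors. These pooling choices and the helper names are illustrative assumptions, not something the API performs for you.

```python
def mean_pool(segment_embs: list[list[float]]) -> list[float]:
    """Average segment vectors element-wise into one vector per video."""
    count = len(segment_embs)
    return [sum(values) / count for values in zip(*segment_embs)]


def best_segment(
    query_emb: list[float], segment_embs: list[list[float]]
) -> tuple[int, float]:
    """Return (index, dot-product score) of the segment closest to the query."""
    scores = [sum(q * s for q, s in zip(query_emb, seg)) for seg in segment_embs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy 2-dimensional segment vectors (assumed values, not API output).
segments = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(mean_pool(segments))
print(best_segment([0.0, 1.0], segments))
```

Averaging gives one compact vector per video but can blur distinct scenes, while the best-segment score keeps the location of the match within the video.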

### Find videos based on text search query

In [None]:
video_list = pd.read_csv(
    "https://storage.googleapis.com/github-repo/embeddings/getting_started_embeddings/video_data_with_embeddings.csv"
)
print(f"Items in the video list: {len(video_list)}")
video_list.head()

### Find similar videos

In [None]:
query_emb = get_text_embedding("A music concert")
print_similar_videos(query_emb, video_list)

In [None]:
query_emb = get_text_embedding("A person playing a TaiChi")
print_similar_videos(query_emb, video_list)

## What's next?

- Learn how to store the vectors (embeddings) into Vertex AI Vector Search: [Notebook](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/embeddings/vector-search-quickstart.ipynb)
- Learn how to tune the embeddings with your own data: [Notebook](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/embeddings/intro_embeddings_tuning.ipynb)
- Learn how to use embeddings to do Text RAG and Multimodal RAG: [Notebook](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/intro_multimodal_rag.ipynb)