
In [None]:
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Cloud Next Embedding models


<table align="left">

  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/generative_ai/text_embedding_api_cloud_next_new_models.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/generative_ai/text_embedding_api_cloud_next_new_models.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/community/generative_ai/text_embedding_api_cloud_next_new_models.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in Vertex AI Workbench
    </a>
  </td>
</table>

**_NOTE_**: This notebook has been tested in the following environment:

* Python version = 3.10

## Overview

This notebook shows how to call the newly released text embedding models (`textembedding-gecko@latest` and `textembedding-gecko-multilingual@latest`).

Learn more about the [text embedding API](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings).

This tutorial uses the following Google Cloud ML services and resources:
- Vertex LLM SDK

The steps performed include:
- Installation and imports
- Generate embeddings


### Costs

This tutorial uses billable components of Google Cloud:

* Vertex AI

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing),
and use the [Pricing Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

## Before you begin

### Set up your Google Cloud project

**The following steps are required, regardless of your notebook environment.**

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.

2. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).

3. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

4. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).

### Authenticate your Google Cloud account

Depending on your Jupyter environment, you may have to manually authenticate. Follow the relevant instructions below.

**1. Vertex AI Workbench**
* No action is needed; you are already authenticated.

**2. Local JupyterLab instance, uncomment and run:**

In [None]:
# ! gcloud auth login

**3. Colab, uncomment and run:**

In [None]:
# from google.colab import auth
# auth.authenticate_user()

## Installation

Install the Vertex AI SDK for Python from its GitHub repository; this notebook requires it.

**Remember to restart the runtime after installation.**

In [None]:
!pip install git+https://github.com/googleapis/python-aiplatform.git

### Please restart the runtime.
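
After restarting, it can help to confirm that the SDK is visible to the new runtime. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and it assumes the SDK is published under the distribution name `google-cloud-aiplatform`.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the Vertex AI SDK is distributed as "google-cloud-aiplatform".
sdk_version = installed_version("google-cloud-aiplatform")
print(sdk_version or "google-cloud-aiplatform is not installed in this runtime")
```

If this prints the "not installed" message, rerun the installation cell and restart the runtime again.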

### Import libraries

In [None]:
import vertexai
from vertexai.language_models import TextEmbeddingInput, TextEmbeddingModel

#### Set your project ID and initialize Vertex AI

**If you don't know your project ID**, try the following:
* Run `gcloud config list`.
* Run `gcloud projects list`.
* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)
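
Some environments also expose the project ID through an environment variable. The sketch below is one possible way to look it up; the helper `resolve_project_id` is hypothetical, and the variable names `GOOGLE_CLOUD_PROJECT` and `GCLOUD_PROJECT` are assumptions that may not be set in your environment, in which case the function returns an empty string.

```python
import os


def resolve_project_id(explicit: str = "") -> str:
    """Prefer an explicit ID, then fall back to environment variables."""
    if explicit:
        return explicit
    # Assumption: these variables are set by some, but not all, environments.
    for var in ("GOOGLE_CLOUD_PROJECT", "GCLOUD_PROJECT"):
        value = os.environ.get(var, "")
        if value:
            return value
    return ""


suggested_project_id = resolve_project_id()
print(suggested_project_id or "No project ID found; set PROJECT_ID manually.")
```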

In [None]:
PROJECT_ID = ""  # @param {type:"string"}
REGION = "us-central1"

# Set the active gcloud project.
! gcloud config set project {PROJECT_ID}

# Initialize the Vertex AI SDK.
vertexai.init(project=PROJECT_ID, location=REGION)

## Generate embeddings

In [None]:
# Set the model name.
MODEL_NAME = "textembedding-gecko@latest"  # @param ["textembedding-gecko@latest", "textembedding-gecko-multilingual@latest"]

# Set the task_type, text and optional title as the model inputs.
TASK_TYPE = "RETRIEVAL_DOCUMENT"  # @param ["RETRIEVAL_QUERY", "RETRIEVAL_DOCUMENT", "SEMANTIC_SIMILARITY", "CLASSIFICATION", "CLUSTERING"]
TITLE = "Google"  # @param {type:"string"}
TEXT = "Embed text."  # @param {type:"string"}

# Verify the input is valid.
if not MODEL_NAME:
    raise ValueError("Please set MODEL_NAME.")
if not TASK_TYPE:
    raise ValueError("Please set TASK_TYPE.")
if not TEXT:
    raise ValueError("Please set TEXT.")
if TITLE and TASK_TYPE != "RETRIEVAL_DOCUMENT":
    raise ValueError("Title can only be provided if the task_type is RETRIEVAL_DOCUMENT")
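
If you plan to embed many inputs, the checks above can be wrapped in a reusable function. This is a minimal sketch: the name `validate_inputs` and the `VALID_TASK_TYPES` set are hypothetical, with the task types copied from the dropdown options in the cell above. Rejecting unknown task types is an extra check that the original cell does not perform.

```python
# Task types copied from the notebook's dropdown options.
VALID_TASK_TYPES = {
    "RETRIEVAL_QUERY",
    "RETRIEVAL_DOCUMENT",
    "SEMANTIC_SIMILARITY",
    "CLASSIFICATION",
    "CLUSTERING",
}


def validate_inputs(model_name: str, task_type: str, text: str, title: str = "") -> None:
    """Raise ValueError if the embedding inputs are missing or inconsistent."""
    if not model_name:
        raise ValueError("Please set MODEL_NAME.")
    if not task_type:
        raise ValueError("Please set TASK_TYPE.")
    if task_type not in VALID_TASK_TYPES:
        # Extra check, not present in the original cell.
        raise ValueError(f"Unknown task_type: {task_type}")
    if not text:
        raise ValueError("Please set TEXT.")
    if title and task_type != "RETRIEVAL_DOCUMENT":
        raise ValueError(
            "Title can only be provided if the task_type is RETRIEVAL_DOCUMENT"
        )


# Passes silently: a title is allowed with RETRIEVAL_DOCUMENT.
validate_inputs("textembedding-gecko@latest", "RETRIEVAL_DOCUMENT", "Embed text.", "Google")
```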

In [None]:
def text_embedding(
    model_name: str, task_type: str, text: str, title: str = ""
) -> list[float]:
    """Generate a text embedding with a Vertex AI text embedding model."""
    model = TextEmbeddingModel.from_pretrained(model_name)

    text_embedding_input = TextEmbeddingInput(
        task_type=task_type, title=title, text=text
    )
    embeddings = model.get_embeddings([text_embedding_input])
    # One input was sent, so the first result holds its embedding vector.
    return embeddings[0].values

embedding = text_embedding(
    model_name=MODEL_NAME, task_type=TASK_TYPE, text=TEXT, title=TITLE)
print(len(embedding))

768
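
The printed length shows that the model returned a 768-dimensional vector. A common next step is to compare two such vectors with cosine similarity. The following is a minimal pure-Python sketch; the helper `cosine_similarity` is hypothetical, and the short toy vectors stand in for real embeddings so that the example runs without calling the API.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length.")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for zero vectors.")
    return dot / (norm_a * norm_b)


# Toy vectors, not real model output.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

Values close to 1 indicate vectors pointing in nearly the same direction, and values close to 0 indicate unrelated directions. With real embeddings, you would pass the lists returned by `text_embedding` instead of the toy vectors.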
