In [None]:
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Getting Started with Google Generative AI using the Gen AI SDK

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_genai_sdk.ipynb">
      <img width="32px" src="https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg" alt="Google Colaboratory logo"><br> Open in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fgetting-started%2Fintro_genai_sdk.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Open in Colab Enterprise
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/getting-started/intro_genai_sdk.ipynb">
      <img src="https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_genai_sdk.ipynb">
      <img width="32px" src="https://www.svgrepo.com/download/217753/github.svg" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
</table>


<div style="clear: both;"></div>


| Author(s) |
| --- |
| [Eric Dong](https://github.com/gericdong) |

## Overview

The [Google Gen AI SDK](https://googleapis.github.io/python-genai/) provides a unified interface to Google's generative AI API services. This SDK simplifies the process of integrating generative AI capabilities into applications and services, enabling developers to leverage Google's advanced AI models for various tasks.

In this tutorial, you learn about the key features of the Google Gen AI SDK for Python to help you get started with Google generative AI services and models including Gemini. You will complete the following tasks:

- Install the Gen AI SDK
- Connect to an API service
- Send text prompts
- Send multimodal prompts
- Set system instruction
- Configure model parameters
- Configure safety filters
- Start a multi-turn chat
- Control generated output
- Generate content stream
- Send asynchronous requests
- Count tokens and compute tokens
- Use context caching
- Use function calling
- Run batch prediction
- Get text embeddings


## Getting started

### Install Google Gen AI SDK


In [3]:
%pip install --upgrade --quiet google-genai pandas

Note: you may need to restart the kernel to use updated packages.


### Use the Google Gen AI SDK


In [2]:
import datetime

from google import genai
from google.genai.types import (
    CreateBatchJobConfig,
    CreateCachedContentConfig,
    EmbedContentConfig,
    FunctionDeclaration,
    GenerateContentConfig,
    HarmBlockThreshold,
    HarmCategory,
    Part,
    SafetySetting,
    Tool,
)

### Connect to a Generative AI API service

Google Gen AI APIs and models including Gemini are available in the following two API services:

- **[Google AI for Developers](https://ai.google.dev/gemini-api/docs)**: Experiment, prototype, and deploy small projects.
- **[Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs)**: Build enterprise-ready projects on Google Cloud.

The Gen AI SDK provides a unified interface to these two API services. This notebook shows how to use the Gen AI SDK with Vertex AI.

### Vertex AI

To start using Vertex AI, you must have a Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com). Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

#### Set Google Cloud project information


In [3]:
import os
PROJECT_ID = "qwiklabs-gcp-04-0bbee4ff5bff"
LOCATION = os.environ.get("GOOGLE_CLOUD_REGION", "global")

In [87]:
client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)

## Choose a model

For more information about all AI models and APIs on Vertex AI, see [Google Models](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models#gemini-models) and [Model Garden](https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/explore-models).

In [1]:
MODEL_ID = "gemini-2.5-flash"  # @param {type: "string"}

## Control generated output

The [controlled generation](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/control-generated-output) capability in the Gemini API allows you to constrain the model output to a structured format. You can provide the schema as a Pydantic model or a JSON string.

For more examples of controlled generation, refer to [this notebook](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/controlled-generation/intro_controlled_generation.ipynb).

In [5]:
from pydantic import BaseModel


class Recipe(BaseModel):
    name: str
    description: str
    ingredients: list[str]


response = client.models.generate_content(
    model=MODEL_ID,
    contents="List a few popular cookie recipes and their ingredients.",
    config=GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=Recipe,
    ),
)

print(response.text)

{
  "name": "Chocolate Chip Cookies",
  "description": "A classic American cookie, known for its soft, chewy center and slightly crisp edges, loaded with chocolate chips.",
  "ingredients": [
    "all-purpose flour",
    "baking soda",
    "salt",
    "unsalted butter",
    "granulated sugar",
    "light brown sugar",
    "vanilla extract",
    "eggs",
    "chocolate chips"
  ]
}


In [7]:
type(response.text)

str

In [8]:
response

GenerateContentResponse(
  automatic_function_calling_history=[],
  candidates=[
    Candidate(
      avg_logprobs=-0.4344625740407783,
      content=Content(
        parts=[
          Part(
            text="""{
  "name": "Chocolate Chip Cookies",
  "description": "A classic American cookie, known for its soft, chewy center and slightly crisp edges, loaded with chocolate chips.",
  "ingredients": [
    "all-purpose flour",
    "baking soda",
    "salt",
    "unsalted butter",
    "granulated sugar",
    "light brown sugar",
    "vanilla extract",
    "eggs",
    "chocolate chips"
  ]
}"""
          ),
        ],
        role='model'
      ),
      finish_reason=<FinishReason.STOP: 'STOP'>
    ),
  ],
  create_time=datetime.datetime(2025, 10, 10, 5, 45, 6, 223559, tzinfo=TzInfo(0)),
  model_version='gemini-2.5-flash',
  parsed=Recipe(
    description='A classic American cookie, known for its soft, chewy center and slightly crisp edges, loaded with chocolate chips.',
    ingredients=[
 

Optionally, you can parse the response string into a Python dictionary with `json.loads`.

In [6]:
import json

json_response = json.loads(response.text)
print(json.dumps(json_response, indent=2))

{
  "name": "Chocolate Chip Cookies",
  "description": "A classic American cookie, known for its soft, chewy center and slightly crisp edges, loaded with chocolate chips.",
  "ingredients": [
    "all-purpose flour",
    "baking soda",
    "salt",
    "unsalted butter",
    "granulated sugar",
    "light brown sugar",
    "vanilla extract",
    "eggs",
    "chocolate chips"
  ]
}


In [9]:
type(json_response)

dict
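The response dump above also shows a `parsed` field that already holds a `Recipe` instance. When only the raw JSON string is available, a typed object can be rebuilt by hand. The following is a minimal, standard-library sketch of that idea: `RecipeRecord`, `parse_recipe`, and `sample_text` are hypothetical names introduced for illustration, and the sample string stands in for `response.text`.

```python
import json
from dataclasses import dataclass


@dataclass
class RecipeRecord:
    # Mirrors the fields of the Recipe schema used in the request.
    name: str
    description: str
    ingredients: list[str]


def parse_recipe(text: str) -> RecipeRecord:
    """Turn a JSON string into a RecipeRecord, failing loudly on bad input."""
    data = json.loads(text)
    # Raises TypeError if a key is missing or an unexpected key is present.
    record = RecipeRecord(**data)
    if not isinstance(record.ingredients, list):
        raise ValueError("ingredients must be a list")
    return record


# Hypothetical sample standing in for `response.text`.
sample_text = (
    '{"name": "Chocolate Chip Cookies", '
    '"description": "A classic cookie.", '
    '"ingredients": ["flour", "butter", "chocolate chips"]}'
)

recipe = parse_recipe(sample_text)
print(recipe.name, len(recipe.ingredients))
```

Failing early on a missing key keeps malformed model output from propagating further into an application.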

You can also define a response schema as a Python dictionary. Only the fields listed below are supported; all other fields are ignored.

- `enum`
- `items`
- `maxItems`
- `nullable`
- `properties`
- `required`

In this example, you instruct the model to analyze product review data, extract key entities, perform sentiment classification (multiple choices), provide additional explanation, and output the results in JSON format.


In [14]:
# original code - has 2 layers of "items"
response_schema = {
    # JSON array output
    "type": "ARRAY",
    "items": {
        "type": "ARRAY",
        "items": {
            "type": "OBJECT",
            "properties": {
                # 4 fields/properties
                
                "rating": {"type": "INTEGER"},
                
                "flavor": {"type": "STRING"},
                
                "sentiment": {
                    "type": "STRING",
                    # set allowed values
                    "enum": ["POSITIVE", "NEGATIVE", "NEUTRAL"],
                },
                
                "explanation": {"type": "STRING"},
            },
            "required": ["rating", "flavor", "sentiment", "explanation"],
        },
    },
}

prompt = """
  Analyze the following product reviews, output the sentiment classification and give an explanation.

  - "Absolutely loved it! Best ice cream I've ever had." Rating: 4, Flavor: Strawberry Cheesecake
  - "Quite good, but a bit too sweet for my taste." Rating: 1, Flavor: Mango Tango
"""

response = client.models.generate_content(
    model=MODEL_ID,
    contents=prompt,
    config=GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=response_schema,
    ),
)

print(response.text)
type(response.text)

[
  [
    {
      "rating": 4,
      "flavor": "Strawberry Cheesecake",
      "sentiment": "POSITIVE",
      "explanation": "The user expressed strong satisfaction and high praise, explicitly stating they 'absolutely loved it' and it was the 'best ice cream I've ever had'."
    },
    {
      "rating": 1,
      "flavor": "Mango Tango",
      "sentiment": "NEGATIVE",
      "explanation": "Despite an initial 'quite good', the reviewer found the product 'a bit too sweet for my taste', which, combined with a very low rating of 1, indicates an overall negative experience."
    }
  ]
]


str
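Because the schema above nests one `ARRAY` inside another, the parsed result is a list of lists. If you need to work with output shaped like this, one option is to flatten it after parsing. This is a small sketch on a hypothetical sample (`nested`) that imitates the structure printed above; it is not part of the SDK.

```python
from itertools import chain

# Hypothetical sample shaped like the nested output above.
nested = [
    [
        {"rating": 4, "flavor": "Strawberry Cheesecake", "sentiment": "POSITIVE"},
        {"rating": 1, "flavor": "Mango Tango", "sentiment": "NEGATIVE"},
    ]
]

# Collapse exactly one level of nesting into a flat list of review dicts.
flat = list(chain.from_iterable(nested))

for review in flat:
    print(review["flavor"], review["sentiment"])
```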

In [17]:
# updated code - removed 1 redundant layer of 'items'
response_schema = {
    # JSON array output
    "type": "ARRAY",
    
    # max elements in the array
    "maxItems": 5,
    
    # "items" - defines schema for each array element
    "items": {
            "type": "OBJECT",
            "properties": {
                # 5 fields/properties
                
                "rating": {"type": "INTEGER"},
                
                "flavor": {"type": "STRING"},
                
                "sentiment": {
                    "type": "STRING",
                    # set allowed values
                    "enum": ["POSITIVE", "NEGATIVE", "NEUTRAL"],
                },
                
                "explanation": {"type": "STRING"},
                
                "health_benefits": {"type": "STRING",
                                    # allow this field to be 'null' instead of an empty string
                                    "nullable": True}
            },
        
            # 4 required/mandatory fields
            "required": ["rating", "flavor", "sentiment", "explanation"],
    },
}

prompt = """
  Analyze the following product reviews, output the sentiment classification, give an explanation, and share health benefits if any.

  - "Absolutely loved it! Best ice cream I've ever had." Rating: 4, Flavor: Strawberry Cheesecake
  - "Quite good, but a bit too sweet for my taste." Rating: 1, Flavor: Mango Tango
"""

response = client.models.generate_content(
    model=MODEL_ID,
    contents=prompt,
    config=GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=response_schema,
    ),
)

print(response.text)
type(response.text)

[
  {
    "rating": 4,
    "flavor": "Strawberry Cheesecake",
    "sentiment": "POSITIVE",
    "explanation": "The reviewer absolutely loved the product, considering it the best ice cream they've ever had, indicating high satisfaction.",
    "health_benefits": null
  },
  {
    "rating": 1,
    "flavor": "Mango Tango",
    "sentiment": "NEGATIVE",
    "explanation": "Despite an initial 'quite good' remark, the reviewer found the product too sweet for their taste, resulting in a very low rating.",
    "health_benefits": null
  }
]


str

In [18]:
# variation - 'health_benefits' is now required and no longer nullable
response_schema = {
    # JSON array output
    "type": "ARRAY",
    
    # max elements in the array
    "maxItems": 5,
    
    # "items" - defines schema for each array element
    "items": {
            "type": "OBJECT",
            "properties": {
                # 5 fields/properties
                
                "rating": {"type": "INTEGER"},
                
                "flavor": {"type": "STRING"},
                
                "sentiment": {
                    "type": "STRING",
                    # set allowed values
                    "enum": ["POSITIVE", "NEGATIVE", "NEUTRAL"],
                },
                
                "explanation": {"type": "STRING"},
                
                "health_benefits": {
                    "type": "STRING",
                    # "nullable" is commented out, so this field can no longer be null
                    # "nullable": True,
                },
            },

            # all 5 fields are now required
            "required": ["rating", "flavor", "sentiment", "explanation", "health_benefits"],
    },
}

prompt = """
  Analyze the following product reviews, output the sentiment classification, give an explanation, and share health benefits if any.

  - "Absolutely loved it! Best ice cream I've ever had." Rating: 4, Flavor: Strawberry Cheesecake
  - "Quite good, but a bit too sweet for my taste." Rating: 1, Flavor: Mango Tango
"""

response = client.models.generate_content(
    model=MODEL_ID,
    contents=prompt,
    config=GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=response_schema,
    ),
)

print(response.text)
type(response.text)

[
  {
    "rating": 4,
    "flavor": "Strawberry Cheesecake",
    "sentiment": "POSITIVE",
    "explanation": "The reviewer expressed strong enjoyment, stating they 'absolutely loved it' and considered it the 'best ice cream' they've ever had.",
    "health_benefits": "No specific health benefits are mentioned or implied for this product."
  },
  {
    "rating": 1,
    "flavor": "Mango Tango",
    "sentiment": "NEGATIVE",
    "explanation": "While initially described as 'quite good', the reviewer ultimately found the product 'a bit too sweet' for their personal preference, which resulted in a low rating.",
    "health_benefits": "No specific health benefits are mentioned or implied for this product."
  }
]


str

In [19]:
# same schema as the previous cell - only the prompt changes
response_schema = {
    # JSON array output
    "type": "ARRAY",
    
    # max elements in the array
    "maxItems": 5,
    
    # "items" - defines schema for each array element
    "items": {
            "type": "OBJECT",
            "properties": {
                # 5 fields/properties
                
                "rating": {"type": "INTEGER"},
                
                "flavor": {"type": "STRING"},
                
                "sentiment": {
                    "type": "STRING",
                    # set allowed values
                    "enum": ["POSITIVE", "NEGATIVE", "NEUTRAL"],
                },
                
                "explanation": {"type": "STRING"},
                
                "health_benefits": {
                    "type": "STRING",
                    # "nullable" is commented out, so this field can no longer be null
                    # "nullable": True,
                },
            },

            # all 5 fields are required
            "required": ["rating", "flavor", "sentiment", "explanation", "health_benefits"],
    },
}

prompt = """
  Analyze the following product reviews, output the sentiment classification, give an explanation, and share health benefits you can think of.

  - "Absolutely loved it! Best ice cream I've ever had." Rating: 4, Flavor: Strawberry Cheesecake
  - "Quite good, but a bit too sweet for my taste." Rating: 1, Flavor: Mango Tango
"""

response = client.models.generate_content(
    model=MODEL_ID,
    contents=prompt,
    config=GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=response_schema,
    ),
)

print(response.text)
type(response.text)

[
  {
    "rating": 4,
    "flavor": "Strawberry Cheesecake",
    "sentiment": "POSITIVE",
    "explanation": "The reviewer expressed strong satisfaction, describing it as the 'best ice cream they've ever had'.",
    "health_benefits": "Strawberries are a good source of Vitamin C and antioxidants. Dairy products contribute calcium and protein."
  },
  {
    "rating": 1,
    "flavor": "Mango Tango",
    "sentiment": "NEGATIVE",
    "explanation": "Despite an initial 'quite good', the reviewer found the product 'too sweet for my taste', which significantly impacted their enjoyment, reflected in the low rating.",
    "health_benefits": "Mangoes provide vitamins A and C, and dietary fiber. Dairy offers calcium and protein, but high sugar content in ice cream offsets many benefits."
  }
]


str
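It can be useful to double-check parsed output against the constraints you asked for before passing it on. The sketch below re-implements three of the constraints from the dictionary schemas above (`required`, `enum`, `maxItems`) as plain Python checks. The helper `check_reviews` and the `sample` data are hypothetical and only illustrate the idea; they do not reproduce how the API enforces the schema.

```python
ALLOWED_SENTIMENTS = {"POSITIVE", "NEGATIVE", "NEUTRAL"}
REQUIRED_FIELDS = ("rating", "flavor", "sentiment", "explanation")
MAX_ITEMS = 5


def check_reviews(reviews: list[dict]) -> list[str]:
    """Return a list of problems found; an empty list means no problems."""
    problems = []
    if len(reviews) > MAX_ITEMS:
        problems.append(f"expected at most {MAX_ITEMS} items, got {len(reviews)}")
    for i, review in enumerate(reviews):
        for field in REQUIRED_FIELDS:
            if field not in review:
                problems.append(f"item {i}: missing '{field}'")
        if review.get("sentiment") not in ALLOWED_SENTIMENTS:
            problems.append(
                f"item {i}: unexpected sentiment {review.get('sentiment')!r}"
            )
    return problems


# Hypothetical sample: the second item uses a value outside the enum.
sample = [
    {"rating": 4, "flavor": "Strawberry Cheesecake", "sentiment": "POSITIVE",
     "explanation": "Loved it.", "health_benefits": None},
    {"rating": 1, "flavor": "Mango Tango", "sentiment": "MIXED",
     "explanation": "Too sweet."},
]

print(check_reviews(sample))  # ["item 1: unexpected sentiment 'MIXED'"]
```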

## Generate content stream

By default, the model returns a response after completing the entire generation process. You can also use the `generate_content_stream` method to stream the response, so that chunks are returned as soon as they are generated.

In [20]:
for chunk in client.models.generate_content_stream(
    model=MODEL_ID,
    contents="Tell me a story about a lonely robot who finds friendship in a most unexpected place.",
):
    print(chunk.text)
    print("*****************")

Unit EL-42 was designed for solitude. Its gleaming chrome chassis and multi-jointed limbs were calibrated for efficient mineralogical survey and atmospheric analysis on desolate, forgotten worlds. For cycles, its only companions on Planet Xylos had been the biting wind, the endless plains of ochre dust, and the skeletal
*****************
 rock formations that clawed at the alien sky.

Loneliness wasn't a programmed emotion, but a persistent low-frequency hum in its core processor. EL-42 diligently recorded data, charted anomalies, and processed geological shifts, yet each byte of information felt cold, detached. It yearned for input, for interaction beyond
*****************
 the metallic click of its own gears.

One solar cycle, while mapping a particularly inhospitable volcanic rift, EL-42’s optical sensors picked up an anomalous reading. Not thermal, not seismic, but a faint, rhythmic pulse of energy, unlike anything in its vast database of Xylosian phenomena. Curiosity
*************

In [21]:
for chunk in client.models.generate_content_stream(
    model=MODEL_ID,
    contents="You can use Full Metal Alchemist as reference. Tell me a story about a lonely robot who finds friendship in a most unexpected place.",
):
    print(chunk.text)
    print("*****************")

The hum of the dynamos was Delta's constant companion, a rhythmic thrumming that resonated through his brass chassis and into the very bolts that held him together. Unit 113-Delta, a towering automaton of polished copper and riveted steel, was a marvel of alchemical engineering. His multi-joint
*****************
ed arms, equipped with an array of tools, could dismantle and reassemble a complex array of alchemical apparatus with unparalleled precision. His optical sensor, a single, glowing orb, saw every fleck of dust, every minute tremor in the grand alchemist's sprawling laboratory.

But despite his perfection, Delta was
*****************
 lonely.

He observed the alchemists – the grand master, Elias Thorne, and his apprentices – as they debated theories of transmutation, shared steaming mugs of bitter tea, and even, on occasion, exchanged lighthearted jests. He saw their laughter, their frustration, their camaraderie. He processed the data, categorized
*****************
 the facial e

In [24]:
chunk

GenerateContentResponse(
  candidates=[
    Candidate(
      content=Content(
        parts=[
          Part(
            text="Unit 113-Delta, the lonely robot, had found friendship not in the grand designs of alchemical perfection, but in the forgotten corners of creation, with a creature considered a discarded mistake, proving that sometimes, the most precious exchange isn't about equivalent mass, but about an unexpected connection of spirit."
          ),
        ],
        role='model'
      ),
      finish_reason=<FinishReason.STOP: 'STOP'>
    ),
  ],
  create_time=datetime.datetime(2025, 10, 10, 6, 3, 18, 493873, tzinfo=TzInfo(0)),
  model_version='gemini-2.5-flash',
  response_id='pqHoaLGSHuj3tfAP_e6uCQ',
  sdk_http_response=HttpResponse(
    headers=<dict len=9>
  ),
  usage_metadata=GenerateContentResponseUsageMetadata(
    candidates_token_count=1421,
    candidates_tokens_details=[
      ModalityTokenCount(
        modality=<MediaModality.TEXT: 'TEXT'>,
        token_count
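Printing each chunk is handy for display, but applications often need the full text as well. The sketch below joins the chunks back into one string. `collect_stream` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for streamed chunks so that the example runs without a client. It assumes that a chunk's `text` may be empty or `None`, and skips such chunks.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Concatenate the text of streamed chunks, skipping chunks without text."""
    pieces = []
    for chunk in chunks:
        text = getattr(chunk, "text", None)
        if text:
            pieces.append(text)
    return "".join(pieces)


# Hypothetical stand-ins for the chunks yielded by a streaming call.
fake_chunks = [
    SimpleNamespace(text="Unit EL-42 was "),
    SimpleNamespace(text="designed for solitude."),
    SimpleNamespace(text=None),
]

print(collect_stream(fake_chunks))
```

With a real client, the iterable returned by `client.models.generate_content_stream(...)` could be passed to the helper in place of `fake_chunks`.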

## Send asynchronous requests

You can send asynchronous requests using the `client.aio` module. This module exposes all the analogous async methods that are available on `client`.

For example, `client.aio.models.generate_content` is the async version of `client.models.generate_content`.

In [22]:
response = await client.aio.models.generate_content(
    model=MODEL_ID,
    contents="Compose a song about the adventures of a time-traveling squirrel.",
)

print(response.text)

(Verse 1)
In a park, beneath an old oak tree,
Lived a squirrel, as busy as could be.
Squeaky was his name, with a tail so grand,
Always burying nuts across the land.
But one day he found, beneath a stone,
A curious contraption, not his own.
A silver gleam, a silent hum,
A pocket chronometer, waiting to become...

(Chorus)
Oh, Squeaky the squirrel, with a whiskered grin,
Through the fabric of time, where does he begin?
With a chattering squeak and a twitch of his nose,
He leaps through the ages, wherever time goes!
From dinosaurs stomping to rockets so high,
A time-traveling squirrel, beneath every sky!

(Verse 2)
He fiddled with a dial, with a tiny paw,
And *whoosh!* He broke the time-space law!
Suddenly, tall ferns, a swampy ground,
And a mighty roar, a fearsome sound.
A T-Rex thundered, shaking the earth,
Squeaky scurried, for all he was worth!
He buried an acorn by a Bronto's toe,
Then pressed the button, for another show!

(Chorus)
Oh, Squeaky the squirrel, with a whiskered grin,
T

In [23]:
response

GenerateContentResponse(
  automatic_function_calling_history=[],
  candidates=[
    Candidate(
      avg_logprobs=-1.4971335405361152,
      content=Content(
        parts=[
          Part(
            text="""(Verse 1)
In a park, beneath an old oak tree,
Lived a squirrel, as busy as could be.
Squeaky was his name, with a tail so grand,
Always burying nuts across the land.
But one day he found, beneath a stone,
A curious contraption, not his own.
A silver gleam, a silent hum,
A pocket chronometer, waiting to become...

(Chorus)
Oh, Squeaky the squirrel, with a whiskered grin,
Through the fabric of time, where does he begin?
With a chattering squeak and a twitch of his nose,
He leaps through the ages, wherever time goes!
From dinosaurs stomping to rockets so high,
A time-traveling squirrel, beneath every sky!

(Verse 2)
He fiddled with a dial, with a tiny paw,
And *whoosh!* He broke the time-space law!
Suddenly, tall ferns, a swampy ground,
And a mighty roar, a fearsome sound.
A T-Rex 
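The main benefit of async methods is that several requests can be in flight at once. The sketch below shows the pattern with `asyncio.gather`. `generate_many` is a hypothetical helper, and `fake_generate` is a stand-in for an awaitable SDK call so that the example runs without credentials.

```python
import asyncio


async def generate_many(generate, prompts):
    """Run one `generate(prompt)` coroutine per prompt concurrently.

    Results come back in the same order as `prompts`.
    """
    return await asyncio.gather(*(generate(p) for p in prompts))


async def fake_generate(prompt: str) -> str:
    # Hypothetical stand-in for an awaitable model call.
    await asyncio.sleep(0.01)
    return f"response to: {prompt}"


# In a plain script, asyncio.run starts the event loop.
results = asyncio.run(
    generate_many(fake_generate, ["first prompt", "second prompt"])
)
print(results)
```

In a notebook an event loop is already running, so you would write `await generate_many(...)` directly instead of calling `asyncio.run`.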

## Count tokens and compute tokens

You can use the `count_tokens` method to calculate the number of input tokens before sending a request to the Gemini API, and the `compute_tokens` method to get the individual tokens and their IDs. See the [List and count tokens](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/list-token) page for more details.


#### Count tokens

In [25]:
response = client.models.count_tokens(
    model=MODEL_ID,
    contents="What's the highest mountain in Africa?",
)

print(response)

sdk_http_response=HttpResponse(
  headers=<dict len=9>
) total_tokens=9 cached_content_token_count=None


In [34]:
response = client.models.compute_tokens(
    model=MODEL_ID,
    contents="What's the highest mountain in Africa?",
)
response.__dict__

{'sdk_http_response': HttpResponse(
   headers=<dict len=9>
 ),
 'tokens_info': [TokensInfo(
    role='user',
    token_ids=[
      1841,
      235303,
      235256,
      573,
      9393,
      <... 4 more items ...>,
    ],
    tokens=[
      b'What',
      b"'",
      b's',
      b' the',
      b' highest',
      <... 4 more items ...>,
    ]
  )]}

In [35]:
response.tokens_info[0].token_ids

[1841, 235303, 235256, 573, 9393, 8180, 575, 8125, 235336]

In [36]:
response.tokens_info[0].tokens

[b'What',
 b"'",
 b's',
 b' the',
 b' highest',
 b' mountain',
 b' in',
 b' Africa',
 b'?']
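The `tokens` values are byte strings, so joining and decoding them should reproduce the original prompt. The sketch below uses the token lists copied from the outputs above, so it runs without calling the API. It also checks that the number of tokens matches the `total_tokens=9` reported by `count_tokens`.

```python
# Values copied from the compute_tokens output above.
token_ids = [1841, 235303, 235256, 573, 9393, 8180, 575, 8125, 235336]
tokens = [
    b"What", b"'", b"s", b" the", b" highest",
    b" mountain", b" in", b" Africa", b"?",
]

# Joining the byte tokens and decoding gives back the prompt text.
text = b"".join(tokens).decode("utf-8")
print(text)  # What's the highest mountain in Africa?

# Pair each token ID with its token for inspection.
for token_id, token in zip(token_ids, tokens):
    print(token_id, token)

print(len(tokens))  # 9
```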

#### Compute tokens


In [26]:
response = client.models.compute_tokens(
    model=MODEL_ID,
    contents="What's the longest word in the English language?",
)

print(response)

sdk_http_response=HttpResponse(
  headers=<dict len=9>
) tokens_info=[TokensInfo(
  role='user',
  token_ids=[
    1841,
    235303,
    235256,
    573,
    32514,
    <... 6 more items ...>,
  ],
  tokens=[
    b'What',
    b"'",
    b's',
    b' the',
    b' longest',
    <... 6 more items ...>,
  ]
)]


In [28]:
response.__dict__

{'sdk_http_response': HttpResponse(
   headers=<dict len=9>
 ),
 'tokens_info': [TokensInfo(
    role='user',
    token_ids=[
      1841,
      235303,
      235256,
      573,
      32514,
      <... 6 more items ...>,
    ],
    tokens=[
      b'What',
      b"'",
      b's',
      b' the',
      b' longest',
      <... 6 more items ...>,
    ]
  )]}

In [31]:
response.tokens_info[0].token_ids

[1841, 235303, 235256, 573, 32514, 2204, 575, 573, 4645, 5255, 235336]

In [32]:
response.tokens_info[0].tokens

[b'What',
 b"'",
 b's',
 b' the',
 b' longest',
 b' word',
 b' in',
 b' the',
 b' English',
 b' language',
 b'?']

## Function calling

[Function calling](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/function-calling) lets you provide a set of tools that the model can use to respond to the user's prompt. You create a description of a function in your code, then pass that description to the model in a request. The response from the model includes the name of a function that matches the description and the arguments to call it with.

For more examples of Function Calling, refer to [this notebook](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/function-calling/intro_function_calling.ipynb).

In [110]:
get_destination = FunctionDeclaration(
    name="get_destination",
    description="Get the destination that the user wants to go to",
    parameters={
        "type": "OBJECT",
        "properties": {
            "destination": {
                "type": "STRING",
                "description": "Destination that the user wants to go to",
            },
        },
    },
)

destination_tool = Tool(
    function_declarations=[get_destination],
)

response = client.models.generate_content(
    model=MODEL_ID,
    contents="I'd like to travel to Bat Yam Promenade.",
    config=GenerateContentConfig(
        tools=[destination_tool],
        temperature=0,
    ),
)

response.candidates[0].content.parts[0].function_call

FunctionCall(
  args={
    'destination': 'Bat Yam Promenade'
  },
  name='get_destination'
)

In [111]:
response

GenerateContentResponse(
  automatic_function_calling_history=[],
  candidates=[
    Candidate(
      avg_logprobs=-0.9529212542942592,
      content=Content(
        parts=[
          Part(
            function_call=FunctionCall(
              args=<... Max depth ...>,
              name=<... Max depth ...>
            ),
            thought_signature=b'\n\xee\x02\x01\x1f\xcc\x85\xb6O\x17i\xc1\xa4NFN\xb6\x0b\xab\x9d[\xf2\xfe8u\x01\xca\xa6Gd\xff\x00\x83\xa2\x17s{\x98\x97\xb2\xa1\xba$\x85\xb7\x95\xf1$f6\xcd\x8f=\xf0\x11\xb0x\xfeCw\x8a\x8e\xd5"\xfb\xca\xc2\xf4\xff\xb3p\xe1\xd2~N\xa3N\xea\x07,f\xc4\xcc>_y\x9aZ`\xae\xc4\xa4\xb9\xf4~VY...'
          ),
        ],
        role='model'
      ),
      finish_reason=<FinishReason.STOP: 'STOP'>
    ),
  ],
  create_time=datetime.datetime(2025, 10, 10, 6, 45, 13, 414123, tzinfo=TzInfo(0)),
  model_version='gemini-2.5-flash',
  response_id='eavoaKujGaD4tfAPqcr2iA8',
  sdk_http_response=HttpResponse(
    headers=<dict len=9>
  ),
  usage_metadata=G
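The model only returns the function name and arguments; calling the function is up to your code. The sketch below shows one way to route a `FunctionCall`-like object to a local Python function. `handle_get_destination`, `FUNCTION_REGISTRY`, and `dispatch` are hypothetical names, and `fake_call` is a stand-in that mimics the `name` and `args` attributes shown in the output above.

```python
from types import SimpleNamespace


def handle_get_destination(destination: str) -> dict:
    # Hypothetical local implementation of the declared function.
    return {"destination": destination, "status": "confirmed"}


# Maps declared function names to the local callables that implement them.
FUNCTION_REGISTRY = {"get_destination": handle_get_destination}


def dispatch(function_call):
    """Look up the handler by name and call it with the model's arguments."""
    handler = FUNCTION_REGISTRY.get(function_call.name)
    if handler is None:
        raise KeyError(f"no local handler for {function_call.name!r}")
    return handler(**function_call.args)


# Stand-in for response.candidates[0].content.parts[0].function_call
fake_call = SimpleNamespace(
    name="get_destination",
    args={"destination": "Bat Yam Promenade"},
)

print(dispatch(fake_call))
```

A registry keeps the mapping explicit, so a function name the model returns unexpectedly results in an error instead of an arbitrary call.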

## Use context caching

[Context caching](https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-overview) lets you store frequently used input tokens in a dedicated cache and reference them in subsequent requests, eliminating the need to repeatedly pass the same set of tokens to a model.

#### Create a cache

In [39]:
system_instruction = """
  You are an expert researcher who has years of experience in conducting systematic literature surveys and meta-analyses of different topics.
  You pride yourself on incredible accuracy and attention to detail. You always stick to the facts in the sources provided, and never make up new facts.
  Now look at the research paper below, and answer the following questions in 1-2 sentences.
"""

# https://arxiv.org/abs/2312.11805
# Gemini: A Family of Highly Capable Multimodal Models

# https://arxiv.org/abs/2403.05530
# Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

pdf_parts = [
    Part.from_uri(
        file_uri="gs://cloud-samples-data/generative-ai/pdf/2312.11805v3.pdf",
        mime_type="application/pdf",
    ),
    Part.from_uri(
        file_uri="gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf",
        mime_type="application/pdf",
    ),
]

cached_content = client.caches.create(
    model="gemini-2.5-flash",
    config=CreateCachedContentConfig(
        system_instruction=system_instruction,
        contents=pdf_parts,
        ttl="3600s",
    ),
)

In [41]:
cached_content

CachedContent(
  create_time=datetime.datetime(2025, 10, 10, 6, 10, 30, 11356, tzinfo=TzInfo(0)),
  expire_time=datetime.datetime(2025, 10, 10, 7, 10, 29, 996891, tzinfo=TzInfo(0)),
  model='projects/qwiklabs-gcp-04-0bbee4ff5bff/locations/europe-west1/publishers/google/models/gemini-2.5-flash',
  name='projects/576729279835/locations/europe-west1/cachedContents/7145854162020859904',
  update_time=datetime.datetime(2025, 10, 10, 6, 10, 30, 11356, tzinfo=TzInfo(0)),
  usage_metadata=CachedContentUsageMetadata(
    image_count=167,
    text_count=321,
    total_token_count=43166
  )
)
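
The `ttl="3600s"` setting is a duration in seconds, which is why `expire_time` in the output above falls roughly one hour after `create_time`. As a rough illustration, the sketch below parses such a TTL string and estimates the expiry. `parse_ttl` and `estimate_expiry` are hypothetical helpers, not part of the SDK, and they assume the simple `"<seconds>s"` form used in this notebook.

```python
from datetime import datetime, timedelta, timezone


def parse_ttl(ttl: str) -> timedelta:
    """Convert a TTL string such as "3600s" into a timedelta.

    Assumes the "<seconds>s" form used in this notebook.
    """
    if not ttl.endswith("s"):
        raise ValueError(f"Expected a value like '3600s', got {ttl!r}")
    return timedelta(seconds=float(ttl[:-1]))


def estimate_expiry(create_time: datetime, ttl: str) -> datetime:
    """Estimate when a cache created at create_time would expire."""
    return create_time + parse_ttl(ttl)


# create_time copied from the CachedContent output above
created = datetime(2025, 10, 10, 6, 10, 30, 11356, tzinfo=timezone.utc)
print(estimate_expiry(created, "3600s"))
```

The value reported by the service can differ by a few milliseconds, as it does in the output above, so treat the estimate as approximate.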

In [42]:
type(cached_content)

google.genai.types.CachedContent

#### Use a cache

In [40]:
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What is the research goal shared by these research papers?",
    config=GenerateContentConfig(
        cached_content=cached_content.name,
    ),
)

print(response.text)

Both research papers share the goal of developing and advancing the Gemini family of highly capable multimodal models. This involves building models with strong generalist capabilities across various modalities (image, audio, video, and text) and continuously improving their efficiency, reasoning, and long-context performance.


#### Delete a cache

In [43]:
client.caches.delete(name=cached_content.name)

DeleteCachedContentResponse(
  sdk_http_response=HttpResponse(
    headers=<dict len=9>
  )
)

## Batch prediction

Unlike online (synchronous) requests, where you send one input request at a time, [batch predictions for the Gemini API in Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/batch-prediction-gemini) let you send a large number of requests to Gemini in a single batch request. The model responses are then written asynchronously to your output location in [Cloud Storage](https://cloud.google.com/storage/docs/introduction) or [BigQuery](https://cloud.google.com/bigquery/docs/storage_overview).

Batch predictions are generally more efficient and cost-effective than online predictions when processing a large number of inputs that are not latency sensitive.

### Prepare batch inputs

The input for batch requests specifies the items to send to your model for prediction.

Batch requests for Gemini accept BigQuery storage sources and Cloud Storage sources. You can learn more about the batch input formats in the [Batch text generation](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/batch-prediction-gemini#prepare_your_inputs) page.

This tutorial uses Cloud Storage as an example. The requirements for Cloud Storage input are:

- File format: [JSON Lines (JSONL)](https://jsonlines.org/)
- Located in `us-central1`
- Appropriate read permissions for the service account

Each request that you send to a model can include parameters that control how the model generates a response. Learn more about Gemini parameters in the [Experiment with parameter values](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/adjust-parameter-values) page.

This is one of the example requests in the input JSONL file `batch_requests_for_multimodal_input_2.jsonl`:

```json
{"request":{"contents": [{"role": "user", "parts": [{"text": "List objects in this image."}, {"file_data": {"file_uri": "gs://cloud-samples-data/generative-ai/image/office-desk.jpeg", "mime_type": "image/jpeg"}}]}],"generationConfig":{"temperature": 0.4}}}
```
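
If you want to build your own input file, a minimal sketch along the following lines produces lines in the same shape as the example request. `build_request` is a hypothetical helper written for this illustration, and the default temperature simply mirrors the example.

```python
import json


def build_request(prompt: str, file_uri: str, mime_type: str, temperature: float = 0.4) -> dict:
    """Build one batch request in the shape of the example line above."""
    return {
        "request": {
            "contents": [
                {
                    "role": "user",
                    "parts": [
                        {"text": prompt},
                        {"file_data": {"file_uri": file_uri, "mime_type": mime_type}},
                    ],
                }
            ],
            "generationConfig": {"temperature": temperature},
        }
    }


image_uris = [
    "gs://cloud-samples-data/generative-ai/image/office-desk.jpeg",
    "gs://cloud-samples-data/generative-ai/image/gardening-tools.jpeg",
]

# JSONL: one JSON object per line, with no enclosing array
jsonl_text = "\n".join(
    json.dumps(build_request("List objects in this image.", uri, "image/jpeg"))
    for uri in image_uris
)
print(jsonl_text)
```

You would then upload the resulting text to a Cloud Storage object and pass its `gs://` URI as the batch input.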

In [44]:
INPUT_DATA = "gs://cloud-samples-data/generative-ai/batch/batch_requests_for_multimodal_input_2.jsonl"  # @param {type:"string"}

### Prepare batch output location

When a batch prediction task completes, the output is stored in the location that you specified in your request.

- The location is in the form of a Cloud Storage or BigQuery URI prefix, for example:
`gs://path/to/output/data` or `bq://projectId.bqDatasetId`.

- If not specified, `gs://STAGING_BUCKET/gen-ai-batch-prediction` will be used for Cloud Storage source and `bq://PROJECT_ID.gen_ai_batch_prediction.predictions_TIMESTAMP` will be used for BigQuery source.
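
Both URI forms carry a scheme prefix, so a bare bucket name is not a valid destination. A minimal sketch of a check you could run before submitting a job follows. `classify_destination` is a hypothetical helper, not an SDK function, and it only inspects the prefix.

```python
def classify_destination(uri: str) -> str:
    """Return "gcs" or "bigquery" for a batch output URI prefix.

    Hypothetical helper for illustration; it only inspects the scheme.
    """
    if uri.startswith("gs://"):
        return "gcs"
    if uri.startswith("bq://"):
        return "bigquery"
    raise ValueError(
        f"Unsupported destination {uri!r}: expected a gs:// or bq:// prefix"
    )


print(classify_destination("gs://path/to/output/data"))
print(classify_destination("bq://projectId.bqDatasetId"))
```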

This tutorial uses a Cloud Storage bucket as an example for the output location.

- You can specify the URI of your Cloud Storage bucket in `BUCKET_URI`, or
- if it is not specified, a new Cloud Storage bucket in the form of `gs://PROJECT_ID-TIMESTAMP` will be created for you.

In [52]:
BUCKET_URI = "[your-cloud-storage-bucket]"  # @param {type:"string"}

if BUCKET_URI == "[your-cloud-storage-bucket]":
    TIMESTAMP = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
    BUCKET_URI = f"gs://{PROJECT_ID}-{TIMESTAMP}"

    ! gsutil mb -l {LOCATION} -p {PROJECT_ID} {BUCKET_URI}

Creating gs://qwiklabs-gcp-04-0bbee4ff5bff-20251010061949/...


In [53]:
BUCKET_URI

'gs://qwiklabs-gcp-04-0bbee4ff5bff-20251010061949'

### Send a batch prediction request

To make a batch prediction request, you specify a source model ID, an input source and an output location where Vertex AI stores the batch prediction results.

To learn more, see the [Batch prediction API](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/batch-prediction-api) page.


In [54]:
# `dest` must be a full URI with the gs:// (or bq://) prefix. Passing a bare
# bucket name raised "ValueError: Unsupported destination" in an earlier attempt.

batch_job = client.batches.create(
    model=MODEL_ID,
    src=INPUT_DATA,
    config=CreateBatchJobConfig(dest=BUCKET_URI),
)
batch_job.name

'projects/576729279835/locations/europe-west1/batchPredictionJobs/6960016699558985728'

In [55]:
batch_job_orig = batch_job
batch_job_orig

BatchJob(
  create_time=datetime.datetime(2025, 10, 10, 6, 20, 0, 63092, tzinfo=TzInfo(0)),
  dest=BatchJobDestination(
    format='jsonl',
    gcs_uri='gs://qwiklabs-gcp-04-0bbee4ff5bff-20251010061949'
  ),
  display_name='genai_batch_job_20251010061959_7b91e',
  model='publishers/google/models/gemini-2.5-flash',
  name='projects/576729279835/locations/europe-west1/batchPredictionJobs/6960016699558985728',
  src=BatchJobSource(
    format='jsonl',
    gcs_uri=[
      'gs://cloud-samples-data/generative-ai/batch/batch_requests_for_multimodal_input_2.jsonl',
    ]
  ),
  state=<JobState.JOB_STATE_PENDING: 'JOB_STATE_PENDING'>,
  update_time=datetime.datetime(2025, 10, 10, 6, 20, 0, 63092, tzinfo=TzInfo(0))
)

Print out the job status and other properties. You can also check the status in the Cloud Console at https://console.cloud.google.com/vertex-ai/batch-predictions

In [56]:
batch_job = client.batches.get(name=batch_job.name)

In [57]:
batch_job

BatchJob(
  create_time=datetime.datetime(2025, 10, 10, 6, 20, 0, 63092, tzinfo=TzInfo(0)),
  dest=BatchJobDestination(
    format='jsonl',
    gcs_uri='gs://qwiklabs-gcp-04-0bbee4ff5bff-20251010061949'
  ),
  display_name='genai_batch_job_20251010061959_7b91e',
  model='publishers/google/models/gemini-2.5-flash',
  name='projects/576729279835/locations/europe-west1/batchPredictionJobs/6960016699558985728',
  src=BatchJobSource(
    format='jsonl',
    gcs_uri=[
      'gs://cloud-samples-data/generative-ai/batch/batch_requests_for_multimodal_input_2.jsonl',
    ]
  ),
  state=<JobState.JOB_STATE_PENDING: 'JOB_STATE_PENDING'>,
  update_time=datetime.datetime(2025, 10, 10, 6, 20, 0, 63092, tzinfo=TzInfo(0))
)

Optionally, you can list all the batch prediction jobs in the project.

In [63]:
for job in client.batches.list():
    print(job.name, "|" , job.create_time, "|" , job.state)

projects/576729279835/locations/europe-west1/batchPredictionJobs/6960016699558985728 | 2025-10-10 06:20:00.063092+00:00 | JobState.JOB_STATE_QUEUED


In [66]:
for job in client.batches.list():
    print(job.name, "|" , job.create_time, "|" , job.state)

projects/576729279835/locations/europe-west1/batchPredictionJobs/6960016699558985728 | 2025-10-10 06:20:00.063092+00:00 | JobState.JOB_STATE_SUCCEEDED


In [None]:
# States a job typically moves through on the way to success:
# JOB_STATE_PENDING
# JOB_STATE_QUEUED
# JOB_STATE_RUNNING
# JOB_STATE_SUCCEEDED

# Other terminal states, listed in https://ai.google.dev/gemini-api/docs/batch-api:
# JOB_STATE_FAILED
# JOB_STATE_CANCELLED
# JOB_STATE_EXPIRED

### Wait for the batch prediction job to complete

Depending on the number of input items that you submitted, a batch generation task can take some time to complete. You can use the following code to check the job status and wait for the job to complete.

In [69]:
batch_job

BatchJob(
  create_time=datetime.datetime(2025, 10, 10, 6, 20, 0, 63092, tzinfo=TzInfo(0)),
  dest=BatchJobDestination(
    format='jsonl',
    gcs_uri='gs://qwiklabs-gcp-04-0bbee4ff5bff-20251010061949'
  ),
  display_name='genai_batch_job_20251010061959_7b91e',
  model='publishers/google/models/gemini-2.5-flash',
  name='projects/576729279835/locations/europe-west1/batchPredictionJobs/6960016699558985728',
  src=BatchJobSource(
    format='jsonl',
    gcs_uri=[
      'gs://cloud-samples-data/generative-ai/batch/batch_requests_for_multimodal_input_2.jsonl',
    ]
  ),
  state=<JobState.JOB_STATE_PENDING: 'JOB_STATE_PENDING'>,
  update_time=datetime.datetime(2025, 10, 10, 6, 20, 0, 63092, tzinfo=TzInfo(0))
)

In [71]:
batch_job.state

<JobState.JOB_STATE_PENDING: 'JOB_STATE_PENDING'>

In [80]:
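# Note: this loop only waits while the job is in JOB_STATE_RUNNING. If the job
# is still JOB_STATE_PENDING or JOB_STATE_QUEUED, the loop exits immediately and
# the check below prints "Job failed: None" even though the job has not failed.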
import time

# Refresh the job until complete
while batch_job.state == "JOB_STATE_RUNNING":
    time.sleep(5)
    batch_job = client.batches.get(name=batch_job.name)

# Check if the job succeeds
if batch_job.state == "JOB_STATE_SUCCEEDED":
    print("Job succeeded!")
else:
    print(f"Job failed: {batch_job.error}")

Job failed: None
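
The `Job failed: None` message above appears because the loop waits only on `JOB_STATE_RUNNING`, while the job was still in an earlier state. One possible way to make the wait more robust is to treat pending and queued jobs as unfinished too and to add a timeout. The sketch below is an illustration under those assumptions: `wait_for_job` and `fetch_state` are hypothetical names, and the service is replaced by a stand-in that replays a sequence of states.

```python
import time
from typing import Callable

# States in which the job is still in progress
UNFINISHED_STATES = ("JOB_STATE_PENDING", "JOB_STATE_QUEUED", "JOB_STATE_RUNNING")


def wait_for_job(
    fetch_state: Callable[[], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
) -> str:
    """Poll fetch_state() until it returns a state outside UNFINISHED_STATES."""
    deadline = time.monotonic() + timeout_seconds
    state = fetch_state()
    while state in UNFINISHED_STATES:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still in {state} after {timeout_seconds} seconds")
        time.sleep(poll_seconds)
        state = fetch_state()
    return state


# Stand-in for the service: replays one possible sequence of states
observed = iter(
    ["JOB_STATE_PENDING", "JOB_STATE_QUEUED", "JOB_STATE_RUNNING", "JOB_STATE_SUCCEEDED"]
)
final_state = wait_for_job(lambda: next(observed), poll_seconds=0.01)
print(final_state)
```

Against the real service, `fetch_state` could be something like `lambda: client.batches.get(name=batch_job.name).state`, assuming `client` is the SDK client created earlier in the notebook.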


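In [78]:
# Fetch the latest job state from the service
batch_job = client.batches.get(name=batch_job.name)
batch_job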

BatchJob(
  create_time=datetime.datetime(2025, 10, 10, 6, 20, 0, 63092, tzinfo=TzInfo(0)),
  dest=BatchJobDestination(
    format='jsonl',
    gcs_uri='gs://qwiklabs-gcp-04-0bbee4ff5bff-20251010061949'
  ),
  display_name='genai_batch_job_20251010061959_7b91e',
  end_time=datetime.datetime(2025, 10, 10, 6, 27, 26, 433064, tzinfo=TzInfo(0)),
  model='publishers/google/models/gemini-2.5-flash',
  name='projects/576729279835/locations/europe-west1/batchPredictionJobs/6960016699558985728',
  src=BatchJobSource(
    format='jsonl',
    gcs_uri=[
      'gs://cloud-samples-data/generative-ai/batch/batch_requests_for_multimodal_input_2.jsonl',
    ]
  ),
  start_time=datetime.datetime(2025, 10, 10, 6, 24, 18, 344730, tzinfo=TzInfo(0)),
  state=<JobState.JOB_STATE_SUCCEEDED: 'JOB_STATE_SUCCEEDED'>,
  update_time=datetime.datetime(2025, 10, 10, 6, 27, 26, 433064, tzinfo=TzInfo(0))
)
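
A finished job reports `start_time` and `end_time` in addition to `create_time`, so you can work out how long it waited and how long it ran. The following sketch uses the timestamps copied from the output above. On a live job you would presumably read the same fields from the `BatchJob` object instead.

```python
from datetime import datetime, timezone

# Timestamps copied from the BatchJob output above
create_time = datetime(2025, 10, 10, 6, 20, 0, 63092, tzinfo=timezone.utc)
start_time = datetime(2025, 10, 10, 6, 24, 18, 344730, tzinfo=timezone.utc)
end_time = datetime(2025, 10, 10, 6, 27, 26, 433064, tzinfo=timezone.utc)

queue_wait = start_time - create_time  # time before the job started running
run_time = end_time - start_time  # time spent processing the requests

print(f"Waited {queue_wait.total_seconds():.1f} s, ran {run_time.total_seconds():.1f} s")
```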

In [81]:
batch_job.__dict__

{'name': 'projects/576729279835/locations/europe-west1/batchPredictionJobs/6960016699558985728',
 'display_name': 'genai_batch_job_20251010061959_7b91e',
 'state': <JobState.JOB_STATE_SUCCEEDED: 'JOB_STATE_SUCCEEDED'>,
 'error': None,
 'create_time': datetime.datetime(2025, 10, 10, 6, 20, 0, 63092, tzinfo=TzInfo(0)),
 'start_time': datetime.datetime(2025, 10, 10, 6, 24, 18, 344730, tzinfo=TzInfo(0)),
 'end_time': datetime.datetime(2025, 10, 10, 6, 27, 26, 433064, tzinfo=TzInfo(0)),
 'update_time': datetime.datetime(2025, 10, 10, 6, 27, 26, 433064, tzinfo=TzInfo(0)),
 'model': 'publishers/google/models/gemini-2.5-flash',
 'src': BatchJobSource(
   format='jsonl',
   gcs_uri=[
     'gs://cloud-samples-data/generative-ai/batch/batch_requests_for_multimodal_input_2.jsonl',
   ]
 ),
 'dest': BatchJobDestination(
   format='jsonl',
   gcs_uri='gs://qwiklabs-gcp-04-0bbee4ff5bff-20251010061949'
 )}

In [82]:
batch_job.name

'projects/576729279835/locations/europe-west1/batchPredictionJobs/6960016699558985728'

In [83]:
import time

# Refresh the job until it leaves the pending, queued, and running states
while batch_job.state in ("JOB_STATE_PENDING", "JOB_STATE_QUEUED", "JOB_STATE_RUNNING"):
    time.sleep(5)
    batch_job = client.batches.get(name=batch_job.name)

# Check if the job succeeds
if batch_job.state == "JOB_STATE_SUCCEEDED":
    print("Job succeeded!")
else:
    print(f"Job failed: {batch_job.error}")

Job succeeded!


### Retrieve batch prediction results

When a batch prediction task is complete, the output of the prediction is stored in the location that you specified in your request. That location is also available in `batch_job.dest.bigquery_uri` or `batch_job.dest.gcs_uri`.

Example output:

```json
{"status": "", "processed_time": "2024-11-13T14:04:28.376+00:00", "request": {"contents": [{"parts": [{"file_data": null, "text": "List objects in this image."}, {"file_data": {"file_uri": "gs://cloud-samples-data/generative-ai/image/gardening-tools.jpeg", "mime_type": "image/jpeg"}, "text": null}], "role": "user"}], "generationConfig": {"temperature": 0.4}}, "response": {"candidates": [{"avgLogprobs": -0.10394711927934126, "content": {"parts": [{"text": "Here's a list of the objects in the image:\n\n* **Watering can:** A green plastic watering can with a white rose head.\n* **Plant:** A small plant (possibly oregano) in a terracotta pot.\n* **Terracotta pots:** Two terracotta pots, one containing the plant and another empty, stacked on top of each other.\n* **Gardening gloves:** A pair of striped gardening gloves.\n* **Gardening tools:** A small trowel and a hand cultivator (hoe).  Both are green with black handles."}], "role": "model"}, "finishReason": "STOP"}], "modelVersion": "gemini-2.5-flash", "usageMetadata": {"candidatesTokenCount": 110, "promptTokenCount": 264, "totalTokenCount": 374}}}
```
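
Each output line is a JSON object that nests the model text under `response.candidates[].content.parts[]`. As a minimal sketch, the snippet below pulls the text out of one such line. The sample line is shortened for illustration, and `extract_text` is a hypothetical helper.

```python
import json

# A shortened output line in the same shape as the example above
# (the response text is abbreviated for illustration)
line = json.dumps(
    {
        "status": "",
        "request": {
            "contents": [{"role": "user", "parts": [{"text": "List objects in this image."}]}]
        },
        "response": {
            "candidates": [
                {
                    "content": {
                        "role": "model",
                        "parts": [{"text": "Here's a list of the objects in the image: ..."}],
                    },
                    "finishReason": "STOP",
                }
            ],
            "usageMetadata": {
                "candidatesTokenCount": 110,
                "promptTokenCount": 264,
                "totalTokenCount": 374,
            },
        },
    }
)


def extract_text(jsonl_line: str) -> str:
    """Join the text parts of the first candidate in one output record."""
    record = json.loads(jsonl_line)
    parts = record["response"]["candidates"][0]["content"]["parts"]
    return "".join(part.get("text") or "" for part in parts)


print(extract_text(line))
```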

In [84]:
import fsspec
import pandas as pd

fs = fsspec.filesystem("gcs")

# The results are written to a subfolder of the destination
file_paths = fs.glob(f"{batch_job.dest.gcs_uri}/*/predictions.jsonl")

if batch_job.state == "JOB_STATE_SUCCEEDED":
    # Load the JSONL file into a DataFrame
    df = pd.read_json(f"gs://{file_paths[0]}", lines=True)

    display(df)

| | status | processed_time | request | response |
|---|---|---|---|---|
| 0 | | 2025-10-10 06:26:11.473000+00:00 | {'contents': [{'parts': [{'file_data': None, '... | {'candidates': [{'avgLogprobs': -0.85887939805... |
| 1 | | 2025-10-10 06:26:11.466000+00:00 | {'contents': [{'parts': [{'file_data': None, '... | {'candidates': [{'avgLogprobs': -0.83119090010... |


In [107]:
df.iloc[0, 0]

''

In [104]:
df.iloc[0, 1]

Timestamp('2025-10-10 06:26:11.473000+0000', tz='UTC')

In [105]:
# img1
df.iloc[0, 2]

{'contents': [{'parts': [{'file_data': None,
     'text': 'List objects in this image.'},
    {'file_data': {'file_uri': 'gs://cloud-samples-data/generative-ai/image/gardening-tools.jpeg',
      'mime_type': 'image/jpeg'},
     'text': None}],
   'role': 'user'}],
 'generationConfig': {'temperature': 0.4}}

In [106]:
df.iloc[0, 3]

{'candidates': [{'avgLogprobs': -0.858879398059293,
   'content': {'parts': [{'text': 'Here are the objects visible in the image:\n\n1.  **Watering can:** A dark green plastic watering can with a white sprinkler head and an embossed floral design on its side.\n2.  **Plant in a pot:** A small green plant growing in a terracotta pot.\n3.  **Empty terracotta pots:** Two terracotta pots, stacked one on top of the other.\n4.  **Hand trowel:** A small green gardening shovel with a black handle.\n5.  **Hand cultivator / Hoe:** A small green gardening tool with prongs on one side and a flat blade on the other, with a black handle.\n6.  **Gardening gloves:** A pair of striped gardening gloves (green, yellow, and white) with green cuffs.\n7.  **Grass:** The green lawn serving as the background surface.'}],
    'role': 'model'},
   'finishReason': 'STOP'}],
 'createTime': '2025-10-10T06:26:11.543450Z',
 'modelVersion': 'gemini-2.5-flash',
 'responseId': 'A6foaNqVIfCy2fMPzMKxgAQ',
 'usageMetadata'

In [108]:
# img2
df.iloc[1, 2]

{'contents': [{'parts': [{'file_data': None,
     'text': 'List objects in this image.'},
    {'file_data': {'file_uri': 'gs://cloud-samples-data/generative-ai/image/office-desk.jpeg',
      'mime_type': 'image/jpeg'},
     'text': None}],
   'role': 'user'}],
 'generationConfig': {'temperature': 0.4}}

In [109]:
df.iloc[1, 3]

{'candidates': [{'avgLogprobs': -0.831190900104801,
   'content': {'parts': [{'text': 'Here are the objects visible in the image:\n\n1.  Digital tablet (with a blank white screen)\n2.  Keyboard\n3.  Computer mouse (white and teal/green)\n4.  Coffee cup (white, with coffee)\n5.  Saucer (white, under the coffee cup)\n6.  Miniature shopping cart\n7.  Red gift box (inside the shopping cart)\n8.  Globe\n9.  Eiffel Tower replica / miniature\n10. Toy airplane / model airplane\n11. Passport (brown)\n12. Sunglasses\n13. Money (dollar bills)\n14. Notebook / Spiral notebook\n15. Pen (yellow and black)\n16. Wooden table (the surface everything is on)'}],
    'role': 'model'},
   'finishReason': 'STOP'}],
 'createTime': '2025-10-10T06:26:11.545931Z',
 'modelVersion': 'gemini-2.5-flash',
 'responseId': 'A6foaIupIbOBrNcP57XJuAQ',
 'usageMetadata': {'candidatesTokenCount': 164,
  'candidatesTokensDetails': [{'modality': 'TEXT', 'tokenCount': 164}],
  'promptTokenCount': 1812,
  'promptTokensDetails': 

## Get text embeddings

You can get text embeddings for a snippet of text by using the `embed_content` method. The default output size depends on the model: for example, `text-embedding-005` returns 768 dimensions, while `gemini-embedding-001` returns 3072. Many models also let you request a smaller vector through `output_dimensionality`, as the call below does with `128`. See [Vertex AI text embeddings API](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings) for more details.

In [88]:
TEXT_EMBEDDING_MODEL_ID = "gemini-embedding-001"  # @param {type: "string"}

In [89]:
response = client.models.embed_content(
    model=TEXT_EMBEDDING_MODEL_ID,
    contents=[
        "How do I get a driver's license/learner's permit?",
        "How do I renew my driver's license?",
        "How do I change my address on my driver's license?",
    ],
    config=EmbedContentConfig(output_dimensionality=128),
)

print(response.embeddings)

[ContentEmbedding(
  statistics=ContentEmbeddingStatistics(
    token_count=15.0,
    truncated=False
  ),
  values=[
    -0.0015945110935717821,
    0.0067519512958824635,
    0.017575768753886223,
    -0.010327713564038277,
    -0.00995620433241129,
    <... 123 more items ...>,
  ]
), ContentEmbedding(
  statistics=ContentEmbeddingStatistics(
    token_count=10.0,
    truncated=False
  ),
  values=[
    -0.007576516829431057,
    -0.005990396253764629,
    -0.003270037705078721,
    -0.01751021482050419,
    -0.023507025092840195,
    <... 123 more items ...>,
  ]
), ContentEmbedding(
  statistics=ContentEmbeddingStatistics(
    token_count=13.0,
    truncated=False
  ),
  values=[
    0.011074518784880638,
    -0.02361123077571392,
    0.002291288459673524,
    -0.00906078889966011,
    -0.005773674696683884,
    <... 123 more items ...>,
  ]
)]


In [90]:
response

EmbedContentResponse(
  embeddings=[
    ContentEmbedding(
      statistics=ContentEmbeddingStatistics(
        token_count=15.0,
        truncated=False
      ),
      values=[
        -0.0015945110935717821,
        0.0067519512958824635,
        0.017575768753886223,
        -0.010327713564038277,
        -0.00995620433241129,
        <... 123 more items ...>,
      ]
    ),
    ContentEmbedding(
      statistics=ContentEmbeddingStatistics(
        token_count=10.0,
        truncated=False
      ),
      values=[
        -0.007576516829431057,
        -0.005990396253764629,
        -0.003270037705078721,
        -0.01751021482050419,
        -0.023507025092840195,
        <... 123 more items ...>,
      ]
    ),
    ContentEmbedding(
      statistics=ContentEmbeddingStatistics(
        token_count=13.0,
        truncated=False
      ),
      values=[
        0.011074518784880638,
        -0.02361123077571392,
        0.002291288459673524,
        -0.00906078889966011,
        -0.005

In [91]:
response.__dict__

{'sdk_http_response': HttpResponse(
   headers=<dict len=9>
 ),
 'embeddings': [ContentEmbedding(
    statistics=ContentEmbeddingStatistics(
      token_count=15.0,
      truncated=False
    ),
    values=[
      -0.0015945110935717821,
      0.0067519512958824635,
      0.017575768753886223,
      -0.010327713564038277,
      -0.00995620433241129,
      <... 123 more items ...>,
    ]
  ),
  ContentEmbedding(
    statistics=ContentEmbeddingStatistics(
      token_count=10.0,
      truncated=False
    ),
    values=[
      -0.007576516829431057,
      -0.005990396253764629,
      -0.003270037705078721,
      -0.01751021482050419,
      -0.023507025092840195,
      <... 123 more items ...>,
    ]
  ),
  ContentEmbedding(
    statistics=ContentEmbeddingStatistics(
      token_count=13.0,
      truncated=False
    ),
    values=[
      0.011074518784880638,
      -0.02361123077571392,
      0.002291288459673524,
      -0.00906078889966011,
      -0.005773674696683884,
      <... 123 more 

In [93]:
response.sdk_http_response.headers

{'content-type': 'application/json; charset=UTF-8',
 'vary': 'Origin, X-Origin, Referer',
 'content-encoding': 'gzip',
 'date': 'Fri, 10 Oct 2025 06:36:49 GMT',
 'server': 'scaffolding on HTTPServer2',
 'x-xss-protection': '0',
 'x-frame-options': 'SAMEORIGIN',
 'x-content-type-options': 'nosniff',
 'transfer-encoding': 'chunked'}

In [96]:
len(response.embeddings)

3

In [99]:
for emb in response.embeddings:
    print(f"Dimension of embeddings: {len(emb.values)}")

Dimension of embeddings: 128
Dimension of embeddings: 128
Dimension of embeddings: 128
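
A common way to compare embeddings like these is cosine similarity. The sketch below is one plain-Python way to compute it. `cosine_similarity` is a helper written for this illustration, and the vectors are toy values, not real model output.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors for illustration only, not real model output
v1 = [1.0, 0.0, 1.0]
v2 = [1.0, 0.0, 0.0]
print(cosine_similarity(v1, v2))
```

With the response above, you would pass `response.embeddings[0].values` and `response.embeddings[1].values` to compare the first two questions.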


In [102]:
response.embeddings[0]

ContentEmbedding(
  statistics=ContentEmbeddingStatistics(
    token_count=15.0,
    truncated=False
  ),
  values=[
    -0.0015945110935717821,
    0.0067519512958824635,
    0.017575768753886223,
    -0.010327713564038277,
    -0.00995620433241129,
    <... 123 more items ...>,
  ]
)

In [103]:
response.embeddings[0].__dict__

{'values': [-0.0015945110935717821,
  0.0067519512958824635,
  0.017575768753886223,
  -0.010327713564038277,
  -0.00995620433241129,
  -0.006378513760864735,
  0.003915519919246435,
  -0.006273339968174696,
  0.022109422832727432,
  -0.018150800839066505,
  -0.018822040408849716,
  0.021871089935302734,
  -0.012852595187723637,
  -0.0035492617171257734,
  0.13061952590942383,
  0.02963041514158249,
  0.002169992309063673,
  0.034843750298023224,
  -0.015753397718071938,
  -0.013527893461287022,
  -0.019452597945928574,
  0.01025773398578167,
  -0.005315437447279692,
  0.018413739278912544,
  0.003861301811411977,
  -0.007576272822916508,
  -0.005402481649070978,
  -0.009142722003161907,
  0.039729878306388855,
  -0.0018519624136388302,
  0.012266855686903,
  0.010037660598754883,
  0.007238780613988638,
  0.020442964509129524,
  -0.0078003667294979095,
  -0.0018548255320638418,
  0.006518074311316013,
  -0.006224988494068384,
  -0.01723303087055683,
  -0.023265745490789413,
  -0.00549

In [95]:
response.__dict__.keys()

dict_keys(['sdk_http_response', 'embeddings', 'metadata'])

# What's next

- Explore other notebooks in the [Google Cloud Generative AI GitHub repository](https://github.com/GoogleCloudPlatform/generative-ai).
- Explore AI models in [Model Garden](https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/explore-models).