In [None]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

## Overview

### Gemini
Gemini is a family of generative AI models developed by Google DeepMind. Gemini models accept prompts that include text, images, and video as input and return text responses as output.

### Gemini API in Vertex AI

The Gemini API in Vertex AI provides a unified interface for interacting with Gemini models. You can interact with the Gemini API by using the following methods:

* Use [Vertex AI Studio](https://cloud.google.com/generative-ai-studio) for quick testing and command generation.
* Use cURL commands in Cloud Shell.
* Use the Vertex AI SDK for Python in a Jupyter notebook.

This notebook focuses on using **cURL commands** to call the Gemini API in Vertex AI.

For more information, see the [Generative AI on Vertex AI](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview) documentation.

### Objectives

In this tutorial, you learn how to use the Gemini API in Vertex AI with cURL commands to interact with the Gemini 2.0 Flash (`gemini-2.0-flash-001`) model.

You will complete the following tasks:

- Set up the environment variables used by the cURL commands.
- Use the Gemini API in Vertex AI to interact with the model.
  - Gemini 2.0 Flash (`gemini-2.0-flash-001`) model:
    - Generate text from text prompts.
    - Explore various features and configuration options.
    - Generate text from image(s) and text prompts.
    - Generate text from video.
  

### Costs
This tutorial uses billable components of Google Cloud:

- Vertex AI

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.

## Getting Started

### Authenticate your notebook environment (Colab only)

If you are running this notebook on Google Colab, run the following cell to authenticate your environment.

This step is not required if you are using [Vertex AI Workbench](https://cloud.google.com/vertex-ai-workbench).

In [1]:
import sys

# Additional authentication is required for Google Colab
if "google.colab" in sys.modules:
    # Authenticate user to Google Cloud
    from google.colab import auth

    auth.authenticate_user()

### Set Google Cloud project

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [2]:
PROJECT_ID = "qwiklabs-gcp-04-11344c7a78ac"  # @param {type:"string"}
LOCATION = "us-east1"  # @param {type:"string"}

### Defining environment variables for cURL commands

These environment variables are used to construct the cURL commands.

In [3]:
import os

os.environ["PROJECT_ID"] = PROJECT_ID
os.environ["LOCATION"] = LOCATION
os.environ["API_ENDPOINT"] = f"{LOCATION}-aiplatform.googleapis.com"
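
To make it clearer how these variables combine, here is a minimal sketch that assembles the same request URL in Python. The helper name `build_endpoint_url` and the placeholder project ID `my-project` are illustrative assumptions, not part of the notebook or of any SDK.

```python
def build_endpoint_url(project_id, location, model_id,
                       method="generateContent", api_version="v1"):
    """Assemble the REST URL used by the cURL commands in this notebook."""
    # The regional hostname mirrors the API_ENDPOINT environment variable.
    api_endpoint = f"{location}-aiplatform.googleapis.com"
    return (
        f"https://{api_endpoint}/{api_version}/projects/{project_id}"
        f"/locations/{location}/publishers/google/models/{model_id}:{method}"
    )


# "my-project" is a placeholder; substitute your own project ID.
url = build_endpoint_url("my-project", "us-east1", "gemini-2.0-flash-001")
print(url)
```

Changing `method` to `streamGenerateContent` should yield the URL used for the streaming request.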

## Use the Gemini 2.0 Flash model

In [4]:
os.environ["MODEL_ID"] = "gemini-2.0-flash-001"

### Generate content

The `generateContent` method can handle a wide variety of use cases, including multi-turn chat and multimodal input, depending on what the underlying model supports. In this example, you send a text prompt using the `generateContent` method.

In [5]:
%%bash

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent \
  -d '{
    "contents": {
      "role": "USER",
      "parts": { "text": "Why is the sky blue?" }
    }
  }'


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2610    0  2509  100   101   1247     50  0:00:02  0:00:02 --:--:--  1297


{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "The sky is blue because of a phenomenon called **Rayleigh scattering**. Here's a breakdown of how it works:\n\n*   **Sunlight and its Colors:** Sunlight is actually made up of all the colors of the rainbow. We perceive it as white because they are all mixed together.\n\n*   **Entering the Atmosphere:** When sunlight enters the Earth's atmosphere, it collides with tiny air molecules (mostly nitrogen and oxygen).\n\n*   **Scattering of Light:** This collision causes the light to scatter in different directions.\n\n*   **Rayleigh Scattering and Wavelength:** Rayleigh scattering is more effective at scattering shorter wavelengths of light. Blue and violet light have shorter wavelengths than other colors like red and orange.\n\n*   **Why Blue More Than Violet?** While violet light has the shortest wavelength, it is absorbed more in the upper atmosphere and our eyes are a

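The response nests the generated text under `candidates`, `content`, and `parts`. As a rough sketch, the text can be pulled out in Python as follows. The sample response is an abbreviated stand-in shaped like the output above, and `extract_text` is a hypothetical helper name.

```python
import json

# Abbreviated stand-in for a generateContent response (not real model output).
sample_response = json.loads("""
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [{"text": "The sky is blue because of Rayleigh scattering."}]
      },
      "finishReason": "STOP"
    }
  ]
}
""")


def extract_text(response):
    """Concatenate the text of every part in the first candidate."""
    parts = response["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)


print(extract_text(sample_response))
```
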
### Streaming

The Gemini API provides a streaming response mechanism. With this approach, you don't need to wait for the complete response; you can start processing fragments as soon as they become available.

In [6]:
%%bash

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent \
  -d '{
    "contents": {
      "role": "USER",
      "parts": { "text": "Why is the sky blue?" }
    }
  }'


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed


[{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "The"
          }
        ]
      }
    }
  ],
  "usageMetadata": {
    "trafficType": "ON_DEMAND"
  },
  "modelVersion": "gemini-2.0-flash-001",
  "createTime": "2025-04-19T16:24:32.111791Z",
  "responseId": "QM4DaK_pBrrPwtQPx6jKiAY"
}
,
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": " sky is blue"
          }
        ]
      }
    }
  ],
  "usageMetadata": {
    "trafficType": "ON_DEMAND"
  },
  "modelVersion": "gemini-2.0-flash-001",
  "createTime": "2025-04-19T16:24:32.111791Z",
  "responseId": "QM4DaK_pBrrPwtQPx6jKiAY"
}
,
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": " due to a phenomenon called **Rayleigh scattering**. Here's the breakdown:\n\n*"
          }
        ]
      }
    }
  ],
  "usageMetadata": {
    "

100  5421    0  5320  100   101   2775     52  0:00:01  0:00:01 --:--:--  2826


            "text": " why we perceive the sky as blue rather than violet.\n\n*   **The result:** Because blue light is scattered more effectively than other colors, it is spread all over the sky. When you look up, you see the blue light that has been scattered in your direction.\n\n**In summary, the sky is blue because"
          }
        ]
      }
    }
  ],
  "usageMetadata": {
    "trafficType": "ON_DEMAND"
  },
  "modelVersion": "gemini-2.0-flash-001",
  "createTime": "2025-04-19T16:24:32.111791Z",
  "responseId": "QM4DaK_pBrrPwtQPx6jKiAY"
}
,
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": " blue light is scattered more than other colors in sunlight when it passes through the Earth's atmosphere.**\n"
          }
        ]
      },
      "finishReason": "STOP"
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 6,
    "candidatesTokenCount": 290,
    "totalTokenCount": 296,
    "trafficType": "ON_DEMAND",

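Each streamed element carries one fragment of the text, and in the output above the token counts appear only in the final element. The sketch below reassembles the fragments under that assumption. The chunk list is abbreviated sample data, and `join_stream` is a hypothetical helper name.

```python
# Abbreviated stand-ins for the streamed chunks shown above.
chunks = [
    {"candidates": [{"content": {"role": "model", "parts": [{"text": "The"}]}}]},
    {"candidates": [{"content": {"role": "model", "parts": [{"text": " sky is blue"}]}}]},
    {
        "candidates": [
            {
                "content": {
                    "role": "model",
                    "parts": [{"text": " due to a phenomenon called **Rayleigh scattering**."}],
                },
                "finishReason": "STOP",
            }
        ],
        "usageMetadata": {
            "promptTokenCount": 6,
            "candidatesTokenCount": 290,
            "totalTokenCount": 296,
        },
    },
]


def join_stream(stream_chunks):
    """Return the concatenated text and the usage metadata that has token counts."""
    pieces = []
    usage = {}
    for chunk in stream_chunks:
        for candidate in chunk.get("candidates", []):
            for part in candidate.get("content", {}).get("parts", []):
                pieces.append(part.get("text", ""))
        metadata = chunk.get("usageMetadata", {})
        if "totalTokenCount" in metadata:
            usage = metadata
    return "".join(pieces), usage


full_text, usage = join_stream(chunks)
print(full_text)
print(usage["totalTokenCount"])
```
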
### Model parameters

Every prompt you send to the model includes parameter values that control how the model generates a response. The model can generate different results for different parameter values. You can experiment with different model parameters to see how the results change. 

In [7]:
%%bash

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent \
  -d '{
    "contents": {
      "role": "USER",
      "parts": [
        {"text": "Describe this image"},
        {"file_data": {
          "mime_type": "image/jpeg",
          "file_uri": "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg"
        }}
      ]
    },
    "generation_config": {
      "temperature": 0.2,
      "top_p": 0.1,
      "top_k": 16,
      "max_output_tokens": 2048,
      "candidate_count": 1,
      "stop_sequences": []
    },
    "safety_settings": {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_LOW_AND_ABOVE"
    }
  }'


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3441    0  2838  100   603   1185    251  0:00:02  0:00:02 --:--:--  1437


{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "Here's a description of the image:\n\n**Overall Impression:**\n\nThe image shows a tabby cat standing in the snow. The cat is the main subject and is in focus, while the snowy background is slightly blurred.\n\n**Cat's Appearance:**\n\n*   **Coat:** The cat has a classic tabby coat pattern, with dark brown or black stripes on a lighter brown background.\n*   **Eyes:** The cat has yellow or golden eyes.\n*   **Pose:** The cat is standing with one paw slightly raised, as if it's about to take a step. It's looking directly at the viewer with a curious or alert expression.\n*   **Build:** The cat appears to be of average size and build.\n\n**Background:**\n\n*   The background is entirely snow-covered.\n*   There are some tracks or indentations in the snow, suggesting that something has moved through the area.\n*   The background is out of focus, which helps to emphasiz

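To build intuition for `temperature`, `top_k`, and `top_p`, the toy sketch below applies the commonly described sampling filters to a made-up four-token distribution. It is an illustration only: the logits are invented, the function names are hypothetical, and nothing here is claimed to match how Gemini implements sampling internally.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = {token: value / temperature for token, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {token: math.exp(value - peak) for token, value in scaled.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}


def filter_top_k_top_p(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the smallest prefix reaching top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


# Invented logits for illustration; not real model output.
toy_logits = {"cat": 2.0, "kitten": 1.0, "dog": 0.5, "snow": 0.1}

probs = softmax_with_temperature(toy_logits, temperature=0.2)
candidates = filter_top_k_top_p(probs, top_k=16, top_p=0.1)
print(candidates)
```

With the low `temperature` and `top_p` values used in the request above, this toy filter keeps only the single most likely token, which is why such settings tend to produce more deterministic output.
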
### Chat

The Gemini API supports natural multi-turn conversations and is ideal for text tasks that require back-and-forth interactions.

Specify the `role` field only if the content represents a turn in a conversation. You can set `role` to one of the following values: `user`, `model`.

In [8]:
%%bash

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          { "text": "Hello" }
        ]
      },
      {
        "role": "model",
        "parts": [
          { "text": "Hello! I am glad you could both make it." }
        ]
      },
      {
        "role": "user",
        "parts": [
          { "text": "So what is the first order of business?" }
        ]
      }
    ]
  }'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2187    0  1789  100   398   1089    242  0:00:01  0:00:01 --:--:--  1331


{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "Alright, let's get down to business. Since we haven't established a specific project or purpose, let's figure that out first. Here are a few options we could explore:\n\n1.  **Brainstorm a Project:** We could use this session to brainstorm ideas for a project we could collaborate on. This could be anything from writing a story or creating a game to planning a real-world event or developing a piece of software.\n\n2.  **Practice a Skill:** If you have a skill you want to practice (like writing, coding, problem-solving, etc.), we could use this session to work on it together.\n\n3.  **Learn Something New:** We could pick a topic and learn about it together, discussing what we find and asking questions.\n\n4.  **Solve a Puzzle/Challenge:** I can present you with a puzzle, riddle, or challenge that we can work on together.\n\n5.  **Just Chat:** We could simply have a co

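A multi-turn request is just a list of alternating `user` and `model` turns. The sketch below builds the same `contents` list in Python; `add_turn` is a hypothetical helper, not part of any SDK.

```python
import json

ALLOWED_ROLES = ("user", "model")


def add_turn(history, role, text):
    """Append one conversation turn in the shape the request body expects."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"role must be one of {ALLOWED_ROLES}, got {role!r}")
    history.append({"role": role, "parts": [{"text": text}]})
    return history


history = []
add_turn(history, "user", "Hello")
add_turn(history, "model", "Hello! I am glad you could both make it.")
add_turn(history, "user", "So what is the first order of business?")

request_body = {"contents": history}
print(json.dumps(request_body, indent=2))
```

To continue the conversation, you would append the model's reply as a `model` turn, add the next `user` turn, and send the whole list again.
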
### Function calling

Function calling lets you create a description of a function in your code, then pass that description to a language model in a request. This sample is an example of passing in a description of a function that returns information about where a movie is playing. Several function declarations are included in the request, such as `find_movies` and `find_theaters`.

Learn more about [function calling](https://cloud.google.com/vertex-ai/docs/generative-ai/multimodal/function-calling).

In [9]:
%%bash

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent \
  -d '{
  "contents": {
    "role": "user",
    "parts": {
      "text": "Which theaters in Mountain View show Barbie movie?"
    }
  },
  "tools": [
    {
      "function_declarations": [
        {
          "name": "find_movies",
          "description": "find movie titles currently playing in theaters based on any description, genre, title words, etc.",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616"
              },
              "description": {
                "type": "string",
                "description": "Any kind of description including category or genre, title words, attributes, etc."
              }
            },
            "required": [
              "description"
            ]
          }
        },
        {
          "name": "find_theaters",
          "description": "find theaters based on location and, optionally, a movie title that is currently playing in theaters",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616"
              },
              "movie": {
                "type": "string",
                "description": "Any movie title"
              }
            },
            "required": [
              "location"
            ]
          }
        },
        {
          "name": "get_showtimes",
          "description": "Find the start times for movies playing in a specific theater",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616"
              },
              "movie": {
                "type": "string",
                "description": "Any movie title"
              },
              "theater": {
                "type": "string",
                "description": "Name of theater"
              },
              "date": {
                "type": "string",
                "description": "Date for requested showtime"
              }
            },
            "required": [
              "location",
              "movie",
              "theater",
              "date"
            ]
          }
        }
      ]
    }
  ]
}'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3479    0   916  100  2563   2223   6220 --:--:-- --:--:-- --:--:--  8444


{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "functionCall": {
              "name": "find_theaters",
              "args": {
                "movie": "Barbie",
                "location": "Mountain View, CA"
              }
            }
          }
        ]
      },
      "finishReason": "STOP",
      "avgLogprobs": -0.025156281211159447
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 191,
    "candidatesTokenCount": 11,
    "totalTokenCount": 202,
    "trafficType": "ON_DEMAND",
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 191
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 11
      }
    ]
  },
  "modelVersion": "gemini-2.0-flash-001",
  "createTime": "2025-04-19T16:24:56.488479Z",
  "responseId": "WM4DaJ_oHbG5u7APxbrjmQk"
}
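
The model does not run anything itself; it returns a `functionCall` part naming the function and its arguments. A minimal sketch of routing that part to local code might look like the following. The `find_theaters` stub and its return value are invented for illustration, and the step of sending the result back to the model is not shown.

```python
def find_theaters(location, movie=None):
    """Hypothetical stub; a real implementation would query a theater listing service."""
    return {"location": location, "movie": movie, "theaters": ["Example Cinema"]}


# Map declared function names to local callables (assumed registry).
LOCAL_FUNCTIONS = {"find_theaters": find_theaters}

# Trimmed copy of the response shown above.
response = {
    "candidates": [
        {
            "content": {
                "role": "model",
                "parts": [
                    {
                        "functionCall": {
                            "name": "find_theaters",
                            "args": {"movie": "Barbie", "location": "Mountain View, CA"},
                        }
                    }
                ],
            },
            "finishReason": "STOP",
        }
    ]
}


def dispatch_function_call(api_response):
    """Run the first functionCall part found and return (name, result)."""
    for part in api_response["candidates"][0]["content"]["parts"]:
        call = part.get("functionCall")
        if call is not None:
            handler = LOCAL_FUNCTIONS[call["name"]]
            return call["name"], handler(**call.get("args", {}))
    return None, None


name, result = dispatch_function_call(response)
print(name, result)
```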


## Multimodal input

Gemini 2.0 Flash (`gemini-2.0-flash-001`) is a multimodal model that supports adding images and video to text or chat prompts and returns a text response.


### Download an image from Google Cloud Storage

In [10]:
! gsutil cp "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg" ./image.jpg

Copying gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg...
/ [1 files][ 17.4 KiB/ 17.4 KiB]                                                
Operation completed over 1 objects/17.4 KiB.                                     


### Generate text from a local image

To include an image or video inline in the prompt, specify its [base64](https://en.wikipedia.org/wiki/Base64) encoding along with the `mime_type` field. The supported [MIME types](https://en.wikipedia.org/wiki/Media_type) for images include `image/png` and `image/jpeg`.

In [11]:
%%bash

# Encode image data in base64
# NOTE: This command only works on Linux.
data=$(base64 -w 0 image.jpg)

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent \
  -d "{
      'contents': {
        'role': 'USER',
        'parts': [
          {
            'text': 'Is it a cat?'
          },
          {
            'inline_data': {
              'data': '${data}',
              'mime_type':'image/jpeg'
            }
          }
        ]
       }
     }"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 24976    0   882  100 24094    997  27255 --:--:-- --:--:-- --:--:-- 28221


{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "Yes, the image shows a cat. It appears to be a tabby cat standing in the snow.\n"
          }
        ]
      },
      "finishReason": "STOP",
      "avgLogprobs": -0.19682922817411877
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 263,
    "candidatesTokenCount": 21,
    "totalTokenCount": 284,
    "trafficType": "ON_DEMAND",
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 5
      },
      {
        "modality": "IMAGE",
        "tokenCount": 258
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 21
      }
    ]
  },
  "modelVersion": "gemini-2.0-flash-001",
  "createTime": "2025-04-19T16:25:00.861209Z",
  "responseId": "XM4DaJnINM7mwtQP-On0wQM"
}
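
Because the `base64 -w 0` flag is specific to GNU coreutils, a portable alternative is to encode the bytes in Python. The sketch below builds the same `inline_data` part. The helper name `build_inline_image_part` is hypothetical, and the bytes are a short placeholder rather than a real image.

```python
import base64
import json


def build_inline_image_part(image_bytes, mime_type="image/jpeg"):
    """Return an inline_data part with the payload base64-encoded on a single line."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"inline_data": {"mime_type": mime_type, "data": encoded}}


# Placeholder bytes; in practice read them with open("image.jpg", "rb").read().
fake_image_bytes = b"\xff\xd8\xff\xe0 not a real image"

body = {
    "contents": {
        "role": "USER",
        "parts": [
            {"text": "Is it a cat?"},
            build_inline_image_part(fake_image_bytes),
        ],
    }
}
payload = json.dumps(body)
print(payload[:80])
```

Serializing with `json.dumps` also yields strictly valid double-quoted JSON for the request body.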


### Generate text from an image on Google Cloud Storage

Specify the Cloud Storage URI of the image to include in the prompt. The bucket that stores the file must be in the same Google Cloud project that's sending the request. You must also specify the `mime_type` field. The supported image MIME types include `image/png` and `image/jpeg`.

In [12]:
%%bash

MODEL_ID="gemini-2.0-flash-001"

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent \
  -d '{
    "contents": {
      "role": "USER",
      "parts": [
        {
          "text": "Describe this image"
        },
        {
          "file_data": {
            "mime_type": "image/jpeg",
            "file_uri": "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg"
          }
        }
      ]
    },
    "generation_config": {
      "temperature": 0.2,
      "top_p": 0.1,
      "top_k": 16,
      "max_output_tokens": 2048,
      "candidate_count": 1,
      "stop_sequences": []
    },
    "safety_settings": {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_LOW_AND_ABOVE"
    }
  }'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3504    0  2855  100   649   1104    250  0:00:02  0:00:02 --:--:--  1354


{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "Here's a description of the image:\n\n**Overall Impression:**\n\nThe image shows a tabby cat standing in a snowy environment. The cat is the main subject and is in focus, while the background is a blur of white snow.\n\n**Cat Details:**\n\n*   **Coat:** The cat has a classic tabby coat pattern, with dark brown or black stripes on a lighter brown background.\n*   **Eyes:** The cat has yellow or golden eyes.\n*   **Pose:** The cat is standing with one paw slightly raised, as if it's about to take a step. It's looking directly at the camera with a curious or alert expression.\n*   **Build:** The cat appears to be of average build, not overly thin or overweight.\n\n**Background:**\n\n*   The background is entirely snow-covered. There are some subtle variations in the snow's texture, suggesting footprints or other disturbances.\n*   The background is out of focus, which 

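The `mime_type` should describe the actual file, and a `.jpg` object is normally labeled `image/jpeg`. One way to keep the two in sync is to derive the MIME type from the file extension, as in this sketch. The lookup table covers only the types mentioned in this notebook, and `build_file_data_part` is a hypothetical helper.

```python
import posixpath

# Only the MIME types mentioned in this notebook; extend as needed.
MIME_TYPES_BY_EXTENSION = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".mp4": "video/mp4",
}


def build_file_data_part(file_uri):
    """Return a file_data part whose mime_type matches the URI's extension."""
    if not file_uri.startswith("gs://"):
        raise ValueError("expected a Cloud Storage URI starting with gs://")
    extension = posixpath.splitext(file_uri)[1].lower()
    if extension not in MIME_TYPES_BY_EXTENSION:
        raise ValueError(f"no MIME type known for extension {extension!r}")
    return {
        "file_data": {
            "mime_type": MIME_TYPES_BY_EXTENSION[extension],
            "file_uri": file_uri,
        }
    }


part = build_file_data_part(
    "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg"
)
print(part["file_data"]["mime_type"])
```
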
### Generate text from a video file

Specify the Cloud Storage URI of the video to include in the prompt. The bucket that stores the file must be in the same Google Cloud project that's sending the request. You must also specify the `mime_type` field. The supported MIME types for video include `video/mp4`.


In [13]:
%%bash

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent \
  -d \
'{
    "contents": {
      "role": "USER",
      "parts": [
        {
          "text": "Answer the following questions using the video only. What is the profession of the main person? What are the main features of the phone highlighted? Which city was this recorded in? Provide the answer in JSON."
        },
        {
          "file_data": {
            "mime_type": "video/mp4",
            "file_uri": "gs://github-repo/img/gemini/multimodality_usecases_overview/pixel8.mp4"
          }
        }
      ]
    }
  }'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1525    0  1011  100   514    290    147  0:00:03  0:00:03 --:--:--   437


{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "```json\n{\n\"Profession\": \"Photographer\",\n\"Phone Features\": \"Video boost with Night Sight\",\n\"City\": \"Tokyo\"\n}\n```"
          }
        ]
      },
      "finishReason": "STOP",
      "avgLogprobs": -0.22779150570140166
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 16285,
    "candidatesTokenCount": 34,
    "totalTokenCount": 16319,
    "trafficType": "ON_DEMAND",
    "promptTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 40
      },
      {
        "modality": "VIDEO",
        "tokenCount": 14820
      },
      {
        "modality": "AUDIO",
        "tokenCount": 1425
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 34
      }
    ]
  },
  "modelVersion": "gemini-2.0-flash-001",
  "createTime": "2025-04-19T16:25:08.526555Z",
  "responseId": "ZM4DaNuRILrPwtQP