In [1]:
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Gemini: An Overview of Multimodal Use Cases

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/intro_multimodal_use_cases.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo"><br> Run in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fuse-cases%2Fintro_multimodal_use_cases.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Run in Colab Enterprise
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/use-cases/intro_multimodal_use_cases.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/intro_multimodal_use_cases.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://goo.gle/3DUssjz">
      <img width="32px" src="https://cdn.qwiklabs.com/assets/gcp_cloud-e3a77215f0b8bfa9b3f611c0d2208c7e8708ed31.svg" alt="Google Cloud logo"><br> Open in Cloud Skills Boost
    </a>
  </td>
</table>

<div style="clear: both;"></div>

| Authors |
| --- |
| [Katie Nguyen](https://github.com/katiemn) |
| [Saeed Aghabozorgi](https://github.com/saeedaghabozorgi) |

## Overview

**YouTube Video: Multimodal AI in action**

<a href="https://www.youtube.com/watch?v=pEmCgIGpIoo&list=PLIivdWyY5sqJio2yeg1dlfILOUO2FoFRx" target="_blank">
  <img src="https://img.youtube.com/vi/pEmCgIGpIoo/maxresdefault.jpg" alt="Multimodal AI in action" width="500">
</a>

In this notebook, you will explore a variety of use cases that Gemini's multimodal capabilities make possible.

Gemini is a family of generative AI models developed by [Google DeepMind](https://deepmind.google/) and designed for multimodal use cases. At the time of writing, [Gemini 2.0](https://cloud.google.com/vertex-ai/generative-ai/docs/gemini-v2) is the latest model version.

### Gemini 2.0 Flash

Gemini 2.0 Flash is a smaller Gemini model optimized for high-frequency tasks where response time is a priority. It is built for speed and efficiency and has a context window of up to 1 million tokens for all modalities.
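To put a 1 million-token context window in perspective, you can estimate how much media fits in it from a per-second token rate. The sketch below is plain arithmetic with a hypothetical helper, `max_media_seconds`. The rate of 300 tokens per second of video is an assumption for illustration only; check the Vertex AI documentation for the documented rate for your model and media resolution, or measure a real file with `client.models.count_tokens`.

```python
def max_media_seconds(
    context_tokens: int, tokens_per_second: float, reserved_tokens: int = 0
) -> int:
    """Estimate how many whole seconds of media fit in a context window.

    reserved_tokens leaves room for the text part of the prompt.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    available = context_tokens - reserved_tokens
    if available <= 0:
        return 0
    return int(available // tokens_per_second)


# ASSUMPTION: illustrative rate only; look up the documented rate for your model.
ASSUMED_VIDEO_TOKENS_PER_SECOND = 300
CONTEXT_TOKENS = 1_000_000  # "up to 1 million tokens", rounded

seconds = max_media_seconds(
    CONTEXT_TOKENS, ASSUMED_VIDEO_TOKENS_PER_SECOND, reserved_tokens=1_000
)
print(f"About {seconds} seconds (~{seconds / 60:.0f} minutes) of video")
```

Replacing the assumed rate with a measured one (the token count reported for a clip divided by its length in seconds) gives a better estimate for your own media.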

For more information, see the [Generative AI on Vertex AI](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview) documentation.

### Objectives

This notebook demonstrates a variety of multimodal use cases with Gemini.

In this tutorial, you will learn how to use Gemini with the Gen AI SDK for Python to:

  - Process and generate text
  - Parse and summarize PDF documents
  - Reason across multiple images
  - Generate a video description
  - Combine video data with external knowledge
  - Understand audio
  - Analyze a codebase
  - Combine modalities
  - Make recommendations based on user preferences for e-commerce
  - Understand charts and diagrams
  - Compare images for similarities, anomalies, or differences

### Costs

This tutorial uses billable components of Google Cloud:

- Vertex AI

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.


## Getting Started


### Install Google Gen AI SDK for Python

In [2]:
%pip install --upgrade --quiet google-genai gitingest nest-asyncio

Note: you may need to restart the kernel to use updated packages.


### Restart runtime

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which restarts the current kernel.

In [1]:
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. The restart might take a minute or longer. After it's restarted, continue to the next step. ⚠️</b>
</div>


### Authenticate your notebook environment (Colab only)

If you are running this notebook on Google Colab, run the following cell to authenticate your environment. This step is not required if you are using [Vertex AI Workbench](https://cloud.google.com/vertex-ai-workbench).


In [4]:
import sys

# Additional authentication is required for Google Colab
if "google.colab" in sys.modules:
    # Authenticate user to Google Cloud
    from google.colab import auth

    auth.authenticate_user()

### Set Google Cloud project information and create client

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [1]:
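import os

from google import genai

PROJECT_ID = "[your-project-id]"  # @param {type:"string"}
if not PROJECT_ID or PROJECT_ID == "[your-project-id]":
    # Fall back to the GOOGLE_CLOUD_PROJECT environment variable, if it is set
    PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT", "")
LOCATION = "[your-region]"  # @param {type:"string"}

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)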

### Import libraries


In [2]:
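from IPython.display import Audio, Image, Markdown, Video, display
from gitingest import ingest
from google.genai.types import CreateCachedContentConfig, GenerateContentConfig, Part
import nest_asyncio

# Allow nested event loops so asyncio-based helpers (such as gitingest's ingest)
# can run inside the notebook's already-running event loop
nest_asyncio.apply()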

### Load Gemini 2.0 Flash model

Learn more about all [Gemini models on Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models#gemini-models).

In [3]:
MODEL_ID = "gemini-2.0-flash-001"  # @param {type: "string"}

## Individual Modalities
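The examples in this section attach media with `Part.from_uri(file_uri=..., mime_type=...)` and spell out the MIME type by hand. If you work with many files, a small helper can derive both arguments from a URL. The sketch below uses a hypothetical helper, `part_args_from_url`, and the standard-library `mimetypes` module. It assumes you want public `https://storage.googleapis.com/BUCKET/OBJECT` URLs rewritten to their equivalent `gs://BUCKET/OBJECT` Cloud Storage URIs (which Vertex AI accepts), and it does not handle percent-encoded object names.

```python
import mimetypes
from urllib.parse import urlparse


def part_args_from_url(url: str) -> dict[str, str]:
    """Build file_uri/mime_type keyword arguments for Part.from_uri from a URL."""
    parsed = urlparse(url)
    file_uri = url
    # A public object at https://storage.googleapis.com/BUCKET/OBJECT
    # has the Cloud Storage URI gs://BUCKET/OBJECT.
    if parsed.scheme == "https" and parsed.netloc == "storage.googleapis.com":
        file_uri = "gs://" + parsed.path.lstrip("/")
    # Infer the MIME type from the file extension (for example .mp4 -> video/mp4).
    mime_type, _ = mimetypes.guess_type(parsed.path)
    if mime_type is None:
        raise ValueError(f"Cannot infer a MIME type from {url!r}; pass it explicitly.")
    return {"file_uri": file_uri, "mime_type": mime_type}


example_url = "https://storage.googleapis.com/github-repo/img/gemini/multimodality_usecases_overview/mediterraneansea.mp4"
args = part_args_from_url(example_url)
print(args)
```

With the SDK imported, you could then pass the result as `Part.from_uri(**args)`. Keeping the explicit `mime_type` argument, as the cells below do, is the safer choice when a file has no extension or an unusual one.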

### Generating a video description

Gemini can describe what is shown in a video, answer questions about it, and suggest tags for it:

In [8]:
video_url = "https://storage.googleapis.com/github-repo/img/gemini/multimodality_usecases_overview/mediterraneansea.mp4"
display(Video(video_url, width=350))

prompt = """
What is shown in this video?
Where should I go to see it?
What are the top 5 places in the world that look like this?
Provide the 10 best tags for this video.
"""

video = Part.from_uri(
    file_uri=video_url,
    mime_type="video/mp4",
)
contents = [prompt, video]

response = client.models.generate_content(model=MODEL_ID, contents=contents)
display(Markdown(response.text))

Okay, here is the information you requested about the video:

*   **What is shown in this video?** The video shows the Old City Marina (Kaleiçi Marina) in Antalya, Turkey. You can see the harbor, boats, a lighthouse, the cliffs along the coast, and the city in the background.

*   **Where should I go to see it?** You should go to Antalya, Turkey, specifically to the Old City Marina, also known as Kaleiçi Marina.

*   **What are the top 5 places in the world that look like this?** It's difficult to find exact replicas, but here are some places with similar features (coastal cliffs, harbors, historical significance):

    1.  **Dubrovnik, Croatia:** Features a historical old town with harbors and beautiful coastal scenery.
    2.  **Valletta, Malta:** An ancient city with striking harbor views and fortifications.
    3.  **Cinque Terre, Italy:** Although the cliffs are steeper and the buildings are more colorful, the harbors and coastal villages bear some similarity.
    4.  **Santorini, Greece:** A picturesque island with white-washed buildings nestled on cliffs overlooking a caldera.
    5.  **Positano, Italy:** Another Italian town on the Amalfi Coast with houses cascading down a cliff to the sea.

*   **Provide the 10 best tags for this video:**

    1.  Antalya
    2.  Turkey
    3.  Kaleiçi Marina
    4.  Old City Marina
    5.  Harbor
    6.  Lighthouse
    7.  Mediterranean Sea
    8.  Coastal View
    9.  Travel
    10. Drone Footage

> You can confirm that the location is indeed Antalya, Turkey, on its Wikipedia page: https://en.wikipedia.org/wiki/Antalya
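The tags come back as free-form Markdown. To store them as metadata, you could pull the numbered items that follow the tags question out of `response.text`. This is a minimal sketch with a hypothetical helper, `parse_tags`; it assumes the model formats its answer like the output above (a line mentioning "tags" followed by a numbered list), and the `sample` string is an excerpt of that output.

```python
import re

# Matches Markdown list items such as "1.  Antalya" or "10. Drone Footage"
NUMBERED_ITEM = re.compile(r"^\s*\d+\.\s+(.+?)\s*$")


def parse_tags(markdown_text: str, marker: str = "tags") -> list[str]:
    """Return the numbered items that follow the last line mentioning `marker`.

    If no line mentions the marker, every numbered item in the text is returned.
    """
    lines = markdown_text.splitlines()
    start = 0
    for i, line in enumerate(lines):
        if marker.lower() in line.lower():
            start = i + 1
    tags = []
    for line in lines[start:]:
        match = NUMBERED_ITEM.match(line)
        if match:
            # Strip Markdown bold markers that the model may add
            tags.append(match.group(1).replace("**", "").strip())
    return tags


# Excerpt of the response shown above
sample = """
*   **What are the top 5 places in the world that look like this?**

    1.  **Dubrovnik, Croatia:** Features a historical old town with harbors and beautiful coastal scenery.

*   **Provide the 10 best tags for this video:**

    1.  Antalya
    2.  Turkey
    3.  Kaleiçi Marina
    4.  Old City Marina
    5.  Harbor
    6.  Lighthouse
    7.  Mediterranean Sea
    8.  Coastal View
    9.  Travel
    10. Drone Footage
"""

print(parse_tags(sample))
# In the notebook, you would call: tags = parse_tags(response.text)
```

Parsing free text is brittle. If you need a guaranteed structure, `GenerateContentConfig` also accepts `response_mime_type="application/json"` (optionally with a `response_schema`) so that the model returns JSON instead.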

You can also ask Gemini for information beyond what the video itself shows, drawing on the model's general knowledge.

In [9]:
video_url = "https://storage.googleapis.com/github-repo/img/gemini/multimodality_usecases_overview/ottawatrain3.mp4"
display(Video(video_url, width=350))

prompt = """
Which train line is this?
Where does it go?
What are the stations/stops?
Which river is being crossed?
"""

video = Part.from_uri(
    file_uri=video_url,
    mime_type="video/mp4",
)
contents = [prompt, video]

response = client.models.generate_content(
    model=MODEL_ID, contents=contents, config=GenerateContentConfig(temperature=0)
)
display(Markdown(response.text))

Based on the image, here's the information:

*   **Train Line:** O-Train Trillium Line
*   **Where does it go:** It runs north-south in Ottawa, Canada.
*   **Stations/Stops:** Bayview, Carling, Carleton, Confederation, Greenboro, Hunt Club, Leitrim, Mooney's Bay, South Keys.
*   **River being crossed:** Rideau River

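> The model's answers can vary between runs and are not always correct. In this run it named the Trillium Line, but the train in the video is on the Confederation Line, which you can confirm on Wikipedia: https://en.wikipedia.org/wiki/Confederation_Line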
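The second cell sets `temperature=0` in `GenerateContentConfig`. As a generic illustration of temperature sampling (a textbook sketch, not Gemini's actual decoder), temperature divides the model's next-token scores before a softmax: lower values concentrate probability on the highest-scoring token, and 0 is treated as always choosing it. The Vertex AI documentation notes that responses at temperature 0 are mostly deterministic but can still vary slightly, and a deterministic answer is not necessarily a correct one. The helpers `temperature_probs` and `sample_token` and the scores below are made up for illustration.

```python
import math
import random


def temperature_probs(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Softmax over logits / temperature; temperature 0 collapses to greedy (one-hot)."""
    if temperature < 0:
        raise ValueError("temperature must be >= 0")
    if temperature == 0:
        best = max(logits, key=logits.get)
        return {tok: 1.0 if tok == best else 0.0 for tok in logits}
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max score for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_token(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Draw one token from the temperature-adjusted distribution."""
    probs = temperature_probs(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Made-up next-token scores, for illustration only
logits = {"river": 2.0, "canal": 1.0, "lake": 0.5}

for t in (0, 0.5, 1.0, 2.0):
    top = temperature_probs(logits, t)["river"]
    print(f"temperature={t}: P('river') = {top:.2f}")

rng = random.Random(0)
print([sample_token(logits, 0, rng) for _ in range(5)])  # always "river"
```

Higher temperatures spread probability across more tokens, which tends to give more varied wording; a temperature of 0 is a reasonable choice for factual lookups like the train-line question, where you want repeatable answers.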