In [None]:
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Multimodal Sentiment Analysis with Gemini

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main//gemini/use-cases/multimodal-sentiment-analysis/intro_to_multimodal_sentiment_analysis.ipynb">
      <img width="32px" src="https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg" alt="Google Colaboratory logo"><br> Open in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fuse-cases%2Fmultimodal-sentiment-analysis%2Fintro_to_multimodal_sentiment_analysis.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Open in Colab Enterprise
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/use-cases/multimodal-sentiment-analysis/intro_to_multimodal_sentiment_analysis.ipynb">
      <img src="https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main//gemini/use-cases/multimodal-sentiment-analysis/intro_to_multimodal_sentiment_analysis.ipynb">
      <img width="32px" src="https://upload.wikimedia.org/wikipedia/commons/9/91/Octicons-mark-github.svg" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
</table>

<div style="clear: both;"></div>

| | |
|-|-|
| Author(s) |  [Mohammad Al-Ansari](https://github.com/mansari/) |

## Overview

This notebook demonstrates multimodal sentiment analysis with Gemini. It compares sentiment analysis performed directly on an audio recording with analysis performed on that recording's text transcript. The comparison highlights what the audio modality adds: nuances such as tone, inflection, and other non-verbal cues that give a richer understanding of sentiment.

Gemini is a family of generative AI models developed by [Google DeepMind](https://deepmind.google/) that is designed for multimodal use cases. [Gemini 2.0](https://cloud.google.com/vertex-ai/generative-ai/docs/gemini-v2) is the latest model version.

### Objectives

In this notebook, we will explore sentiment analysis using text and audio as two different modalities.

In summary, you will learn how to use Gemini with the Gen AI SDK for Python to:

  - Load a sample conversation between two individuals.
  - Transcribe the audio content.
  - Conduct sentiment analysis using two modalities: the audio content and the transcript.
  - Compare the two analyses and identify the advantages of each.

For additional multimodal use cases with Gemini, check out [Gemini: An Overview of Multimodal Use Cases](https://github.com/GoogleCloudPlatform/generative-ai/blob/c00b23b5f4963ce3b452b9318dc4e8e3aff7232e/gemini/use-cases/intro_multimodal_use_cases.ipynb).

### Costs

This tutorial uses billable components of Google Cloud:

- Vertex AI

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.


## Getting Started

### Install Google Gen AI SDK for Python

In [None]:
%pip install --upgrade --quiet google-genai

### Restart current runtime

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which will restart the current kernel.

In [None]:
# Restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. ⚠️</b>
</div>

### Authenticate your notebook environment (Colab only)

If you are running this notebook on Google Colab, run the following cell to authenticate your environment. This step is not required if you are using [Vertex AI Workbench](https://cloud.google.com/vertex-ai-workbench).

In [None]:
import sys

# Additional authentication is required for Google Colab
if "google.colab" in sys.modules:
    # Authenticate user to Google Cloud
    from google.colab import auth

    auth.authenticate_user()

### Set Google Cloud project information and create a Gen AI client

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

#### Create a client for Vertex AI (with all Google Cloud capabilities and services)

Specify the project ID and location while creating the client.

In [None]:
import os

from google import genai

PROJECT_ID = "[your-project-id]"  # @param {type: "string", placeholder: "[your-project-id]", isTemplate: true}
if not PROJECT_ID or PROJECT_ID == "[your-project-id]":
    PROJECT_ID = str(os.environ.get("GOOGLE_CLOUD_PROJECT"))

LOCATION = os.environ.get("GOOGLE_CLOUD_REGION", "us-central1")

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)
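
The cell above wraps the environment lookup in `str()`, so an unset `GOOGLE_CLOUD_PROJECT` variable becomes the literal string `"None"` and the mistake only surfaces on the first request. As a minimal sketch, assuming you would rather fail early, the hypothetical helper `resolve_project_id` below (not part of the Gen AI SDK) applies the same fallback and raises a clear error when no project ID is available.

```python
import os


def resolve_project_id(configured: str, placeholder: str = "[your-project-id]") -> str:
    """Return a usable project ID or raise a clear error.

    Falls back to the GOOGLE_CLOUD_PROJECT environment variable when the
    configured value is empty or still the template placeholder.
    """
    if configured and configured != placeholder:
        return configured
    from_env = os.environ.get("GOOGLE_CLOUD_PROJECT", "").strip()
    if not from_env:
        raise ValueError(
            "No project ID found: set PROJECT_ID or the "
            "GOOGLE_CLOUD_PROJECT environment variable."
        )
    return from_env
```

You could then pass `resolve_project_id(PROJECT_ID)` as the `project` argument when creating the client.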

### Import libraries

In [None]:
from IPython.display import Markdown, display
from google.genai.types import GenerateContentConfig, Part

### Load Gemini 2.0 Flash model

Learn more about all [Gemini models on Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models#gemini-models).

In [None]:
MODEL_ID = "gemini-2.0-flash-001"  # @param {type: "string"}

### Load sample audio content

Run the code below to load a sample audio file that was previously generated with Gemini.

The sample file includes text that can be interpreted differently depending on tone, along with other audio elements that can affect the results of a sentiment analysis performed on the audio content.

The file has been uploaded to a Google Cloud Storage bucket, so you can load it using the `Part.from_uri` method. You can listen to the content of the file [here](https://storage.googleapis.com/github-repo/generative-ai/gemini/use-cases/multimodal-sentiment-analysis/sample_conversation.wav).

In [None]:
audio_part = Part.from_uri(
    file_uri="gs://github-repo/generative-ai/gemini/use-cases/multimodal-sentiment-analysis/sample_conversation.wav",
    mime_type="audio/wav",
)
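
The `gs://` URI passed to `Part.from_uri` and the `https://storage.googleapis.com/...` listening link refer to the same object. The sketch below shows one way to derive the second from the first. The helper name `gcs_uri_to_public_url` is hypothetical, and the resulting link only works for objects that are publicly readable, as this sample is.

```python
from urllib.parse import quote


def gcs_uri_to_public_url(gcs_uri: str) -> str:
    """Convert gs://bucket/object into its storage.googleapis.com URL."""
    prefix = "gs://"
    if not gcs_uri.startswith(prefix):
        raise ValueError(f"Expected a gs:// URI, got: {gcs_uri!r}")
    # Split the remainder into the bucket name and the object path.
    bucket, _, object_path = gcs_uri[len(prefix):].partition("/")
    if not bucket or not object_path:
        raise ValueError(f"URI must include a bucket and an object: {gcs_uri!r}")
    # Percent-encode the object path but keep "/" as the path separator.
    return f"https://storage.googleapis.com/{bucket}/{quote(object_path, safe='/')}"
```

Applied to the sample file's URI, this produces the same link given in the text above.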

### Run sentiment analysis on audio content

Now you can ask Gemini to run sentiment analysis directly on the audio file content.

In [None]:
prompt = "Provide a sentiment analysis of this conversation. Use speaker A, speaker B, etc to identify speakers."
response = client.models.generate_content(
    model=MODEL_ID,
    contents=[
        audio_part,
        prompt,
    ],
)
audio_analysis = response.text
display(Markdown(audio_analysis))
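
Later cells concatenate `response.text` into new prompts, which fails if the model returns no text (for example, when a response is blocked). The sketch below is a hypothetical wrapper, `generate_text`, that makes the same `client.models.generate_content` call used above and raises a descriptive error instead. It assumes `response.text` is empty or `None` when no text is produced.

```python
def generate_text(client, model_id: str, contents) -> str:
    """Call generate_content and return the response text, failing loudly if empty."""
    response = client.models.generate_content(model=model_id, contents=contents)
    text = response.text
    if not text:
        # Assumption: an empty or None value means the model produced no text part.
        raise RuntimeError(
            "The model returned no text; inspect the response object for details."
        )
    return text
```

With this helper, the call above could be written as `audio_analysis = generate_text(client, MODEL_ID, [audio_part, prompt])`.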

### Generate a transcript from the audio content

You can use Gemini to generate a transcript of the audio file, which you can then use for the text-based sentiment analysis.

In [None]:
prompt = "Generate a transcript of this conversation. Use speaker A, speaker B, etc to identify speakers."

response = client.models.generate_content(
    model=MODEL_ID,
    contents=[
        audio_part,
        prompt,
    ],
)

transcript = response.text
display(Markdown(transcript))
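
If you want to work with the transcript programmatically, for example to count turns per speaker, you can split it into `(speaker, text)` pairs. The sketch below assumes each turn starts with a label such as `Speaker A:` (optionally wrapped in Markdown bold), which is what the prompt asks for but which the model is not guaranteed to produce. `parse_turns` is a hypothetical helper name.

```python
import re

# Assumed turn shape: "Speaker A: words" or "**Speaker A:** words".
TURN_PATTERN = re.compile(
    r"^[\s*_>-]*speaker\s+(\w+)[\s*_]*:[\s*_]*(.*)$", re.IGNORECASE
)


def parse_turns(transcript: str) -> list[tuple[str, str]]:
    """Split a transcript into (speaker, text) pairs."""
    turns: list[tuple[str, str]] = []
    for line in transcript.splitlines():
        match = TURN_PATTERN.match(line)
        if match:
            turns.append((match.group(1).upper(), match.group(2).strip()))
        elif turns and line.strip():
            # A non-empty line without a speaker label continues the previous turn.
            speaker, text = turns[-1]
            turns[-1] = (speaker, f"{text} {line.strip()}".strip())
    return turns
```

Lines that appear before the first speaker label, such as a title the model may add, are ignored.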

### Run sentiment analysis on the transcript

Now you can ask Gemini to run the sentiment analysis, but this time on the transcript we extracted from the audio.

In [None]:
prompt = (
    """Provide a sentiment analysis of this conversation. Use speaker A, speaker B, etc to identify speakers.
Transcript:
"""
    + transcript
)

response = client.models.generate_content(
    model=MODEL_ID,
    contents=prompt,
    config=GenerateContentConfig(
        response_modalities=["TEXT"],
    ),
)

text_analysis = response.text
display(Markdown(text_analysis))
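
Prompts like the one above are assembled by string concatenation, where it is easy to lose the separation between a label and the text that follows it. As a minimal sketch, the hypothetical helper `build_prompt` below joins an instruction and any number of labeled sections with blank lines between them, so each section starts on its own line.

```python
def build_prompt(instruction: str, sections: dict[str, str]) -> str:
    """Join an instruction and labeled sections, separated by blank lines."""
    parts = [instruction.strip()]
    for label, body in sections.items():
        # Put each label on its own line, directly above its content.
        parts.append(f"{label}:\n{body.strip()}")
    return "\n\n".join(parts)
```

For example, `build_prompt("Provide a sentiment analysis of this conversation.", {"Transcript": transcript})` yields the instruction, a blank line, the `Transcript:` label, and then the transcript text.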

### Compare the two analyses

Now you can ask Gemini to compare the two analyses and show whether either one provides insights that the other misses.

In [None]:
# Keep each label on its own line so the two analyses do not run together.
prompt = (
    """Provide a short comparison of two analyses of an audio conversation.
One was made based on a text transcript of the call, and the other is based on an audio recording.
Transcript analysis:
"""
    + text_analysis
    + """

Audio analysis:
"""
    + audio_analysis
)

response = client.models.generate_content(
    model=MODEL_ID,
    contents=prompt,
    config=GenerateContentConfig(
        response_modalities=["TEXT"],
    ),
)

comparison = response.text
display(Markdown(comparison))

### Conclusion

Sentiment analysis of audio offers significant advantages, providing a deeper understanding of nuanced sentiment and delivery. Audio cues, like the enthusiastic tone of a speaker or the subtle inflections in their voice, add crucial context often missed by text alone.

This richer information is invaluable for applications from understanding customer feedback to gauging emotional responses in communication. We encourage you to explore these benefits by analyzing your own audio data, such as interviews, podcasts, or presentations.
