# Analyze and Reason on Multimodal Data with Gemini: Challenge Lab

## GSP524

## Challenge Scenario

#### Cymbal Direct: Analyzing Social Media Engagement for a New Product Launch

Cymbal Direct just launched a new line of athletic apparel designed for enhanced performance during various activities. To gauge public perception and potential market impact, Cymbal Direct is tasked with analyzing social media engagement across multiple platforms. This analysis will involve:
  * **Text**: Analyzing customer reviews and social media posts for sentiment and key themes.
  * **Image**: Analyzing images posted by influencers and customers wearing the apparel to identify style trends and usage patterns.
  * **Audio**: Analyzing an audio clip of a podcast episode of a recent interview about Cymbal Direct's new product launch.

The goal is to provide Cymbal Direct with actionable insights to refine their marketing strategy, improve their products, and bolster product positioning. Are you ready for the challenge?

## Task 1. Import libraries and install the Gen AI SDK

In this section, you will import the libraries required for this lab and install the Google Gen AI SDK.

**All cells have been written for you in this section. There are no `#TODOs` required.**

### Install Google Gen AI SDK for Python

In [None]:
%pip install --upgrade --quiet google-genai

### Restart current runtime

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which will restart the current kernel.

In [None]:
# Restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

### Import Libraries

In [None]:
from IPython.display import HTML, Markdown, display
from google import genai
from google.genai import types
from google.genai.types import (
    FunctionDeclaration,
    GenerateContentConfig,
    GenerateContentResponse,
    GoogleSearch,
    MediaResolution,
    Part,
    Retrieval,
    SafetySetting,
    ThinkingConfig,
    Tool,
    ToolCodeExecution,
    VertexAISearch,
)
from collections.abc import Iterator
import os

### Set Google Cloud project information and initialize Google Gen AI SDK

In [None]:
import os
os.makedirs('analysis', exist_ok=True)

PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT")
LOCATION = os.environ.get("GOOGLE_CLOUD_REGION", "us-central1")
print(f"Project ID: {PROJECT_ID}")
print(f"LOCATION: {LOCATION}")

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)
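
If `GOOGLE_CLOUD_PROJECT` is not set, `PROJECT_ID` is `None` and the problem only surfaces later, when a bucket path or API call fails. A minimal sketch of an early check follows; `resolve_settings` is a hypothetical helper, not part of the lab or the SDK, and it takes the environment as a parameter so it can be tried without touching real variables.

```python
import os


def resolve_settings(env=os.environ):
    """Return (project_id, location), failing early if the project is unset.

    Hypothetical helper for illustration only.
    """
    project_id = env.get("GOOGLE_CLOUD_PROJECT")
    if not project_id:
        raise RuntimeError(
            "GOOGLE_CLOUD_PROJECT is not set; export it before creating the client."
        )
    # Same default region as the cell above.
    location = env.get("GOOGLE_CLOUD_REGION", "us-central1")
    return project_id, location


# Example with an explicit mapping instead of the real environment.
print(resolve_settings({"GOOGLE_CLOUD_PROJECT": "demo-project"}))
```

In the notebook, the returned pair could be assigned to `PROJECT_ID` and `LOCATION` before the client is created.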

### Load the Gemini 2.0 Flash model

Learn more about all [Gemini models on Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models#gemini-models).

In [10]:
MODEL_ID = "gemini-2.0-flash-001"  # @param {type: "string"}

## Task 2. Analyze and reason on customer feedback (text)

In this task, you'll use the Gemini 2.0 Flash and Gemini-2.5-flash models to analyze customer reviews and social media posts in text format about Cymbal Direct's new athletic apparel. You will save the findings from the model into a markdown file that you will use for a comprehensive report in the last task.

**Your tasks will be labeled with a `#TODO` section in the cell. Read each cell carefully and ensure you are filling them out correctly!**

### Load and preview the text data
This file contains customer reviews and social media posts about Cymbal Direct's new athletic apparel line, collected from various e-commerce platforms and social media sites. The data is in raw text format, with each review or post separated by a newline.

In [None]:
# Load and preview the text data (reviews.txt)
!gcloud storage cp gs://{PROJECT_ID}-bucket/media/text/reviews.txt media/text/reviews.txt
!head media/text/reviews.txt
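
Because each review or post sits on its own line, it can help to number the entries before placing them in a prompt, so the model's per-review output can be matched back to the input. The sketch below is one possible way to do that; `number_reviews` is a hypothetical helper and the sample text is made up for illustration.

```python
def number_reviews(raw_text: str) -> str:
    """Drop blank lines and prefix each remaining review with an index."""
    reviews = [line.strip() for line in raw_text.splitlines() if line.strip()]
    return "\n".join(f"[{i}] {review}" for i, review in enumerate(reviews, start=1))


# Made-up sample standing in for the contents of reviews.txt.
sample = "Love the leggings!\n\nShorts run small.\n"
print(number_reviews(sample))
```

The numbered string could then be interpolated into the prompt in place of the raw file contents.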

### Initial Analysis with Gemini 2.0 Flash
For this section, you will need to fill out the `#TODOs` for **Construct the prompt for Gemini** and **Send the prompt to Gemini**.

In [None]:
# 1. Load the text data (reviews.txt)
with open('media/text/reviews.txt', 'r') as f:
    text_data = f.read()


# 2. Construct the prompt for Gemini
# TODO: Write a prompt that instructs the Gemini model to analyze the customer reviews and social media posts.
# The prompt should include clear instructions to:
# - Identify the overall sentiment (positive, negative, or neutral) of each review or post.
# - Extract key themes and topics discussed, such as product quality, fit, style, customer service, and pricing.
# - Identify any frequently mentioned product names or specific features.
prompt = f"""
You are an AI assistant helping to analyze the customer reviews and social media posts in text format about Cymbal Direct's new athletic apparel.
Analyze the provided customer reviews and social media posts. For each review or post:
- Identify the overall sentiment (positive, negative, or neutral).
- Extract key themes and topics discussed, such as product quality, fit, style, customer service, and pricing.
- Identify any frequently mentioned product names or specific features.

{text_data}
"""

# 3. Send the prompt to Gemini
# TODO: Use the `client.models.generate_content` method to send the prompt and text data to the Gemini model.
# TODO: Make sure to specify the `MODEL_ID` and the `prompt` as parameters.
# TODO: Store the response from the model in a variable named `response`.
response = client.models.generate_content(
    model=MODEL_ID,
    contents=prompt
)

# 4. Display the response
display(Markdown(response.text))


### Deep Dive with Gemini-2.5-flash Model  

Now that you have generated some initial insights from the reviews, you will use the Gemini-2.5-flash model to explore them in more detail, draw out key takeaways, and reason toward actionable insights for your team.

In [13]:
MODEL_ID = "gemini-2.5-flash"  # @param {type: "string"}

### Helper functions

Define helper functions that print the model's thoughts and its final answer.

In [None]:
def print_thoughts(response: GenerateContentResponse) -> None:
    for part in response.candidates[0].content.parts:
        header = "Thoughts" if part.thought else "Answer"
        display(Markdown(f"""## {header}:\n{part.text}"""))


def print_thoughts_stream(response: Iterator[GenerateContentResponse]) -> None:
    display(Markdown("## Thoughts:\n"))
    answer_shown = False

    for chunk in response:
        for part in chunk.candidates[0].content.parts:
            if not part.thought and not answer_shown:
                display(Markdown("## Answer:\n"))
                answer_shown = True
            display(Markdown(part.text))
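
The helpers above display each part as it arrives. When the thoughts and the answer are needed as separate strings (for example, to save only the answer), the same `thought` and `text` attributes can be used to split them. The following is a sketch under that assumption; `split_thoughts` is a hypothetical helper, and `SimpleNamespace` objects stand in for real response parts so the example runs without calling the API.

```python
from types import SimpleNamespace


def split_thoughts(parts):
    """Return (thoughts, answer) as two strings, skipping parts without text."""
    thoughts, answer = [], []
    for part in parts or []:
        text = getattr(part, "text", None)
        if not text:
            continue  # e.g. a non-text part
        if getattr(part, "thought", False):
            thoughts.append(text)
        else:
            answer.append(text)
    return "\n".join(thoughts), "\n".join(answer)


# Stand-ins for response.candidates[0].content.parts (illustrative data).
fake_parts = [
    SimpleNamespace(thought=True, text="Weigh fit complaints against praise."),
    SimpleNamespace(thought=None, text="Fit is the main issue."),
    SimpleNamespace(thought=None, text=None),
]
thoughts, answer = split_thoughts(fake_parts)
print("Thoughts:", thoughts)
print("Answer:", answer)
```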

### Enable thoughts

Set the `include_thoughts` flag in the `ThinkingConfig` to control whether the model's thoughts are returned in the response. The flag is `False` by default. You will also set the optional `thinking_budget` parameter in the `ThinkingConfig` to control how much the model thinks on a given prompt.

**Hint: you will need to use this for calls to the Thinking model!**

In [None]:
config = types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(include_thoughts=True, thinking_budget=1024)
)
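
If different tasks call for different thinking budgets, the same settings can be built by a small function. The sketch below assumes the SDK accepts a plain dictionary wherever it accepts a `GenerateContentConfig` (worth verifying against your installed version); `thinking_config_dict` is a hypothetical helper and the budget of 2048 is only an illustrative value.

```python
def thinking_config_dict(thinking_budget: int = 1024, include_thoughts: bool = True) -> dict:
    """Build settings that mirror the `config` object defined above.

    Assumption: the dictionary keys match the field names used by
    GenerateContentConfig and ThinkingConfig.
    """
    return {
        "thinking_config": {
            "include_thoughts": include_thoughts,
            "thinking_budget": thinking_budget,
        }
    }


# Illustrative: a larger budget for a longer synthesis prompt.
report_config = thinking_config_dict(thinking_budget=2048)
print(report_config)
```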

### Deep Dive with Gemini-2.5-flash: Reasoning on Customer Sentiment

In this section, you'll use the Thinking model to delve deeper into the customer sentiment and identify key areas for improvement. We're particularly interested in understanding the reasoning behind positive and negative reviews and uncovering any recurring themes that might not be immediately apparent.

For this section, you will need to fill out the `#TODOs` for **Construct the prompt for Gemini** and **Send the prompt to the Gemini Thinking model**.

In [None]:
# 1. Construct the prompt for Gemini
# TODO: Write a prompt that instructs the Gemini model to analyze the customer reviews and social media posts in more detail.
# The prompt should include clear instructions to:
# - Identify the main factors driving positive and negative sentiment.
# - Assess the overall impact of the new athletic apparel line on brand perception.
# - Identify three key areas where Cymbal Direct can improve customer satisfaction or product offerings.
# - Imagine you are presenting your findings to the Cymbal Direct marketing team and highlight the three most important takeaways.
thinking_mode_prompt = f"""
You are an AI assistant helping to analyze the customer reviews and social media posts in detail. 
Analyze the provided customer reviews and social media posts. For each review or post:
- Identify the main factors driving both positive and negative sentiment.
- Assess the overall impact of the new athletic apparel line on brand perception.
- Identify three key areas where Cymbal Direct can improve customer satisfaction or product offerings.
- Imagine you are presenting your findings to the Cymbal Direct marketing team and highlight the three most important takeaways.
{text_data}
"""

# 2. Send the prompt to the Gemini Thinking model
# TODO: Use the `client.models.generate_content` method to send the prompt and text data to the Gemini model.
# TODO: Make sure to specify the `MODEL_ID` and the `thinking_mode_prompt` as parameters.
# TODO: Also, pass the `config` object to enable thinking mode.
# TODO: Store the response from the model in a variable named `thinking_model_response`.
thinking_model_response = client.models.generate_content(
    model=MODEL_ID,
    contents=thinking_mode_prompt,
    config=config
)

# 3. Print thoughts and answer
print_thoughts(thinking_model_response)

# 4. Save the text analysis to a file
with open('analysis/text_analysis.md', 'w') as f:
    f.write(thinking_model_response.text)
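
The save step assumes the response contains answer text. If the model returns no text, writing it to a file would fail with an unhelpful error. A minimal defensive sketch follows; `save_analysis` is a hypothetical helper, and the demo writes to a temporary directory rather than the lab's `analysis` folder.

```python
import os
import tempfile


def save_analysis(text, path):
    """Write the answer text to `path`, failing clearly when there is none."""
    if not text:
        raise ValueError(f"No answer text to save to {path}")
    # Create the parent folder if needed (the lab creates 'analysis' earlier).
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path


# Demo with made-up text in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_analysis(
        "## Findings\nFit runs small.",
        os.path.join(tmp, "analysis", "text_analysis.md"),
    )
    with open(saved, encoding="utf-8") as f:
        round_trip = f.read()
print(round_trip)
```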


## Task 3. Analyze and reason on visual content: Style trends and customer behavior

In this task, you'll focus on analyzing images related to Cymbal Direct's new athletic apparel line. The goal is to identify style trends and customer behavior based on the images. You will save the findings from the model into a markdown file that you will use for a comprehensive report in the last task.

**Your tasks will be labeled with a `#TODO` section in the cell. Read each cell carefully and ensure you are filling them out correctly!**

#### Introduction and Context
This image dataset consists of a mix of product photos and influencer posts showcasing Cymbal Direct's new athletic apparel line. The images feature models and influencers wearing the apparel in various settings, providing visual information about style, usage patterns, and target audience.

### Load and preview the image data

In [None]:
!gcloud storage cp -r gs://{PROJECT_ID}-bucket/media/images media/

In [None]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# Specify the directory where the images are stored
image_dir = 'media/images'

# Get a list of all image files in the directory
image_files = [f for f in os.listdir(image_dir) if os.path.isfile(os.path.join(image_dir, f))]

# Display the images
for image_file in image_files:
    # Construct the full path to the image file
    image_path = os.path.join(image_dir, image_file)
    
    # Load the image
    img = mpimg.imread(image_path)
    
    # Display the image using Matplotlib
    plt.figure()  # Create a new figure for each image
    plt.imshow(img)
    plt.axis('off')  # Hide the axis
    plt.show()

### Initial Analysis with Gemini 2.0 Flash

For this section, you will need to fill out the `#TODOs` for **Construct the prompt for Gemini** and **Send the prompt and images to Gemini**.

In [None]:
MODEL_ID = "gemini-2.0-flash-001"  # @param {type: "string"}

In [None]:
# 1. Load the image data
image_folder = 'media/images'
image_files = [f for f in os.listdir(image_folder) if os.path.isfile(os.path.join(image_folder, f))]

# 2. Load the images into a list of `Part` objects
image_parts = []
for image_file in image_files:
    image_path = os.path.join(image_folder, image_file)
    with open(image_path, 'rb') as f:
        image_bytes = f.read()
    image_parts.append(Part.from_bytes(data=image_bytes, mime_type='image/jpeg'))  # Adjust mime_type if needed

# 3. Construct the prompt for Gemini
# TODO: Write a prompt that instructs the Gemini model to analyze the images of Cymbal Direct's new athletic apparel line.
# The prompt should include clear instructions to:
# - Identify the apparel items in each image.
# - Describe the attributes of each item.
# - Identify any prominent style trends or preferences.
prompt = f"""
You are an AI assistant helping to analyze the images of Cymbal Direct's new athletic apparel line.
For each provided image:
- Identify the apparel items in each image.
- Describe the attributes of each item.
- Identify any prominent style trends or preferences.
"""

# 4. Send the prompt and images to Gemini
# TODO: Use the `client.models.generate_content` method to send the prompt and images to the Gemini model.
# TODO: Make sure to specify the `MODEL_ID` and the `contents` (including the prompt and image parts) as parameters.
# TODO: Store the response from the model in a variable named `response`.
response = client.models.generate_content(
    model=MODEL_ID,
    contents=[prompt] + image_parts
)

# 5. Display the response
display(Markdown(response.text))
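
The image-loading step above hardcodes `image/jpeg` and notes that the MIME type may need adjusting. One way to handle mixed file types is to derive the type from the file extension with the standard library and skip anything that is not an image. This is a sketch, with `guess_image_mime` as a hypothetical helper and made-up file names.

```python
import mimetypes


def guess_image_mime(filename, default=None):
    """Return an image MIME type based on the extension, or `default`."""
    mime_type, _ = mimetypes.guess_type(filename.lower())
    if mime_type and mime_type.startswith("image/"):
        return mime_type
    return default


# Illustrative file names; in the lab these would come from os.listdir().
names = ["look1.jpg", "look2.PNG", "notes.txt"]
image_names = [name for name in names if guess_image_mime(name)]
for name in image_names:
    print(name, guess_image_mime(name))
```

In the loading loop, the guessed value could be passed as `mime_type` to `Part.from_bytes` instead of the fixed string.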


### Reasoning on image trends with Gemini-2.5-flash

You'll now use the Thinking model to perform a more in-depth analysis of the visual elements, inferring context, target audience, and potential marketing implications.

For this section, you will need to fill out the `#TODOs` for **Construct the prompt for Gemini** and **Send the prompt and images to the Gemini Thinking model**.

In [None]:
MODEL_ID = "gemini-2.5-flash"  # @param {type: "string"}

In [None]:
# 1. Construct the prompt for Gemini
# TODO: Write a prompt that instructs the Gemini model to analyze the images in more detail.
# The prompt should include clear instructions to:
# - Develop a hypothesis about the target audience for each image.
# - Analyze how visual elements contribute to the overall message and appeal.
# - Compare the observed style trends with broader fashion trends in athletic wear.
# - Provide recommendations for Cymbal Direct's future marketing campaigns or product development.
thinking_mode_prompt = f"""
You are an AI assistant helping to analyze in detail the visual elements and to infer context, target audience, and potential marketing implications. 
For each provided image:
- Develop a hypothesis about the target audience for each image.
- Analyze how visual elements contribute to the overall message and appeal.
- Compare the observed style trends with broader fashion trends in athletic wear.
- Provide recommendations for Cymbal Direct's future marketing campaigns or product development.
"""

# 2. Send the prompt and images to the Gemini Thinking model
# TODO: Use the `client.models.generate_content` method to send the thinking_mode_prompt and images to the Gemini model.
# TODO: Make sure to specify the `MODEL_ID`, `contents` (including the prompt and image parts), and `config` to enable thinking mode.
# TODO: Store the response from the model in a variable named `thinking_model_response`.
thinking_model_response = client.models.generate_content(
    model=MODEL_ID,
    contents=[thinking_mode_prompt] + image_parts,
    config=config
)

# 3. Print thoughts and answer
print_thoughts(thinking_model_response)

# 4. Save the image analysis to a file
with open('analysis/image_analysis.md', 'w') as f:
    f.write(thinking_model_response.text)


## Task 4. Analyze and reason on audio content: Customer perceptions

In this task, you will use Gemini to analyze a podcast about Cymbal Direct's new clothing line, extract key information and sentiment from it, and use these findings to generate insights for the company. You will save the findings from the model into a markdown file that you will use for a comprehensive report in the last task.

**Your tasks will be labeled with a `#TODO` section in the cell. Read each cell carefully and ensure you are filling them out correctly!**

#### Introduction and Context
This audio clip is from a podcast episode featuring an interview with a Cymbal Direct representative discussing the new athletic apparel line. The conversation covers various aspects of the apparel, including design, features, target audience, and marketing strategy.

### Preview the podcast episode (optional)

To listen to the podcast episode, you can copy the file to your local environment and use IPython to preview it in the notebook.

In [None]:
import IPython

!gcloud storage cp gs://{PROJECT_ID}-bucket/media/audio/cymbal_direct_expert_interview.wav \
media/audio/cymbal_direct_expert_interview.wav

IPython.display.Audio('media/audio/cymbal_direct_expert_interview.wav')

### Initial analysis with Gemini 2.0 Flash
For this section, you will need to fill out the `#TODOs` for **Construct the prompt for Gemini** and **Send the prompt and audio to Gemini**.

In [None]:
MODEL_ID = "gemini-2.0-flash-001"  # @param {type: "string"}

In [None]:
# Construct the file URI using f-string
file_uri = f"gs://{PROJECT_ID}-bucket/media/audio/cymbal_direct_expert_interview.wav"

audio_part = Part.from_uri(
    file_uri=file_uri,
    mime_type="audio/wav",
)

In [None]:
# 1. Construct the prompt for Gemini
# TODO: Write a prompt that instructs the Gemini model to analyze the audio recording of the conversation about Cymbal Direct's new athletic apparel line.
# The prompt should include clear instructions to:
# - Transcribe the conversation, identifying different speakers.
# - Provide a sentiment analysis, highlighting positive, negative, and neutral opinions.
# - Identify key themes and topics discussed, such as comfort, fit, performance, style, and comparisons to competitors.
thinking_mode_prompt = f"""
You are an AI assistant helping to analyze the audio recording of the conversation about Cymbal Direct's new athletic apparel line.
Analyze the podcast in detail and:
1. Transcribe the conversation, identifying different speakers.
2. Provide a sentiment analysis, highlighting positive, negative, and neutral opinions.
3. Identify key themes and topics discussed, such as comfort, fit, performance, style, and comparisons to competitors.
"""

# 2. Send the prompt and audio to Gemini
# TODO: Use the `client.models.generate_content` method to send the thinking_mode_prompt and audio data to the Gemini model.
# TODO: Make sure to specify the `MODEL_ID` and the `contents` (including the `audio_part` and the `prompt`) as parameters.
# TODO: Store the response from the model in a variable named `response`.
response = client.models.generate_content(
    model=MODEL_ID,
    contents=[thinking_mode_prompt, audio_part]
)

# 3. Display the response
display(Markdown(response.text))


### Reasoning on Audio Insights with Gemini-2.5-flash
In this section, you'll use the Thinking model to analyze the conversation at a deeper level, reason about customer satisfaction, deduce influencing factors, and generate data-driven recommendations.

For this section, you will need to fill out the `#TODOs` for **Construct the prompt for Gemini** and **Send the prompt and audio to the Gemini Thinking model**.

In [None]:
MODEL_ID = "gemini-2.5-flash"  # @param {type: "string"}

In [None]:
# 1. Construct the prompt for Gemini
# TODO: Write a prompt that instructs the Gemini model to analyze the audio recording in more detail.
# The prompt should include clear instructions to:
# - Reason about the overall customer satisfaction with the apparel.
# - Deduce the key factors influencing customer perception.
# - Develop three data-driven recommendations for Cymbal Direct.
# - Identify any potential biases or limitations in the audio data.
prompt = """
You are an AI assistant helping to analyze in detail the audio recording of the conversation about Cymbal Direct's new athletic apparel line.
Analyze the conversation at a deeper level, reason about customer satisfaction, deduce influencing factors, and generate data-driven recommendations.

- Reason about the overall customer satisfaction with the apparel.
- Deduce the key factors influencing customer perception.
- Develop three data-driven recommendations for Cymbal Direct.
- Identify any potential biases or limitations in the audio data.
"""

# 2. Send the prompt and audio to the Gemini Thinking model
# TODO: Use the `client.models.generate_content` method to send the prompt and audio data to the Gemini model.
# TODO: Make sure to specify the `MODEL_ID`, `contents` (including the `audio_part` and the `prompt`), and `config` to enable thinking mode.
# TODO: Store the response from the model in a variable named `thinking_model_response`.
thinking_model_response = client.models.generate_content(
    model=MODEL_ID,
    contents=[prompt, audio_part],
    config=config
)

# 3. Print the thoughts and answer
print_thoughts(thinking_model_response)

# 4. Save the audio analysis to a text file in the analysis folder
with open('analysis/audio_analysis.md', 'w') as f:
    f.write(thinking_model_response.text)


## Task 5. Synthesize multimodal insights: Generate a comprehensive report

In this final task, you will synthesize the insights gained from your previous analyses of text, images, and audio data. You'll use the Gemini-2.5-flash model to generate a comprehensive report that consolidates the findings from each modality, providing a holistic view of customer sentiment, style preferences, and key trends related to Cymbal Direct's new athletic apparel line.

You will save the final report generated by the model into a markdown file, which you will then upload to Cloud Storage for review and evaluation. This comprehensive report will serve as a valuable resource for Cymbal Direct, enabling them to make informed decisions and optimize their strategies based on a thorough understanding of customer perceptions and market trends.

**Your tasks will be labeled with a `#TODO` section in the cell. Read each cell carefully and ensure you are filling them out correctly!**


In [35]:
MODEL_ID = "gemini-2.5-flash"  # @param {type: "string"}

In [None]:
# 1. Load the analysis results from the files
with open('analysis/text_analysis.md', 'r') as f:
    text_analysis = f.read()

with open('analysis/image_analysis.md', 'r') as f:
    image_analysis = f.read()

with open('analysis/audio_analysis.md', 'r') as f:
    audio_analysis = f.read()

# 2. Combine the analysis results
all_analysis = f"""
## Text Analysis:
{text_analysis}

## Image Analysis:
{image_analysis}

## Audio Analysis:
{audio_analysis}
"""
# 3. Construct the prompt for Gemini
# TODO: Write a prompt to instruct the Gemini model to generate a comprehensive report based on the combined analysis results.
# The prompt should include clear instructions to:
# - Summarize the overall sentiment towards the new apparel line.
# - Identify key themes and trends in customer feedback.
# - Provide insights on style preferences, usage patterns, and customer behavior.
# - Evaluate the audio and its fit with the product image.
# - Offer actionable recommendations for Cymbal Direct to refine their marketing strategy and product positioning.
comprehensive_report_prompt = f"""
You are a skilled market analyst tasked with analyzing text, image, and audio data to generate a comprehensive report that consolidates
the findings from each source, providing a holistic view of customer sentiment, style preferences, and key trends related to Cymbal Direct's new athletic apparel line.

Detailed tasks:
- Summarize the overall sentiment towards the new apparel line.
- Identify key themes and trends in customer feedback.
- Provide insights on style preferences, usage patterns, and customer behavior.
- Evaluate the audio and its fit with the product image.
- Offer actionable recommendations for Cymbal Direct to refine their marketing strategy and product positioning.
{all_analysis}
"""

# 4. Send the prompt to Gemini
# TODO: Use the `client.models.generate_content` method to send the comprehensive_report_prompt to the Gemini model.
# TODO: Make sure to specify the `MODEL_ID`, the `comprehensive_report_prompt`, and the `config` to enable thinking mode.
# TODO: Store the response from the model in a variable named `thinking_model_response`.
thinking_model_response = client.models.generate_content(
    model=MODEL_ID,
    contents=comprehensive_report_prompt,
    config=config
)

# 5. Print the thoughts and answer
print_thoughts(thinking_model_response)

# 6. Save the final report to a file
with open('analysis/final_report.md', 'w') as f:
    f.write(thinking_model_response.text)
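
The three `open` calls in the cell above repeat the same pattern, and a missing file raises an error that does not say which task was skipped. The sketch below shows one way to load and combine the analyses in a loop; `combine_analyses` is a hypothetical helper, and the demo uses placeholder files in a temporary directory instead of the lab's real outputs.

```python
import os
import tempfile


def combine_analyses(analysis_dir, modalities=("text", "image", "audio")):
    """Read each <modality>_analysis.md file and join them under headings."""
    sections = []
    for modality in modalities:
        path = os.path.join(analysis_dir, f"{modality}_analysis.md")
        if not os.path.exists(path):
            raise FileNotFoundError(f"Missing {path}; run the {modality} task first.")
        with open(path, encoding="utf-8") as f:
            body = f.read().strip()
        sections.append(f"## {modality.capitalize()} Analysis:\n{body}")
    return "\n\n".join(sections)


# Demo with placeholder content.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("text", "image", "audio"):
        with open(os.path.join(tmp, f"{name}_analysis.md"), "w", encoding="utf-8") as f:
            f.write(f"{name} findings\n")
    combined = combine_analyses(tmp)
print(combined)
```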


In [None]:
!gcloud storage cp analysis/final_report.md gs://{PROJECT_ID}-bucket/analysis/final_report.md

## Congratulations!

Congratulations! In this lab, you have successfully utilized the Gemini 2.0 Flash and Thinking models to analyze multimodal data, including text, images, and audio, to gain valuable insights for Cymbal Direct's new athletic apparel line. You have demonstrated proficiency in constructing effective prompts, leveraging the reasoning capabilities of the Thinking model, and generating a comprehensive report with actionable recommendations.

Copyright 2025 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.