In [1]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

## Overview

For retail companies, recommendation systems improve customer experience and thus can increase sales.

This notebook shows how you can use the multimodal capabilities of the Gemini 2.0 model to rapidly create a recommendation system that works out of the box.

## Scenario

The customer shows you their living room:

|Customer photo |
|:-----:|
|<img src="https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/rooms/spacejoy-c0JoR_-2x3E-unsplash.jpg" width="80%">  |



Below are four chair options that the customer is trying to decide between:

|Chair 1| Chair 2 | Chair 3 | Chair 4 |
|:-----:|:----:|:-----:|:----:|
| <img src="https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/cesar-couto-OB2F6CsMva8-unsplash.jpg" width="80%">|<img src="https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/daniil-silantev-1P6AnKDw6S8-unsplash.jpg" width="80%">|<img src="https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/ruslan-bardash-4kTbAMRAHtQ-unsplash.jpg" width="80%">|<img src="https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/scopic-ltd-NLlWwR4d3qU-unsplash.jpg" width="80%">|


How can you use Gemini, a multimodal model, to help the customer choose the best option, and also explain why?

### Objectives

Your main objective is to learn how to create a recommendation system that can provide both recommendations and explanations using a multimodal model: Gemini 2.0.

In this notebook, you will begin with a scene (e.g., a living room) and use the Gemini model to perform visual understanding. You will also investigate how the Gemini model can recommend an item (e.g., a chair) from a list of furniture items provided as input.

By going through this notebook, you will learn:
- how to use the Gemini model to perform visual understanding
- how to take multimodality into consideration in prompting for the Gemini model
- how the Gemini model can be used to create retail recommendation applications out of the box

### Costs
This tutorial uses billable components of Google Cloud:

- Vertex AI

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.

## Getting Started

### Install Gen AI SDK for Python

In [2]:
%pip install --upgrade google-genai

Collecting google-genai
  Downloading google_genai-1.11.0-py3-none-any.whl.metadata (32 kB)
Collecting httpx<1.0.0,>=0.28.1 (from google-genai)
  Downloading httpx-0.28.1-py3-none-any.whl.metadata (7.1 kB)
Collecting httpcore==1.* (from httpx<1.0.0,>=0.28.1->google-genai)
  Downloading httpcore-1.0.8-py3-none-any.whl.metadata (21 kB)
Downloading google_genai-1.11.0-py3-none-any.whl (159 kB)
Downloading httpx-0.28.1-py3-none-any.whl (73 kB)
Downloading httpcore-1.0.8-py3-none-any.whl (78 kB)
Installing collected packages: httpcore, httpx, google-genai
Successfully installed google-genai-1.11.0 httpcore-1.0.8 httpx-0.28.1
Note: you may need to restart the kernel to use updated packages.


### Restart current runtime

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which will restart the current kernel.

In [3]:
# Restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

### Authenticate your notebook environment (Colab only)

If you are running this notebook on Google Colab, run the following cell to authenticate your environment. This step is not required if you are using [Vertex AI Workbench](https://cloud.google.com/vertex-ai-workbench).

In [1]:
import sys

# Additional authentication is required for Google Colab
if "google.colab" in sys.modules:
    # Authenticate user to Google Cloud
    from google.colab import auth

    auth.authenticate_user()

### Define Google Cloud project information and create the API client

Create a Gen AI SDK client that sends requests to Vertex AI in your project:

In [3]:
from google import genai

# Define project information (replace the placeholder with your own project ID)
PROJECT_ID = "[your-project-id]"  # @param {type:"string"}
LOCATION = "us-west1"  # @param {type:"string"}

# Create a Gen AI SDK client that routes requests through Vertex AI
client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)

### Import libraries

In [4]:
from google.genai.types import GenerateContentConfig, Part

## Using the Gemini model

Gemini is a multimodal model: you can add images and video to text or chat prompts and receive a text response.

### Choose the Gemini model

In [5]:
MODEL_ID = "gemini-2.0-flash"

### Visual understanding with Gemini

Here you will ask the Gemini model to describe a room in detail from its image. To do that, you **combine text and an image in a single prompt**.

In [6]:
# URL of the room image
room_image_url = "https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/rooms/spacejoy-c0JoR_-2x3E-unsplash.jpg"
room_image = Part.from_uri(file_uri=room_image_url, mime_type="image/jpeg")

prompt = "Describe what's visible in this room and the overall atmosphere:"
contents = [
    room_image,
    prompt,
]

responses = client.models.generate_content_stream(model=MODEL_ID, contents=contents)

print("\n-------Response--------")
for response in responses:
    print(response.text, end="")


-------Response--------
The room is a bright and airy living space with a calm and inviting atmosphere. The color palette is primarily neutral, featuring soft whites, creams, beiges, and light browns. 

Here are some key features:

*   **Furniture:** There's a light-colored sofa with several throw pillows, a round armchair with a round ottoman, and a round white coffee table with wooden legs. A tall floor lamp stands near the sofa.
*   **Decor:** The walls are adorned with two round, straw-framed mirrors. There's a tall white vase with dried pampas grass adding a touch of organic texture.
*   **Textiles:** A woven rug covers the floor, and a sheepskin rug is placed near the armchair. A cozy throw blanket is draped over the sofa.
*   **Lighting:** The room appears to have plenty of natural light, supplemented by the floor lamp.
*   **Overall Atmosphere:** The room feels comfortable, stylish, and well-curated. The mix of textures, natural elements, and neutral tones creates a sense of w
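The streaming loop prints each chunk as it arrives but does not keep the full text. If you also want the complete response (for example, to store or post-process it), a minimal sketch is to accumulate the chunks. The helper `collect_stream` is hypothetical, not part of the SDK; it assumes each chunk exposes a `.text` attribute that may be `None`. Stand-in chunks are used so the sketch runs without calling the API.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def collect_stream(chunks: Iterable, echo: bool = True) -> str:
    """Join the text of streamed response chunks into one string.

    Hypothetical helper: assumes each chunk has a `.text` attribute
    that may be None (e.g., a chunk that carries only metadata).
    """
    parts = []
    for chunk in chunks:
        text: Optional[str] = getattr(chunk, "text", None)
        if not text:
            continue  # skip chunks that carry no text
        if echo:
            print(text, end="")  # show progress as the text arrives
        parts.append(text)
    return "".join(parts)


# Stand-in chunks so the sketch runs without calling the API.
fake_chunks = [
    SimpleNamespace(text="The room is "),
    SimpleNamespace(text=None),
    SimpleNamespace(text="bright."),
]
full_text = collect_stream(fake_chunks, echo=False)
print(full_text)  # The room is bright.
```

With the real client you would pass a fresh stream, e.g. `collect_stream(client.models.generate_content_stream(model=MODEL_ID, contents=contents))`. A stream can only be iterated once, so the `responses` object from the cell above is already exhausted.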

### Generating open recommendations based on built-in knowledge

Using the same image, you can ask the model to recommend **a piece of furniture** that would fit in the room and to explain why.

Note that the model can choose **any furniture** to recommend in this case, and it does so using only its built-in knowledge.

In [7]:
prompt1 = "Recommend a new piece of furniture for this room:"
prompt2 = "and explain the reason in detail"
contents = [prompt1, room_image, prompt2]

responses = client.models.generate_content_stream(model=MODEL_ID, contents=contents)

print("\n-------Response--------")
for response in responses:
    print(response.text, end="")


-------Response--------
Okay, based on the image and the style of the room, I would recommend adding a **console table** behind the sofa.

Here's why:

*   **Functionality:** A console table provides a surface to place lamps, books, decorative objects, or even drinks, making the sofa area more functional and convenient. It adds a horizontal element to the room and breaks up the wall space.
*   **Visual Balance:** Placing it behind the sofa would anchor the sofa in the room and prevent it from feeling like it's floating in the space.
*   **Style Compatibility:** The room has a relaxed, natural, and somewhat minimalist aesthetic. A console table made of wood, rattan, or a light-colored painted finish would blend seamlessly with the existing furniture and decor. A simple design with clean lines would be best.

In the next cell, you will ask the model to describe the room and recommend **a type of chair** that would fit in it.

Note that the model can choose **any type of chair** to recommend in this case.

In [8]:
prompt1 = "Describe this room:"
prompt2 = "and recommend a type of chair that would fit in it"
contents = [prompt1, room_image, prompt2]

responses = client.models.generate_content_stream(model=MODEL_ID, contents=contents)

print("\n-------Response--------")
for response in responses:
    print(response.text, end="")


-------Response--------
The room is a light and airy living space with a modern and somewhat bohemian aesthetic. The walls are a neutral light color, likely off-white or beige. The flooring appears to be wooden parquet in some areas, and a large woven rug in a light beige or cream covers the main seating area.

The furniture consists of:

*   **A sofa:** A light gray or beige sofa with several pillows in coordinating neutral tones and textures. A cozy beige throw blanket is draped over the back.
*   **An armchair:** A rounded armchair upholstered in a light cream or off-white fabric. It's paired with a round, matching ottoman.
*   **A coffee table:** A round white coffee table with slender wooden legs.
*   **Lighting:** A floor lamp with a beige lampshade and a wood and white base. A basket is at the bottom of the lamp stand.
*   **Decor:** Two large, circular mirrors with woven frames are hung on the wall behind the sofa. A tall, white cylindrical vase holds a large arrangement of dr

### Generating recommendations based on provided images

Instead of keeping the recommendation open, you can also provide a list of items for the model to choose from. Here you will reference a few chair images by URL and set them as options for the Gemini model to recommend from. This is particularly useful for retail companies that want to recommend items to users based on the kind of room they have and the items the store actually offers.

In [9]:
# URLs of the sample chair images (the same chairs shown in the scenario above)
furniture_image_urls = [
    "https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/cesar-couto-OB2F6CsMva8-unsplash.jpg",
    "https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/daniil-silantev-1P6AnKDw6S8-unsplash.jpg",
    "https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/ruslan-bardash-4kTbAMRAHtQ-unsplash.jpg",
    "https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/scopic-ltd-NLlWwR4d3qU-unsplash.jpg",
]

# Wrap each image URL in a Part object so it can be passed to the model
furniture_images = [
    Part.from_uri(file_uri=url, mime_type="image/jpeg") for url in furniture_image_urls
]

# To recommend an item from a selection, label each item with a number within the prompt.
# That gives the model a way to reference each image when you pose the question.
# Labeling images within your prompt also helps reduce hallucinations and generally produces better results.
contents = [
    "Consider the following chairs:",
    "chair 1:",
    furniture_images[0],
    "chair 2:",
    furniture_images[1],
    "chair 3:",
    furniture_images[2],
    "chair 4:",
    furniture_images[3],
    "room:",
    room_image,
    "You are an interior designer. For each chair, explain whether it would be appropriate for the style of the room:",
]

responses = client.models.generate_content_stream(model=MODEL_ID, contents=contents)

print("\n-------Response--------")
for response in responses:
    print(response.text, end="")


-------Response--------
Okay, let's analyze each chair in the context of the room's style.

**Overall Room Style:**

The room has a modern bohemian aesthetic. It is characterized by:

*   **Neutral Color Palette:** Predominantly whites, creams, tans, and light browns.
*   **Natural Materials:**  Wood, rattan/wicker, textiles with natural textures.
*   **Relaxed and Comfortable Vibe:** The furniture placement and the use of pillows/throws create a cozy atmosphere.
*   **Modern Simplicity:** Clean lines in the furniture but softened with organic elements.
* **Round Accent Shapes:** Mirros, pillows, ottomans, vases.

**Chair Analysis:**

*   **Chair 1 (Round Wooden Stool):** This chair clashes with the room. The modern design of the stool does not align with the relaxed boho theme of the room. It is also too low for general use in the living room.

*   **Chair 2 (White Upholstered Armchair):** This chair is very appropriate for the room. The white color and the buttoned details of the ch
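The labeled `contents` list in the cell above follows a fixed pattern (a label followed by an image for each item, then the room and the instruction), so you could generate it for any number of items. The helper below is a hypothetical sketch of that pattern, not an SDK function. It accepts any objects as images, so string placeholders stand in for `Part` objects here.

```python
from typing import Any, List, Sequence


def build_labeled_contents(
    item_images: Sequence[Any],
    room_image: Any,
    instruction: str,
    item_noun: str = "chair",
) -> List[Any]:
    """Interleave a numbered text label before each item image, then add the room and the instruction."""
    contents: List[Any] = [f"Consider the following {item_noun}s:"]
    for number, image in enumerate(item_images, start=1):
        contents.append(f"{item_noun} {number}:")  # label the model can refer back to
        contents.append(image)
    contents.extend(["room:", room_image, instruction])
    return contents


# Placeholder strings instead of Part objects so the sketch runs offline.
demo = build_labeled_contents(["<img1>", "<img2>"], "<room>", "Which chair fits?")
print(demo)
```

In the notebook, `build_labeled_contents(furniture_images, room_image, "You are an interior designer. For each chair, explain whether it would be appropriate for the style of the room:")` would produce the same list as the cell above.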

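You can also ask the model to return its response as JSON, which makes it easier to plug the recommendations into a recommendation system. Setting `response_mime_type="application/json"` in `GenerateContentConfig` tells the model to produce JSON output: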

In [10]:
contents = [
    "Consider the following chairs:",
    "chair 1:",
    furniture_images[0],
    "chair 2:",
    furniture_images[1],
    "chair 3:",
    furniture_images[2],
    "chair 4:",
    furniture_images[3],
    "room:",
    room_image,
    "You are an interior designer. Return in JSON, for each chair, whether it would fit in the room, with an explanation:",
]

responses = client.models.generate_content_stream(
    model=MODEL_ID,
    contents=contents,
    config=GenerateContentConfig(response_mime_type="application/json"),
)

print("\n-------Response--------")
for response in responses:
    print(response.text, end="")


-------Response--------
[
  {
    "chair_id": "chair 1",
    "fits": false,
    "explanation": "Chair 1 is a stool style chair. The room already contains plenty of seating and is decorated in a classic/boho style which doesn't fit with the style of the stool."
  },
  {
    "chair_id": "chair 2",
    "fits": true,
    "explanation": "Chair 2 is a classic looking armchair. The room already features an armchair in a similar style, so this armchair would fit well."
  },
  {
    "chair_id": "chair 3",
    "fits": false,
    "explanation": "Chair 3 is a simple, modern stool. The room already contains plenty of seating, and its classic/boho style doesn't fit with the style of the stool."
  },
  {
    "chair_id": "chair 4",
    "fits": false,
    "explanation": "Chair 4 is a modern office chair. It does not match the room's classic/boho style, and its purpose (office seating) isn't relevant to a living room."
  }
]
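Because the output is JSON, you can parse it with the standard library and keep only the chairs the model marked as fitting. This is a sketch that assumes the keys seen in the sample output (`chair_id`, `fits`, `explanation`). With only `response_mime_type` set, the model is not guaranteed to use those keys on every run, so you may also want to pass a `response_schema` in `GenerateContentConfig`. The `sample` string is an abbreviated copy of the output above, and `ChairFit` and `parse_chair_fits` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class ChairFit:
    chair_id: str
    fits: bool
    explanation: str


def parse_chair_fits(raw_json: str) -> List[ChairFit]:
    """Parse the model's JSON array into ChairFit records.

    Assumes the keys seen in the sample output; raises KeyError if one is missing.
    """
    records = json.loads(raw_json)
    return [
        ChairFit(
            chair_id=str(r["chair_id"]),
            fits=bool(r["fits"]),
            explanation=str(r["explanation"]),
        )
        for r in records
    ]


# Abbreviated copy of the model output shown above.
sample = """[
  {"chair_id": "chair 1", "fits": false, "explanation": "Chair 1 is a stool style chair."},
  {"chair_id": "chair 2", "fits": true, "explanation": "Chair 2 is a classic looking armchair."}
]"""

recommended = [c.chair_id for c in parse_chair_fits(sample) if c.fits]
print(recommended)  # ['chair 2']
```

Because the cell above streams the response, join the chunk texts first (for example, `"".join(chunk.text or "" for chunk in stream)`) before calling `parse_chair_fits`.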

## Conclusion

This notebook showed how you can build a multimodal furniture recommendation system with Gemini. You can also use a similar approach for:

- recommending clothes based on an occasion or an image of the venue
- recommending wallpaper based on the room and settings
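The open-ended prompt used earlier ("Describe this room:", the image, "and recommend a type of chair that would fit in it") carries over to other categories by swapping the scene and the item type. A minimal sketch follows; `scene_recommendation_contents` is a hypothetical helper, not an SDK function, and a string placeholder stands in for the image `Part`.

```python
from typing import Any, List


def scene_recommendation_contents(scene_image: Any, scene_noun: str, item_type: str) -> List[Any]:
    """Build an open-ended recommendation prompt: text, then the scene image, then the request."""
    return [
        f"Describe this {scene_noun}:",
        scene_image,
        f"and recommend a type of {item_type} that would fit in it",
    ]


# Placeholder string instead of a Part so the sketch runs offline.
print(scene_recommendation_contents("<room image>", "room", "wallpaper"))
```

Passing `room_image`, `"room"`, and `"chair"` reproduces the contents of the chair-type cell earlier in the notebook.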

You may also want to explore building a RAG (retrieval-augmented generation) system: retrieve relevant images from your store inventory, then use Gemini to help users identify the best choice among the retrieved options and explain the rationale to them.
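The retrieval step of such a system can be sketched as a nearest-neighbor search over embeddings. This minimal sketch assumes you already have an embedding vector for the room image and for each inventory item (for example, from a multimodal embedding model); the three-dimensional vectors and item IDs below are made-up toy values, and `top_k_items` is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_items(query: Sequence[float], inventory: Dict[str, Sequence[float]], k: int = 4) -> List[str]:
    """Return the IDs of the k inventory items most similar to the query embedding."""
    ranked = sorted(
        inventory,
        key=lambda item_id: cosine_similarity(query, inventory[item_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy, hypothetical embeddings; real ones would come from an embedding model.
room_embedding = [0.9, 0.1, 0.0]
inventory_embeddings = {
    "armchair-cream": [0.8, 0.2, 0.0],
    "office-chair-black": [0.0, 0.1, 0.9],
    "rattan-stool": [0.6, 0.4, 0.1],
}
print(top_k_items(room_embedding, inventory_embeddings, k=2))  # ['armchair-cream', 'rattan-stool']
```

The images of the retrieved items could then be passed to Gemini with the same labeled prompt used in this notebook. A linear scan is fine for a handful of items; a large inventory would typically use a vector database or an approximate nearest-neighbor index instead.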