## Example: Multimodal retail recommendation using Gemini to recommend items based on images and image reasoning


Author: Rahul Raj Pandey <br>rahul@econz.net

### Import libraries and initialise


In [7]:
# Define project information
PROJECT_ID = "rahul-research-test"  # @param {type:"string"}
LOCATION = "us-central1"  # @param {type:"string"}

# Initialize Vertex AI
import vertexai
vertexai.init(project=PROJECT_ID, location=LOCATION)
from vertexai.generative_models import GenerativeModel, Image
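
The project ID and location above are hard-coded for this notebook. As a minimal sketch, the same values could instead be read from environment variables with the notebook's values as fallbacks. The helper `resolve_setting` and the variable names `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_REGION` are assumptions made for illustration, not part of the original notebook.

```python
import os


def resolve_setting(env_var: str, default: str) -> str:
    """Return the environment variable's value if it is set and non-blank, else the default."""
    value = os.environ.get(env_var, "").strip()
    return value or default


# Assumed variable names; adjust them to whatever your environment actually defines.
PROJECT_ID = resolve_setting("GOOGLE_CLOUD_PROJECT", "rahul-research-test")
LOCATION = resolve_setting("GOOGLE_CLOUD_REGION", "us-central1")
```

The resolved `PROJECT_ID` and `LOCATION` can then be passed to `vertexai.init` exactly as in the cell above.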

### Load Gemini 1.5 Pro model

In [8]:
multimodal_model = GenerativeModel("gemini-1.5-pro")

### Define helper functions

In [9]:
import http.client
import io
import typing
import urllib.request

import IPython.display
from PIL import Image as PIL_Image
from PIL import ImageOps as PIL_ImageOps


def display_image(image: Image, max_width: int = 600, max_height: int = 350) -> None:
    pil_image = typing.cast(PIL_Image.Image, image._pil_image)
    if pil_image.mode != "RGB":
        # Modes such as RGBA are not yet supported by all Jupyter environments
        pil_image = pil_image.convert("RGB")
    image_width, image_height = pil_image.size
    if max_width < image_width or max_height < image_height:
        # Resize to display a smaller notebook image
        pil_image = PIL_ImageOps.contain(pil_image, (max_width, max_height))
    display_image_compressed(pil_image)


def display_image_compressed(pil_image: PIL_Image.Image) -> None:
    image_io = io.BytesIO()
    pil_image.save(image_io, "jpeg", quality=80, optimize=True)
    image_bytes = image_io.getvalue()
    ipython_image = IPython.display.Image(image_bytes)
    IPython.display.display(ipython_image)


def get_image_bytes_from_url(image_url: str) -> bytes:
    with urllib.request.urlopen(image_url) as response:
        response = typing.cast(http.client.HTTPResponse, response)
        print(response.headers["Content-Type"])
        # if response.headers["Content-Type"] not in ("image/png", "image/jpeg","image/jpg"):
        #     raise Exception("Image can only be in PNG or JPEG format")
        image_bytes = response.read()
    return image_bytes


def load_image_from_url(image_url: str) -> Image:
    image_bytes = get_image_bytes_from_url(image_url)
    return Image.from_bytes(image_bytes)


def print_multimodal_prompt(contents: list):
    """
    Given contents that would be sent to Gemini,
    output the full multimodal prompt for ease of readability.
    """
    for content in contents:
        if isinstance(content, Image):
            display_image(content)
        else:
            print(content)
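
`get_image_bytes_from_url` prints the `Content-Type` header and leaves its format check commented out. A rough sketch of how such a check could be made tolerant of header parameters and of the non-standard `image/jpg` value is shown here. The names `SUPPORTED_IMAGE_TYPES`, `CONTENT_TYPE_ALIASES`, `normalize_content_type` and `is_supported_image` are hypothetical helpers, not part of the notebook or of any library.

```python
from typing import Optional

# Formats the commented-out check intended to accept
SUPPORTED_IMAGE_TYPES = {"image/png", "image/jpeg"}
# "image/jpg" is not a registered media type, but some servers send it anyway
CONTENT_TYPE_ALIASES = {"image/jpg": "image/jpeg"}


def normalize_content_type(header_value: str) -> str:
    """Drop parameters such as '; charset=...' and lowercase the media type."""
    media_type = header_value.split(";", 1)[0].strip().lower()
    return CONTENT_TYPE_ALIASES.get(media_type, media_type)


def is_supported_image(header_value: Optional[str]) -> bool:
    """Return True when the header names a PNG or JPEG image."""
    if not header_value:
        return False
    return normalize_content_type(header_value) in SUPPORTED_IMAGE_TYPES
```

A caller could test `is_supported_image(response.headers["Content-Type"])` before reading the body and raise an error when it returns `False`.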

### Generating recommendations based on provided images

Instead of leaving the recommendation open-ended, you can give the model a list of items to choose from. Here you will download a few chair images and offer them to Gemini as the options to recommend from.

### Chairs for Room

##### Scenario

The customer shows you their living room:

|Customer photo |
|:-----:|
|<img src="https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/rooms/spacejoy-c0JoR_-2x3E-unsplash.jpg" width="80%">  |



Below are four chair options that the customer is trying to decide between:

|Chair 1| Chair 2 | Chair 3 | Chair 4 |
|:-----:|:----:|:-----:|:----:|
| <img src="https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/cesar-couto-OB2F6CsMva8-unsplash.jpg" width="80%">|<img src="https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/daniil-silantev-1P6AnKDw6S8-unsplash.jpg" width="80%">|<img src="https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/ruslan-bardash-4kTbAMRAHtQ-unsplash.jpg" width="80%">|<img src="https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/scopic-ltd-NLlWwR4d3qU-unsplash.jpg" width="80%">|


#### code

In [23]:
# URL of the room image
room_image_url = "https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/rooms/spacejoy-c0JoR_-2x3E-unsplash.jpg"

# Load the room image as an Image object
room_image = load_image_from_url(room_image_url)

# URLs of the sample chair images
furniture_image_urls = [
    "https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/cesar-couto-OB2F6CsMva8-unsplash.jpg",
    "https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/daniil-silantev-1P6AnKDw6S8-unsplash.jpg",
    "https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/ruslan-bardash-4kTbAMRAHtQ-unsplash.jpg",
    "https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/scopic-ltd-NLlWwR4d3qU-unsplash.jpg",
]

# Load furniture images as Image Objects
furniture_images = [load_image_from_url(url) for url in furniture_image_urls]

# To recommend an item from a selection, label each item with its number within the prompt.
# Labelling images within your prompt also helps to reduce hallucinations and produces better results overall.
contents = [
    "Consider the following chairs:",
    "chair 1:",
    furniture_images[0],
    "chair 2:",
    furniture_images[1],
    "chair 3:",
    furniture_images[2],
    "chair 4:",
    furniture_images[3],
    "room:",
    room_image,
    "You are an interior designer. For each chair, explain whether it would be appropriate for the style of the room:",
]

image/jpeg
image/jpeg
image/jpeg
image/jpeg
image/jpeg
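
The labelled prompt above follows a fixed pattern: an introduction, a numbered label before each option image, a labelled context image, and a final instruction. As a sketch, that pattern could be factored into a small helper. `build_labelled_contents` is a hypothetical function introduced here for illustration; it works on any objects, so plain strings stand in for the `Image` objects in the example.

```python
from typing import Any, List, Sequence


def build_labelled_contents(
    intro: str,
    item_label: str,
    items: Sequence[Any],
    context_label: str,
    context_item: Any,
    instruction: str,
) -> List[Any]:
    """Interleave numbered text labels with the items they describe."""
    contents: List[Any] = [intro]
    for number, item in enumerate(items, start=1):
        contents.append(f"{item_label} {number}:")
        contents.append(item)
    # The context image (for example the room) and the instruction come last
    contents.extend([f"{context_label}:", context_item, instruction])
    return contents


# Placeholder strings stand in for the loaded Image objects
example_contents = build_labelled_contents(
    intro="Consider the following chairs:",
    item_label="chair",
    items=["<chair image A>", "<chair image B>"],
    context_label="room",
    context_item="<room image>",
    instruction="You are an interior designer. For each chair, explain whether it would be appropriate for the style of the room:",
)
```

With the real images, the call would pass `furniture_images` as `items` and `room_image` as `context_item`.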


#### output

In [19]:
from IPython.display import display_markdown

responses = multimodal_model.generate_content(contents)
display_markdown(responses.text, raw=True)


-------Response--------


Here's a breakdown of each switchboard's suitability for the room, along with a table summarizing the assessments:

**Switchboard 1**

* **Description:** Rectangular wooden frame with grey switches and outlets.
* **Appropriateness:**  This switchboard has a modern, slightly rustic feel due to the wood grain. It could work well in the room if you want to complement the wooden wall paneling, but the grey might clash with the overall warm tones. 
* **Rating:** 6/10 (Potentially suitable, but consider the color contrast)

**Switchboard 2**

* **Description:**  Clean, minimalist white rectangular switchboard with a variety of outlets and a dimmer switch.
* **Appropriateness:**  This is a very versatile style that could blend well into the background. It wouldn't be a standout feature, but it wouldn't clash with the design either.
* **Rating:** 7/10 (Safe and neutral choice)

**Switchboard 3**

* **Description:** Two square options – one metallic gold, one wood-look with gold accents.
* **Appropriateness:**  The gold is too opulent and doesn't match the room's refined, natural aesthetic. The wood option might work if it matched the wall paneling exactly, but it's risky. 
* **Rating:** 4/10 (Not recommended, clashes with the style)

**Switchboard 4**

* **Description:**  Curved, dark wood-look switchboard with black switches. 
* **Appropriateness:** This switchboard has a sleek, contemporary feel that complements the room's modern lines. The dark wood would harmonize with the wall paneling and add a touch of sophistication.
* **Rating:** 9/10 (Excellent match for the style)


**Assessment Table:**

| Switchboard | Style | Appropriateness | Rating |
|---|---|---|---|
| 1 | Modern Rustic | Could work, but consider color contrast | 6/10 |
| 2 | Minimalist | Safe and neutral | 7/10 |
| 3 | Opulent/Metallic  | Not recommended, clashes with style | 4/10 |
| 4 | Contemporary Wood | Excellent match for the style | 9/10 | 


### Switchboards on Wall


##### Scenario

The customer shows you their bedroom:

|Customer photo |
|:-----:|
|<img src="https://media.istockphoto.com/id/1390233984/photo/modern-luxury-bedroom.jpg?s=612x612&w=0&k=20&c=po91poqYoQTbHUpO1LD1HcxCFZVpRG-loAMWZT7YRe4=" width="80%">  |



Below are four switchboard options that the customer is trying to decide between:

|Option 1| Option 2 | Option 3 | Option 4 |
|:-----:|:----:|:-----:|:----:|
| <img src="https://img.staticmb.com/mbcontent/images/crop/uploads/2023/3/Wooden-modular-switchboard-designs-with-switches-and-sockets_0_1200.jpg" width="40%">|<img src="https://t3.ftcdn.net/jpg/06/25/01/76/240_F_625017618_XEaENhgwNaQibpH5yzlqFvVNuyq1jBNE.jpg" width="80%">|<img src="https://5.imimg.com/data5/SG/TT/MY-44167071/electrical-switche-500x500.jpg" width="50%">|<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQU5Z_HeKkI078zmFYQATbDQBQW5BjLZvZ8SA&s" width="100%">|


#### code

In [22]:
from IPython.display import display_markdown
# URL of the room image
room_image_url = "https://media.istockphoto.com/id/1390233984/photo/modern-luxury-bedroom.jpg?s=612x612&w=0&k=20&c=po91poqYoQTbHUpO1LD1HcxCFZVpRG-loAMWZT7YRe4="

# Load the room image as an Image object
room_image = load_image_from_url(room_image_url)

# URLs of the sample switchboard images
furniture_image_urls = [
    "https://img.staticmb.com/mbcontent/images/crop/uploads/2023/3/Wooden-modular-switchboard-designs-with-switches-and-sockets_0_1200.jpg",
    "https://t3.ftcdn.net/jpg/06/25/01/76/240_F_625017618_XEaENhgwNaQibpH5yzlqFvVNuyq1jBNE.jpg",
    "https://5.imimg.com/data5/SG/TT/MY-44167071/electrical-switche-500x500.jpg",
    "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQU5Z_HeKkI078zmFYQATbDQBQW5BjLZvZ8SA&s",
]

# Load switchboard images as Image objects
furniture_images = [load_image_from_url(url) for url in furniture_image_urls]

# To recommend an item from a selection, label each item with its number within the prompt.
# Labelling images within your prompt also helps to reduce hallucinations and produces better results overall.
contents = [
    "Consider the following switchboards:",
    "switchboard 1:",
    furniture_images[0],
    "switchboard 2:",
    furniture_images[1],
    "switchboard 3:",
    furniture_images[2],
    "switchboard 4:",
    furniture_images[3],
    "room:",
    room_image,
    "You are an interior designer. For each switchboard, briefly explain whether it would be appropriate for the style of the room and give it a rating. Also give the assessment in tabular form:",
]
# print("-------Prompt--------")
# print_multimodal_prompt(contents)

image/jpeg
image/jpg
image/jpeg
image/jpeg
image/jpeg


#### output

In [21]:
responses = multimodal_model.generate_content(contents)
display_markdown(responses.text, raw=True)

Here's an assessment of each switchboard design in relation to the bedroom's style:

**Overall Room Style:** Modern, minimalist with warm, natural elements.

**Switchboard Assessments:**

| Switchboard | Appropriate? | Rating (out of 5) | Explanation |
|---|---|---|---|
| 1 | Somewhat | 3/5 | The wood tone is harmonious with the room's natural elements, but the design feels slightly dated and lacks the sleekness of the overall aesthetic. |
| 2 | Not Ideal | 2/5 | Too generic and utilitarian. The stark white clashes with the warm tones and doesn't complement the modern design elements. |
| 3 | Not Suitable | 1/5 |  The red and chrome combination is far too bold and clashes heavily with the bedroom's serene color palette and natural materials. |
| 4 | **Excellent Choice** | 4.5/5 | **The dark wood and sleek black switches align perfectly with the room's modern aesthetic and warm color scheme. The slightly curved design adds a touch of visual interest without being overwhelming.** |

**In Conclusion:** Switchboard #4 is the most fitting choice for this bedroom. It seamlessly blends functionality with style, complementing the existing design elements without feeling out of place. 
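
Because the prompt asks for a table, the response can also be post-processed in code. The following is only a sketch: `parse_markdown_table` and `rating_fraction` are hypothetical helpers, the sample text is a shortened copy of the table above, and the parser assumes simple pipe-delimited rows with no escaped `|` characters inside cells. Model output is not guaranteed to follow this layout on every run.

```python
from typing import Dict, List


def parse_markdown_table(text: str) -> List[Dict[str, str]]:
    """Parse the first pipe-delimited Markdown table found in text into row dicts."""
    header = None
    rows: List[Dict[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
            continue
        # Skip the |---|---| separator row
        if all(cell and set(cell) <= set("-: ") for cell in cells):
            continue
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


def rating_fraction(rating: str) -> float:
    """Convert a rating such as '4.5/5' into a fraction between 0 and 1."""
    score, _, scale = rating.replace("*", "").partition("/")
    return float(score) / float(scale)


sample_response = """
| Switchboard | Appropriate? | Rating (out of 5) | Explanation |
|---|---|---|---|
| 1 | Somewhat | 3/5 | Harmonious wood tone, slightly dated |
| 2 | Not Ideal | 2/5 | Too generic and utilitarian |
| 3 | Not Suitable | 1/5 | Too bold for the palette |
| 4 | **Excellent Choice** | 4.5/5 | Dark wood and sleek black switches |
"""

rows = parse_markdown_table(sample_response)
best = max(rows, key=lambda row: rating_fraction(row["Rating (out of 5)"]))
print(best["Switchboard"])  # prints 4
```

Normalising each rating to a fraction keeps the comparison valid when the model switches scales between runs, for example from `/5` to `/10`.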


### Glasses/Lenses Recommendation 

##### Scenario

The customer uploads their face image:

|Customer photo |
|:-----:|
|<img src="https://raw.githubusercontent.com/Rahulraj31/Multimodel-Gemini-Recommendation-System-/refs/heads/main/sample-user-image.jpeg" width="20%">  |



Below are four glasses options that the customer is trying to decide between:

|Option 1| Option 2 | Option 3 | Option 4 |
|:-----:|:----:|:-----:|:----:|
| <img src="https://static5.lenskart.com/media/catalog/product/pro/1/thumbnail/628x301/9df78eab33525d08d6e5fb8d27136e95//j/i/john-jacobs-jj-e16116-c2-eyeglasses_g_8777_19_10_2023.jpg" width="80%">|<img src="https://static5.lenskart.com/media/catalog/product/pro/1/thumbnail/628x301/9df78eab33525d08d6e5fb8d27136e95//v/i/Purple-Silver-Purple-Transparent-Full-Rim-Round-Vincent-Chase-SLEEK-STEEL-VC-E13784-C1-Eyeglasses_vincent-chase-vc-e13784-c1-eyeglasses_g_301709_02_2022.jpg" width="80%">|<img src="https://static5.lenskart.com/media/catalog/product/pro/1/thumbnail/628x301/9df78eab33525d08d6e5fb8d27136e95//j/o/john-jacobs-jj-e13346-c2-eyeglasses_g_5793.jpg" width="80%">|<img src="https://static5.lenskart.com/media/catalog/product/pro/1/thumbnail/628x301/9df78eab33525d08d6e5fb8d27136e95//v/i/brown-transparent-silver-full-rim-cat-eye-vincent-chase-blend-edit-vc-e14973-c2-eyeglasses_g_3516_10_14_22.jpg" width="80%">|


#### code


In [27]:
from IPython.display import display_markdown
# URL of the customer's face image
user_face_image_url = "https://raw.githubusercontent.com/Rahulraj31/Multimodel-Gemini-Recommendation-System-/refs/heads/main/sample-user-image.jpeg"

# Load the face image as an Image object
user_image = load_image_from_url(user_face_image_url)

# URLs of the sample glasses images
spec_image_urls = [
    "https://static5.lenskart.com/media/catalog/product/pro/1/thumbnail/628x301/9df78eab33525d08d6e5fb8d27136e95//j/i/john-jacobs-jj-e16116-c2-eyeglasses_g_8777_19_10_2023.jpg",
    "https://static5.lenskart.com/media/catalog/product/pro/1/thumbnail/628x301/9df78eab33525d08d6e5fb8d27136e95//v/i/Purple-Silver-Purple-Transparent-Full-Rim-Round-Vincent-Chase-SLEEK-STEEL-VC-E13784-C1-Eyeglasses_vincent-chase-vc-e13784-c1-eyeglasses_g_301709_02_2022.jpg",
    "https://static5.lenskart.com/media/catalog/product/pro/1/thumbnail/628x301/9df78eab33525d08d6e5fb8d27136e95//j/o/john-jacobs-jj-e13346-c2-eyeglasses_g_5793.jpg",
    "https://static5.lenskart.com/media/catalog/product/pro/1/thumbnail/628x301/9df78eab33525d08d6e5fb8d27136e95//v/i/brown-transparent-silver-full-rim-cat-eye-vincent-chase-blend-edit-vc-e14973-c2-eyeglasses_g_3516_10_14_22.jpg",
]

# Load glasses images as Image objects
spec_images = [load_image_from_url(url) for url in spec_image_urls]

# To recommend an item from a selection, label each item with its number within the prompt.
# Labelling images within your prompt also helps to reduce hallucinations and produces better results overall.
contents = [
    "Consider the following glasses:",
    "spec type 1:",
    spec_images[0],
    "spec type 2:",
    spec_images[1],
    "spec type 3:",
    spec_images[2],
    "spec type 4:",
    spec_images[3],
    "user face pic:",
    user_image,
    "You are a glasses/spectacles fashion advisor. Your task is to analyse the user's face structure and advise them on the best options among the given glasses. For each spec type, briefly explain whether it would be appropriate for the user to wear, with a rating based on the look of the glasses and how they would suit the user's face. Also give the assessment in tabular form:",
]
# print("-------Prompt--------")
# print_multimodal_prompt(contents)

image/jpeg
image/jpeg
image/jpeg
image/jpeg
image/jpeg


#### output

In [26]:
responses = multimodal_model.generate_content(contents)
display_markdown(responses.text, raw=True)

Based on the photo provided, this gentleman has a round face shape with softer features. Here's a breakdown of each spec type and how well they'd suit him:

**Spec Type 1: (Black Square Frames)**

* **Appropriate?**  Yes. Square and rectangular frames can provide a flattering contrast to round faces, adding definition and making the face appear slimmer. 
* **Rating:** 8/10
* **Explanation:** The black color adds a classic and bold touch, complementing a variety of styles. These frames are a safe and stylish bet.

**Spec Type 2: (Thin, Round, Purple Frames)**

* **Appropriate?** Less ideal. Round frames on a round face can overemphasize the roundness. 
* **Rating:** 5/10 
* **Explanation:** While the thinness of the frames helps, the overall shape might not be the most flattering. He could pull these off if he prefers a softer, more whimsical style, but it's not the best choice for accentuating his features.

**Spec Type 3: (Geometric, Angular Frames)**

* **Appropriate?** Good. Angular and geometric frames add structure and dimension to a round face.
* **Rating:** 7/10 
* **Explanation:** These frames make a statement and are very on-trend. The sharp angles provide a nice counterpoint to his softer features.

**Spec Type 4: (Cat-Eye Frames)**

* **Appropriate?**  Less ideal. While cat-eye frames can be very stylish, they tend to complement oval and square faces better. On a round face, they might not have the desired lifting effect.
* **Rating:** 6/10
* **Explanation:** These could work if he prefers a more retro-inspired look, but the frames might not harmonize perfectly with his natural features. 

**Assessment Table:**

| Spec Type | Appropriate? | Rating | Explanation |
|---|---|---|---|
| Black Square Frames | Yes | 8/10 | Classic, bold, slimming effect |
| Thin, Round, Purple Frames | Less Ideal | 5/10 | May overemphasize roundness |
| Geometric, Angular Frames | Good | 7/10 | Adds structure and dimension |
| Cat-Eye Frames | Less Ideal | 6/10 | May not provide the best lifting effect | 

**Important Note:** The best way to find the perfect glasses is to try them on! This analysis provides general guidance, but personal style and preference also play a huge role. 

