# Use Multimodal Models

This optional exercise is for those who have extra time and want to try out multimodal models. Multimodal models accept different kinds of input, such as text, audio, and images. In Generative AI Hub on SAP AI Core you can access `gpt-4o`, which is multimodal.

👉 Go back to [01-deploy-model](01-deploy-model.md) to deploy gpt-4o.

👉 Then assign the deployment ID to `DEPLOYMENT_ID_MULTIMODAL` in [variables.py](variables.py).
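
As a rough sketch, the assignment in `variables.py` could look like the line below. The value is a placeholder, not a real deployment ID, and the actual file may contain other variables that are not shown here.

```python
# variables.py (sketch): replace the placeholder with the ID of your own gpt-4o deployment.
DEPLOYMENT_ID_MULTIMODAL = "<your-deployment-id>"
```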

👉 Now run the code snippets below to get a description of the AI Foundation architecture diagram. Such descriptions can be used, for example, as alternative text for screen readers or other assistive technology.

```python
import os
import json

# Load the SAP AI Core credentials from the local config file.
with open('/home/user/projects/generative-ai-codejam/.aicore-config.json', 'r') as config_file:
    config_data = json.load(config_file)

# Expose the credentials as environment variables for the Generative AI Hub SDK.
os.environ["AICORE_AUTH_URL"] = config_data["url"] + "/oauth/token"
os.environ["AICORE_CLIENT_ID"] = config_data["clientid"]
os.environ["AICORE_CLIENT_SECRET"] = config_data["clientsecret"]
os.environ["AICORE_BASE_URL"] = config_data["serviceurls"]["AI_API_URL"]
os.environ["AICORE_RESOURCE_GROUP"] = "default"

import base64
import variables

from gen_ai_hub.proxy.langchain.openai import ChatOpenAI
```

```python
# Read the image and encode it as base64 so it can be sent inline as a data URL.
with open("documents/ai-foundation-architecture.png", "rb") as image_file:
    image_data = base64.b64encode(image_file.read()).decode('utf-8')

# A multimodal message combines a text part (the prompt) with an image part.
message = {"role": "user", "content": [
            {"type": "text", "text": "Describe the images as an alternative text"},
            {"type": "image_url", "image_url": {
                "url": f"data:image/png;base64,{image_data}"}
            }
        ]}

model = ChatOpenAI(deployment_id=variables.DEPLOYMENT_ID_MULTIMODAL)

response = model.invoke([message])
print(response.content)
```

The model returns a description similar to the following:

**Alternative Text Description:**

The image is a structured diagram titled "AI Foundation on SAP BTP." It is divided into several sections:

1. **Top Section:** Labeled "AI Foundation on Business Technology Platform," which includes various AI services such as Document Processing, Recommendation, and Machine Translation.

2. **Middle Section:** Features "Generative AI Management," with three components: Toolset, Trust & Control, and Access. Adjacent to it is "AI Workload Management," which includes Training and Inference.

3. **Lower Middle Section:** Labeled "Business Data & Context," containing elements like Vector Engine and Data Management.

4. **Bottom Section:** Titled "Foundation Models," with options for SAP built, Hosted, and Remote, along with Fine-tuned.

5. **Lifecycle Management:** A blue box at the bottom indicates Lifecycle Management.

The overall layout consists of colored boxes and labeled components arranged in a clear, hierarchical format.
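
The inline `data:image/png;base64,...` URL used in the snippet above also works for other image formats if the MIME type is adjusted. The following is a minimal sketch, assuming two hypothetical helpers, `image_to_data_url` and `build_image_message`. Neither is part of the Generative AI Hub SDK. They guess the MIME type from the file extension with Python's standard `mimetypes` module and assemble the same message structure as above.

```python
import base64
import mimetypes


def image_to_data_url(path: str) -> str:
    """Return the image file at `path` as a base64 data URL (hypothetical helper)."""
    # Guess the MIME type from the file extension, e.g. ".png" -> "image/png".
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    with open(path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def build_image_message(prompt: str, path: str) -> dict:
    """Build a user message with one text part and one image part (hypothetical helper)."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_to_data_url(path)}},
        ],
    }
```

The resulting dictionary can be passed to `model.invoke([...])` in place of the hand-built `message`.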


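👉 Multimodal models can also extract structured data from an image. Run the next snippet to pull the ingredients and instructions out of a banana bread recipe image.

```python
# Read the recipe image and encode it as base64.
with open("documents/bananabread.png", "rb") as image_file:
    image_data = base64.b64encode(image_file.read()).decode('utf-8')

# Ask the model to return the recipe content as JSON.
message = {"role": "user", "content": [
            {"type": "text", "text": "Extract the ingredients and instructions in two different json files"},
            {"type": "image_url", "image_url": {
                "url": f"data:image/png;base64,{image_data}"}
            }
        ]}

model = ChatOpenAI(deployment_id=variables.DEPLOYMENT_ID_MULTIMODAL)

response = model.invoke([message])
print(response.content)
```

The model returns output similar to the following: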

Sure! Here are the ingredients and instructions extracted into two separate JSON files.

**Ingredients JSON:**

```json
{
  "dry_ingredients": {
    "all_purpose_flour": "260 g",
    "sugar": "200 g",
    "baking_soda": "6 g",
    "salt": "3 g"
  },
  "wet_ingredients": {
    "banana": "225 g",
    "large_eggs": "2",
    "vegetable_oil": "100 g",
    "whole_milk": "55 g",
    "vanilla_extract": "5 g"
  },
  "toppings": {
    "chocolate_chips": "100 g",
    "walnuts": "100 g (Optional)"
  }
}
```

**Instructions JSON:**

```json
{
  "directions": [
    "Preheat oven to 180 °C",
    "Mash banana in a bowl",
    "Combine banana and other wet ingredients together in the same bowl",
    "Mix all dry ingredients together in a separate bowl",
    "Use a whisk to combine dry mixture into wet mixture until smooth",
    "Pour mixture into a greased/buttered loaf pan",
    "Place the pan into preheated oven on the middle rack and bake for about 60 minutes or until a toothpick comes out clean",
  