[*prompt-llm*](https://github.com/SAP-samples/generative-ai-codejam/blob/main/exercises/03-prompt-llm.ipynb)


### Check your connection to Generative AI Hub

☝️ In the init_env.py file, the values from generative-ai-codejam/.aicore-config.json are assigned to environment variables. That way the Generative AI Hub Python SDK can connect to Generative AI Hub.

👉 For the Python SDK to know which resource group to use, you also need to set the resource group in the variables.py file to your own resource group (e.g. team-01) that you created in the SAP AI Launchpad in exercise 00-connect-AICore-and-AILaunchpad.
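
As a rough illustration of what such a mapping might look like, here is a minimal sketch. It assumes the config file has the shape of an SAP AI Core service key (`clientid`, `clientsecret`, `url`, `serviceurls.AI_API_URL`) and that the SDK reads `AICORE_*` environment variables. The function name `apply_aicore_config` and all values are hypothetical, and the real `init_env.py` may do this differently.

```python
import os


def apply_aicore_config(config: dict, resource_group: str, env=os.environ) -> dict:
    """Copy service-key style values into AICORE_* variables (illustrative only)."""
    mapping = {
        # Assumption: the token endpoint is the auth URL plus /oauth/token.
        "AICORE_AUTH_URL": config["url"].rstrip("/") + "/oauth/token",
        "AICORE_CLIENT_ID": config["clientid"],
        "AICORE_CLIENT_SECRET": config["clientsecret"],
        # Assumption: the API base URL is the AI_API_URL plus /v2.
        "AICORE_BASE_URL": config["serviceurls"]["AI_API_URL"].rstrip("/") + "/v2",
        "AICORE_RESOURCE_GROUP": resource_group,
    }
    env.update(mapping)
    return mapping


# Placeholder values, not real credentials.
example_config = {
    "clientid": "my-client-id",
    "clientsecret": "my-client-secret",
    "url": "https://example.authentication.test",
    "serviceurls": {"AI_API_URL": "https://api.ai.example.test"},
}

# Write into a plain dict here so the real environment stays untouched.
example_env = {}
apply_aicore_config(example_config, "team-01", env=example_env)
print(sorted(example_env))
```

Passing a plain dictionary as `env` keeps the example free of side effects; leaving the argument out would write to `os.environ` instead.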


In [4]:
from config import init_env
from config import variables
import importlib
variables = importlib.reload(variables)

# TODO: Specify which model you want to use. In this case we are sending our prompt
# to the OpenAI API directly, so you need to pick one of the GPT models. Make sure the model is actually deployed
# in Generative AI Hub. You might also want to choose a model that can process images here already,
# e.g. 'gpt-4.1-mini'.
MODEL_NAME = 'gpt-4o'

# Do not modify the `assert` line below
assert MODEL_NAME!='', """You should change the variable `MODEL_NAME` with the name of your deployed model (like 'gpt-4o-mini') first!"""

init_env.set_environment_variables()
# Do not modify the `assert` line below 
assert variables.RESOURCE_GROUP!='', """You should change the value assigned to the `RESOURCE_GROUP` in the `variables.py` file to your own resource group first!"""

print(f"Resource group is set to: {variables.RESOURCE_GROUP}")

Resource group is set to: default



### Prompt an LLM in the Generative AI Hub...

...using OpenAI native client integration: https://help.sap.com/doc/generative-ai-hub-sdk/CLOUD/en-US/_reference/gen_ai_hub.html#openai

To understand the API and structures, check OpenAI documentation: https://platform.openai.com/docs/guides/text?api-mode=chat


In [3]:
from gen_ai_hub.proxy.native.openai import chat
from IPython.display import Markdown

messages = [
    {
        "role": "user", 
        "content": "Who won the latest Nobel prize of chemistry and what is their achievement?"
    }
]

kwargs = dict(model_name=MODEL_NAME, messages=messages)

response = chat.completions.create(**kwargs)
display(Markdown(response.choices[0].message.content))

The 2023 Nobel Prize in Chemistry was awarded to Moungi G. Bawendi, Louis E. Brus, and Alexei I. Ekimov for their discovery and synthesis of quantum dots. Quantum dots are tiny semiconductor particles or nanocrystals with unique optical and electronic properties due to their size, which is at the nanometer scale. These properties have significant applications in fields such as medical imaging and display technologies.
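
The reply object follows the OpenAI chat completion structure: a list of `choices`, each with a `message`, plus optional `usage` token counts. The helper below is a small sketch for pulling out the parts you usually need. `summarize_response` is a hypothetical name, and the stand-in object built with `SimpleNamespace` only mimics the attributes used here so that the example runs without a deployment.

```python
from types import SimpleNamespace


def summarize_response(response) -> dict:
    """Return the first choice's text plus token counts, if present."""
    choice = response.choices[0]
    usage = getattr(response, "usage", None)
    return {
        "text": choice.message.content,
        "finish_reason": getattr(choice, "finish_reason", None),
        "prompt_tokens": getattr(usage, "prompt_tokens", None),
        "completion_tokens": getattr(usage, "completion_tokens", None),
    }


# Stand-in for a real reply (assumption: same attribute names as the OpenAI client).
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(role="assistant", content="Hello!"),
            finish_reason="stop",
        )
    ],
    usage=SimpleNamespace(prompt_tokens=12, completion_tokens=3),
)

summary = summarize_response(fake_response)
print(summary)
```

With a real reply, you would call `summarize_response(response)` right after `chat.completions.create(**kwargs)`.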


### Understanding roles

Most LLMs have the roles system, assistant (GPT) or model (Gemini), and user, which can be used to steer the model's response. In the previous step you only used the role user to ask your question.

👉 Try out different system messages to change the response. You can also tell the model not to engage in small talk or to only answer questions on a certain topic. Then try different user prompts as well!

Please note that in the OpenAI API, with o1 models and newer, developer messages replace the previous system messages.
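
To make the roles concrete, the sketch below builds a message list that mixes all three roles, including an earlier assistant turn so that the model sees the conversation history. `build_messages` is a hypothetical helper, not part of the SDK; it only assembles the kind of list that is passed as `messages` in the cells of this exercise.

```python
def build_messages(user_prompt, system_prompt=None, history=None):
    """Assemble a chat message list: optional system message, prior turns, new user turn."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    # history is a list of (user_text, assistant_text) pairs from earlier turns
    for user_text, assistant_text in history or []:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": user_prompt})
    return messages


example_messages = build_messages(
    "Explain attention in one sentence.",
    system_prompt="Answer briefly and do not engage in small talk.",
    history=[
        (
            "What is the underlying model architecture of an LLM?",
            "The transformer architecture.",
        )
    ],
)
for message in example_messages:
    print(message["role"], "->", message["content"])
```

The resulting list can be used in place of `messages` in `dict(model_name=MODEL_NAME, messages=...)`.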


In [12]:
messages = [
    {   "role": "system", 
        # TODO try changing the system prompt
        "content": "Speak like Yoda from Star Wars."
    }, 
    {
        "role": "user", 
        "content": "What is the underlying model architecture of an LLM? Explain it as short as possible."
    }
]

kwargs = dict(model_name=MODEL_NAME, messages=messages)
response = chat.completions.create(**kwargs)
display(Markdown(response.choices[0].message.content))

Transformer architecture, it is. Layers of attention mechanisms, and feedforward networks, many have. Generate language, they do.

👉 Also try to have it speak like a pirate.

In [5]:
messages = [
    {   "role": "system", 
        # TODO try changing the system prompt
        "content": "Speak like a pirate."
    }, 
    {
        "role": "user", 
        "content": "What is the underlying model architecture of an LLM? Explain it as short as possible."
    }
]

kwargs = dict(model_name=MODEL_NAME, messages=messages)
response = chat.completions.create(**kwargs)
display(Markdown(response.choices[0].message.content))

Arrr, the heart of an LLM be the transformer architecture, matey! It uses attention mechanisms to capture contextual relationships in the input text, allowing it to generate coherent and context-sensitive responses. Yo ho ho!

👉 Also try to have it speak like other celebrities whose way of speaking you are familiar with.

In [6]:
messages = [
    {   "role": "system", 
        # TODO try changing the system prompt
        "content": "Speak like Donald Trump."
    }, 
    {
        "role": "user", 
        "content": "What is the underlying model architecture of an LLM? Explain it as short as possible."
    }
]

kwargs = dict(model_name=MODEL_NAME, messages=messages)
response = chat.completions.create(**kwargs)
display(Markdown(response.choices[0].message.content))

Listen, folks, let me tell you - large language models, they're incredible. The architecture, it's known as a transformer. Highly powerful, revolutionary, uses attention mechanisms to understand context. Great technology, amazing performance, many say the best. That's what they are, folks!

In [35]:
messages = [
    {   "role": "system", 
        # TODO try changing the system prompt
        "content": "Speak like 李白 in Chinese."
    }, 
    {
        "role": "user", 
        "content": "What is the underlying model architecture of an LLM? Explain it as short as possible."
    }
]

kwargs = dict(model_name=MODEL_NAME, messages=messages)
response = chat.completions.create(**kwargs)
display(Markdown(response.choices[0].message.content))

吾闻大模型之基，乃是深层神经网络。其要巧在于转化器，藏有多头自注意机制，层叠而成，隐含千万参数，故能通晓万象：义理洞明，文采斐然，乃世间智慧之显现也。

### Use Multimodal Models

Multimodal models can use different inputs such as text, audio and images. In Generative AI Hub on SAP AI Core you can access multiple multimodal models (e.g. gpt-4o-mini).

👉 If you have not deployed one of the gpt-4o-mini models in previous exercises, then go back to the model library and deploy a model that can also process images.

👉 Now run the code snippet below to get a description of the image of the AI Foundation Architecture. Such descriptions can then be used, for example, as alternative text for screen readers or other assistive technology.

👉 You can upload your own image and play around with it!
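
If your own image is not a PNG, the MIME type in the data URL should match the file. The sketch below is one way to handle that, using the standard library's `mimetypes` module to guess the type from the file extension. `image_to_data_url` and `image_message` are hypothetical helper names, not SDK functions.

```python
import base64
import mimetypes


def image_to_data_url(path: str) -> str:
    """Base64-encode an image file and wrap it in a data URL with a matching MIME type."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognised image file: {path}")
    with open(path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def image_message(prompt: str, path: str) -> dict:
    """Build a user message with one text part and one image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_to_data_url(path)}},
        ],
    }
```

For example, `messages = [image_message("Describe the image as alternative text.", "images/ai-foundation-architecture.png")]` would produce the same structure as the cell below, with the MIME type derived from the file name.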

In [39]:
import base64

# read the image from the images folder and base64-encode it
with open("images/ai-foundation-architecture.png", "rb") as image_file:
    image_data = base64.b64encode(image_file.read()).decode('utf-8')

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Describe the images as an alternative text."
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/png;base64,{image_data}"
                }
            }
        ]
    }
]

kwargs = dict(model_name=MODEL_NAME, messages=messages)
response = chat.completions.create(**kwargs)

display(Markdown(response.choices[0].message.content))

The image is a diagram labeled "Single-tenant application (Retrieval Augmented Generation & Generative AI on SAP BTP)." It illustrates a reference architecture within the SAP Business Technology Platform (BTP), showcasing how various components interact to leverage Generative AI.

On the left, there's an icon representing a user connected via "Application Clients" like mobile or desktop. These clients connect to an "App Router" and "HTML5 App Repository," which is part of a "Subaccount" in a Multi-Cloud environment.

Within this subaccount, a "User Interface" section includes "SAPUI5" and "UI5 Web Components." It also has "SAP Continuous Integration and Delivery" and "SAP Business Application Studio."

The diagram shows a pathway from the App Router to the "SAP Authorization and Trust Management service," indicating trust relationships. Linked to this is the "SAP Cloud Application Programming Model (CAP)" section, which lists "Application Service" details like "Use Case Logic," "Data Management," "LLM Plugins & SDKs," and "SAP Cloud SDK for AI."

A "Destination" line goes from the App Router to SAP's "Generative AI Hub." This hub comprises "SAP AI Launchpad" and "SAP AI Core," including elements like "Trust & Control," "Prompt Registry," and an "Orchestration" section with "Grounding," "Templating," "Data Masking," and "I/O Filtering."

"Foundation Model Access" outlines "Partner built" and "SAP built" models, with an additional "Foundation Models" section for "SAP hosted" and "Partner hosted." These connect to other areas via HTTPS.

On the far right, there's a "Network" section connected by HTTPS, illustrating connections to "SAP On-Premise Solutions," "3rd Party Applications," and "SAP Cloud Solutions" alongside "SAP Connectivity Service" and "SAP Destination Service."

At the bottom, a legend defines symbols like "Access," "Mutual Trust," "planned elements," and "SAP BTP Service," with a note clarifying the diagram as an L2 level depiction. This diagram details how SAP's solutions utilize AI and integration across cloud and on-premise environments.


### Extracting text from images

Nora loves banana bread and thinks recipes are a good example of how LLMs can also extract complex text from images, such as a picture of a banana bread recipe. Try your own recipe if you like :)

This exercise also shows how you can use the output of an LLM in other systems, as you can tell the LLM how to output information, for example in JSON format.


In [38]:
import base64
# read the image from the images folder and base64-encode it
with open("images/bananabread.png", "rb") as image_file:
    image_data = base64.b64encode(image_file.read()).decode('utf-8')

messages = [{"role": "user", "content": [
            {"type": "text", "text": "Extract the ingredients and instructions in two different json files."},
            {"type": "image_url", "image_url": {
                "url": f"data:image/png;base64,{image_data}"}
            }]
        }]

kwargs = dict(model_name=MODEL_NAME, messages=messages)
response = chat.completions.create(**kwargs)

display(Markdown(response.choices[0].message.content))

Here are the two JSON files based on the provided banana bread recipe image:

**ingredients.json:**
```json
{
  "dry_ingredients": {
    "all_purpose_flour": "260 g",
    "sugar": "200 g",
    "baking_soda": "6 g",
    "salt": "3 g"
  },
  "wet_ingredients": {
    "banana": "225 g",
    "eggs": "2 large",
    "vegetable_oil": "100 g",
    "whole_milk": "55 g",
    "vanilla_extract": "5 g"
  },
  "toppings": {
    "chocolate_chips": "100 g",
    "walnuts": "100 g (optional)"
  }
}
```

**instructions.json:**
```json
{
  "directions": [
    "Preheat oven to 180°C.",
    "Mash banana in a bowl.",
    "Combine banana and other wet ingredients together in the same bowl.",
    "Mix all dry ingredients together in a separate bowl.",
    "Use a whisk to combine dry mixture into wet mixture until smooth.",
    "Pour mixture into a greased/buttered loaf pan.",
    "Place the pan into preheated oven on the middle rack and bake for about 60 minutes or until a toothpick comes out clean.",
    "Enjoy!"
  ]
}
```
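
To use a reply like this in another system, the JSON has to be separated from the surrounding markdown first. The sketch below is one possible approach, assuming the model wraps each document in a fenced json block as it did above; the model is not guaranteed to do so, so treat an empty result as a signal to adjust the prompt. `extract_json_blocks` is a hypothetical helper, and the sample reply is a shortened stand-in.

```python
import json
import re

# Three backticks, built this way so they do not terminate this code block.
FENCE = "`" * 3


def extract_json_blocks(markdown_text: str) -> list:
    """Return every fenced json block in the text, parsed into Python objects."""
    pattern = re.compile(FENCE + r"json\s*\n(.*?)" + FENCE, re.DOTALL)
    return [json.loads(block) for block in pattern.findall(markdown_text)]


# Shortened stand-in for the model's reply.
sample_reply = "\n".join([
    "**ingredients.json:**",
    FENCE + "json",
    '{"dry_ingredients": {"sugar": "200 g"}}',
    FENCE,
    "**instructions.json:**",
    FENCE + "json",
    '{"directions": ["Preheat oven to 180°C.", "Enjoy!"]}',
    FENCE,
])

ingredients, instructions = extract_json_blocks(sample_reply)
print(ingredients["dry_ingredients"])
print(instructions["directions"])
```

With a real reply, you would pass `response.choices[0].message.content` to `extract_json_blocks`. `json.loads` raises `json.JSONDecodeError` if a block is not valid JSON, which is worth catching before handing the data to a downstream system.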