## Coding challenge Part I (~45 min) - Using SAP's LLM Orchestration Service

This notebook will guide you in exploring some of SAP's generative AI capabilities around the LLM Orchestration Service.
You will use Python and SAP's `generative-ai-hub-sdk` library to interact with LLMs and the Orchestration Service.

(Note: SDKs are also available for JavaScript and Java) 


### SAP LLM Orchestration Service Overview

![Orchestration Pipeline Modules](./img/orchestration_pipeline.png)

**SAP LLM Orchestration Service: One API, multiple models, plus often-required business features**

- Large Language Model (LLM) Access with the same API
  - azure-openai (e.g. gpt-4o)
  - opensource (e.g. Meta Llama3, Mistral AI Mixtral, IBM Granite)
  - gcp-vertexai (e.g. Gemini 1.5 Flash)
  - aws-bedrock (e.g. Anthropic Claude 3.5 Sonnet, Amazon Titan Text Lite)
- Prompt Templating
  - Enrich user prompts with additional context
  - Reuse templates
- Content Filtering
  - Ensure that user input and model output adhere to safety standards
- Data Masking
  - Ensure that sensitive data is not transferred to third-party LLM providers
- Grounding / Retrieval Augmented Generation
  - Augment LLM prompts with related data fetched from specialized data sources the LLM would not have access to otherwise

### Let's get started: Preliminary steps
- Please form groups of up to 5 people (maximum 5 groups in total).
- Send an email (one per group) to christopher.hagedorn@sap.com to receive the credentials file for the day.
- Go through this notebook, run and understand the commands, and make LLM requests using Python and the SDK functions.
  - You will have to fill in some parts to get things working correctly.
- Adjust parameters and play around with the functions as you like.
- If you get stuck or have any questions, approach us.

### Setup

**1.** Verify that you have a supported and stable version of Python installed on your system
* Open up a terminal session at the root of the cloned repository and check the output of: `python --version`

**2.** Set up a virtual Python environment to avoid conflicts with other installed packages
* Create a virtual environment: `python -m venv ENV`. Note: Your Python executable might also be called `python3`
* Activate the environment in your current terminal session
  * Unix-like systems, MacOS and Linux: `source ENV/bin/activate`
  * Windows (PowerShell): run the script `ENV/Scripts/Activate.ps1` (in Command Prompt, use `ENV\Scripts\activate.bat`)
* Make sure that all subsequent steps are executed within this newly created virtual environment
  * your terminal prompt shows `(ENV)` at the very left-hand side

**3.** Install the [SAP generative AI hub SDK](https://pypi.org/project/generative-ai-hub-sdk/)
* `pip install "generative-ai-hub-sdk[all]"`. If you use this notebook, you can run the following cell.

In [None]:
%pip install "generative-ai-hub-sdk[all]"

**4.** Configure authentication by setting the following environment variables using the demo credentials. If you use this notebook, you can run either of the following cells.

```bash
AICORE_AUTH_URL=<workshop-credentials-file.AICORE_AUTH_URL>
AICORE_BASE_URL=<workshop-credentials-file.AICORE_BASE_URL>
AICORE_CLIENT_ID=<workshop-credentials-file.AICORE_CLIENT_ID>
AICORE_CLIENT_SECRET=<workshop-credentials-file.AICORE_CLIENT_SECRET>
AICORE_ORCHESTRATION_DEPLOYMENT_URL=<workshop-credentials-file.AICORE_ORCHESTRATION_DEPLOYMENT_URL>
AICORE_RESOURCE_GROUP=<workshop-credentials-file.AICORE_RESOURCE_GROUP>
```

In [None]:
# TODO: Place the credentials file shared with you into a folder named tmp in the repository root,
#       then replace "dummy.env" in the last line with that file's name
%pip install python-dotenv
import os
from dotenv import load_dotenv
# Load the variables from the file into os.environ
load_dotenv("tmp/dummy.env")

In [None]:
# NOT NEEDED if you have executed the cell above
# TODO: Copy each environment variable's value from the file shared with you via email,
#       replacing the corresponding placeholder "<workshop-credentials-file.xyz>"
import os
os.environ["AICORE_AUTH_URL"] = "<workshop-credentials-file.AICORE_AUTH_URL>"
os.environ["AICORE_BASE_URL"] = "<workshop-credentials-file.AICORE_BASE_URL>"
os.environ["AICORE_CLIENT_ID"] = "<workshop-credentials-file.AICORE_CLIENT_ID>"
os.environ["AICORE_CLIENT_SECRET"] = "<workshop-credentials-file.AICORE_CLIENT_SECRET>"
os.environ["AICORE_ORCHESTRATION_DEPLOYMENT_URL"] = "<workshop-credentials-file.AICORE_ORCHESTRATION_DEPLOYMENT_URL>"
os.environ["AICORE_RESOURCE_GROUP"] = "<workshop-credentials-file.AICORE_RESOURCE_GROUP>"
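Before moving on, a quick sanity check can save some debugging time later. The sketch below is not part of the SDK. It only checks that the six variables listed above exist in `os.environ` and no longer contain a `<workshop-credentials-file...>` placeholder. The helper name `missing_env_vars` is our own, hypothetical choice.

```python
import os

# Names of the environment variables used in this notebook
REQUIRED_VARS = [
    "AICORE_AUTH_URL",
    "AICORE_BASE_URL",
    "AICORE_CLIENT_ID",
    "AICORE_CLIENT_SECRET",
    "AICORE_ORCHESTRATION_DEPLOYMENT_URL",
    "AICORE_RESOURCE_GROUP",
]


def missing_env_vars(env=os.environ):
    """Return the required variables that are unset, empty, or still a placeholder."""
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "")
        # Treat empty values and unreplaced "<workshop-credentials-file...>" placeholders as missing
        if not value or value.startswith("<"):
            missing.append(name)
    return missing


missing = missing_env_vars()
if missing:
    print("Missing or placeholder values:", ", ".join(missing))
else:
    print("All required AICORE_* variables are set.")
```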

**5.** You can now open your preferred development environment for Python and start with the exercises. Ensure that the created virtual environment and the set environment variables are used within your development environment.

We recommend using Jupyter Notebook to explore this track's exercises. You can install it within your virtual environment using `pip install notebook`. Afterwards, you can use the following command from your repository's root to view and edit the Jupyter Notebooks provided in this track: `jupyter notebook exercises/python/`. A browser window should open automatically.

### Let's get started: Exploring the Orchestration Service
Go through the subsequent cells to familiarise yourself with the SAP LLM Orchestration Service and how the `generative-ai-hub-sdk` supports you in using the SAP LLM Orchestration Service.

#### Create Prompt Template
The first module of the SAP LLM Orchestration Service that we will use is templating.
The templating module lets you define prompt skeletons that are then parameterized per inference call.
Below, we define such a template.
A template consists of 
- *messages*: a sequence of messages, which can include templating syntax
- *defaults*: an optional list of defaults for introduced template parameters

The messages can be of different types:
- *SystemMessage*: A message for priming AI behavior. The system message is usually passed in as the first of a sequence of input messages.
- *UserMessage*: A message from a user.
- *AssistantMessage*: A message from the LLM (the assistant).

Template parameters are defined within the message string using the following syntax: `{{?param_name}}`
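To build intuition for how a parameter gets its value, here is a plain-Python sketch that mimics the substitution locally. It is an illustration under our own assumptions, not how the Orchestration Service implements templating: values passed at call time override defaults, and in this sketch a parameter with neither raises an error. The function `render` is hypothetical.

```python
import re

# Matches placeholders such as {{?user_question}} and captures the parameter name
PLACEHOLDER = re.compile(r"\{\{\?(\w+)\}\}")


def render(message, defaults, values):
    """Fill {{?param}} placeholders; explicit values take precedence over defaults."""
    params = {**defaults, **values}

    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"No value or default for template parameter '{name}'")
        return str(params[name])

    return PLACEHOLDER.sub(substitute, message)


msg = "Answer in {{?n_sentences}} sentences. The user asks: {{?user_question}}"
print(render(msg, defaults={"n_sentences": 2}, values={"user_question": "How do I post an invoice?"}))
# prints "Answer in 2 sentences. The user asks: How do I post an invoice?"
```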

In [None]:
# Please fill in the commented sections of this cell and execute it.
from gen_ai_hub.orchestration.models.message import SystemMessage, UserMessage
from gen_ai_hub.orchestration.models.template import Template, TemplateValue

support_template = Template(
    messages=[
        SystemMessage("You are a helpful AI assistant that specializes in humorously answering business questions by users of SAP software. Every answer you give should end with a funny exclamation. Rhyming is an absolute must and a hard requirement!"),
        UserMessage(
            # TODO: replace each ... in the sentence below with a template parameter (syntax: {{?param_name}})
            "Please answer with no more than ... sentences and include ... emojis. The user asks the following: {{?user_question}}"
        ),
    ],
    defaults=[
        TemplateValue(name="user_question", value=""),
        # TODO add additional TemplateValues here according to the ones filled above
        # ...,
        # ...
    ],
)

#### Define the LLM

Select a Large Language Model (LLM) that will be used for inference.

In [None]:
from gen_ai_hub.orchestration.models.llm import LLM

gpt_4o_llm = LLM(name="gpt-4o", version="latest", parameters={"max_tokens": 256, "temperature": 0.2})

#### Create the Orchestration Configuration

Now create an orchestration config. The config is the central object that contains information about the modules used during an inference run. For now, you only need to add the template and the LLM.

In [None]:
# Please fill in the commented sections of this cell and execute it.
from gen_ai_hub.orchestration.models.config import OrchestrationConfig

config = OrchestrationConfig(
    #template=<add template to this call>,
    #llm=<add llm to this call>
)

#### Run the Orchestration Request

Now let's see the orchestration service in action. First, we create an `OrchestrationService` using the orchestration deployment URL you have been provided (`AICORE_ORCHESTRATION_DEPLOYMENT_URL`).

Then we can call the orchestration service. Note that the actual template values are now passed to the `run` method. The `name` of each `TemplateValue` corresponds to a parameter name used in the user message string. Any parameter you omit falls back to the default defined in the `Template`.

In [None]:
from gen_ai_hub.orchestration.service import OrchestrationService

service = OrchestrationService(api_url=os.environ["AICORE_ORCHESTRATION_DEPLOYMENT_URL"], config=config)

# TODO add your question
q = ""


result = service.run(
    config=config,
    template_values=[
        TemplateValue(name="user_question", value=q)
    ]
)
print(result.orchestration_result.choices[0].message.content)

#### Adapt Default Template Values

You have now successfully posed your first question using the default parameters. Let's adapt some of the defaults in the next call.

In [None]:
# Please adjust the other two template values that you defined defaults for.
template_values = [
    TemplateValue(name="user_question", value=q),
    # TODO: add a TemplateValue here,
    # TODO: add a TemplateValue here,
]
result = service.run(template_values=template_values)
print(result.orchestration_result.choices[0].message.content)

#### Adapt Model Parameters

Model parameters are settings that influence how LLM output is created.

The most interesting parameter is probably `temperature`. Let's ask the model about it:

In [None]:
result = service.run(template_values=[
    TemplateValue(
        name="user_question",
        value="What is the `temperature` in LLM configurations? What difference does it make?")
])
print(result.orchestration_result.choices[0].message.content)
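Conceptually, temperature divides the model's raw token scores (logits) before they are turned into probabilities with a softmax. The sketch below uses made-up scores for three candidate tokens (an assumption for illustration; real models score a vocabulary of many thousands of tokens). A low temperature concentrates probability on the top token, while a high temperature flattens the distribution and makes less likely tokens more probable.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores (logits) into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```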

Let's crank up the heat:

In [None]:
# TODO: change the temperature parameter of the LLM. Use a value between 0.0 and 2.0 and try out a high one.

temperature = 1.5  # replace with your own value

hot_gpt_4o_llm = LLM(name="gpt-4o",
                     version="latest", 
                     parameters={
                         "max_tokens": 256, 
                         "temperature": temperature}
                    )

result = service.run(
    config=OrchestrationConfig(template=support_template, llm=hot_gpt_4o_llm),
    template_values=template_values
)

print(result.orchestration_result.choices[0].message.content)

#### Adapt the Model Being Called

We've been using OpenAI's GPT-4o model. The LLM Orchestration Service offers other options; some examples are defined in the cell below. For a list of supported models, see this [link](https://help.sap.com/doc/generative-ai-hub-sdk/CLOUD/en-US/_reference/README_sphynx.html#supported-models). Note that not all of these models are guaranteed to work in our test environment.

In [None]:
gemini = LLM(name="gemini-1.5-flash", version="latest", parameters={"max_tokens": 256, "temperature": 1.0})
claude_3_haiku = LLM(name="anthropic--claude-3-haiku", version="latest", parameters={"max_tokens": 256, "temperature": 1.0})
titan_text = LLM(name="amazon--titan-text-lite", version="latest", parameters={"max_tokens": 256, "temperature": 1.0})

In [None]:
# TODO: change the call below to use a different LLM (e.g. `gemini` or `claude_3_haiku`)

result = service.run(
    config=OrchestrationConfig(template=support_template, llm=hot_gpt_4o_llm),
    template_values=template_values
)
print(result.orchestration_result.choices[0].message.content)

#### Compare Different LLMs
Let us consider a simple counting problem taken from the research paper "Easy Problems That LLMs Get Wrong" (https://arxiv.org/abs/2405.19616).

In [None]:
t = Template(messages=[
    SystemMessage("Only answer with the shortest possible response!"),
    UserMessage("Count the number of occurrences of the letter ’L’ in the word ’LOLLAPALOOZA’")
])

models_to_test = ["gemini-1.5-flash", "anthropic--claude-3-haiku", "amazon--titan-text-lite", "ibm--granite-13b-chat", "meta--llama3-70b-instruct"]
version = "latest"
parameters={"max_tokens": 256, "temperature": 0.1}

# TODO: iterate over models_to_test and create a new LLM for each model name, using the version and parameters above.
#       For each LLM, make an orchestration call with the template above and print the result using the following code:
#       result = service.run(config=OrchestrationConfig(template=<TODO replace>, llm=<TODO replace>))
#       print(f"--- {<TODO print name of the model to test>} ---")
#       print(result.orchestration_result.choices[0].message.content.strip('\n'))
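To judge the model outputs, it helps to know the correct answer. The snippet below computes it locally with `str.count`. The helper `is_correct` is a hypothetical convenience of our own: it compares the first integer found in a model's answer with the expected count, so spelled-out answers such as "four" are not recognized.

```python
import re

# Reference answer computed locally, to compare with each model's output
word = "LOLLAPALOOZA"
expected = word.count("L")
print(f"Expected answer: {expected}")  # prints "Expected answer: 4"


def is_correct(answer, expected):
    """Hypothetical helper: True if the first integer in the answer equals the expected count."""
    match = re.search(r"\d+", answer)
    return match is not None and int(match.group()) == expected
```

Inside your loop you could, for example, print `is_correct(result.orchestration_result.choices[0].message.content, expected)` next to each model's answer.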


#### Building A Feature using a LLM

One of the most powerful features of LLMs is their language understanding, which makes them surprisingly good translators.
Create a `Translator` class that holds an orchestration service and an orchestration config. The orchestration service is passed in during initialization. The orchestration config should contain an LLM (of your choice) and a template with a `SystemMessage` and a `UserMessage` that accept a target language and a text as parameters.
The class should have a method `translate()` that takes a text and a target language as input. Use these inputs as template values when calling the orchestration service, and return the response content.

In [None]:
# TODO fill the code snippets below to create the translator
class Translator:
    def __init__(self, orch_service):
        self.service = None #TODO complete me!
        self.config = OrchestrationConfig() #TODO complete me!

    def translate(self, text, to_lang="Spanish"):
        response = self.service.run(
            config=self.config,
            # TODO add template values
        )

        return response.orchestration_result.choices[0].message.content

In [None]:
translator = Translator(orch_service=service)
result = translator.translate(text="Hello, world!")
print(result)

#### Stretch Goal: Using Content Filtering

As Business AI Developers, we must ensure that the content users translate with our software solutions meets certain safety standards.

We can use the Orchestration Service's content filtering features:

In [None]:
from gen_ai_hub.orchestration.models.azure_content_filter import AzureContentFilter, AzureThreshold

# Define a content filter with settings for the four categories: hate, violence, self-harm, and sexual
input_filter = AzureContentFilter(hate=AzureThreshold.ALLOW_SAFE,
                                  violence=AzureThreshold.ALLOW_SAFE,
                                  self_harm=AzureThreshold.ALLOW_SAFE,
                                  sexual=AzureThreshold.ALLOW_SAFE)

# apply this content filter to the OrchestrationConfig
config_w_filter = OrchestrationConfig(
    template=Template(
        messages=[
            UserMessage("{{?text}}")
        ]
    ),
    llm=LLM(name="gpt-4o",),
    input_filters=[input_filter]
)
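To reason about what a threshold does, the sketch below models the pass/block decision locally. It assumes Azure's content-safety severity scale of 0 (safe), 2 (low), 4 (medium), and 6 (high), and assumes that a category passes when its severity does not exceed the configured threshold. `Threshold` is a hypothetical stand-in for `AzureThreshold`, the severity scores are made up, and the real scoring happens inside the service.

```python
from enum import IntEnum


class Threshold(IntEnum):
    """Hypothetical stand-in for AzureThreshold, assuming Azure's 0/2/4/6 severity scale."""
    ALLOW_SAFE = 0
    ALLOW_SAFE_LOW = 2
    ALLOW_SAFE_LOW_MEDIUM = 4
    ALLOW_ALL = 6


def passes_filter(severities, thresholds):
    """Return True if every category's severity is at or below its threshold."""
    return all(severities[cat] <= thresholds[cat] for cat in thresholds)


strict = {cat: Threshold.ALLOW_SAFE for cat in ("hate", "violence", "self_harm", "sexual")}
scores = {"hate": 2, "violence": 0, "self_harm": 0, "sexual": 0}  # made-up severity scores
print(passes_filter(scores, strict))  # False: severity 2 for "hate" exceeds threshold 0
```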

In [None]:
# TODO: add error handling (try/except) to the following call to handle the raised exception. Print the error message.
from gen_ai_hub.orchestration.exceptions import OrchestrationError

# when the service filters content it raises an exception
result = service.run(
    config=config_w_filter,
    template_values=[
        TemplateValue(name="text", value="I hate you")
    ]
)

#### Stretch Goal: Adding Content Filtering to the Translator

In [None]:
# TODO implement a translator that refuses to translate non-safe content
class SafeTranslator:
    def __init__(self, orch_service):
        self.service = None  # TODO complete me!
        # --- Set a config with filtering ---
        self.config = OrchestrationConfig() #TODO complete me!

    def translate(self, text, to_lang="French"):
        response = self.service.run(
            config=self.config,
            # TODO add template values
        )

        return response.orchestration_result.choices[0].message.content
