# Introduction

This lesson demonstrates how to use the SDK to interact with the Orchestration Service, which lets you build AI-driven workflows by combining modules such as templating, large language models (LLMs), data masking, and content filtering. By chaining these modules, you can build automated workflows that extend the capabilities of your AI solutions. For more details on configuring and using these modules, refer to the [Orchestration Service Documentation](https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/orchestration?locale=en-US).


## Setup and configuration

The following Python packages are installed during this hands-on introduction.

#### **generative-ai-hub-sdk**

With this SAP Python SDK, you can use the generative models, such as OpenAI's GPT models, that are available in SAP's generative AI hub.


> **Note:** Restart the Jupyter Notebook kernel after installing the packages.

#### Install Python packages

Run the following package installations. **pip** is the package installer for Python. You can use pip to install packages from the Python Package Index and other indexes.

In [None]:
!pip install generative-ai-hub-sdk --break-system-packages
!pip install ipywidgets --break-system-packages
!pip install pandas --break-system-packages

# kernel restart required!!!


### Verify SDK version

Run the following:

In [2]:
!pip show generative-ai-hub-sdk

Name: generative-ai-hub-sdk
Version: 4.0.0
Summary: generative AI hub SDK
Home-page: https://www.sap.com/
Author: SAP SE
Author-email: 
License: SAP DEVELOPER LICENSE AGREEMENT
Location: /Users/I064538/Documents/SAP/dev/sap-samples/sap-genai-hub-with-sap-hana-cloud-vector-engine/gen-ai-orch-venv/lib/python3.12/site-packages
Requires: ai-core-sdk, click, dacite, openai, overloading, packaging, pydantic, requests
Required-by: 


#### Restart Python kernel

The Python kernel needs to be restarted before continuing. 

> ![title](./images/config_001.png)

> **Note:** This will take a couple of minutes.

In [None]:
# Test embeddings

from gen_ai_hub.proxy.native.openai import embeddings

response = embeddings.create(
    input="SAP Generative AI Hub is awesome!",
    model_name="text-embedding-ada-002",
)
print(response.data)

In [None]:
# YOUR_API_URL = "https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/d26ce2f31ba6ff8a"

In [None]:
# import json

# from gen_ai_hub.orchestration.utils import load_text_file
# from gen_ai_hub.orchestration.models.data_masking import DataMasking
# from gen_ai_hub.orchestration.models.sap_data_privacy_integration import SAPDataPrivacyIntegration, MaskingMethod, ProfileEntity
# from gen_ai_hub.orchestration.models.azure_content_filter import AzureContentFilter
# from gen_ai_hub.orchestration.models.config import OrchestrationConfig, Template
# from gen_ai_hub.orchestration.models.message import SystemMessage, UserMessage
# from gen_ai_hub.orchestration.models.template import Template, TemplateValue
# from gen_ai_hub.orchestration.models.llm import LLM
# from gen_ai_hub.orchestration.service import OrchestrationService

In [None]:
"""PERSON: Represents personal names.
ORG: Represents organizational names.
UNIVERSITY: Represents educational institutions.
LOCATION: Represents geographical locations.
EMAIL: Represents email addresses.
PHONE: Represents phone numbers.
ADDRESS: Represents physical addresses.
SAP_IDS_INTERNAL: Represents internal SAP identifiers.
SAP_IDS_PUBLIC: Represents public SAP identifiers.
URL: Represents URLs.
USERNAME_PASSWORD: Represents usernames and passwords.
NATIONAL_ID: Represents national identification numbers.
IBAN: Represents International Bank Account Numbers.
SSN: Represents Social Security Numbers.
CREDIT_CARD_NUMBER: Represents credit card numbers.
PASSPORT: Represents passport numbers.
DRIVING_LICENSE: Represents driving license numbers.
NATIONALITY: Represents nationality information.
RELIGIOUS_GROUP: Represents religious group affiliation.
POLITICAL_GROUP: Represents political group affiliation.
PRONOUNS_GENDER: Represents pronouns and gender identity.
GENDER: Represents gender information.
SEXUAL_ORIENTATION: Represents sexual orientation.
TRADE_UNION: Represents trade union membership.
SENSITIVE_DATA: Represents any other sensitive information."""

In [None]:
# # Load the JSON configuration file
# file_path = 'data/ModelOrchConfig_1.json'
# def load_json_config(file_path):
#     with open(file_path, 'r') as file:
#         return json.load(file)

# # Extract information from the JSON config and generate the Python configuration
# def update_python_config(json_config):
#     # Extract relevant information from the JSON
#     llm_model_name = json_config['module_configurations']['llm_module_config']['model_name']
#     input_filter_config = json_config['module_configurations']['filtering_module_config']['input']['filters'][0]['config']
#     output_filter_config = json_config['module_configurations']['filtering_module_config']['output']['filters'][0]['config']
#     masking_entities = json_config['module_configurations']['masking_module_config']['masking_providers'][0]['entities']
#     system_message_content = json_config['module_configurations']['templating_module_config']['template'][0]['content']
#     user_message_content = json_config['module_configurations']['templating_module_config']['template'][1]['content']
#     # Generate the input and output filter objects based on the extracted configuration
#     # system_message_content = "greet the user {{?user}} \nyou are an python agent give a code for the scenario {{?code}} "

#     input_filter = AzureContentFilter(
#         hate=input_filter_config['Hate'],
#         sexual=input_filter_config['Sexual'],
#         self_harm=input_filter_config['SelfHarm'],
#         violence=input_filter_config['Violence']
#     )
#     output_filter = AzureContentFilter(
#         hate=output_filter_config['Hate'],
#         sexual=output_filter_config['Sexual'],
#         self_harm=output_filter_config['SelfHarm'],
#         violence=output_filter_config['Violence']
#     )
#     # Generate the data masking providers based on the extracted entities
#     masking_providers = []
#     for entity in masking_entities:
#         entity_type = entity['type']
#         if entity_type == 'profile-email':
#             masking_providers.append(ProfileEntity.EMAIL)
#         elif entity_type == 'profile-phone':
#             masking_providers.append(ProfileEntity.PHONE)
#         elif entity_type == 'profile-gender':
#             masking_providers.append(ProfileEntity.GENDER)
#         elif entity_type == 'profile-location':
#             masking_providers.append(ProfileEntity.LOCATION)
#         elif entity_type == 'profile-nationalid':
#             masking_providers.append(ProfileEntity.NATIONAL_ID)
#         elif entity_type == 'profile-nationality':
#             masking_providers.append(ProfileEntity.NATIONALITY)
#         elif entity_type == 'profile-org':
#             masking_providers.append(ProfileEntity.ORG)
#         elif entity_type == 'profile-person':
#             masking_providers.append(ProfileEntity.PERSON)
#         elif entity_type == 'profile-university':
#             masking_providers.append(ProfileEntity.UNIVERSITY)
#         elif entity_type == 'profile-url':
#             masking_providers.append(ProfileEntity.URL)
#         elif entity_type == 'profile-username-password':
#             masking_providers.append(ProfileEntity.USERNAME_PASSWORD)

#     # Generate the data masking config
#     data_masking = DataMasking(
#         providers=[SAPDataPrivacyIntegration(
#             method=MaskingMethod.ANONYMIZATION,
#             entities=masking_providers
#         )]
#     )

#     # Return the updated Python config
#     return OrchestrationConfig(
#         template=Template(
#             messages=[
#                 SystemMessage(system_message_content),
#                 UserMessage(user_message_content),
#             ]),
#         llm=LLM(name=llm_model_name),
#         data_masking=data_masking,
#         input_filters=[input_filter],
#         output_filters=[output_filter]
#     )

In [None]:
# # Assuming you have a JSON file path 'ModelOrchConfig_1.json'
# json_file_path = 'data/ModelOrchConfig_1.json'
# json_config = load_json_config(json_file_path)

# # Update the configuration
# config_ = update_python_config(json_config)

# orchestration_service = OrchestrationService(api_url=YOUR_API_URL, config=config_)
# # orchestration_service = OrchestrationService(api_url=YOUR_API_URL)

# # Execute Orchestration Service
# result = orchestration_service.run(
#     config=config_,
#     template_values=[
#         TemplateValue(name="candidate_resume", value="John Doe \n1234 Data St, San Francisco, CA 94101 \n(123) 456-7890 \njohndoe@email.com \nLinkedIn Profile \nGitHub Profile \nObjective \nDetail-oriented Data Scientist with 3+ years of experience in data analysis, statistical modeling, and machine learning. Seeking to leverage expertise in predictive modeling and data visualization to help drive data-informed decision-making at [Company Name]. \nEducation \nMaster of Science in Data Science \nUniversity of California, Berkeley \nGraduated: May 2021 \nBachelor of Science in Computer Science \nUniversity of California, Los Angeles \nGraduated: May 2019 \nTechnical Skills \nProgramming Languages: Python, R, SQL, Java \nData Analysis & Visualization: Pandas, NumPy, Matplotlib, Seaborn, Tableau \nMachine Learning: Scikit-learn, TensorFlow, Keras, XGBoost \nBig Data Technologies: Hadoop, Spark \nDatabases: MySQL, PostgreSQL \nVersion Control: Git \nProfessional Experience  \nData Scientist \nDataCorp Inc., San Francisco, CA \nJune 2021 – Present \nDeveloped predictive models to optimize marketing campaigns, which increased ROI by 20%. \nConducted in-depth data analysis using Python and SQL to identify trends and patterns in large datasets. \nCollaborated with cross-functional teams to implement data-driven strategies that improved customer satisfaction scores by 15%. \nCreated interactive dashboards using Tableau to visualize KPIs for stakeholders.  \nData Analyst Intern \nAnalytics Solutions, Los Angeles, CA \nJune 2020 – August 2020 \nAnalyzed large datasets to identify opportunities for business growth and improvement.\nAssisted in the development of automated reporting tools using Python and Excel. \nWorked with data visualization tools to create insightful reports for management. \nProjects \nCustomer Segmentation Analysis \nConducted K-means clustering on customer data to segment the customer base into distinct groups, enabling targeted marketing strategies. \nPredictive Stock Price Modeling \nBuilt a predictive model using time series analysis to forecast stock prices, achieving an accuracy rate of 85%. \nSentiment Analysis on Social Media \nImplemented natural language processing techniques to analyze sentiment from tweets, providing insights into public opinion on various topics. \nCertifications \nCertified Data Scientist (CDS) – Data Science Council of America \nMachine Learning Specialization – Coursera by Stanford University \nProfessional Affiliations \nMember, Association for Computing Machinery (ACM) \nMember, Data Science Society \nReferences \nAvailable upon request. \nPersonal Interests \n- I absolutely love exploring new technologies and working on innovative projects. \n- I enjoy reading books, especially on artificial intelligence and machine learning. \n- I hate people who are dishonest and unreliable. \n- I love traveling and experiencing new cultures. \n- I enjoy playing video games, especially competitive ones. \n- I hate being stuck in a routine; I always seek new challenges and growth opportunities. \n- I hate working in Azure cloud -\"Azure cloud is the most irritating platform i have ever used\" \n\n")
#     ]
# )

# # Check for masked entities in the response
# masked_result = result.orchestration_result.choices[0].message.content
# print(masked_result)

## Initializing the Orchestration Service

⚠️ Before using the SDK, set up a virtual deployment of the Orchestration Service. Once it is deployed, you have access to a unique endpoint URL (deploymentUrl), which is passed to the SDK as the API URL.

In [6]:
YOUR_API_URL = "https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/d26ce2f31ba6ff8a"
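Hard-coding the deployment URL is fine for a demo, but the value differs for every deployment. The following is a minimal sketch of reading it from an environment variable instead. The variable name `ORCH_DEPLOYMENT_URL` and the helper `resolve_deployment_url` are hypothetical, and the format check is only an assumption derived from the example URL above.

```python
import os
from typing import Optional
from urllib.parse import urlparse


def resolve_deployment_url(env_var: str = "ORCH_DEPLOYMENT_URL",
                           default: Optional[str] = None) -> str:
    """Return the deployment URL from the environment, falling back to `default`."""
    url = os.environ.get(env_var, default)
    if not url:
        raise ValueError(f"Set {env_var} or pass a default deployment URL.")
    parsed = urlparse(url)
    # Assumption: deployment URLs look like the example in this lesson.
    if parsed.scheme != "https" or "/inference/deployments/" not in parsed.path:
        raise ValueError(f"Unexpected deployment URL format: {url}")
    return url.rstrip("/")


# Falls back to the URL used in this lesson when the variable is not set.
YOUR_API_URL = resolve_deployment_url(
    default="https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/d26ce2f31ba6ff8a"
)
```

With this approach the notebook can be shared without editing the cell for each environment.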

### Step 1: Define the Template and Default Input Values

The Template class is used to define structured message templates for generating dynamic interactions with language models. In this example, the template is designed for a translation assistant, allowing users to specify a language and text for translation.

In [7]:
from gen_ai_hub.orchestration.models.message import SystemMessage, UserMessage
from gen_ai_hub.orchestration.models.template import Template, TemplateValue

template = Template(
    messages=[
        SystemMessage("You are a helpful translation assistant."),
        UserMessage(
            "Translate the following text to {{?to_lang}}: {{?text}}"
        ),
    ],
    defaults=[
        TemplateValue(name="to_lang", value="German"),
    ],
)

>This template can be used to create translation requests where the language and text to be translated are specified dynamically. The placeholders in the UserMessage will be replaced with the actual values provided at runtime, and the default value for the language is set to German.
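To make the placeholder behaviour concrete, here is a small pure-Python sketch of how `{{?name}}` placeholders and defaults could be resolved. The actual substitution is performed by the Orchestration Service; the `render` helper below is hypothetical and only illustrates the idea that runtime values override defaults.

```python
import re

# Matches placeholders of the form {{?name}}
PLACEHOLDER = re.compile(r"\{\{\?(\w+)\}\}")


def render(template_text, values, defaults=None):
    """Fill {{?name}} placeholders; runtime values take precedence over defaults."""
    merged = {**(defaults or {}), **values}

    def substitute(match):
        name = match.group(1)
        if name not in merged:
            raise KeyError(f"No value provided for placeholder '{name}'")
        return merged[name]

    return PLACEHOLDER.sub(substitute, template_text)


user_message = "Translate the following text to {{?to_lang}}: {{?text}}"
rendered = render(user_message, values={"text": "Hello"}, defaults={"to_lang": "German"})
print(rendered)  # Translate the following text to German: Hello
```

Passing `to_lang` in `values` would override the default in the same way a `TemplateValue` supplied at runtime does.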

### Step 2: Define the LLM

The LLM class is used to configure and initialize a language model for generating text based on specific parameters. In this example, we'll use the GPT-4o model to perform the translation task.

ℹ️ Note that the virtual deployment of the language model is managed automatically by the Orchestration Service, so no additional deployment setup is required on your part.

In [8]:
from gen_ai_hub.orchestration.models.llm import LLM

llm = LLM(name="gpt-4o", version="latest", parameters={"max_tokens": 256, "temperature": 0.2})

>This configuration selects the latest available version of the gpt-4o model. Responses are limited to 256 tokens, and the low temperature of 0.2 makes the output more predictable and focused.
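Why a low temperature leads to more predictable output can be illustrated with the standard temperature-scaled softmax. This is a generic sketch, not a description of how the model provider implements sampling, and the logits are made-up scores for three candidate tokens.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
low = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 1.0)
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At temperature 0.2 almost all probability mass lands on the highest-scoring token, whereas at 1.0 the alternatives keep a noticeable share, which is why sampling at a low temperature varies less between runs.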

### Step 3: Create the Orchestration Configuration

The OrchestrationConfig class is used to create a configuration that integrates various components, such as templates and language models, into a unified orchestration setup. This configuration specifies how these components work together to achieve the desired workflow.

In [None]:
from gen_ai_hub.orchestration.models.config import OrchestrationConfig

config = OrchestrationConfig(
    template=template,
    llm=llm,
)

### Step 4: Run the Orchestration Request

The OrchestrationService class is used to interact with the orchestration service by providing a configuration and invoking its operations. This service handles the execution of workflows defined by the provided configuration and processes inputs accordingly.

In [None]:
from gen_ai_hub.orchestration.service import OrchestrationService

orchestration_service = OrchestrationService(api_url=YOUR_API_URL, config=config)

Call the run method with the required template_values. Because to_lang has a default value of German, only text needs to be supplied here. The service processes the input according to the configuration and returns the result.

In [None]:
result = orchestration_service.run(template_values=[
    TemplateValue(name="text", value="The Orchestration Service is working!")
])
print(result.orchestration_result.choices[0].message.content)

### Data Masking

The Data Masking Module anonymizes or pseudonymizes personally identifiable information (PII) before it is processed by the LLM module. When data is anonymized, all identifying information is replaced with placeholders (e.g., MASKED_ENTITY), and the original data cannot be recovered, ensuring that no trace of the original information is retained. In contrast, pseudonymized data is substituted with unique placeholders (e.g., MASKED_ENTITY_ID), allowing the original information to be restored if needed. In both cases, the masking module identifies sensitive data and replaces it with appropriate placeholders before further processing.

In [None]:
from gen_ai_hub.orchestration.utils import load_text_file
from gen_ai_hub.orchestration.models.data_masking import DataMasking
from gen_ai_hub.orchestration.models.sap_data_privacy_integration import (
    SAPDataPrivacyIntegration,
    MaskingMethod,
    ProfileEntity,
)

data_masking = DataMasking(
    providers=[
        SAPDataPrivacyIntegration(
            method=MaskingMethod.ANONYMIZATION,  # or MaskingMethod.PSEUDONYMIZATION
            entities=[
                ProfileEntity.EMAIL,
                ProfileEntity.PHONE,
                ProfileEntity.PERSON,
                ProfileEntity.ORG,
                ProfileEntity.LOCATION
            ]
        )
    ]
)

config = OrchestrationConfig(
    template=Template(
        messages=[
            SystemMessage("You are a helpful AI assistant."),
            UserMessage("Summarize the following CV in 10 sentences: {{?orgCV}}"),
        ]
    ),
    llm=LLM(
        name="gpt-4o",
    ),
    data_masking=data_masking
)

cv_as_string = load_text_file("data/cv.txt")

result = orchestration_service.run(
    config=config,
    template_values=[
        TemplateValue(name="orgCV", value=cv_as_string)
    ]
)

In [None]:
print(result.orchestration_result.choices[0].message.content)
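The difference between the two masking methods described in this section can be sketched in plain Python. This is a simplified illustration for e-mail addresses only; the regular expression, the placeholder names, and the helper functions are hypothetical and do not reflect how the masking provider detects or replaces entities.

```python
import re

# Deliberately simple e-mail pattern, good enough for the illustration.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def anonymize(text):
    """Replace every e-mail with the same placeholder; the original is not kept."""
    return EMAIL.sub("MASKED_EMAIL", text)


def pseudonymize(text):
    """Replace each distinct e-mail with a numbered placeholder and keep the mapping."""
    mapping = {}

    def substitute(match):
        original = match.group(0)
        if original not in mapping:
            mapping[original] = f"MASKED_EMAIL_{len(mapping) + 1}"
        return mapping[original]

    return EMAIL.sub(substitute, text), mapping


def restore(text, mapping):
    """Undo pseudonymization by looking up each numbered placeholder."""
    reverse = {placeholder: original for original, placeholder in mapping.items()}
    return re.sub(r"MASKED_EMAIL_\d+", lambda m: reverse.get(m.group(0), m.group(0)), text)


sample = "Contact johndoe@email.com or jane@example.org; johndoe@email.com is preferred."
anonymized = anonymize(sample)
pseudonymized, mapping = pseudonymize(sample)
print(anonymized)
print(pseudonymized)
print(restore(pseudonymized, mapping) == sample)  # True
```

Anonymized text cannot be mapped back because every address collapses into the same placeholder, while the pseudonymized text keeps distinct placeholders that can be resolved through the stored mapping.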

### Content Filtering

The Content Filtering Module can be configured to filter both the input to the LLM module (input filter) and the output generated by the LLM (output filter). The module uses predefined classification services to detect inappropriate or unwanted content, allowing flexible configuration through customizable thresholds. These thresholds can be set to control the sensitivity of filtering, ensuring that content meets desired standards before it is processed or returned as output.

In [None]:
from gen_ai_hub.orchestration.models.azure_content_filter import AzureContentFilter, AzureThreshold

input_filter = AzureContentFilter(
    hate=AzureThreshold.ALLOW_SAFE,
    violence=AzureThreshold.ALLOW_SAFE,
    self_harm=AzureThreshold.ALLOW_SAFE,
    sexual=AzureThreshold.ALLOW_SAFE,
)
output_filter = AzureContentFilter(
    hate=AzureThreshold.ALLOW_SAFE,
    violence=AzureThreshold.ALLOW_SAFE_LOW,
    self_harm=AzureThreshold.ALLOW_SAFE_LOW_MEDIUM,
    sexual=AzureThreshold.ALLOW_ALL,
)

config = OrchestrationConfig(
    template=Template(
        messages=[
            SystemMessage("You are a helpful AI assistant."),
            UserMessage("{{?text}}"),
        ]
    ),
    llm=LLM(
        name="gpt-4o",
    ),
    input_filters=[input_filter],
    output_filters=[output_filter]
)
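How per-category thresholds control filtering can be sketched as a simple comparison. The `Threshold` enum below is a hypothetical stand-in for `AzureThreshold`; its numeric values (0, 2, 4, 6) are an assumption based on the severity levels commonly used by Azure AI Content Safety, and the detected severities are made up. The real classification and decision happen inside the service.

```python
from enum import IntEnum


class Threshold(IntEnum):
    """Hypothetical stand-in for AzureThreshold; numeric values are an assumption."""
    ALLOW_SAFE = 0
    ALLOW_SAFE_LOW = 2
    ALLOW_SAFE_LOW_MEDIUM = 4
    ALLOW_ALL = 6


def blocked_categories(severities, thresholds):
    """Return the categories whose detected severity exceeds the configured threshold."""
    return sorted(
        category
        for category, severity in severities.items()
        if severity > thresholds[category]
    )


# Mirrors the output filter configured in this section.
output_thresholds = {
    "hate": Threshold.ALLOW_SAFE,
    "violence": Threshold.ALLOW_SAFE_LOW,
    "self_harm": Threshold.ALLOW_SAFE_LOW_MEDIUM,
    "sexual": Threshold.ALLOW_ALL,
}
detected = {"hate": 2, "violence": 2, "self_harm": 0, "sexual": 4}  # made-up scores
blocked = blocked_categories(detected, output_thresholds)
print(blocked)  # ['hate']
```

Under these assumptions, a stricter threshold such as `ALLOW_SAFE` rejects anything above the lowest severity, while `ALLOW_ALL` lets every severity level through for that category.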