# Orchestration Service Tutorial

This notebook demonstrates how to use the generative AI hub SDK to call the Orchestration Service, which combines several modules (templating, large language models (LLMs), data masking, and content filtering) into a single AI workflow. Chaining these modules lets you build automated pipelines such as the one in this notebook, which answers questions about Reddit posts stored in an SAP HANA Cloud vector store and compares the responses of several LLMs. For more details on configuring and using these modules, refer to the [Orchestration Service Documentation](https://help.sap.com/docs/ai-launchpad/sap-ai-launchpad/orchestration).

In [1]:
%pip install hdbcli pandas "generative-ai-hub-sdk[all]" langchain langchain-community lexical_diversity nltk prettytable --break-system-packages




### NOTE: This is an initial iteration of the evaluation metrics; it is still a work in progress.

In [2]:
import json
import os
from ai_core_sdk.ai_core_v2_client import AICoreV2Client

# Load the AI Core service key credentials from creds.json
with open('creds.json') as f:
    credCF = json.load(f)

# Set the AICORE_* environment variables read by the generative AI hub SDK
def set_environment_vars(credCF):
    env_vars = {
        'AICORE_AUTH_URL': credCF['url'] + '/oauth/token',
        'AICORE_CLIENT_ID': credCF['clientid'],
        'AICORE_CLIENT_SECRET': credCF['clientsecret'],
        'AICORE_BASE_URL': credCF["serviceurls"]["AI_API_URL"] + "/v2",
        'AICORE_RESOURCE_GROUP': "llm-deployed"
    }

    for key, value in env_vars.items():
        os.environ[key] = value
        print(f"{key} set")  # log variable names only; never print secrets

# Create AI Core client instance
def create_ai_core_client(credCF):
    set_environment_vars(credCF)  # Ensure environment variables are set
    return AICoreV2Client(
        base_url=os.environ['AICORE_BASE_URL'],
        auth_url=os.environ['AICORE_AUTH_URL'],
        client_id=os.environ['AICORE_CLIENT_ID'],
        client_secret=os.environ['AICORE_CLIENT_SECRET'],
        resource_group=os.environ['AICORE_RESOURCE_GROUP']
    )

ai_core_client = create_ai_core_client(credCF)


## Initializing the Orchestration Service

⚠️ Before using the SDK, you need to set up a deployment of the Orchestration Service in SAP AI Core. Once it is deployed, you get a unique endpoint URL (`deploymentUrl`), which is used as `YOUR_API_URL` below.
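The endpoint follows the pattern `<AI_API_URL>/v2/inference/deployments/<deployment ID>`, as the `YOUR_API_URL` value in the next cell shows. A minimal sketch, assuming you already know the deployment ID (for example from AI Launchpad), of assembling the URL from the service key's `AI_API_URL`; `build_deployment_url` is a hypothetical helper, not part of the SDK:

```python
def build_deployment_url(ai_api_url: str, deployment_id: str) -> str:
    """Hypothetical helper: join the AI API base URL and a deployment ID
    into an orchestration inference endpoint URL."""
    base = ai_api_url.rstrip("/")  # tolerate a trailing slash in the service key
    return f"{base}/v2/inference/deployments/{deployment_id}"

url = build_deployment_url(
    "https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/",
    "de3f0730ef89b19d",
)
print(url)
```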

In [3]:
YOUR_API_URL = "https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/de3f0730ef89b19d"

### Data from Reddit

In [4]:
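import os

# Reddit script-app credentials, read from environment variables instead of
# being hard-coded in the notebook (the variable names are illustrative)
CLIENT_ID = os.environ["REDDIT_CLIENT_ID"]
SECRET_KEY = os.environ["REDDIT_SECRET_KEY"]
username = os.environ["REDDIT_USERNAME"]
password = os.environ["REDDIT_PASSWORD"]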

In [5]:
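import requests
from requests.auth import HTTPBasicAuth

# Request an OAuth access token with the "password" grant (Reddit script apps)
auth = HTTPBasicAuth(CLIENT_ID, SECRET_KEY)
data = {
    'grant_type': 'password',
    'username': username,
    'password': password
}
headers = {'User-Agent': 'MyAPI/0.0.1'}
res = requests.post('https://www.reddit.com/api/v1/access_token', auth=auth, data=data, headers=headers)
res.raise_for_status()  # fail early on bad credentials
TOKEN = res.json()['access_token']
# Authenticate all later calls to oauth.reddit.com with the bearer token
headers['Authorization'] = f'bearer {TOKEN}'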

In [6]:
res.json()

{'access_token': '<redacted>',
 'token_type': 'bearer',
 'expires_in': 86400,
 'scope': '*'}
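The token above is valid for `expires_in` seconds (86400, i.e. 24 hours). For longer-running jobs, a small cache that re-requests the token shortly before it expires avoids failed calls. A minimal sketch; `TokenCache` and its `fetch` callback are hypothetical, and in this notebook `fetch` would wrap the `requests.post` call from the previous cell and return `(access_token, expires_in)`:

```python
import time
from typing import Callable, Optional, Tuple

class TokenCache:
    """Hypothetical helper: cache an OAuth access token and refresh it
    `margin` seconds before it expires."""

    def __init__(self, fetch: Callable[[], Tuple[str, int]], margin: int = 60):
        self._fetch = fetch            # returns (token, expires_in_seconds)
        self._margin = margin
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = time.monotonic()
        # Fetch a new token on first use or when the cached one is about to expire
        if self._token is None or now >= self._expires_at - self._margin:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = now + expires_in
        return self._token

# Usage with a fake fetcher that records how often it is called
calls = []
def fake_fetch():
    calls.append(1)
    return f"token-{len(calls)}", 86400

cache = TokenCache(fake_fetch)
first = cache.get()
second = cache.get()  # still valid, so no second fetch happens
print(first, second, len(calls))
```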

In [7]:
reddit_data = requests.get('https://oauth.reddit.com/r/SAPAIcore/',headers=headers).json()['data']['children']

In [8]:
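def parse_comments(children, out):
    """Recursively append comment bodies from a Reddit comment tree to `out`."""
    for child in children:
        if child.get('kind') != 't1':  # skip "more" placeholders, which have no body
            continue
        out.append(child['data']['body'])
        replies = child['data'].get('replies')
        if replies:  # Reddit returns "" when a comment has no replies
            parse_comments(replies['data']['children'], out)

ids = [post['data']['id'] for post in reddit_data]
comments = {}
for post_id in ids:
    comment_data = requests.get(f'https://oauth.reddit.com/r/SAPAIcore/comments/{post_id}', headers=headers).json()
    comments[post_id] = []
    # comment_data[0] holds the post itself; comment_data[1] holds the comment tree
    parse_comments(comment_data[1]['data']['children'], comments[post_id])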
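The recursive parser above follows Reddit's nesting, where each comment's `replies` field is either an empty string or another listing. As an alternative that cannot hit Python's recursion limit on very deep threads, here is an iterative sketch with an explicit stack. It assumes the response shape used above (`kind == "t1"` for comments, `"more"` for collapsed branches); `flatten_comments` is a hypothetical name:

```python
def flatten_comments(children):
    """Collect comment bodies in depth-first (pre-order) order using an
    explicit stack instead of recursion."""
    bodies = []
    stack = list(reversed(children))   # reversed so the first child is popped first
    while stack:
        node = stack.pop()
        if node.get("kind") != "t1":   # skip "more" placeholders and other kinds
            continue
        data = node["data"]
        bodies.append(data["body"])
        replies = data.get("replies")
        if replies:                    # "" means no replies
            stack.extend(reversed(replies["data"]["children"]))
    return bodies

# Usage on a hand-built tree that mimics the shape of Reddit's JSON
def make_comment(body, *kids):
    replies = {"data": {"children": list(kids)}} if kids else ""
    return {"kind": "t1", "data": {"body": body, "replies": replies}}

tree = [
    make_comment("A", make_comment("A1", make_comment("A1a")), make_comment("A2")),
    {"kind": "more", "data": {}},
    make_comment("B"),
]
print(flatten_comments(tree))
```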

In [9]:
extracted_data = []
for idx, post in enumerate(reddit_data, start=1):
    post_data = post['data']
    # One row per post: ID, title, body text, and all comments as a single string
    row = [idx, post_data['title'], post_data['selftext'], str(comments[post_data['id']])]
    print(row[1])
    extracted_data.append(row)

AI core


In [10]:
import pandas as pd
from langchain_community.document_loaders.csv_loader import CSVLoader

df = pd.DataFrame(extracted_data, columns=["ID", "TITLE", "POST", "COMMENTS"])
df.to_csv("output1.csv", index=False)

# CSVLoader creates one Document per CSV row, i.e. one per Reddit post
loader = CSVLoader(
    file_path="output1.csv",
    csv_args={
        "delimiter": ",",
        "quotechar": '"'
    },
)

# Load the documents; no further splitting is applied, so each post is one chunk
text_documents = loader.load()
text_chunks = text_documents
print(f"Number of document chunks: {len(text_documents)}")

Number of document chunks: 1
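Each CSV row is loaded as one document here and `text_chunks` is simply the loaded list, so no splitting happens. If posts or comment threads grow long, they could be split into overlapping chunks before embedding. A minimal character-window sketch of that idea; `chunk_text` is a hypothetical helper and a simplified stand-in, not LangChain's `CharacterTextSplitter` (which first splits on a separator and then merges pieces up to the chunk size):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of at most `chunk_size` characters, where
    consecutive windows share `overlap` characters of context."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

The overlap keeps a sentence that straddles a boundary partially visible in both neighbouring chunks, at the cost of storing some text twice.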


In [11]:
# SAP HANA Cloud vector engine
import os
from hdbcli import dbapi
from langchain_community.vectorstores.hanavector import HanaDB

# Read connection details from environment variables instead of hard-coding
# them in the notebook (the variable names are illustrative)
host_address = os.environ["HANA_HOST"]
hdb_user = os.environ["HANA_USER"]
hdb_password = os.environ["HANA_PASSWORD"]

connection = dbapi.connect(
    host_address,
    port=443,
    user=hdb_user,
    password=hdb_password,
    autocommit=True,
    sslValidateCertificate=False,  # certificate validation disabled; enable it for production
)

In [13]:
from gen_ai_hub.proxy.langchain.init_models import init_embedding_model
embeddings = init_embedding_model('text-embedding-ada-002')
db = HanaDB(
    embedding=embeddings, connection=connection, table_name="SUBREDDIT_POST_ORC4"
)

In [14]:
# Delete any existing documents from the table
db.delete(filter={})

# Add the loaded documents (one per Reddit post)
db.add_documents(text_chunks)

[]
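`HanaDB` stores an embedding for each document, and `similarity_search_with_score` (used in the orchestration cell below) returns the `k` closest documents together with a score. Conceptually this is a nearest-neighbour search over embedding vectors. The sketch below illustrates the idea with cosine similarity in NumPy on toy 3-dimensional vectors; it is an illustration only, since the real search runs inside SAP HANA Cloud and the similarity function depends on how the store is configured:

```python
import numpy as np

def top_k_cosine(query: np.ndarray, vectors: np.ndarray, k: int = 5):
    """Return (row index, cosine similarity) pairs for the k rows of
    `vectors` most similar to `query`, best first."""
    q = query / np.linalg.norm(query)
    m = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = m @ q                     # cosine similarity for every stored vector
    order = np.argsort(-scores)[:k]    # highest similarity first
    return [(int(i), float(scores[i])) for i in order]

toy_vectors = np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0],
                        [0.7, 0.7, 0.0]])
result = top_k_cosine(np.array([1.0, 0.1, 0.0]), toy_vectors, k=2)
print(result)
```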

### Data Masking

The Data Masking Module anonymizes or pseudonymizes personally identifiable information (PII) before it reaches the LLM module. With anonymization, every detected entity is replaced with a generic placeholder (e.g., MASKED_ENTITY) and the original values cannot be recovered. With pseudonymization, each entity is replaced with a unique placeholder (e.g., MASKED_ENTITY_ID), so the original values can be restored if needed. In both cases, the module detects the sensitive data and substitutes the placeholders before any further processing.
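The difference between the two methods can be illustrated locally. The sketch below is a toy, not the SAP Data Privacy Integration service: it assumes only e-mail addresses need masking, detects them with a simple regular expression, and uses illustrative placeholder names (`MASKED_EMAIL`, `MASKED_EMAIL_1`, ...):

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PSEUDONYM = re.compile(r"MASKED_EMAIL_\d+")

def anonymize(text: str) -> str:
    """Replace every e-mail address with the same placeholder; the
    original values are discarded and cannot be restored."""
    return EMAIL.sub("MASKED_EMAIL", text)

def pseudonymize(text: str):
    """Replace each distinct e-mail address with a numbered placeholder and
    return the mapping (placeholder -> original) needed to restore it."""
    mapping, seen = {}, {}
    def repl(match):
        value = match.group(0)
        if value not in seen:
            seen[value] = f"MASKED_EMAIL_{len(seen) + 1}"
            mapping[seen[value]] = value
        return seen[value]
    return EMAIL.sub(repl, text), mapping

def unmask(text: str, mapping: dict) -> str:
    """Restore pseudonymized values; matching whole placeholders avoids
    MASKED_EMAIL_1 clobbering MASKED_EMAIL_10."""
    return PSEUDONYM.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)

sample = "Mail ann@example.com or bob@example.org; ann@example.com replies fastest."
print(anonymize(sample))
masked, mapping = pseudonymize(sample)
print(masked)
print(unmask(masked, mapping) == sample)
```

Because each pseudonymized placeholder is unique per value, the mapping is enough to restore the original text; the anonymized version keeps no such mapping.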

### Content Filtering

The Content Filtering Module can filter both the input to the LLM module (input filter) and the output generated by the LLM (output filter). It uses predefined classification services to detect inappropriate or unwanted content in categories such as hate, sexual content, self-harm, and violence. A configurable threshold per category controls how strict the filtering is, so that content meets the desired standards before it is processed or returned as output.
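The threshold logic can be sketched as follows. Assumptions: severities use Azure Content Safety's 0/2/4/6 scale, a category's threshold is the highest severity still allowed, and the per-category severities (hard-coded here) would in practice come from the classification service. `is_allowed` is a hypothetical helper, not an SDK function:

```python
SEVERITY_LEVELS = (0, 2, 4, 6)  # assumed Azure Content Safety severity scale

def is_allowed(severities: dict, thresholds: dict) -> bool:
    """Return True if no category's severity exceeds its threshold
    (assumption: the threshold is the highest severity still allowed)."""
    for category, threshold in thresholds.items():
        if threshold not in SEVERITY_LEVELS:
            raise ValueError(f"invalid threshold {threshold} for {category}")
        if severities.get(category, 0) > threshold:
            return False
    return True

# Same threshold values as the filters configured in the next cell
thresholds = {"hate": 0, "sexual": 4, "self_harm": 0, "violence": 2}
print(is_allowed({"hate": 0, "violence": 2}, thresholds))  # True
print(is_allowed({"violence": 4}, thresholds))             # False
```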

In [15]:
from gen_ai_hub.orchestration.models.data_masking import DataMasking
from gen_ai_hub.orchestration.models.sap_data_privacy_integration import SAPDataPrivacyIntegration, MaskingMethod, ProfileEntity
from gen_ai_hub.orchestration.models.azure_content_filter import AzureContentFilter
from gen_ai_hub.orchestration.models.config import OrchestrationConfig
from gen_ai_hub.orchestration.models.message import SystemMessage, UserMessage
from gen_ai_hub.orchestration.models.template import Template, TemplateValue
from gen_ai_hub.orchestration.models.llm import LLM
from gen_ai_hub.orchestration.service import OrchestrationService

# Setup Data Masking
data_masking = DataMasking(
    providers=[
        SAPDataPrivacyIntegration(
            method=MaskingMethod.ANONYMIZATION,  # or MaskingMethod.PSEUDONYMIZATION
            entities=[
                ProfileEntity.EMAIL,
                ProfileEntity.PHONE,
                ProfileEntity.PERSON,
                ProfileEntity.ORG,
                ProfileEntity.LOCATION
            ]
        )
    ]
)

# Define content filters (per category, a lower threshold is stricter; 0 allows only safe content)
input_filter = AzureContentFilter(
    hate=0,
    sexual=4,
    self_harm=0,
    violence=2,
)
output_filter = AzureContentFilter(
    hate=0,
    sexual=4,
    self_harm=0,
    violence=2,
)

# Models to query
models = ["gpt-4o", "gemini-1.5-pro", "meta--llama3-70b-instruct"]
responses = {}

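# Retrieve the top 5 documents for the query and build the prompt context
query = "docker config"
docs = db.similarity_search_with_score(query, k=5)
# docs is a list of (Document, score) pairs; keep only the document text
context = "\n\n".join(doc.page_content for doc, _score in docs)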

for model_name in models:
    # Orchestration Config Setup for each model
    config = OrchestrationConfig(
        template=Template(
            messages=[
                SystemMessage(
                    "You are a helpful AI assistant. You are provided with the top 5 RAG results (Context) "
                    "from Reddit posts. Use the Context to answer the User_input by selecting the most relevant "
                    "response. Provide the answer **exactly as it appears** in the Context, with no alterations "
                    "or variations, and do not generate any new information. "
                    "NOTE: Use only the provided Context, do not alter it in any way, and do not add anything "
                    "that is not in the Context."
                ),
                UserMessage("User_input: {{?text}}, Context: {{?context}}"),
            ]
        ),
        llm=LLM(name=model_name),
        data_masking=data_masking,  # Ensure masking is part of the config
        input_filters=[input_filter],
        output_filters=[output_filter]
    )

    orchestration_service = OrchestrationService(api_url=YOUR_API_URL, config=config)

    # Execute Orchestration Service
    result = orchestration_service.run(
        config=config,
        template_values=[
            TemplateValue(name="text", value=query),
            TemplateValue(name="context", value=context)
        ]
    )
    
    # Extract the response text (masked entities appear as placeholders)
    masked_result = result.orchestration_result.choices[0].message.content

    # Store the result per model
    responses[model_name] = masked_result

# Print the final results after data masking and content filtering
for model, result in responses.items():
    print(f"Masked Result from {model}:", result)

# Accessing individual results separately for evaluation metrics
gpt_4o_result = responses["gpt-4o"]
gemini_1_5_pro_result = responses["gemini-1.5-pro"]
meta_llama3_70b_instruct_result = responses["meta--llama3-70b-instruct"]


Masked Result from gpt-4o: **To resolve the issue, follow these steps to create a Docker registry in MASKED_ORG via the AI Launchpad:**

1. Open the AI Launchpad.
2. Navigate to the MASKED_ORG Administration section.
3. Select the Docker Registry option.
4. Enter the desired name for your Docker registry.
5. Provide your Docker credentials in the following JSON format:

{
    ".dockerconfigjson": "{\"auths\":{\"https://hub.docker.com\":{\"username\":\"myusername\",\"password\":\"myaccesstoken\"}}}"
}

Make sure to replace myusername with your actual Docker username and myaccesstoken with your Docker access token.

Once you've provided these details, you should be able to successfully add the Docker registry to your MASKED_ORG Launchpad.
Masked Result from gemini-1.5-pro: {  \n\n".dockerconfigjson": "{\\\\\\\\"auths\\\\\\\\":{\\\\\\\\"https://hub.docker.com\\\\\\\\":{\\\\\\\\"username\\\\\\\\":\\\\\\\\"myusername\\\\\\\\",\\\\\\\\"password\\\\\\\\":\\\\\\\\"myaccesstoken\\\\\\\\"}}}"\\n
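The user message above uses `{{?text}}` and `{{?context}}` placeholders, which the Orchestration Service fills from the `TemplateValue` list passed to `run`. To make that mechanism concrete, here is a local approximation; it assumes placeholder names consist of word characters, the service's own templating rules may be richer, and `render` is a hypothetical function:

```python
import re

PLACEHOLDER = re.compile(r"\{\{\?(\w+)\}\}")

def render(template: str, values: dict) -> str:
    """Substitute {{?name}} placeholders, failing loudly on a missing value."""
    def repl(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder '{name}'")
        return str(values[name])
    return PLACEHOLDER.sub(repl, template)

print(render("User_input: {{?text}}, Context: {{?context}}",
             {"text": "docker config", "context": "retrieved posts"}))
```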

In [16]:
gpt_4o_result

'**To resolve the issue, follow these steps to create a Docker registry in MASKED_ORG via the AI Launchpad:**\n\n1. Open the AI Launchpad.\n2. Navigate to the MASKED_ORG Administration section.\n3. Select the Docker Registry option.\n4. Enter the desired name for your Docker registry.\n5. Provide your Docker credentials in the following JSON format:\n\n{\n    ".dockerconfigjson": "{\\"auths\\":{\\"https://hub.docker.com\\":{\\"username\\":\\"myusername\\",\\"password\\":\\"myaccesstoken\\"}}}"\n}\n\nMake sure to replace myusername with your actual Docker username and myaccesstoken with your Docker access token.\n\nOnce you\'ve provided these details, you should be able to successfully add the Docker registry to your MASKED_ORG Launchpad.'

In [17]:
gemini_1_5_pro_result

'{  \\n\\n".dockerconfigjson": "{\\\\\\\\\\\\\\\\"auths\\\\\\\\\\\\\\\\":{\\\\\\\\\\\\\\\\"https://hub.docker.com\\\\\\\\\\\\\\\\":{\\\\\\\\\\\\\\\\"username\\\\\\\\\\\\\\\\":\\\\\\\\\\\\\\\\"myusername\\\\\\\\\\\\\\\\",\\\\\\\\\\\\\\\\"password\\\\\\\\\\\\\\\\":\\\\\\\\\\\\\\\\"myaccesstoken\\\\\\\\\\\\\\\\"}}}"\\\\n\\n}\n'

In [18]:
meta_llama3_70b_instruct_result

'{  \n.dockerconfigjson": "{\\\\\\\\"auths\\\\\\\\":{\\\\\\\\\\\\"https://hub.docker.com\\\\\\\\":{\\\\\\\\\\\\"username\\\\\\\\":\\\\\\\\\\\\"myusername\\\\\\\\\\",\\\\\\\\\\\\"password\\\\\\\\":\\\\\\\\\\\\"myaccesstoken\\\\\\\\\\\\"}}}"'

## Evaluation Metrics
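The four scores computed below measure the lexical richness of each response from its word tokens. With $N$ the number of tokens and $V$ the number of distinct tokens (types), the functions implement:

$$
\mathrm{RTTR} = \frac{V}{\sqrt{N}}, \qquad
a^2_{\mathrm{Maas}} = \frac{\log N - \log V}{(\log N)^2}
$$

$$
\mathrm{MATTR}_W = \frac{1}{N-W+1} \sum_{i=1}^{N-W+1} \frac{V_i}{W} \qquad (N \ge W)
$$

where $V_i$ is the number of types in the window of $W$ consecutive tokens starting at token $i$ (the code uses $W = 50$). MTLD has no closed form: it is the mean length of consecutive token runs whose type-token ratio stays above a threshold (0.72 here), typically averaged over a forward and a backward pass. Higher RTTR, MATTR, and MTLD values indicate a richer vocabulary, while for Maas lower is better, which is why the thresholds in `maas_result` run in the opposite direction.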

In [19]:
import nltk
from nltk.tokenize import word_tokenize
from lexical_diversity import lex_div as ld
import numpy as np

# Also look for NLTK data (the tokenizer models used by word_tokenize) in ./nltk_data
nltk.data.path.append('nltk_data')

# Lexical richness evaluation functions  
def calculate_rttr(text):
    words = word_tokenize(text)
    num_tokens = len(words)
    types = len(set(words))
    if num_tokens == 0:
        return 0  # Avoid division by zero
    rttr = types / np.sqrt(num_tokens)
    return rttr

def calculate_maas(text):
    words = word_tokenize(text)
    num_tokens = len(words)
    types = len(set(words))
    if num_tokens <= 1:
        return float('inf')  # log(N) is 0 for N <= 1; treat as the worst score
    maas = (np.log(num_tokens) - np.log(types)) / (np.log(num_tokens)**2)
    return maas

def calculate_mattr(text, window_size=50):
    words = word_tokenize(text)
    return ld.mattr(words, window_size)

def calculate_mtld(text, ttr_threshold=0.72):
    words = word_tokenize(text)
    return ld.mtld(words, ttr_threshold)

def rttr_result(rttr):
    if rttr > 7.0:
        return 'good'
    elif rttr >= 5.0:  # 5.0 <= rttr <= 7.0
        return 'intermediate'
    else:  # rttr < 5.0
        return 'bad'

def maas_result(maas):
    if maas < 0.02:
        return 'good'
    elif maas <= 0.04:  # 0.02 <= maas <= 0.04
        return 'intermediate'
    else:  # maas > 0.04, including inf
        return 'bad'

def mattr_result(mattr):
    if mattr > 0.85:
        return 'good'
    elif 0.65 < mattr <= 0.85:
        return 'intermediate'
    else:  # mattr <= 0.65
        return 'bad'

def mtld_result(mtld):
    if mtld > 80:
        return 'good'
    elif 60 < mtld <= 80:
        return 'intermediate'
    else:  # mtld <= 60
        return 'bad'

def evaluate_lexical_richness(text):  
    rttr = calculate_rttr(text)  
    maas = calculate_maas(text)  
    mattr = calculate_mattr(text)  
    mtld = calculate_mtld(text)  

    # Categorization based on thresholds  
    rttr_category = rttr_result(rttr)
    maas_category = maas_result(maas)
    mattr_category = mattr_result(mattr)
    mtld_category = mtld_result(mtld)

    # Overall evaluation: majority vote across the four metrics. Ties are
    # broken deterministically in favour of the better category (a max over
    # set(categories) would depend on set iteration order).
    categories = [rttr_category, maas_category, mattr_category, mtld_category]
    overall_category = max(['good', 'intermediate', 'bad'], key=categories.count)
    
    return {  
        'RTTR': {'category': rttr_category, 'value': rttr},  
        'Maas': {'category': maas_category, 'value': maas},  
        'MATTR': {'category': mattr_category, 'value': mattr},  
        'MTLD': {'category': mtld_category, 'value': mtld},  
        'Overall': overall_category  
    }
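A worked example helps check the formulas by hand. The sketch below re-implements RTTR, Maas, and MATTR on a plain token list, so no NLTK tokenizer is needed; `toy_rttr`, `toy_maas`, and `toy_mattr` are simplified re-implementations for checking, not replacements for the functions above:

```python
import math

def toy_rttr(tokens):
    return len(set(tokens)) / math.sqrt(len(tokens))

def toy_maas(tokens):
    n, v = len(tokens), len(set(tokens))
    return (math.log(n) - math.log(v)) / math.log(n) ** 2

def toy_mattr(tokens, window=50):
    if len(tokens) <= window:  # a single window: plain type-token ratio
        return len(set(tokens)) / len(tokens)
    ttrs = [len(set(tokens[i:i + window])) / window
            for i in range(len(tokens) - window + 1)]
    return sum(ttrs) / len(ttrs)

tokens = ["the", "cat", "saw", "the", "dog"]  # N = 5 tokens, V = 4 types
print(round(toy_rttr(tokens), 4))             # 1.7889  (4 / sqrt(5))
print(round(toy_maas(tokens), 4))             # 0.0861  ((ln 5 - ln 4) / (ln 5)^2)
print(toy_mattr(tokens, window=4))            # 0.875   (windows: 3/4 and 4/4)
```

With $N = 5$, such a short text lands in the "bad" RTTR band (below 5.0) regardless of its quality, which is worth keeping in mind when comparing very short model responses.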


In [20]:
from prettytable import PrettyTable
# Evaluate lexical richness for each response  
model_metrics = {
    "gpt-4o": evaluate_lexical_richness(gpt_4o_result),
    "gemini-1.5-pro": evaluate_lexical_richness(gemini_1_5_pro_result),
    "meta--llama3-70b-instruct": evaluate_lexical_richness(meta_llama3_70b_instruct_result)
} 

# Function to collect and format metrics for all models into a table  
def format_metrics_to_table(model_metrics: dict):  
    table = PrettyTable()  
    table.field_names = ["Model", "RTTR (Category)", "RTTR (Value)", "Maas (Category)", "Maas (Value)",  
                         "MATTR (Category)", "MATTR (Value)", "MTLD (Category)", "MTLD (Value)", "Overall"]  
 
    for model_name, metrics in model_metrics.items():  
        table.add_row([  
            model_name,  
            metrics['RTTR']['category'],   
            round(metrics['RTTR']['value'], 4),  # Round for better display
            metrics['Maas']['category'],  
            round(metrics['Maas']['value'], 4),  
            metrics['MATTR']['category'],  
            round(metrics['MATTR']['value'], 4),  
            metrics['MTLD']['category'],  
            round(metrics['MTLD']['value'], 4), 
            metrics['Overall']  
        ])  

    print(table)
 
# Format and display metrics in a table  
format_metrics_to_table(model_metrics)


+---------------------------+-----------------+--------------+-----------------+--------------+------------------+---------------+-----------------+--------------+--------------+
|           Model           | RTTR (Category) | RTTR (Value) | Maas (Category) | Maas (Value) | MATTR (Category) | MATTR (Value) | MTLD (Category) | MTLD (Value) |   Overall    |
+---------------------------+-----------------+--------------+-----------------+--------------+------------------+---------------+-----------------+--------------+--------------+
|           gpt-4o          |   intermediate  |    6.1649    |   intermediate  |    0.0277    |       bad        |     0.5807    |       bad       |   15.9779    | intermediate |
|       gemini-1.5-pro      |       bad       |    2.4797    |       bad       |    0.0686    |       bad        |     0.3617    |       bad       |    6.8187    |     bad      |
| meta--llama3-70b-instruct |       bad       |     2.44     |       bad       |    0.0699    |       bad