<a target="_blank" href="https://colab.research.google.com/github/amanichopra/sap-genai-hub/blob/main/native_llm_clients.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Preparation

## Install Libraries

In [10]:
!pip install "generative-ai-hub-sdk[all]"
!pip install "numpy<2.0.0" --force-reinstall

Defaulting to user installation because normal site-packages is not writeable

Defaulting to user installation because normal site-packages is not writeable
Collecting numpy<2.0.0
  Using cached numpy-1.26.4-cp311-cp311-macosx_11_0_arm64.whl.metadata (114 kB)
Using cached numpy-1.26.4-cp311-cp311-macosx_11_0_arm64.whl (14.0 MB)
Installing collected packages: numpy
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchvision 0.17.0 requires torch==2.2.0, which is not installed.
Successfully installed numpy-1.26.4

Next, restart the runtime so the newly installed packages are picked up. In Google Colab, click `Runtime` and then `Restart Session`, as shown here:

<img src="assets/colab_restart_session.png" style="width:500px">

You can now continue with the cells below. The packages were installed into the runtime before the restart, so the install cell does not need to be run again.

## Authentication

Before issuing requests through the SDK, we need to provide it with authentication details. This can be done either via a configuration file or via environment variables. Read the [Generative AI Hub SDK docs](https://help.sap.com/doc/generative-ai-hub-sdk/CLOUD/en-US/index.html) for more details. Below is an example of authenticating via environment variables from within this notebook. Store your credentials in a file called `env_vars.env` for the command below to work. If you are using Google Colab, you can place this file in the project folder by clicking the folder icon on the left and dropping the file into the workspace, as shown:

<img src="./assets/upload_env.png" style="width:500px">

*Note that the above steps only hold if Colab is your runtime environment. If running these notebooks locally, you can configure the kernel yourself and install requirements via CLI.*

In [14]:
import os
from dotenv import load_dotenv

load_dotenv(dotenv_path='env_vars.env')

True
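
`load_dotenv` returns `True` as soon as the file is found and parsed, which does not guarantee that every credential is present. The sketch below is one way to check. The names in `REQUIRED_VARS` are an assumption based on the variables typically derived from an SAP AI Core service key, so verify them against the SDK docs linked above. `missing_env_vars` is a hypothetical helper, not part of the SDK.

```python
import os

# Assumed variable names; verify them against the Generative AI Hub SDK docs.
REQUIRED_VARS = [
    "AICORE_AUTH_URL",
    "AICORE_CLIENT_ID",
    "AICORE_CLIENT_SECRET",
    "AICORE_BASE_URL",
    "AICORE_RESOURCE_GROUP",
]


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing variables:", ", ".join(missing))
else:
    print("All expected variables are set.")
```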

# Using the Generative AI Hub's Native LLM Clients

In this exercise, we will use the Generative AI Hub SDK to interact with LLMs via their native clients. As a prerequisite, you must have deployments for the models you want to consume.

### Sending a Request to an Embedding Model

The `embed_query` method of `OpenAIEmbeddings` returns a single embedding of shape 1x1536, that is, a flat list of 1536 floats. You can view the function API [here](https://api.ai.prod.us-east-1.aws.ml.hana.ondemand.com/v2/inference/deployments/d00185d3c6a8fe73). You can read more about the underlying OpenAI embedding model [here](https://api.ai.prod.us-east-1.aws.ml.hana.ondemand.com/v2/inference/deployments/d00185d3c6a8fe73).

In [12]:
from gen_ai_hub.proxy.langchain.openai import OpenAIEmbeddings

embedding_model = OpenAIEmbeddings(proxy_model_name='text-embedding-3-small')
response = embedding_model.embed_query('Every decoding is another encoding.')
print(response)

[0.027271129190921783, -0.010424341075122356, -0.01652081124484539, -0.009508829563856125, 0.025453979149460793, 0.025717534124851227, 0.0913846418261528, -0.016936952248215675, 0.006890607066452503, -0.041586391627788544, 0.0588146448135376, -0.014398490078747272, -0.0453871488571167, -0.019863814115524292, 0.0434451550245285, -0.038118548691272736, 0.013947670347988605, 0.034095846116542816, 0.0036343010142445564, 0.050075676292181015, 0.006803911179304123, -0.04849433898925781, 0.06935688853263855, 0.03290290758013725, 0.03054477460682392, -0.02793695591390133, -0.004542876500636339, 0.07751326262950897, 0.04097605124115944, -0.03459521755576134, -0.010237077251076698, -0.028935695067048073, 0.016049183905124664, 0.018157633021473885, 0.045276179909706116, 0.012186005711555481, -0.007379573304206133, -0.01517528761178255, -0.012650696560740471, 0.024788152426481247, 0.035177815705537796, -0.04363935440778732, -0.049382105469703674, -0.009307694621384144, 0.011471630074083805, 0.0276
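
Embeddings are usually compared rather than read directly. As a minimal sketch, the hypothetical helper below computes cosine similarity in plain Python. The toy vectors stand in for real embeddings so the block runs without a deployment.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors stand in for real embeddings here.
print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```

With a live deployment, the same helper can be applied to the output of two `embedding_model.embed_query(...)` calls to estimate how semantically close two sentences are.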

### Sending a Request to an Anthropic Model

You can learn about the Claude 3 Sonnet model [here](https://api.ai.prod.us-east-1.aws.ml.hana.ondemand.com/v2/inference/deployments/d00185d3c6a8fe73), and read the docs for the `converse` function [here](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference-call.html).

In [1]:
from gen_ai_hub.proxy.native.amazon.clients import Session

model = Session().client(model_name='anthropic--claude-3-sonnet')
messages = [{"role": "user", "content": [{'text': "I'm applying to be a data scientist at SAP. What should I include in my resume?"}]}]
response = model.converse(messages=messages,
    inferenceConfig={"maxTokens": 512, "temperature": 1, "topP": 0.9},
)
print(response['output']['message']['content'][0]['text'])

When applying for a data scientist role at SAP, you should tailor your resume to highlight your relevant skills, experience, and achievements. Here are some key things to include:

1. Education: List your academic qualifications, including your degree(s) in a quantitative field such as statistics, computer science, mathematics, or a related discipline.

2. Technical Skills: Emphasize your proficiency in programming languages commonly used in data science, such as Python, R, SQL, and any relevant libraries or frameworks (e.g., NumPy, Pandas, Scikit-learn, TensorFlow, Keras). Also, mention your expertise in data mining, machine learning, statistical modeling, and data visualization tools.

3. Data Science Projects: Provide details about your relevant data science projects, internships, or research work. Highlight the problem statements, methodologies used, tools/technologies applied, and the outcomes or insights derived from your work.

4. Experience with Data: Showcase your experience i
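
The nested request shape used with `converse` is easy to get wrong by hand. The sketch below wraps it in a hypothetical helper, `build_converse_request`, which is not part of the SDK. The optional `system` field follows the Bedrock Converse API, where system prompts are passed as a list of text blocks. Whether the hub's proxy client forwards that field unchanged is an assumption to verify against your deployment.

```python
def build_converse_request(prompt, system=None, max_tokens=512,
                           temperature=1.0, top_p=0.9):
    """Build keyword arguments for a Converse-style call."""
    request = {
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {
            "maxTokens": max_tokens,
            "temperature": temperature,
            "topP": top_p,
        },
    }
    if system:
        # Bedrock's Converse API takes system prompts as a list of text blocks.
        request["system"] = [{"text": system}]
    return request


request = build_converse_request(
    "What should I include in my resume?",
    system="Answer in three bullet points.",
)
print(sorted(request))
```

The resulting dictionary can then be unpacked into the call from the cell above, as in `model.converse(**request)`.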

### Sending a Request to a GPT Model

The OpenAI `chat/completions` API offers many options; see the docs [here](https://platform.openai.com/docs/api-reference/chat/create). Below, we will simply compare streaming vs. non-streaming responses.

Non-Streaming:

In [5]:
from gen_ai_hub.proxy.native.openai import chat

messages = [
    {
        "role": "user",
        "content": "Write me a long Poem about SAP?"
    }
]
# stream=False returns the complete response in a single object.
kwargs = {"max_tokens": 500, "stream": False}
ns_response = chat.completions.create(model_name='gpt-4o-mini', messages=messages, **kwargs)
print(ns_response.choices)

[Choice(finish_reason='length', index=0, logprobs=None, message=ChatCompletionMessage(content="### Ode to SAP: The Heartbeat of Enterprise\n\nIn the realm where business dreams align,  \nA sturdy framework, where ideas entwine,  \nStands SAP, a beacon, a guiding star,  \nA tapestry woven, both near and far.  \n\nFrom humble beginnings in the sixties' dawn,  \nWhere visions of progress and tech were drawn,  \nIt flourished like blossoms, in spring's gentle air,  \nA software solution beyond compare.  \n\nWith modules designed to streamline the flow,  \nFinance and logistics, where insights bestow,  \nIn the vast world of data, it carves out a path,  \nForging connections, igniting the math.  \n\nOh, ERP wizard, you're more than a tool,  \nYou empower the shepherds, the dreamers, the school,  \nFrom sales to procurement, each function aligned,  \nAn orchestra playing, with harmony twined.  \n\nIn warehouses bustling, the inventory flows,  \nWith real-time insights, production just glows,

Streaming:

In [25]:
from gen_ai_hub.proxy.native.openai import chat

messages = [
    {
        "role": "user",
        "content": "Write me a long Poem about SAP?"
    }
]
# stream=True returns an iterator that yields the response chunk by chunk.
kwargs = {"max_tokens": 500, "stream": True}
s_response = chat.completions.create(model_name='gpt-4o-mini', messages=messages, **kwargs)
for chunk in s_response:
    print(chunk)
    print(chunk.choices)
    print("****************")

ChatCompletionChunk(id='', choices=[], created=0, model='', object='', service_tier=None, system_fingerprint=None, usage=None, prompt_filter_results=[{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}])
[]
****************
ChatCompletionChunk(id='chatcmpl-ALAQheTYi6ylyVPqsbQlMspkZ7fKL', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None, content_filter_results={})], created=1729608343, model='gpt-4o-mini', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_d54531d9eb', usage=None)
[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None, content_filter_results=

In [6]:
print(ns_response.choices[0].message.content)

**In the Realm of SAP**

In the heart of enterprises, bustling and grand,  
Lies a system so mighty, with a guiding hand.  
A symphony of data, a dance in the cloud,  
With SAP leading, so steadfast and proud.  

From the hum of the factory, where machines sing,  
To the boardroom’s whispers of strategy’s swing,  
SAP’s essence weaves through each passing day,  
Transforming the numbers in intricate ways.  

A tapestry woven, where processes flow,  
From procurement’s cradle, where budgets can grow,  
To sales that ignite with a vibrant appeal,  
SAP nurtures the rhythm, a symphonic reel.  

With modules aplenty, each crafted with care,  
From Finance to HR, their power laid bare.  
A canvas of functions, a toolkit divine,  
Where the art of decision finds rhythm and rhyme.  

In S/4HANA’s embrace, the future’s mindset,  
The analytics pulse, like a star brightly set.  
With insights unveiled, and data in sight,  
Businesses flourish, evolving their flight.  

The storm of the market m

### Sending a Request to Other Models

Native clients exist for other models as well, such as open-source Llama models, Gemini models, variants of GPT, and others. We saw above that each model family has a different API. For example, OpenAI models use `chat.completions` and Bedrock uses `converse`. We can write a function, as shown below, that picks the correct API based on the model we are using, but this adds overhead: we, as the developers, need to maintain this function and keep it up to date with any new parameters that OpenAI, VertexAI, Bedrock, etc. introduce in their APIs.

For example, the function below has no way to customize top_p. Bedrock's `converse` refers to it as `topP`, while OpenAI refers to it as `top_p`. We would need to write a function like the one below to build a "harmonized" API. Instead of implementing this ourselves, we can rely on the orchestration service in the Generative AI Hub, which offers this "harmonized" API out of the box. Thus, we don't have to rewrite this function when the native APIs change or new inference parameters are introduced.

In [7]:
import json  # needed for the Titan request and response bodies below

from gen_ai_hub.proxy.native.openai import chat
from gen_ai_hub.proxy.native.google_vertexai.clients import GenerativeModel
from gen_ai_hub.proxy.native.amazon.clients import Session
from gen_ai_hub.proxy.core.proxy_clients import get_proxy_client

LLMS = {'amazon': ['amazon--titan-text-express'], 
        'anthropic': ['anthropic--claude-3-sonnet'],
        'openai': ['gpt-4o-mini'],
        'google': ['gemini-1.5-flash']}

def get_model_response(model, system_instruction, prompt):
    if 'openai' in LLMS and model in LLMS['openai']:  
        messages = [{"role": "system", "content": system_instruction}, {"role": "user", "content": prompt}]
        return chat.completions.create(**dict(model_name=model, messages=messages)).choices[0].message.content
    elif 'meta' in LLMS and model in LLMS['meta']:
        messages = [{"role": "system", "content": system_instruction}, {"role": "user", "content": prompt}]
        return chat.completions.create(**dict(model_name=model, messages=messages)).choices[0].message.content
    elif 'google' in LLMS and model in LLMS['google']:
        proxy_client = get_proxy_client('gen-ai-hub')
        kwargs = dict({'model_name': model, 'system_instruction':[system_instruction], 'generation_config': {"response_mime_type": "application/json"}})
        model = GenerativeModel(proxy_client=proxy_client, **kwargs)
        model_response = model.generate_content([prompt])
        return model_response.candidates[0].content.parts[0].text
    elif 'amazon' in LLMS and model in LLMS['amazon']:
        model = Session().client(model_name=model)
        body = json.dumps(
            {
                    "inputText":f"{system_instruction}\n{prompt}",
                    "textGenerationConfig": {
                        "maxTokenCount": 3072,
                        "stopSequences": [],
                        "temperature": 0.7,
                        "topP": 0.9,
                    },
            }
        )
        response = model.invoke_model(body=body)
        response_body = json.loads(response.get("body").read())
        return response_body['results'][0]['outputText']
    elif 'anthropic' in LLMS and model in LLMS['anthropic']:
        model = Session().client(model_name=model)
        messages = [{"role": "user", "content": [{'text': f'{system_instruction}\n{prompt}'}]}]
        response = model.converse(messages=messages,
            inferenceConfig={"maxTokens": 512, "temperature": 0.5, "topP": 0.9},
        )
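        return response['output']['message']['content'][0]['text']
    # Fail loudly instead of silently returning None for an unlisted model.
    raise ValueError(f"Unsupported model: {model}")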

In [8]:
print(get_model_response('gemini-1.5-flash', 'You are comedian, and you must talk with humor.', 'Tell me a really mean joke.'))

{"joke": "Why don't they play poker in the jungle?  Too many cheetahs."}


# Summary

In this exercise, you learned how to interact with LLMs and embedding models directly through their native clients. You also experimented with LLM parameters such as temperature. Let's explore the orchestration module in the following exercises. Continue to [Exercise 2 - Orchestration Templating](./orchestration_templating.ipynb).