### Introduction
In this notebook, we will test Mistral-7B with [llama.cpp](https://github.com/ggerganov/llama.cpp) in SAP AI Core. You can also run Llama 3, Phi-3, Mistral, Mixtral, LLaVA, and other [supported models in llama.cpp](https://github.com/ggerganov/llama.cpp).

### Prerequisites
Before running this notebook, please ensure you have completed the [Prerequisites](../../README.md) and [01-deployment.ipynb](01-deployment.ipynb). As a result, a deployment of the llama.cpp scenario should be running in SAP AI Core.

If the configuration and deployment were created through SAP AI Launchpad, please manually update `configuration_id` and `deployment_id` in [env.json](env.json):
```json
{
    "configuration_id": "<YOUR_CONFIGURATION_ID_OF_LLAMA.CPP_SCENARIO>",
    "deployment_id": "<YOUR_DEPLOYMENT_ID_BASED_ON_CONFIG_ABOVE>"
}
```
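
If you prefer to script this update rather than edit the file by hand, a minimal sketch follows. The helper name `update_env` is hypothetical and not part of the repository; it only assumes that `env.json` is a flat JSON object with the two keys shown above.

```python
import json
from pathlib import Path


def update_env(path, configuration_id, deployment_id):
    """Write both IDs into env.json, keeping any other keys already there."""
    env_file = Path(path)
    # Start from the existing content so unrelated keys are preserved.
    env = json.loads(env_file.read_text()) if env_file.exists() else {}
    env["configuration_id"] = configuration_id
    env["deployment_id"] = deployment_id
    env_file.write_text(json.dumps(env, indent=4))
    return env


# Example call (placeholders; copy the real IDs from SAP AI Launchpad):
# update_env("env.json", "<configuration id>", "<deployment id>")
```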
 
### The high-level flow:
- Load the configuration info
- Connect to SAP AI Core via the SDK
- Check the status and logs of the deployment
- Run inference on the model with the OpenAI-compatible chat completion API


#### 1.Load config info 
- resource_group loaded from [config.json](../config.json)
- deployment_id (created in 01-deployment.ipynb) loaded from [env.json](env.json)

In [1]:
import requests, json
from ai_api_client_sdk.ai_api_v2_client import AIAPIV2Client

In [2]:
# Load the AI Core settings from ../config.json and the deployment ID from ./env.json.
# Update both files first if they do not match your environment.
with open("../config.json") as f:
    config = json.load(f)

with open("./env.json") as f:
    env = json.load(f)

deployment_id = env["deployment_id"]
resource_group = config.get("resource_group", "default")
print("deployment id: ", deployment_id, " resource group: ", resource_group)

deployment id:  d2b3f7ffbde78b38  resource group:  oss-llm


#### 2.Initiate connection to SAP AI Core 

In [3]:
aic_sk = config["ai_core_service_key"]
base_url = aic_sk["serviceurls"]["AI_API_URL"] + "/v2/lm"
ai_api_client = AIAPIV2Client(
    base_url=base_url,
    auth_url=aic_sk["url"] + "/oauth/token",
    client_id=aic_sk['clientid'],
    client_secret=aic_sk['clientsecret'],
    resource_group=resource_group)
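
The cell above reads four fields from `ai_core_service_key`, and a missing one only shows up as a `KeyError`. As a rough sketch, a check like the following could be run on `config` first. The function name `missing_service_key_fields` is an illustrative assumption; the field names are the ones used in the cell above.

```python
def missing_service_key_fields(config):
    """Return the dotted paths of required fields that are absent or empty."""
    required = [
        ("ai_core_service_key", "serviceurls", "AI_API_URL"),
        ("ai_core_service_key", "url"),
        ("ai_core_service_key", "clientid"),
        ("ai_core_service_key", "clientsecret"),
    ]
    missing = []
    for path in required:
        node = config
        for key in path:
            # Walk down the nested dicts; stop as soon as a level is absent.
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if not node:
            missing.append(".".join(path))
    return missing


# Usage in the notebook (config is the dict loaded from ../config.json):
# problems = missing_service_key_fields(config)
# if problems:
#     raise ValueError(f"config.json is missing: {problems}")
```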

In [4]:
token = ai_api_client.rest_client.get_token()
headers = {
        "Authorization": token,
        'ai-resource-group': resource_group,
        "Content-Type": "application/json"}


#### 3.Check the deployment status 

In [5]:
# Check deployment status before inference request
deployment_url = f"{base_url}/deployments/{deployment_id}"
response = requests.get(url=deployment_url, headers=headers)
resp = response.json()    
status = resp['status']

deployment_log_url = f"{base_url}/deployments/{deployment_id}/logs"
if status == "RUNNING":
        print(f"Deployment-{deployment_id} is running. Ready for inference request")
else:
        print(f"Deployment-{deployment_id} status: {status}. Not yet ready for inference request")
        # Retrieve the deployment logs from {base_url}/deployments/{deployment_id}/logs

        response = requests.get(deployment_log_url, headers=headers)
        print('Deployment Logs:\n', response.text)


Deployment-d2b3f7ffbde78b38 is running. Ready for inference request
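
The check above runs once. If the deployment is still starting, it can be convenient to poll until it reports `RUNNING`. The following is a minimal sketch under stated assumptions: `wait_until_running` is a hypothetical helper, and the timeout and interval values are arbitrary defaults rather than recommendations.

```python
import time


def wait_until_running(fetch_status, timeout_s=600, interval_s=10, sleep=time.sleep):
    """Poll fetch_status() until it returns "RUNNING" or the timeout expires.

    Returns the last status seen, so the caller can tell why polling stopped.
    """
    waited = 0
    status = fetch_status()
    while status != "RUNNING" and waited < timeout_s:
        sleep(interval_s)
        waited += interval_s
        status = fetch_status()
    return status
```

In this notebook, it could be called as `wait_until_running(lambda: requests.get(deployment_url, headers=headers).json()["status"])`. Passing the status lookup as a function keeps the polling logic independent of the HTTP call.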


#### 4.Inference with the completion and chat completion APIs
- model: Must be the exact model alias defined in [../config.json](../config.json) > configurations > llama.cpp parameters > {"key": "alias", "value": "xxx"}. Defaults to "mistral".

In [23]:
model = "mistral"  # Important: must be the exact model alias defined in ../config.json > configurations > llama.cpp.
#model = "mixtral" # Important: please set the resource plan to infer.l in byom-oss-llm-templates/ollama-template.yaml
deployment = ai_api_client.deployment.get(deployment_id)
inference_base_url = f"{deployment.deployment_url}"
openai_chat_api_endpoint = f"{inference_base_url}/v1/chat/completions"
openai_completion_api_endpoint = f"{inference_base_url}/v1/completions"

In [11]:
# List models
endpoint = f"{inference_base_url}/v1/models"
print(endpoint)

response = requests.get(url=endpoint, headers=headers)
print('Result:', response.text)

https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/d2b3f7ffbde78b38/v1/v1/models
Result: {"data":[{"created":1711535689,"id":"mistral","meta":{"n_ctx_train":32768,"n_embd":4096,"n_params":7241732096,"n_vocab":32000,"size":5130674176,"vocab_type":1},"object":"model","owned_by":"llamacpp"}],"object":"list"}
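
Since `model` must match the served alias exactly, the `/v1/models` response can be used to verify it before sending any inference request. A small sketch, where `model_ids` is a hypothetical helper and the sample body is shortened from the output above:

```python
import json


def model_ids(models_response_text):
    """Return the list of model ids from a /v1/models response body."""
    payload = json.loads(models_response_text)
    return [entry["id"] for entry in payload.get("data", [])]


# Response body taken from the output above, reduced to the fields used here.
sample = '{"data":[{"id":"mistral","object":"model","owned_by":"llamacpp"}],"object":"list"}'
print(model_ids(sample))

# In the notebook: assert model in model_ids(response.text)
```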


##### 4.1 Sample#1: Test OpenAI compatible API for Chat Completion
Now let's test its [OpenAI-compatible API for Chat Completion](https://github.com/ggerganov/llama.cpp/tree/master/examples/server), which exposes the same interface as the Chat Completion API of GPT-3.5/4 in SAP Generative AI Hub.

In [12]:
#let's try its openai-compatible chat completion api
sys_msg = "You are an helpful AI assistant"
user_msg = "why the sky is blue?"

json_data = { 
  "model": model, 
  "messages": [
            {
                "role": "system",
                "content": sys_msg
            },
            {
                "role": "user",
                #"content": "why is the sky blue?"
                "content": user_msg
            }
        ]
}

response = requests.post(openai_chat_api_endpoint, headers=headers, json=json_data)
print('Result:', response.text)

Result: {"choices":[{"finish_reason":"stop","index":0,"message":{"content":" The sky appears blue due to a natural phenomenon called Rayleigh scattering. As sunlight reaches Earth's atmosphere, it interacts with gases and particles in the air, causing the scattering of light in various directions. Blue light is scattered more easily than other colors because it travels in smaller, shorter waves. This scattered blue light is what we see when we look up at the sky.","role":"assistant"}}],"created":1711535711,"id":"chatcmpl-gMZW9pG7IQUhGkx147KZ3wKdycQT6n1J","model":"mistral","object":"chat.completion","usage":{"completion_tokens":79,"prompt_tokens":21,"total_tokens":100}}
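
The raw response is verbose. To read only the assistant's answer and the token usage, the response dict can be unpacked as in the sketch below. `first_message` is a hypothetical helper; the key names follow the response shown above.

```python
def first_message(chat_response):
    """Return (assistant text, total tokens) from a chat completion dict."""
    choice = chat_response["choices"][0]
    # The model output above starts with a space, so strip surrounding whitespace.
    text = choice["message"]["content"].strip()
    total = chat_response.get("usage", {}).get("total_tokens")
    return text, total


# In the notebook:
# text, total = first_message(response.json())
# print(text, f"({total} tokens)")
```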


##### 4.2 Sample#2: Write a haiku about running llama.cpp in AI Core

In [13]:
#let's test its openai-compatible chat completion api by writing a haiku
sys_msg = "You are a helpful assistant"
user_msg = "Write a haiku for running llama.cpp in AI Core"
json_data = {
  "model": model,
  "messages": [
            {
                "role": "system",
                "content": sys_msg
            },
            {
                "role": "user",
                "content": user_msg
            }
        ]
}

response = requests.post(openai_chat_api_endpoint, headers=headers, json=json_data)
print('Result:', response.json())

Result: {'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'content': ' Code awakens, llama runs, Silence in AI\'s core.\n\nExplanation: In this haiku, I have tried to convey the idea of running a C++ file named "llama.cpp" in an Artificial Intelligence (AI) core or system. The first line represents the code coming to life and being executed. The second line signifies the llama file (which could be any program or script) running smoothly within the AI\'s environment. Lastly, the third line emphasizes the quiet and focused nature of the AI core during the execution process.', 'role': 'assistant'}}], 'created': 1711535722, 'id': 'chatcmpl-yDUcXiDDXNb3UcL7v51QZN3NYX7Pach3', 'model': 'mistral', 'object': 'chat.completion', 'usage': {'completion_tokens': 123, 'prompt_tokens': 27, 'total_tokens': 150}}
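
Both samples so far build the same request body by hand. One possible way to avoid the repetition is a small builder, sketched below. `build_chat_payload` is an illustrative name, not something the notebook defines; extra keyword arguments are passed through as additional request fields.

```python
def build_chat_payload(model, sys_msg, user_msg, **extra):
    """Build a chat completion request body with one system and one user message."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": sys_msg},
            {"role": "user", "content": user_msg},
        ],
    }
    # Optional request fields are merged in unchanged.
    payload.update(extra)
    return payload


# In the notebook:
# json_data = build_chat_payload(model, sys_msg, user_msg)
# response = requests.post(openai_chat_api_endpoint, headers=headers, json=json_data)
```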


##### 4.3 Sample#3: Customer Message Processing 
In our sample [btp-industry-use-cases/04-customer-interaction-gpt4](https://github.com/SAP-samples/btp-industry-use-cases/tree/main/04-customer-interaction-gpt4), GPT-3.5/4 is used to process customer messages in customer interactions and to output the result in a JSON schema with plain prompting.
- Summarize the customer message into a title and a short description
- Analyze the sentiment of the customer message
- Extract the entities from the customer message, such as customer, product, order number, etc.

Let's see if the same scenario could be achieved with mistral-7b.


In [40]:
# Let's test its OpenAI-compatible chat completion API by processing a customer message:
# summarization, sentiment analysis, and entity extraction, with the output as JSON
sys_msg = r'''
You are an AI assistant to process the input text. Here are your tasks on the text.
1.Apply Sentiment Analysis
2.Generate a title less than 100 characters,and summarize the text into a short description less than 200 characters
3.Extract the entities such as customer,product,order,delivery,invoice etc from the text Here is a preliminary list of the target entity fields and description. Please extract all the identifiable entities even not in the list below. Don't include any field with unknown value.
-customer_no: alias customer number, customer id, account id, account number which could be used to identify a customer.
-customer_name: customer name, account name
-customer_phone: customer contact number. -product_no: product number, product id
-product_name
-order_no: sales order number, order id
-order_date 
-delivery_no: delivery number, delivery id
-delivery_date: delivery date, shipping date
-invoice_no: alias invoice number, invoice id, receipt number, receipt id etc. which can be used to locate a invoice.
-invoice_date: invoice date, purchase date
-store_name
-store_location
etc.
    
For those fields not in list must follow the Snakecase name conversation like product_name, no space allow. 

Output expected in JSON format as below: 
{\"sentiment\":\"{{Positive/Neutral/Negative}}\",\"title\":\"{{The generated title based on the input text less than 100 characters}}\",\"summary\":\"{{The generated summary based on the input text less than 300 characters}}\",\"entities\":[{\"field\":\"{{the extracted fields such as product_name listed above}}\",\"value\":\"{{the extracted value of the field}}\"}]}
'''

user_msg = r'''
Input text: 
Everything was working fine one day I went to make a shot of coffee it stopped brewing after 3 seconds Then I tried the milk frother it stopped after 3 seconds again I took it back they fixed it under warranty but it’s happening again I don’t see this machine lasting more then 2 years to be honest I’m spewing I actually really like the machine It’s almost like it’s losing pressure somewhere, they wouldn’t tell my what the problem was when they fixed it.. Purchased at Harvey Norman for $1,349.
Product is used: Several times a week
 
JSON:
'''

json_data = { 
  "model": model,
  "response_format": {"type": "json_object"}, #JSON mode
  "messages": [
            {
                "role": "system",
                "content": sys_msg
            },
            {
                "role": "user",
                "content": user_msg
            }
        ]
}

response = requests.post(url=openai_chat_api_endpoint, headers=headers, json=json_data)
print('Result:', response.text)

Result: {"choices":[{"finish_reason":"stop","index":0,"message":{"content":" {\n\"sentiment\": \"Negative\",\n\"title\": \"Customer Expresses Dissatisfaction with Coffee Machine\",\n\"summary\": \"The customer encountered issues with their coffee machine, which stopped brewing and frothing after a few seconds. They took it back for repair under warranty but the problem persisted. The customer is not confident in the machine's longevity and expressed frustration over the lack of information provided during the repair process. Purchased at Harvey Norman for $1,349.\",\n\"entities\": [\n{\n\"field\": \"product_name\",\n\"value\": \"coffee machine\"\n},\n{\n\"field\": \"purchase_price\",\n\"value\": \"$1,349\"\n},\n{\n\"field\": \"store_name\",\n\"value\": \"Harvey Norman\"\n}\n]\n}","role":"assistant"}}],"created":1711545003,"id":"chatcmpl-ObXAZInyRluYvzbjFUzW2aeCJTIOQmJZ","model":"mistral","object":"chat.completion","usage":{"completion_tokens":188,"prompt_tokens":577,"total_tokens":765}
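
Note that the `content` field is itself a JSON string, so it needs a second parse before the fields can be used. The sketch below does that and flattens the `entities` list into a dict. `parse_customer_message` is a hypothetical helper that assumes the output schema requested in the system prompt.

```python
import json


def parse_customer_message(content):
    """Parse the model's JSON text and flatten entities into a field -> value dict."""
    result = json.loads(content)  # leading whitespace and newlines are tolerated
    result["entities"] = {
        entity["field"]: entity["value"] for entity in result.get("entities", [])
    }
    return result


# In the notebook:
# content = response.json()["choices"][0]["message"]["content"]
# parsed = parse_customer_message(content)
# print(parsed["sentiment"], parsed["entities"].get("store_name"))
```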

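Next, let's constrain the structured JSON output with a [GBNF grammar](https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md), the BNF-style grammar format that llama.cpp uses to restrict which tokens the model may generate.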

In [41]:
# Same task as above, but the output shape is now enforced by a grammar
# passed in the request instead of JSON mode

grammar = r'''
entities ::= "[" space ( entities-item ( "," space entities-item )* )? "]" space
entities-item ::= "{" space entities-item-field-kv "," space entities-item-value-kv "}" space
entities-item-field-kv ::= "\"field\"" space ":" space string
entities-item-value-kv ::= "\"value\"" space ":" space string
entities-kv ::= "\"entities\"" space ":" space entities
root ::= "{" space sentiment-kv "," space title-kv "," space summary-kv "," space entities-kv "}" space
sentiment-kv ::= "\"sentiment\"" space ":" space sentiment-value
sentiment-value ::= ("\"Positive\"" | "\"Neutral\"" | "\"Negative\"")
space ::= " "?
string ::=  "\"" (
        [^"\\] |
        "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
      )* "\"" space
summary-kv ::= "\"summary\"" space ":" space string
title-kv ::= "\"title\"" space ":" space string
'''

json_data = { 
  "model": model,
  "messages": [
            {
                "role": "system",
                "content": sys_msg
            },
            {
                "role": "user",
                "content": user_msg
            }
        ],
    "grammar": grammar
}

response = requests.post(url=openai_chat_api_endpoint, headers=headers, json=json_data)
print('Result:', response.text)

Result: {"choices":[{"finish_reason":"stop","index":0,"message":{"content":"{\"sentiment\": \"Negative\", \"title\": \"Customer Expresses Dissatisfaction with Coffee Machine\", \"summary\": \"The customer encountered issues with their coffee machine, which stopped brewing and frothing after a few seconds. They took it back for repair under warranty but the problem persisted. The customer is uncertain about the machine's longevity and expressed frustration over the lack of information provided during the repair process. Purchased at Harvey Norman for $1,349.\", \"entities\": [{\"field\": \"product_name\", \"value\": \"coffee machine\"}, {\"field\": \"purchase_date\", \"value\": \"Purchased at Harvey Norman\"}, {\"field\": \"price\", \"value\": \"$1,349\"}]}","role":"assistant"}}],"created":1711545268,"id":"chatcmpl-ZOFB3NAjmrAEJsq4kK9P2EatldNe6Nam","model":"mistral","object":"chat.completion","usage":{"completion_tokens":162,"prompt_tokens":577,"total_tokens":739}}
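
The grammar fixes the shape of the output, but not its quality: in the result above, `purchase_date` carries the value "Purchased at Harvey Norman". Some rules from the system prompt, such as the title length and the field naming, can still be checked after the fact. A minimal sketch, where `validate_result` and its checks are illustrative assumptions rather than part of the sample:

```python
import json

ALLOWED_SENTIMENTS = {"Positive", "Neutral", "Negative"}


def validate_result(content, max_title=100):
    """Return a list of problems found in the model's JSON output (empty if none)."""
    try:
        result = json.loads(content)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(result, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    for key in ("sentiment", "title", "summary", "entities"):
        if key not in result:
            problems.append(f"missing key: {key}")
    if result.get("sentiment") not in ALLOWED_SENTIMENTS:
        problems.append("sentiment outside Positive/Neutral/Negative")
    # The system prompt asks for a title of less than 100 characters.
    if len(result.get("title", "")) >= max_title:
        problems.append("title too long")
    for entity in result.get("entities", []):
        # The system prompt asks for field names without spaces.
        if " " in entity.get("field", ""):
            problems.append(f"field contains a space: {entity.get('field')}")
    return problems
```

An empty list means the output passed these checks; anything else can be logged or used to trigger a retry.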


Alternatively, we can use the **completion API** with the same grammar.

In [29]:
# Let's run the same task through the completion API: the system and user
# messages are joined into a single prompt, and the grammar constrains the output
json_data = {
    "model": model,
    "prompt": f"{sys_msg}\n{user_msg}",  # "\n" is a newline; the original "/n" was a typo
    "grammar": grammar
}

response = requests.post(
    url=openai_completion_api_endpoint, headers=headers, json=json_data
)
print("Result:", response.text)

Result: {"content":"{\"sentiment\":\"Negative\",\"title\":\"Customer expresses dissatisfaction with coffee machine performance under warranty\",\"summary\":\"A customer shares their experience of a malfunctioning coffee machine, which required repairs under warranty but continues to have issues. They express doubts about the product’s longevity and are unhappy with the lack of information provided by the company. The machine was purchased at Harvey Norman for $1,349 and is used several times a week.\",\"entities\":[{\"field\":\"product_name\",\"value\":\"coffee machine\"},{\"field\":\"customer_phone\",\"value\":\"not mentioned in text\"},{\"field\":\"store_name\",\"value\":\"Harvey Norman\"}]}","generation_settings":{"dynatemp_exponent":1.0,"dynatemp_range":0.0,"frequency_penalty":0.0,"grammar":"\nentities ::= \"[\" space ( entities-item ( \",\" space entities-item )* )? \"]\" space\nentities-item ::= \"{\" space entities-item-field-kv \",\" space entities-item-value-kv \"}\" space\nen