### Introduction
In this notebook, we will pull mistral-7b and serve it with [LocalAI](https://localai.io/) in SAP AI Core. You can also run Llama 3, Phi 3, Mistral, Mixtral, Gemma, Llava and other [supported models in LocalAI](https://localai.io/models/).

### Prerequisites
Before running this notebook, please make sure you have completed the [Prerequisites](../../README.md) and [01-deployment.ipynb](01-deployment.ipynb). As a result, a deployment of the local-ai scenario should be running in SAP AI Core. <br/><br/>

If the configuration and deployment were created through SAP AI Launchpad, please manually update the configuration_id and deployment_id in [env.json](env.json):
```json
{
    "configuration_id": "<YOUR_CONFIGURATION_ID_OF_LOCALAI_SCENARIO>",
    "deployment_id": "<YOUR_DEPLOYMENT_ID_BASED_ON_CONFIG_ABOVE>"
}
```
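If you prefer not to edit env.json by hand, a minimal sketch like the one below writes both ids into it. The helper name `update_env` is hypothetical, and the sketch assumes env.json sits in the notebook's working directory; any other keys already in the file are kept.

```python
import json
from pathlib import Path


def update_env(configuration_id: str, deployment_id: str, path: str = "env.json") -> dict:
    """Hypothetical helper: merge the two ids into env.json, keeping any other keys."""
    env_path = Path(path)
    # Start from the existing file if there is one, so unrelated keys survive.
    env = json.loads(env_path.read_text()) if env_path.exists() else {}
    env["configuration_id"] = configuration_id
    env["deployment_id"] = deployment_id
    # Write it back in the same indented layout as the example above.
    env_path.write_text(json.dumps(env, indent=4))
    return env
```

For example, `update_env("<YOUR_CONFIGURATION_ID_OF_LOCALAI_SCENARIO>", "<YOUR_DEPLOYMENT_ID_BASED_ON_CONFIG_ABOVE>")` with the ids copied from SAP AI Launchpad.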
 
### The high-level flow:
- Load configurations info
- Connect to SAP AI Core via SDK
- Check the status and logs of the deployment
- Install model from LocalAI's model gallery through API
- Run inference on the model with the OpenAI-compatible chat completion API

#### 1. Load config info
- resource_group loaded from [config.json](../config.json)
- deployment_id (created in 01-deployment.ipynb) loaded from [env.json](env.json)

In [60]:
import requests, json
from ai_api_client_sdk.ai_api_v2_client import AIAPIV2Client

In [61]:
# Load the AI Core service key and resource group from ../config.json,
# and the deployment id created in 01-deployment.ipynb from ./env.json.
with open("../config.json") as f:
    config = json.load(f)

with open("./env.json") as f:
    env = json.load(f)

deployment_id = env["deployment_id"]
resource_group = config.get("resource_group", "default")
print("deployment id: ", deployment_id, " resource group: ", resource_group)

deployment id:  d09391ae26f8c16d  resource group:  oss-llm


#### 2. Initiate connection to SAP AI Core

In [62]:
# Create an AI API client from the AI Core service key in config.json
aic_sk = config["ai_core_service_key"]
base_url = aic_sk["serviceurls"]["AI_API_URL"] + "/v2/lm"
ai_api_client = AIAPIV2Client(
    base_url=base_url,
    auth_url=aic_sk["url"] + "/oauth/token",
    client_id=aic_sk["clientid"],
    client_secret=aic_sk["clientsecret"],
    resource_group=resource_group)

In [63]:
# Fetch an OAuth token and build the headers shared by all requests below
token = ai_api_client.rest_client.get_token()
headers = {
    "Authorization": token,
    "ai-resource-group": resource_group,
    "Content-Type": "application/json",
}


#### 3. Check the deployment status

In [64]:
# Check the deployment status before sending inference requests
deployment_url = f"{base_url}/deployments/{deployment_id}"
response = requests.get(url=deployment_url, headers=headers)
resp = response.json()
status = resp["status"]

deployment_log_url = f"{base_url}/deployments/{deployment_id}/logs"
if status == "RUNNING":
    print(f"Deployment-{deployment_id} is running. Ready for inference request")
else:
    print(f"Deployment-{deployment_id} status: {status}. Not yet ready for inference request")
    # Retrieve the deployment logs: {{apiurl}}/v2/lm/deployments/{{deploymentid}}/logs
    response = requests.get(deployment_log_url, headers=headers)
    print("Deployment Logs:\n", response.text)


Deployment-d09391ae26f8c16d is running. Ready for inference request
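The cell above checks the status once. If the deployment is still starting, a small polling loop saves re-running the cell by hand. This is a sketch: `wait_until_running` is a hypothetical helper, the status source is passed in as a function so it works with the `requests` call above, and treating `DEAD` and `STOPPED` as terminal is an assumption about the AI Core deployment lifecycle.

```python
import time
from typing import Callable


def wait_until_running(get_status: Callable[[], str], timeout_s: float = 600,
                       interval_s: float = 15,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status() until it returns RUNNING, a terminal failure, or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "RUNNING":
            return status
        # Assumption: DEAD and STOPPED will not turn into RUNNING on their own.
        if status in ("DEAD", "STOPPED"):
            raise RuntimeError(f"Deployment ended in status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Deployment still {status} after {timeout_s}s")
        sleep(interval_s)
```

Usage with the names from the cell above: `wait_until_running(lambda: requests.get(deployment_url, headers=headers).json()["status"])`.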


#### 4. Install the model
We'll install the mistral model from LocalAI's [Model Gallery](https://localai.io/models/). GPU is not enabled by default in its gallery configuration, so we'll then override the model configuration to enable GPU.

In [65]:
model = "mistral"
#model = "mistral:7b-instruct-q5_K_M"
#model = "mixtral:8x7b-instruct-v0.1-q4_0" # Important: set the resource plan to infer.l in byom-oss-llm-templates/local-ai-template.yaml
deployment = ai_api_client.deployment.get(deployment_id)
inference_base_url = f"{deployment.deployment_url}/v1"

In [67]:
# Install the model from the model gallery.
# To find your target model, refer to https://localai.io/models/#how-to-install-a-model-from-the-repositories
install_model_endpoint = f"{inference_base_url}/models/apply"
print(install_model_endpoint)

json_data = {
    # Installing by the id from the model gallery works.
    "id": "model-gallery@mistral"
    #"id": "model-gallery@bert-embeddings"

    # Installing by url, as described at https://localai.io/models/#how-to-install-a-model-without-a-gallery, did not work here.
    #"url": "github:go-skynet/model-gallery/blob/main/bert-embeddings.yaml"
    #"url": "github:mudler/LocalAI/blob/master/examples/configurations/phi-2.yaml"
    #"url": "github:YatseaLi/byom-oss-llm-ai-core/main/byom-oss-llm-code/local-ai/configurations/mistral.yaml"
}
response = requests.post(url=install_model_endpoint, headers=headers, json=json_data)
install_model_resp_json = response.json()
# The job uuid is used below to check the installation status
job_id = install_model_resp_json["uuid"]
print("Result:", install_model_resp_json)

https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/d09391ae26f8c16d/v1/models/apply
Result: {'uuid': '0293dd4d-eb6f-11ee-ba84-463fc5800695', 'status': 'http://d09391ae26f8c16d-predictor-default.rg-2ac5c88d-3284ef4c.svc.cluster.local/models/jobs/0293dd4d-eb6f-11ee-ba84-463fc5800695'}


In [78]:
# Check the status of the model installation job
endpoint = f"{inference_base_url}/models/jobs/{job_id}"
response = requests.get(endpoint, headers=headers)
print("Model Installation Job Logs:\n", response.text)

Model Installation Job Logs:
 {"file_name":"","error":null,"processed":true,"message":"completed","progress":100,"file_size":"","downloaded_size":""}
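Installation of a multi-GB model can take a while, so the job status usually needs to be checked more than once. A minimal polling sketch, relying only on the `error`, `processed` and `progress` fields visible in the job status printed above; the helper name `wait_for_install` is hypothetical, and the job text is fetched through a function you pass in.

```python
import json
import time
from typing import Callable


def wait_for_install(get_job_text: Callable[[], str], timeout_s: float = 1800,
                     interval_s: float = 10,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll a LocalAI /models/jobs/<uuid> status until processed is true."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = json.loads(get_job_text())
        # A non-null error means the installation failed.
        if job.get("error"):
            raise RuntimeError(f"Model installation failed: {job['error']}")
        if job.get("processed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Installation at {job.get('progress')}% after {timeout_s}s")
        sleep(interval_s)
```

Usage with the names from the cell above: `wait_for_install(lambda: requests.get(endpoint, headers=headers).text)`.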


In [79]:
# Install the model again, this time overriding its configuration to enable GPU (not enabled by default).
# The commented-out id installs a GGUF file directly from Hugging Face, which comes without a model config.
# To find your target model, refer to https://localai.io/models/#how-to-install-a-model-from-the-repositories
install_model_endpoint = f"{inference_base_url}/models/apply"
print(install_model_endpoint)

json_data = {
    "id": "model-gallery@mistral",
    #"id": "huggingface@TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q5_K_M.gguf",
    "name": model,  # rename the model
    # Override the configuration with GPU enabled
    "overrides": {
        "f16": True,
        "mmap": True,
        "gpu_layers": 33,
        "threads": 3,
    },
}

response = requests.post(install_model_endpoint, headers=headers, json=json_data)
install_model_resp_json = response.json()
job_id = install_model_resp_json["uuid"]
print("Result:", install_model_resp_json)

https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/d09391ae26f8c16d/v1/models/apply
Result: {'uuid': '3711de74-eb6f-11ee-ba84-463fc5800695', 'status': 'http://d09391ae26f8c16d-predictor-default.rg-2ac5c88d-3284ef4c.svc.cluster.local/models/jobs/3711de74-eb6f-11ee-ba84-463fc5800695'}


In [88]:
# Check the status of the model installation job
endpoint = f"{inference_base_url}/models/jobs/{job_id}"
response = requests.get(url=endpoint, headers=headers)
print("Model Installation Job Logs:\n", response.text)

Model Installation Job Logs:
 {"file_name":"","error":null,"processed":true,"message":"completed","progress":100,"file_size":"","downloaded_size":""}
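The two install requests above differ only in their JSON body: a gallery `id`, plus an optional new `name` and `overrides` for the model configuration. A small hypothetical helper, `build_apply_payload`, makes that difference explicit; it only assembles the dictionary and does not validate the override keys against LocalAI.

```python
from typing import Optional


def build_apply_payload(gallery_id: str, name: Optional[str] = None,
                        overrides: Optional[dict] = None) -> dict:
    """Build the JSON body for LocalAI's /models/apply, as in the two install cells above."""
    payload = {"id": gallery_id}
    if name is not None:
        payload["name"] = name  # install the model under a different name
    if overrides:
        # Copy so later edits to the caller's dict don't leak into the payload
        payload["overrides"] = dict(overrides)
    return payload
```

For example, `build_apply_payload("model-gallery@mistral", name=model, overrides={"f16": True, "mmap": True, "gpu_layers": 33, "threads": 3})` produces the same body as the GPU-enabled install.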


In [95]:
# List the installed models to find the model id used in the chat completion requests below
endpoint = f"{inference_base_url}/v1/models"
print(endpoint)

response = requests.get(endpoint, headers=headers)
print('Result:', response.text)

https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/d09391ae26f8c16d/v1/v1/models
Result: {"object":"list","data":[{"id":"mistral","object":"model"}]}
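Rather than reading the id off the printed text, the model list can be parsed so the notebook fails early if the model is missing. A minimal sketch against the OpenAI-style `{"object": "list", "data": [...]}` shape shown above; `model_ids` is a hypothetical helper name.

```python
import json


def model_ids(list_models_text: str) -> list:
    """Return the ids from an OpenAI-style model list response."""
    return [entry["id"] for entry in json.loads(list_models_text).get("data", [])]
```

Usage: `assert model in model_ids(response.text)` before sending chat completion requests.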


#### 5. Inference with the OpenAI-compatible chat completion API

In [90]:
openai_chat_api_endpoint = f"{inference_base_url}/v1/chat/completions"
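The samples below all post the same body: the model name, one system message and one user message, with extra top-level fields such as `response_format` or `grammar` in Sample #3. A hypothetical helper, `build_chat_payload`, sketches that shared shape; it is a convenience for this notebook, not part of LocalAI or the OpenAI API.

```python
def build_chat_payload(model: str, sys_msg: str, user_msg: str, **extra) -> dict:
    """Build an OpenAI-style chat completion body with one system and one user message."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": sys_msg},
            {"role": "user", "content": user_msg},
        ],
    }
    # Extra top-level fields, e.g. response_format={"type": "json_object"} or grammar=...
    payload.update(extra)
    return payload
```

Usage: `requests.post(openai_chat_api_endpoint, headers=headers, json=build_chat_payload(model, sys_msg, user_msg))`.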

##### Sample #1: Test the OpenAI-compatible API for Chat Completion
Now let's test its [OpenAI-compatible API for Chat Completion](https://localai.io/features/text-generation/#chat-completions), which follows the same interface as the Chat Completion API of GPT-3.5/4 in SAP Generative AI Hub.

In [96]:
# Try the OpenAI-compatible chat completion API
sys_msg = "You are a helpful AI assistant"
user_msg = "Why is the sky blue?"

json_data = {
    "model": model,
    "messages": [
        {"role": "system", "content": sys_msg},
        {"role": "user", "content": user_msg},
    ],
}

response = requests.post(openai_chat_api_endpoint, headers=headers, json=json_data)
print('Result:', response.text)

Result: {"created":1711456731,"object":"chat.completion","id":"f9eecc4d-49e5-4c6f-86c7-12a9ca99b653","model":"mistral","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":" The sky appears blue because of the way light scatters in the Earth's atmosphere. When sunlight enters the Earth's atmosphere, it consists of a range of colors, including red, orange, yellow, green, blue, indigo, and violet, which together form the visible light spectrum.\n\nAs the sunlight travels through the atmosphere, it interacts with the air molecules and tiny particles called aerosols and dust. These particles are much smaller than the wavelength of visible light. When the sunlight encounters these particles, it can be scattered in different directions.\n\nThe shorter wavelength colors, such as blue and violet, are scattered more efficiently than the longer wavelength colors, like red and orange. This is because the particles in the atmosphere are much smaller than the wavelen

##### Sample #2: Write a haiku about running LocalAI in AI Core

In [97]:
# Test the OpenAI-compatible chat completion API by writing a haiku
sys_msg = "You are a helpful assistant"
user_msg = "Write a haiku for running LocalAI in AI Core"
json_data = {
    "model": model,
    "messages": [
        {"role": "system", "content": sys_msg},
        {"role": "user", "content": user_msg},
    ],
}

response = requests.post(openai_chat_api_endpoint, headers=headers, json=json_data)
print('Result:', response.json())

Result: {'created': 1711456731, 'object': 'chat.completion', 'id': 'f9eecc4d-49e5-4c6f-86c7-12a9ca99b653', 'model': 'mistral', 'choices': [{'index': 0, 'finish_reason': 'stop', 'message': {'role': 'assistant', 'content': ' Swiftly, the AI Core runs,\n Processing data with grace,\n LocalAI, a shining star.\n'}}], 'usage': {'prompt_tokens': 0, 'completion_tokens': 0, 'total_tokens': 0}}
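To use the answer rather than the whole response, read the first choice's message content. The small sketch below also strips the leading space and trailing newline visible in the output above. The helper name `reply_text` is hypothetical. Note that the `usage` counts in this output are all 0, so they are not a reliable token count here.

```python
def reply_text(chat_response: dict) -> str:
    """Return the first choice's message content without surrounding whitespace."""
    return chat_response["choices"][0]["message"]["content"].strip()
```

Usage: `print(reply_text(response.json()))`.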


##### Sample #3: Customer Message Processing
In our sample [btp-industry-use-cases/04-customer-interaction-gpt4](https://github.com/SAP-samples/btp-industry-use-cases/tree/main/04-customer-interaction-gpt4), GPT-3.5/4 processes customer messages from customer interactions with plain prompting and outputs the results in a JSON schema:
- Summarize the customer message into a title and a short description
- Analyze the sentiment of the customer message
- Extract the entities from the customer message, such as customer, product, order number, etc.

Let's see if the same scenario can be achieved with mistral-7b.


In [98]:
# Test the OpenAI-compatible chat completion API by processing a customer message with
# summarization, sentiment analysis and entity extraction, and output the result as JSON
sys_msg = r'''
You are an AI assistant to process the input text. Here are your tasks on the text.
1. Apply Sentiment Analysis
2. Generate a title less than 100 characters, and summarize the text into a short description less than 200 characters
3. Extract the entities such as customer, product, order, delivery, invoice etc. from the text. Here is a preliminary list of the target entity fields and descriptions. Please extract all the identifiable entities, even those not in the list below. Don't include any field with an unknown value. \
-customer_no: alias customer number, customer id, account id, account number which could be used to identify a customer.
-customer_name: customer name, account name
-customer_phone: customer contact number.
-product_no: product number, product id
-product_name
-order_no: sales order number, order id
-order_date
-delivery_no: delivery number, delivery id
-delivery_date: delivery date, shipping date
-invoice_no: alias invoice number, invoice id, receipt number, receipt id etc. which can be used to locate an invoice.
-invoice_date: invoice date, purchase date
-store_name
-store_location
etc.

Fields not in the list must follow the snake_case naming convention like product_name, no spaces allowed.

Output expected in JSON format as below: 
{\"sentiment\":\"{{Positive/Neutral/Negative}}\",\"title\":\"{{The generated title based on the input text less than 100 characters}}\",\"summary\":\"{{The generated summary based on the input text less than 300 characters}}\",\"entities\":[{\"field\":\"{{the extracted fields such as product_name listed above}}\",\"value\":\"{{the extracted value of the field}}\"}]}
'''

user_msg = r'''
Input text: 
Everything was working fine one day I went to make a shot of coffee it stopped brewing after 3 seconds Then I tried the milk frother it stopped after 3 seconds again I took it back they fixed it under warranty but it’s happening again I don’t see this machine lasting more then 2 years to be honest I’m spewing I actually really like the machine It’s almost like it’s losing pressure somewhere, they wouldn’t tell my what the problem was when they fixed it.. Purchased at Harvey Norman for $1,349. \
Product is used: Several times a week
 
JSON:
'''

json_data = {
    "model": model,
    "response_format": {"type": "json_object"},  # JSON mode
    "messages": [
        {"role": "system", "content": sys_msg},
        {"role": "user", "content": user_msg},
    ],
}

response = requests.post(url=openai_chat_api_endpoint, headers=headers, json=json_data)
print('Result:', response.text)

Result: {"created":1711456731,"object":"chat.completion","id":"f9eecc4d-49e5-4c6f-86c7-12a9ca99b653","model":"mistral","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":" {\"sentiment\":\"Negative\",\"title\":\"Coffee Machine Stopping Brewing After Short Use\",\"summary\":\"Customer reports coffee machine stopping brewing after a few seconds and milk frother not working properly. The machine was fixed under warranty but the issue reoccurred.\",\"entities\":[{\"field\":\"product_no\",\"value\":\"\"},{\"field\":\"product_name\",\"value\":\"Coffee Machine\"},{\"field\":\"customer_name\",\"value\":\"\"},{\"field\":\"store_name\",\"value\":\"Harvey Norman\"},{\"field\":\"order_no\",\"value\":\"\"},{\"field\":\"order_date\",\"value\":\"\"},{\"field\":\"delivery_no\",\"value\":\"\"},{\"field\":\"delivery_date\",\"value\":\"\"},{\"field\":\"invoice_no\",\"value\":\"\"},{\"field\":\"invoice_date\",\"value\":\"\"},{\"field\":\"product_price\",\"value\":\"$1,349
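The system prompt asks the model not to include fields with unknown values, yet the output above still lists several entities with an empty `value`. A minimal post-processing sketch parses the reply and drops those entries. The helper name `parse_customer_result` is hypothetical, and it assumes the reply content is a single JSON object, as requested by JSON mode.

```python
import json


def parse_customer_result(content: str) -> dict:
    """Parse the model's JSON reply and drop entities whose value is empty."""
    # Raises json.JSONDecodeError if the reply is not valid JSON
    result = json.loads(content)
    result["entities"] = [
        e for e in result.get("entities", [])
        if str(e.get("value", "")).strip()
    ]
    return result
```

Usage: `parse_customer_result(response.json()["choices"][0]["message"]["content"])`.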

Next, let's constrain the structured JSON output with a [BNF grammar](https://localai.io/features/constrained_grammars/).

In [None]:
# Same customer message task as above, but constrain the output with a grammar
# so that the model can only produce JSON with the expected structure

grammar = r'''
entities ::= "[" space ( entities-item ( "," space entities-item )* )? "]" space
entities-item ::= "{" space entities-item-field-kv "," space entities-item-value-kv "}" space
entities-item-field-kv ::= "\"field\"" space ":" space string
entities-item-value-kv ::= "\"value\"" space ":" space string
entities-kv ::= "\"entities\"" space ":" space entities
root ::= "{" space sentiment-kv "," space title-kv "," space summary-kv "," space entities-kv "}" space
sentiment-kv ::= "\"sentiment\"" space ":" space sentiment-value
sentiment-value ::= ("\"Positive\"" | "\"Neutral\"" | "\"Negative\"")
space ::= " "?
string ::=  "\"" (
        [^"\\] |
        "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
      )* "\"" space
summary-kv ::= "\"summary\"" space ":" space string
title-kv ::= "\"title\"" space ":" space string
'''

json_data = {
    "model": model,
    "messages": [
        {"role": "system", "content": sys_msg},
        {"role": "user", "content": user_msg},
    ],
    # Constrain generation with the grammar above
    "grammar": grammar,
}

response = requests.post(url=openai_chat_api_endpoint, headers=headers, json=json_data)
print('Result:', response.text)
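To confirm the grammar had the intended effect, the reply can be checked against the structure the grammar describes. The root object must have exactly the keys sentiment, title, summary and entities in that order. The sentiment must be Positive, Neutral or Negative, and each entity must be an object with string `field` and `value`. This is a rough Python approximation written for this notebook, not a GBNF parser, and `matches_grammar_shape` is a hypothetical name.

```python
import json

ALLOWED_SENTIMENTS = {"Positive", "Neutral", "Negative"}
EXPECTED_KEYS = ["sentiment", "title", "summary", "entities"]


def matches_grammar_shape(content: str) -> bool:
    """Rough check that a reply has the structure described by the grammar above."""
    try:
        obj = json.loads(content)
    except json.JSONDecodeError:
        return False
    # The grammar fixes both the key set and the key order of the root object.
    if not isinstance(obj, dict) or list(obj) != EXPECTED_KEYS:
        return False
    if not all(isinstance(obj[k], str) for k in ("sentiment", "title", "summary")):
        return False
    if obj["sentiment"] not in ALLOWED_SENTIMENTS:
        return False
    entities = obj["entities"]
    # Each entity is {"field": <string>, "value": <string>}, in that order.
    return isinstance(entities, list) and all(
        isinstance(e, dict) and list(e) == ["field", "value"]
        and all(isinstance(v, str) for v in e.values())
        for e in entities
    )
```

Usage: `matches_grammar_shape(response.json()["choices"][0]["message"]["content"])`.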