### Introduction
In this notebook, we pull a mistral-7b model and serve it with [Ollama](https://ollama.com/) in SAP AI Core. You can also run Llama 2, Mixtral, Gemma, Llava, and other [supported models in Ollama](https://ollama.com/library).

### Prerequisites
Before running this notebook, make sure you have completed the [Prerequisites](../../README.md) and [01-deployment.ipynb](01-deployment.ipynb). As a result, a deployment of the Ollama scenario is running in SAP AI Core.
 
### The high-level flow:
- Load the configuration info
- Connect to SAP AI Core via the SDK
- Check the status and logs of the deployment
- Pull the model from the Ollama model repository through the API
- Run inference on the model with Ollama's native APIs and its OpenAI-compatible chat completion API


#### 1.Load config info 
- resource_group is loaded from [config.json](../config.json)
- deployment_id (created in 01-deployment.ipynb) is loaded from [env.json](env.json)

In [8]:
import requests, json
from ai_api_client_sdk.ai_api_v2_client import AIAPIV2Client

In [9]:
# Load the configuration files.
# resource_group comes from ../config.json; deployment_id comes from ./env.json (written by 01-deployment.ipynb).
with open("../config.json") as f:
    config = json.load(f)

with open("./env.json") as f:
    env = json.load(f)

deployment_id = env["deployment_id"]
resource_group = config.get("resource_group", "default")
print("deployment id: ", deployment_id, " resource group: ", resource_group)

deployment id:  dc8777fe661ff053  resource group:  oss-llm
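The cell above fails with a bare `KeyError` when `env.json` has no `deployment_id`. The following is a minimal sketch of a more defensive loader, assuming the same two files and key names as above; `load_settings` is a hypothetical helper and not part of the notebook or the SDK.

```python
import json
from pathlib import Path


def load_settings(config_path, env_path):
    """Return (deployment_id, resource_group) read from the two JSON files.

    Hypothetical helper: file layout and key names mirror the cell above.
    """
    config = json.loads(Path(config_path).read_text())
    env = json.loads(Path(env_path).read_text())

    deployment_id = env.get("deployment_id")
    if not deployment_id:
        raise ValueError(
            f"{env_path} has no 'deployment_id'; run 01-deployment.ipynb first"
        )
    # Same fallback as the notebook: use the "default" resource group.
    resource_group = config.get("resource_group", "default")
    return deployment_id, resource_group
```

It could be called as `load_settings("../config.json", "./env.json")` in place of the two `open` calls.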


#### 2.Initiate connection to SAP AI Core 

In [10]:
aic_sk = config["ai_core_service_key"]
base_url = aic_sk["serviceurls"]["AI_API_URL"] + "/v2/lm"
ai_api_client = AIAPIV2Client(
    base_url=base_url,
    auth_url=aic_sk["url"] + "/oauth/token",
    client_id=aic_sk['clientid'],
    client_secret=aic_sk['clientsecret'],
    resource_group=resource_group)


In [11]:
token = ai_api_client.rest_client.get_token()
headers = {
    "Authorization": token,
    "ai-resource-group": resource_group,
    "Content-Type": "application/json",
}


#### 3.Check the deployment status 

In [12]:
# Check deployment status before inference request
deployment_url = f"{base_url}/deployments/{deployment_id}"
response = requests.get(url=deployment_url, headers=headers)
resp = response.json()    
status = resp['status']

deployment_log_url = f"{base_url}/deployments/{deployment_id}/logs"
if status == "RUNNING":
    print(f"Deployment-{deployment_id} is running. Ready for inference request")
else:
    print(f"Deployment-{deployment_id} status: {status}. Not yet ready for inference request")
    # Retrieve the deployment logs from
    # {{apiurl}}/v2/lm/deployments/{{deploymentid}}/logs
    response = requests.get(deployment_log_url, headers=headers)
    print('Deployment Logs:\n', response.text)


Deployment-dc8777fe661ff053 is running. Ready for inference request
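The check above runs once. If the deployment is still starting, it can help to poll until it reports `RUNNING`. The sketch below is one way to do that; `wait_until_running` is a hypothetical helper, and treating `DEAD` and `STOPPED` as terminal statuses is an assumption about the status values returned by the deployment endpoint.

```python
import time


def wait_until_running(get_status, timeout_s=600, interval_s=10, sleep=time.sleep):
    """Poll get_status() until it returns "RUNNING" or we give up.

    get_status is any zero-argument callable that returns the status string.
    Returns True once the status is "RUNNING", False on timeout or on a
    status assumed to be terminal.
    """
    waited = 0
    while True:
        status = get_status()
        if status == "RUNNING":
            return True
        # Assumption: these statuses never transition to RUNNING on their own.
        if status in ("DEAD", "STOPPED") or waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

In this notebook the callable could be `lambda: requests.get(deployment_url, headers=headers).json()["status"]`. Passing the status source in as a callable keeps the polling logic independent of the HTTP client.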


#### 4.Pull the model into Ollama 

In [13]:
model = "mistral:7b-instruct-q5_K_M"
#model = "mixtral:8x7b-instruct-v0.1-q4_0" # Important: set the resource plan to infer.l in byom-oss-llm-templates/ollama-template.yaml before using this model

deployment = ai_api_client.deployment.get(deployment_id)
inference_base_url = f"{deployment.deployment_url}/v1"
openai_base_url = deployment.deployment_url

In [14]:
# pull the model from ollama model repository
endpoint = f"{inference_base_url}/api/pull"
print(endpoint)

# Request body: the name of the model to pull
json_data = {"name": model}

response = requests.post(endpoint, headers=headers, json=json_data)
print('Result:', response.text)

https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/dc8777fe661ff053/v1/api/pull
Result: {"status":"pulling manifest"}
{"status":"pulling 1849ef83c4dd","digest":"sha256:1849ef83c4dd8c8c34e5c2fb964db1efe5dc994eb40909bd96c751b1346f8123","total":5132345952}
{"status":"pulling 1849ef83c4dd","
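The pull endpoint streams its progress as newline-delimited JSON, which is why printing `response.text` produces a long run of near-identical lines. A minimal sketch of collapsing that stream into a final status and a completion percentage follows. `summarize_pull` is a hypothetical helper; the `completed` and `error` fields are assumptions based on Ollama's pull API and do not appear in the truncated output above.

```python
import json


def summarize_pull(lines):
    """Collapse newline-delimited pull progress into (last_status, percent).

    percent stays None if no event carried both "total" and "completed".
    """
    last_status, percent = None, None
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            # Skip a truncated line, like the last one in the output above.
            continue
        if "error" in event:  # assumption: failures arrive as {"error": ...}
            raise RuntimeError(event["error"])
        last_status = event.get("status", last_status)
        total, completed = event.get("total"), event.get("completed")
        if total and completed is not None:
            percent = round(100 * completed / total, 1)
    return last_status, percent
```

With `requests`, the lines could come from `requests.post(endpoint, headers=headers, json=json_data, stream=True).iter_lines()`, or from `response.text.splitlines()` for an already completed request.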

In [15]:
# Check the model list 
endpoint = f"{inference_base_url}/api/tags"
print(endpoint)

response = requests.get(endpoint, headers=headers)
print('Result:', response.text)

https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/dc8777fe661ff053/v1/api/tags
Result: {"models":[{"name":"llava:latest","model":"llava:latest","modified_at":"2024-03-26T03:18:26.87178127Z","size":4733363377,"digest":"8dd30f6b0cb19f555f2c7a7ebda861449ea2cc76bf1f44e262931f45fc81d081","details":{"parent_model":"","format":"gguf","family":"llama","families":["llama","clip"],"parameter_size":"7B","quantization_level":"Q4_0"}},{"name":"mistral:7b-instruct-q5_K_M","model":"mistral:7b-instruct-q5_K_M","modified_at":"2024-03-26T04:35:42.289522826Z","size":5132357866,"digest":"8397c99c426ff35d3211c5a3f33b578334b57bf4f3281d41e37970bf98465acc","details":{"parent_model":"","format":"gguf","family":"llama","families":["llama"],"parameter_size":"7B","quantization_level":"Q5_K_M"}}]}
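Rather than reading the raw JSON by eye, the tags response can be checked programmatically before sending inference requests. This is a small sketch under the assumption that the response keeps the shape shown above (a top-level `models` list whose items have `name` and `size`); `installed_models` is a hypothetical helper.

```python
import json


def installed_models(tags_json):
    """Map each installed model name to its size in GiB, from /api/tags output."""
    models = json.loads(tags_json).get("models", [])
    return {m["name"]: round(m["size"] / 2**30, 2) for m in models}
```

For example, `model in installed_models(response.text)` tells you whether the pull in the previous step has finished registering the model.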


#### 5.Inference completion and chat completion APIs

In [16]:
completion_api_endpoint = f"{inference_base_url}/api/generate"
chat_api_endpoint = f"{inference_base_url}/api/chat"
openai_chat_api_endpoint = f"{openai_base_url}/v1/chat/completions"

##### Sample#1: Test Ollama's Completion API

In [17]:
#test ollama's completion api
json_data = {
  "model": model,
  "prompt": "What color is the sky at different times of the day? Respond in JSON",
  "format": "json", #JSON mode
  "stream": False   #Streaming or not
}

response = requests.post(url=completion_api_endpoint, headers=headers, json=json_data)
print('Result:', response.text)

Result: {"model":"mistral:7b-instruct-q5_K_M","created_at":"2024-03-26T04:37:36.737093368Z","response":"{\n   \"sunrise\": \"orange\",\n   \"morning\": \"blue\",\n   \"afternoon\": \"white\",\n   \"evening\": \"pink\"\n}","done":true,"context":[733,16289,28793,28705,1824,3181,349,272,7212,438,1581,2421,302,272,1370,28804,1992,19571,297,9292,733,28748,16289,28793,13,28751,13,259,345,19875,24035,1264,345,271,909,548,13,259,345,21621,971,1264,345,12349,548,13,259,345,7792,8122,1264,345,10635,548,13,259,345,828,3250,1264,345,28720,655,28739,13,28752],"total_duration":10140687773,"load_duration":8836806175,"prompt_eval_count":26,"prompt_eval_duration":153737000,"eval_count":42,"eval_duration":1149415000}
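In JSON mode, the `response` field of the result is itself a JSON-encoded string, so it needs a second decoding step. The timing fields are reported in nanoseconds, which allows a rough generation throughput to be derived. A minimal sketch, assuming the response shape shown above; `parse_generate_response` is a hypothetical helper.

```python
import json


def parse_generate_response(body):
    """Return (decoded_payload, tokens_per_second) from an /api/generate body."""
    data = json.loads(body)
    # In JSON mode the model's answer is a JSON document inside a string.
    payload = json.loads(data["response"])
    # eval_duration is in nanoseconds; convert to tokens per second.
    tokens_per_s = data["eval_count"] / data["eval_duration"] * 1e9
    return payload, tokens_per_s
```

Applied to the output above (42 tokens in 1149415000 ns), this gives roughly 36.5 tokens per second for generation, excluding model load time.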


##### Test Ollama's Chat Completion APIs

Ollama exposes two chat interfaces: its native chat API (`/api/chat`) and an [OpenAI compatible API for Chat Completion](https://github.com/ollama/ollama/blob/main/docs/openai.md), which has the same interface as Chat Completion for GPT-3.5/4 in SAP Generative AI Hub. Sample#2 uses the native chat API; Sample#3 uses the OpenAI-compatible one.
##### Sample#2: Write a haiku about Ollama in AI Core
Let's test the native chat API first.

In [20]:
# Test Ollama's native chat API (/api/chat) by writing a haiku
sys_msg = "You are a helpful assistant."
user_msg = "Write a haiku about running Ollama in AI Core"
json_data = {
    "model": model,
    "messages": [
            {
                "role": "system",
                "content": sys_msg
            },
            {
                "role": "user",
                "content": user_msg
            }
    ],
    "stream": False
}

response = requests.post(url=chat_api_endpoint, headers=headers, json=json_data)
print('Result:', response.text)

Result: {"model":"mistral:7b-instruct-q5_K_M","created_at":"2024-03-26T04:39:42.464957241Z","message":{"role":"assistant","content":"Ollama's wheels turn fast,\nAI Core hums with delight,\nRunning to the future."},"done":true,"total_duration":712327465,"load_duration":374694,"prompt_eval_duration":42835000,"eval_count":24,"eval_duration":667987000}
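The chat API is stateless, so a follow-up question has to resend the earlier messages together with the assistant's reply. The sketch below shows one way to build the next request's `messages` list from the response above; `next_turn` is a hypothetical helper, and it assumes the native `/api/chat` response shape with a top-level `message` object.

```python
import json


def next_turn(messages, response_body, user_msg):
    """Return a new messages list: history + assistant reply + next user message."""
    reply = json.loads(response_body)["message"]
    return messages + [
        {"role": reply["role"], "content": reply["content"]},
        {"role": "user", "content": user_msg},
    ]
```

For instance, `json_data["messages"] = next_turn(json_data["messages"], response.text, "Now translate it into German")` prepares a second request that keeps the haiku in context.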


##### Sample#3: Customer Message Processing with OpenAI-compatible Chat Completion API
In our sample [btp-industry-use-cases/04-customer-interaction-gpt4](https://github.com/SAP-samples/btp-industry-use-cases/tree/main/04-customer-interaction-gpt4), GPT-3.5/4 is used to process customer messages in customer interactions and to output the result in a JSON schema, using plain prompting. The tasks are:
- Summarize the customer message into a title and a short description
- Analyze the sentiment of the customer message
- Extract the entities from the customer message, such as customer, product, order number, etc.

Let's see if the same scenario could be achieved with mistral-7b.


In [21]:
# Test the OpenAI-compatible chat completion API by processing a customer message:
# summarization, sentiment analysis, and entity extraction, with the output as JSON
sys_msg = r'''
You are an AI assistant to process the input text. Here are your tasks on the text.
1.Apply Sentiment Analysis
2.Generate a title less than 100 characters,and summarize the text into a short description less than 200 characters
3.Extract the entities such as customer,product,order,delivery,invoice etc from the text Here is a preliminary list of the target entity fields and description. Please extract all the identifiable entities even not in the list below. Don't include any field with unknown value. 
-customer_no: alias customer number, customer id, account id, account number which could be used to identify a customer.
-customer_name: customer name, account name
-customer_phone: customer contact number.
-product_no: product number, product id
-product_name
-order_no: sales order number, order id
-order_date 
-delivery_no: delivery number, delivery id
-delivery_date: delivery date, shipping date
-invoice_no: alias invoice number, invoice id, receipt number, receipt id etc. which can be used to locate a invoice.
-invoice_date: invoice date, purchase date
-store_name
-store_location
etc.
    
Fields not in the list must follow the snake_case naming convention, like product_name; no spaces allowed.

Output expected in JSON format as below: 
{\"sentiment\":\"{{Positive/Neutral/Negative}}\",\"title\":\"{{The generated title based on the input text less than 100 characters}}\",\"summary\":\"{{The generated summary based on the input text less than 300 characters}}\",\"entities\":[{\"field\":\"{{the extracted fields such as product_name listed above}}\",\"value\":\"{{the extracted value of the field}}\"}]}
'''

user_msg = r'''
Input text: 
Everything was working fine one day I went to make a shot of coffee it stopped brewing after 3 seconds Then I tried the milk frother it stopped after 3 seconds again I took it back they fixed it under warranty but it’s happening again I don’t see this machine lasting more then 2 years to be honest I’m spewing I actually really like the machine It’s almost like it’s losing pressure somewhere, they wouldn’t tell my what the problem was when they fixed it.. Purchased at Harvey Norman for $1,349. 
Product is used: Several times a week
 
JSON:
'''

json_data = { 
  "model": model,
  "response_format": {"type": "json_object"}, #JSON mode
  "messages": [
            {
                "role": "system",
                "content": sys_msg
            },
            {
                "role": "user",
                "content": user_msg
            }
        ]
}

response = requests.post(url=openai_chat_api_endpoint, headers=headers, json=json_data)
print('Result:', response.text)

Result: {"id":"chatcmpl-808","object":"chat.completion","created":1711428069,"model":"mistral:7b-instruct-q5_K_M","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":"{\"sentiment\":\"Negative\",\"title\":\"Machine malfunctioning repeatedly\",\"summary\":\"Customer reports that their coffee machine stopped working after 3 seconds and milk frother also stopped working. They took it back under warranty but the problem persists. The customer is unhappy with the machine and feels it may not last more than 2 years.\", \"entities\":[{\"field\":\"product_name\",\"value\":\"coffee machine\"}, {\"field\":\"order_no\",\"value\":\"purchased at Harvey Norman for $1,349\"}, {\"field\":\"product_use\",\"value\":\"Several times a week\"}]}"},"finish_reason":"stop"}],"usage":{"prompt_tokens":526,"completion_tokens":128,"total_tokens":654}}
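In the OpenAI-compatible response, the model's JSON sits as a string in `choices[0].message.content`, and a 7B model does not always fill the fields sensibly (above, a purchase note ended up under `order_no`). A minimal sketch of decoding the result and applying a basic sanity check follows. `extract_analysis` is a hypothetical helper, and the allowed sentiment values are taken from the system prompt.

```python
import json

ALLOWED_SENTIMENTS = {"Positive", "Neutral", "Negative"}


def extract_analysis(body):
    """Decode a chat.completion body into (sentiment, title, summary, entities)."""
    data = json.loads(body)
    # The assistant's answer is a JSON document encoded as a string.
    result = json.loads(data["choices"][0]["message"]["content"])

    sentiment = result.get("sentiment")
    if sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {sentiment!r}")

    # Flatten [{"field": ..., "value": ...}] into a plain dict.
    entities = {e["field"]: e["value"] for e in result.get("entities", [])}
    return sentiment, result.get("title"), result.get("summary"), entities
```

Calling `extract_analysis(response.text)` on the output above returns `"Negative"` as the sentiment plus an entities dict, which makes it easier to spot questionable values before passing them downstream.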

