### Introduction
In this notebook, we test function calling with Meta's [Llama 3.1](https://ollama.com/library/llama3.1) or Mistral's [Mistral v0.3](https://ollama.com/library/mistral), served with [Ollama](https://ollama.com/) in SAP AI Core. <br/><br/>

For more details on custom function calling with open-source models through Ollama in SAP AI Core, see this [blog post](https://community.sap.com/t5/artificial-intelligence-and-machine-learning-blogs/from-unstructured-input-to-structured-output-llm-meets-sap/ba-p/13772506).

### Prerequisites
Before running this notebook, make sure you have completed the [Prerequisites](../../README.md) and run [01-deployment.ipynb](01-deployment.ipynb). As a result, a deployment of the Ollama scenario should be running in SAP AI Core. <br/><br/>

If the configuration and deployment were created through SAP AI Launchpad, manually update `configuration_id` and `deployment_id` in [env.json](env.json):
```json
{
    "configuration_id": "<YOUR_CONFIGURATION_ID_OF_OLLAMA_SCENARIO>",
    "deployment_id": "<YOUR_DEPLOYMENT_ID_BASED_ON_CONFIG_ABOVE>"
}
```
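
Before running the remaining cells, it can help to confirm that neither ID was left at its placeholder value. The following is a minimal sketch, assuming placeholders keep the `<...>` form shown above; `find_unset_ids` is a hypothetical helper for illustration and is not part of the notebook or the SDK.

```python
import json


def find_unset_ids(env: dict) -> list:
    """Return the required keys that are missing, empty, or still a <PLACEHOLDER>."""
    required = ("configuration_id", "deployment_id")
    unset = []
    for key in required:
        value = str(env.get(key, "")).strip()
        # A value wrapped in angle brackets is assumed to be an unreplaced placeholder.
        if not value or (value.startswith("<") and value.endswith(">")):
            unset.append(key)
    return unset


# Example input: one placeholder left in place, one real-looking ID.
example_env = json.loads(
    '{"configuration_id": "<YOUR_CONFIGURATION_ID_OF_OLLAMA_SCENARIO>",'
    ' "deployment_id": "df240ccfb2d899b0"}'
)
print(find_unset_ids(example_env))
```

In the notebook, the same check could be applied to the dictionary loaded from `env.json` before any request is sent.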
 
### The high-level flow:
- Load configurations info
- Connect to SAP AI Core via SDK
- Check the status and logs of the deployment
- Pull the model from the Ollama model repository through the API
- Run inference with the chat completion APIs (Ollama's own and the OpenAI-compatible one) for function calling


#### 1.Load config info 
- `resource_group` is loaded from [config.json](../config.json)
- `deployment_id` (created in 01-deployment.ipynb) is loaded from [env.json](env.json)

In [27]:
import requests, json
from ai_api_client_sdk.ai_api_v2_client import AIAPIV2Client

In [28]:
# Load the resource group from config.json and the deployment ID from env.json.
with open("../config.json") as f:
    config = json.load(f)

with open("./env.json") as f:
    env = json.load(f)

deployment_id = env["deployment_id"]
resource_group = config.get("resource_group", "default")
print("deployment id: ", deployment_id, " resource group: ", resource_group)

deployment id:  df240ccfb2d899b0  resource group:  oss-llm


#### 2.Initiate connection to SAP AI Core 

In [29]:
aic_sk = config["ai_core_service_key"]
base_url = aic_sk["serviceurls"]["AI_API_URL"] + "/v2/lm"
ai_api_client = AIAPIV2Client(
    base_url=base_url,
    auth_url=aic_sk["url"] + "/oauth/token",
    client_id=aic_sk['clientid'],
    client_secret=aic_sk['clientsecret'],
    resource_group=resource_group)


In [30]:
token = ai_api_client.rest_client.get_token()
headers = {
        "Authorization": token,
        'ai-resource-group': resource_group,
        "Content-Type": "application/json"}


#### 3.Check the deployment status 

In [31]:
# Check deployment status before inference request
deployment_url = f"{base_url}/deployments/{deployment_id}"
response = requests.get(url=deployment_url, headers=headers)
resp = response.json()    
status = resp['status']

deployment_log_url = f"{base_url}/deployments/{deployment_id}/logs"
if status == "RUNNING":
    print(f"Deployment-{deployment_id} is running. Ready for inference request")
else:
    print(f"Deployment-{deployment_id} status: {status}. Not yet ready for inference request")
    # Retrieve the deployment logs from {base_url}/deployments/{deployment_id}/logs
    response = requests.get(deployment_log_url, headers=headers)
    print('Deployment Logs:\n', response.text)


Deployment-df240ccfb2d899b0 is running. Ready for inference request
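
If the deployment is not yet running, the status check can be repeated until it is. The following is a minimal polling sketch under stated assumptions: `wait_until_running` is a hypothetical helper, the status is supplied by a caller-provided function so that no network call is hard-coded, and treating `DEAD` and `STOPPED` as terminal states is an assumption about the deployment lifecycle.

```python
import time
from typing import Callable


def wait_until_running(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status() until it returns "RUNNING"; return False on failure or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "RUNNING":
            return True
        # Assumed terminal states: polling further would not help.
        if status in ("DEAD", "STOPPED"):
            return False
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Example with a canned sequence of statuses instead of a real API call.
statuses = iter(["PENDING", "PENDING", "RUNNING"])
print(wait_until_running(lambda: next(statuses), sleep=lambda seconds: None))
```

In the notebook, `get_status` could be something like `lambda: requests.get(deployment_url, headers=headers).json()["status"]`, reusing the variables defined in the cell above.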


#### 4.Pull the model into Ollama 

In [32]:
# Models that support function calling: llama3.1 and mistral
model = "llama3.1"
#model = "mistral"

deployment = ai_api_client.deployment.get(deployment_id)
inference_base_url = f"{deployment.deployment_url}/v1"
openai_base_url = deployment.deployment_url

In [None]:
# pull the model from ollama model repository
endpoint = f"{inference_base_url}/api/pull"
print(endpoint)

# Pull the model selected above (llama3.1 by default) from Ollama
json_data = {"name": model}

response = requests.post(endpoint, headers=headers, json=json_data)
print('Result:', response.text)
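
The pull result printed above can be long. Assuming the endpoint returns newline-delimited JSON status objects, as Ollama's API documentation describes for streamed pull responses, the final status can be extracted as sketched below. `last_pull_status` is a hypothetical helper, and the sample body is illustrative rather than captured output.

```python
import json


def last_pull_status(body: str):
    """Return the last JSON object in the body that carries a status or an error, or None."""
    last = None
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            # Skip anything that is not a complete JSON object on its own line.
            continue
        if isinstance(event, dict) and ("status" in event or "error" in event):
            last = event
    return last


# Illustrative body, not real output from the deployment.
sample_body = '{"status": "pulling manifest"}\n{"status": "success"}\n'
print(last_pull_status(sample_body))
```

Applied to the cell above, this would be `last_pull_status(response.text)`; a final status other than `success` suggests the pull did not complete.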

Next, let's list the available models and check that the target model is among them.

In [None]:
# Check the model list 
endpoint = f"{inference_base_url}/api/tags"
print(endpoint)

response = requests.get(endpoint, headers=headers)
print('Result:', response.text)
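
Instead of reading the printed list by eye, the check can be done in code. This is a minimal sketch, assuming the response has the shape `{"models": [{"name": "<model>:<tag>", ...}]}` and that a model pulled without an explicit tag is listed as `<model>:latest`; `is_model_listed` is a hypothetical helper and the sample payload is illustrative.

```python
def is_model_listed(tags: dict, model: str) -> bool:
    """Check whether a model name appears in a model-list response."""
    # A name without a tag is assumed to refer to the ":latest" tag.
    wanted = model if ":" in model else model + ":latest"
    names = [entry.get("name", "") for entry in tags.get("models", [])]
    return wanted in names


# Illustrative payload, not real output from the deployment.
sample_tags = {"models": [{"name": "llama3.1:latest"}]}
print(is_model_listed(sample_tags, "llama3.1"))
print(is_model_listed(sample_tags, "mistral"))
```

In the notebook this would be called as `is_model_listed(response.json(), model)`.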

#### 5.Invoke custom Function Call with chat completion APIs

In [35]:
completion_api_endpoint = f"{inference_base_url}/api/generate"
chat_api_endpoint = f"{inference_base_url}/api/chat"
openai_chat_api_endpoint = f"{openai_base_url}/v1/chat/completions"

##### 5.1 Sample#1: Get Current Weather with chat completion API
Let's test function calling with a basic sample that uses [Ollama's chat completion API](https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-chat-completion) to answer the question "What is the weather today in Melbourne, Australia?".
![Process flow of function calling for getting current weather](../../resources/20-function-call-flow-get-weather.png)

Step 1: Function Call to LLM with the initial question

In [None]:
question = "What is the weather today in Melbourne, Australia?"
json_data = {
  "model": model,
  "messages": [
    {
      "role": "user",
      "content": question
    }
  ],
  "stream": False,
  "format": "json", # enable JSON mode to ensure a valid JSON response for the function call
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The location to get the weather for, e.g. San Francisco, CA"
            },
            "format": {
              "type": "string",
              "description": "The format to return the weather in, e.g. 'celsius' or 'fahrenheit'",
              "enum": ["celsius", "fahrenheit"]
            }
          },
          "required": ["location", "format"]
        }
      }
    }
  ]
}

response = requests.post(url=chat_api_endpoint, headers=headers, json=json_data)
print('Result:', response.text)

# parse the json response to retrieve the location and format, to be passed to 3rd-party weather API
resp_json = response.json()
func_dict = resp_json['message']['tool_calls'][0]['function']
func = func_dict['name']
args_dict = func_dict['arguments']
location = args_dict['location']
format = args_dict['format']

print('Function:', func)
print('Location:', location)
print('Format:', format)
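
The parsing above assumes the model always returns a tool call. A model may also answer in plain text, in which case `tool_calls` is absent and the indexing raises an error. A more defensive sketch follows; `extract_tool_call` is a hypothetical helper and the sample response is illustrative, shaped like the fields accessed above.

```python
def extract_tool_call(resp_json: dict):
    """Return (function_name, arguments) for the first tool call, or None if there is none."""
    message = resp_json.get("message") or {}
    tool_calls = message.get("tool_calls") or []
    if not tool_calls:
        return None
    function = tool_calls[0].get("function") or {}
    return function.get("name"), function.get("arguments") or {}


# Illustrative response, not real output from the deployment.
sample_response = {
    "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "function": {
                    "name": "get_current_weather",
                    "arguments": {"location": "Melbourne, Australia", "format": "celsius"},
                }
            }
        ],
    }
}
print(extract_tool_call(sample_response))
print(extract_tool_call({"message": {"role": "assistant", "content": "It is sunny."}}))
```

A `None` result signals that the model answered directly and no service call is needed.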

Step 2: Service fulfilment by the 3rd-party weather API

In [None]:
# Service fulfilment by a 3rd-party API with the given location and format.
# For example, let's assume the 3rd-party API returns a JSON weather condition like the one below.
# We'll instruct the model to answer the question with this service response.
def get_current_weather(location, format):
    # Your actual API call goes here...
    response = { "condition": "Rainy", "temp_h": 15, "temp_l": 7, "temp_unit": "C" }
    return response
service_resp = get_current_weather(location, format)
service_resp_str = json.dumps(service_resp)
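
When more than one tool is offered to the model, the function name returned in the tool call has to be mapped to the matching Python function. One common way is a small registry, sketched below. `TOOL_REGISTRY` and `call_tool` are hypothetical names, and the weather function is the same stub as above, repeated so that the example is self-contained.

```python
def get_current_weather(location, format):
    # Stub standing in for a real 3rd-party weather API call.
    return {"condition": "Rainy", "temp_h": 15, "temp_l": 7, "temp_unit": "C"}


# Maps tool names, as declared in the "tools" list of the request, to Python callables.
TOOL_REGISTRY = {
    "get_current_weather": get_current_weather,
}


def call_tool(name: str, arguments: dict):
    """Look up the tool by name and call it with the model-provided arguments."""
    if name not in TOOL_REGISTRY:
        raise ValueError(f"Unknown tool requested by the model: {name}")
    return TOOL_REGISTRY[name](**arguments)


result = call_tool(
    "get_current_weather",
    {"location": "Melbourne, Australia", "format": "celsius"},
)
print(result)
```

Rejecting unknown names explicitly avoids executing anything the model invented that was never declared as a tool.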

Step 3: Generate answer to the original question with the service response as context

In [None]:
# answering the original question with the service response as context
user_msg = """
context: {}

Answer the question using the context above (weather API response in JSON), including the weather condition as an emoji and the temperature range: {} Be concise.
""".format(service_resp_str, question)

json_data = {
  "model": model,
  "messages": [
    {
      "role": "user",
      "content": user_msg
    }
  ],
  "stream": False
}

response = requests.post(url=chat_api_endpoint, headers=headers, json=json_data)
resp_json = response.json()
print('Final Response JSON:', resp_json)
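
The cell above passes the service response inside a fresh user prompt. An alternative is to continue the original conversation and hand the result back in a message with the `tool` role, which Ollama's chat API documents. The sketch below only builds the request body; `build_followup_messages` is a hypothetical helper, the assistant message is illustrative, and how well a given model version handles `tool` messages has not been verified here.

```python
import json


def build_followup_messages(question: str, assistant_message: dict, tool_result: dict) -> list:
    """Build the message history: user question, assistant tool call, tool result."""
    return [
        {"role": "user", "content": question},
        assistant_message,
        {"role": "tool", "content": json.dumps(tool_result)},
    ]


# Illustrative assistant message, shaped like the "message" field of a tool-call response.
assistant_message = {
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {
            "function": {
                "name": "get_current_weather",
                "arguments": {"location": "Melbourne, Australia", "format": "celsius"},
            }
        }
    ],
}

messages = build_followup_messages(
    "What is the weather today in Melbourne, Australia?",
    assistant_message,
    {"condition": "Rainy", "temp_h": 15, "temp_l": 7, "temp_unit": "C"},
)
followup_request = {"model": "llama3.1", "messages": messages, "stream": False}
print(json.dumps(followup_request, indent=2))
```

The resulting body could be posted to the same chat endpoint as in Step 1.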

##### 5.2 Sample#1: Get Current Weather with OpenAI-compatible Chat Completion API
Let's run the same basic sample with [Ollama's OpenAI-compatible Chat Completion API](https://github.com/ollama/ollama/blob/main/docs/openai.md) to answer the question "What is the weather today in Melbourne, Australia?". The tool definition in the request body follows the same schema as in Ollama's chat completion API.

Step 1: Function Call to LLM with the initial question

In [None]:
question = "What is the weather today in Melbourne, Australia?"

json_data = {
  "model": model,
  "stream": False, # streaming mode is yet to be supported
  "messages": [
    {
      "role": "user",
      "content": question
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "parameters": {
          "type": "object",
          "title": "weather parameters",
          "properties": {
            "location": {
              "type": "string",
              "description": "The location to get the weather for, e.g. San Francisco, CA"
            },
            "format": {
              "type": "string",
              "description": "The format to return the weather in, e.g. 'celsius' or 'fahrenheit'",
              "enum": ["celsius", "fahrenheit"]
            }
          },
          "required": ["location", "format"]
        }
      }
    }
  ],
  # as of 13 Aug, tool_choice is not yet supported by Ollama's OpenAI-compatible API
  # (in the OpenAI API, tool_choice is a string or a single object, not a list)
  "tool_choice": {
    "type": "function",
    "function": {
      "name": "get_current_weather"
    }
  }
}

response = requests.post(url=openai_chat_api_endpoint, headers=headers, json=json_data)
print('Result:', response.text)

# parse the json response to retrieve the location and format, to be passed to 3rd-party weather API
resp_json = response.json()
func_dict = resp_json['choices'][0]['message']['tool_calls'][0]['function']
func = func_dict['name']
args_dict = json.loads(func_dict['arguments']) #arguments are returned as a json string, so we need to parse it.
location = args_dict['location']
format = args_dict['format']

print('Function:', func)
print('Location:', location)
print('Format:', format)
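
Note the difference between the two APIs as used in this notebook: the native chat API returned `arguments` as a JSON object, while the OpenAI-compatible API returns it as a JSON string. A small normalizer lets the same downstream code handle both. `parse_arguments` is a hypothetical helper for illustration.

```python
import json


def parse_arguments(arguments) -> dict:
    """Return tool-call arguments as a dict, whether given as a dict or a JSON string."""
    if isinstance(arguments, str):
        # OpenAI-compatible responses encode the arguments as a JSON string.
        return json.loads(arguments) if arguments.strip() else {}
    # Native responses already carry a dict; copy it to avoid mutating the response.
    return dict(arguments or {})


as_string = '{"location": "Melbourne, Australia", "format": "celsius"}'
as_dict = {"location": "Melbourne, Australia", "format": "celsius"}
print(parse_arguments(as_string) == parse_arguments(as_dict))
```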

Step 2: Service fulfilment by the 3rd-party weather API

In [None]:
# Service fulfilment by a 3rd-party API with the given location and format.
# For example, let's assume the 3rd-party API returns a JSON weather condition like the one below.
# We'll instruct the model to answer the question with this service response.
def get_current_weather2(location, format):
    # Your actual API call goes here...
    response = { "condition": "Rainy", "temp_h": 15, "temp_l": 7, "temp_unit": "C" }
    return response
service_resp = get_current_weather2(location, format)
service_resp_str = json.dumps(service_resp)

Step 3: Generate answer to the original question with the service response as context

In [None]:
# answering the original question with the service response as context
user_msg = """
context: {}

Answer the question using the context above (weather API response in JSON), including the weather condition as an emoji and the temperature range: {} Be concise.
""".format(service_resp_str, question)

json_data = {
  "model": model,
  "messages": [
    {
      "role": "user",
      "content": user_msg
    }
  ],
  "stream": False
}

response = requests.post(url=openai_chat_api_endpoint, headers=headers, json=json_data)
resp_json = response.json()
print('Final Response JSON:', resp_json)
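
The two APIs also wrap the final answer differently: the parsing code in this notebook reads `message` at the top level for the native API and `choices[0].message` for the OpenAI-compatible one. Assuming the answer text sits in the `content` field of that message in both cases, it can be extracted as sketched below. `extract_answer` is a hypothetical helper and both sample responses are illustrative.

```python
def extract_answer(resp_json: dict) -> str:
    """Return the assistant's text from either response shape used in this notebook."""
    if "choices" in resp_json:
        # OpenAI-compatible shape: choices[0].message.content
        return resp_json["choices"][0]["message"]["content"]
    # Native chat shape: message.content
    return resp_json["message"]["content"]


# Illustrative responses, not real output from the deployment.
native_response = {"message": {"role": "assistant", "content": "Rainy, 7 to 15 C"}}
openai_response = {
    "choices": [{"message": {"role": "assistant", "content": "Rainy, 7 to 15 C"}}]
}
print(extract_answer(native_response))
print(extract_answer(openai_response))
```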