## Multiple Connected Function Calling with llama.cpp
### Adapted from the Ollama Notebook

### Requirements

#### 1. Install llama.cpp
Installation instructions for each OS (macOS, Linux, Windows) can be found in the [llama-cpp-python documentation](https://llama-cpp-python.readthedocs.io/en/latest/).

#### 2. Python llama.cpp Library

Install the Python bindings with pip:

In [1]:
%pip install llama-cpp-python

Collecting llama-cpp-python
  Downloading llama_cpp_python-0.3.9.tar.gz (67.9 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting typing-extensions>=4.5.0

#### 3. Pull the model from HuggingFace

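Download a GGUF build of a Hermes function-calling model from Hugging Face, for example [Hermes-2-Pro-Llama-3-8B-GGUF](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF) from NousResearch. The cells below load a local `DeepHermes-3-Llama-3-3B-Preview-q4.gguf` file, so point `model_path` at whichever GGUF file you downloaded.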

### Usage

#### 1. Define Tools

In [2]:
import random

def get_weather_forecast(location: str) -> dict[str, str]:
    """Retrieves the weather forecast for a given location"""
    # Mock values for test
    return {
        "location": location,
        "forecast": "sunny",
        "temperature": "25°C",
    }

def get_random_city() -> str:
    """Retrieves a random city from a list of cities"""
    cities = ["Groningen", "Enschede", "Amsterdam", "Istanbul", "Baghdad", "Rio de Janeiro", "Tokyo", "Kampala"]
    return random.choice(cities)

def get_random_number() -> int:
    """Retrieves a random number"""
    # Mock value for test
    return 31

#### 2. Define Function Caller

Because this example is a Jupyter notebook, the functions are simply registered in a dictionary inside the class. In a Python project, you can use the implementation here as inspiration: https://github.com/AtakanTekparmak/ollama_langhcain_fn_calling/tree/main

In [3]:
import inspect

class FunctionCaller:
    """
    A class to call functions from tools.py.
    """

    def __init__(self):
        # Initialize the functions dictionary
        self.functions = {
            "get_weather_forecast": get_weather_forecast,
            "get_random_city": get_random_city,
            "get_random_number": get_random_number,
        }
        self.outputs = {}

    def create_functions_metadata(self) -> list[dict]:
        """Creates the functions metadata for the prompt. """
        def format_type(p_type: str) -> str:
            """Format the type of the parameter."""
            # If p_type begins with "<class", then it is a class type
            if p_type.startswith("<class"):
                # Get the class name from the type
                p_type = p_type.split("'")[1]
            
            return p_type
            
        functions_metadata = []
        for name, function in self.functions.items():
            # Split the docstring into lines; the first line is used as the description
            descriptions = (function.__doc__ or "").strip().split("\n")
            print(descriptions)
            functions_metadata.append({
                "name": name,
                "description": descriptions[0],
                "parameters": {
                    "properties": [ # Get the parameters for the function
                        {   
                            "name": param_name,
                            "type": format_type(str(param_type)),
                        }
                        # Remove the return type from the parameters
                        for param_name, param_type in function.__annotations__.items() if param_name != "return"
                    ],
                    
                    "required": [param_name for param_name in function.__annotations__ if param_name != "return"],
                } if function.__annotations__ else {},
                "returns": [
                    {
                        "name": name + "_output",
                        "type": {param_name: format_type(str(param_type)) for param_name, param_type in function.__annotations__.items() if param_name == "return"}["return"]
                    }
                ]
            })

        return functions_metadata

    def call_function(self, function):
        """
        Call the function from the given input.

        Args:
            function (dict): A dictionary containing the function details.
        """
    
        def check_if_input_is_output(input: dict) -> dict:
            """Check if the input is an output from a previous function."""
            for key, value in input.items():
                if value in self.outputs:
                    input[key] = self.outputs[value]
            return input

        # Get the function name from the function dictionary
        function_name = function["name"]
        
        # Get the function params from the function dictionary
        function_input = function["params"] if "params" in function else None
        function_input = check_if_input_is_output(function_input) if function_input else None
    
        # Call the function from tools.py with the given input
        # pass all the arguments to the function from the function_input
        output = self.functions[function_name](**function_input) if function_input else self.functions[function_name]()
        self.outputs[function["output"]] = output
        return output
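
To make the chaining in `call_function` concrete, here is a minimal standalone sketch of the same idea. It assumes two hypothetical stand-in tools (`pick_city` and `forecast`) and a hypothetical helper `run_chain`, none of which are part of the class above. A parameter whose value matches the name of an earlier call's `output` is replaced by that stored result before the function runs.

```python
from typing import Any, Callable

# Hypothetical stand-in tools, deterministic so the result is predictable.
def pick_city() -> str:
    return "Tokyo"

def forecast(location: str) -> dict[str, str]:
    return {"location": location, "temperature": "25°C"}

TOOLS: dict[str, Callable[..., Any]] = {"pick_city": pick_city, "forecast": forecast}

def run_chain(calls: list[dict[str, Any]]) -> dict[str, Any]:
    """Run calls in order, substituting earlier outputs into later params."""
    outputs: dict[str, Any] = {}
    for call in calls:
        # Copy the params so the caller's dictionary is not mutated.
        params = dict(call.get("params") or {})
        for key, value in params.items():
            # Only string values can name a previous output variable.
            if isinstance(value, str) and value in outputs:
                params[key] = outputs[value]
        result = TOOLS[call["name"]](**params)
        # Fall back to "<name>_output" if the model omitted the output name.
        outputs[call.get("output", call["name"] + "_output")] = result
    return outputs

chain_outputs = run_chain([
    {"name": "pick_city", "output": "random_city"},
    {"name": "forecast", "params": {"location": "random_city"}, "output": "weather"},
])
print(chain_outputs["weather"])
```

Unlike `check_if_input_is_output`, this sketch copies the parameters and checks that a value is a string before looking it up, which avoids a `TypeError` on unhashable values such as lists.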

    

#### 3. Setup The Function Caller and Prompt

In [4]:
# Initialize the FunctionCaller 
function_caller = FunctionCaller()

# Create the functions metadata
functions_metadata = function_caller.create_functions_metadata()

['Retrieves the weather forecast for a given location']
['Retrieves a random city from a list of cities']
['Retrieves a random number']
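
The metadata builder above reads `__annotations__`, so every annotated parameter ends up in `required`. As one possible alternative, the sketch below uses `inspect.signature` to mark a parameter as required only when it has no default value. The helper `describe_function` and the tool `get_forecast` are hypothetical names introduced for illustration, and return types are left out for brevity.

```python
import inspect
from typing import Any, Callable

def describe_function(function: Callable[..., Any]) -> dict[str, Any]:
    """Build one metadata entry from a function's signature (hypothetical helper)."""
    signature = inspect.signature(function)
    doc = inspect.getdoc(function) or ""

    def type_name(annotation: Any) -> str:
        # Unannotated parameters have no declared type.
        if annotation is inspect.Parameter.empty:
            return "any"
        return getattr(annotation, "__name__", str(annotation))

    properties = [
        {"name": name, "type": type_name(param.annotation)}
        for name, param in signature.parameters.items()
    ]
    # A parameter is treated as required only if it has no default value.
    required = [
        name for name, param in signature.parameters.items()
        if param.default is inspect.Parameter.empty
    ]
    return {
        "name": function.__name__,
        "description": doc.splitlines()[0] if doc else "",
        "parameters": {"properties": properties, "required": required},
    }

# Hypothetical tool with one optional parameter.
def get_forecast(location: str, unit: str = "celsius") -> str:
    """Returns a mock forecast."""
    return f"25 degrees {unit} in {location}"

entry = describe_function(get_forecast)
print(entry["parameters"]["required"])
```

This sketch does not treat `*args` or `**kwargs` specially, so it is best suited to tools with plain named parameters like the ones in this notebook.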


In [5]:
import json

# Create the system prompt
prompt_beginning = """
You are an AI assistant that can help the user with a variety of tasks. You have access to the following functions:

"""

system_prompt_end = """

When the user asks you a question, if you need to use functions, provide ONLY the function calls, and NOTHING ELSE, in the format:
<function_calls>    
[
    { "name": "function_name_1", "params": { "param_1": "value_1", "param_2": "value_2" }, "output": "The output variable name, to be possibly used as input for another function"},
    { "name": "function_name_2", "params": { "param_3": "value_3", "param_4": "output_1"}, "output": "The output variable name, to be possibly used as input for another function"},
    ...
]
"""
system_prompt = prompt_beginning + f"<tools> {json.dumps(functions_metadata, indent=4)} </tools>" + system_prompt_end

#### 4. Load the model

In [None]:
import llama_cpp
# CPU-only: do not set n_gpu_layers, use_mlock, or flash_attn
model = llama_cpp.Llama(
    model_path='../models/DeepHermes-3-Llama-3-3B-Preview-q4.gguf',
    n_threads=4,  # Adjust for your CPU
    n_ctx=8192,
)

llama_model_loader: loaded meta data with 39 key-value pairs and 255 tensors from ../DeepHermes-3-Llama-3-3B-Preview-q4.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.2 3B
llama_model_loader: - kv   3:                       general.organization str              = Unsloth
llama_model_loader: - kv   4:                           general.basename str              = Llama-3.2
llama_model_loader: - kv   5:                         general.size_label str              = 3B
llama_model_loader: - kv   6:                            general.license str              = llama3

#### Inference

In [8]:

# Compose the prompt 
user_query = "Whats the temperature in a random city?"

# Get the response from the model
messages = [
    {'role': 'system', 'content': system_prompt},
    {'role': 'user', 'content': user_query},
]
response = model.create_chat_completion(messages=messages)
print(response)


llama_perf_context_print:        load time =  167581.16 ms
llama_perf_context_print: prompt eval time =  167568.00 ms /   452 tokens (  370.73 ms per token,     2.70 tokens per second)
llama_perf_context_print:        eval time =   73859.80 ms /    67 runs   ( 1102.39 ms per token,     0.91 tokens per second)
llama_perf_context_print:       total time =  242219.95 ms /   519 tokens


{'id': 'chatcmpl-41142eea-9c72-49c2-817c-0b07ffc2e719', 'object': 'chat.completion', 'created': 1747328501, 'model': '../DeepHermes-3-Llama-3-3B-Preview-q4.gguf', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': '<function_calls>\n[\n  {\n    "name": "get_random_city",\n    "output": "random_city"\n  },\n  {\n    "name": "get_weather_forecast",\n    "params": {\n      "location": "random_city"\n    },\n    "output": "temperature"\n  }\n]\n</function_calls>'}, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 452, 'completion_tokens': 67, 'total_tokens': 519}}


In [9]:
response['choices'][0]['message']['content']

'<function_calls>\n[\n  {\n    "name": "get_random_city",\n    "output": "random_city"\n  },\n  {\n    "name": "get_weather_forecast",\n    "params": {\n      "location": "random_city"\n    },\n    "output": "temperature"\n  }\n]\n</function_calls>'

In [19]:
function_calls = response['choices'][0]['message']['content']
# If the reply starts with <function_calls>, keep only the text between the tags
if function_calls.startswith("<function_calls>"):
    function_calls = function_calls.split('</function_calls>')[0].split('<function_calls>')[1]


# Read function calls as JSON
try:
    function_calls_json: list[dict] = json.loads(function_calls)
except json.JSONDecodeError:
    function_calls_json = []
    print("Model response not in desired JSON format")
finally:
    print("Function calls:")
    print(function_calls_json)

Function calls:
[{'name': 'get_random_city', 'output': 'random_city'}, {'name': 'get_weather_forecast', 'params': {'location': 'random_city'}, 'output': 'temperature'}]
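
The parsing cell above assumes the reply begins with `<function_calls>`. Smaller models sometimes add text before the tag or forget the closing tag, so a slightly more tolerant parser can help. The following is a minimal sketch using a regular expression; `extract_function_calls` is a hypothetical helper name, and the sample replies are made up for illustration.

```python
import json
import re

def extract_function_calls(text: str) -> list[dict]:
    """Return the list of calls found in a model reply, or [] if none parse."""
    # Capture everything after the opening tag, up to the closing tag
    # or the end of the text if the closing tag is missing.
    match = re.search(r"<function_calls>(.*?)(?:</function_calls>|$)", text, re.DOTALL)
    if match is None:
        return []
    try:
        calls = json.loads(match.group(1))
    except json.JSONDecodeError:
        return []
    # The prompt asks for a JSON list; anything else is treated as no calls.
    return calls if isinstance(calls, list) else []

# Made-up replies: one with leading text, one without the closing tag.
reply_with_preamble = (
    'Sure!\n<function_calls>\n'
    '[{"name": "get_random_city", "output": "random_city"}]\n'
    '</function_calls>'
)
reply_unclosed = '<function_calls>\n[{"name": "get_random_number", "output": "n"}]\n'

print(extract_function_calls(reply_with_preamble))
print(extract_function_calls(reply_unclosed))
```

Returning an empty list on any failure keeps the calling code simple: it can loop over the result without a separate error branch.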


In [20]:
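# Wrap the function calls in <tool_call> tags, as described in the model's chat template on Hugging Face.
# json.dumps keeps the calls as valid JSON (str() would emit Python-style single quotes).
function_message = '<tool_call>' + json.dumps(function_calls_json) + '</tool_call>'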

messages.append({'role': 'assistant', 'content': function_message})

In [21]:
for function in function_calls_json:
    output = f"Tool Response: {function_caller.call_function(function)}"
    print(output)

Tool Response: Tokyo
Tool Response: {'location': 'Tokyo', 'forecast': 'sunny', 'temperature': '25°C'}


In [22]:
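# Call the functions again; `output` is overwritten on each iteration,
# so only the result of the last call in the chain is kept.
# Note: get_random_city is re-run here, so the city may differ from the previous cell.
output = ""
for function in function_calls_json:
    output = f"{function_caller.call_function(function)}"

# Append the tool response to the messages in the chat format
tool_output = '<tool_response> ' + output + ' </tool_response>'
messages.append({'role': 'tool', 'content': tool_output})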
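
The cell above sends only the last function's result back to the model. If the model should see every intermediate result, one option is to serialize all named outputs into a single tool message. The sketch below assumes the results were gathered once into a dictionary keyed by output variable name (for instance `function_caller.outputs`); `build_tool_message` is a hypothetical helper, and the `results` values are mock data matching the tools defined earlier.

```python
import json
from typing import Any

def build_tool_message(results: dict[str, Any]) -> dict[str, str]:
    """Wrap all named tool results in one <tool_response> chat message."""
    # ensure_ascii=False keeps characters such as "°" readable in the prompt.
    payload = json.dumps(results, ensure_ascii=False)
    return {"role": "tool", "content": f"<tool_response> {payload} </tool_response>"}

# Mock results shaped like function_caller.outputs after one run of the chain.
results = {
    "random_city": "Tokyo",
    "temperature": {"location": "Tokyo", "forecast": "sunny", "temperature": "25°C"},
}
tool_message = build_tool_message(results)
print(tool_message["content"])
```

This only works when every tool returns JSON-serializable values, which holds for the three mock tools in this notebook.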


In [23]:
messages

[{'role': 'system',
  'content': '\nYou are an AI assistant that can help the user with a variety of tasks. You have access to the following functions:\n\n<tools> [\n    {\n        "name": "get_weather_forecast",\n        "description": "Retrieves the weather forecast for a given location",\n        "parameters": {\n            "properties": [\n                {\n                    "name": "location",\n                    "type": "str"\n                }\n            ],\n            "required": [\n                "location"\n            ]\n        },\n        "returns": [\n            {\n                "name": "get_weather_forecast_output",\n                "type": "dict[str, str]"\n            }\n        ]\n    },\n    {\n        "name": "get_random_city",\n        "description": "Retrieves a random city from a list of cities",\n        "parameters": {\n            "properties": [],\n            "required": []\n        },\n        "returns": [\n            {\n                "name":

#### Run inference again with the tool responses

In [24]:
response = model.create_chat_completion(messages=messages, temperature=0)
response

Llama.generate: 453 prefix-match hit, remaining 86 prompt tokens to eval
llama_perf_context_print:        load time =  167581.16 ms
llama_perf_context_print: prompt eval time =   43686.95 ms /    86 tokens (  507.99 ms per token,     1.97 tokens per second)
llama_perf_context_print:        eval time =    7038.22 ms /     9 runs   (  782.02 ms per token,     1.28 tokens per second)
llama_perf_context_print:       total time =   51039.19 ms /    95 tokens


{'id': 'chatcmpl-cafd517b-889f-41d4-a25a-7156f7d65dd0',
 'object': 'chat.completion',
 'created': 1747330131,
 'model': '../DeepHermes-3-Llama-3-3B-Preview-q4.gguf',
 'choices': [{'index': 0,
   'message': {'role': 'assistant',
    'content': 'The temperature in Tokyo is 25°C.'},
   'logprobs': None,
   'finish_reason': 'stop'}],
 'usage': {'prompt_tokens': 539, 'completion_tokens': 9, 'total_tokens': 548}}