Integrate Functionary v2.5 + Refactor Functionary Code #1509
Conversation
I have tested this PR and it seems to be working just fine. I used this server config:

{
  "host": "0.0.0.0",
  "port": 8080,
  "models": [
    {
      "model": "functionary-small-v2.5.Q4_0.gguf",
      "model_alias": "functionary-small-v2.5-auto-tokenizer",
      "hf_model_repo_id": "meetkai/functionary-small-v2.5-GGUF",
      "hf_pretrained_model_name_or_path": "meetkai/functionary-small-v2.5-GGUF",
      "chat_format": "functionary",
      "n_gpu_layers": 32,
      "offload_kqv": true,
      "n_ctx": 8192,
      "use_mlock": false
    }
  ]
}

Start the server with this config, then run the following client script:

from openai import OpenAI
import yaml

BASE_URL = "http://0.0.0.0:8080/v1"
API_KEY = "sk-xxx"
MODEL_ID = "functionary-small-v2.5-auto-tokenizer"
client = OpenAI(base_url=BASE_URL, api_key=API_KEY)
messages = [
{"role": "user", "content": "what's the weather like in Hanoi?"}
]
tools = [ # For functionary-7b-v2 we use "tools"; for functionary-7b-v1.4 we use "functions" = [{"name": "get_current_weather", "description":..., "parameters": ....}]
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g., San Francisco, CA",
}
},
"required": ["location"]
}
}
}
]
chat_response = client.chat.completions.create(
model=MODEL_ID,
messages=messages,
tools=tools,
)
print(yaml.dump(chat_response.model_dump(), sort_keys=False, explicit_start=True))

This gives the following response:

---
id: chatcmpl-0a698bb0-d38a-4a77-b2e6-2226e09f71c9
choices:
- finish_reason: tool_calls
index: 0
logprobs: null
message:
content: null
role: assistant
function_call: null
tool_calls:
- id: call_ePVfb4Gg0aNbCV41cKc9tkAi
function:
arguments: '{"location": "Hanoi"}'
name: get_current_weather
type: function
created: 1720904370
model: functionary-small-v2.5-auto-tokenizer
object: chat.completion
service_tier: null
system_fingerprint: null
usage:
completion_tokens: 12
prompt_tokens: 126
total_tokens: 133
I'm not super familiar with the code base, but I'm motivated to use this feature, so I thought I would help with review. Great work, and thank you for contributing this; I am very eager to have working function calling (I had trouble with earlier versions, so I'm hoping this version is more robust).
@@ -1328,7 +1329,7 @@ def format_gemma(
 # Tricky chat formats that require custom chat handlers

-@register_chat_completion_handler("functionary")
+@register_chat_completion_handler("functionary-old")
Which models can still use this version?
Maybe functionary-legacy is a better name?
completion = cast(
    llama_types.Completion, self.llama.create_completion(
Could this be written as
return self.llama.create_completion(
given that this function itself has no mypy typing on the return value? My impression is that the cast isn't really doing anything. If the idea is to have the typing be correct, then consider implementing typing for the return value.
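A minimal sketch of what that alternative could look like, assuming the helper only ever returns a non-streaming completion; the parameter list here is illustrative, not the PR's actual signature:

```python
from typing import Any, List, Optional

from llama_cpp import llama_types


def create_completion(
    self, prompt: str, stop: List[str], grammar: Optional[Any]
) -> llama_types.Completion:
    # Method sketch: with an explicit return annotation, mypy can check
    # callers directly and the runtime cast() becomes unnecessary.
    return self.llama.create_completion(prompt=prompt, stop=stop, grammar=grammar)
```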
def get_grammar(self, function_name: str):
    function_body = None
    for function in self.functions or []:
My impression is that functions vs tools was something used in older versions. Perhaps only support the newer API for streaming rather than both? Or do the conversion between the formats once.
Edit: swapped tools vs functions.
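As a rough illustration of the "convert once" idea (the helper name and the exact dict shapes are my assumptions, not the handler's actual API), the legacy arguments could be normalized up front so the rest of the code only deals with tools/tool_choice:

```python
from typing import Any, Dict, List, Optional, Tuple


def normalize_functions_to_tools(
    functions: Optional[List[Dict[str, Any]]],
    function_call: Optional[Any],
) -> Tuple[Optional[List[Dict[str, Any]]], Optional[Any]]:
    """Map legacy functions/function_call onto tools/tool_choice once,
    so downstream code only has to handle the newer format."""
    if functions is None:
        return None, function_call
    tools = [{"type": "function", "function": f} for f in functions]
    tool_choice = function_call
    if isinstance(function_call, dict) and "name" in function_call:
        # A legacy forced call {"name": ...} becomes the tool_choice shape.
        tool_choice = {"type": "function", "function": {"name": function_call["name"]}}
    return tools, tool_choice
```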
@@ -1858,11 +2575,12 @@ def prepare_messages_for_inference(
        ),
    )
)

if tools is not None and tool_choice != "none" and any([tool["type"] == "code_interpreter" for tool in tools]):
The code_interpreter support seems like a great feature, though perhaps this PR would be faster to review if that were added in a follow-up PR.
)
for chunk in completion:
    if chunk["choices"][0]["text"]:
        delta_text = chunk["choices"][0]["text"].strip()
This is overwritten on each iteration of the loop. Is that intentional? Same in v2, I think. v2.5 seems to append them all into a list. Given max_tokens=1 in all of them, perhaps it's not worth having the list -- would you want to just always return either None or a token? That could reduce complexity in generate_streaming, since it has a big loop to handle this, including a double break with to_break etc.
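A hedged sketch of the "return either None or a single token" idea; the create_completion callable and the chunk layout are assumed to match what the handler already uses with max_tokens=1:

```python
from typing import Callable, Iterable, Optional


def sample_single_token(
    create_completion: Callable[..., Iterable[dict]], prompt: str
) -> Optional[str]:
    """With max_tokens=1 at most one non-empty chunk can arrive, so return
    that token (or None) instead of accumulating a list the caller unpacks."""
    for chunk in create_completion(prompt=prompt, max_tokens=1):
        text = chunk["choices"][0]["text"]
        if text:
            return text.strip()
    return None
```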
generator = self.generate_streaming_tool_call(prompt=prompt)
tool_id = "".join([random.choice(string.ascii_letters + string.digits) for _ in range(24)])

for response, finish_reason, logprobs, chunk, prompt in generator:
From what I read, finish_reason and logprobs are always None here, so they can be removed from the generator (and set explicitly in yield_response).
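For example (a sketch with a hypothetical shape for yield_response, not the PR's actual helper), the generator could stop yielding the two constant values and yield_response could set them explicitly:

```python
from typing import Any, Dict


def yield_response(response: str, chunk: Dict[str, Any], prompt: str) -> Dict[str, Any]:
    """Build the streamed item; finish_reason and logprobs are always None
    on this path, so they are set here rather than threaded through the
    generator."""
    return {
        "response": response,
        "finish_reason": None,
        "logprobs": None,
        "chunk": chunk,
        "prompt": prompt,
    }
```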
prompt += completion_text + "\n"
# Generate function args
stops = [self.tool_call_token, self.stop_token]
grammar = self.get_grammar(function_name) if function_name != "python" else None
This appears different from the logic in prepare_for_generation_with_tool_func_choice -- is it necessary to diverge?
content = "" | ||
calls = [] | ||
completion_tokens = 0 | ||
# If tool_choice/function_call is provided |
Consider pushing this logic inside generate_tool_call, with some extra arguments for an already-chosen tool call, given that the second half of the function is identical and this is repeated a few times (prepare_for_generation_with_tool_func_choice can essentially become a separate branch in there). While it would make that function more complex, it may reduce the total number of things happening by collapsing a few cases together.
I think this could also simplify the streaming state management.
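A rough sketch of the shape this could take; generate_tool_call, sample_function_name, and generate_arguments are hypothetical names standing in for the PR's actual helpers:

```python
from typing import Any, Optional


def generate_tool_call(self, prompt: str, forced_function_name: Optional[str] = None) -> Any:
    """Single entry point for tool-call generation.

    When forced_function_name is given (the tool_choice/function_call
    branch), skip sampling the function name; otherwise sample it first.
    The argument-generation half is shared by both branches, so it is no
    longer duplicated."""
    if forced_function_name is not None:
        function_name = forced_function_name
    else:
        function_name = self.sample_function_name(prompt)  # hypothetical helper
    grammar = self.get_grammar(function_name) if function_name != "python" else None
    return self.generate_arguments(prompt, function_name, grammar)  # hypothetical helper
```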
)

chunk_id, chunk_created = None, None
delta = {"role": None, "content": None, "function_call": None, "tool_calls": None}
Consider simplifying how delta is managed, given that it is mutable and a lot of care needs to be taken to make sure that all other fields are cleared. It might be simpler to fully create the delta each time, then fill in the missing fields inside yield_response when creating the response. This also simplifies the final response.
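One possible shape for the "build the whole delta each time" suggestion; the field names follow the OpenAI chat-chunk schema, and the helper itself is hypothetical:

```python
from typing import Any, Dict, List, Optional


def make_delta(
    role: Optional[str] = None,
    content: Optional[str] = None,
    function_call: Optional[Dict[str, Any]] = None,
    tool_calls: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Return a fresh delta dict every time instead of mutating a shared
    one, so stale fields cannot leak from a previous chunk."""
    return {
        "role": role,
        "content": content,
        "function_call": function_call,
        "tool_calls": tool_calls,
    }
```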
    ],
)

chunk_id, chunk_created = None, None
Given these are only used in the "auto" case, push the scope of these variables inside that if statement.
Also, I realize some of my comments are on existing code that was intentionally left unchanged to make review easier -- but given how much is changing already, it may be worth simplifying where possible.
completion_tokens += completion["usage"]["completion_tokens"]
prompt += completion_text
content_prefix = "<|start_header_id|>assistant<|end_header_id|>\n\n"
content += prompt[prompt.rindex(content_prefix) + len(content_prefix):] + completion_text
When testing this out I see responses like "The smart curtains have been closed. smart curtains have been closed."
I am not sure, but from what I can see here, this first appends completion_text to the prompt, then takes the tail of the prompt and adds completion_text again to the content. Does this mean completion_text is added twice?
Applying this fix seems to address the issue:

-content += prompt[prompt.rindex(content_prefix) + len(content_prefix):] + completion_text
+content += prompt[prompt.rindex(content_prefix) + len(content_prefix):]
stops = [self.tool_call_token, self.stop_token]
completion = self.create_completion(prompt=prompt, stop=stops, grammar=None)
for chunk in completion:
    delta_text = chunk["choices"][0]["text"]
Does generate_streaming_content for v2.5 also need to strip the content_prefix from the assistant message?
Amazing job 🚀! Tested with the Instructor library; I just encountered a problem with passing the function result back to the server.
When using the example from Functions.ipynb, I am getting the following error on the server side.
Exception: 'dict object' has no attribute 'name'
Traceback (most recent call last):
File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/errors.py", line 171, in custom_route_handler
response = await original_route_handler(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/app.py", line 483, in create_chat_completion
] = await run_in_threadpool(llama.create_chat_completion, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/starlette/concurrency.py", line 42, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/llama.py", line 1748, in create_chat_completion
return handler(
^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/llama_chat_format.py", line 2610, in functionary_new_chat_handler
prompt = prepare_messages_for_inference(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/llama_chat_format.py", line 2598, in prepare_messages_for_inference
tokenizer.hf_tokenizer.apply_chat_template(all_messages, add_generation_prompt=True, tokenize=False)
File "/opt/app-root/lib64/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1812, in apply_chat_template
rendered_chat = compiled_template.render(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/jinja2/environment.py", line 1304, in render
self.environment.handle_exception()
File "/opt/app-root/lib64/python3.11/site-packages/jinja2/environment.py", line 939, in handle_exception
raise rewrite_traceback_stack(source=source)
File "<template>", line 9, in top-level template code
jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'name'
Debugger
Running the function-calling example in the debugger, the first call works perfectly: I get the expected response with the right formatting.
However, the second call, where we add the tool response to the messages, hits the error above.
Here is the messages content before sending it for the second call:
[
{
'role': 'user',
'content': "what's the weather like in Hanoi?"
},
ChatCompletionMessage(
content=None,
role='assistant',
function_call=None,
tool_calls=[
ChatCompletionMessageToolCall(
id='call_L4tj8aG7VYqlL1XCLzyHeH8z',
function=Function(
arguments='{"location": "Hanoi"}',
name='get_current_weather'),
type='function'
)
]
),
{
'tool_call_id': 'call_L4tj8aG7VYqlL1XCLzyHeH8z',
'role': 'tool',
'name': 'get_current_weather',
'content': '{"location": "Hanoi", "temperature": "unknown"}'
}
]
Reproduce
To reproduce without using the Jupyter notebook, here is the Python script:
import json
import openai
BASE_URL = "http://0.0.0.0:8000/v1"
API_KEY = "sk-xxx"
MODEL_ID = "/models/functionary-small-v2.5.Q4_0.gguf"
def get_current_weather(location, unit="fahrenheit"):
return json.dumps({"location": location, "temperature": "unknown"})
client = openai.OpenAI(base_url=BASE_URL, api_key=API_KEY)
messages = [
{"role": "user", "content": "what's the weather like in Hanoi?"}
]
tools = [ # For functionary-7b-v2 we use "tools"; for functionary-7b-v1.4 we use "functions" = [{"name": "get_current_weather", "description":..., "parameters": ....}]
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g., San Francisco, CA",
}
},
"required": ["location"]
}
}
}
]
response = client.chat.completions.create(
model=MODEL_ID,
messages=messages,
tools=tools,
)
response_message = response.choices[0].message
tool_calls = response_message.tool_calls
if not tool_calls:
exit(1)
available_functions = {
"get_current_weather": get_current_weather,
}
messages.append(response_message)
for tool_call in tool_calls:
function_name = tool_call.function.name
function_to_call = available_functions[function_name]
function_args = json.loads(tool_call.function.arguments)
function_response = function_to_call(
location=function_args.get("location"),
unit=function_args.get("unit"),
)
messages.append(
{
"tool_call_id": tool_call.id,
"role": "tool",
"name": function_name,
"content": function_response,
}
) # extend conversation with function response
second_response = client.chat.completions.create(
model=MODEL_ID,
messages=messages,
)
Edit
After some investigation, the name property is filtered out because it is not on the ChatCompletionRequestToolMessage class. It must be added there to fix the problem.
llama-cpp-python/llama_cpp/llama_types.py
Lines 221 to 224 in f6ed21f

class ChatCompletionRequestToolMessage(TypedDict):
    role: Literal["tool"]
    content: Optional[str]
    tool_call_id: str
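For reference, the fix being suggested would look roughly like this; whether name should be required or optional is an open question, and NotRequired is just one way to express it:

```python
from typing import Literal, Optional

from typing_extensions import NotRequired, TypedDict


class ChatCompletionRequestToolMessage(TypedDict):
    role: Literal["tool"]
    content: Optional[str]
    tool_call_id: str
    name: NotRequired[str]  # proposed addition so the field is not filtered out
```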
@@ -1826,7 +2544,6 @@ def generate_schema_from_functions(functions, namespace="functions") -> str:
def prepare_messages_for_inference(
    messages: List[llama_types.ChatCompletionRequestMessage],
For some very weird reason, the messages do not contain the name property when the role is tool.
Here, in the example I provided in #1509 (review), messages is equal to

[
    ....
    {
        "role": "tool",
        "content": "{\"location\": \"Hanoi, Vietnam\", \"temperature\": \"unknown\"}",
        "tool_call_id": "call_yQZk3sIQsIwGuco1lV4QVbTN"
    }
]

In the tool message we do not have the name, which is required by the chat format, leading to the error 'dict object' has no attribute 'name' raised by Jinja.
- chat_format: functionary instead of functionary-v1, functionary-v2, functionary-v2.5. I changed the chat_format for the old handler to functionary-old. Hope it is not breaking any code on your side. Anyway, can we remove the old handler function already?
- Code interpreter: requested by passing {"type": "code_interpreter"} in tools. Returns a tool call to a tool called python with the argument being the generated code (see the sketch below).
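For example, against the OpenAI-compatible client used earlier in this thread, requesting the code interpreter might look like this (a hedged sketch; the exact response shape depends on the handler):

```python
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8080/v1", api_key="sk-xxx")

chat_response = client.chat.completions.create(
    model="functionary-small-v2.5-auto-tokenizer",
    messages=[{"role": "user", "content": "Compute the first 10 Fibonacci numbers."}],
    tools=[{"type": "code_interpreter"}],
)

# Per the description above, the handler returns a tool call to a tool
# named "python" whose arguments are the generated code.
print(chat_response.choices[0].message.tool_calls)
```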